A Retrospective on Improving Runtime Performance for the SyncTree 4.0 Update

We will review the efforts we have made to improve SyncTree Runtime performance and optimize the resources required for operation.
TECH
July 31, 2023

Hi, I'm Wally, a backend developer at Ntuple.

Developing an enterprise solution like SyncTree feels like juggling. You can't drop any of the balls labeled 'implementation performance', 'operating cost', and 'multiple constraints'. In the SyncTree 4.0 update, we did our best to deliver the 'performance improvement' that SyncTree users wanted most. In this article, we will review the efforts we have made to improve SyncTree Runtime performance and optimize the resources required for operation.

Some basic concepts for better understanding this article

Before discussing the SyncTree runtime changes, let me briefly explain the basic concepts that make the rest of the text easier to read. Each concept is deep enough to fill a book, so please treat my explanations as just enough to follow this article!

  1. Stateful / Stateless

In computer science, state is the information a system holds at a particular point in time. 'Stateful' means that the system keeps track of the state changes that occur during its interactions with the outside world, while 'Stateless' means that it does not. For a server-client system that communicates over a network, the 'Stateful/Stateless' criterion is whether the server keeps state across successive requests from the same client within a certain period. The state kept on the server in this case is called a 'Session'.
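The distinction can be pictured with a small sketch. It is written in Python purely for illustration; the class names, the `client_id` parameter, and the greeting logic are all hypothetical and are not part of SyncTree.

```python
class StatelessGreeter:
    """Keeps nothing between calls: every request must carry all the data it needs."""

    def handle(self, client_id, name):
        return f"Hello, {name}!"


class StatefulGreeter:
    """Remembers each client between calls in a per-client session."""

    def __init__(self):
        self.sessions = {}  # client_id -> session data kept on the server

    def handle(self, client_id, name=None):
        session = self.sessions.setdefault(client_id, {"name": None, "visits": 0})
        if name is not None:
            session["name"] = name
        session["visits"] += 1
        return f"Hello, {session['name']}! Visit #{session['visits']}"


if __name__ == "__main__":
    server = StatefulGreeter()
    print(server.handle("c1", "Wally"))  # first request introduces the client
    print(server.handle("c1"))           # second request relies on the session
```

The stateful version can answer the second request without being told the name again, at the price of having to keep and manage the session.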

  2. Linux I/O models (Blocking/Non-Blocking | Synchronous/Asynchronous)

In computer science, input/output (I/O) refers to the communication of a system with 'external systems' or 'human users'. Since information cannot travel faster than the speed of light, communication with the outside always involves a delay. Because the system has to wait for the result of that communication, I/O can be carried out in four ways in total, by combining two options for control of the execution flow (Blocking/Non-Blocking) with two options for synchronization (Synchronous/Asynchronous).

2-1. Blocking/Non-Blocking

It's a matter of who has control over the flow of execution. If the OS, which actually performs the I/O, takes over control of the execution flow and returns it only after the I/O is completed, the call is 'Blocking'. If it returns control right away even though the I/O is not yet completed, the call is 'Non-Blocking'.
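As a minimal sketch of the non-blocking case, assuming a Unix-like system, the following Python snippet switches the read end of a pipe to non-blocking mode. The helper names `try_read` and `demo` are made up for this example.

```python
import os


def try_read(fd, size=1024):
    """Return the bytes read, or None if the descriptor has no data ready yet."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        # Non-blocking mode: the OS hands control back instead of waiting.
        return None


def demo():
    read_fd, write_fd = os.pipe()
    try:
        os.set_blocking(read_fd, False)  # switch the read end to non-blocking mode
        first = try_read(read_fd)        # nothing written yet: returns at once
        os.write(write_fd, b"ready")
        second = try_read(read_fd)       # data is available now
        return first, second
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(demo())
```

Had the descriptor been left in its default blocking mode, the first `os.read()` would not have returned until something was written to the pipe.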

2-2. Synchronous/Asynchronous

It is a matter of synchronizing different execution flows. The I/O is 'Synchronous' if the flow of execution for the I/O proceeds in the order it was requested, and 'Asynchronous' if the caller is notified as soon as the task completes, regardless of the caller's own flow of execution.
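One way to picture the difference, sketched here with Python's `asyncio` as a stand-in (the function names are hypothetical, and `asyncio.sleep()` merely imitates an I/O wait):

```python
import asyncio


async def slow_read(label):
    await asyncio.sleep(0.01)  # stands in for an I/O wait
    return f"{label} done"


async def synchronous_style():
    events = []
    events.append(await slow_read("read"))  # the caller follows the I/O to completion
    events.append("caller continues")
    return events


async def asynchronous_style():
    events = []
    task = asyncio.create_task(slow_read("read"))
    # The caller is notified through a callback whenever the I/O completes.
    task.add_done_callback(lambda t: events.append(t.result()))
    events.append("caller continues")  # runs before the I/O has finished
    await task                         # only so the demo does not exit early
    return events


if __name__ == "__main__":
    print(asyncio.run(synchronous_style()))
    print(asyncio.run(asynchronous_style()))
```

In the first function the result arrives in request order; in the second, the caller's own work is recorded first and the completion notice arrives later.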

  3. Cooperative multi-tasking

Also known as non-preemptive multitasking. A switch in the flow of execution happens only when the running task voluntarily yields. Although this approach has fallen out of use at the operating system level, it is still a valid way to control execution flow within a single process.

3-1. Coroutine

A unit of work whose execution can be suspended and later resumed. It splits what is actually a single execution flow into several concurrent ones. Coroutines are used to implement user-level threads, as opposed to kernel threads at the OS level.
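Cooperative multitasking with coroutines can be sketched in a few lines. The example below uses Python generators and a hand-written round-robin scheduler; it is an illustration of the idea only, and the names `worker` and `run_round_robin` are invented for it.

```python
from collections import deque


def worker(name, steps, log):
    """A coroutine-like task that gives up control after every step."""
    for i in range(1, steps + 1):
        log.append(f"{name} step {i}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)        # resume the task until its next yield
        except StopIteration:
            continue          # finished: do not reschedule
        queue.append(task)    # not finished: put it at the back of the queue


def demo():
    log = []
    run_round_robin([worker("A", 2, log), worker("B", 3, log)])
    return log


if __name__ == "__main__":
    print(demo())
    # ['A step 1', 'B step 1', 'A step 2', 'B step 2', 'B step 3']
```

Nothing preempts a task here: if `worker` never reached a `yield`, no other task would ever run, which is exactly the trade-off of the cooperative model.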

Limitations of the Runtime on SyncTree 3.0

Now that the concepts have been explained, let's begin the retrospective of the runtime improvement process in earnest. SyncTree is written in PHP. For the story of PHP at SyncTree, please refer to the 'Though it is PHP? No Problem!' series of posts by 'Bradley', a backend developer who also writes for the Ntuple technical blog! 😀

By nature, PHP is stateless. To understand why, we first need to look back at PHP's past. Most of today's PHP applications run on top of php-fpm (FastCGI Process Manager). Earlier, under CGI, a web server handled each request by starting a process, outputting the result, and then shutting the process down. Since a new process was launched for every request, performance was bound to suffer from process initialization and shutdown. FastCGI improves on CGI in that the process keeps running and receiving requests instead of being launched and terminated for every request.

Since there is no longer any overhead for starting and terminating processes, request processing is faster. However, one property was kept: all the resources created to process a request are released when that request ends. This lets you focus on the business logic without worrying about resource management in the process, but it also limits the implementation of high-performance applications, because the preparation for request processing must start from scratch every time. Several PHP extensions have been created to solve this problem. As a scripting language, PHP has evolved by installing extensions into the language runtime to add functionality, rather than by extending functionality with libraries.
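The cost of starting from scratch on every request can be pictured with a toy model. This is a Python sketch, not PHP or php-fpm code: `bootstrap()` is a hypothetical stand-in for loading configuration, autoloaders, and connections, and the `/ping` route is invented for the example.

```python
def bootstrap(stats):
    """Hypothetical stand-in for the work done before a request can be handled."""
    stats["boots"] += 1
    return {"routes": {"/ping": "pong"}}


def handle(app, path):
    return app["routes"].get(path, "not found")


def serve_per_request(paths):
    """Shared-nothing style: prepare everything, handle one request, discard it all."""
    stats = {"boots": 0}
    responses = [handle(bootstrap(stats), path) for path in paths]
    return responses, stats["boots"]


def serve_resident(paths):
    """Memory-resident style: prepare once, then reuse the prepared state."""
    stats = {"boots": 0}
    app = bootstrap(stats)
    responses = [handle(app, path) for path in paths]
    return responses, stats["boots"]


if __name__ == "__main__":
    print(serve_per_request(["/ping"] * 3))  # bootstraps three times
    print(serve_resident(["/ping"] * 3))     # bootstraps once
```

Both functions return the same responses; only the number of times the preparation work runs differs.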

After reviewing several candidates, our team ended up choosing the 'Swoole Extension' to improve the SyncTree runtime.

Process of exploring options to find the best solution

Let's take a look at some of the options we've reviewed for changing our application server.

  1. RoadRunner

RoadRunner is a high-performance PHP application server and process manager designed to be extensible through plugins. Developed in Go, RoadRunner runs applications in the form of workers. A worker is a process, which guarantees isolation and independence while it operates.

In short, RoadRunner is a replacement for php-fpm. Instead of repeating the bootstrapping process on every request, which is a drawback of php-fpm, memory-resident code handles the requests. The advantage of this approach is that changes to the PHP code are minimal. However, execution control is handed to the server written in Go, complex processing is handled by Go extensions, and the PHP process itself still runs synchronously. This means that the performance gain is limited.

  2. ReactPHP

ReactPHP takes a slightly different approach: it implements the server by handling sockets directly in PHP. By building the server on non-blocking, event-driven async I/O, it achieves significant performance gains over the traditional php-fpm model. As we'll see, Swoole uses the same approach to address I/O overhead. The difference is that ReactPHP is a pure PHP library, while Swoole is a PHP extension. Let's look at the following code.

This is sample code for ReactPHP that reads a file from the file system and writes it to standard output. Since execution is asynchronous, the result of the file I/O is processed in a callback function. That is not a big deal with simple business logic, but in practice the work is rarely that simple. Several tasks will follow one another, such as opening a file, sending it to a remote location, combining it with a file from another remote location and rewriting it, or analyzing the contents and saving them to a DB. Callbacks will be followed by more callbacks. PHP doesn't have syntactic support like async/await yet, so you can end up in the same callback hell that was common in JavaScript code before ES2017 introduced the async/await syntax.

For this reason, the ReactPHP project provides a separate component that offers async/await in the form of functions. Even so, to reap the performance benefits of asynchronous I/O, the code itself has to be reimplemented to be asynchronous, which makes it difficult to leverage legacy code.
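The shape of the callback problem can be sketched as follows. This is illustrative Python built on `asyncio`, not ReactPHP code; every function name and the three-step "read, upload, save" pipeline are hypothetical, and the I/O is faked with strings.

```python
import asyncio


# Callback style: each step takes a function to call when it is done.
def read_file_cb(loop, name, on_done):
    loop.call_soon(on_done, f"<contents of {name}>")


def upload_cb(loop, data, on_done):
    loop.call_soon(on_done, f"uploaded {data}")


def save_cb(loop, receipt, on_done):
    loop.call_soon(on_done, f"saved [{receipt}]")


def pipeline_with_callbacks(loop, name, on_done):
    # Every extra step adds one more level of nesting.
    def after_read(data):
        def after_upload(receipt):
            save_cb(loop, receipt, on_done)
        upload_cb(loop, data, after_upload)
    read_file_cb(loop, name, after_read)


# Coroutine style: the same three steps read top to bottom.
async def read_file(name):
    await asyncio.sleep(0)
    return f"<contents of {name}>"


async def upload(data):
    await asyncio.sleep(0)
    return f"uploaded {data}"


async def save(receipt):
    await asyncio.sleep(0)
    return f"saved [{receipt}]"


async def pipeline_with_await(name):
    data = await read_file(name)
    receipt = await upload(data)
    return await save(receipt)


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    pipeline_with_callbacks(loop, "a.txt", done.set_result)
    from_callbacks = await done
    from_await = await pipeline_with_await("a.txt")
    return from_callbacks, from_await


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both pipelines produce the same result, but the callback version has to be restructured around the asynchronous calls, which is the rewrite cost described above.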

  3. Swoole

So, how does Swoole handle the callback problem of asynchronous processing? Let's take a look at the Swoole version of the 'read a file and write it to stdout' example that we saw for ReactPHP.

You have to use the Coroutine\run() function to declare that the task should run inside the coroutine context, but the business logic code is much simpler. It just calls a PHP built-in function.

As you can see, the anonymous function above is declared as a Swoole coroutine. Other than that, it looks just like a normal PHP function. Under normal PHP execution, this function would stop as soon as file_get_contents() is called, and the entire process would stay blocked until the OS kernel has the result of the I/O request ready.

So what does Swoole do here? Swoole provides a feature called 'coroutine hook'.

Since the code above is executed as a Swoole coroutine, Swoole replaces PHP's built-in I/O functions with coroutine-aware versions. The PHP built-in function file_get_contents(), which was implemented as Sync Blocking I/O, is turned into a coroutine-based function by the Swoole extension. Therefore, the moment file_get_contents() is called, instead of blocking the entire process, control of the process passes to another coroutine. In cooperative multitasking, this is called a yield. When the I/O completes, the coroutine that requested it is resumed, much like a callback, and continues execution. However, it is hard to see how this differs from ordinary PHP code just by looking at the sample code above. The difference becomes clear when I/O operations overlap. Let's look at the following code.

The code above reads two files and writes each to standard output. In normal PHP execution, the next file read does not start until the previous one has finished. In the Swoole coroutine environment, however, the two reads run almost simultaneously, and the output order depends on which file finishes reading first. Note that each coroutine is registered with the go() function. If you know the Go language, you will probably think of goroutines when you see the code above. It seems to have been partially influenced by Go's language design. FYI, since file_get_contents() can also handle HTTP URLs, code like the following is also possible:

When the above code is executed, the responses are output in the order in which the servers answer, fastest first. Developers seeking implementation purity might look at coroutine hooks and call them 'black magic'. However, this approach makes it possible to implement Async Non-blocking I/O without changing the interface of the numerous I/O-related functions that already exist.
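The overlapping behavior described in this section can be imitated with Python's `asyncio`. This is only an analogy to Swoole's coroutines, not Swoole code: `fetch()` is a hypothetical function, and `asyncio.sleep()` stands in for the file or HTTP I/O during which a coroutine yields.

```python
import asyncio


async def fetch(name, delay, log):
    await asyncio.sleep(delay)  # the coroutine yields here while "I/O" is pending
    log.append(name)            # recorded in order of completion
    return name


async def main():
    log = []
    # Both coroutines are started together, similar in spirit to two go() calls.
    results = await asyncio.gather(
        fetch("slow", 0.05, log),
        fetch("fast", 0.01, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main())
    print(results)  # order in which the coroutines were started
    print(log)      # order in which they finished
```

The completion order follows the speed of each operation rather than the order in which the operations were written.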

What we gained from adopting Swoole

While changing the application server to Swoole, we were able to keep the changes to a minimum by matching the interface of the block execution engine itself to Swoole. Let's go through the major changes in order, looking at the specific benefits.

  1. I/O Performance

SyncTree runtime execution is I/O intensive. Real user cases involve many mashups with other APIs and a lot of data reprocessing. Loading the various data that the user created in SyncTree STUDIO in order to execute an API is also I/O. Waiting for all these I/Os while the API is running wouldn't be good, right?

Now that the SyncTree runtime is based on Swoole, all I/O is executed asynchronously. When an I/O occurs during API execution and the coroutine waits for the result, another coroutine takes over the execution flow and handles another API request. In terms of numbers, TPS has improved by about 400% compared to SyncTree 3.0. Of course, there are still many areas for improvement.
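Why overlapping I/O waits raises throughput can be seen in a toy comparison. This Python sketch is not a benchmark of SyncTree and says nothing about the figure quoted above; `call_api()` is a hypothetical request whose only cost is a simulated 20 ms I/O wait.

```python
import asyncio
import time


async def call_api(i):
    await asyncio.sleep(0.02)  # simulated I/O wait, e.g. a call to another API
    return i


async def sequential(n):
    # Each request waits for the previous one to finish.
    return [await call_api(i) for i in range(n)]


async def concurrent(n):
    # All requests wait for their I/O at the same time.
    return list(await asyncio.gather(*(call_api(i) for i in range(n))))


def timed(coro_fn, n):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_seconds = timed(sequential, 10)
    _, con_seconds = timed(concurrent, 10)
    print(f"sequential: {seq_seconds:.3f}s, concurrent: {con_seconds:.3f}s")
```

The gain comes only from waiting in parallel; CPU-bound work inside a request would not be sped up this way.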

  2. Async Task

One of the many requests from SyncTree users was 'Please allow long-running tasks to run in the background'. Accordingly, the latest runtime of SyncTree 4.0 supports concurrent computing based on coroutines.

Anything inside the AsyncTask block runs in the background. You can think of the API's main execution flow and background task as separate coroutines.

The AsyncTask block doesn't wait for its Statements to finish executing; it immediately moves on to the next block, and what it returns can be thought of as a kind of Promise.

The AwaitTask block internally waits for the completion of execution of the AsyncTask through a Channel. Implementation was simplified by using the features supported by the platform.
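The article does not show how these blocks are implemented, so the following is only a rough Python analogy of the pattern: a background coroutine hands its result to the main flow through a channel. `asyncio.Queue` stands in for a Swoole Channel, and `async_task`, `await_task`, and `api_handler` are hypothetical names.

```python
import asyncio


async def async_task(statements, channel):
    """Run the statements in the background and push the result into the channel."""
    result = await statements()
    await channel.put(result)


async def await_task(channel):
    """Wait until the background task has delivered its result."""
    return await channel.get()


async def api_handler():
    log = []

    async def long_job():
        await asyncio.sleep(0.01)  # simulated long-running work
        log.append("background job finished")
        return 42

    channel = asyncio.Queue(maxsize=1)
    # Starting the task returns immediately, much like a Promise being handed back.
    task = asyncio.create_task(async_task(long_job, channel))
    log.append("main flow continues")
    value = await await_task(channel)  # main flow waits only at this point
    await task
    log.append(f"got {value}")
    return log


if __name__ == "__main__":
    print(asyncio.run(api_handler()))
```

The main flow keeps running after the task is started and blocks only where the result is actually needed.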

  3. WebSocket

Unfortunately, SyncTree currently supports very limited debugger capabilities. Until now, it has been very difficult to support interactive debugging, because communication between SyncTree STUDIO and the debugger requires a persistent connection and, as mentioned above, the php-fpm model is inherently stateless. But now the SyncTree runtime is stateful and can communicate directly with SyncTree STUDIO via WebSockets!

Based on this, an entirely new debugger function will be released in the second half of 2023.

Concluding the article...

This change in the SyncTree runtime execution model is very meaningful because it lays the foundation for developing SyncTree further as an integrated development environment (IDE). There are still improvements to be made, and useful features will keep being added for a better user experience, so stay tuned as SyncTree evolves every day!
