Hi, I'm Bradley, a backend developer at Ntuple!😀
Today I'm sharing Part 2 of my retrospective on introducing the SyncTree Dynamic URL feature: 'This is the reality of routing implementation!' If you missed Part 1, check out the link below!
>> Part 1. Routing, How Exactly Do You Know?
In Part 1, I explained how the Dynamic URL feature was a novel yet tricky challenge. Part 2 applies that theory: it walks through the actual progress and the final code of the Dynamic URL work. How could concepts such as 'Route Parameters' and 'URL Variables' be applied to SyncTree? I hope this article gives readers a sense of 'This is what happens when I implement web application routing myself!'
Restrictions
SyncTree is written in PHP, but 'a decent PHP library or two' could not simply solve this challenge. Back when there was no concept of a 'path variable', finding a BizUnit by the URI path of the requested API was very simple. All routes were 'static routes' and duplicates were impossible, so a single '=' comparison always found exactly one match. That is no longer the case.
Suppose a SyncTree user creates a single BizUnit with /foo/{bar} as its Proxy Base Path. When API consumers request /foo/x, /foo/y, /foo/z, and so on, every one of those requests must resolve to the /foo/{bar} BizUnit. How would you solve this problem? Some of you may be thinking something like this.
'Can't we just search them all?
What if the first segment of /foo/x is a variable, what if the second is, what if both are, and so on...
Wouldn't at least one of them match if you search them all?'
Unfortunately, this method is hard to adopt for performance reasons. MySQL's LIKE is rarely efficient, even when it runs only once. The proxy table is read and written by all users together, and we cannot run a number of LIKE searches against it that grows exponentially with the number of 'path segments' on every request. For example, GET /a/b/c/d/e has five segments, and 2 to the fifth power is 32. That means every call would have to query the production DB with 31 LIKE conditions chained by OR in one WHERE clause, which is not realistic.
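To make that count concrete, here is a small sketch that enumerates every pattern a brute-force search would have to try. It is written in Python purely for illustration (SyncTree itself is PHP), and the function name candidate_patterns and the use of * as the variable placeholder are my own assumptions, not SyncTree code.

```python
from itertools import product


def candidate_patterns(path: str) -> list:
    """Return every way to treat each segment as a literal or as a variable ('*')."""
    segments = [s for s in path.split("/") if s]
    patterns = []
    # Each segment is either kept as-is (False) or replaced by '*' (True).
    for mask in product((False, True), repeat=len(segments)):
        parts = ["*" if is_var else seg for seg, is_var in zip(segments, mask)]
        patterns.append("/" + "/".join(parts))
    return patterns


patterns = candidate_patterns("/a/b/c/d/e")
with_variable = [p for p in patterns if "*" in p]
print(len(patterns))       # 32 candidates in total (2 ** 5)
print(len(with_variable))  # 31 of them contain at least one variable
```

One extra segment doubles the list, which is why this approach does not scale on a shared production table.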
Or maybe some of you think the way I did at the beginning of this work.
'Can't we just use simpleDispatcher?'
I had exactly that thought during 'the 1st Dynamic URL Technical Verification Meeting'. But I was the only one who was relaxed; the head of the backend team, the head of the development department, and even the CTO were visibly anxious. I just listened, thought 'Um... so anyway, I can't use simpleDispatcher...', and I remember leaving the meeting room barely understanding why.
SyncTree does not have route definition order
Why couldn't I use the fast-route library? As we saw in Part 1, it seems we could just implement it like this.
However, there is one important prerequisite for using fast-route: 'You must be able to call addRoute() in an order in which shadowing does not occur'. As in most other routing implementations, order matters in fast-route. Only once this condition is met is it even worth considering fast-route, but BizUnits in SyncTree are not created in any particular order.🙅
As with most CRUD applications, when you create an APP or BizUnit in SyncTree STUDIO, there is no constraint such as 'B cannot be added because A was added first'. To impose such a cumbersome restriction, we would need rules that users can understand and work around. In reality, that was not easy. We ran into a situation like this.
Take a look at the next two routes. Do these two overlap or not? (* is a variable, it can be anything like a, b, c, etc.)
From fast-route's point of view, the two do not overlap and there is nothing confusing. What if the user requests GET /foo/a/b? Of the two routes, whichever was registered first with addRoute() handles GET /foo/a/b. But that is the library's point of view. At first glance, the two look like conflicting definitions that cannot both be valid. So at the '2nd Dynamic URL Technical Verification Meeting', opinions were split over this issue, and someone groaned, 'Oh, I'm confused.'
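Here is a toy first-match dispatcher that shows why registration order decides the winner. It is a Python sketch for illustration only and is not fast-route's actual code. The two routes /foo/*/b and /foo/a/* are hypothetical examples I picked because both match GET /foo/a/b, and the handler names are made up.

```python
def matches(pattern: str, path: str) -> bool:
    """True if every pattern segment is '*' or equals the matching path segment."""
    p = pattern.strip("/").split("/")
    u = path.strip("/").split("/")
    return len(p) == len(u) and all(a == "*" or a == b for a, b in zip(p, u))


def dispatch(routes, path):
    """Return the handler of the first registered route that matches the path."""
    for pattern, handler in routes:
        if matches(pattern, path):
            return handler
    return None


# Same two hypothetical routes, registered in opposite orders.
order_1 = [("/foo/*/b", "handler_1"), ("/foo/a/*", "handler_2")]
order_2 = list(reversed(order_1))

print(dispatch(order_1, "/foo/a/b"))  # handler_1
print(dispatch(order_2, "/foo/a/b"))  # handler_2
```

The same request lands on a different handler depending only on which route was added first, and that is exactly the kind of rule a STUDIO user would have to be taught.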
SyncTree has never enforced ordering rules when APIs are created, modified, or deleted, and it cannot start now. If we suddenly introduced an order among BizUnits to implement 'route variables', and wrote a function that calls addRoute() in that order, wouldn't the result be degraded performance plus a barrage of inquiries from users confused about why these routes do not count as overlapping? So, after the 2nd meeting, the following two things became the key tasks.
In the end, we concluded that it had to be solved with a 'DB query'. Now, let's see the answer.
Solution
This is the additional table we introduced. Do you see the 'id' and 'parent_id' columns? This is a typical tree-structured table.
Now, when a SyncTree user creates a BizUnit with /foo/{bar}/dee as its Base Path, the following data is INSERTed.
And the proxy table gets a new column that refers to this proxy_path.id column...
...which is used as below. (In this example, the value should be '7'. Can you guess why?)
If you then create one more BizUnit with /foo/dee as the Base Path, the following data is added to the proxy_path table…
… and the proxy_path_id of the new proxy row is simply set to '8'.
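The registration steps above can be sketched roughly as follows. This is illustrative Python, not the SyncTree PHP source. The 'segment' and 'depth' column names, storing a variable as *, and starting the id counter at 5 (only to mirror the ids in this example) are all assumptions on my part.

```python
from itertools import count


def register_base_path(rows, next_id, base_path, bizunit_id):
    """Insert any missing tree nodes for base_path and return the leaf node's id."""
    segments = [s for s in base_path.split("/") if s]
    if not segments:
        raise ValueError("base path must contain at least one segment")
    parent_id = None
    node = None
    for depth, raw in enumerate(segments, start=1):
        # Assumption: a '{name}' segment is stored as the wildcard '*'.
        segment = "*" if raw.startswith("{") else raw
        node = next(
            (r for r in rows if r["parent_id"] == parent_id and r["segment"] == segment),
            None,
        )
        if node is None:
            node = {"id": next(next_id), "parent_id": parent_id,
                    "segment": segment, "depth": depth, "bizunit_id": None}
            rows.append(node)
        parent_id = node["id"]
    node["bizunit_id"] = bizunit_id  # only the leaf node points at a BizUnit
    return node["id"]


rows = []
next_id = count(5)  # ids start at 5 only to match the example in this post
leaf_a = register_base_path(rows, next_id, "/foo/{bar}/dee", 245)
leaf_b = register_base_path(rows, next_id, "/foo/dee", 246)
print(leaf_a, leaf_b)  # 7 8
```

The second call reuses the existing foo node (id 5) and adds only one new row, which is why the new proxy row points at 8.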
In short, the idea is to maintain a minimal tree in which each 'segment' of a given path is a node, and to recover the Base Path of a specific BizUnit by walking from the path's 'leaf node' up to the root.
This can look very difficult, so let's look at a picture. The example data so far forms the following tree.
Looking at this picture, can you tell which bizunit_id to look for when a user requests /foo/dee? Since the path has two segments, there are the following two cases.
In the first case, following ids 5 → 6, the row with id 6 has no bizunit_id, so a BizUnit with the path /foo/* has evidently never been registered.
In the second case, following foo → dee leads to the proxy_path with id = 8, which has bizunit_id = 246 assigned. That's right! Even though path variables have been added, the bizunit_id can still be found from the segments alone.
As a result, the query shown at the beginning has been replaced with the query below, which is now in production. Compared to the lengthy process it took to get here, the final implementation itself is not too difficult.
The query above can be read as follows.
Ah! A GET /foo/1/dee request just came in!
I'm looking for a proxy that can handle this request!
bizunit_id = 245!
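For readers who want to try the idea, here is a minimal sketch of such a lookup using Python's built-in sqlite3 module. It is not the production MySQL query. The 'segment' column, the * wildcard value, and the one-self-join-per-segment shape are my assumptions, and the sketch does not decide which route wins when both a literal and a variable route match.

```python
import sqlite3


def find_bizunit_id(conn, path):
    """Walk the proxy_path tree from the root, one self-join per path segment."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return None
    joins, where, params = [], ["p1.parent_id IS NULL"], []
    for i, seg in enumerate(segments, start=1):
        if i > 1:
            joins.append(f"JOIN proxy_path p{i} ON p{i}.parent_id = p{i - 1}.id")
        # Each level matches either the literal segment or the wildcard row.
        where.append(f"p{i}.segment IN (?, '*')")
        params.append(seg)
    leaf = f"p{len(segments)}"
    where.append(f"{leaf}.bizunit_id IS NOT NULL")
    sql = (f"SELECT {leaf}.bizunit_id FROM proxy_path p1 "
           + " ".join(joins) + " WHERE " + " AND ".join(where))
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proxy_path "
             "(id INTEGER PRIMARY KEY, parent_id INTEGER, segment TEXT, bizunit_id INTEGER)")
conn.executemany("INSERT INTO proxy_path VALUES (?, ?, ?, ?)", [
    (5, None, "foo", None),  # /foo
    (6, 5, "*", None),       # /foo/*      (no BizUnit registered here)
    (7, 6, "dee", 245),      # /foo/*/dee
    (8, 5, "dee", 246),      # /foo/dee
])

print(find_bizunit_id(conn, "/foo/1/dee"))  # 245
print(find_bizunit_id(conn, "/foo/dee"))    # 246
print(find_bizunit_id(conn, "/foo/x"))      # None
```

The number of joins grows linearly with the number of segments, in contrast to the exponential number of LIKE patterns in the brute-force approach.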
Runtime
This query does not always run, because there is no need for it to.
For example, only one API request runs BizUnit 246, GET /foo/dee, and you do not need two extra JOINs to figure that out. First just ask, 'Is there a Base Path definition with no variables?' Only if none is found do you go on to search for variables, and that is not too late.
So part of the current SyncTree engine code is structured like this. An engine that receives GET /foo/dee completes the search as soon as it obtains $staticBizUnitProxy.
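The static-first, dynamic-second flow could look something like this. It is a Python sketch under my own assumptions, not the engine's PHP: resolve_bizunit, static_routes, and fake_dynamic_lookup are hypothetical names, and the fake lookup merely stands in for the tree query.

```python
def resolve_bizunit(path, static_routes, find_dynamic):
    """Try the cheap exact-match lookup first; run the dynamic lookup only on a miss."""
    static_hit = static_routes.get(path)
    if static_hit is not None:
        return static_hit, "static"
    dynamic_hit = find_dynamic(path)
    if dynamic_hit is not None:
        return dynamic_hit, "dynamic"
    return None, "not_found"


static_routes = {"/foo/dee": 246}  # Base Paths without variables
dynamic_calls = []                 # records when the expensive lookup runs


def fake_dynamic_lookup(path):
    """Stand-in for the tree query; it only knows /foo/{bar}/dee."""
    dynamic_calls.append(path)
    parts = path.strip("/").split("/")
    return 245 if len(parts) == 3 and parts[0] == "foo" and parts[2] == "dee" else None


print(resolve_bizunit("/foo/dee", static_routes, fake_dynamic_lookup))    # (246, 'static')
print(resolve_bizunit("/foo/1/dee", static_routes, fake_dynamic_lookup))  # (245, 'dynamic')
print(dynamic_calls)  # ['/foo/1/dee']
```

The static request never touches the dynamic lookup, so routes without variables keep their old cost.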
So what about $dynamicBizUnitProxy? What happens in the engine when this is returned? For example, since /foo/1/dee actually substitutes 1 for {bar} in /foo/{bar}/dee, there must be an extraction step implemented somewhere to find out that bar=1, right? That's right. So what regular expression does that extraction use? It turns out no regular expression was needed at all.
Let's look at one more piece of actual SyncTree engine source. (This section is shown completely unmodified, since it poses no security issue.)
1. The path of the user request is split into segments.
2. The path of the proxy obtained by the query we saw earlier is split into segments.
3. The number of segments from 1 and 2 is presumed to be the same. If they differ, an error is raised with an appropriate description.
4. Each segment from 2 is examined from the beginning. If the x-th segment contains {, what remains after removing { and } becomes the variable name, and the x-th segment from 1 becomes the value of that variable.
5. The list of variable name and value pairs from 4 is returned. End.
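The five steps above can be sketched as follows. This is illustrative Python rather than the engine's PHP source. The function name extract_path_variables is made up, and dropping empty segments when splitting is my assumption about how the path is segmented.

```python
def extract_path_variables(url_path, proxy_base_path):
    """Pair each '{name}' segment of the Base Path with the request's segment."""
    # Steps 1 and 2: split both paths into segments (empty segments are dropped).
    url_segments = [s for s in url_path.split("/") if s]
    proxy_segments = [s for s in proxy_base_path.split("/") if s]

    # Step 3: the counts are expected to match; fail loudly if they do not.
    if len(url_segments) != len(proxy_segments):
        raise ValueError(
            f"segment count mismatch: request has {len(url_segments)}, "
            f"Base Path has {len(proxy_segments)}"
        )

    # Step 4: a segment containing '{' is a variable; strip the braces for its name.
    variables = {}
    for proxy_seg, url_seg in zip(proxy_segments, url_segments):
        if "{" in proxy_seg:
            name = proxy_seg.replace("{", "").replace("}", "")
            variables[name] = url_seg

    # Step 5: return the name-to-value mapping.
    return variables


print(extract_path_variables("/foo/1/dee", "/foo/{bar}/dee"))  # {'bar': '1'}
print(extract_path_variables("/foo/dee", "/foo/dee"))          # {}
```

Because each position is compared directly, plain string splitting is enough and no regular expression is involved.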
Because this section relies on the following premises, it can get away with handling only a minimal set of exceptions.
1. When a BizUnit is saved in STUDIO, the braces in its Base Path are forced to be balanced.
→ If braces are used, both the opening and the closing brace are always present.
→ To check whether braces were used, you only need to check for one of them.
2. In the proxy_path search query, the depth value is the number of $urlSegments.
→ The number of $proxySegments of the proxy_path found by this query must equal the number of $urlSegments.
→ Except for some extreme cases, it is reasonable to assume that the two line up one to one.
One example of such an 'extreme case' is GET /foo//dee. See the two slashes? A value belongs where the variable is, but if the caller skips it, this is what happens. The error handling appears to be a safeguard for exactly this situation.
What I learned
I would like to conclude this retrospective by writing what I felt through the above process.
When setting rules, try to set them 'for users' rather than for 'code', 'developers', or 'system convenience'.
A typical 'rule for system convenience' is 'passwords may be at most 16 characters long'. Is there anything wrong with a password longer than 16 characters? No. Longer passwords are better for security. A 'passphrase' that is easy for the user to remember is better than a 16-character password stuffed with all kinds of special characters.
So if the limit is bad for security and inconvenient for users, why cap the maximum length?
It is for the convenience of developers. When passwords are hashed one-way and stored, adding an 'unlimited length' condition makes the task a little more cumbersome. The specification is convenient for developers, but the inconvenience is very likely to be passed on to users.
I almost made that kind of rule this time, because at first I simply planned to hand the whole problem to the fast-route library. Had I done that, my life would have been easier. However, the library comes with strict conditions, such as avoiding 'shadowing' and respecting the order in which BizUnits are defined, registered, and interpreted. SyncTree, which aims to be a 'Composable Platform', cannot force that complexity and inconvenience on its users. So although it was somewhat harder from the developer's point of view, I researched, studied hard, and built it. Along the way, I learned this lesson.
'Keep in mind how far customers and users can understand which rules and restrictions are needed, and why! Try to build a product that meets that level!'
Technically, some things may be impossible or hard to implement. Conversely, some answers may look almost too perfect. But I don't think you should force either on end users. 🤷
What about your web applications and services? Is any work being done for the understanding and convenience of the development team rather than of the customer? I hope this post gives you a chance to think about it.😉