[SyncTree Dynamic URL feature introduction retrospective] Part 1. Routing, How Exactly Do You Know?

July 4, 2023

Hi, this is Bradley, a backend developer at Ntuple! 😀

As of February 2023, SyncTree APIs can now be developed with 'dynamic URLs'!


1. When creating or modifying a BizUnit, you can now enter a path variable such as /{x}/ in the Proxy Setting > Base Path field.

2. A block-coding method was added to read the value assigned to that variable: fetch the request, then use Get Hashmap Variable on the corresponding hashmap path, request > pathVariables > variable name.

3. The result is an API that can handle any value that arrives in the path variable's place, in other words an API that supports "dynamic URLs".

It seems like a straightforward feature, right? But modifying the engine and SyncTree STUDIO to implement it was neither easy nor straightforward. Making URLs dynamic amounts to "routing", and since SyncTree STUDIO had no routing in the first place, I had to implement it from the ground up.

In today's post, I'm going to share what I've learned about this 'routing' topic, in the form of a 'theory fail' note, so that other developers reading this won't have to go in circles like I did while studying and implementing routing.

Actually, "path variables" are a mystery

Let's start with a simple quiz. A user has sent the following HTTP request to your application:


GET https://domain.tld/a/b/c


Where should your application route this request? Choose from the options below!

1. GET /{x}/b/c

2. GET /a/{y}/c

3. GET /a/b/{z}

4. GET /{a}/{b}/c

5. GET /{x}/{y}/{z}

If you're thinking, "Aren't they all right?" or "There's no right answer?", then you're right.

I'm sure you know the reason why, but let me spell it out. The request could have been made by putting "a" in place of {x} in the first definition, or by putting "b" in place of {y} in the second, and so on, so all five are possible answers. Conversely, if you say, "Oh, this must be routed to definition N," that's definitely a wrong answer.

The lesson of this case? Routing with path variables is a mystery in itself: there is no general solution for deciding which route definition should be paired with a given request URL/URI, and if you're unlucky, deciding is impossible.
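The ambiguity in the quiz can be checked with a few lines of Python. This is only a minimal sketch: the `matches` helper is a hypothetical name, it compares templates and paths one segment at a time, and it ignores the HTTP method.

```python
def matches(template: str, path: str) -> bool:
    """Return True if `path` fits `template`, comparing one segment at a time."""
    t_segs = template.strip("/").split("/")
    p_segs = path.strip("/").split("/")
    if len(t_segs) != len(p_segs):
        return False
    for t, p in zip(t_segs, p_segs):
        is_variable = t.startswith("{") and t.endswith("}")
        if is_variable:
            if p == "":          # a variable must capture something
                return False
        elif t != p:             # a constant segment must match exactly
            return False
    return True


# The five definitions from the quiz.
DEFINITIONS = ["/{x}/b/c", "/a/{y}/c", "/a/b/{z}", "/{a}/{b}/c", "/{x}/{y}/{z}"]

candidates = [d for d in DEFINITIONS if matches(d, "/a/b/c")]
print(candidates)  # every one of the five definitions is a candidate
```

Matching alone cannot pick a winner here; something else has to break the tie.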

Here's another fun fact that illustrates the "no general solution" point: the terminology around "route variables" isn't uniform. Different web frameworks refer to this concept in subtly different ways, as in the examples below.


1. PHP Slim: Route Placeholder

2. PHP Laravel: Route Parameters

3. Python Flask: URL Variables

4. Python FastAPI: Path "Parameters" or "Variables"

5. Java Spring: Path Variable (@PathVariable)

6. NodeJS Express: (doesn't have a separate path variable concept, it's integrated into the concept of "route path") 🤯


So there is a common behavior/feature that everyone vaguely agrees "exists", yet nobody agrees on what to call it.

Web standards only have 'paths', not 'routing'

But if you look closely, you'll notice that even the first word of the term differs: some people call it a "route" and others a "path". Why isn't even this unified? Is there a standard for it? Surprisingly, no: there is no standard for the "route" of a web address.

"Web addresses" are a type of URI ("Uniform Resource Identifier"), and the standard for them is documented in RFC 3986. But have you ever noticed that the English word "route" doesn't appear in that document? Sure, URLs are mentioned, and paths are mentioned, but routing is nowhere to be found.

If you think about it a bit, it makes sense: the URI convention is meant to be universally applicable to many protocols beyond HTTP, so it makes sense to minimize the requirements for "identifying the location of a resource." That's why it only defines a path as "the path to where a resource is located." It would have been even more confusing if the standard had added a rule that said, "If the protocol is this and the path is that, then you can do whatever you want with it," etc.
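To make this concrete, here is a small sketch using Python's standard `urllib.parse.urlsplit` on the URL from the quiz. It illustrates that generic URI parsing yields components such as scheme, authority, and path, and nothing that says which handler should receive the request.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://domain.tld/a/b/c")

print(parts.scheme)   # https
print(parts.netloc)   # domain.tld
print(parts.path)     # /a/b/c

# The path is just a sequence of segments; what each segment "means"
# is left entirely to whoever serves the request.
segments = parts.path.strip("/").split("/")
print(segments)       # ['a', 'b', 'c']
```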

Compare two URIs with the same host and a similar path, one using the HTTP scheme and one using the FTP scheme. With the first, I have no idea what I'm going to get, but with the second, I know exactly what I'm going to get! It's going to be the file a/b/c in the user folder named x on ntuple.com's host machine, right? Because that's a URI for the File Transfer Protocol.

And actually, it was the same for HTTP in the beginning. Like FTP, on the old Internet, a "web address" was often a path to a file that actually existed on that server. This isn't necessarily the case today, and it's probably more accurate to say that the concept of "routing" was invented or named along the way.

Routing is an artificial rule that deals with virtual paths

Today's HTTP services have grown in demand, responsibility, and complexity: few websites can afford to serve just GET /parents.htm, and most need to serve something like GET /a/b/c, and they don't want to go through the trouble of creating folders a, b, and c to do it.

This is where web services shift from serving physical folders/files to configuring "virtual hosts" (as they're often called). We now have very clever applications that can make files/folders that don't actually exist look like they do. For example, if you have a web service consisting of Apache and PHP, you might be using a rewrite setting that hands requests over to a single entry script.

In a nutshell, such a setting will, for the most part, prefix the path of your request with /index.php and reprocess it. For example, if you request a.com/about/parents, this will be treated as a request for a.com/index.php/about/parents (unless you actually have an about/ folder and a file called parents in it, hence the "for the most part").
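The fallback rule described above can be sketched as a small Python function. This models only the behaviour as the paragraph explains it, not Apache's actual rewrite engine; the `resolve` function, the temporary document root, and the `logo.png` file are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def resolve(doc_root: Path, request_path: str) -> str:
    """Serve the path as-is if it names a real file; otherwise hand it to index.php."""
    candidate = doc_root / request_path.lstrip("/")
    if candidate.is_file():
        return request_path                 # a physical file exists: no rewriting
    return "/index.php" + request_path      # virtual path: let the application decide


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "logo.png").write_bytes(b"")    # the only real file under the document root

    real = resolve(root, "/logo.png")
    virtual = resolve(root, "/about/parents")

print(real)     # /logo.png
print(virtual)  # /index.php/about/parents
```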

The path in an HTTP request is, in principle, a literal pointing to a real resource on a real path that exists. The URI convention itself specifies just that, and the early web was just that simple. But as web services became more sophisticated, instead of literally reading the PATH of a request and physically processing it, it became necessary to assign a virtual meaning to the path and interpret and process it arbitrarily. And it became possible! The result is the concept of "routing," which has become a staple of today's web services.

Now I think we can answer the initial question: if a web framework prefers to use the term route, routing, then the people who designed it are probably more focused on the development requirements and business objectives of "giving the user directions". A framework that chooses the term path is probably more focused on the technical rigor, history, and general principles of how it works.

Routing is up to the application

To summarize, when a web service "routes", it means that it reads the entire PATH of incoming requests and processes them as it pleases. At this point, let's review the quiz from the beginning. Obviously, we can't say for sure which of the five route definitions is the correct answer based on the initial conditions we were given. But if this request came into an application that had only two route definitions, of which only number 2 matches, then the answer is definitely number 2. Because that's what this application has decided to do.

And that's just for this application: it can't handle GET /a/b/z. (You'll probably get a "404 Not Found" because you didn't create the /a/b folder and the z file.)
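A two-route application of this kind might look like the sketch below. The two definitions are assumptions chosen for illustration: `/d/e/f` is a made-up first route, and `/a/{y}/c` is option 2 from the quiz placed second. The helper names are hypothetical as well.

```python
from typing import Optional

# Assumed route table: (name, template), checked in this order.
ROUTES = [
    ("route-1", "/d/e/f"),      # hypothetical route that never fits /a/b/c
    ("route-2", "/a/{y}/c"),    # option 2 from the quiz
]


def matches(template: str, path: str) -> bool:
    t_segs = template.strip("/").split("/")
    p_segs = path.strip("/").split("/")
    if len(t_segs) != len(p_segs):
        return False
    return all(
        (t.startswith("{") and t.endswith("}")) or t == p
        for t, p in zip(t_segs, p_segs)
    )


def route(path: str) -> Optional[str]:
    """Return the name of the first matching route, or None."""
    for name, template in ROUTES:
        if matches(template, path):
            return name
    return None


def respond(path: str) -> str:
    name = route(path)
    return "404 Not Found" if name is None else f"200 OK ({name})"


print(respond("/a/b/c"))  # 200 OK (route-2)
print(respond("/a/b/z"))  # 404 Not Found
```

The answer is unambiguous only because this particular application registered nothing else.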

So is this the best we can do? No, you can still create an application that has all five of the above route definitions and still works fine! For example, if it's a PHP web application, you can use the nikic/fast-route library, registering the five definitions in order and dispatching requests against them.

Does anyone else find that the result for $dispatch1 makes sense, but the result for $dispatch2 doesn't? Obviously, definitions 2 through 5 could equally be the answer, so on what basis does fast-route choose 1? That's a good question. The library decided on the following two things as its own feature specification:

1. The "duplicate" issue: when routes are compared in "segment" order, any two definitions with the same arrangement of variables and constants are considered duplicates. (The names of the route variables are ignored.)

2. The "shadowing" issue: if you want to register a route definition with no route variables, you must make sure that no previously registered route definition "shadows" it.

Number 1 is simple: two definitions that differ only in the names of their variables are duplicates.
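The duplicate rule and the "first registered definition wins" behaviour can be modelled in Python as below. This is a simplified sketch of the behaviour described in this post, not fast-route's real code or API: the `Router` class, `DuplicateRoute` exception, and `shape` helper are hypothetical, and the HTTP method is ignored. The names get1 to get5 stand for the five quiz definitions.

```python
from typing import List, Optional, Tuple

VAR = object()  # placeholder standing for "any variable", whatever its name


class DuplicateRoute(Exception):
    pass


def shape(template: str) -> Tuple[object, ...]:
    """Describe a template segment by segment, ignoring variable names."""
    return tuple(
        VAR if seg.startswith("{") and seg.endswith("}") else seg
        for seg in template.strip("/").split("/")
    )


class Router:
    def __init__(self) -> None:
        self._routes: List[Tuple[Tuple[object, ...], str]] = []

    def add(self, name: str, template: str) -> None:
        new_shape = shape(template)
        if any(existing == new_shape for existing, _ in self._routes):
            raise DuplicateRoute(template)   # same variables/constants layout
        self._routes.append((new_shape, name))

    def dispatch(self, path: str) -> Optional[str]:
        segs = path.strip("/").split("/")
        for route_shape, name in self._routes:   # registration order
            if len(route_shape) == len(segs) and all(
                t is VAR or t == p for t, p in zip(route_shape, segs)
            ):
                return name
        return None


router = Router()
for i, tpl in enumerate(
    ["/{x}/b/c", "/a/{y}/c", "/a/b/{z}", "/{a}/{b}/c", "/{x}/{y}/{z}"], start=1
):
    router.add(f"get{i}", tpl)

print(router.dispatch("/a/b/c"))  # get1
print(router.dispatch("/a/b/z"))  # get3
```

Registering `/{other}/b/c` afterwards raises `DuplicateRoute` in this model, because it has the same shape as the first definition.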

Routing implementations are mostly about 'order of definition'

fast-route checks from the very first registered route definition, so registering route-p, PUT /w/{a}/{b}/z, before route-q, PUT /w/x/y/z, will not work when using fast-route.

In this state, if a user requests PUT /w/x/y/z, the checking procedure for that request never goes to route-q and always ends up at route-p, because it interprets route-p as having "x" in place of a and "y" in place of b. Fast-route prevents this by saying that route-q is "in the shadow" of route-p and throwing an error when it tries to register it.

This is why our example code interpreted GET /a/b/c as GET /{x}/b/c. fast-route checked the first route defined, found that GET /a/b/c is the same as GET /{x}/b/c with "a" in place of x, and stopped there: every later route definition was obscured, so the request was resolved to the get1 route.

So if I'm implementing routing with fast-route, can't I register two routes like these and use both? You can, and it's really easy to do: you just need to reverse the order and register route-q before route-p. Now route-q is no longer in the shadow of route-p, so we can expect a request to match route-q only when it is exactly PUT /w/x/y/z, while every other combination falls through to route-p.
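The shadowing rule and the effect of reversing the order can be modelled as follows. Again this is a hedged sketch rather than fast-route itself: `Router` and `ShadowedRoute` are hypothetical names, the HTTP method is ignored, and the templates for route-p (`/w/{a}/{b}/z`) and route-q (`/w/x/y/z`) are inferred from the description above.

```python
from typing import List, Optional, Tuple


class ShadowedRoute(Exception):
    pass


def segments(path: str) -> List[str]:
    return path.strip("/").split("/")


def is_var(seg: str) -> bool:
    return seg.startswith("{") and seg.endswith("}")


def fits(template: str, path: str) -> bool:
    t, p = segments(template), segments(path)
    return len(t) == len(p) and all(is_var(a) or a == b for a, b in zip(t, p))


class Router:
    def __init__(self) -> None:
        self.routes: List[Tuple[str, str]] = []

    def add(self, name: str, template: str) -> None:
        is_static = not any(is_var(s) for s in segments(template))
        if is_static:
            # A fixed route is useless if an earlier route already catches its path.
            for earlier_name, earlier in self.routes:
                if fits(earlier, template):
                    raise ShadowedRoute(f"{name} is shadowed by {earlier_name}")
        self.routes.append((name, template))

    def dispatch(self, path: str) -> Optional[str]:
        for name, template in self.routes:   # first registered match wins
            if fits(template, path):
                return name
        return None


# Variable route first: registering the fixed route is rejected.
bad = Router()
bad.add("route-p", "/w/{a}/{b}/z")
try:
    bad.add("route-q", "/w/x/y/z")
    error = None
except ShadowedRoute as exc:
    error = str(exc)
print(error)  # route-q is shadowed by route-p

# Fixed route first: both routes can coexist.
good = Router()
good.add("route-q", "/w/x/y/z")
good.add("route-p", "/w/{a}/{b}/z")
print(good.dispatch("/w/x/y/z"))  # route-q
print(good.dispatch("/w/1/2/z"))  # route-p
```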

This rule generally applies not only to PHP fast-route but also to other routing implementations such as Python FastAPI. FastAPI even has a whole sub-section of its manual dedicated to this issue, titled "Order matters".

Why do most routing implementations make the order of route definitions significant? I think it's because they want to allow as many route definitions as possible. If you want to support routes that put a fixed value in a variable's place, and also routes that leave that place as a variable, you need some sort of rule that says, "Just make sure they're in the right order".

These "why can't we have both?" requirements are pretty common in the real world. For example:

GET /user/{user name}/posts → View other people's posts (list of posts by a specific user)
GET /user/my/posts → View my posts (a list of posts by the currently logged-in user)

A professional web application would support both of these features, and it's possible. If we could check the view my posts route before the view others route, we could show "view my posts" for URLs that look like they have "my" in place of {user name}. This is the kind of arbitrary rule we've seen so far, and it's something that SyncTree should be able to do as well.
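A minimal sketch of that ordering, which also collects the captured values into a dictionary, is shown below. The route names and the `match`/`dispatch` helpers are hypothetical, `{user name}` is written as `{user_name}` for convenience, and the returned dictionary is only loosely analogous to a hashmap of path variables; it is not SyncTree's implementation.

```python
from typing import Dict, Optional, Tuple

# The fixed route is listed first so that "my" is not swallowed by the variable.
ROUTES = [
    ("view_my_posts", "/user/my/posts"),
    ("view_user_posts", "/user/{user_name}/posts"),
]


def match(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the captured path variables, or None if the path does not fit."""
    t_segs = template.strip("/").split("/")
    p_segs = path.strip("/").split("/")
    if len(t_segs) != len(p_segs):
        return None
    variables: Dict[str, str] = {}
    for t, p in zip(t_segs, p_segs):
        if t.startswith("{") and t.endswith("}"):
            variables[t[1:-1]] = p
        elif t != p:
            return None
    return variables


def dispatch(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    for name, template in ROUTES:
        variables = match(template, path)
        if variables is not None:      # an empty dict is still a successful match
            return name, variables
    return None


print(dispatch("/user/my/posts"))       # ('view_my_posts', {})
print(dispatch("/user/bradley/posts"))  # ('view_user_posts', {'user_name': 'bradley'})
```

Swapping the two entries in `ROUTES` would send `/user/my/posts` to `view_user_posts` with `user_name` set to `my`, which is exactly the situation the ordering rule is meant to avoid.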

So SyncTree decided to support this too! And with that decision came a bumpy ride...

Stay tuned for Part 2, 'The Answer and Commentary'.
