Recently I equipped my nginx combined upstreams module with a new entity that I named an upstrand. An upstrand is an nginx configuration block that can be defined in the http clause after upstream blocks. It is designed to combine upstreams into super-layers in which all upstreams keep their identity and do not get flattened down to separate servers. This can be useful in many areas. Let me rephrase an example from the module documentation here.
Imagine a geographically distributed system of backends that deliver tasks to an executor. Let the executor be implemented as a poller separated from the backends by an nginx router that proxies requests from the poller to an arbitrary upstream from a list. These upstreams (i.e. the geographical parts) may contain multiple servers, and let them send HTTP status 204 when they do not have any tasks at the moment. If an upstream has 10 servers and the first server sends 204 No tasks upon receiving a request from the poller, then the other 9 servers will presumably send the same status within a short time interval. The nginx upstreams are smart enough to skip checking all servers when the first server returns status 204: the status is sent back to the poller, which must then decide what to do next.
This scheme has several shortcomings. Remember the words arbitrary upstream that I intentionally emphasized in the previous paragraph? Nginx cannot choose arbitrary upstreams! It can combine several upstreams into a bigger one, but in this case the poller may sequentially receive up to 10 204 responses from the same geographical part while trying to get the next task. The exact number of repeated requests (10 or fewer) depends on how the bigger upstream combines the separate servers from the geographical upstreams. Alternatively, the poller may know the topology of the backends and send requests to specific upstreams. I do not even want to explain how bad this is. And again, it will have to send new requests when the upstreams it has already polled have no tasks.
What if we encapsulated all the polling logic inside the proxy? In this case the proxy router would itself initiate a new request to another upstream whenever the previous server responded with 204 No tasks. It would eventually send back to the poller either a new task or a final 204 status when no upstream had tasks. A single request from the poller, and no need for any knowledge about the backends topology downstream of the router!
This is a good example of what the new upstrand entity can accomplish. Let’s look at a simple nginx configuration that involves upstrands.
worker_processes 1;

error_log /var/log/nginx/error.log info;

events {
    worker_connections 1024;
}

http {
    upstream u01 {
        server localhost:8020;
    }
    upstream u02 {
        server localhost:8030;
    }
    upstream b01 {
        server localhost:8040;
    }
    upstream b02 {
        server localhost:8050;
    }

    upstrand us1 {
        upstream ~^u0;
        upstream b01 backup;
        order start_random;
        next_upstream_statuses 204 5xx;
    }
    upstrand us2 {
        upstream ~^u0;
        upstream b02 backup;
        order start_random;
        next_upstream_statuses 5xx;
    }

    server {
        listen 8010;
        server_name main;

        location /us1 {
            proxy_pass http://$upstrand_us1;
        }
        location /us2 {
            rewrite ^ /index.html last;
        }
        location /index.html {
            proxy_pass http://$upstrand_us2;
        }
    }

    server {
        listen 8020;
        server_name server01;

        location / {
            return 503;
        }
    }

    server {
        listen 8030;
        server_name server02;

        location / {
            return 503;
        }
    }

    server {
        listen 8040;
        server_name server03;

        location / {
            echo "In 8040";
        }
    }

    server {
        listen 8050;
        server_name server04;

        location / {
            proxy_pass http://rsssf.com/;
        }
    }
}
Upstrand us1 imports the two upstreams u01 and u02 using a regular expression (the upstream directive regards names that start with a tilde as regular expressions) and the backup upstream b01. While searching through the upstreams for a response status that may be sent to the client, the upstrand performs two cycles: normal and backup. The backup cycle starts only when all upstreams in the normal cycle have responded with unacceptable statuses. Directive order start_random indicates that the first upstream to try after the worker has started will be chosen randomly; the following upstreams are taken in round-robin order. Directive next_upstream_statuses lists the HTTP response statuses that signal the router to postpone the response and send the request to the next upstream; it also accepts the 4xx and 5xx status class notation. If all servers in both the normal and the backup cycles respond with unacceptable statuses (like 204 No tasks in the example above), the last response is sent to the client.
Upstrand us2 does not differ much: it imports the same u01 and u02 upstreams in the normal cycle and b02 in the backup cycle. Each upstream in this configuration has a single server. The servers from upstreams u01 and u02 simply return status 503, the server from b01 says In 8040, and the server from b02 proxies the request to the good old site rsssf.com (which is a great place for soccer stats!). I chose it because it does not redirect to https and sends back a huge response: a nice way to test that buffering does not break the response while it passes through the proxy router and the filters of the combined upstreams module.
Let’s see how it works. Start nginx with the above configuration and a sniffer on the loopback interface (preferably in another terminal).
nginx -c /path/to/our/nginx.conf
ngrep -W byline -d lo '' tcp
Run a client for location /us1.
curl 'http://localhost:8010/us1'
In 8040
Nice. We reached the backup upstream b01. However, this output shows too little information, which is why we ran the sniffer. Here is its output.
######
T 127.0.0.1:44070 -> 127.0.0.1:8010 [AP]
GET /us1 HTTP/1.1.
User-Agent: curl/7.40.0.
Host: localhost:8010.
Accept: */*.
.
#####
T 127.0.0.1:37341 -> 127.0.0.1:8030 [AP]
GET /us1 HTTP/1.0.
Host: u02.
Connection: close.
User-Agent: curl/7.40.0.
Accept: */*.
.
##
T 127.0.0.1:8030 -> 127.0.0.1:37341 [AP]
HTTP/1.1 503 Service Temporarily Unavailable.
Server: nginx/1.8.0.
Date: Fri, 18 Sep 2015 12:58:01 GMT.
Content-Type: text/html.
Content-Length: 212.
Connection: close.
.
<html>.
<head><title>503 Service Temporarily Unavailable</title></head>.
<body bgcolor="white">.
<center><h1>503 Service Temporarily Unavailable</h1></center>.
<hr><center>nginx/1.8.0</center>.
</body>.
</html>.
########
T 127.0.0.1:50128 -> 127.0.0.1:8020 [AP]
GET /us1 HTTP/1.0.
Host: u01.
Connection: close.
User-Agent: curl/7.40.0.
Accept: */*.
.
##
T 127.0.0.1:8020 -> 127.0.0.1:50128 [AP]
HTTP/1.1 503 Service Temporarily Unavailable.
Server: nginx/1.8.0.
Date: Fri, 18 Sep 2015 12:58:01 GMT.
Content-Type: text/html.
Content-Length: 212.
Connection: close.
.
<html>.
<head><title>503 Service Temporarily Unavailable</title></head>.
<body bgcolor="white">.
<center><h1>503 Service Temporarily Unavailable</h1></center>.
<hr><center>nginx/1.8.0</center>.
</body>.
</html>.
########
T 127.0.0.1:55270 -> 127.0.0.1:8040 [AP]
GET /us1 HTTP/1.0.
Host: b01.
Connection: close.
User-Agent: curl/7.40.0.
Accept: */*.
.
##
T 127.0.0.1:8040 -> 127.0.0.1:55270 [AP]
HTTP/1.1 200 OK.
Server: nginx/1.8.0.
Date: Fri, 18 Sep 2015 12:58:01 GMT.
Content-Type: text/plain.
Connection: close.
.
In 8040
#####
T 127.0.0.1:8010 -> 127.0.0.1:44070 [AP]
HTTP/1.1 200 OK.
Server: nginx/1.8.0.
Date: Fri, 18 Sep 2015 12:58:01 GMT.
Content-Type: text/plain.
Transfer-Encoding: chunked.
Connection: keep-alive.
.
8.
In 8040
.
0.
.
The client sent the request to port 8010 (our router’s frontend). Then the router started the normal cycle: it proxied the request to upstream u02 (the 8030 server), which responded with the unacceptable status 503; after that the router tried the next upstream from the normal cycle, u01, with its only server on port 8020, and received a 503 response again. Finally, the router started the backup cycle and received the response In 8040 with HTTP status 200 from the 8040 server that belongs to upstream b01. This response was sent back to the client and shown on the terminal. I won’t show the results of testing location /us2: the response is very large while the scenario is essentially the same.
Searching through the sniffer output is boring. Let’s instead show on the terminal which upstreams we visit while waiting for the response. Adding a simple location
location /echo/us1 {
    echo $upstrand_us1;
}
into the frontend 8010 server will make our testing easier. (Beware that older versions of the echo module compiled for nginx-1.8.0 behave badly in the next tests! I used the latest version, v0.58, and it worked fine.)
Restart nginx and run some curls again.
curl 'http://localhost:8010/echo/us1'
u02
curl 'http://localhost:8010/echo/us1'
u01
curl 'http://localhost:8010/echo/us1'
u02
curl 'http://localhost:8010/echo/us1'
u01
curl 'http://localhost:8010/echo/us1'
u02
Hmm. The upstreams from the normal cycle follow each other in round-robin manner. Upstrand us1 must return the response from the first upstream in the normal cycle whose status is not listed in next_upstream_statuses. Directive echo normally returns 200, and this status is not present in the list. Thus the names shown on the terminal correspond to the first upstreams of the normal cycle: they start randomly and proceed in round-robin order. Let’s now show the last upstream chosen during the two cycles. Since the servers from upstreams u01 and u02 in the normal cycle always return the unacceptable status 503, we must expect that upstream b01 from the backup cycle will always be the last. To check this, let’s add status 200, which is normally returned by echo, to the list in directive next_upstream_statuses of upstrand us1.
upstrand us1 {
    upstream ~^u0;
    upstream b01 backup;
    order start_random;
    next_upstream_statuses 200 204 5xx;
}
Run curls.
curl 'http://localhost:8010/echo/us1'
b01
curl 'http://localhost:8010/echo/us1'
b01
curl 'http://localhost:8010/echo/us1'
b01
Nice, as expected. Now let’s show all the upstreams the router tries before sending the final response. For this task there is another upstrand block directive, debug_intermediate_stages (do not use it for anything besides testing, because it is not designed for normal usage in upstrands).
upstrand us1 {
    upstream ~^u0;
    upstream b01 backup;
    order start_random;
    next_upstream_statuses 200 204 5xx;
    debug_intermediate_stages;
}
Restart nginx and run curls again.
curl 'http://localhost:8010/echo/us1'
b01
u02
u01
curl 'http://localhost:8010/echo/us1'
b01
u01
u02
curl 'http://localhost:8010/echo/us1'
b01
u02
u01
curl 'http://localhost:8010/echo/us1'
b01
u01
u02
Looks interesting. The last upstream is shown first and the first upstream is shown last. Do not worry, this is an artefact of the upstrand implementation.
And now I want to say a little bit about the implementation. All standard and third-party directives like proxy_pass and echo that magically traverse the upstreams of an upstrand refer to a magic variable whose name starts with upstrand_ and ends with the upstrand’s name. Such magic variables are created automatically in the configuration handler ngx_http_upstrand_block() for each declared upstrand. The handler of these variables, ngx_http_upstrand_variable(), creates the combined upstreams module’s request context with the indices of the next upstreams in both the normal and the backup cycles imported from the module’s main configuration, and shifts the main configuration’s indices forward for future requests.
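To make this index bookkeeping more concrete, here is a small self-contained C sketch. It is not the module’s actual code: all names and types in it are made up for illustration, and the real handler of course works with nginx’s configuration and request structures. It merely mimics how the shared round-robin cursors could be captured into a per-request context and shifted forward on every evaluation of the upstrand variable.

/* A toy model of the described index bookkeeping.  All names here are
 * hypothetical and only illustrate the idea: the shared "main conf" keeps
 * round-robin cursors, and each evaluation of the upstrand variable copies
 * them into a per-request context and advances the shared cursors for
 * future requests. */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static const char *normal_cycle[] = { "u01", "u02" };   /* as in upstrand us1 */
static const char *backup_cycle[] = { "b01" };

#define NNORMAL (sizeof(normal_cycle) / sizeof(normal_cycle[0]))
#define NBACKUP (sizeof(backup_cycle) / sizeof(backup_cycle[0]))

typedef struct {        /* stands in for the module's main configuration */
    unsigned  cur_normal;
    unsigned  cur_backup;
} main_conf_t;

typedef struct {        /* stands in for the per-request context */
    unsigned  start_normal;
    unsigned  start_backup;
} request_ctx_t;

/* what the variable handler conceptually does on its first access
 * within a request */
static request_ctx_t
upstrand_variable_eval(main_conf_t *mcf)
{
    request_ctx_t  ctx = { mcf->cur_normal, mcf->cur_backup };

    /* shift the shared cursors round-robin for future requests */
    mcf->cur_normal = (mcf->cur_normal + 1) % NNORMAL;
    mcf->cur_backup = (mcf->cur_backup + 1) % NBACKUP;

    return ctx;
}

int
main(void)
{
    main_conf_t  mcf;

    srand((unsigned) time(NULL));

    /* order start_random: the first upstream is chosen randomly once,
     * after the worker has started */
    mcf.cur_normal = (unsigned) rand() % NNORMAL;
    mcf.cur_backup = 0;

    for (int request = 1; request <= 5; request++) {
        request_ctx_t  ctx = upstrand_variable_eval(&mcf);

        printf("request %d starts at %s (backup cycle would start at %s)\n",
               request, normal_cycle[ctx.start_normal],
               backup_cycle[ctx.start_backup]);
    }

    return 0;
}

Its output alternates between u01 and u02 starting from a random one, which is exactly the pattern we saw in the /echo/us1 test above.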
The module installs response header and body filters. The header filter checks whether the request context exists (i.e. an upstrand variable was accessed) and, if so, it may start up a subrequest: this depends on the response status of the main request, or of the previous subrequest if subrequests have already been launched. In nginx a subrequest runs the location’s content handler again. Remember that both the proxy_pass and the echo directives referred to our magic variable? This means that their content handlers access the variable’s handler on every subrequest because the upstrand variables have the nocacheable attribute. The variable’s handler feeds the subrequests with the shifted index of the next upstream in the upstrand after checking whether to move to the backup cycle, or finishes the subrequests when the upstreams in both cycles have been exhausted. In the latter case the final response is extracted from the subrequests’ response headers and body buffer chains and returned to the client. Want to know how?
When subrequests have been launched, the original headers of the main request are replaced by the headers of the last subrequest. Thanks to running the subrequests from the header filter, the response bodies nest in reverse order (which explains why our last test listed the visited upstreams from last to first). This makes it possible to stop feeding the next body filters with the older response bodies that follow the last response body, and thus to return only the body of the last subrequest. The beginning of the older response bodies is recognized by the last buffer in the last response’s buffer chain. Replacing the main request’s output headers and the response body buffer chain with those from the last response does the trick.
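The following toy program (again just my illustration, not the module’s code and not nginx’s real buffer chains) shows why cutting the chain at the last buffer of the last response is enough: with the bodies nested in reverse order, everything after that buffer belongs to the older, discarded responses.

/* A toy model of the body trick described above.  The "chain" mimics how
 * the response bodies end up nested in reverse order: the body of the last
 * subrequest comes first and is terminated by a "last buffer" flag, with
 * the older response bodies following it. */

#include <stdio.h>
#include <stddef.h>

typedef struct chain_link_s  chain_link_t;

struct chain_link_s {
    const char    *data;      /* the buffer's payload */
    int            last_buf;  /* set on the final buffer of a response body */
    chain_link_t  *next;
};

/* pass buffers downstream only up to the last response's final buffer */
static void
send_last_response_only(const chain_link_t *chain)
{
    for (const chain_link_t *cl = chain; cl != NULL; cl = cl->next) {
        fputs(cl->data, stdout);
        if (cl->last_buf) {
            break;            /* what follows are the older bodies: drop them */
        }
    }
}

int
main(void)
{
    /* bodies nested in reverse order: b01 (the last subrequest) goes first */
    chain_link_t  u01 = { "<html>503 from u01</html>\n", 1, NULL };
    chain_link_t  u02 = { "<html>503 from u02</html>\n", 1, &u01 };
    chain_link_t  b01 = { "In 8040\n",                   1, &u02 };

    send_last_response_only(&b01);    /* prints only "In 8040" */

    return 0;
}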