Connecting to the upstream backends

First, we have to tell Nginx where it can find the upstream backend servers. You declare this in an upstream configuration block. At Firelay we put the upstream configuration in a separate file; in this example that would be /etc/nginx/upstreams.d/liferay.conf. We include all files in this directory by adding include /etc/nginx/upstreams.d/*.conf; to the “http” block.
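
As a minimal sketch (the exact layout of nginx.conf differs per distribution), that include could sit in the http block like this:

http {

    # ... other http-level settings ...

    # Pull in every upstream definition placed in upstreams.d
    include /etc/nginx/upstreams.d/*.conf;

}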

This is the upstream configuration file in our example:

upstream liferay_upstream {

    server 10.0.0.3:8080 fail_timeout=3m weight=2000000000;
    server 10.0.0.4:8080 fail_timeout=3m weight=1;

    ip_hash;

}

In Nginx, an upstream block starts with the keyword upstream, after which you give it a unique name. In our case we named it liferay_upstream.

Individual backends are declared with the server keyword followed by, at the very least, an address. In our case this is an IP address plus port number on which the Tomcat servers listen for connections, but if you are connecting to a locally installed Tomcat you could also use a UNIX socket as the address.

An upstream server declaration accepts many parameters. In this case we picked a fail_timeout of 3 minutes, which means that once a backend is found to be unavailable, Nginx will not send it any requests for the next 3 minutes.

Backup parameter effects

The weight parameter of the server keyword gives a weighted preference to particular backends. Here we maximize the likelihood that backend 10.0.0.3 will be used for normal requests, effectively using the weight to configure failover behaviour. Alternatively, you could skip the relative weight and instead mark one backend with the backup parameter, indicating that it should only be used when all other backends are unavailable.
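
For comparison, a minimal sketch of that backup-based alternative (not the configuration we ended up using) could look like this. Note that Nginx does not allow the backup parameter together with ip_hash, so session affinity would have to be arranged differently:

upstream liferay_upstream {

    server 10.0.0.3:8080 fail_timeout=3m;
    # Only receives traffic once 10.0.0.3 is considered unavailable
    server 10.0.0.4:8080 fail_timeout=3m backup;

}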

In testing, however, we noticed that users would receive an “HTTP 50x error page” while Nginx was switching over to the backup backend, before the content from the backup backend was actually returned. Therefore we don’t use the backup parameter. By using the maximum weight difference, in combination with the proxy_next_upstream setting mentioned later on, we let Nginx resend the request to the backup backend and return that response instead of an error page. This way users merely notice that a few requests are slower when the primary backend fails, instead of receiving error messages.

The ip_hash keyword tells Nginx to look at the visitor’s IP address and try to send requests from the same address to the same backend every time. This setting matters most if you load balance requests equally among the backend servers, and it is crucial when you are not using session replication: if you are logged in on one backend and your next request were sent to the other backend, you would probably not be logged into the Liferay portal on that other Tomcat. An additional benefit of ip_hash is that Liferay’s and Tomcat’s caches are utilized better. (Note: in the setup described above the effect is limited, because under normal circumstances effectively every request goes to the primary backend anyway.)

Adding caching to improve performance

Next up I’ll quickly mention one line of the configuration relating to caching. In the configuration file at /etc/nginx/conf.d/caching.conf we tell Nginx where and how it can store the data it will be caching.

proxy_cache_path /var/cache/nginx/liferay_cache levels=1:2 keys_zone=liferay_cache:10m inactive=60m max_size=256M;

Right after the keyword proxy_cache_path we specify that Nginx can store cached data in the directory /var/cache/nginx/liferay_cache on the local filesystem. The levels parameter defines the subdirectory hierarchy used to store cached files; levels=1:2 means two levels of subdirectories, with one-character and two-character names. A shared memory zone is used to store metadata about the cached requests: with keys_zone we name this zone liferay_cache and give it 10 megabytes of memory. If cached data isn’t requested for 60 minutes it is removed from the cache, as indicated by the inactive parameter. Finally, max_size tells Nginx not to cache more than 256 megabytes of data.
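
We rely on the response headers from the backends to decide how long something may be cached (more on that below). If you instead wanted to force explicit cache lifetimes in Nginx, the proxy_cache_valid directive could be added next to proxy_cache in the location block; a sketch, not part of the setup described here:

location / {
    proxy_cache          liferay_cache;
    # Hypothetical explicit lifetimes: cache OK responses for 10 minutes, 404s for 1 minute
    proxy_cache_valid    200 302 10m;
    proxy_cache_valid    404 1m;
    # ... proxy_pass and the other settings shown further down ...
}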

By letting Nginx cache certain requests, the Liferay application no longer has to produce a response for them; instead, Nginx quickly looks the response up in its cache and returns it directly. For this kind of caching, Nginx is nearly as fast as Varnish. If you have more complex caching requirements, Varnish does offer a lot more flexibility in its caching rules, but with that flexibility comes complexity and, in a sense, another component in your stack that can fail.

Looking at the virtual host

Let’s round things off by looking at the server configuration for the virtual host in question. In this case the configuration could be located in the file /etc/nginx/sites-available/example.firelay.com.conf.

server {
    listen                 443 backlog=4096;
    server_name            example.firelay.com;
    access_log             /opt/www/sites/example.firelay.com/logs/access.log main_timed;

    # See https://mozilla.github.io/server-side-tls/ssl-config-generator/ for appropriate SSL settings
    ssl                    on;
    ssl_certificate        /opt/ssl/example.firelay.com/example.firelay.com.crt;
    ssl_certificate_key    /opt/ssl/example.firelay.com/example.firelay.com.key;
    add_header             Strict-Transport-Security max-age=15768000;

    location / {
        proxy_pass               http://liferay_upstream;
        proxy_set_header         X-Real-IP $remote_addr;
        proxy_set_header         Host $host;
        proxy_set_header         X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header         X-Forwarded-Proto $scheme;
        proxy_read_timeout       180s;
        proxy_connect_timeout    10s;
        proxy_redirect           http:// https://;
        proxy_next_upstream      error timeout invalid_header http_502 http_503 http_504;
        add_header               X-Cached $upstream_cache_status;
        proxy_cache_use_stale    off;
        proxy_cache              liferay_cache;
        gzip_comp_level          3;
        gzip_proxied             any;
        gzip_types               text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
    }
}

server {
    listen         80 backlog=4096;
    server_name    example.firelay.com;

    rewrite        ^ https://$server_name$request_uri? redirect;
}
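
The second server block simply redirects plain HTTP traffic to HTTPS using a rewrite with the redirect flag, which issues a temporary (302) redirect. A commonly used alternative, shown here as a sketch rather than as the configuration we run, is a permanent redirect via return:

server {
    listen         80 backlog=4096;
    server_name    example.firelay.com;

    # Permanently (301) redirect every plain-HTTP request to the HTTPS virtual host
    return         301 https://$server_name$request_uri;
}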

The important part of this configuration is the location / block. By referencing the name liferay_upstream in the proxy_pass directive we tell Nginx to use the backends we defined earlier. To make sure Liferay knows the specifics of the request as Nginx received it, we use proxy_set_header to pass on some additional information about the request. The proxy_next_upstream keyword tells Nginx that, when it encounters one of the conditions listed after the keyword, it should resend the request to the next available backend and return that response instead. This is the key to gracefully switching over to the second backend when the first backend is unavailable.

Some performance tweaking

To improve performance there are also two noteworthy bits of configuration. With the proxy_cache keyword we enable the caching settings mentioned earlier. We are not overriding any other caching settings, which means Nginx will use the response headers supplied by the Liferay backends to determine whether a request can be cached. We also add an X-Cached header (set to $upstream_cache_status, so typically HIT, MISS or EXPIRED) to make it easy to check whether a response was served from the cache. In our testing so far, Liferay does an excellent job of setting the HTTP headers correctly for requests that can be cached.
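
The gzip_* directives in the location block only tune compression: the level, which proxied responses qualify, and which MIME types are compressed. Compression itself is controlled by the gzip directive, which is off by default in Nginx and is typically enabled at the http level, for example in the distribution’s default nginx.conf. A sketch of that assumption:

http {

    # Compression is switched on globally; the location block above
    # merely tunes gzip_comp_level, gzip_proxied and gzip_types.
    gzip    on;

    # ... other http-level settings, includes, etc. ...

}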

Taking advantage of this redundancy, we were able to deploy critical patches, for instance for the Heartbleed and Ghost vulnerabilities, as soon as possible and without noticeable impact for the end users of these clusters.

Further opportunities

Does this mean that we are satisfied? Never! I am sure we can and will improve our deployments further. Testing has shown that our setups still struggle with the C10K problem. While we are halfway there (so in essence more like “C5K”), it would be nice to be able to cope with the onslaught of 10,000 concurrent connections stampeding through our stack. I am sure we can also do some additional tweaking of the caching and compression configuration and squeeze out a little more performance. Nginx’s commercial version (Nginx Plus) also appears to have many additional features tailored to high availability, which is definitely worth investigating further. And since a stack is only as strong as its weakest component, we will keep working on improving the other components, like Liferay, Tomcat and the database.

In the end, there will always be opportunities for improvement to strive for! 😉