Filed under: Automation, Continuous deployment, Linux, Tools. Tags: Gatling, HAProxy. Thomas Sundberg, 2017-03-29
I want to deploy new versions of an application with no downtime. It turns out to be a bit tricky. Here is one solution that sort of works.
I am not in control of the deployment process; all I can do is monitor a URL and stop sending traffic to it if there are errors.
I want to deploy small changes often to reduce the risk associated with large deploys. This is not a distributed system with lots of small services; it is a monolith that is redeployed often.
The solution is to have more than one server handling the load and divide the traffic between these servers. The technique is called load balancing and is not new. All I have to do is set up a load balancer and configure it properly.
Load balancers work either on layer 4, the transport layer, or on layer 7, the application layer. I want to load balance a web application, so a layer 7 load balancer is what I need. The layers here refer to the OSI model.
Using HAProxy as a layer 7 load balancer does the trick.
The installation of HAProxy differs between systems. I installed it on Ubuntu 16.04 like this:
apt-get install software-properties-common
add-apt-repository ppa:vbernat/haproxy-1.7
apt-get update
apt-get install haproxy
I found the instructions at https://haproxy.debian.net/ and was able to install the latest version, 1.7 as of this writing.
Installing HAProxy was the easy part; the real work was in tuning its configuration. I ended up with this configuration in /etc/haproxy/haproxy.cfg:
global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 2000
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 10000
    timeout server 10000

frontend loadbalanser
    stats enable
    stats uri /admin?stats
    bind *:80
    mode http
    default_backend gfr

backend gfr
    stats enable
    stats uri /admin?stats
    mode http
    balance roundrobin
    option forwardfor
    http-request set-header X-Forwarded-Port %[dst_port]
    option httpchk GET /service/foretag/6.0/ws?wsdl
    server gfr1 l7700744.ata.ams.se:8580 check rise 8 downinter 30000ms observe layer7 on-error mark-down
    server gfr2 l7700745.ata.ams.se:8580 check rise 8 downinter 30000ms observe layer7 on-error mark-down
The most important part is the last two lines. They specify the two servers that should handle the load.
The line option httpchk defines how the health check is performed.
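The real magic, and the tuning, was finding values for the server lines that allow a deploy while the servers are in use. I simulated that use by putting load on the servers with Gatling. In the server lines, check turns on health checks, rise 8 requires eight consecutive successful checks before a server counts as up again, and downinter 30000ms spaces the checks 30 seconds apart while a server is down. With observe layer7, HAProxy also watches the responses to live traffic, and on-error mark-down marks the server as down once the number of consecutive errors reaches error-limit, which is 10 by default.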
The health check is an HTTP call to a URL where I check whether the WSDL for a web service is available. If it isn't, the application isn't up and running.
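To see what the health check sees, the same request can be made by hand. The following is a minimal sketch, assuming curl is installed; the function names http_status, is_healthy_status and probe are my own and not part of HAProxy. It relies on the fact that option httpchk treats 2xx and 3xx responses as healthy.

```shell
#!/bin/sh
# Probe a backend the way the health check does: GET the WSDL URL
# and classify the HTTP status code.

# Print the HTTP status code for a URL (curl prints 000 if the connection fails).
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# "option httpchk" counts 2xx and 3xx responses as healthy.
is_healthy_status() {
    case "$1" in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

# Report UP or DOWN for one server, given as host:port.
probe() {
    status=$(http_status "http://$1/service/foretag/6.0/ws?wsdl")
    if is_healthy_status "$status"; then
        echo "$1 UP ($status)"
    else
        echo "$1 DOWN ($status)"
    fi
}

# Usage: probe l7700744.ata.ams.se:8580
```

Running probe in a loop during a deploy gives a rough picture of how long the application is unavailable, independent of what HAProxy reports.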
The load balancing works. When a server responds with an error, that particular server is marked as down. It will come back when the deploy is done and the expected WSDL is available again.
I still lose a few calls during deployment. With constant load, about twice the production load, I lose approximately ten calls per server when they are reinstalled. That's not good, but given that I'm not able to alter the deploy process, I guess it will have to do.
I wish I could find a setting that resends a failed call once to another server, but I can't find one that works.
The option redispatch seemed promising, but it didn't work well for me. When I had option redispatch and retries set, I lost more traffic than when they were not set.
If I could change the deploy process, I would change it so that the server that is about to be re-deployed is removed from the load balancer before the deploy. HAProxy is really good at reloading its configuration. A script that removes a server, reloads HAProxy's configuration, performs the deployment, adds the server again, and finally reloads the configuration would not be too hard to write. This would give me a real zero-downtime deployment, not just the short-downtime deployment I am able to achieve with this setup.
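A rough sketch of such a script could look like this. Note that it swaps in a different mechanism than the one described above: instead of editing the configuration file and reloading, it sends the disable server and enable server commands to the admin socket that the configuration already exposes. It assumes socat is installed, and deploy_to is a purely hypothetical placeholder, since the real deployment step is outside my control. I have not run this against the setup described here.

```shell
#!/bin/sh
# Sketch of a rolling deploy: take one server out of rotation at a time.

SOCKET="${SOCKET:-/run/haproxy/admin.sock}"  # the "stats socket" path from haproxy.cfg
BACKEND="${BACKEND:-gfr}"                    # the backend name from haproxy.cfg

# Send a single command to HAProxy's admin socket (requires socat).
haproxy_cmd() {
    echo "$1" | socat stdio "$SOCKET"
}

# HYPOTHETICAL placeholder: replace with the real deployment step.
# It should return only once the application answers again.
deploy_to() {
    echo "deploying to $1"
}

# Put each server in maintenance, deploy to it, then re-enable it.
# Stops at the first failure and leaves that server disabled.
rolling_deploy() {
    for server in "$@"; do
        haproxy_cmd "disable server $BACKEND/$server" || return 1
        deploy_to "$server" || return 1
        haproxy_cmd "enable server $BACKEND/$server" || return 1
    done
}

# Usage: rolling_deploy gfr1 gfr2
```

Handling one server at a time keeps the other one serving traffic. Stopping at the first failure is a deliberate choice in this sketch: a server whose deploy failed stays out of rotation until someone has looked at it.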
HAProxy works very well. It is possible to re-configure it during usage without losing traffic.
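When re-configuring a running instance, it seems sensible to let HAProxy check the file before reloading. A minimal sketch, assuming a systemd-based system such as Ubuntu 16.04 where the service is named haproxy; the function name safe_reload is my own.

```shell
#!/bin/sh
CONFIG="${CONFIG:-/etc/haproxy/haproxy.cfg}"

# Reload HAProxy only if the configuration file passes HAProxy's own check.
safe_reload() {
    # "haproxy -c -f FILE" parses the file and exits non-zero on errors.
    if haproxy -c -f "$CONFIG" >/dev/null; then
        systemctl reload haproxy
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Usage (as root): safe_reload
```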
I would like to thank Malin Ekholm for proofreading.