Rolling update/upgrade

tommytusj · April 23, 2018, 7:16am

How well supported are rolling updates/upgrades in a cluster? Let's say from 6.14 to 6.15 with three nodes. Can you upgrade one server, restart it, and then do the next one? Or do you always have to shut everything down?

This assumes there are no breaking changes to the core.

rfo · April 23, 2018, 7:34am

I have done this without problems for a couple of customers running three-node clusters who wanted to upgrade without downtime on their services, both between minor versions and between bugfix versions.

The only time there was a hiccup was when there were changes to the internal ES (Elasticsearch) structure in a previous version. I think a 6.8 node could not run in the same cluster as 6.9 nodes (it might have been 6.7 and 6.8, but it was at least before 6.10; I don't remember exactly, but I can check if it is important).

We discovered in our test environment that it did not work. In the future, I think the upgrade notes will mention it if the new version cannot run in the same cluster as the old version, but @rmy / @tsi will have to confirm that.

tommytusj · April 23, 2018, 7:38am

How did you handle the upgrade? Did you just upgrade all the nodes and it was fine, or did everything crash?

It would be good to have it in the release notes whether a rolling upgrade is possible or not.

Nat · April 23, 2018, 7:47am

Hi there,

you can test it on a dev/stage server (or your local machine) first. It's safer, and by debugging it there you'll see whether you would have any problems in production, with no consequences. (I know, it might sound obvious.)

rfo · April 23, 2018, 7:51am

Everything went fine.

We monitor the /status/cluster endpoint to verify that the upgraded node has rejoined the cluster before we take down the next node. We also control where traffic is sent by configuring the load balancer in front so that it always sends traffic to a node that is online, and before taking a node down for upgrade we check its request logs to verify that it no longer receives any traffic.
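The procedure above boils down to a simple loop: drain one node, upgrade it, wait until it has rejoined, then move on. The sketch below is a minimal illustration under stated assumptions, not the procedure rfo actually scripted. The hooks `drain`, `upgrade` and `has_joined` are hypothetical placeholders: the response format of /status/cluster is not described in this thread, so `has_joined` is left for you to implement against your own setup, and the load balancer is assumed to route traffic back to a node automatically once it is online.

```python
import time
from typing import Callable, Sequence


def rolling_upgrade(
    nodes: Sequence[str],
    drain: Callable[[str], None],       # hypothetical: move LB traffic away from the node
    upgrade: Callable[[str], None],     # hypothetical: stop, upgrade and restart the node
    has_joined: Callable[[str], bool],  # hypothetical: e.g. inspect /status/cluster
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> list[str]:
    """Upgrade nodes one at a time; abort if a node fails to rejoin in time."""
    upgraded = []
    for node in nodes:
        drain(node)
        upgrade(node)
        # Never touch the next node until this one is back in the cluster.
        deadline = time.monotonic() + timeout_s
        while not has_joined(node):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{node} did not rejoin the cluster; aborting")
            time.sleep(poll_s)
        upgraded.append(node)
    return upgraded
```

Raising instead of continuing is deliberate: if one node does not come back, upgrading another would take the cluster below the number of nodes it needs to stay up.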

At one customer the XP version is set in a docker-compose-like setup, so there we just updated one Docker container at a time: we shut it down, changed the XP version, and restarted it so that it rejoined the cluster. Meanwhile, the cluster stayed online with two nodes, which is fine since we have minimum_master_nodes=2 in a three-node cluster. Then we repeated the process for the other two nodes, one at a time.
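The reason one node at a time is safe comes down to quorum arithmetic. Assuming minimum_master_nodes follows the usual Elasticsearch majority rule, floor(n / 2) + 1 of the master-eligible nodes, a three-node cluster needs 2 nodes to elect a master. Taking one node down leaves 2, which still meets the quorum; taking two down leaves 1, which does not. A small sketch of that check (the function names are illustrative only):

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def can_take_one_down(master_eligible_nodes: int, minimum_master_nodes: int) -> bool:
    """True if the remaining nodes still meet minimum_master_nodes with one node offline."""
    return master_eligible_nodes - 1 >= minimum_master_nodes


print(quorum(3))                  # 2
print(can_take_one_down(3, 2))    # True: two nodes remain, quorum is 2
print(can_take_one_down(2, 2))    # False: a two-node cluster cannot lose a node
```

The same majority requirement is what protects against split brain, so lowering minimum_master_nodes to make more nodes removable at once is not a safe shortcut.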

At another customer, we run on Windows, with the xp_home folder separate from the xp_install folder. There we stop the Enonic XP service on one of the nodes, swap the xp_install folder for the new version, restart the service, and verify that the node has rejoined the cluster before we upgrade the next one.
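Because xp_home lives outside xp_install, the upgrade on a single node is just a folder swap while the service is stopped. The sketch below assumes that layout and keeps the old install next to the new one as a rollback copy. `stop_service` and `start_service` are hypothetical hooks (for example, wrappers around the Windows `sc stop` / `sc start` commands for whatever your XP service is called); the folder names and the `.old` suffix are illustrative choices, not part of XP.

```python
import os
from pathlib import Path
from typing import Callable


def swap_install_folder(
    install_dir: Path,
    new_version_dir: Path,
    stop_service: Callable[[], None],   # hypothetical hook, e.g. wraps `sc stop`
    start_service: Callable[[], None],  # hypothetical hook, e.g. wraps `sc start`
) -> Path:
    """Replace install_dir with new_version_dir while the service is stopped.

    xp_home is assumed to live elsewhere, so config and data are untouched.
    Returns the path of the backup copy of the old install.
    """
    backup = install_dir.with_name(install_dir.name + ".old")
    if backup.exists():
        # Refuse to overwrite an earlier rollback copy; check before stopping anything.
        raise FileExistsError(f"{backup} already exists")
    stop_service()
    try:
        os.replace(install_dir, backup)  # keep the old version for rollback
        try:
            os.replace(new_version_dir, install_dir)
        except OSError:
            os.replace(backup, install_dir)  # put the old version back
            raise
    finally:
        start_service()  # restart with whichever version is now in place
    return backup
```

After the restart, the node still has to be checked for rejoining the cluster before moving on, as described above.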