Using Enonic XP in Google Cloud with Kubernetes

This weekend I experimented with setting up Enonic XP in Google Cloud using Kubernetes. It proved to be a little tricky. I had no problems spinning up an Enonic XP instance in the cloud, but I had problems with the data in this setup. When I first spun up an XP pod everything looked fine, and I created a site “catfacts.com”. Then I destroyed the pod so Kubernetes would spin up my application again. When I accessed the site again, all my data was lost. I deleted the pod once more, and then the “catfacts.com” site was there again.
My setup had a persistent disk, and a claim for 20 GB that was reserved for XP_HOME and the blob storage.
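A minimal sketch of what that claim looked like, in case it helps; the name is illustrative and the storage class falls back to the GCP default, the real manifests are in the repo linked below:

```yaml
# Sketch: a 20 GB claim reserved for XP_HOME and the blob storage.
# Name is illustrative; storage class falls back to the GCP default.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: xp-home-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```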

Here is the YAML config I used for my setup: https://github.com/trondeh80/enonic-xp-kubernetes

I had an alternative idea for how I could solve this: if I were to spin up a master Enonic XP node and several pods for the actual application (load-balanced), I could make sure each pod had its own private XP_HOME but shared blob storage, and synced its database with the master node(s).
This would require me to have the pods and master nodes on the same subnet, which is a beta feature in GCP today.
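In Kubernetes terms I picture this as a StatefulSet behind a headless Service, so each pod gets a stable hostname and its own volume. A rough sketch of the idea (all names, the image, and paths are hypothetical):

```yaml
# Sketch: headless Service + StatefulSet giving each XP pod a stable
# DNS name (xp-0.xp-nodes, xp-1.xp-nodes, ...) and a private XP_HOME.
apiVersion: v1
kind: Service
metadata:
  name: xp-nodes
spec:
  clusterIP: None        # headless: used for peer discovery, not load balancing
  selector:
    app: xp
  ports:
    - port: 9300         # ES transport port used for cluster traffic
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: xp
spec:
  serviceName: xp-nodes
  replicas: 3
  template:
    metadata:
      labels:
        app: xp
    spec:
      containers:
        - name: xp
          image: enonic/xp                 # hypothetical image name
          volumeMounts:
            - name: xp-home
              mountPath: /enonic-xp/home   # assumed XP_HOME location
  volumeClaimTemplates:
    - metadata:
        name: xp-home
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```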

My question is: do you have any recommendations on how one should set up a cluster in Google Cloud that will scale properly, or should I wait for the launcher you guys are creating?

Hi Trond, thanks for reaching out!

At the moment, Kubernetes is a bit new even to us :slight_smile: We are working on a Google Cloud launcher image that will initially use traditional VMs. When this (and a cluster version) is live, we plan to look into a Container Engine version (Kubernetes).

As I understand it, you lost your data after deleting a pod, and then got your data back again after deleting the pod once more? That sounds more like a problem with the Kubernetes setup than with XP. Did you get any logs, e.g. that the system was cleanly initialized on the second boot, or similar?

IMHO, Kubernetes only really adds value when running an XP cluster; simple respawns can be done with plain Docker setups etc. In principle, the cluster deployment scheme described in our docs also applies here, just automated and managed by Kubernetes. For Google Cloud specifically, and Kubernetes in particular, the main issue would be getting access to a shared disk. The simplest possible solution would be to launch this image and mount it as the shared disk: https://console.cloud.google.com/launcher/details/click-to-deploy-images/singlefs
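Mounting that shared disk into the pods would then be an ordinary NFS volume, roughly like this (a sketch only; the server name, export path and blob location are assumptions):

```yaml
# Sketch: pod spec fragment mounting a shared NFS export (e.g. from the
# singlefs image) as the shared blob store. Names and paths are assumptions.
volumes:
  - name: xp-blobs
    nfs:
      server: singlefs-1-vm            # hypothetical NFS server host
      path: /data
containers:
  - name: xp
    image: enonic/xp                   # hypothetical image name
    volumeMounts:
      - name: xp-blobs
        mountPath: /enonic-xp/home/repo/blob   # assumed blob location
```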

However… this would not be in the spirit of Kubernetes, or even optimal for failover and scaling. As such, a better solution might be using FUSE to mount a Google Cloud Storage bucket as the shared disk: https://cloud.google.com/storage/docs/gcs-fuse. I am not exactly sure how much overhead this implies, but it will definitely be slower than a “local” disk. XP requires only minimal file system features because of its “append only” strategy (no updates or locks etc.), so this part should work without friction. Amazon has the Elastic File System; as far as I can see, GC does not have something like that yet?
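The mount itself is a one-liner once gcsfuse is installed (the bucket name here is hypothetical, and it needs GCP credentials available on the machine):

```bash
# Sketch: mount a Cloud Storage bucket as a local directory with gcsfuse.
gcsfuse my-xp-blobs /mnt/xp-blobs
```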

We are also open to input on making improvements to the XP core in order to optimize such deployments. So let us know your thoughts!

The local persistent disks for each pod would naturally still need to work. They should hold the index files, config etc. like you describe above. In a proper cluster deployment this data can be reconstructed automatically by XP, should an instance die completely.

Hope this was of some help. Please share any progress you make on this issue, as we are eager to provide the community with a way to go Kubernetes with XP :slight_smile:

Thank you for such a good and in-depth answer.

You understood correctly about what happened when I destroyed my pod. The logs did not really reveal anything interesting as far as I could see. It did find its XP_HOME with the data I had populated there.
I had reserved a persistent disk for my pod to use. I checked, and every time my pod started up it mounted the same storage. What I did see was that in the folder $XP_HOME/repo/index/data/mycluster/nodes I suddenly had a new folder named “1”. (Could it be that the new pod is identified as a separate node or something?)

I tend to agree with you that setting up XP in a cluster is the only option that really adds value (at least as long as ES is so tightly coupled with XP). The problem with doing that is connecting the XP pods together so they will form the cluster. It seems they must be on the same subnet in order to find each other, and in Google Cloud, having pods in the same subnet (CIDR) is a beta feature. (Please correct me if I’m wrong here.)
Alternatively, you could specify the hostnames of all the other pods in the zen config and have a Kubernetes config that connects the pods together, but this is not ideal if you want to spin up pods dynamically.
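In plain Elasticsearch terms that zen config would be something like the following (hostnames are hypothetical, and I am not sure which key names XP actually exposes for this):

```yaml
# Sketch (plain elasticsearch.yml): static peer list instead of multicast.
# Spinning up pods dynamically means regenerating this list, hence not ideal.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["xp-0.xp-nodes", "xp-1.xp-nodes"]
```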

Ideally, you would have an Elasticsearch cluster running separately and spin up XP pods that connect themselves to this datastore. I know that is not possible now, but I really, really, really hope you guys will detach Elasticsearch from XP in the near future. :slight_smile:

For Google Cloud specifically, and Kubernetes in particular, the main issue would be getting access to a shared disk

Yes, I did set this up. You can use a lot of different storage techniques to achieve this; what I did was “expose” storage on my cluster OS to the pods.
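Concretely, that was a hostPath-style volume, roughly like this (the path is illustrative):

```yaml
# Sketch: exposing a directory on the node's own OS to the pod.
# Note: this ties the pod to that one VM.
volumes:
  - name: xp-home
    hostPath:
      path: /mnt/disks/xp-home   # illustrative path on the cluster VM
```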

I am going to attempt the “XP clustering” (with Google’s CIDR beta feature) very soon, and will continue to document my progress in the GitHub repo, and here.

Oh, and btw: thanks for making an AWESOME CMS, guys :slight_smile:

The problem you got with the 1/ folder does sound like a known bug in Elasticsearch. Maybe @rmy can elaborate on this?
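For what it is worth, this matches standard Elasticsearch behaviour: if the data path is already locked (or a stale lock was left behind), ES silently starts a second local node directory, nodes/1. In plain Elasticsearch config the guard looks like this; whether and how XP exposes this setting is something we would have to check:

```yaml
# Sketch (plain elasticsearch.yml): refuse to start a second local node
# against the same data path instead of silently creating nodes/1.
node.max_local_storage_nodes: 1
```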

Now, the fact that XP uses Elasticsearch is an internal implementation detail. XP is very tightly integrated with ES, using the Java APIs and more. XP also uses the same clustering and transport mechanisms as ES, and these are used for many purposes beyond basic ES indexing; soon we are also adding a memory grid to further strengthen the clustering capabilities of XP. Architecturally, we want XP to be easier to deploy, upgrade and manage than traditional platform approaches; it is also faster, as the data is just a clock cycle away.

Due to the way XP is built, we are even able to replace ES with different technologies without affecting our customers. Also, the cost of deployment would increase if ES had to be deployed on separate servers.

You really have to consider XP a database with an embedded appserver (or vice versa). We know this is different from traditional approaches, but it also brings great benefits and simplicity.

You should also be aware that XP nodes (like ES) can be deployed as data nodes or “clients”. If the data flag is set to false, the nodes will not have locally stored indexes etc., and such nodes can be scaled up without worrying about issues related to this. So, for any cluster growing beyond 3 nodes, additional nodes may be started without local data. What works best for a given deployment might vary.
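In plain Elasticsearch terms such a client node is simply the following (again, the matching XP config key is something to verify):

```yaml
# Sketch (plain elasticsearch.yml): a client node that joins the cluster
# and routes requests, but stores no index data locally.
node.master: false
node.data: false
```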

In the end, I guess the problems we face here are primarily related to discovery and tuning. The most robust solution so far has been using defined lists of IPs for a given cluster (the multicast approach is not really reliable, and we have changed the default mechanism for 6.13). I guess studying recommendations for ES/Kubernetes deployments would be a good approach to getting this done for XP too?

If I understand correctly, the way you shared your filesystem would require all your pods to run on the same underlying VM, so that would not really work in a bigger deployment, right?

Thanks for using XP and supporting us - highly appreciated :slight_smile:

If I understand correctly, the way you shared your filesystem would require all your pods to run on the same underlying VM, so that would not really work in a bigger deployment, right?

You understand correctly. My initial approach was just to see if I could get a single node up and running. I am pretty sure this problem will persist unless each pod has its own XP_HOME, even if they are in the same deployment configuration.

You should also be aware that XP nodes (like ES) can be deployed as data nodes or “clients”.

Yes, this is a very good point. This will be very useful for scaling. For now I will just focus on figuring out how to create the cluster without XP splitting my data on pod reboots.

Due to the way XP is built, we are even able to replace ES with different technologies without affecting our customers. Also, the cost of deployment would increase if ES had to be deployed on separate servers.
You really have to consider XP a database with an embedded appserver (or vice versa). We know this is different from traditional approaches, but it also brings great benefits and simplicity.

Ok, fair enough. It is awesomely simple to work with. :slight_smile:
