Enonic with millions of nodes

Enonic version: 7.15.0
OS: Ubuntu 20.04.1

Hello! I’m structuring a system based on Enonic where part of it is content-oriented and part is node-oriented. The content part, which I manage via Content Studio, is fine: it has around 1,500 objects and gives me no problems. But the node part of my system is giving me some pain due to the large amount of nodes I have to deal with, which seems to be more than Enonic can hold. I have nearly 8 million XML files from which some core data is extracted to create the nodes I insert into Enonic (roughly as sketched below), but after the server has been up for some time, I get errors like:

  • StorageDaoImpl.java - 155 - Cannot store node index
  • AdapterActionFuture.java - 70 - Timeout waiting for task.
    etc…
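For context, the ingestion loop looks roughly like the sketch below. The directory, the extracted fields, and the storeNode helper are simplified placeholders, not the actual code; the real task calls the Node API at that point.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlIngest {

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Walk the ~8 million XML files and create one node per file.
        try (Stream<Path> files = Files.walk(Paths.get("/data/xml"))) {
            files.filter(p -> p.toString().endsWith(".xml")).forEach(path -> {
                try {
                    Document doc = builder.parse(path.toFile());
                    // Only the core data is extracted; the rest of the XML is ignored.
                    String id = doc.getDocumentElement().getAttribute("id");
                    String title = doc.getElementsByTagName("title").item(0).getTextContent();
                    storeNode(id, title);
                } catch (Exception e) {
                    System.err.println("Failed on " + path + ": " + e.getMessage());
                }
            });
        }
    }

    // Placeholder: in the real task this creates a node through Enonic's Node API.
    static void storeNode(String id, String title) {
        // repo.create(...) in the actual application
    }
}
```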

Based on that, I would like to ask some things:

  • Is Enonic able to handle this amount of data in the Node API, or should I set up some external resource for it, like a stand-alone database that Enonic just connects to and queries?
  • If it is possible to work entirely within Enonic, without external resources, is there some configuration I can tune to better handle this number of entries?

Any help is appreciated. Thanks!

The messages you get indicate the system is overloaded somehow. What capacity does the infrastructure you are deploying to actually have (CPU/RAM/disk)?

Do you have any metrics for CPU/memory, etc.?

It’s an m5.large EC2 instance with 2 vCPUs and 8 GB RAM running Ubuntu 20.04.
I’ve just found that a background task that uses the HTTP Client Lib to make requests to another server is creating lots of threads and not closing them after the request is done. Here’s the thread dump:
https://pastebin.com/raw/nBxzDgr5
And here are some metrics from the Metrics tab of the System-info app:
https://pastebin.com/raw/pZXAutu8

You will probably need to up your node size; try doubling it to XL first. Remember that 2 vCPUs are typically equivalent to a single CPU core.

Also, if your app is using lib-http-client, make sure it is upgraded to the latest and greatest, as older versions tend to bleed threads.
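For reference, the classic way to bleed threads with OkHttp (the engine underneath lib-http-client) is to build a new client per request: every OkHttpClient constructed with new gets its own dispatcher and connection pool, and those threads stick around after the call returns. A minimal sketch of the anti-pattern next to the safer pattern, with the URL as a stand-in:

```java
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class OkHttpPatterns {

    // Anti-pattern: a fresh client per request. Each client owns its own
    // dispatcher and connection pool, whose threads outlive the request.
    static String fetchLeaky(String url) throws Exception {
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder().url(url).build();
        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    // Safer: one shared client for the whole application, so threads and
    // connections are pooled and reused instead of accumulating.
    private static final OkHttpClient SHARED_CLIENT = new OkHttpClient();

    static String fetchShared(String url) throws Exception {
        Request request = new Request.Builder().url(url).build();
        try (Response response = SHARED_CLIENT.newCall(request).execute()) {
            return response.body().string();
        }
    }
}
```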

After finding that multi-threaded OkHttp client (which was already on the latest version), I changed the task that makes all those HTTP requests to use Apache’s CloseableHttpClient, which stopped the creation of the many unclosed threads. So far (2 hours since the latest deploy) the server is handling everything fine. Before, when using lib-http-client, the server would max out its CPU within 10–15 minutes, which is not happening anymore.
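For anyone hitting the same issue, the replacement looks roughly like this (URL simplified): try-with-resources closes the response as soon as the body has been read, returning the connection to the pool instead of leaving threads behind.

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ApacheClientTask {

    // One client instance, reused by the background task.
    private static final CloseableHttpClient CLIENT = HttpClients.createDefault();

    static String fetch(String url) throws Exception {
        // The response is closed the moment the body has been consumed,
        // so no thread or connection is left hanging per request.
        try (CloseableHttpResponse response = CLIENT.execute(new HttpGet(url))) {
            return EntityUtils.toString(response.getEntity());
        }
    }
}
```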

Update:
Switching from the http-client lib to CloseableHttpClient has made it possible for the server to handle this amount of node objects with no problems for the last ~15 hours, since the thread is released right after the response body is read.