Health Check / Load Balancers

In a production configuration of Cloud CMS, you will want a load balancer (LB) in front of each of your clusters: one load balancer for the API cluster and a separate load balancer for the UI cluster.

Each load balancer is responsible for receiving requests from the outside world and efficiently distributing those requests to the N servers that make up each cluster.

Load balancers use a number of strategies to try to determine which servers are most readily "available" to handle the next request. To figure this out, load balancers often make HTTP calls to each server to get back a status code (200) that indicates the server is healthy and available. More advanced load balancers may make further use of response information to determine just "how healthy" a server is.

If a load balancer receives negative information about a server (i.e. non-200 status codes) for a period of time, it may elect to pull that server out of service. Cloud CMS API and UI servers are stateless, so they can be pulled out of service and new servers can be added to their respective clusters on the fly.
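As a rough illustration of the "pull out of service" decision described above, the sketch below counts consecutive failed checks for a single server. It is not how any particular load balancer is implemented: the class name `ServerHealthTracker` and the `failure_threshold` default of 3 are assumptions made for the example, and real load balancers expose their own settings for this.

```python
from dataclasses import dataclass


@dataclass
class ServerHealthTracker:
    """Tracks consecutive failed health checks for one server (hypothetical helper)."""

    failure_threshold: int = 3  # assumed value; real load balancers make this configurable
    consecutive_failures: int = 0
    in_service: bool = True

    def record(self, status_code: int) -> bool:
        """Record one health check result and return whether the server is in service."""
        if status_code == 200:
            # A single healthy response resets the counter and restores the server.
            self.consecutive_failures = 0
            self.in_service = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Too many non-200 responses in a row: stop routing traffic here.
                self.in_service = False
        return self.in_service
```

Because the servers are stateless, a tracker like this can take a server out of rotation and put it back without any session handling.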

API

When setting up a load balancer with the Cloud CMS API, we recommend the following HTTP endpoint:

GET /healthcheck

This will return a 200 if the server is healthy. The calculation of whether the server is healthy takes into account a number of internal metrics, including memory usage, CPU utilization, available disk storage and open file handles. If any of these metrics is deemed to be failing (for example, if memory usage stays high over a long period of time and isn't being released), a non-200 status code may be returned.

In addition, this endpoint returns a JSON object with information about the computed metrics. Here is an example of what that may look like:

{
    "healthy": true,
    "process-cpu-usage": 0.003007125105603858,
    "system-cpu-usage": 0.036103610361036105,
    "memory-usage-percentage": 0.1568696691605085,
    "max-file-handles": 10240,
    "open-file-handles": 701,
    "disk-usage-percentage[0]": 0.9851858949946486,
    "initialized": true,
    "ok": true
}

Where:

healthy - whether the server is healthy
process-cpu-usage - CPU utilization for Cloud CMS processes (across all CPUs/cores)
system-cpu-usage - CPU utilization for Cloud CMS + OS + other processes (across all CPUs/cores)
memory-usage-percentage - Memory utilization for Cloud CMS processes
max-file-handles - The maximum number of file handles configured
open-file-handles - The number of open file handles
disk-usage-percentage[x] - Disk/volume utilization for Cloud CMS volume mounts (indexed)
initialized - Whether Cloud CMS is running (true) or still starting up (false)
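For a monitoring script that wants more than the status code, the JSON body can be reduced to a few values. The following is a minimal sketch that relies only on the field names shown in the example response above. The function name `summarize_health` and the shape of its return value are assumptions made for illustration, and the sketch assumes that both file handle fields are present in the response.

```python
import json


def summarize_health(body: str) -> dict:
    """Reduce a /healthcheck JSON body to a few values (hypothetical helper)."""
    report = json.loads(body)

    # Disk metrics are indexed, e.g. "disk-usage-percentage[0]", so collect them all.
    disk_values = [
        value
        for key, value in report.items()
        if key.startswith("disk-usage-percentage[")
    ]

    return {
        "healthy": bool(report.get("healthy", False)),
        "initialized": bool(report.get("initialized", False)),
        # Share of the configured file handle limit that is currently in use.
        "file_handle_ratio": report["open-file-handles"] / report["max-file-handles"],
        # Highest reported disk/volume utilization, or None if no volumes are listed.
        "max_disk_usage": max(disk_values) if disk_values else None,
    }
```

The values are passed through as reported; the sketch does not decide on its own what counts as unhealthy, since the server already makes that decision and reports it in the status code and the healthy field.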

UI

When setting up a load balancer with the Cloud CMS UI, we recommend the following HTTP endpoint:

GET /healthcheck

This will return a 200 if the server is healthy. Unlike the API case, no additional computations are performed to take into account metrics around disk space, memory or CPU utilization. This is a straightforward check: a 200 is returned if the server is healthy and a non-200 is returned if it is not.
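A status-only probe of this kind can be sketched with the Python standard library. The function name `is_healthy`, the `base_url` parameter and the two-second timeout are assumptions made for the example; in practice the load balancer performs this check itself, so a script like this is mainly useful for verifying the endpoint by hand.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET <base_url>/healthcheck answers with a 200."""
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Error status codes (raised as HTTPError), refused connections and
        # timeouts are all treated as "not healthy".
        return False
```

The same function works against an API server, since both clusters expose the endpoint at the same path.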