How to properly shut down an on-premise Imply Druid Cluster


Updated April 12, 2019 01:38

There are situations where you need to perform a planned shutdown of your on-premise Imply Druid cluster, such as network maintenance, operating system updates, or hardware maintenance.

Below are the steps to do so.

Assumptions:

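The Imply cluster is running colocated services as follows:

Query Servers: Broker, Pivot, Router
Data Servers: Middle Manager, Historical
Master Servers: Overlord, Coordinator, ZooKeeper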

Prerequisites:

Notify your end users that the cluster will be shut down, so they are aware that services powered by Imply will not be operational.
If you have streaming ingestion, ensure that the retention policies of the source systems are longer than the planned downtime. For instance, if you have a downtime window of 4 hours, make sure that the retention policy is at least 8 hours to prevent any gaps in data.
Stop all ingestion jobs.

NOTE: Make sure to back up your ingestion specs, since you may need to resubmit the ingestion jobs after the cluster is restarted.
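
The prerequisites above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not an official Imply tool: the function names, OVERLORD_URL (assumed here to be the Overlord listening on its default port 8090), and BACKUP_DIR are hypothetical and chosen for illustration. The 2x retention margin simply mirrors the 4-hour / 8-hour example above. The backup function uses the Druid Overlord supervisor API, so it only covers supervisor-based streaming ingestion (for example, Kafka).

```shell
# Sketch only. OVERLORD_URL and BACKUP_DIR are assumptions, not values from this article.
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8090}"
BACKUP_DIR="${BACKUP_DIR:-./ingestion-spec-backup}"

# Succeeds (exit 0) when the source system's retention, in hours, is at least
# twice the planned downtime, following the 4h downtime / 8h retention example.
# Usage: retention_ok <downtime_hours> <retention_hours>
retention_ok() {
  [ "$2" -ge $(( $1 * 2 )) ]
}

# Saves the full specs of all current supervisors to a JSON file so the
# ingestion jobs can be resubmitted later if needed.
backup_supervisor_specs() {
  mkdir -p "$BACKUP_DIR" || return 1
  # -s: quiet, -f: fail on HTTP errors, -o: write the response to a file
  curl -sf "$OVERLORD_URL/druid/indexer/v1/supervisor?full" \
    -o "$BACKUP_DIR/supervisors.json"
}
```

For a 4-hour window against a source with 8 hours of retention, something like `retention_ok 4 8 && backup_supervisor_specs` would check the margin and then save the specs before you stop the ingestion jobs.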

Steps

Stop the Query Servers (Broker, Pivot, Router). On the Query Server machine, log in as the user running the Imply Druid processes, then run bin/service --down. Verify that the services are no longer running with ps -ef | grep -i imply (the -i flag makes the match case-insensitive, so processes started from a lowercase imply directory are also found).
Stop the Data Servers (Middle Manager, Historical). On the Data Server machine, log in as the user running the Imply Druid processes, then run bin/service --down. Verify that the services are no longer running with ps -ef | grep -i imply.
Stop the Master Servers (Overlord, Coordinator, ZooKeeper). On the Master Server machine, log in as the user running the Imply Druid processes, then run bin/service --down. Verify that the services are no longer running with ps -ef | grep -i imply.
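
The stop-and-verify routine repeated in each step could be wrapped in a small script along these lines. It is a rough sketch, assuming the Imply distribution is installed under IMPLY_HOME (a hypothetical variable, defaulting here to /opt/imply) and that the script is run as the user who owns the Imply Druid processes. The function names are illustrative; only bin/service --down and the ps check come from the steps above.

```shell
# Sketch only. IMPLY_HOME is an assumed install location; adjust it to your environment.
IMPLY_HOME="${IMPLY_HOME:-/opt/imply}"

# Reads `ps -ef`-style lines on stdin and prints those that mention Imply
# (case-insensitive), skipping the grep command itself.
# Exits non-zero when no such line remains.
imply_processes() {
  grep -i 'imply' | grep -v 'grep'
}

# Stops the Imply services on this machine, then verifies nothing is left running.
stop_imply_services() {
  # Run in a subshell so the caller's working directory is not changed.
  ( cd "$IMPLY_HOME" && bin/service --down ) || return 1
  if ps -ef | imply_processes >/dev/null; then
    echo "Imply processes are still running on $(hostname)" >&2
    return 1
  fi
  echo "All Imply services are stopped on $(hostname)"
}
```

Calling stop_imply_services on each machine in the order given above (Query Servers, then Data Servers, then Master Servers) preserves the shutdown sequence. The check matches any command line containing "imply", so a script file whose own name contains that word would be reported as still running.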