Introduction
In a typical Imply cluster, we have:
- Master services (Overlord, Coordinator)
- Data services (MiddleManager, Historical)
- Query services (Broker, Imply UI)
- Zookeeper servers
- Deep Storage (S3, HDFS, NFS, etc.)
- Metadata DB (MySQL, PostgreSQL, etc.) for Druid, and for Pivot if Imply is used
Migrating the cluster is simple since we only need to move (or maintain) data from Deep Storage and Metadata DB (Druid / Pivot). The other services contain only transient data, and they are repopulated anyway when the new cluster is started.
Following are the possible migration plans:
- Migration where there is no change in Deep Storage and the Metadata DB
- Migration where there is change in Deep Storage / Metadata DB
- Migration from Apache Druid to Imply
Multiple scenarios are presented in the rest of the article.
Scenario 1 : Old and new clusters share the same Deep Storage and DB servers
In this case the Druid services simply need to be restarted.
Steps:
- Configure the new cluster with the same Deep Storage path and DB server address as the old cluster (see the example properties after this list)
- Stop the old cluster
- Make sure the old cluster is COMPLETELY down before starting the new cluster; otherwise Druid data can be corrupted
- Start the new cluster
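For reference, a minimal sketch of the relevant common.runtime.properties entries for this scenario, assuming S3 deep storage and a MySQL metadata DB (the bucket, host, and credential values below are placeholders; reuse whatever the old cluster already has):
# Deep Storage - must be identical to the old cluster's settings
druid.storage.type=s3
druid.storage.bucket=<existing_bucket>
druid.storage.baseKey=druid/segments
# Metadata store - must point at the same DB server and database as the old cluster
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://<db_host>:3306/<db_name>
druid.metadata.storage.connector.user=<user_name>
druid.metadata.storage.connector.password=<password>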
Scenario 2 :
Old and new clusters use different Deep Storage and DB servers.
This scenario would apply if you are using Imply Cloud and:
- Old deep storage is accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage is clean, with no data
- New metadata DB is brand new and clean, with no data
Steps:
- Configure the new clusters with different Deep Storage path and DB server address
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_segments" from the old DB to the new DB by following the next step
- Run mysqldump on the source metadata DB (and the Pivot DB, if used), then import the dump into the new target metadata DB (an example import command is shown after this list). The output file is given a .sql extension because it contains SQL statements. The -p flag in mysqldump prompts for a password
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db \
  <db_name> druid_config druid_dataSource druid_supervisors druid_segments > output_file.sql
- Start the new cluster
- Coordinator automatically starts reading the old segments' metadata in the new DB, and then historical nodes load them from the old Deep Storage. Note that the data in old Deep Storage is kept intact.
- Old cluster will keep writing to the old metadata DB and old Deep Storage
- New cluster will write to the new metadata DB and new Deep Storage
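The import into the new (empty) target metadata DB can then look like this (host, user, and database names are placeholders):
mysql -h <new_db_host> -u <user_name> -p <db_name> < output_file.sql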
Note #1: Once you do this migration, the old and new clusters will share the same data segment files in deep storage for any data ingested before the migration. (Data ingested after the migration will go to different files.) It is important to avoid running kill tasks (permanent data deletion) on datasources whose segments may be shared between the two clusters, because that would cause the clusters to delete each other's data.
Note #2: If the new Druid cluster shares the same ZooKeeper quorum as the old, it must use a different base znode path, by configuring the property druid.zk.paths.base in Druid's common.runtime.properties to a different name, such as /druid-newcluster. The default value is /druid.
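For example, in the new cluster's common.runtime.properties:
# Give the new cluster its own base znode so it does not collide with the old cluster
druid.zk.paths.base=/druid-newcluster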
Scenario 3 :
Old and new clusters use different Deep Storage and DB servers. This scenario would apply if you are using Imply Cloud and:
- Old deep storage is accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage already has some data in it
- New metadata DB already has some data in it
Steps:
- Make sure there are no collisions in the paths between the old deep storage and the new deep storage (see the example commands after this list)
- If there are collisions, copy the old deep storage data under a different path and use that same path when you modify the mysqldump file of the source metadata DB
- Copy the data from the old deep storage to the new deep storage
- Configure the new cluster with a different Deep Storage path and DB server address
- Run mysqldump on the source metadata DB, excluding the DDL:
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db <db_name> \
  druid_config druid_dataSource druid_supervisors druid_segments > source_output_file.sql
- Be careful in this scenario NOT to overwrite the existing data in the target metadata DB
- Change the location of the segments in the druid_segments table in the above source mysqldump file to point to the new deep storage location
sed -i .bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/' /dir/source_output_file.sql
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_segments" from the old metadata DB to the new metadata DB by following the next step
- Import the above modified source mysqldump file into the new target metadata DB
mysql -h <host_name> -u <user_name> -p <db_name> < /dir/source_output_file.sql
- Old cluster will keep writing to the old metadata DB and old Deep Storage
- New cluster will write to the new DB and new Deep Storage
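A possible sketch of the collision check and the deep storage copy, assuming S3 on both sides (bucket names and prefixes are placeholders; adapt to your storage type):
# Compare the segment prefixes on both sides to confirm they do not overlap
aws s3 ls s3://<old_bucket>/druid/segments/
aws s3 ls s3://<new_bucket>/druid/segments/
# Copy the old segments into the new deep storage
aws s3 sync s3://<old_bucket>/druid/segments/ s3://<new_bucket>/druid/segments/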
Scenario 4 :
Old and new clusters use different Deep Storage and DB servers. This scenario would apply if you are using Imply Cloud and:
- Old deep storage is NOT accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage is clean, with no data in it
- New metadata DB is brand new and has no data in it
Steps:
- Copy the data from the old deep storage to the new deep storage (possibly using a staging area as an intermediate location; see the sketch after this list)
- Configure the new cluster with a different Deep Storage path and DB server address
- Run mysqldump on the source metadata DB:
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db \
  <db_name> druid_config druid_dataSource druid_supervisors druid_segments > output_file.sql
- Change the location of the segments in the druid_segments table in the above mysqldump file to point to the new deep storage location
sed -i .bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/' /tmp/output_file.sql
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_rules, druid_segments" from the old DB to the new DB by following the next step
- Import the above modified source mysqldump file into the new target metadata DB
mysql -h <host_name> -u <user_name> -p <db_name> < /dir/output_file.sql
- Drop the druid_rules table from the target MySQL DB. It will be re-created once the cluster is started
- Start the new cluster.
- Coordinator automatically starts reading the old segments' metadata in the new DB, and then historical nodes load them from the NEW Deep Storage. Note that the data in old Deep Storage is kept intact.
- Old cluster will keep writing to the old DB and old Deep Storage.
- New cluster will write to the new DB and new Deep Storage.
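Because the old deep storage is not reachable from the new cluster, the copy typically goes through a staging area. A rough sketch, again assuming S3 and placeholder names:
# From a host that can reach the OLD deep storage: download to a staging area
aws s3 sync s3://<old_bucket>/druid/segments/ /staging/druid-segments/
# Transfer the staging area to a host that can reach the NEW deep storage, then upload
aws s3 sync /staging/druid-segments/ s3://<new_bucket>/druid/segments/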
Scenario 5 :
Old and new clusters use different Deep Storage and DB servers. This scenario would apply if you are using Imply Cloud and:
- Old deep storage is NOT accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage already has some data in it
- New metadata DB already has some data in it
Steps:
- Make sure there are no collisions in the paths between the old deep storage and the new deep storage
- If there are collisions, copy the old deep storage data under a different path and use that same path when you modify the mysqldump file of the source metadata DB
- Copy the data from the old deep storage to the new deep storage (possibly using a staging area as an intermediate location)
- Configure the new cluster with a different Deep Storage path and DB server address
- Run mysqldump on the source metadata DB, excluding the DDL:
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db <db_name> \
  druid_config druid_dataSource druid_supervisors druid_segments > source_output_file.sql
- Be careful in this scenario NOT to overwrite the existing data in the target metadata DB
- Change the location of the segments in the druid_segments table in the above source mysqldump file to point to the new deep storage location (see the verification sketch after this list)
sed -i .bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/' /dir/source_output_file.sql
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_segments" from the old metadata DB to the new metadata DB by following the next step
- Import the above modified source mysqldump file into the new target metadata DB
mysql -h <host_name> -u <user_name> -p <db_name> < /dir/source_output_file.sql
- Start the new cluster.
- Coordinator automatically starts reading the old segments' metadata in the new DB, and then historical nodes load them from the NEW Deep Storage. Note that the data in old Deep Storage is kept intact
- Old cluster will keep writing to the old metadata DB and old Deep Storage
- New cluster will write to the new DB and new Deep Storage
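Before importing, one way to sanity-check the sed rewrite is to confirm that no references to the old bucket remain in the modified dump (placeholder name as above):
# Should print 0 if every load spec now points at the new deep storage
grep -c '<old_bucket_name>' /dir/source_output_file.sql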
Scenario 6:
Apache Druid to Imply
All of the above scenarios can be considered for migrating from Apache Druid to Imply.
Guidelines
- Review Imply Release notes for changes in the new version
- Upgrade with cluster restarts. A rolling upgrade is possible, but not preferred
- Move to an Apache Druid version matching Imply's Druid version before the migration. This is required if a rolling upgrade is to be performed
- Use the same configuration parameters for the services as in the Apache Druid configuration, after reviewing the release notes for parameter changes
- If the `druid.segmentCache.locations` are changed, copy over the segment cache from the existing Apache Druid cluster before restarting services on the new Imply version (a copy sketch is shown after this list)
- While starting the cluster on the Imply version, the following order is preferable when the segment count is very high:
- Start all the Historicals and wait for the Lifecycle to be started
- Start all the Master services (Coordinator / Overlord) and wait for the Lifecycle to be started
- Start the Query services (Broker / Router) and wait for the Lifecycle to be started
- Start the MiddleManager services and wait for the Lifecycle to be started
- Resume Supervisors / tasks
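If the segment cache does need to be copied (see the `druid.segmentCache.locations` guideline above), a simple per-Historical sketch with placeholder paths:
# Run on each Historical before starting it on the new Imply version
rsync -a /old/segment-cache/ /new/segment-cache/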