Introduction
In a typical Imply cluster, we have:
- Master services (Overlord, Coordinator)
- Data services (MiddleManager, Historical)
- Query services (Broker, Imply UI)
- Zookeeper servers
- Deep Storage (S3, HDFS, NFS, etc.)
- Metadata DB (MySQL, PostgreSQL, etc.) for Druid, and for Pivot if Imply is used
Migrating the cluster is simple since we only need to move (or maintain) data from Deep Storage and Metadata DB (Druid / Pivot). The other services contain only transient data, and they are repopulated anyway when the new cluster is started.
Following are the possible migration plans:
- Migration where there is no change in Deep Storage and the Metadata DB
- Migration where there is change in Deep Storage / Metadata DB
- Migration from Apache Druid to Imply
Multiple scenarios are presented in the rest of the article.
Scenario 1 : Old and new clusters share the same Deep Storage and DB servers
In this case the Druid services simply need to be restarted.
Steps:
- Configure the new cluster with the same Deep Storage path and DB server address as the old cluster (see the example properties after this list)
- Stop the old cluster
- Make sure the old cluster is COMPLETELY down before starting the new cluster; otherwise Druid data can be corrupted
- Start the new cluster
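For reference, a minimal sketch of the relevant common.runtime.properties entries for this scenario, assuming S3 deep storage and a MySQL metadata DB (the bucket, host, and credential values below are placeholders; reuse whatever the old cluster already has):
# Deep Storage - must be identical to the old cluster's settings
druid.storage.type=s3
druid.storage.bucket=<existing_bucket>
druid.storage.baseKey=druid/segments
# Metadata store - must point at the same DB server and database as the old cluster
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://<db_host>:3306/<db_name>
druid.metadata.storage.connector.user=<user_name>
druid.metadata.storage.connector.password=<password>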
Scenario 2 :
Old and new clusters use different Deep Storage and DB servers.
This scenario would apply if you are using Imply Cloud and:
- Old deep storage is accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage is clean, with no data
- New metadata DB is brand new and clean, with no data
Steps:
- Configure the new clusters with different Deep Storage path and DB server address
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_segments" from the old DB to the new DB by following the next step
- Run mysqldump on the source metadata DB (and the Pivot DB, if used), then import the dump into the new target metadata DB (an example import command is shown after this list). The output file is given a .sql extension because it contains SQL statements. The -p flag in mysqldump prompts for a password
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db \
  <db_name> druid_config druid_dataSource druid_supervisors druid_segments > output_file.sql
- Start the new cluster
- Coordinator automatically starts reading the old segments' metadata in the new DB, and then historical nodes load them from the old Deep Storage. Note that the data in old Deep Storage is kept intact.
- Old cluster will keep writing to the old metadata DB and old Deep Storage
- New cluster will write to the new metadata DB and new Deep Storage
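The import into the new (empty) target metadata DB can then look like this (host, user, and database names are placeholders):
mysql -h <new_db_host> -u <user_name> -p <db_name> < output_file.sql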
Note #1: Once you do this migration, the old and new clusters will share the same data segment files in deep storage for any data ingested before the migration. (Data ingested after the migration will go to different files.) It is important to avoid running kill tasks (permanent data deletion) on datasources whose segments may be shared between the two clusters, because that would cause the clusters to delete each other's data.
Note #2: If the new Druid cluster shares the same ZooKeeper quorum as the old, it must use a different base znode path, by configuring the property druid.zk.paths.base in Druid's common.runtime.properties to a different name, such as /druid-newcluster. The default value is /druid.
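For example, in the new cluster's common.runtime.properties:
# Give the new cluster its own base znode so it does not collide with the old cluster
druid.zk.paths.base=/druid-newcluster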
Scenario 3 :
Old and new clusters use different Deep Storage and DB servers. This scenario would apply if you are using Imply Cloud and:
- Old deep storage is accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage already has some data in it
- New metadata DB already has some data in it
Steps:
- Make sure there are no collisions in the paths between the old deep storage and the new deep storage (see the example commands after this list)
- If there are collisions, copy the old deep storage data under a different path and use that same path when you modify the mysqldump file of the source metadata DB
- Copy the data from the old deep storage to the new deep storage
- Configure the new cluster with a different Deep Storage path and DB server address
- Run mysqldump on the source metadata DB, excluding the DDL:
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db <db_name> \
  druid_config druid_dataSource druid_supervisors druid_segments > source_output_file.sql
- Be careful in this scenario NOT to overwrite the existing data in the target metadata DB
- Change the location of the segments in the druid_segments table in the above source mysqldump file to point to the new deep storage location
sed -i .bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/' /dir/source_output_file.sql
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_segments" from the old metadata DB to the new metadata DB by following the next step
- Import the above modified source mysqldump file into the new target metadata DB
mysql -h <host_name> -u <user_name> -p <db_name> < /dir/source_output_file.sql
- Old cluster will keep writing to the old metadata DB and old Deep Storage
- New cluster will write to the new DB and new Deep Storage
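A possible sketch of the collision check and the deep storage copy, assuming S3 on both sides (bucket names and prefixes are placeholders; adapt to your storage type):
# Compare the segment prefixes on both sides to confirm they do not overlap
aws s3 ls s3://<old_bucket>/druid/segments/
aws s3 ls s3://<new_bucket>/druid/segments/
# Copy the old segments into the new deep storage
aws s3 sync s3://<old_bucket>/druid/segments/ s3://<new_bucket>/druid/segments/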
Scenario 4 :
Old and new clusters use different Deep Storage and DB servers. This scenario would apply if you are using Imply Cloud and:
- Old deep storage is NOT accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage is clean, with no data in it
- New metadata DB is brand new and has no data in it
Steps:
- Copy the data from the old deep storage to the new deep storage (possibly using a staging area as an intermediate location; see the sketch after this list)
- Configure the new cluster with a different Deep Storage path and DB server address
- Run mysqldump on the source metadata DB:
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db \
  <db_name> druid_config druid_dataSource druid_supervisors druid_segments > output_file.sql
- Change the location of the segments in the druid_segments table in the above mysqldump file to point to the new deep storage location
sed -i .bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/' /tmp/output_file.sql
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_rules, druid_segments" from the old DB to the new DB by following the next step
- Import the above modified source mysqldump file into the new target metadata DB
mysql -h <host_name> -u <user_name> -p <db_name> < /dir/output_file.sql
- Drop the druid_rules table from the target MySQL DB. It will be re-created once the cluster is started
- Start the new cluster.
- Coordinator automatically starts reading the old segments' metadata in the new DB, and then historical nodes load them from the NEW Deep Storage. Note that the data in old Deep Storage is kept intact.
- Old cluster will keep writing to the old DB and old Deep Storage.
- New cluster will write to the new DB and new Deep Storage.
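Because the old deep storage is not reachable from the new cluster, the copy typically goes through a staging area. A rough sketch, again assuming S3 and placeholder names:
# From a host that can reach the OLD deep storage: download to a staging area
aws s3 sync s3://<old_bucket>/druid/segments/ /staging/druid-segments/
# Transfer the staging area to a host that can reach the NEW deep storage, then upload
aws s3 sync /staging/druid-segments/ s3://<new_bucket>/druid/segments/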
Scenario 5 :
Old and new clusters use different Deep Storage and DB servers. This scenario would apply if you are using Imply Cloud and:
- Old deep storage is NOT accessible from the new Imply cluster
- Both the old and new metadata DBs are MySQL
- New deep storage already has some data in it
- New metadata DB already has some data in it
Steps:
- Make sure there are no collisions in the paths between the old deep storage and the new deep storage
- If there are collisions, copy the old deep storage data under a different path and use that same path when you modify the mysqldump file of the source metadata DB
- Copy the data from the old deep storage to the new deep storage (possibly using a staging area as an intermediate location)
- Configure the new cluster with a different Deep Storage path and DB server address
- Run mysqldump on the source metadata DB, excluding the DDL:
mysqldump -h <host_name> -u <user_name> -p --single-transaction \
  --skip-add-drop-table --no-create-info --no-create-db <db_name> \
  druid_config druid_dataSource druid_supervisors druid_segments > source_output_file.sql
- Be careful in this scenario NOT to overwrite the existing data in the target metadata DB
- Change the location of the segments in the druid_segments table in the above source mysqldump file to point to the new deep storage location (see the verification sketch after this list)
sed -i .bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/' /dir/source_output_file.sql
- Copy the tables "druid_config, druid_dataSource, druid_supervisors, druid_segments" from the old metadata DB to the new metadata DB by following the next step
- Import the above modified source mysqldump file into the new target metadata DB
mysql -h <host_name> -u <user_name> -p <db_name> < /dir/source_output_file.sql
- Start the new cluster.
- Coordinator automatically starts reading the old segments' metadata in the new DB, and then historical nodes load them from the NEW Deep Storage. Note that the data in old Deep Storage is kept intact
- Old cluster will keep writing to the old metadata DB and old Deep Storage
- New cluster will write to the new DB and new Deep Storage
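Before importing, one way to sanity-check the sed rewrite is to confirm that no references to the old bucket remain in the modified dump (placeholder name as above):
# Should print 0 if every load spec now points at the new deep storage
grep -c '<old_bucket_name>' /dir/source_output_file.sql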
Scenario 6:
Apache Druid to Imply
All of the above scenarios can be considered for migrating from Apache Druid to Imply.
Guidelines
- Review Imply Release notes for changes in the new version
- Upgrade with cluster restarts. A rolling upgrade is possible, but not preferred
- Move to an Apache Druid version matching Imply's Druid version before the migration. This is required if a rolling upgrade is to be performed
- Use the same configuration parameters for the services as in the Apache Druid configuration, after reviewing the release notes for parameter changes
- If the `druid.segmentCache.locations` are changed, copy over the segment cache from the existing Apache Druid cluster before restarting services on the new Imply version (a copy sketch is shown after this list)
- While starting the cluster on the Imply version, the following order is preferable when the segment count is very high:
- Start all the Historicals and wait for the Lifecycle to be started
- Start all the Master services (Coordinator / Overlord) and wait for the Lifecycle to be started
- Start the Query services (Broker / Router) and wait for the Lifecycle to be started
- Start the MiddleManager services and wait for the Lifecycle to be started
- Resume Supervisors / tasks
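If the segment cache does need to be copied (see the `druid.segmentCache.locations` guideline above), a simple per-Historical sketch with placeholder paths:
# Run on each Historical before starting it on the new Imply version
rsync -a /old/segment-cache/ /new/segment-cache/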