This document explains how to configure the deletion of segments (data) from Deep storage using the coordinator kill option. The steps are below.
Add the following to coordinator/runtime.properties:
druid.coordinator.kill.on=true
druid.coordinator.kill.durationToRetain=P30D
druid.coordinator.kill.maxSegments=100
druid.coordinator.kill.period=P1D
- druid.coordinator.kill.on, must be set to true for the coordinator to trigger kill tasks.
- druid.coordinator.kill.durationToRetain, how long data is retained in Deep storage; unused segments inside this most recent window (here, 30 days) are not killed.
- druid.coordinator.kill.maxSegments, the maximum number of segments removed per kill task.
- druid.coordinator.kill.period, how often the kill task runs.
The kill task deletes segments only if they are marked as "unused" in the metadata DB for the data-source. So before configuring the coordinator to kill/delete segments from Deep storage, define a retention policy (drop rule) for the data-source, so that the segments are dropped from the historical nodes and marked as unused.
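As an illustration of such a retention policy, the sketch below builds a rule list that keeps the most recent 30 days loaded and drops everything older. `loadByPeriod` and `dropForever` are Druid retention rule types; the replica count, the tier name, and the helper name `build_retention_rules` are assumptions for illustration only. The resulting JSON can be entered in the retention dialog for the data-source in the Druid console.

```python
import json

def build_retention_rules(retain_period="P30D", replicas=2, tier="_default_tier"):
    """Return a rule list: load the most recent period, drop everything else.

    The replica count and tier name are illustrative defaults (assumptions).
    """
    return [
        # Keep segments from the last `retain_period` on the historicals.
        {"type": "loadByPeriod", "period": retain_period,
         "tieredReplicants": {tier: replicas}},
        # Anything not matched by the rule above is dropped and marked unused.
        {"type": "dropForever"},
    ]

rules = build_retention_rules()
print(json.dumps(rules, indent=2))
```

Rules are evaluated in order, so the drop rule has to come last.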
Once the coordinator properties are set in runtime.properties, restart all the master nodes in the cluster so that the new settings take effect.
With the properties shown earlier, a kill task is triggered once a day (P1D) and deletes up to 100 segments per task. By default, the task may pick up unused segments older than 30 days from any data-source.
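The 30-day window can be sketched as a simple date comparison. This is a minimal illustration, not the coordinator's actual code: it assumes a segment becomes a candidate when the end of its interval falls before "now minus durationToRetain", and the function name `is_kill_candidate` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_kill_candidate(interval_end, now, retain_days=30):
    """True if the segment interval ends before now minus the retention window.

    Assumption: eligibility is judged on the interval end, and the segment
    is already marked unused in the metadata DB.
    """
    return interval_end < now - timedelta(days=retain_days)

now = datetime(2019, 8, 8, 18, 14, 19, tzinfo=timezone.utc)
old_end = datetime(2019, 5, 13, tzinfo=timezone.utc)     # interval ending months ago
recent_end = datetime(2019, 8, 1, tzinfo=timezone.utc)   # interval ending last week

print(is_kill_candidate(old_end, now))     # True
print(is_kill_candidate(recent_end, now))  # False
```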
To run the kill task only for particular data-sources, follow these steps:
- Navigate to "Coordinator dynamic config" in the Druid console.
- Set "Kill all data sources" to "False".
- Add the data-source name in the "Kill data source whitelist" box.
- Save the config.
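The console steps above correspond to two fields in the coordinator dynamic config. The sketch below shows the resulting JSON, assuming the field names `killAllDataSources` and `killDataSourceWhitelist` used by Druid releases of this period (newer versions may differ). The helper name `restrict_kill_to` and the starting config are placeholders, and the code only builds the payload without contacting a cluster.

```python
import json

def restrict_kill_to(current_config, datasources):
    """Return a copy of the dynamic config that kills only the given data-sources."""
    updated = dict(current_config)           # keep every other setting unchanged
    updated["killAllDataSources"] = False    # console: Kill all data sources = False
    updated["killDataSourceWhitelist"] = list(datasources)
    return updated

# Placeholder for the config currently shown in the console (assumption).
current = {"killAllDataSources": True, "killDataSourceWhitelist": []}

payload = restrict_kill_to(current, ["parking-citations"])
print(json.dumps(payload, indent=2))
```

Starting from the current config rather than an empty one avoids resetting unrelated dynamic settings when the config is saved.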
Here is a sample of the kill task that gets triggered once a day:
{
  "type" : "kill",
  "id" : "kill_parking-citations_2019-05-12T00:00:00.000Z_2019-05-13T00:00:00.000Z_2019-08-08T18:14:19.405Z",
  "dataSource" : "parking-citations",
  "interval" : "2019-05-12T00:00:00.000Z/2019-05-13T00:00:00.000Z",
  "context" : { },
  "groupId" : "kill_parking-citations_2019-05-12T00:00:00.000Z_2019-05-13T00:00:00.000Z_2019-08-08T18:14:19.405Z",
  "resource" : {
    "availabilityGroup" : "kill_parking-citations_2019-05-12T00:00:00.000Z_2019-05-13T00:00:00.000Z_2019-08-08T18:14:19.405Z",
    "requiredCapacity" : 1
  }
}
The indexing (task) logs list the segments that are being deleted:
2019-08-08T18:14:28,109 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[kill_parking-citations_2019-05-12T00:00:00.000Z_2019-05-13T00:00:00.000Z_2019-08-08T18:14:19.405Z]: SegmentNukeAction{segments=[parking-citations_2019-05-12T00:00:00.000Z_2019-05-13T00:00:00.000Z_2019-07-16T07:27:33.368Z]}
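To collect the deleted segment IDs from such log lines, a small parser along these lines may help. It is a sketch that assumes the `SegmentNukeAction{segments=[...]}` layout shown in the sample, with IDs separated by commas when there are several; other Druid versions may log segments differently. The name `nuked_segments` is hypothetical.

```python
import re

# Matches the segment list inside SegmentNukeAction{segments=[...]}.
NUKE_RE = re.compile(r"SegmentNukeAction\{segments=\[(.*?)\]\}")

def nuked_segments(log_line):
    """Return the segment IDs named in a SegmentNukeAction log line, or []."""
    match = NUKE_RE.search(log_line)
    if not match:
        return []
    # Assumption: multiple IDs are comma-separated inside the brackets.
    return [s.strip() for s in match.group(1).split(",") if s.strip()]

line = (
    "2019-08-08T18:14:28,109 INFO [task-runner-0-priority-0] "
    "org.apache.druid.indexing.common.actions.RemoteTaskActionClient - "
    "Performing action for task[kill_parking-citations_2019-05-12T00:00:00.000Z"
    "_2019-05-13T00:00:00.000Z_2019-08-08T18:14:19.405Z]: "
    "SegmentNukeAction{segments=[parking-citations_2019-05-12T00:00:00.000Z"
    "_2019-05-13T00:00:00.000Z_2019-07-16T07:27:33.368Z]}"
)

for segment_id in nuked_segments(line):
    print(segment_id)
```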
Once the kill task has completed successfully, check the Deep storage and the metadata DB to confirm that the segments were deleted.