By default, Druid keeps all historical versions of segments in deep storage (HDFS/S3). When multiple segment versions exist for the same data, you can manually delete the old versions and keep the latest segments by running a Kill Task on the interval to be cleaned:
http://druid.io/docs/latest/ingestion/tasks.html#kill-task
The Kill Task removes any unused segments in the specified interval, that is, segments with the disabled flag (used==0) in the Druid segment table, from both the metadata storage and deep storage.
The Kill Task deletes only disabled segments. The most recent versions of segments have used==1 (not disabled) in the segment table and are kept intact.
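As a rough illustration, a Kill Task can be built and submitted with a short script like the one below. This is a minimal sketch, not an official client: the datasource name "wikipedia", the interval, and the host "overlord-host:8090" are placeholders, and it assumes the task is posted to the Overlord's task endpoint (/druid/indexer/v1/task). Check the linked task documentation for the exact spec supported by your Druid version.

```python
import json
import urllib.request


def build_kill_task(data_source: str, interval: str) -> dict:
    """Return a kill task spec for a datasource and an ISO-8601 interval."""
    return {"type": "kill", "dataSource": data_source, "interval": interval}


def submit_task(overlord_url: str, task: dict) -> str:
    """POST the task spec to the Overlord and return the raw response body."""
    request = urllib.request.Request(
        overlord_url.rstrip("/") + "/druid/indexer/v1/task",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder datasource and interval; replace with your own values.
    task = build_kill_task("wikipedia", "2016-01-01/2016-02-01")
    print(json.dumps(task, indent=2))
    # Deletion is permanent, so the submit call is left commented out.
    # Placeholder Overlord address; adjust host and port for your cluster.
    # print(submit_task("http://overlord-host:8090", task))
```

Printing the spec before submitting makes it easy to double-check the datasource and interval, since segments removed from deep storage cannot be recovered.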
Druid also has an autoKill feature, which is enabled when the property druid.coordinator.kill.on is turned on (http://druid.io/docs/latest/configuration/coordinator.html). However, use this feature with great caution, since it may lead to permanent data loss, for example if a user accidentally disables a datasource and autoKill then runs.
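One way to reduce the risk of deleting recently disabled data is to restrict a manual kill to segments older than a retention window, which leaves time to re-enable a datasource that was disabled by mistake. The helper below is a hypothetical sketch of that idea (the function name, the 30-day window, and the epoch start are all illustrative assumptions, not Druid defaults); it only computes the interval string to pass to a Kill Task.

```python
from datetime import datetime, timedelta, timezone


def kill_interval(retain_days: int, now: datetime,
                  start: str = "1970-01-01T00:00:00Z") -> str:
    """Build an ISO-8601 interval that ends retain_days before now.

    Segments newer than the retention window fall outside the interval,
    so a kill task using it would leave them untouched.
    """
    if retain_days < 0:
        raise ValueError("retain_days must be >= 0")
    end = now - timedelta(days=retain_days)
    return f"{start}/{end.strftime('%Y-%m-%dT%H:%M:%SZ')}"


if __name__ == "__main__":
    # Illustrative 30-day retention window measured from the current UTC time.
    print(kill_interval(30, datetime.now(timezone.utc)))
```

The same trade-off applies to autoKill: the longer disabled segments are retained before deletion, the more time there is to notice and undo an accidental disable.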