How to set the auto compaction config in the Druid console

Hemanth

Updated March 10, 2021 16:33

The auto compaction config has a few options that determine how segments are compacted and published. To access them, click the edit pencil next to the datasource in the Datasources tab of the Druid console.


1. Input segment size bytes
This setting determines the total number of bytes of segments that compaction takes as input for an interval. For example, if the segment granularity is hourly and the segments in an hourly interval total 20 MB, we can set "Input segment size bytes" to 20 MB.
Typically, set this property a little higher than the total number of bytes in the interval.
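
As a rough illustration of "a little higher than the total", the sketch below sums the segment sizes of one interval and adds some headroom. The function name, the 20% headroom, and the sample segment sizes are assumptions made for this example, not Druid defaults.

```python
MB = 1024 * 1024

def suggest_input_segment_size_bytes(segment_sizes_bytes, headroom_percent=20):
    """Return a byte limit slightly above the total size of one interval's segments.

    headroom_percent is a hypothetical margin chosen for illustration.
    """
    total = sum(segment_sizes_bytes)
    # Integer arithmetic keeps the result an exact byte count.
    return total + total * headroom_percent // 100

# Three hypothetical segments in one hourly interval, 20 MB in total.
hourly_segments = [8 * MB, 7 * MB, 5 * MB]
limit = suggest_input_segment_size_bytes(hourly_segments)
print(limit)  # value to enter in "Input segment size bytes"
```

With these sample sizes, the suggested limit works out to 24 MB for a 20 MB interval.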

2. Skip offset from the latest
This setting excludes the most recent data from compaction for the given period, written as an ISO 8601 duration. The default value is P1D, so compaction tasks are triggered only for intervals older than 24 hours. If the ingestion data source has backlog segments, this property helps to avoid compacting those segments while data is still arriving.
Typically, set this property based on the number of hours of backlog expected from the source data into Druid.
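
The effect of the offset can be sketched as a simple time comparison. This is only an illustration: the function name is hypothetical, and the reference time is passed in explicitly so that the sketch makes no claim about which timestamp Druid measures the offset from. The P1D default is represented as a one-day timedelta.

```python
from datetime import datetime, timedelta

def is_eligible_for_compaction(interval_end, reference_time,
                               skip_offset=timedelta(days=1)):
    """Return True if an interval ends before the skipped window begins.

    skip_offset mirrors "Skip offset from the latest" (P1D by default).
    reference_time is an assumption of this sketch, supplied by the caller.
    """
    return interval_end <= reference_time - skip_offset

reference = datetime(2021, 3, 10, 16, 0)
old_interval_end = datetime(2021, 3, 9, 15, 0)      # more than 24 hours old
recent_interval_end = datetime(2021, 3, 10, 10, 0)  # inside the skipped window

print(is_eligible_for_compaction(old_interval_end, reference))
print(is_eligible_for_compaction(recent_interval_end, reference))
```

If a source is expected to deliver data up to six hours late, passing timedelta(hours=6) corresponds to setting the property to PT6H.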

3. Max rows per segment
This setting determines the maximum number of rows in each compacted segment. We recommend setting it to 5 million rows, which is the segment size for optimal performance.

4. Task priority

The task priority decides which task gets a lock when two tasks need the same interval. If a lower-priority task requests a lock after a higher-priority task, it waits for the higher-priority task to release the lock. If a higher-priority task requests a lock after a lower-priority task, it preempts the lower-priority task.
Typically, the realtime index task has a higher priority (75) than the compaction task (25).
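
The locking rule described here can be written as a small decision function. This is a toy model and not Druid's lock implementation: the function and constant names are hypothetical, and making equal-priority requests wait is an assumption, since that case is not covered above.

```python
REALTIME_INDEX_PRIORITY = 75  # typical value mentioned above
COMPACTION_PRIORITY = 25      # typical value mentioned above

def resolve_lock_request(holder_priority, requester_priority):
    """Decide what happens when a task requests a lock that another task holds."""
    if requester_priority > holder_priority:
        return "preempt"  # the higher-priority requester takes over the lock
    # Lower priority waits; equal priority also waits (assumption of this sketch).
    return "wait"

# A compaction task asks for a lock held by a realtime index task.
print(resolve_lock_request(REALTIME_INDEX_PRIORITY, COMPACTION_PRIORITY))
# A realtime index task asks for a lock held by a compaction task.
print(resolve_lock_request(COMPACTION_PRIORITY, REALTIME_INDEX_PRIORITY))
```

With the typical priorities, compaction never interrupts realtime ingestion, while realtime ingestion can interrupt compaction.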

5. Tuning config

In this section we can set tuning parameters such as maxRowsInMemory, maxBytesInMemory, and maxNumConcurrentSubTasks. An example tuning config:

{
  "maxRowsInMemory": 500000,
  "maxBytesInMemory": null,
  "maxTotalRows": null,
  "splitHintSpec": null,
  "indexSpec": null,
  "maxPendingPersists": null,
  "pushTimeout": null,
  "maxNumConcurrentSubTasks": 1,
  "type": "index_parallel"
}
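
To show how the five settings fit together, the sketch below assembles them into one JSON document. The top-level field names (dataSource, inputSegmentSizeBytes, skipOffsetFromLatest, maxRowsPerSegment, taskPriority, tuningConfig) are assumed to mirror the console labels and should be checked against your Druid version. The datasource name and the byte limit are made up for illustration.

```python
import json

# Hypothetical auto compaction config combining the settings described above.
compaction_config = {
    "dataSource": "example_datasource",  # hypothetical datasource name
    "inputSegmentSizeBytes": 25165824,   # 24 MB, a little above a 20 MB interval
    "skipOffsetFromLatest": "P1D",       # default mentioned above
    "maxRowsPerSegment": 5000000,        # recommended 5 million rows
    "taskPriority": 25,                  # typical compaction priority
    "tuningConfig": {
        "maxRowsInMemory": 500000,
        "maxNumConcurrentSubTasks": 1,
        "type": "index_parallel",
    },
}

payload = json.dumps(compaction_config, indent=2)
print(payload)
```

Tuning parameters left out here are simply not set, which is assumed to be equivalent to the null entries in the example above.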