This happens when the indexing thread is spending most of its time writing data to disk.
The indexer builds an in-memory index until it reaches maxRowsInMemory rows and then spills (persists) that index to disk to free up memory. If this happens too frequently due to misconfiguration, throughput is limited by excessive disk I/O. To reduce the time spent writing to disk, please consider the following changes:
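As a rough illustration of that persist-on-threshold behavior, the sketch below (plain Python, not Druid's actual code; the class and constant are hypothetical) shows how every persist is a synchronous disk write, so hitting the threshold too often keeps the indexing thread busy with I/O instead of ingesting rows:

```python
# Illustrative sketch of the threshold-based persist behavior described above.
# This is NOT Druid's implementation; names and values are hypothetical.
import json
import tempfile

MAX_ROWS_IN_MEMORY = 150_000  # analogous to tuningConfig.maxRowsInMemory


class InMemoryIndexSketch:
    def __init__(self, max_rows=MAX_ROWS_IN_MEMORY):
        self.max_rows = max_rows
        self.rows = []
        self.spill_count = 0

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            # Each persist is a synchronous disk write; the lower the
            # threshold, the more often ingestion stalls on disk I/O.
            self.persist()

    def persist(self):
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump(self.rows, f)
        self.rows.clear()
        self.spill_count += 1
```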
1. Increase taskCount (up to the number of partitions in Kafka/Kinesis)
By default, all Kafka partitions/Kinesis shards are handled by a single indexing task. Increasing taskCount causes the supervisor to create more indexing tasks to parallelize the ingestion work, reducing the amount of work any single task has to handle. taskCount should ideally be a divisor of the number of partitions and cannot be greater than that number.
2. Increase maxRowsInMemory
This lets the in-memory index grow larger before it is spilled to disk, which reduces the number of disk writes and therefore the time the indexing thread spends blocked waiting for persists to complete. Setting it too low results in too many writes, while setting it too high could cause the task to run out of memory.
3. Use a coarser segmentGranularity (e.g. change from HOUR to DAY)
This may also help reduce the number of persists, especially if the data in the stream is not time-ordered. A sketch showing where all three settings live in a supervisor spec follows this list.
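To make the placement of these settings concrete, here is a minimal sketch of a partial Kafka supervisor spec built as a Python dict. The datasource name, topic, broker address, and specific values are illustrative assumptions rather than recommendations, and required fields such as the timestamp and dimension specs are omitted for brevity.

```python
# Sketch of where the three settings above live in a Kafka supervisor spec.
# All names and values below are illustrative assumptions, not recommendations.
import json

supervisor_spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "example_datasource",   # hypothetical datasource name
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "DAY",      # 3. coarser than HOUR
            "queryGranularity": "NONE",
        },
        # timestampSpec / dimensionsSpec omitted for brevity
    },
    "tuningConfig": {
        "type": "kafka",
        "maxRowsInMemory": 150000,            # 2. larger in-memory index before each persist
    },
    "ioConfig": {
        "topic": "example_topic",             # hypothetical topic
        "taskCount": 3,                       # 1. ideally a divisor of the partition count
        "consumerProperties": {"bootstrap.servers": "kafka-broker:9092"},
    },
}

# In practice the completed spec would be submitted to the Overlord's supervisor
# endpoint (POST /druid/indexer/v1/supervisor); here we just print it.
print(json.dumps(supervisor_spec, indent=2))
```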
In general, the Kafka/Kinesis indexing service can generate a very large number of segments in Druid, which can significantly degrade query performance and lead to other memory issues. For documentation on why this happens, please refer to the docs for the Kafka indexing service.