SYMPTOM : Tasks submitted to the Overlord via HTTP POST fail with the error "Channel disconnected"
2017-11-22 19:38:28,056 WARN druid-coordinator c.m.h.c.NettyHttpClient [HttpClient-Netty-Worker-61] [POST http://<overlord_host>:8090/druid/indexer/v1/task] Channel disconnected before response complete
2017-11-22 19:38:28,056 ERROR druid-coordinator i.d.s.c.h.DruidCoordinatorSegmentKiller [Coordinator-Exec--0] Failed to submit kill task for dataSource [event_1349]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Channel disconnected
ROOT CAUSE : One possible root cause is that the Overlord's Jetty thread pool is tied up handling requests from a large number of indexing tasks, and the Jetty accept queue is also full of pending connections, so requests coming from the coordinator are dropped. The coordinator logic for submitting kill tasks does not retry, which is why they fail immediately.
To confirm, try submitting tasks when the Overlord isn't processing other indexing tasks and see whether they are accepted.
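For a manual check, a task can be POSTed directly to the Overlord endpoint shown in the log above. The following is a minimal Python sketch, not the coordinator's own logic: the function names, the retry count, the delay, and the example interval are all hypothetical, and the host is left as the placeholder from the log. The retry loop is included only so that a manual test can tell a transient drop from a persistent refusal.

```python
import json
import time
import urllib.request

# Endpoint path and port are taken from the log line above; the host is a placeholder.
OVERLORD_TASK_URL = "http://<overlord_host>:8090/druid/indexer/v1/task"


def post_task(url, task):
    """POST a task spec as JSON and return the decoded JSON response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_with_retry(task, send=post_task, url=OVERLORD_TASK_URL,
                      attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Submit a task, retrying when the connection is dropped or refused.

    `send` and `sleep` are injectable so the loop can be exercised offline.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(url, task)
        except OSError as error:
            # URLError, HTTPError and connection resets are all OSError subclasses.
            last_error = error
            if attempt < attempts:
                # Linear backoff: wait longer after each failed attempt.
                sleep(delay_seconds * attempt)
    raise RuntimeError(f"task not accepted after {attempts} attempts") from last_error


# Example kill task for the dataSource in the log; the interval is illustrative only.
kill_task = {
    "type": "kill",
    "dataSource": "event_1349",
    "interval": "2017-01-01/2017-02-01",
}
```

Calling `submit_with_retry(kill_task)` with the real host filled in would attempt the submission. If it succeeds while the Overlord is idle but fails repeatedly under indexing load, that is consistent with the root cause described above.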
RESOLUTION : Try increasing the value of druid.server.http.numThreads on the Overlord nodes, which increases the size of the Jetty server thread pool that handles HTTP requests.
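As an illustration, the property could be set in the Overlord's runtime.properties file. The value below is a hypothetical example, not a recommendation; an appropriate number depends on how many indexing tasks report to the Overlord and on the host's resources. The Overlord would need a restart for the change to take effect.

```properties
# Overlord runtime.properties (example value only; tune for your cluster)
druid.server.http.numThreads=60
```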