Issue Description
The message "Balancer move segments queue has a segment stuck" in the coordinator log may be caused by the coordinator looking in the wrong place for the segment or segments it wants to move.
Implications
When this happens, the coordinator cannot balance segments across the cluster properly. You may still see balancing happen, but at a very slow speed. The coordinator will report a list of errors for the affected historical node.
Symptoms
Check the coordinator log for errors similar to this:
Failed to connect to host[<host name>:8283]
connection timed out: <host name>/<ip>:8283
Debugging
From the host running the coordinator process, run the telnet command to test connectivity to the historical node, using the same host and port shown in the log. The connection should fail.
Now run the same command on the failing historical node itself; it should connect.
> telnet <hostname> <port>
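If telnet is not installed on a host, the same connectivity test can be approximated with a short script. The following is a minimal sketch using only the Python standard library; the function name `can_connect` and the 5-second default timeout are assumptions made for illustration and are not part of the coordinator or its tooling.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name the same way other
        # processes on this machine would (hosts file first, then DNS,
        # depending on the resolver configuration).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections, and timeouts.
        return False
```

Calling `can_connect("<hostname>", 8283)` on the coordinator host and again on the historical node should mirror the telnet results described above: `False` from the coordinator and `True` from the historical node.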
Resolution
Check the hosts file on the coordinator for a missing or incorrect entry for the failing historical node. Once the hosts file is fixed, the issue will resolve.
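The hosts file check can also be scripted. The sketch below parses hosts-file text and returns every IP address mapped to a given host name, so a missing entry (empty result) or an unexpected address is easy to spot. The function name `hosts_entries` is hypothetical, and the `/etc/hosts` path in the usage note assumes a Linux host.

```python
from typing import List


def hosts_entries(hosts_text: str, hostname: str) -> List[str]:
    """Return every IP address that the hosts-file text maps to `hostname`."""
    matches = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Format: <ip> <canonical name> [aliases...]
        ip, *names = line.split()
        if hostname.lower() in (name.lower() for name in names):
            matches.append(ip)
    return matches
```

For example, on the coordinator host, `hosts_entries(open("/etc/hosts").read(), "<hostname>")` should return the historical node's actual IP address. An empty list or a different address points to the missing or incorrect entry.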