QUESTION:
What are the streaming exceptions for supervisors, and how do they differ? Sometimes LOST_CONTACT_WITH_STREAM is shown and other times UNABLE_TO_CONNECT_TO_STREAM. Both appear in situations where the supervisor would otherwise show UNHEALTHY_SUPERVISOR. They differ from UNHEALTHY_TASKS, which means that one or more of the most recent tasks have failed (enough consecutive failures to reach the task unhealthiness threshold). That status changes once a round of tasks completes successfully.
ANSWER:
UNHEALTHY_SUPERVISOR:
Below is the code block responsible for determining the exception for the supervisor:
protected State getSpecificUnhealthySupervisorState()
{
  ExceptionEvent event = getRecentEventsQueue().getLast();
  if (event instanceof SeekableStreamExceptionEvent && ((SeekableStreamExceptionEvent) event).isStreamException()) {
    return isAtLeastOneSuccessfulRun()
           ? SeekableStreamState.LOST_CONTACT_WITH_STREAM
           : SeekableStreamState.UNABLE_TO_CONNECT_TO_STREAM;
  }
  return BasicState.UNHEALTHY_SUPERVISOR;
}
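This checks the most recent entry in the recent events queue (getLast()) to see whether it is a stream exception. If it is, and isAtLeastOneSuccessfulRun() returns true (at least one run has already completed without issue), the supervisor shows LOST_CONTACT_WITH_STREAM. If it is a stream exception and no run has succeeded yet, the supervisor shows UNABLE_TO_CONNECT_TO_STREAM. Otherwise the state stays UNHEALTHY_SUPERVISOR.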
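The decision described above can be sketched in isolation. This is a simplified illustration, not Druid's implementation: SupervisorStateSketch, its State enum, and its ExceptionEvent class are hypothetical stand-ins for the real supervisor classes, and unlike the original snippet it treats an empty queue as UNHEALTHY_SUPERVISOR instead of calling getLast().

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-ins for the supervisor state classes.
final class SupervisorStateSketch {
    enum State { UNHEALTHY_SUPERVISOR, LOST_CONTACT_WITH_STREAM, UNABLE_TO_CONNECT_TO_STREAM }

    // Minimal event: carries only the flag the decision depends on.
    static final class ExceptionEvent {
        final boolean streamException;

        ExceptionEvent(boolean streamException) {
            this.streamException = streamException;
        }
    }

    // Mirrors the branching in the snippet above: only the most recent event matters.
    static State specificUnhealthyState(Deque<ExceptionEvent> recentEvents, boolean atLeastOneSuccessfulRun) {
        ExceptionEvent last = recentEvents.peekLast(); // null when the queue is empty (assumption of this sketch)
        if (last != null && last.streamException) {
            return atLeastOneSuccessfulRun
                   ? State.LOST_CONTACT_WITH_STREAM
                   : State.UNABLE_TO_CONNECT_TO_STREAM;
        }
        return State.UNHEALTHY_SUPERVISOR;
    }

    public static void main(String[] args) {
        Deque<ExceptionEvent> events = new ArrayDeque<>();
        events.add(new ExceptionEvent(true));
        System.out.println(specificUnhealthyState(events, false));
        System.out.println(specificUnhealthyState(events, true));
    }
}
```

Because only the last event is inspected, an older stream exception followed by a newer non-stream error would be reported as plain UNHEALTHY_SUPERVISOR in this sketch.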
If you look at the recentErrors in the supervisor status, the last field of each entry shows whether the error is considered a stream exception (streamException: true).
"recentErrors": [
  {
    "timestamp": "2022-06-22T19:00:00.624Z",
    "exceptionClass": "com.amazonaws.services.kinesis.model.LimitExceededException",
    "message": "com.amazonaws.services.kinesis.model.LimitExceededException: Rate exceeded for stream live-ga-annotated-events under account 118928031713. (Service: AmazonKinesis; Status Code: 400; Error Code: LimitExceededException; Request ID: ca2d05f1-12dc-2ef0-9100-6d0148561ccd)",
    "streamException": true
  },
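For a quick triage of a saved status payload, the streamException flags can be counted with a small helper. This is a rough sketch under stated assumptions: RecentErrorsSketch and countByStreamFlag are hypothetical names, and the helper scans the text with a regular expression rather than parsing the JSON, so it only suits payloads shaped like the excerpt above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts streamException flags in a status payload string.
final class RecentErrorsSketch {
    private static final Pattern STREAM_FLAG =
        Pattern.compile("\"streamException\"\\s*:\\s*(true|false)");

    // Returns a two-element array: {entries flagged true, entries flagged false}.
    static int[] countByStreamFlag(String statusJson) {
        Matcher matcher = STREAM_FLAG.matcher(statusJson);
        int streamErrors = 0;
        int otherErrors = 0;
        while (matcher.find()) {
            if ("true".equals(matcher.group(1))) {
                streamErrors++;
            } else {
                otherErrors++;
            }
        }
        return new int[] {streamErrors, otherErrors};
    }

    public static void main(String[] args) {
        // Made-up payload for illustration only.
        String sample = "{\"recentErrors\": [{\"streamException\": true}, {\"streamException\": false}]}";
        int[] counts = countByStreamFlag(sample);
        System.out.println("stream=" + counts[0] + ", other=" + counts[1]);
    }
}
```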
UNHEALTHY_TASKS:
Below is the code block responsible for determining whether or not to display UNHEALTHY_TASKS.
if (consecutiveFailedTasks >= supervisorStateManagerConfig.getTaskUnhealthinessThreshold()) {
  hasHitTaskUnhealthinessThreshold = true;
  supervisorState = BasicState.UNHEALTHY_TASKS;
  return;
}
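The threshold check above depends on a counter of consecutive failed tasks. A minimal sketch of how such a counter could behave follows. TaskHealthSketch and its methods are hypothetical, the threshold of 3 is only an example value, and resetting the counter on a single success is an assumption of this sketch rather than a confirmed detail of the supervisor code.

```java
// Hypothetical sketch of a consecutive-failure counter with an unhealthiness threshold.
final class TaskHealthSketch {
    private final int taskUnhealthinessThreshold;
    private int consecutiveFailedTasks = 0;

    TaskHealthSketch(int taskUnhealthinessThreshold) {
        this.taskUnhealthinessThreshold = taskUnhealthinessThreshold;
    }

    // Assumption: a success resets the streak, a failure extends it.
    void recordCompletedTask(boolean succeeded) {
        consecutiveFailedTasks = succeeded ? 0 : consecutiveFailedTasks + 1;
    }

    // Same comparison as the snippet above.
    boolean isUnhealthyTasks() {
        return consecutiveFailedTasks >= taskUnhealthinessThreshold;
    }

    public static void main(String[] args) {
        TaskHealthSketch health = new TaskHealthSketch(3); // example threshold
        health.recordCompletedTask(false);
        health.recordCompletedTask(false);
        health.recordCompletedTask(false);
        System.out.println(health.isUnhealthyTasks());
    }
}
```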
REFERENCES:
The code for unhealthy tasks can be found below. This is a reference point to review and understand how health of supervisors/tasks is determined: