SYMPTOM: Users can call a coordinator API to find the number of segments the coordinator still has left to load. Even after this call returns `0` (no more segments to load), a query will occasionally return results based on an incomplete set of segments. Users may still need to wait anywhere from a few seconds to a minute before queries return results from all segments.
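The coordinator check can be automated with a polling loop. The sketch below is a minimal Python example, not an official client. It assumes a coordinator at a hypothetical `http://localhost:8081` and the `GET /druid/coordinator/v1/loadstatus?simple` endpoint, which is assumed to return a JSON object mapping each datasource to the number of segments it has left to load. The helper names are hypothetical. A `True` result only means the coordinator's count reached `0`; as this article describes, queries can still miss segments for a short while afterwards.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Hypothetical coordinator address; replace with your own.
COORDINATOR_URL = "http://localhost:8081"


def fetch_segments_left(base_url: str = COORDINATOR_URL) -> Dict[str, int]:
    """Return {datasource: segments left to load} as reported by the coordinator."""
    url = f"{base_url}/druid/coordinator/v1/loadstatus?simple"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_coordinator(
    fetch: Callable[[], Dict[str, int]] = fetch_segments_left,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> bool:
    """Poll until every datasource reports 0 segments left, or give up at the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if all(count == 0 for count in fetch().values()):
            return True
        time.sleep(poll_seconds)
    return False
```

The fetch function is passed in as a parameter so the polling logic can be exercised without a running cluster.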
ROOT CAUSE: There are two factors that can contribute to this delay:
- After taskPeriod + windowPeriod, the indexing task on the MiddleManager (MM) pushes its segments to deep storage, and the segment metadata is published to the metadata DB. The coordinator then picks up the new segments from the metadata DB and has the historical nodes load them from deep storage. The metadata is updated first, but copying the actual segments takes time. As a result, the coordinator API call can report that there are no more segments to load even though the actual segments are not yet in place.
- The broker has not yet noticed that the segments were loaded. By design, segment loading in Druid is asynchronous, and each broker independently decides when newly loaded segments are ready to query.
WORK AROUND: Add a short sleep before executing the query. Alternatively, use a timeBoundary query to check the maximum ingested timestamp before issuing the query, to confirm that the data is loaded. In either case, send your queries to the same broker consistently, since different brokers may pick up new data at different times.
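The two workarounds can be combined into one sleep-and-retry loop. This is a minimal Python sketch under stated assumptions, not an official client. `BROKER_URL`, the helper names, the datasource name, and the expected timestamp are hypothetical, and it assumes you already know the latest event time you ingested. It posts a native timeBoundary query with `"bound": "maxTime"` to the broker's `/druid/v2/` endpoint, sleeps between attempts, and returns `True` only once that broker reports a `maxTime` at or after the expected timestamp.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical broker address. Send the follow-up query to this same broker.
BROKER_URL = "http://localhost:8082"


def parse_druid_time(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as "2024-01-01T00:00:00.000Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def fetch_max_time(datasource: str, base_url: str = BROKER_URL) -> Optional[datetime]:
    """Ask one broker for the latest timestamp it can see, via a timeBoundary query."""
    query = {"queryType": "timeBoundary", "dataSource": datasource, "bound": "maxTime"}
    req = urllib.request.Request(
        f"{base_url}/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        rows = json.load(resp)
    if not rows:
        return None  # this broker does not see any data for the datasource yet
    return parse_druid_time(rows[0]["result"]["maxTime"])


def wait_for_data(
    expected_max_time: datetime,
    fetch: Callable[[], Optional[datetime]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
) -> bool:
    """Sleep and retry until the broker's maxTime reaches expected_max_time."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        max_time = fetch()
        if max_time is not None and max_time >= expected_max_time:
            return True
        time.sleep(poll_seconds)
    return False
```

For example, `wait_for_data(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), lambda: fetch_max_time("my_datasource"))` waits on a hypothetical datasource `my_datasource`. Passing `fetch` in as a callable keeps every check pinned to a single broker and makes the retry logic testable offline.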