Incident inquiry: spike in delivered news volume + exceptions from Refinitiv Data Delivery (StartPollingAsync) + application restart.
Problem description (what we observed)
Around 2026-02-09 23:00–23:30 CET (2026-02-09 22:00–22:30 UTC) we observed an unusually large spike in the number of delivered/published news messages, followed by two exceptions originating from the Refinitiv library / SQS integration. The second exception caused a full application restart.
The largest spike occurred around 2026-02-09 23:20 CET (2026-02-09 22:20 UTC): Count = 11,737 (aggregated per 10 minutes).
Timeline
1. First exception (handled by our logger) + background worker restart
This exception was logged by our application and caused a restart of the background worker only.
System.TimeoutException: A task was canceled.
---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.SendAsync(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
--- End of inner exception stack trace ---
at Refinitiv.Data.Delivery.Queue.AWSSubscriber.StartPollingAsync(...)
...
at StoneX.GMD.NewsFeed.Application.Inbound.Refinitiv.Feeds.SqsClient.Subscribe(...) line 69
at StoneX.GMD.NewsFeed.Application.Inbound.Refinitiv.Services.NewsStreamer.StartStreaming(...) line 31
at StoneX.GMD.NewsFeed.Application.AppServices.NewsStreamerAppService.ExecuteAsync(...) line 29
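For context, the worker-level recovery described above works roughly like the following supervision loop. This is a simplified sketch with hypothetical names, not our production code; `StartStreamingAsync` stands in for the call chain down to `AWSSubscriber.StartPollingAsync`, and the retry delay is illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Sketch of a self-restarting background worker (hypothetical names).
// A TimeoutException bubbling up from the polling call is caught here,
// logged, and the subscription is re-established after a short delay,
// so only the worker restarts, not the whole process.
public sealed class NewsStreamerWorker : BackgroundService
{
    private readonly ILogger<NewsStreamerWorker> _logger;

    public NewsStreamerWorker(ILogger<NewsStreamerWorker> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                // Placeholder for the Refinitiv subscription call chain.
                await StartStreamingAsync(stoppingToken);
            }
            catch (TimeoutException ex)
            {
                _logger.LogError(ex, "Polling failed; restarting background worker");
                await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
            }
        }
    }

    private static Task StartStreamingAsync(CancellationToken ct) => Task.CompletedTask;
}
```

This is the behavior we saw for the first exception: the worker restarted, the process did not.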
2. Second exception (not handled by our code – “Unhandled exception”) + full application restart
2026-02-09 @ 23:14:20.055 CET (2026-02-09 @ 22:14:20.055 UTC)
This exception was logged in the Refinitiv library format (not by our standard application logger) and resulted in a restart of the entire application:
Unhandled exception. System.InvalidOperationException: Data services unavailable. Session is closed
at Refinitiv.Data.Delivery.Request.EndpointDefinition.GetDataAsync(ISession session, Action`3 cb, CancellationToken cancellationToken)
at Refinitiv.Data.Delivery.Queue.QueueNode.RefreshCloudCredentialsAsync()
at Refinitiv.Data.Delivery.Queue.QueueNode.CloudRefreshTimerHandler(Object source, ElapsedEventArgs e)
at System.Threading.Tasks.Task.ThrowAsync(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
After the restart, the application recovered automatically and news publishing continued normally.
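The stack trace suggests why our code could not catch this one: `CloudRefreshTimerHandler` appears to be an async (or `async void`) `Timer.Elapsed` callback inside the Refinitiv library, and the `Task.ThrowAsync` frame indicates the exception was re-posted to a thread-pool thread, where no application `try/catch` can observe it; in .NET that terminates the process. A minimal sketch of the pattern (hypothetical, not the Refinitiv code):

```csharp
using System;

class TimerCrashDemo
{
    static void Main()
    {
        var timer = new System.Timers.Timer(100);
        // An async void event handler: an exception thrown after the first
        // await is re-posted to the thread pool (Task.ThrowAsync) and becomes
        // an unhandled exception that terminates the process.
        timer.Elapsed += async (sender, args) =>
        {
            await System.Threading.Tasks.Task.Yield();
            throw new InvalidOperationException("Data services unavailable. Session is closed");
        };
        timer.Start();
        Console.ReadLine(); // the process crashes before this returns
    }
}
```

If this reading is correct, the crash is inherent to where the library throws, not to how our code handles it.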
Questions to Refinitiv
1. What happened on the Refinitiv side on 2026-02-09 around 22:30–23:20 CET (2026-02-09 21:30–22:20 UTC), according to the timestamps above? Did you experience an incident, service degradation, or maintenance that could have resulted in:
- System.InvalidOperationException: Data services unavailable. Session is closed
- and/or timeouts in Refinitiv.Data.Delivery.Queue.AWSSubscriber.StartPollingAsync?
2. Why did we observe such a large spike in news volume (~11,737 messages within ~10 minutes around 23:20 CET / 22:20 UTC)? Could this indicate replay, redelivery, or backlog draining of previously queued messages (for example after a temporary interruption)?
If so, can you confirm whether there was any delivery retry, queue delay, or reconnect event during that time window?
3. Could this behavior be related to a temporary unavailability of an AWS SQS-based component on the Refinitiv side? Our logs reference AWSSubscriber.StartPollingAsync and the SQS integration path, so we would like to confirm whether there were any issues at that time related to:
- endpoint availability,
- authorization or credentials refresh,
- or any known anomaly on the SQS / delivery infrastructure.
4. Can you provide an RCA, incident reference, or status page entry for this time window, along with any recommended actions on our side? In particular, we would like to understand whether the Refinitiv library is expected to throw unhandled exceptions that terminate the hosting process, or whether additional defensive handling/wrapping is recommended on the client side.
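Pending the answer to question 4, the only client-side mitigation we are aware of for exceptions escaping library-owned timer threads is last-chance logging combined with supervised process restart. A sketch of the hook we are considering (the handler cannot keep a .NET process alive, but it ensures the failure is captured by our own logging before termination):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Last-chance hook: fires for unhandled exceptions on any thread,
        // including library timer callbacks. It does not prevent process
        // termination in .NET (Core), but it guarantees the exception is
        // recorded in our own log format before the runtime exits.
        AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
        {
            Console.Error.WriteLine($"FATAL unhandled exception: {args.ExceptionObject}");
        };

        // ... normal host startup; an external supervisor (e.g. systemd or
        // Kubernetes) restarts the process after termination, which matches
        // the automatic recovery we observed.
    }
}
```

We would appreciate confirmation of whether Refinitiv recommends this approach or provides a supported way to surface `QueueNode` refresh failures to the application instead.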