URLSession downloadTask(with:) TimeOut Error NSURLErrorDomain Code=-1001, _kCFStreamErrorCodeKey=-2103

I have been battling this intermittent error for some time. It is generally random and has been difficult to reproduce until yesterday when I stumbled across a way to reproduce it each time. I can cause the code to throw this error:

Task <70E3909F-8C30-4F34-A8B0-4AF3B41DD81B>.<1> finished with error [-1001] Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=-2103, _NSURLErrorFailingURLSessionTaskErrorKey=BackgroundDownloadTask <70E3909F-8C30-4F34-A8B0-4AF3B41DD81B>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "BackgroundDownloadTask <70E3909F-8C30-4F34-A8B0-4AF3B41DD81B>.<1>",
    "LocalDownloadTask <70E3909F-8C30-4F34-A8B0-4AF3B41DD81B>.<1>"
), NSLocalizedDescription=The request timed out., _kCFStreamErrorDomainKey=4, NSErrorFailingURLStringKey=https://redacted*, NSErrorFailingURLKey=https://redacted*}
*"redacted" is the backend URL, and it is the correct and same path for each

immediately after restarting an actual device. I have been over the following threads with no results:

What is kCFStreamErrorCodeKey=-4 (kCFStreamErrorDomainKey=4)

Request timed out with _kCFStreamErrorCodeKey=60

How to better diagnose -1001 "The request timed out." URLSession errors

Random timed out error on app start

Because I was able to reproduce it, I have been able to get the following logs:

Last bit of information is that I had Network Instruments running, and when this error occurred, I found that the Connection ID was "No Connection" and it appears the request was never actually sent, though it waited the full time out for a backend response.

Any help would be appreciated. This data request is being used after sending a certain APNs to update necessary data in the background, and has been the source of many user complaints.

Answered by DTS Engineer in 819872022

So, I think the issue here is with your timeoutIntervalForResource. More specifically, breaking down the two timeouts:

  • "timeoutIntervalForRequest"-> The is the network level timeout. When a connection is attempted, the system tries for 15s and then "gives up" if it doesn't complete it gives up. In you particular case, 15s is "short", but not unreasonable.

  • "timeoutIntervalForResource"-> This is the "overall" timeout on the entire request, meaning how long should the system keep the request "around" before discarding the entire request. In concrete terms, "if you can't get me this data in the next <insert time> don't bother trying at all". In that context, 15s is WAY to short, as you're basically saying "if you can't get it now, don't bother at all". Note that the default here is 7 days, which is the kind of time window you should be thinking in terms of.

Covering a few edge cases:

Using the same server endpoint, I am successfully completing the fetch when the app starts up, albeit using data(for:delegate:).

  • The "data(for:delegate:)" delegate changes things, as I believe it will end up doing the transfer in process (not as a true background transfer).

  • Having your app in the foreground also skews results, as it means your "background" transfer is getting much higher priority than it normally would.

The issue does not just occur after restart. It will also randomly occur, but I can replicate it reliably after restart.

If you're launching immediately after restart, then you can easily be in a situation where our radios have not fully initialized and/or the backend support daemon hasn't fully launched. Those delays could easily add 5-15s of delay in starting your transfer, at which point we'll end the entire request based on timeoutIntervalForResource.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

To add a few more details here:

  1. I don't believe I mentioned this is being called in the background. The basic code I am using is:
let configuration = URLSessionConfiguration.background(withIdentifier: sessionIdentifier)
configuration.timeoutIntervalForRequest = 15 // The backend times out in 15 seconds
configuration.timeoutIntervalForResource = 15
configuration.isDiscretionary = false
let session = URLSession(
          configuration: configuration,
          delegate: self,
          delegateQueue: nil
        )
let task = session.downloadTask(with: request)
task.resume()

In the delegate, I am implementing both urlSession(_:downloadTask:didFinishDownloadingTo:) and urlSession(_:task:didCompleteWithError:)

Using the same server endpoint, I am successfully completing the fetch when the app starts up, albeit using data(for:delegate:).

The issue does not just occur after restart. It will also randomly occur, but I can replicate it reliably after restart.

So, I think the issue here is with your timeoutIntervalForResource. More specifically, breaking down the two timeouts:

  • "timeoutIntervalForRequest"-> The is the network level timeout. When a connection is attempted, the system tries for 15s and then "gives up" if it doesn't complete it gives up. In you particular case, 15s is "short", but not unreasonable.

  • "timeoutIntervalForResource"-> This is the "overall" timeout on the entire request, meaning how long should the system keep the request "around" before discarding the entire request. In concrete terms, "if you can't get me this data in the next <insert time> don't bother trying at all". In that context, 15s is WAY to short, as you're basically saying "if you can't get it now, don't bother at all". Note that the default here is 7 days, which is the kind of time window you should be thinking in terms of.

Covering a few edge cases:

Using the same server endpoint, I am successfully completing the fetch when the app starts up, albeit using data(for:delegate:).

  • The "data(for:delegate:)" delegate changes things, as I believe it will end up doing the transfer in process (not as a true background transfer).

  • Having your app in the foreground also skews results, as it means your "background" transfer is getting much higher priority than it normally would.

The issue does not just occur after restart. It will also randomly occur, but I can replicate it reliably after restart.

If you're launching immediately after restart, then you can easily be in a situation where our radios have not fully initialized and/or the backend support daemon hasn't fully launched. Those delays could easily add 5-15s of delay in starting your transfer, at which point we'll end the entire request based on timeoutIntervalForResource.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hey Kevin,

Thanks for the response. Originally, the timeoutIntervalForResource was set to 60 seconds, and the timeoutIntervalForRequest was default. We were still getting these errors in the wild. The only reason I implemented these times is because I implemented a retry once this specific error came in, and the server has a 15 second timeout. I am not opposed to increasing those times at all, but I am not sure this solves the root issue.

But yes, this data is pretty ephemeral, so "if you can't get me this data in the next <insert time> don't bother trying at all" is exactly what we were going for. We also have an app refresh we call that uses this same code that is called regularly throughout the day. That will also exhibit this error on occasion.

"If you're launching immediately after restart, then you can easily be in a situation where our radios have not fully initialized and/or the backend support daemon hasn't fully launched. Those delays could easily add 5-15s of delay in starting your transfer, at which point we'll end the entire request based on timeoutIntervalForResource." That is a worry of ours. These were the steps where we could reproduce the error, but they are not the conditions that we would expect with the errors in the wild. And, because this is in the background, getting a Sysdiagnose Log is a non starter, because the user won't know it failed to initiate one.

With regard to the foreground fetches, they are not being done with this code. They are performed with an asynchronous data fetch. The only reason I included that was simply to show that the endpoint was up and radios working before the background fetch was made, to try to eliminate that as a possible issue.

The "backend support daemon hasn't fully launched" is a concern, and it may be the cause of this relocation of the problem, but not for what we see in the wild. The issue occurs when the app has been in the background and/or has been stopped by the system. But only occasionally. The issue we have is being able to get logs and traces on it as a result.

Could there be an issue we are triggering with the backend support daemon that we are inadvertently triggering? If so, how best to avoid it

This is a question that I really can't seem to find an answer on. This part of the system is triggered by a silent APNs. It is my understanding that we get 30 seconds total processing, give or take, ymmmv, to complete the processing. It is also my understanding that this time is suspended once the fetch request(s) are sent, until the delegate is called. Is this correct?

Also, am I correct in my guess that the Network Trace showing "No connection" literally means a connection wasn't attempted by the OS? We are showing no connection errors from the server, but it also has to get through Cloudflare.

Any assistance is greatly appreciated. This is a very difficult issue to debug, and we have been dealing with it for at least 9 months that we know of. I would really like to put this to bed.

URLSession downloadTask(with:) TimeOut Error NSURLErrorDomain Code=-1001, _kCFStreamErrorCodeKey=-2103
 
 
Q