How to better diagnose -1001 "The request timed out." URLSession errors

I'm investigating some reported connectivity issues from users of our iOS app. The app is failing to load/refresh data for multiple minutes at a time, despite other apps, Safari, etc. working fine. We are investigating this at various layers in the pipeline (i.e. server-side, network), but I'm taking a look at what we know or can find out on the client.


I'm sure a lot of these kinds of issues get posted which end up being caused by transient network issues but I have a few specific questions about how I can find out more, and some questions around the behaviour of URLSession, so hopefully there are some people qualified to answer (BatEskimo-signal activated).


packet trace or gtfo 🙂


Unfortunately we've been unable to get a network/packet trace as this requires reproducing the issue locally which we haven't been able to do.


Device logs show what look like typical timeout errors, occurring 60 seconds after initiating the requests:


Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={NSUnderlyingError=0x280764ae0 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, NSErrorFailingURLStringKey=https://REDACTED, NSErrorFailingURLKey=https://REDACTED, _kCFStreamErrorDomainKey=4, _kCFStreamErrorCodeKey=-2102, NSLocalizedDescription=The request timed out.})


The app is trying to send multiple requests over the minutes things are not working, and they are all failing with the same timeout error after 60 seconds. We've had users give up after 5 minutes because nothing is working. This is despite them having a good cellular or wifi connection and with other apps and Safari working. The users who have reported this so far are on iOS 12.


We are using a single URLSession for all our requests, created when the app starts. It's pretty vanilla: the config is a

URLSessionConfiguration.default
but with a custom User-Agent to override the default one, added via
httpAdditionalHeaders
. All our requests hit the same https hostname, and they are all POSTs.


Now the interesting part is that we have a separate health check request we send occasionally which sends a POST to exactly the same end point as normal requests, and we are seeing this succeed during the periods when regular requests are timing out. One difference with the ping check is that we use a different URLSession instance on the client. This URLSession is also created on startup, and uses the same configuration. The only difference is a delegate that we use to do some cert pinning and report any certificate mismatch from what we expect.


We do have DNS load balancing on our end point, so different connections can end up hitting a different IP.


So there are a few initial thoughts and questions I have:


  • The failing requests could be going to a different IP than the successful health check ones, and a specific server could be bad in some way. Is there a way to log the resolved IP address that a particular URLSession task used, at the point of receiving the error? Googling and looking in the docs doesn't show an obvious way to get this information. I imagine since URLSession can maintain a pool of connections to the same host, and there can be redirects during a request, that this is difficult to expose "nicely" via the API. We can obviously do this with local profiling but we would like to add telemetry to gather this data in the wild if possible.
  • Is it possible the "bad" URLSession is reusing a stale/dead persistent (keep-alive) connection, and everything on that socket is just timing out? What is the behaviour of connection reuse in these situations and under what circumstances will URLSession open a new connection? How long will it reuse a connection for? Will it continue reusing a connection even when requests are failing with timeout errors, even for multiple minutes?
  • Is there a way to log exactly where in the lifetime of the request the URLSession task got to before it timed out? i.e. did it even resolve DNS? Did it connect at all? Did it finish the TLS handshake? Did it send headers? Did it receive anything at all? There is the NSURLSessionTaskMetrics API but it doesn't look like there's an easy way to correlate an event from
    urlSession(_ session: URLSession, task: URLSessionTask, didFinishCollecting metrics: URLSessionTaskMetrics)
    to a particular data task / request, so we'd have to log everything (maybe checking if response is null to detect an incomplete load) and correlate later.
  • Some docs (e.g. "Technical Q&A QA1941" which I won't link because this post will be put in a moderator queue) talk about some retry behaviour in URLSession for idempotent (e.g. GET) vs. non-idempotent (e.g. POST) requests, at least for "The network connection was lost" errors. Is there a similar or related behaviour for timeouts, or when a connection looks dead? If this is some transient network issue, would GET requests behave better in such situations when stuff is timing out? There are reasons we are using POST but it would be interesting to know more about how differently idempotent requests are treated


Thanks in advance

Replies

In situations like this, where users are reporting issues from the field but you can’t reproduce them, the weapon of choice is a sysdiagnose log. This includes a bunch of information from both CFNetwork and the underlying libnetcore (this is the internal name for the core implementation of the now-public Network framework).

You can learn more about sysdiagnose logs on our Bug Reporting > Profiles and Logs page.

Device logs show what look like typical timeout errors, occurring 60 seconds after initiating the requests:

FYI, error -2102 is an internal error

kCFStreamErrorRequestTimeout
, which isn’t any more informative.

Is there a way to log the resolved IP address that a particular URLSession task used, at the point of receiving the error?

No. IMO this would make a great addition to

NSURLSessionTaskTransactionMetrics
and if you agree I encourage you to file an enhancement request explaining your rationale.

Please post your bug number, just for the record.

I think you’ll be able to get this information out of a sysdiagnose log, but I’d have to look at the log in depth to be sure.

Is it possible the "bad"

URLSession
is reusing a stale/dead persistent (keep-alive) connection, and everything on that socket is just timing out? What is the behaviour of connection reuse in these situations and under what circumstances will
URLSession
open a new connection?

Is this HTTP/1.1 or HTTP 2?

For HTTP 1.1 a timeout error on a specific request will close the connection used by that request because the nature of HTTP 1.1 means that it’s impossible to continue using that connection.

Is there a way to log exactly where in the lifetime of the request the

URLSession
task got to before it timed out?

I would’ve thought the task and transaction metrics would help here.

There is the

NSURLSessionTaskMetrics
API but it doesn't look like there's an easy way to correlate an event from
urlSession(_:task:didFinishCollecting:)
to a particular data task / request, so we'd have to log everything (maybe checking if response is null to detect an incomplete load) and correlate later.

I don’t understand this problem. That delegate callback gets the task via the

task
parameter. What more do you need for correlation purposes?

Is there a similar or related behaviour for timeouts, or when a connection looks dead?

I’m not sure. I will say that whatever retry logic available won’t kick in for POST requests. However, I don’t think that switching to GET makes sense. Something is blocking your entire session, so that completely new requests are also failing. In that situation it’s unlikely that an automatic retry would fair any better than the other requests you issue to your session.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"
No. IMO this would make a great addition to
NSURLSessionTaskTransactionMetrics
and if you agree I encourage you to file an enhancement request explaining your rationale.

Will do

Is this HTTP/1.1 or HTTP 2?

HTTP 2

I don’t understand this problem. That delegate callback gets the task via the
task
parameter. What more do you need for correlation purposes?

Sorry, disregard my previous comments, it looks like it has everything we need.

HTTP 2

OK, that opens up a bunch of possibilities, in that the HTTP 2 connection might be open but there’s something broken with its internal state machine (either on the client or the server). A CFNetwork diagnostic log is definitely the right next step here.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thanks.


45304558 for enhancements to NSURLSessionTaskMetrics / NSURLSessionTaskTransactionMetrics

A couple more followup questions:


To get CFNetwork diagnostics information from devices in the wild, am I correct in thinking we need to first setenv("CFNETWORK_DIAGNOSTICS", "3", 1); in main() as per https://developer.apple.com/library/archive/qa/qa1887/_index.html (and probably put this behind some kind of remote flag or Settings.app setting to avoid doing it for all our users), then have the appropriate users install the profile etc. from https://developer.apple.com/bug-reporting/profiles-and-logs/?name=sysdiagnose&platform=ios and capture a sysdiagnose and send it to us?


Without setting the env var will there be anything useful in a sysdiagnose, or is the CFNETWORK_DIAGNOSTICS where all/most of the interesting stuff comes from?


Also, is there a way to capture the CFNetwork diagnostic os_log output from within our app/process, so we could include it directly into our application logs which we can upload automatically? Or can it only be accessed via Analytics->... / Console.app?

Also, is there a way to capture the CFNetwork diagnostic os_log output from within our app/process …

No. There are numerous difficulties in making that work [1] but I realise that it’d be a super-useful feature so don’t let that stop you from filing an enhancement request describing your requirements

Please post your bug number, just for the record.

To get CFNetwork diagnostics information from devices in the wild, am I correct in thinking we need to first

setenv("CFNETWORK_DIAGNOSTICS", "3", 1);
in
main()

Correct.

and probably put this behind some kind of remote flag or Settings.app setting to avoid doing it for all our users

If you put this in a production build, make sure that the UI informs the user of the privacy risks involved in enabling it.

then have the appropriate users install the profile

That step is not necessary.

capture a sysdiagnose and send it to us?

Yes.

Without setting the env var will there be anything useful in a sysdiagnose … ?

Yes, although it’s hard to say whether that will help you debug this question. This environment variable enables CFNetwork diagnostics, which lets you see what’s going on internal to CFNetwork. If you don’t set it then you still get a bunch of logging. On the networking front that includes extensive DNS logging from

mDNSResponder
and a bunch of lower-level network logging from
libnetcore
(the private library used by CFNetwork for its on-the-wire activities; this is the same library that backs the new Network framework). That might help you understand this problem, or it might not. It kinda depends on what the actual problem is, and we don’t know that yet.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

[1] One major problem is that we can’t give you all log messages on the system because of privacy concerns, but a lot of the value in OSLog is that it lets you track activity as it crosses processes. For an example close to your heart, much of the internals of CFNetwork are implemented in separate daemons (DNS is done via

mDNSResponder
, authentication UI via
CFNetworkAgent
, cookies and caches via
nsurlstoraged
, background sessions via
nsurlsessiond
, and so on).

I have similar strange periodic timeouts in my app as Userbla

But what is different:

- I use NSURLSession's shared instance which means no custom delegate code is involved.

- HTTP/1.1

- Timeout happens for a specific domain/ip address

- App restart fixes the issue

- I am able to reproduce the issue locally. Although it is hard and may take up to 1-2 hours sometimes

- Captured packet trace and network logs and opened a bug 48359240

- Requests are sent OK using the same code but if I create a new instance of NSURLSession

- The issue happens when the app was in the background for more than 10-15 minutes(?)
- Can't reproduce the issue in a separate sample project, so not sure what exactly causes the issue

Hi sergeyne,

We've faced with a similar issue, had your bug is closed with some resolution?


Use case:

  • Make a switch between wifi and 3G/4G;
  • Moving to the background and foreground;

At some point, the user sees time out error for all requests. Only restart of the app helps.


eskimo,

Using "setenv("CFNETWORK_DIAGNOSTICS", "3", 1);"

I've gathered the following.

default 19:54:56.119246 +0300 **** CFNetwork Diagnostics [3:37959] 18:54:56.116 {
 Protocol Enqueue: request GET https://***** HTTP/1.1
          Request:  {url = https://****, cs = 0x0}
          Message: GET https://**** HTTP/1.1
           Accept: application/json
     Content-Type: application/json
      X-Client-Id: 2.3.41 (18460)||Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148
  header: ***
  Accept-Language: en-us
header: ***
  Accept-Encoding: br, gzip, deflate
     header: ***
} [3:37959]
default 19:54:56.274545 +0300 bluetoothd Matched UUID *** for device ""
default 19:54:56.392140 +0300 *** TIC TCP Conn Cancel [518:0x282f57780]
default 19:54:56.392264 +0300 *** [C518 ***:443 tcp, url hash: e37a5a66, tls] cancel
default 19:54:56.392425 +0300 *** [C518 ***:443 tcp, url hash: e37a5a66, tls] cancelled

And then.

default 19:55:06.535110 +0300 *** CFNetwork Diagnostics [3:37960] 18:55:06.533 {
        Did Timeout: (null)
             Loader:  {url = https://***, cs = 0x0}
   Timeout Interval: 60.000 seconds
init to origin load: 0.000351071s
         total time: 60.1877s
        total bytes: 0
} [3:37960]

Do you have any clues about what we should look for do understand the reason for such behavior?
Thanks in advance.

Ok, I lost a lot of time investigeting similar issue.


In my case the problem was strange (bad?) firewall on the server. It was banning device when there was many (not so many) requests in short period of time.


I believe you can do easy test if you are facing similar issue.

1. Send a lot of (depends of firewall settings) requests in loop (let's say 50 in 1 second).

2. Close/Kill app (this will close connections to server)

3. (OPTIONAL) Wait a while (lets say 60 seconds)

4. Start app again and try send request


If you now got timeout for all next requests, you probably have same issue and you should talk with sever guys.


PS: if you don't have access to server you can give user info that he should restart wifi on device to quit that timeout loop. It could be last resort in some cases.

I see this issue in 14.4 devices.

Hello,

Did you manage to solve this issue?

I am getting the following error only from Apple Review, never while testing on real devices myself (iOS 14.6)

Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=60, NSUnderlyingError=0x282a3b600 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=60, _kCFStreamErrorDomainKey=1}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask

When tried to access a URL link in iPhone 13( iOS 15.0 ) browser, getting “Error loading page, Domain NSURLErrorDomain, Error Code 1001, Description: The request timed out”. We are loading the same URL in webview in our iOS app. It is not working in the app also. Giving the same error

Using high speed Wifi and Cellular data. No connectivity issues. Verified speed of connectivity in this phone. Not getting this issue in any other iPhones/ iPads & Android mobiles. There the URL is loading fine within seconds

Same URL is accessible in Laptop and desktop computers in same network connectivity

Will there be any settings to edit or device specific issues that cause this?

Posting in case this helps anyone else. We were getting a lot of these timeouts after loading a bunch of image URLs. Our app would load ~100 images and then all future image requests would time out.

After lots of digging the fix was server side: we disabled the HTTP/3 (with QUIC) setting on our Cloudflare instance.

My issue causing -1001 timeout error was URLSessionConfiguration custom configuration. Once I set a default URL session configuration -1001 timeout errors disappeared.

**Notice, it did not cause the issue on a real device, only on iOS simulator. **