URLSession: The network connection was lost.

I'm seeing behaviour that I'm struggling to work out when making GraphQL-related network requests (these are POST requests).

I note that QA1941 covers the "lost connection" errors that I see in my responses, but I'd like to understand more, because just retrying the connection without understanding why I need to seems problematic.

Here are the (partially redacted) logs I see when the error occurs:

quic_conn_keepalive_handler [C69.1.1.1:2] [-0178f8467262b9e978791446c6629ddb66b2efc1] keep-alive timer fired, exceeding 2 outstanding keep-alives
nw_read_request_report [C69] Receive failed with error "Operation timed out"
nw_read_request_report [C69] Receive failed with error "Operation timed out"
nw_read_request_report [C69] Receive failed with error "Operation timed out"
nw_read_request_report [C69] Receive failed with error "Operation timed out"
nw_read_request_report [C69] Receive failed with error "Operation timed out"
0x16264d818 69 stalled, attempting fallback
Task <A6CC8D4D-83E1-4C61-96C7-BDDF2F04A35F>.<11> HTTP load failed, 1822/0 bytes (error code: -1005 [4:-4])
Task <3B0AAA67-3162-4F80-A930-D93F8A7EF1A4>.<12> HTTP load failed, 1823/0 bytes (error code: -1005 [4:-4])
nw_endpoint_flow_fillout_data_transfer_snapshot copy_info() returned NULL
Task <A6CC8D4D-83E1-4C61-96C7-BDDF2F04A35F>.<11> finished with error [-1005] Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=-4, NSUnderlyingError=0x60000139bd50 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x60000a217340 [0x1e006f658]>{length = 16, capacity = 16, bytes = 0x100201bb646394820000000000000000}, _kCFStreamErrorCodeKey=-4, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <A6CC8D4D-83E1-4C61-96C7-BDDF2F04A35F>.<11>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <A6CC8D4D-83E1-4C61-96C7-BDDF2F04A35F>.<11>"
), NSLocalizedDescription=The network connection was lost., NSErrorFailingURLStringKey=https://myserver.com/graphql, NSErrorFailingURLKey=https://myserver.com/graphql, _kCFStreamErrorDomainKey=4}
Task <3B0AAA67-3162-4F80-A930-D93F8A7EF1A4>.<12> finished with error [-1005] Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=-4, NSUnderlyingError=0x60000139afd0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x60000a217340 [0x1e006f658]>{length = 16, capacity = 16, bytes = 0x100201bb646394820000000000000000}, _kCFStreamErrorCodeKey=-4, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <3B0AAA67-3162-4F80-A930-D93F8A7EF1A4>.<12>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <3B0AAA67-3162-4F80-A930-D93F8A7EF1A4>.<12>"
), NSLocalizedDescription=The network connection was lost., NSErrorFailingURLStringKey=https://myserver.com/graphql, NSErrorFailingURLKey=https://myserver.com/graphql, _kCFStreamErrorDomainKey=4}
GraphQL request query failed for query with hash: 6677767707705440859 with error: Networking.NetworkGraphQLService.Error.underlying(Apollo.URLSessionClient.URLSessionClientError.networkError(data: 0 bytes, response: nil, underlying: Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=-4, NSUnderlyingError=0x60000139afd0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x60000a217340 [0x1e006f658]>{length = 16, capacity = 16, bytes = 0x100201bb646394820000000000000000}, _kCFStreamErrorCodeKey=-4, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <3B0AAA67-3162-4F80-A930-D93F8A7EF1A4>.<12>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <3B0AAA67-3162-4F80-A930-D93F8A7EF1A4>.<12>"
), NSLocalizedDescription=The network connection was lost., NSErrorFailingURLStringKey=https://myserver.com/graphql, NSErrorFailingURLKey=https://myserver.com/graphql, _kCFStreamErrorDomainKey=4}))
GraphQL request query failed for query with hash: 148576198322832328 with error: Networking.NetworkGraphQLService.Error.underlying(Apollo.URLSessionClient.URLSessionClientError.networkError(data: 0 bytes, response: nil, underlying: Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=-4, NSUnderlyingError=0x60000139bd50 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x60000a217340 [0x1e006f658]>{length = 16, capacity = 16, bytes = 0x100201bb646394820000000000000000}, _kCFStreamErrorCodeKey=-4, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <A6CC8D4D-83E1-4C61-96C7-BDDF2F04A35F>.<11>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <A6CC8D4D-83E1-4C61-96C7-BDDF2F04A35F>.<11>"
), NSLocalizedDescription=The network connection was lost., NSErrorFailingURLStringKey=https://myserver.com/graphql, NSErrorFailingURLKey=https://myserver.com/graphql, _kCFStreamErrorDomainKey=4}))
GraphQL request query starting for query with hash: -5824714640174886330
nw_connection_copy_connected_local_endpoint_block_invoke [C90] Connection has no local endpoint
nw_connection_copy_connected_local_endpoint_block_invoke [C90] Connection has no local endpoint
GraphQL request query succeeded for query with hash: -5824714640174886330
nw_read_request_report [C43] Receive failed with error "Operation timed out"
nw_read_request_report [C43] Receive failed with error "Operation timed out"
nw_read_request_report [C43] Receive failed with error "Operation timed out"
nw_connection_add_timestamp_locked_on_nw_queue [C43] Hit maximum timestamp count, will start dropping events
nw_endpoint_flow_fillout_data_transfer_snapshot copy_info() returned NULL

I'd really appreciate any advice or insight that people might have — I can't reproduce this problem consistently, so I want to know more.

Thanks!

Answered by DTS Engineer in 814845022

Unfortunately it’s hard to work out what’s going on here without (at least) a sysdiagnose and a packet trace. Based on the log snippet you posted it seems like your app is using HTTP/3, and hence QUIC. The QUIC connection has stopped receiving data and that’s why your URLSession tasks have failed. As to why the QUIC connection stopped receiving data, that’s hard to say. There’s a lot of stuff between your app and the server:

  • URLSession could be mishandling the QUIC connection. The log entries you posted suggest that’s not the case.

  • The QUIC implementation within Network framework could be mishandling the datagrams.

  • The TCP/IP networking stack could be doing the same.

  • Likewise for the Wi-Fi driver.

  • And then we get to off-device issues:

    • Like the Wi-Fi access point

    • Or any of the network hops between you and the original server

    • Or some middlebox

    • Or something wonky on the server

If you want to really understand what’s going on, you have to dig through each of these layers to identify the cause of the failure. And even once you do understand that, what can you do about it? If it’s on the iOS side, you can report it as a bug. If it’s on the server side, and you have influence with the folks who run the server, you can report it to them. But what if it’s between the two?

Hence the policy that I recommended in QA1941.

just retrying the connection without understanding why I need to seems problematic.

You do have to be careful when retrying. For example, if you retry indefinitely with no backoff, you’re effectively implementing a DoS attack against your server. However, a limited form of retry seems pretty reasonable to me.
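
To make "a limited form of retry" concrete, here is a minimal sketch of a bounded retry with exponential backoff. It is an illustration, not anything prescribed by QA1941: the names isTransientNetworkError, backoffDelay and withLimitedRetry, the choice of retryable error codes, the default of three retries, and the 0.5 s base and 8 s cap are all assumptions you would tune for your own app.

```swift
import Foundation

/// Hypothetical policy: treat only a lost connection (-1005) or a timeout
/// (-1001) as worth retrying. Adjust the list to suit your app.
func isTransientNetworkError(_ error: Error) -> Bool {
    guard let urlError = error as? URLError else { return false }
    switch urlError.code {
    case .networkConnectionLost, .timedOut:
        return true
    default:
        return false
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `cap`.
func backoffDelay(attempt: Int, base: TimeInterval = 0.5, cap: TimeInterval = 8) -> TimeInterval {
    min(cap, base * pow(2, Double(attempt)))
}

/// Runs `operation`, retrying transient failures at most `maxRetries` times.
/// Any other error, or a transient error after the last retry, is rethrown.
func withLimitedRetry<T>(
    maxRetries: Int = 3,
    operation: () async throws -> T
) async throws -> T {
    var attempt = 0
    while true {
        do {
            return try await operation()
        } catch {
            guard attempt < maxRetries, isTransientNetworkError(error) else { throw error }
            let delay = backoffDelay(attempt: attempt)
            try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
            attempt += 1
        }
    }
}

// Usage sketch (`request` is a hypothetical URLRequest built elsewhere):
//
//     let (data, response) = try await withLimitedRetry {
//         try await URLSession.shared.data(for: request)
//     }
```

The bound on maxRetries is what keeps this from turning into the unbounded retry loop described above. Note that a retried POST may reach the server twice, so this is only safe for requests your server can tolerate receiving more than once, such as GraphQL queries rather than mutations.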

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

is there any information on the behaviour when this happens?

No. It’s very much an implementation detail.

The last time I looked at this — and this was back when I wrote QA1941, so in 2017! — I saw a couple of relevant behaviours:

  • If the connection setup failed, URLSession would retry that, possibly with different TLS settings.

  • If the connection dropped before it sent the full request, it’d retry that.

  • If the connection dropped before it received any response, and the request was idempotent, it’d retry.

But please don’t take this as gospel. The system has evolved a lot since 2017.
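
If you want to apply similar rules in your own app-level retry logic, the three cases above could be modelled roughly like this. This is only a sketch of the behaviour as described, not URLSession's actual implementation: FailurePoint, idempotentMethods and shouldRetry are hypothetical names, and the afterResponseStarted case (never retry) is an added assumption that the list above does not cover.

```swift
import Foundation

/// Hypothetical model of how far a request got before the connection failed.
enum FailurePoint {
    case connectionSetup            // the connection never came up
    case beforeRequestFullySent     // dropped while the request was being sent
    case beforeAnyResponseReceived  // request sent, no response bytes seen
    case afterResponseStarted       // assumption: not covered by the list above
}

/// HTTP methods that RFC 9110 defines as idempotent.
let idempotentMethods: Set<String> = ["GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"]

/// Mirrors the three rules above, plus the assumed "never retry" case.
func shouldRetry(failure: FailurePoint, httpMethod: String) -> Bool {
    switch failure {
    case .connectionSetup, .beforeRequestFullySent:
        return true
    case .beforeAnyResponseReceived:
        return idempotentMethods.contains(httpMethod.uppercased())
    case .afterResponseStarted:
        return false
    }
}
```

Under this model a POST that drops after the request was sent but before any response arrived would not be retried automatically, because POST is not an idempotent method. Whether your GraphQL queries are safe to retry in that situation is something only your app and server can decide.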

I assume it's a limited set of retries with a time interval based backoff?

No. It was more like it has a specific sequence of potential retries and then it’d fail. That put a strict bound on the total number of retries.

I should also note that background sessions have an entirely separate ‘high level’ retry and resume mechanism for downloads and, starting with macOS 14 and its cohort, uploads.

ps It’s better if you reply as a reply rather than in the comments. See Quinn’s Top Ten DevForums Tips for this and other titbits.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Quinn, do you feel like I'd be able to gain more insight by opening a TSI for this? Would you recommend that if I needed more help working out why the initial situation is occurring?

do you feel like I'd be able to gain more insight by opening a TSI for this?

No.

Would you recommend that if I needed more help working out why the initial situation is occurring?

That’s kinda where I leave off with QA1941. You need to correlate the system log with a packet trace to see what’s actually happening on the wire.
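
One thing that can make that correlation easier is logging per-request transaction metrics, so you have a protocol name and port numbers to match against flows in the packet trace. The sketch below assumes you control the URLSession and can install a delegate on it; MetricsLogger and describeTransaction are hypothetical names, and the log format is arbitrary.

```swift
import Foundation

/// Formats the details that are most useful for matching a request to a flow
/// in a packet trace. Kept separate from the delegate so it is easy to test.
func describeTransaction(protocolName: String?,
                         localPort: Int?,
                         remoteAddress: String?,
                         remotePort: Int?,
                         reused: Bool) -> String {
    let proto = protocolName ?? "unknown"
    let local = localPort.map { String($0) } ?? "?"
    let address = remoteAddress ?? "?"
    let port = remotePort.map { String($0) } ?? "?"
    return "proto=\(proto) localPort=\(local) remote=\(address):\(port) reused=\(reused)"
}

/// Hypothetical delegate that prints one line per transaction of each task.
final class MetricsLogger: NSObject, URLSessionTaskDelegate {
    func urlSession(_ session: URLSession,
                    task: URLSessionTask,
                    didFinishCollecting metrics: URLSessionTaskMetrics) {
        for transaction in metrics.transactionMetrics {
            let line = describeTransaction(
                protocolName: transaction.networkProtocolName,
                localPort: transaction.localPort,
                remoteAddress: transaction.remoteAddress,
                remotePort: transaction.remotePort,
                reused: transaction.isReusedConnection
            )
            print("task \(task.taskIdentifier): \(line)")
        }
    }
}

// Usage sketch:
//
//     let session = URLSession(configuration: .default,
//                              delegate: MetricsLogger(),
//                              delegateQueue: nil)
```

For a request that went over HTTP/3 I'd expect the protocol name to be reported as h3, and the local port gives you something to filter on in the packet trace. If a library such as Apollo creates the session for you, you would need to check whether it lets you supply or forward this delegate callback.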

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
