HTTPConnection::_onqueue_doNotAllowMoreRequests() crash

Hi!

Over the past few weeks we have detected more than 90 affected users with the following crash log:

std::__1::__shared_weak_count::lock()
CFNetwork
HTTPConnection::_onqueue_doNotAllowMoreRequests()
CFNetwork
HTTPConnectionCacheEntry::ConnectionArray::stopAndRemove(long)
CFNetwork
HTTPConnectionCacheEntry::_removeConnection(std::__1::shared_ptr<HTTPConnection>)
CFNetwork
HTTPConnectionCacheEntry::purgeIdleConnections(double, double)
CFNetwork
HTTPConnectionCache::performIdleSweep()

Unfortunately, our investigation hasn't produced any results. Has anyone experienced crashes like this?

Top iOS versions for the crash:

12.5.4: 30%
14.4.2: 14%
12.5.3: 6%

I have not seen a crash signature like this. Can you provide more insight as to what is going on with your network stack? Are you running a lot of concurrent connections?

Also, are you able to post a complete Apple Crash log here for more insight as to what is happening?

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

@meaton I sent the complete crash log to your email.

@meaton I added the crash logs here:

And here is what's going on at the network layer:

  • we keep a continuous WebSocket connection open (via https://github.com/daltoniam/Starscream)
  • there are also multiple polling systems: once every 10 seconds we send REST requests via the native URLSession

Thanks for posting these crash logs. It looks as though your connection is being removed from the CFNetwork cache, and it "could" be getting lost before that happens, so the removal segmentation faults (SIGNAL, Code 0xb) because the pointer is no longer in memory. That is just a theory based on what I see. Are you by chance setting up multiple web socket connections here, or are you using just one connection the entire time that you are sending messages back and forth?

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

We recreate the web socket if the connection is lost, or if we get a 401 and it's time to refresh tokens. We close the socket, destroy the object, and open it again after some time. We also have a couple of third-party libraries that keep web socket connections open in parallel with ours.

Thanks for the insight. I suspect that this would not be an issue with 1 URLSessionWebSocketTask running in a loop like you described. One way to get to the bottom of this would be to extract your networking logic out of your main app into a single test bed project that only runs your networking logic on a loop with 1 web socket connection. Run this continuously for a period of time. Then, once you have shown that a single connection is not the problem, add another connection to your test project, and then another over time. Doing so should lead you to an answer about what is happening here.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
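
For anyone who wants to try the test bed approach suggested above, here is a minimal sketch of what such a loop might look like. It is only an illustration, not code from the original app: the class name `WebSocketChurnHarness`, its parameters, and the endpoint URL in the usage comment are all hypothetical. It also assumes iOS 13 or later, because it uses URLSessionWebSocketTask; on iOS 12 (where many of the reported crashes occur) the same open/close cycle would have to be driven through Starscream instead.

```swift
import Foundation

/// Hypothetical test-bed helper: repeatedly opens a web socket, sends one
/// message, holds the connection open, closes it, pauses, and reopens it.
final class WebSocketChurnHarness {
    private let url: URL
    private let session: URLSession
    // Serial queue that guards `tasks` and `stopped`.
    private let queue = DispatchQueue(label: "websocket.churn.harness")
    private var tasks: [Int: URLSessionWebSocketTask] = [:]
    private var stopped = false

    init(url: URL, session: URLSession = URLSession(configuration: .default)) {
        self.url = url
        self.session = session
    }

    /// Number of web socket tasks currently held by the harness.
    var activeConnectionCount: Int {
        queue.sync { tasks.count }
    }

    /// Starts `connectionCount` independent open/close cycles.
    func start(connectionCount: Int, holdOpen: TimeInterval, pause: TimeInterval) {
        for id in 0..<connectionCount {
            queue.async { self.open(id: id, holdOpen: holdOpen, pause: pause) }
        }
    }

    /// Cancels every connection and prevents further cycles.
    func stop() {
        queue.sync {
            stopped = true
            for task in tasks.values {
                task.cancel(with: .goingAway, reason: nil)
            }
            tasks.removeAll()
        }
    }

    private func open(id: Int, holdOpen: TimeInterval, pause: TimeInterval) {
        guard !stopped else { return }
        let task = session.webSocketTask(with: url)
        tasks[id] = task
        task.resume()
        task.send(.string("hello from connection \(id)")) { error in
            if let error = error {
                print("[\(id)] send failed: \(error)")
            }
        }
        // Keep the connection open for a while, then tear it down.
        queue.asyncAfter(deadline: .now() + holdOpen) {
            self.close(id: id, holdOpen: holdOpen, pause: pause)
        }
    }

    private func close(id: Int, holdOpen: TimeInterval, pause: TimeInterval) {
        guard !stopped else { return }
        tasks[id]?.cancel(with: .normalClosure, reason: nil)
        tasks[id] = nil
        // Wait before reconnecting, mirroring "after some time opening it again".
        queue.asyncAfter(deadline: .now() + pause) {
            self.open(id: id, holdOpen: holdOpen, pause: pause)
        }
    }
}

// Usage (the URL is a placeholder, not a real endpoint):
//   let harness = WebSocketChurnHarness(url: URL(string: "wss://example.invalid/socket")!)
//   harness.start(connectionCount: 1, holdOpen: 5, pause: 2)
//   RunLoop.main.run()
```

Starting with `connectionCount: 1` and raising it only after a long clean run follows the one-connection-at-a-time approach described above. The polling requests and the third-party socket libraries could be added to the same project in later steps.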