How to fix system panic "Too many alloc retries"?

Hi there,

I’m developing an app that interacts heavily with the network and is structured in layers. All the communication code is written in C.

The app has been live for months, but only last week, with the launch of iOS 13.4, users started complaining about phone reboots.

Apparently it happens mainly (or only) on iPhone XS and iPhone XR, and all scenarios (there is no common pattern) seem to be related to network requests.

I was finally able to gather some of the panic reports. All of them seem to report the same reason, and the panic string always contains this:

  "panicString" : "panic(cpu 3 caller 0xfffffff029b4ce60): \"Too many alloc retries: 502, table:0xfffffff02b34b410, type:3, nelem:1\"\nDebugger message: panic

Do you have any hint? Is there anything that could cause such system panics?

Accepted Reply

I used an internal tool to get a backtrace from your panic log:

0 debugger_collect_diagnostics xnu/osfmk/kern/debug.c:1005
 1 handle_debugger_trap xnu/osfmk/kern/debug.c:1200
 2 kdp_trap xnu/osfmk/kdp/ml/arm/kdp_machdep.c:344
 3 sleh_synchronous xnu/osfmk/arm/model_dep.c:1016
 4 fleh_synchronous
 5 DebuggerTrapWithState xnu/osfmk/kern/debug.c:549
 6 panic_trap_to_debugger xnu/osfmk/kern/debug.c:877
 7 panic xnu/osfmk/kern/debug.c:746
 8 ltable_alloc_elem xnu/osfmk/kern/ltable.c:503
 9 waitq_link_reserve xnu/osfmk/kern/waitq.c:370
10 selprocess xnu/bsd/kern/sys_generic.c:1659
11 select_nocancel xnu/bsd/kern/sys_generic.c:1032
12 unix_syscall xnu/bsd/dev/arm/systemcalls.c:173
13 sleh_synchronous xnu/osfmk/arm64/sleh.c:1443
14 fleh_synchronous

If you’re curious, most of this code is part of the Darwin open source, where the 10.15 code (very) roughly lines up with iOS 13.

As you can see, you’re crashing deep inside select (frame 11). This is curious because most network code on iOS uses the user space networking stack rather than the kernel. What’s doing the networking heavy lifting in your app? Is it using BSD Sockets?

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Replies

It’s safe to say that nothing your app does should be able to kernel panic an iOS device. You should definitely file a bug about this with whatever info you have. Please post your bug number, just for the record.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thanks eskimo.


Submitted as bug FB7653601

Thanks for sharing the panic backtrace!

This is really helpful for working out a workaround.

It seems that the panic occurs in ltable_alloc_elem, which is reached from the select() call that I'm using to implement a custom timeout.

Unless BSD sockets are used by 3rd party libraries, I'm only using TCP sockets.

Unless BSD sockets are used by 3rd party libraries, I'm only using TCP sockets.

OK, quick terminology clarification…

BSD Sockets is the name of the API: socket, bind, connect, and so on. You can use it to run a variety of network transports, including TCP connections. Many folks conflate TCP connection and TCP socket. These are different things on Apple platforms because there are multiple ways to run a TCP connection:
  • If you use a TCP socket, your TCP connection is run by the kernel’s networking stack.

  • If you use Network framework (or anything layered on top of that, like NSURLSession), your TCP connection is run by the user space networking stack.

It seems likely that you’re using BSD Sockets because of the crash within select. The user space TCP stack does not use file descriptors, and thus does not use select.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"