macOS 15.x crashes in MetalPerformanceShadersGraph

Our app uses CoreML. Ever since macOS 15.x was released, we have been getting a large number of crashes like this:

Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8
CrashReporter Key:   B914246B-1291-4D44-984D-EDF84B52310E
Hardware Model:      Mac14,12
Process:         <REMOVED> [1509]
Path:            /Applications/<REMOVED>
Identifier:      com.<REMOVED>
Version:         <REMOVED>
Code Type:       arm64
Parent Process:  launchd [1]

Date/Time:       2024-11-13T13:23:06.999Z
Launch Time:     2024-11-13T13:22:19Z
OS Version:      Mac OS X 15.1.0 (24B83)
Report Version:  104

Exception Type:  SIGABRT
Exception Codes: #0 at 0x189042600
Crashed Thread:  36

Thread 36 Crashed:
0   libsystem_kernel.dylib               0x0000000189042600 __pthread_kill + 8
1   libsystem_c.dylib                    0x0000000188f87908 abort + 124
2   libsystem_c.dylib                    0x0000000188f86c1c __assert_rtn + 280
3   Metal                                0x0000000193fdd870 MTLReportFailure.cold.1 + 44
4   Metal                                0x0000000193fb9198 MTLReportFailure + 444
5   MetalPerformanceShadersGraph         0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296
6   Espresso                             0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932
.
.
.
43  CoreML                               0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120
44  CoreML                               0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176
45  <REMOVED>                            0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>)

No similar crashes on macOS 12-14!

Any clue what is causing this?

Thanks! :)

I don’t see an easy way to debug this with the info you have available. Consider this tiny test project:

@import Foundation;
@import MetalPerformanceShadersGraph;

int main(int argc, char **argv) {
    [[MPSGraphExecutable alloc] initWithMPSGraphPackageAtURL:nil compilationDescriptor:nil];
    return EXIT_SUCCESS;
}

It crashes with a similar backtrace.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #0: 0x0000000186dd2600 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000186e0af70 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000186d17908 libsystem_c.dylib`abort + 128
    frame #3: 0x0000000186d16c1c libsystem_c.dylib`__assert_rtn + 284
  * frame #4: 0x0000000191d6d870 Metal`MTLReportFailure.cold.1 + 48
    frame #5: 0x0000000191d49198 Metal`MTLReportFailure + 448
    frame #6: 0x0000000220d08c80 MetalPerformanceShadersGraph`-[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 300
    frame #7: 0x0000000100003f14 Test769129`main + 60
    frame #8: 0x0000000186a88274 dyld`start + 2840

It also prints a handy-dandy error, Error: did not find file at url: (null). However, when you disassemble the code you’ll see that the error is coming from a helper method, -initWithMPSGraphPackageAtURLCommon:compilationDescriptor:error:. So, the actual error could be anything, and there’s no way to tell from this backtrace what actually got printed.

I’m not an expert in Metal or MPS, but my reading of MTLReportFailure is that it might record this failure in the system log. If so, you might be able to make progress on this from a sysdiagnose log captured by the user shortly after seeing the crash. I talk about this more in Using a Sysdiagnose Log to Debug a Hard-to-Reproduce Problem.

Also, if you have access to a JSON crash report (.ips), please post it here. I might be able to learn more from that. See Posting a Crash Report for advice on how to post a crash report.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

According to the declaration of the initWithMPSGraphPackageAtURL:compilationDescriptor: method, it does not accept a nil URL. That's probably why the provided example crashes.
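To illustrate the nil-URL point, here is a minimal sketch of a guard around the call from the test project. The helper name LoadExecutableIfPresent is hypothetical, and the guard only rules out the missing-file case; a package that exists but is malformed could presumably still trip the same assert.

```objc
@import Foundation;
@import MetalPerformanceShadersGraph;

// Hypothetical helper (not part of any Apple API): returns nil instead of
// reaching the initializer when the URL is nil or nothing exists at its path.
static MPSGraphExecutable * _Nullable LoadExecutableIfPresent(NSURL * _Nullable url) {
    if (url == nil || !url.isFileURL) {
        return nil;
    }
    NSString *path = url.path;
    if (path == nil || ![NSFileManager.defaultManager fileExistsAtPath:path]) {
        return nil;
    }
    if (@available(macOS 14.0, *)) {
        // Assumption: a well-formed .mpsgraphpackage lives at this URL.
        // This guard does not validate the package contents.
        return [[MPSGraphExecutable alloc] initWithMPSGraphPackageAtURL:url
                                                  compilationDescriptor:nil];
    }
    return nil;
}
```

With this guard, passing nil returns nil rather than aborting, which suggests the test project demonstrates the assert path but not necessarily the same failure as the original crash.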

The problem in our case, however, is that we are not using MPSGraphExecutable directly; we are loading a CoreML model. MPS seems to be used under the hood, and we have no control over it.

What we already know is that CoreML creates an MPS graph package somewhere under the ~/Library/Caches directory when it loads the model for the first time. On subsequent uses it tries to load the graph from the cache, and that is the moment when it crashes the app instead of handling the error. Naively, we would expect CoreML to return an NSError from the +[MLModel modelWithContentsOfURL:configuration:error:] method instead of crashing, but that does not happen.
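One experiment that might help narrow this down is to load the model with the compute units restricted to the CPU. The sketch below assumes that the MPS graph path is only taken when the GPU is allowed; that is a guess, not something confirmed in this thread. The helper names are hypothetical. Note also that the NSError out-parameter cannot report the failure in the crash log above, because an assert aborts the process before the call returns.

```objc
@import Foundation;
@import CoreML;

// Hypothetical helper: a configuration that asks CoreML to use the CPU only.
static MLModelConfiguration *CPUOnlyConfiguration(void) {
    MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
    config.computeUnits = MLComputeUnitsCPUOnly;
    return config;
}

// Hypothetical helper: loads a compiled model (.mlmodelc) with the CPU-only
// configuration. Ordinary load failures are reported through `error`;
// an abort inside the framework is not.
static MLModel * _Nullable LoadModelCPUOnly(NSURL *compiledModelURL,
                                            NSError * _Nullable * _Nullable error) {
    return [MLModel modelWithContentsOfURL:compiledModelURL
                             configuration:CPUOnlyConfiguration()
                                     error:error];
}
```

If the crashes stop with this configuration, that would point at the cached graph package rather than at the model itself, at the likely cost of slower inference.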

We've also noticed that these caches take up a lot of disk space. In our case it's hundreds of megabytes on Sequoia, compared with just a few megabytes on previous OS versions. Could it be that there is simply not enough room for all the caches on a user's machine, so that they get corrupted?
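To check the disk-space theory in the field, the app could log the cache size and the free space on the volume before loading the model. This is a minimal sketch using Foundation only. The helper names are hypothetical, and since the exact cache subdirectory is not known here, the usage note measures the whole caches directory.

```objc
@import Foundation;

// Hypothetical helper: sums the allocated size, in bytes, of all regular
// files found recursively under a directory.
static unsigned long long AllocatedSizeOfDirectory(NSURL *directoryURL) {
    NSArray<NSURLResourceKey> *keys = @[NSURLIsRegularFileKey,
                                        NSURLTotalFileAllocatedSizeKey];
    NSDirectoryEnumerator<NSURL *> *enumerator =
        [NSFileManager.defaultManager enumeratorAtURL:directoryURL
                           includingPropertiesForKeys:keys
                                              options:0
                                         errorHandler:nil];
    unsigned long long total = 0;
    for (NSURL *fileURL in enumerator) {
        NSDictionary<NSURLResourceKey, id> *values =
            [fileURL resourceValuesForKeys:keys error:NULL];
        if ([values[NSURLIsRegularFileKey] boolValue]) {
            total += [values[NSURLTotalFileAllocatedSizeKey] unsignedLongLongValue];
        }
    }
    return total;
}

// Hypothetical helper: free bytes on the volume containing `url`,
// or -1 if the value cannot be read.
static long long AvailableCapacityOfVolume(NSURL *url) {
    NSNumber *capacity = nil;
    if (![url getResourceValue:&capacity
                        forKey:NSURLVolumeAvailableCapacityKey
                         error:NULL] || capacity == nil) {
        return -1;
    }
    return capacity.longLongValue;
}
```

Calling both helpers with the first URL from -[NSFileManager URLsForDirectory:inDomains:] for NSCachesDirectory and NSUserDomainMask, and logging the results alongside each model load, would show whether the crashes correlate with low free space.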
