Xcode producing corrupt build

I'm experiencing a 100% reproducible bug with Xcode. It took me days, and many tears, to figure it out.

Xcode is producing a corrupt binary. It seems like my Metal buffers are being misaligned or something.

The problem showed up after changing the deployment target to iOS 15. It had been at iOS 13 and built and ran without any related issues for all the years that I've been developing this app.

So at some point after changing the target to iOS 15, compiling the shaders went from about 0.001 seconds to 30 seconds. When that happens, the GPU will also hang with the messages:

[GPUDebug] Invalid device load executing kernel function "computeArtPointsToRender" encoder: "0", dispatch: 0, at offset 22080048

Shaders.metal:1290:41 - computeArtPointsToRender()

To fix the issue, I have to change the build target to iOS 13, clean, build, change to iOS 15, clean, and build. Then it works again as expected (until it doesn't at some random point, or until I restart the machine).

This is 100% reproducible:

  1. Restart mac
  2. Build project
  3. Issue occurs
  4. Change build target to iOS 13
  5. Clean build folder
  6. Build
  7. Change build target to iOS 15
  8. Delete derived files
  9. Build
  10. Works as expected

Xcode: Version 13.1 (13A1030d) macOS: 11.5.2 (20G95) Mac mini (M1, 2020)

In my 11+ years as a full time iOS developer, I've never encountered such a serious issue with Xcode. If I don't have stable tools, it is impossible for me to develop.

  • If I had to guess, I think the issue is that something that worked for iOS 13 and earlier no longer works properly with either the iOS 15 SDK or possibly the iOS 14 SDK.

    Building for iOS 13 then building for iOS 15 works, because I suspect that something is actually being cached when I build for iOS 13 that should not be cached after deleting the derived files. So I fear the “correct” behavior is for it to complain about something and not successfully compile at all using the iOS 15 SDK. I’m pretty much forced to only support iOS 15 because of a severe issue that it resolved with CoreML or I would be glad to just leave it at iOS 13 and shrug it off.

    The most suspicious difference is that in iOS 13 Float16 was not supported yet, so I am using UInt8 with a Float16 alias.

    I tried commenting out my Float16 alias (which just pointed to UInt8) and using the SDK's Float16 instead, but it still had the same issue of taking 30 seconds to compile the shader and hanging.

    Can anyone tell me what iOS uses under the hood to represent Float16? Does it have the same alignment as UInt8?

    For all of my MTLBuffers and parameters I use MemoryLayout's stride. Should I be using size instead of stride for iOS 15?

    All the Metal structures are defined in a C header, which is supposed to enforce Metal compatibility as I understand it.
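
    On the size, stride, and alignment questions above, a small Swift sketch can print what the compiler actually reports. `Sample` and `byteLength(of:count:)` are hypothetical names made up for illustration, not taken from the original project. Swift's `Float16` is a 2-byte type, so it does not share `UInt8`'s 1-byte layout; the `Float16` lines are guarded because the type is unavailable on Intel Macs.

```swift
// Hypothetical struct, for illustration only.
struct Sample {
    var value: Float   // 4 bytes, 4-byte aligned
    var flag: UInt8    // 1 byte
}

// Bytes needed for `count` contiguous elements. Stride (not size) is the
// distance between consecutive elements in an array, padding included.
func byteLength<T>(of type: T.Type, count: Int) -> Int {
    return MemoryLayout<T>.stride * count
}

print(MemoryLayout<UInt8>.size, MemoryLayout<UInt8>.stride, MemoryLayout<UInt8>.alignment)    // 1 1 1
print(MemoryLayout<UInt16>.size, MemoryLayout<UInt16>.stride, MemoryLayout<UInt16>.alignment) // 2 2 2
print(MemoryLayout<Sample>.size, MemoryLayout<Sample>.stride, MemoryLayout<Sample>.alignment) // 5 8 4

#if !(os(macOS) && arch(x86_64))
if #available(macOS 11.0, iOS 14.0, tvOS 14.0, watchOS 7.0, *) {
    // Float16 occupies two bytes, the same as UInt16, not one byte like UInt8.
    print(MemoryLayout<Float16>.size, MemoryLayout<Float16>.stride, MemoryLayout<Float16>.alignment)
}
#endif

print(byteLength(of: Sample.self, count: 3)) // 24
```

    Stride is the usual choice when sizing a buffer that holds an array of elements, and that does not depend on the deployment target. A 1-byte alias standing in for a 2-byte half-precision value, by contrast, would halve the byte length that the shader side expects.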

Accepted Reply

It turns out that the slow compile only occurs (intermittently) when Shader Validation is enabled. If I disable Shader Validation it compiles as expected.

When compiling for iOS 13, since Shader Validation is not supported, the slow compile would not occur since it was not being validated.

As for the runtime hang, it turns out that I was indeed accessing a buffer out-of-bounds. Why it was never an issue before I can't say, but it was my bad. I was just super suspicious of the build since it was taking abnormally long to compile.

As for the apparent bug with Shader Validation, I have updated my Feedback Assistant report with sysdiagnose reports.
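
For readers chasing a similar out-of-bounds access, one cheap host-side guard is to confirm that a dispatch cannot index past the end of a buffer before encoding it. The sketch below is an assumption about how such a check might look, not the fix that was actually applied here; `dispatchFitsBuffer` and its parameters are hypothetical names.

```swift
/// Returns true when `threadCount` threads, each touching one element of
/// `elementStride` bytes, stay within a buffer of `bufferLength` bytes.
func dispatchFitsBuffer(threadCount: Int, elementStride: Int, bufferLength: Int) -> Bool {
    guard threadCount >= 0, elementStride > 0, bufferLength >= 0 else { return false }
    // Multiply with overflow detection so very large counts cannot wrap around.
    let (needed, overflow) = threadCount.multipliedReportingOverflow(by: elementStride)
    return !overflow && needed <= bufferLength
}

// Example: a 32-byte buffer holds four 8-byte elements but not five.
print(dispatchFitsBuffer(threadCount: 4, elementStride: 8, bufferLength: 32)) // true
print(dispatchFitsBuffer(threadCount: 5, elementStride: 8, bufferLength: 32)) // false
```

Dispatches are often rounded up to a multiple of the threadgroup size, so the thread count passed here should be the rounded-up grid size unless the kernel itself returns early for indices beyond the element count.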

Replies

Hi 3DTOPO,

The error message you are getting indicates that there is an out-of-bounds access occurring within your computeArtPointsToRender function. These problems are difficult to catch without additional instrumentation in the shader, and may explain the difference in behavior you are experiencing between iOS 13 and iOS 15, as they ultimately depend on how your resources are laid out in memory.

Shader Validation, which you are already using, instruments your shader to include additional out-of-bounds error detection. In addition to reporting the problem you copied, you can also click on the arrow next to the shader validation checkbox to enable a shader breakpoint for when the problem is detected.

An additional check you can enable is to opt your command buffers into enhanced error detection. This will allow you to verify whether the problem is causing a command buffer failure, which could help explain the difference in timings you are seeing.

If you are unsure about how to accomplish this, here is a video that shows this process in detail, as well as tips and tricks for debugging these kinds of difficult errors - https://developer.apple.com/videos/play/wwdc2020/10616/
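
As a rough sketch of the enhanced error detection mentioned above, a command buffer can be created from a descriptor whose `errorOptions` request encoder execution status, and its completion handler can then list the state of each encoder. The function name `makeDiagnosticCommandBuffer(on:)` is hypothetical, and the sketch assumes you already have an `MTLCommandQueue`; it requires iOS 14 or macOS 11 and later.

```swift
#if canImport(Metal)
import Metal

/// Hypothetical helper: creates a command buffer that records per-encoder
/// execution status and logs it if the buffer finishes with an error.
@available(macOS 11.0, iOS 14.0, *)
func makeDiagnosticCommandBuffer(on queue: MTLCommandQueue) -> MTLCommandBuffer? {
    let descriptor = MTLCommandBufferDescriptor()
    descriptor.errorOptions = .encoderExecutionStatus

    guard let commandBuffer = queue.makeCommandBuffer(descriptor: descriptor) else {
        return nil
    }

    commandBuffer.addCompletedHandler { buffer in
        guard let error = buffer.error as NSError? else { return }
        print("Command buffer failed: \(error.localizedDescription)")

        // With encoderExecutionStatus enabled, the error's userInfo carries
        // one info object per encoder in the failed command buffer.
        if let infos = error.userInfo[MTLCommandBufferEncoderInfoErrorKey]
            as? [MTLCommandBufferEncoderInfo] {
            for info in infos {
                print("  encoder \"\(info.label)\": error state \(info.errorState.rawValue)")
            }
        }
    }
    return commandBuffer
}
#endif
```

Giving each encoder a `label` before encoding makes the logged output much easier to match to the code that created it.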

  • Thanks for the tips. The thing is, I've checked, double-checked, and triple-checked, and I am not accessing anything out of bounds, which is why it has worked for years without any issue.

    The fact that it takes 30 seconds to compile a shader indicates that something is going wrong before run time, so that part cannot be an out-of-bounds access. There is obviously some issue in the compile step.

    What concerns me the most is that the issue is completely intermittent. Given the same source code, Xcode should always produce an identical binary, but clearly it does not.

  • When the issue occurs, it takes 30 seconds per shader to compile, so for my main 3 shaders it takes 1:30+ just to compile. Is there any way to shed light on what is going on there?

  • I guess I will have to use a developer support incident to get to the bottom of this. It's pretty clear to me that taking 30 seconds to compile a shader that normally compiles in 0.001 seconds is the root of the runtime error. The compiler is obviously very confused and apparently not creating a sane binary.
