Post · Replies · Boosts · Views · Activity

Potential Metal Shader Compiler/Driver Bug
I have discovered what is almost certainly a subtle bug in the Metal compiler or driver. Briefly, the shader's output is broken only on newer devices, and only when the Metal Shading Language version is above 2.2 or the minimum iOS deployment target is above iOS 13. I discovered this when I shared a build of my app on TestFlight and one of the testers sent a video showing visual bugs (objects drawn in the wrong positions) that I had never seen before. I confirmed he was on the same build I was using, and that I did not see the issue on my phone (iPhone 6s). I knew he had a newer phone (iPhone 12 or 13 Pro), so I tested on an iPhone X and was able to reproduce it there. I was also able to reproduce it on an 11-inch iPad Pro (1st generation).

I reverted the codebase until I found a snapshot where the issue no longer occurred. After much testing, I determined that the single cause was that I had raised the minimum iOS deployment target from iOS 12 to iOS 14. Simply setting it back to iOS 12 made the problem disappear on the newer devices. I tried more combinations and narrowed it down to this: if either the deployment target is above iOS 13 or the Metal language version is above 2.2, the issue is reproducible (only on the newer devices). Metal 2.2 shipped with iOS 13, so I assume it's the Metal version that really matters, and raising the minimum deployment target forces the Metal version up. The deployment target of the app code doesn't matter; only the deployment target used when compiling the shaders does (I compile them in a separate build script). I then started inspecting with GPU workload capture. The captured workload shows the correct vertex positions being output by the vertex shader, even though the objects are not drawn at those positions! These symptoms clearly indicate that something is broken under the hood.
I added debug vertex attributes, then compared the output positions and debug attributes of the working shader (older Metal version) and the broken one. Everything is exactly the same: the position outputs are identical, but the object is drawn in a different spot. I can share the shader, but I'm not sure that will help, because almost exactly the same calculations are done in multiple shaders, and they work in some but not in others. Instead, I'll briefly describe the calculation and the symptoms, which may shed light on the underlying compiler or driver bug.

I use a trick to get double-precision positions for objects in my app. The positions of objects and the position of the camera are doubles on the CPU; I split each of them into two floats (casting to float to get the rough part, then subtracting the cast value from the original to get the remainder, or fine part). The shader then computes the displacement between an object and the camera like this:

    float3 displacementRough = centerRough - cameraPositionRough;
    float3 displacementFine = centerFine - cameraPositionFine;
    float3 displacement = displacementRough + displacementFine;

This way, when objects are close to the camera but far from the origin, the rough displacement is zero and the fine displacement supplies the additional resolution. The symptom of the broken shaders is essentially that the fine displacement gets ignored. It's exactly what you'd see if I didn't use this trick: the farther from the origin you get, the more objects jump in larger steps instead of moving smoothly. For this reason, my investigation began with close inspection of the fine vertex buffers to confirm the values were being sent correctly. Here's the really strange part: in a faraway scene with an object right in front of the camera, the rough displacement is zero. If I hardcode zero for the rough displacement in the shader, it starts working again (only for nearby objects, of course)!
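The splitting trick described above can be sketched on the CPU as follows. This is a minimal C++ illustration, not the app's code: `SplitFloat`, `splitDouble`, `splitDisplacement`, `naiveDisplacement` and `printExample` are hypothetical names, scalar floats stand in for the shader's `float3`, and the sketch assumes strict round-to-nearest IEEE-754 arithmetic with no fast-math reassociation.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical representation of a double split into two floats.
struct SplitFloat {
    float rough;  // the double cast to float
    float fine;   // what the cast lost: original minus the rough part
};

// Splits a double as described above: cast to float, then keep the remainder.
SplitFloat splitDouble(double value) {
    const float rough = static_cast<float>(value);
    const float fine = static_cast<float>(value - static_cast<double>(rough));
    return {rough, fine};
}

// Mirrors the shader snippet: difference the rough and fine parts separately,
// then add the two partial displacements.
float splitDisplacement(SplitFloat center, SplitFloat camera) {
    const float displacementRough = center.rough - camera.rough;
    const float displacementFine = center.fine - camera.fine;
    return displacementRough + displacementFine;
}

// Plain single-precision displacement, i.e. what you get without the trick.
float naiveDisplacement(double center, double camera) {
    return static_cast<float>(center) - static_cast<float>(camera);
}

// Usage: an object 0.25 units in front of a camera about 10 million units
// from the origin, where adjacent floats are 1.0 apart.
void printExample() {
    const double camera = 10'000'000.125;
    const double center = 10'000'000.375;
    const SplitFloat cameraSplit = splitDouble(camera);
    const SplitFloat centerSplit = splitDouble(center);
    // Both positions round to the same float, so the rough displacement is zero.
    assert(centerSplit.rough - cameraSplit.rough == 0.0f);
    std::printf("true  = %.3f\n", center - camera);                              // 0.250
    std::printf("split = %.3f\n", splitDisplacement(centerSplit, cameraSplit));  // 0.250
    std::printf("naive = %.3f\n", naiveDisplacement(center, camera));            // 0.000
}
```

In this example the rough displacement is zero, as in the nearby-object case above. The split version still recovers the 0.25 offset from the fine parts, while the naive version collapses to zero. That collapse is the "fine displacement gets ignored" symptom.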
When I capture the frame, the rough displacement (one of the debug attributes I added) equals zero in the broken shader! So what I'm seeing is that when this value is computed to be zero, it somehow breaks the next calculation (making it behave as if the fine displacement were also zero). But if I hardcode zero, the next calculation works fine. None of this is visible in the captured GPU workload: all of these intermediate values, which I store in debug vertex attributes, are correct according to the captured buffer values, but the screen grab shows the error. The screen grab and the NDC coordinates of the vertices plainly don't match. I wish I could share a minimal reproducible example, and I may be able to soon. But I'm happy to work in any way necessary to help Apple figure out what's going on here. These kinds of issues give me nightmares.
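You can check the reasoning that a computed zero and a hardcoded zero should be interchangeable against IEEE-754 arithmetic. For any finite float, subtracting it from itself gives +0.0 under round-to-nearest, and adding +0.0 to a value yields a value equal to it. The C++ sketch below performs that CPU-side check only. The function names are hypothetical, scalars stand in for `float3`, and it assumes strict IEEE-754 semantics, which a GPU compiler running with fast math enabled does not necessarily guarantee. It says nothing about what the Metal compiler or driver actually does.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two shader variants, reduced to scalar floats.

// Variant A: the rough displacement is computed at run time.
float displacementComputedRough(float centerRough, float cameraRough, float displacementFine) {
    const float displacementRough = centerRough - cameraRough;  // x - x == +0.0f for finite x
    return displacementRough + displacementFine;
}

// Variant B: the rough displacement is hardcoded to zero.
float displacementHardcodedRough(float displacementFine) {
    const float displacementRough = 0.0f;
    return displacementRough + displacementFine;
}

// Checks that both variants agree whenever the rough parts are equal.
void checkVariantsAgree() {
    const float roughValues[] = {0.0f, 1.0f, 10000000.0f, -123456.0f};
    const float fineValues[] = {0.25f, -0.5f, 1e-3f, 0.0f};
    for (float rough : roughValues) {
        for (float fine : fineValues) {
            assert(displacementComputedRough(rough, rough, fine) ==
                   displacementHardcodedRough(fine));
        }
    }
}
```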
Replies: 4 · Boosts: 0 · Views: 1.1k · Activity: Feb ’22
XCTest - continueAfterFailure doesn't work for async tests
In an XCTest, setting continueAfterFailure to false in setUp should cause the test to stop executing as soon as a failed assertion occurs. This works correctly unless the test is marked async. In that case, what I've seen is that execution of the failed test continues while another test starts executing in parallel. You can see this by pausing execution and observing two cooperative threads actively executing two different tests. This can cause weird problems, because XCTestCase subclasses are often not reentrant. It seems that tests run concurrently only if they are async, one of them fails, and continueAfterFailure is set to false. It almost seems as if the "host" thread for the test is properly stopped, but the cooperative thread running the async test continues, while the "host" thread moves on and starts the next async test. The result is two cooperative threads running tests in parallel.
Replies: 6 · Boosts: 5 · Views: 1.8k · Activity: Aug ’22
XCTests - Transparency, Consent and Control Causes Hang
I have a suite of tests for a macOS target that perform various file operations (reading and writing file contents, copying/moving/deleting files, iterating directories, etc.). Any operation that touches the Desktop, Documents or Downloads folder results in a one-time hang of about 15 seconds. It only happens once: no matter what order the tests run in, or which filesystem operation happens first, only the first such operation hangs, and the rest complete nearly instantaneously. This only happens on a macOS target. If I run the same tests in an iOS target and perform the same file operations on the same files through the iOS Simulator, the freeze doesn't happen. And it only happens in the three folders protected by Transparency, Consent and Control (Desktop, Downloads and Documents). I first noticed this with C++ filesystem operations (from std::filesystem), but I confirmed it also happens through Cocoa APIs (like [NSData dataWithContentsOfFile:]). I'm not sure why it would hang rather than prompt for access (which I vaguely remember it doing the first time I ran the test suite) or simply fail. Does anyone know of a setting, for example in Security & Privacy, that would stop the hang from occurring? Without it the tests would run in roughly a hundredth of a second, so 15 seconds is a massive slowdown.
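To quantify a one-time hang like this, you can time repeated listings of the same directory. The following is a minimal C++17 sketch using the std::filesystem API mentioned above. `AccessTiming` and `timeDirectoryListings` are hypothetical names, and the sketch only measures. It does not change how Transparency, Consent and Control behaves.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical result type: how long one directory listing took, and whether
// it reported an error (for example, access being denied).
struct AccessTiming {
    std::chrono::milliseconds elapsed;
    bool failed;
};

// Lists `dir` `repetitions` times and records the duration of each pass, so a
// one-time delay on the first access stands out against the later passes.
std::vector<AccessTiming> timeDirectoryListings(const fs::path& dir, int repetitions) {
    assert(repetitions > 0);
    std::vector<AccessTiming> timings;
    timings.reserve(static_cast<std::size_t>(repetitions));
    for (int pass = 0; pass < repetitions; ++pass) {
        const auto start = std::chrono::steady_clock::now();
        std::error_code ec;
        // The non-throwing overloads report failures through `ec`.
        for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
            // Only the traversal itself is being timed; entries are ignored.
        }
        const auto elapsed = std::chrono::steady_clock::now() - start;
        timings.push_back({std::chrono::duration_cast<std::chrono::milliseconds>(elapsed),
                           static_cast<bool>(ec)});
    }
    return timings;
}
```

If the hang behaves as described, calling this from the macOS test target on a folder such as the Desktop would show one long first pass followed by near-zero passes. That is an expectation, not a measured result.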
Replies: 3 · Boosts: 0 · Views: 625 · Activity: Oct ’22
Swift Existential Any causes crashes on iOS 15.7 devices
I have discovered that certain uses of existential any (introduced in Swift 5.6) can cause crashes on certain devices. After experimenting, it seems the required conditions are a device running iOS 15 and compiler optimization being enabled. I've attached a minimal reproducing example; the app crashes immediately when run on an iOS 15 simulator. Should I file a bug?
Replies: 3 · Boosts: 0 · Views: 566 · Activity: Oct ’22