Xcode 16 clang++ compiler generates unexpected results for conditional checks at -O2 and -O3 optimization levels

About a month ago, developers of the OpenJDK project started noticing odd failures when building the JDK with Xcode 16 and running code compiled by the clang++ compiler shipped in that release (details in https://bugs.openjdk.org/browse/JDK-8340341). Specifically, a trivial for loop in C++ code of the form:

int limit = ...; // method-local variable
for (i = 0; i < limit; i++) {
    ...
}

ends up iterating more times than the specified limit: the comparison i < limit evaluates to true even when it should be false. In fact, debug log messages inside the loop of the form:

fprintf(stderr, "parsing %d of %d, %d < %d == %s\n", i, limit, i, limit, (i < limit) ? "true" : "false");

would show output of the form:

parsing 0 of 2, 0 < 2 == true
parsing 1 of 2, 1 < 2 == true
parsing 2 of 2, 2 < 2 == true 

Notice how the loop body ran for i == 2 even though the condition 2 < 2 should have ended the loop. Furthermore, the message itself reports 2 < 2 == true, which is clearly wrong.

This happens when that code is compiled with optimization level -O2 or -O3. The issue doesn't happen with -O1.
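The loop pattern above can be wrapped in a small self-check for comparing -O1 and -O2/-O3 builds. This is a hedged sketch, not the JDK code: count_iterations and check_loop are hypothetical names, and the loop body is a placeholder because the original body was elided. A harness this small is not known to trigger the miscompilation by itself; it only shows what "correct" looks like for code shaped like the failing loop.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the JDK loop. The real loop body was elided in
// the report, so this only mirrors the loop's shape and the debug print.
// Returns how many times the loop body actually executed.
int count_iterations(int limit) {
  int iterations = 0;
  for (int i = 0; i < limit; i++) {
    std::fprintf(stderr, "parsing %d of %d, %d < %d == %s\n", i, limit, i,
                 limit, (i < limit) ? "true" : "false");
    iterations++;
  }
  return iterations;
}

// A correctly compiled loop runs exactly `limit` times for positive limits
// and zero times otherwise; any other count points at bad codegen.
void check_loop(int limit) {
  const int expected = limit > 0 ? limit : 0;
  assert(count_iterations(limit) == expected);
}
```

Note that assert compiles to nothing when NDEBUG is defined, so an optimized release build would need an explicit comparison instead.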

I reported this issue to Apple through Feedback Assistant more than a month ago. The feedback ID is FB15162411. There has been no response, nor any indication that the issue has been noticed or reproduced, even though the report includes steps to reproduce. In the meantime, more and more users are running into this failure in the JDK when using Xcode 16. We haven't put a workaround in place (the only one we know of is compiling this file at -O1) because it isn't clear what exactly causes the issue, beyond the fact that it shows up only at specific optimization levels. It's also unknown whether the bug has a wider impact. Would it be possible to check whether FB15162411 is being looked into, and to share any technical details on the cause? That would help us decide whether to put a temporary workaround in the OpenJDK build and how long to maintain it.

For reference, this was reproduced on:

clang++ --version
Apple clang version 16.0.0 (clang-1600.0.26.3)
Target: arm64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /xcode-16/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Answered by DTS Engineer in 810013022
I had reported this as an issue to Apple through feedback assistance, more than a month back. The feedback id is FB15162411.

Thanks for that. It’s definitely landed in the right place. Sadly, I’ve no info to share beyond that.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hello Quinn,

It’s definitely landed in the right place. Sadly, I’ve no info to share beyond that.

Thank you very much for taking a look and confirming. I'll continue to wait for an update in that issue then.

I couldn't add a comment to that JDK thread (is there no way to register an account?), but about a week ago I posted about another Xcode clang miscompilation bug (https://developer.apple.com/forums/thread/766030). It doesn't look identical to what the JDK bug reports, but there are some similarities, including wrong codegen that leads to a subtle offset when looping, and it only happens at -O2/-O3.

This codegen bug is currently blocking Vim from moving its CI to the latest toolchain, and it is actually causing the Vim currently distributed by Homebrew to be miscompiled, because Homebrew believes in using the latest Apple toolchains (which Apple encourages).

Edit: I just tested, and the system default Vim in macOS 15 (/usr/bin/vim) shows the same bug. The repro steps are at https://github.com/Homebrew/homebrew-core/issues/195325

I share the above commenter's frustration that while Apple tells us to use Feedback Assistant to file bugs, it is essentially a black hole with zero feedback. I still have bugs that I opened years ago that remain open.

Miscompilation bugs seem like a particularly serious class of compiler bug. It's one thing for clang to crash, fail to compile, or suffer performance regressions. It's a completely different issue if it generates subtly incorrect code, which has all sorts of ramifications: compilers sit low in the software supply chain, a lot of software relies on them, and they need to be trusted to generate correct code.

Is it possible to get some more info than "no info to share"? As I mentioned, I think miscompilation bugs are serious enough that developers deserve to know whether they should avoid upgrading to the latest toolchain if their software is going to break in subtle ways. The JDK was lucky that this immediately throws a runtime exception, which is why it got noticed.

Is it possible to get some more info than "no info to share"?

No. I can only discuss stuff that’s in either a shipping or a seeded OS release (or, in this case, tools) [1]. AFAICT there have been no developments like that for FB15162411 (or FB15489959, which is the bug from your thread).

developers deserve to know if they should be warned to not upgrade to the latest toolchain if their software are going to break in subtle ways.

Agreed. And that does happen, via the Xcode Release Notes, when we know about specific bugs. AFAICT these two bugs were not known about until Xcode 16 shipped.

This is one of the reasons why we have a comprehensive seeding programme, for both OS releases and developer tools. The goal is to uncover these issues before the final release. And yes, you will absolutely find examples where that system failed, but there are lots of cases where it worked.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] I talk about this more in tip 3 of Quinn’s Top Ten DevForums Tips.

Hello @y-c-c

I couldn't add a comment to that JDK thread (is there no way to register an account?)

Registering for an account on the JDK's issue tracker requires the "Author" role. That role is explained at https://openjdk.org/guide/#becoming-an-author, and it boils down to requiring a few prior contributions to the project. Filing a new bug, of course, doesn't require an account; that can be done through https://bugs.java.com/bugdatabase/. But in this case, where you wanted to comment on an existing issue, you are right that it isn't possible to create an account for that.

but I posted about another Xcode clang miscompilation bug about a week ago as well (https://developer.apple.com/forums/thread/766030). It doesn't look identical to what the JDK bug is reporting but there are some similarities, including wrong codegen that leads to subtle offset when looping, and only happening in -O2/-O3.

Thank you for pointing me to that. I hadn't noticed that thread. In that thread you note:

I did try to see if there are some undefined behaviors or something which caused the optimizer to go to town with the code. Funnily when I use UBSAN (by compiling the code with -O2 -fsanitize=undefined) the code works just fine and the bug doesn't happen.

I'm going to see whether using that -fsanitize flag on our JDK build makes the build pass, which would hint at some kind of similarity between these issues.
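Some background on why undefined behaviour is the usual first suspect: in C++, signed integer overflow is undefined, so clang is allowed to assume it never happens and simplify comparisons on that basis. -fsanitize=undefined adds runtime checks for such operations; because those checks also change the generated code, a bug vanishing under UBSan is a hint rather than proof. The sketch below uses hypothetical function names and is not taken from Vim or the JDK; it only contrasts an overflow-dependent comparison with well-defined alternatives.

```cpp
#include <cassert>
#include <climits>

// Relies on signed overflow, which is undefined behaviour in C++: the
// optimizer may assume `i + 1` never wraps and fold this to `true`.
// UBSan (-fsanitize=undefined) would report the overflow for i == INT_MAX.
bool next_is_greater_ub(int i) {
  return i + 1 > i;
}

// Well-defined alternative: compare against the bound instead of adding.
bool next_is_greater_safe(int i) {
  return i < INT_MAX;
}

// Well-defined increment: check the bound before doing the addition.
int checked_increment(int i) {
  assert(i < INT_MAX && "increment would overflow");
  return i + 1;
}
```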

Have either of you tried compiling with more fine-grained optimisation settings? In particular, have you tried with -O3 but with vectorisation disabled?

Hello @endecotp

Have either of you tried compiling with more fine-grained optimisation settings? In particular, have you tried with -O3 but with vectorisation disabled?

When this issue showed up, I tried using -fno-vectorize and -fno-slp-vectorize (both with -O2 and -O3). The issue continues to reproduce against the JDK build even with those flags. If those aren't the flags you meant or if there are additional flags you want me to try, please let me know.

Have either of you tried compiling with more fine-grained optimisation settings? In particular, have you tried with -O3 but with vectorisation disabled?

Just tried turning off vectorization, and the bug still occurs for me. Looking at the disassembly, I don't think there's any weird vectorization going on.

I did force functions to be inlined and that removes the problem. I personally think the inlining is just exposing the problem though, as multiple inlined functions allowed the optimizer to optimize more.

Yes, those are the options I had in mind. Oh well, worth a try!

I did force functions to be inlined and that removes the problem. I personally think the inlining is just exposing the problem though, as multiple inlined functions allowed the optimizer to optimize more.

Small correction: I meant I forced functions to not be inlined and that removes the problem.
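For anyone repeating that experiment on a single function rather than a whole file, GCC and clang both accept __attribute__((noinline)). Below is a minimal sketch; parse_entry and parse_all are hypothetical names, not code from Vim or the JDK, and whether keeping one function out of line hides the miscompilation depends on where the bad code is actually generated.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper. `noinline` (a GCC/Clang attribute) keeps it as a
// separate call, so the optimizer cannot merge its body into the caller's
// loop and reason about both together.
__attribute__((noinline)) int parse_entry(const std::vector<int>& entries, int i) {
  assert(i >= 0 && i < static_cast<int>(entries.size()));
  return entries[i] * 2;
}

// Hypothetical caller with the same loop shape as the failing code.
int parse_all(const std::vector<int>& entries) {
  const int limit = static_cast<int>(entries.size());
  int sum = 0;
  for (int i = 0; i < limit; i++) {
    sum += parse_entry(entries, i);
  }
  return sum;
}
```

If a change like this makes the symptom go away, that is consistent with the view above that inlining only exposes the problem rather than causing it.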

Just as an update, the Homebrew issue thread above helped narrow down the responsible optimization pass for Vim by bisecting. If we build with -mllvm -enable-constraint-elimination=0, the issue goes away. Judging from what constraint elimination does, this sounds likely to be the same issue as the JDK's, where the compiler misunderstands the constraints. I'm experimenting with this flag a bit more to see how it behaves.
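To make "constraints" concrete: LLVM's ConstraintElimination pass uses conditions known to hold on a path (for example, from a dominating branch or a loop condition) to decide other comparisons at compile time. The functions below are a hypothetical illustration of the kind of comparison such a pass is designed to fold, not the code that triggers the bug. If the pass wrongly concluded that a fact holds, it could fold a check like i < limit to true, which would match the symptom in the JDK and Vim reports, though that root cause has not been confirmed in this thread.

```cpp
#include <cassert>

// Inside the `if`, the dominating condition `i < limit` is a known fact.
// From it, `i <= limit` follows, so a pass such as LLVM's
// ConstraintElimination may replace the inner comparison with `true`.
// The folding is only valid because the fact really holds on this path.
bool inner_check(int i, int limit) {
  if (i < limit) {
    return i <= limit;  // provably true on this path
  }
  return false;
}

// Same idea inside a loop: `i < limit` implies `i + 1 <= limit`, and
// `i + 1` cannot overflow here because i < limit <= INT_MAX.
int count_valid(int limit) {
  int n = 0;
  for (int i = 0; i < limit; i++) {
    if (i + 1 <= limit) {  // a candidate for compile-time folding
      n++;
    }
  }
  return n;
}
```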

If we build it with -mllvm -enable-constraint-elimination=0 it will remove the issue.

@y-c-c, thank you for that input and the discussion on the github thread where you provided the reproducer https://github.com/Homebrew/homebrew-core/issues/195325.

I have now verified that using -mllvm -enable-constraint-elimination=0 does indeed prevent this issue, and the OpenJDK build succeeds (a very minimal test of the built binary seems to work fine too). I have tried the flag with both -O2 and -O3, and it prevents the issue at both optimization levels. I've added this detail to the Apple feedback issue FB15162411. Hopefully this will help address the issue in an upcoming release of Xcode.

Thank you everyone for all the help in trying to narrow this down.
