If we build it with -mllvm -enable-constraint-elimination=0 it will remove the issue.
@y-c-c, thank you for that input and for the discussion on the GitHub thread where you provided the reproducer https://github.com/Homebrew/homebrew-core/issues/195325.
I have now verified that using -mllvm -enable-constraint-elimination=0 does indeed prevent this issue and the OpenJDK build succeeds (a very minimal test of the built binary seems to work fine too). I have tried the flag with both -O2 and -O3, and it prevents the issue at both optimization levels. I've added this detail to the Apple feedback issue FB15162411. Hopefully this will help address the issue in an upcoming release of Xcode.
Thank you everyone for all the help in trying to narrow this down.
Hello @endecotp
Have either of you tried compiling with more fine-grained optimisation settings? In particular, have you tried with -O3 but with vectorisation disabled?
When this issue showed up, I tried using -fno-vectorize and -fno-slp-vectorize (with both -O2 and -O3). The issue still reproduces against the JDK build even with those flags. If those aren't the flags you meant, or if there are additional flags you want me to try, please let me know.
Hello @y-c-c
I couldn't add a comment to that JDK thread (is there no way to register an account?)
Registering for an account on the JDK's issue tracker requires the "Author" role. That role is explained here https://openjdk.org/guide/#becoming-an-author and it boils down to requiring a few prior contributions to the project. To file new bugs, of course, one doesn't need an account and can instead use https://bugs.java.com/bugdatabase/. But in this case, where you wanted to comment on an existing issue, you are right that it isn't possible to create an account for that.
but I posted about another Xcode clang miscompilation bug about a week ago as well (https://developer.apple.com/forums/thread/766030). It doesn't look identical to what the JDK bug is reporting but there are some similarities, including wrong codegen that leads to subtle offset when looping, and only happening in -O2/-O3.
Thank you for pointing me to that. I hadn't noticed that thread. In that thread you note:
I did try to see if there are some undefined behaviors or something which caused the optimizer to go to town with the code. Funnily when I use UBSAN (by compiling the code with -O2 -fsanitize=undefined) the code works just fine and the bug doesn't happen.
I'm going to see if using that -fsanitize flag against our JDK build results in the build passing and thus hinting at some kind of similarity between these issues.
Hello Quinn,
It’s definitely landed in the right place. Sadly, I’ve no info to share beyond that.
Thank you very much for taking a look and confirming. I'll continue to wait for an update in that issue then.
Later today I will go ahead and report this, through Feedback Assistant, as log noise.
I've filed FB15498510 suggesting this log message be suppressed.
Hello Quinn,
Apple’s frameworks tend to use lazy initialisation, ...
Thank you for that detail. That (and the rest of what you note in your answer) addresses my curiosity.
Taking a big step back, the standard way to get the macOS version from the command line
Noted. However, in the context where this issue shows up, the command line way of determining the OS version isn't applicable for us. For context, this unexpected log message was reported as a bug against the JDK https://bugs.openjdk.org/browse/JDK-8340727. The JDK internally uses NSProcessInfo's operatingSystemVersion property to determine the OS version, which then triggers the log.
Well, you can report it as log noise if you like, but it’s definitely not a sign of an actual problem.
Thank you for your answers and the detailed technical explanations.
Thank you Quinn for that explanation and the example. That helps.
Foundation was created as part of Apple’s (well, NeXT’s) app development story. For this reason it contains a component, UserDefaults, with some helpful, but non-obvious, behaviour: You can override specific user defaults by passing them on the command line.
More out of curiosity: for a property like NSProcessInfo's operatingSystemVersion, which I guess will always be fixed on a given host, does it still internally need or use any user-overridable values that require parsing the command line of a process?
Hello Quinn,
To be clear, that’s most definitely a bug. I encourage you to file a bug report about it, even if you can’t reproduce it yourself. Ideally that bug report would include a sysdiagnose log taken by one of your users just after they reproduce it.
Please post your bug number, just for the record.
I have now created a bug through Feedback Assistant. The ID is FB15368430. I've attached the trivial .c code which reproduces this on several of the hosts that I have run it against. For now, I don't have access to sysdiagnose output. I will check whether it can be shared from one of the hosts that we reproduce this issue on, and will upload it to that issue once I get access to those logs.
Now, I can’t guarantee that the solution to that bug might be that we add ENOEXEC to the man page, but someone from the networking team needs to make that call.
Understood. In fact, more than a man page update, what would be useful once the relevant team finds the root cause is one or more of: details of what exactly causes this error, advice to application developers (those who call setsockopt) on what they are expected to do about it, or advice to network administrators on how to fix or update their configurations to prevent it.
For the record, during investigation of this issue, I've experimented with retrying the "setsockopt" when it fails with this errno, just to see if there is some kind of race or some such thing. But that hasn't helped - the subsequent call returns back with the same error.
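For reference, the retry experiment described above could look roughly like the following sketch. The helper name setsockopt_retry_enoexec and its max_attempts/pause_secs parameters are hypothetical, introduced only for illustration; it retries only when setsockopt() fails with ENOEXEC and returns immediately on success or on any other error. As noted, in practice the repeated call failed with the same error, so this is a diagnostic aid, not a workaround.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): calls setsockopt() and, if it
 * fails with ENOEXEC, retries up to max_attempts times (must be >= 1),
 * sleeping pause_secs seconds between attempts. Any other error is
 * returned immediately. On failure, errno holds the error from the last
 * setsockopt() call.
 */
static int setsockopt_retry_enoexec(int fd, int level, int optname,
                                    const void *optval, socklen_t optlen,
                                    int max_attempts, unsigned int pause_secs) {
    int rc = -1;
    int last_err = 0;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = setsockopt(fd, level, optname, optval, optlen);
        if (rc == 0) {
            return 0;                 /* option applied */
        }
        last_err = errno;             /* save before sleep() can change it */
        if (last_err != ENOEXEC) {
            break;                    /* not the error being investigated */
        }
        if (attempt < max_attempts) {
            sleep(pause_secs);        /* give any transient condition time to clear */
        }
    }
    errno = last_err;
    return rc;
}
```

With the bound socket fd and the mreq from the reproducer, it would be called as setsockopt_retry_enoexec(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq), 3, 1).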
Is the uptick you mentioned correlated with macOS 15’s release? The firewall got a major rework in that release.
I can say for certain that this issue isn't related to the macOS 15 release. None of the reports (including those from hosts in our internal setups) have been against that version. In fact, we haven't yet started using it in our setup.
Have you looked to see if reports of the problem are always from those folks using the firewall?
With help from admins who have access to some of these hosts, I know that some of the hosts on which this issue reproduces have the firewall disabled. Specifically, on those hosts, "Settings" -> "Network" -> "Firewall" says "This computer's firewall is currently turned off. ...."
If there's anything else that you or others would like to know to narrow this down, please let me know, either here or in the feedback issue FB15368430.
Hello Quinn,
I am in the middle of investigating an issue arising in the call to setsockopt syscall where it returns an undocumented and unexpected errno.
What’s that value?
Calling setsockopt with the IP_ADD_MEMBERSHIP option on a bound IPv4 SOCK_DGRAM socket returns -1 with errno set to 8, which strerror() reports as "Exec format error". The setsockopt reproducer is very simple:
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Error, expected usage: <program> <multicast-ip-address> <network-interface-ip-address>\n");
        fprintf(stderr, "example usage: ./a.out 225.4.5.6 192.168.1.2\n");
        return -1;
    }
    char *mcast_join_group_addr = argv[1];
    char *network_intf_addr = argv[2];
    fprintf(stderr, "test will join multicast group address = %s of network interface address = %s\n",
            mcast_join_group_addr, network_intf_addr);
    // create a datagram IPv4 socket
    int type = SOCK_DGRAM;
    int domain = AF_INET;
    int fd = socket(domain, type, 0);
    if (fd < 0) {
        fprintf(stderr, "FAILED to create socket, errno %d - %s\n", errno, strerror(errno));
        return -1;
    }
    fprintf(stderr, "SOCK_DGRAM socket created, fd=%d\n", fd);
    // bind the socket to the wildcard address (0.0.0.0) and an ephemeral port
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = 0;
    inet_pton(AF_INET, "0.0.0.0", &(sa.sin_addr.s_addr));
    socklen_t len = sizeof(sa);
    int b = bind(fd, (struct sockaddr *) &sa, len);
    if (b < 0) {
        fprintf(stderr, "failed to bind: errno=%d - %s\n", errno, strerror(errno));
        close(fd);
        return -1;
    }
    fprintf(stderr, "socket successfully bound\n");
    // set IP_ADD_MEMBERSHIP socket option on the socket
    struct ip_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    // multicast group address and local interface IP address;
    // inet_pton() returns 1 only when the string is a valid IPv4 address
    if (inet_pton(AF_INET, mcast_join_group_addr, &(mreq.imr_multiaddr.s_addr)) != 1
            || inet_pton(AF_INET, network_intf_addr, &(mreq.imr_interface.s_addr)) != 1) {
        fprintf(stderr, "invalid IPv4 address argument\n");
        close(fd);
        return -1;
    }
    int opt = IP_ADD_MEMBERSHIP;
    void *optval = (void *) &mreq;
    socklen_t optlen = sizeof(mreq);
    fprintf(stderr, "setting IP_ADD_MEMBERSHIP on socket\n");
    int n = setsockopt(fd, IPPROTO_IP, opt, optval, optlen);
    if (n < 0) {
        fprintf(stderr, "FAILED - setsockopt(IP_ADD_MEMBERSHIP) returned %d with errno %d - %s\n",
                n, errno, strerror(errno));
        close(fd);
        return -1;
    }
    close(fd);
    fprintf(stderr, "SUCCESSFUL completion of the test\n");
    return 0;
}
The fact that errno is set to (or at least interpreted as) ENOEXEC is surprising, since the setsockopt man page makes no mention of that error for this call.
My guess is that some specific filter/extension code gets run as part of the setsockopt syscall. Reading through https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/NKEConceptual/socket_nke/socket_nke.html I suspected it could be a socket filter.
This issue was first reported to the JDK team around a decade ago (https://bugs.openjdk.org/browse/JDK-8144003), but only recently have we started noticing it more frequently in our setups. It could be something specific to our macOS hosts, but at this point I don't know which tools, commands or options I should use to understand what code within setsockopt is interfering here.
Would you happen to know of any tracing tool (ktrace?) that might help narrow this down further? The system logs (viewed through the Console app) haven't shown anything specific.
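To make the value easier to recognise in logs, a reproducer could report it symbolically. The sketch below is a hypothetical helper (describe_setsockopt_errno is not an existing API); it relies only on ENOEXEC having the value 8 on macOS, which matches the errno observed here, and on strerror() for the message text.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): formats the errno left by a
 * failed setsockopt() call into buf and flags ENOEXEC (errno 8,
 * "Exec format error"), which the setsockopt man page does not list.
 */
static void describe_setsockopt_errno(int err, char *buf, size_t buflen) {
    snprintf(buf, buflen, "errno %d - %s%s", err, strerror(err),
             err == ENOEXEC ? " [ENOEXEC: not documented for setsockopt]" : "");
}
```

In the reproducer's failure branch, errno should be copied into a local variable immediately after setsockopt() returns and that copy passed to the helper, so that no intervening library call can change it.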
Having said that, the netstat output you posted makes it clear that all the filters currently attached were attached by the OS. You are not dealing with third-party code here.
That's good to know. In the context of multicasting (or more specifically the setsockopt IP_ADD_MEMBERSHIP option), do these OS-attached filters play any role or apply any specific rules that I should be aware of?
Hello Marten,
Is the issue public? I'm getting a "Feedback Not Found" under this link.
The issue isn't public. None of the issues filed through Feedback Assistant are public. It's still an open issue and we very regularly run into it. I have been told in a different discussion that the issue is being investigated by Apple. There's no fix for it right now.
This bug has cost me a few days of debugging work to track down flaky test failures in quic-go. It also seems to be the root cause behind https://github.com/golang/go/issues/67226.
I am not from Apple, but my recommendation would be to file a Feedback Assistant issue of your own with these details (and any others) so that this gets additional attention. When filing that issue, I would recommend following Quinn's suggestions here https://forums.developer.apple.com/forums/thread/751587?answerId=787971022#787971022 (specifically, choose Developer Technologies & SDKs at the top level).
P.S.: I didn't receive any notification from this thread when you posted your message. I only happened to view this thread by accident today and noticed your post.
Hello Quinn,
So, that last one FB9997771 was reported as fixed in the 2022 OS releases (so, macOS 13 and friends).
Thank you for that detail, Quinn. I am very happy to hear that one is officially fixed. It had caused really odd intermittent failures in unexpected areas within the JDK and had taken us a long time to narrow down. I will run our tests against macOS 13 and higher to verify the fix.
I ran our reproducer against macOS 12.x, 13.x and 14.x on aarch64. I can confirm that the issue is fixed and no longer reproducible in 13.x and 14.x. The issue continues to reproduce in 12.x on aarch64.
That should’ve been communicated to you but wasn’t. I’m not sure why. I’ll follow-up on that internally, just for my own understanding, but this is all about Apple internal processes so I probably won’t post any more details here.
I understand.
The other two are still under investigation; I don’t have any further info to share.
Thank you for that update.
Bug Reporting: How and Why? has a bunch of hints and tips on this front, but probably the most important is this one:
If you’re filing a bug against an API, choose Developer Technologies & SDKs at the top level.
This is useful. So far I've been filing them under the "Something else not in this list" category for several of the bugs that I've opened related to either network APIs or kernel APIs, including the one that I opened in this discussion. Henceforth, I'll keep that category in mind.
Overall, thank you again for all the help and responses you have been providing, not just in this thread but in other previous discussions too. Not receiving any updates or responses on Feedback Assistant issues is demotivating, but seeing the responses here in the developer forums, and being assured that the issues have been noticed and are being investigated, does encourage me to file new ones.
Thank you Quinn for your help so far. I have now filed FB13799990 to track this issue and attached the same reproducer to it.
On a different note, during the past couple of years I have filed a few issues through Feedback Assistant, two of which belong to the networking area. They are still open and haven't seen any acknowledgement or response. In some other channels, I have been told (by people who aren't from Apple) that when such issues in Feedback Assistant don't see any response, it most likely means they haven't been triaged, and the suggestion is to refile them afresh. I don't know if I should be doing that. The feedback IDs for those other issues are FB12128351, FB12016446 and FB9997771. Would you have any input on whether I should refile these issues or just leave them alone and hope someone responds to them?
Hello Quinn, I don't have data on whether applications still rely on out-of-band data or whether they send it. However, since this is a socket-level option, the JDK, as part of the Java specification, exposes an API on its java.net.Socket class that lets applications enable or disable it https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/net/Socket.html#setOOBInline(boolean). To verify that the Java API works as expected on all operating systems, we run tests for that API. In fact, that's what prompted me to look into this issue when we noticed it failing (only) on macOS. For context, here's the JDK issue https://bugs.openjdk.org/browse/JDK-8279920 where we have been tracking this.
Specifically, if a peer sends out-of-band data (for whatever reason), the application can still end up receiving that data unexpectedly, even if it has decided it isn't interested in out-of-band data (which is the default).
@javadev12345, since you note that this happens with jpackage and is reproducible even with the recently released Java 20, I would recommend that you open an issue here https://bugreport.java.com/bugreport/start_form with all relevant details, including the commands that you use and whether this is on macOS x64 or M1, so that someone from the jpackage team can take a look.
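At the C level, the socket-level option behind Java's Socket.setOOBInline is, as far as I understand, SO_OOBINLINE. The sketch below shows how that option is set and queried on a TCP socket; the helper names set_oob_inline and get_oob_inline are hypothetical. BSD-derived kernels may return the raw option flag rather than 1 from getsockopt(), so the result is normalised to 0 or 1. With SO_OOBINLINE disabled (the default), an ordinary recv() is not expected to return the urgent byte; the behaviour described above is that on macOS it can.

```c
#include <sys/socket.h>

/* Hypothetical helper: enables (on != 0) or disables SO_OOBINLINE.
   Returns 0 on success, -1 with errno set on failure. */
static int set_oob_inline(int fd, int on) {
    int value = on ? 1 : 0;
    return setsockopt(fd, SOL_SOCKET, SO_OOBINLINE, &value, sizeof(value));
}

/* Hypothetical helper: returns 1 if SO_OOBINLINE is enabled, 0 if it is
   disabled, -1 on error. Any nonzero option value counts as enabled. */
static int get_oob_inline(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_OOBINLINE, &value, &len) < 0) {
        return -1;
    }
    return value != 0;
}
```

A test along the lines of the JDK one would connect two TCP sockets over loopback, leave SO_OOBINLINE disabled on the receiver, send a byte with send(..., MSG_OOB), and check that normal reads never return it.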
I remember that in the past there was at least one similar issue which I think was addressed in https://bugs.openjdk.org/browse/JDK-8276150 and https://bugs.openjdk.org/browse/JDK-8277493. This could be a different variant of the issue though.