launchd and other system apps get stuck in uninterruptible wait after generating core dump
Hi! I configured my system to collect core dumps when an app crashes, but it turned out that in some cases the system becomes unresponsive after a user app crashes. I was unable to reproduce the issue by crashing simple C or even C# (.NET runtime) programs, but it always happens, for example, after crashes of our unit tests (NUnit framework, .NET 7 runtime): the corresponding core file appears on disk, but any attempt to copy or open that file hangs forever:

    $ ls -la .../core.dotnet.5406.511
    -r-------- 1 ... staff 30273536 Jan 2 09:19 .../core.dotnet.5406.511
    $ cp /opt/buildAgent/temp/buildTmp/core.dotnet.5406.511 1
    ^C^C^C^C^C^C^C^C^C^C^Z(hangs)

At that point I cannot open Activity Monitor, Console, or Terminal (they just keep bouncing in the Dock), the sw_vers command doesn't work (nor do lldb and sample), and according to ps these processes are stuck in an uninterruptible wait:

    $ sw_vers
    (hangs)^C
    $ sudo lldb -p 5477
    (hangs)^C
    $ sudo sample -p 5477
    (hangs)^C
    [Interrupted]
    Not currently sampling -- exiting immediately.
    $ ps ax | grep "  U"
        1 ??    Us     1:03.58 /sbin/launchd
     5406 ??    U      0:59.11 .../dotnet exec [...]
     5477 s000  U+     0:00.00 cp .../core.dotnet.5406.511 1
     5496 s001  S+     0:00.00 grep   U

After a reboot, the core file turns out not to have been saved (its size is 0), and everything works fine again until the next such crash:

    $ ls -la .../core.dotnet.5406.511
    -r-------- 1 ... staff 0 Jan 2 09:19 .../core.dotnet.5406.511

Just in case, here are the entitlements of dotnet:

    $ codesign -d --entitlement :- .../dotnet
    Executable=.../dotnet
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>com.apple.security.cs.allow-jit</key>
        <true/>
        <key>com.apple.security.cs.allow-dyld-environment-variables</key>
        <true/>
        <key>com.apple.security.cs.disable-library-validation</key>
        <true/>
        <key>com.apple.security.cs.debugger</key>
        <true/>
        <key>com.apple.security.get-task-allow</key>
        <true/>
    </dict>
    </plist>

I didn't see this issue on macOS Catalina 10.15.7. Is there anything I can help with?

--
.NET SDK 7.0.100
macOS Big Sur 11.7.1
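The `ps ax | grep "  U"` check above can also be done by parsing the STAT column directly, which avoids matching the grep process itself. The following is only a minimal sketch: the function name `uninterruptible` and the assumed five-column `ps ax` layout (PID, TT, STAT, TIME, COMMAND) are illustrative assumptions, and the `U` state letter is the macOS marker for uninterruptible wait (Linux uses `D` instead).

```python
import subprocess


def uninterruptible(ps_text, flag="U"):
    """Return (pid, stat, command) for rows of `ps ax` output whose STAT starts with flag.

    Assumes the default five-column layout: PID TT STAT TIME COMMAND.
    """
    rows = []
    for line in ps_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, _tty, stat, _time, command = fields
        if stat.startswith(flag):
            rows.append((int(pid), stat, command))
    return rows


if __name__ == "__main__":
    # Live usage: list processes currently in uninterruptible wait.
    listing = subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout
    for pid, stat, command in uninterruptible(listing):
        print(pid, stat, command)
```

Because the filter looks only at the STAT field, a command line that happens to contain the letter U is not reported.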
Replies: 0, Boosts: 0, Views: 386, Activity: Jan ’23
Non-Master PTY output discarded on child process exit
Hi! I use the term "non-master" as a replacement for *****. I cannot find a documented, safe way to read the output of the non-master side of a PTY because, unlike on Linux, on macOS it is discarded as soon as the child process exits. The situation is the same regardless of whether I use forkpty() or open /dev/ptmx directly. Here is my example:

    #undef NDEBUG
    #define CHECK assert
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #if defined(__APPLE__)
    #include <util.h>
    #else
    #include <pty.h>
    #endif

    int main(void)
    {
        int master;
        char name[PATH_MAX];
        pid_t pid = forkpty(&master, name, NULL, NULL);
        CHECK(pid != -1);
        if (pid == 0) { // child
            //char *args[] = {"/bin/echo", "01234567890123456789012345678901234567890123456789", NULL};
            //execv(args[0], args);
            char buf[] =
                "01234567890123456789012345678901234567890123456789"  // 50
                "01234567890123456789012345678901234567890123456789"; // 100
            CHECK(printf("%s", buf) == sizeof buf - 1);
            CHECK(fflush(stdout) == 0);
            exit(EXIT_SUCCESS);
        }
        // parent
    #define BUFSIZE 64
        char buf[BUFSIZE];
        //int non_master = open(name, O_WRONLY); // here!
        //CHECK(non_master != -1);
        sleep(2); // simulate process scheduling
        int len;
        do {
            len = read(master, buf, BUFSIZE);
            if (len == -1) {
                printf("read = %d, err = %d\n", len, errno); // EIO on Linux
                break;
            }
            printf("read %d bytes: %.*s\n", len, len, buf);
        } while (len != 0);
        int status;
        CHECK(waitpid(pid, &status, 0) == pid);
        CHECK(WIFEXITED(status));
        CHECK(WEXITSTATUS(status) == EXIT_SUCCESS);
        CHECK(close(master) != -1);
        return 0;
    }

On Linux, the output is:

    $ ./a.out
    read 64 bytes: 0123456789012345678901234567890123456789012345678901234567890123
    read 36 bytes: 456789012345678901234567890123456789
    read = -1, err = 5

but on macOS:

    $ ./a.out
    read 0 bytes:

There is a known workaround: keep an open non-master fd in the parent process (see unix.stackexchange.com/a/478969). But it really looks like a "hacky" way, which btw causes a bug on Linux (read() never returns), and which isn't documented anywhere, right? So from this point of view, the code above should work as well as it does on Linux, and many projects use PTYs this way (without a non-master fd), which turns out not to work on macOS.
For example, Python's Pexpect will fail on macOS if there is a delay between spawn() and the first .expect():

    import pexpect, time

    child = pexpect.spawn('python3 -c \'print("%s", flush=True)\'' % ("0123456789" * 10))
    time.sleep(2)
    child.expect("0123456789" * 10)

results in:

    $ python3 pexpect_test.py
    Traceback (most recent call last):
      File "pexpect_test.py", line 5, in <module>
        child.expect("0123456789" * 10)
      File "/usr/local/lib/python3.8/site-packages/pexpect/spawnbase.py", line 343, in expect
        return self.expect_list(compiled_pattern_list,
      File "/usr/local/lib/python3.8/site-packages/pexpect/spawnbase.py", line 372, in expect_list
        return exp.expect_loop(timeout)
      File "/usr/local/lib/python3.8/site-packages/pexpect/expect.py", line 179, in expect_loop
        return self.eof(e)
      File "/usr/local/lib/python3.8/site-packages/pexpect/expect.py", line 122, in eof
        raise exc
    pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.

Is this the expected behavior, which should then be specified in the openpty(3) manual page, or is it a bug in macOS Catalina 10.15.7?
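For reference, the workaround mentioned above (keeping the non-master fd open in the parent) might look roughly like this using only the Python standard library. This is a sketch under assumptions, not a documented or verified fix: the helper name `capture_pty_output` is hypothetical, and because an open non-master fd means read() on the master never reports end of output on Linux, the sketch polls with select() and stops once the child has exited and nothing more is buffered.

```python
import os
import select
import subprocess
import sys


def capture_pty_output(argv, chunk=1024):
    """Run argv with its stdio attached to a PTY; return (output_bytes, exit_code)."""
    master_fd, other_fd = os.openpty()  # other_fd is the non-master end
    try:
        # The parent deliberately keeps other_fd open for the whole run, so the
        # child's exit is not the last close of the non-master side.
        proc = subprocess.Popen(argv, stdin=other_fd, stdout=other_fd, stderr=other_fd)
        out = bytearray()
        exited = False
        while True:
            # While the child runs, wait briefly; after it exits, only poll.
            timeout = 0 if exited else 0.1
            ready, _, _ = select.select([master_fd], [], [], timeout)
            if ready:
                data = os.read(master_fd, chunk)
                if not data:
                    break  # unexpected EOF on the master side
                out += data
                continue
            if exited:
                break  # child is gone and the buffer is drained
            exited = proc.poll() is not None
        proc.wait()
        return bytes(out), proc.returncode
    finally:
        os.close(other_fd)
        os.close(master_fd)


if __name__ == "__main__":
    code = "import sys; sys.stdout.write('0123456789' * 10); sys.stdout.flush()"
    output, status = capture_pty_output([sys.executable, "-c", code])
    print(status, len(output))
```

Draining with select() instead of a blocking read() is what avoids the "read() never returns" problem described for Linux, at the cost of relying on the child's exit status rather than EOF to detect the end of output.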
Replies: 1, Boosts: 0, Views: 818, Activity: Oct ’20