Hi!
I configured my system to collect core dumps on app crashes, but it turned out that in some cases the system becomes unresponsive after a user app crashes.
I could not reproduce the issue by crashing simple C or even C# programs (on the .NET runtime), but it always happens, for example, after crashes of our unit tests (NUnit framework, .NET 7 runtime): the corresponding core file appears on disk, but any attempt to copy or open that file hangs forever:
$ ls -la .../core.dotnet.5406.511
-r-------- 1 ... staff 30273536 Jan 2 09:19 .../core.dotnet.5406.511
$ cp /opt/buildAgent/temp/buildTmp/core.dotnet.5406.511 1
^C^C^C^C^C^C^C^C^C^C^Z(hangs)
At that point I cannot open Activity Monitor, Console, or Terminal (they just keep bouncing in the Dock), the sw_vers command hangs (as do lldb and sample), and according to ps these processes are stuck in an uninterruptible wait:
$ sw_vers
(hangs)^C
$ sudo lldb -p 5477
(hangs)^C
$ sudo sample -p 5477
(hangs)^C [Interrupted]
Not currently sampling -- exiting immediately.
$ ps ax | grep " U"
1 ?? Us 1:03.58 /sbin/launchd
5406 ?? U 0:59.11 .../dotnet exec [...]
5477 s000 U+ 0:00.00 cp .../core.dotnet.5406.511 1
5496 s001 S+ 0:00.00 grep U
After a reboot, the core file turns out not to have been saved (it is 0 bytes), and everything works fine again until the next such crash:
$ ls -la .../core.dotnet.5406.511
-r-------- 1 ... staff 0 Jan 2 09:19 .../core.dotnet.5406.511
Just in case, here are the entitlements of dotnet:
$ codesign -d --entitlement :- .../dotnet
Executable=.../dotnet
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-dyld-environment-variables</key>
<true/>
<key>com.apple.security.cs.disable-library-validation</key>
<true/>
<key>com.apple.security.cs.debugger</key>
<true/>
<key>com.apple.security.get-task-allow</key>
<true/>
</dict>
</plist>
I didn't see this issue on macOS Catalina 10.15.7. Is there anything I can help with?
--
.NET SDK 7.0.100
macOS Big Sur 11.7.1
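For anyone trying to reproduce the first step: the post does not show how core dumps were enabled or what the simple C test program looked like. As a rough sketch under that caveat, a test program could raise its own core-size soft limit before crashing, similar in spirit to raising the limit with `ulimit -c` in the shell. The helper name `enable_core_dumps` is made up for illustration.

```c
#include <sys/resource.h>

/* Hypothetical helper (not from the post): raise this process's soft
 * core-size limit to its hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* the soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

A crasher would call `enable_core_dumps()` and then `abort()`. Whether and where a core file is actually written still depends on system-wide settings that the post does not describe.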
Hi! I use the term "non-master" as a replacement for *****. I cannot find a documented, safe way to read the output from a non-master PTY because, unlike on Linux, on macOS it is discarded immediately after the child process exits. The behavior is the same whether I use forkpty() or open /dev/ptmx directly. Here is my example:
#undef NDEBUG
#define CHECK assert
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <util.h>
#else
#include <pty.h>
#endif
int main(void)
{
		int master;
		char name[PATH_MAX];
		pid_t pid = forkpty(&master, name, NULL, NULL);
		CHECK(pid != -1);
		if (pid == 0) {	// child
				//char *args[] = {"/bin/echo", "01234567890123456789012345678901234567890123456789", NULL};
				//execv(args[0], args);
				char buf[] =
						"01234567890123456789012345678901234567890123456789"	 // 50
						"01234567890123456789012345678901234567890123456789";	// 100
				CHECK(printf("%s", buf) == sizeof buf - 1);
				CHECK(fflush(stdout) == 0);
				exit(EXIT_SUCCESS);
		}
		// parent
#define BUFSIZE 64
		char buf[BUFSIZE];
		//int non_master = open(name, O_WRONLY);	// here!
		//CHECK(non_master != -1);
		sleep(2);	// simulate process scheduling
		int len;
		do {
				len = read(master, buf, BUFSIZE);
				if (len == -1) {
						printf("read = %d, err = %d\n", len, errno);	// EIO on Linux
						break;
				}
				printf("read %d bytes: %.*s\n", len, len, buf);
		} while (len != 0);
		int status;
		CHECK(waitpid(pid, &status, 0) == pid);
		CHECK(WIFEXITED(status));
		CHECK(WEXITSTATUS(status) == EXIT_SUCCESS);
		CHECK(close(master) != -1);
		return 0;
}
On Linux, the output is:
$ ./a.out
read 64 bytes: 0123456789012345678901234567890123456789012345678901234567890123
read 36 bytes: 456789012345678901234567890123456789
read = -1, err = 5
but on macOS:
$ ./a.out
read 0 bytes:
There is a known workaround: keep an open non-master fd in the parent process (see unix.stackexchange.com/a/478969). But it really looks like a hack, which, by the way, causes a bug on Linux (read() never returns), and it isn't documented anywhere, right? So from this point of view, the code above should work as well as it does on Linux. Many projects use PTYs this way (without a non-master fd), and it turns out that this does not work on macOS.
For example, Python's Pexpect will fail on macOS if there is a delay between spawn() and the first .expect():
import pexpect, time
child = pexpect.spawn('python3 -c \'print("%s", flush=True)\'' % ("0123456789" * 10))
time.sleep(2)
child.expect("0123456789" * 10)
This results in:
$ python3 pexpect_test.py
Traceback (most recent call last):
	File "pexpect_test.py", line 5, in <module>
		child.expect("0123456789" * 10)
	File "/usr/local/lib/python3.8/site-packages/pexpect/spawnbase.py", line 343, in expect
		return self.expect_list(compiled_pattern_list,
	File "/usr/local/lib/python3.8/site-packages/pexpect/spawnbase.py", line 372, in expect_list
		return exp.expect_loop(timeout)
	File "/usr/local/lib/python3.8/site-packages/pexpect/expect.py", line 179, in expect_loop
		return self.eof(e)
	File "/usr/local/lib/python3.8/site-packages/pexpect/expect.py", line 122, in eof
		raise exc
pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.
Is this expected behavior that should be documented in the openpty(3) manual page, or is it a bug in macOS Catalina 10.15.7?
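For reference, the workaround mentioned above (keeping a non-master fd open in the parent) could be sketched roughly as follows. This is an illustration under stated assumptions, not a documented or verified fix. It swaps forkpty() for posix_openpt() so that the parent already holds the non-master fd before the child exists, and it polls a non-blocking master fd together with waitpid(WNOHANG), so that the never-returning read() described for Linux should not occur. The helper name `pty_capture` and the 50 ms poll timeout are made up for illustration, and the macOS behavior is assumed from the linked answer rather than confirmed here.

```c
#define _GNU_SOURCE /* glibc: declares posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the post): runs argv on a new PTY and stores
 * up to cap - 1 bytes of its output in out, NUL-terminated.
 * Returns the number of bytes stored, or -1 if the setup fails. */
long pty_capture(char *const argv[], char *out, size_t cap)
{
    if (cap == 0)
        return -1;
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master == -1)
        return -1;
    const char *name;
    int keep = -1; /* non-master fd that the parent keeps open */
    if (grantpt(master) == -1 || unlockpt(master) == -1 ||
        (name = ptsname(master)) == NULL ||
        (keep = open(name, O_RDWR | O_NOCTTY)) == -1) {
        close(master);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(keep);
        close(master);
        return -1;
    }
    if (pid == 0) { /* child: use the non-master side as its terminal */
        close(master);
        setsid();
        ioctl(keep, TIOCSCTTY, 0); /* best effort, result ignored */
        dup2(keep, STDIN_FILENO);
        dup2(keep, STDOUT_FILENO);
        dup2(keep, STDERR_FILENO);
        if (keep > STDERR_FILENO)
            close(keep);
        execv(argv[0], argv);
        _exit(127);
    }
    /* parent: `keep` stays open, which is the whole workaround */
    fcntl(master, F_SETFL, fcntl(master, F_GETFL) | O_NONBLOCK);
    size_t n = 0;
    int exited = 0, status;
    while (n < cap - 1) {
        if (!exited && waitpid(pid, &status, WNOHANG) == pid)
            exited = 1;
        struct pollfd p = { .fd = master, .events = POLLIN };
        if (poll(&p, 1, 50) > 0) {
            ssize_t r = read(master, out + n, cap - 1 - n);
            if (r > 0) {
                n += (size_t)r;
                continue;
            }
            if (r == 0 || (errno != EAGAIN && errno != EINTR))
                break;
        } else if (exited) {
            break; /* child had exited before this poll and nothing is left */
        }
    }
    out[n] = '\0';
    close(keep);
    close(master);
    if (!exited)
        waitpid(pid, &status, 0);
    return (long)n;
}
```

Calling it with `char *argv[] = {"/bin/echo", "hello", NULL};` and a 64-byte buffer should store the echoed text. The captured output usually ends in `\r\n` rather than `\n`, because the terminal's output processing typically translates newlines.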