Seems like an issue in Gatekeeper or syspolicyd: killing random sibling of gone process
Hello, I work in DevTools at Yandex and maintain our proprietary large-scale build system. Around the release of Catalina 10.15.4, our macOS users started to complain about random crashes during the build process.

What is known so far: some build process is killed for the following reason:

    default 19:56:59.128134+0300 kernel initiating malware scan (activeRulesVersion: 8593777743213535705 lastScanVersion: 8593777743213535705 chgtime: 1599757019 lastFileScanTime: 1599757018 pid: 36603 info_path: /Users/<ydx_private>/.ya/build/build_root/t21p/000a16/contrib/tools/python3/pycc/pycc proc_path: /Users/<ydx_private>/.ya/build/build_root/t21p/000a16/contrib/tools/python3/pycc/pycc
    default 19:56:59.128366+0300 kernel build_userspace_exit_reason: illegal flags passed from userspace (some masked off) 0x141, ns: 9, code 0x8
    default 19:56:59.127993+0300 taskgated no signature for pid=36603 (cannot make code: UNIX[No such file or directory])
    error 19:56:59.128313+0300 syspolicyd Unable (errno: 2) to read file at <private> for pid: 36603 process path: <private> library path: (null)
    error 19:56:59.128336+0300 syspolicyd Terminating process due to Malware rejection: 36603, <private>
    default 19:56:59.128390+0300 kernel Sleep interrupted, signal 0x100
    default 19:56:59.128406+0300 kernel Security policy would not allow process: 36603, /Users/<ydx_private>/.ya/build/build_root/t21p/000a16/contrib/tools/python3/pycc/pycc

The file to be scanned is a Python3 pycc tool that is built from source during the build and hard-linked from the build cache into the working directory where it is executed. That location is the working directory of a build command (we call it a build root). We create a separate directory tree for each command we execute and hard-link the built dependencies, including tools, into it.
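For readers unfamiliar with this layout, the build-root lifecycle described above can be sketched roughly as follows. This is only an illustration under assumptions: `run_in_build_root`, `cached_tool` and `work_dir` are hypothetical names, not part of the actual build system, and the sketch shows the hard-link, run, remove sequence rather than reproducing the crash.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List


def run_in_build_root(cached_tool: str, work_dir: str,
                      args: List[str]) -> subprocess.CompletedProcess:
    """Hard-link a cached tool into a fresh build root, run it, remove the root.

    Hypothetical helper: the names and layout are assumptions for illustration.
    """
    # One separate directory tree per executed command (the "build root").
    build_root = tempfile.mkdtemp(prefix="build_root_", dir=work_dir)
    try:
        linked_tool = os.path.join(build_root, os.path.basename(cached_tool))
        # Same inode as the cached file, reachable under a second path.
        # Requires the cache and the build root to be on one filesystem.
        os.link(cached_tool, linked_tool)
        return subprocess.run([linked_tool, *args], cwd=build_root,
                              capture_output=True, text=True)
    finally:
        # The build root is removed as soon as the command has finished, so
        # the path the tool was executed from stops existing right away.
        shutil.rmtree(build_root)
```

The point of interest is the `finally` clause: once the command exits, the executed path disappears immediately, while the same inode stays alive in the cache under a different name.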
From our build logs I know that the command in build_root/t21p/000a16/ has already finished, so its build root is being removed. The results are hard-linked into the build cache, so this build root is no longer needed. The pycc process that might have been the intended target of the kill has already finished and exited. So Gatekeeper arrives late, cannot find the process's file, and terminates some other sibling process run by our build system. The killed process is another hard link to the same tool, but in another build root (though this may be a coincidence).

Even more interesting (but rarer) cases happen when we disable build root cleanup completely. In this case I see:

    error 17:02:43.522820+0700 syspolicyd Unable (errno: 2) to read file at /Users/<ydx_private>/.ya/build/cache/7/rm/9f3ff5e2a5ecfc999b115c215a1d36a4-0/new/8121f31ef16b4222a8fd3843d90c46aeaa91ad04 for process path: /Users/<ydx_private>/.ya/build/cache/7/rm/9f3ff5e2a5ecfc999b115c215a1d36a4-0/new/8121f31ef16b4222a8fd3843d90c46aeaa91ad04 library path: (null)
    error 17:02:43.522953+0700 syspolicyd Terminating process due to Gatekeeper rejection: 15676, /Users/<ydx_private>/.ya/build/cache/7/rm/9f3ff5e2a5ecfc999b115c215a1d36a4-0/new/8121f31ef16b4222a8fd3843d90c46aeaa91ad04
    default 17:02:43.522995+0700 kernel build_userspace_exit_reason: illegal flags passed from userspace (some masked off) 0x141, ns: 9, code 0x8
    default 17:02:43.523040+0700 kernel Sleep interrupted, signal 0x100
    default 17:02:43.523058+0700 kernel Security policy would not allow process: 15676, /Users/<ydx_private>/.ya/build/cache/7/rm/9f3ff5e2a5ecfc999b115c215a1d36a4-0/new/8121f31ef16b4222a8fd3843d90c46aeaa91ad04

This is the same issue, but triggered by the delayed removal of a file during build cache garbage collection. This case is even more puzzling: in the first case the process to be scanned had been running shortly before, whereas in this case the file to be scanned had never been run from the printed location.

"rm" in the path means that the file was moved from the cache to a special location for transacted removal. We never execute anything from the build cache directly (only via hard links in build roots), and certainly never from the "rm" location.

This plainly doesn't seem right, so I am looking for any explanation and for hints on how to fix or work around this (other than disabling SIP completely, which our InfoSec would hardly approve). Please note that the tools are built and immediately needed as part of the code build process, so we simply cannot codesign and notarize them.

It also seems that the same issue was spotted in the wild by others: https://github.com/christopherfujino/catalina-crasher-demo . This looks like another manifestation of the same issue, and apparently it was caught before 10.15.4, but most of our reports started around 10.15.4, so the issue may have become more frequent or more likely in our setup.

I would appreciate any help in working around or completely resolving the issue. I would also be happy if Apple fixed this in some Catalina update.
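When matching kills like these against a build system's own process records, it can help to pull the PID and path out of the syspolicyd termination lines. The following is a minimal sketch that assumes the line layout seen in the two excerpts quoted in this post; `Termination` and `parse_termination` are hypothetical names, and other macOS versions or log settings may format these lines differently.

```python
import re
from typing import NamedTuple, Optional


class Termination(NamedTuple):
    timestamp: str
    reason: str  # "Malware" or "Gatekeeper" in the excerpts quoted above
    pid: int
    path: str    # may be the literal "<private>" when the log redacts it


# Pattern modelled only on the lines quoted in this post (an assumption).
_TERMINATION_RE = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4})\s+syspolicyd\s+"
    r"Terminating process due to (?P<reason>\w+) rejection: "
    r"(?P<pid>\d+), (?P<path>.+)$"
)


def parse_termination(line: str) -> Optional[Termination]:
    """Return the termination record for a syspolicyd line, or None."""
    match = _TERMINATION_RE.search(line.strip())
    if match is None:
        return None
    return Termination(match["ts"], match["reason"],
                       int(match["pid"]), match["path"])
```

Comparing the extracted PID with the PIDs the build system spawned, and with the time each build root was removed, shows whether the killed process ever ran from the reported path.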
Replies: 5, Boosts: 0, Views: 2k, Activity: Oct ’20