Zombie processes for terminal commands

Hello.


We have encountered the following issue: we have a dynamic library that performs signature checks via codesign. The checks are performed using a pipe mechanism. The child runs execv("/usr/bin/codesign", "-dv --verbose=2 {full_path_to_library}") and the parent reads the output of the execv command.


When the library is loaded from a user-level application, everything works as expected: our library runs execv, a new codesign process starts and finishes its job successfully.


When the library is loaded from a daemon process, our library runs execv and a new codesign process starts and returns success, but it does not go away. The process is marked as a zombie. We tried to kill it programmatically using 'kill( pid, SIGKILL)' and also from the terminal using 'kill -9 pid'. We also tried every available parameter of the kill command, still with no success.


The zombie processes do not go away until the machine is restarted. The issue reproduces only on macOS 10.15.4; on 10.14.6 everything works as expected.


The issue also reproduces with other tools, including 'pkgutil', 'spctl', 'sysctl', 'date', 'id' and others.


Please let us know how we can kill these zombie processes and what we should do to stop them gracefully after the requested command has executed. Or, perhaps, whether this is a known issue with the latest macOS version.


I just want to mention that the library is a C++ library.


If there is anything we can collect from the system, please let us know.


Many thanks in advance!


P.S. I noticed a similar thread started in Xcode: https://forums.developer.apple.com/thread/133094

Replies

Is the parent cleaning up its children using waitpid() or similar? Or are you setting the signal action for SIGCHLD to SIG_IGN? Or using the double-fork technique? If you're not doing one of those, zombies are the expected result.
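To illustrate the first of those options, here is a minimal sketch of a parent that captures a child's standard output through a pipe and then reaps it with waitpid(). It is not the original poster's code: the helper name run_and_reap is hypothetical, and /bin/echo in the usage note is only a stand-in for the real tool. Error handling is kept deliberately thin.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] via execv, copies up to cap-1 bytes of
 * the child's stdout into out (NUL-terminated), then reaps the child.
 * Returns the child's exit code, or -1 on failure or abnormal termination. */
int run_and_reap(char *const argv[], char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: send stdout into the pipe, then replace the process image. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execv(argv[0], argv);
        _exit(127); /* reached only if execv failed */
    }

    /* Parent: close the write end so read() sees EOF when the child exits. */
    close(fds[1]);
    size_t used = 0;
    char buf[256];
    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) {
            for (ssize_t i = 0; i < n && used + 1 < cap; i++)
                out[used++] = buf[i];
        } else if (n == 0 || errno != EINTR) {
            break; /* EOF or a real error */
        }
    }
    out[used] = '\0';
    close(fds[0]);

    /* Reap the child; without this call it stays in the process table as a
     * zombie for as long as the parent keeps running. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called with an argument vector such as { "/bin/echo", "hello", NULL }, this should return 0 and leave "hello\n" in the buffer. Note that execv takes the arguments as separate array elements, not as one space-separated string.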


By the way, are you sure you can't get the results you need using the Code Signing Services APIs rather than spawning a subprocess?

Ken Thomases wrote:

are you sure you can't get the results you need using the Code Signing Services APIs rather than spawning a subprocess?

This! Exec’ing a command-line tool when there’s an API you can call is poor form.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

The parent is waiting for the children using waitid. Here is how we do it:


do
{
    // read from pipe until exit/timeout is reached
}
while( ( waitid( P_PID, childPid, &sigInfo, WEXITED | WSTOPPED | WNOHANG ) == 0)
       && (sigInfo.si_pid == 0)
       && (usleep( 1000 ) == 0)
       && (waitTimeMS++ <= max_timeout_ms) ); // this is the timeout check, but in our case it is not used


In the future we will enhance the signature verification check to use the Objective-C APIs, but this does not seem to be the root cause of our issue.


The zombie process is not a codesign-specific issue; it also reproduces when calling other tools, like 'pkgutil', 'spctl', 'sysctl', 'date', 'id' and others. I mentioned 'codesign' as an example because it is the one we call most often, and the majority of the zombie processes come from it.


Is the waitid approach a good one? Also, do you have any idea on what could be the root cause or do you have any guidance that will help us fix the issue?


In the meantime, we are working on implementing some of the suggestions.


Thank you!

waitid() is not documented. It is declared in the system headers, but has no man page (unlike, say, waitpid()). I'd be reluctant to use it.


Also, the loop you showed will end if waitid() returns -1. How do you distinguish that from the child having exited? And why are you specifying WSTOPPED? That doesn't indicate the child has exited and won't reap the (eventual) zombie.


Does it work any better using waitpid()?
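For comparison, a rough sketch of how the polling loop might look with waitpid() is given here. It treats the three possible results separately: the child's pid (reaped), 0 (still running), and -1 (error). The function name reap_with_timeout and the policy of killing the child on timeout are assumptions made for illustration, not something taken from the original code. The timeout is approximate, since it only counts 1 ms sleeps.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: polls for the exit of child `pid` for roughly
 * timeout_ms milliseconds.
 * Returns  0 if the child terminated in time (*status is filled in),
 *          1 if the timeout expired (the child is then killed and reaped),
 *         -1 if waitpid() itself failed (for example ECHILD). */
int reap_with_timeout(pid_t pid, int timeout_ms, int *status)
{
    const struct timespec tick = { 0, 1000000L }; /* 1 ms */

    for (int waited = 0; ; waited++) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;              /* terminated and reaped */
        if (r == -1 && errno != EINTR)
            return -1;             /* real error, not "still running" */
        if (r == 0) {
            if (waited >= timeout_ms)
                break;             /* still running, out of time */
            nanosleep(&tick, NULL);
        }
    }

    /* Timed out: terminate the child, then block until it is reaped so
     * that no zombie is left behind. */
    kill(pid, SIGKILL);
    while (waitpid(pid, status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 1;
}
```

Because WUNTRACED is not passed, waitpid() only reports children that have actually terminated, so a stopped child is never mistaken for one that has exited.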