I'm using a Mac Mini as a Jenkins agent to run our Xcode tests on physical iOS devices. It's configured for remote access over SSH and screen sharing over VNC. Every few days both services start failing completely. Sometimes one of them stays up a bit longer, but usually both are down.
The GUI says both services are running, and the correct ports are listening.
In Console I can see that the sshd process exits with 255 the instant it's started, but I haven't been able to get anything more specific.
I've found that I can get SSH & VNC access back with:
launchctl bootout system/<svc name>
launchctl disable system/<svc name>
launchctl enable system/<svc name>
launchctl bootstrap system <plist file name>
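
For reference, the sequence above could be wrapped in a small script so it's easy to run for each service. This is only a sketch: restart_service and the LAUNCHCTL override are hypothetical names of my own, the service name and plist path are passed in by the caller, and it assumes it's run as root.

```shell
#!/bin/sh
# Sketch: re-register a launchd system service using the same four steps
# as above. Both arguments are supplied by the caller; nothing here is
# specific to sshd or screen sharing.

restart_service() {
    svc="$1"     # service name, as used in system/<svc name>
    plist="$2"   # path to the service's plist file
    # Hypothetical override so the sequence can be dry-run (LAUNCHCTL=echo).
    launchctl="${LAUNCHCTL:-launchctl}"

    # bootout can fail if the service is already unloaded; carry on anyway.
    "$launchctl" bootout "system/$svc" || true
    "$launchctl" disable "system/$svc" &&
    "$launchctl" enable "system/$svc" &&
    "$launchctl" bootstrap system "$plist"
}
```

Setting LAUNCHCTL=echo before calling restart_service prints the four commands instead of running them.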
The problem is that I can't tell on the remote machine itself that it's not accessible by SSH/VNC. The various interfaces all report that everything's fine.
When I run launchctl print on each service, there are some differences between the non-working and working states. I don't know if these are actual indicators that the service is down, or artifacts of the way I restarted them. The differences are consistent for both VNC & SSH:
Not working but apparently running:
path = (submitted by smd.215)
submitted job. ignore execute allowed.
system service = 0
Working after the launchctl bootout/bootstrap sequence above:
path = /System/Library/LaunchDaemons/<plist file>
system service = 1
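
If the "submitted by" path does turn out to be a real indicator (I don't know that it is), a check along these lines could flag it. This is a sketch only: classify_launchctl_print is a made-up helper name, and it keys purely on the one difference shown above.

```shell
#!/bin/sh
# Sketch: read `launchctl print` output on stdin and print "suspect" if the
# path line looks like the non-working state above, otherwise "ok".
# Assumption: the "(submitted by ...)" path really does indicate a problem.

classify_launchctl_print() {
    if grep -q 'path = (submitted by'; then
        echo suspect
    else
        echo ok
    fi
}
```

It would be used as: launchctl print system/<svc name> | classify_launchctl_print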
So, a few questions:
- Has anyone else seen this?
- Is there some way to get more error information about why sshd is exiting in the logs/console?
- Is there a way to detect that sshd is failing, even though there's no system log entry for the failure, and the various interfaces show that everything's fine?
- What's the cleanest way to tell the system to restart remote access every day just in case it can't be identified any other way?
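
On the detection question: since the port is still listening when sshd exits immediately, a plain port check won't catch this, so the probe would need sshd to actually answer. One idea, sketched below, is to run ssh-keyscan from another machine (for example the Jenkins controller) and treat empty output as a failure. ssh_answers and the KEYSCAN override are hypothetical names, the 5 second timeout is arbitrary, and this only covers SSH, not VNC.

```shell
#!/bin/sh
# Sketch: succeed only if the host returns at least one SSH host key.
# A listening port alone is not enough; sshd has to complete the exchange.

ssh_answers() {
    host="$1"
    port="${2:-22}"
    # Hypothetical override so the check can be exercised without a network.
    keyscan="${KEYSCAN:-ssh-keyscan}"

    # -T sets the timeout in seconds, -p the port. The exit status is ignored
    # and the output is tested instead: no keys means sshd never answered.
    keys=$("$keyscan" -T 5 -p "$port" "$host" 2>/dev/null)
    [ -n "$keys" ]
}
```

A scheduled job could then run something like: ssh_answers <agent hostname> || echo "sshd not answering"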