The VM gets a NAT IP just fine, but it has no access through the proxy, so my guess is that the macOS 15.x setup has a bug where it can't break out of a loop while trying to phone home.
FBID: FB15689777
This is not an issue for 14.x VMs. It also occurs across different virtualization tools.
We're creating macOS VMs on both 15.x and 14.x hosts, and only the VMs created on 14.x can run on both 15 and 14 hosts. If we create the VMs on 15.x, Virtualization does something that prevents them from running on 14.x. We've dug in and don't see anything special that our own code is doing.
What is Apple doing to the VMs created on 15.x hosts that's special here?
https://feedbackassistant.apple.com/feedback/15645457
Metal passthrough on Intel VMs causes com.apple.screensharing.menuextra to crash and screensharing to exit
Create a 15.1 VM with Metal passthrough on a 15.0.1 or 15.1 host, enable Screen Sharing, then restart the machine and try connecting to it with VNC. I'm using Anka to create the VM. You'll see VNC work (open vnc://192.168.64.3:5900), then a few seconds in show "Reconnecting...", then work, then go back to "Reconnecting..." for ~5m until it eventually works consistently.
You'll see launchd showing exits/failures (see screenshots)
You'll see diagnostic reports showing things like:
Thread 0 Crashed:: Dispatch queue: com.apple.RenderBox.Encoder
0 libsystem_kernel.dylib 0x7ff801da5b52 __pthread_kill + 10
1 libsystem_pthread.dylib 0x7ff801ddff85 pthread_kill + 262
2 libsystem_c.dylib 0x7ff801d00b19 abort + 126
3 libsystem_c.dylib 0x7ff801cffddc __assert_rtn + 314
4 Metal 0x7ff80d045d72 MTLReportFailure.cold.1 + 41
5 Metal 0x7ff80d01fa2a MTLReportFailure + 513
6 Metal 0x7ff80cfb74e0 +[MTLLoader sliceIDForDevice:legacyDriverVersion:airntDriverVersion:] + 200
7 Metal 0x7ff80cf265c9 +[_MTLBinaryArchive(MTLBinaryArchiveInternal) deserializeBinaryArchiveHeader:fileData:device:] + 89
8 Metal 0x7ff80cf10f0c -[_MTLBinaryArchive loadFromURL:error:] + 537
9 Metal 0x7ff80cf10288 -[_MTLBinaryArchive initWithOptions:device:url:error:] + 844
10 RenderBox 0x7ff9041a15fd RB::(anonymous namespace)::load_library_archive(NSBundle*,
Hello, here is what I'm doing:
I create an AWS macOS instance.
I then set up a plist file in /Library/LaunchDaemons that runs a bash script:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "XXXXX/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>aws-ec2-mac-amis.cloud-connect</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/env</string>
        <string>/Users/ec2-user/aws-ec2-mac-amis/cloud-connect.bash</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>WorkingDirectory</key>
    <string>/Users/ec2-user</string>
    <key>StandardErrorPath</key>
    <string>/var/log/cloud-connect.log</string>
    <key>StandardOutPath</key>
    <string>/var/log/cloud-connect.log</string>
    <key>EnableTransactions</key>
    <true/>
    <key>ExitTimeOut</key>
    <string>300</string>
</dict>
</plist>
I've tried this same plist without EnableTransactions and there is no difference.
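One detail that may be worth ruling out: launchd.plist(5) documents ExitTimeOut as an integer (the time between SIGTERM and SIGKILL when a job is stopped), while the plist above supplies it as a string. Whether launchd honors or ignores the string form is an assumption that has not been verified here. A minimal sketch for checking which element type a plist uses for that key; the function name `exit_timeout_type` is hypothetical and not part of any tool:

```shell
#!/bin/bash
# Hypothetical helper: print the XML element type that follows
# <key>ExitTimeOut</key> in a launchd plist, e.g. "string" or "integer".
# Prints nothing if the key is not present.
exit_timeout_type() {
    grep -A1 '<key>ExitTimeOut</key>' "$1" \
        | tail -n 1 \
        | sed -E 's/^[[:space:]]*<([a-z]+)>.*/\1/'
}
```

Calling `exit_timeout_type` with the path to the installed plist shows the type currently in use; if it reports `string`, trying `<integer>300</integer>` is a cheap experiment.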
This works and my bash script runs just fine:
#!/bin/bash
set -exo pipefail
[[ ! $EUID -eq 0 ]] && echo "RUN AS ROOT!" && exit 1
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $SCRIPT_DIR
echo "Waiting for networking..."
while ! ping -c 1 -n github.com &> /dev/null; do sleep 1; done
git pull
. ./_helpers.bash
disjoin() {
    set -x
    /usr/local/bin/ankacluster disjoin &
    CERTS=""
    [[ ! -z "$CLOUD_CONNECT_CERT" ]] && CERTS="--cert $CLOUD_CONNECT_CERT"
    [[ ! -z "$CLOUD_CONNECT_KEY" ]] && CERTS="$CERTS --cert-key $CLOUD_CONNECT_KEY"
    [[ ! -z "$CLOUD_CONNECT_CA" ]] && CERTS="$CERTS --cacert $CLOUD_CONNECT_CA"
    NODE_ID="$(curl -s $CERTS "${ANKA_CONTROLLER_ADDRESS}/api/v1/node" | jq -r ".body | .[] | select(.node_name==\"$(hostname)\") | .node_id")"
    curl -s $CERTS -X DELETE "${ANKA_CONTROLLER_ADDRESS}/api/v1/node" -H "Content-Type: application/json" -d "{\"node_id\": \"$NODE_ID\"}"
}
# Grab the ENVS the user sets in user-data
if [[ ! -e $CLOUD_CONNECT_PLIST_PATH ]]; then
    mkdir -p $LAUNCH_LOCATION
cat > $CLOUD_CONNECT_PLIST_PATH <<EOD
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>aws-ec2-mac-amis.cloud-connect</string>
        <key>ProgramArguments</key>
        <array>
            <string>/usr/bin/env</string>
            <string>/Users/ec2-user/aws-ec2-mac-amis/cloud-connect.bash</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>WorkingDirectory</key>
        <string>/Users/ec2-user</string>
        <key>StandardErrorPath</key>
        <string>/var/log/cloud-connect.log</string>
        <key>StandardOutPath</key>
        <string>/var/log/cloud-connect.log</string>
        <key>EnableTransactions</key>
        <true/>
        <key>ExitTimeOut</key>
        <string>300</string>
    </dict>
    </plist>
EOD
    launchctl load -w $CLOUD_CONNECT_PLIST_PATH
else
    echo "$(date) ($(whoami)): Attempting join..."
    # Check if user-data exists
    [[ ! -z "$(curl -s XXXX/latest/user-data | grep 404)" ]] && echo "Could not find required ANKA_CONTROLLER_ADDRESS in instance user-data!" && exit 1
    # Create user ENVs for this session
    $(curl -s XXXX/latest/user-data | sed 's/\"//g')
    # If the user wants to change the IP address for the registry domain name (if they want to use a second EC2 registry for better speed), handle setting the /etc/hosts
    if [[ ! -z "$ANKA_REGISTRY_OVERRIDE_IP" && ! -z "$ANKA_REGISTRY_OVERRIDE_DOMAIN" ]]; then
        modify_hosts $ANKA_REGISTRY_OVERRIDE_DOMAIN $ANKA_REGISTRY_OVERRIDE_IP
    fi
    # Ensure that anytime the script stops, we disjoin first
    /usr/local/bin/ankacluster join $ANKA_CONTROLLER_ADDRESS $ANKA_JOIN_ARGS
    trap disjoin 0 # Disjoin after we joined properly to avoid unloading prematurely
    set +x
    while true; do
        sleep 1 &
        wait $!
    done
fi
I see the process running, and the host has connected to the remote server's controller:
root             46851   0.0  0.0  4283172   1120   ??  Ss    8:49PM   0:00.09 /bin/bash /Users/ec2-user/aws-ec2-mac-amis/cloudconnect.bash
However, when I terminate the AWS instance, the process stays running and the bash script's trap is never attempted (at least according to the logs).
This could very well be an AWS-specific issue, but I wanted to check here in case I'm missing something important.
Some things that do work:
I can sudo shutdown -r now inside of the host and it disjoins properly before the host shuts down.
I can sudo launchctl -w unload inside of the host and it disjoins properly, too.
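To isolate whether the trap itself is at fault, here is a minimal sketch, assuming the job is stopped with SIGTERM (launchd.plist(5) describes ExitTimeOut as the wait between SIGTERM and SIGKILL). The child process below is only a stand-in for cloud-connect.bash and uses the same `sleep 1 & wait $!` loop; the names `cleanup`, `log`, and `child` are illustrative. It traps TERM explicitly rather than relying only on the exit trap (`trap disjoin 0`), so the handler's behavior on the signal can be observed directly.

```shell
#!/bin/bash
# Sketch: confirm that a TERM trap fires inside a "sleep & wait" loop.
log="$(mktemp)"

# Stand-in for the long-running daemon script (not the real cloud-connect.bash).
bash -c '
    cleanup() { echo "cleanup ran" >> "$1"; exit 0; }
    trap "cleanup \"$1\"" TERM
    while true; do
        sleep 1 &
        wait $!    # wait is interruptible, so the trap runs promptly
    done
' _ "$log" &
child=$!

sleep 1                 # give the child time to install its trap
kill -TERM "$child"     # simulate the stop signal
wait "$child"

result="$(cat "$log")"
echo "$result"
rm -f "$log"
```

This should print "cleanup ran". If it does so on the instance while the real script's trap never fires on termination, one plausible reading is that the signal never reaches the process, which would point at how the instance is torn down rather than at the script.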