Outgoing SSL connections fail on macOS 15, work fine on earlier versions

The OpenSSL library interface in the Allegro Common Lisp system stopped working on macOS 15.x (15.0.1 and 15.1).

We have tried many versions of OpenSSL: 1.1.1t (which we built ourselves), 3.0.x, 3.3.x, and 3.4.0. All work fine on macOS 14 and earlier. All fail on macOS 15.

What is bizarre about the failure is that we can load the SSL libraries fine, but when we try to make an outgoing connection it fails (with varying errors). Also, lldb hangs as soon as we step into the SSL libraries.

More specifically, using Homebrew OpenSSL 3.0.15 gives an exception that we see in lldb, but we cannot step into SSL_ctrl(), which is in libssl.3.dylib, provided by Homebrew.

We have also tried a version of OpenSSL 1.1.1t that we built ourselves (codesigned and included in the notarized app). It fails with a SEGV rather than the error shown below, which is from 3.0.15.

This all started with errors when using the OpenSSL libraries. Here's the use case:

cl-user(2): (net.aserve.client:do-http-request "https://franz.com")
(net.aserve.client:do-http-request "https://franz.com")
Error: Received signal number 0 
  [condition type: synchronous-operating-system-signal]

Restart actions (select using :continue):
 0: Return to Top Level (an "abort" restart).
 1: Abort entirely from this (lisp) process.
[1] cl-user(3): :zo :all t :count 5
:zo :all t :count 5
Evaluation stack:

... 5 more newer frames ...

   (excl::SSL_ctrl 6133462816 55 ...)
   (excl::ssl-device-open-common #<excl::ssl-client-stream  closed fd # @ #x3079fed32> nil ...)
 ->((method device-open (excl::ssl-client-stream t t)) #<excl::ssl-client-stream  closed fd # @ #x3079fed32> t ...)
   ((:internal (:effective-method 3 nil nil nil t) 0) #<excl::ssl-client-stream  closed fd # @ #x3079fed32> t ...)
   ((:runsys sys::lisp_apply))
   [... excl::function_lisp_apply ]
   (excl::caching-miss #<standard-generic-function device-open> (# t #) ...)
   [... device-open ]

... more older frames ...
[1] cl-user(4): 

If you want to see the problem for yourself, I created a new, signed and notarized version of our application: https://franz.com/ftp/pri/layer/acl11.0express-macos-arm64.dmg

To use it, install Homebrew and do brew install openssl@3.0, then execute the following to get the error:

cd /Applications/AllegroCL64express.app/Contents/Resources
env ACL_OPENSSL_VERSION=30 DYLD_LIBRARY_PATH="$(brew --prefix openssl@3.0)/lib:$DYLD_LIBRARY_PATH" ./alisp
(progn (require :ssl)(require :aserve))
(net.aserve.client:do-http-request "https://franz.com")

You should get the error shown above.

Here's what we see when we set a breakpoint at SSL_ctrl:

lldb alisp
_regexp-env ACL_OPENSSL_VERSION=30
_regexp-env DYLD_LIBRARY_PATH=/opt/homebrew/opt/openssl@3.0/lib:
br s -n SSL_ctrl
run
(progn (require :ssl)(require :aserve))
(net.aserve.client:do-http-request "https://franz.com")

Then, we see this:

cl-user(2): (net.aserve.client:do-http-request "https://franz.com")
(net.aserve.client:do-http-request "https://franz.com")
Process 5886 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
	frame #0: 0x0000000102081090 libssl.3.dylib`SSL_ctrl
libssl.3.dylib`SSL_ctrl:
->  0x102081090 <+0>:  stp    x20, x19, [sp, #-0x20]!
	0x102081094 <+4>:  stp    x29, x30, [sp, #0x10]
	0x102081098 <+8>:  add    x29, sp, #0x10
	0x10208109c <+12>: mov    x20, x2
(lldb) si
<<<hang here>>>

Again, it only started with macOS 15. We have not seen this on any previous version.

More detail:

$ codesign -vvvv /Applications/AllegroCL64express.app
/Applications/AllegroCL64express.app: valid on disk
/Applications/AllegroCL64express.app: satisfies its Designated Requirement
$

$ codesign -d --entitlements - /Applications/AllegroCL64express.app
Executable=/Applications/AllegroCL64express.app/Contents/MacOS/AllegroCL64express
[Dict]
[Key] com.apple.security.cs.allow-dyld-environment-variables
[Value]
[Bool] true
[Key] com.apple.security.cs.allow-jit
[Value]
[Bool] true
[Key] com.apple.security.cs.disable-library-validation
[Value]
[Bool] true
[Key] com.apple.security.get-task-allow
[Value]
[Bool] true
$

The other thing we noticed while debugging is that, even though we set DYLD_LIBRARY_PATH, lldb found other copies of libssl as well. For example, in this case lldb resolved SSL_new in 3 different libraries:

$ lldb alisp
(lldb) target create "alisp"
Current executable set to '/Applications/AllegroCL64express.app/Contents/Resources/alisp' (arm64).
(lldb) _regexp-env ACL_OPENSSL_VERSION=30
(lldb) _regexp-env DYLD_LIBRARY_PATH=/opt/homebrew/opt/openssl@3.0/lib:
(lldb) br s -n SSL_new
br s -n SSL_new
Breakpoint 1: 2 locations.
(lldb) run
Process 6339 launched: '/Applications/AllegroCL64express.app/Contents/Resources/alisp' (arm64)
Copyright (C) 1985-2023, Franz Inc., Lafayette, CA, USA.  All Rights Reserved.
...
CL-USER(1): (progn (require :ssl)(require :aserve))
; Fast loading
;    /Applications/AllegroCL64express.app/Contents/Resources/code/SSL.002
...
T
CL-USER(2): (net.aserve.client:do-http-request "https://franz.com")
Process 6339 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x00000001020803ec libssl.3.dylib`SSL_new
libssl.3.dylib`SSL_new:
->  0x1020803ec <+0>:  stp    x20, x19, [sp, #-0x20]!
    0x1020803f0 <+4>:  stp    x29, x30, [sp, #0x10]
    0x1020803f4 <+8>:  add    x29, sp, #0x10
    0x1020803f8 <+12>: cbz    x0, 0x102080700           ; <+788>
(lldb) br list
Current breakpoints:
1: name = 'SSL_new', locations = 3, resolved = 3, hit count = 1
  1.1: where = libboringssl.dylib`SSL_new, address = 0x0000000193f1b160, resolved, hit count = 0
  1.2: where = libssl.48.dylib`SSL_new, address = 0x000000026907f64c, resolved, hit count = 0
  1.3: where = libssl.3.dylib`SSL_new, address = 0x00000001020803ec, resolved, hit count = 1

(lldb) 
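
A possible cross-check outside lldb is dyld's own load logging via DYLD_PRINT_LIBRARIES. The following is only a sketch: it assumes the variable is honored for alisp (the app carries the allow-dyld-environment-variables entitlement) and that dyld writes its log to stderr. The helper name ssl_images and the file name dyld.log are made up for illustration.

```shell
# Hypothetical helper: keep only SSL-related image paths from a dyld load log on stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
ssl_images() {
    grep -E 'lib(boring)?ssl|libcrypto' || true
}

# Assumed usage on macOS (not run here). Capture dyld's log while using the REPL
# normally, then filter it after quitting:
#   DYLD_PRINT_LIBRARIES=1 ACL_OPENSSL_VERSION=30 \
#     DYLD_LIBRARY_PATH="$(brew --prefix openssl@3.0)/lib" ./alisp 2>dyld.log
#   ssl_images < dyld.log
```

If the log lists more than one libssl, that would at least confirm that the extra breakpoint locations correspond to images really loaded into the process, not just symbols lldb knows about.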

We are out of ideas on how to debug this.

Quinn,

I apologize for the bad test case the other day. I'm sorry about wasting your time.

I do have a new test case, and we tested it on macOS 14 and 15. No DYLD_LIBRARY_PATH is needed. Here it is:

cd ~/Downloads
DMG=acl11.0express-macos-arm64-5.dmg
wget https://franz.com/ftp/pri/layer/$DMG
rm -fr /Applications/AllegroCL64express.app
open $DMG
cd /Applications/AllegroCL64express.app/Contents/Resources/
./alisp
(progn(require :ssl)(require :aserve))
(trace excl::rand_add)
(net.aserve.client:do-http-request "https://franz.com")

It gets a SEGV on macOS 15.1 and works properly on 14.7.

Thank you for helping.

Kevin

Thanks for that.

I was able to reproduce the problem. Or at least I think I was. At the last step I saw:

Error: Received signal number 11 (Segmentation fault: 11)
  [condition type: SYNCHRONOUS-OPERATING-SYSTEM-SIGNAL]

which presumably means that your runtime caught the SIGSEGV.

Is there a way to configure your runtime to not catch that signal? I’d like the process to crash so that I can get a crash report. And, if necessary, a core dump.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

@DTS Engineer just insert this form before the last one:

(excl::unix-signal 11 0)

That is just before the do-http-request that causes the problem. This will prevent our trapping of the error.

CL-USER(2): (excl::unix-signal 11 0)
4372251680
CL-USER(3): (net.aserve.client:do-http-request "https://franz.com")
Segmentation fault: 11
@max[git:master]$
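
To confirm from the shell that the process really died from the signal, rather than exiting on its own, the exit status can be checked. This is a minimal sketch that relies on the usual shell convention of reporting 128 plus the signal number; the helper name report_exit is made up for illustration.

```shell
# Hypothetical helper: run a command and report whether it was killed by a signal.
# Shells conventionally report 128 + signal number for a signal death; a program
# could also exit with such a value itself, so treat the result as a hint only.
report_exit() {
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Assumed usage: report_exit ./alisp
# then evaluate the forms above in the REPL as usual.
```

With the handler reset as shown above, a run that ends in the SEGV should report signal 11.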

Let me know if there is anything else I can do.

Kevin

Thanks for that.

On disabling the signal handler I was able to get a crash report. However, the contents of that crash report are quite weird, and that leads me to another question about your runtime.

In my crash report I see this:

Thread 1:
0   libsystem_kernel.dylib  … mach_msg2_trap + 8
1   libsystem_kernel.dylib  … mach_msg2_internal + 80
2   libsystem_kernel.dylib  … mach_msg_overwrite + 480
3   libsystem_kernel.dylib  … mach_msg + 24
4   libacli11029t6.dylib    … lisp_exception_watcher + 348
5   libsystem_pthread.dylib … _pthread_start + 136
6   libsystem_pthread.dylib … thread_start + 8

which suggests you have a Mach exception handler that’s active even with your signal handler disabled. Is it possible to disable that?

Implementing a Mach exception handler correctly is really hard, so I’m hoping that I can disable this in order to increase my trust level of the crash report. Also, I suspect that disabling this exception handler will help with the LLDB issue [1].

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] With the signal handler disabled I don’t see LLDB hang, but it’s also not catching the crash. Rather, LLDB sees the alisp process terminate, as if LLDB weren’t attached.

@DTS Engineer

On disabling the signal handler I was able to get a crash report.

... which suggests you have a Mach exception handler that’s active even with your signal handler disabled. Is it possible to disable that?

No, the exception handler is crucial to the operation of the product. Remember this is a language. We produce compiled code, catch exceptions on behalf of users, etc. This has been part of the product since the first version, in MacOS 10.0.

For example, in Common Lisp, we have an API for virtual multi-threading and that facility uses the exception handler for its operation. That feature cannot be turned off.

Our application is not a normal application. We have an entire programming system inside our product, and that includes catching exceptions and dealing with them programmatically.

Implementing a Mach exception handler correctly is really hard,

We agree. We would be happy to send you the source code for it, but we don't want to post it here. This exception handler is as old as MacOS (from 10.0).

so I’m hoping that I can disable this in order to increase my trust level of the crash report. Also, I suspect that disabling this exception handler will help with the LLDB issue [1].

So, hopefully you see that it's not possible to disable the exception handler because it is essential to the operation of the application. In the spirit of what you asked, we did try to disable it to see if we would get lucky and we could fulfill your request, but we can't even get to the example which fails.

[1] With the signal handler disabled I don’t see LLDB hang, but it’s also not catching the crash. Rather, LLDB sees the alisp process terminate, as if LLDB weren’t attached.

You don't see LLDB hang because you aren't single stepping into RAND_add. If you did, you would see the hang. Instead, you are running until you get the SEGV, because you evaluated (excl::unix-signal 11 0).

We want to point out that LLDB works perfectly fine in the presence of our exception handler. It is ONLY when single stepping into the SSL library that it hangs. I want to underscore this. We have used LLDB for many years to debug our code. We have NEVER seen it fail like this. Ever. It seems very much a coincidence that the first time it does fail is on SSL, which after all is a security-related API. We wonder if macOS 15 has some security heuristics which prevent snooping on SSL connections.

Kevin

@DTS Engineer Quinn, I just sent you the source code to the exception handler via email.

Kevin

@DTS Engineer I've updated the DTS ticket with questions about how to move forward. This continues to be a huge issue for us and our customers.

For those following along at home, I’ll be continuing this conversation with dklayer privately.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
