An update here.
When compiling with the native Apple clang 15 compiler on macOS, the code in question:
return __atomic_load_n(p, __ATOMIC_ACQUIRE);
produces the following assembly:
```
0000000100120488 <_ossl_rcu_uptr_deref>:
100120488: f8bfc000 ldapr x0, [x0]
10012048c: d65f03c0 ret
```
Whereas the Homebrew gcc-13 compiler produces:
```
0000000100143a40 <_ossl_rcu_uptr_deref>:
100143a40: c8dffc00 ldar x0, [x0]
100143a44: d65f03c0 ret
100143a48: d503201f nop
100143a4c: d503201f nop
```
Based on the ARM documentation for the LDAPR instruction, it seems that safe use of `ldapr` depends on how the Local Ordering registers in the coprocessor are configured. If that is right, emitting the instruction unconditionally looks like a compiler bug.
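For anyone who wants to compare compilers on their own machine, here is a minimal standalone sketch of the load in question. The function name `rcu_uptr_deref_sketch` is hypothetical and only stands in for the real function shown in the disassembly; the only thing carried over from the actual code is the `__atomic_load_n(p, __ATOMIC_ACQUIRE)` call.

```c
/*
 * Minimal sketch: an acquire load of a pointer-sized slot.
 * rcu_uptr_deref_sketch is a made-up name for illustration; it is not
 * the real function, just the same builtin call in isolation.
 */
void *rcu_uptr_deref_sketch(void **p)
{
    /* Same builtin and memory order as the code in question. */
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}
```

Building this to an object file with optimization enabled (for example `cc -O2 -c`) and disassembling it with `objdump -d` should show which load instruction a given compiler picks. The result may differ depending on the compiler version and the target CPU it assumes.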