The video about Porting to Apple Silicon mentions the Intel/arm64 memory ordering differences and states that 'Correct code' will behave the same but data races may behave differently. What is implied by 'Correct code'? Does that include using memory barriers? Are there any limitations on using memory barriers at the application and DriverKit level? Or for IOKit kexts where they are allowed?
Memory barriers in Apps, Kexts and Dexts running on Apple Silicon
'Correct code' in this context means code that uses memory barriers (directly, or via higher-level primitives built on them, such as locks) to declare the memory consistency requirements it needs in order to avoid data races.
data race --> use of a memory location by two or more threads without adequate synchronization.
consistency --> constraints on the order in which memory accesses must occur to avoid a data race
For example, when using C++11 and accessing a shared variable, you'd typically protect it with a std::mutex or similar; when using a std::atomic as a signal between threads, you'd use acquire/release or seq_cst memory ordering.
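Here's a minimal sketch (not from the session) of those two patterns; the names counter, payload, and ready are purely illustrative:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Pattern 1: a shared variable protected by a std::mutex.
int counter = 0;
std::mutex counterMutex;

void increment() {
    std::lock_guard<std::mutex> lock(counterMutex);
    ++counter;   // no data race: only touched while holding the mutex
}

// Pattern 2: a std::atomic used as a signal between threads, with
// release/acquire ordering to publish a plain (non-atomic) payload.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                    // plain write
    ready.store(true, std::memory_order_release);    // publish the signal
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))   // wait for the signal
        std::this_thread::yield();
    // The acquire load that saw 'true' synchronizes-with the release store,
    // so the payload write is guaranteed to be visible here on any architecture.
    assert(payload == 42);
}

int main() {
    std::thread a(increment), b(increment);
    a.join(); b.join();
    assert(counter == 2);

    std::thread p(producer), c(consumer);
    p.join(); c.join();
}
```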
The x86_64 architecture naturally provides fairly strong memory ordering rules, so even if you entirely omit the use of memory barriers in your code, you have a data race but you often get the results you expected. The ARM architecture is weakly ordered: the CPU is permitted to aggressively reorder memory accesses to improve performance, and that data race that existed all along is now much more likely to reveal itself.
(Note that even on x86_64, the compiler would have been free to reorder your memory accesses even if the hardware didn't, so those missing barriers are still dangerous there; it is generally just down to luck what ends up happening.)
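For illustration, here is a sketch (again, not from the session) of the kind of code being warned about: the same producer/consumer handshake as above, but with no barriers at all. This is a data race (undefined behavior); it frequently appears to work on x86_64 and is much more likely to misbehave on Apple Silicon, where either the compiler or the CPU may reorder the accesses:

```cpp
#include <thread>

int payload = 0;
bool ready = false;          // plain bool: no atomicity, no ordering guarantees

void producer() {
    payload = 42;            // these two stores may be reordered by the
    ready = true;            // compiler, or by the hardware on arm64
}

void consumer() {
    while (!ready) { }       // may even spin forever: the compiler can hoist the load
    int seen = payload;      // may observe 0 even after seeing ready == true
    (void)seen;
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}
```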