Getting Valgrind to run on macOS 10.15 Catalina

I'm looking at getting Valgrind to run on macOS 10.15 Catalina.


So far I have the build working OK (based on a fork for 10.14 plus a few tweaks specific to 10.15).


However, when I run Valgrind (with the minimal --tool=none and an app that is just "int main(void) {}"), I get an error related to pthread_init. From what I see in the executed machine code, there is a test for _os_xbs_chrooted (a global variable in the kernel, by the looks of it), which then leads to a call to __pthread_init.cold.2. This function contains a ud2 opcode, which triggers a SIGILL in the Valgrind VM.


Searching Google for _os_xbs_chrooted doesn't turn up much. There's this https://github.com/apple/darwin-libpthread/blob/master/src/pthread.c for the pthread check, and one other reference for the initialization.


I realize this looks like it could be security related and information is not made public.


Any suggestions as to how I can proceed? I have little experience in kernel programming.

Replies

I haven’t looked at this in depth, but _os_xbs_chrooted is not new in 10.15; you can see it in the Darwin source for 10.14. Specifically, check out xnu/libsyscall/wrappers/_libkernel_init.c.

btw It’s likely that the xbs referenced here is not security related, but rather refers to an internal build system used here at Apple.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Hmm. The official version of Valgrind doesn't support 10.14 either, and I don't have a 10.14 system to test with.


What I'm seeing is the following (this is the Valgrind opcode dump, not the execution sequence)


// this is the read of _os_xbs_chrooted

0x1005F5DB2: movq 41575(%rip),%rax


// checking to see if it is zero

0x1005F5DB9: cmpb $0, (%rax)


// if equal, jump to 0x1005F5ECB

0x1005F5DBC: je-32 0x1005F5ECB


// these opcodes not executed

0x1005F5DC2: movq -48(%rbp),%rax


0x1005F5DC6: movq 42299(%rip),%rcx


0x1005F5DCD: xorq %r13,%rcx


// target of above jump, call __pthread_init.cold.2

0x1005F5ECB: call 0x1005FD7A6


// __pthread_init.cold.2

0x1005FD7A6: leaq 2759(%rip), %rcx


0x1005FD7AD: xorl %eax,%eax


0x1005FD7AF: movq %rcx,11002(%rip)


0x1005FD7B6: movq %rax,11043(%rip)


// game over

0x1005FD7BD: ud2


It seems to me that it is expecting _os_xbs_chrooted to be non-zero. However I have no idea what system call or other is required to change the value of this variable.

However I have no idea what system call or other is required to change the value of this variable.

The code that sets _os_xbs_chrooted is not part of Darwin, alas.

The official version of Valgrind doesn't support 10.14 either, and I don't have a 10.14 system to test with.

It’s going to be a lot easier to reason about this stuff if you have Darwin source that correlates to the OS you’re testing on. Can you set up 10.14.{1,2,3} in a VM? Critically, those 10.14.x releases have kernel source available (the xnu project).

It seems to me that it is expecting _os_xbs_chrooted to be non-zero.

That’s unlikely. What’s more likely is that there’s been a failure earlier, and you’re now on a fallback path, and that fallback path is only enabled if _os_xbs_chrooted is set.

For example, one case I found relates to the way that pthreads sets up its workqueue interaction with the kernel. If the kernel doesn’t understand the modern setup mechanism (this can happen if you run a new user space on top of an old kernel), pthreads will fall back to a legacy mechanism, but it’ll only do this if _os_xbs_chrooted is set.

As to what’s going on in your specific situation, it’s hard to say without more context. I’m not going to be able to help you on the 10.15 side of things, because there’s no Darwin source for 10.15.x. Let me know if you can reproduce the problem on one of the 10.14.x builds I mentioned above.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thanks for the info. I'll take a look into setting up a 10.14 VM.


I've also been asking on the Valgrind dev mailing list, and your description of the possible cause sounds plausible.

I have managed to produce a build on 10.14, but the problem is not reproduced there.


So my next step was to run on both Mojave and Catalina with --trace-syscalls=yes. It looks like the one that is failing on Catalina is a thread_selfid syscall.


From what I see this ends with the following assembler


.text
.align 4
.globl _thread_self_trap
_thread_self_trap:
    movq $__NR_thread_self_trap, %rax
    movq %rcx, %r10
    syscall
    ret


I guess that this interface might have changed?

I have managed to produce a build on 10.14, but the problem is not reproduced there.

Well, that’s good news at least.

I guess that this interface might have changed?

Hmmm, it’s hard to see how that could be the case, given that it’s a system call that takes no arguments and simply returns a uint64_t that is the thread ID. You don’t get much simpler than that (-:

So my next step was to run on both Mojave and Catalina with --trace-syscalls=yes. It looks like the one that is failing on Catalina is a thread_selfid syscall.

How does it fail?

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

It looks like I'll have to step through the code on both OSes to try to see the difference. Will get back on this shortly.

This seems to be related to ptr_munge... Not sure what we can do in this case.

This seems to be related to ptr_munge

Yeah, that’s exciting. It’s not an area I’m familiar with. If you have any specific questions, post ’em here and I’ll take a look.

Also, I just noticed that the Darwin source for the 10.15 kernel (xnu) was recently published. This isn’t the whole story (the libpthread project isn’t up yet, for example) but it’s a start.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

It's about a year since I wrote this, so it's time for a bit of an update, especially in light of macOS 11.

Short version:
If you want Valgrind on macOS, then your best bet is Louis Brunner's GitHub repo.

Louis seems to be working on improving support on macOS 11 Intel.

If you are using macOS on ARM, then you are out of luck. Valgrind is supported on ARM/Linux, but not on ARM for any other platform. It should be possible to make Valgrind work on ARM, but this would be a fairly substantial effort.

For my part, I'm not actively working on macOS, and have been working more on FreeBSD. I am now the port maintainer on FreeBSD and also contribute to upstream Valgrind. (FreeBSD is not officially supported; I hope that some time in 2021 I'll get the FreeBSD code added upstream.)

Xcode contains a command-line tool called leaks.

Use it as follows: compile your_program_to_check with debug information:

clang++ -g src.cpp -o your_program_to_check

export MallocStackLogging=1

leaks --atExit --list -- ./your_program_to_check

The report will hopefully show the lines where the leaks occur.