Need information about macOS use of user-process virtual address space

I need to know more about how macOS uses user-process virtual address space in ways not explicitly requested by the user.

In detail, I have a process that needs to mmap a file to a specific virtual address. I don't care what the address is, but I need to know it at compile-time. I know how to set up such a mapping, and how to specify in the executable that a particular chunk of virtual address space is reserved for it: Specifically, my Xcode build includes a .s file containing a .zerofill directive to create a named segment and section of the required size, and I use the -segaddr flag in the linker to specify the virtual address where that segment is to be loaded. I can then use the address and size that I have chosen, elsewhere in my source code, via mmap with MAP_FIXED.
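For concreteness, the reservation described above might be set up like this (the segment/section/symbol names, the size, and the address are illustrative only, not recommendations):

```
// reserve.s -- carve out a named zero-fill segment of the required size
.zerofill __MYMAP,__mymap,_my_reserved,0x10000000
```

linked with `-Wl,-segaddr,__MYMAP,0x200000000`, after which the source code can `mmap` the file at `0x200000000` with `MAP_FIXED`, matching the size and address chosen at build time.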

This method seems to protect the special segment from being used by macOS, which is of course what I want. My problem is, I don't know whether the location I have chosen for the new segment is inconveniencing macOS in some undesirable way: For example, staking out a big chunk of user memory in the wrong place might restrict the space available to the memory allocation system, or limit stack growth, or some such thing. At the moment, my empirical choice of location works on my own Macs, but it might not work on others, or on other versions of macOS.

What I am looking for -- and haven't found -- is documentation about how macOS user-process virtual address space is used, in sufficient detail that I can choose the location of my special segment so that it does not get in the way. I need to know it for both the x86_64 architecture and the arm64 architecture, and I need to know how that usage might vary from machine to machine and from macOS version to macOS version.

Can anyone help or advise?

(For the curious, I need to mmap the file to a specific virtual address because it is binary data that contains absolute pointers to locations within itself. I can set up the pointers correctly for any given mmap loading address when I first mmap the file, but I need each subsequent process that mmaps the same file to be able to dereference pointers correctly -- thus I need a fixed load address that all processes can use when mmapping.)

Replies

Generally speaking, this is not supported. The issue is that as we develop macOS we often need to change the address map for various reasons (including just the normal growth of the OS), so even if we were to specify the exact layout for a given machine or OS version, there is no guarantee it would remain that way in the future.

I understand your motivation, though I am a bit confused about how you have gotten this far. Adding a zero-fill segment via -segaddr will only work the way you describe if you can guarantee your binary is loaded at a fixed address, which is not the default behavior on any platform, so I assume you are passing -no_pie as well? If so, I am guessing you have only tested this on x86_64, since -no_pie does not work on arm64 (the static linker will emit a warning and ignore the flag, and the kernel will not load a binary that is not built as PIE).

Okay, so given that MAP_FIXED is not generally viable for this, what are your options? Without knowing more about the specific data you are trying to map (how large it is, whether you have control over the code that walks the structs and pointers in it, etc.) I am not sure what the best solution is, but I can offer a few strategies here:

  1. Convert from pointers to offsets relative to the base of your mapping. This may be a lot of work, but it has several benefits. You can load the data read-only and keep it clean (which may not be a big deal if you are dirtying it anyway). It requires no extra work at load time, and if the data is under 4GB you can use 32-bit offsets, which may save you space if your data contains a lot of pointers. The Swift runtime uses this technique, which was discussed during the 2018 LLVM Developers' Meeting in the talk "Efficiently Implementing Runtime Metadata with LLVM".

  2. Build the data structures at compile time and let the dynamic linker just take care of it for you. If this is possible it is probably your best bet. The dynamic linker already needs to do this exact operation to patch globals that contain internal pointers, so if you can generate the structure and compile it into a binary you can just use it.

  3. Patch the pointers yourself at load time. This is essentially what the linker does, but if you cannot generate the data at compile time you can do it yourself. All you need to do is encode the offsets of the pointers (the simplest way is a bitmap, but you can get a lot more clever), then at runtime use those offsets to find the locations to adjust: subtract the base address the data was built against, and add in the base address it is currently mapped at. This is probably the least amount of effort to get working, but it will involve the most runtime code and dirty memory in your app.
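Strategy 1 above (offsets instead of pointers) might look like the following in C; the node layout and helper names are illustrative, not a prescribed format:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative node layout: 'next_off' is a 32-bit byte offset from the
 * base of the mapping rather than an absolute pointer; 0 means "none".
 * The data is position-independent, so it can be mapped read-only at
 * whatever address the kernel picks. */
typedef struct {
    uint32_t next_off;
    int32_t  value;
} node_t;

/* Follow an offset "pointer" relative to the mapping base. */
static inline const node_t *node_next(const void *base, const node_t *n)
{
    if (n->next_off == 0)
        return NULL;
    return (const node_t *)((const char *)base + n->next_off);
}

/* Build a tiny two-node chain in 'buf' to stand in for a mapped file. */
static const node_t *demo_build(void *buf)
{
    node_t *a = (node_t *)buf;
    node_t *b = a + 1;
    b->next_off = 0;
    b->value    = 2;
    a->next_off = (uint32_t)((char *)b - (char *)buf);
    a->value    = 1;
    return a;
}
```

Because every "pointer" is relative, the same bytes dereference correctly in every process regardless of where the file lands in the address space.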
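Strategy 2 (compile-time data rebased by the dynamic linker) can be seen in this small sketch; the struct is made up, but the mechanism is exactly what happens for any global containing an internal address constant:

```c
#include <stddef.h>

/* Illustrative compile-time structure with an internal absolute
 * pointer.  Because &nodes[1] is an address constant, the linker
 * records a rebase fixup for it, and the dynamic linker patches it to
 * the correct absolute address in every process, wherever the image
 * happens to be loaded. */
typedef struct node {
    const struct node *next;
    int                value;
} node;

static const node nodes[2] = {
    { &nodes[1], 1 },   /* internal pointer, fixed up at load time */
    { NULL,      2 },
};

const node *first_node(void) { return &nodes[0]; }
```

No runtime code is needed at all: by the time `first_node` can be called, the pointer is already valid.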
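Strategy 3 (patching pointers yourself after mapping) might be sketched like this; the on-disk format here is hypothetical, with one bitmap bit per pointer-sized slot:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of load-time pointer patching (hypothetical format).  The
 * file records the base address it was built against, plus a bitmap
 * with one bit per pointer-sized slot; a set bit marks a slot holding
 * an absolute pointer.  After mapping, each marked slot is shifted by
 * (current base - built base). */
void rebase_pointers(void *mapping, size_t size,
                     uintptr_t built_base, const uint8_t *bitmap)
{
    uintptr_t  delta  = (uintptr_t)mapping - built_base;
    uintptr_t *slots  = (uintptr_t *)mapping;
    size_t     nslots = size / sizeof(uintptr_t);

    for (size_t i = 0; i < nslots; i++) {
        if (bitmap[i / 8] & (1u << (i % 8)))
            slots[i] += delta;   /* old absolute -> new absolute */
    }
}
```

Note that the adjusted pages become dirty, per-process memory, which is the cost the reply mentions.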