Determining Why a Symbol is Referenced

DTS Engineer OP

Apple

Created May ’24

Recently a bunch of folks have asked about why a specific symbol is being referenced by their app. This is my attempt to address that question.

If you have questions or comments, please start a new thread. Tag it with Linker so that I see it.

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Determining Why a Symbol is Referenced

In some situations you might want to know why a symbol is referenced by your app. For example:

You might be working with a security auditing tool that flags uses of malloc.
You might be creating a privacy manifest and want to track down where your app is calling stat.

This post is my attempt at explaining a general process for tracking down the origin of these symbol references. This process works from ‘below’. That is, it works ‘up’ from your app’s binary rather than ‘down’ from your app’s source code. That’s important because:

It might be hard to track down all of your source code, especially if you’re using one or more package management systems.
If your app has a binary dependency on a static library, dynamic library, or framework, you might not have access to that library’s source code.

IMPORTANT This post assumes the terminology from An Apple Library Primer. Read that before continuing here.

The general outline of this process is:

Find all Mach-O images.
Find the Mach-O image that references the symbol.
Find the object files (.o) used to make that Mach-O.
Find the object file that references the symbol.
Find the code within that object file.

This post assumes that you’re using Xcode. If you’re using third-party tools that are based on Apple tools, and specifically Apple’s linker, you should be able to adapt this process to your tooling. If you’re using a third-party tool that has its own linker, you’ll need to ask for help via your tool’s support channel.

Find all Mach-O images

On Apple platforms an app consists of a number of Mach-O images. Every app has a main executable. The app may also embed dynamic libraries or frameworks. The app may also embed app extensions or system extensions, each of which has its own executable. And a Mac app might have embedded bundles, helper tools, XPC services, agents, daemons, and so on.

To find all the Mach-O images in your app, combine the find and file tools. For example:

% find "Apple Configurator.app" -print0 | xargs -0 file | grep Mach-O
Apple Configurator.app/Contents/MacOS/Apple Configurator:                                                                                                                                                                           Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
…
Apple Configurator.app/Contents/MacOS/cfgutil:                                                                                                                                                                                      Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
…

Apple Configurator.app/Contents/Extensions/ConfiguratorIntents.appex/Contents/MacOS/ConfiguratorIntents:                                                                                                                            Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
…
Apple Configurator.app/Contents/Frameworks/ConfigurationUtilityKit.framework/Versions/A/ConfigurationUtilityKit:                                                                                                                    Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit dynamically linked shared library x86_64] [arm64]
…

This shows that Apple Configurator has a main executable (Apple Configurator), a helper tool (cfgutil), an app extension (ConfiguratorIntents), a framework (ConfigurationUtilityKit), and many more.

This output is quite unwieldy. For nicer output, create and use a shell script like this:

% cat FindMachO.sh 
#! /bin/sh

# Passing `-0` to `file` causes it to emit a NUL delimiter between the
# file name and the `:`. Sadly, macOS `cut` doesn’t support a NUL
# delimiter so we use `tr` to convert that to an SOH (0x01) and `cut` on
# that.
#
# Weirdly, `file` only inserts the NUL on the primary line, not the
# per-architecture Mach-O lines. We use that to our advantage, filtering
# out the per-architecture noise by only passing through lines
# containing an SOH.

find "$@" -type f -print0 \
    | xargs -0 file -0 \
    | grep -a Mach-O \
    | tr '\0' '\1' \
    | grep -a "$(printf '\1')" \
    | cut -d "$(printf '\1')" -f 1

Find the Mach-O image that references the symbol

Once you have a list of Mach-O images, use nm to find the one that references the symbol. The rest of this post investigates a test app, WaffleVarnishORama, that’s written in Swift but uses waffle management functionality from the libWaffleCore.a static library. The goal is to find the code that calls calloc.

This app has a single Mach-O image:

% FindMachO.sh "WaffleVarnishORama.app"
WaffleVarnishORama.app/WaffleVarnishORama

Use nm to confirm that it references calloc:

% nm "WaffleVarnishORama.app/WaffleVarnishORama" | grep "calloc"
                 U _calloc

The _calloc symbol has a leading underscore because it’s a C symbol. This convention dates from the dawn of Unix, where the underscore distinguished C symbols from assembly language symbols.

The U prefix indicates that the symbol is undefined, that is, the Mach-O image is importing the symbol. If the symbol name is prefixed by a hex number and some other character, like T or t, that means that the image includes its own implementation of calloc. That’s weird, but certainly possible. OTOH, if you see this then you know this Mach-O image isn’t importing calloc.
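
If you’re checking a lot of images, reading nm output by eye gets tedious, and a plain grep for calloc also matches unrelated names that happen to contain it. Here’s a minimal sketch of a helper that reads nm output and reports whether the exact symbol is imported, defined, or absent. The classify_symbol name is something I made up for illustration, and the parsing assumes the usual nm layout, where the symbol name is the last field and the type letter comes right before it.

```shell
#!/bin/sh
# classify_symbol SYMBOL: reads `nm` output on stdin and prints one of
# "imported", "defined", or "absent". Hypothetical helper, not a standard tool.
#
# Example (remember the leading underscore for C symbols):
#   nm "WaffleVarnishORama.app/WaffleVarnishORama" | classify_symbol _calloc
classify_symbol() {
    awk -v sym="$1" '
        # Assumed nm layout: "[address] TYPE NAME". Compare the whole name
        # so that a name merely containing SYMBOL does not count as a hit.
        NF >= 2 && $NF == sym {
            if ($(NF - 1) == "U") imported = 1
            else defined = 1
        }
        END {
            if (imported)     print "imported"
            else if (defined) print "defined"
            else              print "absent"
        }
    '
}
```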

IMPORTANT If this Mach-O isn’t something that you build — that is, you get this Mach-O image as a binary from another developer — you won’t be able to follow the rest of this process. Instead, ask for help via that library’s support channel.

Find the object files used to make that Mach-O image

The next step is to track down which .o file includes the reference to calloc. Do this by generating a link map. A link map is an old school linker feature that records the location, size, and origin of every symbol added to the linker’s output.

To generate a link map, enable the Write Link Map File build setting. By default this puts the link map into a text (.txt) file within the derived data directory. To find the exact path, look at the Link step in the build log. If you want to customise this, use the Path to Link Map File build setting.
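
If you’d rather not dig through the build log, you can search the derived data directory instead. This is a rough sketch only. It assumes the default link map file name contains -LinkMap- and ends in .txt, which matches what I’ve seen from Xcode but isn’t something to rely on if you’ve customised the path. The find_link_maps name is hypothetical.

```shell
#!/bin/sh
# find_link_maps DIR: lists files under DIR that look like link maps.
# Assumption: the default name follows the pattern
# <product>-LinkMap-<variant>-<arch>.txt.
#
# Example, pointing at wherever your derived data lives:
#   find_link_maps ~/Library/Developer/Xcode/DerivedData
find_link_maps() {
    find "$1" -type f -name '*-LinkMap-*.txt' -print
}
```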

A link map has three parts:

A simple header
A list of object files used to build the Mach-O image
A list of sections and their symbols

In our case the link map looks like this:

# Path: …/WaffleVarnishORama.app/WaffleVarnishORama
# Arch: arm64
# Object files:
[  0] linker synthesized
[  1] objc-file
[  2] …/AppDelegate.o
[  3] …/MainViewController.o
[  4] …/libWaffleCore.a[2](WaffleCore.o)
[  5] …/Foundation.framework/Foundation.tbd
…
# Sections:
# Address	Size    	Segment	Section
0x100008000	0x00001AB8	__TEXT	__text
…

The list of object files contains:

An object file for each of our app’s source files — That’s AppDelegate.o and MainViewController.o in this example.
A list of static libraries — Here that’s just libWaffleCore.a.
A list of dynamic libraries — These might be stub libraries (.tbd), dynamic libraries (.dylib), or frameworks (.framework).

Focus on the object files and static libraries. The list of dynamic libraries is irrelevant because each of those is its own Mach-O image.
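
For a big app the object file list can run to thousands of entries, so it helps to pull out just the interesting ones. The following is a sketch, under the assumption that your link map uses the same layout as the example above. It prints each object file and static library once, and skips the synthesized entries, stub libraries, dynamic libraries, and anything inside a .framework. The link_map_inputs name is hypothetical. Be aware that the framework filter would also hide a static library packaged as a framework, so loosen it if that applies to you.

```shell
#!/bin/sh
# link_map_inputs: reads a link map on stdin and prints the object files
# and static libraries from its "# Object files:" list, one per line.
#
# Example, with a placeholder path:
#   link_map_inputs < "path/to/link-map.txt"
link_map_inputs() {
    # LC_ALL=C keeps awk from choking on bytes that are not valid UTF-8.
    LC_ALL=C awk '
        /^# Object files:/ { in_list = 1; next }
        /^#/               { in_list = 0 }      # any other header ends the list
        !in_list           { next }
        {
            sub(/^\[ *[0-9]+\] /, "")            # drop the "[  4] " index
            if ($0 == "linker synthesized" || $0 == "objc-file") next
            if ($0 ~ /\.(tbd|dylib)$/ || $0 ~ /\.framework\//) next
            sub(/\.a(\[[0-9]+\])?\(.*\)$/, ".a") # archive member -> archive
            if (!seen[$0]++) print
        }
    '
}
```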

Find the object file that references the symbol

Once you have the list of object files and static libraries, use nm to check each one for the calloc symbol:

% nm "…/AppDelegate.o" | grep calloc
% nm "…/MainViewController.o" | grep calloc
% nm "…/libWaffleCore.a" | grep calloc
                 U _calloc

This indicates that only libWaffleCore.a references the calloc symbol, so let’s focus on that.

Note As in the Mach-O case, the U prefix indicates that the symbol is undefined, that is, the object file is importing the symbol.
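
Running nm by hand is fine for three inputs, but not for three hundred. Here’s a sketch of a loop that reads paths on stdin and prints only the ones that import the symbol. The references_in name and the NM variable are my own inventions. The sketch relies on nm -u, which limits the output to undefined symbols, and it assumes the symbol name is the last field on each line.

```shell
#!/bin/sh
# Assumption: NM names the nm command to run. Override it if you want to
# use the nm from a specific toolchain.
NM=${NM:-nm}

# references_in SYMBOL: reads object file and static library paths on
# stdin, one per line, and prints each path that imports SYMBOL.
#
# Example, with placeholder paths:
#   printf '%s\n' "path/to/AppDelegate.o" "path/to/libWaffleCore.a" \
#       | references_in _calloc
references_in() {
    symbol=$1
    while IFS= read -r input; do
        # For an archive, nm lists every member, so a hit here means that
        # at least one member imports the symbol.
        if "$NM" -u "$input" </dev/null 2>/dev/null \
            | awk -v s="$symbol" '$NF == s { found = 1 } END { exit !found }'
        then
            printf '%s\n' "$input"
        fi
    done
}
```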

Find the code within that object file

To find the code within the object file that references the symbol, use the objdump tool. That tool takes an object file as input, but in this example we have a static library. That’s an archive containing one or more object files. So, the first step is to unpack that archive:

% mkdir "libWaffleCore-objects"
% cd "libWaffleCore-objects"
% ar -x "…/libWaffleCore.a"
% ls -lh
total 24
-rw-r--r--  1 quinn  staff   4.1K  8 May 11:24 WaffleCore.o
-rw-r--r--  1 quinn  staff    56B  8 May 11:24 __.SYMDEF SORTED

There’s only a single object file in that library, which makes things easy. If there were multiple, run the following process over each one independently.
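
When a library does contain many object files, you can narrow things down before disassembling anything. Running nm on an archive prints a header line ending in a colon for each member, followed by that member’s symbols. The sketch below uses that to print just the members that import the symbol. It’s an illustration under that assumption about the output format, and the members_importing name is hypothetical.

```shell
#!/bin/sh
# members_importing SYMBOL: reads `nm` output for a static library on
# stdin and prints the header of each member that imports SYMBOL.
#
# Example, with a placeholder path:
#   nm "path/to/libWaffleCore.a" | members_importing _calloc
members_importing() {
    awk -v sym="$1" '
        # Assumed header layout: a line ending in ":" names the member,
        # for example "libWaffleCore.a(WaffleCore.o):".
        /:$/ { member = substr($0, 1, length($0) - 1); next }
        NF >= 2 && $NF == sym && $(NF - 1) == "U" && !seen[member]++ {
            print member
        }
    '
}
```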

To find the code that references a symbol, run objdump with the -S and -r options:

% xcrun objdump -S -r "WaffleCore.o"
…
; extern WaffleRef newWaffle(void) {
       0: d10083ff      sub     sp, sp, #32
       4: a9017bfd      stp     x29, x30, [sp, #16]
       8: 910043fd      add     x29, sp, #16
       c: d2800020      mov     x0, #1
      10: d2800081      mov     x1, #4
;     Waffle * result = calloc(1, sizeof(Waffle));
      14: 94000000      bl      0x14 <ltmp0+0x14>
                0000000000000014:  ARM64_RELOC_BRANCH26 _calloc
…

Note the ARM64_RELOC_BRANCH26 line. This tells you that the instruction before that — the bl at offset 0x14 — references the _calloc symbol.

IMPORTANT The ARM64_RELOC_BRANCH26 relocation is specific to the bl instruction in 64-bit Arm code. You’ll see other relocations for other instructions. And the Intel architecture has a whole different set of relocations. So, when searching this output don’t look for ARM64_RELOC_BRANCH26 specifically, but rather any relocation that references _calloc.
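
The disassembly for a real object file is long, so here’s a sketch of a filter that finds every relocation naming the symbol, regardless of relocation type, and prints the enclosing function label along with the instruction just before the relocation. The callers_of name is hypothetical, and the parsing assumes the objdump layout shown in this post. It only matches relocations whose target is exactly the symbol, so one with an addend, like _foo+0x8, won’t show up.

```shell
#!/bin/sh
# callers_of SYMBOL: reads objdump disassembly (with relocations) on stdin
# and prints "function: instruction" for each relocation that names SYMBOL.
#
# Example:
#   xcrun objdump -d -r "WaffleCore.o" | callers_of _calloc
callers_of() {
    awk -v sym="$1" '
        # A line like "0000000000000000 <_newWaffle>:" starts a function.
        /^[0-9a-fA-F]+ <.*>:$/ {
            fn = $2
            gsub(/^<|>:$/, "", fn)
        }
        # Match any relocation type (ARM64_RELOC_*, X86_64_RELOC_*, ...)
        # that targets SYMBOL, then report the line that came before it.
        /_RELOC_/ && $NF == sym {
            print fn ": " prev
            next
        }
        {
            prev = $0
            sub(/^[ \t]+/, "", prev)
        }
    '
}
```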

In this case we’ve built the object file from source code, so WaffleCore.o contains debug symbols. That allows objdump to include information about the source code context. From that, we can easily see that calloc is referenced by our newWaffle function.

To see what happens when you don’t have debug symbols, create a new object file with them stripped out:

% cp "WaffleCore.o" "WaffleCore-stripped.o"
% strip -x -S "WaffleCore-stripped.o"

Then repeat the objdump command:

% xcrun objdump -S -r "WaffleCore-stripped.o"
…
0000000000000000 <_newWaffle>:
       0: d10083ff      sub     sp, sp, #32
       4: a9017bfd      stp     x29, x30, [sp, #16]
       8: 910043fd      add     x29, sp, #16
       c: d2800020      mov     x0, #1
      10: d2800081      mov     x1, #4
      14: 94000000      bl      0x14 <_newWaffle+0x14>
                0000000000000014:  ARM64_RELOC_BRANCH26 _calloc
…

While this isn’t as nice as the previous output, you can still see that newWaffle is calling calloc.
