Recently a bunch of folks have asked about why a specific symbol is being referenced by their app. This is my attempt to address that question.
If you have questions or comments, please start a new thread. Tag it with Linker so that I see it.
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
Determining Why a Symbol is Referenced
In some situations you might want to know why a symbol is referenced by your app. For example:
-
You might be working with a security auditing tool that flags uses of
malloc
. -
You might be creating a privacy manifest and want to track down where your app is calling
stat
.
This post is my attempt at explaining a general process for tracking down the origin of these symbol references. This process works from ‘below’. That is, it works ‘up’ from you app’s binary rather than ‘down’ from your app’s source code. That’s important because:
-
It might be hard to track down all of your source code, especially if you’re using one or more package management systems.
-
If your app has a binary dependency on a static library, dynamic library, or framework, you might not have access to that library’s source code.
IMPORTANT This post assumes the terminology from An Apple Library Primer. Read that before continuing here.
The general outline of this process is:
-
Find all Mach-O images.
-
Find the Mach-O image that references the symbol.
-
Find the object files (
.o
) used to make that Mach-O. -
Find the object file that references the symbol.
-
Find the code within that object file.
This post assumes that you’re using Xcode. If you’re using third-party tools that are based on Apple tools, and specifically Apple’s linker, you should be able to adapt this process to your tooling. If you’re using a third-party tool that has its own linker, you’ll need to ask for help via your tool’s support channel.
Find all Mach-O images
On Apple platforms an app consists of a number of Mach-O images. Every app has a main executable. The app may also embed dynamic libraries or frameworks. The app may also embed app extensions or system extensions, each of which have their own executable. And a Mac app might have embedded bundles, helper tools, XPC services, agents, daemons, and so on.
To find all the Mach-O images in your app, combine the find
and file
tools. For example:
% find "Apple Configurator.app" -print0 | xargs -0 file | grep Mach-O
Apple Configurator.app/Contents/MacOS/Apple Configurator: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
…
Apple Configurator.app/Contents/MacOS/cfgutil: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
…
Apple Configurator.app/Contents/Extensions/ConfiguratorIntents.appex/Contents/MacOS/ConfiguratorIntents: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
…
Apple Configurator.app/Contents/Frameworks/ConfigurationUtilityKit.framework/Versions/A/ConfigurationUtilityKit: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit dynamically linked shared library x86_64] [arm64]
…
This shows that Apple Configurator has a main executable (Apple Configurator
), a helper tool (cfgutil
), an app extension (ConfiguratorIntents
), a framework (ConfigurationUtilityKit
), and many more.
This output is quite unwieldy. For nicer output, create and use a shell script like this:
% cat FindMachO.sh
#! /bin/sh
# Passing `-0` to `find` causes it to emit a NUL delimited after the
# file name and the `:`. Sadly, macOS `cut` doesn’t support a nul
# delimiter so we use `tr` to convert that to a DLE (0x01) and `cut` on
# that.
#
# Weirdly, `find` only inserts the NUL on the primary line, not the
# per-architecture Mach-O lines. We use that to our advantage, filtering
# out the per-architecture noise by only passing through lines
# containing a DLE.
find "$@" -type f -print0 \
| xargs -0 file -0 \
| grep -a Mach-O \
| tr '\0' '\1' \
| grep -a $(printf '\1') \
| cut -d $(printf '\1') -f 1
Find the Mach-O image that references the symbol
Once you have a list of Mach-O images, use nm
to find the one that references the symbol. The rest of this post investigate a test app, WaffleVarnishORama, that’s written in Swift but uses waffle management functionality from the libWaffleCore.a
static library. The goal is to find the code that calls calloc
.
This app has a single Mach-O image:
% FindMachO.sh "WaffleVarnishORama.app"
WaffleVarnishORama.app/WaffleVarnishORama
Use nm
to confirm that it references calloc
:
% nm "WaffleVarnishORama.app/WaffleVarnishORama" | grep "calloc"
U _calloc
The _calloc
symbol has a leading underscore because it’s a C symbol. This convention dates from the dawn of Unix, where the underscore distinguish C symbols from assembly language symbols.
The U
prefix indicates that the symbol is undefined, that is, the Mach-O images is importing the symbol. If the symbol name is prefixed by a hex number and some other character, like T
or t
, that means that the library includes an implementation of calloc
. That’s weird, but certainly possible. OTOH, if you see this then you know this Mach-O image isn’t importing calloc
.
IMPORTANT If this Mach-O isn’t something that you build — that is, you get this Mach-O image as a binary from another developer — you won’t be able to follow the rest of this process. Instead, ask for help via that library’s support channel.
Find the object files used to make that Mach-O image
The next step is to track down which .o
file includes the reference to calloc
. Do this by generating a link map. A link map is an old school linker feature that records the location, size, and origin of every symbol added to the linker’s output.
To generate a link map, enable the Write Link Map File build setting. By default this puts the link map into a text (.txt
) file within the derived data directory. To find the exact path, look at the Link step in the build log. If you want to customise this, use the Path to Link Map File build setting.
A link map has three parts:
-
A simple header
-
A list of object files used to build the Mach-O image
-
A list of sections and their symbols
In our case the link map looks like this:
# Path: …/WaffleVarnishORama.app/WaffleVarnishORama
# Arch: arm64
# Object files:
[ 0] linker synthesized
[ 1] objc-file
[ 2] …/AppDelegate.o
[ 3] …/MainViewController.o
[ 4] …/libWaffleCore.a[2](WaffleCore.o)
[ 5] …/Foundation.framework/Foundation.tbd
…
# Sections:
# Address Size Segment Section
0x100008000 0x00001AB8 __TEXT __text
…
The list of object files contains:
-
An object file for each of our app’s source files — That’s
AppDelegate.o
andMainViewController.o
in this example. -
A list of static libraries — Here that’s just
libWaffleCore.a
. -
A list of dynamic libraries — These might be stub libraries (
.tbd
), dynamic libraries (.dylib
), or frameworks (.framework
).
Focus on the object files and static libraries. The list of dynamic libraries is irrelevant because each of those is its own Mach-O image.
Find the object file that references the symbol
Once you have list of object files and static libraries, use nm
to each one for the calloc
symbol:
% nm "…/AppDelegate.o" | grep calloc
% nm "…/MainViewController.o" | grep calloc
% nm "…/libWaffleCore.a" | grep calloc
U _calloc
This indicates that only libWaffleCore.a
references the calloc
symbol, so let’s focus on that.
Note As in the Mach-O case, the U
prefix indicates that the symbol is undefined, that is, the object file is importing the symbol.
Find the code within that object file
To find the code within the object file that references the symbol, use the objdump
tool. That tool takes an object file as input, but in this example we have a static library. That’s an archive containing one or more object files. So, the first step is to unpack that archive:
% mkdir "libWaffleCore-objects"
% cd "libWaffleCore-objects"
% ar -x "…/libWaffleCore.a"
% ls -lh
total 24
-rw-r--r-- 1 quinn staff 4.1K 8 May 11:24 WaffleCore.o
-rw-r--r-- 1 quinn staff 56B 8 May 11:24 __.SYMDEF SORTED
There’s only a single object file in that library, which makes things easy. If there were a multiple, run the following process over each one independently.
To find the code that references a symbol, run objdump
with the -S
and -r
options:
% xcrun objdump -S -r "WaffleCore.o"
…
; extern WaffleRef newWaffle(void) {
0: d10083ff sub sp, sp, #32
4: a9017bfd stp x29, x30, [sp, #16]
8: 910043fd add x29, sp, #16
c: d2800020 mov x0, #1
10: d2800081 mov x1, #4
; Waffle * result = calloc(1, sizeof(Waffle));
14: 94000000 bl 0x14 <ltmp0+0x14>
0000000000000014: ARM64_RELOC_BRANCH26 _calloc
…
Note the ARM64_RELOC_BRANCH26
line. This tells you that the instruction before that — the bl
at offset 0x14 — references the _calloc
symbol.
IMPORTANT The ARM64_RELOC_BRANCH26
relocation is specific to the bl
instruction in 64-bit Arm code. You’ll see other relocations for other instructions. And the Intel architecture has a whole different set of relocations. So, when searching this output don’t look for ARM64_RELOC_BRANCH26
specifically, but rather any relocation that references _calloc
.
In this case we’ve built the object file from source code, so WaffleCore.o
contains debug symbols. That allows objdump
include information about the source code context. From that, we can easily see that calloc
is referenced by our newWaffle
function.
To see what happens when you don’t have debug symbols, create an new object file with them stripped out:
% cp "WaffleCore.o" "WaffleCore-stripped.o"
% strip -x -S "WaffleCore-stripped.o"
Then repeat the objdump
command:
% xcrun objdump -S -r "WaffleCore-stripped.o"
…
0000000000000000 <_newWaffle>:
0: d10083ff sub sp, sp, #32
4: a9017bfd stp x29, x30, [sp, #16]
8: 910043fd add x29, sp, #16
c: d2800020 mov x0, #1
10: d2800081 mov x1, #4
14: 94000000 bl 0x14 <_newWaffle+0x14>
0000000000000014: ARM64_RELOC_BRANCH26 _calloc
…
While this isn’t as nice as the previous output, you can still see that newWaffle
is calling calloc
.