Embedded Assembler in Xcode C++ routines?

I am developing in C++ and Objective-C using Xcode 13.1, running on a 2019 Mac Pro with Big Sur 11.6.1. I am working on an app that will run only on Macs with Intel processors. (Apple silicon will involve a port, later.) In Xcode, my compiler is set to Apple Clang, with defaults across the board.

I need to embed some Intel assembly-language code in some of the C++ code for my application. I can't find any manuals or examples for how to do that. I have written Intel assembler before, but not for a long time, and never embedded in a macOS app. Can someone recommend any books or links to documentation or examples? Apple's own documentation seems, in great part, no longer maintained.

Answered by Knightley in 697021022

I am the original poster of this question. I have made some progress on this one on my own, and since the matter comes up now and then, I thought it would help others if I posted some links I have found useful. What follows is a raw list of links, but I think it will be pretty obvious on opening them why I found them useful, and I shall have some further comments on my learning process after the list ... (And the list will probably suggest to you that the reason I wanted to use assembler in the first place had to do with interprocess synchronization.)

======== Start of list of links ========

Bibliography for parallel process synchronization:

x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors

Peter Sewell, Susmit Sarkar, Scott Owens (University of Cambridge), Francesco Zappa Nardelli (INRIA), Magnus O. Myreen (University of Cambridge)

http://www.cl.cam.ac.uk/users/pes20/weakmemory

https://developer.apple.com/documentation/driverkit/3131285-ossynchronizeio (found by googling "mfence macintosh")

https://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf

Intel's official stuff:

https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

Embedding assembly in C:

https://clang.llvm.org/compatibility.html

https://llvm.org/docs/LangRef.html

https://llvm.org/docs/CommandGuide/llvm-as.html

https://en.wikipedia.org/wiki/GNU_Assembler

https://cs.lmu.edu/~ray/notes/gasexamples/

https://www.cs.uaf.edu/courses/cs301/2014-fall/notes/inline-assembly/

https://gcc.gnu.org/onlinedocs/gcc-4.0.2/gcc/Extended-Asm.html#Extended-Asm

https://en.wikipedia.org/wiki/X86_calling_conventions

https://stackoverflow.com/tags/inline-assembly/info

https://www.agner.org/optimize/calling_conventions.pdf

https://developer.apple.com/forums/thread/64494

======== End of list of links ========

Further comments: I encountered several issues in my investigations, most of which had to do with the fact that the Clang compiler was smarter than I was.

The first issue, however, was simply that I hadn't written any x86-family assembler since the time when the newest, hottest chip in the Intel family was a wonderful ***** with full 16-bit internal data paths, called the 8088. With that chip and its friends the 8086 and the 8087, Intel was near the head of the pack in advanced microprocessors in the late 1970s. In particular, I did not know that later-model Intel processors use different names for the same register, depending on whether you want to access 8, 16, 32 or 64 bits of it. This led to a lot of syntax errors, most of them certainly due to my own ignorance, though I might mention that the assembler's error messages weren't much more useful than the classic yacc "Syntax error" message.
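For anyone else returning to x86 after a long absence, here is a minimal sketch of that register naming. It assumes an x86-64 target and the AT&T operand order (source first, destination last) that Clang's inline assembler uses by default. The same physical register is called rax, eax, ax or al depending on whether you want 64, 32, 16 or 8 bits of it. The names RegisterViews, register_views and demo_register_views are made up for illustration, and the #else branch is only a plain C++ stand-in so the file still builds on other architectures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the four views of one register.
struct RegisterViews {
    std::uint64_t q;  // all 64 bits  (rax)
    std::uint32_t d;  // low 32 bits  (eax)
    std::uint16_t w;  // low 16 bits  (ax)
    std::uint8_t  b;  // low 8 bits   (al)
};

RegisterViews register_views(std::uint64_t value)
{
    std::uint64_t q;
    std::uint32_t d;
    std::uint16_t w;
    std::uint8_t  b;
#if defined(__x86_64__)
    __asm__(
        "movq %4, %%rax\n\t"   // copy the input into rax
        "movq %%rax, %0\n\t"   // read it back under its 64-bit name
        "movl %%eax, %1\n\t"   // ... its 32-bit name
        "movw %%ax, %2\n\t"    // ... its 16-bit name
        "movb %%al, %3"        // ... its 8-bit name
        // "=&r" is a conservative choice: the early-clobber mark keeps
        // each output out of the register that holds the input.
        : "=&r"(q), "=&r"(d), "=&r"(w), "=&r"(b)
        : "r"(value)
        : "rax");              // rax is used as scratch, so declare it
#else
    // Stand-in for non-x86-64 builds: same results by truncation.
    q = value;
    d = static_cast<std::uint32_t>(value);
    w = static_cast<std::uint16_t>(value);
    b = static_cast<std::uint8_t>(value);
#endif
    return RegisterViews{q, d, w, b};
}

// Small driver (hypothetical) that checks the four views of one value.
void demo_register_views()
{
    const RegisterViews v = register_views(0x1122334455667788ULL);
    assert(v.q == 0x1122334455667788ULL);
    assert(v.d == 0x55667788U);
    assert(v.w == 0x7788U);
    assert(v.b == 0x88U);
}
```

Calling demo_register_views() from any convenient place should pass silently. Using the wrong-width name with a given instruction suffix (say, movl with %%rax) is exactly the kind of thing that produces the unhelpful syntax errors mentioned above.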

With that one out of the way, I then was confronted with trying to outsmart the compiler. I was using embedded assembler of the canonical form

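__asm__(
    "this\n\t"           // adjacent string literals are concatenated, so each
    "that\n\t"           // instruction needs its own line terminator
    "the other"
    :  "=r" (anOutput)   // output operands
    :  "r" (anInput)     // input operands
    :  "%this-register-is-dead-in-the-water"   // clobbered registers
    );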

all embedded in a little C++ function, using the assembler code to examine the function's inputs and change its outputs. That function was temporarily installed in the actual Xcode project I was developing, so I could be sure the set of options and flags that affected the assembler were the same as for the rest of the project. (I was, of course, planning to use assembler in that project, so that condition was appropriate.) The function was declared and defined in one place and called from somewhere else, early in my project's initialization code, so I could print out results to a log file and see what was happening.
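To make the canonical form concrete, here is a small sketch of one such function, not the code from my project. It assumes an x86-64 target and AT&T syntax; the function name twice_plus_one and the choice of rax as the scratch register are purely illustrative, and the #else branch is a plain C++ stand-in for other architectures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: computes 2 * anInput + 1 (wrapping modulo 2^64).
std::uint64_t twice_plus_one(std::uint64_t anInput)
{
    std::uint64_t anOutput;
#if defined(__x86_64__)
    __asm__(
        "movq %1, %%rax\n\t"      // rax = anInput
        "addq %%rax, %%rax\n\t"   // rax = 2 * anInput
        "incq %%rax\n\t"          // rax = 2 * anInput + 1
        "movq %%rax, %0"          // anOutput = rax
        : "=r"(anOutput)          // output operand, referred to as %0
        : "r"(anInput)            // input operand, referred to as %1
        : "rax", "cc");           // scratch register and flags are overwritten
#else
    anOutput = 2 * anInput + 1;   // stand-in for non-x86-64 builds
#endif
    return anOutput;
}

// Small driver (hypothetical) with two sanity checks.
void demo_twice_plus_one()
{
    assert(twice_plus_one(0) == 1);
    assert(twice_plus_one(20) == 41);
}
```

Inside the template, %0 and %1 stand for whatever registers the compiler picks for the operands, while a literal register name needs a doubled percent sign (%%rax). Listing rax in the clobber section is what tells the compiler not to keep anything it cares about in that register across the block.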

I had figured out the compiler's conventions for register use in calling C functions, but the correct registers did not seem to contain sensible values -- or indeed, any relevant values at all. Eventually, a little fussing with "otool -tvV" (see list items above; it was much easier than using the debugger) led to the discovery that the compiler was inlining my little function: the calling conventions were irrelevant because there was no call in the first place. Fortunately, there is an attribute that prevents inlining of a function, and it seems to work. It looks like this:

void __attribute__ ((noinline)) foo( int whatever )
{
    /* ... regular C code and assembler herein ... */
}

With that out of the way, I still had problems. More use of otool -tvV revealed that the compiler was evidently smart enough to figure out that there was only one call to my little function in the entire body of my big project. Although it did not inline the call, it modified the function to contain some of the literal values that were used where that call appeared in the source code. My function happened to be called with one parameter equal to "2", and instead of putting "2" into a register for the call, the compiler simply inserted a literal "2" wherever it was required in the assembled code for the body of my function. At least, that is what it looked like. What a wonderful optimization, but not what I wanted for purposes of experiment. I might have tried dialing back on optimization (I was running -O3 because that is what I have been using in my project), but it was easier, and more consistent with my intention of using assembler in the real project, simply to arrange to call my little function twice, with different literal values. At that point, the compiler gave up and generated the kind of call-with-register-use that all the books tell you to expect.

So for the moment I have outsmarted the compiler and am able to proceed with my project, though I suspect it is plotting other surprises for me. I may modify this comment later, depending on what I find out, but in the interim, those of you who wish to use inline assembler in macOS applications created with Xcode should be advised that the matter is subtler than you think, even if you already think it is pretty subtle.
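The experiment described above can be sketched roughly as follows. This is an illustration under assumptions, not my project code: it assumes an x86-64 target with AT&T syntax, the body of foo and the name run_experiment are made up, and the #else branch is a plain C++ stand-in. What the compiler actually does with the parameter seems to depend on things like optimization level and how visible the function is to the optimizer, so the disassembly (otool -tvV) is the only real check; the code below only confirms that the results come out right.

```cpp
#include <cassert>
#include <cstdio>

// noinline keeps the call as a real call, so the calling convention applies.
__attribute__((noinline)) int foo(int whatever)
{
    int result = whatever;
#if defined(__x86_64__)
    // "+r": the operand is both read and written. addl doubles it in place
    // and changes the flags, hence the "cc" clobber.
    __asm__("addl %0, %0" : "+r"(result) : : "cc");
#else
    result += result;  // stand-in for non-x86-64 builds
#endif
    return result;
}

// Hypothetical driver: two calls with different literals, so no single
// constant can be baked into the body of foo.
void run_experiment()
{
    const int first  = foo(2);
    const int second = foo(3);
    std::printf("foo(2) = %d, foo(3) = %d\n", first, second);
    assert(first == 4);
    assert(second == 6);
}
```

With both calls present, the argument has to arrive in a register, which is the behavior the calling-convention documents describe.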

https://clang.llvm.org/compatibility.html#inline-asm
