Infinite compilation when creating compute pipeline state

I'm converting an existing project to Metal. It can currently run computations on other GPU platforms such as CUDA/OptiX and OpenCL, or on the CPU, using common headers that express the platform-independent calculation logic. I've imported these headers into Metal and gone through the long process of annotating all of the pointer references to make the Metal compiler happy, and it now successfully compiles and creates a metallib from my trampoline kernel.

The issue appears when I run the program and it reaches the point in the Metal setup process where it needs to create a compute pipeline state from the functions I've retrieved from the library. The application hangs forever (with the synchronous call); the state is never created and no error is ever thrown. While this is happening, the MTLCompilerService process runs at 100% CPU usage, and sampling the process seems to show that LLVM is endlessly optimizing the code, but I'm not sure.

Is there anything I can do to fix this issue? I can provide a sample project with the compiled metallib if that helps narrow it down.


Sampling process 1005 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling MTLCompilerService (pid 1005) every 1 millisecond
Process:         MTLCompilerService [1005]
Path:            /System/Library/Frameworks/Metal.framework/Versions/A/XPCServices/MTLCompilerService.xpc/Contents/MacOS/MTLCompilerService
Load Address:    0x1041ef000
Identifier:      MTLCompilerService
Version:         ???
Code Type:       X86-64
Parent Process:  ??? [1]


Date/Time:       2020-05-09 23:01:26.694 -0400
Launch Time:     2020-05-09 22:56:38.200 -0400
OS Version:      Mac OS X 10.15.4 (19E287)
Report Version:  7
Analysis Tool:   /usr/bin/sample


Physical footprint:         991.2M
Physical footprint (peak):  3.2G
----


Call graph:
    2756 Thread_   DispatchQueue_22: (null)  (serial)
    + 2756 start_wqthread  (in libsystem_pthread.dylib) + 15  [0x7fff6d1f1b77]
    +   2756 _pthread_wqthread  (in libsystem_pthread.dylib) + 290  [0x7fff6d1f2a3d]
    +     2756 _dispatch_workloop_worker_thread  (in libdispatch.dylib) + 596  [0x7fff6cfa7c09]
    +       2756 _dispatch_lane_invoke  (in libdispatch.dylib) + 363  [0x7fff6cf9e5d6]
    +         2756 _dispatch_lane_serial_drain  (in libdispatch.dylib) + 263  [0x7fff6cf9daf6]
    +           2756 _dispatch_mach_invoke  (in libdispatch.dylib) + 481  [0x7fff6cfae71c]
    +             2756 _dispatch_lane_serial_drain  (in libdispatch.dylib) + 263  [0x7fff6cf9daf6]
    +               2756 _dispatch_mach_msg_invoke  (in libdispatch.dylib) + 435  [0x7fff6cfadbc9]
    +                 2756 _dispatch_client_callout4  (in libdispatch.dylib) + 9  [0x7fff6cf986f8]
    +                   2756 _xpc_connection_mach_event  (in libxpc.dylib) + 934  [0x7fff6d2351cb]
    +                     2756 _xpc_connection_call_event_handler  (in libxpc.dylib) + 56  [0x7fff6d2362bc]
    +                       2756 invocation function for block in MTLCompilerServiceHandleEvent(NSObject<os_xpc_object>*)  (in MTLCompilerService) + 417  [0x1041f0802]
    +                         2756 MTLCodeGenServiceBuildRequest  (in MTLCompiler) + 268  [0x7fff58b15989]
    +                           2756 split_stack_call  (in MTLCompiler) + 13  [0x7fff58b159ed]
    +                             2756 MTLCompilerObject::buildRequest(unsigned int, unsigned int, void const*, unsigned long, void (unsigned int, void const*, unsigned long, char const*) block_pointer)  (in MTLCompiler) + 15824  [0x7fff58b197f0]
    +                               2756 AMDGFX10MTLCompilerPlugin::buildRequest(void const*, unsigned long, unsigned int, void const*, void const**, unsigned long*, void const**, unsigned long*, void const**, unsigned long*, char const**)  (in AMDRadeonX6000Shared) + 965  [0x7fff26a30ab5]
    +                                 2756 AMDGFX10MTLCompilerPlugin::compileShaders(llvm::Module*, void const*, llvm::ILEntryFunc*, char const**)  (in AMDRadeonX6000Shared) + 115  [0x7fff26a2f9d1]
    +                                   2756 AMDMTLCompilerPluginIL::compileShader(llvm::Module&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&)  (in AMDRadeonX6000Shared) + 652  [0x7fff26a2f38a]
    +                                     2756 llvm::legacy::PassManagerImpl::run(llvm::Module&)  (in libLLVM.dylib) + 653  [0x7fff52159bcf]
    +                                       2756 llvm::FPPassManager::runOnModule(llvm::Module&)  (in libLLVM.dylib) + 52  [0x7fff5215402e]
    +                                         2756 llvm::FPPassManager::runOnFunction(llvm::Function&)  (in libLLVM.dylib) + 395  [0x7fff521542cd]
    +                                           2756 llvm::LPPassManager::runOnFunction(llvm::Function&)  (in libLLVM.dylib) + 933  [0x7fff51e4a5d7]
    +                                             2717 llvm::IVUsersWrapperPass::runOnLoop(llvm::Loop*, llvm::LPPassManager&)  (in libLLVM.dylib) + 334  [0x7fff51e08e46]
    +                                             ! 2717 llvm::IVUsers::IVUsers(llvm::Loop*, llvm::AssumptionCache*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::ScalarEvolution*)  (in libLLVM.dylib) + 191  [0x7fff51e08f43]
    +                                             !   2717 llvm::IVUsers::AddUsersIfInteresting(llvm::Instruction*)  (in libLLVM.dylib) + 60  [0x7fff51e090d2]
    +                                             !     2641 llvm::IVUsers::AddUsersImpl(llvm::Instruction*, llvm::SmallPtrSetImpl<llvm::Loop*>&)  (in libLLVM.dylib) + 693  [0x7fff51e093a7]
    +                                             !     : 2638 llvm::Loop::hasDedicatedExits() const  (in libLLVM.dylib) + 66  [0x7fff51e47c96]
    +                                             !     : | 1228 llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getExitBlocks(llvm::SmallVectorImpl<llvm::BasicBlock*>&) const  (in libLLVM.dylib) + 85,60,...  [0x7fff51e465b7,0x7fff51e4659e,...]
    +                                             !     : | 865 llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getExitBlocks(llvm::SmallVectorImpl<llvm::BasicBlock*>&) const  (in libLLVM.dylib) + 146  [0x7fff51e465f4]
    +                                             !     : | + 723 ???  (in libLLVM.dylib)  load address 0x7fff51d99000 + 0x25a1e  [0x7fff51dbea1e]
    +                                             !     : | + ! 560 ???  (in libLLVM.dylib)  load address 0x7fff51d99000 + 0x5f42  [0x7fff51d9ef42]
    +                                             !     : | + ! : 560 llvm::SmallPtrSetImplBase::FindBucketFor(void const*) const  (in libLLVM.dylib) + 34,94,...  [0x7fff525ed98c,0x7fff525ed9c8,...]

Replies

I recommend filing a radar for this problem and attaching a small sample project so that we can investigate this further.

Another useful data point would be to run the same experiment on the integrated GPU, to see whether the hang is specific to the discrete GPU compiler.
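
One way to try the integrated-GPU experiment is sketched below in Swift. This is only a rough outline under a few assumptions: the metallib path and kernel name are hypothetical placeholders, the machine exposes more than one GPU through `MTLCopyAllDevices()`, and the 60-second timeout is an arbitrary choice. It uses the asynchronous pipeline-creation call so that a stuck compile does not block the calling thread forever.

```swift
import Foundation
import Metal

// Hypothetical placeholders: substitute your own metallib path and kernel name.
let libraryURL = URL(fileURLWithPath: "trampoline.metallib")
let kernelName = "trampoline_kernel"

// Try pipeline creation on every GPU in the system so the integrated and
// discrete compiler back ends can be compared.
for device in MTLCopyAllDevices() {
    // isLowPower is true for the integrated GPU on Macs that have both kinds.
    let kind = device.isLowPower ? "integrated" : "discrete"
    print("Testing \(device.name) (\(kind))")

    do {
        let library = try device.makeLibrary(URL: libraryURL)
        guard let function = library.makeFunction(name: kernelName) else {
            print("  kernel \(kernelName) not found in library")
            continue
        }

        // Asynchronous variant: the completion handler runs when the
        // compiler finishes, and the semaphore lets us stop waiting.
        let done = DispatchSemaphore(value: 0)
        device.makeComputePipelineState(function: function) { state, error in
            if let error = error {
                print("  pipeline creation failed: \(error)")
            } else if state != nil {
                print("  pipeline state created")
            }
            done.signal()
        }

        // Arbitrary limit for this experiment; adjust as needed.
        if done.wait(timeout: .now() + 60) == .timedOut {
            print("  still compiling after 60 s")
        }
    } catch {
        print("  could not load library: \(error)")
    }
}
```

If the pipeline state is created on the integrated GPU but times out on the discrete one, that points at the AMD compiler plugin seen in the call graph above, which is worth mentioning in the bug report. Note that timing out here only stops the wait; it does not cancel the compile already running in MTLCompilerService.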