I'm running into an issue with threadgroup memory where data written to it seemingly gets lost when I use int8_t or int16_t element types:
#include <metal_stdlib>
using namespace metal;
kernel void kernel_function(device int16_t* R, uint index [[thread_position_in_threadgroup]]) {
threadgroup int16_t shared[1];
shared[index] = (int16_t) 42;
threadgroup_barrier(mem_flags::mem_threadgroup);
R[0] = shared[index];
}
If I execute this kernel (using the following host code: https://gist.github.com/maleadt/ffcda8fc94f03f32347c3167ccca70a8 ), I get zeros in my output buffer. If I change the element type from int16_t to int32_t (just find/replace in the kernel and host code) I get the expected results.
I'm new to Metal, so I guess I'm doing something wrong here. I'm using an M1 Pro on Monterey, with Xcode 13.4.1.
EDIT: interestingly, running under MTL_SHADER_VALIDATION=1 results in the expected output, so this does start to look like a miscompilation in the back-end.
Post
Replies
Boosts
Views
Activity
I'm working on an FFI for working with ObjectiveC/Foundation/Metal. AFAIU, as many APIs create and autorelease objects, I need to ensure that an NSAutoreleasePool is active when calling into these APIs. So I've created a wrapper that basically mimics @autoreleasepool by creating and initializing an NSAutoreleasePool, running some code, and then draining the pool. So far so good; I'm using this functionality around most of my entry points into the ObjectiveC world.
There's two issues I don't understand though. The first is that NSAutoreleasePool initialization seems to require an active autorelease pool, which seems like a chicken-and-egg problem. Running with OBJC_DEBUG_MISSING_POOLS=YES and breaking on objc_autoreleaseNoPool, I see:
objc[30336]: MISSING POOLS: (0x1e2e89c40) Object 0x6000007e0050 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
(lldb) bt
objc_autoreleaseNoPool at /usr/lib/libobjc.A.dylib (unknown line)
_ZN19AutoreleasePoolPage17autoreleaseNoPageEP11objc_object at /usr/lib/libobjc.A.dylib (unknown line)
_ZN19AutoreleasePoolPage4pushEv at /usr/lib/libobjc.A.dylib (unknown line)
_CFAutoreleasePoolPush at /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (unknown line)
-[NSAutoreleasePool init] at /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (unknown line)
macro expansion at /Users/tim/Julia/pkg/ObjectiveC/src/syntax.jl:163 [inlined]
NSAutoreleasePool at /Users/tim/Julia/pkg/ObjectiveC/src/foundation.jl:427
The second, related problem is that some APIs that are called by Metal from a background thread I don't control:
(lldb) bt
* thread #11, queue = 'com.Metal.CompletionQueueDispatch', stop reason = breakpoint 1.1
* frame #0: 0x000000018c21b9e8 libobjc.A.dylib`objc_autoreleaseNoPool
frame #1: 0x000000018c1eb99c libobjc.A.dylib`AutoreleasePoolPage::autoreleaseNoPage(objc_object*) + 252
frame #2: 0x000000018c21c9ec libobjc.A.dylib`AutoreleasePoolPage::push() + 76
frame #3: 0x00000001aaef9694 IOGPU`-[IOGPUMetalBuffer dealloc] + 104
frame #4: 0x00000001f83554e4 AGXMetalG15X_B0`-[AGXBuffer dealloc] + 44
frame #5: 0x00000001f837eb98 AGXMetalG15X_B0`-[AGXG15XFamilyBuffer dealloc] + 76
frame #6: 0x000000019687c1c4 Metal`MTLResourceListChunkFreeEntries(MTLResourceListChunk*) + 64
frame #7: 0x000000019674e2b0 Metal`-[MTLResourceList releaseAllObjectsAndReset] + 72
frame #8: 0x00000001aaefbd10 IOGPU`IOGPUMetalCommandBufferStorageReset + 36
frame #9: 0x00000001aaefbcac IOGPU`IOGPUMetalCommandBufferStorageDealloc + 76
frame #10: 0x00000001aaefa130 IOGPU`-[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:] + 240
frame #11: 0x000000019674dce4 Metal`-[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] + 108
frame #12: 0x00000001aaf03b54 IOGPU`IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 128
frame #13: 0x00000001aaf03c60 IOGPU`__IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
frame #14: 0x000000018c4049d0 libdispatch.dylib`_dispatch_client_callout4 + 20
frame #15: 0x000000018c420c5c libdispatch.dylib`_dispatch_mach_msg_invoke + 468
frame #16: 0x000000018c40bd28 libdispatch.dylib`_dispatch_lane_serial_drain + 368
frame #17: 0x000000018c421998 libdispatch.dylib`_dispatch_mach_invoke + 444
frame #18: 0x000000018c40bd28 libdispatch.dylib`_dispatch_lane_serial_drain + 368
frame #19: 0x000000018c40ca08 libdispatch.dylib`_dispatch_lane_invoke + 432
frame #20: 0x000000018c40bd28 libdispatch.dylib`_dispatch_lane_serial_drain + 368
frame #21: 0x000000018c40c9d4 libdispatch.dylib`_dispatch_lane_invoke + 380
frame #22: 0x000000018c41761c libdispatch.dylib`_dispatch_root_queue_drain_deferred_wlh + 288
frame #23: 0x000000018c416e90 libdispatch.dylib`_dispatch_workloop_worker_thread + 404
frame #24: 0x000000018c5b2114 libsystem_pthread.dylib`_pthread_wqthread + 288
(lldb) c
Process 26902 resuming
objc[26902]: MISSING POOLS: (0x1e2e89c40) Object 0x14c8a2400 of class AGXG15SDevice autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
Again, I'm not sure how I'm supposed to run this under an autorelease pool.