Post · Replies · Boosts · Views · Activity

MTLSharedEvent scheduled block called before command buffer scheduling and not in-flight
I am using a MTLSharedEvent to occasionally relay new information from the CPU to the GPU by writing into a MTLBuffer with storage mode .storageModeManaged inside a block registered with the shared event (via the notify(_:atValue:block:) method of MTLSharedEvent, with a MTLSharedEventListener configured to be notified on a background dispatch queue). The setup looks something like this:

```swift
let device = MTLCreateSystemDefaultDevice()!
let synchronizationQueue = DispatchQueue(label: "com.myproject.synchronization")

let sharedEvent = device.makeSharedEvent()!
let sharedEventListener = MTLSharedEventListener(dispatchQueue: synchronizationQueue)

// Updated only occasionally on the CPU (on user interaction). Mostly written to
// on the GPU
let managedBuffer = device.makeBuffer(length: 10, options: .storageModeManaged)!

var doExtraWork = true

func computeSomething(commandBuffer: MTLCommandBuffer) {
    // Do work on the GPU every frame

    // After writing to the buffer on the GPU, synchronize the buffer (required)
    let blitToSynchronize = commandBuffer.makeBlitCommandEncoder()!
    blitToSynchronize.synchronize(resource: managedBuffer)
    blitToSynchronize.endEncoding()

    // Occasionally, add extra information on the GPU
    if doExtraWork {
        // Register a block to write into the buffer
        sharedEvent.notify(sharedEventListener, atValue: 1) { event, value in
            // Safely write into the buffer. Make sure to call `didModifyRange(_:)` after

            // Update the counter
            event.signaledValue = 2
        }

        commandBuffer.encodeSignalEvent(sharedEvent, value: 1)
        commandBuffer.encodeWaitForEvent(sharedEvent, value: 2)
    }

    // Commit the work
    commandBuffer.commit()
}
```

The expected behavior is as follows:

1. The GPU does some work with the managed buffer every frame.
2. Occasionally, the buffer needs to be updated with new information from the CPU. In that frame, we register a block of work to be executed. We do so in a dedicated block because we cannot guarantee that, by the time execution on the main thread reaches this point, the GPU is not simultaneously reading from or writing to the managed buffer. Hence it is unsafe to simply write to it immediately; we must make sure the GPU is not doing anything with this data.
3. When the GPU schedules this command buffer for execution, the commands encoded before the encodeSignalEvent(_:value:) call run, and then execution on the GPU stops until the block increments the signaledValue property of the event passed into the block.
4. When execution reaches the block, we can safely write into the managed buffer because we know the CPU has exclusive access to the resource. Once we've done so, we resume execution on the GPU.

The issue is that Metal does not seem to call the block while the GPU is executing the command buffer, but rather *before* the command buffer is even scheduled. Worse, the system only seems to "work" for the initial command buffer (the very first one, before any others are scheduled). I first noticed this issue when my scene would vanish after a CPU update; a GPU frame capture showed that the GPU had NaNs all over the place.

I then ran into this strange situation when I purposely stalled the background dispatch queue with a sleep(_:) call. Quite correctly, my shared-resource semaphore (not shown; signaled in a completion block of the command buffer and waited on in the main thread) reached a value of -1 after committing three command buffers to the command queue (three being the number of recycled shared MTLBuffers holding scene uniform data, etc.). This suggests that the first command buffer has not finished executing by the time the CPU is more than three frames ahead, which is consistent with the sleep(_:) behavior. Again, what isn't consistent is the ordering: Metal seems to call the block before even scheduling the buffer.

Further, in subsequent frames, Metal doesn't seem to care that the sharedEventListener block is taking so long: it schedules the command buffer for execution even while the block is still running, and the block finishes dozens of frames later. This behavior is completely inconsistent with what I expect. What is going on here?

P.S. There is probably a better way to periodically update a managed buffer whose contents are mostly modified on the GPU, but I have not yet found one, so any advice on this subject is appreciated as well. Of course, a triple-buffer system *could* work, but it would waste a lot of memory, as the managed buffer is quite large (whereas the shared buffers managed by the semaphore are quite small).
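One detail worth checking: notify(_:atValue:block:) invokes its block as soon as the event's signaledValue equals or exceeds the given value, independent of any command buffer's state. Since the block above leaves signaledValue at 2, every later frame registers a listener for value 1 on an event whose value is already past 1, so the block fires immediately — which would explain both the "works only for the first command buffer" behavior and the early firing. A minimal sketch of a workaround using strictly increasing per-frame values (the `frameEventValue` counter is hypothetical, not from the original code):

```swift
// Sketch: give each frame its own pair of event values so a previously
// signaled value can never satisfy a new listener early.
var frameEventValue: UInt64 = 0

func computeSomething(commandBuffer: MTLCommandBuffer) {
    // ... encode the per-frame work and the synchronizing blit as before ...

    if doExtraWork {
        let writeValue  = frameEventValue + 1  // GPU has reached the wait point
        let resumeValue = frameEventValue + 2  // CPU has finished writing
        frameEventValue = resumeValue

        sharedEvent.notify(sharedEventListener, atValue: writeValue) { event, _ in
            // Write into managedBuffer and call didModifyRange(_:) here
            event.signaledValue = resumeValue
        }
        commandBuffer.encodeSignalEvent(sharedEvent, value: writeValue)
        commandBuffer.encodeWaitForEvent(sharedEvent, value: resumeValue)
    }
    commandBuffer.commit()
}
```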
Replies: 2 · Boosts: 0 · Views: 1.1k · Activity: Feb ’21
GPU Hardware and Metal concerning Tile Memory
In the WWDC talks on Metal that I have watched so far, many of the videos discuss Apple's A-series chips (A11, A12, etc.) and the power they give to the developer, such as letting developers leverage tile memory by opting into TBDR. On macOS (at least on Intel Macs without the M1 chip), TBDR is unavailable, and other features that leverage tile memory, like imageblocks, are also unavailable. That made me wonder about the structure of the GPUs on macOS and of external GPUs like the Blackmagic eGPU (which is currently hooked up to my computer). Are the concepts of tile memory ubiquitous across GPU architectures? For example, if in a Metal kernel function we declare

```metal
threadgroup float tgfloats[16];
```

is this value stored in tile memory (threadgroup memory) on the Blackmagic? Or is there an equivalent storage that is hardware-dependent but available on all hardware in some form? I know there are some WWDC sessions that deal with multiple GPUs, which will probably be helpful, but extra information is always useful. Any links to information about GPU hardware architectures would be appreciated as well.
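On the host side, Metal deliberately exposes feature queries rather than hardware details, so one way to reason about this at runtime is to check GPU family support. A minimal sketch, assuming macOS 10.15+ (where MTLDevice's supportsFamily(_:) is available); the printed strings are my own summary, not API output:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!

// Tile shaders and explicit imageblocks are tied to Apple-family (TBDR) GPUs.
// Threadgroup memory itself is part of the Metal programming model on every
// family; where it physically lives (on-chip tile memory vs. cache/RAM) is
// up to the hardware.
if device.supportsFamily(.apple4) {
    print("TBDR features (tile shaders, imageblocks) are available")
} else if device.supportsFamily(.mac2) {
    print("Immediate-mode GPU: threadgroup memory works, tile features do not")
}
```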
Replies: 2 · Boosts: 0 · Views: 2.0k · Activity: Nov ’20
Metal Debugger Issues
I have been unable to use the Metal debugger ever since Apple released Xcode 12 as an update on the App Store, and it is very frustrating. Xcode 12.0.1 simply crashed on frame capture or after trying to debug a fragment/vertex shader. Now, Xcode 12.2 issues the following message: "Shader Debugger is not supported in this system configuration. Please install an Xcode with an SDK that is aligned to your target device OS version." I have macOS 10.15.7 and have not upgraded to Big Sur yet. I downloaded Xcode 11.7 from the developer website, but again, Xcode simply crashes. I will try other, older Xcode versions, but this should not be something developers have to face, especially those working with Metal, as it is nearly impossible to debug shaders without the shader debugger. Has anybody else had this issue? If so, what did you do to resolve it?
Replies: 6 · Boosts: 0 · Views: 1.9k · Activity: Nov ’20
What is the purpose of threadgroup memory in Metal?
I have been working with Metal for a little while now and I have encountered the threadgroup address space. After reading a little about it in Apple’s MSL reference, I am aware of how threadgroups are formed and how they can be split into SIMD groups; however, I have not yet seen threadgroup memory in action. Can someone give me some examples of when/how threadgroup memory is used? Specifically, how is the [[threadgroup(n)]] attribute used in both kernel and fragment shaders? References to WWDC videos, articles, and/or other resources would be appreciated.
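To make the question concrete, here is a sketch of the classic use case: a parallel reduction in a compute kernel, where threadgroup memory lets threads in the same threadgroup share partial results (the kernel and buffer names are made up for illustration; assumes a power-of-two threadgroup size):

```metal
#include <metal_stdlib>
using namespace metal;

// Each threadgroup sums its slice of `input` and writes one partial sum.
kernel void partialSums(device const float *input   [[buffer(0)]],
                        device float       *partial [[buffer(1)]],
                        // Threadgroup memory provided by the host via the
                        // [[threadgroup(n)]] attribute
                        threadgroup float  *shared  [[threadgroup(0)]],
                        uint tid    [[thread_position_in_threadgroup]],
                        uint gid    [[thread_position_in_grid]],
                        uint tgid   [[threadgroup_position_in_grid]],
                        uint tgSize [[threads_per_threadgroup]])
{
    shared[tid] = input[gid];
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Tree reduction within the threadgroup
    for (uint stride = tgSize / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            shared[tid] += shared[tid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    if (tid == 0) {
        partial[tgid] = shared[0];
    }
}
```

On the host side, the size of a [[threadgroup(n)]] argument is set with setThreadgroupMemoryLength(_:index:) on the compute command encoder. In fragment functions, [[threadgroup(n)]] arguments are only supported on TBDR (Apple-family) GPUs, where they map to tile memory.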
Replies: 2 · Boosts: 0 · Views: 1.9k · Activity: Sep ’20