Reply to Slow performance on iPhone 12/Pro/Max when copying pixel data from metal texture
Hi, I have run into the same problem. I take the MTKView's currentDrawable.texture in commandBuffer.addCompletedHandler and then, as mentioned above, call getBytes and append the resulting CVPixelBuffer through AVAssetWriterInputPixelBufferAdaptor on another thread. The same code gives different results on different devices:
iPhone 12 Pro Max: 642 * 1388 is fine at 60 fps; 887 * 1920 is laggy at 40 fps.
iPhone Xs Max: 1242 * 2688 is fine at 60 fps.
iPhone 7: 1080 * 1920 is fine at 60 fps.
Time Profiler shows that getBytes is the heaviest stack trace.
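For readers who want to reproduce the measurement, the readback path described above might look roughly like the sketch below. The helper name copyWithGetBytes is hypothetical, and the sketch assumes a 4-byte-per-pixel BGRA texture and a kCVPixelFormatType_32BGRA pixel buffer of at least the same size. It is not the poster's actual code.

```swift
import Metal
import CoreVideo

// Hypothetical helper: copies a texture into an existing CVPixelBuffer via getBytes.
// Assumes a BGRA8 texture and a 32BGRA pixel buffer at least as large as the texture.
func copyWithGetBytes(from texture: MTLTexture, into pixelBuffer: CVPixelBuffer) -> Bool {
    guard CVPixelBufferGetWidth(pixelBuffer) >= texture.width,
          CVPixelBufferGetHeight(pixelBuffer) >= texture.height else { return false }

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return false }
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let region = MTLRegionMake2D(0, 0, texture.width, texture.height)

    // CPU-side copy out of the texture; this is the call reported as slow in the post.
    texture.getBytes(base, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
    return true
}
```

Reading the drawable's texture this way also requires the view's framebufferOnly property to be false.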
Apr ’21
Reply to Slow performance on iPhone 12/Pro/Max when copying pixel data from metal texture
I submitted feedback and got this reply:

This likely has to do with the internal representation of the texture data, which on certain newer Apple Silicon GPUs can be compressed to save bandwidth and power. However, when the CPU needs to make a copy into user memory (i.e. via getBytes), it has to perform decompression, which is likely the perf issue you found. There are several ways to deal with this. The best one depends on how the texture is being used by your application, which we don’t know, so we’ll just list a few options:

1. Instead of using getBytes into user memory, allocate an MTLBuffer of the same size and issue a GPU blit from the texture into the buffer right after the texture contents you want have been computed on the GPU. Then, instead of calling getBytes, just read through the .contents pointer of the buffer. Additional tip for this case: create and reuse buffers from a pool of MTLBuffer objects to avoid repeated resource creation and destruction.

2. Keep using getBytes as you already do, but make the GPU change the representation of the texture to be friendly to the CPU after the texture contents have been computed on the GPU. See https://developer.apple.com/documentation/metal/mtlblitcommandencoder/2966538-optimizecontentsforcpuaccess. This burns some GPU cycles, but is probably the least intrusive change. To avoid burning the GPU cycles, see the next option.

3. Adjust the texture creation (this assumes you are creating the MTLTexture instance in your code; if it is created elsewhere, outside of your control, this option may not be possible). On the MTLTextureDescriptor, set this property to NO: https://developer.apple.com/documentation/metal/mtltexturedescriptor/2966641-allowgpuoptimizedcontents. This makes the GPU never use a compressed internal representation for this texture. You lose the GPU bandwidth/power benefits, but if your use case involves frequent CPU access, it can be a good tradeoff.

Since all of these options are essentially performance tradeoffs, you should review the app performance before and after the change to verify that you see the expected upside and no (or acceptable) downsides elsewhere.

(end of reply)

So I built a demo project to test the solutions. You can check it here: Github
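As a rough illustration of the three options in the reply, here is a minimal sketch. The function names are hypothetical, the code assumes a 4-byte-per-pixel format such as .bgra8Unorm, and it is not taken from the demo project.

```swift
import Metal

// Option 1 (sketch): blit the texture into a CPU-visible MTLBuffer on the GPU timeline.
// Assumes 4 bytes per pixel; `buffer` would normally come from a reusable pool.
func encodeReadback(of texture: MTLTexture,
                    into buffer: MTLBuffer,
                    on commandBuffer: MTLCommandBuffer) -> Bool {
    let bytesPerRow = texture.width * 4
    let bytesPerImage = bytesPerRow * texture.height
    guard buffer.length >= bytesPerImage,
          let blit = commandBuffer.makeBlitCommandEncoder() else { return false }
    blit.copy(from: texture,
              sourceSlice: 0,
              sourceLevel: 0,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
              sourceSize: MTLSize(width: texture.width, height: texture.height, depth: 1),
              to: buffer,
              destinationOffset: 0,
              destinationBytesPerRow: bytesPerRow,
              destinationBytesPerImage: bytesPerImage)
    blit.endEncoding()
    return true
}

// Option 2 (sketch): keep getBytes, but ask the GPU to make the texture
// CPU-friendly once rendering into it has finished.
func encodeOptimizeForCPU(_ texture: MTLTexture, on commandBuffer: MTLCommandBuffer) {
    guard let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.optimizeContentsForCPUAccess(texture: texture)
    blit.endEncoding()
}

// Option 3 (sketch): for textures created in your own code, opt out of the
// GPU-optimized internal representation at creation time.
func makeCPUFriendlyDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                              width: width,
                                                              height: height,
                                                              mipmapped: false)
    descriptor.usage = [.renderTarget, .shaderRead]
    descriptor.allowGPUOptimizedContents = false
    return descriptor
}
```

For option 1, the buffer could be created with device.makeBuffer(length:options:) using .storageModeShared and read through buffer.contents() once the command buffer has completed, for example inside addCompletedHandler. Option 3 presumably applies only to an offscreen texture you create yourself, since an MTKView drawable's texture is created by the view.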
Sep ’21