ReplayKit2 Video CMSampleBuffer inconsistencies

Hello - I'm having consistency problems with video CMSampleBuffers captured with ReplayKit2 on iOS 12-13 when used from a Broadcast Extension.


Some background information:


• broadcast extensions have an extremely tight memory limit: 50MB, even on iPad where screen capture frames are very large

• video sample buffers arrive at 30fps in 420YpCbCr8BiPlanarFullRange (YUV is an odd choice for a BGRA buffer, but I guess it's for that sweet 4/1.5 memory saving factor)

• I talk about copying/accessing the video sample buffers for simplicity - the original intention was to rotate portrait to landscape using vImage for reasons outside this question.
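
To make the 4/1.5 factor concrete, here is a rough sketch of the per-frame arithmetic. The frame size is a hypothetical iPad-sized capture chosen only for illustration, the function names are made up for this example, and row padding is ignored, so real buffers may be somewhat larger.

```swift
// Bytes needed for one frame in 32-bit BGRA versus 8-bit 4:2:0 bi-planar.
// Row padding is ignored, so real buffers may be slightly larger.
func bgraBytes(width: Int, height: Int) -> Int {
    return width * height * 4                                  // 4 bytes per pixel
}

func biPlanar420Bytes(width: Int, height: Int) -> Int {
    let luma = width * height                                  // 1 byte of Y per pixel
    let chroma = ((width + 1) / 2) * ((height + 1) / 2) * 2    // 1 CbCr pair per 2x2 block
    return luma + chroma
}

// Hypothetical frame size (an assumption, not taken from ReplayKit).
let frameWidth = 2048
let frameHeight = 1536
let bgraSize = bgraBytes(width: frameWidth, height: frameHeight)
let yuvSize = biPlanar420Bytes(width: frameWidth, height: frameHeight)
print("BGRA: \(bgraSize) bytes, 4:2:0: \(yuvSize) bytes, ratio: \(Double(bgraSize) / Double(yuvSize))")
```

Even in 4:2:0, a handful of frames of this size in flight is a noticeable share of a 50MB budget.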



I would describe the problem like so: if I try to access the video buffer memory using the CPU, I see inconsistent frames - it looks like tiles of the next frame are appearing in the current frame that I'm trying to copy. Because the data is planar, the effect can be quite pretty, with parts of the next frame's colour appearing on the current frame's hue.



Where it gets confusing is that the following uses all result in consistent output:

1. passing the sample buffers directly to an AVAssetWriter

2. copying/rotating using the GPU with a few hundred lines of Metal

3. copying/rotating using a CoreImage 3-liner



I'm not sure what to make of this. Is there a secret sauce for getting a consistent picture of a CMSampleBuffer's CVPixelBuffer that these 3 methods know about and I don't?


Are these three methods somehow GPU-only and non-CPU? That's eyebrow-raising for AVAssetWriter, but I can also configure the CIContext to use a software renderer and the result is still consistent.


So I have 2 workarounds (or 3 if I want to get the app to transcode a rotation of the portrait video file), so I should be ecstatic, right? But I'm not, because the memory requirements of my current Metal and CoreImage solutions can easily spike and take me over the 50MB limit.



I would love to get the CPU/vImage approach working, here's what I've tried already:

• locking/unlocking the CVPixelBuffer as readOnly or with no flags (INCONSISTENT)

• locking/unlocking the CVPixelBuffer's IOSurface with all combinations of .readOnly and .avoidSync

• using the IOSurface seedIDs to discard frames that have been modified while locked (INCONSISTENT - not all modifications are reported this way)

• combinations of the above

• manually incrementing and decrementing the IOSurface's use count (lock/unlock incref/decref is what I hope CVPixelBuffer lock/unlock does anyway)

• copying via "outer" base address and rowbytes of CVPixelBuffer/IOSurface and inner-planar too (INCONSISTENT)

• tip-toeing around padding bytes (neither explicitly reading nor writing them)

• still looking at vImageBuffer_InitForCopyFromCVPixelBuffer, although it seems married to CoreGraphics, where I don't think 420 will work; I would still like to know whether this function is capable of getting a consistent snapshot of the buffer

• trying to set IOSurface purgeability (seems to be already set to non-purgeable)
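
For reference, the basic lock-and-copy pattern from the list above looks roughly like the sketch below. It is not a fix: it is the kind of CPU access that this post reports as INCONSISTENT inside the broadcast extension. The function name `copyPlanes` is made up for this example, and the bytes-per-pixel values assume the 420YpCbCr8BiPlanar layout (Y plane first, interleaved CbCr plane second).

```swift
import CoreVideo
import Foundation

/// Copies every plane of `pixelBuffer` into tightly packed byte arrays,
/// reading row by row so that per-row padding bytes are never touched.
func copyPlanes(of pixelBuffer: CVPixelBuffer) -> [[UInt8]]? {
    guard CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly) == kCVReturnSuccess else {
        return nil
    }
    // Unlock with the same flags that were used to lock.
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    var planes: [[UInt8]] = []
    for plane in 0..<CVPixelBufferGetPlaneCount(pixelBuffer) {
        guard let base = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, plane) else {
            return nil
        }
        let stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, plane)
        let rows = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
        // Assumption: plane 0 (Y) is 1 byte per pixel, plane 1 (CbCr) is 2 bytes per pixel.
        let bytesPerPixel = plane == 0 ? 1 : 2
        let rowBytes = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane) * bytesPerPixel

        var packed = [UInt8](repeating: 0, count: rowBytes * rows)
        packed.withUnsafeMutableBytes { dst in
            guard let dstBase = dst.baseAddress else { return }
            for row in 0..<rows {
                // Source rows are `stride` bytes apart; destination rows are packed.
                memcpy(dstBase + row * rowBytes, base + row * stride, rowBytes)
            }
        }
        planes.append(packed)
    }
    return planes
}
```

Comparing two such copies of the same buffer taken while it is locked is one way to detect the tearing described above.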


I would otherwise like to reduce the memory requirements of my Metal & CoreImage attempts. Things I have tried:

• asking the CoreImage context to not cache anything

• using software/non-software CIContext

• disabling CoreImage colour conversion with render(image:to:bounds:colorSpace:nil) (I just want to copy/rotate dumb components)
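
Put together, the CoreImage settings in the list above look something like the following sketch. The names `sharedContext`, `rotatedImage(from:)` and `renderRotated` are made up for this example, and the destination buffer is assumed to be allocated elsewhere with swapped width and height. Whether these options actually lower peak memory inside the extension is exactly the open question here.

```swift
import CoreImage
import CoreVideo

// One context reused for every frame, asked not to cache intermediate results.
let sharedContext = CIContext(options: [
    .cacheIntermediates: false,
    .useSoftwareRenderer: false   // set to true to try the software renderer
])

/// Wraps the pixel buffer in a CIImage rotated by 90 degrees,
/// with its extent moved back to the origin.
func rotatedImage(from source: CVPixelBuffer) -> CIImage {
    let rotated = CIImage(cvPixelBuffer: source).oriented(.right)
    // Translate in case the orientation transform shifted the extent.
    return rotated.transformed(by: CGAffineTransform(translationX: -rotated.extent.origin.x,
                                                     y: -rotated.extent.origin.y))
}

/// Renders the rotated frame into `destination`, which must already have
/// the swapped dimensions. Passing nil for the colour space is the setting
/// the post uses to skip colour conversion.
func renderRotated(_ source: CVPixelBuffer,
                   into destination: CVPixelBuffer,
                   using context: CIContext) {
    let image = rotatedImage(from: source)
    context.render(image, to: destination, bounds: image.extent, colorSpace: nil)
}
```

Calling `renderRotated(frame, into: output, using: sharedContext)` once per frame keeps context creation out of the hot path.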


I haven't profiled the Metal version yet. Is there something smaller than half4?


If anyone has any suggestions I'm all ears.


p.s. some of these OOMs happen as the IOSurfaces get mapped into my address space. Having these shared memory buffers billed to my process seems kinda unfair.

Replies

I am also having memory limit issues right now. If possible, I would like to refer to your methodology so we can find the optimal solution together.
I am having the same consistency issue, did you ever solve it without the workarounds?
vImage is a good solution for this problem: low memory cost, but high CPU cost.
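
A minimal sketch of what the vImage route could look like for a 90 degree rotation, assuming a 420YpCbCr8BiPlanar source and a destination buffer of the same format with swapped width and height. The function name `rotate90` is made up for this example. It does nothing about the frame tearing described in the question; it only shows the rotation step.

```swift
import Accelerate
import CoreVideo

/// Rotates both planes of a 4:2:0 bi-planar buffer by 90 degrees into `destination`,
/// which must have the same pixel format and swapped width and height.
func rotate90(_ source: CVPixelBuffer, into destination: CVPixelBuffer) -> Bool {
    guard CVPixelBufferLockBaseAddress(source, .readOnly) == kCVReturnSuccess else { return false }
    defer { CVPixelBufferUnlockBaseAddress(source, .readOnly) }
    guard CVPixelBufferLockBaseAddress(destination, []) == kCVReturnSuccess else { return false }
    defer { CVPixelBufferUnlockBaseAddress(destination, []) }

    // Describes one plane of a locked pixel buffer as a vImage_Buffer (no copy).
    func planeBuffer(_ buffer: CVPixelBuffer, _ plane: Int) -> vImage_Buffer? {
        guard let base = CVPixelBufferGetBaseAddressOfPlane(buffer, plane) else { return nil }
        return vImage_Buffer(data: base,
                             height: vImagePixelCount(CVPixelBufferGetHeightOfPlane(buffer, plane)),
                             width: vImagePixelCount(CVPixelBufferGetWidthOfPlane(buffer, plane)),
                             rowBytes: CVPixelBufferGetBytesPerRowOfPlane(buffer, plane))
    }

    guard var srcY = planeBuffer(source, 0), var dstY = planeBuffer(destination, 0),
          var srcC = planeBuffer(source, 1), var dstC = planeBuffer(destination, 1) else {
        return false
    }

    // Swap for kRotate90DegreesCounterClockwise if the result is turned the wrong way.
    let rotation = UInt8(kRotate90DegreesClockwise)
    let flags = vImage_Flags(kvImageNoFlags)

    // Luma: one byte per pixel.
    guard vImageRotate90_Planar8(&srcY, &dstY, rotation, 0, flags) == kvImageNoError else {
        return false
    }
    // Chroma: each interleaved CbCr pair is moved as one opaque 16-bit pixel.
    return vImageRotate90_Planar16U(&srcC, &dstC, rotation, 0, flags) == kvImageNoError
}
```

Because the rotation writes straight into the destination planes, no intermediate frame copy is needed beyond the destination buffer itself.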
Metal (compute or render) will be a better choice. Create the MTLTexture using the CVMetalTextureCacheCreateTextureFromImage API to avoid a GPU -> CPU copy.
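
A rough sketch of that suggestion, assuming a bi-planar 4:2:0 pixel buffer that is IOSurface-backed and Metal-compatible. The helper names `makeTextureCache` and `makePlaneTextures` are made up for this example, and the single-channel and two-channel pixel formats are the usual choice for the Y and CbCr planes rather than something stated in this thread.

```swift
import CoreVideo
import Metal

/// Creates a texture cache for the given device. Create it once and reuse it.
func makeTextureCache(device: MTLDevice) -> CVMetalTextureCache? {
    var cache: CVMetalTextureCache?
    let status = CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache)
    return status == kCVReturnSuccess ? cache : nil
}

/// Wraps each plane of a bi-planar pixel buffer in an MTLTexture without copying.
/// Keep the returned CVMetalTexture objects alive until the GPU work has finished.
func makePlaneTextures(from pixelBuffer: CVPixelBuffer,
                       cache: CVMetalTextureCache) -> [(CVMetalTexture, MTLTexture)]? {
    // Assumption: plane 0 (Y) is one 8-bit channel, plane 1 (CbCr) is two 8-bit channels.
    let formats: [MTLPixelFormat] = [.r8Unorm, .rg8Unorm]
    var result: [(CVMetalTexture, MTLTexture)] = []
    for (plane, format) in formats.enumerated() {
        var cvTexture: CVMetalTexture?
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, nil, format,
            CVPixelBufferGetWidthOfPlane(pixelBuffer, plane),
            CVPixelBufferGetHeightOfPlane(pixelBuffer, plane),
            plane, &cvTexture)
        guard status == kCVReturnSuccess,
              let wrapped = cvTexture,
              let texture = CVMetalTextureGetTexture(wrapped) else {
            return nil
        }
        result.append((wrapped, texture))
    }
    return result
}
```

The resulting textures can be bound to a compute or render pass that writes the rotated planes into textures created the same way from the destination buffer.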