tl;dr how can I get raw YUV in a Metal fragment shader from a VideoToolbox 10-bit/BT.2020 HEVC stream without any extra/secret format conversions?
With VideoToolbox and 10-bit HEVC, I've found that it defaults to CVPixelBuffers w/ formats kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarFullRange or kCVPixelFormatType_Lossy_420YpCbCr10PackedBiPlanarFullRange. To mitigate this, I have the following snippet of code to my application:
// We need our pixels unpacked for 10-bit so that the Metal textures actually work
var pixelFormat:OSType? = nil
let bpc = getBpcForVideoFormat(videoFormat!)
let isFullRange = getIsFullRangeForVideoFormat(videoFormat!)
// TODO: figure out how to check for 422/444, CVImageBufferChromaLocationBottomField?
if bpc == 10 {
pixelFormat = isFullRange ? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
}
let videoDecoderSpecification:[NSString: AnyObject] = [kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder:kCFBooleanTrue]
var destinationImageBufferAttributes:[NSString: AnyObject] = [kCVPixelBufferMetalCompatibilityKey: true as NSNumber, kCVPixelBufferPoolMinimumBufferCountKey: 3 as NSNumber]
if pixelFormat != nil {
destinationImageBufferAttributes[kCVPixelBufferPixelFormatTypeKey] = pixelFormat! as NSNumber
}
var decompressionSession:VTDecompressionSession? = nil
err = VTDecompressionSessionCreate(allocator: nil, formatDescription: videoFormat!, decoderSpecification: videoDecoderSpecification as CFDictionary, imageBufferAttributes: destinationImageBufferAttributes as CFDictionary, outputCallback: nil, decompressionSessionOut: &decompressionSession)
In short, I need kCVPixelFormatType_420YpCbCr10BiPlanar so that I have a straightforward MTLPixelFormat.r16Unorm/MTLPixelFormat.rg16Unorm texture binding for Y/CbCr. Metal, seemingly, has no direct pixel format for 420YpCbCr10PackedBiPlanar. I'd also rather not use any color conversion in VideoToolbox, in order to save on processing (and to ensure that the color transforms/transfer characteristics match between streamer/client, since I also have a custom transfer characteristic to mitigate blocking in dark scenes).
However, I noticed that in visionOS 2, the CVPixelBuffer I receive is no longer a compressed render target (likely a bug), which caused GPU texture read bandwidth to skyrocket from 2GiB/s to 30GiB/s. More importantly, this implies that VideoToolbox may in fact be doing an extra color conversion step, wasting memory bandwidth.
Does Metal actually have no way to handle 420YpCbCr10PackedBiPlanar? Are there any examples for reading 10-bit HDR HEVC buffers directly with Metal?