How Can I Access The Secondary MV-HEVC Frame

I’m working with the spatial video APIs in AVFoundation. I can create an AVAssetReader for an AVAssetTrack that reports the .containsStereoMultiviewVideo media characteristic (on a spatial video recorded by an iPhone 15 Pro), but the documentation doesn’t make it clear how to obtain the secondary video frame from that track.
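For context, the setup described above looks roughly like this. This is a sketch, not the asker's exact code; `spatialVideoURL` is a placeholder for the URL of a spatial video file:

```swift
import AVFoundation

// `spatialVideoURL` is a placeholder for the URL of a spatial video
// recorded on an iPhone 15 Pro.
let asset = AVURLAsset(url: spatialVideoURL)

// Find the track that carries the stereo multiview (MV-HEVC) video.
let videoTracks = try await asset.loadTracks(
    withMediaCharacteristic: .containsStereoMultiviewVideo)
guard let videoTrack = videoTracks.first else {
    fatalError("No stereo multiview video track found")
}

// The reader that the question is about.
let reader = try AVAssetReader(asset: asset)
```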

Does anyone know where to look? I've scoured the forums, documentation, and other resources, and I've had no luck.

Thanks!

Accepted Reply

Hello,

When you read that video track with an AVAssetReaderTrackOutput, the CMSampleBuffers that are output should have an array of taggedBuffers, giving you access to the left and right views.

Replies

I'm trying to do the same.

As far as I understand, we have to use the Video Toolbox APIs (VTDecompressionSessionSetMultiImageCallback) to get the second frame. So far, I haven't figured out what pixel format to use when creating an AVAssetReaderTrackOutput instance, or what the matching CMFormatDescription for the VTDecompressionSession should be. I've only done simple AVFoundation transcoding in the past, so I'm not even sure I'm on the right track :)

Cheers!


Oh, a very important addendum to my answer:

By default, AVAssetReaderTrackOutput will not give you the array of tagged buffers (it assumes you just want the primary view). Here is how you can ask that it give you both layers (as an array of tagged buffers):

// The outputSettings dictionary for the AVAssetReaderTrackOutput.
var outputSettings: [String: Any] = [:]

// The decompressionProperties dictionary for the outputSettings.
var decompressionProperties: [String: Any] = [:]

// Specify that you want to read both layers.
decompressionProperties[kVTDecompressionPropertyKey_RequestedMVHEVCVideoLayerIDs as String] = [0, 1]

// Set the decompressionProperties.
outputSettings[AVVideoDecompressionPropertiesKey] = decompressionProperties

// Create your output with the outputSettings and the video track (you can inspect
// the format description of the video track to make sure it contains multiple layers).
let output = AVAssetReaderTrackOutput(track: videoTracks.first!, outputSettings: outputSettings)

Now, when you call copyNextSampleBuffer(), that sample buffer should have a non-nil array of taggedBuffers.
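To make that concrete, here is an untested sketch of draining the reader and splitting each sample into its per-layer pixel buffers. It assumes `reader` and `output` were configured as in the snippet above (layers 0 and 1 requested), and that each tagged buffer wraps a pixel buffer; the exact CMTaggedBuffer destructuring may differ slightly depending on your SDK version:

```swift
import AVFoundation
import CoreMedia

reader.add(output)
reader.startReading()

while let sampleBuffer = output.copyNextSampleBuffer() {
    // With the decompression properties set, each sample carries
    // an array of tagged buffers, one per requested layer.
    guard let taggedBuffers = sampleBuffer.taggedBuffers else { continue }

    for taggedBuffer in taggedBuffers {
        // Each tagged buffer carries CMTags identifying its layer.
        let layerID = taggedBuffer.tags.compactMap {
            $0.value(onlyIfMatching: .videoLayerID)
        }.first

        if case .pixelBuffer(let pixelBuffer) = taggedBuffer.buffer {
            // Layer 0 is typically the primary view and layer 1 the
            // secondary view (but see the addendum below about
            // querying layer IDs rather than assuming them).
            print("Layer \(layerID ?? -1): \(pixelBuffer)")
        }
    }
}
```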

Addendum 2:

The snippet above assumes that the layer IDs will be 0 and 1. That is not always the case, so to be safe, you should query the layer IDs dynamically:

// Loads the MV-HEVC video layer IDs from the track's format description.
// Returns nil if the track has no format description to inspect.
private func loadVideoLayerIdsForTrack(_ videoTrack: AVAssetTrack) async throws -> [Int64]? {
    let formatDescriptions = try await videoTrack.load(.formatDescriptions)
    guard let tagCollections = formatDescriptions.first?.tagCollections else {
        return nil
    }
    // Flatten the tag collections and keep only the video-layer-ID tags.
    return tagCollections.flatMap { $0 }.compactMap { tag in
        tag.value(onlyIfMatching: .videoLayerID)
    }
}
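Putting the two snippets together, the queried IDs can be fed into the output settings instead of the hardcoded [0, 1]. This is a sketch; the fallback to [0, 1] when the query returns nil is an assumption on my part, not guaranteed behavior:

```swift
// Query the layer IDs from the track, then request exactly those layers.
// Falling back to [0, 1] when the query returns nil is an assumption.
let layerIDs = try await loadVideoLayerIdsForTrack(videoTrack) ?? [0, 1]

var outputSettings: [String: Any] = [:]
outputSettings[AVVideoDecompressionPropertiesKey] = [
    kVTDecompressionPropertyKey_RequestedMVHEVCVideoLayerIDs as String: layerIDs
]

let output = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: outputSettings)
```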