AVVideoComposition fails while trying to read video frame

I have a source video and I want to generate a new video from it by taking a region of each frame of the source video. For example, given a source video with resolution `A` x `B`, a content rect of size `X` x `Y`, and an output resolution of `C` x `D`, I want to create a video of resolution `C` x `D` whose content is the first `X` x `Y` pixels of each frame of the original video.


To achieve this I'm using an `AVAssetReader` to read the source video and an `AVAssetWriter` to write the new one. To extract just the `X` x `Y` region of each source frame, I'm using an `AVAssetReaderVideoCompositionOutput` as the asset reader's output. The setup code is something like:


```swift
let output = AVAssetReaderVideoCompositionOutput(...)
output.videoComposition = AVMutableVideoComposition(
    asset: asset,
    videoTrack: videoTrack,
    contentRect: contentRect,
    renderSize: renderSize
)
```
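
For context, here is roughly how the reader around that snippet is set up. The `makeReader` name, the BGRA pixel format, and the error handling are illustrative choices for this sketch, not verbatim from my project:

```swift
import AVFoundation

// Rough sketch of the reader setup; `makeReader`, the BGRA pixel format,
// and the `precondition` are illustrative, not verbatim from my project.
func makeReader(asset: AVAsset,
                videoTrack: AVAssetTrack,
                contentRect: CGRect,
                renderSize: CGSize) throws -> (AVAssetReader, AVAssetReaderVideoCompositionOutput) {
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderVideoCompositionOutput(
        videoTracks: [videoTrack],
        videoSettings: [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
    )
    output.videoComposition = AVMutableVideoComposition(
        asset: asset,
        videoTrack: videoTrack,
        contentRect: contentRect,
        renderSize: renderSize
    )
    precondition(reader.canAdd(output), "Cannot add output to reader")
    reader.add(output)
    return (reader, output)
}
```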



Then the logic for cropping the video content happens in that custom initialiser:


```swift
import AVFoundation

extension AVMutableVideoComposition {
    convenience init(asset: AVAsset, videoTrack: AVAssetTrack, contentRect: CGRect, renderSize: CGSize) {
        // Compute the transform for rendering the video content at `contentRect`
        // with a size equal to `renderSize`.
        let trackFrame = CGRect(origin: .zero, size: videoTrack.naturalSize)
        let transformedFrame = trackFrame.applying(videoTrack.preferredTransform)
        let moveToOriginTransform = CGAffineTransform(translationX: -transformedFrame.minX, y: -transformedFrame.minY)
        let moveToContentRectTransform = CGAffineTransform(translationX: -contentRect.minX, y: -contentRect.minY)
        let scaleTransform = CGAffineTransform(scaleX: renderSize.width / contentRect.width, y: renderSize.height / contentRect.height)
        let transform = videoTrack.preferredTransform
            .concatenating(moveToOriginTransform)
            .concatenating(moveToContentRectTransform)
            .concatenating(scaleTransform)

        let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)
        layerInstruction.setTransform(transform, at: .zero)

        let instruction = AVMutableVideoCompositionInstruction()
        instruction.timeRange = CMTimeRange(start: .zero, duration: asset.duration)
        instruction.layerInstructions = [layerInstruction]

        self.init(propertiesOf: asset)
        instructions = [instruction]
        self.renderSize = renderSize
    }
}
```
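
As a sanity check on the transform math (my own addition, not part of the sample project), the crop-and-scale portion of the transform should map `contentRect` exactly onto the render rect, and as far as I can tell it does for both the working and the failing widths:

```swift
import CoreGraphics

// Hypothetical helper, assuming an identity preferredTransform:
// the crop-and-scale transform should map `contentRect` onto
// CGRect(origin: .zero, size: renderSize), modulo floating-point noise.
func cropTransformIsConsistent(contentRect: CGRect, renderSize: CGSize) -> Bool {
    let moveToContentRect = CGAffineTransform(translationX: -contentRect.minX, y: -contentRect.minY)
    let scale = CGAffineTransform(scaleX: renderSize.width / contentRect.width,
                                  y: renderSize.height / contentRect.height)
    let mapped = contentRect.applying(moveToContentRect.concatenating(scale))
    let target = CGRect(origin: .zero, size: renderSize)
    let epsilon: CGFloat = 1e-6
    return abs(mapped.minX - target.minX) <= epsilon
        && abs(mapped.minY - target.minY) <= epsilon
        && abs(mapped.maxX - target.maxX) <= epsilon
        && abs(mapped.maxY - target.maxY) <= epsilon
}
```

Since this check passes either way, the transform itself doesn't seem to explain why some widths fail.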

This code works fine in some cases, for example, for a content rect `(origin = (x = 0, y = 0), size = (width = 1416, height = 1920))`. However, if I change the width to 1417 then it doesn't work and I get the error message:


> Error Domain=AVFoundationErrorDomain Code=-11858 "Source frame unsupported format" UserInfo={NSUnderlyingError=0x283e689c0 {Error Domain=NSOSStatusErrorDomain Code=-12502 "(null)"}, NSLocalizedFailureReason=The video could not be composited., NSDebugDescription=Source frame unsupported format, NSLocalizedDescription=Operation Stopped}


Here is a link to a sample project with the test video for which I get the error. The cases where this fails look random to me: it works for the widths 1416, 1421, 1422, 1423, and 1429, and fails for all the other width values between 1416 and 1429.
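
To map out which widths fail I've been running a small diagnostic loop. This is again my own sketch, using the hypothetical `makeReader` helper from above and assuming `asset`, `videoTrack`, and `renderSize` are in scope:

```swift
// Diagnostic sketch: try every width in the suspicious range and report
// whether the reader finishes or fails.
for width in 1416...1429 {
    let contentRect = CGRect(x: 0, y: 0, width: CGFloat(width), height: 1920)
    do {
        let (reader, output) = try makeReader(asset: asset,
                                              videoTrack: videoTrack,
                                              contentRect: contentRect,
                                              renderSize: renderSize)
        reader.startReading()
        // Drain the output; copyNextSampleBuffer() returns nil once the
        // reader completes or fails.
        while output.copyNextSampleBuffer() != nil {}
        let result = reader.status == .completed
            ? "ok"
            : "failed: \(reader.error?.localizedDescription ?? "unknown error")"
        print("width \(width): \(result)")
    } catch {
        print("width \(width): could not create reader: \(error)")
    }
}
```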

What's the problem here and how can I fix the issue?


Why am I using this approach?

The reason I'm using an `AVAssetReaderVideoCompositionOutput` instead of an `AVAssetReaderTrackOutput` (and then cropping manually) is that the former reduces the app's memory footprint, since in my use case the output render size is much smaller than the video's original size. This matters when I'm processing several videos at the same time.