Hi I'm working on a project that require video frame PTS to be consistent between original video and a transcoded one. It's working fairly well on regular mp4, however if I set preferredOutputSegmentInterval to have generate a fMP4 output, even I specified the initialSegmentStartTime as 0, it always add one frame pts offset to all frames.
For example: if I use the code sample provided by Apple: https://developer.apple.com/videos/play/wwdc2020/10011/?time=406, useffprobe -select_streams v:0 -show_entries packet=pts_time -of csv ~/Downloads/fmp4/prog_index.m3u8 to display the pts of the output, it doesn't start from 0, but has some one frame pts offset. I also tried open with MP4Box, it also shows the first frames dts and cts are not start from 0.
However, if I use AVAssetReader to read the same output video, and get the PTS from 1st frame, it's returning 0. So I can't use it to calculate the pts difference between 2 videos neither.
Can I get some help to understand why there is difference between AVAssetWriter/Reader fMP4's pts and others like ffprobe?
VideoToolbox
RSS for tagWork directly with hardware-accelerated video encoding and decoding capabilities using VideoToolbox.
Posts under VideoToolbox tag
28 Posts
Sort by:
Post
Replies
Boosts
Views
Activity
Hi, I'm trying to decode a HLS livestream with VideoToolbox. The CMSampleBuffer is successfully created (OSStatus == noErr). When I enqueue the CMSampleBuffer to a AVSampleBufferDisplayLayer the view isn't displaying anything and the status of the AVSampleBufferDisplayLayer is 1 (rendering).
When I use a VTDecompressionSession to convert the CMSampleBuffer to a CVPixelBuffer the VTDecompressionOutputCallback returns a -8969 (bad data error).
What do I need to fix in my code? Do I incorrectly parse the data from the segment for the CMSampleBuffer?
let segmentData = try await downloadSegment(from: segment.url)
let (sps, pps, idr) = try parseH264FromTSSegment(tsData: segmentData)
if self.formatDescription == nil {
self.formatDescription = try CMFormatDescription(h264ParameterSets: [sps, pps])
}
if let sampleBuffer = try createSampleBuffer(from: idr, segment: segment) {
try self.decodeSampleBuffer(sampleBuffer)
}
func parseH264FromTSSegment(tsData: Data) throws -> (sps: Data, pps: Data, idr: Data) {
let tsSize = 188
var pesData = Data()
for i in stride(from: 0, to: tsData.count, by: tsSize) {
let tsPacket = tsData.subdata(in: i..<min(i + tsSize, tsData.count))
guard let payload = extractPayloadFromTSPacket(tsPacket) else { continue }
pesData.append(payload)
}
let nalUnits = parseNalUnits(from: pesData)
var sps: Data?
var pps: Data?
var idr: Data?
for nalUnit in nalUnits {
guard let firstByte = nalUnit.first else { continue }
let nalType = firstByte & 0x1F
switch nalType {
case 7: // SPS
sps = nalUnit
case 8: // PPS
pps = nalUnit
case 5: // IDR
idr = nalUnit
default:
break
}
if sps != nil, pps != nil, idr != nil {
break
}
}
guard let validSPS = sps, let validPPS = pps, let validIDR = idr else {
throw NSError()
}
return (validSPS, validPPS, validIDR)
}
func extractPayloadFromTSPacket(_ tsPacket: Data) -> Data? {
let syncByte: UInt8 = 0x47
guard tsPacket.count == 188, tsPacket[0] == syncByte else {
return nil
}
let payloadStart = (tsPacket[1] & 0x40) != 0
let adaptationFieldControl = (tsPacket[3] & 0x30) >> 4
var payloadOffset = 4
if adaptationFieldControl == 2 || adaptationFieldControl == 3 {
let adaptationFieldLength = Int(tsPacket[4])
payloadOffset += 1 + adaptationFieldLength
}
guard adaptationFieldControl == 1 || adaptationFieldControl == 3 else {
return nil
}
let payload = tsPacket.subdata(in: payloadOffset..<tsPacket.count)
return payloadStart ? payload : nil
}
func parseNalUnits(from h264Data: Data) -> [Data] {
let startCode = Data([0x00, 0x00, 0x00, 0x01])
var nalUnits: [Data] = []
var searchRange = h264Data.startIndex..<h264Data.endIndex
while let range = h264Data.range(of: startCode, options: [], in: searchRange) {
let nextStart = h264Data.range(of: startCode, options: [], in: range.upperBound..<h264Data.endIndex)?.lowerBound ?? h264Data.endIndex
let nalUnit = h264Data.subdata(in: range.upperBound..<nextStart)
nalUnits.append(nalUnit)
searchRange = nextStart..<h264Data.endIndex
}
return nalUnits
}
private func createSampleBuffer(from data: Data, segment: HLSSegment) throws -> CMSampleBuffer? {
var blockBuffer: CMBlockBuffer?
let alignedData = UnsafeMutableRawPointer.allocate(byteCount: data.count, alignment: MemoryLayout<UInt8>.alignment)
data.copyBytes(to: alignedData.assumingMemoryBound(to: UInt8.self), count: data.count)
let blockStatus = CMBlockBufferCreateWithMemoryBlock(
allocator: kCFAllocatorDefault,
memoryBlock: alignedData,
blockLength: data.count,
blockAllocator: nil,
customBlockSource: nil,
offsetToData: 0,
dataLength: data.count,
flags: 0,
blockBufferOut: &blockBuffer
)
guard blockStatus == kCMBlockBufferNoErr, let validBlockBuffer = blockBuffer else {
alignedData.deallocate()
throw NSError()
}
var sampleBuffer: CMSampleBuffer?
var timing = [calculateTiming(for: segment)]
var sampleSizes = [data.count]
let sampleStatus = CMSampleBufferCreate(
allocator: kCFAllocatorDefault,
dataBuffer: validBlockBuffer,
dataReady: true,
makeDataReadyCallback: nil,
refcon: nil,
formatDescription: formatDescription,
sampleCount: 1,
sampleTimingEntryCount: 1,
sampleTimingArray: &timing,
sampleSizeEntryCount: sampleSizes.count,
sampleSizeArray: &sampleSizes,
sampleBufferOut: &sampleBuffer
)
guard sampleStatus == noErr else {
alignedData.deallocate()
throw NSError()
}
return sampleBuffer
}
private func decodeSampleBuffer(_ sampleBuffer: CMSampleBuffer) throws {
guard let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer) else {
throw NSError()
}
if decompressionSession == nil {
try setupDecompressionSession(formatDescription: formatDescription)
}
guard let session = decompressionSession else {
throw NSError()
}
let flags: VTDecodeFrameFlags = [._EnableAsynchronousDecompression, ._EnableTemporalProcessing]
var flagOut = VTDecodeInfoFlags()
let status = VTDecompressionSessionDecodeFrame(
session,
sampleBuffer: sampleBuffer,
flags: flags,
frameRefcon: nil,
infoFlagsOut: nil)
if status != noErr {
throw NSError()
}
}
private func setupDecompressionSession(formatDescription: CMFormatDescription) throws {
self.formatDescription = formatDescription
if let session = decompressionSession {
VTDecompressionSessionInvalidate(session)
self.decompressionSession = nil
}
var decompressionSession: VTDecompressionSession?
var callback = VTDecompressionOutputCallbackRecord(
decompressionOutputCallback: decompressionOutputCallback,
decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
let status = VTDecompressionSessionCreate(
allocator: kCFAllocatorDefault,
formatDescription: formatDescription,
decoderSpecification: nil,
imageBufferAttributes: nil,
outputCallback: &callback,
decompressionSessionOut: &decompressionSession
)
if status != noErr {
throw NSError()
}
self.decompressionSession = decompressionSession
}
let decompressionOutputCallback: VTDecompressionOutputCallback = { (
decompressionOutputRefCon,
sourceFrameRefCon,
status,
infoFlags,
imageBuffer,
presentationTimeStamp,
presentationDuration
) in
guard status == noErr else {
print("Callback: \(status)")
return
}
if let imageBuffer = imageBuffer {
}
}
We have developed a simple video player Swift application for macOS, which uses the AVFoundation Framework. A special feature of this app is the ability to play the video backward with speeds like -0.25x, -0.5x, and -1.0x. MP4 video file is played directly from the local file system, video codec is h.264, and audio AAC. Video files are huge, like 10 GB, and a length of 3 hours.
Playing video in reverse direction works well on a Macbook Air with M1 or M2 chip. When we run the same app with the same video on a Macbook Air with M3 chip the reverse playback is much worse. Playback might stutter badly, especially in the latter part of the video. This same behavior also happens in Apple's Quicktime video player when playing in the reverse direction with -1x speed. What's even more strange is that at one point of a time, the video playback is totally smooth, but again, after a while, the playback is stuttering. For example, this morning reverse playback worked 100 % smoothly, then I rebooted the Mac and tried again: the result was stuttering. After this the Mac stayed idle for several hours and I tried to reverse play video again: smooth performance! My conclusion: M3 playback works fine if the stars in the sky are aligned correctly. :-)
So it's not only our app, but also Quicktime player is having exactly the same behavior. And only with the M3 chip. The same symptom appears with another similar M3 Mac, so it can't be a single fault. At the same time, open-source video player iina can reverse play the video well on the same Mac.
All Macs have otherwise identical configuration: 16 GB RAM and macOS 15.1.1.
Have you experienced the same problem? Any chance to solve this problem?
I really hope that the M4 chip Mac is behaving better here.
I don’t know what you felt was wrong with video scrubbing that you needed to **** it up this badly. I can’t scrub within the video, only within several seconds of the pause or worse it restarts the whole damn video.
used to be an Easy and enjoyable process to harvest photos from a video and you’ve turned it into the most frustrating part of operating my phone.
release me a patch to optionally enable the old scrubbing behavior.
I’m using AVFoundation in my iPhone application to encode a video in MP4 format with H.264, which can then be shared or exported.
Do I need to pay a license for using the H.264 format to MPEG LA? Or are these fees already covered by Apple?
I’ve read articles suggesting that Apple covers these fees when encoding is done through its native APIs (or via its dedicated encoding hardware components), but I haven’t found any explicit confirmation of this point in the various documentation or contracts... Did I miss something?
xcode 16:
VT_EXPORT void
VT_EXPORT OSStatus
VTPixelTransferSessionCreate(
CM_NULLABLE CFAllocatorRef allocator,
CM_RETURNS_RETAINED_PARAMETER CM_NULLABLE VTPixelTransferSessionRef * CM_NONNULL pixelTransferSessionOut) API_AVAILABLE(macos(10.8), ios(16.0), tvos(16.0), visionos(1.0)) API_UNAVAILABLE(watchos);
xcode 15:
VT_EXPORT OSStatus
VTPixelTransferSessionCreate(
CM_NULLABLE CFAllocatorRef allocator,
CM_RETURNS_RETAINED_PARAMETER CM_NULLABLE VTPixelTransferSessionRef * CM_NONNULL pixelTransferSessionOut) VT_AVAILABLE_STARTING(10_8);
Hi,
Im trying to use this example (https://developer.apple.com/documentation/avfoundation/media_reading_and_writing/converting_side-by-side_3d_video_to_multiview_hevc_and_spatial_video)
to encode a stereoscopic (left eye right eye) video frame using MVHEVC. The sample project creates tagged buffers for left and right eye, and uses a writer to write the MVHEC encoded video buffers. But i after i get right and left tagged buffers, i want to use VideoMaterial and its AVSampleBufferVideoRenderer to enqueue these video frames. If i render MVHEVC encoded left eye sample buffer, and right eye sample buffer, sequentially will the AVSampleBufferVideoRenderer render it as a stereoscopic view? How does this work with VideoMaterial and AVSampleBufferVideoRenderer ? Thanks!
I was trying to migrate Core Image based code that's rotating an image in a CVPixelBuffer to the newer VTPixelRotationSession from Video Toolbox. Hoping to increase performance.
The original code does:
let rotatedImage = CIImage(cvPixelBuffer: origPixelBuffer).oriented(.left)
context.render(rotatedImage, to: newPixelBuffer)
The new code uses a session:
_ = VTPixelRotationSessionRotateImage(rotationSession, origPixelBuffer, newPixelBuffer)
However I immediately ran into memory limitations, since my code has to be able to run in an iOS extension. It seems VTPixelRotationSessionRotateImage easily lets memory usage spike over the 50MB of allowed memory. While the CIImage based implementation has no such high memory usage at all.
Is this expected? Does the VTPixelRotationSession implementation gain more performance by sacrificing memory? Or is there something I'm overlooking?
I was expecting the VTPixelRotationSession at worst to be on par in terms of memory usage and processing speed compared to CIImage. At this moment it seems VTPixelRotationSession is unusable in extensions.
See also Feedback: FB14977240
My team and I have created an iPhone application that receives and utilizes sensor data from a separate raspberrypi-powered device.
The iPhone app does not function without the use of the sensor data. How will the App Reviewers be able to test the application when submitting to the App Store?
How is it possible to enable EDR on Apple TV without AVFoundation for custom HDR video playback? The use case is a custom video player for HDR playback via VideoToolbox and Metal, which seem to render colors correctly on iOS but not on tvOS.
All related documentation and WWDC sessions describe APIs that are unavailable for tvOS:
let metalLayer = CAMetalLayer()
metalLayer.wantsExtendedDynamicRangeContent = true
metalLayer.edrMetadata = CAEDRMetadata.hdr10(minLuminance: 0.0, maxLuminance: 1000, opticalOutputScale: 100)
What's the alternative path for tvOS to have correct system tone mapping for a setup like:
metalLayer.pixelFormat = .rgba16Float // (or .bgr10_xr)
metalLayer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
Video format: HEVC, YUV 4:2:0 10bit, BT.2020 PQ.
We do set the preferredDisplayCriteria on AVDisplayManager and thus video range matching is in place.
WWDC Ref: https://developer.apple.com/videos/play/wwdc2022/110565?time=557
Is there any way to place 3D objects, maybe using ARKit or Metalkit in the video.
I have tried to extract frames from video, then draw a cube using SCNNode and then render it into UIImage, then gather all images and create video.
But this is not feasible solution as it creates huge memory spike and ultimately gives memory warning.
So is there any other way to draw 3D objects on the video file.
Hello
I am testing the new Media Extension API in macOS 15 Beta 4.
Firstly, THANK YOU FOR THIS API!!!!!! This is going to be huge for the video ecosystem on the platform. Seriously!
My understanding is that to support custom container formats you make a MEFormatReader extension, and to support a specific custom codec, you create a MEVideoDecoder for that codec.
Ok - I have followed the docs - esp the inline header info and have gotten quite far
A Host App which hosts my Media Extenion (MKV files)
A Extension Bundle which exposes the UTTYpes it supports to the system and plugin class ID as per the docs
Entitlements as per docs
I'm building debug - but I have a valid Developer ID / Account associated in Teams in Xcode
My Plugin is visible to the Media Extension System preference
My Plugin is properly initialized, I get the MEByteReader and can read container level metadata in callbacks
I can instantiate my tracks readers, and validate the tracks level information and provide the callbacks
I can instantiate my sample cursors, and respond to seek requests for samples for the track in question
Now, here is where I get hit some issues.
My format reader is leveraging FFMPEGs libavformat library, and I am testing with MKV files which host AVC1 h264 samples, which should be decodable as I understand it out of the box from VideoToolbox (ie, I do not need a separate MEVideoDecoder plugin to handle this format).
Here is my CMFormatDescription which I vend from my MKV parser to AVFoundation via the track reader
Made Format Description: <CMVideoFormatDescription 0x11f005680 [0x1f7d62220]> {
mediaType:'vide'
mediaSubType:'avc1'
mediaSpecific: {
codecType: 'avc1' dimensions: 1920 x 1080
}
extensions: {(null)}
}
My MESampleCursor implementation implements all of the callbacks - and some of the 'optional' sample cursor location methods: (im only sharing the optional ones here)
- (MESampleLocation * _Nullable) sampleLocationReturningError:(NSError *__autoreleasing _Nullable * _Nullable) error
- (MESampleCursorChunk * _Nullable) chunkDetailsReturningError:(NSError *__autoreleasing _Nullable * _Nullable) error
I also populate the AVSampleCursorSyncInfo and AVSampleCursorDependencyInfo structs per each AVPacket* I decode from libavformat
Now my issue:
I get these log files in my host app:
<<<< VRP >>>> figVideoRenderPipelineSetProperty signalled err=-12852 (kFigRenderPipelineError_InvalidParameter) (sample attachment collector not enabled) at FigStandardVideoRenderPipeline.c:2231
<<<< VideoMentor >>>> videoMentorDependencyStateCopyCursorForDecodeWalk signalled err=-12836 (kVideoMentorUnexpectedSituationErr) (Node not found for target cursor -- it should have been created during videoMentorDependencyStateAddSamplesToGraph) at VideoMentor.c:4982
<<<< VideoMentor >>>> videoMentorThreadCreateSampleBuffer signalled err=-12841 (err) (FigSampleGeneratorCreateSampleBufferAtCursor failed) at VideoMentor.c:3960
<<<< VideoMentor >>>> videoMentorThreadCreateSampleBuffer signalled err=-12841 (err) (FigSampleGeneratorCreateSampleBufferAtCursor failed) at VideoMentor.c:3960
Which I presume is telling me I am not providing the GOP or dependency metadata correctly to the plugin.
I've included console logs from my extension and host app:
LibAVExtension system logs
And my SampleCursor implementation is here
https://github.com/vade/FFMPEGMediaExtension/blob/main/LibAVExtension/LibAVSampleCursor.m
Any guidance is very helpful.
Thank you!
tl;dr how can I get raw YUV in a Metal fragment shader from a VideoToolbox 10-bit/BT.2020 HEVC stream without any extra/secret format conversions?
With VideoToolbox and 10-bit HEVC, I've found that it defaults to CVPixelBuffers w/ formats kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarFullRange or kCVPixelFormatType_Lossy_420YpCbCr10PackedBiPlanarFullRange. To mitigate this, I have the following snippet of code to my application:
// We need our pixels unpacked for 10-bit so that the Metal textures actually work
var pixelFormat:OSType? = nil
let bpc = getBpcForVideoFormat(videoFormat!)
let isFullRange = getIsFullRangeForVideoFormat(videoFormat!)
// TODO: figure out how to check for 422/444, CVImageBufferChromaLocationBottomField?
if bpc == 10 {
pixelFormat = isFullRange ? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
}
let videoDecoderSpecification:[NSString: AnyObject] = [kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder:kCFBooleanTrue]
var destinationImageBufferAttributes:[NSString: AnyObject] = [kCVPixelBufferMetalCompatibilityKey: true as NSNumber, kCVPixelBufferPoolMinimumBufferCountKey: 3 as NSNumber]
if pixelFormat != nil {
destinationImageBufferAttributes[kCVPixelBufferPixelFormatTypeKey] = pixelFormat! as NSNumber
}
var decompressionSession:VTDecompressionSession? = nil
err = VTDecompressionSessionCreate(allocator: nil, formatDescription: videoFormat!, decoderSpecification: videoDecoderSpecification as CFDictionary, imageBufferAttributes: destinationImageBufferAttributes as CFDictionary, outputCallback: nil, decompressionSessionOut: &decompressionSession)
In short, I need kCVPixelFormatType_420YpCbCr10BiPlanar so that I have a straightforward MTLPixelFormat.r16Unorm/MTLPixelFormat.rg16Unorm texture binding for Y/CbCr. Metal, seemingly, has no direct pixel format for 420YpCbCr10PackedBiPlanar. I'd also rather not use any color conversion in VideoToolbox, in order to save on processing (and to ensure that the color transforms/transfer characteristics match between streamer/client, since I also have a custom transfer characteristic to mitigate blocking in dark scenes).
However, I noticed that in visionOS 2, the CVPixelBuffer I receive is no longer a compressed render target (likely a bug), which caused GPU texture read bandwidth to skyrocket from 2GiB/s to 30GiB/s. More importantly, this implies that VideoToolbox may in fact be doing an extra color conversion step, wasting memory bandwidth.
Does Metal actually have no way to handle 420YpCbCr10PackedBiPlanar? Are there any examples for reading 10-bit HDR HEVC buffers directly with Metal?
I’m creating a objective C command-line utility to encode RAW image sequences to ProRes 4444, but I’m encountering, blocky compression artifacts in the ProRes 4444 video output.
To test the integrity of the image data before encoding to ProRes, I added a snippet in my encoding function that saves a 16-bit PNG before encoding to ProRes and the PNG looks perfect, I can see all detail in every part of the image dynamic range.
Here’s a comparison between the 16-bit PNG(on the right) and the ProRes 4444 output. (on the left)
As a further test, I re-encoded the ‘test PNG’ to ProRes 4444 using DaVinci Resolve, and the ProRes4444 output video from Resolve doesn’t have any blocky compression artifacts. Looks identical.
In short, this is what the utility does:
Unpacks the 12-bit raw data into 16-bit values. After unpacking, the raw data is debayered to convert it into a standard color image format (BGR) using OpenCV.
Scale the debayered pixel values from their original 12-bit depth to fit into a 16-bit range. Up to this point everything is fine and confirmed by saving 16bit PNGs.
The images are encoded to ProRes 4444 using the AVFoundation framework.
The pixel buffers are created and managed using dictionary method with ‘kCVPixelFormatType_64RGBALE’.
I need help figuring this out, I’m a real novice when it comes to AVfoundation/encoding to ProRes.
See relevant parts of my 'encodeToProRes' function:
void encodeToProRes(const std::string &outputPath, const std::vector<std::string> &rawPaths, const std::string &proResFlavor) {
NSError *error = nil;
NSURL *url = [NSURL fileURLWithPath:[NSString stringWithUTF8String:outputPath.c_str()]];
AVAssetWriter *assetWriter = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeQuickTimeMovie error:&error];
if (error) {
std::cerr << "Error creating AVAssetWriter: " << error.localizedDescription.UTF8String << std::endl;
return;
}
// Load the first image to get the dimensions
std::cout << "Debayering the first image to get dimensions..." << std::endl;
Mat firstImage;
int width = 5320;
int height = 3900;
if (!debayer_image(rawPaths[0], firstImage, width, height)) {
std::cerr << "Error debayering the first image" << std::endl;
return;
}
width = firstImage.cols;
height = firstImage.rows;
// Save the first frame as a PNG 16-bit image for validation
std::string pngFilePath = outputPath + "_frame1.png";
if (!imwrite(pngFilePath, firstImage)) {
std::cerr << "Error: Failed to save the first frame as a PNG image" << std::endl;
} else {
std::cout << "First frame saved as PNG: " << pngFilePath << std::endl;
}
NSString *codecKey = nil;
if (proResFlavor == "4444") {
codecKey = AVVideoCodecTypeAppleProRes4444;
} else if (proResFlavor == "422HQ") {
codecKey = AVVideoCodecTypeAppleProRes422HQ;
} else if (proResFlavor == "422") {
codecKey = AVVideoCodecTypeAppleProRes422;
} else if (proResFlavor == "LT") {
codecKey = AVVideoCodecTypeAppleProRes422LT;
} else {
std::cerr << "Error: Invalid ProRes flavor specified: " << proResFlavor << std::endl;
return;
}
NSDictionary *outputSettings = @{
AVVideoCodecKey: codecKey,
AVVideoWidthKey: @(width),
AVVideoHeightKey: @(height)
};
AVAssetWriterInput *videoInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo outputSettings:outputSettings];
videoInput.expectsMediaDataInRealTime = YES;
NSDictionary *pixelBufferAttributes = @{
(id)kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_64RGBALE),
(id)kCVPixelBufferWidthKey: @(width),
(id)kCVPixelBufferHeightKey: @(height)
};
AVAssetWriterInputPixelBufferAdaptor *adaptor = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:videoInput sourcePixelBufferAttributes:pixelBufferAttributes];
...
[assetWriter startSessionAtSourceTime:kCMTimeZero];
CMTime frameDuration = CMTimeMake(1, 24); // Frame rate of 24 fps
int numFrames = static_cast<int>(rawPaths.size());
...
// Encoding thread
std::thread encoderThread([&]() {
int frameIndex = 0;
std::vector<CVPixelBufferRef> pixelBufferBuffer;
while (frameIndex < numFrames) {
std::unique_lock<std::mutex> lock(queueMutex);
queueCondVar.wait(lock, [&]() { return !frameQueue.empty() || debayeringFinished; });
if (!frameQueue.empty()) {
auto [index, debayeredImage] = frameQueue.front();
frameQueue.pop();
lock.unlock();
if (index == frameIndex) {
cv::Mat rgbaImage;
cv::cvtColor(debayeredImage, rgbaImage, cv::COLOR_BGR2RGBA);
CVPixelBufferRef pixelBuffer = NULL;
CVReturn result = CVPixelBufferPoolCreatePixelBuffer(NULL, adaptor.pixelBufferPool, &pixelBuffer);
if (result != kCVReturnSuccess) {
std::cerr << "Error: Could not create pixel buffer" << std::endl;
dispatch_group_leave(dispatchGroup);
return;
}
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
void *pxdata = CVPixelBufferGetBaseAddress(pixelBuffer);
for (int row = 0; row < height; ++row) {
memcpy(static_cast<uint8_t*>(pxdata) + row * CVPixelBufferGetBytesPerRow(pixelBuffer),
rgbaImage.ptr(row),
width * 8);
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
pixelBufferBuffer.push_back(pixelBuffer);
...
Thanks very much!
Hello,
Notice that when using h.265 VideoToolBox encoding with Handbrake (latest snapshots) the resulting output is larger than the 264 source.
Obviously this was the case for OSX 14.x and below.
Is this a known "regression" ?
Thanks.
My app stores and transports lots of groups of similar PNGs. These aren't compressed well by official algorithms like .lzfse, .lz4, .lzbitmap... not even bz2, but I realized that they are well-suited for compression by video codecs since they're highly similar to one another.
I ran an experiment where I compressed a dozen images into an HEVCWithAlpha .mov via AVAssetWriter, and the compression ratio was fantastic, but when I retrieved the PNGs via AVAssetImageGenerator there were lots of artifacts which simply wasn't acceptable. Maybe I'm doing something wrong, or maybe I'm chasing something that doesn't exist.
Is there a way to use video compression like a specialized archive to store and retrieve PNGs losslessly while retaining alpha? I have no intention of using the videos except as condensed storage.
Any suggestions on how to reduce storage size of many large PNGs are also welcome. I also tried using HEVC instead of PNG via the new UIImage.hevcData(), but the decompression/processing times were just insane (5000%+ increase), on top of there being fatal errors when using async.
I'm trying to cast the screen from an iOS device to an Android device.
I'm leveraging ReplayKit on iOS to capture the screen and VideoToolbox for compressing the captured video data into H.264 format using CMSampleBuffers. Both iOS and Android are configured for H.264 compression and decompression.
While screen casting works flawlessly within the same platform (iOS to iOS or Android to Android), I'm encountering an error ("not in avi mode") on the Android receiver when casting from iOS. My research suggests that the underlying container formats for H.264 might differ between iOS and Android.
Data transmission over the TCP socket seems to be functioning correctly.
My question is:
Is there a way to ensure a common container format for H.264 compression and decompression across iOS and Android platforms?
Here's a breakdown of the iOS sender details:
Device: iPhone 13 mini running iOS 17
Development Environment: Xcode 15 with a minimum deployment target of iOS 16
Screen Capture: ReplayKit for capturing the screen and obtaining CMSampleBuffers
Video Compression: VideoToolbox for H.264 compression
Compression Properties:
kVTCompressionPropertyKey_ConstantBitRate: 6144000 (bitrate)
kVTCompressionPropertyKey_ProfileLevel: kVTProfileLevel_H264_Main_AutoLevel (profile and level)
kVTCompressionPropertyKey_MaxKeyFrameInterval: 60 (maximum keyframe interval)
kVTCompressionPropertyKey_RealTime: true (real-time encoding)
kVTCompressionPropertyKey_Quality: 1 (lowest quality)
NAL Unit Handling: Custom header is added to NAL units
Android Receiver Details:
Device: RedMi 7A running Android 10
Video Decoding: MediaCodec API for receiving and decoding the H.264 stream
I am a bit confused on whether certain Video Toolbox (VT) encoders support hardware acceleration or not.
When I query the list of VT encoders (VTCopyVideoEncoderList(nil,&encoderList)) on an iPhone 14 Pro device, for avc1 (AVC / H.264) and hevc1 (HEVC / H.265) encoders, the kVTVideoEncoderList_IsHardwareAccelerated flag is not there, which -based on the documentation found on the VTVideoEncoderList.h- means that the encoders do not support hardware acceleration:
optional. CFBoolean. If present and set to kCFBooleanTrue, indicates that the encoder is hardware accelerated.
In fact, no encoders from this list return this flag as true and most of them do not include the flag at all on their dictionaries.
On the other hand, when I create a compression session using the VTCompressionSessionCreate() and pass the kVTVideoEncoderSpecification_EnableHardwareAcceleratedVideoEncoder as true in the encoder specifications, after querying the kVTCompressionPropertyKey_UsingHardwareAcceleratedVideoEncoder using the following code, I get a CFBoolean value of true for both H.264 and H.265 encoder.
In fact, I get a true value (for both of the aforementioned encoders) even if I don't specify the kVTVideoEncoderSpecification_EnableHardwareAcceleratedVideoEncoder during the creation of the compression session (note here that this flag was introduced in iOS 17.4 ^1).
So the question is: Are those encoders actually hardware accelerated on my device, and if so, why isn't that reflected on the VTCopyVideoEncoderList() call?
I have been seeing some crash reports for my app on some devices (not all of them). The crash occurs while converting a CVPixelBuffer captured from Video to a JPG using VTCreateCGImageFromCVPixelBuffer from VideoToolBox. I have not been able to reproduce the crash on local devices, even under adverse memory conditions (many apps running in the background).
The field crash reports show that VTCreateCGImageFromCVPixelBuffer does the conversion in another thread and that thread crashed at call to vConvert_420Yp8_CbCr8ToARGB8888_vec.
Any suggestions on how to debug this further would be helpful.
I have an image viewing app with support for avif (and avis) images. I'm trying to figure out if the recent bug in CoreMedia (dav1d) affects my app. The apple security update: https://support.apple.com/en-gb/HT214097
The vulnerable code path in dav1d is only reached when c->n_fc > 1 (https://code.videolan.org/videolan/dav1d/-/blob/2b475307dc11be9a1c3cc4358102c76a7f386a51/src/decode.c#L2845), where c is the dav1d context.
With some reverse engineering, the way I see CMPhoto calling into VideoToolBox (which internally calls into AV1SW.videodecoder, which is a wrapper around dav1d), the max frame delay is hardcoded to 1 in the dav1d settings which intern means that c->n_fc in dav1d is always 1. The vulnerable code path in dav1d is only reached when c->n_fc > 1 (https://code.videolan.org/videolan/dav1d/-/blob/2b475307dc11be9a1c3cc4358102c76a7f386a51/src/decode.c#L2845).
From my understand, this should mean that my app isn't affected. The apple security update however clearly mentions that "Processing an image may lead to arbitrary code execution". Surely I'm missing something?