I’m working on real-time object detection using YOLOv8, but I only need to detect objects in approximately 40% of the screen area. Is it possible to limit the captureOutput callback to deliver only that specific region of the screen?
If this isn’t feasible, I’m considering an approach where the full-screen pixel buffer is captured and then cropped to the target area before running detection. However, I’m concerned about how this might affect real-time performance.
I’d appreciate any insights on how to maintain real-time performance or suggestions for better alternatives. Thank you!
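For reference, here is a minimal sketch of the crop-before-detection approach described above, assuming the frames arrive as CVPixelBuffers in captureOutput(_:didOutput:from:), the region of interest is known in pixel coordinates, and the detector accepts a BGRA buffer (croppedBuffer and the helper name are placeholders, not an established API):

import CoreImage
import CoreVideo

// Reused across frames; creating a CIContext per frame is expensive.
private let ciContext = CIContext()
private var croppedBuffer: CVPixelBuffer?

/// Crops `pixelBuffer` to `regionOfInterest` (pixel coordinates) and returns a
/// smaller buffer that can be handed to the detector instead of the full frame.
func crop(_ pixelBuffer: CVPixelBuffer, to regionOfInterest: CGRect) -> CVPixelBuffer? {
    if croppedBuffer == nil {
        // Allocate the destination once and reuse it to keep the per-frame cost low.
        CVPixelBufferCreate(kCFAllocatorDefault,
                            Int(regionOfInterest.width),
                            Int(regionOfInterest.height),
                            kCVPixelFormatType_32BGRA,
                            nil,
                            &croppedBuffer)
    }
    guard let destination = croppedBuffer else { return nil }

    // Crop, then shift the cropped region's origin to (0, 0) before rendering.
    // Note that Core Image uses a bottom-left origin, so the rect may need flipping.
    let cropped = CIImage(cvPixelBuffer: pixelBuffer)
        .cropped(to: regionOfInterest)
        .transformed(by: CGAffineTransform(translationX: -regionOfInterest.minX,
                                           y: -regionOfInterest.minY))
    ciContext.render(cropped, to: destination)
    return destination
}

In practice the crop and render step is usually cheap compared with running the model on the frame, so real-time behaviour depends mainly on the detector itself.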
AVFoundation
Work with audiovisual assets, control device cameras, process audio, and configure system audio interactions using AVFoundation.
Posts under AVFoundation tag
I am working on capturing 48MP images using the iPhone 16 Pro Max with the Ultra-wide camera. I’ve updated the code to capture the maximum supported dimensions with the following snippet:
if #available(iOS 16.0, *) {
    photoOutput.maxPhotoDimensions = device.activeFormat.supportedMaxPhotoDimensions.last!
    photoSettings.maxPhotoDimensions = .init(width: 5712, height: 4284)
}
However, I’m still not getting the expected results. My goal is to capture 48MP images, and I want to confirm if the Ultra-wide camera supports this resolution or if I’m missing any other configuration.
Any guidance would be appreciated!
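As a diagnostic, a small sketch like the following can show whether the Ultra Wide device actually advertises a 48MP-capable format, and adopts the largest one if it does. It assumes `device` is the ultra-wide AVCaptureDevice and `photoOutput` is already attached to the session; it is a sketch, not a confirmation that the Ultra Wide supports 48MP:

// Log what each format reports, then adopt the largest available photo dimensions.
for format in device.formats {
    let dims = format.supportedMaxPhotoDimensions
        .map { "\($0.width)x\($0.height)" }
        .joined(separator: ", ")
    print("\(format.formatDescription): \(dims)")
}

if let best = device.formats.max(by: {
    ($0.supportedMaxPhotoDimensions.last?.width ?? 0) <
    ($1.supportedMaxPhotoDimensions.last?.width ?? 0)
}),
   let maxDims = best.supportedMaxPhotoDimensions.last {
    do {
        try device.lockForConfiguration()
        device.activeFormat = best          // the active format must support the dimensions
        device.unlockForConfiguration()
    } catch {
        print("Could not lock device: \(error)")
    }
    photoOutput.maxPhotoDimensions = maxDims       // set on the output first...
    let settings = AVCapturePhotoSettings()
    settings.maxPhotoDimensions = maxDims          // ...then mirror it per capture
    // photoOutput.capturePhoto(with: settings, delegate: self)
}

If none of the ultra-wide camera's formats advertise a 5712x4284 entry, no photoSettings value will produce a 48MP capture from that camera.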
AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0> F8BB1C28-BAE8-11D6-9C31-00039315CD46
AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50 from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to: 2 ch, 16000 Hz, Int16, interleaved
HALSystem.cpp:2216 AudioObjectPropertiesChanged: no such object
AQMEIO_HAL.cpp:2552 timeout
AudioHardware-mac-imp.cpp:2706 AudioDeviceStop: no device with given ID
AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50 from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to: 2 ch, 16000 Hz, Int16, interleaved
AudioQueueObject.cpp:6707 ConvertInput: aq@0x109994200: AudioConverterFillComplexBuffer returned -50, packetCount 5328
Xcode version 15.2 (15C500b)
iPhone 15 Pro, iOS 17.2 (Simulator)
Language: Swift
In version 17.0 or above, there are no recording issues in the Objective-C project, but the iPhone and simulators can't start recording.
Why?
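As context for the log above: -50 is kAudio_ParamError, and the "from: 0 ch, 16000 Hz, ... 0 bits/channel" half of the converter message indicates a source format that was never fully specified. Without seeing the recording setup this is only a guess, but for comparison, a fully filled-in 16 kHz mono Int16 PCM description looks like this:

import AudioToolbox

// Every field of the source AudioStreamBasicDescription must be non-zero for
// linear PCM; a zeroed-out description is one common way to get -50 back.
var recordFormat = AudioStreamBasicDescription(
    mSampleRate: 16_000,
    mFormatID: kAudioFormatLinearPCM,
    mFormatFlags: kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked,
    mBytesPerPacket: 2,        // 1 channel * 2 bytes per sample
    mFramesPerPacket: 1,       // always 1 for uncompressed PCM
    mBytesPerFrame: 2,
    mChannelsPerFrame: 1,
    mBitsPerChannel: 16,
    mReserved: 0
)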
I don't know what triggered this in a previously-running application I'm developing: When I have the build target set to "My Mac (designed for iPad)," I now must delete all the app's build materials under DerivedData to get the app to build and run exactly once. Cleaning isn't enough; I have to delete everything.
On second launch, it will crash without even getting to the instantiation of the application class. None of my code executes.
Also: If I then set my iPhone as the build target, the app will build and run repeatedly. If I then return to "My Mac (designed for iPad)," the app will again launch once and then crash on every subsequent launch.
The crash is the same every time:
dyld[3875]: Symbol not found: _OBJC_CLASS_$_AVPlayerView
Referenced from: <D566512D-CAB4-3EA6-9B87-DBD15C6E71B3> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/Library/Debugger/libViewDebuggerSupport.dylib
Expected in: <4C34313C-03AD-32EB-8722-8A77C64AB959> /System/iOSSupport/System/Library/Frameworks/AVKit.framework/Versions/A/AVKit
Interestingly, I haven't found any similar online reports that mention this symbol.
Has anyone seen this behavior before, where the crash only happens after the first run... and gets reset when you toggle the target type?
I am developing an iOS app that uses YOLOv8 for object detection and aims to detect objects at 60 FPS using the UltraWide camera. My goal is to process every frame within captureOutput and utilize the detected data (such as coordinates) for each one.
I have a question regarding how background thread processing behaves in this scenario. Does the size of the YOLO model (n, s, m, etc.) or the weight of the operations inside captureOutput affect the number of frames that can be successfully processed?
Specifically, I would like to know if all frames will be processed sequentially with a delay due to heavy processing in the background, or if some frames will be dropped and not processed at all. Any insights on how to handle this would be greatly appreciated.
Thank you!
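For reference, the dropping behaviour is governed by the video data output rather than the model itself: with alwaysDiscardsLateVideoFrames set to true (the default), frames that arrive while the delegate is still busy are dropped and reported via the didDrop callback instead of queuing up behind the slow call. A minimal sketch, with placeholder names for the detector and the counter:

import AVFoundation

final class CameraController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let processingQueue = DispatchQueue(label: "camera.frames")
    private var droppedFrameCount = 0

    func configureOutput(on session: AVCaptureSession) {
        let videoOutput = AVCaptureVideoDataOutput()
        // true (the default): frames arriving while the delegate is busy are dropped,
        // not queued, so processing never falls further and further behind.
        videoOutput.alwaysDiscardsLateVideoFrames = true
        videoOutput.setSampleBufferDelegate(self, queue: processingQueue)
        if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        runDetector(on: pixelBuffer)    // placeholder for the YOLO inference call
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didDrop sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        droppedFrameCount += 1          // counting drops shows how far processing is from 60 FPS
    }

    private func runDetector(on pixelBuffer: CVPixelBuffer) {
        // YOLO inference would go here; the longer this takes, the more frames get dropped.
    }
}

So with the default setting, heavy work in captureOutput means later frames are skipped rather than processed late; every frame is only processed if inference finishes within the frame interval. Setting alwaysDiscardsLateVideoFrames to false buffers a limited number of frames instead, at the cost of added latency.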
While trying to control the following two scenes in 1 ImmersiveSpace, we found the following memory leak when we background the app while a stereoscopic video is playing.
ImmersiveView's two scenes:
Scene 1 has 1 toggle button
Scene 2 has the same toggle button with a 180-degree skysphere playing a stereoscopic video
Attached are the files and images of the memory leak as captured in Xcode.
To replicate this memory leak, follow these steps:
1. Create a new visionOS app using the Xcode template, as illustrated below.
2. Configure the project to launch directly into an immersive space (set Preferred Default Scene Session Role to Immersive Space Application Session Role in Info.plist).
3. Replace all Swift files with those you will find in the attached texts.
4. In ImmersiveView, replace the stereoscopic video to play with a large 3D 180-degree video of your own, bundled in your project.
5. Launch the app in debug mode via Xcode onto the AVP device or simulator.
6. Display the memory use by pressing Command-7 and selecting Memory to view the live memory graph.
7. Press the first immersive space's button "Open ImmersiveView".
8. Press the second immersive space's button "Show Immersive Video".
9. Background the app.
10. When the app tray appears, foreground the app by selecting it; the first immersive space should appear.
11. Repeat steps 7, 8, 9, and 10 multiple times.
12. Observe the memory use going up; the graph should look similar to the below illustration.
In ImmersiveView, upon backgrounding the app, I do:
a reset method to clear the video's memory
dismiss of the Immersive Space containing the video (even though upon execution, visionOS raises the purple warning "Unable to dismiss an Immersive Space since none is opened". It appears visionOS dismisses any ImmersiveSpace upon backgrounding, which makes sense.)
Am I not releasing the memory correctly?
Or, is there really a memory leak issue in either SwiftUI's ImmersiveSpace or in AVFoundation's AVPlayer upon background of an app?
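For context, a minimal sketch of the kind of reset described above (the class and property names here are placeholders, not the identifiers from the attached files):

import AVFoundation
import RealityKit

final class VideoResetter {
    var player = AVPlayer()
    var videoEntity: Entity?

    // Called when the scene phase moves to .background.
    func resetVideo() {
        player.pause()
        player.replaceCurrentItem(with: nil)   // drop the AVPlayerItem and its buffered media
        videoEntity?.removeFromParent()        // release the entity holding the video material
        videoEntity = nil
    }
}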
App file TestVideoLeakOneImmersiveView
First ImmersiveSpace file InitialImmersiveView
Second ImmersiveSpace File ImmersiveView
Skysphere Model File Immersive180VideoViewModel
File AppModel
Hi, I have a video player app that lost its audio spatialization since the visionOS 2 update. I am using the VideoPlayerComponent (https://developer.apple.com/documentation/realitykit/videoplayercomponent) to implement my videos as entities, as I want a custom look and controls for my player.
In visionOS 1, there was automatic audio spatialization. Depending on where my video entity is, the app automatically enables head-tracked audio spatialization. Since visionOS 2, however, I cannot get my video entities to play Spatial Audio. I've looked into DestinationVideo and even set up AVAudioSessionSpatialExperience, but Spatial Audio is still not working.
Appreciate any help. Thanks.
Hello,
I am a developer currently working on an AR application using ARKit. I aim to implement a Zoom feature that allows users to enlarge and reduce objects within the AR scene while simultaneously measuring the distance to those objects. Specifically, I want to incorporate Optical Zoom to provide a more natural and precise user experience. I have considered several approaches and would appreciate your advice on the most effective methods.
Approaches Being Considered:
Using UIPinchGestureRecognizer to adjust the camera's field of view
Modifying the scale property of SCNNode to enlarge/reduce specific objects
Leveraging AVFoundation to control the camera's optical zoom
Questions:
Compatibility Between ARKit and Optical Zoom: Is it feasible to control the camera's optical zoom using AVFoundation while utilizing ARKit's features? What should be considered when integrating these two frameworks?
Integrating Object Distance Measurement with Zoom Functionality: What is the most effective approach to measure and display the distance to an object in real-time when a user zooms in on it?
User Experience Considerations: Do you have any UI/UX design tips for implementing optical zoom to ensure a natural and intuitive experience? For example, how can visual feedback for zoom actions and distance measurements be effectively presented to users?
Performance Optimization: What optimization strategies can minimize potential performance issues when implementing both optical zoom and distance measurement features simultaneously?
Example Code and Reference Materials: Could you share any example code or reference materials that demonstrate similar functionalities?
Thank you.
Example Code Request:
If possible, providing sample code that integrates optical zoom with distance measurement would be extremely helpful.
Reference Links:
Please share any tutorials or resources that demonstrate the combined use of ARKit and AVFoundation.
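On the compatibility question: since iOS 16, ARKit exposes the underlying capture device, which allows zoom to be driven through AVFoundation while the session runs. A hedged sketch of the pinch-to-zoom idea (gesture wiring, clamping policy, and the SceneKit side are left out; the class name is a placeholder):

import ARKit
import AVFoundation
import UIKit

final class ZoomHandler: NSObject {
    // Attach via UIPinchGestureRecognizer(target: handler, action: #selector(ZoomHandler.handlePinch(_:)))
    @objc func handlePinch(_ gesture: UIPinchGestureRecognizer) {
        // Available on configurations/devices that support it; nil otherwise.
        guard let device = ARWorldTrackingConfiguration.configurableCaptureDeviceForPrimaryCamera else {
            return
        }
        do {
            try device.lockForConfiguration()
            let requested = device.videoZoomFactor * gesture.scale
            device.videoZoomFactor = min(max(requested, device.minAvailableVideoZoomFactor),
                                         device.maxAvailableVideoZoomFactor)
            device.unlockForConfiguration()
        } catch {
            print("Could not lock capture device for zoom: \(error)")
        }
        gesture.scale = 1.0   // reset so each gesture update applies an incremental factor
    }
}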
Hi,
I currently have Enterprise API access and have observed that the main camera API only provides RGB data. I am trying to access point cloud information from the LiDAR sensor, but it seems ARKit doesn't offer this directly via the standard APIs that are available on iPad.
I wanted to ask if there are any possible options to access depth data or enhanced camera capabilities using the Enterprise API.
Specifically:
Does having Enterprise API access unlock any additional camera-related APIs in AVFoundation that could provide depth information or more advanced control over the camera?
Are there any workarounds or alternative methods to obtain depth data from the camera?
Hi!
I have an AVAudioSequencer with some AVMusicTracks that are filled with AVParameterEvents.
If I toggle the isMuted property of a track, it mutes instantly when changed to true. However, after setting isMuted back to false, the events only trigger on the next loop iteration rather than immediately. Is this intended behaviour, and is there some way to get the events to trigger immediately after toggling isMuted back to false?
Hey, captureHighResolutionFrame() produces the normal camera shutter sound and that really doesn't fit the ARKit context. I can't override it the usual way because there's no AVCaptureSession object in ARSession. Any ideas on what to do? Thanks!
Hello,
First, some version and software details:
Software: iOS 18.1
Hardware: iPhone 14 Pro Max and later
Xcode: 16.0
Summary: AVAssetReader is not concatenating the video at the beginning of the output video. The output video should contain a scene of me introducing the content, followed by a blue screen while AVSpeechSynthesizer reads out the text I pasted above the "Generate Video" button.
Details:
Now, let's talk about the app.
Basically, I’m developing an app that generates a video with the following features:
My app will create an output video that is split into an opening scene followed by a fully blue screen.
The opening scene will be taken from a video I choose from my gallery.
I will read the opening video using AVAssetReader as usual.
After the opening scene, I will use the content of a text read by AVSpeechSynthesizer.write().
After the opening scene, the synthesized audio will start playing while the blue screen is displayed.
All of this is already defined in the attached project.
Each project file has a comment at the beginning introducing its content.
How to test:
Write something in the field above the "Generate Video" button. For example, type "Hello, world!"
Then, press the "Library" button and select a video from the gallery, about 30 seconds long.
That’s it. Press the "Generate Video" button.
The result I’ve experienced is a crash or failure to generate the video.
Practical example of what I want to achieve:
Suppose I record a 30-second video where I say, "I’m going to tell you the story of Snow White."
Then, I paste the "Snow White" story into the field above the "Generate Video" button.
The output video should contain me saying, "I’m going to tell you the story of Snow White."
After that, the AVSpeechSynthesizer will read the story I pasted, while displaying a blue screen.
I look forward to a solution.
Thank you very much!
convertToCMSampleBuffer.swift
convertToPixelBuffer.swift
createInputs.swift
createVideo.swift
test.swift
saveVideo.swift
TestApp.swift
editingVideo.swift
sampleReaderProvider.swift
misc.swift
sampleProvider.swift
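As a reference point for the blue-screen portion, the AVSpeechSynthesizer.write(_:toBufferCallback:) flow looks roughly like the sketch below; this is an illustration of the API, not the code from the attached files:

import AVFoundation

let synthesizer = AVSpeechSynthesizer()   // keep a strong reference for the duration of synthesis
var speechBuffers: [AVAudioPCMBuffer] = []

func renderSpeech(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    synthesizer.write(utterance) { buffer in
        // The callback delivers AVAudioBuffers; a zero-length buffer marks the end of the utterance.
        guard let pcm = buffer as? AVAudioPCMBuffer, pcm.frameLength > 0 else { return }
        speechBuffers.append(pcm)
        // Each PCM buffer can later be converted to a CMSampleBuffer and appended to the
        // asset writer's audio input, with timestamps offset by the opening scene's duration.
    }
}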
SoundRecognition causes Input/Output callbacks to have varying Buffer sizes and introduces Glitching
Hello,
We have noticed an issue with SoundRecognition that causes glitching with our AudioUnit setup in Smule.
Input and output frame sizes are inconsistent.
Input frame size does not match [AVAudioSession sharedInstance].IOBufferDuration
My best guess is that SoundRecognition influences the input frame size and not the output frame size.
To reproduce use the example app here:
https://github.com/MarkoGill/SoundRecognitionBug
Hardware/OS
iPhone 14 Pro on iOS 18 -> Experiences the problem
iPhone 11 on iOS 18 -> Experiences the problem
iPhone 15 on iOS 18 -> Not experiencing the problem
Reproduction Steps
Enable Sound Recognition (Settings > Accessibility > Sound Recognition > On)
Enable a Sound for detection (Sounds > Dog > On)
Open the example app with a headset connected (it routes input to output)
Notice that glitching occurs
Check the logs: record and playback buffer sizes vary
Example Log:
AU input sample rate: 48000.000000
AU output sample rate: 48000.000000
hardware sample rate: 48000.000000
hardware buffer size: 1104.000000
updated record frame counts: 1024
updated playback frame counts: 1104
Notes:
You can disable Sound Recognition, restart the app, and playback behaves correctly.
I’m working on a macOS app, written in Swift. My goal is to record audio from an external microphone, e.g., one connected via USB.
For this, I’m using an AVCaptureSession and recording its output with an AVAssetWriter. This works perfectly in principle (and reliably with internal microphones, for example).
The problem occurs after my app has successfully completed the first recording and I then want to make additional recordings (which makes me think it might be process-dependent, because it works again after restarting the app).
The problem: Noisy or distorted-sounding audio files. In addition, the following error message appears in the Console from CoreAudio / its AudioConverter:
Input data proc returned inconsistent 512 packets for 2048 bytes; at 3 bytes per packet, that is actually 682 packets
It is easy to reproduce: the problem occurs even if I don't configure the AVAssetWriter manually and instead let it receive its audioSettings from an AVOutputSettingsAssistant preset. I'm running on macOS 15.0 (24A335).
I’ve filed a feedback including a demo project → FB15333298 🎟️
I would greatly appreciate any help!
Have a great day,
Martin
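For reference, the preset-based configuration mentioned above is along these lines; a sketch with a hypothetical writer setup, not the demo project's exact code:

import AVFoundation

// Build an audio writer input from an AVOutputSettingsAssistant preset
// (the AVAssetWriter that this input gets added to is assumed to exist elsewhere).
func makePresetAudioInput() -> AVAssetWriterInput {
    let assistant = AVOutputSettingsAssistant(preset: .preset1920x1080)!
    let audioInput = AVAssetWriterInput(mediaType: .audio,
                                        outputSettings: assistant.audioSettings)
    audioInput.expectsMediaDataInRealTime = true   // samples come from a live capture session
    return audioInput
}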
Hi,
I'm trying to insert CMSampleBuffers, with pauses between them, into an AVAssetWriterInput that has been configured with expectsMediaDataInRealTime = false. That is, I insert fixed-length audio at specific (increasing and non-overlapping) time points with large gaps in between, e.g. 5 seconds of audio at t=3.0, 5 seconds of audio at t=12.0, etc.
The first audio sample plays at t=3 in the final output video as expected. But then all the other samples are bunched up immediately after it instead of being scheduled at the correct time. Below is my code.
I'm just loading the asset and then readjusting its timestamps to be correct in the absolute timeline. Why do they not get scheduled correctly when the timestamps and durations are definitely correct and non-overlapping?
func addFrame(_ pixelBuffer: CVPixelBuffer) {
    guard CGSize(width: pixelBuffer.width, height: pixelBuffer.height) == outputSize else { return }
    let frameTime = CMTimeMake(value: frameCount, timescale: frameRate)
    if videoInput?.isReadyForMoreMediaData == true {
        pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: frameTime)
        frameCount += 1
        currentTime = frameTime
    }
}

func addMP3AudioClip(_ audioData: Data) async throws {
    let tempURL = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString + ".mp3")
    defer {
        try? FileManager.default.removeItem(at: tempURL)
    }
    try audioData.write(to: tempURL)
    let asset = AVAsset(url: tempURL)
    let duration = try await asset.load(.duration)
    let audioTrack = try await asset.loadTracks(withMediaType: .audio).first!
    let audioReader = try AVAssetReader(asset: asset)
    let outputSettings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 44100,
        AVNumberOfChannelsKey: 2,
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    let audioReaderOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: outputSettings)
    audioReader.add(audioReaderOutput)
    guard audioReader.startReading() else {
        throw NSError(domain: "AudioReaderError", code: 0, userInfo: [NSLocalizedDescriptionKey: "Failed to start reading audio"])
    }
    let baseInsertionTime = currentTime.convertScale(duration.timescale, method: .default) // Capture the current video time when the method is called
    print("Adding audio clip at \(baseInsertionTime.seconds) seconds, duration: \(duration.seconds) seconds")
    var audioTime = CMTime.zero
    var totalDuration: Double = 0
    while let sampleBuffer = audioReaderOutput.copyNextSampleBuffer() {
        let bufferDuration = CMSampleBufferGetDuration(sampleBuffer)
        let adjustedBuffer = adjustTimeStamp(of: sampleBuffer, by: baseInsertionTime)
        while !audioInput!.isReadyForMoreMediaData {
            try await Task.sleep(nanoseconds: 100_000_000) // 0.1 second
        }
        audioInput!.append(adjustedBuffer)
        print("  Adjusted time: \(adjustedBuffer.presentationTimeStamp.seconds)")
        audioTime = CMTimeAdd(audioTime, bufferDuration)
        totalDuration += bufferDuration.seconds
    }
    print("Finished adding audio clip. Last sample at: \(CMTimeAdd(baseInsertionTime, audioTime).seconds) seconds")
    print("  totalDuration=\(totalDuration)")
}

private func adjustTimeStamp(of sampleBuffer: CMSampleBuffer, by timeOffset: CMTime) -> CMSampleBuffer {
    var count: CMItemCount = 0
    CMSampleBufferGetSampleTimingInfoArray(sampleBuffer, entryCount: 0, arrayToFill: nil, entriesNeededOut: &count)
    var timingInfo = [CMSampleTimingInfo](repeating: CMSampleTimingInfo(), count: count)
    CMSampleBufferGetSampleTimingInfoArray(sampleBuffer, entryCount: count, arrayToFill: &timingInfo, entriesNeededOut: nil)
    for i in 0..<count {
        timingInfo[i].presentationTimeStamp = CMTimeAdd(timingInfo[i].presentationTimeStamp, timeOffset)
        if timingInfo[i].decodeTimeStamp != .invalid {
            timingInfo[i].decodeTimeStamp = CMTimeAdd(timingInfo[i].decodeTimeStamp, timeOffset)
        } else {
            timingInfo[i].decodeTimeStamp = timingInfo[i].presentationTimeStamp
        }
    }
    var adjustedBuffer: CMSampleBuffer?
    CMSampleBufferCreateCopyWithNewTiming(allocator: nil, sampleBuffer: sampleBuffer, sampleTimingEntryCount: count, sampleTimingArray: &timingInfo, sampleBufferOut: &adjustedBuffer)
    return adjustedBuffer!
}
The app needs to play remote videos. Sometimes it takes a very long time (~10 seconds) to load the media and start playback with AVPlayer.
So I use a timer to check, and try to play the next video if loading takes more than 5 seconds:
AVURLAsset *asset = [[AVURLAsset alloc] initWithURL:videoUrl options:nil]; // line 1
NSArray *keys = @[@"playable"];
mediaLoaded = NO;
[asset loadValuesAsynchronouslyForKeys:keys completionHandler:^() { // line 2
    mediaLoaded = YES; // line 4
    dispatch_async(dispatch_get_main_queue(), ^{
        [self.player replaceCurrentItemWithPlayerItem:[AVPlayerItem playerItemWithAsset:asset]];
        [self.player playImmediatelyAtRate:playSpeed];
    });
}];
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
    if (!mediaLoaded) {
        [self playNextVideo]; // line 3
    }
});
So the flow is: line 1 (of video 1), line 2 (of video 1), line 3 (if video 1 is still not playing after 5 seconds), line 1 (of video 2), and so on.
Now the problem: line 2 seems to block the next iteration. Line 2 (for video 2) is only executed after line 4 for video 1, i.e. its completion handler, has run (about 10 seconds later).
Can anybody give any insight?
Thx!
We have observed consistent glitches in video playback when using AVPlayer to stream HLS (HTTP Live Streaming) videos on iOS. The issue manifests as intermittent frame drops, stuttering, and playback instability during HLS streams. However, the same behavior is not present when playing MP4 videos using the same AVPlayer instance. The HLS streams being used follow standard encoding practices, and network conditions have been ruled out as a cause for this problem.
https://drive.google.com/file/d/1lhdpHTyjPYCYLHjzvb6ZF6P6jehIuwY0/view?usp=sharing
Steps to Reproduce:
1. Load an HLS video into AVPlayer and initiate playback.
2. Observe intermittent glitches and stuttering during video playback.
3. Load and play an MP4 video in the same AVPlayer instance.
4. Notice that MP4 playback is smooth without any glitches.
I am trying to play HLS videos in my iPhone app and I see frequent glitches in the video; this does not happen with MP4.
https://drive.google.com/file/d/1lhdpHTyjPYCYLHjzvb6ZF6P6jehIuwY0/view?usp=sharing.
Hi,
I'm recording videos frame by frame, and occasionally a sound plays (from an MP3 asset). I want to composite these sounds into the video at the correct timings, but this doesn't work. I'm really pulling my hair out here. I've tried everything, including adding the clips one after another and then inserting silence in between (allegedly this pushes subsequent clips back), but nothing works.
Here, _currentTime is the current time according to the video frames added, which are added at 20 Hz.
You can see I am adding silence long enough to cover the time from the end of the last audio clip to now, plus extra padding to contain the audio we are about to add. It doesn't matter if I remove this; it just doesn't work. Sometimes I can get two pieces of audio to play, but never a third, and usually only the first audio plays and then nothing after.
I'm completely stumped.
func addFrame(_ pixelBuffer: CVPixelBuffer) {
    guard CGSize(width: pixelBuffer.width, height: pixelBuffer.height) == _outputSize else { return }
    let frameTime = CMTimeMake(value: Int64(_frameCount), timescale: _frameRate)
    if _videoInput?.isReadyForMoreMediaData == true {
        _pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: frameTime)
        _frameCount += 1
        _currentTime = frameTime
    }
}

func addMP3AudioClip(_ audioData: Data) async throws {
    let tempURL = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString + ".mp3")
    try audioData.write(to: tempURL)
    let asset = AVAsset(url: tempURL)
    let duration = try await asset.load(.duration)
    let audioTrack = try await asset.loadTracks(withMediaType: .audio).first!
    let currentAudioTime = _currentTime.convertScale(duration.timescale, method: .default)
    _audioTrack?.insertEmptyTimeRange(CMTimeRangeFromTimeToTime(start: _lastAudioClipEndTime, end: currentAudioTime))
    _audioTrack?.insertEmptyTimeRange(CMTimeRangeFromTimeToTime(start: currentAudioTime, end: CMTimeAdd(currentAudioTime, duration)))
    let timeRange = CMTimeRangeMake(start: .zero, duration: duration)
    try _audioTrack?.insertTimeRange(timeRange, of: audioTrack, at: currentAudioTime)
    _lastAudioClipEndTime = CMTimeAdd(currentAudioTime, duration)
    try FileManager.default.removeItem(at: tempURL)
    _audioClipTimeRanges.append(CMTimeRangeMake(start: _currentTime, duration: duration))
}
Thank you,
-- B.