AVFoundation

Work with audiovisual assets, control device cameras, process audio, and configure system audio interactions using AVFoundation.

Posts under AVFoundation tag

200 Posts

Post · Replies · Boosts · Views · Activity

Optimizing YOLOv8 for Real-Time Object Detection in a Specific Screen Area
I’m working on real-time object detection using YOLOv8, but I only need to detect objects in approximately 40% of the screen area. Is it possible to limit the captureOutput(_:didOutput:from:) callback to focus solely on that specific region of the screen? If this isn’t feasible, I’m considering an approach where the full-screen pixel buffer is captured and then cropped to the target area before running detection. However, I’m concerned about how this might affect real-time performance. I’d appreciate any insights on how to maintain real-time performance or suggestions for better alternatives. Thank you!
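If the model is run through Vision/Core ML, the cheapest way to restrict detection to roughly 40% of the frame is usually to let Vision do the cropping by setting regionOfInterest on the request, rather than cropping pixel buffers yourself. A minimal sketch, assuming a compiled YOLOv8 Core ML model is available as yoloModel (the rectangle values are placeholders):

import AVFoundation
import CoreML
import Vision

// Sketch only: `yoloModel` stands in for your compiled Core ML YOLOv8 model,
// and the 40% region rectangle is a placeholder value.
final class ROIDetector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let request: VNCoreMLRequest

    init(yoloModel: MLModel) throws {
        let visionModel = try VNCoreMLModel(for: yoloModel)
        let request = VNCoreMLRequest(model: visionModel) { request, _ in
            let detections = request.results as? [VNRecognizedObjectObservation] ?? []
            // boundingBox values are reported relative to the region of interest,
            // so map them back to full-frame coordinates if you need absolute positions.
            _ = detections
        }
        // Vision crops and scales the frame for you: only this normalized rect
        // (lower-left origin) is fed to the model.
        request.regionOfInterest = CGRect(x: 0.3, y: 0.3, width: 0.4, height: 0.4)
        self.request = request
        super.init()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}

Because only the region of interest is scaled to the model's input size, this typically costs less than full-frame inference; manually cropping the CVPixelBuffer (for example via CIImage) also works but can add an extra copy per frame.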
2
0
279
Oct ’24
How to capture 48MP images with the Ultra Wide lens on iPhone 16 Pro Max
I am working on capturing 48MP images using the iPhone 16 Pro Max with the Ultra-wide camera. I’ve updated the code to capture the maximum supported dimensions with the following snippet:

if #available(iOS 16.0, *) {
    photoOutput.maxPhotoDimensions = device.activeFormat.supportedMaxPhotoDimensions.last!
    photoSettings.maxPhotoDimensions = .init(width: 5712, height: 4284)
}

However, I’m still not getting the expected results. My goal is to capture 48MP images, and I want to confirm if the Ultra-wide camera supports this resolution or if I’m missing any other configuration. Any guidance would be appreciated!
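Whether 48MP is available depends on the active format of the specific device, so before hard-coding 5712x4284 it is worth enumerating the ultra-wide camera's formats and picking one whose supportedMaxPhotoDimensions actually includes a 48MP size. A hedged sketch (the API names are standard AVFoundation; whether the ultra-wide module on a given model ever reports a 48MP dimension is something only the device can tell you):

import AVFoundation

/// Selects the format with the largest photo dimensions on the ultra-wide camera, if any.
@available(iOS 16.0, *)
func configureMaxResolutionUltraWide(photoOutput: AVCapturePhotoOutput) -> CMVideoDimensions? {
    guard let device = AVCaptureDevice.default(.builtInUltraWideCamera, for: .video, position: .back) else {
        return nil
    }
    // For each format, find its biggest supported photo size, then keep the overall biggest.
    let best = device.formats
        .compactMap { format -> (AVCaptureDevice.Format, CMVideoDimensions)? in
            guard let dims = format.supportedMaxPhotoDimensions.max(by: {
                Int($0.width) * Int($0.height) < Int($1.width) * Int($1.height)
            }) else { return nil }
            return (format, dims)
        }
        .max { Int($0.1.width) * Int($0.1.height) < Int($1.1.width) * Int($1.1.height) }

    guard let (format, dims) = best else { return nil }
    do {
        try device.lockForConfiguration()
        device.activeFormat = format
        device.unlockForConfiguration()
    } catch {
        return nil
    }
    // The output's limit must be raised before per-capture settings can request it.
    photoOutput.maxPhotoDimensions = dims
    print("Max photo dimensions for ultra-wide: \(dims.width)x\(dims.height)")
    return dims
}

Two caveats: each capture's photoSettings.maxPhotoDimensions must not exceed what the output advertises, and the session should not be using a preset that overrides activeFormat (using .inputPriority, or no preset, keeps your chosen format). If no ultra-wide format reports a 48MP dimension on your device, the sensor simply does not offer it through this API.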
1
2
293
Oct ’24
Voice recording cannot be enabled in iOS 17.2
AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0> F8BB1C28-BAE8-11D6-9C31-00039315CD46
AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50
from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
to: 2 ch, 16000 Hz, Int16, interleaved
HALSystem.cpp:2216 AudioObjectPropertiesChanged: no such object
AQMEIO_HAL.cpp:2552 timeout
AudioHardware-mac-imp.cpp:2706 AudioDeviceStop: no device with given ID
AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50
from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
to: 2 ch, 16000 Hz, Int16, interleaved
AudioQueueObject.cpp:6707 ConvertInput: aq@0x109994200: AudioConverterFillComplexBuffer returned -50, packetCount 5328

Environment: Xcode 15.2 (15C500b), iPhone 15 Pro, iOS 17.2 (Simulator), language: Swift.

On iOS 17.0 and above there are no recording issues in the Objective-C project, but here the iPhone and simulators can't start recording. Why?
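The repeated "AudioConverterNew returned -50" with a source format of 0 channels and 0 bits/channel usually means the converter is being handed an empty or unfilled input format, so the audio session setup and the recording format are the first things to rule out (on the simulator, also verify that macOS is giving it an input device). Purely as a hedged sketch, not the project's actual code, here is a recording path with the session activated and every format field spelled out explicitly:

import AVFoundation

// Assumes microphone permission has already been granted
// (AVAudioSession.sharedInstance().requestRecordPermission).
func startRecording() throws -> AVAudioRecorder {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker])
    try session.setActive(true)

    // Fully specified linear PCM settings so no field is left at 0.
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 16_000.0,
        AVNumberOfChannelsKey: 1,
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsBigEndianKey: false
    ]
    let url = FileManager.default.temporaryDirectory.appendingPathComponent("test.caf")
    let recorder = try AVAudioRecorder(url: url, settings: settings)
    _ = recorder.prepareToRecord()
    _ = recorder.record()
    return recorder
}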
3
1
266
Oct ’24
App crashes at launch on missing symbol AVPlayerView... except on first launch
I don't know what triggered this in a previously-running application I'm developing: When I have the build target set to "My Mac (designed for iPad)," I now must delete all the app's build materials under DerivedData to get the app to build and run exactly once. Cleaning isn't enough; I have to delete everything. On second launch, it will crash without even getting to the instantiation of the application class. None of my code executes.

Also: If I then set my iPhone as the build target, the app will build and run repeatedly. If I then return to "My Mac (designed for iPad)," the app will again launch once and then crash on every subsequent launch.

The crash is the same every time:

dyld[3875]: Symbol not found: _OBJC_CLASS_$_AVPlayerView
Referenced from: <D566512D-CAB4-3EA6-9B87-DBD15C6E71B3> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/Library/Debugger/libViewDebuggerSupport.dylib
Expected in: <4C34313C-03AD-32EB-8722-8A77C64AB959> /System/iOSSupport/System/Library/Frameworks/AVKit.framework/Versions/A/AVKit

Interestingly, I haven't found any similar online reports that mention this symbol. Has anyone seen this behavior before, where the crash only happens after the first run... and gets reset when you toggle the target type?
4
0
306
Oct ’24
Handling YOLOv8 Object Detection in 60FPS UltraWideCamera on iOS: Frame Processing Query
I am developing an iOS app that uses YOLOv8 for object detection and aims to detect objects at 60 FPS using the UltraWide camera. My goal is to process every frame within captureOutput and utilize the detected data (such as coordinates) for each one. I have a question regarding how background thread processing behaves in this scenario. Does the size of the YOLO model (n, s, m, etc.) or the weight of the operations inside captureOutput affect the number of frames that can be successfully processed? Specifically, I would like to know if all frames will be processed sequentially with a delay due to heavy processing in the background, or if some frames will be dropped and not processed at all. Any insights on how to handle this would be greatly appreciated. Thank you!
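To answer the dropped-versus-queued part concretely: with AVCaptureVideoDataOutput, frames are not buffered indefinitely while captureOutput is busy. When alwaysDiscardsLateVideoFrames is true (the default), late frames are discarded, and the discards surface in the didDrop delegate callback. A small sketch of the relevant wiring (the configuration values are illustrative):

import AVFoundation

final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let videoQueue = DispatchQueue(label: "camera.frames") // serial: frames arrive in order

    func attach(to session: AVCaptureSession) {
        let output = AVCaptureVideoDataOutput()
        // true (the default): frames that arrive while the delegate is still busy are discarded,
        // so heavy YOLO inference lowers the processed frame rate instead of building a backlog.
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: videoQueue)
        if session.canAddOutput(output) { session.addOutput(output) }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Run detection synchronously here; the next frame is only delivered after this returns.
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didDrop sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Called for frames the pipeline discarded because processing fell behind 60 fps.
    }
}

In other words, a larger model or heavier work inside captureOutput does not queue frames up with increasing delay; it lowers the delivered frame rate, and the skipped frames show up in didDrop.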
2
0
348
Oct ’24
Video Memory Leak when Backgrounding
While trying to control the following two scenes in one ImmersiveSpace, we found the following memory leak when we background the app while a stereoscopic video is playing.

ImmersiveView's two scenes:
- Scene 1 has one toggle button
- Scene 2 has the same toggle button plus a 180-degree skysphere playing a stereoscopic video

Attached are the files and images of the memory leak as captured in Xcode. To replicate this memory leak, follow these steps:
1. Create a new visionOS app using the Xcode template as illustrated below.
2. Configure the project to launch directly into an immersive space (set Preferred Default Scene Session Role to Immersive Space Application Session Role in Info.plist).
3. Replace all Swift files with those you will find in the attached texts.
4. In ImmersiveView, replace the stereoscopic video to play with a large 3D 180-degree video of your own bundled in your project.
5. Launch the app in debug mode via Xcode onto the AVP device or simulator.
6. Display the memory use by pressing Command+7 and selecting Memory in order to view the live memory graph.
7. Press the first immersive space's button "Open ImmersiveView".
8. Press the second immersive space's button "Show Immersive Video".
9. Background the app.
10. When the app tray appears, foreground the app by selecting it. The first immersive space should appear.
11. Repeat steps 7, 8, 9, and 10 multiple times.
12. Observe the memory use going up; the graph should look similar to the illustration below.

In ImmersiveView, upon backgrounding the app, I do:
- a reset method to clear the video's memory
- a dismiss of the Immersive Space containing the video (even though upon execution visionOS raises the purple warning "Unable to dismiss an Immersive Space since none is opened". It appears visionOS dismisses any ImmersiveSpace upon backgrounding, which makes sense.)

Am I not releasing the memory correctly? Or is there really a memory leak in either SwiftUI's ImmersiveSpace or AVFoundation's AVPlayer upon backgrounding the app?

Attached files: TestVideoLeakOneImmersiveView (app file), InitialImmersiveView (first ImmersiveSpace), ImmersiveView (second ImmersiveSpace), the skysphere model file, Immersive180VideoViewModel, and AppModel.
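For comparison while investigating, one pattern that reliably releases the decoder's buffers is to pause the player and drop its current item when the scene phase goes to background, independently of dismissing the immersive space. A hedged SwiftUI sketch, with player standing in for however the view model exposes the AVPlayer:

import SwiftUI
import AVFoundation

struct ImmersiveVideoContainer: View {
    @Environment(\.scenePhase) private var scenePhase
    let player: AVPlayer // however your view model exposes it

    var body: some View {
        Color.clear // placeholder for the actual immersive content
            .onChange(of: scenePhase) { _, newPhase in
                if newPhase == .background {
                    player.pause()
                    // Dropping the current item releases its decoder and sample buffers.
                    player.replaceCurrentItem(with: nil)
                }
            }
    }
}

If memory still climbs across repeated background/foreground cycles even with the item released, that points more toward the system frameworks than your code, and the memory graph you captured is exactly what a bug report needs.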
3
0
351
Oct ’24
App lost audio spatialization from VisionOS 2 Update
Hi, I have a video player app that lost its audio spatialization since the visionOS 2 update. I am using VideoPlayerComponent (https://developer.apple.com/documentation/realitykit/videoplayercomponent) to implement my videos as entities, as I want a custom look and controls for my player. In visionOS 1 there was automatic audio spatialization: depending on where my video entity was, the app automatically enabled head-tracked audio spatialization. Since visionOS 2, however, I cannot get my video entities to play spatial audio. I've looked into DestinationVideo and even set up AVAudioSessionSpatialExperience, but spatial audio is still not working. Appreciate any help. Thanks.
1
0
214
Oct ’24
Compatibility Between ARKit and Optical Zoom
Hello, I am a developer currently working on an AR application using ARKit. I aim to implement a Zoom feature that allows users to enlarge and reduce objects within the AR scene while simultaneously measuring the distance to those objects. Specifically, I want to incorporate Optical Zoom to provide a more natural and precise user experience. I have considered several approaches and would appreciate your advice on the most effective methods.

Approaches Being Considered:
- Using UIPinchGestureRecognizer to adjust the camera's field of view
- Modifying the scale property of SCNNode to enlarge/reduce specific objects
- Leveraging AVFoundation to control the camera's optical zoom

Questions:
1. Compatibility Between ARKit and Optical Zoom: Is it feasible to control the camera's optical zoom using AVFoundation while utilizing ARKit's features? What should be considered when integrating these two frameworks?
2. Integrating Object Distance Measurement with Zoom Functionality: What is the most effective approach to measure and display the distance to an object in real time when a user zooms in on it?
3. User Experience Considerations: Do you have any UI/UX design tips for implementing optical zoom to ensure a natural and intuitive experience? For example, how can visual feedback for zoom actions and distance measurements be effectively presented to users?
4. Performance Optimization: What optimization strategies can minimize potential performance issues when implementing both optical zoom and distance measurement features simultaneously?
5. Example Code and Reference Materials: Could you share any example code or reference materials that demonstrate similar functionalities?

Thank you.

Example Code Request: If possible, providing sample code that integrates optical zoom with distance measurement would be extremely helpful.
Reference Links: Please share any tutorials or resources that demonstrate the combined use of ARKit and AVFoundation.
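On the first question: since iOS 16, ARKit exposes the underlying capture device for limited configuration through ARWorldTrackingConfiguration.configurableCaptureDeviceForPrimaryCamera, which is the supported way to touch zoom while a session is running. Treat the following as a hedged sketch; availability and the exact zoom behavior under ARKit should be verified on device:

import ARKit
import AVFoundation

// Sketch only: requires iOS 16+ and a running world-tracking session whose
// primary camera ARKit exposes as configurable.
@available(iOS 16.0, *)
func setCameraZoom(_ factor: CGFloat) {
    guard let device = ARWorldTrackingConfiguration.configurableCaptureDeviceForPrimaryCamera else {
        print("Primary camera is not configurable on this device/configuration")
        return
    }
    do {
        try device.lockForConfiguration()
        // Clamp to what the hardware supports; large factors become digital zoom.
        device.videoZoomFactor = min(max(factor, device.minAvailableVideoZoomFactor),
                                     device.maxAvailableVideoZoomFactor)
        device.unlockForConfiguration()
    } catch {
        print("Could not lock capture device: \(error)")
    }
}

Note that changing the zoom factor changes the camera intrinsics ARKit sees, so verify that world tracking and your distance measurements stay stable; many apps instead scale the content (your SCNNode approach) and leave the physical camera untouched.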
1
0
226
Oct ’24
Raw point cloud access
Hi, I currently have Enterprise API access and have observed that the main camera API only provides RGB data. I am trying to access point cloud information from LIDAR, but it seems ARKit doesn't offer this directly via the standard APIs that iPad uses. I wanted to ask if there are any possible options to access depth data or enhanced camera capabilities using the Enterprise API. Specifically: Does having Enterprise API access unlock any additional camera-related APIs in AVFoundation that could provide depth information or more advanced control over the camera? Are there any workarounds or alternative methods to obtain depth data from the camera?
1
0
183
Oct ’24
Toggling AVMusicTrack isMuted
Hi! I have an AVAudioSequencer with some AVMusicTracks that are filled with AVParameterEvents. If I toggle the isMuted property of a track, it mutes instantly when changed to true. However, after setting isMuted back to false, the events only trigger on the next pass of the loop rather than immediately. Is this intended behaviour, and is there some way to get the events to trigger immediately after toggling isMuted back to false?
1
0
224
Oct ’24
AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0>
AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0> F8BB1C28-BAE8-11D6-9C31-00039315CD46
AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50
from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
to: 2 ch, 16000 Hz, Int16, interleaved
HALSystem.cpp:2216 AudioObjectPropertiesChanged: no such object
AQMEIO_HAL.cpp:2552 timeout
AudioHardware-mac-imp.cpp:2706 AudioDeviceStop: no device with given ID
AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50
from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
to: 2 ch, 16000 Hz, Int16, interleaved
AudioQueueObject.cpp:6707 ConvertInput: aq@0x109994200: AudioConverterFillComplexBuffer returned -50, packetCount 5328

Why can't I start recording? ...
3
0
249
Oct ’24
Writing video using AVAssetWriter, AVAssetReader, and AVSpeechSynthesizer
Hello,

First, some version and software details:
- Software: iOS 18.1
- Hardware: iPhone 14 Pro Max and later
- Xcode: 16.0

Summary: AVAssetReader is not concatenating a video at the beginning of the output video. The output video should contain a scene of me introducing the content, followed by a blue screen with AVSpeechSynthesizer reading out a text that I pasted above the "Generate Video" button.

Details: Now, let's talk about the app. Basically, I’m developing an app that generates a video with the following features:
- The app creates an output video that is split into an opening scene followed by a fully blue screen.
- The opening scene is taken from a video I choose from my gallery; I read it using AVAssetReader as usual.
- After the opening scene, I use the content of a text read by AVSpeechSynthesizer.write(): the synthesized audio starts playing while the blue screen is displayed.

All of this is already defined in the attached project. Each project file has a comment at the beginning introducing its content.

How to test:
1. Write something in the field above the "Generate Video" button. For example, type "Hello, world!"
2. Press the "Library" button and select a video from the gallery, about 30 seconds long.
3. That’s it. Press the "Generate Video" button.

The result I’ve experienced is a crash or failure to generate the video.

Practical example of what I want to achieve: Suppose I record a 30-second video where I say, "I’m going to tell you the story of Snow White." Then, I paste the "Snow White" story into the field above the "Generate Video" button. The output video should contain me saying, "I’m going to tell you the story of Snow White." After that, the AVSpeechSynthesizer will read the story I pasted, while displaying a blue screen.

I look forward to a solution. Thank you very much!

Attached files: convertToCMSampleBuffer.swift, convertToPixelBuffer.swift, createInputs.swift, createVideo.swift, test.swift, saveVideo.swift, TestApp.swift, editingVideo.swift, sampleReaderProvider.swift, misc.swift, sampleProvider.swift
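For the synthesized-speech half, the buffer-producing API is AVSpeechSynthesizer.write(_:toBufferCallback:), which hands you AVAudioPCMBuffers that can then be converted to CMSampleBuffers for the AVAssetWriterInput (presumably the job of the attached convertToCMSampleBuffer.swift). A minimal, hedged sketch of just the collection step:

import AVFoundation

final class SpeechRenderer {
    private let synthesizer = AVSpeechSynthesizer() // must stay alive for the whole synthesis

    /// Collects the PCM buffers produced for `text` and calls `completion` when synthesis ends.
    func render(_ text: String, completion: @escaping ([AVAudioPCMBuffer]) -> Void) {
        let utterance = AVSpeechUtterance(string: text)
        var buffers: [AVAudioPCMBuffer] = []
        synthesizer.write(utterance) { buffer in
            guard let pcm = buffer as? AVAudioPCMBuffer else { return }
            if pcm.frameLength == 0 {
                // An empty buffer is commonly delivered at the end of the utterance;
                // the delegate's didFinish callback is another way to detect completion.
                completion(buffers)
            } else {
                buffers.append(pcm)
            }
        }
    }
}

Each AVAudioPCMBuffer carries its own format, so the writer input's audio settings (or its sourceFormatHint) should be derived from buffer.format rather than hard-coded.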
8
0
551
1w
SoundRecognition causes Input/Output callbacks to have varying Buffer sizes and introduces Glitching
Hello,

We have noticed an issue with SoundRecognition that causes glitching with our AudioUnit setup in Smule.
- Input and output frame sizes are inconsistent.
- Input frame size does not match [AVAudioSession sharedInstance].IOBufferDuration

My best guess is that SoundRecognition influences the input frame size and not the output frame size.

To reproduce use the example app here: https://github.com/MarkoGill/SoundRecognitionBug

Hardware/OS:
- iPhone 14 Pro on iOS 18 -> Experiences the problem
- iPhone 11 on iOS 18 -> Experiences the problem
- iPhone 15 on iOS 18 -> Not experiencing the problem

Reproduction Steps:
1. Enable Sound Recognition (Settings > Accessibility > Sound Recognition > On)
2. Enable a Sound for detection (Sounds > Dog > On)
3. Open the example app with headset (it routes input to output)
4. Notice glitching occurs
5. Check the logs. Record and Playback buffer sizes vary

Example Log:
AU input sample rate: 48000.000000
AU output sample rate: 48000.000000
hardware sample rate: 48000.000000
hardware buffer size: 1104.000000
updated record frame counts: 1024
updated playback frame counts: 1104

Notes: You can disable Sound Recognition, restart the app, and playback behaves correctly.
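Whatever the trigger, render code is more robust if it never assumes the input and output callbacks agree on a frame count; the usual decoupling is a FIFO between them. The sketch below is deliberately simplified, using a lock and an Array, just to show the shape of the fix; a real audio thread would use a fixed-size, lock-free ring buffer instead:

import Foundation

/// Simplified mono sample FIFO. Illustrative only: real audio callbacks need a
/// lock-free ring buffer rather than NSLock + Array.
final class SampleFIFO {
    private var samples: [Float] = []
    private let lock = NSLock()

    // Called from the input callback with whatever frame count it was given.
    func write(_ newSamples: [Float]) {
        lock.lock(); defer { lock.unlock() }
        samples.append(contentsOf: newSamples)
    }

    // Called from the output callback with *its* frame count.
    func read(count: Int) -> [Float] {
        lock.lock(); defer { lock.unlock() }
        let available = min(count, samples.count)
        var out = Array(samples.prefix(available))
        samples.removeFirst(available)
        if available < count {
            // Underrun: pad with silence instead of glitching.
            out.append(contentsOf: repeatElement(Float(0), count: count - available))
        }
        return out
    }
}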
4
1
522
Oct ’24
Distorted Audio When Recording External Mics With AVCaptureSession and AVAssetWriter
I’m working on a macOS app, written in Swift. My goal is to record audio from an external microphone, e.g., one connected via USB. For this, I’m using an AVCaptureSession and recording its output with an AVAssetWriter. This works perfectly in principle (and reliably with internal microphones, for example). The problem occurs after my app has successfully completed the first recording and I then want to make additional recordings (which makes me think it might be process-dependent, because it works again after restarting the app). The problem: Noisy or distorted-sounding audio files. In addition, the following error message appears in the Console from CoreAudio / its AudioConverter: Input data proc returned inconsistent 512 packets for 2048 bytes; at 3 bytes per packet, that is actually 682 packets It is easy to reproduce. This problem is reproducible even if I don’t configure the AVAssetWriter manually and instead let it receive its audioSettings using a preset from an AVOutputSettingsAssistant. I’m running on macOS 15.0 (24A335). I’ve filed a feedback including a demo project → FB15333298 🎟️ I would greatly appreciate any help! Have a great day, Martin
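One avenue worth testing alongside the feedback: derive the writer input's expectations from the format the capture session actually delivers, by creating the audio input lazily from the first CMSampleBuffer and passing its format description as sourceFormatHint (nil outputSettings makes it a pass-through). A hedged sketch:

import AVFoundation

/// Creates the audio writer input once the first buffer from the external mic arrives,
/// so the channel count and sample format always match what the device actually delivers.
func makeAudioInput(from firstSampleBuffer: CMSampleBuffer) -> AVAssetWriterInput? {
    guard let formatDescription = CMSampleBufferGetFormatDescription(firstSampleBuffer) else { return nil }
    let input = AVAssetWriterInput(mediaType: .audio,
                                   outputSettings: nil,            // nil = pass the captured format through
                                   sourceFormatHint: formatDescription)
    input.expectsMediaDataInRealTime = true
    return input
}

If you need compressed output instead of pass-through, keep your settings or the AVOutputSettingsAssistant preset, but still pass the sourceFormatHint so the writer knows the true channel count and sample format of the external microphone.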
5
0
343
1w
AVAssetWriterInput -- inserting sample buffers with pauses in between not working
Hi, I'm trying to insert CMSampleBuffers into an AVAssetWriterInput that has been configured with expectsMediaDataInRealTime = false with pauses. That is, I insert fixed-length audio at specific (increasing and non-overlapping) time points with large gaps in between. E.g., 5 seconds of audio at t=3.0, 5 seconds of audio at t=12.0, etc. The first audio sample plays at t=3 in the final output video as expected. But then all the other samples are bunched up immediately after it instead of being scheduled at the correct time. Below is my code. I'm just loading the asset and then readjusting its timestamps to be correct in the absolute timeline. Why do they not get scheduled correctly when the timestamps and durations are definitely correct and non-overlapping?

func addFrame(_ pixelBuffer: CVPixelBuffer) {
    guard CGSize(width: pixelBuffer.width, height: pixelBuffer.height) == outputSize else { return }
    let frameTime = CMTimeMake(value: frameCount, timescale: frameRate)
    if videoInput?.isReadyForMoreMediaData == true {
        pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: frameTime)
        frameCount += 1
        currentTime = frameTime
    }
}

func addMP3AudioClip(_ audioData: Data) async throws {
    let tempURL = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString + ".mp3")
    defer { try? FileManager.default.removeItem(at: tempURL) }
    try audioData.write(to: tempURL)

    let asset = AVAsset(url: tempURL)
    let duration = try await asset.load(.duration)
    let audioTrack = try await asset.loadTracks(withMediaType: .audio).first!

    let audioReader = try AVAssetReader(asset: asset)
    let outputSettings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 44100,
        AVNumberOfChannelsKey: 2,
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    let audioReaderOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: outputSettings)
    audioReader.add(audioReaderOutput)

    guard audioReader.startReading() else {
        throw NSError(domain: "AudioReaderError", code: 0, userInfo: [NSLocalizedDescriptionKey: "Failed to start reading audio"])
    }

    let baseInsertionTime = currentTime.convertScale(duration.timescale, method: .default) // Capture the current video time when the method is called
    print("Adding audio clip at \(baseInsertionTime.seconds) seconds, duration: \(duration.seconds) seconds")

    var audioTime = CMTime.zero
    var totalDuration: Double = 0

    while let sampleBuffer = audioReaderOutput.copyNextSampleBuffer() {
        let bufferDuration = CMSampleBufferGetDuration(sampleBuffer)
        let adjustedBuffer = adjustTimeStamp(of: sampleBuffer, by: baseInsertionTime)

        while !audioInput!.isReadyForMoreMediaData {
            try await Task.sleep(nanoseconds: 100_000_000) // 0.1 second
        }

        audioInput!.append(adjustedBuffer)
        print("  Adjusted time: \(adjustedBuffer.presentationTimeStamp.seconds)")

        audioTime = CMTimeAdd(audioTime, bufferDuration)
        totalDuration += bufferDuration.seconds
    }

    print("Finished adding audio clip. Last sample at: \(CMTimeAdd(baseInsertionTime, audioTime).seconds) seconds")
    print("  totalDuration=\(totalDuration)")
}

private func adjustTimeStamp(of sampleBuffer: CMSampleBuffer, by timeOffset: CMTime) -> CMSampleBuffer {
    var count: CMItemCount = 0
    CMSampleBufferGetSampleTimingInfoArray(sampleBuffer, entryCount: 0, arrayToFill: nil, entriesNeededOut: &count)

    var timingInfo = [CMSampleTimingInfo](repeating: CMSampleTimingInfo(), count: count)
    CMSampleBufferGetSampleTimingInfoArray(sampleBuffer, entryCount: count, arrayToFill: &timingInfo, entriesNeededOut: nil)

    for i in 0..<count {
        timingInfo[i].presentationTimeStamp = CMTimeAdd(timingInfo[i].presentationTimeStamp, timeOffset)
        if timingInfo[i].decodeTimeStamp != .invalid {
            timingInfo[i].decodeTimeStamp = CMTimeAdd(timingInfo[i].decodeTimeStamp, timeOffset)
        } else {
            timingInfo[i].decodeTimeStamp = timingInfo[i].presentationTimeStamp
        }
    }

    var adjustedBuffer: CMSampleBuffer?
    CMSampleBufferCreateCopyWithNewTiming(allocator: nil, sampleBuffer: sampleBuffer, sampleTimingEntryCount: count, sampleTimingArray: &timingInfo, sampleBufferOut: &adjustedBuffer)
    return adjustedBuffer!
}
0
0
227
Oct ’24
Cancel or quit from loadValuesAsynchronouslyForKeys?
The app needs to play remote videos. Sometimes it takes a very long time (~10 seconds) to load the media and play with AVPlayer, so I use a timer to check and try to play the next video if loading takes over 5 seconds:

AVURLAsset *asset = [[AVURLAsset alloc] initWithURL:videoUrl options:nil]; // line 1
NSArray *keys = @[@"playable"];
mediaLoaded = NO;
[asset loadValuesAsynchronouslyForKeys:keys completionHandler:^() { // line 2
    mediaLoaded = YES; // line 4
    dispatch_async(dispatch_get_main_queue(), ^{
        [self.player replaceCurrentItemWithPlayerItem:[AVPlayerItem playerItemWithAsset:asset]];
        [self.player playImmediatelyAtRate:playSpeed];
    });
}];
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
    if (!mediaLoaded) {
        [self playNextVideo]; // line 3
    }
});

So the flow is: line 1 (of video 1) - line 2 (of video 1) - line 3 (if over 5 seconds and video 1 is not playing) - line 1 (of video 2) - ... Now the problem is that line 2 seems to block line 1: only once line 4 (for video 1, after ~10 seconds) or the completion handler has executed will line 2 (for video 2) be executed. Can anybody give any insight? Thanks!
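On the question in the title: AVAsset provides cancelLoading(), which aborts outstanding loadValuesAsynchronouslyForKeys requests (their key status then reports as cancelled), so the 5-second timeout can cancel video 1's load before moving to video 2 rather than leaving it pending. A Swift sketch of that shape (the same calls exist in Objective-C):

import AVFoundation

func preparePlayback(of url: URL, player: AVPlayer, timeout: TimeInterval = 5,
                     onTimeout: @escaping () -> Void) {
    let asset = AVURLAsset(url: url)
    asset.loadValuesAsynchronously(forKeys: ["playable"]) {
        var error: NSError?
        guard asset.statusOfValue(forKey: "playable", error: &error) == .loaded else {
            return // .cancelled or .failed: we already moved on
        }
        DispatchQueue.main.async {
            player.replaceCurrentItem(with: AVPlayerItem(asset: asset))
            player.play()
        }
    }
    DispatchQueue.main.asyncAfter(deadline: .now() + timeout) {
        if asset.statusOfValue(forKey: "playable", error: nil) != .loaded {
            asset.cancelLoading()   // abort the slow load so it can't fire later
            onTimeout()             // e.g. play the next video
        }
    }
}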
1
0
203
Oct ’24
Glitch in AVPlayer while playing HLS videos
We have observed consistent glitches in video playback when using AVPlayer to stream HLS (HTTP Live Streaming) videos on iOS. The issue manifests as intermittent frame drops, stuttering, and playback instability during HLS streams. However, the same behavior is not present when playing MP4 videos using the same AVPlayer instance. The HLS streams being used follow standard encoding practices, and network conditions have been ruled out as a cause for this problem.

https://drive.google.com/file/d/1lhdpHTyjPYCYLHjzvb6ZF6P6jehIuwY0/view?usp=sharing

Steps to Reproduce:
1. Load an HLS video into AVPlayer and initiate playback.
2. Observe intermittent glitches and stuttering during video playback.
3. Load and play an MP4 video in the same AVPlayer instance.
4. Notice that MP4 playback is smooth without any glitches.
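To narrow down whether the stutter comes from dropped frames, stalls, or bitrate switching, AVPlayerItem's access and error logs are the quickest instrument when comparing the HLS and MP4 cases. A small diagnostic sketch (the API calls are standard AVFoundation; the print format is arbitrary):

import AVFoundation

/// Attaches diagnostic observers to a player item; keep the returned tokens alive
/// for as long as you want the logging, then remove them from NotificationCenter.
func logHLSHealth(for item: AVPlayerItem) -> [NSObjectProtocol] {
    let access = NotificationCenter.default.addObserver(forName: .AVPlayerItemNewAccessLogEntry,
                                                        object: item, queue: .main) { _ in
        guard let event = item.accessLog()?.events.last else { return }
        // Dropped frames, stalls, and bitrate switches are the usual culprits behind visible stutter.
        print("bitrate:", event.indicatedBitrate,
              "droppedFrames:", event.numberOfDroppedVideoFrames,
              "stalls:", event.numberOfStalls)
    }
    let errors = NotificationCenter.default.addObserver(forName: .AVPlayerItemNewErrorLogEntry,
                                                        object: item, queue: .main) { _ in
        if let event = item.errorLog()?.events.last {
            print("HLS error:", event.errorStatusCode, event.errorComment ?? "")
        }
    }
    return [access, errors]
}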
2
0
258
Oct ’24
How to insert multiple AVAssets into AVMutableCompositionTrack with silence in between?
Hi, I'm recording videos frame by frame and occasionally a sound plays (from an MP3 asset). I want to composite these sounds into the video at the correct timings. But this doesn't work. Really pulling my hair out here. I've tried everything, including adding one after another and then inserting silence in between (allegedly this pushes subsequent clips back), but nothing works. Here, _currentTime is the current time according to the video frames added, which are added at 20 Hz. You can see I am adding silence long enough to cover the time from the end of the last audio clip to now, plus extra padding to contain the audio we are about to add. Doesn't matter if I remove this, it just doesn't work. Sometimes I can get two pieces of audio to play but never a third, and usually only the first audio plays and then nothing after. I'm completely stumped.

func addFrame(_ pixelBuffer: CVPixelBuffer) {
    guard CGSize(width: pixelBuffer.width, height: pixelBuffer.height) == _outputSize else { return }
    let frameTime = CMTimeMake(value: Int64(_frameCount), timescale: _frameRate)
    if _videoInput?.isReadyForMoreMediaData == true {
        _pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: frameTime)
        _frameCount += 1
        _currentTime = frameTime
    }
}

func addMP3AudioClip(_ audioData: Data) async throws {
    let tempURL = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString + ".mp3")
    try audioData.write(to: tempURL)

    let asset = AVAsset(url: tempURL)
    let duration = try await asset.load(.duration)
    let audioTrack = try await asset.loadTracks(withMediaType: .audio).first!

    let currentAudioTime = _currentTime.convertScale(duration.timescale, method: .default)

    _audioTrack?.insertEmptyTimeRange(CMTimeRangeFromTimeToTime(start: _lastAudioClipEndTime, end: currentAudioTime))
    _audioTrack?.insertEmptyTimeRange(CMTimeRangeFromTimeToTime(start: currentAudioTime, end: CMTimeAdd(currentAudioTime, duration)))

    let timeRange = CMTimeRangeMake(start: .zero, duration: duration)
    try _audioTrack?.insertTimeRange(timeRange, of: audioTrack, at: currentAudioTime)

    _lastAudioClipEndTime = CMTimeAdd(currentAudioTime, duration)

    try FileManager.default.removeItem(at: tempURL)

    _audioClipTimeRanges.append(CMTimeRangeMake(start: _currentTime, duration: duration))
}

Thank you,
-- B.
0
0
193
Oct ’24