

Work with audiovisual assets, control device cameras, process audio, and configure system audio interactions using AVFoundation.

Posts under AVFoundation tag

200 Posts

"AVSpeechSynthesisVoice" choppy at start.
So, I'm trying to create my own text-to-speech setup. The problem is that whenever I do a test run, the speech is a bit choppy at the start, skipping over a word or a few characters. A few details: I've built a separate class for handling the speech events. AVSpeechSynthesizer is a private property of that class, so I don't expect deallocation to be the issue, especially since the problem occurs at the start. I've also got a queue set up, for what it's worth, so that shouldn't be a problem. I'd appreciate any advice.
Oct ’24
Some problems using AVCaptureControl
I add 3 controls to the AVCaptureSession and then remove them all. The number of controls in the session is indeed 0, but the Camera Control button still shows the previous 3 controls. Going from 3 to 2 or from 3 to 1 controls updates normally, and 0 to 3 is OK, but 3 to 0 is not.

    if (self.captureControl.zoom) {
        if (self.zoomScaleControl) {
            self.zoomScaleControl.enabled = false;
            [_session removeControl:self.zoomScaleControl];
        }
        AVCaptureSlider *zoomSlider = [self.captureControl.zoom fetchCaptureSlider];
        [zoomSlider setActionQueue:dispatch_get_main_queue() action:^(float zoomFactor) {
            @strongify(self);
            if ([self.dataOutputDelegate respondsToSelector:@selector(videoCaptureSession:tryChangeZoomScale:)]) {
                [self.dataOutputDelegate videoCaptureSession:self tryChangeZoomScale:zoomFactor];
            }
        }];
        self.zoomScaleControl = zoomSlider;
    } else {
        if (self.zoomScaleControl) {
            self.zoomScaleControl.enabled = false;
            [_session removeControl:self.zoomScaleControl];
        }
        self.zoomScaleControl = nil;
    }

    if (self.captureControl.exposure) {
        if (self.exposureBiasControl) {
            self.exposureBiasControl.enabled = false;
            [_session removeControl:self.exposureBiasControl];
        }
        AVCaptureSlider *exposureSlider = [self.captureControl.exposure fetchCaptureSlider];
        [exposureSlider setActionQueue:dispatch_get_main_queue() action:^(float bias) {
            @strongify(self);
            if ([self.dataOutputDelegate respondsToSelector:@selector(videoCaptureSession:tryChangeExposureBias:)]) {
                [self.dataOutputDelegate videoCaptureSession:self tryChangeExposureBias:bias];
            }
        }];
        self.exposureBiasControl = exposureSlider;
    } else {
        if (self.exposureBiasControl) {
            self.exposureBiasControl.enabled = false;
            [_session removeControl:self.exposureBiasControl];
        }
        self.exposureBiasControl = nil;
    }

    if (self.captureControl.len) {
        if (self.lenControl) {
            self.lenControl.enabled = false;
            [_session removeControl:self.lenControl];
        }
        ORLenCaptureControlCustomModel *len = self.captureControl.len;
        AVCaptureIndexPicker *picker = [len fetchCaptureSlider];
        [picker setActionQueue:dispatch_get_main_queue() action:^(NSInteger selectedIndex) {
            @strongify(self);
            if ([self.dataOutputDelegate respondsToSelector:@selector(videoCaptureSession:didChangeLenIndex:datas:)]) {
                [self.dataOutputDelegate videoCaptureSession:self didChangeLenIndex:selectedIndex datas:self.captureControl.len.indexDatas];
            }
        }];
        self.lenControl = picker;
    } else {
        if (self.lenControl) {
            self.lenControl.enabled = false;
            [_session removeControl:self.lenControl];
        }
        self.lenControl = nil;
    }

    // Re-add whichever controls are still configured.
    if ([_session canAddControl:self.zoomScaleControl]) {
        [_session addControl:self.zoomScaleControl];
    } else {
        self.zoomScaleControl = nil;
    }
    if ([_session canAddControl:self.lenControl]) {
        [_session addControl:self.lenControl];
    } else {
        self.lenControl = nil;
    }
    if ([_session canAddControl:self.exposureBiasControl]) {
        [_session addControl:self.exposureBiasControl];
    } else {
        self.exposureBiasControl = nil;
    }
    if (_session.controlsDelegate == nil) {
        [_session setControlsDelegate:self queue:GetCaptureControlQueue()];
    }
Oct ’24
AVAudioFile.processingFormat, only Float32 is allowed?
Here is some code I have to create an AVAudioFile instance based on Int16 samples.

    let format = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 44100.0, channels: 2, interleaved: false)!
    let audioFile = try AVAudioFile(forWriting: outputURL, settings: format.settings)

When writing to the file I get the following runtime error, presumably from CoreAudio.

    CABufferList.h:184 ASSERTION FAILURE [(nBytes <= buf->mDataByteSize) != 0 is false]:

I read this as a size mismatch between what is specified in the format used to create the file and the file's own internal processingFormat property, which is read-only. Here is my debugger console output showing the input format I created, along with the resulting AVAudioFile fileFormat and processingFormat properties.

    (lldb) po format
    <AVAudioFormat 0x300e553b0: 2 ch, 44100 Hz, Int16, deinterleaved>
    (lldb) po format.settings
    ▿ 7 elements
      ▿ 0 : 2 elements
        - key : "AVNumberOfChannelsKey"
        - value : 2
      ▿ 1 : 2 elements
        - key : "AVLinearPCMBitDepthKey"
        - value : 16
      ▿ 2 : 2 elements
        - key : "AVFormatIDKey"
        - value : 1819304813
      ▿ 3 : 2 elements
        - key : "AVLinearPCMIsNonInterleaved"
        - value : 1
      ▿ 4 : 2 elements
        - key : "AVLinearPCMIsBigEndianKey"
        - value : 0
      ▿ 5 : 2 elements
        - key : "AVLinearPCMIsFloatKey"
        - value : 0
      ▿ 6 : 2 elements
        - key : "AVSampleRateKey"
        - value : 44100
    (lldb) po audioFile.fileFormat
    <AVAudioFormat 0x300ea5400: 2 ch, 44100 Hz, Int16, interleaved>
    (lldb) po audioFile.processingFormat
    <AVAudioFormat 0x300ea5450: 2 ch, 44100 Hz, Float32, deinterleaved>

Please note that the input format I'm using does not match either the audio file fileFormat or processingFormat properties. The file format is interleaved even though I specified de-interleaved. This makes sense to me as working with audio files that are growing is much easier and more efficient with interleaved data. The head-scratcher is the processingFormat. I specified Int16 samples and it is expecting Float32? According to the format settings dictionary, we are specifying the correct key/value pairs. Is this expected behavior? Does Apple always insist on Float32 internally or is this a bug?
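One direction that may be worth checking, offered as a sketch and not a confirmed answer: AVAudioFile has a second write initializer, init(forWriting:settings:commonFormat:interleaved:), whose last two arguments describe the processing format. The two-argument initializer in the post appears to fall back to deinterleaved Float32, which would match the console output. The function names and the destination URL below are hypothetical.

```swift
import AVFoundation

/// Creates a 16-bit PCM file whose processing format is also Int16.
/// `url` is a caller-supplied destination (hypothetical in this sketch).
func makeInt16AudioFile(at url: URL) throws -> AVAudioFile {
    // On-disk format: 16-bit integer linear PCM, 44.1 kHz, stereo.
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 44_100.0,
        AVNumberOfChannelsKey: 2,
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsBigEndianKey: false
    ]
    // commonFormat and interleaved describe the buffers passed to write(from:).
    return try AVAudioFile(forWriting: url,
                           settings: settings,
                           commonFormat: .pcmFormatInt16,
                           interleaved: false)
}

/// Writes `frameCount` frames using a buffer in the file's processing format.
func writeFrames(_ frameCount: AVAudioFrameCount, to file: AVAudioFile) throws {
    guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: frameCount) else { return }
    // Real code would fill buffer.int16ChannelData before marking frames valid.
    buffer.frameLength = frameCount
    try file.write(from: buffer)
}
```

The key point of the sketch is that buffers handed to write(from:) are allocated from file.processingFormat, so the buffer layout and the file's expectation cannot drift apart.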
Oct ’24
Error connecting AudioEngine with AudioPlayerNode using AVAudioPCMFormatInt16
Hi community, I'm trying to set up an AVAudioFormat with AVAudioPCMFormatInt16, but I get an error:

    AVAEInternal.h:125 [AUInterface.mm:539:SetFormat: ([[busArray objectAtIndexedSubscript:(NSUInteger)element] setFormat:format error:&nsErr])] returned false, error Error Domain=NSOSStatusErrorDomain Code=-10868 "(null)"

If I understand error code -10868 correctly, the format is not supported. But how can I use the PCM Int16 format? Here is my method:

    - (void)setupAudioDecoder:(double)sampleRate audioChannels:(double)audioChannels {
        if (self.isRunning) {
            return;
        }
        self.audioEngine = [[AVAudioEngine alloc] init];
        self.audioPlayerNode = [[AVAudioPlayerNode alloc] init];
        [self.audioEngine attachNode:self.audioPlayerNode];
        AVAudioChannelCount channelCount = (AVAudioChannelCount)audioChannels;
        self.audioFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatInt16 sampleRate:sampleRate channels:channelCount interleaved:YES];
        NSLog(@"Audio Format: %@", self.audioFormat);
        NSLog(@"Audio Player Node: %@", self.audioPlayerNode);
        NSLog(@"Audio Engine: %@", self.audioEngine);
        // Error on this line
        [self.audioEngine connect:self.audioPlayerNode to:self.audioEngine.mainMixerNode format:self.audioFormat];
        /**NSError *error = nil;
        if (![self.audioEngine startAndReturnError:&error]) {
            NSLog(@"Error initializing the audio engine: %@", error);
            return;
        }
        [self.audioPlayerNode play];
        self.isRunning = YES;*/
    }

Also, I see that the audio engine does not seem to be running:

    Audio Engine: ________ GraphDescription ________ AVAudioEngineGraph 0x600003d55fe0: initialized = 0, running = 0, number of nodes = 1

Has anyone already used this format with AVAudioFormat? Thank you!
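A possible workaround, sketched in Swift under the assumption that the mixer connection only accepts the engine's standard deinterleaved Float32 format (-10868 is kAudioUnitErr_FormatNotSupported): connect the player node with a Float32 format and convert each incoming Int16 buffer with AVAudioConverter before scheduling it. The helper name makeFloatBuffer is hypothetical.

```swift
import AVFoundation

/// Converts a PCM buffer (for example interleaved Int16) to deinterleaved
/// Float32 at the same sample rate and channel count.
/// Returns nil if the converter or output buffer cannot be created.
func makeFloatBuffer(from input: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    let source = input.format
    guard let target = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: source.sampleRate,
                                     channels: source.channelCount,
                                     interleaved: false),
          let converter = AVAudioConverter(from: source, to: target),
          let output = AVAudioPCMBuffer(pcmFormat: target,
                                        frameCapacity: input.frameLength)
    else { return nil }

    do {
        // Simple buffer-to-buffer conversion; valid because the sample rate is unchanged.
        try converter.convert(to: output, from: input)
        return output
    } catch {
        return nil
    }
}
```

With this approach the node would be connected using the converted buffer's format (after attachNode), and each converted buffer passed to the player's scheduleBuffer call. Whether that suits a decoder pipeline depends on how the Int16 data arrives, which the post does not show.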
Oct ’24
Optimizing YOLOv8 for Real-Time Object Detection in a Specific Screen Area
I’m working on real-time object detection using YOLOv8, but I only need to detect objects in approximately 40% of the screen area. Is it possible to limit the captureOutput method to focus solely on that specific region of the screen? If this isn’t feasible, I’m considering an approach where the full-screen pixel buffer is captured and then cropped to the target area before running detection. However, I’m concerned about how this might affect real-time performance. I’d appreciate any insights on how to maintain real-time performance or suggestions for better alternatives. Thank you!
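If the model is run through Vision (an assumption; the post does not say how YOLOv8 is invoked), image-based requests expose a regionOfInterest property that takes a normalized rectangle with a lower-left origin, which avoids a manual crop. The sketch below only shows the coordinate conversion; the function name and the 1920x1080 frame size are illustrative.

```swift
import Foundation

/// Converts a rectangle in pixel coordinates with a top-left origin into a
/// normalized rectangle with a lower-left origin, the convention used by
/// Vision's `regionOfInterest`.
func normalizedRegionOfInterest(for pixelRect: CGRect, imageSize: CGSize) -> CGRect {
    let x = pixelRect.minX / imageSize.width
    let width = pixelRect.width / imageSize.width
    let height = pixelRect.height / imageSize.height
    // Flip the vertical axis: top-left origin -> lower-left origin.
    let y = 1.0 - (pixelRect.minY / imageSize.height) - height
    // Clamp to the unit square so the result is always a valid region.
    let unit = CGRect(x: 0, y: 0, width: 1, height: 1)
    return CGRect(x: x, y: y, width: width, height: height).intersection(unit)
}

// Example: the top 40% of a 1920x1080 frame.
let roi = normalizedRegionOfInterest(
    for: CGRect(x: 0, y: 0, width: 1920, height: 432),
    imageSize: CGSize(width: 1920, height: 1080))
print(roi)
```

The resulting rectangle would be assigned to the request's regionOfInterest before performing it. Whether this is faster than cropping the pixel buffer by hand would need measuring on the target device.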
Oct ’24
How to capture 48MP photos with the Ultra Wide lens using iPhone 16 Pro Max
I am working on capturing 48MP images using the iPhone 16 Pro Max with the Ultra Wide camera. I’ve updated the code to capture the maximum supported dimensions with the following snippet:

    if #available(iOS 16.0, *) {
        photoOutput.maxPhotoDimensions = device.activeFormat.supportedMaxPhotoDimensions.last!
        photoSettings.maxPhotoDimensions = .init(width: 5712, height: 4284)
    }

However, I’m still not getting the expected results. My goal is to capture 48MP images, and I want to confirm whether the Ultra Wide camera supports this resolution or whether I’m missing some other configuration. Any guidance would be appreciated!
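One thing the snippet above can be checked for with plain arithmetic: 5712 x 4284 is roughly 24.5 megapixels, so the photo settings as written request about half of 48MP regardless of what the output allows. The sketch below computes that and picks the largest entry from a list such as activeFormat.supportedMaxPhotoDimensions; the helper names are hypothetical, and whether the active format on a given device lists a 48MP entry is something to inspect at runtime.

```swift
import CoreMedia

/// Megapixel count for a set of photo dimensions.
func megapixels(_ d: CMVideoDimensions) -> Double {
    Double(d.width) * Double(d.height) / 1_000_000
}

/// Largest entry by pixel area, e.g. from `activeFormat.supportedMaxPhotoDimensions`.
func largest(_ list: [CMVideoDimensions]) -> CMVideoDimensions? {
    list.max { megapixels($0) < megapixels($1) }
}

// The dimensions hard-coded in the post.
let requested = CMVideoDimensions(width: 5712, height: 4284)
print(megapixels(requested))  // about 24.5, not 48
```

Using the same selected value for both photoOutput.maxPhotoDimensions and photoSettings.maxPhotoDimensions would keep the two in agreement.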
Oct ’24
Voice recording cannot be enabled in iOS 17.2

    AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0> F8BB1C28-BAE8-11D6-9C31-00039315CD46
    AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50 from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to: 2 ch, 16000 Hz, Int16, interleaved
    HALSystem.cpp:2216 AudioObjectPropertiesChanged: no such object
    AQMEIO_HAL.cpp:2552 timeout
    AudioHardware-mac-imp.cpp:2706 AudioDeviceStop: no device with given ID
    AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50 from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to: 2 ch, 16000 Hz, Int16, interleaved
    AudioQueueObject.cpp:6707 ConvertInput: aq@0x109994200: AudioConverterFillComplexBuffer returned -50, packetCount 5328

Xcode version: 15.2 (15C500b)
Device: iPhone 15 Pro, version 17.2 (Simulator)
Language: Swift

On version 17.0 and above, the Objective-C project has no recording issues, but iPhone and simulators can't start recording. Why?
Oct ’24
App crashes at launch on missing symbol AVPlayerView... except on first launch
I don't know what triggered this in a previously-running application I'm developing: When I have the build target set to "My Mac (designed for iPad)," I now must delete all the app's build materials under DerivedData to get the app to build and run exactly once. Cleaning isn't enough; I have to delete everything. On second launch, it will crash without even getting to the instantiation of the application class. None of my code executes.

Also: If I then set my iPhone as the build target, the app will build and run repeatedly. If I then return to "My Mac (designed for iPad)," the app will again launch once and then crash on every subsequent launch. The crash is the same every time:

    dyld[3875]: Symbol not found: _OBJC_CLASS_$_AVPlayerView
      Referenced from: <D566512D-CAB4-3EA6-9B87-DBD15C6E71B3> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/Library/Debugger/libViewDebuggerSupport.dylib
      Expected in: <4C34313C-03AD-32EB-8722-8A77C64AB959> /System/iOSSupport/System/Library/Frameworks/AVKit.framework/Versions/A/AVKit

Interestingly, I haven't found any similar online reports that mention this symbol. Has anyone seen this behavior before, where the crash only happens after the first run... and gets reset when you toggle the target type?
Nov ’24
Handling YOLOv8 Object Detection in 60FPS UltraWideCamera on iOS: Frame Processing Query
I am developing an iOS app that uses YOLOv8 for object detection and aims to detect objects at 60 FPS using the UltraWide camera. My goal is to process every frame within captureOutput and utilize the detected data (such as coordinates) for each one. I have a question regarding how background thread processing behaves in this scenario. Does the size of the YOLO model (n, s, m, etc.) or the weight of the operations inside captureOutput affect the number of frames that can be successfully processed? Specifically, I would like to know if all frames will be processed sequentially with a delay due to heavy processing in the background, or if some frames will be dropped and not processed at all. Any insights on how to handle this would be greatly appreciated. Thank you!
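On the dropped-versus-delayed question: AVCaptureVideoDataOutput has an alwaysDiscardsLateVideoFrames property, and drops are reported through the delegate's captureOutput(_:didDrop:from:) callback. With the property set to true, frames that arrive while the delegate queue is still busy are discarded instead of queued, so a heavier model would be expected to lower the processed frame rate. The sketch below counts both outcomes so the effect can be measured; the class name and queue label are hypothetical.

```swift
import AVFoundation

final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private(set) var processedCount = 0
    private(set) var droppedCount = 0

    // Serial queue on which both delegate callbacks are delivered.
    let queue = DispatchQueue(label: "example.frame-processing")

    /// Builds a video data output that reports to this object.
    func makeOutput() -> AVCaptureVideoDataOutput {
        let output = AVCaptureVideoDataOutput()
        // Discard frames that arrive while the delegate is still busy.
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: queue)
        return output
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        processedCount += 1
        // Detection would run here; time spent here holds up later frames.
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didDrop sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        droppedCount += 1
    }
}
```

Comparing processedCount and droppedCount over a few seconds for each model size (n, s, m) gives a direct measurement of how many frames each configuration actually handles.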
Oct ’24
Video Memory Leak when Backgrounding
While trying to control the following two scenes in 1 ImmersiveSpace, we found the following memory leak when we background the app while a stereoscopic video is playing.

ImmersiveView's two scenes:
- Scene 1 has 1 toggle button
- Scene 2 has the same toggle button with a 180 degree skysphere playing a stereoscopic video

Attached are the files and images of the memory leak as captured in Xcode.

To replicate this memory leak, follow these steps:
1. Create a new visionOS app using the Xcode template as illustrated below.
2. Configure the project to launch directly into an immersive space (set Preferred Default Scene Session Role to Immersive Space Application Session Role in Info.plist).
3. Replace all Swift files with those you will find in the attached texts.
4. In ImmersiveView, replace the stereoscopic video to play with a large 3D 180 degree video of your own bundled in your project.
5. Launch the app in debug mode via Xcode and onto the AVP device or simulator.
6. Display the memory use by pressing command+7 and selecting Memory in order to view the live memory graph.
7. Press on the first immersive space's button "Open ImmersiveView".
8. Press on the second immersive space's button "Show Immersive Video".
9. Background the app.
10. When the app tray appears, foreground the app by selecting it.
11. The first immersive space should appear.
12. Repeat steps 7, 8, 9, and 10 multiple times.
13. Observe the memory use going up; the graph should look similar to the below illustration.

In ImmersiveView, upon backgrounding the app, I do:
- a reset method to clear the video's memory
- a dismiss of the Immersive Space containing the video (even though upon execution, visionOS raises the purple warning "Unable to dismiss an Immersive Space since none is opened". It appears visionOS dismisses any ImmersiveSpace upon backgrounding, which makes sense.)

Am I not releasing the memory correctly? Or, is there really a memory leak issue in either SwiftUI's ImmersiveSpace or in AVFoundation's AVPlayer upon backgrounding an app?

Attached files:
- App file: TestVideoLeakOneImmersiveView
- First ImmersiveSpace file: InitialImmersiveView
- Second ImmersiveSpace file: ImmersiveView
- Skysphere model file: Immersive180VideoViewModel
- File: AppModel
Oct ’24
App lost audio spatialization after the visionOS 2 update
Hi, I have a video player app that has lost its audio spatialization since the visionOS 2 update. I am using VideoPlayerComponent (https://developer.apple.com/documentation/realitykit/videoplayercomponent) to implement my videos as entities, as I want a custom look and custom controls for my player. In visionOS 1, audio spatialization was automatic: depending on where my video entity was, the app automatically enabled head-tracked audio spatialization. Since visionOS 2, however, I cannot get my video entities to play Spatial Audio. I've looked into DestinationVideo and even set up AVAudioSessionSpatialExperience, but Spatial Audio is still not working. I'd appreciate any help. Thanks.
Oct ’24
Compatibility Between ARKit and Optical Zoom
Hello, I am a developer currently working on an AR application using ARKit. I aim to implement a Zoom feature that allows users to enlarge and reduce objects within the AR scene while simultaneously measuring the distance to those objects. Specifically, I want to incorporate Optical Zoom to provide a more natural and precise user experience. I have considered several approaches and would appreciate your advice on the most effective methods.

Approaches Being Considered:
- Using UIPinchGestureRecognizer to Adjust the Camera's Field of View
- Modifying the scale Property of SCNNode to Enlarge/Reduce Specific Objects
- Leveraging AVFoundation to Control the Camera's Optical Zoom

Questions:
- Compatibility Between ARKit and Optical Zoom: Is it feasible to control the camera's optical zoom using AVFoundation while utilizing ARKit's features? What should be considered when integrating these two frameworks?
- Integrating Object Distance Measurement with Zoom Functionality: What is the most effective approach to measure and display the distance to an object in real-time when a user zooms in on it?
- User Experience Considerations: Do you have any UI/UX design tips for implementing optical zoom to ensure a natural and intuitive experience? For example, how can visual feedback for zoom actions and distance measurements be effectively presented to users?
- Performance Optimization: What optimization strategies can minimize potential performance issues when implementing both optical zoom and distance measurement features simultaneously?
- Example Code and Reference Materials: Could you share any example code or reference materials that demonstrate similar functionalities?

Thank you.

Example Code Request: If possible, providing sample code that integrates optical zoom with distance measurement would be extremely helpful.
Reference Links: Please share any tutorials or resources that demonstrate the combined use of ARKit and AVFoundation.
Oct ’24
Raw point cloud access
Hi, I currently have Enterprise API access and have observed that the main camera API only provides RGB data. I am trying to access point cloud information from LiDAR, but it seems ARKit doesn't offer this directly via the standard APIs that iPad uses. I wanted to ask if there are any possible options to access depth data or enhanced camera capabilities using the Enterprise API. Specifically:
- Does having Enterprise API access unlock any additional camera-related APIs in AVFoundation that could provide depth information or more advanced control over the camera?
- Are there any workarounds or alternative methods to obtain depth data from the camera?
Oct ’24
Toggling AVMusicTrack isMuted
Hi! I have an AVAudioSequencer with some AVMusicTracks that are filled with AVParameterEvents. If I toggle the isMuted property of a track, it mutes instantly when changed to true. However, after setting isMuted back to false, the events only trigger on the next round of the loop and not instantly. Is this intended behaviour, and is there some way to get the events to trigger immediately after setting isMuted to false?
Oct ’24
AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0>
    AddInstanceForFactory: No factory registered for id <CFUUID 0x6000002e76c0> F8BB1C28-BAE8-11D6-9C31-00039315CD46
    AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50 from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to: 2 ch, 16000 Hz, Int16, interleaved
    HALSystem.cpp:2216 AudioObjectPropertiesChanged: no such object
    AQMEIO_HAL.cpp:2552 timeout
    AudioHardware-mac-imp.cpp:2706 AudioDeviceStop: no device with given ID
    AudioQueueObject.cpp:1580 BuildConverter: AudioConverterNew returned -50 from: 0 ch, 16000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to: 2 ch, 16000 Hz, Int16, interleaved
    AudioQueueObject.cpp:6707 ConvertInput: aq@0x109994200: AudioConverterFillComplexBuffer returned -50, packetCount 5328

Why can't I start recording? ...
Oct ’24
Writing video using AVAssetWriter, AVAssetReader, and AVSpeechSynthesizer
Hello,

First, some version and software details:
- Software: iOS 18.1
- Hardware: iPhone 14 Pro Max and later
- Xcode: 16.0

Summary: AVAssetReader is not concatenating a video at the beginning of the output video. The output video should contain a scene of me introducing the content, followed by a blue screen with AVSpeechSynthesizer reading out a text that I pasted above the "Generate Video" button.

Details: Now, let's talk about the app. Basically, I’m developing an app that generates a video with the following features:
- My app will create an output video that is split into an opening scene followed by a fully blue screen.
- The opening scene will be taken from a video I choose from my gallery. I will read the opening video using AVAssetReader as usual.
- After the opening scene, I will use the content of a text read by AVSpeechSynthesizer.write().
- After the opening scene, the synthesized audio will start playing while the blue screen is displayed.

All of this is already defined in the attached project. Each project file has a comment at the beginning introducing its content.

How to test:
1. Write something in the field above the "Generate Video" button. For example, type "Hello, world!"
2. Press the "Library" button and select a video from the gallery, about 30 seconds long.
3. Press the "Generate Video" button.

The result I’ve experienced is a crash or failure to generate the video.

Practical example of what I want to achieve: Suppose I record a 30-second video where I say, "I’m going to tell you the story of Snow White." Then, I paste the "Snow White" story into the field above the "Generate Video" button. The output video should contain me saying, "I’m going to tell you the story of Snow White." After that, the AVSpeechSynthesizer will read the story I pasted, while displaying a blue screen.

I look forward to a solution. Thank you very much!

Attached files: convertToCMSampleBuffer.swift, convertToPixelBuffer.swift, createInputs.swift, createVideo.swift, test.swift, saveVideo.swift, TestApp.swift, editingVideo.swift, sampleReaderProvider.swift, misc.swift, sampleProvider.swift
Nov ’24
SoundRecognition causes Input/Output callbacks to have varying Buffer sizes and introduces Glitching
Hello, We have noticed an issue with SoundRecognition that causes glitching with our AudioUnit setup in Smule.
- Input and output frame sizes are inconsistent.
- Input frame size does not match [AVAudioSession sharedInstance].IOBufferDuration

My best guess is that SoundRecognition influences the input frame size and not the output frame size. To reproduce, use the example app here: https://github.com/MarkoGill/SoundRecognitionBug

Hardware/OS
- iPhone 14 Pro on iOS 18 -> Experiences the problem
- iPhone 11 on iOS 18 -> Experiences the problem
- iPhone 15 on iOS 18 -> Not experiencing the problem

Reproduction Steps
1. Enable Sound Recognition (Settings > Accessibility > Sound Recognition > On)
2. Enable a Sound for detection (Sounds > Dog > On)
3. Open the example app with a headset (it routes input to output)
4. Notice glitching occurs
5. Check the logs. Record and Playback buffer sizes vary

Example Log:

    AU input sample rate: 48000.000000
    AU output sample rate: 48000.000000
    hardware sample rate: 48000.000000
    hardware buffer size: 1104.000000
    updated record frame counts: 1024
    updated playback frame counts: 1104

Notes: You can disable Sound Recognition, restart the app, and playback behaves correctly.
Oct ’24
Distorted Audio When Recording External Mics With AVCaptureSession and AVAssetWriter
I’m working on a macOS app, written in Swift. My goal is to record audio from an external microphone, e.g., one connected via USB. For this, I’m using an AVCaptureSession and recording its output with an AVAssetWriter. This works perfectly in principle (and reliably with internal microphones, for example). The problem occurs after my app has successfully completed the first recording and I then want to make additional recordings (which makes me think it might be process-dependent, because it works again after restarting the app). The problem: Noisy or distorted-sounding audio files. In addition, the following error message appears in the Console from CoreAudio / its AudioConverter: Input data proc returned inconsistent 512 packets for 2048 bytes; at 3 bytes per packet, that is actually 682 packets It is easy to reproduce. This problem is reproducible even if I don’t configure the AVAssetWriter manually and instead let it receive its audioSettings using a preset from an AVOutputSettingsAssistant. I’m running on macOS 15.0 (24A335). I’ve filed a feedback including a demo project → FB15333298 🎟️ I would greatly appreciate any help! Have a great day, Martin
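The numbers in the console message can be reproduced with integer division, which may help narrow down the cause: 2048 bytes is exactly 512 packets at 4 bytes per packet, and 682 whole packets at 3 bytes per packet. One reading, offered as a guess and not a diagnosis, is that one side of the converter treats the samples as 32-bit while the other expects packed 24-bit data. The function name below is hypothetical.

```swift
/// Number of whole packets that fit in `byteCount` bytes.
func packetCount(byteCount: Int, bytesPerPacket: Int) -> Int {
    precondition(bytesPerPacket > 0, "bytesPerPacket must be positive")
    return byteCount / bytesPerPacket
}

let bytes = 2048
print(packetCount(byteCount: bytes, bytesPerPacket: 4)) // 512
print(packetCount(byteCount: bytes, bytesPerPacket: 3)) // 682
```

Logging the format description of the incoming sample buffers on the first and second recording and comparing their bytes per packet would show whether the device format changes between sessions.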
Nov ’24