I’ve encountered an issue when trying to transcribe audio during a SharePlay session in visionOS. Specifically, activating the AVAudioSession fails while SharePlay is sharing audio, which prevents transcription from starting. The problem seems related to AVAudioSession.sharedInstance() and the .mixWithOthers option, which is supposed to let multiple audio sources coexist without interfering with each other.
Here’s the relevant code snippet that throws the error:
private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
    let audioEngine = AVAudioEngine()

    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true

    // Configure the shared audio session for simultaneous playback and recording,
    // mixing with other audio (e.g. SharePlay) instead of interrupting it.
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .default, options: [.mixWithOthers, .allowBluetooth])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)   // <- this is the call that throws

    // Tap the microphone input and feed buffers to the recognition request.
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        request.append(buffer)
    }

    audioEngine.prepare()
    try audioEngine.start()
    return (audioEngine, request)
}
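For completeness, here is roughly how the returned pair is consumed. This is simplified for the post: the function name and the result handling are illustrative, not my exact code.

// Simplified, illustrative consumption of the pair returned above
// (requires `import Speech`; permission checks omitted).
private static func startTranscription() throws -> SFSpeechRecognitionTask? {
    let (audioEngine, request) = try prepareEngine()
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else { return nil }
    return recognizer.recognitionTask(with: request) { result, error in
        if let result {
            print(result.bestTranscription.formattedString)   // partial results arrive here
        }
        if error != nil || result?.isFinal == true {
            audioEngine.stop()
            audioEngine.inputNode.removeTap(onBus: 0)
        }
    }
}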
The setup is designed to initialize an AVAudioEngine and an SFSpeechAudioBufferRecognitionRequest for real-time transcription, but it fails within the SharePlay context. Notably, while .mixWithOthers is intended to let concurrent audio sessions coexist, it doesn't appear to work as expected during SharePlay. The audioSession.setActive(true) call is where the setup fails, and I haven't found a way to get past it.
Has anyone else faced similar issues with AVAudioSession and SharePlay in VisionOS? Any insights on how to manage audio sharing or transcription during a SharePlay session would be greatly appreciated!
The specific error is:
The operation couldn't be completed. (com.apple.coreaudio.avfaudio error 561145187.)
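For anyone trying to reproduce or diagnose this, here is a minimal diagnostic sketch (not my production code, and the function name is just for illustration) that maps the raw code back to an AVAudioSession.ErrorCode case when activation fails:

import AVFoundation

// Minimal diagnostic sketch: log which AVAudioSession.ErrorCode case
// the raw CoreAudio error code corresponds to when setActive fails.
func activateSessionWithDiagnostics(_ session: AVAudioSession) {
    do {
        try session.setActive(true, options: .notifyOthersOnDeactivation)
    } catch let error as NSError {
        let code = AVAudioSession.ErrorCode(rawValue: error.code)
        print("setActive failed: \(error.domain) \(error.code) -> \(String(describing: code))")
    }
}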
In visionOS 2.1 and 2.2, I’m encountering a significant limitation with .immersionStyle(selection: .constant(.mixed), in: .mixed), i.e. the mixed immersion style. Here’s a breakdown of the behavior:
In full immersion mode (.immersionStyle(selection: .constant(.full), in: .full)), users can interact with and manipulate system windows while inside a 3D model, allowing typical interactions like moving windows, pinching, or activating UI switches.
However, in mixed immersive mode, with the exact same layout “inside” a 3D model (which doesn’t visually obstruct the window), users are unable to interact with the window content or move the window. Basic interactions like pinching or toggling switches only work if users physically touch those elements in AR space, which is inconsistent with the behavior in full immersion.
From a usability perspective, this restriction seems unnecessary, as the software should ideally allow for similar interaction capabilities across both immersive styles. The expected behavior is to enable window manipulation within a 3D model in mixed mode, matching the functionality observed in full immersion.
The scene in question is a house that the user is placed inside during the immersion, which is why I refer to the user as being “inside” the scene.
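For reference, the setup is essentially the stock ImmersiveSpace pattern. Everything below (the names, the placeholder window content, the empty RealityView) is illustrative rather than my actual project code:

import SwiftUI
import RealityKit

@main
struct HouseApp: App {
    var body: some Scene {
        WindowGroup {
            Toggle("Lights", isOn: .constant(true))   // stand-in for the window UI that stops responding in .mixed
                .padding()
        }

        ImmersiveSpace(id: "HouseSpace") {
            RealityView { _ in
                // load and add the house model entity here; the user ends up inside it
            }
        }
        .immersionStyle(selection: .constant(.mixed), in: .mixed)   // the same layout with .full behaves as expected
    }
}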
Has anyone else experienced this or found a workaround?
I’m working on a memo app that records audio from the iPhone’s microphone (and other devices like MacBook or iPad) and processes it in 10-second chunks at a target sample rate of 16 kHz. However, I’ve encountered limitations with installTap in AVAudioEngine, which doesn’t natively support configuring a target sample rate on the mic input (the default being 44.1 kHz).
To address this, I tried using an AVAudioMixerNode to downsample the mic input directly. Although everything seems correctly configured, no audio is recorded: the tap only delivers a flat, zero-level signal. There are no errors, and all permissions are granted, so it looks like an issue with the downsampling rather than with the mic setup itself.
To make progress, I implemented a workaround: I tap the input with installTap (a buffer roughly every 50 ms in my case) and resample each buffer with AVAudioConverter. While this works, it can introduce artifacts at the beginning and end of each chunk, likely because each buffer is converted separately rather than as one continuous stream. A simplified version of that tap is sketched below.
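Here is a stripped-down version of the workaround (simplified from my actual code: the function name, buffer size, and error handling are illustrative):

import AVFoundation

// Simplified sketch of the per-buffer resampling workaround described above.
func startResampledTap() throws -> AVAudioEngine {
    let engine = AVAudioEngine()
    let input = engine.inputNode
    let inputFormat = input.outputFormat(forBus: 0)                 // typically 44.1 or 48 kHz
    guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                           sampleRate: 16_000,
                                           channels: 1,
                                           interleaved: false),
          let converter = AVAudioConverter(from: inputFormat, to: targetFormat) else {
        return engine
    }

    input.installTap(onBus: 0, bufferSize: 2048, format: inputFormat) { buffer, _ in
        let ratio = targetFormat.sampleRate / inputFormat.sampleRate
        let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
        guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else { return }

        var consumed = false
        var conversionError: NSError?
        let status = converter.convert(to: converted, error: &conversionError) { _, outStatus in
            if consumed {
                outStatus.pointee = .noDataNow   // each tap buffer is converted in isolation
                return nil
            }
            consumed = true
            outStatus.pointee = .haveData
            return buffer
        }
        guard status != .error else { return }
        // `converted` now holds roughly 50 ms of 16 kHz mono audio;
        // this is where I append it to the current 10-second chunk.
    }

    engine.prepare()
    try engine.start()
    return engine
}

The converter is created once and reused, but each tap buffer is still converted on its own, which is presumably where the chunk-boundary artifacts come from.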
Here are the key issues and questions I have:
1. Can we change the mic input sample rate directly using AVAudioSession or another native API in AVFAudio (see the sketch after this list)? Setting the desired sample rate up front would be ideal for my use case.
2. Are there alternatives to installTap for recording audio at a different sample rate or for continuously downsampling the live input without chunk-based artifacts?
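To illustrate what I mean in question 1: the closest thing I'm aware of is setPreferredSampleRate(_:), but as I understand it that is only a preference, and the input node can still report the hardware rate. Something like this (illustrative only):

import AVFoundation

// Illustrative only: setPreferredSampleRate is a request, not a guarantee,
// so the input node may still report the hardware rate (44.1 or 48 kHz).
func logEffectiveInputRate() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .default)
    try session.setPreferredSampleRate(16_000)
    try session.setActive(true)

    let engine = AVAudioEngine()
    let inputRate = engine.inputNode.outputFormat(forBus: 0).sampleRate
    print("preferred: \(session.preferredSampleRate), session: \(session.sampleRate), input node: \(inputRate)")
}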
This issue seems longstanding, as noted in a 2018 forum post:
https://forums.developer.apple.com/forums/thread/111726
Any guidance on configuring or processing mic input at a lower sample rate in real-time would be greatly appreciated. Thank you!