Hello,
I noticed that SFSpeechRecognizer is broken on iOS 18. During a recognition task, it keeps dropping the recognized text on every pause. For example, if you say "how are you fine", it will drop the "how are you" part and only give you "fine" as the result.
Say "how are you <pause> fine"
// iOS 17 ✅ (perfect final result)
How
How are
How are you
How are you.
How are you. Fine.
// iOS 18 ❌
How
How are
How are you
How are you
Fine
(On iOS 18, the text before the pause is dropped and the punctuation is not recognized.)
Reproducing the issue:
Download the official sample project.
Run it on an iOS 18 device or simulator.
Say "how are you fine"
Only "fine" will be displayed.
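Until the behavior is fixed, one possible client-side workaround is to stitch the partial results back together. Below is a minimal sketch, not a confirmed fix. TranscriptAccumulator is a hypothetical helper (it is not part of the Speech framework), and it rests on the assumption that a new partial result which is shorter than the previous one, and is not a prefix of it, means the recognizer dropped the earlier text after a pause.

```swift
// Hypothetical helper, not part of the Speech framework.
// Assumption: when a new partial result is shorter than the previous one and is
// not a prefix of it, the recognizer dropped the earlier text after a pause.
struct TranscriptAccumulator {
    private var committed: [String] = []   // segments finished before a pause
    private var current = ""               // latest partial of the running segment

    /// Feed each partial transcription; returns the full stitched text.
    mutating func ingest(_ partial: String) -> String {
        // Ignore empty partials so they never wipe the running segment.
        guard !partial.isEmpty else {
            return (committed + [current]).filter { !$0.isEmpty }.joined(separator: " ")
        }
        let looksLikeReset = partial.count < current.count
            && !current.lowercased().hasPrefix(partial.lowercased())
        if looksLikeReset {
            committed.append(current)
        }
        current = partial
        return (committed + [current]).filter { !$0.isEmpty }.joined(separator: " ")
    }
}

// Replays the iOS 18 sequence from the post above.
var accumulator = TranscriptAccumulator()
var display = ""
for partial in ["How", "How are", "How are you", "How are you", "Fine"] {
    display = accumulator.ingest(partial)
}
print(display) // How are you Fine
```

In a real result handler, each result.bestTranscription.formattedString would be passed to ingest(_:) and the returned string shown in the UI. The heuristic can misfire if the recognizer rewrites the start of a sentence, so it is only a stopgap.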
Speech
Recognize spoken words in recorded or live audio using Speech.
let debugString = "<speak><emphasis level=\"reduced\">Hello</emphasis></speak>"
let utterance = AVSpeechUtterance(ssmlRepresentation: debugString)! // <--- Freezes
I encountered this bug in the iOS 18 beta.
I sent feedback through the Feedback app.
It looks like Apple has added some new API(s) to SFSpeechRecognition
My app, which is currently listed on the App Store, features speech recognition.
Yet, trying to use it under iOS 18.0 throws errors:
-[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
What happens is that after several words are transcribed and displayed, the next sentence causes the previous words to disappear.
That is probably what this portion of the error text means: "Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"".
The problem occurs ONLY when the app is running under iOS 18.0.
Even when it's compiled with Xcode 16.0, everything works fine on iOS 17.5.
Any suggestions?
I'm writing an app that uses on-device voice-to-text for recognising scientific terms. It works fine on my phone, but now that it is in beta, my first tester cannot make it work. All the permission requests are working: under Privacy & Security, Microphone and Speech Recognition are both now enabled on the target device, where the user granted the app permission. Is there something else I'm missing?
Incidentally, my phone, the target phone, and my Xcode are all fully up to date.
Thanks.
I'm having trouble using SFSpeechRecognizer and SFSpeechRecognitionTask to show me the words from an audio file. I found a solution on Stack Overflow that separates the audio file into smaller pieces. How would I do that programmatically using Swift in a macOS app Xcode project?
I would prefer not to separate the file into smaller files. I will submit another post with more information for that.
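If splitting does turn out to be necessary, the time ranges can be computed first and the audio exported piece by piece. Here is a minimal sketch of the range calculation only. chunkRanges is a hypothetical helper, and the 60-second chunk length is an arbitrary value chosen for illustration. Each range could then be converted to a CMTimeRange and assigned to the timeRange property of an AVAssetExportSession, which is one option among several.

```swift
import Foundation

/// Hypothetical helper: splits a duration (in seconds) into consecutive
/// (start, length) ranges of at most `chunkLength` seconds each.
func chunkRanges(totalDuration: TimeInterval,
                 chunkLength: TimeInterval) -> [(start: TimeInterval, length: TimeInterval)] {
    guard totalDuration > 0, chunkLength > 0 else { return [] }
    return stride(from: 0, to: totalDuration, by: chunkLength).map { start in
        // The last chunk is shortened so it does not run past the end of the file.
        (start: start, length: min(chunkLength, totalDuration - start))
    }
}

// A 150-second file split into 60-second pieces gives three ranges.
let ranges = chunkRanges(totalDuration: 150, chunkLength: 60)
for range in ranges {
    print("start \(range.start)s, length \(range.length)s")
}
```

Splitting on fixed boundaries can cut a word in half, so overlapping the ranges slightly or splitting on silence may give better transcripts.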
When using AVSpeechSynthesizer(), I get an error after a couple of seconds: "IPCAUClient.cpp:139 IPCAUClient: can't connect to server (-66748) <0x104309130>", and then it speaks the text.
The second time I call speak, there is no delay or error, and it speaks immediately.
Where do this error and delay come from, and how can I resolve them?
Initialization code:
self.audioSession = AVAudioSession.sharedInstance() // 2) handle audio session first, before trying to read the text
do {
    try audioSession.setCategory(.playback, mode: .voicePrompt, options: .duckOthers)
    try audioSession.setActive(false)
} catch let error {
    Logger.model.debug("❓\(error.localizedDescription)")
}
speechSynthesizer = AVSpeechSynthesizer()
speechSynthesizer.usesApplicationAudioSession = true
Speak code:
let utterance = AVSpeechUtterance(string: text)
utterance.preUtteranceDelay = 0.1
utterance.rate = 0.5
utterance.pitchMultiplier = 0.75
utterance.prefersAssistiveTechnologySettings = false
self.speechSynthesizer.speak(utterance)
The last statement gives this error message!
hi,
i am currently developing an app whose core functionality relies on detecting user laughter in the background. in our early stages we noticed apple's built-in sound recognition functionality. at the core, i am guessing that sound recognition requires permission from the user to access the microphone 24/7. currently, using the conventional avenue of background audio recording, a yellow indicator is shown at the top of the iphone screen to indicate recording. this is not the case for sound recognition. if all sound processing/recognition is kept on-device, is there any way to avoid the yellow dot and achieve laughter detection in a way that is similar to how apple's sound recognition does it?
in the sound recognition settings that the user can reach in the settings app, the only detectable "people" sounds are baby crying, coughing, and shouting. is it also possible to add laughter to this list somehow?
thank you in advance.
I am developing a visionOS app that captions speech in real environments. Currently, I am using Apple's built-in speech recognizer. However, when I was testing the app with a Vision Pro, the device seemed to only pick up the user's voice (in other words, the voices of the wearer of the Vision Pro device). For example, when the speech recognition task is running, and another person in front of me is talking, the system does not pick up the speech well.
I tried to set the AVAudioSession to be equally sensitive to all directions:
private func configureAudioSession() {
    do {
        try audioSession.setCategory(.record, mode: .measurement)
        try audioSession.setActive(true)
        if #available(visionOS 1.0, *) {
            let availableDataSources = audioSession.availableInputs?.first?.dataSources
            if let omniDirectionalSource = availableDataSources?.first(where: { $0.preferredPolarPattern == .omnidirectional }) {
                try audioSession.setInputDataSource(omniDirectionalSource)
            }
        }
    } catch {
        print("Failed to set up audio session: \(error)")
    }
}
And here is how I set up the speech recognition and configure the microphone inputs:
private func startSpeechRecognition(completion: @escaping (String) -> Void) {
    do {
        // Cancel the previous task if it's running.
        if let recognitionTask = recognitionTask {
            recognitionTask.cancel()
            self.recognitionTask = nil
        }
        // The AudioSession is already active, creating input node.
        let inputNode = audioEngine.inputNode
        try inputNode.setVoiceProcessingEnabled(false)
        // Create and configure the speech recognition request
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a recognition request") }
        recognitionRequest.shouldReportPartialResults = true
        // Keep speech recognition data on device
        if #available(iOS 13, *) {
            recognitionRequest.requiresOnDeviceRecognition = true
        }
        // Create a recognition task for speech recognition session.
        // Keep a reference to the task so that it can be canceled.
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
            // var isFinal = false
            if let result = result {
                // Update the recognizedText
                completion(result.bestTranscription.formattedString)
            } else if let error = error {
                completion("Recognition error: \(error.localizedDescription)")
            }
            if error != nil || result?.isFinal == true {
                // Stop recognizing speech if there is a problem
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            }
        }
        // Configure the microphone input
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
    } catch {
        completion("Audio engine could not start: \(error.localizedDescription)")
    }
}
Hello!
I have noticed this in Sonoma and in the betas for Sequoia, the ARM variants. I am using the example from https://github.com/sveinbjornt/hear?tab=readme-ov-file in an attempt to cobble together an all-in-one transcription and low-level grammar checker utilizing LanguageTool.
I have noticed that the RAM usage, specifically the swap, just keeps on climbing while it is processing an audio file. It is... quite amazing to see exactly how much swap the dang thing can use in a pinch. Frighteningly so, considering the Mini I am using only has 256 GB of storage.
Throw an eight hour mp3 audiobook at the process and see for yourself.
I am aware that localspeechrecognition wasn't really designed with the idea that people will be throwing audio files at it, so it is understandable that it wouldn't be equipped to gracefully handle this situation.
I am a novice programmer here. Seriously - this is my first major stab at programming since dabbling with Qbasic back in elementary school. Thus, this question: if there is a memory leak, is there a way to shunt the swap being used by the app to an external drive? I am willing to take the performance hit if it keeps the internal SSD from paying the ferryman sooner than expected due to excessive swap usage.
Thanks!
Here is the use case: I have a language learning app that uses AVSpeechSynthesizer ❤️. When a user listens to a phrase with AVSpeechSynthesizer using an AVSpeechSynthesisVoice with an AVSpeechSynthesisVoiceQuality of default, it sounds much, much worse than voices with enhanced or premium quality, which really affects usability.
There appears to be no API for the app to know whether enhanced or premium voices are available to download (via Settings.app) but not yet downloaded to a device. The only API I could find is AVSpeechSynthesisVoice.speechVoices(), which returns all available voices on the device, but not a full list of voices available via download. So the app cannot know whether it should inform the user "hey, this voice you're listening to is of much lower quality than enhanced or premium, go to Settings and download the enhanced or premium version".
Any ideas? Do I need to send in an enhancement request via Feedback Assistant? Thank you for helping my users' ears and helping them turn speech synthesis voice quality up to 11 when it's available to them with just a couple of extra taps!
(I suppose the best workaround is to display a warning every time the user is using a default quality voice, I wonder what % of voices have enhanced or premium versions...)
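The workaround in the previous sentence can be narrowed a little by checking what is already installed. This is a minimal sketch, assuming the raw values of AVSpeechSynthesisVoiceQuality increase with quality (default, then enhanced, then premium). bestInstalledQuality is a hypothetical helper name. It only inspects installed voices, so it cannot tell whether a better voice exists for download.

```swift
// AVFoundation is only available on Apple platforms, hence the guard.
#if canImport(AVFoundation)
import AVFoundation

/// Hypothetical helper: highest quality among the voices currently installed
/// for a BCP 47 language code, or nil when no voice for it is installed.
/// Assumption: raw values of the quality enum increase with quality.
func bestInstalledQuality(for language: String) -> AVSpeechSynthesisVoiceQuality? {
    AVSpeechSynthesisVoice.speechVoices()
        .filter { $0.language == language }
        .map { $0.quality }
        .max { $0.rawValue < $1.rawValue }
}

// Show the "download a better voice" hint only when nothing better is installed.
if bestInstalledQuality(for: "en-GB") == .default {
    print("Only default-quality en-GB voices are installed.")
}
#endif
```

This avoids warning users who have already downloaded an enhanced or premium voice for the language, even if the app is not currently using it.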
import AVFoundation
Button {
    let utterance = AVSpeechUtterance(string: "Hello world")
    utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
    utterance.rate = 1
    // Note: this synthesizer is a local variable and may be deallocated before
    // speech finishes; storing it in a property is likely safer.
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.speak(utterance)
} label: {
    Text("hello")
}
I omitted some code, but this is the core part. When I run this on the Apple Watch SE 2 simulator (watchOS 10.5), nothing happens and I get these errors:
Query for com.apple.MobileAsset.VoiceServicesVocalizerVoice failed: 2
Unable to list voice folder
Query for com.apple.MobileAsset.VoiceServices.GryphonVoice failed: 2
Unable to list voice folder
Query for com.apple.MobileAsset.VoiceServices.CustomVoice failed: 2
Unable to list voice folder
Query for com.apple.MobileAsset.VoiceServices.GryphonVoice failed: 2
Unable to list voice folder
Hi everyone !
I'm getting random crashes when I'm using the Speech Recognizer functionality in my app.
This is an old bug (reported on the Apple forums for 8 years), and I would really appreciate it if anyone from Apple could find a fix for these crashes.
Can anyone also help me understand what I could do to keep the Speech Recognizer functionality available in my app while avoiding these crashes (for example, another native library or a CocoaPods library)?
Here is my code and also the crash log for it.
Code:
func startRecording() {
    startStopRecordBtn.setImage(UIImage(named: "microphone_off"), for: .normal)
    if UserDefaults.standard.bool(forKey: Constants.darkTheme) {
        commentTextView.textColor = .white
    } else {
        commentTextView.textColor = .black
    }
    commentTextView.isUserInteractionEnabled = false
    recordingLabel.text = Constants.recording
    // Cancel any recognition task that is still running.
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSession.Category.record)
        try audioSession.setMode(AVAudioSession.Mode.measurement)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        showAlertWithTitle(message: Constants.error)
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError(Constants.error)
    }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        // Unwrap the result once instead of force-unwrapping isFinal.
        if let result = result {
            self.commentTextView.text = result.bestTranscription.formattedString
            isFinal = result.isFinal
        }
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.startStopRecordBtn.isEnabled = true
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] (buffer: AVAudioPCMBuffer, when: AVAudioTime) in // CRASH HERE
        self?.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        showAlertWithTitle(message: Constants.error)
    }
}
Here is the crash log:
Thanks very much for reading this!
I got this SSML from w3.org. AVSpeechUtterance(ssmlRepresentation:) is not complying with the contour: the pitch (Hz) does not change.
<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
xml:lang="en-US">
<prosody contour="(0%,+20Hz) (10%,+30%) (40%,+10Hz)">
good morning
</prosody>
</speak>
override func viewDidLoad() {
    super.viewDidLoad()
    guard let localUtterance = AVSpeechUtterance(ssmlRepresentation: self.speechSML) else {
        print("SML did not work.")
        return
    }
    self.utterance = localUtterance
    self.utterance.voice = self.voiceNoelle
}
self.synthesizer.speak(self.utterance)
Hello iOS Developer Community,
I hope this message finds you healthy and happy. I am reaching out to seek your expertise and assistance with a particular challenge I’ve encountered while using the Speak Screen and Speak Selection features on iOS.
As you may know, these features are incredibly useful for reading text aloud, but they sometimes struggle with the correct pronunciation of homographs—words that are spelled the same but have different meanings and pronunciations. An example of this is the word “live,” which can be pronounced differently based on the context of the sentence.
To enhance my user experience, I am looking to input corrections for the pronunciation of “live” in its “happening now” context, such as in “live broadcast” or “live event.” However, the current process requires manual entry for each phrase, which is quite labor-intensive.
I am wondering if there is a way to automate or streamline this process, perhaps through a shortcut or script that allows for bulk input of these corrections. Additionally, if anyone has already compiled a list of common phrases with homographs and their correct pronunciations, I would greatly appreciate it if you could share it or guide me on where to find such resources.
Your insights and guidance on this matter would be invaluable, and I believe any solutions could benefit not just myself but many other users facing similar issues.
Thank you for your time and consideration. I look forward to any suggestions or advice you may have.
Best regards,
Alec
Hello!
We have an app that utilises the SpeechKit Framework. Especially the local on-device speech recognition for the audio files with the user selected language.
Up until recently it worked as expected. However, after updating one of our testing devices to iOS 17.4.1, we found out that local recognition on it stopped working completely.
The error that we are getting has code 102, and its localised description reads:
"Failed to access assets".
That sounds just like a rare though known issue in previous iOS versions. The solution was inconvenient for our users, but at least it worked: they had to go to the system settings and adjust the dictation setting in the keyboard section.
Right now no tweaks of this sort appear to help us fix the situation. We even tried resetting the device's settings (not a factory reset, though). The error persists.
It appears on one of our devices 100% of the time, halting the local recognition process. It sometimes shows on other devices for some particular languages too, but it does not show for other languages.
As it is a UX breaking bug for our app, today I decided to check the logs of the Console app at the moment of the recognition attempt.
There are lots of errors with code 1101 which from our research appear to be the general notifications about some local recognition setup problems.
Removing the lines about the 1101 error from the log we have some interesting stuff remaining, that is (almost) never mentioned in any of the searchable webpages in the Internet. I assume they are the private API calls that the SpeechKit Framework executes under the hood:
default localspeechrecognition -[UAFAssetSet assetNamed:]_block_invoke 9067C4F1-0B29-4A57-85DD-F8740DF7C344: No assets in asset set com.apple.siri.understanding
default localspeechrecognition -[UAFAssetSet assetNamed:] 9067C4F1-0B29-4A57-85DD-F8740DF7C344: Returning com.apple.siri.asr.assistant from source none
error localspeechrecognition -[SFEntitledAssetManager _assetWithAssetConfig:regionId:] No asset found with name: com.apple.siri.asr.assistant, asset set: com.apple.siri.understanding, usage: <private>
error localspeechrecognition +[LSRConnection modelRootWithLanguage:clientID:modelOverrideURL:returningAssetType:error:] Fetch asset error (null)
error localspeechrecognition -[LSRConnection prepareRecognizerWithLanguage:recognitionOverrides:modelOverrideURL:anyConfiguration:task:clientID:error:] modelRoot is nil (null)
default OurApp [0x113e96d40] invalidated because the current process cancelled the connection by calling xpc_connection_cancel()
Looks like there are some language-model related problems that appeared after the device was updated to 17.4.1.
The Settings -> General -> Keyboard -> Dictation Languages appear to be configured correctly and the dictation toggle is On. We tried tweaking all these settings, rebooting the device, and resetting the device settings.
However the log lines still tell us that there is something wrong with the private resources of the SpeechKit framework.
We are very concerned, as speech recognition is the core of our application's logic. We don't understand the scale of the possible impact of this faulty behaviour (rare occurrences / some users / all users?) or how we can fix it to provide our users with the desired behaviour.
Description:
Problem Statement:
The Siri intent for the "Next", "Previous", and "Repeat" commands is not working as expected within the Speech framework.
Steps to Reproduce:
Open the Speech Framework application.
Tap on the Siri button to activate voice input.
Say "Next" to trigger the intended action.
Observe that the action is not executed correctly.
In our demo app:
The steps in my demo application are as follows:
Open Siri.
1) Speak: "Check"
In response: Siri opens the dialog below:
What user wants?
1) One 2) Next 3) Yes 4) Goodbye
2) Speak: "Next"
In response: Siri repeats the same dialog (step 2).
3) Speak: "Yes", "One", or "Goodbye"
In response: Siri goes to the next dialog.
Expected Behavior:
The SiriKit intent or App Intent should receive the value "Next".
Actual Behavior:
Instead, the SiriKit intent receives the previous user input keyword, and the App Intent repeats the dialog recursively.
Device versions, Region, and Language:
Device model: iPhone 11, OS version: 17.4.1
Region: US, Language: English (US)
Impact:
Users can't use an iterative dialog in one context.
Additional:
How the different commands work with App Intents and SiriKit intents on different devices (follow the rows in order of No.):
| No | Device and scenario | SiriKit intent | App Intent |
| 1 | ISG iPhone 11 - Next | No | No |
| 2 | ISG iPhone 11 - Yes | No | Yes (but using enum) |
| 3 | ISG iPhone 11 - GoodBye | No | Yes (but using enum) |
| 4 | ISG iPhone 11 - One | Yes | Yes |
| 5 | iPad - Next | No | No |
| 6 | iPad - One | Yes | Yes |
| 7 | iPad - GoodBye | No | Yes |
| 8 | iPad - Yes | No | Yes |
| 9 | Simulator - iPhone 15 - Next, Yes, One, GoodBye | Yes | Yes |
Please help me with this.
Hello! I'm writing to the Apple developers to request the addition of an API for downloading premium voices directly within the app. Currently, this can only be done via the settings, which is not convenient for our users. As a developer for an application where this plays a crucial role, I ask you to take this into consideration. Thank you!
The application is developed in SwiftUI.
Our application is responsible for audio recording, transcribing the audio file and uploading it to the backend.
So, the 2 main components on the iOS application are : AVAudioRecorder, SFSpeechRecognizer.
The UI comprises a visual design that showcases the recording of audio and lets the user know whether the audio is being recorded or not using a Text component.
Lately the customer has been complaining that, though the application says "Recording" on the UI, their audio files are not being received at the backend.
The customers tried restarting their device (iPad), and the application started working normally.
We haven't been able to reproduce the issue, but we suspect an intermittent failure in audio transmission or a potential UI freeze.
Note: I have tried using the Leaks instrument and did not encounter any memory leaks while using the application.
Is there a way to determine whether the issue lies with the audio recorder, the speech recognizer, or elsewhere in the app?
Are there any known issues or limitations with audio recorder lately on iOS that could be causing this behaviour?
Please let me know if you have any suggestions to diagnose this issue.
Also, do let me know if more information is required
Thank you in advance
I would like to contact a developer on the SSML team regarding the possibility to create a new downloadable voice, in a language yet unsupported. I don't mind making a free contribution. Creating Custom voices does not seem to be a solution, since only English is supported when creating a custom voice.
I am trying to use the Speech Synthesizer to speak the pronunciation of a word in British English rather than play a local audio file which I had before. However, I keep getting this in the debugger:
#FactoryInstall Unable to query results, error: 5
Unable to list voice folder
Unable to list voice folder
Unable to list voice folder
IPCAUClient.cpp:129 IPCAUClient: bundle display name is nil
Unable to list voice folder
Here is my code, any suggestions??
func playSampleAudio() {
    // Note: this synthesizer is a local variable and may be deallocated before
    // speech finishes; storing it in a property is likely safer.
    let speechSynthesizer = AVSpeechSynthesizer()
    let speechUtterance = AVSpeechUtterance(string: currentWord)
    // Search for a voice with a British English accent.
    let voices = AVSpeechSynthesisVoice.speechVoices()
    var foundBritishVoice = false
    for voice in voices {
        if voice.language == "en-GB" {
            speechUtterance.voice = voice
            foundBritishVoice = true
            break
        }
    }
    if !foundBritishVoice {
        print("British English voice not found. Using default voice.")
    }
    // Configure the utterance's properties as needed.
    speechUtterance.rate = AVSpeechUtteranceDefaultSpeechRate
    speechUtterance.pitchMultiplier = 1.0
    speechUtterance.volume = 1.0
    // Speak the word.
    speechSynthesizer.speak(speechUtterance)
}