Speech

Unclear interimResults Web Speech API implementation in Safari iOS (WebKit)

Hi all! I have been working on a web speech recognition service using the Web Speech API. This service is intended to work on smartphones, primarily Chrome on Android and Safari (or WebKit WebView) on iOS. In my specific use case, I need to set the properties continuous = true and interimResults = true. However, I have noticed that interimResults = true does not always work as expected in WebKit. I understand that this setting should provide fast, native, on-device speech recognition with isFinal = false. However, at times, the recognition becomes throttled and slow, yielding isFinal = true and switching to cloud-based recognition. To confirm whether the recognition is cloud-based, I tested it by disabling the internet connection before starting speech recognition. In some cases, recognition fails entirely, which suggests that requiresOnDeviceRecognition = false is being applied. (Reference: SFSpeechRecognitionRequest.requiresOnDeviceRecognition) I believe this is not the expected behavior when setting interimResults = true. I have researched the native services used by the Web Speech API on iOS devices, and the following links seem relevant: • SFSpeechRecognizer • SFSpeechRecognitionRequest.shouldReportPartialResults • SFSpeechRecognizer.supportsOnDeviceRecognition • Recognizing speech in live audio • Apple Developer Forums Discussion I found that setRequiresOnDeviceRecognition and setShouldReportPartialResults appear to be set correctly, but apparently, they do not work as expected: WebKit Source Code

Safari & Web General WebKit Speech Safari and Web

0

61

7h

SFSpeechRecognizer throws User denied access to speech recognition

I have created an app where you can speak using SFSpeechRecognizer and it will recognize you speech into text, translate it and then return it back using speech synthesis. All locales for SFSpeechRecognizer and switching between them work fine when the app is in the foreground but after I turn off my screen(the app is still running I just turned off the screen) and try to create new recognitionTask it it receives this error inside the recognition task: User denied access to speech recognition. The weird thing about this is it only happens with some languages. The error happens with Croatian or Hungarian locale for speech recognition but doesn't with English or Spanish locale.

Media Technologies General Speech

0

199

1w

How to uninstall/delete Voice Control on macOS?

How to uninstall/delete Voice Control on macOS so that I can test my app for the case when the initial use of Voice Control causes it to be downloaded from Apple? Is there a folder in the macOS System or Library to delete to force a re-download of Voice Control? My macOS app uses the older NSSpeechRecognizer to handle speech commands, but to use NSSpeechRecognizer required authorization via [SFSpeechRecognizer requestAuthorization...]. I do this and on a macOS system it can trigger a download of Voice Control, the macOS feature. An alert appears with: "A 390 MB download is required to use speech recognition features in MyApp. You may need to quit and open MyApp again after download completes."

App & System Services General Speech Siri and Voice

0

181

Jan ’25

Am I allowed to use Speech framework on Swift Student Challenge?

Hello! I would like to use Speech Framework on my App Playground for this year challenge. But I still can't understand if I am allowed to use it to respect the rule of "not rely on a network connection". That's why: Speech framework can use on-device Speech recognition – No internet connection needed ✅. But it can ask to download an Apple's native language package to use it for this on-device recognition – To get this, you need to be connected to the Internet ❌. When I try to add a Speech Recognition Capabilities on my App Playground, its' description says: "Required to perform speech recognition using Apple's servers." (screenshot is attached). Does it mean that I won't be able to use on-device recognition on my App Playground? – And therefore, only online-version of this framework is available and I can't use it to participate on the challenge successfully❓. If it's possible, could you please make it clearer? This framework is crucial for my App Playground and I really need this to make it work. Thanks for your help in advance! And a have a good day!

Developer Tools & Services Swift Playground Swift Student Challenge Speech

1

0

339

Jan ’25

Any way to adjust the speechRecognitionMetadata pause duration?

Speech Framework I've been checking for SFSpeechRecognitionMetadata to determine the end of a sentence when using Voice Recognition. Yet it doesn't detect small pauses but only large ones, so that I've transcribed basically an entire paragraph before going onto the next one. Besides implementing your own timer, are there any other ways to have more natural pauses to detect the end of sentences, similar to the browser's Web Speech recognition? Since it's in Safari, I assume there should be some similar feature that can be equivalent in MacOS.

App & System Services General Speech

0

205

Dec ’24

Speech synthesis from Safari app extension

I'm making a Safari extension for learning languages. I need speech synthesis for any language the user chooses to learn. I initially tried to make this work within JavaScript, but Safari 18 doesn't reliably list voices for all languages on the web SpeechSynthesis API as described here: https://stackoverflow.com/questions/79179072/how-do-you-use-a-japanese-voice-with-speechsynthesis-in-safari-ios-18 As a workaround, I've had to use AVSpeechSynthesizer in SafariWebExtensionHandler (NSExtensionRequestHandling implementation for the extension). This works in the simulator but not on a real device. I've found this note from Apple in a StackOverflow reply: "Safari extensions are very short-lived, hence not fit for audio playback or speech synthesis. Not being able to validate an app extension in Xcode with a manually-added plist entry for background audio is the designed behavior. The general recommendation is to synthesize speech using JavaScript in conjunction with the Web Speech API." Unfortunately, the suggestion to use the Web Speech API is unsuitable as I just explained. Is there a way to set up a background process in the host app that can do speech synthesis? The app extension would need a way to communicate with this process, and start it if it's not running. Is that possible?

Safari & Web General Speech Safari Extensions Background Tasks

0

401

Dec ’24

How to implement continuous speech recognition in the background?

Hi, I'd like to develop an app which runs speech recognition even after going into background. I know I can accomplish this using audio background mode and the process the audio but I am not sure if this workaround would get accepted into App Store because of the processing limitations while in the background. How can I accomplish this while still being compliant with Apples privacy policy and other restrictions? Thanks, Marek

App & System Services General iOS Speech AVAudioSession Background Tasks

0

330

Dec ’24

Starting/restarting SFSpeechRecognizer?

Hello all, I'm working on a project that involves listening to a person speak off of a script and I want to stop then restart the recognitionTask between sections so I don't run afoul of keeping the recognitionTask running for longer than it needs to. Also, I'd like to be able to flush the current input between sections so the input from the previous section doesn't roll over into the next one. This is based on the sample code for SFSpeechRecognizer so there's a chance I might be misunderstanding something. private func restartRecording() { let inputNode = audioEngine.inputNode audioEngine.stop() inputNode.removeTap(onBus: 0) recognitionRequest?.endAudio() recordingStarted = false recognitionTask?.cancel() do { try startRecording() } catch { print("Oopsie.") } } Here's my code. When I run it, the recognition task doesn't restart. Any ideas?

Machine Learning & AI General Speech

0

405

Dec ’24

Speech Recognizer Clears Transcription After Pause on iOS 18.0

Description: I have encountered an issue with SFSpeechRecognizer on iOS 18.0. During live dictation, if a natural pause (e.g., 1-2 seconds) is introduced, the previously transcribed text is cleared, and the transcription starts over. This behavior makes it difficult to use the API for real-time speech recognition scenarios where pauses are expected. Steps to Reproduce: Open Apple's demo app "SpokenWord". Start the dictation process using SFSpeechRecognizer. Speak a few words, pause for 1-2 seconds, and then continue speaking. Observe that the previously transcribed text is truncated, and the transcription starts anew. Expected Behavior: The transcription should continue appending new results to the previous ones after a natural pause, maintaining a seamless user experience. Observed Behavior: After a pause, the transcription resets, clearing previously transcribed text. Impact: This behavior makes the SFSpeechRecognizer API unreliable for scenarios requiring continuous speech recognition with intermittent pauses. Additional Information: iOS Version: 18.0 Device: [Specify your device, e.g., iPhone 13 Pro] Speech Recognizer Locale: [Specify locale, e.g., en-US] App Behavior: Issue persists in both Apple's demo app ('SpokenWord') and custom implementations.

App & System Services General Speech

1

0

384

Dec ’24

AVSpeechUtterance problem

I see this error in the debugger: #FactoryInstall Unable to query results, error: 5 IPCAUClient.cpp:129 IPCAUClient: bundle display name is nil Error in destroying pipe Error Domain=NSCocoaErrorDomain Code=4099 "The connection from pid 5476 on anonymousListener or serviceListener was invalidated from this process." UserInfo={NSDebugDescription=The connection from pid 5476 on anonymousListener or serviceListener was invalidated from this process.} on this function: func speakItem() { let utterance = AVSpeechUtterance(string: item.toString()) utterance.voice = AVSpeechSynthesisVoice(language: "en-GB") try? AVAudioSession.sharedInstance().setCategory(.playback) utterance.rate = 0.3 let synthesizer = AVSpeechSynthesizer() synthesizer.speak(utterance) } When running without the debugger, it will (usually) speak once, then it won't speak unless I tap the button that calls this function many times. I know AVSpeech has problems that Apple is long aware of, but I'm wondering if anyone has a work around. I was thinking there might be a way to call the destructor for AVSpeechUtterance and generate a new object each time speech is needed, but utterance.deinit() shows: "Deinitializers cannot be accessed"

UI Frameworks SwiftUI Speech

0

1

388

Nov ’24

ios 18.2 beta(22C5131E) Speech Recognition Discards Previously Transcribed Audio

I was testing SFSpeechRecognition on my real device running ios 18.2 beta, and found that the result's "final" field is true, the result itself does not contain entire conversation's transcription. I came across some blog posts saying it's fixed in a 18.1 beta, is this not the case for 18.2 beta? Example code: recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in guard let self = self else { return } if let error = error { DispatchQueue.main.async { self.errorMessage = "Transcription failed: \(error.localizedDescription)" self.isTranscribing = false } } else if let result = result, result.isFinal { // HERE! } }

App & System Services General Speech

4

0

573

Nov ’24

AVSpeechUtterance Mandarin voice output replaced by SIRI language setting after upgraded the IOS to 18

Hi, Apple's engineer. Hoping that you can reply to this one. We're developing a Text-to-Speak app. Everything went well until the IOS got upgraded to 18. AVSpeechSynthesisVoice(language: "zh-CN") is running well under IOS 16 AND IOS 17. It speaks Mandarin correctly. In IOS 18, we noticed that Siri's Language setting interrupted the performance of AVSpeechSynthesisVoice. It plays Cantonese instead of Mandarin. Buggy language setting in Siri that affects the AVSpeechSynthesisVoice : Chinese (Cantonese - China mainland) Chinese (Cantonese -Hong Kong)

Media Technologies Audio Siri and Voice AVFoundation Speech

3

597

Jan ’25

Batch transcribe from file fails on all but the last, async problem?

I am attempting to do batch Transcription of audio files exported from Voice Memos, and I am running into an interesting issue. If I only transcribe a single file it works every time, but if I try to batch it, only the last one works, and the others fail with No speech detected. I assumed it must be something about concurrency, so I implemented what I think should remove any chance of transcriptions running in parallel. And with a mocked up unit of work, everything looked good. So I added the transcription back in, and 1: It still fails on all but the last file. This happens if I am processing 10 files or just 2. 2: It no longer processes in order, any file can be the last one that succeeds. And it seems to not be related to file size. I have had paragraph sized notes finish last, but also a single short sentence that finishes last. I left the mocked processFiles() for reference. Any insights would be greatly appreciated. import Speech import SwiftUI struct ContentView: View { @State private var processing: Bool = false @State private var fileNumber: String? @State private var fileName: String? @State private var files: [URL] = [] let locale = Locale(identifier: "en-US") let recognizer: SFSpeechRecognizer? init() { self.recognizer = SFSpeechRecognizer(locale: self.locale) } var body: some View { VStack { if files.count > 0 { ZStack { ProgressView() Text(fileNumber ?? "-") .bold() } Text(fileName ?? "-") } else { Image(systemName: "folder.badge.minus") Text("No audio files found") } } .onAppear { files = getFiles() Task { await processFiles() } } } private func getFiles() -> [URL] { do { let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first! let path = documentsURL.appendingPathComponent("Voice Memos").absoluteURL let contents = try FileManager.default.contentsOfDirectory(at: path, includingPropertiesForKeys: nil, options: []) let files = (contents.filter {$0.pathExtension == "m4a"}).sorted { url1, url2 in url1.path < url2.path } return files } catch { print(error.localizedDescription) return [] } } private func processFiles() async { var fileCount = files.count for file in files { fileNumber = String(fileCount) fileName = file.lastPathComponent await processFile(file) fileCount -= 1 } } // private func processFile(_ url: URL) async { // let seconds = Double.random(in: 2.0...10.0) // await withCheckedContinuation { continuation in // DispatchQueue.main.asyncAfter(deadline: .now() + seconds) { // continuation.resume() // print("\(url.lastPathComponent) \(seconds)") // } // } // } private func processFile(_ url: URL) async { let recognitionRequest = SFSpeechURLRecognitionRequest(url: url) recognitionRequest.requiresOnDeviceRecognition = false recognitionRequest.shouldReportPartialResults = false await withCheckedContinuation { continuation in recognizer?.recognitionTask(with: recognitionRequest) { (transcriptionResult, error) in guard transcriptionResult != nil else { print("\(url.lastPathComponent.uppercased())") print(error?.localizedDescription ?? "") return } if ((transcriptionResult?.isFinal) == true) { if let finalText: String = transcriptionResult?.bestTranscription.formattedString { print("\(url.lastPathComponent.uppercased())") print(finalText) } } } continuation.resume() } } }

Media Technologies General Speech Concurrency

0

422

Nov ’24

Strange error in com.apple.speech.localspeechrecognition that doesn't affect output?

While running Swift's SpeechRecognition capabilities I get the error below. However, the app successfully transcribes the audio file. So am not sure how worried I have to be, as well, would like to know that if when that error occurred, did that mean that the app went to the internet to transcribe that file? Yes, requiresOnDeviceRecognition is set to false. Would like to know what that error meant, and how much I need to worry about it? Received an error while accessing com.apple.speech.localspeechrecognition service: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"

Community Apple Developers Swift Speech

0

481

Oct ’24

VoiceOver needs to support CFBundleSpokenName

VoiceOver does not support the plist property CFBundleSpokenName. This is wrong and should be fixed. Ultimately the issue I am dealing with is that our app name is UWCU, and instead of VoiceOver pronouncing each letter, it tries to read this as a word and horribly butchers our organization's/app's name. Alternatives such as using U.W.C.U. and U W C U are not acceptable. @Apple, I know you're first response is going to be "no, it is working perfectly," but quite frankly you are wrong. I know you feel strongly about this, given your response in posts like this: https://forums.developer.apple.com/forums/thread/734545?answerId=760084022 HOWEVER, with iOS 18, your argument for "VoiceOver should read what's on the screen" doesn't hold water anymore. With iOS 18, you Apple have added a new feature that lets users customize their home screens and completely remove the name of apps. Here's your own guide: https://support.apple.com/guide/iphone/customize-apps-and-widgets-on-the-home-screen-iph385473442/ios Quoted from your guide: Make the icons bigger: Tap Large. (In large size, the names of the apps disappear.) With large icons + VoiceOver turned on, VoiceOver still reads the app name even though it has disappeared from the screen. So, your own argument "VoiceOver should read the text as it appears on the screen" is invalid, because there is NO text on the screen. If you can't tell, I'm pretty peeved about all this. There's a reason why screen readers support aria attributes to help deliver the right accessible experience. It's a simple ask for VoiceOver to do the same thing.

Accessibility & Inclusion General Speech Siri and Voice Accessibility

3

0

753

Oct ’24

SFCustomLanguageModelData.CustomPronunciation and X-SAMPA string conversion

Can anyone please guide me on how to use SFCustomLanguageModelData.CustomPronunciation? I am following the below example from WWDC23 https://wwdcnotes.com/documentation/wwdcnotes/wwdc23-10101-customize-ondevice-speech-recognition/ While using this kind of custom pronunciations we need X-SAMPA string of the specific word. There are tools available on the web to do the same Word to IPA: https://openl.io/ IPA to X-SAMPA: https://tools.lgm.cl/xsampa.html But these tools does not seem to produce the same kind of X-SAMPA strings used in demo, example - "Winawer" is converted to "w I n aU @r". While using any online tools it gives - "/wI"nA:w@r/".

Media Technologies Audio Speech

4

2

594

Dec ’24

"AVSpeechSynthesisVoice" choppy at start.

So, I'm trying to create my own text-to-speech setup. Problem I'm having is whenever I do a test run, the speech gets a bit choppy at the start kind of skipping over maybe a word or a few characters. A few details: I've essentially built a separate class for handling the speech events. AVSpeechSynthesizer is set up as a private variable for the class so I don't expect deallocation to be the issue. Especially since it's a problem at the start. I've got a queue set up for what it's worth so that shouldn't be a problem. I'd appreciate any advice.

App & System Services General Accessibility AVFoundation Speech

2

0

368

Oct ’24

Where can I find documentation for Apple's implementation of SMSL?

Hello. I can't find anything about the SSML that is used in Apple's speech synthesis. SSML from Google, Amazon and W3C either don't work or work incorrectly. Where is Apple's documentation for their implementation of SSML?

Machine Learning & AI General Speech

0

429

Oct ’24

Speech diarization

Have there been any hints that Apple may offer speech diarization services (speech recongnition which recognizes multiple speakers) in the near future?

App & System Services General Speech

2

0

499

Oct ’24

(iOS 18) SFSpeechRecognitionResult providing new text after a gap in speaking

Here is the demo from Apple's site This issues is specific to iOS 18. When running this demo, we are getting new text when we have a gap in speaking, the recognitionTask(with:resultHandler:) provides new text which is only spoken after the gap and not the concatenation of old text and the new spoken text.

Media Technologies Audio Speech AVAudioEngine

5

0

864

Dec ’24

Post

Replies

Boosts

Views

Activity

Speech

Posts under Speech tag

Post

Replies

Boosts

Views

Activity