Recognize spoken words in recorded or live audio using Speech.

Posts under Speech tag

53 Posts

Post

Replies

Boosts

Views

Activity

Audio Recognition and Live captioning
Hi Apple Team, we have a technical query regarding one feature: audio recognition and live captioning. We are developing an app for the deaf community to help remove communication barriers. We want to know whether it is possible to recognize the audio being played by other applications on an iPhone and show live captions for it in our application (on iOS).
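For reference, capturing audio played by other apps is not something the sketch below addresses; it only shows the usual starting point, live transcription of the device microphone with SFSpeechRecognizer. The class and method names (LiveTranscriber, startTranscribing) are illustrative, not an established API.

import Speech
import AVFoundation

final class LiveTranscriber {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    // Streams microphone audio into a recognition request and prints partial
    // transcripts as they arrive (assumes speech and microphone permission is granted).
    func startTranscribing() throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }

        task = recognizer?.recognitionTask(with: request) { [weak self] result, error in
            if let result { print(result.bestTranscription.formattedString) }
            if error != nil { self?.stopTranscribing() }
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    func stopTranscribing() {
        audioEngine.inputNode.removeTap(onBus: 0)
        audioEngine.stop()
        request?.endAudio()
        task?.cancel()
    }
}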
0
0
471
Dec ’23
IsFormatSampleRateAndChannelCountValid false when playing outside audio
My app listens for the verbal commands "Roll" and "Skip". It was working well until I used it while listening to a podcast in another app. I am getting a crash with the error: Thread 1: "required condition is false: IsFormatSampleRateAndChannelCountValid(format)". It crashes when I am playing audio from Snipd (a podcast app) or the Apple Podcasts app; when I am playing audio from YouTube or Apple Music it does not crash. This is the code for when I start listening for the commands:

// MARK: - Speech Recognition
func startListening() {
    do {
        try configureAudioSession()
        createRecognitionRequest()
        try prepareAudioEngine()
    } catch {
        print("Audio Engine error: \(error.localizedDescription)")
    }
}

private func configureAudioSession() throws {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .measurement,
                                 options: [.interruptSpokenAudioAndMixWithOthers, .duckOthers])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
}

private func createRecognitionRequest() {
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else { return }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest,
                                                        resultHandler: handleRecognitionResult)
}

private func prepareAudioEngine() throws {
    let inputNode = audioEngine.inputNode
    inputNode.removeTap(onBus: 0)
    let inputFormat = inputNode.inputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { [weak self] (buffer, _) in
        self?.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    isActuallyListening = true
}

Thanks
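A hedged guess at a mitigation, assuming the assertion fires because the input node reports a zero sample rate or channel count while the route is being renegotiated (for example while another app's spoken audio is being interrupted): validate the tap format before installing it. This is a sketch of a drop-in variant of prepareAudioEngine() above, relying on the same properties, not a confirmed fix.

private func prepareAudioEngine() throws {
    let inputNode = audioEngine.inputNode
    inputNode.removeTap(onBus: 0)

    // The hardware format can be momentarily invalid (0 Hz / 0 channels) while the
    // audio session renegotiates the route; installing a tap with such a format is
    // what raises "IsFormatSampleRateAndChannelCountValid(format)".
    let inputFormat = inputNode.inputFormat(forBus: 0)
    guard inputFormat.sampleRate > 0, inputFormat.channelCount > 0 else {
        throw NSError(domain: "SpeechCommands", code: -1,
                      userInfo: [NSLocalizedDescriptionKey: "Input format not ready, retry later"])
    }

    inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { [weak self] buffer, _ in
        self?.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    isActuallyListening = true
}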
2
1
1.7k
Jan ’24
AVAudioEngine & AVAudioPlayer Voice Processing Volume.
As the title suggests, I am using AVAudioEngine for speech recognition input and AVAudioPlayer for sound output. Apple says in this talk https://developer.apple.com/videos/play/wwdc2019/510 that setVoiceProcessingEnabled very usefully cancels the speaker output that reaches the mic. I set voice processing on both the input and output nodes. It seems to work, but the volume is low even when the system volume is turned up. Any solution to this would be much appreciated.
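For context, a minimal sketch of how the voice-processing setup is usually wired, assuming a plain AVAudioEngine; it does not by itself solve the low-volume side effect described above.

import AVFoundation

let audioEngine = AVAudioEngine()

func enableVoiceProcessing() throws {
    // Enable echo cancellation on the engine's I/O nodes. This must happen before the
    // engine is started; voice processing applies automatic gain control, which is
    // commonly reported to make playback quieter than the plain render path.
    try audioEngine.inputNode.setVoiceProcessingEnabled(true)
    try audioEngine.outputNode.setVoiceProcessingEnabled(true)
}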
0
0
832
Dec ’23
Microphone not working in iOS simulators under macOS Sonoma 14.1.2
Hello, I am trying to test a speech-to-text feature in several iPhone simulators, but the microphones don't seem to work. The microphone and speech recognition permissions are correctly requested for the feature. My internal and external microphones are detected in the simulators' I/O options, but nothing happens when I launch recognition. Recognition also doesn't work for speech-to-text in the native Messages keyboard or in Siri. The problem is the same in all the simulators, so I believe the issue is Xcode not having microphone permission. In Settings > Privacy & Security > Microphone I can't see Xcode (for a separate issue, I also can't see Xcode Source Editor under Extensions). I've already tried uninstalling and reinstalling Xcode. I use Xcode 15.0.1 under Sonoma 14.1.2. Any help is welcome.
4
3
2.1k
Dec ’23
NSSpeechRecognitionUsageDescription not working
I have gotten an error stating: "This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data." But I have already added NSSpeechRecognitionUsageDescription to my Info.plist, and the error is still occurring. Does anyone have a solution to this?
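For comparison, the usual setup is the Info.plist usage strings plus an explicit authorization request; one common cause of the crash persisting is that the key was added to a different target's Info.plist than the one the scheme actually builds. A minimal sketch, with requestSpeechPermissions as an illustrative helper name:

import Speech
import AVFoundation

// Requires these usage strings in the built target's Info.plist:
//   NSSpeechRecognitionUsageDescription - why speech recognition is used
//   NSMicrophoneUsageDescription        - why the microphone is used
func requestSpeechPermissions(completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        let speechAllowed = (status == .authorized)
        AVAudioSession.sharedInstance().requestRecordPermission { micAllowed in
            DispatchQueue.main.async { completion(speechAllowed && micAllowed) }
        }
    }
}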
1
0
1k
Nov ’23
SFSpeechRecognizer.isAvailable returns wrong values
As of iOS 17, SFSpeechRecognizer.isAvailable returns true even when recognition tasks cannot be fulfilled and immediately fail with the error "Siri and Dictation are disabled". The same speech recognition code works as expected on iOS 16. On iOS 16, neither Siri nor Dictation needed to be enabled for speech recognition to be available, and it works as expected. In the past, once permissions were granted, only an active network connection was required for functional speech recognition. There seem to be two issues in play:
1. On iOS 17, SFSpeechRecognizer.isAvailable incorrectly returns true when it can't fulfil requests.
2. On iOS 17, Dictation or Siri must be enabled to handle speech recognition tasks, while on iOS 16 this isn't the case.
If issue 2 is expected behaviour (I surely hope not), there is no way to query whether Siri or Dictation is enabled, so those cases can't be handled properly in code to inform the user why speech recognition doesn't work.
Expected behaviour:
- Speech recognition is available when Siri and Dictation are disabled.
- SFSpeechRecognizer.isAvailable correctly returns false when no speech recognition requests can be handled.
iOS Version 17.0 (21A329), Xcode Version 15.0 (15A240d). Anyone else experiencing the same issues or have a solution? Reported this to Apple as well -> FB13235751
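A hedged interim workaround while isAvailable is unreliable: attempt the recognition anyway and treat the task's error as the source of truth, surfacing it to the user. A sketch under that assumption, with transcribe(url:with:) as an illustrative helper:

import Speech

func transcribe(url: URL, with recognizer: SFSpeechRecognizer) {
    // isAvailable may report true on iOS 17 even when Siri and Dictation are off,
    // so the error from the task itself is the more reliable signal.
    let request = SFSpeechURLRecognitionRequest(url: url)
    _ = recognizer.recognitionTask(with: request) { result, error in
        if let error {
            // e.g. "Siri and Dictation are disabled": prompt the user to enable
            // Dictation in Settings; there is no public API to query that state.
            print("Recognition failed: \(error.localizedDescription)")
            return
        }
        if let result, result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}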
1
0
1.1k
Apr ’24
Using mixToTelephonyUplink to allow speech synthesizer to be audible during a phone call
I'd like to allow the speech synthesizer to play on the device speaker while simultaneously mixing with a phone call. I've worked through a number of different configurations but am unable to find one that achieves what I'm after, or that allows mixing with a phone call at all. There is a flag, mixToTelephonyUplink, that seems to suggest at least some mixing with a phone call is possible using the speech synthesizer, but I'm currently unable to find almost any documentation about this flag besides the basic API docs. I've had some luck getting the synthesizer to always play to the speaker with the following audio session configuration, but the sound is never mixed with a phone call. Instead, it is ducked and muted while the phone call takes place. I've tried quite a few configuration combinations for the category and overrides, but nothing seems to work quite as I'd expect it to.

synthesizer.mixToTelephonyUplink = true
try? audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .defaultToSpeaker])
try? audioSession.setActive(true, options: [])
try? audioSession.overrideOutputAudioPort(.speaker)

Is there some kind of documentation for this that's off the beaten path and I'm somehow missing? I'm going to continue with guess and check, but I'm starting to think this flag, and the functionality it implies, was never fully implemented.
1
0
1.2k
Mar ’24
TTS problem iOS 17 beta
I see a lot of crashes on iOS 17 beta related to some problem in Text To Speech. Does anybody have a clue why TTS crashes? Is anybody else seeing the same problem?

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x000000037f729380
Exception Codes: 0x0000000000000001, 0x000000037f729380
VM Region Info: 0x37f729380 is not in any region.  Bytes after previous region: 3748828033  Bytes before following region: 52622617728
      REGION TYPE             START - END          [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC_NANO         280000000-2a0000000      [512.0M] rw-/rwx SM=PRV
--->  GAP OF 0xd20000000 BYTES
      commpage (reserved) fc0000000-1000000000     [  1.0G] ---/--- SM=NUL  ...(unallocated)
Termination Reason: SIGNAL 11 Segmentation fault: 11
Terminating Process: exc handler [36389]
Triggered by Thread:  9
.....
Thread 9 name:
Thread 9 Crashed:
0  libobjc.A.dylib     0x000000019eeff248 objc_retain_x8 + 16
1  AudioToolboxCore    0x00000001b2da9d80 auoop::RenderPipeUser::~RenderPipeUser() + 112 (AUOOPRenderPipePool.mm:400)
2  AudioToolboxCore    0x00000001b2e110b4 -[AUAudioUnit_XPC internalDeallocateRenderResources] + 92 (AUAudioUnit_XPC.mm:904)
3  AVFAudio            0x00000001bfa4cc04 AUInterfaceBaseV3::Uninitialize() + 60 (AUInterface.mm:524)
4  AVFAudio            0x00000001bfa894bc AVAudioEngineGraph::PerformCommand(AUGraphNodeBaseV3&, AVAudioEngineGraph::ENodeCommand, void*, unsigned int) const + 772 (AVAudioEngineGraph.mm:3317)
5  AVFAudio            0x00000001bfa93550 AVAudioEngineGraph::_Uninitialize(NSError**) + 132 (AVAudioEngineGraph.mm:1469)
6  AVFAudio            0x00000001bfa4b50c AVAudioEngineImpl::Stop(NSError**) + 396 (AVAudioEngine.mm:1081)
7  AVFAudio            0x00000001bfa4b094 -[AVAudioEngine stop] + 48 (AVAudioEngine.mm:193)
8  TextToSpeech        0x00000001c70b3c5c __55-[TTSSynthesisProviderAudioEngine renderSpeechRequest:]_block_invoke + 1756 (TTSSynthesisProviderAudioEngine.m:613)
9  libdispatch.dylib   0x00000001ae4b0740 _dispatch_call_block_and_release + 32 (init.c:1519)
10 libdispatch.dylib   0x00000001ae4b2378 _dispatch_client_callout + 20 (object.m:560)
11 libdispatch.dylib   0x00000001ae4b990c _dispatch_lane_serial_drain + 748 (queue.c:3885)
12 libdispatch.dylib   0x00000001ae4ba470 _dispatch_lane_invoke + 432 (queue.c:3976)
13 libdispatch.dylib   0x00000001ae4c5074 _dispatch_root_queue_drain_deferred_wlh + 288 (queue.c:6913)
14 libdispatch.dylib   0x00000001ae4c48e8 _dispatch_workloop_worker_thread + 404 (queue.c:6507)
...
Thread 9 crashed with ARM Thread State (64-bit):
    x0: 0x0000000283309360   x1: 0x0000000000000000   x2: 0x0000000000000000   x3: 0x00000002833093c0
    x4: 0x00000002833093c0   x5: 0x0000000101737740   x6: 0x0000000000000013   x7: 0x00000000ffffffff
    x8: 0x0000000283309360   x9: 0x3c788942d067009a  x10: 0x0000000101547000  x11: 0x0000000000000000
   x12: 0x00000000000007fb  x13: 0x00000000000007fd  x14: 0x000000001ee24020  x15: 0x0000000000000020
   x16: 0x0000b1037f729360  x17: 0x000000037f729360  x18: 0x0000000000000000  x19: 0x0000000000000000
   x20: 0x00000001016a8de8  x21: 0x0000000283e21d00  x22: 0x0000000283b3f1f8  x23: 0x0000000283098000
   x24: 0x00000001bfb4fc35  x25: 0x00000001bfb4fc43  x26: 0x000000028033a688  x27: 0x0000000280c93090
   x28: 0x0000000000000000   fp: 0x000000016fc86490   lr: 0x00000001b2da9d80
    sp: 0x000000016fc863e0   pc: 0x000000019eeff248 cpsr: 0x1000
   esr: 0x92000006 (Data Abort) byte read Translation fault
21
2
6.8k
Jan ’24
SFSpeechRecognitionResult discards previous transcripts with on-device option set to true
Hi everyone, I might need some help with on-device recognition. It seems that the speech recognition task will discard whatever it has transcribed once a new sentence starts (or once it believes a new sentence has started) during a single audio session, when requiresOnDeviceRecognition is set to true. This doesn't happen with requiresOnDeviceRecognition set to false. System environment: macOS 14 with Xcode 15, deploying to iOS 17. Thank you all!
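One hedged workaround, assuming the on-device recognizer really does restart its transcription at what it takes to be sentence boundaries: keep an accumulator of your own and bank the previous segment when the partial result suddenly shrinks, instead of displaying only the latest bestTranscription. A sketch; the shrink heuristic is an assumption, not documented behaviour.

import Speech

final class TranscriptAccumulator {
    private(set) var bankedText = ""       // segments already banked as complete
    private(set) var currentSegment = ""   // the in-progress utterance

    var fullTranscript: String {
        bankedText.isEmpty ? currentSegment : bankedText + " " + currentSegment
    }

    // Feed every SFSpeechRecognitionResult from the result handler here.
    func consume(_ result: SFSpeechRecognitionResult) {
        let text = result.bestTranscription.formattedString
        // Heuristic: a partial that is much shorter than the previous one usually
        // means the recognizer started over on a new sentence, so bank what we had.
        if !currentSegment.isEmpty && text.count < currentSegment.count / 2 {
            bankedText = fullTranscript
        }
        currentSegment = text
    }
}

In the recognition result handler, call consume(_:) with each result and display fullTranscript instead of the raw bestTranscription.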
13
4
1.7k
Oct ’24
Failure of speech recognition when "requiresOnDeviceRecognition" is set to "True".
I am using SFSpeechRecognizer to perform speech recognition, but I am getting the following error:

[SpeechFramework] -[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"

Setting requiresOnDeviceRecognition to False works correctly, but previously it worked with True with no error. The value of supportsOnDeviceRecognition was True, so the device reports that it supports on-device recognition. iPad Pro 11-inch, iOS 16.5. Is this expected behavior?
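One defensive pattern, assuming the 1101 error is specific to the on-device path on this OS/device combination: retry the same request without requiresOnDeviceRecognition when the on-device task fails. A sketch with an illustrative helper recognizeFile(at:using:); the fallback obviously gives up the on-device privacy guarantee.

import Speech

func recognizeFile(at url: URL, using recognizer: SFSpeechRecognizer,
                   onDevice: Bool = true,
                   completion: @escaping (Result<String, Error>) -> Void) {
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = onDevice && recognizer.supportsOnDeviceRecognition

    _ = recognizer.recognitionTask(with: request) { result, error in
        if let error {
            if request.requiresOnDeviceRecognition {
                // On-device path failed (e.g. kAFAssistantErrorDomain code 1101):
                // retry once over the server path.
                recognizeFile(at: url, using: recognizer, onDevice: false, completion: completion)
            } else {
                completion(.failure(error))
            }
            return
        }
        if let result, result.isFinal {
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}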
2
0
2.0k
Feb ’24
Error thrown while using the speech recognition service in my app
Recently I updated to Xcode 14.0. I am building an iOS app to convert recorded audio into text. I got an exception while testing the application in the simulator (iOS 16.0):

[SpeechFramework] -[SFSpeechRecognitionTask handleSpeechRecognitionDidFailWithError:]_block_invoke Ignoring subsequent recongition error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
Error Domain=kAFAssistantErrorDomain Code=1107 "(null)"

I would like to know what these error codes mean and why this error occurred.
19
3
9.6k
Feb ’24