Speech Recognition Problem in iOS 18.0

It looks like Apple has added some new API(s) to SFSpeechRecognizer. My app, which is currently listed on the App Store, features speech recognition, yet using it under iOS 18.0 throws errors:

-[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"

What happens is that after several words are transcribed and displayed, the next sentence makes the previous words disappear. That's probably what the "Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"" portion of the error text means.

The problem occurs ONLY when the app is running under iOS 18.0. Even when it's compiled in Xcode 16.0, everything works fine under iOS 17.5. Any suggestions?
I may have come up with a solution for now. I looked closer into SFSpeechRecognitionResult -> SFSpeechRecognitionMetadata and saw that there is a property, speechDuration.
It turns out that speechDuration reports how long the previous utterance was, while it defaults to nil as speech is still coming in. So with that, I created another published var, accumulatedTranscript, and checked whether speechDuration != nil; if so, I append whatever the current transcript is, then reset the transcript to an empty string (to clear out the UI's text).
For the UI I'm using a combined var of accumulatedTranscript + transcript to give the appearance of a continuous stream of text. And from my screenshots you can see it will use the last transcript/final result that comes in after the pause.
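For reference, here's a minimal sketch of that approach, assuming a Combine-style ObservableObject driving the UI; the class and property names are hypothetical, and only speechRecognitionMetadata?.speechDuration comes from the actual Speech API:

import Combine
import Speech

// Hypothetical accumulator illustrating the workaround described above.
final class TranscriptAccumulator: ObservableObject {
    @Published var accumulatedTranscript = ""   // finalized earlier utterances
    @Published var transcript = ""              // utterance currently in flight

    // What the UI binds to: earlier utterances plus the live transcript.
    var displayText: String {
        accumulatedTranscript.isEmpty ? transcript : accumulatedTranscript + " " + transcript
    }

    // Call this from the recognition task's result handler.
    func handle(_ result: SFSpeechRecognitionResult) {
        let text = result.bestTranscription.formattedString
        // speechRecognitionMetadata is nil while speech is still coming in;
        // a non-nil speechDuration means the previous utterance just ended.
        if result.speechRecognitionMetadata?.speechDuration != nil {
            accumulatedTranscript += (accumulatedTranscript.isEmpty ? "" : " ") + text
            transcript = ""   // clear the live portion of the UI
        } else {
            transcript = text
        }
    }
}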
Some things worth noting:
- I haven't seen iOS 17 display a non-nil speech duration, so this solution shouldn't affect how iOS 17 works, but there may be edge cases I'm not able to think of right now.
- The newly appended transcript will begin with a capital letter; you'll want to deal with this however suits your app (for me, I'll just lowercase everything past the first word, since the pause timer is finicky; see the sketch after this list).
- I haven't done a robust test of this solution yet; so far I've tested only on the iOS 18 simulator, an iOS 18 physical device, and the iOS 17 simulator.
- I'm not sure how this workaround will interact with any changes Apple might make to address this, so keep that in mind.
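Here's a hedged sketch of that capitalization cleanup, under the same assumptions as the code above (the helper name is made up):

// Lowercase the leading letter of a chunk being appended mid-stream,
// since recognition restarts each utterance with a capital.
func normalizedChunk(_ text: String, isFirstChunk: Bool) -> String {
    guard !isFirstChunk, let first = text.first else { return text }
    return first.lowercased() + text.dropFirst()
}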
Thanks for those bug numbers (FB15166325, FB15192539). Those are both quite new, filed within the last few days, so there's no news to report on that front yet.
Does anyone have a bug they filed earlier in the beta cycle?
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
@DTS Engineer
Another one: FB15245186, though it's even more recent than the previous ones.
Wow, it's pretty incredible that this bug snuck into iOS 18.
There are also FB15110263 and FB15110251.
I'm experiencing the same issue on iOS 18, although it works fine on older versions. The problem is that I receive partial results, but the text disappears and comes back empty in the later repeated callbacks.
Adding the screenshot and code for reference here.
import UIKit
import Speech

public protocol SpeechRecognizerWrapperDelegate: AnyObject {
    func speechRecognitionFinished(transcription: String)
    func speechRecognitionPartialResult(transcription: String)
    func speechRecognitionRecordingNotAuthorized(statusMessage: String)
    func speechRecognitionTimedOut()
}

public class SpeechRecognizerWrapper: NSObject, SFSpeechRecognizerDelegate {
    public weak var delegate: SpeechRecognizerWrapperDelegate?

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: (LocalData.sharedInstance.UPAppLanguage == LanguageCode.Hindi.rawValue) ? "hi-IN" : "en-IN"))!
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    var notAuthorise = true
    var noAuthStatus = ""
    var allPermissionGranted: (() -> ())?

    public override init() {
        super.init()
        setupSpeechRecognition()
    }

    private func setupSpeechRecognition() {
        speechRecognizer.delegate = self
    }

    func requestAuthorization() {
        if SFSpeechRecognizer.authorizationStatus() == .authorized && AVAudioSession.sharedInstance().recordPermission == .granted {
            self.notAuthorise = false
            return
        }
        self.notAuthorise = true
        SFSpeechRecognizer.requestAuthorization { [weak self] authStatus in
            guard let self = self else { return }
            /*
             The callback may not be called on the main thread. Add an
             operation to the main queue to update the record button's state.
             */
            OperationQueue.main.addOperation {
                if authStatus != .authorized {
                    self.notAuthorise = true
                    self.noAuthStatus = ""
                    if authStatus == .denied {
                        self.noAuthStatus = "User denied access to speech recognition"
                    } else if authStatus == .restricted {
                        self.noAuthStatus = "Speech recognition restricted on this device"
                    }
                } else {
                    self.checkTheRecord()
                    self.notAuthorise = false
                }
            }
        }
    }

    func checkTheRecord() {
        switch AVAudioSession.sharedInstance().recordPermission {
        case .granted:
            // self.allPermissionGranted?()
            break
        case .denied:
            break
        case .undetermined:
            AVAudioSession.sharedInstance().requestRecordPermission({ [weak self] granted in
                if granted {
                    // self?.allPermissionGranted?()
                } else {
                    self?.notAuthorise = true
                }
            })
        default:
            break
        }
    }

    private var speechRecognitionTimeout: Timer?

    public var speechTimeoutInterval: TimeInterval = 2 {
        didSet {
            restartSpeechTimeout()
        }
    }

    private func restartSpeechTimeout() {
        speechRecognitionTimeout?.invalidate()
        speechRecognitionTimeout = Timer.scheduledTimer(timeInterval: speechTimeoutInterval, target: self, selector: #selector(timedOut), userInfo: nil, repeats: false)
    }

    public func startRecording() throws {
        // Tear down any in-flight session before starting a new one.
        if let recognitionTask = recognitionTask {
            recognitionTask.cancel()
            self.audioEngine.stop()
            self.audioEngine.inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest = nil
        }

        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        let inputNode = audioEngine.inputNode
        let mixerNode = AVAudioMixerNode()
        audioEngine.attach(mixerNode)
        audioEngine.connect(inputNode, to: mixerNode, format: nil)
        guard let recognitionRequest = recognitionRequest else { return }

        // Configure the request so that results are returned before audio recording is finished.
        recognitionRequest.shouldReportPartialResults = true

        // A recognition task represents a speech recognition session.
        // We keep a reference to the task so that it can be cancelled.
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { [weak self] result, error in
            guard let self = self else { return }
            var isFinal = false
            if let result = result {
                print("formattedString: \(result.bestTranscription.formattedString)")
                isFinal = result.isFinal
                self.delegate?.speechRecognitionPartialResult(transcription: result.bestTranscription.formattedString)
            }
            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            }
            if isFinal {
                self.delegate?.speechRecognitionFinished(transcription: result!.bestTranscription.formattedString)
                self.stopRecording()
            } else if error == nil {
                self.restartSpeechTimeout()
            } else {
                // cancel voice recognition
            }
        }

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            self?.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    @objc private func timedOut() {
        stopRecording()
        self.delegate?.speechRecognitionTimedOut()
    }

    public func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0) // Remove the tap on the bus when stopping recording.
        recognitionRequest?.endAudio()
        speechRecognitionTimeout?.invalidate()
        speechRecognitionTimeout = nil
    }
}
iOS 18.1 Beta 5 (22B5054e) seems to have resolved this issue and improved U.S. English language recognition & punctuation.
https://developer.apple.com/download/
Here's hoping its Speech framework makes it into the next release.
18.1 Beta 5 (22B5054e) does not fix it. Not quite.
@jsnbro stated above: "iOS 18.1 Beta 5 (22B5054e) seems to have resolved this issue and improved U.S. English language recognition & punctuation."
I upgraded to 22B5054e to re-test. What I'm seeing is not quite a fix: it seems to have reverted to the behavior I saw (and reported in this thread on page 1) on iOS 17.6, specifically:
- the bug does not manifest if you set requiresOnDeviceRecognition = false
- the bug does manifest if you set requiresOnDeviceRecognition = true
As before I am using Apple's SpokenWord example app to test.
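For anyone reproducing this, the toggle in question is the one-line request setting in the sample's startRecording(); here's a hedged sketch (the supportsOnDeviceRecognition guard is my addition, not part of the sample):

// Sketch of the relevant request configuration, as in Apple's SpokenWord sample.
let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
recognitionRequest.shouldReportPartialResults = true
if speechRecognizer.supportsOnDeviceRecognition {
    recognitionRequest.requiresOnDeviceRecognition = true    // bug manifests
    // recognitionRequest.requiresOnDeviceRecognition = false // bug does not manifest
}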
My first bug report here was using: (Context: iPhone 12 running 17.6.1, Xcode Version 15.4 (15F31d))
For this update: (Context: iPhone 12 running 18.1 Beta (22B5054e), Xcode Version 16.0 (16A242d))
Tagging you, @DTS Engineer. Looks like your efforts are helping.
Looks like your efforts are helping.
Nah, I’m just watching the bugs go by |-:
Seriously though folks, if you have a product that’s affected by this issue and you haven’t already filed a bug, please do so, and post your bug number here, just for the record.
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
FB15166325, FB15192539, FB15245186, FB15110263, FB15110251
@DTS Engineer
I did receive a request from Apple to clarify the framework(s):
"Apple Sep 26, 2024 at 1:53 PM Engineering has requested the following information regarding your report:
Is this with mainstream Dictation or Voice Control?"
Sure enough, I clarified that it's Dictation and that the frameworks I use are SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest, etc.
As you can see, that was a week ago, and so far I haven't heard back from them.
What surprises me is that I don't see other reports of that bug: all of the numbers except mine (FB15245186) return 'Not found'. Needless to say, if I get any response, I'll post it here.
All of the numbers except mine … return 'Not found'.
That’s expected. Feedback Assistant only shows you bugs that you filed [1]. I address this explicitly in Bug Reporting: How and Why?.
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
[1] Or members of your team, if you use that feature.
@DTS Engineer
I've read your post about the rules of Feedback Assistant. Thank you for clarifying certain points. Nevertheless, I can't help asking a question: if "Feedback Assistant only shows you bugs that you filed", then what is the purpose of the page header "Recent Similar Reports: None / Resolution: Open"?
I guess that boils down to your definition of “shows”. IMO, showing you a count of similar bugs isn’t showing you the other bugs. You can’t, for example, see the titles of those bugs, the initial problem description, the attachments, any communication with the originator, and so on. That’s the definition of “shows” that I’m using.
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
Another report here: FB15498488
Is anyone seeing progress on their submitted bugs in Feedback Assistant? I just checked mine and was disappointed to see it updated with:
Resolution: Investigation complete - Unable to diagnose with current information
I submitted details about my phone/build. I told them very specifically how to repro the bug using the Apple-written example app 'SpokenWord'. I provided an MP4 showing that app running and manifesting the bug. I provided links to other reports of this same bug (other FB* submissions) in this thread.
I'm not sure what is missing with respect to being able to diagnose it.
Is anyone else having better luck than me?
As of the latest beta (18.1 22B5075a), the 'dropping words' bug as reported here still occurs if requiresOnDeviceRecognition is set to true.