[TTS] willSpeakRangeOfSpeechString wrong characterRange

Xcode 11.4.1, running on simulators (iOS 13.4.1).


With the following speech code (Swift), the 'willSpeakRangeOfSpeechString' delegate method reports a wrong character range:

import AVFoundation

let speechSynthesizer = AVSpeechSynthesizer()
let speechUtterance = AVSpeechUtterance(string: "VIA, .\nNeedless to say, 2020 is off to an unprecedented start.")
speechUtterance.voice = AVSpeechSynthesisVoice(language: "en-US")

speechUtterance.rate = 0.40
speechUtterance.pitchMultiplier = 0.50
speechUtterance.volume = 0.75

speechSynthesizer.speak(speechUtterance)


The range passed to willSpeakRangeOfSpeechString is wrong when the synthesizer speaks "2020".

Below are the character range and the word it references for each callback:

character range: {0, 3} -> 'VIA'
character range: {3, 1} -> ','
character range: {5, 1} -> '.'
character range: {7, 8} -> 'Needless'
character range: {16, 2} -> 'to'
character range: {19, 4} -> 'say,'
character range: {47, 4} -> 'cede' // ERROR: expected {24, 4} -> '2020', but a wrong character range is sent back
character range: {29, 2} -> 'is'
character range: {32, 3} -> 'off'
character range: {36, 2} -> 'to'
character range: {39, 2} -> 'an'
character range: {42, 13} -> 'unprecedented'
character range: {56, 6} -> 'start.'
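
For reference, a listing like the one above can be produced by a delegate along these lines. This is only a minimal sketch: `RangeLogger` and `spokenText(in:range:)` are hypothetical names that are not part of the original project, and the bounds check is an assumption added so that a bad range prints a placeholder instead of raising an exception.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Returns the text covered by `range` (UTF-16 units, as NSRange counts them),
/// or nil when the range does not fit inside `speechString`.
func spokenText(in speechString: String, range: NSRange) -> String? {
    let text = NSString(string: speechString)
    guard range.location != NSNotFound,
          range.location >= 0,
          range.length >= 0,
          range.length <= text.length - range.location else { return nil }
    return text.substring(with: range)
}

#if canImport(AVFoundation)
/// Hypothetical helper: logs every range the synthesizer reports.
final class RangeLogger: NSObject, AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        let word = spokenText(in: utterance.speechString, range: characterRange) ?? "<out of range>"
        print("character range: {\(characterRange.location), \(characterRange.length)} -> '\(word)'")
    }
}
#endif
```

To use it, assign an instance to `speechSynthesizer.delegate` before calling `speak(_:)` and keep a strong reference to it, since the delegate property is weak.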

Interestingly, if we change the voice language, the character range sent back is correct:

speechUtterance.voice = AVSpeechSynthesisVoice(language: "it-IT")


Is this a bug in the speech framework on simulators?

Thanks

  • There is no error. Character range {47, 4} -> 'cede' is indeed correct if character range {42, 13} -> 'unprecedented'.


Replies

I've created a unit test for it, and it seems this issue has been fixed in Xcode 11.5.

This bug is still present in Xcode 11.6 beta on macOS 10.15.5.

Considering the properly set-up delegate method:
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer willSpeakRangeOfSpeechString:(NSRange)characterRange utterance:(AVSpeechUtterance *)utterance...

In my case, when the system is speaking "....advisable to use the real....",
the word 'use' causes the character range to drift:

2020-07-03 10:34:53.965403-0700 parsing[5102:78169] word: advisable
2020-07-03 10:34:54.690781-0700 parsing[5102:78169] word: to
2020-07-03 10:34:54.840140-0700 parsing[5102:78169] word: the
2020-07-03 10:34:55.213452-0700 parsing[5102:78169] word: rea

Without code to catch the issue, I hit an exception when attempting to style text that shouldn't be out of range.
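
A guard of roughly this shape is one way to catch the issue before styling. It is a sketch under assumptions: `clampedRange(_:toLength:)` is a hypothetical helper, and the drifted ranges below are made-up values for illustration, not ones taken from the log.

```swift
import Foundation

/// Clamps `range` to a string of `length` UTF-16 units.
/// Returns nil when no part of the range lies inside the string.
func clampedRange(_ range: NSRange, toLength length: Int) -> NSRange? {
    guard range.location != NSNotFound,
          range.location >= 0,
          range.length > 0,
          range.location < length else { return nil }
    let safeLength = min(range.length, length - range.location)
    return NSRange(location: range.location, length: safeLength)
}

let displayedText = "advisable to use the real"
let displayedLength = displayedText.utf16.count

// Hypothetical ranges: one valid, one running past the end, one fully outside.
let reportedRanges = [
    NSRange(location: 21, length: 4),
    NSRange(location: 23, length: 4),
    NSRange(location: 30, length: 4),
]

for reported in reportedRanges {
    if let safe = clampedRange(reported, toLength: displayedLength) {
        // Safe to pass to an attributed-string styling call here.
        print("style {\(safe.location), \(safe.length)}")
    } else {
        print("skip {\(reported.location), \(reported.length)}")
    }
}
```

Clamping only avoids the exception. If the reported range has drifted, the highlight can still land on the wrong characters.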

Changing "to use" to "using" works around the display bug, but it means changing text that I should not have to change...

Could you clarify whether this is running on macOS or iOS, and if on iOS, which voice is being used? Based on your code it looks like we will choose Samantha. Does this reproduce on the latest iOS 14 beta?

If this does reproduce in iOS 14, it'd be helpful if you could file a bug through Feedback Assistant and include a sample project that reproduces the issue. If you post your feedback ID here I will take a look. Ranges can become incorrect for a number of reasons, the biggest being rules that are applied to ensure the text is pronounced correctly. We do our best to keep track of when this happens and adjust the ranges before they are sent back to the client, but it's always possible we still have some bugs lying around. We did make some improvements to that logic this year though.
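
To illustrate why pronunciation rules make range tracking hard, here is a toy sketch. It is purely hypothetical and not how the TTS framework is implemented: `Segment`, `normalize(_:rules:)`, `originalRange(forNormalizedLocation:in:)`, and the "2020" rule are all assumptions made up for this example. The idea is that once the text has been rewritten for pronunciation, positions in the rewritten text no longer match the original string, so each rewritten span has to remember where it came from.

```swift
import Foundation

// Hypothetical illustration only; not the actual framework logic.

/// Links a span of the rewritten (spoken) text to the original span it came from.
struct Segment {
    let normalized: NSRange
    let original: NSRange
}

/// Rewrites space-separated words using `rules` and records each word's origin.
func normalize(_ text: String, rules: [String: String]) -> (text: String, segments: [Segment]) {
    var output = ""
    var segments: [Segment] = []
    var originalLocation = 0
    for word in text.components(separatedBy: " ") {
        let spoken = rules[word] ?? word
        segments.append(Segment(
            normalized: NSRange(location: output.utf16.count, length: spoken.utf16.count),
            original: NSRange(location: originalLocation, length: word.utf16.count)))
        output += spoken + " "
        originalLocation += word.utf16.count + 1 // +1 for the separating space
    }
    return (output, segments)
}

/// Maps a location in the rewritten text back to the range of the original word.
func originalRange(forNormalizedLocation location: Int, in segments: [Segment]) -> NSRange? {
    let match = segments.first(where: { segment in
        location >= segment.normalized.location &&
        location < segment.normalized.location + segment.normalized.length
    })
    return match?.original
}

let rewritten = normalize("say, 2020 is off", rules: ["2020": "twenty twenty"])

// The second "twenty" starts at location 12 in the rewritten text,
// which maps back to the original "2020" at {5, 4}.
if let range = originalRange(forNormalizedLocation: 12, in: rewritten.segments) {
    print("reported range: {\(range.location), \(range.length)}")
}
```

In this toy model, reporting the rewritten location directly ({12, 6}) would run past the end of the 16-character original string, which is the kind of out-of-range value described earlier in the thread.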

Also, a tip: for future TTS questions, it'd be helpful if you could add the Accessibility tag. It's not a requirement, but it improves the chances that the question will be seen by an engineer working on the TTS frameworks.