How to chain speech utterances

Hi, I spoke to someone during the audio lab today about audio output for Chinese dialects, so I can now get Cantonese and Mandarin output. But when I try to read the same line of text in both dialects, the two utterances overlap. What do I need to change in my code to create a pause between the two utterances rather than having both read at the same time?

Code Block
import AVFoundation

// Cantonese utterance, spoken on its own synthesizer
let cantoUtterance = AVSpeechUtterance(string: chinese)
//utterance.voice = AVSpeechSynthesisVoice(language: "zh-HK")
cantoUtterance.voice = AVSpeechSynthesisVoice(identifier: "com.apple.ttsbundle.Sin-Ji-compact")
// utterance.rate = 0.1
let cantoSynthesizer = AVSpeechSynthesizer()
cantoSynthesizer.speak(cantoUtterance)

// Mandarin utterance, spoken on a second synthesizer at the same time
let mandoUtterance = AVSpeechUtterance(string: chinese)
mandoUtterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")
let mandoSynthesizer = AVSpeechSynthesizer()
mandoSynthesizer.speak(mandoUtterance)

You should use one synthesizer for these two utterances rather than two synthesizers (one for Mandarin and one for Cantonese). The voice and language are set at the utterance level, so you can queue multiple utterances with different voices on the same synthesizer, and they will be spoken one after another instead of interrupting each other.

Code Block swift
let synth = AVSpeechSynthesizer()
let cantoUtterance = AVSpeechUtterance(string: chinese)
cantoUtterance.voice = AVSpeechSynthesisVoice(identifier: "com.apple.ttsbundle.Sin-Ji-compact")
synth.speak(cantoUtterance)
let mandoUtterance = AVSpeechUtterance(string: chinese)
mandoUtterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")
synth.speak(mandoUtterance)

Make sure that your synthesizer is retained until the speech finishes as well; consider setting it as a property on your class.
I just tried this, and only the first utterance is spoken, in Cantonese; there is no audio output for the second Mandarin utterance. Do you know why this is?
I'm not sure without more detail. If you post some sample code that reproduces the issue, I can try to investigate further. I did just test this on the iOS 14 beta: calling speak on one synthesizer with two utterances of the string "Test", each with a different voice (one zh-CN and one zh-HK), worked as expected, and both strings were spoken by their respective voices in succession.
I tried the code you posted above:

Code Block
let synth = AVSpeechSynthesizer()
let cantoUtterance = AVSpeechUtterance(string: chinese)
cantoUtterance.voice = AVSpeechSynthesisVoice(identifier: "com.apple.ttsbundle.Sin-Ji-compact")
synth.speak(cantoUtterance)
let mandoUtterance = AVSpeechUtterance(string: chinese)
mandoUtterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")
synth.speak(mandoUtterance)

and on my Watch I only hear the first utterance.
Accepted Answer
Depending on how you are using this code, it's possible that the synthesizer is being deallocated. You need to retain the synthesizer until both utterances have been spoken. Set the synthesizer up as a property on your class, initialize it in an init method, and then use it whenever you want to speak your utterances. If you put all of this in one code block, the synthesizer will be deallocated as soon as the block ends, which would explain why your remaining utterances are not spoken.

If you are sure your synthesizer is being retained, please file a bug through Feedback Assistant, include a sample project that reproduces the issue, and paste the bug number here.
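
A minimal sketch of that setup, assuming a hypothetical wrapper class (the name SpeechCoordinator is only illustrative) and reusing the voice identifiers from this thread; the postUtteranceDelay line is optional and is only there for the pause asked about in the original question:

Code Block swift
import AVFoundation

// Hypothetical wrapper type; the name is only illustrative.
final class SpeechCoordinator: NSObject, AVSpeechSynthesizerDelegate {
    // Stored property: the synthesizer lives as long as this object,
    // so it is not deallocated while utterances are still queued.
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speakBothDialects(_ chinese: String) {
        let cantoUtterance = AVSpeechUtterance(string: chinese)
        cantoUtterance.voice = AVSpeechSynthesisVoice(identifier: "com.apple.ttsbundle.Sin-Ji-compact")
        // Optional: a short silence after the Cantonese utterance.
        cantoUtterance.postUtteranceDelay = 0.5

        let mandoUtterance = AVSpeechUtterance(string: chinese)
        mandoUtterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")

        // One synthesizer queues the utterances so they are spoken in order.
        synthesizer.speak(cantoUtterance)
        synthesizer.speak(mandoUtterance)
    }

    // Fires once per utterance as the queue drains, so you can tell
    // when both have actually been spoken.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        print("Finished speaking: \(utterance.speechString)")
    }
}

Hold on to a single SpeechCoordinator instance (for example, as a property of your view controller or interface controller) and call speakBothDialects(_:) on it; because the synthesizer is a stored property, it stays alive until the delegate reports that both utterances have finished.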
Thanks - I created the synthesizer as a property and it is no longer being deallocated after the first utterance.