Reply to AVSpeechSynthesizer buffer conversion, write format bug?
Adding the missing method and calls:

var outBuffer: AVAudioPCMBuffer
// outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
outBuffer = self.convertSpeechBufferToFloatStereo(pcmBuffer) // doesn't work
// outBuffer = pcmBuffer // original format does work

func convertSpeechBufferToFloatStereo(_ inSource: AVAudioPCMBuffer) -> AVAudioPCMBuffer {
    /* The macOS speech buffer is int32ChannelData:
       convert from int32ChannelData to floatChannelData and
       duplicate the left channel into the right. */
    let numSamples = AVAudioFrameCount(inSource.frameLength)
    let sampleRate = inSource.format.sampleRate
    let outFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                  sampleRate: sampleRate,
                                  channels: AVAudioChannelCount(2),
                                  interleaved: false)
    let outSource = AVAudioPCMBuffer(pcmFormat: outFormat!, frameCapacity: numSamples)!
    outSource.frameLength = numSamples // frameLength must be set or no data is written to disk

    let sourceChannels = UnsafeBufferPointer(start: inSource.int32ChannelData,
                                             count: Int(inSource.format.channelCount))
    let destinChannels = UnsafeBufferPointer(start: outSource.floatChannelData,
                                             count: Int(outSource.format.channelCount))
    let sourceLeftChan  = sourceChannels[0]
    let destinLeftChan  = destinChannels[0]
    let destinRightChan = destinChannels[1]

    for index in 0 ..< Int(numSamples) {
        // Normalize Int32 to Float in [-1.0, +1.0]
        // Int32.max: 2147483647, Int32.min: -2147483648
        // let sample = Int32(bigEndian: sourceLeftChan[index])
        let sample = sourceLeftChan[index]
        let floatVal = Float(sample) / Float(Int32.max)
        destinLeftChan[index] = floatVal
        destinRightChan[index] = floatVal
    }
    return outSource
}
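As an aside, the format change (and the sample-rate change that resampleBuffer was meant to handle) can in principle be delegated to AVAudioConverter. The sketch below is only illustrative, not the code running above: the helper name is mine, and I have not verified how the converter upmixes the mono speech buffer to stereo.

import AVFoundation

// Hypothetical helper; not the resampleBuffer referenced above.
func convertBuffer(_ inBuffer: AVAudioPCMBuffer, toSampleRate newRate: Double) -> AVAudioPCMBuffer? {
    // Target: non-interleaved Float32 stereo at the requested rate.
    guard let outFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                        sampleRate: newRate,
                                        channels: 2,
                                        interleaved: false),
          let converter = AVAudioConverter(from: inBuffer.format, to: outFormat) else { return nil }

    // Size the output buffer for the sample-rate change.
    let ratio = newRate / inBuffer.format.sampleRate
    let capacity = AVAudioFrameCount((Double(inBuffer.frameLength) * ratio).rounded(.up))
    guard let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: capacity) else { return nil }

    // Hand the converter the single input buffer once, then report end of stream.
    var consumed = false
    var error: NSError?
    let status = converter.convert(to: outBuffer, error: &error) { _, outStatus in
        if consumed {
            outStatus.pointee = .endOfStream
            return nil
        }
        consumed = true
        outStatus.pointee = .haveData
        return inBuffer
    }
    return (status == .error || error != nil) ? nil : outBuffer
}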
Jul ’21
Reply to How to make AVSpeechSynthesizer work for write and delegate (Big Sur)
You are 100% correct. The utterance was spoken aloud to completion, but nothing else worked. I mistakenly thought the process would remain in memory until speech completed and the associated delegates fired. For those who may need a working solution, here it is.

import Cocoa
import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func isSandboxEnvironment() -> Bool {
        let environ = ProcessInfo.processInfo.environment
        return (environ["APP_SANDBOX_CONTAINER_ID"] != nil)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Utterance didFinish")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        print("speaking range: \(characterRange)")
    }

    func selectVoice(targetSpeaker: String, defLangCode: String) -> AVSpeechSynthesisVoice {
        var usedVoice = AVSpeechSynthesisVoice(language: defLangCode) // should be the default voice
        let userCode = AVSpeechSynthesisVoice.currentLanguageCode()
        let voices = AVSpeechSynthesisVoice.speechVoices()
        for voice in voices {
            // print("\(voice.identifier) \(voice.name) \(voice.quality) \(voice.language)")
            if (voice.name.lowercased() == targetSpeaker.lowercased()) {
                usedVoice = AVSpeechSynthesisVoice(identifier: voice.identifier)
                break
            }
        }
        // ensure we return a valid voice
        if (usedVoice == nil) { usedVoice = AVSpeechSynthesisVoice(language: userCode) }
        return usedVoice!
    }

    func speak(_ string: String, speaker: String) {
        let utterance = AVSpeechUtterance(string: string)
        utterance.voice = selectVoice(targetSpeaker: speaker, defLangCode: "en-US")
        synth.speak(utterance)
    }

    func writeToBuffer(_ stringToSpeak: String, speaker: String) {
        print("entering writeToBuffer")
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        utterance.voice = selectVoice(targetSpeaker: speaker, defLangCode: "en-US")
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            print("executing synth.write")
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if (pcmBuffer.frameLength == 0) {
                print("buffer is empty")
            } else {
                print("buffer has content \(buffer)")
            }
        }
    }

    func writeToFile(_ stringToSpeak: String, speaker: String) {
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        var output: AVAudioFile?
        let desktop = "~/Desktop"
        let fileName = "Utterance_Test.caf" // not in sandbox
        var tempPath = desktop + "/" + fileName
        tempPath = (tempPath as NSString).expandingTildeInPath
        // if sandboxed, it goes in the container
        if (isSandboxEnvironment()) { tempPath = "Utterance_Test.caf" }
        utterance.voice = selectVoice(targetSpeaker: speaker, defLangCode: "en-US")
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if (pcmBuffer.frameLength == 0) {
                // done
            } else {
                // append buffer to file
                if (output == nil) {
                    let bufferSettings = utterance.voice?.audioFileSettings
                    output = try! AVAudioFile(forWriting: URL(fileURLWithPath: tempPath), settings: bufferSettings!)
                }
                try! output?.write(from: pcmBuffer)
            }
        }
    }
}

class ViewController: NSViewController {
    let speechDelivery = SpeakerTest()

    override func viewDidLoad() {
        super.viewDidLoad()
        let targetSpeaker = "Allison"
        var sentenceToSpeak = "This writes to buffer and disk. "
        sentenceToSpeak += "Also, 'didFinish' and 'willSpeakRangeOfSpeechString' delegates fire."
        speechDelivery.writeToBuffer(sentenceToSpeak, speaker: targetSpeaker)
        speechDelivery.speak(sentenceToSpeak, speaker: targetSpeaker)
        speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
    }

    override var representedObject: Any? {
        didSet {
            // Update the view, if already loaded.
        }
    }
}
Jun ’21
Reply to How to observe AVKit (AVPlayer) player rate >= 2.0?
I am still seeking a resolution: an alternative method or property to observe that identifies fast-forward behavior.

https://developer.apple.com/forums/thread/663489 ("AVPlayer.timeControlStatus and AVPlayer.rate are wrong during fast forward or backward")

I found the post above, which describes the same behavior. When fast-forward is engaged, the reported rate is zero, although playback is clearly not paused and is moving at a fast rate. Unfortunately, I have not identified a solution.
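For anyone comparing notes, this is the kind of observation I have been using. It is only a minimal sketch, assuming `player` is the AVPlayer backing the AVPlayerView: it reports the stepped 1.1x–2.0x rates fine, but drops to 0.0 once the multi-speed fast-forward is engaged.

import AVKit

final class RateWatcher {
    private var rateObservation: NSKeyValueObservation?
    private var statusObservation: NSKeyValueObservation?

    func watch(_ player: AVPlayer) {
        // Reports 1.1, 1.2 ... 2.0 for option-clicked rates, but 0.0 during command-click fast-forward.
        rateObservation = player.observe(\.rate, options: [.new]) { _, change in
            print("rate:", change.newValue ?? 0)
        }
        // timeControlStatus also appears misleading while fast-forward is active (per the linked thread).
        statusObservation = player.observe(\.timeControlStatus, options: [.new]) { player, _ in
            print("timeControlStatus:", player.timeControlStatus.rawValue)
        }
    }
}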
Jun ’21
Reply to How to observe AVKit (AVPlayer) player rate >= 2.0?
Oddly, I can't edit the post to fix the formatting (missing newlines), and the image didn't show up, so here is a description of the fast-forward control I mean. Option-clicking it increments the rate by 0.1, from 1.0 up to 2.0 (e.g. 1.1x, 1.2x, 1.3x). Command-clicking it cycles through 2x, 5x, 10x, 30x, and 60x. The property observer reports back 0.0 once you command-click.
Jun ’21
Reply to Speech Recognition and how to utilize SFTranscriptionSegment to track recognition timestamps?
Conclusion: This can't be done with SFSpeechURLRecognitionRequest(url:). You must utilize SFSpeechAudioBufferRecognitionRequest().

The solution is to use SFSpeechAudioBufferRecognitionRequest and read the audio into a buffer, then either shift the entire audio block left after every recognition (left-trimming, removing the previously recognized speech segment) or feed SFSpeechAudioBufferRecognitionRequest 60-second snippets of audio. Also, because progress reporting didn't work, I used the running count of recognized audio relative to the total length of the audio to determine the progress.

Requirement: You must keep a rolling count of where your segments were found in order to track your position.

Caveats: If the request is short (say, 30 seconds), recognition will not proceed, so a short segment must be padded with silence and adjusted for in your accounting. If the audio buffer contains blocks of more than one minute of non-speech (silence, music, unintelligible speech), you must wait for a timeout and then advance 60 seconds; otherwise you will just time out and not get any further recognition data. I have not been able to determine how to shorten the timeout, which appears to be 22 seconds.

Example: If the audio has a two-minute stretch of non-speech, you need to wait 22 seconds after the first timeout, cancel the request, advance to the next position, append the audio, and wait for the next recognition request, which is another 22-second timeout before advancing again. So if the audio contains many stretches of non-speech, this process works but is problematic in terms of processing time. Granted, a 22-second timeout is better than a 1-minute timeout. I am still tuning this process, but it does work. A minimal sketch of the snippet approach follows.
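Here is a minimal sketch of reading a file and feeding SFSpeechAudioBufferRecognitionRequest roughly 60 seconds at a time. The helper name is mine, error handling and the padding/offset/timeout bookkeeping described above are omitted, and in a real run you would wait for each task to finish (or time out) before appending the next snippet.

import AVFoundation
import Speech

// Hypothetical helper illustrating the snippet approach; names are mine, not an Apple API.
func recognizeInSnippets(url: URL, recognizer: SFSpeechRecognizer) throws {
    let file = try AVAudioFile(forReading: url)
    let format = file.processingFormat
    let snippetFrames = AVAudioFrameCount(format.sampleRate * 60)   // roughly 60 seconds per request
    var tasks: [SFSpeechRecognitionTask] = []

    while file.framePosition < file.length {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = false

        guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: snippetFrames) else { break }
        try file.read(into: buffer, frameCount: snippetFrames)
        request.append(buffer)
        request.endAudio()

        // Keep the task so it can be cancelled after the ~22 second timeout described above.
        let task = recognizer.recognitionTask(with: request) { result, _ in
            guard let result = result, result.isFinal else { return }
            for segment in result.bestTranscription.segments {
                // segment.timestamp is relative to the start of this snippet;
                // add the snippet's offset in the file to get an absolute position.
                print(segment.substring, segment.timestamp, segment.duration)
            }
        }
        tasks.append(task)
        // In practice, wait for this task to finish (or time out) before reading the next snippet.
    }
}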
May ’21
Reply to How to make AVSpeechSynthesizer work for write and delegate (Catalina)
I also considered that the callbacks didn't execute because of using the write method, so I executed only the speak method. The speech was heard; however, the callbacks were still not executed. I am at a loss for what is needed to make this work. I will gladly continue to utilize NSSpeechSynthesizer if someone has information on addSpeechDictionary. This is my only impediment in speech: getting the correct pronunciation without resorting to creating spelling workarounds for word pronunciations.
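For completeness, this is the shape of the addSpeechDictionary call I have been attempting, based on my reading of the NSSpeechSynthesizer documentation. The spelling and phoneme values below are placeholders only, and (as noted above) this has not actually changed the pronunciation for me.

import AppKit

let synth = NSSpeechSynthesizer()

// One pronunciation entry: the word as spelled and the phonemes to substitute.
let entry: [NSSpeechSynthesizer.DictionaryKey: Any] = [
    .entrySpelling: "tomato",
    .entryPhonemes: "tOmAtO"        // placeholder phoneme string
]

let speechDictionary: [NSSpeechSynthesizer.DictionaryKey: Any] = [
    .localeIdentifier: "en_US",
    .pronunciations: [entry]
]

synth.addSpeechDictionary(speechDictionary)
_ = synth.startSpeaking("I say tomato.")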
Apr ’21
Reply to How to make AVSpeechSynthesizer work for write and delegate (Catalina)
I took the plunge today and upgraded to Big Sur (11.2.3), and unfortunately neither write nor the callbacks for didFinish and willSpeakRangeOfSpeechString get called. I only upgraded to utilize this framework. Is this working for anyone, and if so, what is the secret? Is there an entitlement that must be enabled? In the writeToBuffer function, none of the print statements are executed, so synth.write(utterance) is not executing. Any advice? The delegate method is implemented as:

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Utterance didFinish")
    }
Apr ’21
Reply to How to make AVSpeechSynthesizer work for write and delegate (Catalina)
Fixed -- YES! Thank you for that information. I'm not going mad :-). Hopefully this is not a silly question, but is there a way to resolve or work around this in Catalina? A major OS upgrade as the fix is a big ask.

Also, does AVSpeechSynthesizer support offline rendering, faster than real time? It's not something I can test under Catalina, and I haven't been able to find documentation for this feature, though it does exist in NSSpeechSynthesizer.

Why? I have been using NSSpeechSynthesizer for years with an AVAudioEngine + Speech mixing workflow that records audio to a file faster than real time. I have a system and it works, but I must work around some challenges with pronunciations. I can't make the speech dictionary (addSpeechDictionary) work under NSSpeechSynthesizer (I don't know if it's a known problem or just me) and have resorted to butchered spellings to get a voice to pronounce words correctly. My hope is that I can utilize AVSpeechSynthesizer in IPA mode, or via some other method, to pronounce words correctly while also rendering to disk, all faster than real time.
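In case it helps anyone comparing notes, this is the direction I intend to try on Big Sur: an attributed utterance with the AVSpeechSynthesisIPANotationAttribute key. The word and IPA string below are placeholders, and I have not yet verified that the write path honors the attribute.

import AVFoundation

// Build an attributed string and attach an IPA pronunciation to one word.
let text = "The word tomato should use the supplied pronunciation."
let attributed = NSMutableAttributedString(string: text)
if let range = text.range(of: "tomato") {
    attributed.addAttribute(
        NSAttributedString.Key(AVSpeechSynthesisIPANotationAttribute),
        value: "tə.ˈmɑ.toʊ",            // placeholder IPA string
        range: NSRange(range, in: text))
}

let utterance = AVSpeechUtterance(attributedString: attributed)
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

let synth = AVSpeechSynthesizer()
synth.speak(utterance)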
Apr ’21