Posts

Post not yet marked as solved
0 Replies
430 Views
Could you provide guidance on how to add chapter marks to an M4A file? I've been attempting it without success. From what I've read, it requires AVMetadataKey.quickTimeUserDataKeyChapter, track.addTrackAssociation(to: ..., type: .chapterList), or both. I've looked into AVTimedMetadataGroup, but I haven't found a way to add one based on the documentation. I also haven't found anyone who has used native Swift to add chapter marks; they've always given in and used ffmpeg or some other external solution.

inputURL is the file that is being read in.
outputURL is the final file.
chapters is an array of (time: CMTime, title: String) tuples, where time is the start of each chapter and title is its name in the chapter list.
The target is macOS.

import AVFoundation

class AudioChapterCreator {

    // Function to create an audio file with chapters and a chapter list
    func createAudioFileWithChapters(inputURL: URL, outputURL: URL, chapters: [(time: CMTime, title: String)]) {
        let options = [AVURLAssetPreferPreciseDurationAndTimingKey: true]
        let asset = AVURLAsset(url: inputURL, options: options)
        let durationInSeconds = CMTimeGetSeconds(asset.duration)
        print("asset durationInSeconds: \(durationInSeconds)")

        guard let audioTrack = asset.tracks(withMediaType: .audio).first else {
            print("Error: Unable to find audio track in the asset.")
            return
        }

        // Create metadata items for chapters
        let chapterMetadataItems = chapters.map { chapter -> AVMetadataItem in
            let item = AVMutableMetadataItem()
            // this duration is just for testing
            let tempDur = CMTime(seconds: 100, preferredTimescale: 1)
            item.keySpace = AVMetadataKeySpace.quickTimeUserData
            item.key = AVMetadataKey.quickTimeUserDataKeyChapter as NSString
            item.value = chapter.title as NSString
            item.time = chapter.time
            item.duration = tempDur
            return item
        }

        // Create an AVAssetExportSession for writing the output file
        guard let exportSession = AVAssetExportSession(asset: asset, presetName: AVAssetExportPresetAppleM4A) else {
            print("Error: Unable to create AVAssetExportSession.")
            return
        }

        // Configure the AVAssetExportSession
        exportSession.outputFileType = .m4a
        exportSession.outputURL = outputURL
        exportSession.metadata = asset.metadata + chapterMetadataItems
        exportSession.timeRange = CMTimeRangeMake(start: CMTime.zero, duration: asset.duration)

        // Export the audio file
        exportSession.exportAsynchronously {
            switch exportSession.status {
            case .completed:
                print("Audio file with chapters and chapter list created successfully.")
            case .failed:
                print("Error: Failed to create the audio file.")
            case .cancelled:
                print("Export cancelled.")
            default:
                print("Export failed with unknown status.")
            }
        }
    }
}
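Since posting, the direction I've been experimenting with is AVAssetWriter: write the audio through one input, write the chapter titles through a timed-metadata input, and tie the two together with addTrackAssociation(withTrackOf:type:) using the chapterList association. I have not confirmed that the result shows up as chapters in players, and the metadata identifier and data type below are guesses on my part, so treat this as a sketch rather than a working recipe:

import AVFoundation
import CoreMedia

// Sketch only: copies the audio track and adds a timed-metadata track associated
// as the chapter list. The identifier (.quickTimeMetadataTitle) and UTF-8 data type
// are assumptions; a production version would use requestMediaDataWhenReady(on:)
// instead of the polling loops below.
func writeChapteredM4A(inputURL: URL, outputURL: URL,
                       chapters: [(time: CMTime, title: String)]) throws {
    let asset = AVURLAsset(url: inputURL)
    guard let sourceTrack = asset.tracks(withMediaType: .audio).first else { return }

    let reader = try AVAssetReader(asset: asset)
    let readerOutput = AVAssetReaderTrackOutput(track: sourceTrack, outputSettings: nil) // passthrough
    reader.add(readerOutput)

    // One timed group per chapter; each group's time range runs to the next chapter (or the end).
    let groups: [AVTimedMetadataGroup] = chapters.enumerated().map { index, chapter in
        let item = AVMutableMetadataItem()
        item.identifier = .quickTimeMetadataTitle                 // assumption
        item.dataType = kCMMetadataBaseDataType_UTF8 as String    // assumption
        item.value = chapter.title as NSString
        let end = index + 1 < chapters.count ? chapters[index + 1].time : asset.duration
        return AVTimedMetadataGroup(items: [item], timeRange: CMTimeRange(start: chapter.time, end: end))
    }
    guard let formatHint = groups.first?.copyFormatDescription() else { return }

    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .m4a)
    let audioInput = AVAssetWriterInput(mediaType: .audio, outputSettings: nil)  // passthrough
    let chapterInput = AVAssetWriterInput(mediaType: .metadata, outputSettings: nil,
                                          sourceFormatHint: formatHint)
    chapterInput.marksOutputTrackAsEnabled = false      // chapter tracks are normally disabled
    let adaptor = AVAssetWriterInputMetadataAdaptor(assetWriterInput: chapterInput)

    writer.add(audioInput)
    writer.add(chapterInput)
    // The piece I could not find before: declare the metadata track as the audio track's chapter list.
    audioInput.addTrackAssociation(withTrackOf: chapterInput,
                                   type: AVAssetTrack.AssociationType.chapterList.rawValue)

    guard writer.startWriting(), reader.startReading() else { return }
    writer.startSession(atSourceTime: .zero)

    // Chapters first (they are tiny), so the writer is not left waiting on this input.
    for group in groups {
        while !chapterInput.isReadyForMoreMediaData { Thread.sleep(forTimeInterval: 0.01) }
        _ = adaptor.append(group)
    }
    chapterInput.markAsFinished()

    // Then stream the audio samples through.
    while let sample = readerOutput.copyNextSampleBuffer() {
        while !audioInput.isReadyForMoreMediaData { Thread.sleep(forTimeInterval: 0.01) }
        if !audioInput.append(sample) { break }
    }
    audioInput.markAsFinished()

    writer.finishWriting {
        print("writer finished with status \(writer.status.rawValue), error: \(String(describing: writer.error))")
    }
}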
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
1 Replies
1.6k Views
I am attempting to use alternative pronunciations, via IPA notation, with AVSpeechSynthesizer on macOS (Big Sur 11.4). The attributed string is being ignored, so the functionality is not working. I tried this on the iOS simulator and it works properly. The India English voice pronounces the word "shame" as shy-em, so I applied the correct pronunciation, but no change was heard. I then substituted the pronunciation of a completely different word, but there was still no change. Is there something else that must be done to make this work?

AVSpeechSynthesisIPANotationAttribute

Output:

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "\U0283\U02c8e\U0361\U026am"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: ʃˈe͡ɪm

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "\U0283\U02c8e\U0361\U026am"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: ʃˈe͡ɪm

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "t\U0259.\U02c8me\U0361\U026a.do\U0361\U028a"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: tə.ˈme͡ɪ.do͡ʊ

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "t\U0259.\U02c8me\U0361\U026a.do\U0361\U028a"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: tə.ˈme͡ɪ.do͡ʊ

import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    func speakIPA_Substitution(subst: String, voice: AVSpeechSynthesisVoice) {
        let text = "It's a 'shame' it didn't work out."
        let mutAttrStr = NSMutableAttributedString(string: text)
        let range = NSString(string: text).range(of: "shame")
        let pronounceKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)
        mutAttrStr.setAttributes([pronounceKey: subst], range: range)

        let utterance = AVSpeechUtterance(attributedString: mutAttrStr)
        utterance.voice = voice
        utterance.postUtteranceDelay = 1.0

        let swiftRange = Range(range, in: text)!
        print("Attributed String: \(mutAttrStr)")
        print("Target Range: \(range)")
        print("Target String: \(text[swiftRange]), Substitution: \(subst)\n")
        synth.speak(utterance)
    }

    func customPronunciation() {
        let shame = "ʃˈe͡ɪm"          // substitute correct pronunciation
        let tomato = "tə.ˈme͡ɪ.do͡ʊ"   // completely different word's pronunciation
        let britishVoice = AVSpeechSynthesisVoice(language: "en-GB")!
        let indiaVoice = AVSpeechSynthesisVoice(language: "en-IN")!

        speakIPA_Substitution(subst: shame, voice: britishVoice)  // already correct, no substitute needed
        // pronounced incorrectly, ignoring the corrected pronunciation from the IPA notation
        speakIPA_Substitution(subst: shame, voice: indiaVoice)
        speakIPA_Substitution(subst: tomato, voice: britishVoice) // ignores substitution
        speakIPA_Substitution(subst: tomato, voice: indiaVoice)   // ignores substitution
    }
}
Posted
by MisterE.
Last updated
.
Post marked as solved
3 Replies
2.2k Views
I am unable to get AVSpeechSynthesizer to write or to acknowledge the delegate calls. I was informed this was resolved in macOS 11; that seemed like a lot to ask, but I am now running macOS 11.4 (Big Sur). My goal is to output speech faster than real time and drive the output through AVAudioEngine. First, I need to know why the write doesn't occur and why the delegates don't get called, whether I am using write or simply speaking to the default speakers in func speak(_ string: String). What am I missing? Is there a workaround?

Reference: https://developer.apple.com/forums/thread/678287

let sentenceToSpeak = "This should write to buffer and also call 'didFinish' and 'willSpeakRangeOfSpeechString' delegates."
SpeakerTest().writeToBuffer(sentenceToSpeak)
SpeakerTest().speak(sentenceToSpeak)

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Utterance didFinish")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        print("speaking range: \(characterRange)")
    }

    func speak(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        var usedVoice = AVSpeechSynthesisVoice(language: "en") // should be the default voice
        let voices = AVSpeechSynthesisVoice.speechVoices()
        let targetVoice = "Allison"
        for voice in voices {
            // print("\(voice.identifier) \(voice.name) \(voice.quality) \(voice.language)")
            if voice.name.lowercased() == targetVoice.lowercased() {
                usedVoice = AVSpeechSynthesisVoice(identifier: voice.identifier)
                break
            }
        }
        utterance.voice = usedVoice
        print("utterance.voice: \(String(describing: utterance.voice))")
        synth.speak(utterance)
    }

    func writeToBuffer(_ string: String) {
        print("entering writeToBuffer")
        let utterance = AVSpeechUtterance(string: string)
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            print("executing synth.write")
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                print("buffer is empty")
            } else {
                print("buffer has content \(buffer)")
            }
        }
    }
}
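One thing I've since noticed (not confirmed as the cause): each SpeakerTest() above is a temporary instance, so the synthesizer and its delegate can be deallocated before any buffer or delegate callback arrives. Keeping one long-lived instance would at least rule that out; a minimal sketch, with illustrative names:

// Hold the speaker in a long-lived property so the synthesizer (and its delegate)
// survive until the callbacks fire. SpeechController is just an illustrative wrapper.
final class SpeechController {
    private let speaker = SpeakerTest()   // strong reference kept for the object's lifetime

    func run() {
        let sentence = "This should write to buffer and also call the delegates."
        speaker.writeToBuffer(sentence)
        speaker.speak(sentence)
    }
}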
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
1 Replies
1.4k Views
Can you perform two or more OFFLINE speech recognition tasks simultaneously, or is this an offline limitation of SFSpeechRecognizer / SFSpeechURLRecognitionRequest? I am running on macOS Big Sur 11.5.2.

I would like to perform two or more offline speech recognition tasks simultaneously. I've executed two tasks in the same application AND executed two different applications, both using offline recognition. Once I initiate the other thread or the other application, the first recognition stops. Since the computer supports multiple threads, I planned to make use of the concurrency.

Use cases:
#1 Multiple audio or video files that I wish to transcribe -- cuts down on the wait time.
#2 Split a single large file into multiple sections and stitch the results together -- again, cuts down on the wait time.

I set on-device recognition to TRUE because my target files can be up to two hours in length. My test files are 15-30 minutes long and I have a number of them, so recognition must be done on the device.

func recognizeFile_Compact(url: NSURL) {
    let language = "en-US" // "en-GB"
    let recognizer = SFSpeechRecognizer(locale: Locale.init(identifier: language))!
    let recogRequest = SFSpeechURLRecognitionRequest(url: url as URL)

    recognizer.supportsOnDeviceRecognition = true   // ensure the DEVICE does the work -- don't send to cloud
    recognizer.defaultTaskHint = .dictation         // give a hint as dictation
    recogRequest.requiresOnDeviceRecognition = true
    recogRequest.shouldReportPartialResults = false // we don't want partial results

    var strCount = 0
    let recogTask = recognizer.recognitionTask(with: recogRequest, resultHandler: { (result, error) in
        guard let result = result else {
            print("Recognition failed, \(error!)")
            return
        }
        let text = result.bestTranscription.formattedString
        strCount += 1
        print(" #\(strCount), Best: \(text) \n")
        if result.isFinal {
            print("WE ARE FINALIZED")
        }
    })
}
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
5 Replies
2.1k Views
Is the format description AVSpeechSynthesizer reports for the speech buffer correct? When I attempt to convert it, I get back noise from two different conversion methods.

I am seeking to convert the speech buffer provided by the AVSpeechSynthesizer func write(_ utterance: AVSpeechUtterance...) method. The goal is to convert the sample type, change the sample rate, and change from a mono to a stereo buffer. I later manipulate the buffer data and pass it through AVAudioEngine. For testing purposes, I have kept the sample rate at the original 22050.0.

What have I tried? I have a method that I've been using for years named resampleBuffer that does this. When I apply it to the speech buffer, I get back noise. When I attempt to manually convert the format and go to stereo with convertSpeechBufferToFloatStereo, I get back clipped output. I tested flipping the samples to address the big-endian signed integers, but that didn't work.

The speech buffer description is:

inBuffer description: <AVAudioFormat 0x6000012862b0: 1 ch, 22050 Hz, 'lpcm' (0x0000000E) 32-bit big-endian signed integer>

import Cocoa
import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
    }

    func resampleBuffer(inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer? {
        // resample and convert mono to stereo
        var error: NSError?
        let kChannelStereo = AVAudioChannelCount(2)
        let convertRate = newSampleRate / inSource.format.sampleRate
        let outFrameCount = AVAudioFrameCount(Double(inSource.frameLength) * convertRate)
        let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: kChannelStereo)!
        let avConverter = AVAudioConverter(from: inSource.format, to: outFormat)
        let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: outFrameCount)!
        let inputBlock: AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
            outStatus.pointee = AVAudioConverterInputStatus.haveData // very important, must have
            let audioBuffer: AVAudioBuffer = inSource
            return audioBuffer
        }
        avConverter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Mastering
        avConverter?.sampleRateConverterQuality = .max
        if let converter = avConverter {
            let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
            // print("\(status): \(status.rawValue)")
            if (status != .haveData) || (error != nil) {
                print("\(status): \(status.rawValue), error: \(String(describing: error))")
                return nil // conversion error
            }
        } else {
            return nil // converter not created
        }
        // print("success!")
        return outBuffer
    }

    func writeToFile(_ stringToSpeak: String, speaker: String) {
        var output: AVAudioFile?
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        let desktop = "~/Desktop"
        let fileName = "Utterance_Test.caf" // not in sandbox
        var tempPath = desktop + "/" + fileName
        tempPath = (tempPath as NSString).expandingTildeInPath

        let usingSampleRate = 22050.0 // 44100.0
        let outSettings = [
            AVFormatIDKey: kAudioFormatLinearPCM, // kAudioFormatAppleLossless
            AVSampleRateKey: usingSampleRate,
            AVNumberOfChannelsKey: 2,
            AVEncoderAudioQualityKey: AVAudioQuality.max.rawValue
        ] as [String: Any]

        // temporarily ignore the speaker and use the default voice
        let curLangCode = AVSpeechSynthesisVoice.currentLanguageCode()
        utterance.voice = AVSpeechSynthesisVoice(language: curLangCode)
        // utterance.volume = 1.0
        print("Int32.max: \(Int32.max), Int32.min: \(Int32.min)")

        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                // done
            } else {
                // append buffer to file
                var outBuffer: AVAudioPCMBuffer
                outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
                // outBuffer = self.convertSpeechBufferToFloatStereo(pcmBuffer) // doesn't work
                // outBuffer = pcmBuffer // original format does work

                if output == nil {
                    // var bufferSettings = utterance.voice?.audioFileSettings
                    // Audio files cannot be non-interleaved.
                    var outSettings = outBuffer.format.settings
                    outSettings["AVLinearPCMIsNonInterleaved"] = false

                    let inFormat = pcmBuffer.format
                    print("inBuffer description: \(inFormat.description)")
                    print("inBuffer settings: \(inFormat.settings)")
                    print("inBuffer format: \(inFormat.formatDescription)")
                    print("outBuffer settings: \(outSettings)\n")
                    print("outBuffer format: \(outBuffer.format.formatDescription)")

                    output = try! AVAudioFile(forWriting: URL(fileURLWithPath: tempPath), settings: outSettings)
                }
                try! output?.write(from: outBuffer)
                print("done")
            }
        }
    }
}

class ViewController: NSViewController {
    let speechDelivery = SpeakerTest()

    override func viewDidLoad() {
        super.viewDidLoad()
        let targetSpeaker = "Allison"
        var sentenceToSpeak = ""
        for indx in 1...10 {
            sentenceToSpeak += "This is sentence number \(indx). [[slnc 3000]] \n"
        }
        speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
    }
}

Three tests can be performed (the three outBuffer assignments above). The only one that works is writing the buffer directly to disk. Is this really "32-bit big-endian signed integer"? Am I addressing this correctly, or is this a bug? I'm on macOS 11.4.
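One suspicion I'm still checking: my inputBlock always reports .haveData and hands back the same source buffer, so the converter may keep pulling the same mono data while it rate-converts, which could contribute to the noise. A variant that feeds the source exactly once and then reports .noDataNow is sketched below (untested against the speech buffers; the big-endian format handling is otherwise unchanged):

import AVFoundation

// Variant of resampleBuffer whose input block supplies the source buffer once
// and then tells the converter there is no more data. Assumption: this is enough
// for a one-shot buffer conversion; the rest mirrors the original method.
func resampleBufferOnce(inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer? {
    let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: 2)!
    guard let converter = AVAudioConverter(from: inSource.format, to: outFormat) else { return nil }

    let ratio = newSampleRate / inSource.format.sampleRate
    let capacity = AVAudioFrameCount(Double(inSource.frameLength) * ratio)
    guard let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: capacity) else { return nil }

    var delivered = false
    let inputBlock: AVAudioConverterInputBlock = { _, outStatus in
        if delivered {
            outStatus.pointee = .noDataNow   // source already handed over
            return nil
        }
        delivered = true
        outStatus.pointee = .haveData
        return inSource
    }

    var error: NSError?
    let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
    if status == .error || error != nil {
        print("conversion failed: \(String(describing: error))")
        return nil
    }
    return outBuffer
}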
Posted
by MisterE.
Last updated
.
Post marked as solved
1 Replies
1.4k Views
How do you keep these notifications from going to the Xcode console? The messages occur whenever I execute a URLSession.shared.dataTask, which is often. The messages are not an indication that the code has faulted; they are notifications, for unknown reasons, that fill the console. You should only get a notification if something is wrong. How do you register to catch this message so it does not go to the Xcode console?

nw_endpoint_handler_set_adaptive_read_handler [C14.1 104.21.42.21:443 ready socket-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
nw_endpoint_handler_set_adaptive_write_handler [C14.1 104.21.42.21:443 ready socket-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed

I believe this will duplicate the issue:

func queryEndpoint(address: String) {
    let url = URL(string: address)
    let task = URLSession.shared.dataTask(with: url!) { (data, response, error) in
        let result = String(data: data!, encoding: String.Encoding.utf8)!
        print(result)
    }
    task.resume()
}
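The closest I've come to silencing them is turning off OS activity logging for the run scheme. I haven't verified that it suppresses these specific nw_endpoint lines, and it hides other system log output as well, so it's a blunt workaround rather than an answer:

Edit Scheme… ▸ Run ▸ Arguments ▸ Environment Variables:
    OS_ACTIVITY_MODE = disable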
Posted
by MisterE.
Last updated
.
Post marked as solved
1 Replies
984 Views
I upgraded to Xcode 13. Previous versions of Xcode would show the variable name and type in Quick Help: when you clicked on a variable ("let classRef: ViewController", for example) or a user-defined method, it would show its declaration information. Now, Quick Help only shows information when you click on a built-in function or a function parameter. I thought quitting Xcode or cleaning the project would resolve this, but it did not. This feature was extremely beneficial; you could easily click on an item and copy its declaration information, or just see the types being referenced. How do I get the previous behavior back?
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
0 Replies
701 Views
I am using macOS and would like to know how to create accurate or alternate pronunciations with AVSpeechSynthesizer. Is there a guide or document that indicates the Unicode symbols that are used or accepted for the IPA notation? The only method that I've found to create or obtain pronunciations is through an iPhone.

References:
AVSpeechSynthesisIPANotationAttribute
https://developer.apple.com/videos/play/wwdc2018/236/?time=424
https://a11y-guidelines.orange.com/en/mobile/ios/wwdc/2018/236/
https://developer.apple.com/documentation/avfaudio/avspeechsynthesisipanotationattribute
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
1 Replies
1.2k Views
My understanding from the documentation is that an utterance will use the default voice for the current user locale, but that does not appear to be the case, or I am doing something wrong. Is this the correct way to obtain the default system voice using AVSpeechSynthesizer, or is the returned value incorrect? If it matters, I am on Big Sur 11.4, and I am not getting the correct default voice. What I get back is, coincidentally, the last voice in my accessibility voice list.

The default voice on my machine is currently "Kate". When using NSSpeechSynthesizer.defaultVoice, I get "Kate" as the listed default voice. When using AVSpeechSynthesisVoice, the default voice returned is "Albert", which is incorrect. My language code is en-US.

let userCode = AVSpeechSynthesisVoice.currentLanguageCode()
let usedVoice = AVSpeechSynthesisVoice(language: userCode) // should be the default voice
let voice = NSSpeechSynthesizer.defaultVoice

print("userCode: \(userCode)")
print("NSSpeechSynthesizer: \(voice)")
print("AVSpeechSynthesisVoice: \(usedVoice)")

Result:

userCode: en-US
NSSpeechSynthesizer: NSSpeechSynthesizerVoiceName(_rawValue: com.apple.speech.synthesis.voice.kate.premium) <--- this is the correct system default
AVSpeechSynthesisVoice: Optional([AVSpeechSynthesisVoice 0x6000000051a0] Language: en-US, Name: Albert, Quality: Enhanced [com.apple.speech.synthesis.voice.Albert])
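The workaround I'm testing in the meantime is to take the identifier NSSpeechSynthesizer reports and look for a matching AVSpeechSynthesisVoice. I don't know whether the two APIs are guaranteed to share identifier strings, so this is just a sketch:

import AVFoundation
import AppKit

// Sketch: resolve an AVSpeechSynthesisVoice from the identifier that NSSpeechSynthesizer
// reports as the system default. Assumes the identifiers line up, which I have not
// found documented anywhere.
func systemDefaultAVVoice() -> AVSpeechSynthesisVoice? {
    let defaultID = NSSpeechSynthesizer.defaultVoice.rawValue

    // Try the identifier directly first.
    if let voice = AVSpeechSynthesisVoice(identifier: defaultID) {
        return voice
    }
    // Otherwise look for a case-insensitive match among the installed voices.
    return AVSpeechSynthesisVoice.speechVoices().first {
        $0.identifier.lowercased() == defaultID.lowercased()
    }
}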
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
0 Replies
828 Views
How is it possible to wait inline for speech written to a buffer to complete before proceeding? I have a function that writes speech to a buffer, then resamples and manipulates the output, which is then included in an AVAudioEngine workflow where speech is generated faster than real time.

func createSpeechToBuffer(stringToSpeak: String, sampleRate: Float) -> AVAudioPCMBuffer? {
    var outBuffer: AVAudioPCMBuffer? = nil
    let utterance = AVSpeechUtterance(string: stringToSpeak)
    var speechIsBusy = true
    utterance.voice = AVSpeechSynthesisVoice(language: "en-us")

    _speechSynth.write(utterance) { (buffer: AVAudioBuffer) in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
            fatalError("unknown buffer type: \(buffer)")
        }
        if pcmBuffer.frameLength == 0 {
            print("buffer is empty")
        } else {
            print("buffer has content \(buffer)")
        }
        outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: sampleRate)
        speechIsBusy = false
    }

    // wait for completion of
    // func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance)
    while _speechSynth.isSpeaking {
        /* arbitrary task waiting for write to complete */
    }
    while speechIsBusy {
        /* arbitrary task waiting for write to complete */
    }
    return outBuffer
}

After I wrote the method and it failed to produce the desired output (inline), I realized that it returns before getting the results of the resampling. The callback is escaping, so the initial AVAudioBuffer from the callback arrives after createSpeechToBuffer has already returned. The resampling does work; however, I currently must save the result and continue only after being notified by the delegate's didFinish utterance callback.

func write(_ utterance: AVSpeechUtterance, toBufferCallback bufferCallback: @escaping AVSpeechSynthesizer.BufferCallback)

Attempts at waiting on _speechSynth.isSpeaking or the speechIsBusy flag are not working, and a dispatch queue or semaphore blocks the write method from completing. How is it possible to wait for the result inline, versus recreating the workflow around the delegate's didFinish utterance callback?

on macOS 11.4 (Big Sur)
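For reference, this is the shape of the inline wait I keep trying. It only stands a chance if the function runs off the main thread and the synthesizer delivers its buffers on a different queue than the one being blocked, which I have not been able to confirm on Big Sur:

import AVFoundation

// Sketch: block the calling (non-main) thread on a semaphore until the final,
// empty buffer arrives. Assumes the write callback is delivered on a queue other
// than the one we block here -- if it isn't, this deadlocks, which may be exactly
// what I'm seeing.
func speechBuffers(for text: String, synthesizer: AVSpeechSynthesizer) -> [AVAudioPCMBuffer] {
    precondition(!Thread.isMainThread, "call this from a background queue")

    var collected: [AVAudioPCMBuffer] = []
    let done = DispatchSemaphore(value: 0)

    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

    synthesizer.write(utterance) { buffer in
        guard let pcm = buffer as? AVAudioPCMBuffer else { return }
        if pcm.frameLength == 0 {
            done.signal()          // empty buffer marks the end of the utterance
        } else {
            collected.append(pcm)
        }
    }

    done.wait()                    // inline wait; resampling could happen after this
    return collected
}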
Posted
by MisterE.
Last updated
.
Post marked as solved
3 Replies
1.7k Views
Question #1: How do you determine what the player rate is after command-clicking the fast-forward control?

Question #2: Is there something that needs to be done to keep the audio when manually adjusting the playback rate to 2.0 or above? Property observing appears to work only for rates below 2.0.

Background
I am using custom controls (a slider and a stepper) that let the user manage the speed of playback, and I keep both in sync. I keep them in sync by obtaining the player rate with a property observer. Changing the custom controls adjusts the play rate, and changing the AVKit-provided controls changes the custom controls. This works as expected until I command-click the fast-forward control. The property observer reports a play rate of 0, which should mean it is stopped, but it is clearly moving forward at a high rate. If I programmatically change the play rate (for example, player.rate = 5.0), the property observer reports the rate correctly.

Manual method
If you option-click the fast-forward control, it increments by 0.1 from 1.0 to 2.0x (1.1, 1.2, 1.3, etc.) and the play rate changes accordingly (the property observer reports these values). If you command-click the fast-forward control, the play rate changes to 2, 5, 10, 30 and 60x (the property observer reports 0.0). How do you determine what the player rate is after command-clicking the fast-forward control?

The other thing is that if I programmatically or manually change the player rate in the range up to 1.9x, the audio continues to play. If I programmatically change the player rate to 2.0x or above, the audio continues to play. If I manually change the rate (command-click the player's fast-forward control), the audio does not continue. Is there something that needs to be done to keep the audio when manually adjusting the playback rate to 2.0 or above?

PlayerControl.png - https://developer.apple.com/forums/content/attachment/e915f567-b721-4430-9c61-2789a9058002

var playRateSync: Double = 1.0 {
    didSet {
        // move in specific increment value
        let increment = 0.1
        let oldValue = playRateSync
        let playRateSync = increment * round(oldValue / increment)
        let playRateStr = String(format: "%.2f", playRateSync)
        playbackLabel.stringValue = "Playback Rate: \(playRateStr)"
        playbackSlider.doubleValue = playRateSync
        playbackStepper.doubleValue = playRateSync
    }
}

func setPlayRateObserver(player: AVPlayer) {
    player.addObserver(self, forKeyPath: "rate", options: [.new, .old, .initial], context: nil)
}

override func observeValue(forKeyPath keyPath: String?, of object: Any?,
                           change: [NSKeyValueChangeKey: Any]?, context: UnsafeMutableRawPointer?) {
    print("keyPath: \(keyPath), object: \(object), change: \(change)")
    if object as AnyObject? === playerView.player {
        if keyPath == "rate" {
            print("The real rate: \(Double(playerView.player!.rate))")
            if let player = playerView.player {
                if player.rate > 0 {
                    playRateSync = Double(player.rate)
                    print("Rate changed. New Rate: \(player.rate)")
                }
            }
        }
    }
}
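One avenue I'm exploring for Question #1 (not confirmed): the current item's timebase carries its own effective rate, and it may keep reporting the real playback speed even while player.rate shows 0 during a command-click scan. A small sketch of reading it:

import AVFoundation
import CoreMedia

// Sketch: read the effective rate from the current item's timebase. Unverified
// assumption: while command-click fast-forward is scanning, this reflects the
// 2/5/10/30/60x speed even though AVPlayer.rate reports 0.
func effectivePlaybackRate(of player: AVPlayer) -> Double? {
    guard let timebase = player.currentItem?.timebase else { return nil }
    return CMTimebaseGetRate(timebase)
}

// Could be called from the existing "rate" KVO handler or a short repeating timer, e.g.:
//   if let r = effectivePlaybackRate(of: playerView.player!) { playRateSync = r }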
Posted
by MisterE.
Last updated
.
Post marked as solved
3 Replies
1k Views
How do you archive mixed objects that conform to NSSecureCoding (SFTranscription) for later retrieval? I am using SFSpeechRecognizer and attempting to save the results of the transcription for later analysis and processing. My issue isn't specifically with speech but rather with archiving. Unfortunately, I haven't archived before, and after Googling, I have encountered challenges.

struct TranscriptionResults: Codable {
    var currTime      : Double  // running start time from beginning of file
    var currSegStart  : Double  // start from beginning of segment
    var currSegSecs   : Double  // segment length in seconds
    var currSegEnd    : Double  // end = currStart + segmentSecs; calculated, don't need to save
    var elapsedTime   : Double  // how much time to process to this point
    var fileName      : String
    var fileURL       : URL
    var fileLength    : Int64
    var transcription : SFTranscription  // ** does not conform to Codable **
}

Type 'TranscriptionResults' does not conform to protocol 'Decodable'
Type 'TranscriptionResults' does not conform to protocol 'Encodable'

When I add the SFTranscription property (var transcription: SFTranscription), I get the errors above. I looked it up, and SFTranscription is declared as follows:

open class SFTranscription : NSObject, NSCopying, NSSecureCoding {...}

My issue is with complying with Codable, as it does not look like you can mix it with NSSecureCoding. I don't think my issue is specifically with SFTranscription but with understanding how to save results that include a mix of NSSecureCoding objects to disk. How do you save the result for later retrieval?
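The approach I've started experimenting with is to bridge the NSSecureCoding object to Data with NSKeyedArchiver and store the Data inside the Codable struct. A minimal sketch, assuming the property and type names from my struct:

import Foundation
import Speech

// Sketch: keep the Codable struct plain and store the transcription as archived Data,
// bridging SFTranscription <-> Data with NSKeyedArchiver (it conforms to NSSecureCoding).
struct TranscriptionResults: Codable {
    var currTime: Double
    var currSegStart: Double
    var currSegSecs: Double
    var currSegEnd: Double
    var elapsedTime: Double
    var fileName: String
    var fileURL: URL
    var fileLength: Int64
    var transcriptionData: Data          // archived SFTranscription

    // Archive the live object into Data before storing it in the struct.
    static func archive(_ transcription: SFTranscription) throws -> Data {
        try NSKeyedArchiver.archivedData(withRootObject: transcription,
                                         requiringSecureCoding: true)
    }

    // Recover the SFTranscription when the struct is read back.
    func transcription() throws -> SFTranscription? {
        try NSKeyedUnarchiver.unarchivedObject(ofClass: SFTranscription.self,
                                               from: transcriptionData)
    }
}

The whole struct can then go through JSONEncoder or PropertyListEncoder and be written to disk. Keeping a var transcription: SFTranscription property and doing the archiving inside custom encode(to:) / init(from:) implementations should also work; it is just more boilerplate.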
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
1 Replies
1.8k Views
How is SFTranscriptionSegment used to track where in the audio the recognized statements are made? My goal is to transcribe the audio and record where in the audio file each sentence is spoken. The timestamps reset after each phrase recognition. If I attempt to keep a running count of the timestamps + durations, it does not match when the phrase was spoken after the first or second recognized phrase. If I keep a running count of the first SFTranscriptionSegment[0] plus subsequent SFTranscriptionSegment[last] + duration, I should stay aligned with the next speech segment, but I do not. How is SFTranscriptionSegment used to track where recognition of statements is made?

The following affirmations are used as a test of speech recognition. I output the affirmations using the NSSpeechSynthesizer Veena voice (with only padded silence between sentences) to a file. I then feed the file into speech recognition to test the output against a known set of sentences. If I need to know where in the file a speech segment begins, how do I get it from the timestamps and durations?

I set on-device recognition to TRUE because there the file length is unlimited; my target files can be up to two hours long, while my test files are 15-30 minutes, so this must be done on the device.

recogRequest.requiresOnDeviceRecognition = true

Running on macOS Catalina 10.15.7

Affirmations
I plan leisure time regularly.
I balance my work life and my leisure life perfectly.
I return to work refreshed and renewed.
I experience a sense of well being while I work.
I experience a sense of inner peace while I relax.
I like what I do and I do what I like.
I increase in mental and emotional health daily.
This transcription is now concluded.

The function below produces the following output.

func recognizeFile_Compact(url: NSURL) {
    let language = "en-US" // "en-GB"
    let recognizer = SFSpeechRecognizer(locale: Locale.init(identifier: language))!
    let recogRequest = SFSpeechURLRecognitionRequest(url: url as URL)

    recognizer.supportsOnDeviceRecognition = true   // make sure the device is ready to do the work
    recognizer.defaultTaskHint = .dictation         // give a hint as dictation
    recogRequest.requiresOnDeviceRecognition = true // we want the device to do all the work
    recogRequest.shouldReportPartialResults = false // we don't want partial results

    var strCount = 0
    let recogTask = recognizer.recognitionTask(with: recogRequest, resultHandler: { (result, error) in
        guard let result = result else {
            print("Recognition failed, \(error!)")
            return
        }
        let progress = recognizer.queue.progress.fractionCompleted // we never get progress other than 0.0
        let text = result.bestTranscription.formattedString
        strCount += 1
        print(" #\(strCount), Progress: \(progress) \n\n",
              "FormattedString: \(text) \n\n",
              "BestTranscription: \(result.bestTranscription)",
              "\n\n")
        if result.isFinal {
            print("WE ARE FINALIZED")
        }
    })
}

Output:

#1, Progress: 0.0

FormattedString: I plan Lisa time regularly

BestTranscription: SFTranscription: 0x600000cac240, formattedString=I plan Lisa time regularly, segments=(
    "SFTranscriptionSegment: 0x6000026266a0, substringRange={0, 1}, timestamp=15.96, duration=0.1499999999999986, confidence=0.862, substring=I, alternativeSubstrings=(\n), phoneSequence=AY, ipaPhoneSequence=\U02c8a\U0361\U026a, voiceAnalytics=(null)",
    "SFTranscriptionSegment: 0x6000026275a0, substringRange={2, 4}, timestamp=16.11, duration=0.3000000000000007, confidence=0.172, substring=plan, alternativeSubstrings=(\n planned,\n blend,\n blame,\n played\n), phoneSequence=p l AA n, ipaPhoneSequence=p.l.\U02c8\U00e6.n, voiceAnalytics=(null)",
    "SFTranscriptionSegment: 0x600002625ec0, substringRange={7, 4}, timestamp=16.41, duration=0.3300000000000018, confidence=0.71, substring=Lisa, alternativeSubstrings=(\n Liza,\n Lise\n), phoneSequence=l EE z uh, ipaPhoneSequence=l.\U02c8i.z.\U0259, voiceAnalytics=(null)",
    "SFTranscriptionSegment: 0x600002626f40, substringRange={12, 4}, timestamp=16.74, duration=0.2999999999999972, confidence=0.877, substring=time, alternativeSubstrings=(\n), phoneSequence=t AY m, ipaPhoneSequence=t.\U02c8a\U0361\U026a.m, voiceAnalytics=(null)",
    "SFTranscriptionSegment: 0x6000026271e0, substringRange={17, 9}, timestamp=17.04, duration=0.7200000000000024, confidence=0.88, substring=regularly, alternativeSubstrings=(\n), phoneSequence=r EH g y uh l ur l ee, ipaPhoneSequence=\U027b.\U02c8\U025b.g.j.\U0259.l.\U0259 \U027b.l.i, voiceAnalytics=(null)"
), speakingRate=0.000000, averagePauseDuration=0.000000

Speech Recognition Output - https://developer.apple.com/forums/content/attachment/a001ddb3-481e-43c4-b7b9-00ed2b386fd3
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
5 Replies
1.6k Views
init(componentDescription: AudioComponentDescription, options: AudioComponentInstantiationOptions = [])

I built an AudioUnit v3 effect, and I set the variable maximumFramesToRender within the above method. My preference is to make this value 256 or 1024; however, it doesn't appear to matter what I set it to, because it always changes to 512. The effect does work, but I can't change the number of frames to render. I would dispense with creating an effect altogether if I could have an AVAudioPlayerNode render a maximum of 256 frames.

self.maximumFramesToRender = 256

The documentation says you must set the value before resources are allocated, and I have done so. I have two questions:
1) Is there more than one place you must set this value in an AudioUnit v3 effect?
2) Can you set the frame rendering for an AVAudioPlayerNode, regardless of whether you are rendering online or offline?

try! self.audioEngine.enableManualRenderingMode(.offline, format: self.audioFormat, maximumFrameCount: 4096)

I do realize that for manual rendering I can change the maximumFrameCount to 256; however, I want either the effect or the player node to render at a different rate, because I built a render block around specific timings. So I need this specific effect or node to render at a defined number of frames, regardless of whether all other downstream nodes are rendering with a larger one.
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
4 Replies
1.6k Views
I am unable to get AVSpeechSynthesizer to write or to acknowledge the didFinish delegate. When I call the function, it merely speaks the string aloud. I am running on macOS 10.15.7 (Catalina). What am I missing?

SpeakerTest().writeToBuffer("This should write to buffer and call didFinish delegate.")

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Utterance didFinish")
    }

    func speak(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        synth.speak(utterance)
    }

    func writeToBuffer(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                print("buffer is empty")
            } else {
                print("buffer has content \(buffer)")
            }
        }
    }
}
Posted
by MisterE.
Last updated
.