Could you provide guidance on how to add chapter marks to an M4A file? I've been attempting this without success. From what I've read, it requires the use of
AVMetadataKey.quickTimeUserDataKeyChapter
track.addTrackAssociation(to: ... type: .chapterList)
or both.
I've looked into AVTimedMetadataGroup, but I haven't found a way to add it based on the documentation. I also haven't found anyone who has used native Swift to add chapter marks; they've always given up and used ffmpeg or some other external solution.
inputURL is the file that is being read in.
outputURL is the final file.
chapters is an array of tuples, where time is the start of each chapter and title is its name in the list.
The target is macOS.
import AVFoundation
class AudioChapterCreator {
// Function to create an audio file with chapters and a chapter list
func createAudioFileWithChapters(inputURL: URL, outputURL: URL, chapters: [(time: CMTime, title: String)]) {
let options = [AVURLAssetPreferPreciseDurationAndTimingKey: true]
let asset = AVURLAsset(url: inputURL, options: options)
let durationInSeconds = CMTimeGetSeconds(asset.duration)
print("asset durationInSeconds: \(durationInSeconds)")
guard let audioTrack = asset.tracks(withMediaType: .audio).first else {
print("Error: Unable to find audio track in the asset.")
return
}
// Create metadata items for chapters
let chapterMetadataItems = chapters.map { chapter -> AVMetadataItem in
let item = AVMutableMetadataItem()
// this duration is just for testing
let tempDur = CMTime(seconds: 100, preferredTimescale: 1)
item.keySpace = AVMetadataKeySpace.quickTimeUserData
item.key = AVMetadataKey.quickTimeUserDataKeyChapter as NSString
item.value = chapter.title as NSString
item.time = chapter.time
item.duration = tempDur
return item
}
// Create an AVAssetExportSession for writing the output file
guard let exportSession = AVAssetExportSession(asset: asset, presetName: AVAssetExportPresetAppleM4A) else {
print("Error: Unable to create AVAssetExportSession.")
return
}
// Configure the AVAssetExportSession
exportSession.outputFileType = .m4a
exportSession.outputURL = outputURL
exportSession.metadata = asset.metadata + chapterMetadataItems
exportSession.timeRange = CMTimeRangeMake(start: CMTime.zero, duration: asset.duration);
// Export the audio file
exportSession.exportAsynchronously {
switch exportSession.status {
case .completed:
print("Audio file with chapters and chapter list created successfully.")
case .failed:
print("Error: Failed to create the audio file.")
case .cancelled:
print("Export cancelled.")
default:
print("Export failed with unknown status.")
}
}
}
}
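In case it helps frame the AVTimedMetadataGroup route mentioned at the top, below is a minimal, untested sketch of how a chapter (timed metadata) track might be written with AVAssetReader/AVAssetWriter instead of AVAssetExportSession. The function name exportWithChapterTrack, the choice of .quickTimeMetadataTitle as the chapter item identifier, and the dispatch plumbing are assumptions of mine, not a confirmed recipe.

import AVFoundation

// Hypothetical helper; inputURL/outputURL/chapters have the same meaning as above.
func exportWithChapterTrack(inputURL: URL, outputURL: URL,
                            chapters: [(time: CMTime, title: String)]) throws {
    let asset = AVURLAsset(url: inputURL)
    guard let audioTrack = asset.tracks(withMediaType: .audio).first else { return }

    // Pass the audio samples through untouched.
    let reader = try AVAssetReader(asset: asset)
    let readerOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: nil)
    reader.add(readerOutput)

    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .m4a)
    let audioInput = AVAssetWriterInput(mediaType: .audio, outputSettings: nil)
    writer.add(audioInput)

    // One timed metadata group per chapter; the group's time range marks the chapter's extent.
    let groups = chapters.enumerated().map { (index, chapter) -> AVTimedMetadataGroup in
        let item = AVMutableMetadataItem()
        item.identifier = .quickTimeMetadataTitle
        item.dataType = kCMMetadataBaseDataType_UTF8 as String
        item.value = chapter.title as NSString
        let end = index + 1 < chapters.count ? chapters[index + 1].time : asset.duration
        return AVTimedMetadataGroup(items: [item], timeRange: CMTimeRange(start: chapter.time, end: end))
    }

    // A metadata input, fed through an adaptor, associated with the audio input as its chapter list.
    guard let formatHint = groups.first?.copyFormatDescription() else { return }
    let metadataInput = AVAssetWriterInput(mediaType: .metadata, outputSettings: nil,
                                           sourceFormatHint: formatHint)
    let adaptor = AVAssetWriterInputMetadataAdaptor(assetWriterInput: metadataInput)
    writer.add(metadataInput)
    metadataInput.addTrackAssociation(withTrackOf: audioInput,
                                      type: AVAssetTrack.AssociationType.chapterList.rawValue)

    writer.startWriting()
    reader.startReading()
    writer.startSession(atSourceTime: .zero)

    let queue = DispatchQueue(label: "chapter.export")
    let done = DispatchGroup()

    // Append the chapter groups.
    done.enter()
    var groupIndex = 0
    metadataInput.requestMediaDataWhenReady(on: queue) {
        while metadataInput.isReadyForMoreMediaData && groupIndex < groups.count {
            adaptor.append(groups[groupIndex])
            groupIndex += 1
        }
        if groupIndex == groups.count {
            metadataInput.markAsFinished()
            done.leave()
        }
    }

    // Copy the audio samples across.
    done.enter()
    audioInput.requestMediaDataWhenReady(on: queue) {
        while audioInput.isReadyForMoreMediaData {
            if let sample = readerOutput.copyNextSampleBuffer() {
                audioInput.append(sample)
            } else {
                audioInput.markAsFinished()
                done.leave()
                break
            }
        }
    }

    done.notify(queue: queue) {
        writer.finishWriting {
            print("Writer finished with status \(writer.status.rawValue), error: \(String(describing: writer.error))")
        }
    }
}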
How do you register to catch these notifications so they don't go to the Xcode console?
The messages occur whenever I execute a URLSession.shared.dataTask, which is often.
The messages are not an indication that the code has faulted; they are notifications, for unknown reasons, that fill the console. You should only get a notification if something is wrong.
How do you register to catch this message so it does not go to the Xcode console?
nw_endpoint_handler_set_adaptive_read_handler [C14.1 104.21.42.21:443 ready socket-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
nw_endpoint_handler_set_adaptive_write_handler [C14.1 104.21.42.21:443 ready socket-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
I believe this will duplicate the issue
func queryEndpoint(address: String) {
let url = URL(string: address)
let task = URLSession.shared.dataTask(with: url!) {(data, response, error) in
let result = String(data: data!, encoding: String.Encoding.utf8)!
print(result)
}
task.resume()
}
I upgraded to Xcode 13. Previous versions of Xcode would show the variable name and type in Quick Help: when you clicked on a variable ("let classRef: ViewController", for example) or a user-defined method, it would show its declaration information. Now, Quick Help only shows information when you click on a built-in function or a function parameter.
I thought quitting Xcode or cleaning the project would resolve this, but it did not.
This feature was extremely beneficial; you could easily click on an item and copy its declaration information, or just see the types being referenced.
How do I get the previous behavior back?
Can you perform two or more OFFLINE speech recognition tasks simultaneously?
SFSpeechRecognizer, SFSpeechURLRecognitionRequest offline limitation?
Running on macOS Big Sur 11.5.2
I would like to perform two or more offline speech recognition tasks simultaneously.
I've executed two tasks in the same application AND run two different applications, both using offline recognition.
Once I initiate the other thread or the other application, the first recognition stops.
Since the computer supports multiple threads, I planned to make use of the concurrency.
Use cases
#1 multiple Audio or video files that I wish to transcribe -- cuts down on the wait time.
#2 split a single large file up into multiple sections and stitch the results together -- again cuts down on the wait time.
I set on device recognition to TRUE because my target files can be up to two hours in length.
My test files are 15-30 minutes in length and I have a number of them, so recognition must be done on the device.
func recognizeFile_Compact(url:NSURL) {
let language = "en-US" //"en-GB"
let recognizer = SFSpeechRecognizer(locale: Locale.init(identifier: language))!
let recogRequest = SFSpeechURLRecognitionRequest(url: url as URL)
recognizer.supportsOnDeviceRecognition = true // ensure the DEVICE does the work -- don't send to cloud
recognizer.defaultTaskHint = .dictation // give a hint as dictation
recogRequest.requiresOnDeviceRecognition = true // don't send to the cloud -- the device does the work
recogRequest.shouldReportPartialResults = false // we don't want partial results
var strCount = 0
let recogTask = recognizer.recognitionTask(with: recogRequest, resultHandler: { (result, error) in
guard let result = result else {
print("Recognition failed, \(error!)")
return
}
let text = result.bestTranscription.formattedString
strCount += 1
print(" #\(strCount), "Best: \(text) \n" )
if (result.isFinal) { print("WE ARE FINALIZED") }
})
}
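For illustration, here is a minimal sketch of how two or more tasks might be launched concurrently. The helper name recognizeFilesConcurrently is hypothetical, and it assumes it lives in the same class as recognizeFile_Compact; as described above, in practice the first recognition still stops once the second one starts.

// Hypothetical driver, assumed to be defined alongside recognizeFile_Compact:
// each file gets its own SFSpeechRecognizer instance on a background queue.
func recognizeFilesConcurrently(urls: [URL]) {
    for url in urls {
        DispatchQueue.global(qos: .userInitiated).async {
            self.recognizeFile_Compact(url: url as NSURL)
        }
    }
}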
I am attempting to use alternative pronunciations, via IPA notation, with AVSpeechSynthesizer on macOS (Big Sur 11.4). The attributed string is being ignored, so the functionality is not working. I tried this on the iOS simulator and it works properly.
The Indian English voice pronounces the word "shame" as shy-em, so I applied the correct pronunciation, but no change was heard. I then substituted the pronunciation for a completely different word, but there was still no change.
Is there something else that must be done to make this work?
AVSpeechSynthesisIPANotationAttribute
Attributed String: It's a '{
}shame{
AVSpeechSynthesisIPANotationAttribute = "\U0283\U02c8e\U0361\U026am";
}' it didn't work out.{
}
Target Range: {8, 5}
Target String: shame, Substitution: ʃˈe͡ɪm
Attributed String: It's a '{
}shame{
AVSpeechSynthesisIPANotationAttribute = "\U0283\U02c8e\U0361\U026am";
}' it didn't work out.{
}
Target Range: {8, 5}
Target String: shame, Substitution: ʃˈe͡ɪm
Attributed String: It's a '{
}shame{
AVSpeechSynthesisIPANotationAttribute = "t\U0259.\U02c8me\U0361\U026a.do\U0361\U028a";
}' it didn't work out.{
}
Target Range: {8, 5}
Target String: shame, Substitution: tə.ˈme͡ɪ.do͡ʊ
Attributed String: It's a '{
}shame{
AVSpeechSynthesisIPANotationAttribute = "t\U0259.\U02c8me\U0361\U026a.do\U0361\U028a";
}' it didn't work out.{
}
Target Range: {8, 5}
Target String: shame, Substitution: tə.ˈme͡ɪ.do͡ʊ
class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
let synth = AVSpeechSynthesizer()
func speakIPA_Substitution(subst: String, voice: AVSpeechSynthesisVoice)
{
let text = "It's a 'shame' it didn't work out."
let mutAttrStr = NSMutableAttributedString(string: text)
let range = NSString(string: text).range(of: "shame")
let pronounceKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)
mutAttrStr.setAttributes([pronounceKey: subst], range: range)
let utterance = AVSpeechUtterance(attributedString: mutAttrStr)
utterance.voice = voice
utterance.postUtteranceDelay = 1.0
let swiftRange = Range(range, in: text)!
print("Attributed String: \(mutAttrStr)")
print("Target Range: \(range)")
print("Target String: \(text[swiftRange]), Substitution: \(subst)\n")
synth.speak(utterance)
}
func customPronunciation()
{
let shame = "ʃˈe͡ɪm" // substitute correct pronunciation
let tomato = "tə.ˈme͡ɪ.do͡ʊ" // completely different word pronunciation
let britishVoice = AVSpeechSynthesisVoice(language: "en-GB")!
let indiaVoice = AVSpeechSynthesisVoice(language: "en-IN")!
speakIPA_Substitution(subst: shame, voice: britishVoice) // already correct, no substitute needed
// pronounced incorrectly and ignoring the corrected pronunciation from IPA Notation
speakIPA_Substitution(subst: shame, voice: indiaVoice) // ignores substitution
speakIPA_Substitution(subst: tomato, voice: britishVoice) // ignores substitution
speakIPA_Substitution(subst: tomato, voice: indiaVoice) // ignores substitution
}
}
I am using macOS and would like to know how to create accurate or alternate pronunciations using AVSpeechSynthesizer.
Is there a guide or document that indicates the Unicode symbols that are used or accepted for the IPA notation?
The only method I've found to create or obtain pronunciations is through an iPhone.
References:
AVSpeechSynthesisIPANotationAttribute
https://developer.apple.com/videos/play/wwdc2018/236/?time=424
https://a11y-guidelines.orange.com/en/mobile/ios/wwdc/2018/236/
https://developer.apple.com/documentation/avfaudio/avspeechsynthesisipanotationattribute
Is the format description AVSpeechSynthesizer provides for the speech buffer correct?
When I attempt to convert it, I get back noise from two different conversion methods.
I am seeking to convert the speech buffer provided by the AVSpeechSynthesizer "func write(_ utterance: AVSpeechUtterance..." method.
The goal is to convert the sample type, change the sample rate, and change from a mono to a stereo buffer.
I later manipulate the buffer data and pass it through AVAudioEngine.
For testing purposes, I have kept the sample rate at the original 22050.0.
What have I tried?
I have a method named "resampleBuffer" that I've been using for years to do this.
When I apply it to the speech buffer, I get back noise.
When I attempt to manually convert the format and to stereo with "convertSpeechBufferToFloatStereo", I get back clipped output.
I tested flipping the samples to address the big-endian, signed-integer layout, but that didn't work.
The speech buffer description is:
inBuffer description: <AVAudioFormat 0x6000012862b0: 1 ch, 22050 Hz, 'lpcm' (0x0000000E) 32-bit big-endian signed integer>
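One way to double-check what the synthesizer is actually handing back is to read the AudioStreamBasicDescription flags off the buffer's format. This is a small, hypothetical helper (describeFormat is not part of the original code); it only inspects the format and makes no claim about why the conversions produce noise.

import AVFoundation

// Hypothetical helper: dump the sample layout of the buffer AVSpeechSynthesizer delivers,
// to confirm (or refute) the "32-bit big-endian signed integer" description.
func describeFormat(_ format: AVAudioFormat) {
    let asbd = format.streamDescription.pointee
    let flags = asbd.mFormatFlags
    print("commonFormat raw value: \(format.commonFormat.rawValue)")   // 0 == .otherFormat
    print("bits per channel:       \(asbd.mBitsPerChannel)")
    print("isFloat:                \(flags & kAudioFormatFlagIsFloat != 0)")
    print("isSignedInteger:        \(flags & kAudioFormatFlagIsSignedInteger != 0)")
    print("isBigEndian:            \(flags & kAudioFormatFlagIsBigEndian != 0)")
    print("isNonInterleaved:       \(flags & kAudioFormatFlagIsNonInterleaved != 0)")
}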
import Cocoa
import AVFoundation
class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
let synth = AVSpeechSynthesizer()
override init() {
super.init()
}
func resampleBuffer( inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer?
{
// resample and convert mono to stereo
var error : NSError?
let kChannelStereo = AVAudioChannelCount(2)
let convertRate = newSampleRate / inSource.format.sampleRate
let outFrameCount = AVAudioFrameCount(Double(inSource.frameLength) * convertRate)
let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: kChannelStereo)!
let avConverter = AVAudioConverter(from: inSource.format, to: outFormat )
let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: outFrameCount)!
let inputBlock : AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
outStatus.pointee = AVAudioConverterInputStatus.haveData // very important, must have
let audioBuffer : AVAudioBuffer = inSource
return audioBuffer
}
avConverter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Mastering
avConverter?.sampleRateConverterQuality = .max
if let converter = avConverter
{
let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
// print("\(status): \(status.rawValue)")
if ((status != .haveData) || (error != nil))
{
print("\(status): \(status.rawValue), error: \(String(describing: error))")
return nil // conversion error
}
} else {
return nil // converter not created
}
// print("success!")
return outBuffer
}
func writeToFile(_ stringToSpeak: String, speaker: String)
{
var output : AVAudioFile?
let utterance = AVSpeechUtterance(string: stringToSpeak)
let desktop = "~/Desktop"
let fileName = "Utterance_Test.caf" // not in sandbox
var tempPath = desktop + "/" + fileName
tempPath = (tempPath as NSString).expandingTildeInPath
let usingSampleRate = 22050.0 // 44100.0
let outSettings = [
AVFormatIDKey : kAudioFormatLinearPCM, // kAudioFormatAppleLossless
AVSampleRateKey : usingSampleRate,
AVNumberOfChannelsKey : 2,
AVEncoderAudioQualityKey : AVAudioQuality.max.rawValue
] as [String : Any]
// temporarily ignore the speaker and use the default voice
let curLangCode = AVSpeechSynthesisVoice.currentLanguageCode()
utterance.voice = AVSpeechSynthesisVoice(language: curLangCode)
// utterance.volume = 1.0
print("Int32.max: \(Int32.max), Int32.min: \(Int32.min)")
synth.write(utterance) { (buffer: AVAudioBuffer) in
guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
fatalError("unknown buffer type: \(buffer)")
}
if ( pcmBuffer.frameLength == 0 ) {
// done
} else {
// append buffer to file
var outBuffer : AVAudioPCMBuffer
outBuffer = self.resampleBuffer( inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
// outBuffer = self.convertSpeechBufferToFloatStereo( pcmBuffer ) // doesn't work
// outBuffer = pcmBuffer // original format does work
if ( output == nil ) {
//var bufferSettings = utterance.voice?.audioFileSettings
// Audio files cannot be non-interleaved.
var outSettings = outBuffer.format.settings
outSettings["AVLinearPCMIsNonInterleaved"] = false
let inFormat = pcmBuffer.format
print("inBuffer description: \(inFormat.description)")
print("inBuffer settings: \(inFormat.settings)")
print("inBuffer format: \(inFormat.formatDescription)")
print("outBuffer settings: \(outSettings)\n")
print("outBuffer format: \(outBuffer.format.formatDescription)")
output = try! AVAudioFile( forWriting: URL(fileURLWithPath: tempPath),settings: outSettings)
}
try! output?.write(from: outBuffer)
print("done")
}
}
}
}
class ViewController: NSViewController {
let speechDelivery = SpeakerTest()
override func viewDidLoad() {
super.viewDidLoad()
let targetSpeaker = "Allison"
var sentenceToSpeak = ""
for indx in 1...10
{
sentenceToSpeak += "This is sentence number \(indx). [[slnc 3000]] \n"
}
speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
}
}
Three tests can be performed. The only one that works is directly writing the buffer to disk.
Is this really "32-bit big-endian signed integer"?
Am I addressing this correctly, or is this a bug?
I'm on macOS 11.4.
How is it possible to wait inline for the speech-to-buffer write to complete before proceeding?
I have a function that writes speech to a buffer, then resamples and manipulates the output, which is then included in an AVAudioEngine workflow where speech is produced faster than real time.
func createSpeechToBuffer( stringToSpeak: String, sampleRate: Double) -> AVAudioPCMBuffer?
{
var outBuffer : AVAudioPCMBuffer? = nil
let utterance = AVSpeechUtterance(string: stringToSpeak)
var speechIsBusy = true
utterance.voice = AVSpeechSynthesisVoice(language: "en-us")
_speechSynth.write(utterance) { (buffer: AVAudioBuffer) in
guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
fatalError("unknown buffer type: \(buffer)")
}
if ( pcmBuffer.frameLength == 0 ) {
print("buffer is empty")
} else {
print("buffer has content \(buffer)")
}
outBuffer = self.resampleBuffer( inSource: pcmBuffer, newSampleRate: sampleRate)
speechIsBusy = false
}
// wait for completion of func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance)
while ( _speechSynth.isSpeaking )
{
/* arbitrary task waiting for write to complete */
}
while ( speechIsBusy )
{
/* arbitrary task waiting for write to complete */
}
return outBuffer
}
After I wrote the method and it failed to produce the desired output (inline), I realized that it returns before getting the results of the resampling. The callback is escaping, so the initial AVAudioBuffer from the callback arrives after createSpeechToBuffer has already returned. The resampling does work; however, I currently must save the result and continue only after being notified by the delegate's "didFinish utterance".
func write(_ utterance: AVSpeechUtterance, toBufferCallback bufferCallback: @escaping AVSpeechSynthesizer.BufferCallback)
Attempts at waiting on _speechSynth.isSpeaking or the speechIsBusy flag are not working, and a dispatch queue or semaphore blocks the write method from completing.
How is it possible to wait for the result inline, versus recreating the workflow to depend on the delegate's
"didFinish utterance"?
on macOS 11.4 (Big Sur)
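If blocking inline is not a hard requirement, one hedged restructuring is to pass a completion handler through instead of spinning. The sketch below is illustrative only: it reuses _speechSynth and resampleBuffer from the code above, and note that write may invoke its callback more than once with successive chunks, so the handler here fires once per non-empty chunk and once with nil at the end.

// Hypothetical restructuring: deliver each resampled chunk through a completion handler
// instead of waiting inline. The final, empty buffer marks the end of the utterance.
func createSpeechToBuffer(stringToSpeak: String,
                          sampleRate: Double,
                          completion: @escaping (AVAudioPCMBuffer?) -> Void) {
    let utterance = AVSpeechUtterance(string: stringToSpeak)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    _speechSynth.write(utterance) { buffer in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
            fatalError("unknown buffer type: \(buffer)")
        }
        if pcmBuffer.frameLength == 0 {
            completion(nil)   // end of utterance
        } else {
            completion(self.resampleBuffer(inSource: pcmBuffer, newSampleRate: sampleRate))
        }
    }
}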
My understanding from the documentation is that an utterance will use the default voice for the current user locale, but that does not appear to be the case, or I am doing something wrong.
Is this the correct way to obtain the default system voice using AVSpeechSynthesizer, or is the returned value incorrect?
If it matters, I am using Big Sur 11.4, but I am not getting the correct default voice. What I get back is, coincidentally, the last voice in my accessibility voice list.
The default voice on my machine is currently "Kate".
When using NSSpeechSynthesizer.defaultVoice, I get "Kate" as the listed default voice.
When using AVSpeechSynthesisVoice, the default voice returned is "Albert", which is incorrect.
My language code is: en-US
let userCode = AVSpeechSynthesisVoice.currentLanguageCode()
let usedVoice = AVSpeechSynthesisVoice(language: userCode) // should be the default voice
let voice = NSSpeechSynthesizer.defaultVoice
print("userCode: \(userCode)")
print("NSSpeechSynthesizer: \(voice)")
print("AVSpeechSynthesisVoice: \(usedVoice)")
Result:
userCode: en-US
NSSpeechSynthesizer: NSSpeechSynthesizerVoiceName(_rawValue: com.apple.speech.synthesis.voice.kate.premium) <--- this is the correct system default
AVSpeechSynthesisVoice: Optional([AVSpeechSynthesisVoice 0x6000000051a0] Language: en-US, Name: Albert, Quality: Enhanced [com.apple.speech.synthesis.voice.Albert])
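Since NSSpeechSynthesizer does report the expected default, one hedged workaround is to look that identifier up in AVSpeechSynthesisVoice directly and fall back to the language-based initializer. This is an assumption on my part, not documented behaviour; the identifier namespaces happen to match in the output above.

import AppKit
import AVFoundation

// Hypothetical lookup: resolve the user's default voice via NSSpeechSynthesizer,
// then ask AVSpeechSynthesisVoice for the voice with that identifier.
let defaultIdentifier = NSSpeechSynthesizer.defaultVoice.rawValue
let resolvedVoice = AVSpeechSynthesisVoice(identifier: defaultIdentifier)
    ?? AVSpeechSynthesisVoice(language: AVSpeechSynthesisVoice.currentLanguageCode())
print("Resolved default voice: \(String(describing: resolvedVoice?.name))")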
I am unable to get AVSpeechSynthesizer to write or to acknowledge the delegate actions.
I was informed this was resolved in macOS 11.
I thought it was a lot to ask, but I am now running macOS 11.4 (Big Sur).
My target is to output speech faster than real time and drive the output through AVAudioEngine.
First, I need to know why the write doesn't occur and why the delegates don't get called, whether I am using write or simply uttering to the default speakers in "func speak(_ string: String)".
What am I missing?
Is there a workaround?
Reference: https://developer.apple.com/forums/thread/678287
let sentenceToSpeak = "This should write to buffer and also call 'didFinish' and 'willSpeakRangeOfSpeechString' delegates."
SpeakerTest().writeToBuffer(sentenceToSpeak)
SpeakerTest().speak(sentenceToSpeak)
class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
let synth = AVSpeechSynthesizer()
override init() {
super.init()
synth.delegate = self
}
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
print("Utterance didFinish")
}
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
willSpeakRangeOfSpeechString characterRange: NSRange,
utterance: AVSpeechUtterance)
{
print("speaking range: \(characterRange)")
}
func speak(_ string: String) {
let utterance = AVSpeechUtterance(string: string)
var usedVoice = AVSpeechSynthesisVoice(language: "en") // should be the default voice
let voices = AVSpeechSynthesisVoice.speechVoices()
let targetVoice = "Allison"
for voice in voices {
// print("\(voice.identifier) \(voice.name) \(voice.quality) \(voice.language)")
if (voice.name.lowercased() == targetVoice.lowercased())
{
usedVoice = AVSpeechSynthesisVoice(identifier: voice.identifier)
break
}
}
utterance.voice = usedVoice
print("utterance.voice: \(utterance.voice)")
synth.speak(utterance)
}
func writeToBuffer(_ string: String)
{
print("entering writeToBuffer")
let utterance = AVSpeechUtterance(string: string)
synth.write(utterance) { (buffer: AVAudioBuffer) in
print("executing synth.write")
guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
fatalError("unknown buffer type: \(buffer)")
}
if pcmBuffer.frameLength == 0 {
print("buffer is empty")
} else {
print("buffer has content \(buffer)")
}
}
}
}
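One thing worth ruling out (an assumption on my part, not a confirmed cause): the SpeakerTest() temporaries created in the two calls above can be deallocated as soon as each call returns, which would also silence the buffer callback and the delegate methods. Keeping a long-lived reference looks like this:

// Hold the synthesizer's owner for the lifetime of the speech so the delegate can fire.
let speaker = SpeakerTest()
speaker.writeToBuffer(sentenceToSpeak)
speaker.speak(sentenceToSpeak)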
Question #1
How do you determine what the player rate is after command-clicking the fast-forward control?
Question #2
Is there something that needs to be done to keep the audio when manually adjusting the playback rate to 2.0 or above?
Property observing appears to work only for rates below 2.0.
Background
I am using custom controls (a slider and a stepper) that let the user manage the speed of playback, and I keep both in sync.
I keep them in sync by obtaining the player rate using a property observer. Changing the custom controls adjusts the play rate, and changing the AVKit-provided controls updates the custom controls.
This works as expected until I command-click the fast-forward control. The property observer reports a play rate of 0, which should mean playback is stopped, but it is clearly moving forward at a high rate.
If I programmatically change the play rate (for example, player.rate = 5.0), the property observer reports the rate correctly.
Manual method
If you option-click the fast-forward control, it increments by 0.1 from 1.0 to 2.0x (1.1, 1.2, 1.3, etc.) and the play rate changes accordingly (the property observer reports these values).
If you command-click the fast-forward control, the play rate changes to 2, 5, 10, 30, and 60x (the property observer reports 0.0).
How do you determine what the player rate is after command-clicking the fast-forward control?
The other thing is that if I programmatically or manually change the player rate in the range up to 1.9x, the audio continues to play.
If I programmatically change the player rate to 2.0x or above, the audio continues to play.
If I manually change the rate (command-click the player's fast-forward control), the audio does not continue.
Is there something that needs to be done to keep the audio when manually adjusting the playback rate to 2.0 or above?
Screenshot: PlayerControl.png - https://developer.apple.com/forums/content/attachment/e915f567-b721-4430-9c61-2789a9058002
var playRateSync: Double = 1.0 {
didSet{
// snap the new value to 0.1 increments for the label and the custom controls
// (a local constant avoids re-triggering didSet on the property itself)
let increment = 0.1
let snappedRate = increment * round(playRateSync / increment)
let playRateStr = String(format: "%.2f", snappedRate)
playbackLabel.stringValue = "Playback Rate: \(playRateStr)"
playbackSlider.doubleValue = snappedRate
playbackStepper.doubleValue = snappedRate
}
}
func setPlayRateObserver(player: AVPlayer)
{
player.addObserver(self, forKeyPath: "rate", options: [.new, .old, .initial], context: nil)
}
override func observeValue(forKeyPath keyPath: String?, of object: Any?, change: [NSKeyValueChangeKey : Any]?, context: UnsafeMutableRawPointer?) {
print("keyPath: \(keyPath), object: \(object), change: \(change)")
if object as AnyObject? === playerView.player {
if keyPath == "rate" {
print("The real rate: \(Double(playerView.player!.rate))")
if let player = playerView.player
{
if player.rate > 0 {
playRateSync = Double(player.rate)
print("Rate changed. New Rate: \(player.rate)")
}
}
}
}
}
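For what it's worth, the same observation can be expressed with block-based KVO, which drops the string key path and keeps the observation token explicit. It observes the identical rate property, so it presumably sees the same 0.0 after a command-click; the token property name rateObservation is mine.

// Hypothetical block-based equivalent of setPlayRateObserver; the token must be retained.
var rateObservation: NSKeyValueObservation?

func setPlayRateObserver(player: AVPlayer) {
    rateObservation = player.observe(\.rate, options: [.new, .old, .initial]) { [weak self] player, change in
        guard let self = self, let newRate = change.newValue, newRate > 0 else { return }
        self.playRateSync = Double(newRate)
        print("Rate changed. New rate: \(newRate)")
    }
}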
How do you archive mixed objects that conform to NSSecureCoding (SFTranscription) for later retrieval?
I am using SFSpeechRecognizer and attempting to save the results of the transcription for later analysis and processing. My issue isn't specifically with speech but rather with archiving.
Unfortunately, I haven't archived before, and after Googling, I have encountered challenges.
struct TranscriptionResults: Codable {
var currTime : Double // Running start time from beginning of file
var currSegStart : Double // start from beginning of segment
var currSegSecs : Double // segment length in seconds
var currSegEnd : Double // end = currStart + segmentSecs; calculated, so it doesn't need to be saved
var elapsedTime : Double // how much time to process to this point
var fileName : String
var fileURL : URL
var fileLength : Int64
var transcription : SFTranscription //* does not conform to Codable **
}
Type 'TranscriptionResults' does not conform to protocol 'Decodable'
Type 'TranscriptionResults' does not conform to protocol 'Encodable'
When I add the property with "var transcription : SFTranscription", I get the above errors.
I looked it up, and SFTranscription is declared as:
open class SFTranscription : NSObject, NSCopying, NSSecureCoding {...}
My issue is getting SFTranscription to comply with Codable, as it does not look like you can mix it with NSSecureCoding.
I don't think my issue is specifically with SFTranscription, but rather with understanding how to save results that include a mix of NSSecureCoding objects to disk.
How do you save the result for later retrieval?
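One hedged way to keep the struct Codable is to store the transcription as Data produced by NSKeyedArchiver (SFTranscription conforms to NSSecureCoding) and unarchive it on read. The sketch below is illustrative; the property name transcriptionData and the helper methods are mine.

import Foundation
import Speech

// Hypothetical Codable wrapper: the SFTranscription is stored as archived Data.
struct TranscriptionResults: Codable {
    var currTime: Double
    var currSegStart: Double
    var currSegSecs: Double
    var currSegEnd: Double
    var elapsedTime: Double
    var fileName: String
    var fileURL: URL
    var fileLength: Int64
    var transcriptionData: Data          // archived SFTranscription

    // Archive the NSSecureCoding object into Data so the struct stays Codable.
    static func archive(_ transcription: SFTranscription) throws -> Data {
        try NSKeyedArchiver.archivedData(withRootObject: transcription,
                                         requiringSecureCoding: true)
    }

    // Recover the SFTranscription when the struct is read back from disk.
    func transcription() throws -> SFTranscription? {
        try NSKeyedUnarchiver.unarchivedObject(ofClass: SFTranscription.self,
                                               from: transcriptionData)
    }
}

The struct itself can then be written and read with JSONEncoder/JSONDecoder (or a property list encoder) as usual.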
How is the SFTranscriptionSegment used to track where recognized statements are made?
My goal is to transcribe the audio and record where in the audio file each sentence is spoken.
The timestamp values reset after each phrase recognition.
If I attempt to keep a running count of the timestamps plus durations, it does not match where the phrase was spoken after the first or second recognized phrase.
If I keep a running count of the first SFTranscriptionSegment[0] plus subsequent SFTranscriptionSegment[last] + duration, I should stay aligned with the next speech segment, but it does not.
How is the SFTranscriptionSegment used to track where recognized statements are made?
The following affirmations are used as a test of speech recognition.
I output the affirmations using the NSSpeechSynthesizer Veena voice (with only padded silence between sentences) to a file.
I then read the file back into speech recognition to test the output against a known set of sentences.
If I need to know where in the file a speech segment begins, how do I get it from the timestamps and durations?
I set on-device recognition to TRUE because these files are not limited in length and my target files can be up to two hours long, while my test files are 15-30 minutes, so this must be done on the device.
recogRequest.requiresOnDeviceRecognition = true
Running on macOS Catalina 10.15.7
Affirmations
I plan leisure time regularly.
I balance my work life and my leisure life perfectly.
I return to work refreshed and renewed.
I experience a sense of well being while I work.
I experience a sense of inner peace while I relax.
I like what I do and I do what I like.
I increase in mental and emotional health daily.
This transcription is now concluded.
The below function produces the following output
func recognizeFile_Compact(url:NSURL) {
let language = "en-US" //"en-GB"
let recognizer = SFSpeechRecognizer(locale: Locale.init(identifier: language))!
let recogRequest = SFSpeechURLRecognitionRequest(url: url as URL)
recognizer.supportsOnDeviceRecognition = true // make sure the device is ready to do the work
recognizer.defaultTaskHint = .dictation // give a hint as dictation
recogRequest.requiresOnDeviceRecognition = true // we want the device to do all the work
recogRequest.shouldReportPartialResults = false // we dont want partial results
var strCount = 0
let recogTask = recognizer.recognitionTask(with: recogRequest, resultHandler: { (result, error) in
guard let result = result else {
print("Recognition failed, \(error!)")
return
}
let progress = recognizer.queue.progress.fractionCompleted // we never get progress other than 0.0
let text = result.bestTranscription.formattedString
strCount += 1
print(" #\(strCount), Progress: \(progress) \n\n", "FormattedString: \(text) \n\n", "BestTranscription: \(result.bestTranscription)", "\n\n" )
if (result.isFinal) { print("WE ARE FINALIZED") }
})
}
#1, Progress: 0.0
FormattedString: I plan Lisa time regularly
BestTranscription: SFTranscription: 0x600000cac240, formattedString=I plan Lisa time regularly, segments=(
"SFTranscriptionSegment: 0x6000026266a0, substringRange={0, 1}, timestamp=15.96, duration=0.1499999999999986, confidence=0.862, substring=I, alternativeSubstrings=(\n), phoneSequence=AY, ipaPhoneSequence=\U02c8a\U0361\U026a, voiceAnalytics=(null)",
"SFTranscriptionSegment: 0x6000026275a0, substringRange={2, 4}, timestamp=16.11, duration=0.3000000000000007, confidence=0.172, substring=plan, alternativeSubstrings=(\n planned,\n blend,\n blame,\n played\n), phoneSequence=p l AA n, ipaPhoneSequence=p.l.\U02c8\U00e6.n, voiceAnalytics=(null)",
"SFTranscriptionSegment: 0x600002625ec0, substringRange={7, 4}, timestamp=16.41, duration=0.3300000000000018, confidence=0.71, substring=Lisa, alternativeSubstrings=(\n Liza,\n Lise\n), phoneSequence=l EE z uh, ipaPhoneSequence=l.\U02c8i.z.\U0259, voiceAnalytics=(null)",
"SFTranscriptionSegment: 0x600002626f40, substringRange={12, 4}, timestamp=16.74, duration=0.2999999999999972, confidence=0.877, substring=time, alternativeSubstrings=(\n), phoneSequence=t AY m, ipaPhoneSequence=t.\U02c8a\U0361\U026a.m, voiceAnalytics=(null)",
"SFTranscriptionSegment: 0x6000026271e0, substringRange={17, 9}, timestamp=17.04, duration=0.7200000000000024, confidence=0.88, substring=regularly, alternativeSubstrings=(\n), phoneSequence=r EH g y uh l ur l ee, ipaPhoneSequence=\U027b.\U02c8\U025b.g.j.\U0259.l.\U0259 \U027b.l.i, voiceAnalytics=(null)"
), speakingRate=0.000000, averagePauseDuration=0.000000
Speech Recognition Output - https://developer.apple.com/forums/content/attachment/a001ddb3-481e-43c4-b7b9-00ed2b386fd3
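As a small investigation aid, a hypothetical helper (the name logSegmentTimes is mine) that prints each segment's reported timestamp and duration makes it easier to compare the running positions against the known sentence boundaries in the file:

import Speech

// Hypothetical helper: print the start/end that SFSpeechRecognizer reports for each segment,
// so the values can be compared against where the sentence actually occurs in the audio file.
func logSegmentTimes(_ transcription: SFTranscription) {
    for segment in transcription.segments {
        let start = segment.timestamp
        let end = segment.timestamp + segment.duration
        print(String(format: "%8.2f - %8.2f  %@", start, end, segment.substring))
    }
}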
I am unable to get AVSpeechSynthesizer to write or to acknowledge the didFinish delegate.
When I call the function, it merely speaks the string aloud.
I am running on macOS 10.15.7 (Catalina).
What am I missing?
SpeakerTest().writeToBuffer("This should write to buffer and call didFinish delegate.")
class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
let synth = AVSpeechSynthesizer()
override init() {
super.init()
synth.delegate = self
}
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
print("Utterance didFinish")
}
func speak(_ string: String) {
let utterance = AVSpeechUtterance(string: string)
synth.speak(utterance)
}
func writeToBuffer(_ string: String)
{
let utterance = AVSpeechUtterance(string: string)
synth.write(utterance) { (buffer: AVAudioBuffer) in
guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
fatalError("unknown buffer type: \(buffer)")
}
if pcmBuffer.frameLength == 0 {
print("buffer is empty")
} else {
print("buffer has content \(buffer)")
}
}
}
}
init(componentDescription: AudioComponentDescription, options: AudioComponentInstantiationOptions = [])
I built an AudioUnit v3 effect and I set the variable "maximumFramesToRender" within the above method. My preference is to make this value 256 or 1024; however, it doesn't appear to matter what I set it to, because it always changes to 512. The effect does work, but I can't change the number of frames to render.
I would dispense with creating an effect altogether if I could make an AVAudioPlayerNode render a maximum of 256 frames.
self.maximumFramesToRender = 256
The documentation shows you must set the value before resources are allocated, and I have done so.
I have two questions:
1) Is there more than one place you must set this value in an AudioUnit v3 effect unit?
2) Can you set the frame rendering for an AVAudioPlayerNode, regardless of whether you are online or offline rendering?
try! self.audioEngine.enableManualRenderingMode(.offline, format: self.audioFormat, maximumFrameCount: 4096)
I do realize that for manual rendering I can change the maximumFrameCount to 256; however, I want either the effect or the player node to render at a different rate, because I built a render block around specific timings. So I need this specific effect or node to render at a defined rate, regardless of whether all other downstream nodes are rendering at a higher frame count.
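For reference, a minimal sketch of where maximumFramesToRender is normally set in a v3 unit, before allocateRenderResources() runs. The class name MyEffectAudioUnit is hypothetical, and the suggestion that a host such as AVAudioEngine may assign its own value when it configures the unit (which could explain the observed 512) is my assumption, not a confirmed explanation.

import AVFoundation

// Hypothetical skeleton of the effect's initializer; the property is set before
// allocateRenderResources() is called, which this placement satisfies. A host may
// still overwrite maximumFramesToRender when it configures the unit.
class MyEffectAudioUnit: AUAudioUnit {
    override init(componentDescription: AudioComponentDescription,
                  options: AudioComponentInstantiationOptions = []) throws {
        try super.init(componentDescription: componentDescription, options: options)
        self.maximumFramesToRender = 256
    }
}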