SFSpeechRecognizer segment timestamps show erratic behaviour on macOS Catalina

I am using SFSpeechRecognizer on macOS Catalina to transcribe audio files that are potentially longer than one minute. In my tests I see that partway through a file, the timestamps of the SFTranscriptionSegments start again at zero, as if the audio clock had been reset. This sometimes happens after about a minute, but it also happens at other points later in the file. That makes the timestamps useless. Is this something that can be configured or worked around? Chopping the audio into one-minute chunks would risk splitting words and hurting result quality.
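
For reference, here is a minimal sketch of the kind of setup I'm describing (the file URL and locale are placeholders, and error handling is trimmed); the segment timestamps printed here are the values that jump back to zero partway through the file:

```swift
import Speech

func transcribe(fileURL: URL) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
              recognizer.isAvailable else { return }

        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        request.shouldReportPartialResults = false

        _ = recognizer.recognitionTask(with: request) { result, error in
            guard let result = result else {
                print("Recognition error: \(error?.localizedDescription ?? "unknown")")
                return
            }
            // timestamp is the segment's offset in seconds from the start of the audio;
            // this is the value that restarts at 0 in the middle of long files.
            for segment in result.bestTranscription.segments {
                print(String(format: "%8.2f  %@", segment.timestamp, segment.substring))
            }
        }
    }
}
```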


The documentation mentions a one-minute limit, but that appears to apply to mobile devices, since recognition clearly works for longer files (I am reliably transcribing 8-minute files and the results contain the entire text).


Does anyone have any insight to share? It doesn't feel right that such general-purpose AI functionality should be this limited on macOS. Transcribing long audio files using the power of a multi-core Mac Pro, for example, seems like a perfectly valid use case.


Thanks

I'm dealing with the exact same issue on iOS. The "full transcription" string (bestTranscription.formattedString) also resets very frequently, so I just made an array and append the string whenever it resets.
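
In case it helps anyone, this is roughly what that looks like (just a sketch; the reset detection is a heuristic and the names are my own):

```swift
import Speech

final class TranscriptCollector {
    private(set) var finishedChunks: [String] = []
    private var currentChunk = ""

    func handle(_ result: SFSpeechRecognitionResult) {
        let text = result.bestTranscription.formattedString
        // Heuristic: partial results normally grow; if the new string is shorter and
        // no longer a continuation of what we had, treat it as a reset and stash the
        // previous chunk before overwriting it.
        if !currentChunk.isEmpty,
           !text.hasPrefix(currentChunk),
           text.count < currentChunk.count {
            finishedChunks.append(currentChunk)
        }
        currentChunk = text
    }

    /// Everything recognised so far, stitched back together.
    var fullTranscript: String {
        (finishedChunks + [currentChunk])
            .filter { !$0.isEmpty }
            .joined(separator: " ")
    }
}
```

In the recognition task's result handler I just call handle(_:) on each result and read fullTranscript once the task finishes.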

Just chiming in - we've also come across the same issue on iOS when transcribing on-device. Server-based recognition works, but is limited to one minute. Will update if we have any luck.
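
For anyone who wants to compare the two paths, switching is just a flag on the recognition request; something like this (sketch only, availability and error handling omitted):

```swift
import Speech

// requiresOnDeviceRecognition forces local transcription (where we see the resets);
// leaving it false lets the server handle it (which caps out around one minute for us).
func makeRequest(for url: URL, onDevice: Bool, recognizer: SFSpeechRecognizer) -> SFSpeechURLRecognitionRequest? {
    guard !onDevice || recognizer.supportsOnDeviceRecognition else { return nil }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = onDevice
    return request
}
```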

@Ipde + clipchamp-allan, have either of you made any progress on this?
