lpde’s Profile | Apple Developer Forums

SFSpeechRecognizer segment timestamps show erratic behaviour on macOS Catalina

I am using SFSpeechRecognizer on macOS Catalina to transcribe audio files that may be longer than one minute. In my tests, the timestamps of the SFTranscriptionSegments start again at zero in the middle of the file, as if the audio clock had been reset. This sometimes happens after a minute, but later in the file it also happens at other times. This renders the timestamps useless. Is this something that can be configured or worked around? Chopping audio files into one-minute segments risks splitting words, which would hurt result quality.

The documentation states that there is a limit of one minute, which seems to apply to mobile devices, as it clearly works for longer files (I am reliably transcribing 8-minute files with results that contain the entire text).

Does anyone have any insight to share? It doesn't feel right that such general-purpose AI functionality should be so limited on macOS. Transcribing long audio files using the power of a multi-core Mac Pro, for example, looks like a perfectly valid use case.

Thanks
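To make the symptom concrete, here is a minimal sketch of how the resets described above could be detected and, under a strong assumption, patched over. `TimedSegment`, `resetIndices` and `rebased` are hypothetical names, not part of the Speech framework; the struct only mirrors the `substring`, `timestamp` and `duration` values one would read from each `SFTranscriptionSegment`. The re-basing step assumes that the clock restarts exactly where the previous segment ended, which I have not been able to confirm.

```swift
// Hypothetical stand-in for the values read from SFTranscriptionSegment
// (substring, timestamp, duration). Not a Speech framework type.
struct TimedSegment {
    var text: String
    var timestamp: Double  // seconds, as reported by the recognizer
    var duration: Double   // seconds
}

/// Returns the indices at which the reported timestamp jumps backwards,
/// i.e. where the audio clock appears to have been reset.
func resetIndices(in segments: [TimedSegment]) -> [Int] {
    var indices: [Int] = []
    for i in segments.indices.dropFirst()
    where segments[i].timestamp < segments[i - 1].timestamp {
        indices.append(i)
    }
    return indices
}

/// Re-bases all timestamps onto a single timeline.
/// ASSUMPTION (unverified): after a reset, time zero corresponds to the
/// end of the last segment before the reset.
func rebased(_ segments: [TimedSegment]) -> [TimedSegment] {
    var offset = 0.0        // added to every timestamp after a reset
    var previousRaw = 0.0   // last timestamp as reported by the recognizer
    var previousEnd = 0.0   // end of the last segment on the rebuilt timeline
    var result: [TimedSegment] = []
    for (i, segment) in segments.enumerated() {
        if i > 0 && segment.timestamp < previousRaw {
            offset = previousEnd  // clock went backwards: start a new base
        }
        var fixed = segment
        fixed.timestamp = segment.timestamp + offset
        result.append(fixed)
        previousRaw = segment.timestamp
        previousEnd = fixed.timestamp + fixed.duration
    }
    return result
}

// Made-up sample values that imitate a reset after the second segment.
let sample = [
    TimedSegment(text: "one", timestamp: 58.0, duration: 0.5),
    TimedSegment(text: "two", timestamp: 59.0, duration: 0.5),
    TimedSegment(text: "three", timestamp: 0.0, duration: 0.25),
    TimedSegment(text: "four", timestamp: 0.5, duration: 0.25),
]
let resets = resetIndices(in: sample)
let fixedTimestamps = rebased(sample).map { $0.timestamp }
```

Any silence between the last segment before a reset and the first one after it is lost with this approach, so the rebuilt timestamps can drift and are at best approximate.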

Speech

Posted by lpde
