SFSpeechRecognizer segment timestamps show erratic behaviour on macOS Catalina

I am using SFSpeechRecognizer on macOS Catalina to transcribe audio files that are potentially longer than one minute. In my tests I see that partway through a file, the timestamps of the SFTranscriptionSegments start again at zero, as if the audio clock had been reset. This sometimes happens after about a minute, but it also happens at other points later in the file. That makes the timestamps useless. Is this something that can be configured or worked around? Chopping the audio into one-minute chunks would risk splitting words and hurting result quality.
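
For reference, here is a minimal sketch of the kind of setup I'm describing (the file URL and locale are placeholders, and error handling is trimmed); the segment timestamps printed here are the values that jump back to zero partway through the file:

```swift
import Speech

func transcribe(fileURL: URL) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
              recognizer.isAvailable else { return }

        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        request.shouldReportPartialResults = false

        _ = recognizer.recognitionTask(with: request) { result, error in
            guard let result = result else {
                print("Recognition error: \(error?.localizedDescription ?? "unknown")")
                return
            }
            // timestamp is the segment's offset in seconds from the start of the audio;
            // this is the value that restarts at 0 in the middle of long files.
            for segment in result.bestTranscription.segments {
                print(String(format: "%8.2f  %@", segment.timestamp, segment.substring))
            }
        }
    }
}
```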


The documentation mentions a one-minute limit, but that appears to apply to mobile devices, since recognition clearly works for longer files (I am reliably transcribing 8-minute files and the results contain the entire text).


Does anyone have any insight to share? It doesn't feel right that such general-purpose AI functionality should be this limited on macOS. Transcribing long audio files using the power of a multi-core Mac Pro, for example, seems like a perfectly valid use case.


Thanks

I'm dealing with the exact same issue on iOS. The "full transcription" string (bestTranscription.formattedString) also resets very frequently, so I just made an array and append the string whenever it resets.
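
In case it helps anyone, this is roughly what that looks like (just a sketch; the reset detection is a heuristic and the names are my own):

```swift
import Speech

final class TranscriptCollector {
    private(set) var finishedChunks: [String] = []
    private var currentChunk = ""

    func handle(_ result: SFSpeechRecognitionResult) {
        let text = result.bestTranscription.formattedString
        // Heuristic: partial results normally grow; if the new string is shorter and
        // no longer a continuation of what we had, treat it as a reset and stash the
        // previous chunk before overwriting it.
        if !currentChunk.isEmpty,
           !text.hasPrefix(currentChunk),
           text.count < currentChunk.count {
            finishedChunks.append(currentChunk)
        }
        currentChunk = text
    }

    /// Everything recognised so far, stitched back together.
    var fullTranscript: String {
        (finishedChunks + [currentChunk])
            .filter { !$0.isEmpty }
            .joined(separator: " ")
    }
}
```

In the recognition task's result handler I just call handle(_:) on each result and read fullTranscript once the task finishes.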

Just chiming in - we've also come across the same issue on iOS when transcribing on-device. Server-based recognition works, but is limited to one minute. Will update if we have any luck.
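
For anyone who wants to compare the two paths, switching is just a flag on the recognition request; something like this (sketch only, availability and error handling omitted):

```swift
import Speech

// requiresOnDeviceRecognition forces local transcription (where we see the resets);
// leaving it false lets the server handle it (which caps out around one minute for us).
func makeRequest(for url: URL, onDevice: Bool, recognizer: SFSpeechRecognizer) -> SFSpeechURLRecognitionRequest? {
    guard !onDevice || recognizer.supportsOnDeviceRecognition else { return nil }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = onDevice
    return request
}
```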

@Ipde + clipchamp-allan, have either of you made any progress on this?
