I'm using the Speech framework to transcribe a very long audio file (1h+), and I want to present partial results along the way.
What I've noticed is that SFSpeechRecognizer processes audio in batches.
The delivered SFTranscriptionSegment values have timestamp set to 0.0 most of the time, but the timestamps seem to get meaningful values at the end of a "batch". Once a batch is done, the next reported partial results no longer contain those segments; recognition starts delivering partial results from the next batch.
Note that everything I'm describing here concerns the case where SFSpeechRecognitionResult has isFinal set to false.
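For reference, here is roughly how I'm observing this, as a minimal sketch (audioFileURL is a placeholder for the 1h+ recording, and error handling is omitted):

```swift
import Speech

let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
let request = SFSpeechURLRecognitionRequest(url: audioFileURL)
request.shouldReportPartialResults = true

recognizer.recognitionTask(with: request) { result, error in
    guard let result = result, !result.isFinal else { return }
    for segment in result.bestTranscription.segments {
        // timestamp stays 0.0 for most callbacks, then flips to a
        // batch-relative value right before the batch's segments stop
        // appearing in subsequent partial results.
        print(segment.substring, segment.timestamp, segment.duration)
    }
}
```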
I found no mention of this behavior in the docs.
What's problematic for me is that segment timestamps in each batch are relative to the batch itself, not to the entire audio file. Because of that, it's impossible to determine a segment's absolute timestamp: we don't know the absolute start time of the batch.
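The only workaround I can think of is a fragile sketch built entirely on the undocumented behavior above: treat non-zero timestamps as the sign that a batch is closing, and accumulate my own running offset (batchOffset below is my bookkeeping, not an API value):

```swift
import Speech

var batchOffset: TimeInterval = 0

func handlePartialResult(_ result: SFSpeechRecognitionResult) {
    let segments = result.bestTranscription.segments
    // Heuristic, not documented: non-zero timestamps seem to mean
    // the current batch is about to be dropped from partial results.
    guard let last = segments.last, last.timestamp > 0 else { return }
    for segment in segments {
        // Batch-relative timestamp + running offset ≈ absolute position.
        print(segment.substring, batchOffset + segment.timestamp)
    }
    // Advance the offset by the end time of the closing batch.
    batchOffset += last.timestamp + last.duration
}
```

Even if this happens to work today, it hinges on behavior that isn't documented anywhere, which is why I'm asking: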
Is there an Apple engineer here who could shed some light on this behavior? Is there a way to get a meaningful segment timestamp from the partial-results callbacks?