I am attempting to do batch Transcription of audio files exported from Voice Memos, and I am running into an interesting issue. If I only transcribe a single file it works every time, but if I try to batch it, only the last one works, and the others fail with No speech detected.
I assumed it must be something about concurrency, so I implemented what I think should remove any chance of transcriptions running in parallel. And with a mocked up unit of work, everything looked good. So I added the transcription back in, and
1: It still fails on all but the last file. This happens if I am processing 10 files or just 2.
2: It no longer processes in order, any file can be the last one that succeeds. And it seems to not be related to file size. I have had paragraph sized notes finish last, but also a single short sentence that finishes last.
I left the mocked processFiles() for reference.
Any insights would be greatly appreciated.
import Speech
import SwiftUI
struct ContentView: View {
@State private var processing: Bool = false
@State private var fileNumber: String?
@State private var fileName: String?
@State private var files: [URL] = []
let locale = Locale(identifier: "en-US")
let recognizer: SFSpeechRecognizer?
init() {
self.recognizer = SFSpeechRecognizer(locale: self.locale)
}
var body: some View {
VStack {
if files.count > 0 {
ZStack {
ProgressView()
Text(fileNumber ?? "-")
.bold()
}
Text(fileName ?? "-")
} else {
Image(systemName: "folder.badge.minus")
Text("No audio files found")
}
}
.onAppear {
files = getFiles()
Task {
await processFiles()
}
}
}
private func getFiles() -> [URL] {
do {
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
let path = documentsURL.appendingPathComponent("Voice Memos").absoluteURL
let contents = try FileManager.default.contentsOfDirectory(at: path, includingPropertiesForKeys: nil, options: [])
let files = (contents.filter {$0.pathExtension == "m4a"}).sorted { url1, url2 in
url1.path < url2.path
}
return files
}
catch {
print(error.localizedDescription)
return []
}
}
private func processFiles() async {
var fileCount = files.count
for file in files {
fileNumber = String(fileCount)
fileName = file.lastPathComponent
await processFile(file)
fileCount -= 1
}
}
// private func processFile(_ url: URL) async {
// let seconds = Double.random(in: 2.0...10.0)
// await withCheckedContinuation { continuation in
// DispatchQueue.main.asyncAfter(deadline: .now() + seconds) {
// continuation.resume()
// print("\(url.lastPathComponent) \(seconds)")
// }
// }
// }
private func processFile(_ url: URL) async {
let recognitionRequest = SFSpeechURLRecognitionRequest(url: url)
recognitionRequest.requiresOnDeviceRecognition = false
recognitionRequest.shouldReportPartialResults = false
await withCheckedContinuation { continuation in
recognizer?.recognitionTask(with: recognitionRequest) { (transcriptionResult, error) in
guard transcriptionResult != nil else {
print("\(url.lastPathComponent.uppercased())")
print(error?.localizedDescription ?? "")
return
}
if ((transcriptionResult?.isFinal) == true) {
if let finalText: String = transcriptionResult?.bestTranscription.formattedString {
print("\(url.lastPathComponent.uppercased())")
print(finalText)
}
}
}
continuation.resume()
}
}
}