I'm building a game where the player is able to speak commands, so I want to enable speech-to-text capability. I've setup the required info.plist property (for speech recognition privacy) as well as the App Sandbox hardware setting (for audio input). I've confirmed that the application is listening via the audio tap and sending audio buffers to the recognition request. However, the recognition task never executes.
NOTE: This is for MacOS, NOT iOS. Also, it works when I have this in a Playground, but when I try to do this in an actual application, the recognition task isn't called.
Specs:
MacOS: 12.1
XCode: 13.2.1 (13C100)
Swift: 5.5.2
Here is the code that I've placed in the AppDelegate of a freshly built SpriteKit application:
//
// AppDelegate.swift
//
import Cocoa
import AVFoundation
import Speech
@main
class AppDelegate: NSObject, NSApplicationDelegate {
private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
private let audioEngine = AVAudioEngine()
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?
func applicationDidFinishLaunching(_ aNotification: Notification) {
SFSpeechRecognizer.requestAuthorization(requestMicrophoneAccess)
}
func applicationWillTerminate(_ aNotification: Notification) {
// Insert code here to tear down your application
}
func applicationShouldTerminateAfterLastWindowClosed(_ sender: NSApplication) -> Bool {
return true
}
fileprivate func requestMicrophoneAccess(authStatus: SFSpeechRecognizerAuthorizationStatus) {
OperationQueue.main.addOperation {
switch authStatus {
case .authorized:
self.speechRecognizer.supportsOnDeviceRecognition = true
if let speechRecognizer = SFSpeechRecognizer() {
if speechRecognizer.isAvailable {
do {
try self.startListening()
} catch {
print(">>> ERROR >>> Listening Error: \(error)")
}
}
}
case .denied:
print("Denied")
case .restricted:
print("Restricted")
case .notDetermined:
print("Undetermined")
default:
print("Unknown")
}
}
}
func startListening() throws {
// Cancel the previous task if it's running.
recognitionTask?.cancel()
recognitionTask = nil
let inputNode = audioEngine.inputNode
// Configure the microphone input.
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {(buffer: AVAudioPCMBuffer, when: AVAudioTime) in
/**********
* Confirmed that the following line is executing continuously
**********/
self.recognitionRequest?.append(buffer)
}
startRecognizing()
audioEngine.prepare()
try audioEngine.start()
}
func startRecognizing() {
// Create a recognition task for the speech recognition session.
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequestInternal = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }
recognitionRequestInternal.shouldReportPartialResults = true
recognitionRequestInternal.requiresOnDeviceRecognition = true
/**************
* Confirmed that the following line is executed,
* however the function given to 'recognitionTask' is never called
**************/
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequestInternal) { result, error in
var isFinal = false
if result != nil {
let firstTranscriptionTimestamp = result!.transcriptions.first?.segments.first?.timestamp ?? TimeInterval.zero
isFinal = result!.isFinal || (firstTranscriptionTimestamp != 0)
}
if error != nil {
// Stop recognizing speech if there is a problem.
print("\n>>> ERROR >>> Recognition Error: \(error)")
self.audioEngine.stop()
self.audioEngine.inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
} else if isFinal {
self.recognitionTask = nil
}
}
}
}
Post
Replies
Boosts
Views
Activity
I'm developing a game that will use speech recognition to execute various commands. I am using code from Apple's Recognizing Speech in Live Audio documentation page.
When I run this in a Swift Playground, it works just fine. However, when I make a SpriteKit game application (basic setup from Xcode's "New Project" menu option), I get the following error:
required condition is false: IsFormatSampleRateAndChannelCountValid(hwFormat)
Upon further research, it appears that my input node has no channels. The following is the relevant portion of my code, along with debug output:
let inputNode = audioEngine.inputNode
print("Number of inputs: \(inputNode.numberOfInputs)")
// 1
print("Input Format: \(inputNode.inputFormat(forBus: 0))")
// <AVAudioFormat 0x600001bcf200: 0 ch, 0 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved>
let channelCount = inputNode.inputFormat(forBus: 0).channelCount
print("Channel Count: \(channelCount)")
// 0 <== Agrees with the inputFormat output listed previously
// Configure the microphone input.
print("Number of outputs: \(inputNode.numberOfOutputs)")
// 1
let recordingFormat = inputNode.outputFormat(forBus: 0)
print("Output Format: \(recordingFormat)")
// <AVAudioFormat 0x600001bf3160: 2 ch, 44100 Hz, Float32, non-inter>
inputNode.installTap(onBus: 0, bufferSize: 256, format: recordingFormat, block: audioTap) // <== This is where the error occurs.
// NOTE: 'audioTap' is a function defined in this class. Using this defined function instead of an inline, anonymous function.
The code snippet is included in the game's AppDelegate class (which includes import statements for Cocoa, AVFoundation, and Speech), and executes during its applicationDidFinishLaunching function. I'm having trouble understanding why Playground works, but a game app doesn't work. Do I need to do something specific to get the application to recognize the microphone?
NOTE: This if for MacOS, NOT iOS. While the "How To" documentation cited earlier indicates iOS, Apple stated at WWDC19 that it is now supported on the MacOS.
NOTE: I have included the NSSpeechRecognitionUsageDescription key in the applications plist, and successfully acknowledged the authorization request for the microphone.