Thank you for the insights on fine-tuning SNSoundClassifier with AudioFeaturePrint and logistic regression.
However, I’m still unclear on how to effectively integrate embeddings from SNSoundClassifier into this pipeline, given that they aren’t directly accessible.
Are there specific steps or methodologies to consider for augmenting the base model with user-supplied audio data, and how can I ensure the classifier accurately reflects custom sound classes?
What specific pipeline do you recommend? A base model seems to be necessary when fine-tuning in Create ML. If SNSoundClassifier can be used as that base model, how? If it cannot, then the base would presumably have to be a TensorFlow or PyTorch model; if so, which one?
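For concreteness, here is the kind of Create ML training step I have in mind for the user-supplied audio. This is only a rough sketch: the paths and class-folder layout are placeholders, and I am assuming the exported model would then be loaded at runtime with SNClassifySoundRequest(mlModel:) in place of the built-in .version1 classifier.

import CreateML
import Foundation

do {
    // Placeholder layout: one subdirectory per custom sound class,
    // each containing the user-supplied audio files for that class.
    let trainingDir = URL(fileURLWithPath: "/path/to/TrainingData")
    let dataSource = MLSoundClassifier.DataSource.labeledDirectories(at: trainingDir)

    // Train the classifier on the user-supplied recordings.
    let classifier = try MLSoundClassifier(trainingData: dataSource)
    print(classifier.trainingMetrics)

    // Export the model for use with SoundAnalysis at runtime.
    try classifier.write(to: URL(fileURLWithPath: "/path/to/CustomSoundClassifier.mlmodel"))
} catch {
    print("Training failed: \(error)")
}

Is this the pipeline you are recommending, or does the base model need to come from somewhere else?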
Any additional guidance would be greatly appreciated!
Hi Greg,
Thank you for taking the time to look into this.
Unfortunately, I wasn't able to upload the file directly because the "Add File" option would not let me select the 19.9 MB compressed zip file. To make it accessible, I've uploaded the file to an S3 bucket, and you can download it here:
https://curvsort-public.s3.ap-south-1.amazonaws.com/AVCamBuildingACameraApp_1.zip. Please let me know once you've downloaded it, and I will delete it from the bucket.
The key files for this issue are ESoundClassifier and CaptureService (specifically resultsObserver1 and the sections marked Works 1/2/3).
To Reproduce the Issue:
I am running the app on an iPhone 11 with Swift 5 and Xcode 16.
Problem Case: With the code as it stands, no logs appear for the analyze and analyzeAudio methods.
Working Case: To observe the working flow, comment out the code at // Does not work 1 and uncomment the three lines at // Works 3. You should then see logs for both analyze and analyzeAudio.
Swift 6 Concurrency Issues:
After switching the build setting to Swift 6, the following concurrency-related errors are reported:
ThumbnailButton Issue: The PhotosPicker code produces an error when displaying the thumbnail: "Main actor-isolated property 'thumbnail' cannot be referenced from a Sendable closure." (A minimal reduction of this pattern is sketched after this list.)
AVAudioBuffer: AVAudioBuffer is also flagged as a non-Sendable type, which causes further concurrency errors.
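For reference, here is a minimal, self-contained reduction of the first diagnostic. This is not the actual AVCam/PhotosPicker code; the type and property below are placeholders, and awaiting the property is shown only as one pattern that compiles.

import Foundation

@MainActor
final class CameraModel {
    var thumbnail: Data?   // stands in for the main-actor-isolated thumbnail image

    func schedule() {
        Task.detached {
            // Swift 6 rejects a synchronous read from this Sendable closure:
            // "Main actor-isolated property 'thumbnail' cannot be referenced
            // from a Sendable closure"
            // _ = self.thumbnail

            // Hopping back onto the main actor to read it does compile:
            _ = await self.thumbnail
        }
    }
}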
Apple Source Code References:
AVCam: Building a Camera App
Classifying Sounds in an Audio Stream
I hope this provides the necessary context. Let me know if you need any further details.
Best regards
/////////////////////////
/////////////////////////
import AVFoundation
import SoundAnalysis

// The actor ensures all operations happen off of the `@MainActor`.
actor CaptureService {
    ...
    nonisolated private let resultsObserver1 = ResultsObserver1()
    ...
    private func setUpSession() throws { .. }
    ...
    func setVideoCaptureMode() throws {
        captureSession.beginConfiguration()
        defer { captureSession.commitConfiguration() }

        /* -- Works fine (analyzeAudio is printed)
        Task {
            self.resultsObserver1.startAudioEngine()
        }
        */
        self.resultsObserver1.startAudioEngine() // Does not work - analyzeAudio not printed

        captureSession.sessionPreset = .high
        try addOutput(movieCapture.output)
        if isHDRVideoEnabled {
            setHDRVideoEnabled(true)
        }
        updateCaptureCapabilities()
    }
    ...
}
/////////////////////////
/////////////////////////
class ResultsObserver1 {
    let resultsObserver2 = ResultsObserver2()
    var classifiedText: String = ""
    var confidence: Double = 0.0

    private var audioEngine: AVAudioEngine?
    private var soundAnalyzer: SNAudioStreamAnalyzer?
    private var inputFormat: AVAudioFormat?

    // Serial queue on which buffers are handed to the analyzer.
    let analysisQueue = DispatchQueue(label: "com.example.AnalysisQueue")

    func analyzeAudio(buffer: AVAudioBuffer, at time: AVAudioTime) {
        print("analyzeAudio")
        analysisQueue.async {
            print("analyze")
            self.soundAnalyzer?.analyze(buffer, atAudioFramePosition: time.sampleTime)
        }
    }

    func stopAudioEngine() {
        print("stopAudioEngine")
        soundAnalyzer?.removeAllRequests()
        audioEngine?.inputNode.removeTap(onBus: 0)
        audioEngine?.stop()
        soundAnalyzer = nil
        audioEngine = nil
        inputFormat = nil
    }

    // Set up audio analysis using SNAudioStreamAnalyzer.
    func startAudioEngine() {
        print("startAudioEngine")

        // Create a new audio engine.
        audioEngine = AVAudioEngine()
        print("audioEngine: \(String(describing: audioEngine))")

        // Get the native audio format of the engine's input bus.
        let inputBus = AVAudioNodeBus(0)
        inputFormat = audioEngine?.inputNode.inputFormat(forBus: inputBus)
        print("inputFormat: \(String(describing: inputFormat))")
        guard let inputFormat = inputFormat else {
            print("Failed to get input format")
            return
        }

        do {
            // Start the stream of audio data.
            try audioEngine?.start()
            print("audio engine started")
        } catch {
            print("Unable to start AVAudioEngine: \(error.localizedDescription)")
        }

        // Create a new stream analyzer.
        soundAnalyzer = SNAudioStreamAnalyzer(format: inputFormat)
        print("soundAnalyzer: \(String(describing: soundAnalyzer))")

        // Use Apple's built-in classifier, version 1.
        let version1 = SNClassifierIdentifier.version1
        do {
            // Create a classification request for version 1.
            let request = try SNClassifySoundRequest(classifierIdentifier: version1)
            // Add a sound classification request that reports to an observer.
            try soundAnalyzer?.add(request, withObserver: resultsObserver2)
            print("Added request to soundAnalyzer with the specified observer")
        } catch {
            print("Error setting up sound analysis: \(error)")
            return
        }

        // Tap the input node and forward buffers to the analyzer.
        audioEngine?.inputNode.installTap(onBus: 0, bufferSize: 8192, format: inputFormat, block: { buffer, when in
            self.analyzeAudio(buffer: buffer, at: when)
        })

        /*
        audioEngine?.inputNode.installTap(onBus: 0, bufferSize: 8192, format: inputFormat, block: { buffer, when in
            print("buffer received")
            self.soundAnalyzer?.analyze(buffer, atAudioFramePosition: AVAudioFramePosition(buffer.frameLength))
        })
        */

        do {
            // Prepare and start the audio engine.
            audioEngine?.prepare()
            try audioEngine?.start()
        } catch {
            print("Error starting audio engine: \(error)")
        }
    }
}
class ResultsObserver2: NSObject, SNResultsObserving {
    /// Notifies the observer when a request generates a prediction.
    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Downcast the result to a classification result.
        guard let result = result as? SNClassificationResult else { return }

        // Get the prediction with the highest confidence.
        guard let classification = result.classifications.first else { return }

        // Get the starting time.
        let timeInSeconds = result.timeRange.start.seconds
        // Convert the time to a human-readable string.
        let formattedTime = String(format: "%.2f", timeInSeconds)
        print("Analysis result for audio at time: \(formattedTime)")

        // Convert the confidence to a percentage string.
        let percent = classification.confidence * 100.0
        let percentString = String(format: "%.2f%%", percent)
        // Print the classification's name (label) with its confidence.
        print("\(classification.identifier): \(percentString) confidence.\n")
    }

    /// Notifies the observer when a request generates an error.
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The analysis failed: \(error.localizedDescription)")
    }

    /// Notifies the observer when a request is complete.
    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully!")
    }
}
I faced this same problem. Clicking on ParticleEmitter crashes Reality Composer Pro, the Diorama sample crashes, and the content from WWDC23 session 10083 is not workable on this platform. Please take a look and plan to address it as soon as possible. Platform:
MacBook Pro (2020) -- 2 GHz Quad-Core Intel Core i5 -- Intel Iris Plus Graphics 1536 MB
macOS: Sonoma 14.0 Beta (23A5286i). Before this I tried macOS Ventura 13.4 and 13.4.1 as well and faced the same problem.
launchd.log
realityComposerPro.crashReport