I'm seeing unexpected values when examining the output of a sound classification test. Whilst the startTime of each observation appears to be accurate, the duration is always equal to the value I set for windowDuration.
I'm guessing I'm misunderstanding the purpose of duration in the classification results. The link here says:
The time range’s CMTime values are the number of audio frames at the analyzer’s sample rate. Use these time indices to determine where, in time, the result corresponds to the original audio.
My understanding of this statement is that timeRange should give me both the startTime and the duration of the detection event. For example, if I attempt to detect a crowd sound and that sound lasts for 1.8 seconds, then I should see 1.8 seconds in the duration.
Below is some code showing what I'm seeing.
The request is initialised with a windowDuration of 1 second. If I change this to any other value, that value is reported back as the duration of the event, even if the event only lasts half a second.
Any help, whether it points out a code issue or helps me understand the results better, would be appreciated. Thanks.
let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
request.overlapFactor = 0.8
request.windowDuration = CMTimeMakeWithSeconds(1.0, preferredTimescale: 600) // 1 second window
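To make the numbers concrete, here is a rough sketch of how I assume the analysis windows are laid out for a 1 second window and an overlapFactor of 0.8. The helper windowRanges is hypothetical (it is not part of SoundAnalysis), and the spacing rule windowDuration * (1 - overlapFactor) is my assumption about how the overlap is applied, not something I have confirmed.

```swift
import Foundation

// Hypothetical helper, not part of SoundAnalysis.
// ASSUMPTION: consecutive windows start windowDuration * (1 - overlapFactor) apart,
// and every window has the same fixed length.
func windowRanges(windowDuration: Double,
                  overlapFactor: Double,
                  count: Int) -> [(start: Double, duration: Double)] {
    let hop = windowDuration * (1.0 - overlapFactor)
    return (0..<count).map { index in
        (start: Double(index) * hop, duration: windowDuration)
    }
}

// Three consecutive windows for a 1 second window with 80% overlap.
let ranges = windowRanges(windowDuration: 1.0, overlapFactor: 0.8, count: 3)
for range in ranges {
    print("start: \(range.start) s, duration: \(range.duration) s")
}
```

If that assumption holds, each result would carry the window's length as its duration, which would match what I'm observing.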
My code to get the values out of the SNResult:
func request(_ request: SNRequest, didProduce result: SNResult) {
    // Only handle classification results whose top classification is a sound of interest.
    guard let analysisResult = result as? SNClassificationResult,
          let topClassification = analysisResult.classifications.first,
          soundsToDetect.contains(topClassification.identifier) else { return }

    // Start and duration of the result's time range, in seconds.
    let startTime = analysisResult.timeRange.start.seconds
    let duration = analysisResult.timeRange.duration.seconds

    let detectedSound = ClassificationObject(id: UUID(),
                                             name: topClassification.identifier,
                                             startTime: startTime,
                                             duration: duration,
                                             confidence: topClassification.confidence)
    self.detectedSounds.append(detectedSound)
}
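If the duration really is just the window length, one workaround I'm considering is to merge consecutive window-level detections of the same sound into a single event and use the merged span as a coarse estimate of how long the sound lasted. Below is a minimal sketch of that idea. WindowDetection, SoundEvent, mergeDetections and the tolerance parameter are hypothetical names of my own, not part of SoundAnalysis, and the sketch assumes the detections arrive sorted by start time.

```swift
import Foundation

// Hypothetical stand-in for one window-level result (not a SoundAnalysis type).
struct WindowDetection {
    let name: String
    let startTime: Double   // seconds
    let duration: Double    // seconds; equals windowDuration in what I observe
}

// Hypothetical merged event covering a run of overlapping or touching windows.
struct SoundEvent {
    let name: String
    var startTime: Double
    var endTime: Double
    var duration: Double { endTime - startTime }
}

// Merges detections of the same sound whose windows overlap or touch
// (within `tolerance` seconds). ASSUMPTION: `detections` is sorted by startTime.
// Only the most recent event is compared, so interleaved detections of
// different sounds will start a new event each time the name changes.
func mergeDetections(_ detections: [WindowDetection],
                     tolerance: Double = 0.001) -> [SoundEvent] {
    var events: [SoundEvent] = []
    for detection in detections {
        let end = detection.startTime + detection.duration
        if let last = events.last,
           last.name == detection.name,
           detection.startTime <= last.endTime + tolerance {
            // Extend the current event to cover this window.
            events[events.count - 1].endTime = max(last.endTime, end)
        } else {
            // Gap or different sound: start a new event.
            events.append(SoundEvent(name: detection.name,
                                     startTime: detection.startTime,
                                     endTime: end))
        }
    }
    return events
}

// Three overlapping "crowd" windows followed by an isolated one.
let windows = [
    WindowDetection(name: "crowd", startTime: 0.0, duration: 1.0),
    WindowDetection(name: "crowd", startTime: 0.2, duration: 1.0),
    WindowDetection(name: "crowd", startTime: 0.4, duration: 1.0),
    WindowDetection(name: "crowd", startTime: 3.0, duration: 1.0),
]
let events = mergeDetections(windows)
for event in events {
    print("\(event.name): start \(event.startTime) s, duration \(event.duration) s")
}
```

The merged span can overestimate the true length by up to one window, since each window is counted in full, so I would treat it as an upper bound rather than an exact duration.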