How to Fine-Tune the SNSoundClassifier for Custom Sound Classification in iOS?

Hi Apple Developer Community,

I’m exploring ways to fine-tune the SNSoundClassifier to allow users of my iOS app to personalize the model by adding custom sounds or adjusting predictions. While Apple’s WWDC session on sound classification explains how to train from scratch, I’m specifically interested in using SNSoundClassifier as the base model and building/fine-tuning on top of it.

Here are a few questions I have:

1. Fine-Tuning on SNSoundClassifier:

  • Is there a way to fine-tune this model programmatically through APIs? The manual approach on macOS, as shown in this documentation, is clear, but how can it be done dynamically, either within the app for users or in a cloud backend (AWS/iCloud)?

  • Are there APIs or classes that support such on-device/cloud-based fine-tuning or incremental learning? If not directly, can the classifier’s embeddings be used to train a lightweight custom layer?

  • Training is likely to be computationally intensive and too hard on battery to run on device, so doing it in the cloud may be the right approach, but I need the right APIs to get this done. Sample code would help.

2. Recommended Approach for In-App Model Customization:

  • If SNSoundClassifier doesn’t support fine-tuning, would transfer learning on models like MobileNetV2, YAMNet, OpenL3, or FastViT be more suitable?

  • Given these models (SNSoundClassifier, MobileNetV2, YAMNet, OpenL3, FastViT), which one would be best for accuracy and performance/efficiency on iOS? I aim to maintain real-time performance without sacrificing battery life (a sketch of my current real-time setup follows this list). It is also important to understand how well each model's architecture and accuracy are retained after conversion to Core ML.

3. Cost-Effective Backend Setup for Training:

  • Mac EC2 instances on AWS have a 24-hour minimum billing period, which can become expensive for a small number of user requests. Are there better alternatives for deploying and training models on demand, whenever a user uploads files (training data)?

4. TensorFlow vs PyTorch:

  • Between TensorFlow and PyTorch, which framework would you recommend for iOS Core ML integration? TensorFlow Lite offers mobile-optimized models, but I’m also curious about PyTorch’s performance when converted to Core ML.

5. Metrics:

  • The metrics I have in mind while picking a model are: publisher, accuracy, fine-tuning capability, real-time/live use, suitability for iPhone 16, architectural retention after Core ML conversion, reasons for unsuitability, and recommended use case.
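
For reference, the sketch below is roughly how I run the built-in classifier in real time today with SoundAnalysis (microphone-permission handling and UI omitted):

```swift
import AVFoundation
import SoundAnalysis

/// Prints the top label of each classification window.
final class ClassificationPrinter: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("\(top.identifier): \(top.confidence)")
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Sound analysis failed: \(error)")
    }
}

/// Streams microphone audio into the stock sound classifier.
final class LiveSoundClassifier {
    private let engine = AVAudioEngine()
    private let observer = ClassificationPrinter()
    private var analyzer: SNAudioStreamAnalyzer?

    func start() throws {
        let format = engine.inputNode.outputFormat(forBus: 0)
        let analyzer = SNAudioStreamAnalyzer(format: format)
        self.analyzer = analyzer

        // The built-in SNSoundClassifier that ships with SoundAnalysis.
        let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
        try analyzer.add(request, withObserver: observer)

        // Feed microphone buffers to the analyzer.
        engine.inputNode.installTap(onBus: 0, bufferSize: 8192, format: format) { buffer, time in
            analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
        }
        try engine.start()
    }
}
```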

Any insights or recommended approaches would be greatly appreciated.

Thanks in advance!

Answered by Frameworks Engineer in 811750022

Thx for the detailed feedback. In fact, the underlying embedding that supports the Create ML sound classifier can be accessed programmatically through the AudioFeaturePrint API.

You can compose a pipeline by connecting this embedding with a logistic regression classifier. From there, you can do either in-app training from scratch using .fitted() or incremental training using .update().
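
Roughly, such a composition might look like the sketch below. The AudioFeaturePrint initializer, the <Float, String> generic parameters, the AVAudioPCMBuffer feature type, and the exact update(_:with:) signature shown here are illustrative assumptions; check the CreateMLComponents documentation for the shipping API.

```swift
import AVFoundation
import CreateMLComponents

/// Sketch: sound-classifier embedding feeding a logistic regression head.
/// `userRecordings` pairs each training clip with its user-assigned label.
func trainCustomClassifier(
    userRecordings: [AnnotatedFeature<AVAudioPCMBuffer, String>]
) async throws {
    // Assumption: default AudioFeaturePrint() and <Float, String> generics.
    let estimator = AudioFeaturePrint()
        .appending(LogisticRegressionClassifier<Float, String>())

    // In-app training from scratch.
    var model = try await estimator.fitted(to: userRecordings)

    // Incremental training when the user adds more examples later.
    // Assumption: update(_:with:) refines the fitted transformer in place.
    try await estimator.update(&model, with: userRecordings)
}
```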

If you choose the Python route (TF or PyTorch), you will need to use coremltools to convert your model to a Core ML supported format. Once converted, Core ML is agnostic to where the model source came from, and it leverages all available compute units on device to deliver the best performance. If you see any issues with performance, feel free to file feedback or post on the forum here.
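
For example, compiling and loading a converted model on device while opting into all compute units looks roughly like this (the model file name is a placeholder):

```swift
import CoreML

/// Loads a model converted with coremltools and bundled with the app,
/// letting Core ML schedule work across CPU, GPU, and the Neural Engine.
func loadConvertedModel() throws -> MLModel {
    // Xcode compiles a bundled .mlmodel/.mlpackage into an .mlmodelc resource.
    guard let modelURL = Bundle.main.url(forResource: "CustomSoundClassifier",
                                         withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .all   // CPU + GPU + Neural Engine
    return try MLModel(contentsOf: modelURL, configuration: configuration)
}
```

A model loaded this way can also be handed to SoundAnalysis via SNClassifySoundRequest(mlModel:) so the same streaming setup keeps working with a custom classifier.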

Thank you for the insights on fine-tuning SNSoundClassifier with AudioFeaturePrint and logistic regression.

However, I’m still unclear on how to effectively integrate embeddings from SNSoundClassifier into this pipeline, given that they aren’t directly accessible.

Are there specific steps or methodologies to consider for augmenting the base model with user-supplied audio data, and how can I ensure the classifier accurately reflects custom sound classes?

What specific pipeline do you recommend? A base model seems to be necessary when fine-tuning with Create ML. If SNSoundClassifier can be used, then how? And if it cannot be used as the base model, then it will have to be either a TF or a PyTorch model; which one would you recommend?
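
Concretely, my plan for the user-supplied data is something like the sketch below: read each uploaded file into a buffer and pair it with the label the user chose, so the examples can feed the pipeline from your answer. The AnnotatedFeature pairing is my guess at the right shape, and the helper name is just illustrative.

```swift
import AVFoundation
import CreateMLComponents

/// Hypothetical helper: turn user-uploaded, labeled audio files into
/// training examples for the embedding + logistic regression pipeline.
func makeTrainingExamples(
    from labeledFiles: [(url: URL, label: String)]
) throws -> [AnnotatedFeature<AVAudioPCMBuffer, String>] {
    try labeledFiles.map { item in
        let file = try AVAudioFile(forReading: item.url)
        guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                            frameCapacity: AVAudioFrameCount(file.length)) else {
            throw CocoaError(.fileReadCorruptFile)
        }
        try file.read(into: buffer)
        return AnnotatedFeature(feature: buffer, annotation: item.label)
    }
}
```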

Any additional guidance would be greatly appreciated!
