Hello,
I’m attempting to convert a TensorFlow model to Core ML using the coremltools package, but I’m encountering an error during the conversion process. The traceback points to an issue within the Cast operation in MIL (Model Intermediate Language) when it performs type inference:
AttributeError: 'float' object has no attribute 'astype'
Here is the relevant part of the error traceback:
File "~/.pyenv/versions/3.10.12/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py", line 896, in get_cast_value
return input_var.val.astype(dtype=type_map[dtype_val])
I’m trying to convert a model from the yamnet-tensorflow2 repository, and the error occurs when Core ML tries to cast a float value during the conversion of certain operations. I’m currently using Python 3.10 and coremltools version 6.0.1, with TensorFlow 2.x.
Has anyone encountered a similar issue or can offer suggestions on how to resolve this?
I’ve also considered that this might be related to mismatches in the model’s data types, but I’m not sure how to proceed.
Platform and package versions:
coremltools 6.1
tensorflow 2.10.0
tensorflow-estimator 2.10.0
tensorflow-hub 0.16.1
tensorflow-io-gcs-filesystem 0.37.1
Python 3.10.12
pip 24.3.1 from ~/.pyenv/versions/3.10.12/lib/python3.10/site-packages/pip (python 3.10)
Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64
Any help or pointers would be greatly appreciated!
Hi Apple Developer Community,
I’m exploring ways to fine-tune the SNSoundClassifier to allow users of my iOS app to personalize the model by adding custom sounds or adjusting predictions. While Apple’s WWDC session on sound classification explains how to train from scratch, I’m specifically interested in using SNSoundClassifier as the base model and building/fine-tuning on top of it.
Here are a few questions I have:
1. Fine-Tuning on SNSoundClassifier:
Is there a way to fine-tune this model programmatically through APIs? The manual approach on macOS, as shown in this documentation, is clear, but how can it be done dynamically, either within the app for users or in a cloud backend (AWS/iCloud)?
Are there APIs or classes that support such on-device/cloud-based fine-tuning or incremental learning? If not directly, can the classifier’s embeddings be used to train a lightweight custom layer?
Training is likely computationally intensive and would drain too much battery on the device. Doing it in the cloud may be the right approach, but I need the right APIs to get this done. Sample code would help.
2. Recommended Approach for In-App Model Customization:
If SNSoundClassifier doesn’t support fine-tuning, would transfer learning on models like MobileNetV2, YAMNet, OpenL3, or FastViT be more suitable?
Given these models (SNSoundClassifier, MobileNetV2, YAMNet, OpenL3, FastViT), which one would be best for accuracy and performance/efficiency on iOS? I aim to maintain real-time performance without sacrificing battery life. Architecture retention and accuracy after conversion to a Core ML model are also important.
3. Cost-Effective Backend Setup for Training:
Mac EC2 instances on AWS have a 24-hour minimum billing period, which can become expensive for a limited number of user requests. Are there better alternatives for deploying and training models on request when a user uploads files (training data)?
4. TensorFlow vs PyTorch:
Between TensorFlow and PyTorch, which framework would you recommend for iOS Core ML integration? TensorFlow Lite offers mobile-optimized models, but I’m also curious about PyTorch’s performance when converted to Core ML.
5. Metrics:
The metrics I have in mind while picking the model are: Publisher, Accuracy, Fine-Tuning capability, Real-Time/Live use, Suitability for iPhone 16, Architectural retention after Core ML conversion, Reasons for unsuitability, Recommended use case.
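Point 1 asks whether a classifier's embeddings could feed a lightweight custom layer. Assuming you have some model that produces a fixed-length embedding per audio clip (whether SNSoundClassifier exposes one publicly is exactly what is being asked, so that is not assumed here), one very lightweight option is a nearest-centroid classifier: average the embeddings of each user-provided sound and assign a new clip to the closest average. This is a swapped-in technique for illustration, not an Apple API, and the `CentroidClassifier` name is hypothetical.

```swift
/// Nearest-centroid classifier over fixed-length embedding vectors.
/// Hypothetical helper (not an Apple API): it keeps a running sum and count
/// per label, so a user's new sound can be added without retraining the rest.
struct CentroidClassifier {
    private var sums: [String: [Double]] = [:]
    private var counts: [String: Int] = [:]

    /// Adds one labelled example (all embeddings must have the same length).
    mutating func add(_ embedding: [Double], label: String) {
        if var sum = sums[label] {
            precondition(sum.count == embedding.count, "embedding length mismatch")
            for i in sum.indices { sum[i] += embedding[i] }
            sums[label] = sum
        } else {
            sums[label] = embedding
        }
        counts[label, default: 0] += 1
    }

    /// Returns the label whose mean embedding is nearest by squared Euclidean
    /// distance, or nil if nothing has been added yet.
    func predict(_ embedding: [Double]) -> String? {
        var best: (label: String, distance: Double)?
        for (label, sum) in sums {
            precondition(sum.count == embedding.count, "embedding length mismatch")
            let n = Double(counts[label] ?? 1)
            var distance = 0.0
            for i in sum.indices {
                let diff = sum[i] / n - embedding[i]
                distance += diff * diff
            }
            if best == nil || distance < best!.distance {
                best = (label: label, distance: distance)
            }
        }
        return best?.label
    }
}
```

Because adding a sound only updates one running sum, a head like this is cheap enough to update on-device; a trained layer (for example, logistic regression on the same embeddings) would usually be more accurate but needs an actual training step, which is where the on-device vs. cloud question comes back in.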
Any insights or recommended approaches would be greatly appreciated.
Thanks in advance!
Hi everyone,
I'm working on integrating object recognition from live video feeds into my existing app by following Apple's sample code. My original project captures video and records it successfully. However, after integrating the Vision-based object detection components (VNCoreMLRequest), no detections occur, and the callback for the request is never triggered.
To debug this issue, I’ve added the following functionality:
Set up AVCaptureVideoDataOutput for processing video frames.
Created a VNCoreMLRequest using my Core ML model.
The video recording functionality works as expected, but no object detection happens. I’d like to know:
How can I debug this further? Which debug points or logs could help identify where the issue lies?
Have I missed any key configurations? Below is a diff of the modifications I’ve made to my project for the new feature.
Diff of Changes:
(Attach the diff provided above)
Specific Observations:
The captureOutput method is invoked correctly, but there is no output or error from the Vision request callback.
Print statements in my setup function setForVideoClassify() show that the setup executes without errors.
Questions:
Could this be due to issues with my Core ML model compatibility or configuration?
Is the VNCoreMLRequest setup incorrect, or do I need to ensure specific image formats for processing?
Platform:
Xcode 16.1, iOS 18.1, Swift 5, SwiftUI, iPhone 11,
Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64
Any guidance or advice is appreciated! Thanks in advance.
Hello,
I am exploring real-time object detection, and replacing or overlaying detected objects with another shape, on live video streams for an iOS app using the Core ML and Vision frameworks. My target is high-speed, real-time detection without noticeable latency, similar to what page-fault handling and associative caching achieve in an operating system, but applied to video processing.
Given that this requires consistent, real-time model inference, I’m curious about how well the Neural Engine or GPU can handle such tasks on A-series chips in iPhones versus M-series chips (specifically M1 Pro and possibly M4) in MacBooks. Here are a few specific points I’d like insight on:
Hardware Suitability: How feasible is it to perform real-time object detection with Core ML on the Neural Engine (i.e., can it maintain low latency)? Would the M-series chips (e.g., M1 Pro or newer) offer a tangible benefit for this type of task compared to the A-series chips in mobile devices? Which A-series and M-series chips would be the minimum feasible recommendation for such a task?
Performance Expectations: For continuous, live video object detection, what would be the expected frame rate or latency using an optimized Core ML model? Has anyone benchmarked such applications, and is the M-series required to achieve smooth, real-time processing?
Differences Across Apple Hardware: How does performance scale between the A-series Neural Engine and M-series GPU and Neural Engine? Is the M-series vastly superior for real-time Core ML tasks like object detection on live video feeds?
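As a rough yardstick for "real-time", the per-frame budget is 1000 / fps milliseconds: about 33 ms at 30 fps and about 17 ms at 60 fps, and capture, inference, and overlay rendering together have to fit inside it unless frames are pipelined or dropped. The helpers below only encode that arithmetic; `frameBudgetMs` and `fitsRealTime` are hypothetical names, and the latency you pass in stands for a number you would measure on device, not a benchmark.

```swift
/// Per-frame time budget in milliseconds for a target frame rate.
func frameBudgetMs(fps: Double) -> Double {
    precondition(fps > 0, "fps must be positive")
    return 1000.0 / fps
}

/// True if a measured end-to-end latency per frame (capture + inference +
/// overlay) fits inside the budget for `fps`. Nothing here measures anything.
func fitsRealTime(latencyMs: Double, fps: Double) -> Bool {
    latencyMs <= frameBudgetMs(fps: fps)
}
```

For example, a pipeline that takes 20 ms per frame fits a 30 fps budget but not a 60 fps one. This ignores pipelining (overlapping inference on one frame with capture of the next), which can raise throughput without reducing per-frame latency.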
If anyone has attempted live object detection on these chips, any insights on real-time performance, limitations, or optimizations would be highly appreciated.
Please refer to: Apple APIs
Thank you in advance for your help!
Hi everyone,
I’m encountering a strange issue when trying to archive my iOS app for App Store distribution. The project builds and runs fine on “Any iOS Device (arm64)”, but when I choose Product → Archive, I get multiple errors related to the preview sections in my SwiftUI view files. The app uses the camera for photo and video capture.
Errors:
• Cannot find 'PreviewCameraModel' in scope
• Cannot infer contextual base in reference to member 'video'
• Cannot infer contextual base in reference to member 'classify'
These errors only appear in code sections inside the #Preview blocks in SwiftUI files. Additionally:
When I click on an error in the Issue Navigator, the file shows the error momentarily but it disappears after less than a second.
The total error count decreases temporarily, but then it returns to the original number when clicking on other errors.
Build and Run works fine without any issues on devices and simulators, but these errors block the archiving process.
Workaround:
For now, I’ve resolved the issue by using #if DEBUG to exclude the preview code from release builds, but I’d prefer a cleaner solution if one exists.
System Details:
Xcode: 16.0
iOS Deployment Target: 16+
Swift: 5
Architecture: arm64
Has anyone encountered this issue or found a better way to handle SwiftUI preview code when archiving? Any advice on fixing this or insights into why the errors behave inconsistently during the archiving process would be appreciated.
Thanks in advance!
Hi everyone,
I’m experiencing an issue where audio interruptions (e.g., phone calls) are not being intercepted while running sound classification in an app that uses the AVAudioSession. Classification works fine, but interruptions aren’t handled, even though I’ve followed Apple’s guidelines on handling audio interruptions [1_Document].
The classification was initially based on [2_Classifier], where it worked perfectly. However, when I adopted classification in a more camera-focused app based on [3_Cam], the interruption handling stopped working. Classification itself works with [3_Cam], but audio interruptions are not triggered.
The listener is invoked before starting sound analysis as suggested in [2_Classifier].
startListeningForAudioSessionInterruptions()
try startAnalyzing([(request, observer)])
FYI, one change I made for classification is the following; it works fine in all cases:
// try audioSession.setCategory(.record, mode: .default)
try audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker, .allowBluetooth])
I suspect the issue might be related to the AVAudioSession configuration or how the app handles recording and playback together. Is there anything else I should check related to AVAudioSession? Are there additional APIs I could use to pre-check or better handle audio interruptions?
Any suggestions or guidance would be greatly appreciated!
Platform: Swift 5, Xcode 16, iOS 18.
References:
[1_Document] Document
[2_Classifier] Classifier
[3_Cam] Cam
Best Regards
Question:
When implementing simultaneous video capture and audio processing in an iOS app, does the order of starting these components matter, or can they be initiated in any sequence?
I have an actor responsible for initiating video capture using the setCaptureMode function. In this actor, I also call startAudioEngine to begin the audio engine and register a resultObserver. While the audio engine starts successfully, I notice that the resultObserver is not invoked when startAudioEngine is called synchronously. However, it works correctly when I wrap the call in a Task.
Could you please explain why the synchronous call to startAudioEngine might be preventing the resultObserver from being invoked? What would be the best practice for ensuring both components work together? Additionally, if I wanted to avoid using Task, what approach would be required? Lastly, does startAudioEngine take effect from the start of the video capture (00:00)?
Platform: Xcode 16, Swift 6, iOS 18
References:
Classifying Sounds in an Audio Stream – In my case, the analyzeAudio() method is not invoked.
Setting Up a Capture Session – Here, the focus is on video capture.
Classifying Sounds in an Audio File
Code snippet (for further details; setVideoCaptureMode() surfaces the problem):
// ensures all operations happen off of the `@MainActor`.
actor CaptureService {
...
nonisolated private let resultsObserver1 = ResultsObserver1()
...
private func setUpSession() throws { .. }
...
func setVideoCaptureMode() throws {
captureSession.beginConfiguration()
defer { captureSession.commitConfiguration() }
/* -- Works fine (analyzeAudio is printed)
Task {
self.resultsObserver1.startAudioEngine()
}
*/
self.resultsObserver1.startAudioEngine() // Does not work - analyzeAudio not printed
captureSession.sessionPreset = .high
try addOutput(movieCapture.output)
if isHDRVideoEnabled {
setHDRVideoEnabled(true)
}
updateCaptureCapabilities()
}
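One possible explanation, offered as a hypothesis rather than a diagnosis: a `Task { }` created inside an actor method inherits the actor's isolation, so its body cannot start until `setVideoCaptureMode()` has returned, which is after the deferred `commitConfiguration()`. The synchronous call instead starts the audio engine while the capture session configuration is still open. The pure-Swift sketch below (no AVFoundation; `OrderingDemo` and its log strings are hypothetical) records that ordering difference.

```swift
/// Hypothetical stand-in for CaptureService that only records event order.
actor OrderingDemo {
    private(set) var log: [String] = []
    private var pending: Task<Void, Never>?

    func setVideoCaptureMode(deferAudioStart: Bool) {
        log.append("beginConfiguration")
        defer { log.append("commitConfiguration") }

        if deferAudioStart {
            // The Task inherits this actor's isolation (it captures self), so
            // its body cannot run until this synchronous method has returned,
            // i.e. after the deferred commit above.
            pending = Task { self.log.append("startAudioEngine") }
        } else {
            // Runs immediately, while the configuration is still open.
            log.append("startAudioEngine")
        }
    }

    /// Waits for the deferred work (if any) to finish.
    func waitForPendingWork() async {
        _ = await pending?.value
    }
}
```

If ordering turns out to be the cause, calling startAudioEngine() explicitly after setVideoCaptureMode() returns would reproduce the Task's order without relying on Task scheduling.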
Question:
I'm working on a project in Xcode 16.1, using Swift 6 with iOS 18. My code works fine in Swift 5, but I'm running into concurrency issues when upgrading to Swift 6, particularly with the @preconcurrency attribute on the AVFoundation import.
Here is the relevant part of my code:
import SwiftUI
@preconcurrency import AVFoundation
struct OverlayButtonBar: View {
...
let audioTracks = await loadTracks(asset: asset, mediaType: .audio)
...
// Tracks are extracted before crossing concurrency boundaries
private func loadTracks(asset: AVAsset, mediaType: AVMediaType) async -> [AVAssetTrack] {
do {
return try await asset.load(.tracks).filter { $0.mediaType == mediaType }
} catch {
print("Error loading tracks: \(error)")
return []
}
}
}
Issues:
When using @preconcurrency, I get the warning:
@preconcurrency attribute on module AVFoundation has no effect. Xcode's suggested fix is: Remove @preconcurrency.
But if I remove @preconcurrency, I get both a warning and an error:
Warning: Add '@preconcurrency' to treat 'Sendable'-related errors from module 'AVFoundation' as warnings.
Error: Non-sendable type [AVAssetTrack] returned by implicitly asynchronous call to nonisolated function cannot cross actor boundary. (Class AVAssetTrack does not conform to the Sendable protocol (AVFoundation.AVAssetTrack)). This error occurs when I directly access the non-Sendable AVAssetTrack values in an async context:
let audioTracks = await loadTracks(asset: asset, mediaType: .audio)
How can I resolve this issue while staying compliant with Swift 6 concurrency rules? Is there a recommended approach to handling non-Sendable types like AVAssetTrack in concurrency contexts?
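A common pattern under Swift 6 rules is to keep non-Sendable objects inside one isolation domain and pass only Sendable values across it. The sketch below shows that pattern in plain Swift: `LegacyTrack` is a hypothetical non-Sendable class standing in for AVAssetTrack, and `TrackSummary` and `TrackLoader` are hypothetical names, not AVFoundation API.

```swift
/// Hypothetical stand-in for a non-Sendable framework class such as AVAssetTrack.
final class LegacyTrack {
    let trackID: Int32
    let mediaType: String
    init(trackID: Int32, mediaType: String) {
        self.trackID = trackID
        self.mediaType = mediaType
    }
}

/// Sendable value type carrying only the data the caller needs.
struct TrackSummary: Sendable, Equatable {
    let trackID: Int32
    let mediaType: String
}

actor TrackLoader {
    // Stand-in for loading tracks from an asset; the non-Sendable objects
    // are created and used only inside this actor.
    private func loadTracks() -> [LegacyTrack] {
        [LegacyTrack(trackID: 1, mediaType: "video"),
         LegacyTrack(trackID: 2, mediaType: "audio")]
    }

    /// Only Sendable summaries cross the actor boundary.
    func summaries(mediaType: String) -> [TrackSummary] {
        loadTracks()
            .filter { $0.mediaType == mediaType }
            .map { TrackSummary(trackID: $0.trackID, mediaType: $0.mediaType) }
    }
}
```

With AVFoundation, the equivalent would be to load the track properties you need (for example via `AVAsset.loadTracks(withMediaType:)` and the track's async `load(_:)`) within the same isolation domain and return only value types; whether that fits depends on what `audioTracks` is used for afterwards.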
Appreciate any guidance on making this work in Swift 6, especially considering it worked fine in Swift 5.
Thanks in advance!
Hello,
I’m encountering an issue with the PHPhotoLibrary API in Swift 6 and iOS 18. The code I’m using worked fine in Swift 5, but I’m now seeing the following error:
Sending main actor-isolated value of type '() -> Void' with later accesses to nonisolated context risks causing data races
Here is the problematic code:
Button("Save to Camera Roll") {
saveToCameraRoll()
}
...
private func saveToCameraRoll() {
guard let overlayFileURL = mediaManager.getOverlayURL() else {
return
}
Task {
do {
let status = await PHPhotoLibrary.requestAuthorization(for: .addOnly)
guard status == .authorized else {
return
}
try await PHPhotoLibrary.shared().performChanges({
if let creationRequest = PHAssetCreationRequest.creationRequestForAssetFromVideo(atFileURL: overlayFileURL) {
creationRequest.creationDate = Date()
}
})
await MainActor.run {
saveSuccessMessage = "Video saved to Camera Roll successfully"
}
} catch {
print("Error saving video to Camera Roll: \(error.localizedDescription)")
}
}
}
Problem Description:
The error message suggests that a main actor-isolated value of type () -> Void is being accessed in a nonisolated context, potentially leading to data races.
This issue arises specifically at the call to PHPhotoLibrary.shared().performChanges.
Questions:
How can I address the data race issues related to main actor isolation when using PHPhotoLibrary.shared().performChanges?
What changes, if any, are required to adapt this code for Swift 6 and iOS 18 while maintaining thread safety and actor isolation?
Are there any recommended practices for managing main actor-isolated values in asynchronous operations to avoid data races?
I’d appreciate any pointers or suggestions for resolving this issue effectively.
Thank you!
Hello,
Which API can be used to programmatically fetch the ID of the user who installed or paid for the app?
This is useful if an app has to create a path hierarchy for the different users who have installed or paid for the app, for instance /AppName//user_files. How can I get the uniqueUserID, and how can I determine which user these files belong to based on that uniqueUserID?
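Once the app has some stable per-user identifier (where that identifier comes from is the open question in this post, so the `userID` argument below is a hypothetical placeholder), building the per-user directory is plain Foundation URL composition. A minimal sketch, with `userFilesDirectory` as a hypothetical helper name:

```swift
import Foundation

/// Builds <base>/<appName>/<userID>/user_files as a directory URL.
/// `userID` is a hypothetical identifier supplied by the app's own account
/// system; this helper does not obtain one.
func userFilesDirectory(base: URL, appName: String, userID: String) -> URL {
    base
        .appendingPathComponent(appName, isDirectory: true)
        .appendingPathComponent(userID, isDirectory: true)
        .appendingPathComponent("user_files", isDirectory: true)
}
```

In an app, the base URL would typically come from `FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask)`, and the directory would be created with `FileManager.default.createDirectory(at:withIntermediateDirectories:attributes:)`.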
The app uses Swift and SwiftUI.
Thanks.