Hi, I found that continuously predicting with the same Core ML model at 120 FPS is faster than at 60 FPS.
On a MacBook Pro (M2) with ProMotion turned on, running Core ML model prediction on a 120 FPS video gives an average prediction time of 7.46 ms.
But with ProMotion turned off and the refresh rate set to 60 Hz, running the same Core ML model prediction on a 60 FPS video gives an average prediction time of 10.91 ms.
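For reference, the kind of average quoted above can be measured with a minimal sketch like the one below (an illustration only; model and input are placeholders for whatever the app actually uses):
import CoreML
import Foundation

// Hypothetical sketch: average the wall-clock time of repeated predictions.
// `model` and `input` are placeholders, not the poster's actual model.
func averagePredictionMilliseconds(model: MLModel,
                                   input: MLFeatureProvider,
                                   iterations: Int = 100) throws -> Double {
    var total = 0.0
    for _ in 0..<iterations {
        let start = CFAbsoluteTimeGetCurrent()
        _ = try model.prediction(from: input)
        total += CFAbsoluteTimeGetCurrent() - start
    }
    return total / Double(iterations) * 1000.0
}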
What could be the technical explanation for these results? Is there any documentation or technical literature that addresses this behavior?
I have followed https://apple.github.io/coremltools/docs-guides/source/installing-coremltools.html but it failed.
It looks like the doc is outdated.
In our app we use Core ML, but ever since macOS 15.x was released we have started to get a large number of crashes like this:
Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8
CrashReporter Key: B914246B-1291-4D44-984D-EDF84B52310E
Hardware Model: Mac14,12
Process: <REMOVED> [1509]
Path: /Applications/<REMOVED>
Identifier: com.<REMOVED>
Version: <REMOVED>
Code Type: arm64
Parent Process: launchd [1]
Date/Time: 2024-11-13T13:23:06.999Z
Launch Time: 2024-11-13T13:22:19Z
OS Version: Mac OS X 15.1.0 (24B83)
Report Version: 104
Exception Type: SIGABRT
Exception Codes: #0 at 0x189042600
Crashed Thread: 36
Thread 36 Crashed:
0 libsystem_kernel.dylib 0x0000000189042600 __pthread_kill + 8
1 libsystem_c.dylib 0x0000000188f87908 abort + 124
2 libsystem_c.dylib 0x0000000188f86c1c __assert_rtn + 280
3 Metal 0x0000000193fdd870 MTLReportFailure.cold.1 + 44
4 Metal 0x0000000193fb9198 MTLReportFailure + 444
5 MetalPerformanceShadersGraph 0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296
6 Espresso 0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932
.
.
.
43 CoreML 0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120
44 CoreML 0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176
45 <REMOVED> 0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>)
No similar crashes on macOS 12-14!
MetalPerformanceShadersGraph.log
Any clue what is causing this?
Thanks! :)
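Not a root-cause fix, but since the crash originates in the MPSGraph/Espresso (GPU) path, one hedged workaround to try while debugging is to load the model with compute units that keep inference off the GPU:
import CoreML

// Hedged workaround sketch (an assumption, not a confirmed fix): restrict compute units
// so the MPSGraph GPU path is not used. `modelURL` is a placeholder.
func loadModelAvoidingGPU(at modelURL: URL) throws -> MLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine // or .cpuOnly
    return try MLModel(contentsOf: modelURL, configuration: configuration)
}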
We are experiencing a major issue with the native .version1 of the SoundAnalysis framework in iOS 18, which has left all of our users without recordings. Our core feature relies heavily on sound analysis in the background, and it worked flawlessly in prior iOS versions. However, on iOS 18, sound analysis stops working in the background and triggers a critical warning.
Details of the issue:
We are using SoundAnalysis to analyze background sounds and have enabled the necessary background permissions.
We are using the latest Xcode.
A warning now appears, and sound analysis fails in the background. Below is the warning message we are encountering:
Warning Message:
Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background)
[Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted); code=7 status=-1
Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
CoreML prediction failed with Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 0 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 0 in pipeline, NSUnderlyingError=0x30330e910 {Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 1 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 1 in pipeline, NSUnderlyingError=0x303307840 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}}}
We urgently need guidance or a fix for this, as our application’s main functionality is severely impacted by this background permission error. Please let us know the next steps or if this is a known issue with iOS 18.
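The error text points at GPU work being submitted while the app is in the background. One mitigation sometimes suggested (an assumption here, not an official fix) is to keep the classifier off the GPU, which only applies when running a custom Core ML sound classifier rather than the built-in .version1 identifier:
import CoreML
import SoundAnalysis

// Hedged sketch: only relevant when the analysis uses a custom Core ML model via
// SNClassifySoundRequest(mlModel:); the built-in .version1 classifier does not expose
// compute-unit configuration. The idea is that no GPU command buffers get submitted
// from the background. `compiledModelURL` is a placeholder.
func makeBackgroundSafeRequest(compiledModelURL: URL) throws -> SNClassifySoundRequest {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine
    let mlModel = try MLModel(contentsOf: compiledModelURL, configuration: configuration)
    return try SNClassifySoundRequest(mlModel: mlModel)
}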
Hi everyone,
I'm working on integrating object recognition from live video feeds into my existing app by following Apple's sample code. My original project captures video and records it successfully. However, after integrating the Vision-based object detection components (VNCoreMLRequest), no detections occur, and the callback for the request is never triggered.
To debug this issue, I’ve added the following functionality:
Set up AVCaptureVideoDataOutput for processing video frames.
Created a VNCoreMLRequest using my Core ML model.
The video recording functionality works as expected, but no object detection happens. I’d like to know:
How to debug this further? Which key debug points or logs could help identify where the issue lies?
Have I missed any key configurations? Below is a diff of the modifications I’ve made to my project for the new feature.
Diff of Changes:
(Attach the diff provided above)
Specific Observations:
The captureOutput method is invoked correctly, but there is no output or error from the Vision request callback.
Print statements in my setup function setForVideoClassify() show that the setup executes without errors.
Questions:
Could this be due to issues with my Core ML model compatibility or configuration?
Is the VNCoreMLRequest setup incorrect, or do I need to ensure specific image formats for processing?
Platform:
Xcode 16.1, iOS 18.1, Swift 5, SwiftUI, iPhone 11,
Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64
Any guidance or advice is appreciated! Thanks in advance.
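For reference, a minimal Vision + Core ML frame-processing path is sketched below (an assumption-level sketch rather than a diagnosis; YOLOv3 is just a placeholder model class). In this kind of setup, the request callback only fires if perform(_:) is called on a VNImageRequestHandler for every frame with the correct pixel buffer and orientation:
import Vision
import CoreML
import AVFoundation

// Hypothetical sketch of the detection path; YOLOv3 stands in for your model class.
final class Detector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private lazy var request: VNCoreMLRequest = {
        let model = try! VNCoreMLModel(for: YOLOv3(configuration: MLModelConfiguration()).model)
        let request = VNCoreMLRequest(model: model) { request, error in
            if let error { print("Vision error: \(error)") }
            print("Observations: \(request.results ?? [])")
        }
        request.imageCropAndScaleOption = .scaleFill
        return request
    }()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // The Vision request only runs if perform(_:) is called for each frame.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)
        try? handler.perform([self.request])
    }
}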
Hello,
I am exploring real-time object detection, and its replacement/overlay with another shape, on live video streams for an iOS app using Core ML and Vision frameworks. My target is to achieve high-speed, real-time detection without noticeable latency, similar to what’s possible with PageFault handling and Associative Caching in OS, but applied to video processing.
Given that this requires consistent, real-time model inference, I’m curious about how well the Neural Engine or GPU can handle such tasks on A-series chips in iPhones versus M-series chips (specifically M1 Pro and possibly M4) in MacBooks. Here are a few specific points I’d like insight on:
Hardware Suitability: How feasible is it to perform real-time object detection with Core ML on the Neural Engine (i.e., can it maintain low latency)? Would the M-series chips (e.g., M1 Pro or newer) offer a tangible benefit for this type of task compared to the A-series in mobile devices? Which A- and M-series chips would be the minimum feasible recommendation for such a task?
Performance Expectations: For continuous, live video object detection, what would be the expected frame rate or latency using an optimized Core ML model? Has anyone benchmarked such applications, and is the M-series required to achieve smooth, real-time processing?
Differences Across Apple Hardware: How does performance scale between the A-series Neural Engine and M-series GPU and Neural Engine? Is the M-series vastly superior for real-time Core ML tasks like object detection on live video feeds?
If anyone has attempted live object detection on these chips, any insights on real-time performance, limitations, or optimizations would be highly appreciated.
Please refer: Apple APIs
Thank you in advance for your help!
I have a model that uses a CoreML delegate, and I’m getting the following warning whenever I set the model to nil. My understanding is that CoreML is creating a cache in the app’s storage but is having issues clearing it. As a result, the app’s storage usage increases every time the model is loaded.
This StackOverflow post explains the problem in detail: App Storage Size Increases with CoreML usage
This is a critical issue because the cache will eventually fill up the phone’s storage:
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/22DDB13E-DABA-4195-846F-F884135F37FE/tmp/F38A9824-3944-420C-BD32-78CE598BE22D-10125-00000586EFDFD7D6.mlmodelc/ : sourceURL= (null) : key={"isegment":0,"inputs":{"0_0":{"shape":[256,256,1,3,1]}},"outputs":{"142_0":{"shape":[16,16,1,222,1]},"138_0":{"shape":[16,16,1,111,1]}}} : identifierSource=0 : cacheURLIdentifier=E0CD0F44FB0417936057FC6375770CFDCCC8C698592ED412DDC9C81E96256872_C9D6E5E73302943871DC2C610588FEBFCB1B1D730C63CA5CED15D2CD5A0AC0DA : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=6077141501305 : intermediateBufferHandle=6077142786285 : queueDepth=127 } : state=3 : programHandle=6077141501305 : intermediateBufferHandle=6077142786285 : queueDepth=127 : attr={
ANEFModelDescription = {
ANEFModelInput16KAlignmentArray = (
);
ANEFModelOutput16KAlignmentArray = (
);
ANEFModelProcedures = (
{
ANEFModelInputSymbolIndexArray = (
0
);
ANEFModelOutputSymbolIndexArray = (
0,
1
);
ANEFModelProcedureID = 0;
}
);
kANEFModelInputSymbolsArrayKey = (
"0_0"
);
kANEFModelOutputSymbolsArrayKey = (
"138_0@output",
"142_0@output"
);
kANEFModelProcedureNameToIDMapKey = {
net = 0;
};
};
NetworkStatusList = (
{
LiveInputList = (
{
BatchStride = 393216;
Batches = 1;
Channels = 3;
Depth = 1;
DepthStride = 393216;
Height = 256;
Interleave = 1;
Name = "0_0";
PlaneCount = 3;
PlaneStride = 131072;
RowStride = 512;
Symbol = "0_0";
Type = Float16;
Width = 256;
}
);
LiveOutputList = (
{
BatchStride = 113664;
Batches = 1;
Channels = 111;
Depth = 1;
DepthStride = 113664;
Height = 16;
Interleave = 1;
Name = "138_0@output";
PlaneCount = 111;
PlaneStride = 1024;
RowStride = 64;
Symbol = "138_0@output";
Type = Float16;
Width = 16;
},
{
BatchStride = 227328;
Batches = 1;
Channels = 222;
Depth = 1;
DepthStride = 227328;
Height = 16;
Interleave = 1;
Name = "142_0@output";
PlaneCount = 222;
PlaneStride = 1024;
RowStride = 64;
Symbol = "142_0@output";
Type = Float16;
Width = 16;
}
);
Name = net;
}
);
} : perfStatsMask=0} was not loaded by the client.
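As a stopgap (an assumption, not an official remedy), leftover compiled-model artifacts in the app's tmp directory, like the .mlmodelc shown in the log above, can be cleaned up manually on launch:
import Foundation

// Hedged sketch: removes leftover compiled model artifacts from the app's tmp directory.
// This only reclaims storage for items left behind there; it does not address the
// underlying "was not loaded by the client" unload warning.
func cleanTemporaryModelArtifacts() {
    let tmpURL = FileManager.default.temporaryDirectory
    let items = (try? FileManager.default.contentsOfDirectory(at: tmpURL, includingPropertiesForKeys: nil)) ?? []
    for item in items where item.pathExtension == "mlmodelc" {
        try? FileManager.default.removeItem(at: item)
    }
}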
I use SoundAnalysis to analyze background sounds and have enabled background permissions. It worked well in previous iOS versions, but in the iOS 18 beta a warning appears and sound analysis stops.
Warning List:
Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background)
[Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted); code=7 status=-1
Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
CoreML prediction failed with Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 0 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 0 in pipeline, NSUnderlyingError=0x30330e910 {Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 1 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 1 in pipeline, NSUnderlyingError=0x303307840 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}}}
https://developer.apple.com/machine-learning/models/
Adding the DepthAnythingV2SmallF16.mlpackage to a new project in Xcode 16.1 and invoking the class crashes the app.
Anyone else having the same issue?
I tried Xcode 16.2 beta and it has the same response.
Code
import UIKit
import CoreML
class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        do {
            // Use a default model configuration.
            let defaultConfig = MLModelConfiguration()
            // app crashes here
            let model = try DepthAnythingV2SmallF16(
                configuration: defaultConfig
            )
            _ = model
        } catch {
            // Initialization failed; the crash happens before this is reached.
        }
    }
}
Response
/AppleInternal/Library/BuildRoots/4b66fb3c-7dd0-11ef-b4fb-4a83e32a47e1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:129: failed assertion `Error: unhandled platform for MPSGraph serialization'
I have seen a lot of tutorials on converting torchvision models to Core ML models, but I have not been able to find any tutorials for torchaudio models. Is converting a torchaudio model to a Core ML model even possible? Does anybody have links that show how to do it?
Hello All,
I'm developing a machine learning model for image classification, which requires managing an exceptionally large dataset comprising over 18,000 classes. I've encountered several hurdles while using Create ML, and I would appreciate any insights or advice from those who have faced similar challenges.
Current Issues:
Create ML Failures with Large Datasets:
When using Create ML, the process often fails with errors such as "Failed to create CVPixelBufferPool." This issue appears when handling particularly large volumes of data.
Custom Implementation Struggles:
To bypass some of the limitations of Create ML, I've developed a custom solution leveraging the MLImageClassifier within the CreateML framework in my own SwiftUI macOS app.
Initially I hit errors similar to those I saw in Create ML, but I discovered I could move beyond the "extracting features" stage without crashing by employing a workaround: using a timer to cancel and restart the job every 30 seconds. This method is the only way I've been able to finish the extraction phase, even with large datasets, but it causes many errors in the console if I allow it to run too long.
Lack of Progress Reporting:
Using MLJob<MLImageClassifier>, I've noticed that progress reporting stalls after the feature extraction phase. Although system resources indicate activity, there is no programmatic feedback on what is occurring.
Things I've Tried:
Data Validation: Ensured that all images in the dataset are valid and non-corrupted, which helps prevent unnecessary issues from faulty data.
Custom Implementation with CreateML Framework: Developed a custom solution using the MLImageClassifier within the CreateML framework to gain more control over the training process.
Timer-Based Workaround: Employed a workaround using a timer to cancel and restart the job every 30 seconds to move past the "extracting features" phase, allowing progress even with larger datasets.
Monitoring System Resources: Observed ongoing system resource usage when process feedback stalled, confirming background processing activity despite the lack of progress reporting.
Subset Testing: Successfully created and tested a model on a subset of the data, which validated the approach worked for smaller datasets and could produce a functioning model.
Router Model Concept: Considered training multiple models for different subsets of data and implementing a "router" model to decide which specialized model to utilize based on input characteristics.
What I Need Help With:
Handling Large Datasets:
I'm seeking strategies or best practices for effectively utilizing Create ML with large datasets.
Any guidance on memory management or alternative methodologies would be immensely helpful.
Improving Progress Reporting:
I'm looking for ways to obtain more consistent and programmatic progress updates during the training and testing phases.
I'm working on a Mac with an M1 Pro and 32 GB RAM, so I'm on Apple silicon and fully integrated within the Apple ecosystem. I am very grateful for any advice or experiences you could share to help overcome these challenges.
Thank you!
I've pasted the relevant code below:
func go() {
    if self.trainingSession == nil {
        self.trainingSession = createTrainingSession()
    }
    if self.startTime == nil {
        self.startTime = Date()
    }
    job = try! MLImageClassifier.resume(self.trainingSession)
    job.phase
        .receive(on: RunLoop.main)
        .sink { phase in
            self.phase = phase
        }
        .store(in: &cancellables)
    job.checkpoints
        .receive(on: RunLoop.main)
        .sink { checkpoint in
            self.state = "\(checkpoint)\n\(self.job.progress)"
            self.progress = self.job.progress.fractionCompleted + 0.2
            self.updateTimeEstimates()
        }
        .store(in: &cancellables)
    job.result
        .receive(on: DispatchQueue.main)
        .sink(receiveCompletion: { completion in
            switch completion {
            case .failure(let error):
                print("Training Failed: \(error.localizedDescription)")
            case .finished:
                print("🎉🎉🎉🎉 TRAINING SESSION FINISHED!!!!")
                self.trainingFinished = true
            }
        }, receiveValue: { classifier in
            Task {
                await self.saveModel(classifier)
            }
        })
        .store(in: &cancellables)
}
private func createTrainingSession() -> MLTrainingSession<MLImageClassifier> {
    do {
        print("Initializing training Data...")
        let trainingData: MLImageClassifier.DataSource = .labeledDirectories(at: trainingDataURL)
        let modelParameters = MLImageClassifier.ModelParameters(
            validation: .split(strategy: .automatic),
            augmentation: self.augmentations,
            algorithm: .transferLearning(
                featureExtractor: .scenePrint(revision: 2),
                classifier: .logisticRegressor
            )
        )
        let sessionParameters = MLTrainingSessionParameters(
            sessionDirectory: self.sessionDirectoryURL,
            reportInterval: 1,
            checkpointInterval: 100,
            iterations: self.numberOfIterations
        )
        print("Initializing training session...")
        let trainingSession: MLTrainingSession<MLImageClassifier>
        if FileManager.default.fileExists(atPath: self.sessionDirectoryURL.path) && isSessionCreated(atPath: self.sessionDirectoryURL.path()) {
            do {
                trainingSession = try MLImageClassifier.restoreTrainingSession(sessionParameters: sessionParameters)
            } catch {
                print("error resuming, exiting.... \(error.localizedDescription)")
                fatalError()
            }
        } else {
            trainingSession = try MLImageClassifier.makeTrainingSession(
                trainingData: trainingData,
                parameters: modelParameters,
                sessionParameters: sessionParameters
            )
        }
        return trainingSession
    } catch {
        print("Failed to initialize training session: \(error.localizedDescription)")
        fatalError()
    }
}
Hi, while trying to diagnose why some of my Core ML models run slower with compute units set to .CPU_AND_GPU than with .CPU_ONLY, I've been attempting to create Core ML model performance reports in Xcode to identify the operations that are not compatible with the GPU. However, when I select an iPhone as the connected device with a compute unit of 'All', 'CPU and GPU', or 'CPU and Neural Engine', Xcode displays one of the following two error messages:
“There was an error creating the performance report. The performance report has crashed on device”
"There was an error creating the performance report. Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model."
The performance reports are successfully generated when selecting the connected device as iPhone with compute unit 'CPU only' or Mac with any combination of compute units.
Some of the models I have found the issue to occur with are stateful, some are not. I have tried to replicate the issue with some example models from the Core ML Tools stateful-model guide and the video Bring your machine learning and AI models to Apple silicon. Running the performance report on a model generated from the Simple Accumulator example code succeeds with every compute unit option, but with models from the toy attention and toy attention with kvcache examples it only succeeds with compute units set to 'CPU only' when an iPhone is chosen as the device.
Versions I'm currently working with:
Xcode Version 16.0
macOS Sequoia 15.0.1
Core ML Tools 8.0
iPhone 16 Pro iOS 18.0.1
Is there a way to avoid these errors? Or is there another way to identify which operations within a Core ML model are supported on the iPhone GPU / Neural Engine?
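Besides the Xcode performance report, the MLComputePlan API is a programmatic route worth exploring for per-operation device support. The sketch below is an assumption about the exact shape of that API on recent OS releases, so the property names may need checking against the current Core ML headers:
import CoreML

// Assumed sketch (names from memory of the recent MLComputePlan API, may need adjusting):
// for an ML program, report per operation which compute devices support it and which
// device Core ML prefers. `compiledModelURL` is a placeholder.
func dumpComputePlan(for compiledModelURL: URL) async throws {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .all
    let plan = try await MLComputePlan.load(contentsOf: compiledModelURL, configuration: configuration)
    guard case .program(let program) = plan.modelStructure else { return }
    for (functionName, function) in program.functions {
        for operation in function.block.operations {
            let usage = plan.deviceUsage(for: operation)
            print(functionName, operation.operatorName,
                  "preferred:", usage?.preferred as Any,
                  "supported:", usage?.supported as Any)
        }
    }
}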
Hi All,
I am trying to build a new iOS app by following https://developer.apple.com/videos/play/wwdc2024/10163/?time=67
When I try to remove all of the legacy VN calls I get errors. I would appreciate it if someone could help me get up to speed with the new Vision API.
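For what it's worth, the new Swift-only Vision API drops the VN prefix. Based on the CoreMLRequest / ImageRequestHandler usage shown elsewhere in this thread, a minimal sketch looks like this (MyClassifier is a placeholder for your generated model class):
import Vision
import CoreML

// Minimal sketch of the new Vision API path; MyClassifier is a placeholder.
func classify(_ cgImage: CGImage) async throws {
    let mlModel = try MyClassifier(configuration: MLModelConfiguration()).model
    let container = try CoreMLModelContainer(model: mlModel)
    let request = CoreMLRequest(model: container)
    let handler = ImageRequestHandler(cgImage)
    let observations = try await handler.perform(request)
    print(observations)
}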
I'm hitting a limit when trying to train an Image Classifier.
It's at about 16k images (in line with the error info) - and it gives the error:
IOSurface creation failed: e00002be parentID: 00000000 properties: {
IOSurfaceAllocSize = 529984;
IOSurfaceBytesPerElement = 4;
IOSurfaceBytesPerRow = 1472;
IOSurfaceElementHeight = 1;
IOSurfaceElementWidth = 1;
IOSurfaceHeight = 360;
IOSurfaceName = CoreVideo;
IOSurfaceOffset = 0;
IOSurfacePixelFormat = 1111970369;
IOSurfacePlaneComponentBitDepths = (
8,
8,
8,
8
);
IOSurfacePlaneComponentNames = (
4,
3,
2,
1
);
IOSurfacePlaneComponentRanges = (
1,
1,
1,
1
);
IOSurfacePurgeWhenNotInUse = 1;
IOSurfaceSubsampling = 1;
IOSurfaceWidth = 360;
} (likely per client IOSurface limit of 16384 reached)
I feel like I was able to use more images than this before upgrading to Sonoma - but I don't have the receipts....
Is there a way around this?
I have oodles of spare memory on my machine - it's using about 16gb of 64 when it crashes...
code to create the model is
let parameters = MLImageClassifier.ModelParameters(validation: .dataSource(validationDataSource),
maxIterations: 25,
augmentation: [],
algorithm: .transferLearning(
featureExtractor: .scenePrint(revision: 2),
classifier: .logisticRegressor
))
let model = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir.url), parameters: parameters)
I have also tried the same training source in CreateML, it runs through 'extracting features', and crashes at about 16k images processed.
Thank you
I'm trying to run a Core ML model.
This is an image classifier generated using:
let parameters = MLImageClassifier.ModelParameters(validation: .dataSource(validationDataSource),
maxIterations: 25,
augmentation: [],
algorithm: .transferLearning(
featureExtractor: .scenePrint(revision: 2),
classifier: .logisticRegressor
))
let model = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir.url), parameters: parameters)
I'm trying to run it with the new async Vision API:
let model = try MLModel(contentsOf: modelUrl)
guard let modelContainer = try? CoreMLModelContainer(model: model) else {
    fatalError("The model is missing")
}
let request = CoreMLRequest(model: modelContainer)
let image = NSImage(named: "testImage")!
let cgImage = image.toCGImage()!
let handler = ImageRequestHandler(cgImage)
do {
    let results = try await handler.perform(request)
    print(results)
} catch {
    print("Failed: \(error)")
}
This gives me
Failed: internalError("Error Domain=com.apple.Vision Code=7 "The VNDetectorProcessOption_ScenePrints required option was not found" UserInfo={NSLocalizedDescription=The VNDetectorProcessOption_ScenePrints required option was not found}")
Please help! Am I missing something?
I'm experiencing issues with the Core ML Async API, as it doesn't seem to be working correctly. It consistently hangs during the
"03 performInference, after get smallInput, before prediction" part,
as shown in the attached:
log1.txt
log2.txt
Below is my code. Could you please advise on how I should modify it?
private func createFrameAsync(for sampleBuffer: CMSampleBuffer) {
    guard let pixelBuffer = sampleBuffer.imageBuffer else { return }
    Task {
        print("**** createFrameAsync before performInference")
        do {
            try await runModelAsync(on: pixelBuffer)
        } catch {
            print("Error processing frame: \(error)")
        }
        print("**** createFrameAsync after performInference")
    }
}
func runModelAsync(on pixelbuffer: CVPixelBuffer) async {
    print("01 performInference, before resizeFrame")
    guard let data = metalResizeFrame(sourcePixelFrame: pixelbuffer, targetSize: MTLSize.init(width: InputWidth, height: InputHeight, depth: 1), resizeMode: .scaleToFill) else {
        os_log("Preprocessing failed", type: .error)
        return
    }
    print("02 performInference, after resizeFrame, before get smallInput")
    let input = model_smallInput(input: data)
    print("03 performInference, after get smallInput, before prediction")
    if let prediction = try? await mlmodel!.model.prediction(from: input) {
        print("04 performInference, after prediction, before get result")
        var results: [Float] = []
        let output = prediction.featureValue(for: "output")?.multiArrayValue
        if let bufferPointer = try? UnsafeBufferPointer<Float>(output!) {
            results = Array(bufferPointer)
        }
        print("05 performInference, after get result, before setRenderData")
        let localResults = results
        await MainActor.run {
            ScreenRecorder.shared
                .setRenderDataNormalized(
                    screenImage: pixelbuffer,
                    depthData: localResults
                )
        }
        print("06 performInference, after setRenderData")
    }
}
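Not a confirmed diagnosis, but one pattern from the async-prediction guidance is to bound how many predictions are in flight rather than spawning a new Task per frame. A hedged sketch (the model and feature-provider types are placeholders):
import CoreML

// Hedged sketch: at most one prediction runs at a time; frames arriving while a
// prediction is in flight are simply dropped instead of queueing more Tasks.
actor PredictionSerializer {
    private let model: MLModel
    private var inFlight = false

    init(model: MLModel) { self.model = model }

    func predictIfIdle(on input: MLFeatureProvider) async throws -> MLFeatureProvider? {
        guard !inFlight else { return nil } // drop this frame
        inFlight = true
        defer { inFlight = false }
        return try await model.prediction(from: input, options: MLPredictionOptions())
    }
}
Called from createFrameAsync, this keeps the capture callback from stacking up work while a prediction is still running.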
Hi all, I'm tuning my app's prediction speed with a Core ML model. I watched and tried the methods in the videos Improve Core ML integration with async prediction and Optimize your Core ML usage. I also used Instruments to find the bottleneck that keeps my predictions from being faster.
Below is the Instruments result for my app; its prediction duration is 10.29 ms.
And below is the performance report, which shows an average prediction time of 5.55 ms, about half of my app's prediction time!
Below is part of my Instruments recording. I think the predictions should be considered quite frequent. Could it be faster?
How can I reach the same prediction speed as the performance report? The prediction speed on a MacBook Pro M2 is nearly the same as on a MacBook Air M1!
Hi everyone,
I’m currently in the process of converting and optimizing the Stable Diffusion XL model for iOS 18. I followed the steps from the WWDC 2024 session on model optimization, specifically the one titled "Bring your machine learning and AI models to Apple Silicon."
I utilized the Stable Diffusion XL model and the tools available in the ml-stable-diffusion GitHub repository and ran the following script to convert the model into an .mlpackage:
python3 -m python_coreml_stable_diffusion.torch2coreml \
--convert-unet \
--convert-vae-decoder \
--convert-text-encoder \
--xl-version \
--model-version stabilityai/stable-diffusion-xl-base-1.0 \
--bundle-resources-for-swift-cli \
--refiner-version stabilityai/stable-diffusion-xl-refiner-1.0 \
--attention-implementation SPLIT_EINSUM \
-o ../PotraitModel/ \
--custom-vae-version madebyollin/sdxl-vae-fp16-fix \
--latent-h 128 \
--latent-w 96 \
--chunk-unet
The model conversion worked without any issues. However, when I proceeded to optimize the model in a Jupyter notebook, following the same process shown in the WWDC session, I encountered an error during the post-training quantization step. Here’s the code I used for that:
import coremltools.optimize.coreml as cto_coreml  # import backing the cto_coreml alias used below

op_config = cto_coreml.OpPalettizerConfig(
    nbits=4,
    mode="kmeans",
    granularity="per_grouped_channel",
    group_size=16,
)
config = cto_coreml.OptimizationConfig(op_config)
compressed_model = cto_coreml.palettize_weights(mlmodel, config)
Unfortunately, I received the following error:
AssertionError: The IOS16 only supports per-tensor LUT, but got more than one lut on 0th axis. LUT shape: (80, 1, 1, 1, 16, 1)
It appears that the minimum deployment target of the MLModel is set to iOS 16, which might be causing compatibility issues. How can I update the minimum deployment target to iOS 18? If anyone has encountered this issue or knows a workaround, I would greatly appreciate your guidance!
Thanks in advance for any help!
I believe I am encountering a bug in the MPS backend of Core ML.
I believe there is an invalid conversion of a slice_by_index + gather operation, resulting in indexing the wrong values during GPU execution.
The following is a Python program using the coremltools library that illustrates the issue:
import tempfile

import numpy as np
import torch

import coremltools as ct
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.mil import types
dB = 20480
shapeI = (2, dB)
shapeB = (dB, 22)
@mb.program(input_specs=[mb.TensorSpec(shape=shapeI, dtype=types.int32),
                         mb.TensorSpec(shape=shapeB)])
def prog(i, b):
    lslice = mb.slice_by_index(x=i, begin=[0, 0], end=[1, dB], end_mask=[False, True],
                               squeeze_mask=[True, False], name='slice_left')
    rslice = mb.slice_by_index(x=i, begin=[1, 0], end=[2, dB], end_mask=[False, True],
                               squeeze_mask=[True, False], name='slice_right')
    ldata = mb.gather(x=b, indices=lslice)
    rdata = mb.gather(x=b, indices=rslice)  # actual bug in optimization of gather+slice
    x = mb.add(x=ldata, y=rdata)
    # dummy ops to make a bigger graph to run on GPU
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=2.)
    x = mb.mul(x=x, y=.5)
    x = mb.mul(x=x, y=1., name='result')
    return x
input_types = [
ct.TensorType(name="i", shape=shapeI, dtype=np.int32),
ct.TensorType(name="b", shape=shapeB, dtype=np.float32),
]
with tempfile.TemporaryDirectory() as tmpdirname:
    model_cpu = ct.convert(prog,
                           inputs=input_types,
                           compute_precision=ct.precision.FLOAT32,
                           compute_units=ct.ComputeUnit.CPU_ONLY,
                           package_dir=tmpdirname + 'model_cpu.mlpackage')
    model_gpu = ct.convert(prog,
                           inputs=input_types,
                           compute_precision=ct.precision.FLOAT32,
                           compute_units=ct.ComputeUnit.CPU_AND_GPU,
                           package_dir=tmpdirname + 'model_gpu.mlpackage')
    inputs = {
        "i": torch.randint(0, shapeB[0], shapeI, dtype=torch.int32),
        "b": torch.rand(shapeB, dtype=torch.float32),
    }
    cpu_output = model_cpu.predict(inputs)
    gpu_output = model_gpu.predict(inputs)
    # equivalent to prog
    expected = inputs["b"][inputs["i"][0]] + inputs["b"][inputs["i"][1]]
    # what actually happens on GPU
    actual = inputs["b"][inputs["i"][0]] + inputs["b"][inputs["i"][0]]
    print(f"diff expected vs cpu: {np.sum(np.absolute(expected - cpu_output['result']))}")
    print(f"diff expected vs gpu: {np.sum(np.absolute(expected - gpu_output['result']))}")
    print(f"diff actual vs gpu: {np.sum(np.absolute(actual - gpu_output['result']))}")
The issue seems to occur in the slice_right + gather operations when executed on the GPU; the wrong rows of input "i" are selected.
The program outputs:
diff expected vs cpu: 0.0
diff expected vs gpu: 150104.015625
diff actual vs gpu: 0.0
This behavior was tested on a 14-inch MacBook Pro (2023, M2 Pro) running macOS 14.7, using coremltools 8.0b2 with Python 3.9.19.
All errors in TranslationError return the same error code, making it difficult to differentiate between them. How can this issue be resolved?