Apple Developer Forums

H1xANELoadBalancer is taking longer to load

We have an application that receives a message (through MQTT) from an external system to snap a photo, runs a Core ML Vision request on the image, and then sends the results back. The customer has hundreds of devices, and recently a couple of them (iPhone 13 Pros) were not responding in time. There was no crash; some individual inferences were just slowed down. Each device performs thousands of requests per day. On further evaluation of the requests before and after in the device logs, I noticed that the following is loaded:

default 2024-09-04 13:18:31.310401 -0400 ProcessName Processing image for reference: ***
default 2024-09-04 13:18:31.403606 -0400 ProcessName Found matching service: H1xANELoadBalancer
default 2024-09-04 13:18:31.403646 -0400 ProcessName Found matching service: H11ANEIn
default 2024-09-04 13:18:31.403661 -0400 ProcessName Found ANE device :1
default 2024-09-04 13:18:31.403681 -0400 ProcessName Total num of devices 1
default 2024-09-04 13:18:31.403681 -0400 ProcessName (Single-ANE System) Opening H11ANE device at index 0
default 2024-09-04 13:18:31.403681 -0400 ProcessName H11ANEDevice::H11ANEDeviceOpen, usage type: 1

In a good scenario (above), these actions are performed very quickly (in a split second). The app doesn't do anything until the Core ML inference result is returned.

In the bad scenario (below), there is a delay of about 4 seconds between the app passing control to the Vision request and getting the response back (leading to timeouts with the customer):

default 2024-09-04 13:19:08.777468 -0400 ProcessName Processing image for reference: ZZZ
default 2024-09-04 13:19:12.199758 -0400 ProcessName Found matching service: H1xANELoadBalancer
default 2024-09-04 13:19:12.199800 -0400 ProcessName Found matching service: H11ANEIn
default 2024-09-04 13:19:12.199812 -0400 ProcessName Found ANE device :1
default 2024-09-04 13:19:12.199832 -0400 ProcessName Total num of devices 1
default 2024-09-04 13:19:12.199834 -0400 ProcessName (Single-ANE System) Opening H11ANE device at index 0
default 2024-09-04 13:19:12.199834 -0400 ProcessName H11ANEDevice::H11ANEDeviceOpen, usage type: 1

The logs are in order; I haven't removed anything. The code is fairly simple: it just runs a Vision request without doing much else. Has anyone encountered this before?
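To put a number on the stall, the gap between the "Processing image" line and the first ANE line can be computed directly from the timestamps in the two log excerpts. This is only a minimal sketch: it assumes the timestamp layout seen in the excerpts, and the helper name `gap_seconds` is hypothetical, not part of any Apple tooling.

```python
from datetime import datetime

# Timestamp layout assumed from the log excerpts in the post.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def gap_seconds(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# First two timestamps of the good and the bad excerpt, copied from the post.
good = gap_seconds("2024-09-04 13:18:31.310401 -0400",
                   "2024-09-04 13:18:31.403606 -0400")
bad = gap_seconds("2024-09-04 13:19:08.777468 -0400",
                  "2024-09-04 13:19:12.199758 -0400")

print(f"good: {good:.3f} s, bad: {bad:.3f} s")
```

For the excerpts in the post this gives roughly 0.09 s in the good case and roughly 3.4 s in the bad case before the ANE service lookup is logged.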

Machine Learning & AI Core ML

0

1

336

Sep ’24

The CoreML MultiArray Float16 input is not supported for running on the NPU, and this issue only occurs on the iPhone 11.

Xcode Version: Version 15.2 (15C500b)
com.github.apple.coremltools.source: torch==1.12.1
com.github.apple.coremltools.version: 7.2
Compute: Mixed (Float16, Int32)
Storage: Float16

The input to the mlpackage is MultiArray (Float16 1 × 1 × 544 × 960).
The flexibility is: 1 × 1 × 544 × 960 | 1 × 1 × 384 × 640 | 1 × 1 × 736 × 1280 | 1 × 1 × 1088 × 1920

I tested this on iPhone XR, iPhone 11, iPhone 12, iPhone 13, and iPhone 14. On all devices except the iPhone 11, the model runs correctly on the NPU. However, on the iPhone 11, the model runs on the CPU instead.

Here is the CoreMLTools conversion code I used:

    mlmodel = ct.convert(
        trace,
        inputs=[ct.TensorType(shape=input_shape, name="input", dtype=np.float16)],
        outputs=[ct.TensorType(name="output", dtype=np.float16, shape=output_shape)],
        convert_to='mlprogram',
        minimum_deployment_target=ct.target.iOS16
    )

Machine Learning & AI Core ML iPhone iOS Core ML

3

0

505

Sep ’24

How to Ensure Quantized Models Run on ANE on iPhone 15 (iOS 18 Beta 8)

When I use CoreML to infer a w8a8 model on iPhone 15 (iOS 18 beta 8), the model uses CPU inference instead of ANE, which results in slower inference speed. The model I am using is from the coremltools documentation, which indicates that on iOS 17, quantized models can run on ANE properly and achieve faster speeds. How can I make the quantized model run correctly on ANE to achieve the desired inference speed? To reproduce this issue, you can download the Weight & Activation quantized model from the following link: https://apple.github.io/coremltools/docs-guides/source/opt-quantization-perf.html.

Machine Learning & AI Core ML iPhone iOS

0

405

Sep ’24

Crash when calling modelWithContentsOfURL on iOS 16+

We have code that crashes. The crash stack is as follows:

Thread 26 Crashed:
0 CoreFoundation 0x0000000198b0569c CFRelease + 44
1 CoreFoundation 0x0000000198b12334 __CFBasicHashRehash + 1172
2 CoreFoundation 0x0000000198b015dc __CFBasicHashAddValue + 100
3 CoreFoundation 0x0000000198b232e4 CFDictionarySetValue + 208
4 Foundation 0x00000001979b0378 _getStringAtMarker + 464
5 Foundation 0x00000001979b016c _NSXPCSerializationStringForObject + 56
6 Foundation 0x00000001979cec4c __44-[NSXPCDecoder _decodeArrayOfObjectsForKey:]_block_invoke + 52
7 Foundation 0x00000001979ceb90 _NSXPCSerializationIterateArrayObject + 208
8 Foundation 0x00000001979cda7c -[NSXPCDecoder _decodeArrayOfObjectsForKey:] + 240
9 Foundation 0x00000001979cd1bc -[NSDictionary(NSDictionary) initWithCoder:] + 176
10 Foundation 0x00000001979ae6e8 _decodeObject + 1264
11 Foundation 0x00000001979cec4c __44-[NSXPCDecoder _decodeArrayOfObjectsForKey:]_block_invoke + 52
12 Foundation 0x00000001979ceb90 _NSXPCSerializationIterateArrayObject + 208
13 Foundation 0x00000001979cda7c -[NSXPCDecoder _decodeArrayOfObjectsForKey:] + 240
14 Foundation 0x00000001979cd1a4 -[NSDictionary(NSDictionary) initWithCoder:] + 152
15 Foundation 0x00000001979ae6e8 _decodeObject + 1264
16 Foundation 0x00000001979ad030 -[NSXPCDecoder _decodeObjectOfClasses:atObject:] + 148
17 Foundation 0x0000000197a0a7f0 _NSXPCSerializationDecodeTypedObjCValuesFromArray + 892
18 Foundation 0x0000000197a0a1f8 _NSXPCSerializationDecodeInvocationArgumentArray + 412
19 Foundation 0x0000000197a0866c -[NSXPCDecoder __decodeXPCObject:allowingSimpleMessageSend:outInvocation:outArguments:outArgumentsMaxCount:outMethodSignature:outSelector:isReply:replySelector:] + 700
20 Foundation 0x0000000197a61078 -[NSXPCDecoder _decodeReplyFromXPCObject:forSelector:] + 76
21 Foundation 0x0000000197a5f690 -[NSXPCConnection _decodeAndInvokeReplyBlockWithEvent:sequence:replyInfo:] + 252
22 Foundation 0x0000000197a63664 __88-[NSXPCConnection _sendInvocation:orArguments:count:methodSignature:selector:withProxy:]_block_invoke_5 + 188
23 Foundation 0x0000000197a08058 -[NSXPCConnection _sendInvocation:orArguments:count:methodSignature:selector:withProxy:] + 2244
24 CoreFoundation 0x0000000198b19d88 ___forwarding___ + 1016
25 CoreFoundation 0x0000000198b198d0 _CF_forwarding_prep_0 + 96
26 AppleNeuralEngine 0x00000001e912ab1c -[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:] + 332
27 AppleNeuralEngine 0x00000001e912a674 __44-[_ANEClient doLoadModel:options:qos:error:]_block_invoke + 360
28 libdispatch.dylib 0x00000001a0a21dd4 _dispatch_client_callout + 20
29 libdispatch.dylib 0x00000001a0a312c4 _dispatch_lane_barrier_sync_invoke_and_complete + 56
30 AppleNeuralEngine 0x00000001e9129ef0 -[_ANEClient doLoadModel:options:qos:error:] + 500
31 Espresso 0x00000001a7e02034 Espresso::ANERuntimeEngine::compiler::build_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 3736
32 Espresso 0x00000001a7e010cc Espresso::net_compiler_segment_based::build(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, int) + 384
33 Espresso 0x00000001a7df02a4 Espresso::ANERuntimeEngine::compiler::build(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, int) + 120
34 Espresso 0x00000001a7e1b3a4 Espresso::net::__build(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, int) + 360
35 Espresso 0x00000001a7e178e0 Espresso::abstract_context::compute_batch_sync(void (std::__1::shared_ptr<Espresso::abstract_batch> const&) block_pointer) + 112
36 Espresso 0x00000001a7e198b8 EspressoLight::espresso_plan::prepare_compiler_if_needed() + 3208
37 Espresso 0x00000001a7e183f4 EspressoLight::espresso_plan::prepare() + 1712
38 Espresso 0x00000001a7da8e78 espresso_plan_build_with_options + 300
39 Espresso 0x00000001a7da8d30 espresso_plan_build + 44
40 CoreML 0x00000001b346645c -[MLNeuralNetworkEngine rebuildPlan:error:] + 536
41 CoreML 0x00000001b3464294 -[MLNeuralNetworkEngine _setupContextAndPlanWithConfiguration:usingCPU:reshapeWithContainer:error:] + 3132
42 CoreML 0x00000001b34797a0 -[MLNeuralNetworkEngine initWithContainer:configuration:error:] + 196
43 CoreML 0x00000001b347962c +[MLNeuralNetworkEngine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 164
44 CoreML 0x00000001b34792a0 +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 144
45 CoreML 0x00000001b3478c64 +[MLLoader _loadModelFromArchive:configuration:modelVersion:compilerVersion:loaderEvent:useUpdatableModelLoaders:loadingClasses:error:] + 532
46 CoreML 0x00000001b34650c8 +[MLLoader _loadWithModelLoaderFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 424
47 CoreML 0x00000001b3474bc8 +[MLLoader _loadModelFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 460
48 CoreML 0x00000001b347a024 +[MLLoader _loadModelFromAssetAtURL:configuration:loaderEvent:error:] + 244
49 CoreML 0x00000001b3479cbc +[MLLoader loadModelFromAssetAtURL:configuration:error:] + 104
50 CoreML 0x00000001b347ac2c -[MLModelAsset load:] + 564
51 CoreML 0x00000001b347a9c4 -[MLModelAsset modelWithError:] + 24
52 CoreML 0x00000001b347a7b4 +[MLModel modelWithContentsOfURL:configuration:error:] + 172
53 CoreML 0x00000001b37afbc4 +[MLModel modelWithContentsOfURL:error:] + 76

Core code:

    MLModel* model = nil;
    NSError *error = nil;
    @try {
        model = [MLModel modelWithContentsOfURL:modelURL error:&error];
    } @catch (NSException *exception) {
        model = nil;
        return Ret_OperationErr_InvalidInit;
    }

Two questions:
What does this stack mean?
I added @try/@catch, so why is it still crashing?

Machine Learning & AI Core ML

1

0

395

Sep ’24

How to speed up the modelWithContentsOfURL function?

Recently, deep learning projects have been getting larger, and loading models has sometimes become a bottleneck. I download a Core ML model in .mlpackage format from the internet and need to use compileModelAtURL to convert the .mlpackage into an .mlmodelc, then call modelWithContentsOfURL to turn the .mlmodelc into a model handle. Generating the handle with modelWithContentsOfURL is generally very slow. I noticed from WWDC 2023 that the compiled results can be cached (see https://developer.apple.com/videos/play/wwdc2023/10049/?time=677, which states "This compilation includes further optimizations for the specific compute device and outputs an artifact that the compute device can run. Once complete, Core ML caches these artifacts to be used for subsequent model loads."). However, I couldn't find anything in the documentation on how to use this cache.

Machine Learning & AI Core ML

1

0

444

Aug ’24

Vision framework not working on Apple Vision Pro

com.apple.Vision Code=9 "Could not build inference plan - ANECF error: failed to load ANE model file:///System/Library/Frameworks/Vision.framework/anodv4_drop6_fp16.H14G.espresso.hwx

The code that raises this error:

    func imageToHeadBox(image: CVPixelBuffer) async throws -> [CGRect] {
        let request: DetectFaceRectanglesRequest = DetectFaceRectanglesRequest()
        let faceResult: [FaceObservation] = try await request.perform(on: image)
        let faceBoxs: [CGRect] = faceResult.map { face in
            let faceBoundingBox: CGRect = face.boundingBox.cgRect
            return faceBoundingBox
        }
        return faceBoxs
    }

Machine Learning & AI Core ML Vision visionOS

1

0

542

Aug ’24

MLTensor computation took more time than expected.

    func testMLTensor() {
        let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
        let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
        for _ in 0...50 {
            let t = Date()
            let x = (t1 * t2)
            print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
        }
    }

    testMLTensor()

The above code took more time than expected, especially in the early stage of iteration.

Machine Learning & AI Core ML ML Compute Accelerate Performance Core ML

1

0

451

Aug ’24

iOS 18 Beta - Proper error code is not given by TranslationError

All errors in TranslationError return the same error code, making it difficult to differentiate between them. How can this issue be resolved?

Machine Learning & AI Core ML Swift Student Challenge iOS Machine Learning Core ML

1

0

414

Aug ’24

iOS 18.1 beta - App crashes at runtime while using Translation.TranslationError in project

I'm trying to cast the error thrown by TranslationSession.translations(from:) as Translation.TranslationError. However, the app crashes at runtime whenever Translation.TranslationError is used in the project.

Environment:
iOS Version: 18.1 beta
Xcode Version: 16 beta

dyld[14615]: Symbol not found: _$s11Translation0A5ErrorVMa
Referenced from: <3426152D-A738-30C1-8F06-47D2C6A1B75B> /private/var/containers/Bundle/Application/043A25BC-E53E-4B28-B71A-C21F77C0D76D/TranslationAPI.app/TranslationAPI.debug.dylib
Expected in: /System/Library/Frameworks/Translation.framework/Translation

Machine Learning & AI Core ML ML Compute Natural Language Live Text Apple Intelligence

1

776

Aug ’24

CoreML Crash on iOS18 Beta5

Hello, my app works well on iOS 17 and on previous iOS 18 beta versions, but it crashes on the latest iOS 18 Beta 5 when calling the model's predictionFromFeatures. The crash call stack is:

*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: 'Unrecognized ANE execution priority MLANEExecutionPriority_Unspecified'
Last Exception Backtrace:
0 CoreFoundation 0x000000019bd6408c __exceptionPreprocess + 164
1 libobjc.A.dylib 0x000000019906b2e4 objc_exception_throw + 88
2 CoreFoundation 0x000000019be5f648 -[NSException initWithCoder:]
3 CoreML 0x00000001b7507340 -[MLE5ExecutionStream _setANEExecutionPriorityWithOptions:] + 248
4 CoreML 0x00000001b7508374 -[MLE5ExecutionStream _prepareForInputFeatures:options:error:] + 248
5 CoreML 0x00000001b7507ddc -[MLE5ExecutionStream executeForInputFeatures:options:error:] + 68
6 CoreML 0x00000001b74ce5c4 -[MLE5Engine _predictionFromFeatures:stream:options:error:] + 80
7 CoreML 0x00000001b74ce7fc -[MLE5Engine _predictionFromFeatures:options:error:] + 208
8 CoreML 0x00000001b74cf110 -[MLE5Engine _predictionFromFeatures:usingState:options:error:] + 400
9 CoreML 0x00000001b74cf270 -[MLE5Engine predictionFromFeatures:options:error:] + 96
10 CoreML 0x00000001b74ab264 -[MLDelegateModel _predictionFromFeatures:usingState:options:error:] + 684
11 CoreML 0x00000001b70991bc -[MLDelegateModel predictionFromFeatures:options:error:] + 124

My model file is an ML package file. The source code is as below:

    // model
    MLModel *_model;
    ......
    // model init
    MLModelConfiguration* config = [[MLModelConfiguration alloc]init];
    config.computeUnits = MLComputeUnitsCPUAndNeuralEngine;
    _model = [MLModel modelWithContentsOfURL:compileUrl configuration:config error:&error];
    .....
    // model prediction
    MLPredictionOptions *option = [[MLPredictionOptions alloc]init];
    id<MLFeatureProvider> outFeatures = [_model predictionFromFeatures:_modelInput options:option error:&error];

Is there anything wrong? Any advice would be appreciated.

Machine Learning & AI Core ML Beta Debugging Machine Learning Core ML

3

1

561

Aug ’24

How to deploy Vision Transformer with ANE to Achieve Faster Uncached Load Speed

I wanted to deploy some ViT models on an iPhone. I referred to https://machinelearning.apple.com/research/vision-transformers for deployment and wrote a simple demo based on the code from https://github.com/apple/ml-vision-transformers-ane. However, I found that the uncached load time on the phone is very long. According to the blog, the input is already aligned to 64 bytes, but the speed is still very slow. Is there any way to speed it up?

This is my test case:

    import torch
    import coremltools as ct
    import math
    from torch import nn


    class SelfAttn(torch.nn.Module):
        def __init__(self, window_size, num_heads, dim, dim_out):
            super().__init__()
            self.window_size = window_size
            self.num_heads = num_heads
            self.dim = dim
            self.dim_out = dim_out
            self.q_proj = nn.Conv2d(
                in_channels=dim,
                out_channels=dim_out,
                kernel_size=1,
            )
            self.k_proj = nn.Conv2d(
                in_channels=dim,
                out_channels=dim_out,
                kernel_size=1,
            )
            self.v_proj = nn.Conv2d(
                in_channels=dim,
                out_channels=dim_out,
                kernel_size=1,
            )

        def forward(self, x):
            B, HW, C = x.shape
            image_shape = (B, C, self.window_size, self.window_size)
            x_2d = x.permute((0, 2, 1)).reshape(image_shape)  # BCHW
            x_flat = torch.unsqueeze(x.permute((0, 2, 1)), 2)  # BC1L
            q, k, v_2d = self.q_proj(x_flat), self.k_proj(x_flat), self.v_proj(x_2d)
            mh_q = torch.split(q, self.dim_out // self.num_heads, dim=1)  # BC1L
            mh_v = torch.split(
                v_2d.reshape(B, -1, x_flat.shape[2], x_flat.shape[3]),
                self.dim_out // self.num_heads,
                dim=1
            )
            mh_k = torch.split(
                torch.permute(k, (0, 3, 2, 1)),
                self.dim_out // self.num_heads,
                dim=3
            )
            scale_factor = 1 / math.sqrt(mh_q[0].size(1))
            attn_weights = [
                torch.einsum("bchq, bkhc->bkhq", qi, ki) * scale_factor
                for qi, ki in zip(mh_q, mh_k)
            ]
            attn_weights = [
                torch.softmax(aw, dim=1) for aw in attn_weights
            ]  # softmax applied on channel "C"
            mh_x = [torch.einsum("bkhq,bchk->bchq", wi, vi) for wi, vi in zip(attn_weights, mh_v)]
            x = torch.cat(mh_x, dim=1)
            return x


    window_size = 8
    path_batch = 1024
    emb_dim = 96
    emb_dim_out = 96
    x = torch.rand(path_batch, window_size * window_size, emb_dim)
    qkv_layer = SelfAttn(window_size, 1, emb_dim, emb_dim_out)
    jit = torch.jit.trace(qkv_layer, (x))
    mlmod_fixed_shape = ct.convert(
        jit,
        inputs=[
            ct.TensorType("x", x.shape),
        ],
        convert_to="mlprogram",
    )
    mlmodel_path = "test_ane.mlpackage"
    mlmod_fixed_shape.save(mlmodel_path)

The uncached load took nearly 36 seconds, and it was just a single matrix multiplication.

Machine Learning & AI Core ML

0

1

373

Aug ’24

Bug Report: macOS 15 Beta - PyTorch gridsample Not Utilising Apple Neural Engine on MacBook Pro M2

In macOS 15 beta, the gridsample function from PyTorch is not executing as expected on the Apple Neural Engine on a MacBook Pro M2. Please find below a Python code snippet that demonstrates the problem:

    import coremltools as ct
    import torch  # needed for torch.nn.Module, torch.jit.trace and torch.randn below
    import torch.nn as nn
    import torch.nn.functional as F


    class PytorchGridSample(torch.nn.Module):
        def __init__(self, grids):
            super(PytorchGridSample, self).__init__()
            self.upsample1 = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
            self.upsample2 = nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1)
            self.upsample3 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
            self.upsample4 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
            self.upsample5 = nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1)
            self.grids = grids

        def forward(self, x):
            x = self.upsample1(x)
            x = F.grid_sample(x, self.grids[0], padding_mode='reflection', align_corners=False)
            x = self.upsample2(x)
            x = F.grid_sample(x, self.grids[1], padding_mode='reflection', align_corners=False)
            x = self.upsample3(x)
            x = F.grid_sample(x, self.grids[2], padding_mode='reflection', align_corners=False)
            x = self.upsample4(x)
            x = F.grid_sample(x, self.grids[3], padding_mode='reflection', align_corners=False)
            x = self.upsample5(x)
            x = F.grid_sample(x, self.grids[4], padding_mode='reflection', align_corners=False)
            return x


    def convert_to_coreml(model, input_):
        traced_model = torch.jit.trace(model, example_inputs=input_, strict=False)
        coreml_model = ct.converters.convert(
            traced_model,
            inputs=[ct.TensorType(shape=input_.shape)],
            compute_precision=ct.precision.FLOAT16,
            minimum_deployment_target=ct.target.macOS14,
            compute_units=ct.ComputeUnit.ALL
        )
        return coreml_model


    def main(pt_model, input_):
        coreml_model = convert_to_coreml(pt_model, input_)
        coreml_model.save("grid_sample.mlpackage")


    if __name__ == "__main__":
        input_tensor = torch.randn(1, 512, 4, 4)
        grids = [torch.randn(1, 2*i, 2*i, 2) for i in [4, 8, 16, 32, 64, 128]]
        pt_model = PytorchGridSample(grids)
        main(pt_model, input_tensor)
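As a sanity check on the grid sizes used in the snippet, the spatial size after each transposed convolution can be worked out with the standard ConvTranspose2d output-size formula for dilation 1 and no output padding. This is a sketch in plain Python that does not touch Core ML or the ANE; the helper name `conv_transpose_out` is hypothetical and used only for illustration.

```python
def conv_transpose_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output length of ConvTranspose2d along one axis (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 4  # spatial size of the 1 x 512 x 4 x 4 input tensor in the post
sizes_after_upsample = []
for _ in range(5):  # upsample1 .. upsample5
    size = conv_transpose_out(size)
    sizes_after_upsample.append(size)

# Grid sizes produced by the post's list comprehension: 2*i for each i.
grid_sizes = [2 * i for i in [4, 8, 16, 32, 64, 128]]

print(sizes_after_upsample)
print(grid_sizes)
```

With kernel 4, stride 2 and padding 1, each layer doubles the spatial size, so the first five grids match the feature map at the point where they are applied. The list comprehension builds a sixth grid (256 × 256) that forward never indexes.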

Machine Learning & AI Core ML

0

328

Aug ’24

Upgraded to macOS 15, Core ML models are slower

After I upgraded to macOS 15 Beta 4 (M1, 16 GB), the sampling speed of Apple's ml-stable-diffusion was about 40% slower than on macOS 14. When I recompile and run with Xcode 16, the following error appears:

loc("EpicPhoto/Unet.mlmodelc/model.mil":2748:12): error: invalid axis: 4294967296, axis must be in range -|rank| <= axis < |rank|
Assertion failed: (0 && "failed to infer output types"), function _inferJITOutputTypes, file GPUBaseOps.mm, line 339.

I checked the macOS 15 release notes and saw that slow Core ML inference is listed as fixed, but it does not seem to be fixed for me:

Fixed: Inference time for large Core ML models is slower than expected on a subset of M-series SOCs (e.g. M1, M1 max) on macOS. (129682801)

Machine Learning & AI Core ML

2

0

405

Aug ’24

UI for on-device LLMs / Foundation models

I was watching the WWDC 2024 session "Deploy machine learning and AI models on-device with Core ML" (https://developer.apple.com/videos/play/wwdc2024/10161/), and the speaker was showing a UI in which he was running on-device LLMs / foundation models. I was wondering whether this UI is open source and whether I can download and play around with an app similar to the one shown.

Machine Learning & AI Core ML

2

1

464

Aug ’24

Help Needed: Error Codes in VCPHumanPoseImageRequest.mm[85] and NSArrayM insertObject

Hey all 👋🏼 We're currently working on a video processing project using the Vision framework (face, body and hand pose detection), and we've encountered a couple of errors that I need help with. We are on Xcode 16 Beta 3, testing on an iPhone 14 Pro running iOS 18 beta.

The error messages are as follows:

[LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[85]: code 18,446,744,073,709,551,598 encountered an unexpected condition: *** -[__NSArrayM insertObject:atIndex:]: object cannot be nil

What we've tried:

Debugging: I’ve tried stepping through the code, but the errors occur before I can gather any meaningful insights.
Searching Documentation: Looked through Apple’s developer documentation and forums but couldn’t find anything related to these specific error codes.
Nil Check: Added checks to ensure objects are not nil before inserting them into arrays, but the error persists.

Here are my questions:

Has anyone encountered similar errors with the Vision framework, specifically related to VCPHumanPoseImageRequest and NSArray operations?
Is there any known issue or bug in the version of the framework I might be using? Could it also be related to the beta?
Are there any additional debug steps or logging mechanisms I can implement to narrow down the cause?
Any suggestions on how to handle nil objects more effectively in this context?

I would greatly appreciate any insights or suggestions you might have. Thank you in advance for your assistance! Thanks all!
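One observation that may help when searching for the logged code: 18,446,744,073,709,551,598 sits just below 2^64, which is how a small negative number looks when printed as an unsigned 64-bit integer. The sketch below reinterprets it as a signed value. That the logger prints a signed code as unsigned is an assumption, the helper name `as_signed_64` is hypothetical, and what the resulting value means inside MediaAnalysis is not stated anywhere in the post.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


# Error code copied from the log line in the post, thousands separators removed.
logged = int("18,446,744,073,709,551,598".replace(",", ""))
signed = as_signed_64(logged)
print(signed)
```

Under that assumption the code reads as -18, which may be an easier value to search for than the 20-digit unsigned form.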

Machine Learning & AI Core ML Foundation Vision Core Image

3

0

628

Jul ’24

Matmul with quantized weight does not run on ANE with FP16 offset: `ane: Failed to retrieved zero_point`

Hi, the following model does not run on ANE. Inspecting with deCoreML I see the error ane: Failed to retrieved zero_point.

    import numpy as np
    import coremltools as ct
    from coremltools.converters.mil import Builder as mb
    import coremltools.converters.mil as mil

    B, CIN, COUT = 512, 1024, 1024 * 4

    @mb.program(
        input_specs=[
            mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
        ],
        opset_version=mil.builder.AvailableTarget.iOS18
    )
    def prog_manual_dequant(
        x,
    ):
        qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
        scale = np.random.randn(COUT, 1).astype(np.float16)
        offset = np.random.randn(COUT, 1).astype(np.float16)
        # offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
        dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
        return mb.linear(x=x, weight=dqw)

    cml_qmodel = ct.convert(
        prog_manual_dequant,
        compute_units=ct.ComputeUnit.CPU_AND_NE,
        compute_precision=ct.precision.FLOAT16,
        minimum_deployment_target=ct.target.iOS18,
    )

Whereas if I use an offset with the same dtype as the weights (uint4 in this case), it does run on ANE.

Tested on coremltools 8.0b1, on macOS 15.0 beta 2/Xcode 15 beta 2, and macOS 15.0 beta 3/Xcode 15 beta 3.

Machine Learning & AI Core ML Core ML

0

479

Jul ’24

Missing GPU implementation Op:StatelessRandomGetKeyCounter for the Embedding layer in tensorflow-metal

The Keras Embedding layer cannot be calculated on Metal because of the missing Op:StatelessRandomGetKeyCounter, as shown in this error message:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Could not satisfy device specification '/job:localhost/replica:0/task:0/device:GPU:0'. enable_soft_placement=0. Supported device types [CPU]. All available devices [/job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:CPU:0]. [Op:StatelessRandomGetKeyCounter]

A workaround is to enable soft placement, but this obviously is slower:

    tf.config.set_soft_device_placement(True)

Reporting it here as recommended by the TensorFlow Plugin Metal team.

Machine Learning & AI Core ML tensorflow-metal

0

511

Jul ’24

Core ML

Post

Replies

Boosts

Views

Activity