Apple Developer Forums

why there's nerual engine-data copy in coreml npu prediction

I am currently facing a performance issue while using CoreML on iOS 16+ devices to run a simple grid_sample model. When profiling the model using xcode Profiler, I noticed that before each NPU computation, there is a significant delay caused by the "input copy" and "neural engine-data copy" operations.I have specified that both the input and output of the model are of type float16, there shouldn't be any data type convert. I would appreciate any insights or suggestions regarding the reasons behind this delay and possible solutions My simple model is class GridSample(torch.nn.Module): def __init__( self, ): super().__init__() def forward(self, input: torch.Tensor, grid: torch.Tensor) -> torch.Tensor: output = F.grid_sample( input, grid.to(input), mode='nearest', padding_mode='zeros', align_corners=True, ) return output tr_input = torch.randn((8, 64, 512, 512) tr_grid = torch.randn((8, 256, 256, 2) simple_model = GridSample() simple_model.eval() traced_model = torch.jit.trace(simple_model, [tr_input, tr_grid]) coreml_input = [coremltools.TensorType(name="image_input", shape=tr_input.shape, dtype=np.float16), coremltools.TensorType(name="warp_grid", shape=tr_grid.shape, dtype=np.float16)] mlmodel = coremltools.converters.convert(traced_model, inputs=coreml_input, convert_to="mlprogram", minimum_deployment_target=coremltools.target.iOS16, compute_units=coremltools.ComputeUnit.ALL, compute_precision = coremltools.precision.FLOAT16, outputs=[ct.TensorType(name="x0", dtype=np.float16)], debug=False) mlmodel.save("./grid_sample.mlpackage") os.system(f"xcrun coremlcompiler compile './grid_sample.mlpackage' './')

Machine Learning & AI General ML Compute Core ML

0

738

Feb ’24

Memory leaks when using GPU

I haven't used the GPU implementation for over a year now due to constant issues (I use tf.config.set_visible_devices([], 'GPU') to use CPU only. I have also had a couple of issues with model convergence using GPU, however this issue seems more prominent, and possibly unrelated. Here is an example of code that causes a memory leak using GPU (I cannot link the dataset, but it is called: Text classification documentation, by TANISHQ DUBLISH on Kaggle. import pandas as pd import numpy as np import matplotlib.pyplot as plt import tensorflow as tf df = pd.read_csv('df_file.csv') df.head() train_df = df.sample(frac=0.7, random_state=42) val_df = df.drop(train_df.index).sample(frac=0.5, random_state=42) test_df = df.drop(train_df.index).drop(val_df.index) train_dataset = tf.data.Dataset.from_tensor_slices((train_df['Text'].values, train_df['Label'].values)).batch(32).prefetch(tf.data.AUTOTUNE) val_dataset = tf.data.Dataset.from_tensor_slices((val_df['Text'].values, val_df['Label'].values)).batch(32).prefetch(tf.data.AUTOTUNE) test_dataset = tf.data.Dataset.from_tensor_slices((test_df['Text'].values, test_df['Label'].values)).batch(32).prefetch(tf.data.AUTOTUNE) text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=100_000, output_mode='int', output_sequence_length=1000, pad_to_max_tokens=True) text_vectorizer.adapt(train_df['Text'].values) embedding = tf.keras.layers.Embedding(input_dim=len(text_vectorizer.get_vocabulary()), output_dim=128, input_length=1000) inputs = tf.keras.layers.Input(shape=[], dtype=tf.string) x = text_vectorizer(inputs) x = embedding(x) x = tf.keras.layers.LSTM(64)(x) outputs = tf.keras.layers.Dense(5, activation='softmax')(x) model_2 = tf.keras.Model(inputs, outputs, name='model_2_lstm') model_2.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), optimizer=tf.keras.optimizers.legacy.Adam(), metrics=['accuracy']) model_2_history = model_2.fit(train_dataset, epochs=50, validation_data=val_dataset, callbacks=[ tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True), tf.keras.callbacks.ModelCheckpoint(model_2.name, save_best_only=True), tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=5, verbose=1) ])

Machine Learning & AI General tensorflow-metal

0

579

Feb ’24

Use front camera for data scanner

I'm using DataScannerViewController with SwiftUI to scan text and barcodes from a card. I would like the user to be able to hold the card in front of the device, but I am not finding a way to select the front camera with DataScannerViewController. Does anyone know of a way to select the front camera?

Machine Learning & AI General VisionKit

0

439

Feb ’24

Ionic capacitor xcode. Error Domain=com.apple.VisionKit.RemoveBackground Code=-8 "(null)"

After migrating my ionic cordova app to ionic capacitor I am encountering a persistent white screen on a particular page. Along with this, I have observed the following error messages in the console: Error Message: [com.apple.VisionKit.RemoveBackground] Request to remove background on an unsupported device. Error Domain=com.apple.VisionKit.RemoveBackground Code=-8 "(null)" Error Message: [UILog] Called -[UIContextMenuInteraction updateVisibleMenuWithBlock:] while no context menu is visible. This won't do anything. The actual page becomes visible after clicking on that white screen. the same code is working fine for android build but facing issue on ios.

Machine Learning & AI General VisionKit

1

0

658

Feb ’24

Massives issues with tensorflow gpu, when will apple do something?

Hello, We all face issues with the latest tensorflow gpu. Incorrect result, errors etc... We all agreed to pay extra for the M1/2/3 so we could work on a professional grade computer but in the end we must use CPU. When will apple actually comment on that and provide updates. I totally understand these issues aren't fixed overnight and take some time, but i've never seen any apple dev answer saying that they understand and they're working on a fix. I've basically bought a Mac M3 Pro to be able to run on GPU some stuff without having to purchase a server and it's now useless. It's really frustrating.

Machine Learning & AI General tensorflow-metal

7

3

2.4k

Feb ’24

Vision Pro & Vision SDK

I'm exploring my Vision Pro and finding it unclear whether I can even achieve things like body pose detection etc. https://developer.apple.com/videos/play/wwdc2023/111241/ It's clear that I can apply it to self provided images, but how about to the data coming from visionOS SDKs? All I can find is this mesh data from ARKit, https://developer.apple.com/documentation/arkit/arkit_in_visionos - am I missing something or do we not yet have good APIs for this? Appreciate any guidance! Thanks.

Machine Learning & AI General Vision Machine Learning Core ML visionOS

2

0

1.3k

Feb ’24

jax-metal error on jax.numpy.einsum

Problem I am trying to use the jax.numpy.einsum function (https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.einsum.html). However, for some subscripts, this seems to fail. Hardware Apple M1 Max, 32GB RAM Steps to Reproduce follow installation steps from https://developer.apple.com/metal/jax/ conda create -n 'jax_metal_demo' python=3.11 conda activate jax_metal_demo python -m pip install numpy wheel ml-dtypes==0.2.0 python -m pip install jax-metal Save the following code in a file called minimal_example.py import numpy as np from jax import device_put import jax.numpy as jnp np.random.seed(0) a = np.random.rand(11, 12, 13, 11, 12) b = np.random.rand(11, 12, 13) subscripts = 'ijklm,ijk->lmk' # intended result print(np.einsum(subscripts, a, b)) # will cause crash a, b = device_put(a), device_put(b) print(jnp.einsum(subscripts, a, b)) run the code python minimal_example.py Output I waas expecting Platform 'METAL' is experimental and not all JAX functionality may be correctly supported! 2024-02-12 16:45:34.684973: W pjrt_plugin/src/mps_client.cc:563] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported! Metal device set to: Apple M1 Max systemMemory: 32.00 GB maxCacheSize: 10.67 GB Traceback (most recent call last): File "/Users/linus/workspace/minimal_example.py", line 15, in <module> print(jnp.einsum(subscripts, a, b)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/linus/miniforge3/envs/jax_metal_demo/lib/python3.11/site-packages/jax/_src/numpy/lax_numpy.py", line 3369, in einsum return _einsum_computation(operands, contractions, precision, # type: ignore[operator] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/linus/miniforge3/envs/jax_metal_demo/lib/python3.11/contextlib.py", line 81, in inner return func(*args, **kwds) ^^^^^^^^^^^^^^^^^^^ jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: /Users/linus/workspace/minimal_example.py:15:6: error: failed to legalize operation 'mhlo.dot_general' print(jnp.einsum(subscripts, a, b)) ^ /Users/linus/workspace/minimal_example.py:15:6: note: see current operation: %0 = "mhlo.dot_general"(%arg1, %arg0) {dot_dimension_numbers = #mhlo.dot<lhs_batching_dimensions = [2], rhs_batching_dimensions = [2], lhs_contracting_dimensions = [0, 1], rhs_contracting_dimensions = [0, 1]>, precision_config = [#mhlo<precision DEFAULT>, #mhlo<precision DEFAULT>]} : (tensor<11x12x13xf32>, tensor<11x12x13x11x12xf32>) -> tensor<13x11x12xf32> -------------------- For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these. Conclusion I would greatly appreciate any ideas for workarounds.

Machine Learning & AI General tensorflow-metal

2

0

946

Feb ’24

CreateML debug log lines?

Where can I find CreateML logs? I'd like to inspect log lines if they exist to diagnose what kind of error the app encounters when I provide it training data for a multi-label image classifier and the UI displays "Data Analysis stopped". I do see some crash reports for "MLRecipeExecutionService" in the Console app which seem related, but I haven't spotted anything useful there yet.

Machine Learning & AI Create ML Create ML

1

3

739

Feb ’24

Create ML maximum grid size

Is 30x30 the maximum grid size on Create ML App? The input allows me to set any number higher than that, but on starting training, the number falls back to 30x30. Is that a limitation or a bug in the app?

Machine Learning & AI Core ML Core ML Create ML

1

753

Feb ’24

Vision Pro CoreML inference 10x slower than M1 Mac/seems to run on CPU

Have a CoreML model that I run in my app Spatial Media Toolkit which lets you convert 2D photos to Spatial. Running the model on my 13" M1 mac gets 70ms inference. Running the exact same code on my Vision Pro takes 700ms. I'm working on adding video support but Vision Pro inference is feeling impossible due to 700ms per frame (20x realtime for for 30fps! 1 sec of video takes 20 sec!) There's a ModelConfiguration you can provide, and when I force CPU I get the same exact performance. Either it's only running on CPU, the NeuralEngine is throttled, or maybe GPU isn't allowed to help out. Disappointing but also feels like a software issue. Would be curious if anyone else has hit this/have any workarounds

Machine Learning & AI Core ML Core ML visionOS

0

685

Feb ’24

How to use CoreML to accept multiple frames as input

I want to use CoreML to process video data. The ML model will take multiple frames as input. How should I get multi frames from ios and process it? Thanks in advance for any suggestions.

Machine Learning & AI General ML Compute Image I/O

0

566

Feb ’24

PyTorch convert function for op 'intimplicit' not implemented

I am trying to coremltools.converters.convert a traced PyTorch model and I got an error: PyTorch convert function for op 'intimplicit' not implemented I am trying to convert a RVC model from github. I traced the model with torch.jit.trace and it fails. So I traced down the problematic part to the ** layer : https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/infer/lib/infer_pack/modules.py#L188 import torch import coremltools as ct from infer.lib.infer_pack.modules import ** model = **(192, 5, dilation_rate=1, n_layers=16, ***_channels=256, p_dropout=0) model.remove_weight_norm() model.eval() test_x = torch.rand(1, 192, 200) test_x_mask = torch.rand(1, 1, 200) test_g = torch.rand(1, 256, 1) traced_model = torch.jit.trace(model, (test_x, test_x_mask, test_g), check_trace = True) x = ct.TensorType(name='x', shape=test_x.shape) x_mask = ct.TensorType(name='x_mask', shape=test_x_mask.shape) g = ct.TensorType(name='g', shape=test_g.shape) mlmodel = ct.converters.convert(traced_model, inputs=[x, x_mask, g]) I got an error RuntimeError: PyTorch convert function for op 'intimplicit' not implemented. How could I modify the **::forward so it does not generate an intimplicit operator ? Thanks David

Machine Learning & AI Core ML Core ML

1

0

903

Feb ’24

Operation not available: Mattel

InvalidArgumentError: Cannot assign a device for operation don_nn/model_2/branch_hidden0/MatMul/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node don_nn/model_2/branch_hidden0/MatMul/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Machine Learning & AI General tensorflow-metal

0

503

Feb ’24

Poor Quality 2021 MBP Speakers

I've only been using this late 2021 MBP 16 for nearly 2 years, and now the speaker is producing a crackling sound. Upon inquiring about repairs, customer service informed me that it would cost $728 to replace the speaker, which is a third of the price of the laptop itself. It's absolutely absurd that a $2200 laptop's speaker would fail within such a short period without any external damage. The repair cost being a third of the laptop's price is outrageous. I intend to initiate a petition in the US, hoping to connect with others experiencing the same problem. This is indicative of a subpar product, and customers shouldn't bear the burden of Apple's shortcomings. I plan to share my grievances on various social media platforms and if the issue persists, I will escalate it to the media for further exposure.

Machine Learning & AI General Sound Analysis

2

0

625

Feb ’24

Does Low Power Mode make system pressure interruptions during video capture more likely?

In investigating a capture session crash, it's unclear what's causing occasional system pressure interruptions, except that it's happening on older iOS devices. Does Low Power Mode have a meaningful impact on whether these interruptions happen?

Machine Learning & AI Core ML Core ML AVFoundation

0

592

Feb ’24

Better Results with Separate Sound Classifier Models?

I'm working with MLSoundClassifier to try to look for 2 different sounds in a live audio stream. I have been debating with the team if it is better to train 2 separate models, one for each different sound, or train 1 model on both sounds? Has anyone had any experience with this. Some of us believe that we have received better results with the separate models and some with 1 single model trained on both sounds. Thank you!

Machine Learning & AI General Sound Analysis Core ML

0

657

Feb ’24

Tensorflow-Metal Errors

Hi i am trying to set up tensorflow-metal as instructed by https://developer.apple.com/metal/tensorflow-plugin/ when running line (python -m pip install tensorflow-metal) I get the following error: ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none) ERROR: No matching distribution found for tensorflow-metal According to the troubleshooting section: "Check that the Python version used in the environment is supported (Python 3.8, Python 3.9, Python 3.10)." My current version is Python 3.9.12. Any insight would be great!

Machine Learning & AI General tensorflow-metal

5

1

1.2k

Mar ’24

Transferable Item in Reality View

Can you use View with Transferable View in the one WindowGroup to another ImmersiveSpace with RealityView? I can drag, but the drop event isn't captured when with RealityView var body: some View { let droppable = Droppable( model: model ) RealityView { content in // Add the initial RealityKit content content.add(floorEntity) } .onDrop( of: ... // or .dropDestination( For ... {} //or .gesture( DragGesture() .targetedToAnyEntity() .onChanged({ value in none of them triggers the drop

Machine Learning & AI General VisionKit

0

344

Mar ’24

Python - Complex-valued linear algebra on GPU

Hi, I am looking for a routine to perform complex-valued linear algebra on the GPU in python for scientific programming, in particular quantum physics simulations. At the moment I am looking for a routine for complex-valued matrix multiplication. I found MLX has a routine for float matrix multiplication, but it does not directly work for complex-valued matrices. I figured a work-around by splitting the complex valued matrix into real and imaginary part and working with the pair, but it makes it cumbersome to integrate with the remainder of the code. I was hoping for a library-based implementation similar to cupy. I also tried out using the tensorflow linear algebra routines, but I couldn't get them to run on the GPU by now. Specifically, a testfile with a tensorflow.keras.applications.ResNet50 routine runs on the GPU, but the routines from tensorflow.linalg and tensorflow.math that I tested (matmul, expm, eigh) were not running on the GPU. Any advice on how to make linear algebra calculations on mac GPUs work is highly appreciated! For my application the unified memory might be especially beneficial. Thank you!

Machine Learning & AI General ML Compute Metal tensorflow-metal

0

707

Mar ’24

Xcode 15.3 AppIntentsSSUTraining warning: missing the definition of locale # variables.1.definitions

Hello! I've noticed that adding localizations for AppShortcuts triggers the following warnings in Xcode 15.3: warning: missing the definition of zh-Hans # variables.1.definitions warning: missing the definition of zh-Hans # variables.2.definitions This occurs with both legacy strings files and String Catalogs. Example project: https://github.com/gongzhang/AppShortcutsLocalizationWarningExample

Machine Learning & AI General App Intents

7

4

1.8k

Mar ’24

Machine Learning & AI

Post

Replies

Boosts

Views

Activity