Hi,
I ran into the same issue that was posted here: https://developer.apple.com/forums/thread/684263?login=true&page=1#693284022
My Mac also cannot find the mlcompute module.
Unlike the previous question, I want to use the CPU instead of the GPU during training.
I've tried a bunch of methods that I could easily find online. Can anyone help me with this? Is it a tensorflow-macos or tensorflow-metal bug?
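As far as I know, the mlcompute module belonged to Apple's earlier standalone tensorflow_macos fork and is not part of the tensorflow-macos 2.x + tensorflow-metal setup, which exposes the GPU as a regular TensorFlow PluggableDevice instead. A minimal sketch, assuming TensorFlow 2.x, of forcing CPU-only execution with the standard tf.config API in place of mlcompute.set_mlc_device(device_name='cpu'):

```python
import tensorflow as tf

# Hide every GPU (including the Metal PluggableDevice) from TensorFlow.
# This must run before any op or model touches the GPU.
try:
    tf.config.set_visible_devices([], 'GPU')
except RuntimeError as err:
    # Devices were already initialized (e.g. an earlier notebook cell ran
    # TensorFlow code): restart the kernel and run this cell first.
    print(err)

# The physical list still reports the GPU; the logical list (what ops can
# actually be placed on) should now contain only CPU devices.
print(tf.config.list_physical_devices())
print(tf.config.list_logical_devices())
```

Run it in the first cell after a kernel restart; once any op has initialized the GPU, visibility can no longer be changed for that process.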
I followed Apple's instructions to install the latest tensorflow-metal and tensorflow-macos, and then tested GPU performance with the TensorFlow MNIST example: https://www.tensorflow.org/datasets/keras_example
Note that I cannot use the CPU only, because the mlcompute module cannot be found. I don't know why.
So basically I cannot train any model now.
The output of the MNIST example:
Metal device set to Apple M1 Pro
2021-10-31 00:13:03.147040: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-31 00:13:03.147896: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-31 00:13:03.148505: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Epoch 1/6
And that's it: it just freezes there.
I would greatly appreciate it if someone could help me address these issues.
macOS version: 12.0.1
tensorflow: 2.6.0
tensorflow-deps: 2.6.0
tensorflow-macos: 2.6.0
tensorflow-metal: 0.2.0
conda: 4.10.3
Python 3.8.12
I ran the example code in a Jupyter notebook.
Here is the example code:
import sys
import time
import tensorflow as tf
import tensorflow.keras
import pandas as pd
import sklearn as sk

try:
    import tensorflow_datasets as tfds
except ImportError:
    # IPython shell escape: only valid inside a Jupyter notebook cell
    !pip install -q tensorflow_datasets
    import tensorflow_datasets as tfds

import tensorflow.compat.v2 as tf
tf.enable_v2_behavior()
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
# from tensorflow.python.compiler.mlcompute import mlcompute
# mlcompute.set_mlc_device(device_name='gpu')
# (mlcompute cannot be found)

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

batch_size = 128

# Training pipeline: normalize, cache, shuffle, batch, prefetch
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Evaluation pipeline: normalize, batch, cache, prefetch
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                           activation='relu'),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
                           activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    # tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

start_time = time.time()
model.fit(
    ds_train,
    epochs=10,
    validation_data=ds_test,
)
print("--- %s minutes with GPU ---" % ((time.time() - start_time)/60))
Here are the outputs.
With these two lines commented out:
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
the kernel died:
Metal device set to: Apple M1 Pro
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB
Epoch 1/10
2021-10-31 19:10:20.277599: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-31 19:10:20.278606: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-31 19:10:20.279185: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
With the two lines left in:
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
I got:
RuntimeError: Caught an unknown exception!
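disable_eager_execution() is a TF1-compatibility switch, while TF2 Keras is built around eager execution (model.fit still compiles graphs internally via tf.function). A rough way to narrow down the freeze is a tiny eager-mode training run on synthetic data, which takes tensorflow_datasets and the input pipeline out of the picture. The random data and layer sizes below are hypothetical stand-ins, not the original MNIST setup:

```python
import numpy as np
import tensorflow as tf

# Eager execution stays on (the TF2 default): disable_eager_execution()
# is deliberately NOT called here.
rng = np.random.default_rng(0)
# Hypothetical stand-in data shaped like MNIST: 256 grayscale 28x28 images.
x = rng.random((256, 28, 28, 1), dtype=np.float32)
y = rng.integers(0, 10, size=256)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(0.001),
              metrics=['accuracy'])

# One short epoch on in-memory arrays.
history = model.fit(x, y, batch_size=32, epochs=1, verbose=0)
print(history.history)
```

If this finishes but the full example still hangs at Epoch 1, the input pipeline or model size is the next thing to look at; if this also hangs, the device/plugin setup seems the more likely culprit.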
I installed tensorflow-macos and tensorflow-metal by following https://developer.apple.com/metal/tensorflow-plugin/
I have been struggling to use TensorFlow on my new MBP for many days, and it has been a total nightmare. I ran into tons of issues, searched for ways to solve them, and tried tons of methods, but none of them actually let me train a NN.
Honestly, I am out of patience right now. I bought the MBP because I assumed it would speed up my work, and it turns out I can't even use TensorFlow! So ridiculous!
Tensorflow: 2.6.0
Keras Version: 2.6.0
Python 3.8.12
macOS: 12.0.1
Anaconda: 2.0.3
tensorflow-macos: 2.6.0
tensorflow-metal: 0.2.0
I followed this guideline to install TensorFlow: https://developer.apple.com/metal/tensorflow-plugin/
and ran the example code in the miniforge3 base environment, but importing the NumPy C-extensions failed. The error message reported:
* The Python version is: Python3.9
* The NumPy version is: "1.19.5"
Then I tried to create a new conda environment with Python 3.8 and install tensorflow-macos there, but that failed because grpcio could not be installed.
So in this case, how can I solve this problem?
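Failed NumPy C-extension imports are often caused by mixing packages built for different Python versions or CPU architectures (for example an x86_64 interpreter running under Rosetta next to arm64 packages); that is an assumption here, not something the error output confirms. A small stdlib-only sketch, with a hypothetical helper env_report, that prints what the current interpreter actually sees:

```python
import platform
import sys
from importlib import metadata


def env_report(packages=("numpy", "grpcio", "tensorflow-macos", "tensorflow-metal")):
    """Collect interpreter/architecture info and installed package versions."""
    report = {
        "python": platform.python_version(),
        # 'arm64' for a native Apple silicon interpreter, 'x86_64' under Rosetta.
        "machine": platform.machine(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


for key, value in env_report().items():
    print(f"{key}: {value}")
```

Running it once in each environment (base and the new Python 3.8 one) makes it easy to spot a version or architecture mismatch.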
I followed this guideline to install TensorFlow: https://developer.apple.com/metal/tensorflow-plugin/
but sklearn could not be found, so I ran conda install sklearn, and somehow the sklearn module still cannot be imported.
Here is the output when I try to import sklearn:
(base) (tensorflow-metal) a@A ~ % python
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:24:02)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/__init__.py", line 82, in <module>
from .base import clone
File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/base.py", line 17, in <module>
from .utils import _IS_32BIT
File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 28, in <module>
from .fixes import np_version, parse_version
File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 20, in <module>
import scipy.stats
File "/Users/a/miniforge3/lib/python3.9/site-packages/scipy/stats/__init__.py", line 441, in <module>
from .stats import *
File "/Users/a/miniforge3/lib/python3.9/site-packages/scipy/stats/stats.py", line 37, in <module>
from scipy.spatial.distance import cdist
File "/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/__init__.py", line 98, in <module>
from .qhull import *
ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)
>>>
Some people say sklearn cannot be used on the M1 chip. Is that right?
tensorflow-macos: 2.6.0
tensorflow-metal: 0.2.0
macOS: 12.0.1
Many thanks for any help.
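The traceback shows scipy (imported by sklearn) looking for @rpath/liblapack.3.dylib under /Users/a/miniforge3/lib and not finding it, so the failure is in scipy's LAPACK dependency rather than in sklearn itself. Note also that the conda package is named scikit-learn; sklearn is only the import name. A minimal check, assuming the library is expected in <prefix>/lib of the active environment; find_lapack is a hypothetical helper:

```python
import os
import sys


def find_lapack(prefix=sys.prefix, name="liblapack.3.dylib"):
    """Return (path, exists) for the library the dynamic loader looked for."""
    path = os.path.join(prefix, "lib", name)
    return path, os.path.exists(path)


path, found = find_lapack()
print(path, "found" if found else "MISSING")
```

If it prints MISSING, reinstalling the BLAS/LAPACK packages in that environment before scipy is one thing to try (assumption: on conda-forge the file ships in the liblapack package).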
Hi,
I installed sklearn successfully and ran the MNIST toy example successfully.
Then I started to run my own project. The funny thing is that everything seemed fine at first (at least no ImportError occurred), but when I made some changes to my code and tried to run all cells again (I use JupyterLab), the ImportError came back:
ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)
Then I have to uninstall scipy, sklearn, etc. and reinstall all of them before my code runs again.
I hate to say it, but does anyone know how to solve this problem permanently and make sklearn more stable?
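One possible cause (an assumption; the thread does not confirm it) is that the JupyterLab kernel runs in a different environment from the one where packages were reinstalled; the traceback paths, for instance, point at the miniforge3 base environment. A stdlib sketch, with a hypothetical helper kernel_env_report, to run inside the notebook and compare the interpreter with the activated conda environment:

```python
import os
import sys


def kernel_env_report():
    """Compare the running interpreter with the active conda environment."""
    conda_prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    if conda_prefix is None:
        matches = None  # no conda environment activated; nothing to compare
    else:
        matches = os.path.realpath(conda_prefix) == os.path.realpath(sys.prefix)
    return {
        "executable": sys.executable,
        "sys.prefix": sys.prefix,
        "CONDA_PREFIX": conda_prefix,
        # False suggests the kernel is not the interpreter of the activated env.
        "matches": matches,
    }


print(kernel_env_report())
```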
Hi everyone,
I found that GPU performance is not as good as I expected (as slow as a turtle), so I want to switch from the GPU to the CPU, but the mlcompute module cannot be found, which is weird.
The same code takes 156 s per epoch on Colab vs 40 minutes per epoch on my computer (JupyterLab).
I only used a small dataset (a few thousand data points), and each epoch has only 20 batches.
I am so disappointed; it seems like the "powerful" GPU is a joke.
I am using macOS 12.0.1, and the tensorflow-macos version is 2.6.0.
Can anyone tell me why this happens?
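Working through the reported numbers (no new measurements): 40 minutes is 2400 s, so the local run is roughly 15x slower per epoch than Colab, and with 20 batches per epoch that is about 2 minutes per batch. slowdown is a hypothetical helper:

```python
def slowdown(fast_s, slow_s, batches=None):
    """Ratio slow/fast; with `batches`, also per-batch seconds for each run."""
    ratio = slow_s / fast_s
    if batches is None:
        return ratio, None
    return ratio, (fast_s / batches, slow_s / batches)


ratio, per_batch = slowdown(156, 40 * 60, batches=20)
print(f"{ratio:.1f}x slower; per batch: {per_batch[0]:.1f} s vs {per_batch[1]:.1f} s")
# prints: 15.4x slower; per batch: 7.8 s vs 120.0 s
```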
As above.
I found that the M1 chip has extremely bad performance when training RNNs (158 s vs 6 h). Does anyone know why the M1 Pro chip is so unfriendly to RNNs? And how can I use only the CPU to run my code, given that the mlcompute package somehow cannot be recognized?
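The thread does not explain the RNN slowdown, so the sketch below only measures it: it times one epoch of a small LSTM with ops pinned to a given device via tf.device. The model, shapes, and random data are hypothetical. tf.device is a scoped alternative to hiding the GPU globally with tf.config.set_visible_devices, and with soft placement enabled TensorFlow may still move individual ops.

```python
import time

import numpy as np
import tensorflow as tf

# Hypothetical toy sequence data: 512 sequences, 50 time steps, 8 features.
rng = np.random.default_rng(0)
x = rng.random((512, 50, 8), dtype=np.float32)
y = rng.random((512, 1), dtype=np.float32)


def time_one_epoch(device):
    """Build and train a small LSTM for one epoch with ops placed on `device`."""
    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(50, 8)),
            tf.keras.layers.LSTM(32),
            tf.keras.layers.Dense(1),
        ])
        model.compile(loss='mse', optimizer='adam')
        start = time.perf_counter()
        model.fit(x, y, batch_size=32, epochs=1, verbose=0)
        return time.perf_counter() - start


print(f"CPU: {time_one_epoch('/CPU:0'):.2f} s")
# With tensorflow-metal installed, compare against:
# print(f"GPU: {time_one_epoch('/GPU:0'):.2f} s")
```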