Post

Replies

Boosts

Views

Activity

M1 GPU is extremely slow, how can I enable CPU to train my NNs?
Hi everyone, I found that GPU performance is much worse than I expected (as slow as a turtle), so I want to switch from GPU to CPU, but the mlcompute module cannot be found, which is weird. The same code takes 156 s per epoch on Colab versus 40 minutes per epoch on my computer (Jupyter Lab). I only used a small dataset (a few thousand data points), and each epoch has only 20 batches. I am so disappointed, and it seems like the "powerful" GPU is a joke. I am using macOS 12.0.1 and tensorflow-macos 2.6.0. Can anyone tell me why this happens?
9
1
10k
Nov ’21
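One possible answer to the CPU question in the post above, offered as a sketch rather than a confirmed fix. As far as I know, `mlcompute` belonged to the earlier `tensorflow_macos` alpha fork and is not part of tensorflow-macos 2.x with tensorflow-metal, so the standard `tf.config.set_visible_devices` call is the usual way to hide the GPU. The helper name `force_cpu_only` is hypothetical, and whether CPU training is actually faster for this model is an assumption to be measured.

```python
import importlib.util


def force_cpu_only():
    """Hide every GPU from TensorFlow so that ops are placed on the CPU.

    Returns the device types that remain visible, or None when
    TensorFlow is not installed in the current environment.
    (Hypothetical helper name, not part of any library.)
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    # Must run before any op or model is created; TensorFlow raises a
    # RuntimeError if the devices have already been initialized.
    tf.config.set_visible_devices([], "GPU")
    return [d.device_type for d in tf.config.get_visible_devices()]


if __name__ == "__main__":
    print(force_cpu_only())
```

Call it at the top of the notebook, before building the model. Wrapping only the training call in `with tf.device("/CPU:0"):` is an alternative when the GPU should stay visible for other work.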
Sklearn is unstable on Apple Silicon
Hi, I installed sklearn successfully and ran the MNIST toy example successfully. Then I started to run my project. The funny thing is that everything seems fine at the start (at least no ImportError occurs), but when I make some changes to my code and try to run all cells again (I use Jupyter Lab), an ImportError occurs:
  ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
    Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
    Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)
Then I have to uninstall scipy, sklearn, etc. and reinstall all of them, and my code runs again. Magically, I hate to say. Does anyone know how to solve this problem permanently and make sklearn more stable?
6
1
6.6k
Nov ’21
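The traceback in the post above shows the loader looking for `liblapack.3.dylib` under the environment's `lib` directory and not finding it. A small check like the following can confirm whether the file is present before and after the error appears. This is a sketch: the function name `find_lapack_libs` is hypothetical, and the assumption is that the active interpreter lives in the same conda prefix as scipy.

```python
import sys
from pathlib import Path


def find_lapack_libs(prefix=None):
    """Return the names of liblapack dynamic libraries under <prefix>/lib.

    prefix defaults to sys.prefix, i.e. the root of the active
    environment (for example /Users/a/miniforge3 in the traceback).
    """
    lib_dir = Path(prefix if prefix is not None else sys.prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("liblapack*.dylib"))


if __name__ == "__main__":
    libs = find_lapack_libs()
    print(libs if libs else "no liblapack*.dylib under this prefix")
```

If the list becomes empty after installing or updating another package, that package probably removed or replaced the LAPACK library. Mixing pip and conda installs in one environment is a plausible cause, though the post does not confirm it.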
miniforge3 doesn't include sklearn?
I followed this guide to install TensorFlow: https://developer.apple.com/metal/tensorflow-plugin/ but sklearn could not be found, so I ran conda install sklearn. Somehow the sklearn module still cannot be imported. Here is the output when I try to import sklearn:
  (base) (tensorflow-metal) a@A ~ % python
  Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:24:02)
  [Clang 11.1.0 ] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import sklearn
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/__init__.py", line 82, in <module>
      from .base import clone
    File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/base.py", line 17, in <module>
      from .utils import _IS_32BIT
    File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 28, in <module>
      from .fixes import np_version, parse_version
    File "/Users/a/miniforge3/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 20, in <module>
      import scipy.stats
    File "/Users/a/miniforge3/lib/python3.9/site-packages/scipy/stats/__init__.py", line 441, in <module>
      from .stats import *
    File "/Users/a/miniforge3/lib/python3.9/site-packages/scipy/stats/stats.py", line 37, in <module>
      from scipy.spatial.distance import cdist
    File "/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/__init__.py", line 98, in <module>
      from .qhull import *
  ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
    Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
    Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)
  >>>
Some people say sklearn cannot be used on the M1 chip. Is that right?
tensorflow-macos: 2.6.0
tensorflow-metal: 0.2.0
macOS: 12.0.1
Many thanks for any help.
4
0
2.3k
Oct ’21
tensorflow-macos cannot be installed on Python 3.8 virtual environment
I followed this guide to install TensorFlow: https://developer.apple.com/metal/tensorflow-plugin/ and ran the example code in the miniforge3 base environment, but importing the NumPy C-extensions failed. The error message reports:
  * The Python version is: Python3.9
  * The NumPy version is: "1.19.5"
Then I created a new conda environment with Python 3.8 and tried to install tensorflow-macos, but it failed because grpcio could not be installed. How can I solve this problem?
2
0
1.5k
Oct ’21
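When an import fails on a version mismatch like the one in the post above, it helps to list what is actually installed in the active environment before changing anything. A minimal sketch using only the standard library follows. The package list is an assumption based on the names mentioned in these posts, and `installed_versions` is a hypothetical helper name.

```python
import platform
from importlib import metadata

# Distribution names mentioned across these posts (assumed list).
PACKAGES = (
    "tensorflow-macos",
    "tensorflow-metal",
    "numpy",
    "scipy",
    "scikit-learn",
    "grpcio",
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # platform.machine() reports "arm64" for a native Apple Silicon
    # interpreter and "x86_64" for one running under Rosetta.
    print("python", platform.python_version(), platform.machine())
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

Running this in both the base environment and the Python 3.8 environment shows at a glance which of them has which NumPy, and whether grpcio was installed at all.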
Help! Cannot train any NNs with tensorflow on M1 Chip
Here is the example code:
    import sys
    import time
    import tensorflow as tf
    import tensorflow.keras
    import pandas as pd
    import sklearn as sk
    try:
        import tensorflow_datasets as tfds
    except:
        !pip install -q tensorflow_datasets
        import tensorflow_datasets as tfds
    import tensorflow.compat.v2 as tf
    tf.enable_v2_behavior()
    from tensorflow.python.framework.ops import disable_eager_execution
    disable_eager_execution()
    # from tensorflow.python.compiler.mlcompute import mlcompute
    # mlcompute.set_mlc_device(device_name='gpu')
    # (mlcompute cannot be found)
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )
    def normalize_img(image, label):
        """Normalizes images: `uint8` -> `float32`."""
        return tf.cast(image, tf.float32) / 255., label
    batch_size = 128
    ds_train = ds_train.map(
        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_train = ds_train.cache()
    ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
    ds_train = ds_train.batch(batch_size)
    ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)
    ds_test = ds_test.map(
        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_test = ds_test.batch(batch_size)
    ds_test = ds_test.cache()
    ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
        tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        # tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        # tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.Adam(0.001),
        metrics=['accuracy'],
    )
    start_time = time.time()
    model.fit(
        ds_train,
        epochs=10,
        validation_data=ds_test,
    )
    print("--- %s minutes with GPU ---" % ((time.time() - start_time)/60))
Here are the outputs. With these two lines commented out, the kernel died:
    from tensorflow.python.framework.ops import disable_eager_execution
    disable_eager_execution()
  Metal device set to: Apple M1 Pro
  systemMemory: 16.00 GB
  maxCacheSize: 5.33 GB
  Epoch 1/10
  2021-10-31 19:10:20.277599: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  2021-10-31 19:10:20.278606: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
  2021-10-31 19:10:20.279185: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
With these two lines enabled:
  RuntimeError: Caught an unknown exception!
I installed tensorflow-macos and tensorflow-metal by following https://developer.apple.com/metal/tensorflow-plugin/ I have been struggling to use TensorFlow on the new MBP for many days, and it has been a total nightmare. I ran into tons of issues, searched for ways to solve them, and tried tons of methods, and none of them really helped me train a NN. Honestly, I am out of patience right now. I bought the MBP because I thought it would make my work easier, and it turns out I can't even use TensorFlow! So ridiculous!
Tensorflow: 2.6.0
Keras Version: 2.6.0
Python 3.8.12
macOS: 12.0.1
Anaconda: 2.0.3
tensorflow-macos: 2.6.0
tensorflow-metal: 0.2.0
2
0
1.2k
Oct ’21
Model keeps freezing and cannot train on Apple M1 Pro chip.
I followed Apple's instructions to install the latest tensorflow-metal and tensorflow-macos, and then tested GPU performance with the TensorFlow MNIST example: https://www.tensorflow.org/datasets/keras_example Note that I cannot use the CPU only, as the mlcompute module cannot be found; I don't know why. So basically I cannot train any model now. The output of the MNIST example:
  Metal device set to Apple M1 Pro
  2021-10-31 00:13:03.147040: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  2021-10-31 00:13:03.147896: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
  2021-10-31 00:13:03.148505: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
  Epoch 1/6
And that's it; it just freezes there. I would greatly appreciate it if someone could help me address these issues.
OS version: 12.0.1
tensorflow: 2.6.0
tensorflow-deps: 2.6.0
tensorflow-macos: 2.6.0
tensorflow-metal: 0.2.0
conda: 4.10.3
Python 3.8.12
I ran the example code in a Jupyter notebook.
4
0
1.1k
Oct ’21
MacBook Pro 2021 cannot find mlcompute module
Hi, I ran into the same issue that was posted here: https://developer.apple.com/forums/thread/684263?login=true&page=1#693284022 My Mac also cannot find the mlcompute module. Unlike the previous question, I want to use the CPU instead of the GPU during training. I've tried a bunch of methods that I could easily find online. Can anyone help me address this? Is it a tensorflow-macos or tensorflow-metal bug?
2
0
963
Oct ’21
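To tell a missing module apart from a broken install, one can ask the import system directly whether `mlcompute` exists in the installed TensorFlow. The sketch below uses the standard library's `importlib.util.find_spec`. The helper name `module_available` is hypothetical, and the dotted path is the one from the commented-out import in the example code of the earlier post.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate the module `name`.

    For a dotted name, find_spec imports the parent packages first,
    so a missing or broken parent is reported as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow.python.compiler.mlcompute"):
        print(name, module_available(name))
```

If `tensorflow` is reported as available and the `mlcompute` path is not, the module is simply absent from that build, which would suggest choosing the device through `tf.config` or `tf.device` instead.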