PS: If I really "must" use Xcode (for some obscure mandatory reason), I guess I will. However, forget about telling me to code my software in ObjC or Swift. I already tried.
From my understanding and the information I've gathered here and there over time: the Neural Engine is inferior to the GPU in every respect for training a TF model and is... kind of useless to us developers? If I extrapolate from the information I found, it's only useful for tiny models (by today's standards) like Apple's OCR (e.g. copying/pasting text written in an image), speech recognition, trackpad gestures, etc.
We really lack documentation indeed. I had weird cases where the CPU was faster than the GPU too. ^^
I only have the M1 (non pro/max)
To fully disable the GPU I use this:
tf.config.set_visible_devices([], 'GPU')
Call it first, before doing anything else.
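A minimal sketch of what I mean (the device check at the end is just for illustration):
import tensorflow as tf

# Hide the GPU from TensorFlow before any op runs; everything then falls back to the CPU.
tf.config.set_visible_devices([], 'GPU')

# Any tensor created afterwards should report a CPU placement.
x = tf.random.uniform((4, 4))
print(x.device)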
You might also want to display which device is used for which operation:
tf.debugging.set_log_device_placement(True)
It's very verbose, and the first step is usually mostly CPU (function tracing).
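For example (a minimal sketch; the call has to happen before the ops you want to trace):
import tensorflow as tf

# Log the device each op runs on (printed to stderr).
tf.debugging.set_log_device_placement(True)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)  # the placement of MatMul is logged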
From my experience too: don't use float16 (not faster) and don't use mixed_precision (it falls back to CPU), at least on my M1.
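For reference, mixed precision is normally enabled like this (just a sketch, in case you want to reproduce what I saw):
import tensorflow as tf

# Compute in float16 where possible, keep variables in float32.
tf.keras.mixed_precision.set_global_policy('mixed_float16')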
Give this option a try too:
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0],True)
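A slightly more defensive version of the same thing, so it doesn't crash on a machine without a visible GPU (just a sketch):
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front.
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)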
This isn't unexpected, on any platform with any device. Sometimes the CPU is faster than the GPU.
Sometimes my M1 in my MacBook Air 13" is faster than my Nvidia Quadro, or a Tesla K80.
It depends on the workload.
It's not specific to tensorflow-metal.
To be 100% sure, disable the GPU in order to test:
tf.config.set_visible_devices([], 'GPU')
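A rough way to compare the two (just a sketch; note that the set_visible_devices call has to happen before TensorFlow touches the GPU, so run the script once with the line commented out and once with it active):
import time
import tensorflow as tf

# Uncomment to hide the GPU and force a CPU-only run:
# tf.config.set_visible_devices([], 'GPU')

x = tf.random.normal((2048, 2048))
tf.matmul(x, x)  # warm-up so tracing/compilation isn't measured

start = time.perf_counter()
for _ in range(50):
    y = tf.matmul(x, x)
_ = y.numpy()  # force the computation to finish before stopping the clock
print(f"50 matmuls: {time.perf_counter() - start:.3f} s")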
I've installed TensorFlow multiple times on a Mac M1 using this guide: https://developer.apple.com/metal/tensorflow-plugin/
Just follow it step by step and don't skip the Miniforge3 installation; it is absolutely mandatory to install and use the one provided in the guide.
Tested on Python 3.8 and 3.9. TensorFlow is not supported on 3.10 (yet).
I upgraded to 12.1 today.
I just launched a DCGAN, I'll let you know.
BUT, I have another model in training (an autoencoder) and haven't noticed any difference since yesterday.
I'm still on epoch 5, on a MacBook Air M1 2020, but it looks fine to me so far.
My other trainings run just fine too. Looks like you just got bad luck on this run? What about the other intermediate results? Do they all look bad?
Edit: I also have some very bad results sometimes, weird. Is there a problem with random generation?
I have a model that heavily uses random.uniform, I'll check.
EDIT again: I need to double check, but random is broken in some situations.
I wrote a minimal test case; this used to generate 2 different series:
import tensorflow as tf
x = tf.random.uniform((10,))
y = tf.random.uniform((10,))
tf.print(x)
tf.print(y)
[0.178906798 0.8810848 0.384304762 ... 0.162458301 0.64780426 0.0123682022]
[0.178906798 0.8810848 0.384304762 ... 0.162458301 0.64780426 0.0123682022]
It works fine on Colab.
It also works fine if I disable the GPU with:
tf.config.set_visible_devices([], 'GPU')
WORKAROUND:
g = tf.random.Generator.from_non_deterministic_state()
x = g.uniform((10,))
y = g.uniform((10,))
tf.print(x)
tf.print(y)
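If you also need reproducibility, the same workaround can be seeded instead (a small sketch):
import tensorflow as tf

g = tf.random.Generator.from_seed(42)
x = g.uniform((10,))
y = g.uniform((10,))  # different from x, but both reproducible across runs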
See this post, it should help; you have exactly the same problem: https://developer.apple.com/forums/thread/696693
Reformatting your code:
import tensorflow as tf
from tensorflow.python.compiler.mlcompute import mlcompute
tf.compat.v1.disable_eager_execution()
mlcompute.set_mlc_device(device_name='gpu')
print("is_apple_mlc_enabled %s" % mlcompute.is_apple_mlc_enabled())
print("is_tf_compiled_with_apple_mlc %s" % mlcompute.is_tf_compiled_with_apple_mlc())
print(f"eagerly? {tf.executing_eagerly()}")
print(tf.config.list_logical_devices())
It looks like some seriously old code; just do this instead:
import tensorflow as tf
print(tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
tf.print(physical_devices)
Example output:
2.7.0
Metal device set to: Apple M1
systemMemory: 8.00 GB
maxCacheSize: 2.67 GB
2021-12-20 23:11:09.001976: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-20 23:11:09.002466: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
It's a perfectly normal and harmless message on an M1.
I have it too and my model & code work just fine (if the log noise bothers you, see the small sketch after the log below).
2021-12-20 23:19:04.025952: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-20 23:19:04.026364: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: Apple M1
systemMemory: 8.00 GB
maxCacheSize: 2.67 GB
__________________________________________________________________________________________________
2021-12-20 23:19:04.413489: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/10
2021-12-20 23:19:04.723827: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
32/32 [==============================] - ETA: 0s - loss: 0.0256 - accuracy: 0.9605 - mae: 0.0933 - mse: 0.02562021-12-20 23:19:24.073636: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
32/32 [==============================] - 20s 608ms/step - loss: 0.0256 - accuracy: 0.9605 - mae: 0.0933 - mse: 0.0256 - val_loss: 0.0100 - val_accuracy: 0.9855 - val_mae: 0.0650 - val_mse: 0.0100
Epoch 2/10
32/32 [==============================] - 19s 585ms/step - loss: 0.0079 - accuracy: 0.9787 - mae: 0.0568 - mse: 0.0079 - val_loss: 0.0063 - val_accuracy: 0.9869 - val_mae: 0.0534 - val_mse: 0.0063
Epoch 3/10
32/32 [==============================] - 18s 575ms/step - loss: 0.0060 - accuracy: 0.9700 - mae: 0.0506 - mse: 0.0060 - val_loss: 0.0045 - val_accuracy: 0.9776 - val_mae: 0.0438 - val_mse: 0.0045
Epoch 4/10
....
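By the way, if those info messages clutter your output, you can raise the C++ log level before importing TensorFlow (a small sketch; it only hides the messages, nothing else changes):
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide INFO and WARNING messages

import tensorflow as tf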
shape [114389,320]? Are you sure you're not doing something wrong here?
The workaround doesn't work in a tf.function; this is a real problem.
I tried other alternatives like:
randomgen = tf.random.Generator.from_non_deterministic_state()
#%%
for _ in range(10):
    g2 = tf.random.get_global_generator()
    x = g2.uniform((10,), 1, 2)
    y = g2.uniform((10,), 3, 4)
    tf.print(x)
    tf.print(y)
But I get:
NotFoundError: No registered 'RngReadAndSkip' OpKernel for 'GPU' devices compatible with node {{node RngReadAndSkip}}
. Registered: device='CPU'
[Op:RngReadAndSkip]
And obviously, calling this in a tf.function will always generate the same sequence:
tf.random.stateless_uniform((size,),(1,2),xmin,xmax,tf.float32)
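(That's expected: stateless RNG is deterministic for a fixed seed, so you'd have to feed a different seed each call, for example from outside the tf.function. A rough sketch of that idea, not a fix for the missing GPU kernel:)
import tensorflow as tf

@tf.function
def sample(seed, size=10, xmin=-2.0, xmax=0.7):
    # A different seed per call gives a different sequence.
    return tf.random.stateless_uniform((size,), seed, xmin, xmax, tf.float32)

tf.print(sample(tf.constant([1, 2], dtype=tf.int64)))
tf.print(sample(tf.constant([3, 4], dtype=tf.int64)))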
This doesn't work either:
randomgen = tf.random.Generator.from_non_deterministic_state()

@tf.function
def MandelbrotDataSet(size=1000, max_depth=100, xmin=-2.0, xmax=0.7, ymin=-1.3, ymax=1.3):
    global randomgen
    x = randomgen.uniform((size,), xmin, xmax, tf.float32)
    y = randomgen.uniform((size,), ymin, ymax, tf.float32)
Because of RngReadAndSkip again.
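One thing that might be worth trying (untested on my side, purely a sketch): pin the generator ops to the CPU inside the tf.function, since the RngReadAndSkip kernel is only registered there.
import tensorflow as tf

randomgen = tf.random.Generator.from_non_deterministic_state()

@tf.function
def MandelbrotDataSet(size=1000, xmin=-2.0, xmax=0.7, ymin=-1.3, ymax=1.3):
    # Force the stateful RNG ops onto the CPU, where the kernel exists.
    with tf.device('/CPU:0'):
        x = randomgen.uniform((size,), xmin, xmax, tf.float32)
        y = randomgen.uniform((size,), ymin, ymax, tf.float32)
    return x, y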
You must use Miniforge3 as stated in the guide, not the regular conda.
If pip install does not work, just install it with conda install instead.
How did you install it?