I got this error in a (tensorflow-metal) virtualenv on Big Sur with an AMD Radeon 5700 XT GPU:
tensorflow-metal/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 6): Symbol not found: _TF_AssignUpdateVariable
Referenced from: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib
Expected in: flat namespace
$ nm /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib | grep _TF_AssignUpdateVariable
U _TF_AssignUpdateVariable
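The `U` in the `nm` output marks the symbol as undefined inside the plugin itself, so the dynamic loader has to find it in some other loaded library (presumably the TensorFlow build the plugin expects). As a rough sketch, the helper below filters `nm` output for undefined symbols; `undefined_symbols` is a name invented for this example, and the sample text is illustrative rather than real output from the plugin:

```python
def undefined_symbols(nm_output, prefix=""):
    """Return the names that nm reports with type 'U' (undefined)."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column, just "U _symbol".
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith(prefix):
            names.append(parts[1])
    return names


# Illustrative sample only (one defined symbol, two undefined ones).
sample = """\
0000000000001f40 T _example_defined_symbol
                 U _TF_AssignUpdateVariable
                 U _malloc
"""

print(undefined_symbols(sample, prefix="_TF_"))  # ['_TF_AssignUpdateVariable']
```

Running the same filter over the `nm` output of the installed TensorFlow library would show whether it actually exports the symbol the plugin is asking for.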
Fixed with tensorflow-metal version 0.1.2
I was able to profile keras/tensorflow example code with a tensorflow-metal virtual environment. Please note the profile tab will only display results in Google Chrome. In Safari the Profile tab was empty.
I installed the latest versions of tensorflow-macos and tensorflow-metal on macOS 11.6.
Now, it no longer prints out that it's using metal or my AMD GPU.
% ipython
In [3]: import tensorflow
No supported GPU was found.
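A quick way to confirm whether the plugin registered a device, independent of the startup banner, is to ask TensorFlow directly. This is only a minimal sketch (`visible_gpus` is a name made up for the example); an empty list would mean no GPU device was registered:

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TF cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices reports devices registered by plugins as well.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```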
I installed the latest versions from PyPI into my existing tensorflow-metal virtual environment with:
% pip install tensorflow-macos==2.6.0
% pip install tensorflow-metal==0.2.0
What's changed? Do I need to recreate the tensorflow-metal virtual environment from scratch?
% pip show tensorflow-metal
Name: tensorflow-metal
Version: 0.2.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: wheel, six
Required-by:
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 Top2Vec % pip show tensorflow-macos
Name: tensorflow-macos
Version: 2.6.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
% pip show
tensorboard 2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.5.0
tensorboard-plugin-wit 1.8.0
tensorflow 2.6.0
tensorflow-consciousness 0.1
tensorflow-datasets 4.3.0
tensorflow-determinism 0.3.0
tensorflow-estimator 2.6.0
tensorflow-gan 2.1.0
tensorflow-hub 0.12.0
tensorflow-macos 2.6.0
tensorflow-metadata 1.1.0
tensorflow-metal 0.2.0
tensorflow-probability 0.13.0
tensorflow-similarity 0.13.45
tensorflow-text 2.6.0
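The listing above shows `tensorflow` and `tensorflow-macos` installed side by side in the same environment. Whether that matters here is not established, but it is easy to check programmatically which of the related packages are present. A small sketch using only the standard library (`installed_versions` is a hypothetical helper name):

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["tensorflow", "tensorflow-macos", "tensorflow-metal"]))
```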
This code reproduces the crash:
```
In [2]: import tensorflow as tf
   ...:
   ...: mnist = tf.keras.datasets.mnist
   ...:
   ...: (x_train, y_train), (x_test, y_test) = mnist.load_data()
   ...: x_train, x_test = x_train / 255.0, x_test / 255.0
   ...:
   ...: model = tf.keras.models.Sequential([
   ...:     tf.keras.layers.Flatten(input_shape=(28, 28)),
   ...:     tf.keras.layers.Dense(128, activation='relu'),
   ...:     tf.keras.layers.Dropout(0.2),
   ...:     tf.keras.layers.Dense(10)
   ...: ])
   ...:
   ...: predictions = model(x_train[:1]).numpy()
   ...: tf.nn.softmax(predictions).numpy()
   ...:
   ...: loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
   ...:
   ...: loss_fn(y_train[:1], predictions).numpy()
   ...:
   ...: model.compile(optimizer = 'adam', loss = loss_fn)
   ...: model.fit(x_train, y_train, epochs=100)
Epoch 1/100
2021-10-10 10:50:53.503460: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-10 10:50:53.527 python[25080:3485800] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x6000037975a0
zsh: segmentation fault ipython
```
Also, running WITHOUT Metal (CPU only) is about 4X faster with the 'SGD' optimizer. I can't compare the Adam optimizer since it crashed.
tensorflow_metal (GPU):
% time python test.py
2021-10-10 11:34:34.602604: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 5700 XT
systemMemory: 128.00 GB
maxCacheSize: 7.99 GB
2021-10-10 11:34:34.603850: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-10 11:34:34.604642: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-10 11:34:35.779610: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/100
2021-10-10 11:34:35.929611: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
1875/1875 [==============================] - 7s 3ms/step - loss: 0.7213
Epoch 2/100
1875/1875 [==============================] - 6s 3ms/step - loss: 0.3865
...
Epoch 100/100
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0473
python test.py 721.48s user 375.56s system 173% cpu 10:31.28 total
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 ~ %
tensorflow (CPU):
% time python ~/test.py
2021-10-10 11:45:44.111971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-10 11:45:44.487763: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/100
1875/1875 [==============================] - 1s 460us/step - loss: 0.7210
Epoch 2/100
1875/1875 [==============================] - 1s 459us/step - loss: 0.3874
Epoch 3/100
1875/1875 [==============================] - 1s 459us/step - loss: 0.3233
Epoch 4/100
1875/1875 [==============================] - 1s 460us/step - loss: 0.2884
Epoch 5/100
1875/1875 [==============================] - 1s 471us/step - loss: 0.2608
Epoch 6/100
1875/1875 [==============================] - 1s 462us/step - loss: 0.2400
Epoch 7/100
...
Epoch 99/100
1875/1875 [==============================] - 1s 468us/step - loss: 0.0455
Epoch 100/100
1875/1875 [==============================] - 1s 469us/step - loss: 0.0463
python ~/test.py 181.09s user 48.20s system 246% cpu 1:32.86 total
(ai) davidlaxer@x86_64-apple-darwin13 text %
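The two `time` lines above can be compared directly. The arithmetic below is a small worked example using only the numbers printed in those lines (`to_seconds` is a helper written for this illustration); it suggests the "4X" figure matches the user CPU time, while the wall-clock gap is larger:

```python
def to_seconds(clock):
    """Convert a zsh `time` total such as '10:31.28' or '1:32.86' to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


gpu_wall = to_seconds("10:31.28")    # Metal run, wall-clock total
cpu_wall = to_seconds("1:32.86")     # CPU run, wall-clock total
gpu_user, cpu_user = 721.48, 181.09  # user CPU seconds from the same lines

print(f"wall-clock ratio: {gpu_wall / cpu_wall:.1f}x")  # 6.8x
print(f"user-time ratio:  {gpu_user / cpu_user:.1f}x")  # 4.0x
```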
Virtual Environment
% pip list
Package Version
------------------------ ---------
absl-py 0.12.0
anyio 3.3.2
appnope 0.1.2
argon2-cffi 21.1.0
astunparse 1.6.3
attrs 21.2.0
Babel 2.9.1
backcall 0.2.0
bleach 4.1.0
bokeh 2.3.3
cachetools 4.2.4
certifi 2021.5.30
cffi 1.14.6
charset-normalizer 2.0.6
clang 5.0
cloudpickle 2.0.0
cycler 0.10.0
Cython 0.29.24
debugpy 1.5.0
decorator 5.1.0
defusedxml 0.7.1
dill 0.3.4
distinctipy 1.1.5
dm-tree 0.1.6
dotmap 1.3.24
entrypoints 0.3
flatbuffers 1.12
future 0.18.2
gast 0.4.0
gensim 3.8.3
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
googleapis-common-protos 1.53.0
grpcio 1.41.0
h5py 3.1.0
hdbscan 0.8.27
idna 3.2
importlib-resources 5.2.2
ipykernel 6.4.1
ipython 7.28.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.0
Jinja2 3.0.2
joblib 1.1.0
json5 0.9.6
jsonschema 4.0.1
jupyter-client 7.0.6
jupyter-core 4.8.1
jupyter-server 1.11.1
jupyterlab 3.1.18
jupyterlab-pygments 0.1.2
jupyterlab-server 2.8.2
jupyterlab-widgets 1.0.2
keras 2.6.0
Keras-Preprocessing 1.1.2
kiwisolver 1.3.2
llvmlite 0.37.0
Markdown 3.3.4
MarkupSafe 2.0.1
matplotlib 3.4.3
matplotlib-inline 0.1.3
memory-profiler 0.58.0
mistune 0.8.4
nbclassic 0.3.2
nbclient 0.5.4
nbconvert 6.2.0
nbformat 5.1.3
nest-asyncio 1.5.1
nmslib 2.1.1
notebook 6.4.4
numba 0.54.0
numpy 1.20.3
oauthlib 3.1.1
opt-einsum 3.3.0
packaging 21.0
pandas 1.3.3
pandocfilters 1.5.0
parso 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.3.2
pip 21.2.4
prometheus-client 0.11.0
promise 2.3
prompt-toolkit 3.0.20
protobuf 3.18.1
psutil 5.8.0
ptyprocess 0.7.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pybind11 2.6.1
pycparser 2.20
Pygments 2.10.0
pynndescent 0.5.4
pyparsing 2.4.7
pyrsistent 0.18.0
python-dateutil 2.8.2
pytz 2021.3
PyYAML 5.4.1
pyzmq 22.3.0
requests 2.26.0
requests-oauthlib 1.3.0
requests-unixsocket 0.2.0
rsa 4.7.2
scikit-learn 1.0
scipy 1.7.1
Send2Trash 1.8.0
setuptools 47.1.0
six 1.15.0
smart-open 5.2.1
sniffio 1.2.0
tabulate 0.8.9
tensorboard 2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.6.0
tensorflow-consciousness 0.1
tensorflow-datasets 4.4.0
tensorflow-estimator 2.6.0
tensorflow-gan 2.1.0
tensorflow-hub 0.12.0
tensorflow-macos 2.6.0
tensorflow-metadata 1.2.0
tensorflow-metal 0.2.0
tensorflow-probability 0.14.1
tensorflow-similarity 0.13.45
tensorflow-text 2.6.0
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
threadpoolctl 3.0.0
top2vec 1.0.26
tornado 6.1
tqdm 4.62.3
traitlets 5.1.0
typing-extensions 3.7.4.3
umap-learn 0.5.1
urllib3 1.26.7
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.2.1
Werkzeug 2.0.2
wheel 0.37.0
widgetsnbextension 3.5.1
wordcloud 1.8.1
wrapt 1.12.1
zipp 3.6.0
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 ~ %
On my iMac 27" with Monterey 12.0.1 it crashes with the GPU in tensorflow-metal:
% python muzero.py
2021-10-21 08:36:21.088556: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 5700 XT
systemMemory: 128.00 GB
maxCacheSize: 7.99 GB
2021-10-21 08:36:21.089347: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-21 08:36:21.089966: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-21 08:36:21.753689: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-21 08:36:21.759239: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-21 08:36:34.888 python[14296:730686] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x600001b26220
zsh: segmentation fault python muzero.py
It runs with the CPU.
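One way to stay in the same environment while avoiding the Metal code path is to hide the GPU from TensorFlow before any model is built. This is a sketch of a workaround, not a fix for the crash; `use_cpu_only` is a name invented here, and it assumes it is called at the very start of the script, since TensorFlow refuses to change visible devices once the runtime has been initialized:

```python
def use_cpu_only():
    """Hide all GPUs from TensorFlow; return the remaining device types (None if TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Must run before any op or model is created, otherwise TF raises RuntimeError.
    tf.config.set_visible_devices([], "GPU")
    return [device.device_type for device in tf.config.get_visible_devices()]


print(use_cpu_only())
```

With the GPU hidden, the Adam update runs through TensorFlow's regular CPU kernels rather than the plugin.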
% python --version
Python 3.8.5
% pip freeze
absl-py==0.12.0
anyio==3.3.2
appnope==0.1.2
argon2-cffi==21.1.0
asttokens==2.0.5
astunparse==1.6.3
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
bleach==4.1.0
bokeh==2.3.3
cachetools==4.2.4
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.6
clang==5.0
cloudpickle==2.0.0
colorama==0.4.4
cycler==0.10.0
Cython==0.29.24
debugpy==1.5.0
decorator==5.1.0
defusedxml==0.7.1
dill==0.3.4
distinctipy==1.1.5
dm-tree==0.1.6
dotmap==1.3.24
entrypoints==0.3
executing==0.8.2
flatbuffers==1.12
future==0.18.2
gast==0.4.0
gensim==3.8.3
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
googleapis-common-protos==1.53.0
grpcio==1.41.0
gviz-api==1.9.0
gym==0.21.0
h5py==3.1.0
hdbscan==0.8.27
icecream==2.1.1
idna==3.2
importlib-resources==5.2.2
ipykernel==6.4.1
ipython==7.28.0
ipython-genutils==0.2.0
ipywidgets==7.6.5
jedi==0.18.0
Jinja2==3.0.2
joblib==1.1.0
json5==0.9.6
jsonschema==4.0.1
jupyter-client==7.0.6
jupyter-core==4.8.1
jupyter-server==1.11.1
jupyterlab==3.1.18
jupyterlab-pygments==0.1.2
jupyterlab-server==2.8.2
jupyterlab-widgets==1.0.2
keras==2.6.0
Keras-Preprocessing==1.1.2
kiwisolver==1.3.2
llvmlite==0.37.0
Markdown==3.3.4
MarkupSafe==2.0.1
matplotlib==3.4.3
matplotlib-inline==0.1.3
memory-profiler==0.58.0
mistune==0.8.4
nbclassic==0.3.2
nbclient==0.5.4
nbconvert==6.2.0
nbformat==5.1.3
nest-asyncio==1.5.1
nmslib==2.1.1
notebook==6.4.4
numba==0.54.0
numpy==1.20.3
oauthlib==3.1.1
opt-einsum==3.3.0
packaging==21.0
pandas==1.3.3
pandocfilters==1.5.0
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.2
prometheus-client==0.11.0
promise==2.3
prompt-toolkit==3.0.20
protobuf==3.18.1
psutil==5.8.0
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.6.1
pycparser==2.20
Pygments==2.10.0
pynndescent==0.5.4
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.3
PyYAML==5.4.1
pyzmq==22.3.0
requests==2.26.0
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
rsa==4.7.2
scikit-learn==1.0
scipy==1.7.1
Send2Trash==1.8.0
six==1.15.0
smart-open==5.2.1
sniffio==1.2.0
tabulate==0.8.9
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-profile==2.5.0
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.0
tensorflow-consciousness==0.1
tensorflow-datasets==4.4.0
tensorflow-estimator==2.6.0
tensorflow-gan==2.1.0
tensorflow-hub==0.12.0
tensorflow-macos==2.6.0
tensorflow-metadata==1.2.0
tensorflow-metal==0.2.0
tensorflow-probability==0.14.1
tensorflow-similarity==0.13.45
tensorflow-text==2.6.0
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
threadpoolctl==3.0.0
top2vec==1.0.26
tornado==6.1
tqdm==4.62.3
traitlets==5.1.0
typing-extensions==3.7.4.3
umap-learn==0.5.1
urllib3==1.26.7
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.1
Werkzeug==2.0.2
widgetsnbextension==3.5.1
wordcloud==1.8.1
wrapt==1.12.1
zipp==3.6.0
Hi,
Your example runs for me on Monterey 12.0.1 with Python 3.8 ... if I replace the Adam optimizer with SGD.
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.SGD(0.001),
    metrics=['accuracy'],
)
I've noticed that Adam crashes the session.
Metal device set to: AMD Radeon Pro 5700 XT
2021-10-25 12:01:51.733970: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-25 12:01:51.734526: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-25 12:01:51.734764: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-25 12:01:51.902618: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-25 12:01:51.902647: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-25 12:01:52.021880: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.035650: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.081019: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.099696: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.211089: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.229341: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.237014: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.261855: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.279544: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-25 12:01:52.304527: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-25 12:01:52.324218: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 2.2622 - accuracy: 0.1993
/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/keras/engine/training.py:2470: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
2021-10-25 12:02:06.665054: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
469/469 [==============================] - 15s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 2.2622 - accuracy: 0.1993 - val_loss: 2.2074 - val_accuracy: 0.4665
Epoch 2/12
469/469 [==============================] - 11s 20ms/step - batch: 234.0000 - size: 1.0000 - loss: 2.1208 - accuracy: 0.3812 - val_loss: 1.9072 - val_accuracy: 0.6792
Epoch 3/12
469/469 [==============================] - 11s 21ms/step - batch: 234.0000 - size: 1.0000 - loss: 1.6169 - accuracy: 0.5601 - val_loss: 1.0289 - val_accuracy: 0.8151
Epoch 4/12
469/469 [==============================] - 12s 22ms/step - batch: 234.0000 - size: 1.0000 - loss: 1.0248 - accuracy: 0.6935 - val_loss: 0.5984 - val_accuracy: 0.8613
Epoch 5/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.7831 - accuracy: 0.7570 - val_loss: 0.4718 - val_accuracy: 0.8799
Epoch 6/12
469/469 [==============================] - 13s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.6629 - accuracy: 0.7937 - val_loss: 0.4055 - val_accuracy: 0.8929
Epoch 7/12
469/469 [==============================] - 13s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.6024 - accuracy: 0.8123 - val_loss: 0.3660 - val_accuracy: 0.9007
Epoch 8/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.5541 - accuracy: 0.8301 - val_loss: 0.3380 - val_accuracy: 0.9073
Epoch 9/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.5244 - accuracy: 0.8397 - val_loss: 0.3181 - val_accuracy: 0.9121
Epoch 10/12
469/469 [==============================] - 13s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.4910 - accuracy: 0.8500 - val_loss: 0.2988 - val_accuracy: 0.9161
Epoch 11/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.4683 - accuracy: 0.8570 - val_loss: 0.2857 - val_accuracy: 0.9186
Epoch 12/12
469/469 [==============================] - 14s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.4562 - accuracy: 0.8600 - val_loss: 0.2736 - val_accuracy: 0.9207
[1]:
<keras.callbacks.History at 0x7f8758fd9310>
Try changing
optimizer='adam'
to
optimizer='SGD'
This code crashes with the 'adam' optimizer. It does work with 'SGD'.
I am running Monterey 12.1 beta, and the latest versions of tensorflow-macos and tensorflow-metal from PyPI.
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
predictions = model(x_train[:1]).numpy()
tf.nn.softmax(predictions).numpy()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer = 'adam', loss = loss_fn)
model.fit(x_train, y_train, epochs=100)
The AdamOptimizer is still causing crashes with:
tensorflow-metal 0.3.0
tensorflow-macos 2.7.0
The exception is raised while building a list of document vectors from input documents that were not part of model training, e.g.:
document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
The Python 3.8 process grows in memory to 100 GB and then raises the OOM exception.
import numpy as np

def _embed_documents(self, train_corpus):
    self._check_import_status()
    self._check_model_status()

    # embed documents in small batches
    batch_size = 5
    document_vectors = []
    current = 0
    batches = len(train_corpus) // batch_size
    extra = len(train_corpus) % batch_size

    for ind in range(0, batches):
        try:
            # this is the call that raises the exception
            document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
        except Exception as e:
            print(e.__doc__)
            print(e)  # Python 3 exceptions have no .message attribute
        current += batch_size

    # embed the remaining documents that do not fill a whole batch
    if extra > 0:
        document_vectors.append(self.embed(train_corpus[current:current + extra]))

    document_vectors = self._l2_normalize(np.array(np.vstack(document_vectors)))
    return document_vectors
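The batching in `_embed_documents` (full batches plus a remainder) can also be written as a single loop, which makes it easier to log or test one batch at a time. This is only a sketch of the batching logic and does not address the memory growth itself; `iter_batches`, `embed_all`, and `embed_fn` are hypothetical names standing in for the real `self.embed` call:

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items, including the remainder."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(corpus, embed_fn, batch_size=5):
    """Apply embed_fn to each batch and collect the per-document results in order."""
    vectors = []
    for batch in iter_batches(corpus, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


docs = [f"doc {i}" for i in range(12)]
sizes = [len(batch) for batch in iter_batches(docs, 5)]
print(sizes)  # [5, 5, 2]
```

Swapping a cheap stand-in for `embed_fn` (for example one that returns the length of each string) is a simple way to check whether the memory growth comes from the loop or from the embedding call.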
Which optimizer are you using? If 'Adam' try 'SGD'.