https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac
Does this feature support AMD GPUs with Metal or only M1 support?
Does v1.12 nightly build support the Apple Metal only with source?
Post
Replies
Boosts
Views
Activity
I am running tensorflow-macos and tensorflow-metal version 2.6 on Monterey Beta (21A5543b) on an iMac 27" 2021 with an AMD Radeon GPU.
I got the following error training the model VariationalDeepSemanticHashing
e.g.
2021-10-09 13:05:14.521286: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-09 13:05:27.092823: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-09 13:05:27.153 python[6315:1459657] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x600000eede10
[I 2021-10-09 13:05:28.157 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
kernel d25e6066-74f7-4b4a-b5e7-b2911e7501d9 restarted
https://github.com/unsuthee/VariationalDeepSemanticHashing/blob/master/Run_Experiment_Unsupervised.ipynb
Here's the repository:
https://github.com/unsuthee/VariationalDeepSemanticHashing
[I am running Top2vec on Big Sur 11.6 with tensorflow-macos and tensorflow-metal.
Python crashed ...
linkText
Crashed Thread: 0 Dispatch queue: metal gpu stream
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
/System/Volumes/Data/SWE/macOS/BuildRoots/38cf1d983f/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-124.6.1/MPSCore/Utility/MPSCommandBufferImageCache.mm:1386: failed assertion `Failed to allocate private MTLBuffer for size 421888000
Crash Log
Will Metal Performance Shaders Framework ('MPS') support Float64?
https://developer.apple.com/documentation/metalperformanceshaders
Training Top2vec with embedding_batch_size=256 crashed OS X 12.3.1
tensorflow_macos 2.8.0, tensorflow_metal 0.4.0
Anaconda Python 3.8.5
% pip show tensorflow_macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by:
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_metal
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-metal
Version: 0.4.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by:
To train the model with embedding_model="universal-sentence-encoder", you'll have to download universal-sentence-encoder_4.
top2vec_trained = Top2Vec(documents=papers_filtered_df.text.tolist(), split_documents=True, **embedding_batch_size=256,** embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4", workers=8)
Here's the project:
https://github.com/ddangelov/Top2Vec
Here's the Jupyter notebook:
https://github.com/ddangelov/Top2Vec/blob/master/notebooks/CORD-19_top2vec.ipynb
You'll have to load the COVID-19 data set from Kaggle here:
https://www.kaggle.com/datasets/allen-institute-for-ai/CORD-19-research-challenge
I set filter size to 1,000:
def filter_short(papers_df):
papers_df["token_counts"] = papers_df["text"].str.split().map(len)
papers_df = **papers_df[papers_df.token_counts>1000].reset_index(drop=True)**
papers_df.drop('token_counts', axis=1, inplace=True)
return papers_df
Trace
panic(cpu 8 caller 0xffffff8020d449ad): userspace watchdog timeout: no successful checkins from WindowServer in 120 seconds
service: logd, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 seconds ago
service: WindowServer, total successful checkins since wake (127621 seconds ago): 12751, last successful checkin: 120 seconds ago
service: remoted, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0
[Trace](https://developer.apple.com/forums/content/attachment/d17c2c9b-569b-4c1a-9c61-892ced7f785b)
I am trying to build AI-Feynman
https://github.com/SJ001/AI-Feynman
OS X 12.4
XCode 13.4
% xcode-select -p
/Library/Developer/CommandLineTools
% which gfortran
/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/gfortran
(AI-Feynman) davidlaxer@x86_64-apple-darwin13 AI-Feynman % gfortran --version
GNU Fortran (GCC) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
I am getting this error trying in link AI-Feynman:
/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/x86_64-apple-darwin13.4.0-gfortran -Wall -g -arch x86_64 -Wall -g -undefined dynamic_lookup -bundle build/temp.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/aifeynman/_symbolic_regress1module.o build/temp.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/aifeynman/fortranobject.o build/temp.macosx-10.9-x86_64-3.9/aifeynman/symbolic_regress1.o build/temp.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/aifeynman/_symbolic_regress1-f2pywrappers.o -L/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/../lib/gcc/x86_64-apple-darwin11.4.2/4.8.5 -L/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/../lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/../../.. -L/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/../lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/../../.. -lgfortran -o aifeynman/_symbolic_regress1.cpython-39-darwin.so
ld: unsupported tapi file type '!tapi-tbd' in YAML file '/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib/libSystem.tbd' for architecture x86_64
collect2: error: ld returned 1 exit status
error: Command "/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/x86_64-apple-darwin13.4.0-gfortran -Wall -g -arch x86_64 -Wall -g -undefined dynamic_lookup -bundle build/temp.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/aifeynman/_symbolic_regress1module.o build/temp.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/aifeynman/fortranobject.o build/temp.macosx-10.9-x86_64-3.9/aifeynman/symbolic_regress1.o build/temp.macosx-10.9-x86_64-3.9/build/src.macosx-10.9-x86_64-3.9/aifeynman/_symbolic_regress1-f2pywrappers.o -L/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/../lib/gcc/x86_64-apple-darwin11.4.2/4.8.5 -L/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/../lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/../../.. -L/Users/davidlaxer/anaconda3/envs/AI-Feynman/bin/../lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/../../.. -lgfortran -o aifeynman/_symbolic_regress1.cpython-39-darwin.so" failed with exit status 1
% find /Applications/Xcode.app -name libSystem.tbd -ls
227714500 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/AppleTVOS.platform/Developer/SDKs/AppleTVOS.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
227847317 64 -rw-r--r-- 1 root wheel 170869 Feb 21 20:37 /Applications/Xcode.app/Contents/Developer/Platforms/DriverKit.platform/Developer/SDKs/DriverKit.sdk/System/DriverKit/usr/lib/libSystem.tbd
228117342 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
find: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/private/var/mobile: Permission denied
227900857 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/WatchOS.platform/Developer/SDKs/WatchOS.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
227872436 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
228110285 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/WatchSimulator.platform/Developer/SDKs/WatchSimulator.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
227845798 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/AppleTVSimulator.platform/Developer/SDKs/AppleTVSimulator.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
228409847 0 lrwxr-xr-x 1 root wheel 15 Jul 29 2021 /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator.sdk/usr/lib/libSystem.tbd -> libSystem.B.tbd
(AI-Feynman) davidlaxer@x86_64-apple-darwin13 AI-Feynman % diff /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib/libSystem.tbd /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libSystem.tbd
I got this error building PyTorch 1.12 on OS X Monterey 12.4, XCode 13.4 in the linker:
ld: in lib/libnnpack.a(conv1x1.py.o), section __TEXT/__const address out of range for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
% clang --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
% lipo -info ./build/lib/*.a
Non-fat file: ./build/lib/libXNNPACK.a is architecture: x86_64
Non-fat file: ./build/lib/libasmjit.a is architecture: x86_64
Non-fat file: ./build/lib/libbenchmark.a is architecture: x86_64
Non-fat file: ./build/lib/libbenchmark_main.a is architecture: x86_64
Non-fat file: ./build/lib/libcaffe2_protos.a is architecture: x86_64
Non-fat file: ./build/lib/libclog.a is architecture: x86_64
Non-fat file: ./build/lib/libcpuinfo.a is architecture: x86_64
Non-fat file: ./build/lib/libcpuinfo_internals.a is architecture: x86_64
Non-fat file: ./build/lib/libdnnl.a is architecture: x86_64
Non-fat file: ./build/lib/libfbgemm.a is architecture: x86_64
Non-fat file: ./build/lib/libfmt.a is architecture: x86_64
Non-fat file: ./build/lib/libfoxi_loader.a is architecture: x86_64
Non-fat file: ./build/lib/libgmock.a is architecture: x86_64
Non-fat file: ./build/lib/libgmock_main.a is architecture: x86_64
Non-fat file: ./build/lib/libgtest.a is architecture: x86_64
Non-fat file: ./build/lib/libgtest_main.a is architecture: x86_64
Non-fat file: ./build/lib/libkineto.a is architecture: x86_64
Non-fat file: ./build/lib/libnnpack.a is architecture: x86_64
Non-fat file: ./build/lib/libnnpack_reference_layers.a is architecture: x86_64
Non-fat file: ./build/lib/libonnx.a is architecture: x86_64
Non-fat file: ./build/lib/libonnx_proto.a is architecture: x86_64
Non-fat file: ./build/lib/libprotobuf-lite.a is architecture: x86_64
Non-fat file: ./build/lib/libprotobuf.a is architecture: x86_64
Non-fat file: ./build/lib/libprotoc.a is architecture: x86_64
Non-fat file: ./build/lib/libpthreadpool.a is architecture: x86_64
Non-fat file: ./build/lib/libpytorch_qnnpack.a is architecture: x86_64
Non-fat file: ./build/lib/libqnnpack.a is architecture: x86_64
(base) davidlaxer@x86_64-apple-darwin13 pytorch
linkText
linkText
Any suggestions?
I am running on an iMac 27" 2020 with an AMD Radeon Pro 5700 XT GPU.
% pip show tensorflow-macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by:
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 deep-learning-with-python-notebooks % pip show tensorflow-metal
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-metal
Version: 0.4.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by:
Here's the code in a Jupyter Notebook:
https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter11_part02_sequence-models.ipynb
Monterey crashed in Epoch 1 of training:
callbacks = [
keras.callbacks.ModelCheckpoint("one_hot_bidir_lstm.keras",
save_best_only=True)
]
model.fit(int_train_ds, validation_data=int_val_ds, epochs=10, callbacks=callbacks)
model = keras.models.load_model("one_hot_bidir_lstm.keras")
print(f"Test acc: {model.evaluate(int_test_ds)[1]:.3f}")
Epoch 1/10
2022-04-01 08:55:17.903921: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-04-01 08:55:18.427409: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-04-01 08:55:18.463345: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-04-01 08:55:32.254473: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-04-01 08:55:32.276648: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
3/625 [..............................] - ETA: 2:03:10 - loss: nan - accuracy: 0.5000
Kernel Panic
I am running a recurrent neural network example on an iMac 27" with an AMD Radeon Pro 5700 XT on OS X 12.3.
The code runs, the GPU was initialized the GPU is not active.
Each epoch takes:
819/819 [==============================] - 32417s 40s/step - loss: 121.7538 - mae: 8.9641 - val_loss: 100.3145 - val_mae: 8.0313
% python chapter-10.py
['"Date Time"', '"p (mbar)"', '"T (degC)"', '"Tpot (K)"', '"Tdew (degC)"', '"rh (%)"', '"VPmax (mbar)"', '"VPact (mbar)"', '"VPdef (mbar)"', '"sh (g/kg)"', '"H2OC (mmol/mol)"', '"rho (g/m**3)"', '"wv (m/s)"', '"max. wv (m/s)"', '"wd (deg)"']
420451
num_train_samples: 210225
num_val_samples: 105112
num_test_samples: 105114
2022-03-28 18:28:59.988516: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 5700 XT
systemMemory: 128.00 GB
maxCacheSize: 7.99 GB
2022-03-28 18:28:59.989242: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-03-28 18:28:59.989616: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Epoch 1/50
2022-03-28 18:29:02.342296: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
819/819 [==============================] - ETA: 0s - loss: 121.7538 - mae: 8.9641 2022-03-29 03:21:02.092397: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
819/819 [==============================] - 32417s 40s/step - loss: 121.7538 - mae: 8.9641 - val_loss: 100.3145 - val_mae: 8.0313
Epoch 2/50
381/819 [============>.................] - ETA: 4:50:31 - loss: 93.7597 - mae: 7.6880
from tensorflow import keras
from tensorflow.keras import layers
num_features = 14
steps = 120
import os
fname = os.path.join("jena_climate_2009_2016.csv")
with open(fname) as f:
data = f.read()
lines = data.split("\n")
header = lines[0].split(",")
lines = lines[1:]
print(header)
print(len(lines))
import numpy as np
temperature = np.zeros((len(lines),))
raw_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
values = [float(x) for x in line.split(",")[1:]]
temperature[i] = values[1]
raw_data[i, :] = values[:]
sampling_rate = 6
sequence_length = 120
delay = sampling_rate * (sequence_length + 24 - 1)
batch_size = 256
num_train_samples = int(0.5 * len(raw_data))
num_val_samples = int(0.25 * len(raw_data))
num_test_samples = len(raw_data) - num_train_samples - num_val_samples
print("num_train_samples:", num_train_samples)
print("num_val_samples:", num_val_samples)
print("num_test_samples:", num_test_samples)
train_dataset = keras.utils.timeseries_dataset_from_array(
raw_data[:-delay],
targets=temperature[delay:],
sampling_rate=sampling_rate,
sequence_length=sequence_length,
shuffle=True,
batch_size=batch_size,
start_index=0,
end_index=num_train_samples)
val_dataset = keras.utils.timeseries_dataset_from_array(
raw_data[:-delay],
targets=temperature[delay:],
sampling_rate=sampling_rate,
sequence_length=sequence_length,
shuffle=True,
batch_size=batch_size,
start_index=num_train_samples,
end_index=num_train_samples + num_val_samples)
test_dataset = keras.utils.timeseries_dataset_from_array(
raw_data[:-delay],
targets=temperature[delay:],
sampling_rate=sampling_rate,
sequence_length=sequence_length,
shuffle=True,
batch_size=batch_size,
start_index=num_train_samples + num_val_samples)
inputs = keras.Input(shape=(steps, num_features))
x = layers.SimpleRNN(16, return_sequences=True)(inputs)
x = layers.SimpleRNN(16, return_sequences=True)(x)
outputs = layers.SimpleRNN(16)(x)
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = layers.LSTM(32, recurrent_dropout=0.25)(inputs)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [
keras.callbacks.ModelCheckpoint("jena_lstm_dropout.keras",
save_best_only=True)
]
model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(train_dataset,
epochs=50,
validation_data=val_dataset,
callbacks=callbacks)
Any idea why the GPU is not active?
How does this code example run on an M1 Ultra?
Will tensorflow_text be supported with Tensorflow_macos & Tensorflow_metal?
E.g.
https://github.com/tensorflow/text/issues/654
I am using a 2022 Intel based iMac 27" with an AMD Radeon Pro GPU
running OS X Monterey 12.3 Beta.
I am working with Francois Chollet's GAN example:
https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter12_part05_gans.ipynb
Here is my tensorflow-metal environment:
Package Version Editable project location
---------------------------- ---------- ----------------------------
...
Send2Trash 1.8.0
sentence-transformers 2.1.0
sentencepiece 0.1.96
setuptools 47.1.0
shap 0.40.0
six 1.15.0
slicer 0.0.7
smart-open 5.2.1
sniffio 1.2.0
soupsieve 2.3.1
sympy 1.9
tables 3.6.1
tabulate 0.8.9
tbb 2021.5.0
tbb-devel 2021.5.0
tenacity 8.0.1
tensorboard 2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.5.0
tensorboard-plugin-wit 1.8.0
tensorflow 2.6.0
tensorflow-addons 0.14.0
tensorflow-consciousness 0.1
tensorflow-datasets 4.4.0
tensorflow-estimator 2.7.0
tensorflow-gan 2.1.0
tensorflow-hub 0.12.0
tensorflow-io-gcs-filesystem 0.22.0
tensorflow-macos 2.7.0
tensorflow-metadata 1.2.0
tensorflow-metal 0.3.0
tensorflow-probability 0.14.1
tensorflow-similarity 0.13.45
tensorflow-text 2.7.3
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
threadpoolctl 3.0.0
tokenizers 0.10.3
toml 0.10.2
top2vec 1.0.26
torch 1.10.1
torchvision 0.11.2
tornado 6.1
tqdm 4.62.3
traitlets 5.1.0
transformers 4.11.3
typeguard 2.13.0
typing-extensions 3.7.4.3
umap-learn 0.5.1
urllib3 1.26.7
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.2.1
Werkzeug 2.0.2
wheel 0.37.0
widgetsnbextension 4.0.0b0
wordcloud 1.8.1
wrapt 1.12.1
xgboost 1.5.1
zipp 3.6.0
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 ~ %
The Keras Adam optimizer crashes tensorflow.
The SGD optimzer runs but the GAN generates a single color block image.
The Adagrad optimizer runs but also generates a one color block image.
Here is output:
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-04 07:36:35.810367: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-02-04 07:36:35.810606: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: AMD Radeon Pro 5700 XT
2022-02-04 07:36:38.833846: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Example output image:
The code ran for 4 days and completed with no errors, however, the images were always single color blocks.
I tried running with the Adam optimizer on the CPU but it was too slow.
I am getting a segmentation violation in Tensorflow running the Top2vec model.
https://github.com/ddangelov/Top2Vec
Here's my code snippet:
import sys
import faulthandler
import numpy as np
import pandas as pd
import json
import os
import ipywidgets as widgets
from IPython.display import clear_output, display
from top2vec import Top2Vec
faulthandler.enable(file=sys.stderr, all_threads=True)
def filter_short(papers_df):
papers_df["token_counts"] = papers_df["text"].str.split().map(len)
papers_df = papers_df[papers_df.token_counts > 50].reset_index(drop=True)
papers_df.drop('token_counts', axis=1, inplace=True)
return papers_df
papers_prepared_df = pd.read_feather("/Users/davidlaxer/Downloads/archive/covid19_papers_processed.feather")
papers_feathered_filtered_df = filter_short(papers_prepared_df)
top2vec_trained = Top2Vec(documents=papers_feathered_filtered_df.text.tolist(), embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4/", workers=4)
Here's the stack trace:
/Users/davidlaxer/tensorflow-metal/bin/python /Users/davidlaxer/Top2Vec/notebooks/test1.py
2021-12-24 08:50:04,782 - top2vec - INFO - Pre-processing documents for training
/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
warnings.warn(msg, category=FutureWarning)
2021-12-24 08:51:37,701 - top2vec - INFO - Loading universal-sentence-encoder model at /Users/davidlaxer/Downloads/universal-sentence-encoder_4/
2021-12-24 08:51:37.837087: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-24 08:51:37.838170: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-24 08:51:37.838380: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: AMD Radeon Pro 5700 XT
systemMemory: 128.00 GB
maxCacheSize: 7.99 GB
2021-12-24 08:51:39.764518: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-12-24 08:51:41,157 - top2vec - INFO - Creating joint document/word embedding
2021-12-24 08:51:41.241496: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Fatal Python error: Segmentation fault
Thread 0x000000010c32a600 (most recent call first):
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 58 in quick_execute
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 598 in call
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1959 in _call_flat
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3130 in __call__
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 949 in _call
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 910 in __call__
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150 in error_handler
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 701 in _call_attribute
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 538 in _embed_documents
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 344 in __init__
File "/Users/davidlaxer/Top2Vec/notebooks/test1.py", line 24 in <module>
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
This is likely related to:
https://github.com/ddangelov/Top2Vec/issues/232
However, SIGSEGV indicates some other problem(s).
% python --version
Python 3.8.5
% pip show tensorflow_macos
Name: tensorflow-macos
Version: 2.7.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wheel, wrapt
Required-by:
% pip show tensorflow_metal
Name: tensorflow-metal
Version: 0.3.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by:
In a tensorflow-metal virtual environment on OS X 12.1:
tensorboard 2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.5.0
tensorboard-plugin-wit 1.8.0
tensorflow 2.6.0
tensorflow-addons 0.14.0
tensorflow-consciousness 0.1
tensorflow-datasets 4.4.0
tensorflow-estimator 2.7.0
tensorflow-gan 2.1.0
tensorflow-hub 0.12.0
tensorflow-io-gcs-filesystem 0.22.0
tensorflow-macos 2.7.0
tensorflow-metadata 1.2.0
tensorflow-metal 0.3.0
tensorflow-probability 0.14.1
tensorflow-similarity 0.13.45
tensorflow-text 2.7.3
Running the Top2vec model:
https://github.com/ddangelov/Top2Vec
import numpy as np
import pandas as pd
import json
import os
import ipywidgets as widgets
from IPython.display import clear_output, display
from top2vec import Top2Vec
papers_prepared_df = pd.read_feather("/Users/davidlaxer/Downloads/archive/covid19_papers_processed.feather")
top2vec_trained = Top2Vec(documents=papers_prepared_df.text.tolist(), embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4/", workers=4)
2021-12-20 06:30:52,188 - top2vec - INFO - Pre-processing documents for training
/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
warnings.warn(msg, category=FutureWarning)
2021-12-20 06:31:57,351 - top2vec - INFO - Loading universal-sentence-encoder model at /Users/davidlaxer/Downloads/universal-sentence-encoder_4
2021-12-20 06:31:57.488459: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-20 06:31:57.489288: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-20 06:31:57.489490: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: AMD Radeon Pro 5700 XT
2021-12-20 06:31:59.447260: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-12-20 06:32:00,841 - top2vec - INFO - Creating joint document/word embedding
2021-12-20 06:32:00.923838: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Some resource has been exhausted.
For example, this error might be raised if a per-user quota is
exhausted, or perhaps the entire file system is out of space.
@@__init__
2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[114389,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
[[{{node EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[StatefulPartitionedCall/StatefulPartitionedCall/EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/Reshape_1/_188]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[114389,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
[[{{node EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
...
I tried adjusting the batchsize (e.g - 500, 100, 50, 10, 5).
I measured a significant performance difference running the 'keras-io' example 'text_extraction_with_bert.ipynb' on Google Colab and my tensorflow_metal GPU (AMD Radeon Pro 5700 XT).
Google Colab Pro w/TPU finished 3 epochs in 11 minutes, while tensorflow_metal ran for many hours for 1 epoch.
So, I tried to profile the model in both environments.
I was able to profile text_extraction_with_bert.ipynb on Google Colab Pro, but not on tensorflow_metal.
My Mac has 128gb ... the OOM exception happened when the Python 3.8 process got to ~85GB.
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[256,384,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
[[node model/tf_bert_model/bert/encoder/layer_._6/intermediate/Gelu/add (defined at Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/transformers/models/bert/modeling_tf_bert.py:354) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[model/tf_bert_model/bert/encoder/layer_._7/attention/output/dense/Tensordot/Prod/_632]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: OOM when allocating tensor with shape[256,384,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
[[node model/tf_bert_model/bert/encoder/layer_._6/intermediate/Gelu/add (defined at Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/transformers/models/bert/modeling_tf_bert.py:354) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_24748]
Function call stack:
train_function -> train_function
Here's the model:
https://github.com/keras-team/keras-io/blob/master/examples/nlp/ipynb/text_extraction_with_bert.ipynb
I have installed tensorflow-macos and tensorflow-metal on Big Sur on a iMac 27" with AMD Radeon Pro 5700 XT.
I am trying to run Keras code from Francios Challet's Deep Learning example:
E.g Chapter 11-part04_sequence-to-Sequence
https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter11_part04_sequence-to-sequence-learning.ipynb
seq2seq_rnn.compile(
optimizer="rmsprop",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"])
seq2seq_rnn.fit(train_ds, epochs=15, validation_data=val_ds)
2021-07-15 13:17:00.117869: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-15 13:17:01.403133: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at partitioned_function_ops.cc:114 : Invalid argument: No OpKernel was registered to support Op 'CudnnRNNV3' used by {{node cond_41/then/_0/cond/CudnnRNNV3}} with these attrs: [T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", seed2=0, is_training=true, num_proj=0, time_major=false, seed=0, dropout=0]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[cond_41/then/_0/cond/CudnnRNNV3]]
2021-07-15 13:17:01.419061: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at partitioned_function_ops.cc:114 : Invalid argument: No OpKernel was registered to support Op 'CudnnRNNV3' used by {{node cond_41/then/_0/cond/CudnnRNNV3}} with these attrs: [time_major=false, dropout=0, seed=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", seed2=0, is_training=true, num_proj=0]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[cond_41/then/_0/cond/CudnnRNNV3]]
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/var/folders/3n/56fpv14n4wj0c1l1sb106pzw0000gn/T/ipykernel_94493/3093225856.py in <module>
3 loss="sparse_categorical_crossentropy",
4 metrics=["accuracy"])
----> 5 seq2seq_rnn.fit(train_ds, epochs=15, validation_data=val_ds)
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1181 _r=1):
1182 callbacks.on_train_batch_begin(step)
-> 1183 tmp_logs = self.train_function(iterator)
1184 if data_handler.should_sync:
1185 context.async_wait()
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
887
888 with OptionalXlaContext(self._jit_compile):
--> 889 result = self._call(*args, **kwds)
890
891 new_tracing_count = self.experimental_get_tracing_count()
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
948 # Lifting succeeded, so variables are initialized and we can run the
949 # stateless function.
--> 950 return self._stateless_fn(*args, **kwds)
951 else:
952 _, _, _, filtered_flat_args = \
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
3021 (graph_function,
3022 filtered_flat_args) = self._maybe_define_function(args, kwargs)
-> 3023 return graph_function._call_flat(
3024 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
3025
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1958 and executing_eagerly):
1959 # No tape is watching; skip to running the function.
-> 1960 return self._build_call_outputs(self._inference_function.call(
1961 ctx, args, cancellation_manager=cancellation_manager))
1962 forward_backward = self._select_forward_and_backward_functions(
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
589 with _InterpolateFunctionError(self):
590 if cancellation_manager is None:
--> 591 outputs = execute.execute(
592 str(self.signature.name),
593 num_outputs=self._num_outputs,
~/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: No OpKernel was registered to support Op 'CudnnRNNV3' used by {{node cond_41/then/_0/cond/CudnnRNNV3}} with these attrs: [T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", seed2=0, is_training=true, num_proj=0, time_major=false, seed=0, dropout=0]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[cond_41/then/_0/cond/CudnnRNNV3]]
[[model/bidirectional/backward_gru/PartitionedCall]]
[[broadcast_weights_1/assert_broadcastable/is_valid_shape/else/_1/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/then/_53/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat/_66]]
(1) Invalid argument: No OpKernel was registered to support Op 'CudnnRNNV3' used by {{node cond_41/then/_0/cond/CudnnRNNV3}} with these attrs: [T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", seed2=0, is_training=true, num_proj=0, time_major=false, seed=0, dropout=0]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[cond_41/then/_0/cond/CudnnRNNV3]]
[[model/bidirectional/backward_gru/PartitionedCall]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_520769]
Function call stack:
train_function -> train_function