I am getting a segmentation violation in Tensorflow running the Top2vec model.
https://github.com/ddangelov/Top2Vec
Here's my code snippet:
import sys
import faulthandler
import numpy as np
import pandas as pd
import json
import os
import ipywidgets as widgets
from IPython.display import clear_output, display
from top2vec import Top2Vec
faulthandler.enable(file=sys.stderr, all_threads=True)
def filter_short(papers_df):
papers_df["token_counts"] = papers_df["text"].str.split().map(len)
papers_df = papers_df[papers_df.token_counts > 50].reset_index(drop=True)
papers_df.drop('token_counts', axis=1, inplace=True)
return papers_df
papers_prepared_df = pd.read_feather("/Users/davidlaxer/Downloads/archive/covid19_papers_processed.feather")
papers_feathered_filtered_df = filter_short(papers_prepared_df)
top2vec_trained = Top2Vec(documents=papers_feathered_filtered_df.text.tolist(), embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4/", workers=4)
Here's the stack trace:
/Users/davidlaxer/tensorflow-metal/bin/python /Users/davidlaxer/Top2Vec/notebooks/test1.py
2021-12-24 08:50:04,782 - top2vec - INFO - Pre-processing documents for training
/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
warnings.warn(msg, category=FutureWarning)
2021-12-24 08:51:37,701 - top2vec - INFO - Loading universal-sentence-encoder model at /Users/davidlaxer/Downloads/universal-sentence-encoder_4/
2021-12-24 08:51:37.837087: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-24 08:51:37.838170: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-24 08:51:37.838380: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: AMD Radeon Pro 5700 XT
systemMemory: 128.00 GB
maxCacheSize: 7.99 GB
2021-12-24 08:51:39.764518: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-12-24 08:51:41,157 - top2vec - INFO - Creating joint document/word embedding
2021-12-24 08:51:41.241496: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Fatal Python error: Segmentation fault
Thread 0x000000010c32a600 (most recent call first):
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 58 in quick_execute
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 598 in call
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1959 in _call_flat
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3130 in __call__
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 949 in _call
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 910 in __call__
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150 in error_handler
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 701 in _call_attribute
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 538 in _embed_documents
File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 344 in __init__
File "/Users/davidlaxer/Top2Vec/notebooks/test1.py", line 24 in <module>
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
This is likely related to: https://github.com/ddangelov/Top2Vec/issues/232
However, SIGSEGV indicates some other problem(s).
% python --version
Python 3.8.5
% pip show tensorflow_macos
Name: tensorflow-macos
Version: 2.7.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wheel, wrapt
Required-by:
% pip show tensorflow_metal
Name: tensorflow-metal
Version: 0.3.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by: