Training Top2vec Model Crashed OS X 12.3.1

Training Top2vec with embedding_batch_size=256 crashed OS X 12.3.1

tensorflow_macos 2.8.0, tensorflow_metal 0.4.0, Anaconda Python 3.8.5

% pip show tensorflow_macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by: 
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_metal
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-metal
Version: 0.4.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author: 
Author-email: 
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by: 

To train the model with embedding_model="universal-sentence-encoder", you'll have to download universal-sentence-encoder_4.

top2vec_trained = Top2Vec(documents=papers_filtered_df.text.tolist(), split_documents=True, embedding_batch_size=256, embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4", workers=8)
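For context, embedding_batch_size sets how many documents are embedded per batch. Here is a minimal sketch of that kind of batching, assuming a plain list of documents. The helper name `iter_batches` and the placeholder documents are hypothetical, and this is not Top2Vec's actual implementation:

```python
from typing import Iterator, List


def iter_batches(documents: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size documents (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# 1,000 placeholder documents split with the batch size from this report.
docs = [f"document {i}" for i in range(1000)]
batch_sizes = [len(batch) for batch in iter_batches(docs, 256)]
print(batch_sizes)  # [256, 256, 256, 232]
```

A larger batch size means fewer but bigger calls into the embedding model, which is why the value is called out in this report.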

Here's the project:

https://github.com/ddangelov/Top2Vec

Here's the Jupyter notebook:

https://github.com/ddangelov/Top2Vec/blob/master/notebooks/CORD-19_top2vec.ipynb

You'll have to load the COVID-19 data set from Kaggle here:

https://www.kaggle.com/datasets/allen-institute-for-ai/CORD-19-research-challenge

I set the filter threshold to 1,000 tokens, so only papers with more than 1,000 whitespace-separated tokens are kept:

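def filter_short(papers_df):
    # Count whitespace-separated tokens in each paper's text.
    papers_df["token_counts"] = papers_df["text"].str.split().map(len)
    # Keep only papers with more than 1,000 tokens.
    papers_df = papers_df[papers_df.token_counts > 1000].reset_index(drop=True)
    # Drop the helper column before returning.
    papers_df.drop('token_counts', axis=1, inplace=True)

    return papers_df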
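To illustrate what the filter keeps, here is a pandas-free sketch of the same rule on a few made-up strings. The function name `keep_long_texts` and the sample data are hypothetical; the notebook itself uses the DataFrame version above.

```python
from typing import List


def keep_long_texts(texts: List[str], min_tokens: int = 1000) -> List[str]:
    """Keep texts with strictly more than min_tokens whitespace-separated tokens."""
    return [text for text in texts if len(text.split()) > min_tokens]


samples = [
    "short abstract only",       # 3 tokens: dropped
    " ".join(["word"] * 1000),   # exactly 1,000 tokens: dropped, the test is '>'
    " ".join(["word"] * 1500),   # 1,500 tokens: kept
]
kept = keep_long_texts(samples)
print([len(text.split()) for text in kept])  # [1500]
```

Note that the comparison is strict, so a paper with exactly 1,000 tokens is filtered out.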

Trace

panic(cpu 8 caller 0xffffff8020d449ad): userspace watchdog timeout: no successful checkins from WindowServer in 120 seconds
service: logd, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 seconds ago
service: WindowServer, total successful checkins since wake (127621 seconds ago): 12751, last successful checkin: 120 seconds ago
service: remoted, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 
[Trace](https://developer.apple.com/forums/content/attachment/d17c2c9b-569b-4c1a-9c61-892ced7f785b)

Hi @dbl001,

Thanks for reporting this. I will try to reproduce it locally to find the cause of the crash. Could you confirm whether this is also on the 27" iMac (2020) with an AMD Radeon Pro 5700 XT GPU? That will help me reproduce it with the correct setup from the start.

Hi @dbl001,

I'm trying to reproduce the issue but am having trouble loading the pre-trained model you provided. Can you please tell me the exact versions of the Top2vec packages you used for the initial training, as well as your Anaconda version? It would really help if you could provide all installed packages from your environment so that we can mimic it 100%. To do so, just run pip3 freeze > requirements.txt. I installed requirements.txt from your repo, but I still get incompatibility errors.
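If it is easier to capture everything from inside the notebook, a rough sketch like the one below approximates the pip3 freeze output and adds the Python, OS, and machine details. It uses only the standard library; the function names `freeze_lines` and `environment_report` are hypothetical, and the output is not guaranteed to match pip3 freeze exactly (for example for editable installs).

```python
import platform
import sys
from importlib import metadata
from typing import List


def freeze_lines() -> List[str]:
    """Return 'name==version' lines for installed distributions, sorted by name."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no usable name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def environment_report() -> str:
    """Build a short report: interpreter, OS version, architecture, packages."""
    header = [
        f"# python {sys.version.split()[0]}",
        f"# macOS {platform.mac_ver()[0] or 'n/a'}",  # empty string off macOS
        f"# machine {platform.machine()}",
    ]
    return "\n".join(header + freeze_lines())


print(environment_report())
```

Pasting the printed report into a reply gives the package versions and the platform in one place.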

Could you please try reproducing the issue with the updated OS X and the updated tensorflow-macos and tensorflow-metal packages?
