Some resource has been exhausted. For example, this error might be raised if a per-user quota is exhausted, or perhaps the entire file system is out of space. @@__init__ 2 root error(s) found. (0) RESOURCE_EXHAUSTED: OOM when allocating

In a tensorflow-metal virtual environment on macOS 12.1:

tensorboard                  2.6.0
tensorboard-data-server      0.6.1
tensorboard-plugin-profile   2.5.0
tensorboard-plugin-wit       1.8.0
tensorflow                   2.6.0
tensorflow-addons            0.14.0
tensorflow-consciousness     0.1
tensorflow-datasets          4.4.0
tensorflow-estimator         2.7.0
tensorflow-gan               2.1.0
tensorflow-hub               0.12.0
tensorflow-io-gcs-filesystem 0.22.0
tensorflow-macos             2.7.0
tensorflow-metadata          1.2.0
tensorflow-metal             0.3.0
tensorflow-probability       0.14.1
tensorflow-similarity        0.13.45
tensorflow-text              2.7.3

Running the Top2Vec model: https://github.com/ddangelov/Top2Vec

import numpy as np 
import pandas as pd 
import json
import os
import ipywidgets as widgets
from IPython.display import clear_output, display
from top2vec import Top2Vec

papers_prepared_df = pd.read_feather("/Users/davidlaxer/Downloads/archive/covid19_papers_processed.feather")
top2vec_trained = Top2Vec(
    documents=papers_prepared_df.text.tolist(),
    embedding_model="universal-sentence-encoder",
    use_embedding_model_tokenizer=True,
    embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4/",
    workers=4,
)

2021-12-20 06:30:52,188 - top2vec - INFO - Pre-processing documents for training
/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
  warnings.warn(msg, category=FutureWarning)
2021-12-20 06:31:57,351 - top2vec - INFO - Loading universal-sentence-encoder model at /Users/davidlaxer/Downloads/universal-sentence-encoder_4
2021-12-20 06:31:57.488459: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-20 06:31:57.489288: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-20 06:31:57.489490: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: AMD Radeon Pro 5700 XT
2021-12-20 06:31:59.447260: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-12-20 06:32:00,841 - top2vec - INFO - Creating joint document/word embedding
2021-12-20 06:32:00.923838: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.

Some resource has been exhausted.

  For example, this error might be raised if a per-user quota is
  exhausted, or perhaps the entire file system is out of space.

  @@__init__
  
2 root error(s) found.
  (0) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[114389,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
	 [[{{node EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[StatefulPartitionedCall/StatefulPartitionedCall/EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/Reshape_1/_188]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[114389,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
	 [[{{node EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

...
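
For scale (my arithmetic, not something from the log), the tensor the allocator fails on is modest: 114389 × 320 floats × 4 bytes (float32) ≈ 146 MB (about 140 MiB). Note also that the startup log above reports the Metal device being created with 0 MB memory.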

I tried adjusting the batch size (e.g., 500, 100, 50, 10, 5).

The exception is raised while building the list of document vectors from the input documents, not during model training, i.e. at document_vectors.append(self.embed(train_corpus[current:current + batch_size])) in _embed_documents (shown below).

The Python 3.8 process grows to roughly 100 GB of memory and then raises the OOM exception.
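
For reference, a minimal way to watch that growth from inside the loop, assuming psutil is installed (it is my addition for monitoring, not part of Top2Vec):

import os
import psutil  # assumption: installed separately, used only for monitoring

_proc = psutil.Process(os.getpid())

def log_rss(tag=""):
    # Print the resident set size of this Python process in GiB.
    print(f"[{tag}] RSS: {_proc.memory_info().rss / 1024 ** 3:.1f} GiB")

Calling log_rss() before and after each self.embed(...) batch would show where the RSS climbs. The loop in question is Top2Vec's _embed_documents: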

def _embed_documents(self, train_corpus):

    self._check_import_status()
    self._check_model_status()

    # embed documents in fixed-size batches
    batch_size = 5
    document_vectors = []

    current = 0
    batches = len(train_corpus) // batch_size
    extra = len(train_corpus) % batch_size

    for ind in range(0, batches):
        try:
            # OOM is raised on this call into the embedding model
            document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
        except Exception as e:
            print(e.__doc__)
            print(e.message)
        current += batch_size

    if extra > 0:
        document_vectors.append(self.embed(train_corpus[current:current + extra]))

    document_vectors = self._l2_normalize(np.array(np.vstack(document_vectors)))

    return document_vectors
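
As a sanity check (my own workaround idea, not something from the Top2Vec or tensorflow-metal docs), hiding the GPU from TensorFlow before the model is loaded should force the universal-sentence-encoder to embed on the CPU and show whether the Metal PluggableDevice is the problem:

import tensorflow as tf

# Assumption: this must run before Top2Vec / TF-Hub loads the model and before
# any GPU op executes, otherwise the device list is already initialized.
tf.config.set_visible_devices([], "GPU")

# ...then construct Top2Vec exactly as above with
# embedding_model="universal-sentence-encoder".

If that runs to completion, the allocator in the Metal plugin is presumably the problem rather than anything in Top2Vec itself.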


shape[114389,320]? Are you sure you're not doing something wrong here?
