In my mobile application, I observe a memory leak when running inference with my image convolution model. Given a pointer to a loaded MLModel object called module and an input feature provider feature_provider (of type MLDictionaryFeatureProvider*), the leak is observed each time a prediction is made by calling [module predictionFromFeatures:feature_provider error:NULL];
The amount of memory leaked per iteration appears to be related to the output size of the model. Assuming the mobile GPU backend is running in half precision (float16), I observe the following for the given output sizes:
- Output image of dimension [1,3,3840,2160] (of size 1*3*3840*2160 * 16 bits / (8 bits * 1000^2) == 49.7664 MB): constant increase in memory of approximately 91.7 MB after each image prediction.
- Output image of dimension [1,3,2048,1080] (of size 1*3*2048*1080 * 16 bits / (8 bits * 1000^2) == 13.27104 MB): constant increase in memory of approximately 23.7 MB after each image prediction.
Is there a known issue with CoreML MLModel's predictionFromFeatures allocating memory each time it is called, or is this the intended behaviour? At the moment this is preventing me from running inference on mobile devices, and I was wondering if anyone has a suggested workaround, patch, or advice.
Thank you in advance, and please find the information to reproduce the issue below.
To Reproduce
To reproduce the problem, a simple model with three convolutions and one pixel-shuffle layer was converted from PyTorch to an MLModel. The MLModel was then run under a debugger in a mobile application. A breakpoint was set on the line computing the predictions in a loop, and the memory use was observed to increase after each iteration. As an alternative to setting a breakpoint, the number of prediction iterations can be set to 50 (assuming the output size is [1,3,3840,2160] and the phone has 4 GB of memory), which causes the application to run out of memory at runtime.
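For reference, the per-iteration growth can also be observed programmatically instead of via a breakpoint. The helper below is only a sketch (it is not part of the original TestApp, and the name logResidentMemoryMB is hypothetical); it logs the process's resident memory using task_info from <mach/mach.h>, so calling it at the end of each loop iteration would be expected to print roughly the constant increases reported above.

#import <mach/mach.h>

// Hypothetical helper (not in the original TestApp): logs the app's resident
// memory so the per-iteration growth can be observed without a debugger.
static void logResidentMemoryMB(int iteration) {
    mach_task_basic_info_data_t info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(),
                                 MACH_TASK_BASIC_INFO,
                                 (task_info_t)&info,
                                 &count);
    if (kr == KERN_SUCCESS) {
        NSLog(@"iteration %d: resident memory %.1f MB",
              iteration, info.resident_size / (1000.0 * 1000.0));
    }
}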
The PyTorch model:
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        upscale_factor = 8
        self.Conv1 = nn.Conv2d(in_channels=48, out_channels=48, kernel_size=3, stride=1)
        self.Conv2 = nn.Conv2d(48, 48, 3, 1)
        self.Conv3 = nn.Conv2d(48, 3 * (upscale_factor * upscale_factor), 3, 1)
        self.PS = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        Conv1 = self.Conv1(x)
        Conv2 = self.Conv2(Conv1)
        Conv3 = self.Conv3(Conv2)
        y = self.PS(Conv3)
        return y
The PyTorch to MLModel converter:
import torch
import coremltools

def convert_torch_to_coreml(torch_model, input_shapes, save_path):
    torchscript_model = torch.jit.script(torch_model)
    mlmodel = coremltools.converters.convert(
        torchscript_model,
        inputs=[coremltools.TensorType(name=f'input_{i}', shape=input_shape)
                for i, input_shape in enumerate(input_shapes)],
    )
    mlmodel.save(save_path)
Generate MLModel using the above definitions:
if __name__ == "__main__":
    torch_model = Model()
    # input_shapes = [[1,48,256,135]]  # 2K
    input_shapes = [[1,48,480,270]]  # 4K
    coreml_model_path = "./toy.mlmodel"
    convert_torch_to_coreml(torch_model, input_shapes, coreml_model_path)
Mobile application:
The mobile application was generated using PyTorch's iOS TestApp and adapted for our use case. The adapted TestApp is available here. The most relevant lines in the application for loading the model and running inference are included below:
- Copy the input tensor's data into an MLMultiArray:
+ (MLMultiArray*) tensorToMultiArray:(at::Tensor) input {
    float* input_ptr = input.data_ptr<float>();
    int batch = (int) input.size(0);
    int ch = (int) input.size(1);
    int height = (int) input.size(2);
    int width = (int) input.size(3);
    int pixels = ch * height * width;
    NSArray* shape = @[[NSNumber numberWithInt: batch], [NSNumber numberWithInt: ch], [NSNumber numberWithInt: height], [NSNumber numberWithInt: width]];
    MLMultiArray* output = [[MLMultiArray alloc] initWithShape:shape dataType:MLMultiArrayDataTypeFloat32 error:NULL];
    float* output_ptr = (float *) output.dataPointer;
    // Copy the tensor contents element by element into the MLMultiArray buffer.
    for (int pixel_index = 0; pixel_index < pixels; ++pixel_index) {
        output_ptr[pixel_index] = input_ptr[pixel_index];
    }
    return output;
}
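This helper copies every element into a fresh MLMultiArray. As a possible optimization (a sketch only, not verified against this app), MLMultiArray can instead wrap an existing buffer via initWithDataPointer:shape:dataType:strides:deallocator:error:, avoiding the copy. The method name tensorToMultiArrayNoCopy below is hypothetical, and it assumes a contiguous float32 NCHW tensor that the caller keeps alive for as long as the MLMultiArray is in use.

// Hypothetical zero-copy variant: wraps the tensor's existing buffer instead
// of copying it. The tensor must outlive the returned MLMultiArray.
+ (MLMultiArray*) tensorToMultiArrayNoCopy:(at::Tensor) input {
    NSArray* shape = @[@(input.size(0)), @(input.size(1)), @(input.size(2)), @(input.size(3))];
    NSArray* strides = @[@(input.size(1) * input.size(2) * input.size(3)),
                         @(input.size(2) * input.size(3)),
                         @(input.size(3)),
                         @1];
    return [[MLMultiArray alloc] initWithDataPointer:input.data_ptr<float>()
                                               shape:shape
                                            dataType:MLMultiArrayDataTypeFloat32
                                             strides:strides
                                         deallocator:^(void *bytes) { /* tensor owns the memory */ }
                                               error:NULL];
}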
- Load model, set input feature provider, and run inference over multiple iterations:
NSError* __autoreleasing __nullable* __nullable error = nil;
NSString* modelPath = [NSString stringWithUTF8String:model_path.c_str()];
NSURL* modelURL = [NSURL fileURLWithPath:modelPath];
NSURL* compiledModel = [MLModel compileModelAtURL:modelURL error:error];
MLModel* module = [MLModel modelWithContentsOfURL:compiledModel error:NULL];
NSMutableDictionary* feature_inputs = [[NSMutableDictionary alloc] init];
for (int i = 0; i < inputs.size(); ++i) {
    NSString* key = [NSString stringWithFormat:@"input_%d", i];
    [feature_inputs setValue:[Converter tensorToMultiArray: inputs[i].toTensor()] forKey: key];
}
MLDictionaryFeatureProvider* feature_provider = [[MLDictionaryFeatureProvider alloc] initWithDictionary:feature_inputs error:NULL];
// Running inference on the model results in the memory leak
for (int i = 0; i < iter; ++i) {
    [module predictionFromFeatures:feature_provider error:NULL];
}
Complete example source
The complete minimal example of both the MLModel generation and the TestApp is available here.
System environment:
Original environment:
- coremltools version: 5.0b5
- OS: built on macOS targeting iOS for the mobile application
- macOS version: Big Sur (version 11.4)
- iOS version: 14.7.1 (run on iPhone 12)
- Xcode version: 12.5.1 (12E507)
- How you install Python: install from source
- Python version: 3.8.10
- How you install PyTorch: install from source
- PyTorch version: 1.8.1
Updated 'latest' environment:
- coremltools version: 5.0b5
- OS: built on macOS targeting iOS for the mobile application
- macOS version: Big Sur (version 11.4)
- iOS version: 15.0.2 (run on iPhone 12)
- Xcode version: 13.0 (13A233)
- How you install Python: install from source
- Python version: 3.8.10
- How you install PyTorch: install from source
- PyTorch version: 1.10.0-rc2
Additional Information
Given the model definition and tensor output shapes above, the corresponding tensor input shapes for the model are as follows:
- Output shape of [1,3,3840,2160] has input shape [1,48,480,270]
- Output shape of [1,3,2048,1080] has input shape [1,48,256,135]
Hi @mylesDoyle, CoreML does allocate memory within predictionFromFeatures, but those allocations are released when the autorelease pool pops. To explicitly trigger the release, you could wrap the prediction call inside @autoreleasepool { }. Let us know if you run into issues after that.
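For example, a minimal sketch of that workaround applied to the loop above (the error handling is added here only for illustration; variable names follow the snippet in the issue):

// Wrap each prediction in an explicit autorelease pool so the temporaries
// CoreML autoreleases are freed at the end of every iteration instead of
// accumulating until the enclosing pool pops.
for (int i = 0; i < iter; ++i) {
    @autoreleasepool {
        NSError* err = nil;
        id<MLFeatureProvider> result = [module predictionFromFeatures:feature_provider
                                                                 error:&err];
        if (err != nil) {
            NSLog(@"prediction %d failed: %@", i, err);
        }
        // Consume `result` inside the pool; it is released when the pool pops.
        (void)result;
    }
}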