In my mobile application, I observe a memory leak when running inference with my image convolution model. Given a pointer to a loaded MLModel object called module and an input feature provider feature_provider (of type MLDictionaryFeatureProvider*), the leak is observed each time a prediction is made by calling [module predictionFromFeatures:feature_provider error:NULL];
The amount of memory leaked per iteration appears to be related to the output size of the model. Assuming the mobile GPU backend is running in half precision (float16), I observe the following for the given output sizes:
- Output image of dimension [1,3,3840,2160] (of size 1*3*3840*2160 * 16 bits / (8 bits * 1000^2) == 49.7664 MB): constant increase in memory of approximately 91.7 MB after each image prediction.
- Output image of dimension [1,3,2048,1080] (of size 1*3*2048*1080 * 16 bits / (8 bits * 1000^2) == 13.27104 MB): constant increase in memory of approximately 23.7 MB after each image prediction.
Is there a known issue with CoreML MLModel's predictionFromFeatures allocating memory each time it is called, or is this the intended behaviour? At the moment this is preventing me from running inference on mobile devices, and I was wondering if anyone has a suggested workaround, patch, or advice.
Thank you in advance, and please find the information to reproduce the issue below.
To Reproduce
To reproduce the problem, a simple model with three convolutions and one pixel-shuffle layer was converted from PyTorch to an MLModel. The MLModel was then run under a debugger in a mobile application. A breakpoint was set on the line computing the predictions in a loop, and the memory use was observed to increase after each iteration. As an alternative to setting a breakpoint, the number of prediction iterations can be set to 50 (assuming the output size is [1,3,3840,2160] and the phone has 4 GB of memory), which causes the application to run out of memory at runtime.
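For reference, the per-iteration growth can also be observed programmatically instead of via a breakpoint. The helper below is only a sketch (it is not part of the original TestApp, and the name logResidentMemoryMB is hypothetical); it logs the process's resident memory using task_info from <mach/mach.h>, so calling it at the end of each loop iteration would be expected to print roughly the constant increases reported above.

#import <mach/mach.h>

// Hypothetical helper (not in the original TestApp): logs the app's resident
// memory so the per-iteration growth can be observed without a debugger.
static void logResidentMemoryMB(int iteration) {
    mach_task_basic_info_data_t info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(),
                                 MACH_TASK_BASIC_INFO,
                                 (task_info_t)&info,
                                 &count);
    if (kr == KERN_SUCCESS) {
        NSLog(@"iteration %d: resident memory %.1f MB",
              iteration, info.resident_size / (1000.0 * 1000.0));
    }
}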
The PyTorch model:
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        upscale_factor = 8
        self.Conv1 = nn.Conv2d(in_channels=48, out_channels=48, kernel_size=3, stride=1)
        self.Conv2 = nn.Conv2d(48, 48, 3, 1)
        self.Conv3 = nn.Conv2d(48, 3 * (upscale_factor * upscale_factor), 3, 1)
        self.PS = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        Conv1 = self.Conv1(x)
        Conv2 = self.Conv2(Conv1)
        Conv3 = self.Conv3(Conv2)
        y = self.PS(Conv3)
        return y
The PyTorch to MLModel converter:
import torch
import coremltools

def convert_torch_to_coreml(torch_model, input_shapes, save_path):
    torchscript_model = torch.jit.script(torch_model)
    mlmodel = coremltools.converters.convert(
        torchscript_model,
        inputs=[coremltools.TensorType(name=f'input_{i}', shape=input_shape)
                for i, input_shape in enumerate(input_shapes)],
    )
    mlmodel.save(save_path)
Generate MLModel using the above definitions:
if __name__ == "__main__":
    torch_model = Model()
    # input_shapes = [[1,48,256,135]]  # 2K
    input_shapes = [[1,48,480,270]]  # 4K
    coreml_model_path = "./toy.mlmodel"
    convert_torch_to_coreml(torch_model, input_shapes, coreml_model_path)
Mobile application:
The mobile application was generated using PyTorch's iOS TestApp and adapted for our use case. The adapted TestApp is available here. The most relevant lines in the application for loading the model and running inference are included below:
- Copy the input tensor's data into an MLMultiArray:
+ (MLMultiArray*) tensorToMultiArray:(at::Tensor) input {
    float* input_ptr = input.data_ptr<float>();
    int batch = (int) input.size(0);
    int ch = (int) input.size(1);
    int height = (int) input.size(2);
    int width = (int) input.size(3);
    int pixels = ch * height * width;
    NSArray* shape = @[[NSNumber numberWithInt: batch], [NSNumber numberWithInt: ch], [NSNumber numberWithInt: height], [NSNumber numberWithInt: width]];
    MLMultiArray* output = [[MLMultiArray alloc] initWithShape:shape dataType:MLMultiArrayDataTypeFloat32 error:NULL];
    float* output_ptr = (float *) output.dataPointer;
    // Copy the tensor contents element by element into the MLMultiArray buffer.
    for (int pixel_index = 0; pixel_index < pixels; ++pixel_index) {
        output_ptr[pixel_index] = input_ptr[pixel_index];
    }
    return output;
}
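This helper copies every element into a fresh MLMultiArray. As a possible optimization (a sketch only, not verified against this app), MLMultiArray can instead wrap an existing buffer via initWithDataPointer:shape:dataType:strides:deallocator:error:, avoiding the copy. The method name tensorToMultiArrayNoCopy below is hypothetical, and it assumes a contiguous float32 NCHW tensor that the caller keeps alive for as long as the MLMultiArray is in use.

// Hypothetical zero-copy variant: wraps the tensor's existing buffer instead
// of copying it. The tensor must outlive the returned MLMultiArray.
+ (MLMultiArray*) tensorToMultiArrayNoCopy:(at::Tensor) input {
    NSArray* shape = @[@(input.size(0)), @(input.size(1)), @(input.size(2)), @(input.size(3))];
    NSArray* strides = @[@(input.size(1) * input.size(2) * input.size(3)),
                         @(input.size(2) * input.size(3)),
                         @(input.size(3)),
                         @1];
    return [[MLMultiArray alloc] initWithDataPointer:input.data_ptr<float>()
                                               shape:shape
                                            dataType:MLMultiArrayDataTypeFloat32
                                             strides:strides
                                         deallocator:^(void *bytes) { /* tensor owns the memory */ }
                                               error:NULL];
}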
- Load model, set input feature provider, and run inference over multiple iterations:
NSError* __autoreleasing __nullable* __nullable error = nil;
NSString* modelPath = [NSString stringWithUTF8String:model_path.c_str()];
NSURL* modelURL = [NSURL fileURLWithPath:modelPath];
NSURL* compiledModel = [MLModel compileModelAtURL:modelURL error:error];
MLModel* module = [MLModel modelWithContentsOfURL:compiledModel error:NULL];
NSMutableDictionary* feature_inputs = [[NSMutableDictionary alloc] init];
for (int i = 0; i < inputs.size(); ++i) {
    NSString* key = [NSString stringWithFormat:@"input_%d", i];
    [feature_inputs setValue:[Converter tensorToMultiArray: inputs[i].toTensor()] forKey: key];
}
MLDictionaryFeatureProvider* feature_provider = [[MLDictionaryFeatureProvider alloc] initWithDictionary:feature_inputs error:NULL];
// Running inference on the model results in the memory leak
for (int i = 0; i < iter; ++i) {
    [module predictionFromFeatures:feature_provider error:NULL];
}
Complete example source
The complete minimal example of both the MLModel generation and the TestApp is available here.
System environment:
Original environment:
- coremltools version: 5.0b5
- OS: built on macOS targeting iOS for the mobile application
- macOS version: Big Sur (version 11.4)
- iOS version: 14.7.1 (run on iPhone 12)
- Xcode version: 12.5.1 (12E507)
- How you install Python: install from source
- Python version: 3.8.10
- How you install PyTorch: install from source
- PyTorch version: 1.8.1
Updated 'latest' environment:
- coremltools version: 5.0b5
- OS: built on macOS targeting iOS for the mobile application
- macOS version: Big Sur (version 11.4)
- iOS version: 15.0.2 (run on iPhone 12)
- Xcode version: 13.0 (13A233)
- How you install Python: install from source
- Python version: 3.8.10
- How you install PyTorch: install from source
- PyTorch version: 1.10.0-rc2
Additional Information
Given the model definition and tensor output shapes above, the corresponding tensor input shapes for the model are as follows:
- Output shape of [1,3,3840,2160] has input shape [1,48,480,270]
- Output shape of [1,3,2048,1080] has input shape [1,48,256,135]
Hi @mylesDoyle, CoreML does allocate memory within predictionFromFeatures, but those allocations are released when the autorelease pool pops. To explicitly trigger the release, you could wrap the prediction call inside @autoreleasepool { }. Let us know if you run into issues after that.
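For example, a minimal sketch of that workaround applied to the loop above (the error handling is added here only for illustration; variable names follow the snippet in the issue):

// Wrap each prediction in an explicit autorelease pool so the temporaries
// CoreML autoreleases are freed at the end of every iteration instead of
// accumulating until the enclosing pool pops.
for (int i = 0; i < iter; ++i) {
    @autoreleasepool {
        NSError* err = nil;
        id<MLFeatureProvider> result = [module predictionFromFeatures:feature_provider
                                                                 error:&err];
        if (err != nil) {
            NSLog(@"prediction %d failed: %@", i, err);
        }
        // Consume `result` inside the pool; it is released when the pool pops.
        (void)result;
    }
}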