Why is CoreML Prediction using over 10 times more RAM on an older device?

I am using CoreML style transfer based on the torch2coreml implementation on GitHub. For purposes herein, I have only substituted my mlmodel, which has an input/output size of 1200 pixels, for the sample mlmodels.


This works perfectly on my iPhone 7 Plus and uses a maximum of 65.11 MB of RAM. Running the identical code and identical mlmodel on an iPad Mini 2, it uses 758.87 MB of RAM before it crashes with an out-of-memory error.


Memory allocations on the iPhone 7 Plus:

http://3DTOPO.com/CoreML-iPhonePlus.png


Memory allocations on the iPad Mini 2:

http://3DTOPO.com/CoreML-iPadMini2.png


Running on the iPad Mini, there are two 200 MB and one 197.77 MB Espresso library allocations that are not present on the iPhone 7+. The iPad Mini also has a 49.39 MB allocation that the iPhone 7+ doesn't, and three 16.48 MB allocations versus one on the iPhone 7+ (see screenshots above).


What on earth is going on, and how can I fix it?


Relevant code (download project linked above for full source):

    private var inputImage = UIImage(named: "input")!
    let imageSize = 1200
    private let models = [
       test().model
    ]

    @IBAction func styleButtonTouched(_ sender: UIButton) {
        guard let image = inputImage.scaled(to: CGSize(width: imageSize, height: imageSize), scalingMode: .aspectFit).cgImage else {
            print("Could not get a CGImage")
            return
        }


        let model = models[0] //Use my test model

        toggleLoading(show: true)

        DispatchQueue.global(qos: .userInteractive).async {
            let stylized = self.stylizeImage(cgImage: image, model: model)
   
            DispatchQueue.main.async {
                self.toggleLoading(show: false)
                self.imageView.image = UIImage(cgImage: stylized)
            }
        }
    }

    private func stylizeImage(cgImage: CGImage, model: MLModel) -> CGImage {
        let input = StyleTransferInput(input: pixelBuffer(cgImage: cgImage, width: imageSize, height: imageSize))
        let outFeatures = try! model.prediction(from: input)
        let output = outFeatures.featureValue(for: "outputImage")!.imageBufferValue!
        CVPixelBufferLockBaseAddress(output, .readOnly)
        let width = CVPixelBufferGetWidth(output)
        let height = CVPixelBufferGetHeight(output)
        let data = CVPixelBufferGetBaseAddress(output)!

        let outContext = CGContext(data: data,
                                   width: width,
                                   height: height,
                                   bitsPerComponent: 8,
                                   bytesPerRow: CVPixelBufferGetBytesPerRow(output),
                                   space: CGColorSpaceCreateDeviceRGB(),
                                   bitmapInfo: CGImageByteOrderInfo.order32Little.rawValue | CGImageAlphaInfo.noneSkipFirst.rawValue)!
        let outImage = outContext.makeImage()!
        CVPixelBufferUnlockBaseAddress(output, .readOnly)


        return outImage
    }

    private func pixelBuffer(cgImage: CGImage, width: Int, height: Int) -> CVPixelBuffer {
        var pixelBuffer: CVPixelBuffer? = nil
        let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA , nil, &pixelBuffer)
        if status != kCVReturnSuccess {
            fatalError("Cannot create pixel buffer for image")
        }

        CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags.init(rawValue: 0))
        let data = CVPixelBufferGetBaseAddress(pixelBuffer!)
        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo = CGBitmapInfo(rawValue: CGBitmapInfo.byteOrder32Little.rawValue | CGImageAlphaInfo.noneSkipFirst.rawValue)
        let context = CGContext(data: data, width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer!), space: rgbColorSpace, bitmapInfo: bitmapInfo.rawValue)

        context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
        CVPixelBufferUnlockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))

        return pixelBuffer!
    }

    class StyleTransferInput : MLFeatureProvider {

        /// input as color (kCVPixelFormatType_32BGRA) image buffer, 1200 pixels wide by 1200 pixels high
        var input: CVPixelBuffer

        var featureNames: Set<String> {
            get {
                return ["inputImage"]
            }
        }

        func featureValue(for featureName: String) -> MLFeatureValue? {
            if (featureName == "inputImage") {
                return MLFeatureValue(pixelBuffer: input)
            }
            return nil
        }

        init(input: CVPixelBuffer) {
            self.input = input
        }
    }

Replies

Hmm... let me have a guess:


The iPad Mini 2 has only an A7 processor, which is the lowest end for CoreML support. My experiments showed that CoreML only supports GPU acceleration on A9 chips and newer (GPU device family 3). This means that on your iPad, CoreML is executed entirely on the CPU.
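
If you want to branch on this at runtime, here is a minimal sketch (assuming the family-3 threshold above is accurate; deviceSupportsGPUInference is just a name chosen for illustration) that asks Metal which GPU family the device belongs to:

    import Metal

    /// Returns true on GPUs in family 3 or newer (A9 and later), which is
    /// where CoreML appears to be able to use the GPU, per the observation above.
    func deviceSupportsGPUInference() -> Bool {
        guard let device = MTLCreateSystemDefaultDevice() else {
            return false // no Metal device at all; CoreML will run on the CPU anyway
        }
        return device.supportsFeatureSet(.iOS_GPUFamily3_v1)
    }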


Now note also that the memory instrument does not show GPU memory utilization. That means that on the iPhone 7, where the model can be executed on the GPU, most of the resources are allocated in GPU memory and therefore don't show up in the graph. A good indicator is the "Other Processes" part of the memory gauge you can see in Xcode while the app is running.


Also note that the iPad Mini 2 only has 1 GB of RAM, whereas the iPhone 7 Plus has 3 GB. Your app most likely crashes on the iPad due to the high memory consumption (almost 80% of the device's RAM), while on the iPhone it is still within the allowed bounds.


I suggest you try to reduce the input size of your model, if feasible. You can also try half-precision (float16) weights, which were introduced a few days ago with iOS 11.2. In my app I ship different models for older and newer devices, as sketched below.
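
For illustration, here is a rough sketch of that last approach, assuming you bundle a second, lower-resolution model alongside the 1200 px one (smallTest is a hypothetical generated model class; only test appears in the original project), and reusing the capability check sketched above:

    import CoreML

    // Hypothetical model selection: use the full-size model only on devices
    // that can run CoreML on the GPU, and fall back to a smaller bundled
    // model (smallTest is assumed, not part of the original project) elsewhere.
    func makeStyleModel() -> MLModel {
        if deviceSupportsGPUInference() {
            return test().model       // the 1200 px model from the question
        } else {
            return smallTest().model  // assumed lower-resolution model for A7/A8 devices
        }
    }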

OK, so I have drastically reduced the size of my model from ~7 MB down to ~450 KB. Since the A7 can't run CoreML on the GPU, for comparative testing I have set CoreML to run only on the CPU on both devices:


    let options = MLPredictionOptions()
    options.usesCPUOnly = true

    var outFeatures: MLFeatureProvider?
    do {
        outFeatures = try model.prediction(from: input, options: options)
    } catch {
        print("prediction error: \(error)")
    }


Now on the iPhone 7+, running prediction() uses a maximum of 325.7 MB. I thought that should leave plenty of headroom to run on the iPad Mini 2 with 1 GB of RAM.


Wrong! Running on the iPad Mini 2, it uses 956.74 MB of RAM before it crashes. No crash report is captured, since the app is terminated for exceeding its memory limit.


This just doesn't make any sense to me, and it's causing me to pull my hair out! Why would the same model, running CPU-only on both devices, use over 3 times the RAM on the older device? Besides the issues with the iPad Mini 2, a tester with an iPhone 7 is experiencing some out-of-memory terminations, which just seems crazy to me.


I tried using a half-precision (fp16) model, but it still terminates out of memory on the iPad Mini 2. It makes the model file smaller, but doesn't seem to reduce the runtime memory footprint.

  • Did you ever find a solution to this? I’m having a similar issue with my coreml model eating up too much memory when predicting.
