GPU usage constantly high after MPS test program

I did a very simple test using Metal Performance Shaders. It is basically a single MPSMatrixMultiplication.

I compiled it in Terminal with 'swiftc matrixMul.swift', which produces an executable called matrixMul.

Now run 'time ./matrixMul'. After approximately 10 seconds, the first ten elements of the result matrix are printed on screen.
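For scale: by the usual count, multiplying two n×n matrices takes about 2n³ floating-point operations, so n = 8192 is roughly 1.1 × 10¹² operations. The small Swift sketch below turns an elapsed time into GFLOP/s; the helper names matmulFlops and gflopsPerSecond are hypothetical. Keep in mind that the roughly 10 seconds reported by 'time' also include filling the 8192×8192 input arrays and copying them into Metal buffers, so the 'time =' value the program prints around commit()/waitUntilCompleted() is probably the better number to plug in.

```swift
// Rough throughput estimate for an n x n by n x n matrix multiplication.
// Assumption: the conventional operation count of 2 * n^3
// (one multiply and one add per term of each dot product).
func matmulFlops(n: Int) -> Double {
    let d = Double(n)
    return 2 * d * d * d
}

// Hypothetical helper: converts an elapsed time in seconds into GFLOP/s.
func gflopsPerSecond(n: Int, seconds: Double) -> Double {
    return matmulFlops(n: n) / seconds / 1e9
}

let n = 8192
print("FLOPs:", matmulFlops(n: n))   // 2 * 8192^3 = 2^40, about 1.1e12
print("GFLOP/s if it took 10 s:", gflopsPerSecond(n: n, seconds: 10))
```

Plugging in the 'time =' value instead of 10 gives the throughput of the GPU multiply alone.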


This all works as intended. However, I noticed in Activity Monitor (and from the high GPU fan speed) that the GPU somehow isn't fully released: the GPU monitor in Activity Monitor shows 100% activity for several minutes after the program exits. It looks as if the matrixMul process keeps the GPU busy even after it has exited. Am I missing a statement in my code that tells the system to free its resources?
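One way to narrow this down (a sketch, not a confirmed fix): wrap the Metal work in an autoreleasepool so the Metal objects are released before the function returns, and check the command buffer's status and error after waitUntilCompleted(), to rule out that the GPU work itself failed or never finished. The function name runScopedMultiply and the small default size n = 4 are assumptions for illustration; the MPS calls are the same ones the test code below uses. The system normally reclaims a process's Metal resources when it exits, so this mainly helps confirm whether the program itself is the source of the activity. The sketch is wrapped in #if canImport(Metal) so it only compiles where Metal exists.

```swift
#if canImport(Metal)
import Foundation
import Metal
import MetalPerformanceShaders

/// Hypothetical diagnostic: runs one small MPS multiply inside an autorelease pool,
/// so every Metal object created here is released before the function returns,
/// then reports whether the command buffer completed without error.
func runScopedMultiply(n: Int = 4) -> Bool {
    return autoreleasepool { () -> Bool in
        guard let device = MTLCreateSystemDefaultDevice(),
              let queue = device.makeCommandQueue(),
              let commandBuffer = queue.makeCommandBuffer() else {
            return false
        }
        let rowBytes = n * MemoryLayout<Float>.stride
        let length = n * rowBytes
        let a = [Float](repeating: 1, count: n * n)
        let b = [Float](repeating: 2, count: n * n)
        guard let bufA = device.makeBuffer(bytes: a, length: length, options: []),
              let bufB = device.makeBuffer(bytes: b, length: length, options: []),
              let bufC = device.makeBuffer(length: length, options: []) else {
            return false
        }
        // Same descriptor initializer as the original test code.
        let desc = MPSMatrixDescriptor(dimensions: n, columns: n,
                                       rowBytes: rowBytes, dataType: .float32)
        let mul = MPSMatrixMultiplication(device: device,
                                          transposeLeft: false, transposeRight: false,
                                          resultRows: n, resultColumns: n,
                                          interiorColumns: n, alpha: 1, beta: 0)
        mul.encode(commandBuffer: commandBuffer,
                   leftMatrix: MPSMatrix(buffer: bufA, descriptor: desc),
                   rightMatrix: MPSMatrix(buffer: bufB, descriptor: desc),
                   resultMatrix: MPSMatrix(buffer: bufC, descriptor: desc))
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
        // Report any GPU-side failure instead of silently ignoring it.
        if let error = commandBuffer.error {
            print("command buffer error:", error)
        }
        return commandBuffer.status == .completed
    }
}
#endif
```

If this returns true and the GPU monitor still shows activity after the process has exited, the program's own command buffer is unlikely to be what keeps the GPU busy.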


Below is the simple test code:

-------------------------------------

import Foundation
import Metal
import Accelerate
import MetalPerformanceShaders

let n = 8192

let rowsA = n, columnsA = n
let rowsB = n, columnsB = n
let rowsC = n, columnsC = n

// A is all 1s and B is all 2s, so every element of C = A * B should equal 2 * n.
let arrayA = [Float](repeating: 1, count: rowsA * columnsA)
let arrayB = [Float](repeating: 2, count: rowsB * columnsB)

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Error: This device does not support Metal")
}

// Copy the inputs into Metal buffers; the result buffer C is only allocated.
let bufferA = device.makeBuffer(bytes: arrayA,
                                length: rowsA * columnsA * MemoryLayout<Float>.stride,
                                options: [])!
let bufferB = device.makeBuffer(bytes: arrayB,
                                length: rowsB * columnsB * MemoryLayout<Float>.stride,
                                options: [])!
let bufferC = device.makeBuffer(length: rowsC * columnsC * MemoryLayout<Float>.stride,
                                options: [])!

// Describe each buffer as a dense row-major float32 matrix.
let descA = MPSMatrixDescriptor(dimensions: rowsA, columns: columnsA,
                                rowBytes: columnsA * MemoryLayout<Float>.stride,
                                dataType: .float32)
let descB = MPSMatrixDescriptor(dimensions: rowsB, columns: columnsB,
                                rowBytes: columnsB * MemoryLayout<Float>.stride,
                                dataType: .float32)
let descC = MPSMatrixDescriptor(dimensions: rowsC, columns: columnsC,
                                rowBytes: columnsC * MemoryLayout<Float>.stride,
                                dataType: .float32)

let matrixA = MPSMatrix(buffer: bufferA, descriptor: descA)
let matrixB = MPSMatrix(buffer: bufferB, descriptor: descB)
let matrixC = MPSMatrix(buffer: bufferC, descriptor: descC)

// C = alpha * A * B + beta * C with alpha = 1 and beta = 0, i.e. C = A * B.
let matrixMultiplication = MPSMatrixMultiplication(device: device,
                                                   transposeLeft: false, transposeRight: false,
                                                   resultRows: rowsC, resultColumns: columnsC,
                                                   interiorColumns: columnsA, alpha: 1, beta: 0)

guard let commandQueue = device.makeCommandQueue(),
      let commandBuffer = commandQueue.makeCommandBuffer() else {
    fatalError("Error: Could not create a command queue or command buffer")
}

matrixMultiplication.encode(commandBuffer: commandBuffer, leftMatrix: matrixA,
                            rightMatrix: matrixB, resultMatrix: matrixC)

print("start calculation on GPU")
let start = DispatchTime.now()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()   // blocks until the GPU has finished the multiply
let end = DispatchTime.now()
print("time =", 1e-9 * Double(end.uptimeNanoseconds - start.uptimeNanoseconds), "sec")

// we look at the result
let rawPointer = matrixC.data.contents()
let count = matrixC.rows * matrixC.columns
let typedPointer = rawPointer.bindMemory(to: Float.self, capacity: count)
let bufferedPointer = UnsafeBufferPointer(start: typedPointer, count: count)

// Print the first 10 results, to make sure it's not all 0s or NaNs.
print("\nFirst 10 results:")
for i in 0..<10 {
    print(i, bufferedPointer[i])
}

-----------------------------
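To check more than the first ten values: since A is all 1s and B is all 2s, every element of C is a sum of n products 1 · 2, i.e. exactly 2n = 16384 for n = 8192. All the partial sums are integers below 2^24, so float32 represents them exactly whatever order the GPU adds them in (assuming it accumulates in float32). The sketch below, with hypothetical helper names checkAllOnesTimesAllTwos and cpuMatmul, scans the whole result for the first mismatch and adds a naive CPU reference multiply for cross-checking other inputs at a small n.

```swift
// Hypothetical helpers for checking the result (names are illustrative).

/// Returns the index of the first element of `result` that is not exactly 2 * n,
/// or nil if every element matches. NaN never compares equal, so NaNs are caught too.
func checkAllOnesTimesAllTwos(_ result: UnsafeBufferPointer<Float>, n: Int) -> Int? {
    let expected = Float(2 * n)
    return result.firstIndex(where: { $0 != expected })
}

/// Naive row-major reference multiply C = A * B on the CPU.
/// O(n^3), so only practical for small n when comparing against the GPU result.
func cpuMatmul(_ a: [Float], _ b: [Float], n: Int) -> [Float] {
    var c = [Float](repeating: 0, count: n * n)
    for i in 0..<n {
        for k in 0..<n {
            let aik = a[i * n + k]
            for j in 0..<n {
                c[i * n + j] += aik * b[k * n + j]
            }
        }
    }
    return c
}
```

In the test program it could be called after the first-10 loop as 'checkAllOnesTimesAllTwos(bufferedPointer, n: n)'; a nil result means every element matched.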

Replies

I'm experiencing a very similar issue on an M2 Max GPU. I'm using wgpu to run compute shaders, which uses the Metal APIs internally. Restarting the computer is the only way to bring the usage down after the program has exited.

Did you manage to find a workaround that eliminates the leak, or a way to clean up the resources manually?