I did a very simple test using Metal Performance Shaders. It is basically a single MPSMatrixMultiplication.
I compiled it in Terminal with 'swiftc matrixMul.swift',
which produces an executable called matrixMul.
Now run 'time ./matrixMul'; after approximately 10 seconds the first ten elements of the result matrix are printed on screen.
This all works as intended. However, I noticed in Activity Monitor (and from the high GPU fan speed) that the GPU apparently isn't fully released: Activity Monitor's GPU monitor shows 100% activity for several minutes after the program exits. Somehow the matrixMul process keeps the GPU busy after it has exited. Am I missing a statement in my code that tells the system to free its resources?
Below is the simple test code:
-------------------------------------
import Accelerate
import MetalPerformanceShaders
let n = 8192
let rowsA = n
let columnsA = n
let rowsB = n
let columnsB = n
let rowsC = n
let columnsC = n
var arrayA = [Float](repeating: 1, count: rowsA * columnsA)
var arrayB = [Float](repeating: 2, count: rowsB * columnsB)
var arrayC = [Float](repeating: 0, count: rowsC * columnsC)
// Get the default GPU; stop if Metal is unavailable.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Error: This device does not support Metal")
}
// Copy the inputs into Metal buffers and allocate the output buffer.
let bufferA = device.makeBuffer(bytes: arrayA,
                                length: rowsA * columnsA * MemoryLayout<Float>.stride,
                                options: [])!
let bufferB = device.makeBuffer(bytes: arrayB,
                                length: rowsB * columnsB * MemoryLayout<Float>.stride,
                                options: [])!
let bufferC = device.makeBuffer(length: rowsC * columnsC * MemoryLayout<Float>.stride,
                                options: [])!
// Describe each buffer as a row-major float32 matrix.
let descA = MPSMatrixDescriptor(dimensions: rowsA, columns: columnsA,
                                rowBytes: columnsA * MemoryLayout<Float>.stride,
                                dataType: .float32)
let descB = MPSMatrixDescriptor(dimensions: rowsB, columns: columnsB,
                                rowBytes: columnsB * MemoryLayout<Float>.stride,
                                dataType: .float32)
let descC = MPSMatrixDescriptor(dimensions: rowsC, columns: columnsC,
                                rowBytes: columnsC * MemoryLayout<Float>.stride,
                                dataType: .float32)
let matrixA = MPSMatrix(buffer: bufferA, descriptor: descA)
let matrixB = MPSMatrix(buffer: bufferB, descriptor: descB)
let matrixC = MPSMatrix(buffer: bufferC, descriptor: descC)
// C = alpha * A * B + beta * C, with alpha = 1 and beta = 0.
let matrixMultiplication = MPSMatrixMultiplication(device: device,
                                                   transposeLeft: false, transposeRight: false,
                                                   resultRows: rowsC, resultColumns: columnsC,
                                                   interiorColumns: columnsA, alpha: 1, beta: 0)
let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()!
matrixMultiplication.encode(commandBuffer: commandBuffer, leftMatrix: matrixA,
                            rightMatrix: matrixB, resultMatrix: matrixC)
print("start calculation on GPU")
let start = DispatchTime.now()
// Submit the work and block until the GPU reports completion.
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
let end = DispatchTime.now()
print("time =", 1e-9 * Double(end.uptimeNanoseconds - start.uptimeNanoseconds), "sec")
// Look at the result.
let rawPointer = matrixC.data.contents()
let count = matrixC.rows * matrixC.columns
let typedPointer = rawPointer.bindMemory(to: Float.self, capacity: count)
let bufferedPointer = UnsafeBufferPointer(start: typedPointer, count: count)
// Print the first 10 results to make sure they are not all 0s or NaNs.
// arrayC is the host-side array, which is never written and stays 0;
// bufferedPointer views the GPU result in bufferC.
print("\nFirst 10 results:")
for i in 0..<10 {
    print(arrayC[i], bufferedPointer[i])
}
-----------------------------
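For reference, here is what the printed values should be. Since A is all ones and B is all twos, every element of C is a sum of n products 1 * 2, so it should equal 2 * n (16384 for n = 8192). Below is a minimal CPU sketch of that check at a small size. The names referenceMatMul and smallN are my own and not part of the test program above, and it assumes the same row-major layout as the matrix descriptors.

```swift
// Plain CPU matrix multiply for square, row-major n x n matrices.
// Hypothetical helper, only meant to cross-check the GPU output.
func referenceMatMul(_ a: [Float], _ b: [Float], n: Int) -> [Float] {
    var c = [Float](repeating: 0, count: n * n)
    for i in 0..<n {
        for k in 0..<n {
            let aik = a[i * n + k]
            for j in 0..<n {
                c[i * n + j] += aik * b[k * n + j]
            }
        }
    }
    return c
}

// Small size so the triple loop finishes instantly; the GPU test uses n = 8192.
let smallN = 64
let a = [Float](repeating: 1, count: smallN * smallN)
let b = [Float](repeating: 2, count: smallN * smallN)
let c = referenceMatMul(a, b, n: smallN)

// Every element should be the sum of smallN products 1 * 2.
let expected = Float(2 * smallN)
let allMatch = c.allSatisfy { $0 == expected }
print("expected per element:", expected, "all match:", allMatch)
```

The same reasoning gives 16384.0 for each of the first 10 GPU results in the full-size run, so anything else in the second printed column would point to a problem with the computation rather than with resource cleanup.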