Sample code for MPSMatrixDecompositionCholesky?

I am trying to get my code to use MPSMatrixDecompositionCholesky correctly but it is giving me an incorrect matrix result.

Any help would be greatly appreciated!


let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()

let M = 2

let row = M * MemoryLayout<Float>.stride
let matlength = M * row
let mdesc = MPSMatrixDescriptor(
    dimensions: M, columns: M, rowBytes: row, dataType: MPSDataType.float32)

let arrayA: [Float] = [ 2.0, 1.0, 1.0, 2.0 ]
let buffA = device.makeBuffer(bytes: arrayA, length: matlength)
let matA = MPSMatrix(buffer: buffA!, descriptor: mdesc)

let arrayB: [Float] = [ 1.0, 0.0, 0.0, 0.0 ]
let buffB = device.makeBuffer(bytes: arrayB, length: matlength)
let matB = MPSMatrix(buffer: buffB!, descriptor: mdesc)

let arrayX: [Float] = [ 0.0, 0.0, 0.0, 0.0 ]
let buffX = device.makeBuffer(bytes: arrayX, length: matlength)
let matX = MPSMatrix(buffer: buffX!, descriptor: mdesc)

let arrayL: [Float] = [ 0.0, 0.0, 0.0, 0.0 ]
let buffL = device.makeBuffer(bytes: arrayL, length: matlength)
let matL = MPSMatrix(buffer: buffL!, descriptor: mdesc)

let decomp = MPSMatrixDecompositionCholesky(device: device, lower: true, order: M)

let commandBuffer = commandQueue?.makeCommandBuffer()

decomp.encode(commandBuffer: commandBuffer!,
              sourceMatrix: matA, resultMatrix: matL, status: buffX)

commandBuffer?.commit()

let rawPointerL = matL.data.contents()
let countL = matL.rows * matL.columns
let typedPointerL = rawPointerL.bindMemory(to: Float.self, capacity: countL)
let bufferedPointerL = UnsafeBufferPointer(start: typedPointerL, count: countL)
print(" ")
print(" ", bufferedPointerL[0], bufferedPointerL[1])
print(" ", bufferedPointerL[2], bufferedPointerL[3])
print(" ")

Replies

What is the output?


What do you expect the output to be?

The result matrix matL is:

1.41421    1.0
inf       -inf

But it should equal:

1.4142     0
0.7071     1.2247

In my original post you must add:

import MetalPerformanceShaders

at the beginning of the code. I forgot to show that in the code above.


So, what I am ultimately trying to accomplish is to solve A*X = B for the unknown X.

The first step is to determine the lower factor matrix L, which is needed to solve for X.
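For context, once a correct lower factor L is available, the solve amounts to two triangular solves: L*Y = B, then L-transpose*X = Y. Below is a rough sketch (my addition, not from the original post) of how that might be encoded with MPSMatrixSolveTriangular; matY is a hypothetical intermediate matrix with the same descriptor as matB, and the flag choices should be double-checked against the documentation:

// Sketch only: assumes matL already holds a correct lower Cholesky factor of A,
// and that matB, matX, and a new intermediate matY all use the same M x M
// descriptor as in the code above.
let forwardSolve = MPSMatrixSolveTriangular(
    device: device,
    right: false,              // solve op(L) * Y = B, not Y * op(L) = B
    upper: false,              // L is lower triangular
    transpose: false,          // forward substitution with L
    unit: false,               // L does not have a unit diagonal
    order: M,
    numberOfRightHandSides: M, // B is stored as an M x M matrix here
    alpha: 1.0)

let backwardSolve = MPSMatrixSolveTriangular(
    device: device,
    right: false,
    upper: false,
    transpose: true,           // back substitution with L-transpose
    unit: false,
    order: M,
    numberOfRightHandSides: M,
    alpha: 1.0)

let cb = commandQueue!.makeCommandBuffer()!
forwardSolve.encode(commandBuffer: cb, sourceMatrix: matL,
                    rightHandSideMatrix: matB, solutionMatrix: matY)
backwardSolve.encode(commandBuffer: cb, sourceMatrix: matL,
                     rightHandSideMatrix: matY, solutionMatrix: matX)
cb.commit()
cb.waitUntilCompleted()        // wait before reading matX on the CPU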

Ok, this is getting spooky. I decided to try it on another Mac ... switched from a 2017 iMac to a 2017 MacBook Pro.

Using the same code I get different answers! The answers are closer but still not correct.


On the MacBook Pro I get:

1.41421 1.0

0.707107 1.22474


Previously, on the iMac I got:

1.41421 1.0

inf -inf


That 1.0 should be a zero on the MacBook Pro.

If it were zero, that would be the correct lower factorization, i.e. L * L-transpose would equal the A matrix.
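As a sanity check (my own addition, not from the thread), the expected factor can be verified on the CPU in a few lines:

// CPU-side check that L * L-transpose reproduces A = [2 1; 1 2]
// for the expected lower factor.
let L: [[Float]] = [[1.4142135, 0.0],
                    [0.70710677, 1.2247449]]
var A = [[Float]](repeating: [0, 0], count: 2)
for i in 0..<2 {
    for j in 0..<2 {
        for k in 0..<2 {
            A[i][j] += L[i][k] * L[j][k]   // (L * L-transpose)[i][j]
        }
    }
}
print(A)   // approximately [[2.0, 1.0], [1.0, 2.0]]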


What is going on??

I checked which 'device' (i.e. which GPU) let device = MTLCreateSystemDefaultDevice()! refers to on the iMac and the MacBook Pro.

The iMac has a Radeon Pro 555 with Metal support for feature set macOS GPUFamily1 v3

The MacBook Pro has Intel Iris Plus Graphics 650 with Metal support for feature set macOS GPUFamily1 v3


Something is different between the interpretation of the code on these machines... maybe inconsistent buffers??

However, I realize that since my code is likely in error (even if it compiles and runs without errors), that alone may produce incorrect or inconsistent results.
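One thing that might explain machine-dependent garbage like the inf values (my guess, not something established in the thread) is that the original code reads matL.data.contents() immediately after commit(), without waiting for the GPU to finish. A minimal, hedged sketch of a safer readback:

// Sketch: make sure the decomposition has actually finished before reading
// the result buffer on the CPU. (If the result buffer were created with
// .storageModeManaged, it would additionally need a blit synchronize(resource:)
// before the CPU read; that is not shown here.)
let cb = commandQueue!.makeCommandBuffer()!
decomp.encode(commandBuffer: cb, sourceMatrix: matA,
              resultMatrix: matL, status: nil)
cb.commit()
cb.waitUntilCompleted()   // block the CPU until the GPU work is complete

let resultPtr = matL.data.contents().bindMemory(to: Float.self,
                                                capacity: M * M)
print(resultPtr[0], resultPtr[1])
print(resultPtr[2], resultPtr[3])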

I would check the value of rowBytes(fromColumns:dataType:). You're currently assuming it is columns * sizeof(float), but the docs for MPSMatrixDescriptor say:

"For performance considerations, the optimal row stride may not necessarily be equal to the number of columns in the matrix. The rowBytes(fromColumns:dataType:) method may be used to help you determine this value."


If this row stride is different between the different GPUs you've tried, that would explain the error.

I tried: let row = MPSMatrixDescriptor.rowBytes(forColumns: M, dataType: MPSDataType.float32)

Checked the number of bytes. This gave 16 bytes.

Whereas: let row = M * MemoryLayout<Float>.stride

gave 8 bytes.


The use of: let row = MPSMatrixDescriptor.rowBytes(forColumns: M, dataType: MPSDataType.float32)

gave incorrect results:

1.41421 1.0

0.0 0.0


I couldn't run this on the iMac ... only have the MacBook Pro available.

But did you also allocate the buffer using that same row bytes stride, and add padding to fill out the extra values in each row? In other words:


let arrayA: [Float]  = [ 2.0 , 1.0 , 0, 0, 1.0 , 2.0, 0, 0 ]


Note that the docs say that using rowBytes(...) isn't strictly necessary, only for getting the best performance. But it's worth trying out the above to see if this fixes the issue.
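For what it's worth, here is a hedged, standalone sketch of what that suggestion might look like in full, mirroring the rowBytes call and the descriptor initializer already used in this thread and assuming the 16-byte stride (4 floats per row) reported above:

// Sketch: build the matrix with the row stride the framework asks for,
// padding each row of the Swift array out to that stride.
let row = MPSMatrixDescriptor.rowBytes(forColumns: M, dataType: .float32)  // 16 on this GPU
let floatsPerRow = row / MemoryLayout<Float>.stride                        // 4 here

let mdescPadded = MPSMatrixDescriptor(dimensions: M, columns: M,
                                      rowBytes: row, dataType: .float32)

// 2 x 2 data, each row padded with zeros out to floatsPerRow entries
// (this literal assumes floatsPerRow == 4):
// [ 2 1 | pad pad ]
// [ 1 2 | pad pad ]
let arrayA: [Float] = [ 2.0, 1.0, 0.0, 0.0,
                        1.0, 2.0, 0.0, 0.0 ]
let buffA = device.makeBuffer(bytes: arrayA, length: M * row)!
let matA = MPSMatrix(buffer: buffA, descriptor: mdescPadded)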

Oh. Will try. Thank you.

It would be great if you could file a bug with a small reproducible test case. We will investigate to see what the issue is.

Still trying to figure out how to use device.makeBuffer( ... ) with the padded input array. I don't think I can copy the entire padded data set directly into the buffer; I think I need to copy it in parts. Not sure. Right now I have an 8-by-8-byte matrix, but apparently I need to create a 16-by-16-byte buffer. So I need to somehow move the data into the correct areas of the buffer.

I probably stated that wrong: I have 2 rows of 8 bytes, whereas I need 2 rows of 16 bytes? They are still both 2 by 2 matrices.

I've tried all patterns of arrayA. It seems I am not able to create a valid buffer with rowBytes set to 16 when all I have is 8 bytes of data per row. Maybe it doesn't work with a 2 by 2 matrix set-up?


I've tried the arrayA with [ 2, 1, 0, 0, 1, 2, 0, 0 ] and I've moved the data around to different positions in that array but so far no luck.


I was under the belief that I could directly assign that padded array with .makeBuffer, but it doesn't yield a correct result.

Thank you, but I still don't know if there is any bug. I think I must be misaligning the data in the Metal buffer. The Intel Iris Plus 650 on the MacBook Pro gives me the 'closest' answer; the Radeon 555 on the iMac doesn't make sense at all ... but you know, 'close' doesn't count in matrix algebra. I am trying to understand how to assign data to specific locations in the buffer. Obviously, if the GPUs all require 16-byte rows (from .rowBytes), then my 8 bytes (2 columns) of data per row may be the issue?
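In case it helps, here is a hedged sketch (my addition, not from the thread) of copying each 2-float row into a buffer whose rows are rowBytes apart, and of reading the result back with the same stride. matLPadded is a hypothetical result matrix created with the padded descriptor. Note that with a 16-byte stride the second row starts at float index 4, not 2, which may also be why the earlier print statements showed 0.0 0.0 (padding) instead of the second row:

// Sketch: copy M rows of M floats each into a buffer whose rows are
// `row` bytes apart, then read the result back using the same stride.
let row = MPSMatrixDescriptor.rowBytes(forColumns: M, dataType: .float32)
let srcA: [Float] = [ 2.0, 1.0,
                      1.0, 2.0 ]          // tightly packed 2 x 2 source data

let buffA = device.makeBuffer(length: M * row, options: [])!
srcA.withUnsafeBytes { src in
    for r in 0..<M {
        // destination offset of row r is r * rowBytes, not r * M * 4
        let dst = buffA.contents() + r * row
        let srcRow = src.baseAddress! + r * M * MemoryLayout<Float>.stride
        dst.copyMemory(from: srcRow, byteCount: M * MemoryLayout<Float>.stride)
    }
}

// ... encode the decomposition into matLPadded as before, commit, and wait ...

// Read back with the padded stride: row r starts at float index
// r * (row / MemoryLayout<Float>.stride), e.g. index 4 for the second row.
let floatsPerRow = row / MemoryLayout<Float>.stride
let p = matLPadded.data.contents().bindMemory(to: Float.self,
                                              capacity: M * floatsPerRow)
for r in 0..<M {
    print(p[r * floatsPerRow + 0], p[r * floatsPerRow + 1])
}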

Well, it is trivial to check the lower decomposition by hand for an input matrix of [ 2, 1, 1, 2 ], but I'd be interested in how you got the buffer alignment right for the decomposition. I am doing something wrong ... obviously. I checked online but I don't see any sample code for linear algebra other than matrix multiplication.