Hello! I’m having an issue with retrieving the trained weights from MLCLSTMLayer in ML Compute when training on a GPU. I maintain references to the input-weights, hidden-weights, and biases tensors, and I use the following code to extract their data post-training:
import Foundation
import MLCompute

extension MLCTensor {
    func dataArray<Scalar>(as _: Scalar.Type) throws -> [Scalar] where Scalar: Numeric {
        // The element count is the product of the dimensions in the tensor's shape.
        let count = self.descriptor.shape.reduce(1, *)
        var array = [Scalar](repeating: 0, count: count)

        // This *should* copy the latest data from the GPU to memory that's accessible by the CPU.
        self.synchronizeData()

        guard let data = self.data else {
            throw DataError.uninitialized // A custom error that I declare elsewhere
        }
        _ = array.withUnsafeMutableBufferPointer { buffer in
            data.copyBytes(to: buffer)
        }
        return array
    }
}
The issue is that when I call dataArray(as:) on a weights or biases tensor of an LSTM layer that has been trained on a GPU, the values it retrieves are identical to what they were before training began. For instance, if I initialize the biases to all zeros and then train the LSTM layer on a GPU, the biases seemingly remain all zeros after training, even though the reported loss decreases as you would expect.
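For concreteness, this is roughly how I check it. The function below is just a sketch of my test code: verifyBiasesUpdate is a placeholder name, the biases parameter is one of the MLCTensor references I keep, and train stands in for my training loop, which I've left out.

    // Minimal sketch of how the problem shows up; relies on the dataArray(as:) extension above.
    func verifyBiasesUpdate(biases: MLCTensor, train: () -> Void) throws {
        let before = try biases.dataArray(as: Float.self)
        train() // the reported loss decreases across epochs as expected
        let after = try biases.dataArray(as: Float.self)

        print(before.prefix(8)) // all zeros, as initialized
        print(after.prefix(8))  // still all zeros after training on the GPU
    }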
This issue does not occur when training an LSTM layer on a CPU, and it also does not occur when training a fully-connected layer on a GPU. Since both types of layers work properly on a CPU but only MLCFullyConnectedLayer works properly on a GPU, it seems that the issue is a bug in ML Compute's GPU implementation of MLCLSTMLayer specifically.
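In case it's relevant, the only thing I change between the CPU and GPU runs is the device I compile the training graph with, roughly as below; compileGraph is just a placeholder name for my setup code, and the layers, optimizer, and data are identical in both runs.

    // The device passed at compile time is the only difference between the passing (CPU)
    // and failing (GPU) cases.
    func compileGraph(_ graph: MLCTrainingGraph, onGPU: Bool) -> Bool {
        let device = onGPU ? MLCDevice.gpu()! : MLCDevice.cpu()
        return graph.compile(options: [], device: device)
    }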
For reference, I’m testing my code on an M1 Max.
Am I doing something wrong, or is this an actual bug that I should report in Feedback Assistant?