In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S).
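For context, the intended computation normalizes each position over the channels axis; a minimal PyTorch sketch of that math (my own illustration, not the repo's actual code):

import torch

def channel_axis_layer_norm(x, gamma, beta, eps=1e-5):
    # x has shape (B, C, 1, S); normalize over the channels axis (dim=1)
    mean = x.mean(dim=1, keepdim=True)
    var = x.var(dim=1, keepdim=True, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # gamma and beta have shape (C,), broadcast over B, 1, S
    return x_hat * gamma.view(1, -1, 1, 1) + beta.view(1, -1, 1, 1)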
The coremltools documentation suggests that the layer_norm MIL op supports this natively, and indeed the following code works on CPU:
import torch
from coremltools.converters.mil import Builder as mb

B, C, S = 1, 768, 512
g, b = 1, 0

@mb.program(input_specs=[mb.TensorSpec(shape=(B, C, 1, S))])
def ln_prog(x):
    # Normalize over the channels axis (axis 1) of the (B, C, 1, S) input
    gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist()
    beta = (torch.ones((C,), dtype=torch.float32) * b).tolist()
    return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y")
However, when run on the Neural Engine it produces results that are scaled by an incorrect value.
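For reference, this is roughly how I compare compute units (a sketch; I'm assuming the converted model picks up the input name "x" from the function argument and the output name "y" from the op, and that CPU_AND_NE actually dispatches this op to the Neural Engine on the test machine):

import numpy as np
import coremltools as ct

# Convert the same MIL program twice: pinned to CPU, and allowed onto the Neural Engine.
cpu_model = ct.convert(ln_prog, convert_to="mlprogram",
                       compute_units=ct.ComputeUnit.CPU_ONLY)
ane_model = ct.convert(ln_prog, convert_to="mlprogram",
                       compute_units=ct.ComputeUnit.CPU_AND_NE)

x = np.random.rand(B, C, 1, S).astype(np.float32)
cpu_out = cpu_model.predict({"x": x})["y"]
ane_out = ane_model.predict({"x": x})["y"]

# Compare the two outputs elementwise; the discrepancy shows up here.
print(np.abs(cpu_out - ane_out).max())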
Should this work on the Neural Engine?