ANE-Optimized Layer Norm Fails on ANE

In the ml-ane-transformers repo, there is a custom LayerNorm implementation for the Neural Engine-optimized shape of (B,C,1,S).

The coremltools documentation suggests that the layer_norm MIL op supports this layout natively. In fact, the following code works on CPU:

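import torch
from coremltools.converters.mil import Builder as mb

B, C, S = 1, 768, 512
g, b = 1, 0  # scale and shift applied to every channel

@mb.program(input_specs=[mb.TensorSpec(shape=(B, C, 1, S))])
def ln_prog(x):
    # Normalize over the channel axis (axis 1) of the (B, C, 1, S) layout.
    gamma = (torch.ones((C,), dtype=torch.float32) * g).tolist()
    beta = (torch.ones((C,), dtype=torch.float32) * b).tolist()
    return mb.layer_norm(x=x, axes=[1], gamma=gamma, beta=beta, name="y")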

However, it fails when run on the Neural Engine: the results are scaled by an incorrect value.

Should this work on the Neural Engine?

I've filed FB12150787 for this.

Following up on this: the layer_norm MIL op does work on the Neural Engine. It runs in float16 there, which can overflow and yield the incorrect-looking results I described.
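To illustrate the float16 point, here is a minimal NumPy sketch. It is not what the Neural Engine actually executes. It is a plain reference layer norm over the channel axis, run once in float32 and once in float16, and the input scale of 1000 is an assumption chosen so that the squared values exceed the float16 maximum of 65504.

```python
import numpy as np

def layer_norm(x, axis=1, eps=1e-5):
    """Reference layer norm over `axis`, computed in the dtype of `x`."""
    mean = x.mean(axis=axis, keepdims=True)
    centered = x - mean
    # Squaring is where float16 runs out of range: anything above
    # roughly 256 in magnitude squares to more than 65504.
    var = (centered * centered).mean(axis=axis, keepdims=True)
    return centered / np.sqrt(var + eps)

B, C, S = 1, 768, 512
rng = np.random.default_rng(0)
# Assumed input: large-magnitude activations in the (B, C, 1, S) layout.
x32 = (rng.standard_normal((B, C, 1, S)) * 1000.0).astype(np.float32)
x16 = x32.astype(np.float16)

y32 = layer_norm(x32)
with np.errstate(over="ignore"):  # silence the expected overflow warning
    y16 = layer_norm(x16)

max_err = float(np.abs(y16.astype(np.float32) - y32).max())
print("float32 output finite:", bool(np.isfinite(y32).all()))
print("max |float16 - float32|:", max_err)
```

With inputs this large the float16 variance overflows, so the float16 output no longer matches the float32 one. With inputs of modest magnitude the two should agree closely, which would be consistent with the op itself being fine.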
