We are having trouble running a model on the Neural Engine (ANE) on iPhone 13 and 14. After some debugging we came up with this minimal reproduction model. It has two inputs; each goes through a convolution, the results are concatenated along the channel axis, and the concatenated tensor goes through a depthwise convolution.
It was constructed by this code:
from coremltools.models import datatypes
from coremltools.models.neural_network import NeuralNetworkBuilder
from coremltools.models.utils import save_spec

# conv1_weights, conv2_weights and conv3_weights are dicts holding the
# remaining add_convolution keyword arguments (the weights), defined elsewhere.
builder = NeuralNetworkBuilder(
    [
        ('input2', datatypes.Array(1, 128, 43, 1)),
        ('input1', datatypes.Array(1, 800, 43, 1)),
    ],
    [('output', datatypes.Array(1, 800, 39, 1))],
    disable_rank5_shape_mapping=True,
    use_float_arraytype=True)
# 1x1 convolution on each input
builder.add_convolution(name='conv1', input_name='input1', output_name='conv1_output',
                        kernel_channels=800, output_channels=800, height=1, width=1,
                        stride_height=1, stride_width=1, border_mode='valid', groups=1,
                        **conv1_weights)
builder.add_convolution(name='conv2', input_name='input2', output_name='conv2_output',
                        kernel_channels=128, output_channels=800, height=1, width=1,
                        stride_height=1, stride_width=1, border_mode='valid', groups=1,
                        **conv2_weights)
# Concatenate along the channel axis
builder.add_concat_nd('concat', ['conv1_output', 'conv2_output'], 'concat_output', axis=-3)
# Depthwise 5x1 convolution (groups == number of channels)
builder.add_convolution(name='conv3', input_name='concat_output', output_name='output',
                        kernel_channels=1, output_channels=1600, height=5, width=1,
                        stride_height=1, stride_width=1, border_mode='valid', groups=1600,
                        **conv3_weights)
save_spec(builder.spec, 'Block.mlmodel')
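As a sanity check on the shapes, here is a small sketch in plain Python (no coremltools needed) that walks through the layers above. It assumes the rank-4 arrays are laid out as (N, C, H, W) and that every convolution is 'valid' with stride 1, as in the builder calls. The helper names are made up for illustration.

```python
def conv_out_len(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1


def conv_shape(shape, out_channels, kernel_h, kernel_w):
    """Shape after a 'valid', stride-1 convolution on an (N, C, H, W) tensor."""
    n, _, h, w = shape
    return (n, out_channels, conv_out_len(h, kernel_h), conv_out_len(w, kernel_w))


input1 = (1, 800, 43, 1)
input2 = (1, 128, 43, 1)

conv1_out = conv_shape(input1, 800, 1, 1)
conv2_out = conv_shape(input2, 800, 1, 1)

# Concatenation along the channel axis (axis=-3) adds the channel counts.
concat_out = (conv1_out[0], conv1_out[1] + conv2_out[1], conv1_out[2], conv1_out[3])

# Depthwise 5x1 convolution keeps the 1600 channels and shortens H by 4.
conv3_out = conv_shape(concat_out, 1600, 5, 1)

declared_output = (1, 800, 39, 1)  # shape passed to NeuralNetworkBuilder

print("computed output shape:", conv3_out)
print("matches declared shape:", conv3_out == declared_output)
```

Under those assumptions the last layer produces (1, 1600, 39, 1), while the declared 'output' shape is (1, 800, 39, 1). This check cannot tell whether that mismatch has anything to do with the ANE behavior.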
(I tried attaching the file itself, but at 3 MB it is too big to upload here.)
Then, running it on the CPU, GPU and Neural Engine with random inputs, I get the following results:
Index   CPU         GPU         ANE
...
12568   -23.01966   -23.01562   0.2047119
12569   -22.09334   -22.09375   0.2047119
12570   -21.53324   -21.53125   0.2047119
12571   -21.68019   -21.6875    0.2047119
12572   -21.8576    -21.85938   0.2047119
12573   -24.22912   -24.23438   0.2047119
12574   -21.20876   -21.20312   0.2047119
12575   -22.21795   -22.21875   0.2047119
12576   -22.34912   -22.34375   0.2047119
12577   -22.42678   -22.42188   0.2047119
12578   -22.49558   -22.5       0.2047119
...
It looks like the ANE got "stuck" on an invalid output and ignores changes in the input.
What I tried so far:
- Removing any of the convolutions made the network work as expected.
- Replacing the ConcatND with Concat did not change the behavior.
- Adding copy layers before and after the ConcatND did not change the behavior.
- It seems to happen on iPhone 13 and 14, but not on the 3rd generation iPhone SE.
Any ideas on what is wrong and how to fix it?
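For anyone who wants to check their own dumps the same way, this is a minimal sketch of the comparison behind the table above. It uses only the rows shown in this post, treats the CPU output as the reference, and flags a backend as "stuck" when all of its values are identical. The function names and the 0.01 tolerance are arbitrary choices for illustration, not part of any Core ML API.

```python
# Rows copied from the table above: (index, cpu, gpu, ane)
rows = [
    (12568, -23.01966, -23.01562, 0.2047119),
    (12569, -22.09334, -22.09375, 0.2047119),
    (12570, -21.53324, -21.53125, 0.2047119),
    (12571, -21.68019, -21.6875, 0.2047119),
    (12572, -21.8576, -21.85938, 0.2047119),
    (12573, -24.22912, -24.23438, 0.2047119),
    (12574, -21.20876, -21.20312, 0.2047119),
    (12575, -22.21795, -22.21875, 0.2047119),
    (12576, -22.34912, -22.34375, 0.2047119),
    (12577, -22.42678, -22.42188, 0.2047119),
    (12578, -22.49558, -22.5, 0.2047119),
]


def max_abs_diff(reference, other):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(r - o) for r, o in zip(reference, other))


def is_constant(values, tol=0.0):
    """True if all values lie within `tol` of each other."""
    return max(values) - min(values) <= tol


cpu = [row[1] for row in rows]
gpu = [row[2] for row in rows]
ane = [row[3] for row in rows]

TOLERANCE = 0.01  # arbitrary threshold, chosen only for this illustration

gpu_err = max_abs_diff(cpu, gpu)
ane_err = max_abs_diff(cpu, ane)
ane_stuck = is_constant(ane)

print("GPU within tolerance of CPU:", gpu_err <= TOLERANCE)
print("ANE within tolerance of CPU:", ane_err <= TOLERANCE)
print("ANE output is constant:", ane_stuck)
```

On the rows shown here, the GPU stays within the tolerance of the CPU, while the ANE column is both far from the reference and constant.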