Model produces incorrect results on Neural Engine, runs well on CPU/GPU

We are having trouble running a model on the Neural Engine on iPhone 13 and 14. After some debugging we came up with this minimal reproduction model. The model has 2 inputs. Each one goes through a convolution, the two results are concatenated along the channel axis, and the concatenated tensor goes through a depth-wise convolution.

It was constructed with the following code:

    from coremltools.models import datatypes
    from coremltools.models.neural_network import NeuralNetworkBuilder
    from coremltools.models.utils import save_spec

    # conv1_weights, conv2_weights and conv3_weights are dicts with the
    # remaining add_convolution arguments (presumably W, b, has_bias); not shown.
    builder = NeuralNetworkBuilder(
        [
            ('input2', datatypes.Array(1, 128, 43, 1)),
            ('input1', datatypes.Array(1, 800, 43, 1)),
        ],
        [('output', datatypes.Array(1, 800, 39, 1))],
        disable_rank5_shape_mapping=True,
        use_float_arraytype=True)
    # One 1x1 convolution per input, each producing 800 channels.
    builder.add_convolution(name='conv1', input_name='input1', output_name='conv1_output', kernel_channels=800, output_channels=800, height=1, width=1, stride_height=1, stride_width=1, border_mode='valid', groups=1, **conv1_weights)
    builder.add_convolution(name='conv2', input_name='input2', output_name='conv2_output', kernel_channels=128, output_channels=800, height=1, width=1, stride_height=1, stride_width=1, border_mode='valid', groups=1, **conv2_weights)
    # Concatenate along the channel axis (axis -3 of the rank-4 tensor).
    builder.add_concat_nd('concat', ['conv1_output', 'conv2_output'], 'concat_output', axis=-3)
    # Depth-wise 5x1 convolution: groups equals the number of channels.
    builder.add_convolution(name='conv3', input_name='concat_output', output_name='output', kernel_channels=1, output_channels=1600, height=5, width=1, stride_height=1, stride_width=1, border_mode='valid', groups=1600, **conv3_weights)
    save_spec(builder.spec, 'Block.mlmodel')
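To make the layer arithmetic easier to follow, here is a minimal sketch that traces the tensor shapes through the network in (channels, height, width) order. It is plain Python and does not call coremltools. The helper names `conv_valid` and `concat_channels` are made up for illustration, and the sketch assumes that the `'valid'` border mode applies no padding.

```python
# Shape trace for the reproduction network, in (channels, height, width) order.
# The helpers below are illustrative only; they are not coremltools APIs.

def conv_valid(shape, out_channels, kernel_h, kernel_w, stride_h=1, stride_w=1):
    """Output shape of an unpadded ('valid') convolution."""
    _, h, w = shape
    out_h = (h - kernel_h) // stride_h + 1
    out_w = (w - kernel_w) // stride_w + 1
    return (out_channels, out_h, out_w)

def concat_channels(a, b):
    """Shape after concatenating two feature maps along the channel axis."""
    assert a[1:] == b[1:], "spatial dimensions must match"
    return (a[0] + b[0],) + a[1:]

input1 = (800, 43, 1)
input2 = (128, 43, 1)

conv1_out = conv_valid(input1, 800, 1, 1)           # 1x1 conv on input1
conv2_out = conv_valid(input2, 800, 1, 1)           # 1x1 conv on input2
concat_out = concat_channels(conv1_out, conv2_out)  # 800 + 800 channels
conv3_out = conv_valid(concat_out, 1600, 5, 1)      # depth-wise 5x1 conv

print("concat:", concat_out)
print("conv3: ", conv3_out)
```

Under these assumptions the last layer yields 1600 channels and a height of 39, while the builder declares the output as `(1, 800, 39, 1)`. Whether that difference has anything to do with the Neural Engine behavior is not established here.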

(I tried attaching the model file itself, but at 3 MB it is too big to upload here.)

Running the model on the CPU, GPU, and Neural Engine (ANE) with random inputs, I get the following results:

Index    CPU          GPU          ANE
...
12568    -23.01966    -23.01562    0.2047119
12569    -22.09334    -22.09375    0.2047119
12570    -21.53324    -21.53125    0.2047119
12571    -21.68019    -21.6875     0.2047119
12572    -21.8576     -21.85938    0.2047119
12573    -24.22912    -24.23438    0.2047119
12574    -21.20876    -21.20312    0.2047119
12575    -22.21795    -22.21875    0.2047119
12576    -22.34912    -22.34375    0.2047119
12577    -22.42678    -22.42188    0.2047119
12578    -22.49558    -22.5        0.2047119
...
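One way to quantify the disagreement is to compare the backends element by element and check whether an output is constant. The sketch below does this in plain Python on the eleven rows listed above. The helper names `max_abs_diff` and `is_constant` and the tolerance are arbitrary choices for illustration, not part of any Core ML API.

```python
# Values copied from the rows with indices 12568..12578 in the table above.
cpu = [-23.01966, -22.09334, -21.53324, -21.68019, -21.8576, -24.22912,
       -21.20876, -22.21795, -22.34912, -22.42678, -22.49558]
gpu = [-23.01562, -22.09375, -21.53125, -21.6875, -21.85938, -24.23438,
       -21.20312, -22.21875, -22.34375, -22.42188, -22.5]
ane = [0.2047119] * 11

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two outputs."""
    return max(abs(x - y) for x, y in zip(a, b))

def is_constant(values, tol=1e-6):
    """True if all values lie within tol of each other."""
    return max(values) - min(values) <= tol

print("CPU vs GPU max abs diff:", max_abs_diff(cpu, gpu))
print("CPU vs ANE max abs diff:", max_abs_diff(cpu, ane))
print("ANE output constant:", is_constant(ane))
```

On these rows the CPU and GPU agree to within roughly 0.01, which is plausible for a reduced-precision GPU path, while the ANE values differ from the CPU by more than 20 and are identical to each other.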

It looks as if the ANE got "stuck" on an invalid output and ignores changes in the input. What I have tried so far:

  • Removing any of the convolutions made the network work as expected.
  • Replacing the ConcatND with Concat did not change the behavior.
  • Adding copy layers before and after the ConcatND did not change the behavior.
  • It seems to happen on iPhone 13 and 14, but not on the 3rd generation iPhone SE.

Any ideas for what is wrong and how to fix it?

Hello, I have also encountered the same problem. Could you please look into it?
