argsort performance on coreML

When the input dimension is 600w, the operator runs on ANE. But when the input shape is 100w or 200w, this operator can only run on the CPU. The data dimension has decreased, but it does not run on ANE. What is the reason for this and what are the ways to avoid it

argsort performance on coreML
 
 
Q