Apple M1 - tf.sort only sorts up to 16 values for float32

my_array is defined as a constant tensor with these values:

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([0.39002007, 0.6232998 , 0.65246916, 0.51837456, 0.32046252,
       0.17287847, 0.1020941 , 0.05556634, 0.03855091, 0.04841335,
       0.08809784, 0.17805861, 0.29818463, 0.48202834, 0.63666624,
       0.68172085, 0.66695976, 0.64094126, 0.6494308 , 0.66173404],
      dtype=float32)>

tf.sort(my_array) returns the following tensor:

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([ 0.03855091,  0.04841335,  0.05556634,  0.08809784,  0.1020941 ,
        0.17287847,  0.17805861,  0.29818463,  0.32046252,  0.39002007,
        0.48202834,  0.51837456,  0.6232998 ,  0.63666624,  0.64094126,
        0.6494308 , -0.        , -0.        , -0.        , -0.        ],
      dtype=float32)>

Only the 16 smallest values come back in sorted order; the last four positions hold -0. instead of the four largest values. The same behavior occurs with tf.argsort. When I cast the tensor to float64, the error disappears.
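For anyone who wants to confirm they are hitting the same problem, here is a minimal sketch of a check in plain Python. The helper name is_valid_sort is hypothetical (not part of TensorFlow), and the two lists are simply the values copied from the tensors printed above; with a real tensor you would pass something like my_array.numpy().tolist().

```python
def is_valid_sort(original, result):
    """Return True if result is non-decreasing and a permutation of original.

    Hypothetical helper for illustration; it is not a TensorFlow API.
    """
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and sorted(original) == sorted(result)


# Values copied from the input tensor in the question.
original = [
    0.39002007, 0.6232998, 0.65246916, 0.51837456, 0.32046252,
    0.17287847, 0.1020941, 0.05556634, 0.03855091, 0.04841335,
    0.08809784, 0.17805861, 0.29818463, 0.48202834, 0.63666624,
    0.68172085, 0.66695976, 0.64094126, 0.6494308, 0.66173404,
]

# Values copied from the tensor that tf.sort returned on the GPU.
reported = [
    0.03855091, 0.04841335, 0.05556634, 0.08809784, 0.1020941,
    0.17287847, 0.17805861, 0.29818463, 0.32046252, 0.39002007,
    0.48202834, 0.51837456, 0.6232998, 0.63666624, 0.64094126,
    0.6494308, -0.0, -0.0, -0.0, -0.0,
]

# Values present in the input but missing from the reported output.
missing = sorted(set(original) - set(reported))

print(is_valid_sort(original, reported))
print(missing)
```

The check fails for the reported output because the trailing -0. entries break the ordering and four input values are missing.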

I installed tensorflow following https://developer.apple.com/metal/tensorflow-plugin/

Is this a bug?

Still persists in tensorflow 2.6.0.

Oops, sorry this is not an answer and I don't know how to delete this.

Thank you for posting this. I have encountered the same problem. It also happens with tf.argsort():

import tensorflow as tf

data = tf.random.uniform([20])
argsorted = tf.argsort(data, 0, stable=True)
print(f'{argsorted=}')

output:

argsorted=<tf.Tensor: shape=(20,), dtype=int32, numpy=array([ 4, 17, 9, 1, 7, 12, 8, 14, 3, 0, 18, 13, 5, 10, 15, 19, 0, 0, 0, 0], dtype=int32)>

I installed tensorflow, following the exact path you have pointed out.

Persists in tensorflow-metal==0.3.0

I've got the same issue. For what it's worth, casting to float16 works as well. The issue also only occurs when the tensor is on the GPU. If you move the tensor to the CPU and then sort, sorting works as expected.
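A minimal sketch of the two workarounds mentioned in this thread, assuming a TensorFlow 2.x install. The helper names sort_on_cpu and sort_via_float64 are hypothetical, and I have not verified them against every tensorflow-metal version, so treat this as a starting point rather than a confirmed fix.

```python
import tensorflow as tf


def sort_on_cpu(values):
    """Sort with the op pinned to the CPU (hypothetical helper)."""
    with tf.device("/CPU:0"):
        return tf.sort(values)


def sort_via_float64(values):
    """Sort after casting to float64, then cast back to the input dtype.

    Hypothetical helper; relies on the report above that float64 sorts
    correctly. float32 -> float64 -> float32 is lossless.
    """
    as_f64 = tf.cast(values, tf.float64)
    return tf.cast(tf.sort(as_f64), values.dtype)


data = tf.random.uniform([20])
cpu_sorted = sort_on_cpu(data)
f64_sorted = sort_via_float64(data)

print(cpu_sorted)
print(f64_sorted)
```

Pinning the op to the CPU costs a device transfer, which should be negligible for small tensors like the ones here.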

Still there with tensorflow-deps==2.9, tensorflow-macos==2.9.2 and tensorflow-metal==0.5.0.
