I'm experimenting with a library that generates OpenCL code (coriander). I'm trying to launch a compute kernel, but I can't get more than 256 threads per workgroup.
Can anyone confirm whether this is a hardware limitation? I can't find this limit in the Metal feature set tables for the M1.
I'd like to know whether 256 really is the maximum threadgroup size, or whether the problem lies in the OpenCL drivers or in the library doing the translation.
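One way to narrow this down is to compare the limits the OpenCL runtime itself reports. In OpenCL 1.x, a 1-D launch's local size is bounded by `CL_DEVICE_MAX_WORK_GROUP_SIZE` and `CL_DEVICE_MAX_WORK_ITEM_SIZES[0]` (both from `clGetDeviceInfo`) and by the per-kernel `CL_KERNEL_WORK_GROUP_SIZE` (from `clGetKernelWorkGroupInfo`). The global size must also be evenly divisible by the local size. The sketch below assumes those three values have already been queried. The helper `pick_local_size` is hypothetical and not part of coriander or the OpenCL API. It only shows how the limits combine:

```c
#include <assert.h>
#include <stddef.h>

/* Smallest of three size_t values. */
static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Hypothetical helper for a 1-D launch. The limits are assumed to be the
 * values returned by:
 *   device_max_wg   -> clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_GROUP_SIZE, ...)
 *   device_max_item -> clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_ITEM_SIZES, ...)[0]
 *   kernel_max_wg   -> clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...)
 * Returns the largest local size that exceeds none of them and evenly
 * divides global_size (required when a local size is given in OpenCL 1.x).
 */
size_t pick_local_size(size_t global_size, size_t device_max_wg,
                       size_t device_max_item, size_t kernel_max_wg) {
    assert(global_size > 0);
    size_t cap = min3(device_max_wg, device_max_item, kernel_max_wg);
    if (cap > global_size) {
        cap = global_size;  /* a work-group can't be larger than the launch */
    }
    /* Walk down from the cap until we hit a divisor of the global size. */
    for (size_t local = cap; local > 1; --local) {
        if (global_size % local == 0) {
            return local;
        }
    }
    return 1;
}
```

For example, `pick_local_size(4096, 1024, 1024, 256)` returns 256. The device-level limits alone would allow 1024 here, but the kernel-level cap still bounds the launch at 256. If the runtime reports 256 for `CL_KERNEL_WORK_GROUP_SIZE` while the device limit is higher, that would point to the OpenCL implementation's handling of that particular kernel rather than a fixed hardware ceiling.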
Thanks in advance.