In CUDA terminology, a thread block (the counterpart of a Metal threadgroup) is executed by a "streaming multiprocessor." In Metal terminology, is a threadgroup executed by a "core" or by an "execution unit" within a core? I can't find any resource online that answers this directly, and the resources I have found imply different answers.
Regardless of the answer, why do Apple GPUs have this two-layer architecture of cores and execution units, whereas Nvidia has a single layer of streaming multiprocessors? Are both layers visible/accessible to the Metal programmer, or only one layer (whichever corresponds to threadgroups)? What's the purpose of the other layer?