Running a compute shader over only a subregion of a texture

As per the title, I wish to use a compute shader, but run it only over a rectangular subregion of a texture. The textures are currently marked as MTLResourceStorageModePrivate since that best represents their intended use.

Seeming dead ends already explored:
  • setStageInRegion: is barely documented and it's unclear to me what purpose it has, but I assume it relates to per-thread kernel inputs, since that's what you'd mark as [[stage_in]], so that's something else;

  • texture views seem to vary only in pixel format and layout, there's no obvious way to create a texture view that is a subregion of another texture;

  • neither of the dispatch functions on MTLComputeCommandEncoder takes a region;

  • there is similarly nothing obvious in MTLComputePipelineState, or indeed in MTLComputePassDescriptor.

Is what I want to do supported? It'd be a real hassle to have to cram myself into a 1:1 render pass just to get this functionality.

EDIT: I guess I could adjust the number of threadgroups I dispatch to set a size and then provide an offset via a buffer, adding that to whatever comes out of my gid before accessing the source and destination. But is that within the bounds of intended usage of compute shaders?
Answered by Ceylo in 631729022
When executing compute shaders you just dispatch a grid of tasks. Whether the grid elements correspond one-to-one to the pixels of your texture is up to you. So your guess is correct: dispatch a grid the size of the region of the source texture, and add an offset (the origin of the region) to the gid to get the pixel coordinate to read.
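To make the index arithmetic concrete, here is a minimal CPU-side sketch in plain C++. It is not Metal API code: the names Region, kernelBody and dispatchOverRegion are hypothetical, the texture is modelled as a flat float array, and the nested loops stand in for the GPU launching one thread per grid position. In a real kernel the origin would arrive through a small buffer argument and gid would be the thread's position in the grid.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type: a rectangular subregion of the texture, in pixels.
struct Region { uint32_t x, y, width, height; };

// Stand-in for the kernel body. (gidX, gidY) is the thread's position in the
// dispatched grid; (originX, originY) is the offset supplied by the host.
inline void kernelBody(std::vector<float>& tex, uint32_t texWidth,
                       uint32_t originX, uint32_t originY,
                       uint32_t gidX, uint32_t gidY) {
    const uint32_t px = originX + gidX;  // pixel coordinate = origin + gid
    const uint32_t py = originY + gidY;
    tex[py * texWidth + px] += 1.0f;     // placeholder for the real work
}

// Stand-in for dispatching a grid exactly the size of the region.
inline void dispatchOverRegion(std::vector<float>& tex,
                               uint32_t texWidth, uint32_t texHeight,
                               Region r) {
    // The region must lie inside the texture.
    assert(r.x + r.width <= texWidth && r.y + r.height <= texHeight);
    for (uint32_t gidY = 0; gidY < r.height; ++gidY) {
        for (uint32_t gidX = 0; gidX < r.width; ++gidX) {
            kernelBody(tex, texWidth, r.x, r.y, gidX, gidY);
        }
    }
}
```

Calling dispatchOverRegion on an 8x6 texture with Region{2, 1, 3, 2} touches only the six pixels with x in 2..4 and y in 1..2. Everything outside the region is left alone, because no grid position maps to it.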
Cool. I probably should have added: it doesn't negatively affect anything if I process the same data multiple times; it's just a performance hazard. Right now I'm using -dispatchThreads:threadsPerThreadgroup:, which obviates the issue anyway, but -dispatchThreadgroups:threadsPerThreadgroup: wouldn't be problematic either on hardware that supports only uniform threadgroup sizes.

That said, this is a Mac app so it looks like non-uniform threadgroup sizes are always available.
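For anyone who does end up on the -dispatchThreadgroups:threadsPerThreadgroup: path, a rough sketch of the usual arithmetic follows, again in plain C++ rather than Metal API code. The helper names threadgroupsFor and insideRegion are hypothetical. The idea is to round the threadgroup count up so the grid covers the whole region, then have the kernel skip any thread whose gid falls past the region's edge. In my case the guard is optional, since processing extra pixels is harmless, but it avoids the wasted work.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: number of threadgroups to dispatch in each dimension.
struct GroupCount { uint32_t x, y; };

// Round up so that groups * threadsPerGroup covers the whole region.
inline GroupCount threadgroupsFor(uint32_t regionW, uint32_t regionH,
                                  uint32_t tgW, uint32_t tgH) {
    assert(tgW > 0 && tgH > 0);
    return { (regionW + tgW - 1) / tgW,    // ceil(regionW / tgW)
             (regionH + tgH - 1) / tgH };  // ceil(regionH / tgH)
}

// The guard a kernel would apply before adding the origin: threads in the
// rounded-up overhang return early instead of touching pixels past the region.
inline bool insideRegion(uint32_t gidX, uint32_t gidY,
                         uint32_t regionW, uint32_t regionH) {
    return gidX < regionW && gidY < regionH;
}
```

For a 100x50 region with 16x16 threadgroups this gives 7x4 groups, i.e. a 112x64 grid, so the last column and row of groups overhang the region and are the ones the guard filters out.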
