Confusion over the `constant` memory address space and memory layout

Situation (simplified):

There're 3 Metal buffers (created on app initialization via MTLDevice – not via MTLHeap);

  • Buffer A: contains total of n float3 elements (which may represent particle positions)
  • Buffer B: double-buffered A
  • Buffer C: contains total of n float elements (i-th element of B corresponds to i-th element of A (and also B))


I also have 3 kernels (Metal compute functions), which maipulate the 3 buffers;

  • Kernel preprocess(A, B): its parameters are A and B. It reads (old) values from A and writes updated values into B. [1 thread = 1 element]
  • Kernel process(B, C): this is a very computation-expensive/time-comsuming kernel. (Just to be clear: each element of buffer B represents particle position.) For each element/particle from B, the kernel finds the particle's nearest neighbors (i.e. their positions). After the neighbors are found, it reduces their positions into one float value. The float value is written to buffer C. [1 thread = 1 element (i.e. particle)]
  • Kernel postprocess(A, B, C): it reads element from B and C, combines them in a simple way and writes the result back to A. (Something like A[i] = B[i]*C[i])[1 thread = 1 element]



Problem:

The kernel process takes serious amount of time to complete (Because for each particle it iterates over a range of B's values. Particles that are near (i.e. adjacent threads) iterate over very similar ranges in B – there's heavy memory reuse of B between adjacent kernels). The kernel is defined similarly to this:

kernel void process(..., device float3* B, ...) {...}



Question:

Since the kernel process loads repeteadly the same regions of B (even though I use threadgroup memory) and B is read-only in this kernel, I thought it'd be a good idea for performance improvement to cache the buffer B – to transfer it into a highly cached portion of memory optimized for repeated reads. AFAIK such portion of memory is the constant address space. Is it possible to do the following? And if it is, then how?

  1. Call the kernel preprocess(A, B) [this updates the contents of buffer B – hence buffer B cannot be in the constant address space]
  2. Transfer the buffer B (containing the new, updated values from previous step) into constant address space. <——— How do I do this?
  3. Call the kernel process(B, C) [read repeatedly values from the buffer B which is in constant address space (improving performance)]
  4. Call the kernel postprocess(A, B, C)


Extra question:

When I create an buffer at the app initialization, I don't specify what address space it belongs to, so when is it decided that a buffer belongs into device or constant address space? Is it per Command Buffer, per Encoder or per each individual Pipeline state of one Encoder? Or does a buffer's position in memory address spaces change throughout the app lifetime?


Example: Say I have 1 buffer and 2 kernels:

// CPU side:
var data = device.newBuffer(...)

commandEncoder.setComputePipelineState(fooPipeline);
commandEncoder.setBuffer(data, offset: 0, at: 0)
commandEncoder.dispatchThreadgroups(...)

commandEncoder.setComputePipelineState(barPipeline);
commandEncoder.setBuffer(data, offset: 0, at: 0)
commandEncoder.dispatchThreadgroups(...)

commandEncoder.endEncoding()

// GPU side
kernel foo(device float* data){...}  // foo writes to data
kernel bar(constant float& data){...} // bar only reads from data


Where is the buffer data stored (in the device or the (cached) constant address space space)? What would happen when I swap the order in which are foo and bar dispatched?

>when is it decided that a buffer belongs into device or constant address space?

See: https://developer.apple.com/library/ios/documentation/Metal/Reference/MetalShadingLanguageGuide/func-var-qual/func-var-qual.html

"All arguments to a graphics (vertex or fragment) or compute function that are a pointer or reference to a type must be declared with an address space qualifier. For graphics functions, an argument that is a pointer or reference to a type must be declared in the

device
or
constant
address space.

The address space for a variable at program scope must be

constant
.

Any variable that is a pointer or reference must be declared with one of the address space qualifiers discussed in this section. If an address space qualifier is missing on a pointer or reference type declaration, a compilation error occurs."

Best of luck w/your homework 😉

I'm getting this error: "Program scope variable must reside in constant address space" in Xcode in a C header file. I've tried several usages of constant but can't guess the correct syntax the compiler is looking for.

constant int var = 90; /unknown type name 'constant'/ __constant int var = 90; /unknown type name '__constant'/ [constant] int var = 90; /expected identifier or '('/Expected unqualified-id/use of undeclared identifier 'constant'/ [[constant]] int var = 90; /The above plus a warning: Unknown attribute 'constant' ignored/ [[__constant]] int var = 90;/Some of the above plus:Program scope variable must reside in constant address space/ static int var = 90; /* Two copies of:Program scope variable must reside in constant address space*/

What is the correct syntax to declare variables in constant address space in C header files?

Confusion over the `constant` memory address space and memory layout
 
 
Q