Hello -
I was hopeful someone could help me understand the syntax of address space qualifiers that I have seen in the metal standard library. I have seen the qualifiers used on constructors and on functions such as the following:
struct A {
A() thread : { }
A() device : { }
void do_it() thread { device int& param } // 1
void do_it() thread { thread int& param } // 2
void do_it() device { device int& param } // 3
void do_it() device { thread int& param } // 4
}
What do the address qualifiers mean in this context exactly? Are these overloads depending on where A resides in memory? For example:
kernel void process_data(device A& d_var, device uint& d_out ) {
thread A p_var { }; // OK
threadgroup A t_var { }; // ERROR?
thread int p_out = 0;
threadgroup int t_out = 0;
p_var.do_it(d_out); // calls 1?
p_var.do_it(p_out); // calls 2?
p_var.do_it(t_out); // Error?
d_var.do_it(d_out); // calls 3?
d_var.do_it(p_out); // calls 4?
d_var.do_it(t_out); // Error?
}
Or does this mean something else entirely? Would it be valid to add an address space qualifier to a global function? For example:
void do_it() device; // declared in program scope
If so, what does it mean for a function to be in (e.g.) device memory? Thank you for helping me understanding this better.
So the struct you posted would not compile but I'm pretty sure this is what you meant:
struct A {
A() thread : { }
A() device : { }
void do_it(device int& param ) thread { } // 1
void do_it( thread int& param ) thread { } // 2
void do_it( device int& param) device { } // 3
void do_it(thread int& param) device { } // 4
}
Address space after the method is the address space of the object (ie of the “this” pointer). This is analogous to how other qualifiers (const, volatile, etc) can be placed after the method to qualify the type of object.
So in this specific example, methods 1 and 2 can only be called on objects declared in the thread address space, whereas methods 3 and 4 can only be called on objects declared in the device address space.
kernel void process_data(device A& d_var, device int& d_out ) {
thread A p_var { }; // Uses the first constructor since p_var is declared with the thread address space
threadgroup A t_var { }; // Error because there is no threadgroup contructor A
thread int p_out = 0;
threadgroup int t_out = 0;
p_var.do_it(d_out); // calls 1 because p_var uses the thread address space and d_out uses the device address space
p_var.do_it(p_out); // calls 2 because p_var uses the thread address space and p_out uses the thread address space
p_var.do_it(t_out); // Error because t_out uses the threadgroup address space and there is no method which takes a reference using the threadgroup address space
d_var.do_it(d_out); // calls 3 because d_var uses the device address space and d_out uses the device address space
d_var.do_it(p_out); // calls 4 because d_var uses the device address space and p_out uses the thread address space
d_var.do_it(t_out); // Error because t_out uses the threadgroup address space and there is no method which takes a reference using the threadgroup address space
You can't add a qualifier to global function since there can be no object associated with it.