BGRX pixel formats or swizzling options

Hi,


In other graphics APIs, you often have access to a pixel format like RGBX or BGRX, in which there is a fourth channel that is explicitly unused and treated as 255 or 1.0. In OpenGL you can achieve a similar effect with the GL_TEXTURE_SWIZZLE_A parameter forcing the alpha channel of an RGBA/BGRA texture to GL_ONE.


Is there any way to emulate this behavior in Metal? There doesn't seem to be an RGBX or BGRX format, nor any similar texture/sampler swizzling properties. Am I stuck using RGBA and manually slamming the alpha channel of each texel to 255 on the CPU before creating the texture and doing a replaceRegion?
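
For reference, the CPU fallback described above is at least a short loop. This is a minimal C++ sketch, assuming a tightly packed buffer with 4 bytes per pixel and alpha stored in the last byte of each texel (the layout of both RGBA8 and BGRA8). `force_opaque_alpha` is a hypothetical helper name, not part of Metal:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrites the alpha byte of every texel with 255.
// Assumption: 4 bytes per pixel, alpha at byte offset 3 (RGBA8 or BGRA8),
// and no row padding between scanlines.
void force_opaque_alpha(std::vector<std::uint8_t>& pixels)
{
    for (std::size_t i = 3; i < pixels.size(); i += 4) {
        pixels[i] = 0xFF;
    }
}
```

The patched bytes would then be handed to replaceRegion as usual; the cost is one extra pass over the image on every upload.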


Thanks,

Mike

Replies

Why on Earth on the CPU? Consider that just about everything one does in Metal requires a shader or compute kernel, so after sampling or reading from the texture you can simply force some of the channels to the desired value(s). The exception is when you’re using shaders/kernels without access to their source code (like Metal Performance Shaders). Then the CPU is one option, yes. Two others would be:

1) Avoid replaceRegion altogether (which can be a good idea anyway) and upload just the non-trivial pixel values as a buffer, then have a fragment shader in a render-to-texture pass “unpack” them and write them into the destination along with the constant channel values.

2) Upload the whole texture without the constant channel values and have a render-to-texture pass read the source values and write them, plus the constants, into a target texture.

Hope that helps,
Michal

I see what you're saying about those buffer-based methods of resource population being more efficient than the CPU method I suggested, so thanks for that! They do involve some custom shader code, though.


I guess ideally I would be able to use a texture format to describe this behavior, so the shader can be texture-format agnostic and also sample from textures where I don't want to force the alpha channel to 1. Sounds like I might be out of luck relative to DirectX and OpenGL on that front.

Hi,


I am stuck on the same issue. Were you able to find a solution? As you mentioned, buffers don't allow sampling, so I also want to use a texture for rendering 24-bit-per-pixel images.
Also @MikeAlpha, can you suggest something for 1-bit-per-pixel images? Currently I have to convert them to 8 bits per pixel and then use an R8 pixel format.
Any suggestions are welcome.

If you want to use hardware _sampling_, then converting to R8 (as well as making mipmaps, I guess) is the only way I can think of.

On the other hand, if you're OK with writing your own sampling routines, then you could do as follows:

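1) Load the data into R8 directly (so don't convert/expand every bit into a byte holding 0 or 255, but put each group of 8 bits of the 1-bit-deep texture into one texel of an 8-bit-deep texture). Since the code below reads the texture as texture2d<uint>, this needs to be an unsigned-integer format such as MTLPixelFormatR8Uint.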

2) Write your own sampling code, using texture read() functions, the following way (this is very pseudo code, no bounds checking, and all just typed here). This way you'll still get hardware cache support. It should work reasonably fast. And if memory serves it can be even a bit more accurate than "true" hardware sampling, which at least used to have limited precision on some hardware.


// coord is given in 1bit increments (so first bit has 0, second has 1 and so on)
uint read_1bit_packed(texture2d<uint> texture, uint2 coord)
{
     uint shift = coord.x & 0x7;
     coord.x >>= 3;
     // read() returns a 4-component vector; the packed byte is in the red channel
     return (texture.read(coord).r >> shift) & 0x1;
}

// coord is given in <0,1> range
float sample_1bit(texture2d<uint> texture, float2 coord)
{
     // convert to floating point coordinates of sample position (in bits)
     float2 access = float2(coord.x * texture.get_width() * 8, coord.y * texture.get_height());
     // sampling indices and weights
     uint i0 = uint(access.x), j0 = uint(access.y);
     float h = access.x - i0, v = access.y - j0;
     // load four values
     float v00 = read_1bit_packed(texture, uint2(i0, j0)),
          v10 = read_1bit_packed(texture, uint2(i0 + 1, j0)),
          v01 = read_1bit_packed(texture, uint2(i0, j0 + 1)),
          v11 = read_1bit_packed(texture, uint2(i0 + 1, j0 + 1));
     // bilinear interpolation
     return
          ( ( ( (1.0 - h) * v00) + (h * v10) ) * (1.0 - v) ) +
          ( ( ( (1.0 - h) * v01) + (h * v11) ) * v );
}
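
Step 1 above happens on the CPU before the upload. The following is a minimal C++ sketch of that packing, assuming the source image is available as one 0/1 value per pixel in row-major order; `pack_1bit_rows` is a hypothetical helper name. It stores pixel x in bit (x & 7) of byte (x >> 3), least significant bit first, which is the order `read_1bit_packed` expects:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs 0/1 pixels into bytes, 8 pixels per byte, least significant bit first.
// Each row is padded to a whole number of bytes, so the resulting R8 texture
// is ((width + 7) / 8) texels wide and `height` texels tall.
std::vector<std::uint8_t> pack_1bit_rows(const std::vector<std::uint8_t>& bits,
                                         std::size_t width, std::size_t height)
{
    const std::size_t bytes_per_row = (width + 7) / 8;
    std::vector<std::uint8_t> packed(bytes_per_row * height, 0);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (bits[y * width + x] != 0) {
                packed[y * bytes_per_row + (x >> 3)] |=
                    static_cast<std::uint8_t>(1u << (x & 7));
            }
        }
    }
    return packed;
}
```

If the source data is already packed most-significant-bit first, as in several common 1-bit file formats, it may be simpler to upload it unchanged and use a shift of `7 - (coord.x & 0x7)` in the shader instead.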

Sorry, forgot to add: the example above omits index range checking. So either the 1-bit texture should be a little bigger than the area being sampled, or the coordinates should be clamped so that nothing is read outside the texture, which can crash.
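
The clamping option can be sketched as a CPU-side reference in C++, which is also handy for checking the shader's output. The `Packed1Bit` struct and both function names are hypothetical and stand in for the R8 texture and the two shader functions; in the shader itself the same effect would come from applying `min()` to the coordinates against `get_width() * 8 - 1` and `get_height() - 1` before calling `read()`:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for the packed R8 texture: width_bytes * height bytes, row-major.
struct Packed1Bit {
    std::vector<std::uint8_t> bytes;
    std::uint32_t width_bytes;
    std::uint32_t height;
};

// Coordinates are in bits; out-of-range values are clamped to the edge.
std::uint32_t read_1bit_clamped(const Packed1Bit& t, std::uint32_t x, std::uint32_t y)
{
    x = std::min(x, t.width_bytes * 8u - 1u);
    y = std::min(y, t.height - 1u);
    const std::uint32_t shift = x & 0x7u;
    return (t.bytes[y * t.width_bytes + (x >> 3)] >> shift) & 0x1u;
}

// u and v are in the <0,1> range; values outside it are clamped first.
float sample_1bit_clamped(const Packed1Bit& t, float u, float v)
{
    u = std::min(std::max(u, 0.0f), 1.0f);
    v = std::min(std::max(v, 0.0f), 1.0f);
    const float ax = u * static_cast<float>(t.width_bytes * 8u);
    const float ay = v * static_cast<float>(t.height);
    const std::uint32_t i0 = static_cast<std::uint32_t>(ax);
    const std::uint32_t j0 = static_cast<std::uint32_t>(ay);
    const float h = ax - static_cast<float>(i0);
    const float w = ay - static_cast<float>(j0);
    const float v00 = static_cast<float>(read_1bit_clamped(t, i0, j0));
    const float v10 = static_cast<float>(read_1bit_clamped(t, i0 + 1, j0));
    const float v01 = static_cast<float>(read_1bit_clamped(t, i0, j0 + 1));
    const float v11 = static_cast<float>(read_1bit_clamped(t, i0 + 1, j0 + 1));
    // Bilinear interpolation, same weighting as the shader version.
    return (((1.0f - h) * v00 + h * v10) * (1.0f - w)) +
           (((1.0f - h) * v01 + h * v11) * w);
}
```

Clamping costs a couple of `min` operations per read, while padding the texture keeps the read path unchanged at the price of slightly more memory.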