Understanding the `ycbcrToRGBTransform` in the sample code Metal shader

Both the Xcode sample project and the developer guide list a Metal shader that uses a matrix to transform a Y'CbCr pixel into an R'G'B' pixel. I'm trying to understand the last column of that matrix, which, as far as I can tell, shifts the values so that they fall within the expected 0...1 range.


The docs for `capturedImage` also mention the transform, saying only that "Unlike some uses of [ITU R. 601-4], ARKit captures full-range color space values, not video-range values. To correctly render these images on a device display, you'll need to access the luma and chroma planes of the pixel buffer and convert full-range YCbCr values to an sRGB (or ITU R. 709) format according to the ITU-T T.871 specification."


The transform in question:

const float4x4 ycbcrToRGBTransform = float4x4(
  float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
  float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
  float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
  float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
);


As far as I can tell, the first three columns (or rows, as they're formatted in code) are the same as the ones recommended in ITU-T T.871 and discussed in the documentation for `ARFrame.capturedImage`.


However, the last column is a mystery to me. It appears to be a translation that's applied after the T.871 conversion: it subtracts 0.701 from the R value, adds 0.5291 to the G value, and subtracts 0.886 from the B value.


What's happening here? Is this to account for P3 color? Is there any way we can get an idea of how to derive these values ourselves?

Replies

This matrix is fully described by the conversion equations in T.871. As an exercise, you can perform the matrix multiplication by hand with a generic column vector of [Y, Cb, Cr, 1], and you will arrive at the same YCbCr to RGB equations as in T.871, accurate to four decimal places.
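
For anyone who would rather check this numerically than by hand, here is a rough sketch in plain C++ (not Metal) of the same exercise. It assumes the usual normalized form of the full-range equations, with Y, Cb and Cr in 0...1 and chroma centred at 0.5, and it rounds the coefficients to the four decimals used in the shader. The names `transform`, `t871` and `agree` are made up for this illustration and do not come from the sample project.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// Columns of ycbcrToRGBTransform as listed in the shader. Metal's float4x4
// constructor takes column vectors, so each row of this table is one column.
constexpr float kColumns[4][4] = {
    {+1.0000f, +1.0000f, +1.0000f, +0.0000f},  // scaled by Y
    {+0.0000f, -0.3441f, +1.7720f, +0.0000f},  // scaled by Cb
    {+1.4020f, -0.7141f, +0.0000f, +0.0000f},  // scaled by Cr
    {-0.7010f, +0.5291f, -0.8860f, +1.0000f},  // scaled by the constant 1
};

// Matrix times column vector: a weighted sum of the matrix columns.
Vec4 transform(const Vec4& v) {
    Vec4 out = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            out[r] += kColumns[c][r] * v[c];
        }
    }
    return out;
}

// Full-range conversion written the "textbook" way: subtract the 0.5 chroma
// offset first, then scale (assumed normalized form, 4-decimal coefficients).
Vec4 t871(float y, float cb, float cr) {
    Vec4 out = {
        y + 1.4020f * (cr - 0.5f),
        y - 0.3441f * (cb - 0.5f) - 0.7141f * (cr - 0.5f),
        y + 1.7720f * (cb - 0.5f),
        1.0f,
    };
    return out;
}

// True when both forms give the same R, G, B (and alpha) within tol.
bool agree(float y, float cb, float cr, float tol = 1e-4f) {
    const Vec4 in = {y, cb, cr, 1.0f};
    const Vec4 a = transform(in);
    const Vec4 b = t871(y, cb, cr);
    for (int i = 0; i < 4; ++i) {
        if (std::fabs(a[i] - b[i]) > tol) return false;
    }
    return true;
}
```

Expanding the offset form shows where the last column comes from: 1.402 × 0.5 = 0.701 for R, (0.3441 + 0.7141) × 0.5 = 0.5291 for G, and 1.772 × 0.5 = 0.886 for B. In other words, the fourth column is the chroma offset folded into the matrix so that the shader can do the whole conversion in a single multiply.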

So it does! A useful exercise, thank you.


A further question: the video pixel buffer's color primaries attachment states 'P3 D65'. Does that mean that, in the context of the shader from the example code, we're in what could be described as an 'sRGB linear extended' color space? That is, will a maximally saturated P3 red give values of R greater than 1 in the shader?