      Latest reply on Dec 5, 2019 5:40 AM by Luxo
      Luxo Level 1 (0 points)

        Both the Xcode sample project and the developer guide list a Metal shader that uses a matrix to transform a Y'CbCr pixel into an R'G'B' pixel. I'm trying to understand the last column of that matrix which, as far as I can tell, shifts the values so that they fall within the expected 0...1 range.


        The docs for `capturedImage` also mention the transform, saying only that "Unlike some uses of [ITU R. 601-4], ARKit captures full-range color space values, not video-range values. To correctly render these images on a device display, you'll need to access the luma and chroma planes of the pixel buffer and convert full-range YCbCr values to an sRGB (or ITU R. 709) format according to the ITU-T T.871 specification."


        The transform in question:

        // Each float4 below is one column of the matrix.
        const float4x4 ycbcrToRGBTransform = float4x4(
          float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
          float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
          float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
          float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
        );


        As far as I can tell, the first three columns (written as rows in the code) match the coefficients recommended in ITU-T T.871, as discussed in the documentation for ARFrame.capturedImage.
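To make the column layout concrete, here is a small plain-Python stand-in for the shader's multiply. It is only a sketch: I'm assuming the shader multiplies the matrix by the vector (Y, Cb, Cr, 1) with all values normalized to 0...1, and `ycbcr_to_rgb` is a name I made up for illustration.

```python
# Columns copied from the shader above. Metal's float4x4 constructor takes
# four float4 columns, so each inner list here is one column of the matrix.
YCBCR_TO_RGB_COLUMNS = [
    [+1.0000, +1.0000, +1.0000, +0.0000],  # scaled by Y
    [+0.0000, -0.3441, +1.7720, +0.0000],  # scaled by Cb
    [+1.4020, -0.7141, +0.0000, +0.0000],  # scaled by Cr
    [-0.7010, +0.5291, -0.8860, +1.0000],  # scaled by the constant 1
]


def ycbcr_to_rgb(y, cb, cr):
    """Hypothetical helper: matrix * (y, cb, cr, 1), column by column."""
    vec = [y, cb, cr, 1.0]  # assumption: the shader appends 1.0 as w
    out = [0.0, 0.0, 0.0, 0.0]
    for column, scale in zip(YCBCR_TO_RGB_COLUMNS, vec):
        for row in range(4):
            out[row] += column[row] * scale
    return out


if __name__ == "__main__":
    # Mid-gray: luma 0.5, both chroma channels at their midpoint.
    print(ycbcr_to_rgb(0.5, 0.5, 0.5))
```

Because the fourth component of the input is always 1, the fourth column is added to every output pixel unchanged. With these numbers, a mid-gray input (0.5, 0.5, 0.5) comes out as approximately (0.5, 0.5, 0.5, 1.0).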


        However, the last column is a mystery to me. It appears to be a translation that's applied after the T.871 conversion: it subtracts 0.701 from the R value, adds 0.5291 to the G value, and subtracts 0.886 from the B value.
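One reading I can't confirm from the docs: if the T.871 conversion is written with the chroma channels centered on 0.5 (my assumption for values normalized to 0...1, standing in for the 8-bit midpoint of 128), then expanding the brackets produces exactly these constants:

```latex
\begin{aligned}
R' &= Y' + 1.402\,(C_r - 0.5) = Y' + 1.402\,C_r - 0.701 \\
G' &= Y' - 0.3441\,(C_b - 0.5) - 0.7141\,(C_r - 0.5) = Y' - 0.3441\,C_b - 0.7141\,C_r + 0.5291 \\
B' &= Y' + 1.772\,(C_b - 0.5) = Y' + 1.772\,C_b - 0.886
\end{aligned}
```

If that is right, the last column would just be the chroma re-centering folded into the matrix, but I'd appreciate confirmation.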


        What's happening here? Is this to account for P3 color? Is there any way we can get an idea of how to derive these values ourselves?