How to capture the region of the camera stream that corresponds to a referenceImage?

We are using RealityKit with reference images, and we are able to detect the reference image. Now I want to dynamically capture the region of the camera stream where the physical image appears: not the whole screen, only the exact region covered by the physical image.

At a high level, the basic steps would be the following:

  1. Use the ARImageAnchor's transform to retrieve where the image is located in world space.
  2. Using the physicalSize of the image anchor's referenceImage, compute the extent of the tracked image and the positions of its four corners (in world space).
  3. Using the project(_:) method on the ARView, project those 3D points into 2D screen space. This gives you the 2D coordinates, in the view's coordinate space, of the region containing the image.
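
The three steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `worldCorners` and `screenRegion` are hypothetical, and it assumes the image lies in the anchor's local x-z plane, centered on the anchor origin, with the top edge toward negative z.

```swift
import ARKit
import RealityKit
import UIKit
import simd

/// Steps 1 and 2: corners of the tracked image in world space.
/// Assumption: the image lies in the anchor's local x-z plane, centered
/// on the anchor origin, with y as the surface normal.
func worldCorners(transform: simd_float4x4, physicalSize: CGSize) -> [SIMD3<Float>] {
    let halfW = Float(physicalSize.width) / 2
    let halfH = Float(physicalSize.height) / 2
    let local: [SIMD4<Float>] = [
        SIMD4(-halfW, 0, -halfH, 1), // top-left
        SIMD4( halfW, 0, -halfH, 1), // top-right
        SIMD4( halfW, 0,  halfH, 1), // bottom-right
        SIMD4(-halfW, 0,  halfH, 1), // bottom-left
    ]
    // Move each local corner into world space with the anchor transform.
    return local.map { corner -> SIMD3<Float> in
        let world = transform * corner
        return SIMD3<Float>(world.x, world.y, world.z)
    }
}

/// Step 3: project the corners into the ARView and return their
/// axis-aligned bounding box in view coordinates.
/// Returns nil if any corner cannot be projected.
func screenRegion(of imageAnchor: ARImageAnchor, in arView: ARView) -> CGRect? {
    let corners = worldCorners(transform: imageAnchor.transform,
                               physicalSize: imageAnchor.referenceImage.physicalSize)
    let points = corners.compactMap { arView.project($0) }
    guard points.count == corners.count,
          let minX = points.map(\.x).min(), let maxX = points.map(\.x).max(),
          let minY = points.map(\.y).min(), let maxY = points.map(\.y).max()
    else { return nil }
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}
```

The returned rectangle is a bounding box in the view's coordinate space, so when the image is seen at an angle it will include some background around the image. If you crop a snapshot or a camera frame with it, you would likely also need to convert from view coordinates to the pixel dimensions of that image first.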