I am trying to build an app, using SwiftUI, that lets the user interact with an MTKView and the point cloud drawn in it: tapping in the view should return the 3D coordinates of the tapped location.
Based on the Displaying a Point Cloud Using Scene Depth sample app, I managed to save a frame and the data needed to draw the captured image in the MTKView. From what I read in this post, I need to store a given set of information (depthMap / capturedImage / ...) to be able to access the 3D data of each point, but I am not sure I fully understand how to do this.
- I suppose that, by unprojecting each depth-map pixel with the cameraIntrinsicsInversed matrix, scaling by its depth value, and then applying the localToWorld matrix, I can recreate a 3D map of the captured image?
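Concretely, this is the unprojection I have in mind, translated from what the sample's Shaders.metal does in its vertex shader (the function and parameter names are my own, and I am assuming I have kept cameraIntrinsicsInversed and localToWorld from the stored frame):

```swift
import simd

/// Unprojects a single depth-map sample into world space, mirroring the
/// unprojection in the sample's Shaders.metal.
/// `cameraPoint` is the pixel position in *captured-image* coordinates
/// (the intrinsics are defined for the full-resolution image, so
/// depth-map coordinates have to be scaled up accordingly first).
func worldPoint(cameraPoint: SIMD2<Float>,
                depth: Float,
                cameraIntrinsicsInversed: simd_float3x3,
                localToWorld: simd_float4x4) -> SIMD3<Float> {
    // Back-project the pixel through the inverse intrinsics and scale by
    // its depth to get a point in camera (local) space.
    let localPoint = cameraIntrinsicsInversed
        * SIMD3<Float>(cameraPoint.x, cameraPoint.y, 1) * depth
    // Transform from camera space into world space.
    let world = localToWorld
        * SIMD4<Float>(localPoint.x, localPoint.y, localPoint.z, 1)
    return SIMD3<Float>(world.x, world.y, world.z) / world.w
}
```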
Once the 3D map is created, I would have a 256x192 grid that maps each depth-map pixel to its 3D coordinates. This would mean that, when the user taps in the MTKView, I only have to fetch the coordinates for the tapped location and show them to the user.
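To build that per-pixel map I would read the depth values out of the stored depthMap, which (if I understand correctly) is a 256x192 kCVPixelFormatType_DepthFloat32 buffer. This is my own sketch of that step:

```swift
import CoreVideo

/// Reads the Float32 depth value (in metres) at a given depth-map pixel.
/// Assumes `depthMap` is the sceneDepth.depthMap CVPixelBuffer
/// (kCVPixelFormatType_DepthFloat32, 256x192).
func depthValue(atX x: Int, y: Int, in depthMap: CVPixelBuffer) -> Float {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
    let baseAddress = CVPixelBufferGetBaseAddress(depthMap)!
    // Each row is `bytesPerRow` bytes; each element is a Float32 depth.
    let rowPointer = baseAddress.advanced(by: y * bytesPerRow)
    return rowPointer.assumingMemoryBound(to: Float32.self)[x]
}
```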
However, the MTKView is much bigger than the drawn image. On top of that, I would expect the drawn image to either be 256x192 pixels or share the aspect ratio of the capturedImage and the depthMap, which is not the case.
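To cope with this size difference, my current idea is to store frame.displayTransform(for:viewportSize:) alongside the frame and use its inverse to go from a tap in the view back to depth-map pixel coordinates, roughly like this (the property names are placeholders of mine, and I am assuming the captured image is drawn aspect-filled into the view):

```swift
import UIKit

/// Converts a tap location in the MTKView into a pixel coordinate in the
/// 256x192 depth map. `displayTransform` is assumed to be the result of
/// frame.displayTransform(for:viewportSize:) saved with the frame, and
/// `viewSize` the MTKView's bounds.size used for that call.
func depthMapCoordinate(for tap: CGPoint,
                        viewSize: CGSize,
                        displayTransform: CGAffineTransform,
                        depthMapSize: CGSize = CGSize(width: 256, height: 192)) -> CGPoint {
    // Normalize the tap into [0, 1] view coordinates.
    let normalizedTap = CGPoint(x: tap.x / viewSize.width,
                                y: tap.y / viewSize.height)
    // displayTransform maps normalized image coordinates to normalized view
    // coordinates, so apply the inverse to get back into image space.
    let normalizedImagePoint = normalizedTap.applying(displayTransform.inverted())
    // Scale into depth-map pixels (the depth map shares the captured
    // image's aspect ratio, just at a lower resolution).
    return CGPoint(x: normalizedImagePoint.x * depthMapSize.width,
                   y: normalizedImagePoint.y * depthMapSize.height)
}
```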
- Is there a way to fit the drawn image to the whole MTKView? Do I need to set a frame for the MTKView? If so, what size should it be, since I cannot seem to find the size of the drawn image?
Is my train of thought correct or am I missing some information to make this possible? Any help would be much appreciated!