Matching Virtual Object Depth with ARFrame Estimated Depth Data

I am trying to do a hit test of sorts between a person in my ARFrame and a RealityKit Entity. So far I have been able to take the position value of my entity and project it to a CGPoint, which I can match up with the ARFrame's segmentationBuffer to determine whether a person intersects with that entity. Now I want to find out whether that person is at the same depth as that entity. How do I relate the SIMD3 position value for the entity, which I believe is in meters, to the estimatedDepthData values?
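
For reference, a minimal sketch of the projection step described above, assuming a RealityKit ARView named arView; the names are illustrative, not the poster's actual code:

Code Block
import ARKit
import RealityKit
import UIKit

// Sketch: project an entity's world-space position (in meters) to a point in
// the view's coordinate space. Assumes an ARView called `arView`.
func screenPoint(for entity: Entity, in arView: ARView) -> CGPoint? {
    // position(relativeTo: nil) is the entity's position in world space.
    let worldPosition = entity.position(relativeTo: nil)
    // ARView.project(_:) returns nil when the point cannot be projected,
    // for example when it is behind the camera.
    return arView.project(worldPosition)
}
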
Answered by DTS Engineer in 619385022

The depth data in the estimatedDepthData pixel buffer is estimated linear depth, in meters, from the camera's point of view.

So, if you have a pixel where your entity intersects with the segmentationBuffer, you can unproject that position into world space using the estimated linear depth, which you may be able to use as a sort of rough hit test.

This sample contains an unprojection method which may be useful for reference: https://developer.apple.com/documentation/arkit/visualizing_a_point_cloud_using_scene_depth
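
If it helps, here is a rough CPU-side sketch of that unprojection, along the lines of what the linked sample does on the GPU. It assumes the pixel coordinate is in the captured image's coordinate space and the depth is in meters; the axis conventions may need adjusting for a particular setup:

Code Block
import ARKit
import simd

// Sketch: unproject a captured-image pixel with an estimated depth (meters)
// into world space. `pixel` is in captured-image coordinates, so coordinates
// taken from the lower-resolution depth buffer need to be scaled up first.
func worldPoint(pixel: simd_float2, depthMeters: Float, camera: ARCamera) -> simd_float3 {
    let intrinsics = camera.intrinsics      // 3x3 intrinsics for the captured image
    let cameraToWorld = camera.transform    // camera pose in world space

    // Back-project through the inverse intrinsics and scale by depth.
    let localPoint = simd_inverse(intrinsics) * simd_float3(pixel.x, pixel.y, 1) * depthMeters

    // Image space has +y down and +z forward; ARKit camera space has +y up
    // and looks down -z, hence the sign flips.
    let cameraSpacePoint = simd_float4(localPoint.x, -localPoint.y, -localPoint.z, 1)
    let world = cameraToWorld * cameraSpacePoint
    return simd_float3(world.x, world.y, world.z)
}
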
Thanks for the suggestion. Since posting this I have indeed been able to get the beginnings of a hit test going with the segmentationBuffer, but then when I try to use the estimatedDepthData, I run into trouble extracting values.

Here's some of my code:
Code Block
let segmentationCols = CVPixelBufferGetWidth(segmentationBuffer)
let segmentationRows = CVPixelBufferGetHeight(segmentationBuffer)
let colPosition = screenPosition.x / UIScreen.main.bounds.width * CGFloat(segmentationCols)
let rowPosition = screenPosition.y / UIScreen.main.bounds.height * CGFloat(segmentationRows)
CVPixelBufferLockBaseAddress(segmentationBuffer, .readOnly)
guard let baseAddress = CVPixelBufferGetBaseAddress(segmentationBuffer) else { return }
let bytesPerRow = CVPixelBufferGetBytesPerRow(segmentationBuffer)
let buffer = baseAddress.assumingMemoryBound(to: UInt8.self)
let index = Int(colPosition) + Int(rowPosition) * bytesPerRow
let b = buffer[index]
if let segment = ARFrame.SegmentationClass(rawValue: b), segment == .person, let depthBuffer = frame.estimatedDepthData {
print("Person!")
CVPixelBufferLockBaseAddress(depthBuffer, .readOnly)
guard let depthAddress = CVPixelBufferGetBaseAddress(depthBuffer) else { return }
let depthBytesPerRow = CVPixelBufferGetBytesPerRow(depthBuffer)
let depthBoundBuffer = depthAddress.assumingMemoryBound(to: Float32.self)
let depthIndex = Int(colPosition) * Int(rowPosition)
let depth_b = depthBoundBuffer[depthIndex]
print(depth_b)
CVPixelBufferUnlockBaseAddress(depthBuffer, .readOnly)
}
CVPixelBufferUnlockBaseAddress( segmentationBuffer, .readOnly )


I strongly suspect that my problems are in lines 19 and 20 of my code above, but I can't figure out the right values to find the point I want in the estimatedDepthData.

It looks like the error is in line 20:

Code Block
let depthIndex = Int(colPosition) * Int(rowPosition)


You should try:

Code Block
let depthIndex = Int(colPosition) + Int(rowPosition) * width // Where width is CVPixelBufferGetWidth(pixelBuffer)

Hey, thanks for the suggestion. That is actually what I had initially, similar to line 10, but it wasn't working, so I started messing around with other values to see if I could get something to work. Neither one works, though. Any other ideas? Most examples I've come across are Metal implementations and don't have corresponding code for what I'm trying to do.
Accepted Answer
It's difficult to say where you've gone wrong. The following method will extract the value at the provided image coordinate from the depth texture:

Code Block
extension CVPixelBuffer {
    func value(column: Int, row: Int) -> Float? {
        // Only 32-bit float depth buffers are handled here.
        guard CVPixelBufferGetPixelFormatType(self) == kCVPixelFormatType_DepthFloat32 else { return nil }
        CVPixelBufferLockBaseAddress(self, .readOnly)
        if let baseAddress = CVPixelBufferGetBaseAddress(self) {
            let width = CVPixelBufferGetWidth(self)
            // Row-major layout: whole rows first, then the column within the row.
            let index = column + row * width
            let offset = index * MemoryLayout<Float>.stride
            let value = baseAddress.load(fromByteOffset: offset, as: Float.self)
            CVPixelBufferUnlockBaseAddress(self, .readOnly)
            return value
        }
        CVPixelBufferUnlockBaseAddress(self, .readOnly)
        return nil
    }
}


I recommend that you start here, make sure you can get valid values, and then work forward from there to see where your issue is. It is likely an error in converting between coordinate spaces somewhere.
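
A quick sketch of how the extension might be called from an ARFrame, with illustrative names (screenPosition is assumed to be a point in view coordinates); the naive scaling here ignores the aspect-ratio issue mentioned further down the thread:

Code Block
import ARKit
import UIKit

// Sketch: look up the estimated depth under a screen point using the extension
// above. `screenPosition` is assumed to be in the view's coordinate space.
func estimatedDepth(at screenPosition: CGPoint, in frame: ARFrame) -> Float? {
    guard let depthBuffer = frame.estimatedDepthData else { return nil }
    // Naive scaling from view coordinates to buffer coordinates.
    let column = Int(screenPosition.x / UIScreen.main.bounds.width * CGFloat(CVPixelBufferGetWidth(depthBuffer)))
    let row = Int(screenPosition.y / UIScreen.main.bounds.height * CGFloat(CVPixelBufferGetHeight(depthBuffer)))
    return depthBuffer.value(column: column, row: row)
}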

That extension was super helpful and solved my problems, so thank you so much! Comparing the extension to my code, I think the key problem was in fact what you highlighted earlier: I needed to account for the pixel buffer width. In my previous implementation I had been accounting only for the bytes per row, which is what I thought you were saying too, but in fact you need to account for both.

Thanks again!
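
For anyone comparing the two approaches, here is a rough sketch of the index arithmetic for a 32-bit float depth buffer. It computes a byte offset from bytesPerRow, which also covers the case where rows are padded beyond width * 4 bytes (the function name is illustrative):

Code Block
import CoreVideo

// Sketch: read one Float32 depth value, computing the byte offset from
// bytesPerRow (rows may be padded) plus the column within the row.
func depthValue(in pixelBuffer: CVPixelBuffer, column: Int, row: Int) -> Float? {
    guard CVPixelBufferGetPixelFormatType(pixelBuffer) == kCVPixelFormatType_DepthFloat32 else { return nil }
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return nil }
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let offset = row * bytesPerRow + column * MemoryLayout<Float32>.stride
    return base.load(fromByteOffset: offset, as: Float32.self)
}
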
One tricky bit I have discovered is that when working with an iPhone, the screen aspect ratio does not match the aspect ratio of the depth buffer, so translating between a screen position and a buffer coordinate requires disregarding some of the buffer width on each side.
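
One way to handle that mapping, rather than discarding buffer columns by hand, might be ARFrame's displayTransform(for:viewportSize:), which relates normalized image coordinates to normalized view coordinates. A rough sketch, assuming a full-screen portrait view (the names are illustrative):

Code Block
import ARKit
import UIKit

// Sketch: convert a point in view coordinates to a (column, row) in a pixel
// buffer that shares the captured image's aspect ratio, using the inverse of
// the frame's display transform. Assumes portrait orientation.
func bufferCoordinate(for viewPoint: CGPoint,
                      in frame: ARFrame,
                      buffer: CVPixelBuffer,
                      viewportSize: CGSize) -> (column: Int, row: Int) {
    // displayTransform maps normalized image coordinates to normalized view
    // coordinates; invert it to go from the view back to the image.
    let toImage = frame.displayTransform(for: .portrait, viewportSize: viewportSize).inverted()
    let normalizedViewPoint = CGPoint(x: viewPoint.x / viewportSize.width,
                                      y: viewPoint.y / viewportSize.height)
    let normalizedImagePoint = normalizedViewPoint.applying(toImage)
    let column = Int(normalizedImagePoint.x * CGFloat(CVPixelBufferGetWidth(buffer)))
    let row = Int(normalizedImagePoint.y * CGFloat(CVPixelBufferGetHeight(buffer)))
    return (column, row)
}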