Hello developers,
I'm currently struggling to provide the right input data for a deep learning model I want to integrate into my Swift app. (I'm fairly new to Swift and iOS development, so please bear with me.)
The app is only supposed to run on an iPad Pro with a LiDAR sensor, no other devices. I'm working with DenseFusion for 6DoF pose estimation, which requires an RGB-D image as input (it's trained on the YCB-Video dataset).
I've already looked at different examples of how to stream depth data from the camera (the fog example) and how to capture an image with depth information. So I set up a session that delivers the video and the depth data as CVPixelBuffers.
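To make it concrete, this is roughly my current setup (the ARKit scene-depth route, like in the fog sample); the class and names are just placeholders:

```swift
import ARKit

final class CaptureController: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        let config = ARWorldTrackingConfiguration()
        // sceneDepth needs a LiDAR device, which matches the iPad Pro target
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
            config.frameSemantics.insert(.sceneDepth)
        }
        session.delegate = self
        session.run(config)
    }

    // Called for every frame: capturedImage is the RGB camera buffer (YCbCr),
    // sceneDepth?.depthMap is the LiDAR depth buffer (Float32, meters)
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let colorBuffer: CVPixelBuffer = frame.capturedImage
        guard let depthBuffer: CVPixelBuffer = frame.sceneDepth?.depthMap else { return }
        // ... this is where I'd hand both buffers to the model input pipeline
        _ = (colorBuffer, depthBuffer)
    }
}
```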
However, I don't really know what to do with the two buffers after that, or how to fuse them into a single RGB-D image.
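For reference, my rough (possibly wrong) attempt so far is just reading the depth values out of the buffer; the helper is made up by me, and I'm not sure whether resizing the depth map to the RGB resolution afterwards and stacking it as a fourth channel is even the right approach:

```swift
import CoreVideo

// Tentative helper: copy the LiDAR depth buffer (kCVPixelFormatType_DepthFloat32)
// into a plain [Float] array. The depth map is much smaller than the RGB buffer
// (256x192 on my iPad, as far as I can tell), so some alignment/resizing would
// still be needed before fusing it with the color image.
func depthValues(from depthBuffer: CVPixelBuffer) -> [Float] {
    CVPixelBufferLockBaseAddress(depthBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthBuffer, .readOnly) }

    let width = CVPixelBufferGetWidth(depthBuffer)
    let height = CVPixelBufferGetHeight(depthBuffer)
    let rowBytes = CVPixelBufferGetBytesPerRow(depthBuffer)
    guard let base = CVPixelBufferGetBaseAddress(depthBuffer) else { return [] }

    var values = [Float]()
    values.reserveCapacity(width * height)
    for y in 0..<height {
        let row = base.advanced(by: y * rowBytes).assumingMemoryBound(to: Float32.self)
        for x in 0..<width {
            values.append(row[x]) // depth in meters
        }
    }
    return values
}
```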
I also want to use the predicted pose to place a 3D model into the scene so I can attach AR content to it.
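This is roughly what I imagine that final step looking like, assuming the network gives me a 4x4 pose of the object in camera coordinates (I suspect I'd also need to handle the axis-convention difference between the vision model's camera frame and ARKit's camera frame):

```swift
import ARKit
import RealityKit
import simd

// Rough idea, not a working solution: convert the predicted camera-space pose
// into world space using the frame's camera transform, then anchor a model there.
func place(model: ModelEntity,
           poseInCamera: simd_float4x4,
           frame: ARFrame,
           in arView: ARView) {
    let worldPose = frame.camera.transform * poseInCamera
    let anchor = AnchorEntity(world: worldPose)
    anchor.addChild(model)
    arView.scene.addAnchor(anchor)
}
```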
(The app's purpose is to check whether deep learning can outperform RealityKit in situations like poor lighting conditions, etc.)
I'd be glad about any help, however small. Thanks in advance!