Good day people!
I'm currently working on my master's thesis in media informatics. I'd really appreciate the chance to discuss my topic with you, so I might get some interesting ideas or new information.
The goal is to implement an app specifically designed for places like museums, where the environment isn't ideal for AR tracking (darkness, no network connection, maybe exhibits made of glass...).
Therefore, I'd like to develop a neural network for the new iPad Pro that takes RGB-D data and estimates an object's pose in the scene, so that the virtual object matches the real-world object perfectly. This placed object will be an exact 3D model replica of the real one (hand-modeled, or scanned and revised).
This should allow me to place AR content precisely over the real-world object, even in difficult lighting and similar conditions. Maybe it will improve occlusion, too. I can imagine that the neural network may also detect structures, edges and semantic relationships better than the usual approach.
My first thought was to work with Core ML, Metal, maybe Vision and ARKit. I will also be trying out Xcode for the first time.
Maybe you have interesting ideas for improvement or can guide me a little, since I feel a bit lost at the moment.
Would you rather use point clouds or the raw depth buffer to train the model? Would you also train with edge-filtered images and the like? Why or why not?
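For context, this is roughly how I imagine the point-cloud route: unprojecting the LiDAR depth map with the (rescaled) camera intrinsics. It's only a sketch I haven't verified on device, so the intrinsics scaling and the simple pinhole convention are my own assumptions.

```swift
import ARKit
import simd

// Rough sketch (untested): unproject the LiDAR depth map into a point cloud
// using a simple pinhole-camera convention. The coordinate convention and the
// intrinsics rescaling are my own assumptions.
func pointCloud(from frame: ARFrame) -> [SIMD3<Float>] {
    guard let depthMap = frame.sceneDepth?.depthMap else { return [] }

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    let width    = CVPixelBufferGetWidth(depthMap)      // 256 on the LiDAR iPad Pro
    let height   = CVPixelBufferGetHeight(depthMap)     // 192
    let rowBytes = CVPixelBufferGetBytesPerRow(depthMap)
    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return [] }

    // The intrinsics are given for the full-resolution camera image,
    // so scale them down to the depth map resolution.
    let K      = frame.camera.intrinsics
    let camRes = frame.camera.imageResolution
    let sx = Float(width)  / Float(camRes.width)
    let sy = Float(height) / Float(camRes.height)
    let fx = K[0][0] * sx, fy = K[1][1] * sy
    let cx = K[2][0] * sx, cy = K[2][1] * sy

    var points: [SIMD3<Float>] = []
    for v in 0..<height {
        let row = base.advanced(by: v * rowBytes).assumingMemoryBound(to: Float32.self)
        for u in 0..<width {
            let z = row[u]                               // depth in metres
            guard z > 0, z.isFinite else { continue }
            let x = (Float(u) - cx) * z / fx
            let y = (Float(v) - cy) * z / fy
            points.append(SIMD3<Float>(x, y, z))
        }
    }
    return points
}
```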
Thanks in advance, it would mean the world to me!
Kind regards, Miri :-)
Hello developers,
I am currently struggling with providing the right input data for a deep learning model I want to integrate into my Swift app. (I am fairly new to Swift and iOS development, so please bear with me.)
The app is supposed to run only on an iPad Pro with a LiDAR sensor, no other devices. I'm working with DenseFusion for 6DoF pose estimation, which requires an RGB-D image as input (it's trained on the YCB-Video dataset).
I have already looked at different examples of how to stream depth data from the camera (e.g. the fog sample) and how to capture an image with depth information.
So I set up a session that delivers the video and depth data as CVPixelBuffers.
However, I don't really know what to do with them after that, or how to fuse them into one RGB-D image.
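For reference, here is roughly where I am with the session part. This is only a sketch of my current understanding; `DenseFusionInput` and `model` are placeholders for my converted Core ML model, and passing colour and depth as two separate inputs (instead of a literal four-channel image) is just my assumption.

```swift
import ARKit

// Sketch of my current setup: one ARKit session that delivers a synchronised
// colour frame and LiDAR depth map. The Core ML part is only indicated as
// comments, because I haven't converted the model yet.
final class FrameProvider: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        let config = ARWorldTrackingConfiguration()
        // On the LiDAR iPad Pro this adds a 256x192 Float32 depth map per frame.
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
            config.frameSemantics.insert(.sceneDepth)
        }
        session.delegate = self
        session.run(config)
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let colour = frame.capturedImage                              // YCbCr CVPixelBuffer, full camera resolution
        guard let depth = frame.sceneDepth?.depthMap else { return }  // Float32 CVPixelBuffer, 256x192

        // My current plan (assumption): convert/resize the colour buffer to match
        // the depth resolution (vImage or Core Image) and hand both buffers to the
        // Core ML model as two separate inputs instead of building a literal
        // 4-channel "RGB-D image".
        // let input = DenseFusionInput(rgb: resizedColour, depth: depth)   // placeholder names
        // let pose  = try? model.prediction(input: input)
        _ = (colour, depth)
    }
}
```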
Also, I want to use the predicted pose to place a 3D model into the scene so I can attach AR content to it (rough sketch of my plan below).
(The app's purpose is to check whether deep learning can outperform RealityKit in situations like poor lighting conditions.)
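This is how I imagine the placement step afterwards; `poseInCameraSpace` is just a placeholder for whatever the network returns, and I know I still have to sort out the axis conventions between ARKit's camera space and the usual computer-vision one.

```swift
import ARKit
import RealityKit

// Sketch of the placement step: turn the predicted object pose (relative to the
// camera) into a world-space transform and anchor the replica there.
func place(replica: Entity, poseInCameraSpace: simd_float4x4, frame: ARFrame, in arView: ARView) {
    // world_T_object = world_T_camera * camera_T_object
    // (The axis-convention mismatch between ARKit's camera space and the usual
    // CV camera convention is ignored here and still needs to be handled.)
    let worldTransform = frame.camera.transform * poseInCameraSpace

    let anchor = AnchorEntity(world: worldTransform)
    anchor.addChild(replica)
    arView.scene.addAnchor(anchor)
}
```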
I'd be glad about any bit of help. Thanks in advance!
I am fairly new to iOS development, and I have a rather simple question (I guess):
I exported an FBX model from Blender with attached child elements (some empty axes whose transforms I need for a certain purpose in my app). However, when I place my model entity in the AR scene and try to access its child elements, they don't show up, even though they do appear in the preview in Xcode when I click on the model file.
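This is roughly what I'm doing at the moment ("Exhibit" and "Empty_Axis_1" are just placeholder names for my file and one of the empties). Maybe loading via ModelEntity.loadModel is already the problem and I should use Entity.load(named:) to keep the hierarchy?

```swift
import RealityKit

// Roughly my current code. I convert the .fbx to .usdz with Reality Converter
// first, since as far as I know RealityKit can't load FBX directly.
func addReplica(to anchor: AnchorEntity) {
    guard let model = try? ModelEntity.loadModel(named: "Exhibit") else { return }
    anchor.addChild(model)

    // This prints an empty list for me, even though the empties are visible
    // in the Xcode preview of the file.
    print(model.children.map { $0.name })

    if let axis = model.findEntity(named: "Empty_Axis_1") {
        print("found the empty at", axis.position)
    } else {
        print("empty not found – this is where I'm stuck")
    }
}
```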
Can someone please explain why this happens and how I can access them? That would be lovely. Thanks in advance for your help!