Good day people!
I'm currently working on my master's thesis in media informatics. I'd really appreciate discussing my topic with you guys, so I might pick up some interesting ideas or new information.
The goal is to implement an app specifically designed for places like museums, where the environment isn't ideal for AR tracking (darkness, no network connection, maybe exhibits made out of glass...).
Therefore, I'd like to develop a neural network for the new iPad Pro that takes RGB-D data and estimates the pose of an object in the scene, so that a virtual replica matches the real-world object exactly. The placed object will be an accurate 3D model of the real object (hand-modelled, or scanned and cleaned up). This should allow me to place AR content precisely over the real-world object even under difficult lighting, and maybe improve occlusion too. I can imagine that a neural network might also pick up structures, edges and semantic relations better than the usual tracking approach.
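To make the data side concrete: this is roughly how I imagine grabbing RGB-D training frames with ARKit's scene depth API on the LiDAR iPad (just a sketch, assuming the documented frame.sceneDepth path; nothing here is final):

```swift
import ARKit

final class DepthCaptureDelegate: NSObject, ARSessionDelegate {
    func startSession(_ session: ARSession) {
        let config = ARWorldTrackingConfiguration()
        // LiDAR depth is only exposed on devices that support scene depth
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
            config.frameSemantics.insert(.sceneDepth)
        }
        session.delegate = self
        session.run(config)
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // RGB image straight from the camera
        let colorBuffer: CVPixelBuffer = frame.capturedImage
        // Per-pixel depth in metres from the LiDAR sensor (nil on unsupported devices)
        guard let depthBuffer = frame.sceneDepth?.depthMap else { return }
        // colorBuffer + depthBuffer together would form one RGB-D training sample
        _ = (colorBuffer, depthBuffer)
    }
}
```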
My first thought was to work with Core ML, Metal, maybe Vision and ARKit. I will also be trying out Xcode for the first time.
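For the inference side, my rough idea is to run the trained network on each camera frame via Vision. The model name PosePredictor and the shape of its outputs are pure placeholders here, just to sketch the wiring:

```swift
import Vision
import CoreML
import ARKit

// Sketch only: "model" stands for a hypothetical compiled pose-estimation network.
func estimatePose(from frame: ARFrame, with model: MLModel) throws {
    let vnModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: vnModel) { req, _ in
        // The concrete results type depends on the model's output description,
        // e.g. VNCoreMLFeatureValueObservation for a raw pose vector.
        guard let results = req.results else { return }
        print("model returned \(results.count) observation(s)")
    }
    // Vision handles scaling/cropping of the ARKit camera image to the model's input size.
    request.imageCropAndScaleOption = .scaleFill
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .right,
                                        options: [:])
    try handler.perform([request])
}
```

As far as I understand, Vision only feeds a single image input, so if the network takes RGB and depth as two separate inputs I'd probably have to call the MLModel prediction API directly with an MLFeatureProvider instead.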
Maybe you guys have interesting ideas for improvement or can guide me a little, since I feel a bit lost at the moment. Would you rather use point clouds or the raw depth buffer to train the model (rough conversion sketch below)? Would you also train on edge-filtered images and the like? Why or why not?
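In case it helps to make that question concrete, this is roughly how I'd turn the raw depth buffer into a camera-space point cloud with standard pinhole back-projection (again just a sketch; I'm assuming the depth map is Float32 and smaller than the RGB image, so the intrinsics get scaled down first):

```swift
import ARKit
import simd

// Unproject the LiDAR depth map into a camera-space point cloud.
func pointCloud(from frame: ARFrame) -> [simd_float3] {
    guard let depthMap = frame.sceneDepth?.depthMap else { return [] }

    let width  = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)

    // Scale the RGB-camera intrinsics down to the depth-map resolution.
    let rgbSize = frame.camera.imageResolution
    let scaleX = Float(width)  / Float(rgbSize.width)
    let scaleY = Float(height) / Float(rgbSize.height)
    let K = frame.camera.intrinsics
    let fx = K[0][0] * scaleX, fy = K[1][1] * scaleY
    let cx = K[2][0] * scaleX, cy = K[2][1] * scaleY

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }
    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return [] }
    let rowBytes = CVPixelBufferGetBytesPerRow(depthMap)

    var points: [simd_float3] = []
    for v in 0..<height {
        let row = base.advanced(by: v * rowBytes).assumingMemoryBound(to: Float32.self)
        for u in 0..<width {
            let z = row[u]                       // depth in metres
            guard z > 0 else { continue }
            let x = (Float(u) - cx) * z / fx     // pinhole back-projection
            let y = (Float(v) - cy) * z / fy
            points.append(simd_float3(x, y, z))
        }
    }
    return points
}
```

This version runs on the CPU per frame; if point clouds turn out to be the better input, I guess the same unprojection could be moved into a Metal compute shader.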
Thanks in advance, it would mean the world to me!
Kind regards, Miri :-)