How can you recognise text and display 3D content based on it?

Question

Created Jun ’20

How can I recognise text and display 3D content based on it?

For example: you have the text “GO”. When you point the camera at it, the app should recognise the text and place a 3D arrow in the real world.

How can this be done using Core ML, Vision, and ARKit?
If it is not possible with these frameworks, which other frameworks could be used?

Answer 1

Vision Pro Engineer OP

Apple

Jun ’20

Accepted Answer

It is absolutely possible to create an experience as you describe by using ARKit and the Vision framework. The logic in your application would be something like this:

1. Run an ARKit world tracking session.
2. Pass along the current camera frame to Vision and perform a text recognition request. "Recognizing Text in Images" might be a good read to get started with text recognition. To learn how to use a combination of ARKit and Vision in your app, you can check out the "Tracking and Altering Images" developer sample.
3. When Vision detects the text, you will get its bounding box in the image in 2D.
4. By performing a raycast in ARKit based on the 2D position, you can determine the 3D coordinate where to place your arrow.
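
The steps above could be sketched roughly as follows. This is a minimal illustration, not code from the linked samples: the class name `GoTextAnchorPlacer` and the anchor name `"goArrow"` are made up for the example, and it assumes the device is held in landscape so that the captured camera image is already upright (in portrait you would pass a different orientation to Vision and convert the coordinates accordingly). It only adds an anchor where the text was found; rendering the arrow at that anchor is left to your SceneKit or RealityKit code.

```swift
import ARKit
import Vision

/// Hypothetical helper: looks for the word "GO" in camera frames and adds a
/// named anchor at the 3D position of the text.
final class GoTextAnchorPlacer: NSObject, ARSessionDelegate {
    private let session: ARSession
    private let visionQueue = DispatchQueue(label: "example.text-recognition")
    private var isBusy = false            // only touched on the main queue
    private var hasPlacedAnchor = false   // only touched on the main queue

    init(session: ARSession) {
        self.session = session
        super.init()
        session.delegate = self
        // Step 1: run a world tracking session.
        session.run(ARWorldTrackingConfiguration())
    }

    // ARKit delivers frames on the main queue unless a delegate queue is set.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Process one frame at a time so frames are not piled up.
        guard !isBusy, !hasPlacedAnchor else { return }
        isBusy = true
        visionQueue.async { [weak self] in
            let box = GoTextAnchorPlacer.boundingBox(ofText: "GO", in: frame.capturedImage)
            DispatchQueue.main.async {
                guard let self = self else { return }
                self.isBusy = false
                if let box = box { self.placeAnchor(at: box, in: frame) }
            }
        }
    }

    // Steps 2 and 3: run text recognition and return the normalized bounding
    // box of the first line of text that matches `target`.
    private static func boundingBox(ofText target: String, in image: CVPixelBuffer) -> CGRect? {
        var match: CGRect?
        let request = VNRecognizeTextRequest { request, _ in
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            match = observations.first { observation in
                observation.topCandidates(1).first?.string
                    .trimmingCharacters(in: .whitespacesAndNewlines)
                    .uppercased() == target
            }?.boundingBox
        }
        request.recognitionLevel = .fast
        // Assumption: landscape device, so the buffer is upright (.up).
        let handler = VNImageRequestHandler(cvPixelBuffer: image, orientation: .up, options: [:])
        try? handler.perform([request])   // synchronous; the closure above has run by now
        return match
    }

    // Step 4: raycast from the center of the bounding box into the scene.
    private func placeAnchor(at visionBox: CGRect, in frame: ARFrame) {
        // Vision uses a lower-left origin; ARFrame expects a top-left origin.
        let imagePoint = CGPoint(x: visionBox.midX, y: 1 - visionBox.midY)
        let query = frame.raycastQuery(from: imagePoint, allowing: .estimatedPlane, alignment: .any)
        guard let hit = session.raycast(query).first else { return }
        session.add(anchor: ARAnchor(name: "goArrow", transform: hit.worldTransform))
        hasPlacedAnchor = true
    }
}
```

Because `ARSession` holds its delegate weakly, the object would need to be kept alive by its owner, for example as a property of the view controller that owns the AR view. The `.fast` recognition level is chosen here only to keep per-frame work low; `.accurate` may be a better fit for small or stylised text.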

Alternatively, there might be an even simpler solution if you don't need to detect arbitrary text: You could also use image tracking with your "GO" sign as a reference image. You can then simply use the image anchor's location to position your content.
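
A rough sketch of that image tracking alternative might look like this. It assumes a photo of the "GO" sign has been added to an asset catalog resource group named "AR Resources"; that group name and the class name `GoSignTracker` are placeholders for the example, not something prescribed by ARKit.

```swift
import ARKit

/// Hypothetical helper: detects a known "GO" sign image and reports where
/// ARKit found it, so that content can be positioned at that location.
final class GoSignTracker: NSObject, ARSessionDelegate {
    init(session: ARSession) {
        super.init()
        let configuration = ARWorldTrackingConfiguration()
        // Assumption: the sign image lives in an "AR Resources" group of the
        // app's asset catalog, with its physical size filled in.
        if let references = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) {
            configuration.detectionImages = references
            configuration.maximumNumberOfTrackedImages = 1
        }
        session.delegate = self
        session.run(configuration)
    }

    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let imageAnchor as ARImageAnchor in anchors {
            // The last column of the transform holds the anchor's world position.
            let position = imageAnchor.transform.columns.3
            let name = imageAnchor.referenceImage.name ?? "unnamed image"
            print("Detected \(name) at x: \(position.x), y: \(position.y), z: \(position.z)")
        }
    }
}
```

With `ARSCNView` or RealityKit you would typically attach the arrow to the image anchor itself rather than reading the position manually, so the content follows the sign as tracking updates.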
