Core ML - captureOutput can capture a 3D object, but can it provide its x, y position?

I can use captureOutput to identify the 3D object, but I do not know how to get the x, y position of the corresponding object.


Can anybody help?


I can identify the object by .identifier, but when I use .accessibilityActivationPoint.x and .accessibilityActivationPoint.y to detect the x, y position (please see the code below), they always come out as 0. Even when I move closer to or farther from the object, the results are always 0.


How can I detect the x, y position of the captured object?


Thanks




func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // print("Camera was able to capture a frame:", Date())
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    guard let model = try? VNCoreMLModel(for: CubeImageClassifier().model) else { return }

    let request = VNCoreMLRequest(model: model) { (finishedReq, err) in
        guard let results = finishedReq.results as? [VNClassificationObservation] else { return }
        guard let firstObservation = results.first else { return }

        if firstObservation.identifier == "Cube" {
            print("It is a Cube. Confidence = \(firstObservation.confidence)")
            print("Pos-x = \(firstObservation.accessibilityActivationPoint.x)")
            print("Pos-y = \(firstObservation.accessibilityActivationPoint.y)")
        } else {
            print("NOT Cube")
        }
    }

    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}

Replies

Hello,


To get a screen position for an object, you need to perform an image request that gives you VNRecognizedObjectObservation results, which requires an object detection model. Currently you are receiving VNClassificationObservations, which do not contain positional data. (The accessibilityActivationPoint you are reading is a generic accessibility property inherited from NSObject, not something Vision populates, which is why it is always 0.)


Check out this sample, which recognizes certain objects and draws their bounding boxes on screen: https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture
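
For reference, here is a minimal sketch of what reading positions from an object detection model looks like, adapted from the delegate above. It assumes a hypothetical object detection model named CubeObjectDetector (e.g. trained with Create ML's Object Detector template); the boundingBox on VNRecognizedObjectObservation is a normalized rect (0...1) with its origin in the lower-left corner of the image.

import AVFoundation
import Vision

// CubeObjectDetector is a placeholder for an object detection model,
// not the classifier used in the question.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    guard let model = try? VNCoreMLModel(for: CubeObjectDetector().model) else { return }

    let request = VNCoreMLRequest(model: model) { finishedReq, _ in
        // An object detection model yields VNRecognizedObjectObservation,
        // which carries a boundingBox alongside its classification labels.
        guard let results = finishedReq.results as? [VNRecognizedObjectObservation] else { return }

        for observation in results {
            guard let topLabel = observation.labels.first, topLabel.identifier == "Cube" else { continue }
            let box = observation.boundingBox  // normalized, lower-left origin
            print("Cube confidence = \(topLabel.confidence)")
            print("center x = \(box.midX), center y = \(box.midY)")
        }
    }

    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}

To convert the normalized box into pixel coordinates, you can pass it through VNImageRectForNormalizedRect with the image's width and height; the linked sample shows the full conversion into view coordinates.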