Core ML - captureOutput can capture a 3D object, but can it provide its x, y position?

I can use captureOutput to identify the 3D object, but I do not know how to get the x, y position of the corresponding object.


Can anybody help?


I can identify the object by .identifier, but when I use .accessibilityActivationPoint.x and .accessibilityActivationPoint.y to detect the x, y position (please see the code below), they always come out as 0. Even when I move closer to or farther from the object, the results are always 0.


How can I detect the x, y position of the captured object?


Thanks




func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // print("Camera was able to capture a frame:", Date())
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    guard let model = try? VNCoreMLModel(for: CubeImageClassifier().model) else { return }

    let request = VNCoreMLRequest(model: model) { (finishedReq, err) in
        guard let results = finishedReq.results as? [VNClassificationObservation] else { return }
        guard let firstObservation = results.first else { return }

        if firstObservation.identifier == "Cube" {
            print("It is a Cube. Confidence = \(firstObservation.confidence)")
            print("Pos-x = \(firstObservation.accessibilityActivationPoint.x)")
            print("Pos-y = \(firstObservation.accessibilityActivationPoint.y)")
        } else {
            print("NOT Cube")
        }
    }

    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}

Replies

Hello,


To get a screen position for an object, you need to perform an image request that gives you VNRecognizedObjectObservation results, which requires an object detection model. Currently you are receiving VNClassificationObservations, which do not contain positional data. (The accessibilityActivationPoint you are reading is a generic accessibility property inherited from NSObject, not something Vision populates, which is why it is always 0.)


Check out this sample, which recognizes certain objects and draws their bounding boxes on screen: https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture
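
For reference, here is a minimal sketch of what reading positions from an object detection model looks like, adapted from the delegate above. It assumes a hypothetical object detection model named CubeObjectDetector (e.g. trained with Create ML's Object Detector template); the boundingBox on VNRecognizedObjectObservation is a normalized rect (0...1) with its origin in the lower-left corner of the image.

import AVFoundation
import Vision

// CubeObjectDetector is a placeholder for an object detection model,
// not the classifier used in the question.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    guard let model = try? VNCoreMLModel(for: CubeObjectDetector().model) else { return }

    let request = VNCoreMLRequest(model: model) { finishedReq, _ in
        // An object detection model yields VNRecognizedObjectObservation,
        // which carries a boundingBox alongside its classification labels.
        guard let results = finishedReq.results as? [VNRecognizedObjectObservation] else { return }

        for observation in results {
            guard let topLabel = observation.labels.first, topLabel.identifier == "Cube" else { continue }
            let box = observation.boundingBox  // normalized, lower-left origin
            print("Cube confidence = \(topLabel.confidence)")
            print("center x = \(box.midX), center y = \(box.midY)")
        }
    }

    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}

To convert the normalized box into pixel coordinates, you can pass it through VNImageRectForNormalizedRect with the image's width and height; the linked sample shows the full conversion into view coordinates.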