Short answer: No
CoreML always works on images; in lucky cases it may give you a position in the 2D image, which is still missing a dimension for the 3D world. For object classification the situation is worse, since classification tells you nothing about the position or boundary of the object.
If your app can tolerate slower processing (I'm afraid that is not the case when working with ARKit), you may split the input image into smaller pieces and use trial and error with CoreML to narrow down the 2D boundary of the object. However, one dimension is still missing (you could assign it a fixed value).
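As a rough illustration of that trial-and-error idea, here is a hypothetical sliding-window sketch using the Vision framework: crop the frame into tiles, classify each tile, and keep the tile where the target label scores highest. `model`, `targetLabel`, and the tile/stride sizes are all assumptions you would tune for your own model; this is a sketch, not a drop-in implementation, and it will be far too slow for a per-frame ARKit loop.

```swift
import Vision
import CoreML
import CoreGraphics

// Slide a window over the captured frame, classify each crop, and return
// the tile where `targetLabel` has the highest confidence.
// Returns a 2D rect only -- the depth dimension is still unknown.
func bestTile(in image: CGImage,
              model: VNCoreMLModel,        // your compiled CoreML model
              targetLabel: String,          // e.g. "cup" (hypothetical)
              tileSize: Int = 224,
              stride stepSize: Int = 112) -> CGRect? {
    var best: (rect: CGRect, confidence: Float)?
    for y in stride(from: 0, to: image.height - tileSize, by: stepSize) {
        for x in stride(from: 0, to: image.width - tileSize, by: stepSize) {
            let rect = CGRect(x: x, y: y, width: tileSize, height: tileSize)
            guard let crop = image.cropping(to: rect) else { continue }

            let request = VNCoreMLRequest(model: model)
            try? VNImageRequestHandler(cgImage: crop, options: [:])
                .perform([request])

            // Keep the crop where the target class scores best.
            guard let top = request.results?
                    .compactMap({ $0 as? VNClassificationObservation })
                    .first(where: { $0.identifier == targetLabel })
            else { continue }
            if top.confidence > (best?.confidence ?? 0) {
                best = (rect, top.confidence)
            }
        }
    }
    return best?.rect
}
```

Even when this finds a plausible 2D rect, you still have to guess the distance from the camera to place anything in 3D.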
Maybe you can add a tap gesture recognizer, such that when you tap the object, the scene view runs a hit test to look for a feature point. You can then approximate the 3D location of the object from the coordinates of that feature point.
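A minimal sketch of that tap-to-feature-point approach might look like the following. It assumes you have an `ARSCNView` with a running session; `hitTest(_:types:)` with `.featurePoint` is the classic ARKit API for this (newer code would use raycast queries instead).

```swift
import UIKit
import ARKit
import SceneKit

// On tap, ask ARKit for a feature point near the touch location and use
// its world transform as an approximate 3D position of the tapped object.
final class TapHandler: NSObject {
    weak var sceneView: ARSCNView?

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        guard let sceneView = sceneView else { return }
        let point = gesture.location(in: sceneView)

        // .featurePoint intersects points ARKit has detected in the real
        // scene, not the object's actual surface geometry.
        if let result = sceneView.hitTest(point, types: .featurePoint).first {
            let t = result.worldTransform.columns.3
            let worldPosition = SCNVector3(t.x, t.y, t.z)
            print("Approximate 3D location: \(worldPosition)")
        }
    }
}
```

Note that the returned point is only an approximation: feature points lie on detected texture in the scene, which may be slightly in front of or behind the object you tapped.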
Note that you cannot hit-test the real object itself: a SceneKit hit test only intersects "artificial" objects which you have created or loaded (3D models). For the real world you are limited to ARKit's feature points and detected planes, which only approximate the scene, not the object's geometry.