Since ARKit cannot use motion capture and people occlusion with depth simultaneously, I am brainstorming ways to still make use of virtual objects in my app in more limited ways. One idea I am considering is running the occlusion configuration until I detect that the person is in a particular position, then switching configurations to motion capture. Will that switch cause problems such as loss of world anchors or other disruptive experiences for the user? How seamless will the mode switch appear?
You should be able to turn off people occlusion by removing it from your configuration's frameSemantics property and calling ARSession.run() with the modified configuration. This should be seamless for the user. Just make sure you are not passing the .resetTracking option to ARSession.run(), since that would discard the session's existing world tracking state.
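A minimal sketch of what that switch could look like, assuming `session` is your view's existing ARSession and that the device supports body tracking (the function name and structure here are illustrative, not part of any Apple sample):

```swift
import ARKit

// Hypothetical helper: switch the running session from a world-tracking
// configuration with people occlusion over to motion capture.
func switchToMotionCapture(on session: ARSession) {
    // Motion capture requires ARBodyTrackingConfiguration, which is
    // only supported on devices with an A12 chip or later.
    guard ARBodyTrackingConfiguration.isSupported else { return }

    let config = ARBodyTrackingConfiguration()

    // Run with an empty options set: omitting .resetTracking (and
    // .removeExistingAnchors) keeps the current world origin and
    // anchors, so the transition stays seamless for the user.
    session.run(config, options: [])
}
```

Note that the new configuration's frameSemantics simply no longer includes .personSegmentationWithDepth, so occlusion stops as part of the same run() call rather than needing a separate step.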