The arSession owned by RoomCaptureSession does not provide scene depth: session.configuration?.frameSemantics.contains(.sceneDepth) returns false.
I tried pausing the session and running it again with a configuration that enables scene depth, but it didn't work:
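// Pause the session that RoomCaptureSession is currently running.
captureSession.arSession.pause()
// Bail out on devices that cannot deliver scene depth (no LiDAR).
guard ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) else { return }
let config = ARWorldTrackingConfiguration()
config.frameSemantics = .sceneDepth
// Run the same session again with the depth-enabled configuration.
captureSession.arSession.run(config)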
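
To narrow down where this fails, it may help to check separately whether the running configuration requests scene depth and whether the latest frame actually carries a depth map. The following is only a minimal diagnostic sketch: describeDepthState is a hypothetical helper name (not part of ARKit or RoomPlan), and it assumes it is called some time after run(config), once the session has produced at least one frame.

```swift
import ARKit

// Hypothetical diagnostic helper, not an ARKit or RoomPlan API.
// Reports whether scene depth is requested by the session's current
// configuration and whether the most recent frame contains depth data.
func describeDepthState(of session: ARSession) -> String {
    // True only if the configuration the session is running asks for .sceneDepth.
    let requested = session.configuration?.frameSemantics.contains(.sceneDepth) ?? false
    // ARFrame.sceneDepth is nil when no depth data was delivered for that frame.
    let delivered = session.currentFrame?.sceneDepth != nil
    return "sceneDepth requested: \(requested), delivered: \(delivered)"
}
```

For example, calling print(describeDepthState(of: captureSession.arSession)) a moment after the restart would show whether the configuration was replaced again after run(config), or whether it was kept but the frames still arrive without depth.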