I'm planning to work with object masks as well, but I haven't tried it yet. Because of my specific use case, I'm not reading files from disk; I'm providing the session with samples I construct myself. Maybe you've already seen this, but the documentation for the objectMask property on PhotogrammetrySample reads:
When a photograph of an object includes surrounding objects, such as plants, buildings, or people in an outdoor space, you can create an object mask to exclude the portions of the image that don’t contain the object. Masking extraneous image data reduces the number of landmarks RealityKit attempts to match, speeds up the object-creation process, and produces a more accurate 3D model.
Provide the object mask in kCVPixelFormatType_OneComponent8 format and with the same height and width as image. RealityKit ignores any pixel in image when the corresponding pixel in objectMask has a value of 0.0 (black) unless isObjectMaskingEnabled is set to False in the session’s configuration.
Coming from the machine learning world, I've just assumed the mask files would be PNGs, as that's fairly common for segmentation masks. An 8-bit, single-channel (grayscale) PNG should map easily onto a kCVPixelFormatType_OneComponent8 pixel buffer, but I guess any file type would work as long as you can decode it into that format.
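For reference, here is a rough sketch of how I'd expect to build such a mask buffer in memory. I haven't run this against a real session, so treat it as an assumption-laden starting point. The helper name makeObjectMask and its parameters are my own invention, and it assumes you already have the mask as row-major 8-bit values (0 = ignore the pixel, non-zero = keep it):

```swift
import CoreVideo

/// Hypothetical helper (not part of any Apple API): copies row-major 8-bit
/// mask values into a new kCVPixelFormatType_OneComponent8 pixel buffer.
/// Returns nil if the dimensions don't match or the buffer can't be created.
func makeObjectMask(width: Int, height: Int, values: [UInt8]) -> CVPixelBuffer? {
    guard width > 0, height > 0, values.count == width * height else { return nil }

    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     width,
                                     height,
                                     kCVPixelFormatType_OneComponent8,
                                     nil,
                                     &buffer)
    guard status == kCVReturnSuccess, let mask = buffer else { return nil }

    CVPixelBufferLockBaseAddress(mask, [])
    defer { CVPixelBufferUnlockBaseAddress(mask, []) }

    guard let base = CVPixelBufferGetBaseAddress(mask) else { return nil }

    // Rows may be padded, so index with bytesPerRow rather than width.
    let bytesPerRow = CVPixelBufferGetBytesPerRow(mask)
    let dst = base.assumingMemoryBound(to: UInt8.self)
    for row in 0..<height {
        for col in 0..<width {
            dst[row * bytesPerRow + col] = values[row * width + col]
        }
    }
    return mask
}
```

The resulting buffer would then be assigned to the sample's objectMask property, with the same width and height as the sample's image, as the documentation above requires.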
Are your masks using this format?
If possible, could you provide a sample of your images and the corresponding masks? I'm curious what you're trying to make it work with, for example how "tight" your masks are around the object.