I'm trying to set up Facebook AI's "Segment Anything" MLModel to compare its on-device performance and efficacy against the Vision framework's foreground instance mask request (VNGenerateForegroundInstanceMaskRequest).
The Vision request accepts any reasonably sized image for processing, and then provides a method to produce a mask at the same resolution as the input image. In contrast, the Segment Anything MLModel takes a fixed 1024x1024 input for inference and produces a 1024x1024 output.
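For context, the Vision side needs no resizing at all. A minimal sketch of what I mean, assuming the iOS 17/macOS 14 APIs, with error handling elided:

```swift
import Vision

// Minimal sketch of the Vision path: the request takes the image at its
// native size, and generateScaledMaskForImage(...) returns a mask at that
// same resolution.
func foregroundMask(for cgImage: CGImage) throws -> CVPixelBuffer? {
    let request = VNGenerateForegroundInstanceMaskRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    guard let observation = request.results?.first else { return nil }
    return try observation.generateScaledMaskForImage(
        forInstances: observation.allInstances,
        from: handler)
}
```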
What is the best way to work with non-square images, such as 4:3 camera photos? I can think of three approaches:
1. Scale the image to 1024x1024, ignoring aspect ratio, then inversely scale the output back to the original size. My big concern is that squashing the content will hurt inference quality.
2. Scale the image, preserving its aspect ratio, so its minimum dimension is 1024, then run the model multiple times over a sliding 1024x1024 window and aggregate the results. My main concern here is the complexity of de-duplicating the output, since each run could produce different results depending on how objects are cropped at the window edges.
3. Fit the image within 1024x1024 and pad with black pixels to make it square (see the sketch after this list). I'm not sure whether the border will muck up the inference.
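For option 3, I'm picturing something along these lines with Core Image. A rough sketch; the helper names and the 1024 constant are mine, not from any SDK:

```swift
import CoreImage

// Rough sketch of option 3: aspect-fit the image into a 1024x1024 square,
// pad with black, and return the numbers needed to map the model's
// 1024x1024 output back onto the original image.
func letterbox(_ image: CIImage, side: CGFloat = 1024) -> (padded: CIImage, scale: CGFloat, offset: CGPoint) {
    let extent = image.extent
    let scale = side / max(extent.width, extent.height)
    let scaled = image.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
    // Center the scaled image inside the square.
    let offset = CGPoint(x: (side - scaled.extent.width) / 2,
                         y: (side - scaled.extent.height) / 2)
    let positioned = scaled.transformed(by: CGAffineTransform(
        translationX: offset.x - scaled.extent.origin.x,
        y: offset.y - scaled.extent.origin.y))
    // Composite over opaque black so the model sees defined pixels everywhere.
    let square = CGRect(x: 0, y: 0, width: side, height: side)
    let black = CIImage(color: CIColor.black).cropped(to: square)
    return (positioned.composited(over: black), scale, offset)
}

// Undo the letterbox on the 1024x1024 mask: crop away the padding,
// then scale back up to the original resolution.
func unletterbox(mask: CIImage, originalExtent: CGRect, scale: CGFloat, offset: CGPoint) -> CIImage {
    let content = CGRect(x: offset.x, y: offset.y,
                         width: originalExtent.width * scale,
                         height: originalExtent.height * scale)
    return mask.cropped(to: content)
        .transformed(by: CGAffineTransform(translationX: -offset.x, y: -offset.y))
        .transformed(by: CGAffineTransform(scaleX: 1 / scale, y: 1 / scale))
}
```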
Anyway, this seems like it must be a well-solved problem in ML, but I'm having difficulty finding an authoritative best practice.
I've created a Swift Package for a Metal-backed Core Image filter that generates Simplex noise (both simple and fractal). The issue I have is that a Metal file containing a Core Image kernel must be compiled with the -fcikernel flag and then linked with the -cikernel flag in order to work correctly with CIKernel's init(functionName:fromMetalLibraryData:).
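For reference, the runtime side is straightforward. Roughly (the resource and function names here are my package's, so substitute your own):

```swift
import CoreImage

// Sketch: load the precompiled metallib from the package bundle and
// create the kernel. Resource and function names are placeholders.
func makeSimplexKernel() throws -> CIKernel {
    guard let url = Bundle.module.url(forResource: "SimplexNoise.ci",
                                      withExtension: "metallib") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let data = try Data(contentsOf: url)
    return try CIKernel(functionName: "simplexNoise", fromMetalLibraryData: data)
}
```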
I've accomplished this in my own projects by following the guidelines in the "Build Metal-based Core Image kernels with Xcode" session from WWDC20. This involves naming the source files with a ".ci.metal" extension, adding a build rule that compiles files with that extension using the correct flag into a ".ci.air" file, and then linking that into a metallib with the appropriate linker flag in a second custom build rule.
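Concretely, the two build rules reduce to commands along these lines (simplified; the real rules use Xcode's input/output file variables, so verify the details against the WWDC session):

```sh
# Rule 1, "*.ci.metal" -> "*.ci.air": compile with the Core Image flag.
xcrun metal -c -fcikernel SimplexNoise.ci.metal -o SimplexNoise.ci.air

# Rule 2, "*.ci.air" -> "*.ci.metallib": link with the matching flag.
xcrun metallib -cikernel SimplexNoise.ci.air -o SimplexNoise.ci.metallib
```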
Thus far, the only way I've found to accomplish this in a Swift Package is to ask end users to create the special build rules themselves via my README, which is a terrible idea: my library would be broken unless the user took fairly complex additional steps in their own project.
I settled on the temporary solution of including the final "SimplexNoise.ci.metallib" as a package resource, since it's only 7 KB. This isn't a great solution either, though, because I have to separately recompile and update the metallib file any time I change the Metal source.
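In case it helps anyone, the manifest side of that workaround is just a resource declaration. A minimal sketch:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "SimplexNoise",
    products: [
        .library(name: "SimplexNoise", targets: ["SimplexNoise"])
    ],
    targets: [
        .target(
            name: "SimplexNoise",
            resources: [
                // .copy ships the prebuilt metallib byte-for-byte in the
                // target's bundle, where Bundle.module can find it.
                .copy("SimplexNoise.ci.metallib")
            ]
        )
    ]
)
```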
My dream would be to somehow include my custom build rules along with my library so that the metallib file could be created at build time in the user's project, but I have not found anything in the Swift Package documentation to even hint that such a thing is possible.
Does anyone have any ideas?
I've created a simple identity kernel as a starting point (it immediately returns the sample) and named the file "Kernel.ci.metal". I copied the two build rules shown in the demo exactly, but when I build the app, the "*.ci.metal" build rule fails because the output file "Kernel.ci.air" does not exist:

metal: error: no such file or directory: '<PATH>/DerivedSources/Kernel.ci.air'

When I run the two commands from the build scripts directly on the command line, everything works. Are my build rules running at the wrong point in the build process, after the target directory has already been removed?