I understand we can use an MPSImageBatch as input to the
[MPSNNGraph encodeBatchToCommandBuffer: ...]
method.
In other words, all inputs to the MPSNNGraph need to be wrapped in MPSImages.
Suppose I have a machine learning application that trains/infers on thousands of input samples, where each input has 4 feature channels. Metal Performance Shaders is chosen as the primary AI backbone for real-time use.
Because of how encodeBatchToCommandBuffer works, I first have to create an MTLTexture as a 2D texture array, with a width of 1 pixel, a height of 1 pixel, and the RGBA32Float pixel format.
The general setup will be:
#define NumInputDims 4

MPSImageBatch *infBatch = @[];
const uint32_t totalFeatureSets = N;

// Each slice holds 4 (RGBA) channels.
const uint32_t totalSlices = (totalFeatureSets * NumInputDims + 3) / 4;

MTLTextureDescriptor *descriptor =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: MTLPixelFormatRGBA32Float
                                                       width: 1
                                                      height: 1
                                                   mipmapped: NO];
descriptor.textureType = MTLTextureType2DArray;
descriptor.arrayLength = totalSlices;

id<MTLTexture> texture = [mDevice newTextureWithDescriptor: descriptor];

// Each slice is one RGBA32Float pixel, so bytes per row is `4 * sizeof(float)`.
// A 2D texture array has to be filled one slice at a time.
const float *featureData = (const float *)inputFeatureBuffers[0].data();
for (uint32_t slice = 0; slice < totalSlices; ++slice) {
    [texture replaceRegion: MTLRegionMake2D(0, 0, 1, 1)
               mipmapLevel: 0
                     slice: slice
                 withBytes: featureData + slice * NumInputDims
               bytesPerRow: 4 * sizeof(float)
             bytesPerImage: 0];
}

MPSImage *infQueryImage = [[MPSImage alloc] initWithTexture: texture
                                            featureChannels: NumInputDims];
infBatch = [infBatch arrayByAddingObject: infQueryImage];
The training/inference will be:
MPSNNGraph *mInferenceGraph = /* some MPSNNGraph setup */;
MPSImageBatch *returnImage = [mInferenceGraph encodeBatchToCommandBuffer: commandBuffer
                                                            sourceImages: @[infBatch]
                                                            sourceStates: nil
                                                      intermediateImages: nil
                                                       destinationStates: nil];
// Commit and wait...
// Read the return image for the inferred result.
As you can see, the setup is really ad hoc: a lot of 1x1 pixels created for this sole purpose.
Is there a better way to achieve the same result while staying on Metal Performance Shaders? A further question: can MPS handle general machine learning cases other than CNNs? From both the online documentation and the header files, the APIs seem to revolve around convolutional networks.
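The closest alternative I have found so far is to drop down to MPSMatrix and MPSMatrixMultiplication and treat the N feature sets as an N x 4 matrix, which avoids the 1x1 images entirely when the model is just dense layers. A rough Swift sketch of what I mean (all sizes are hypothetical placeholders, and this sidesteps MPSNNGraph rather than feeding it):

```swift
import Metal
import MetalPerformanceShaders

// N feature vectors of 4 floats laid out as one N x 4 MPSMatrix, multiplied
// by a 4 x hidden weight matrix. Sizes and buffer contents are placeholders.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!

let n = 1024, inDim = 4, hidden = 16
let floatSize = MemoryLayout<Float>.stride

func makeMatrix(rows: Int, columns: Int) -> MPSMatrix {
    let descriptor = MPSMatrixDescriptor(rows: rows, columns: columns,
                                         rowBytes: columns * floatSize,
                                         dataType: .float32)
    let buffer = device.makeBuffer(length: rows * columns * floatSize,
                                   options: .storageModeShared)!
    return MPSMatrix(buffer: buffer, descriptor: descriptor)
}

let input = makeMatrix(rows: n, columns: inDim)   // copy the feature data into input.data
let weights = makeMatrix(rows: inDim, columns: hidden)
let output = makeMatrix(rows: n, columns: hidden)

let matmul = MPSMatrixMultiplication(device: device,
                                     transposeLeft: false, transposeRight: false,
                                     resultRows: n, resultColumns: hidden,
                                     interiorColumns: inDim,
                                     alpha: 1.0, beta: 0.0)

let commandBuffer = queue.makeCommandBuffer()!
matmul.encode(commandBuffer: commandBuffer,
              leftMatrix: input, rightMatrix: weights, resultMatrix: output)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
```

This only covers the matrix-multiply building block, not the full graph/training workflow, so I am still wondering what the intended approach is.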
Any response will be helpful, thank you.
Hi everyone!
I appreciate your help. I am a researcher and I use UMAP to cluster my data. Reproducibility is a key requirement in my field, so I set a random seed.
After coming back to my project after some time, I do not get the same results as before, even though I am working in a virtual environment that I did not change.
When pondering the reasons, I remembered that I upgraded my OS from Sonoma 14.1.1 to 14.5, so I was wondering whether the change in OS might cause those issues.
I'm sorry if this question is obvious to developer folks, but before I downgrade my OS or create a virtual machine, any tip is much appreciated. Thank you!
Hi everyone,
I'm working on an iOS app built in Swift using Xcode, where I'm integrating Roboflow's object detection API to extract items from grocery receipts. My goal is to identify key information (like items, total, tax, etc.) from the images of these receipts.
I'm successfully sending images to the Roboflow API and receiving predictions with bounding box data, but when I attempt to extract text from the detected regions (bounding boxes), it appears that the text extraction is failing—no text is being recognized. The issue seems to be that the bounding boxes are either not properly being handled or something is going wrong in the way I process the API response.
Here's a brief breakdown of what I'm doing:
The image is captured, converted to base64, and sent to the Roboflow API.
The API response comes back with bounding boxes for the detected elements (items, date, subtotal, etc.).
The problem occurs when I try to extract the text from the image using the bounding box data—it seems like the bounding boxes are being found, but no text is returned.
I suspect the issue might be happening because the app’s segue to the results view controller is triggered before the OCR extraction completes, or there might be a problem in my code handling the bounding box response.
Response Data:
{
  "inference_id": "77134cce-91b5-4600-a59b-fab74350ca06",
  "time": 0.09240847699993537,
  "image": {
    "width": 370,
    "height": 502
  },
  "predictions": [
    {
      "x": 163.5,
      "y": 250.5,
      "width": 313.0,
      "height": 127.0,
      "confidence": 0.9357666373252869,
      "class": "Item",
      "class_id": 1,
      "detection_id": "753341d5-07b6-42a1-8926-ecbc61128243"
    },
    {
      "x": 52.5,
      "y": 417.5,
      "width": 89.0,
      "height": 23.0,
      "confidence": 0.8819760680198669,
      "class": "Date",
      "class_id": 0,
      "detection_id": "b4681149-d538-47b1-8700-d9528bf1daa0"
    },
    ...
  ]
}
And the log showing bounding boxes:
Prediction: ["width": 313, "y": 250.5, "x": 163.5, "detection_id": 753341d5-07b6-42a1-8926-ecbc61128243, "class": Item, "height": 127, "confidence": 0.9357666373252869, "class_id": 1]
No bounding box found in prediction.
I've double-checked the bounding box coordinates, and everything seems fine. Does anyone have experience with using OCR alongside object detection APIs in Swift? Any help on how to ensure the bounding boxes are properly processed and used for OCR would be greatly appreciated!
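In case it helps to see what I'm attempting, here is a simplified sketch of the crop-then-OCR step I have in mind. The function name and the center-based coordinate conversion are my own assumptions about the Roboflow response, and any scaling Roboflow applies to the uploaded image is not handled here:

```swift
import Vision
import UIKit

// Hypothetical helper: crop one Roboflow detection out of the receipt image
// and run Vision text recognition on the cropped region.
// Roboflow reports box centers in pixel coordinates of the uploaded image.
func recognizeText(in image: UIImage,
                   centerX: CGFloat, centerY: CGFloat,
                   width: CGFloat, height: CGFloat,
                   completion: @escaping ([String]) -> Void) {
    guard let cgImage = image.cgImage else { completion([]); return }

    // Convert the center-based box to a CGRect in pixel space.
    let rect = CGRect(x: centerX - width / 2,
                      y: centerY - height / 2,
                      width: width, height: height)
    guard let cropped = cgImage.cropping(to: rect) else { completion([]); return }

    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        let strings = observations.compactMap { $0.topCandidates(1).first?.string }
        completion(strings)
    }
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cropped, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```
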
Also, would it help to delay the segue to the results view controller until OCR is complete?
Thank you!
The Metal plugin for TensorFlow had its GitHub repo taken down, and on PyPI the last update was a year ago, for TF 2.14. What's the status of the Metal plugin? For now it seems to work fine for TF 2.15, but what's the plan for the future?
When I use VNGenerateForegroundInstanceMaskRequest to generate the mask in the simulator from SwiftUI, I get the error "Could not create inference context".
I then added code to force Vision to run on the CPU:
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(ciImage: inputImage)

#if targetEnvironment(simulator)
if #available(iOS 18.0, *) {
    // Pin the request to the CPU compute device.
    let allDevices = MLComputeDevice.allComputeDevices
    for device in allDevices {
        if device.description.contains("MLCPUComputeDevice") {
            request.setComputeDevice(device, for: .main)
            break
        }
    }
} else {
    // Fallback on earlier versions
    request.usesCPUOnly = true
}
#endif

do {
    try handler.perform([request])
    if let result = request.results?.first {
        let mask = try result.generateScaledMaskForImage(forInstances: result.allInstances,
                                                         from: handler)
        return CIImage(cvPixelBuffer: mask)
    }
} catch {
    print(error)
}
Even when I force the request to run on the CPU in the simulator, it still fails with the same error: "Could not create inference context".
Was just wondering, not sure if anyone else has thought about this.
Different sound output devices handle sound differently.
Could there be something in the Bluetooth settings that, on detecting that the connected device is an audio device, automatically sets the EQ for it (per the user's preference)?
So each audio device would have a specific EQ profile stored for it, recognized via Bluetooth.
VNRecognizeTextRequestRevision2 does not recognize upside-down English text; VNRecognizeTextRequestRevision3 recognizes English text even when it is upside down.
Up to iOS 17, I could select VNRecognizeTextRequestRevision2 or VNRecognizeTextRequestRevision3 in my code (my minimum deployment target is iOS 16) depending on whether upside-down text detection was required.
But on iOS 18, even if I set VNRecognizeTextRequestRevision2 in my code, the result seems to be based on VNRecognizeTextRequestRevision3, because upside-down text is detected.
I know VNRecognizeTextRequestRevision2 is deprecated on iOS 18.
How can I tell whether an observation result is upside down or not? Is there any solution with VNRecognizeTextRequestRevision3?
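For clarity, this is roughly how I select the revision today (a simplified sketch, not my full code):

```swift
import Vision

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        // boundingBox is normalized with a bottom-left origin; with revision 3
        // an upside-down line is still returned, and I don't see a property
        // that tells me it was upside down.
        print(observation.topCandidates(1).first?.string ?? "",
              observation.boundingBox)
    }
}

// Ask for revision 2 while it is still reported as supported
// (on iOS 18 this appears to be ignored).
if VNRecognizeTextRequest.supportedRevisions.contains(VNRecognizeTextRequestRevision2) {
    request.revision = VNRecognizeTextRequestRevision2
}
```
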
The following code taken from keras.io produces the error
InternalError: Exception encountered when calling GPT2Tokenizer.call().
...
2 root error(s) found.
(0) INTERNAL: stream cannot wait for itself
This is on macOS on a MacBook with an M2 Max. Setting the optimizer to "Adam" does not help.
import keras_nlp # version 0.15
causal_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
causal_lm.compile(sampler="greedy")
# the next call produces the error
causal_lm.generate(["Keras is a"])
Where does the processing power for certain AI capabilities come from? Is it hosted on the originating device, or does the device send the originating content to Apple servers to process and then return the result to the end user?
For example, if I ask AI to summarize an email, will it send the contents of the email to an Apple AI server to process it and return the summary to the originating device?
Following the instructions to install JAX (https://developer.apple.com/metal/jax/), I still encountered this error:
RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. This error is frequently encountered on macOS when running an x86 Python installation on ARM hardware. In this case, try installing an ARM build of Python. Otherwise, you may be able work around this issue by building jaxlib from source.
How can I fix this?
Hi, I'm trying to personalize the Detect animal poses in Vision example (WWDC 23).
Detect animal poses in Vision
After some tests, I noticed that the landmark and connection drawings work only if I do not ignore the safe area. If I ignore it (removing the toggle) or use the app on an iPad, the drawings are no longer applied correctly.
In the example, GeometryReader is used to read the size of the view:
...
ZStack {
    GeometryReader { geo in
        AnimalSkeletonView(animalJoint: animalJoint, size: geo.size)
    }
}.frame(maxWidth: .infinity)
...
struct AnimalSkeletonView: View {
    // Get the animal joint locations.
    @StateObject var animalJoint = AnimalPoseDetector()
    var size: CGSize

    var body: some View {
        DisplayView(animalJoint: animalJoint)

        if animalJoint.animalBodyParts.isEmpty == false {
            // Draw the skeleton of the animal.
            // Iterate over all recognized points and connect the joints.
            ZStack {
                ZStack {
                    // left head
                    if let nose = animalJoint.animalBodyParts[.nose] {
                        if let leftEye = animalJoint.animalBodyParts[.leftEye] {
                            Line(points: [nose.location, leftEye.location], size: size)
                                .stroke(lineWidth: 5.0)
                                .fill(Color.orange)
                        }
                    }
                    ...
                }
            }
        }
    }
}

// Create a transform that converts the pose's normalized point.
struct Line: Shape {
    var points: [CGPoint]
    var size: CGSize

    func path(in rect: CGRect) -> Path {
        let pointTransform: CGAffineTransform =
            .identity
            .translatedBy(x: 0.0, y: -1.0)
            .concatenating(.identity.scaledBy(x: 1.0, y: -1.0))
            .concatenating(.identity.scaledBy(x: size.width, y: size.height))

        var path = Path()
        path.move(to: points[0])
        for point in points {
            path.addLine(to: point)
        }
        return path.applying(pointTransform)
    }
}
Looking online, I saw it recommended to change the property cameraView.previewLayer.videoGravity
from:
cameraView.previewLayer.videoGravity = .resizeAspectFill
to:
cameraView.previewLayer.videoGravity = .resizeAspect
but it doesn't work for me.
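The only other idea I have is to let the preview layer itself do the point conversion instead of scaling by the view size. A rough sketch of what I mean (assuming I can reach the AVCaptureVideoPreviewLayer from the SwiftUI side; this is my own experiment, not part of the sample):

```swift
import AVFoundation
import UIKit

// Experiment: convert a normalized Vision point to view coordinates via the
// preview layer, which accounts for .resizeAspectFill cropping.
func viewPoint(for normalizedPoint: CGPoint,
               in previewLayer: AVCaptureVideoPreviewLayer) -> CGPoint {
    // Vision uses a bottom-left origin; capture-device points use top-left.
    let devicePoint = CGPoint(x: normalizedPoint.x, y: 1 - normalizedPoint.y)
    return previewLayer.layerPointConverted(fromCaptureDevicePoint: devicePoint)
}
```
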
Could you help me understand where I'm wrong?
Thanks!
Hello,
I'm using DockKit within my SwiftUI application with GetStream. Before updating to iOS 18 yesterday, the custom tracking using DockKit worked like a charm, but after updating it stopped working unexpectedly.
What's more curious: the official GetStream video calls application still works on iOS 18, but not my application. I can confirm that my iPhone is still paired, I receive logs about the current docking state, and everything seems fine.
Any suggestions on what I'm missing here?
I am searching for a method to remove the background from a video. The video can come from a camera session fileOutput URL or from the photo library.
I was able to get a live preview with the background removed using the depth data and some Metal framework code from the example Enhancing Live Video by Leveraging TrueDepth Camera Data. However, I couldn't figure out a way to save this as a video so that I can upload it.
Also, this method uses over 150% CPU (per Xcode's CPU gauge), which seems like a lot; the device heats up quickly and drops frames when it gets hot.
I also found a similar Core ML example on GitHub by Dmitry Voitekh, which uses less than 40% CPU.
Any information regarding this will be helpful.
Objective: remove the background from a video and save it.
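To make the "save it" part concrete, this is the kind of AVAssetWriter setup I have in mind for writing the processed frames to a file (a rough sketch; the class name, output URL, dimensions, and per-frame call are all my own placeholders, not from the sample):

```swift
import AVFoundation

// Sketch: append background-removed pixel buffers to an AVAssetWriter.
final class ProcessedVideoWriter {
    private let writer: AVAssetWriter
    private let input: AVAssetWriterInput
    private let adaptor: AVAssetWriterInputPixelBufferAdaptor

    init(outputURL: URL, width: Int, height: Int) throws {
        writer = try AVAssetWriter(outputURL: outputURL, fileType: .mov)
        let settings: [String: Any] = [
            AVVideoCodecKey: AVVideoCodecType.hevc,
            AVVideoWidthKey: width,
            AVVideoHeightKey: height
        ]
        input = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
        input.expectsMediaDataInRealTime = true
        adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: input,
                                                       sourcePixelBufferAttributes: nil)
        writer.add(input)
        guard writer.startWriting() else {
            throw writer.error ?? NSError(domain: "ProcessedVideoWriter", code: -1, userInfo: nil)
        }
        writer.startSession(atSourceTime: .zero)
    }

    // Call once per processed frame with its presentation time.
    func append(_ pixelBuffer: CVPixelBuffer, at time: CMTime) {
        guard input.isReadyForMoreMediaData else { return }
        if !adaptor.append(pixelBuffer, withPresentationTime: time) {
            print("Failed to append frame at \(time.seconds)")
        }
    }

    func finish(completion: @escaping () -> Void) {
        input.markAsFinished()
        writer.finishWriting(completionHandler: completion)
    }
}
```
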
Hi,
I am working with AppIntents and have created an OpenIntent with a target that uses a MyAppContact AppEntity I created. This works fine when running from Shortcuts, because it pops up a list of options from the suggestedEntities() method. But it doesn't work well with Siri: it invokes the AppIntent but keeps repeatedly asking for the value of the target entity, which you can't really pass in by voice.
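In case the setup matters, here is a stripped-down sketch of what I have (I've reduced it to a plain AppIntent with a parameter here, and the entity fields and queries are placeholders rather than my real implementation):

```swift
import AppIntents

struct MyAppContact: AppEntity {
    static var typeDisplayRepresentation = TypeDisplayRepresentation(name: "Contact")
    static var defaultQuery = MyAppContactQuery()

    var id: String
    var name: String

    var displayRepresentation: DisplayRepresentation {
        DisplayRepresentation(title: "\(name)")
    }
}

struct MyAppContactQuery: EntityQuery {
    func entities(for identifiers: [String]) async throws -> [MyAppContact] {
        // Look up contacts by id in the app's store (placeholder).
        []
    }

    func suggestedEntities() async throws -> [MyAppContact] {
        // These are the options Shortcuts shows in its picker.
        []
    }
}

struct OpenContactIntent: AppIntent {
    static var title: LocalizedStringResource = "Open Contact"
    static var openAppWhenRun: Bool = true

    @Parameter(title: "Contact")
    var target: MyAppContact

    func perform() async throws -> some IntentResult {
        // Navigate to the contact in the app (placeholder).
        .result()
    }
}
```
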
What's the workaround here? Can an OpenIntent be activated by voice as well?
Hi,
I have a 128 GB 2024 M2 iPad Air. I upgraded to iPadOS 18 beta 3, and I'm not seeing the option in Settings to join the waitlist for Apple Intelligence. Is this just an option they're gradually rolling out to devices? I know my iPad is eligible.
I've been attempting to install tensorflow-metal on my computer so that I can use the GPU instead of the CPU. I have tensorflow-macos installed already, and I am fully up to date with pip and TF. I'm currently two months into building and training a TF CNN, and I'm at the point where training a single epoch of my network will take a week (I have a lot of data that I need to use). I desperately need to use the GPU but am stuck with the CPU for now. I can't get access to a cluster, so the best I can do is continue to use my M2 MacBook. Is there any other way I can install tensorflow-metal? Is there a way I can use the GPU (rather than the CPU) with TF if I can't install it?
I keep getting this error message:
"ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none) ERROR: No matching distribution found for tensorflow-metal"
I looked on the Apple forums, tried to download it from GitHub (the page is down), and tried anything else I could think of and/or find on the internet, but it still isn't installing.
I've used the following commands and still no luck:
python -m pip install tensorflow-metal
pip install https://github.com/apple/tensorflow_metal/releases/download/v0.5.0/tensorflow_metal-0.5.0-py3-none-any.whl
pip install tensorflow-metal
pip3 install tensorflow-metal
SYSTEM_VERSION_COMPAT=0 python -m pip install tensorflow-metal
SYSTEM_VERSION_COMPAT=0 pip install tensorflow-macos tensorflow-metal
conda install -c anaconda tensorflow-gpu
Any help would be appreciated! Thanks so much!
I'm trying to create an App Shortcut so that users can interact with one of my app's features using Siri. I would like to be able to turn this shortcut on or off at runtime using a feature toggle.
Ideally, I would be able to do something like this.
struct MyShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        // This shortcut is always available
        AppShortcut(
            intent: AlwaysAvailableIntent(),
            phrases: ["Show my always available intent with \(.applicationName)"],
            shortTitle: "Always Available Intent",
            systemImageName: "infinity"
        )

        // This shortcut is only available when "myCoolFeature" is available
        if FeatureProvider.shared.isAvailable("myCoolFeature") {
            AppShortcut(
                intent: MyCoolFeatureIntent(),
                phrases: ["Show my cool feature in \(.applicationName)"],
                shortTitle: "My Cool Feature Intent",
                systemImageName: "questionmark"
            )
        }
    }
}
However, this does not work because the existing buildOptional implementation is limited to components of type (any _AppShortcutsContentMarker & _LimitedAvailabilityAppShortcutsContentMarker)?.
All other attempts at making appShortcuts dynamic have resulted in shortcuts not working at all. I've tried:
Creating a makeAppShortcuts method that returns [AppShortcut] and invoking it from within the appShortcuts property
Extending AppShortcutsBuilder to support a buildOptional block that isn't restricted to a component type of (any _AppShortcutsContentMarker & _LimitedAvailabilityAppShortcutsContentMarker)?
Extending AppShortcutsBuilder to support buildArray and using compactMap(_:) to return an empty array when the feature is disabled
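The closest I've gotten is leaving the shortcut registered all the time and gating the behavior inside perform() instead, roughly like the sketch below, but that only disables the feature at invocation time; it doesn't hide the shortcut:

```swift
import AppIntents

// Fallback I'm considering (not what I actually want): the shortcut stays
// registered, and the feature flag is checked inside perform().
// FeatureProvider is my own type from the snippet above.
struct MyCoolFeatureIntent: AppIntent {
    static var title: LocalizedStringResource = "Show My Cool Feature"

    func perform() async throws -> some IntentResult & ProvidesDialog {
        guard FeatureProvider.shared.isAvailable("myCoolFeature") else {
            return .result(dialog: "This feature isn't available right now.")
        }
        // ... run the feature ...
        return .result(dialog: "Here's my cool feature.")
    }
}
```
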
I haven't used SiriKit before but it appears that shortcut suggestions were set at runtime by invoking setShortcutSuggestions(_:), meaning that what I'm trying to do would be possible. I'm not against using SiriKit if I have to but my understanding is that the App Intents framework is meant to be a replacement for SiriKit, and the prompt within Xcode to replace older custom intents with App Intents indicates that that is indeed the case.
Is there something obvious that I'm just missing or is this simply not possible with the App Intent framework? Is the App Intent framework not meant to replace SiriKit and should I just use that instead?
I am opening the Siri shortcut screen from the viewDidLoad method, as follows:
override func viewDidLoad() {
    super.viewDidLoad()

    // Present the Siri Shortcut screen to add the Card Payment intent.
    let viewController = INUIAddVoiceShortcutViewController(shortcut: INShortcut(intent: self.cardPaymentIntent)!)
    viewController.modalPresentationStyle = .pageSheet
    // Set the delegate.
    viewController.delegate = self
    self.present(viewController, animated: true, completion: nil)
}

// Delegate conformance :: INUIAddVoiceShortcutViewControllerDelegate
@available(iOS 12.0, *)
func addVoiceShortcutViewController(_ controller: INUIAddVoiceShortcutViewController,
                                    didFinishWith voiceShortcut: INVoiceShortcut?,
                                    error: Error?) {
    controller.dismiss(animated: true, completion: nil)
    // The issue is here: whether we add the shortcut or dismiss the screen
    // without adding it, this delegate gets called.
}

@available(iOS 12.0, *)
func addVoiceShortcutViewControllerDidCancel(_ controller: INUIAddVoiceShortcutViewController) {
    controller.dismiss(animated: true, completion: nil)
}

// Card Payment intent
public var cardPaymentIntent: CardPaymentIntent {
    let intent = CardPaymentIntent()
    intent.suggestedInvocationPhrase = NSLocalizedString("Pay my credit card", comment: "")
    return intent
}
Whenever I present the Siri shortcut screen, whether I add the shortcut or dismiss the screen without adding it, the shortcut ends up added in both cases. And this delegate method is called every time:
func addVoiceShortcutViewController(_ controller: INUIAddVoiceShortcutViewController, didFinishWith voiceShortcut: INVoiceShortcut?, error: Error?)
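The only idea I have so far is to distinguish the two cases by whether voiceShortcut is nil, as in the sketch below (my own attempt, replacing the delegate method above), but I'd still like to understand why a shortcut gets added on dismissal at all:

```swift
// Only treat it as "added" when the delegate hands back a non-nil voiceShortcut.
func addVoiceShortcutViewController(_ controller: INUIAddVoiceShortcutViewController,
                                    didFinishWith voiceShortcut: INVoiceShortcut?,
                                    error: Error?) {
    if let voiceShortcut {
        print("Shortcut added: \(voiceShortcut.invocationPhrase)")
    } else if let error {
        print("Failed to add shortcut: \(error)")
    } else {
        print("User closed the sheet without adding.")
    }
    controller.dismiss(animated: true, completion: nil)
}
```
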
Any solution? When I dismiss the screen without adding, I don't want the shortcut to be added.
Hello everybody,
I am running into an error with BNNS.NormalizationLayer. It appears to only work with .vector shapes; matrix shapes throw layerApplyFail during training. Inference doesn't throw, but the output stays the same.
How do I correctly use BNNS.NormalizationLayer with matrix shapes? How can I debug the layerApplyFail exception?
Thanks
let array: [Float32] = [
01, 02, 03, 04, 05, 06,
07, 08, 09, 10, 11, 12,
13, 14, 15, 16, 17, 18,
]
// let inputShape: BNNS.Shape = .vector(6 * 3) // works
let inputShape: BNNS.Shape = .matrixColumnMajor(6, 3)
let input = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
let output = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
let beta = BNNSNDArrayDescriptor.allocate(repeating: Float32(0), shape: inputShape, batchSize: 1)
let gamma = BNNSNDArrayDescriptor.allocate(repeating: Float32(1), shape: inputShape, batchSize: 1)
let activation: BNNS.ActivationFunction = .identity
let layer = BNNS.NormalizationLayer(type: .layer(normalizationAxis: 0), input: input, output: output, beta: beta, gamma: gamma, epsilon: 1e-12, activation: activation)!
let layerInput = BNNSNDArrayDescriptor.allocate(initializingFrom: array, shape: inputShape)
let layerOutput = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
// try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .inference) // No throw
try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .training)
_ = layerOutput.makeArray(of: Float32.self) // All zeros when .inference
My application supports both iOS and iPadOS. I would like to add App Intents only on iOS and not on iPadOS. I have not found anything written about this. Is it possible? If so, how can I do it?
If not, what is the best practice to avoid showing Siri Shortcuts on iPad?
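The only workaround I can think of (my own idea, not a documented pattern) is to keep the intent on both platforms but make it a no-op on iPad, which disables the behavior without actually hiding the shortcut. A sketch:

```swift
import AppIntents
import UIKit

// Placeholder intent: registered on both platforms, but bails out early on iPad.
struct MyPhoneOnlyIntent: AppIntent {
    static var title: LocalizedStringResource = "My Phone-Only Action"

    func perform() async throws -> some IntentResult & ProvidesDialog {
        guard UIDevice.current.userInterfaceIdiom != .pad else {
            return .result(dialog: "This action is only available on iPhone.")
        }
        // ... iPhone-only behavior ...
        return .result(dialog: "Done.")
    }
}
```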