Apple Developer Forums

Vision framework OCR missing Swedish support?

WWDC 2024 mentioned that the OCR feature from the Vision framework has support for "Korean, Swedish, and Chinese", but the Swedish support does not seem to be available... Running either print(try? VNRecognizeTextRequest().supportedRecognitionLanguages()) or var ocrRequest = RecognizeTextRequest(.revision3) print(ocrRequest.supportedRecognitionLanguages) did not print out Swedish as one of the supported languages, but Korean and Chinese are. Tested on early versions of iOS 18 developer beta, and the latest version of iOS 18.1 (22B5054e).

Machine Learning & AI General Vision

1

0

303

Oct ’24

Kernel dying issue after installing tensorflow

I was working on my project and when I tried to train a model the kernel crashed, so I restarted the kernel and tried the same and still I got the same crashing issue. Then I read one of the thread having the same issue where the apple support was saying to install tensorflow-macos and tensorflow-metal and read the guide from this site: https://developer.apple.com/metal/tensorflow-plugin/ and I did so, I tried every single thing and when I tried the test code provided in the site, I got the same error, here's the code and the output. Code: import tensorflow as tf cifar = tf.keras.datasets.cifar100 (x_train, y_train), (x_test, y_test) = cifar.load_data() model = tf.keras.applications.ResNet50( include_top=True, weights=None, input_shape=(32, 32, 3), classes=100,) loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False) model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"]) model.fit(x_train, y_train, epochs=5, batch_size=64) and here's the output: Epoch 1/5 The Kernel crashed while executing code in the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details. And here's the half of log file as it was not fully coming: metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 2024-10-06 23:30:49.894405: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 8.00 GB 2024-10-06 23:30:49.894420: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 2.67 GB 2024-10-06 23:30:49.894444: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2024-10-06 23:30:49.894460: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) 2024-10-06 23:30:56.701461: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled. [libprotobuf FATAL google/protobuf/message_lite.cc:353] CHECK failed: target + size == res: libc++abi: terminating due to uncaught exception of type google::protobuf::FatalException: CHECK failed: target + size == res: Please respond to this post as soon as possible as I am working on my project now and getting this error again n again. Device: Apple MacBook Air M1.

Machine Learning & AI General Machine Learning tensorflow-metal

0

330

Oct ’24

SFSpeechRecognitionResult discards previous transcripts with on-device option set to true

Hi everyone, I might need some help with on-device recognition. It seems that the speech recognition task will discard whatever it has transcribed after a new sentence starts (or it believes it becomes a new sentence) during a single audio session, with requiresOnDeviceRecognition is set to true. This doesn't happen with requiresOnDeviceRecognition set to false. System environment: macOS 14 with Xcode 15, deploying to iOS 17 Thank you all!

Machine Learning & AI General Speech

13

4

1.7k

Jun ’23

Do BLAS and LAPACK functions use Apple Silicon features

I can use BLAS and LAPACK functions via the Accelerate framework to perform vector and matrix arithmetic and linear algebra calculations. But do these functions take advantage of Apple Silicon features?

Machine Learning & AI General Accelerate

3

0

535

Oct ’24

Many inputs to `MPSNNGraph::encodeBatchToCommandBuffer`

I understand we can use MPSImageBatch as input to [MPSNNGraph encodeBatchToCommandBuffer: ...] method. That being said, all inputs to the MPSNNGraph need to be encapsulated in a MPSImage(s). Suppose I have an machine learning application that trains/infers on thousands of input data where each input has 4 feature channels. Metal Performance Shaders is chosen as the primary AI backbone for real-time use. Due to the nature of encodeBatchToCommandBuffer method, I will have to create a MTLTexture first as a 2D texture array. The texture has pixel width of 1, height of 1 and pixel format being RGBA32f. The general set up will be: #define NumInputDims 4 MPSImageBatch * infBatch = @[]; const uint32_t totalFeatureSets = N; // Each slice is 4 (RGBA) channels. const uint32_t totalSlices = (totalFeatureSets * NumInputDims + 3) / 4; MTLTextureDescriptor * descriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: MTLPixelFormatRGBA32Float width: 1 height: 1 mipmapped: NO]; descriptor.textureType = MTLTextureType2DArray descriptor.arrayLength = totalSlices; id<MTLTexture> texture = [mDevice newTextureWithDescriptor: descriptor]; // bytes per row is `4 * sizeof(float)` since we're doing one pixel of RGBA32F. [texture replaceRegion: MTLRegionMake3D(0, 0, 0, 1, 1, totalSlices) mipmapLevel: 0 withBytes: inputFeatureBuffers[0].data() bytesPerRow: 4 * sizeof(float)]; MPSImage * infQueryImage = [[MPSImage alloc] initWithTexture: texture featureChannels: NumInputDims]; infBatch = [infBatch arrayByAddingObject: infQueryImage]; The training/inference will be: MPSNNGraph * mInferenceGraph = /*some MPSNNGraph setup*/; MPSImageBatch * returnImage = [mInferenceGraph encodeBatchToCommandBuffer: commandBuffer sourceImages: @[infBatch] sourceStates: nil intermediateImages: nil destinationStates: nil]; // Commit and wait... // Read the return image for the inferred result. As you can see, the setup is really ad hoc - a lot of 1x1 pixels just for this sole purpose. Is there any better way I can achieve the same result while still on Metal Performance Shaders? I guess a further question will be: can MPS handle general machine learning cases other than CNN? I can see the APIs are revolved around convolution network, both from online documentations and header files. Any response will be helpful, thank you.

Machine Learning & AI General Metal Performance Shaders Objective-C

0

270

Oct ’24

Does the Random Number Generation (RGN) process change over different OS versions?

Hi everyone! I appreciate your help. I am a researcher and I use UMAP to cluster my data. Reproducibility is a key requirement for my field, so I set a random seed for reproducibility. After coming back to my project after some time, I do not get the same results than previously even though I am working in a virtual environment, which I did not change. When pondering about the reasons, I remembered that I upgraded my OS from Sonoma 14.1.1 to 14.5, so I was wondering whether the change in OS might cause those issues. I'm sorry if this question is obvious to developer folks, but before I downgrade my OS or create a virtual machine, any tipp is much appreciated. Thank you!

Machine Learning & AI General

2

0

231

Oct ’24

Issue with OCR on Swift iOS App: Roboflow API Bounding Boxes Missing After Response

Hi everyone, I'm working on an iOS app built in Swift using Xcode, where I'm integrating Roboflow's object detection API to extract items from grocery receipts. My goal is to identify key information (like items, total, tax, etc.) from the images of these receipts. I'm successfully sending images to the Roboflow API and receiving predictions with bounding box data, but when I attempt to extract text from the detected regions (bounding boxes), it appears that the text extraction is failing—no text is being recognized. The issue seems to be that the bounding boxes are either not properly being handled or something is going wrong in the way I process the API response. Here's a brief breakdown of what I'm doing: The image is captured, converted to base64, and sent to the Roboflow API. The API response comes back with bounding boxes for the detected elements (items, date, subtotal, etc.). The problem occurs when I try to extract the text from the image using the bounding box data—it seems like the bounding boxes are being found, but no text is returned. I suspect the issue might be happening because the app’s segue to the results view controller is triggered before the OCR extraction completes, or there might be a problem in my code handling the bounding box response. Response Data: { "inference_id": "77134cce-91b5-4600-a59b-fab74350ca06", "time": 0.09240847699993537, "image": { "width": 370, "height": 502 }, "predictions": [ { "x": 163.5, "y": 250.5, "width": 313.0, "height": 127.0, "confidence": 0.9357666373252869, "class": "Item", "class_id": 1, "detection_id": "753341d5-07b6-42a1-8926-ecbc61128243" }, { "x": 52.5, "y": 417.5, "width": 89.0, "height": 23.0, "confidence": 0.8819760680198669, "class": "Date", "class_id": 0, "detection_id": "b4681149-d538-47b1-8700-d9528bf1daa0" }, ... ] } And the log showing bounding boxes: Prediction: ["width": 313, "y": 250.5, "x": 163.5, "detection_id": 753341d5-07b6-42a1-8926-ecbc61128243, "class": Item, "height": 127, "confidence": 0.9357666373252869, "class_id": 1] No bounding box found in prediction. I've double-checked the bounding box coordinates, and everything seems fine. Does anyone have experience with using OCR alongside object detection APIs in Swift? Any help on how to ensure the bounding boxes are properly processed and used for OCR would be greatly appreciated! Also, would it help to delay the segue to the results view controller until OCR is complete? Thank you!

Machine Learning & AI General Swift Vision UIKit Core ML

0

339

Sep ’24

The Vision request does not work in simulator with Error "Could not create inference context"

When I use VNGenerateForegroundInstanceMaskRequest to generate the mask in the simulator by SwiftUI, there is an error "Could not create inference context". Then I add the code to make the vision by CPU: let request = VNGenerateForegroundInstanceMaskRequest() let handler = VNImageRequestHandler(ciImage: inputImage) #if targetEnvironment(simulator) if #available(iOS 18.0, *) { let allDevices = MLComputeDevice.allComputeDevices for device in allDevices { if(device.description.contains("MLCPUComputeDevice")){ request.setComputeDevice(.some(device), for: .main) break } } } else { // Fallback on earlier versions request.usesCPUOnly = true } #endif do { try handler.perform([request]) if let result = request.results?.first { let mask = try result.generateScaledMaskForImage(forInstances: result.allInstances, from: handler) return CIImage(cvPixelBuffer: mask) } } catch { print(error) } Even I force the simulator to run the code by CPU, but it still have the error: "Could not create inference context"

Machine Learning & AI General Vision Machine Learning

2

1

358

Sep ’24

Status on TF-Metal

The metal plugin for TensorFlow had its GitHub repo taken down, and on pypi, the last update was a year ago for TF 2.14. What's the status on the metal plugin? For now it seems to work fine for TF 2.15 but what's the plan for the future?

Machine Learning & AI General tensorflow-metal

0

7

411

Sep ’24

Apple Music EQ settings

Was just wondering, not sure if anyone else had thought about this. but different sound output device have different mechanism of sound throw. can we not put in something which can go into bluetooth settings and overseeing if it is a music device connected would automatically set the EQ differently( as per user requirement) So its somewhat like each music device would have specific music EQ stored for the same which can be recognized via bluetooth.

Machine Learning & AI General

1

0

276

Sep ’24

Image Playground API

Does the new Image Playground API allow programmatically generating images? Can the app generate and use them without the API's UI or would that require using another generative image model?

Machine Learning & AI General

3

12

3.4k

Jun ’24

iOS18 using VNRecognizeTextRequest2 but VNRecognizeTextRequest3 used

VNRecognizeTextRequest2 did not recognize the upside down text of English text. VNRecognizeTextRequest3 can recognize the text even if English text is upside down. Till iOS 17, I can select VNRecognizeTextRequest2 or VNRecognizeTextRequest3 in my code which is minimum build is iOS16 when I need upside down text detection required.. But on iOS18, even if I set the VNRecognizeTextRequest2 in my code, result seems to be based on the VNRecognizeTextRequest3 because upside down text is detected. VNRecognizeTextRequest2 was deplicant on iOS18, I know. How can I recognize the observation result is upside down or not? Are there any solution with VNRecognizeTextRequest3?

Machine Learning & AI General

1

0

303

Sep ’24

Tensorflow-metal: Problems with Keras 3.0

The following code taken from keras.io produces the error InternalError: Exception encountered when calling GPT2Tokenizer.call(). ... 2 root error(s) found. (0) INTERNAL: stream cannot wait for itself Macos on Macbook, M2 Max. Setting the optimizer to "Adam" does not help. import keras_nlp # version 0.15 causal_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en") causal_lm.compile(sampler="greedy") # the next call produces the error causal_lm.generate(["Keras is a"])

Machine Learning & AI General tensorflow-metal

1

0

435

Sep ’24

Apple AI / Data Protection & Processing

Where does the processing power to enact certain AI capabilities come from? Is it hosted on the originating device? Or does the device send contents of originating information to Apple assets to process and give product to end user? e.g. If I ask AI to summarize an email will it send the contents of the email to an Apple AI asset to process it and give the summary to the originating device.

Machine Learning & AI General Machine Learning

0

284

Sep ’24

Install jax on macOS 15.1 Beta (24B5046f)

Following this instruction to install jax (https://developer.apple.com/metal/jax/), I still encountered this error: RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. This error is frequently encountered on macOS when running an x86 Python installation on ARM hardware. In this case, try installing an ARM build of Python. Otherwise, you may be able work around this issue by building jaxlib from source. How to fix it?

Machine Learning & AI General tensorflow-metal

1

0

565

Sep ’24

Random crash from AVFAudio library

Hi everyone ! I'm getting random crashes when I'm using the Speech Recognizer functionality in my app. This is an old bug (for 8 years on Apple Forums) and I will really appreciate if anyone from Apple will be able to find a fix for this crashes. Can anyone also help me please to understand what could I do to keep the Speech Recognizer functionality still available in my app, but to avoid this crashes (if there is any other native library available or a CocoaPod library). Here is my code and also the crash log for it. Code: func startRecording() { startStopRecordBtn.setImage(UIImage(#imageLiteral(resourceName: "microphone_off")), for: .normal) if UserDefaults.standard.bool(forKey: Constants.darkTheme) { commentTextView.textColor = .white } else { commentTextView.textColor = .black } commentTextView.isUserInteractionEnabled = false recordingLabel.text = Constants.recording if recognitionTask != nil { recognitionTask?.cancel() recognitionTask = nil } let audioSession = AVAudioSession.sharedInstance() do { try audioSession.setCategory(AVAudioSession.Category.record) try audioSession.setMode(AVAudioSession.Mode.measurement) try audioSession.setActive(true, options: .notifyOthersOnDeactivation) } catch { showAlertWithTitle(message: Constants.error) } recognitionRequest = SFSpeechAudioBufferRecognitionRequest() let inputNode = audioEngine.inputNode guard let recognitionRequest = recognitionRequest else { fatalError(Constants.error) } recognitionRequest.shouldReportPartialResults = true recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in var isFinal = false if result != nil { self.commentTextView.text = result?.bestTranscription.formattedString isFinal = (result?.isFinal)! } if error != nil || isFinal { self.audioEngine.stop() inputNode.removeTap(onBus: 0) self.recognitionRequest = nil self.recognitionTask = nil self.startStopRecordBtn.isEnabled = true } }) let recordingFormat = inputNode.outputFormat(forBus: 0) inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {[weak self] (buffer: AVAudioPCMBuffer, when: AVAudioTime) in // CRASH HERE self?.recognitionRequest?.append(buffer) } audioEngine.prepare() do { try audioEngine.start() } catch { showAlertWithTitle(message: Constants.error) } } Here is the crash log: Thanks for very much for reading this !

Machine Learning & AI General Speech AVFoundation

3

0

727

May ’24

Detect animal poses in Vision: Detected joints and connection are drawn correctly only on iPhone without ignoring safe area

Hi, I'm trying to personalize the Detect animal poses in Vision example (WWDC 23). Detect animal poses in Vision After some tests I saw that the landmarks and connection drawings work only if I do not ignore the safe area, if I ignore it (removing the toggle) or use the app on the iPad the drawings are no longer applied correctly. In the example GeometryReader is used to detect the size of the view: ... ZStack { GeometryReader { geo in AnimalSkeletonView(animalJoint: animalJoint, size: geo.size) } }.frame(maxWidth: .infinity) ... struct AnimalSkeletonView: View { // Get the animal joint locations. @StateObject var animalJoint = AnimalPoseDetector() var size: CGSize var body: some View { DisplayView(animalJoint: animalJoint) if animalJoint.animalBodyParts.isEmpty == false { // Draw the skeleton of the animal. // Iterate over all recognized points and connect the joints. ZStack { ZStack { // left head if let nose = animalJoint.animalBodyParts[.nose] { if let leftEye = animalJoint.animalBodyParts[.leftEye] { Line(points: [nose.location, leftEye.location], size: size) .stroke(lineWidth: 5.0) .fill(Color.orange) } } ... } } } } } // Create a transform that converts the pose's normalized point. struct Line: Shape { var points: [CGPoint] var size: CGSize func path(in rect: CGRect) -> Path { let pointTransform: CGAffineTransform = .identity .translatedBy(x: 0.0, y: -1.0) .concatenating(.identity.scaledBy(x: 1.0, y: -1.0)) .concatenating(.identity.scaledBy(x: size.width, y: size.height)) var path = Path() path.move(to: points[0]) for point in points { path.addLine(to: point) } return path.applying(pointTransform) } } Looking online I saw that it was recommended to change the property cameraView.previewLayer.videoGravity from: cameraView.previewLayer.videoGravity = .resizeAspectFill to: cameraView.previewLayer.videoGravity = .resizeAspect but it doesn't work for me. Could you help me understand where I'm wrong? Thanks!

Machine Learning & AI General Vision SwiftUI

1

0

398

Sep ’24

iOS 15 - AVSpeechSynthesizerDelegate didCancel not getting called

in iOS 15, on stopSpeaking of AVSpeechSynthesizer, didFinish delegate method getting called instead of didCancel which is working fine in iOS 14 and below version.

Machine Learning & AI General Speech

3

1

1.2k

Oct ’21

openAppWhenRun makes AppIntent crash when launched from Control Center.

Adding the openAppWhenRun property to an AppIntent for a ControlWidgetButton causes the following error when the control is tapped in Control Center: Unknown NSError The operation couldn’t be completed. (LNActionExecutorErrorDomain error 2018.) Here’s the full ControlWidget and AppIntent code that causes the errorerror: Should controls be able to open apps after the AppIntent runs, or is this a bug?

Machine Learning & AI General WidgetKit App Intents

5

2

1.7k

Jul ’24

DockKit in custom App Not Tracking anymore after updating to iOS 18

Hello, I‘m using DockKit within my SwiftUI Application with GetStream. Before updating to iOS 18 yesterday the custom Tracking using DockKit worked like a charm, but After updating it stopped working unexpectedly. What‘s more curious: using the official GetStream Video Calls Application it works on iOS18 still, but Not within my Application. I can confirm, that my iPhone is still paired and I can receive logs about the current docking State and everything seems fine. Any suggestions what I‘m missing here?

Machine Learning & AI General DockKit

0

309

Sep ’24

General

Post

Replies

Boosts

Views

Activity