I tried running inference with the 2B model from https://github.com/google-deepmind/gemma on my M2 MacBook Pro, but it segfaults during sampling: https://pastebin.com/KECyz60T
Note: out of the box it will try to load bfloat16 weights, which will fail. To avoid this, I patched line 30 in gemma/params.py to explicitly cast to float32:
param_state = jax.tree_util.tree_map(lambda p: jnp.array(p, jnp.float32), params)
NLEmbedding.wordEmbedding is not available for Korean.
This is a very serious issue for any service that caters to Korean users. Please fix it quickly. Sample code is included below.
import UIKit
import CoreML
import NaturalLanguage

class MLTextViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        execute()
    }

    func execute() {
        if let embedding = NLEmbedding.wordEmbedding(for: .korean) {
            let word = "bicycle"
            if let vector = embedding.vector(for: word) {
                print(vector)
            }
            let specificDistance = embedding.distance(between: word, and: "motorcycle")
            print("✅ \(specificDistance.description)")
            embedding.enumerateNeighbors(for: word, maximumCount: 5) { neighbor, distance in
                print("\(neighbor): \(distance.description)")
                return true
            }
        }
    }
}
Where can I find the Puzzle Game demo code shown in the "Lift subjects from images in your app" video? Thank you!
https://developer.apple.com/videos/play/wwdc2023/10176/
I am using VNRecognizeTextRequest to read Chinese characters. It works fine with text written horizontally, but if even two characters are written vertically, then nothing is recognized. Does anyone know how to get the vision framework to either handle vertical text or recognize characters individually when working with Chinese?
I am setting VNRequestTextRecognitionLevel to accurate, since setting it to fast does not recognize any Chinese characters at all. I would love to be able to use fast recognition and handle the characters individually, but it just doesn't seem to work with Chinese. And, when using accurate, if I take a picture of any amount of text, but it's arranged vertically, then nothing is recognized. I can take a picture of 1 character and it works, but if I add just 1 more character below it, then nothing is recognized. It's bizarre.
I've tried setting usesLanguageCorrection = false and tried using VNRecognizeTextRequestRevision3, ...Revision2 and ...Revision1. Strangely enough, revision 2 seems to recognize some text if it's vertical, but the bounding boxes are off. Or, sometimes the recognized text will be wrong.
I tried playing with DataScannerViewController and it's able to recognize characters in vertical text, but I can't figure out how to replicate it with VNRecognizeTextRequest. The problem with using DataScannerViewController is that it treats the whole text block as one item, and it uses the live camera buffer. As soon as I capture a photo, I still have to use VNRecognizeTextRequest.
Below is a code snippet of how I'm using VNRecognizeTextRequest. There's not really much to it and there aren't many other parameters I can try out (plus I've already played around with them). I've also attached a sample image with text laid out vertically.
func detectText(
    in sourceImage: CGImage,
    oriented orientation: CGImagePropertyOrientation
) async throws -> [VNRecognizedTextObservation] {
    return try await withCheckedThrowingContinuation { continuation in
        let request = VNRecognizeTextRequest { request, error in
            // ...
            continuation.resume(returning: observations)
        }
        request.recognitionLevel = .accurate
        request.recognitionLanguages = ["zh-Hant", "zh-Hans"]
        // doesn't seem to have any impact
        // request.usesLanguageCorrection = false
        do {
            let requestHandler = VNImageRequestHandler(
                cgImage: sourceImage,
                orientation: orientation
            )
            try requestHandler.perform([request])
        } catch {
            continuation.resume(throwing: error)
        }
    }
}
The Core ML model worked correctly in the Create ML “Preview”.
However, after I added it to the Xcode project in place of “MobileNetV2”, it no longer classified images correctly: it returned the same result with high confidence every time, no matter which image it was given.
The same code works fine when executed on a real device.
Can someone please help with this?
The application is developed in SwiftUI.
Our application is responsible for recording audio, transcribing the audio file, and uploading it to the backend.
So the two main components in the iOS application are AVAudioRecorder and SFSpeechRecognizer.
The UI comprises a visual design that shows the audio being recorded and uses a Text component to let the user know whether audio is being recorded or not.
Lately the customer has been complaining that although the application says “Recording” in the UI, their audio is not being received at the backend.
When the customers restarted their device (iPad), the application started working normally again.
We haven’t been able to reproduce the issue, but we suspect an intermittent failure in audio transmission or a possible UI freeze.
Note: I have tried the Leaks instrument and did not encounter any memory leaks while using the application.
Is there a way to determine whether the issue lies with the audio recorder, the speech recognizer, or elsewhere in the app?
Are there any known issues or limitations with the audio recorder on recent iOS versions that could be causing this behaviour?
Please let me know if you have any suggestions for diagnosing this issue.
Also, do let me know if more information is required.
Thank you in advance.
I have an mlprogram of size 127.2 MB that was created using TensorFlow and then converted to Core ML. Whenever I request a prediction, memory usage shoots up to 2-2.5 GB. I've tried the optimization techniques in coremltools, but nothing seems to work: it still shoots up to the same 2-2.5 GB of RAM every time. I've attached a graph. It doesn't seem to be a leak, since the memory goes back down afterwards.
I'm trying to create an NLModel within a MessageFilterExtension handler.
The code works fine in the main app, but when I try to use it in the extension it fails to initialize. Even the single line below fails with the error that follows.
Single line that fails:
SMS_Classifier is the class Xcode generated for my model. This line works fine in the main app.
let mlModel = try SMS_Classifier(configuration: MLModelConfiguration()).model
Error
Unable to locate Asset for contextual word embedding model for local en.
MLModelAsset: load failed with error Error Domain=com.apple.CoreML Code=0 "initialization of text classifier model with model data failed" UserInfo={NSLocalizedDescription=initialization of text classifier model with model data failed}
Any ideas?
Hi,
Could you add a new feature to Pages and Numbers that uses AI to apply the style of a PDF or template to a document? The AI would arrange headers and footers, fonts, page breaks, and page numbers to match the PDF or template, so that documents can be auto-formatted to a desired standard look. The same would apply to Numbers. That way we could take raw text, upload a PDF of another document or report, and get a document in that style for export to PDF or printing.
Best regards,
Hardware: 16" 2023 MBP M3 Pro
OS: 14.4.1
Memory: 36 GB
python version: 3.8.16
TF-Metal version: tensorflow-metal 1.0.1 installed via pip
TF version: 2.13.0
Tensorflow-Metal starts pretty slow, at approximately 10s/iteration, and over the course of 36 iterations progressively slows down to over 120s/iteration. The info log prints that TFLite is using XNNPack. I can't share the TFLite model, but it is relatively shallow, small, and simple.
I uninstalled TF-Metal and installed tensorflow. Inference speed picks right up and is rock solid at 0.78s/iteration. What is going on?
**TLDR, TFLite inference speed:
TF Metal = 120s/iteration
TF = 0.78s/iteration**
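One way to quantify a progressive slowdown like the one described above is to time every iteration separately instead of only the total run. The sketch below uses only the standard library. `run_once` is a hypothetical stand-in for whatever invokes the TFLite interpreter, and the placeholder workload is an assumption, not the original model.

```python
import time
from typing import Callable, List


def time_iterations(run_once: Callable[[], None], n_iters: int) -> List[float]:
    """Return wall-clock seconds for each of n_iters calls to run_once."""
    durations = []
    for _ in range(n_iters):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations: List[float]) -> float:
    """Mean of the last quarter of the timings divided by the mean of the first quarter."""
    if not durations:
        raise ValueError("need at least one timing")
    k = max(1, len(durations) // 4)
    first = sum(durations[:k]) / k
    last = sum(durations[-k:]) / k
    return last / first


# Placeholder workload (an assumption); replace with the real inference call.
timings = time_iterations(lambda: sum(range(100_000)), n_iters=36)
print(f"first: {timings[0]:.4f}s  last: {timings[-1]:.4f}s")
print(f"slowdown ratio: {slowdown_ratio(timings):.2f}")
```

A ratio that stays near 1 with stock TensorFlow but grows steadily with tensorflow-metal would make the regression easy to show in a bug report.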
I hope this message finds you well. I recently had the opportunity to watch the insightful session titled "Improve Core ML Integration with Async Prediction" and was thoroughly impressed by the depth of information and the practical demonstration provided. The session offered valuable insights that I believe would greatly benefit my ongoing projects and my understanding of Core ML integration.
As I am keen on implementing the demonstrated workflows and techniques within my own work, I am reaching out to kindly request access to the source code and any related material presented during the session. Having access to the code would enable me to better understand the concepts discussed and apply them more effectively in real-world scenarios.
I believe that being able to review and experiment with the actual code would significantly enhance my learning experience and the implementation efficiency of my projects. It would also serve as a valuable resource for referencing best practices in Core ML integration and async prediction techniques.
Thank you very much for considering my request. I greatly appreciate the effort that went into creating such an informative session and am looking forward to potentially exploring the material in greater depth.
Best regards,
Fabio G.
Hello! I'm writing to the Apple developers to request the addition of an API for downloading premium voices directly within the app. Currently, this can only be done via the settings, which is not convenient for our users. As a developer for an application where this plays a crucial role, I ask you to take this into consideration. Thank you!
For example: we use DockKit for birdwatching, so the distance and direction in the field are unknown.
Distance = ?
Direction = ?
The observation is made from, for example, a rock. The task is to recognize the number of birds caught in the frame, add a detection frame, and collect statistics.
Question:
What is the maximum number of frames processed with custom object recognition?
If that is not enough, can I do the calculations myself and pass the results to DockKit for fast movement?
Description:
Problem Statement:
The Siri intent for the "Next", "Previous", and "Repeat" commands is not working as expected within the Speech framework.
Steps to Reproduce:
Open the Speech Framework application.
Tap on the Siri button to activate voice input.
Say "Next" to trigger the intended action.
Observe that the action is not executed correctly.
In our demo app:
The steps in my demo application are as follows:
Open Siri.
1) Speak: "Check"
In response, Siri opens the dialog below:
What does the user want?
1) One 2) Next 3) Yes 4) Goodbye
2) Speak: "Next"
In response, Siri repeats the same dialog (step 2).
3) Speak: "Yes", "One", or "Goodbye"
In response, Siri goes to the next dialog.
Expected Behavior:
The SiriKit intent or App Intent should receive the value "Next".
Actual Behavior:
The SiriKit intent receives the previous user input keyword instead, and the App Intent repeats the dialog recursively.
Device versions and Region and Language:
Device model: iPhone 11, OS version: 17.4.1
Region: US, Language: English (US)
Impact:
Users can't use an iterative dialog within one context.
Additional:
How the different commands work with App Intents and SiriKit intents on different devices (follow the rows in numbered order):
|| No || Device and scenario || SiriKit intent || App Intent ||
| 1 | ISG iPhone 11 - Next | Not | Not |
| 2 | ISG iPhone 11 - Yes | Not | Yes (But Using Enum) |
| 3 | ISG iPhone 11 - GoodBye | Not | Yes (But Using Enum) |
| 4 | ISG iPhone 11 - One | Yes | Yes |
| 5 | iPad - Next | Not | Not |
| 6 | iPad - One | Yes | Yes |
| 7 | iPad - GoodBye | Not | Yes |
| 8 | iPad - Yes | Not | Yes |
| 9 | Simulator - iPhone 15 - Next, Yes, One, GoodBye | Yes | Yes |
Please help me with this.
I'm using FileMaker with the MonkeyBread Software plugin's CoreML features, and found that it can only write to .mlmodelc.
Are .mlmodel and .mlmodelc the same? If not, how do you generate a .mlmodelc using Xcode?
Please let me know, thanks.
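For reference, a .mlmodel file is the source model, while .mlmodelc is the compiled form that Core ML loads at run time. Xcode produces the latter automatically when a .mlmodel is added to an app target. Outside of a build, the `coremlcompiler` tool that ships with Xcode performs the same step. A minimal sketch, assuming a Mac with Xcode installed. The helper name `coremlcompiler_command` and the file names are placeholders, not from the post above.

```python
import shlex
from typing import List


def coremlcompiler_command(model_path: str, output_dir: str) -> List[str]:
    """Build the argument list that compiles a .mlmodel into a .mlmodelc bundle."""
    return ["xcrun", "coremlcompiler", "compile", model_path, output_dir]


# Placeholder paths (assumptions); substitute the real model and output folder.
cmd = coremlcompiler_command("MyModel.mlmodel", "build")
# Print the command instead of running it, so the sketch is harmless off macOS.
print(shlex.join(cmd))
```

On a Mac, passing `cmd` to `subprocess.run(cmd, check=True)` should leave a `MyModel.mlmodelc` folder inside the output directory.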
Hi,
I encountered a segfault when calling a function via jax.lax.scan.
A minimal failing example is pasted below:
$ ipython
Python 3.9.6 (default, Feb 3 2024, 15:58:27)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import jax
In [2]: jax.__version__
Out[2]: '0.4.22'
In [3]: import jaxlib
In [4]: jaxlib.__version__
Out[4]: '0.4.22'
In [6]: import jax.numpy as jnp
In [7]: def f(carry, x):
   ...:     return carry + x * x, x * x
   ...:
   ...: jax.lax.scan(f, jnp.zeros((), dtype=jnp.float32), jnp.arange(3, dtype=jnp.float32))
Platform 'METAL' is experimental and not all JAX functionality may be correctly supported!
2024-04-16 01:03:52.483015: W pjrt_plugin/src/mps_client.cc:563] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported!
Metal device set to: Apple M3 Max
systemMemory: 36.00 GB
maxCacheSize: 13.50 GB
zsh: segmentation fault ipython
This might be related to the thread below:
https://developer.apple.com/forums/thread/749080
Strangely, when we call it
jax.lax.scan is a very important building block, so I would greatly appreciate it if this could be resolved soon.
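For comparison, the result this call should produce can be worked out without JAX at all. The sketch below is a pure-Python rendering of the scan semantics described in the JAX documentation: a carry is threaded through the loop and the per-step outputs are collected. `scan_reference` is an illustrative name, not a JAX API, and it returns plain Python lists rather than stacked arrays.

```python
from typing import Any, Callable, Iterable, List, Tuple


def scan_reference(
    f: Callable[[Any, Any], Tuple[Any, Any]], init: Any, xs: Iterable[Any]
) -> Tuple[Any, List[Any]]:
    """Thread a carry through f over xs and collect the per-step outputs."""
    carry = init
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, ys


def f(carry, x):
    # Same step function as in the failing example above.
    return carry + x * x, x * x


final_carry, outputs = scan_reference(f, 0.0, [0.0, 1.0, 2.0])
print(final_carry, outputs)  # 5.0 [0.0, 1.0, 4.0]
```

So a working backend should return a final carry of 5 and per-step outputs of 0, 1, and 4 for the example above.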
Copying from https://github.com/google/jax/issues/20750:
import jax
import jax.numpy as jnp
def test_func(x, y):
    return x, y

def main():
    # Print available JAX devices
    print("JAX devices:", jax.devices())
    # Create two random matrices
    a = jnp.array([[1.0, 2.0], [3.0, 4.0]])
    b = jnp.array([[5.0, 6.0], [7.0, 8.0]])
    # Perform matrix multiplication
    c = jnp.dot(a, b)
    # Print the result
    print("Result of matrix multiplication:")
    print(c)
    # Compute the gradient of sum of c with respect to a
    grad_a = jax.grad(lambda a: jnp.sum(jnp.dot(a, b)))(a)
    print("Gradient with respect to a:")
    print(grad_a)
    rng = jax.random.PRNGKey(0)
    test_input = jax.random.normal(key=rng, shape=(5,5,5))
    initial_state = jax.numpy.array(0.0)
    x, y = jax.lax.scan(test_func, initial_state, test_input)

if __name__ == "__main__":
    main()
Gets:
Platform 'METAL' is experimental and not all JAX functionality may be correctly supported!
2024-04-15 18:22:28.994752: W pjrt_plugin/src/mps_client.cc:563] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported!
Metal device set to: Apple M2 Pro
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB
JAX devices: [METAL(id=0)]
Result of matrix multiplication:
[[19. 22.]
[43. 50.]]
Gradient with respect to a:
[[11. 15.]
[11. 15.]]
zsh: segmentation fault python JAXTest.py
With more info from the debugger:
Current thread 0x00000001fdd3bac0 (most recent call first):
File "/Users/.../anaconda3/lib/python3.11/site-packages/jax/_src/interpreters/pxla.py", line 1213 in __call__
My configuration is:
jax-metal : 0.0.6
jax: 0.4.26
jaxlib: 0.4.23
numpy: 1.24.3
python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ]
jax.devices (1 total, 1 local): [METAL(id=0)]
process_count: 1
platform: uname_result(system='Darwin', root:xnu-10063.101.17~1/RELEASE_ARM64_T6020', machine='arm64')
macOS 14.4.1 (23E224)
Before, with 3.9 + 0.0.3 etc., it wasn't happening.
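Since the configuration above pairs jax 0.4.26 with jaxlib 0.4.23, it can help to print the installed versions in one go when comparing a working and a failing environment. A small sketch using only the standard library. The helper name `installed_versions` is illustrative, and the package list is an assumption based on the configuration quoted above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["jax", "jaxlib", "jax-metal", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```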
I was wondering if there is a quick way to convert a model trained with the open source CRFSuite for use with NLTagger?
It seems like retraining should be possible, but I was wondering whether automatic conversion is supported.
Hello!
We have an app that uses the SpeechKit framework, in particular local on-device speech recognition of audio files in the user-selected language.
Up until recently it worked as expected. However, after updating one of our test devices to iOS 17.4.1, we found that local recognition on it stopped working completely.
The error we are getting has code 102, and its localised description reads:
"Failed to access assets".
That sounds just like a rare though known issue in previous iOS versions. The workaround was inconvenient for our users, but at least it worked: they had to go to the system Settings and adjust the dictation setting in the keyboard section.
Right now no tweaks of this sort appear to fix the situation. We even tried resetting the device's settings (not a factory reset, though). The error persists.
It appears on one of our devices 100% of the time, halting the local recognition process. It sometimes shows up on other devices for some particular languages too, but not for other languages.
As it is a UX-breaking bug for our app, today I decided to check the logs in the Console app at the moment of the recognition attempt.
There are lots of errors with code 1101, which from our research appear to be general notifications about local recognition setup problems.
After removing the lines about the 1101 error from the log, some interesting entries remain that are (almost) never mentioned on any searchable webpage. I assume they are private API calls that the SpeechKit framework makes under the hood:
default localspeechrecognition -[UAFAssetSet assetNamed:]_block_invoke 9067C4F1-0B29-4A57-85DD-F8740DF7C344: No assets in asset set com.apple.siri.understanding
default localspeechrecognition -[UAFAssetSet assetNamed:] 9067C4F1-0B29-4A57-85DD-F8740DF7C344: Returning com.apple.siri.asr.assistant from source none
error localspeechrecognition -[SFEntitledAssetManager _assetWithAssetConfig:regionId:] No asset found with name: com.apple.siri.asr.assistant, asset set: com.apple.siri.understanding, usage: <private>
error localspeechrecognition +[LSRConnection modelRootWithLanguage:clientID:modelOverrideURL:returningAssetType:error:] Fetch asset error (null)
error localspeechrecognition -[LSRConnection prepareRecognizerWithLanguage:recognitionOverrides:modelOverrideURL:anyConfiguration:task:clientID:error:] modelRoot is nil (null)
default OurApp [0x113e96d40] invalidated because the current process cancelled the connection by calling xpc_connection_cancel()
It looks like some language-model-related problems appeared after the device was updated to 17.4.1.
Settings -> General -> Keyboard -> Dictation Languages appears to be configured correctly and the dictation toggle is on. We tried changing all of these settings, rebooting the device, and resetting the device settings.
However, the log lines still tell us that there is something wrong with the private resources of the SpeechKit framework.
We are very concerned, as speech recognition is the core of our application's logic. We don't understand the scale of the possible impact of this faulty behaviour (rare occurrences / some users / all users?) or how we can fix it to give our users the desired behaviour.
Will macOS support the AMD RX 7600?