Vision

Implement model recognition based on the iPhone 14 Max camera, with LiDAR calculating the width and height.

Based on the iPhone 14 Max camera, implement model recognition and draw a rectangular box around the recognized object. The width and height are calculated using LiDAR and displayed in centimeters on the real-time updated image.
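One possible way to turn a detected bounding box and a depth reading into centimeters is the pinhole camera relation: real size = pixel extent × depth / focal length in pixels. The following is only a minimal sketch under assumptions that are not in the post: the depth is in meters at the object's distance, the focal lengths `fx` and `fy` are in pixels (as found in a camera intrinsics matrix), and the object faces the camera roughly head-on. `PixelBox` and `estimatedSizeCm` are hypothetical names.

```swift
/// Hypothetical input: the size of a detected bounding box in pixels.
struct PixelBox {
    var width: Double   // box width in pixels
    var height: Double  // box height in pixels
}

/// Estimates real-world size in centimeters using the pinhole camera model.
/// - depthMeters: distance to the object (assumed to come from LiDAR depth).
/// - fx, fy: focal lengths in pixels (assumed to come from camera intrinsics).
/// Returns nil when an input is not positive.
func estimatedSizeCm(box: PixelBox, depthMeters: Double,
                     fx: Double, fy: Double) -> (width: Double, height: Double)? {
    guard depthMeters > 0, fx > 0, fy > 0 else { return nil }
    let widthMeters = box.width * depthMeters / fx
    let heightMeters = box.height * depthMeters / fy
    // Convert meters to centimeters for display.
    return (widthMeters * 100, heightMeters * 100)
}
```

The bounding box and the focal lengths must refer to the same image resolution; if the box comes from a resized frame, it has to be scaled back first.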

Media Technologies Photos & Camera Concurrency Vision VisionKit

0

142

19h

Desire to close contour from left & right two top points

Hello all, is there a way to close a contour if you have found, say, the two top points, one on each side "extension"? See the attached image. In the end I want a trapezoid-like shape. A code example would be much appreciated. Thank you :) I think I have it as a CGPath. So is there a way to edit a CGPath, or to close the top from a top-left point to a top-right point?
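A minimal sketch of the idea, assuming the contour is available as an ordered list of points that starts at the top-left end and finishes at the top-right end. `Point` and `closedContour` are hypothetical names; plain points are used instead of CGPath so the example is self-contained. On Apple platforms, a `CGMutablePath` built from the same points can be closed with `closeSubpath()`, which draws the same final segment.

```swift
struct Point: Equatable {
    var x: Double
    var y: Double
}

/// Closes an open polyline by connecting its last point back to its first.
/// Assumes the points are ordered along the contour, starting at the
/// top-left end and finishing at the top-right end.
func closedContour(_ open: [Point]) -> [Point] {
    guard let first = open.first, let last = open.last, open.count >= 3 else {
        return open  // too few points to form a closed shape
    }
    // Already closed: nothing to add.
    if first == last { return open }
    return open + [first]
}

// Example: left side going down, bottom edge, right side going up.
let openShape = [
    Point(x: 2, y: 10),  // top-left end
    Point(x: 0, y: 0),   // bottom-left
    Point(x: 10, y: 0),  // bottom-right
    Point(x: 8, y: 10),  // top-right end
]
let trapezoid = closedContour(openShape)
```

The added segment runs from the top-right end back to the top-left end, which gives the trapezoid-like outline described in the post.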

Graphics & Games General Vision

2

0

119

2d

Update main window size in visionOS

I have to decrease the main window size when the user opens an immersive space in my project. I tried using frame, but it does not update the main window size; it only updates the view's frame.

UI Frameworks SwiftUI Vision SwiftUI

0

99

3d

The results detected by VNRecognizeTextRequest may miss some text blocks.

We are using VNRecognizeTextRequest to detect text in documents, and we have noticed that even in some very clear and well-formatted documents, there are still instances where text blocks are missed. Live Text also has the same issue.

Machine Learning & AI General Vision Live Text

2

0

185

5d

Vision to detect corners and or 3 lines.

End goal: to detect 3 lines and 2 corners accurately. I am trying contours, but they are a bit off. Is there a way, or are there settings in contours, to detect corners and lines more accurately, maybe with fewer and sharper-edged/cornered contours? Or is there some other API or method, please? I would love an email, please ;) Thank you. 2. Also an overlay/scale issue
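One technique that may help get sharper corners from a noisy contour is polygon simplification. Vision's `VNContour` offers `polygonApproximation(epsilon:)` for this; the sketch below shows the general idea with a plain Ramer-Douglas-Peucker pass over an open polyline. It is an illustration only, not Vision's implementation, and `Vec2`, `distance(from:toLineThrough:and:)`, and `simplify` are hypothetical names.

```swift
struct Vec2: Equatable {
    var x: Double
    var y: Double
}

/// Perpendicular distance from `p` to the line through `a` and `b`.
func distance(from p: Vec2, toLineThrough a: Vec2, and b: Vec2) -> Double {
    let dx = b.x - a.x, dy = b.y - a.y
    let length = (dx * dx + dy * dy).squareRoot()
    guard length > 0 else {
        // `a` and `b` coincide: fall back to the point-to-point distance.
        return ((p.x - a.x) * (p.x - a.x) + (p.y - a.y) * (p.y - a.y)).squareRoot()
    }
    return abs(dy * (p.x - a.x) - dx * (p.y - a.y)) / length
}

/// Ramer-Douglas-Peucker simplification of an open polyline.
/// Points closer than `epsilon` to the current chord are dropped.
func simplify(_ points: [Vec2], epsilon: Double) -> [Vec2] {
    guard points.count > 2, let first = points.first, let last = points.last else {
        return points
    }
    var maxDistance = 0.0
    var splitIndex = 0
    for i in 1..<(points.count - 1) {
        let d = distance(from: points[i], toLineThrough: first, and: last)
        if d > maxDistance {
            maxDistance = d
            splitIndex = i
        }
    }
    // Everything is within tolerance: keep only the endpoints.
    if maxDistance <= epsilon { return [first, last] }
    // Otherwise keep the farthest point and simplify both halves.
    let left = simplify(Array(points[...splitIndex]), epsilon: epsilon)
    let right = simplify(Array(points[splitIndex...]), epsilon: epsilon)
    return Array(left.dropLast()) + right
}
```

With a suitable `epsilon`, the points that survive are the places where the contour changes direction, so they can serve as corner candidates, and consecutive survivors define the line segments.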

Graphics & Games General Vision

3

0

234

4d

Filtering Contours from Vision

Hello, I need help. I want to select/filter the contours in an image, but I am not sure of the best way to do that. One idea: select/filter for the bottom-left-most contour? Please see the attached image. I will also need the end points or court corners, and the contour should be a fine, smooth line, i.e. an accurate trace of only the court end line and side lines. Thank you :) I am also glad to hear other ideas or APIs for determining the lines/corners I need. I am happy to discuss by email if that is easier; I would actually prefer that. Thanks.
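A minimal sketch of the "bottom-left-most contour" idea, assuming each contour has already been extracted as a list of normalized points with the origin at the lower-left corner of the image (the convention used by Vision's normalized coordinates). `NormalizedPoint` and `bottomLeftMostContourIndex` are hypothetical names, and "bottom-left-most" is interpreted here as "bounding-box corner nearest to (0, 0)", which is only one possible definition.

```swift
struct NormalizedPoint {
    var x: Double
    var y: Double
}

/// Returns the index of the contour whose bounding-box corner (minX, minY)
/// lies closest to the image's bottom-left corner (0, 0).
/// Returns nil when there are no non-empty contours.
func bottomLeftMostContourIndex(in contours: [[NormalizedPoint]]) -> Int? {
    var bestIndex: Int?
    var bestDistance = Double.infinity
    for (index, contour) in contours.enumerated() {
        // Skip empty contours, which have no bounding box.
        guard let minX = contour.map(\.x).min(),
              let minY = contour.map(\.y).min() else { continue }
        let distance = minX * minX + minY * minY  // squared distance to (0, 0)
        if distance < bestDistance {
            bestDistance = distance
            bestIndex = index
        }
    }
    return bestIndex
}
```

Other filters, such as a minimum point count or a minimum bounding-box size, could be applied before this step to discard small noise contours.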

Developer Tools & Services Xcode Vision Machine Learning

3

0

164

1w

Help Needed: SwiftUI View with Camera Integration and Core ML Object Recognition

Hi everyone, I'm working on a SwiftUI app and need help building a view that integrates the device's camera and uses a pre-trained Core ML model for real-time object recognition.
Here's what I want to achieve:
- Open the device's camera from a SwiftUI view.
- Capture frames from the camera feed and analyze them using a Create ML-trained Core ML model.
- If a specific figure/object is recognized, automatically close the camera view and navigate to another screen in my app.
I'm looking for guidance on:
- Setting up live camera capture in SwiftUI.
- Using Core ML and Vision frameworks for real-time object recognition in this context.
- Managing navigation between views when the recognition condition is met.
Any advice, code snippets, or examples would be greatly appreciated! Thanks in advance!

Machine Learning & AI Create ML Vision SwiftUI

1

0

265

2w

[Vision, visionOS] Is it possible to use the Vision framework on visionOS for body tracking?

Hello, I checked the following documentation: "Vision | Apple Developer Documentation" and "Discover Swift enhancements in the Vision framework - WWDC24 - Videos - Apple Developer". I saw that the Vision framework is available on visionOS, so I would like to know whether it is possible to use it on visionOS for tracking human and animal body poses. Or are there limits on using it on visionOS?

Spatial Computing General Vision visionOS

1

0

243

2w

How to Implement a Curved Surface Effect for Video Playback and Allow Dynamic Width Adjustment in visionOS?

Dear Apple Engineers, I am working on a project in visionOS and need to implement a curved surface effect for video playback, where the width of the surface can be dynamically adjusted. Specifically, I want the video to be displayed on a curved surface (similar to a scroll unfolding), and the user should be able to adjust the width of this surface.
I have the following specific questions:
- How can I implement a curved surface for video playback and ensure the video content is not stretched or distorted on the surface?
- How can I create a dynamic curved surface (such as a bending plane) in RealityKit or visionOS, where the width can be adjusted by the user?
- Is it possible to achieve more complex curved surface effects (such as scroll unfolding or bending) using shaders or other techniques?
Thank you very much for your help!

Spatial Computing ARKit Vision visionOS

1

0

276

3w

Vision Framework Causes EXC_BREAKPOINT Error in Xcode App Playground (.swiftpm) File

I’m trying to use the Vision framework in a Swift Playground to perform face detection on an image. The following code works perfectly when I run it in a regular Xcode project, but in an App Playground, I get the error:
    Thread 12: EXC_BREAKPOINT (code=1, subcode=0x10321c2a8)
Here's the code:
    import SwiftUI
    import Vision
    struct ContentView: View {
        var body: some View {
            VStack {
                Text("Face Detection")
                    .font(.largeTitle)
                    .padding()
                Image("me")
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .onAppear { detectFace() }
            }
        }
        func detectFace() {
            guard let cgImage = UIImage(named: "me")?.cgImage else { return }
            let request = VNDetectFaceRectanglesRequest { request, error in
                if let results = request.results as? [VNFaceObservation] {
                    print("Detected \(results.count) face(s).")
                    for face in results {
                        print("Bounding Box: \(face.boundingBox)")
                    }
                } else {
                    print("No faces detected.")
                }
            }
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            do {
                try handler.perform([request]) // This line causes the error.
            } catch {
                print("Failed to perform Vision request: \(error)")
            }
        }
    }
The error occurs on this line:
    try handler.perform([request])
Details:
- This code runs fine in a normal Xcode project (.xcodeproj).
- I'm using an App Playground instead (.swiftpm).
- The image is included in the .xcassets folder.
Is there any way I can mitigate this issue? Please do not recommend switching to .xcodeproj, as I am making a submission for Apple's Swift Student Challenge, and they require that I use .swiftpm.

Developer Tools & Services Xcode Swift Swift Playgrounds Xcode Vision

1

0

222

4w

Vision - Time travel door

Hello all, we are building a scene that works like a time-travel door. When the user selects a scene, they pass through the door and the selected scene is shown. The transition in between needs to look natural. It would be even better if the user could walk through into an immersive space. There is very little information available right now. How can I get started? Is there any material I can refer to? Thanks.

Spatial Computing ARKit Vision

2

0

288

Dec ’24

How to adjust the volume of an Entity

After I play audio on the entity, the sound is very quiet and I want to adjust the volume. I could not find an API for this. What should I do?
    if let audio = audioResources {
        entity.playAudio(audio)
    }

Spatial Computing ARKit Vision

1

0

250

Dec ’24

Crash inside of Vision framework during VNImageRequestHandler use

Hello, I've been dealing with a puzzling issue for some time now, and I’m hoping someone here might have insights or suggestions.
The Problem: We’re observing an occasional crash in our app that seems to originate from the Vision framework.
- Frequency: It happens randomly, after many successful executions of the same code. It is hard to tell how long the app had been working, but in some cases the app could run for about a month without any issues.
- Devices: The issue doesn't seem device-dependent (we’ve seen it on various iPad models).
- OS Versions: The crashes started occurring with iOS 18.0.1 and are still present in 18.1 and 18.1.1.
What I suspected: The crash logs point to a potential data race within the Vision framework.
The relevant section of the code where the crash happens:
    guard let cgImage = image.cgImage else { throw ... }
    let request = VNCoreMLRequest(model: visionModel)
    try VNImageRequestHandler(cgImage: cgImage).perform([request]) // <- the line causing the crash
Since the code is rather simple, I'm not sure what else could be missing here. The images sent here are uniform (fixed size). The model is loaded and working; the crash occurs randomly after a period of time, and the call worked correctly many times before. Also, the model variable is not an optional.
Here is the crash log:
    libobjc.A objc_exception_throw
    CoreFoundation -[NSMutableArray removeObjectsAtIndexes:]
    Vision -[VNWeakTypeWrapperCollection _enumerateObjectsDroppingWeakZeroedObjects:usingBlock:]
    Vision -[VNWeakTypeWrapperCollection addObject:droppingWeakZeroedObjects:]
    Vision -[VNSession initWithCachingBehavior:]
    Vision -[VNCoreMLTransformer initWithOptions:model:error:]
    Vision -[VNCoreMLRequest internalPerformRevision:inContext:error:]
    Vision -[VNRequest performInContext:error:]
    Vision -[VNRequestPerformer _performOrderedRequests:inContext:error:]
    Vision -[VNRequestPerformer _performRequests:onBehalfOfRequest:inContext:error:]
    Vision -[VNImageRequestHandler performRequests:gatheredForensics:error:]
    OurApp ModelWrapper.perform
I'm a bit lost at this point; I've tried everything I could imagine so far. I tried putting a symbolic breakpoint in removeObjectsAtIndexes to check whether some library we use (e.g. a crash reporter) did an implementation swap. There was none, and if anything did method swizzling, I'd expect that to show up in the stack trace before the original code is called. I did peek into the previous functions and noticed a lock used in one of the Vision methods, so in my understanding a data race in this code shouldn't be possible at all. I've also put breakpoints in the NSLock variants to check for swizzling or an override with a category that might be interfering with the locking; again, nothing was there. There is also another model running on a separate queue, but after seeing the line with the locking in the debugger, it doesn't seem to me like this could cause a problem, at least not in this specific spot. Is there something I'm missing here, or something I'm doing wrong? Thanks in advance for your help!

App & System Services General Vision

4

3

253

Dec ’24

Immersive Space not working

If I set UIApplicationPreferredDefaultSceneSessionRole to UISceneSessionRoleImmersiveSpaceApplication, my immersive space for the image works fine. But when I use UIWindowSceneSessionRoleApplication and try to open the immersive space from a particular sub-screen, the image is not shown in the immersive space (the immersive space does not open). Does anyone have an idea what the issue is? My Info.plist values are as follows:
    <key>UIApplicationSceneManifest</key>
    <dict>
        <key>UIApplicationPreferredDefaultSceneSessionRole</key>
        <string>UIWindowSceneSessionRoleApplication</string>
        <key>UIApplicationSupportsMultipleScenes</key>
        <true/>
        <key>UISceneConfigurations</key>
        <dict>
            <key>UISceneSessionRoleImmersiveSpaceApplication</key>
            <array>
                <dict>
                    <key>UISceneInitialImmersionStyle</key>
                    <string>UIImmersionStyleFull</string>
                </dict>
            </array>
        </dict>
    </dict>

Spatial Computing General Swift Vision UIKit

1

0

332

Dec ’24

VisionKit: Improve barcode scanning accuracy

Hi all, I am developing an app that scans barcodes using VisionKit, but I am facing some difficulties. The accuracy is not where I hoped it would be. Changing the “qualityLevel” parameter from balanced to accurate made the barcode reading slightly better, but it still misreads some cases. I previously implemented the same barcode scanning app with AVFoundation, and that had much better accuracy. I tested it, and barcodes that were read correctly with AVFoundation were read incorrectly with VisionKit. Is there any way to improve the accuracy of barcode reading in VisionKit? Or is this built in and not something the developer can change? Either way, any ideas on how to improve reading accuracy would help. Thanks in advance!

UI Frameworks SwiftUI Vision VisionKit

0

178

Dec ’24

Facing issue during open Immersive Space

When I try to open an immersive space, I get the following error: HALC_ProxyIOContext::IOWorkLoop: skipping cycle due to overload. Any idea how to solve it?

Spatial Computing General Vision RealityKit visionOS

1

0

266

Dec ’24

How to Retrieve VisualLookUp Results (e.g., Object Name) in VisionKit?

Hi everyone, I'm working on an iOS app that uses VisionKit and I'm exploring the .visualLookUp feature. Specifically, I want to extract the detailed information that Visual Look Up provides after identifying an object in an image (e.g., if the object is a flower, retrieve its name; if it’s a clothing tag, get the tag's content).

Machine Learning & AI Core ML Image I/O Vision VisionKit

1

0

241

2w

BarcodeObservation Orientation

Hi, I'm working with the Vision framework to detect barcodes. I tested both EAN13 and Data Matrix detection, and both work fine except for the QuadrilateralProviding values in the returned BarcodeObservation. The topLeft, topRight, bottomRight and bottomLeft coordinates are rotated 90° counterclockwise (the physical bottom left of the Data Matrix, the corner of the "L", is returned as the topLeft point in the observation). The same behaviour occurs with EAN13 barcodes. Has anyone else experienced the same issue with orientation? Is this normal behaviour, or should we expect a fix in an upcoming release of the Vision framework?

Machine Learning & AI General Vision

4

0

277

2w

Inference with non-square Images

I'm trying to set up Facebook AI's "Segment Anything" MLModel to compare its performance and efficacy on-device against the Vision library's Foreground Instance Mask Request. The Vision request accepts any reasonably-sized image for processing, and then has a method to produce an output at the same resolution as the input image. Conversely, the MLModel for Segment Anything accepts a 1024x1024 image for inference and produces a 1024x1024 image as output. What is the best way to work with non-square images, such as 4:3 camera photos? I can basically think of 3 methods for accomplishing this:
1. Scale the image to 1024x1024, ignoring aspect ratio, then inversely scale the output back to the original size. However, I have a big concern that squashing the content will result in poor inference results.
2. Scale the image, preserving its aspect ratio so its minimum dimension is 1024, then run the model multiple times on a sliding 1024x1024 window and aggregate the results. My main concern here is the complexity of de-duping the output, when each run could produce different outputs based on how objects are cropped.
3. Fit the image within 1024x1024 and pad with black pixels to make a square. I'm not sure if the border will muck up the inference.
Anyway, this seems like it must be a well-solved problem in ML, but I'm having difficulty finding an authoritative best practice.
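The geometry of the "fit and pad" option can be sketched as follows. This only illustrates the coordinate bookkeeping (scale factor, padding offsets, and mapping output coordinates back to the original image); it does not say which option gives the best inference results. `Letterbox` is a hypothetical helper, and centering the image inside the square is an assumption, since the post does not say where the padding goes.

```swift
/// Hypothetical helper describing how an image is fitted into a square
/// model input while preserving its aspect ratio.
struct Letterbox {
    let scale: Double    // factor applied to the original image
    let offsetX: Double  // padding on the left, in model pixels
    let offsetY: Double  // padding on the top, in model pixels

    /// Fits an image of the given size inside a square of side `target`
    /// and centers it (centering is an assumption of this sketch).
    init(imageWidth: Double, imageHeight: Double, target: Double = 1024) {
        let s = target / max(imageWidth, imageHeight)
        scale = s
        offsetX = (target - imageWidth * s) / 2
        offsetY = (target - imageHeight * s) / 2
    }

    /// Maps a coordinate in the padded model output back to the original image.
    func toOriginal(x: Double, y: Double) -> (x: Double, y: Double) {
        ((x - offsetX) / scale, (y - offsetY) / scale)
    }
}

// Example: a 4:3 photo of 4032x3024 pixels.
let box = Letterbox(imageWidth: 4032, imageHeight: 3024)
let center = box.toOriginal(x: 512, y: 512)
```

For the 4032x3024 example, the scaled image occupies 1024x768 model pixels with 128 pixels of padding above and below, so the padded rows of the output mask can be cropped away before scaling the mask back up.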

Machine Learning & AI Core ML Vision

0

263

Dec ’24

Running out of memory analyzing images with ImageRequestHandler

Hi, I'm trying to analyze images in my Photos library with the following code:
    func analyzeImages(_ inputIDs: [String]) {
        let manager = PHImageManager.default()
        let option = PHImageRequestOptions()
        option.isSynchronous = true
        option.isNetworkAccessAllowed = true
        option.resizeMode = .none
        option.deliveryMode = .highQualityFormat
        let concurrentTasks=1
        let clock = ContinuousClock()
        let duration = clock.measure {
            let group = DispatchGroup()
            let sema = DispatchSemaphore(value: concurrentTasks)
            for entry in inputIDs {
                if let asset=PHAsset.fetchAssets(withLocalIdentifiers: [entry], options: nil).firstObject {
                    print("analyzing asset: \(entry)")
                    group.enter()
                    sema.wait()
                    manager.requestImage(for: asset, targetSize: PHImageManagerMaximumSize, contentMode: .aspectFit, options: option) { (result, info) in
                        if let result = result {
                            Task {
                                print("retrieved asset: \(entry)")
                                let aestheticsRequest = CalculateImageAestheticsScoresRequest()
                                let fingerprintRequest = GenerateImageFeaturePrintRequest()
                                let inputImage = result.cgImage!
                                let handler = ImageRequestHandler(inputImage)
                                let (aesthetics,fingerprint) = try await handler.perform(aestheticsRequest, fingerprintRequest)
                                // save Results
                                print("finished asset: \(entry)")
                                sema.signal()
                                group.leave()
                            }
                        } else {
                            group.leave()
                        }
                    }
                }
            }
            group.wait()
        }
        print("analyzeImages: Duration \(duration)")
    }
When running this code, only two requests are being processed simultaneously (due to the semaphore)... However, if I call the function with a large list of images (>100), memory usage balloons to over 1.6GB and the app crashes. If I call it with a smaller number of images, the loop completes and the memory is freed. When I use Instruments to look for memory leaks, it reports that no leaks are found, but there are 150+ VM:IOSurfaces allocated by CMPhoto, CoreVideo and CoreGraphics @ 35MB each. Shouldn't each surface be released when the task is complete?

Machine Learning & AI Apple Intelligence Vision

2

0

365

Dec ’24

Post

Replies

Boosts

Views

Activity

Posts under Vision tag
