It seems like eye tracking conflicts with pointing devices: the cursor jumps to wherever the user is looking, which makes trackpad control very janky.
Is there a way to filter out eye tracking and rely on pointing devices only?
Vision
Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.
App A wants to know whether the iPhone user is looking at a photo of one of their family members, and then display some information related to that person. Is this possible?
Hello. Currently, only the iOS version is on sale on the App Store. The application offers an iCloud-linked, auto-renewable subscription.
I want to sell a visionOS version through App Store Connect at the same time, with the same identifier (App ID).
To provide the visionOS version early, I simply added visionOS to the existing app project, but the existing UI-related code and the location-related code are not compatible.
So I duplicated the project with the same identifier and the same name, kept and optimized only what could be implemented, and it runs without any problems on the actual device.
However, when I added the visionOS platform in App Store Connect and tried to upload the separately created visionOS app through an archive, the upload was blocked by an identifier and provisioning error.
What I found while looking for a solution:
App Group
- I found out about this feature, but it seems meant for integrating separate apps into one service, so it did not look suitable for my case.
Add an app to the existing app project via a target and manually adjust the platform in Xcode -> Build Phases -> Compile Sources -> archive upload succeeds? (I haven't been able to try this yet.)
That is the current situation. Please give me some advice on how to implement this. visionOS has a lot of constraints, so a lot of features have to be taken out.
I am using VNRecognizeTextRequest to read Chinese characters. It works fine with text written horizontally, but if even two characters are written vertically, nothing is recognized. Does anyone know how to get the Vision framework to either handle vertical text or recognize characters individually when working with Chinese?
I am setting VNRequestTextRecognitionLevel to accurate, since setting it to fast does not recognize any Chinese characters at all. I would love to be able to use fast recognition and handle the characters individually, but it just doesn't seem to work with Chinese. And, when using accurate, if I take a picture of any amount of text, but it's arranged vertically, then nothing is recognized. I can take a picture of 1 character and it works, but if I add just 1 more character below it, then nothing is recognized. It's bizarre.
I've tried setting usesLanguageCorrection = false and tried using VNRecognizeTextRequestRevision3, ...Revision2 and ...Revision1. Strangely enough, revision 2 seems to recognize some text if it's vertical, but the bounding boxes are off. Or, sometimes the recognized text will be wrong.
I tried playing with DataScannerViewController and it's able to recognize characters in vertical text, but I can't figure out how to replicate it with VNRecognizeTextRequest. The problem with using DataScannerViewController is that it treats the whole text block as one item, and it uses the live camera buffer. As soon as I capture a photo, I still have to use VNRecognizeTextRequest.
Below is a code snippet of how I'm using VNRecognizeTextRequest. There's not really much to it and there aren't many other parameters I can try out (plus I've already played around with them). I've also attached a sample image with text laid out vertically.
func detectText(
    in sourceImage: CGImage,
    oriented orientation: CGImagePropertyOrientation
) async throws -> [VNRecognizedTextObservation] {
    return try await withCheckedThrowingContinuation { continuation in
        let request = VNRecognizeTextRequest { request, error in
            if let error {
                continuation.resume(throwing: error)
                return
            }
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            continuation.resume(returning: observations)
        }
        request.recognitionLevel = .accurate
        request.recognitionLanguages = ["zh-Hant", "zh-Hans"]
        // doesn't seem to have any impact
        // request.usesLanguageCorrection = false
        do {
            let requestHandler = VNImageRequestHandler(
                cgImage: sourceImage,
                orientation: orientation
            )
            try requestHandler.perform([request])
        } catch {
            continuation.resume(throwing: error)
        }
    }
}
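For completeness, this is roughly how I call it (just a sketch; "vertical_sample" is a placeholder asset name, not one of my real images):
// Hypothetical call site, inside an async context.
let uiImage = UIImage(named: "vertical_sample")!
let observations = try await detectText(in: uiImage.cgImage!, oriented: .up)
for observation in observations {
    // Print the top candidate string and its bounding box for each recognized line.
    print(observation.topCandidates(1).first?.string ?? "", observation.boundingBox)
}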
Currently, I'm bringing the iOS version that is on sale on the App Store over to visionOS with the same app ID and identifier, and I'm trying to upload it using the same identifier, developer ID, and iCloud, but the upload fails with an error. I didn't know what the problem was, so as a test I submitted an early update of the iOS version, and that uploaded and went through review without any problems. Uploading the standalone visionOS version, however, fails with a provisioning profile problem, so I would appreciate it if you could tell me the solution.
In addition, a lot of the code used in the iOS version is not compatible with visionOS, so we created a new project for visionOS.
I want to show only spatial videos when opening the photo library in my app. How can I achieve this?
One more thing: if I select a video using the photo library, how do I identify whether the selected video is a spatial video or not?
self.presentPicker(filter: .videos)

/// - Tag: PresentPicker
private func presentPicker(filter: PHPickerFilter?) {
    var configuration = PHPickerConfiguration(photoLibrary: .shared())
    // Set the filter type according to the user's selection.
    configuration.filter = filter
    // Set the mode to avoid transcoding, if possible, if your app supports arbitrary image/video encodings.
    configuration.preferredAssetRepresentationMode = .current
    // Set the selection behavior to respect the user's selection order.
    configuration.selection = .ordered
    // Set the selection limit to enable multiselection.
    configuration.selectionLimit = 1
    let picker = PHPickerViewController(configuration: configuration)
    picker.delegate = self
    present(picker, animated: true)
}
func picker(_ picker: PHPickerViewController, didFinishPicking results: [PHPickerResult]) {
    picker.dismiss(animated: true) {
        // do something on dismiss
    }
    guard let provider = results.first?.itemProvider else { return }
    provider.loadFileRepresentation(forTypeIdentifier: "public.movie") { url, error in
        if let error {
            print(error)
            return
        }
        // receiving the video-local-URL / filepath
        guard let url = url else { return }
        // create a new filename
        let fileName = "\(Int(Date().timeIntervalSince1970)).\(url.pathExtension)"
        // create new URL
        let newUrl = URL(fileURLWithPath: NSTemporaryDirectory() + fileName)
        print(newUrl)
        print("===========")
        // copy item to app storage
        //try? FileManager.default.copyItem(at: url, to: newUrl)
        // self.parent.videoURL = newUrl.absoluteString
    }
}
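For the second question, one idea I'm considering (just a sketch; I'm assuming a spatial video is marked by a video track with the stereo multiview characteristic, which may not be the whole story):
import AVFoundation

// Sketch: treat a copied video file as "spatial" if it has a video track that
// carries the stereo multiview characteristic. This is my assumption, not a
// confirmed rule for how Photos marks spatial videos.
func isSpatialVideo(at url: URL) async -> Bool {
    let asset = AVURLAsset(url: url)
    let stereoTracks = (try? await asset.loadTracks(
        withMediaCharacteristic: .containsStereoMultiviewVideo
    )) ?? []
    return !stereoTracks.isEmpty
}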
Context
So basically I've trained my model for object detection with 4k+ images. In the Create ML preview I can check the prediction for image "A": it detects two labels at 100%, and the bounding boxes look accurate.
The problem itself
However, inside the Swift Playground, when I try to perform object detection using the same model and the same image, I don't get the same results.
What I expected
That after performing the request and processing the array of VNRecognizedObjectObservation, I would see the very same results that appear in the Create ML preview.
Notes:
The way I'm importing the model into the playground is just by drag and drop.
I've trained the images using the JPEG format.
The test image is rotated so that it looks vertical, using the macOS Finder rotation tool.
I've tried passing a different orientation while creating the VNImageRequestHandler, with the same result.
Swift Playground code
This is the code I'm using.
import UIKit
import Vision

do {
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    let mlModel = model.model
    let coreMLModel = try VNCoreMLModel(for: mlModel)
    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        results.forEach { result in
            print(result.labels)
            print(result.boundingBox)
        }
    }
    let image = UIImage(named: "TEST_IMAGE.HEIC")!
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!)
    try requestHandler.perform([request])
} catch {
    print(error)
}
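One thing I notice while writing this up: the handler above drops the image's orientation. A sketch of how I could pass it through explicitly (the mapping extension is something I'd add myself; it's not part of my current code):
import UIKit
import Vision

// Map UIImage.Orientation to CGImagePropertyOrientation so Vision sees the
// image the same way the Create ML preview and Finder do.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

// Usage: pass the orientation explicitly instead of dropping it.
// let requestHandler = VNImageRequestHandler(
//     cgImage: image.cgImage!,
//     orientation: CGImagePropertyOrientation(image.imageOrientation)
// )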
Additional Notes & Uncertainties
Not sure if this is relevant, but just in case: I trained the model using pictures I took with my iPhone in the 48MP HEIC format. All photos were taken in portrait orientation. With a Python script I overwrote the EXIF orientation to 1 (normal), so that I could annotate the images with the CVAT tool and then convert them to the Create ML annotation format.
Assumption #1
I've read that object detection in Create ML is based on the YOLOv3 architecture, whose first layer resizes the image dimensions, meaning I don't have to worry about using very large images to train my model. Is this correct?
Assumption #2
That also makes me assume the same resizing happens when I make a prediction. Is that right?
Hello, I have been working for a few days now on a scanner that reads a PDF417 barcode from the photos library, and I have come to a dead end. Every time I run my function on the photo, my array of observations comes back as []. This example uses an automatically generated image, because I figure that if it works with this, it will work with a real screenshot. That said, I have already tried all sorts of images that aren't pre-generated, and they still fail to work. Code below:
Calling the function
createVisionRequest(image: generatePDF417Barcode(from: "71238-12481248-128035-40239431")!)
Creating the Barcode:
static func generatePDF417Barcode(from key: String) -> UIImage? {
    let data = key.data(using: .utf8)!
    let filter = CIFilter.pdf417BarcodeGenerator()
    filter.message = data
    filter.rows = 7
    let transform = CGAffineTransform(scaleX: 3, y: 4)
    if let outputImage = filter.outputImage?.transformed(by: transform) {
        let context = CIContext()
        if let cgImage = context.createCGImage(outputImage, from: outputImage.extent) {
            return UIImage(cgImage: cgImage)
        }
    }
    return nil
}
Main function for scanning the barcode:
static func desynthesizeIDScreenShot(from image: UIImage, completion: @escaping (String?) -> Void) {
    guard let ciImage = CIImage(image: image) else {
        print("Empty image")
        return
    }
    let imageRequestHandler = VNImageRequestHandler(ciImage: ciImage, orientation: .up)
    let request = VNDetectBarcodesRequest { request, error in
        guard error == nil else {
            completion(nil)
            return
        }
        guard let observations = request.results as? [VNBarcodeObservation] else {
            completion(nil)
            return
        }
        print("Observations", observations)
        if let result = observations.first?.payloadStringValue {
            print(result)
            completion(result)
        } else {
            print(error?.localizedDescription ?? "no error") // returns nil
            completion(nil)
        }
    }
    request.revision = VNDetectBarcodesRequestRevision2
    request.symbologies = [VNBarcodeSymbology.pdf417]
    try? imageRequestHandler.perform([request])
}
Thanks!
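For reference, this is how I wire the two pieces together at the call site (just a sketch; the key string is the same test data as above, and I'm assuming both static functions are in scope):
// Sketch of the call site: generate a test barcode, then try to decode it back.
if let testImage = generatePDF417Barcode(from: "71238-12481248-128035-40239431") {
    desynthesizeIDScreenShot(from: testImage) { payload in
        print("Decoded payload:", payload ?? "nil")
    }
}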
After a user enters immersive mode in visionOS, can they adjust the immersion level and the passthrough transparency themselves? And will developers be able to adjust that transparency in code, so users can see reality and the immersive environment overlapping at the same time?
Currently, my visionOS app uses a fully custom 360-degree immersive mode, and I'm looking for a way to adjust it like Apple's built-in immersive environments.
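What I have found so far is the immersion-style selection on an ImmersiveSpace (a sketch based on my reading of the SwiftUI scene API; the scene id and the state property are placeholder names of mine, and my real app also declares a WindowGroup alongside this):
import SwiftUI

// Sketch: an ImmersiveSpace that supports several immersion styles. With
// .progressive, the user can turn the Digital Crown to blend the immersive
// content with passthrough, similar to Apple's built-in environments.
struct SkyApp: App {
    @State private var immersionStyle: ImmersionStyle = .progressive

    var body: some Scene {
        ImmersiveSpace(id: "Sky") {
            // RealityView or CompositorLayer content goes here.
        }
        .immersionStyle(selection: $immersionStyle, in: .mixed, .progressive, .full)
    }
}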
I'm unable to figure out how to know when my app no longer has focus. ScenePhase will only change when the WindowGroup gets created or closed.
UIApplication.didBecomeActiveNotification and UIApplication.didEnterBackgroundNotification are not posted either when, say, you move focus to Safari.
What's the trick?
I'm struggling to get my app through validation for visionOS. I'm not sure why I'm getting this error, as all of the app icons are filled in.
I used Metal and CompositorLayer to render an immersive-space skybox. In this space, the window I created with SwiftUI only shows the gray frosted-glass background effect; it seems to ignore the Metal-rendered skybox and samples only a black background. Why is that? Is there any way to get the normal frosted-glass background to display? Thank you very much!
My project is an iOS app written in Objective-C, and now I need to bring it to visionOS (not as an unmodified "Designed for iPhone" app).
Question one: how can I differentiate visionOS in code? Do I need macro definitions? Otherwise the code cannot be compiled.
Question two: are there any other tips, or anything else I should know?
Thanks.
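For question one, the direction I'm currently looking at (a sketch; in Swift there is an os(visionOS) compilation condition, and on the Objective-C side I believe the TARGET_OS_VISION macro from TargetConditionals.h plays the same role):
import Foundation

// Sketch of platform-conditional code in Swift. For Objective-C sources, my
// understanding is that `#if TARGET_OS_VISION` (TargetConditionals.h) is the
// equivalent check.
func platformName() -> String {
    #if os(visionOS)
    return "visionOS"
    #elseif os(iOS)
    return "iOS"
    #else
    return "other"
    #endif
}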
I want to make an iCloud backup using SwiftData in visionOS. I need to get SwiftData working first, but I get the following error even though I follow the steps below.
The steps I followed:
I created a model:
import Foundation
import SwiftData

@Model
class NoteModel {
    @Attribute(.unique) var id: UUID
    var date: Date
    var title: String
    var text: String

    init(id: UUID = UUID(), date: Date, title: String, text: String) {
        self.id = id
        self.date = date
        self.title = title
        self.text = text
    }
}
I added a model container:
WindowGroup(content: {
    NoteView()
})
.modelContainer(for: [NoteModel.self])
And I'm inserting a note to test:
import SwiftUI
import SwiftData

struct NoteView: View {
    @Environment(\.modelContext) private var context

    var body: some View {
        Button(action: { // new Note
            let note = NoteModel(date: Date(), title: "New Note", text: "")
            context.insert(note)
        }, label: {
            Image(systemName: "note.text.badge.plus")
                .font(.system(size: 24))
                .frame(width: 30, height: 30)
                .padding(12)
                .background(
                    RoundedRectangle(cornerRadius: 50)
                        .foregroundStyle(.black.opacity(0.2))
                )
        })
        .buttonStyle(.plain)
        .hoverEffectDisabled(true)
    }
}

#Preview {
    NoteView().modelContainer(for: [NoteModel.self])
}
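One thing I plan to try next (a sketch based on my current understanding, not a confirmed fix: as far as I know, a CloudKit-backed SwiftData store does not allow @Attribute(.unique) and expects properties to be optional or have defaults, so that may be related to the error):
import SwiftUI
import SwiftData

// Sketch: configure the SwiftData container explicitly for CloudKit syncing.
// "NotesApp" is a placeholder name; the .private identifier would be my own
// iCloud container id.
@main
struct NotesApp: App {
    var body: some Scene {
        WindowGroup {
            NoteView()
        }
        .modelContainer(makeContainer())
    }

    func makeContainer() -> ModelContainer {
        let configuration = ModelConfiguration(
            schema: Schema([NoteModel.self]),
            cloudKitDatabase: .automatic // or .private("iCloud.<my-container-id>")
        )
        // Note: if CloudKit is enabled, NoteModel may need @Attribute(.unique) removed.
        return try! ModelContainer(for: NoteModel.self, configurations: configuration)
    }
}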
Currently, I'm trying to test the ImageTrackingProvider with the Apple Vision Pro. I started with some basic code:
import RealityKit
import ARKit

@MainActor class ARKitViewModel: ObservableObject {
    private let session = ARKitSession()
    private let imageTracking = ImageTrackingProvider(
        referenceImages: ReferenceImage.loadReferenceImages(inGroupNamed: "AR")
    )

    func runSession() async {
        do {
            try await session.run([imageTracking])
        } catch {
            print(error)
        }
    }

    func processUpdates() async {
        for await _ in imageTracking.anchorUpdates {
            print("test")
        }
    }
}
I only have one picture in the AR reference image group. I added the physical size, and there are no error messages in the AR group.
When I run the application on the Vision Pro, I receive the following error:
ar_image_tracking_provider_t <0x28398f1e0>: Failed to load reference image <ARReferenceImage: 0x28368f120 name="IMG_1640" physicalSize=(1.350, 2.149)> with error: Failed to add reference image.
It finds the image, but there seems to be a problem with loading it. I tried both the JPEG and the PNG format.
I do not understand why it fails to load the ReferenceImage.
I'm using Xcode Version 15.3 beta 3.
Hello. I'm currently selling an app. I tried to run the iOS version on visionOS to provide the same service as the existing app, but a huge amount of the code turned out to be incompatible.
Can I create a visionOS app with the same name, add visionOS on the App Store Connect site, and upload a completely separate app project file?
The iOS app and the visionOS app will use the same iCloud container, in-app purchases, and so on. However, we plan to keep the projects separate to reduce the size of the app users download and to optimize it.
Like the default environments on Vision Pro, I want to create a surrounding space when my app runs. Like other apps, I want to customize the space the way the theater or Apple TV environments do, to give users a better experience. Does anyone know the relevant technology or developer documentation?
Hi,
I'm working on developing an app. I need my app to work in a crowd with multiple people (who have multiple hands).
From what I understand, the Vision Framework currently uses a heuristic of "largest hand" to assign as the detected hand. This won't work for my application since the largest hand won't always be the one that is of interest. In fact, the hand of interest will be the one that is pointing.
I know how to train a model using CreateML to identify a hand that is pointing, but where I'm running into issues is that there is no straightforward way to directly override the Vision framework's built-in heuristic of selecting the largest hand when you're solely relying on Swift and Create ML.
I would like my flow to be:
Request hand landmarks
Process the image
Create ML reports which hand is pointing
Use the pointing hand to collect position data for the index finger points
But within the Vision framework, if you set the number of hands to collect data for to 1, it will just choose the largest hand and report position data for that hand only. Of course, the easy workaround is to ask for more hands, but on an iOS device that is computationally intensive (my app could be handling up to 10 hands at a time).
Has anyone come up with a simpler solution to this problem, or is aware of something within visionOS that does it?
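For concreteness, this is the shape of the workaround I'm considering (a sketch; isPointing(_:) stands in for my Create ML classifier and is not an existing API):
import Vision
import CoreVideo

// Sketch: ask Vision for several hands, then pick the one my own classifier
// says is pointing. The maximumHandCount value is a compute/accuracy trade-off.
func pointingHandObservation(
    in pixelBuffer: CVPixelBuffer,
    isPointing: (VNHumanHandPoseObservation) -> Bool
) throws -> VNHumanHandPoseObservation? {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 4
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try handler.perform([request])
    return request.results?.first(where: isPointing)
}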
Currently, the Apple ID on my actual Vision Pro device is different from the one signed in to Xcode on my Mac mini. I added the Vision Pro's Apple ID to Xcode, but I can't build to my Vision Pro. Should I sign out of the existing account in Xcode and sign in with the same ID as the Vision Pro? However, the Apple ID created for the Vision Pro does not have an Apple Developer membership, so there is no certificate that allows apps to run on the actual device. How can I add my Vision Pro's ID to the account with my Apple Developer membership so I can run my Xcode project app on the Vision Pro? This is my first time doing this.