This week I was watching https://developer.apple.com/videos/play/wwdc2024/10105/
with the amazing "configuration" feature that changes the color or mesh straight in Quick Look. I tried a lot of workarounds, but nothing brought me success.
How do I write the usda files?
Any time I overwrite the usda, even with just a "{}" inside, Reality Composer Pro refuses to open the file again.
Where is the developer in the tutorial writing the usda?
How is the usda compressed into usdz? (None of the compressors I tried accepted the modified usda file.)
This is the code suggested in the video:
#usda 1.0
(
    defaultPrim = "iPhone"
)

def Xform "iPhone" (
    variants = {
        string Color = "Black_Titanium"
    }
    prepend variantSets = ["Color"]
)
{
    variantSet "Color" = {
        "Black_Titanium" { }
        "Blue_Titanium" { }
        "Natural_Titanium" { }
        "White_Titanium" { }
    }
}
but I don't understand how to do it with my own files.
Vision
Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.
Posts under Vision tag
I'm playing with the new Vision API for iOS 18, specifically with the new CalculateImageAestheticsScoresRequest API.
When I try to perform the image observation request I get this error:
internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
The code is pretty straightforward:
if let image = image {
    let request = CalculateImageAestheticsScoresRequest()
    Task {
        do {
            let cgImg = image.cgImage!
            let observations = try await request.perform(on: cgImg)
            let description = observations.description
            let score = observations.overallScore
            print(description)
            print(score)
        } catch {
            print(error)
        }
    }
}
I'm running it on an M2 Mac using the simulator.
Is it a bug? What's wrong?
The DINO v1/v2 models are particularly interesting to me as they produce embeddings for the detected objects rather than ordinary classification indexes.
That makes them so much more useful than the CNN based models.
I would like to prepare some of the models posted on Hugging Face to run on Apple silicon, but it seems that the default conversion with TorchScript will not work. The other default conversions I've looked at so far also don't work; conversion based on an example input doesn't capture enough of the model.
I know that some have managed to convert it, as I have a demo with a Core ML model that seems to work, but I would like to know how to do the conversion myself.
Has anyone managed to convert any of the DINOv2 models?
In my app, I set the app language to Japanese. On iPhone and iPad, the Apple sign-up button still displays correctly in Japanese:
"Appleでサインイン". However, when running on Vision Pro, the button displays in English with the text "Sign up with Apple". Is this a Vision Pro error?
The app language is set to Japanese in the Info.plist:
CFBundleDevelopmentRegion = ja_JP
Screenshots of the button on iPhone and on Vision Pro are attached.
Code for the Apple sign-up button:
(signInAppleButton as UIControl).cornerRadius = 2.0
let tapGesture = UITapGestureRecognizer(target: self, action: #selector(handleAuthorizationAppleIDButtonPress))
signInAppleButton.addGestureRecognizer(tapGesture)
signInAppleStackView.insertArrangedSubview(signInAppleButton, at: 0)
I hope you can help me understand why Vision Pro displays the English text.
The new Mac virtual display feature on visionOS 2 offers a curved/panoramic window. I was wondering if this is simply a property that can be applied to a window, or if it involves an immersive mode or SceneKit/RealityKit?
I am trying to create a demo of a spatial meeting using Personas. I have also referred to the Apple videos, but I am not getting a clear idea of how it works.
Could anyone guide me through the process step by step? Any code for learning would be appreciated.
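For reference, here is a minimal, hedged sketch of the SharePlay scaffolding that spatial-Persona meetings are generally built on; the activity type, title, and function names are placeholders of my own, and the spatial Persona presentation itself is provided by the system once visionOS participants join the group session.
import GroupActivities

// Hypothetical activity describing the shared meeting, with placeholder metadata.
struct SpatialMeetingActivity: GroupActivity {
    var metadata: GroupActivityMetadata {
        var meta = GroupActivityMetadata()
        meta.title = "Spatial Meeting Demo"
        meta.type = .generic
        return meta
    }
}

// Offer the activity to the active FaceTime call; participants are prompted to join.
func startMeeting() async {
    do {
        _ = try await SpatialMeetingActivity().activate()
    } catch {
        print("Could not activate the activity: \(error)")
    }
}

// Join incoming sessions; shared state is then synchronized over the session.
func observeSessions() async {
    for await session in SpatialMeetingActivity.sessions() {
        session.join()
    }
}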
I am using the code below to get a thumbnail from a usdz model with QuickLookThumbnailing, but I don't get the proper output.
guard let url = Bundle.main.url(forResource: resource, withExtension: withExtension) else {
    print("Unable to create url for resource.")
    return
}
let request = QLThumbnailGenerator.Request(fileAt: url, size: size, scale: 10.0, representationTypes: .all)
let generator = QLThumbnailGenerator.shared
generator.generateRepresentations(for: request) { thumbnail, type, error in
    DispatchQueue.main.async {
        if thumbnail == nil || error != nil {
            print(error)
        } else {
            let tempImage = Image(uiImage: thumbnail!.uiImage)
            print(tempImage)
            self.thumbnailImage = Image(uiImage: thumbnail!.uiImage)
            print("=============")
        }
    }
}
}
A screenshot of the selected model is attached.
The generated thumbnail does not show the guitar; I only get the generic usdz icon.
I want to get a thumbnail image from a USDZ model on visionOS, but the resulting image comes out without the materials applied. Here is my code:
import Foundation
import SceneKit
import SceneKit.ModelIO
import UIKit

class ARQLThumbnailGenerator {
    private let device = MTLCreateSystemDefaultDevice()!

    /// Create a thumbnail image of the asset with the specified URL at the specified
    /// animation time. Supports loading of .scn, .usd, .usdz, .obj, and .abc files,
    /// and other formats supported by ModelIO.
    /// - Parameters:
    ///   - url: The file URL of the asset.
    ///   - size: The size (in points) at which to render the asset.
    ///   - time: The animation time to which the asset should be advanced before snapshotting.
    func thumbnail(for url: URL, size: CGSize, time: TimeInterval = 0) -> UIImage? {
        let renderer = SCNRenderer(device: device, options: [:])
        renderer.autoenablesDefaultLighting = true

        if url.pathExtension == "scn" {
            let scene = try? SCNScene(url: url, options: nil)
            renderer.scene = scene
        } else {
            let asset = MDLAsset(url: url)
            let scene = SCNScene(mdlAsset: asset)
            renderer.scene = scene
        }

        let image = renderer.snapshot(atTime: time, with: size, antialiasingMode: .multisampling4X)
        self.saveImageFileInDocumentDirectory(imageData: image.pngData()!)
        return image
    }

    func saveImageFileInDocumentDirectory(imageData: Data) {
        let uniqueID = UUID().uuidString
        let tempPath = NSSearchPathForDirectoriesInDomains(FileManager.SearchPathDirectory.documentDirectory, FileManager.SearchPathDomainMask.userDomainMask, true)
        let tempDocumentsDirectory: AnyObject = tempPath[0] as AnyObject
        let uniqueVideoID = uniqueID + "image.png"
        let tempDataPath = tempDocumentsDirectory.appendingPathComponent(uniqueVideoID) as String
        try? imageData.write(to: URL(fileURLWithPath: tempDataPath), options: [])
    }
}
I'm wondering if it's possible to implement object tracking on Vision Pro using Apple's Vision framework. I see that the Vision documentation offers a variety of classes for computer vision that carry the "visionOS" tag, but all the example code in the documentation is only for iOS, iPadOS, or macOS. So can those classes also be used for developing Vision Pro apps? If so, how do they get a data feed from the camera of Vision Pro?
I'm taking my iOS/iPadOS app and converting it so it runs on visionOS. I'm trying to build my app for both visionOS and iOS. When I try to build for an iPhone or iPad simulator, I get the following error:
 Building for 'iphonesimulator', but realitytool only supports [xros, xrsimulator]
I'm thinking I might need an #if conditional-compilation statement for visionOS so the iOS build doesn't try to compile those lines of code, but for this particular error I can't figure out which file or code needs the conditional compilation. Does anyone know how to get rid of this error?
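As a minimal sketch of the #if conditional compilation mentioned above (the view and its strings are hypothetical placeholders; whether it addresses the realitytool error depends on where the Reality Composer Pro content is referenced):
import SwiftUI

struct PlatformSpecificView: View {
    var body: some View {
        #if os(visionOS)
        // Compiled only into the visionOS build, so visionOS-only API can live here.
        Text("Running on Apple Vision Pro")
        #else
        // Compiled into the iPhone/iPad build instead.
        Text("Running on iOS or iPadOS")
        #endif
    }
}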
I am trying to store usdz files with SwiftData for now.
I am converting the usdz to Data, then storing it with SwiftData.
My model:
import Foundation
import SwiftData
import SwiftUI

@Model
class Item {
    var name: String
    @Attribute(.externalStorage)
    var usdz: Data? = nil
    var id: String

    init(name: String, usdz: Data? = nil) {
        self.id = UUID().uuidString
        self.name = name
        self.usdz = usdz
    }
}
My function to convert the usdz to Data. I am currently using a local usdz just to test whether it is going to work.
func usdzData() -> Data? {
    do {
        guard let usdzURL = Bundle.main.url(forResource: "tv_retro", withExtension: "usdz") else {
            fatalError("Unable to find USDZ file in the bundle.")
        }
        let usdzData = try Data(contentsOf: usdzURL)
        return usdzData
    } catch {
        print("Error loading USDZ file: \(error)")
    }
    return nil
}
Loading the items
@Query private var items: [Item]
...
var body: some View {
    ...
    ForEach(items) { item in
        HStack {
            Model3D(?????) { model in
                model
                    .resizable()
                    .scaledToFit()
            } placeholder: {
                ProgressView()
            }
        }
    }
    ...
}
How can I load the Model3D?
I have tried:
Model3D(data: item.usdz)
Gives me the errors:
Cannot convert value of type '[Item]' to expected argument type 'Binding<C>'
Generic parameter 'C' could not be inferred
Both errors appear on the ForEach.
I am able to print the content inside item:
ForEach(items) { item in
    HStack {
        Text("\(item.name)")
        Text("\(item.usdz)")
    }
}
The above works fine for me.
The item.usdz prints something like Optional(10954341 bytes).
I would like to know two things:
Is this the correct way to save usdz files into SwiftData, or should I use FileManager? If so, how should I do that?
Also, how can I get the usdz out of storage (SwiftData) into my code and use it with Model3D?
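For what it's worth, here is a minimal sketch of one possible approach, assuming Model3D's URL-based content/placeholder initializer: write the stored Data to a temporary .usdz file and hand that URL to Model3D. The helper function and its caching behavior are my own additions, not an established pattern.
import SwiftUI
import RealityKit

// Hypothetical helper: materialize the SwiftData blob as a temporary .usdz file
// so it can be loaded through a URL-based API.
func temporaryUSDZURL(for data: Data, named name: String) throws -> URL {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(name)
        .appendingPathExtension("usdz")
    if !FileManager.default.fileExists(atPath: url.path) {
        try data.write(to: url)
    }
    return url
}

struct ItemRow: View {
    let item: Item  // the @Model class shown above

    var body: some View {
        if let data = item.usdz,
           let url = try? temporaryUSDZURL(for: data, named: item.id) {
            Model3D(url: url) { model in
                model
                    .resizable()
                    .scaledToFit()
            } placeholder: {
                ProgressView()
            }
        } else {
            Text(item.name)
        }
    }
}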
Hello,
I am trying to detect the orientation of text in images. (Each image has a label with a number, but sometimes the label is not in the right orientation; I would like to detect these cases and add a prefix to the image files.)
This code works well, but when the text is upside down it still considers the text correctly oriented.
Is there a way to distinguish the difference?
Thanks for your help!
import SwiftUI
import Vision

struct ContentView: View {
    @State private var totalImages = 0
    @State private var processedImages = 0
    @State private var rotatedImages = 0
    @State private var remainingImages = 0

    var body: some View {
        VStack {
            Button(action: chooseDirectory) {
                Text("Choisir le répertoire des images")
                    .padding()
            }
            Text("TOTAL: \(totalImages)")
            Text("TRAITEES: \(processedImages)")
            Text("ROTATION: \(rotatedImages)")
            Text("RESTANT: \(remainingImages)")
        }
        .padding()
    }

    func chooseDirectory() {
        let openPanel = NSOpenPanel()
        openPanel.canChooseDirectories = true
        openPanel.canChooseFiles = false
        openPanel.allowsMultipleSelection = false
        openPanel.begin { response in
            if response == .OK, let url = openPanel.url {
                processImages(in: url)
            }
        }
    }

    func processImages(in directory: URL) {
        DispatchQueue.global(qos: .userInitiated).async {
            do {
                let fileManager = FileManager.default
                let urls = try fileManager.contentsOfDirectory(at: directory, includingPropertiesForKeys: nil)
                let imageUrls = urls.filter { $0.pathExtension.lowercased() == "jpg" || $0.pathExtension.lowercased() == "png" }
                DispatchQueue.main.async {
                    self.totalImages = imageUrls.count
                    self.processedImages = 0
                    self.rotatedImages = 0
                    self.remainingImages = self.totalImages
                }
                for url in imageUrls {
                    self.processImage(at: url)
                }
            } catch {
                print("Error reading contents of directory: \(error.localizedDescription)")
            }
        }
    }

    func processImage(at url: URL) {
        guard let image = NSImage(contentsOf: url), let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
            return
        }
        let request = VNRecognizeTextRequest { (request, error) in
            if let error = error {
                print("Error recognizing text: \(error.localizedDescription)")
                return
            }
            if let results = request.results as? [VNRecognizedTextObservation], !results.isEmpty {
                let orientationCorrect = self.isTextOrientationCorrect(results)
                if !orientationCorrect {
                    self.renameFile(at: url)
                    DispatchQueue.main.async {
                        self.rotatedImages += 1
                    }
                }
            }
            DispatchQueue.main.async {
                self.processedImages += 1
                self.remainingImages = self.totalImages - self.processedImages
            }
        }
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Error performing text recognition request: \(error.localizedDescription)")
        }
    }

    func isTextOrientationCorrect(_ observations: [VNRecognizedTextObservation]) -> Bool {
        // Placeholder for the logic to check text orientation
        // This should be implemented based on your specific needs
        for observation in observations {
            if let recognizedText = observation.topCandidates(1).first {
                let boundingBox = observation.boundingBox
                let angle = atan2(boundingBox.height, boundingBox.width)
                if abs(angle) > .pi / 4 {
                    return false
                }
            }
        }
        return true
    }

    func renameFile(at url: URL) {
        let fileManager = FileManager.default
        let directory = url.deletingLastPathComponent()
        let newName = "ROTATION_" + url.lastPathComponent
        let newURL = directory.appendingPathComponent(newName)
        do {
            try fileManager.moveItem(at: url, to: newURL)
        } catch {
            print("Error renaming file: \(error.localizedDescription)")
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}
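One possible heuristic for the upside-down case, offered as a sketch of my own rather than a fix to the code above: run the recognizer twice, once with the image as-is and once telling Vision the image is rotated 180°, then compare the total recognition confidence. Upside-down text usually recognizes poorly, so the rotated pass should score higher when the label is flipped.
import Vision

// Compare recognition confidence for the original and 180°-rotated interpretations.
// Returns true when the rotated pass scores higher, i.e. the text is likely upside down.
func isLikelyUpsideDown(_ cgImage: CGImage) -> Bool {
    func totalConfidence(_ orientation: CGImagePropertyOrientation) -> Float {
        let request = VNRecognizeTextRequest()
        let handler = VNImageRequestHandler(cgImage: cgImage, orientation: orientation, options: [:])
        try? handler.perform([request])
        return (request.results ?? [])
            .compactMap { $0.topCandidates(1).first?.confidence }
            .reduce(0, +)
    }
    // .up leaves the image untouched; .down asks Vision to treat it as rotated 180°.
    return totalConfidence(.down) > totalConfidence(.up)
}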
I faced a problem during development: I could not scan a Code 39 barcode with an iPad using Vision. A sample label I used for testing has multiple Code 39 barcodes on it, and I could scan almost all of them except for one specific barcode.
When I use a conventional barcode scanner or free apps to scan it, I can read that barcode with no problem. I fail to scan it only when I use Vision.
Has anyone faced a similar situation?
Do you know why a specific barcode could not be scanned with an iPad using Vision?
Hi,
I face a problem where I cannot scan a specific Code 39 barcode with the Vision framework. We have multiple barcodes on a label, and almost all of the Code 39 barcodes can be scanned, but not this specific one.
One more piece of information: the barcode that is not recognized by Vision can be read by a general barcode scanner.
Has anyone faced a similar situation?
Is there a particular condition (size, print intensity, etc.) that makes a barcode hard to scan with Vision?
Regards,
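For context, here is a minimal sketch of the kind of Vision barcode request being described, restricted to Code 39; the image source and the way results are consumed are assumptions on my part.
import Vision

// Detect Code 39 barcodes in a CGImage and print the decoded payloads.
func scanCode39(in cgImage: CGImage) {
    let request = VNDetectBarcodesRequest()
    // Restrict the search to Code 39 so other symbologies on the label are ignored.
    request.symbologies = [.code39]

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
        for observation in request.results ?? [] {
            print(observation.symbology, observation.payloadStringValue ?? "<no payload>")
        }
    } catch {
        print("Barcode request failed: \(error)")
    }
}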
I know that I can do face detection with Core ML, but I'm wondering whether there is any way to identify the same person across two images, like the Photos app does.
import RealityKit
import Combine
import SwiftUI

extension Entity {
    func addPanoramicImage(for media: WRMedia) {
        let subscription = TextureResource.loadAsync(named: "image_20240425_201630").sink(
            receiveCompletion: {
                switch $0 {
                case .finished: break
                case .failure(let error): assertionFailure("\(error)")
                }
            },
            receiveValue: { [weak self] texture in
                guard let self = self else { return }
                var material = UnlitMaterial()
                material.color = .init(texture: .init(texture))
                self.components.set(ModelComponent(
                    mesh: .generateSphere(radius: 1E3),
                    materials: [material]
                ))
                self.scale *= .init(x: -1, y: 1, z: 1)
                self.transform.translation += SIMD3(0.0, -1, 0.0)
            }
        )
        components.set(Entity.WRSubscribeComponent(subscription: subscription))
    }

    func updateRotation(for media: WRMedia) {
        let angle = Angle.degrees(0.0)
        let rotation = simd_quatf(angle: Float(angle.radians), axis: SIMD3<Float>(0, 0.0, 0))
        self.transform.rotation = rotation
    }

    struct WRSubscribeComponent: Component {
        var subscription: AnyCancellable
    }
}
It fails at case .failure(let error): assertionFailure("\(error)") with:
Thread 1: Fatal error: Error Domain=MTKTextureLoaderErrorDomain Code=0 "Image decoding failed" UserInfo={NSLocalizedDescription=Image decoding failed, MTKTextureLoaderErrorKey=Image decoding failed}
Does anyone have a ready-made script/shortcut like the one shown in the video?
Hello! I am developing an app that leverages Apple's 2D pose estimation model, and I would love to speak with someone about whether my mobile app should also leverage Apple's 3D pose estimation model.
I would also love to know whether Apple is considering adding more points on the body, as this would be incredibly helpful, or whether it is possible for me to train the model to add more body points.
Thanks so much, and please let me know if anyone is available to discuss.
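For reference, a minimal sketch of the 2D body-pose request under discussion; the image source is an assumption, and the 3D variant (VNDetectHumanBodyPose3DRequest) follows a similar pattern.
import Vision

// Detect 2D body-pose landmarks in a CGImage and print the joints Vision returns.
// The joint set is fixed by the framework; adding extra points would require a custom model.
func detectBodyPose(in cgImage: CGImage) {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
        guard let observation = request.results?.first else {
            print("No body detected")
            return
        }
        let points = try observation.recognizedPoints(.all)
        for (joint, point) in points where point.confidence > 0.3 {
            print(joint, point.location)  // normalized image coordinates
        }
    } catch {
        print("Pose request failed: \(error)")
    }
}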
For example: we use DockKit for birdwatching, so we have an unknown field distance and direction.
Distance = ?
Direction = ?
For example, the rock from which the observation is made. The task is to recognize the number of birds caught in the frame, add a detection frame, and collect statistics.
Questions:
What is the maximum number of frames that can be processed with custom object recognition?
If that is not enough, can I do the calculations myself and hand the results to DockKit for fast movement?
I want to lay out a collection in a curved view with fixed paging on Vision Pro. How can I do that?