From what others have said (and from my own experience), 16GB of RAM on an Intel Mac is the absolute minimum required to even open both Xcode and the Vision Pro simulator. Running even the Hello World example from WWDC23 on Intel takes up 12GB in total when Xcode is open to compile it. I haven't heard of anyone getting a compiled Vision Pro app to work with the simulator via "drag and drop", so you need to run it with Xcode open, and that's 12GB for Xcode + Simulator, as I said.
People are saying that even a 16GB M* system works fine, at least with the sample apps, so I think that's your minimum system for serious development, though you might be able to learn visionOS programming on an Intel Mac if you have enough RAM, or at least a system that isn't prone to overheating like my 2018 Intel Mini.
It's already available on the web, so something far better than this should simply be built into the system:
https://coolmaterial.com/tech/fingerspelling-machine-learning-to-teach-abcs-american-sign-language/
[note that website will not work with Safari]
https://developer.apple.com/sample-code/wwdc/2023/
My impression is that they can't even do it internally.
Listen to the two VPs from Apple (the guy on the right is the one who started the vOS/AVP projects).
It doesn't sound like they had an easy time producing the keynote video. If the AVP could do what we were talking about, even if only through a private API, it would have been trivial: use streams from two AVPs, one showing the woman sitting and the other showing what she is looking at. Add a little bridge animation about putting the AVP on, and you're done. Instead, they make it sound like it was a difficult task:
https://youtu.be/DgLrBSQ6x7E?t=4163
Greg Joswiak: Right, look, because one of the challenges we had in making the [keynote] video is the fact that we have to take this incredible spatial experience and try to translate it onto a 2D screen. But all the UI you see, all the stuff that we were showing coming out of the device was rendered on device. And it's out there, even in the third person view, that's composited onto a scene. So this isn't like us having graphic artists with an M2 Ultra coming up with all this stuff this is all coming off of...
Mike Rockwell: It's all rendered real time.
Greg Joswiak: Yeah. Realtime and then that's how we showed it in the film and that's important. That was not fake.
.
Of course, maybe this is meant to be a hint of a future feature that they'll unlock for the release or visionOS x.x, but this is THE killer app that enables all other killer apps, IMHO, and should have been the first thing they showed.
At one point I thought you could get permission for eye-tracking, but I must have misheard. That's the ultimate no, I think. Eye-tracking opens up all sorts of ultra-sophisticated psychological analysis for manipulation and marketing.
There has been a complete lack of help from Apple in getting people started in a meaningful way, and I can’t imagine that I’m in any way alone in feeling this.
Look at my rants about the inability to stream 2D video out. There's been no official response from Apple, but numerous people have pointed to the existing WWDC info and videos that suggest that Apple doesn't WANT people to be able to share the process of creating their work but only the outcome.
If a programmer could record their programming process via Apple Vision Pro, or even live stream it, that would go a long way towards making things more accessible to other programmers, but the only blessed way to let people look over your shoulder is via the extremely limited SharePlay process. You can't demo how to program and record it, or even stream it, for the entire world to see, but only for the limited audience of people who already own an AVP.
Note the odd wording of the video you were told to watch: https://developer.apple.com/videos/play/wwdc2023/10094/?time=588
"If no spatial persona is found on this device, then no camera frames will return to apps."
Does this mean that it is possible to obtain video ONLY if a shared session is active and has at least one other "spatial persona" present?
That you can't stream yourself, by yourself, performing and narrating a task, such as performing a complex calculation or playing a game or giving a lecture or... reporting a bug?
You combine this with the odd wording in this interview:
https://youtu.be/DgLrBSQ6x7E?t=4163
Greg Joswiak: Right, look, because one of the challenges we had in making the [keynote] video is the fact that we have to take this incredible spatial experience and try to translate it onto a 2D screen. But all the UI you see, all the stuff that we were showing coming out of the device was rendered on device. And it's out there, even in the third person view, that's composited onto a scene. So this isn't like us having graphic artists with an M2 Ultra coming up with all this stuff this is all coming off of...
Mike Rockwell: It's all rendered real time.
Greg Joswiak: Yeah. Realtime and then that's how we showed it in the film and that's important. That was not fake.
.
In other words, it took a lot of work to get that out of the AVP. Had it been built in as I'm asking for, they would have said: "oh yeah, 2D composite video out is an important use case, so we used the built-in video-out feature and in fact, the scene of the person wearing the AVP was shot by another person wearing the AVP."
Instead, you have this official description of when cameras can be used by developers: "If no spatial persona is found on this device, then no camera frames will return to apps". Combine that with a detailed discussion of how important it was to get real video (the implication being how difficult it was to do) and you'll realize that they left off the single most important use case for early adopters, both developers AND power users:
The power users get to put up a YouTube video of themselves "in world" doing random stuff to show off their new toy, while the developers can file a 2D video of their own app "in world" doing something unexpected and say "this is what we expected and yet THIS is what we actually saw on the device."
.
So there appears to be a severe limitation to what you can show other people who are NOT wearing an AVP.
IF it is at all possible, this has to change.
Filming yourself doing something and presenting it to others, either by streaming (to a video projector) or as a YouTube video, is an extremely important use case, and my impression is that this is NOT supported at all.
Now, I understand that Netflix doesn't want their movies restreamed, but that is trivial to handle:
Have a permission for all apps, "allows 2D streaming", that is automatically set to false, so only apps whose creators want the content streamable can be streamed.
If any app in a space has "allows 2D streaming" set to false, 2D streaming of the shared space cannot happen, and the user will need to quit that app.
Only if ALL running apps (including the case of no apps at all) have set "allows 2D streaming" to true can the user stream their shared space. If they try while an app that disallows this is running, they must first quit that app and try again. The HI details would need to be worked out, of course.
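To make the rule concrete, here is a minimal sketch of the check I have in mind. It is written in Python purely for illustration; `RunningApp`, `allows_2d_streaming`, `blocking_apps` and `can_stream_shared_space` are names I made up, not anything in visionOS or any Apple API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningApp:
    """Hypothetical stand-in for an app running in the shared space."""
    name: str
    # Proposed default: streaming is opt-in, so this starts out False.
    allows_2d_streaming: bool = False


def blocking_apps(running):
    """Return the names of running apps that have not opted in to 2D streaming."""
    return [app.name for app in running if not app.allows_2d_streaming]


def can_stream_shared_space(running):
    """Streaming is allowed only if no running app blocks it.

    An empty shared space (no apps at all) is streamable.
    """
    return not blocking_apps(running)
```

If `can_stream_shared_space` returns false, the system could show the user the list from `blocking_apps` so they know which apps to quit before trying again.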
I don't know the technical details, but that file is simply used by Xcode when it first starts up.
Just make sure that both are in the same folder once you unxip Xcode, and it will automatically use the file to complete its download/install sequence.
I just put both the .xip and the *manifest in a folder inside the Applications folder to create less clutter, then unxipped the .xip and ran the resulting Xcode.app, which assembled all its parts inside the app as they downloaded, guided (I assume) by the *manifest file.
And what was needed was to wait two forevers. Apparently a xip file is compressed using algorithms that work faster on an M* machine than on an Intel one. Apparently 20 minutes for those last pixels was not long enough; 30-40 minutes was.
I thought that the SDK would be out at the end of June, but here it is 10 days early and given that I can't install it, I wish they had tested it for another 10 days...
I misunderstood what was said here: https://youtu.be/DgLrBSQ6x7E?t=4162
The external video was apparently normal film. Even so, if a 2D video representation of what the wearer sees can be streamed to someone else for projection, or used on YouTube or in TV news for that matter, that opens up an entire ecosystem of use cases.
https://youtu.be/TX9qSaGXFyg?t=128 seems to show a 3D movie taken outdoors.
As I recall, one of the developer videos shows that apps (but not browsers specifically) can request that the user give permission for eye-tracking.
This is to prevent webpages and unscrupulous developers from scraping info about your eye movements, which might give AI a lot of info about you personally.
But if a game or drawing app (?) requires eye-tracking, the app can request that capability from the user.