Hello everyone, let me describe my problem. I need to build, for macOS, iOS, and iPadOS, a 3D virtual assistant able to listen to questions and provide answers: a kind of Siri, but with a 3D model of a woman on screen that mimics speech (lip sync) and moves as the app requires.
I would like to build it all with Xcode, SwiftUI, and SceneKit.
I have already run some experiments with the Speech framework for speech recognition, with good results. For the spoken part (TTS) I will use an external service.
Here I run into a problem: the Speech framework keeps listening and transcribing even while my app is speaking. I would like to mute the microphone while the audio file is playing and unmute it when playback ends.
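One approach I'm considering (just a rough sketch; the class name and file URL are illustrative, and I don't know if this is the cleanest way) is to tear down the recognition tap before playing the reply and rebuild it in the AVAudioPlayer delegate callback:

```swift
import Speech
import AVFoundation

// Sketch: stop recognition while a reply is playing, resume when playback finishes.
// Assumes speech-recognition authorization and the AVAudioSession are configured elsewhere.
final class AssistantAudioCoordinator: NSObject, AVAudioPlayerDelegate {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var player: AVAudioPlayer?

    func startListening() throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        self.request = request

        // Feed microphone buffers into the recognition request.
        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        recognitionTask = recognizer?.recognitionTask(with: request) { result, _ in
            if let result = result {
                print(result.bestTranscription.formattedString)
            }
        }
    }

    func stopListening() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        recognitionTask?.cancel()
        recognitionTask = nil
        request = nil
    }

    // Stop listening first so the app does not transcribe its own voice.
    func speak(fileURL: URL) throws {
        stopListening()
        player = try AVAudioPlayer(contentsOf: fileURL)
        player?.delegate = self
        player?.play()
    }

    // AVAudioPlayerDelegate: resume listening once playback ends.
    func audioPlayerDidFinishPlaying(_ player: AVAudioPlayer, successfully flag: Bool) {
        try? startListening()
    }
}
```

Maybe someone knows whether stopping the whole engine like this is better than just removing the tap, or whether changing the AVAudioSession category is the proper way to handle it.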
I also tried creating a 3D female model with Mixamo and exported some animations. I was able to import the animations into an Xcode project and get them working (https://youtu.be/HJtbUHdPjzQ). Next I want to try creating a model using Object Capture.
I also watched WWDC 2017 session 604 (https://developer.apple.com/videos/play/wwdc2017/604/), which cleared up many of my doubts.
What I still haven't understood:
How can I blend multiple animations from code? For example, I might have a walking animation and a waving animation (the character standing still and greeting by moving her arm), and I would like to combine them so that she waves while she walks.
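From what I've read so far, I suspect SCNAnimationPlayer and its blendFactor might be what I need. Something along the lines of the sketch below (the file names are just placeholders for my Mixamo exports), but I don't know if this is the right approach:

```swift
import SceneKit

// Loads the first animation player found in a scene file (e.g. a Mixamo export).
func loadAnimationPlayer(fromSceneNamed sceneName: String) -> SCNAnimationPlayer? {
    guard let scene = SCNScene(named: sceneName) else { return nil }
    var player: SCNAnimationPlayer?
    scene.rootNode.enumerateChildNodes { node, stop in
        if let key = node.animationKeys.first,
           let found = node.animationPlayer(forKey: key) {
            player = found
            stop.pointee = true
        }
    }
    return player
}

// Attach both clips to the character node and mix them with blendFactor.
func blendWalkAndWave(on characterNode: SCNNode) {
    guard let walk = loadAnimationPlayer(fromSceneNamed: "walking.dae"),   // placeholder file names
          let wave = loadAnimationPlayer(fromSceneNamed: "waving.dae") else { return }

    walk.animation.blendInDuration = 0.3   // fade clips in instead of snapping
    wave.animation.blendInDuration = 0.3

    characterNode.addAnimationPlayer(walk, forKey: "walk")
    characterNode.addAnimationPlayer(wave, forKey: "wave")

    walk.play()
    wave.blendFactor = 0.6   // 0 = ignore the wave, 1 = full wave pose layered on the walk
    wave.play()
}
```

I'm not sure whether blendFactor alone is enough here, or whether I would need to restrict the wave animation to the arm bones only.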
If I have the character fully rigged, how can I move its mouth, eyes, etc. from code to create a kind of lip sync and facial expressions?
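Here too I'm only guessing: if the model is exported with blend-shape (morph) targets, maybe SCNMorpher can drive them from code, something like this sketch (the target names are just examples and depend on how the model is authored):

```swift
import SceneKit

// Drive a single morph target by name; weights range from 0 (neutral) to 1 (full pose).
func setMouthOpen(on faceNode: SCNNode, amount: CGFloat) {
    guard let morpher = faceNode.morpher else { return }
    SCNTransaction.begin()
    SCNTransaction.animationDuration = 0.1   // short ramp so the mouth shape changes look smooth
    morpher.setWeight(amount, forTargetNamed: "jawOpen")   // "jawOpen" is an assumed target name
    SCNTransaction.commit()
}

// Eyes or the head can also be posed by rotating rigged joints found by node name.
func blink(eyelidJoint: SCNNode) {
    let close = SCNAction.rotateBy(x: 0.3, y: 0, z: 0, duration: 0.08)
    eyelidJoint.runAction(.sequence([close, close.reversed()]))
}
```

Is that the usual way, or do people rotate the jaw bone directly for lip sync?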
Do you know of any good tutorial, even a paid one, that could fill these gaps? I've also searched Udemy but haven't found a SceneKit course covering what I need.
However, solutions based on ARKit or RealityKit would also be fine.