Hi,
If I understand your question correctly, you are trying to animate your avatar's mouth based on captured audio signals. I recommend you have a look at ARKit face tracking instead.
This lets you track facial expressions in real time, which you can then apply to your 3D model.
When ARKit detects your face, it creates an ARFaceAnchor, which has a dictionary of blendShapes. The different blend shape coefficients correspond to facial features. For example, you could use the value of jawOpen to determine how far the mouth is open, and use this to animate the model.
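Here is a minimal sketch of what that could look like with an ARSCNView: every time ARKit updates the face anchor, it reads the jawOpen coefficient and drives a morph target on the avatar. The node name avatarHeadNode and the morph target name "mouthOpen" are placeholders for whatever your own model exposes.

import ARKit
import SceneKit

class FaceTrackingViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    // Hypothetical node holding your avatar's head geometry; it is assumed to
    // have an SCNMorpher with a target named "mouthOpen".
    var avatarHeadNode: SCNNode?

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // Face tracking requires a TrueDepth camera.
        guard ARFaceTrackingConfiguration.isSupported else { return }
        sceneView.delegate = self
        sceneView.session.run(ARFaceTrackingConfiguration())
    }

    // Called whenever ARKit updates the anchor for the tracked face.
    func renderer(_ renderer: SCNSceneRenderer,
                  didUpdate node: SCNNode,
                  for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor else { return }

        // blendShapes maps blend shape locations to coefficients in 0.0...1.0.
        let jawOpen = faceAnchor.blendShapes[.jawOpen]?.floatValue ?? 0.0

        // Drive the avatar's mouth morph target with the jawOpen value.
        avatarHeadNode?.morpher?.setWeight(CGFloat(jawOpen),
                                           forTargetNamed: "mouthOpen")
    }
}

If your model animates the mouth with a jaw bone instead of a morph target, you would rotate that joint by an amount proportional to jawOpen rather than setting a morpher weight.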
For a very basic example of how to animate a simple model based on blend shapes, check out this developer sample.