I want to develop an AI assistant ios application using whisper and chatGPT OpenAI apis. I am implementing these following steps.
Audio-engine to record the user's voice
Send audio chunk to Whisper for Speech to Text
Send that text to chatgpt openAI to get response
Now sending that response to Speech Synthesizer to speak response through built-in speaker
In this process, i don't want to disable microphone. Because user can interrupt the speech synthesizer anytime he likes. It should be realtime and look like continuous call between the user and AI assistant.
Problem: When user speaks, microphone takes the input and appends into the audioengine recording file. Then sends that chunk to whisper for transcribing, transcribed text is then sent to chatgpt api to get response and response is sent to speech synthesiser which generates an output on speaker. Issue is that the microphone again takes synthesiser voice from speaker, and create a loop.
What should i possibly do to stop my microphone to not take the input from iphone speaker. Talking tom, callAnnie applications and many other ios applications are continuously using microphone and generating outputs from speaker without overlapping and loop. Suggest the possible ways.
I tried to set all possible ways for setting audio-engine category and settings with record, playback, playandrecord etc. Nothing gives me the solution to avoid speaker voice into my microphone. Technically as I think of microphone should never take the device generated voices. What could be the possible solution. If my approach is wrong also i am open to plenty suggestions and guidance.