Question about speech recognition with Siri

Hi Sir,


I am doing some research on speech recognition, and I have already found some answers in Apple's Machine Learning Journal article "Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant".


But I still have one more question. As the article mentions, Siri detects the default voice command "Hey Siri" locally on the device, using a small and a large DNN.
It also does some preprocessing locally, such as DFT and MFCC.
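
To be clear about what I mean by this kind of pre-work, here is a rough Python sketch of the DFT step only. It is just my own illustration, not Apple's code: the 25 ms frame length, 10 ms hop, Hamming window, and the 1 kHz test tone are all assumptions, and the helper names are made up. MFCC would add a mel filterbank, a log, and a DCT on top of this.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def power_spectrum(frame):
    """Naive O(n^2) DFT of one windowed frame; returns power for bins 0..n/2."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

# Assumed input: 0.1 s of a 1 kHz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
FRAME_LEN = 400   # 25 ms at 16 kHz (assumption)
HOP = 160         # 10 ms at 16 kHz (assumption)

tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1600)]
frames = frame_signal(tone, FRAME_LEN, HOP)
spec = power_spectrum(frames[0])

# The strongest bin should sit at the tone frequency.
peak_bin = max(range(len(spec)), key=spec.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(len(frames), len(spec), peak_hz)
```

So the question below is really whether something like `spec` (or MFCCs derived from it) leaves the device, or the raw samples like `tone`.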


My question is: when the user says other sentences, what data format does Siri upload to the cloud? Is it the original voice data, or the spectral data after preprocessing (e.g. DFT / MFCC)?


Thanks.
BR

Blake

Replies

That 'pre-work', as you call it, is processing done on the device, for use on the device. Separately, encapsulated voice data is transmitted off the device.


See http://www.smartplanet.com/blog/smart-takes/say-command-how-voice-recognition-will-change-the-world/19895


Can you explain your question in the context of SiriKit and your app(s)? Thanks.