Would it be a good approach to listen to the user's microphone continuously while the app is in the foreground, so that when the user speaks a command (for example "Make payment" or "Add to favorites"), the app matches it using a custom catalog and ShazamKit and performs an action (for example, opening the payment screen or adding something to favorites)?
ShazamKit for actions while the app is in the foreground
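ShazamKit performs exact audio matching, so it would not be a good fit for your use case: the same command spoken by different people, or even twice by the same person, is not the same audio. I would suggest you take a look at the Speech framework, which has a detailed example of how you could achieve what you describe.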
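
As a rough illustration of the command-matching half of this, here is a minimal sketch that maps recognized text to an app action. It assumes you already have a transcript string, for example from `SFSpeechRecognitionResult.bestTranscription.formattedString` in the Speech framework. The `VoiceAction` enum, the `commandPhrases` table, and the `matchAction(in:)` function are hypothetical names made up for this example, not part of any Apple API.

```swift
import Foundation

// Hypothetical app actions; the names are illustrative only.
enum VoiceAction: Equatable {
    case openPayment
    case addToFavorites
}

// Hypothetical table of spoken phrases (lowercased) and the action each triggers.
let commandPhrases: [(phrase: String, action: VoiceAction)] = [
    (phrase: "make payment", action: .openPayment),
    (phrase: "add to favorites", action: .addToFavorites),
]

/// Returns the first action whose phrase occurs in the transcript,
/// ignoring case, or nil when no known command is present.
func matchAction(in transcript: String) -> VoiceAction? {
    let normalized = transcript.lowercased()
    return commandPhrases.first(where: { normalized.contains($0.phrase) })?.action
}

// Example: feed in text as it might arrive from the speech recognizer.
if let action = matchAction(in: "Please make payment now") {
    print("Matched action: \(action)")
}
```

Plain substring matching is the simplest thing that could work here. Because the recognizer produces text, variations in voice or pronunciation are handled before this step, which is exactly what signature-based audio matching cannot do.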