SFCustomLanguageModelData.CustomPronunciation and X-SAMPA string conversion

Can anyone please guide me on how to use SFCustomLanguageModelData.CustomPronunciation?

I am following the example below from WWDC23: https://wwdcnotes.com/documentation/wwdcnotes/wwdc23-10101-customize-ondevice-speech-recognition/

When using this kind of custom pronunciation, we need the X-SAMPA string for the specific word. There are tools available on the web to do this:

Word to IPA: https://openl.io/

IPA to X-SAMPA: https://tools.lgm.cl/xsampa.html

But these tools do not seem to produce the same kind of X-SAMPA strings used in the demo. For example, "Winawer" is converted to "w I n aU @r" in the demo, while the online tools give "/wI"nA:w@r/".

Hello @Nishchal, thank you for your post. The Customize on-device speech recognition WWDC video mentions pronunciations are accepted in the form of X-SAMPA strings. If you initialize CustomPronunciation with "/wI"nA:w@r/", do you get the expected result?

The video also states each locale supports a unique subset of pronunciation symbols. To use a different pronunciation, you should check that its constituent components are present in supportedPhonemes(locale:).
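To illustrate the check the engineer describes, here is a minimal pure-Swift sketch that validates a space-separated X-SAMPA string token by token. The hard-coded `supported` set below is a small stand-in so the snippet is self-contained; in a real app it would be populated from `supportedPhonemes(locale:)`.

```swift
// Stand-in for the result of supportedPhonemes(locale:) — a small,
// assumed subset of en-US symbols so this sketch runs on its own.
let supported: Set<String> = ["w", "I", "n", "aU", "@r", "A", "\"A", "p", "i", "k", "@"]

/// Returns the tokens of `pronunciation` that are NOT in `supported`.
/// An empty result means every symbol is usable for this locale.
func unsupportedTokens(in pronunciation: String, supported: Set<String>) -> [String] {
    pronunciation.split(separator: " ").map(String.init).filter { !supported.contains($0) }
}

print(unsupportedTokens(in: "w I n aU @r", supported: supported))   // []
print(unsupportedTokens(in: "/wI\"nA:w@r/", supported: supported))  // one unknown token: the whole string
```

Note that the second call flags the entire online-tool output as a single unknown token, because it contains no spaces: this is consistent with the demo's strings being space-delimited symbols rather than a slash-wrapped transcription.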

Hello @Engineer, thank you for your response.

To clarify, the input "/wI"nA:w@r/" does not work in my case. Regarding your point about each locale supporting a unique subset of pronunciation symbols, I will explore whether adjusting my X-SAMPA strings improves the recognition of my words.

For your information, my word library primarily consists of U.S. English words. However, it also includes a few Spanish words, such as "pan de queso", a small subset (4–6 words) from the Spanish locale.

Given this fact, do you think it is possible to achieve fluent recognition using the Custom Pronunciation feature, assuming I can produce accurate X-SAMPA strings for each word?

Looking forward to your insights.

I experimented with these symbols to better understand their functionality, and the supported symbols for Locale(identifier: "en-US") are as follows:

[".", "Z", "j", "w", "@", "V", "S", "p", "n", "k", "l", "I", "h", "g", "f", "e", "E", "i", "d", "b", "o", "O", "A", "\"A", "{", "z", "\"@r", "aU", "\"aI", "%@r", "aI", "n=", "\"E", "\"u", "l=", "%U", "OI", "%o", "dZ", "\"o", "%i", "\"I", "%e", "\"e", "%I", "E@", "%E", "%u", "s", "%E@", "u", "\"OI", "\"E@", "%aU", "\"i", "@r", "\"U", "\"O", "T", "%O", "r", "%A", "m=", "%Q", "m", "\"Q", "%{", "Q", "\"V", "U", "tS", "\"{", "%aI", "v", "ng", "D", "\"aU", "t", "%OI"]

Based on my understanding, it seems we need to use symbols from the provided set. I attempted the following implementation:

SFCustomLanguageModelData.CustomPronunciation(
    grapheme: "picon",
    phonemes: ["p i \"A n", "p I k @ n"]
)

However, the app still outputs pecan instead of the intended pronunciation.
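For context, here is how the attempt above would slot into the full flow shown in the WWDC23 session: build the data, add the pronunciation, then export the file for use with the recognizer. This is a sketch, not a confirmed fix; the identifier and the seeded phrase are hypothetical, and seeding a PhraseCount containing the grapheme is only an assumption about how to nudge the recognizer away from the common word "pecan".

```swift
import Speech

// Sketch of the WWDC23 flow: identifier, version, and the phrase below
// are made-up values for illustration.
let data = SFCustomLanguageModelData(
    locale: Locale(identifier: "en-US"),
    identifier: "com.example.picon",  // hypothetical identifier
    version: "1.0"
) {
    SFCustomLanguageModelData.CustomPronunciation(
        grapheme: "picon",
        phonemes: ["p i \"A n", "p I k @ n"]  // symbols drawn from the en-US set above
    )
    // Assumption: seeding a phrase that contains the grapheme may help
    // the recognizer prefer "picon" over "pecan" in context.
    SFCustomLanguageModelData.PhraseCount(phrase: "a slice of picon", count: 10)
}

// Export to a file that can then be handed to the recognizer's
// custom-language-model preparation step.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("picon.bin")
try await data.export(to: url)
```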

Is there a definitive method or tool available to reliably convert words into X-SAMPA phonemes for a specific locale? Without such a resource, the process feels like trial and error, which might not yield consistent results.

Hi @Engineer, it would be super helpful to know how the X-SAMPA strings are actually handled by the framework, e.g. is a space needed between every single phoneme, or can phonemes be grouped into syllables?
