Hello,
I am dipping my toes into Core ML and I am not sure about the best way to tackle a language processing problem.
I need to tokenize text that contains multi-word items.
For example: "I like Ferrari Testarossa"
In order to match "Ferrari Testarossa" I have tried two segmentation strategies:
["I","like","Ferrari Testarossa"]
["person", "like", "car"]
and
["I","like","Ferrari", "Testarossa"]
["person", "like", "car", "car"]
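To make the relationship between the two layouts concrete, here is a minimal sketch in Python (used only for illustration, it is not part of my Core ML setup) that expands the first layout into the second by splitting each multi-word token on whitespace and repeating its label. The function name split_multiword_tokens is hypothetical, and splitting on whitespace is an assumption about how the words should be separated.

```python
def split_multiword_tokens(tokens, labels):
    """Expand multi-word tokens into single words, repeating each label.

    Hypothetical helper: assumes words inside a token are separated
    by whitespace and that tokens and labels are parallel lists.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    out_tokens, out_labels = [], []
    for token, label in zip(tokens, labels):
        # "Ferrari Testarossa" -> "Ferrari", "Testarossa", both labeled "car"
        for word in token.split():
            out_tokens.append(word)
            out_labels.append(label)
    return out_tokens, out_labels


tokens, labels = split_multiword_tokens(
    ["I", "like", "Ferrari Testarossa"],
    ["person", "like", "car"],
)
print(tokens)  # ['I', 'like', 'Ferrari', 'Testarossa']
print(labels)  # ['person', 'like', 'car', 'car']
```

Going in this direction loses nothing except the boundary between two adjacent items that share a label.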
After generating the model, I realized that NSLinguisticTagger can only split text into words OR sentences (or larger units), nothing in between.
So with either strategy, if I type
"I love Lamborghini Contact"
I get
["I","love","Lamborghini","Contact"]
["person", "like","car","car"]
The first segmentation strategy (keeping a two-word item in a single token) would suit my UX better, but I can live with the second, since joining adjacent tags is fairly trivial.
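By joining tags I mean something like the following minimal sketch, again in Python purely for illustration. The helper name join_adjacent_tags is hypothetical, and it assumes that consecutive words with the same label always belong to one item, which would wrongly merge two different cars named back to back.

```python
def join_adjacent_tags(tokens, labels):
    """Merge consecutive tokens that share a label into one item.

    Hypothetical helper: assumes a run of identical labels is a
    single multi-word item, which is not true in general.
    """
    merged_tokens, merged_labels = [], []
    for token, label in zip(tokens, labels):
        if merged_labels and merged_labels[-1] == label:
            # Same label as the previous word: extend the current item.
            merged_tokens[-1] = merged_tokens[-1] + " " + token
        else:
            # Label changed: start a new item.
            merged_tokens.append(token)
            merged_labels.append(label)
    return merged_tokens, merged_labels


tokens, labels = join_adjacent_tags(
    ["I", "love", "Lamborghini", "Contact"],
    ["person", "like", "car", "car"],
)
print(tokens)  # ['I', 'love', 'Lamborghini Contact']
print(labels)  # ['person', 'like', 'car']
```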
I was just wondering which is the best way to feed the model generator, machine-learning-wise.
Is there a difference between feeding the model "Ferrari Testarossa" and "Ferrari", "Testarossa"?