I’m using NLTagger as an easy way to reduce words to their lemmas, in order (as they say) to improve our users’ search experience. The results are sometimes odd.
First, suppose the user searches for “strike”, hoping to find “the Bread and Roses strike” and also “Casey has struck out.” We first need to lemmatize the user’s search term, but NLTagger won’t lemmatize the isolated word “strike”. (Appending a space and “this” works around the problem, but that’s clumsy.)
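For reference, here is a minimal sketch of the clumsy workaround. The helper name `lemma(ofSearchTerm:padding:)` and the lowercase fallback are made up for illustration; only `NLTagger`, the `.lemma` scheme, and `tag(at:unit:scheme:)` come from the framework.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper (not part of NaturalLanguage): lemmatizes a single
/// search term by tagging it inside a short padded phrase and reading the
/// lemma reported for the first word only.
func lemma(ofSearchTerm term: String, padding: String = " this") -> String? {
    // Nothing to tag for an empty term.
    guard !term.isEmpty else { return nil }

    let padded = term + padding
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = padded

    // Ask only about the word that starts at the beginning of the padded string.
    let (tag, _) = tagger.tag(at: padded.startIndex, unit: .word, scheme: .lemma)
    return tag?.rawValue
}

let term = "strike"
// Assumption: if no lemma comes back, fall back to the lowercased term.
let searchKey = lemma(ofSearchTerm: term) ?? term.lowercased()
print(searchKey)
```

The padding word is arbitrary, which is a large part of why this feels like a hack: a different filler could plausibly steer the tagger toward a different reading of the term.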
Second, let’s search for “in” in the string “IN THE WEEDS”. The lemmatizer thinks the first word in the string is “Indiana”! OK: all caps is arguably unusual. Let’s try “In The Weeds”. Now, the lemmatizer declines to tag the first word at all.
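A minimal sketch of how I’m checking these cases, in case the setup itself is the problem. The function name `lemmas(in:)` is my own, and the options are just what seemed reasonable, so treat them as assumptions rather than the recommended configuration.

```swift
import Foundation
import NaturalLanguage

/// Returns every word in `text` paired with the lemma NLTagger reports for it,
/// or nil where the tagger reports no lemma. (Illustrative helper, not a framework API.)
func lemmas(in text: String) -> [(word: String, lemma: String?)] {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text

    var results: [(word: String, lemma: String?)] = []
    // Assumption: whitespace and punctuation are not interesting for search.
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation]

    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lemma,
                         options: options) { tag, range in
        results.append((word: String(text[range]), lemma: tag?.rawValue))
        return true // keep enumerating
    }
    return results
}

// The strings from the cases described above.
for sample in ["strike", "strike this", "IN THE WEEDS", "In The Weeds"] {
    print(sample, "->", lemmas(in: sample))
}
```

Printing the word next to its optional lemma makes it easy to tell a missing tag (nil) apart from an unexpected one.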
Both these examples are organic: they arose while adapting unit tests for our current, regex-based search. I expect that I'm Doing It Wrong™, but documentation is thin on the ground. (macOS 10.14.6 Beta (18G29g))