Posts

Post not yet marked as solved
1 Reply
644 Views
There's essentially no activity here, yet surely there are dozens if not hundreds of researchers working with these APIs. Is there a better forum to pursue technical questions?
Posted by eastgate.
Post not yet marked as solved
2 Replies
733 Views
I have an NLTextClassifier, trained in English. When asked to classify texts in Chinese, it appears to be significantly slower than when classifying texts in English. Is this in fact the case? I expected classification to be O(n) in the length of the text being classified, and runtime to be independent of language.
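One way to put numbers on this is to time the classifier per character over matched English and Chinese samples. The following is only a rough sketch: `secondsPerCharacter` is a hypothetical helper, and the classifier is passed in as a closure so that nothing here assumes how the model was built or loaded.

```swift
import Foundation

/// Hypothetical helper: average wall-clock seconds per character spent in `classify`.
/// `classify` stands in for whatever classifier call is being measured.
func secondsPerCharacter(for texts: [String],
                         repetitions: Int = 10,
                         classify: (String) -> String?) -> Double {
    // String.count counts user-visible characters (grapheme clusters), not bytes.
    let totalCharacters = texts.reduce(0) { $0 + $1.count } * repetitions
    guard totalCharacters > 0 else { return 0 }

    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<repetitions {
        for text in texts {
            _ = classify(text)  // the label itself is ignored; only the time matters
        }
    }
    let elapsedNanoseconds = DispatchTime.now().uptimeNanoseconds - start
    return Double(elapsedNanoseconds) / 1e9 / Double(totalCharacters)
}
```

With a loaded `NLModel`, the closure could be `{ model.predictedLabel(for: $0) }`. Note that a Chinese sentence usually has far fewer characters than its English equivalent, so per-character and per-text timings can tell different stories; comparing both may be worthwhile.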
Posted by eastgate.
Post not yet marked as solved
0 Replies
442 Views
When using the maxEnt (maximum entropy) algorithm in NLTextClassifier, what is the underlying method? “Maximum entropy” is, if I remember, the metric for Quinlan’s ID3: is that what we're using here? If so, what features are we extracting?

My impression is that this is something like “use word2vec as input to ID3”, but is that really right? And where is this written? It’s not just idle curiosity: knowing what’s going on guides how we construct training cases, what’s useful to train, and what sort of errors we have to live with or route around.
Posted by eastgate.
Post marked as solved
3 Replies
1.2k Views
I’m working with NLTagger as an easy way to stem words, in order (as they say) to improve our users’ search experience. The results sometimes seem odd.

First, suppose the user searches for “strike”, hoping to find “the Bread and Roses strike” and also “Casey has struck out.” We need first to lemmatize the user’s search term, but NLTagger won’t lemmatize the isolated word “strike”. (Appending a space and “this” resolves the issue, but that’s clumsy.)

Second, let’s search for “in” in the string “IN THE WEEDS”. The lemmatizer thinks the first word in the string is “Indiana”! OK: all caps is arguably unusual. Let’s try “In The Weeds”. Now, the lemmatizer declines to tag the first word at all.

Both these examples are organic: they arose in adapting unit tests for our current, regex-based search. I expect that I'm Doing It Wrong™, but documentation is thin on the ground. (10.14.6 Beta (18G29g))
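For anyone wanting to reproduce these cases, here is a minimal sketch of the kind of call involved. The function name `lemmas(in:language:)` is made up for illustration; the `NLTagger` calls themselves are the standard ones. Passing an explicit language is an assumption about what might help with very short inputs, since there is little text for language detection to work with; it is not a confirmed fix for either problem described above.

```swift
import NaturalLanguage

/// Hypothetical helper: returns each word in `text` paired with its lemma,
/// or nil where the tagger declines to supply one.
func lemmas(in text: String, language: NLLanguage? = nil) -> [(word: String, lemma: String?)] {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text
    let fullRange = text.startIndex..<text.endIndex

    // Assumption: an explicit language hint may help with one-word inputs.
    if let language = language {
        tagger.setLanguage(language, range: fullRange)
    }

    var results: [(word: String, lemma: String?)] = []
    tagger.enumerateTags(in: fullRange,
                         unit: .word,
                         scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, range in
        results.append((word: String(text[range]), lemma: tag?.rawValue))
        return true  // keep enumerating
    }
    return results
}
```

Calling `lemmas(in: "strike", language: .english)` and `lemmas(in: "IN THE WEEDS", language: .english)` makes it easy to compare the results with and without the hint. Keeping the untagged words in the output (as nil) lets a search layer fall back to the surface form when no lemma comes back.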
Posted by eastgate.