Hi, I am trying to use the Natural Language framework's NLTagger for Japanese. I used this sample code:
Code Block
import NaturalLanguage

// jpTextView is the text view holding the Japanese text to tag.
let stringToRecognize = jpTextView.text ?? ""
let range = stringToRecognize.startIndex ..< stringToRecognize.endIndex
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = stringToRecognize
tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass) { (tag, range) -> Bool in
    // tag is optional, so avoid force-unwrapping it.
    print("Word [\(stringToRecognize[range])] : \(tag?.rawValue ?? "nil")")
    return true // keep enumerating
}
and applied it to some dummy data:
Code Block
東京では11月から、コロナウイルスの病気で入院する人が多くなっています。このため、お腹の中に赤ちゃんがいる看護師も仕事を続けています。家に小さな子どもがいる看護師は、子どもにウイルスがうつらないか心配しながら仕事をしています。
The output was:
Code Block
Word [東京] : OtherWord
Word [で] : OtherWord
Word [は] : OtherWord
Word [11] : OtherWord
Word [月] : OtherWord
Word [から] : OtherWord
Word [、] : Punctuation
Word [コロナ] : OtherWord
Word [ウイルス] : OtherWord
Word [の] : OtherWord
Word [病気] : OtherWord
Word [で] : OtherWord
Word [入院] : OtherWord
Word [する] : OtherWord
Word [人] : OtherWord
Word [が] : OtherWord
Word [多く] : OtherWord
Word [なっ] : OtherWord
Word [て] : OtherWord
Word [い] : OtherWord
Word [ます] : OtherWord
Word [。] : SentenceTerminator
Word [この] : OtherWord
Word [ため] : OtherWord
Word [、] : Punctuation
Word [お腹] : OtherWord
Word [の] : OtherWord
Word [中] : OtherWord
Word [に] : OtherWord
Word [赤ちゃん] : OtherWord
Word [が] : OtherWord
Word [いる] : OtherWord
Word [看護] : OtherWord
Word [師] : OtherWord
Word [も] : OtherWord
Word [仕事] : OtherWord
Word [を] : OtherWord
Word [続] : OtherWord
Word [け] : OtherWord
Word [て] : OtherWord
Word [い] : OtherWord
Word [ます] : OtherWord
Word [。] : SentenceTerminator
Word [家] : OtherWord
Word [に] : OtherWord
Word [小さな] : OtherWord
Word [子ども] : OtherWord
Word [が] : OtherWord
Word [いる] : OtherWord
Word [看護] : OtherWord
Word [師] : OtherWord
Word [は] : OtherWord
Word [、] : Punctuation
Word [子ども] : OtherWord
Word [に] : OtherWord
Word [ウイルス] : OtherWord
Word [が] : OtherWord
Word [うつら] : OtherWord
Word [ない] : OtherWord
Word [か] : OtherWord
Word [心配] : OtherWord
Word [し] : OtherWord
Word [ながら] : OtherWord
Word [仕事] : OtherWord
Word [を] : OtherWord
Word [し] : OtherWord
Word [て] : OtherWord
Word [い] : OtherWord
Word [ます] : OtherWord
Word [。] : SentenceTerminator
It looks like every word is tagged as OtherWord; only punctuation and sentence terminators are recognized. Is this the expected behavior, or will the Japanese tagger be improved soon so that we can differentiate between nouns, verbs, conjunctions, particles, etc.?
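One way I could narrow this down might be to ask the framework which tag schemes it reports for Japanese at the word level. This is only a minimal sketch: I am assuming that a missing `.lexicalClass` entry would explain the OtherWord tags, and I have not confirmed that this is the actual cause. The variable names `schemes` and `supportsLexicalClass` are my own.

```swift
import NaturalLanguage

// Ask the framework which tag schemes it reports for word-level Japanese text.
let schemes = NLTagger.availableTagSchemes(for: .word, language: .japanese)
print("Schemes for Japanese words: \(schemes.map { $0.rawValue })")

// Assumption: if .lexicalClass is not listed, OtherWord for every token is probably expected.
let supportsLexicalClass = schemes.contains(.lexicalClass)
print("Lexical class supported for Japanese: \(supportsLexicalClass)")
```

The result may differ between OS versions and devices, so it would need to be run on the same target that produced the output above.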