NLTagger support for Japanese

Question

dshum17 OP

Created Dec ’20

Hi, I am trying to use the natural language tagger for Japanese. I used this sample code:

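    import NaturalLanguage

    // jpTextView is the text view holding the Japanese input; fall back to an empty string if it has no text.
    let stringToRecognize = jpTextView.text ?? ""
    let range = stringToRecognize.startIndex ..< stringToRecognize.endIndex
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = stringToRecognize
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass) { (tag, tokenRange) -> Bool in
      // tag is optional, so avoid force-unwrapping it.
      print("Word [\(stringToRecognize[tokenRange])] : \(tag?.rawValue ?? "nil")")
      return true
    }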

and applied it to some sample text:

    東京では１１月から、コロナウイルスの病気で入院する人が多くなっています。このため、お腹の中に赤ちゃんがいる看護師も仕事を続けています。家に小さな子どもがいる看護師は、子どもにウイルスがうつらないか心配しながら仕事をしています。

The output was:

Word [東京] : OtherWord
Word [で] : OtherWord
Word [は] : OtherWord
Word [１１] : OtherWord
Word [月] : OtherWord
Word [から] : OtherWord
Word [、] : Punctuation
Word [コロナ] : OtherWord
Word [ウイルス] : OtherWord
Word [の] : OtherWord
Word [病気] : OtherWord
Word [で] : OtherWord
Word [入院] : OtherWord
Word [する] : OtherWord
Word [人] : OtherWord
Word [が] : OtherWord
Word [多く] : OtherWord
Word [なっ] : OtherWord
Word [て] : OtherWord
Word [い] : OtherWord
Word [ます] : OtherWord
Word [。] : SentenceTerminator
Word [この] : OtherWord
Word [ため] : OtherWord
Word [、] : Punctuation
Word [お腹] : OtherWord
Word [の] : OtherWord
Word [中] : OtherWord
Word [に] : OtherWord
Word [赤ちゃん] : OtherWord
Word [が] : OtherWord
Word [いる] : OtherWord
Word [看護] : OtherWord
Word [師] : OtherWord
Word [も] : OtherWord
Word [仕事] : OtherWord
Word [を] : OtherWord
Word [続] : OtherWord
Word [け] : OtherWord
Word [て] : OtherWord
Word [い] : OtherWord
Word [ます] : OtherWord
Word [。] : SentenceTerminator
Word [家] : OtherWord
Word [に] : OtherWord
Word [小さな] : OtherWord
Word [子ども] : OtherWord
Word [が] : OtherWord
Word [いる] : OtherWord
Word [看護] : OtherWord
Word [師] : OtherWord
Word [は] : OtherWord
Word [、] : Punctuation
Word [子ども] : OtherWord
Word [に] : OtherWord
Word [ウイルス] : OtherWord
Word [が] : OtherWord
Word [うつら] : OtherWord
Word [ない] : OtherWord
Word [か] : OtherWord
Word [心配] : OtherWord
Word [し] : OtherWord
Word [ながら] : OtherWord
Word [仕事] : OtherWord
Word [を] : OtherWord
Word [し] : OtherWord
Word [て] : OtherWord
Word [い] : OtherWord
Word [ます] : OtherWord
Word [。] : SentenceTerminator

It looks like every word is tagged as OtherWord, and only punctuation and sentence terminators are distinguished. Is this the expected behavior, or will the Japanese tagger be improved soon so that we can differentiate between nouns, verbs, conjunctions, particles, and so on?
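
One way to check whether this is expected is to ask the framework which tag schemes it reports for Japanese. The following is a minimal sketch, assuming that `NLTagger.availableTagSchemes(for:language:)` reflects what the tagger can actually produce on the OS version in use. The helper name `supportsLexicalClass` is hypothetical and not part of the original code, and the `#if canImport` guard is only there so the snippet still compiles on platforms without the NaturalLanguage framework:

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage

/// Hypothetical helper: reports whether the lexical-class scheme is listed
/// as available for word-level tagging in the given language.
func supportsLexicalClass(for language: NLLanguage) -> Bool {
    let schemes = NLTagger.availableTagSchemes(for: .word, language: language)
    return schemes.contains(.lexicalClass)
}

// Compare Japanese with English on the current OS version.
print("Japanese lexicalClass available:", supportsLexicalClass(for: .japanese))
print("English lexicalClass available:", supportsLexicalClass(for: .english))
#endif
```

If the scheme is not listed for Japanese, that would be consistent with every token falling back to OtherWord, although the result may differ between OS releases.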

Answer 1

gokul_as OP

Jul ’22

Are there any updates on this?
