My app would benefit significantly from being able to identify sentences in text, so I'm trying NLTokenizer, since the API makes it look like it can do that. However, I'm not able to obtain sentences as tokens: with the unit set to .sentence, I get no tokens at all. If I change the unit to .word or .paragraph, I do get words and paragraphs respectively. Am I missing something, or is this a bug?
Here's some small example code:
```swift
import NaturalLanguage

let source = "It was many and many a year ago, in a kingdom by the sea. \"Quiet\", said the raven."

// Ask for sentence-level tokens over the whole string.
let tokenizer = NLTokenizer(unit: .sentence)
tokenizer.string = source
tokenizer.setLanguage(.english)

print("begin")
let tokens = tokenizer.tokens(for: source.startIndex..<source.endIndex).map({ range in
    return source[range]
})
print(tokens)
print("end")
```
I expected to get:
```
begin
["It was many and many a year ago, in a kingdom by the sea.", "\"Quiet\", said the raven."]
end
```
But what I actually get is:
```
begin
[]
end
```
I found a blog post whose author claimed to have iterated over sentences using NLTokenizer, but when I examined the output, it was actually an enumeration of words.
macOS 10.15.6 beta 1