Can NLTokenizer handle the .sentence unit?

Question

Created Jun ’20

My app would benefit significantly from being able to identify sentences in text, so I'm trying NLTokenizer, since the API makes it look like it can do that. However, I'm not able to obtain sentences as tokens. If I change the unit to .word or .paragraph, I do get words and paragraphs, respectively. Am I missing something, or is this a bug?

Here's some small example code:

Code Block swift
let source = "It was many and many a year ago, in a kingdom by the sea. \"Quiet\", said the raven."
let tokenizer = NLTokenizer(unit: .sentence)
tokenizer.string = source
tokenizer.setLanguage(.english)
print("begin")
let tokens = tokenizer.tokens(for: source.startIndex..<source.endIndex).map({ range in
	return source[range]
})
print(tokens)
print("end")

I expected to get:

Code Block
begin
["It was many and many a year ago, in a kingdom by the sea.", "\"Quiet\", said the raven."]
end

But what I actually get is:

Code Block
begin
[]
end

I found a blog post where someone claimed to have iterated over sentences using NLTokenizer, but when I examined the output, they had actually enumerated words.

I'm running macOS 10.15.6 beta 1.

Answer 1

benspratling4 OP

Aug ’21

This is fixed in macOS 11 (tested on 11.5.1).
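
For anyone who still has to support a system where the .sentence unit returns nothing, here is a minimal sketch of one possible workaround. It is not something confirmed in this thread: the helper name `sentences(in:)` is hypothetical, and the fallback swaps in Foundation's `enumerateSubstrings(in:options:)` with `.bySentences` rather than NLTokenizer. I'm assuming that an empty result from the tokenizer on non-empty text means the tokenizer misbehaved.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: splits `text` into sentences with NLTokenizer and
/// falls back to Foundation's sentence enumeration if the tokenizer
/// yields no tokens (the behavior described in the question).
func sentences(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text
    tokenizer.setLanguage(.english)

    var result: [String] = []
    let fullRange = text.startIndex..<text.endIndex

    // Collect each sentence range reported by the tokenizer.
    tokenizer.enumerateTokens(in: fullRange) { range, _ in
        result.append(String(text[range]))
        return true // keep enumerating
    }

    // Assumption: an empty result means the tokenizer did not work,
    // so try Foundation's sentence enumeration instead.
    if result.isEmpty {
        text.enumerateSubstrings(in: fullRange, options: .bySentences) { substring, _, _, _ in
            if let substring = substring {
                result.append(substring)
            }
        }
    }

    // Both APIs may include trailing whitespace in a sentence, so trim it.
    return result.map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
}
```

Calling `print(sentences(in: source))` with the `source` string from the question should then print the sentences on either code path, though the two APIs may not agree on every boundary.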