I'm using `NLTokenizer` in this code to extract the unique words from text files:
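```swift
import NaturalLanguage

func loadData() {
    var wordTokens: Set<String> = []
    let tokenizer = NLTokenizer(unit: .word)
    // TextContent is this app's own singleton holding the loaded file's text
    let text = TextContent.sharedInstance.text.uppercased()
    tokenizer.string = text
    let tokenRanges = tokenizer.tokens(for: text.startIndex..<text.endIndex) // line #5
    for r in tokenRanges { // line #6
        // Each range indexes into the same string that was given to the tokenizer
        let word = String(text[r]).trimmingCharacters(in: .whitespacesAndNewlines)
        if !word.isEmpty {
            wordTokens.insert(word)
        }
    }
    // ... rest of loadData() not shown
}
```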
It's been working fine for most files, including some that are over 800KB. But when I input an even larger one (1.4MB), the `tokenRanges` array returned at line #5 (the `tokens(for:)` call) is empty. I've checked `tokenizer.string`, and it is set.
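To check whether the size of the text alone triggers the empty result, the same tokenization can be run outside the app on a plain `String`. This is a minimal sketch: `uniqueWords(in:)` is a hypothetical helper that mirrors the loop in `loadData()` but takes the text as a parameter instead of reading `TextContent.sharedInstance`.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: the same word extraction as loadData(),
/// but taking the text as a parameter so it can be tested in isolation.
func uniqueWords(in text: String) -> Set<String> {
    let upper = text.uppercased()
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = upper
    let ranges = tokenizer.tokens(for: upper.startIndex..<upper.endIndex)
    var words: Set<String> = []
    for range in ranges {
        // Slice the same string the tokenizer was given, then trim stray whitespace
        let word = upper[range].trimmingCharacters(in: .whitespacesAndNewlines)
        if !word.isEmpty {
            words.insert(word)
        }
    }
    return words
}
```

Feeding it the contents of one of the 800KB files and then the 1.4MB file would show whether the problem follows the text itself or how it reaches the tokenizer in the app.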
I have a limited understanding of threads, but I'm wondering whether the tokenizer starts a background thread at line #5 to do its work, and that thread hasn't finished by the time line #6 executes. If that's what's happening, is there a way to make it finish before execution continues?
I've also tried replacing the call at line #5 with `enumerateTokens(in:using:)` and a closure, with the same result.
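For comparison, the `enumerateTokens(in:using:)` variant described above might look like the sketch below; the function name `uniqueWordsEnumerating(_:)` is hypothetical. The closure is called once per token, and returning `true` tells the tokenizer to keep going.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: the same word extraction as loadData(),
/// but collecting tokens through enumerateTokens(in:using:) instead of tokens(for:).
func uniqueWordsEnumerating(_ text: String) -> Set<String> {
    let upper = text.uppercased()
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = upper
    var words: Set<String> = []
    tokenizer.enumerateTokens(in: upper.startIndex..<upper.endIndex) { range, _ in
        let word = upper[range].trimmingCharacters(in: .whitespacesAndNewlines)
        if !word.isEmpty {
            words.insert(word)
        }
        return true // continue enumerating
    }
    return words
}
```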