Analyzing Text

Is it possible to use AppleScript to analyze a directory full of text files and report the three most used common nouns in each file?

Theoretically? Sure. AppleScript is Turing complete, after all.

In practice? Yikes, no, no, nope, nein, no.

Natural language processing is no small matter. Fundamentally, you'll need to identify exactly which words are nouns, which means some form of part-of-speech tagging.

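A commonly cited example of how hard that task is: this is a grammatically valid English sentence: "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo". The same word serves as a proper noun (the city), a common noun (the animal), and a verb (to bully), and only context tells them apart.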

Even words you might think of as always being nouns—"butterfly", for instance—can be verbs.

The rest of the problem here is housekeeping: slogging through a database or some other storage for the identified nouns, and counting them.
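For what it's worth, the housekeeping half is small once the tagging exists. Here is a rough sketch in Python rather than AppleScript, assuming some tagger has already turned each file into (word, tag) pairs. The `tagger` argument, the "NOUN" tag name, the `*.txt` glob, and both function names are my own placeholders, not anything from a particular library.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: a tagger takes raw text and yields (word, tag) pairs.
Tagger = Callable[[str], Iterable[Tuple[str, str]]]


def top_common_nouns(tagged: Iterable[Tuple[str, str]],
                     n: int = 3,
                     noun_tag: str = "NOUN") -> List[Tuple[str, int]]:
    """Count words carrying the common-noun tag and return the n most frequent."""
    # Assumption: the tagger labels common nouns "NOUN" and proper nouns otherwise.
    counts = Counter(word.lower() for word, tag in tagged if tag == noun_tag)
    return counts.most_common(n)


def report_directory(directory: str,
                     tagger: Tagger,
                     n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Map each .txt file name in the directory to its top-n common nouns."""
    report = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        report[path.name] = top_common_nouns(tagger(text), n)
    return report
```

All of the real difficulty hides inside whatever you pass as `tagger`; this part is just a tally.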

The tagging? Yeah, not gonna try that in AppleScript.

There's a book, Speech and Language Processing by Daniel Jurafsky and James H. Martin (2019), and Jurafsky has a draft of chapter 8, "Part-of-Speech Tagging", available online at Stanford. Alas, the URL isn't allowed here.

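AFAIK, AppleScript itself has no built-in command for tagging parts of speech. macOS does ship part-of-speech tagging in the Natural Language framework (NLTagger, and the older NSLinguisticTagger), but those are Swift and Objective-C APIs, not something plain AppleScript exposes.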
