How do I determine the attributes of an AttributedString.Runs.Run?
What I'm trying to do
I'm trying to reformat Microsoft docx (Word) files (appearance) as TEI XML (semantics).
Currently, my lexer ingests the Word file into an NSAttributedString and emits events at every change in visible style (via enumerateAttributes(in:options:using:)). The parser applies hideously ad hoc rules to recover the semantics those styles reflect. One hopes. (Fun fact: Times New Roman is not in the same family as Times New Roman-Italic.)
The start/stop events are derived from the enumerated style runs; stacking the style runs and enqueueing "stop" events does the rest. This is error-prone and graceless, because you must issue start/stop events for many attributes, each readable through one of three or four mutually exclusive accessors.
It works (at the cost of a few weeks of fixes after every update), but I'd rather not depend on it.
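For reference, the current approach boils down to something like the sketch below: enumerate the style runs of the NSAttributedString, diff each run's attribute dictionary against the previous one, and emit stop/start events for every key whose value changed. StyleEvent, styleEvents(for:) and the custom "bold" key are hypothetical stand-ins (a real docx import carries font and paragraph attributes instead), and comparing values via String(describing:) is a crude assumption, not a recommendation.

```swift
import Foundation

// Hypothetical event type for the lexer's output stream.
enum StyleEvent: Equatable {
    case start(String)  // attribute key that begins at this point
    case stop(String)   // attribute key that ends at this point
    case text(String)   // characters of one style run
}

// Emit stop/start events for every key whose value differs from the previous run.
// Values are compared via String(describing:), a crude stand-in for real equality.
func styleEvents(for source: NSAttributedString) -> [StyleEvent] {
    var events: [StyleEvent] = []
    var previous: [NSAttributedString.Key: String] = [:]
    let byName: (NSAttributedString.Key, NSAttributedString.Key) -> Bool = { $0.rawValue < $1.rawValue }
    let whole = NSRange(location: 0, length: source.length)

    source.enumerateAttributes(in: whole, options: []) { attributes, range, _ in
        let current = attributes.mapValues { String(describing: $0) }
        // Sorting by key name keeps the event order deterministic.
        for key in previous.keys.filter({ current[$0] != previous[$0] }).sorted(by: byName) {
            events.append(.stop(key.rawValue))
        }
        for key in current.keys.filter({ previous[$0] != current[$0] }).sorted(by: byName) {
            events.append(.start(key.rawValue))
        }
        events.append(.text(source.attributedSubstring(from: range).string))
        previous = current
    }
    // Close whatever is still open at the end of the string.
    for key in previous.keys.sorted(by: byName) {
        events.append(.stop(key.rawValue))
    }
    return events
}

// Hypothetical custom key standing in for a "bold" style.
let bold = NSAttributedString.Key("bold")
let line = NSMutableAttributedString(string: "LS: Democracy")
line.addAttribute(bold, value: true, range: NSRange(location: 0, length: 2))
let events = styleEvents(for: line)
// [.start("bold"), .text("LS"), .stop("bold"), .text(": Democracy")]
```

Note that a value change on one key (say, one font replaced by another) shows up as a stop immediately followed by a start, which is exactly where the bookkeeping gets painful.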
Masochists may see Details, below.
Question
If I switch to Swift Foundation's AttributedString, how can I iterate runs of common attributes and determine what those attributes are? The iteration itself is easy, but everything I've tried in a Playground produces nil results or crashes.
It would be absurd if Run elements were write-only; there must be a tutorial on how to read them. Where can I look?
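As far as I can tell, reading is the mirror image of writing: each Run exposes its range and, through dynamic member lookup, every attribute in scope as an Optional. A minimal sketch, assuming macOS 12 / iOS 15 or later and sticking to Foundation attributes (link, inlinePresentationIntent) so it runs without AppKit; treating stronglyEmphasized as "bold" is my assumption about how the docx styles would map. AppKit attributes such as font should, I believe, read the same way once AppKit is imported and the string was converted with that scope included, but that is untested here.

```swift
import Foundation

// Build a small string: plain text, a bold link, plain text.
var sample = AttributedString("Visit ")
var linked = AttributedString("swift.org")
linked.link = URL(string: "https://swift.org")
linked.inlinePresentationIntent = .stronglyEmphasized
sample.append(linked)
sample.append(AttributedString(" today."))

// Each Run has a range into the string plus the attribute values for that range.
// Reading an attribute by name returns an Optional: nil means "not set on this run".
var report: [String] = []
for run in sample.runs {
    let text = String(sample[run.range].characters)
    let link = run.link?.absoluteString ?? "-"
    let bold = run.inlinePresentationIntent?.contains(.stronglyEmphasized) ?? false
    report.append("\(text)|\(link)|\(bold)")
}
print(report.joined(separator: "\n"))
```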
AttributedString: A new hope
The Swift Foundation framework introduces AttributedString, which looks as though it might make this easier. It has a property, .runs, a collection of AttributedString.Runs.Run values, each of which identifies a range and its attributes. It has an extensive and extensible repertoire of attributes to apply to the string in progress.
Every attribute is a first-class value. Writing run properties goes through a consistent, generic interface. Reading, it turns out, goes through three or four (functionally identical) interfaces.
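The three or four reading interfaces do appear to be spellings of the same lookup. A sketch, assuming a one-run string with a placeholder URL: dynamic member lookup, the subscript that takes the attribute's key type, and the run's AttributeContainer should all return the same URL?.

```swift
import Foundation

// A one-run string carrying a placeholder link.
var doc = AttributedString("docs")
doc.link = URL(string: "https://example.com")

let run = doc.runs.first!

// Three spellings of the same read; each returns URL? (nil if the run has no link).
let byMember = run.link                                                      // dynamic member lookup
let byKeyType = run[AttributeScopes.FoundationAttributes.LinkAttribute.self] // subscript by key type
let byContainer = run.attributes.link                                        // via the run's AttributeContainer

print(byMember == byKeyType && byKeyType == byContainer)  // true
```

In practice the dynamic member spelling is the shortest; the key-type subscript is the one to reach for in generic code that is handed an AttributedStringKey type.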
For building an AttributedString, this is a dream. Reading a Run is a different matter. The API exposes the runs as Collections, but once you get into them, the documentation (as far as I can tell) halts amid a broken field of Types used as subscript indices; twisty mazes of little attribute names, all alike, used as subscripts, as dynamicMemberLookup properties and modifiers, and as calls-as-functions, nearly all of them generic; and they crash if you try to determine whether they are present in the first place.
(The Book says choosing among dynamicMemberLookup, subscripts, and call-as-function can be a judgment call. AttributedString solves this problem by exposing all of them. This complaint may be unfair; I wish I could tell.)
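On the presence question: in the sketch below, reading an attribute that a run does not carry returns nil rather than trapping, so "is it present?" becomes a nil check. The helper presentAttributes(_:) is hypothetical, and it checks a caller-chosen list of Foundation attributes rather than assuming any API that lists every key on a run.

```swift
import Foundation

// Report which of a hand-picked list of Foundation attributes a run carries.
// Absent attributes read as nil; nothing here traps.
func presentAttributes(_ run: AttributedString.Runs.Run) -> [String] {
    var names: [String] = []
    if run.link != nil { names.append("link") }
    if run.inlinePresentationIntent != nil { names.append("inlinePresentationIntent") }
    if run.languageIdentifier != nil { names.append("languageIdentifier") }
    return names
}

var mixed = AttributedString("plain ")
var tagged = AttributedString("tagged")
tagged.languageIdentifier = "en"
mixed.append(tagged)

let found = mixed.runs.map(presentAttributes)
print(found)  // [[], ["languageIdentifier"]]
```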
Everything I've tried in a Playground is wrong. Sometimes the result is nil; other times, crashes appear as soon as I try to introspect anything. (An "intention" enum is used, via its rawValues; they are unpublished, and too few.)
Details (if by miracle this is clearer)
The first stage of my lexer imports the docx into an NSAttributedString; the second stage could yield a stream of attribute changes and content text.
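For that second stage, slicing runs by a single attribute key path (runs[\.inlinePresentationIntent]) yields (value, range) pairs in which adjacent runs sharing that value are merged, which is close to the start/stop stream described here. A sketch under those assumptions; LexEvent and lex(_:) are hypothetical names, and only bold (taken to be stronglyEmphasized) is tracked.

```swift
import Foundation

// Hypothetical lexer output consumed by the parser.
enum LexEvent: Equatable {
    case startBold
    case endBold
    case text(String)
}

// Walk runs sliced on one attribute and emit start/end only when boldness flips.
func lex(_ line: AttributedString) -> [LexEvent] {
    var events: [LexEvent] = []
    var inBold = false
    for (intent, range) in line.runs[\.inlinePresentationIntent] {
        let bold = intent?.contains(.stronglyEmphasized) ?? false
        if bold != inBold {
            events.append(bold ? .startBold : .endBold)
            inBold = bold
        }
        events.append(.text(String(line[range].characters)))
    }
    // Close a bold span that runs to the end of the line.
    if inBold { events.append(.endBold) }
    return events
}

var speakerRun = AttributedString("LS")
speakerRun.inlinePresentationIntent = .stronglyEmphasized
var line = speakerRun
line.append(AttributedString(": Democracy is therefore constrained."))
let events = lex(line)
// [.startBold, .text("LS"), .endBold, .text(": Democracy is therefore constrained.")]
```

Tracking one attribute per slice keeps each start/stop decision local; several slices (one per attribute of interest) could be merged by range if the parser needs them interleaved.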
The parser applies rules (ad hoc, empirical, and often horrific) to extract the semantics and render them into XML. For example, the paragraph
LS: Democracy is therefore constrained.
yields the stream
line-break
(short-run-of-bold)
": " (if not part of the bold run)
(signal start-of-attribute)
(runs of characters, possibly enclosing additional runs)
(signal end-of-attribute)
line-break [look-ahead]
which the parser turns into the following (sorry, the forum Markdown doesn't pass XML elements):
SPEECH SPEAKER $1 /SPEAKER SPOKEN $2 /SPOKEN /SPEECH
→
SPEECH SPEAKER LS /SPEAKER SPOKEN Democracy is therefore constrained. /SPOKEN /SPEECH
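With AttributedString, that SPEECH rule could be expressed directly against the runs. A rough sketch, assuming the speaker is the first run and carries stronglyEmphasized, that the colon may sit on either side of the run boundary, and using the element names from the notation above; speechXML(_:) and xmlEscaped(_:) are hypothetical helpers, and the output is not validated against TEI.

```swift
import Foundation

// Minimal XML escaping for element text content.
func xmlEscaped(_ s: String) -> String {
    s.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

// Hypothetical rule: a paragraph that opens with a strongly emphasized run, followed
// by a colon, is a speech; the bold text is the speaker and the rest is spoken.
func speechXML(_ para: AttributedString) -> String? {
    guard let first = para.runs.first,
          first.inlinePresentationIntent?.contains(.stronglyEmphasized) == true
    else { return nil }

    var speaker = String(para[first.range].characters).trimmingCharacters(in: .whitespaces)
    var spoken = String(para[first.range.upperBound...].characters)

    // The ": " may or may not be part of the bold run; accept either placement.
    if speaker.hasSuffix(":") {
        speaker.removeLast()
    } else if spoken.hasPrefix(":") {
        spoken.removeFirst()
    } else {
        return nil
    }
    spoken = spoken.trimmingCharacters(in: .whitespaces)

    let speakerXML = "<SPEAKER>\(xmlEscaped(speaker))</SPEAKER>"
    let spokenXML = "<SPOKEN>\(xmlEscaped(spoken))</SPOKEN>"
    return "<SPEECH>\(speakerXML)\(spokenXML)</SPEECH>"
}

var speakerRun = AttributedString("LS")
speakerRun.inlinePresentationIntent = .stronglyEmphasized
var paragraph = speakerRun
paragraph.append(AttributedString(": Democracy is therefore constrained."))
let xml = speechXML(paragraph)
```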
AttributedString.runs yields a collection, Runs: "an iterable view into segments of the attributed string, each of which indicates where a run of identical attributes begins or ends." A relief to me, if true.
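If the documentation is right, the difference between the full runs view and a single-attribute slice shows up as soon as two neighbours share one attribute but not another. A sketch, assuming a placeholder URL and two languageIdentifier values: the full runs view should report two runs, while the slice keyed on \.link should coalesce them into one.

```swift
import Foundation

// Two adjacent pieces share a link but differ in language.
let url = URL(string: "https://example.com")
var english = AttributedString("Hello, ")
english.link = url
english.languageIdentifier = "en"
var french = AttributedString("monde")
french.link = url
french.languageIdentifier = "fr"

var greeting = english
greeting.append(french)

// The full view splits wherever any attribute changes...
let allAttributeRuns = greeting.runs.count
// ...while the slice keyed on \.link merges neighbours with the same link.
var linkRuns = 0
for (link, range) in greeting.runs[\.link] {
    linkRuns += 1
    print(link?.absoluteString ?? "-", String(greeting[range].characters))
}
print(allAttributeRuns, linkRuns)  // 2 1
```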