Reading Swift/Foundation `AttributedString` attributes

Question

fritza OP

Created May ’23


How do I determine the attributes of an AttributedString.Runs.Run?

What I'm trying to do

I'm trying to reformat Microsoft docx (Word) files (appearance) as TEI XML (semantics).

Currently: My lexer ingests the Word file into NSAttributedString and emits events at every change in visible styles (via enumerateAttributes(in:options:using:)). The parser applies hideously ad hoc rules to recover the semantics those styles reflect. One hopes. (Fun fact: Times New Roman is not in the same family as Times New Roman-Italic.)
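For readers who haven't used it, the enumeration step looks roughly like the sketch below. The `demoBold` key is a made-up stand-in for a real style attribute (a docx import would carry font and paragraph attributes instead), and the sample sentence is typed in by hand rather than read from a Word file.

```swift
import Foundation

// Hypothetical attribute key standing in for a visible style such as bold.
let demoBold = NSAttributedString.Key("demoBold")

let text = NSMutableAttributedString(string: "LS: Democracy is therefore constrained.")
text.addAttribute(demoBold, value: true, range: NSRange(location: 0, length: 2))

// Collect one (substring, attribute names) pair per style run.
var styleRuns: [(text: String, keys: [String])] = []
let whole = NSRange(location: 0, length: text.length)
text.enumerateAttributes(in: whole, options: []) { attributes, range, _ in
    let piece = text.attributedSubstring(from: range).string
    let names = attributes.keys.map { $0.rawValue }.sorted()
    styleRuns.append((text: piece, keys: names))
}

for run in styleRuns {
    print(run.text.debugDescription, run.keys)
}
```

Each callback delivers a dictionary keyed by NSAttributedString.Key with values typed as Any, which is where the per-attribute casting in my lexer comes from.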

The start/stop events are derived from the enumerated style runs. Stacking style runs and enqueueing "stop" events do the rest. This is error-prone and graceless, as you must issue start/stop events for many attributes, each readable through one of three or four mutually-exclusive accessors.

It works (net of a few weeks at every update), but I'd rather not depend on it.

Masochists may see Details, below.

Question

If I switch to Swift Foundation's AttributedString, how can I iterate runs of common attributes and determine what those attributes are? The mere iteration is easy, but everything I've tried in Playground produces null results or crashes.

It would be absurd if Run elements were write-only, so there must be a tutorial on how to read them. Where can I look?

AttributedString: A new hope

The Swift Foundation framework introduces AttributedString, which looks as though it might make this easier. It has a property, .runs, a collection of AttributedString.Runs.Run values, each of which identifies a range and its attributes. It also has an extensive and extensible repertoire of attributes to apply to the string-in-progress.
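To make the question concrete, here is a minimal sketch of the kind of read loop I have in mind. It assumes only Foundation-scope attributes: `inlinePresentationIntent` stands in for bold, which is not what a docx import would actually produce (that would be font attributes from the AppKit or UIKit scope). The sample text and the `summary` array are illustrative.

```swift
import Foundation

// Build a two-run string: a strongly emphasized speaker tag, then plain text.
var speaker = AttributedString("LS")
speaker.inlinePresentationIntent = .stronglyEmphasized   // stand-in for bold

var line = speaker
line.append(AttributedString(": Democracy is therefore constrained."))

var summary: [(text: String, strong: Bool)] = []
for run in line.runs {
    // The run's text is recovered by slicing the parent string with run.range.
    let text = String(line[run.range].characters)

    // Dynamic-member read on the run; the value is nil when the attribute is absent.
    let intent = run.inlinePresentationIntent

    // The same attribute read through the run's AttributeContainer.
    let viaContainer = run.attributes.inlinePresentationIntent

    let strong = intent?.contains(.stronglyEmphasized) ?? false
    summary.append((text: text, strong: strong))
    print(text.debugDescription, "strong:", strong, "same:", intent == viaContainer)
}
```

Reads in this sketch return optionals, so an absent attribute should show up as nil rather than as a trap. Whether that holds for attributes in the platform-specific scopes is exactly the part I have not managed to confirm.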

Every attribute is a first-class value. Writing run properties goes through a consistent, generic interface. Reading, actually, goes through three or four (functionally identical) interfaces.

For building an AttributedString, this is a dream. Reading a Run is a different matter. The API exposes the run collections, but once you get into them, the documentation (as far as I can tell) halts amid a broken field of types used as indices into subscripts: twisty mazes of little attribute names, all alike, used as subscript arguments, as dynamicCallable modifiers' types, or as calls-as-functions. Nearly all of them are generic, and they crash if you so much as try to determine whether an attribute is present in the first place.

(The Book says selecting dynamicCallable, subscript, or call-as-function could be up for judgment. AttributedString solves this problem by exposing all of them. This complaint may be unfair; I wish I could tell.)

Everything I've tried in a Playground is wrong. Sometimes the result is null; at other times, crashes appear as soon as I try to introspect anything. (An "intention" enum is used, as rawValues; unpublished, and too few.)

Details (if by miracle this is clearer)

The first stage in my lexer imports the docx into an NSAttributedString; the second stage of the lexer could yield a stream of attribute-changes and content-text.

The parser applies rules (ad-hoc, empirical, and often horrific) to extract the semantics and render them into XML. In other words,

LS: Democracy is therefore constrained.

yields the stream

line-break
(short-run-of-bold)
": " (if not part of the bold run)
(signal start-of-attribute)
(runs of characters, possibly enclosing additional runs)
(signal end-of-attribute)
line-break [look-ahead]
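A rough sketch of how that stream might be derived from AttributedString runs, tracking a single attribute. The `Event` enum, the `events(for:)` function, and the "strong" label are all hypothetical names, and `inlinePresentationIntent` again stands in for whatever attribute really marks bold after a docx import. Line breaks and look-ahead are left out.

```swift
import Foundation

// Hypothetical event type for the lexer's output stream.
enum Event: Equatable {
    case start(String)   // an attribute begins
    case text(String)    // characters within the current run
    case stop(String)    // an attribute ends
}

// Walks the runs in order and emits start/stop events whenever the
// strongly-emphasized flag changes between consecutive runs.
func events(for line: AttributedString) -> [Event] {
    var out: [Event] = []
    var strongOpen = false
    for run in line.runs {
        let strong = run.inlinePresentationIntent?.contains(.stronglyEmphasized) ?? false
        if strong && !strongOpen { out.append(.start("strong")) }
        if !strong && strongOpen { out.append(.stop("strong")) }
        strongOpen = strong
        out.append(.text(String(line[run.range].characters)))
    }
    // Close anything still open at the end of the line.
    if strongOpen { out.append(.stop("strong")) }
    return out
}

var speaker = AttributedString("LS")
speaker.inlinePresentationIntent = .stronglyEmphasized
var line = speaker
line.append(AttributedString(": Democracy is therefore constrained."))

let stream = events(for: line)
for event in stream { print(event) }
```

Tracking several attributes at once would need one open/closed flag (or a stack) per attribute, which is the bookkeeping I am hoping to shed.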

and emit (sorry the forum Markdown doesn't pass XML elements)

SPEECH SPEAKER $1 /SPEAKER SPOKEN $2 /SPOKEN /SPEECH

→

SPEECH SPEAKER LS /SPEAKER SPOKEN Democracy is therefore constrained. /SPOKEN /SPEECH

AttributedString.runs yields a collection, Runs, "an iterable view into segments of the attributed string, each of which indicates where a run of identical attributes begins or ends." A relief to me, if true.
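As I understand it, Runs can also be subscripted with an attribute key path, which yields (value, range) pairs coalesced on that one attribute and ignores changes in the others. A minimal sketch, assuming Foundation-scope attributes only; the " (chair)" text and the example.com link are invented to force an extra run boundary.

```swift
import Foundation

var tag = AttributedString("LS")
tag.inlinePresentationIntent = .stronglyEmphasized

// Same emphasis, plus a link, so the full run list splits here.
var note = AttributedString(" (chair)")
note.inlinePresentationIntent = .stronglyEmphasized
note.link = URL(string: "https://example.com")   // placeholder URL

var line = tag
line.append(note)
line.append(AttributedString(": Democracy is therefore constrained."))

// Runs over all attributes: the link change adds a boundary inside the emphasis.
let allRuns = line.runs.count

// Runs over one attribute: adjacent ranges with the same value are merged.
var strongSpans: [String] = []
for (intent, range) in line.runs[\.inlinePresentationIntent] {
    if intent?.contains(.stronglyEmphasized) == true {
        strongSpans.append(String(line[range].characters))
    }
}

print(allRuns, strongSpans)
```

If this behaves as documented, one such pass per attribute of interest would give the start and stop positions directly, without stacking style runs by hand.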
