iOS 18.x PDFKit Page.string and Page.attributedString return some text way out of order

Reading text out of PDFs with PDFKit results in some text being returned way out of order when using the .string or .attributedString properties. "Way out of order" does not just mean words sorted wrongly within a line or wrongly showing up on the next line (as has happened with PDFKit on older iOS releases, e.g. 17.x). It means that some text (one or more words) may show up near the end of a page of text when it should appear near the beginning.

As Page.characterBounds(at:) is also buggy in iOS 18.x and returns wrong bounds, devs cannot correct this faulty PDFKit behaviour programmatically.
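
For anyone who wants to check their own documents, the following is a minimal sketch of how the three APIs could be compared side by side. It is not the sample project attached to the feedback. The function name `dumpPageText` is hypothetical, and the sketch assumes that the index passed to `characterBounds(at:)` corresponds to a UTF-16 offset into the page's `string`, which seems likely but is not confirmed in this thread.

```swift
import Foundation
import PDFKit

/// Hypothetical helper: prints the extracted text of one page, followed by
/// each character and the bounds PDFKit reports for it, so that reading
/// order and bounds can be compared against what is visible on the page.
func dumpPageText(from url: URL, pageIndex: Int = 0) {
    guard let document = PDFDocument(url: url),
          let page = document.page(at: pageIndex),
          let text = page.string else {
        print("Could not load the page or extract any text from it")
        return
    }

    print("--- string ---")
    print(text)

    print("--- attributedString ---")
    print(page.attributedString?.string ?? "(nil)")

    print("--- characterBounds(at:) ---")
    // Assumption: character indexes line up with UTF-16 offsets in `text`.
    let nsText = text as NSString
    let count = min(page.numberOfCharacters, nsText.length)
    for index in 0..<count {
        let character = nsText.substring(with: NSRange(location: index, length: 1))
        let bounds = page.characterBounds(at: index)
        print(index, character.debugDescription, bounds)
    }
}
```

Running this on the same PDF under iOS 17.x and iOS 18.x and comparing the output should make both the ordering and the bounds differences visible.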

I believe it is on Apple to fix this iOS 18 bug asap. Thank you for giving it priority as this is killing apps that need PDFKit to get and parse text data out of PDFs.

I have filed Feedback FB16264926.

Answered by DTS Engineer in 820472022

Hello and thank you for filing a bug report about these problems.

Looking at the status of your bug report, it looks like you have described a number of separate problems but included them all in the same bug report: a problem with PDFPage.attributedString, a problem with PDFPage.string, and a problem with Page.characterBounds(at:). Because of the way bugs are handled, I recommend filing separate bugs, one for each specific issue. Arguably, PDFPage.attributedString and PDFPage.string are very similar in function, but they are separate APIs, and it would most likely be useful to have separate bugs about each. That way, each bug will be focused on one particular problem and easier to handle as a separate investigation. Also, in each of your bugs, please include a small Xcode project and some directions that can be used to reproduce the problem.

Once you have filed bug reports, please post the bug reference numbers here. I'll make sure they get routed to the right team.

Thank you for your detailed reply and suggestion to separate into two feedbacks!

I will split/update the feedback as soon as I can manage.

FYI, I already filed the Page.characterBounds(at:) bug under FB14843671 in August '24 during the beta cycle, but so far I have seen no reaction from Apple, no bug confirmation and no bug fix.

That made me wonder whether I had missed something or even reported a non-bug. But having looked into the issue again in more detail, I have to reconfirm my view that this is an Apple framework issue.

This means that since the iOS 18 release, the large majority of my app's users have been completely stripped of a core feature because of that bug, i.e. importing, semantically analysing and parsing PDF scripts (of actors). Really tough to cope with.

I have sent in two new bug reports, one for Page.string (FB16313297) and one for Page.attributedString (FB16313295). I included screenshots and the sample project that shows the issue in code.

Thank you for forwarding these reports to the right people!

I really hope that those bugs, especially the characterBounds(at:) bug FB14843671, are fixed soon!
