Why does referencing a class instance inside a loop cause a memory crash?

I'm looping through all pages in a PDFDocument (300+ pages), but the app crashes with:



> Message from debugger: Terminated due to memory issue



The PDF is approximately 4 MB in size, yet each iteration of the loop increases memory use by roughly 30 MB. That memory is never reclaimed, and before the loop ends I get the crash.



@objc func scrapePDF(){
      
        let documentURL = self.documentDisplayWebView!.url!
        let document = PDFDocument(url: documentURL)
        let numberOfPages = document!.pageCount
         
        DispatchQueue.global().async {
          
            for pageNumber in 1...numberOfPages {
              
               print(document?.page(at: pageNumber)!.string!)
              
            }
        }
    }





I found that if, rather than referencing the existing `PDFDocument` inside the loop, I create a new instance on each iteration, this strangely solves the memory issue. I don't quite understand why, though. `PDFDocument` is a class, not a struct, so it is passed by reference, meaning it is created only once and then referenced inside my loop. So why would it cause a memory issue?




@objc func scrapePDF(){
  
        let documentURL = self.documentDisplayWebView!.url!
        let document = PDFDocument(url: documentURL)
        let numberOfPages = document!.pageCount
  
        DispatchQueue.global().async {
  
            for pageNumber in 1...numberOfPages {
               let doc = PDFDocument(url: documentURL)
               print(doc?.page(at: pageNumber)!.string!)
  
            }
        }
    }



Though the above code clears the memory issue, the problem is that it's too slow. Each iteration takes 0.5 seconds, and with 300+ pages I can't accept that. Any tips on speeding it up? Or on why the memory isn't given back when referencing the `PDFDocument` from outside the loop?



I have tried autoreleasepool{}, with no effect.

Replies

You could try an autoreleasepool. For example: https://stackoverflow.com/a/25880106

Just for investigation:


1. What happens if you do not dispatch? (Not a final solution, because it would block the main loop, but just to check.)


2. What happens if you put everything in the dispatch:


@objc func scrapePDF(){
  
        DispatchQueue.global().async { 
  
             let documentURL = self.documentDisplayWebView!.url!
             let doc = PDFDocument(url: documentURL)
             // let document = PDFDocument(url: documentURL)
             let numberOfPages = doc!.pageCount
  
            for pageNumber in 1...numberOfPages {
               print(doc?.page(at: pageNumber)!.string!)
            }
        }
    }

Sorry, I tried an autorelease pool but it had zero effect. I'll update the question to show this.

No change. It still runs out of memory.

Could you try to print just 10 pages and check whether memory is released at the end?


I suspect that each iteration creates a new page that contains an image view, hence the 30 MB, and that it is not released until it has been printed.


Could lazy initialization help?

Yes, when I run just 10 pages at a time I get the memory back. If there is no other solution, I may end up writing some logic to do just that.

The pages contain just text, no images.

Could you try this:


            for pageNumber in 1...numberOfPages {
               let s = doc?.page(at: pageNumber)!.string!
               print(s)
            }

Same issue.

Here is an example of each page's text:


Duty:620 Mon-Thu DutyLength:06:37 On: 06:09 NEA Finish: 13:16 WPK

Run From

356 06:34

06:563⁄4

364 07:51

07:561⁄2 09:021⁄2 09:583⁄4 10:11

331 12:151⁄2

12:221⁄2

To

NEADTF Ety 06:433⁄4

Next

SFD STA

Relieved at 07:431⁄2 SFD

STA WHM E SDJAR SFDTSW

WPK S

SFD

SFD

STA 09:573⁄4 WHM E Ety 10:033⁄4 SDJAR Ety 10:19

WPK N

07:431⁄2

Stand 07:561⁄2

Meal Relief 11:00 to 11:30 at SFD

SFD Stand 12:221⁄2 SFD

SFD

Finish at 13:16

13:09 WPK N STA

Relieved at 13:09

08:541⁄2

RESTRICTED 32

I tested in some code.


pageNumber should start at 0 and end at numberOfPages - 1, because page indices are zero-based.


This is what I included:

            for pageNumber in 0..<numberOfPages {
                if let s = doc?.page(at: pageNumber)?.string {
                    Swift.print(s)
                } else {
                    print("not found")
                }
            }


I tested on a 300-page PDF with no problem.

Appreciated. However, I'm still running out of memory.

At which page do you crash? (Add a print of the page number.)


But the code of scrapePDF is probably not the problem (as it works really OK for me).


How do you call scrapePDF() ?

It crashes at page 324, as that's when memory reaches 2 GB and I get the error. I call the method via a tap gesture on the PDF. I can use a PDF with fewer pages and it doesn't crash, but I can't guarantee the page count, so it needs to work on larger files too.

This worked for me, printing all to the console:


    @objc func scrapePDF(){
         
          let documentURL = Bundle.main.url(forResource: "DocumentationPT", withExtension: "pdf")!
          let doc = PDFDocument(url: documentURL)
          let numberOfPages = doc!.pageCount
          DispatchQueue.global().async {
               Swift.print(numberOfPages)
               for pageNumber in 0..<numberOfPages {
                    if let s = doc?.page(at: pageNumber)?.string {
                         Swift.print(s)
                    } else {
                         print("not found")
                    }
               }
          }
     }
   
    @IBAction func handleTap(recognizer:UITapGestureRecognizer) {
        print("Button Tapped")
        scrapePDF()
    }


The documentation is 338 pages, with a lot of images, so it is a big 50 MB file.


I copied the file into the project bundle.

> have tried autoreleasepool{} no effect

Where exactly did you put that `autoreleasepool` call? Neither the code you’ve posted nor the code in the responses on this thread shows this, and the location is vital. If you’re looping many times calling Objective-C, you must have an autorelease pool inside the loop. And for big operations like this, it’s a good idea to have an autorelease pool at the top of the operation as well (more on this below). So it should look something like this:

DispatchQueue.global().async {
    autoreleasepool {
        let doc = PDFDocument(url: documentURL)!  
        let numberOfPages = doc.pageCount  
        for pageNumber in 0..<numberOfPages {
            autoreleasepool {
                print(doc.page(at: pageNumber)!.string!)  
            }
        }
    }
}

If you still crash after that, you can use Instruments to go hunting for the source of any remaining memory problems.

The reason why you need an autorelease pool at the top level relates to an odd interaction between dispatch queues and threads:

  • An autorelease pool is owned by a thread and, in the absence of nested autorelease pools, that thread cleans up its top-level autorelease pool when it goes idle (after which it typically terminates, but that relationship is complex and something that we don’t need to get into here).

  • A dispatch queue is run by a thread. Dispatch maintains a pool of threads to run its various queues. These threads only clean up their autorelease pool when they go idle. However, these threads tend to stay active as they’re reused to run closures on queues throughout your app.

  • So, if you drop memory into an autorelease pool from the top-level of a closure dispatched to a queue, it can be a while before that memory gets released. This can unnecessarily increase your app’s memory high water mark.

There are two ways around this:

  • Put an autorelease pool at the top of your closure.

  • Use a custom serial queue with the autorelease frequency set (requires iOS 10 or later).

In the snippets above I’ve used the first approach because you’re using the global concurrent queue. However, in a real app you’d want to be doing this work on a custom queue anyway and, once you do that, setting the autorelease frequency is easy.

let queue = DispatchQueue(label: …, autoreleaseFrequency: .workItem)

You should also take that opportunity to set the queue’s QoS.
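
As a rough sketch of the custom-queue approach, assuming a hypothetical queue label, a `.utility` QoS, and illustrative helper names (`extractPageText`, `scrapePDF(at:completion:)`) that are not part of the original code, it might look something like this:

```swift
import Foundation
import PDFKit

// Hypothetical label and QoS; choose values that suit your app.
// `.workItem` gives each dispatched closure its own autorelease pool.
let scrapeQueue = DispatchQueue(
    label: "com.example.pdf-scrape",
    qos: .utility,
    autoreleaseFrequency: .workItem
)

/// Hypothetical helper: collects the text of every page in the document.
func extractPageText(from document: PDFDocument) -> [String] {
    var pages: [String] = []
    // Page indices are zero-based.
    for pageNumber in 0..<document.pageCount {
        // Drain autoreleased Objective-C objects after each page.
        autoreleasepool {
            pages.append(document.page(at: pageNumber)?.string ?? "")
        }
    }
    return pages
}

/// Hypothetical entry point: loads the PDF and extracts its text off the main thread.
func scrapePDF(at documentURL: URL, completion: @escaping ([String]) -> Void) {
    scrapeQueue.async {
        guard let document = PDFDocument(url: documentURL) else {
            completion([])
            return
        }
        completion(extractPageText(from: document))
    }
}
```

The completion handler runs on the scrape queue, so hop back to the main queue before touching any UI. Whether this is enough to fix the memory growth in your case is something to confirm with Instruments.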

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

I ran the test with and without autorelease.


I do not see any difference in this small app (230 MB max memory use).

If I call it repeatedly by tapping many times (24 times), memory grows to over 2 GB and stays at 1.15 GB at the end, even with autorelease.