Why does referencing a class inside a loop cause a memory crash?

I'm looping through all pages in a PDFDocument (300+ pages), but the app crashes with:



> Message from debugger: Terminated due to memory issue



The PDF is approximately 4 MB in size, yet each iteration of the loop increases memory use by approximately 30 MB. That memory is never reclaimed, and before the loop ends I get the crash.



@objc func scrapePDF(){

    let documentURL = self.documentDisplayWebView!.url!
    let document = PDFDocument(url: documentURL)
    let numberOfPages = document!.pageCount

    DispatchQueue.global().async {

        // page(at:) is zero-based, so iterate over 0..<numberOfPages
        for pageNumber in 0..<numberOfPages {

            print(document?.page(at: pageNumber)!.string!)

        }
    }
}





I found that if, rather than referencing the outer `PDFDocument` inside the loop, I create a new instance on each iteration, this strangely solves the memory issue. I don't quite understand why, though. `PDFDocument` is a class, not a struct, so it is passed by reference, meaning it is created only once and then referenced inside my loop. So why would it cause a memory issue?




@objc func scrapePDF(){

    let documentURL = self.documentDisplayWebView!.url!
    let document = PDFDocument(url: documentURL)
    let numberOfPages = document!.pageCount

    DispatchQueue.global().async {

        // page(at:) is zero-based, so iterate over 0..<numberOfPages
        for pageNumber in 0..<numberOfPages {
            // A fresh PDFDocument on every iteration: memory stays flat, but it is slow
            let doc = PDFDocument(url: documentURL)
            print(doc?.page(at: pageNumber)!.string!)

        }
    }
}



Though the above code clears the memory issue, the problem with it is that it's too slow. Each iteration takes 0.5 seconds, and with 300+ pages I can't accept that. Any tips on speeding it up? Or why doesn't it give the memory back when referencing the `PDFDocument` from outside the loop?



I have tried autoreleasepool {}, with no effect.

Replies

I have used your exact code (which was where I had my autorelease pool), but there is no change: it still runs out of memory.


It appears that calling .string is what is using all the memory.

I have used this, but the issue remains.

Can you show the exact and complete code (including tap action)?

In viewDidLoad:


let tapPDF = UITapGestureRecognizer(target: self, action: #selector(self.scrapePDF(gesture:)))
self.documentView?.addGestureRecognizer(tapPDF)


Then as shown.

You defined scrapePDF without an argument

@objc func scrapePDF()


but call it with an argument.

let tapPDF = UITapGestureRecognizer(target: self, action: #selector(self.scrapePDF(gesture:)))


I'm even surprised it compiles.
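
For reference, a minimal sketch of a tap handler whose signature matches the selector it is registered with. The class name `DocumentViewController` and the plain `UIView` standing in for `documentView` are assumptions for illustration, not the poster's actual code:

```swift
import UIKit

// Hypothetical view controller; only the selector/handler pairing matters here.
final class DocumentViewController: UIViewController {
    // Stand-in for the poster's documentView (assumed to be some UIView).
    let documentView = UIView()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.addSubview(documentView)
        // The selector names the `gesture:` label, so the handler must declare it.
        let tapPDF = UITapGestureRecognizer(target: self,
                                            action: #selector(scrapePDF(gesture:)))
        documentView.addGestureRecognizer(tapPDF)
    }

    // Signature matches #selector(scrapePDF(gesture:)).
    @objc func scrapePDF(gesture: UITapGestureRecognizer) {
        // Text extraction would go here.
    }
}
```

If the handler takes no argument, the matching selector would be `#selector(scrapePDF)` instead.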

Could it be a bug? When I investigate in Instruments, it's

[PDFPage string]

that grows and grows in memory size. Rather than giving the memory back, it keeps increasing.

In Instruments, I am seeing that the issue is the .string call on the PDFPage. My thinking as to why it's not crashing for you is that your PDF has images, which don't affect the .string call, whereas my PDF is all text.

Which versions of Xcode and macOS are you using?


Is it Swift or ObjC code? (Your call [PDFPage string] is ObjC, but the code seems to be all Swift.)

Xcode Version 9.2 (9C40b)

Mac OS High Sierra 10.13.2 (17C88)

Swift 4 (the reference to '[PDFPage string]' is how it's shown in Instruments, I guess because it's bridged to NSString)


The error also occurred on the previous beta of both.

Last try: could it be that you have zombies enabled?

No, I checked that. Maybe it is a bug?

You need to provide more context for the bug. In my tries, everything works OK; I do not see any memory growth unless I tap frantically on the image to call scrapePDF.


Did you try to modify:


let tapPDF = UITapGestureRecognizer(target: self, action: #selector(self.scrapePDF(gesture:)))


into


let tapPDF = UITapGestureRecognizer(target: self, action: #selector(self.scrapePDF))


>> [PDFPage string]

>> that grows and grows in memory size


This is imprecise. That tells you memory increased at a [PDFPage string] method call, but it doesn't tell you what consumed the memory. It's probably the returned string, but it's worth finding out.


In this code:


        for pageNumber in 0..<numberOfPages { 
            autoreleasepool { 
                print(doc.page(at: pageNumber)!.string!)   
            } 
        }


the thing that stands out is the "print" call. AFAIK, a "print" is going to log to a shared console log (shared between threads, I mean), which may mean that for thread safety reasons a reference to the string is going to be held by another thread or queue until the logging of the string is completed. If your queue is running at top speed, it may be preventing the logging from completing, which may in turn cause the strings to accumulate until the end of the dispatched loop. Under those circumstances, draining the autorelease pool here would have no effect.


This is just a theory, but I suggest you try doing something with the string other than a print, such as simply assigning it to a variable.


You should also verify that memory doesn't increase if you don't access the "string" property.
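
A minimal sketch of the experiment suggested above: touch each page without calling `print`, and optionally without reading `string` at all, so the two runs can be compared in Instruments. The helper name `totalCharacterCount(in:readString:)` and its `readString` flag are hypothetical, introduced only for this illustration:

```swift
import Foundation
import PDFKit

/// Hypothetical helper: walks every page and, if `readString` is true,
/// reads the page text into a local variable instead of printing it.
/// Returns the total number of characters read.
func totalCharacterCount(in document: PDFDocument, readString: Bool = true) -> Int {
    var total = 0
    // page(at:) is zero-based.
    for index in 0..<document.pageCount {
        autoreleasepool {
            guard let page = document.page(at: index) else { return }
            if readString {
                // Assign to a local rather than logging, to rule out `print`.
                let text = page.string ?? ""
                total += text.count
            }
        }
    }
    return total
}
```

Running it once with `readString: true` and once with `readString: false` on the same document should show whether the growth follows the `string` access or the page lookup itself.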

I have tried removing the print function and the memory still rises to 2 GB. When I remove the .string call, the memory is fine. It's also fine to keep the .string call if I create a new instance of the PDFDocument at every iteration of the loop. The problem with doing so, though, is that it's far too slow. There seems to be something about creating a PDFDocument outside a loop and referencing it inside.
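
One possible middle ground, sketched under the assumption (suggested by the observation above, not confirmed) that the memory is held by the `PDFDocument` instance and released along with it: reopen the document once per batch of pages instead of once per page. The function name `extractText(from:batchSize:handle:)` and the default batch size of 25 are hypothetical:

```swift
import Foundation
import PDFKit

/// Hypothetical helper: reads page text in batches, creating a fresh
/// PDFDocument for each batch so that whatever it retains can be released
/// when the batch ends. `handle` receives the zero-based page index and text.
func extractText(from url: URL,
                 batchSize: Int = 25,
                 handle: (Int, String) -> Void) {
    precondition(batchSize > 0, "batchSize must be positive")
    // Open once only to learn the page count; this instance is then dropped.
    guard let pageCount = PDFDocument(url: url)?.pageCount, pageCount > 0 else { return }

    var start = 0
    while start < pageCount {
        let end = min(start + batchSize, pageCount)
        autoreleasepool {
            // A new document per batch rather than per page.
            guard let document = PDFDocument(url: url) else { return }
            for index in start..<end {
                if let text = document.page(at: index)?.string {
                    handle(index, text)
                }
            }
        }
        start = end
    }
}
```

A larger `batchSize` means fewer reopen operations but a higher memory peak per batch, so the value would need tuning against the actual document.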

Did you try a text-only document, say with an average of 500 characters per page? It has to be down to the .string call, which obviously depends on how many words your PDF has. No, I had not modified the tap gesture signature.