Determine maximum tile size, think it's related to GPU crashes

Is there a way to determine the maximum size of a tile in a Core Image filter?


I am still getting GPU crashes (see the crash report below), which result in either corrupted output and an app that cannot quit (it has to be force quit) or, as I started seeing yesterday, complete app crashes.


Right now, I am working on the theory that I am exceeding the maximum tile size, and I wondered whether there is a way to quantify this.
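
For what it's worth, the only documented limits I know how to query are the context's maximum input and output sizes. A minimal sketch (whether these actually correspond to the internal tile size is exactly what I'm unsure about):

import CoreImage
import Metal

// Minimal sketch: query the documented size limits of a CIContext.
// These report the largest input/output image extents the context
// accepts; whether they relate to the internal tile size is unclear.
if let device = MTLCreateSystemDefaultDevice() {
    let context = CIContext(mtlDevice: device)
    print("max input size:  \(context.inputImageMaximumSize())")
    print("max output size: \(context.outputImageMaximumSize())")
}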


My problematic filter (using intermediate renders and logging, I am able to narrow down exactly which filter is crashing) reads the surrounding pixels to do some "local" analysis in order to update the central pixel. I've done my best to keep the radius low, but with some images a small radius produces unnatural results, especially when I test with 50 or 100 megapixel images.
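
To give a sense of the kernel's shape, here is a simplified sketch in the Core Image Kernel Language. It is illustrative only, not the actual filter (the real kernel reads its sample locations from a lookup image rather than scanning a full box):

import CoreImage

// Simplified sketch of a neighbourhood kernel. Illustrative only:
// the real filter samples at precomputed locations, and CIKL loops
// generally need constant bounds so they can be unrolled.
let kernelSource = """
kernel vec4 localAnalysis(sampler image) {
    const float radius = 2.0;
    vec2 dc = destCoord();
    vec4 sum = vec4(0.0);
    float count = 0.0;
    for (float dy = -radius; dy <= radius; dy += 1.0) {
        for (float dx = -radius; dx <= radius; dx += 1.0) {
            sum += sample(image, samplerTransform(image, dc + vec2(dx, dy)));
            count += 1.0;
        }
    }
    return sum / count;
}
"""
let localAnalysisKernel = CIKernel(source: kernelSource)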


I am using PDS (Poisson disk sampling) locations to reduce the number of pixels the kernel needs to sample, and I use an image to store those locations (as I don't know how to share an array of vec2 objects between kernels). My earlier problem was that only a portion of the image would render; that was solved by handling the ROI, but that is when I started to experience the crashes. It no longer crashes for radii of 100 or 200 pixels, but going over that (which is required for larger images) is an almost certain way to bring it down (note that it sometimes doesn't crash).
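
For reference, the ROI handling now looks roughly like this (a sketch; kernel, inputImage, and radius stand in for my actual values):

import CoreImage

// Sketch: apply a kernel with an ROI callback that outsets the
// requested rect by the sampling radius, so Core Image knows how
// much of the input each output tile depends on.
func apply(kernel: CIKernel, to inputImage: CIImage, radius: CGFloat) -> CIImage? {
    return kernel.apply(
        extent: inputImage.extent,
        roiCallback: { _, rect in
            // Each output pixel reads up to `radius` pixels away.
            rect.insetBy(dx: -radius, dy: -radius)
        },
        arguments: [inputImage]
    )
}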


Running on the CPU solves this, but at a cost. The CPU (on the machine I'm developing with) is limited to processing 50 megapixel images. While the GPU claims to cope with larger images, I experience these crashes. Processing a 50 megapixel image on the CPU took 10 minutes to render; the GPU now averages roughly a fifth of that time, but I simply can't trust it.
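
The CPU path is just a software-rendered context, something like:

import CoreImage

// Sketch: force Core Image to render on the CPU. Slow, but it
// sidesteps whatever the GPU is choking on.
let cpuContext = CIContext(options: [.useSoftwareRenderer: true])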


GPU crash report below.

Wed Feb 13 00:02:24 2019

Event:               GPU Reset
Date/Time:           Wed Feb 13 00:02:24 2019
Application:         <appName>
Path:                
OS Version:          Mac OS X 
Graphics Hardware:   NVIDIA GeForce GT 650M
Signature:           8

Report Data:

NVDA(Graphics): Channel exception! Exception type = 0x8 DMA Engine Error (FIFO Error 8)
Channel Info: [56, 0x1e, 0x11, 0x1e51]
Version Info: [com.apple.GeForce, 10.1.0, 0x7d780b0a, 18894120, 310.42.25f02, 1]

Resource Manager Info:
 4443564e 00000118 8fb28137 f8a4de8f 00000001 00000014 d3793533 46d3a4a6
 4614f297 e71edccf 00088301 000000e1 12f2500a 081d0a4d 1002c197 20001810
 30002800 05dc3800 4805dc40 00500392 00601e58 01080d22 808e8010 81042202
 22188091 1001080d 02808a84 9c820422 0e2250a8 84100108 22028180 80a6b805
 030a0880 0a00149a 1f138222 47100008 d2200118 e1b02806 50004804 60015820
 78007064 01019000 0a000198 00138a03 13923d0a 24380a3a 0e000000 01000000
 490000e0 0100000f 49000000 0000000b 47000000 ff000304 ff000000 ff000000
 ff000000 ff000000 ff000000 0a000000 1d13c220 00100008 dc80a818 1e200bf6
 03300828 b5ade038 059cbbac 48028040 00000013 4443564e

Accelerator Event History:
 0a0808001a04080010010a0808001a04080210010a2b0800122708c080021080f09dd1
 86f0ffffff0118a18fc08e8c87800f20b79d8080c0d80328d2bc808090020a23080012
 1f08c480021080f09dd186f0ffffff0118a18fc08e8c87800f208082800828000a0808
 001a04080210000a0808001a0408001000


I am beyond tired at this point, incredibly frustrated, and I really want to quit.


Any help or suggestions you can offer would be greatly appreciated.

Replies

My guess is that this is simply too much work to do on the GPU. Spending 2 minutes in a single shader processing a 50 MP image on a 7-year-old mobile GPU... I think the GPU watchdog won't like that.

Don't you have any means of reducing the complexity of your algorithm? Or maybe you could implement the tiling yourself and give the system some time to breathe between operations.
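
Something along these lines, maybe (a rough sketch; the tile size is arbitrary, and how you stitch the results back together is up to you):

import CoreImage

// Rough sketch of manual tiling: render the filtered image in
// fixed-size tiles so no single GPU submission covers the whole
// extent. Each createCGImage call is a separate render pass,
// which gives the GPU a chance to breathe between tiles.
func renderTiled(image: CIImage, context: CIContext, tileSize: CGFloat = 2048) -> [CGImage] {
    var tiles: [CGImage] = []
    let extent = image.extent
    var y = extent.minY
    while y < extent.maxY {
        var x = extent.minX
        while x < extent.maxX {
            let tileRect = CGRect(x: x, y: y, width: tileSize, height: tileSize)
                .intersection(extent)
            if let cgTile = context.createCGImage(image, from: tileRect) {
                tiles.append(cgTile)
            }
            x += tileSize
        }
        y += tileSize
    }
    return tiles
}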

Thanks Frank,

I've already done some work to reduce the radius, and I have some ideas on how I might reduce it even further.


Just to make it clear: the 50 megapixel image never completed on the GPU; the application died. The 10-minute render was on the CPU.


I've also done some experiments with my intermediate render code and have a newer version ready to try, where I use a Metal device to render directly to a block of memory and then create a CIImage from that memory (thanks for the tip on creating a CIImage from a block of memory, by the way). It's doing these renders in about 40% of the time of my previous approach (CGBitmapContext). However, I've yet to see whether this will have any effect within my application.
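
Roughly what the new intermediate-render path looks like (a sketch; the format and colour-space choices are illustrative, and context is created with CIContext(mtlDevice:)):

import CoreImage

// Sketch of the intermediate render: draw into a raw buffer with a
// Metal-backed context, then wrap that buffer in a fresh CIImage so
// the rest of the filter chain starts from a flattened bitmap.
func intermediateRender(of image: CIImage, context: CIContext) -> CIImage? {
    let extent = image.extent
    let rowBytes = Int(extent.width) * 4          // RGBA8
    var buffer = Data(count: rowBytes * Int(extent.height))
    let colorSpace = CGColorSpaceCreateDeviceRGB()

    buffer.withUnsafeMutableBytes { (ptr: UnsafeMutableRawBufferPointer) in
        context.render(image,
                       toBitmap: ptr.baseAddress!,
                       rowBytes: rowBytes,
                       bounds: extent,
                       format: .RGBA8,
                       colorSpace: colorSpace)
    }
    return CIImage(bitmapData: buffer,
                   bytesPerRow: rowBytes,
                   size: extent.size,
                   format: .RGBA8,
                   colorSpace: colorSpace)
}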


I did toy with the idea of implementing a moving box, but I figured that would probably be slower with something like 50 megapixels.