Subsampling, const, uniform, arrays: what am I missing?

I'm trying to use subsampling to read the actual pixels in an area (not their average), but so far I keep running into performance problems.


My first attempt was to create a kernel from a string and inject an array, declared as either "const" or "uniform", so that the array wouldn't have to be built for each pixel that gets passed to the kernel.


Neither const nor uniform seems to work for an array, and the performance was terrible. My guess is that the array is being rebuilt for each pixel.


1425 x 948 image; ~200 samples per pixel, took 3 seconds.


My second attempt was to create an image containing the location data. After days I had it working, and while it's faster than the method above, it's still too slow. That's probably because for each pixel sampled from the image it also has to sample the map image, so 200 samples actually incur 400 texture reads.


1425 x 948 image; ~200 samples per pixel, took 0.2 seconds.
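To put a number on the doubled reads, here is the arithmetic as a small C helper. It is only a back-of-the-envelope sketch: the function name is made up, and it assumes every sample costs exactly one read of the source image plus one read of the map image, ignoring caching.

```c
#include <stdint.h>

/* Total texture reads for one pass over the image.
   reads_per_sample is 1 when the sample position is known directly and
   2 when each sample first looks its position up in a separate map image.
   (Hypothetical helper, not part of any Core Image API.) */
int64_t total_texture_reads(int64_t width, int64_t height,
                            int64_t samples_per_pixel,
                            int64_t reads_per_sample)
{
    return width * height * samples_per_pixel * reads_per_sample;
}
```

For the 1425 x 948 image with 200 samples per pixel, this gives 270,180,000 reads without the map image and 540,360,000 with it.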


So what am I missing?


Is there a way to create a vec2/float2 array once, rather than for every single pixel? Or is there something else I'm missing that would speed this whole thing up?
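To make "create the array once" concrete, this is roughly the structure I'm after, written as plain CPU-side C rather than kernel code. It is a sketch under assumptions: `Offset2`, `build_disc_offsets` and `max_sample_at` are hypothetical names, the disc-shaped pattern is just an example, and taking the maximum is a stand-in for whatever the real per-pixel operation is. Whether the kernel language allows an equivalent is exactly the open question.

```c
#include <stdlib.h>

/* Hypothetical 2D offset type, standing in for vec2/float2. */
typedef struct { float x, y; } Offset2;

/* Build the offset table ONCE: every integer offset inside a disc of the
   given radius. Returns a malloc'd array (caller frees) and stores the
   number of entries in *count. Returns NULL if allocation fails. */
Offset2 *build_disc_offsets(int radius, int *count)
{
    int side = 2 * radius + 1;
    Offset2 *table = malloc((size_t)side * (size_t)side * sizeof *table);
    int n = 0;

    *count = 0;
    if (table == NULL)
        return NULL;

    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dx * dx + dy * dy <= radius * radius) {
                table[n].x = (float)dx;
                table[n].y = (float)dy;
                ++n;
            }
        }
    }
    *count = n;
    return table;
}

/* Per-pixel work: only READS the shared table, never rebuilds it.
   Returns the largest sample value found around (px, py); offsets that
   fall outside the image are skipped. Assumes (px, py) is inside the image. */
float max_sample_at(const float *image, int width, int height,
                    int px, int py, const Offset2 *offsets, int count)
{
    float best = image[py * width + px];

    for (int i = 0; i < count; ++i) {
        int x = px + (int)offsets[i].x;
        int y = py + (int)offsets[i].y;
        if (x < 0 || y < 0 || x >= width || y >= height)
            continue;
        if (image[y * width + x] > best)
            best = image[y * width + x];
    }
    return best;
}
```

The table is built once before the loop over pixels, and every call to `max_sample_at` reuses it, which is the behaviour I hoped "const" or "uniform" would give me.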


A pre-emptive thank you for any tips or suggestions on this; it's been mighty painful trying to optimize this kernel.


Oh, before I forget: I tried the separable route and boy does that make it fast! It took 0.064 seconds, but because of the nature of a separable kernel, the results are the worst of the three.
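One possible reason for the quality loss, offered as a guess rather than a diagnosis: a horizontal pass followed by a vertical pass can only reproduce a 2D sampling pattern whose weights factor into "a column times a row". The C sketch below checks that property for a small grid of integer weights; `is_separable` is a hypothetical helper, not anything from Core Image.

```c
#include <stdbool.h>

/* Returns true if the n x n grid of integer weights (row-major) can be
   written as an outer product of one column and one row, which is what a
   two-pass (separable) filter can reproduce exactly. The test used is that
   every 2x2 minor of the grid is zero, i.e. the grid has rank at most 1. */
bool is_separable(const int *k, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int p = i + 1; p < n; ++p) {
            for (int j = 0; j < n; ++j) {
                for (int q = j + 1; q < n; ++q) {
                    if (k[i * n + j] * k[p * n + q] !=
                        k[i * n + q] * k[p * n + j])
                        return false;
                }
            }
        }
    }
    return true;
}
```

A square box of equal weights passes this check, while a plus-shaped or disc-shaped pattern does not, so an irregular set of sample locations generally cannot be split into two 1D passes without changing the result.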

Replies

As a follow-up.


I created two kernels yesterday for a very simple blur with a radius of 10. One kernel used two nested loops, and t'other used predefined locations (it had 440 procedurally generated lines of "avgPixel = sample( image, samplerTransform( destCoord() + vec( -10, -10 ) ) );").


The double loop was actually quicker, so I assume that a heavy kernel (one with a lot of source code) also drags down performance.
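For anyone following along, the double-loop variant has roughly this shape. This is a CPU-side C sketch and not the actual kernel: the function names are hypothetical, the image is a single-channel float array, and edge pixels are handled by clamping coordinates, which is an assumption on my part.

```c
/* Number of taps in a square window of the given radius. */
int box_blur_taps(int radius)
{
    int side = 2 * radius + 1;
    return side * side;
}

/* Average of the (2*radius+1) x (2*radius+1) window centred on (px, py),
   computed with two nested loops. Coordinates that fall outside the image
   are clamped to the nearest edge pixel. */
float box_blur_at(const float *image, int width, int height,
                  int px, int py, int radius)
{
    float sum = 0.0f;
    int taps = 0;

    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int x = px + dx;
            int y = py + dy;
            if (x < 0) x = 0;
            if (y < 0) y = 0;
            if (x >= width) x = width - 1;
            if (y >= height) y = height - 1;
            sum += image[y * width + x];
            ++taps;
        }
    }
    return sum / (float)taps;
}
```

With a radius of 10 the window is 21 x 21, so sampling every position means 441 taps, close to the 440 generated lines in the unrolled kernel. The loop expresses the same work in a handful of lines.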