Bizarre issue: when my kernel runs on the GPU and I ask it to process the full-size image (anywhere from 5 to 20 megapixels), only the bottom half of the resulting image has been processed by the kernel.
When it runs on the CPU, it processes the complete image, just about 10 times slower.
Has anyone run into this? Any suggestions on how to solve it? Obviously I want to use the GPU, since it's about 10x faster than the CPU.