Apple OpenCL FFT: Test failed

Hello All,


I don't know if this is the right place to ask...


In my project (OS X) I have to execute a lot of 1-dimensional radix-2 FFTs, up to 60 times per second, and they can become quite large: FFT sizes of 512k points or more. For now I'm using the Accelerate framework, but I'm looking for ways to do this more efficiently and faster.


Naturally I'm looking at OpenCL and the FFT example Apple has provided here: https://developer.apple.com/library/mac/samplecode/OpenCL_FFT/Introduction/Intro.html


When I run this example on my MacBook Pro (15", 2014) using the Iris Pro GPU, the program reports "Test failed" for every test run, which means that the result of the OpenCL FFT differs from the result of the Accelerate FFT.


If I run the same test on my 27" iMac, which has a GeForce GTX 680MX GPU, all test runs pass.


Is there any explanation for this?


I'm running OS X 10.10.5.


Thanks for your help!

Replies

There are lots of possibilities and lots of questions.

-Are the OS versions the same on your iMac and MBP?

-Are you using the Nvidia web driver by chance?

-Are you running the sample code as is?

-What is the buffer limit for the Iris Pro? You could be creating a buffer too large for that GPU. OpenCL limits the size of a single allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE), and that limit depends on the GPU's memory.

-Lastly, OpenCL kernels are compiled separately for each device, which can cause strange issues between vendors. There might be something amiss with the kernel that causes garbage output on Intel.

Hello,


Thanks for replying.


- I'm using the same OS version on both machines.

- I did not download or install any third-party drivers (Nvidia or Intel).

- I use the sample code as is, with one small change: because the macro MAX was already defined, I made its definition conditional (#ifndef MAX...).

- Where do I look up the buffer limit? The Iris Pro has 1536 MB of VRAM, but I doubt that is the reason, because the test fails even for very small FFTs (length 64).

- Regarding your last point, it seems there is nothing I can do.

Thank you!

I can confirm that OpenCL_FFT builds fine but consistently fails all of its tests on OS X 10.11.6 with Xcode 8.0, on both a MacBook Pro and a MacBook Air:

Performance Number GFlops achieved for n = (64, 1, 1), batchsize = 8192 (in GFlops/s, higher is better): 39.0753 
Test failed (n=(64, 1, 1), batchsize=8192): out-of-place 
Test: rel. L2-error = 228.423112 eps (max=301.685696 eps, min=158.383949 eps)

Performance Number GFlops achieved for n = (1024, 1, 1), batchsize = 8192 (in GFlops/s, higher is better): 37.4591 
Test failed (n=(1024, 1, 1), batchsize=8192): out-of-place 
Test: rel. L2-error = 387.798345 eps (max=412.124569 eps, min=361.660646 eps) 

Performance Number GFlops achieved for n = (1048576, 1, 1), batchsize = 4 (in GFlops/s, higher is better): 26.5989 
Test failed (n=(1048576, 1, 1), batchsize=4): out-of-place 
Test: rel. L2-error = 888.844705 eps (max=889.142585 eps, min=888.568028 eps) 

Performance Number GFlops achieved for n = (1024, 512, 1), batchsize = 8 (in GFlops/s, higher is better): 31.8915 
Test failed (n=(1024, 512, 1), batchsize=8): out-of-place 
Test: rel. L2-error = 691.362257 eps (max=692.112863 eps, min=690.540764 eps) 

Performance Number GFlops achieved for n = (128, 128, 128), batchsize = 1 (in GFlops/s, higher is better): 32.1126 
Test failed (n=(128, 128, 128), batchsize=1): out-of-place 
Test: rel. L2-error = 592.976522 eps (max=592.976522 eps, min=592.976522 eps) 

Performance Number GFlops achieved for n = (16384, 1, 1), batchsize = 4 (in GFlops/s, higher is better): 3.95649 
Test failed (n=(16384, 1, 1), batchsize=4): in-place 
Test: rel. L2-error = 567.284147 eps (max=569.101265 eps, min=565.834936 eps) 

Performance Number GFlops achieved for n = (32, 2048, 1), batchsize = 8 (in GFlops/s, higher is better): 8.28694 
Test failed (n=(32, 2048, 1), batchsize=8): in-place 
Test: rel. L2-error = 549.607006 eps (max=551.284672 eps, min=548.424223 eps) 

Performance Number GFlops achieved for n = (4096, 64, 1), batchsize = 4 (in GFlops/s, higher is better): 7.81609 
Test failed (n=(4096, 64, 1), batchsize=4): in-place 
Test: rel. L2-error = 682.976938 eps (max=683.220153 eps, min=682.730587 eps) 

Performance Number GFlops achieved for n = (64, 32, 16), batchsize = 1 (in GFlops/s, higher is better): 2.45741 
Test failed (n=(64, 32, 16), batchsize=1): out-of-place 
Test: rel. L2-error = 380.520545 eps (max=380.520545 eps, min=380.520545 eps)
Program ended with exit code: 0


I tried adjusting the buffer limits, to no avail.


This might help in figuring out what is wrong:


Number of OpenCL platforms: 1 

------------------------- 
Platform: Apple Vendor: Apple Version: 
OpenCL 1.2 (Jun 30 2016 20:18:53) 
Number of devices: 3
        ------------------------- 
                   Name: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
                   Version: OpenCL C 1.2
                   Max. Compute Units: 8 
                   Local Memory Size: 32 KB 
                   Global Memory Size: 16384 MB  
                   Max Alloc Size: 4096 MB 
                   Max Work-group Total Size: 1024 
                   Max Work-group Dims: ( 1024 1 1 )
        -------------------------
        ------------------------- 
                   Name: HD Graphics 4000
                   Version: OpenCL C 1.2 
                   Max. Compute Units: 16 
                   Local Memory Size: 64 KB 
                   Global Memory Size: 1536 MB 
                   Max Alloc Size: 384 MB 
                   Max Work-group Total Size: 512 
                   Max Work-group Dims: ( 512 512 512 ) 
        -------------------------
        ------------------------- 
                  Name: GeForce GT 650M 
                  Version: OpenCL C 1.2 
                  Max. Compute Units: 2 
                  Local Memory Size: 48 KB 
                  Global Memory Size: 1024 MB 
                  Max Alloc Size: 256 MB 
                  Max Work-group Total Size: 1024 
                  Max Work-group Dims: ( 1024 1024 64 )  
        -------------------------
        ------------------------- 

------------------------- 


It would be nice if Apple stepped up to this and fixed their own demo code.

For some reason the Apple forum would not accept my previous post with the details. In short: the problem is likely in the parameters that OpenCL_FFT uses for the GPU configuration.