I can confirm that OpenCL_FFT builds fine but consistently fails all of its tests on Mac OS X 10.11.6 with Xcode 8.0, on both a MacBook Pro and a MacBook Air:
Performance Number GFlops achieved for n = (64, 1, 1), batchsize = 8192 (in GFlops/s, higher is better): 39.0753
Test failed (n=(64, 1, 1), batchsize=8192): out-of-place
Test: rel. L2-error = 228.423112 eps (max=301.685696 eps, min=158.383949 eps)
Performance Number GFlops achieved for n = (1024, 1, 1), batchsize = 8192 (in GFlops/s, higher is better): 37.4591
Test failed (n=(1024, 1, 1), batchsize=8192): out-of-place
Test: rel. L2-error = 387.798345 eps (max=412.124569 eps, min=361.660646 eps)
Performance Number GFlops achieved for n = (1048576, 1, 1), batchsize = 4 (in GFlops/s, higher is better): 26.5989
Test failed (n=(1048576, 1, 1), batchsize=4): out-of-place
Test: rel. L2-error = 888.844705 eps (max=889.142585 eps, min=888.568028 eps)
Performance Number GFlops achieved for n = (1024, 512, 1), batchsize = 8 (in GFlops/s, higher is better): 31.8915
Test failed (n=(1024, 512, 1), batchsize=8): out-of-place
Test: rel. L2-error = 691.362257 eps (max=692.112863 eps, min=690.540764 eps)
Performance Number GFlops achieved for n = (128, 128, 128), batchsize = 1 (in GFlops/s, higher is better): 32.1126
Test failed (n=(128, 128, 128), batchsize=1): out-of-place
Test: rel. L2-error = 592.976522 eps (max=592.976522 eps, min=592.976522 eps)
Performance Number GFlops achieved for n = (16384, 1, 1), batchsize = 4 (in GFlops/s, higher is better): 3.95649
Test failed (n=(16384, 1, 1), batchsize=4): in-place
Test: rel. L2-error = 567.284147 eps (max=569.101265 eps, min=565.834936 eps)
Performance Number GFlops achieved for n = (32, 2048, 1), batchsize = 8 (in GFlops/s, higher is better): 8.28694
Test failed (n=(32, 2048, 1), batchsize=8): in-place
Test: rel. L2-error = 549.607006 eps (max=551.284672 eps, min=548.424223 eps)
Performance Number GFlops achieved for n = (4096, 64, 1), batchsize = 4 (in GFlops/s, higher is better): 7.81609
Test failed (n=(4096, 64, 1), batchsize=4): in-place
Test: rel. L2-error = 682.976938 eps (max=683.220153 eps, min=682.730587 eps)
Performance Number GFlops achieved for n = (64, 32, 16), batchsize = 1 (in GFlops/s, higher is better): 2.45741
Test failed (n=(64, 32, 16), batchsize=1): out-of-place
Test: rel. L2-error = 380.520545 eps (max=380.520545 eps, min=380.520545 eps)
Program ended with exit code: 0
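To put the "eps" figures above in context, here is a minimal sketch of how such a number is conventionally computed. It assumes the test reports the relative L2 error against a reference transform, divided by single-precision machine epsilon; that the sample computes exactly this has not been verified here. The function name and the example data are hypothetical.

```python
import math

# Assumption: errors are expressed in units of single-precision epsilon (2^-23).
FLT_EPSILON = 2.0 ** -23


def rel_l2_error_in_eps(result, reference, eps=FLT_EPSILON):
    """Return ||result - reference||_2 / ||reference||_2, in units of eps."""
    num = sum(abs(r - t) ** 2 for r, t in zip(result, reference))
    den = sum(abs(t) ** 2 for t in reference)
    return math.sqrt(num / den) / eps


if __name__ == "__main__":
    # Hypothetical data: every reference value perturbed by one relative eps.
    reference = [1 + 0j, 2 + 0j]
    result = [t * (1 + FLT_EPSILON) for t in reference]
    print(rel_l2_error_in_eps(result, reference))
```

Under that reading, a result that is off by roughly one unit of rounding error per element scores about 1 eps, so values in the hundreds, as in the log, are well beyond ordinary rounding noise for these sizes.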
I tried to play with buffer limits, to no avail.
This might help in figuring out what is wrong:
Number of OpenCL platforms: 1
-------------------------
Platform: Apple
Vendor: Apple
Version: OpenCL 1.2 (Jun 30 2016 20:18:53)
Number of devices: 3
-------------------------
Name: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
Version: OpenCL C 1.2
Max. Compute Units: 8
Local Memory Size: 32 KB
Global Memory Size: 16384 MB
Max Alloc Size: 4096 MB
Max Work-group Total Size: 1024
Max Work-group Dims: ( 1024 1 1 )
-------------------------
-------------------------
Name: HD Graphics 4000
Version: OpenCL C 1.2
Max. Compute Units: 16
Local Memory Size: 64 KB
Global Memory Size: 1536 MB
Max Alloc Size: 384 MB
Max Work-group Total Size: 512
Max Work-group Dims: ( 512 512 512 )
-------------------------
-------------------------
Name: GeForce GT 650M
Version: OpenCL C 1.2
Max. Compute Units: 2
Local Memory Size: 48 KB
Global Memory Size: 1024 MB
Max Alloc Size: 256 MB
Max Work-group Total Size: 1024
Max Work-group Dims: ( 1024 1024 64 )
-------------------------
-------------------------
-------------------------
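As a rough cross-check of the buffer-limit question, the test sizes from the log can be compared with the "Max Alloc Size" values listed above. This is only a sketch: it assumes single-precision interleaved complex data (8 bytes per element), which is an assumption rather than something the log states, and the helper names are hypothetical.

```python
# Assumption: one complex sample = two 32-bit floats = 8 bytes.
BYTES_PER_COMPLEX_FLOAT = 8

# (n, batchsize) pairs copied from the test log.
TESTS = [
    ((64, 1, 1), 8192),
    ((1024, 1, 1), 8192),
    ((1048576, 1, 1), 4),
    ((1024, 512, 1), 8),
    ((128, 128, 128), 1),
    ((16384, 1, 1), 4),
    ((32, 2048, 1), 8),
    ((4096, 64, 1), 4),
    ((64, 32, 16), 1),
]

# "Max Alloc Size" in MB, copied from the device listing.
MAX_ALLOC_MB = {
    "Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz": 4096,
    "HD Graphics 4000": 384,
    "GeForce GT 650M": 256,
}


def buffer_mb(n, batch):
    """Size in MB of one data buffer for a batched transform of shape n."""
    nx, ny, nz = n
    return nx * ny * nz * batch * BYTES_PER_COMPLEX_FLOAT / 2 ** 20


def largest_buffer_mb(tests):
    """Largest single-buffer size in MB over all listed tests."""
    return max(buffer_mb(n, batch) for n, batch in tests)


if __name__ == "__main__":
    biggest = largest_buffer_mb(TESTS)
    for device, limit in MAX_ALLOC_MB.items():
        verdict = "fits" if biggest <= limit else "exceeds limit"
        print(f"{device}: largest buffer {biggest:.2f} MB vs {limit} MB ({verdict})")
```

Under that assumption the largest single buffer is 64 MB (n = (1024, 1, 1), batchsize = 8192), which is below even the smallest listed limit of 256 MB. That is consistent with changes to the buffer limits making no difference.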
It would be nice if Apple stepped up to this and fixed their own demo code.