Submitting failing kernels without host code?

Question for the Apple staff: given the defects of the Metal compiler on AMD GPUs, can we submit compute kernels directly through the bug report site, without the host code? These are kernels that work on Nvidia and should work on AMD, but don't.


It is really difficult for us to isolate these kernels and their host code from our main project. If kernels alone are useful to you and could help improve the compiler, that is also simpler for us and would allow us to send you more data.

Accepted Reply

Yes. When in doubt, always file a Radar.

Replies

One other option is to use Xcode to take a frame capture from a system that does work, and send that our way.

GPU frame capture and the Metal instrument are not available on OS X, or did I miss something?

GPU frame capture of Metal apps is fully supported for OSX.


Cheers,

Seth.

We are talking about pure Metal compute apps here. There is no frame...

Moreover, the "Capture GPU frame" item of the debug menu remains greyed out.


Do you maintain that we can use the capture tool with a pure compute app?

And what about the Metal instrument?


----- EDIT ----


OK... Found it 🙂... If your application is pure compute with no drawable, use insertDebugCaptureBoundary to inject artificial frame boundaries.


This leads me to the next question:


We had to disable the "produce debugging information" setting under "Metal Compiler - Build options" in our project because of this error:


MetalLink MY_APP/Contents/Resources/default.metallib
    cd "MY_APP_PATH"
    /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/usr/bin/metallib -o MY_APP/Contents/Resources/default.metallib /Users/.../Build/Intermediates/MY_APP.build/Debug/MY_APP.build/Metal/default.metal-ar
Command /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/usr/bin/metallib failed with exit code 11


Any clue?

Usually metallib generates an error message when something is wrong, but here it seems to just return 11. Please file a Radar with the default.metal-ar or the shader and we can investigate why it is failing.

Thanks wmp...

Since it is difficult to say which shader in our project is the source of this error, could we just attach the default.metal-ar of our entire app to the Radar?

I am also seeing the same error: "metallib failed with exit code 11"


This is a large Metal shader which does fancy ray tracing. It works under iOS but fails on the Mac.


This makes it pretty much impossible to solve the issue, because I can't reason about why it is failing.

If a particular function is called, the linker fails. If it is stubbed out, it works. But beyond that, I am stuck.


I submitted a Radar (22700700) last week.


Just retried this in Xcode 7.1 beta 2. No change.

Just tried Xcode 7.1 Beta 3


Still seeing...


Command /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/usr/bin/metallib failed with exit code 11


When building on the Mac.

Meticulously commenting out the shader line by line gets me down to the most innocuous line.


Are there any workarounds for this EXIT CODE 11 issue? My project has had to be shelved until this is resolved.



float2 castRay(float3 ro, float3 rd )
{
    float2 res = float2( 0.1 , 0.1);
    float mat = -1.00;
    float tmin = 1.0;
    float tmax = 20.0;
   
   
    float precis = 0.002;
    float t = tmin;
    for( int i = 0 ; i < 50 ; i++ )
    {
        res = float2(0.1,0.1);
        if ( res.x < precis || t > tmax ) break;       
        mat = res.y;      // <-- If this line is commented out, the shader links.
        t += res.x;
    }
    
   
    if ( t > tmax ) mat = -1.0;
    return float2( t, mat );
}

Please remember that there are three GPU families on Mac OS X, each with its own unique driver bugs.


Your kernels might work on one device and fail on others. I have seen kernels that crash on 2011 AMD GPUs but work fine on 2013 and later AMD GPUs.

I have found that 2011 AMD GPUs can't handle much thread divergence.


Recommendation:

1) Replace the float2 res variable with two scalars, say float resx and float resy. (Intermixing scalar and vector operations might trigger a compiler bug. This is just a hunch.)

2) Your for loop with the break statement is going to cause extreme thread divergence.

Remember that in a thread block typically 16 hardware threads will execute this kernel in lock step fashion.

Rewrite this loop to remove the break like so:


if ( res.x >= precis && t <= tmax ) {
    mat = res.y;
    t += res.x;
}


Now the loop has no early exit, so there should be no more thread divergence from the break. I believe the break statement was the source of your problem.


Also, the check for res.x < precis will ALWAYS be false. Look at your code: both are constant values.

It looks like the res.x variable can be discarded. Just use the constant value in your code.
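
For reference, here is a minimal sketch of the break-free loop in plain C++ (not Metal), put next to the original shape so the two can be compared. It assumes a stand-in float2 struct and keeps the constant 0.1 placeholder from the posted shader. The names castRayBreak and castRayGuarded are made up for illustration. The guard is simply the negation of the break test, so both versions should give the same result as long as res depends only on t:

```cpp
// Stand-in for Metal's float2 so the sketch builds as plain C++ (assumption).
struct float2 { float x, y; };

// Original shape of the loop: early exit via break.
float2 castRayBreak(float tmin, float tmax, float precis)
{
    float mat = -1.0f;
    float t = tmin;
    for (int i = 0; i < 50; i++)
    {
        float2 res = {0.1f, 0.1f};   // placeholder, as in the posted shader
        if (res.x < precis || t > tmax) break;
        mat = res.y;
        t += res.x;
    }
    if (t > tmax) mat = -1.0f;
    return {t, mat};
}

// Break-free shape: the guard is the negation of the exit test above,
// i.e. !(res.x < precis || t > tmax) == (res.x >= precis && t <= tmax).
float2 castRayGuarded(float tmin, float tmax, float precis)
{
    float mat = -1.0f;
    float t = tmin;
    for (int i = 0; i < 50; i++)
    {
        float2 res = {0.1f, 0.1f};   // placeholder, as in the posted shader
        if (res.x >= precis && t <= tmax)
        {
            mat = res.y;
            t += res.x;
        }
    }
    if (t > tmax) mat = -1.0f;
    return {t, mat};
}
```

With the posted constants (tmin 1.0, tmax 20.0, precis 0.002) both functions step t from 1.0 to roughly 6.0 over the 50 iterations. Whether this form actually avoids the metallib exit code 11 is not something this sketch can confirm.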