11 Replies
      Latest reply on Dec 9, 2019 8:23 AM by fohx
      LxLasso Level 1 (0 points)

        Hi,

         

        We develop professional creative software that uses some fairly non-trivial Metal shaders. Our user base is in the thousands: not massive, but decent. It's expensive software for a niche industry, it is a vital tool for our users, and we have been making a living from it for many years.

         

        We find a lot of Metal driver issues that cause all sorts of system crashes. We submit bug reports for everything we find.

         

        The problem is that the feedback isn't good enough. We have submitted around 20 bugs now, give or take, some of which are complete showstoppers. As things stand, we would have to actively tell our users that they will not be able to run our next release on macOS, because the chance that they have a GPU whose driver doesn't crash their system is just too small.

         

        It would be a lot easier to deal with this if the feedback on bug reports was better. Has anyone looked at the problem? Is it solvable? In the pipeline? Anything.

         

        I suppose the dev team is up to their ears in driver issues, and the best thing they can do is just dig in; any time spent on developer relations is time taken from actually fixing issues.

         

        We're out of ideas and the future of our next product release for macOS is in jeopardy. Are we the only ones in this situation? Does anyone have any solutions, or is this simply what it is?

         

        / Lars @ Capture

        • Re: Developer relations?
          KMT Level 9 (15,215 points)

          How many of your issues have you raised directly with DTS (Developer Technical Support)?

            • Re: Developer relations?
              LxLasso Level 1 (0 points)

              Hi KMT,

               

              Are you referring to code-level support? We have used code-level support to get help with issues in our own code and it has been helpful, but we haven't used it to report Metal driver bugs. That would get pretty expensive, to be honest.

                • Re: Developer relations?
                  KMT Level 9 (15,215 points)

                  My mistake, sorry, I thought your main problem was risk to your users.

                   

                  As a rule, I think feedback/bug reports only turn into a dialog if the engineers need more info. Otherwise, status changes are normally communicated when they occur. Thanks for helping test.

                    • Re: Developer relations?
                      LxLasso Level 1 (0 points)

                      Testing is no fun at all when a bug report about a driver that kernel panics on recent hardware gets no attention for months. Delivering a professional product based on Metal is a complete nightmare.

                        • Re: Developer relations?
                          sbrodhead Level 1 (10 points)

                          macOS seems to be the "Ugly Duckling" at Apple (remember the Hans Christian Andersen fairy tale).

                           

                          I have logged almost 50 bugs in the last 10 years -- almost none of them have ever been closed.

                           

                          To make it worse, you occasionally get feedback on some bugs, so you know somebody at Apple is looking at them.

                          But you are never given any feedback about "if" or "when" the bugs will be fixed.

                           

                          Since I have learned not to expect Apple to fix these bugs, it's time to look for workarounds.

                           

                          The three GPU vendors on macOS (AMD, Intel, and Nvidia) all behave differently and have different sets of bugs.

                           

                          Intel's drivers for OpenCL/OpenGL and Metal all work well, but its runtime compiler has big problems with large, complicated shaders.

                           

                          For my app, AMD's Metal shader compiler has recently had serious problems, but AMD's OpenCL compiler and drivers work very well.

                          (My output looks different when rendered with Metal than with OpenCL. AMD has told me they think this is caused by bugs in the shader compiler's optimizer. It affects all apps; perhaps you are having the same problem.)

                            • Re: Developer relations?
                              sbrodhead Level 1 (10 points)

                              Also, I believe the GPU vendors build and maintain the Metal drivers for Apple.

                              Since Apple is not responsive about Metal developer support, you may want to approach the GPU suppliers (AMD, Intel, Nvidia) directly.

                            • Re: Developer relations?
                              sbrodhead Level 1 (10 points)

                              Another important factor is that your kernels are not necessarily executed in the order they are submitted. This is especially true if you are queuing kernels from multiple threads, because the order in which threads execute is, in general, not predictable.

                               

                              So you need to add explicit synchronization to your kernel workflows to ensure the execution order is what you expect.

                               

                              Modern GPUs can process multiple shader instances in parallel, so skipping synchronization can lead to big surprises.

                               

                              However, improper synchronization can in theory lead to deadlock, which would most likely cause a GPU panic or even a system kernel panic.

                               

                              Check your logs for signs of a GPU reset caused by an internal GPU panic.
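                              A small CPU-side sketch of the ordering problem may help. It is a Python analogue, not Metal code: each hypothetical "kernel" is submitted from its own thread, and an event counter (loosely modeled on how Metal's MTLSharedEvent lets one command buffer wait for a value that another signals) forces the stages to run in a fixed order regardless of thread scheduling. The names TimelineEvent and run_stages, and the stage names, are illustrative assumptions.

```python
import random
import threading
import time


class TimelineEvent:
    """Hypothetical CPU stand-in for a GPU timeline event: a counter that
    waiters block on until it reaches a given value."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            self._value = max(self._value, value)
            self._cond.notify_all()

    def wait(self, value):
        with self._cond:
            # Blocks forever if no one ever signals `value` (i.e. a deadlock).
            self._cond.wait_for(lambda: self._value >= value)


def run_stages(stage_names, use_sync):
    """Submit one simulated kernel per thread; return the order they ran in."""
    order = []
    order_lock = threading.Lock()
    event = TimelineEvent()

    def kernel(index, name):
        if use_sync:
            event.wait(index)  # wait until stage index-1 has signalled
        time.sleep(random.uniform(0, 0.01))  # simulate variable scheduling/work
        with order_lock:
            order.append(name)
        if use_sync:
            event.signal(index + 1)  # release the next stage

    threads = [threading.Thread(target=kernel, args=(i, name))
               for i, name in enumerate(stage_names)]
    # Start the threads in reverse to show that submission order doesn't matter.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return order


stages = ["blur", "threshold", "composite"]
print(run_stages(stages, use_sync=True))   # ['blur', 'threshold', 'composite']
print(run_stages(stages, use_sync=False))  # order varies from run to run
```

                              Without the waits (use_sync=False), the stages finish in whatever order the threads happen to run. If a stage waits for a value that no other stage ever signals, wait() blocks forever, which is the CPU equivalent of the deadlock risk mentioned in the post.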

                      • Re: Developer relations?
                        sbrodhead Level 1 (10 points)

                        Hi, my app, Fractal Architect, has been a showcase for the immense power of Metal/OpenCL/CUDA GPU compute shaders on macOS for the last 10 years, and the app's render engine is in use on Windows as well (OpenCL/CUDA). The app is now in beta on iOS/iPadOS with Metal as well.

                         

                        So I know exactly what you are going through.

                         

                        Apple's infamous "Wall of Secrecy" means that I have never had an internal contact inside of Apple.

                        I have been contacted directly by AMD, but their hands are tied by their working relationship with Apple.

                         

                        Oddly enough, my testing partner for Fractal Architect is Lennart Ostman from Harnosand, Sweden. It is a small, small world.

                        I am in the USA. I see that your company is based in Sweden.

                         

                        Observations:

                        The pattern on Mac OS is that Metal/OpenCL is stable for 2-3 years, followed by 6-9 months of driver ****. This pattern has repeated several times.

                         

                        Metal is far better supported on iOS/iPadOS. My app's large and very complex compute shaders ported easily to iOS 12/13.

                        I was really surprised by how robust Metal has been on iOS.

                         

                        Frankly, Catalina has been extremely buggy. For my app, though, Metal on Catalina has not been problematic, although I did have to work around a couple of serious Catalina OpenCL/Metal bugs.

                         

                        My app also uses classic vertex and fragment shaders on both Metal and OpenGL. I am using them for both 2D and 3D model visualization.

                         

                        We should make contact off this forum. We might be able to help each other out.

                          • Re: Developer relations?
                            LxLasso Level 1 (0 points)

                            Indeed, we are based in Sweden!

                             

                            Before we switched to Metal we had a fair number of issues with OpenGL as well, but I don't think we were ever able to kernel panic the entire OS; problems usually just resulted in glitches or undefined behaviour.

                             

                            I'll drop you a mail off-forum!

                          • Re: Developer relations?
                            funnest Apple Staff (145 points)

                            To the OP, can you post the IDs of the showstopper issues?

                              • Re: Developer relations?
                                fohx Level 1 (0 points)

                                Hi,

                                 

                                Mathias here, graphics programmer at Capture.

                                 

                                FB7432403 is our most important showstopper right now: internal errors on newer Intel GPUs, for which we don't have any workaround.

                                 

                                FB7466370: driver hangs/crashes, or even kernel panics, on AMD hardware. (I have a sketchy workaround, but we keep triggering the bug again when adding new features.)

                                 

                                FB6101284: internal error on GeForce GPUs, identified by a customer. (I haven't heard anything since I reported it in May.)

                                 

                                Also, FB6344520 is not a complete showstopper, but it will prevent us from launching a new feature on AMD hardware in the near future unless I can find a workaround. (And while checking whether it had been fixed yet, I found a similar issue on Intel GPUs that I need to report.)