iOS 10 'Caused GPU Hang Error (IOAF code 3)'

I've got a Metal app running with Xcode 8, latest Base SDK (10.2), and it runs fine on devices with iOS 9.2. However, when running on a device with iOS 10.2.1, the screen draws black and the app logs this error every frame:


Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)

I can't find much about that error and I don't know what I should be looking for. It's particularly tough when it runs without error on iOS 9.

Replies

I've still got this problem. iOS 9 runs just fine, iOS 10 doesn't.


I tried combing through Metal Changes for Swift to find out what might be different, but didn't find anything.


If I remove all my draw calls, I can successfully clear the screen to a color. But if any of my submeshes calls 'drawIndexedPrimitives' I get the errors:


Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)

Execution of the command buffer was aborted due to an error during execution. Discarded (victim of GPU error/recovery) (IOAF code 5)


Any possible avenues for figuring this out would be welcome.

Hello


This looks suspicious to me: "Any calls to drawIndexedPrimitives". This is just a guess, so take it for what it is, but when I read "any calls to drawIndexedPrimitives", my intuition is that you somehow have a problem with loading your data. What I'd do is write an extremely simplified debug shader (say, a pass-through vertex function and a fragment function drawing plain white points), and then use that shader in the call to drawIndexedPrimitives. Then you either get some output, meaning that something is wrong "in between" the simplified shader and the full-blown one, or you still get the error, meaning that either Metal on your OS is completely broken (unlikely) or some change between versions broke your data loading/generation, for example.


Many sophisticated shaders can end up in long-running loops when fed wrong data. The GPU sees this as a hang and aborts. I used to have a shader that followed a list (like a singly linked list). Give it bad data and a GPU Hang Error was a very likely outcome.


Hope that helps.

Michal

I've peeled away my draw calls. One of them works: my skybox draw call actually draws. But I made a passthrough vertex + single color fragment (runs fine on iOS 9), and it doesn't work. Is the way that struct sizes are calculated different on iOS 10? That's been a common problem since I'm running Swift and there's no way to directly look at the size of the vertex structs I'm using.


Also I've started getting a new error:


Finalizing CVPixelBuffer 0x1741343c0 while lock count is 1

The memory address is different each time, but the message is logged every frame. I even get this when I only clear the screen to a clearColor.

Hmm, what do you mean by "no way to look at the size of vertex structs"? Do you mean that there is some layout the vertex shader expects, and you're not sure you got it right? Because sorry, this is an exact science: you have to get it right, or else... Could you post your vertex struct layout and how you're preparing that data on the Swift side? Frankly, I know nothing about Swift, but yes, something might have changed between iOS 9 and 10 (Swift could have changed, too), and that could spoil your vertex layout if it was done improperly.


Regards

Michal

Unless I'm missing something, the structs declared in .metal files are not visible to the Swift side of things (maybe that's possible now?). So the workaround has been to declare those structs again in a .swift file and comb through to make sure they're the same stride. It's been annoying since the beginning so if there's a proper way I'm missing I'm all ears.

Nope, they're not visible. The proper way is to 1) design these structures taking into account the sizes and alignment of the member types, 2) calculate member offsets carefully, and 3) fill in their counterparts on the CPU side really carefully, taking the aforementioned types, sizes and offsets into account. You really shouldn't assume that a structure layout from one environment (Swift on a 64-bit CPU) will be compatible with another (Metal on a 32-bit GPU). This is a no-no. Please paste one of your vertex structures and the Swift code you use to fill its fields; it would be easier to work on a concrete example.

Regards

Michal

Hi,

Could this be the reason?

Your 3D mesh may contain faces that are not proper individual triangles.

Example: if you have a square plane made of two triangles, there should be exactly 6 vertices.

Your issue could be different.

Thanks

Here's a code example where I have a Light struct declared in my .metal file, and I mimic it in a Swift file so I can update the buffer each frame.


/*
         the .metal struct:
         */

        struct Light {
            packed_float3 direction;
            packed_float3 ambientColor;
            packed_float3 diffuseColor;
            packed_float3 specularColor;
        };

        /*
         my Swift structs
         */

        struct Float3 {
            var x: Float
            var y: Float
            var z: Float
        }

        struct LightSwift {
            var direction: Float3
            var ambientColor: Float3
            var diffuseColor: Float3
            var specularColor: Float3
        }

        /*
          in my Render.swift file I create a buffer for my lights (here is where I'm referencing the Swift struct):
         */
        let lightBufferSize = MemoryLayout<LightSwift>.stride
        lightBuffer = SharedDevice.device.makeBuffer(length: lightBufferSize * kInFlightCommandBuffers, options: [])
   
        /*
          which is updated in the draw loop to change lighting dynamically:
         */
        let lights = lightBuffer.contents().bindMemory(to: LightSwift.self, capacity: lightBuffer.length / MemoryLayout<LightSwift>.stride)
        lights[bufferIndex].direction.x = ...


In this example all the values are floats, so I guess I could just manually calculate sizes and offsets in the Render.swift side, but what about when a struct has mixed types, like int and float?

Hello


From "Metal Shading Language Specification v 1.2", page 29: packed_float3 has an alignment of 4 bytes and a size of 12 bytes. So Light will be laid out without any padding: direction.x, direction.y, direction.z, ambientColor.x and so on (all of these are floats, of course). I do not program in Swift, and from what I have read there are significant differences between how Swift lays out structs and how C (which I am familiar with) does it. For example, in C size == stride (meaning that if some padding is inserted at the end of a structure, nothing will ever be laid out there). Not so in Swift. So I am not 100% sure, but I suspect that in the LightSwift case the layout will be the same.
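One way to check that suspicion rather than guess is to ask Swift for the layout directly with MemoryLayout. The following is a minimal sketch that reuses the Float3 and LightSwift structs from the post above; expectedLightSize is a name introduced here for illustration, and MemoryLayout.offset(of:) assumes Swift 4.2 or later, which is newer than the Xcode 8 toolchain mentioned at the top of the thread. Swift does not formally guarantee the layout of native structs, so treat the printed numbers as what the current compiler does, not as a contract.

```swift
// Swift mirrors of the Metal Light struct, copied from the post above.
struct Float3 {
    var x: Float
    var y: Float
    var z: Float
}

struct LightSwift {
    var direction: Float3
    var ambientColor: Float3
    var diffuseColor: Float3
    var specularColor: Float3
}

// Assumed Metal-side size: four packed_float3 members of 12 bytes each, no padding.
let expectedLightSize = 4 * 12

// Size, stride and alignment as the Swift compiler actually lays the struct out.
print("size:", MemoryLayout<LightSwift>.size)
print("stride:", MemoryLayout<LightSwift>.stride)
print("alignment:", MemoryLayout<LightSwift>.alignment)

// Byte offset of each member; compare these with the offsets expected by the shader.
print("direction:", MemoryLayout<LightSwift>.offset(of: \LightSwift.direction) ?? -1)
print("ambientColor:", MemoryLayout<LightSwift>.offset(of: \LightSwift.ambientColor) ?? -1)
print("diffuseColor:", MemoryLayout<LightSwift>.offset(of: \LightSwift.diffuseColor) ?? -1)
print("specularColor:", MemoryLayout<LightSwift>.offset(of: \LightSwift.specularColor) ?? -1)

// Fail loudly at startup if the Swift mirror ever drifts from the assumed Metal layout.
precondition(MemoryLayout<LightSwift>.stride == expectedLightSize,
             "LightSwift no longer matches the Metal Light struct")
```

Running a check like this once at launch turns a silent layout mismatch into an immediate, readable failure instead of a black screen.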


But consider another example (I'm trying to come up with something here, as you haven't said how exactly you mix ints and floats):


struct SomethingMetal {
     float a;
     int b;
};
struct SomethingSwift {
     var a : Float
     var b : Int
}


Now SomethingMetal's layout is simple: the first four bytes contain the float a, and the second four bytes contain the 32-bit integer b, so sizeof(SomethingMetal) == 8. But look at the situation in Swift on a 64-bit machine: the first four bytes contain the float a all right, but then you get 4 bytes of padding, and then 8 bytes for the 64-bit integer b. So sizeof(SomethingSwift) == 16, and in Swift the field b has neither the same size nor the same offset as its Metal counterpart.
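The mismatch described above can be observed directly, and it goes away when the Swift side uses a fixed-width integer. This is a small sketch, not a definitive recipe: SomethingSwiftFixed is a name invented here, and the numbers in the comments assume a 64-bit platform and the layout the current Swift compiler produces.

```swift
// Mirror from the post above: Int is 64 bits wide on a 64-bit platform.
struct SomethingSwift {
    var a: Float
    var b: Int
}

// Hypothetical corrected mirror: Int32 matches the 32-bit int used by Metal.
struct SomethingSwiftFixed {
    var a: Float
    var b: Int32
}

// On a 64-bit platform this reports a size of 16 with b at offset 8.
print("SomethingSwift size:", MemoryLayout<SomethingSwift>.size,
      "b offset:", MemoryLayout<SomethingSwift>.offset(of: \SomethingSwift.b) ?? -1)

// The fixed-width version reports a size of 8 with b at offset 4,
// the same as the Metal struct.
print("SomethingSwiftFixed size:", MemoryLayout<SomethingSwiftFixed>.size,
      "b offset:", MemoryLayout<SomethingSwiftFixed>.offset(of: \SomethingSwiftFixed.b) ?? -1)
```

The same idea applies to the other integer types: UInt32, Int16 and UInt16 have fixed widths, whereas Int and UInt follow the platform word size.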


From what I have read, Swift's layout depends on the language version, and perhaps this is what "got" you when changing environments? I have also read that there is a feature that lets you "import" a C struct layout into Swift, so I guess that to get a stable result you could lay out your struct in C (Metal is C-like after all; you'd just need to be careful about the sizes/alignments of types, like sizeof(int) above), and then import these layouts into Swift. Then Swift, even after some changes, would honor the proper layout.


Regards

Michal

That's neat that you can import C structs, but you can't import the headers that contain the data types Metal uses (packed_float3, for example).


I understand that you should manually count the size of your structs and use that when creating your MTLBuffers, but my question is how do you update the contents of that buffer? So if you had:


//in .metal...
struct Something {
  packed_float3 a;
  packed_float4 b;
};

//in renderer...
let contents = somethingBuffer.contents().bindMemory(to: Something.self, capacity: somethingBuffer.length / MemoryLayout<Something>.stride)
contents[0].a = ...
contents[0].b = ...


That doesn't work because you can't reference the Something struct type in the bindMemory call. So how would you get the contents of the buffer?

So do I have to memcpy() into calculated byte offsets for every float / int I want to update? I'm feeling a bit disenchanted by Swift+Metal... should I have used Obj-C?
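Writing at computed byte offsets does not have to mean a separate memcpy() per field at every call site. Below is a minimal sketch of one way to wrap it, assuming the packed layout discussed earlier in the thread (packed_float3 is 12 bytes, packed_float4 is 16 bytes, both 4-byte aligned, so one Something element is 28 bytes with no padding). SomethingLayout and writeSomething are hypothetical helpers, not Metal APIs, and a plain allocation stands in for somethingBuffer.contents() so the example runs without a GPU.

```swift
// Byte layout of the Metal struct from the post above:
//   struct Something { packed_float3 a; packed_float4 b; };
enum SomethingLayout {            // hypothetical helper, not a Metal API
    static let offsetA = 0        // a.x, a.y, a.z occupy bytes 0..<12
    static let offsetB = 12       // b.x, b.y, b.z, b.w occupy bytes 12..<28
    static let stride = 28        // distance between consecutive elements
}

/// Writes one Something element into raw buffer memory at the given index.
func writeSomething(to base: UnsafeMutableRawPointer, index: Int, a: [Float], b: [Float]) {
    precondition(a.count == 3 && b.count == 4, "a needs 3 floats, b needs 4")
    let element = base + index * SomethingLayout.stride
    for (i, value) in a.enumerated() {
        element.storeBytes(of: value, toByteOffset: SomethingLayout.offsetA + i * 4, as: Float.self)
    }
    for (i, value) in b.enumerated() {
        element.storeBytes(of: value, toByteOffset: SomethingLayout.offsetB + i * 4, as: Float.self)
    }
}

// Stand-in for somethingBuffer.contents(), which is also an UnsafeMutableRawPointer.
let elementCount = 2
let memory = UnsafeMutableRawPointer.allocate(byteCount: elementCount * SomethingLayout.stride,
                                              alignment: 16)
writeSomething(to: memory, index: 1, a: [1, 2, 3], b: [4, 5, 6, 7])

// Read back b.x of element 1 to confirm where it landed in memory.
let readBack = memory.load(fromByteOffset: 1 * SomethingLayout.stride + SomethingLayout.offsetB,
                           as: Float.self)
print("b.x of element 1:", readBack)
memory.deallocate()
```

With a real buffer the call would look like writeSomething(to: somethingBuffer.contents(), index: bufferIndex, a: ..., b: ...), which keeps all layout knowledge in one place next to the offsets.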

I've found a difference between iOS 9 and 10 that might be the cause, but I don't know what's going on.


In my model object's init method, I'm printing the values of the vertexBuffer:


    init(kitMesh: MTKMesh, modelMesh: MDLMesh, device: MTLDevice) {    
        let vertexBuffer = kitMesh.vertexBuffers[0]
        let floatData = vertexBuffer.buffer.contents().bindMemory(to: Float.self, capacity: vertexBuffer.length / MemoryLayout<Float>.stride)
        for i in 0..<100 {
            print("data [\(i)] = \(floatData[i])")
        }
    }


Console on iOS 9 (correct - matches vertex descriptor):

data [0] = -1.30764
data [1] = 0.142407
data [2] = 2.24837
data [3] = -0.0059
data [4] = 0.986
data [5] = -0.1665
data [6] = 0.0
data [7] = 0.0
data [8] = 0.0
data [9] = 1.0
data [10] = 0.00255011
data [11] = -1.72002
data [12] = 0.072178
data [13] = 1.84713
etc...


Console on iOS 10:

data [0] = 0.0
data [1] = 1.4013e-45
data [2] = 2.8026e-45
data [3] = 4.2039e-45
data [4] = 5.60519e-45
data [5] = 7.00649e-45
data [6] = 8.40779e-45
data [7] = 9.80909e-45
data [8] = 1.12104e-44
data [9] = 1.26117e-44
data [10] = 1.4013e-44
data [11] = 1.54143e-44
data [12] = 1.68156e-44
etc...


What's up? Is there something different about how Model I/O deals with vertex data layout?
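One observation about the iOS 10 output: 1.4013e-45 is the smallest positive value a 32-bit Float can hold, which is the value whose raw bit pattern is the integer 1, and the later values are multiples of it. The sketch below converts the logged numbers back to their raw 32-bit patterns. The variable names are introduced here for illustration, and reading the result as "this memory holds consecutive 32-bit integers, for example index data, not vertex floats" is an inference that the thread does not confirm.

```swift
// Values copied from the iOS 10 console output above.
let logged: [Double] = [
    0.0, 1.4013e-45, 2.8026e-45, 4.2039e-45, 5.60519e-45,
    7.00649e-45, 8.40779e-45, 9.80909e-45, 1.12104e-44,
    1.26117e-44, 1.4013e-44, 1.54143e-44, 1.68156e-44
]

// Round each logged value to the nearest Float and look at its raw 32 bits.
let bits: [UInt32] = logged.map { Float($0).bitPattern }

// Print the float next to the integer stored in the same four bytes.
for (i, pattern) in bits.enumerated() {
    print("data [\(i)] = \(logged[i]) -> raw bits as UInt32: \(pattern)")
}
```

If the integers come out as a simple ascending sequence, it is worth checking which buffer kitMesh.vertexBuffers[0] actually refers to on iOS 10 and whether its offset is being taken into account before the memory is bound to Float.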

The most probable cause is that the Metal buffers or shaders are overloaded. There are limitations when using Metal, and they are described here:

https://developer.apple.com/library/content/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/BestPracticesforShaders/BestPracticesforShaders.html

I was having this problem before, and I was about to switch from Metal to OpenGL. But the advantages of Metal made me try again, and then I discovered that my issue was that I was calculating and sorting an array of heavy data (mostly floats and doubles) within the renderer function.

My mistake was here; see the commented line:


- (void)renderer:(id <SCNSceneRenderer>)renderer updateAtTime:(NSTimeInterval)time 
{ 
     [self calculateMyData]; // My Mistaken line
} 
- (void)calculateMyData 
{ 
     // the Heavy data calculation
}

To avoid most IOAF errors, try not to do heavy or complex calculations, such as sorting big data arrays, within the renderer; use external loops to call these calculations instead. This is what I have done:

- (void)viewDidLoad 
{ 
     NSTimer *timerCounter2 = [NSTimer scheduledTimerWithTimeInterval:0.3 target:self selector:@selector(calculateMyData) userInfo:nil repeats: YES]; 
} 

- (void)renderer:(id <SCNSceneRenderer>)renderer updateAtTime:(NSTimeInterval)time 
{ 
     // [self calculateMyData]; // My Mistaken line
}
- (void)calculateMyData 
{ 
     // The Heavy Data Calculations
}