What are the gaps between individual compute shader invocations in a Metal GPU trace?

Device: iPhone 11, OS: iOS 15.6

I have a Metal application on iOS in which a series of compute shaders are encoded, dispatched, and then committed together at the end. However, when I capture a GPU trace of the application, I notice gaps between the individual compute shader invocations, and these gaps seem to take up a large part of the GPU time.
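To put a number on "a large part of the GPU time", one could compute the idle time between dispatches from their start and end times. This is a minimal sketch, assuming the (start, end) pairs have been read off the trace timeline by hand; the function name `gap_summary` and the sample values are hypothetical, not something Xcode exports in this form.

```python
from typing import List, Tuple


def gap_summary(dispatches: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (total idle time between dispatches, idle fraction of the whole span).

    `dispatches` holds hypothetical (start, end) times, in any consistent unit,
    for each compute dispatch as read from the GPU trace timeline.
    """
    if not dispatches:
        return 0.0, 0.0
    ordered = sorted(dispatches)  # sort by start time
    span = max(end for _, end in ordered) - ordered[0][0]
    idle = 0.0
    busy_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start > busy_until:
            idle += start - busy_until  # a gap where no dispatch is running
        busy_until = max(busy_until, end)  # overlapping dispatches extend busy time
    return idle, (idle / span if span > 0 else 0.0)


print(gap_summary([(0.0, 1.0), (1.5, 2.5), (3.0, 4.0)]))  # (1.0, 0.25)
```

Comparing the idle total with the sum of the per-dispatch durations can also help check whether the timeline and the per-shader durations agree.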

I'm wondering what these gaps are and what causes them. Since all compute dispatch commands are committed together at once, the gaps shouldn't be synchronization between the CPU and the GPU.

PS: In my application, later compute commands mostly depend on earlier ones and use the result buffers from earlier invocations. However, as shown in the picture, the bandwidth and read/write buffer limiters are not high, as far as I can tell.

Hi. Could you show a screenshot with the Compute Shader track disclosed, so we can see whether they are different shaders? If they are different shaders, the GPU may be running those shaders in the gaps. We can take a better look if you share the gputrace with us by filing a Feedback Assistant report and pasting the Feedback report ID here. Thanks.

They are different shaders. Could it be a UI display error? I found that the durations on the right-hand timeline track don't match what is shown on the left.
