Posts

Post not yet marked as solved
1 Reply
Yes, the integrated GPU will be much faster for this. It's a fairly light workload that touches a large block of memory: the iGPU can work directly from system memory, while the dGPU has to copy the data to VRAM, do the work, then copy the result back. The processing itself is much faster, but copying across the PCIe bus takes long enough to make it slower overall.
Post not yet marked as solved
6 Replies
It might be that it treats the render target as an array even if there's only one, in which case you're still using it even though you might not specify it in the code. It certainly looks like the VS writes the vertex data out, though. Odd. Are you using the latest Xcode? The only other issue I can see is in your fragment shader: on line 40 you refer to the vertex as 'n' instead of 'in'. That could cause the error, but I'm pretty sure it wouldn't compile at all, so it's probably just an accident while posting.
Post not yet marked as solved
6 Replies
In that case, I'd do a frame capture and take a look to see where the problem is. It should be quite easy to find then, assuming capturing doesn't 'fix' the issue - if it does, it's likely synchronisation related.
Post not yet marked as solved
6 Replies
Check the documentation for replaceRegion:mipmapLevel:withBytes:bytesPerRow:. It doesn't synchronise. What might be happening: you're submitting the draw call to the GPU, then updating the texture with replaceRegion, and it's actually uploading the texture while the shader from the last draw call is still rendering with it. If so, just adding a waitUntilCompleted should fix it.
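Roughly like this, as a Swift sketch - the helper name and buffer layout are mine for illustration, not from your code:

import Metal

// Hypothetical helper: replaceRegion does NOT synchronise with
// in-flight GPU work, so wait for the command buffer that samples
// this texture before overwriting its contents from the CPU.
func updateTexture(_ texture: MTLTexture,
                   after commandBuffer: MTLCommandBuffer,
                   with pixelData: [UInt8],
                   bytesPerRow: Int) {
    commandBuffer.waitUntilCompleted()   // GPU is done reading the texture

    let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
    pixelData.withUnsafeBytes { bytes in
        texture.replaceRegion(region,
                              mipmapLevel: 0,
                              withBytes: bytes.baseAddress!,
                              bytesPerRow: bytesPerRow)
    }
}

(waitUntilCompleted stalls the CPU, so once this confirms it's a sync issue you'd normally switch to addCompletedHandler or a semaphore instead.)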
Post marked as solved
5 Replies
This seems unwise - my iMac has no integrated GPU (or rather it has one, but it's disabled and can't be used). You'll also get poor performance, unless it's something fairly basic, on systems that pair a weak integrated GPU with a high-end discrete GPU. Fixing the actual issue might make more sense: make sure you have Metal validation enabled and check any reported issues, check you're setting pixel formats properly, and so on. But to select a particular GPU: when you set up your MTLDevice, instead of using MTLCreateSystemDefaultDevice, use MTLCopyAllDevices, which gives you an array. You'll then need to figure out which device you want.
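As a rough Swift sketch, preferring the integrated GPU where one exists, with a fallback for machines that don't have one (this is macOS-only; MTLCopyAllDevices doesn't exist on iOS):

import Metal

// Enumerate every GPU; isLowPower is true for integrated GPUs.
// Fall back to the system default on machines with no iGPU.
let device = MTLCopyAllDevices().first(where: { $0.isLowPower })
    ?? MTLCreateSystemDefaultDevice()!

print("Using GPU: \(device.name)")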
Post not yet marked as solved
5 Replies
Have you checked in the profiler that it's the compute kernel itself that's slower, and not something happening outside it that's causing a wait?
Post not yet marked as solved
5 Replies
I've nothing to add on the actual issue (I'll have to test this out, though, as it could affect some of my stuff), but if you'd like to look at what's actually happening with the compiler, I posted some details on how to disassemble the shader cache in this thread: https://forums.developer.apple.com/thread/119625
Post not yet marked as solved
5 Replies
You're supposed to add the source code to GitHub, not a zip of the project 🙂 Then we can take a look at the code without having to mess about. (It'll also mean you can track changes you make to the code, roll things back if necessary, etc.)
Post not yet marked as solved
1 Reply
OpenGL runs on top of Metal on iOS (and has for some years), so pretty much, yes. That doesn't mean you'll get the benefits of Metal, though, and OpenGL is still deprecated, so if it's removed from iOS at some point your app will stop working.
Post not yet marked as solved
2 Replies
The error says you're rendering as RGBA into a BGRA buffer, which is usually a mistake (if you draw red, it will show as blue). You can work around that in the shader, but it's better to understand the error and fix the mismatch itself.
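In other words, make the pipeline's colour attachment format match the drawable rather than hard-coding it. A rough Swift sketch (the function names are placeholders):

import Metal
import MetalKit

// Build the pipeline against whatever format the view's drawables
// actually use (.bgra8Unorm by default) instead of assuming RGBA.
func makePipeline(device: MTLDevice, view: MTKView,
                  library: MTLLibrary) throws -> MTLRenderPipelineState {
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = library.makeFunction(name: "vertexMain")
    descriptor.fragmentFunction = library.makeFunction(name: "fragmentMain")
    descriptor.colorAttachments[0].pixelFormat = view.colorPixelFormat
    return try device.makeRenderPipelineState(descriptor: descriptor)
}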
Post not yet marked as solved
3 Replies
For things like this the Metal Shading Language guide helps. If you check the function constants reference there, they've been available since Metal 1.2. The API docs show 1.2 is available in iOS 10.0 and macOS 10.12 or later: https://developer.apple.com/documentation/metal/mtllanguageversion/version1_2
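For reference, a small Swift sketch of requesting 1.2 at compile time and specialising a function constant - the kernel and all names here are made up for illustration:

import Metal

let source = """
#include <metal_stdlib>
using namespace metal;

constant bool useFastPath [[function_constant(0)]];

kernel void myKernel(device float *data [[buffer(0)]],
                     uint id [[thread_position_in_grid]]) {
    data[id] = useFastPath ? data[id] * 2.0f : data[id] + 2.0f;
}
"""

let device = MTLCreateSystemDefaultDevice()!
let options = MTLCompileOptions()
options.languageVersion = .version1_2   // needs iOS 10 / macOS 10.12+

let library = try device.makeLibrary(source: source, options: options)

// Specialise the constant when building the function.
var useFastPath = true
let values = MTLFunctionConstantValues()
values.setConstantValue(&useFastPath, type: .bool, index: 0)
let function = try library.makeFunction(name: "myKernel",
                                        constantValues: values)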
Post not yet marked as solved
3 Replies
That's really weird. The label is just there to identify the different encoders (which is mostly useful for debugging); it shouldn't have any effect on how anything gets executed! I'd assume DMA is the correct channel there - DMA to copy between GPU and system memory, blitter to copy GPU to GPU.
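For anyone finding this later, the label is literally just a string set for tools like the GPU frame capture, e.g. in Swift (assuming an existing commandBuffer):

// Purely a debugging aid - names the encoder in GPU captures
// and shouldn't change behaviour at all.
let blit = commandBuffer.makeBlitCommandEncoder()!
blit.label = "Texture upload blit"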
Post not yet marked as solved
2 Replies
I haven't looked at how you're testing, but if you compare the vertices/second numbers, private storage looks to be much faster. So I'd guess something is affecting your measurement - maybe the added initial setup cost for private storage?
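For context, here's roughly what that setup involves as a Swift sketch - private buffers can't be written by the CPU, so they're typically filled via a one-off blit from a shared staging buffer, and that upload is the sort of cost that can skew a short benchmark (all names are illustrative):

import Metal

// Create a private-storage buffer and fill it by blitting from a
// shared staging buffer.
func makePrivateBuffer(device: MTLDevice, queue: MTLCommandQueue,
                       contents: [Float]) -> MTLBuffer {
    let length = contents.count * MemoryLayout<Float>.stride
    let staging = device.makeBuffer(bytes: contents, length: length,
                                    options: .storageModeShared)!
    let privateBuffer = device.makeBuffer(length: length,
                                          options: .storageModePrivate)!

    let commandBuffer = queue.makeCommandBuffer()!
    let blit = commandBuffer.makeBlitCommandEncoder()!
    blit.copy(from: staging, sourceOffset: 0,
              to: privateBuffer, destinationOffset: 0, size: length)
    blit.endEncoding()
    commandBuffer.commit()
    // Decide deliberately whether this upload counts in your timing.
    commandBuffer.waitUntilCompleted()
    return privateBuffer
}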
Post marked as solved
6 Replies
sin() isn't going to help in any case here, as I'm trying to use the packed math features to increase ALU throughput, and there's no equivalent to sin() for packed 16-bit (i.e. it would just run sin() separately on each value and not do two at once).

To disassemble AMD shaders, you need the open-source CLRX disassembler from here: https://github.com/CLRX/CLRX-mirror

After that, you need to actually run the shader (or perhaps building the pipeline state object is enough - I haven't checked). It's highly advisable to do this in a small sample app that ONLY has that one pipeline state and shader, or you'll end up with a huge disassembly file. Running it compiles the shader, which gets helpfully cached. You can find the cache location with this command:

getconf DARWIN_USER_CACHE_DIR

Inside there you should find your app's cache, then the GPU you're interested in, and finally the 'functions.data' file, which is the raw shader binary (plus, unfortunately, other stuff). You can then disassemble it with:

clrxdisasm -g vega10 -r functions.data

(You'll need to replace 'vega10' with your GPU architecture; that one works with a Vega 56/64.)

That gives you the assembly, plus some garbage around it (other data in the file gets interpreted as shader code too). I find it best to search for "s_endpgm" (end of shader program) and work back. There will probably be two or more shaders in there; hopefully you can spot some obvious code to know which one you want.

Finally, you'll want the relevant AMD ISA document, which is easily found on the web - for Vega, it's AMD's "Vega Instruction Set Architecture" PDF.
Post marked as solved
6 Replies
Ok, so I've kind of figured this out now. I was able to use the CLRX shader disassembler on my app's shader cache, and the Vega ISA is public. It turned out some of my code was using the fast packed 16-bit operations, some wasn't, and there were additional conversion operations that likely accounted for the performance hit.

If anyone else is looking at this: basically, you need to use the packed_half2/4 types and limit your code to add, mul, fma, min and max. A bit more is possible with integer types - see AMD's Vega ISA document for details. It's useful for ML-type work for sure, less useful for general shaders. You'll probably want to profile on older AMD cards, Intel and NVIDIA too, because using this might lead to a performance drop (or gain!) depending on the hardware.
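To illustrate the kind of code that maps well to the packed units, here's a hedged sketch - a Swift string holding an MSL kernel that sticks to 16-bit vectors and the ops listed above; the buffer layout and names are made up, and whether the compiler actually emits the packed instructions is something you'd verify with the disassembly steps from the earlier post:

let kernelSource = """
#include <metal_stdlib>
using namespace metal;

// Sticks to add/mul/fma/min/max on half vectors, which Vega can map
// to its packed 16-bit (v_pk_*) instructions. Avoid mixing in float
// maths, which forces the conversion ops mentioned above.
kernel void packedMath(device const half4 *a [[buffer(0)]],
                       device const half4 *b [[buffer(1)]],
                       device half4 *out     [[buffer(2)]],
                       uint id [[thread_position_in_grid]]) {
    half4 v = fma(a[id], b[id], half4(1.0h));
    out[id] = min(max(v, half4(0.0h)), half4(1.0h));
}
"""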