AlecazamTGC’s Profile | Apple Developer Forums

16" MBP w/AMD doesn't support MTLCounterSamplingPointAtStageBoundary

This is the latest Intel Mac, running an AMD 5500, and it can't sample timings at stage boundaries? How are we supposed to write timing code consistently for macOS and iOS if that's the case? So do I then have to add several thousand samples per draw call and accumulate them? I don't remember the docs or sample code pointing this out. Our app compiles to deploy on macOS 10.15. Does setting that higher help with this?

The errors reported are:

    MTLCounterSamplingPointAtStageBoundary is not supported, startOfVertexSampleIndex must be MTLCounterDontSample.
    MTLCounterSamplingPointAtStageBoundary is not supported, startOfFragmentSampleIndex must be MTLCounterDontSample

Metal

Posted by AlecazamTGC.

Rosetta2 missing AVX and f16c ops

We can drop our compiles from AVX to SSE4.2, but we also use f16c ops to handle fp16 <-> fp32 conversions. Neon already has routines similar to f16c, so why are these missing from Rosetta2? Until we can generate universal apps, we need to fall back to running our tools under Rosetta2. It also looks like popcount is missing. These limits should be posted in Apple's Rosetta2 documentation.

Here's my MBP 16" Intel:

    sysctl -a | grep machdep.cpu.features
    machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C

And an M1 comparison:

    sysctl -a | grep machdep.cpu.features
    machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTSE64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 AES SEGLIM64
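Until the hardware ops are available, a software fallback for the fp16 <-> fp32 conversion is one option. The following is a minimal sketch, not what our tools actually use; the names halfToFloat and floatToHalf are made up for illustration, and it assumes IEEE 754 binary16/binary32 layouts with round-to-nearest-even on the way down.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical software fallback for F16C-style conversions.
// Expands an IEEE 754 binary16 bit pattern to a float.
float halfToFloat(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                              // signed zero
        } else {
            int shift = 0;                            // subnormal: normalize the mantissa
            while ((mant & 0x400u) == 0) { mant <<= 1; ++shift; }
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);     // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrows a float to a binary16 bit pattern, rounding to nearest even.
uint16_t floatToHalf(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t absx = x & 0x7FFFFFFFu;
    if (absx > 0x7F800000u) return sign | 0x7E00u;    // NaN -> quiet NaN
    if (absx >= 0x47800000u) return sign | 0x7C00u;   // >= 65536 or infinity
    if (absx < 0x33000000u) return sign;              // below 2^-25 rounds to zero
    if (absx < 0x38800000u) {                         // result is a subnormal half
        uint32_t e     = absx >> 23;
        uint32_t man   = (absx & 0x7FFFFFu) | 0x800000u;
        uint32_t shift = 126u - e;                    // between 14 and 24
        uint32_t m     = man >> shift;
        uint32_t rem   = man & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (m & 1u))) ++m;
        return sign | (uint16_t)m;
    }
    uint32_t rebias = absx - 0x38000000u;             // rebias 127 -> 15
    uint32_t half   = rebias >> 13;
    uint32_t rem    = rebias & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) ++half;
    return sign | (uint16_t)half;                     // a carry into 0x7C00 yields infinity
}
```

This is scalar and will be much slower than the hardware path, so it is only meant as a stopgap for tools that must run under Rosetta2.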

Graphics and Games


Shader hotloading broken - newLibraryWithData on metallib returns cached not new metallib

This breaks shader hotloading and has been a persistent bug in Metal for many years. Metal holds onto an existing library and returns it without checking whether the data content has changed. Similar bugs happen with Metal's shader cache not checking modification timestamps.

In my case, I'm just changing a color in the shader from float3(1,0,0) to float3(1,1,0) and then never seeing the result of the shader change. The new metallib is loaded from disk and handed off to newLibraryWithData. I can tell that it's returning a cached metallib because we set a label on the MTLFunction that is returned. The label is nil on the first load of the shader, but after the hotload of the new metallib the label is already non-nil. So we just see the old shader content. This is a very important Radar to fix.

Metal


Metal draw indirect missing draw count

Why is there no count on any of these draw indirect calls? I am appending draws to a single MTLBuffer on the CPU, but can't limit how many are drawn out of the buffer. An offset isn't enough to specify a range. Can this be supplied in some bind call?

    - (void)drawIndexedPrimitives:(MTLPrimitiveType)primitiveType
                        indexType:(MTLIndexType)indexType
                      indexBuffer:(id <MTLBuffer>)indexBuffer
                indexBufferOffset:(NSUInteger)indexBufferOffset
                   indirectBuffer:(id <MTLBuffer>)indirectBuffer
             indirectBufferOffset:(NSUInteger)indirectBufferOffset API_AVAILABLE(macos(10.11), ios(9.0));

Contrast this with the Vulkan call, which has an offset and a count:

    vkCmdDrawIndexedIndirect( m_encoder, indirectBuffer, drawBufferOffset, drawCount, sizeof( VkDrawIndexedIndirectCommand ) );
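Absent a count parameter, one CPU-side fallback is to issue one indirect draw per appended entry, stepping the offset by the argument stride. The sketch below shows only the offset arithmetic. DrawIndexedIndirectArgs is a local redeclaration that mirrors the five 32-bit fields of MTLDrawIndexedPrimitivesIndirectArguments so the example compiles without Metal, and drawIndexedIndirectRange and issueDraw are hypothetical names standing in for the real encoder call.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Mirrors the field layout of MTLDrawIndexedPrimitivesIndirectArguments
// (five 32-bit fields, 20 bytes). Redeclared here for illustration only.
struct DrawIndexedIndirectArgs {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t indexStart;
    int32_t  baseVertex;
    uint32_t baseInstance;
};

// Hypothetical helper: emulates a (offset, drawCount, stride) indirect draw by
// invoking issueDraw once per entry with that entry's byte offset.
// In real code issueDraw would wrap drawIndexedPrimitives:...indirectBufferOffset:.
void drawIndexedIndirectRange(size_t firstOffset, size_t drawCount, size_t stride,
                              const std::function<void(size_t)>& issueDraw) {
    for (size_t i = 0; i < drawCount; ++i) {
        issueDraw(firstOffset + i * stride);
    }
}
```

This costs one encoder call per draw, so it does not replace a true multi-draw, but it does limit how many entries of the buffer are consumed.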

Metal


Shader compiler issue with any() use on iOS

We target MSL 1.1 on iOS 9, and are seeing that the two forms below are not equivalent. The upper form generates bad pixels on iOS but is the more efficient one. macOS (on AMD 5500m) is fine. I will log this to Feedback Assistant, but am posting here too. The code was also compiled with -O2, so this could be an iOS optimizer bug.

    #if 1
    if ( all( greaterThanEqual(pos.xy, v_clip.xy )) && all( lessThanEqual(pos.xy, v_clip.zw )) )
    #else
    if ( pos.x >= v_clip.x && pos.x <= v_clip.z && pos.y >= v_clip.y && pos.y <= v_clip.w )
    #endif

This is the codegen out of spirv-cross. Mac and iOS codegen is the same for this chunk. These results are on iOS.

With #if 1, this doesn't work:

    fsmain_out out = {};
    float4 color = float4(0.0);
    float2 pos = gl_FragCoord.xy;
    bool _35 = all(pos >= in.v_clip.xy);
    bool _43;
    if (_35) { _43 = all(pos <= in.v_clip.zw); } else { _43 = _35; }
    if (_43) ...

With #if 0, this works:

    fsmain_out out = {};
    float4 color = float4(0.0);
    float2 pos = gl_FragCoord.xy;
    bool _38 = pos.x >= in.v_clip.x;
    bool _47;
    if (_38) { _47 = pos.x <= in.v_clip.z; } else { _47 = _38; }
    bool _56;
    if (_47) { _56 = pos.y >= in.v_clip.y; } else { _56 = _47; }
    bool _65;
    if (_56) { _65 = pos.y <= in.v_clip.w; } else { _65 = _56; }
    if (_65)
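For reference, the two forms should agree for every input. A CPU-side check of that expectation might look like the sketch below; float2, float4, insideClipVector and insideClipScalar are stand-ins written for this example, not Metal types or functions from our codebase.

```cpp
// Minimal stand-ins for the MSL vector types, for illustration only.
struct float2 { float x, y; };
struct float4 { float x, y, z, w; };

// Vector form: all(pos >= clip.xy) && all(pos <= clip.zw)
bool insideClipVector(float2 pos, float4 clip) {
    bool geMin = (pos.x >= clip.x) && (pos.y >= clip.y);
    bool leMax = (pos.x <= clip.z) && (pos.y <= clip.w);
    return geMin && leMax;
}

// Scalar form, in the same comparison order as the #else branch.
bool insideClipScalar(float2 pos, float4 clip) {
    return pos.x >= clip.x && pos.x <= clip.z &&
           pos.y >= clip.y && pos.y <= clip.w;
}
```

Any pixel where the GPU result of the vector form differs from this reference points at the compiler rather than the shader source.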

Metal


GPU capture should display draw call after pushDebugGroup/commands

The push/popDebugGroup calls are captured by GPU capture and display a folder around a series of draw calls. But when you select the folder, the previous draw call results and attachments are displayed. This makes walking through a deep hierarchy of draw calls confusing, especially to people new to GPU capture. A simple change, but selecting a folder like this or any command after a draw should really display the results from the next draw call instead of the previous.

Metal


GPU capture only reports Counters on iOS/macOS when reopening capture

Make sure GPU capture is set to "Automatically Enabled" and "Profile GPU Trace after Capture" in Xcode 12.2 and 12.4.

1. Run an iOS app
2. Do a GPU capture
3. Try to look at Counters, and they're not there
4. Save the capture out via "Export"
5. Reopen the capture, and now Counters are there

I see the "Counters" pane show a spinner for a short amount of time after doing step 2, but the Counters are never filled out. I don't want to have to exit my app to look at captures, since I need to look at multiple captures over the course of a session.


Clickthrough to headers broken using new build system.

When we have warnings/errors in our make-based builds, the new build system reports the warnings with relative instead of absolute paths. When I then try to click through to the code where the warnings/errors occur, I get the "bonk" noise and Xcode doesn't take me there. My understanding is that the old build system resolved these to full paths, so they would jump to the line in the code, but the new build system just leaves them as relative paths. This mostly defeats the use of an IDE if we can't quickly review and fix issues like this. Any suggestions for fixing this?

Neither the warning/error summary nor the report navigator build pane takes me to the line in FooClump.h. This is the line taken from the report navigator build pane:

    In file included from /Users/Me/MyAppFolder/FooClump.cpp:4:
    FooClump.h:30:15: warning: 'postConstructor' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
    virtual void postConstructor();
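One possible stopgap is to pipe the make output through a small filter that rewrites relative diagnostic paths to absolute ones before Xcode sees them. This is a sketch under assumptions: absolutizeDiagnostic is a hypothetical helper, it assumes the diagnostics use the file:line:column: prefix shown above, and it assumes the directory the compiler ran in is known to the caller.

```cpp
#include <regex>
#include <string>

// Hypothetical filter step: if a line starts with a relative "file:line:col: "
// diagnostic prefix, prepend baseDir so the path becomes absolute.
// Lines that already start with '/' or do not match are returned unchanged.
std::string absolutizeDiagnostic(const std::string& line, const std::string& baseDir) {
    static const std::regex relativeDiag(R"(^([^/\s:][^:\s]*):(\d+):(\d+): )");
    std::smatch match;
    if (!std::regex_search(line, match, relativeDiag)) {
        return line;
    }
    return baseDir + "/" + line;
}
```

For the warning above, passing /Users/Me/MyAppFolder as baseDir would turn the FooClump.h prefix into /Users/Me/MyAppFolder/FooClump.h:30:15:. Whether Xcode then resolves the click-through is not something this sketch verifies.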


macOS 10.15 deployment breaks monkeypatching C++ vtable

How do we get this code to not crash? This was working up until we bumped our macOS deployment target to 10.15. When deployment is set to macOS 10.14, the code works fine. Data is nearly identical in the debugger, although the vtable is at a slightly lower address in 10.15. Have the C++ vtables been put in pages marked read-only, and if so, how do we prevent that? Hardened runtime is not enabled, and I don't recall any mention from Apple about this change.

    #import <Foundation/Foundation.h>

    class Base {
    public:
        virtual ~Base() {};
        virtual const char* Print() { return "Base"; }
    };

    class Derived : public Base {
    public:
        virtual const char* Print() override { return "Derived"; }
    };

    const char* PrintPatch( Base* localThis ) { return "Patch"; }

    template <class T1, class T2>
    void* PatchVtablePtr( void** vtable, T1 memberFunction, T2 newFunction )
    {
        // Replace the instance of memberFunction in the vtable with newFunction
        void* offset = *(void**)&memberFunction;
        auto vtableIndex = (uintptr_t)offset / sizeof(void*);
        vtable[vtableIndex] = (void*)newFunction; // <- Thread 1: EXC_BAD_ACCESS (code=2, address=0x100004038)

        // return the original vtable address
        return offset;
    }

    int main(int argc, const char * argv[]) {
        Derived* derived = new Derived();
        printf("%s\n", derived->Print());

        // monkeypatch the vtable
        //Base* base = derived;
        //void** vtable = *(void***)base;
        void** vtable = *(void***)derived;
        PatchVtablePtr( vtable, &Derived::Print, &PrintPatch );

        printf("%s\n", derived->Print());
        return 0;
    }
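If the vtable really does sit in a read-only page (an assumption; nothing above confirms it beyond the code=2 protection failure), the patcher would have to make the page containing the slot writable before storing into it. A minimal POSIX sketch of that step follows. makeWritable is a hypothetical helper, and whether mprotect is permitted on the segment holding vtables under a 10.15 deployment target is not verified here.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: requests read/write protection on the page(s) covering
// [addr, addr + len). Returns true if mprotect succeeded.
// mprotect needs a page-aligned start, so the range is rounded outward.
bool makeWritable(void* addr, size_t len) {
    const uintptr_t pageSize = (uintptr_t)sysconf(_SC_PAGESIZE);
    const uintptr_t start = (uintptr_t)addr & ~(pageSize - 1);
    const uintptr_t end = ((uintptr_t)addr + len + pageSize - 1) & ~(pageSize - 1);
    return mprotect((void*)start, end - start, PROT_READ | PROT_WRITE) == 0;
}
```

In the code above this would be called as makeWritable(&vtable[vtableIndex], sizeof(void*)) just before the assignment, with the return value checked so a refusal is reported instead of crashing.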


dlopen() reloads original instead of new dylib after changes

We have a C++ library that we hotload on macOS. This uses dlopen() and dlclose() and worked up until recent versions of Catalina. We don't use thread_local and don't have Objective-C code in the library. dlopen() succeeds, we use the original dylib. Then for hotloading we dlclose() the original dylib and then dlopen() the new dylib. All this succeeds, and no dlerror occurs. All of the dyld output indicates that the library is being unloaded and loaded back in. But after changing the sources, and building a new dylib, the app returns the original dylib and not the new one. This seems to be a problem in the dyld layer itself, and not our sources. On older macOS builds, the hotloading works correctly. Given the lack of edit+continue in Xcode, this is the only way to iterate quickly on source code changes. How do we fix this? We are not using the hardened runtime. This is failing on macOS 10.15.7 with Xcode 12.2 (and 12.3).

Foundation


insertDebugSignpost doesn't appear in Metal GPU Capture

I need to be able to tag each draw call with a quick string that details shader name, draw counts, etc. In Vulkan, we have pVkCmdInsertDebugUtilsLabelEXT (and begin/end event). In DX, there's Pix setMarker (in addition to begin/endEvent). The Metal equivalent would seem to be insertDebugSignpost, but these don't appear in the Metal GPU capture at all. I also tried using a quick push/popDebugGroup, but since that doesn't surround any commands, it appears to get stripped.

A "marker" is needed for two reasons: to quickly tag points in code, and to replace and flatten the folder hierarchy created by debug groups when we want to do that. Why doesn't this Metal equivalent appear?

Metal
Xcode

