PYNing’s Profile | Apple Developer Forums

How to optimize ReductionSum with Metal?

I would like to write a ReductionSum Metal shader like this one: https://github.com/alibaba/MNN/blob/master/source/backend/metal/MetalReduction.metal#L32 Sometimes the reduced dimension is large while the other dimensions are small, so only a few threads can be launched and the kernel runs inefficiently. Is there any way to optimize it?
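One common way to approach this kind of shape is to give each output element a whole threadgroup instead of a single thread: every thread first sums a strided slice of the reduced dimension, and the partial sums are then combined with a tree reduction in threadgroup memory. The following is only a CPU-side C++ sketch of that scheme, not the MNN kernel and not Metal code. The names `reduce_row_cooperative`, `reduce_sum`, and the `threads` parameter are hypothetical, and the `shared` vector merely stands in for threadgroup memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of one threadgroup cooperatively reducing one row.
// `threads` stands in for threads_per_threadgroup (assumed to be a power of two).
float reduce_row_cooperative(const float* row, std::size_t length, std::size_t threads) {
    assert(threads > 0 && (threads & (threads - 1)) == 0);
    std::vector<float> shared(threads, 0.0f);  // plays the role of threadgroup memory

    // Phase 1: "thread" t sums elements t, t + threads, t + 2 * threads, ...
    for (std::size_t t = 0; t < threads; ++t) {
        float partial = 0.0f;
        for (std::size_t i = t; i < length; i += threads) {
            partial += row[i];
        }
        shared[t] = partial;
    }

    // Phase 2: tree reduction. On a GPU, each step would be separated
    // from the next by a threadgroup barrier.
    for (std::size_t stride = threads / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            shared[t] += shared[t + stride];
        }
    }
    return shared[0];
}

// Reduces a row-major [outer x length] tensor along its last dimension,
// using one emulated threadgroup per output element.
std::vector<float> reduce_sum(const std::vector<float>& src, std::size_t outer,
                              std::size_t length, std::size_t threads) {
    assert(src.size() == outer * length);
    std::vector<float> dst(outer);
    for (std::size_t o = 0; o < outer; ++o) {
        dst[o] = reduce_row_cooperative(src.data() + o * length, length, threads);
    }
    return dst;
}
```

In an actual Metal kernel, the loops over `t` would disappear because each thread runs its own iteration, and `threadgroup_barrier(mem_flags::mem_threadgroup)` would separate the phases. Whether this is faster than the one-thread-per-output layout depends on the shapes involved, so it would need to be measured.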

Graphics & Games General Metal

1

0

738

Jun ’22

What is System Level Cache? Where is it?

Captured from the WWDC 2020 video 10603, "Optimize Metal apps and games with GPU counters". How can I get information about the System Level Cache? Is it inside the SoC? If not, where is it?

Graphics & Games General Metal Apple Silicon

1

0

1.3k

Jun ’22

[Metal] Fastest way to copy device data to threadgroup memory?

To optimize buffer reads, I intend to use threadgroup memory. But it seems that: (1) there is no API like std::memcpy in MSL; (2) there is also no API like [setBuffer: atIndex:] to set data for threadgroup memory. The amount of data is 2~4 KB. What is the fastest way to copy data from device memory to threadgroup memory? Thanks!
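Without a memcpy-style call, a usual pattern is a cooperative copy: every thread in the threadgroup copies a strided slice of the data, and a barrier follows before anyone reads the staged tile. The following is a CPU-side C++ sketch of that access pattern only, under the assumption that the data is an array of floats. `copy_slice` and `stage_into_threadgroup` are hypothetical names, and the `tile` vector stands in for threadgroup memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by one thread: copy elements tid, tid + threads, tid + 2 * threads, ...
// Neighbouring threads touch neighbouring addresses on every iteration.
void copy_slice(const float* device_src, float* threadgroup_dst,
                std::size_t count, std::size_t tid, std::size_t threads) {
    assert(threads > 0 && tid < threads);
    for (std::size_t i = tid; i < count; i += threads) {
        threadgroup_dst[i] = device_src[i];
    }
}

// Emulates the whole threadgroup by running every thread's slice in turn.
std::vector<float> stage_into_threadgroup(const std::vector<float>& device_src,
                                          std::size_t threads) {
    std::vector<float> tile(device_src.size(), 0.0f);  // stands in for threadgroup memory
    for (std::size_t tid = 0; tid < threads; ++tid) {
        copy_slice(device_src.data(), tile.data(), device_src.size(), tid, threads);
    }
    // On the GPU, a threadgroup barrier belongs here, before any thread reads `tile`.
    return tile;
}
```

In MSL the same loop would run once per thread, followed by `threadgroup_barrier(mem_flags::mem_threadgroup)`. Copying wider elements per iteration (for example `float4` instead of `float`) may reduce the number of load instructions, but the gain should be confirmed with a profiler.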

Graphics & Games General Metal

1

0

806

Apr ’22

[Metal] Is it right to set different version of Deployment Target and Metal Version?

What I want to do is shown in the following two pictures: My needs are: (1) build a framework for macOS apps; (2) compile a .metallib to accelerate shader initialization; (3) the framework can be used on old macOS versions, which only use Metal 1 features; (4) the framework uses new Metal 2 features on new macOS versions and M1 silicon. What I already know: (1) when the Deployment Target is set, MSL adjusts to the corresponding version. This design can't meet my needs, so I made the above settings. (2) Apple has posted a document, Detecting GPU Features and Metal Software Versions, on this issue, but it does not fully address my concern. My concerns are: (1) Can a .metallib compiled with a newer MSL version be loaded by an older macOS version? (2) Will the App Store reject a macOS app that uses this kind of setting? Help: Can you tell me whether these settings are right, or suggest a better solution? Thanks!

App & System Services Core OS macOS Metal Mac App Store

0

523

Apr ’22

[Metal] What's wrong with simdgroup_load or simdgroup_store?

OS: macOS 12.2.1
Hardware: MacBook Pro 2020, M1
Metal: 2.4
Xcode: 13.2.1

Here is my test compute kernel, which reads the input buffer with simdgroup_load and writes the output buffer with simdgroup_store:

    kernel void fun(
        const device half * Src [[ buffer(0) ]],
        constant uint4 & SrcShape [[ buffer(1) ]],
        device half * Dst [[ buffer(2) ]],
        constant uint4 & DstShape [[ buffer(3) ]],
        const device half * Weight [[ buffer(4) ]],
        ushort3 threadgroup_position_in_grid [[ threadgroup_position_in_grid ]],
        ushort3 thread_position_in_threadgroup [[ thread_position_in_threadgroup ]],
        ushort3 threads_per_threadgroup [[ threads_per_threadgroup ]],
        ushort3 thread_position_in_grid [[ thread_position_in_grid ]])
    {
        const int SrcSlices = (int)SrcShape[0];
        const int SrcHeight = (int)SrcShape[1];
        const int SrcWidth = (int)SrcShape[2];
        const int DstSlices = (int)DstShape[0];
        const int DstHeight = (int)DstShape[1];
        const int DstWidth = (int)DstShape[2];
        const int Kernel_X = 3;
        const int KernelElemNum = 3 * 3;
        const int N_Pack = 8;

        // test only 1 thread
        if(thread_position_in_grid.z != 0 || thread_position_in_grid.y != 0 || thread_position_in_grid.x * N_Pack != 0)
            return;

        simdgroup_half8x8 sgMatY;
        simdgroup_load(sgMatY, Src);
        simdgroup_store(sgMatY, Dst);
    }

It's a simple shader; however, the output buffer only contains the first 2 values from the input buffer, and the other 62 values are all zero. Here is the result from the Xcode Metal capture. How can I debug or fix it?
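One possible explanation, consistent with the 2-of-64 symptom, is that SIMD-group matrix operations are cooperative: the 64 elements of an 8x8 matrix are spread over the 32 threads of a SIMD-group, so a kernel that returns early for every thread but one would only move that thread's share. The following CPU-side C++ sketch merely illustrates this arithmetic. The element-to-lane mapping used here (lane `L` owns elements `2L` and `2L + 1`) is an assumption made for illustration, not the documented hardware layout, and `emulate_load_store` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 32;                  // threads per SIMD-group on Apple GPUs
constexpr std::size_t kElems = 8 * 8;               // elements in an 8x8 matrix
constexpr std::size_t kPerLane = kElems / kLanes;   // 2 elements per lane

// Copies src to dst the way a cooperative load/store pair might,
// but only for the lanes marked active. Returns the number of elements written.
std::size_t emulate_load_store(const std::array<float, kElems>& src,
                               std::array<float, kElems>& dst,
                               const std::array<bool, kLanes>& active) {
    std::size_t written = 0;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (!active[lane]) {
            continue;  // a lane that returned early contributes nothing
        }
        for (std::size_t k = 0; k < kPerLane; ++k) {
            // Assumed mapping: lane L owns elements 2L and 2L + 1 (illustrative only).
            const std::size_t idx = lane * kPerLane + k;
            dst[idx] = src[idx];
            ++written;
        }
    }
    return written;
}
```

With only lane 0 active, this model writes 2 elements and leaves 62 untouched, which matches the reported output. If that is indeed the cause, removing the early return so that all threads of the SIMD-group reach simdgroup_load and simdgroup_store would be the first thing to try.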

Graphics & Games General Metal simd

3

0

2.0k

Mar ’22

Any Demo for Metal Struct simdgroup_float8x8 and simdgroup_half8x8 ?

I have a Metal compute kernel for dense matrix multiplication, and I'd like to optimize it with simdgroup_float8x8 and simdgroup_half8x8. However, it seems that no one has used them in Metal. Could you provide more demos of how to use them, beyond the one in the Metal Shading Language Specification Version 2.4? Thanks!
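The usual structure for such a kernel is to split the output into 8x8 tiles and, for each tile, accumulate products of 8x8 tiles of the two inputs along the shared dimension. The following CPU-side C++ sketch shows only that tiling structure; it is not Metal code. `tile_multiply_accumulate` and `tiled_matmul` are hypothetical names, and the sketch assumes row-major float matrices whose dimensions are multiples of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 8;

// acc += a * b for one 8x8 tile. `lda` and `ldb` are row strides in elements.
// This is the step a SIMD-group matrix multiply-accumulate would perform.
void tile_multiply_accumulate(const float* a, std::size_t lda,
                              const float* b, std::size_t ldb,
                              float* acc) {
    for (std::size_t i = 0; i < kTile; ++i) {
        for (std::size_t j = 0; j < kTile; ++j) {
            float sum = acc[i * kTile + j];
            for (std::size_t k = 0; k < kTile; ++k) {
                sum += a[i * lda + k] * b[k * ldb + j];
            }
            acc[i * kTile + j] = sum;
        }
    }
}

// C = A (M x K) * B (K x N), all row-major, dimensions assumed multiples of 8.
std::vector<float> tiled_matmul(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(M % kTile == 0 && K % kTile == 0 && N % kTile == 0);
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t ti = 0; ti < M; ti += kTile) {
        for (std::size_t tj = 0; tj < N; tj += kTile) {
            float acc[kTile * kTile] = {0.0f};  // one output tile, like an accumulator matrix
            for (std::size_t tk = 0; tk < K; tk += kTile) {
                tile_multiply_accumulate(&A[ti * K + tk], K, &B[tk * N + tj], N, acc);
            }
            for (std::size_t i = 0; i < kTile; ++i) {
                for (std::size_t j = 0; j < kTile; ++j) {
                    C[(ti + i) * N + (tj + j)] = acc[i * kTile + j];
                }
            }
        }
    }
    return C;
}
```

In MSL, each SIMD-group would own one or more output tiles, the tile loads would map to simdgroup_load with the row stride passed as the elements-per-row argument, and the inner step would map to simdgroup_multiply_accumulate. How many tiles to assign per SIMD-group is a tuning decision that this sketch does not address.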

Graphics & Games Metal Metal Performance Shaders Metal simd

1

0

1.1k

Jan ’22

What's error with this Metal Compute Function in Apple Silicon?

I'm developing AI apps with Metal compute functions. As shown in the following picture captured from the Metal debugger, the preview picture is wrong while the picture in the Attachment View is correct. The final result matches the preview picture, which is not what I expect. This error does not appear on an x86_64 MacBook, but it does appear on an M1 MacBook. I think there may be some issue (such as concurrent reads) mentioned in "Bring your Metal app to Apple silicon Macs". Can you give me some ideas on how to debug it? Thanks!

Graphics & Games General Metal Apple Silicon

1

0

866

Jan ’22

PYNing

Post

Replies

Boosts

Views

Activity