CPU-based transform or GPU-based Affine 3D Transform or Linear 2D Transform + 2D Translation through `fma`, what is more efficient?

Question

Created May ’21

I'm working on a 2D drawing application. I receive CGPoints from UITouches and transform them to the Metal coordinate space. In most cases I have to create several vertices from one CGPoint, apply a transformation to them, and convert them to the Metal coordinate space. I use simd and vector-matrix multiplication. I have 4 options for doing this.

1. Create a 3x3 affine matrix (matrix_float3x3) that combines the linear transform (scale/rotation in my case) with the translation, and perform the vector-matrix multiplication on the CPU using simd.
2. Create the same affine matrix and perform the multiplication on the GPU in the vertex function.
3. Create a uniform with a separate matrix_float2x2 linear transform and a simd_float2 translation, and perform an fma operation on the 2D vector, the 2D linear matrix, and the 2D translation vector on the CPU using Accelerate.
4. Same as option 3, but perform the fma on the GPU in the vertex function.
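
To make the two formulations concrete, here is a minimal CPU-side sketch showing that the 3x3 affine form and the 2x2 linear transform plus translation give the same result for a 2D point. It uses only the Swift standard library SIMD types so it runs without the simd module; `Affine3x3` and `LinearPlusTranslation` are hypothetical stand-ins for `matrix_float3x3` and for a `matrix_float2x2` plus `simd_float2` pair, and the scale, rotation, and translation values are made up for illustration. Whether either form compiles down to a single instruction depends on the compiler and the hardware, so this shows the arithmetic only.

```swift
/// Hypothetical stand-in for matrix_float3x3: a column-major 3x3 matrix
/// used as a 2D affine transform in homogeneous coordinates.
struct Affine3x3 {
    var c0: SIMD3<Float>
    var c1: SIMD3<Float>
    var c2: SIMD3<Float>  // translation lives in the third column

    /// Matrix-vector product written as a linear combination of columns.
    /// The point is treated as (x, y, 1), so c2 is added unscaled.
    func apply(to p: SIMD2<Float>) -> SIMD2<Float> {
        let r = c0 * p.x + c1 * p.y + c2
        return SIMD2<Float>(r.x, r.y)
    }
}

/// Hypothetical stand-in for a matrix_float2x2 plus a simd_float2 translation.
struct LinearPlusTranslation {
    var c0: SIMD2<Float>
    var c1: SIMD2<Float>
    var t: SIMD2<Float>

    /// Two vector multiply-adds: (t + c0 * x) + c1 * y.
    func apply(to p: SIMD2<Float>) -> SIMD2<Float> {
        let x = SIMD2<Float>(repeating: p.x)
        let y = SIMD2<Float>(repeating: p.y)
        return t.addingProduct(c0, x).addingProduct(c1, y)
    }
}

// Illustrative transform: scale by 2, rotate by 90 degrees, translate by (10, 5).
let affine = Affine3x3(
    c0: SIMD3<Float>(0, 2, 0),
    c1: SIMD3<Float>(-2, 0, 0),
    c2: SIMD3<Float>(10, 5, 1)
)
let linear = LinearPlusTranslation(
    c0: SIMD2<Float>(0, 2),
    c1: SIMD2<Float>(-2, 0),
    t: SIMD2<Float>(10, 5)
)

let point = SIMD2<Float>(1, 3)
let viaAffine = affine.apply(to: point)
let viaLinear = linear.apply(to: point)
print(viaAffine, viaLinear)  // both are (4, 7)
```

The 3x3 form carries a third row that contributes nothing for 2D points, while the split form does only the work that is needed. The same arithmetic applies on the GPU side for options 2 and 4.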

Which is more efficient? And what are the best practices in GPU programming? If I understand correctly, fma and vector-matrix multiplication each use a single processor instruction. Am I right?

I have no more than 10 CGPoints which produce about 40-80 vertices on every draw call.

Answer 1

Graphics and Games Engineer OP

Apple

May ’21

Accepted Answer

Basically, if the data will need to be read by the CPU later, do it on the CPU. If it's just for display and the data was on the GPU anyway, do it on the GPU.

For a data set this small, it is unlikely that you will be able to measure a performance difference from simplifying the transformations. You are more likely to see speedups from reducing the number of draw calls by batching.
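
As a rough illustration of batching, the sketch below gathers the vertices for several touch points into one contiguous array, which could then be copied into a single vertex buffer and submitted with one draw call (for example one `drawPrimitives(type:vertexStart:vertexCount:)` call) instead of one call per point. The `Vertex` layout, the quad expansion, and the function names are assumptions made for this example, not something taken from the question.

```swift
/// Hypothetical vertex layout; a real app would match its own vertex descriptor.
struct Vertex {
    var position: SIMD2<Float>
}

/// Expands one point into a quad made of two triangles (six vertices).
/// The quad shape and halfSize are illustrative assumptions.
func quadVertices(around p: SIMD2<Float>, halfSize h: Float) -> [Vertex] {
    let corners: [SIMD2<Float>] = [
        SIMD2(-h, -h), SIMD2(h, -h), SIMD2(-h, h),
        SIMD2(h, -h), SIMD2(h, h), SIMD2(-h, h),
    ]
    return corners.map { Vertex(position: p + $0) }
}

/// Collects the vertices for all points into one array so that they can be
/// uploaded once and drawn together as a single triangle list.
func batchedVertices(for points: [SIMD2<Float>], halfSize: Float) -> [Vertex] {
    var batch: [Vertex] = []
    batch.reserveCapacity(points.count * 6)
    for p in points {
        batch.append(contentsOf: quadVertices(around: p, halfSize: halfSize))
    }
    return batch
}

let points: [SIMD2<Float>] = [SIMD2(0, 0), SIMD2(10, 0), SIMD2(20, 5)]
let batch = batchedVertices(for: points, halfSize: 1)
print(batch.count)  // 3 points * 6 vertices each
```

With the whole batch in one buffer, a shared transform can be passed once as a uniform and applied to every vertex in the vertex function.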

For general guidance on GPU programming, you might want to watch the WWDC session "Advanced Metal Shader Optimization".
