I wrote a sample program that converts a 2048 x 2048 source texture to grayscale using either a render pipeline or a compute pipeline.
On an iPhone XS Max running iOS 12.1, the fragment shader and the compute kernel both take 1.9 ms to complete (according to the Metal System Trace instrument). I would have expected the compute kernel to execute faster, but at least it's not slower.
But on an iPhone 5S running iOS 11.4, the fragment shader takes 3 ms and the compute kernel takes 177 ms to execute!!!
On an iPhone 7 Plus running iOS 11.1, the numbers are 1.6 ms and 3.2 ms respectively.
Here are my compute and render methods:
- (void)compute {
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    [computeEncoder setComputePipelineState:_computePipelineState];
    // Texture index 0 is the source, index 1 is the destination.
    [computeEncoder setTexture:_srcTexture atIndex:0];
    [computeEncoder setTexture:_dstTexture atIndex:1];
    [computeEncoder dispatchThreadgroups:_threadgroupCount threadsPerThreadgroup:_threadgroupSize];
    [computeEncoder endEncoding];
    [commandBuffer commit];
    // Block the CPU until the GPU has finished executing the command buffer.
    [commandBuffer waitUntilCompleted];
}
- (void)render {
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    MTLRenderPassDescriptor *renderPassDescriptor = [MTLRenderPassDescriptor renderPassDescriptor];
    // Every pixel of the destination is overwritten, so its previous contents need not be loaded.
    renderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionDontCare;
    renderPassDescriptor.colorAttachments[0].texture = _dstTexture;
    renderPassDescriptor.colorAttachments[0].storeAction = MTLStoreActionStore;
    id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPassDescriptor];
    [renderEncoder setRenderPipelineState:_renderPipelineState];
    [renderEncoder setFragmentTexture:_srcTexture atIndex:0];
    // Draw a four-vertex triangle strip (a quad).
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
    [renderEncoder endEncoding];
    [commandBuffer commit];
    // Block the CPU until the GPU has finished executing the command buffer.
    [commandBuffer waitUntilCompleted];
}
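The `_threadgroupCount` and `_threadgroupSize` values passed to `dispatchThreadgroups:threadsPerThreadgroup:` are not shown in the methods above. As a rough sketch of one common way to derive the count, here is a plain C helper (it also compiles inside an Objective-C source file) that rounds up so the whole texture is covered. The `GridSize` struct is a hypothetical stand-in for `MTLSize`, and the 16 x 16 threadgroup size in the usage note is only an assumption, not necessarily what this program uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for MTLSize so the sketch compiles without Metal. */
typedef struct {
    size_t width;
    size_t height;
    size_t depth;
} GridSize;

/* Number of threadgroups needed to cover a texture. The division rounds up
   so that partial threadgroups along the right and bottom edges are included. */
static GridSize threadgroupCountForTexture(size_t textureWidth,
                                           size_t textureHeight,
                                           GridSize threadgroupSize)
{
    GridSize count;
    count.width  = (textureWidth  + threadgroupSize.width  - 1) / threadgroupSize.width;
    count.height = (textureHeight + threadgroupSize.height - 1) / threadgroupSize.height;
    count.depth  = 1;
    return count;
}
```

For a 2048 x 2048 texture and an assumed 16 x 16 threadgroup, this gives 128 x 128 threadgroups. When a texture dimension is not a multiple of the threadgroup size, the rounded-up grid is larger than the texture, so the kernel would then need a bounds check on `gid`.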
And the fragment and kernel functions:
// Grayscale Fragment Function
fragment half4 grayscaleFragment(RasterizerData in [[stage_in]],
                                 texture2d<half> inTexture [[texture(0)]])
{
    // Default-constructed sampler.
    constexpr sampler textureSampler;
    half4 inColor = inTexture.sample(textureSampler, in.textureCoordinate);
    half gray = dot(inColor.rgb, kRec709Luma);
    return half4(gray, gray, gray, 1.0);
}
// Grayscale Kernel Function
kernel void grayscaleKernel(uint2 gid [[thread_position_in_grid]],
                            texture2d<half, access::read> inTexture [[texture(0)]],
                            texture2d<half, access::write> outTexture [[texture(1)]])
{
    half4 inColor = inTexture.read(gid);
    half gray = dot(inColor.rgb, kRec709Luma);
    outTexture.write(half4(gray, gray, gray, 1.0), gid);
}
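Both functions reduce each pixel to a single dot product with `kRec709Luma`, whose definition is not shown. As a CPU-side reference for checking either pipeline's output, here is a minimal C sketch that assumes the constant holds the standard Rec. 709 luma weights (0.2126, 0.7152, 0.0722). The names `kAssumedRec709Luma` and `grayscaleFromRGB` are hypothetical.

```c
/* Assumed Rec. 709 luma weights; kRec709Luma itself is not shown in the post. */
static const float kAssumedRec709Luma[3] = {0.2126f, 0.7152f, 0.0722f};

/* Per-pixel equivalent of dot(inColor.rgb, kRec709Luma) under that assumption.
   Inputs are normalized color components in the range 0..1. */
static float grayscaleFromRGB(float r, float g, float b)
{
    return r * kAssumedRec709Luma[0]
         + g * kAssumedRec709Luma[1]
         + b * kAssumedRec709Luma[2];
}
```

Since the shaders work in `half` precision, GPU results should be expected to match this `float` reference only approximately.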
Should I request technical support or file a radar?