MPSCNNConvolution chained in series of 4 gives different results on A10 and A11 chips

In iOS 12, I ran the following code.

Basically, the code connects four MPSCNNConvolution filters in series.

This is the architecture (see the sketch after the list):

1. Decimal to Float conversion: convert 8-bit RGB values (0-255) to floats in the range 0-1

2. Preprocess

3. Convolution 3 to 16 channels

4. Convolution 16 to 32 channels

5. Convolution 32 to 64 channels

6. Convolution 64 to 3 channels

7. Deprocess

8. Float to Decimal conversion: convert back to 8-bit RGB color for display
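In code terms, the chain looks like this (channel counts taken from the convolution descriptors in the code; spatial sizes depend on the offsets and strides set below):

RGBA8 texture -> float16 image [3 ch] -> preprocess [3 ch]
             -> conv 9x9            [3 -> 16]
             -> conv 3x3, stride 2  [16 -> 32]
             -> conv 3x3, stride 2  [32 -> 64]
             -> conv 3x3            [64 -> 3]
             -> deprocess [3 ch] -> Unorm8 texture for display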


However, I noticed that the results on an iPhone 7 Plus and an iPhone X differ. Why?

If I remove any one of the convolution filters, both devices produce the same result.

Why can't four convolutions be connected in series, when three or fewer work fine?



//#################################### Create and initialize device, library, queue, resources ######################

id<MTLCommandQueue> queue = [device newCommandQueue];

id<MTLCommandBuffer> buffer = [queue commandBuffer];

id<MTLLibrary> library = [device newDefaultLibrary];


//################################################### PREPARING INPUT IMAGE & HOLDER ####################################


CGImageRef imageRef = [inputImage CGImage];

// Get the input image's dimensions and load it into a Metal texture

NSUInteger inputImageWidth = CGImageGetWidth(imageRef);

NSUInteger inputImageHeight = CGImageGetHeight(imageRef);

MTKTextureLoader *textureLoader = [[MTKTextureLoader alloc] initWithDevice:device];

NSError *err = nil;

inputTexture = [textureLoader newTextureWithCGImage:imageRef options:@{MTKTextureLoaderOptionSRGB: @NO} error:&err];
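// Loading with MTKTextureLoaderOptionSRGB:@NO requests the non-sRGB pixel format,
// so shader reads return the stored 0-1 values without sRGB-to-linear decoding
// (compare the pixel-format log below).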

//inputTexture.pixelFormat=70;

NSLog(@"inputTexture pixel format: %tu",inputTexture.pixelFormat); //81 = MTLPixelFormatBGRA8Unorm_sRGB, 80 = MTLPixelFormatBGRA8Unorm

NSAssert(!err, @"%@", err);

//################################################ PREPARING OUTPUT IMAGE HOLDER ####################################

MPSImageDescriptor *preprocess_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:inputImageWidth height:inputImageHeight featureChannels:3];

MPSTemporaryImage *preprocess_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:preprocess_OutputImageDescriptor];

MPSImageDescriptor *decToFloatConverter_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:inputImageWidth height:inputImageHeight featureChannels:3];

MPSTemporaryImage *decToFloatConverter_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:decToFloatConverter_OutputImageDescriptor];


MPSImageDescriptor *convolution2_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:inputImageWidth+40+40 height:inputImageHeight+40+40 featureChannels:16];

MPSTemporaryImage *convolution2_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:convolution2_OutputImageDescriptor];

MPSImageDescriptor *convolution5_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:390 height:273 featureChannels:32];

MPSTemporaryImage *convolution5_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:convolution5_OutputImageDescriptor];


MPSImageDescriptor *convolution8_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:195 height:137 featureChannels:64];

MPSTemporaryImage *convolution8_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:convolution8_OutputImageDescriptor];

MPSImageDescriptor *convolution16_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:193 height:135 featureChannels:3];

MPSTemporaryImage *convolution16_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:convolution16_OutputImageDescriptor];

MPSImageDescriptor *deprocess_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16 width:191 height:133 featureChannels:3];

MPSTemporaryImage *deprocess_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:deprocess_OutputImageDescriptor];

MPSImageDescriptor *floatToDecConverter_OutputImageDescriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatUnorm8 width:191 height:133 featureChannels:3];

MPSTemporaryImage *floatToDecConverter_OutputImage = [MPSTemporaryImage temporaryImageWithCommandBuffer:buffer imageDescriptor:floatToDecConverter_OutputImageDescriptor];

MPSImageDescriptor *debugTemp2Descriptor = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatUnorm8 width:191 height:133 featureChannels:3];

MPSImage *debugTemp2 = [[MPSImage alloc] initWithDevice:device imageDescriptor:debugTemp2Descriptor];
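// NOTE: debugTemp2 is a regular (persistent) MPSImage rather than an
// MPSTemporaryImage, so its contents remain valid after the command buffer
// completes and can be read back on the CPU for inspection.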

//################################################ PREPARING COMPONENT ###############################################

MPSImageConversion *decToFloatConverter=[[MPSImageConversion alloc] initWithDevice:device srcAlpha:MPSAlphaTypeNonPremultiplied destAlpha:MPSAlphaTypeNonPremultiplied backgroundColor:nil conversionInfo:nil];


MPSCNNConvolutionDescriptor* convolution2_descriptor = [MPSCNNConvolutionDescriptor cnnConvolutionDescriptorWithKernelWidth:9 kernelHeight:9 inputFeatureChannels:3 outputFeatureChannels:16];


ConvolutionDataSource *data_source_convolution2 = [[ConvolutionDataSource alloc] initWithWeight:const_cast<float*>(conv2weights) bias:const_cast<float*>(conv2biases) desc:convolution2_descriptor];

MPSCNNConvolution *convolution2 = [[MPSCNNConvolution alloc] initWithDevice:device weights:data_source_convolution2];

convolution2.accumulatorPrecisionOption = MPSNNConvolutionAccumulatorPrecisionOptionHalf;
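// MPSNNConvolutionAccumulatorPrecisionOptionHalf accumulates the convolution
// sums in float16; the alternative, MPSNNConvolutionAccumulatorPrecisionOptionFloat,
// accumulates in float32.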

convolution2.offset=(MPSOffset) { 0, 0, 0 };

convolution2.edgeMode=MPSImageEdgeModeZero;

MPSCNNConvolutionDescriptor *convolution5_descriptor = [MPSCNNConvolutionDescriptor cnnConvolutionDescriptorWithKernelWidth:3 kernelHeight:3 inputFeatureChannels:16 outputFeatureChannels:32];

convolution5_descriptor.strideInPixelsX=2;

convolution5_descriptor.strideInPixelsY=2;

ConvolutionDataSource *data_source_convolution5 = [[ConvolutionDataSource alloc] initWithWeight:const_cast<float*>(conv5weights) bias:const_cast<float*>(conv5biases) desc:convolution5_descriptor];

MPSCNNConvolution *convolution5 = [[MPSCNNConvolution alloc] initWithDevice:device weights:data_source_convolution5];

convolution5.accumulatorPrecisionOption = MPSNNConvolutionAccumulatorPrecisionOptionHalf;

convolution5.offset=(MPSOffset) { 0, 0, 0 };

convolution5.edgeMode=MPSImageEdgeModeZero;


MPSCNNConvolutionDescriptor *convolution8_descriptor = [MPSCNNConvolutionDescriptor cnnConvolutionDescriptorWithKernelWidth:3 kernelHeight:3 inputFeatureChannels:32 outputFeatureChannels:64];

convolution8_descriptor.strideInPixelsX=2;

convolution8_descriptor.strideInPixelsY=2;

ConvolutionDataSource *data_source_convolution8 = [[ConvolutionDataSource alloc] initWithWeight:const_cast<float*>(conv8weights) bias:const_cast<float*>(conv8biases) desc:convolution8_descriptor];

MPSCNNConvolution *convolution8 = [[MPSCNNConvolution alloc] initWithDevice:device weights:data_source_convolution8];

convolution8.accumulatorPrecisionOption = MPSNNConvolutionAccumulatorPrecisionOptionHalf;

convolution8.offset=(MPSOffset) { 0, 0, 0 };

convolution8.edgeMode=MPSImageEdgeModeZero;

MPSCNNConvolutionDescriptor *convolution16_descriptor = [MPSCNNConvolutionDescriptor cnnConvolutionDescriptorWithKernelWidth:3 kernelHeight:3 inputFeatureChannels:64 outputFeatureChannels:3 ];

ConvolutionDataSource *data_source_convolution16 = [[ConvolutionDataSource alloc] initWithWeight:const_cast<float*>(conv16weights) bias:const_cast<float*>(conv16biases) desc:convolution16_descriptor];

MPSCNNConvolution *convolution16 = [[MPSCNNConvolution alloc] initWithDevice:device weights:data_source_convolution16];

convolution16.offset=(MPSOffset) { 1, 1, 0 };
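// Note: unlike the earlier convolutions, this layer uses an offset of {1, 1, 0}
// rather than {0, 0, 0}.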

convolution16.accumulatorPrecisionOption = MPSNNConvolutionAccumulatorPrecisionOptionHalf;

convolution16.edgeMode=MPSImageEdgeModeZero;


MPSImageConversion *floatToDecConverter=[[MPSImageConversion alloc] initWithDevice:device srcAlpha:MPSAlphaTypeNonPremultiplied destAlpha:MPSAlphaTypeNonPremultiplied backgroundColor:nil conversionInfo:nil];

//################################################ ENCODING COMPONENT ###############################################

//################################## DECIMAL TO FLOAT16 #################################################

[decToFloatConverter encodeToCommandBuffer:buffer sourceTexture:inputTexture destinationTexture:decToFloatConverter_OutputImage.texture];

//################################## PREPROCESS #################################################

id <MTLComputeCommandEncoder> computeCE = [buffer computeCommandEncoder];

NSError *errors;

id <MTLFunction> preprocess_func = [library newFunctionWithName:@"preprocess"];

id <MTLComputePipelineState> pipelineState = [device newComputePipelineStateWithFunction:preprocess_func error:&errors];

[computeCE setComputePipelineState:pipelineState];

[computeCE setTexture:decToFloatConverter_OutputImage.texture atIndex:0];

[computeCE setTexture:preprocess_OutputImage.texture atIndex:1];

const auto& launchParamsPreprocess = spatialPointwiseKernelLaunchParams(pipelineState, preprocess_OutputImage);

[computeCE dispatchThreadgroups:launchParamsPreprocess.threadgroupsPerGrid threadsPerThreadgroup:launchParamsPreprocess.threadsPerThreadgroup];

_outputText.text=[NSString stringWithFormat:@"YESY:width:%tu,height:%tu",launchParamsPreprocess.threadsPerThreadgroup.width,launchParamsPreprocess.threadsPerThreadgroup.height];

[computeCE endEncoding];

decToFloatConverter_OutputImage.readCount=0;
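// MPS kernels decrement a temporary image's readCount automatically when they
// read it, but the custom compute encoder above does not, so readCount is
// zeroed by hand here to let MPS recycle the image's backing memory.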

//################################################ CONVOLUTION (2) ###############################################

[convolution2 encodeToCommandBuffer:buffer sourceImage:preprocess_OutputImage destinationImage:convolution2_OutputImage];

preprocess_OutputImage.readCount=0;


//################################################ CONVOLUTION (5) ###############################################

[convolution5 encodeToCommandBuffer:buffer sourceImage:convolution2_OutputImage destinationImage:convolution5_OutputImage];

convolution2_OutputImage.readCount=0;


//################################################ CONVOLUTION (8) ###############################################

[convolution8 encodeToCommandBuffer:buffer sourceImage:convolution5_OutputImage destinationImage:convolution8_OutputImage];

convolution5_OutputImage.readCount=0;


//################################## CONVOLUTION (16) #################################################

[convolution16 encodeToCommandBuffer:buffer sourceImage:convolution8_OutputImage destinationImage:convolution16_OutputImage];

convolution8_OutputImage.readCount=0;

//################################## DEPROCESS #################################################

computeCE = [buffer computeCommandEncoder];

id <MTLFunction> deprocess_func = [library newFunctionWithName:@"deprocess"];

pipelineState = [device newComputePipelineStateWithFunction:deprocess_func error:&errors];

[computeCE setComputePipelineState:pipelineState];

[computeCE setTexture:convolution16_OutputImage.texture atIndex:0];

[computeCE setTexture:deprocess_OutputImage.texture atIndex:1];

const auto& launchParamsDeprocess = spatialPointwiseKernelLaunchParams(pipelineState, deprocess_OutputImage);

[computeCE dispatchThreadgroups:launchParamsDeprocess.threadgroupsPerGrid threadsPerThreadgroup:launchParamsDeprocess.threadsPerThreadgroup];

[computeCE endEncoding];

convolution16_OutputImage.readCount=0;


//################################## CONVERT FLOAT RGB to DEC #################################################

[floatToDecConverter encodeToCommandBuffer:buffer sourceTexture:deprocess_OutputImage.texture destinationTexture:debugTemp2.texture];

deprocess_OutputImage.readCount=0;

floatToDecConverter_OutputImage.readCount=0;

//################################## COMMITTING BUFFER & CALCULATING DURATION #################################################

[buffer commit];

[pendingBuffers addObject:buffer];

[pendingBuffers enumerateObjectsUsingBlock:^(id<MTLCommandBuffer> buffer, NSUInteger idx, BOOL *stop) {
    [buffer waitUntilCompleted];
}];

NSLog(@"Time: %g seconds", CACurrentMediaTime() - start);


[pendingBuffers enumerateObjectsUsingBlock:^(id<MTLCommandBuffer> buffer, NSUInteger idx, BOOL *stop) {
    UIImage *mya = [self imageWithMTLTexture:debugTemp2.texture];
    resultImageView.image = mya;
}];
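For reference, every convolution above requests half-precision accumulation (MPSNNConvolutionAccumulatorPrecisionOptionHalf), and accumulation precision is one variable that could plausibly behave differently across GPU families. A minimal A/B sketch that re-runs the same pipeline with full-precision accumulators, in case someone wants to rule that out (the allConvolutions array is purely illustrative):

NSArray<MPSCNNConvolution *> *allConvolutions = @[convolution2, convolution5, convolution8, convolution16];

for (MPSCNNConvolution *conv in allConvolutions) {
    // Request float32 accumulation and check whether the A10/A11
    // divergence shrinks or disappears.
    conv.accumulatorPrecisionOption = MPSNNConvolutionAccumulatorPrecisionOptionFloat;
}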

Replies

How different are the outputs? Just a little bit (several places after the decimal point) or wildly different?
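A sketch of one way to quantify this: read the final Unorm8 texture back on each device and diff the raw byte dumps. Here bytesB stands in for the buffer saved from the other device; the 4-bytes-per-pixel assumption reflects MPS padding a 3-channel Unorm8 image to an RGBA8 texture:

NSUInteger texWidth = debugTemp2.texture.width;
NSUInteger texHeight = debugTemp2.texture.height;
NSUInteger bytesPerRow = texWidth * 4; // RGBA8: 4 bytes per pixel
NSUInteger byteCount = bytesPerRow * texHeight;

// Copy the texture contents into CPU memory.
uint8_t *bytesA = (uint8_t *)malloc(byteCount);
[debugTemp2.texture getBytes:bytesA bytesPerRow:bytesPerRow fromRegion:MTLRegionMake2D(0, 0, texWidth, texHeight) mipmapLevel:0];

uint8_t *bytesB = NULL; // load the dump saved from the other device here

// Report the largest per-byte difference between the two dumps.
uint8_t maxDiff = 0;
for (NSUInteger i = 0; i < byteCount; i++) {
    int d = ABS((int)bytesA[i] - (int)bytesB[i]);
    if (d > maxDiff) maxDiff = (uint8_t)d;
}
NSLog(@"Max per-byte difference: %u", maxDiff);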