Faster way to total ushort4 components

In a fragment shader, an expression that adds together the four components of a ushort4 is the heaviest line in the profile.

Is there a faster way to do this?

The relevant lines from the shader are:
ushort4 iterationSamples = tex2D.gather(quadSampler, inFrag.m_TexCoord);
ushort totalIterations = (iterationSamples.x + iterationSamples.y) + (iterationSamples.z + iterationSamples.w); // 40% of shader time on this line


Device: Apple TV 4K running tvOS 13.4.6 (17L570)
Xcode: 11.5 (11E608c)
