In the AVAudioPlayerNode documentation, it says:
> [...] when playing file segments, the node will make sample rate conversions if necessary, but it's often preferable to configure the node's output sample rate to match that of the files and use a mixer to perform the rate conversion. When playing buffers, there's an implicit assumption that the buffers are at the same sample rate as the node's output format.

I want to understand why this is often preferable. Is it because the AVAudioMixerNode can "sum" multiple signals that share the same sample rate and then convert them all at once (as a single signal), so it's "lighter" than resampling multiple signals separately inside each AVAudioPlayerNode?
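For context, here is a minimal sketch of the setup the documentation describes: each player is connected to the mixer in its own file's format, so no per-player resampling occurs and the single mixer-to-output connection performs the one rate conversion. The file names are placeholders, and this assumes the files share a sample rate that differs from the hardware output.

```swift
import AVFoundation

let engine = AVAudioEngine()
let playerA = AVAudioPlayerNode()
let playerB = AVAudioPlayerNode()

// Placeholder file names -- substitute your own audio files.
let fileA = try AVAudioFile(forReading: URL(fileURLWithPath: "voiceA.caf"))
let fileB = try AVAudioFile(forReading: URL(fileURLWithPath: "voiceB.caf"))

engine.attach(playerA)
engine.attach(playerB)

// Connect each player to the mixer using the file's own processing
// format: the players then do no sample rate conversion themselves.
engine.connect(playerA, to: engine.mainMixerNode, format: fileA.processingFormat)
engine.connect(playerB, to: engine.mainMixerNode, format: fileB.processingFormat)

// The mixer sums the (same-rate) inputs and converts the mixed
// signal once on its connection to the output node.
try engine.start()
playerA.scheduleFile(fileA, at: nil)
playerB.scheduleFile(fileB, at: nil)
playerA.play()
playerB.play()
```

With this wiring, only one conversion happens (mixer output → hardware rate) instead of one per player, which is the saving the documentation appears to be pointing at.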