A discontinuity tells the system to reconstruct the rendering chain. As far as I understand it, this will cause a gap. I'm not certain whether 200 ms is the amount, but that sounds plausible. (Based on further conversations with other engineers, it seems you should be able to have no gap.) You are correct that you have to have discontinuities in the audio as well; otherwise bad things will happen.
I get the sense that your audio is really continuous (such as a song). One idea for avoiding the gap is to play two separate streams and sync them using the technique described in this year's Advances in HTTP Live Streaming WWDC talk. Under the Resources section on that page there is a link to sample code. This requires playback from your own app, since you have to sync up two separate AVPlayers.
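For reference, the sync technique boils down to starting both players against the same host clock time. A minimal sketch, assuming two hypothetical stream URLs (the sample code linked from the talk is the authoritative version):

```swift
import AVFoundation

// Hypothetical URLs for the two streams; substitute your own playlists.
let videoURL = URL(string: "https://example.com/main/prog_index.m3u8")!
let audioURL = URL(string: "https://example.com/music/prog_index.m3u8")!

let playerA = AVPlayer(url: videoURL)
let playerB = AVPlayer(url: audioURL)

// setRate(_:time:atHostTime:) requires automatic stalling to be disabled,
// so both players strictly follow the shared timeline.
playerA.automaticallyWaitsToMinimizeStalling = false
playerB.automaticallyWaitsToMinimizeStalling = false

// Pick a host time slightly in the future and start both players at it.
let now = CMClockGetTime(CMClockGetHostTimeClock())
let startHostTime = CMTimeAdd(now, CMTimeMakeWithSeconds(0.5, preferredTimescale: now.timescale))

// Passing .invalid for `time` means "start from each player's current item time".
playerA.setRate(1.0, time: .invalid, atHostTime: startHostTime)
playerB.setRate(1.0, time: .invalid, atHostTime: startHostTime)
```

In practice you'd also want both players preloaded (e.g. status of both items is `.readyToPlay`) before scheduling the shared start time.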
Feel free to put in a Radar or a DTS request about the gap. It may be that someone with more low-level experience than me can provide some advice.
Please put in a Radar with a sample stream so we can help you get to the bottom of this issue.