We're integrating a web-based group calling application within a native iOS application and finding that every time a CallKit session gets fully established, the web-based media streams break, rendering as gray with no audio.
Up to iOS 18 we worked around it by not fulfilling the call start action, but that's no longer an option because the audio stopped being automatically redirected to the speaker. We would now need the CXProvider's didActivateAudioSession callback, but that would break the video.
The sample project loads up a simple webpage in a WKWebView which contains a video tag streaming the media from the device's camera.
At the same time, it sets up a new CallKit session by requesting and fulfilling a CXStartCallAction transaction.
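For readers less familiar with the flow being described, the request step can be sketched roughly as follows. This assumes a generic handle; `makeStartCallTransaction`, `OutgoingCallStarter`, and the handle value are hypothetical names, not taken from the sample project.

```swift
import CallKit
import Foundation

// Hypothetical helper: builds a transaction containing a single CXStartCallAction.
func makeStartCallTransaction(callUUID: UUID, handleValue: String) -> CXTransaction {
    let handle = CXHandle(type: .generic, value: handleValue)
    let startAction = CXStartCallAction(call: callUUID, handle: handle)
    return CXTransaction(action: startAction)
}

// Hypothetical wrapper around CXCallController for outgoing calls.
final class OutgoingCallStarter {
    private let callController = CXCallController()

    /// Asks the system to start a call. CallKit then delivers the action to
    /// provider(_:perform:) on the app's CXProviderDelegate, which must fulfill it.
    func start(handleValue: String) -> UUID {
        let callUUID = UUID()
        let transaction = makeStartCallTransaction(callUUID: callUUID, handleValue: handleValue)
        callController.request(transaction) { error in
            if let error = error {
                print("CXStartCallAction request failed: \(error)")
            }
        }
        return callUUID
    }
}
```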
You will notice that the media doesn't render and, if you follow the warnings we left in the sample, you will find that not fulfilling the CXStartCallAction fixes it.
Unfortunately, that's not a workaround we can use, as we need the CXProvider delegate to inform us about audio session changes so we can redirect the audio to the speaker (so the proximity sensor doesn't activate and locking the screen doesn't end the call).
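The delegate side of that flow, including the speaker routing the question depends on, could look roughly like this. `CallProviderDelegate` is a hypothetical name, and calling `overrideOutputAudioPort(.speaker)` inside `provider(_:didActivate:)` is assumed here as one common way to reach the speaker, not something taken from the sample project. Note that `action.fulfill()` is precisely the step that lets CallKit activate its own audio session.

```swift
import AVFoundation
import CallKit

// Hypothetical provider delegate illustrating fulfill + speaker routing.
final class CallProviderDelegate: NSObject, CXProviderDelegate {
    let provider: CXProvider

    init(configuration: CXProviderConfiguration = CXProviderConfiguration()) {
        provider = CXProvider(configuration: configuration)
        super.init()
        provider.setDelegate(self, queue: nil)  // nil queue: callbacks on the main queue
    }

    func providerDidReset(_ provider: CXProvider) {
        // Tear down any in-progress call state here.
    }

    func provider(_ provider: CXProvider, perform action: CXStartCallAction) {
        provider.reportOutgoingCall(with: action.callUUID, startedConnectingAt: nil)
        // Fulfilling the action is what allows CallKit to activate its audio session.
        action.fulfill()
    }

    func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
        // Route call audio to the built-in speaker instead of the receiver.
        do {
            try audioSession.overrideOutputAudioPort(.speaker)
        } catch {
            print("Speaker override failed: \(error)")
        }
    }

    func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {}
}
```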
Any insights or workarounds would be greatly appreciated.
Unfortunately, my answer here is that I think WKWebView and CallKit are architecturally incompatible and, to the extent anything works, that success is effectively "accidental", often a side effect of incorrectly using one of the APIs. Each API has two fundamental design characteristics that conflict with the other:
WKWebView
- WKWebView was designed primarily as a foreground API and has never really integrated background operation as an intended use. Note that Picture in Picture is a form of foreground usage.
- WKWebView's out-of-process rendering system means that audio playback actually occurs in a secondary process with its own audio session.
CallKit/PushKit
- CallKit/PushKit are specifically designed as "background" APIs. Their entire purpose is to wake the app for incoming calls in the background, which means they can launch your app into the background at ANY time, even in the most secure device state ("Prior to first unlock").
- This isn't obvious from a surface read of the API, but CallKit is an audio API (just a very specialized one). It has specific requirements about audio session configuration (like configuring before reporting the call) and session activation (don't activate the session yourself), because what CallKit actually does is modify your audio session into a specialized configuration that is different from the standard PlayAndRecord session.
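Applied to an incoming call, the "configure before reporting, never activate yourself" rule might look like the sketch below. It assumes a `.voiceChat` mode, a generic handle, and a video call; `reportIncomingCall` is a hypothetical helper name. This is also exactly the session CallKit will activate, which is what interrupts the WebView's separate audio session.

```swift
import AVFoundation
import CallKit

// Hypothetical helper: configure (but do not activate) the session, then report.
func reportIncomingCall(provider: CXProvider, callUUID: UUID, caller: String,
                        completion: @escaping (Error?) -> Void) {
    let session = AVAudioSession.sharedInstance()
    do {
        // Category/mode are set up front; the mode choice is an assumption.
        try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
    } catch {
        completion(error)
        return
    }
    // Deliberately no session.setActive(true): CallKit activates the session
    // itself and hands it back in provider(_:didActivate:).

    let update = CXCallUpdate()
    update.remoteHandle = CXHandle(type: .generic, value: caller)
    update.hasVideo = true  // assumption: the web call carries video
    provider.reportNewIncomingCall(with: callUUID, update: update, completion: completion)
}
```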
The problem here is that the conflict between these two architectures will basically create a nearly endless stream of failures. For example, receiving calls in the background is "standard" voip functionality, however:
- In my experience, it's difficult to get WKWebView into a fully functional state from a background launch.
- If you manage to get past that point, WKWebView shouldn't be able to activate a PlayAndRecord session from the background, as that capability is specifically restricted to CallKit (and the PTT framework).
- If you do manage it, it's typically because you distorted CallKit's audio session configuration in a way that means it's not ACTUALLY a correctly configured call session. That creates other odd side effects like interruption issues and/or a lower maximum volume.
However, the worst part of all this is that because of how the development process interacts with our background APIs, the typical experience of developers who try to get this working goes something like this:
- An initial prototype is built and some basic experimentation is done. The approach seems promising except for <some details>.
- Further testing and experimentation continues, but it never seems to QUITE work the way you'd expect.
What's happening here is that the initial prototype is almost always focused entirely on the foreground and/or tested through the debugger, both of which distort app behavior in ways that allow things to work that would otherwise fail. For example, WKWebView cannot activate a PlayAndRecord session in the background, but it can when your app is in the foreground, assuming CallKit isn't already active.
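Because foreground testing and the debugger both mask these failures, it can help to log the conditions that matter at the moment the page starts media, then run the app without Xcode attached and read the output in Console. A sketch, assuming a placeholder logging subsystem; `stateDescription` and `logWebMediaPreconditions` are hypothetical helper names.

```swift
import CallKit
import UIKit
import os

// Placeholder subsystem; replace with your own bundle-style identifier.
let mediaLog = Logger(subsystem: "com.example.webcall", category: "media")

// Maps the app state to a stable string for logging.
func stateDescription(_ state: UIApplication.State) -> String {
    switch state {
    case .active: return "active"
    case .inactive: return "inactive"
    case .background: return "background"
    @unknown default: return "unknown"
    }
}

// Logs whether the app is foregrounded and whether any CallKit call is live.
@MainActor
func logWebMediaPreconditions(callObserver: CXCallObserver) {
    let state = UIApplication.shared.applicationState
    let liveCalls = callObserver.calls.filter { !$0.hasEnded }
    mediaLog.notice("web media start: app=\(stateDescription(state), privacy: .public) liveCalls=\(liveCalls.count)")
}
```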
In any case, the assumption here is that if you can JUST sort out <some detail>, everything will work fine when, in fact, the opposite is true. Foreground operation is the easy part; background operation is where everything really starts to fall apart.
Moving to the specific issue you described here:
CallKit session gets fully established the web based media streams break, rendering as gray with no audio.
Yes. This is a DIRECT result of the second CallKit/PushKit point above (CallKit is an audio API). CallKit activated its own audio session inside your app, which interrupted the audio session of your WebView, just like it would interrupt Music.app or Voice Memos.
You then said:
... Up to iOS 18 we worked around it by not fulfilling the call start action
Failing to fulfill the start action is functionally the same as not using CallKit at all. The CallKit audio session never activated, so you're not actually in a functioning CallKit call. Delaying the fulfill basically leaves the call in a half-complete state.
Unfortunately, you can't simply leave the call in this state. Every CallKit action has a timeout, after which the action automatically fails. CXStartCallAction has one of the longest (600s), but this approach has always meant that your "call" could never last longer than 10 minutes.
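The timeout described above is exposed on every action as `CXAction.timeoutDate`, and `CXProviderDelegate` has a callback for actions that expire. A minimal sketch, assuming hypothetical names `secondsRemaining` and `TimeoutAwareProviderDelegate`:

```swift
import CallKit
import Foundation

/// Seconds left before CallKit fails an action (negative once the deadline passes).
func secondsRemaining(until timeoutDate: Date, now: Date = Date()) -> TimeInterval {
    timeoutDate.timeIntervalSince(now)
}

final class TimeoutAwareProviderDelegate: NSObject, CXProviderDelegate {
    func providerDidReset(_ provider: CXProvider) {}

    func provider(_ provider: CXProvider, perform action: CXStartCallAction) {
        // Every CXAction carries a deadline; for CXStartCallAction it is long but finite.
        print("CXStartCallAction deadline in ~\(Int(secondsRemaining(until: action.timeoutDate)))s")
        // Fulfilling promptly is what CallKit expects. Withholding this call only
        // leaves the call half-complete until timeoutDate, when CallKit fails it.
        action.fulfill()
    }

    func provider(_ provider: CXProvider, timedOutPerforming action: CXAction) {
        // Called when an action was neither fulfilled nor failed before its timeoutDate.
        print("CallKit action \(action.uuid) timed out")
    }
}
```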
In any case, here is the way I'd summarize all this:
- If you intend to support receiving calls from the background, then you need CallKit and you can't/shouldn't really use WKWebView. It just isn't going to work.
- If you only intend to support "foreground" calling (meaning the call always starts when the app is in the foreground), then you don't need CallKit. Just use WKWebView and the "audio"* background category.
* One somewhat subtle point about voip apps is that the "voip" background category is NOT what keeps voip apps awake on calls; the audio background category is.
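For the foreground-only option, a WKWebView set up for in-page camera and microphone calling might look like the sketch below. It assumes iOS 15 or later for the media-capture permission callback and uses a placeholder URL and host; the app would still need the "audio" entry in `UIBackgroundModes` plus `NSCameraUsageDescription` and `NSMicrophoneUsageDescription` in Info.plist. `WebCallViewController` is a hypothetical name.

```swift
import UIKit
import WebKit

final class WebCallViewController: UIViewController, WKUIDelegate {
    private var webView: WKWebView!

    override func viewDidLoad() {
        super.viewDidLoad()
        let config = WKWebViewConfiguration()
        config.allowsInlineMediaPlayback = true               // keep <video> inline, not fullscreen
        config.mediaTypesRequiringUserActionForPlayback = []  // let remote streams autoplay

        webView = WKWebView(frame: view.bounds, configuration: config)
        webView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        webView.uiDelegate = self
        view.addSubview(webView)
        webView.load(URLRequest(url: URL(string: "https://example.com/call")!))  // placeholder URL
    }

    // Grants camera/mic to the (placeholder) calling origin only, without a second prompt.
    @available(iOS 15.0, *)
    func webView(_ webView: WKWebView,
                 requestMediaCapturePermissionFor origin: WKSecurityOrigin,
                 initiatedByFrame frame: WKFrameInfo,
                 type: WKMediaCaptureType,
                 decisionHandler: @escaping (WKPermissionDecision) -> Void) {
        decisionHandler(origin.host == "example.com" ? .grant : .deny)
    }
}
```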
Note that "call notification" for the foreground-only case can be implemented without CallKit by using standard high-priority alert pushes.
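On the client, that alert-push route only needs ordinary notification permission and an APNs device token; the server would then send the invitation as an alert push (for example with `apns-push-type: alert` and `apns-priority: 10`). A minimal sketch, with `registerForCallNotifications` and `hexString(fromDeviceToken:)` as hypothetical helper names:

```swift
import UIKit
import UserNotifications

/// Requests alert permission, then registers with APNs so the server can send call
/// invitations as regular high-priority alert pushes (no PushKit/CallKit involved).
func registerForCallNotifications() {
    UNUserNotificationCenter.current().requestAuthorization(options: [.alert, .sound]) { granted, error in
        guard granted else {
            print("Notification permission denied: \(String(describing: error))")
            return
        }
        // registerForRemoteNotifications() must be called on the main thread.
        Task { @MainActor in
            UIApplication.shared.registerForRemoteNotifications()
        }
    }
}

/// Hex-encodes the token delivered to
/// application(_:didRegisterForRemoteNotificationsWithDeviceToken:) so it can be sent to the server.
func hexString(fromDeviceToken token: Data) -> String {
    token.map { String(format: "%02x", $0) }.joined()
}
```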
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware