(Ab)using Picture in Picture to bring overlays to iOS
As iOS engineers, we often look enviously at the lovely things Android can do. One of those lovely things is the ability to show overlays.
Whether you call them ‘floating views’, the ability to ‘draw over other apps’ or ‘system alert windows’, most of us know overlays as the ‘chat heads’ that Facebook Messenger still uses.
But overlays aren’t only useful for chat notifications. They’re also useful for apps that hand off navigation to an external application such as Waze or Google Maps, but still need to present some information from the ‘origin’ app over the top of the navigation.
In our case, it’s really useful to show the full address of the delivery location (e.g. Flat 2003, 1 Pan Pen), because Google Maps and Waze will often merge this information, especially when we’re navigating to coordinates that we’ve reverse geocoded ourselves.
Unfortunately, there’s no equivalent on iOS. We could use push or local notifications, but they wouldn’t stay on screen unless the user explicitly went into their settings and made the banners persistent (this can’t be requested programmatically).
And even if we did it this way, they’d cover a key part of the navigation interface at the top of the screen and wouldn’t give us the flexibility to edit and update them.
A solution spotted in the wild!
Whilst running late for the office, I hopped onto a Google Hangout on my phone. Wanting to look at something in Safari while on the call, I left the app and a little Picture in Picture (PiP) overlay popped up.
‘Interesting’, I thought. The Google Hangouts app was presenting a custom view of multiple people’s video streams or initials, not just a single video, which is how PiP is usually used in YouTube or Apple’s TV app.
A little digging confirmed that this is indeed supported and that I’d seen it correctly! I also discovered that you can use the overlay without any video stream and instead pass in your own view to present, which can be as custom as you like.
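For reference, the public side of this looks roughly like the following minimal Swift sketch, which uses AVKit’s video-call content source (iOS 15+). `AVPictureInPictureController.ContentSource(activeVideoCallSourceView:contentViewController:)` pairs a view in your own UI with an `AVPictureInPictureVideoCallViewController`, whose view you can fill with anything you like. The class name `OverlayPiPPresenter` and the sizes are illustrative only, and the sketch assumes the app already has the Picture in Picture background mode enabled (audio session setup is omitted). Going by the experience described here, on iOS 15 nothing appears without the entitlement discussed next.

```swift
import AVKit
import UIKit

/// Illustrative helper (hypothetical name): shows a custom UIView in the PiP
/// window via the video-call content source. No AVPlayer or video stream is used.
final class OverlayPiPPresenter: NSObject, AVPictureInPictureControllerDelegate {
    private(set) var pipController: AVPictureInPictureController?

    /// `sourceView` is the view in your own UI that PiP animates from;
    /// `overlay` is the custom view to float over other apps.
    @available(iOS 15.0, *)
    @discardableResult
    func configure(sourceView: UIView, overlay: UIView) -> Bool {
        guard AVPictureInPictureController.isPictureInPictureSupported() else { return false }

        // The PiP window renders this view controller's view.
        let contentViewController = AVPictureInPictureVideoCallViewController()
        contentViewController.preferredContentSize = CGSize(width: 240, height: 120)

        // Pin the custom overlay to fill the PiP content view.
        let container = contentViewController.view!
        overlay.translatesAutoresizingMaskIntoConstraints = false
        container.addSubview(overlay)
        NSLayoutConstraint.activate([
            overlay.leadingAnchor.constraint(equalTo: container.leadingAnchor),
            overlay.trailingAnchor.constraint(equalTo: container.trailingAnchor),
            overlay.topAnchor.constraint(equalTo: container.topAnchor),
            overlay.bottomAnchor.constraint(equalTo: container.bottomAnchor),
        ])

        let source = AVPictureInPictureController.ContentSource(
            activeVideoCallSourceView: sourceView,
            contentViewController: contentViewController)
        let controller = AVPictureInPictureController(contentSource: source)
        // Let PiP start automatically when the user leaves the app.
        controller.canStartPictureInPictureAutomaticallyFromInline = true
        controller.delegate = self
        pipController = controller
        return true
    }

    // Called if the system refuses to start PiP.
    func pictureInPictureController(_ pictureInPictureController: AVPictureInPictureController,
                                    failedToStartPictureInPictureWithError error: Error) {
        print("PiP failed to start: \(error)")
    }
}
```

Setting `canStartPictureInPictureAutomaticallyFromInline` mirrors the Hangouts behaviour described above, where the overlay appears as you leave the app; you can also call `startPictureInPicture()` yourself while the app is in the foreground.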
The downside? On iOS 15, this kind of PiP requires a special entitlement that Apple has to grant (it’s no longer needed on iOS 16). Unfortunately, the entitlement doesn’t cover our use case: it’s aimed purely at apps that need to use the camera in the background or while multitasking on iPad (as far as I’m aware, Zoom was the first app to make use of this, back when it was still a private entitlement).
So what now?
It wouldn’t be much fun if we just gave up there. All of the APIs are available even without the special entitlement, but when we tried to use them, no PiP overlay appeared. So we dug a little deeper to see how iOS decides whether or not to ‘allow’ it.
Going through the iOS frameworks bundled with Xcode, we saw that the framework that handles PiP passes information about the ‘content type’ to the daemon/service that then manages PiP across iOS.
This content type is stored, among other things, in a playback state object, `PGPlaybackState`, which on request reports any changes or differences from a previous playback state.
The content types I observed were ‘video on demand’ (for example, playing back a locally stored video), ‘live broadcast’, ‘security camera’, ‘video call’ and ‘screen sharing’.
Each content type results in different controls being shown when the PiP overlay is tapped, for example whether there’s a play/pause button or a skip button. The last one, ‘screen sharing’, is notable because, as far as I can see, it’s a private content type that isn’t exposed in the public documentation.
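To make the ‘diff from a previous playback state’ idea concrete, here is a toy model in Swift. It is not Apple’s `PGPlaybackState`, whose interface is private: the type names, the `isPlaying` field and the string keys are all hypothetical. It only illustrates why the diff matters: the content type travels inside it, so whatever produces the diff decides what the PiP service is told.

```swift
/// The five content types described in the text (`screenSharing` is the private one).
/// Names are hypothetical Swift spellings, not Apple's identifiers.
enum PiPContentType: String {
    case videoOnDemand, liveBroadcast, securityCamera, videoCall, screenSharing
}

/// Hypothetical stand-in for a playback state object (NOT Apple's PGPlaybackState).
struct PlaybackStateModel {
    var contentType: PiPContentType
    var isPlaying: Bool

    /// Reports only the fields that differ from `previous`.
    /// With no previous state, every field counts as changed.
    func diff(from previous: PlaybackStateModel?) -> [String: String] {
        var changes: [String: String] = [:]
        if previous?.contentType != contentType {
            changes["contentType"] = contentType.rawValue
        }
        if previous?.isPlaying != isPlaying {
            changes["isPlaying"] = String(isPlaying)
        }
        return changes
    }
}

let start = PlaybackStateModel(contentType: .videoCall, isPlaying: true)
let paused = PlaybackStateModel(contentType: .videoCall, isPlaying: false)
print(paused.diff(from: start))  // ["isPlaying": "false"]
```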
A way in?!
If we could modify the content type we send to the PiP service, so that we ‘pretend’ to be a live broadcast rather than a video call, that might be our way around the entitlement checks.
Hooking into (read: swizzling) the method responsible for providing the content type, `diffFromPlaybackState:`, lets us change the content type that the PiP service thinks we’re providing.
Rather than being a video call, we can change our content type to be that of a live broadcast, and what do you know… it works!
The PiP service kindly still presents our custom view, even though a live broadcast would normally be shown in a video player, presumably because the view is handled separately from the content type (and its associated entitlement checks).
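For readers who haven’t swizzled before, here is a minimal Swift sketch of the general technique using the Objective-C runtime (`NSClassFromString`, `class_getInstanceMethod`, `method_setImplementation`, `imp_implementationWithBlock`). It deliberately doesn’t touch the real `PGPlaybackState`: that class’s return type and the key carrying the content type aren’t public, so the demo hooks a hypothetical stand-in, `DemoPlaybackState`, which returns an `NSDictionary` with an assumed `contentType` key. The helper name `hookObjectReturningMethod` is also made up for illustration.

```swift
import Foundation
import ObjectiveC

/// Hypothetical helper: replaces an Objective-C instance method of the shape
/// `- (id)name:(id)arg`, passing the original return value through `transform`.
/// Returns false if the class or selector isn't present at runtime.
@discardableResult
func hookObjectReturningMethod(className: String,
                               selectorName: String,
                               transform: @escaping (AnyObject?) -> AnyObject?) -> Bool {
    let selector = NSSelectorFromString(selectorName)
    guard let cls = NSClassFromString(className),
          let method = class_getInstanceMethod(cls, selector) else {
        return false
    }

    // Keep the original implementation so the hook can call through to it.
    typealias OriginalIMP = @convention(c) (AnyObject, Selector, AnyObject?) -> AnyObject?
    let original = unsafeBitCast(method_getImplementation(method), to: OriginalIMP.self)

    // A block used as an IMP receives `self` followed by the arguments (no `_cmd`).
    let replacement: @convention(block) (AnyObject, AnyObject?) -> AnyObject? = { receiver, argument in
        transform(original(receiver, selector, argument))
    }
    method_setImplementation(method,
                             imp_implementationWithBlock(unsafeBitCast(replacement, to: AnyObject.self)))
    return true
}

// Hypothetical stand-in for the private PGPlaybackState: the dictionary return
// type and the "contentType" key are assumptions, not known values.
@objc(DemoPlaybackState)
class DemoPlaybackState: NSObject {
    @objc(diffFromPlaybackState:)
    dynamic func diff(fromPlaybackState previous: DemoPlaybackState?) -> NSDictionary {
        return ["contentType": "videoCall"]
    }
}

let hooked = hookObjectReturningMethod(className: "DemoPlaybackState",
                                       selectorName: "diffFromPlaybackState:") { result in
    guard let diff = result as? NSDictionary,
          let rewritten = diff.mutableCopy() as? NSMutableDictionary else { return result }
    // Report a live broadcast instead of a video call.
    rewritten.setObject("liveBroadcast", forKey: "contentType" as NSString)
    return rewritten
}

let diff = DemoPlaybackState().diff(fromPlaybackState: nil)
print(hooked, diff["contentType"] ?? "missing")  // true liveBroadcast
```

Looking the class and selector up by name, and bailing out when they’re missing, matters for private classes, which can change or disappear between iOS releases.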