Play Services Complexity : Wires Are Obsolete

I was recently tasked with constructing an Android API that followed the design tenets of Google Play Services in form and function. Many who discover this API for the first time after using the Android framework for a while come away with a similar feeling: the surface area seems overly complex.

We have client objects, connection states, intent resolutions, and all sorts of administrative tasks to handle that simply aren't found in the core framework. It's not surprising that some would wonder how these two very different APIs for Android could come from the same source! After examining this architecture at a deeper level, I've come away with a newfound appreciation for the Play Services architecture.

Note: I do not work for Google, and Google Play Services is not open source. All that follows are observations from an outsider based on what is known about Android and what can be seen in the API surface area.

Core Framework vs. Play Services

Let's start by examining the design goals of Play Services in contrast with the platform code. At a high level, the relevant architectural issues at hand are:

Play Services is built into an application package (APK), rather than the framework core.
This application is updated automatically by the Play Store, outside of platform releases.
Client applications (to some extent) dictate the version of Play Services on the device.

In the Android framework, system services are registered at boot time with a registry component known as service manager, which tracks all services available to other applications. There is a strong assumption that framework services will always be available and the processes containing them will always be running. Even if they die, Android ensures they get restarted as quickly as possible. The framework, therefore, has very limited checks around service availability. Services are never unregistered, removed, or replaced at runtime.

Google Play Services, on the other hand, exposes services directly to client applications to be bound using intents. There's nothing terribly magical about this process. The same general capability is available to any application through bound services and AIDL in the SDK. At any moment, the APK containing Play Services may be updated. This can happen when a scheduled version upgrade rollout occurs, or even when a client application simply requests a higher version than is currently installed on the device. When this happens, all the existing service instances are replaced with new ones.

In short, the core framework is quite static at runtime, while Play Services is designed to be very dynamic.

This means that the traditional service registration architecture of the core framework is not equipped to handle cases where the services themselves may die or disappear for a period of time.
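To make the contrast concrete, here is a minimal sketch of the static registration model in plain Java. ServiceRegistry and its methods are hypothetical names for illustration, not framework code; the point is only that registration happens once and lookups carry no notion of connection state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a boot-time service registry; not framework code.
final class ServiceRegistry {
    private final Map<String, Object> services = new HashMap<>();
    private boolean bootComplete = false;

    // Registration is only allowed while the system is booting.
    void register(String name, Object service) {
        if (bootComplete) {
            throw new IllegalStateException("registry is fixed after boot");
        }
        services.put(name, service);
    }

    void finishBoot() {
        bootComplete = true;
    }

    // Callers assume the lookup succeeds; there is no connection state to track.
    Object getService(String name) {
        Object service = services.get(name);
        if (service == null) {
            throw new IllegalArgumentException("unknown service: " + name);
        }
        return service;
    }
}
```

A registry like this has no way to express "the service was here a moment ago but is being replaced", which is exactly the case Play Services has to handle.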

Dynamic Services and You!

What would happen in the scenario where an update triggers while we are connected to the location service, for example? The service process terminates while the update completes. Because of this, clients need to have a concept of connection state with the API. Client applications must be able to validate that they successfully reached the APK where the services live, and receive callbacks if the service terminates for any reason.

This is where the GoogleApiClient comes in. This top-level object not only manages authentication between the client application and Play Services, it also registers several callbacks that tell the application when the connection is broken. Some problems (like having the wrong version installed) can be resolved with the user's help, so an Intent is provided to trigger UI for resolution. However, in many cases the client simply must wait for a reconnection to try again.
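The callback flow can be sketched in plain Java as follows. ModelClient, its Callbacks interface, and the boolean parameters are all hypothetical; they loosely mirror the connected, suspended, and failed callbacks of the real client, and say nothing about how Play Services implements them internally.

```java
// Hypothetical model of a connection-aware client; not the Play Services implementation.
final class ModelClient {
    interface Callbacks {
        void onConnected();
        void onConnectionSuspended();
        // resolvable == true means user-facing UI could fix the problem (e.g. an update prompt).
        void onConnectionFailed(boolean resolvable);
    }

    private final Callbacks callbacks;
    private boolean connected = false;

    ModelClient(Callbacks callbacks) {
        this.callbacks = callbacks;
    }

    boolean isConnected() {
        return connected;
    }

    // serviceAvailable and versionOk stand in for checks a real client library would perform.
    void connect(boolean serviceAvailable, boolean versionOk) {
        if (!versionOk) {
            callbacks.onConnectionFailed(true);   // the user can update the services package
        } else if (!serviceAvailable) {
            callbacks.onConnectionFailed(false);  // nothing to do but retry later
        } else {
            connected = true;
            callbacks.onConnected();
        }
    }

    // Called when the remote process goes away, for example during an update.
    void simulateServiceDeath() {
        if (connected) {
            connected = false;
            callbacks.onConnectionSuspended();
        }
    }
}
```

The application only learns about a dropped service because it handed the client a callback object up front; without that, a failed call would be the first sign of trouble.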

Narrowing Connection Scope

Another seemingly awkward design decision in Play Services is the use of Api classes and the addApi() builder method. Why must we declare scope through the client before connecting?

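// Declare up front which APIs this client will use.
// castOptions is assumed to be a Cast.CastOptions built elsewhere; Cast.API requires one.
GoogleApiClient apiClient = new GoogleApiClient.Builder(this)
        .addApi(LocationServices.API)
        .addApi(Cast.API, castOptions)
        .build();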

This is partially due to the sheer number of APIs exposed by this one services package. Without some advance knowledge, the client library code would have to bind to every possible service during the connection phase just so they might be available if the client application needs them. Limiting scope reduces the number of IDL interfaces that actually get bound to each client instance. The GoogleApiClient object also becomes a convenient place to own each of those bound service interfaces, so that they can be easily and quickly invalidated on disconnect.
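A rough plain-Java model of that idea is below. ScopedClient, its string constants, and the placeholder Object bindings are assumptions made for illustration; the real library's binding mechanics are not public.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of scope narrowing; names are illustrative only.
final class ScopedClient {
    static final String LOCATION = "location";
    static final String CAST = "cast";
    static final String DRIVE = "drive";

    private final Set<String> requested;
    private final Map<String, Object> bound = new LinkedHashMap<>();

    private ScopedClient(Set<String> requested) {
        this.requested = requested;
    }

    static final class Builder {
        private final Set<String> apis = new LinkedHashSet<>();

        Builder addApi(String api) {
            apis.add(api);
            return this;
        }

        ScopedClient build() {
            return new ScopedClient(new LinkedHashSet<>(apis));
        }
    }

    // Bind only the interfaces declared through the builder.
    void connect() {
        for (String api : requested) {
            bound.put(api, new Object()); // placeholder for a bound IPC interface
        }
    }

    // The client owns every binding, so one call invalidates them all.
    void disconnect() {
        bound.clear();
    }

    int boundCount() {
        return bound.size();
    }

    boolean isBound(String api) {
        return bound.containsKey(api);
    }
}
```

A client built with LOCATION and CAST binds two interfaces and never touches DRIVE, and a single disconnect() drops everything it holds.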

Keeping Focus on Connection State

The interaction pattern with Play Services is also quite different from the core Android SDK. Core framework services are generally exposed through a "manager" interface (e.g. PackageManager) that is specific to the task of each service. This object is both the initial entry point into that service and the owner of the IPC interface necessary to access it.

With Play Services, however, each "service" is exposed as a static shell (e.g. Cast.CastApi or LocationServices.FusedLocationApi) that looks a bit more like a function table than a service instance. The GoogleApiClient is also never far away: it is a required parameter in almost every method call. Why?

This design ensures that the connection state is never too far removed from the client application code. If the client object were wrapped by the service instances or otherwise buried in the API, developers probably wouldn't hold a reference to it. When an unexpected service disconnect occurs, it is important to have the client handy so the application can reconnect manually when necessary. It also localizes all the state to a single location, rather than having connection state spread across each service endpoint.
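The "function table" shape can be sketched in a few lines of plain Java. ClientHandle and FakeLocationApi are hypothetical names, and the returned string is a placeholder value; the sketch only shows why passing the client into every call keeps its state in view.

```java
// Hypothetical model; ClientHandle and FakeLocationApi are illustrative names.
final class ClientHandle {
    private boolean connected;

    void connect() { connected = true; }
    void drop() { connected = false; }
    boolean isConnected() { return connected; }
}

final class FakeLocationApi {
    private FakeLocationApi() {} // no instances: the class is only a table of functions

    // Every call takes the client, so its state is checked at the point of use.
    static String getLastLocation(ClientHandle client) {
        if (!client.isConnected()) {
            throw new IllegalStateException("client is not connected");
        }
        return "37.42,-122.08"; // placeholder value for the sketch
    }
}
```

Because the endpoint holds no state of its own, the only object that can go stale is the client, and the caller is already holding it.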

The Classpath Problem

Despite all the user and developer advantages of this approach, dynamically updating your services code has one disadvantage: the client code doesn't get to live on the shared application classpath. The framework core and other built-in device libraries are part of the pre-defined classpath used by every application process. This means that only one instance of the library needs to exist (usually in /system/framework) and each process can share access to it.

Android has a solution, called SDK Add-ons, that extends this support to other libraries. This works well, so long as the library code is only updated when a platform OTA is distributed. In this way, no actual code is distributed or copied into the client application packages. It's all dynamically linked from the device system image.

However, updating outside this cycle means the client code cannot live on the system classpath. Instead, it must be copied into the client applications as a dependency. This adds to each client's dex method count and puts a separate copy of the same code into every client using the library, which is a large number of copies for a library like Play Services.

This is why Play Services will update automatically if an application requests an updated version. Each application process carries its own copy of the client code, and those copies may represent different versions, but there is only one instance of Play Services that has to service requests from them all. A complex API challenge to be sure, and one that the core framework doesn't have to handle.
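As a rough illustration of that constraint, the plain-Java sketch below models the version check. VersionCheck, its Result enum, and the integer version numbers are hypothetical; the comparison rule (the installed service must be at least as new as the client library) is my assumption about the behavior described above, not a documented algorithm.

```java
// Hypothetical model of version negotiation; the numbers and names are illustrative.
final class VersionCheck {
    enum Result { SUCCESS, UPDATE_REQUIRED }

    private VersionCheck() {}

    // clientLibraryVersion: version of the client code compiled into the app.
    // installedServiceVersion: version of the single services package on the device.
    static Result check(int clientLibraryVersion, int installedServiceVersion) {
        return installedServiceVersion >= clientLibraryVersion
                ? Result.SUCCESS
                : Result.UPDATE_REQUIRED;
    }

    // The one installed service must satisfy the newest client on the device.
    static int requiredServiceVersion(int... clientLibraryVersions) {
        int required = 0;
        for (int version : clientLibraryVersions) {
            required = Math.max(required, version);
        }
        return required;
    }
}
```

Under this model, three apps built against versions 6, 8, and 7 force the device to carry at least version 8, and the app built against 8 is the one that triggers the update when only 7 is installed.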

Summary

Google Play Services has a lot of moving parts because it has to handle a lot of dynamic use cases that the core framework doesn't need to take into account. If you are asked to produce an SDK that has similar design goals, don't be afraid to look at it as a template just because of the complexity. The more you unravel the problem, the more you might realize what they've done might be just enough.