Since I started working at Slang Labs, I have been part of the Android team. My work typically involves feature additions, optimizations, and building out specific customer requirements in the Android SDK.
“With great feature additions, comes the need for loads of Analytics.”
With each newly added feature came the need to add Analytics to gauge its performance and continue the feedback loop of constant improvement of our SDK. Until recently, adding data points to track new features was an ad hoc affair. The steps we followed could roughly be summed up as follows:
And the Analytics codebase of our SDK could be summed up as a collection of listeners attached to feature usage, dumping events to our analytics endpoint whenever a callback was triggered.
Does this sound like something you do?
No? Great! On behalf of your Data Science team, I sincerely thank you.
Yes? You really should continue reading.
All was fine until a few months later, when I was asked to help the Data Science team with some work. Having always been interested in the field of Data Science, I agreed, since it was an excellent opportunity to gain first-hand experience of the inner workings of the product improvement lifecycle. I had no idea how hard my work was coming back to smack me in the face.
“It's called Karma. And it’s pronounced Ha-ha-ha.”
We had put zero thought into our data collection strategy up to this point, in pursuit of the Android team's target of "moving fast". A side effect of this was that we left the Data Science team terribly ill-equipped to answer our PM's question: "How well is this feature doing, and what can we do to improve?"
We realized that although we collected every data point, the collection was so haphazard and disorganized that making sense of that data was a Herculean task.
Imagine standing in a warehouse with a list of items you need to build a chair. You have been told everything is present in the warehouse, but you don’t know where exactly. There is no catalogue of stored items, just a long line of shelves with raw materials kept in random order.
Your customer keeps calling you about the date of delivery of the chair while the warehouse owner keeps insisting that they did an amazing job of storing everything you need.
This was precisely what our Data Science team was facing. And I was brought in to help because I was the one who dumped the data (or raw materials for our chair analogy) from the client SDK. So they needed me to make sense of “what went where”.
And each time I read the Android code and reported the data collection format, I was met with exasperated sighs and informed how it was a terrible approach and how costly it would be to query the data and perform the analysis.
There was no way around it; we had to redesign the Analytics module in our codebase. The data had to be organized, so the Data Science team could reliably provide insights without running after developers trying to make sense of the data. And having been down in the trenches with the Data Science team, I empathized with the scale of the task we dumped on them.
We broke down the redesign into simpler parts.
Rather than blindly dumping events to the server, there had to be an order to the data collection flow. We designed a finite state machine (FSM) that moves between well-defined sentinel states in response to analytics events, and data is dumped only when the FSM transitions between sentinel states. This was a departure from our earlier practice of firing an analytics event from every callback.
Let me explain with an example.
Imagine a login screen in your app. Instead of dumping a separate analytics event for each interaction, such as user.clicked.email_input, user.clicked.password_input, user.clicked.login_button, and user.clicked.close_button, create a sentinel state like user.processed.login. The state machine enters the sentinel state only after the user clicks login or closes the login screen. All events and data received between state transitions are buffered, and once the transition completes, the accumulated data is dumped to Analytics.
The sentinel states should always be designed based on user journeys rather than control flows since user journeys remain unchanged, barring minor modifications. However, the control flow can change based on code architecture, bug fixes or new functionalities.
This ensures that with each analytics event, all relevant information about the user journey is present in one place, even as the codebase evolves with feature additions and optimizations.
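A minimal Kotlin sketch of this idea, where the state names, event strings, and `flush` callback are illustrative placeholders rather than our actual SDK code:

```kotlin
// Sentinel states for a hypothetical login journey.
enum class LoginState { IDLE, LOGIN_IN_PROGRESS, LOGIN_PROCESSED }

// Buffers fine-grained UI events and flushes them only on a
// sentinel-state transition, instead of dumping each event as it occurs.
class LoginAnalyticsFsm(private val flush: (Map<String, Any>) -> Unit) {
    private var state = LoginState.IDLE
    private val buffer = mutableListOf<String>()

    // Called for every fine-grained interaction, e.g. "user.clicked.email_input".
    fun onEvent(event: String) {
        if (state == LoginState.IDLE) state = LoginState.LOGIN_IN_PROGRESS
        buffer.add(event)
    }

    // Called when the user journey completes: login succeeded or screen closed.
    fun onJourneyComplete(success: Boolean) {
        state = LoginState.LOGIN_PROCESSED
        flush(
            mapOf(
                "journey" to "user.processed.login",
                "success" to success,
                "events" to buffer.toList()
            )
        )
        buffer.clear()
        state = LoginState.IDLE
    }
}
```

The payoff is that `flush` fires exactly once per user journey with the full context attached, no matter how many intermediate callbacks the control flow produces.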
All the data you collect and dump should follow a predefined schema. This ensures three things:
Without a schema, a data scientist has to dig through the code (or have a developer do it for them), understand the raw data, how it is collected, and what it means, clean it, and figure out which business metrics they can reliably extract from it.
Without a well-defined schema, data metrics could change arbitrarily without the knowledge of all the parties involved, leading to poor or inaccurate insights which ultimately guide business decisions.
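One way to make the schema concrete on the client is to funnel every dump through a typed event with validation at the door. The field names and version format below are assumptions for illustration, not our actual schema:

```kotlin
// Illustrative event schema: every event we dump must carry these fields.
data class AnalyticsEvent(
    val journey: String,             // sentinel state, e.g. "user.processed.login"
    val timestampMs: Long,           // client-side epoch milliseconds
    val sdkVersion: String,          // lets analysts segment by release
    val payload: Map<String, String> // journey-specific key-value data
)

// Reject malformed events before they pollute the warehouse.
fun validate(event: AnalyticsEvent): Boolean =
    event.journey.isNotBlank() &&
    event.timestampMs > 0 &&
    event.sdkVersion.matches(Regex("""\d+\.\d+\.\d+"""))
```

Because the type system carries the schema, renaming or dropping a field becomes a compile-time event that every party can see in code review, rather than a silent change in the data.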
If you need to compute something, there are two ways to do it: either do it at the time of data collection or at the time of data query. If both options are available, you should prioritize computation during data collection.
This is because storage is far cheaper than computation on the cloud.
This can apply to multiple situations:
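For instance, a session-duration metric can be computed once on the client and stored alongside the raw timestamps, instead of being re-derived from those timestamps in every query. A sketch, with hypothetical field names:

```kotlin
// Compute derived metrics once, at collection time, so analysts can
// aggregate with a cheap column scan instead of re-deriving per query.
data class SessionEvent(
    val startMs: Long,
    val endMs: Long,
    val durationMs: Long // precomputed on the client: endMs - startMs
)

fun sessionEvent(startMs: Long, endMs: Long): SessionEvent =
    SessionEvent(startMs, endMs, durationMs = endMs - startMs)
```

The extra column costs a few bytes per row; recomputing the difference across billions of rows on every dashboard refresh costs query time and money.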
Even after defining a schema and a strict state machine for the Analytics module, the whole exercise would be futile if there is no way to enforce this.
I cannot stress this enough: you need unit tests for the code responsible for collecting and dumping the data. Because Analytics is usually an afterthought to feature additions, unit tests ensure you don't accidentally break the entire collection pipeline.
Client-side tests are the bare minimum, but beyond that, it is a great habit to also have integration tests which enforce the data collection rules at your server endpoint.
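Such a guard test can be as simple as asserting that every dumped payload carries the fields the schema requires. A plain-Kotlin sketch with `check` assertions standing in for a JUnit test, and illustrative field names:

```kotlin
// Pipeline guard: every dumped payload must contain the required fields.
fun requiredFieldsPresent(payload: Map<String, Any?>): Boolean {
    val required = listOf("journey", "timestampMs", "sdkVersion")
    return required.all { payload[it] != null }
}

// In a real Android project this would be a JUnit test method.
fun testDumpedPayloadHasRequiredFields() {
    val payload = mapOf(
        "journey" to "user.processed.login",
        "timestampMs" to 1700000000000L,
        "sdkVersion" to "3.2.1"
    )
    check(requiredFieldsPresent(payload)) { "payload missing required fields" }
    check(!requiredFieldsPresent(payload - "journey")) { "validator is too lax" }
}
```

Run against every feature branch in CI, a test like this catches a broken collection pipeline before it ships, rather than weeks later when an analyst finds a hole in the data.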
Once we completed our redesign, the entire data collection and analysis process became streamlined, and our Data Science team could process analysis requests reliably and consistently. In addition, the Android team no longer received complaints of poorly formatted data and was able to make feature additions and add data collection metrics with the assurance everything would be stored in a properly consumable format.
So let us take a look at the key learnings from this experience: