How to mitigate challenges in designing software at an early-stage startup
The great thing about starting a new project is that you get a clean slate. No baggage of design choices that you hated to look at every day in your last project. But how often have you seen a shiny new project avoid turning into the same intractable mess?
It is more likely to happen in a fast-paced startup. The faster the pace, the sooner it happens. So how do you move fast, avoid analysis paralysis, and still keep technical debt at a manageable level?
You design for change. Ignore the refrain that prevention is better than cure. Instead of preventing the mess, you should embrace it and mitigate it when it happens. That’s what we have done at Slang Labs.
In this article, I discuss:
In a startup, you may have some idea and intuition about the product you want to build. However, you have to iterate rapidly to achieve Product-Market Fit.
The three axioms of early-stage startups are:
The three axioms guide our engineering philosophy. Some of it may appear counter-intuitive, but it has proven effective in our experience.
Drawing on paper is cheaper than writing code. If building something will take a day, it is okay to spend 30 minutes thinking about it. If a task takes a week, you are better off investing half a day in thinking it through.
We design for what we want in the future in all its glory, but we implement only the parts that we need right now. That helps us keep an eye on the future without investing resources prematurely.
A design exercise helps us map the terrain, evaluate alternatives, and identify unknowns. We seek to limit the blast radius of a wrong choice:
Despite our best efforts, we may still have to re-implement significant parts of our system. So we invest in unit and integration test suites. Of course, it takes extra time and effort, but this safety net allows us to change code fearlessly and rapidly.
We have an automated CI/CD pipeline for effortless deployments and rollback on a Kubernetes cluster. The pipeline runs the necessary unit and integration tests for every code patch. It first deploys the new version of microservices on a staging tier, and if everything is okay, it promotes the patch and tags it as ready for prod deployment.
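The promotion flow described above can be sketched roughly as follows. This is only an illustration of the gating idea, not the actual pipeline: the `Patch` type, the stage functions, and the `ready-for-prod` tag name are all hypothetical, and real stages would run test suites and Kubernetes deployments rather than return constants.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Patch:
    """A code patch moving through the pipeline (hypothetical type)."""
    commit: str
    tags: List[str] = field(default_factory=list)


# A stage returns True when the patch passes it.
Stage = Callable[[Patch], bool]


def run_pipeline(patch: Patch, stages: List[Stage]) -> bool:
    """Run stages in order; tag the patch only if every stage passes."""
    for stage in stages:
        if not stage(patch):
            return False  # stop at the first failure; nothing is promoted
    patch.tags.append("ready-for-prod")  # hypothetical tag name
    return True


# Stand-ins for the real steps (tests, staging deployment, staging checks).
def unit_tests(patch: Patch) -> bool:
    return True


def integration_tests(patch: Patch) -> bool:
    return True


def deploy_to_staging(patch: Patch) -> bool:
    return True


def failing_staging_check(patch: Patch) -> bool:
    return False


good = Patch(commit="abc123")
run_pipeline(good, [unit_tests, integration_tests, deploy_to_staging])

bad = Patch(commit="def456")
run_pipeline(bad, [unit_tests, integration_tests, failing_staging_check])

print(good.tags, bad.tags)
```

Because the tag is applied only after every stage has passed, a patch that fails on staging never becomes a candidate for production.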
The Slang CONVA platform provides Voice Assistant as a Service (VAaaS) specialized for various domains (e.g., e-commerce). Slang has two main components:
This section briefly describes the high-level designs of both. That will set the context to show how our engineering philosophies have helped evolve our code. As requirements changed with better market understanding, these philosophies helped us manage technical debt without sacrificing velocity.
The Client SDK provides a simple programming model of the User Journeys and the App States:
The Client SDK has a state machine for voice-to-action flow and an event bus for communication between the state machine and various sub-systems. Each subsystem implements only the part of the state machine that is relevant to it.
The state machine and event bus design decouples all subsystems and eliminates the need for an all-knowing orchestrator that handles all possible permutations of user UI and speech actions, communications with backend services, and the conversation stage.
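To make the decoupling concrete, here is a minimal sketch of a state machine and subsystems that talk only through an event bus. It is an illustration under assumptions, not the Client SDK's code: the state names, event names, and the two subsystems are hypothetical, and the bus is a simple in-process queue.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus with ordered delivery."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, dict]] = deque()
        self._dispatching = False

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self._queue.append((event, payload))
        if self._dispatching:
            return  # the outermost publish() call drains the queue
        self._dispatching = True
        try:
            while self._queue:
                name, data = self._queue.popleft()
                for handler in list(self._handlers[name]):
                    handler(data)
        finally:
            self._dispatching = False


class StateMachine:
    """Owns the conversation state; knows nothing about the subsystems."""

    # (current state, event) -> next state. All names are hypothetical.
    TRANSITIONS = {
        ("idle", "mic_tapped"): "listening",
        ("listening", "utterance_captured"): "processing",
        ("processing", "response_ready"): "idle",
    }

    def __init__(self, bus: EventBus) -> None:
        self.state = "idle"
        self._bus = bus
        for name in {event_name for _, event_name in self.TRANSITIONS}:
            bus.subscribe(name, lambda data, e=name: self._on_event(e, data))

    def _on_event(self, event: str, data: dict) -> None:
        next_state = self.TRANSITIONS.get((self.state, event))
        if next_state is None:
            return  # event is not valid in the current state; ignore it
        self.state = next_state
        self._bus.publish("state_changed", {**data, "state": next_state})


class SpeechSubsystem:
    """Cares only about the 'listening' state."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        bus.subscribe("state_changed", self._on_state)

    def _on_state(self, data: dict) -> None:
        if data["state"] == "listening":
            # A real subsystem would capture audio; this fakes the result.
            self._bus.publish("utterance_captured", {"text": "show my orders"})


class ResponseSubsystem:
    """Cares only about the 'processing' state."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        bus.subscribe("state_changed", self._on_state)

    def _on_state(self, data: dict) -> None:
        if data["state"] == "processing":
            self._bus.publish("response_ready", {"reply": "Here are your orders"})


bus = EventBus()
machine = StateMachine(bus)
SpeechSubsystem(bus)
ResponseSubsystem(bus)

seen: List[str] = []
bus.subscribe("state_changed", lambda data: seen.append(data["state"]))

bus.publish("mic_tapped", {})
print(seen)  # ['listening', 'processing', 'idle']
```

Neither subsystem references the other or the state machine. Adding a new subsystem means subscribing one more handler, with no central orchestrator to update.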
States in the Client SDK State Machine represent the conversation stage in the life of the application:
Client SDK State Machine and all subsystems:
Each subsystem handles only one aspect, such as:
The state machine and subsystems interact with backend microservices through a utility layer.
While the Client SDK is responsible for easy-to-use programming APIs with a small footprint, it delegates tasks like automatic speech recognition (ASR), natural language understanding (NLU), Text-to-Speech (TTS), and Machine Translation (MT) to backend microservices.
A conversational voice assistant for a customer application can be created and configured using the Slang Console. Here is the life cycle of a voice assistant and the corresponding microservices:
Here is a summary of backend microservices:
The microservices are written in Python, JavaScript, and Go. The production deployment is on the Google Cloud Platform (GCP). In addition, we utilize GCP’s logging and monitoring services.
Following is a simplified version of our microservices and their interactions.
Today, we are all accustomed to “touch” gestures (e.g., tap, press and hold, swipe). The gestures are almost standard now. It wasn’t so when touch phones were invented. It took experimentation and implicit user training.
Voice is a newer medium of Computer-Human Interaction. Similar interaction standards for voice will evolve too. In the case of Slang VAaaS, it is a multi-modal experience. A user may interact through voice as well as touch.
Understanding user behavior is critical for our success. Therefore, we have devised In-app Voice Assistant Analytics. We collect a small number of key events from SDK related to voice-specific user interactions, such as:
We have a serverless analytics pipeline on Google Cloud:
The design explained above is neither the first design we created, nor will it be the last. It has evolved through several iterations:
Let’s examine how our philosophy has facilitated this evolution.
We drew a version of the microservices architecture at the very beginning, when we had only a demo app. We knew that we would build it only when we needed it. But drawing it made it easy to discuss, and it also kept us aware of future needs.
This approach of keeping an eye on the future led us to design a configurable plug-and-play system. Following our minimal implementation rule, some of our microservices were routing to 3rd party services. That allowed us to go to market quickly. Later, when we developed our own more suitable implementations, we swapped out these 3rd party services with zero impact on our customers.
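One common way to get this kind of swap is to have service code depend on a small interface and pick the implementation from configuration. The sketch below assumes a machine translation capability as the example; the `Translator` interface, both classes, the `PROVIDERS` registry, and the config key are hypothetical names for illustration, and the implementations return tagged strings instead of calling real services.

```python
from typing import Callable, Dict, Protocol


class Translator(Protocol):
    """The only contract the rest of the service depends on."""

    def translate(self, text: str, target_lang: str) -> str: ...


class ThirdPartyTranslator:
    """Stand-in for an adapter that forwards requests to an external API."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[3rd-party:{target_lang}] {text}"


class InHouseTranslator:
    """Stand-in for an implementation developed later."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[in-house:{target_lang}] {text}"


# Hypothetical registry: configuration value -> implementation factory.
PROVIDERS: Dict[str, Callable[[], Translator]] = {
    "third_party": ThirdPartyTranslator,
    "in_house": InHouseTranslator,
}


def make_translator(config: dict) -> Translator:
    """Pick the implementation named in the configuration."""
    return PROVIDERS[config["translator"]]()


def handle_request(translator: Translator, text: str) -> str:
    """Service code sees only the Translator interface."""
    return translator.translate(text, target_lang="hi")


before = handle_request(make_translator({"translator": "third_party"}), "hello")
after = handle_request(make_translator({"translator": "in_house"}), "hello")
print(before)
print(after)
```

Switching providers is a one-line configuration change here. `handle_request` and its callers stay untouched, which is what keeps such a swap invisible to customers.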
In low-level design, we are always mindful of the blast radius in case we have to change it radically. It goes hand-in-hand with “maximal design but minimal implementation.”
While the analytics pipeline was designed to hyper-scale (“maximal design”), we did not have enough traffic initially to justify the cost. We crafted a cheap ingestion and processing implementation using scheduled cloud functions. We knew that we would have to replace it with a different implementation. But the impact was limited to only one part of the pipeline.
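As a rough sketch of such a cheap first implementation, assume events are appended to a buffer and a function invoked on a schedule drains it and updates aggregate counts. Everything here is hypothetical: the in-memory list and dict stand in for real storage, and the event names are made up.

```python
from collections import Counter
from typing import Dict, List

# In-memory stand-ins for a raw-event buffer and an aggregates table.
raw_events: List[dict] = []
event_counts: Dict[str, int] = {}


def collect(event: dict) -> None:
    """Called once per SDK event; it only appends, so it stays cheap."""
    raw_events.append(event)


def scheduled_process() -> int:
    """Meant to be invoked by a scheduler; drains and aggregates the buffer."""
    batch = raw_events[:]
    raw_events.clear()
    counts = Counter(event["name"] for event in batch)
    for name, n in counts.items():
        event_counts[name] = event_counts.get(name, 0) + n
    return len(batch)  # number of events processed in this run


# Hypothetical event names.
collect({"name": "mic_tapped"})
collect({"name": "utterance_completed"})
collect({"name": "mic_tapped"})

processed = scheduled_process()
print(processed, event_counts)
```

Only `scheduled_process` knows that processing happens in batches. Replacing it with a streaming implementation later would leave `collect` and the readers of the aggregates unchanged, which is the limited blast radius described above.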
We could fearlessly redesign Client SDK and almost rewrite it because we had a battery of automated tests as a safety net.
Any organization trying to ship products fast and learn from customer response will face the same three axioms:
This article showed how a coherent set of philosophies for handling them has shaped our system’s architecture.