About a month ago, the Agoda Careers team contacted me to ask what it is about Agoda’s culture that impresses me. I immediately thought of Agoda’s Fail Fast practice.
During my time at Agoda, I worked in the Core Engineering Team that takes care of the frontend performance and availability of the website.
With millions of users each day and the amount of money Agoda could lose each second the system is down, it seems like the weight of our responsibility shouldn’t allow for failure at all. How then could we “fail like a boss” if the stakes are so high? Let me tell you how.
Start by trying to understand the proposition
At Agoda, before we got started on any project, we needed to be able to answer specific questions like:
- What is the business impact? Since our company is not a charity, what value are we creating for our company and our users?
- How can we measure that business impact? Normally, Agoda already has a set of metrics for measuring business impact. When the existing metrics cannot capture it, we design alternative ways to measure it.
- How can we tell whether the project is a success or a failure?
- What is a system design that will prevent damage from cascading to other business elements?
Regardless of the team management style (some teams use Scrum, some are Kanban believers, and some teams even use Waterfall!), the first step we all agreed on was to use a feedback loop to understand the business and technical value of our work and the risks associated with it. Once we had everything we needed in the feedback loop, we could all put our minds at ease.
Use the feature toggle and experiment
While working at Agoda, one of the many things you will come across is the use of feature toggles and experiments to keep our development fail-safe.
Image credit: https://martinfowler.com/articles/feature-toggles.html
Using a feature toggle can be best explained using the above picture. The green boxes are the code blocks that will be executed, the orange lines are the execution pipeline connecting each code block together, and the blue box is the execution path. The switch in yellow can alter the execution path.
The practice we tried to establish was to deploy a placeholder for the feature toggle first, regardless of whether the feature was complete, so that the toggle was already in place and ready to control the feature.
The next step is to add a measurement metric into each execution path so that we can measure whether our job is successful or not.
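To make the last two steps concrete, here is a minimal sketch in Python. It is not Agoda's actual tooling: the toggle name `new-search-ranking`, the in-memory `TOGGLES` and `METRICS` stores, and the `rank_results` function are all hypothetical, and a real system would read toggles from a toggle service and send counters to a metrics backend.

```python
from collections import Counter

# Hypothetical in-memory stores, for illustration only.
TOGGLES = {"new-search-ranking": False}  # placeholder toggle, deployed switched off
METRICS = Counter()                      # one counter per execution path


def is_enabled(name):
    """Treat unknown toggles as off, so the existing path stays the default."""
    return TOGGLES.get(name, False)


def rank_results(results):
    if is_enabled("new-search-ranking"):
        METRICS["rank.new_path"] += 1   # measure the new execution path
        return sorted(results)          # stand-in for the unfinished feature
    METRICS["rank.old_path"] += 1       # measure the existing execution path
    return list(results)                # existing behaviour, unchanged


rank_results(["b", "a"])                # toggle off: old path runs
TOGGLES["new-search-ranking"] = True    # flip the switch, no redeploy needed
rank_results(["b", "a"])                # toggle on: new path runs
```

Because each path records its own counter, the two paths can be compared against each other, and switching the toggle off again restores the old behaviour without a new deployment.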
Before deploying, run automated tests
One of the principles we held on to religiously was to always write a test (if you really want to sleep soundly at night). We also had one unspoken rule: if you didn’t write an automated test, you would do the cleanup when somebody eventually broke it.
CI/CD at Agoda, especially the orchestration tools developed by the Core Engineering Team, allowed us to do just that and gave us more confidence in deployment, although this by no means guarantees that nothing will fail in production. ;p
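As a rough illustration of what such a test might look like for toggled code, the sketch below uses Python's standard `unittest` module to cover both sides of a toggle. The `greeting` function and its texts are hypothetical and only stand in for a real feature.

```python
import unittest


def greeting(new_banner_enabled):
    """Hypothetical toggled function: the new text appears only when the toggle is on."""
    return "Welcome back!" if new_banner_enabled else "Welcome"


class GreetingToggleTest(unittest.TestCase):
    def test_toggle_off_keeps_existing_behaviour(self):
        self.assertEqual(greeting(False), "Welcome")

    def test_toggle_on_uses_new_behaviour(self):
        self.assertEqual(greeting(True), "Welcome back!")


def run_suite():
    """Run both tests and return the result, as a CI step might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GreetingToggleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing the "off" path matters as much as the "on" path, since switching the toggle off is the fallback when the new feature misbehaves.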
Fail for real!!!
At Agoda, a mistake does not mean a lost career, but rather an opportunity to learn. Given the demands on the system, the potential impact of failure is huge. However, because Agoda’s engineering practices and culture are truly data-driven, my experience was that we were allowed to fail multiple times, which in turn gave us the opportunity to learn from our mistakes and make improvements in each and every feedback loop.
Anyone interested in trying to fail like a boss at Agoda should visit https://careersatagoda.com and apply.
Happy coding, everyone!
Mahasak Pijittum is an Agoda alumnus. During his time at Agoda, he was a Senior Software Engineer in the Tech department. Read more about Mahasak’s experience at Agoda in his full Thai language article on Medium here.