Introduction Long integrated development environment (IDE) sync/indexing times can quietly erode developer productivity, making code navigation sluggish, spiking memory usage, and slowing down Jetpack Compose preview updates, turning the IDE into a bottleneck rather than a helpf … | Continue reading
Introduction Ensuring the reliability of Apache Flink deployments in Grab is crucial for the availability of our business-critical, real-time applications. While all applications are tested in a staging environment before getting promoted to the production environment, there is … | Continue reading
Introduction In Part I, we discussed why Grab is investing in a data mesh, referred to as the Signals Marketplace within Grab, as part of our evolving data culture. We also explained how data certification aids teams in reliably reusing data across different domains. However, cu … | Continue reading
Introduction In our recent AutoTrack SDK blog post, we shared how we solved the challenge of capturing complete user journeys across our mobile app. One of the most promising applications we highlighted was automating iOS UI (User Interface) test case generation using the rich i … | Continue reading
Abstract Grab’s Analytics Data Warehouse (ADW) team supports over 1,000 users each month and manages an extensive repository of more than 15,000 tables, which powers approximately 50% of all queries within our data lake. However, the manual process of addressing “quick questions … | Continue reading
Grab is Southeast Asia’s leading superapp, providing a suite of services that bring essential needs to users throughout the region. Its offerings include ride-hailing, food delivery, parcel delivery, mobile payments, and more. With safety, efficiency, and user-centered design at … | Continue reading
Introduction In a previous post, we discussed Project Bonsai, our initiative to reduce the Grab app’s download size. We successfully reduced the Android Application Package (APK) download size by 26%. This reduction offers a substantial advantage: it minimizes download friction, … | Continue reading
Adoption overview The illustration below encapsulates how Cursor is scaled across Grab, achieving rapid and widespread adoption that accelerated software development and empowered non-technical teams to build solutions. Figure 1: Adoption overview of AI tool Cursor in Grab. … | Continue reading
Introduction At Grab, we’ve been exploring ways to dramatically reduce container startup times for our data platforms. Large container images for services like Airflow and Spark Connect were taking minutes to download, causing slow cold starts and poor auto-scaling performance. … | Continue reading
Abstract You’ve vibe-coded an AI assistant that’s a game-changer for your team. It works perfectly on your laptop. But when you try to deploy it company-wide, everything falls apart. This is what is known as “deployment slop”—the messy reality when quick AI prototypes hit the e … | Continue reading
Introduction If you’ve ever been on-call during an outage, you know the drill: a flood of alerts, five dashboards open, logs streaming from different places, a dozen threads in Slack, and still no clear picture. Context-switching kills velocity, and “where do I even start?” beco … | Continue reading
Introduction Troubleshooting critical issues by deciphering a user’s journey on the Grab app is an extremely challenging task. With countless user journeys and multiple paths through the User Interface (UI), it’s akin to searching for a needle in a vast haystack. This challenge … | Continue reading
Introduction Delivering personalized user experiences in real-time is central to Grab’s strategy, but achieving this at scale poses significant engineering challenges. Grab’s Customer Data Platform (CDP) and Growth team has successfully delivered several real-time campaigns, dri … | Continue reading
Introduction Ten years ago, we launched our bug bounty program in partnership with HackerOne. Beyond a security initiative, it represented an open invitation to collaborative development. As pioneers in Southeast Asia, we began the program with 23 initial researchers, and it has … | Continue reading
Introduction In today’s data-driven landscape, monitoring data quality has become a critical need for ensuring reliable and efficient data usage across domains. High-quality data is the backbone of AI innovation, driving efficiency and unlocking new opportunities. As decentraliz … | Continue reading
Introduction At Grab, innovation isn’t just about building new features; it’s about evolving our platforms to meet the changing needs of our users and the broader technological landscape. SpellVault, our internal AI platform, exemplifies this philosophy. When SpellVault was firs … | Continue reading
Introduction In our mission to optimize continuous integration and delivery (CI/CD), we’ve taken a bold step by relocating our infrastructure from a cloud vendor in the US to a colocation cluster within Southeast Asia, closer to our Git server infrastructure. This change has dra … | Continue reading
Introduction In the world of digital services, accurate extraction of information from user-submitted documents such as identification (ID) cards, driver’s licenses, and registration certificates is a critical first step for processes like electronic know-your-customer (eKYC). T … | Continue reading
Introduction As Grab transitions to derive more valuable insights from our wealth of operational data, we are witnessing a steep increase in stream-processing applications. Over the past year, the number of Flink applications grew 2.5 times, driven by interest in real-time strea … | Continue reading
Introduction Catwalk is Grab’s machine learning (ML) model serving platform, designed to enable data scientists and engineers in deploying production-ready inference APIs. Currently, Catwalk powers hundreds of ML models and online deployments. To accommodate this growth, the pla … | Continue reading
Introduction Ah, the familiar beep beep beep. But don’t worry, it’s not your alarm coaxing you out of bed. No, this is far worse: the dreaded PagerDuty on-call alert! What’s the crisis this time? There appears to be an issue with high database CPU utilisation, overwhelmed by a fl … | Continue reading
Introduction Artificial intelligence (AI) is central to Grab’s mission of delivering valuable, personalised experiences to millions of users across Southeast Asia. Achieving this requires a deep understanding of individual preferences, such as their favorite foods, relevant adve … | Continue reading
Introduction Grab operates as a dynamic ecosystem involving partners and various service providers, necessitating real-time intelligence and decision-making for seamless integration and service delivery. To facilitate this, GrabDeveloper serves as Grab’s centralized platform for … | Continue reading
At Grab, our engineering teams rely on a massive Go monorepo that serves as the backbone for a large portion of our backend services. This repository has been our development foundation for over a decade, but age brought complexity, and size brought sluggishness. What was once a … | Continue reading
Introduction At Grab, our journey towards a more robust and scalable data ecosystem has been a continuous evolution. Considering the size of our data lake and complexity of our ecosystem, with businesses spanning across ride hailing, food delivery, and financial services, we have … | Continue reading
Introduction In this post, we outline how we transformed the way we serve data for our machine learning (ML) models and why we chose Amazon Aurora Postgres as the storage layer for our new feature store. At Grab, we have always been at the forefront of leveraging technology to en … | Continue reading
The challenge: When good enough isn’t good enough Picture this: It’s 2024, and Grab’s microservices ecosystem is thriving with over 1000 services running in different infrastructure. But behind the scenes, our service mesh setup is showing its age. We’re running Consul with a fal … | Continue reading
Introduction DispatchGym is a research framework designed to facilitate Reinforcement Learning (RL) studies and applications for the dispatch system, which matches bookings with drivers. The primary goal is to empower data scientists with a tool that allows them to independently … | Continue reading
Abstract The Integrity Data Platform (IDP) team decided to rewrite one of our heavy Queries Per Second (QPS) Golang microservices in Rust. It resulted in 70% infrastructure savings at a similar performance, but was not without its pitfalls. This article will elaborate on: How we … | Continue reading
Introduction In the fast-paced world of data analytics, real-time processing has become a necessity. Modern businesses require insights not just quickly, but in real-time to make informed decisions and stay ahead of the competition. Apache Flink has emerged as a powerful tool in … | Continue reading
Introduction Grab, Southeast Asia’s leading superapp, has created many internal applications to support its diverse range of internal and external business needs. Authentication and authorisation serve as fundamental components of application development, as robust identity and … | Continue reading
Introduction In March 2023, I embarked on a mission to explore the potential of Large Language Models (LLMs) within Grab. What started off as an attempt to solve a specific problem, reducing the burden on our ML Platform team’s support channels, ended up becoming something much bi … | Continue reading
Introduction In our previous blog post introducing the SOP-driven LLM Agent Framework, we discussed the potential of the LLM agent framework to revolutionise business operations. Now, we’re excited to explore a compelling use case: automating Account Takeover (ATO) investigati … | Continue reading
Introduction We’re excited to introduce an innovative Large Language Model (LLM) agent framework that reimagines how enterprises can harness the power of AI to streamline operations and boost productivity. At its core, this framework leverages Standard Operating Procedures (SOPs) … | Continue reading
Introduction At Grab, we operate a set of services that manage and provide counts of various items. While this may seem straightforward, the scale at which this feature operates—benefiting millions of Grab users daily—introduces complexity. This feature is divided into three micr … | Continue reading
Introduction Although Grab is a tech company, not everyone is an engineer. Many team members don’t use GitLab daily, and Markdown’s quirks can be challenging for them. This made adopting the Docs-as-Code culture a hurdle, particularly for non-engineering teams responsible for key … | Continue reading
Introduction Hugo plays a pivotal role in enabling data ingestion for Grab’s data lake, managing over 4,000 pipelines onboarded by users. The stability of Hugo pipelines is contingent upon the health of both the data sources and various Hugo components. Given the complexity of th … | Continue reading
Introduction At Grab, we’ve been working to perfect our Spark observability tools. Our initial solution, Iris, was developed to provide a custom, in-depth observability tool for Spark jobs. As described in our previous blog post, Iris collects and analyses metrics and metadata at … | Continue reading
Find out how the GrabFood team cut their bundle size by 3 times with these 7 webpack bundle optimisation strategies. | Continue reading
The Grab Order Platform is a distributed system that processes millions of GrabFood or GrabMart orders every day. Learn about how the Grab order platform stores food order data to serve transactional (OLTP) and analytical (OLAP) queries. | Continue reading
Find out where the messages and rewards that arrive on your Grab app come from. Walk through scaling and processing optimizations that achieve tremendous throughput. | Continue reading
This blog post shares our learnings from building our very own chat platform for the web. | Continue reading
This blog post describes how engineers overcame the challenges Grab faced in its early days due to a sudden spike in ride demand. | Continue reading
This article details our journey building and deploying an event sourcing platform in Go, building a stream processing framework over it, and then scaling it (reliably and efficiently) to service over 300 billion events a week. | Continue reading
Curious about what a Principal Engineer role at Grab entails? Our Principal Engineers' responsibilities range from solving complex problems, taking care of the system-level architecture, collaborating with cross-functional teams, providing mentorship, and more. | Continue reading
This blog post explains why and how we came up with a machine learning model serving platform to accelerate the use of machine learning in Grab. | Continue reading