Scaling developer experience: How we improved Android Studio in a large monorepo

Introduction Long integrated development environment (IDE) sync/indexing times can quietly erode developer productivity, making code navigation sluggish, spiking memory usage, and slowing down Jetpack Compose preview updates, turning the IDE into a bottleneck rather than a helpf … | Continue reading


@engineering.grab.com | 2 days ago

Enhancing Flink Deployment with Shadow Testing

Introduction Ensuring the reliability of Apache Flink deployments in Grab is crucial for the availability of our business-critical, real-time applications. While all applications are tested in a staging environment before getting promoted to the production environment, there is … | Continue reading


@engineering.grab.com | 10 days ago

Data Mesh at Grab Part II: The Foundational Tools behind Certification

Introduction In Part I, we discussed why Grab is investing in a data mesh, referred to as the Signals Marketplace within Grab, as part of our evolving data culture. We also explained how data certification aids teams in reliably reusing data across different domains. However, cu … | Continue reading


@engineering.grab.com | 17 days ago

Record, generate, run: AI-powered UI test generation for iOS

Introduction In our recent AutoTrack SDK blog post, we shared how we solved the challenge of capturing complete user journeys across our mobile app. One of the most promising applications we highlighted was automating iOS UI (User Interface) test case generation using the rich i … | Continue reading


@engineering.grab.com | 25 days ago

From firefighting to building: How AI agents restored our team’s core productivity

Abstract Grab’s Analytics Data Warehouse (ADW) team supports over 1,000 users each month and manages an extensive repository of more than 15,000 tables, which powers approximately 50% of all queries within our data lake. However, the manual process of addressing “quick questions … | Continue reading


@engineering.grab.com | 1 month ago

Enabling R8 optimization at scale with AI-assisted debugging

Grab is Southeast Asia’s leading superapp, providing a suite of services that bring essential needs to users throughout the region. Its offerings include ride-hailing, food delivery, parcel delivery, mobile payments, and more. With safety, efficiency, and user-centered design at … | Continue reading


@engineering.grab.com | 2 months ago

Reclaiming Terabytes: Optimizing Android image caching with TLRU

Introduction In a previous post, we discussed Project Bonsai, our initiative to reduce the Grab app’s download size. We successfully reduced the Android Application Package (APK) download size by 26%. This reduction offers a substantial advantage: it minimizes download friction, … | Continue reading


@engineering.grab.com | 2 months ago

Cursor at Grab: Adoption and impact

Adoption overview The illustration below encapsulates how Cursor is scaled across Grab, achieving rapid and widespread adoption that accelerated software development and empowered non-technical teams to build solutions. Figure 1: Adoption overview of AI tool Cursor in Grab. … | Continue reading


@engineering.grab.com | 3 months ago

Docker lazy loading at Grab: Accelerating container startup times

Introduction At Grab, we’ve been exploring ways to dramatically reduce container startup times for our data platforms. Large container images for services like Airflow and Spark Connect were taking minutes to download, causing slow cold starts and poor auto-scaling performance. … | Continue reading


@engineering.grab.com | 3 months ago

From deployment slop to production reality: How BriX bridges the gap with enterprise-grade AI infrastructure

Abstract You’ve vibe-coded an AI assistant that’s a game-changer for your team. It works perfectly on your laptop. But when you try to deploy it company-wide, everything falls apart. This is what is known as “deployment slop”—the messy reality when quick AI prototypes hit the e … | Continue reading


@engineering.grab.com | 4 months ago

Kinabalu AI SRE - Leveraging AI for scalable diagnostics and alert management (Part 1)

Introduction If you’ve ever been on-call during an outage, you know the drill: a flood of alerts, five dashboards open, logs streaming from different places, a dozen threads in Slack, and still no clear picture. Context-switching kills velocity, and “where do I even start?” beco … | Continue reading


@engineering.grab.com | 4 months ago

Demystifying user journeys: Revolutionizing troubleshooting with auto tracking

Introduction Troubleshooting critical issues by deciphering a user’s journey on the Grab app is an extremely challenging task. With countless user journeys and multiple paths through the User Interface (UI), it’s akin to searching for a needle in a vast haystack. This challenge … | Continue reading


@engineering.grab.com | 4 months ago

How Grab is accelerating growth with real-time personalization using Customer Data Platform scenarios

Introduction Delivering personalized user experiences in real-time is central to Grab’s strategy, but achieving this at scale poses significant engineering challenges. Grab’s Customer Data Platform (CDP) and Growth team has successfully delivered several real-time campaigns, dri … | Continue reading


@engineering.grab.com | 5 months ago

A Decade of Defense: Celebrating Grab's 10th Year Bug Bounty Program

Introduction Ten years ago, we launched our bug bounty program in partnership with HackerOne. Beyond a security initiative, it represented an open invitation to collaborative development. As pioneers in Southeast Asia, we began the program with 23 initial researchers, and it has … | Continue reading


@engineering.grab.com | 5 months ago

Real-time data quality monitoring: Kafka stream contracts with syntactic and semantic test

Introduction In today’s data-driven landscape, monitoring data quality has become a critical need for ensuring reliable and efficient data usage across domains. High-quality data is the backbone of AI innovation, driving efficiency and unlocking new opportunities. As decentraliz … | Continue reading


@engineering.grab.com | 5 months ago

SpellVault’s evolution: Beyond LLM apps, towards the agentic future

Introduction At Grab, innovation isn’t just about building new features; it’s about evolving our platforms to meet the changing needs of our users and the broader technological landscape. SpellVault, our internal AI platform, exemplifies this philosophy. When SpellVault was firs … | Continue reading


@engineering.grab.com | 5 months ago

Grab's Mac Cloud Exit supercharges macOS CI/CD

Introduction In our mission to optimize continuous integration and delivery (CI/CD), we’ve taken a bold step by relocating our infrastructure from a cloud vendor in the US to a colocation cluster within Southeast Asia, closer to our Git server infrastructure. This change has dra … | Continue reading


@engineering.grab.com | 6 months ago

How We Built a Custom Vision LLM to Improve Document Processing at Grab

Introduction In the world of digital services, accurate extraction of information from user-submitted documents such as identification (ID) cards, driver’s licenses, and registration certificates is a critical first step for processes like electronic know-your-customer (eKYC). T … | Continue reading


@engineering.grab.com | 6 months ago

Machine-learning predictive autoscaling for Flink

Introduction As Grab transitions to derive more valuable insights from our wealth of operational data, we are witnessing a steep increase in stream-processing applications. Over the past year, the number of Flink applications grew 2.5 times, driven by interest in real-time strea … | Continue reading


@engineering.grab.com | 6 months ago

Modernising Grab’s model serving platform with NVIDIA Triton Inference Server

Introduction Catwalk is Grab’s machine learning (ML) model serving platform, designed to enable data scientists and engineers in deploying production-ready inference APIs. Currently, Catwalk powers hundreds of ML models and online deployments. To accommodate this growth, the pla … | Continue reading


@engineering.grab.com | 6 months ago

Highly concurrent in-memory counter in GoLang

Introduction Ah, the familiar beep beep beep, but don’t worry, it’s not your alarm coaxing you out of bed. No, this is far worse: the dreaded PagerDuty on-call alert! What’s the crisis this time? There appears to be an issue with high database CPU utilisation, overwhelmed by a fl … | Continue reading


@engineering.grab.com | 7 months ago

User foundation models for Grab

Introduction Artificial intelligence (AI) is central to Grab’s mission of delivering valuable, personalised experiences to millions of users across Southeast Asia. Achieving this requires a deep understanding of individual preferences, such as their favorite foods, relevant adve … | Continue reading


@engineering.grab.com | 7 months ago

Powering Partner Gateway metrics with Apache Pinot

Introduction Grab operates as a dynamic ecosystem involving partners and various service providers, necessitating real-time intelligence and decision-making for seamless integration and service delivery. To facilitate this, GrabDeveloper serves as Grab’s centralized platform for … | Continue reading


@engineering.grab.com | 7 months ago

Taming the monorepo beast: Our journey to a leaner, faster GitLab repo

At Grab, our engineering teams rely on a massive Go monorepo that serves as the backbone for a large portion of our backend services. This repository has been our development foundation for over a decade, but age brought complexity, and size brought sluggishness. What was once a … | Continue reading


@engineering.grab.com | 8 months ago

Data mesh at Grab part I: Building trust through certification

Introduction At Grab, our journey towards a more robust and scalable data ecosystem has been a continuous evolution. Considering the size of our data lake and complexity of our ecosystem, with businesses spanning across ride hailing, food delivery, and financial services, we have … | Continue reading


@engineering.grab.com | 9 months ago

The evolution of Grab's machine learning feature store

Introduction In this post, we outline how we transformed the way we serve data for our machine learning (ML) models and why we chose Amazon Aurora Postgres as the storage layer for our new feature store. At Grab, we have always been at the forefront of leveraging technology to en … | Continue reading


@engineering.grab.com | 9 months ago

Grab's service mesh evolution: From Consul to Istio

The challenge: When good enough isn’t good enough Picture this: It’s 2024, and Grab’s microservices ecosystem is thriving with over 1000 services running in different infrastructure. But behind the scenes, our service mesh setup is showing its age. We’re running Consul with a fal … | Continue reading


@engineering.grab.com | 10 months ago

DispatchGym: Grab’s reinforcement learning research framework

Introduction DispatchGym is a research framework designed to facilitate Reinforcement Learning (RL) studies and applications for the dispatch system, which matches bookings with drivers. The primary goal is to empower data scientists with a tool that allows them to independently … | Continue reading


@engineering.grab.com | 10 months ago

Counter Service: How we rewrote it in Rust

Abstract The Integrity Data Platform (IDP) team decided to rewrite one of our heavy Queries Per Second (QPS) Golang microservices in Rust. It resulted in 70% infrastructure savings at a similar performance, but was not without its pitfalls. This article will elaborate on: How we … | Continue reading


@engineering.grab.com | 11 months ago

The complete stream processing journey on FlinkSQL

Introduction In the fast-paced world of data analytics, real-time processing has become a necessity. Modern businesses require insights not just quickly, but in real-time to make informed decisions and stay ahead of the competition. Apache Flink has emerged as a powerful tool in … | Continue reading


@engineering.grab.com | 11 months ago

Effortless enterprise authentication at Grab: Dex in action

Introduction Grab, Southeast Asia’s leading superapp, has created many internal applications to support its diverse range of internal and external business needs. Authentication and authorisation serve as fundamental components of application development, as robust identity and … | Continue reading


@engineering.grab.com | 11 months ago

From failure to success: The birth of GrabGPT, Grab’s internal ChatGPT

Introduction In March 2023, I embarked on a mission to explore the potential of Large Language Models (LLMs) within Grab. What started off as an attempt to solve a specific problem, reducing the burden on our ML Platform team’s support channels, ended up becoming something much bi … | Continue reading


@engineering.grab.com | 12 months ago

Streamlining RiskOps with the SOP agent framework

Introduction In our previous introduction to the SOP-driven LLM Agent Framework, we discussed the potential of the LLM agent framework to revolutionise business operations. Now, we’re excited to explore a compelling use case: automating Account Takeover (ATO) investigati … | Continue reading


@engineering.grab.com | 1 year ago

Introducing the SOP-driven LLM agent frameworks

Introduction We’re excited to introduce an innovative Large Language Model (LLM) agent framework that reimagines how enterprises can harness the power of AI to streamline operations and boost productivity. At its core, this framework leverages Standard Operating Procedures (SOPs) … | Continue reading


@engineering.grab.com | 1 year ago

Evaluating performance impact of removing Redis-cache from a Scylla-backed service

Introduction At Grab, we operate a set of services that manage and provide counts of various items. While this may seem straightforward, the scale at which this feature operates—benefiting millions of Grab users daily—introduces complexity. This feature is divided into three micr … | Continue reading


@engineering.grab.com | 1 year ago

Facilitating Docs-as-Code implementation for users unfamiliar with Markdown

Introduction Although Grab is a tech company, not everyone is an engineer. Many team members don’t use GitLab daily, and Markdown’s quirks can be challenging for them. This made adopting the Docs-as-Code culture a hurdle, particularly for non-engineering teams responsible for key … | Continue reading


@engineering.grab.com | 1 year ago

Improving Hugo stability and addressing oncall challenges through automation

Introduction Hugo plays a pivotal role in enabling data ingestion for Grab’s data lake, managing over 4,000 pipelines onboarded by users. The stability of Hugo pipelines is contingent upon the health of both the data sources and various Hugo components. Given the complexity of th … | Continue reading


@engineering.grab.com | 1 year ago

Building a Spark observability product with StarRocks: Real-time and historical performance analysis

Introduction At Grab, we’ve been working to perfect our Spark observability tools. Our initial solution, Iris, was developed to provide a custom, in-depth observability tool for Spark jobs. As described in our previous blog post, Iris collects and analyses metrics and metadata at … | Continue reading


@engineering.grab.com | 1 year ago

We Cut GrabFood.com’s Page JavaScript Asset Sizes by 3x

Find out how the GrabFood team cut their bundle size by 3 times with these 7 webpack bundle optimisation strategies. | Continue reading


@engineering.grab.com | 3 years ago

We store and process millions of orders daily

The Grab Order Platform is a distributed system that processes millions of GrabFood or GrabMart orders every day. Learn about how the Grab order platform stores food order data to serve transactional (OLTP) and analytical (OLAP) queries. | Continue reading


@engineering.grab.com | 3 years ago

Grab's real-time event processing engine, Trident

Find out where the messages and rewards that arrive on your Grab app come from. Walk through scaling and processing optimizations that achieve tremendous throughput. | Continue reading


@engineering.grab.com | 5 years ago

How Grab built its in-house chat platform for the web

This blog post shares our learnings from building our very own chat platform for the web. | Continue reading


@engineering.grab.com | 5 years ago

We Prevented App Performance Degradation from Sudden Ride Demand Spikes

This blog addresses how engineers overcame the challenges Grab faced during the initial days due to a sudden spike in ride demand. | Continue reading


@engineering.grab.com | 6 years ago

Plumbing at Scale

This article details our journey building and deploying an event sourcing platform in Go, building a stream processing framework over it, and then scaling it (reliably and efficiently) to service over 300 billion events a week. | Continue reading


@engineering.grab.com | 6 years ago

About Being a Principal Engineer at Grab

Curious about what a Principal Engineer role at Grab entails? Our Principal Engineers' responsibilities include solving complex problems, taking care of the system-level architecture, collaborating with cross-functional teams, providing mentorship, and more. | Continue reading


@engineering.grab.com | 6 years ago

Catwalk: Serving Machine Learning Models at Scale

This blog post explains why and how we came up with a machine learning model serving platform to accelerate the use of machine learning in Grab. | Continue reading


@engineering.grab.com | 6 years ago