ML Education at Uber: Program Design and Outcomes

Introduction If you have read our previous article, ML Education at Uber: Frameworks Inspired by Engineering Principles, you have seen several examples of how Uber benefits from applying Engineering Principles to drive the ML Education Program’s content design and program framewo … | Continue reading


@eng.uber.com | 1 year ago

Supercharging A/B Testing at Uber

Introduction “Immensely laborious calculations on inferior data may increase the yield from 95 to 100 percent. A gain of 5 percent, of perhaps a small total. A competent overhauling of the process of collection, or of the experimental design, may often increase the yield ten- or … | Continue reading


@eng.uber.com | 1 year ago

Vertical CPU Scaling: Reduce Cost of Capacity and Increase Reliability

This blog post describes the implementation of an automated vertical CPU scaling system in which every storage workload running at Uber is allocated the ideal amount of cores. The framework is used today to right-size more than 500,000 Docker containers, and since its inception i … | Continue reading


@eng.uber.com | 1 year ago

LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving

G. P. Meyer, A. Laddha, E. Kee, C. Vallespi-Gonzalez, C. Wellington. In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of … | Continue reading


@eng.uber.com | 1 year ago

Uber Halved Go Monorepo CI Build Time

Painting the Picture Before 2021, Uber engineers would have to take quite a taxing journey to make a code change to the Go Monorepo. First, the engineer would make their changes on a local branch and put up a code revision to our internal code review system, Phabricator. Next, ou … | Continue reading


@eng.uber.com | 1 year ago

Data Race Patterns in Go (Uber Engineering Blog)

Uber has adopted Golang (Go for short) as a primary programming language for developing microservices. Our Go monorepo consists of about 50 million lines of code (and growing) and contains approximately 2,100 unique Go services (and growing). Go makes concurrency a first-class ci … | Continue reading


@eng.uber.com | 1 year ago

Uber’s Emergency Button and the Technologies Behind It

Safety has long been a top priority at Uber, as Uber’s CEO Dara Khosrowshahi wrote in ‘Raising the Bar on Safety’ in September 2018. In order to #StandForSafety, the team at Uber has rolled out a set of features, such as Safety Center, Trusted Contacts, and the in-app Emergency B … | Continue reading


@eng.uber.com | 2 years ago

Avoiding CPU Throttling in a Containerized Environment

At Uber, all stateful workloads run on a common containerized platform across a large fleet of hosts. Stateful workloads include MySQL®, Apache Cassandra®, ElasticSearch®, Apache Kafka®, Apache HDFS™, Redis™, Docstore, Schemaless, etc., and in many cases these workloads are co-lo … | Continue reading


@eng.uber.com | 2 years ago

One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet

Overview  Data access restrictions, retention, and encryption at rest are fundamental security controls. This blog explains how we have built and utilized open-sourced Apache Parquet™'s finer-grained encryption feature to support all 3 controls in a unified way. In particular, we … | Continue reading


@eng.uber.com | 2 years ago

Ballast: An Adaptive Load Test Framework

As Uber's architecture has grown to encompass thousands of interdependent microservices, we need to test our mission-critical components at max load in order to preserve reliability. Accurate load testing allows us to validate if a set of services are working at peak usage and op … | Continue reading


@eng.uber.com | 2 years ago

DeepETA: How Uber Predicts Arrival Times Using Deep Learning

At Uber, magical customer experiences depend on accurate arrival time predictions (ETAs). We use ETAs to calculate fares, estimate pickup times, match riders to drivers, plan deliveries, and more. Traditional routing engines compute ETAs by dividing up the road network into small … | Continue reading


@eng.uber.com | 2 years ago

Uber Radar: Intelligent Early Fraud Detection System with Humans in the Loop

Introduction Uber is a worldwide marketplace of services, processing thousands of monetary transactions every second. As a marketplace, Uber takes on all of the risks associated with payment processing. Uber partners who use the marketplace to provide services are paid for their … | Continue reading


@eng.uber.com | 2 years ago

We Saved 70K Cores Across 30 Mission-Critical Services

Introduction As part of Uber Engineering’s broad efforts to reach profitability, our team recently focused on reducing the cost of compute capacity by improving efficiency. Some of the most impactful work was around GOGC optimization. In this blog we want to share our experience w … | Continue reading


@eng.uber.com | 2 years ago

Crisp: Critical Path Analysis for Microservice Architectures

Uber’s backend is an exemplar of microservice architecture. Each microservice is a small, individually deployable program performing a specific business logic (operation). The microservice architecture is a type of distributed computing system, which is suitable for independent d … | Continue reading


@eng.uber.com | 2 years ago

Real-Time Exactly-Once Event Processing

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we leveraged open source technology to bui … | Continue reading


@eng.uber.com | 2 years ago

Uber Migrated Financial Data from DynamoDB to Docstore (Custom Db)

Introduction Each day, Uber moves millions of people around the world and delivers tens of millions of food and grocery orders. This generates a large number of financial transactions that need to be stored with provable completeness, consistency, and compliance. LedgerStore is … | Continue reading


@eng.uber.com | 2 years ago

Michelangelo: Uber's Machine Learning Platform

Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale. | Continue reading


@eng.uber.com | 2 years ago

Building Uber’s Fulfillment Platform for Planet-Scale Using Google Cloud Spanner

Introduction The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions (e.g., creating an … | Continue reading


@eng.uber.com | 2 years ago

Handling Flaky Unit Tests in Java

Introduction to Flaky Tests Unit testing forms the bedrock of any Continuous Integration (CI) system. It warns software engineers of bugs in newly-implemented code and regressions in existing code, before it is merged. This ensures increased software reliability. It also improves … | Continue reading


@eng.uber.com | 2 years ago

Enabling Seamless Kafka Async Queuing with Consumer Proxy

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. It empowers a large number of different workf … | Continue reading


@eng.uber.com | 2 years ago

Data Shapes the Uber Rider App

Introduction Data is crucial for our products. Data analytics help us provide a frictionless experience to the people that use our services. It also enables our engineers, product managers, data analysts, and data scientists to make informed decisions. The impact of data analysis … | Continue reading


@eng.uber.com | 2 years ago

Efficiently managing the supply and demand on Uber’s Big Data Platform

With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand (using supply to desc … | Continue reading


@eng.uber.com | 2 years ago

Cost-Efficient Open Source Big Data Platform at Uber

As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially and has thus become ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to reduce costs on our data platform, which d … | Continue reading


@eng.uber.com | 2 years ago

Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data

Introduction Big data is at the core of Uber’s business. We continue to innovate and provide better experiences for our earners, riders, and eaters by leveraging big data, machine learning, and artificial intelligence technology. As a result, over the last four years, the scale o … | Continue reading


@eng.uber.com | 2 years ago

Uber Achieves Operational Excellence in the Data Quality Experience

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly impa … | Continue reading


@eng.uber.com | 2 years ago

Real-time geospatial analytics at Uber

Introduction By its nature, Uber’s business is highly real-time and contingent upon geospatial data. Petabytes of data are continuously being collected from our drivers, riders, restaurants, and eaters. Real-time analytics over this geospatial data could provide powerful insights. In t … | Continue reading


@eng.uber.com | 2 years ago

Uber's Fulfillment Platform

Introduction to Fulfillment at Uber Uber’s mission is to help our consumers effortlessly go anywhere and get anything in thousands of cities worldwide. At its core, we capture a consumer’s intent and fulfill it by matching it with the right set of providers.  Fulfillment is the “ … | Continue reading


@eng.uber.com | 2 years ago

Containerizing Apache Hadoop Infrastructure at Uber

Introduction As Uber’s business grew, we scaled our Apache Hadoop (referred to as ‘Hadoop’ in this article) deployment to 21000+ hosts in 5 years, to support the various analytical and machine learning use cases. We built a team with varied expertise to address the challenges we … | Continue reading


@eng.uber.com | 2 years ago

Efficient and Reliable Compute Cluster Management at Scale

Introduction Uber relies on a containerized microservice architecture. Our need for computational resources has grown significantly over the years as a consequence of business growth. Increasing the efficiency of our computing resources is now an important goal. Broadly spe … | Continue reading


@eng.uber.com | 2 years ago

Tuning Model Performance

Introduction Uber uses machine learning (ML) models to power critical business decisions. An ML model goes through many experiment iterations before making it to production. During the experimentation phase, data scientists or machine learning engineers explore adding features, t … | Continue reading


@eng.uber.com | 2 years ago

Evolution of the Data Science Workbench

In October 2017, we published an article introducing Data Science Workbench (DSW), our custom, all-in-one toolbox for data science, complex geospatial analytics, and exploratory machine learning. It centralizes everything required to perform data preparation, ad-hoc analyses, mod … | Continue reading


@eng.uber.com | 2 years ago

Scaling of Uber's API Gateway

As a recap from the last article, Uber’s API Gateway provides an interface and acts as a single point of access for all of our back-end services to expose features and data to Mobile and 3rd party partners. Two major components for a system like API Gateway are configuration mana … | Continue reading


@eng.uber.com | 2 years ago

Uber Surge Pricing requires Real-time Streaming Analytics. There's a better way

Background Real-time data (number of ride requests, number of available drivers, weather, games) enables operations teams to make informed decisions about our services, such as surge pricing, maximum dispatch ETA calculation, and demand/supply forecasting, that improve user experiences on the Ub … | Continue reading


@eng.uber.com | 2 years ago

The Architecture of Uber’s API Gateway

API gateways have become an integral part of microservice architectures in recent years. An API gateway provides a single point of entry for all our apps and provides an interface to access data, logic, or functionality from back-end microservices. It also provides a centralized place to … | Continue reading


@eng.uber.com | 2 years ago

Uber Engineering Tricks of the Trade: Tuning JVM Memory for Large-Scale Services

Uber engineers share their learnings on how to tune a Java Virtual Machine so as to avoid long pauses and other issues with garbage collection. | Continue reading


@eng.uber.com | 2 years ago

Pprof++: A Go Profiler with Hardware Performance Monitoring

Motivation for a Better Go Profiler Golang is the lifeblood of thousands of Uber’s back-end services, running on millions of CPU cores. Understanding our CPU bottlenecks is critical, both for reducing service latencies and also for making our compute fleet efficient. The scale at … | Continue reading


@eng.uber.com | 2 years ago

Optimal Feature Discovery: Better, Leaner ML Models Through Information Theory

Introduction  Suppose you own a production ML model that already works reasonably well. You know that adding relevant and diverse sources of signal to your model is a sure way to boost performance, but finding new features that actually improve performance can be a slow and tedio … | Continue reading


@eng.uber.com | 2 years ago

Uber Freight

Intro Uber Freight was launched in 2017 to revolutionize the business of matching shippers and carriers in the huge and inefficient freight trucking industry (around $800B annual spend in the US). We believe, and have demonstrated, that a technology-first freight broker and marke … | Continue reading


@eng.uber.com | 3 years ago

Elastic Deep Learning: Introducing Horovod on Ray

Introduction In 2017, we introduced Horovod, an open source framework for scaling deep learning training across hundreds of GPUs in parallel.  At the time, most of the deep learning use cases at Uber were related to the research and development of self-driving vehicles, while in … | Continue reading


@eng.uber.com | 3 years ago

How Uber Deals with Large iOS App Size

The App Size Problem Uber’s iOS mobile Apps for Rider, Driver, and Eats are large in size. The choice of Swift as our primary programming language, our fast-paced development environment and feature additions, layered software and its dependencies, and statically linked platform … | Continue reading


@eng.uber.com | 3 years ago

Evolving Schemaless into a Distributed SQL Database

Introduction In 2016 we published blog posts (I, II) about Schemaless - Uber Engineering’s Scalable Datastore. We went over the design of Schemaless as well as explained the reasoning behind developing it. In this post today we are going to talk about the evolution of Schemaless … | Continue reading


@eng.uber.com | 3 years ago

Fast and Reliable Schema-Agnostic Log Analytics Platform

At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key value pairs, with which engineers can slice and dice their data to surface abnorma … | Continue reading


@eng.uber.com | 3 years ago

Uber at Scale: Improving Gairos Scalability/Reliability

Background Real-time data (number of ride requests, number of available drivers, weather, games) enables operations teams to make informed decisions about our services, such as surge pricing, maximum dispatch ETA calculation, and demand/supply forecasting, that improve user experiences on the Ub … | Continue reading


@eng.uber.com | 3 years ago

The journey towards metric standardization

At Uber, business metrics are vital for discovering insights about how we perform, gauging the impact of new products, and optimizing the decision making process. The use cases for metrics can range from an operations member diagnosing a fares issue at the trip level to a machine … | Continue reading


@eng.uber.com | 3 years ago

Monitoring data quality at scale with statistical modeling at Uber

Uber employs statistical modeling to find anomalies in data and continually monitor data quality. | Continue reading


@eng.uber.com | 3 years ago

Disaster Recovery for Multi-Region Kafka at Uber

Apache Kafka at Uber Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone to Uber’s technology stack and build a complex ec … | Continue reading


@eng.uber.com | 3 years ago

Uber’s Real-Time Push Platform

Uber builds multi-sided marketplaces handling millions of trips every day across the globe. We strive to build real-time experiences for all our users. The nature of real-time marketplaces makes them very lively. Over the course of a trip, there are multiple participants that can … | Continue reading


@eng.uber.com | 3 years ago