Introduction If you have read our previous article, ML Education at Uber: Frameworks Inspired by Engineering Principles, you have seen several examples of how Uber benefits from applying Engineering Principles to drive the ML Education Program’s content design and program framewo … | Continue reading
Introduction “Immensely laborious calculations on inferior data may increase the yield from 95 to 100 percent. A gain of 5 percent, of perhaps a small total. A competent overhauling of the process of collection, or of the experimental design, may often increase the yield ten- or … | Continue reading
This blog post describes the implementation of an automated vertical CPU scaling system in which every storage workload running at Uber is allocated the ideal amount of cores. The framework is used today to right-size more than 500,000 Docker containers, and since its inception i … | Continue reading
G. P. Meyer, A. Laddha, E. Kee, C. Vallespi-Gonzalez, C. WellingtonIn this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of … | Continue reading
Painting the Picture Before 2021, Uber engineers would have to take quite a taxing journey to make a code change to the Go Monorepo. First, the engineer would make their changes on a local branch and put up a code revision to our internal code review system, Phabricator. Next, ou … | Continue reading
Uber has adopted Golang (Go for short) as a primary programming language for developing microservices. Our Go monorepo consists of about 50 million lines of code (and growing) and contains approximately 2,100 unique Go services (and growing). Go makes concurrency a first-class ci … | Continue reading
Safety has long been a top priority at Uber, as Uber’s CEO Dara Khosrowshahi wrote in ‘Raising the Bar on Safety’ in September 2018. In order to #StandForSafety, the team at Uber has rolled out a set of features, such as Safety Center, Trusted Contacts, and the in-app Emergency B … | Continue reading
At Uber, all stateful workloads run on a common containerized platform across a large fleet of hosts. Stateful workloads include MySQL®, Apache Cassandra®, ElasticSearch®, Apache Kafka®, Apache HDFS™, Redis™, Docstore, Schemaless, etc., and in many cases these workloads are co-lo … | Continue reading
Overview Data access restrictions, retention, and encryption at rest are fundamental security controls. This blog explains how we have built and utilized open-sourced Apache Parquet™'s finer-grained encryption feature to support all 3 controls in a unified way. In particular, we … | Continue reading
As Uber's architecture has grown to encompass thousands of interdependent microservices, we need to test our mission-critical components at max load in order to preserve reliability. Accurate load testing allows us to validate if a set of services are working at peak usage and op … | Continue reading
At Uber, magical customer experiences depend on accurate arrival time predictions (ETAs). We use ETAs to calculate fares, estimate pickup times, match riders to drivers, plan deliveries, and more. Traditional routing engines compute ETAs by dividing up the road network into small … | Continue reading
Introduction Uber is a worldwide marketplace of services, processing thousands of monetary transactions every second. As a marketplace, Uber takes on all of the risks associated with payment processing. Uber partners who use the marketplace to provide services are paid for their … | Continue reading
Introduction As part of Uber engineering’s wide efforts to reach profitability, recently our team was focused on reducing cost of compute capacity by improving efficiency. Some of the most impactful work was around GOGC optimization. In this blog we want to share our experience w … | Continue reading
Uber’s backend is an exemplar of microservice architecture. Each microservice is a small, individually deployable program performing a specific business logic (operation). The microservice architecture is a type of distributed computing system, which is suitable for independent d … | Continue reading
Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we leveraged open source technology to bui … | Continue reading
Introduction Each day, Uber moves millions of people around the world and delivers tens of millions of food and grocery orders. This generates a large number of financial transactions that need to be stored with provable completeness, consistency, and compliance. LedgerStore is … | Continue reading
Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale. | Continue reading
Introduction The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions (e.g., creating an … | Continue reading
Introduction to Flaky Tests Unit testing forms the bedrock of any Continuous Integration (CI) system. It warns software engineers of bugs in newly-implemented code and regressions in existing code, before it is merged. This ensures increased software reliability. It also improves … | Continue reading
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. It empowers a large number of different workf … | Continue reading
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. It empowers a large number of different workf … | Continue reading
Introduction Data is crucial for our products. Data analytics help us provide a frictionless experience to the people that use our services. It also enables our engineers, product managers, data analysts, and data scientists to make informed decisions. The impact of data analysis … | Continue reading
With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand (using supply to desc … | Continue reading
As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially, and thus ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to reduce costs on our data platform, which d … | Continue reading
Introduction Big data is at the core of Uber’s business. We continue to innovate and provide better experiences for our earners, riders, and eaters by leveraging big data, machine learning, and artificial intelligence technology. As a result, over the last four years, the scale o … | Continue reading
Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly impa … | Continue reading
Introduction By its nature, Uber’s business is highly real-time and contingent upon geospatial data. PBs of data are continuously being collected from our drivers, riders, restaurants, and eaters. Real-time analytics over this geospatial data could provide powerful insights. In t … | Continue reading
Introduction to Fulfillment at Uber Uber’s mission is to help our consumers effortlessly go anywhere and get anything in thousands of cities worldwide. At its core, we capture a consumer’s intent and fulfill it by matching it with the right set of providers. Fulfillment is the “ … | Continue reading
Introduction As Uber’s business grew, we scaled our Apache Hadoop (referred to as ‘Hadoop’ in this article) deployment to 21000+ hosts in 5 years, to support the various analytical and machine learning use cases. We built a team with varied expertise to address the challenges we … | Continue reading
Introduction Uber relies on a containerized microservice architecture. Our need for computational resources has grown significantly over the years, as a consequence of business’ growth. It is an important goal now to increase the efficiency of our computing resources. Broadly spe … | Continue reading
Introduction Uber uses machine learning (ML) models to power critical business decisions. An ML model goes through many experiment iterations before making it to production. During the experimentation phase, data scientists or machine learning engineers explore adding features, t … | Continue reading
In October 2017, we published an article introducing Data Science Workbench (DSW), our custom, all-in-one toolbox for data science, complex geospatial analytics, and exploratory machine learning. It centralizes everything required to perform data preparation, ad-hoc analyses, mod … | Continue reading
As a recap from the last article, Uber’s API Gateway provides an interface and acts as a single point of access for all of our back-end services to expose features and data to Mobile and 3rd party partners. Two major components for a system like API Gateway are configuration mana … | Continue reading
Background Real-time data (# of ride requests, # of drivers available, weather, game) enables operations teams to make informed decisions like surge pricing, maximum dispatch ETA calculating, and demand/supply forecasting about our services that improve user experiences on the Ub … | Continue reading
API gateways are an integral part of microservices architecture in recent years. An API gateway provides a single point of entry for all our apps and provides an interface to access data, logic, or functionality from back-end microservices. It also provides a centralized place to … | Continue reading
Uber engineers share their learnings on how to tune a Java Virtual Machine so as to avoid long pauses and other issues with garbage collection. | Continue reading
Motivation for a Better Go Profiler Golang is the lifeblood of thousands of Uber’s back-end services, running on millions of CPU cores. Understanding our CPU bottlenecks is critical, both for reducing service latencies and also for making our compute fleet efficient. The scale at … | Continue reading
Introduction Suppose you own a production ML model that already works reasonably well. You know that adding relevant and diverse sources of signal to your model is a sure way to boost performance, but finding new features that actually improve performance can be a slow and tedio … | Continue reading
Intro Uber Freight was launched in 2017 to revolutionize the business of matching shippers and carriers in the huge and inefficient freight trucking industry (around $800B annual spend in the US). We believe, and have demonstrated, that a technology-first freight broker and marke … | Continue reading
Introduction In 2017, we introduced Horovod, an open source framework for scaling deep learning training across hundreds of GPUs in parallel. At the time, most of the deep learning use cases at Uber were related to the research and development of self-driving vehicles, while in … | Continue reading
The App Size Problem Uber’s iOS mobile Apps for Rider, Driver, and Eats are large in size. The choice of Swift as our primary programming language, our fast-paced development environment and feature additions, layered software and its dependencies, and statically linked platform … | Continue reading
Introduction In 2016 we published blog posts (I, II) about Schemaless - Uber Engineering’s Scalable Datastore. We went over the design of Schemaless as well as explained the reasoning behind developing it. In this post today we are going to talk about the evolution of Schemaless … | Continue reading
At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key value pairs, with which engineers can slice and dice their data to surface abnorma … | Continue reading
Background Real-time data (# of ride requests, # of drivers available, weather, game) enables operations teams to make informed decisions like surge pricing, maximum dispatch ETA calculating, and demand/supply forecasting about our services that improve user experiences on the Ub … | Continue reading
At Uber, business metrics are vital for discovering insights about how we perform, gauging the impact of new products, and optimizing the decision making process. The use cases for metrics can range from an operations member diagnosing a fares issue at the trip level to a machine … | Continue reading
Uber employs statistical modeling to find anomalies in data and continually monitor data quality. | Continue reading
Apache Kafka at Uber Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone to Uber’s technology stack and build a complex ec … | Continue reading
Uber builds multi-sided marketplaces handling millions of trips every day across the globe. We strive to build real-time experiences for all our users. The nature of real time marketplaces make them very lively. Over the course of a trip, there are multiple participants that can … | Continue reading