Open sourcing Feathr – LinkedIn’s feature store for productive machine learning

We are open sourcing Feathr – the feature store we built to simplify machine learning (ML) feature management and improve developer productivity. At LinkedIn, dozens of applications use Feathr to define features, compute them for training, deploy them in production, and share the … | Continue reading


@engineering.linkedin.com | 1 year ago

LinkedIn’s Journey to Java 11

At LinkedIn, we are committed to delivering a best-in-class platform experience for our members. One of the technologies that we use to do that is Java, an object-oriented programming language that produces software for multiple platforms. We are a huge consumer of Java … | Continue reading


@engineering.linkedin.com | 1 year ago

Performance-Adaptive Sampling Strategy (Pass) for GNNs: Open Sourcing Pass

Co-authors: Jaewon Yang, Minji Yoon, Sufeng Niu, Dash Shi, and Qi He | Continue reading


@engineering.linkedin.com | 2 years ago

Addressing the last mile problem with MySQL high availability

Continue reading


@engineering.linkedin.com | 2 years ago

Why am I seeing this ad?

Co-authors: Dhruv Bansal, Aanchal Somani, Sneha Dewan, and Vikrant Mahajan | Continue reading


@engineering.linkedin.com | 2 years ago

Hodor: Detecting and addressing overload in LinkedIn microservices

Continue reading


@engineering.linkedin.com | 2 years ago

Evolving LinkedIn’s analytics tech stack

Co-authors: Steven Chuang, Qinyu Yue, Aravind Rao, and Srihari Duddukuru | Continue reading


@engineering.linkedin.com | 2 years ago

Completing a member knowledge graph with Graph Neural Networks

Co-authors: Jaewon Yang, Jiatong Chen, and Yanen Li | Continue reading


@engineering.linkedin.com | 2 years ago

Remote Development in the Cloud LinkedIn

Co-authors: Shivani Pai Kasturi and Swati Gambhir | Continue reading


@engineering.linkedin.com | 2 years ago

Who moved my 99th percentile latency?

Co-author: Cuong Tran. Long-tail latencies affect members every day, and improving the response times of systems even at the 99th percentile is critical to the member's experience. There can be many causes such as slow applications, slow disk accesses, errors in the network, and man … | Continue reading


@engineering.linkedin.com | 2 years ago

Open sourcing DynoYARN: A simulation and testing infrastructure for YARN clusters

Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao | Continue reading


@engineering.linkedin.com | 2 years ago

Greykite: A flexible, intuitive, and fast forecasting library

Co-authors: Reza Hosseini, Albert Chen, Kaixu Yang, Sayan Patra, Rachit Arora, and Parvez Ahammad | Continue reading


@engineering.linkedin.com | 2 years ago

Rethinking site capacity projections with Capacity Analyzer

While site outages are inevitable, it’s our job to minimize both the duration of outages and the likelihood for an outage to occur. One of our preemptive measures is in the way we determine overall site capacity and health on an everyday basis, in that we load-test in production. … | Continue reading


@engineering.linkedin.com | 3 years ago

Ultra fast OLAP queries with star-tree index

Pinot is an open source, scalable distributed OLAP data store that entered the Apache Incubation recently. Developed at LinkedIn, it works across a wide variety of production use cases to deliver real-time, low latency analytics. | Continue reading


@engineering.linkedin.com | 3 years ago

Solving the data integration variety problem at scale with Gobblin

Co-authors: Chris Li, Kevin Lau, and Subbu Sanka | Continue reading


@engineering.linkedin.com | 3 years ago

Open source update: School of SRE

Co-authors: Akbar KM and Kalyanasundaram Somasundaram | Continue reading


@engineering.linkedin.com | 3 years ago

Taming memory fragmentation in Venice with Jemalloc

Sometimes, an engineering problem arises that might make us feel like maybe we don't know what we're doing, or at the very least, forces us out of the comfort zone of our area of expertise. That day came for the Venice team at LinkedIn when we began to notice that some Venice pro … | Continue reading


@engineering.linkedin.com | 3 years ago

A/B testing at LinkedIn: Assigning variants at scale

Co-authors: Alexander Ivaniuk and Weitao Duan | Continue reading


@engineering.linkedin.com | 3 years ago

Coral: SQL translation, analysis, and rewrite engine for modern data lakehouses

Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Sushant Raikar, Raymond Lam, Ron Hu, Shardul Mahadik, Laura Chen, Khai Tran, Chris Chen, and Nagarathnam Muthusamy | Continue reading


@engineering.linkedin.com | 3 years ago

From Lambda to Lambda-less: Lessons learned

Co-authors: Xiang Zhang and Jingyu Zhu | Continue reading


@engineering.linkedin.com | 3 years ago

DataHub: Popular Metadata Architectures Explained

When I started my journey at LinkedIn ten years ago, the company was just beginning to experience extreme growth in the volume, variety, and velocity of our data. Over the next few years, my colleagues and I in LinkedIn’s data infrastructure team built out foundational technology … | Continue reading


@engineering.linkedin.com | 3 years ago

Pegasus Data Language: Evolving schema definitions for data modeling

Pegasus Data Schema (PDSC) is a Pegasus schema definition language that has been used for data modeling with Rest.li services for years. It's the underlying language that helps define data models, describe the data returned by REST endpoints, and generate derivative schemas for o … | Continue reading


@engineering.linkedin.com | 3 years ago

Dagli: Faster and easier machine learning on the JVM, without the tech debt

In recent years, we’ve been fortunate to see a growing number of excellent machine learning tools, such as TensorFlow, PyTorch, DeepLearning4J, and CNTK for neural networks, Spark and Kubeflow for very-large-scale pipelines, and scikit-learn, ML.NET, and the recent Tribuo for a w … | Continue reading


@engineering.linkedin.com | 3 years ago

LinkedIn Scales Compatibility Testing

Co-authors: Nima Dini and Dan Sully | Continue reading


@engineering.linkedin.com | 3 years ago

Updating LinkedIn's UI

Co-authors: Jitesh Gandhi and Eric Babyak | Continue reading


@engineering.linkedin.com | 3 years ago

Fixing Linux filesystem performance regressions

As companies grow, adapt, morph, and mature, one item remains the same: the need for reinvention. Technical infrastructure is no exception. As our member community grew, our priorities were to keep up with that growth, or as we say, ensure continuous “site up.” (Read: adding serv … | Continue reading


@engineering.linkedin.com | 3 years ago

GDMix: A deep ranking personalization framework

Our logo is inspired by the chameleon: You can enable personalization on your ranking model with GDMix, bringing a personalized experience to every user, like a chameleon that can match its surroundings. | Continue reading


@engineering.linkedin.com | 3 years ago

LIquid: The Soul of a new Graph Database, part 2

Co-authors: Scott Meyer, Andrew Carter, and Andrew Rodriguez | Continue reading


@engineering.linkedin.com | 3 years ago

Production Testing with Dark Canaries

The internet software industry has moved away from long development cycles and dedicated quality assurance (QA) stages, toward a fast-paced continuous-integration/continuous-delivery (CI/CD) pipeline, where new code is quickly written, committed, and pushed to user-facing applica … | Continue reading


@engineering.linkedin.com | 3 years ago

Theory vs. Practice: Learnings from a Recent Hadoop Incident

Co-authors: Sandhya Ramu and Vasanth Rajamani | Continue reading


@engineering.linkedin.com | 3 years ago

DeText: A deep NLP framework for intelligent text understanding

Co-authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long | Continue reading


@engineering.linkedin.com | 3 years ago

Rebuilding messaging: How we bootstrapped our platform

Co-authors: Pradhan Cadabam and Jingxuan (Rex) Zhang | Continue reading


@engineering.linkedin.com | 3 years ago

Rebuilding Messaging: How we designed our new system

Co-authors: Tyler Grant, Armen Hamstra, Cliff Snyder | Continue reading


@engineering.linkedin.com | 3 years ago

Apache Pinot 0.3.0

Built at LinkedIn, Pinot is an open source, distributed, and scalable OLAP data store that we use as our de-facto near-real-time analytics service. We’ve previously discussed how and why we built Pinot to power a wide spectrum of use cases, including internal business intelligenc … | Continue reading


@engineering.linkedin.com | 4 years ago

DataHub: A Generalized Metadata Search and Discovery Tool

Co-authors: Mars Lan, Seyi Adebajo, Shirshanka Das | Continue reading


@engineering.linkedin.com | 4 years ago

Open Sourcing LinkedIn DataHub: Approaches to open source internal tools

Co-authors: Kerem Sahin, Mars Lan, and Shirshanka Das. Finding the right data quickly is critical for any company that relies on big data insights to make data-driven decisions. Not only does this impact the productivity of data users (including analysts, machine learning develope … | Continue reading


@engineering.linkedin.com | 4 years ago

We retired Python 2 and improved developer happiness

Nearly 20 years after the first release of Python 2 and 11 years after the first release of Python 3, the Python development community has retired Python 2.7, the last of the Python 2 series. This marks the end of all upstream support for Python 2, including bug and security fixe … | Continue reading


@engineering.linkedin.com | 4 years ago

Making the LinkedIn A/B testing engine 20x faster

Co-authors: Alexander Ivaniuk, Jingbang Liu | Continue reading


@engineering.linkedin.com | 4 years ago

LiTr: A Lightweight Video/Audio Transcoder for Android

If a picture’s worth a thousand words, then what about a video? | Continue reading


@engineering.linkedin.com | 4 years ago

IPv6 Inside LinkedIn Part III: The Elephant in the Room

Co-author: Tim Crofts | Continue reading


@engineering.linkedin.com | 4 years ago

Learnings from the Journey to Continuous Deployment

As an engineer, your goal is for every commit to seamlessly land in production and provide a delightful experience for your customers. While frequent releases give you the ability to iterate and apply feedback quickly, they also require significant time, effort, and cost to achie … | Continue reading


@engineering.linkedin.com | 4 years ago

LinkedIn customizes Apache Kafka for 7T messages per day

Co-authors: Jon Lee and Wesley Wu | Continue reading


@engineering.linkedin.com | 4 years ago

Productivity at scale: How we improved build time with Gradle build cache

Editor's Note: This is the second in a series of posts describing how we improved productivity at scale—both in terms of lines of code and number of engineers—at LinkedIn. In our first post of the #ProductivityAtScale series, we shared details on how we improved build time by 400 … | Continue reading


@engineering.linkedin.com | 4 years ago

The Building Blocks of LinkedIn Assessments (Verified Skills)

Co-authors: Christian Mathiesen and Jie Zhang | Continue reading


@engineering.linkedin.com | 4 years ago

A Brief History of Scaling LinkedIn (2015)

LinkedIn started in 2003 with the goal of connecting to your network for better job opportunities. It had only 2,700 members the first week. Fast forward many years, and LinkedIn’s product portfolio, member base, and server load have grown tremendously. Today, LinkedIn operates gl … | Continue reading


@engineering.linkedin.com | 4 years ago

Engineering LinkedIn Reactions

As you casually scroll through a news feed, you may “like” a post here and there. “Liking” has become so second-nature that we don’t often think about what happens the minute you hit that “like” button. When we began considering building out our “likes” feature into a set of reac … | Continue reading


@engineering.linkedin.com | 4 years ago

Detecting and Preventing Abuse on LinkedIn Using Isolation Forests

The Anti-Abuse AI Team at LinkedIn creates, deploys, and maintains models that detect and prevent various types of abuse, including the creation of fake accounts, member profile scraping, automated spam, and account takeovers. There are several unique challenges we face when usin … | Continue reading


@engineering.linkedin.com | 4 years ago

LinkedIn Is Moving to Azure

The pursuit of our mission to connect the world’s professionals to make them more productive and successful is deeply dependent on the technology and infrastructure we build and maintain. Ten years ago, we had 50 million members. Fast forward five years and that number jumped to … | Continue reading


@engineering.linkedin.com | 4 years ago