Measuring productivity impact with Diff Authoring Time

Do types actually make developers more productive? Or is it just more typing on the keyboard? To answer that question we’re revisiting Diff Authoring Time (DAT) – how Meta measures how long it takes to submit changes to a codebase. DAT is just one of the ways we measure developer … | Continue reading


@engineering.fb.com | 2 hours ago

ILA Evo: Meta’s journey to reimagine fiber optic in-line amplifier sites

Today’s rapidly evolving landscape of use cases that demand highly performant and efficient network infrastructure is placing new emphasis on how in-line amplifiers (ILAs) are designed and deployed. Meta’s ILA Evo effort seeks to reimagine how an ILA site could be deployed to imp … | Continue reading


@engineering.fb.com | 6 days ago

Indexing code at scale with Glean

We’re sharing details about Glean, Meta’s open source system for collecting, deriving and working with facts about source code. In this blog post we’ll talk about why a system like Glean is important, explain the rationale for Glean’s design, and run through some of the ways we’r … | Continue reading


@engineering.fb.com | 28 days ago

Translating Java to Kotlin at Scale

Meta has been on a years-long undertaking to translate our entire Android codebase from Java to Kotlin. Today, despite having one of the largest Android codebases in the world, we’re well past the halfway point and still going. We’re sharing some of the tradeoffs we’ve made to su … | Continue reading


@engineering.fb.com | 29 days ago

How we think about Threads’ iOS performance

How did the Threads iOS team maintain the app’s performance during its incredible growth? Here’s how Meta’s Threads team thinks about performance, including the key metrics we monitor to keep the app healthy. We’re also diving into some case studies that impact publish reliabilit … | Continue reading


@engineering.fb.com | 29 days ago

How to build a mixed reality headset

How do you take a mixed reality (MR) headset from idea to finished product? Alfred Jones, VP of hardware engineering at Meta Reality Labs, joins Pascal Hartig (@passy) on the latest episode of the Meta Tech Podcast for a discussion on the realities (no pun intended) of building M … | Continue reading


@engineering.fb.com | 1 month ago

Inside Facebook’s video delivery system

We’re explaining the end-to-end systems the Facebook app leverages to deliver relevant content to people. Learn about our video-unification efforts that have simplified our product experience and infrastructure, in-depth details around mobile delivery, and new features we are wor … | Continue reading


@engineering.fb.com | 1 month ago

Typed Python in 2024: Well adopted, yet usability challenges persist

Ten years after the introduction of PEP 484, we surveyed the current state of the Python type system and the tools developers are using. | Continue reading


@engineering.fb.com | 1 month ago

Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine

Andromeda is Meta’s proprietary machine learning (ML) system design for retrieval in ad recommendation focused on delivering a step-function improvement in value to our advertisers and people. This system pushes the boundary of cutting edge AI for retrieval with NVIDIA Grace Hopp … | Continue reading


@engineering.fb.com | 1 month ago

Sequence learning: A paradigm shift for personalized ads recommendations

AI plays a fundamental role in creating valuable connections between people and advertisers within Meta’s family of apps. Meta’s ad recommendation engine, powered by deep learning recommendation models (DLRMs), has been instrumental in delivering personalized ads to people. Key t … | Continue reading


@engineering.fb.com | 1 month ago

How Meta built large-scale cryptographic monitoring

Cryptographic monitoring at scale has been instrumental in helping our engineers understand how cryptography is used at Meta. Monitoring has given us a distinct advantage in our efforts to proactively detect and remove weak cryptographic algorithms and has assisted with our gener … | Continue reading


@engineering.fb.com | 2 months ago

Diff Authoring Time: Measuring developer productivity at Meta

At Meta, we’re always looking for ways to enhance the productivity of our engineers and developers. But how exactly do you measure developer productivity? On this episode of the Meta Tech Podcast Pascal Hartig (@passy) sits down with Sarita and Moritz, two engineers at Meta who h … | Continue reading


@engineering.fb.com | 2 months ago

IPLS: Privacy-preserving storage for your WhatsApp contacts

Your contact list is fundamental to the experiences you love and enjoy on WhatsApp. With contacts, you know which of your friends and family are on WhatsApp, you can easily message or call them, and it helps give you context on who is in your groups. But losing your phone could m … | Continue reading


@engineering.fb.com | 2 months ago

OCP Summit 2024: The open future of networking hardware for AI

At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters. We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP. We look forward t … | Continue reading


@engineering.fb.com | 3 months ago

Meta’s open AI hardware vision

At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community. These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components. By sharing our designs, we … | Continue reading


@engineering.fb.com | 3 months ago

How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions

Data for Good at Meta is open-sourcing the data used to train our AI-powered population maps. We’re hoping that researchers and other organizations around the world will be able to leverage these tools to assist with a wide range of projects including those on climate adaptation, … | Continue reading


@engineering.fb.com | 3 months ago

React at Meta Connect 2024

At Meta, React and React Native are more than just tools; they are integral to our product development and innovation. With over five thousand people at Meta building products and experiences with React every month, these technologies are fundamental to our engineering culture an … | Continue reading


@engineering.fb.com | 3 months ago

Inside Bento: Jupyter Notebooks at Meta

This episode of the Meta Tech Podcast is all about Bento, Meta’s internal distribution of Jupyter Notebooks, an open-source web-based computing platform. Bento allows our engineers to mix code, text, and multimedia in a single document and serves a wide range of use cases at Meta … | Continue reading


@engineering.fb.com | 4 months ago

Simulator-based reinforcement learning for data center cooling optimization

We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls. Our reinforcement learning-based approach has helped us reduce energy consumption and water usage across various weather conditions. Meta is revamp … | Continue reading


@engineering.fb.com | 4 months ago

Meta is getting ready for post-quantum cryptography

The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet. Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptogr … | Continue reading


@engineering.fb.com | 4 months ago

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations m … | Continue reading


@engineering.fb.com | 4 months ago

RETINAS: Real-Time Infrastructure Accounting for Sustainability

We are introducing a new metric, real-time server fleet utilization effectiveness, as part of the RETINAS initiative to help reduce emissions and achieve net zero emissions across our value chain in 2030. This new metric allows us to measure server resource usage (e.g., compute, … | Continue reading


@engineering.fb.com | 4 months ago

How PyTorch powers AI training and inference

Learn about new PyTorch advancements for LLMs and how PyTorch is enhancing every aspect of the LLM lifecycle. In this talk from AI Infra @ Scale 2024, software engineers Wanchao Liang and Evan Smothers are joined by Meta research scien … | Continue reading


@engineering.fb.com | 4 months ago

Inside the hardware and co-design of MTIA

In this talk from AI Infra @ Scale 2024, Joel Colburn, a software engineer at Meta, technical lead Junqiang Lan, and software engineer Jack Montgomery discuss the second generation of MTIA, Meta’s in-house training and inference accelerator. They cover the co-design process behin … | Continue reading


@engineering.fb.com | 4 months ago

Bringing Llama 3 to life

Llama 3 is Meta’s most capable openly-available LLM to date and the recently-released Llama 3.1 will enable new workflows, such as synthetic data generation and model distillation with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed so … | Continue reading


@engineering.fb.com | 4 months ago

Aparna Ramani discusses the future of AI infrastructure

Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems and even our data center designs. For the second year in a row, Meta’s engineering and infrastructure teams returned for the AI Infra @ Scale confer … | Continue reading


@engineering.fb.com | 4 months ago

How Meta animates AI-generated images at scale

We launched Meta AI with the goal of giving people new ways to be more productive and unlock their creativity with generative AI (GenAI). But GenAI also comes with challenges of scale. As we deploy new GenAI technologies at Meta, we also focus on delivering these services to peop … | Continue reading


@engineering.fb.com | 5 months ago

RoCE networks for distributed AI training at scale

AI networks play an important role in interconnecting tens of thousands of GPUs together, forming the foundational infrastructure for training, enabling large models with hundreds of billions of parameters such as Llama 3.1 405B. This week at ACM SIGCOMM 2024 in Sydney, Australia … | Continue reading


@engineering.fb.com | 5 months ago

DCPerf: An open source benchmark suite for hyperscale compute applications

We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud deployments. We hope that DCPerf can be used more broadly by academia, the hardware industry, and internet companies to design and evaluate fu … | Continue reading


@engineering.fb.com | 5 months ago

Meet Caddy – Meta’s next-gen mixed reality CAD software

What happens when a team of mechanical engineers get tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a new CAD app for mixed reality. They join Pascal Hartig (@passy) on the Meta Tech Podcast to talk about teaching themselves to code, disruptin … | Continue reading


@engineering.fb.com | 6 months ago

AI Lab: The secrets to keeping machine learning engineers moving fast

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and … | Continue reading


@engineering.fb.com | 6 months ago

Taming the tail utilization of ads inference at Meta scale

Tail utilization is a significant system issue and a major factor in overload-related failures and low compute utilization. The tail utilization optimizations at Meta have had a profound impact on model serving capacity footprint and reliability. Failure rates, which are mostly t … | Continue reading


@engineering.fb.com | 6 months ago

Meta’s approach to machine learning prediction robustness

Meta’s advertising business leverages large-scale machine learning (ML) recommendation models that power millions of ads recommendations per second across Meta’s family of apps. Maintaining reliability of these ML systems helps ensure the highest level of service and uninterrupte … | Continue reading


@engineering.fb.com | 6 months ago

The key to a happy Rust/C++ relationship

The history of Rust at Meta goes all the way back to 2016, when we first started using it for source control. Today, it has been widely embraced at Meta and is one of our primary supported server-side languages (along with C++, Python, and Hack). But that doesn’t mean there weren … | Continue reading


@engineering.fb.com | 6 months ago

Leveraging AI for efficient incident response

We’re sharing how we streamline system reliability investigations using a new AI-assisted root cause analysis system. The system uses a combination of heuristic-based retrieval and large language model-based ranking to speed up root cause identification during investigations. Our … | Continue reading


@engineering.fb.com | 6 months ago

PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters

We’re introducing parameter vulnerability factor (PVF), a novel metric for understanding and measuring AI systems’ vulnerability against silent data corruptions (SDCs) in model parameters. PVF can be tailored to different AI models and tasks, adapted to different hardware faults, … | Continue reading


@engineering.fb.com | 7 months ago

MLow: Meta’s low bitrate audio codec

At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger. We are working to make RTC accessible by providing a high-quality experience for everyone – even those who might not have the fastest connectio … | Continue reading


@engineering.fb.com | 7 months ago

How Meta trains large language models at scale

As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs). Traditionally, our AI model tra … | Continue reading


@engineering.fb.com | 7 months ago

Maintaining large-scale AI capacity at Meta

Meta is currently operating many data centers with GPU training clusters across the world. Our data centers are the backbone of our operations, meticulously designed to support the scaling demands of compute and storage. A year ago, however, as the industry reached a critical inf … | Continue reading


@engineering.fb.com | 7 months ago

Unlocking the power of mixed reality devices with MobileConfig

MobileConfig enables developers to centrally manage a mobile app’s configuration parameters in our data centers. Once a parameter value is changed on our central server, billions of app devices automatically fetch and apply the new value without app updates. These remotely manage … | Continue reading


@engineering.fb.com | 7 months ago

Serverless Jupyter Notebooks at Meta

At Meta, Bento, our internal Jupyter notebooks platform, is a popular tool that allows our engineers to mix code, text, and multimedia in a single document. Use cases run the entire spectrum from what we call “lite” workloads that involve simple prototyping to heavier and more co … | Continue reading


@engineering.fb.com | 7 months ago

Composable data management at Meta

In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency. We’re sharing how we’ve achieved this, in part, by leveraging Velox, Meta’s open source execution … | Continue reading


@engineering.fb.com | 7 months ago

Post-quantum readiness for TLS at Meta

Today, the internet (like most digital infrastructure in general) relies heavily on the security offered by public-key cryptosystems such as RSA, Diffie-Hellman (DH), and elliptic curve cryptography (ECC). But the advent of quantum computers has raised real questions about the lo … | Continue reading


@engineering.fb.com | 7 months ago

Behind the scenes of Threads for web

When Threads first launched, one of the top feature requests was for a web client. In this episode of the Meta Tech Podcast, Pascal Hartig (@passy) sits down with Ally C. and Kevin C., two engineers on the Threads Web Team that delivered the basic version of Threads for web in jus … | Continue reading


@engineering.fb.com | 8 months ago

Building new custom silicon for Meta’s AI workloads



@engineering.fb.com | 9 months ago

Bringing HDR photo support to Instagram and Threads

Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share … | Continue reading


@engineering.fb.com | 9 months ago

Threads has entered the fediverse

Threads has entered the fediverse! As part of our beta experience, Threads users aged 18+ with public profiles can now choose to share their Threads posts to other ActivityPub-compliant servers. People on those servers can now follow federated Threads profiles and see, like, repl … | Continue reading


@engineering.fb.com | 10 months ago

Optimizing RTC bandwidth estimation with machine learning

Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps. We’ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across cro … | Continue reading


@engineering.fb.com | 10 months ago