Remote Development at Slack

In this article, “remote development environments” refer to AWS EC2 instances where engineers make code changes and can see a running Slack application with those changes. For years, engineers at Slack isolated and tested their changes by running microcosms of the Slack applicati … | Continue reading


@slack.engineering | 1 year ago

Building Self-driving Kafka clusters using open source components

In this article, I will talk about how Slack uses Kafka, and how a small-but-mighty team built and operationalized a self-driving Kafka cluster over the last four years to run at scale. Kafka is used at Slack as a pub-sub system, playing an essential role in the all-important Job … | Continue reading


@slack.engineering | 1 year ago

Building Background Effects for Slack Clips

Last September, Slack released Clips, allowing users to capture video, audio, and screen recordings in messages to help distributed teams connect and share their work. We’ve continued iterating on Clips since its release, adding thumbnail selection, background blur, and most rece … | Continue reading


@slack.engineering | 1 year ago

Continuous Load Testing at Slack

Building load test infrastructure is tricky and poses many questions. How can we identify performance regressions in newly deployed builds, given the overhead of spinning up test clients? To gather the most representative results, should we load test at our peak hours or when the … | Continue reading


@slack.engineering | 1 year ago

Slack’s Incident on 2-22-22

By Laura Nolan, with contributions from Glen D. Sanford, Jamie Scheinblum, and Chris Sullivan. Assessing conditions Slack experienced a major incident on February 22 this year, during which time many users were unable to connect to Slack, including the author — which certainly ma … | Continue reading


@slack.engineering | 1 year ago

A Simple Kubernetes Admission Webhook

While adding a recent feature to our Kubernetes compute platform, we had the need to mutate newly-created pods based on annotations set by users. The mutation needed to follow simple business rules, and didn’t need to keep track of any state. Surely there must be a canonical solu … | Continue reading


@slack.engineering | 2 years ago

The Case of the Recursive Resolvers

On September 30th 2021, Slack had an outage that impacted less than 1% of our online user base, and lasted for 24 hours. This outage was the result of our attempt to enable DNSSEC — an extension intended to secure the DNS protocol, required for FedRAMP Moderate — but which ultima … | Continue reading


@slack.engineering | 2 years ago

Developing in the Open

Today, we're sharing more of our work in open source and committing to developing more of our internal tools in the open. | Continue reading


@slack.engineering | 2 years ago

Two Interns Are Helping Secure Millions of Lines of Code

At Slack, proactively securing our systems is a top priority. One way we achieve this is by automating the detection of security issues with static code analysis tools, which inspect programs without executing them. They’re often used with security-based rules to automat … | Continue reading


@slack.engineering | 2 years ago

Infrastructure Observability for Changing the Spend Curve at Slack

Slack is an integral part of where work happens for teams across the world, and our work in the Core Development Engineering department supports engineers throughout Slack that develop, build, test, and release high-quality services to Slack’s customers. In this article, we share … | Continue reading


@slack.engineering | 2 years ago

The Four Agile Values and Slack

Agile development methods can bolster company culture and empower teams to move quickly, with a focus on frequently adding value for customers. Whether you are a program manager, game developer, event planner, or architect, within businesses where change is constant, it’s key to … | Continue reading


@slack.engineering | 2 years ago

Data Lineage at Slack

Reinventing how the world does work inevitably creates a lot of data. Each year, Slack’s scale has increased and the volume of data ingested and stored has kept pace. To make it possible to understand relationships within our data, we’ve invested heavily in an automated data line … | Continue reading


@slack.engineering | 2 years ago

How We Design Our APIs at Slack

More than five years ago, we launched the Slack Platform, giving developers an easy way to build apps in Slack and publish them in our App Directory. Today, millions of users bring their work into Slack, and those apps built by over 885,000 active developers on the platform are k … | Continue reading


@slack.engineering | 2 years ago

Email Classification

A deep dive into how we built an eventually consistent data model to predict Slack Connect invites using a smart classification system for email domains. | Continue reading


@slack.engineering | 2 years ago

Slack Engineering: How a Jenkins Job Broke Our Jenkins UI

Troubleshooting plugin upgrades by going down the rabbit hole of debugging Jenkins. | Continue reading


@slack.engineering | 2 years ago

Role Management at Slack

Controlling which users are able to take which actions is no simple task. Building this into Slack has always been an interesting challenge. In large enterprise organizations, the standard types of roles we offered to customers were too broad, and delegating a generic admin role … | Continue reading


@slack.engineering | 2 years ago

Slack Performs Load Testing

Complex systems are difficult to reason about at scale; we often can’t accurately extrapolate system behavior and performance, so we need to derive that data empirically. We use load testing to do just that: find the limits of our systems and weed out bugs at a large scale in a c … | Continue reading


@slack.engineering | 2 years ago

Migrating millions of web sockets to Envoy

Slack has a global customer base, with millions of simultaneously connected users at peak times. Most of the communication between users involves sending lots of tiny messages to each other. For much of Slack’s history, we’ve used HAProxy as a load balancer for all incoming traff … | Continue reading


@slack.engineering | 3 years ago

All Hands on Deck (2020)

This story speaks to the process behind incident response at Slack and uses the May 12th, 2020 outage as an example. For a deeper technical review of the same outage, read Laura Nolan’s post, “A Terrible, Horrible, No-Good, Very Bad Day at Slack” Slack is a critical tool for mill … | Continue reading


@slack.engineering | 3 years ago

Slack’s Outage on January 4th 2021

And now we welcome the new year. Full of things that have never been.  — Rainer Maria Rilke January 4th 2021 was the first working day of the year for many around the globe, and for most of us at Slack too (except of course for our on-callers and our customer experience team, who … | Continue reading


@slack.engineering | 3 years ago

Women in Security at Slack

Since its inception, Slack has fostered a culture of inclusion and diversity. The Security organization at Slack is a prime example of how women can thrive in the security space, transitioning to security from different backgrounds and areas of expertise. With Slack’s strong commitment … | Continue reading


@slack.engineering | 3 years ago

A Day in the Life of a Slack Back End Platform Engineer

Kalpak is a Staff Engineer at Slack. When Kalpak joined Slack, he worked on features in Email Bridge. More recently, he joined the Platform Admin team and works on the backend to build and support admin APIs to make life for Slack Enterprise App administrators easier. | Continue reading


@slack.engineering | 3 years ago

Creating a React Analytics Recording Library

In the first installment of the article, we examined why we built a React analytics library. We also looked at how we use the library to share data efficiently, log smarter impressions, and simplify event logging. In this second part of the article, we will focus on how we abstra … | Continue reading


@slack.engineering | 3 years ago

Taking PHP Seriously

Slack uses PHP for most of its server-side application logic, which is an unusual choice these days. Why did we choose to build a new project in this language? Should you? Most programmers who have only casually used PHP know two things about it: that it is a bad language, which … | Continue reading


@slack.engineering | 3 years ago

Scaling Datastores at Slack with Vitess

From the very beginning of Slack, MySQL was used as the storage engine for all our data. Slack operated MySQL servers in an active-active configuration. This is the story of how we changed our data storage architecture from the active-active clusters over to Vitess — a horizontal … | Continue reading


@slack.engineering | 3 years ago

Bridging the Gap Between Slack and Email Users

In this post we’ll explore the architecture of Email Bridge and how it works, along with lessons learned while shipping the feature. | Continue reading


@slack.engineering | 3 years ago

Building the Next Evolution of Cloud Networks at Slack

At Slack, we’ve gone through an evolution of our AWS infrastructure from the early days of running a few hand-built EC2 instances, all the way to provisioning thousands of EC2 instances across multiple AWS regions, using the latest AWS services to build reliable and scalable inf … | Continue reading


@slack.engineering | 3 years ago

Interop’s Labyrinth: Sharing Code Between Web and Electron Apps

While it’s no secret that the cross-platform Slack Desktop app is built on Electron, it might be slightly less well known that it’s a hybrid app built around our web app (slack.com). This is one of Electron’s most compelling draws — not only can you build a cross-platform desktop … | Continue reading


@slack.engineering | 3 years ago

Distributed Tracing at Slack: Thinking in Causal Graphs

“Why is it slow?” is the hardest problem to debug in a complex distributed system like Slack. To diagnose a slow-loading channel with over a hundred thousand users, we’d need to look at client-side metrics, server-side metrics, and logs. It could be a client-side issue: a slow ne … | Continue reading


@slack.engineering | 3 years ago

Blocking Slack Invite Spam with Machine Learning

A fact of life for building an internet service is that, sooner or later, bad actors are going to come along and try to abuse the system. Slack is no exception — spammers try to use our invite function as a way to send out spam emails. Having built up the infrastructure to easily … | Continue reading


@slack.engineering | 3 years ago

A Terrible, Horrible, No-Good, Bad Day at Slack

This story describes the technical details of the problems that caused the Slack downtime on May 12th, 2020. To learn more about the process behind incident response for the same outage, read Ryan Katkov’s post, “All Hands on Deck”. On May 12, 2020, Slack had our first significant ou … | Continue reading


@slack.engineering | 3 years ago

Data Consistency Checks

by Paul Hammond and Samantha Stoller | Continue reading


@slack.engineering | 3 years ago

All Hands on Deck

What does Slack do when Slack goes down? | Continue reading


@slack.engineering | 3 years ago

A Terrible, Horrible, No-Good, Bad Day at Slack

On May 12, 2020, Slack had our first significant outage in a long time. This is a detailed look into the technical issues that caused it. | Continue reading


@slack.engineering | 3 years ago

A Day in the Life of a Back End Product Engineer at Slack

*Prior to the COVID-19 outbreak and our current shelter in place orders in San Francisco. | Continue reading


@slack.engineering | 3 years ago

TypeScript at Slack (2017)

Or, How I Learned to Stop Worrying & Trust the Compiler | Continue reading


@slack.engineering | 3 years ago

Prototyping at Slack

A picture is worth a thousand words; a prototype is worth a thousand meetings. | Continue reading


@slack.engineering | 3 years ago

Happiness is a freshly organized codebase

Starting From the Top | Continue reading


@slack.engineering | 3 years ago

Development Environments at Slack

How Slack’s development environments have evolved over time. | Continue reading


@slack.engineering | 3 years ago

Hacklang at Slack: A Better PHP

How and why Slack migrated to Hack, the benefits it gave us, and things to consider for your own codebase. | Continue reading


@slack.engineering | 4 years ago

Deploys at Slack

Deploys require a careful balance of speed and reliability. | Continue reading


@slack.engineering | 4 years ago

How Big Technical Changes Happen at Slack

We want to catch revolutions at the right time, while limiting the energy we spend chasing fads. What strategy can we follow to ensure this? | Continue reading


@slack.engineering | 4 years ago

Upgrading Apache Airflow at Slack

For two years we’ve been running Airflow 1.8, and it was time for us to catch up. Here’s how we did it without impacting 700B daily records. | Continue reading


@slack.engineering | 4 years ago

Nebula, the open source global overlay network from Slack

Introducing Nebula, an open source scalable overlay networking tool with a focus on performance, simplicity and security. | Continue reading


@slack.engineering | 4 years ago

Nebula, Slack's Open Source Global Overlay Network

Introducing Nebula, an open source scalable overlay networking tool with a focus on performance, simplicity and security. | Continue reading


@slack.engineering | 4 years ago

Why Slack is no longer using a cross-platform C++ library

Two years ago, I wrote a post about Libslack, Slack’s shared C++ client library. That post described how Slack used the Libslack library… | Continue reading


@slack.engineering | 4 years ago

Building Slack Dark Mode on Desktop

More than just CSS, dark mode represents a new way of thinking about color and styles at Slack. | Continue reading


@slack.engineering | 4 years ago

Slack Built Shared Channels

Building shared channels challenged Slack’s fundamental assumption that the workspace is the atomic unit of partitioning customer data. | Continue reading


@slack.engineering | 4 years ago