In this article, “remote development environments” refer to AWS EC2 instances where engineers make code changes and can see a running Slack application with those changes. For years, engineers at Slack isolated and tested their changes by running microcosms of the Slack applicati … | Continue reading
In this article, I will talk about how Slack uses Kafka, and how a small-but-mighty team built and operationalized a self-driving Kafka cluster over the last four years to run at scale. Kafka is used at Slack as a pub-sub system, playing an essential role in the all-important Job … | Continue reading
Last September, Slack released Clips, allowing users to capture video, audio, and screen recordings in messages to help distributed teams connect and share their work. We’ve continued iterating on Clips since its release, adding thumbnail selection, background blur, and most rece … | Continue reading
Building load test infrastructure is tricky and poses many questions. How can we identify performance regressions in newly deployed builds, given the overhead of spinning up test clients? To gather the most representative results, should we load test at our peak hours or when the … | Continue reading
By Laura Nolan, with contributions from Glen D. Sanford, Jamie Scheinblum, and Chris Sullivan. Assessing conditions Slack experienced a major incident on February 22 this year, during which time many users were unable to connect to Slack, including the author — which certainly ma … | Continue reading
While adding a recent feature to our Kubernetes compute platform, we had the need to mutate newly-created pods based on annotations set by users. The mutation needed to follow simple business rules, and didn’t need to keep track of any state. Surely there must be a canonical solu … | Continue reading
On September 30th 2021, Slack had an outage that impacted less than 1% of our online user base, and lasted for 24 hours. This outage was the result of our attempt to enable DNSSEC — an extension intended to secure the DNS protocol, required for FedRAMP Moderate — but which ultima … | Continue reading
Today, we're sharing more of our work in open source and committing to developing more of our internal tools in the open. | Continue reading
At Slack, proactively securing our systems is a top priority. One way we achieve this is by automating the detection of security issues with static code analysis tools, which inspect programs without executing them. They’re often used with security-based rules to automat … | Continue reading
Slack is an integral part of where work happens for teams across the world, and our work in the Core Development Engineering department supports engineers throughout Slack that develop, build, test, and release high-quality services to Slack’s customers. In this article, we share … | Continue reading
Agile development methods can bolster company culture and empower teams to move quickly, with a focus on frequently adding value for customers. Whether you are a program manager, game developer, event planner, or architect, within businesses where change is constant, it’s key to … | Continue reading
Reinventing how the world does work inevitably creates a lot of data. Each year, Slack’s scale has increased and the volume of data ingested and stored has kept pace. To make it possible to understand relationships within our data, we’ve invested heavily in an automated data line … | Continue reading
More than five years ago, we launched the Slack Platform, giving developers an easy way to build apps in Slack and publish them in our App Directory. Today, millions of users bring their work into Slack, and those apps built by over 885,000 active developers on the platform are k … | Continue reading
A deep dive into how we built an eventually consistent data model to predict Slack Connect invites, using a smart classification system for email domains. | Continue reading
Troubleshooting plugin upgrades by going down the rabbit hole of debugging Jenkins. | Continue reading
Controlling which users are able to take which actions is no simple task. Building this into Slack has always been an interesting challenge. In large enterprise organizations, the standard types of roles we offered to customers were too broad, and delegating a generic admin role … | Continue reading
Complex systems are difficult to reason about at scale; we often can’t accurately extrapolate system behavior and performance, so we need to derive that data empirically. We use load testing to do just that: find the limits of our systems and weed out bugs at a large scale in a c … | Continue reading
Slack has a global customer base, with millions of simultaneously connected users at peak times. Most of the communication between users involves sending lots of tiny messages to each other. For much of Slack’s history, we’ve used HAProxy as a load balancer for all incoming traff … | Continue reading
This story speaks to the process behind incident response at Slack and uses the May 12th, 2020 outage as an example. For a deeper technical review of the same outage, read Laura Nolan’s post, “A Terrible, Horrible, No-Good, Very Bad Day at Slack”. Slack is a critical tool for mill … | Continue reading
And now we welcome the new year. Full of things that have never been. — Rainer Maria Rilke January 4th 2021 was the first working day of the year for many around the globe, and for most of us at Slack too (except of course for our on-callers and our customer experience team, who … | Continue reading
Since its inception, Slack has fostered a culture of inclusion and diversity. The Security organization at Slack is a prime example of how women can thrive in the security space, transitioning to security from different backgrounds and areas of expertise. With Slack’s strong commitment … | Continue reading
Kalpak is a Staff Engineer at Slack. When Kalpak joined Slack, he worked on features in Email Bridge. More recently, he joined the Platform Admin team and works on the backend to build and support admin APIs to make life for Slack Enterprise App administrators easier. | Continue reading
In the first installment of the article, we examined why we built a React analytics library. We also looked at how we use the library to share data efficiently, log smarter impressions, and simplify event logging. In this second part of the article, we will focus on how we abstra … | Continue reading
Slack uses PHP for most of its server-side application logic, which is an unusual choice these days. Why did we choose to build a new project in this language? Should you? Most programmers who have only casually used PHP know two things about it: that it is a bad language, which … | Continue reading
From the very beginning of Slack, MySQL was used as the storage engine for all our data. Slack operated MySQL servers in an active-active configuration. This is the story of how we changed our data storage architecture from the active-active clusters over to Vitess — a horizontal … | Continue reading
In this post we’ll explore the architecture of Email Bridge and how it works, along with lessons learned while shipping the feature. | Continue reading
At Slack, we’ve gone through an evolution of our AWS infrastructure from the early days of running a few hand-built EC2 instances, all the way to provisioning thousands of EC2 instances across multiple AWS regions, using the latest AWS services to build reliable and scalable inf … | Continue reading
While it’s no secret that the cross-platform Slack Desktop app is built on Electron, it might be slightly less well known that it’s a hybrid app built around our web app (slack.com). This is one of Electron’s most compelling draws — not only can you build a cross-platform desktop … | Continue reading
“Why is it slow?” is the hardest problem to debug in a complex distributed system like Slack. To diagnose a slow-loading channel with over a hundred thousand users, we’d need to look at client-side metrics, server-side metrics, and logs. It could be a client-side issue: a slow ne … | Continue reading
A fact of life for building an internet service is that, sooner or later, bad actors are going to come along and try to abuse the system. Slack is no exception — spammers try to use our invite function as a way to send out spam emails. Having built up the infrastructure to easily … | Continue reading
This story describes the technical details of the problems that caused the Slack downtime on May 12th, 2020. To learn more about the process behind incident response for the same outage, read Ryan Katkov’s post, “All Hands on Deck”. On May 12, 2020, Slack had our first significant ou … | Continue reading
by Paul Hammond and Samantha Stoller | Continue reading
What does Slack do when Slack goes down? | Continue reading
On May 12, 2020, Slack had our first significant outage in a long time. This is a detailed look into the technical issues that caused it. | Continue reading
*Prior to the COVID-19 outbreak and our current shelter in place orders in San Francisco. | Continue reading
Or, How I Learned to Stop Worrying & Trust the Compiler | Continue reading
A picture is worth a thousand words; a prototype is worth a thousand meetings. | Continue reading
Starting From the Top | Continue reading
How Slack’s development environments have evolved over time. | Continue reading
How and why Slack migrated to Hack, the benefits it gave us, and things to consider for your own codebase. | Continue reading
Deploys require a careful balance of speed and reliability. | Continue reading
We want to catch revolutions at the right time, while limiting the energy we spend chasing fads. What strategy can we follow to ensure this? | Continue reading
For two years we’ve been running Airflow 1.8, and it was time for us to catch up. Here’s how we did it without impacting 700B daily records. | Continue reading
Introducing Nebula, an open source scalable overlay networking tool with a focus on performance, simplicity and security. | Continue reading
Two years ago, I wrote a post about Libslack, Slack’s shared C++ client library. That post described how Slack used the Libslack library… | Continue reading
More than just CSS, dark mode represents a new way of thinking about color and styles at Slack. | Continue reading
Building shared channels challenged Slack’s fundamental assumption that the workspace is the atomic unit of partitioning customer data. | Continue reading