If you want your program to use less memory, you will need to measure memory usage. You’ll want to measure the current usage, and then you’ll need to ensure it’s using less memory once you make some improvements. It turns out, however, that measuring memory usage isn’t as straigh … | Continue reading
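As a minimal sketch of the "measure before and after" idea, the standard library's `tracemalloc` module can report the peak memory allocated through Python's allocator during a call. The helper `peak_traced_memory` and the workload `build_list` are hypothetical names used only for illustration, and this is not necessarily the measurement the article goes on to recommend: `tracemalloc` does not see memory that C libraries allocate outside Python's allocator.

```python
import tracemalloc


def peak_traced_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_list(n):
    # Hypothetical workload: a list holding n distinct integers.
    return list(range(n))


result, peak = peak_traced_memory(build_list, 100_000)
print(f"peak traced allocation: {peak / 1_000_000:.1f} MB")
```

Running the same helper before and after a change gives a rough comparison, provided the workload and input stay the same.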
You’ve finished building the initial Docker image for your Python application, you push it to the registry–and that takes a while, because your image is 2GB. Your image is clearly too large, and so your next step is to try to make your Docker image smaller. In this article you’ll … | Continue reading
One of the benefits of containers over virtual machines is that you get some measure of isolation without the performance overhead or distortion of virtualization. Docker images therefore seem like a good way to get a reproducible environment for measuring CPU performance of your … | Continue reading
Python is slow, and compiled languages like Rust, C, or C++ are fast. So when your application is too slow, rewriting some of your code in a compiled extension can seem like the natural approach to speeding things up. Unfortunately, compiled extensions are sometimes actually slow … | Continue reading
You have some data in a relational database, and you want to process it with Pandas. So you use Pandas’ handy read_sql() API to get a DataFrame—and promptly run out of memory. The problem: you’re loading all the data into memory at once. If you have enough rows in the SQL query’s … | Continue reading
Somebody is always wrong on the Internet, and bad Docker packaging advice is quite common. But one particular piece of advice keeps coming up, and it’s dangerous enough to merit its own article. In a whole bunch of places you will be told not to install security updates when buil … | Continue reading
If you run a security scanner on your Docker image, you might be in for a shock: often you’ll be warned of dozens of security vulnerabilities, even on the most up-to-date image. After the third or fourth time you get this result, you’ll start tuning the security scanner out. Even … | Continue reading
You probably don’t want to be an asshole. Being an asshole, as Siderea’s classic essay The Asshole Filter points out, is about being transgressive, about violating social boundaries and rules. And so within the cultural norms of our society, most of us try to avoid being an assho … | Continue reading
Python is not the fastest language around, so any performance boost helps, especially if you’re running at scale. It turns out that depending where you install Python from, its performance can vary quite a bit: choosing the wrong version of Python can cut your speed by 10-20%. Le … | Continue reading
A segfaulting program might be the symptom of a bug in C code–or it might be that your process is running out of memory. Crashing is just one symptom of running out of memory. Your process might instead just run very slowly, your computer or VM might freeze, or your process might … | Continue reading
You have some code—whether it’s Python, Rust, Java, or some other language—whose speed you want to measure over time. Ideally you want it to get faster, but at the very least you don’t want to get any slower. So you write a benchmark, and now you need to run it—but where? Virtual … | Continue reading
You’re processing a large amount of data with Python, the processing seems easily parallelizable, and it’s sloooooooow. The obvious next step is to switch to some sort of multiprocessing, or even to start processing data on a cluster so you can use multiple machines. Obvious, but often … | Continue reading
Docker packaging is an exercise in shoving square pegs into round holes, over and over and over again. Consider the Poetry packaging tool for Python. One of Poetry’s features can make Docker rebuilds slower, by breaking Docker’s caching. And it’s not a bad feature, there’s nothin … | Continue reading
Python 3.9 is now available–but should you switch to it immediately? And if not now, when? The short answer is, no, you probably don’t want to switch immediately; quite possibly you can’t switch immediately. To understand why, we need to consider Python packaging, the software de … | Continue reading
Let’s say you have an array, and you need to make some copies and modify those copies. Usually, memory usage scales with the number of copies: if your original array was 1GB of RAM, each copy will take 1GB of RAM. And that can add up. But often, you’re just changing a small part … | Continue reading
Whether it’s a data processing pipeline or a scientific computation, you will often want to figure out how much memory your process is going to need: If you’re running out of memory, it’s good to know whether you just need to upgrade your laptop from 8GB to 16GB RAM, or whether y … | Continue reading
The official Python image for Docker is quite popular, and in fact I recommend one of its variations as a base image. But many people don’t quite understand what it does, which can lead to confusion and brokenness. In this post I will therefore go over how it’s constructed, why i … | Continue reading
You’ve written your Python application—a server, CLI tool, or batch process—and now you need to distribute it to the machines where it will be running. In order to run your application, you will need: Your code. Various Python libraries your code depends on, like Flask or NumPy. … | Continue reading
If you want to understand a Docker image, there is no more useful tool than the docker history command. Whether it’s telling you why your image is so large, or helping you understand how a base image was constructed, the history command will let you peer into the innards of any i … | Continue reading
Every time you create an instance of a class in Python, you are using up some memory–including overhead that might actually be larger than the data you care about. Create a million objects, and you have a million times the overhead. And that overhead can add up, either preventing … | Continue reading
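To make the per-instance overhead concrete, here is a small sketch that compares a regular class with one that declares `__slots__`. Using `__slots__` is one common way to cut this overhead in CPython; the excerpt above is truncated, so treat it as an assumption rather than the article's own recommendation. The class names and the `shallow_size` helper are hypothetical, and the sizes it reports are shallow: they exclude the attribute values themselves.

```python
import sys


class PointWithDict:
    """Regular class: each instance carries a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots:
    """Same data, but attributes live in fixed slots instead of a dict."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def shallow_size(obj):
    """Size of the object plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


print("with __dict__:", shallow_size(PointWithDict(1, 2)), "bytes")
print("with __slots__:", shallow_size(PointWithSlots(1, 2)), "bytes")
```

The exact numbers vary between Python versions, so the useful output is the difference between the two, multiplied by however many objects you create.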
Let’s say you want to store a list of integers in Python: list_of_numbers = [] for i in range(1000000): list_of_numbers.append(i) Those numbers can easily fit in a 64-bit integer, so one would hope Python would store those million integers in no more than ~8MB: a million 8-byte o … | Continue reading
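A rough way to check the gap between the hoped-for ~8MB and what a list actually costs is to add up the list's own size and the size of every integer object it points to, then compare that with a packed array of 8-byte integers. This sketch uses the standard library's `array` module with type code `"q"` (signed 64-bit) as the packed alternative; that choice is an assumption for illustration, since the excerpt is cut off before it names one.

```python
import sys
from array import array

numbers = list(range(1_000_000))

# The list stores pointers; each integer is a separate Python object.
# This is an approximation: CPython shares a few small integer objects.
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(n) for n in numbers)

# array("q") stores the raw 8-byte values back to back.
packed = array("q", numbers)
packed_bytes = sys.getsizeof(packed)

print(f"list + int objects: {list_bytes / 1_000_000:.1f} MB")
print(f"array('q'):         {packed_bytes / 1_000_000:.1f} MB")
```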
Your code runs fine on your computer, but when you try to package it with Docker you keep getting ImportErrors: Python can’t find your code. There are multiple reasons why this can happen, some of them Python-specific, some of them Docker-specific. So let’s go through a step-by-s … | Continue reading
Unlike languages like C, much of the time Python will free up memory for you. But sometimes, it won’t work the way you expect it to. Consider the following Python program—how much memory do you think it will use at peak? import numpy as np def load_1GB_of_data(): return np.ones(( … | Continue reading
Helping you deploy with confidence, ship higher quality code, and speed up your application. | Continue reading
You run your program, and it crashes—it’s out of memory: If you’re lucky, you get a MemoryError exception. If you’re less lucky, you get a coredump. If you’re having a bad day, your computer locks up and you need to restart it. How do you figure out what is using up all your Pyth … | Continue reading
If your Python data pipeline is using too much memory, it can be very difficult to figure out where exactly all that memory is going. And when you do make changes, it can be difficult to figure out if your changes helped. Yes, there are existing memory profilers for Python that help … | Continue reading
You don’t want to deploy insecure code to production—but it’s easy for mistakes and vulnerabilities to slip through. So you want some way to catch security issues automatically, without having to think about it. This is where security scanners come in. They won’t solve all your p … | Continue reading
Your application likely depends on a variety of third-party packages. As time passes, those dependencies will change, for two reasons: Security fixes and critical bug fixes: you don’t want someone stealing your data, or your data getting corrupted. Software packages get new relea … | Continue reading
Sometime last month you built a Docker image for your Python application. Today you start with the same revision, fix a minor bug, and build a new image from scratch. And suddenly you’ve got a mess on your hands. If your build is not reproducible, you might end up installing diff … | Continue reading
When data doesn’t fit in memory, you can use chunking: loading and then processing it in chunks, so that only a subset of the data needs to be in memory at any given time. But while chunking saves memory, it doesn’t address the other problem with large amounts of data: computatio … | Continue reading
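The chunking idea described above can be sketched with nothing but the standard library: read a fixed number of records at a time, process them, and keep only a running result. The names `iter_chunks` and `chunked_total` are hypothetical, and the in-memory `io.StringIO` stands in for a file that would be too large to load at once.

```python
import io


def iter_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Final, possibly shorter, chunk.
        yield chunk


def chunked_total(lines, chunk_size=1000):
    """Sum one integer per line while holding only one chunk in memory."""
    total = 0
    for chunk in iter_chunks(lines, chunk_size):
        total += sum(int(line) for line in chunk)
    return total


# Stand-in for a large file: one integer per line.
sample = io.StringIO("\n".join(str(i) for i in range(10_000)))
print(chunked_total(sample, chunk_size=1000))
```

This bounds memory usage by the chunk size, but every record is still processed one after another, so it does nothing for the computation time.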
When you’re building a Docker image, you might need to use some secrets: the password to a private package repository, for example. You don’t want this secret to end up embedded in the image, because then anyone who somehow gets access to the image will get access to your private … | Continue reading
If you’re using Docker, the next natural step seems to be Kubernetes, aka K8s: that’s how you run things in production, right? Well, maybe. Solutions designed for 500 software engineers working on the same application are quite different than solutions for 50 software engineers. … | Continue reading
If your NumPy array is too big to fit in memory all at once, you can process it in chunks: either transparently, or explicitly loading only one chunk at a time from disk. Either way, you need to store the array on disk somehow. For this particular situation, there are two common … | Continue reading
You’ve got a nice new Dockerfile, and it’s time to try it out: $ docker build -t mynewimage . Sending build context to Docker daemon 3.072kB Step 1/3 : FROM python:3.8-slim-buster ---> 3d8f801fc3db Step 2/3 : COPY build.sh . ---> 541b65a7b417 Step 3/3 : RUN ./build.sh ---> Runnin … | Continue reading
When you’re doing computationally intensive calculations with NumPy, you’ll want to use all your computer’s CPUs. Your computer has 2 or 4 or even more CPU cores, and if you can use them all then your code will run faster. Except, of course, when parallelism makes your code run s … | Continue reading
When you’re choosing a base image for your Docker image, Alpine Linux is often recommended. Using Alpine, you’re told, will make your images smaller and speed up your builds. And if you’re using Go that’s reasonable advice. But if you’re using Python, Alpine Linux will quite ofte … | Continue reading
The Conda packaging tool implements environments that enable different applications to have different libraries installed. So when you’re building a Docker image for a Conda-based application, you’ll need to activate a Conda environment. Unfortunately, activating Conda environme … | Continue reading
You have a large chunk of data—a NumPy array, or a Pandas DataFrame—and you need to do a series of operations on it. By default both libraries make copies of the data, which means you’re using even more RAM. Both libraries do have APIs for modifying data in-place, but that can le … | Continue reading
When you’re building a Docker image for your Python application, you will need to: Upgrade system packages in order to get the latest security updates and critical bug fixes. Sometimes, install additional system packages as dependencies for your Python libraries or application, f … | Continue reading
You’re writing software that processes data, and it works fine when you test it on a small sample file. But when you load the real data, your program crashes. The problem is that you don’t have enough memory—if you have 16GB of RAM, you can’t load a 100GB file. At some point the … | Continue reading
Python 3.8 was released in mid-October, but if you look at my recommendation for a base image for Docker it still talks about Python 3.7. And in fact, switching to Python 3.8 immediately can cause you problems. Wondering when to switch your application? Here’s a quick rundown of … | Continue reading
Note: This article is based on a talk I gave at PyGotham 2019. I will post a link to the video when it becomes available. Your Python program is too slow. Maybe your web application can’t keep up, or certain queries are taking a long time. Maybe you have a batch program that take … | Continue reading
You’ve just gotten a bug report from production, and you want to reproduce it on your local development machine. How do you make sure you’re running the exact same version of the production code on your local machine? Using Docker images makes it easier, since you can download an … | Continue reading
Packaging can often be slow, and Docker builds are no exception. Downloading and installing system and Python packages, compiling C extensions, building assets—it all adds up. In order to speed up your builds, Docker implements caching: if your Dockerfile and related files haven’ … | Continue reading
Software needs automated tests to ensure changes don’t break things, from unit tests to end-to-end tests. And just like other software, your Docker image can also break. The problem with a broken Docker image is that: Unit tests won’t catch the problem. End-to-end tests will catch it too … | Continue reading
Let’s say your program is slow, and you’ve determined that it’s only partially due to CPU. How do you figure out which parts of the code are waiting for things other than CPU? In this article you’ll learn how to write custom profilers, and in particular profilers that will help y … | Continue reading
Sometimes your Python process will behave strangely, run slowly, or give you the wrong answers. And while hopefully you have logging, the logging isn’t always enough. So how do you debug this process? If you planned ahead, you can access an interactive Python prompt inside your r … | Continue reading
If your process is slow it might be because it’s very CPU-intensive—or maybe it’s blocking on I/O (network or filesystem), or locks, or just sleeping. But how can you tell? There are a variety of answers, but in this article I’m going to cover what is probably the simplest heuris … | Continue reading
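One simple way to tell the two cases apart is to compare CPU time with wall-clock time: if a piece of code took one second of wall-clock time but used only a fraction of a second of CPU, it spent the rest waiting. The excerpt is truncated before it names its heuristic, so this sketch is an assumption about the general approach, not the article's method. The helper `cpu_fraction` and the two sample workloads are hypothetical.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Return CPU time divided by wall-clock time for one call to func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process only
    func(*args, **kwargs)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return cpu_elapsed / wall_elapsed if wall_elapsed > 0 else 0.0


def busy():
    # CPU-bound: pure computation.
    sum(i * i for i in range(2_000_000))


def waiting():
    # Not CPU-bound: sleeping stands in for I/O or lock waits.
    time.sleep(0.2)


print(f"busy:    {cpu_fraction(busy):.2f}")
print(f"waiting: {cpu_fraction(waiting):.2f}")
```

A ratio near 1 suggests the code is CPU-bound; a ratio near 0 suggests it is mostly waiting on something else. On a heavily loaded machine the ratio for CPU-bound code will also drop, so it is best read as a rough signal.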