Metablog

The wrong way to speed up your code with Numba

If your NumPy-based code is too slow, you can sometimes use Numba to speed it up. Numba is a compiled language that uses the same syntax as Python, and it compiles at runtime, so it’s very easy to write. And because it re-implements a large part of the NumPy APIs, it can also eas … | Continue reading

@pythonspeed.com | 1 month ago

Jevons Paradox doesn't always apply to software

When it comes to fighting climate change, I strongly believe that getting involved in politics is one of the most useful things you can do. But given how energy-intensive software is these days, writing more efficient software also seems worth doing, especially if your software i … | Continue reading

@pythonspeed.com | 2 months ago

What can you do about climate change?

Climate change is impacting the whole planet, and getting worse every year. So you want to do something—but you’re not sure what. If you do some research you might encounter an essay by Bret Victor—What can a technologist do about climate change? There’s a whole pile of good idea … | Continue reading

@pythonspeed.com | 2 months ago

Not just NVIDIA: GPU programming that runs everywhere

If you’re doing computations on a GPU, NVIDIA is the default, alongside its CUDA libraries. Some libraries like PyTorch do support AMD GPUs and Macs. But from the re-implementations of NumPy, SciPy, and Pandas in the RAPIDS project, to Numba’s GPU support, NVIDIA has best … | Continue reading

@pythonspeed.com | 2 months ago

Profiling your Numba code

If you’re writing numeric Python code, Numba can be a great way to speed up your program. By compiling a subset of Python to machine code, Numba lets you write for loops and other constructs that would be too slow in normal Python. In other w … | Continue reading

@pythonspeed.com | 2 months ago

Beware of misleading GPU vs CPU benchmarks

Do you use NumPy, Pandas, or scikit-learn and want to get faster results? NVIDIA has created GPU-based replacements for each of these with the shared promise of extra speed. For example, if you visit the front page of NVIDIA’s RAPIDS project, you’ll see benchmarks showing cuDF, a … | Continue reading

@pythonspeed.com | 3 months ago

NumPy 2 is coming: preventing breakage, updating your code

If you’re writing scientific or data science code with Python, there’s a good chance you’re using NumPy, directly or indirectly. Pandas, Scikit-Image, SciPy, Scikit-Learn, AstroPy… these and many other packages depend on NumPy. NumPy 2 is a new major release, with a release candi … | Continue reading

@pythonspeed.com | 3 months ago

How many CPU cores can you actually use in parallel?

When you’re running a CPU-intensive parallel program, you often want to have a thread or process pool sized by the number of CPU cores on your machine. Fewer threads and you’re not taking advantage of all the cores, more than that and your program will start running slower as mul … | Continue reading

@pythonspeed.com | 4 months ago

Using Polars in a Pandas world

Polars is a dataframe-based library that can be faster, more memory efficient, and often simpler to use than Pandas. It’s also much newer, and correspondingly less popular. In November 2023: Polars had ~2.6 million downloads from PyPI. Pandas had ~140 million downloads! Becau … | Continue reading

@pythonspeed.com | 5 months ago

Two kinds of thread pools, and why you need both

When you’re doing large scale data processing with Python, threads are a good way to achieve parallelism. This is especially true if you’re doing numeric processing, where the global interpreter lock (GIL) is typically not an issue. And if you’re using threading, thread pools are … | Continue reading

@pythonspeed.com | 5 months ago

When should you upgrade to Python 3.12?

Python 3.12 is out now–but should you switch to it immediately? And if you shouldn’t upgrade just yet, when should you? Immediately after the release, you may not want to upgrade just yet. But from December 2023 and onwards, upgrading is definitely worth trying. To understand why … | Continue reading

@pythonspeed.com | 6 months ago

Speeding up Cython with SIMD

Cython allows you to write compiled extensions for Python, by translating Python-y code to C or C++. Often you’ll use it to speed up your software, and it’s especially useful for implementing small data science or scientific computing algorithms. But what happens when Cython is t … | Continue reading

@pythonspeed.com | 6 months ago

Speeding up Floyd-Steinberg dithering: an optimization exercise

The common advice when Python is too slow is to switch to a low-level compiled language. But what do you do if that code is too slow? Almost always there are still plenty of performance improvements you can get just … | Continue reading

@pythonspeed.com | 7 months ago

The easiest way to speed up Python with Rust

If you want to speed up some existing Python code, writing a compiled extension in Rust can be an excellent choice: In many situations, Rust code can run much faster than Python. Rust prevents most of the memory-management bugs that often occur in C, C++, and Cython code. The … | Continue reading

@pythonspeed.com | 9 months ago

When NumPy is too slow

If you’re doing numeric calculations, NumPy is a lot faster than plain Python, but sometimes that’s not enough. What should you do when your NumPy-based code is too slow? Your first thought might be parallelism, but that should probably be the last thing you consider. There a … | Continue reading

@pythonspeed.com | 10 months ago

Understanding CPUs can help speed up Numba and NumPy code

When you need to speed up your NumPy processing—or just reduce your memory usage—the Numba just-in-time compiler is a great tool. It lets you write Python code that gets compiled at runtime to machine code, allowing you to get the kind of speed improvements you’d get from languag … | Continue reading

@pythonspeed.com | 10 months ago

Choosing a good file format for Pandas

Before you can process your data with Pandas, you need to load it (from disk or remote storage). There are plenty of data formats supported by Pandas, from CSV, to JSON, to Parquet, and many others as well. Which should you use? You don’t want loading the data to be slow, or us … | Continue reading

@pythonspeed.com | 11 months ago

"Externally managed environments": when PEP 668 breaks pip

You’re on a new version of Linux, you try a pip install, and it errors out, talking about “externally managed environments” and “PEP 668”. What’s going on? How do you solve this? Let’s see: What the problem looks like, and what causes it. The places you are likely to encounter … | Continue reading

@pythonspeed.com | 11 months ago

Goodbye to Flake8 and PyLint: faster linting with Ruff

Flake8 and PyLint are commonly used, and very useful, linting tools: they can help you find potential bugs and other problems with your code, aka “lints”. But they can also be slow. And even if they’re fast on your computer, they may still be slow in your CI system (GitHub Action … | Continue reading

@pythonspeed.com | 12 months ago

Polars for initial data analysis, Polars for production

Initial data analysis (IDA) has different goals than your final, production data analysis: With IDA you need to examine the initial data and intermediate results, check your assumptions, and try different approaches. Exploratory data analysis has similar requirements. Once you … | Continue reading

@pythonspeed.com | 1 year ago

Staying secure by breaking Docker caching

When building Docker images, caching lets you speed up rebuilding images. But this has a downside: it can keep you from installing security updates from your base Linux distribution. If you cache the image layer that includes the security update… you’re not getting new security u … | Continue reading

@pythonspeed.com | 1 year ago

Speeding up text processing in Python (is hard)

If you’re doing text or string manipulation in Python, what do you do if your code is too slow? Assuming your algorithm is reasonably efficient, the next step is to try faster alternatives to Python: a compiled extension. Unfortunately, this is harder than it seems. Some options … | Continue reading

@pythonspeed.com | 1 year ago

Python's multiprocessing performance problem

Because Python has limited parallelism when using threads, using worker processes is a common way to take advantage of multiple CPU cores. The multiprocessing module is built into the standard library, so it’s frequently used for this purpose. But while multiple processes let yo … | Continue reading

@pythonspeed.com | 1 year ago

Don't bother trying to estimate Pandas memory usage

You have a file with data you want to process with Pandas, and you want to make sure you won’t run out of memory. How do you estimate memory usage given the file size? At times you may see estimates like these: “Have 5 to 10 times as much RAM as the size of your dataset”, or “ … | Continue reading

@pythonspeed.com | 1 year ago

The problem with float32: you only get 16 million values

Libraries like NumPy and Pandas let you switch data types, which allows you to reduce memory usage. Switching from numpy.float64 (“double-precision” or 64-bit floats) to numpy.float32 (“single-precision” or 32-bit floats) cuts memory usage in half. But it does so at a cost: float … | Continue reading

@pythonspeed.com | 1 year ago

When should you upgrade to Python 3.11?

Python 3.11 has been released—when should you switch to using it? | Continue reading

@pythonspeed.com | 1 year ago

Find slow data processing tasks (before your customers do)

Your data processing jobs are fast… most of the time. Next, find the slow runs so you can speed them up. | Continue reading

@pythonspeed.com | 1 year ago

To find performance bottlenecks, observe production

The causes of performance bottlenecks vary widely, from network latency to software bugs. Observation in production may therefore be the only way to find them. | Continue reading

@pythonspeed.com | 1 year ago

Why new Macs break your Docker build, and how to fix it

New Macs can break your Docker image build in unexpected ways; learn why, and how to fix it. | Continue reading

@pythonspeed.com | 1 year ago

Pandas vectorization: faster code, slower code, bloated memory

Vectorization in Pandas can make your code faster—except when it will make your code slower. | Continue reading

@pythonspeed.com | 1 year ago

Making pip installs a little less slow

Installing packages with pip, Poetry, and Pipenv can be slow. Learn how to ensure it’s not even slower, and a potential speed-up. | Continue reading

@pythonspeed.com | 1 year ago

Faster, more memory-efficient Python JSON parsing with msgspec

msgspec is a schema-based JSON encoder/decoder, which allows you to process large files with lower memory and CPU usage. | Continue reading

@pythonspeed.com | 1 year ago

When Python can’t thread: a deep-dive into the GIL’s impact

Python’s Global Interpreter Lock (GIL) stops threads from running in parallel or concurrently. Learn how to determine the impact of the GIL on your code. | Continue reading