Building better software with better tools: sanitizers versus valgrind

We often have to write code using permissive programming languages like C and C++. They tend to generate hard-to-debug problems that can crash your applications. Thankfully, many compilers offer “sanitizers”. I discussed them in my post No more leaks with sanitize flags in gcc a … | Continue reading


@lemire.me | 5 years ago

Bitset Decoding on Apple’s A12

In my post Really fast bitset decoding for “average” densities, I reported on our work accelerating the decoding of bitsets. E.g., given a 64-bit register, you want to find the location of every 1-bit. So given 0b110011, you would want to get 0, 1, 4, 5. We want to do this operat … | Continue reading


@lemire.me | 5 years ago
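
To make the task in the bitset decoding entry above concrete, here is a minimal C sketch that lists the position of every 1-bit in a 64-bit word. It is not taken from the post: the name `decode_bitset` is hypothetical, and `__builtin_ctzll` is a GCC/Clang builtin, so other compilers would need an equivalent.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the index of every 1-bit of `word` into `out`, lowest index first.
   `out` must have room for 64 entries. Returns the number of indexes written.
   Hypothetical helper for illustration; relies on a GCC/Clang builtin. */
size_t decode_bitset(uint64_t word, uint8_t *out) {
    size_t count = 0;
    while (word != 0) {
        /* Count trailing zeros: the index of the lowest 1-bit. */
        out[count++] = (uint8_t)__builtin_ctzll(word);
        /* Clear the lowest 1-bit and continue with the rest. */
        word &= word - 1;
    }
    return count;
}
```

For the word 0b110011 (0x33), the function writes 0, 1, 4, 5 and returns 4. The loop runs once per 1-bit rather than once per bit position, which is why sparse words are cheap to decode.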

Setting up a ROCKPro64 (powerful single-card computer)

A few months ago, I ordered a ROCKPro64. If you are familiar with the Raspberry Pi, then it is a bit of the same… an inexpensive computer that comes in the form of a single card. The ROCKPro64 differs from the Raspberry Pi in that it is much closer in power to a normal PC. You … | Continue reading


@lemire.me | 5 years ago

I do not use a debugger

I learned to program with BASIC back when I was twelve. I would write elaborate programs and run them. Invariably, they would surprise me by failing to do what I expect. I would struggle for a time, but I'd eventually give up and just accept that wha | Continue reading


@lemire.me | 5 years ago

Fast bitset decoding for “average” densities

Suppose I give you a word and you need to determine the location of the 1-bits. For example, given the word 0b100011001, you would like to get 0, 3, 4, 8. You could check the value of each bit, but that would take too long. A better approach is to use the fact that modern processors hav … | Continue reading


@lemire.me | 5 years ago

Speeding up a random-access function?

A common problem in software performance is that you are essentially limited by memory access. Let us consider such a function where you write at random locations in a big array. for | Continue reading


@lemire.me | 5 years ago

Parsing short hexadecimal strings efficiently

It is common to represent binary data or numbers using the hexadecimal notation. Effectively, we use a base-16 representation where the first 10 digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and where the following digits are A, B, C, D, E, F, with the added complexity that we can use either … | Continue reading


@lemire.me | 5 years ago
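
As a baseline for the hexadecimal parsing entry above, the sketch below converts exactly four hexadecimal characters into an integer and accepts both upper-case and lower-case letters. It is a plain reference version, not the optimized approach the post may describe; the names `hex_digit_value` and `parse_hex4` are hypothetical.

```c
#include <stdint.h>

/* Returns the value (0..15) of one hexadecimal digit, or -1 if `c`
   is not a hexadecimal digit. Hypothetical helper for illustration. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses exactly four hexadecimal digits starting at `src`.
   Returns a value in 0..65535, or -1 if any character is invalid. */
int32_t parse_hex4(const char *src) {
    int32_t value = 0;
    for (int i = 0; i < 4; i++) {
        int digit = hex_digit_value(src[i]);
        if (digit < 0) return -1;
        value = (value << 4) | digit; /* each digit contributes 4 bits */
    }
    return value;
}
```

Calling `parse_hex4("1aF3")` returns 0x1AF3, and any non-hexadecimal character yields -1.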

Why are unrolled loops faster?

A common optimization in software is to unroll loops. It is best explained with an example. Suppose that you want to compute the scalar product between two arrays: sum = 0 | Continue reading


@lemire.me | 5 years ago
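
The scalar product mentioned in the loop unrolling entry above can be sketched in C as follows, first as a plain loop and then unrolled by a factor of four. The element type (`double`), the unrolling factor, and the function names are assumptions made for illustration; the excerpt does not specify them.

```c
#include <stddef.h>

/* Plain scalar product: one multiply-add per loop iteration. */
double dot_simple(const double *x, const double *y, size_t n) {
    double sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Same computation, unrolled by four: fewer loop-counter updates and
   branches per element. A second loop handles the leftover elements. */
double dot_unrolled4(const double *x, const double *y, size_t n) {
    double sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += x[i] * y[i] + x[i + 1] * y[i + 1]
             + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3];
    }
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the two versions can differ in the last bits on general inputs; they agree exactly when all intermediate values are representable, as with small integers.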

Technological Aging

We are all familiar with biological aging. Roughly speaking, it is the loss of fitness that most animals undergo with time. At the present time, there is simply not much you can do against biological aging. You are just not going to win any gold medals in the Olympics at age 65. H … | Continue reading


@lemire.me | 5 years ago

ARM and Intel have different performance characteristics

In my previous post, I reviewed a new fast random number generator called wyhash. I commented that I expected it to do well on x64 processors (Intel and AMD), but not so well on ARM processors. Let us review again wyhash: uint64_t wyhash64_x | Continue reading


@lemire.me | 5 years ago

Faster remainders when the divisor is a constant: beating compilers & libdivide

Not all instructions on modern processors cost the same. Additions and subtractions are cheaper than multiplications which are themselves slower than divisions. For this reason, compilers frequently replace division instructions by multiplications. Roughly speaking, it works in t … | Continue reading


@lemire.me | 5 years ago
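
For the entry above on remainders by a constant divisor, here is a sketch of one multiplication-based way to compute a 32-bit unsigned remainder without a division instruction at run time. The excerpt is truncated, so whether this matches the post's exact code is an assumption. The names `fastmod_constant` and `fastmod_u32` are illustrative, and `__uint128_t` is a GCC/Clang extension.

```c
#include <stdint.h>

/* Precomputes the constant for divisor d (d >= 1):
   floor((2^64 - 1) / d) + 1, taken modulo 2^64. */
uint64_t fastmod_constant(uint32_t d) {
    return UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1;
}

/* Computes a % d using two multiplications, given M = fastmod_constant(d).
   Requires a 128-bit intermediate product (GCC/Clang extension). */
uint32_t fastmod_u32(uint32_t a, uint64_t M, uint32_t d) {
    uint64_t lowbits = M * a; /* low 64 bits encode the fractional part of a / d */
    return (uint32_t)(((__uint128_t)lowbits * d) >> 64);
}
```

The division happens once, when the constant is computed; each later remainder with the same divisor costs only multiplications and a shift.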

Web caching: what is the right time-to-live for cached pages?

I have been having performance problems with my blog and this forced me to spend time digging into the issue. Some friends of mine advocate that I should just “pay someone” and they are no doubt right that it would be the economical and strategic choice. Sadly, though I am eager … | Continue reading


@lemire.me | 5 years ago

My blog can’t keep up: 500 errors all over

My blog is a relatively minor enterprise. It is strictly non-profit (no ads). I have been posting one or two blog posts a week for about fifteen years. I have been using the same provider in all this time (csoft.net). They charge me about $50 a month. I also subscribe to Cloudflare … | Continue reading


@lemire.me | 5 years ago

What is the space overhead of Base64 encoding?

Many Internet formats from email (MIME) to the Web (HTML/CSS/JavaScript) are text-only. If you send an image or executable file by email, it often first gets encoded using base64. The trick behind base64 encoding is that we use 64 different ASCII characters including all letters, … | Continue reading


@lemire.me | 5 years ago
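
The space overhead discussed in the Base64 entry above follows from the alphabet size: 64 characters means each character carries 6 bits, so every group of 3 input bytes (24 bits) becomes 4 characters. A small C sketch of the resulting length, assuming standard padded base64 and a hypothetical function name:

```c
#include <stddef.h>

/* Number of characters produced by padded base64 for `n` input bytes.
   Each group of up to 3 bytes is encoded as 4 characters.
   Line breaks, which some formats insert, are not counted. */
size_t base64_encoded_length(size_t n) {
    return (n + 2) / 3 * 4;
}
```

For example, 300 bytes encode to 400 characters, an overhead of one third; short inputs fare worse because of padding (1 byte becomes 4 characters).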

Rethinking Hamming’s questions

Richard Hamming is a famous computer scientist. In his talk You and Your Research, Hamming recounts how he asked researchers three questions which I paraphrase: What are the important problems of your field? What important problems are you working on? If what you are doing is not im … | Continue reading


@lemire.me | 5 years ago

Faster intersections between sorted arrays via memory level parallelism

A common problem within databases and search engines is to compute the intersection between two sorted arrays. Typically one array is much smaller than the other one. The conventional strategy is the “galloping intersection”. In effect, you go through the values in the small array … | Continue reading


@lemire.me | 5 years ago
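
The “galloping intersection” named in the entry above can be sketched in C as follows: for each value of the small array, jump ahead in the large array with doubling steps, then finish with a binary search. This is a conventional textbook-style version written for illustration, not the memory-level-parallel variant the post's title refers to. The function names are hypothetical, and both arrays are assumed sorted in increasing order without duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the first index i >= start with large[i] >= target, or n if none.
   Gallops forward with doubling steps, then binary-searches the last gap. */
static size_t gallop_lower_bound(const uint32_t *large, size_t n,
                                 size_t start, uint32_t target) {
    size_t lo = start, hi = start, step = 1;
    while (hi < n && large[hi] < target) {
        lo = hi + 1; /* everything up to hi is smaller than target */
        hi += step;
        step *= 2;
    }
    if (hi > n) hi = n;
    while (lo < hi) { /* binary search in [lo, hi) */
        size_t mid = lo + (hi - lo) / 2;
        if (large[mid] < target) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Writes the common values to `out` (room for small_len entries)
   and returns how many were found. */
size_t intersect_galloping(const uint32_t *small, size_t small_len,
                           const uint32_t *large, size_t large_len,
                           uint32_t *out) {
    size_t count = 0, pos = 0;
    for (size_t i = 0; i < small_len; i++) {
        pos = gallop_lower_bound(large, large_len, pos, small[i]);
        if (pos == large_len) break; /* large array exhausted */
        if (large[pos] == small[i]) out[count++] = small[i];
    }
    return count;
}
```

Since the search for each value resumes where the previous one stopped, the cost grows with the size of the small array and the logarithm of the gaps in the large one, rather than with the full length of the large array.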

Memory-Level Parallelism: Intel Skylake versus Intel Cannonlake

All programmers know about multicore parallelism: your CPU is made of several nearly independent processors (called cores) that can run instructions in parallel. However, our processors are parallel in many different ways. I am interested in a particular form of parallelism calle … | Continue reading


@lemire.me | 5 years ago

Why we make up jobs out of thin air (2012)

“We prefer to invent new jobs rather than trying harder and inventing a new system that wouldn’t require everybody to have a job.” (Philippe Beaudoin) In the XXIst century, people from wealthy countries work hard primarily to gain social status. We often make the mistake of tying … | Continue reading


@lemire.me | 5 years ago

Sorting strings properly is stupidly hard

Programming languages make it hard to sort arrays properly. Look at how JavaScript sorts arrays of integers: > v = [1,3,2,10] [ 1, 3, 2, 10 ] > v.sort() [ 1, 10, 2, 3 ] You need a magical incantation to get the right result: > v.sort((a,b)=>a-b) [ 1, 2, 3, 10 ] Though this … | Continue reading


@lemire.me | 5 years ago

Asking the right question is more important than getting the right answer

Schools train us to provide the right answers to predefined questions. Yet anyone with experience from the real world knows that, more often than not, the difficult part is to find the right question. To make a remarkable contribution, you need to start by asking the right questi … | Continue reading


@lemire.me | 5 years ago

XML for databases: a dead idea

One of my colleagues is teaching an artificial intelligence class. In his class, he uses old videos where experts from the early eighties make predictions about where AI is going. These experts come from the best schools such as Stanford. These videos were not meant as a joke. Wh … | Continue reading


@lemire.me | 5 years ago

Crazily fast hashing with carry-less multiplications

We all know the regular multiplication that we learn in school. To multiply a number by 3, you can multiply a number by two and add it with itself. Programmers write: a * 3 = a + (a | Continue reading


@lemire.me | 5 years ago
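
To illustrate the operation named in the entry above: regular multiplication by 3 amounts to adding a number to a shifted copy of itself, and a carry-less multiplication replaces each addition with an XOR. The portable C sketch below computes the low 64 bits of a carry-less product one bit at a time. It is meant only to show the definition; the name `clmul64_low` is hypothetical, and fast implementations rely on dedicated processor instructions (such as PCLMULQDQ on x64) rather than a loop.

```c
#include <stdint.h>

/* Low 64 bits of the carry-less product of a and b:
   for every 1-bit of b at position i, XOR in (a << i). */
uint64_t clmul64_low(uint64_t a, uint64_t b) {
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            result ^= a << i;
        }
    }
    return result;
}
```

For instance, the carry-less product of 3 and 3 is 3 XOR 6 = 5, whereas the regular product is 9.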

Memory-Level Parallelism: Intel Skylake versus Apple A12/A12X

Modern processors execute instructions in parallel in many different ways: multi-core parallelism is just one of them. In particular, processor cores can have several outstanding memory access requests “in flight”. This is often described as “memory-level parallelism”. You can me … | Continue reading


@lemire.me | 5 years ago

Measuring the memory-level parallelism of a system using a small C++ program?

Our processors can issue several memory requests at the same time. In a multicore processor, each core has an upper limit on the number of outstanding memory requests, which is reported to be 10 on recent Intel processors. In this sense, we would like to say that the level of mem … | Continue reading


@lemire.me | 5 years ago

Is WebAssembly faster than JavaScript?

Most programs running on web sites are written in JavaScript. There are still a few Java applets and other plugins hanging around, but they are considered obsolete at this point. While JavaScript is superbly fast, some people feel that we ought to do better. That’s where WebAssem … | Continue reading


@lemire.me | 5 years ago

Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)

When receiving bytes from the network, we often assume that they are Unicode strings, encoded using something called UTF-8. Sadly, not all streams of bytes are valid UTF-8. So we need to check the strings. It is probably a good idea to optimize this problem as much as possible. I … | Continue reading


@lemire.me | 5 years ago

Quickly identifying a sequence of digits in a string of characters

Suppose that you want to quickly determine whether a sequence of eight characters is made of digits (e.g., ‘9434324134’). How fast can you go? In software, characters are mapped to integer values called the code points. The ASCII and UTF-8 code points for the digits 0, 1,…, 9 are the co … | Continue reading


@lemire.me | 5 years ago
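
The check described in the entry above can be sketched in C in two ways: a straightforward loop, and a branch-free version that loads all eight characters into one 64-bit word. Both rely on the digits occupying the consecutive ASCII codes 0x30 to 0x39. The function names are hypothetical, and the second version is one possible word-at-a-time formulation offered as an assumption, since the excerpt is truncated before any code appears.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reference version: tests each of the eight characters in turn. */
bool is_eight_digits_simple(const char *chars) {
    for (int i = 0; i < 8; i++) {
        if (chars[i] < '0' || chars[i] > '9') return false;
    }
    return true;
}

/* Word-at-a-time version. The caller must guarantee 8 readable bytes.
   A byte b is a digit exactly when its high nibble is 3 and the high
   nibble of b + 6 is still 3 (which rules out 0x3A..0x3F). */
bool is_eight_digits_swar(const char *chars) {
    uint64_t val;
    memcpy(&val, chars, 8); /* safe unaligned load */
    uint64_t high = val & UINT64_C(0xF0F0F0F0F0F0F0F0);
    uint64_t shifted =
        ((val + UINT64_C(0x0606060606060606)) & UINT64_C(0xF0F0F0F0F0F0F0F0)) >> 4;
    return (high | shifted) == UINT64_C(0x3333333333333333);
}
```

Both functions return true for "12345678" and false for "12:45678"; the second does so with a handful of arithmetic instructions and no data-dependent branches.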

Are vectorized random number generators actually useful? – Daniel Lemire's blog

Related Posts: Vectorizing random number generators for greater… Innovation as a Fringe Activity Top speed for top-k queries | Continue reading


@lemire.me | 5 years ago

AVX-512 throttling: heavy instructions are maybe not so dangerous

Related Posts: Innovation as a Fringe Activity Science and Technology links (March 17, 2017) By how much does AVX-512 slow down your CPU? A first… | Continue reading


@lemire.me | 5 years ago

Trying harder to make AVX-512 look bad: my quantified and reproducible results

Related Posts: Innovation as a Fringe Activity Setting up a “robust” Minecraft server… Science and Technology links (March 17, 2017) | Continue reading


@lemire.me | 5 years ago

Avoid lexicographical comparisons when testing for string equality

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Science and Technology links (March 17, 2017) | Continue reading


@lemire.me | 5 years ago

Performance of ranged accesses into arrays: modulo, multiply-shift, and masks

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Stream VByte: breaking new speed records for integer… | Continue reading


@lemire.me | 5 years ago

Sorting already sorted arrays is much faster?

Related Posts: Accelerating intersections with SIMD instructions Where are all the search trees? Software performance is… counterintuitive | Continue reading


@lemire.me | 5 years ago

The dangers of AVX-512 throttling: a 3% impact

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity The dangers of AVX-512 throttling: myth or reality? | Continue reading


@lemire.me | 5 years ago

Fast strongly universal 64-bit hashing everywhere

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Science and Technology links (March 17, 2017) | Continue reading


@lemire.me | 5 years ago

Visiting all values in an array exactly once in “random order”

Related Posts: Innovation as a Fringe Activity Setting up a “robust” Minecraft server… Aging is a software bug | Continue reading


@lemire.me | 5 years ago

The dangers of AVX-512 throttling: myth or reality?

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Science and Technology links (March 17, 2017) | Continue reading


@lemire.me | 5 years ago

Getting 4 bytes or a full cache line: same speed or not?

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Science and Technology links (April 7th, 2017) | Continue reading


@lemire.me | 5 years ago

It is more complicated than I thought: -mtune, -march in GCC

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Science and Technology links (March 17, 2017) | Continue reading


@lemire.me | 5 years ago

Accelerating Conway’s Game of Life with SIMD Instructions

Related Posts: Science and Technology links (April 7th, 2017) Innovation as a Fringe Activity Stream VByte: breaking new speed records for integer… | Continue reading


@lemire.me | 5 years ago

Are fungi making us sick?

Related Posts: Science and Technology links (March 17, 2017) Setting up a “robust” Minecraft server… Innovation as a Fringe Activity | Continue reading


@lemire.me | 5 years ago

Which is fastest: read, fread, ifstream or mmap?

Related Posts: On the memory usage of maps in Java Computing in 2025… what can we expect? Are 8-bit or 16-bit counters faster than 32-bit counters? | Continue reading


@lemire.me | 5 years ago

Data processing on modern hardware

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Science and Technology links (April 7th, 2017) | Continue reading


@lemire.me | 5 years ago

Emotions Killing Your Intellectual Productivity

Related Posts: We need more than spam filters: we need bona fide… The week-end freedom test What kind of researcher are you? | Continue reading


@lemire.me | 5 years ago

Predictions

(If you enjoy these predictions, you can follow me on Twitter at @lemire.) 2020 Virtual reality is ubiquitous. New game consoles come with virtual capabilities by default. Volvo commercializes self-driving cars. Ot | Continue reading


@lemire.me | 5 years ago

Greater speed in memory-bound graph algorithms with just straight C code

Related Posts: Setting up a “robust” Minecraft server… Graph algorithms and software prefetching Innovation as a Fringe Activity | Continue reading


@lemire.me | 5 years ago

Graph algorithms and software prefetching

Related Posts: Setting up a “robust” Minecraft server… Innovation as a Fringe Activity Aging is a software bug | Continue reading


@lemire.me | 5 years ago