When benchmarking software, we often start by measuring the time elapsed. If you are benchmarking data bandwidth or latency, it is the right measure. However, if you are benchmarking computational tasks where you avoid disk and network accesses and where you only access a few pages … | Continue reading
If I multiply two 64-bit integers (having values in [0, 2^64)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, r … | Continue reading
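To make the 128-bit product concrete, here is a minimal portable C++ sketch (not the post's code) that builds the full product from 32-bit halves. The names `U128` and `full_multiply` are hypothetical. On GCC and Clang, the `unsigned __int128` extension usually lets the compiler emit the single `mul` instruction instead.

```cpp
#include <cstdint>

// Full 128-bit product, stored as two 64-bit halves.
struct U128 {
  uint64_t hi;
  uint64_t lo;
};

// Portable 64x64 -> 128-bit multiplication: split each operand into 32-bit
// halves, form the four partial products, and propagate carries by hand.
U128 full_multiply(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFF;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;  // contributes to bits 0..63
  uint64_t hi_lo = a_hi * b_lo;  // contributes to bits 32..95
  uint64_t lo_hi = a_lo * b_hi;  // contributes to bits 32..95
  uint64_t hi_hi = a_hi * b_hi;  // contributes to bits 64..127

  // Middle 32-bit column; the sum is at most 2^64 - 1, so it cannot overflow.
  uint64_t cross = (lo_lo >> 32) + (hi_lo & mask) + lo_hi;

  U128 r;
  r.lo = (cross << 32) | (lo_lo & mask);
  r.hi = hi_hi + (hi_lo >> 32) + (cross >> 32);
  return r;
}
```

For example, squaring 2^64 - 1 gives a high half of 0xFFFFFFFFFFFFFFFE and a low half of 1.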
The ubiquitous IEEE floating-point standard defines two numbers to represent zero, the positive and the negative zeros. You also have the positive and negative infinity. If you compute the inverse of the positive zero, you get the positive infinity. If you compute the inverse of … | Continue reading
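A short C++ illustration of the two zeros, assuming IEEE 754 binary64 doubles and no fast-math compiler flags. The helpers `inverse` and `is_negative_zero` are illustrative names; `std::signbit` is the standard way to read the sign bit.

```cpp
#include <cmath>

// Dividing one by a zero exposes its sign: +0.0 gives +inf, -0.0 gives -inf.
double inverse(double x) { return 1.0 / x; }

// -0.0 == 0.0 evaluates to true, so equality alone cannot tell the zeros
// apart; the sign bit has to be inspected directly.
bool is_negative_zero(double x) {
  return x == 0.0 && std::signbit(x);
}
```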
Programmers often write out numbers as strings (e.g., 3.1416) and they want to read back the numbers from the string. If you read and write JSON or CSV files, you do this work all of the time. Previously, we showed that we could parse floating-point numbers at a gigabyte per seco … | Continue reading
Many programming languages like Java, JavaScript and C# represent strings using UTF-16 by default. In UTF-16, each ‘character’ uses 16 bits. To represent all of the roughly 1 million Unicode characters, some special ‘characters’ can be combined in pairs (surrogate pairs), but for much of the co … | Continue reading
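To show how a surrogate pair is formed, here is a hedged C++ sketch that encodes a single code point as UTF-16. It assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not itself a surrogate); `to_utf16` is a hypothetical helper, not a library function.

```cpp
#include <vector>

// Encode one code point as one or two UTF-16 code units.
// Assumes cp <= 0x10FFFF and cp is not in the surrogate range D800..DFFF.
std::vector<char16_t> to_utf16(char32_t cp) {
  if (cp < 0x10000) {
    // Basic Multilingual Plane: a single 16-bit unit suffices.
    return {static_cast<char16_t>(cp)};
  }
  cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
  char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));
  char16_t low = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
  return {high, low};
}
```

For instance, U+1F600 becomes the pair 0xD83D 0xDE00, while U+0041 ('A') stays a single unit.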
Computers typically rely on binary floating-point numbers. Most often they span 64 bits or 32 bits. Many programming languages call them double and float. JavaScript represents all its numbers, by default, with a 64-bit binary floating-point number type. Human beings most of ofte … | Continue reading
We are not rational beings. We cannot string together simple sequences of logical arguments without making mistakes. We cannot reason probabilistically. There is ample evidence that rational arguments fail to convince. That’s not surprising given how poor we are at evaluating rational arg … | Continue reading
When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks. Furthermore, the cost model can be extended to count “near … | Continue reading
One-sided bet: People commonly assume implicitly that their actions may only have good outcomes. For example, increasing the minimum wage in a country may only benefit the poor. Taking a lottery ticket only has the upside of possibly winning a lot of money. Believing in God can o … | Continue reading
In most developed countries, government massively funds research through academic grants, government laboratories, tax credits and research contracts: government R&D alone can often reach 1% of the GDP. In Canada, the government loves tax credits. In the US, the government spends about 60 … | Continue reading
The number of researchers and peer-reviewed publications is growing exponentially. It has been estimated that the number of researchers in the world doubles every 16 years and the number of research outputs is increasing even faster. If you accept that published research papers ar … | Continue reading
Someone reminded me of a prediction I made in 2011: Your iPhone will have 1TB of storage by 2020, assuming exponential growth, see my plot: http://t.co/iDiT1J7y — Daniel Lemire (@lemire) November 18, 2011 At the time, an iPhone could hold at most 32 GB of data, so 1 TB sounded in … | Continue reading
In February 2016, I placed a bet against Greg Linden in these terms: within the next three years, starting in March of this year, we would sell at least 10 million VR units a year (12 continuous months) worldwide. According to some sources, around 5 million units have been sold e … | Continue reading
Many programming languages have a number type corresponding to the IEEE binary64. In many languages such as Java or C++, it is called a double. A double value uses 64 bits and it represents a significand (or mantissa) multiplied by a power of two: m * 2^p. There is also a sign bit … | Continue reading
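The m * 2^p form can be recovered from the raw bits. This C++ sketch assumes a normal value (finite, non-zero, not subnormal) and the standard binary64 layout: 1 sign bit, 11 exponent bits with a bias of 1023, and 52 stored significand bits with an implicit leading 1. `DoubleParts` and `decompose` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// |x| == m * 2^p, with the sign kept separately.
struct DoubleParts {
  bool negative;  // sign bit
  uint64_t m;     // 53-bit significand, implicit leading 1 restored
  int p;          // power of two
};

// Sketch for normal doubles only; zeros, subnormals, infinities and NaN
// would need separate handling.
DoubleParts decompose(double x) {
  uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // reinterpret the 64 bits safely
  int biased = static_cast<int>((bits >> 52) & 0x7FF);
  DoubleParts d;
  d.negative = (bits >> 63) != 0;
  d.m = (bits & ((uint64_t(1) << 52) - 1)) | (uint64_t(1) << 52);
  d.p = biased - 1023 - 52;  // remove the bias, then scale m to an integer
  return d;
}
```

For instance, 1.0 decomposes as m = 2^52 and p = -52, and -3.0 as m = 3 * 2^51 and p = -51 with the sign bit set.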
In my previous blog post, I compared the performance of my new ARM-based MacBook Pro with my 2017 Intel-based MacBook Pro. I used a number parsing benchmark. In some cases, the ARM-based MacBook Pro was nearly twice as fast as the older Intel-based MacBook Pro. I think that the A … | Continue reading
Until yesterday, my laptop was a large 15-inch MacBook Pro. It contains an Intel Kaby Lake processor (3.8 GHz). I just got a brand-new 13-inch 2020 MacBook Pro with Apple’s M1 ARM chip (3.2 GHz). How do they compare? I like precise data points. In some respects, the Apple M1 chip … | Continue reading
When searching in a sorted array, the standard approach is to rely on a binary search. If the input array contains N elements, after log(N) + 1 random queries in the sorted array, you will find the value you are looking for. The algorithm is well known, even by kids. You first gu … | Continue reading
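A minimal C++ sketch of binary search over a sorted vector, returning the index of the target or -1 when it is absent; `binary_search_index` is a hypothetical name. In practice, `std::lower_bound` and `std::binary_search` from `<algorithm>` already provide this.

```cpp
#include <cstddef>
#include <vector>

// Each iteration halves the candidate range [lo, hi), so roughly
// log2(N) + 1 probes are enough for an array of N elements.
long binary_search_index(const std::vector<int>& v, int target) {
  std::size_t lo = 0, hi = v.size();
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
    if (v[mid] < target) {
      lo = mid + 1;  // target, if present, is to the right
    } else if (v[mid] > target) {
      hi = mid;      // target, if present, is to the left
    } else {
      return static_cast<long>(mid);
    }
  }
  return -1;
}
```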
If you are a nerd, the Internet is a candy store… if only you stay away from mainstream sites. Some of the best scientists have blogs and YouTube channels, and they post their papers online. When they review a paper, they speak frankly, openly. Is the work good or irrelevant? You can ag … | Continue reading
Software programming looks, at a glance, like work best done in isolation. Nothing could be further from the truth in my experience. Though you may be working on your little program alone, you should not dismiss the social component of the work. I often say that “programming i … | Continue reading
This week, my family got a copy of each new major game console: the Microsoft Xbox Series X and the Sony PlayStation 5. I haven’t yet had time to try them out well, but I know enough to give my first impressions. They are both very similar machines from the inside. The same kind … | Continue reading
One neat family of tools that most programmers should know about is “theorem provers”. If you went to college in computer science, you may have been exposed to them… but you may not think of using them when programming. Though I am sure that they can be used to prove theorems, I … | Continue reading
Software programming is not for everyone, but among the careers that are mostly unregulated, and thus mostly free from rents, it has consistently been one of the best choices. You can earn more money if you embrace some professions that are regulated (e.g., medicine), … | Continue reading
Amazon has 1 million employees. “The iPhone 12 contains a Lidar. The first 3D Lidar was released a decade ago and cost $75,000.” (Calum Chace) There is water on the Moon, possibly enough to make fuel. Good-looking people have greater social networks and may receive favorable trea … | Continue reading
One of the most common “data types” in programming is the text string. When programmers think of a string, they imagine that they are dealing with a list or an array of characters. It is often a “good enough” approximation, but reality is more complex. The characters must be encod … | Continue reading
In most programming languages, the value 0.1 + 0.2 differs from 0.3. Let us try it out in Node (JavaScript): typing 0.1 + 0.2 == 0.3 at the prompt returns false. Yet 1 + 2 is equal to 3. Why is that? Let us look at it a bit more closely. In most instances, your computer will represent numbers … | Continue reading
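The same comparison in C++, assuming IEEE 754 binary64 doubles evaluated without extended precision (as with SSE2 on x64). The tolerance-based `nearly_equal` is a common workaround rather than something the post prescribes, and a suitable tolerance depends on the application; both function names are illustrative.

```cpp
#include <cmath>

// 0.1, 0.2 and 0.3 are each rounded to the nearest double, and the
// rounded sum 0.1 + 0.2 lands on a different double than 0.3.
bool sum_equals(double a, double b, double c) { return a + b == c; }

// Compare within an application-chosen absolute tolerance.
bool nearly_equal(double x, double y, double tolerance) {
  return std::fabs(x - y) <= tolerance;
}
```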
Integers in programming languages have a valid range but arithmetic operations can result in values that exceed such ranges. For example, adding two large integers can result in an integer that cannot be represented in the integer type. We often refer to such error conditions as … | Continue reading
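One way to guard against signed overflow in C++ is to test the operands before adding, since signed overflow is undefined behavior in C and C++. This is a sketch with a hypothetical `checked_add` helper; GCC and Clang also offer the `__builtin_add_overflow` builtin for the same purpose.

```cpp
#include <cstdint>
#include <limits>

// Returns false instead of performing an addition that would overflow.
bool checked_add(int64_t a, int64_t b, int64_t* result) {
  // a + b > max  <=>  a > max - b  (safe to compute when b > 0)
  if (b > 0 && a > std::numeric_limits<int64_t>::max() - b) return false;
  // a + b < min  <=>  a < min - b  (safe to compute when b < 0)
  if (b < 0 && a < std::numeric_limits<int64_t>::min() - b) return false;
  *result = a + b;
  return true;
}
```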
Programmers often need to convert a string into a floating-point number. For example, you might get the string “3.1416” and you would like to get the resulting value as a floating-point type. In C/C++, the standard way is the strtod function: const char * string = "3.1416"; char * string_ … | Continue reading
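A self-contained C++ sketch of calling `strtod` and using its end pointer to reject trailing characters; `parse_double` is a hypothetical wrapper. It does not check `errno` for out-of-range values, and `strtod` follows the current C locale, which may expect a comma as the decimal separator.

```cpp
#include <cstdlib>

// Parses the whole string as a double. Fails if nothing was parsed or if
// characters remain after the number.
bool parse_double(const char* s, double* out) {
  char* end = nullptr;
  double value = std::strtod(s, &end);  // end points past the parsed text
  if (end == s || *end != '\0') return false;
  *out = value;
  return true;
}
```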
It is often believed that learning is a simple matter of collecting answers and replies. I suspect that “learn mechanistically how to answer the questions” is a great way for weak students to pass courses, and for smart students to ace courses. However, I believe that if you rea … | Continue reading
A standard trick in programming is to use “sentinel values”. These are special values that represent metadata efficiently. The C language represents strings as sequences of characters terminated with the null character. The null character is a sentinel that indicates the string … | Continue reading
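With a sentinel, the length is not stored anywhere, so it has to be rediscovered by scanning. Here is a hedged C++ sketch of that scan, essentially what `strlen` does; `sentinel_length` is an illustrative name.

```cpp
#include <cstddef>

// Walk forward until the null character (the sentinel) is reached.
std::size_t sentinel_length(const char* s) {
  const char* p = s;
  while (*p != '\0') {
    ++p;
  }
  return static_cast<std::size_t>(p - s);
}
```

A side effect of the convention is that a string cannot contain the sentinel itself: anything after the first null character is invisible to this scan.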
I started programming professionally when Java came out and right about when C++ was the “hot new thing”. Following the then-current fashion, I looked down at C and Pascal programming. I fully embraced object-oriented programming. Everything had to be represented as an object. It … | Continue reading
When processing strings, it is tempting to view them as arrays of characters (or bytes) and to process them as such. Suppose that you would like to determine whether a string is ASCII. In ASCII, every character must be a byte value smaller than 128. A fine C++17 approach to check … | Continue reading
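One possible C++17 check, not necessarily the one in the post, uses `std::string_view` and `std::all_of`. A variant without an early exit ORs every byte together and tests the high bit once at the end. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <string_view>

// ASCII means every byte value is below 128, i.e. the high bit is never set.
bool is_ascii(std::string_view s) {
  return std::all_of(s.begin(), s.end(), [](char c) {
    return static_cast<unsigned char>(c) < 128;
  });
}

// Accumulate all bytes with OR, then test the high bit a single time.
bool is_ascii_or(std::string_view s) {
  unsigned char acc = 0;
  for (char c : s) {
    acc |= static_cast<unsigned char>(c);
  }
  return acc < 128;
}
```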
I was given a puzzle recently. Someone was parsing JSON files downloaded from the network from a bioinformatics URI. One JSON library was twice as fast as the other one. Unless you are on a private high-speed network, the time required to parse a file will always be small compare … | Continue reading
For high-performance software, it is sometimes necessary to use different functions depending on what the hardware supports. You might write several versions of a function, some for advanced processors, others for legacy processors. When you compile the code, the compiler does not y … | Continue reading
“Science is the belief in the ignorance of experts,” said Richard Feynman. Feynman won a Nobel Prize in physics. He was a remarkable educator: his lecture notes are still popular. He foresaw nanotechnology and quantum computing. He is credited with identifying the cause of the Spac … | Continue reading
I know that floating-point arithmetic is a bit crazy on modern computers. For example, floating-point addition is not associative: 0.1+(0.2+0.3) == 0.599999999999999978 whereas (0.1+0.2)+0.3 == 0.600000000000000089. But, at least, this is fairly consistent in my experience. You should sim … | Continue reading
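The non-associative sums can be reproduced in C++, assuming IEEE 754 binary64 arithmetic with round-to-nearest-even and no fast-math flags (which would allow the compiler to reassociate the additions). `sum_left` and `sum_right` are illustrative names.

```cpp
// Each addition is rounded separately, so the grouping changes the result.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With 0.1, 0.2 and 0.3, the right-grouped sum lands exactly on the double nearest 0.6, while the left-grouped sum lands on the next double up.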
Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist. | Continue reading
One of my favorite languages is the Go language. I love its simplicity. It is popular and useful in a cloud setting. Many popular tools are written in Go, and for good reasons. I gave a talk on Go last year and I was asked for a criticism of Go. I do not mind that … | Continue reading
In my blog post A fast alternative to the modulo reduction, I described how one might map 64-bit values to an interval of integers (say from 0 to N) with minimal bias and without using an expensive division. All one needs to do is to compute x * N ÷ 2^64 where ‘÷’ is the … | Continue reading
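The x * N ÷ 2^64 mapping fits in a couple of lines of C++. This sketch assumes ‘÷’ denotes integer division (keeping the upper 64 bits of the product) and relies on the `unsigned __int128` extension of GCC and Clang to hold the full 128-bit product; `fast_range` is a hypothetical name.

```cpp
#include <cstdint>

// Maps x to [0, N) as floor(x * N / 2^64): multiply into 128 bits and keep
// the upper 64 bits. Requires GCC or Clang for unsigned __int128.
uint64_t fast_range(uint64_t x, uint64_t N) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(x) * N) >> 64);
}
```

Note that the result is generally not the same value as x % N; it only lands in the same interval, using a multiplication and a shift instead of a division.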
I have a small collection of servers, laptops and desktops. My servers were purchased and configured at different times. By design, they have different hardware and software configurations. I have processors from AMD, Intel, Ampere and Rockchip. I have a wide range of Linux distr … | Continue reading