These Numbers Look Random but Aren’t, Mathematicians Prove

A new mathematical proof helps show whether a sequence of numbers is “pseudorandom”

Six colorful lottery balls with numbers, shown against a black background. — Talaj/Getty Images

In the real world, probability is a tough thing to characterize. If I roll a die, what does it mean to say that it has a one-sixth chance of coming up 5? We say that the outcome is random because we lack the information needed to predict which side will land facing up. But it’s not truly random. If we know all the details about how I move my hand and what forces act on the die as I toss it, we might be able to predict the outcome of the roll. Practically speaking, however, we usually lack that prediction power. Mathematicians call this situation “pseudorandom”: although it looks and seems random to us, we know that, in truth, if we were to have all the information we wanted, the die roll would not be random.

Similarly, if I ask Google for a randomly generated number, it needs a process to spit one out, yet Google doesn’t have access to purely theoretical random mathematical models. So it uses a nonrandom process that results in a number that looks randomly generated to us because we don’t know what that process was—or at least we don’t know what some of the inputs were. The number produced by Google is also pseudorandom.

Pseudorandom numbers pop up in all sorts of areas: modeling, casinos and lotteries need pseudorandom numbers as inputs. Moreover, banks and financial agencies use such numbers for security and fraud detection. Because hackers have a lot of interest in cracking these pseudorandom codes, people have developed sophisticated ways of generating these numbers. For example, many “random” number generators rely on a physical process. The cybersecurity and Internet services company Cloudflare has a system called LavaRand, for instance, that uses lava lamps to produce pseudorandom numbers.

On supporting science journalism

If you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.

Now suppose we have a collection of numbers or, more generally, points in space. How could you tell if the set of points came from a truly random process or if the set was produced by a deterministic process? This question is central to numerous fields of mathematics, and mathematicians have a variety of tests to detect randomness. We wonder, however, “Just how good are these tests? Are there sets of points that aren’t randomly generated but can satisfy the tests?” If so, we call them pseudorandom. Pseudorandom properties and searches for them have all sorts of applications, from understanding prime numbers to something called quantum chaos (my pick for the area of mathematical physics that would make the best band name).

During the pandemic, I began working on a problem related to pseudorandomness: the search for random behavior where there is no randomness. The work started with an e-mail I sent to Niclas Technau, a researcher who was at Tel Aviv University at the time. By mid-2022, nearly three years later, despite never having met face-to-face, the two of us had written three papers together and had found some of the only examples of sequences that we could prove passed extremely strong pseudorandom tests.

Detecting Randomness

How can we detect randomness? Perhaps the crudest way is something called uniformdistribution. Consider two boxes with dots that are apparently sprinkled within them—one where the dots fill the whole square and another where half of the square is empty.

Which box looks more random? Hopefully you chose the one that is completely filled with dots because it would be very unlikely for a random process to result in a box with no dots in its bottom half. A way to think about uniform distribution is to ask, “If I take enough points in the set, does every subregion of the area covered by the set get its fair share of points?” If so, the set is uniformly distributed. How many points should I sample? Ideally, the answer is infinitely many. That way, if the points are indeed randomly generated, the probability of odd behavior (such as all the points falling above the middle line) is very low—and gets closer and closer to zero as the number of points rises.

Let’s focus on sequences of numbers between 0 and 1. A uniformly distributed, random sequence is one where the points have an equal probability of landing anywhere on the interval. Take the sequence 0.142, 0.566, 0.274, 0.265,..., which is created by the equation xn = {πn²}, where the curly brackets mean we ignore the integer part of that number. So the first number is π x 1² = 3.141. When we remove the 3, we get 0.141. The second number is π x 2² = 12.566; without the 12, that’s 0.566.

To determine whether the sequence is uniformly distributed, we can ask, “For a given number of points in this sequence, how many fill in any particular subregion between 0 and 1?” The fair share of points to land in the subregion is the number of points multiplied by the area of that region because the probability of a point landing in the region is equal to the region’s area.

When we compute this property, we find that this sequence, x_n = {πn²}, is uniformly distributed. But, for example, the sequence x_n = {³⁄₇n²} is not uniformly distributed because the subinterval between 0.05 and 0.1 is an example of a region that will never receive a point. (These results date back to pioneering work in 1916 by mathematician Hermann Weyl.)

Other Tests

Although uniform distribution is a useful test, it’s a crude measure for randomness. For another way to assess how random a set of points is, try spotting the randomly generated image here:

Even though the dots in all three boxes look uniformly distributed, they don’t all look random. The box at the left has very regular spacing between all its points, and in the middle image, the dots seem to be clustering in groups. Only the points in the box at right appear random. This kind of example led mathematicians to develop more sophisticated tests called gapdistribution and pair correlation. The former measures the size of the gaps between the points, and the latter measures the clustering of the points—how much they group up or stay apart. In these tests, we count the number of gaps between points or the number of pairs that are close to one another, respectively, as we consider more and more points. If these agree with what we would expect from random data, we say that the gap distribution or pair correlation is “Poissonian” (named for French probabilist Siméon-Denis Poisson).

These tests are useful, but it remains difficult to actually prove that a sequence satisfies them. Essentially, it’s easier to prove that a set of points has a pattern than to show that it lacks any pattern. But as mathematicians, we want a logical proof and won’t be satisfied by simply testing a large number of points.

A Set of Examples

What I used above is part of a popular set of examples that can all be written as x_n = {αn^θ}, where α and θ each represent a number (e.g., α = π and θ = 2). These are historically interesting sequences because they were one of the first examples Weyl used to understand uniform distribution. More recently, they have regained popularity because of some deep connections with ideas related to quantum chaos. Oddly enough, we know that almost all choices of α and θ will give a sequence with Poissonian pair correlation Until recently, however, we had no example of an exact choice of α and θ that would give Poissonian pair correlation. This is a common issue in mathematics, where we can show that most choices lead to a particular outcome but can’t find one specific example. (We say it’s like trying to find hay in a haystack.) And in the hunt for gap distribution, we are really stuck.

This is the problem that Technau and I decided to tackle. Following our early online discussions, we began by looking at θ = 1/2, which, for complicated reasons, is an important example. We were trying to bypass a lot of the heavy mathematical machinery typically used on this problem (mainly because I didn’t understand it) and return to the 1916 techniques that Weyl used for uniform distribution. The game came down to showing that a particular sum is small. Initially, we could show that the sum cannot be infinitely large, but we couldn’t limit it enough.

After lots of head scratching, the breakthrough came when we realized that these techniques work better if θ is small, so we changed problems. Together with mathematician Athanasios Sourmelidis, who had been working on the same problem independently, we were able to show that if θ is less than or equal to 1/3, the sequence has Poissonian pair correlation (for any α). This gave us a whole family of sequences that we could prove have Poissonian pair correlation. Then Techanau and I were able to extend these techniques, making the exponent even smaller and showing that the resulting sequences satisfy an even stronger pseudorandom test called triple correlations. By taking θ smaller and smaller, we were able to prove stronger and stronger pseudorandom properties.

The reason these techniques work is that, as we make the exponent smaller, the sequence n^θ grows more and more slowly, which is useful in limiting the resulting sum (which we couldn’t find an upper bound for in the case of θ = 1/2. There are sequences that grow slower than n^θ no matter what the exponent is, such as the sequence (log n)^A, where log is the natural logarithm (although any logarithm will work) and A is any exponent bigger than 0. Logs and powers of logs grow really, really slowly. With this insight, we could show that if θ > 1, the sequence x_n = {α(log n)^θ} has all correlations of Poissonian and Poissonian gap distribution.

This combination is, in a sense, the strongest pseudorandom test possible. And this is the only family of examples that we know of that displays this behavior. In other words, our mathematical machinery is unable to detect if this sequence is deterministic or random junk (even after infinitely many samples). That means that there is currently no way to tell whether this sequence is random or pseudorandom. The strange thing is that if we take θ = 1, the sequence isn’t even uniformly distributed. So for the sequence x_n = {αlog n}, the points seem to cluster to the left of the interval between 0 and 1. This pattern is even visible to the naked eye and is related to something called Benford’s law, meaning that it’s immediately obvious that this sequence isn’t random.

This world of pseudorandom numbers is exotic and full of open problems. Mathematicians think that all sorts of interesting sequences are pseudorandom but have no way to prove that. Although there are deep connections that exist to other areas of mathematics, the lack of proofs for so many of our conjectures about pseudorandom sequences has held us back. Technau, Sourmelidis and I have finally provided one set of examples that we can show exhibit strong pseudorandom properties, yet these techniques seem to only work on slowly growing sequences. It’s always hard to know where progress will come in mathematics. Perhaps combining some more complicated machinery out there with the methods we developed might give further insights. Or perhaps someone will develop a whole new toolbox of techniques to decide what is and isn’t random.