Boltzmann’s Past Hypothesis: Why Was Yesterday Special?
Author: Arpan Dey
One of the biggest puzzles in physics is that although the laws of physics make no distinction between the past and the future at the fundamental level (meaning it is theoretically possible for time to ‘run backward’), we never see a broken coffee mug pick itself up from the floor and reassemble itself. Why does time not run backward? We do not have a definitive answer yet, but many believe that the overall increase in the entropy of the universe is responsible for the unidirectional flow of time (often referred to as the arrow of time).
Statistical mechanics tries to understand large (macroscopic) systems in terms of the dynamics of their microscopic constituents, with the help of probabilistic assumptions [1,2]. A fundamental assumption in statistical mechanics is that, for a system in equilibrium, every possible microstate is equally likely. We cannot assume a particular microstate to be more probable than any other without precisely explaining this preference; at a fundamental level, we do not expect any particular possibility to be preferred over any other. Consider the arrow of time. It could, in principle, point either into the future or into the past, but observations confirm that in our universe it points into the future. To get around this ‘preference,’ the idea of a parallel universe where the cosmological arrow of time points in the opposite direction has been considered [3]. Essentially, we feel every possibility must occur in reality somewhere, at some point in time. This is beautifully captured in the popular saying: “Everything not forbidden is compulsory” [4].
In statistical mechanics, we have two extremely fundamental ideas: ergodicity and the postulate of equal a priori probabilities [5]. According to the ergodic hypothesis, the time average of a macroscopic parameter of a system in equilibrium must equal the ensemble average of the same parameter for the same system in equilibrium. Suppose we want to measure a macroscopic parameter A of a box of gas in equilibrium with its surroundings. Let us construct a hypothetical scenario in which we measure A for this system a very large number of times, at very short intervals of time, and then calculate the average of all these values. This is the time average of the parameter A. Next, let us consider another hypothetical picture in which we create a very large number of identical copies (an ensemble) of this system in space. By “identical copies,” we mean that all these copies must be in the same macrostate. There can be many possible microstates corresponding to the same macrostate; thus all these identical copies of our system are allowed to be in different microstates, as long as those microstates correspond to the same macrostate.

Now, imagine we measure the parameter A of all these copies of the system at once and calculate the average of all the obtained values; this gives the ensemble average of A for this system. According to the ergodic hypothesis, these two averages (the time average and the ensemble average) should be equal. One might wonder why we even need to consider the ensemble average: since all the “identical copies” are in the same macrostate and A is a macroscopic parameter, should they not all have the same value of A? It is important to realize that certain macroscopic parameters of a system are functions of the underlying microstate, and can vary across microstates corresponding to a macrostate that is defined using some other macroscopic parameter. Ensemble averaging becomes important when we are measuring such a parameter. For example, say we have a box of gas whose energy is fixed. There are many possible microstates corresponding to this same energy macrostate. We, however, are interested in measuring the pressure. The pressure of this system will differ across these microstates, even though all of them correspond to the same energy. In some microstates the particles collide with the walls of the box more frequently; in others they collide mostly among themselves in the middle of the box, and so on. The pressure would be different for each microstate, and the average pressure can be determined by averaging the pressure values obtained for each microstate (the ensemble average).
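Stated compactly (a schematic form, with notation chosen here rather than taken from the references): if A(t) denotes the value of the parameter along a single long trajectory of the system, A_i its value in microstate i, and p_i the probability of microstate i in the ensemble, the ergodic hypothesis asserts

\[
\bar{A}_{\mathrm{time}} \;=\; \lim_{T \to \infty} \frac{1}{T} \int_0^T A(t)\,\mathrm{d}t
\;=\; \sum_i p_i\, A_i \;=\; \langle A \rangle_{\mathrm{ensemble}},
\]

where the sum runs over all microstates compatible with the given macrostate.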
Let us now think about the implications of this apparently obvious-sounding claim that the time and ensemble averages must be equal. When we consider the identical copies (ensemble) of the system, we take into account all the possible, unique microstates of the system. And this number is very large, infinite for all practical purposes. Imagine all the possible combinations of positions and momenta of the microscopic gas molecules in a box of gas made of a large number of molecules; this is a huge number. In the ensemble, then, we have one copy of each microstate corresponding to the given macrostate, and the ensemble average is the average of the values obtained for each microstate, each counted only once. When we make a large (effectively infinite) number of measurements at very small intervals of time to calculate the time average, we get the same result. This means that as the system evolves over a sufficiently long period of time, it accesses all the possible, unique microstates that correspond to the given macrostate. And importantly, loosely speaking, the system does not keep revisiting microstates it has already visited while neglecting others; over a long enough time it spends, on average, an equal fraction of time in each accessible microstate. (Simply put, if we have ten single rooms in a guest house and nine guests, we would allocate nine different rooms to the guests, and certainly not force two of them into the same room.) Since the number of microstates is effectively infinite, this implies that a system in equilibrium is equally likely to be found in any one of the possible microstates that correspond to the given macrostate. This, in fact, is the postulate of equal a priori probabilities.
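As a rough illustration (a toy model, not a derivation), the following Python sketch simulates N non-interacting particles hopping on a one-dimensional box of L cells. All names and parameter values here (wall_pressure, N, L, the hop rule) are invented for this example. Because the hop rule's stationary distribution is uniform over microstates, the time average of the crude "wall pressure" along one long trajectory should come out close to the ensemble average over uniformly sampled microstates, in line with the ergodic hypothesis and the equal a priori postulate.

```python
import random

# Toy "gas": N non-interacting, distinguishable particles on a 1-D box of L cells.
# Dynamics: a randomly chosen particle attempts a nearest-neighbour hop; hops that
# would leave the box are rejected (the particle stays put). This Metropolis-style
# rule has the uniform distribution over microstates as its stationary distribution.

N, L = 50, 10
STEPS = 500_000        # length of the single long trajectory (time average)
COPIES = 100_000       # number of independently sampled microstates (ensemble average)

def wall_pressure(positions):
    """Crude 'pressure' proxy: number of particles sitting in a wall cell."""
    return sum(1 for x in positions if x == 0 or x == L - 1)

random.seed(0)

# --- time average along one long trajectory ---
pos = [random.randrange(L) for _ in range(N)]
total = 0.0
for _ in range(STEPS):
    i = random.randrange(N)
    new = pos[i] + random.choice((-1, 1))
    if 0 <= new < L:                 # reject hops through the walls
        pos[i] = new
    total += wall_pressure(pos)
time_avg = total / STEPS

# --- ensemble average over uniformly sampled microstates ---
ens_total = 0.0
for _ in range(COPIES):
    sample = [random.randrange(L) for _ in range(N)]
    ens_total += wall_pressure(sample)
ens_avg = ens_total / COPIES

print(f"time average     = {time_avg:.2f}")
print(f"ensemble average = {ens_avg:.2f}")   # both should be close to N * 2 / L = 10.0
```

Both printed numbers should hover around 10; lengthening the trajectory or enlarging the ensemble tightens the agreement.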
The above discussion, however, leaves us with a puzzling insight. According to the postulate of equal a priori probabilities, all the microstates corresponding to a given macrostate of a system in equilibrium are equally probable. Let us consider a box of ideal gas. It is perfectly possible that two different configurations of the gas particles – say one in which they are huddled close together in a corner of the box, and another in which they move through the entire volume of the box – correspond to the same temperature of the system. These two configurations, shown in the figure below, are two possible microstates corresponding to the same macrostate (temperature).

The pressure of the gas and the occupied volume differ between the two cases, but that does not necessarily affect the temperature of the system, as long as the average kinetic energy of the gas particles remains the same. Now, according to the postulate of equal a priori probabilities, both of the above configurations (which correspond to two possible microstates for the given temperature) are equally likely. However, from our knowledge of entropy, it appears that the configuration in which the particles are huddled in a corner (say, configuration A) is a low entropy configuration and should be less likely than the other configuration (say, configuration B) in a universe where the entropy of a closed system cannot decrease (assume this box of gas is a closed system that does not interact with its surroundings). This means that if we find the system in configuration B (assuming the system has not yet been in configuration A), we should never expect to find the system in configuration A. In a universe where the entropy of a closed system cannot decrease, how can we expect gas particles in a spread-out configuration to spontaneously collapse together into a small region of space, which is clearly a low entropy configuration? But the postulate of equal a priori probabilities demands that even this ‘unlikely’ configuration be attained by the system, since it is just as likely as the other configurations, as long as they all correspond to the given macrostate.
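The reason the temperature is insensitive to how the particles are spread out in space is the standard kinetic-theory result that, for an ideal monatomic gas, temperature is fixed entirely by the average kinetic energy per particle,

\[
\tfrac{1}{2}\, m \langle v^2 \rangle \;=\; \tfrac{3}{2}\, k_B T,
\]

an expression that makes no reference to the volume the particles happen to occupy.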
Clearly, in configuration A the volume available to the gas particles is limited, reducing the number of spatial configurations. Now, since entropy depends not just on the number of possible spatial configurations but also on the energy and velocity configurations, one might argue that when the gas molecules are close together, interactions increase the number of velocity and energy configurations, making up for the decrease in the number of possible spatial configurations and ensuring that entropy does not decrease. However, this argument fails for an ideal gas. Ideal gas particles have no potential energy, and their kinetic energies (and hence the distribution of their velocities) depend solely on the temperature. At constant temperature (which is the given macrostate here), the energies and velocities of the particles remain unaffected regardless of whether the particles are spread out or huddled in a corner. This means that configuration A certainly has a lower entropy than configuration B. There is only one conclusion: if we wait long enough, it is indeed possible for a system to transition, all by itself, from a high entropy configuration to a low entropy configuration. This can be thought of as the result of an extremely improbable (but not impossible) statistical fluctuation. Of course, this can be ignored for all practical purposes, and such a conclusion cannot be drawn from the second law of thermodynamics, which simply states that the entropy of a closed system is not allowed to decrease. The second law is correct, but it is a statistical law: it works extremely well in practice. Theoretically, however, we cannot ignore the strange conclusion that even in a closed system, a spontaneous transition to a lower entropy state is possible.
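To make the comparison quantitative, recall the standard ideal-gas result that, at fixed temperature, the entropy depends on the accessible volume as S = N k_B ln V plus terms independent of V. If configuration A confines the N particles to a corner of volume V_A while configuration B lets them roam the full volume V_B, then

\[
\Delta S \;=\; S_B - S_A \;=\; N k_B \ln\!\frac{V_B}{V_A} \;>\; 0,
\]

which confirms that the huddled configuration A has the lower entropy.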
In light of the above discussion, it is clear that we do not fundamentally understand why entropy increases in our universe. To answer this question, Boltzmann assumed that our universe began in a very unlikely, low entropy state [6]. This assumption is referred to as the past hypothesis. The problem with the idea is that it is a bare postulate: it cannot itself be explained, and questioning it generates no further insight into the subject. It almost feels like sweeping the real question under the rug by forcefully postulating that the universe began in a low entropy state.
It is known that in the early universe, the density of matter was much higher than it is today, and the gravitational pull between particles was extremely strong. It is also known, based on evidence like the Cosmic Microwave Background radiation, that the early universe was very smooth and homogeneous. We are talking about a time when matter had not yet clustered into structures like stars and galaxies, and the matter density and overall temperature of the universe were nearly uniform. This smoothness of the early universe is perplexing [7]. The universe must have been almost perfectly smooth, because under the strong influence of gravity, even slight imperfections or non-uniformities in the density of matter would have grown, causing matter to clump and collapse. This points to very special initial conditions of the universe: only a small number of precise configurations would be able to maintain this overall smoothness. Clearly, this means the early universe was in a state of extremely low entropy, since there were very few ways one could prepare such a state (keep in mind that the early universe was very small in size). Since the universe started off in a state of extremely low entropy, it is more likely for the overall entropy of the universe to increase rather than decrease further. This, however, is a probabilistic statement. Recall from our discussion of the postulate of equal a priori probabilities that all possible microscopic configurations for a given macrostate of a system in equilibrium (assume the universe is an isolated, closed system) are equally likely. Thus, in theory, if we wait long enough, we should witness even configurations that seem unlikely, like the smooth configuration of the early universe. We have already discussed the essence of this argument: it is possible for gas particles in an ideal gas, initially in a spread-out configuration, to spontaneously come close together and huddle in a corner, simply because both these configurations can correspond to the same temperature (the same macrostate) of the system and are equally likely according to the postulate of equal a priori probabilities.
Let us consider such a spontaneous transition from a high entropy state to a low entropy state in a system, due to a random statistical fluctuation. Such transitions are never observed in the real world because they are very, very unlikely, and virtually impossible on any timescale we can observe. But theoretically, the probability of such a transition is not zero.
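A back-of-the-envelope estimate (with illustrative numbers chosen here, not drawn from the article) shows just how small this probability is. For an ideal gas, the chance that all N independently moving particles are found, at a given instant, in one particular half of the box is

\[
P \;=\; \left(\tfrac{1}{2}\right)^{N}, \qquad
P\big|_{N=100} \;=\; 2^{-100} \;\approx\; 8 \times 10^{-31}.
\]

Even for a tiny ‘gas’ of only 100 molecules, checking the box once every nanosecond, the expected wait for a single such fluctuation is of order 10^13 years, thousands of times the current age of the universe. For a macroscopic N of order 10^23, the wait is unimaginably longer, yet still finite.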
Picture a box of ideal gas, initially in configuration A, where the gas particles are relatively spread out as they move about randomly. Then, all of a sudden, the particles collapse into a corner of the box, all by themselves (configuration B). Then the particles spread out again (configuration C). Clearly, configuration B has the lowest entropy; both the earlier configuration A and the later configuration C have higher entropies than configuration B. This illustrates how, seen from B, entropy increases both toward the past and toward the future.
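The following Python sketch (again a toy model with made-up parameters, in the spirit of the Ehrenfest urn model rather than a real gas) tracks a crude entropy along one such trajectory: S(n) = ln C(N, n), the logarithm of the number of microstates with n of the N particles in the left half of the box, which is just Boltzmann's S = k ln W with k set to 1. Started at its most probable value, the entropy fluctuates a little below the maximum; a dip all the way to the "everything in one half" state, whose probability per configuration is of order 2^(-N), never shows up in a run of this length.

```python
import math
import random

# Toy model of entropy fluctuations: N non-interacting particles hop at random
# between the left and right halves of a box (Ehrenfest-urn-style dynamics).
# The tracked "entropy" of the macrostate 'n particles in the left half' is
# S(n) = ln C(N, n), i.e. Boltzmann's S = k ln W with k set to 1.

N = 40
STEPS = 1_000_000
random.seed(1)

# precompute S(n) for n = 0 .. N
S = [math.log(math.comb(N, n)) for n in range(N + 1)]

n_left = N // 2                      # start at the most probable (maximum-entropy) value
deepest_dip = S[n_left]

for _ in range(STEPS):
    # a randomly chosen particle hops to the other half; it currently sits
    # in the left half with probability n_left / N
    if random.random() < n_left / N:
        n_left -= 1
    else:
        n_left += 1
    deepest_dip = min(deepest_dip, S[n_left])

print(f"maximum entropy S(N/2)                    = {S[N // 2]:.2f}")
print(f"deepest dip observed in this run          = {deepest_dip:.2f}")
print(f"entropy of the all-in-one-half state S(0) = {S[0]:.2f}")
```

Raising N makes the observed dips shallower still, which is the statistical content of the second law: deep excursions toward low entropy are allowed, but their likelihood falls off exponentially with system size.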

In the same way, we could start with a high entropy configuration and arrive at a low entropy configuration like the Big Bang state via a random statistical fluctuation. Entropy then begins to increase again as the Big Bang state expands. Over a long period of time, the universe would be expected to reach a state of maximum entropy, in which everything is in perfect equilibrium with everything else. Of course, life could no longer exist in such a state, because intelligent life must be able to perceive changes around it, sense the passage of time, process information, form memories, metabolize, and so on. No change could occur in the maximum entropy state, since it is already in perfect equilibrium, and any flow of energy or information between its parts would disrupt that equilibrium. Entropy could no longer increase, since it has already attained its maximum value.
The above plot of entropy as a function of time (Figure 3) shows entropy increasing both into the past and the future. Boltzmann hypothesized that over a very long period of time, statistical fluctuations could give rise to both small as well as big dips in entropy [8]. Clearly, our place in this picture is a point on the entropy-vs-time curve with a positive slope (Figure 4). This indicates that the entropy is increasing right now, since we started off in a very unusual state, which was probably the result of a statistical fluctuation from a more probable earlier state.

This line of argument leads us to even more bizarre possibilities. In an infinitely big universe, over a sufficiently long period of time, there could be bigger fluctuations into lower entropy states. Cosmology tells us that the universe came into existence about 13.8 billion years ago and eventually evolved to its present form, in which we find ourselves. Statistically, however, it is more probable for a fluctuation to simply create a single conscious brain, complete with false memories of a past that never really occurred, than to create a state like the Big Bang and eventually a real universe with all this complexity! This absurd-sounding idea is often referred to as the Boltzmann brain hypothesis. From a purely statistical point of view, the argument makes sense. It is more probable, for instance, for some particles to spontaneously come together and arrange themselves in precisely the right way to create a smartphone than for particles to come together in precisely the way required for a universe to form, evolve in exactly the way necessary for intelligent life to emerge, and eventually produce creatures intelligent enough to manufacture a smartphone. Of course, we would never expect to see a smartphone assemble all by itself; the event is so unlikely that it is impossible from a practical perspective. Statistically, however, it is still more probable than the alternative. In the same way, is it not more likely for a single brain to spontaneously self-assemble with false memories of the past than for the entire universe, with all its structures, to spontaneously emerge? If we think about the Big Bang state, it is improbable, but not impossible, that all the particles in a high entropy universe suddenly collapse into the low entropy Big Bang configuration. The point is that although theory leaves some possibility, however small, for all the particles to end up at a single, extremely dense point, smaller, localized dips in entropy are much more likely. So why should a fluctuation collapse an entire universe when it could merely assemble a random collection of particles into a single brain?
This absurd hypothesis is, undoubtedly, intriguing. However, many flaws in the argument have been identified [8]. Feynman pointed out that if the world around us were the result of a random fluctuation, we would not find that predictions based on observations of our immediate surroundings hold consistently in distant parts of the world, or of the universe. For example, Newton’s law of gravity works just as well for a stone thrown upward from Earth as it does for the Earth and the Sun, and it does so always; it is not that Newton’s law worked yesterday but fails today. Similarly, a scientific experiment gives the same results whether we perform it today or tomorrow, provided we keep the external conditions exactly the same on both days. Feynman argued that we would observe major inconsistencies around us if the world did not exist physically and were just the result of a random fluctuation. Sean Carroll puts forward another powerful counterargument. If we accept that the Boltzmann brain hypothesis is true, we essentially accept that the world around us is not real but a construction of randomly fluctuated memories and perceptions. However, we were led to this hypothesis by observations of that very world. If the world is not real, those observations, and hence the hypothesis itself, cannot be trusted. Essentially, Carroll argues that if we accept that we are Boltzmann brains, we cannot trust any of our thoughts, since Boltzmann brains result from random fluctuations and are “cognitively unstable” [9].
So, if the world is not a fluctuation, why was the early universe in such an extremely unlikely, low entropy state? We do not have a definitive answer today, and it is possible we never will. This could simply be a brute fact, an unknowable feature of the universe that cannot be understood in terms of anything more fundamental. We just have to accept it as a postulate. Boltzmann himself realized that the unlikely initial state of the universe might not be explainable and might simply have to be accepted as a fact. He said [10]: “That in nature the transition from a probable to an improbable state does not take place as often as the converse, can be explained by assuming a very improbable initial state of the entire universe surrounding us. This is a reasonable assumption to make, since it enables us to explain the facts of experience, and one should not expect to be able to deduce it from anything more fundamental.”
There is an alternative way to approach this problem. We may discard the assumptions that the universe is a closed, isolated system and that the Big Bang was the beginning of it all [7]. Our universe could be part of a bigger universe in which smaller universes endlessly form and collapse. These ideas are not simply figments of the imagination; theoretical considerations suggest that such a picture is at least possible. The idea of parallel universes crops up in inflationary cosmology, in the many-worlds interpretation of quantum mechanics, and so on. Although the parallel realities of the many-worlds interpretation are not exactly the same as the parallel universes of inflationary cosmology, the idea of parallel universes seems to have stuck with physicists.
It is believed that fundamentally, every possibility must take place. It might appear crude to simply hypothesize that the possibilities that do not occur in our universe might occur in parallel universes, but the idea of parallel universes has its strengths and appeal. Max Tegmark makes a beautiful point that shows why the idea of parallel universes might not be as preposterous as it sounds [11]: “If you’re... struggling to make inner peace with parallel universes, here’s another way of thinking about them that might help... When we discover an object in Nature, the scientific thing to do is look for a mechanism that created it. Cars are created by car factories, rabbits are created by rabbit parents and solar systems are created from gravitational collapse in giant molecular clouds. So it’s quite reasonable to assume that our universe was created by some sort of universe-creation mechanism... Now here’s the thing: all the other mechanisms we mentioned naturally produce many copies of whatever they create; a cosmos containing only one car, one rabbit, and one solar system would seem quite contrived. In the same vein, it’s arguably more natural for the correct universe-creation mechanism, whatever it is, to create many universes rather than just the one we inhabit.”
Of course, these ideas are of little practical importance; they remain largely speculative and out of reach of experimental falsification. Nevertheless, these arguments, counterarguments, and ideas are so deep and fascinating that engaging with them not only offers great intellectual stimulation, but can also potentially offer insights into the fundamental nature of the universe.
References and Further Reading
[1] Frigg, R., Werndl, C. (2024). Philosophy of Statistical Mechanics. The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/win2024/entries/statphys-statmech/
[2] Eastman, P. (2014-2015). Introduction to Statistical Mechanics. Stanford University. https://web.stanford.edu/~peastman/statmech/index.html
[3] Barbour, J., Koslowski, T., & Mercati, F. (2014). Identification of a gravitational arrow of time. Physical Review Letters, 113(18). https://doi.org/10.1103/physrevlett.113.181101
[4] Kragh, H. (2019). Physics and the Totalitarian Principle. https://arxiv.org/pdf/1907.04623
[5] Kupervasser, O. (2013). Basic Paradoxes of Statistical Classical Physics and Quantum Mechanics. Universal Journal of Physics and Application, 7(3), 299–349. https://doi.org/10.13189/ujpa.2013.010311
[6] Chen, E. K. (2023). The past hypothesis and the nature of physical laws. In Harvard University Press eBooks (pp. 204–248). https://doi.org/10.2307/j.ctv32nxzc6.10
[7] Carroll, S. M. (2011). Cosmology and the arrow of time. TEDxCaltech. https://youtu.be/WMaTyg8wR4Y
[8] Lazarovici, D., & Reichert, P. (2020). Arrow(s) of Time without a Past Hypothesis. In World Scientific eBooks (pp. 343–386). https://doi.org/10.1142/9789811211720_0010
[9] Carroll, S. M. (2020). Why Boltzmann Brains are bad. In Routledge eBooks (pp. 7–20). https://doi.org/10.4324/9781315713151-3
[10] Callender, C. (2004). There is no puzzle about the low-entropy past. In Contemporary Debates in Philosophy of Science (Ch. 12). Blackwell. http://www.fitelson.org/confirmation/contemporary_debates_in_philosophy_of_science.pdf
[11] Tegmark, M. (2015). Our Mathematical Universe: My Quest for the Ultimate Nature of Reality. Vintage.