# Chapter 5: The Self-Sampling Assumption in Science

We turn to the second strand of arguments for SSA. Here we show that many important scientific fields implicitly rely on SSA and that it (or something much like it) constitutes an indispensable part of scientific methodology.

Recall our earlier hunch that the trouble in deriving observational consequences from theories that were coupled to some Big World hypothesis might originate in the somewhat “technical” point that while in a large enough cosmos, every observation will be made by *some* observers here and there, it is notwithstanding true that those observers are exceedingly few and far between. For every observation made by a freak observer spontaneously materializing from Hawking radiation or thermal fluctuations, there are trillions and trillions of observations made by regular observers who have evolved on planets like our own, and who make veridical observations of the universe they are living in. Maybe we can solve the problem, then, by saying that although all these freak observers exist and are suffering from various illusions, it is highly unlikely that *we* are among their number? In this case we should think, rather, that we are very probably one of the regular observers whose observations reflect reality. We could safely ignore the freak observers and their illusions in most contexts when doing science. Because the freak observers are in such a tiny minority, their observations can usually be disregarded. It is *possible* that we are freak observers. We should assign to that hypothesis some finite probability—but such a tiny one that it doesn’t make any practical difference.

To see how SSA enables us to cash in on this idea, it is first of all crucial that we construe our evidence differently than we did when originally stating the conundrum. If our evidence is simply “Such and such an observation is made”, then the evidence has probability one given any Big World theory—and we ram our heads straight into the problem that all Big World theories become empirically impotent. But if we construe our evidence in the more specific form “*We* are making such and such observations”, then we have a way out. For we can then say that although Big World theories make it probable (*P* ≈ 1) that some such observations be made, they need not make it probable that we should be the ones making them.

Let us therefore define:

*E’* := “Such and such observations are made by us.”

*E’* contains an indexical component that the original evidence-statement we considered, *E*, did not. *E’* is logically stronger than *E*. The rationality requirement that one should take all relevant evidence into account dictates that in case *E’* leads to different conclusions than does *E*, it is *E’* that determines what we ought to believe.

A question that now arises is how to determine the evidential bearing that statements of the form of *E’* have on cosmological theories. Using Bayes’ theorem, we can turn the question around and ask, how do we evaluate P(*E’*|*T*&*B*), the conditional probability that a Big World theory gives to us making certain observations? The argument in chapter 3 showed that if we hope to be able to derive any empirical implications from Big World theories, then P(*E’*|*T*&*B*) should not generally be set to unity or close to unity. P(*E’*|*T*&*B*) must take on values that depend on the particular theory and the particular evidence that we are considering. Some theories *T* are supported by some evidence *E’*; for these choices P(*E’*|*T*&*B*) is relatively large. For other choices of *E’* and *T*, the conditional probability will be relatively small.

To be concrete, consider the two rival theories *T*_{1} and *T*_{2} about the temperature of the cosmic microwave background radiation. (*T*_{1} is the theory that says that the temperature of the cosmic microwave background radiation is about 2.7 K (the observed value); *T*_{2} says it is 3.1 K.) Let *E’* be the proposition that we have made those observations that cosmologists innocently take to support *T*_{1}. *E’* includes readouts from radio telescopes, etc. Intuitively, we want P(*E’*|*T*_{1}&*B*) > P(*E’*|*T*_{2}&*B*). That inequality must be the reason why cosmologists believe that the background radiation is in accordance with *T*_{1} rather than *T*_{2}, since a priori there is no ground for assigning *T*_{1} a substantially greater probability than *T*_{2}.
A natural way in which we can achieve this result is by postulating that we should think of ourselves as being in some sense “random” observers. Here we use the idea that the essential difference between *T*_{1} and *T*_{2} is that the *fraction* of observers who would be making observations in agreement with *E’* is enormously greater on *T*_{1} than on *T*_{2}. If we reason as if we were randomly selected samples from the set of all observers, or from some suitable subset thereof, then we can explicate the conditional probability P(*E’*|*T*&*B*) in terms of the expected fraction of all observers in the reference class that the conjunction of *T* and *B* says would be making the kind of observations that *E’* says that we are making. This will enable us to conclude that P(*E’*|*T*_{1}&*B*) > P(*E’*|*T*_{2}&*B*).

In order to spotlight basic principles, we can make some simplifying assumptions. In the present application, we can think of the reference class as consisting of all observers who will ever have existed. We can also assume a uniform sampling density over this reference class. Moreover, it simplifies things if we set aside complications arising from assigning probabilities over infinite domains by assuming that *B* entails that the number of observers is finite, albeit such a large finite number that the problems described earlier obtain.

Here is how SSA supplies the missing link needed to connect theories like *T*_{1} and *T*_{2} to observation. On *T*_{2}, the only observers who observe an apparent temperature of the cosmic microwave background, CMB ≈ 2.7 K, are those who have various sorts of rare illusions (for example because their brains have been generated by black holes and are therefore not attuned to the world they are living in) or happen to be located in extremely atypical places (where e.g. a thermal fluctuation has led to a locally reduced CMB temperature). On *T*_{1}, by contrast, almost every observer who makes the appropriate astronomical measurements and is not deluded will observe CMB ≈ 2.7 K. A much greater fraction of the observers in the reference class observe CMB ≈ 2.7 K if *T*_{1} is true than if *T*_{2} is true. By SSA, we consider ourselves as random observers; it follows that on *T*_{1} we would be more likely to find ourselves as one of those observers who observe CMB ≈ 2.7 K than we would on *T*_{2}. Therefore, P(*E’*|*T*_{1}&*B*) >> P(*E’*|*T*_{2}&*B*). Supposing that the prior probabilities of *T*_{1} and *T*_{2} are roughly the same, P(*T*_{1}) ≈ P(*T*_{2}), it is then trivial to derive via Bayes’ theorem that P(*T*_{1}|*E’*&*B*) > P(*T*_{2}|*E’*&*B*). This vindicates the intuitive view that we do have empirical evidence that favors *T*_{1} over *T*_{2}.
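The arithmetic can be made concrete with a minimal sketch of the Bayesian calculation. The observer fractions and the equal priors below are illustrative assumptions chosen by me, not values given in the text:

```python
# Toy Bayesian calculation illustrating SSA's role: P(E'|T&B) is set equal
# to the expected fraction of observers in the reference class who make
# observations of the kind E' describes. All numbers are illustrative.

priors = {"T1": 0.5, "T2": 0.5}  # no a priori ground to favor either theory

# Assumed fraction of observers who observe CMB ~ 2.7 K under each theory:
# nearly all under T1; only freak or atypically located observers under T2.
fraction_observing = {"T1": 0.99, "T2": 1e-12}

# Posterior via Bayes' theorem: P(T|E'&B) is proportional to P(E'|T&B)*P(T).
unnorm = {t: fraction_observing[t] * priors[t] for t in priors}
total = sum(unnorm.values())
posterior = {t: unnorm[t] / total for t in priors}

print(posterior)  # T1 ends up with almost all of the posterior probability
```

With equal priors, the posterior ratio simply equals the ratio of observer fractions, which is why the evidence so decisively favors *T*_{1}.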
The job that SSA is doing in this derivation is to enable the step from propositions about fractions of observers to propositions about corresponding probabilities. We get the propositions about fractions of observers by analyzing *T*_{1} and *T*_{2} and combining them with relevant background information *B*; from this, we conclude that there would be an extremely small fraction of observers observing CMB ≈ 2.7 K given *T*_{2} and a much larger fraction given *T*_{1}. We then consider the evidence *E’*, which is that *we* are observing CMB ≈ 2.7 K. SSA authorizes us to think of the “we” as a kind of random variable ranging over the class of actual observers. From this it then follows that *E’* is more probable given *T*_{1} than given *T*_{2}. But without assuming SSA, all we can say is that a greater fraction of observers observe CMB ≈ 2.7 K if *T*_{1} is true; at that point the argument would grind to a halt. We could not reach the conclusion that *T*_{1} is supported over *T*_{2}. Therefore, SSA, or something like it, must be adopted as a methodological principle.

Here we’ll examine Ludwig Boltzmann’s famous attempt to explain why entropy is increasing in the forward time-direction. We will show that a popular and intuitively very plausible objection against Boltzmann relies on an implicit appeal to SSA.

The outlines of Boltzmann’s^{1 }explanation can be sketched roughly as follows. The direction of time’s arrow appears to be connected to the fact that entropy increases in the forward time-direction. Now, if one assumes, as is commonly done, that low entropy corresponds in some sense to low probability, then one can see that if a system starts out in a low-entropy state then it will probably evolve over time into a higher entropy state, a more probable state of the system. The problem of explaining why entropy is increasing is thus reduced to the problem of explaining why entropy is currently so low. The world’s being in such a low-entropy state would appear a priori improbable. Boltzmann points out, however, that in a sufficiently large system (and the universe may well be such a system) there are, with high probability, local regions of the system—let’s call them “subsystems”—which are in low-entropy states even if the system as a whole is in a high-entropy state. Think of it like this: In a sufficiently large container of gas, there will be some places where all the gas molecules in that local region are lumped together in a small cube or some other neat pattern. That is probabilistically guaranteed by the random motion of the gas molecules together with the fact that there are so many of them. Hence, Boltzmann argued, in a large-enough universe there will be some places and some times at which, just by chance, the entropy happens to be exceptionally low. Since life can only exist in a region if it has very low entropy, we would naturally find that in our part of the universe entropy is very low. And since low-entropy subsystems are overwhelmingly likely to evolve towards higher-entropy states, we thus have an explanation of why entropy is currently low here and increasing. An observation selection effect guarantees that we observe a region where that is the case, even though such regions are enormously sparse in the bigger picture.

^{1 }Boltzmann attributes the idea to his assistant, Dr. Schuetz.

Lawrence Sklar has remarked that Boltzmann’s explanation has been “credited by many as one of the most ingenious proposals in the history of science, and disparaged by others as the last, patently desperate, ad hoc attempt to save an obviously failed theory” (Sklar 1993, p. 44). I think that the ingenuity of Boltzmann’s contribution should be fully granted, especially considering that, writing in 1895, he was nearly seventy years ahead of his time in reckoning with observation selection effects when reasoning about the large-scale structure of the world. But the idea, nonetheless, is flawed.

The standard objection is that Boltzmann’s datum—that the observable universe is a low-entropy subsystem—turns out on closer inspection to be in conflict with his explanation. Low-entropy regions as huge as the one we observe are *very* sparsely distributed if the universe as a whole is in a high-entropy state. A much smaller low-entropy region would have sufficed to permit intelligent life to exist. Boltzmann’s theory fails to account for why the observed low-entropy region is so large and so grossly out of equilibrium.

This plausible objection can be fleshed out with the help of SSA. Let us follow Boltzmann and suppose that we are living in a very vast, perhaps infinite, universe which is in thermal equilibrium, and that observers can exist only in low-entropy regions. Let *T* be the theory that asserts this. According to SSA, what *T* predicts we should observe depends on where *T* says that the bulk of observers tend to be. Since *T* is a theory of thermodynamic fluctuations, it implies that smaller fluctuations (low-entropy regions) are *vastly* more frequent than larger fluctuations, and hence that most observers will find themselves in rather small fluctuations. This is so because the infrequency of larger fluctuations increases rapidly enough to ensure that even though a given large fluctuation will typically contain more observers than a given small fluctuation, the vast majority of observers will nonetheless be in small fluctuations. By SSA, *T* assigns a probability to us observing what we actually observe that is proportional to the fraction of all observers *T* says would make that kind of observations. Since an extremely small fraction of all observers will observe a low-entropy region as large as ours if *T* is true, it follows that *T* gives an extremely small probability to the hypothesis that we should observe such a large low-entropy region. Hence *T* is heavily disfavored by our evidence and should be rejected unless its a priori probability is so extremely high as to compensate for its empirical implausibility. For instance, if we compare *T* with a rival theory *T** which asserts that the average entropy in the universe as a whole is about the same as the entropy of the region we observe, then in light of the preceding argument we have to acknowledge that *T** is much more likely to be true, unless our prior probability function were severely skewed towards *T*. (The bias would have to be truly extreme. It would not suffice, for example, if one’s prior probabilities were P(*T*) = 99.999999% and P(*T**) = 0.000001%.) This validates the standard objection against Boltzmann. His anthropic explanation is refuted—probabilistically but with extremely high probability—by a more careful application of the anthropic principle.
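Why such an extreme prior would not suffice can be checked with a back-of-the-envelope Bayesian calculation in log space. The likelihood ratio used below, 1 in 10^{100}, is a placeholder assumption of mine; the actual Boltzmannian suppression of fluctuations as large as our observed region would be incomparably more severe, which only strengthens the conclusion:

```python
import math

# Prior heavily biased toward T (the equilibrium Big World theory),
# as in the text: P(T) = 99.999999%, P(T*) = 0.000001%.
log_prior_T = math.log(0.99999999)
log_prior_Tstar = math.log(0.00000001)

# Illustrative likelihoods: suppose only 1 in 10^100 observers on T would
# observe a low-entropy region as large as ours, versus ~all observers on T*.
log_lik_T = -100 * math.log(10)
log_lik_Tstar = 0.0

# Posterior log-odds of T over T*: prior bias of ~10^8 is swamped by the
# likelihood ratio of 10^-100, leaving odds of roughly 10^-92 against T.
log_post_odds_T = (log_lik_T + log_prior_T) - (log_lik_Tstar + log_prior_Tstar)
print(log_post_odds_T / math.log(10))
```

The skewed prior buys *T* only eight orders of magnitude, while the observation costs it a hundred; hence the prior bias would have to be far more extreme still to save the theory.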

Sklar suggests that a Boltzmannian has a “reasonable reply” (ibid. p. 299) to this objection, namely that in Boltzmann’s picture there will be *some* large regions where entropy is low, so our observations are not really incompatible with his proposal. However, while there is no logical incompatibility, the *probabilistic incompatibility* is of a very high degree. This can, for all practical purposes, be just as decisive as a logical deduction of a falsified empirical consequence, making it totally unreasonable to accept this reply.

Sklar goes on to state what he sees as the real problem for Boltzmannians:

“The major contemporary objection to Boltzmann’s account is its apparent failure to do justice to the observational facts . . . as far as we can tell, the parallel direction of entropic increase of systems toward what we intuitively take to be the future time direction that we encounter in our local world seems to hold throughout the universe.” (Ibid. p. 300)

It is easy to see that this is but a veiled reformulation of the objection discussed above. If there were a “reasonable reply” to the former objection, the same reply would work equally well against this reformulated version. An unreformed Boltzmannian could simply retort: “Hey, even on my theory there are some regions and some observers in those regions to whom, as far as they can tell, entropy seems to be on the increase throughout the universe—they see only their local region of the universe, after all. Hence our observations are compatible with my theory!” If we are not impressed by this reply, it is because we are willing to take probabilistic entailments seriously. Failing to do so would spell methodological disaster for any theory that postulates a sufficiently big cosmos, since according to such theories there will always be some observer somewhere who observes what we are observing, so the theories would be logically compatible with any observation we could make.^{2 }But that is clearly not how such theories work. Rational belief is constrained not only by the chains of deduction but also by the rubber bands of probabilistic inference.

^{2 }The only observational consequence such theories would have on that view is that we don’t make observations that are logically incompatible with the laws of nature which that theory postulates. That is too weak to be of any use. Any finite sequence of sensory stimulation we could have seems to be logically compatible with the laws of nature, both in the classical mechanics framework used in Boltzmann’s time and in a contemporary quantum mechanical setting.

Anthropic reasoning has been applied to estimate probabilistic parameters in evolutionary biology. For example, we may ask how difficult it was for intelligent life to evolve on our planet.^{3 }Naively, one may think that since intelligent life evolved on the only planet we have closely examined, the evolution of intelligent life seems quite easy. Science popularizer Carl Sagan appears to have held this view: “the origin of life must be a highly probable circumstance; as soon as conditions permit, up it pops!” (Sagan 1995). A moment’s reflection reveals that this inference is incorrect, since no matter how unlikely it was for intelligent life to develop on any given planet, we should still expect to have originated from a planet where such an improbable sequence of events took place. As we saw in chapter 2, the theories that are disconfirmed by the fact that intelligent life exists here are those according to which the difficulty of evolving intelligent life is so great that they give a small likelihood to there being even a single planet with intelligent life in the whole world.

^{3 }A natural way of explicating this question is by construing it as asking about what fraction of all Earth-like planets actually develop intelligent life, provided they are left untouched by alien civilization.

Brandon Carter combined this realization with some additional assumptions and argued that the chances of intelligent life evolving on any particular Earth-like planet are in fact very small (Carter 1983, 1989). His argument is summarized in this footnote.^{4 }

^{4 }Let us make use of a little story to convey the idea.

Define three time intervals: *t̄*, “the expected average time . . . which would be intrinsically most likely for the evolution of a system of ‘intelligent observers’, in the form of a scientific civilization such as our own” (Carter 1983, p. 353); *t*_{e}, the time taken by biological evolution on this planet, ≈ 0.4 × 10^{10} years; and *t*_{0}, the lifetime of the main sequence of the sun, ≈ 10^{10} years.

The argument in outline runs as follows: Since at the present stage of understanding in biochemistry and evolutionary biology we have no way of making even an approximate calculation of how likely the evolution of intelligent life is on a planet like ours, we should use a very broad prior probability distribution for this. We can partition the range of possible values of *t̄* roughly into three regions: *t̄* << *t*_{0}, *t̄* ≈ *t*_{0}, or *t̄* >> *t*_{0}. Of these three possibilities we can, according to Carter, “rule out” the second one a priori, with fairly high probability, since it represents a very narrow segment of the total hypothesis space, and since a priori there is no reason to suppose that the expected time to evolve intelligent life should be correlated with the duration of the main sequence of stars like the sun. But we can also rule out, with great probability, the first alternative, since if the expected time to evolve intelligent life were much smaller than *t*_{0}, then we would have expected life to evolve much earlier than it in fact did. This leaves us with *t̄* >> *t*_{0}, meaning that life was very unlikely to evolve as fast as it did, within the lifetime of the main sequence of the sun.
What drives this conclusion is the near coincidence between *t*_{e} and *t*_{0}. A priori, there is no reason to suppose that these two quantities would be within an order of magnitude (or even within a factor of about two) of each other. This fact, combined with an observation selection effect, yields the prediction that the evolution of intelligent life is very unlikely to happen on a given planet within the main sequence of its star. The contribution that the observation selection effect makes is that it prevents observations of intelligent life taking *longer* than *t*_{0} to evolve. Whenever intelligent life evolves on a planet, we must find that it evolved before its sun went extinct. Were it not for the fact that the only evolutionary processes that are observed firsthand are those which gave rise to intelligent observers in a shorter time than *t*_{0}, the observation that *t*_{e} ≈ *t*_{0} would have disconfirmed the hypothesis that *t̄* >> *t*_{0} just as much as it disconfirmed *t̄* << *t*_{0}. But thanks to this selection effect, *t*_{e} ≈ *t*_{0} is precisely what one would expect to observe even if the evolutionary process leading to intelligent life were intrinsically very unlikely to take place in as short a time as *t*_{0}.

Patrick Wilson (Wilson 1994) advances some objections against Carter’s reasoning, but as these objections do not concern the basic anthropic methodology that Carter uses, they don’t need to be addressed here.

A corollary of Carter’s conclusion is that there very probably aren’t any extraterrestrial civilizations anywhere near us, maybe not even in our galaxy.

Carter has also suggested a clever way of estimating the number of improbable “critical” steps in the evolution of humans. A princess is locked in a tower. Suitors have to pick five combination locks to get to her. They can do this only through random trial and error, i.e. without memory of which combinations have been tried. A suitor gets one hour to pick all five locks. If he doesn’t succeed within the allotted time, he is beheaded. However, the princess’ charms are such that there is an endless line of hopeful suitors waiting their turn.

After the deaths of some unknown number of suitors, one of them finally passes the test and marries the princess. Suppose that the numbers of possible combinations in the locks are such that the expected time to pick each lock is .01, .1, 1, 10, and 100 hours respectively. Suppose that pick-times for the suitor who got through are (in hours) {.00583, .0934, .248, .276, .319}. By inspecting this set you could reasonably guess that .00583 hour was the pick-time for the easiest lock and .0934 hour the pick-time for the second easiest lock. However, you couldn’t really tell which locks the remaining three pick-times correspond to. This is a typical result. When conditioning on success before the cut-off (in this case 1 hour), the average completion time of a step is nearly independent of its expected completion time, provided the expected completion time is much longer than the cut-off. Thus, for example, even if the expected pick-time of one of the locks had been a million years, you would still find that its average pick-time *in successful runs *is closer to .2 or .3 than to 1 hour, and you wouldn’t be able to tell it apart from the 1, 10, and 100 hours locks.
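This claim can be verified with a short Monte Carlo simulation in the spirit of Hanson’s (the code is my own sketch; pick-times are modeled as exponentially distributed with the stated means):

```python
import random

random.seed(42)
EXPECTED = [0.01, 0.1, 1.0, 10.0, 100.0]  # expected pick-times in hours
CUTOFF = 1.0                              # suitors are beheaded after 1 hour

# Keep only the runs in which all five locks were picked within the cut-off.
successes = []
for _ in range(1_000_000):
    picks = [random.expovariate(1 / mean) for mean in EXPECTED]
    if sum(picks) <= CUTOFF:
        successes.append(picks)

# Average pick-time of each lock among the successful runs.
avg = [sum(run[i] for run in successes) / len(successes) for i in range(5)]
print(len(successes), [round(t, 3) for t in avg])
```

Only a tiny fraction of the million runs succeed; among those, the two easy locks retain averages close to their unconditional means, while the 1-, 10-, and 100-hour locks come out nearly indistinguishable from one another, just as the text describes.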

If we don’t know the expected pick-times or the number of locks that the suitor had to break, we can obtain estimates of these parameters if we know the time it took him to reach the princess. The less surplus time left over before the cut-off, the greater the number of difficult locks he had to pick. For example, if the successful suitor took 59 minutes to get to the princess, that would favor the hypothesis that he had to pick a fairly large number of locks. If he reached the princess in 35 minutes, that would strongly suggest that the number of difficult locks was small. The relation also works the other way around so that if we are not sure what the maximum allowed time is we can estimate it from information about the number of difficult locks and their combined pick-time in a random successful trial. Monte Carlo simulations confirming these claims have been performed by Robin Hanson, who has also derived some useful analytical expressions (Hanson 1998).
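The relation between surplus time and the number of difficult steps can be made quantitative: if *n* difficult steps must all be completed before cut-off *T*, then conditional on success their timing behaves approximately like *n* uniform draws on [0, *T*], so the expected leftover time is about *T*/(*n*+1). Here is a quick numerical check under an illustrative exponential model of my own choosing:

```python
import random

random.seed(1)
T = 1.0  # cut-off time (hours)

def mean_leftover(n_hard, mean=5.0, wanted=400):
    """Average surplus time before the cut-off among successful runs,
    with n_hard equally difficult locks (exponential pick-times, mean > T)."""
    total, got = 0.0, 0
    while got < wanted:
        s = sum(random.expovariate(1 / mean) for _ in range(n_hard))
        if s <= T:
            total += T - s
            got += 1
    return total / wanted

one = mean_leftover(1)    # expected to be roughly T/2
three = mean_leftover(3)  # expected to be roughly T/4
print(one, three)
```

With the settings above the two averages come out near 0.52 and 0.26, close to the *T*/(*n*+1) rule: more hard steps mean less time to spare in a successful run.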

Carter applies these mathematical ideas to evolutionary theory by noting that an upper bound on the cut-off time after which intelligent life could not have evolved on Earth is given by the duration of the main sequence of the sun—about 10 × 10^{9} years. It took about 4 × 10^{9} years for intelligent life to develop. From this (together with some other assumptions which are problematic but not in ways relevant for our purposes), Carter concludes that the number of critical steps in human evolution is likely very small—not much greater than two.

One potential problem with Carter’s argument is that the duration of the main sequence of the sun gives only an upper bound on the cut-off. Maybe climate change or some other event would have made Earth unconducive to the evolution of complex organisms long before the sun becomes a red giant. Recognizing this possibility, Barrow and Tipler apply Carter’s reasoning in the opposite direction and seek to infer the true cut-off by directly estimating the number of critical steps (Barrow and Tipler 1986).^{5 }In a recent paper, Robin Hanson scrutinizes Barrow and Tipler’s alleged critical steps and argues that their model does not fit the evidence very well when one considers the relative times the steps actually took to complete (Hanson 1998).

^{5 }For example, the step from prokaryotic to eukaryotic life is a candidate for being a critical step, since it seems to have happened only once and appears to be necessary for intelligent life to evolve. By contrast, there is evidence that the evolution of eyes from an “eye precursor” has occurred independently at least forty times, so this step does not seem to be difficult. A good introduction to some of the relevant biology is (Schopf 1992).

Our concern here is not which estimate is correct, or even whether at the current state of biological science enough empirical data and theoretical understanding are available to supply the substantive premises needed to derive any specific conclusion from considerations of this sort.^{6 }My contention, rather, is twofold. Firstly, if one wants to argue about or make a claim regarding such things as the improbability of intelligent life evolving, or the probability of finding extraterrestrial life, or the number of critical steps in human evolution, or the planetary window of opportunity during which evolution of intelligent life is possible, then one needs to be careful to make sure that one’s position is probabilistically coherent. The works by Carter and others have revealed subtle ways in which some views on these things are untenable. Secondly, underlying the basic constraints appealed to in Carter’s reasoning (and this is quite independent of the specific empirical assumptions he needs to get any concrete results) is an application of SSA. WAP and SAP are inadequate in these applications. SSA makes its entrée when we realize that in a large universe there are actual evolutionary histories of most any sort. On some planets, life evolves swiftly; on others, it uses up all the time available before the cut-off.^{7 }On some planets, difficult steps are completed more quickly than easy steps. Without some probabilistic connection between the distribution of evolutionary histories and our own observed evolutionary past, none of the above considerations would even make sense.

^{6 }There are complex empirical issues that would need to be confronted were one to seriously investigate these questions. For instance, if a step takes a very long time, that *may* suggest that the step was very difficult (perhaps requiring simultaneous multi-locus mutations or other rare occurrences). But there can be other reasons for a step taking a long time to complete. For example, oxygen breathing took a long time to evolve, but this is not a ground for thinking that it was a difficult step. For oxygen breathing became adaptive only after there were significant levels of free oxygen in the atmosphere, and it took anaerobic organisms hundreds of millions of years to produce enough oxygen to saturate various oxygen sinks and increase atmospheric oxygen to the required levels. This process was slow but virtually guaranteed eventually to run to completion, so it would be a mistake to infer that the evolution of oxygen breathing and the concomitant Cambrian explosion represent a hugely difficult step in human evolution. Likewise, that a step took only a short time (as, for instance, did the transition from our ape ancestors to Homo sapiens) *can* be evidence suggesting it was relatively easy, but it need not be if we suspect that there was only a small window of opportunity for the step to occur (so that if it occurred at all, it would have to happen within that time-interval).

^{7 }In the case of an infinite (or extremely large finite) cosmos, intelligent life would also evolve after the “cut-off”. Normally we may feel quite confident in stating that intelligent life cannot evolve on Earth after the swelling sun has engulfed it. Yet the freak-observer argument made in chapter 3 can of course be extended to show that in an infinite universe there would, with probability one, be some red giants that enclose a region where—because of some ridiculously improbable statistical fluke—an Earth-like planet continues to exist and develop intelligent life. Strictly speaking, it is not impossible but only highly improbable that life will evolve on any given planet after its orbit has been swallowed by an expanding red giant.

SSA is not the only methodological principle that would establish such a connection. For example, we could formulate a principle stating that every *civilization* should reason as if it were a random sample from the set of all civilizations.^{8 }For the purposes of the above anthropic arguments in evolutionary theory, this principle would amount to the same thing as SSA, provided that all civilizations contain the same number of observers. However, when considering hypotheses on which certain types of evolutionary histories are correlated with the evolved civilizations containing a greater or smaller number of observers, this principle is not valid. We then need to have recourse to the more generally applicable principle given by SSA.

^{8 }Such a principle would be very similar to what Alexander Vilenkin has (independently) called the “principle of mediocrity” (Vilenkin 1995).

When driving on the motorway, have you ever wondered about (or cursed!) the phenomenon that cars in the other lane appear to be getting ahead faster than you? Although one may be inclined to account for this by invoking Murphy’s Law^{9}, a recent paper in *Nature* (Redelmeier and Tibshirani 1999), further elaborated in (Redelmeier and Tibshirani 2000), seeks a deeper explanation. According to this view, drivers suffer from systematic illusions causing them to mistakenly think they would have been better off in the next lane. Here we show that their argument fails to take into account an important observation selection effect. Cars in the next lane actually do go faster.

^{9 }“If anything can go wrong, it will.” (Discovered by Edward A. Murphy, Jr., in 1949.)

In their paper, Redelmeier and Tibshirani present some evidence that drivers on Canadian roadways (which don’t have an organized laminar flow) think that the next lane is typically faster. The authors seek to explain this phenomenon by appealing to a variety of psychological factors. For example, “a driver is more likely to glance at the next lane for comparison when he is relatively idle while moving slowly”; “Differential surveillance can occur because drivers look forwards rather than backwards, so vehicles that are overtaken become invisible very quickly, whereas vehicles that overtake the index driver remain conspicuous for much longer”; and “human psychology may make being overtaken (losing) seem more salient than the corresponding gains”. The authors recommend that drivers be educated about these effects and encouraged to resist small temptations to switch lanes, thereby helping to reduce the risk of accidents.

While all these illusions may indeed occur^{10}, there is a more straightforward explanation of the phenomenon. It goes as follows. One frequent reason why a lane (or a segment of a lane) is slow is that it contains too many cars. Even if the ultimate cause is something else, such as road work, there is nonetheless typically a negative correlation between the speed of a lane and how densely packed the vehicles in it are. That suggests (although it doesn’t logically imply) that a disproportionate fraction of the average driver’s time is spent in slow lanes. And by SSA, this means there is a greater-than-even prior probability that the same holds true of you in particular.

^{10 }For some relevant empirical studies, see e.g. (Feller 1966; Tversky and Kahnemann 1981, 1991; Gilovich, Vallone et al. 1985; Larson 1987; Angrilli, Cherubini & Manfredini 1997; Snowden, Stimpson et al. 1998; Walton and Bathurst 1998).

The last explanatory link can be tightened further if we move to a stronger version of SSA that replaces “observer” with “observer-moment”, i.e. a time-segment of an observer. (We will discuss this stronger principle, “SSSA”, in depth in chapter 10; its invocation here is an aside.) If you think of your present observation, made while driving on the motorway, as a random sample from all observations made by drivers, then chances are that your observation will be made from the viewpoint that most observers have, which is the viewpoint of the slow-moving lane. In other words, appearances are faithful: more often than not, the “next” lane *is* faster!
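The time-weighting involved here can be made concrete with a back-of-the-envelope calculation (a sketch of our own, not from the text; the speeds and distances are arbitrary illustrative assumptions):

```python
from fractions import Fraction

# Illustrative assumption: half the road (by distance) is slow, half fast.
slow_speed = Fraction(20)  # mph (assumed)
fast_speed = Fraction(60)  # mph (assumed)
distance = Fraction(1)     # miles in each stretch

t_slow = distance / slow_speed   # time spent in the slow stretch
t_fast = distance / fast_speed   # time spent in the fast stretch

# Probability that a randomly sampled observer-moment of the trip
# falls within the slow stretch (time-weighted, as SSSA directs):
p_slow = t_slow / (t_slow + t_fast)
print(p_slow)  # 3/4 -- not the 1/2 a distance-weighted sample would suggest
```

Even though the two stretches are equally long, three quarters of the driver’s observer-moments occur in the slow one; sampling by observer-moment rather than by road-distance is what makes the slow viewpoint the typical one.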

Even when two lanes have the same average speed, it can be advantageous to switch lanes. For what is relevant to a driver who wants to reach her destination quickly is not the average speed of the lane as a whole, but rather the speed of some segment extending maybe a couple of miles forwards from the driver’s current position. More often than not, the next lane has a higher average speed, at this scale, than does the driver’s present lane. On average, there is therefore a benefit to switching lanes (which of course has to be balanced against the costs of increased levels of effort and risk).

Adopting a thermodynamics perspective, it is easy to see that (at least in the ideal case) increasing the “diffusion rate” (i.e. the probability of lane-switching) will speed the approach to “equilibrium” (i.e. equal velocities in both lanes), thereby increasing the road’s throughput and the number of vehicles that reach their destinations per unit time.
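The analogy can be illustrated with a toy relaxation model (our own sketch; the car counts and switching probabilities are arbitrary assumptions). Each time step, a fraction p of the difference in car counts diffuses from the fuller lane into the emptier one, so the imbalance shrinks by a factor of (1 − p) per step:

```python
def steps_to_equilibrium(p, n1=900.0, n2=100.0, tol=1.0):
    """Steps until the two lanes' car counts differ by at most tol."""
    steps = 0
    while abs(n1 - n2) > tol:
        flow = p * (n1 - n2) / 2.0   # net cars diffusing toward the emptier lane
        n1, n2 = n1 - flow, n2 + flow
        steps += 1
    return steps

# A higher lane-switching probability means faster approach to "equilibrium"
print(steps_to_equilibrium(p=0.1), steps_to_equilibrium(p=0.5))  # 64 10
```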

The mistake to avoid is ignoring the selection effect: when you randomly select a driver and ask her whether she thinks the next lane is faster, more often than not you will have selected a driver in the lane that is in fact slower. And if there is no random selection of a driver, but it is just you yourself wondering why you are so unlucky as to be in the slow lane, then the selection effect is an observational one. Once we realize this, we see that no case has been made for recommending that drivers change lanes less frequently.

One of the fundamental problems in the interpretation of quantum physics is how to understand the probability statements that the theory makes. On one kind of view, the “single-history version”, quantum physics describes the “propensities” or physical chances of a range of possible outcomes, but only one series of outcomes actually occurs. On an alternative view, the “many-worlds version”, all possible sequences of outcomes (or at least all that have nonzero measure) actually occur. These two kinds of views are often thought to be observationally indistinguishable (Wheeler 1957; DeWitt 1970; Omnès 1973), but, depending on how they are fleshed out, SSA may provide a method of telling them apart experimentally. What follows are some sketchy remarks about how such an observational wedge could be inserted. We’re sacrificing rigor and generality in this section in order to keep things brief and simple.

The first problem faced by many-worlds theories is how to connect statements about the measure of various outcomes with statements about how probable we should think it is that we will observe a particular outcome. Consider first this simpleminded way of thinking about the many-worlds approach: When a quantum event E occurs in a quantum system in state S, and there are two possible outcomes A and B, then the wavefunction of S will after the event contain two components or “branches”, one where A obtains and one where B obtains, and these two branches are in other respects equivalent. The problem with this view is that it fails to give a role to the amplitude of the wavefunction. If nothing is done with the fact that one of the branches (say A) might have a higher amplitude squared (say 2/3) than does the other branch, then we’ve lost an essential part of quantum theory, namely that it specifies not just what *can* happen but also the probabilities of the various possibilities. In fact, if there are equally many observers on the branch where A obtains as on the branch where B obtains, and if there is no other relevant difference between these branches, then by SSA the probability that you should find yourself on branch A is 1/2, rather than 2/3 as asserted by quantum physics. This simpleminded interpretation must therefore be rejected.

One way of trying to improve the interpretation is by postulating that when the measurement occurs, the wavefunction splits into more than two branches. Suppose, for example, that there are two branches where A obtains and one branch where B obtains (and that these branches are otherwise equivalent). Then, by SSA, you’d have a 2/3 probability of observing A—the correct answer. If one wanted to adopt this interpretation, one would have to stipulate that there are lots of branches. One could represent this interpretation pictorially as a tree, where a thick bundle of fibers in the trunk gradually splits off into branches of varying degrees of thickness. Each fiber would represent one “world”. When a quantum event occurs in one branch, the fibers it contains would divide into smaller branches, with the number of fibers going into each sub-branch being proportional to the amplitude squared of the wavefunction. For example, 2/3 of all the fibers on a branch where the event E occurs in system S would go into a sub-branch where A obtains, and 1/3 into a sub-branch where B obtains. In reality, if we wanted to hold on to the exact real-valued probabilities given by quantum theory, we’d have to postulate a continuum of fibers, so it wouldn’t really make sense to speak of different fractions of fibers going into different branches. But something of the underlying ontological picture could possibly be retained so that we could speak of the more probable outcomes as obtaining in “more worlds” in some generalized sense of that expression.
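The fiber-counting picture can be sketched numerically (our own illustration; the weights and the bundle size are assumptions chosen so that the fractions come out in whole fibers):

```python
from fractions import Fraction

# Assumed squared amplitudes (Born weights) for the two outcomes
weights = {"A": Fraction(2, 3), "B": Fraction(1, 3)}
n_fibers = 3  # smallest bundle size realizing these weights with whole fibers

# Number of fibers going into each sub-branch, proportional to amplitude squared
fibers = {outcome: int(w * n_fibers) for outcome, w in weights.items()}

# SSA over fibers: the probability of finding yourself on an A-fiber
p_A = Fraction(fibers["A"], sum(fibers.values()))
print(fibers, p_A)  # {'A': 2, 'B': 1} 2/3 -- matching the quantum probability
```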

Alternatively, a many-worlds interpretation could simply take the correspondence between quantum mechanical measure and the probability of one observing the correlated outcome as a postulated primitive. It would then be assumed that, as a brute fact, you are more likely to find yourself on one of the branches of higher measure. (Maybe one could speak of such higher-measure branches as having a “higher degree of reality”.)

On either of these alternatives, there are observational consequences that diverge from those one gets if one accepts the single-history interpretation. These consequences come to light when one considers quantum events that lead to different numbers of observers. This was recently pointed out by Don N. Page (Page 1999). The point can be made most simply by considering a quantum cosmological toy model:

World 1: Observers; measure or probability 10^{-30 }

World 2: No observers; measure or probability 1-10^{-30 }

The single-history version predicts with overwhelming probability (P = 1-10^{-30}) that World 2 would be the (only) realized world. If we exist, and consequently World 1 has been realized, this gives us strong reasons for rejecting the single-history version, given this particular toy model. By contrast, on the many-worlds version, both World 1 and World 2 exist, and since World 2 has no observers, what is predicted (by SSA) is that we should observe World 1, notwithstanding its very low measure. In this example, if the choice is between the single-history version and the many-worlds version, we should therefore accept the latter.

Here’s another toy model:

World A: 10^{10 }observers; measure or probability 1-10^{-30 }

World B: 10^{50 }observers; measure or probability 10^{-30 }

In this model, finding that we are in World B does not logically refute the single-history version, but it does make it extremely improbable. For the single-history version gives a conditional probability of 10^{-30 }to us observing World B. The many-worlds version, on the other hand, gives a conditional probability of approximately 1 to us observing World B.^{11 }Provided, then, that our subjective prior probabilities for the single-history and the many-worlds versions are in the same (very big) ballpark, we should in this case again accept the latter. (The opposite would hold, of course, if we found that we are living in World A.)
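The comparison can be put as a simple likelihood-ratio calculation (a sketch assuming, for simplicity, that SSA weights all observers in the two worlds equally):

```python
from fractions import Fraction

n_A, n_B = 10**10, 10**50        # observers in World A and World B
measure_B = Fraction(1, 10**30)  # measure/probability of World B

# Single-history: we observe World B only if it is the one realized world
p_single = measure_B

# Many-worlds + SSA: both worlds exist; you are a random sample of all observers
p_many = Fraction(n_B, n_A + n_B)

bayes_factor = p_many / p_single  # likelihood ratio favoring many-worlds
print(float(p_many))              # 1.0 to float precision
print(bayes_factor > 10**29)      # True: swamps any comparable prior odds
```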

11

These are toy models, sure. In practice, it will no doubt be hard to get a good grip on the measure of “worlds”. A few things should be noted though. First, the “worlds” to which we need assign measures needn’t be temporally unlimited. We could instead focus on smaller “world-parts” that arose from, and got their measures from, some earlier quantum event whose associated measures or probabilities we think we know. Such an event could, for instance, be a hypothetical symmetry-breaking event in an early inflationary epoch of our universe, or it could be some later occurrence that influences how many observers there will be (we’ll study in depth some cases of this kind in chapter 9). Second, the requisite measures may be provided by other theories so that the conjunction of such theories with either the single-history or the many-worlds versions may be empirically testable. For example, Page performs some illustrative calculations using the Hartle-Hawking “no-boundary” proposal and some other assumptions. Third, since in many quantum cosmological models, the difference in the number of observers existing in different worlds can be quite huge, we might get results that are robust for a rather wide range of plausible measures that the component worlds might have. And fourth, as far as our project is concerned, the important point is that our methodology ought to be able to make this kind of consideration intelligible and meaningful, whether or not at the present time we have enough data to put it into practice.^{12 }

^{12 }On some related issues, see especially (Leslie 1996; Page 1996, 1997) but also (Albert 1989; Papineau 1995, 1997; Tegmark 1996, 1997; Schmidhuber 1997; Olum 2002). Page has independently developed a principle he calls the “Conditional Aesthemic Principle”, which is a sort of special-case version of SSSA applied to quantum physics.

In the last chapter, we argued through a series of thought experiments for reasoning in accordance with SSA in a wide range of cases. We showed that while the problem of the reference class is sometimes irrelevant when all hypotheses under consideration imply the same number of observers, the definition of the reference class becomes crucial when different hypotheses entail different numbers of observers. In those cases, what probabilistic conclusions we can draw depends on what sort of things are included in the reference class, even if the observer doing the reasoning knows that she is not one of the contested objects. We argued that many types of entities should be excluded from the reference class (rocks, bacteria, buildings, plants, etc.). We also showed that variations in regard to many quite “deep-going” properties (such as gender, genes, social status, etc.) are not sufficient grounds for discrimination when determining membership in the reference class. Observers differing in any of these respects can at least in some situations belong to the same reference class.

In this chapter, a complementary set of arguments was presented, focusing on how SSA caters to a methodological need in science by providing a way of connecting theory to observation. The scientific applications we looked at included:

- Deriving observational predictions from contemporary cosmological models.
- Evaluating a common objection against Boltzmann’s proposed thermodynamic explanation of time’s arrow.
- Identifying probabilistic coherence constraints in evolutionary biology. These are crucial in a number of contexts, such as when asking questions about the likelihood of intelligent life evolving on an Earth-like planet, the number of critical steps in human evolution, the existence of extraterrestrial intelligent life, and the cut-off time after which the evolution of intelligent life would no longer have been possible on Earth.
- Analyzing claims about perceptual illusions among drivers.
- Realizing a potential way of experimentally distinguishing between single-history and many-worlds versions of quantum theory.

Any proposed rival to SSA should be tested in all of the above thought experiments and scientific applications. Anybody who refuses to accept that something like SSA is needed is hereby challenged to propose a simpler or more plausible method of reasoning that works in all these cases.

Our survey of applications is by no means exhaustive. We shall now turn to a purported application of SSA to evaluating hypotheses about humankind’s prospects. Here we are entering controversial territory where it is not obvious whether or how SSA can be applied, or what conclusions to derive from it. Indeed, the ideas we begin to pursue at this point will eventually lead us (in chapter 10) to propose important revisions to SSA. But we have to take one step at a time.