Chapter 11

Observation Selection Theory Applied

The proof of the pudding being in its eating, we shall in this final chapter apply the observation selection theory to the fine-tuning and freak-observer problems in cosmology, to the Sleeping Beauty problem in game theoretic modeling of imperfect recall, and to the other scientific issues that we have studied (in evolution theory, thermodynamics, traffic analysis, and quantum physics). Then, towards the end, we shall argue that one can say something about how scientifically rigorous a given application is by looking at what sort of demand it places on how the reference class be defined. In general, weaker demands correspond to greater scientific rigor. Paradoxical applications are distinguished from the more scientific ones by the fact that the former work only for a rather special set of reference classes (which one may well reject) whereas the latter hold for a much wider range of reference classes (which arguably any reasonable person is required not to transgress). We will also tie this in with the foregoing discussion of the element of subjectivity that may exist in the choice of reference class.

Cosmological theorizing: fine-tuning and freak observers

In chapter 2, we argued, inter alia, for three preliminary conclusions regarding fine-tuning as evidence for multiverse hypotheses:

(1) Fine-tuning favors (other things equal) hypotheses h+ on which it is likely that one or more observer-containing universes exist over hypotheses h- on which this is unlikely.

(2) If two competing general hypotheses each imply that there is at least some observer-containing universe, but one of them implies a greater number of observer-containing universes, then fine-tuning is not a reason to favor the latter (other things equal).

(3) Although P(e|hM) may be much closer to zero than to one (hM being the multiverse hypothesis, and e the evidence we actually have), it could nonetheless easily be large enough to make the multiverse hypothesis supported by e.

We can now reexamine these theses in the new light of our theory. To begin with (1), let’s determine under what circumstances we will have Pa(h+|e) > Pa(h-|e).

Suppose that Pa(there is at least one actual observer-moment compatible with e|h+)=1. Since P(A|B) = P(A&B) / P(B), this can be expressed as


(M(h), remember, is the class of worlds wi where h is true and for which (wi) is non-empty.) Similarly, if we suppose that Pa(there is at least one actual observer-moment compatible with e|h-) ≈ 0, we get


If the hypotheses in question have about equal prior probability, Pa(h+) ≈ Pa(h-), this implies that


which is equivalent1 to

1 To see this, consider the worlds over which the sums range in ($): these worlds all have at least one observer-moment in Oe and are such that h+ (or h-) is true in them; and Pa(wi) appears in the sum once for every such world. In the second inequality ($$), the sum again includes only terms corresponding to worlds that have at least one observer-moment in Oe and are such that h+ (or h-) is true in them. The difference is that terms relating to such worlds occur multiple times in ($$): a term Pa(ws) occurs once for every such observer-moment s in each such world. Thus after dividing each term Pa(ws) by the number of such observer-moments (|Oe∩O(ws)|), the sum is the same as in ($).


We may thus tell under what circumstances e will preferentially support h+ over h- by considering what is required for ($$) to yield (£). And from this we can learn three lessons:

  • If Oe = Os for each s ∈ Oe∩(Oh+∪Oh-) then (£) follows from ($$). This means that if all the observer-moments that the hypotheses say may exist and which are compatible with our evidence e are in the same reference class (Oe) then a hypothesis h+ on which it is likely that one or more observer-moments compatible with e exist is supported vis-à-vis a hypothesis h- on which that is unlikely.
  • In principle, it is possible for a hypothesis h- that makes it less likely that there should be some observer-moment compatible with e to get preferential support from e vis-à-vis a hypothesis h+ which makes that more likely. For example, if h+ makes it likely that there should be one observer-moment compatible with e but at the same time makes it very likely that there are very many other observer-moments in our reference class that are not compatible with e, then h+ may be disfavored by e compared to a hypothesis h- on which it is quite unlikely that there should be any observer-moment compatible with e but on which also it is highly unlikely that there should be a substantial number of observer-moments in our reference class that are not compatible with e.
  • In practice (i.e. regarding (3)), if we think of h+ as a multiverse theory and h- as a single-universe theory, it seems that the concrete details will sometimes be such that (£) follows from ($$) together with the facts about these details. This is the case when h+ entails a higher probability than does h- to there being some actual observer-moment that is compatible with e, while at the same time the expected ratio between the number of actual observer-moments that are compatible with e that are in our reference class and the number of actual observer-moments that are in our reference class that are incompatible with e is about the same on h+ as on h- (or greater on h+ than on h-). Crudely put: it is alright to infer a bigger cosmos in order to make it probable that at least some observer-moment compatible with e exists, but only if this can be done without sacrificing too much of the desideratum of making it probable that a large fraction of the actual observer-moments that are in our reference class are compatible with e.
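
The second lesson can be made concrete with a small numerical sketch. It rests on a reading that is an assumption here rather than a quotation of OE: each possible world is weighted by its prior probability times the share of reference-class observer-moments in that world that are compatible with e. The function name and all numbers are hypothetical.

```python
from fractions import Fraction

def oe_weight(worlds):
    """Sum over worlds of prior * (share of reference-class observer-moments compatible with e).

    worlds: list of (prior, n_compatible, n_in_reference_class) tuples,
    with priors conditional on the hypothesis in question.
    A world with an empty reference class contributes nothing.
    """
    total = Fraction(0)
    for prior, n_compatible, n_ref in worlds:
        if n_ref > 0:
            total += Fraction(prior) * Fraction(n_compatible, n_ref)
    return total

# h+: certainly one compatible observer-moment, plus 999 incompatible ones
# in the same reference class (hypothetical numbers).
h_plus = [(Fraction(1), 1, 1000)]
# h-: a 1% chance of a lone compatible observer-moment, otherwise no observers.
h_minus = [(Fraction(1, 100), 1, 1), (Fraction(99, 100), 0, 0)]

w_plus, w_minus = oe_weight(h_plus), oe_weight(h_minus)
# With equal priors on h+ and h-, the posterior odds equal the ratio of weights.
print(w_plus, w_minus, w_minus / w_plus)
```

In this toy case h- comes out ahead even though it makes the existence of an observer-moment compatible with e far less likely, because h+ dilutes that observer-moment among many incompatible ones.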

We’ll continue the discussion of (3) in a moment, but first let’s direct the spotlight on the second preliminary thesis. The analysis of (2) follows a path parallel to that of (1).

Suppose that

Pa(there are many actual observer-moments compatible with e|h++) ≈ 1

Pa(there is at least one actual observer-moment compatible with e|h+) ≈ 1

Pa(h++) ≈ Pa(h+)

Since the first expression implies that

Pa(there is at least one actual observer-moment compatible with e|h++) ≈ 1

we get, in a similar way to above,


Meanwhile, by OE, is equivalent to


Again we can compare ($$*) to (£*) to see under what circumstances the former implies the latter. We find that

  • As before, if Oe = Os for each s ∈ Oe∩(Oh++∪Oh+) then (£*) follows from ($$*). This means that if the observer-moments that are compatible with e and with at least one of the hypotheses h++ and h+ are all in the same reference class (Oe) then a hypothesis h++ on which it is likely that there are a great many observer-moments compatible with e is not preferentially supported vis-à-vis a hypothesis h+ on which it is likely that there are relatively few observer-moments compatible with e.
  • Generally speaking, e will fail to distinguish between h++ and h+ if, for those observer-moments that are in our reference class, both hypotheses imply a similar expected ratio between the number of ones compatible with e and the number of ones incompatible with e. This means that ceteris paribus there is no reason to prefer a hypothesis that implies a greater number of observer-moments, beyond what is required to make it likely that there should be at least one actual observer-moment that is compatible with e.

Armed with these results, we can address (3). Let’s suppose for the moment that there are no freak observers.

First, consider a single-universe theory hU on which our universe is fine-tuned, so that conditional on hU there was only a very small probability that an observer-containing universe should exist. If we compare hU with a multiverse theory hM, on which it was quite likely that an observer-containing universe should exist, we find that if hU and hM had similar prior probabilities, then there are prima facie grounds for thinking hM to be more probable than hU given the evidence we have. Whether these prima facie grounds hold up on closer scrutiny depends on the distributions of observer-moments that hU and hM make probable. Supposing that the observer-moments that would tend to exist on hU (if there were any observer-moments at all, which would be improbable on hU) are similar in nature to the observer-moments that (most likely) exist on hM, then we do in fact have such grounds.

The precise sense of the proviso that our evidence e may favor hM over hU only if the observer-moments most likely to exist on either hypothesis are of a similar nature is specified by OE and the lessons we derived from it above. But we can say at least something in intuitive terms about the sorts of single-universe and multiverse theories for which this will be the case. For example, we can consider the case where there is a single relevant physical parameter, ?. Suppose the prior probability distribution over possible values of ? that a universe could have is smeared out over a broad interval (representing a priori ignorance about ? and absence of any general grounds, such as considerations of simplicity or theoretical elegance, for expecting that ? should have taken on a value within a narrower range). In the archetypal case of fine-tuning, there is only a very small range of ?-values that give rise to a universe that contains observers. Then the conditional probability of e given hU is very small. By contrast, the conditional probability of e given hM can be quite large, since there will most likely be observers given hM and these observer-moments will all be living in universes where ? has a value within the small region of fine-tuned (observer-generating) values. In this situation, hM would be preferentially supported by e.

Now consider a different case that doesn’t involve fine-tuning but merely “ad hoc” setting of a free parameter. This is the case when observers can exist over the whole range of possible values of ? (or a fairly large part thereof). The conditional probability of e given hU is the same as before (i.e. very small), but in this case the conditional probability of e given hM is about equally small. For although hM makes it likely that there should be some observers, and even that there should be some observers compatible with e, hM also makes it highly likely that there should be very many other observers who are not compatible with e. These are the observers who live in other universes in the multiverse, universes where ? takes a different value than the one we have observed (and hence incompatible with e). If these other observers are in the same reference class as us (and there is no clear reason why they shouldn’t be, at least if the sort of observers living in universes with different ? are not too dissimilar to ourselves), then this means that the conditional probability of e given hM is very small. If enough other-? universes contain substantial quantities of observers who are in the same reference class as us, then hM will not get significant preferential support from e compared to hU.

We see here the sense in which fine-tuning suggests a multiverse in a way that mere free parameters do not. In the former case, hM tends to be strongly supported by the evidence we have (given comparable priors); in the latter case, not.
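
The contrast between the two cases can be illustrated with a toy calculation. The sketch below is only an illustration under assumptions that are not in the text: the parameter takes one of n_values equally probable discrete values; hU realizes exactly one of them; hM realizes all of them, with equal numbers of observers in every habitable universe; all of these observers are in our reference class; there are no freak observers; and e is the observation of one particular habitable value. The function names are hypothetical.

```python
from fractions import Fraction

def p_e_given_single_universe(n_values):
    # hU: one universe; only the world whose parameter equals the observed
    # value contains observer-moments compatible with e.
    return Fraction(1, n_values)

def p_e_given_multiverse(n_habitable):
    # hM: every value is realized; observers are spread evenly over the
    # habitable values, and only those at the observed value are compatible with e.
    return Fraction(1, n_habitable)

def support_for_multiverse(n_values, n_habitable):
    # Likelihood ratio P(e|hM) / P(e|hU); a ratio of 1 means e does not discriminate.
    return p_e_given_multiverse(n_habitable) / p_e_given_single_universe(n_values)

# Fine-tuning: observers can exist at only one of 1000 values.
print(support_for_multiverse(1000, 1))
# Ad hoc free parameter: observers can exist at every one of the 1000 values.
print(support_for_multiverse(1000, 1000))
```

Under these assumptions the likelihood ratio grows with the degree of fine-tuning and collapses to 1 when observers can exist across the whole parameter range.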

On this story, how does one fit in the scenario where we discover a simple single-universe theory hU* that accounts well for the evidence? Well, if hU* is elegant and simple, then we would assign it a relatively high prior probability. Since hU* by assumption implies or at least gives a rather high probability to e, the conditional probability of hU* given e would thus be high. This would be support for the single-universe hypothesis and against the multiverse hypothesis.

One kind of candidate for such a single-universe theory is a theory involving a creator who chose to create only one universe. If one assigned one such theory hC* a reasonably high prior probability, and if it could be shown to give a high probability to there being one universe precisely like the one we observe and no other universes, then one would have support for hC*. Creator-hypotheses on which the creator creates a whole ensemble of observer-containing universes would be less supported than hC*. However, if our universe is not of the sort that one might have suspected a creator to create if he created only one universe (if our universe is not the “nicest” possible one in any sense, for example), then the conditional probability of e on any creator-hypothesis involving the creation of only one universe might well be so slim that even if one assigned such a creator-hypothesis a high prior probability it would still not be tenable in light of e if there were some plausible alternative theory giving a high conditional probability to e (e.g. a multiverse theory successfully riding on fine-tuning and its concomitant selection effects, or a still-to-be-discovered simple and elegant single-universe theory that fits the facts). If there were no such plausible alternative theory, then one may believe either a fine-tuned single-universe theory, a multiverse theory not benefiting from observation selection effects, or a creator hypothesis (either of the single-universe or the multiverse kind). These would be roughly on a par regarding how well they’d fit with the evidence (quite poorly for all of them), and the choice between them would be determined mainly by one’s prior probability function.

In chapter 2 we also touched on the case where our universe is discovered to have some “special feature” F. One example of this is if we were to find inscriptions saying “God created this universe and it’s the only one he created” in places where it seems only a divine being would have made them (and we thought that there was a significant chance that the creator was being honest). Another example is if we find specific evidence that favors on ordinary (non-anthropic) grounds some physical theory that implies either a single-universe world or a multiverse. Such new evidence e’ would be conjoined with the evidence e we already have. What we should believe in the light of this depends on what conditional probability various hypotheses give to e & e’ and on the prior probabilities we give to these hypotheses. With e’ involving special features, e & e’ might well be such as to preferentially favor hypotheses that specifically account for the special features, and this favoring may be strong enough to dominate any of the considerations mentioned above. For example, if we find all those inscriptions, that would make the creator-hypothesis seem very attractive even if one assigned it a low prior probability and even if the conditional probability of there being a single universe with F given the creator-hypothesis were small; for other plausible hypotheses would presumably give very much smaller conditional probabilities to our finding that our universe has F. (On hU, it would be extremely unlikely that there would be any universe with F. On hM, it might be likely that there should be some universe with F, but it would nonetheless be extremely unlikely that we should be in that universe, since on any plausible multiverse theory not involving a creator it would seem that if it were likely that there should be one universe with F then it would also be most likely that there are a great many other universes not having F and in which the observers, although many of them would be in the same reference class as us, would thus not be compatible with the evidence we have.) Similar considerations hold if F is not divine-looking inscriptions but something more of the nature of ordinary physical evidence for some particular physical theory.

Finally, we have to tackle the question of how the existence of freak observers affects the story. The answer is: hardly at all. Although once we take account of freak observers there will presumably be a broad class of single-universe theories that make probable that some observers compatible with e should exist, this doesn’t help the case for such theories. For freak observers are random. Whether they are generated by Hawking radiation or by thermal fluctuations or by some other phenomena of a similar kind, these freak observers would not be preferentially generated to be compatible with e. Only an extremely minute fraction of all freak observers would be compatible with e. The case would therefore be essentially the same as if we have a multiverse where many universes contain observers (that are in our reference class) but only a tiny fraction of them contain observers who are compatible with e. Just as e didn’t especially favor such multiverse-theories over ad hoc single-universe theories, so likewise e is not given a sufficiently high probability by the there-is-a-single-universe-sufficiently-big-to-contain-all-kinds-of-freak observers theory (hF) to make such a theory supported by our evidence. In fact, the case for hF is much worse than the case for such a multiverse theory. For the multiverse theory, even if not getting any assistance from fine-tuning, would at least have a bias towards observers who have evolved (i.e. most observers would be of that kind). Evolved observers would tend to be in epistemic states that to some degree reflect the nature of the universe they are living in. Thus if not every logically possible universe is instantiated (with equal frequency) in the multiverse but instead the universes it contains tend to share at least some basic features with our actual universe, then a much greater fraction of the observers existing in the multiverse would be compatible with e than of the observers existing given hF. 
On hF the observers would be distributed roughly evenly over all logically possible epistemic states (of a given complexity) 2 whereas on the multiverse theory they’d be distributed over the smaller space of epistemic states that are likely to be instantiated in observers evolving in universes that share at least some basic features (maybe physical laws, or some physical laws, depending on the particular multiverse theory) with our universe. So hF is strongly disfavored by e.

2 If you were to generate lumps of matter at random and wait until a brain in a conscious state emerged, you’d most likely find that the first conscious brain-state was some totally weird psychedelic one, but at any rate not one consistent with the highly specific and orderly set of knowledge represented by e.

Freak observers, therefore, cannot rescue an otherwise flawed theory. At the same time, the existence of freak observers would not prevent a theory that is otherwise supported by our evidence from still being supported once the freak observers are taken into account—provided that the freak observers make up a small fraction of all the observers that the theory says exist. In the universe we are actually living in, for example, it seems that there may well be vast numbers of freak observers (if only it is sufficiently big). Yet these freak observers would be in an astronomically small minority3 compared to the regular observers who trace their origin to life that evolved by normal pathways on some planet. For every observer that pops out of a black hole, there are countless civilizations of regular observers. Freak observers can thus, in the light of our observation selection theory, be ignored for all practical purposes.

3 Again, we are disregarding the infinite case. It seems that in order to handle the infinite case one would have to strengthen OE with something that is formulated in terms of spatial densities of observer-moments rather than classes of observer-moments. But that is beyond the scope of this investigation.

The freak-observer problem places only lax demands on the reference class

We saw in chapter 10 that in order to solve the freak-observer problem, we must use a reference class definition that puts some subjectively distinguishable observer-moments in the same reference class. It is worth pointing out, however, that for the purpose of dealing with freak observers, it suffices to select a reference class definition ℜe that is only marginally more inclusive than ℜ0. The reason for this is illustrated in figure 11.


The fraction of the observer-moments in Oa(ℜ0) (i.e. our reference class as specified by ℜ0) that have the same total evidence e as we have (which includes observing a value of about 2.7 K for the cosmic microwave background radiation) is the same on T2 as it is on T1 (namely, 100% in either case). Therefore, on Oa(ℜ0), e could not distinguish between T1 and T2. Yet, if we move to the reference class Oa(ℜe) specified by the only slightly more inclusive ℜe (which also places in our reference class observer-moments that are just a tiny bit subjectively different from our own), then our evidence e will distinguish strongly between T1 and T2 (and strongly favor the former). This is so because the frequency distribution of observer-moments is strongly peaked around observer-moments that observe the true current value of CMB rather than one of the alternative values that are observed only by observer-moments suffering from illusions. In the figure, if we look at the interval marked “Oa(ℜe)”, we see that the proportion of the area under the T1 curve in this interval that lies within the smaller interval representing Oe=Oa is much larger than the corresponding proportion for the T2 curve. The effect is actually more extreme than is apparent from the graph, both because the graph is not drawn to scale and because there are other dimensions, apart from the observed value of CMB, on which the randomly generated observer-moments will have a relatively broad and flat distribution compared to those observer-moments that have evolved in regular ways. The regular observer-moments will tend to be clustered in the region that a theory claims to represent the properties of the actual world.
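
The effect of widening the reference class can be sketched numerically. The following is a toy model, not the construction behind the figure: it assumes that observed CMB values fall into bins of 0.1 K, that the frequency of observer-moments observing a given value falls off as a Gaussian around the value a theory says is true, and that T2 asserts a true value of 3.1 K. The Gaussian shape, the bin width, the rival value, and the function name are all assumptions made for illustration.

```python
import math

def share_with_our_evidence(true_value, observed, half_width, sigma=1.0):
    """Share of reference-class observer-moments that observe `observed`.

    Values are in units of 0.1 K. The reference class holds every
    observer-moment whose observed value lies within `half_width` bins of ours;
    half_width = 0 plays the role of the minimal reference class.
    """
    def density(value):
        # Assumed Gaussian frequency of observer-moments around the true value.
        return math.exp(-0.5 * ((value - true_value) / sigma) ** 2)

    window = range(observed - half_width, observed + half_width + 1)
    return density(observed) / sum(density(value) for value in window)

OBSERVED = 27               # we observe about 2.7 K
T1_TRUE, T2_TRUE = 27, 31   # T2's 3.1 K is a hypothetical rival value

for half_width in (0, 1):
    share_t1 = share_with_our_evidence(T1_TRUE, OBSERVED, half_width)
    share_t2 = share_with_our_evidence(T2_TRUE, OBSERVED, half_width)
    print(half_width, share_t1, share_t2, share_t1 / share_t2)
```

With half_width = 0 both theories give a share of 1, so e cannot discriminate between them; with a window only one bin wider on each side, the share under T1 is already more than ten times the share under T2 in this toy setting.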

We can thus lay down another constraint that any legitimate reference class definition must satisfy: it must be no less inclusive than ℜe (“ℜe-bound”).

The Sleeping Beauty problem: modeling imperfect recall

We’ll continue our exploration of how the observation selection theory applies to scientific problems a few sections hence, but we pause to interject a discussion of the Sleeping Beauty problem, a thought experiment involving imperfect recall. Sleeping Beauty is closely related to two other problems that have been discussed in recent game theory literature: the Absent-Minded Driver and the Absent-Minded Passenger. One’s view on one of these problems is likely to determine how one thinks about the others. Therefore, we can regard Sleeping Beauty as a template for a broader class of imperfect recall problems. (The purpose of investigating these problems here is partly to see if our theory may shed light on them and partly to give a further illustration of how the theory works. Towards the end of this chapter, when tying various loose ends together in an attempt to capture some general lesson, we shall also find it useful to have a broad range of sample cases to draw from.)

Sleeping Beauty

On Sunday afternoon Beauty is given the following information. She will be put to sleep on Sunday evening and will wake up on Monday morning. Initially she will not know what day it is, but on Monday afternoon she’ll be told it is Monday. On Monday evening she will be put to sleep again. Then a fair coin will be tossed and if and only if it falls tails will she be awakened again on Tuesday. However, before she is woken she will have her memory erased so that upon awakening on Tuesday morning she has no memory of having been awakened on Monday. When she wakes up on Monday, what probability should she assign to the hypothesis that the coin landed heads?

Views diverge as to whether the correct answer is P(Heads) = 1/3 or P(Heads) = 1/2. In support of the former alternative is the consideration that if there were a long series of Sleeping Beauty experiments then on average one third of the awakenings would be Heads-awakenings. One might therefore think that on any particular awakening, Beauty should believe with a credence of 1/3 that the coin landed heads in that trial. In support of the view that P(Heads) = 1/2 there is the consideration that the coin is known to be fair and it appears as if awakening does not give relevant new information to somebody who knew all along that she would at some point be awakened. The former view is advocated in e.g. (Elga 2000) and the latter in e.g. (Lewis 2001); but see also (Aumann, Hart, et al. 1997; Battigalli 1997; Gilboa 1997; Grove 1997; Halpern 1997; Lipman 1997; Piccione and Rubinstein 1997a, 1997b; Wedd 2000) for earlier treatments of the same or similar problems.

My position is that the issue is more complicated than existing analyses admit and that the solution is underdetermined by the problem as formulated above. It contains ambiguities that must be recognized and disentangled. Depending on how we do that, we get different answers. In particular, we need to decide whether there are any outsiders (i.e. observer-moments other than those belonging to Beauty while she is in the experiment), and what Beauty’s reference class is. Once these parameters have been fixed, it is straightforward to calculate the answer using OE.

The case of no outsiders

Consider first the case of no outsiders. Suppose that Beauty is the only observer in the world and that she is created specifically for the experiment and that she is killed as soon as it is over. We can simplify by representing each possible period of being awake as a single possible observer-moment. (As shown earlier, it makes no difference how many observer-moments we associate with a unit of subjective time, provided we use a sufficiently fine-grained metric to accurately represent the proportions of subjective time spent in the various states.) We can then represent Sleeping Beauty graphically as follows (figure 12):

The diagram shows the possible observer-moments, and groups those that have the same total information together in equivalence classes. Thus, for instance, let ß2, ß4, and ß6 denote the “Monday-morning-in-the-Tails-world” observer-moment, the “Tuesday-morning-in-the-Tails-world” observer-moment, and the “Monday-morning-in-the-Heads-world” observer-moment, respectively. Since they have the same evidence (not shared by any other observer-moment), they constitute an equivalence class. This equivalence class, which we have denoted by “Oe2”, represents the evidence that each of these observer-moments has.


To find the solution, we must also know Beauty’s reference class. Consider first the case where Beauty puts only subjectively indistinguishable observer-moments in her reference class,

Rß2,4,6 = {ß2, ß4, ß6}

It is then easy to verify that OE entails

Pß2,4,6(Heads|e2) = 1/2

So when Beauty wakes up on Monday morning (and of course likewise if and when she wakes up on Tuesday) she should think that the probability of Heads is 1/2. Intuitively, this is because all the possible observer-moments that are in the just awakened Beauty’s reference class (i.e. ß2, ß4 and ß6, or “ß2,4,6” for short) share the same evidence e2 (“I know the set-up and I’ve just woken up but haven’t yet been told what day it is”) and this reference class was guaranteed to be non-empty independently of how the coin fell.

In the case where Beauty includes all observer-moments in her reference class,

Rß2 = Rß4 = Rß6 = {ß1, ß2, ß3, ß4, ß5, ß6, ß7}

OE entails Pß2,4,6(Heads|e2) = 2/5

In this case, when Beauty wakes up she should think that the probability of Heads is 2/5. The reason why the probability is less than 1/2 is that a smaller fraction of all observer-moments in her reference class would have her evidence if Heads (namely, one out of three observer-moments) than if Tails (two out of four). The exact figure of 2/5, however, depends on the detailed stipulations about the version of Sleeping Beauty we are considering and is not generic to the scenario.
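
Both results can be checked with a few lines of arithmetic. The sketch below assumes that OE amounts to weighting each possible world by its prior probability times the share of reference-class observer-moments in that world that have evidence e2, and then normalizing; the function name is hypothetical, and the observer-moment counts are the ones stated above (three in the Heads-world, four in the Tails-world).

```python
from fractions import Fraction

def oe_posterior(worlds):
    """Posterior over worlds given evidence e.

    worlds: dict mapping a world's name to
    (prior, n_with_evidence, n_in_reference_class).
    """
    weights = {}
    for name, (prior, n_with_evidence, n_ref) in worlds.items():
        # Weight = prior * share of reference-class observer-moments with e.
        share = Fraction(n_with_evidence, n_ref) if n_ref else Fraction(0)
        weights[name] = Fraction(prior) * share
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

half = Fraction(1, 2)

# Minimal reference class: only the subjectively indistinguishable awakenings.
narrow = oe_posterior({"Heads": (half, 1, 1), "Tails": (half, 2, 2)})

# Universal reference class: all seven possible observer-moments.
wide = oe_posterior({"Heads": (half, 1, 3), "Tails": (half, 2, 4)})

print(narrow["Heads"], wide["Heads"])
```

The first call reproduces the credence of 1/2, the second the credence of 2/5.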

The case with outsiders

Turning to the case where there are outsiders (figure 13), we note first that their existence makes no difference unless they are included in the awakened Beauty’s reference class. If they are not included (if, for instance, Beauty’s reference class at that time is Rß2,4,6 = {ß2, ß4, ß6}), then her credence in Heads when awakening on Monday morning is 1/2. “Outsiders”, however numerous, do not affect Beauty’s probabilities given this choice of reference class.

We can also note in passing that the assumption that the outsiders are not in Rß2,4,6 (the reference class of ß2,4,6) implies that ß2,4,6 all know that they are in the experiment and consequently that they are not among the outsiders. For as we have argued earlier, every observer-moment’s reference class must include all other observer-moments that are subjectively indistinguishable from itself, i.e. all observer-moments that share the same total evidence that it has. So if the outsiders are not in Rß2,4,6 then ß2,4,6 can infer that the outsiders have different evidence from their own, and thus that ß2,4,6 are not outsiders.


We get a different answer than 1/2, however, if there are possible outside observer-moments that are included in Rß2,4,6. Let’s assume that the number and nature of the outsiders are independent of the outcome of the coin toss. Then an observer-moment observing that it is in the experiment (e.g. ß2) thereby gets reason to increase its credence in the hypothesis (Tails) that entails that a greater fraction of all observer-moments in its reference class are in the experiment than does the rival hypothesis (Heads). In the limiting case where the number of outsiders (that are included in Rß2,4,6) gets very large, OE yields Pß2,4,6(Heads|e2) → 1/3.
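
The approach to 1/3 can be traced numerically. The sketch below makes several assumptions that go beyond the text: OE is read as weighting each world by its prior times the share of reference-class observer-moments that have evidence e2; all of Beauty’s possible observer-moments (three in the Heads-world, four in the Tails-world) are in the reference class together with the outsiders; and no outsider shares e2. The function name is hypothetical.

```python
from fractions import Fraction

def heads_credence(n_outsiders):
    """Credence in Heads on awakening, with n_outsiders in the reference class."""
    prior = Fraction(1, 2)  # fair coin
    # Heads-world: 1 of Beauty's 3 observer-moments has e2.
    heads = prior * Fraction(1, 3 + n_outsiders)
    # Tails-world: 2 of Beauty's 4 observer-moments have e2.
    tails = prior * Fraction(2, 4 + n_outsiders)
    return heads / (heads + tails)

for n in (0, 1, 10, 1000, 10**6):
    credence = heads_credence(n)
    print(n, credence, float(credence))
```

On these assumptions the credence works out to (4 + N)/(10 + 3N) for N outsiders, which starts at 2/5 and decreases towards 1/3 as N grows.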

Synthesis of the 1/2- and the 1/3-views

The account presented here shows how we can accommodate both of the rivaling intuitions about what Beauty’s credence should be when she wakes up.

On the one hand, the intuition that her credence of Heads should be 1/3 because that would match the long-run frequency of heads among her awakenings is vindicated if we assume that there is an actual series of experiments resulting in an actual long-run frequency. For there are then many observer-moments that are outside the particular run of the experiment that ß2 is in whilst nonetheless being in ß2’s reference class. This leads, as we saw, to Pß2,4,6(Heads|e2) ≈ 1/3.

On the other hand, the intuition that Beauty’s credence of Heads should be 1/2 is justified in cases where there is only one run of the experiment and there are no other observer-moments in the awakened Beauty’s reference class than her other possible awakenings in that experiment. For in that case, the awakened Beauty does not get any relevant information from finding that she has been awakened, and she therefore retains the prior credence of 1/2.

Those who feel strongly inclined to answer P(Heads) = 1/2 on Beauty’s behalf even in cases where various outsiders are known to be present are free to take that intuition as a reason for choosing a reference class that places outsiders (as well as Beauty’s own pre- and post-experiment observer-moments) outside the reference class they would use as awakened observer-moments in the experiment. It is, hopefully, superfluous to reemphasize here that such a restriction of one’s reference class also needs to be considered in the broader context of other inferences that one wishes to make from indexical statements or observations about one’s position in the world. For instance, jumping to the extreme view that only subjectively indistinguishable observer-moments get admitted into one’s reference class would be unwise, because it would bar one from deriving observational consequences from Big-World cosmologies.

Observation selection theory applied to other scientific problems

Having now shown in detail how the observation selection theory replicates and extends earlier chapters’ informal findings about fine-tuning arguments and the freak-observer problem in cosmology, we can proceed more quickly in describing how it applies to the other scientific problems we have discussed. We will focus on what these applications presuppose about the reference class.

Consider first the criticism of Boltzmann’s attempt to explain time’s arrow. The criticism was that if Boltzmann’s picture were right, we should have expected to live in a much smaller low-entropy bubble than we in fact do. What definitions of the reference class are compatible with this conclusion? The answer is that any of a very broad range of reference class definitions would work. Let’s consider some examples.

The universal reference class definition ℜU would work, of course; it was the one implicitly used in our original discussion of this topic in chapter 5. But a narrower reference class definition would also work fine. The argument goes through so long as our reference class includes those possible observer-moments that are exactly like ours except that they observe themselves living in a somewhat smaller low-entropy region than we do. For if Boltzmann were right, the vast majority of observer-moments in such a reference class would find themselves in smaller low-entropy regions than we do. This would entail, via OE, that the conditional probability of our data on the Boltzmann theory would be extremely small, and hence (making only very weak assumptions about the prior probability of Boltzmann’s theory and its rivals) that our data disconfirms the Boltzmann theory. This claim does not depend on any assumption about the world being very big so that all relevant types of observations were likely to be made whether Boltzmann is right or wrong. Supposing the reference class definition has at least the diminutive degree of inclusiveness just described, our observations would have a much higher probability conditional on the theory that the universe as a whole is in a low-entropy state than on the theory that our region is a thermal fluctuation in a high-entropy bath. In fact, we can lower the requirements even further by considering that if we were the result of a thermal fluctuation then we would most likely have been the result of the smallest possible thermal fluctuation that would have produced observer-moments in our reference class, and the size of such a fluctuation would at any rate not be larger than the size of a human brain (which we know can produce such observer-moments). This means that we could fall back on the argument given for why we should not believe that we are freak observers; and for that we saw that the highly restrictive reference class definition ℜe would suffice.

Ponder, next, the point we made about it being a mistake to conclude from the fact that intelligent life evolved on Earth that the evolution of intelligent life on a given Earth-like planet is not highly improbable (assuming there are sufficiently many Earth-like planets to make it probable that intelligent life would evolve somewhere). Does this point depend very sensitively on a particular choice of reference class? Again, the answer is no. Here, however, there is a slight qualification. The point about an observation selection effect vitiating the attempt to learn about how hard it is for intelligent life to evolve depends on the assumption that the universe contains sufficiently many Earth-like planets (so that the selection effect has a sufficient pool from which to select). More specifically, the argument depends on the probability of at least some civilization “like ours” coming to exist being almost independent of which of the hypotheses under consideration (about the improbability of our evolution) is correct. In more technical terms, what this means is that the argument presupposes that it was approximately equally likely that some observer-moment in your reference class should come to exist whichever of the rival hypotheses is true. But how many Earth-like planets there have to be in total in order for this premiss to hold true depends on how wide your reference class is. The broader your reference class, the fewer Earth-like planets are required to make the probability approach unity that some possible observer-moment in the reference class should be actualized. So it is not exactly true to say that how we define the reference class has no relevance for this application. Nonetheless, in practice this qualification may make little difference. For instance, if we suppose that the world contains an infinite number of Earth-like planets (as seems to be the case) then every legitimate reference class definition (which is no less inclusive than ℜ0) gives the same result in this application.
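The dependence on the number of planets and on the width of the reference class can be made concrete with a small calculation. This is a sketch under simplifying assumptions that are ours and not part of the argument: planets are treated as independent trials, and the parameters p_evolve, q_in_class, and n_planets are hypothetical names introduced only for illustration.

```python
import math

def prob_some_observer_in_class(p_evolve, q_in_class, n_planets):
    """P(at least one planet yields an observer-moment in the reference class).

    p_evolve   : chance that intelligent life evolves on a given Earth-like planet
    q_in_class : chance that such life contains observer-moments in one's
                 reference class (a broader class means a larger q_in_class)
    n_planets  : number of Earth-like planets, treated as independent trials
    Requires 0 <= p_evolve * q_in_class < 1.
    """
    # 1 - (1 - p*q)^N, computed with log1p/expm1 to stay accurate for tiny p*q
    return -math.expm1(n_planets * math.log1p(-p_evolve * q_in_class))

if __name__ == "__main__":
    for p in (1e-3, 1e-20):            # two rival hypotheses about difficulty
        for q in (1.0, 1e-12):         # broad versus narrow reference class
            print(p, q, prob_some_observer_in_class(p, q, 1e30))
```

With enough planets both hypotheses make the existence of some observer-moment in the reference class nearly certain, so our existence does not discriminate between them. Narrowing the reference class (a smaller q_in_class) raises the number of planets needed before that holds.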

What of Carter’s ideas about how we might be able to estimate the number of critical steps in human evolution? Here, what the argument presupposes as far as the reference class is concerned is, roughly speaking, that the observers that would have existed if intelligent life on Earth had arisen earlier or later than it actually did would be in the same reference class as us. More accurately, we need not assume that all these different possible observers would be in our reference class (we don’t even have to suppose that another run of evolution on an Earth-like planet would be likely to produce observers in our reference class even if their evolution took the same time as ours did). Rather, what we need to suppose in order for the argument to work without complications is that the probability that an evolutionary process that leads to intelligent observers should produce observer-moments that are in our reference class is roughly independent of how long the process takes (within a largish interval). The easiest way to grasp the gist of this qualification is to consider a hypothetical case where it is contravened. Suppose that only observer-moments that were thinking “this planet that I am living on has existed for about 4.5 billion years” were included in our reference class (call this reference class “ℜ4.5Gyrs”). Since such observer-moments would not exist (or would be vastly less frequent) among intelligent species that took, say, eight billion years to evolve, we should by ℜ4.5Gyrs find no significant information in the fact that our evolution took 4.5 billion years. In particular, we could not reason that if there were very many critical steps in human evolution then we would most likely have come into existence closer to the cut-off date (i.e. when Earth becomes inimical to the emergence of intelligent life, which occurs no later than when our sun becomes a red giant) and that therefore, since we arose so early, there most likely weren’t very many critical steps. For given ℜ4.5Gyrs, the relevant observer-moments had to arise after 4.5 billion years (i.e. long before the cut-off) or not arise at all. Even if the evolution of intelligent life took much more than 4.5 billion years on the vast majority of the planets where it occurred, the type of observer-moments that are in ℜ4.5Gyrs would still overwhelmingly be found on planets where evolution progressed exceptionally rapidly. So Carter’s argument would not work with ℜ4.5Gyrs.
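The claim that many critical steps would push our arrival towards the cut-off date can be illustrated numerically. The sketch below rests on an idealization that is not derived in this chapter: each step is assumed to be so improbable that, conditional on all n steps being completed before the cut-off, their completion times behave like n sorted uniform draws on the available interval. The function names are illustrative.

```python
import random

def mean_arrival_fraction(n_steps, trials=20_000, seed=0):
    """Monte Carlo estimate of (arrival time / cut-off time).

    Assumption (not from the text): conditional on success, the completion
    times of n very hard steps look like n sorted uniform draws on [0, 1],
    so the arrival time of intelligent life is the largest draw.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n_steps))
    return total / trials

def exact_mean_arrival_fraction(n_steps):
    """Closed form for the mean of the maximum of n uniforms: n / (n + 1)."""
    return n_steps / (n_steps + 1)

if __name__ == "__main__":
    for n in (1, 2, 5, 20):
        print(n, round(mean_arrival_fraction(n), 3), exact_mean_arrival_fraction(n))
```

Under this idealization the expected arrival time moves towards the cut-off as the number of steps grows, which is why an early arrival counts against there having been many steps. The inference is blocked by a reference class such as ℜ4.5Gyrs, which fixes the arrival time by construction.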

In the actual case, however, there do not seem to be strong reasons for thinking that civilizations that take somewhat longer or shorter to evolve than ours did would be significantly less likely to contain observer-moments that are in our reference class than are civilizations that take the same time as ours. The main systematic differences in observer-moments between various such civilizations would seem to be in regard to what the observer-moments believe about how long it took for their civilization to develop. (Of course, different civilizations may contain very different kinds of observer-moments, but there seems currently no good argument for thinking that most of these differences would be strongly correlated with how long a civilization takes to arise.) So it seems that Carter’s ideas for estimating the number of critical steps rely only on fairly weak assumptions about the reference class. What’s required is that we don’t adopt a reference class like ℜ4.5Gyrs, which excludes observer-moments primarily on the basis of what they believe about how long their species took to evolve. Yet, this is a defeasible claim. Further research might reveal that there is the kind of systematic correlation between how late a civilization arises and fundamental aspects of the subjective qualities of the observer-moments it is likely to contain that would weaken or destroy Carter’s argument even with a rather more inclusive choice of reference class than ℜ4.5Gyrs.

Traffic analysis.—If the explanandum is why it appears that one tends to end up in a slow lane, what the explanation we suggested in chapter 5 presupposes in terms of the reference class is that observer-moments of the kind that are in one’s reference class are likely to exist in larger numbers in slow lanes than in fast ones. This holds, for example, if the proportion of a lane’s observer-moments that are in one’s reference class is the same for fast and slow lanes (since there are more observer-moments in slow lanes). It would not hold if fast-lane observer-moments were much more likely to be in your reference class than slow-lane observer-moments (extreme example: if it were the case that people in slow lanes usually got so bored that their brains stopped working!) But realistically, it seems that when you are in a slow lane and puzzling about why that is so, then you have no reason to think that a fast lane observer-moment would be more likely to be sufficiently similar to your current observer-moment to be in its reference class than a slow lane observer-moment. (If anything, one would expect the opposite: that observer-moments that are in the same situation as you would be more likely to be in states that are similar to yours.)
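The presupposition about lanes can be stated as a one-line calculation. In this sketch the lane counts and the membership fractions q_slow and q_fast are hypothetical parameters introduced for illustration only.

```python
def share_in_slow_lane(n_slow, n_fast, q_slow=1.0, q_fast=1.0):
    """Share of reference-class observer-moments located in the slow lane.

    n_slow, n_fast : observer-moments (drivers) in each lane
    q_slow, q_fast : fraction of each lane's observer-moments that belong
                     to one's reference class
    """
    in_class_slow = n_slow * q_slow
    in_class_fast = n_fast * q_fast
    return in_class_slow / (in_class_slow + in_class_fast)

if __name__ == "__main__":
    print(share_in_slow_lane(30, 10))                  # equal membership fractions
    print(share_in_slow_lane(30, 10, q_slow=0.1))      # bored slow-lane drivers
```

When the membership fractions are equal, the slow lane's larger population carries over directly to the reference class. The explanation fails only if q_slow is much smaller than q_fast, as in the extreme example of bored drivers.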

We also noted in chapter 5 that observation selection effects may provide us with a method for observationally distinguishing between different interpretations, or versions, of quantum mechanics. This point holds under a wide range of choices of reference class definitions. Consider one of the toy models that we discussed:

World A: 10^10 observers; measure or probability 1 − 10^−30

World B: 10^50 observers; measure or probability 10^−30

A single-history version of quantum mechanics predicts that we should observe World A whereas a many-worlds version predicts that we should observe World B. This tenet presupposes that observer-moments in one of the worlds are not vastly more likely to be in our reference class than observer-moments in the other world. Again, it seems rather plausible, in the absence of arguments to the contrary, that this presupposition would hold in any real attempt to create an empirical test to distinguish between the two sorts of versions of quantum theory; but of course one cannot firmly proclaim on that issue until a concrete scenario has been specified. (Because of the difficulty of deriving the quantum measure for a suitable pair of possibilities to apply the test to, the task of describing a feasible empirical way of discriminating between the rival versions in this way is a non-trivial challenge for quantum cosmologists.) For the sake of illustration, we can imagine a hypothetical case where the presupposition fails: Suppose that all the “observers” in World B are kangaroos, and that you don’t take observer-moments of kangaroos to be in your present observer-moment’s reference class. Then even if you find yourself in World A, this would not be evidence against the many-worlds version.
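The two predictions for the toy model can be computed exactly. The sketch below reads the many-worlds prediction as weighting each world by its measure times its number of observers, which is our gloss on the toy model; the dictionary and function names, and the in_class parameter used for the kangaroo case, are illustrative.

```python
from fractions import Fraction

# Toy model: observer counts and measures of the two worlds.
OBSERVERS = {"A": 10**10, "B": 10**50}
MEASURE = {"A": 1 - Fraction(1, 10**30), "B": Fraction(1, 10**30)}

def single_history(world):
    """Chance of finding oneself in `world` if only one world is actualized."""
    return MEASURE[world]

def many_worlds(world, in_class=None):
    """Observer-weighted chance of finding oneself in `world`.

    in_class maps each world to the fraction of its observer-moments that
    are in one's reference class (default: all of them).
    """
    if in_class is None:
        in_class = {"A": 1, "B": 1}
    weight = {w: MEASURE[w] * OBSERVERS[w] * in_class[w] for w in OBSERVERS}
    return weight[world] / sum(weight.values())

if __name__ == "__main__":
    print(float(single_history("A")))
    print(float(many_worlds("B")))
    print(many_worlds("A", in_class={"A": 1, "B": 0}))  # kangaroo case
```

On this reading World B's small measure is outweighed by its far larger population, so the many-worlds version favors World B. Setting the World B membership fraction to zero reproduces the kangaroo case, in which finding oneself in World A is what both versions predict.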

Robustness of reference class and scientific solidity

Thus what we find is that the scientific arguments appealing to observation selection effects that we described in chapter 5 make various assumptions about the reference class, but that these assumptions are quite weak. That is to say, in these applications, any non-arbitrary reference class definition satisfying some relatively mild constraints gives basically the same result.

I wish to suggest that insensitivity (within limits) to the choice of reference class is exactly what makes the applications just surveyed scientifically respectable. Such robustness is one hallmark of scientific objectivity.

Again, it is useful to draw attention to the parallel to non-indexical scientific arguments. Such arguments also depend for their persuasiveness on assumptions about the shape of our prior credence function, as Hume taught us. That the moon is smaller than the Earth is as well established as any scientific truth. Yet this truth does not, of course, follow logically from any sensory data we have. Rather, it is a hypothesis that gets an extremely high credence after one conditionalizes on the available body of evidence, provided one has a suitable prior credence function. There exist, trivially, credence functions that give a puny probability to the moon being smaller than the Earth when conditionalized on current data; but this is irrelevant, for only a highly unreasonable person would have such a credence function. To say that there is strong scientific evidence for a hypothesis might just mean (roughly) that the evidence is such that any reasonable person considering the data carefully would accept the hypothesis.4

4 This is actually saying very little, since we don’t have much of an independent grasp of what it means to be reasonable other than that one accepts those results that are strongly supported by the evidence one has; but it seems about right as far as it goes. Compare also these suggestions about robustness to Brian Skyrms’ ideas about resilience (Skyrms 1980).

I submit that the same holds with regard to reasoning that involves indexical propositions and observation selection effects. The indexical and the non-indexical are on a par, and the scientifically rigorous anthropic arguments are those that work under any choice of reference class that a reasonable person could have (the choice of reference class being a reflection of the indexical part of one’s prior credence function).

Scientific rigor is a matter of degree. We might even informally rank the scientific applications we examined in order of their rigor and objectivity. At one extreme, we have the solution to the problem of freak observers. Any non-arbitrary reference class that is at least as inclusive as ℜe delivers the same verdict (namely, that we are extremely unlikely to be freak observers), so this result is very solid. Likewise for the criticism against Boltzmann’s account of time’s arrow. The results regarding traffic analysis are also very firm. The arguments in evolutionary biology make slightly stronger assumptions about the choice of reference class and are therefore somewhat less rigorous (and of course some of these arguments, especially Carter’s argument that there were only a few critical steps in human evolution, are also shaky because of the empirical modeling assumptions that they include, quite apart from what they suppose about observation selection effects). Regarding the quantum physics idea, we cannot really tell until we are presented with a concrete plan; but it is at least conceivable that it could turn out to yield something that is solid as far as its invocation of observation selection effects is concerned (although it could well be that we’ll never find a rigorous way of establishing the prior quantum measure for a suitable set of possibilities, so that this application could fail to ever become firmly established for that reason).

It pays to contrast this list of scientific applications with the various paradoxical applications that we discussed in earlier chapters. Take the Doomsday argument. In order for it to work, one has to assume that the beings who will exist in the distant future if humankind avoids going extinct soon will contain lots of observer-moments that are in the same reference class as one’s current observer-moment. If one thinks that far-future humans or human descendants will have quite different beliefs than we have, that they will be concerned with very different questions, and that their minds might even be implemented on some rather different (perhaps technologically enhanced) neural or computational structures, then requiring that the observer-moments existing under such widely differing conditions are all in the same reference class is to make a very strong assumption. The same can be said about the cases of Adam & Eve, UN++, and Quantum Joe. These arguments will fail to persuade anybody who doesn’t use the particular kind of very inclusive reference class they rely on—indeed, reflecting on these arguments may well lead a reasonable person to adopt a more narrow reference class. Because they presuppose a very special shape of the indexical parts of one’s prior credence function, they are not scientifically rigorous. At best, they work as ad hominem arguments for those people who happen to accept the appropriate sort of reference class—but we are under no rational obligation to do so.5

5 As regards DA, we can distinguish versions of it that have a greater degree of persuasiveness than others. For example, DA provides stronger grounds for rejecting the hypothesis that humans will exist in very great numbers in the future in states that are very similar to our current ones (since for this, only relatively weak assumptions are needed: that the reference class definition be at least somewhat inclusive) than for rejecting the hypothesis that humans will continue to exist in any form in large numbers (which would require that a highly diverse set of possible observer-moments be included in our current reference class).


An elusive, controversial, and multifariously paradoxical set of problems, branded “anthropic”, formed the subject matter of our investigation. We have tried to show that something of importance can be found behind the smoke and confusion: the appreciation of observation selection effects and of their relevance for scientific and philosophical inferences. We have tried to describe what these things are, how they operate, and how they apply to concrete cases.

Part of our method was to take philosophical paradoxes seriously. We argued, for instance, that the Doomsday argument does not fail for any trivial reason. There are some gaps in its presentation, but we saw that many of these can be filled in. In parallel to this obsession with philosophical paradox, we pursued a detailed investigation of the role of observation selection effects in various concrete scientific contexts.

The theory we have developed in this book, and formalized in chapter 10, provides an exact and systematic framework for taking observation selection effects into account. From the Observation Equation, it is possible to derive as special cases many of the results established by other authors or in earlier chapters of this work. The Carter and Leslie versions of the weak and the strong anthropic principles, for example, are vindicated and extended. The theory solves the freak-observer problem. It explains how to evaluate fine-tuning arguments in cosmology. And it clarifies some murky issues in several other scientific disciplines.

We have seen that it is not necessary to adopt the Self-Indication Assumption (and thus to agree with the Presumptuous Philosopher) in order to avoid the counterintuitive conclusions of the Doomsday argument, Adam & Eve, UN++, and Quantum Joe. For the principle that led to those conclusions, the Self-Sampling Assumption, while being a helpful first step, left out a certain kind of relevant indexical information, namely information about which temporal segment of an observer one currently is. Including this extra information undercuts the inferences that led to strange results by sanctioning the use of a reference class that is relative to observer-moments. The Self-Sampling Assumption can thus be seen as a ladder that can be kicked away now that we have climbed it (or better yet, as something to be retained as a simplified special-case version of the Observation Equation).

The Observation Equation itself is neutral with regard to the definition of the reference class. We did, however, establish some constraints on permissible definitions (ℜ0-bound, non-arbitrariness, ℜe-bound, and the less firm ℜU-bound). We also pointed out some considerations that are relevant for choosing a reference class within these constraints. It was speculated that although further arguments may impose additional restrictions, it is likely that there will remain some latitude for subjective epistemic factors to influence the choice of reference class. If so, then our theory reflects a symmetry between the indexical and the non-indexical components of our prior credence function. In both components, there are limitations on what can reasonably be held, but these limitations do not pick out a uniquely correct credence assignment: rational thinkers could disagree to some extent even given the same evidence. This view has the virtue of enabling us to explain the differing degrees of scientific rigor and objectivity that pertain to different applications, ranging from solving the freak-observer problem (extremely rigorous) to the Doomsday argument (much shakier and hence non-compelling, especially in its more ambitious versions). Generally speaking, the weaker the assumptions that an application needs to make about the reference class, the more scientifically solid it is.

We thus have a framework for connecting up indexical beliefs with non-indexical ones; a delineation of the element of subjectivity in both kinds of inferences; and a method for applying the theory to help solve concrete philosophical and scientific problems, ranging from the question of God’s existence to the analysis of claims about perceptual illusions among motorists.

Yet some issues remain mysterious. In particular, I feel that the problem of the reference class, the problem of generalizing to infinite cases, and the problem of attaining a more intuitively transparent understanding of the relation between the indexical and the non-indexical may each enclose deep enigmas. These mysteries may even somehow be connected. I hope that others will see more clearly than I have and will be able to advance further into this fascinating land of thought.