Chapter 1

Introduction

Observation selection effects

How big is the smallest fish in the pond? You catch one hundred fishes, all of which are greater than six inches. Does this evidence support the hypothesis that no fish in the pond is much less than six inches long? Not if your net can’t catch smaller fish.

Knowledge about limitations of your data collection process affects what inferences you can draw from the data. In the case of the fish-size-estimation problem, a selection effect—the net’s sampling only the big fish—vitiates any attempt to extrapolate from the catch to the population remaining in the water. Had your net instead sampled randomly from all the fish, then finding a hundred fishes all greater than a foot would have been good evidence that few if any of the fish remaining are much smaller.
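The contrast between the two nets can be sketched in a short simulation. This is only an illustration under invented assumptions: the pond's size distribution, the 6-inch retention threshold, and the helper name `catch_fish` are all hypothetical, and fish are drawn with replacement for simplicity.

```python
import random

def catch_fish(pond, min_catchable, n, rng):
    """Draw n fish at random from those the net is able to retain."""
    catchable = [length for length in pond if length >= min_catchable]
    return [rng.choice(catchable) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical pond: lengths in inches, spread uniformly between 1 and 12.
pond = [rng.uniform(1.0, 12.0) for _ in range(10_000)]

# A coarse net that lets everything under 6 inches slip through,
# and a fine net that samples from all the fish.
coarse_net = catch_fish(pond, min_catchable=6.0, n=100, rng=rng)
fine_net = catch_fish(pond, min_catchable=0.0, n=100, rng=rng)

smallest_in_pond = min(pond)
smallest_coarse = min(coarse_net)
smallest_fine = min(fine_net)
```

In this sketch the coarse net's smallest catch can never fall below the threshold, whatever the pond contains, so it carries no information about smaller fish. Only the fine net's catch is informative about the lower end of the size distribution.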

In 1936, the Literary Digest conducted a poll to forecast the result of the upcoming presidential election. They predicted that Alf Landon, the Republican candidate, would win by a large margin. In the actual election, the incumbent Franklin D. Roosevelt won a landslide victory. The Literary Digest had harvested the addresses of the people they sent the survey to mainly from telephone books and motor vehicle registries, thereby introducing an important selection effect. The poor of the depression era, a group where support for Roosevelt was especially strong, often did not have a phone or a car. A methodologically more sophisticated forecast would either have used a more representative polling group or at least factored in known and suspected selection effects.1

1 The Literary Digest suffered a major reputation loss as a result of the infamous poll and soon went out of business, being superseded by a new generation of pollsters such as George Gallup, who not only got the 1936 election right but also predicted what the Literary Digest’s prediction would be to within 1%, using a sample size just one thousandth the size of the Digest’s but more successfully avoiding selection effects. The infamous 1936 poll has secured a place in the annals of survey research as a paradigm example of selection bias, yet just as important was a nonresponse bias compounding the error referred to in the text (Squire 1988). The fishing example originates from Sir Arthur Eddington (Eddington 1939).
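The mechanism behind the Literary Digest error can be shown with a small piece of arithmetic. Every number below is hypothetical and chosen only to show the direction of the effect. None is a historical estimate, and the two-group split and the function names are likewise assumptions made for the sketch.

```python
# All figures are invented for illustration; they are NOT historical data.
# group: (share of electorate, share reachable via phone book or car registry,
#         support for Roosevelt within the group)
population = {
    "poorer": (0.60, 0.20, 0.75),
    "richer": (0.40, 0.90, 0.40),
}

def true_support(groups):
    """Support for Roosevelt across the whole electorate."""
    return sum(share * support for share, _, support in groups.values())

def frame_support(groups):
    """Support among only those people the sampling frame can reach."""
    reachable = sum(share * owns for share, owns, _ in groups.values())
    supporters = sum(share * owns * support
                     for share, owns, support in groups.values())
    return supporters / reachable

actual = true_support(population)   # 0.61 with the figures above
polled = frame_support(population)  # 0.4875 with the figures above
```

With these made-up figures the electorate as a whole favors Roosevelt while the reachable subset favors his opponent, which is the qualitative pattern the text describes. The sketch ignores the nonresponse bias mentioned in the footnote.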

Or to take yet another example, suppose you’re a young investor pondering whether to invest your retirement savings in bonds or equity. You are vaguely aware of some studies showing that over sufficiently lengthy periods of time, stocks have, in the past, substantially outperformed bonds (an observation which is often referred to as the “equity premium puzzle”). So you are tempted to put your money into equity. You might want to consider, though, that a selection effect might be at least partly responsible for the apparent superiority of stocks. While it is true that most of the readily available data does favor stocks, this data is mainly from the American and British stock exchanges, which both have continuous records of trading dating back over a century. But is it an accident that the best data comes from these exchanges? Both America and Britain have benefited during this period from stable political systems and steady economic growth. Other countries have not been so lucky. Wars, revolutions, and currency collapses have at times obliterated entire stock exchanges, which is precisely why continuous trading records are not available elsewhere. By looking at only the two greatest success stories, one would risk overestimating the historical performance of stocks. A careful investor would be wise to factor in this consideration when designing her portfolio. (For one recent study that attempts to estimate this survivorship bias by excavating and patching together the fragmentary records from other exchanges, see (Jorion and Goetzmann 2000); for some theory on survivorship biases, see (Brown et al. 1995).)
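Survivorship bias of this kind can be illustrated with a toy simulation. The sketch below assumes, purely for illustration, that every exchange faces the same yearly risk of being wiped out and the same distribution of yearly returns. The parameter values and the name `simulate_market` are hypothetical and are not drawn from the studies cited above.

```python
import random

def simulate_market(years, collapse_prob, rng):
    """Final value of 1 unit invested, or 0.0 if the exchange is wiped out."""
    value = 1.0
    for _ in range(years):
        if rng.random() < collapse_prob:
            return 0.0  # war, revolution, or currency collapse
        # Hypothetical yearly return, uniform between -15% and +25%.
        value *= 1.0 + rng.uniform(-0.15, 0.25)
    return value

rng = random.Random(42)  # fixed seed so the sketch is repeatable
final_values = [simulate_market(years=100, collapse_prob=0.01, rng=rng)
                for _ in range(200)]

# Only exchanges that never collapsed leave a continuous trading record.
survivors = [v for v in final_values if v > 0.0]

mean_all = sum(final_values) / len(final_values)
mean_survivors = sum(survivors) / len(survivors)
```

Because the collapsed exchanges contribute zero to the overall average and are absent from the survivors' average, `mean_survivors` exceeds `mean_all` whenever at least one exchange collapses. An investor who sees only the surviving records would overestimate the expected performance of stocks in this toy world.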

In these three examples, a selection effect is introduced by the fact that the instrument you use to collect data (a fishing net, a mail survey, preserved trading records) samples only from a proper subset of the target domain. Analogously, there are selection effects that arise not from the limitations of some measuring device but from the fact that all observations require the existence of an appropriately positioned observer. Our data is filtered not only by limitations in our instrumentation but also by the precondition that somebody be there to “have” the data yielded by the instruments (and to build the instruments in the first place). The biases that occur due to that precondition—we shall call them observation selection effects—are the subject matter of this book.

Anthropic reasoning, which seeks to detect, diagnose, and cure such biases, is a philosophical goldmine. Few fields are so rich in empirical implications, touch on so many important scientific questions, pose such intricate paradoxes, and contain such generous quantities of conceptual and methodological confusion that need to be sorted out. Working in this area is a lot of intellectual fun.

Let’s look at an example where an observation selection effect is involved: We find that intelligent life evolved on Earth. Naively, one might think that this piece of evidence suggests that life is likely to evolve on most Earth-like planets. But that would be to overlook an observation selection effect. For no matter how small the proportion of all Earth-like planets that evolve intelligent life, we will find ourselves on a planet that did (or we will trace our origin to a planet where intelligent life evolved, in case we are born in a space colony). Our data point, that intelligent life arose on our planet, is predicted equally well by the hypothesis that intelligent life is very improbable even on Earth-like planets as by the hypothesis that intelligent life is highly probable on Earth-like planets. This datum therefore does not distinguish between the two hypotheses, provided that on both hypotheses intelligent life would have evolved somewhere. (On the other hand, if the “intelligent-life-is-improbable” hypothesis asserted that intelligent life was so improbable that it was unlikely to have evolved anywhere in the whole cosmos, then the evidence that intelligent life evolved on Earth would count against it. For this hypothesis would not have predicted our observation. In fact, it would have predicted that there would have been no observations at all.)
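The comparison of hypotheses in the preceding paragraph can be made concrete with a rough calculation. The sketch rests on simplifying assumptions that are not part of the text: planets are treated as independent trials, the number of Earth-like planets is an arbitrary illustrative figure, and the likelihood of our evidence is identified with the probability that intelligent life evolved somewhere (the “non-rigid” reading of the evidence). The function name is hypothetical.

```python
import math

def prob_life_somewhere(p_per_planet, n_planets):
    """P(at least one of n independent planets evolves intelligent life)."""
    # Equals 1 - (1 - p)^n, computed stably for tiny p and huge n.
    return -math.expm1(n_planets * math.log1p(-p_per_planet))

N_PLANETS = 1e12  # hypothetical number of Earth-like planets

likelihood_common = prob_life_somewhere(0.5, N_PLANETS)       # life is easy
likelihood_rare = prob_life_somewhere(1e-6, N_PLANETS)        # life is hard
likelihood_very_rare = prob_life_somewhere(1e-30, N_PLANETS)  # life is nearly impossible

# A ratio of 1 means the evidence does not favor either hypothesis.
ratio_common_vs_rare = likelihood_common / likelihood_rare
```

Under these assumptions the first two hypotheses both assign a likelihood of essentially 1 to the evidence, so their ratio is 1 and the datum leaves their relative credibility unchanged. The third hypothesis assigns a likelihood of about 1e-18, so the evidence counts heavily against it, matching the parenthetical remark in the text.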

We don’t have to travel long on the path of common sense before we enter a territory where observation selection effects give rise to difficult and controversial issues. Already in the preceding paragraph we passed over a point that is contested. We understood the explanandum, that intelligent life evolved on our planet, in a “non-rigid” sense. Some authors, however, argue that the explanandum should be: why did intelligent life evolve on this planet (where “this planet” is used as a rigid designator). They then argue that the hypothesis that intelligent life is quite probable on Earth-like planets would indeed give a higher probability to this fact (Hacking 1987; Dowe 1998; White 2000). But we shall see in the next chapter that that is not the right way to understand the problem.

The impermissibility of inferring from the fact that intelligent life evolved on Earth to the conclusion that intelligent life probably evolved on a large fraction of all Earth-like planets does not hinge on the evidence in this example consisting of only a single data point. Suppose we had telepathic abilities and could communicate directly with all other intelligent beings in the cosmos. Imagine we ask all the aliens whether intelligent life evolved on their planets too. Obviously, they would all say: Yes, it did. But equally obviously, this multitude of data would still not give us any reason to think that intelligent life develops easily. We only asked about the planets where life did in fact evolve (since those planets would be the only ones which would be “theirs” to some alien), and we get no information whatsoever by hearing the aliens confirming that life evolved on those planets (assuming we don’t know the number of aliens who replied to our survey or, alternatively, that we don’t know the total number of planets). An observation selection effect frustrates any attempt to extract useful information by this procedure. Some other method would have to be used to do that. (If all the aliens also reported that theirs was some Earth-like planet, this would suggest that intelligent life is unlikely to evolve on planets that are not Earth-like; for otherwise some aliens would likely have evolved on non-Earth-like planets.)
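The telepathic survey can be mimicked in a few lines. This is a toy model only: the number of planets, the two probabilities, and the name `survey_aliens` are assumptions made for the sketch.

```python
import random

def survey_aliens(n_planets, p_life, rng):
    """Collect the answers from every planet that has someone to answer."""
    has_life = [rng.random() < p_life for _ in range(n_planets)]
    # Uninhabited planets return no reply at all, so they drop out here.
    return [answer for answer in has_life if answer]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

replies_easy = survey_aliens(100_000, 0.9, rng)    # life evolves easily
replies_hard = survey_aliens(100_000, 0.001, rng)  # life evolves rarely

share_yes_easy = sum(replies_easy) / len(replies_easy)
share_yes_hard = sum(replies_hard) / len(replies_hard)
```

The share of “yes” answers is 100% in both cases, so the content of the replies cannot distinguish the two hypotheses. What does differ is the number of replies relative to the number of planets, which is why the text adds the proviso that we know neither the number of respondents nor the total number of planets.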

Another example of reasoning that invokes observation selection effects is the attempt to provide a possible (not necessarily the only) explanation of why the universe appears fine-tuned for intelligent life. The universe appears fine-tuned in the sense that if any of various physical constants or initial conditions had been even very slightly different from what they are, then life as we know it would not have existed. The idea behind this possible anthropic explanation is that the totality of spacetime might be vast and might contain regions in which the values of fundamental constants and other parameters differ in many ways, perhaps according to some broad random distribution. If this is the case, then we should not be amazed to find that in our own region physical conditions appear “fine-tuned”. Owing to an obvious observation selection effect, only such fine-tuned regions are observed. Observing a fine-tuned region is precisely what we should expect if this theory is true, and so it can potentially account for available data in a neat and simple way, without having to assume that conditions just happened to turn out “right” through some immensely lucky (and arguably a priori extremely improbable) cosmic coincidence. (Some skeptics doubt that an explanation for the apparent fine-tuning of our universe is needed or is even meaningful. We examine the skeptical arguments in chapter 2 and consider the counterarguments offered by proponents of the anthropic explanation.)

Here are some of the topics we shall be covering: cosmic fine-tuning arguments for the existence of a multiverse or alternatively a cosmic “designer”; so-called anthropic principles (and how they fall short); how to derive observational predictions from inflation theory and other contemporary cosmological models; the Self-Sampling Assumption; observation selection effects in evolutionary biology and in the philosophy of time; the Doomsday argument, the Adam & Eve, UN++ and Quantum Joe paradoxes; alleged observer-relative chances; the Presumptuous Philosopher gedanken; the epistemology of indexical belief; game theoretic problems with imperfect recall; and much more.

Our primary objective is to construct a theory of observation selection effects. We shall seek to develop a methodology for how to reason when we suspect that our evidence is contaminated with anthropic biases. Our secondary objective is to apply the theory to answer some interesting scientific and philosophical questions. Actually, these two objectives are largely overlapping. Only by interpolating between theoretical desiderata and the full range of philosophical and scientific applications can we arrive at a satisfactory account of observation selection effects. At least, that is the approach taken here.

We’ll use a Bayesian framework, but a reader who doesn’t like formalism should not be deterred. There isn’t an excessive amount of mathematics; most of what there is, is elementary arithmetic and probability theory, and the results are conveyed verbally also. The topic of observation selection effects is extremely difficult. Yet the difficulty is not in the math, but in grasping and analyzing the underlying principles and in selecting appropriate models.

A brief history of anthropic reasoning

Even trivial selection effects can sometimes easily be overlooked:

It was a good answer that was made by one who when they showed him hanging in a temple a picture of those who had paid their vows as having escaped shipwreck, and would have him say whether he did not now acknowledge the power of the gods,—‘Aye,’ asked he again, ‘but where are they painted that were drowned after their vows?’ And such is the way of all superstition, whether in astrology, dreams, omens, divine judgments, or the like; wherein men, having a delight in such vanities, mark the events where they are fulfilled, but where they fail, though this happens much oftener, neglect and pass them by. (Bacon 1620)

When even a plain and simple selection effect, such as the one that Francis Bacon comments on in the quoted passage, can escape a mind that is not paying attention, it is perhaps unsurprising that observation selection effects, which tend to be more abstruse, have only quite recently been given a name and become a subject of systematic study.2

2 Why isn’t the selection effect that Bacon refers to an “observational” one? After all, nobody could observe the bottom of the sea at that time.—Well, one could have observed that the sailors had gone missing. Fundamentally, the criterion we can use to determine whether something is an observation selection effect is whether a theory of observation selection effects is needed to model it. That doesn’t seem necessary for the case Bacon describes.

The term “anthropic principle”, which has been used to label a wide range of things only some of which bear a connection to observation selection effects, is less than three decades old. There are, however, precursors from much earlier dates. For example, in Hume’s Dialogues Concerning Natural Religion, one can find early expressions of some ideas of anthropic selection effects. Some of the core elements of Kant’s philosophy about how the world of our experience is conditioned on the forms of our sensory and intellectual faculties are not completely unrelated to modern ideas about observation selection effects as important methodological considerations in theory-evaluation, although there are also fundamental differences. In Ludwig Boltzmann’s attempt to give a thermodynamic account of time’s arrow (Boltzmann 1897), we find for perhaps the first time a scientific argument that makes clever use of observation selection effects. We shall discuss Boltzmann’s argument in one of the sections of chapter 5, and show why it fails. A more successful invocation of observation selection effects was made by R. H. Dicke (Dicke 1961), who used it to explain away some of the “large-number coincidences”, rough order-of-magnitude matches between some seemingly unrelated physical constants and cosmic parameters, that had previously misled such eminent physicists as Eddington and Dirac into a futile quest for an explanation involving bold physical postulations.

The modern era of anthropic reasoning dawned quite recently, with a series of papers by Brandon Carter, another cosmologist. Carter coined the term “anthropic principle” in 1974, clearly intending it to convey some useful guidance about how to reason under observation selection effects. We shall later look at some examples of how he applied his methodological ideas to both physics and biology. While Carter himself evidently knew how to apply his principle to get interesting results, he unfortunately did not manage to explain it well enough to enable all his followers to do the same.

The term “anthropic” is a misnomer. Reasoning about observation selection effects has nothing in particular to do with Homo sapiens, but rather with observers in general. Carter regrets not having chosen a better name, which would no doubt have prevented much of the confusion that has plagued the field. When John Barrow and Frank Tipler introduced anthropic reasoning to a wider audience in 1986 with the publication of The Anthropic Cosmological Principle, they compounded the terminological disorder by minting several new “anthropic principles”, some of which have little if any connection to observation selection effects.

A total of over thirty anthropic principles have been formulated and many of them have been defined several times over—in nonequivalent ways—by different authors, and sometimes even by the same authors on different occasions. Not surprisingly, the result has been some pretty wild confusion concerning what the whole thing is about. Some reject anthropic reasoning out of hand as representing an obsolete and irrational form of anthropocentrism. Some hold that anthropic inferences rest on elementary mistakes in probability calculus. Some maintain that at least some of the anthropic principles are tautological and therefore indisputable. Tautological principles have been dismissed by some as empty and thus of no interest or ability to do explanatory work. Others have insisted that like some results in mathematics, though analytically true, anthropic principles can nonetheless be interesting and illuminating. Others still purport to derive empirical predictions from these same principles and regard them as testable hypotheses. We shall want to distance ourselves from most of these would-be codifications of the anthropic organon. Some reassurance comes from the metalevel consideration that anthropic reasoning is used and taken seriously by a range of leading physicists. One would not expect this bunch of hardheaded scientists to be just blowing so much hot air. And we shall see that once one has carefully removed extraneous principles, misconceptions, fallacies and misdescriptions, one does indeed find a precious core of methodological insights.

Brandon Carter also originated the notorious Doomsday argument, although he never published on it. First to discuss it in print was philosopher John Leslie, whose prolific writings have also elucidated a wide range of other issues related to anthropic reasoning. A version of the Doomsday argument was invented independently by Richard Gott, an astrophysicist. The Doomsday argument has generated a bulky literature of its own, which sometimes suffers from being disconnected from other areas of anthropic reasoning. One lesson from this book is, I think, that different applications of anthropic reasoning provide important separate clues to what the correct theoretical account of observation selection effects must look like. Only when we put all the pieces of the puzzle together in the right way does a meaningful picture emerge.

The field of observational selection has begun to experience rapid growth in recent years. Many of the most important results date back only about a decade or less. Philosophers and scientists (especially cosmologists) deserve about equal parts of the credit for the ideas that have already been developed and which this book can now use as building blocks.

Synopsis of this book

Our journey begins in chapter 2 with a study of the significance of cosmic “fine-tuning”, referring to the apparent fact that if any of various physical parameters had been very slightly different then no observers would have existed in the universe. There is a sizable literature on what to make of such “coincidences”. Some have argued that they provide some evidence for the existence of an ensemble of physically real universes (a “multiverse”). Others, of a more religious bent, have used arguments from fine-tuning to attempt to make a case for some version of the design hypothesis. Still others claim that cosmic fine-tuning can have no special significance at all. This last view is incorrect. The finding that we live in a fine-tuned universe (if that is indeed so) would, as we shall see, provide support for explanations that essentially involve observation selection effects. Such explanations raise interesting methodological issues which we will be exploring in chapter 2. I argue that only by working out a theory of observation selection effects can we get to the bottom of the fine-tuning controversies. Using analogies, we begin to sketch out a preliminary account of how observation selection effects operate in the cosmological context, which allows us to get a clearer understanding of the evidential import of fine-tuning. Later, in chapter 11, we will return to the fine-tuning arguments and use the theory that we’ll have developed in the intervening chapters to more rigorously verify the informal conclusions of chapter 2.

Given that observation selection effects are important, we next want to know more precisely what kind of beast they are and how they affect methodology. Is it possible to sum up the essence of observation selection effects in a simple statement? A multitude of so-called “anthropic principles” attempt to do just that. Chapter 3 takes a critical look at the main contenders, and finds that they fall short. Many “anthropic principles” are simply confused. Some, especially those drawing inspiration from Brandon Carter’s seminal papers, are sound, but we show that although they point in the right direction they are too weak to do any real scientific work. In particular, I argue that existing methodology does not permit any observational consequences to be derived from contemporary cosmological theories, in spite of the fact that these theories quite plainly can be and are being tested empirically by astronomers. What is needed to bridge this methodological gap is a more adequate formulation of how observation selection effects are to be taken into account. A preliminary formulation of such a principle, which we call the Self-Sampling Assumption, is proposed towards the end of chapter 3. The basic idea of the Self-Sampling Assumption is, very roughly put, that you should think of yourself as if you were a random observer from a suitable reference class.

Chapter 4 begins to build a “philosophical” case for our theory by conducting a series of thought experiments that show that something like the Self-Sampling Assumption describes a plausible way of reasoning about a wide range of cases.

Chapter 5 shows how the Self-Sampling Assumption enables us to link up cosmological theory with observation in a way that is both intuitively plausible and congruent with scientific practice. This chapter also applies the new methodology to illuminate problems in several areas, to wit: thermodynamics and the problem of time’s arrow; evolutionary biology (especially questions related to how improbable was the evolution of intelligent life on Earth and how many “critical” steps there were in our evolutionary past); and an issue in traffic analysis. An important criterion for a theory of observation selection effects is that it should enable us to make sense of contemporary scientific reasoning and that it can do interesting work in helping to solve real empirical problems. Chapter 5 demonstrates that our theory satisfies this criterion.

The notorious Doomsday argument, which seeks to show that we have systematically underestimated the probability that humankind will go extinct relatively soon, forms the subject matter for chapter 6. We review and criticize the literature on this controversial piece of reasoning, both papers that support it and ones that claim to have refuted it. I think that the Doomsday argument is inconclusive. But the reason is complicated and must await explanation until we have developed our theory further, in chapter 10.

The Doomsday argument deserves the attention it has attracted, however. Getting to the bottom of what is wrong or inconclusive about it can give us invaluable clues about how to build a sound methodology of observation selection effects. It is therefore paramount that the Doomsday argument not be dismissed for the wrong reasons. Lots of people think that they have refuted the Doomsday argument, but not all these objections can be right: many of the “refutations” are inconsistent with one another, and many presuppose ideas that can be shown unacceptable when tried against other criteria that a theory of anthropic reasoning must satisfy. Chapter 7 examines several recent criticisms of the Doomsday argument and explains why they fail.

In chapter 8, we refute an argument purporting to show that anthropic reasoning gives rise to paradoxical observer-relative chances. We then give an independent argument showing that there are cases where anthropic reasoning does generate probabilities that are “observer-relative” in an interesting but non-paradoxical sense.

Paradoxes lie in ambush in chapter 9. We explore the thought experiments Adam & Eve, UN++, and Quantum Joe. These reveal some counterintuitive aspects of the most straightforward version of the Self-Sampling Assumption.

Is there a way out? At the end of chapter 9 we find ourselves in an apparent dilemma. On the one hand, something like the Self-Sampling Assumption seems philosophically justified and scientifically indispensable on the grounds explained in chapters 4 and 5. On the other hand, we seem then to be driven towards a counterintuitive (albeit coherent) position vis-à-vis the gedanken experiments of chapter 9. What to do?

Chapter 10 goes back and reexamines the reasoning that led to the formulation of the original version of the Self-Sampling Assumption. But now we have the benefit of lessons gleaned from the preceding chapters. We understand better the various constraints that our theory has to satisfy. And we have a feel for what is the source of the problems. Combining these clues, we propose a solution that enables us to escape the paradoxes while still catering to legitimate methodological needs. The first step of the solution is to strengthen the Self-Sampling Assumption so that it applies to “observer-moments” rather than just observers. This increases our analytical firepower. A second step is to relativize the reference class. The result is a general framework for modeling anthropic reasoning, which is given a formal expression in an equation, the Observation Equation, that specifies how to take into account evidence that has an indexical component or that has been subjected to an observation selection effect.

In chapter 11, we illustrate how this theory of observation selection effects works by applying it to a wide range of philosophical and scientific problems. We show how it confirms (and makes more precise) the preliminary conclusions that were arrived at by less rigorous analogy-based arguments in earlier chapters. Chapter 11 also provides an analysis of the Sleeping Beauty problem (and a fortiori its closely related game-theoretic analogues, the Absent-Minded Driver problem and the Absent-Minded Passenger problem). It is argued that the solution is more complex than previously recognized and that this makes it possible to reconcile the two opposing views that dominate the literature. We close with a discussion of the element of subjectivity that may reside in the choice of a prior credence function for indexical propositions. We compare it with the more widely recognized aspect of subjectivity infesting the non-indexical component of one’s credence function, and we suggest that the issue throws light on how to rank various applications of anthropic reasoning according to how scientifically rigorous they are. At the very end, there are some pointers to avenues for further research.