Two Envelope Paradox Solution


The Paradox

There are two envelopes, each of which contains a positive real number. One number is twice as large as the other. You choose an envelope, open it to find x, and can keep it, or swap it for the other. In order to maximise your value should you switch?

Method 1. Let the envelopes contain X and 2X. If x = X then by switching to the other envelope you gain X. If x = 2X then switching loses X. The net gain is zero, so switching is of no benefit.

Method 2. If your envelope contains x then the other one must have 2x or x/2. Switching means either a gain of x or a loss of x/2. Since these are equally probable you should switch.

This is a simple version of the paradox. A rigorous version is given in sections 3 and 4.
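As a quick check of method 1, the two equally likely cases can be enumerated exactly. The sketch below is illustrative only: the function name and the sample value X = 10 are assumptions, not part of the original argument.

```python
from fractions import Fraction

def stay_and_switch_expectations(X):
    """Enumerate the two equally likely cases for envelopes holding X and 2X."""
    half = Fraction(1, 2)
    cases = [(X, 2 * X), (2 * X, X)]  # (our envelope, other envelope)
    stay = sum(half * ours for ours, _ in cases)
    switch = sum(half * other for _, other in cases)
    return stay, switch

# X = 10 is an arbitrary sample value chosen for illustration.
stay, switch = stay_and_switch_expectations(Fraction(10))
print(stay, switch)
```

Both expectations come out as 3X/2, which is the zero net gain that method 1 reports. The enumeration says nothing about method 2, which conditions on the value x that is actually seen.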

Synopsis & Acronyms

A brief summary of the literature is given under “Background”. Section 2 debunks the paradox as stated above. Section 3 presents a specific distribution that gives positive expected gain for switching for all x. Section 4 reformulates and solves the paradox for the general case by invoking a new principle, which disconnects utility from gain. Section 5 compares the proposed solution to that of Chalmers. The Discussion shows what happens in a finite case and attempts to explain the paradox intuitively. In summary, we are led to the false belief that swapping can statistically increase our value because the unbounded mean ensures that every actual value is less than the expected one.

Acronyms used (these are explained in the text)

2EP Two Envelope Paradox

GUD Gain Utility Disconnection principle

RDP Restricted Dominance Principle

SP St Petersburg paradox

SP-2EP St Petersburg Two Envelope Paradox

1 Background

This is a famous paradox, probably first framed by the Belgian mathematician, Maurice Kraitchik in 1953 and popularised by Doug Hofstadter in 1982. It has been the subject of papers in philosophical journals in recent years and its current status is unresolved, though many claim to have solved it. Some interesting treatments of the paradox are discussed below. Note that the formulation given above is the so-called open version of the paradox. There is also a “closed” version, where the envelope is not opened. Henceforth, unless qualified, 2EP will refer to the open version of the two envelope paradox.

“The Two-envelope Paradox” by John Broome, Analysis 55.1, Jan 1995, pp. 6-11.

Broome presents two specific distributions which give a positive expected gain for switching for every value. He does not present a solution to 2EP, but the examples he gives are an excellent test for any proposed solution. In Appendix B he gives a proof that the paradox does not arise if the mean is finite.

“The two envelope paradox and infinite expectations” by Frank Arntzenius and David McCarthy, Analysis 57.1 January 1997, pp. 42-50.
The authors state that even though the expected gain is positive in all cases in Broome’s distribution ¹ this is not paradoxical because it does not mean that switching is beneficial. They justify this by denying the following proposition, “If there is a partition of outcomes such that for every element of the partition Pj, E( X | Pj ) is finite and E( X | Pj ) > E( Y | Pj ) then E( X ) > E( Y ).” Where E(X) is the expectation of X. Denying the quoted proposition is similar to Chalmers’ assertion of RDP (see below). They do not explain why the quoted proposition is false. They also state that the expectation of switching and of staying are both infinite, implying that there is no paradox. They refer here to the expected gain averaged over all values. It is hard to see why a gain of positive infinity in both directions is not a paradox, even if it is not a paradox for standard decision theory. More importantly, there are cases where the expected gain averaged over all values is positive but not unbounded, eg the distribution given by Clark and Shackel below.

“The Exchange Paradox” by John D. Norton of the University of S. California, the Pacific Philosophical Quarterly 79, 1998, 34-58.

For the closed version of the paradox, Norton shows that the expected gain of switching averaged over all values is given by an oscillating series. However, this does not address the gain calculation in Broome, which is for every value, not for the overall average. Norton fails to extend his analysis to include the open envelope case. He believes that if we know the contents of the first envelope then swapping is always advisable. He tries to explain away this violation of symmetry, but fails to do so. In fact, his argument can be extended from the closed to the open case. This is because in the open case a single bracketed term is selected from the infinite series to give the expected value, where the (arbitrary) ordering of the series determines the value of this term. Thus the order of summation determines the result analogously to the re-ordering of the infinite series in the closed case. The order of summation gives rise to the problem with the overall average gain in both the open and closed versions of 2EP. However, as noted, even this does not resolve the paradox described by Broome.

“The Two Envelope Paradox” by Michael Clark (Nottingham University) and Nicholas Shackel (De Montfort University), Mind Magazine (Vol 109.435, July 2000).

Clark and Shackel present a distribution where the expected gain averaged over all values converges conditionally to 7/12. Essentially, they say that the calculation giving positive gain is incorrect because it does not respect the symmetry of the situation, which is true but does not prove anything. They fail to show at what point there is a mistake in the calculation, which is necessary to resolve the paradox. As for the open version, they admit that the expected gain is positive for each value, yet they claim that the overall gain over the whole run will not be, by arguing that what is true of the closed version of the puzzle must apply to the open version. This is not satisfactory, as the paradox of positive expected gain for every value remains. J. Weisberg and C. Meacham explore these failings at some length in “Clark and Shackel on the Two-Envelope Paradox” in Mind 112, 685-689 (2003). Clark and Shackel’s 7/12 result is not paradoxical because their series is conditionally convergent, so that another ordering, which would be equally valid, would give a different result.

“The St Petersburg Two-Envelope Paradox” by David Chalmers at the Dept of Philosophy of the University of Arizona, Analysis 62:155-57, 2002

Chalmers introduces an ad hoc principle, which might be termed the “Restricted Dominance Principle” (RDP), in order to abolish the paradox. RDP states that in some cases where the mean is unbounded, it is true that even though A is preferable to B for every value of x, it is not true that A is preferable to B. His rationale for invoking RDP is that it seems to be the only way of resolving the St Petersburg Two Envelope Paradox (SP-2EP), which combines the classical St Petersburg paradox (SP) with 2EP, ie the two envelopes contain arbitrary powers of 2. Chalmers argues that in SP-2EP if we open our envelope then the finite number we find cannot compare to the unbounded expected value of the other envelope. Hence in every particular case we should switch, yet by symmetry switching cannot be of benefit, so he proposes RDP.

Chalmers' argument that an unbounded expectation is always preferable to a finite value does not apply to 2EP because the value in the other envelope is bounded with respect to the value in the first one, ie whatever is in the first envelope, the other can at most contain twice as much. Furthermore, RDP is not needed to resolve SP-2EP either.

Consider the case where we have two sets, A and B, with some knowledge of their contents. If A and B are finite then it is unproblematic to compare every value in A with the average of B. However, if both A and B have infinite averages it is a different story. We can compare any value, a, in A with the average of B, but the point is we cannot draw any conclusion from this, ie we cannot conclude that a is of less benefit than selecting from B. We cannot say so because it's a property of sets of real numbers having an infinite average value that every element of such a set is below average. If we acknowledge this simple truth then we realise that finding that the value in our envelope is smaller than the expected value in the other is of no consequence, ie RDP is not needed. This is discussed in more detail in section 4. Indeed, it is the key point of the present paper.

"Envelopes and Indifference" by Graham Priest and Greg Restall of the Philosophy Dept of Melbourne University, (2003) published on the Net.

Priest and Restall attack the problem from the point of view of modal logic. Parts of their answer are excellent, especially the observation that in some formulations 2EP is under-determined and their proposal of three mechanisms to generate the paradox situation. However, I don’t think that they actually dispose of the kernel of the paradox, but then I am unable to follow their argument (despite receiving their clarifications). In addition, their reasoning does not apply to Broome’s calculation ¹.

“The Two Envelope Paradox and Using Variables Within the Expectation Formula” by E. Schwitzgebel and J. Dever of the Universities of California and Texas (November 6, 2007), published on the Net.

This article presents a very simple solution, namely that the x in “2x” and the x in “x/2” in the calculation of method 2 have different expectations and hence cannot be used in the same formula. However, this reasoning does not apply to Broome’s gain calculation ¹. Also, their paper deals only with the closed version of the paradox.

Though these are some of the best solutions in the literature, they all seem to miss the mark. At the least, a successful solution must resolve the problem of the gain calculation given by Broome.

2 The simple paradox resolved

If our envelope contains the largest number in the distribution then the other must contain half as much, so the paradox cannot arise if the values in the envelopes are bounded. Thus the argument of method 2 relies on the presence of arbitrarily large values in the distribution. Firstly, not all numbers in an infinite distribution can occur with equal probability in the envelopes. If it were so, and the probability of any number x was P( x ) = b, a constant value, then the sum of all the probabilities would be b times infinity, whereas probabilities must add up to 1. So we know that the probability of large values must fall off. If P( x ) > e for all x (where e > 0) then the probabilities would sum to more than 1, so P( x ) must approach 0 as x approaches infinity. To dissolve the simple version of the paradox it is sufficient to show that if we have x, then P( 2x ) is smaller than P( x/2 ) for some x, contradicting the argument of method 2. Thus it is enough to show that P( x, 2x ) < P( x/2, x ) for some x.

Proof: if P( x, 2x ) is greater than some constant value, e > 0, for every x in the distribution then it follows that P( x ) > e for arbitrarily large x. This is because x occurs in ( x/2, x ) and ( x, 2x ), so P( x ) would be greater than ½( e + e ) = e (since we are as likely to get the first as the second item in a pair). Hence P( x, 2x ) must tend to zero as x tends to infinity. Now suppose, contrary to the claim, that P( x, 2x ) >= P( x/2, x ) for every x. Let P( b, 2b ) = e for a particular b and a constant value e > 0. Then e = P( b, 2b ) <= P( 2b, 4b ) <= P( 4b, 8b ) <= … <= P( b·2^N, b·2^(N+1) ) for all N > 0. This contradicts P( x, 2x ) -> 0. So P( x, 2x ) < P( x/2, x ) for some x.

This proof is similar to the one given by Norton in the above paper. It has been included for the sake of completeness.

You may find this explanation unsatisfactory. The paradox arises because we have the notion that the chain of pairs of envelopes can stretch forever, with each succeeding pair having equal probability but with double the values. In fact, either the chain must be finite or else the probabilities must diminish.

Does this dispose of the two envelope paradox? It doesn't, because the paradox can be resuscitated, as shown below.

3 A special case of the paradox

The preceding section dispels the paradox in its simple formulation. However, Broome and others have produced specific distributions that re-instate the paradox in a form that cannot be easily removed. We calculate the expected gain of switching in Broome’s first distribution in the paper cited above.

The envelopes in Broome’s distribution contain the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8 ), ( 8, 16 ) and so on, where each such pair ( 2^N, 2^(N+1) ) occurs with probability 2^N/3^(N+1) for integer N >= 0. ¹ It is not hard to show that the probability sum, 1/3 + 2/9 + 4/27 + 8/81 + ... converges to a finite value (actually 1), so there is no problem with infinite probabilities. The surprising feature of this distribution of envelopes is that the expected gain is positive for every value of N. We show this by calculating the gain in switching from the value 2^N to 2^(N+1) in the pair ( 2^N, 2^(N+1) ) minus the loss in switching from 2^N to 2^(N-1) in ( 2^(N-1), 2^N ), each weighted by the probability of its pair. The expected gain is

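$$
\frac{(2^{N+1} - 2^N)\,p_N - (2^N - 2^{N-1})\,p_{N-1}}{p_N + p_{N-1}}
$$

where p_N is the probability of the pair ( 2^N, 2^(N+1) ). This gives

$$
\begin{aligned}
\frac{2^N \cdot \dfrac{2^N}{3^{N+1}} - 2^{N-1} \cdot \dfrac{2^{N-1}}{3^N}}{\dfrac{2^N}{3^{N+1}} + \dfrac{2^{N-1}}{3^N}}
&= \frac{2^N/3 - 2^{N-2}}{1/3 + 1/2}
= \frac{2^N (1/3 - 1/4)}{5/6}
= 2^N \cdot \frac{1}{12} \cdot \frac{6}{5} \\
&= \frac{2^N}{10} \quad \text{for } N > 0 \qquad \text{(G)}
\end{aligned}
$$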
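The derivation of (G) can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the pair probabilities stated for Broome's first distribution; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def pair_probability(n):
    """Probability of the pair (2^n, 2^(n+1)), namely 2^n / 3^(n+1)."""
    return Fraction(2**n, 3**(n + 1))

def conditional_gain(n):
    """Expected gain from switching, given that our envelope holds 2^n (n > 0)."""
    p_up = pair_probability(n)        # we hold the smaller value of (2^n, 2^(n+1))
    p_down = pair_probability(n - 1)  # we hold the larger value of (2^(n-1), 2^n)
    weighted = (2**(n + 1) - 2**n) * p_up - (2**n - 2**(n - 1)) * p_down
    return weighted / (p_up + p_down)

# The first 50 pair probabilities fall short of 1 by exactly (2/3)^50.
shortfall = 1 - sum(pair_probability(n) for n in range(50))
print(shortfall == Fraction(2, 3)**50)

# The conditional gain agrees with 2^n / 10 for each n tested.
print(all(conditional_gain(n) == Fraction(2**n, 10) for n in range(1, 40)))
```

Fractions are used rather than floating point so that the comparison with 2^N/10 is exact and not subject to rounding.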

For N = 0 the gain is 1 (or 1/6 when weighted by the probability of holding ‘1’), but this case has to be treated separately since ‘1’ only appears in the pair ( 1, 2 ). Because the expected gain is positive for every value of N, we again have a paradox. It seems we should switch even without knowing the value of the contents of our envelope. Some people argue that the way out is to observe that the mean value in both envelopes is infinite so that the gain is ∞ − ∞, which is not defined. The flaw in this argument is that it does not address the gain for each value of the distribution. Suppose that you could choose between the first and the second value in the pairs ( 1, 3 ), ( 10, 30 ), ( 100, 300 ), ( 1000, 3000 ), ... You certainly would choose the second in every pair. The fact that both have infinite means is irrelevant.

Since there seems to be no way to undermine the logic of the argument in favour of switching, a broader approach is needed.

4 The general case

Since Section 2 refutes the original formulation of the paradox, 2EP needs to be redefined before it can be resolved. Suppose that S is a distribution of pairs of real positive values, (x, 2x), from which you randomly choose one value. Suppose further that the expected gain for swapping with the other value in your pair is positive in every case. Then S recreates 2EP in general form. Note that the distinction between the open and closed versions has disappeared.

It is instructive to see how the unbounded mean relates to swapping in scenarios similar to, but simpler than 2EP. If a countable set of positive real numbers has no upper bound then every element of this set is smaller than its mean, which is unbounded. This is true of the canonical example, the natural numbers.

Let D be a countably infinite distribution of positive real numbers, where the probability, P(d_K), of each element d_K is such that Σ P(d_K) = 1 and d_K·P(d_K) >= e for all integer K > 0, where e is a positive constant. Since the expected gain of choosing a second number from D, ie Σ d_K·P(d_K), is unbounded, it seems that choosing again from D is preferable to sticking with a particular d_K. Yet this is obviously nonsense. What does this tell us about the gain? That in any such distribution the calculated gain does not determine the utility of switching.
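To make the unbounded mean concrete, here is a small sketch using one distribution of type D. The particular choice d_K = 2^K with P(d_K) = 1/2^K (the St Petersburg values, so that e = 1) is an assumption made for illustration, as are the function names.

```python
from fractions import Fraction

def value(k):
    """Hypothetical element d_K = 2^K."""
    return 2**k

def probability(k):
    """Hypothetical probability P(d_K) = 1 / 2^K, so that d_K * P(d_K) = 1."""
    return Fraction(1, 2**k)

def partial_mean(terms):
    """Sum of d_K * P(d_K) over the first `terms` elements (K = 1..terms)."""
    return sum(value(k) * probability(k) for k in range(1, terms + 1))

# Each element contributes exactly e = 1, so the partial mean keeps growing.
for terms in (10, 100, 1000):
    print(terms, partial_mean(terms))
```

The partial sums of the mean grow in step with the number of terms, while the probabilities themselves sum towards 1. Whatever finite d_K we hold, the truncated mean eventually exceeds it.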

This suggests a Gain Utility Disconnection principle, GUD: if a distribution of positive real numbers (or pairs thereof) is such that (a) its mean is unbounded, (b) our knowledge of the relative magnitudes of the two candidates is completely symmetrical, and (c) the expected gain for switching is positive for every value, then (d) this gain holds no implications for the utility of switching.

In Broome’s first 2EP distribution the probability of the pair ( 2^N, 2^(N+1) ) is 2^N/3^(N+1), so that d_N·P(d_N) >= 2^N·2^N/3^(N+1) = 2^(2N)/3^(N+1) = (4/3)^N/3 >= 1/3. Since the mean is unbounded GUD tells us that the gain calculation has no bearing on utility. Note that opening our envelope does not give us any information that makes one envelope preferable to the other. That the expected value in the other envelope is greater than ours is not a denial of condition (b) because the same argument can be used in reverse. Broome’s second paradoxical example is a continuous distribution where each value, x, has the density function 1/(x + 1)². Its mean value is given by the integral of x/(x + 1)², which is a function similar to ln(x), given that the integral of 1/x is ln(x). Hence the mean is unbounded in this example and GUD applies here too.
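The remark about the mean of Broome's second distribution can be made explicit. The worked step below assumes the density is taken over x >= 0 and truncates the mean at an upper limit T, a symbol introduced here only for illustration:

$$
\int_0^T \frac{x}{(x+1)^2}\,dx
= \int_0^T \left( \frac{1}{x+1} - \frac{1}{(x+1)^2} \right) dx
= \ln(1+T) - \frac{T}{1+T}
$$

The second term stays below 1, so the truncated mean grows like ln(T) and has no finite limit as T increases.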

Since Broome has shown that the paradox only arises if the mean is infinite, GUD abolishes the paradox of 2EP in every possible distribution of type S. GUD also abolishes it in SP-2EP as seen with D above. However, it is crucial to its formulation that GUD does not rule out making utility judgements in scenarios similar to 2EP which are not paradoxical. Two such cases are given below.

Case 1. Consider the distribution (1, 2), (2, 4), (4, 8) etc, where each pair has a non-zero probability. If we are told that we have the smaller value in a pair (but not 1) and are given a 50% chance of swapping our value with the smaller value in the previous pair, or a 50% chance of swapping with the smaller value in the next pair, then we should switch. However, there is no paradox in this case. GUD is qualified in such a way that it does not apply here, ie the swap is inherently skewed by our knowledge that we have the smaller value in the pair.

Case 2. Suppose that each person numbered N, for integer N > 1 has 2^N. Assume that each person is offered the choice of giving up their amount for a 3/5 chance of getting the amount of person (N - 1) and a 2/5 chance of swapping theirs for the amount of person (N + 1). Should they agree? Yes, because person N ends up gaining

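$$
\begin{aligned}
\tfrac{1}{5}\left( 2(2^{N+1} - 2^N) + 3(2^{N-1} - 2^N) \right)
&= \tfrac{1}{5}\left( 2(4 \cdot 2^{N-1} - 2 \cdot 2^{N-1}) + 3(2^{N-1} - 2 \cdot 2^{N-1}) \right) \\
&= \tfrac{1}{5}\left( 8 \cdot 2^{N-1} - 4 \cdot 2^{N-1} + 3 \cdot 2^{N-1} - 6 \cdot 2^{N-1} \right) \\
&= \tfrac{1}{5} \cdot 2^{N-1} = \frac{2^N}{10}.
\end{aligned}
$$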

This is the same as G, the expected gain in Broome's first 2EP distribution. Again, GUD does not apply to this scenario because we know that person N has more than person (N – 1), so there is a basis to favour one candidate over another. The similarity of Case 2 to 2EP suggests that the gain calculation in Case 2 may not be a trustworthy measure of utility, ie the argument in favour of switching is not conclusive, but there is no paradox to resolve.
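The Case 2 arithmetic can be confirmed in a few lines. This is only a sketch under the stated odds (3/5 for the amount of person N - 1, 2/5 for the amount of person N + 1); the function name is hypothetical.

```python
from fractions import Fraction

def case2_gain(n):
    """Expected gain for person n, who holds 2^n, on accepting the offer."""
    amount = 2**n
    down = Fraction(3, 5) * (2**(n - 1) - amount)  # receive the amount of person n - 1
    up = Fraction(2, 5) * (2**(n + 1) - amount)    # receive the amount of person n + 1
    return down + up

# Compare with 2^n / 10 for the persons numbered 2 to 39.
print(all(case2_gain(n) == Fraction(2**n, 10) for n in range(2, 40)))
```

The agreement with G is exact for every N tested, which is what makes Case 2 a useful comparison with Broome's first distribution.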

The intuitive explanation of 2EP is that in infinite sets, the mean value, which is used in gain calculations, does not indicate the central tendency in the way that it does in finite sets. In other words, the "mean value" is not the same as what we intuitively understand by the term "expected value", because every actual value is smaller than the infinite mean. From this it follows that the expected gain, which is based on the mean, does not determine utility. The positive gain calculated in problematic 2EP distributions results from the mean making the grass look taller in a paddock just like ours. Thus 2EP is a disguised version of the “paradox” that every element of an unbounded set is below average.

Note that the general case could be restated as, "Suppose that S is a distribution in which the expected gain averaged over all values converges absolutely to a positive value or else is unbounded in every ordering of the sum". If such an S could be found and were not covered by GUD then condition (c) would require generalisation. Such an extension of GUD has been omitted for the sake of simplicity and because a search of the literature has failed to find a distribution not covered by GUD that is also paradoxical in the sense just mentioned. It is conjectured that such a distribution is not possible.

5 RDP versus GUD

There are two basic ways of resolving a paradox. The first is to find a logical error, a mistaken assumption or an ambiguity in the language or representation used. As the literature review indicates, this preferred approach does not seem to work for 2EP. The other way is to assert that an accepted method of reasoning is not valid in certain restricted circumstances. This is the approach of Chalmers’ RDP, proposed in the paper cited above, as well as in GUD. When this second approach is taken it is desirable that it satisfy the following criteria:

a) Weakness. It should be as little restrictive as possible, while still resolving the paradox, or class of paradoxes, in question. In particular, it should not prevent us from making valid choices in non-problematic cases. If a medical analogy be permitted then it should be a minimally invasive procedure, like micro-surgery.

b) Reasonability. It should embody an intuitive understanding of the nature of the problem.

c) Relevance. It should arise out of the problem itself rather than be a high-order principle brought in without specific reference to the actual problem.

d) Non-arbitrariness. The chosen approach should not be one out of multiple possible candidates that appear equally reasonable.

e) Simplicity.

GUD, the solution proposed in this paper, is an alternative to Chalmers' RDP. Since both are negative results it is unlikely that either will lead to a contradiction. However, it is important to decide which is preferable.

Firstly, RDP is much stronger than GUD. In cases where the mean is unbounded, RDP invalidates proofs that use a basic method of argument, viz "For all a in A ...", which is a common coinage in mathematical proofs. RDP seems like advocating surgery for the common cold. The medicine is too strong by far. RDP is a new logical principle, whereas GUD merely modifies the interpretation of the meaning of gain, in particular its relation to utility. RDP seems to rule out making a rational choice in the two cases of section 4, and even in the last example of section 3. If so, then it is too restrictive.

Secondly, intuitively it makes sense to say, “If every member of a set is below average then the average is a misleading measure to use on that set when making decisions regarding utility.” (Note that the expected gain is based on the average.) By contrast, RDP is highly counter-intuitive. It has the flavour of a measure of last resort.

Thirdly, GUD is closely related to the problem in question. It invalidates a specific rule of reasoning that leads to problems in 2EP. RDP is far more general and does not pertain to a relatively small, problem-specific step in the chain of paradoxical reasoning.

Fourthly, if we are to resort to a notion as radical as RDP then we could call into question other logical principles employed in the chain of reasoning that results in the paradox, eg the validity of the symmetry argument. GUD, by contrast, seems to be the approach of choice, given the counter-intuitive properties of the mean value.

Lastly, though RDP is simpler to state than GUD, it is difficult to know where it applies, given that it has not been qualified. The simplicity of RDP is due to its not being precise. GUD is not as neat as RDP because it is qualified to ensure that it applies only to paradoxical cases.

6 Conclusion

Whether we open an envelope or not, we should be indifferent to switching because of symmetry: nothing suggests that one envelope is preferable to the other.

7 Discussion

The author wishes to acknowledge as key milestones the papers of Broome and Chalmers mentioned above. Broome provided a precise test that any solution must pass. Chalmers took the creative step of invoking a new principle as the key to resolving a paradox that resisted all conventional attempts. He also drew attention to the finite value versus infinite expectation conundrum. In a sense, the present paper merely re-interprets their results.

It is strange that such a childishly simple problem has no simple solution. It could be claimed that the two envelope paradox is the simplest logical puzzle that is (a) solvable and (b) has no easy solution, ie no solution that can be readily explained to people without mathematical training.

A paradox such as the two envelope puzzle consists of two arguments which lead to contradictory conclusions. To resolve such a paradox it is not sufficient to claim that one argument is totally sound and therefore that the other must be wrong. The only way to resolve such a paradox is to show where there is a mistake in one of the competing arguments. We must show exactly at what point the argument in favour of switching goes wrong, without reference to the competing argument and even without reference to the contradictions that result from the switching argument. In this case, the mistake is the assumption that the gain calculation has a bearing on utility when the mean is infinite.

To understand the paradox intuitively it helps to look at a real-world version of the puzzle, where only finite amounts are possible. Let's assume that we know in advance that the envelopes can only contain one of the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8 ) up to ( 2⁹⁹, 2¹⁰⁰ ) and that the chance of getting each pair is the same, ie 1/100. In the general case it did not matter whether we opened the first envelope or not. In this finite version it does. If the paradox statement does not include opening the envelope then indifference is still the correct solution. It is true that we will gain, statistically speaking, if our envelope contains 2⁹⁹ or less, but all these gains are exactly balanced by the loss if we happen to pick the envelope that contains 2¹⁰⁰. Here is the calculation where we add the gain in going from x to 2x and subtract the loss in going from x to x/2 (method 2):

If our envelope contains 1 then the gain is 1/200( (2 - 1) ) = 1/200

If our envelope contains 2 then the gain is 1/200( (1 - 2) + (4 - 2) ) = 1/200

If our envelope contains 4 then the gain is 1/200( (2 - 4) + (8 - 4) ) = 2/200

If our envelope contains 8 then the gain is 1/200( (4 - 8) + (16 - 8) ) = 4/200

…

If our envelope contains 2⁹⁹ then the gain is 1/200( (2⁹⁸ - 2⁹⁹) + (2¹⁰⁰ - 2⁹⁹) )

= 1/200( -2⁹⁸ + 2⁹⁹ ) = 2⁹⁸/200

If our envelope contains 2¹⁰⁰ then the gain is 1/200( 2⁹⁹ - 2¹⁰⁰ ) = -2⁹⁹/200

Summing, we find that the expected gain averaged over all values is 1/200( 1 + 1 + 2 + 4 + 8 + ... + 2⁹⁸ - 2⁹⁹ ) = 0.
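The finite sum can be reproduced exactly with integer arithmetic. A minimal sketch, assuming the 100 equally likely pairs described above; the constant and function names are chosen for this illustration.

```python
from fractions import Fraction

PAIRS = 100                      # pairs (2^k, 2^(k+1)) for k = 0..99
WEIGHT = Fraction(1, 2 * PAIRS)  # chance of one given pair and one given position in it

def weighted_gain(exponent):
    """Probability-weighted gain from switching when our envelope holds 2^exponent."""
    ours = 2**exponent
    total = Fraction(0)
    if exponent < PAIRS:  # we may hold the smaller value of (2^e, 2^(e+1))
        total += WEIGHT * (2 * ours - ours)
    if exponent > 0:      # we may hold the larger value of (2^(e-1), 2^e)
        total += WEIGHT * (ours // 2 - ours)
    return total

terms = [weighted_gain(e) for e in range(PAIRS + 1)]
overall = sum(terms)
print(terms[:4], terms[-1], overall)
```

Every term except the last is positive, and the single negative term for 2¹⁰⁰ cancels all of them.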

The result is 0 because of the boundary conditions. By contrast, in the infinite case there is no payback, as there is in the last term of the calculation above. This is because in the general case, no matter what our envelope contains, the other one could contain twice as much, ie the “payback” grows and is pushed further away. It is asserted that the general case of the finite version of 2EP (ie where the highest value is a fixed number, though its value need not be known) will exhibit payback behaviour as above. Furthermore, the expected gain using both method 2 and method 1 will be zero.

If the paradox statement for the specific finite case above does include the extra information that we opened an envelope and found 128 then we should switch. Our expected gain is 1/2( (64 - 128) + (256 - 128) ) = 32. The symmetry argument no longer applies because we now know where we are in the distribution. Specifically, we know that our envelope does not contain the maximum amount of 2¹⁰⁰, so switching is statistically beneficial. Had we opened our envelope and found 2¹⁰⁰ we would not switch. Finally, what about method 1? This too no longer applies because it only gives us the overall answer for all cases, whereas we are no longer interested in the solution for all values (described above), but only in the case where our envelope contains 128 and the other holds 64 or 256.
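The gain once an envelope has been opened can be sketched in the same way. The code assumes the finite distribution above, in which the two possible neighbouring values are equally likely whenever both exist; the names are hypothetical.

```python
from fractions import Fraction

MAX_EXPONENT = 100  # the largest value that can occur is 2^100

def gain_given_opened(exponent):
    """Expected gain from switching after finding 2^exponent in our envelope."""
    ours = 2**exponent
    candidates = []
    if exponent < MAX_EXPONENT:
        candidates.append(2 * ours)   # other envelope holds twice as much
    if exponent > 0:
        candidates.append(ours // 2)  # other envelope holds half as much
    # All pairs are equally likely, so the surviving candidates are equally likely too.
    return sum(Fraction(other - ours) for other in candidates) / len(candidates)

print(gain_given_opened(7))    # we found 128
print(gain_given_opened(100))  # we found the maximum, 2^100
```

For 128 the result is 32, matching the figure above, while for 2¹⁰⁰ it is negative, which is why we would not switch in that case.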

Tad Boniecki
Original solution posted 15 April 2006

This version posted 12 July 2012

Footnote 1

Both the distribution and the value of the gain are found in Broome, John (1995) The Two-envelope Paradox. Analysis 55, 6-11.
