Two Envelope Paradox Solution
There are two envelopes, each of which contains a positive real number. One number is twice as large as the other. You choose an envelope, open it to find x, and can keep it, or swap it for the other. In order to maximise your value should you switch?
Method 1. Let the envelopes contain X and 2X. If x = X then by switching
to the other envelope you gain X. If x = 2X then switching loses X. The net
gain is zero, so switching is of no benefit.
Method 2. If your envelope contains x then the other one must have 2x or x/2. Switching means either a gain of x or a loss of x/2. Since these are equally probable you should switch.
This is a simple version of the paradox. A rigorous version is given in sections 3 and 4.
Synopsis & Acronyms
A brief summary of the literature is given under “Background”. Section 2 debunks the paradox as stated above. Section 3 presents
a specific distribution that gives positive expected gain for switching for all
x. Section 4 reformulates and solves the paradox for the general case by
invoking a new principle, which disconnects utility from gain. Section 5 compares
the proposed solution to that of Chalmers. The Discussion shows what happens in
a finite case and attempts to explain the paradox intuitively. In summary, we
are led to the false belief that swapping can statistically increase our value
because the unbounded mean ensures that every actual value is less than the
expected one.
Acronyms used (these are explained in the text)
2EP Two Envelope Paradox
GUD Gain Utility Disconnection principle
RDP Restricted Dominance Principle
SP St Petersburg paradox
SP-2EP St Petersburg Two Envelope Paradox
1 Background
This is a famous paradox, probably first framed by the Belgian mathematician Maurice Kraitchik in 1953 and popularised by Doug Hofstadter in 1982. It has been the subject of papers in philosophical journals in recent years and its current status is unresolved, though many claim to have solved it. Some interesting treatments of the paradox are discussed below. Note that the formulation given above is the so-called open version of the paradox. There is also a “closed” version, where the envelope is not opened. Henceforth, unless qualified, 2EP will refer to the open version of the two envelope paradox.
“The Two-envelope Paradox” by John Broome,
Analysis 55.1, Jan 1995, pp. 6-11.
Broome presents two specific distributions which give a positive expected gain for switching for every value. He does not present a solution to 2EP, but the examples he gives are an excellent test for any proposed solution. In Appendix B he gives a proof that the paradox does not arise if the mean is finite.
“The two envelope paradox and infinite expectations” by Frank Arntzenius and David McCarthy, Analysis 57.1 January 1997, pp. 42-50.
The authors state that even though the expected gain is positive in all cases in Broome’s distribution 1, this is not paradoxical because it does not mean that switching is beneficial. They justify this by denying the following proposition: “If there is a partition of outcomes such that for every element of the partition Pj, E( X | Pj ) is finite and E( X | Pj ) > E( Y | Pj ) then E( X ) > E( Y )”, where E(X) is the expectation of X. Denying the quoted proposition is similar to Chalmers’ assertion of RDP (see below). They do not explain why the quoted proposition is false. They also state that the expectation of switching and of staying are both infinite, implying that there is no paradox. They refer here to the expected gain averaged over all values. It is hard to see why a gain of positive infinity in both directions is not a paradox, even if it is not a paradox for standard decision theory. More importantly, there are cases where the expected gain averaged over all values is positive but not unbounded, eg the distribution given by Clark and Shackel below.
“The Exchange Paradox” by John D. Norton of the University of S. California, the Pacific Philosophical Quarterly 79, 1998, 34-58
For the closed version of the paradox, Norton shows that the expected gain of switching averaged over all values is given by an oscillating series. However, this does not address the gain calculation in Broome, which is for every value, not for the overall average. Norton fails to extend his analysis to include the open envelope case. He believes that if we know the contents of the first envelope then swapping is always advisable. He tries to explain away this violation of symmetry, but fails to do so. In fact, his argument can be extended from the closed to the open case. This is because in the open case a single bracketed term is selected from the infinite series to give the expected value, where the (arbitrary) ordering of the series determines the value of this term. Thus the order of summation determines the result analogously to the re-ordering of the infinite series in the closed case. The order of summation gives rise to the problem with the overall average gain in both the open and closed versions of 2EP. However, as noted, even this does not resolve the paradox described by Broome.
“The Two Envelope Paradox” by Michael Clark (Nottingham University) and Nicholas Shackel (De Montfort University), Mind Magazine (Vol 109.435.July 2000)
Clark and Shackel present a distribution where the expected gain averaged over all values converges conditionally to 7/12. Essentially, they say that the calculation giving positive gain is incorrect because it does not respect the symmetry of the situation, which is true but does not prove anything. They fail to show at what point there is a mistake in the calculation, which is necessary to resolve the paradox. As for the open version, they admit that the expected gain is positive for each value, yet they claim that the overall gain over the whole run will not be, by arguing that what is true of the closed version of the puzzle must apply to the open version. This is not satisfactory, as the paradox of positive expected gain for every value remains. J. Weisberg and C. Meacham explore these failings at some length in “Clark and Shackel on the Two-Envelope Paradox” in Mind 112, 685-689 (2003). Clark and Shackel’s 7/12 result is not paradoxical because their series is conditionally convergent, so that another ordering, which would be equally valid, would give a different result.
“The St Petersburg Two-Envelope Paradox” by David Chalmers at the Dept of Philosophy of the University of Arizona, Analysis 62:155-57, 2002
Chalmers introduces an ad hoc principle, which might be termed the “Restricted Dominance Principle” (RDP) in order to abolish the paradox. RDP states that in some cases where the mean is unbounded, it is true that even though A is preferable to B for every value of x, it is not true that A is preferable to B. His rationale for invoking RDP is that it seems to be the only way of resolving the St Petersburg Two Envelope Paradox (SP-2EP), which combines the classical St Petersburg paradox (SP) with 2EP, ie the two envelopes contain arbitrary powers of 2. Chalmers argues that in SP-2EP if we open our envelope then the finite number we find cannot compare to the unbounded expected value of the other envelope. Hence in every particular case we should switch, yet by symmetry switching cannot be of benefit, so he proposes RDP.
Chalmers' argument that an unbounded expectation is always preferable to a finite value does not apply to 2EP because the value in the other envelope is bounded with respect to the value in the first one, ie whatever is in the first envelope, the other can at most contain twice as much. Furthermore, RDP is not needed to resolve SP-2EP either.
Consider the case where we have two sets, A and B, with some knowledge of their contents. If A and B are finite then it is unproblematic to compare every value in A with the average of B. However, if both A and B have infinite averages it is a different story. We can compare any value, a, in A with the average of B, but the point is we cannot draw any conclusion from this, ie we cannot conclude that a is of less benefit than selecting from B. We cannot say so because it's a property of sets of real numbers having an infinite average value that every element of such a set is below average. If we acknowledge this simple truth then we realise that finding that the value in our envelope is smaller than the expected value in the other is of no consequence, ie RDP is not needed. This is discussed in more detail in section 4. Indeed, it is the key point of the present paper.
"Envelopes
and Indifference" by Graham Priest and Greg Restall of
the Philosophy Dept of Melbourne University, (2003) published on the Net.
Priest and Restall attack the problem from the point of view of modal logic. Parts of their answer are excellent, especially the observation that in some formulations 2EP is under-determined and their proposal of three mechanisms to generate the paradox situation. However, I don’t think that they actually dispose of the kernel of the paradox, but then I am unable to follow their argument (despite receiving their clarifications). In addition, their reasoning does not apply to Broome’s calculation 1.
“The Two Envelope Paradox and Using Variables Within the Expectation Formula” by E. Schwitzgebel and J. Dever of the Universities of California and Texas (November 6, 2007), published on the Net.
This article presents a very simple solution, namely that the x in “2x” and the x in “x/2” in the calculation of method 2 have different expectations and hence cannot be used in the same formula. However, this reasoning does not apply to Broome’s gain calculation 1. Also, their paper deals only with the closed version of the paradox.
Though these are some of the best solutions in the literature, they all seem to miss the mark. At the least, a successful solution must resolve the problem of the gain calculation given by Broome.
2 The simple paradox resolved
If our envelope contains the largest number in the distribution then the other must contain half as much, so the paradox cannot arise if the values in the envelopes are bounded. Thus the argument of method 2 relies on the presence of arbitrarily large values in the distribution. Firstly, not all numbers in an infinite distribution can occur with equal probability in the envelopes. If it were so, and the probability of any number x was P( x ) = b, a constant value, then the sum of all the probabilities would be b times infinity, whereas probabilities must add up to 1. So we know that the probability of large values must fall off. If P( x ) > e for all x (where e > 0) then the probabilities would sum to more than 1, so P( x ) must approach 0 as x approaches infinity. To dissolve the simple version of the paradox it is sufficient to show that if we have x, then P( 2x ) is smaller than P( x/2 ) for some x, contradicting the argument of method 2. Thus it is enough to show that P( x, 2x ) < P( x/2, x ) for some x.
Proof: if P( x, 2x ) is greater than some constant value, e, for every x in the distribution then it follows that P( x ) > e for arbitrarily large x. This is because x occurs in ( x/2, x ) and ( x, 2x ), so P( x ) would be greater than ½( e + e ) = e (since we are as likely to get the first as the second item in a pair). Hence P( x, 2x ) must tend to zero as x tends to infinity. Now suppose, contrary to the claim, that P( x, 2x ) >= P( x/2, x ) for every x. Let P( b, 2b ) = e for a particular b and a constant value e > 0, then e = P( b, 2b ) <= P( 2b, 4b ) <= P( 4b, 8b ) <= … <= P( b·2^N, b·2^{N+1} ) for all N > 0 (with equality throughout if, as method 2 assumes, both cases are equally probable). This contradicts P( x, 2x ) -> 0. So P( x, 2x ) < P( x/2, x ) for some x.
This proof is similar to the one given by Norton in the above paper. It has been included for the sake of completeness.
You may find this explanation unsatisfactory. The paradox arises because we have the notion that the chain of pairs of envelopes can stretch forever, with each succeeding pair having equal probability but with double the values. In fact, either the chain must be finite or else the probabilities must diminish.
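The point that the probabilities must diminish can be checked on a concrete family. The sketch below assumes, purely for illustration, a geometric chain in which the pair ( 2^N, 2^{N+1} ) has probability (1 - r)·r^N for some 0 < r < 1; this family and the names `pair_prob` and `prob_other_is_double` are not part of the original argument. For any such r the chance that the other envelope holds 2x, given that we hold x = 2^N with N >= 1, works out to r/(1 + r), which is always below 1/2, so the "equally probable" step of method 2 fails.

```python
from fractions import Fraction

def pair_prob(n: int, r: Fraction) -> Fraction:
    """Assumed probability of the pair (2^n, 2^(n+1)): a geometric chain."""
    return (1 - r) * r**n

def prob_other_is_double(n: int, r: Fraction) -> Fraction:
    """P(other envelope holds 2x | we hold x = 2^n), for n >= 1.

    We hold 2^n either as the smaller member of pair n or as the larger
    member of pair n-1; within a pair each envelope is picked with chance 1/2,
    so the factor 1/2 cancels.
    """
    p_up = pair_prob(n, r)        # we hold the smaller member of pair n
    p_down = pair_prob(n - 1, r)  # we hold the larger member of pair n-1
    return p_up / (p_up + p_down)

for r in (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3), Fraction(9, 10)):
    # The value does not depend on n, so n = 3 is an arbitrary choice.
    print(r, prob_other_is_double(3, r))
```

With r = 2/3 the chain has pair probabilities 2^N/3^{N+1}, and the conditional probability of holding the smaller value is 2/5 rather than 1/2.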
Does this dispose of the two envelope paradox? It doesn't, because the paradox can be resuscitated, as shown below.
3 A special case of the paradox
The preceding section dispels the paradox in its simple formulation. However, Broome and others have produced specific distributions that re-instate the paradox in a form that cannot be easily removed. We calculate the expected gain of switching in Broome’s first distribution in the paper cited above.
The envelopes in Broome’s distribution contain the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8 ), ( 8, 16 ) and so on, where each such pair ( 2^N, 2^{N+1} ) occurs with probability 2^N/3^{N+1} for integer N >= 0. [1] It is not hard to show that the probability sum, 1/3 + 2/9 + 4/27 + 8/81 + ... converges to a finite value (actually 1), so there is no problem with infinite probabilities. The surprising feature of this distribution of envelopes is that the expected gain is positive for every value of N. We show this by calculating the net gain in switching from the value 2^N to 2^{N+1} in the pair ( 2^N, 2^{N+1} ) minus the loss in switching from 2^N to 2^{N-1} in ( 2^{N-1}, 2^N ). The expected gain is
( ( 2^{N+1} - 2^N ) p_N - ( 2^N - 2^{N-1} ) p_{N-1} )/( p_N + p_{N-1} )
where p_N is the probability of the pair ( 2^N, 2^{N+1} ). This gives
( 2^N ( 2^N/3^{N+1} ) - 2^{N-1} ( 2^{N-1}/3^N ) )/( 2^N/3^{N+1} + 2^{N-1}/3^N )
= ( 2^N/3 - 2^{N-2} )/( 1/3 + 1/2 ) = 2^N( 1/3 - 1/4 )/( 5/6 ) = 2^N(1/12)(6/5) = 2^N/10 for N > 0    (G)
For N = 0 the case has to be treated separately since ‘1’ only appears in the pair ( 1, 2 ): the other envelope must then hold 2, so the gain on finding 1 is 1, which contributes 1/6 to the overall average because 1 is found with probability 1/6. Because the expected gain is positive for every positive value of N, we again have a paradox. It seems we should switch even without knowing the value of the contents of our envelope. Some people argue that the way out is to observe that the mean value in both envelopes is infinite so that the gain is ∞ − ∞, which is not defined. The flaw in this argument is that it does not address the gain for each value of the distribution. Suppose that you could choose between the first and the second value in the pairs ( 1, 3 ), ( 10, 30 ), ( 100, 300 ), ( 1000, 3000 ), ... You certainly would choose the second in every pair. The fact that both have infinite means is irrelevant.
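The gain formula for Broome's first distribution can be checked with exact rational arithmetic. The following is a minimal sketch, not Broome's own calculation; the function names `pair_prob` and `expected_gain` are illustrative. It evaluates the expected gain for a held value of 2^N with N >= 1 directly from the pair probabilities 2^N/3^{N+1} and compares it with 2^N/10.

```python
from fractions import Fraction

def pair_prob(n: int) -> Fraction:
    """Probability of the pair (2^n, 2^(n+1)) in Broome's first distribution."""
    return Fraction(2**n, 3**(n + 1))

def expected_gain(n: int) -> Fraction:
    """Expected gain from switching, given that our envelope holds 2^n, n >= 1."""
    up = (2**(n + 1) - 2**n) * pair_prob(n)        # we hold the smaller member
    down = (2**n - 2**(n - 1)) * pair_prob(n - 1)  # we hold the larger member
    return (up - down) / (pair_prob(n) + pair_prob(n - 1))

# The pair probabilities sum towards 1: the first 200 terms fall short of 1
# by exactly (2/3)^200.
partial = sum(pair_prob(n) for n in range(200))

for n in range(1, 6):
    print(n, expected_gain(n), Fraction(2**n, 10))
```

Using `Fraction` rather than floating point keeps the comparison exact, which matters because the individual probabilities become very small as N grows.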
Since there seems to be no way to undermine the logic of the argument in favour of switching, a broader approach is needed.
4 The general case
Since Section 2 refutes the original formulation of the paradox, 2EP needs to be redefined before it can be resolved. Suppose that S is a distribution of pairs of real positive values, (x, 2x), from which you randomly choose one value. Suppose further that the expected gain for swapping with the other value in your pair is positive in every case. Then S recreates 2EP in general form. Note that the distinction between the open and closed versions has disappeared.
It is instructive to see how the unbounded mean relates to swapping in scenarios similar to, but simpler than 2EP. If a countable set of positive real numbers has no upper bound then every element of this set is smaller than its mean, which is unbounded. This is true of the canonical example, the natural numbers.
Let D be a countably infinite distribution of positive real numbers, where the probability P(d_K) of each element d_K is such that Σ P(d_K) = 1 and d_K P(d_K) >= e for all integer K > 0, where e is a positive constant. Since the expected gain of choosing a second number from D, ie Σ d_K P(d_K), is unbounded, it seems that choosing again from D is preferable to sticking with a particular d_K. Yet this is obviously nonsense. What does this tell us about the gain? That in any such distribution the calculated gain does not determine the utility of switching.
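A concrete instance of such a distribution D may help. The sketch below assumes the St Petersburg choice d_K = 2^K with P(d_K) = 1/2^K for K >= 1, so that d_K P(d_K) = 1 and e = 1; this particular choice, and the names `d`, `p` and `partial_mean`, are assumptions made for illustration only. The partial sums of the mean grow without bound, so every fixed element of D is eventually overtaken by them.

```python
from fractions import Fraction

def d(k: int) -> int:
    """Assumed K-th value of D (St Petersburg payoffs)."""
    return 2**k

def p(k: int) -> Fraction:
    """Assumed probability of the K-th value; these sum to 1 over K >= 1."""
    return Fraction(1, 2**k)

def partial_mean(terms: int) -> Fraction:
    """Sum of d_K * P(d_K) over K = 1..terms; each term equals 1 here."""
    return sum(d(k) * p(k) for k in range(1, terms + 1))

for terms in (10, 100, 1000):
    print(terms, partial_mean(terms))

# Any fixed element, eg d(5) = 32, is exceeded once enough terms are included.
print(d(5) < partial_mean(d(5) + 1))
```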
This suggests a Gain Utility Disconnection principle, GUD: if a distribution of positive real numbers (or pairs thereof) is such that (a) its mean is unbounded, (b) our knowledge of the relative magnitudes of the two candidates is completely symmetrical, and (c) the expected gain for switching is positive for every value, then (d) this gain holds no implications for the utility of switching.
In Broome’s first 2EP distribution the probability of the pair ( 2^N, 2^{N+1} ) is 2^N/3^{N+1}, so that d_N P(d_N) >= 2^N · 2^N/3^{N+1} = 2^{2N}/3^{N+1} = (4/3)^N/3 >= 1/3. Since the mean is unbounded GUD tells us that the gain calculation has no bearing on utility. Note that opening our envelope does not give us any information that makes one envelope preferable to the other. That the expected value in the other envelope is greater than ours is not a denial of condition (b) because the same argument can be used in reverse. Broome’s second paradoxical example is a continuous distribution where each value, x, has the density function 1/(x + 1)^2. Its mean value is given by the integral of x/(x + 1)^2, which is a function similar to ln(x), given that the integral of 1/x is ln(x). Hence the mean is unbounded in this example and GUD applies here too.
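The logarithmic growth of the mean in the continuous example can be made explicit. The derivation below assumes that the density 1/(x + 1)^2 is supported on x >= 0, where it integrates to 1; that support is an assumption, as the text does not state it. Truncating the mean integral at an upper limit M gives:

```latex
\int_0^M \frac{x}{(x+1)^2}\,dx
  = \int_0^M \left( \frac{1}{x+1} - \frac{1}{(x+1)^2} \right) dx
  = \ln(M+1) + \frac{1}{M+1} - 1
  \;\longrightarrow\; \infty \quad \text{as } M \to \infty .
```

The first step uses x = (x + 1) - 1 in the numerator, so the truncated mean behaves like ln(M) for large M and has no finite limit.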
Since Broome has shown that the paradox only arises if
the mean is infinite, GUD abolishes the paradox of 2EP in every possible
distribution of type S. GUD also abolishes it in SP-2EP as seen with D above. However, it is
crucial to its formulation that GUD does not rule out making utility judgements
in scenarios similar to 2EP which are not paradoxical. Two such cases are given
below.
Case 1. Consider the distribution (1, 2), (2, 4), (4, 8) etc, where each pair has a
non-zero probability. If we are told that we have the smaller value in a pair
(but not 1) and are given a 50% chance of swapping our value with the smaller
value in the previous pair, or a 50% chance of swapping with the smaller value
in the next pair, then we should switch. However, there is no paradox in this
case. GUD is qualified in such a way that it does not apply here, ie the swap
is inherently skewed by our knowledge that we have the smaller value in the
pair.
Case 2. Suppose that each person numbered N, for integer N > 1, has 2^N. Assume that each person is offered the choice of giving up their amount for a 3/5 chance of getting the amount of person (N - 1) and a 2/5 chance of swapping theirs for the amount of person (N + 1). Should they agree? Yes, because person N ends up gaining
1/5( 2( 2^{N+1} - 2^N ) + 3( 2^{N-1} - 2^N ) )
= 1/5( 2( 4·2^{N-1} - 2·2^{N-1} ) + 3( 2^{N-1} - 2·2^{N-1} ) ) = 1/5( 8·2^{N-1} - 4·2^{N-1} + 3·2^{N-1} - 6·2^{N-1} ) = 1/5( 2^{N-1} ) = 2^N/10.
This is the same as G, the expected gain in Broome's first 2EP distribution. Again, GUD does not apply to this scenario because we know that person N has more than person (N - 1), so there is a basis to favour one candidate over another. The similarity of Case 2 to 2EP suggests that the gain calculation in Case 2 may not be a trustworthy measure of utility, ie the argument in favour of switching is not conclusive, but there is no paradox to resolve.
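The arithmetic of Case 2 can be confirmed mechanically. This is a small sketch with an illustrative function name, `case2_gain`, that weights the loss from dropping to person (N - 1) by 3/5 and the gain from rising to person (N + 1) by 2/5, then compares the result with 2^N/10.

```python
from fractions import Fraction

def case2_gain(n: int) -> Fraction:
    """Expected gain for person n (n > 1), who holds 2^n, under the Case 2 offer."""
    amount = 2**n
    down = Fraction(3, 5) * (2**(n - 1) - amount)  # end up with person (n-1)'s amount
    up = Fraction(2, 5) * (2**(n + 1) - amount)    # end up with person (n+1)'s amount
    return up + down

for n in range(2, 7):
    print(n, case2_gain(n), Fraction(2**n, 10))
```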
The intuitive explanation of
2EP is that in infinite sets, the mean value, which is used in gain
calculations, does not indicate the central tendency in the way that it does in
finite sets. In other words, the "mean value" is not the same as what
we intuitively understand by the term "expected value", because every
actual value is smaller than the infinite mean. From this it follows that the
expected gain, which is based on the mean, does not determine utility. The
positive gain calculated in problematic 2EP distributions results from the mean
making the grass look taller in a paddock just like ours. Thus 2EP is a
disguised version of the “paradox” that every element of an unbounded set is
below average.
Note that the general case
could be restated as, "Suppose that S is a distribution in which the
expected gain averaged over all values converges absolutely to a positive value
or else is unbounded in every ordering of the sum". If such an S could be
found and were not covered by GUD then condition (c) would require
generalisation. Such an extension of GUD has been omitted for the sake of
simplicity and because a search of the literature has failed to find a
distribution not covered by GUD that is also paradoxical in the sense just
mentioned. It is conjectured that such a distribution is not possible.
5 Comparison with Chalmers' RDP
There are two basic ways of resolving a paradox. The first
is to find a logical error, a mistaken assumption or an ambiguity in the
language or representation used. As the literature review indicates, this
preferred approach does not seem to work for 2EP. The other way is to assert that
an accepted method of reasoning is not valid in certain restricted
circumstances. This is the approach of Chalmers’ RDP, proposed in the paper
cited above, as well as in GUD. When this second approach is taken it is
desirable that it satisfy the following criteria:
a) Weakness. It should be as little restrictive as possible,
while still resolving the paradox, or class of paradoxes, in question. In
particular, it should not prevent us from making valid choices in
non-problematic cases. If a medical analogy be permitted then it should be a
minimally invasive procedure, like micro-surgery.
b) Reasonability. It should embody an intuitive
understanding of the nature of the problem.
c) Relevance. It should arise out of the problem itself
rather than be a high-order principle brought in without specific reference to
the actual problem.
d) Non-arbitrariness. The chosen approach should not be one
out of multiple possible candidates that appear equally reasonable.
e) Simplicity.
GUD, the solution proposed in this paper, is an alternative
to Chalmers' RDP. Since both are negative results it is unlikely that either
will lead to a contradiction. However, it is important to decide which is
preferable.
Firstly, RDP is much stronger than GUD. In cases where the mean is unbounded, RDP invalidates proofs that use a basic method of argument, viz "For all a in A ...", which is a common coinage in mathematical proofs. RDP seems like advocating surgery for the common cold. The medicine is too strong by far. RDP is a new logical principle, whereas GUD merely modifies the interpretation of the meaning of gain, in particular its relation to utility. RDP seems to rule out making a rational choice in the two cases of section 4, and even in the last example of section 3. If so, then it is too restrictive.
Secondly, intuitively it makes sense to say, “If every
member of a set is below average then the average is a misleading measure to
use on that set when making decisions regarding utility.” (Note that the
expected gain is based on the average.) By contrast, RDP is highly
counter-intuitive. It has the flavour of a measure of last resort.
Thirdly, GUD is closely related to the problem in question.
It invalidates a specific rule of reasoning that leads to problems in 2EP. RDP is far more general and does not pertain to a relatively small,
problem-specific step in the chain of paradoxical reasoning.
Fourthly, if we are to resort to a notion as radical as RDP
then we could call into question other logical principles employed in the chain
of reasoning that results in the paradox, eg the validity of the symmetry
argument. GUD, by contrast, seems to be the approach of choice, given the
counter-intuitive properties of the mean value.
Lastly, though RDP is simpler to state than GUD, it is
difficult to know where it applies, given that it has not been qualified. The
simplicity of RDP is due to its not being precise. GUD is not as neat as RDP
because it is qualified to ensure that it applies only to paradoxical cases.
6 Conclusion
Whether we open an envelope or not, we should be indifferent to switching because of symmetry: nothing suggests that one envelope is preferable to the other.
7 Discussion
The author wishes to acknowledge as key milestones the papers of Broome and Chalmers mentioned above. Broome provided a precise test that any solution must pass. Chalmers took the creative step of invoking a new principle as the key to resolving a paradox that resisted all conventional attempts. He also drew attention to the finite value versus infinite expectation conundrum. In a sense, the present paper merely re-interprets their results.
It is strange that such a childishly simple problem has no simple solution. It could be claimed that the two envelope paradox is the simplest logical puzzle that is (a) solvable and (b) has no easy solution, ie no solution that can be readily explained to people without mathematical training.
A paradox such as the two envelope puzzle consists of two arguments which lead to contradictory conclusions. To resolve such a paradox it is not sufficient to claim that one argument is totally sound and therefore that the other must be wrong. The only way to resolve such a paradox is to show where there is a mistake in one of the competing arguments. We must show exactly at what point the argument in favour of switching goes wrong, without reference to the competing argument and even without reference to the contradictions that result from the switching argument. In this case, the mistake is the assumption that the gain calculation has a bearing on utility when the mean is infinite.
To understand the paradox intuitively it helps to look at a real-world version of the puzzle, where only finite amounts are possible. Let's assume that we know in advance that the envelopes can only contain one of the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8 ) up to ( 2^{99}, 2^{100} ) and that the chance of getting each pair is the same, ie 1/100. In the general case it did not matter whether we opened the first envelope or not. In this finite version it does. If the paradox statement does not include opening the envelope then indifference is still the correct solution. It is true that we will gain, statistically speaking, if our envelope contains 2^{99} or less, but all these gains are exactly balanced by the loss if we happen to pick the envelope that contains 2^{100}. Here is the calculation where we add the gain in going from x to 2x and subtract the loss in going from x to x/2 (method 2):
If our envelope contains 1 then the gain is 1/200( (2 - 1) ) = 1/200
If our envelope contains 2 then the gain is 1/200( (1 - 2) + (4 - 2) ) = 1/200
If our envelope contains 4 then the gain is 1/200( (2 - 4) + (8 - 4) ) = 2/200
If our envelope contains 8 then the gain is 1/200( (4 - 8) + (16 - 8) ) = 4/200
…
If our envelope contains 2^{99} then the gain is 1/200( (2^{98} - 2^{99}) + (2^{100} - 2^{99}) ) = 1/200( -2^{98} + 2^{99} ) = 2^{98}/200
If our envelope contains 2^{100} then the gain is 1/200( 2^{99} - 2^{100} ) = -2^{99}/200
Summing, we find that the expected gain averaged over all values is 1/200( 1 + 1 + 2 + 4 + 8 + ... + 2^{98} - 2^{99} ) = 0.
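The cancellation in the finite case can be reproduced exactly. The sketch below assumes, as in the text, 100 equally likely pairs from ( 1, 2 ) to ( 2^{99}, 2^{100} ), with either envelope of a pair equally likely to be picked; the names `contribution` and `overall` are illustrative. Each held value 2^k contributes its gains and losses weighted by 1/200, and the contributions sum to zero.

```python
from fractions import Fraction

N_PAIRS = 100                      # pairs (2^k, 2^(k+1)) for k = 0..99
PAIR_PROB = Fraction(1, N_PAIRS)   # each pair is equally likely
PICK_PROB = Fraction(1, 2)         # either envelope of the pair is equally likely

def contribution(k: int) -> Fraction:
    """Contribution to the overall expected gain from holding 2^k, k = 0..100."""
    x = 2**k
    total = Fraction(0)
    if k < N_PAIRS:   # we hold the smaller member of (2^k, 2^(k+1)): gain x
        total += PAIR_PROB * PICK_PROB * (2 * x - x)
    if k > 0:         # we hold the larger member of (2^(k-1), 2^k): lose x/2
        total += PAIR_PROB * PICK_PROB * (x // 2 - x)
    return total

overall = sum(contribution(k) for k in range(N_PAIRS + 1))
print(overall)
```

The only negative contribution comes from the top value 2^{100}, and it exactly offsets the sum of all the others.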
The result is 0 because of the boundary conditions. By contrast, in the infinite case there is no payback, as there is in the last term of the calculation above. This is because in the general case, no matter what our envelope contains, the other one could contain twice as much, ie the “payback” grows and is pushed further away. It is asserted that the general case of the finite version of 2EP (ie where the highest value is a fixed number, though its value need not be known) will exhibit payback behaviour as above. Furthermore, the expected gain using both method 2 and method 1 will be zero.
If the paradox statement for the specific finite case above does include the extra information that we opened an envelope and found 128 then we should switch. Our expected gain is 1/2( (64 - 128) + (256 - 128) ) = 32. The symmetry argument no longer applies because we now know where we are in the distribution. Specifically, we know that our envelope does not contain the maximum amount of 2^{100}, so switching is statistically beneficial. Had we opened our envelope and found 2^{100} we would not switch. Finally, what about method 1? This too no longer applies because it only gives us the overall answer for all cases, whereas we are no longer interested in the solution for all values (described above), but only in the case where our envelope contains 128 and the other holds 64 or 256.
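The effect of opening the envelope in the finite case can be sketched as a conditional calculation. The code assumes the same 100 equally likely pairs as above, so that an interior value 2^k is equally likely to be the smaller or the larger member of its pair; the function name `gain_given_value` is illustrative.

```python
from fractions import Fraction

def gain_given_value(k: int, n_pairs: int = 100) -> Fraction:
    """Expected gain from switching, given that the opened envelope holds 2^k.

    Pairs are (2^j, 2^(j+1)) for j = 0..n_pairs-1, all equally likely, so the
    possible contents of the other envelope are equally likely as well.
    """
    x = 2**k
    outcomes = []
    if k < n_pairs:   # the other envelope may hold 2x
        outcomes.append(2 * x - x)
    if k > 0:         # the other envelope may hold x/2
        outcomes.append(x // 2 - x)
    return Fraction(sum(outcomes), len(outcomes))

print(gain_given_value(7))    # opened envelope holds 128
print(gain_given_value(100))  # opened envelope holds the maximum, 2^100
```

Finding 128 gives an expected gain of 32, while finding the maximum gives a certain loss of 2^{99}, which is the asymmetry that opening the envelope reveals.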
Tad Boniecki
Original solution posted 15 April 2006
This version posted 12 July 2012
[1] Both the distribution and the value of the gain are found in Broome, John (1995) The Two-envelope Paradox. Analysis 55, 6-11.