Two Envelope Paradox Solution
The Paradox
There are two envelopes containing money. One has twice as much money as the other. You choose an envelope, open it, and can keep it, or swap it for the other. Should you keep it or switch?
Method 1. Let the envelopes contain x and 2x. If you find x in your
envelope then by switching to the other envelope you gain x. If you find 2x
then switching loses x. The net gain is zero. Therefore switching is of no
benefit.
Method
2. If your envelope contains x then the other one must have 2x or x/2.
Switching means either a gain of x or a loss of x/2. Since these are equally
probable you should switch.
Synopsis
Part 1 demonstrates that 2x and x/2 are not
equiprobable, though this is not enough to resolve the paradox. Part 2 revives
the paradox by presenting a specific distribution that appears to give positive
gain for switching for all x. This gain calculation is shown to be invalid.
Part 3 generalises this result to show that switching is of no benefit in the
general case, regardless of whether an envelope is opened or not. The
Discussion shows what happens in the finite case and attempts to explain the
paradox intuitively. Essentially,
the two envelope paradox is a zero-sum game in which the choice of the order of
summation can give a false positive value for the gain.
Part 1: The original paradox undermined
Surprisingly,
it is not true that all numbers can occur with equal probability in the
envelopes. If it were so, and the probability of any number x was P( x ) = b, a
constant value, then because there are infinitely many real numbers, the sum of
all the probabilities would be b times infinity, whereas probabilities must add
up to 1. So we know that the
probability of large values must fall off. More precisely, P( x ) must
approach 0 as x approaches infinity. This undermines the switching argument,
which posited that the probability of 2x in the other envelope is the same as
that of x/2.
You
may find this explanation unsatisfactory. The paradox arises because we have
the notion that the chain of pairs of envelopes can stretch forever, with each
succeeding pair having equal probability but with double the values. In fact,
either the chain must be finite or else the probabilities must diminish.
Does
this dispose of the two envelope paradox? It doesn't, because the paradox can
be resuscitated, as shown below. If you want to avoid reading a mathematical
solution then read the Simple
Explanation instead.
Part 2: A special
case of the paradox
Consider
the distribution of envelopes containing the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8
), ( 8, 16 ) and so on, where each such pair ( 2n, 2n+1 )
occurs with probability 2n/3n+1 for integer n >= 0. It
is not hard to show that the probability sum, 1/3 + 2/9 + 4/27 + 8/81 + ...
converges to a finite value (actually 1), so there is no problem with infinite
probabilities. (Note also that large numbers occur less often than small ones.)
The surprising feature of this distribution of envelopes is that the gain
appears to be positive for every value of n. We use method 2 to
calculate the net gain in switching from the value 2n to 2n+1 in
the pair ( 2n, 2n+1 ) minus the loss in switching from 2n
to 2n-1 in ( 2n-1, 2n ).
Since
there is a 50% chance of picking either of the two values in each pair, each term
begins with ½. The next part is the probability of this pair of envelopes being
chosen and the third part is the gain of switching. The net gain is the sum of
the two values.
1/2(
2n/3n+1 )( 2n+1 - 2n ) + 1/2( 2n-1/3n
)( 2n-1 - 2n )
= 1/2( 2n/3n+1 )( 2n ) + 1/2( 2n-1/3n
)( -2n-1 )
= 22n-1/3n+1 - 22n-3/3n
= ( 22n-3/3n )( 4/3 - 1 )
= 22n-3/3n+1 for
n > 0 (G)
Because
this is positive for every positive value of n, we are back with the original
paradox. For n = 0 the gain is 1/6, but this case has to be treated separately
since ‘1’ only appears in the pair ( 1, 2 ). It seems we should switch even
without knowing the value of the contents of our envelope. Some people argue
that the way out is to observe that the distribution has an infinite mean, so
that the expected gain from switching is ∞ − ∞, which is not
defined. The flaw in this argument is that we are not interested in the average
gain but in the gain for each value of the distribution. Suppose that you could
choose between the first and the second value in the pairs ( 1, 3 ), ( 10,
30 ), ( 100, 300 ), ( 1000, 3000 ), ... You certainly would choose the second
in every pair. The fact that both have infinite means is irrelevant.
Let's
look at the raw gain for each value in every pair of envelopes, ie the gain in
switching from 1 to 2, the loss in switching from 2 to 1, the gain in going
from 2 to 4, the loss in switching from 4 to 2, etc. Note that G gives us the net
gain after these positive and negative values have been summed for each value
of n. The raw gain is 1/6( 2 – 1 ) for the first, 1/6( 1 – 2 ) for the
second, and 1/9( 4 – 2 ) for the third, giving:
1/6
- 1/6 + 2/9 - 2/9 + 8/27 - 8/27 + 32/81 - 32/81 + 128/243 - 128/243 +
... (S)
The
terms in this infinite series of alternating values repeat with opposite sign
then increase. It is a well-known property of such series that they cannot be
summed. See Footnote 1. Note that the individual terms
of S are given by (4/3)n/6, which increases steadily. So we can
obtain nonsensical results by summing S in various ways. When we calculate the
gain using method 2 we are in effect summing the terms of the series S by
bracketing its terms as in (b) in the footnote:
1/6
+ ( -1/6 + 2/9 ) + ( -2/9 + 8/27 ) + ( -8/27 + 32/81 ) + ( -32/81 + 128/243 ) +
...
(A)
The
sums of the bracketed terms (ie 1/18, 2/27, 8/81 etc) are given by the formula
G for n > 0, ie each term gives the gain for a particular value of n. Of
course, one is tempted to say that 1/6 - 1/6 + 2/9 - 2/9 + 8/27 - 8/27 + 32/81
- 32/81 + 128/243 - 128/243 + ... is simply zero, but in truth the sum is
undefined. In fact, getting zero is exactly what we do when we apply method 1,
ie observing that if we have x in our envelope then we gain x by going to 2x,
and that if we have 2x then we lose x by going to x. Yet method 1 also sums A,
and hence is as invalid as method 2 (the gain calculated from x to 2x vs from x
to x/2), though unlike method 2, it does not lead us to a paradoxical
conclusion - the insatiable itch to switch.
My
contention is that taking the sum of an increasing oscillating series, which is
what we do to calculate the individual gains in A, is invalid. Logically it
makes sense to bracket the series as in A because this should give us the net
gain for each value of n. The catch is that mathematically this is invalid due
to the weird behaviour of infinite increasing series whose terms have
alternating signs. To understand this intuitively, imagine that the
"payback" grows, is being pushed ever further away, and is never
realised due to the infinite length of the series.
Of
course the problems with oscillating series only occur if they are infinite.
Any finite series such as 1 - 1 + 2 - 2 + 3 - 3 + 4 - 4 + 5 - 5 (ie with only
10 terms), can be summed in any order and will always give the same result.
Likewise, no finite distribution can generate the paradox because the value at
the end of a finite chain of envelopes acts as the payback. This is shown in
the postscript.
What
about the open envelope version of the paradox? If we open an envelope and find
that it contains 128, should we switch? Substituting n = 5 in G gives the gain
of 27/36 = 128/729. It seems that we can validly extract
just one bracketed element from the oscillating series A. Or can we? Recall
that the reason we cannot sum all of A is that changing the order of the terms
changes the sum. By extracting a single bracketed term out of A we are
instantiating the same problem, ie we are choosing a specific order so as to
obtain a specific result. This is arbitrary, and hence is just as fallacious in
the case of a single term as in the infinite one.
Part 3: The general case
What
of the general case? This involves every possible distribution as the source of
the two envelopes. If you can see why I invoke the notion of distributions then
read on, else see Footnote 2. Finite chains present no
problems, so let's look at an infinite chain, where the pair ( 2n, 2n+1
) occurs with probability P(n) for n >= 0. What is the gain for switching
for each value of n? Here is the net gain for n = 0, 1, 2 and 3 (for simplicity
I omit the factor of ½ that should precede every term):
P(0) * 1 = P(0)
P(0)
( 20 - 21 )
+ P(1) ( 22 - 21
) = -
P(0) + P(1) *
2
P(1)
( 21 - 22 ) +
P(2) ( 23 - 22 )
= - P(1) * 2
+ P(2) * 4
P(2)
( 22 - 23 )
+ P(3) ( 24 - 23
) = -
P(2) * 4 + P(3) * 8
Writing
out the total gain for all values we see that our friend, the alternating
series, is back:
P(0)
- P(0) + P(1) * 2 - P(1) * 2 + P(2) *
4 - P(2) * 4 + P(3) * 8 - P(3) * 8 +
P(4) * 16 + …
d0
- d0 + d1
- d1 + d2 - d2
+ d3 - d3 +
... (A1)
where
d0 ( = P(0) ) is the gain in going from n = 0 to n = 1, -d0
is the loss in going from n = 1 to n = 0, d1 is the gain in going
from n = 1 to n = 2 and so on. A1 gives the gain calculated using method 1. The
gain using method 2 is found by bracketing the series as we did in Part 2 (when
we got the series A):
d0
+ ( - d0 + d1 ) + ( - d1 + d2 )
+ ( - d2 + d3 ) +
...
(A2)
Is
it valid to sum the above? It may be valid or not, depending on the nature of
the series. Essentially there are two possibilities, the series is well-behaved
or it isn't. By "well-behaved" I mean that its terms can be
summed in any order to produce the same result (which may be ∞,
-∞, or a real number). One thing we know for sure: if A2 is well-behaved
then it will sum to zero, in which case there is no paradox, as 0 = (d0
- d0) + (d1 - d1) etc. If it is not
well-behaved then we should not try to sum it. RIP.
Or
are you still unconvinced? You may think that if the series converges when
summed in a particular order then we should accept the total it calculates and
hence that the paradox is unresolved. Clark and Shackel present a conditionally
convergent series for the gain that is perhaps the sharpest instantiation
of the paradox. My answer to this is in Footnote 3.
What
about if we open an envelope in the case where A2 is well-behaved? We can
choose to use the term ( - dn - 1 + dn ) for the gain for
the value 2n. Does this revive the paradox? No. Some of these terms will give gains, others losses,
and all will exactly balance out, since A2 is well-behaved. Finally, if we open
an envelope in the case where A2 is not well-behaved then the argument
at the end of Part 2 tells us that we cannot choose an arbitrary grouping of
part of A1 and proclaim this to give a valid answer.
You
may object that I have not taken the general case. For instance, the
distribution could be a decreasing one, such as ( 2-n, 2-n-1
) with probability P(n) for n >= 0. It could also be double-ended - an
increasing distribution appended to a decreasing one. I claim that the points
made in the preceding two paragraphs are still valid.
If
the open envelope contains 100 you might counter that we are not faced with
infinitely many cases but with just two possibilities: ( 100, 50 ) and ( 100,
200 ). This is the very heart of the paradox: the simple situation just
described derives from the general case of infinitely many possible
distributions of pairs of envelopes, and we cannot ignore the probabilistic
origins of our deceptively simple two cases. To see it as an each way bet on (
100, 50 ) vs ( 100, 200 ) is equivalent to saying that the original
distribution consisted of only the two pairs ( 100, 50 ) and ( 100, 200 ) and
that one of these has been randomly chosen. You cannot throw away the history
of the problem. Still not convinced? Suppose you had a box containing two white
pearls and one black one. You withdraw two pearls without looking at them, then
you pick one of these and note that it is black. What is the chance that the
second pearl is also black? Ms Maple, your sixth form teacher was right,
history matters!
Conclusion
Whether we open an envelope or not, we
should be indifferent to switching because of symmetry: nothing suggests that
one envelope is preferable to the other.
Discussion
It
is sad that such a childishly simple problem has no simple solution. I believe that
the two envelope paradox is the simplest logical puzzle that is (a) solvable
and (b) has no easy solution, ie no solution that can be readily explained to
people without mathematical training. Even now I find the solution
unsatisfying.
A
paradox such as the two envelope puzzle consists of two arguments which lead to
contradictory conclusions. To resolve such a paradox it is not sufficient to
claim that one argument is totally sound and therefore that the other must be
wrong. The only way to resolve such a paradox is to show where there is a
mistake in one of the competing arguments. We must show exactly at what point
the argument in favour of switching goes wrong, without reference to the
competing argument and even without reference to the contradictions that result
from the switching argument. In this case the mistake is the assumption that it
is valid to calculate the gain. Method 1 and method 2 sum the same series but
in a different way, producing different sums. Both methods are actually
invalid. This puzzle is an infinity paradox, though the infinity is hidden.
Essentially,
the two envelope paradox is a zero-sum game in which the choice of the order of
summation can give a false positive value for the gain.
It is vaguely analogous to Bertrand's Paradox, in
which the question asked can be answered in logically correct ways that give
different answers. This forces us to conclude that Bertrand’s question is not
sufficiently precise. In the two envelope paradox we are able to calculate the
gain in a number of ways, each of which is mathematically sound, yet which give
different answers for what must be a unique value. In fact we know that the
value is actually zero by symmetry. From this we conclude that we cannot make
the calculation. Specifically, that we cannot sum an oscillating or
conditionally convergent series to give us the answer.
To
understand the solution intuitively let’s look at a real-world version of the
puzzle, where only finite amounts are possible. Let's assume that we know in
advance that the envelopes can only contain one of the pairs ( 1, 2 ), ( 2, 4
), ( 4, 8 ) up to ( 299, 2100 ) and that the chance of
getting each pair is the same, ie 1/100. In the general case it did not matter
whether we opened the first envelope or not. In this particular finite version
it does. If the paradox statement does not include opening the envelope
then indifference is still the correct solution. It is true that we will gain,
statistically speaking, if our envelope contains 299 or less, but
all these gains are exactly balanced by the loss if we happen to pick the
envelope that contains 2100. Here is the calculation where we add
the gain in going from x to 2x and subtract the loss in going from x to x/2
(method 2):
If
our envelope contains 1 then the gain is 1/200( (2 - 1) ) = 1/200
If
our envelope contains 2 then the gain is 1/200( (1 - 2) + (4 - 2) ) = 1/200
If
our envelope contains 4 then the gain is 1/200( (2 - 4) + (8 - 4) ) = 2/200
If
our envelope contains 8 then the gain is 1/200( (4 - 8) + (16 - 8) ) = 4/200
If
our envelope contains 299 then the gain is 1/200( (298 -
299) + (2100 - 299) )
= 1/200(
-298 + 299 )
= 298/200
If
our envelope contains 2100 then the gain is 1/200( 299 -
2100 ) = -299/200
Summing,
we find that the expected gain = 1/200( 1 + 1 + 2 + 4 + 8 + ... + 298
- 299 ) = 0.
We
could have avoided this work. Summing a finite series is not affected by the
order in which we add its terms, ie we know that the answer has to be zero
because of method 1. Applying method 2 above we also got 0 gain because of the
boundary conditions. By contrast, in the infinite case there is no
payback, as there is in the last term of the calculation above. This is because
in the general case, no matter what our envelope contains, the other one could
contain twice as much. I assert that the general case of the finite
version of the two envelope paradox (ie where the highest value is at most N,
where N is a fixed number, though its value need not be known) will exhibit
payback behaviour as above. Furthermore, I assert that the gain using both
method 2 and method 1 will be zero.
If
the paradox statement for the specific finite case above does include
the extra information that we opened an envelope and found 128 then we should switch.
Our expected gain is 1/2( (64 - 128) + (256 - 128) ) = 32. What about the
symmetry argument? It no longer applies because we now know where we are in
the distribution. Specifically we know that our envelope does not contain
the maximum amount of 2100, so switching is statistically
beneficial. Had we opened our envelope and found 2100 we would not
switch. Finally, what about method 1? This too no longer applies because it
only gives us the overall answer for all cases, whereas we are no longer
interested in the solution for all values (which is described above), but only
in the case where our envelope contains 128 and the other holds 64 or 256.
Tad Boniecki
Posted 15 April 2006
Revised 16 Nov 2006
What
does that mean? The sum is not meaningful because it oscillates. Since there is
no last term in an infinite series we cannot say whether it is odd or even, ie
whether it is positive or negative, and hence whether the sum is positive or
zero. Increasing oscillating series like this have a further interesting
property, namely that by re-arranging their terms we can make them appear
to sum to widely different totals. Here is an illustration:
(a)
1 - 1 + 2 - 2 + 3 - 3 + 4 - 4 + ... = ( 1 - 1 ) + ( 2 - 2 ) + ( 3 - 3 ) + ( 4 -
4 ) + ...
=
0 + 0 + 0 + 0 +
...
= 0
(b)
1 - 1 + 2 - 2 + 3 - 3 + 4 - 4 + ... = 1 + ( - 1 + 2 ) + ( - 2 + 3 ) + ( - 3 + 4
) + ( - 4 + 5 ) + ...
=
1 + 1 + 1 + 1 + 1 +
...
= ∞
(c)
1 - 1 + 2 - 2 + 3 - 3 + 4 - 4 + ... = ( 1 - 1 ) + ( 2 - 2 ) + ( 3 - 3 ) + ( 4 -
4 ) + ...
=
- 1 + 1 - 2 + 2 - 3 + 3 - 4 + 4 + ... (reversing the order in each pair)
=
- 1 + ( 1 - 2 ) + ( 2 - 3 ) + ( 3 - 4 ) + ( 4 - 5 ) + ...
=
- 1 - 1 - 1 - 1 - 1 +
...
= -∞
The
point is that a series like the one above simply cannot be summed, and
that we can manipulate it to appear to sum to just about any value. The
same is true of the series S, that sums the gain from our distribution.
Footnote 2
Why
is it necessary to say that the two envelopes come from a distribution?
Essentially, we can approach the two envelope problem in one of two ways.
Firstly, we can treat it as a fantasy in which anything can happen without any
rational rules, ie the two envelopes appear magically out of nowhere, with no
probabilities attached, and hence there is no possibility of making a rational
calculation. This is like asking what is the chance that Lance Armstrong is
thinking about grated carrots at this moment. Such a question cannot be
answered mathematically to produce an answer in which we could have confidence.
Put simply, if there is no framework within which a probability calculation can
be made then we cannot make it.
The
second choice is to treat the puzzle as a situation that can be analysed using
probability theory. If we are to make a probability calculation then we need to
make a basic assumption - that our sample (the two envelopes) comes from some
kind of distribution, ie it comes from a well-defined range of probabilities.
We do not know the nature of this distribution, and we should not make any
assumptions about it, but we have to assume that it exists in order to
calculate a result for the expected gain of switching envelopes.
What
is meant by a distribution? Here are two examples, the first finite, the second
infinite:
(I)
( 100, 200 ) with probability 37% and ( 6, 12 ) with probability 63%.
(II)
(1, 2), (2, 4), (4, 8), (8, 16), (16, 32), ... with the probability of the pair
( 2n, 2n + 1 ) being
2-n -1 for all integers n >= 0. Note that the
probabilities in every possible distribution must add up to 1.
The
general case that we must address while solving the two envelope paradox is the
collection of all possible distributions like the two given above. We
need to calculate a double sum, firstly over all possible distributions of two
envelopes, and secondly over all possible choices of two envelopes within each
of those distributions.
Footnote 3
The series giving the gain, ie A2,
may turn out to be a conditionally convergent series, similar to the
alternating harmonic series:
1
- 1/2 + 1/3 - 1/4 + 1/5 - 1/6 + 1/7 ...
This series sums to ln2, but only if we sum it in the order given above. This and other conditionally convergent series can be re-arranged to sum to any value we care to name. We can split the terms to get
ln2
= (2 - 1) - 1/2 + (2/3 - 1/3) - 1/4 + (2/5 - 1/5) - 1/6 + (2/7 - 1/7) ...
and divide both sides by 2, giving
ln2/2
= 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + 1/5 - 1/10 - 1/12 + 1/7 - 1/14 ...
The right hand side of the above
equation contains exactly the same terms as the original, yet the sum is half
the value due to re-arrangement. Note that we have not changed the order of
summation in the above steps so that the logic is valid.
Clark and
Shackel show an instance of the paradox where the gain is a conditionally
convergent series similar to the above. In other words they give an instance of
A2 that converges to a non-zero value. How do we arrive at A2? It is by summing
the gain in the most obvious way, ie by adding the positive term in the first pair of envelopes, subtracting
the first negative, then adding the positive term in the second pair and
subtracting the negative term in the second pair, and so on.
Alternatively, we can sum the gain by
adding the first positive amount, then subtract the negative term from the same
pair, then subtract the negative term from the second pair. Then we add the
positive term from the second pair, subtract the negative terms from the third
and fourth pairs. And so on, mirroring the procedure that sums to ln2/2 in the
above example. Obviously, by summing in this way we will reach a different
value for the gain if A2 is conditionally convergent. Since the gain
calculation must give us a single value it follows that we cannot sum A2 to
calculate the gain.
It may be objected that we are
summing the gain in an "unnatural way". While it is true that the
alternative summation given above is not the obvious way to sum the gain, there
is nothing to say that it is "wrong". It is just as valid to sum it
in this order as the "natural" or canonical order that gives A2.
Home IFAQ Home IFAQ Qs Thinkers Etc Forum Aphorisms Puzzles
Humour
Poetry
Fiction
About