L’escalier du Diable

Welcome one, welcome all to the Point at Infinity sideshow, where today we present a tantalizing and diabolical selection of musical and mathematical curiosities. Just watch your step; these stairs can be a bit tricky.

A few months ago, you may recall, we published two posts about the Shepard tone and the Risset rhythm, aural illusions in which a tone or rhythm seems to perpetually rise or fall in pitch or in tempo but is actually repeating the same pattern over and over again, the musical equivalents of Penrose stairs.

To accompany the posts we created some sound samples so the readers could hear the illusions themselves. A couple of weeks ago, one of these samples was used in an internet radio program on audio paradoxes released by Eat This Radio, paired with some work of Jean-Claude Risset. The entire program is really excellent, ranging from a piece by J.S. Bach to mid-twentieth century audio experiments to modern electronic music, and I encourage all of you to listen to it.

One of the pieces in the radio program is a piano étude written by György Ligeti in the late twentieth century. The étude is named L’escalier du diable, or The Devil’s Staircase, and its repeated ascents of the keyboard have a striking resonance with the never-ending ascent of the Shepard tone.

The Devil’s Staircase is also the colloquial name given to a particular mathematical function introduced by Georg Cantor in the 1880s. It is a function defined on the set of real numbers between 0 and 1 and taking values in the same interval, and it has some quite curious properties. Before we discuss it, let’s take a look at (an approximation to) the graph of the function.

To appreciate the strangeness of this function, let us recall some definitions regarding functions of real numbers. Very roughly speaking, a function is called continuous if it has no sudden jumps, or if its graph can be drawn without lifting the pencil from the page. Continuous functions satisfy a number of nice properties, such as the intermediate value theorem.

The derivative of a function at a given point of its domain, if it exists, measures the rate of change of the function at that point. If the x-axis measures time and the y-axis measures the position of an object along some one-dimensional track, then the derivative can be thought of as the velocity of that object. If a function is differentiable at a point (i.e., if its derivative exists there) then it must be continuous at that point, but the converse is not necessarily true. (For example, if the graph of a function has a sharp corner at a point, then the function cannot be differentiable there.)

Let’s think about what it means for a function to have a derivative of 0 at a point. It means that, at that point, the rate of change of the function has vanished. It means that, if we zoom in sufficiently close to that point, the function should look like a constant function. Its graph should look like a horizontal line. What would it mean for a function to have a derivative of 0 almost everywhere? (Here “almost everywhere” is a technical term (which I’m not going to define) and not just me being vague.) One might think that this must imply that the function is a constant function. At almost every point in its domain, the rate of change of the function is 0, so how can the value of the function change?

One will quickly discover that this is not quite right. Consider the function defined on the real numbers whose value is 0 at all negative numbers and 1 at all non-negative numbers.

This function has derivative 0 everywhere except at 0 itself, and yet it increases from 0 to 1. It does this quite easily by being discontinuous at 0, which, in hindsight, seems sort of like cheating. So what if we also require our function to be continuous? Now we need more exotic examples, and this is where the Devil’s Staircase comes in, for the Devil’s Staircase is a continuous function, it is differentiable almost everywhere, it has a  derivative of 0 wherever its derivative is defined, and yet it still manages to increase from 0 to 1. Wild!

What is the Devil’s Staircase exactly? I’ll give two different definitions. The first proceeds via an iterative construction. Start with the function $f_0(x) = x$. Its graph, between 0 and 1, is simply a straight line segment increasing from (0,0) to (1,1). Now, look at the midpoint of this increasing line segment, and draw a horizontal line segment centered there whose length is 1/3 of the horizontal line of the original increasing segment. Now connect the ends of this line segment via straight lines to (0,0) and (1,1). This new curve is the graph of a function that we call $f_1$. It consists of two increasing line segments with one horizontal line segment between them. Now repeat the process that took us from $f_0$ to $f_1$ on each of these increasing line segments, and let $f_2$ be the function whose graph is the result. Continue in this manner, constructing $f_n$ for every natural number $n$.

It turns out that, as $n$ goes to infinity, the sequence of functions $\langle f_n \mid n \in \mathbb{N} \rangle$ converges (uniformly) to a single function. This function is the Devil’s Staircase.

A more direct but also more opaque definition is as follows: Given a real number $x$ between 0 and 1, first express $x$ in base 3 (i.e., using only 0s, 1s, and 2s). If this base 3 representation contains a 1, then replace every digit after the first 1 with a 0. Next, replace all 2s with 1s. The result has only 0s and 1s, so we can interpret it as a binary (i.e., base 2) number, and we let $f(x)$ be this value. Then the function $f$ defined in this manner is the Devil’s Staircase. Play around with this definition, and you might get a feel for what it’s doing.

And now, on our way out, some musical addenda. An encore, if you will. First, after making the Risset rhythms for the aforementioned post, I did some further coding and wrote a little program that can take any short audio snippet and make a Risset rhythm out of it. Here’s an example, first accelerating and then decelerating, using a bit from a Schubert piano trio.

You may recognize the sample from the soundtrack to Barry Lyndon.

Finally, I can’t help but include here one of my favorite pieces by Ligeti, Poema sinfónico para 100 Metrónomos.

Cover image: Devil’s Staircase Wilderness, Oregon, USA

Ultrafilters III: Infinite Elections

Today, we present a curious “practical” application of ultrafilters, especially relevant in this election season. We will eventually consider elections with infinitely many voters, but let us first look at the classical case with a finite electorate.

Suppose that there are three or more political candidates, named $C_1, C_2, \ldots, C_n$, vying for political office, and finitely many voters in the electorate. Each voter ranks the candidates in the order in which they prefer them, and the outcome of the election is a collective ranking of the candidates. (We will not deal here with elections in which the result is a single winner rather than a complete ranking, but similar results hold there. We will also not deal with elections between two candidates.)

It has long been clear that aggregating individual rankings into a collective ranking in a “fair” way is a tough problem. Difficulties already arise in the following simple example: Suppose there are three candidates: A, B, and C, and three voters: U, V, and W. Suppose U’s ranking is ABC, V’s is BCA, and W’s is CAB. No collective ranking seems particularly good here. ABC, for example, seems unfair since two of the three voters in fact prefer C to A, while the collective ranking has A first and C last! Similar problems arise with the other possibilities. This particular example was recognized by the Marquis de Condorcet (for whom Condorcet voting methods are named) in 1785.

While the best way to aggregate individual rankings into a collective ranking is unclear, there are some criteria one would expect a good system to satisfy.

1. (Unanimity) If every voter has the same ranking of candidates, then that ranking is the final outcome.
2. (Irrelevant Alternatives) The relative ranking of two candidates $C_i$ and $C_j$ in the final outcome depends only on the relative ranking of those two candidates in the individual votes.

Let us call a system for converting individual rankings into a collective ranking a voting system. One particular type of voting system is what we will call a dictatorial voting system. In a dictatorial voting system, there is one distinguished voter (call this voter D) such that, regardless of the votes of the other voters, the final ranking will be exactly the ranking in D’s vote. In the context of this vote, at least, D is a dictator who single-handedly determines the outcome of the election.

A dictatorial voting system satisfies the unanimity and irrelevant alternatives criteria for somewhat trivial reasons. It is obviously not, though, what one has in mind when thinking of free, democratic, elections, as the voices of all voters except one are entirely irrelevant. However, a remarkable and famous result of Kenneth Arrow shows that, if one wants to avoid a dictatorial voting system, one must violate either the unanimity or irrelevant alternatives criterion. This result, and others that followed it, are often paraphrased as stating that there are no perfect voting systems.

Arrow’s Impossibility Theorem: Suppose that, in an election with finitely many voters and three or more candidates, a voting system satisfies the unanimity and irrelevant alternatives criteria. Then it is a dictatorial voting system.

Let us remove now our assumption that there are only finitely many voters, while retaining the assumption that only finitely many candidates run in any given election. The notion of a voting system still makes sense, as do the unanimity and irrelevant alternatives criteria. And here ultrafilters make their entrance. Ultrafilters can be seen as ways of choosing between finitely many alternatives, and, as such, they provide natural voting systems.

Let us be a bit more precise. Suppose that, in an election in which voters rank all candidates, $X$ is the set of voters and $C_1, \ldots, C_n$ is the list of candidates, where $n$ is a natural number. Let $\mathcal{U}$ be an ultrafilter on $X$. By the properties of an ultrafilter, if we partition $X$ into finitely many pieces, then exactly one of those pieces is in $\mathcal{U}$. This allows us to define a voting system, which we will call the $\mathcal{U}$-voting system, as follows. For every possible ranking $\sigma$ of the candidates, let $X_\sigma$ be the set of voters with individual ranking $\sigma$. Since there are $n$ candidates, there are $n!$ (i.e. $n(n-1)(n-2)\ldots(2)(1)$) possible rankings of these candidates. In particular, there are only finitely many, so there is some ranking $\sigma$ such that $X_\sigma \in \mathcal{U}$. The $\mathcal{U}$-voting system then declares $\sigma$ to be the collective ranking.

These voting systems mesh with our intuition that an ultrafilter is a mechanism for specifying which subsets of a given set should be considered “large.” In an election, we expect the final outcome to reflect the desires of a “large” portion of the electorate; one way to do this is to use an ultrafilter to specify exactly which sets of voters should be considered “large” and then to declare the result of the election to be the ranking chosen by a large set.

What happens if $\mathcal{U}$ is a principal ultrafilter? Then there is one voter $x \in X$ such that $\mathcal{U} = \mathcal{U}_x = \{Y \subseteq X \mid x \in Y\}$. In this case, the $\mathcal{U}$-voting system will simply return the ranking of voter $x$ as the collective ranking, ignoring the other voters. In other words, the $\mathcal{U}$-voting system is a dictatorial voting system.

If $\mathcal{U}$ is an ultrafilter on $X$, then the $\mathcal{U}$-voting system is easily seen to satisfy the unanimity and irrelevant alternatives criteria. What might be surprising is that these are the only such voting systems! Namely, we can generalize Arrow’s Impossibility Theorem to allow for a possibly infinite electorate as follows.

Generalized Arrow’s Impossibility Theorem: Suppose that, in an election, $X$ is the set of voters and $C_1, \ldots, C_n$ is the list of candidates, where $n \geq 3$ is a natural number. Then any voting system which satisfies the unanimity and irrelevant alternatives criteria must be the $\mathcal{U}$-voting system for some ultrafilter $\mathcal{U}$ on $X$.

First note that this theorem truly does generalize Arrow’s Impossibility Theorem: if $X$ is finite, then any ultrafilter on $X$ must be principal and thus yield a dictatorial voting system. Also note that, with an infinite electorate, there are voting systems that satisfy the unanimity and irrelevant alternatives criteria but are not dictatorial: any $\mathcal{U}$-voting system in which $\mathcal{U}$ is a non-principal ultrafilter will do the trick.

Before you write to Congress suggesting that we move towards elections with infinitely many voters based on non-principal ultrafilters, though, note that these voting systems, at least naively implemented, may not be what we actually want. For, consider a $\mathcal{U}$-voting system, where $\mathcal{U}$ is a non-principal ultrafilter on an infinite electorate $X$. Suppose that the electorate consists of, say, two sexes: male and female. Then one of these sexes will comprise a set in $\mathcal{U}$, while the votes of the other will be entirely irrelevant. Or suppose that our infinite electorate is divided amongst our current fifty states. Then the set of voters in one of these states (Kansas, say) will be in the ultrafilter, in which case the election will depend solely on the votes of the Kansas voters. Perhaps this just means, though, that the members of our infinite electorate should not be so simply categorized or placed on a linear spectrum, as any attempt to do so necessarily disenfranchises certain groups. Perhaps this has something to say about finite societies as well. Perhaps…

That’s enough for today. I will not give the proof of the generalized impossibility theorem here. It’s not terribly difficult, but it is somewhat long and technical. Interested readers can consult David Galvin’s notes on ultrafilters. Come back next week, when I will present and actually prove a similar result about infinite lightbulb-switching schemes.

The transfinite subway

Today, we move from the countable to the uncountable and encounter another counter-intuitive infinite thought experiment. I first heard this one in a bar in Toronto in 2012, though it’s been around for much longer.

The math in this post is not extraordinarily difficult, but it is substantially more sophisticated than anything we’ve done here before. There is a new page, Ordinals and Cardinals, that contains some relevant definitions and facts. In particular, you will need to know what ordinals are and that $\omega_1$ is the least uncountable ordinal. We will also need the following fact:

• Any countable union of countable sets is itself countable. As a consequence, if $A$ is a countable subset of $\omega_1$ (i.e. $A$ is a countable set of countable ordinals), then $A$ is bounded below $\omega_1$ in the sense that there is $\beta < \omega_1$ such that, for every $\alpha \in A$, $\alpha < \beta$. This is illustrated in the following crude drawing.

Without further ado, down to the subway tracks.

There’s a subway line (the Transfinite Line) running from the airport to the Hilbert Hotel. The stations are labeled in order by ordinal numbers, with the airport at Station 0 and the Hilbert Hotel at Station $\omega_1$. The Transfinite Line has other peculiarities in addition to its transfinite length. On each run it makes from the airport to the hotel, the following rules must be followed. First, before leaving the airport, countably infinitely many passengers (i.e. one for each natural number) get on the train. Upon arriving at each later station prior to the Hilbert Hotel, two things happen:

• First, if there are any passengers on the train when it arrives at the station, exactly one of them gets off. This passenger cannot board this train again during this run, either at this or any later station.
• Second, countably infinitely many new passengers get on the train.

The question is simply this: How many passengers are on the train when it pulls into the Hilbert Hotel at Station $\omega_1$? Take a few moments to consider this.

I imagine a typical thought process when first encountering this problem might be roughly as follows. First, you probably think, ‘There are so many more people getting on the train than there are getting off at each station, so there must be lots of people on the train, probably uncountably many, when the train arrives at the Hilbert Hotel.’ Thinking a bit more, you might notice that this problem bears a certain similarity to the problem discussed here on Monday about gold coins. You might realize that we haven’t specified which passenger gets off the train when the train arrives at a station, and you might suspect that, as was the case with the gold coins, the answer can vary depending on this choice. You might thus argue that we have not given enough information to determine the answer.

This is a tempting thought, but, somewhat surprisingly, it is incorrect. There is a definite answer to the question, and it is not dependent on the choice of which passenger gets off at each station. To see this, though, we’re going to have to do a bit of work. We first need some facts about functions from $\omega_1$ to $\omega_1$.

Definition: Suppose $f:\omega_1 \rightarrow \omega_1$ is a function. An ordinal $\beta < \omega_1$ is called a closure point for $f$ if, for every $\alpha < \beta$, we also have $f(\alpha) < \beta$.

Lemma: Suppose $f:\omega_1 \rightarrow \omega_1$ is a function, and suppose $\alpha < \omega_1$. Then there is an ordinal $\beta$ such that $\alpha < \beta < \omega_1$ and $\beta$ is a closure point for $f$.

Proof: We will define an infinite increasing sequence of countable ordinals $\alpha_0 < \alpha_1 < \alpha_2 < \ldots$, one for each natural number, as follows. First, let $\alpha_0 = \alpha$. Next, consider the set $A_0 = \{f(\xi) \mid \xi < \alpha_0\}$, i.e. $A_0$ is the set of all values $f$ takes when applied to ordinals less than $\alpha_0$. Since $\alpha_0$ is a countable ordinal, $A_0$ is a countable set. Therefore, $A_0$ is bounded below $\omega_1$, i.e. we can find $\gamma_0 < \omega_1$ such that, for all $\eta \in A_0$, $\eta < \gamma_0$. Choose $\alpha_1 < \omega_1$ to be large enough so that $\alpha_1$ is larger than both $\alpha_0$ and $\gamma_0$.

To define $\alpha_2$, repeat the process, but with $\alpha_0$ replaced by $\alpha_1$. More precisely, let $A_1 = \{f(\xi) \mid \xi < \alpha_1\}$ and choose $\alpha_2 < \omega_1$ large enough so that $\alpha_2 > \alpha_1$ and, for all $\eta \in A_1$, $\alpha_2 > \eta$. Continue in this way to define $\alpha_n$ for every natural number $n$.

At the end of this process, let $\beta$ be the least ordinal such that $\beta > \alpha_n$ for every natural number $n$. Since there are only countably many $\alpha_ns$, we know that $\beta < \omega_1$. It is clear that $\alpha < \beta$. I claim that $\beta$ is a closure point for $f$. To see this, we must show that, for every $\xi < \beta$, $f(\xi) < \beta$. So let $\xi < \beta$. Since $\beta$ is the least ordinal greater than all of the $\alpha_ns$, there is some $m < \omega$ such that $\xi < \alpha_m$. But then $f(\xi) \in A_m$ and, when we defined $\alpha_{m+1}$, we made sure that $\alpha_{m+1} > \eta$ for every $\eta \in A_m$. Therefore, $f(\xi) < \alpha_{m+1} < \beta$, so $\beta$ is indeed a closure point for $f$. This completes the proof of the Lemma. $\square$

Now let us return to our subway. Suppose we consider a specific run of the train. For each ordinal $\alpha < \omega_1$, let $B_\alpha$ be the set of $\eta < \omega_1$ such that a passenger who boarded the train at Station $\alpha$ got off the train at Station $\eta$. Since only countably many passengers got on at Station $\alpha$, $B_\alpha$ is a countable set and therefore is bounded below $\omega_1$.

We now define a function $f:\omega_1 \rightarrow \omega_1$ as follows. For every $\alpha < \omega_1$, let $f(\alpha)$ be the least ordinal $\gamma$ such that $\gamma > \eta$ for all $\eta \in A_\alpha$ (i.e. $f(\alpha)$ is the least strict upper bound for $A_\alpha$).

Claim: If $\beta < \omega_1$ is a closure point for $f$, then the subway was empty upon arriving at Station $\beta$.

Proof: Let $\beta < \omega_1$ be a closure point for $f$, and suppose that the train was not empty when it arrived at Station $\beta$. We will derive a contradiction.

Since the train was not empty when it arrived at the station, one passenger had to disembark at Station $\beta$. Therefore, there must have been some $\alpha < \beta$ such that this passenger got on the train at Station $\alpha$. Therefore, there is a passenger who got on the train at Station $\alpha$ and off at Station $\beta$, so $\beta \in A_\alpha$. By our definition of $f$, this means that $f(\alpha) > \beta$. However, $\beta$ is a closure point for $f$ and $\alpha < \beta$, which means $f(\alpha) < \beta$. This is a contradiction, and we have finished the proof of the claim. $\square$

We are now ready to reach our conclusion.

Solution: The train was empty when it arrived at the Hilbert Hotel!

Proof: Suppose the train was not empty. We will again reach a contradiction.

Since the train was not empty, we can choose a passenger who was on the train when it got to the hotel. There must be an ordinal $\alpha < \omega_1$ such that this passenger boarded the train at Station $\alpha$. But now we can invoke our Lemma and find some ordinal $\beta < \omega_1$ such that $\alpha < \beta$ and $\beta$ is a closure point for $f$. But now, by our previous claim, the train was empty upon arriving at Station $\beta$, so our passenger who got on at Station $\alpha$ must in fact have gotten off before Station $\beta$ and, by the rules of the subway, could not have reboarded the train. This passenger therefore could not have been on the train when it arrived at the Hilbert Hotel! $\square$

Comparing this with the result from the gold coins puzzle, we get our first indication that the class of infinities is not some undifferentiated mass and that different sizes of infinity can in fact behave quite differently. We will hopefully see many more such instances of this phenomenon in later posts.

The real lesson here, though, is that if you’re going to the Hilbert Hotel, don’t take the Transfinite Line; you’ll never get there. Instead, shell out the extra money to take a cab.

P.S. You may notice that this problem is not exactly analogous to the gold coins problem in the following sense. In this problem, the passenger who gets off at a given station must have gotten on at a strictly earlier station, whereas, in the gold coins problem, on a given night, the demon is allowed to remove a coin that it placed on that night. Indeed, if the rules of the subway were revised so that, at each station, the passengers boarded the train before one passenger (possibly one who had just gotten on) disembarked, then we are in the same indeterminate situation we were in with the gold coins. (Of course, this would go against standard subway etiquette.) However, there still is a real difference between the gold coin scenario and this one. If we restrict the demon so that it can only remove coins that were placed on previous nights, we have not actually changed anything and are still in an indeterminate case (Exercise: Why?). The mathematical difference responsible for this is that, though functions from $\omega_1$ to $\omega_1$ necessarily have closure points, functions from $\omega$ to $\omega$ might not.

On demons and gold coins

One of the features of infinite sets (in the presence of the usual axioms of set theory, this can in fact be taken as the defining feature) is that an infinite set has proper subsets of the same cardinality. For example, the set of even natural numbers is a proper subset of the set of all natural numbers, yet the two sets have the same cardinality. (This feature of infinite sets is responsible for the following fact about infinite limits that you likely learned in high school calculus: $\infty - \infty$ is on the famous list of ‘indeterminate forms.’ If you try to evaluate an infinite limit and obtain $\infty - \infty$, then you do not yet have an answer. You must be more clever.) The same is obviously not true of finite sets; if you take an element away from a finite set, it always becomes strictly smaller.

For this and other reasons, our well-developed intuitions about finite sets, which we encounter every day, often fail to extend to the realm of the infinite. This week, we will look at infinite situations in which our intuition fails us. And in each case, the root of this failure of intuition will be the fact that infinite sets have proper subsets of the same cardinality. Today, we will look at the countably infinite (i.e. at infinite sets of the same cardinality as the set of natural numbers). On Thursday, we will move to the uncountable, which will require us to develop some more mathematical sophistication.

I must first make an obligatory mention of Hilbert’s Hotel, David Hilbert’s famous thought experiment about a hotel with infinitely many rooms that has no vacancy but always has room for more guests. Hilbert’s Hotel is one of those subjects that has been dealt with well by many people, so I’ll let TED-Ed and Jeff Dekofsky do my work for me.

What I really want to share today, though, is a thought experiment brought to my attention by Miklós Erdélyi-Szabó, the professor of my mathematical logic class during the semester I spent in Budapest. The situation is as follows. You have a room. Every night, while you are asleep, a demon comes into your room. First, the demon puts two gold coins into your room. Next, the demon removes one gold coin from your room. Then he leaves. Otherwise, he does nothing, and the gold coins are otherwise untouched and, unless removed by the demon one night, allowed to accumulate in your infinitely expandable room. How many gold coins are in your room after infinitely many nights? (Where ‘infinitely many’ here means what you think it means: a countably infinite sequence of nights whose temporal order is the same as that of the natural numbers.)

Your first thoughts are probably roughly like this: at the end of each night, you have one more gold coin than you had when the night started. Therefore, at the end of infinitely many nights, you must have infinitely many coins. Upon further thought, though, you might begin to feel uneasy. You might realize that what you’re dealing with is something like $\infty - \infty$ (the demon has added infinitely many coins to your room but also taken infinitely many away) and conclude that the situation is indeterminate. And here you would be right. In fact, in my statement of the problem, I have not given you enough information to determine the answer.

Let’s reformulate the problem, being a bit more careful. In fact, let’s formulate the problem in two different ways, using our old friends Ada and Bertrand, who we last saw playing an infinite round of Hypergame. Ada and Bertrand are neighbors, and they live around the corner from the Hilbert Hotel. Every night, a demon comes into each of their rooms, and, in each one, first adds two gold coins to the room and then takes one gold coin away and destroys it. This demon is tidy, though, and, in each room, places all of the gold coins in a single stack. Once the coins have been put in the stack, their relative order never changes as long as they remain in the room. In Ada’s room, every night the demon puts two gold coins on top of the stack and them removes a coin from the top of the stack. In Bertrand’s room, the demon puts two gold coins on top of the stack and then removes a coin from the bottom of the stack. After infinitely many nights, how many gold coins do Ada and Bertrand each have?

At first glance, Ada and Bertrand seem to be in exactly the same situation. But more careful analysis will reveal that, counter-intuitively, their fortunes diverge wildly after infinitely many nights.

To allow us to think about the problem, let’s fix some notation. Suppose that the coins the demon adds to each room are numbered (mentally, if not physically). Suppose the midnight visits start on Night 0 and, on Night $n$, where $n$ is any natural number, the demon adds Coins $2n$ and $2n+1$ to the top of the stack in each room, with Coin $2n$ put on the stack first and Coin $2n+1$ put on top of it. Thus, on Night 0, Coins 0 and 1 are added to each room; on Night 1, Coins 2 and 3 are added to each room, and so on.

Let’s first consider the case of Ada. On Night $n$, after Coins $2n$ and $2n+1$ are added to her stack, the top coin of the stack is removed. The coin that is removed on Night $n$, therefore, is Coin $2n+1$. The coins that get removed from Ada’s stack, then, are exactly the odd-numbered coins. The even-numbered coins stay forever. After infinitely many nights, Ada still has all of the even-numbered coins; in particular, she has infinitely many coins.

Now turn to Bertrand. Let’s think about the first few nights. On Night 0, Coins 0 and 1 get added. Coin 0 is then the bottom coin on the stack, so it gets removed. On Night 1, Coins 2 and 3 get added. The bottom coin on the stack after this, though, is Coin 1, so Coin 1 gets removed. Similarly, on Night 2, Coins 4 and 5 get added and Coin 2 gets removed. You should see an obvious pattern here, and indeed it’s easy to prove that, on Night $n$, the demon always removes Coin $n$ from Bertrand’s room. I thus claim that there are no coins left in Bertrand’s room after infinitely many days. For how could there be any? If there is even a single coin in Bertrand’s room, it would have to have a number. Suppose this number is $m$. But now you have an immediate contradiction, since Coin $m$ was removed from Bertrand’s room and destroyed on Night $m$! Even though Bertrand’s stack of coins was steadily growing throughout the process and is at all times the same size as Ada’s, in the end he is left with nothing.

P.S. The story has a happy ending, even for Bertrand. During this whole process, Ada had a lot of time to spend on number theory (Bertrand just played tennis). At the end, feeling sorry that Bertrand’s fortune vanished, she generously(?) told him that he could have any of her remaining coins whose number cannot be expressed as the sum of two prime natural numbers. So now he has at least two gold coins and knows the answer to Goldbach’s conjecture!

.

Aristotle’s Wheel, Galileo, and the Jesuits

Today, we look at another classical paradox: Aristotle’s wheel. The paradox was introduced in the text Mechanica, attributed, not without controversy, to Aristotle. It runs as follows. Consider two circular wheels, fixed rigidly, one within the other. The wheels have the same center, but the radius of the outer wheel is twice that of the inner wheel. Suppose this combined wheel rolls without slipping for exactly one full revolution, and consider the paths traced by the bottoms of the two wheels. These paths are evidently equal in length to the circumferences of the respective circles, yet the two paths are the same length, while the circumference of the outer wheel is twice that of the inner wheel. This would seem, then, to yield a contradiction.

Unlike Zeno’s paradoxes, discussed on Monday, Aristotle’s wheel contains no real mystery today. Some combination of the following two observations should be enough to convince you of this.

1. It is physically impossible for the two joined wheels to roll without at least one of them “slipping” relative to the ground. Therefore, there is no reason to think that the paths traced by the bottoms of the wheels are equal in length to the circumferences of the respective wheels.
2. Even though the length of one of the paths may not be equal to the circumference of the wheel that creates it, the set consisting of the points on the path and the set consisting of the points on the circumference of the wheel have the same cardinality, so there is no contradiction in there being a one-to-one correspondence between points on the circumference of the wheel and points on the path.

If this paradox can be resolved in such a straightforward manner, you may be wondering why I decided to dedicate an entire post to it. The reason is that Aristotle’s wheel plays a significant role in Galileo’s last published work, the influential Discourses and Mathematical Demonstrations Relating to Two New Sciences, and Galileo’s solution to the paradox is both remarkable in its own right and influential in the history of mathematics and physics.

To attack the problem of Aristotle’s wheel, Galileo makes a move that had been common at least since ancient Greece: reasoning about circles by approximating them with regular polygons. To illustrate his ideas, Galileo considers the case in which the two wheels are not circular but rather are regular hexagons. He then considers what happens when this hexagonal wheel “rolls” (or, rather, lurches in six discrete steps) along the ground for one full revolution. The situation is illustrated in the diagram below.

Consider first the outer hexagonal wheel. Initially, the wheel is at rest, with side AB resting on the ground. When the wheel makes its first step in its “roll,” it pivots around point B, and side BC comes to rest on the ground, occupying the segment BQ. After the second step, side CD comes to rest on the segment QX, and so on. Through the course of the wheel’s revolution, the entire segment AS is thus covered successively by sides of the outer wheel. Therefore, the length of the segment AS is equal to the perimeter of the outer wheel.

Now consider the inner hexagonal wheel. Initially, side HI is resting on an initial segment of HT. After the first step of the revolution, though, side IK does not come to rest on the segment IO but rather “jumps ahead” and lands on the segment OP. Similarly, after the second step, side KL jumps across the segment PY to land on the segment YZ, and so on. Therefore, the parts of the segment HT that are covered by sides of the inner wheel during the revolution alternate with parts that are skipped over. This explains why AS is the same length as HT (or, rather, HT extended by a bit equal in length to one of the sides of the inner wheel, as shown in the diagram) while the perimeter of the outer wheel is twice that of the inner wheel.

The same situation holds for polygonal wheels of any number of sides. Thus, if, for example, the wheels are regular 100,000-gons, then the lower path will be entirely covered by the sides of the outer wheel in its revolution, while the upper path will be split into 200,000 equal pieces, and these pieces will alternately be covered or skipped over by the sides of the inner wheel in its revolution. Put another way, the path traced by the bottom of the inner wheel will consist of 100,000 pieces, each the length of one of the sides of the wheel, interspersed with 100,000 “voids” of equal length. The path traced by the outer wheel will have no such voids. As a polygonal wheel gets more and more sides, though, it more and more closely approximates a circle (a circle could even be seen as a regular polygon with infinitely many infinitely short sides), so, Galileo argues, a similar situation must hold in the case of circular wheels.

I will let Galileo explain this idea in his own (translated) words:

Let us return to the consideration of the above mentioned polygons whose behavior we already understand. Now in the case of polygons with 100000 sides, the line traversed by the perimeter of the greater, i. e., the line laid down by its 100000 sides one after another, is equal to the line traced out by the 100000 sides of the smaller, provided we include the 100000 vacant spaces interspersed. So in the case of the circles, polygons having an infinitude of sides, the line traversed by the continuously distributed [cantinuamente disposti] infinitude of sides is in the greater circle equal to the line laid down by the infinitude of sides in the smaller circle but with the exception that these latter alternate with empty spaces; and since the sides are not finite in number, but infinite, so also are the intervening empty spaces not finite but infinite. The line traversed by the larger circle consists then of an infinite number of points which completely fill it; while that which is traced by the smaller circle consists of an infinite number of points which leave empty spaces and only partly fill the line. And here I wish you to observe that after dividing and resolving a line into a finite number of parts, that is, into a number which can be counted, it is not possible to arrange them again into a greater length than that which they occupied when they formed a continuum [continuate] and were connected without the interposition of as many empty spaces. But if we consider the line resolved into an infinite number of infinitely small and indivisible parts, we shall be able to conceive the line extended indefinitely by the interposition, not of a finite, but of an infinite number of infinitely small indivisible empty spaces.

In essence, what Galileo is saying is this: the reason that the paths traced by the bottoms of the wheels can be the same length is that the path traced by the inner wheel consists of infinitely many points interspersed with infinitely many infinitely small empty spaces, while the path traced by outer wheel consists only of the points and not the empty spaces!

One should not let the fact that Galileo’s solution is, by modern standards, misguided at best detract from its remarkable inventiveness or take away from the tremendous influence it and related ideas have had on mathematics and science. The ideas in Galileo’s exposition of Aristotle’s wheel are central to the development, also in the Discourses, of his celebrated law of free fall, which states that if an object starts moving from rest with uniform acceleration (as does (approximately) an object in free fall), then the distance traveled by the object is proportional to the square of the time during which it is moving. This law, commonplace today to any physics student, was groundbreaking in its time and anticipated Newton’s famous laws of motion. The Discourses also appeared at the beginning of a period of renewed interest in the question of the composition of the continuum and a revolution ushered in by the increasing acceptance of the use of infinitesimal quantities in mathematics. Galileo’s work on infinitesimals was extended and refined through the 17th century by mathematicians such as Cavalieri, Torricelli, and Wallis, who paved the way for the development of the infinitesimal calculus at the end of the century by Newton and Leibniz.

Like any radical idea, the mathematical use of infinitesimals was not immediately and universally accepted. In fact, there was strong opposition to the idea, most prominently (though certainly not exclusively) from the Catholic Church and especially the Jesuits, who were seeking to reestablish order and hierarchy in the wake of the chaos brought about by the Reformation. In the 17th century, the use of infinitesimals had not been provided with a rigorous mathematical foundation, and paradoxes frequently arose from their indiscriminate application. This was seen as threatening to the status of mathematics, as exemplified by Euclidean geometry, as an orderly realm of absolute certainty and, in some circles, as a model for theology and indeed for society as a whole. As Amir Alexander asserts in his book Infinitesimal: How a Dangerous Mathematical Theory Shaped the Modern World:

The infinitely small was a simple idea that punctured a great and beautiful dream: that the world is a perfectly rational place, governed by strict mathematical rules … By demonstrating that reality can never be reduced to strict mathematical reasoning, the infinitely small liberated the social and political order from the need for inflexible hierarchies.

I suspect that Alexander may be exaggerating the importance of the controversy over infinitesimals in the social history of Europe, but there is no doubt that the Church came down strongly against them. Many times through the first half of the 17th century, the Jesuit Revisors, who were in charge of determining what could or could not be taught in the many Jesuit colleges throughout the world and therefore, in effect, what ideas would be endorsed or condemned by the Catholic Church, issued rulings against the doctrine of infinitesimals. For example, the following is from such a ruling in 1632:

We consider this proposition to be not only repugnant to the common doctrine of Aristotle, but that it is by itself improbable, and … is disapproved and forbidden in our Society.

Finally, in 1651, the Revisors published an official list of 65 forbidden philosophical theses, including no fewer then four forbidden theses regarding infinitesimals:

25. The succession continuum and the intensity of qualities are composed of sole indivisibles.

26. Inflatable points are given, from which the continuum is composed.

30. Infinity in multitude and magnitude can be enclosed between two unities or two points.

31. Tiny vacuums are interspersed in the continuum, few or many, large or small, depending on its rarity or density.

And now we have made a full revolution back to Aristotle’s wheel, as item 31 is precisely the theory of the continuum that Galileo developed in order to explain the paradox: the path traced by the inner wheel is the same length as the path traced by the outer wheel precisely because the “tiny vacuums” interspersed in the path of the inner wheel are larger than those in the path of the outer wheel.

Eventually, of course, infinitesimals became widely accepted in mathematics and have even been given rigorous foundations that do away with the paradoxes that plagued their early use. Even though Galileo’s solution to Aristotle’s wheel did not last, the ideas it helped usher in transformed mathematics and remain with us to this day.

Hypergame and Cantor’s Theorem, Part II

As promised on Monday, we present here William Zwicker’s proof of Cantor’s Theorem. But first: Let’s play Hypergame!

Hypergame

We will be concerned with two-player games, where the players, who we will name Ada and Bertrand, alternate making plays. Let us say that a game is finite if it is guaranteed to end after a finite number of moves. For example, because of the 50-move rule, under which a game is a draw if 50 moves elapse in which no pawn is advanced and no piece is captured, chess is a finite game. In fact, there is a fixed upper bound (6000 is certainly enough) on the number of moves in a single chess game, but this is not true of all finite games. Consider the game in which the first player names a natural number, $N$, and the players then play $N$ games of chess. This game is guaranteed to end after finitely many moves ($6000 \cdot N$ is an upper bound), but there is no fixed upper bound before the game starts, as the first player could simply choose a larger number as $N$.

We are now ready to define Hypergame. The rules are as follows. On her first move, the first player names a finite game. The players then play a round of that game, beginning with the second player. For example, a round of Hypergame between Ada and Bertrand might go something like this:

Bertrand: f3

Bertrand: g4

Now for the crucial question: Is Hypergame a finite game?

Well, yes, obviously. One move of Hypergame consists of naming a finite game. The rest of the moves consist of a round of that finite game, which must, by definition, take only a finite number of moves. One plus a finite number is a finite number, so Hypergame must end after a finite number of moves and is thus a finite game.

But now, armed with the knowledge the Hypergame is a finite game, Ada and Bertrand may play the following round of Hypergame:

Bertrand: Let’s play Hypergame!

Bertrand: Let’s play Hypergame!

This is an infinite (and rather boring) round of Hypergame! But didn’t we prove in the previous paragraph that Hypergame is a finite game? What has gone wrong here? I’ll leave you to ponder this instructive question, pointing out that we haven’t really given a formal definition for a “game.” If you try to make this argument more rigorous by providing such a formal definition, does Hypergame actually qualify as a game?

Proof of Cantor’s Theorem

We now turn to Zwicker’s proof of Cantor’s Theorem, where the ideas from the Hypergame paradox will be put to good use. Recall that Cantor’s Theorem states that, for every set $X$, $|X| < |\mathcal{P}(X)|$. (Recall also that a review of relevant mathematical definitions is given here.)

Suppose Cantor’s Theorem is false and that there is a set $X$ such that $|X| \geq |\mathcal{P}(X)|$. This means that there is a function $f:X \rightarrow \mathcal{P}(X)$ that is onto, i.e. a function $f$ such that, for every $Y \in \mathcal{P}(X)$, there is $x \in X$ such that $f(x) = Y$. We will use this function to derive a contradiction, thus showing that such a function cannot exist and proving Cantor’s Theorem.

We first need a definition. Consider a sequence $\langle x_0, x_1, x_2, \ldots \rangle$ (the sequence could be finite or infinite) consisting of elements of $X$. We say this sequence is a sequence through $f$ if, for every element $x$ of the sequence, the next element of the sequence, if it exists, comes from $f(x)$, i.e. $x_1 \in f(x_0), x_2 \in f(x_1)$, etc.

As an example, suppose $X = \mathbb{N}$ and we have $f(3) = \{2, 17, 9\}$, $f(9) = \{3, 13, 8\}$, and $f(8) = \emptyset$. Then $\langle 3, 9, 8 \rangle$ is a (finite) sequence through $f$. If $f(4) = \{5\}$ and $f(5) = \{4\}$, then $\langle 4, 5, 4, 5, 4, \ldots \rangle$ is an infinite sequence through $f$.

Back to our general case. We say an element $x \in X$ is infinitary if there is an infinite sequence through $f$ starting with $x$. $x$ is finitary if there is no such infinite sequence. Let $Y$ be the set of all finitary elements of $X$. $Y \in \mathcal{P}(X)$, so, since $f$ is onto, there is $x \in X$ such that $f(x) = Y$.

Is $x$ finitary? Well, yes. This is the same as the argument that Hypergame is a finite game: Suppose $(x_0, x_1, x_2, \ldots)$ is a sequence through $f$ with $x_0 = x$. Then $x_1 \in f(x)$ and $f(x)=Y$, so, by our definition of $Y$, we know that $x_1$ is finitary. Since $(x_1, x_2, \ldots)$ is a sequence through $f$ starting with $x_1$ and $x_1$ is finitary, $(x_1, x_2, \ldots)$ must be a finite sequence. But then $(x_0, x_1, x_2, \ldots)$ must also be finite, as it has just one more element. There are thus no infinite sequences through $f$ starting with $x$, so $x$ is finitary.

But now we have a problem. In fact, it is the same problem we had with Hypergame. Since $x$ is finitary, we must have $x \in Y$, by the definition of $Y$. But this means that $(x, x, x, x, \ldots)$, the infinite sequence, all of whose elements are $x$, is an infinite sequence through $f$ starting with $x$. This contradicts the fact that $x$ is finitary and completes our proof.

P.S. Hypergame and Zwicker’s proof of Cantor’s Theorem were brought to my attention by Raymond Smullyan’s excellent book, Satan, Cantor and Infinity, which I would highly recommend to all readers of this blog.

Dreaming of Infinity

Infinity has been with me for most of my life. Of course, there were the games at school, competing with friends to use larger and larger numbers to quantify how much each of us liked a certain shared object of affection. Inevitably, one of us would boldly say “infinity”, after which one of two things would happen. The other might respond with “infinity plus one” and the game would essentially restart, with the additional rule that every play must be prefixed by the phrase “infinity plus.” Or the other might say, “Infinity’s not a number!” The teacher would then be summoned to resolve our dispute, usually with disappointingly unenlightening results.

But there was more than just this. As a child, maybe seven years old, lying awake at night, the darkness of my bedroom impelled my thoughts to the ends of the universe and to the timeless question, “Does the world go on forever?” I was bewildered. Trying to imagine either an ‘edge’ to the universe (what’s on the other side of the edge?) or a universe of infinite expanse, containing an infinite number of people like me thinking these same thoughts at the same time, filled me with a thrilling and irresistible terror. Night after night, I was drawn to this question the way we are drawn to peek ever further over the edge of a precipice.

Later, in a high school calculus class, I became further absorbed by the strangeness of infinity. One day, the teacher presented us with two puzzling scenarios. In the first, he simply drew the line segment

$y = 2x$     $0 \leq x \leq 1$

which, by establishing a one-to-one correspondence, shows that there are exactly the same number of real numbers between 0 and 2 as there are between 0 and 1, conflicting with our naive intuition that there should in fact be twice as many.

The second involved Gabriel’s Horn, the surface obtained by rotating the curve

$y = \frac{1}{x}$     $1 \leq x < \infty$

about the x-axis. He demonstrated that this surface (together with a circular “cap” at x = 1) encloses a region with finite volume but infinite surface area. Therefore, he claimed, the amount of paint that could fit inside this region would, absurdly, not be enough to coat its interior surface!

Both of these apparent paradoxes are (to a certain extent, at least) easily resolvable, but I was hooked and quickly moved on to other mysteries of the infinite. As a freshman in college, I subjected at least three of my suitemates to impromptu expositions of Cantor’s diagonal argument for the uncountability of the set of real numbers. As a graduate student, I decided to focus on set theory, often described as the mathematical study of infinity. Just as when I was a kid, I now lie in bed dreaming about infinity, though in a more rigorous, technical, and focused manner. Here, though, I want to move back a bit and take in a broader perspective. I want to look at infinity from different angles, through different lenses. Mathematical lenses, to be sure, but also philosophical, artistic, literary, historical, and scientific ones. And I want to share the view with others. I hope you will join me.