The transfinite subway

Today, we move from the countable to the uncountable and encounter another counter-intuitive infinite thought experiment. I first heard this one in a bar in Toronto in 2012, though it’s been around for much longer.

The math in this post is not extraordinarily difficult, but it is substantially more sophisticated than anything we’ve done here before. There is a new page, Ordinals and Cardinals, that contains some relevant definitions and facts. In particular, you will need to know what ordinals are and that \omega_1 is the least uncountable ordinal. We will also need the following fact:

  • Any countable union of countable sets is itself countable. As a consequence, if A is a countable subset of \omega_1 (i.e. A is a countable set of countable ordinals), then A is bounded below \omega_1 in the sense that there is \beta < \omega_1 such that, for every \alpha \in A, \alpha < \beta. This is illustrated in the following crude drawing.


Without further ado, down to the subway tracks.

There’s a subway line (the Transfinite Line) running from the airport to the Hilbert Hotel. The stations are labeled in order by ordinal numbers, with the airport at Station 0 and the Hilbert Hotel at Station \omega_1. The Transfinite Line has other peculiarities in addition to its transfinite length. On each run it makes from the airport to the hotel, the following rules must be followed. First, before leaving the airport, countably infinitely many passengers (i.e. one for each natural number) get on the train. Upon arriving at each later station prior to the Hilbert Hotel, two things happen:

  • First, if there are any passengers on the train when it arrives at the station, exactly one of them gets off. This passenger cannot board this train again during this run, either at this or any later station.
  • Second, countably infinitely many new passengers get on the train.

The question is simply this: How many passengers are on the train when it pulls into the Hilbert Hotel at Station \omega_1? Take a few moments to consider this.

A disappointingly finite subway line.

I imagine a typical thought process when first encountering this problem might be roughly as follows. First, you probably think, ‘There are so many more people getting on the train than there are getting off at each station, so there must be lots of people on the train, probably uncountably many, when the train arrives at the Hilbert Hotel.’ Thinking a bit more, you might notice that this problem bears a certain similarity to the problem discussed here on Monday about gold coins. You might realize that we haven’t specified which passenger gets off the train when the train arrives at a station, and you might suspect that, as was the case with the gold coins, the answer can vary depending on this choice. You might thus argue that we have not given enough information to determine the answer.

This is a tempting thought, but, somewhat surprisingly, it is incorrect. There is a definite answer to the question, and it is not dependent on the choice of which passenger gets off at each station. To see this, though, we’re going to have to do a bit of work. We first need some facts about functions from \omega_1 to \omega_1.

Definition: Suppose f:\omega_1 \rightarrow \omega_1 is a function. An ordinal \beta < \omega_1 is called a closure point for f if, for every \alpha < \beta, we also have f(\alpha) < \beta.

Lemma: Suppose f:\omega_1 \rightarrow \omega_1 is a function, and suppose \alpha < \omega_1. Then there is an ordinal \beta such that \alpha < \beta < \omega_1 and \beta is a closure point for f.

Proof: We will define an infinite increasing sequence of countable ordinals \alpha_0 < \alpha_1 < \alpha_2 < \ldots, one for each natural number, as follows. First, let \alpha_0 = \alpha. Next, consider the set A_0 = \{f(\xi) \mid \xi < \alpha_0\}, i.e. A_0 is the set of all values f takes when applied to ordinals less than \alpha_0. Since \alpha_0 is a countable ordinal, A_0 is a countable set. Therefore, A_0 is bounded below \omega_1, i.e. we can find \gamma_0 < \omega_1 such that, for all \eta \in A_0, \eta < \gamma_0. Choose \alpha_1 < \omega_1 to be large enough so that \alpha_1 is larger than both \alpha_0 and \gamma_0.

To define \alpha_2, repeat the process, but with \alpha_0 replaced by \alpha_1. More precisely, let A_1 = \{f(\xi) \mid \xi < \alpha_1\} and choose \alpha_2 < \omega_1 large enough so that \alpha_2 > \alpha_1 and, for all \eta \in A_1, \alpha_2 > \eta. Continue in this way to define \alpha_n for every natural number n.

At the end of this process, let \beta be the least ordinal such that \beta > \alpha_n for every natural number n. Since there are only countably many \alpha_ns, we know that \beta < \omega_1. It is clear that \alpha < \beta. I claim that \beta is a closure point for f. To see this, we must show that, for every \xi < \beta, f(\xi) < \beta. So let \xi < \beta. Since \beta is the least ordinal greater than all of the \alpha_ns, there is some m < \omega such that \xi < \alpha_m. But then f(\xi) \in A_m and, when we defined \alpha_{m+1}, we made sure that \alpha_{m+1} > \eta for every \eta \in A_m. Therefore, f(\xi) < \alpha_{m+1} < \beta, so \beta is indeed a closure point for f. This completes the proof of the Lemma. \square
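For readers who like to see the construction at a glance, here is a compact summary in symbols. The proof only asks that each \alpha_{n+1} be "large enough"; the explicit formula below is one concrete choice that works, made here purely for illustration.

```latex
\begin{align*}
  \alpha_0 &= \alpha, \\
  A_n &= \{ f(\xi) \mid \xi < \alpha_n \}, \\
  \alpha_{n+1} &= \max(\alpha_n, \sup A_n) + 1
      && \text{(so } \alpha_{n+1} > \alpha_n \text{ and } \alpha_{n+1} > \eta \text{ for all } \eta \in A_n\text{)}, \\
  \beta &= \sup \{ \alpha_n \mid n < \omega \} < \omega_1. \\[4pt]
  \xi < \beta
    &\;\Rightarrow\; \xi < \alpha_m \text{ for some } m < \omega
    \;\Rightarrow\; f(\xi) \in A_m
    \;\Rightarrow\; f(\xi) < \alpha_{m+1} < \beta.
\end{align*}
```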

Now let us return to our subway. Suppose we consider a specific run of the train. For each ordinal \alpha < \omega_1, let B_\alpha be the set of \eta < \omega_1 such that a passenger who boarded the train at Station \alpha got off the train at Station \eta. Since only countably many passengers got on at Station \alpha, B_\alpha is a countable set and therefore is bounded below \omega_1.

We now define a function f:\omega_1 \rightarrow \omega_1 as follows. For every \alpha < \omega_1, let f(\alpha) be the least ordinal \gamma such that \gamma > \eta for all \eta \in B_\alpha (i.e. f(\alpha) is the least strict upper bound for B_\alpha).

Claim: If \beta < \omega_1 is a closure point for f, then the subway was empty upon arriving at Station \beta.

Proof: Let \beta < \omega_1 be a closure point for f, and suppose that the train was not empty when it arrived at Station \beta. We will derive a contradiction.

Since the train was not empty when it arrived at the station, one passenger had to disembark at Station \beta. Therefore, there must have been some \alpha < \beta such that this passenger got on the train at Station \alpha. Therefore, there is a passenger who got on the train at Station \alpha and off at Station \beta, so \beta \in B_\alpha. By our definition of f, this means that f(\alpha) > \beta. However, \beta is a closure point for f and \alpha < \beta, which means f(\alpha) < \beta. This is a contradiction, and we have finished the proof of the claim. \square

We are now ready to reach our conclusion.

Solution: The train was empty when it arrived at the Hilbert Hotel!

Proof: Suppose the train was not empty. We will again reach a contradiction.

Since the train was not empty, we can choose a passenger who was on the train when it got to the hotel. There must be an ordinal \alpha < \omega_1 such that this passenger boarded the train at Station \alpha. But now we can invoke our Lemma and find some ordinal \beta < \omega_1 such that \alpha < \beta and \beta is a closure point for f. But now, by our previous claim, the train was empty upon arriving at Station \beta, so our passenger who got on at Station \alpha must in fact have gotten off before Station \beta and, by the rules of the subway, could not have reboarded the train. This passenger therefore could not have been on the train when it arrived at the Hilbert Hotel! \square

Comparing this with the result from the gold coins puzzle, we get our first indication that the class of infinities is not some undifferentiated mass and that different sizes of infinity can in fact behave quite differently. We will hopefully see many more such instances of this phenomenon in later posts.

The real lesson here, though, is that if you’re going to the Hilbert Hotel, don’t take the Transfinite Line; you’ll never get there. Instead, shell out the extra money to take a cab.

P.S. You may notice that this problem is not exactly analogous to the gold coins problem in the following sense. In this problem, the passenger who gets off at a given station must have gotten on at a strictly earlier station, whereas, in the gold coins problem, on a given night, the demon is allowed to remove a coin that it placed on that night. Indeed, if the rules of the subway were revised so that, at each station, the passengers boarded the train before one passenger (possibly one who had just gotten on) disembarked, then we are in the same indeterminate situation we were in with the gold coins. (Of course, this would go against standard subway etiquette.) However, there still is a real difference between the gold coin scenario and this one. If we restrict the demon so that it can only remove coins that were placed on previous nights, we have not actually changed anything and are still in an indeterminate case (Exercise: Why?). The mathematical difference responsible for this is that, though functions from \omega_1 to \omega_1 necessarily have closure points, functions from \omega to \omega might not.
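To make that last sentence concrete, here is a minimal Python sketch that searches for closure points of a function from \omega to \omega below a finite bound. The helper name closure_points and the choice of the successor function are ours, picked for illustration. We ignore 0, which is vacuously a closure point of every function.

```python
def closure_points(f, bound):
    """Return every beta with 0 < beta < bound such that
    f(alpha) < beta for all alpha < beta."""
    return [
        beta
        for beta in range(1, bound)
        if all(f(alpha) < beta for alpha in range(beta))
    ]


def successor(n):
    return n + 1


# The successor function has no positive closure point: for any beta > 0,
# alpha = beta - 1 satisfies f(alpha) = beta, which is not below beta.
print(closure_points(successor, 50))    # []

# By contrast, a constant function has plenty of them.
print(closure_points(lambda n: 0, 5))   # [1, 2, 3, 4]
```

The finite search only inspects ordinals below the bound, of course, but the comment in the code explains why no larger bound would help in the successor case.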

On demons and gold coins

One of the features of infinite sets (in the presence of the usual axioms of set theory, this can in fact be taken as the defining feature) is that an infinite set has proper subsets of the same cardinality. For example, the set of even natural numbers is a proper subset of the set of all natural numbers, yet the two sets have the same cardinality. (This feature of infinite sets is responsible for the following fact about infinite limits that you likely learned in high school calculus: \infty - \infty is on the famous list of ‘indeterminate forms.’ If you try to evaluate an infinite limit and obtain \infty - \infty, then you do not yet have an answer. You must be more clever.) The same is obviously not true of finite sets; if you take an element away from a finite set, it always becomes strictly smaller.

For this and other reasons, our well-developed intuitions about finite sets, which we encounter every day, often fail to extend to the realm of the infinite. This week, we will look at infinite situations in which our intuition fails us. And in each case, the root of this failure of intuition will be the fact that infinite sets have proper subsets of the same cardinality. Today, we will look at the countably infinite (i.e. at infinite sets of the same cardinality as the set of natural numbers). On Thursday, we will move to the uncountable, which will require us to develop some more mathematical sophistication.

I must first make an obligatory mention of Hilbert’s Hotel, David Hilbert’s famous thought experiment about a hotel with infinitely many rooms that has no vacancy but always has room for more guests. Hilbert’s Hotel is one of those subjects that has been dealt with well by many people, so I’ll let TED-Ed and Jeff Dekofsky do my work for me.

What I really want to share today, though, is a thought experiment brought to my attention by Miklós Erdélyi-Szabó, the professor of my mathematical logic class during the semester I spent in Budapest. The situation is as follows. You have a room. Every night, while you are asleep, a demon comes into your room. First, the demon puts two gold coins into your room. Next, the demon removes one gold coin from your room. Then he leaves. He does nothing else, and the gold coins are otherwise untouched and, unless removed by the demon one night, allowed to accumulate in your infinitely expandable room. How many gold coins are in your room after infinitely many nights? (Where ‘infinitely many’ here means what you think it means: a countably infinite sequence of nights whose temporal order is the same as that of the natural numbers.)

Your first thoughts are probably roughly like this: at the end of each night, you have one more gold coin than you had when the night started. Therefore, at the end of infinitely many nights, you must have infinitely many coins. Upon further thought, though, you might begin to feel uneasy. You might realize that what you’re dealing with is something like \infty - \infty (the demon has added infinitely many coins to your room but also taken infinitely many away) and conclude that the situation is indeterminate. And here you would be right. In fact, in my statement of the problem, I have not given you enough information to determine the answer.

Let’s reformulate the problem, being a bit more careful. In fact, let’s formulate the problem in two different ways, using our old friends Ada and Bertrand, whom we last saw playing an infinite round of Hypergame. Ada and Bertrand are neighbors, and they live around the corner from the Hilbert Hotel. Every night, a demon comes into each of their rooms, and, in each one, first adds two gold coins to the room and then takes one gold coin away and destroys it. This demon is tidy, though, and, in each room, places all of the gold coins in a single stack. Once the coins have been put in the stack, their relative order never changes as long as they remain in the room. In Ada’s room, every night the demon puts two gold coins on top of the stack and then removes a coin from the top of the stack. In Bertrand’s room, the demon puts two gold coins on top of the stack and then removes a coin from the bottom of the stack. After infinitely many nights, how many gold coins do Ada and Bertrand each have?

Take a minute to think about this problem now, and then come back for the solution.


At first glance, Ada and Bertrand seem to be in exactly the same situation. But more careful analysis will reveal that, counter-intuitively, their fortunes diverge wildly after infinitely many nights.

To allow us to think about the problem, let’s fix some notation. Suppose that the coins the demon adds to each room are numbered (mentally, if not physically). Suppose the midnight visits start on Night 0 and, on Night n, where n is any natural number, the demon adds Coins 2n and 2n+1 to the top of the stack in each room, with Coin 2n put on the stack first and Coin 2n+1 put on top of it. Thus, on Night 0, Coins 0 and 1 are added to each room; on Night 1, Coins 2 and 3 are added to each room, and so on.

Let’s first consider the case of Ada. On Night n, after Coins 2n and 2n+1 are added to her stack, the top coin of the stack is removed. The coin that is removed on Night n, therefore, is Coin 2n+1. The coins that get removed from Ada’s stack, then, are exactly the odd-numbered coins. The even-numbered coins stay forever. After infinitely many nights, Ada still has all of the even-numbered coins; in particular, she has infinitely many coins.

Now turn to Bertrand. Let’s think about the first few nights. On Night 0, Coins 0 and 1 get added. Coin 0 is then the bottom coin on the stack, so it gets removed. On Night 1, Coins 2 and 3 get added. The bottom coin on the stack after this, though, is Coin 1, so Coin 1 gets removed. Similarly, on Night 2, Coins 4 and 5 get added and Coin 2 gets removed. You should see an obvious pattern here, and indeed it’s easy to prove that, on Night n, the demon always removes Coin n from Bertrand’s room. I thus claim that there are no coins left in Bertrand’s room after infinitely many nights. For how could there be any? If there is even a single coin in Bertrand’s room, it would have to have a number. Suppose this number is m. But now you have an immediate contradiction, since Coin m was removed from Bertrand’s room and destroyed on Night m! Even though Bertrand’s stack of coins was steadily growing throughout the process and is at all times the same size as Ada’s, in the end he is left with nothing.
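If you would like to watch the first few nights unfold, here is a small Python sketch of the demon's routine. It can only run finitely many nights, so it illustrates the pattern rather than proving anything about the limit. The function name simulate and its parameters are our own invention.

```python
from collections import deque


def simulate(nights, remove_from_top):
    """Run the demon's routine for finitely many nights.

    Returns (remaining, removed): the coins still in the stack, listed
    from bottom to top, and the removed coins in order of removal.
    """
    stack = deque()
    removed = []
    for n in range(nights):
        stack.append(2 * n)      # Coin 2n is put on the stack first
        stack.append(2 * n + 1)  # Coin 2n+1 is put on top of it
        if remove_from_top:
            removed.append(stack.pop())      # Ada: take the top coin
        else:
            removed.append(stack.popleft())  # Bertrand: take the bottom coin
    return list(stack), removed


ada_left, ada_removed = simulate(5, remove_from_top=True)
bertrand_left, bertrand_removed = simulate(5, remove_from_top=False)

print(ada_removed)                         # [1, 3, 5, 7, 9]
print(bertrand_removed)                    # [0, 1, 2, 3, 4]
print(len(ada_left), len(bertrand_left))   # 5 5
```

After any finite number of nights the two stacks have the same size. The difference is in which coins are gone: Ada never loses an even-numbered coin, while every coin Bertrand receives has a definite night on which it disappears.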

P.S. The story has a happy ending, even for Bertrand. During this whole process, Ada had a lot of time to spend on number theory (Bertrand just played tennis). At the end, feeling sorry that Bertrand’s fortune vanished, she generously(?) told him that he could have any of her remaining coins whose number cannot be expressed as the sum of two prime natural numbers. So now he has at least two gold coins and knows the answer to Goldbach’s conjecture!


Forcing Axioms and Ultimate L

Mathematics has a reputation for objectivity. But without real-world infinite objects upon which to base abstractions, mathematical truth becomes, to some extent, a matter of opinion — which is Simpson’s argument for keeping actual infinity out of mathematics altogether. The choice between V=ultimate L and Martin’s maximum is perhaps less of a true-false problem and more like asking which is lovelier, an English garden or a forest?

-Natalie Wolchover, “To Settle Infinity Dispute, a New Law of Logic”

Today, a little taste of contemporary set theory, the type of mathematics I actually work on. I point you towards Natalie Wolchover’s 2013 piece for Quanta Magazine about some recent work in set theory, particularly relating to the Continuum Hypothesis. The piece isn’t perfect, and certain things are a bit caricatured, but overall it’s pretty nice work.

Infinity in the Classroom

Although it seems to be one of the most confounding things in mathematics, infinity can be a gateway drug to deeply personal mathematical experiences. It connects instantly to big, personal questions about life and death, power and control, the beginning of time and the end of the Universe. All of that makes infinity a great focal point for improving Common Core, and for sharpening mathematical literacy in the schools more generally. But really, everyone would benefit from more close encounters of the infinite kind.

-Sarah Scoles, “How thinking about infinity changes kids’ brains on math”

I’m on vacation this week, so there will be no new original material on Point at Infinity. Instead, I will point you towards a couple of interesting things other people have written.

Today, we look at a piece written for Aeon Magazine by Sarah Scoles about how thinking about infinity can lead to real experiential engagement with mathematics, something unfortunately rare in many classrooms. Enjoy!

Points at Infinity II: Topology

On Monday, we investigated the appearance of points at infinity in projective geometry. Today, we turn our attention to a different field of mathematics: topology.

Topology is a favorite field among writers of popular mathematics. It is often described as rubber-sheet geometry, a field in which a donut is the same thing as a coffee mug, a field in which mathematicians think about such fanciful objects as the Möbius strip or the Klein bottle. This is fine, but here we’re going to look at the basics of topology from a more down-to-earth vantage point.

Klein bottle

Topology is the study, unsurprisingly, of topological spaces. And what is a topological space? It is a set X together with a set, often called \tau, of distinguished subsets of X satisfying certain rules. \tau is then called a topology on X, and the elements of \tau are called open sets. The rules that such a pair (X, \tau) must satisfy in order to earn the name topological space are as follows.

  1. X \in \tau and \emptyset \in \tau.
  2. Any union of (possibly infinitely many) elements of \tau is in \tau.
  3. If Y_0 and Y_1 are in \tau, then their intersection, Y_0 \cap Y_1, is in \tau.

To illustrate this definition, we consider two examples that you are already familiar with, whether or not you know it. First, let X be any set, and let \tau consist of all subsets of X, i.e. \tau = \mathcal{P}(X). (X, \tau) easily satisfies the three axioms listed above and is known as the discrete topology on X.
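On a finite set, the three axioms can be checked mechanically. The following Python sketch does this. The function names is_topology and powerset are ours, and the check of axiom 2 relies on the fact that, for a finite family, closure under pairwise unions already gives closure under arbitrary unions.

```python
from itertools import combinations


def powerset(xs):
    """Return the set of all subsets of xs, each as a frozenset."""
    xs = list(xs)
    return {frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)}


def is_topology(X, tau):
    """Check the three axioms for a finite set X and a finite family tau."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    if any(not U <= X for U in tau):          # open sets must be subsets of X
        return False
    if X not in tau or frozenset() not in tau:  # axiom 1
        return False
    for U in tau:
        for V in tau:
            if U | V not in tau:              # axiom 2 (pairwise suffices here)
                return False
            if U & V not in tau:              # axiom 3
                return False
    return True


X = {0, 1, 2}
print(is_topology(X, powerset(X)))                    # True (discrete topology)
print(is_topology(X, [set(), {0}, {1}, {0, 1, 2}]))   # False: {0, 1} is missing
```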

Second, consider the set of real numbers, \mathbb{R}. You probably learned in elementary school that, if a < b are real numbers, then (a,b) is the set of all real numbers x such that a < x < b, and (a,b) is called the open interval between a and b. We can also consider open intervals of the form (-\infty, a), (a, \infty), and (-\infty, \infty), defined in the obvious way. We can also say that, if a is a real number, then (a,a) = \emptyset. If we now let \tau consist of all subsets of \mathbb{R} that can be written as unions of open intervals, then (\mathbb{R}, \tau) is a topological space, and \tau is often referred to as the standard or order topology on \mathbb{R}.

The axioms for topological spaces are very broad and capture a great variety of different structures. This is of course appealing, but it also makes it difficult to prove interesting theorems about arbitrary topological spaces. Therefore, topologists typically consider topological spaces with certain additional properties. One of the most important of these properties, which gradually emerged in the 19th century through the investigation of infinite sequences and continuous functions of real numbers, is compactness.

Very vaguely speaking, a topological space is compact if certain phenomena in the space can be captured by a finite amount of information. But let us be precise. First, if (X, \tau) is a topological space and \mathcal{O} is a subset of \tau (i.e. \mathcal{O} is a collection of open sets), then we say \mathcal{O} is an open cover of (X, \tau) if the union of the elements of \mathcal{O} is all of X, i.e. for all x \in X, there is Y \in \mathcal{O} such that x \in Y. We say that (X, \tau) is compact if, whenever \mathcal{O} is an open cover of the space, then there is a finite subset of \mathcal{O} that is also an open cover. Put more elegantly, (X, \tau) is compact if every open cover of X has a finite subcover.

Let us examine our two examples of topological spaces to determine whether or not they are compact. First, consider the discrete topology on the set of natural numbers, i.e. the topological space (\mathbb{N}, \tau), where \tau = \mathcal{P}(\mathbb{N}). Let \mathcal{O} be the collection of all 1-element subsets of \mathbb{N}, i.e. \mathcal{O} = \{\{0\}, \{1\}, \{2\}, \ldots \}. Then \mathcal{O} is an open cover for our space, but it has no finite subcover. Indeed, if we remove any element from \mathcal{O}, then it no longer covers our space (for example, if we remove \{3\}, then we no longer cover 3). Therefore, (\mathbb{N}, \tau) is not compact.

A similar phenomenon happens in (\mathbb{R}, \tau), where \tau is the standard topology on \mathbb{R}. In this case, let \mathcal{O} be the set of all open intervals (-n, n), where n is a natural number. Then \mathcal{O} is an open cover of (\mathbb{R}, \tau), but it has no finite subcover. Indeed, if \mathcal{O}' is any finite subset of \mathcal{O}, then there is some largest natural number n such that (-n,n) \in \mathcal{O}'. Then n is not covered by \mathcal{O}'.
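The argument in the last two sentences is really a recipe: hand me any finite subfamily of \mathcal{O}, and I hand you back a real number it misses. Here is that recipe as a tiny Python sketch, where a finite subfamily is represented by the list of its values of n. The function names are hypothetical.

```python
def is_covered(x, ns):
    """Is x in the union of the intervals (-n, n) for n in ns?"""
    return any(-n < x < n for n in ns)


def uncovered_point(ns):
    """Given a non-empty finite list ns, return a real number that lies in
    none of the intervals (-n, n) with n in ns: the largest n itself."""
    return max(ns)


finite_subfamily = [1, 5, 3]
x = uncovered_point(finite_subfamily)
print(x, is_covered(x, finite_subfamily))         # 5 False
print(is_covered(x, finite_subfamily + [6]))      # True
```

Adding one more interval covers the point we found, but then the enlarged family misses its own largest n, and so on forever.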

In both of these cases, you could say that the underlying reason that our space fails to be compact is that there is a sequence of elements of the space such that no subsequence of it converges to any element of the space. The sequence “escapes” our space. It “goes to infinity.” For both of our examples, this can be illustrated by the sequence of natural numbers 0, 1, 2, \ldots .

Compactness is a very appealing property of a topological space, and one can prove all sorts of nice things about compact spaces, so one might be interested in taking an arbitrary topological space (X, \tau) that is not compact and studying it by enlarging it to make it compact. If one enlarges it too much, though, the new space might not have anything to do with (X, \tau), so we wouldn’t be able to learn anything about (X, \tau) from this new space. We will speak imprecisely here and say that a compactification of a space (X, \tau) is a topological space that is compact and extends (X, \tau) but doesn’t extend (X, \tau) too much. (Somewhat more precisely, we require that X is dense in the new space.)

How can we compactify a space? Well, we saw that a typical reason that a space fails to be compact is that it has a sequence that escapes to infinity. Therefore, you might naively think, we should simply add points at infinity to capture these sequences. And you would be correct! (Although you would have to be a little bit more specific.)

Any space (X, \tau) has many compactifications, but two are of particular interest. There is a largest (in some sense) compactification of (X, \tau) called the Stone–Čech compactification. We will return to this at a later date. There is also a smallest compactification, the Alexandroff compactification, which consists of just adding a single point at infinity to the space.

Let us see how the Alexandroff compactification works on the space (\mathbb{N}, \tau), where \tau is the discrete topology on \mathbb{N}. We will add one point, which we will suggestively call \infty, and some new open sets containing \infty. Intuitively, we want any open set containing \infty to also contain all sufficiently large natural numbers. More precisely, the compactification will be the space (\mathbb{N} \cup \{\infty\}, \tau'), where \tau' contains all of the sets in \tau plus all sets of the form \{\infty\} \cup Y, where Y is any subset of \mathbb{N} containing all but at most finitely many natural numbers.

You can check that (\mathbb{N} \cup \{\infty\}, \tau') is indeed a topological space. Let us prove that it is compact. To do this, consider an open cover \mathcal{O} of our space. We must find a finite subcover of \mathcal{O}. We first take care of \infty. Since \mathcal{O} covers our space, it must have an element that contains \infty. Thus, it must have an element of the form \{\infty\} \cup Y, where Y contains all but finitely many natural numbers. Now, for each natural number m that is not in Y, m is also covered by \mathcal{O}, so we can choose a set U_m \in \mathcal{O} such that m \in U_m. Now consider \mathcal{O}' = \{\{\infty\} \cup Y\} \cup \{U_m \mid m \not\in Y\}. \mathcal{O}' is a finite subset of \mathcal{O}, and we claim that it still covers the entire space. Indeed, \infty and every natural number in Y are covered by \{\infty\} \cup Y, and every natural number m not in Y is covered by U_m. We have thus proven that (\mathbb{N} \cup \{\infty\}, \tau') is indeed compact!

We will not explicitly describe the Alexandroff compactification of the standard topology on the real numbers, but we mention in passing that it has an appealing intuitive description: it is equivalent (homeomorphic, to be precise) to the standard topology on the circle! Essentially, the space is compactified by taking the two ‘ends’ of the real line and using a new point (called \infty, of course), to ‘stick’ them together, thus creating a circle. Similarly, the Alexandroff compactification of the real Euclidean plane is the sphere; the identification of these two spaces involves the stereographic projection, a function that finds application not just in mathematics but in fields such as cartography, crystallography, geology, and photography (in certain fish-eye lenses).
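The one-dimensional version of the stereographic projection is easy to write down. The Python sketch below uses one common convention, which is our assumption rather than anything fixed above: the circle is the unit circle in the plane, the point (0, 1) plays the role of \infty, and each other point of the circle is sent to the spot where the line through it and (0, 1) crosses the horizontal axis.

```python
def circle_to_line(x, y):
    """Send a point (x, y) of the unit circle, other than (0, 1), to a real number."""
    return x / (1 - y)


def line_to_circle(t):
    """Inverse map: send a real number t to a point of the unit circle."""
    d = t * t + 1
    return (2 * t / d, (t * t - 1) / d)


print(line_to_circle(0.0))   # (0.0, -1.0)
print(line_to_circle(1.0))   # (1.0, 0.0)

# Large positive and large negative inputs both land near (0, 1),
# the single point at infinity that glues the two ends together.
print(line_to_circle(1e9))
print(line_to_circle(-1e9))
```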

Paris compactified: photograph taken with a stereographic fish-eye lens. By Alexandre Duret-Lutz from Paris, France – Paris s’éveille, CC BY-SA 2.0.

A similar construction will work on an arbitrary topological space (X, \tau), and this illustrates again the conceptual power of infinity: any topological space, no matter how pathologically incompact it may be, can be made compact by adding a single point at infinity!

Points at Infinity I: Projective Geometry

It would … be the number of the point at which Achilles overtakes the tortoise—if he does overtake him—by exhausting all the intervening points successively. Or it would be the number of the stars, in case their counting could not terminate. Or again it would be the number of miles away at which parallel lines meet—if they do meet. It is, in short, a ‘limit’ to the whole class of numbers that grow one by one, and like other limits, it proves a useful conceptual bridge for passing us from one range of facts to another.

-William James, Some Problems of Philosophy

This site is called ‘Point at Infinity’ in part because that is exactly what it attempts to do. Everything we experience directly in life is finite. Infinity fascinates us, but we cannot touch it. The best we can do is ‘point’ at it.

But the phrase ‘point at infinity’ also has technical mathematical meanings, and this week we will explore two of them. Today: points at infinity in projective geometry.

We start our story in Renaissance Italy with the development of the method of perspective by Brunelleschi and other painters and architects. In brief, the use of perspective allows for an accurate depiction of three-dimensional scenes on two-dimensional surfaces. The method can be imagined as follows: Consider the two-dimensional surface (a canvas, say) as a windowpane placed between the observer and the scene to be depicted. For every element of the scene, draw a line between that element and the observer’s eye. The point where that line passes through the windowpane is the point where it should appear in the two-dimensional image. (A woodcut by Albrecht Dürer depicting exactly this process is the title image to this post.)

-Pietro Perugino, The Delivery of the Keys

One of the most important realizations of the early adopters of perspective is that, although this method of transferring three-dimensional scenes onto two-dimensional surfaces does preserve the straightness of straight lines, it does not preserve the parallelism of a pair of parallel lines. In particular, if, in the three-dimensional scene, you have a set of mutually parallel lines that are not themselves parallel to the two-dimensional surface, then, when they are transferred onto the surface, they all meet in a single point. Such a point is called a vanishing point.

Parallel railroad tracks provide the quintessential example of a vanishing point. (Zorba the Geek) / CC BY-SA 2.0

The mathematics of perspective was studied on and off for the next few hundred years, but it wasn’t until the work of Gaspard Monge and his student, Jean-Victor Poncelet, in the late 18th and early 19th centuries, that it really took off and became the basis for a field known as projective geometry.

I cannot possibly give a proper introduction to projective geometry here, but that will not stop me from trying. One way to motivate the study of projective geometry is to point out some possible shortcomings of Euclidean geometry. One of these shortcomings is its occasional inelegance. Euclidean geometry proofs must often accommodate annoying ‘special cases.’ For instance, in Euclidean geometry, any two distinct lines meet in exactly one point unless the lines are parallel. A pair of distinct circles may intersect in zero, one, or two points depending on their relative size and position. And so on. Proofs involving consideration of these special cases are, of course, perfectly correct, but, for mathematicians, the elegance of a proof is often as important as its correctness. It was gradually realized that projective geometry offered certain advantages over Euclidean geometry, providing a more elegant, richer environment for investigation. Over the nineteenth century, projective geometry became the central focus of the growing field of algebraic geometry.

Like Euclidean geometry, projective geometry can be described by a list of axioms. One way to view these axioms is to think of them as the consequences of extending Euclidean geometry by requiring that every pair of distinct lines intersects in a single point. For example, the axioms for a projective plane are as follows.

  1. Given any two distinct points, there is exactly one line containing them.
  2. Given any two distinct lines, there is exactly one point on both lines.
  3. There are four points such that no line contains more than two of them.

(The third axiom is essentially present to exclude degenerate cases.) An interesting fact is that there are finite models of the projective plane axioms. The smallest and most well-known is the Fano plane, which has seven points and seven lines, each line containing three points.

The Fano plane. It is left to the reader to verify that it satisfies the projective plane axioms.
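
For readers who would rather let a computer do the verifying, here is a minimal sketch of a brute-force check. The labelling of the seven points by the numbers 1 through 7 and the particular list of lines are my own assumptions (one standard presentation of the Fano plane); the figure above may label things differently.

```python
from itertools import combinations

# One standard labelling of the Fano plane (an assumption, not taken
# from the figure): seven points, seven lines of three points each.
POINTS = range(1, 8)
LINES = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]

def axiom1(points, lines):
    """Any two distinct points lie on exactly one common line."""
    return all(
        sum(1 for line in lines if {p, q} <= line) == 1
        for p, q in combinations(points, 2)
    )

def axiom2(lines):
    """Any two distinct lines share exactly one point."""
    return all(len(l1 & l2) == 1 for l1, l2 in combinations(lines, 2))

def axiom3(points, lines):
    """Some four points have no three of them on a common line."""
    return any(
        all(len(line & set(quad)) <= 2 for line in lines)
        for quad in combinations(points, 4)
    )

print(axiom1(POINTS, LINES), axiom2(LINES), axiom3(POINTS, LINES))
```

The check is exhaustive, which is only feasible because the plane is finite: there are just 21 pairs of points, 21 pairs of lines, and 35 sets of four points to examine.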

The study of finite projective planes is a fascinating and vibrant area of mathematics, but, this being a blog about infinity, we will not say more about them. Instead let us turn to the real projective plane, denoted by \mathbb{RP}^2, which can be thought of as the projective equivalent of two-dimensional Euclidean space. We will describe the real projective plane in two ways.

First, recall that \mathbb{R}^3 denotes three-dimensional Euclidean space and consists of all triples (a,b,c) of real numbers. The points of \mathbb{RP}^2 can be thought of as the points (a,b,c) in \mathbb{R}^3 such that at least one of a, b, c is non-zero, under the additional requirement that, if (a_0,b_0,c_0) and (a_1, b_1, c_1) are two such points and there is a real number r such that (a_0, b_0, c_0) = (ra_1, rb_1, rc_1), then (a_0, b_0, c_0) and (a_1, b_1, c_1) actually denote the same point in \mathbb{RP}^2. Another way of saying this is that, if the line passing through (a_0, b_0, c_0) and (a_1, b_1, c_1) passes through the origin, then (a_0, b_0, c_0) and (a_1, b_1, c_1) are the same point in \mathbb{RP}^2. Put more elegantly, the points of \mathbb{RP}^2 are precisely the lines through the origin of \mathbb{R}^3.
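
To make the identification concrete, here is a small sketch of a test for whether two triples name the same point of \mathbb{RP}^2. It relies on a standard fact not spelled out above: two non-zero vectors in \mathbb{R}^3 are scalar multiples of one another exactly when their cross product is the zero vector. The function names are my own, and integer coordinates are used so that the arithmetic is exact.

```python
def cross(p, q):
    """Cross product of two triples in R^3."""
    a0, b0, c0 = p
    a1, b1, c1 = q
    return (b0 * c1 - c0 * b1, c0 * a1 - a0 * c1, a0 * b1 - b0 * a1)

def same_point(p, q):
    """True if the non-zero triples p and q denote the same point of RP^2,
    i.e. if they lie on a common line through the origin."""
    if not any(p) or not any(q):
        raise ValueError("(0, 0, 0) does not denote a point of RP^2")
    return cross(p, q) == (0, 0, 0)

print(same_point((1, 2, 3), (2, 4, 6)))     # scalar multiple, r = 1/2
print(same_point((1, 2, 3), (-1, -2, -3)))  # scalar multiple, r = -1
print(same_point((1, 0, 0), (0, 1, 0)))     # different lines through the origin
```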

This should, with good reason, remind one of our description of the method of perspective. Indeed, suppose that the observer’s eye is placed at the origin. Now consider two points in the three-dimensional scene. If the line containing these points also passes through the observer’s eye, then these two points will be drawn at the same place on the canvas and therefore should be thought of as the same point in \mathbb{RP}^2.

I have so far described the points but not the lines of the real projective plane. Essentially, lines in \mathbb{RP}^2 correspond to planes through the origin in \mathbb{R}^3. More precisely, if (a,b,c) is a point in \mathbb{RP}^2, then the set of all points (x,y,z) satisfying the equation ax + by + cz = 0 describes a line in \mathbb{RP}^2, and all of the lines are described in this way. Moreover, as one would hope, if (a_0, b_0, c_0) and (a_1, b_1, c_1) describe the same point in \mathbb{RP}^2, then they also determine the same line. The interested reader is left to verify that this construction satisfies the projective plane axioms.
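
As a partial check of the axioms in this model, here is a sketch that computes the line through two points and the point common to two lines. It uses a technique I have not described above, namely the cross product: the cross product of two triples is perpendicular to both, so it gives the coefficients (a,b,c) of the unique plane through the origin containing two given points and, symmetrically, the unique point lying on two given lines. The function names are hypothetical, and integer coordinates keep everything exact.

```python
def cross(p, q):
    """Cross product of two triples in R^3."""
    a0, b0, c0 = p
    a1, b1, c1 = q
    return (b0 * c1 - c0 * b1, c0 * a1 - a0 * c1, a0 * b1 - b0 * a1)

def incident(line, point):
    """True if point (x, y, z) satisfies ax + by + cz = 0 for line (a, b, c)."""
    return sum(l * p for l, p in zip(line, point)) == 0

def line_through(p, q):
    """Coefficients of the line through two distinct points of RP^2."""
    return cross(p, q)

def meet(l1, l2):
    """The point lying on two distinct lines of RP^2."""
    return cross(l1, l2)

# Two points and the line joining them.
p, q = (1, 0, 1), (0, 1, 1)
pq = line_through(p, q)
print(pq, incident(pq, p), incident(pq, q))

# Two planes whose equations read x + y = 1 and x + y = 2 at height z = 1.
# They still meet, at a point whose last coordinate is 0.
print(meet((1, 1, -1), (1, 1, -2)))
```

The same function serves for both jobs, which is a small glimpse of the symmetry between points and lines in the projective plane.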

We next offer a more intuitive description of \mathbb{RP}^2. Start with the two-dimensional Euclidean plane, \mathbb{R}^2, consisting of all pairs of real numbers. Now, for each class of parallel lines in \mathbb{R}^2 (i.e. for each possible ‘slope’, where the slope of the vertical line is considered to be \infty), add a new point at infinity. \mathbb{RP}^2 then consists of all of the points of \mathbb{R}^2 plus all of the new points at infinity. The lines of \mathbb{RP}^2 are the lines of \mathbb{R}^2 (where each such line now includes the point at infinity associated with its slope) plus a single new line that contains all of the points at infinity. The points at infinity thus represent the points where the parallel lines of Euclidean space meet. They are the vanishing points of perspective drawing.

Why are these two descriptions the same? To illustrate, I will provide a translation from the first description to the second. First, consider a point (a,b,c) from our first characterization, and suppose that c \neq 0. The line in \mathbb{R}^3 containing the origin and (a,b,c) intersects the horizontal plane lying one unit above the origin at the point (\frac{a}{c}, \frac{b}{c}, 1). We then identify the point (a,b,c) from the first description with the real point (\frac{a}{c}, \frac{b}{c}) from the second description.

What if c = 0? In this case, the line in \mathbb{R}^3 passing through the origin and (a,b,c) does not meet the horizontal plane one unit above the origin; the two are parallel. But parallel objects meet at infinity, so (a,b,c) will correspond to a point at infinity from our second description, namely the point at infinity corresponding to the slope \frac{b}{a} (or \infty, if a = 0).
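
The translation of points described in the last two paragraphs can be sketched as a short function. The function name and the tagged-tuple return format are my own choices for illustration, and the sketch assumes integer coordinates so that exact fractions can be used.

```python
from fractions import Fraction
import math

def translate_point(a, b, c):
    """Translate a point (a, b, c) of the first description (integer
    coordinates, not all zero) into the second description."""
    if a == 0 and b == 0 and c == 0:
        raise ValueError("(0, 0, 0) does not denote a point of RP^2")
    if c != 0:
        # The line through the origin meets the plane z = 1 at (a/c, b/c, 1).
        return ("real", (Fraction(a, c), Fraction(b, c)))
    if a != 0:
        # Parallel to the plane z = 1: a point at infinity with slope b/a.
        return ("infinity", Fraction(b, a))
    # a = c = 0: the point at infinity of the vertical lines.
    return ("infinity", math.inf)

print(translate_point(2, 4, 2))    # the real point (1, 2)
print(translate_point(1, 3, 0))    # point at infinity for slope 3
print(translate_point(0, 5, 0))    # point at infinity for vertical lines
```

Note that scalar multiples of a triple always produce the same answer, as they must if the translation is to be well defined.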

And the lines? Suppose (a,b,c) determines a line in our first description. If it is not the case that a=b=0, then this corresponds to one of the lines in \mathbb{R}^2 (if my hasty calculations are correct, the line y=-\frac{a}{b}x-\frac{c}{b} or, if b = 0, the vertical line x = -\frac{c}{a}). If a=b=0, then this line corresponds to the line containing the points at infinity in the second description.
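
Since the formulas above were offered with some hesitation, here is a sketch that double-checks them. It translates a line (a,b,c) into the second description and then confirms that points on the resulting Euclidean line, lifted to height z = 1, satisfy ax + by + cz = 0. The function name and return format are hypothetical, and integer coefficients are assumed so that the arithmetic is exact.

```python
from fractions import Fraction

def translate_line(a, b, c):
    """Translate a line (a, b, c) of the first description (integer
    coefficients, not all zero) into the second description."""
    if a == 0 and b == 0 and c == 0:
        raise ValueError("(0, 0, 0) does not denote a line of RP^2")
    if b != 0:
        # y = -(a/b) x - c/b, returned as (slope, intercept).
        return ("slope-intercept", Fraction(-a, b), Fraction(-c, b))
    if a != 0:
        # Vertical line x = -c/a.
        return ("vertical", Fraction(-c, a))
    # a = b = 0: the equation cz = 0 picks out exactly the points at infinity.
    return ("line at infinity",)

a, b, c = 2, 3, -6
kind, slope, intercept = translate_line(a, b, c)
for x in range(-3, 4):
    y = slope * x + intercept
    # The real point (x, y) corresponds to the triple (x, y, 1).
    assert a * x + b * y + c * 1 == 0
print(kind, slope, intercept)
```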

I should note here that, although the second description of \mathbb{RP}^2 seems to make a distinction between the ‘real’ points of \mathbb{R}^2 and the ‘points at infinity,’ this distinction is at its core illusory. Indeed, in my translation from the first to the second description of the real projective plane, I could just as easily have decided that the points (a,b,c) satisfying a = 0 should correspond to the points at infinity instead of the points satisfying c = 0. There are in fact infinitely many ways I could translate the first description to the second, each of which would identify a different set of points from the first description with the set of points at infinity from the second. The real projective plane exhibits a profound symmetry which I have only been able to begin to hint at here.

I do not expect that, after this hurried introduction, the reader has developed an intuitive feel and appreciation for projective geometry. But I extend an invitation to spend some time in the real projective plane, to get to know it, to recognize its power, the unity created merely by the addition to Euclidean space of points at infinity. And then I invite the reader to go to an art museum, to revisit masterpieces of perspective with potentially new eyes, to ponder the violent breaking of perspective that happened in the early 20th century with the Cubists, who rejected the assumption that a painting should represent the viewpoint of a single observer at a single point in space, to wonder if perhaps this seemingly chaotic art is in fact at heart an illustration of the elegant symmetries of projective geometry.

-Georges Braque, Violin and Candlestick

P.S. As a bonus, here’s a third (and probably unhelpful) description of the real projective plane. Start with a Möbius strip. Now glue the Möbius strip’s single edge to itself, preserving direction (of course, this cannot be done in three dimensions, so first move to a four-dimensional space). Now you have a real projective plane!

P.P.S. If I were a different writer, I might have ended this post by going back to considering ‘Point at Infinity’ as a name for this blog in light of what we have learned about projective geometry. I might have made the case that this blog, or perhaps the idea of infinity, is itself a ‘point at infinity’ where the parallel pursuits of art and mathematics meet. I might have tried to obscure the fact that, on closer inspection, this analogy doesn’t hold up very well at all. But I am not a different writer, and I will refrain from this indulgence.

Aristotle’s Wheel, Galileo, and the Jesuits

Today, we look at another classical paradox: Aristotle’s wheel. The paradox was introduced in the text Mechanica, attributed, not without controversy, to Aristotle. It runs as follows. Consider two circular wheels, fixed rigidly, one within the other. The wheels have the same center, but the radius of the outer wheel is twice that of the inner wheel. Suppose this combined wheel rolls without slipping for exactly one full revolution, and consider the paths traced by the bottoms of the two wheels. These paths are evidently equal in length to the circumferences of the respective circles, yet the two paths are the same length, while the circumference of the outer wheel is twice that of the inner wheel. This would seem, then, to yield a contradiction.

An illustration of Aristotle’s wheel.

Unlike Zeno’s paradoxes, discussed on Monday, Aristotle’s wheel contains no real mystery today. Some combination of the following two observations should be enough to convince you of this.

  1. It is physically impossible for the two joined wheels to roll without at least one of them “slipping” relative to the ground. Therefore, there is no reason to think that the paths traced by the bottoms of the wheels are equal in length to the circumferences of the respective wheels.
  2. Even though the length of one of the paths may not be equal to the circumference of the wheel that creates it, the set consisting of the points on the path and the set consisting of the points on the circumference of the wheel have the same cardinality, so there is no contradiction in there being a one-to-one correspondence between points on the circumference of the wheel and points on the path.

If this paradox can be resolved in such a straightforward manner, you may be wondering why I decided to dedicate an entire post to it. The reason is that Aristotle’s wheel plays a significant role in Galileo’s last published work, the influential Discourses and Mathematical Demonstrations Relating to Two New Sciences, and Galileo’s solution to the paradox is both remarkable in its own right and influential in the history of mathematics and physics.

To attack the problem of Aristotle’s wheel, Galileo makes a move that had been common at least since ancient Greece: reasoning about circles by approximating them with regular polygons. To illustrate his ideas, Galileo considers the case in which the two wheels are not circular but rather are regular hexagons. He then considers what happens when this hexagonal wheel “rolls” (or, rather, lurches in six discrete steps) along the ground for one full revolution. The situation is illustrated in the diagram below.

Diagram from Galileo’s Discourses

Consider first the outer hexagonal wheel. Initially, the wheel is at rest, with side AB resting on the ground. When the wheel makes its first step in its “roll,” it pivots around point B, and side BC comes to rest on the ground, occupying the segment BQ. After the second step, side CD comes to rest on the segment QX, and so on. Through the course of the wheel’s revolution, the entire segment AS is thus covered successively by sides of the outer wheel. Therefore, the length of the segment AS is equal to the perimeter of the outer wheel.

Now consider the inner hexagonal wheel. Initially, side HI is resting on an initial segment of HT. After the first step of the revolution, though, side IK does not come to rest on the segment IO but rather “jumps ahead” and lands on the segment OP. Similarly, after the second step, side KL jumps across the segment PY to land on the segment YZ, and so on. Therefore, the parts of the segment HT that are covered by sides of the inner wheel during the revolution alternate with parts that are skipped over. This explains why AS is the same length as HT (or, rather, HT extended by a bit equal in length to one of the sides of the inner wheel, as shown in the diagram) while the perimeter of the outer wheel is twice that of the inner wheel.

The same situation holds for polygonal wheels of any number of sides. Thus, if, for example, the wheels are regular 100,000-gons, then the lower path will be entirely covered by the sides of the outer wheel in its revolution, while the upper path will be split into 200,000 equal pieces, and these pieces will alternately be covered or skipped over by the sides of the inner wheel in its revolution. Put another way, the path traced by the bottom of the inner wheel will consist of 100,000 pieces, each the length of one of the sides of the wheel, interspersed with 100,000 “voids” of equal length. The path traced by the outer wheel will have no such voids. As a polygonal wheel gets more and more sides, though, it more and more closely approximates a circle (a circle could even be seen as a regular polygon with infinitely many infinitely short sides), so, Galileo argues, a similar situation must hold in the case of circular wheels.
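
The arithmetic behind this can be sketched numerically. The code below is only an illustration under my own assumptions: the two regular n-gons share a center, their sizes are given by circumradii (2 and 1 by default, matching the two-to-one ratio of the wheels), each step advances both wheels by one side of the outer polygon, and, following the count above, one void is tallied per step. The function name is hypothetical.

```python
import math

def polygon_wheel_paths(n, outer_radius=2.0, inner_radius=1.0):
    """Lengths covered and skipped when two concentric regular n-gons,
    rigidly joined, roll through one full revolution on the outer one."""
    # Side length of a regular n-gon with circumradius R is 2 R sin(pi / n).
    outer_side = 2 * outer_radius * math.sin(math.pi / n)
    inner_side = 2 * inner_radius * math.sin(math.pi / n)
    # Each step advances both wheels by one outer side, so the inner wheel
    # lays down one of its sides and skips the remainder.
    void = outer_side - inner_side
    return {
        "outer_covered": n * outer_side,
        "inner_covered": n * inner_side,
        "inner_voids": n * void,
    }

for n in (6, 100, 100_000):
    print(n, polygon_wheel_paths(n))
```

For every n, the covered and skipped parts of the upper path add up to the length of the lower path. As n grows, the covered part tends to the circumference of the inner circle and the total length of the voids tends to the difference of the two circumferences, even as each individual void shrinks toward zero. This is the limit that Galileo asks us to take seriously.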

I will let Galileo explain this idea in his own (translated) words:

Let us return to the consideration of the above mentioned polygons whose behavior we already understand. Now in the case of polygons with 100000 sides, the line traversed by the perimeter of the greater, i. e., the line laid down by its 100000 sides one after another, is equal to the line traced out by the 100000 sides of the smaller, provided we include the 100000 vacant spaces interspersed. So in the case of the circles, polygons having an infinitude of sides, the line traversed by the continuously distributed [continuamente disposti] infinitude of sides is in the greater circle equal to the line laid down by the infinitude of sides in the smaller circle but with the exception that these latter alternate with empty spaces; and since the sides are not finite in number, but infinite, so also are the intervening empty spaces not finite but infinite. The line traversed by the larger circle consists then of an infinite number of points which completely fill it; while that which is traced by the smaller circle consists of an infinite number of points which leave empty spaces and only partly fill the line. And here I wish you to observe that after dividing and resolving a line into a finite number of parts, that is, into a number which can be counted, it is not possible to arrange them again into a greater length than that which they occupied when they formed a continuum [continuate] and were connected without the interposition of as many empty spaces. But if we consider the line resolved into an infinite number of infinitely small and indivisible parts, we shall be able to conceive the line extended indefinitely by the interposition, not of a finite, but of an infinite number of infinitely small indivisible empty spaces.

In essence, what Galileo is saying is this: the reason that the paths traced by the bottoms of the wheels can be the same length is that the path traced by the inner wheel consists of infinitely many points interspersed with infinitely many infinitely small empty spaces, while the path traced by the outer wheel consists only of the points and not the empty spaces!

One should not let the fact that Galileo’s solution is, by modern standards, misguided at best detract from its remarkable inventiveness or take away from the tremendous influence it and related ideas have had on mathematics and science. The ideas in Galileo’s exposition of Aristotle’s wheel are central to the development, also in the Discourses, of his celebrated law of free fall, which states that if an object starts moving from rest with uniform acceleration (as does, approximately, an object in free fall), then the distance traveled by the object is proportional to the square of the time during which it is moving. This law, commonplace today to any physics student, was groundbreaking in its time and anticipated Newton’s famous laws of motion. The Discourses also appeared at the beginning of a period of renewed interest in the question of the composition of the continuum and a revolution ushered in by the increasing acceptance of the use of infinitesimal quantities in mathematics. Galileo’s work on infinitesimals was extended and refined through the 17th century by mathematicians such as Cavalieri, Torricelli, and Wallis, who paved the way for the development of the infinitesimal calculus at the end of the century by Newton and Leibniz.

Like any radical idea, the mathematical use of infinitesimals was not immediately and universally accepted. In fact, there was strong opposition to the idea, most prominently (though certainly not exclusively) from the Catholic Church and especially the Jesuits, who were seeking to reestablish order and hierarchy in the wake of the chaos brought about by the Reformation. In the 17th century, the use of infinitesimals had not been provided with a rigorous mathematical foundation, and paradoxes frequently arose from their indiscriminate application. This was seen as threatening to the status of mathematics, as exemplified by Euclidean geometry, as an orderly realm of absolute certainty and, in some circles, as a model for theology and indeed for society as a whole. As Amir Alexander asserts in his book Infinitesimal: How a Dangerous Mathematical Theory Shaped the Modern World:

The infinitely small was a simple idea that punctured a great and beautiful dream: that the world is a perfectly rational place, governed by strict mathematical rules … By demonstrating that reality can never be reduced to strict mathematical reasoning, the infinitely small liberated the social and political order from the need for inflexible hierarchies.

I suspect that Alexander may be exaggerating the importance of the controversy over infinitesimals in the social history of Europe, but there is no doubt that the Church came down strongly against them. Many times through the first half of the 17th century, the Jesuit Revisors, who were in charge of determining what could or could not be taught in the many Jesuit colleges throughout the world and therefore, in effect, what ideas would be endorsed or condemned by the Catholic Church, issued rulings against the doctrine of infinitesimals. For example, the following is from such a ruling in 1632:

We consider this proposition to be not only repugnant to the common doctrine of Aristotle, but that it is by itself improbable, and … is disapproved and forbidden in our Society.

Finally, in 1651, the Revisors published an official list of 65 forbidden philosophical theses, including no fewer than four forbidden theses regarding infinitesimals:

25. The succession continuum and the intensity of qualities are composed of sole indivisibles.

26. Inflatable points are given, from which the continuum is composed.

30. Infinity in multitude and magnitude can be enclosed between two unities or two points.

31. Tiny vacuums are interspersed in the continuum, few or many, large or small, depending on its rarity or density.

And now we have made a full revolution back to Aristotle’s wheel, as item 31 is precisely the theory of the continuum that Galileo developed in order to explain the paradox: the path traced by the inner wheel is the same length as the path traced by the outer wheel precisely because “tiny vacuums” are interspersed in the path of the inner wheel but not in the path of the outer wheel.

Eventually, of course, infinitesimals became widely accepted in mathematics and have even been given rigorous foundations that do away with the paradoxes that plagued their early use. Even though Galileo’s solution to Aristotle’s wheel did not last, the ideas it helped usher in transformed mathematics and remain with us to this day.