Greifswald and the Mathematical Sublime

ANOTHER source of the sublime is infinity; if it does not rather belong to the last. Infinity has a tendency to fill the mind with that sort of delightful horror, which is the most genuine effect and truest test of the sublime.

Edmund Burke, A Philosophical Enquiry into the Origin of Our Ideas of the Sublime and Beautiful

Ask people for adjectives they would use to describe the infinite, and right after “really big” would probably come words like “overwhelming”, “incomprehensible”, “strange”. If they’re of a philosophical bent, the word “sublime” might even pass their lips.

In current common usage, “sublime” is most often used to mean something like “extremely good or beautiful”, but in philosophical aesthetics it has a different, more specific meaning. The modern philosophical study of the sublime began in the late 17th century but came to prominence in the 18th, most notably in the work of Edmund Burke and Immanuel Kant. For Burke and Kant, not only are sublimity and beauty distinct notions, but they are in fact mutually exclusive. In Kant’s analysis, a judgment of beauty is a pleasurable and disinterested cognition of a determinate form of a bounded object. If one finds a particular flower to be beautiful, then this is a judgment on the form of the flower: the coloration of its petals, the curl of its leaves, the way all of its components fit together into a harmonious whole. The sublime, on the other hand, is necessarily formless; it can only be conceived of as an overwhelming totality, and for Kant it is therefore a concept of reason rather than of cognition or understanding.

Kant distinguishes between two types of sublimity, each of which overwhelms us in a different way. The dynamical sublime occurs when one is confronted (from a sufficiently safe distance) with an overwhelming physical force, such as a powerful storm or a maelstrom at sea, bringing one’s physical finitude into stark relief. The mathematical sublime, which more directly concerns us here, occurs when one is confronted with something overwhelmingly large, something that outstrips one’s aesthetic comprehension of size. A large mountain, for instance, or a sprawling valley, or the vast reaches of outer space. And while there is a limit to our aesthetic comprehension of size, there are no such limits to our mathematical apprehension of size. For Kant, this movement from the initial inability to comprehend the totality to the subsequent recognition that it can nonetheless be apprehended is the essence of our experience of the sublime and the source of its pleasure.

(It is unsurprising that the notion of the sublime is often linked with that of the infinite, and it should be noted here that Kant was writing before the development of set theory and its accompanying array of sophisticated tools for making precise mathematical sense of the infinite in the 19th and 20th centuries. It is interesting to consider how a formal theory of the mathematical sublime might be affected by modern mathematical concepts, like the dizzying size of large cardinals, or the incompleteness and undecidability theorems of the mid-20th century.)

Caspar David Friedrich, Wanderer above the Sea of Fog (1818)

After Kant, the notion of the sublime continued to evolve, and it became a central topic in the work of 19th century Romantic artists. In their work, sublimity was no longer necessarily directly opposed to beauty, but it still drew much of its power from the juxtaposition of the overwhelming size or force of nature with the marked finitude, both physical and intellectual, of the human observer. Perhaps the artist of this era most associated with the sublime is the painter Caspar David Friedrich. Indeed, if I asked people to name a painting evoking the sublime, it is likely that the most common response would be his famous Wanderer above the Sea of Fog, depicting a man, his back to us, standing on a mountain precipice and looking out over a vast fog-covered landscape. The scale of the vista, and the dense banks of fog obscuring the mysteries of the land below from human view, are certainly evocative of the sublime. Yet the effect is somewhat mitigated by the positioning of the human observer in the painting, taking up a large space in the exact center of the canvas, artificially posed as if he is aware that his portrait is being painted. The depiction is not so much of somebody being swallowed up by the overwhelming grandeur of nature as of somebody confidently surveying his domain.

A different, slightly lesser known work by Friedrich is often held up as a more faithful representation of the sublime: The Monk by the Sea (see image below). Like Wanderer above the Sea of Fog, this painting depicts a solitary human in the midst of a vast landscape largely obscured by clouds or fog. Here, though, a formless sky takes up the majority of the painting, and the human figure (a monk on a beach, in this case) is much smaller and situated at the bottom of the painting, slightly left of center. His features and pose are quite indistinct; indeed, an inattentive viewer might not even see him, as his body blends into the sea behind him, and his head could be mistaken for one of its white-capped waves.

(A quick side note here. While the sublime is perhaps most tightly associated with 19th century Romantic art, it has been a central focus of a number of other artistic movements since then. Particularly prominent among these is the Abstract Expressionism of the 1940s-60s, with its massive canvases of abstract, almost formless color fields. Art historian Robert Rosenblum argued that the origins of Abstract Expressionism can be traced directly to Friedrich’s landscape painting, remarking quite perceptively, if a bit exaggeratedly, that one only has to remove the monk from Friedrich’s The Monk by the Sea to obtain Mark Rothko’s 1956 painting Untitled (Green on blue) (see image below). For a fuller exploration of the sublime in modern art, Rosenblum’s 1961 ARTnews article “The Abstract Sublime” is highly recommended (and contains the fantastic line, ‘What used to be pantheism has now become a kind of “paint-theism.”’).)


A couple of months ago, I gave a series of lectures at the Winter School in Abstract Analysis in Hejnice, Czech Republic. One of my title slides contained an image of Friedrich’s painting The Sea of Ice, which is the cover image for this post. I chose this painting both because of its affinities with the wintry landscape just outside our conference room and because the shape of the ice sheets jutting out from the sea in the painting is reminiscent of one of the mathematical arguments in the section of the lecture it preceded. The painting can also be seen as an effective depiction of the sublime, as the shipwreck in the right of the painting, almost swallowed up by the ice sheets, underscores the potential helplessness and insignificance of humankind in the face of the overwhelming power of nature, in this case encapsulated in the slow but irresistible movement of the massive ice sheets.

After the talk, I was approached by a mathematician who asked me if I knew the artist of the painting on my slides. I responded that yes, this was Caspar David Friedrich. He then asked if I knew what he and Caspar David Friedrich had in common. I replied that I did not, and he said that they shared a common birthplace, the small city of Greifswald, in northeast Germany on the Baltic Sea. He then told me that there was another relevant one-time resident of Greifswald: the mathematician Felix Hausdorff.

Felix Hausdorff (1868-1942)

The name of Hausdorff is well-known to any student of topology, as his name adorns a number of fundamental concepts therein. He is considered one of the founders of topology and was also one of the key figures in the early development of set theory (out of which topology grew). In set theory, he is particularly remembered for his investigations of infinite ordered sets. He was the first to construct an example of a certain non-trivial ordered structure now known as a Hausdorff gap; such objects are still the topic of active research today. He is also remembered for his 1914 book Grundzüge der Mengenlehre (trans. Foundations of Set Theory), a copy of which adorns my office bookshelf. This was the first comprehensive book-length treatment of the still-young field of set theory and was very influential in its further development. It was mostly written in Greifswald, where Hausdorff served his first appointment at the rank of Full Professor from 1913 to 1921, in between appointments at the more bustling and mathematically well-connected University of Bonn. The mathematician I was speaking with suggested that Hausdorff needed to ditch the distractions of Bonn for the more tranquil surroundings of Greifswald in order to finish the Grundzüge der Mengenlehre (indeed, during the winter term of 1916/17, Hausdorff was the only mathematician at the University of Greifswald). And though the actual explanation probably has more to do with the mundane practical necessity for academics to take jobs where they can find them, it is also tempting to imagine Hausdorff purposefully seeking out the birthplace of Caspar David Friedrich as a setting in which to contemplate infinity and compose his magnum opus.

Caspar David Friedrich, Greifswald in Moonlight (1817)

Cover image: The Sea of Ice (1823-24) by Caspar David Friedrich

While trying to understand Kant’s conception of the sublime and its connection with art, I found the following two articles helpful:

“The Copernican Turn in Aesthetics: How Kant’s Notion of the Mathematical Sublime Re-Aestheticizes the Universe” by Thiti Owlarn

“The Simplicity of the Sublime: A New Picturing of Nature in Caspar David Friedrich” by Laure Cahen-Maurel

Circles 3: Building Everything out of Nothing

My neighbor has a circular driveway…he can’t get out.

-Steven Wright

Welcome to the third installment in our miniseries about circularity. In the first, we looked broadly at circular definitions, both in everyday language and in mathematics. In the second, we tackled the surprisingly difficult task of establishing a non-circular definition for the simple-seeming notion of “finite set”. Today, we’re going to continue our discussion of finite sets, introducing a useful definition that at first glance appears circular but upon further analysis turns out to be on surprisingly solid foundations. (The contents of the previous posts are not at all necessary for understanding this one, so don’t worry if you’ve forgotten (or never read…) them.)


Let’s take a look at a few fine examples of finite sets. The empty set, \emptyset, is of course a finite set. The set \{2, 3, 5, 7\} is finite, as it contains 4 elements (the single-digit prime numbers). And the set \{\mathbb{N}, \mathbb{R}\} is finite, containing only 2 elements (the set of natural numbers and the set of real numbers). And while all three of these are certainly finite sets, two of them seem more truly finite than the other. The empty set is obviously finite through and through, as is the set \{2,3,5,7\}, since it only has finitely many elements and each of those elements is simply a finite number. It feels a bit like cheating, on the other hand, to call \{\mathbb{N}, \mathbb{R}\} a finite set. It technically qualifies, having only two elements, but those elements are themselves infinite sets (one is even uncountable!). We’ve somehow snuck infinity into a finite set.

With this in mind, we might want to introduce a strengthening of the notion of “finite set” to rule out sets like \{\mathbb{N}, \mathbb{R}\}, which satisfy the letter but not the spirit of the law. Maybe we’ll call this strengthening truly finite. What should the definition of “truly finite” be? With our example in mind, we might try something like the following.

Attempted Definition 1: A set X is truly finite if it satisfies the following two properties:

  1. X is finite.
  2. All of the elements of X are finite.

This definition certainly rules out our set \{\mathbb{N}, \mathbb{R}\}. But what about a set like \{\{\mathbb{N}\}, \{\mathbb{R}\}\}? This set is finite, having two elements (\{\mathbb{N}\} and \{\mathbb{R}\}). And each of its elements is finite, having only one element (\mathbb{N} and \mathbb{R}, respectively). So \{\{\mathbb{N}\}, \{\mathbb{R}\}\} satisfies our attempted definition of “truly finite”. But of course we don’t actually want it to be truly finite, for the same reason that we don’t want \{\mathbb{N}, \mathbb{R}\} to be truly finite: the infinite sets \mathbb{N} and \mathbb{R} are inside it in an essential way. We should therefore revise our definition. If we do this just to take care of the two examples we’ve encountered so far, we might propose:

Attempted Definition 2: A set X is truly finite if it satisfies the following three properties:

  1. X is finite.
  2. All of the elements of X are finite.
  3. All of the elements of all of the elements of X are finite.

Ok, but this was silly. We already knew that this definition wasn’t going to work for us, as something like \{\{\{\mathbb{N}\}\}, \{\{\mathbb{R}\}\}\} satisfies this definition but again should not be considered to be truly finite. We want to somehow catch our tail and say that a set is truly finite if, no matter how many layers deep you go into the set, you never reach an infinite set. Something like:

Attempted Definition 3: A set X is truly finite if it satisfies the following properties:

  1. X is finite.
  2. All of the elements of X are finite.
  3. All of the elements of all of the elements of X are finite.
  4. And so on…

This definition is at least beginning to express the idea we’re grasping at, and we’re not going to be able to find any clear examples of sets that satisfy this definition and yet obviously should not be considered to be truly finite. We’re trying to write a precise mathematical definition, though, and the phrase “And so on…” is much too vague to be an acceptable part of our final definition.

We need to find a clear, concise way to make the intuition behind Attempted Definition 3 precise. It turns out there is a quite elegant solution. It also turns out that this is a fundamental notion in set theory and already has a name that is different from “truly finite”, so we’ll switch over now to its actual name: “hereditarily finite”.

(Actual!) Definition: A set X is hereditarily finite if it satisfies the following two properties:

  1. X is finite.
  2. All of the elements of X are hereditarily finite.

“Whoa, hold on a minute,” I can hear you saying. “This must be a joke. We’re in the third installment of a series on circular definitions, and you’re trying to pass off this clearly circular definition as an ‘actual’ mathematical definition? Nice try, but I’m not fooled.”

And I’ll admit that it does look as if we’re defining the term “hereditarily finite” in terms of … “hereditarily finite.” But this is in fact a perfectly fine definition, for reasons that those of you familiar with recursion in computer programming may already begin to see.

Let’s start with a small step and ask ourselves the simple question, “Are there any sets at all that we can unambiguously say are hereditarily finite according to the above definition?” Perhaps surprisingly, there are! Take a moment to try to think of one before reading on.


To find a set that is definitely hereditarily finite, let’s return to that most finite of all finite sets: the empty set. The empty set is clearly finite, so it satisfies Property 1 of our definition. What about Property 2? Well, the empty set doesn’t have any elements, so it’s vacuously true that all of its elements are hereditarily finite. (Indeed, any statement asserting something about all of the elements of the empty set is vacuously true. All of the elements of the empty set are the emperor of France. All of the elements of the empty set can make a rock so heavy that they can’t lift it.) The empty set therefore satisfies both properties of our definition, so the empty set is hereditarily finite!
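(For the programmers in the audience: vacuous truth is baked into most languages’ universal-quantification constructs. A one-line illustration in Python, just for fun:)

```python
# `all` over an empty collection is vacuously True: with no elements,
# there can be no counterexample.
print(all(x == "the emperor of France" for x in set()))  # True
print(all(x != x for x in set()))                        # also True!
```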

Now that we know this, we can start building a veritable menagerie of hereditarily finite sets. Consider the set \{\emptyset\}. It has one element, so it is finite. Moreover, we now know that its one element, the empty set, is hereditarily finite. Therefore, \{\emptyset\} satisfies both properties in our definition, so it is hereditarily finite, too. What about \{\emptyset, \{\emptyset\}\}? It has two elements, so it is finite. Moreover, we have already seen that both of its elements, \emptyset and \{\emptyset\}, are hereditarily finite. Therefore, \{\emptyset, \{\emptyset\}\} is hereditarily finite. We could keep going, generating any number of exotic hereditarily finite sets, like \{\{\emptyset, \{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \{\emptyset, \{\{\emptyset\}\}\}\} or \{\emptyset, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset, \{\emptyset\}\}\}\} or \{\{\{\{\{\{\{\{\{\{\emptyset\}\}\}\}\}\}\}\}\}\}. Our apparently circular definition actually has an abundance of unambiguous examples.
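Readers who know some programming may find it illuminating to see the definition transcribed directly into code. Here is a minimal sketch in Python; since a computer cannot hold an actual infinite set, I stand in for infinite sets like \mathbb{N} with a sentinel value (the names and encoding here are mine, purely for illustration):

```python
INFINITE = "infinite"  # sentinel standing in for an infinite set such as N or R

def is_hf(x):
    """X is hereditarily finite iff X is finite and all its elements are HF."""
    if x == INFINITE:                # an infinite set is certainly not HF
        return False
    # any actual frozenset is finite, so Property 1 holds automatically;
    # Property 2 is the recursive call
    return all(is_hf(e) for e in x)

empty = frozenset()
print(is_hf(empty))                                   # True
print(is_hf(frozenset([empty, frozenset([empty])])))  # {∅, {∅}}: True
print(is_hf(frozenset([frozenset([INFINITE])])))      # {{N}}: False
```

Note that the recursion here always bottoms out, precisely because a frozenset must be fully constructed before it can be placed inside another set and so can never contain itself. As we will see below, the analogous guarantee for honest-to-goodness sets is exactly the job of the Axiom of Foundation.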


This is all well and good, but in order for our definition actually to do its job, we want something stronger to be true. We want it to be the case that, if somebody hands us an arbitrary set, we can definitively tell whether or not it is hereditarily finite. And there is in fact a step-by-step process for doing just this. Here it is (for concision, we will abbreviate “hereditarily finite” as “HF”):

To tell whether an arbitrary set X is HF, go through the following steps.

Step 0: If X is infinite, then stop the process and conclude that X is not HF. If X is the empty set, then stop the process and conclude that X is HF. Otherwise, continue to Step 1.

Step 1: If any of the elements of X is infinite, then stop the process and conclude that X is not HF. If the only element of X is the empty set, then stop the process and conclude that X is HF. Otherwise, let X_1 denote the set of all elements of elements of X, and continue to Step 2.

Step 2: If any of the elements of X_1 is infinite, then stop the process and conclude that X is not hereditarily finite. If the only element of X_1 is the empty set, then stop the process and conclude that X is HF. Otherwise, let X_2 denote the set of all elements of elements of X_1, and continue to Step 3.

Step n + 1: If any of the elements of X_n is infinite, then stop the process and conclude that X is not hereditarily finite. If the only element of X_n is the empty set, then stop the process and conclude that X is HF. Otherwise, let X_{n+1} denote the set of all elements of elements of X_n, and continue to Step n + 2.
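Before we trace through examples by hand, here is a sketch of the same process in code, under the same toy convention as before, with a sentinel standing in for infinite sets:

```python
INFINITE = "infinite"  # sentinel for an infinite set, as in the earlier sketch

def is_hf_by_process(x):
    level = x  # the process starts with X itself, then moves to X_1, X_2, ...
    while True:
        if level == INFINITE or INFINITE in level:
            return False   # we have encountered an infinite set: not HF
        if level == frozenset() or level == frozenset([frozenset()]):
            return True    # nothing but the empty set remains: HF
        # otherwise, replace the current level by its elements of elements
        level = frozenset(e for s in level for e in s)

print(is_hf_by_process(frozenset([frozenset([INFINITE])])))  # {{N}}: False
```

In this toy model the loop always halts: each pass strips away one layer of braces, and a frozenset has only finitely many layers. Whether the real process must halt for arbitrary sets is exactly the question taken up below.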

We can get a good feel for what’s going on here by looking at this process in action for a couple of different sets. I think it’s instructive, but if you want to skip the details, you can scroll down to the next horizontal line.

First, consider the set X = \{\{\emptyset, \{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \{\emptyset, \{\{\emptyset\}\}\}\}, one of our examples of a hereditarily finite set from above, and run the above process. In Step 0, we note that X is not infinite (it has 3 elements). Also, X is not the empty set. So we proceed to Step 1.

In Step 1, we first note that all of the elements of X are finite. Also, it’s not the case that the only element of X is the empty set. Therefore, we let X_1 be the set of all elements of elements of X, i.e., X_1 = \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\} \} , and continue to Step 2.

In Step 2, we note that all of the elements of X_1 are finite, and it’s not the case that the only element of X_1 is the empty set. Therefore, we let X_2 be the set of all elements of elements of X_1, i.e., X_2 = \{\emptyset, \{\emptyset\}\}, and continue to Step 3.

In Step 3, we note that all of the elements of X_2 are finite, and it’s not the case that the only element of X_2 is the empty set. Therefore, we let X_3 be the set of all elements of elements of X_2, i.e., X_3 = \{\emptyset\}, and continue to Step 4.

In Step 4, we note that all of the elements of X_3 are finite. But the only element of X_3 is the empty set, so we stop the process and conclude that X is indeed hereditarily finite.

Next consider X = \{\{\{\mathbb{N}\}\}, \{\{\mathbb{R}\}\}\}, a set that we definitely didn’t want to consider hereditarily finite. In Step 0, we note that X is finite and it’s not the case that its only element is the empty set, so we continue to Step 1.

In Step 1, we note that all of the elements of X are finite (they both have one element) and it’s not the case that its only element is the empty set. So we let X_1 be the set of all elements of elements of X, i.e., X_1 = \{\{\mathbb{N}\}, \{\mathbb{R}\}\}, and continue to Step 2.

In Step 2, we note that all of the elements of X_1 are finite and it’s not the case that its only element is the empty set. So we let X_2 be the set of all elements of elements of X_1, i.e., X_2 = \{\mathbb{N}, \mathbb{R}\}, and continue to Step 3.

In Step 3, we note that, in fact, both elements of X_2 are infinite, so we stop the process and conclude that X is, as we desired, not hereditarily finite.


In both cases above, our process halted after a relatively small number of steps, giving us an answer. Do we know that this process necessarily finishes, though? A priori, it looks possible that for certain sets it might just keep going on forever without yielding any conclusion about whether or not the set is hereditarily finite. It turns out, though, that (at least when working with the standard axioms for set theory) the process necessarily ends after a finite (though potentially arbitrarily large) number of steps, and that this fact is due to a rather curious axiom known as the Axiom of Foundation (sometimes called the Axiom of Regularity). It reads as follows:

The Axiom of Foundation: For every non-empty set X, there is an element Y of X such that X \cap Y = \emptyset.

It might not immediately be obvious that this axiom implies that our process must eventually stop, or even that the axiom has anything to do with what we’ve been discussing today. And since this post is already way too long, I’m not going to give a proof here (maybe in a future post?). But it’s a direct consequence of the axiom (and indeed can even be seen as a kind of reformulation of the axiom) that, given any set X, if I run our process on X, then, after some finite number of steps, I will either encounter an infinite set, in which case I know that X is not hereditarily finite, or I will be left with just the empty set, in which case I know that X is hereditarily finite.

I want to say a few more words about this axiom to close, though. I introduced it as a “curious axiom”, and a reason for this is that, although the axiom is listed among the standard axioms of set theory, essentially all of mathematics can be developed entirely without using the Axiom of Foundation at all! Its raison d’être is simply that it makes sets nicer to work with! Sets that don’t satisfy the Axiom of Foundation are weird and pathological and, for most purposes, we’d rather just forget about them during our day-to-day lives. (There are certainly interesting things to say about axiom systems that don’t include the Axiom of Foundation or even include explicit anti-Foundation axioms, but we’ll leave that story for another day.)

One prominent potential set that the Axiom of Foundation rules out is the set X = \{X\}, i.e., the set whose only element is itself. Let’s see what happens if we try to apply our process to X to determine whether or not it is hereditarily finite. In Step 0, we note that X is not infinite and is not the empty set, so we proceed to Step 1.

In Step 1, we note that all of the elements of X are finite and it’s not the case that its only element is the empty set. We therefore let X_1 be the set of all elements of elements of X. But the only element of X is X, and the only element of X is X (I feel like I’m repeating myself…), so X_1 = \{X\}. But X = \{X\}, so it follows that X_1 = X. We continue to Step 2.

But now, since X_1 = X, Step 2 is going to be exactly the same as Step 1. And then Step 3 is going to be the same again. And Step 4. And Step 5. We’re going to repeat ourselves over and over, never making any progress or getting any closer to knowing whether or not X is hereditarily finite. We’ve at last encountered an actual instance of circularity!
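Incidentally, this is a circle that the code sketches above cannot even enter: a frozenset must be finished before it can be put inside another set, so a self-membered set is unbuildable in that model. To watch the loop happen in code, we can instead record membership explicitly in a made-up table:

```python
membership = {"X": ["X"]}  # a "set" named X whose only element is X itself

level = ["X"]
for step in range(1, 6):
    print(f"Step {step}: current level = {level}")
    level = [e for name in level for e in membership[name]]
# prints "current level = ['X']" at every step: no verdict is ever reached
```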

So, is X = \{X\} hereditarily finite or not? Well, here we run into a real limitation of our definition. In this instance, it’s ambiguous. We could arbitrarily declare X to be hereditarily finite, or we could declare it not to be hereditarily finite, and either choice would be consistent with our definition. For if X is HF, then it satisfies both properties in our definition: it’s finite, and all of its elements are hereditarily finite. If, on the other hand, X is not hereditarily finite, then it fails property 2, since its one element is not hereditarily finite. The apparent circularity of our definition of “hereditarily finite” was tamed by the Axiom of Foundation, but if we venture into the wilds of non-well-founded sets like X = \{X\}, then it ceases to be a meaningful concept, and its circularity reemerges.


Cover Image: “The Battle Between Carnival and Lent” by Pieter Bruegel the Elder

Circles 2: Defining Infinity

In our previous post, we began a discussion of circularity, first in dictionary definitions and then in mathematics. Today, we dig a little bit deeper by considering what should be a rather straightforward mathematical notion, and certainly a foundational issue for this blog: the definition of the term “finite set”. (And hence, by negation, the definition of “infinite set”. Four years into a blog about infinity and we’re only now rigorously defining what it is we’ve been talking about. How embarrassing!)

We’ve talked about sets quite a bit here; all you need to remember today is that a set is simply a collection of objects. A set can, of course, either be finite or infinite. Most of you reading this probably think that you understand this distinction quite well and, when presented with a relatively simply defined set, you could quite reliably and accurately classify it as either finite or infinite. For example, the set of all real numbers is infinite, whereas the set \{\pi, e, 2\} is finite, as it contains only three elements. This intuitive understanding of “finite” and “infinite” works fine in everyday practice, but it does not amount to a definition, and if we want to study sets generally rather than specific instances of sets, or if we want to study the notion of “infinity”, then it will be important to have a solid definition of the concept “finite set”. And, considering that this is a concept that we are all quite familiar with, this turns out to be surprisingly tricky.

(Before continuing, take a moment to think about how you might try formally to define “finite set”. Can you think of multiple approaches to the task?)


When asked to define the term “finite set”, an immediate first attempt might be something like this:

Definition Attempt #1: A finite set is a set with only finitely many elements.

This is obviously a bad definition, though. When we’re defining the term “finite set”, what we’re really trying to do is define the word “finite”, so it’s unacceptable to use the word “finite” or “finitely” in its definition.

Let’s think about things a bit more carefully. Saying that a set is finite is making an assertion about its size, so let’s recall how we think about the size of a set. We say that two sets A and B have the same size if their elements can be put into a one-to-one correspondence with one another. Now, when you think about a finite set, you probably think about the act of counting the elements in the set, and eventually completing this count. What do you use to count the elements of the set? The natural numbers! (Recall that natural numbers are whole numbers of the form 0, 1, 2, \ldots.) A finite set should thus be a set whose size is the same as some natural number. This might lead us to formulate the following attempted definition:

Definition Attempt #2: A finite set is a set whose elements can be placed into a one-to-one correspondence with some natural number.

This is much better than Definition Attempt #1, and in fact this is exactly how “finite set” is often defined. When doing mathematics, though, we need to make sure we’re being rigorous, so let’s examine this definition a little more closely. Our definition makes crucial reference to the notion of a “natural number”, so if we are to declare success, then we should already have a solid definition of “natural number”.

But defining “natural number” is quite tricky in its own right. (Try it!) Notice that, when reminding you what a natural number is a few paragraphs ago, I did so by pointing to some examples, namely 0, 1, and 2, and saying, “it’s like those”. And this is a perfectly valid way of describing the natural numbers in conversational English, but it is certainly not precise enough to rigorously ground mathematics.

So let’s think more carefully about natural numbers. We can certainly talk about certain specific natural numbers, like 0 and 1. For example, 0 is the size of the empty set, and 1 is the size of the set \{\emptyset\}. We can also talk meaningfully about what it means to add 1 to a number: given a number x, find a set A of size x and an object b that is not an element of A, and then say that x + 1 is the size of the set A \cup \{b\} formed by adding b to A as an element.
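To make this recipe concrete, here is one standard way to carry it out, the von Neumann encoding, in which the number n is literally a set with n elements (the post’s argument doesn’t depend on this particular choice; it is just an illustration):

```python
# The von Neumann encoding: 0 = ∅ and n + 1 = n ∪ {n}.
zero = frozenset()

def succ(n):
    # n is never an element of itself, so n ∪ {n} really does add one
    # new element, exactly as in the recipe above (with A = n and b = n)
    return n | frozenset([n])

four = succ(succ(succ(succ(zero))))
print(len(four))  # 4: the set representing 4 has exactly 4 elements
```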

Now notice that the natural numbers are precisely the numbers you can obtain by starting with 0 and then repeatedly adding 1. (For example, 4 = 0 + 1 + 1 + 1 + 1.) Great. It seems like we now have a fine definition of “natural number”, which in turn means that we have a fine definition of “finite set”. But wait a minute. When we say “repeatedly adding 1”, what do we really mean? How many times are we allowed to “repeatedly add 1”? Well, we want to specify that we’re only allowed to add 1 a finite number of times. But we haven’t defined “finite” yet, and, even worse, our current candidate for a definition of “finite” relies on the notion of “natural number”, so it would create a circle if we were to define “natural number” in terms of “finite”. Since we’re trying to be at least somewhat rigorous here, we definitely want to avoid that.

We could certainly continue down this path if we wanted and isolate a definition of “natural number” that does not depend on the general notion of “finite”. For our purposes, though, I think there’s a shorter, more elegant approach, so let’s change tack slightly. Rather than trying to characterize finite sets, let’s try to characterize infinite sets and use this to arrive at a definition of finiteness.

One of the first observations we make when we start thinking about infinity mathematically is one that might initially be rather unintuitive and disorienting: there are infinite sets A and B such that B obviously contains “more” elements than A, and yet A and B have the same size. The first example you probably saw of this is that in which A is the set of even natural numbers and B is the set of all natural numbers. B obviously contains all of the elements of A plus many more elements that are not in A, and yet we can see that A and B have the same size via the one-to-one correspondence that matches the 0 from A with the 0 in B, the 2 from A with the 1 from B, the 4 from A with the 2 from B, and so on. (I know we made this pivot precisely because we haven’t rigorously defined the notion of “natural number”, but that’s fine. This is just an illustrative example, not a foundational part of what will eventually be our definition.)
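In symbols, this correspondence is just the function f(k) = k/2 from A to B (equivalently, n \mapsto 2n in the other direction): each even number is matched with its half, every natural number gets matched with exactly one even number, and so A and B have the same size.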

When we reflect on this further, this phenomenon in fact seems to be a fundamental feature of what it means for a set to be infinite. Given any infinite set B, it is always possible to find a proper subset A of B such that A and B have the same size (remember that A is a proper subset of B if every element of A is also an element of B but B contains at least one element that is not in A). Indeed, if you start with an infinite set B and simply remove one element from B, you always end up with a set of the same size as B. This is certainly not the case with finite sets. If you have a set of size 6, for instance, and take an element out of it, you now have a set of size 5, which is strictly smaller than your original set. Since this phenomenon seems so characteristic of infinite sets, why don’t we take it as the definition of “infinite”, and hence take its negation as the definition of “finite”?

Definition Attempt #3: A finite set is a set with the property that there is no proper subset of the set that has the same size as the set itself. In other words, a set is finite if, whenever you take an element out of the set, you make the set strictly smaller.
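Stated symbolically, one standard rendering of this definition (the phrasing is mine) is: a set X is finite in the above sense if and only if every one-to-one function f \colon X \to X is onto, i.e., there is no one-to-one correspondence between X and a proper subset of itself.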

This is now a non-circular, rigorous definition (for a certain definition of the word “rigorous”…) and, to my mind at least, it is quite elegant and matches up with our experience and intuition of finite sets.

This definition of “finite” in fact has a relatively long history. It was first proposed by the German mathematician Richard Dedekind in 1888 and was the first modern definition of “finite” that did not rely on the definition of “natural number”. Nowadays, a set satisfying our Definition Attempt #3 is said to be “Dedekind-finite”, while a set satisfying Definition Attempt #2 (after a rigorous definition of “natural number” has been made) is simply said to be “finite”.

Are these two definitions “the same”? Well, they both seem to match our intuition of “finite set”, and for a number of years after the introduction of “Dedekind-finite sets”, it was widely believed among mathematicians that they were the same. And, under the standard axioms of set theory, it is in fact the case that they are the same, i.e., that the finite sets are precisely the Dedekind-finite sets. However, one of these standard axioms of set theory is the sometimes controversial Axiom of Choice. Since the Axiom of Choice often leads to unintuitive consequences (the Banach-Tarski paradox, for instance), mathematicians often like to consider what happens if we don’t adopt the Axiom of Choice. And it turns out that, without the Axiom of Choice, it is impossible to prove that these two definitions of “finite” are the same. In fact, there are mathematical universes that satisfy all of the standard axioms of set theory except for the Axiom of Choice and in which there are sets that are Dedekind-finite and yet are not the same size as any natural number. Try to think about what such a set would be like!


I had meant to use this post to discuss hereditarily finite sets, which provide an example of a mathematical definition that at first glance looks circular but turns out not to be. It took us so much work simply to define finite sets, though, that I think we’ll save that for next time!


Cover image: “Still life, pitcher and fruit,” by Paul Cézanne

Circles 1: Circular Definitions

circular definition. noun Logic. a definition in which the definiendum (the expression being defined) or a variant of it appears in the definiens (the expression that defines it).

-dictionary.com

Today we return to our “Circles” series, which we introduced what seems like years ago but was actually just March, with its first real installment, on circular definitions.


We are rightfully taught from a young age to avoid using a word in its own definition. This makes obvious sense: a definition is, in part, supposed to convey the meaning of a word to a reader who is unfamiliar with it. Ideally, this happens using words that the reader is already familiar with or can become familiar with. If the unfamiliar word is itself in the definition, it is difficult for the definition to convey this meaning effectively. The definition has failed at one of its primary tasks.

Even if we diligently avoid using words within their own definitions, though, circularity can easily crop up across multiple definitions. For example, we might make the following sequence of definitions:

ice. noun. the solid form of water.

steam. noun. the gaseous form of ice.

water. noun. the liquid form of steam.

Although none of these definitions is circular on its own, they form a closed circle when taken as a set, and this obscures the meaning of all three words. A person who is unfamiliar with all of the words “ice”, “steam”, and “water” will come away from these definitions knowing something, namely that the three words represent, respectively, the solid, gas, and liquid forms of a certain substance, but not much more than that. They would not know, for instance, any relevant properties of this substance, or where this substance might be found, or what it is used for, and these three definitions would certainly not be printed together in any decent dictionary. (One might argue that the definitions of “ice” and “steam” are basically fine, if a bit limited, and that the definition of “water” should be altered to something more descriptive, thus breaking this instance of circularity. This is essentially what is done by Merriam-Webster, for instance).

Or here is an example actually occurring in Merriam-Webster. (To be fair, the full definitions in M-W are much longer and have multiple entries that we are not quoting in their entirety here. Still, what we reproduce below are the exact quotes of the first definitions of each of the three respective words. (The definition of “weigh” we quote is the first given as an intransitive verb, which is the sense in which it is being used in the definition of “weight”.))

weight. noun. the amount that a thing weighs.

weigh. verb. to have a certain heaviness

heavy. adjective. having great weight

This probably doesn’t seem like an ideal scenario, but satisfactorily removing this circularity from the definitions of “weight”, “weigh”, and “heaviness” (which redirects to “heavy”) is more difficult than it might initially seem. I encourage you to spend a few minutes thinking about it. Let me know if you come up with something significantly better.


One might be wondering at this point if it is even possible to avoid circularity entirely in a system of definitions.  In fact, a rather simple argument, of a sort that is ubiquitous in the mathematical field of graph theory, shows that it is impossible. Let us look at this argument now.

In our analysis, we will represent certain information from the dictionary in a directed graph, i.e., a collection of nodes and a collection of arrows between certain of these nodes. In our graph, each node will represent a word, and there will be an arrow pointing from Word 1 to Word 2 if Word 2 appears in the definition of Word 1. For example, a portion of this graph containing some of the words in our “ice”, “steam”, and “water” definitions given above might look like this:

circular_definition-1
A portion of our “Dictionary directed graph”. Note the circle formed by “ice”, “water”, and “steam”.

We now make two assumptions that I think will seem quite reasonable:

  1. There are only finitely many words in the dictionary.
  2. Every word that is used in the dictionary itself has a definition in the dictionary.

We will now see that these two innocuous assumptions necessarily lead to a circular set of definitions. To see this, choose any word in the dictionary and start following an arbitrary path of arrows in the directed graph derived from the dictionary. For example, in the segment above, we might start at “ice”, then move to “water”, then to “liquid”, and then elsewhere, maybe to “flowing”, for example. Let us label the words we visit on this path as w_1, w_2, w_3, and so on. Recalling the meaning of the graph, this means that the definition of w_1 contains w_2, the definition of w_2 contains w_3, and so on.

A consequence of Assumption 2 is that each of the words w_n visited in our path itself has a definition, and therefore there are certainly arrows pointing out of the node associated with w_n. This means that we never get stuck on our path. We can always choose a next node, so we can continue our path indefinitely and find a word w_n for every natural number n.

We now have an infinite path of words. But Assumption 1 says there are only finitely many words in the dictionary, so this means that our path must repeat some words! In other words, there are numbers m < n such that w_m and w_n are the same word. For example, maybe w_6 is the same word as w_{10}. But in that case, look at what we have:

  • the definition of w_6 contains w_7;
  • the definition of w_7 contains w_8;
  • the definition of w_8 contains w_9;
  • the definition of w_9 contains w_6 (since w_6 = w_{10}).

Written 5-12-20, 16-17-1
A circle of definitions.

Thus, w_6, w_7, w_8, and w_9 form a circular set of definitions. (The same sort of analysis obviously holds for any values of m and n, not just for 6 and 10.)
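For the programmatically inclined, the whole argument fits comfortably in a few lines of Python. We represent the dictionary as a map from each word to the words in its definition, follow arrows until a word repeats, and read off the circle. (The three-word dictionary below is a made-up toy, but the pigeonhole reasoning is exactly the one above.)

```python
def find_circle(graph, start):
    """Follow one arrow out of each word until some word repeats; the
    words visited since its first appearance form a circle."""
    path, first_seen = [start], {start: 0}
    while True:
        nxt = graph[path[-1]][0]   # Assumption 2: an arrow out always exists
        if nxt in first_seen:      # Assumption 1: finitely many words, so
            return path[first_seen[nxt]:]  # a repeat is unavoidable
        first_seen[nxt] = len(path)
        path.append(nxt)

toy_dictionary = {"ice": ["water"], "water": ["steam"], "steam": ["ice"]}
print(find_circle(toy_dictionary, "ice"))  # ['ice', 'water', 'steam']
```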


The fact that circularity is unavoidable should not, though, be a cause for despair, but can perhaps help us to clarify certain ideas about language and dictionaries. A dictionary is not meant to secure a language on a perfectly logically sound, immutable foundation. Nobody is worried that an inconsistency will be found in the dictionary and, as a result, the English language will collapse (whereas this sort of worry certainly does exist, in a limited form, in mathematics). Rather, it is meant to serve as a living documentation of a language as it is actually used in practice, and our everyday use of language is delightfully full of imprecisions and ambiguities and circularities. I would not want things to be any other way.

(This is not to say, of course, that all circular definitions should be uncritically embraced. Circular definitions are often simply unhelpful and can easily be improved. Our definitions of “ice”, “steam”, and “water”, for instance, would be greatly improved simply by defining “water” as something like “the liquid form of a substance whose molecules consist of two hydrogen atoms and one oxygen atom, which freezes at 0 degrees Celsius and boils at 100 degrees Celsius, and which is the principal component of the Earth’s rivers, oceans, lakes, and rain”.)

In the spirit of play, though, and in anticipation of the forthcoming mathematical discussion, let us consider two ways in which our language could be altered to actually allow us to avoid circles. These two ways essentially amount to denying the two Assumptions isolated in our proof above. The first way consists of allowing our language to be infinite. Our argument above hinged on the fact that our language has only finitely many words, so our path must return to a previously visited node at some point. If we had infinitely many words, though, we could line them all up and write down definitions so that each word’s definition only depended on words appearing later. So, for example, Word 1 might be defined in terms of Words 3, 5, and 17, and Word 3 might then be defined in terms of Words 6, 12, and 24, and so on. In practice, this is untenable for obvious physical reasons, but, even absent these physical concerns, it’s also unclear how any actual meaning would arise from such a system.

The second way consists of retaining a finite language but specifying one or more words as not needing definitions. These would be something like atomic words, perhaps expressing pure, indivisible concepts. In the graph diagram in our argument above, the nodes corresponding to these words would have no arrows coming out of them, so our paths would simply come to a halt, with no contradiction reached, if we ever reached such a node. Indeed, it is not hard to see how, given even a small number of such atomic words, one could build an entire dictionary, free from any circularity, on their foundation.

In some sense, this is actually close to how language works in practice. You can look up, for instance, the word “the” in the dictionary and find a definition that, at least for the first few entries, studiously avoids the use of “the”. Such definitions are certainly of use to, say, linguists, or poets. And yet nobody actually learns the word “the” from its definition. Similarly, you probably learned a word like “cat” by, over time, seeing many instances of actual cats, or images or sounds of cats, being told each time that what you were witnessing is a cat, and subconsciously collating the similarities across these examples into a robust notion of the word “cat”. Reading a definition may help clarify some formal aspects of the concept of “cat”, but it probably won’t fundamentally alter your understanding of the word.


One place where circularity certainly should be avoided, though, is in mathematical reasoning. Just as definitions of words make use of other words, proofs of mathematical theorems often make use of other theorems. For example, if I wanted to prove that there is a real number that is not a solution of any polynomial equation with integer coefficients (i.e., there exists a transcendental number), the shortest path might be to make use of two theorems due to Cantor, the first being that the set of polynomial equations with integer coefficients has the same size as the set of natural numbers, and the second being that the set of real numbers is strictly larger than the set of natural numbers. Since these theorems have already been proven, we are free to use them in our proof of the existence of transcendental numbers.

Carelessness about circularity can lead to problems here, though. Suppose we have “proved” three theorems (Theorem A, Theorem B, and Theorem C), but suppose also that our proof of Theorem A depends on the truth of Theorem B, our proof of Theorem B depends on the truth of  Theorem C, and our proof of Theorem C depends on the truth of Theorem A. Have we really proven anything? Well, no, we haven’t. In fact, if you allow for such circular reasoning, it’s easy to prove all sorts of obviously false statements (and I encourage you to try!).

How does mathematics avoid this problem and actually get started proving things? Exactly by the second method outlined above for potentially removing circular definitions from language. Just as we specified certain words as not needing definitions there, here we specify certain mathematical statements as being true without needing proof. These statements are known as axioms; they are accepted as being true, and they serve as the foundation on which mathematical theories are built.

Statements can be chosen as axioms for a number of reasons. Often they are statements that are either seen as obviously true or definitionally true. Sometimes long experience with a particular mathematical field leads to recognition that adopting a certain statement as an axiom is particularly useful or leads to particularly interesting consequences. Different sets of axioms are adopted by different mathematicians depending on their goals and area of study. But they are always necessary to make mathematical progress.

A proper discussion of axioms would fill up many books, so let us end this post here. In our next entry in our Circles series, we will look at the important mathematical idea of well-foundedness and how it allows us to develop mathematical definitions or arguments that at first glance may appear circular but are in fact perfectly valid.


Cover image: “Etretat in the Rain” by Claude Monet

Pythagoras’ Table of Opposites

On March 14 (aka Pi Day, aka International Day of Mathematics, aka Save a Spider Day), the New York Times reposted a 2012 piece about infinity by Natalie Angier (originally published on New Year’s Eve, aka Leap Second Time Adjustment Day, aka Make Up Your Mind Day). For obvious reasons, I thought the essay might be of interest to readers of this blog. It doesn’t delve particularly deeply into any one facet of infinity, but offers a nice overview of some views of the infinite through history.

One thing to which I was introduced by Angier’s piece is Pythagoras’ Table of Opposites, which collects ten pairs of contrasting qualities, placing those viewed positively by the Pythagoreans on the left and those viewed negatively on the right (which is sort of ironic given the fourth pair of opposites in the table below). Here is a (translation of a) version of the Pythagorean Table of Opposites found in the work of Aristotle:

Finite | Infinite
Odd | Even
One | Many
Right | Left
Male | Female
Rest | Motion
Straight | Curved
Light | Darkness
Good | Evil
Square | Oblong

I’m not going to write too much about this, as I’m certainly not an expert in ancient Greek philosophy, but let me just make a few observations:

  • It’s no surprise that “infinite” was placed on the negative side of the table, given the Pythagoreans’ antipathy to irrational numbers and the accompanying infinities. We have written previously about mathematicians’ growing acceptance of infinity, especially in the last 150 years, though there are still a few mathematicians who would rather do away with the concept of infinity in mathematics.
  • The pairs “One/Many” and “Rest/Motion” play central roles in the paradoxes of Zeno of Elea, covered in this previous post.
  • The association of “right” with “good” and “left” with “bad” is certainly not unique to the ancient Greeks, showing up in a number of other societies throughout the world. It even persists in slightly hidden form in modern etymology: the Latin word for “right” is “dexter”, from which the English words “dextrous” and “dexterity” derive, while the Latin word for “left” is “sinister”, from which, naturally, the English word “sinister” derives. For a deeper and more rigorous account of “left” and “right” and other pairs of opposites in ancient Greek philosophy, see G.E.R. Lloyd’s paper, “Right and Left in Greek Philosophy”.

Cover Image: “PH-929” by Clyfford Still

Chromaticism, or, A Portrait of Mathematical Work

I was scheduled to speak at a graph theory conference this weekend, and while the conference was canceled due to COVID-19, I still wanted to do something with my prepared presentation, so today I’m going to try something new: converting my planned research talk to a blog post, and attempting for the first time to write for a general audience about my actual work.

The topic of my talk, a theorem about the growth rates of chromatic numbers of graphs, is, I hope, well-suited to this experiment. For while most of my work would be hard to present concisely to a general audience for the sole reason that it requires a fair amount of background knowledge to understand, I feel that this topic, and some intimation of why it’s interesting, is relatively accessible, especially to the sophisticated readers of this blog. So, let’s begin.

Before getting into the details, let’s start with a quick bird’s eye view. The question we’ll be addressing is of a sort that is central to a vast array of mathematical fields. Briefly put, it is a question of compactness, i.e., the extent to which the global properties of a mathematical object are determined by its local properties. Slightly less briefly put, it concerns the following situation. Suppose we want to understand some very large mathematical object, but we are only able to examine small pieces of the object at any given time. How much can we tell about the structure of the mathematical object as a whole? As a suggestive but imperfect analogy, imagine scientists in ancient Greece trying to determine the circumference of the Earth while only being able to survey a tiny fraction of the Earth’s surface at once.

Our topic today is a rather simple instance of this line of inquiry. The large mathematical objects our theorem deals with are uncountable graphs, and the small pieces that we are able to examine are their finite subgraphs. The general question we want to explore is: To what extent do the chromatic numbers of finite subgraphs influence the chromatic number of the entire graph? Before we go any further, then, we need to remind ourselves what all of these words mean.

(Note to the reader: This medium differs in many ways from that of the conference talk. One advantage here: I can provide a soundtrack! [Cue chromatic music.])

We have actually covered the requisite background information in a previous installment; I encourage readers to revisit that post, but we’ll recap the highlights here. Those already familiar with basics about graphs and their chromatic numbers can skip ahead to the next embedded video.

Our theorem is about graphs, which are simple mathematical representations of networks. A graph is a collection of vertices (often depicted visually as dots or circles) and edges connecting certain pairs of vertices (often depicted visually as lines or arcs between pairs of vertices). For example:

graph1
A graph.

Given a graph G, a subgraph H of G is just what it sounds like, consisting of some of the vertices of G and some of the edges of G that connect those vertices. For example, one subgraph of the graph depicted above would consist of the top three vertices together with the edge connecting the left and center top vertices. Or another subgraph would consist of all of the vertices except the isolated vertex in the upper right, together with only the edges that make up the triangle in the top left of the graph.

One of the central notions in graph theory is that of a chromatic vertex coloring. This is an assignment of a value (or color) to each vertex in the graph in a way such that no two vertices with the same color are connected by an edge. Here is a chromatic coloring of our example graph using three colors:

graph2
A chromatic coloring.

If G is a graph, then the chromatic number of G, denoted \chi(G), is the least number of colors needed for a chromatic coloring of G. Suppose now that G is our example graph from above. We exhibited a chromatic coloring of G using 3 colors, so we certainly know that \chi(G) \leq 3. It is not hard to show (Exercise: check this!) that it is impossible to construct a chromatic coloring of G using only 2 colors. Therefore, we obtain the equality \chi(G) = 3. Notice that the chromatic number of a graph G is at most the number of vertices in G, since any coloring assigning a distinct color to each vertex is a chromatic coloring.
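It may help to see the definition of the chromatic number as an algorithm. Below is a naive brute-force sketch in Python, suitable only for very small graphs (computing chromatic numbers exactly is NP-hard in general). The four-vertex graph at the bottom is a made-up example, not the one in the figure.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Try 1, 2, 3, ... colors; return the least k admitting a coloring
    in which no edge connects two vertices of the same color."""
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, assignment))
            if all(color[u] != color[v] for u, v in edges):
                return k
    return 0  # a graph with no vertices needs no colors at all

# A made-up example: a triangle a-b-c with a pendant vertex d attached to c.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(chromatic_number(V, E))  # 3: the triangle already forces three colors
```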

Informally, the chromatic number of a graph can be thought of as a measure of the complexity of the web of connections among its vertices. In addition to its theoretical interest, the chromatic number and related measures of complexity are of practical importance in problems such as task scheduling or committee assignments.

The graph pictured above is finite, with only six vertices and five edges, but there is nothing in the definitions to prohibit graphs with infinitely many vertices and edges, or graphs with infinite chromatic numbers. For example, if a graph G is a complete graph, which means that every pair of vertices is connected by an edge, then any chromatic coloring of G must assign a different color to each vertex, so the chromatic number of G is equal to the number of vertices of G, which could very well be infinite. We will be interested primarily in graphs that have uncountably many vertices and edges (i.e., the cardinality of the set of edges and vertices is greater than the cardinality of the set of natural numbers).

The first significant result concerning the compactness of chromatic numbers is the following beautiful theorem of De Bruijn and Erdős from 1951.

Theorem (De Bruijn-Erdős): Suppose that G is a graph, d is a finite number, and, for every finite subgraph H of G, \chi(H) \leq d. Then \chi(G) \leq d.

The theorem gives a powerful tool for passing from local knowledge of a graph to global knowledge: if you examine every finite subgraph of a graph and find that its chromatic number is at most some fixed finite number d, then the entire graph, no matter how large it is, also has chromatic number at most d. The De Bruijn-Erdős theorem has a number of beautiful proofs; we gave one in the aforelinked post.

Turning the theorem around, we see that, if a graph G has infinite chromatic number, then it must have finite subgraphs of every possible finite chromatic number.  Therefore, if G is a graph with infinite chromatic number, then we can define a function f_G from the natural numbers to the natural numbers by letting f_G(k) be the least number of vertices in a subgraph of G that has chromatic number k.

To see how this works in practice, suppose that G is an infinite complete graph, i.e., it has infinitely many vertices and every pair of vertices is connected by an edge. Then f_G(k) = k for every natural number k, since G has many subgraphs consisting of k vertices and all of the edges between them. Such a subgraph has chromatic number k, and clearly any graph with fewer than k vertices has chromatic number less than k.
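For finite graphs, f_G can be computed by brute force, which gives a concrete feel for the definition (for an infinite G, one would probe larger and larger finite subgraphs in the same way). Here is a sketch, reusing chromatic_number from the earlier snippet; note that it suffices to search induced subgraphs, by the vertex-removal observation in the next paragraph:

```python
from itertools import combinations

def f(vertices, edges, k):
    """Least number of vertices in a subgraph of chromatic number k,
    or None if there is no such subgraph."""
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            sub_edges = [(u, v) for (u, v) in edges
                         if u in chosen and v in chosen]
            if chromatic_number(list(subset), sub_edges) == k:
                return size
    return None

V5 = list(range(5))
E5 = [(u, v) for u, v in combinations(V5, 2)]   # the complete graph K_5
print([f(V5, E5, k) for k in [1, 2, 3, 4, 5]])  # [1, 2, 3, 4, 5], as claimed
```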

For any graph G with infinite chromatic number, the function f_G is an increasing function, i.e., if k_0 < k_1, then f_G(k_0) < f_G(k_1). To see this, let H_1 be a finite subgraph of G with chromatic number k_1 and exactly f_G(k_1) vertices. Removing a single vertex decreases the chromatic number by at most one, so, by removing vertices from H_1 one at a time, we eventually reach a subgraph H_0 with chromatic number exactly k_0. Since H_0 has strictly fewer vertices than H_1, it follows that f_G(k_0) < f_G(k_1).

For the infinite complete graph G, we saw that f_G grows linearly. A natural question arises: can f_G grow arbitrarily quickly? Can we find a graph G so that f_G grows exponentially? Can f_G grow like the Ackermann function? Like the TREE function?

Let’s pause for a moment to think about why this question might be interesting. For a graph G, if f_G grows quite quickly, then it means that, in order to find subgraphs of G with even relatively small chromatic numbers, we need to consider subgraphs with many, many vertices. For example, if f_G(3) = 1,000,000, that means that any subgraph of G with chromatic number 3 (the smallest non-trivial chromatic number) must have at least one million vertices; all smaller subgraphs have chromatic number at most 2. Maybe then f_G(4) is larger than the number of atoms in the observable universe, and f_G(5) is inconceivably larger still. Intuitively, the faster f_G grows, the less complexly connected the finite subgraphs of G are. Maybe there is some function f such that, if f_G grows faster than f, then the finite subgraphs of G are so loosely connected that this places an upper limit on the chromatic number of G itself. After all, there is a tension between wanting G to have very large chromatic number and wanting f_G to grow very quickly: to achieve the former, we have to add sufficiently many edges; to achieve the latter, we must not add too many. Perhaps there is a place where this tension reaches a breaking point. If so, it would be quite interesting to isolate this breaking point. If not, it would be interesting to prove that it doesn’t exist.

Another reason to suspect that this question is interesting is the fact that Paul Erdős was interested in it.

Recall that the smallest infinite cardinality, that of the natural numbers, is denoted by \aleph_0. All larger cardinalities are said to be uncountable. The smallest uncountable cardinal is \aleph_1, the next smallest is \aleph_2, and so on.

By results of Erdős from the 1960s, there is no restriction on how quickly f_G can grow for graphs with countable chromatic number. More precisely:

Theorem (Erdős): For every function f from the natural numbers to the natural numbers, there is a graph G such that \chi(G) = \aleph_0 and f_G grows faster than f (in particular, we can arrange f_G(k) > f(k) for all k \geq 3).

These graphs G are constructed by putting finite graphs of larger and larger chromatic number next to one another. The union of all of these finite graphs will then have chromatic number \aleph_0, and, if the finite graphs are chosen carefully, we can arrange for f_G to grow faster than f. If we want to build graphs with uncountable chromatic number, though, this strategy is not available to us. Any union of disjoint finite graphs (or even any union of disjoint graphs with countable chromatic number) has chromatic number at most \aleph_0; constructing a graph with uncountable chromatic number requires a more intricate approach. It is for this reason that, in my mind, the difference between countable and uncountable chromatic numbers is much more significant than the difference between finite and infinite chromatic numbers. In many ways that go beyond the scope of this post, graphs with finite or countably infinite chromatic numbers behave in much the same way, whereas graphs with uncountable chromatic number behave quite differently.
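The finite shadow of this fact is easy to check with the earlier sketch: the chromatic number of a disjoint union is just the maximum of the chromatic numbers of its parts, since each part can be colored independently of the others.

```python
from itertools import combinations
# Again reuses the chromatic_number sketch from earlier in the post.

# A triangle and a 4-clique, relabeled so their vertex sets are disjoint.
tri_V, tri_E = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]
clq_V, clq_E = [3, 4, 5, 6], list(combinations([3, 4, 5, 6], 2))

# Each part can be colored independently, so the disjoint union needs
# exactly max(3, 4) = 4 colors.
print(chromatic_number(tri_V + clq_V, tri_E + clq_E))   # -> 4
print(max(chromatic_number(tri_V, tri_E),
          chromatic_number(clq_V, clq_E)))              # -> 4
```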

Erdős and his frequent collaborator, Hajnal, were asking at least as early as the 1970s whether a version of this theorem also holds for graphs of uncountable chromatic number. The question seems to have first appeared in print in a 1982 paper of Erdős, Hajnal, and Szemerédi.

Question (Erdős, Hajnal, and Szemerédi): Is it the case that, for every function f from the natural numbers to the natural numbers, there is a graph G such that \chi(G) is uncountable and f_G grows faster than f?

Erdős was quite fond of this question, spending time thinking about it himself, encouraging his colleagues in set theory to think about it, and including it in his 1990 paper, “Some of my favourite unsolved problems.” The first substantial progress came in 2005, when Komjáth and Shelah proved that it consistently has a positive answer.

Theorem (Komjáth-Shelah): It is consistent with the axioms of set theory that, for every function f from the natural numbers to the natural numbers, there is a graph G such that \chi(G) = \aleph_1 and f_G grows faster than f.

In other words, they showed that a positive answer to the question cannot be disproven from the axioms of set theory. This left open the possibility that the question, like many others in mathematics, is independent of those axioms, i.e., that it can be neither proven nor disproven. About a year ago, I resolved the question, showing that it is not in fact independent: it provably has a positive answer.

Theorem (CLH): For every function f from the natural numbers to the natural numbers, there is a graph G such that \chi(G) = \aleph_1 and f_G grows faster than f.

While this answers the question of Erdős, Hajnal, and Szemerédi, it by no means ends the story. The most pressing next question is how high we can push the chromatic number of G. Can we always find such a graph G with chromatic number \aleph_2? What about arbitrarily large chromatic numbers? The techniques of my proof are very specific to \aleph_1, so answering these questions will require new ideas.

I’m not going to provide any details of the proof of the theorem here, as it goes beyond the scope of this blog. Interested readers can find it written up here. I want to end, though, with a few words about how the proof was found, as I think it is illustrative of how mathematical work in particular and creative work in general is often done.

It was January 2019, and I was not in fact even working on the Erdős-Hajnal-Szemerédi question at the time. Instead, I was working on a different, more technical problem. Looking for ideas that might shed some light on that problem, I was reading, among other works, the paper of Komjáth and Shelah that gave a consistent positive answer to the E-H-S question. I read their proof one afternoon in my office. What they actually do in their work is the following. They begin by assuming a principle known as \diamondsuit, which is a strengthening of the Continuum Hypothesis. They then apply a technique known as forcing to produce a model of set theory in which the E-H-S question has a positive answer. Their ideas ended up not being useful for the technical problem I was working on, but, while reading their proof, I had a vague feeling that I could combine some of them with ideas I had developed the previous fall to show that the forcing step of their proof was unnecessary. In other words, it seemed that maybe a positive answer to the E-H-S question already followed from \diamondsuit. I went to a coffee shop and, after an hour or so, convinced myself that this was true.

This was a nice result, and probably worth writing up and publishing, though it was really just a minor improvement to the Komjáth-Shelah result and certainly didn’t answer the E-H-S question, since \diamondsuit is itself independent of the axioms of set theory. As I was writing the paper, though, I gradually realized that my approach could be generalized to eliminate the need for \diamondsuit altogether and prove the result outright. The general idea is as follows: My original construction, using \diamondsuit, was a very linear construction, in which I used \diamondsuit to anticipate all possible chromatic colorings of my graph using only countably many colors and to ensure that none of them actually worked. What I realized, though, was that if I abandoned the insistence on a linear construction and instead allowed my construction to branch out widely, following all possible chromatic colorings with countably many colors, then \diamondsuit was unnecessary. By the first week of February, I had worked these ideas out and had a completed draft of the final paper.

My reason for sharing this is to give a picture of what mathematical research actually looks like. A popular conception of original work in mathematics, or any other creative endeavor, for that matter, is of the lone genius, working obsessively and single-mindedly at their desks for months or years on end, original ideas coming from nowhere but their own minds, until a breakthrough finally arrives and they can reemerge triumphantly into the world. What is much more often the case, though, is that original work comes about because its creators consume widely and deeply of the work of others and put themselves in a position to be able to make meaningful connections between pieces pulled from these disparate sources. It is less the result of obsessive genius and more the result of serendipity and of its creators preparing themselves to take advantage of this serendipity. “Original” work is more often than not simply novel combinations of minor tweaks of old work. Recognizing this should not in any way diminish the accomplishments of its creators, but instead should highlight the deep connections running through all creative endeavors.


Cover image created using the Hydra live coding platform. Check it out!

Circles 0

Life is a full circle, widening until it joins the circle motions of the infinite.

-Anaïs Nin

No shape has captured the human imagination quite like the circle has. Its perfect symmetry and constant radius stand in contrast to the messy variability of our everyday lives. We have inner circles, vicious circles, fairy circles, crop circles, family circles, circles of influence, the circle of life. Circles permeate our conception of time, as they provide the shape of our clocks, and when things return to their original configuration, we say that they have come full circle.

We have seen intimate connections between circles and infinity in many previous posts. The circle is the one-point compactification of the infinite straight line, which itself can be thought of as a circle with infinite radius. The process of circle inversion provides a useful duality between the finite region inside a circle and the infinite region outside. The Poincaré disk model provides an elegant finite setting for a decidedly infinite instantiation of hyperbolic geometry.

In the next few posts, we will be exploring circles more deliberately, through lenses of mathematics, philosophy, linguistics, art, and literature. We will think about circular definitions and closed timelike curves, about causality and the Liar Paradox.

These are for future weeks, though. For today, to lay the foundations and whet the appetite, please enjoy these three essential pieces of circle-related culture:

Although popularly every one called a Circle is deemed a Circle, yet among the better educated Classes it is known that no Circle is really a Circle, but only a Polygon with a very large number of very small sides. As the number of the sides increases, a Polygon approximates to a Circle; and, when the number is very great indeed, say for example three or four hundred, it is extremely difficult for the most delicate touch to feel any polygonal angles. Let me say rather, it would be difficult: for, as I have shown above, Recognition by Feeling is unknown among the highest society, and to feel a circle would be considered a most audacious insult. This habit of abstention from Feeling in the best society enables a Circle the more easily to sustain the veil of mystery in which, from his earliest years, he is wont to enwrap the exact nature of his Perimeter or Circumference. Three feet being the average Perimeter, it follows that, in a Polygon of three hundred sides, each side will be no more than the tenth part of an inch; and in a Polygon of six or seven hundred sides the sides are little larger than the diameter of a Spaceland pin-head. It is always assumed, by courtesy, that the Chief Circle for the time being has ten thousand sides.

Edwin A. Abbott, Flatland: A Romance of Many Dimensions

On Jigsaw Puzzles

Our whole life is solving puzzles.

-Ernő Rubik

I found myself this year occupying an office with an extra desk, one that remained disturbingly tidy for the first half of the fall semester, until I happened upon a jigsaw puzzle featuring an image of Edward Gorey’s “An Exhibition” and decided it would be the perfect way to add some clutter to my office. And so, during the months of November and December, Gorey’s drawing emerged on my desk as the 1000 pieces of the puzzle were slowly but steadily assembled during midday breaks in my mathematical work.

I enjoyed jigsaw puzzles as a child, but this was my first time completing a puzzle in over a decade, and my first time seriously working on a puzzle since starting to do research in mathematics. And as I progressed through its solution, I was struck by how the experience of solving it mirrored the experience of doing math research. I’ll be the first to admit that the observation that doing math is like solving puzzles is not particularly deep or original, but what increasingly stood out to me was how unexpectedly rich and multifaceted the similarities between the acts of research and puzzle-solving turned out to be.

So today, we are going to explore some of these similarities, with the aim of perhaps helping to begin to illuminate what it actually feels like, at least at an abstract level, to do math research. For although a relatively small percentage of the population has actively engaged in mathematical research, the act of solving jigsaw puzzles is a more universal experience, and I expect many people will be familiar with its joys and frustrations.


Solving a jigsaw puzzle and completing a mathematical research project are both processes, with distinct phases, and these phases will furnish a useful framework for our comparison. With this in mind, let us begin at the beginning.

1. Assembling the Edge

When starting a puzzle, there are certain easily identifiable pieces about which one knows some additional information. These are, of course, the edge pieces, which are straight on one (or, in rarer but even more informative cases, two) of their sides. And so, the most common and most reliable strategy for beginning a jigsaw puzzle is first to isolate the edge pieces and then to fit all of them together to construct the “frame” of the puzzle. The reasons for this being an effective strategy are many, but chief among them are surely:

  • It is easy to identify edge pieces.
  • It is relatively easy to fit them together, since for two edge pieces there are at most two possible ways in which they can be joined, whereas for two interior pieces there are up to sixteen possible ways in which they can be joined.
  • Once constructed, the edge of the puzzle will help give structure to the more chaotic interior, providing scaffolding off of which one can steadily build into the center of the puzzle.

The counterpart to assembling the edge pieces in mathematical research is, I claim, doing background reading: getting yourself up to speed with regard to the current state of knowledge about the problem you are trying to solve. While it is certainly conceivable that one could solve a problem with little or no knowledge of previous work, in practice it is immensely helpful and time-saving to be familiar with what other researchers have already discovered about the problem, with techniques that have been developed to tackle this or similar problems, and with ways in which these techniques have succeeded or failed in the past. In a somewhat strained analogy with the above, let me isolate three reasons why this is effective:

  • Given modern bibliographic and communication tools, it is relatively easy to identify relevant books or papers in the existing literature (though it is not always as easy as one might hope).
  • Reading the background literature should be an “easier” or at least more consistently productive process than jumping immediately into the research, in large part because it will not require the novel insights that the eventual solution to your problem will involve. (This is, in part, assuming that the background literature is available to you and reasonably well-written, neither of which is an entirely safe assumption.)
  • After becoming familiar with previous work, you will have a solid foundation that should provide some firm jumping-off points to begin your research.

2. Experimental Tinkering

Once you have completed the edge of your puzzle, it is probably wise to take a break to prepare yourself for the more daunting prospect of working your way into its interior. At the beginning, this process is likely to be slow, frustrating, overwhelming, and bewildering, made especially difficult by the fact that, although the puzzle pieces fit together to form a coherent image, individually they are abstract blobs of color and texture, with little indication of the form they will eventually be a part of. This phenomenon is particularly well depicted by the labyrinthine opening sentence (containing not one nor two but three colons!) of Georges Perec’s (yes, that Georges Perec) masterpiece novel, Life: A User’s Manual, in which jigsaw puzzles feature prominently:

To begin with, the art of jigsaw puzzles seems of little substance, easily exhausted, wholly dealt with by a basic introduction to Gestalt: the perceived object — we may be dealing with a perceptual act, the acquisition of a skill, a physiological system, or, as in the present case, a wooden jigsaw puzzle — is not a sum of elements to be distinguished from each other and analysed discretely, but a pattern, that is to say a form, a structure: the element’s existence does not precede the existence of the whole, it comes neither before nor after it, for the parts do not determine the pattern, but the pattern determines the parts: knowledge of the pattern and of its laws, of the set and its structure, could not possibly be derived from discrete knowledge of the elements that compose it. That means that you can look at a piece of a puzzle for three whole days, you can believe that you know all there is to know about its colouring and shape, and be no further on than when you started.

Georges Perec, Life: A User’s Manual

In practice, this initial phase of exploration into the center of the puzzle consists of repeatedly finding two pieces that have similar colors or textures and trying to fit them together in any conceivable configuration. Most of the time, this will fail. On the rare occasion when it succeeds, the two pieces will melt into one another, becoming a single larger piece, although, at least initially, this larger piece will still remain an abstract blob of color and texture, giving little indication of its eventual identity.

A useful strategy in this early stage of puzzle assembly is to try to attach interior pieces directly to pieces in the completed edge. This lets you exploit two facts: the completed edge already provides some nontrivial information about the structure of the interior of the puzzle, at least partitioning it into discrete regions of similar colors and textures; and the edge pieces have fewer degrees of freedom than unconnected interior pieces, so progress will be quicker at the edge. Of course, once an interior piece is attached to the edge, it becomes part of the edge itself, to be further built upon.

The initial stages of mathematical research are equally slow, frustrating, overwhelming, and bewildering. As above, it is prudent to build off of your background knowledge, taking advantage of the little bit of structure it already provides. Make small tweaks to existing techniques to see if they lead anywhere interesting. Try to connect different bits of existing information to understand how they form complementary parts of a coherent whole. Again, most of these initial attempts at creating new knowledge fail. But every now and then something works, and a tiny breakthrough is achieved. And while these initial achievements don’t yet give you an idea of the final shape of the eventual solution, they gradually add both to the existing store of background knowledge and to your own personal store of experience.

3. Making Connections

As the small groups of connected pieces constructed in Phase 2 begin to grow, something curious happens. The abstract patterns of the individual pieces begin coalescing into recognizable forms. A formerly mystifying red ribbon reveals itself to be a bicycle frame. A black and orange pattern becomes a person’s clothing. A black smudge, when placed in context, is instantly recognizable as said person’s shoe. And then, sometimes, something truly satisfying happens: two of these groups are rotated in just the right way and moved to just the right spots within the frame of the puzzle, and, in a moment of great serendipity, they are found to fit precisely together, in a way that was entirely unforeseen but in hindsight appears obvious and inevitable. This coupling creates a larger, even more structured group that gives the solver information not just about its constituents but about the pieces that will eventually fill in the remaining gaps. In this way, the landscape of the puzzle is gradually revealed so that, even if not all of the pieces are in place, the solver knows in a broad sense what the final picture will be.


Mathematical research progresses in much the same way. The disparate facts built up in Phase 2 begin to fit together to form a coherent narrative. Bits of knowledge from two seemingly distant domains reveal themselves to be two sides of the same coin, shedding light not only on these specific bits but on a whole constellation of ideas. The correct questions to ask begin to emerge and, later on, you begin to see the outlines of their solutions.

It is in this third phase of puzzle solving and mathematical research that the real magic happens, where new discoveries are made, where fleeting moments of pure joy are unearthed. And it is these moments that make the admittedly rather toilsome other phases worth it.

4. Filling in the Gaps

When all of the major features in the puzzle have been fixed in place, when the solver understands the final image, some gaps will remain to be filled: stray undiscovered pieces in the midst of otherwise completed forms, or small expanses of blue sky or green grass. This phase again becomes somewhat tedious and involves none of the discovery of Phase Three, but its finality lends it a satisfaction not found in any of the early solving phases. There is also pleasure in the increasing ease with which you progress through this phase. As the stocks of remaining pieces and open spots simultaneously dwindle, the pieces are slotted in with accelerating rapidity until, suddenly, the image is complete.


In mathematics, too, there is still work to be done after Phase Three. Technical details must be worked out. Proofs must be checked. The paper must be written. And though a mathematical project does not have as obvious an ending as a jigsaw puzzle, there must come a point when you say, “It is done,” and send it to a journal for publication.


There are two additional connections that I want to draw between puzzle solving and mathematical research that don’t quite fit within the above “phases” framework. The first involves the gradual development of intuition and expertise.

The early phases of solving a puzzle consist of a great deal of trial and error. Any two pieces with similar patterns are tried together, in every conceivable configuration, in a slow and rather tedious process. As the puzzle-solving progresses, though, you begin to be able to see quickly, through increased familiarity with subtleties in gradation or piece shape or other more ineffable factors, which pieces are more likely to fit together and which you needn’t bother trying, cutting down greatly on the amount of trial and error needed and considerably speeding up the process. You also begin to see more quickly through the abstraction of the individual pieces, identifying them for what they are even before connecting them with their neighbors.

I was able to witness the effects of this intuition recently. About a year ago, I visited some friends and spent about 15 minutes working on a jigsaw puzzle with their 5-year-old son. At the time, the puzzle was well into what would be labeled “Phase 2” in our framework. Over the course of these 15 minutes, I think I succeeded in connecting precisely one piece, after a large amount of trial and error. The five-year-old, meanwhile, seemed to be making far fewer connection attempts but, in the same amount of time, connected about ten pieces. This disparity is not because he is better at puzzles or has faster visual cognition or better geometric visualization than I do, though all of those things are possible. Rather, I think it is because he had been working on this puzzle long enough to have developed a feel for which types of pieces are likely to work in certain places in the puzzle.

Intuition works similarly in mathematics and continues to be developed across different projects and throughout many years. Every young researcher has had the experience of telling a more senior figure in the field about a recent result they have proven only to have the senior mathematician immediately give a simpler proof of a more general version of the result or make sweeping, initially surprising connections between the new result and existing mathematical facts. This can be initially disconcerting to the young mathematician, since this result that they have spent months or years of hard work on has been absorbed and improved upon by the senior mathematician in a matter of minutes, but in fact it should not take anything away from their accomplishment. After all, they proved this result, while the senior mathematician did not. What the senior mathematician brings is an extensive knowledge of the landscape in which the result sits that lets them quickly assimilate it in their mental space. And through experiences like this, the younger mathematician begins to build up their intuition as well.

The second and final additional connection I want to draw concerns what happens when the project is over.

After the last puzzle piece is placed and you step away to admire your work, the lines between the puzzle pieces begin to mentally dissolve, and the puzzle simply becomes the image printed on its face. All of the difficult toil of the previous hours and days gets washed away. Visitors who look at the completed puzzle do not see all of the ways in which the pieces didn’t fit together. The final solution looks obvious and inevitable. Of course that is how one should place the pieces; how else would you do it?

A completed mathematical project similarly takes on a sheen of inevitability. In your published account of your work, all of the dead ends and initial mistakes are excised from history, and the path from the beginning to the end is laid out in a nice orderly manner rather than the winding maze through which it was initially traversed. After you’ve had a bit of time to get perspective on your work (sometimes only a matter of hours), it begins to seem trivial and obvious. Of course this is how the problem was solved; how else would you do it?

This phenomenon (and some others we have discussed) is well-described by passages in Life: A User’s Manual (for puzzles) and Birth of a Theorem: A Mathematical Adventure, by Cedric Villani (for mathematics).  Let’s look first at the passage on puzzles, which in the book comes shortly after the passage excerpted above:

The pieces are readable, take on a sense, only when assembled; in isolation, a puzzle piece means nothing — just an impossible question, an opaque challenge. But as soon as you have succeeded, after minutes of trial and error, or after a prodigious half-second flash of inspiration, in fitting it into one of its neighbours, the piece disappears, ceases to exist as a piece. The intense difficulty preceding this link-up — which the English word puzzle indicates so well — not only loses its raison d’être, it seems never to have had any reason, so obvious does the solution appear. The two pieces so miraculously conjoined are henceforth one, which in its turn will be a source of error, hesitation, dismay, and expectation.

Life: A User’s Manual

Next, Villani, a distinguished French mathematician, thinks about the process of mathematical research while walking through a dark pedestrian tunnel at night:

This gloomy tunnel is a little like the one you pass through when you begin work on a new mathematical problem […] Total obscurity. Bilbo in Gollum’s tunnel.

A mathematician’s first steps into unknown territory constitute the first phase of a familiar cycle.

After the darkness comes a faint, faint glimmer of light, just enough to make you think that something is there, almost within reach, waiting to be discovered…. Then, after the faint, faint glimmer, if all goes well, you unravel the thread — and suddenly it’s broad daylight! You’re full of confidence, you want to tell anyone who will listen about what you’ve found.

And then, after day has broken, after the sun has climbed high into the sky, a phase of depression inevitably follows. You lose all faith in the importance of what you’ve achieved. Any idiot could have done what you’ve done, go find yourself a more worthwhile problem and make something of your life. Thus the cycle of mathematical research…

Cedric Villani, Birth of a Theorem: A Mathematical Adventure


I finally want to end with one key difference between puzzle-solving and mathematical research. Jigsaw puzzles are bounded, existing entirely within their frame. When you are done with a puzzle, you are done, and the next puzzle you complete will have nothing to do with it except its form. Each mathematical project, on the other hand, is part of the same landscape. When you zoom out and adjust your perspective, it becomes a puzzle piece of its own, to be connected in surprising ways to other projects, forming ever larger and more detailed pieces that combine to form the infinitely rewarding and exquisitely detailed background image of the puzzle of mathematics.


Cover Image: Illustration from The Doubtful Guest by Edward Gorey

The Subtle Art of Go, or Finite Simulations of Infinite Games

Tatta hito-ban to
uchihajimeta wa
sakujitsu nari

Saying `just one game’
they began to play . . .
That was yesterday.

-Senryū (trans. William Pinckard)

A few weeks ago, I found myself in Virginia’s finest book store and made a delightful discovery: a newly published translation of A Short Treatise Inviting the Reader to Discover the Subtle Art of Go, by Pierre Lusson, Georges Perec, and Jacques Roubaud (two mathematicians and two poets, all associates of the French literary workshop Oulipo), originally published in France in 1969.

Go, of course, is a notoriously difficult and abstract board game, invented in China over 2500 years ago and further developed in Korea and Japan. After millennia of being largely unknown outside of East Asia, it has in the last century become popular throughout the world (the publication of this book played a significant role in introducing it to France) and has even been in the news recently, as computer programs using neural networks have defeated some of the best professional go players in highly publicized matches.

The stated goal of Lusson, Perec, and Roubaud’s book is to “[provide], in a clear, complete, and precise manner, the rules of the game of GO” and to “heighten interest in this game.” As a practical manual on the rules and basic tactics and strategy of the game, the modern reader can do much better with other books. As a literary meditation on play, on art, and on infinity, it is dazzling. It is this latter aspect of the book that I want to touch on here today.


A theme running throughout the book is the idea that the practice of go is akin to a journey into infinity, and this theme is expressed both with respect to one’s relationship with other players and with one’s relationship to the game itself.

A joy of learning any game is developing relationships and rivalries with other players, and this is especially true with go, for two main reasons. First, an individual match is not simply won or lost but rather is won or lost by a certain number of points. Second, there is a robust handicapping system whereby a substantially weaker player can legitimately compete with a stronger player, in a match of interest to both players, by placing a specific number of stones on specified points of the board before the first move. Through these two mechanisms, a rich and rewarding go relationship can thus develop, even between players of unequal skill, over not just one match but a progression of matches, during which handicaps can be finely calibrated and can, indeed, change over time, as the players learn more about each other’s play and about the game in general.

As such, GO, at its limits, constitutes the best finite simulation of an infinite game. The two adversaries battle on the Goban the way cyclists pursue each other in a velodrome.

This is not as much the case with, say, chess: there, the result of a game is purely binary, and handicap systems are clumsier and alter the character of the game more substantially, so matches between players of unequal skill will frequently be frustrating for the weaker and boring for the stronger.


It is a cliché to say that one can never truly master a subject, that there is always more to learn. But the richness of go makes it especially true here, and, in a sense, quantifiably so. The number of legal positions in go is more than 2 \times 10^{170}. This is a truly astounding number, dwarfing the estimated number of atoms in the observable universe (10^{80}) or the estimated number of legal positions in chess (a piddling 10^{43}). The possibilities in go are, for all intents and purposes, infinite. No matter how much one learns, one knows essentially nothing.
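(A quick back-of-the-envelope check of these magnitudes, sketched in Python purely as an illustration: each of the 19 × 19 = 361 points of the board is empty, black, or white, so 3^{361} is an upper bound for the number of positions; the count of legal positions, computed by far more sophisticated means, comes to roughly 1% of that bound.)

```python
from math import log10

# Each of the 19 x 19 = 361 points is empty, black, or white,
# giving 3^361 board configurations, an upper bound on the
# number of legal positions.
upper_bound = 3 ** 361
print(f"3^361 is about 10^{log10(upper_bound):.1f}")  # -> 10^172.2
# The legal count, roughly 2 x 10^170, is about 1% of this bound
# and still dwarfs the ~10^80 atoms in the observable universe.
```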

Crucially, though, there is a well-developed hierarchy through which one progresses and by which one can measure one’s skill, even if it remains practically zero when taking a wider view. Lusson, Perec, and Roubaud write about this better than I could in the following two excerpts, so let us simply listen to them:

The genius of GO stems precisely from what it hides as well as what it reveals, at any moment, at any level, in its different, hierarchized mysteries whose progressive mastery transforms the game every time:

A garden of bifurcating pathways, a labyrinth, the game of Babel, each step forward is decisive and each step forward is useless: we will never have finished learning..

(Note the surely intentional nod to Borges in the last sentence above).

From a beginner to a classified amateur in the bottom ranks of kyu, a player can rise to the top kyus and then, one by one, climb the six amateur echelons, approaching (but only approaching) the inaccessible regions where the true players reign, the professionals…

In this last excerpt, we hear echoes of a mathematical concept from set theory, my personal area of expertise. The authors temper the reader’s (and their own) dreams of go mastery by noting that, no matter how high an amateur go player may ascend in their study of go, they will still never reach the “inaccessible” realm of the true masters of the game. These professionals also inhabit a hierarchy, but it is a separate hierarchy, visible but unreachable from below.

This calls to mind the concept of an inaccessible cardinal, which is an uncountable cardinal number \kappa that cannot be reached from below through the standard procedures of climbing to the next cardinal, applying cardinal exponentiation, or taking unions of a small number of small sets. (More formally, \kappa is (strongly) inaccessible if it is regular, uncountable, and, for all \lambda < \kappa, we have 2^\lambda < \kappa.)

It cannot be proven that inaccessible cardinals exist or are even consistent, and the assumption of the consistency of such cardinals has significant implications for what one can prove (see a previous post for more information about inaccessible and other “large cardinals”). On the simplest descriptive level, an inaccessible cardinal divides the hierarchy of infinite cardinals into two realms that cannot communicate via the standard operations of arithmetic: those above and those below.

(A modern version of the book would surely posit a third separate region of hierarchies in go: that of the neural networks that with stunning swiftness have become stronger than the strongest professionals.)


So why bother? If one can spend one’s whole life studying go and still hardly understand the game, if one can develop to one’s full potential and still be nowhere close to the level of professional players, let alone the newly ascendant artificial intelligences, then why start at all?

The authors consider this question, but ultimately they reject its premises. The study of go is not worthwhile in spite of the fact that it is an “infinite pathway.” It is worthwhile because of it.

And this clearly has implications outside of go. Why devote much of my life to mathematical research if I can never know more than a minuscule portion of what remains undiscovered? Why write if it is physically impossible to write more than about 10,000,000 words in a life, and if everything I may write is already contained in Borges’ Library of Babel anyway? Perhaps because the best life is a finite simulation of an infinite life.

Only one activity exists to which GO may be reasonably compared.

We will have understood it is writing.


PS: We have had occasion to mention chess and its relation to the infinite in a previous post. One of the joys of A Short Treatise… is the exaggerated contempt expressed by its authors for the game of chess. We end by offering you just a taste:

Good news!

One of the best European players, Zoran Mutabzija, abandoned chess, to which he had devoted himself since the age of four, as soon as someone taught him GO!

In related news.

We just received a detailed report concerning a certain Louis A. caught in the act of robbing a gas station attendant on the Nationale 6. According to the report, whose source and information cannot be called into question, Louis A. is a notorious chess player.

Grandi’s Series and “Creation Ex Nihilo” (Supertasks 3)

The universe works on a math equation

That never even ever really even ends in the end

Infinity spirals out creation

-Modest Mouse, “Never Ending Math Equation”

In our last post, we discussed Thomson’s Lamp, a supertask in which a lamp that is initially off is switched on and then off again infinitely many times over the course of two seconds (the amount of time between successive switches of the lamp gets exponentially smaller and smaller so that these infinitely many switches can fit within two seconds). We can then ask the perplexing question: “At the end of these two seconds, is the lamp on or off?” Either answer seems wrong; it cannot be on, since there is no previous time at which we turned the lamp on and left it on. But by the same token, it cannot be off, since there is no previous time at which we turned the lamp off and left it off. And yet it must be either on or off at the end of the two seconds, in a state that is seemingly uncaused by any prior state.

Today, we’re going to strip away the physical details of this thought experiment and get to its mathematical core. To start, we note that it is common in electrical devices to denote the state of being “on” with the number 1 and the state of being “off” with the number 0.  With this framework, we can think of the act of turning the lamp on as being equivalent to adding 1 to the numerical state of the lamp, and we can think of the act of turning the lamp off as equivalent to subtracting 1 from the numerical state of the lamp. Our act of switching the lamp on and off an infinite number of times can then be reduced to the infinitely long sum

 1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots

and the question, “What is the state of the lamp after two seconds?” is equivalent to the question, “What is the value of this infinite sum?”


Before we actually answer this question (in two different ways!) let’s travel back to 18th century Italy to visit Luigi Guido Grandi, a mathematician and theologian. Grandi was one of the first to study the infinite sum

1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots ,

which is often called Grandi’s series in his honor. (What we are calling an infinite sum here is more formally (and more correctly) called an infinite series.) Grandi made the devious observation that, if one cleverly adds parentheses in different ways to the infinite sum, one can seemingly make it attain two different values. First, if one adds parentheses like this:

 (1 - 1) + (1 - 1) + (1 - 1) + (1 - 1) + \ldots

then the sum becomes:

 0 + 0 + 0 + 0 + \ldots ,

which clearly equals 0. On the other hand, if one adds parentheses like this:

 1 + (-1 + 1) + (-1 + 1) + (-1 + 1) + \ldots

then the sum becomes:

 1 + 0 + 0 + 0 + \ldots ,

which just as clearly equals 1! Depending on one’s perspective, the value of the sum can be either 0 or 1. The lamp can equally well be off or on.

Grandi, as a theologian, makes a somewhat more ambitious claim:

By putting parentheses into the expression 1 – 1 + 1 – 1… in different ways, I can, if I want to, obtain 0 or 1. But then the idea of the creation ex nihilo is perfectly plausible.

Creation ex nihilo, of course, is creation out of nothing. This is certainly the only instance I can think of in which clever parenthesis placement is used to make theological arguments about the creation of the universe.


Grandi’s arguments don’t hold up to modern mathematical scrutiny. (And, to be fair, Grandi didn’t really believe them himself; more on that later.) To say more, we need to say what we might actually mean when we assert that an infinite sum has a value. Very roughly speaking, the standard way of treating infinite sums in modern mathematics, the way I taught it to my calculus students a couple of weeks ago, is as follows. To determine whether the infinite sum

a_0 + a_1 + a_2 + a_3 + \ldots + a_k + \ldots

has a definite value, one considers longer and longer finite initial sums:

 S_0 = a_0 \newline  S_1 = a_0 + a_1 \newline S_2 = a_0 + a_1 + a_2 \newline \ldots \newline S_n = a_0 + a_1 + a_2 + \ldots + a_n

If these initial sums S_n approach a fixed number L as n becomes arbitrarily large, then we say that the value of the sum is in fact L:

a_0 + a_1 + a_2 + a_3 + \ldots + a_k + \ldots = L.

Otherwise, the value of the infinite sum is left undefined, and we say that the sum diverges.

This leads to a beautiful mathematical theory with some striking results. For example:

1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \ldots + \frac{1}{k^2} + \ldots = \frac{\pi^2}{6}

When we consider Grandi’s series, though, the initial sums alternate between 0 and 1. They never approach a single fixed number, so the value of Grandi’s series is undefined. The lamp is neither on nor off.
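For readers who like to see this numerically, here is a small Python sketch (an illustration only) computing initial sums of both of the series above: those of the Basel-type sum creep toward \pi^2/6 \approx 1.6449, while those of Grandi’s series just bounce between 1 and 0:

```python
from math import pi

def initial_sums(term, n):
    """The initial sums S_0, S_1, ..., S_{n-1} of an infinite sum."""
    sums, total = [], 0.0
    for k in range(n):
        total += term(k)
        sums.append(total)
    return sums

basel = initial_sums(lambda k: 1 / (k + 1) ** 2, 10000)
grandi = initial_sums(lambda k: (-1) ** k, 8)

print(basel[-1], pi ** 2 / 6)  # -> 1.64483... vs 1.64493...: converging
print(grandi)                  # -> [1.0, 0.0, 1.0, 0.0, ...]: no limit
```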


The story isn’t quite over yet, though. As I mentioned earlier, Grandi didn’t actually think that he could change the value of his infinite sum just by placing parentheses in different ways. But he also didn’t think that its value should be left undefined, as we teach our students in calculus class today. Instead, he thought the series should have a definite value, and that that value, in what could be seen as an early anticipation of the idea of quantum superposition, should be 1/2. The lamp is half on and half off.

Grandi was not alone in thinking this, and a number of arguments have been put forward for why 1/2 is the correct value of the series

1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots

Let’s examine a couple of them here. First, Grandi advanced the following story as an explanation: Suppose that two siblings inherit a valuable gem from their parents. They are forbidden from selling the gem, and cutting it in half would ruin its value. So the siblings agree that they will alternate ownership of the gem, exchanging it every New Year’s Day. If this arrangement continues indefinitely, then, from the point of view of the first sibling to take the gem, their ownership of the gem can be represented by the series

1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots .

And yet each sibling owns the gem half of the time, so the value of this series should be 1/2.

Next, we consider a more mathematical argument from a young Leibniz. This proceeds via a sequence of equations, each constructed from the previous one. First, he notes that

\frac{1}{2} = 1 - \frac{1}{2}.

Then, substituting the expression 1 - \frac{1}{2} in for \frac{1}{2} on the right side of this equation, he finds that

\frac{1}{2} = 1 - (1 - \frac{1}{2}).

Continuing with similar substitutions, he obtains the equations

\frac{1}{2} = 1 - (1 - (1 - \frac{1}{2})) \newline \frac{1}{2} = 1 - (1 - (1 - (1 - \frac{1}{2}))) \newline \frac{1}{2} = 1 - (1 - (1 - (1 - (1 - \frac{1}{2})))).

Taking this process through infinitely many steps, one should then obtain

\frac{1}{2} = 1 - (1 - (1 - (1 - (1 - (1 - \ldots))))).

And finally, distributing the minus signs through the parentheses, one ends up with

\frac{1}{2} = 1 - 1 + 1 - 1 + 1 - 1 + \ldots ,

which is exactly what we were trying to show.
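It is worth noticing numerically where the delicacy lies. Each of Leibniz’s finite equations is perfectly correct: evaluated from the inside out, the nested expressions all equal exactly 1/2. The suspect move is the final passage to infinitely many nestings, where the innermost 1/2 silently disappears. A tiny Python sketch (my illustration, not Leibniz’s) makes the contrast visible:

```python
def nested(depth, innermost):
    """Evaluate 1 - (1 - (1 - ... (1 - innermost))), with `depth`
    subtractions, from the inside out."""
    value = innermost
    for _ in range(depth):
        value = 1 - value
    return value

# With the innermost 1/2 kept, every finite nesting equals 1/2:
print([nested(d, 0.5) for d in range(6)])  # -> [0.5, 0.5, 0.5, ...]
# Discard the innermost 1/2, and the values oscillate like the
# initial sums of Grandi's series:
print([nested(d, 0) for d in range(6)])    # -> [0, 1, 0, 1, 0, 1]
```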


Neither of these arguments would be considered rigorous or particularly convincing in any mathematics department today, but they can be useful to build intuition. And there is a rigorous framework for infinite sums in which Grandi’s series does equal 1/2. It is known as Cesàro summation, after the late-19th century Italian mathematician Ernesto Cesàro. In Cesàro summation, one doesn’t compute the limit of the finite initial sums S_0, S_1, S_2, \ldots as above. Instead, one computes the limit of the average of the first n initial sums. More precisely, given an infinite sum a_0 + a_1 + a_2 + \ldots and initial sums S_0 = a_0, S_1 = a_0 + a_1, S_2 = a_0 + a_1 + a_2 \ldots, one then defines a sequence A_0, A_1, A_2, \ldots by letting

A_0 = S_0 \newline A_1 = \frac{1}{2}(S_0 + S_1) \newline A_2 = \frac{1}{3}(S_0 + S_1 + S_2) \newline \ldots \newline A_n = \frac{1}{n+1}(S_0 + S_1 + S_2 + \ldots + S_n) .

If these averages A_n approach a fixed number L as n becomes arbitrarily large, then that number L is said to be the value of the infinite sum a_0 + a_1 + a_2 + \ldots .

Cesàro summation can be seen as an extension of the standard theory of infinite sums explained above. If an infinite sum has a value under the standard account, then it still has that value under Cesàro summation. But there are a number of infinite sums that diverge in the standard theory but have defined values under Cesàro summation. Grandi’s series is one of these sums, and, as expected, its Cesàro value is 1/2.
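To see the definition in action, here is one last Python sketch (again just an illustration): applied to Grandi’s series, the averages A_n bounce around less and less and approach 1/2, even though the initial sums S_n never settle down:

```python
def cesaro_means(term, n):
    """The averages A_0, ..., A_{n-1} of the initial sums of a series."""
    means, s, total = [], 0.0, 0.0
    for k in range(n):
        s += term(k)       # the initial sum S_k
        total += s         # S_0 + S_1 + ... + S_k
        means.append(total / (k + 1))
    return means

means = cesaro_means(lambda k: (-1) ** k, 1001)
print(means[:4])   # -> [1.0, 0.5, 0.666..., 0.5]
print(means[-1])   # -> 0.50049...: closing in on 1/2
```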


Cover Image: “Starfield” by Vija Celmins