Circles 3: Building Everything out of Nothing

My neighbor has a circular driveway…he can’t get out.

-Steven Wright

Welcome to the third installment in our miniseries about circularity. In the first, we looked broadly at circular definitions, both in everyday language and in mathematics. In the second, we tackled the surprisingly difficult task of establishing a non-circular definition for the simple-seeming notion of “finite set”. Today, we’re going to continue our discussion of finite sets, introducing a useful definition that at first glance appears circular but upon further analysis turns out to be on surprisingly solid foundations. (The contents of the previous posts are not at all necessary for understanding this one, so don’t worry if you’ve forgotten (or never read…) them.)


Let’s take a look at a few fine examples of finite sets. The empty set, \emptyset, is of course a finite set. The set \{2, 3, 5, 7\} is finite, as it contains 4 elements (the single-digit prime numbers). And the set \{\mathbb{N}, \mathbb{R}\} is finite, containing only 2 elements (the set of natural numbers and the set of real numbers). And while all three of these are certainly finite sets, two of them seem more truly finite than the other. The empty set is obviously finite through and through, as is the set \{2,3,5,7\}, since it only has finitely many elements and each of those elements is simply a finite number. It feels a bit like cheating, on the other hand, to call \{\mathbb{N}, \mathbb{R}\} a finite set. It technically qualifies, having only two elements, but those elements are themselves infinite sets (one is even uncountable!). We’ve somehow snuck infinity into a finite set.

With this in mind, we might want to introduce a strengthening of the notion of “finite set” to rule out sets like \{\mathbb{N}, \mathbb{R}\}, which satisfy the letter but not the spirit of the law. Maybe we’ll call this strengthening truly finite. What should the definition of “truly finite” be? With our example in mind, we might try something like the following.

Attempted Definition 1: A set X is truly finite if it satisfies the following two properties:

  1. X is finite.
  2. All of the elements of X are finite.

This definition certainly rules out our set \{\mathbb{N}, \mathbb{R}\}. But what about a set like \{\{\mathbb{N}\}, \{\mathbb{R}\}\}? This set is finite, having two elements (\{\mathbb{N}\} and \{\mathbb{R}\}). And each of its elements is finite, having only one element (\mathbb{N} and \mathbb{R}, respectively). So \{\{\mathbb{N}\}, \{\mathbb{R}\}\} satisfies our attempted definition of “truly finite”. But of course we don’t actually want it to be truly finite, for the same reason that we don’t want \{\mathbb{N}, \mathbb{R}\} to be truly finite: the infinite sets \mathbb{N} and \mathbb{R} are inside it in an essential way. We should therefore revise our definition. If we do this just to take care of the two examples we’ve encountered so far, we might propose:

Attempted Definition 2: A set X is truly finite if it satisfies the following three properties:

  1. X is finite.
  2. All of the elements of X are finite.
  3. All of the elements of all of the elements of X are finite.

Ok, but this was silly. We already knew that this definition wasn’t going to work for us, as something like \{\{\{\mathbb{N}\}\}, \{\{\mathbb{R}\}\}\} satisfies this definition but again should not be considered to be truly finite. We want to somehow catch our tail and say that a set is truly finite if, no matter how many layers deep you go into the set, you never reach an infinite set. Something like:

Attempted Definition 3: A set X is truly finite if it satisfies the following properties:

  1. X is finite.
  2. All of the elements of X are finite.
  3. All of the elements of all of the elements of X are finite.
  4. And so on…

This definition is at least beginning to express the idea we’re grasping at, and we’re not going to be able to find any clear examples of sets that satisfy this definition and yet obviously should not be considered to be truly finite. We’re trying to write a precise mathematical definition, though, and the phrase “And so on…” is much too vague to be an acceptable part of our final definition.

We need to find a clear, concise way to make the intuition behind Attempted Definition 3 precise. It turns out there is a quite elegant solution. It also turns out that this is a fundamental notion in set theory and already has a name that is different from “truly finite”, so we’ll switch over now to its actual name: “hereditarily finite”.

(Actual!) Definition: A set X is hereditarily finite if it satisfies the following two properties:

  1. X is finite.
  2. All of the elements of X are hereditarily finite.

“Whoa, hold on a minute,” I can hear you saying. “This must be a joke. We’re in the third installment of a series on circular definitions, and you’re trying to pass off this clearly circular definition as an ‘actual’ mathematical definition? Nice try, but I’m not fooled.”

And I’ll admit that it does look as if we’re defining the term “hereditarily finite” in terms of … “hereditarily finite.” But this is in fact a perfectly fine definition, for reasons that those of you familiar with recursion in computer programming may already begin to see.

Let’s start with a small step and ask ourselves the simple question, “Are there any sets at all that we can unambiguously say are hereditarily finite according to the above definition?” Perhaps surprisingly, there are! Take a moment to try to think of one before reading on.


To find a set that is definitely hereditarily finite, let’s return to that most finite of all finite sets: the empty set. The empty set is clearly finite, so it satisfies Property 1 of our definition. What about Property 2? Well, the empty set doesn’t have any elements, so it’s vacuously true that all of its elements are hereditarily finite. (Indeed, any statement asserting something about all of the elements of the empty set is vacuously true. All of the elements of the empty set are the emperor of France. All of the elements of the empty set can make a rock so heavy that they can’t lift it.) The empty set therefore satisfies both properties of our definition, so the empty set is hereditarily finite!

Now that we know this, we can start building a veritable menagerie of hereditarily finite sets. Consider the set \{\emptyset\}. It has one element, so it is finite. Moreover, we now know that its one element, the empty set, is hereditarily finite. Therefore, \{\emptyset\} satisfies both properties in our definition, so it is hereditarily finite, too. What about \{\emptyset, \{\emptyset\}\}? It has two elements, so it is finite. Moreover, we have already seen that both of its elements, \emptyset and \{\emptyset\}, are hereditarily finite. Therefore, \{\emptyset, \{\emptyset\}\} is hereditarily finite. We could keep going, generating any number of exotic hereditarily finite sets, like \{\{\emptyset, \{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \{\emptyset, \{\{\emptyset\}\}\}\} or \{\emptyset, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset, \{\emptyset\}\}\}\} or \{\{\{\{\{\{\{\{\{\{\emptyset\}\}\}\}\}\}\}\}\}\}. Our apparently circular definition actually has an abundance of unambiguous examples.
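To see just how tame this recursion is in practice, here is a minimal sketch in Python. Everything about it is my own illustrative choice rather than anything official: frozensets play the role of sets, and, since we obviously can’t hold \mathbb{N} in memory, any non-frozenset value (like the string "N" below) is a hypothetical marker standing in for an infinite set.

```python
def is_hereditarily_finite(x):
    if not isinstance(x, frozenset):
        return False  # an "infinite set" marker is not hereditarily finite
    # Property 1 (finiteness) holds automatically for any frozenset;
    # Property 2 is the recursive call on each element.
    return all(is_hereditarily_finite(e) for e in x)

empty = frozenset()
assert is_hereditarily_finite(empty)                              # \emptyset
assert is_hereditarily_finite(frozenset({empty}))                 # \{\emptyset\}
assert not is_hereditarily_finite(frozenset({frozenset({"N"})}))  # \{\{\mathbb{N}\}\}
```

Because any frozenset we can actually construct is finite and built from the ground up, the recursion always bottoms out at the empty set, just as in the examples above.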


This is all well and good, but in order for our definition actually to do its job, we want something stronger to be true. We want it to be the case that, if somebody hands us an arbitrary set, we can definitively tell whether or not it is hereditarily finite. And there is in fact a step-by-step process for doing just this. Here it is (for concision, we will abbreviate “hereditarily finite” as “HF”):

To tell whether an arbitrary set X is HF, go through the following steps.

Step 0: If X is infinite, then stop the process and conclude that X is not HF. If X is the empty set, then stop the process and conclude that X is HF. Otherwise, continue to Step 1.

Step 1: If any of the elements of X is infinite, then stop the process and conclude that X is not HF. If the only element of X is the empty set, then stop the process and conclude that X is HF. Otherwise, let X_1 denote the set of all elements of elements of X, and continue to Step 2.

Step 2: If any of the elements of X_1 is infinite, then stop the process and conclude that X is not hereditarily finite. If the only element of X_1 is the empty set, then stop the process and conclude that X is HF. Otherwise, let X_2 denote the set of all elements of elements of X_1, and continue to Step 3.

Step n + 1: If any of the elements of X_n is infinite, then stop the process and conclude that X is not hereditarily finite. If the only element of X_n is the empty set, then stop the process and conclude that X is HF. Otherwise, let X_{n+1} denote the set of all elements of elements of X_n, and continue to Step n + 2.
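For those who like to see such procedures spelled out mechanically, here is one way the process might be rendered in Python, a sketch reusing the convention from above that non-frozenset values are hypothetical markers for infinite sets:

```python
EMPTY = frozenset()

def is_hf_stepwise(X):
    # Step 0: examine X itself.
    if not isinstance(X, frozenset):
        return False                 # X is "infinite": not HF
    if X == EMPTY:
        return True
    X_n = X
    while True:
        # Step n + 1: examine the elements of the current set X_n.
        if any(not isinstance(e, frozenset) for e in X_n):
            return False             # some element of X_n is "infinite"
        if X_n == frozenset({EMPTY}):
            return True              # the only element of X_n is the empty set
        # Otherwise, X_{n+1} is the set of all elements of elements of X_n.
        X_n = frozenset().union(*X_n)
```

For honest frozensets this loop always terminates, since they can only be built from the ground up; whether the analogous claim holds for arbitrary sets is precisely the question we take up below.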

We can get a good feel for what’s going on here by looking at this process in action for a couple of different sets. I think it’s instructive, but if you want to skip the details, you can scroll down to the next horizontal line.

First, consider the set X = \{\{\emptyset, \{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \{\emptyset, \{\{\emptyset\}\}\}\}, one of our examples of a hereditarily finite set from above, and run the above process. In Step 0, we note that X is not infinite (it has 3 elements). Also, X is not the empty set. So we proceed to Step 1.

In Step 1, we first note that all of the elements of X are finite. Also, it’s not the case that the only element of X is the empty set. Therefore, we let X_1 be the set of all elements of elements of X, i.e., X_1 = \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\} \} , and continue to Step 2.

In Step 2, we note that all of the elements of X_1 are finite, and it’s not the case that the only element of X_1 is the empty set. Therefore, we let X_2 be the set of all elements of elements of X_1, i.e., X_2 = \{\emptyset, \{\emptyset\}\}, and continue to Step 3.

In Step 3, we note that all of the elements of X_2 are finite, and it’s not the case that the only element of X_2 is the empty set. Therefore, we let X_3 be the set of all elements of elements of X_2, i.e., X_3 = \{\emptyset\}, and continue to Step 4.

In Step 4, we note that all of the elements of X_3 are finite. But the only element of X_3 is the empty set, so we stop the process and conclude that X is indeed hereditarily finite.

Next consider X = \{\{\{\mathbb{N}\}\}, \{\{\mathbb{R}\}\}\}, a set that we definitely didn’t want to consider hereditarily finite. In Step 0, we note that X is finite and it’s not the case that its only element is the empty set, so we continue to Step 1.

In Step 1, we note that all of the elements of X are finite (they both have one element) and it’s not the case that its only element is the empty set. So we let X_1 be the set of all elements of elements of X, i.e., X_1 = \{\{\mathbb{N}\}, \{\mathbb{R}\}\}, and continue to Step 2.

In Step 2, we note that all of the elements of X_1 are finite and it’s not the case that its only element is the empty set. So we let X_2 be the set of all elements of elements of X_1, i.e., X_2 = \{\mathbb{N}, \mathbb{R}\}, and continue to Step 3.

In Step 3, we note that, in fact, both elements of X_2 are infinite, so we stop the process and conclude that X is, as we desired, not hereditarily finite.


In both cases above, our process halted after a relatively small number of steps, giving us an answer. Do we know that this process necessarily finishes, though? A priori, it looks possible that for certain sets it might just keep going on forever without yielding any conclusion about whether or not the set is hereditarily finite. It turns out, though, that (at least when working with the standard axioms for set theory) the process necessarily ends after a finite (though potentially arbitrarily large) number of steps, and that this fact is due to a rather curious axiom known as the Axiom of Foundation (sometimes called the Axiom of Regularity). It reads as follows:

The Axiom of Foundation: For every non-empty set X, there is an element Y of X such that X \cap Y = \emptyset.
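In the first-order language of set theory, with \emptyset and \cap used as the usual abbreviations, this reads:

\forall X \, \big( X \neq \emptyset \rightarrow \exists Y \, ( Y \in X \wedge X \cap Y = \emptyset ) \big)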

It might not immediately be obvious that this axiom implies that our process must eventually stop, or even that the axiom has anything to do with what we’ve been discussing today. And since this post is already way too long, I’m not going to give a proof here (maybe in a future post?). But it’s a direct consequence of the axiom (and indeed can even be seen as a kind of reformulation of the axiom) that, given any set X, if I run our process on X, then, after some finite number of steps, I will either encounter an infinite set, in which case I know that X is not hereditarily finite, or I will be left with just the empty set, in which case I know that X is hereditarily finite.

I want to say a few more words about this axiom to close, though. I introduced it as a “curious axiom”, and a reason for this is that, although the axiom is listed among the standard axioms of set theory, essentially all of mathematics can be developed entirely without using the Axiom of Foundation at all! Its raison d’être is simply that it makes sets nicer to work with! Sets that don’t satisfy the Axiom of Foundation are weird and pathological and, for most purposes, we’d rather just forget about them during our day-to-day lives. (There are certainly interesting things to say about axiom systems that don’t include the Axiom of Foundation or even include explicit anti-Foundation axioms, but we’ll leave that story for another day.)

One prominent potential set that the Axiom of Foundation rules out is the set X = \{X\}, i.e., the set whose only element is itself. Let’s see what happens if we try to apply our process to X to determine whether or not it is hereditarily finite. In Step 0, we note that X is not infinite and is not the empty set, so we proceed to Step 1.

In Step 1, we note that all of the elements of X are finite and it’s not the case that its only element is the empty set. We therefore let X_1 be the set of all elements of elements of X. But the only element of X is X, and the only element of X is X (I feel like I’m repeating myself…), so X_1 = \{X\}. But X = \{X\}, so it follows that X_1 = X. We continue to Step 2.

But now, since X_1 = X, Step 2 is going to be exactly the same as Step 1. And then Step 3 is going to be the same again. And Step 4. And Step 5. We’re going to repeat ourselves over and over, never making any progress or getting any closer to knowing whether or not X is hereditarily finite. We’ve at last encountered an actual instance of circularity!

So, is X = \{X\} hereditarily finite or not? Well, here we run into a real limitation of our definition. In this instance, it’s ambiguous. We could arbitrarily declare X to be hereditarily finite, or we could declare it not to be hereditarily finite, and either choice would be consistent with our definition. For if X is HF, then it satisfies both properties in our definition: it’s finite, and all of its elements are hereditarily finite. If, on the other hand, X is not hereditarily finite, then it fails property 2, since its one element is not hereditarily finite. The apparent circularity of our definition of “hereditarily finite” was tamed by the Axiom of Foundation, but if we venture into the wilds of non-well-founded sets like X = \{X\}, then it ceases to be a meaningful concept, and its circularity reemerges.
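For readers who enjoyed the earlier Python sketches, this breakdown can even be watched in action. The class below is my own hypothetical stand-in for X = \{X\} (a frozenset can never contain itself, precisely because frozensets are well-founded), and a naive checker that applies the recursive definition blindly chases its own tail until the interpreter gives up:

```python
class SelfSet:
    """A stand-in for X = {X}: its only element is itself."""
    def __iter__(self):
        yield self

def is_hf_naive(x):
    # The recursive definition, with no well-foundedness guarantee.
    return all(is_hf_naive(e) for e in x)

try:
    is_hf_naive(SelfSet())
except RecursionError:
    print("The recursion never bottoms out.")
```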


Cover Image: “The Battle Between Carnival and Lent” by Pieter Bruegel the Elder

Circles 1: Circular Definitions

circular definition. noun Logic. a definition in which the definiendum (the expression being defined) or a variant of it appears in the definiens (the expression that defines it).

-dictionary.com

Today we return to our “Circles” series, which we introduced what seems like years ago but was actually just March, with its first real installment, on circular definitions.


We are rightfully taught from a young age to avoid using a word in its own definition. This makes obvious sense: a definition is, in part, supposed to convey the meaning of a word to a reader who is unfamiliar with it. Ideally, it does so using words that the reader already knows or can look up. If the unfamiliar word itself appears in the definition, the definition can hardly convey that meaning, and it has failed at one of its primary tasks.

Even if we diligently avoid using words within their own definitions, though, circularity can easily crop up across multiple definitions. For example, we might make the following sequence of definitions:

ice. noun. the solid form of water.

steam. noun. the gaseous form of ice.

water. noun. the liquid form of steam.

Although none of these definitions is circular on its own, they form a closed circle when taken as a set, and this obscures the meaning of all three words. A person who is unfamiliar with all of the words “ice”, “steam”, and “water” will come away from these definitions knowing something, namely that the three words represent, respectively, the solid, gas, and liquid forms of a certain substance, but not much more than that. They would not know, for instance, any relevant properties of this substance, or where this substance might be found, or what it is used for, and these three definitions would certainly not be printed together in any decent dictionary. (One might argue that the definitions of “ice” and “steam” are basically fine, if a bit limited, and that the definition of “water” should be altered to something more descriptive, thus breaking this instance of circularity. This is essentially what is done by Merriam-Webster, for instance).

Or here is an example actually occurring in Merriam-Webster. (To be fair, the full definitions in M-W are much longer and have multiple entries that we are not quoting in their entirety here. Still, what we reproduce below are the exact quotes of the first definitions of each of the three respective words. (The definition of “weigh” we quote is the first given as an intransitive verb, which is the sense in which it is being used in the definition of “weight”.))

weight. noun. the amount that a thing weighs.

weigh. verb. to have a certain heaviness

heavy. adjective. having great weight

This probably doesn’t seem like an ideal scenario, but satisfactorily removing this circularity from the definitions of “weight”, “weigh”, and “heaviness” (which redirects to “heavy”) is more difficult than it might initially seem. I encourage you to spend a few minutes thinking about it. Let me know if you come up with something significantly better.


One might be wondering at this point if it is even possible to avoid circularity entirely in a system of definitions. In fact, a rather simple argument, of a sort that is ubiquitous in the mathematical field of graph theory, shows that, under two reasonable assumptions, it is impossible. Let us look at this argument now.

In our analysis, we will represent certain information from the dictionary in a directed graph, i.e., a collection of nodes and a collection of arrows between certain of these nodes. In our graph, each node will represent a word, and there will be an arrow pointing from Word 1 to Word 2 if Word 2 appears in the definition of Word 1. For example, a portion of this graph containing some of the words in our “ice”, “steam”, and “water” definitions given above might look like this:

A portion of our “Dictionary directed graph”. Note the circle formed by “ice”, “water”, and “steam”.

We now make two assumptions that I think will seem quite reasonable:

  1. There are only finitely many words in the dictionary.
  2. Every word that is used in the dictionary itself has a definition in the dictionary.

We will now see that these two innocuous assumptions necessarily lead to a circular set of definitions. To see this, choose any word in the dictionary and start following an arbitrary path of arrows in the directed graph derived from the dictionary. For example, in the segment above, we might start at “ice”, then move to “water”, then to “liquid”, and then elsewhere, maybe to “flowing”, for example. Let us label the words we visit on this path as w_1, w_2, w_3, and so on. Recalling the meaning of the graph, this means that the definition of w_1 contains w_2, the definition of w_2 contains w_3, and so on.

A consequence of Assumption 2 is that each of the words w_n visited in our path itself has a definition, and therefore there are certainly arrows pointing out of the node associated with w_n. This means that we never get stuck on our path. We can always choose a next node, so we can continue our path indefinitely and find a word w_n for every natural number n.

We now have an infinite path of words. But Assumption 1 says there are only finitely many words in the dictionary, so this means that our path must repeat some words! In other words, there are numbers m < n such that w_m and w_n are the same word. For example, maybe w_6 is the same word as w_{10}. But in that case, look at what we have:

  • the definition of w_6 contains w_7;
  • the definition of w_7 contains w_8;
  • the definition of w_8 contains w_9;
  • the definition of w_9 contains w_6 (since w_6 = w_{10}).
A circle of definitions.

Thus, w_6, w_7, w_8, and w_9 form a circular set of definitions. (The same sort of analysis obviously holds for any values of m and n, not just for 6 and 10.)
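The path-following argument is concrete enough to mechanize. Here is a toy sketch in Python; the three-word dictionary and the choice to always follow the first word of each definition are my own arbitrary choices:

```python
# Each word maps to the list of words appearing in its definition.
# Assumption 2: every word used here has its own entry.
toy_dictionary = {
    "ice":   ["water"],
    "water": ["steam"],
    "steam": ["ice"],
}

def find_circle(dictionary, start):
    path, position = [], {}
    word = start
    while word not in position:     # Assumption 1 forces a repeat eventually
        position[word] = len(path)
        path.append(word)
        word = dictionary[word][0]  # follow any arrow out of the node
    return path[position[word]:]    # the circle: w_m, ..., w_{n-1}

print(find_circle(toy_dictionary, "ice"))  # ['ice', 'water', 'steam']
```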


The fact that circularity is unavoidable should not, though, be a cause for despair, but can perhaps help us to clarify certain ideas about language and dictionaries. A dictionary is not meant to secure a language on a perfectly logically sound, immutable foundation. Nobody is worried that an inconsistency will be found in the dictionary and, as a result, the English language will collapse (whereas this sort of worry certainly does exist, in a limited form, in mathematics). Rather, it is meant to serve as a living documentation of a language as it is actually used in practice, and our everyday use of language is delightfully full of imprecisions and ambiguities and circularities. I would not want things to be any other way.

(This is not to say, of course, that all circular definitions should be uncritically embraced. Circular definitions are often simply unhelpful and can easily be improved. Our definitions of “ice”, “steam”, and “water”, for instance, would be greatly improved simply by defining “water” as something like “the liquid form of a substance whose molecules consist of two hydrogen atoms and one oxygen atom, which freezes at 0 degrees Celsius and boils at 100 degrees Celsius, and which is the principal component of the Earth’s rivers, oceans, lakes, and rain”.)

In the spirit of play, though, and in anticipation of the forthcoming mathematical discussion, let us consider two ways in which our language could be altered to actually allow us to avoid circles. These two ways essentially amount to denying the two Assumptions isolated in our proof above. The first way consists of allowing our language to be infinite. Our argument above hinged on the fact that our language has only finitely many words, so our path must return to a previously visited node at some point. If we had infinitely many words, though, we could line them all up and write down definitions so that each word’s definition only depended on words appearing later. So, for example, Word 1 might be defined in terms of Words 3, 5, and 17, and Word 3 might then be defined in terms of Words 6, 12, and 24, and so on. In practice, this is untenable for obvious physical reasons, but, even absent these physical concerns, it’s also unclear how any actual meaning would arise from such a system.

The second way consists of retaining a finite language but specifying one or more words as not needing definitions. These would be something like atomic words, perhaps expressing pure, indivisible concepts. In the graph diagram in our argument above, the nodes corresponding to these words would have no arrows coming out of them, so our paths would simply come to a halt, with no contradiction reached, if we ever reached such a node. Indeed, it is not hard to see how, given even a small number of such atomic words, one could build an entire dictionary, free from any circularity, on their foundation.

In some sense, this is actually close to how language works in practice. You can look up, for instance, the word “the” in the dictionary and find a definition that, at least for the first few entries, studiously avoids the use of “the”. Such definitions are certainly of use to, say, linguists, or poets. And yet nobody actually learns the word “the” from its definition. Similarly, you probably learned a word like “cat” by, over time, seeing many instances of actual cats, or images or sounds of cats, being told each time that what you were witnessing is a cat, and subconsciously collating the similarities across these examples into a robust notion of the word “cat”. Reading a definition may help clarify some formal aspects of the concept of “cat”, but it probably won’t fundamentally alter your understanding of the word.


One place where circularity certainly should be avoided, though, is in mathematical reasoning. Just as definitions of words make use of other words, proofs of mathematical theorems often make use of other theorems. For example, if I wanted to prove that there is a real number that is not a solution of any polynomial equation with integer coefficients (i.e., there exists a transcendental number), the shortest path might be to make use of two theorems due to Cantor, the first being that the set of polynomial equations with integer coefficients has the same size as the set of natural numbers, and the second being that the set of real numbers is strictly larger than the set of natural numbers. Since these theorems have already been proven, we are free to use them in our proof of the existence of transcendental numbers.

Carelessness about circularity can lead to problems here, though. Suppose we have “proved” three theorems (Theorem A, Theorem B, and Theorem C), but suppose also that our proof of Theorem A depends on the truth of Theorem B, our proof of Theorem B depends on the truth of  Theorem C, and our proof of Theorem C depends on the truth of Theorem A. Have we really proven anything? Well, no, we haven’t. In fact, if you allow for such circular reasoning, it’s easy to prove all sorts of obviously false statements (and I encourage you to try!).

How does mathematics avoid this problem and actually get started proving things? Exactly by the second method outlined above for potentially removing circular definitions from language. Just as we specified certain words as not needing definitions there, here we specify certain mathematical statements as being true without needing proof. These statements are known as axioms; they are accepted as being true, and they serve as the foundation on which mathematical theories are built.

Statements can be chosen as axioms for a number of reasons. Often they are statements that are either seen as obviously true or definitionally true. Sometimes long experience with a particular mathematical field leads to recognition that adopting a certain statement as an axiom is particularly useful or leads to particularly interesting consequences. Different sets of axioms are adopted by different mathematicians depending on their goals and area of study. But they are always necessary to make mathematical progress.

A proper discussion of axioms would fill up many books, so let us end this post here. In our next entry in our Circles series, we will look at the important mathematical idea of well-foundedness and how it allows us to develop mathematical definitions or arguments that at first glance may appear circular but are in fact perfectly valid.


Cover image: “Etretat in the Rain” by Claude Monet

Pythagoras’ Table of Opposites

On March 14 (aka Pi Day, aka International Day of Mathematics, aka Save a Spider Day), the New York Times reposted a 2012 piece about infinity by Natalie Angier (originally published on New Year’s Eve, aka Leap Second Time Adjustment Day, aka Make Up Your Mind Day). For obvious reasons, I thought the essay might be of interest to readers of this blog. It doesn’t delve particularly deeply into any one facet of infinity, but offers a nice overview of some views of the infinite through history.

One thing to which I was introduced by Angier’s piece is Pythagoras’ Table of Opposites, which collects ten pairs of contrasting qualities, placing those viewed positively by the Pythagoreans on the left and those viewed negatively on the right (which is sort of ironic given the fourth pair of opposites in the table below). Here is (a translation of) a version of the Pythagorean Table of Opposites found in the work of Aristotle:

Finite | Infinite
Odd | Even
One | Many
Right | Left
Male | Female
Rest | Motion
Straight | Curved
Light | Darkness
Good | Evil
Square | Oblong

I’m not going to write too much about this, as I’m certainly not an expert in ancient Greek philosophy, but let me just make a few observations:

  • It’s no surprise that “infinite” was placed on the negative side of the table, given the Pythagoreans’ antipathy to irrational numbers and the accompanying infinities. We have written previously about mathematicians’ growing acceptance of infinity, especially in the last 150 years, though there are still a few mathematicians who would rather do away with the concept of infinity in mathematics.
  • The pairs “One/Many” and “Rest/Motion” play central roles in the paradoxes of Zeno of Elea, covered in this previous post.
  • The association of “right” with “good” and “left” with “bad” is certainly not unique to the ancient Greeks, showing up in a number of other societies throughout the world. It even persists in slightly hidden form in modern etymology: the Latin word for “right” is “dexter”, from which the English words “dextrous” and “dexterity” derive, while the Latin word for “left” is “sinister”, from which, naturally, the English word “sinister” derives. For a deeper and more rigorous account of “left” and “right” and other pairs of opposites in ancient Greek philosophy, see G.E.R. Lloyd’s paper, “Right and Left in Greek Philosophy”.

Cover Image: “PH-929” by Clyfford Still

Circles 0

Life is a full circle, widening until it joins the circle motions of the infinite.

-Anaïs Nin

No shape has captured the human imagination quite like the circle has. Its perfect symmetry and constant radius stand in contrast to the messy variability of our everyday lives. We have inner circles, vicious circles, fairy circles, crop circles, family circles, circles of influence, the circle of life. Circles permeate our conception of time, as they provide the shape of our clocks, and when things return to their original configuration, we say that they have come full circle.

We have seen intimate connections between circles and infinity in many previous posts. The circle is the one-point compactification of the infinite straight line, which itself can be thought of as a circle with infinite radius. The process of circle inversion provides a useful duality between the finite region inside a circle and the infinite region outside. The Poincaré disk model provides an elegant finite setting for a decidedly infinite instantiation of hyperbolic geometry.

In the next few posts, we will be exploring circles more deliberately, through lenses of mathematics, philosophy, linguistics, art, and literature. We will think about circular definitions and closed timelike curves, about causality and the Liar Paradox.

These are for future weeks, though. For today, to lay the foundations and whet the appetite, please enjoy these three essential pieces of circle-related culture:

Although popularly every one called a Circle is deemed a Circle, yet among the better educated Classes it is known that no Circle is really a Circle, but only a Polygon with a very large number of very small sides. As the number of the sides increases, a Polygon approximates to a Circle; and, when the number is very great indeed, say for example three or four hundred, it is extremely difficult for the most delicate touch to feel any polygonal angles. Let me say rather, it would be difficult: for, as I have shown above, Recognition by Feeling is unknown among the highest society, and to feel a circle would be considered a most audacious insult. This habit of abstention from Feeling in the best society enables a Circle the more easily to sustain the veil of mystery in which, from his earliest years, he is wont to enwrap the exact nature of his Perimeter or Circumference. Three feet being the average Perimeter, it follows that, in a Polygon of three hundred sides, each side will be no more than the tenth part of an inch; and in a Polygon of six or seven hundred sides the sides are little larger than the diameter of a Spaceland pin-head. It is always assumed, by courtesy, that the Chief Circle for the time being has ten thousand sides.

Edwin A. Abbott, Flatland: A Romance of Many Dimensions

The Subtle Art of Go, or Finite Simulations of Infinite Games

Tatta hito-ban to
uchihajimeta wa
sakujitsu nari

Saying `just one game’
they began to play . . .
That was yesterday.

-Senryū (trans. William Pinckard)

A few weeks ago, I found myself in Virginia’s finest book store and made a delightful discovery: a newly published translation of A Short Treatise Inviting the Reader to Discover the Subtle Art of Go, by Pierre Lusson, Georges Perec, and Jacques Roubaud (two mathematicians and two poets, all associates of the French literary workshop Oulipo), originally published in France in 1969.

Go, of course, is a notoriously difficult and abstract board game, invented in China over 2500 years ago and further developed in Korea and Japan. After millennia of being largely unknown outside of East Asia, it has in the last century become popular throughout the world (the publication of this book played a significant role in introducing it to France) and has even been in the news recently, as computer programs using neural networks have defeated some of the best professional go players in highly publicized matches.

The stated goal of Lusson, Perec, and Roubaud’s book is to “[provide], in a clear, complete, and precise manner, the rules of the game of GO” and to “heighten interest in this game.” As a practical manual on the rules and basic tactics and strategy of the game, the modern reader can do much better with other books. As a literary meditation on play, on art, and on infinity, it is dazzling. It is this latter aspect of the book that I want to touch on here today.


A theme running throughout the book is the idea that the practice of go is akin to a journey into infinity, and this theme is expressed both with respect to one’s relationship with other players and with one’s relationship to the game itself.

A joy of learning any game is developing relationships and rivalries with other players, and this is especially true with go, for two main reasons. First, an individual match is not simply won or lost but rather is won or lost by a certain number of points. Second, there is a robust handicapping system whereby a substantially weaker player can legitimately compete with a stronger player, in a match of interest to both players, by placing a specific number of pieces on specific points of the board before the first move. Through these two mechanisms, a rich and rewarding go relationship can thus develop, even between players of unequal skill, over not just one match but a progression of matches, during which handicaps can be finely calibrated and can, indeed, change over time, as the players learn more about each other’s play and about the game in general.

As such, GO, at its limits, constitutes the best finite simulation of an infinite game. The two adversaries battle on the Goban the way cyclists pursue each other in a velodrome.

This is not as much the case with, say, chess, in which the result of a game is purely binary and handicap systems are clumsier, altering the character of the game more substantially; matches between players of unequal skill are therefore frequently frustrating for the weaker player and boring for the stronger.


It is a cliché to say that one can never truly master a subject, that there is always more to learn. But the richness of go makes it especially true here, and, in a sense, quantifiably so. The number of legal positions in go is more than 2 \times 10^{170}. This is a truly astounding number, dwarfing the estimated number of atoms in the observable universe (10^{80}) or the estimated number of legal positions in chess (a piddling 10^{43}). The possibilities in go are, for all intents and purposes, infinite. No matter how much one learns, one knows essentially nothing.

Crucially, though, there is a well-developed hierarchy through which one progresses and by which one can measure one’s skill, even if it remains practically zero when taking a wider view. Lusson, Perec, and Roubaud write about this better than I could in the following two excerpts, so let us simply listen to them:

The genius of GO stems precisely from what it hides as well as what it reveals, at any moment, at any level, in its different, hierarchized mysteries whose progressive mastery transforms the game every time:

A garden of bifurcating pathways, a labyrinth, the game of Babel, each step forward is decisive and each step forward is useless: we will never have finished learning…

(Note the surely intentional nod to Borges in the last sentence above).

From a beginner to a classified amateur in the bottom ranks of kyu, a player can rise to the top kyus and then, one by one, climb the six amateur echelons, approaching (but only approaching) the inaccessible regions where the true players reign, the professionals…

In this last excerpt, we hear echoes of a mathematical concept from set theory, my personal area of expertise. The authors temper the reader’s (and their own) dreams of go mastery by noting that, no matter how high an amateur go player may ascend in their study of go, they will still never reach the “inaccessible” realm of the true masters of the game. These professionals also inhabit a hierarchy, but it is a separate hierarchy, visible but unreachable from below.

This calls to mind the concept of an inaccessible cardinal, which is an uncountable cardinal number \kappa that cannot be reached from below through the standard procedures of climbing to the next cardinal, applying cardinal exponentiation, or taking unions of a small number of small sets. (More formally, \kappa is (strongly) inaccessible if it is regular, uncountable, and, for all \lambda < \kappa, we have 2^\lambda < \kappa.)

It cannot be proven that inaccessible cardinals exist or are even consistent, and the assumption of the consistency of such cardinals has significant implications for what one can prove (see a previous post for more information about inaccessible and other “large cardinals”). On the simplest descriptive level, an inaccessible cardinal divides the hierarchy of infinite cardinals into two realms that cannot communicate via the standard operations of arithmetic: those above and those below.

(A modern version of the book would surely posit a third separate region of hierarchies in go: that of the neural networks that with stunning swiftness have become stronger than the strongest professionals.)


So why bother? If one can spend one’s whole life studying go and still hardly understand the game, if one can develop to one’s full potential and still be nowhere close to the level of professional players, let alone the newly ascendant artificial intelligences, then why start at all?

The authors consider this question, but ultimately they reject its premises. The study of go is not worthwhile in spite of the fact that it is an “infinite pathway.” It is worthwhile because of it.

And this clearly has implications outside of go. Why devote much of my life to mathematical research if I can never know more than a minuscule portion of what remains undiscovered? Why write if it is physically impossible to write more than about 10,000,000 words in a life, and if everything I may write is already contained in Borges’ Library of Babel anyway? Perhaps because the best life is a finite simulation of an infinite life.

Only one activity exists to which GO may be reasonably compared.

We will have understood it is writing.


PS: We have had occasion to mention chess and its relation to the infinite in a previous post. One of the joys of A Short Treatise… is the exaggerated contempt expressed by its authors for the game of chess. We end by offering you just a taste:

Good news!

One of the best European players, Zoran Mutabzija, abandoned chess, to which he had devoted himself since the age of four, as soon as someone taught him GO!

In related news.

We just received a detailed report concerning a certain Louis A. caught in the act of robbing a gas station attendant on the Nationale 6. According to the report, whose source and information cannot be called into question, Louis A. is a notorious chess player.

Grandi’s Series and “Creation Ex Nihilo” (Supertasks 3)

The universe works on a math equation

That never even ever really even ends in the end

Infinity spirals out creation

-Modest Mouse, “Never Ending Math Equation”

In our last post, we discussed Thomson’s Lamp, a supertask in which a lamp that is initially off is switched on and then off again infinitely many times over the course of two seconds (the amount of time between successive switches of the lamp gets exponentially smaller and smaller so that these infinitely many switches can fit within two seconds). We can then ask the perplexing question: “At the end of these two seconds, is the lamp on or off?” Either answer seems wrong; it cannot be on, since there is no previous time at which we turned the lamp on and left it on. But by the same token, it cannot be off, since there is no previous time at which we turned the lamp off and left it off. And yet it must be either on or off at the end of the two seconds, in a state that is seemingly uncaused by any prior state.

Today, we’re going to strip away the physical details of this thought experiment and get to its mathematical core. To start, we note that it is common in electrical devices to denote the state of being “on” with the number 1 and the state of being “off” with the number 0.  With this framework, we can think of the act of turning the lamp on as being equivalent to adding 1 to the numerical state of the lamp, and we can think of the act of turning the lamp off as equivalent to subtracting 1 from the numerical state of the lamp. Our act of switching the lamp on and off an infinite number of times can then be reduced to the infinitely long sum

 1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots

and the question, “What is the state of the lamp after two seconds?” is equivalent to the question, “What is the value of this infinite sum?”


Before we actually answer this question (in two different ways!) let’s travel back to 18th century Italy to visit Luigi Guido Grandi, a mathematician and theologian. Grandi was one of the first to study the infinite sum

1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots ,

which is often called Grandi’s series in his honor. (What we are calling an infinite sum here is more formally (and more correctly) called an infinite series.) Grandi made the devious observation that, if one cleverly adds parentheses in different ways to the infinite sum, one can seemingly make it attain two different values. First, if one adds parentheses like this:

 (1 - 1) + (1 - 1) + (1 - 1) + (1 - 1) + \ldots

then the sum becomes:

 0 + 0 + 0 + 0 + \ldots ,

which clearly equals 0. On the other hand, if one adds parentheses like this:

 1 + (-1 + 1) + (-1 + 1) + (-1 + 1) + \ldots

then the sum becomes:

 1 + 0 + 0 + 0 + \ldots ,

which just as clearly equals 1! Depending on one’s perspective, the value of the sum can be either 0 or 1. The lamp can equally well be off or on.

Grandi, as a theologian, makes a somewhat more ambitious claim:

By putting parentheses into the expression 1 – 1 + 1 – 1… in different ways, I can, if I want to, obtain 0 or 1. But then the idea of the creation ex nihilo is perfectly plausible.

Creation ex nihilo, of course, is creation out of nothing. This is certainly the only instance I can think of in which clever parenthesis placement is used to make theological arguments about the creation of the universe.


Grandi’s arguments don’t hold up to modern mathematical scrutiny. (And, to be fair, Grandi didn’t really believe them himself; more on that later.) To say more, we need to say what we might actually mean when we assert that an infinite sum has a value. Very roughly speaking, the standard way of treating infinite sums in modern mathematics, the way that I taught to my calculus students a couple of weeks ago, is as follows. To determine whether the infinite sum

a_0 + a_1 + a_2 + a_3 + \ldots + a_k + \ldots

has a definite value, one considers longer and longer finite initial sums:

 S_0 = a_0 \newline  S_1 = a_0 + a_1 \newline S_2 = a_0 + a_1 + a_2 \newline \ldots \newline S_n = a_0 + a_1 + a_2 + \ldots + a_n

If these initial sums S_n approach a fixed number L as n becomes arbitrarily large, then we say that the value of the sum is in fact L:

a_0 + a_1 + a_2 + a_3 + \ldots + a_k + \ldots = L.
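In the usual limit notation, the defining condition is simply that

\lim_{n \to \infty} S_n = L.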

Otherwise, the value of the infinite sum is left undefined, and we say that the sum diverges.

This leads to a beautiful mathematical theory with some striking results. For example:

1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \ldots + \frac{1}{k^2} + \ldots = \frac{\pi^2}{6}
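A quick numerical sanity check of this famous formula takes only a few lines of Python; the cutoff of 100,000 terms is an arbitrary choice:

```python
import math

partial = sum(1 / k**2 for k in range(1, 100_001))
print(partial)          # approximately 1.64492...
print(math.pi**2 / 6)   # approximately 1.64493...
```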

When we consider Grandi’s series, though, the initial sums alternate between 0 and 1. They never approach a single fixed number, so the value of Grandi’s series is undefined. The lamp is neither on nor off.
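Computing the first several initial sums makes the oscillation plain; a small sketch:

```python
partial = 0
for n in range(8):
    partial += (-1) ** n   # the terms of Grandi's series: 1, -1, 1, -1, ...
    print(f"S_{n} = {partial}")
# Prints S_0 = 1, S_1 = 0, S_2 = 1, S_3 = 0, ... never settling down.
```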


The story isn’t quite over yet, though. As I mentioned earlier, Grandi didn’t actually think that he could change the value of his infinite sum just by placing parentheses in different ways. But he also didn’t think that its value should be left undefined, as we teach our students in calculus class today. Instead, he thought the series should have a definite value, and that that value, in what could be seen as an early anticipation of the idea of quantum superposition, should be 1/2. The lamp is half on and half off.

Grandi was not alone in thinking this, and a number of arguments have been put forward for why 1/2 is the correct value of the series

1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots

Let’s examine a couple of them here. First, Grandi advanced the following story as an explanation: Suppose that two siblings inherit a valuable gem from their parents. They are forbidden from selling the gem, and cutting it in half would ruin its value. So the siblings agree that they will alternate ownership of the gem, exchanging it every New Year’s Day. Assuming this arrangement continues indefinitely, then, from the point of view of the first sibling to take the gem, their ownership of the gem can be represented by the series

1 - 1 + 1 - 1 + 1 - 1 + 1 - 1 + \ldots .

And yet each sibling owns the gem half of the time, so the value of this series should be 1/2.

Next, we consider a more mathematical argument from a young Leibniz. This proceeds via a sequence of equations, each constructed from the previous one. First, he notes that

\frac{1}{2} = 1 - \frac{1}{2}.

Then, substituting the expression 1 - \frac{1}{2} in for \frac{1}{2} on the right side of this equation, he finds that

\frac{1}{2} = 1 - (1 - \frac{1}{2}).

Continuing with similar substitutions, he obtains the equations

\frac{1}{2} = 1 - (1 - (1 - \frac{1}{2})) \newline \frac{1}{2} = 1 - (1 - (1 - (1 - \frac{1}{2}))) \newline \frac{1}{2} = 1 - (1 - (1 - (1 - (1 - \frac{1}{2})))).

Taking this process through infinitely many steps, one should then obtain

\frac{1}{2} = 1 - (1 - (1 - (1 - (1 - (1 - \ldots))))).

And finally, distributing the minus signs through the parentheses, one ends up with

\frac{1}{2} = 1 - 1 + 1 - 1 + 1 - 1 + \ldots ,

which is exactly what we were trying to show.


Neither of these arguments would be considered rigorous or particularly convincing in any mathematics department today, but they can be useful to build intuition. And there is a rigorous framework for infinite sums in which Grandi’s series does equal 1/2. It is known as Cesàro summation, after the late-19th century Italian mathematician Ernesto Cesàro. In Cesàro summation, one doesn’t compute the limit of the finite initial sums S_0, S_1, S_2, \ldots as above. Instead, one computes the limit of the average of the first n initial sums. More precisely, given an infinite sum a_0 + a_1 + a_2 + \ldots and initial sums S_0 = a_0, S_1 = a_0 + a_1, S_2 = a_0 + a_1 + a_2 \ldots, one then defines a sequence A_0, A_1, A_2, \ldots by letting

A_0 = S_0 \newline A_1 = \frac{1}{2}(S_0 + S_1) \newline A_2 = \frac{1}{3}(S_0 + S_1 + S_2) \newline \ldots \newline A_n = \frac{1}{n+1}(S_0 + S_1 + S_2 + \ldots + S_n) .

If these averages A_n approach a fixed number L as n becomes arbitrarily large, then that number L is said to be the value of the infinite sum a_0 + a_1 + a_2 + \ldots .

Cesàro summation can be seen as an extension of the standard theory of infinite sums explained above. If an infinite sum has a value under the standard account, then it still has that value under Cesàro summation. But there are a number of infinite sums that diverge in the standard theory but have defined values under Cesàro summation. Grandi’s series is one of these sums, and, as expected, its Cesàro value is 1/2.
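To close the loop computationally, here is a sketch of Cesàro summation applied to Grandi’s series; the choice of 10,000 terms is arbitrary:

```python
S_n = 0       # the n-th initial sum
total = 0     # S_0 + S_1 + ... + S_n
N = 10_000
for n in range(N):
    S_n += (-1) ** n   # terms of Grandi's series: 1, -1, 1, -1, ...
    total += S_n       # the initial sums alternate 1, 0, 1, 0, ...
print(total / N)       # the Cesàro average: 0.5
```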


Cover Image: “Starfield” by Vija Celmins

Thomson’s Lamp and Determinism (Supertasks II)

In our previous post, we began a discussion of supertasks, or processes with infinitely many steps that are completed in a finite amount of time. Today, we visit the source of the word supertask (or, in its original formulation, super-task), J.F. Thomson’s 1954 paper, “Tasks and Super-tasks”, published in Analysis.

The main thesis of Thomson’s paper is that supertasks are paradoxical, that it is impossible for someone to have completed an infinite sequence of tasks. (He is careful to distinguish between (1) completing an infinite sequence of tasks, which he thinks is impossible and (2) having an infinite number of tasks that one can potentially do. This is essentially the classical distinction between actual and potential infinity that we have discussed here a number of times.)

To illustrate the impossibility of supertasks, Thomson introduces a particular supertask, now known as Thomson’s Lamp. The idea is as follows: Suppose we have a lamp with a button on its base. If the lamp is off, then pressing the button turns it on; if it is on, then pressing the button turns it off. You have probably encountered such a lamp at some point in your life. Now imagine the following sequence of actions. The lamp is initially off. After one second, I press the button to turn it on. After a further 1/2 second, I press the button again to turn it off. After a further 1/4 second, I press the button to turn it on. After a further 1/8 second, I press the button to turn it off. And so on. (As in our last post, we are ignoring the obvious physical impossibilities of this situation, such as the fact that my finger would eventually be moving faster than the speed of light or that the wires in the lamp would become arbitrarily hot.) In the course of 2 seconds, I press the button infinitely many times. Moreover, there is no “last” button press; each press is quickly followed by another. So the question is this: after 2 seconds, is the lamp on or off? Thomson feels that either answer is impossible, and yet one of them must hold:

It cannot be on, because I did not ever turn it on without at once turning it off. It cannot be off, because I did in the first place turn it on, and thereafter I never turned it off without at once turning it on. But the lamp must be either on or off. This is a contradiction.

Thomson concludes that supertasks are impossible. (I am oversimplifying a little bit here; I encourage interested readers to check out Thomson’s original paper, which is fairly short and well-written.) As anyone passingly familiar with the history of philosophy could guess, though, not everybody was convinced by Thomson’s arguments, and Thomson’s paper initiated a small but lively philosophical discussion on supertasks, some of which we may examine in future posts.

Today, though, I don’t want to focus directly on the possibility of supertasks in general or Thomson’s Lamp in particular, but rather to assume that they are possible and think about what this would mean for a topic of great interest in philosophy: determinism.

Roughly speaking, determinism is the assertion that all events are uniquely determined by what has happened before. Given the history of the universe, its future is fixed in stone. The first modern expression of this type of determinism is typically credited to the French mathematician Pierre-Simon Laplace, who imagined the existence of a demon who knows the precise location and momentum of every object in the universe and is therefore able to calculate all future events.

Determinism is a vast subject, and we don’t have nearly enough time to touch on its many connections with topics such as religion, free will, or quantum mechanics. We do have time, however, to show that Thomson’s Lamp, plus a bit of intuition (always dangerous when dealing with infinity, but let’s allow it for the sake of argument), places serious doubt on the truth of determinism.

To see this, let’s suppose that the supertask of Thomson’s Lamp is in fact coherent, and let’s revisit the question: after 2 seconds, is the lamp on or off? I claim that either answer is possible. To make this precise, let’s call our supertask, in which the lamp starts off and we press the button infinitely many times in 2 seconds, Process A. Suppose my claim is wrong and that, at the conclusion of Process A, the lamp is necessarily on (if the lamp is necessarily off, then a symmetric argument will work). Now imagine that Process B is the same as Process A but with the roles of “on” and “off” reversed: the lamp starts on and, over the course of 2 seconds, we press the button infinitely many times. One would expect the final state of Process B to be just the reverse of the final state of Process A, namely that the lamp is necessarily off at the conclusion of Process B. (If this doesn’t seem intuitively obvious to you, try imagining a more symmetric situation in which the switch doesn’t turn the lamp on or off but instead switches the color of the emitted light from red to blue.)

But now suppose that an observer arrives at Process A exactly 1 second late and thus misses the first press of the button. From their perspective, the remainder of the supertask looks exactly like Process B, just sped up by a factor of two. This speeding up intuitively shouldn’t affect the final outcome, so they should rightly expect the lamp to be off at the end of the process, whereas I should rightly expect the lamp to be on, since I know we are completing Process A. We can’t both be correct, so this is a contradiction.

But if, at the end of the supertask, the lamp could consistently either be on or off, then the state of the lamp after two seconds is not uniquely determined by past events. Laplace’s demon cannot predict the behavior of the lamp after two seconds, even if it knows everything about the universe at all times before two seconds have elapsed. In other words, if we accept the coherence of Thomson’s lamp, then determinism must be false!


This was admittedly a very non-rigorous argument, but I hope it provides some food for thought or at least some amusement. Next time, we’ll return to doing some real mathematics and connect ideas of supertasks with the values of infinite series!

Cover image: “L’empire des lumières” by René Magritte

Infinite Collisions, or Something from Nothing (Supertasks I)

Hello! It’s been over a year since my last post; I took some time off as I moved to a new city and a new job and became generally more busy. I missed writing this blog, though, and recently I found myself reading about supertasks, tasks or processes with infinitely many steps but which are completed in a finite amount of time. This made the finitely many things I was doing seem lazy by comparison, so I think I’m going to try updating this site regularly again.

The first supertasks were possibly those considered by Zeno in his paradoxes of motion (which we’ve thought about here before). In the simplest, Achilles is running a 1 kilometer race down a road. To complete this race, though, Achilles must first run the first 1/2 kilometer. He must then run the next 1/4 kilometer, and then the next 1/8 kilometer, and so on. He must seemingly complete an infinite number of tasks just in order to finish the race.

Achilles running a race. (Image by Martin Grandjean, CC BY-SA 4.0)

If this idea leaves you a bit confused, you’re not alone! It seems that the same idea can be applied to any continuous process: the first half of the process must take place, then the next quarter, then the next eighth, and so on. How does anything ever get done? Is every task a supertask?

Supertasks reemerged as a topic of philosophical study in the mid-twentieth century, when J.F. Thomson introduced both the term supertask and a new example of a supertask, now known as Thomson’s Lamp: Imagine you have a lamp with an on/off switch. Now imagine that you switch the lamp on at time t = 0 seconds. After 1/2 second, you switch the lamp off. After a further 1/4 second, you switch the lamp on again. After a further 1/8 second, you switch the lamp off again, and so on. After 1 full second, you have switched the lamp on and off infinitely many times. Now ask yourself: at time t = 1, is the lamp on or off? Is this even a coherent question to ask?
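If it helps to see the schedule of switchings laid out concretely, here is a minimal Python sketch (the code and all names in it are my own, purely for illustration) that generates the press times and the lamp’s state just after each press:

```python
from itertools import islice

def presses():
    """Yield (time of the n-th press, lamp state just after it) for n = 1, 2, ..."""
    t, on = 0.0, False
    while True:
        on = not on          # each press toggles the lamp
        yield t, on
        t = 1 - (1 - t) / 2  # halve the time remaining until t = 1

for time, state in islice(presses(), 6):
    print(f"t = {time:.5f}: lamp {'on' if state else 'off'}")
```

The press times converge to t = 1, but the sequence of states alternates forever and so has no limit. That, in a nutshell, is why the question “on or off at t = 1?” is so troublesome.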

We’ll have more to say about this and other supertasks in future posts, but today I want to present a particularly elegant supertask introduced by Jon Perez Laraudogoitia in the aptly named paper “A Beautiful Supertask”, published in the journal Mind in 1996. Before describing the supertask, let me note that it will obviously not be compatible with our physical universe, or perhaps any coherent physical universe, for a number of reasons: it assumes perfectly elastic collisions with no energy lost (no sound or heat created, for instance), it assumes that particles with a fixed mass can be made arbitrarily small, it assumes that nothing will go horribly wrong if we enclose an infinite amount of matter in a bounded region, and it neglects the effects of gravity. Let’s put aside these objections for today, though, and simply enjoy the thought experiment.

Imagine that we have infinitely many objects, all of the same mass. Call these objects m1, m2, m3, and so on. They are all arranged on a straight line, which we’ll think of as being horizontal. To start, each of these objects is standing still. Object m1 is located 1 meter to the right of a point we will call the origin, object m2 is 1/2 meter to the right of the origin, object m3 is 1/4 meter to the right of the origin, and so on. (Again, we’re ignoring gravity here. Also, so that we may fit these infinitely many objects in a finite amount of space, you can either assume that they are point masses without dimension or you can imagine them as spheres that get progressively smaller (but retain the same mass) as their index increases.) Meanwhile, another object of the same mass, m0, appears to the right of m1, moving to the left at 1 meter per second. At time t = 0 seconds, it collides with m1.

Our system immediately before t = 0.

What happens after time t = 0? Well, since we’re assuming our collisions are perfectly elastic, m0 transfers all of its momentum to m1, so that m0 is motionless 1 meter from the origin and m1 is moving to the left at 1 meter per second. After a further half second, at t = 1/2, m1 collides with m2, so that m1 comes to rest 1/2 meter from the origin while m2 begins moving to the left at 1 meter per second. This continues. At time t = 3/4, m2 collides with m3. At time t = 7/8, m3 collides with m4, and so on.

Where are we after 1 full second? By time t = 1, each object has been hit by the previous object and has in turn collided with the next object. m0 is motionless where m1 started, m1 is motionless where m2 started, m2 is motionless where m3 started, and so on. Our entire system is at rest. Moreover, compare the system {m0, m1, m2, …} at t=1 to the system {m1, m2, m3, …} at t=0. They’re indistinguishable! Our initial system {m1, m2, m3, …} has absorbed the new particle m0 and all of its kinetic energy and emerged entirely unchanged!
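For the skeptical reader, here is a quick Python sketch of the collision schedule, using exact rational arithmetic (the function name and framing are my own, purely for illustration):

```python
from fractions import Fraction

# Object m_i (for i >= 1) starts at rest 2^-(i-1) meters from the origin.
# Equal masses and perfectly elastic collisions mean each impact simply
# hands the 1 m/s leftward velocity on to the next object.

def collision(k):
    """Return (time at which m_{k-1} strikes m_k, where m_{k-1} comes to rest)."""
    t = 1 - Fraction(1, 2 ** (k - 1))          # t = 0, 1/2, 3/4, 7/8, ...
    resting_place = Fraction(1, 2 ** (k - 1))  # m_{k-1} stops where m_k started
    return t, resting_place

for k in range(1, 6):
    t, x = collision(k)
    print(f"m_{k-1} hits m_{k} at t = {t}; m_{k-1} comes to rest at x = {x} m")
```

The collision times converge to t = 1, at which point every object occupies its successor’s old starting position, which is exactly the shift described above.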

Things become seemingly stranger if we reverse time and consider this process backwards. We begin with the system {m0, m1, m2, …}, entirely at rest. But then, at t = -1, for no apparent reason, a chain reaction of collisions begins around the origin (“begins” might not be quite the right term — there’s no first collision setting it all off). The collisions propagate rightward, and at t = 0 the system spits out particle m0, moving to the right at 1 meter per second, while the system {m1, m2, m3, …} is now identical to the system {m0, m1, m2, …} at time t = -1. The ejection of this particle from a previously static system is an inexplicable event. It has no root cause; each collision in the chain was caused by the one before it, ad infinitum. A moving object has seemingly emerged from nothing; kinetic energy has seemingly been created for free.


Cover Image: “4 Spheres” by Victor Vasarely

Common Knowledge

The phrase “common knowledge,” in its common usage, refers to knowledge that is held by (nearly) everyone, at least within some given community. For example, it is common knowledge in modern society that the Earth is round and orbits the sun, or that 2+2=4. These are facts that I can assume that you, my readers, already know or can easily verify, and it’s common practice not to give citations or acknowledge sources for them.

Today, though, we’re not really going to be discussing this common, informal usage of “common knowledge.” Nor are we going to devote this post to facts about the musician behind the classic hip hop album, Resurrection. (Though did you know that the track “I Used to Love H.E.R.” launched a feud with Ice Cube that was eventually settled at a private meeting organized by Louis Farrakhan at his house??) Rather, we’re going to delve into “common knowledge” as a fascinating technical concept in philosophical logic, epistemology, and social science.

Let’s start with the idea of mutual knowledge. Suppose that there is a group of people, and there is a fact that is known by everyone in the group. Then this fact is mutual knowledge among the members of the group. The classification of a fact as mutual knowledge seems like a pretty strong assertion, but we can (and often do) ask for even more. We can ask, for example, that not only does everyone in the group know the fact, but everyone in the group knows that everyone in the group knows the fact. Or we can go further, and ask that everyone in the group knows that everyone in the group knows that everyone in the group knows the fact. Carrying this process to infinity, we obtain the concept of common knowledge.

To be more precise, let us say that a fact is first-order knowledge among members of a group if everyone knows it, i.e., it is mutual knowledge. Let us say it is second-order knowledge if everyone in the group knows that everyone in the group knows it. Continue in this manner. In general, if n is a natural number and we have already defined the notion of n-th order knowledge, then a fact is (n+1)-st order knowledge among members of a group if it is n-th order knowledge and everyone in the group knows that it is n-th order knowledge. A fact is then common knowledge among members of a group if it is n-th order knowledge for every natural number n.
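Before we move on to a more colorful example, here is a small Python sketch of this hierarchy in a toy possible-worlds model (the model, the agents, and all names are invented for illustration). A fact is identified with the set of worlds where it holds, and an agent knows a fact at a world if the fact holds at every world the agent cannot distinguish from it. Since “everyone knows P” is itself a set of worlds, the operator can be iterated:

```python
def knows(indist, P):
    """Worlds at which an agent knows P, where indist maps each world to the
    set of worlds the agent cannot tell apart from it."""
    return {w for w in indist if indist[w] <= P}

def everyone_knows(agents, P):
    """E(P): worlds at which every agent knows P. Iterating E gives
    second-order knowledge, third-order knowledge, and so on."""
    return set.intersection(*(knows(a, P) for a in agents))

# Three worlds; the fact P holds at w0 and w1. Alice can tell w0 apart from
# the others; Bob confuses w0 with w1.
alice = {"w0": {"w0"}, "w1": {"w1", "w2"}, "w2": {"w1", "w2"}}
bob = {"w0": {"w0", "w1"}, "w1": {"w0", "w1"}, "w2": {"w2"}}
P = {"w0", "w1"}

E1 = everyone_knows([alice, bob], P)   # {'w0'}: at w0, P is mutual knowledge
E2 = everyone_knows([alice, bob], E1)  # set(): ...but not second-order knowledge
```

At the world w0, both agents know P, but Bob cannot rule out w1, where Alice does not know P. So P is first-order but not second-order knowledge there — precisely the predicament of the students in the example below.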

Let’s look at an example to try to clarify the distinction between mutual knowledge and common knowledge. Suppose that all of the students in a class have arrived at the classroom early and are sitting at their desks, and there is a large beetle crawling on the wall. Every student sees the beetle, so the presence of the beetle in the classroom is mutual knowledge among the students. None of the students particularly wants to deal with the beetle, though, so each student pretends to ignore it. As far as each student is concerned, they might be the only student who has noticed the beetle. The presence of the beetle is thus not even second-order knowledge among the students. Then the professor walks in the room, sees the beetle, points at it, and loudly exclaims, “Ah, I see we have a beetle auditing the class today!” Now, suddenly, the students know that they all know that there is a beetle in the room, and they know that they all know that they all know that there is a beetle in the room, and so on. The presence of the beetle has suddenly become common knowledge.

In the grand scheme of things, the study of common knowledge, and higher order knowledge in general, is a rather recent development in philosophy; David Hume was perhaps the first major philosopher to consider it, in A Treatise of Human Nature. It has become extensively studied since then, though, and, if one knows where to look, it can be found almost anywhere.

Consider the following example, due to philosopher Alan Hájek. You are in a restaurant, and a server walking by you trips on his shoelaces and spills a bit of soup on your shirt. You are, naturally, upset, and the server becomes apologetic and says, “It was my fault.” This is of course something almost everyone would expect him to say, but let’s look at it more closely. Why exactly did he say it, and why is it important? Everyone involved in the incident already knew that the server was at fault, so this utterance would seem to add no new information. However, it actually can be the key to ensuring that the incident does not escalate further, by establishing the server’s fault as common knowledge. Indeed, before the server’s admission, you might have thought that the server considered you at fault, which surely would have made you even angrier. After the server’s statement, some common ground has been established, and, with luck and maybe a free dessert, the situation can calmly be resolved.

The notion of higher order knowledge also shows up all the time in game theory (and, therefore, economics, international relations, etc.). When formulating an ideal strategy for a game, one must consider not only the current state and rules of the game, but also the knowledge and strategies of all of one’s partners and adversaries. But they, in turn, of course, are considering your knowledge and strategies when formulating their own, and one quickly reaches vertiginous levels of higher order knowledge. If you want to see common knowledge repeatedly raise its many confusing heads, just head to a high stakes poker game. (“I know that she would almost never bluff in this spot. But we’ve been playing together for days, so I also know that she knows that I know this, so maybe she’s actually more likely to bluff. But maybe she knows that I know that she knows that I know this, which means…I don’t know. I call.”)


We end today with a puzzle that has common knowledge at its core: There are two prisoners, Louis and Marie, each in their own prison cell. Each looks out on their own little yard. Louis can see 8 trees in his yard; Marie can see 12. One day, the warden brings them together and tells them both, “Between the two of you, you can see a total of either 18 or 20 trees. Every day at 5 pm, I will go to each of you. If one of you tells me the correct total number of trees in your two yards, you will immediately be set free. If you are incorrect, you will immediately be executed. If you say nothing, you will stay in prison for another day.”

Louis and Marie are not able to directly communicate with one another during this meeting with the warden, and no communication is possible between their two cells. Each was fully aware of the other’s presence during the meeting, though; the contents of the warden’s speech have thus become common knowledge between them.

Q: Assuming Louis and Marie are entirely rational (and that their rationality is common knowledge between them), can they guarantee their release? If so, how? How long will it take?

We’ll talk about the solution beneath this nice painting of trees. Meanwhile, take a few minutes to think about it yourselves.

Gustav Klimt, Pine Forest II. If you can correctly tell me how many trees you see, you will be set free…

At first, it might not seem like the warden’s speech gave any really useful information. Louis now knows that Marie sees either 10 or 12 trees, and Marie now knows that Louis sees either 6 or 8 trees. But it seems that neither can be sure which of the two numbers the other sees.

If we delve deeper into the thought processes of the prisoners, though, and consider some counterfactual situations, a different picture arises. First, suppose that Marie actually saw 19 or 20 trees out her window. Then, on Day 1, Marie could confidently say, “We see 20 trees between us,” because she sees more than 18 trees herself. Therefore, back in the real world, after neither prisoner says anything on Day 1 and both remain in their cells, Louis knows that Marie sees at most 18 trees. You might naturally raise the following objection here: “Of course Louis knows that Marie sees at most 18 trees. He didn’t have to wait until after Day 1 to know this. He already knew this immediately after the warden’s speech, when he knew that Marie sees either 10 or 12 trees!”

But the important point here is not just that Louis knows that Marie sees at most 18 trees, but that this fact is now common knowledge, since it follows immediately from the information given by the warden and by the fact that nobody said anything on Day 1, both of which are themselves common knowledge.

Furthermore, just for the sake of completeness, and at the risk of getting lost in the weeds, let us argue that the fact that Marie sees at most 18 trees was not common knowledge before 5 pm on Day 1. Indeed, Louis knew then that Marie sees 10 or 12 trees. But Marie does not know that Louis knows this. This is because, as far as Marie knows, Louis could see 6 trees, in which case Louis would know that Marie sees either 12 or 14 trees. Therefore, Marie can only know that Louis knows that Marie sees either 10, 12, or 14 trees. But, in turn, Louis cannot know that Marie knows that, since, as far as Louis knows, Marie could see 10 trees, in which case Marie would only know that Louis knows that Marie sees either 8, 10, or 12 trees. It follows that Louis only knows that Marie knows that Louis knows that Marie sees either 8, 10, 12, or 14 trees. And we can continue in this way. The higher order the knowledge becomes, the more uncertainty is introduced into the range, until we reach the statement: Marie knows that Louis knows that Marie knows that Louis knows that Marie knows that Louis knows that Marie knows that Louis knows that Marie sees 4, 6, 8, 10, 12, 14, 16, 18, or 20 trees. Therefore, before 5 pm on Day 1, the knowledge that Marie sees at most 18 trees is not even ninth order knowledge!

I’m sure that was crystal clear. So let’s continue. We’ve established that, after Day 1, it is common knowledge that Marie sees at most 18 trees. Suppose now that it happened to be the case that Louis saw fewer than 2 trees. Then this fact, combined with the knowledge that Marie sees at most 18 trees, would lead Louis to conclude that the prisoners see fewer than 20 trees between them, so they must see exactly 18. In the real world, when Day 2 goes by and the prisoners are still in their cells, it therefore becomes common knowledge that Louis sees at least 2 trees. Again, this was not common knowledge before 5 pm on Day 2.

It should now be clear how to continue. After Day 3 goes by with no change, it becomes common knowledge that Marie sees at most 16 trees. Then, after Day 4, it becomes common knowledge that Louis sees at least 4 trees. After Day 5, Marie sees at most 14 trees. After Day 6, Louis sees at least 6 trees. After Day 7, Marie sees at most 12 trees.

So far, none of this common knowledge is enough for either prisoner to deduce the total number of trees. In fact, neither of the prisoners has learned anything new about the number of trees seen by the other. However, their higher order knowledge has steadily grown. Indeed, the increasing ranges we saw creeping into the prisoners’ higher order knowledge on Day 1 have gradually been winnowed by the passage of time. And this pays off after Day 8, when it becomes common knowledge that Louis sees at least 8 trees. Marie then carefully recounts the trees outside her window, confirms that there are 12, and concludes that, together, they see at least (and therefore exactly) 20 trees. She gives the correct answer on Day 9, and the prisoners are released!
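If you’d rather let a computer grind through the nine days, here is a short Python sketch (the framing and all names are my own) that models the puzzle as iterated elimination of possible worlds: a “world” is a pair of tree counts consistent with the warden’s announcement, and on each day that passes in silence, every world in which someone could have deduced the total becomes common knowledge to be impossible and is removed:

```python
# Worlds consistent with the warden's speech: (louis, marie) with total 18 or 20.
worlds = {(l, t - l) for t in (18, 20) for l in range(t + 1)}

def known_total(agent, world, worlds):
    """The total `agent` can deduce in `world` given the current worlds, or None."""
    i = 0 if agent == "Louis" else 1
    totals = {sum(w) for w in worlds if w[i] == world[i]}
    return totals.pop() if len(totals) == 1 else None

actual, day = (8, 12), 0
while True:
    day += 1
    answers = {a: known_total(a, actual, worlds) for a in ("Louis", "Marie")}
    if any(t is not None for t in answers.values()):
        break
    # 5 pm passes in silence: eliminate every world in which a prisoner
    # could have deduced the total.
    worlds = {w for w in worlds
              if all(known_total(a, w, worlds) is None for a in ("Louis", "Marie"))}

print(day, answers)  # 9 {'Louis': None, 'Marie': 20}
```

Running this confirms the analysis above: on Day 9, Marie, and only Marie, can announce the total of 20.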


There’s another famous puzzle involving common knowledge, initially even more baffling than this one. It is known as the blue-eyed islanders puzzle. It has been written about extensively elsewhere, so let me just point you to one such place, namely, Terence Tao’s blog.


For more reading on common knowledge, see this entry at the Stanford Encyclopedia of Philosophy.


Cover Image: Spock performing a Vulcan mind meld, thereby establishing common knowledge with Dr. Simon Van Gelder.

Playing Games II: The Rules

In an earlier examination of games, we ran into some trouble when Hypergame, a “game” we defined, led to a contradiction. This ended up being a positive development, as the ideas we developed there led us to a (non-contradictory) proof of Cantor’s Theorem, but it indicates that, if we are going to be serious about our study of games, we need to be more careful about our definitions.

So, what is a game? Here’s what Wittgenstein had to say about the question in his famous development of the notion of language games and family resemblances:

66. Consider for example the proceedings that we call “games.” I mean board-games, card-games, ball-games, Olympic games, and so on. What is common to them all? Don’t say: “There must be something common, or they would not be called ‘games’” but look and see whether there is anything common to all. For if you look at them you will not see something that is common to all, but similarities, relationships, and a whole series of them at that. To repeat: don’t think, but look! Look for example at board-games, with their multifarious relationships. Now pass to card-games; here you find many correspondences with the first group, but many common features drop out, and others appear. When we pass next to ball-games, much that is common is retained, but much is lost. Are they all ‘amusing’? Compare chess with noughts and crosses. Or is there always winning and losing, or competition between players? Think of patience. In ball-games there is winning and losing; but when a child throws his ball at the wall and catches it again, this feature has disappeared. Look at the parts played by skill and luck; and at the difference between skill in chess and skill in tennis. Think now of games like ring-a-ring-a-roses; here is the element of amusement, but how many other characteristic features have disappeared! And we can go through the many, many other groups of games in the same way; can see how similarities crop up and disappear. And the result of this examination is: we see a complicated network of similarities overlapping and crisscrossing: sometimes overall similarities, sometimes similarities of detail.

-Ludwig Wittgenstein, Philosophical Investigations

This is perhaps the correct approach to take when studying the notion of “game” as commonly used in the course of life, but that is not what we are doing here. We want to isolate a concrete mathematical notion of game amenable to rigorous analysis, and for this purpose we must be precise. No doubt there will be things that many people consider games that will be left out of our analysis, and perhaps some of our games would not be recognized as such out in the wild, but this is beside the point.


To narrow the scope of our investigation, let us say more about what type of games we are interested in. First, for simplicity, we are interested in two-player games, in which the players play moves one at a time. We are also (for now, at least) interested in games that necessarily end in a finite number of moves (though, for any particular game, there may be no a priori finite upper bound on the number of moves in a run of that game). Finally, we will be interested in games that must end in victory for one of the two players. Our theory can easily be adapted to deal with ties, but allowing them here would just unnecessarily complicate things.

One way to think about a move in a game is as a transformation of the current game into a different one. Consider chess (and, just so it satisfies our constraints, suppose that a classical “tie” counts as a win for black). A typical game of chess starts with all of the pieces in their traditional spots (for simplicity, let’s be agnostic about which color moves first). However, we can consider a slightly different game, called chess_1, that has all of the same rules as chess except that white’s king’s pawn starts on e4, two squares up from its traditional square. This is a perfectly fine game, and white’s opening move of e2-e4 can be seen as a transformation of chess into chess_1.

With this idea in mind, it makes sense to think of a game as two sets of other games: one set is the set of games that one player can transform the game into by making a move, and the other set is the set of games that the other player can transform the game into by making a move. We will refer to our players as Left (L) and Right (R), so a game G can be thought of as a pair (L | R), where L and R are sets of games. This in fact leads to our first rule of games:

First Rule of Games: If L and R are sets of games, then G = (L | R) is a game.

Depending on one’s background assumptions, this rule does not necessarily rule out games with infinite runs, or pathological games like Hypergame. We therefore explicitly forbid such games:

Second Rule of Games: There is no infinite sequence \langle G_i = (L_i | R_i) \mid i \in \mathbb{N} \rangle of games such that, for all i \in \mathbb{N}, G_{i+1} \in L_i \cup R_i.

And that’s it! Now we know what games are…

The skeptics among you may think this is not enough. It may not even be immediately evident that there are any games at all! But there are. Note that the empty set is certainly a set of games (all of its elements are certainly games). Therefore, G = (\emptyset | \emptyset) is a game. It is a boring game in which neither player can make any moves, but it is a game nonetheless. We can now begin to construct more interesting games, like (\{(\emptyset | \emptyset), (\{(\emptyset | \emptyset)\} | \{(\emptyset | \emptyset)\})\} | \{(\emptyset | \emptyset)\}), or chess.
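For those who enjoy seeing such constructions carried out in code, here is a minimal Python sketch of the First Rule (all names are mine): a game is literally a pair of sets of games, and we can build the boring game, and slightly less boring games, out of nothing:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Game:
    left: frozenset   # the games Left can move to
    right: frozenset  # the games Right can move to

# The boring game ( {} | {} ): neither player has a move.
ZERO = Game(frozenset(), frozenset())

# A slightly more interesting game: either player can move to ZERO.
STAR = Game(frozenset({ZERO}), frozenset({ZERO}))
```

Note that any Game we can actually construct this way is assembled bottom-up from the empty set, so the Second Rule comes for free: there is no way to build an infinite descending sequence of such values.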

There’s one crucial aspect of games we haven’t dealt with yet: who wins? We deal with this in the obvious way. Let us suppose that, in an actual run of a game, the players must alternate moves (though a game by itself does not specify who makes the first move). During a run of a game, a player loses if it is their turn to move and they have no moves to make, e.g., the game has reached a position (L | R), it is R’s turn to move, and R = \emptyset.
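Continuing the sketch above, the losing condition is easy to express: a run alternates moves, and whoever must move from an empty set of options loses. Here the moves are chosen at random, purely to illustrate the rule (finding good moves is another matter entirely):

```python
import random

def play(game, mover="L", rng=random.Random(0)):
    """Play one run of `game` with randomly chosen moves; return the winner."""
    options = game.left if mover == "L" else game.right
    if not options:
        # The mover has no moves and loses; the Second Rule guarantees that
        # every run reaches this point after finitely many moves.
        return "R" if mover == "L" else "L"
    return play(rng.choice(list(options)), "R" if mover == "L" else "L", rng)

print(play(ZERO))  # R: Left moves first in this run and immediately loses
print(play(STAR))  # L: Left moves to ZERO, and then Right is stuck
```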


Let us look now at a simple, illustrative example of a game: Nim. A game of Nim starts with a finite number of piles, each containing a finite number of objects. On a player’s move, they choose one of these piles and remove any non-zero number of objects from that pile. The loser is the first player who is unable to remove any objects.

Let us denote games of Nim by finite arrays of numbers, arranged in increasing order. For example, the game of Nim starting with four piles of, respectively, 1,3,5, and 7 objects will be represented by [1,3,5,7]. The trivial game of Nim, consisting of zero piles, and in which the first player to move automatically loses, will be represented by [0].

Let us see that Nim falls into the game framework that we developed above. The trivial game of Nim is clearly equivalent to the trivial game, (\emptyset | \emptyset). We can now identify other games of Nim as members of our framework by induction on, say, the total number of objects involved in the game at the start. Thus, suppose we are trying to identify [1,3,5,7] as a game and we have already succeeded in identifying all instances of Nim with fewer than 16 objects. What instances of Nim can [1,3,5,7] be transformed into by a single move? Well, a player can remove all of the objects from a pile, resulting in [1,3,5], [1,3,7], [1,5,7], or [3,5,7]. Alternatively, they can remove parts of the 3, 5, or 7 piles, resulting in things like [1,3,4,5], [1,1,5,7], etc. All of these Nim instances clearly have fewer than 16 objects, so, if we let X denote the set of Nims that can result after one move of [1,3,5,7], then we have shown that X is a set of games, in the sense of our formal framework. We can therefore define a game (X | X), which is clearly equivalent to [1,3,5,7].
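Here, as a sketch building on the Game class from the earlier snippet (again with invented names), is that inductive identification carried out in Python. Since both players have exactly the same moves available in Nim, each position becomes a game of the form (X | X):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def nim_game(piles):
    """Turn a sorted tuple of pile sizes into a formal Game, by induction on
    the total number of objects. The trivial game of Nim is the empty tuple."""
    options = set()
    for i, pile in enumerate(piles):
        for take in range(1, pile + 1):  # remove any non-zero number of objects
            rest = list(piles[:i] + piles[i + 1:])
            if pile - take > 0:
                rest.append(pile - take)
            options.add(nim_game(tuple(sorted(rest))))
    X = frozenset(options)
    return Game(X, X)

print(nim_game(()) == ZERO)              # True: trivial Nim is ( {} | {} )
print(len(nim_game((1, 3, 5, 7)).left))  # 16: one per position one move away
```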

In the next post, we’ll look at strategies for games. When can we say for sure which player wins a game? How can we derive winning strategies for games? And what does it all mean?


Cover image: Paul Cézanne, “The Card Players”