Common Knowledge

The phrase “common knowledge,” in its common usage, refers to knowledge that is held by (nearly) everyone, at least within some given community. For example, it is common knowledge in modern society that the Earth is round and orbits the sun, or that 2+2=4. These are facts that I can assume that you, my readers, already know or can easily verify, and it’s common practice not to give citations or acknowledge sources for them.

Today, though, we’re not really going to be discussing this common, informal usage of “common knowledge.” Nor are we going to devote this post to facts about the musician behind the classic hip hop album, Resurrection. (Though did you know that the track “I Used to Love H.E.R.” launched a feud with Ice Cube that was eventually settled at a private meeting organized by Louis Farrakhan at his house??) Rather, we’re going to delve into “common knowledge” as a fascinating technical concept in philosophical logic, epistemology, and social science.

Let’s start with the idea of mutual knowledge. Suppose that there is a group of people, and there is a fact that is known by everyone in the group. Then this fact is mutual knowledge among the members of the group. The classification of a fact as mutual knowledge seems like a pretty strong assertion, but we can (and often do) ask for even more. We can ask, for example, that not only does everyone in the group know the fact, but everyone in the group knows that everyone in the group knows the fact. Or we can go further, and ask that everyone in the group knows that everyone in the group knows that everyone in the group knows the fact. Carrying this process to infinity, we obtain the concept of common knowledge.

To be more precise, let us say that a fact is first-order knowledge among members of a group if everyone knows it, i.e., it is mutual knowledge. Let us say it is second-order knowledge if everyone in the group knows that everyone in the group knows it. Continue in this manner. In general, if n is a natural number and we have already defined the notion of n-th order knowledge, then a fact is (n+1)-st order knowledge among members of a group if it is n-th order knowledge and everyone in the group knows that it is n-th order knowledge. A fact is then common knowledge among members of a group if it is n-th order knowledge for every natural number n.

Let’s look at an example to try to clarify the distinction between mutual knowledge and common knowledge. Suppose that all of the students in a class have arrived to a classroom early and are sitting at their desks, and there is a large beetle crawling on the wall. Every student sees the beetle, so the presence of the beetle in the classroom is mutual knowledge among the students. None of the students particularly wants to deal with the beetle, though, so each student pretends to ignore it. As far as each student is concerned, they might be the only student who has noticed the beetle. The presence of the beetle is thus not even second-order knowledge among the students. Then the professor walks in the room, sees the beetle, points at it, and loudly exclaims, “Ah, I see we have a beetle auditing the class today!” Now, suddenly, the students know that they all know that there is a beetle in the room, and they know that they all know that they all know that there is a beetle in the room, and so on. The presence of the beetle has suddenly become common knowledge.

In the grand scheme of things, the study of common knowledge, and higher order knowledge in general, is a rather recent development in philosophy; David Hume was perhaps the first major philosopher to consider it, in A Treatise of Human Nature. It has become extensively studied since then, though, and, if one knows where to look, it can be found almost anywhere.

Consider the following example, due to philosopher Alan Hájek. You are in a restaurant, and a server walking by you trips on his shoelaces and spills a bit of soup on your shirt. You are, naturally, upset, and the server becomes apologetic and says, “It was my fault.” This is of course something almost everyone would expect him to say, but let’s look at it more closely. Why exactly did he say it, and why is it important? Everyone involved in the incident already knew that the server was at fault, so this utterance would seem to add no new information. However, it actually can be the key to ensuring that the incident does not escalate further, by establishing the server’s fault as common knowledge. Indeed, before the server’s admission, you might have thought that the server considered you at fault, which surely would have made you even angrier. After the server’s statement, some common ground has been established, and, with luck and maybe a free dessert, the situation can calmly be resolved.

The notion of higher order knowledge also shows up all the time in game theory (and, therefore, economics, international relations, etc.). When formulating an ideal strategy for a game, one must consider not only the current state and rules of the game, but also the knowledge and strategies of all of one’s partners and adversaries. But they, in turn, of course, are considering your knowledge and strategies when formulating their own, and one quickly reaches vertiginous levels of higher order knowledge. If you want to see common knowledge repeatedly raise its many confusing heads, just head to a high stakes poker game. (“I know that she would almost never bluff in this spot. But we’ve been playing together for days, so I also know that she knows that I know this, so maybe she’s actually more likely to bluff. But maybe she knows that I know that she knows that I know this, which means…I don’t know. I call.”)

We end today with a puzzle that has common knowledge at its core: There are two prisoners, Louis and Marie, each in their own prison cell. Each looks out on their own little yard. Louis can see 8 trees in his yard; Marie can see 12. One day, the warden brings them together and tells them both, “Between the two of you, you can see a total of either 18 or 20 trees. Every day at 5 pm, I will go to each of you. If one of you tells me the correct total number of trees in your two yards, you will immediately be set free. If you are incorrect, you will immediately be executed. If you say nothing, you will stay in prison for another day.”

Louis and Marie are not able to directly communicate with one another during this meeting with the warden, and no communication is possible between their two cells. Each was fully aware of themselves during the meeting, though; the contents of the warden’s speech have become common knowledge between them.

Q: Assuming Louis and Marie are entirely rational (and that their rationality is common knowledge between them) can they guarantee their release? If so, how? How long will it take?

We’ll talk about the solution beneath this nice painting of trees. Meanwhile, take a few minutes to think about it yourselves.

Gustav Klimt, Pine Forest II. If you can correctly tell me how many trees you see, you will be set free…

At first, it might not seem like the warden’s speech gave any really useful information. Louis now knows that Marie sees either 10 or 12 trees, and Marie now knows that Louis sees either 6 or 8 trees. But it seems that neither can be sure which of the two numbers the other sees.

If we delve deeper into the thought processes of the prisoners, though, and consider some counterfactual situations, a different picture arises. First, suppose that Marie actually saw 19 or 20 trees out her window. Then, on Day 1, Marie could confidently say, “We see 20 trees between us,” because she see more than 18 trees herself. Therefore, back in the real world, after neither prisoner says anything on Day 1, and the prisoners therefore remain in their cells, Louis knows that Marie sees at most 18 trees. You might naturally raise the following objection here: “Of course Louis knows that Marie sees at most 18 trees. He didn’t have to wait until after Day 1 to know this. He already knew this immediately after the warden’s speech, when he knew that Marie sees either 10 or 12 trees!”

But the important point here is not just that Louis knows that Marie sees at most 18 trees, but that this fact is now common knowledge, since it follows immediately from the information given by the warden and by the fact that nobody said anything on Day 1, both of which are themselves common knowledge.

Furthermore, just for the sake of completeness, and at the risk of getting lost in the weeds, let us argue that the fact that Marie sees at most 18 trees was not common knowledge before 5 pm on Day 1. Indeed, Louis knew then that Marie sees 10 or 12 trees. Put Marie does not know that Louis knows this. This is because, as far as Marie knows, Louis could see 6 trees, in which case Louis would know that Marie sees either 12 or 14 trees. Therefore, Marie can only know that Louis knows that Marie sees either 10, 12, or 14 trees. But, in turn, Louis cannot know that Marie knows that, since, as far as Louis knows, Marie could see 10 trees, in which case Marie would only know that Louis knows that Marie sees either 8, 10, or 12 trees. It follows that Louis only knows that Marie knows that Louis knows that Marie sees either 8, 10, 12, or 14 trees. And we can continue in this way. The higher order the knowledge becomes, the more uncertainty is introduced into the range, until we reach the statement: Marie knows that Louis knows that Marie knows that Louis knows that Marie knows that Louis knows that Marie knows that Louis knows that Marie sees 4, 6, 8, 10, 12, 14, 16, 18, or 20 trees. Therefore, before 5 pm on Day 1, the knowledge that Marie sees at most 18 trees is not even ninth order knowledge!

I’m sure that was crystal clear. So let’s continue. We’ve established that, after Day 1, it is common knowledge that Marie sees at most 18 trees. Suppose now that it happened to be the case that Louis saw fewer than 2 trees. Then this fact, combined with the knowledge that Marie sees at most 18 trees, would lead Louis to conclude that the prisoners see fewer than 20 trees between them, so they must see exactly 18. In the real world, when Day 2 goes by and the prisoners are still in their cells, it therefore becomes common knowledge that Louis sees at least 2 trees. Again, this was not common knowledge before 5 pm on Day 2.

It should now be clear how to continue. After Day 3 goes by with no change, it becomes common knowledge that Marie sees at most 16 trees. Then, after Day 4, it becomes common knowledge that Louis sees at least 4 trees. After Day 5, Marie sees at most 14 trees. After Day 6, Louis sees at least 6 trees. After Day 7, Marie sees at most 12 trees.

So far, none of this common knowledge is enough for either prisoner to deduce the total number of trees. In fact, neither of the prisoners has learned anything new about the number of trees seen by the other. However, their higher order knowledge has steadily grown. Indeed, the increasing ranges we saw creeping in to the prisoners’ higher order knowledge on Day 1 have gradually been winnowed by the passage of time. And this pays off after Day 8, when it becomes common knowledge that Louis sees at least 8 trees. Marie then carefully recounts the trees outside her window, confirms that it is 12, and concludes that, together, they see at least (and therefore exactly) 20 trees. She gives the correct answer on Day 9, and the prisoners are released!

There’s another famous puzzle involving common knowledge, initially even more baffling than this one. It is known as the blue-eyed islanders puzzle. It has been written about extensively elsewhere, so let me just point you to one such place, namely, Terence Tao’s blog.

For more reading on common knowledge, see this entry at the Stanford Encyclopedia of Philosophy.

Cover Image: Spock performing a Vulcan mind meld, thereby establishing common knowledge with Dr. Simon Van Gelder.


Non-Euclidean Geometry and a Goldfish

We’ll be back, probably next week, with a new post about common knowledge. Today, though, a couple of links.

First, coming off of our recent posts about non-Euclidean geometry, a delightful 1970s BBC program on the subject:

Second, a poem, by Sara Baume and published at Granta, about a goldfish.



Life on the Poincaré Disk

Just at this time I left Caen, where I was then living, to go on a geological excursion under the auspices of the school of mines. The changes of travel made me forget my mathematical work. Having reached Coutances, we entered an omnibus to go some place or other. At the moment when I put my foot on the step the idea came to me, without anything in my former thoughts seeming to have paved the way for it, that the transformations I had used to define the Fuchsian functions were identical with those of  non-Euclidean geometry. I did not verify the idea; I should not have had time, as, upon taking my seat in the omnibus, I went on with a conversation already commenced, but I felt a perfect certainty. On my return to Caen, for conscience’ sake I verified the result at my leisure.

-Henri Poincaré, Science and Method

You’re out for a walk one day, contemplating the world, and you suddenly have an out-of-body experience, your perspective floating high above your corporeal self. As you rise, everything seems perfectly normal at first, but, when you reach a sufficient altitude, you notice something strange: your body appears to be at the center of a perfect circle, beyond which there is simply…nothing!

You watch yourself walk towards the edge of the circle. It initially looks like you will reach the edge in a surprisingly short amount of time, but, as you continue watching, you notice yourself getting smaller and slowing down. By the time you are halfway to the edge, you are moving at only 3/4 of your original speed. When you are 3/4 of the way to the edge, you are moving at only 7/16 of your original speed. Maybe you will never reach the edge after all? What is happening?

At some point, you see your physical self notice some friends, standing some distance away in the circle. You wave to one another, and your friends beckon you over. You start walking toward them, but, strangely, you walk in what looks not to be a straight line but rather an arc, curving in towards the center of the circle before curving outward again to meet your friends. And, equally curiously, your friends don’t appear to be surprised or annoyed by your seemingly inefficient route. You puzzle things over for a few seconds before having a moment of insight. ‘Oh!’ you think. ‘My physical body is living on a Poincaré disk model for hyperbolic geometry, which my mind has somehow transcended during this out-of-body experience. Of course!”

The Poincaré disk model, which was actually put forth by Eugenio Beltrami, is one of the first and, to my mind, most elegant models of non-Euclidean geometry. Recall from our previous post that a Euclidean geometry is a geometry satisfying Euclid’s five postulates. The first four of these postulates are simple and self-evident. The fifth, known as the Parallel Postulate (recall also that two lines are parallel if they do not intersect), is unsatisfyingly complex and non-immediate. To refresh our memories, here is an equivalent form of the Parallel Postulate, known as Playfair’s Axiom:

Given any line \ell and any point P not on \ell, there is exactly one line through P that is parallel to \ell.

A non-Euclidean geometry is a geometry that satisfies the first four postulates of Euclid but fails to satisfy the Parallel Postulate. Non-Euclidean geometries began to be seriously investigated in the 19th century; Beltrami, working in the context of Euclidean geometry, was the first to actually produce models of non-Euclidean geometry, thus proving that, supposing Euclidean geometry is consistent, then so is non-Euclidean geometry.

The Poincaré disk model, one of Beltrami’s models, is a model for hyperbolic geometry, in which the Parallel Postulate is replaced by the following statement:

Given any line \ell and any point P not on \ell, there are at least two distinct lines through P that are parallel to \ell.

Points and lines are the basic objects of geometry, so, to describe the Poincaré disk model, we must first describe the set of points and lines of the model. The set of points of the model is the set of points strictly inside a given circle. For concreteness, let us suppose we are working on the Cartesian plane, and let us take the unit circle, i.e., the circle of radius one, centered at the origin, as our given circle. The points in the Poincaré disk model are then the points in the plane whose distances from the origin are strictly less than one.

Lines in the Poincaré disk model (which we will sometimes call hyperbolic lines) are arcs formed by taking one of the following type of objects and intersecting it with the unit disk:

  1. Straight lines (in the Euclidean sense) through the center of the circle.
  2. Circles (in the Euclidean sense) that are perpendicular to the unit circle.

(These can, of course, be seen as two instances of the same thing, if one takes the viewpoint that, in Euclidean space, straight lines are just circles of infinite radius.)

D, D1, and D2 are all lines in the Poincaré disk model. By Jean-Christophe BENOIST, Own work – CC BY 3.0

It’s already pretty easy to see that this geometry satisfies our hyperbolic replacement of the Parallel Postulate. In fact, given a line \ell and a point P not on \ell, there are infinitely many lines through P parallel to \ell. Here’s an illustration of a typical case, with three parallel lines drawn:

Three lines passing through a given point, parallel to a given line. Source: William Barker

We’re not quite able right now to prove that the disk model satisfies the first four of Euclid’s postulates, in part because we haven’t yet specified what it means for two line segments in the model to be be congruent (we don’t, for example, have a notion of distance in our model yet). We’ll get to this in just a minute, but let us first show that our model satisfies the first postulate: Given any two distinct points, there is a line containing both of them.

To this end, let A and B be two points in the disk. If the (Euclidean) line that contains A and B passes through the center of the disk, then this is also a line in the disk model, and we are done. Otherwise, the (Euclidean) line that contains A and B does not pass through the center of the disk. In this case, we use the magic of circle inversion, which we saw in a previous post. Let A' by the result of inverting A across the unit circle. Now A, A', and B are distinct points in the Cartesian plane, so there is a unique circle (call it \gamma) containing all three. Since A and A' are both on the circle, it is perpendicular to the unit circle. Therefore, its intersection with the unit disk is a line in the disk model containing both A and B. Here’s a picture:

print copy
Hyperbolic line containing A and B. Source: Euclid and Beyond by Robin Hartshorne

We turn now to distance in the Poincaré disk model. And here, for the sake of brevity, I’m not even going to try to explain why things are they way they are but will just give you a formula. Given two points A and B in the disk, consider the hyperbolic line containing them, and let P and Q be the points where this line meets the boundary circle (with P closer to A and Q closer to B). Then the hyperbolic distance between A and B is given by:

d(A,B) = \mathrm{ln}(\frac{|PB|\cdot|AQ|}{|PA|\cdot|BQ|}).

This is likely inscrutable right now. That’s fine. Let’s think about what it means for this to be the correct notion of distance, though. For one thing, it means that, given two points in the disk model, the shortest path between them is not, in general, the straight Euclidean line that connects them, but rather the hyperbolic line that connects them. This explains your body’s behavior in the story at the start of this post. When you were walking over to your friends, what appeared to your mind (which was outside the disk, in the Euclidean realm) as a curved arc, and therefore an inefficient path, was in fact a hyperbolic line and, because your body was inside the hyperbolic disk, the shortest path between you and your friends.

This notion of distance also means that distances inside the disk which appear equal to an external Euclidean observer in fact get longer and longer the closer they are to the edge of the disk. This is also consistent with the observations at the beginning of the post: as your body got further toward the edge of the disk, it appeared from an external viewpoint to be moving more and more slowly. From a viewpoint inside the disk, though, it was moving at constant speed and would never reach the edge of the disk, which is infinitely far away. The disk appears bounded from the external Euclidean view, but from within it is entirely unbounded and limitless.

Let’s close by looking at two familiar shapes, interpreted in the hyperbolic disk. First, circles. Recall that a circle is simply the set of points that are some fixed distance away from a given center. Now, what happens when we interpret this definition inside the hyperbolic disk? Perhaps somewhat surprisingly, we get Euclidean circles! (Sort of.) To be more precise, hyperbolic circles in the Poincaré disk model are precisely the Euclidean circles that lie entirely within the disk. (I’m not going to go through the tedious calculations to prove this; I’ll leave that up to you…) Beware, though! The hyperbolic center of the circle is generally different from the Euclidean center. (This should make sense if you think about our distance definition. The hyperbolic center will be further toward the edge of the disk than the Euclidean center, coinciding only if the Euclidean center of the circle is in fact the center of the hyperbolic disk.)

Next, triangles. A triangle is, of course, a polygon with three sides. This definition works perfectly fine in hyperbolic geometry; we simply require that our sides are hyperbolic line segments rather than Euclidean line segments. If we assume the first four of Euclid’s postulates, then the Parallel Postulate is actually equivalent to the statement that the sum of the interior angles of a triangle is 180 degrees. In the Poincaré disk model (and, in fact, in any model of hyperbolic geometry) all triangles have angles that sum to less than 180 degrees. This should be evident if we look at a typical triangle:

A typical triangle in the Poincaré disk model.

Things become interesting when you start to ask how much less than 180 degrees a hyperbolic triangle has. The remarkable fact is that the number of degrees in a hyperbolic triangle is dependent entirely on its (hyperbolic) area! The smaller a triangle is, the larger the sum of its interior angles: as triangles get smaller and smaller, approaching a single point, the sum of their angles approaches 180 degrees from below. Correspondingly, as triangles get larger, the sum of their angles approaches 0 degrees. In fact if we consider an “ideal triangle”, in which the three vertices are in fact points on the bounding circle (and thus not real points in the disk model), then the sum of the angles of this “triangle” is actually 0 degrees!

“Ideal” triangle with interior angles adding to zero.

A consequence of this is the fact that, in the Poincaré disk model, if two triangles are similar, then they are in fact congruent!

This leads us to our final topic: one of the perks of living in a Poincaré disk model. Perhaps the most frequent complaint I hear from people living on a Euclidean plane is that there aren’t enough ways to tile the plane with triangles. Countless people come up to me and say, “Chris, I want to tile the plane with triangles, and I want this tiling to have the following two pleasing properties:

  1. All of the triangles are congruent, they don’t overlap, and they fill the entire plane.
  2. At every vertex of the tiling, all angles meeting that vertex are the same.

But there are only four essentially different ways of doing this, and I’m tired of all of them! What should I do?”

(Exercise for the reader: Find all four such tilings!)

It just so happens that I have a simple answer for these people: “Move to a Poincaré disk model, where there are infinitely many tilings with these properties!” Here are just a few (all by Tamfang and in the public domain):

Right triangles. The smallest, in fact, that can tile the Poincaré disk model.
Larger right triangles.
The largest right “triangles”, each with two “ideal” vertices on the edge of the disk.
The dual to a tiling hidden in Escher’s Circle Limit III, the cover image to this post.
Equilateral triangles.
The largest “triangles”, each with three ideal vertices.

I’ll leave you with that! Hyperbolic geometry is fascinating, and I encourage you to investigate further on your own. The previous mentioned Euclid and Beyond, by Hartshorne, is a nice place to start.

This also wraps up (for now, at least) a couple of multi-part investigations here at Point at Infinity: a look at the interesting geometry of circles, which started in our post on circle inversion, and a look at various notions of independence in mathematics, the other posts being here and here. Join us next time for something new!

Cover Image: M. C. Escher, Circle Limit III

Parallel Lines

Detest it as lewd intercourse, it can deprive you of all your leisure, your health, your rest, and the whole happiness of your life.

Do not try the parallels in that way: I know that way all along. I have measured that bottomless night, and all the light and all the joy of my life went out there.

-Letters from Farkas Bolyai to his son, János, attempting to dissuade him from his investigations of the Parallel Postulate.

In our previous post, cataloging various notions of mathematical independence, we introduced the idea of logical independence. Briefly, given a consistent set of axioms, T, a sentence \varphi is independent from T if it can be neither proven nor disproven from the sentences in T. Today, we discuss one of the most prominent and interesting instances of logical independence: Euclid’s Parallel Postulate.

Among the most famous sets of axioms (top 5, certainly) are Euclid’s postulates, five statements underpinning (together with 23 definitions and five other statements putting forth the properties of equality) the mathematical system of Euclidean geometry set forth in the Elements and still taught in high school classrooms to this day. (We should note here that, from a modern viewpoint, Euclid’s proofs do not always strictly conform to the standards of mathematical rigor, and some of his results rely on methods or assumptions not justified by his five postulates. This has been fixed, for example by Hilbert, who gave a different set of axioms for Euclidean geometry in 1899. Now that we have noted this, we will proceed to forget it for the remainder of the post.)

Euclid’s first four postulates are quite elegant in their simplicity and self-evidence. Reformulated in modern language, they are roughly as follows:

  1. Given any two points, there is a unique line segment connecting them.
  2. Given any line segment, there is a unique line (unbounded in both directions) containing it.
  3. Given any point P and any radius r, there is a unique circle of radius r centered at P.
  4. All right angles are congruent.

The fifth postulate, however, which is known as the Parallel Postulate, is, quite unsatisfyingly, markedly more complicated and less self-evident:

  1. If two lines intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines must intersect one another on that side.

A picture might help illustrate this postulate:

The two indicated angles sum to less than two right angles, so, by the Parallel Postulate, the two lines, if extended far enough, will intersect on that side of the third line. By Harkonnen2, CC BY-SA 3.0

This postulate doesn’t seem to be explicitly about parallel lines, so the reader may be wondering why it is often called the Parallel Postulate. The reason becomes evident, though, when considering the following statement and learning that, in the context of the other four postulates, it is in fact equivalent to the Parallel Postulate:

  1. Given any line \ell and any point P not on \ell, there is exactly one line through P that is parallel to \ell.

(Recall that two lines are parallel if they do not intersect.) This reformulation of the Parallel Postulate is often named Playfair’s Axiom, after the 18th-century Scottish mathematician John Playfair, though it was stated already by Proclus in the 5th century.

The Parallel Postulate was considered undesirably unwieldy and less satisfactory than the other four postulates, even by Euclid himself, who made a point of proving the first 28 results of the Elements without recourse to the Parallel Postulate. The general opinion among mathematicians for the next two millennia was that the Parallel Postulate should not be an axiom but rather a theorem; it should be possible to prove it using just the other four postulates.

Many attempts were made to prove the Parallel Postulate, and many claimed success at this task. Errors were then inevitably discovered by later mathematicians, many of whom subsequently put forth false proofs of their own. The aforementioned Proclus, for example, after pointing out flaws in a purported proof of Ptolemy, gives his own proof, which suffers from two instructive flaws. The first is relatively minor: Proclus assumes a consequence of Archimedes’ Axiom, which essentially states that, given any two line segments, there is a natural number n such that n times the length of the shorter line segment will exceed the length of the longer. (We encountered Archimedes’ Axiom in a previous post, about infinitesimals, which the reader is invited to revisit.) Archimedes’ Axiom seems like an entirely reasonable axiom to assume, but it notably does not follow from Euclid’s postulates.

Proclus’ more serious error, though, is that he makes the assumption that any two parallel lines have a constant distance between them. But this does not follow from the first four postulates. In fact, the statement, “The set of points equidistant from a straight line on one side of it form a straight line,” known as Clavius’ Axiom, is, in the presence of Archimedes’ Axiom and the first four postulates, equivalent to the Parallel Postulate. Proclus’ proof is therefore just a sophisticated instance of begging the question.

In the course of the coming centuries’ attempts to prove the Parallel Postulate, a number of other axioms were unearthed that are, at least in the presence of Archimedes’ Axiom and the first four postulates, equivalent to the Parallel Postulate. In addition to Playfair’s Axiom and Clavius’ Axiom, these include the following:

  • (Clairaut) Rectangles exist. (A rectangle, of course, being a quadrilateral with four right angles.)
  • (Legendre) Given an angle \alpha and a point P in the interior of the angle, there is a line through P that meets both sides of the angle.
  • (Wallis) Given any triangle, there are similar triangles of arbitrarily large size.
  • (Farkas Bolyai) Given any three points, not all lying on the same line, there is a circle passing through all three points.

A key line of investigation into the Parallel Postulate was carried out, probably independently, by Omar Khayyam, an 11th-century Persian mathematician, astronomer, and poet, and by Giovanni Gerolamo Saccheri, an 18th-century Italian Jesuit priest and mathematician. For concreteness, let us consider Saccheri’s account, which has the wonderful title, “Euclid Freed from Every Flaw.”

Saccheri and Khayyam were, similarly to their predecessors, attempting to prove the Parallel Postulate. Their method of proof was contradiction: assume that the Parallel Postulate is false and derive a false statement from it. To do this, they considered figures that came to be known as Khayyam-Saccheri quadrilaterals.

To form a Khayyam-Saccheri quadrilateral, take a line segment (say, BC). Take two line segments of equal length and form perpendiculars, in the same direction, at B and C (forming, say, AB and DC). Now connect the ends of those two line segments with a line segment (AD) to form a quadrilateral. A picture is given below.

A Khayyam-Saccheri quadrilateral. By HR – Own work, CC BY-SA 3.0

By construction, the angles at B and C are right angles, but the angles at A and D are unclear. Saccheri proves that these two angles are equal. He also proves that, if these angles are obtuse, then they are obtuse for every such quadrilateral; if they are right, then they are right for every such quadrilateral; and, if they are acute, then they are acute for every such quadrilateral. This then naturally divides geometries into three categories: those satisfying the Obtuse Hypothesis, those satisfying the Right Hypothesis, and those satisfying the Acute Hypothesis. (These types of geometries subsequently became known as semielliptic, semieuclidean, and semihyperbolic, respectively.)

At this point, Saccheri attempts to prove that the Obtuse Hypothesis and the Acute Hypothesis both lead to contradiction. (Note that a geometry satisfying all five of Euclid’s postulates must satisfy the Right Hypothesis. The converse is not true, so even a successful refutation of the Obtuse and Acute Hypotheses would not be enough to establish the Parallel Postulate.) Saccheri is able to prove (in the presence of Archimedes’ Axiom) that the Obtuse Hypothesis leads to the conclusion that straight lines are finite, thus contradicting the second postulate. He is unable to obtain a logical contradiction from the Acute Hypothesis, though. Instead, he derives a number of counter-intuitive statements from it and then concludes that the Acute Hypothesis must be false because it is “repugnant to the nature of a straight line.”

The next big steps towards the establishment of the independence of the Parallel Postulate were made by Nikolai Lobachevsky and János Bolyai (who fortunately did not heed his father’s letters quoted at the top of this post), 19th-century mathematicians from Russia and Hungary, respectively. (Similar work was probably done by Gauss, as well, though it was never published.) Their work entailed a crucial shift in perspective – rather than attempt to prove the Parallel Postulate from the others, the mathematicians seriously considered the possibility that it is not provable and thought of non-Euclidean geometries (i.e., those failing to satisfy the Parallel Postulate) as legitimate objects of mathematical study in their own right. In particular, they were interested in hyperbolic geometry, in which the Parallel Postulate is replaced by the assertion that, given any line \ell and any point P not on the line, there are at least two distinct lines passing through P and parallel to \ell. (Not surprisingly, considering the nomenclature, hyperbolic geometries are semihyperbolic, i.e., they satisfy the Acute Hypothesis.) This viewpoint was vindicated when, in 1868, Eugenio Beltrami produced a model of hyperbolic geometry. This shows that, as long as Euclidean geometry is consistent, then the Parallel Postulate is independent of the other four postulates: all five postulates are true, for example, in the Cartesian plane, while the first four are true and the Parallel Postulate is false in any model of hyperbolic geometry.

A number of other models for hyperbolic geometry are now known. In our next post, we will look at a particularly elegant one: the Poincaré disk model.

Cover Image: Michael Tompsett, Parallel Lines

For more information on this and many other geometric topics, I highly recommend Robin Hartshorne’s excellent book, Geometry: Euclid and Beyond.


I am no bird; and no net ensnares me: I am a free human being with an independent will.

-Charlotte Brontë, Jane Eyre

An object is independent from others if it is, in some meaningful way, outside of their area of influence. If it has some meaningful measure of self-determination. Independence is important. Nations have gone to war to obtain independence from other nations or empires. Adolescents go through rebellious periods, yearning for independence from parents or other authority figures. Though perhaps less immediately exciting, notions of independence permeate mathematics, as well. Viewed in the right light, they can even be seen as direct analogues of the more familiar notions considered above: in various mathematical structures, there is often a natural way of defining an area of influence of an element or subset of the structure. A different element is then independent of this element or subset if it is outside its area of influence. Such notions have proven to be of central importance in a wide variety of mathematical contexts. Today, in anticipation of some deeper dives in future posts, we take a brief look at a few prominent examples.

Graph Independence: Recall that a graph is a pair G = (V,E), where V is a set of vertices and E is a set of edges between these vertices. If u \in V is a vertex, then its neighborhood is the set of all vertices that are connected to u in the graph, i.e., \{v \in V \mid \{u,v\} \in E\}. One could naturally consider a vertex’s area of influence in the graph G to consist of the vertex itself together with all of its neighbors. With this viewpoint, we can say that a vertex u \in V is independent from a subset A \subseteq V if, for all v \in A, u is not in the neighborhood of v, i.e., u is not in the area of influence of any element of A. Similarly, we may say that a set A \subseteq V of vertices is independent if each element of A is independent from the rest of the elements of A, i.e., if each v \in A is independent from A \setminus \{v\}.

The blue vertices form a (maximum) independent set in this graph. By Life of Riley – Own work, GFDL


Fun Fact: In computer science, there are a number of interesting computational problems involving independent sets in graphs. These problems are often quite difficult; for example, the maximum independent set problem, in which one is given a graph and must produce an independent set of maximum size, is known to be NP-hard.

Linear Independence: Let n be a natural number, and consider the real n-dimensional Euclidean space \mathbb{R}^n, which consists of all n-tuples of real numbers. Given \vec{u} = (u_1,\ldots,u_n) and \vec{v} = (v_1, \ldots, v_n) in \mathbb{R}^n and a real number r \in \mathbb{R}, we can define the elements \vec{u} + \vec{v} and r\vec{u} in \mathbb{R}^n as follows:

\vec{u} + \vec{v} = (u_1 + v_1, \ldots, u_n + v_n)

r\vec{u} = (ru_1, \ldots, ru_n).

(In this way, \mathbb{R}^n becomes what is known as a vector space over \mathbb{R}). Given a subset A \subseteq \mathbb{R}^n, the natural way to think about its linear area of influence is as \mathrm{span}(A), which is equal to all n-tuples which are of the form

r_1\vec{v}_1 + \ldots + r_k\vec{v}_k,

where k is a natural number, r_1, \ldots, r_k are real numbers, and \vec{v}_1, \ldots, \vec{v}_k are elements of A.

In this way, we say that an n-tuple \vec{u} \in \mathbb{R}^n is linearly independent from a set A \subseteq \mathbb{R}^n if \vec{u} is not in \mathrm{span}(A). A set A is linearly independent if each element \vec{u} of A is not in \mathrm{span}(A \setminus \{\vec{u}\}), i.e., if each element of A is linearly independent from the set formed by removing that element from A. It is a nice exercise to show that every linearly independent subset of \mathbb{R}^n has size at most n and is maximal if and only if it has size equal to n.

Fun Fact: Stay tuned until the end of the post!

Thou of an independent mind,
With soul resolv’d, with soul resign’d;
Prepar’d Power’s proudest frown to brave,
Who wilt not be, nor have a slave;
Virtue alone who dost revere,
Thy own reproach alone dost fear—
Approach this shrine, and worship here.

-Robert Burns, “Inscription for an Altar of Independence”

Algebraic Independence: If A is a set of real numbers, then one can say that its algebraic area of influence (over \mathbb{Q}, the set of rational numbers), is the set of all real roots of polynomial equations with coefficients in A \cup \mathbb{Q}, i.e., the set of all real numbers that are solutions to equations of the form:

r_kx^k + \ldots + r_1x + r_0 = 0,

where k is a natural number and r_0, \ldots, r_k are elements of A \cup \mathbb{Q}. With this definition, a real number s is algebraically independent (over \mathbb{Q}) from a set A \subseteq \mathbb{R} if s is not the root of a polynomial equation with coefficients in A \cup \mathbb{Q}. A set A \subseteq \mathbb{R} is algebraically independent (over \mathbb{Q}) if each element s \in A is algebraically independent from A \setminus \{s\}.

Fun Fact: Note that a 1-element set \{s\} is algebraically independent over \mathbb{Q} if and only if it is transcendental, i.e., is not the root of a polynomial with rational coefficients. \pi and e are famously both transcendental numbers, yet it is still unknown whether the 2-element set \{\pi, e\} is algebraically independent over \mathbb{Q}. It is not even known if \pi + e is irrational!

Logical Independence: Let T be a consistent set of axioms, i.e., a set of sentences from which one cannot derive a contradiction. We can say that the logical area of influence of T is the set of sentences that can be proven from T, together with their negations. In other words, it is the set of sentences which, if one takes the sentences in T as axioms, can be proven either true or false. A sentence \phi is then logically independent from T if neither \phi nor its negation can be proven from the sentences in T.

Logical independence is naturally of great importance in the study of the foundations of mathematics. Much of modern set theory, and much of my personal mathematical research, involves statements that are independent from the Zermelo-Fraenkel Axioms with Choice (ZFC), which is a prominent set of axioms for set theory and indeed for all of mathematics. These are statements, then, that in our predominant mathematical framework can neither be proven true nor proven false. The most well-known of these is the Continuum Hypothesis (CH), which, in one of its formulations, is the statement that there are no infinite cardinalities strictly between the cardinality of the set of natural numbers and the cardinality of the set of real numbers. To prove that CH is independent from ZFC, one both produces a mathematical structure that satisfies ZFC and in which CH is true (which Kurt Gödel did in 1938) and produces a mathematical structure that satisfies ZFC and in which CH is false (which Paul Cohen did in 1963). Since Cohen’s result in 1963, a great number of natural mathematical statements have been proven to be independent from ZFC.

In our next post, we will consider a logical independence phenomenon of a somewhat simpler nature: the independence of Euclid’s parallel postulate from Euclid’s four other axioms for plane geometry, which will lead us to considerations of exotic non-Euclidean geometries.

Fun Fact: In the setting of general vector spaces, which generalize the vector spaces \mathbb{R}^n from the above discussion of linear independence, a basis is a linearly independent set whose span (what we referred to as its linear area of influence) is the entire vector space. A basis for \mathbb{R}^n is thus any linearly independent set of size n. Using the Axiom of Choice, one can prove that every vector space has a basis. However, there are models of ZF (i.e., the Zermelo-Fraenkel Axioms without Choice) in which there are (infinite-dimensional) vector spaces without a basis. Thus, the statement, “Every vector space has a basis,” is logically independent from ZF.

Solitude is independence.

-Hermann Hesse, Steppenwolf

Circle Inversion and the Pappus Chain

There is a pledge of the big and of the small in the infinite.

-Dejan Stojanović

In the next two posts, we are going to look at two interesting geometric ideas of the 19th century involving circles. Next time, we will consider Poincaré’s disk model for hyperbolic geometry. Today, though, we immerse ourselves in the universe of inversive geometry.

Consider a circle in the infinite 2-dimensional plane:


This circle divides the plane into two regions: the bounded region inside the circle and the unbounded region outside the circle (let’s say that the points on the circle belong to both regions). A natural thing to want to do, now, especially in the context of this blog, would be to try to exchange these two regions, to map the infinite space outside the circle into the bounded space of the circle, and vice versa, in a “natural” way.

I could be bounded in a nutshell, and count myself a king of infinite space.

-William Shakespeare, Hamlet

Upon first reflection, one might be tempted to say that we want to “reflect” points across the circle. And this is sort of right, but reflection already carries a meaning in geometry. Truly reflecting points across the circle would preserve their distance from the circle, so the inside of the circle could only be mapped onto a finite ring whose outer radius is twice that of the circle two. Moreover, it would not be clear how to reflect points from outside this ring into the circle.

Instead, we want to consider a process known as “inversion.” Briefly speaking, we want to arrange so that points arbitrarily close to the center of the circle get sent to points arbitrarily far away from the center of the circle, and vice versa. For simplicity, let us suppose that the circle is centered at the origin of the plane and has a radius of 1. The most natural way to achieve our aim is to send a point P to a point P' that lies in the same direction from the origin as P and whose distance from the origin is the reciprocal of the distance from P to the origin. Here’s an example:

P and P’ get swapped by inversion.

One can check that, algebraically, this inversion sends a point P with coordinates (x,y) to a point P' with coordinates (\frac{x}{x^2+y^2}, \frac{y}{x^2+y^2}). Points inside the circle are sent to points outside the circle, points outside the circle are sent to points inside the circle, and points on the circle are sent to themselves. Moreover, as one might expect from the name, the inversion map is its own inverse: applying it twice, we end up where we started. Perfect!

Wait a second, though. We’re being a little too hasty. What about the origin? Where is it sent? Our procedure doesn’t seem to tell us, and if we try to use our algebraic expression, we end up dividing by zero. Since the origin is inside the circle, it should certainly be sent to a point outside the circle, but all of those points are already taken. Also, since points arbitrarily close to the origin get mapped to points arbitrarily far from the origin, we want to send the origin to a point as far away from itself as possible. At first glance, we might seem to be in a quandary here, but longtime readers of this blog will see an obvious solution: the origin gets mapped to a point at infinity! (And the point at infinity, in turn, gets mapped to the origin.)

(Technical note: Since we’ve added a point at infinity, the inversion map should be seen not as a map on the plane \mathbb{R}^2, but on its one-point compactification (or Alexandroff compactification), \hat{\mathbb{R}}^2. In fact, the inversion map is a topological homeomorphism of \hat{\mathbb{R}}^2 with itself.)

Let’s examine what the inversion map does to simple geometric objects. We have already seen what happens to points. It should also be obvious that straight lines through the origin get mapped to themselves. For example, in the image above, the line connecting P and P' gets mapped to itself. (Here we are specifying, of course, that every line contains the point at infinity.)

A bit of thought and calculation will convince you that lines not passing through the origin get sent to circles that do pass through the origin.

The red line, when inverted across the black circle, gets sent to the red circle.

Since the inversion map is its own inverse, circles passing through the origin get mapped to lines that don’t pass through the origin. Circles that don’t pass through the origin, on the other hand, get mapped to other circles that don’t pass through the origin.

The red circle on the left is sent to the red circle on the right through inversion.

There’s an important special case of this phenomenon: a circle that is met perpendicularly by the circle through which we are inverting gets mapped to itself.

The red circle is perpendicular to the circle of inversion and is thus sent to itself.

We thus have a sort of duality between lines and circles that has been revealed through the process of circle inversion. Lines, when seen in the right light, are simply circles with an infinite radius. We’re going to move on to some applications of circle inversion in just a sec, but, first, a pretty picture of an inverted checkerboard.

Left: A checkerboard. Right: A checkerboard inverted across a circle centered at the middle of the board with radius equal to the side length of one checkerboard square. (from Mathographics by R. Dixon)

The introduction of the method of circle inversion is widely attributed to the Swiss mathematician Jakob Steiner, who wrote a treatise on the matter in 1824. When combined with the more familiar rigid transformations of rotation, translation, and reflection, the decidedly non-rigid transformation of inversion gives rise to inversive geometry, which became a major topic of study in nineteenth geometry. It was perhaps most notably applied by William Thomson (later to become 1st Baron Kelvin, immortalized in the name of a certain temperature scale), at the age of 21, to solve problems in electrostatics. Circle inversion also allows for extremely elegant proofs of classical geometric facts. We end today’s post with an example.

Consider three half-circles, all tangent to one another and centered on the same horizontal line, with two placed inside the third, as follows:

An arbelos. (Original by Julio Reis, new version by Rubber Duck, CC BY-SA 3.0)

This figure (or, more precisely, the grey region enclosed by the semicircles) is known as an arbelos, and its first known appearance dates back to The Book of Lemmas by Archimedes. A remarkable fact about the arbelos is that, starting with the smallest of the semicircles in the figure, one can nestle into it an infinite sequence of increasingly small circles, each tangent to the two larger semicircles and the circle appearing before it, thus creating the striking Pappus chain, named for Pappus of Alexandria, who investigated the figure in the 3rd century AD:

A Pappus chain. (By Pbroks13, CC BY 3.0)

Let us label the circles in the Pappus chain (starting with the smallest semicircle in the arbelos) \mathcal{C}_0, \mathcal{C}_1, \mathcal{C}_2, etc. (So, in the picture above, P_1 is the center of \mathcal{C}_1, P_2 is the center of \mathcal{C}_2, and so on.) Clearly, the size of \mathcal{C}_n decreases as n increases, but it is natural to ask how quickly it decreases. It is also natural to ask how the position of the point P_n changes as n increases. In particular, what is the height of P_n above the base of the figure? It turns out that the answers to these two questions are closely related, a fact discovered by Pappus through a long and elaborate derivation in Euclidean geometry, and which we will derive quickly and elegantly through circle inversion.

Let d_n denote the diameter of the circle \mathcal{C}_n, and let h_n denote the height of the point P_n above the base of the Pappus chain (i.e., the line segment AB). We will prove the remarkable formula:

For all n \in \mathbb{N}h_n = n \cdot d_n.

For concreteness, let us demonstrate the formula for \mathcal{C}_3. The same argument will work for each of the circles in the Pappus chain. As promised, we are going to use circle inversion. Our first task is to find a suitable circle across which to invert our figure. And that circle, it turns out, will be the circle centered at A and perpendicular to \mathcal{C}_3:

We will invert our figure across the red circle.

Now, what happens when we invert our figure? First, consider the two larger semicircles in the arbelos, with diameters AC and AB. The circles of which these form the upper half pass through the center of our circle of inversion and thus, as discussed above, are mapped to straight lines by our inversion. Moreover, since the centers of these circles lie directly to the right of A, a moment’s thought should convince you that they are mapped to vertical lines.

Now, what happens to the circles in the Pappus chain? Well, none of them pass through A, so they will all get mapped to circles. \mathcal{C}_3 is perpendicular to the circle of inversion, so it gets mapped to itself. But, in the original diagram, \mathcal{C}_3 is tangent to the larger semicircles in the arbelos. Since circle inversion preserves tangency, in the inverted diagram, \mathcal{C}_3 is tangent to the two vertical lines that these semicircles are mapped to. And, of course, the same is true of all of the other circles in the Pappus chain. Finally, note that, since the center of \mathcal{C}_0 lies on the base of the figure, which passes through the center of our inversion circle, it also gets mapped to a point on the base of the figure. Putting this all together, we end up with the following striking figure:

Pappus chain inversion: before and after. (from “Reflections on the Arbelos” by Harold P. Boas)

The circle with diameter AB gets mapped to the vertical line through B', and the circle with diameter AC gets mapped to the vertical line through C'. Our Pappus chain, meanwhile, is transformed by inversion into an infinite tower of circles, all of the same size, bounded by these vertical lines. Moreover, the circle \mathcal{C}_3 and the point P_3 are left in place by the inversion. It is now straightforward to use this tower to calculate the height h_3 of P_3 in terms of the diameter d_3 of \mathcal{C}_3. To get from P_3 down to the base, we must first pass through half of \mathcal{C}_3, which has a height of \frac{d_3}{2}. We then must pass through the image of \mathcal{C}_2 under the inversion, which has a height of d_3. Then the image of \mathcal{C}_1, which also has a height of d_3. And, finally, the image of the smallest semicircle of the arbelos, which has a height of \frac{d_3}{2}. All together, we get:

h_3 = \frac{d_3}{2} + d_3 + d_3 + \frac{d_3}{2} = 3d_3.

Pretty nice!

For further reading on circle inversion, see Harold P. Boas’ excellent article, “Reflections on the Arbelos.”

Cover image: René Magritte, The false mirror