Ultrafilters VI.b: Hyperreals

In a recent post, we considered the history of the infinitesimal in calculus, from its central and controversial role in the original formulations of Newton and Leibniz to its banishment in the 19th century through the introduction of the limit. Today, we move forward to the mid-20th century, when the use of infinitesimals in calculus was finally put on a rigorous (and elegant) foundation by Abraham Robinson and his development of what came to be known as non-standard analysis.

The basic idea is as follows: first, we will construct an extension of the standard real numbers, which will be called the “hyperreals.” The hyperreals will contain all of the standard real numbers but will also contain infinitesimal and infinite elements. Then, to perform basic calculus computations (such as taking derivatives), we will make use of the infinitesimals in the hyperreals and then translate the result back to the familiar world of the standard real numbers.

Let us now construct the hyperreals, which we denote by $\mathbb{R}^*$. To do this, let $\mathcal{U}$ be any non-principal ultrafilter on $\mathbb{N}$. Let $X$ be the set of all functions from $\mathbb{N}$ to $\mathbb{R}$. If $f$ and $g$ are elements of $X$, then we say $f =_{\mathcal{U}} g$ if $f$ and $g$ take the same values on a large set, or, to be more precise, if $\{n \in \mathbb{N} \mid f(n) = g(n)\} \in \mathcal{U}$.

Speaking somewhat imprecisely, the hyperreals are the members of $X$, where we say that two elements $f,g \in X$ are in fact the same hyperreal if $f =_\mathcal{U} g$. (Speaking more precisely for those familiar with the terminology, the hyperreals are the equivalence classes of $X$ modulo the equivalence relation $=_{\mathcal{U}}$.) We want to think of the hyperreals as extending the standard reals, $\mathbb{R}$, so, if $r \in \mathbb{R}$, then we identify $r$ with the constant function $f_r:\mathbb{N} \rightarrow \mathbb{R}$, defined so that $f_r(n) = r$ for all $n \in \mathbb{N}$.

We also want to be able to do everything with $\mathbb{R}^*$ that we can do with $\mathbb{R}$. For example, we want to be able to say if one hyperreal is less than another, and we want to be able to add, subtract, multiply, and divide hyperreals. We do this as follows.

First, for $f,g \in X$, we say $f <_{\mathcal{U}} g$ if $\{n \in \mathbb{N} \mid f(n) < g(n)\} \in \mathcal{U}$. We similarly define $f \leq_{\mathcal{U}} g$, $f \geq_{\mathcal{U}} g$, and $f >_{\mathcal{U}} g$. Since $\mathcal{U}$ is an ultrafilter, if $f,g \in X$, then exactly one of the following three statements is true:

• $f <_{\mathcal{U}} g$
• $f =_{\mathcal{U}} g$
• $f >_{\mathcal{U}} g$

Thus, we can order the hyperreals by $\leq_{\mathcal{U}}$ (we haven’t shown that $\leq_\mathcal{U}$ is in fact a linear ordering on the hyperreals; one piece of this will be done below and the rest left to you). Moreover, this ordering extends the usual ordering $\leq$ of the real numbers, since, if $r,s \in \mathbb{R}$ and $r < s$, then clearly $f_r <_{\mathcal{U}} f_s$.

The arithmetic operations are defined pointwise. For example, $f+g$ is defined to be the function from $\mathbb{N}$ to $\mathbb{R}$ such that, for all $n \in \mathbb{N}$, $(f+g)(n) = f(n) + g(n)$. Subtraction, multiplication, and division are defined in the same way. With division, we must be slightly careful with regards to division by zero. The details are unimportant for us, but let me just say that we can define the quotient $f/g$ for any $f,g \in X$ provided that $g \neq_{\mathcal{U}} 0$. In order to check that these operations are in fact well-defined, we would need to show, for instance, that, if $f_0 =_{\mathcal{U}} f_1$ and $g_0 =_{\mathcal{U}} g_1$, then $f_0 + g_0 =_{\mathcal{U}} f_1 + g_1$, and so on. I will leave this to the interested reader.

Here’s another imprecisely stated and important fact about the relationship between $\mathbb{R}$ and $\mathbb{R}^*$: a first-order sentence is true about $\mathbb{R}$ if and only if it is true about $\mathbb{R}^*$. I will not define ‘first-order sentence’ here, but let me illustrate this by an example. We know that $\leq$, the usual ordering of the real numbers, is transitive, i.e. if $r \leq s$ and $s \leq t$, then $r \leq t$ (in fact, transitivity is one of the requirements for being called an ‘ordering’). Let us show that $\leq_{\mathcal{U}}$ is also transitive.

To do this, consider $f,g,h \in X$ such that $f \leq_{\mathcal{U}} g$ and $g \leq_{\mathcal{U}} h$. We must show that $f \leq_{\mathcal{U}} h$. Let $Y_0 = \{n \in \mathbb{N} \mid f(n) \leq g(n)\}$ and $Y_1 = \{n \in \mathbb{N} \mid g(n) \leq h(n)\}$. Since $f \leq_{\mathcal{U}} g$ and $g \leq_{\mathcal{U}} h$, we know that both $Y_0$ and $Y_1$ are in $\mathcal{U}$. Since $\mathcal{U}$ is an ultrafilter, this means that $Y_0 \cap Y_1 \in \mathcal{U}$. If $n \in Y_0 \cap Y_1$, then $f(n) \leq g(n)$ and $g(n) \leq h(n)$, so, by the transitivity of $\leq$, we have $f(n) \leq h(n)$. Let $Y_2 = \{n \in \mathbb{N} \mid f(n) \leq h(n)\}$. We have shown that $Y_0 \cap Y_1 \subseteq Y_2$ and $Y_0 \cap Y_1 \in \mathcal{U}$. Therefore, $Y_2 \in \mathcal{U}$, so $f \leq_{\mathcal{U}} h$.

Here’s a slightly more complex example: ‘for every $x$, if $x \geq 0$, then there is $y$ such that $y^2 = x$.’ This is of course true in $\mathbb{R}$. To show that it is true in $\mathbb{R}^*$, suppose we are given $f$ such that $f \geq_{\mathcal{U}} f_0$. We must find $g$ such that $g^2 =_{\mathcal{U}} f$. Note that, since $f \geq_{\mathcal{U}} f_0$, there is a set $A \in \mathcal{U}$ such that, for every $n \in A$, $f(n) \geq 0$. Thus, by the existence of square roots in $\mathbb{R},$ for every $n \in A$, there is a real number $r_n$ such that $r_n^2 = f(n)$. Define a function $g:\mathbb{N} \rightarrow \mathbb{R}$ by letting $g(n) = r_n$ for $n \in A$ and $g(n) = 0$ for $n \not\in A$. Then, for all $n \in A$, $g(n)^2 = f(n)$, so $g^2 =_{\mathcal{U}} f$, and we have proven that our sentence is true in $\mathbb{R}^*$.

In addition to containing all of $\mathbb{R}$, $\mathbb{R}^*$ also contains many non-standard elements, and among these are two very interesting types of hyperreals. First, $\mathbb{R}^*$ contains infinite elements, i.e. elements that are larger in magnitude than every standard real number. As an example, consider the function $f \in X$ defined by $f(n) = n$ for all $n \in \mathbb{N}$. To see that $f$ is larger than every standard real number, consider a number $r \in \mathbb{R}$, and let $m$ be the least natural number such that $r < m$. Then, for every natural number $n \geq m$, $f(n) > f_r(n)$. Since $\mathcal{U}$ is a non-principal ultrafilter, $\{n \in \mathbb{N} \mid n \geq m\} \in \mathcal{U}$, so $f >_{\mathcal{U}} f_r$. Such an $f$ is a positive infinite element. Analogously, $\mathbb{R}^*$ also contains negative infinite elements.

Second, $\mathbb{R}^*$ contains infinitesimal elements, elements that are non-zero but are smaller in magnitude than any standard real number. An easy way to see this is simply to take the reciprocal of an infinite element: if $f$ is positive and infinite, then $1/f$ (or $f_1/f$, more precisely) is positive and smaller than any positive real number. If $f$ is the infinite element defined above, then $1/f$ is the function $g \in X$ defined by $g(n) = 1/n$ for all $n \in \mathbb{N}$.

An important fact about $\mathbb{R}^*$ is that, although it contains many non-standard elements, every hyperreal is either infinite or infinitely close to a standard real number (where we say that a hyperreal $f$ is infinitely close to a hyperreal $g$ if $f-g$ is either $0$ or infinitesimal). We can therefore define the standard part of a hyperreal $f$ (denoted by $st(f)$) as follows:

• if $f$ is positive and infinite, then $st(f) = \infty$;
• if $f$ is negative and infinite, then $st(f) = -\infty$;
• if $f$ is finite, then $st(f)$ is the unique $r \in \mathbb{R}$ such that $f-f_r$ is $0$ or infinitesimal.

We are now ready to do calculus with infinitesimals! Let us show, for example, how to compute a derivative. Recall that, in its original formulation, the derivative of a function $F$ at a real number $x$ (denoted $F'(x)$) was defined to be:

$\frac{F(x+h)-F(x)}{h}$,

where $h$ is infinitesimal. Now that we have a solid theory of infinitesimals, this is (almost) exactly the formula we will use! The only issue is that the result of this calculation will be a hyperreal, whereas we typically want answers in the standard real numbers. Therefore, we say that $F'(x)$, if it exists, is equal to:

$st(\frac{F(f_x + h) - F(f_x)}{h})$,

where $h$ is an infinitesimal hyperreal. To illustrate this, let’s compute the derivative of $F(x) = x^2$. Let $h$ be an infinitesimal hyperreal (for example, the function given by $h(n) = 1/n$. Then:

$st(\frac{F(f_x+h)-F(f_x)}{h}) =$

$st(\frac{(f_x+h)^2-f_x^2}{h}) =$

$st(\frac{f_x^2 + 2f_xh + h^2 - f_x^2}{h}) =$

$st(\frac{2f_xh+h^2}{h}) =$

$st(2f_x+h)$.

Since $h$ is infinitesimal, we have $st(2f_x+h) = 2x$, so, as expected, $F'(x) = 2x$.

The astute reader will note that the calculation we have done does not differ substantially from the calculation we would have carried out using the limit definition of the derivative. Although non-standard analysis has led to some new technical work, its appeal is in large part historical (setting the original formulations of Newton, Leibniz, and others on a rigorous foundation) and conceptual. Infinitesimals are (to me, at least) just nicer than limits. They seem more intuitively clear; they are more easily graspable by students; they’re more fun. And they’re made possible by the heroes of our story: ultrafilters.

Exercises for the Reader: Infinity in the Real World

I’m at a conference this week and thus will not be writing any new posts. Here are some things to think about, from Pradeep Mutalik and Quanta Magazine, until we return next week with Part B in our series on non-standard analysis.

Ultrafilters VI.a: Ghosts of Departed Quantities

Today, we begin a historical journey from actual infinity to potential infinity and back again. We will part ways with ultrafilters for this first leg; they will get a needed rest before making a spectacular reappearance on the return trip.

In a previous post, we discussed the use of infinitesimals by Galileo and some of his contemporaries and the controversy that they caused in the scientific and religious circles of Europe during the 17th century. At the end of the 17th century, the infinitesimal took an even larger role in mathematics as one of the central players in the calculus of Newton and Leibniz. To illustrate how infinitesimals were used, let us consider the derivative, one of the fundamental concepts of calculus.

Suppose $f:\mathbb{R} \rightarrow \mathbb{R}$ is a function. The derivative of $f$ (which exists, provided $f$ is a sufficiently well-behaved function) measures the rate of change of the value of $f$ with respect to changes in its input. For example, suppose the variable $x$ denotes time and $f(x)$ denotes the position of a car along a road at time $x$. Then the derivative of $f$ at a time $x_0$ (denoted, in Leibniz’s notation, by $\frac{df}{dx}(x_0)$), measures the velocity of the car at time $x_0$.

If $f$ is a linear function (so its graph is a straight line), then the derivative of $f$ is simply the slope of that line. In our example, this would correspond to a situation in which the car was traveling at a constant velocity. But what if $f$ is a more complicated function? In this case, a fruitful approach is to try to approximate the graph of $f$ by straight lines. (This is a common tactic in mathematics: given a complicated structure that you don’t know how to deal with, try to approximate it with simpler things that you do know how to deal with.)

In practice, this might look as follows. To find the derivative of $f$ at time $x$, choose a time $x_1$ close to (but different from) $x$, find the line connecting the points $(x, f(x))$ and $(x_1, f(x_1))$, and find the slope of this line. Doing the algebra, this slope turns out to be:

$\frac{f(x_1) - f(x)}{x_1 - x}$

If we rewrite $x_1$ as $x + h$, then this becomes:

$\frac{f(x + h) - f(x)}{h}$

The situation is illustrated in the following picture.

The smaller $h$ becomes, the closer the slope of the line between $(x,f(x))$ and $(x+h, f(x+h)$ gets to the actual derivative of $f$ at $x$ (which is the slope of the line tangent to the curve at the point $(x, f(x))$. Therefore, one might think, in order to find the true derivative of $f$, we just have to take $h$ to be really small. How small is this? Well, it certainly needs to be smaller in magnitude than every positive real number since, as the picture above illustrates, taking $h$ to be a non-zero real number just gives an approximation (though an arbitrarily good one) to the actual derivative. However, it cannot be equal to $0$, since, if $h=0$, then

$\frac{f(x + h) - f(x)}{h} = \frac{f(x)-f(x)}{h} = \frac{0}{0}$

which is of course undefined. The obvious answer, to those who accept the use of such things, at least, is to let $h$ be a non-zero infinitesimal. This is essentially the approach taken by Newton and Leibniz. (I am of course simplifying things, but this is the general idea).

The formulation of calculus by Newton and Leibniz was a great success and led to many further advances in science and mathematics, but the controversy surrounding the use of infinitesimals did not go away. As we mentioned in the earlier post, indiscriminate use of infinitesimals often led to paradox. There were also attacks on ontological grounds. Opponents of the use of infinitesimals claimed that their proponents wanted things both ways: on the one hand, infinitesimals should have positive magnitude to avoid undefined expressions such as $\frac{0}{0}$; on the other hand, this positive magnitude should be smaller than any real positive magnitude. These two demands seemed contradictory. In addition, there is the observation that, if $h$ is a positive infinitesimal, then $\frac{1}{h}$ would be a positive infinite number, clashing with the orthodoxy of the time that actual infinity should not be present in mathematics.

These objections essentially amounted to an appeal to what came to be known as the Archimedean property as applied to the set of real numbers. Briefly, the Archimedean property states that there are no infinite or infinitesimal elements in a given structure. More precisely (and in the context of the real numbers (it is the same for any ordered field, for example)), the Archimedean property states that, if $x$ and $y$ are any two positive numbers, then there is a natural number $n$ such that $nx > y$. The Archimedean property appeared in Euclid’s Elements and was given its name by mathematician Otto Stolz in the 1880s. Stolz did much work on extensions of the real numbers which do not satisfy the Archimedean property. Interestingly and rather surprisingly, Cantor called this work an “abomination” and published an attempted sketch of a proof of the non-existence of infinitesimals.

The standard real numbers, $\mathbb{R}$, satisfy the Archimedean property. However, if $\mathbb{R}$ is extended to include infinitesimals, then the Archimedean property fails. To see this, let $x$ be a positive infinitesimal, and let $y = 1$. Since $x$ is an infinitesimal, we must have $x < \frac{1}{n}$ for every natural number $n$, so $nx < 1$ for all natural $n$ and thus $x$ and $y$ witness the failure of the Archimedean property.

One of the most prominent opponents of infinitesimals was Bishop Berkeley, who, in 1734, published The Analyst, an attack on the foundations of infinitesimal calculus with the wonderful subtitle, A DISCOURSE Addressed to an Infidel MATHEMATICIAN. WHEREIN It is examined whether the Object, Principles, and Inferences of the modern Analysis are more distinctly conceived, or more evidently deduced, than Religious Mysteries and Points of Faith. In the following memorable passage, Berkeley calls into question the nature and existence of infinitesimals (in what follows, ‘fluxions’ refers to a definition of Newton closely related to the derivative):

And what are these Fluxions? The Velocities of evanescent Increments? And what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?

Newton, Leibniz, and like-minded mathematicians of course defended their use of infinitesimals. Leibniz explicitly justified their use in his Law of Continuity, expressed in a manuscript in 1701:

In any supposed continuous transition, ending in any terminus, it is permissible to institute a general reasoning, in which the final terminus may also be included.

And more plainly in a letter from 1702:

The rules of the finite are found to succeed in the infinite.

The debate over infinitesimals was undoubtedly a good thing for mathematics, as it inspired mathematicians to work to put calculus on rigorous foundations. This was largely done in the 19th century, as the actual infinities of infinitesimals were replaced by the potential infinities of limits, and the definition of the derivative settled into the form familiar to students of calculus ever since:

$\lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h}$

In calculating such a limit, one considers the behavior of $f(x+h)$ as $h$ becomes arbitrarily close, but not equal, to $0$. Importantly, these values of $h$ are all standard real numbers and not infinitesimal. Infinitesimals themselves came to be seen as something like a useful but foundationally suspect fiction, a heuristic that helped lead to the great achievements of calculus but which had properly been discarded upon the rigorous founding of calculus in the language of limits.

This would be a natural ending point for our story. Indeed, this is how the story went for about 100 years, and calculus as taught in most schools today is hardly changed from its 19th century formulation. However, surprises awaited in the mid-20th century, when the use of infinitesimals in calculus was finally put on a rigorous foundation and Newton, Leibniz, and other proponents were, in a sense, vindicated. We’ll have that story in our next installment.

Space-Filling Curves

Today, we take a look at infinite objects that have actually found a use in our finite world. (Or, more accurately but less spectacularly, their finite approximations have found a use.)

Take a look at these two videos from 3Blue1Brown, the first giving some background on and a hypothetical application of space-filling curves, and the second indulging in some space-filling play. Enjoy!

Ultrafilters V: Ramsey Theory

Ramsey theory is an area of the mathematical field of combinatorics. Its slogan could be, “In a large enough structure, order must arise.” It is named after Frank Ramsey, student of Keynes, friend of Wittgenstein, brother of the future Archbishop of Canterbury, who did seminal work in the area before tragically dying at the age of 26.

You may have heard this one. Suppose you’re throwing a party. You don’t really know whether the people you’re thinking of inviting know one another, and you want to make sure that, at the party, there are either three people who mutually know one another or three people who mutually do not know one another (as everyone knows, a party is doomed to be a failure if this doesn’t happen). What is the least number of people you need to invite to make sure this happens?

The answer is 6. We need to prove two things. First, 5 is not enough. To see this, consider the following picture. Suppose each of the dots is a person at your party, and two people are connected with a red edge if they know one another and a blue edge if they do not know one another. If you get unlucky enough to invite exactly such a configuration of five guests, you will not have a group of three that mutually know one another nor a group of three that mutually do not know one another.

We next need to show that 6 is indeed sufficient. I’ll leave this as an exercise for the reader. It’s fun! You should do it.

(What is the least number of people you need to invite to ensure a group of four people who mutually know one another or a group of four people who mutually do not know one another? 18. What about for groups of five? Nobody knows (or, if they do, they haven’t told me), but it’s between 43 and 49.)

This is the standard introductory theorem in Ramsey theory. To state more general results, it will be helpful to have some notation.

Notation: Suppose $X$ is a set and $n$ is a natural number. Then $[X]^n$ denotes the set of all $n$-element subsets of $X$, i.e. $[X]^n = \{Y \subseteq X \mid |Y| = n\}$.

Definition: Suppose $X, Y$ are sets, $n$ is a natural number, and $f:[X]^n \rightarrow Y$. If $H \subseteq X$, then we say $H$ is homogeneous for $f$ if there is a single element $y \in Y$ such that, for all $A \in [H]^n$, $f(A) = y$.

The existence of a large homogeneous set for a given function $f:[X]^n \rightarrow Y$ is an example of a certain amount of order existing in $f$. Therefore, a typical theorem in Ramsey theory may be something like this: If $X$ is sufficiently large with respect to $Y$, $n$ is a certain natural number, and $f$ is any function from $[X]^n$ to $Y$, then there is a large homogeneous set for $f$. Of course, ‘sufficiently large’ and ‘large’ would need to be made precise in the statement of an actual theorem.

With this language, we can restate our party theorem as follows: Suppose $|X| = 6$, $|Y| = 2$, and $f:[X]^2 \rightarrow Y$. Then $f$ has a homogeneous set of size $3$. Moreover, this is no longer true if we just assume $|X| = 5$.

Finite Ramsey theory is a fascinating and rich subject, and I encourage readers to learn more about it. Given the domain of this blog though, we want to consider Ramsey theory on infinite sets. The seminal result in this direction is, naturally, the Infinite Ramsey Theorem, proven by Ramsey in 1930.

Infinite Ramsey Theorem: Suppose $n,k \in \mathbb{N}$ and $f:[\mathbb{N}]^n \rightarrow \{1, \ldots, k\}$. Then there is an infinite homogeneous set $H \subseteq \mathbb{N}$ for $f$.

Proof: If either $n$ or $k$ is equal to $1$, then the theorem is trivial. We prove the theorem for the case $n = 2$ and an arbitrary value of $k$. The general statement is proven in a similar way (with some additional care taken at certain points) by induction on $n$.

Let $\mathcal{U}$ be a non-principal ultrafilter on $\mathbb{N}$. We will consider elements of $[\mathbb{N}]^2$ as ordered pairs $(i,j)$ such that $i < j$. For each $i \in \mathbb{N}$ and $\ell \leq k$, let $A_{i, \ell}$ be the set of $j \in \mathbb{N}$ such that $i < j$ and $f(i,j) = \ell$. Since $\mathcal{U}$ is a non-principal ultrafilter and $\bigcup_{\ell \leq k} A_{i, \ell} = \mathbb{N} \setminus \{0, \ldots, i\}$, we may fix a natural number $\ell_i \leq k$ such that $A_{i, \ell_i} \in \mathcal{U}$. Again using the fact that $\mathcal{U}$ is a non-principal ultrafilter, we may find a set $B \in \mathcal{U}$ and a fixed natural number $\ell^* \leq k$ such that, for all $i \in B$, $i_\ell = i^*$.

We now construct an infinite set $H \subseteq B$ such that, for every $i < j$ with both $i$ and $j$ in $H$, we have $f(i,j) = \ell^*$. We will do this one element at a time, enumerating $H$ in increasing order as $\{i_m \mid m \in \mathbb{N}\}$. To start, let $i_0$ be the least element of $B$. Next, find $i_1 > i_0$ such that $i_1 \in B \cap A_{i_0, \ell^*}$. Since $B$ and $A_{i_0, \ell^*}$ are in $\mathcal{U}$, such an $i_1$ exists and, as $i_1 \in A_{i_0, \ell^*}$, we have $f(i_0, i_1) = \ell^*$. Next, find $i_2 > i_1$ such that $i_2 \in B \cap A_{i_0, \ell^*} \cap A_{i_1, \ell^*}$. As before, such an $i_2$ always exists and $f(i_0, i_2) = f(i_1, i_2) = \ell^*$. Keep going in this fashion. Namely, if $m^* \in \mathbb{N}$ and we have already defined $\{i_m \mid m \leq m^*\}$, then find $i_{m^*+1} > i_{m^*}$ such that $i_{m^*+1} \in B \cap A_{i_0, \ell^*} \cap A_{i_1, \ell^*} \cap \ldots \cap A_{i_{m^*}, \ell^*}$. Since this is an intersection of finitely many sets, all of which are in $\mathcal{U}$, we can always find such an $i_{m^*+1}$ and, for all $m \leq m^*$, we have $f(i_m, i_{m^*+1}) = \ell^*$. Thus, we can continue for infinitely many steps, completing the definition of $H$ and therefore the proof of the theorem.

Remarks: The use of an ultrafilter here is not really necessary but makes the proof a little slicker. It also raises an interesting question related to the notion that Ramsey theory is about finding large homogeneous sets. Recall that ultrafilters can be thought of as specifying which subsets of a given set should be considered to be ‘large.’ In the Infinite Ramsey Theorem, we were able to obtain an infinite homogeneous set, but can we always find a homogeneous set $H \in \mathcal{U}$? This question motivates the following definition.

Definition: Let $\mathcal{U}$ be a non-principal ultrafilter on $\mathbb{N}$. $\mathcal{U}$ is called a Ramsey ultrafilter if, for all natural numbers $n,k$ and every function $f:[\mathbb{N}]^n \rightarrow \{1, \ldots, k\}$, there is a set $H \in \mathcal{U}$ that is homogeneous for $f$.

So our question becomes: are all non-principal ultrafilters on $\mathbb{N}$ Ramsey ultrafilters? The answer to this is: “No.” There are always non-principal ultrafilters on $\mathbb{N}$ that are not Ramsey ultrafilters.

So the next question, of course, is: are there any Ramsey ultrafilters? This turns out to be the interesting question, and the answer is: “It depends.” The existence of Ramsey ultrafilters is independent of the usual axioms of set theory (the axioms of ZFC). Namely, it is consistent with the axioms of ZFC that Ramsey ultrafilters exist, but it is also consistent with ZFC that Ramsey ultrafilters do not exist.

Infinite Ramsey theory has been in the news lately due to a recent breakthrough in reverse mathematics by Ludovic Patey and Keita Yokoyama. Let me try to briefly (and therefore rather imprecisely) explain their result, with the caveat that I am in no way an expert in reverse mathematics.

As I mentioned, the standard proofs of the Infinite Ramsey Theorem are pretty similar for the cases $n=2$ and $n=3$. This should make a certain amount of sense, since the statements for $n=2$ and $n=3$ are quite similar. However, in another sense, this is somewhat misleading, as the standard proof of the Infinite Ramsey Theorem takes place in the setting of the axioms of set theory (ZFC), and these axioms may be much more powerful than necessary. Reverse mathematics attempts to classify theorems by exactly how much one needs to assume (beyond some fixed, very weak base system (known as $\mathrm{RCA}_0$)) to prove them. A hierarchy of systems extending $\mathrm{RCA}_0$ has been uncovered through the attempt to pin down the logical strengths of various theorems. An interesting dividing line in the set of these systems is between those that are ‘finitistic’ (essentially, those that do not rely on the existence of infinite sets) and those that are ‘infinitistic.’ The strongest finitistic system in widespread use is ‘primitive recursive arithmetic,’ or ‘PRA.’ A theorem is thus called ‘finitistically reducible’ if it is finitistically reducible to PRA (roughly, if it can be proven without the use of infinite sets).

It was known already that the instance of the Infinite Ramsey Theorem with $n=3$ and $k=2$ is not finitistically reducible. This makes a certain naive, intuitive sense, as the conclusion of the theorem asserts the existence of an infinite homogeneous set. Patey and Yokoyama’s surprising result is that the instance of the Infinite Ramsey Theorem with $n=2$ and $k=2$ is finitistically reducible. Unlike the infinite homogeneous set generated by the $n=3$ case, that generated by the $n=2$ case is, in a sense, simple enough that any consequence of its existence can be obtained purely finitistically.

For those of you who want to read more about this interesting result, there’s a decent article about it in Quanta Magazine, and Patey and Yokoyama’s paper can be found here.