In a previous installment of our Measuring series, we considered the question of measuring subsets of the natural numbers. We came up with a solution using a non-principal ultrafilter. This is an elegant solution, but it has some drawbacks. One of these is that, assuming we are assigning a measure of 1 to the entire set, it seems natural to want to assign a measure of $\frac{1}{2}$ to both the set of even numbers and the set of odd numbers. However, depending on which of these two sets is in our ultrafilter, the ultrafilter measure will assign a measure of 1 to one of these sets and 0 to the other. We discussed an ad hoc fix to this, but, even then, similar problems arise. The ultrafilter measure is simply not what we want in many instances.

Today, we are going to discuss another approach to thinking about the size of subsets of $\mathbb{N}$, an approach that exploits the fact that it is really easy to measure subsets of a finite set. Our strategy will be to approximate the natural numbers by larger and larger finite sets, which we can easily deal with, and then see if their behavior converges to a limit.

Recall that, given a natural number $n$, we think of $n$ as the set of all smaller natural numbers, i.e. $n = \{0,1,\ldots,n-1\}$. Given a subset $A \subseteq \mathbb{N}$ and a natural number $n$, let

$d_n(A) = \frac{|A \cap n|}{n}$.

$d_n(A)$ thus gives us the measure of $A$ relative to the first $n$ natural numbers. We might hope that, as $n$ gets larger, we get better and better information about the distribution of $A$. This happens precisely when the limit $\lim_{n \rightarrow \infty} d_n(A)$ exists. When it does, we let $d(A) = \lim_{n \rightarrow \infty} d_n(A)$. (For those of you who forget pre-calculus, recall that, intuitively, $\lim_{n \rightarrow \infty} d_n(A) = L$ if, as $n$ gets larger and larger, $d_n(A)$ gets closer and closer to $L$.)

$d(A)$, when it exists, is called the asymptotic density of $A$. Asymptotic density can be a useful way of measuring subsets of $\mathbb{N}$, and it has some nice properties. For example, the following are easily seen.

• $d(\emptyset) = 0$.
• $d(\mathbb{N}) = 1$.
• $d(\mathrm{Evens}) = d(\mathrm{Odds}) = \frac{1}{2}$.

The celebrated Prime Number Theorem states that, asymptotically, $d_n(\mathrm{Primes}) \sim \frac{1}{\log(n)}$. Therefore, since $\log(n)$ gets arbitrarily large as $n$ increases, we immediately get $d(\mathrm{Primes}) = 0$.

Asymptotic density also has the nice feature of being finitely additive. This means that, if $A,B \subseteq \mathbb{N}$, $A \cap B = \emptyset$, and $d(A)$ and $d(B)$ both exist, then $d(A \cup B)$ exists and $d(A \cup B) = d(A) + d(B)$.

A big drawback to asymptotic density, though, is that, in general, there is no guarantee that the limit $\lim_{n \rightarrow \infty} d_n(A)$ will exist. Thus, for many sets $A \subseteq \mathbb{N}$, $d(A)$ is simply undefined. This leads us to consider generalizations of asymptotic density, called upper and lower density, that will be defined for more sets (in fact, for all of them). To do so, we need to review some key facts about real numbers.

Recall that $[0,1]$ denotes the closed interval of real numbers $r$ such that $0 \leq r \leq 1$. If $X \subseteq [0,1]$, then the supremum of $X$, denoted $\sup(X)$, is the least real number $s$ such that, for all $r \in X$, $r \leq s$. Intuitively, $\sup(X)$ is the least real number that bounds $X$ from above. Dually, the infimum of $X$, denoted $\inf(X)$, is the largest real number $s$ such that, for all $r \in X$, $s \leq r$. $\inf(X)$ is thus the largest real number that bounds $X$ from below. It is one of the defining features of the real numbers that suprema and infima of bounded sets always exist.

Now suppose we have a sequence $\langle x_n \mid n \in \mathbb{N} \rangle$, where, for all $n$, $x_n \in \mathbb{R}$. The limit $\lim_{n \rightarrow \infty}x_n$ might fail to exist; for example, this will be the case if the sequence oscillates between two different numbers. To help analyze the behavior of bounded sequences without limits, we can consider variations on the limit, known as the limit superior ($\limsup$) and limit inferior ($\liminf$). These are defined as follows:

$\limsup_{n \rightarrow \infty} = \lim_{n \rightarrow \infty}(\sup\{x_m \mid m \geq n\})$

$\liminf_{n \rightarrow \infty} = \lim_{n \rightarrow \infty}(\inf\{x_m \mid m \geq n\})$

Take a minute to think about what these definitions are saying. The following image may be useful.

It should be noted that $\limsup_{n \rightarrow \infty} x_n$ is the limit of a non-increasing sequence and $\liminf_{n \rightarrow \infty} x_n$ is the limit of a non-decreasing sequence, so both of them are guaranteed to exist (if the sequence is not bounded, one or both might be $\pm \infty$).

With this in mind, we can now define the upper density of a set $A \subseteq \mathbb{N}$ to be:

$\overline{d}(A) = \limsup_{n \rightarrow \infty} d_n(A)$

and the lower density to be:

$\underline{d}(A) = \liminf_{n \rightarrow \infty} d_n(A)$.

Upper and lower density have the nice feature that they are defined for all subsets of natural numbers. In addition, it is easily seen that, if $A \subseteq \mathbb{N}$ and $d(A)$ exists, then $\overline{d}(A) = d(A) = \underline{d}(A)$. Upper and lower density can be useful ways of quantifying the “size” of subsets of $\mathbb{N}$. In particular, as we will see in our next post, the statement $\overline{d}(A) > 0$ can in many contexts be interpreted as stating that $A$ is a “non-negligible” subset of $\mathbb{N}$.

However, upper and lower density suffer from a fatal flaw that prevents them from being considered to be true “measures” of subsets of $\mathbb{N}$: they are not even finitely additive. I leave you for now to explore this phenomenon on your own; solutions to the following two exercises and further investigation of density will come next week.

Exercise 1: Find subsets $A,B \subseteq \mathbb{N}$ such that:

• $A \cap B = \emptyset$;
• $A \cup B = \mathbb{N}$;
• $\overline{d}(A) = \overline{d}(B) = 1$;
• $\underline{d}(A) = \underline{d}(B) = 0$.

Exercise 2: Show that, if $n \in \mathbb{N}$ and $A_1, A_2, \ldots, A_n$ are subsets of $\mathbb{N}$ such that $A_1 \cup A_2 \cup \ldots \cup A_n = \mathbb{N}$, then there is $k \leq n$ such that $\overline{d}(A_k) > 0$.

Cover image: Wallace Bournes by Gerhard Richter