# Measuring V.2: Density and Arithmetic Progressions

In our last post, we introduced the notions of density, upper density, and lower density as means of measuring subsets of natural numbers. We ended with two exercises. We recall them now; solutions are at the end of this post.

Exercise 1: Find subsets $A,B \subseteq \mathbb{N}$ such that:

• $A \cap B = \emptyset$;
• $A \cup B = \mathbb{N}$;
• $\overline{d}(A) = \overline{d}(B) = 1$;
• $\underline{d}(A) = \underline{d}(B) = 0$.

Exercise 2: Show that, if $n \in \mathbb{N}$ and $A_0, \ldots, A_n$ are subsets of $\mathbb{N}$ such that $A_0 \cup A_1 \cup \ldots \cup A_n = \mathbb{N}$, then there is $k \leq n$ such that $\overline{d}(A_k) > 0$.

Before getting to the solutions, though, let’s take an interesting detour through a simple topic that has been the focus of a large amount of groundbreaking mathematics over the last century: arithmetic progressions in natural numbers. The journey will stop at three beautiful theorems and one beautiful conjecture and will involve some of the mathematical giants of the twentieth and twenty-first centuries.

Definition: An arithmetic progression of natural numbers is an increasing sequence of natural numbers such that the difference between successive terms of the sequence is constant. The length of an arithmetic progression is the number of terms in the sequence.

For example, the sequence $\langle 4,7,10,13 \rangle$ is an arithmetic progression of length 4, as the difference between each pair of successive terms is 3. The sequence of even numbers, $\langle 0,2,4,6,\ldots \rangle$, and the sequence of odd numbers, $\langle 1,3,5,7, \ldots \rangle$ are both arithmetic progressions of infinite length.

In an earlier post, we introduced the mathematical field of Ramsey Theory. Roughly speaking, Ramsey Theory studies the phenomenon of order necessarily arising in sufficiently large structures. Ramsey Theory takes its name from Frank Ramsey, who published a seminal paper in 1930. Arguably the first important theorem in Ramsey Theory, though, came three years earlier and is due to the Dutch mathematician Bartel Leendert van der Waerden.

Theorem (van der Waerden, 1927): Suppose that the natural numbers are partitioned into finitely many sets, $A_0, A_1, \ldots, A_n$ (so we have $\bigcup_{i \leq n} A_i = \mathbb{N}$). Then there is an $i \leq n$ such that $A_i$ contains arithmetic progressions of every finite length.

This is naturally seen as a statement in Ramsey Theory: Arithmetic progressions are very orderly things, and Van der Waerden’s Theorem states that, no matter how deviously you divide the natural numbers into finitely many pieces, you cannot avoid the appearance of arbitrarily long arithmetic progressions in one of the pieces. It is also worth noting that Van der Waerden’s Theorem cannot be strengthened to ensure the existence of infinite arithmetic progressions: it is relatively easy to partition the natural numbers into even just two sets such that neither set contains an arithmetic progression of infinite length (try it!).

You might notice a similarity between the statement of Van der Waerden’s Theorem and our Exercise 2: both say that, if the natural numbers are divided into finitely many pieces, at least one of the pieces must be “large” in a certain sense. In Van der Waerden’s Theorem, “large” means containing arbitrarily long arithmetic progressions, while in Exercise 2, it means having positive upper density.

Is there a connection between these two notions of largeness? It turns out that there is! In 1936, Erdős and Turán conjectured that, if $A \subseteq \mathbb{N}$ and $A$ has positive density, then $A$ contains arbitrarily long arithmetic progressions. Almost forty years later, the conjecture was confirmed (and, in fact, strengthened, as it was proven for upper density rather than density) in a celebrated proof by Hungarian mathematician Endre Szemerédi.

Theorem (Szemerédi, 1975): Suppose that $A \subseteq \mathbb{N}$ and $\overline{d}(A) > 0$. Then $A$ contains arithmetic progressions of every finite length.

Note that Szemerédi’s Theorem is a true generalization and strengthening of Van der Waerden’s theorem: when combined with Exercise 2, Szemerédi’s Theorem directly implies Van der Waerden’s Theorem. However, Szemerédi’s Theorem does not give any information about sets of natural numbers with zero upper density. There is one particularly interesting such set, which we encountered in our last post: the set of prime numbers, $P = \{2,3,5,7,11, \ldots \}$.

According to experts, it is likely that it was conjectured as early as 1770, by Lagrange and Waring, that the set of primes contains arbitrarily long arithmetic progressions, and the question experienced renewed interest in the wake of the proof of Szemerédi’s Theorem. An answer finally came in 2004, in a paper by Ben Green and Terence Tao.

Theorem (Green-Tao, 2004): The set of prime numbers contains arithmetic progressions of every finite length.

The story has not ended yet, as many very natural questions about arithmetic progressions remain unsolved. Perhaps the most famous is the following conjecture of Erdős, strengthening his conjecture with Turán that was eventually confirmed by Szemerédi.

Conjecture (Erdős): Suppose $A \subseteq \mathbb{N}$ (with $0 \not\in A$) and $\sum_{n \in A} \frac{1}{n} = \infty$. Then $A$ arithmetic progressions of every finite length.

This conjecture, unlike Szemerédi’s Theorem, would cover the case in which $A$ is the set of all primes, so this conjecture would generalize both Szemerédi’s Theorem and the Green-Tao Theorem. The problem currently carries a cash prize of \$5000 (and, of course, the promise of mathematical fame).

Solution to Exercise 1: Recursively define a sequence $\langle a_n \mid n \in \mathbb{N} \rangle$ by letting $a_0 = 1$, and, given $a_0, \ldots, a_n$, letting $a_{n+1} = (n+1)(a_0 + a_1 + \ldots + a_n)$. The first few terms of the sequence are thus $1,1,4,18,96, \ldots$. Now divide $\mathbb{N}$ into pairwise disjoint blocks $\langle A_n \mid n \in \mathbb{N} \rangle$, arranged one after the other, in which, for all $n \in \mathbb{N}$, $|A_n| = a_n$. The first few blocks are thus as follows:

• $A_0 = \{0\}$;
• $A_1 = \{1\}$;
• $A_2 = \{2,3,4,5\}$;
• $A_3 = \{6,7,\ldots,23\}$;
• $A_4 = \{24,25,\ldots,119\}$.

Now let $A$ be the union of all of the even-numbered blocks, i.e. $A = \bigcup_{n \in \mathbb{N}}A_{2n}$, and let $B$ be the union of all of the odd-numbered blocks.

We clearly have $A \cap B = \emptyset$ and $A \cup B = \mathbb{N}$. We next show $\overline{d}(A) = 1$. Recall that $\overline{d}(A) = \limsup_{n \rightarrow \infty} \frac{|A \cap n|}{n}$. Given $n \in \mathbb{N}$, let $m_n$ be the largest element of $A_{2n}$. Think about the quantity $\frac{|A \cap (m_n+1)|}{m_n+1}$. Since $A_{2n} \subseteq A$, we clearly have $|A \cap (m_n+1)| \geq |A_{2n}| = a_{2n}$. We also have $m_n+1 = |A_0| + |A_1| + \ldots + |A_{2n}| = a_0 + a_1 + \ldots + a_{2n}$. But this yields:

$\frac{|A \cap (m_n+1)|}{m_n+1} \geq \frac{a_{2n}}{a_0 + \ldots + a_{2n}} = \frac{(2n)(a_0 + \ldots + a_{2n-1})}{a_0 + \ldots + a_{2n-1} + (2n)(a_0 + \ldots + a_{2n-1})} = \frac{2n}{2n+1}$.

As $n$ gets arbitrarily large, $\frac{2n}{2n+1}$ gets arbitrarily close to $1$, so we in fact get $\overline{d}(A) = 1$. The proofs of $\overline{d}(B) = 1$ and $\underline{d}(A) = \underline{d}(B) = 0$ are essentially the same and are thus left to the reader.

Solution to Exercise 2: We will in fact show something stronger: there is $k \leq n$ such that $\overline{d}(A_k) \geq \frac{1}{n+1}$. Suppose for sake of a contradiction that this is not the case. Then, by the definition of upper density, for each $k \leq n$ we can find $m_k \in \mathbb{N}$ such that, for all natural numbers $m \geq m_k$, $\frac{|A_k \cap m|}{m} < \frac{1}{n+1}$. Now we can choose a natural number $m^*$ large enough so that, for all $k \leq n$, $m_k \leq m^*$.

For each $k \leq n$, by our choice of $m_k$, we know that $\frac{|A_k \cap m^*|}{m^*} < \frac{1}{n+1}$. Also, by the fact that $A_0 \cup \ldots \cup A_n = \mathbb{N}$, we have that $|A_0 \cap m^*| + \ldots + |A_n \cap m^*| \geq m^*$. But this yields:

$1 = \frac{m^*}{m^*} \leq \frac{|A_0 \cap m^*| + \ldots + |A_n \cap m^*|}{m^*} = \frac{|A_0 \cap m^*|}{m^*} + \ldots + \frac{|A_n \cap m^*|}{m^*} <$

$<\frac{1}{n+1} + \ldots + \frac{1}{n+1} = (n+1)\frac{1}{n+1} = 1$.

We have thus shown $1 < 1$. This is, of course, a contradiction, which finishes the exercise.

Cover Image: Vir Heroicus Sublimis by Barnett Newman