An interesting result that demonstrates, among other things, the ubiquity of in mathematics is that the probability that two random positive integers are relatively prime is
. A more revealing way to write this number is
, where
is the Riemann zeta function. A few weeks ago this result came up on math.SE in the following form: if you are standing at the origin in and there is an infinitely thin tree placed at every integer lattice point, then
is the proportion of the lattice points that you can see. In this post I’d like to explain why this “should” be true. This will give me a chance to blog about some material from another math.SE answer of mine which I’ve been meaning to get to, and along the way we’ll reach several other interesting destinations.
Densities and measures
First of all, what do we mean by a “random” integer? It’s not hard to see that there is no uniform probability distribution on the positive integers. The strongest statement that is meant when somebody says that a positive integer has a probability of of being in some subset
is that the natural density
is equal to . For example, by this definition the probability that a positive integer is divisible by
is
, as one would expect. Natural density, however, has drawbacks: it is not defined for every subset of
, and it does not form a probability measure in the measure-theoretic sense. In particular, it’s not countably additive, since each positive integer has measure
but their union has measure
.
If we want an honest probability measure on , we need to choose a sequence of non-negative real numbers
such that
. Since we’re trying to answer a number-theoretic question about relatively prime integers, we should pick the
to have nice number-theoretic properties. One property that’s natural to ask for, as above, is that the probability that an integer is divisible by
should be
for any positive integer
. In other words, we want
for all . This implies, among other things, that the events that an integer is divisible by one of two relatively prime positive integers are independent.
Unfortunately, this is impossible. For any fixed positive integer , it’s clear that
is at most the probability that a positive integer is not divisible by
for any prime
. The probability that a positive integer is not divisible by
is
, and since these events are independent, the probability that a positive integer is not divisible by the primes not dividing
is
which diverges to zero, so ; contradiction.
Here’s one explanation for why this condition can’t work which I think is revealing. One way to justify requiring that the probability that a positive integer is divisible by is
is that this is true for all finite quotients of
; that is, in
it’s true that for all positive integers
, exactly
of the residues are congruent to
. In other words, when we imposed the above requirement, we were secretly looking for a probability distribution compatible with the counting measure on the finite quotients of
. Whenever you want to talk about the finite quotients of a group
, there’s a universal object that lets you talk about all of them at once: the profinite completion
, defined as the limit of the diagram of the finite quotients of
equipped with the obvious morphisms (that is, if
are two subgroups of finite index such that
, then there is an obvious morphism
). The profinite completion
comes equipped with a natural map
which is not always injective and with natural projections
for every finite quotient of
. It also comes with the initial topology making these projections continuous, and with this topology it is Hausdorff, compact, and totally disconnected. In addition, the image of
is dense.
Example. The profinite completion of is a gadget
called the profinite integers. By the Chinese Remainder Theorem, it is isomorphic to the direct product
. Profinite groups often appear as Galois groups, and in this case
naturally occurs as the absolute Galois group of a finite field.
Example. The profinite completion of is the trivial group because
has no finite quotients! This shows that the natural map from a group into its profinite completion need not be injective.
Since is compact Hausdorff, it comes equipped with a canonical (say, left) Haar measure (the one which assigns the entire group measure
). This is important because it describes a probability measure on
. For profinite groups the Haar measure is particularly easy to describe. Any Borel measure on a topological space is determined by what it does to a basis of open sets, and for profinite groups
such a basis is given by the partitions determined by every projection
. The kernel of such a morphism is a subgroup of finite index, and by invariance all of its cosets must have the same measure as it, so they must all have measure
. In particular, the (normalized) Haar measure on
is the unique one which assigns measure
to the kernel of the projection
.
This is precisely the property we wanted from a probability distribution on . But now the bitter truth hits us: in
, the subgroup
has measure zero! One way to see this is that the identity is in the kernel of every projection, so it has measure less than
for every
, hence measure zero. And by invariance, every element of
has measure zero (in other words, there are no atoms), hence
also has measure zero. One can also repeat the first proof given above in this context, but the point is that we are free to use invariance here because it was already guaranteed by the requirement that the measure be compatible with projections to the finite quotients. So we are totally defeated by this requirement, which is too strong.
(On the other hand, if we are ever in a situation where we want to pick random integers, we should perhaps think about picking random profinite integers instead because we now know there is a perfectly reasonable way to do this.)
The zeta distributions
What now? Well, the Chinese Remainder Theorem suggests that it is still natural to require that the events of being divisible by two relatively prime integers should be independent. This is equivalent to choosing the different exponents in the prime factorization independently. So how do we choose the exponent of a particular prime
?
To answer this restricted question, the relevant quotients of this time are the finite quotients
. In such a finite quotient the probability of not being divisible by
is
, the probability of being divisible by
(but not by
) is
, the probability of being divisible by
(but not by
) is
, and so forth. This geometric behavior extends even to the limit of these quotients, which is the
-adic integers
. The
-adic integers also have a canonical normalized Haar measure, and again because of compatibility with the finite quotients
the measure of the kernel (the elements divisible by
) is
, which duplicates the above behavior. So it is tempting to make the exponent of
be equal to
with probability
.
But since the profinite integers are just the direct product of the
, we are just repeating the same argument as above, but locally, so we know this can’t work. (To see this explicitly, once again we can compute the measure of
.) However, it is still natural to use a geometric distribution, for example because it maximizes entropy given a mean: see Keith Conrad’s excellent exposition on the subject. So let’s make the exponent of
equal to
with probability
for some constant
.
With this choice, the measure of a positive integer is now equal to
so we see that the product needs to converge in order for this measure to make sense. Other than that requirement,mhow do we pin down a nice choice of the constants
?
One additional requirement we can impose is that the measure of a positive integer is monotonically decreasing. It turns out (and I will leave this as a nice exercise) that this is possible if and only if for some positive real constant
. Since
also needs to converge, it turns out (again, exercise) that this is possible if and only if
is greater than
. This then gives
where is the zeta function! (This is one way to flesh out the tantalizing statement made by Rota which I quoted in this old blog post, although I don’t know if it’s the precise result that Rota had in mind.) This distribution is the zeta distribution on
with parameter
.
One way to think about the zeta distribution is as the probability distribution coming from viewing as the partition function of a certain statistical-mechanical system, the Riemann gas, whose states are the positive integers
and whose energies are their logarithms
. As with many other cool connections in mathematics, this was discussed by John Baez, who has links to more information, such as this wonderful page. One can think of a positive integer
as describing a multiset of fictitious particles called “primons,” where each primon
occurs with multiplicity the corresponding exponent in the prime factorization of
and energy
. So
is the sum of the energies of all the primons in the state corresponding to
.
The parameter corresponds to inverse temperature, and as
the partition function diverges, which corresponds to the Hagedorn temperature of the system. In this limit each state becomes closer to being equally likely; in other words, we are getting closer and closer to a uniform probability distribution on
, if such a thing existed.
This suggests the following idea. Given a subset , we look at its measure under the zeta distribution
and then we take the high-temperature limit . The resulting number, when it exists, turns out to equal the logarithmic density
of . The logarithmic density is still not a measure on
, but it has the benefit of existing for more sets than the natural density. Even better, if a set
has a natural density, then it also has a logarithmic density, and the two agree. So studying zeta distributions can tell us what the natural density of a subset
has to be, if it exists.
The proof
One reason I was finally inspired to write this post is that the zeta distribution occurred on my measure theory homework. We were asked to prove that the events of being divisible by distinct primes were independent and that this implies the celebrated Euler product formula; note that this is already implicit in the above derivation, where the assumption of independence leads inevitably to the product formula. Then we were asked to prove that the probability that a positive integer is squarefree is equal to . But since a squarefree positive integer is just a product of distinct primes, this is just
as desired. Finally, we were asked to prove that the probability that two positive integers are relatively prime is also equal to . This can be done as follows: the probability that a positive integer is squarefree is the same as the probability that one positive integer
is squarefree and another positive integer
is arbitrary. We want to show that this is the same as the probability that two positive integers
are relatively prime. One way to show this is to exhibit a probability-preserving bijection between the two sets, which can be done as follows: given
squarefree and
arbitrary, send
.
This map is injective because can be recovered as the radical of
. This map sends a squarefree positive integer and an arbitrary positive integer to two relatively prime positive integers, and it preserves probabilities. In the other direction, given
relatively prime, let
denote the product of the primes dividing
and send
This gives a squarefree positive integer and another positive integer, and it preserves probabilities. In addition, it is not hard to verify that the two maps given above are inverses of each other.
This argument is equivalent to the statement that the number of ways to write a positive integer as the product of two relatively prime positive integers is the same as the number of ways to write a positive integer
as the product of a squarefree integer and an arbitrary integer, which is in turn equivalent to a Dirichlet series argument which computes the probabilities above directly. This number is a multiplicative function in
, so it’s enough to verify that the two are equal for prime powers, which is clear. But I thought the above argument was cute.
In any case, it now follows that the “logarithmic probability” that two positive integers are relatively prime is . Hence the “natural probability” that two positive integers are relatively prime, if it exists, is also
. So the mysterious appearance of the zeta function in this result is now hopefully not so mysterious.
Generalizations
The above argument also works on more general zeta functions. For example, let be a Dedekind domain such that, for every ideal
, the quotient
is finite; this is what Pete Clark calls a Dedekind abstract number ring, and it includes the ring of integers in any number field as well as the ring of integers in any finite extension of
. Define the norm of an ideal
to be
. Then we can define the Dedekind zeta function
where the Euler product is taken over prime ideals. When , this is just the Riemann zeta function. When
, every ideal is principal and generated by some polynomial
, and the norm of the corresponding ideal is
where
. Since there are exactly
monic polynomials of degree
, it follows that
where . The corresponding Euler product is described in this previous post.
The Dedekind zeta function of describes, for fixed
such that the sum converges, a probability distribution on the set of ideals of
, where the ideal
occurs with probability
. Since ideals uniquely factor into products of prime ideals, everything that was said above carries over verbatim to any
. (In statistical-mechanical language, the prime ideals are the “primons” out of which states are built, and a prime ideal
has energy
.) This implies, for example, that in the limit as
, the probability that a polynomial in
is squarefree, which is also equal to the probability that two polynomials in
are relatively prime, equals
.
There is a beautiful blog post by Jordan Ellenberg which explains how to get the analogous version of this statement when is replaced by
.
What we have now is pretty funny: it is a probabilistic algorithm to compute for any
for which it is possible to explicitly construct random(ish) elements. In fact, if “squarefree” is replaced by “
-power free” for some
then we have a probabilistic algorithm to compute
for any positive integer
. One might imagine an alien civilization discovering zeta functions this way.
In this generality, the construction of the profinite completion of still makes sense, although it should be taken as a ring instead of just a group, where we take the limit over all finite quotient rings
. It is closely related to the adele ring of
when
is the ring of integers in a finite extension of a global field, and its Haar measure is very important in the modern approach to algebraic number theory, beginning with Tate’s thesis. I’m not qualified to describe how this works in any more detail than that, unfortunately.
This whole discussion generalizes even further to dynamical zeta functions, although I’m not entirely sure what to make of it. It gives a probability distribution on the space of multisets of orbits in a dynamical system whose primons are the orbits, and where the energy of an orbit is the number of points it contains.
Energy
Recall that if is the partition function of a statistical-mechanical system, then the negative logarithmic derivative
describes the average or expected energy of the system. For the Riemann zeta function, this gives
.
which is the Dirichlet series of the von Mangoldt function, and which figures in some proofs of the prime number theorem. Intuitively, this sum describes the contribution to the average energy of the system from each type of “primon” in the Riemann gas; the expected contribution to the energy from primon is
times the expected number of
-primons. One way to see why this expected number should be equal to
is to use the identity
which is valid for any non-negative integer-valued random variable . (Note that one can run this argument in reverse to deduce the form of the Riemann zeta function from the average energy function; in particular, the Euler product is now a consequence of linearity of expectation!)
Now let be a dynamical system; that is, let
be a set and
a function. We require, as before, that for every
the number
of fixed points of
is finite. Then the energy corresponding to the dynamical zeta function
,
where the product runs over all orbits of and
is the size of an orbit, is equal to
.
The RHS follows from our computation of the average energy by summing over the average contribution from each primon, and the LHS is equal to the RHS because any element of corresponds to some number of copies of the same orbit (that is, primon). This is, to me, an appealing way to think about the equivalence between the exponential and product forms of the dynamical zeta function. Previously I only had a purely combinatorial way of thinking about it, but the energy computation is a nice alternate perspective, especially as it applies to the case where
is a variety over
and
is the Frobenius map.
[…] Comments Alison on Zeta functions, statistical me…S.Chandrasekhar on Zeta functions, statistical me…Zeta functions, stat… on Walks on […]
A very nice post!
It reminds me of an idea I came up with when I was in middle school and was never quite able to rigorize. I was thinking about the product prod_(p prime) (1-1/p) and it occurred to me: (1-1/p) is the probability that a random natural number n is not divisible by p. Take the product over all primes: prod_p(1-1/p) is the probability that n is divisible by no primes. But of the infinitely many natural numbers, the only one that is divisible by no primes is 1! So this probability is 0.
When I grew up and learned some analysis, I found myself frustrated that I couldn’t make this proof rigorous, eg by using asymptotic density. (Although I did see hints of the idea in at least one proof I saw of some number theory fact.) At the same time I found myself a little bit miffed at zeta functions, which seemed messy and not very well motivated. So thank you for making clear to me how to make more sense of that intuition, and how this naturally leads to zeta functions.
Nice post QY.