Posts Tagged ‘partition functions’

The principle of maximum entropy asserts that when trying to determine an unknown probability distribution (for example, the distribution of possible results that occur when you toss a possibly unfair die), you should pick the distribution with maximum entropy consistent with your knowledge.

The goal of this post is to derive the principle of maximum entropy in the special case of probability distributions over finite sets from

We’ll do this by deriving an arguably more fundamental principle of maximum relative entropy using only Bayes’ theorem.


Read Full Post »

The previous post on noncommutative probability was too long to leave much room for examples of random algebras. In this post we will describe all finite-dimensional random algebras with faithful states and all states on them. This will lead, in particular, to a derivation of the Born rule from statistical mechanics. We will then give a mathematical description of wave function collapse as taking a conditional expectation.


Read Full Post »

In the previous post we described the following result characterizing the zeta distribution.

Theorem: Let a_n = \mathbb{P}(X = n) be a probability distribution on \mathbb{N}. Suppose that the exponents in the prime factorization of n are chosen independently and according to a geometric distribution, and further suppose that a_n is monotonically decreasing. Then a_n = \frac{1}{\zeta(s)} \left( \frac{1}{n^s} \right) for some real s > 1.

I have been thinking about the first condition, and I no longer like it. At least, I don’t like how I arrived at it. Here is a better way to conceptualize it: given that n | X, the probability distribution on \frac{X}{n} should be the same as the original distribution on X. By Bayes’ theorem, this is equivalent to the condition that

\displaystyle \frac{a_{mn}}{a_n + a_{2n} + a_{3n} + ...} = \frac{a_m}{a_1 + a_2 + ...}

which in turn is equivalent to the condition that

\displaystyle \frac{a_{mn}}{a_m} = \frac{a_n + a_{2n} + a_{3n} + ...}{a_1 + a_2 + a_3 + ...}.

(I am adopting the natural assumption that a_n > 0 for all n. No sense in excluding a positive integer from any reasonable probability distribution on \mathbb{N}.) In other words, \frac{a_{mn}}{a_m} is independent of m, from which it follows that a_{mn} = c a_m a_n for some constant c. From here it already follows that a_n is determined by a_p for p prime and that the exponents in the prime factorization are chosen geometrically. And now the condition that a_n is monotonically decreasing gives the zeta distribution as before. So I think we should use the following characterization theorem instead.

Theorem: Let a_n = \mathbb{P}(X = n) be a probability distribution on \mathbb{N}. Suppose that a_{nm} = c a_n a_m for all n, m \ge 1 and some c, and further suppose that a_n is monotonically decreasing. Then a_n = \frac{1}{\zeta(s)} \left( \frac{1}{n^s} \right) for some real s > 1.

More generally, the following situation covers all the examples we have used so far. Let M be a free commutative monoid on generators p_1, p_2, ..., and let \phi : M \to \mathbb{R} be a homomorphism. Let a_m = \mathbb{P}(X = m) be a probability distribution on M. Suppose that a_{nm} = c a_n a_m for all n, m \in M and some c, and further suppose that if \phi(n) \ge \phi(m) then a_n \le a_m. Then a_m = \frac{1}{\zeta_M(s)} e^{-\phi(m) s} for some s such that the zeta function

\displaystyle \zeta_M(s) = \sum_{m \in M} e^{-\phi(m) s}

converges. Moreover, \zeta_M(s) has the Euler product

\displaystyle \zeta_M(s) = \prod_{i=1}^{\infty} \frac{1}{1 - e^{- \phi(p_i) s}}.

Recall that in the statistical-mechanical interpretation, we are looking at a system whose states are finite collections of particles of types p_1, p_2, ... and whose energies are given by \phi(p_i); then the above is just the partition function. In the special case of the zeta function of a Dedekind abstract number ring, M = M_R is the commutative monoid of nonzero ideals of R under multiplication, which is free on the prime ideals by unique factorization, and \phi(I) = \log N(I). In the special case of the dynamical zeta function of an invertible map f : X \to X, M = M_X is the free commutative monoid on orbits of f (equivalently, the invariant submonoid of the free commutative monoid on X), and \phi(P) = \log |P|, where |P| is the number of points in P.

Read Full Post »

An interesting result that demonstrates, among other things, the ubiquity of \pi in mathematics is that the probability that two random positive integers are relatively prime is \frac{6}{\pi^2}. A more revealing way to write this number is \frac{1}{\zeta(2)}, where

\displaystyle \zeta(s) = \sum_{n \ge 1} \frac{1}{n^s}

is the Riemann zeta function. A few weeks ago this result came up on math.SE in the following form: if you are standing at the origin in \mathbb{R}^2 and there is an infinitely thin tree placed at every integer lattice point, then \frac{6}{\pi^2} is the proportion of the lattice points that you can see. In this post I’d like to explain why this “should” be true. This will give me a chance to blog about some material from another math.SE answer of mine which I’ve been meaning to get to, and along the way we’ll reach several other interesting destinations.


Read Full Post »

I finally learned the solution to a little puzzle that’s been bothering me for awhile.

The setup of the puzzle is as follows. Let G be a weighted undirected graph, e.g. to each edge i \leftrightarrow j is associated a non-negative real number a_{ij}, and let A be the corresponding weighted adjacency matrix. If A is stochastic, one can interpret the weights a_{ij} as transition probabilities between the vertices which describe a Markov chain. (The undirected condition then means that the transition probability between two states doesn’t depend on the order in which the transition occurs.) So one can talk about random walks on such a graph, and between any two vertices the most likely walk is the one which maximizes the product of the weights of the corresponding edges.

Suppose you don’t want to maximize a product associated to the edges, but a sum. For example, if the vertices of G are locations to which you want to travel, then maybe you want the most likely random walk to also be the shortest one. If E_{ij} is the distance between vertex i and vertex j, then a natural way to do this is to set

a_{ij} = e^{- \beta E_{ij}}

where \beta is some positive constant. Then the weight of a path is a monotonically decreasing function of its total length, and (fudging the stochastic constraint a bit) the most likely path between two vertices, at least if \beta is sufficiently large, is going to be the shortest one. In fact, the larger \beta is, the more likely you are to always be on the shortest path, since the contribution from any longer paths becomes vanishingly small. As \beta \to \infty, the ring in which the entries of the adjacency matrix lives stops being \mathbb{R} and becomes (a version of) the tropical semiring.

That’s pretty cool, but it’s not what’s been puzzling me. What’s been puzzling me is that matrix entries in powers of A look an awful lot like partition functions in statistical mechanics, with \beta playing the role of the inverse temperature and E_{ij} playing the role of energies. So, for awhile now, I’ve been wondering whether they actually are partition functions of systems I can construct starting from the matrix A. It turns out that the answer is yes: the corresponding systems are called one-dimensional vertex models, and in the literature the connection to matrix entries is called the transfer matrix method. I learned this from an expository article by Vaughan Jones, “In and around the origin of quantum groups,” and today I’d like to briefly explain how it works.


Read Full Post »


Get every new post delivered to your Inbox.

Join 463 other followers