**Definition-Theorem:** The following conditions on a $k$-algebra $A$ are all equivalent, and all define what it means for $A$ to be a **separable $k$-algebra**:

- $A$ is projective as an $(A, A)$-bimodule (equivalently, as a left $A \otimes_k A^{op}$-module).
- The multiplication map $m : A \otimes_k A^{op} \to A$ has a section as an $(A, A)$-bimodule map.
- $A$ admits a **separability idempotent**: an element $p \in A \otimes_k A^{op}$ such that $m(p) = 1$ and $ap = pa$ for all $a \in A$ (which implies that $p^2 = p$).

(**Edit, 3/27/16:** Previously this definition included a condition involving Hochschild cohomology, but it’s debatable whether what I had in mind is the correct definition of Hochschild cohomology unless $k$ is a field or $A$ is projective over $k$. It’s been removed since it plays no role in the post anyway.)

When $k$ is a field, this condition turns out to be a natural strengthening of the condition that $A$ is semisimple. In general, loosely speaking, a separable $k$-algebra is like a “bundle of semisimple algebras” over $\operatorname{Spec} k$.

**Proofs that the above conditions are equivalent**

$1 \Leftrightarrow 2$: the multiplication map

$$m : A \otimes_k A^{op} \to A$$

is an epimorphism of $(A, A)$-bimodules, so if $A$ is projective as an $(A, A)$-bimodule then this map splits, meaning it has a section. Conversely, since $A \otimes_k A^{op}$ is a free $(A, A)$-bimodule, if this map has a section then $A$ is a retract of a free $(A, A)$-bimodule, hence is projective.

$2 \Leftrightarrow 3$: a section of the multiplication map, as above, is determined by what it does to $1 \in A$; let’s call the image $p \in A \otimes_k A^{op}$, and abuse terminology by identifying $p$ with the section it defines. What does it mean for $p$ to split the multiplication map? As a splitting, it must satisfy

$$m(p) = 1$$

since $p$ is the image of $1$. Second, as an $(A, A)$-bimodule map, it must satisfy $ap = pa$ for all $a \in A$, since $a \cdot 1 = 1 \cdot a$ in $A$. (In fact $A$ is the free $(A, A)$-bimodule on a generator with this property.) Writing $p = \sum_i x_i \otimes y_i$, these conditions together imply that

$$p^2 = \sum_i x_i p y_i = \sum_i x_i y_i p = m(p) p = p$$

hence that $p$ is an idempotent, as the name “separability idempotent” suggests. Hence a splitting of the multiplication map is the same data as a separability idempotent, which is in fact an idempotent.

This concludes the proof.

**A note on $A \otimes_k A^{op}$**

Above we chose to write the source of the multiplication map as $A \otimes_k A^{op}$ to emphasize that it is the free left $A \otimes_k A^{op}$-module, or equivalently the free $(A, A)$-bimodule, on a generator. However, it can just as well be written $A \otimes_k A$, provided that we remember that the natural $(A, A)$-bimodule structure on this is given by left multiplication on the first copy of $A$ and right multiplication on the second copy of $A$. (Also, when we think of the separability idempotent as an idempotent, we really have in mind the algebra structure on $A \otimes_k A^{op}$.) This is how we’ll write things down in examples below.

**Some examples**

*Example.* $k$ itself is a separable $k$-algebra, since it is even free as a module over $k \otimes_k k^{op} \cong k$.

*Example.* The matrix algebra $M_n(k)$ is a separable $k$-algebra. We can prove this explicitly by writing down a separability idempotent. Letting $e_{ij}$ be the usual basis of $M_n(k)$ (with relations $e_{ij} e_{jl} = e_{il}$, and all other products of basis elements zero), set

$$p = \sum_{i=1}^n e_{ij} \otimes e_{ji}$$

where $j$ is fixed. We have

$$m(p) = \sum_{i=1}^n e_{ij} e_{ji} = \sum_{i=1}^n e_{ii} = 1$$

and

$$e_{st} p = \sum_{i=1}^n e_{st} e_{ij} \otimes e_{ji} = e_{sj} \otimes e_{jt}$$

while

$$p e_{st} = \sum_{i=1}^n e_{ij} \otimes e_{ji} e_{st} = e_{sj} \otimes e_{jt}$$

so $p$ is a separability idempotent.
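As a sanity check, this computation can be verified numerically. Here is a sketch in Python with NumPy (the fixed index $j$ is taken to be the first one); the bimodule condition is checked inside $M_n \otimes M_n \cong M_{n^2}$ via the Kronecker product.

```python
import numpy as np

n = 3
# Standard basis e[i][j] of M_n(R): a 1 in position (i, j), zeros elsewhere.
e = [[np.zeros((n, n)) for _ in range(n)] for _ in range(n)]
for i in range(n):
    for j in range(n):
        e[i][j][i, j] = 1.0

# Candidate separability idempotent p = sum_i e_{i1} ⊗ e_{1i} (fixed j,
# index 0 here), stored as a list of tensor factors (x, y).
p = [(e[i][0], e[0][i]) for i in range(n)]

# The multiplication map m(x ⊗ y) = xy should send p to the identity.
mu_p = sum(x @ y for x, y in p)
assert np.allclose(mu_p, np.eye(n))

# The bimodule condition ap = pa, compared inside M_n ⊗ M_n ≅ M_{n^2}
# via the Kronecker product, for a random matrix a.
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
left = sum(np.kron(a @ x, y) for x, y in p)
right = sum(np.kron(x, y @ a) for x, y in p)
assert np.allclose(left, right)
```

Since the condition $ap = pa$ is linear in $a$, checking it on a random matrix (or on the basis $e_{st}$) suffices.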

*Example.* If $G$ is a finite group whose order $|G|$ is invertible in $k$, then the group algebra $k[G]$ is a separable $k$-algebra. Again we can prove this explicitly by writing down a separability idempotent. Set

$$p = \frac{1}{|G|} \sum_{g \in G} g \otimes g^{-1}.$$

We have

$$m(p) = \frac{1}{|G|} \sum_{g \in G} g g^{-1} = 1$$

and, for any $h \in G$,

$$hp = \frac{1}{|G|} \sum_{g \in G} hg \otimes g^{-1} = \frac{1}{|G|} \sum_{g' \in G} g' \otimes g'^{-1} h = ph$$

where in the second line we made the substitution $g' = hg$. So $p$ is a separability idempotent.
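This too can be checked mechanically. A sketch in Python, taking $G = \mathbb{Z}/3$ (written additively) over $\mathbb{Q}$, with elements of $k[G] \otimes_k k[G]$ stored as coefficient dictionaries:

```python
from fractions import Fraction

n = 3  # the cyclic group G = Z/3, written additively; |G| is invertible in Q
# p = (1/|G|) sum_g g ⊗ g^{-1}, stored as {(g, h): coefficient}.
p = {(g, (-g) % n): Fraction(1, n) for g in range(n)}

# m(p) = (1/|G|) sum_g g g^{-1} should be the identity element of Q[G].
mu_p = {}
for (g, h), c in p.items():
    mu_p[(g + h) % n] = mu_p.get((g + h) % n, Fraction(0)) + c
assert mu_p == {0: Fraction(1)}

# The bimodule condition hp = ph for every group element h.
for h in range(n):
    left = {((h + g) % n, b): c for (g, b), c in p.items()}
    right = {(g, (b + h) % n): c for (g, b), c in p.items()}
    assert left == right
```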

To get some more examples it will be convenient to use the following lemma.

**Lemma:** If $A \otimes_k A^{op}$ is semisimple, then $A$ is separable over $k$.

*Proof.* A ring is semisimple iff every module over it is projective. So if $A \otimes_k A^{op}$ is semisimple, then in particular $A$ is a projective $A \otimes_k A^{op}$-module.

**Corollary:** If $L$ is a finite separable extension of a field $k$ (in the usual sense), then $L$ is separable over $k$ (in the above sense).

*Proof.* By the primitive element theorem, $L \cong k[x]/(f(x))$ for some irreducible separable polynomial $f \in k[x]$. Hence

$$L \otimes_k L^{op} \cong L[x]/(f(x)) \cong \prod_i L[x]/(f_i(x))$$

where the $f_i$ are the distinct irreducible factors of $f$ over $L$ (there are at least two of them when $L \neq k$, since $L$ by definition contains a root of $f$). This is a finite product of fields and hence semisimple, so by the lemma we conclude that $L$ is separable.
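One can watch this splitting happen in a computer algebra system. A sketch using SymPy: for $L = \mathbb{Q}(\sqrt{2})$ with minimal polynomial $f = x^2 - 2$, the polynomial factors into distinct linear factors over $L$, so $L \otimes_{\mathbb{Q}} L \cong L[x]/(f) \cong L \times L$ is a product of fields.

```python
from sympy import symbols, factor_list, sqrt

x = symbols('x')
# x^2 - 2 is irreducible over Q, but over L = Q(sqrt(2)) it splits into
# distinct linear factors, so L ⊗_Q L ≅ L[x]/(x^2 - 2) ≅ L × L is a
# finite product of fields, hence semisimple.
_, factors = factor_list(x**2 - 2, x, extension=sqrt(2))
assert len(factors) == 2                      # two distinct factors
assert all(mult == 1 for _, mult in factors)  # each with multiplicity one
```

Separability of $f$ is exactly what guarantees the multiplicities are all one.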

Over a field, this gives another way to prove that the matrix algebras $M_n(k)$ and the group algebras $k[G]$ (where $|G|$ is invertible in $k$) are separable: $M_n(k) \otimes_k M_n(k)^{op}$ is semisimple, and so is $k[G] \otimes_k k[G]^{op}$. But the proofs via writing down separability idempotents work for much more general base rings.

**Some general lemmas**

**Lemma:** $A$ is a separable $k$-algebra iff $A^{op}$ is.

*Proof.* The opposite of a separability idempotent for $A$ is a separability idempotent for $A^{op}$.

More explicitly, suppose $p = \sum_i x_i \otimes y_i$ is a separability idempotent for $A$. Then $\sum_i y_i \otimes x_i$ is a separability idempotent for $A^{op}$.

**Lemma:** If $A$ and $B$ are separable $k$-algebras, then so is $A \times B$.

*Proof.* The sum of separability idempotents for $A$ and $B$ is a separability idempotent for $A \times B$.

More explicitly, recall that tensor product distributes over finite products for algebras. (In the commutative case this means that product distributes over finite coproducts for affine schemes.) Hence

$$(A \times B) \otimes_k (A \times B)^{op} \cong (A \otimes_k A^{op}) \times (A \otimes_k B^{op}) \times (B \otimes_k A^{op}) \times (B \otimes_k B^{op})$$

and this is even an isomorphism of $(A \times B, A \times B)$-bimodules, respecting the multiplication map down to $A \times B$. From this it’s not hard to verify that if $p_A$ is a separability idempotent for $A$, and $p_B$ is a separability idempotent for $B$, then $(p_A, p_B)$, included into the above via the factors $A \otimes_k A^{op}$ and $B \otimes_k B^{op}$, is a separability idempotent for $A \times B$.

**Lemma:** If $A$ and $B$ are separable $k$-algebras, then so is $A \otimes_k B$.

*Proof.* The tensor product of separability idempotents for $A$ and $B$ is a separability idempotent for $A \otimes_k B$.

**Lemma:** If $A$ is a separable $k$-algebra, then the base change $A \otimes_k k'$ is a separable $k'$-algebra, for any commutative $k$-algebra $k'$. (Hence separability is a **geometric** property in the strong sense that it is preserved by arbitrary base change.)

*Proof.* A separability idempotent for $A$ remains a separability idempotent for $A \otimes_k k'$.

Note that this is not true for semisimple algebras, since an inseparable extension of the ground field is a counterexample.

**Lemma:** Any quotient of a separable algebra is separable.

*Proof.* A separability idempotent for $A$ maps to a separability idempotent in any quotient of $A$.

**Corollary:** Two $k$-algebras $A, B$ are separable over $k$ if and only if $A \times B$ is separable over $k$.

*Proof.* In one direction, if $A, B$ are separable, then so is $A \times B$. In the other, if $A \times B$ is separable, then $A, B$ are quotients of it, hence are also separable.

**Lemma:** Separability is Morita invariant: if $A$ and $B$ are Morita equivalent over $k$, then $A$ is separable over $k$ iff $B$ is.

*Proof.* By the Eilenberg-Watts theorem, the category of $(A, A)$-bimodules is equivalent (even monoidally) to the category of cocontinuous $k$-linear endofunctors of $\text{Mod-}A$. Among these, the bimodule $A$ itself represents the identity functor. Hence separability is equivalent to the condition that the identity functor is projective, and since this condition can be stated entirely in terms of the category $\text{Mod-}A$ it is Morita invariant.

**Lemma:** If $B$ is a commutative $k$-algebra and $A$ is a $B$-algebra such that 1) $B$ is separable over $k$ and 2) $A$ is separable over $B$, then $A$ is separable over $k$.

*Proof.* By hypothesis, the multiplication maps $B \otimes_k B^{op} \to B$ and $A \otimes_B A^{op} \to A$ split as bimodule maps, and we want to know that the same is true of the multiplication map $A \otimes_k A^{op} \to A$. (Note that we are writing $B^{op}$ even though $B$ is commutative and so canonically isomorphic to its opposite; we don’t want to use this isomorphism.) If we write

$$A \otimes_k A^{op} \cong A \otimes_B (B \otimes_k B^{op}) \otimes_B A^{op}$$

then we can factor the multiplication map as a composite of two maps we know how to split, namely

$$A \otimes_B (B \otimes_k B^{op}) \otimes_B A^{op} \to A \otimes_B A^{op} \to A$$

where the first map applies the multiplication map $B \otimes_k B^{op} \to B$ between the two copies of $A$ and the second map is the multiplication map $A \otimes_B A^{op} \to A$. (A similar argument can be used to show that the tensor product of separable algebras is separable.)

**Classification over a field**

We now classify separable $k$-algebras when $k$ is a field.

**Lemma:** If $A$ is separable over a field $k$, then $A$ is semisimple.

*Proof.* Any $(A, A)$-bimodule $M$ describes a cocontinuous $k$-linear functor as follows:

$$\text{Mod-}A \ni N \mapsto N \otimes_A M \in \text{Mod-}A.$$

The bimodule $A$ represents the identity functor, while the free bimodule $A \otimes_k A^{op}$ represents the functor

$$N \mapsto N \otimes_k A.$$

Consequently, the multiplication map represents the natural transformation

$$N \otimes_k A \to N$$

given by the action of $A$ on $N$. Now, if $A$ is separable, then this natural transformation splits, and in particular all of these action maps split. If $k$ is a field, then $N$ is a free $k$-module, so $N \otimes_k A$ is a free $A$-module. This means that every $A$-module is a retract of a free $A$-module, hence is projective, and so $A$ is semisimple as desired.

**Corollary:** If $A$ is separable over a field $k$, then $A$ is a finite product of matrix algebras over division algebras over $k$, all of which must also be separable.

*Proof.* Since $A$ is semisimple, Artin-Wedderburn implies that $A \cong \prod_i M_{n_i}(D_i)$ for some division rings $D_i$ over $k$. The lemmas we proved above imply that $\prod_i M_{n_i}(D_i)$ is separable iff the $M_{n_i}(D_i)$ are separable, and (by Morita invariance) that $M_{n_i}(D_i)$ is separable iff $D_i$ is separable, so any such product is separable iff each $D_i$ is separable.

**Corollary:** If $A$ is separable over a field $k$, then $A$ is **geometrically semisimple**: $A \otimes_k L$ is semisimple for every field extension $L$ of $k$.

*Proof.* We know that separability is geometric (preserved by base change), so $A \otimes_k L$ is separable over $L$ for every $L$. If $L$ is a field extension, then the above lemma implies that $A \otimes_k L$ is also semisimple.

**Corollary:** If $A$ is separable over a field $k$, then it is finite-dimensional over $k$.

*Proof.* The base change $A \otimes_k \bar{k}$ to the algebraic closure is semisimple, and a semisimple $\bar{k}$-algebra is necessarily a finite direct product of matrix algebras over $\bar{k}$ (since there are no nontrivial finite-dimensional division algebras over an algebraically closed field), hence finite-dimensional over $\bar{k}$. And $\dim_k A = \dim_{\bar{k}} \left( A \otimes_k \bar{k} \right)$.

**Corollary:** An algebra $A$ over a field $k$ is separable over $k$ iff $A \otimes_k A^{op}$ is semisimple.

*Proof.* Above we showed that if $A \otimes_k A^{op}$ is semisimple, then $A$ is separable over $k$ (since every module over $A \otimes_k A^{op}$, and in particular $A$, is projective). Conversely, we also showed that tensor products and opposites of separable algebras are separable, so if $A$ is separable over $k$ then so is $A \otimes_k A^{op}$, and so it must also be semisimple.

At this point we’ve reduced to looking for finite-dimensional division algebras $D$ over $k$ such that $D \otimes_k D^{op}$ is semisimple. We’ll find them by inspecting their centers $Z(D)$, which are finite extensions of $k$.

**Lemma:** Let $A, B$ be algebras over a field $k$, and let $Z(-)$ denote the center. Then $Z(A \otimes_k B) = Z(A) \otimes_k Z(B)$.

*Proof.* Suppose $z = \sum_i a_i \otimes b_i \in A \otimes_k B$ is central. This is equivalent to the condition that

$$\sum_i a a_i \otimes b_i = \sum_i a_i a \otimes b_i$$

for all $a \in A$ and

$$\sum_i a_i \otimes b b_i = \sum_i a_i \otimes b_i b$$

for all $b \in B$. Since we’re working over a field, we can assume WLOG that the $a_i$ and the $b_i$ are linearly independent in $A$ and $B$ respectively, from which it follows that these conditions hold if and only if $a_i \in Z(A)$ and $b_i \in Z(B)$ for all $i$. Hence $z \in Z(A) \otimes_k Z(B)$, which (again, since we’re working over a field) is naturally a subalgebra of $Z(A \otimes_k B)$, and so must be the entire thing.

**Lemma:** If $A$ is a separable algebra over a field $k$, then so is its center $Z(A)$.

*Proof.* We know that $A$ is separable iff $A \otimes_k A^{op}$ is semisimple. By the above lemma, we have

$$Z(A \otimes_k A^{op}) = Z(A) \otimes_k Z(A)^{op}$$

and since the center of a semisimple algebra is (a finite product of fields, hence) semisimple, it follows that $Z(A) \otimes_k Z(A)^{op}$ is semisimple, hence (by another lemma) that $Z(A)$ is separable.

**Lemma:** Let $L$ be a finite field extension of a field $k$. The following conditions are equivalent:

- $L$ is a separable extension of $k$ (in the usual sense).
- $L$ is a separable $k$-algebra (in the above sense).
- $L$ is geometrically semisimple: $L \otimes_k L'$ is semisimple for all field extensions $L'$ of $k$.
- $L \otimes_k \bar{k}$ is semisimple (equivalently, $L \otimes_k \bar{k}$ is a finite product of copies of $\bar{k}$).

*Proof.* $1 \Rightarrow 2$: we proved this above from the primitive element theorem.

$2 \Rightarrow 3$: follows from a lemma above.

$3 \Rightarrow 4$: set $L' = \bar{k}$.

$4 \Rightarrow 1$: any $\alpha \in L$ generates a $\bar{k}$-subalgebra of $L \otimes_k \bar{k}$ isomorphic to $\bar{k}[x]/(f(x))$, where $f$ is the minimal polynomial of $\alpha$ over $k$. If $L \otimes_k \bar{k}$ is semisimple, it is a finite product of copies of $\bar{k}$, hence so is every $\bar{k}$-subalgebra of it. And $\bar{k}[x]/(f(x))$ is a finite product of copies of $\bar{k}$ iff $f$ has distinct roots, iff $\alpha$ is separable over $k$.

**Corollary:** If a division algebra $D$ over a field $k$ is separable over $k$, then $D$ is finite-dimensional over $k$, and its center $Z(D)$ is a separable extension of $k$ (in the usual sense).

This necessary condition in fact turns out to be sufficient. We need one more lemma to prove this, which is the following.

**Theorem:** Let $A$ be a **central simple algebra** over a field $k$: that is, $A$ is a finite-dimensional simple $k$-algebra with center $k$. Then $A \otimes_k A^{op} \cong M_n(k)$, where $n = \dim_k A$.

*Proof.* $A \otimes_k A^{op}$ naturally acts on $A$ ($A$ acting from the left, $A^{op}$ acting from the right). Simplicity of $A$ means that $A$ has no nontrivial two-sided ideals, or equivalently that $A$ has no nontrivial $A \otimes_k A^{op}$-submodules, hence is simple as an $A \otimes_k A^{op}$-module. Its endomorphism ring is the center $Z(A) = k$, and as a module over this endomorphism ring, $A \cong k^n$. Hence the natural action of $A \otimes_k A^{op}$ on $A$ gives a map

$$A \otimes_k A^{op} \to \operatorname{End}_k(A) \cong M_n(k)$$

and we want this map to be a bijection. But it is a surjection by the Jacobson density theorem, hence a bijection since both sides have dimension $n^2$ over $k$.
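The theorem can be made concrete for the quaternions $\mathbb{H}$, a central simple algebra over $\mathbb{R}$ of dimension $4$. A sketch in Python with NumPy (names mine): the operators $x \mapsto axb$ should span all of $\operatorname{End}_{\mathbb{R}}(\mathbb{H}) \cong M_4(\mathbb{R})$, a space of dimension $16$.

```python
import numpy as np
from itertools import product

# Multiplication table of H on the basis (1, i, j, k):
# table[a][b] = (sign, c) means e_a e_b = sign * e_c.
table = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],
]

def L(a):  # matrix of left multiplication by basis element a on H ≅ R^4
    M = np.zeros((4, 4))
    for b in range(4):
        s, c = table[a][b]
        M[c, b] = s
    return M

def R(b):  # matrix of right multiplication by basis element b
    M = np.zeros((4, 4))
    for a in range(4):
        s, c = table[a][b]
        M[c, a] = s
    return M

# The action map H ⊗ H^op → End_R(H) sends a ⊗ b to x ↦ a x b; check
# that the 16 operators L(a) R(b) span the full 16-dimensional M_4(R).
ops = np.array([(L(a) @ R(b)).ravel() for a, b in product(range(4), repeat=2)])
assert np.linalg.matrix_rank(ops) == 16
```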

**Corollary:** A central simple algebra $A$ over a field $k$ is separable over $k$.

**Corollary:** A finite-dimensional division algebra $D$ over a field $k$ whose center $Z(D)$ is a finite separable extension of $k$ is separable over $k$.

*Proof.* We now know that $D$ is separable over $Z(D)$, and that $Z(D)$ is separable over $k$. By a lemma, it follows that $D$ is separable over $k$.

**Corollary:** The separable algebras over a field $k$ are precisely the finite products of matrix algebras over finite-dimensional division algebras over $k$ whose centers are separable extensions of $k$.

Over a perfect field, the last condition is automatic, so this just says that the separable algebras over a perfect field $k$ are precisely the finite-dimensional semisimple $k$-algebras.


Let $C$ be a cocommutative coalgebra over a commutative ring $k$. If we want to make sense of $C$ as defining an algebro-geometric object, it needs to have a functor of points on commutative $k$-algebras. Here it is:

$$A \mapsto \{ \text{setlike elements of } C \otimes_k A \}.$$

In words, the functor of points of a cocommutative coalgebra $C$ sends a commutative $k$-algebra $A$ to the set of setlike elements of $C \otimes_k A$. In the rest of this post we’ll work through some examples.

**Sets**

Recall that if $X$ is a set then the free $k$-module $k[X]$ is a cocommutative coalgebra with comultiplication coming from the diagonal $X \to X \times X$. More explicitly, the comultiplication is determined by the condition that $\Delta(x) = x \otimes x$ for all $x \in X$.

The functor of points of this coalgebra sends a commutative $k$-algebra $A$ to the set of setlike elements of $k[X] \otimes_k A \cong A[X]$, and as we computed before, these are precisely the elements of the form $\sum_{x \in X} e_x x$ where

$$e_x e_y = \delta_{xy} e_x$$

and $\sum_x e_x = 1$, or equivalently such that the $e_x$ form a complete orthogonal set of idempotents in $A$ (with all but finitely many $e_x$ equal to zero). Together, the $e_x$ determine a direct product decomposition

$$A \cong \prod_x e_x A$$

which geometrically corresponds to a decomposition of $\operatorname{Spec} A$ into disjoint components $\operatorname{Spec} e_x A$. As mentioned previously, the data of such a decomposition is equivalent to the data of a continuous function from the Pierce spectrum of $A$ to $X$.

In other words, the functor of points of $k[X]$ consists of “locally constant functions from $\operatorname{Spec} A$ to $X$.”
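Here is a toy check of this correspondence, assuming nothing beyond modular arithmetic: in $A = \mathbb{Z}/6$ the complete orthogonal idempotents $3, 4$ realize the decomposition $\mathbb{Z}/6 \cong \mathbb{Z}/2 \times \mathbb{Z}/3$.

```python
# Complete orthogonal idempotents in A = Z/6: e1 = 3, e2 = 4.
mod = 6
e1, e2 = 3, 4
assert (e1 * e1) % mod == e1 and (e2 * e2) % mod == e2  # idempotent
assert (e1 * e2) % mod == 0 and (e1 + e2) % mod == 1    # orthogonal, complete

# The map a ↦ (e1·a, e2·a) realizes A ≅ e1·A × e2·A (here Z/6 ≅ Z/2 × Z/3):
split = {a: ((e1 * a) % mod, (e2 * a) % mod) for a in range(mod)}
assert len(set(split.values())) == mod  # injective, hence bijective

# and it is multiplicative in each component separately.
for a in range(mod):
    for b in range(mod):
        pa, qa = split[a]
        pb, qb = split[b]
        assert split[(a * b) % mod] == ((pa * pb) % mod, (qa * qb) % mod)
```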

We can also equip $X = G$ with a group structure, and then $k[G]$, with the usual Hopf algebra structure, has a functor of points sending a commutative $k$-algebra $A$ to the group of continuous functions from the Pierce spectrum of $A$ to $G$, with pointwise product.

**Finite-dimensional algebras**

Now we restrict to the case that $k$ is a field.

Let $A$ be a finite-dimensional commutative $k$-algebra. Then the linear dual $A^{\ast} = \operatorname{Hom}_k(A, k)$ acquires a natural coalgebra structure given by dualizing the algebra structure on $A$. (We don’t need commutativity to say this.) More explicitly, if $f$ is an element of $A^{\ast}$, then the comultiplication is

$$\Delta(f)(a \otimes b) = f(ab)$$

and the counit is

$$\varepsilon(f) = f(1).$$

On the other hand,

$$(f \otimes f)(a \otimes b) = f(a) f(b).$$

We conclude the following.

**Lemma:** A linear functional $f \in A^{\ast}$ is setlike if and only if $f(ab) = f(a) f(b)$ for all $a, b \in A$ and $f(1) = 1$; in other words, if and only if $f$ is a morphism of $k$-algebras.

More generally, because $A$ is a finite-dimensional $k$-vector space, if $B$ is any commutative $k$-algebra then the natural map

$$A^{\ast} \otimes_k B \to \operatorname{Hom}_k(A, B)$$

is an isomorphism. We can check that it’s even an isomorphism of coalgebras (over $B$), and exactly the same computation as above shows the following.

**Corollary:** An element of $A^{\ast} \otimes_k B$ is setlike if and only if the corresponding element of $\operatorname{Hom}_k(A, B)$ is a morphism of $k$-algebras.

Hence the functor of points of $A^{\ast}$ as a coalgebra is precisely the functor of points of $A$ as an algebra: setlike elements of $A^{\ast} \otimes_k B$ correspond to morphisms $\operatorname{Spec} B \to \operatorname{Spec} A$ of affine schemes over $k$.

The duality $A \mapsto A^{\ast}$ induces an equivalence of categories between finite-dimensional commutative algebras and finite-dimensional cocommutative coalgebras over $k$, so we can learn something about the latter by learning something about the former. Every finite-dimensional commutative algebra over a field is in particular Artinian, and so factors as a finite product of Artinian local rings. The nilradical of such a ring coincides with its Jacobson radical, and the quotient by it is a finite-dimensional commutative semisimple $k$-algebra, hence factors as a finite product of finite field extensions of $k$.

Hence, up to taking finite extensions, $\operatorname{Spec} A$ looks like a finite set of points together with some “nilpotent fuzz.” $A$ looks like functions on this space and $A^{\ast}$ looks like distributions; both are equally sensitive to the “nilpotent fuzz,” as we saw previously in the special case of primitive elements.

**Infinite-dimensional algebras**

Again let $k$ be a field. Let $A$ be a commutative $k$-algebra, not necessarily finite-dimensional. Then it is no longer true that we can put a coalgebra structure on $A^{\ast}$: when we try dualizing the multiplication, we get a map $A^{\ast} \to (A \otimes_k A)^{\ast}$, and the natural map $A^{\ast} \otimes_k A^{\ast} \to (A \otimes_k A)^{\ast}$ goes in the wrong direction to get a comultiplication.

Intuitively, the problem is that because we’re using the algebraic tensor product to define coalgebras, the comultiplication can only output a sum of finitely many tensors, and so has trouble dealing with distributions that are not “compactly supported.”

However, it is possible to rescue this construction as follows. If $A$ is a commutative $k$-algebra, define its **finite dual**

$$A^{\circ} \subseteq A^{\ast}$$

to consist of all linear functionals factoring through a finite-dimensional quotient of $A$ (as a $k$-algebra). Geometrically, these are the distributions with “finite support,” and they do in fact have a comultiplication, as follows. If $f$ factors through a finite quotient $A \to B$, then the multiplication map

$$A \otimes_k A \to A$$

factors through

$$B \otimes_k B \to B$$

and the quotient map dualizes to a map $B^{\ast} \to A^{\circ}$, giving us an element of $B^{\ast} \otimes_k B^{\ast} \cong (B \otimes_k B)^{\ast}$ coming from $f$ (namely $\Delta(f)$), and hence giving an element of $A^{\circ} \otimes_k A^{\circ}$. This is our comultiplication. The counit is $f \mapsto f(1)$ as usual; this poses no problems.

The result we showed in the finite-dimensional case above shows the following here.

**Theorem:** Let $A$ be a commutative $k$-algebra and let $A^{\circ}$ be its finite dual. Then the setlike elements of $A^{\circ} \otimes_k B$ can naturally be identified with the $k$-algebra homomorphisms $A \to B$ which factor through a finite-dimensional quotient of $A$.

Geometrically, this says that the functor of points of $A^{\circ}$ sends an affine scheme $\operatorname{Spec} B$ to maps from $\operatorname{Spec} B$ to the spectrum of the **profinite completion**

$$\widehat{A} = \varprojlim_I A/I \quad (\text{$I$ an ideal of finite codimension})$$

of $A$. In other words, $A^{\circ}$ itself is the coalgebra of distributions on the profinite completion.

*Example.* Let $A = k[x]$, so that $\operatorname{Spec} A$ is the affine line. The distributions on the affine line with finite support, or equivalently the profinite completion of $k[x]$, can be very explicitly classified. By the Chinese remainder theorem, the finite quotients of $k[x]$ take the form

$$k[x]/(f(x)) \cong \prod_i k[x]/(f_i(x)^{m_i})$$

where the $f_i$ are distinct irreducible polynomials over $k$. This is a finite product, hence a finite direct sum, of vector spaces, and so any linear functional on it breaks up as a direct sum of linear functionals on each piece, so we can restrict attention to linear functionals on $k[x]/(f(x)^m)$ (distributions “supported on the zero locus of $f$”) without loss of generality.

In the simplest case, $f$ is a linear polynomial $x - c$. Then the linear dual of $k[x]/((x - c)^m)$ has a basis consisting of the functionals taking each of the first $m$ terms of the Taylor series expansion of a polynomial in $x$ centered at $x = c$: these are (up to the issue of dividing by various factorials if $k$ has positive characteristic) the derivatives of the Dirac delta at $c$.
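A quick SymPy sketch of this basis of Taylor-coefficient functionals (the choice $c = 2$, $m = 3$ is mine): each functional kills the ideal $((x - c)^m)$, so it factors through the finite quotient and lives in the finite dual, and together they separate the classes of $(x - c)^i$.

```python
from sympy import symbols, diff, factorial, expand

x = symbols('x')
c, m = 2, 3  # distributions supported at x = 2, on Q[x]/((x - 2)^3)

def taylor_coeff(f, j):
    # the functional f ↦ f^(j)(c) / j!, the j-th "derivative of delta at c"
    return diff(f, x, j).subs(x, c) / factorial(j)

# These functionals kill the ideal ((x - c)^m), so they factor through
# the finite quotient:
g = expand((x - c)**m * (x**2 + 5*x + 1))  # an arbitrary element of the ideal
assert all(taylor_coeff(g, j) == 0 for j in range(m))

# And they are dual to the basis 1, (x - c), (x - c)^2 of the quotient:
vals = [[taylor_coeff((x - c)**i, j) for j in range(m)] for i in range(m)]
assert vals == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```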

In the general case we can understand what’s happening using Galois descent. After passing to a suitable field extension $L$ of $k$, namely the splitting field of $f$, the quotient breaks up further into linear factors. In the case that $L$ is Galois over $k$, linear functionals on $k[x]/(f(x)^m)$ can be interpreted as Galois-invariant distributions on $L[x]/(f(x)^m)$. Geometrically we should think of a finite set of “fuzzy” points acted on by the Galois group; examples of Galois-invariant distributions on this include the sum of Dirac deltas at each point, or the sum of derivatives of Dirac deltas at each point. If $L$ isn’t Galois (meaning that $f$ is inseparable), there is actually extra “fuzziness” that could be hidden over $k$ and only becomes visible over $L$.

*Subexample.* Let $k = \mathbb{R}$ and consider the quotient $\mathbb{R}[x]/(x^2 + 1)$ of $\mathbb{R}[x]$. After passing to the Galois extension $\mathbb{C}$, this quotient becomes $\mathbb{C}[x]/((x - i)(x + i)) \cong \mathbb{C} \times \mathbb{C}$, and it’s clear that the dual space has a natural basis given by two Dirac deltas, one at $i$ and one at $-i$. The corresponding linear functionals are just evaluation at these two points.

Unfortunately, these Dirac deltas don’t directly make sense over $\mathbb{R}$. Instead, there are two Galois-invariant linear combinations that do: we can take

$$\delta_i + \delta_{-i}$$

which, up to a factor of $2$, takes the real part of $f(i)$, and

$$\frac{\delta_i - \delta_{-i}}{i}$$

which, again up to a factor of $2$, takes the imaginary part.
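A numerical check of the subexample (a sketch; the sample polynomial is arbitrary): for a real polynomial $f$, the value $\delta_{-i}(f) = \overline{\delta_i(f)}$, so both combinations land in $\mathbb{R}$ and recover twice the real and imaginary parts of $f(i)$.

```python
def evaluate(coeffs, z):  # real polynomial sum_k c_k x^k evaluated at z
    return sum(c * z**k for k, c in enumerate(coeffs))

f = [3.0, -1.0, 4.0, 2.0]   # 3 - x + 4x^2 + 2x^3, real coefficients
d_plus = evaluate(f, 1j)    # Dirac delta at  i
d_minus = evaluate(f, -1j)  # Dirac delta at -i

re_functional = d_plus + d_minus         # = 2 Re(f(i)), a real number
im_functional = (d_plus - d_minus) / 1j  # = 2 Im(f(i)), also real

assert abs(re_functional.imag) < 1e-12 and abs(im_functional.imag) < 1e-12
assert abs(re_functional - 2 * d_plus.real) < 1e-12
assert abs(im_functional - 2 * d_plus.imag) < 1e-12
```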

**Cartier duality**

We mostly restricted to the case of a field above because over a field duality behaves in the following very nice way.

**Theorem:** The functor $V \mapsto V^{\ast}$ is a contravariant equivalence of symmetric monoidal categories between the symmetric monoidal category of finite-dimensional $k$-vector spaces and itself.

Because this equivalence is symmetric monoidal, it induces various further equivalences.

**Corollary:** The functor $A \mapsto A^{\ast}$ is a contravariant equivalence of categories between finite-dimensional $k$-algebras and finite-dimensional $k$-coalgebras, and between finite-dimensional commutative $k$-algebras and finite-dimensional cocommutative $k$-coalgebras.

These remain symmetric monoidal equivalences if we equip everything with the usual tensor product (which for commutative algebras is the coproduct and for cocommutative coalgebras is the product, so in this case we get that the equivalence is symmetric monoidal for free). We can even ask for both an algebra and a coalgebra structure at once, which gives us this.

**Corollary (Cartier duality):** The functor $H \mapsto H^{\ast}$ is a contravariant equivalence of categories between finite-dimensional commutative and cocommutative Hopf algebras over $k$ and itself.

Finite-dimensional commutative and cocommutative Hopf algebras over $k$ are the analogues of finite abelian groups in the world of algebraic geometry over $k$: more precisely, they are finite (in the sense that they are Spec of a finite-dimensional algebra) commutative (because “abelian” means something else in algebraic geometry) group schemes (meaning Spec of a commutative Hopf algebra).

*Example.* Suppose $G$ is a finite abelian group, and $k[G]$ is its group algebra, regarded as a Hopf algebra in the usual way (so cocommutative for general reasons, and commutative because $G$ is abelian). Then the Cartier dual of $k[G]$ is the function algebra $k^G$, regarded as a Hopf algebra in the usual way (commutative for general reasons, and cocommutative because $G$ is abelian).

*Subexample.* If $G = \mathbb{Z}/n$ is the cyclic group of order $n$, then $k[\mathbb{Z}/n] \cong k[x]/(x^n - 1)$, as a group scheme, has functor of points

$$A \mapsto \{ a \in A : a^n = 1 \}$$

sending a commutative $k$-algebra $A$ to the group of $n^{th}$ roots of unity in $A$. This group scheme has its own name in algebraic geometry: it’s called $\mu_n$. On the other hand, its Cartier dual $k^{\mathbb{Z}/n}$ is the “constant” group scheme with value $\mathbb{Z}/n$: it has functor of points

$$A \mapsto \{ \text{locally constant functions } \operatorname{Spec} A \to \mathbb{Z}/n \}$$

sending a commutative $k$-algebra $A$ to, as above, the group of locally constant functions from $\operatorname{Spec} A$ to $\mathbb{Z}/n$. This is the same functor of points we get if we think about $k[\mathbb{Z}/n]$ as a coalgebra, and its name is just $\mathbb{Z}/n$.
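As a quick concrete check of the $\mu_n$ functor of points (the choices $n = 3$ and $A = \mathbb{Z}/7$ are mine):

```python
mod = 7  # the commutative ring A = Z/7
cube_roots = [a for a in range(mod) if pow(a, 3, mod) == 1]
assert cube_roots == [1, 2, 4]  # mu_3(Z/7) has three elements

# They form a group under multiplication, as the functor of points promises.
for a in cube_roots:
    for b in cube_roots:
        assert (a * b) % mod in cube_roots
```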

Cartier duality can be described as switching between two possible functors of points for a finite-dimensional commutative and cocommutative Hopf algebra $H$ as above: one based on thinking of $\operatorname{Spec} H$ as a group object in finite schemes, and one based on thinking of $H$ itself as a group object in finite-dimensional cocommutative coalgebras. In the second description, the functor of points

$$A \mapsto \{ \text{setlike elements of } H \otimes_k A \}$$

sends a commutative $k$-algebra $A$ to the group (really a group now, since we are in a Hopf algebra) of setlike elements of $H \otimes_k A$.

As it turns out, it’s possible to give a description of what this functor is doing without explicitly thinking about coalgebras or Cartier duality. Namely, we saw above that the coalgebra $k$ of distributions on a point represents the setlike elements functor on coalgebras. We can ask what represents the setlike elements functor on Hopf algebras, and it’s not hard to see that the answer is the Hopf algebra whose underlying algebra is

$$k[x, x^{-1}]$$

where the comultiplication is $\Delta(x) = x \otimes x$, the counit is $\varepsilon(x) = 1$, and the antipode is $S(x) = x^{-1}$. This Hopf algebra is commutative, and thinking of it as a group scheme, it is a very famous one, the **multiplicative group scheme** $\mathbb{G}_m$, whose functor of points

$$A \mapsto A^{\times}$$

sends a commutative $k$-algebra $A$ to its group of units. Morphisms $k[x, x^{-1}] \to H$ of Hopf algebras correspond to setlike elements of $H$, and if $H$ is commutative these in addition correspond to morphisms $\operatorname{Spec} H \to \mathbb{G}_m$ of affine group schemes. A morphism from an affine group scheme to the multiplicative group is called a **character**: it is the correct notion of a $1$-dimensional representation in the world of group schemes.

Cartier duality can then be interpreted as follows: if $\operatorname{Spec} H$ is a finite commutative group scheme, then “characters of $\operatorname{Spec} H$” forms another finite commutative group scheme, whose functor of points

$$A \mapsto \{ \text{characters of } \operatorname{Spec} (H \otimes_k A) \}$$

sends a commutative $k$-algebra $A$ to the group (under pointwise multiplication) of characters of the base change $\operatorname{Spec} (H \otimes_k A)$. But we saw earlier that this is nothing more than the set of setlike elements of $H \otimes_k A$, or equivalently the set of $k$-algebra homomorphisms $H^{\ast} \to A$, and so this is precisely the functor of points of the Cartier dual $H^{\ast}$ as previously defined.

Once Cartier duality is described in terms of characters, it seems a little more surprising: since the dual of the dual of a finite-dimensional vector space $V$ is just $V$ again, we conclude that taking characters of the characters of a finite commutative group scheme gets us the same group scheme again. This should be compared to Pontryagin duality for finite abelian groups, which says the same thing, where “characters” means homomorphisms $G \to \mathbb{C}^{\times}$, and which can be interpreted as Cartier duality for constant group schemes over $\mathbb{C}$.


Less commonly, mathematicians sometimes think about coalgebras. In general it seems that mathematicians find these harder to think about, although it’s sometimes unavoidable, e.g. when discussing Hopf algebras. The goal of this post is to describe how to begin thinking about cocommutative coalgebras as consisting of distributions of some sort on spaces of some sort.

**Functions vs. distributions**

Distributions are typically defined as duals (spaces of continuous linear functionals) of topological vector spaces of functions. Loosely speaking, a distribution is something you can integrate a class of functions against; it’s a kind of generalized measure.

For example, the dual of the space $C(X)$ of continuous functions on a compact Hausdorff space $X$ (with the sup norm topology) is a space of (signed) Radon measures on $X$. A class of examples closer to the examples we’ll be considering, although it involves more technicalities than we’ll need, is the dual of the space $C^{\infty}(M)$ of smooth functions on a smooth manifold $M$ (with the Fréchet topology), which can be thought of as “distributions with compact support” on $M$.

The simplest examples of distributions are the **Dirac delta** distributions, definable in great generality: as linear functionals on spaces of functions they are precisely the evaluation functionals

$$\delta_x : f \mapsto f(x).$$

When we take duals to spaces of smooth functions, as opposed to continuous functions, we get more interesting distributions “supported at a point” given by taking derivatives. For example, on $\mathbb{R}$, at every point $x$ there are linear functionals on $C^{\infty}(\mathbb{R})$ given by

$$f \mapsto f^{(n)}(x).$$

These distributions are named using derivative notation because they are (up to sign) the distributional derivatives of $\delta_x$.

The two most important things to keep in mind about the difference between functions and distributions are the following.

- Functions pull back, while distributions push forward.
- Functions form commutative algebras, while distributions form cocommutative coalgebras.

These points are closely related: the multiplication on functions, resp. the comultiplication on distributions, can be described using pullback resp. pushforward along the diagonal map

$$\Delta : X \to X \times X.$$

Namely, because we can multiply functions on $X$ by functions on $Y$ to get functions on $X \times Y$, for any reasonable notion of functions $F$ we get a dual map

$$F(X) \otimes F(X) \to F(X \times X) \xrightarrow{\Delta^{\ast}} F(X)$$

giving the multiplication on functions.

The situation for distributions is similar but less straightforward: if $D$ is any reasonable notion of distributions we get a map

$$D(X) \xrightarrow{\Delta_{\ast}} D(X \times X).$$

To get a comultiplication from this we’d like for there to be an isomorphism, or at least a map, from $D(X \times X)$ to $D(X) \otimes D(X)$. Unfortunately, the map that exists usually goes in the other direction, and usually will not be an isomorphism unless $\otimes$ is some kind of completed tensor product.

Nevertheless, in some examples, and/or with the right modified notion of tensor product, the required maps do exist and we do get a comultiplication on distributions.

In addition to comultiplications, coalgebras also need counits. In the case of distributions on spaces this counit comes from pushing forward along the unique map $X \to \text{pt}$, getting a map

$$D(X) \to D(\text{pt})$$

which, if we think of distributions as generalized measures, computes the “total measure” of a measure.

**The diagonal**

The appearance of the diagonal map above can be put into a more abstract context. Recall that in any category with finite products, every object $X$ is canonically a cocommutative comonoid in a unique way, via the diagonal map

$$\Delta : X \to X \times X.$$

A typical example for us will be $\text{Set}$, and in general we’ll want to think of such a category $C$ as a category of “spaces.” We can get both commutative monoids and cocommutative comonoids out of diagonal maps as follows.

If $F : C^{op} \to V$ is a contravariant functor out of $C$ (describing a notion of “functions”) to a symmetric monoidal category $V$ (typically something like $\text{Vect}$) which is lax symmetric monoidal in the sense that it is equipped with natural transformations

$$F(X) \otimes F(Y) \to F(X \times Y)$$

compatible with symmetries (plus some unit stuff), then pulling back along the diagonal endows each $F(X)$ with the structure of a commutative monoid in $V$.

*Example.* If $C = \text{Set}$, then we can take $F(X)$ to consist of all functions $X \to k$, where $k$ is the underlying field. If we restrict to finite sets, then $F$ is even symmetric monoidal in the sense that the natural transformations above are isomorphisms.

Dually, if $D : C \to V$ is a covariant functor out of $C$ (describing a notion of “distributions”) to a symmetric monoidal category $V$ which is oplax symmetric monoidal in the sense that it is equipped with natural transformations

$$D(X \times Y) \to D(X) \otimes D(Y)$$

compatible with symmetries (plus unit stuff as above), then pushing forward along the diagonal endows each $D(X)$ with the structure of a cocommutative comonoid in $V$.

*Example.* If $C = \text{Set}$, then we can take $D(X)$ to be the free $k$-vector space on $X$, where $k$ is the underlying field. Without any finiteness hypotheses, this $D$ is even symmetric monoidal.

**Sets as coalgebras**

Let’s slightly generalize the construction above. Let $k$ be a commutative ring (in fact we could take a commutative semiring here). Then we have a free $k$-module functor $X \mapsto k[X]$ from sets to $k$-modules. The above construction shows that this functor can be regarded as taking values in cocommutative coalgebras over $k$, so in fact we have a functor

$$\text{Set} \to \text{CocommCoalg}(k).$$

At this point it will be convenient to introduce the following definition.

**Definition:** An element $c$ of a coalgebra $C$ (where $\Delta$ is the comultiplication and $\varepsilon$ is the counit) is *setlike* if $\Delta(c) = c \otimes c$ and $\varepsilon(c) = 1$. If $C$ is a coalgebra, we’ll write $|C|$ for its set of setlike elements.

(The more common term is *grouplike*, but that term is really only appropriate to the case of Hopf algebras, since in that case the setlike elements form a group. Here the setlike elements only form a set.)

Now we can describe $k[X]$, as a coalgebra, as being freely generated by setlike elements. Thinking in terms of distributions, setlike elements correspond to Dirac distributions, and so it’s reasonable to think of them as the “points” of a coalgebra, or more precisely of a hypothetical space on which the coalgebra is distributions.

**Proposition:** The functor $X \mapsto k[X]$ from sets to coalgebras above has a right adjoint sending a coalgebra $C$ to its set $|C|$ of setlike elements.

*Proof.* We want to show that if $X$ is a set and $C$ is a coalgebra, we have a natural bijection

$$\operatorname{Hom}(k[X], C) \cong \operatorname{Hom}(X, |C|).$$

But this is clear from the observation that $k[X]$ is a free $k$-module on setlike elements, from which it follows that a coalgebra homomorphism $k[X] \to C$ is uniquely and freely determined by what it does to each element $x \in X$. These elements must be sent to some setlike element of $C$ and can be sent to any such element.

In particular, the functor $C \mapsto |C|$ is represented by the coalgebra $k$ (of “distributions on a point”).

**Lemma:** Suppose $k$ has no nontrivial idempotents (that is, it is a connected ring). Then the setlike elements of $k[X]$ are precisely the elements of $X$: that is, the unit of the above adjunction is an isomorphism.

*Proof.* Suppose $c = \sum_x c_x x$ is a setlike element. Then

$$\Delta(c) = \sum_x c_x (x \otimes x)$$

must be equal to

$$c \otimes c = \sum_{x, y} c_x c_y (x \otimes y)$$

which happens if and only if $c_x c_y$ is $c_x$ if $x = y$ and $0$ otherwise. The counit condition is

$$\varepsilon(c) = \sum_x c_x = 1.$$

Altogether, the condition that $c$ is setlike is precisely the condition that the elements $c_x$ are a complete set of orthogonal idempotents in $k$. Since $k$ has no nontrivial idempotents by assumption, each $c_x$ is equal to either $0$ or $1$. Since they are orthogonal (meaning $c_x c_y = 0$ if $x \neq y$), at most one of them is equal to $1$. And since they sum to $1$, exactly one of them is equal to $1$. Hence our setlike element is some $x \in X$.
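Concretely (a toy check of my own, assuming nothing beyond modular arithmetic): over $k = \mathbb{Z}/6$, which fails the connectedness hypothesis, the coalgebra $k[X]$ on a two-point set $X$ has setlike elements $c_1 x_1 + c_2 x_2$ beyond the elements of $X$ itself.

```python
mod = 6  # the base ring k = Z/6, which has nontrivial idempotents

def is_setlike(c1, c2):
    # Delta(c) = c ⊗ c forces c_x c_y = c_x if x = y and 0 otherwise;
    # the counit condition forces c_1 + c_2 = 1.
    return ((c1 * c1) % mod == c1 and (c2 * c2) % mod == c2
            and (c1 * c2) % mod == 0 and (c1 + c2) % mod == 1)

setlike = [(c1, c2) for c1 in range(mod) for c2 in range(mod)
           if is_setlike(c1, c2)]
# Besides the two "Dirac deltas" (1, 0) and (0, 1), the nontrivial
# idempotents 3 and 4 contribute two extra setlike elements:
assert setlike == [(0, 1), (1, 0), (3, 4), (4, 3)]
```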

The correct statement without the hypothesis that $k$ is connected, which is not hard to extract from the above argument, is that the setlike elements of $k[X]$ in general correspond to functions from the set of connected components of $\operatorname{Spec} k$ to $X$ with finite image, or equivalently to continuous functions from the Pierce spectrum of $k$ to the discrete space $X$.

**Corollary:** Let $k$ have no nontrivial idempotents. Then the functor $X \mapsto k[X]$ is an equivalence of categories from sets to cocommutative coalgebras over $k$ which are free on setlike elements.

In other words, as a slogan, sets are coalgebras of Dirac deltas.

*Proof.* We showed that the unit of the adjunction between sets and coalgebras is an isomorphism on sets. In general, an adjunction restricts to an equivalence of categories between the subcategories on which the unit resp. the counit of the adjunction are isomorphisms. So it remains to determine for which coalgebras the counit of the adjunction is an isomorphism. Explicitly, the counit is the natural map

$$k[\{ \text{setlike elements of } C \}] \to C$$

from the free $k$-module on the setlike elements of a coalgebra $C$ to $C$. If this is an isomorphism, then $C$ must in particular be free on some setlike elements. Conversely, if $C$ is free on setlike elements, then the lemma above shows that these elements are naturally precisely the setlike elements of $C$, so that the counit is an isomorphism.

This equivalence induces an equivalence between groups and cocommutative Hopf algebras over $k$ which are free (as modules) on setlike (here “grouplike”) elements.

**Beyond Dirac deltas**

We’ve said a lot about setlike elements of coalgebras, or equivalently about Dirac delta distributions. But coalgebras have lots of other kinds of elements in general. For example, if $\mathfrak{g}$ is a Lie algebra, its universal enveloping algebra $U(\mathfrak{g})$ has a natural comultiplication given by extending

$$\Delta(x) = x \otimes 1 + 1 \otimes x$$

where $x \in \mathfrak{g}$; that is, each $x \in \mathfrak{g}$ is primitive. In a geometric story about distributions, where do the primitives live?

The first observation is that in an arbitrary coalgebra there isn’t an element called $1$, so coalgebras don’t have a notion of primitive element. What makes the element $1 \in U(\mathfrak{g})$ special is that it is in fact the unique setlike element: it satisfies $\Delta(1) = 1 \otimes 1$ and is the only element of $U(\mathfrak{g})$ with this property. So whatever primitivity means, geometrically it has something to do with a fixed setlike element, or in distributional terms with a fixed Dirac delta.

**Definition:** Let $s$ be a setlike element of a coalgebra $C$. An element $p \in C$ is *primitive with respect to* $s$ if

$$\Delta(p) = p \otimes s + s \otimes p \quad \text{and} \quad \varepsilon(p) = 0.$$

We can get a big hint about what this definition means by going back to the example of distributions coming from taking the dual of the space of smooth functions $C^{\infty}(\mathbb{R})$. Consider the distribution

$$\delta'_{x_0} : f \mapsto f'(x_0).$$

How does comultiplication act on this distribution? To answer that question we need to see what this distribution does to a product of functions $fg$ (since this describes the action of the distribution on at least a dense subspace of the pullback of functions along the diagonal map $\mathbb{R} \to \mathbb{R} \times \mathbb{R}$). The answer, using the product rule, is that

$$(fg)'(x_0) = f'(x_0) g(x_0) + f(x_0) g'(x_0).$$

This gives that

$$\Delta(\delta'_{x_0}) = \delta'_{x_0} \otimes \delta_{x_0} + \delta_{x_0} \otimes \delta'_{x_0}$$

and tells us that primitivity is a reflection of the Leibniz rule for derivations: saying that an element is primitive with respect to a setlike element means that if $\delta_{x_0}$ is a “point,” or more precisely a Dirac delta at a point, then $\delta'_{x_0}$ is a “directional derivative” in a tangent direction at that point. Similarly, computing the pushforward to a point means differentiating constant functions (which are the functions pulled back from a point), which gives zero.
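The Leibniz-rule computation above is easy to check numerically. The sketch below (helper names are mine, not from the post) approximates the distribution $f \mapsto f'(x_0)$ by central differences, verifies that applying it to a product $fg$ agrees with the primitive expansion, and checks that it kills constants:

```python
import math

def deriv_at(f, x0, h=1e-6):
    """Central-difference approximation to the distribution f -> f'(x0)."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

x0 = 0.7
f = math.sin
g = math.exp

# Leibniz rule: applying the distribution to fg equals the "primitive"
# expansion delta' (x) delta + delta (x) delta' applied to f (x) g.
lhs = deriv_at(lambda x: f(x) * g(x), x0)
rhs = deriv_at(f, x0) * g(x0) + f(x0) * deriv_at(g, x0)
assert abs(lhs - rhs) < 1e-5

# Pushing forward to a point differentiates constants, which gives zero.
assert abs(deriv_at(lambda x: 1.0, x0)) < 1e-9
```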

More formally, we can say the following.

**Theorem:** Let $s$ be a setlike element of a cocommutative coalgebra $C$ over $k$, and let $p \in C$ be an arbitrary element. Then $p$ is primitive with respect to $s$ iff $s + \varepsilon p$ is a setlike element of $C \otimes_k k[\varepsilon]/(\varepsilon^2)$.

*Proof.* Computation.

Intuitively, $p$ is primitive with respect to $s$ iff both $s$ and $s + \varepsilon p$ are “points,” where the $\varepsilon$ indicates that they are “infinitesimally close” points.

The fact that , as a Hopf algebra, is generated by primitive elements can be interpreted geometrically as saying that it corresponds to distributions “supported at a point.” In fact it is possible to describe as distributions supported at the identity on a Lie group with Lie algebra .


The goal of this post is to derive the principle of maximum entropy in the special case of probability distributions over finite sets from

- Bayes’ theorem and
- the principle of indifference: assign probability $\frac{1}{n}$ to each of $n$ possible outcomes if you have no additional knowledge. (The slogan in statistical mechanics is “all microstates are equally likely.”)

We’ll do this by deriving an arguably more fundamental principle of maximum relative entropy using only Bayes’ theorem.

**A better way to state Bayes’ theorem**

Suppose you have a set of hypotheses $H_i$ about something, exactly one of which can be true, and some prior probabilities $P(H_i)$ that these hypotheses are true (which therefore sum to $1$). Then you see some evidence $E$. (Here is a simultaneous definition of both hypotheses and evidence: hypotheses are things that assert how likely or unlikely evidence is. That is, what it means for $E$ to give evidence about some hypotheses is that there ought to be some conditional probabilities $P(E \mid H_i)$, the likelihoods, describing how likely it is that you see evidence $E$ conditional on hypothesis $H_i$.)

Bayes’ theorem in this setting is then usually stated as follows: you should now have updated posterior probabilities $P(H_i \mid E)$ that your hypotheses are true conditional on your evidence, and they should be given by

$$P(H_i \mid E) = \frac{P(E \mid H_i) P(H_i)}{P(E)}.$$

That is, each prior probability $P(H_i)$ gets multiplied by $\frac{P(E \mid H_i)}{P(E)}$, which describes how much more likely $H_i$ thinks the evidence is than before. You might be concerned that $P(E)$ requires the introduction of extra information, but in fact it must be given by

$$P(E) = \sum_i P(E \mid H_i) P(H_i)$$

by conditioning on each $H_i$ in turn, so it’s already determined by the priors and the likelihoods. (This is if the $H_i$ are parameterized by a discrete parameter $i$; in general this sum should be replaced by an integral.)

In practice this statement of Bayes’ theorem seems to be annoyingly easy to forget, at least for me. Here is a better statement. The idea is to think of $P(E)$ as just a normalization constant. Hence the revised statement is

$$P(H_i \mid E) \propto P(E \mid H_i) P(H_i).$$

That is, the posterior probability is proportional to the prior probability times the likelihood, where the proportionality constant is uniquely determined by the requirement that the posterior probabilities sum to $1$.

Intuitively: after seeing some evidence, your confidence in a hypothesis gets multiplied by how well the hypothesis predicted the evidence, then normalized. Now you can take your posteriors to be your new priors in preparation for seeing some more evidence. This is a **Bayesian update**.
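As a concrete sketch of this update rule (the coin hypotheses and numbers here are invented for illustration), the “posterior $\propto$ prior $\times$ likelihood” step takes only a few lines of Python:

```python
def bayes_update(priors, likelihoods):
    """Posterior is proportional to prior times likelihood, normalized to sum to 1."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Two hypotheses about a coin: fair (P(heads) = 0.5) vs. biased (P(heads) = 0.9).
priors = [0.5, 0.5]
posterior = bayes_update(priors, [0.5, 0.9])     # we observe one head
posterior = bayes_update(posterior, [0.5, 0.9])  # then another head

# After two heads, the biased hypothesis is favored, and probabilities still sum to 1.
assert posterior[1] > posterior[0]
assert abs(sum(posterior) - 1) < 1e-12
```

The second call illustrates the last sentence above: yesterday's posterior is today's prior.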

**Aside: measures up to scale and improper priors**

This statement of Bayes’ theorem suggests a slight reformulation of what we mean by a probability measure: a probability measure is the same thing as a measure with nonzero total measure, up to scaling by positive reals. One reason to like this description is that it naturally incorporates improper priors, which correspond to prior probabilities with possibly infinite total measure, up to scaling by positive reals. The point is that after a Bayesian update an improper prior may become proper again. For example, there’s an improper prior assigning measure $1$ to every positive integer $n$, which allows us to talk about hypotheses indexed by the positive integers and with a prior which makes all of them equally likely.

Improper priors may seem obviously bad because they don’t assign probabilities to things: in order to assign a probability you need to normalize by the total measure, which is infinite. However, with an improper prior it is still meaningful to make comparisons between probabilities: you can still meaningfully say that one probability is larger than another, or larger by some exact factor, since these comparisons are invariant under scaling by positive reals.

There’s a somewhat philosophical argument that when performing Bayesian reasoning, only comparisons between probabilities are meaningful anyway: in order to know the probability, in the absolute sense, of something, you need to be absolutely sure you’ve written down every possible hypothesis (in order to ensure that exactly one of them is true). If you leave out the true hypothesis, then you might end up being more and more sure of an arbitrarily bad hypothesis because the true hypothesis wasn’t included in your calculations. In other words, computing the normalization constant $P(E)$ in the usual statement of Bayes’ theorem is “global” in that it requires information about all of the hypotheses $H_i$, but computing $P(E \mid H_i) P(H_i)$ is “local” in that it only involves one $H_i$ at a time.

(And it’s not enough just to have a few hypotheses and then a catch-all hypothesis called “everything else,” because “everything else” is not a hypothesis in the sense that it does not assign likelihoods $P(E \mid H)$. A hypothesis has to make predictions.)

**The setup**

Back to maximum entropy. Imagine that you are repeatedly rolling an $n$-sided die, and you don’t know what the various weights on the die are: that is, you don’t know the true probabilities that the $i^{th}$ face of the die will come up.

However, you have some hypotheses about these probabilities. Your hypotheses $H_{\theta}$ are parameterized by a parameter $\theta$, which for the sake of concreteness we’ll take to be a real number or a tuple of real numbers, but which could in principle be anything. Your hypotheses assign probability $q_i(\theta)$ to the $i^{th}$ face coming up. You also have some prior over your hypotheses, which we’ll write as a probability density function $\pi(\theta)$. Hence $\pi(\theta) \ge 0$ and

$$\int \pi(\theta) \, d\theta = 1$$

while the $q_i(\theta)$ are normalized so that

$$\sum_{i=1}^n q_i(\theta) = 1.$$

*Example.* If $n = 2$, we might imagine that we’re flipping a coin, with $q_1(\theta)$ the probability that we flip tails and $q_2(\theta)$ the probability that we flip heads. Our hypotheses might take the form $(q_1, q_2) = (1 - \theta, \theta)$ where $\theta \in [0, 1]$, and our prior might be the uniform prior: each $\theta$ is equally likely. Hence our probability density is $\pi(\theta) = 1$.

Now suppose you roll the die $N$ times. What happens to your beliefs under Bayesian updating in the limit $N \to \infty$?

**The principle of maximum relative entropy**

Suppose you see the $i^{th}$ face come up $k_i = N p_i$ times (so the $p_i$ are nonnegative and $\sum_i p_i = 1$; they describe observed relative frequencies of the various faces coming up, and altogether describe the empirical probability distribution). Hypothesis $H_{\theta}$ predicts that this happens with probability

$$P(E \mid H_{\theta}) = \binom{N}{k_1, k_2, \dots, k_n} \prod_{i=1}^n q_i(\theta)^{k_i}.$$

Let’s see how this function behaves as $N \to \infty$. Taking the log, and using Stirling’s approximation in the form

$$\log N! = N \log N - N + O(\log N)$$

we get

$$\log \binom{N}{k_1, \dots, k_n} = N \log N - N - \sum_{i=1}^n \left( k_i \log k_i - k_i \right) + O(\log N).$$

Various terms cancel here due to the fact that $\sum_i k_i = N$. At the end of the day we get

$$\log \binom{N}{k_1, \dots, k_n} = - N \sum_{i=1}^n p_i \log p_i + O(\log N) = N H(p) + O(\log N).$$

This is the first appearance of the function $H(p) = - \sum_i p_i \log p_i$, the **entropy** of $p = (p_1, \dots, p_n)$, regarded as a probability distribution over faces. This is perhaps the most concrete and least mysterious way of introducing entropy: it’s a concise way of summarizing the asymptotic behavior of the multinomial distribution as $N \to \infty$. Already we see that the entropy being larger corresponds to the counts being more likely, in a very precise way.
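This asymptotic can be sanity-checked numerically: `math.lgamma` gives exact log-factorials, so we can compare the log of the multinomial coefficient directly against $N H(p)$ (the tolerance below reflects the $O(\log N)$ error term; the counts are an arbitrary example):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def log_multinomial(ks):
    """Exact log of the multinomial coefficient N! / (k_1! ... k_n!)."""
    N = sum(ks)
    return math.lgamma(N + 1) - sum(math.lgamma(k + 1) for k in ks)

N = 10_000
ks = [5000, 3000, 2000]
p = [k / N for k in ks]

# log C(N; k_1, ..., k_n) = N*H(p) + O(log N)
assert abs(log_multinomial(ks) - N * entropy(p)) < 10 * math.log(N)
```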

But there’s a second term in the likelihood, so let’s compute the logarithm of that too. This gives

$$\log \prod_{i=1}^n q_i(\theta)^{k_i} = N \sum_{i=1}^n p_i \log q_i(\theta).$$

Thus the logarithm of the likelihood, or log-likelihood, is

$$\log P(E \mid H_{\theta}) = N \sum_{i=1}^n p_i \log \frac{q_i(\theta)}{p_i} + O(\log N).$$

Now the function $\sum_i p_i \log \frac{q_i}{p_i}$ that appears is the negative of the **Kullback-Leibler divergence** $D_{KL}(p \parallel q)$. We’ll call it the **relative entropy** (although this term is sometimes used for the KL divergence, not its negative) and denote it somewhat arbitrarily by $H(p, q)$.

Altogether, the posterior density is now proportional to

$$\pi(\theta) \, e^{N H(p, q(\theta)) + O(\log N)}.$$

From here it’s not hard to see that the posterior density is overwhelmingly concentrated, as $N \to \infty$, at the hypotheses that maximize the relative entropy $H(p, q(\theta))$, subject to the constraint that the prior density $\pi(\theta)$ is positive. This is because all the other posterior densities are exponentially smaller in comparison, and as long as the prior density is positive, it doesn’t matter what its exact value is, because it doesn’t grow or shrink with $N$ and hence is negligible in comparison to the main exponential term.

This calculation suggests that we can interpret the relative entropy $H(p, q)$ as a measure of how well the hypothesis $q$ fits the evidence $p$: the larger this number is, the better the fit. (A more common way to describe relative entropy is as a measure of how well a hypothesis $q$ fits the “truth” $p$. Here our model for being told that $p$ is the “truth” is seeing it as the empirical distribution asymptotically as $N \to \infty$.)

Let’s wrap that conclusion up into a theorem.

**Theorem:** With hypotheses as above, as $N \to \infty$, Bayesian updates converge towards believing the hypothesis $H_{\theta}$ that maximizes the relative entropy $H(p, q(\theta))$ subject to the constraint that the prior density $\pi(\theta)$ is positive.

Now, suppose the true probabilities are $p_i^{\ast}$. Then as $N \to \infty$ we expect, by the law of large numbers, that the observed frequencies $p_i$ approach the true probabilities $p_i^{\ast}$. If the true probabilities are among our hypotheses $q(\theta)$, we would hope, and it seems intuitively clear, that we’ll converge towards believing the true hypothesis. This requires showing the following.

**Theorem:** The relative entropy $H(p, q)$ is nonpositive, and for fixed $p$, it takes its maximum value $0$ iff $q = p$.

*Proof.* This is more or less a computation with Lagrange multipliers. For fixed $p$, we want to maximize

$$H(p, q) = \sum_{i=1}^n p_i \log \frac{q_i}{p_i}$$

subject to the constraint $\sum_i q_i = 1$. This constraint means that at a critical point of $H(p, q)$ (whether a maximum, a minimum, or a saddle point), all of the partial derivatives $\frac{\partial}{\partial q_i} H(p, q)$ should be equal. (Intuitively, we have a “budget” of probability to spend to increase the $q_i$, and as we spend more probability on one $q_i$ we necessarily must spend less probability on the others. The critical points are then the points where we can’t do any better by shifting our probability budget, meaning that the marginal value of each probability increase is equally good.)

We compute that

$$\frac{\partial}{\partial q_i} H(p, q) = \frac{p_i}{q_i}$$

so setting all partial derivatives equal we conclude that $q_i$ must be proportional to $p_i$, and the additional constraint $\sum_i q_i = 1$ gives $q_i = p_i$ for all $i$.

At this critical point $H(p, q)$ takes value $H(p, p) = 0$. Now we need to show that this critical point is a maximum and not a minimum. Since it’s the unique critical point, it suffices to show that it’s a local maximum. So, consider a point $q_i = p_i + \varepsilon_i$ in a small neighborhood of this critical point, where $\sum_i \varepsilon_i = 0$. To second order, we have

$$\log \frac{q_i}{p_i} = \log \left( 1 + \frac{\varepsilon_i}{p_i} \right) = \frac{\varepsilon_i}{p_i} - \frac{\varepsilon_i^2}{2 p_i^2} + O(\varepsilon_i^3)$$

and hence

$$H(p, q) = \sum_i p_i \log \frac{q_i}{p_i} = \sum_i \varepsilon_i - \sum_i \frac{\varepsilon_i^2}{2 p_i} + O(\varepsilon^3).$$

The linear term $\sum_i \varepsilon_i$ vanishes, and the quadratic term is negative definite as desired. (Strictly speaking we need to require that the $p_i$ are all positive, but if any of them happen to be zero then the corresponding value of $i$ can be safely ignored anyway, since it won’t figure in any of our computations.)

**Corollary:** With hypotheses as above, as $N \to \infty$, if the true hypothesis is among the hypotheses with positive prior density, Bayesian updates converge towards believing it. False hypotheses are disbelieved at an exponential rate, with base the exponential $e^{H(p, q)}$ of the relative entropy.

In other words, as $N \to \infty$, the Bayesian definition of probability converges to the frequentist definition of probability.

*Example.* Let’s return to the example $n = 2$, where we’re flipping a coin with an unknown bias $\theta$, so that $\theta$ and $1 - \theta$ are the probabilities of flipping heads and tails respectively given bias $\theta$, and our prior is uniform. Suppose that after $N$ trials we observe $h$ heads and $t$ tails, where $h + t = N$. Then

$$\pi(\theta \mid E) \propto \binom{N}{h} \theta^h (1 - \theta)^t \propto \theta^h (1 - \theta)^t.$$

(We can drop the binomial coefficient because it’s the same for all values of $\theta$ and so can be absorbed into our proportionality constant. We introduced it into the above computation because it becomes important later.)

This computation can be used to deduce the rule of succession, which asserts that at this point you should assign probability $\frac{h + 1}{N + 2}$ to heads coming up on the next coin flip. Note that as $N \to \infty$ this converges to the observed frequency $\frac{h}{N}$.

The posterior density can be written as

$$\pi(\theta \mid E) \propto e^{N \left( p \log \theta + (1 - p) \log (1 - \theta) \right)}, \quad p = \frac{h}{N}$$

which takes its maximum value when $\theta = \frac{h}{N}$ by our results above, although in this case one-variable calculus suffices to prove this. Near this maximum value, Taylor expanding the exponent around $\theta = p$ shows that, for values of $\theta$ sufficiently close to $p$, the posterior density is approximately a Gaussian centered at $p$ with standard deviation $\sqrt{\frac{p(1-p)}{N}} = O \left( \frac{1}{\sqrt{N}} \right)$. Hence in order to be confident that the true bias lies in an interval of size $\epsilon$ with high probability we need to look at about $\frac{1}{\epsilon^2}$ coin flips.

**Maximum entropy**

We were supposed to get a criterion in terms of maximizing entropy, not relative entropy. What happened to that?

Now instead of knowing the relative frequencies $p_i$, let’s assume that we only know that they satisfy some conditions. For example, for any function $f : \{1, 2, \dots, n\} \to V$, where $V$ is a finite-dimensional real vector space, we might know the expected value

$$\mathbb{E}(f) = \sum_{i=1}^n p_i f(i)$$

of $f$ with respect to the empirical probability distribution. In statistical mechanics a typical and important example is that we might know the average energy. We also might be observing a random walk, and while we don’t know how many steps the random walker took in a given direction (perhaps because they’re moving too fast for us to see), we might know where they ended up after $N$ steps, which tells us the average of all the steps the walker took.

(Strictly speaking, if we’re talking about the empirical distribution, where in particular each $p_i$ is necessarily rational, it’s too much to ask that any particular condition be exactly satisfied. We’d be happy to see that it’s asymptotically satisfied as $N \to \infty$, from which we’re concluding that our conditions are exactly satisfied for the true probabilities, or something like that. It seems there’s something subtle going on here and I am going to completely ignore it.)

Knowing the expected values of some functions is equivalent to knowing that the empirical distribution lies in some affine subspace of the probability simplex

$$\left\{ (p_1, \dots, p_n) : p_i \ge 0, \sum_{i=1}^n p_i = 1 \right\}.$$

However, more complicated constraints are possible. For example, suppose that $n = n_1 n_2$ and that we’re really rolling two independent dice with $n_1$ and $n_2$ sides, respectively, so that we can relabel the possible outcomes with pairs

$$(i, j), \quad 1 \le i \le n_1, \quad 1 \le j \le n_2.$$

Observing this is true means that, at least asymptotically as $N \to \infty$, we observe that we can write $p_{(i, j)} = r_i s_j$, where

$$r_i = \sum_{j=1}^{n_2} p_{(i, j)}$$

is the empirical probability that the first die comes up $i$, and similarly

$$s_j = \sum_{i=1}^{n_1} p_{(i, j)}$$

is the empirical probability that the second die comes up $j$. This is a nonlinear constraint: in fact it describes a collection of quadratic equations that the variables $p_{(i, j)}$ must satisfy. Imposing these equations turns out to be equivalent to imposing the simpler homogeneous quadratic equations

$$p_{(i_1, j_1)} p_{(i_2, j_2)} = p_{(i_1, j_2)} p_{(i_2, j_1)}$$

which we might recognize as the equations cutting out the image of the Segre embedding

$$\mathbb{P}^{n_1 - 1} \times \mathbb{P}^{n_2 - 1} \to \mathbb{P}^{n_1 n_2 - 1}.$$

The idea is to think of the probability simplex $\Delta^{n-1}$ as sitting inside projective space $\mathbb{P}^{n-1}$; then the restriction of the Segre embedding to probability simplices produces a map

$$\Delta^{n_1 - 1} \times \Delta^{n_2 - 1} \to \Delta^{n_1 n_2 - 1}$$

describing how a probability distribution over the first die and a probability distribution over the second die gives rise to a joint probability distribution over both of them. More complicated variations of this example are considered in algebraic statistics.

In any case, the game is that instead of knowing the empirical distribution $p$ we now only know some conditions it satisfies. Write the set of all distributions satisfying these conditions as $S$. What happens as $N \to \infty$? Hypothesis $H_{\theta}$ still predicts that empirical distribution $p$ occurs with probability

$$\binom{N}{k_1, \dots, k_n} \prod_{i=1}^n q_i(\theta)^{k_i}$$

and hence it predicts that we observe that our conditions are satisfied with probability

$$\sum_{p \in S} \binom{N}{k_1, \dots, k_n} \prod_{i=1}^n q_i(\theta)^{k_i}$$

(where $p \in S$ is shorthand for the event that the empirical distribution lies in $S$). Using our previous approximations, we can rewrite this as

$$\sum_{p \in S} e^{N H(p, q(\theta)) + O(\log N)}$$

which gives posterior densities

$$\pi(\theta \mid E) \propto \pi(\theta) \sum_{p \in S} e^{N H(p, q(\theta)) + O(\log N)}.$$

As before, we find that the posterior densities are overwhelmingly concentrated, as $N \to \infty$, at the hypotheses that maximize the relative entropy $H(p, q(\theta))$ (again subject to the constraint that $\pi(\theta) > 0$), but where $p$ is now allowed to run over all of $S$.

If our prior assigns nonzero density to every possible probability distribution in the probability simplex (for example, we could take $\theta$ to be parameterized by the points of the probability simplex, $q(\theta)$ to be the probability distribution corresponding to the point $\theta$, and $\pi(\theta)$ to be a constant, suitably normalized), then we know that relative entropy takes its maximum value $0$ when its two arguments are equal, so we can restrict our attention to the case that $q(\theta) = p$ for some $p \in S$ above, and we find that, asymptotically as $N \to \infty$, the posterior density is proportional to the prior density as long as $q(\theta)$ satisfies the conditions, and vanishes otherwise.

This is unsurprising: we assumed that all we were told about the empirical distribution is that it satisfied some conditions, so the only change we make to our prior is that we condition on that.

We still haven’t gotten a characterization in terms of entropy, as opposed to relative entropy. This is where we are going to invoke the principle of indifference, which in this situation asserts that the prior we should have is the one concentrated entirely at the hypothesis

$$q_i = \frac{1}{n}, \quad 1 \le i \le n$$

that the die rolls are being generated uniformly at random. Note that this means the posterior is *also* concentrated entirely at this hypothesis!

We now predict empirical probability distribution $p$ with probability distributed according to the multinomial distribution, namely

$$\binom{N}{k_1, \dots, k_n} \frac{1}{n^N} = \frac{1}{Z} e^{N H(p) + O(\log N)}$$

where $Z = n^N$ is now a normalization constant and can be ignored, and $H(p)$ is the entropy, rather than the relative entropy. This comes from substituting the uniform distribution $q_i = \frac{1}{n}$ into the relative entropy, which gives $H(p, q) = H(p) - \log n$.

We now want to ask a slightly different question than before. Before we were asking what our beliefs were about the underlying “true” probabilities generating the die rolls. Now we’ve already fixed those beliefs, and we’re instead going to ask what our beliefs are about the empirical distribution $p$, which we now no longer know, conditioned on the fact that $p \in S$. By Bayes’ theorem, this is

$$P(p \mid p \in S) \propto P(p \in S \mid p) P(p)$$

where $P(p \in S \mid p)$ is either $1$ if $p \in S$ or $0$ if $p \notin S$, and $P(p) = \frac{1}{Z} e^{N H(p) + O(\log N)}$ as before. Overall, we conclude the following.

**Theorem:** Starting from the indifference prior, as $N \to \infty$ Bayesian updates converge towards believing that the empirical distribution $p$ is the maximum entropy distribution in $S$.

In some sense this is not at all a deep statement: it’s just the observation that entropy describes the asymptotics of the multinomial distribution, together with conditioning on $p \in S$. Although it is somewhat interesting that conditioning on $p \in S$, in this setup, is done by seeing that $p$ appears to asymptotically lie in $S$ as $N \to \infty$.

**Edit: **This is essentially the Wallis derivation, but with a much larger emphasis placed on the choice of prior.

*Example.* Suppose $S$ consists of all probability distributions $p$ such that the expected value

$$\mathbb{E}(f) = \sum_{i=1}^n p_i f(i)$$

of some random variable $f$ (possibly vector-valued) is fixed; call this fixed value $\overline{f}$. Then we want to maximize $H(p)$ subject to this constraint and the constraint $\sum_i p_i = 1$. This is again a Lagrange multiplier problem. We’ll introduce a vector-valued Lagrange multiplier $\lambda$ for the first constraint, as well as a scalar Lagrange multiplier $\mu$ for the constraint $\sum_i p_i = 1$ that will later disappear from the calculation. (The choice of signs below will slightly simplify the calculation.)

Then the method of Lagrange multipliers says that any maximum must be a critical point of the function

$$H(p) - \lambda \cdot \sum_{i=1}^n p_i f(i) - \mu \sum_{i=1}^n p_i$$

for some value of $\lambda$ and $\mu$. (Here we are hiding the constant terms $\lambda \cdot \overline{f}$ and $\mu$, which don’t affect the critical points.) Using the fact that

$$\frac{\partial}{\partial p_i} H(p) = - \log p_i - 1$$

we compute that

$$\frac{\partial}{\partial p_i} \left( H(p) - \lambda \cdot \sum_j p_j f(j) - \mu \sum_j p_j \right) = - \log p_i - 1 - \lambda \cdot f(i) - \mu$$

and setting these partial derivatives equal to $0$ gives

$$p_i = \frac{1}{Z} e^{- \lambda \cdot f(i)}$$

where $Z = e^{1 + \mu}$. But since the $p_i$ must sum to $1$, $Z$ is a normalization constant determined by this condition, and in fact must be the **partition function**

$$Z = \sum_{i=1}^n e^{- \lambda \cdot f(i)}.$$

From here we can compute that the expected value of $f$ is

$$\mathbb{E}(f) = \frac{1}{Z} \sum_{i=1}^n f(i) e^{- \lambda \cdot f(i)} = - \frac{\partial}{\partial \lambda} \log Z$$

and the entropy is

$$H(p) = \log Z + \lambda \cdot \mathbb{E}(f).$$
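This Lagrange-multiplier solution can be sketched numerically: given values $f(i)$ and a target mean, solve for the multiplier by bisection (the “energies” and target below are made-up examples), then check that the resulting Gibbs distribution hits the constraint and satisfies the entropy identity just derived:

```python
import math

def gibbs(fs, lam):
    """Maximum-entropy distribution p_i proportional to exp(-lam * f_i), plus its partition function Z."""
    w = [math.exp(-lam * f) for f in fs]
    Z = sum(w)
    return [wi / Z for wi in w], Z

def solve_lambda(fs, target, lo=-50.0, hi=50.0):
    """Bisection for the multiplier: E[f] is decreasing in lam."""
    for _ in range(200):
        mid = (lo + hi) / 2
        p, _ = gibbs(fs, mid)
        mean = sum(pi * fi for pi, fi in zip(p, fs))
        lo, hi = (lo, mid) if mean < target else (mid, hi)
    return (lo + hi) / 2

fs = [0.0, 1.0, 2.0, 3.0]           # "energies" of a 4-sided die
lam = solve_lambda(fs, target=1.0)  # constrain the average energy to 1
p, Z = gibbs(fs, lam)

mean = sum(pi * fi for pi, fi in zip(p, fs))
assert abs(mean - 1.0) < 1e-6
# Entropy identity for a Gibbs distribution: H = log Z + lam * E[f].
H = -sum(pi * math.log(pi) for pi in p)
assert abs(H - (math.log(Z) + lam * mean)) < 1e-6
```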

In statistical mechanics, a “die” is a statistical-mechanical system, and $f$ is a vector of variables such as energy and particle number describing that system. $\lambda$, the Lagrange multiplier, is a vector of conjugate variables such as (inverse) temperature and chemical potential. The probability distribution we’ve just described is the canonical ensemble if $f$ consists only of the energy and the grand canonical ensemble if $f$ consists of the energy and particle numbers.

The uniform prior we assumed using the principle of indifference, possibly after conditioning on a fixed value of the energy (rather than a fixed expected value), is the microcanonical ensemble. The assumption that this is a reasonable prior is called the fundamental postulate of statistical mechanics. As Terence Tao explains here, in a suitable finite toy model involving Markov chains at equilibrium it can be proven rigorously, but in more complicated settings the fundamental postulate is harder to justify, and of course in some settings it will just be wrong. In these settings we can instead use the principle of maximum relative entropy.


It’s also common to think of monads as generalized monoids; previously we discussed why this was a reasonable thing to do.

Today we’ll discuss a different intuition: monads are (loosely) categorifications of idempotents.

**Conventions**

As previously, in this post compositions will be done in diagrammatic order, so if $f : a \to b$ and $g : b \to c$ are two morphisms, their composite will be denoted $fg : a \to c$, or sometimes $f ; g$ (which is independent of, but strongly suggests, the diagrammatic order).

This will end up switching the role of “left” and “right” in a few statements relative to the usual order of composition. For example, it will switch which adjoint goes first when constructing a monad out of an adjunction, so be careful when matching up statements in this post to statements elsewhere. But as we’ll see this convention has nice properties.

It will also force us to use the following curious-looking notation: if $T$ is a functor and $c$ is an object, we’ll denote the value of $T$ on $c$ by $c^T$ (thinking of $c$ as a morphism out of the terminal category). Read this in your head the same way you would read “c squared.”

Several different kinds of composition throughout this post will be denoted by concatenation, and it should hopefully be clear from the types of the objects involved what kind of composition is meant. For example, if is a 2-morphism and is a 1-morphism such that the compositions are defined, then denotes the vertical composition of with the identity .

**Monads in a 2-category**

Like the definition of adjoints, the definition of monads is purely 2-categorically equational, so makes sense in any 2-category and is preserved by any 2-functor, and we’ll introduce it at this level of generality.

In short, a monad on an object $a$ in a 2-category is a monoid in the monoidal category of endomorphisms of $a$. More explicitly, a **monad** is a 1-morphism $T : a \to a$ from an object to itself together with a **unit** 2-morphism $\eta : 1_a \to T$ and a **multiplication** 2-morphism $\mu : T^2 \to T$ satisfying the following compatibilities. The first compatibility (**associativity**) says that the two ways of using $\mu$ to write down a 2-morphism $T^3 \to T$ agree; explicitly,

$$(T \mu) \mu = (\mu T) \mu : T^3 \to T.$$

The second compatibility (**unit**) says that

$$(T \eta) \mu = (\eta T) \mu = 1_T : T \to T.$$

Dually, a **comonad** is a monad in the 1-opposite 2-category (reversing the order of composition of 1-morphisms): it still starts out as a 1-morphism $T : a \to a$ but has a counit $\varepsilon : T \to 1_a$ and a comultiplication $\delta : T \to T^2$.

Our favorite way of producing monads and comonads will be via adjunctions, as follows. Let $F \dashv G$ be an adjunction, so $F : a \to b$ is the left adjoint and $G : b \to a$ is the right. Let

$$\eta : 1_a \to FG$$

denote the unit of the adjunction, and let

$$\varepsilon : GF \to 1_b$$

denote the counit. (This is a place where diagrammatic order matters.)

**Proposition:** $T = FG$ is a monad on $a$, with unit the unit $\eta$ of the adjunction and multiplication the map

$$FGFG \xrightarrow{F \varepsilon G} FG.$$

Dually, $GF$ is a comonad on $b$, with counit the counit of the adjunction and comultiplication derived from the unit.

This can be thought of as a categorification of the usual method of producing an idempotent from a section-retraction pair, namely a pair of morphisms $s : a \to b$ and $r : b \to a$ such that $sr = 1_a$. (This is another place where diagrammatic order matters. $r$ is the retraction and $s$ is the section.) This condition means that $e = rs : b \to b$ satisfies $e^2 = r(sr)s = rs = e$, which categorifies to the above.

*Example.* Let be the one-object 2-category corresponding to a monoidal category . Then a monad in is just a monoid in , and a comonad in is a comonoid in . (Working backwards, if monads and comonads in a 2-category categorify idempotent morphisms in a category, then monoids and comonoids in a monoidal category categorify idempotent elements of a monoid.)

An adjoint pair in is a dual pair of objects of (here indicates the right dual of ). So the construction above specializes to the observation that the tensor product has a natural monoid structure, which should be familiar from the case of vector spaces, where it is a matrix algebra. (What might be less familiar is the dual statement that has a natural comonoid structure.)

More generally, if is a closed symmetric monoidal category with internal hom , then

so by the Yoneda lemma we conclude that for a dualizable object , the tensor product is canonically isomorphic to the internal endomorphism object (and a little more work shows that this is even an isomorphism of monoid objects).

In general, we get that is a monoid in which naturally acts on from the left, and on from the right. Furthermore,

where denotes the monoidal unit, so the “points” of (the result of applying ) are endomorphisms of .

*Example.* Let $\mathcal{P}$ be the 2-category of posets (with monotone maps as 1-morphisms and pointwise inequalities as 2-morphisms). Then a monad in $\mathcal{P}$ is a closure operator $m : P \to P$ on a poset $P$.

Explicitly, $m$ must first of all be a morphism of posets, so $x \le y$ implies $m(x) \le m(y)$. Next, the unit becomes the condition $x \le m(x)$, and finally, multiplication becomes the condition $m(m(x)) \le m(x)$. Since $x \le m(x)$ implies $m(x) \le m(m(x))$, we have $m(m(x)) = m(x)$. (So all monads on posets are genuinely idempotents.)

The construction of monads from adjunctions specializes here to the construction of closure operators from adjunctions between posets, also known as Galois connections. This reproduces various familiar closure operators in mathematics, such as Zariski closure.
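As a small computational sketch of a closure operator arising from a Galois connection (the relation below is an arbitrary example in the spirit of formal concept analysis, not anything from the post): a binary relation $R \subseteq A \times B$ induces two antitone maps between powersets, and their composite is a closure operator:

```python
# A toy Galois connection from a binary relation R between A and B.
A = {1, 2, 3, 4}
B = {"a", "b", "c"}
R = {(1, "a"), (1, "b"), (2, "b"), (3, "b"), (3, "c"), (4, "c")}

def up(X):
    """Elements of B related to everything in X."""
    return {b for b in B if all((x, b) in R for x in X)}

def down(Y):
    """Elements of A related to everything in Y."""
    return {a for a in A if all((a, y) in R for y in Y)}

def closure(X):
    """The composite of the two antitone maps is a closure operator on subsets of A."""
    return down(up(X))

for X in [set(), {1}, {2, 3}, {1, 4}]:
    cX = closure(X)
    assert X <= cX                # unit: X <= m(X)
    assert closure(cX) == cX      # idempotence: m(m(X)) = m(X)

# Monotone: X subset of Y implies closure(X) subset of closure(Y).
assert closure({1}) <= closure({1, 2})
```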

*Example.* Let be the Morita 2-category of algebras, bimodules, and bimodule homomorphisms over a commutative ring . Then a monad in is an algebra object in -bimodules over , where is some -algebra; we’ll call this an -algebra for short, although note that even if is commutative it doesn’t reduce to the usual notion of algebra over a commutative ring.

An adjoint pair in is, as we saw previously, a pair consisting of an -bimodule which is f.g. projective as a right -module (the left adjoint) and its -linear dual regarded as a -bimodule (the right adjoint). The notation is meant to again evoke the special case of vector spaces, which we recover when is a field. The corresponding monad is the -algebra , which can be thought of as the algebra of -linear endomorphisms of (acting on the left) or of (acting on the right).

**Left and right modules**

Just like monoids, monads have modules over them. A **right module** over a monad is an object , a 1-morphism , and an action 2-morphism

satisfying the associativity condition

and the unit condition

.

Dually, a **left module** is an object , a 1-morphism , and an action 2-morphism satisfying the obvious duals of the above conditions.

Our favorite way of producing modules is again via adjunctions. If is an adjunction giving rise to , then the left adjoint is naturally a left module over , and dually the right adjoint is naturally a right module over . (If we had stuck to compositions in the usual rather than diagrammatic order this would be reversed.) In fact the two together form a kind of bimodule over .

Classically, in , what is usually called a **module** or **algebra** for a monad is an object together with an action map satisfying the same axioms as above. This is a special kind of right module where is the terminal category. More generally, in a right module can be interpreted as a family of -algebras parameterized by . It’s less clear what a left module is.

Thinking of monads as idempotents , right modules categorify morphisms such that , or equivalently that equalize and , and dually left modules categorify morphisms such that , or equivalently that coequalize and . These are used to state the universal property of the equalizer and coequalizer of and respectively, which can be thought of as the object of invariants (fixed points) or the object of coinvariants (orbits), respectively, under the action of , and (because is an idempotent) which are canonically isomorphic.

This categorifies as follows. The categorification of invariants is the **Eilenberg-Moore object** of a monad . This is, if it exists, the universal right module over : that is, it is equipped with a 1-morphism and an action 2-morphism making it a right module, and any other right module uniquely factors through it. Said another way, right module structures on an object are equivalent to 1-morphisms .

Dually, the categorification of coinvariants is the **Kleisli object** . This is, if it exists, the universal left module over : that is, it is equipped with a 1-morphism and an action 2-map making it a left module, and any other left module uniquely factors through it. Said another way, left module structures on an object are equivalent to 1-morphisms .

*Example.* In , the Eilenberg-Moore category of a monad on a category turns out to be the category of -algebras (categorifying how the invariants of an idempotent endomorphism of a set is the set of its fixed points). The right module structure on has 1-morphism the forgetful functor given by forgetting the -algebra structure and action 2-morphism the natural transformation whose components are given by the action maps

of the -algebras .

The universal property of says that a right module structure on a category is a functor , or in other words a family of -algebras parameterized by , which at least makes sense at the level of objects.

The Kleisli category turns out to have the same objects as , but where a morphism from to is a **Kleisli morphism**, namely a morphism . Composition is as follows: if and are two Kleisli morphisms, then their composite is

.
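This composition rule is easy to sanity-check with the list monad in Python (the names `eta`, `mu`, `fmap`, and `kleisli_compose` are my own illustrative choices, not notation from the post): the composite of Kleisli morphisms is obtained by applying the first, applying the functor to the second, then flattening with the multiplication.

```python
# A minimal sketch of Kleisli composition for the list monad.

def eta(x):
    """Unit of the list monad: X -> T X."""
    return [x]

def mu(xss):
    """Multiplication of the list monad: T T X -> T X (flatten)."""
    return [x for xs in xss for x in xs]

def fmap(f, xs):
    """Functor action of T on a morphism f."""
    return [f(x) for x in xs]

def kleisli_compose(g, f):
    """Composite of Kleisli morphisms f: X -> T Y and g: Y -> T Z,
    given by applying f, then T(g), then mu."""
    return lambda x: mu(fmap(g, f(x)))

# Two example Kleisli morphisms on the integers:
divisors = lambda n: [d for d in range(1, n + 1) if n % d == 0]
pred_succ = lambda n: [n - 1, n + 1]

h = kleisli_compose(pred_succ, divisors)
print(h(6))  # divisors of 6 are [1, 2, 3, 6]; each then maps to its neighbors
# -> [0, 2, 1, 3, 2, 4, 5, 7]
```

Note that `eta` is a two-sided identity for this composition, matching the description of the inclusion functor into the Kleisli category below.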

The left module structure on has 1-morphism the functor which is the identity on objects and which sends an ordinary morphism to the Kleisli morphism

.

Its action 2-morphism is the natural transformation whose components are the identity , regarded as a Kleisli morphism from to .

The universal property of says that a left module structure on a category is a functor . It’s less clear to me what this means.

**Adjunctions from monads**

A **splitting** of an idempotent is a pair of morphisms (a section-retraction pair) such that and ; we say that **splits** (or, in the terminology we used earlier, is a **split idempotent**) if it admits a splitting. Under very mild hypotheses (e.g. the existence of either equalizers or coequalizers), every idempotent admits a unique (up to unique isomorphism) splitting, where is simultaneously both the object of invariants and the object of coinvariants of . To exhibit this isomorphism we start by writing down a map from coinvariants to invariants (when they both exist), and this map exists because an idempotent both equalizes and coequalizes itself.

This categorifies as follows. (All of the terminology I’m about to introduce is nonstandard.) A **splitting** of a monad is a pair of adjoint 1-morphisms together with an isomorphism of monads; we say that **splits** (or is a **split monad**) if it admits a splitting.

*Example.* As above, let be the one-object 2-category corresponding to a monoidal category . A monad in is a monoid in , and a monoid splits iff there is a right dualizable object such that as monoids.

Specializing to the case that is the symmetric monoidal category of modules over a commutative ring , a monad in is a -algebra , and an algebra splits iff there is a f.g. projective -module such that as -algebras, or equivalently iff

.

Such a need neither exist nor be unique. Lack of existence is clear; for example, when is a field the above condition says that is a matrix algebra, and there are plenty of non-matrix algebras. To see lack of uniqueness we can observe that if then we can take to be any invertible -module, so need not be unique if the Picard group is nontrivial.

(There’s something interesting to say here even when is a field. It’s possible for a -algebra not to split over but to split over a finite extension of ; such algebras correspond to nontrivial classes in the Brauer group . Incidentally, in this area there’s an existing definition of “split,” and it’s a happy accident as far as I can tell that the two uses coincide in this special case.)

We’ve learned that in general, splittings of monads neither exist nor are unique. However, it’s true that the Eilenberg-Moore object and Kleisli object (when they exist) both give rise to splittings, as follows. In particular, all monads in split (so arise from adjunctions).

A monad is naturally both a left and a right module over itself. Because is a right module over itself, admits a factorization

through the universal right module . The pair of 1-morphisms are in fact adjoint: the unit of the adjunction comes from the unit of , while the counit comes from the action 2-morphism , as follows. Part of the definition of a right module implies that the action 2-morphism is in fact a morphism of right -modules, and so by the universal property of the Eilenberg-Moore object it factors through , giving a 2-morphism which is our counit.

The verification of the zigzag identities isn’t hard but is annoying without good notation, so we’ll omit it. One of them follows from the unit condition for the right -module structure on , and the other one follows from the unit condition for the monad structure on , together with the same factoring-through- argument as above.

Hence whenever the Eilenberg-Moore object exists, it exhibits a splitting of . Dually (by reversing 1-morphisms), whenever the Kleisli object exists, it also exhibits a splitting of . (Hence Eilenberg-Moore and Kleisli objects can’t always exist in 2-categories with monads that aren’t split.)

But we can say more than this. The left adjoint now admits a natural left -module structure (since it’s the left adjoint of an adjunction that splits ), so by the universal property of the Kleisli object (if it exists), we get a further factorization

of . If our situation were exactly analogous to the case of idempotents, the middle morphism would be an equivalence. This isn’t true, but it’s very close, and it’s true if in addition is an **idempotent monad** (meaning that the multiplication 2-morphism is an isomorphism). In these arise from reflective subcategories, for example the adjunction between and .

*Example.* In the case of , let be a monad on a category . Then, as we saw above, the Eilenberg-Moore category is the category of -algebras, so objects equipped with action maps satisfying unit and associativity conditions. There’s an obvious forgetful functor given by forgetting the action map, and its left adjoint acts on objects as

.

This exhibits as the **free -algebra** on (categorifying how, if is an idempotent endomorphism of a set, then for every , the element is a fixed point of ). The -algebra structure on comes from the right -module structure on itself; explicitly, the action map is

and unit and associativity for it reduce to unit and associativity for . The claim that is the free -algebra says concretely that if is an -algebra (with action map ), then we have a natural bijection

.

This bijection is, explicitly, the following. The natural map from the LHS to the RHS sends an -algebra morphism to the composite

The natural map from the RHS to the LHS sends a morphism to the composite

.

The verification that these two maps are inverses to each other is purely equational. In one direction (starting from an -algebra morphism ), we need the identity

which we can prove as follows. being an -algebra morphism means, by definition, that

.

The identity we need then becomes

which follows from the unit condition on .

In the other direction (starting from a morphism ), we need the identity

The naturality of means precisely that

so the identity we need becomes

which follows from the unit condition on the -algebra structure on .
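For the list monad this bijection can be checked concretely in a few lines of Python. Here the algebra is the integers with action given by summation (a monoid regarded as a list-algebra); the names `extend` and `restrict` are my own labels for the two directions of the bijection.

```python
# The free-algebra bijection for the list monad: algebra morphisms
# T X -> A correspond to plain morphisms X -> A.

def eta(x):
    """Unit of the list monad: X -> T X."""
    return [x]

def extend(f):
    """Send f: X -> A to the induced algebra morphism T X -> A,
    i.e. act on A after applying T(f); here the action is summation."""
    return lambda xs: sum(f(x) for x in xs)

def restrict(phi):
    """Send an algebra morphism phi: T X -> A to its restriction
    along the unit, phi . eta: X -> A."""
    return lambda x: phi(eta(x))

f = lambda s: len(s)        # f: X -> A on strings
phi = extend(f)             # the induced algebra morphism on lists of strings
print(phi(["ab", "cde"]))   # len("ab") + len("cde") = 5
print(restrict(phi)("ab"))  # recovers f: 2
```

The equational verification in the post corresponds to checking that `restrict(extend(f))` agrees with `f` and that `extend(restrict(phi))` agrees with `phi` on lists.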

According to our abstract argument above, the left adjoint of the forgetful functor from the Eilenberg-Moore category should naturally factor through the Kleisli category . This factorization can be described as follows. There is a natural functor

sending an object to the free -algebra , and sending a Kleisli morphism to the composite

.

The free -algebra functor factors through this functor in the obvious way. In fact more is true: because we know that is the free -algebra on , we have

from which it follows that in fact the functor from the Kleisli category to the Eilenberg-Moore category is fully faithful: it exhibits the Kleisli category as the full subcategory of free -algebras among all -algebras.

*Subexample.* If is a poset, so that is a closure operator, then both the Eilenberg-Moore and Kleisli posets can be identified with the poset of -closed elements of (those elements such that ), and the natural map is an equivalence. This reflects the fact that closure operators are idempotent, so every closed element is a “free” closed element .
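This subexample is easy to play with directly. Here is a sketch in Python of one such closure operator (my own choice for illustration: sending a set of positive integers to the set of all divisors of its elements, a monotone, extensive, idempotent operator on the poset of subsets), verifying that closed elements are exactly its fixed points.

```python
# A closure operator on subsets of the positive integers:
# cl(S) = all divisors of elements of S.

def cl(S):
    return frozenset(d for n in S for d in range(1, n + 1) if n % d == 0)

S = frozenset({6, 10})
print(sorted(cl(S)))        # divisors of 6 and 10: [1, 2, 3, 5, 6, 10]
assert S <= cl(S)           # extensive
assert cl(cl(S)) == cl(S)   # idempotent: every closed element is "free"
```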


Below we work throughout over a field of characteristic zero.

For starters, the universal enveloping algebra functor , which a priori takes values in algebras (it’s left adjoint to the forgetful functor from algebras to Lie algebras), in fact takes values in Hopf algebras. This upgraded functor continues to be a left adjoint, although the forgetful functor is less obvious. Given a Hopf algebra , its primitive elements are those elements satisfying

where is the comultiplication. The primitive elements of a Hopf algebra form a Lie algebra, and this gives a forgetful functor from Hopf algebras to Lie algebras whose left adjoint is the upgraded universal enveloping algebra functor.

The key observation is that this upgraded functor is fully faithful; that is, there is a natural bijection between Lie algebra homomorphisms and Hopf algebra homomorphisms . This is more or less equivalent to the claim that the natural inclusion induces an isomorphism from to the Lie algebra of primitive elements of , which can be proven using the PBW theorem.

Hence Lie algebras embed as a full subcategory of Hopf algebras; that is, they can be thought of as Hopf algebras satisfying certain properties, rather than having extra structure (in the nLab sense). What are these properties? For starters, they are all cocommutative. This is important because cocommutative Hopf algebras are group objects in the category of cocommutative coalgebras (this is *not* true with “cocommutative” dropped!), which in turn can be interpreted as a category of infinitesimal spaces. (For example, this category is cartesian closed, and in particular distributive.)

Hence Lie algebras are group objects in cocommutative coalgebras satisfying some property (for example, “conilpotence”; see Theorem 3.8.1 here).


The pictures in this post can be interpreted in at least two ways. On the one hand, they are graphs of groups in the sense of Bass-Serre theory, and on the other hand, they are also dessin d’enfants (for the rest of this post abbreviated to “dessins”) in the sense of Grothendieck. But you don’t need to know that to draw and appreciate them.

**How to draw a group**

Using the fact that is a free product, we can think of the classifying space as the wedge sum of the classifying spaces and . These spaces are the homotopy quotients of points (which we give two different names to make the diagrams we’re about to draw easier to parse) by the trivial actions of and . Intuitively, they can be thought of as “half a point” and “a third of a point,” respectively. We’re going to draw as their wedge sum using the following dessin (using QuickLaTeX, because WordPress doesn’t support xymatrix):

The reason we want to draw is that we’re going to think of finite index subgroups as finite covers , and we’ll be drawing covers of this picture in a way closely analogous to how we can draw finite index subgroups of free groups by drawing covers of graphs.

What do covering spaces of this thing look like? The preimage of the edge will be a disjoint union of edges, so we expect to see something like a graph. The interesting question is what happens locally around the “orbifold points” . The covering space theory of these spaces is very simple: they both admit a single nontrivial connected cover, namely points (a double and a triple cover, respectively), and every covering space is a disjoint union of some copies of these nontrivial covers and the trivial cover. (This reflects the corresponding decomposition of -sets resp. -sets into transitive components.)

What’s more interesting is how the edge and the orbifold points interact. When gets covered by in some cover, the portion of the edge adjacent to it is doubled; similarly, when gets covered by in some cover, the portion of the edge adjacent to it is tripled and acquires a cyclic ordering (which describes how acts on the fiber over a basepoint in ).

For example, here is the unique connected double cover of .

The orbifold point has “unfolded” itself into an ordinary point , in the process doubling the edge, but the orbifold point has no interesting double covers, so it can’t “unfold” yet.

We can extract a lot of information from this dessin. Recall that in order to get a subgroup of of a space from a cover of it, we need to pick both a basepoint and a lift of that basepoint to the cover. Here it’s easiest to pick a basepoint in the middle of the edge, where there’s no funny orbifold business going on, and so a lift of the basepoint corresponds to an edge in the cover. The conjugates of the subgroup we get come from picking different edges, and two edges give the same subgroup iff they are related by an automorphism of the cover. (Moreover, “automorphism of the cover” means more or less the obvious thing in terms of dessins; we’ll be more precise about this later.) In particular, we can see visually when a subgroup is normal: it’s iff the automorphism group of the corresponding dessin acts transitively on edges, which is the case here. More generally, the number of subgroups conjugate to a given subgroup is given by the number of orbits of the action of the automorphism group on the edges of its dessin.

After picking an edge, the corresponding subgroup of is given by the stabilizer of this edge with respect to the action of on edges given as follows: if we write as a free product

then sends an edge which is connected to an “unfolded” white point to the other edge connected to (and otherwise fixes the edge), and sends an edge which is connected to an “unfolded” black point to the next edge in the cyclic order (and otherwise fixes the edge). We can find a set of generators for the stabilizer by finding a spanning tree of the dessin (which is unnecessary here) and looking at the corresponding loops we get in the same way as in the case of graphs, while remembering that there are extra loops coming from and .

More topologically, this procedure computes the (“orbifold”) fundamental group of the dessin, regarded as a picture of a space (“1-dimensional orbifold”) obtained by gluing copies of and to some of the vertices of a graph, via iterated application of Seifert-van Kampen.

(This argument, cleaned up and made fully rigorous, shows that every subgroup of , not necessarily finite index, is a free product of copies of , and . For a generalization of this result to arbitrary free products, which you should be able to guess from here, see the Kurosh subgroup theorem.)

From this description we see that the unique subgroup of index in (which can be described as the kernel of the natural map killing the second factor) is the free product

.

Now let’s move on to index . Here there is an obvious triple cover, namely

given by “unfolding” but covering trivially. (We need to pick a cyclic ordering of these three edges, but it doesn’t matter too much because the resulting dessins are isomorphic.) The corresponding subgroup is normal (as we can see because the automorphism group of this dessin acts transitively on edges); in fact it is the kernel of the natural map killing the first factor. This dessin shows that it is the free product

.

But we know that there are subgroups of index . The other are conjugate and come from the same dessin, namely

where has now also been “unfolded” once. The cyclic ordering on the edges around (which we haven’t drawn) matters now: without it this dessin would have a nontrivial automorphism exchanging the two edges around .

This is the first example we’ve looked at where the underlying graph is not a tree, and accordingly we get our first factor in the free product decomposition: this subgroup (we’ll pick our basepoint to be the odd edge out) is the free product

.

We get the free generator from the loop in the dessin as follows: takes us from our basepoint to (depending on the cyclic ordering) one of the other two edges around . Going around the loop involves passing through , which corresponds to acting on one of the edges around using to get the other one. Finally, going back to our basepoint involves passing through , which corresponds to using again.

As we’ll see later, this subgroup is in fact (conjugate to) the smallest congruence subgroup .

Let’s move faster now. The subgroups of index form conjugacy classes of size coming from the following dessins, which have trivial automorphism group:

The dessins tell us that these subgroups are free products and respectively. The second one is (conjugate to) the second smallest congruence subgroup .

The subgroups of index form a single conjugacy class coming from the following dessin, which again has trivial automorphism group:

This dessin tells us that these subgroups are free products .

Index is when things start getting exciting because we can now have two trivalent vertices : among other things, we see our first appearance of subgroups which are torsion-free (and hence free). The dessin

has both and fully “unfolded.” Abstractly the corresponding subgroups are free products (so free groups on two generators).

There are actually two dessins here depending on whether or not the cyclic orderings around the two copies of match, and they are not the same, although they both have transitive automorphism group and so correspond to normal subgroups. One of them corresponds to the kernel of the natural map

while the other corresponds to the kernel of the natural map

and hence to the congruence subgroup .

But there are also torsion-free subgroups of index that are not normal, coming from the dessin

which does not have transitive automorphism group. There are orbits of edges under the action of the automorphism group (which is , coming from the obvious reflection), so we get conjugate subgroups, one of which is the congruence subgroup .

The congruence subgroup also has index , and its dessin is

so in particular it is not free: instead, it is the free product . This dessin has trivial automorphism group, so we get conjugates.

There are dessins left for index :

The first dessin here is actually dessins depending on whether the cyclic orderings match again. Altogether we find that there are conjugacy classes of subgroups of of index , with sizes (in order based on the order in which we drew the dessins)

in agreement with our previous count. This means that the sequence of conjugacy classes of subgroups of index in begins

which we can look up on the OEIS, getting the sequence A121350.

**Dessins as graphs**

The graphs appearing here as dessins are bipartite graphs ( vertices can only be connected to vertices) where vertices have degree or and vertices have degree or . In addition, the vertices of degree are equipped with a cyclic ordering on their edges.

These dessins can be thought of as built up of copies of a single “puzzle piece,” namely an edge with “half a point” on one side and a “third of a point” on the other (perhaps a half and a third of a sphere, with appropriate connecting bits). The name of the game is that you take of these pieces and combine them by combining two half-points to make a whole point and three third-points to make a whole point . You can even encode the cyclic orders around the points by arranging the connecting bits so that three pieces can only connect in one cyclic order, and one of the connecting bits has an arrow pointing to the other. I wonder if someone would be interested in manufacturing these…

**Euler characteristic**

It’s not hard to see that torsion-free subgroups can only exist when the index is divisible by . This is because, thinking in terms of the corresponding transitive -sets , torsion-freeness is equivalent to both the generator of order and the generator of order acting with no fixed points. The first condition means is divisible by , while the second means is divisible by .
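This counting argument is easy to check on a concrete transitive action. In the Python sketch below, the two permutations are my own illustrative choice of fixed-point-free generators of orders 2 and 3 acting on 6 points.

```python
# If the order-2 and order-3 generators of a free product Z/2 * Z/3
# act on an n-element set with no fixed points, then n is divisible
# by 2 and by 3, hence by 6.

a = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}   # fixed-point-free involution
b = {0: 2, 2: 4, 4: 0, 1: 3, 3: 5, 5: 1}   # fixed-point-free order-3 permutation

def order(p):
    """Order of a permutation given as a dict."""
    n, q = 1, dict(p)
    while any(q[x] != x for x in q):
        q = {x: p[q[x]] for x in q}
        n += 1
    return n

assert order(a) == 2 and order(b) == 3
assert all(a[x] != x for x in a) and all(b[x] != x for x in b)
# a pairs the points up into 2-cycles (so n is even), and b partitions
# them into 3-cycles (so n is divisible by 3)
print(len(a))  # 6
```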

A more geometric way to see this is using the notion of virtual or **orbifold Euler characteristic**, which in some sense generalizes both the usual Euler characteristic and groupoid cardinality. We will only compute this number for (classifying spaces of) finitely generated groups which are virtually free, meaning that they contain a free subgroup of finite index (necessarily also finitely generated), which is the case for all finite index subgroups of . In this case the Euler characteristic is determined by the following two axioms:

- , and
- If is a subgroup of of index , then .

Note that is the Euler characteristic in the usual sense of , which can be described as a wedge of circles. The second axiom is motivated by the fact that the map is an -fold cover.

Since has a free subgroup on two generators of index , its Euler characteristic is

.

But the Euler characteristic axioms also imply that if is a finite group, then (since is a subgroup of of index , and , being the free group on generators, has Euler characteristic ), and we can arrive at this formula in another way assuming that inclusion-exclusion holds in a suitable form for this notion of Euler characteristic: namely, is obtained by connecting and by an edge, so if the usual inclusion-exclusion formula holds, then we should have

.

Assuming that Euler characteristic is well-defined, it now follows that a subgroup of of index has Euler characteristic , which is an integer iff is divisible by . Hence we learn not only that the free subgroups must have index but that they must be free on generators.

You can now go back through the examples above and verify that their Euler characteristics computed using either their index or inclusion-exclusion agree: for example, the unique subgroup of index has Euler characteristic

.
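The arithmetic in this section is short enough to verify with exact rational arithmetic (the variable names are my own):

```python
from fractions import Fraction

# chi of a finite group Z/n is 1/n, and inclusion-exclusion for the
# wedge of the two classifying spaces joined by an edge gives
# chi = 1/2 + 1/3 - 1 = -1/6 for the modular group.

chi_Z2, chi_Z3, chi_pt = Fraction(1, 2), Fraction(1, 3), Fraction(1, 1)
chi_psl2z = chi_Z2 + chi_Z3 - chi_pt
print(chi_psl2z)  # -1/6

# A subgroup of index n then has Euler characteristic -n/6, which is an
# integer iff 6 divides n:
for n in (2, 3, 6, 12):
    print(n, n * chi_psl2z)
# index 6 gives chi = -1, i.e. a free group on 2 generators (chi = 1 - rank)
```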

**Drawing congruence subgroups**

The dessins we’re drawing describe sets equipped with a transitive action of , namely the edges of the dessins, in the manner described above. For the congruence subgroups , the corresponding -set is given by the natural action of on itself, while for the congruence subgroups , the corresponding -set is given by the natural action of on the projective line .

may be a little unfamiliar if is not prime; for an arbitrary positive integer , it can be described as the quotient of the set of coprime pairs by the equivalence relation of multiplication by a unit in . Loosely speaking, these pairs can be thought of as fractions modulo , although if isn’t prime they have some funny and somewhat unexpected behavior. The size of , and hence the index of , is

and so we can compute that have indices respectively, and these are the only ones of index at most . The index of is the size of ; the only one of these groups of size at most is .
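Here is a brute-force check of the description of the projective line above, counting coprime pairs up to units and comparing against the multiplicative formula (the helper names are my own):

```python
from fractions import Fraction
from math import gcd

def proj_line(n):
    """Points of P^1(Z/n): pairs (x, y) with gcd(x, y, n) = 1,
    up to multiplication by a unit of Z/n."""
    units = [u for u in range(n) if gcd(u, n) == 1]
    pts = set()
    for x in range(n):
        for y in range(n):
            if gcd(gcd(x, y), n) == 1:
                # canonical representative: minimum over unit multiples
                pts.add(min((u * x % n, u * y % n) for u in units))
    return pts

def formula(n):
    """n * prod over primes p dividing n of (1 + 1/p)."""
    primes = {p for p in range(2, n + 1)
              if n % p == 0 and all(p % q for q in range(2, p))}
    val = Fraction(n)
    for p in primes:
        val *= Fraction(p + 1, p)
    return int(val)

for n in range(2, 8):
    assert len(proj_line(n)) == formula(n)
print([formula(n) for n in range(2, 8)])  # [3, 4, 6, 6, 12, 8]
```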

Here’s how to draw the dessin corresponding to . The generator of order can be taken to be the fractional linear transformation

acting on the elements of thought of as fractions as above, while the generator of order can be taken to be the fractional linear transformation

.

Since the action of is transitive, we can reach every element of by repeatedly applying each of these transformations to any particular point, say (by which we mean the ordered pair ). We can build up the dessin from here by repeatedly applying and and adding vertices and edges as appropriate.

For example, here is the dessin for drawn in this way, now with the edges labeled by elements of :

The implied cyclic order around the vertices, corresponding to the action of , is clockwise.
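This orbit-generation procedure can be sketched in Python. Below I take the order-2 generator to be the image of S = [[0, -1], [1, 0]] and the order-3 generator to be the image of ST = [[0, -1], [1, 1]]; this is one standard choice of generators of those orders, and an assumption on my part rather than something fixed by the post.

```python
from math import gcd

# Generating the edges of the dessin as the orbit of (1 : 0) in P^1(Z/5)
# under the order-2 and order-3 generators.

n = 5

def normalize(x, y):
    """Canonical representative of (x : y) in P^1(Z/n):
    minimum over unit multiples."""
    units = [u for u in range(n) if gcd(u, n) == 1]
    return min((u * x % n, u * y % n) for u in units)

S = lambda p: normalize(-p[1] % n, p[0])                 # order 2 on P^1
ST = lambda p: normalize(-p[1] % n, (p[0] + p[1]) % n)   # order 3 on P^1

orbit, frontier = {normalize(1, 0)}, [normalize(1, 0)]
while frontier:
    p = frontier.pop()
    for q in (S(p), ST(p)):
        if q not in orbit:
            orbit.add(q)
            frontier.append(q)

print(sorted(orbit))  # all 6 points of P^1(Z/5): the dessin has 6 edges
```

Transitivity of the action shows up as the orbit exhausting all of the projective line.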

None of what we’ve said requires the subgroups to have finite index, so here is a small slice of the dessin for the subgroup generated by translation , which corresponds to the action of on the projective line (since translation generates the subgroup fixing ). The dessins are all quotients of this one.

**Higher-dimensional pictures**

Dessins give a “1-dimensional” picture of the finite index subgroups of the modular group in terms of certain 1-dimensional orbifolds modeling the classifying spaces . It is also possible to give 2-dimensional, 3-dimensional, and 4-dimensional pictures.

- In 2 dimensions, we can look at the quotient . These quotients are generalizations of modular curves, and their points describe elliptic curves equipped with various extra data. Geometrically they are 2-dimensional orbifolds which become ordinary surfaces if is torsion-free. Remarkably, Belyi’s theorem implies that in this case, the corresponding Riemann surfaces arise from algebraic curves defined over , and that all algebraic curves over arise in this way.
- In 3 dimensions, we can look at the quotient . These quotients are the unit tangent bundles of the modular quotients above, and are always ordinary 3-manifolds with PSL geometry. itself is the complement of the trefoil knot in , so the quotients are finite covers of this complement. See this blog post by Bruce Bartlett for more on this.
- In 4 dimensions, we can look at the quotient , where denotes the preimage of in . (I won’t write out this action because I have very little to say about this construction.) These quotients are elliptic surfaces with base and fiber over a point the elliptic curve corresponding to that point. Elliptic surfaces arising in this way are called elliptic modular surfaces.

We can recover dessins as drawn above from the modular curves as follows (although these are not quite the dessins described in the Wikipedia article, which I believe come from finite index subgroups of rather than ). These curves come with a natural map , where is the usual modular curve parameterizing elliptic curves with no extra structure. There is a map

sending an elliptic curve (where ) to its j-invariant which is almost an isomorphism. However, the action of on is not free, so we shouldn’t just take the ordinary quotient: the moduli space of elliptic curves is really a stack, and we can think about this stacky structure in terms of two special “orbifold points” in the quotient .

One of these special points corresponds to the elliptic curve given by quotienting by the square lattice, which has an extra automorphism of order given by multiplication by , while the other corresponds to the elliptic curve ( a primitive sixth root of unity) given by quotienting by the hexagonal lattice, which has an extra automorphism of order given by multiplication by . These extra automorphisms are reflected in the fact that these points have nontrivial stabilizers under the action of . Namely, the generator of order stabilizes , and the generator of order stabilizes .

So we can draw the dessin for itself as a path in (say a geodesic with respect to the hyperbolic metric) connecting the two special orbifold points. The preimage of this path in for a subgroup (not necessarily finite index) of is then the dessin associated to . When the corresponding dessin is a tree sitting inside (I think it is half of the 1-skeleton of the usual tessellation of the hyperbolic plane on which acts by symmetries), and this tree can in fact be used to prove that .

It follows that if is torsion-free, then the dessin of can be drawn on a surface whose genus it is possible to calculate. In the simplest case this surface has genus zero and is a punctured sphere; the corresponding dessins are then planar. This is the case, for example, for the modular groups , where the corresponding dessins encode Cayley graphs for the groups

.

These dessins turn out to be essentially 1-skeleta of the tetrahedron, cube, and dodecahedron respectively. More precisely, the dessins can be obtained by coloring the vertices of the 1-skeleta , then adding a to the middle of each edge (and then cyclically ordering appropriately).


For starters, let be a distributive category with terminal object , and let be the coproduct of two copies of . For an object , what does look like? If and is a sufficiently well-behaved topological space, morphisms correspond to subsets of the connected components of , and naturally has the structure of a Boolean algebra or Boolean ring whose elements can be interpreted as subsets of the connected components of .

It turns out that naturally has the structure of a Boolean algebra or Boolean ring (more invariantly, the structure of a model of the Lawvere theory of Boolean functions) in any distributive category. Hence any distributive category naturally admits a contravariant functor into Boolean rings, or, via Stone duality, a covariant functor into profinite sets / Stone spaces. This is our “connected components” functor. When the object this functor outputs is known as the Pierce spectrum.

This construction can be thought of as trying to do for what the étale fundamental group does for .

**The proof**

In general, in a category with finite products, the set naturally acquires the structure of a model of the Lawvere theory generated by , namely the full subcategory on the finite products . (In our previous post on Lawvere theories we made use of the analogous construction for finite coproducts of .)

How can we calculate this Lawvere theory when in a distributive category? Using distributivity! By induction, it’s not hard to see that is literally the coproduct of copies of in any distributive category. It follows that a morphism is a -tuple of morphisms . There are two distinguished such morphisms, namely the two coproduct inclusions into , and using these two morphisms we can write down for any Boolean function a map , in a way that agrees with products and composition.

(**Edit: **This section previously contained a mistake which was pointed out by Zhen Lin in the comments.)

More abstractly, if is any distributive category then there is a natural functor given by sending a finite set to the coproduct , and by distributivity this functor preserves finite products in addition to finite coproducts. Above we’re looking at the image of the sets and morphisms between them under this functor, which are the objects and some distinguished morphisms between them.

(Note that distributivity can even be regarded as the crucial ingredient in the proof we outlined previously that Boolean functions can all be generated by constants and the if-then-else ternary operator; the key observation in that proof is that we can distribute , which is what let us express -ary Boolean functions in terms of pairs of -ary Boolean functions using if-then-else.)
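Here is a small Python sketch of that observation, rebuilding an arbitrary Boolean function from constants and if-then-else by exactly the splitting-into-two-smaller-functions argument just described (the function names are my own):

```python
# Shannon expansion: any n-ary Boolean function is built from constants
# and the ternary if-then-else operator.

def ite(b, x, y):
    return x if b else y

def from_table(table, n):
    """Build an n-ary Boolean function out of ite and constants, given
    its truth table as a dict from bit-tuples to bools."""
    if n == 0:
        return lambda: table[()]
    # split the n-ary function into a pair of (n-1)-ary functions
    f1 = from_table({k[1:]: v for k, v in table.items() if k[0]}, n - 1)
    f0 = from_table({k[1:]: v for k, v in table.items() if not k[0]}, n - 1)
    return lambda *bits: ite(bits[0], f1(*bits[1:]), f0(*bits[1:]))

# Rebuild XOR from its truth table and check it pointwise:
xor_table = {(a, b): a != b for a in (False, True) for b in (False, True)}
xor = from_table(xor_table, 2)
assert all(xor(a, b) == (a != b) for a in (False, True) for b in (False, True))
```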

It follows that in any distributive category, naturally acquires the structure of a model of the Lawvere theory of Boolean functions (so, according to taste, either a Boolean algebra or a Boolean ring).

**The intuition**

Intuitively, a morphism is a way to disconnect into two pieces, namely the pullbacks where the morphism is either of the two coproduct inclusions, although is probably not the coproduct of these two pieces unless we assume the stronger condition that the ambient category is extensive. These can in turn be thought of as subsets of the connected components of , and the Boolean algebra / ring structure then comes from the usual logical operations on such subsets, e.g. intersection and union.

**The example of affine schemes**

The example of affine schemes is worth working through in detail. First, the terminal object in affine schemes is , and so is . It’s enlightening to rewrite this using the isomorphism

which reveals that is the free commutative ring on an idempotent, and hence that morphisms , for a commutative ring, are naturally in bijection with idempotents in . Idempotents are in turn in bijection with direct product decompositions

and so morphisms of affine schemes really do correspond to ways to disconnect into pieces . Abstractly, this reflects the fact that is extensive, and not only distributive.

The abstract discussion above implies that the set of idempotents in $R$ canonically acquires the structure of a Boolean ring, which we’ll denote $B(R)$. The multiplication in $B(R)$ is just the usual multiplication of idempotents, but the addition is the following modified addition $\oplus$: if $e, f \in R$ are idempotents, then

$e \oplus f = e + f - 2ef$.

Note that the third term disappears if $R$ has characteristic $2$. Geometrically we can think of idempotents as indicator functions of unions of connected components of $\text{Spec } R$; then the RHS describes the operation that must be performed on indicator functions, regarded as taking values in $\{ 0, 1 \}$, to get XOR.
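As a quick sanity check, here is a Python sketch (mine, not from the post; the helper names are illustrative) that computes the idempotents of $\mathbb{Z}/30$ and verifies that they are closed under multiplication and the modified addition $e \oplus f = e + f - 2ef$:

```python
# Compute the Boolean ring B(Z/n) of idempotents in Z/n, and check that
# it is closed under multiplication and the modified addition
# e ⊕ f = e + f - 2ef, with every element its own ⊕-inverse.

def idempotents(n):
    """All e in Z/n with e^2 = e."""
    return [e for e in range(n) if (e * e) % n == e]

def xor(e, f, n):
    """The modified addition ⊕ on idempotents of Z/n."""
    return (e + f - 2 * e * f) % n

n = 30  # Z/30 ≅ Z/2 × Z/3 × Z/5, so B(Z/30) should have 2^3 = 8 elements
B = idempotents(n)
print(B)  # [0, 1, 6, 10, 15, 16, 21, 25]

for e in B:
    assert (e * e) % n == e       # idempotent
    assert xor(e, e, n) == 0      # characteristic 2
    for f in B:
        assert xor(e, f, n) in B  # closed under ⊕
        assert (e * f) % n in B   # closed under ·
```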

Altogether, we find that taking idempotents gives a functor $B$ from commutative rings to Boolean rings. (Curiously, it is not the right adjoint to the inclusion of Boolean rings into commutative rings, although it does preserve limits.) Taking opposite categories, we get a functor from affine schemes to profinite sets called the **Pierce spectrum** functor, which we’ll denote

$\text{Spec}_P R = \text{Spec } B(R)$.

$\text{Spec}_P R$ consists of a point iff $R$ is connected, meaning it has exactly two idempotents, $0$ and $1$ (which are required not to be equal, so the zero ring is not connected). This condition is equivalent to the Zariski spectrum $\text{Spec } R$ being connected as a topological space, and holds, for example, for any integral domain.

The Pierce spectrum organizes the Zariski spectrum into “connected components” as follows. If $P$ is a prime ideal of $R$, then the quotient map $R \to R/P$ induces a map

$\text{Spec}_P R/P \to \text{Spec}_P R$

on Pierce spectra. Since the Pierce spectrum of $R/P$ is a point ($R/P$ is an integral domain), we can associate to $P$ a unique point in $\text{Spec}_P R$, which intuitively is the connected component to which the point belongs. This construction organizes into a natural map

$\text{Spec } R \to \text{Spec}_P R$

where $\text{Spec } R$ on the LHS denotes the prime spectrum as a topological space. (Curiously, this is a map on underlying topological spaces between two ringed spaces which cannot be promoted to a map of ringed spaces, basically because the natural inclusion of the Boolean ring object $1 + 1$ into the affine line $\mathbb{A}^1$ is not a morphism of ring objects.)

The fibers of this map can be given natural affine scheme structures, as follows. An element of the Pierce spectrum can be thought of as a homomorphism $B(R) \to \mathbb{F}_2$ of Boolean rings, or equivalently as a maximal ideal of $B(R)$. These can be regarded as generalizations of ultrafilters, which they reduce to in the special case that $B(R)$ is a product $\mathbb{F}_2^I$ of copies of $\mathbb{F}_2$ (so that the Pierce spectrum is the Stone-Čech compactification $\beta I$). This occurs whenever $R$ is a product of connected rings (e.g. integral domains).

Accordingly, there is a generalization of the ultraproduct construction we can perform here: given a point $U \in \text{Spec}_P R$, regarded as a maximal ideal of $B(R)$, we can quotient $R$ by the ideal generated by the elements of $U$ (which, recall, are idempotents in $R$). The result, which we’ll call $R/U$ in deference to ultraproduct notation, is a connected ring, since every idempotent $e$ in $R$ satisfies either $e \in U$ or $1 - e \in U$ and hence gets identified with either $0$ or $1$ in this quotient, and in fact any morphism from $R$ to a connected ring must factor through one of the quotient maps $R \to R/U$. Geometrically this says that any morphism from a connected affine scheme to $\text{Spec } R$ factors through some $\text{Spec } R/U$, so these quotients really do deserve the name “connected components.”
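Here is a toy case of this machinery in Python (a sketch of mine, with a finite ring standing in for the general construction): $\mathbb{Z}/6 \cong \mathbb{Z}/2 \times \mathbb{Z}/3$ has four idempotents, so its Pierce spectrum has two points, and quotienting by the nontrivial idempotent in each point recovers the two connected factors:

```python
# Connected components of Spec Z/6 via its Pierce spectrum.
# Z/6 ≅ Z/2 × Z/3 has idempotents {0, 1, 3, 4}; the two points of the
# Pierce spectrum correspond to the maximal ideals {0, 3} and {0, 4}
# of the Boolean ring of idempotents.
from math import gcd

def idempotents(n):
    return [e for e in range(n) if (e * e) % n == e]

def quotient_size(n, e):
    """Size of (Z/n)/(e): the ideal (e) consists of the multiples of
    gcd(e, n), so it has n/gcd(e, n) elements and the quotient has
    gcd(e, n) elements."""
    return gcd(e, n)

n = 6
print(idempotents(n))       # [0, 1, 3, 4]

# Killing the idempotent 3 leaves Z/3; killing 4 leaves Z/2.
# So Spec Z/6 decomposes as Spec Z/2 ⊔ Spec Z/3.
print(quotient_size(n, 3))  # 3
print(quotient_size(n, 4))  # 2
```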

*Example.* Let $R = \mathbb{Z}^{\mathbb{N}}$. On math.SE, Martin Brandenburg recently asked what one can say about the Zariski spectrum of $R$, and Eric Wofsey gave an excellent answer in terms of looking at the fibers of $\text{Spec } R$ over its Pierce spectrum exactly as above (which in fact motivated the writing of this post).

In this case $B(R)$ is $\mathbb{F}_2^{\mathbb{N}}$, so the Pierce spectrum is $\beta \mathbb{N}$, which can be identified with the space of ultrafilters on $\mathbb{N}$, and the fibers are given by ultraproducts $R/U$. These ultraproducts admit many interesting prime ideals: for example, if $(p_n)$ is any sequence of primes, then we get a quotient map

$\mathbb{Z}^{\mathbb{N}}/U \to \left( \prod_n \mathbb{F}_{p_n} \right)/U$

to an ultraproduct of fields, which is again a field. So there is a prime ideal for every “nonstandard prime.”


In general, suppose you find yourself in some category. What sort of behavior could you look for that might qualify as “behaving like a category of spaces”?

One thing to look for is **distributivity**. Recall that a distributive category is a category with finite products and finite coproducts such that finite products distribute over finite coproducts; more explicitly, the natural maps

$X \times Y + X \times Z \to X \times (Y + Z)$

should be isomorphisms, and also the natural maps $0 \to X \times 0$ should be isomorphisms, where $0$ denotes the initial object. (Curiously, distributive categories are themselves like categorified versions of commutative rings.)
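In the category of sets, for example, both the canonical map and its inverse can be written down explicitly. Here is a Python sketch (mine, not from the post), modeling the coproduct as a tagged disjoint union:

```python
# The distributivity isomorphism X×Y + X×Z ≅ X×(Y+Z) in Set,
# with the coproduct modeled as a tagged disjoint union.

def coprod(A, B):
    """A + B as a list of tagged elements."""
    return [(0, a) for a in A] + [(1, b) for b in B]

def prod(A, B):
    return [(a, b) for a in A for b in B]

def from_distributed(q):
    """The canonical map X×Y + X×Z → X×(Y+Z)."""
    tag, (x, w) = q
    return (x, (tag, w))

def to_distributed(p):
    """Its inverse X×(Y+Z) → X×Y + X×Z."""
    x, (tag, w) = p
    return (tag, (x, w))

X, Y, Z = [0, 1], ['a'], ['b', 'c']
lhs = prod(X, coprod(Y, Z))           # X × (Y + Z)
rhs = coprod(prod(X, Y), prod(X, Z))  # X×Y + X×Z

# The two maps are mutually inverse bijections.
assert sorted(map(to_distributed, lhs)) == sorted(rhs)
assert all(from_distributed(to_distributed(p)) == p for p in lhs)
```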

This is a pretty good test. The following familiar categories are distributive:

- The category $\text{Set}$ of sets
- More generally, any bicartesian closed category, and in particular any topos

These are all reasonable candidates for categories of “spaces.” On the other hand, the following familiar categories are not distributive:

- The category $\text{Grp}$ of groups
- More generally, any nontrivial category with a zero object, and in particular any abelian category

You might object that there is also an entire field of mathematics dedicated to treating groups as geometric objects. I contend that the geometric object a group describes is actually a groupoid, and the category of groupoids is distributive!


Since this category has finite products and is freely generated under finite products by a single object, namely $2$, it is a Lawvere theory.

**Question: **What are models of this Lawvere theory?

**The answer**

The answer is that they are Boolean algebras, and also that they are Boolean rings. Equivalently, the standard axiomatizations of Boolean algebras resp. Boolean rings describe two different presentations of the Lawvere theory of Boolean functions. In the Boolean algebra presentation, the generators are

- two constants (true and false, or $\top$ and $\bot$),
- a unary operator $\neg$ (NOT), and
- two binary operators $\wedge$ and $\vee$ (AND and OR),

while in the Boolean ring presentation, the generators are

- two constants ($0$ and $1$),
- two binary operators $\cdot$ and $+$ (AND and XOR).

The relations are given by the various axioms that these satisfy.

It’s an interesting observation that either of these small sets of Boolean functions already suffices to build arbitrary ones. This means that logic circuits built out of these basic components can compute any Boolean function, which is maybe the zeroth interesting theorem in computer science. (The first interesting theorem, which is much more interesting, is the existence of universal Turing machines.)

Here’s a proof that either of these sets of Boolean functions generates all Boolean functions, although I won’t prove anything about relations. We only need to worry about Boolean functions of the form $f : 2^n \to 2$. By conditioning on whether the first bit of the input is $0$ or $1$, the data of such a Boolean function is equivalent to the data of two other Boolean functions $f_0, f_1 : 2^{n-1} \to 2$, namely the Boolean function $f_0$ of the other bits of the input you get if the first bit of the input is $0$, and the function $f_1$ you get if the first bit of the input is $1$. In terms of these two functions, $f$ itself can be described as “if the first bit is $1$, then return $f_1$; else, return $f_0$.”

Inducting on $n$, this argument shows that all Boolean functions can be generated by

- two constants ($0$ and $1$), and
- a ternary operator “if-then-else.”

This ternary operator is known in computer science as the conditional (or ternary) operator and in logic as conditional disjunction.
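The inductive argument above is easy to mechanize. Here is a Python sketch (mine, not from the post): given any truth table, Shannon expansion on the first bit rebuilds the function out of nothing but the two constants and if-then-else:

```python
# Build an arbitrary Boolean function from constants 0, 1 and the
# ternary if-then-else operator, by induction (Shannon expansion)
# on the number of input bits.
from itertools import product

def ite(x, y, z):
    """If-then-else: the only non-constant primitive we allow."""
    return y if x else z

def from_truth_table(table, n):
    """Return an n-ary Boolean function computed using only ite and
    the constants, given its truth table as a dict from bit-tuples."""
    if n == 0:
        c = table[()]  # a constant
        return lambda: c
    # Condition on the first bit: f1 handles first bit = 1, f0 handles 0.
    f1 = from_truth_table({k[1:]: v for k, v in table.items() if k[0] == 1}, n - 1)
    f0 = from_truth_table({k[1:]: v for k, v in table.items() if k[0] == 0}, n - 1)
    return lambda *bits: ite(bits[0], f1(*bits[1:]), f0(*bits[1:]))

# Example: 3-ary XOR, reconstructed from its truth table.
xor3 = lambda a, b, c: a ^ b ^ c
table = {bits: xor3(*bits) for bits in product((0, 1), repeat=3)}
g = from_truth_table(table, 3)
assert all(g(*bits) == xor3(*bits) for bits in product((0, 1), repeat=3))
```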

(I once spent some time asking myself what a programming language is from the point of view of category theory. My tentative answer is that a programming language is a tool for naming morphisms in categories. We can think about the generators above as a toy programming language whose only data type is booleans and whose only control structure is if-then-else, and the way in which I happened upon it was as a tool for naming morphisms in the Lawvere theory of Boolean functions. More complicated programming languages can be thought of as tools for naming morphisms in more complicated categories, such as cartesian closed categories.)

It follows that in order to show that some collection of Boolean functions generates all Boolean functions (a condition known in logic as functional completeness), it suffices to show that it can generate $0$, $1$, and if-then-else.

In terms of logical implication $\Rightarrow$, and as a function of three bits $x, y, z$ (where $x$ is the condition), if-then-else can be written

$(x \Rightarrow y) \wedge (\neg x \Rightarrow z)$.

Logical implication can itself be written as $x \Rightarrow y = \neg x \vee y$, so if-then-else can be written in terms of (NOT, OR, AND). This shows that the Boolean algebra generators generate all Boolean functions. To show that the Boolean ring generators also generate all Boolean functions, we need to write NOT, OR, and AND in terms of XOR ($+$) and AND ($\cdot$). We can get NOT using $\neg x = x + 1$, and we already have AND, which means we can get OR using

$x \vee y = x + y + xy$.

Hence the Boolean ring generators also generate all Boolean functions.
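All of these identities can be checked exhaustively on truth tables. Here is a Python sketch (mine; the names are illustrative), working mod 2 so that $+$ is XOR and $\cdot$ is AND:

```python
# Check that NOT, OR, and if-then-else are definable from the Boolean
# ring operations: + is XOR and * is AND when we work mod 2.

bits = (0, 1)

# NOT x = x + 1
for x in bits:
    assert (x + 1) % 2 == 1 - x

# x OR y = x + y + xy
for x in bits:
    for y in bits:
        assert (x + y + x * y) % 2 == (x | y)

# if-then-else(x, y, z) = (x ⇒ y) ∧ (¬x ⇒ z), where a ⇒ b = ¬a ∨ b
def implies(a, b):
    return (1 - a) | b

for x in bits:
    for y in bits:
        for z in bits:
            assert implies(x, y) & implies(1 - x, z) == (y if x else z)
```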
