into its presheaf category $\widehat{C} = [C^{op}, \text{Set}]$ (where we use $[C, D]$ to denote the category of functors $C \to D$). The Yoneda lemma asserts in particular that the Yoneda embedding $Y : C \to \widehat{C}$ is full and faithful, which justifies calling it an embedding.

When $C$ is in addition assumed to be small, the Yoneda embedding has the following elegant universal property.

**Theorem:** The Yoneda embedding $Y : C \to \widehat{C}$ exhibits $\widehat{C}$ as the **free cocompletion** of $C$ in the sense that for any cocomplete category $D$, the restriction functor

$$[\widehat{C}, D]_{\text{cocont}} \ni G \mapsto G \circ Y \in [C, D]$$

from the category of cocontinuous functors $\widehat{C} \to D$ to the category of functors $C \to D$ is an equivalence. In particular, any functor $C \to D$ extends (uniquely, up to natural isomorphism) to a cocontinuous functor $\widehat{C} \to D$, and all cocontinuous functors $\widehat{C} \to D$ arise this way (up to natural isomorphism).

Colimits should be thought of as a general notion of gluing, so the above should be understood as the claim that $\widehat{C}$ is the category obtained by “freely gluing together” the objects of $C$ in a way dictated by the morphisms. This intuition is important when trying to understand the definition of, among other things, a simplicial set. A simplicial set is by definition a presheaf on a certain category, the simplex category $\Delta$, and the universal property above says that this means simplicial sets are obtained by “freely gluing together” simplices.

In this post we’ll content ourselves with meandering towards a proof of the above result. In a subsequent post we’ll give a sampling of applications.

**A toy version of the above result**

Coproducts in particular are examples of colimits, so if we think of coproducts as being analogous to addition, we can think of a cocomplete category as being analogous to a commutative monoid and a cocontinuous functor as being analogous to a morphism of commutative monoids. The universal property above can then be thought of as analogous to the following. Let $X$ be a set and let $\mathbb{Z}_{\geq 0}[X]$ be the set of functions $X \to \mathbb{Z}_{\geq 0}$ which vanish except at finitely many points of $X$. There is an inclusion $X \to \mathbb{Z}_{\geq 0}[X]$ sending a point in $X$ to the indicator function which is equal to $1$ at that point and $0$ elsewhere.

**Theorem:** The natural inclusion $X \to \mathbb{Z}_{\geq 0}[X]$ exhibits $\mathbb{Z}_{\geq 0}[X]$ as the free commutative monoid on $X$ in the sense that for any commutative monoid $M$, the restriction map

$$\text{Hom}_{\text{CMon}}(\mathbb{Z}_{\geq 0}[X], M) \to \text{Hom}_{\text{Set}}(X, M)$$

from the set of monoid homomorphisms $\mathbb{Z}_{\geq 0}[X] \to M$ to the set of functions $X \to M$ is a bijection.
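Decategorified, this is easy to make computational. The sketch below (all names hypothetical) represents an element of the free commutative monoid on $X$ as a multiset of elements of $X$, and extends a function from $X$ into a commutative monoid to a monoid homomorphism “by linearity”:

```python
from collections import Counter

def extend(f, monoid_op, monoid_unit):
    """Extend f : X -> M to the unique monoid homomorphism N[X] -> M.

    An element of the free commutative monoid on X is a Counter mapping
    finitely many points of X to positive multiplicities."""
    def f_bar(element):
        result = monoid_unit
        for x, multiplicity in element.items():
            for _ in range(multiplicity):
                result = monoid_op(result, f(x))
        return result
    return f_bar

# Example: extend x -> len(x) from strings into the monoid (N, +, 0).
f_bar = extend(len, lambda a, b: a + b, 0)
e = Counter({"ab": 2, "xyz": 1})   # the formal sum 2·"ab" + 1·"xyz"
assert f_bar(e) == 2 * 2 + 3       # determined by its values on generators

# The homomorphism property: f_bar(e1 + e2) == f_bar(e1) + f_bar(e2).
e1, e2 = Counter({"ab": 1}), Counter({"xyz": 2})
assert f_bar(e1 + e2) == f_bar(e1) + f_bar(e2)
```

The uniqueness half of the theorem is visible in the code: `f_bar` is forced once `f` and the monoid operation are fixed.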

(Of course an intriguing difference between the toy theorem and the real theorem is that being cocomplete is a property of a category, while being a commutative monoid is a structure placed on a set.)

In the setting of commutative monoids, a shorter description of the above theorem is that there’s a forgetful functor from commutative monoids to sets and that $X \mapsto \mathbb{Z}_{\geq 0}[X]$ describes its left adjoint. Similarly, we’d like to be able to say that there’s a forgetful functor from cocomplete categories to categories and that the Yoneda embedding is its left adjoint. Unfortunately, there are nontrivial size issues that get in the way: $\widehat{C}$ is never small, and in fact, the only cocomplete small categories are preorders by a theorem of Freyd.

In any case, before we get to discussing the result in full generality, let’s look at some illustrative examples.

**Sets**

Take $C$ to be the terminal category. Then $\widehat{C}$ is just the category $\text{Set}$ of sets. This example already says something interesting: the universal property implies that $\text{Set}$ is the free cocomplete category on an object in the sense that if $D$ is a cocomplete category, then the category of cocontinuous functors $\text{Set} \to D$ is equivalent to $D$ itself. The inverse of this equivalence sends an object $d \in D$ to the functor

$$\text{Set} \ni S \mapsto \coprod_{s \in S} d \in D$$

which, given a set $S$, returns the coproduct of $S$ copies of $d$, and conversely every cocontinuous functor $\text{Set} \to D$ has this form. This statement should be thought of as analogous to the statement that $\mathbb{Z}_{\geq 0}$ is the free commutative monoid on a point.

**Graphs**

Take $C$ to be the category with two objects $V, E$ and two parallel morphisms $s, t : V \to E$ between them. (This category is in fact a truncation of the simplex category.) Think of $V$ as a vertex, $E$ as an edge, and the two morphisms as the two inclusions of the endpoints of the edge. A presheaf $F : C^{op} \to \text{Set}$ is then precisely a pair of sets $F(V), F(E)$ together with a pair of functions

$$F(s), F(t) : F(E) \rightrightarrows F(V).$$

The two maps have been named $s, t$ because we can think of them as source and target maps: in fact, $F$ is precisely a (directed multi)graph with vertex set $F(V)$ and edge set $F(E)$. Here the universal property of presheaves can be interpreted as the claim that graphs are obtained by freely gluing together edges along vertices.
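Concretely, such a presheaf is just the data below. The hypothetical `Graph` container records the two sets and the two structure maps, and a map of presheaves (a natural transformation) is exactly a graph homomorphism, i.e. functions on vertices and edges commuting with source and target:

```python
from dataclasses import dataclass

@dataclass
class Graph:
    """A presheaf on the category {V ⇉ E}: two sets, two maps edges -> vertices."""
    vertices: set
    edges: set
    source: dict  # edge -> vertex
    target: dict  # edge -> vertex

def is_homomorphism(g: Graph, h: Graph, fv: dict, fe: dict) -> bool:
    """A natural transformation of presheaves = a graph homomorphism:
    fv on vertices and fe on edges must commute with source and target."""
    return all(
        fv[g.source[e]] == h.source[fe[e]] and fv[g.target[e]] == h.target[fe[e]]
        for e in g.edges
    )

# The loop: one vertex, one edge from that vertex to itself.
loop = Graph({"*"}, {"e"}, {"e": "*"}, {"e": "*"})
# The interval: two vertices joined by one edge.
interval = Graph({0, 1}, {"e"}, {"e": 0}, {"e": 1})

# Collapsing the interval onto the loop is a graph homomorphism...
assert is_homomorphism(interval, loop, {0: "*", 1: "*"}, {"e": "e"})
# ...but the evident candidate map from the loop to the interval is not.
assert not is_homomorphism(loop, interval, {"*": 0}, {"e": "e"})
```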

The universal property also gives a natural way of describing graphs as topological spaces, as follows: $\text{Top}$ is a cocomplete category, and there is a functor $C \to \text{Top}$ sending $V$ to a point, $E$ to an interval $[0, 1]$, and the two arrows to the two inclusions of the endpoints of the interval. By the universal property, this functor extends to a cocontinuous functor $\widehat{C} \to \text{Top}$ sending a graph to its underlying topological space (with directions on the edges ignored). This is a simple version of geometric realization.

But of course the universal property implies that there are many other more exotic notions of geometric realization for graphs. For example, instead of using topological spaces we could use affine schemes: fixing a field $k$, the category of affine schemes over $k$ is cocomplete, and there is a functor sending $V$ to a point $\text{Spec } k$, $E$ to the affine line $\mathbb{A}^1 = \text{Spec } k[t]$, and the two maps to the inclusions of the two points $0, 1$ into $\mathbb{A}^1$ (for example). By the universal property we obtain a geometric realization functor which, for example, sends the loop (the graph consisting of a vertex and an edge from that vertex to itself) to the affine scheme with ring of functions

$$\{ f \in k[t] : f(0) = f(1) \}.$$

This affine scheme is precisely the nodal cubic. To see this, write the loop as the coequalizer of the two maps $s, t : V \rightrightarrows E$, thought of as natural transformations between the corresponding representable presheaves. To compute the ring of functions on the resulting affine scheme means computing the equalizer of the two maps $k[t] \rightrightarrows k$ given by evaluation at $0$ and $1$ respectively.
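This can be checked by hand. The equalizer ring consists of polynomials $f$ with $f(0) = f(1)$; a sketch in pure Python (coefficient-list arithmetic, hypothetical helper names) verifies that $x = t^2 - t$ and $y = t^3 - t^2$ lie in this ring and satisfy a single cubic relation, $y^2 = x^3 + xy$, one form of a cubic with a node at the origin:

```python
# Polynomials in t as coefficient lists [c0, c1, c2, ...].
def pmul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def padd(p, q):
    n = max(len(p), len(q))
    return [(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(n)]

def peval(p, t):
    return sum(c * t**k for k, c in enumerate(p))

x = [0, -1, 1]      # x = t^2 - t
y = [0, 0, -1, 1]   # y = t^3 - t^2

# Both generators lie in the equalizer {f : f(0) = f(1)}.
for f in (x, y):
    assert peval(f, 0) == peval(f, 1) == 0

# They satisfy the cubic relation y^2 = x^3 + x*y.
lhs = pmul(y, y)
rhs = padd(pmul(pmul(x, x), x), pmul(x, y))
assert all(a == b for a, b in zip(lhs, rhs))
```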

**Species**

Write $S$ for the category (really groupoid) of finite sets and bijections. This is equivalently the core of the category of finite sets and functions. It is equivalent, as a category, to the disjoint union

$$\coprod_{n \geq 0} BS_n$$

of the one-object groupoids $BS_n$ corresponding to the symmetric groups $S_n$, hence the name $S$. We will often think of the objects of $S$ as the non-negative integers. A presheaf $F : S^{op} \to \text{Set}$ is, depending on who you ask, a **species**, **$S$-module**, or **symmetric sequence** in sets; we’ll use the term species. More concretely, a species is a collection of sets $F(n)$ indexed by the non-negative integers such that each set $F(n)$ is equipped with a (right) action of the symmetric group $S_n$.

Species are surprisingly fundamental objects in mathematics. Under the name species, they were introduced by Joyal to study combinatorics, and among other things to categorify the theory of exponential generating functions; see, for example, Bergeron, Labelle, and Leroux. I think the names $S$-module and symmetric sequence are used by authors studying operads, as operads are species with extra structure (see the nLab for details).

The universal property tells us that we can extend any functor from $S$ to a cocomplete category $D$ to a cocontinuous functor $\widehat{S} \to D$. An important source of functors $S \to D$ is given by taking $D$ to be a symmetric monoidal category, $d \in D$ to be an object, and considering the functor

$$S \ni n \mapsto d^{\otimes n} \in D.$$

This observation can be codified as the following universal property.

**Theorem:** $S$, equipped with disjoint union, is the free symmetric monoidal category on an object in the sense that for any symmetric monoidal category $D$, the restriction functor (along the one-element set $1 \in S$) from the category of symmetric monoidal functors $S \to D$ to the category of functors $\ast \to D$, which is just $D$, is an equivalence.

If $D$ is in addition cocomplete, in such a way that the monoidal operation $\otimes$ is cocontinuous in both arguments (**symmetric monoidally cocomplete**), then after choosing an object $d \in D$, we get not only a symmetric monoidal functor $S \to D$ but even a cocontinuous functor $\widehat{S} \to D$, which turns out to be symmetric monoidal if $\widehat{S}$ is given a monoidal structure via Day convolution. (Day convolution is the monoidal structure categorifying the product of exponential generating functions.) This observation can in turn be codified as a universal property.

**Theorem:** $\widehat{S}$, equipped with Day convolution, is the free symmetric monoidally cocomplete category on an object in the sense that for any symmetric monoidally cocomplete category $D$, the restriction functor from the category of symmetric monoidal cocontinuous functors $\widehat{S} \to D$ to the category of functors $\ast \to D$ (thinking of the one-element set $1 \in S$ as a representable presheaf), which is just $D$, is an equivalence.

What do these symmetric monoidal cocontinuous functors actually look like? For an object $d \in D$, the corresponding functor $\widehat{S} \to D$ is

$$F \mapsto \coprod_{n \geq 0} F(n) \otimes_{S_n} d^{\otimes n}$$

where $\otimes_{S_n}$ is shorthand for taking coinvariants with respect to the diagonal action of $S_n$, and $F(n) \otimes d^{\otimes n}$ is shorthand for the coproduct of an $F(n)$-indexed family of copies of $d^{\otimes n}$ (see copower for some motivation behind this notation). This is an important construction: in the special case that $F$ is an operad, so that $F(n)$ describes the set of $n$-ary operations in the operad, the above construction describes the free $F$-algebra on $d$. If all of the $F(n)$ are finite sets, the above construction can also be thought of as categorifying the exponential generating function

$$\sum_{n \geq 0} \frac{|F(n)|}{n!} x^n$$

(thinking of taking coinvariants with respect to an $S_n$-action as categorifying dividing by $n!$, in accordance with the general yoga of groupoid cardinality.)

*Example.* Let $F$ be the associative operad. Here $F(n)$ consists of the operations of the form

$$d_1 \otimes \cdots \otimes d_n \mapsto d_{\sigma(1)} \otimes \cdots \otimes d_{\sigma(n)}$$

for each permutation $\sigma \in S_n$ and hence, as a right $S_n$-set, $F(n)$ is isomorphic to $S_n$. $F(n) \otimes_{S_n} d^{\otimes n}$ is then naturally isomorphic to $d^{\otimes n}$, so the free associative algebra (monoid) on an object $d$ in a symmetric monoidally cocomplete category is the infinite coproduct

$$\coprod_{n \geq 0} d^{\otimes n}.$$

Regarded just as a combinatorial species, $F$ categorifies the generating function $\sum_{n \geq 0} \frac{n!}{n!} x^n = \frac{1}{1 - x}$.

*Example.* Let $F$ be the commutative operad. Here $F(n)$ consists of the single operation

$$d_1 \otimes \cdots \otimes d_n \mapsto d_1 d_2 \cdots d_n$$

and hence, as a right $S_n$-set, $F(n)$ is trivial. $F(n) \otimes_{S_n} d^{\otimes n}$ is then the quotient $d^{\otimes n} / S_n$, so the free commutative algebra (commutative monoid) on an object $d$ in a symmetric monoidally cocomplete category is the infinite coproduct

$$\coprod_{n \geq 0} d^{\otimes n} / S_n.$$

Regarded just as a combinatorial species, $F$ categorifies the generating function $\sum_{n \geq 0} \frac{x^n}{n!} = e^x$.
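At the decategorified level the two examples really do produce the expected exponential generating functions: with $|F(n)| = n!$ (associative) the EGF coefficients $|F(n)|/n!$ are those of $1/(1-x)$, and with $|F(n)| = 1$ (commutative) they are those of $e^x$. A quick check with exact arithmetic:

```python
from fractions import Fraction
from math import factorial

N = 8
# EGF coefficients |F(n)| / n! for the two operads.
assoc = [Fraction(factorial(n), factorial(n)) for n in range(N)]  # |F(n)| = n!
comm = [Fraction(1, factorial(n)) for n in range(N)]              # |F(n)| = 1

# 1/(1-x) has Taylor coefficients 1, 1, 1, ...
assert assoc == [Fraction(1)] * N
# e^x has Taylor coefficients 1/n!.
assert comm[3] == Fraction(1, 6)
```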

**Intuitions about the proof**

Recall that we are trying to show that the restriction functor

$$[\widehat{C}, D]_{\text{cocont}} \to [C, D]$$

is an equivalence. By analogy with the corresponding statement about sets, commutative monoids, and free commutative monoids, one way to proceed with this proof is to figure out how to write every presheaf as a colimit of representable presheaves (the image of the Yoneda embedding $Y : C \to \widehat{C}$), then turn this colimit into a colimit in $D$ by applying a given cocontinuous functor $\widehat{C} \to D$. This will show, roughly speaking, that the restriction map is “injective” (although we need to be careful about what this means because we’re dealing with categories, not sets).

To show that the restriction map is “surjective,” we need to extend a functor $C \to D$ to a cocontinuous functor $\widehat{C} \to D$. We’d like to do this “by linearity,” by choosing an expression for a presheaf as a colimit of representable presheaves and turning this colimit into a colimit in $D$ by applying our functor; however, we need to be able to make this choice functorially, and then we still need to verify that the resulting functor is actually cocontinuous.

**Presheaves as colimits of representable presheaves**

The following result is at least implicit in the use of the terminology “free cocompletion” and is important in getting the above proof to work, as well as being a generally useful thing to know in category theory. It is sometimes called the co-Yoneda lemma for reasons that are a little difficult to explain without more background. Previously it showed up when we discussed operations and pro-objects, but there we rushed through the proof and here we’ll take a more leisurely pace.

**Theorem:** Let $C$ be a (locally small) category. Then every presheaf $F \in \widehat{C}$ is canonically a colimit of representable presheaves.

*Idea #1.* One relevant intuition here is to think of a presheaf as a recipe for writing down a colimit by prescribing how many copies of each object and morphism of $C$ appear in the diagram, in the same way that one can think of a function from a set $X$ to the non-negative integers (with finite support) as a recipe for writing down an element of the free commutative monoid on $X$ by prescribing how many copies of each element of $X$ to add up. This intuition is hopefully quite clear in the case of graphs, where a presheaf on $C$ tells you how many edges and vertices to glue together as well as how to glue them together.

*Idea #2.* For the more categorically minded, a related intuition is the following. Let $J \to C$ be a diagram in $C$. The colimit of this diagram, if it exists, is defined by a universal property describing how maps out of it behave. This determines the covariant functor it represents uniquely, but says very little about the contravariant functor it represents. However, there is in some sense a “minimal” possibility for this contravariant functor. For example, if the colimit in question is the coproduct $c_1 \sqcup c_2$ of two objects, then by definition

$$\text{Hom}(c_1 \sqcup c_2, c) \cong \text{Hom}(c_1, c) \times \text{Hom}(c_2, c)$$

but the only thing we know about $\text{Hom}(c, c_1 \sqcup c_2)$ is that there are natural inclusion maps $c_1, c_2 \to c_1 \sqcup c_2$, hence we know that $\text{Hom}(c, c_1 \sqcup c_2)$ admits a natural map from $\text{Hom}(c, c_1) \sqcup \text{Hom}(c, c_2)$, but this is all we know without further information. Now, since colimits in functor categories are computed pointwise, the functor $c \mapsto \text{Hom}(c, c_1) \sqcup \text{Hom}(c, c_2)$ is none other than the coproduct of $Y(c_1), Y(c_2)$, but regarded as lying in the presheaf category. In general, the sense in which presheaves are “free colimits” of objects of $C$ is that, as contravariant functors, they describe the “minimal” contravariant functors that a colimit of objects in $C$ could represent.

Now we turn to the proof itself.

*Proof.* Let $F$ be a presheaf. Since we want to describe $F$ as a colimit, let’s think about the contravariant functor that $F$ represents. By definition, a natural transformation $G \to F$ consists of a family of maps $G(c) \to F(c)$ satisfying the naturality condition that if $f : c \to c'$ is a morphism, then the diagram

$$\begin{array}{ccc} G(c') & \to & F(c') \\ \downarrow & & \downarrow \\ G(c) & \to & F(c) \end{array}$$

commutes. We want to write $F$ as a colimit of representable functors, and we know that by the Yoneda lemma, if $Y(c)$ (which we use to designate the representable functor $\text{Hom}(-, c)$) is a representable functor, then $\text{Hom}(Y(c), F) \cong F(c)$. To go from elements of $F(c)$ to maps $Y(c) \to F$ we need $F(c)$ copies of $Y(c)$.

A clean way to obtain these copies is to write down a diagram whose objects are given by pairs $(c, x)$ of an object $c \in C$ and an element $x \in F(c)$, equipped with the map to $C$ given by forgetting $x$. The preimage of an object $c$ is then precisely $F(c)$, and if we don’t specify any morphisms then a cocone over this diagram in $\widehat{C}$ (after composing with the Yoneda embedding) is precisely a family of maps $Y(c) \to G$ satisfying no naturality conditions.

To get the naturality conditions back we need to equip this diagram with morphisms. Choosing the morphisms $(c, x) \to (c', x')$ to be the morphisms $f : c \to c'$ such that $F(f)$ sends $x'$ to $x$ enforces precisely the naturality condition desired on the maps $Y(c) \to F$, and furthermore the resulting maps canonically exhibit $F$ as the colimit of the corresponding diagram in $\widehat{C}$ as desired.

(The diagram we constructed above is the opposite of the **category of elements** of $F$, which is a special case of the **Grothendieck construction**. As described in the nLab article, we can think of $\text{Set}$ as the classifying space of set bundles, and then $F$ is the classifying map of a set bundle and the category of elements is the total space of the bundle. The category of elements admits other more sophisticated descriptions that won’t concern us at the moment.)
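For a finite presheaf the category of elements is easy to build explicitly. The sketch below (hypothetical encoding) constructs it for a presheaf on the two-object graph category from earlier: objects are pairs $(c, x)$ with $x \in F(c)$, and there is one non-identity morphism for each structure-map relation $F(f)(x') = x$:

```python
def category_of_elements(FV, FE, Fs, Ft):
    """Category of elements of a presheaf on {V ⇉ E}.

    Objects: pairs (c, x) with x in F(c).
    Non-identity morphisms: (name, (V, x), (E, e)) whenever the
    structure map F(s) or F(t) sends the edge e to the vertex x."""
    objects = [("V", x) for x in FV] + [("E", e) for e in FE]
    morphisms = []
    for e in FE:
        morphisms.append(("s", ("V", Fs[e]), ("E", e)))
        morphisms.append(("t", ("V", Ft[e]), ("E", e)))
    return objects, morphisms

# The loop graph: one vertex, one edge with both endpoints at that vertex.
objs, mors = category_of_elements({"*"}, {"e"}, {"e": "*"}, {"e": "*"})
assert len(objs) == 2   # one copy of Y(V) and one copy of Y(E) in the colimit
assert len(mors) == 2   # the two gluing instructions (both endpoints to "*")
```

The colimit of the representables indexed by this diagram recovers the loop, matching the "freely gluing edges along vertices" intuition.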

**The actual proof**

Now we return to the proof of the theorem. Let $C$ be a small category and $D$ be a cocomplete category. Recall, again, that we are trying to show that the restriction functor

$$[\widehat{C}, D]_{\text{cocont}} \to [C, D]$$

is an equivalence of categories. If we wanted to show that a map of sets was a bijection, we’d just have to show that it’s injective and surjective, and we sketched some intuition for why this should be the case above. But an equivalence of categories is more subtle, and instead of verifying two conditions we need to verify three: the restriction functor needs to be full, faithful, and essentially surjective.

To show that restriction is fully faithful, let $\eta : G \to H$ be a natural transformation between two cocontinuous functors $G, H : \widehat{C} \to D$. We want to show that knowing the restriction of $\eta$ to representable functors uniquely determines $\eta_F$ for all presheaves $F$, and moreover that given such a restriction we can always extend it to a natural transformation on all presheaves. But since $G$ and $H$ are cocontinuous and $F$ is a colimit of representables, $\eta_F$ is freely determined by the universal property of colimits: in particular it is determined by its restriction to every representable $Y(c)$, which is just $\eta$ composed with the inclusion of representables by naturality, and given such a compatible family of restrictions $\eta$ exists.

To show that restriction is essentially surjective, let $F : C \to D$ be a functor. We want to extend $F$ to a cocontinuous functor $F_! : \widehat{C} \to D$, which we will do “by linearity”: if $G$ is a presheaf, we’ll write it canonically as a colimit of representable presheaves using the diagram of the shape we described above (which is small since $C$ is small), then apply $F$ to this diagram to obtain a diagram in $D$, then take the colimit in $D$. In symbols,

$$F_!(G) = \underset{(c, x), \, x \in G(c)}{\text{colim}} F(c).$$

Every step of this process, including the formation of the category of elements, is functorial, so $F_!$ really is a functor. (It is crucial that $C$ be small to ensure that the category of elements is a small diagram; “cocomplete” only means that all small colimits exist, and in fact the theorem of Freyd alluded to above also implies that a category with all colimits is a preorder.)

It remains to verify first that $F_!$ really is cocontinuous and second that it really does restrict to (a functor naturally isomorphic to) $F$. These will both be a corollary of the following.

**Proposition:** $F_!$ is the left adjoint of the functor

$$F^* : D \ni d \mapsto \text{Hom}_D(F(-), d) \in \widehat{C}.$$

(A version of this construction, the “left pro-adjoint,” appeared previously on this blog.)

(There is some mild abuse of notation going on here. $F^*$ should really denote the functor given by precomposition with $F$, and $F_!$ should really denote the left adjoint of this functor, also known as **left Kan extension**. The decorations $(-)^*$ and $(-)_!$ (pronounced “upper star” and “lower shriek” respectively) on these functors are by analogy with some of Grothendieck’s six operations on sheaves.)

*Proof.* We want to show that there is a natural bijection

$$\text{Hom}_D(F_!(G), d) \cong \text{Hom}_{\widehat{C}}(G, F^*(d)).$$

We know that $G \cong \text{colim}_{(c, x)} Y(c)$, hence we can write the RHS as

$$\text{Hom}_{\widehat{C}} \left( \underset{(c, x)}{\text{colim}} \, Y(c), F^*(d) \right) \cong \underset{(c, x)}{\lim} \, \text{Hom}_{\widehat{C}}(Y(c), F^*(d)) \cong \underset{(c, x)}{\lim} \, \text{Hom}_D(F(c), d)$$

first by the universal property of colimits and second by the Yoneda lemma. On the other hand, by definition $F_!(G)$ is also a colimit over the category of elements, hence we can write the LHS as

$$\text{Hom}_D \left( \underset{(c, x)}{\text{colim}} \, F(c), d \right) \cong \underset{(c, x)}{\lim} \, \text{Hom}_D(F(c), d)$$

by the universal property of colimits. The conclusion follows.

In particular, since $F_!$ is a left adjoint, it is necessarily cocontinuous, and if $G = Y(c)$ above is a representable presheaf then the above adjunction gives

$$\text{Hom}_D(F_!(Y(c)), d) \cong \text{Hom}_{\widehat{C}}(Y(c), F^*(d)) \cong \text{Hom}_D(F(c), d)$$

by the Yoneda lemma, so $F_!(Y(c)) \cong F(c)$ by a second application of the Yoneda lemma. It follows that restriction is essentially surjective, hence an equivalence as desired.

(In fact $F^*$ should really have denoted the functor given by precomposition with the Yoneda embedding $Y$, and what we really wrote down above is the left adjoint to this functor, which is a genuine left Kan extension along $Y$. We could’ve written the proof so as to show that $F \mapsto F_!$ is not only a left adjoint but in fact an inverse once we restrict to cocontinuous functors.)


Although I’m sure there are more, I’m only aware of two other students at Berkeley who’ve posted transcripts of their quals, namely Christopher Wong and Eric Peterson. It would be nice if more people did this.


Standard presentations of propositional logic treat the Boolean operators “and,” “or,” and “not” as fundamental (e.g. these are the operators axiomatized by Boolean algebras). But from the point of view of category theory, arguably the most fundamental Boolean operator is “implies,” because it gives a collection of propositions the structure of a category, or more precisely a poset. We can endow the set of propositions with a morphism $p \to q$ whenever $p \Rightarrow q$, and no morphisms otherwise. Then the identity morphisms simply reflect the fact that a proposition always implies itself, while composition of morphisms

$$\frac{p \Rightarrow q \qquad q \Rightarrow r}{p \Rightarrow r}$$

is a familiar inference rule (hypothetical syllogism). Since it is possible to define “and,” “or,” and “not” in terms of “implies” in the Boolean setting, we might want to see what happens when we start from the perspective that propositional logic ought to be about certain posets and figure out how to recover the familiar operations from propositional logic by thinking about what their universal properties should be.

It turns out that when we do this, we don’t get ordinary propositional logic back in the sense that the posets we end up identifying are not just the Boolean algebras: instead we’ll get Heyting algebras, and the corresponding notion of logic we’ll get is intuitionistic logic.

**True, false**

Propositional logic should have two special propositions, “true” and “false.” Categorically, we should expect “true” and “false” to have universal properties, and indeed they do: “true” should be implied by everything while “false” should imply everything. In other words, “true” should be a terminal object or **top element** or **greatest element** of the poset and “false” should be an initial object or **bottom element** or **least element**. We will denote these by $\top$ and $\bot$ respectively.

Hence the posets we are interested in have both a top and a bottom element; these are the **bounded** posets.

*Example.* Starting from any poset $P$, we can adjoin top and bottom elements in the obvious way, and every bounded poset arises in this way from a unique poset (namely the one obtained by removing the top and bottom elements). So this hypothesis is not very restrictive.

**And, or**

Propositional logic should have a logical “and” operator. Categorically, we again expect a universal property, which is the following: we should have $r \leq p \wedge q$ if and only if $r \leq p$ and $r \leq q$. This is precisely the universal property of products, so $p \wedge q$ should be the product or **meet** or **infimum** of the two elements $p, q$. The projection maps $p \wedge q \to p$ and $p \wedge q \to q$ reproduce conjunction elimination, another familiar inference rule.

Dually, we need to be able to take the logical “or” of two propositions. The corresponding universal property is that we should have $p \vee q \leq r$ if and only if $p \leq r$ and $q \leq r$. This is precisely the universal property of coproducts, so $p \vee q$ should be the coproduct or **join** or **supremum** of the two elements $p, q$. The inclusion maps $p \to p \vee q$ and $q \to p \vee q$ reproduce disjunction introduction. Note in particular that the empty meet is the top element and the empty join is the bottom element.

Hence the posets we are interested in have all finite joins and meets; these are the (bounded) **lattices**.

*Example.* Any total order with top and bottom elements is a lattice, where the meet of two elements is their minimum and the join of two elements is their maximum. For example, the unit interval $[0, 1]$ is such a total order, as is any successor ordinal.

*Example.* The poset of open subsets of a topological space and the poset of measurable subsets of a measurable space are by definition lattices.

*Example.* The poset of subobjects of an object in a category is often a lattice. For example, any group has a lattice of subgroups, where the meet is the intersection and the join is the subgroup generated by two subgroups. Similarly, any module has a lattice of submodules, where the meet is the intersection and the join is the sum. In particular, any ring has a lattice of ideals (left, right, or two-sided).

**Implies**

Propositional logic should have an “internal” notion of implication. In other words, $p \Rightarrow q$ should not only just be true or false but should itself be a proposition. This would allow us to state inference rules like modus ponens ($p \wedge (p \Rightarrow q) \leq q$). The corresponding universal property is that $r \leq (p \Rightarrow q)$ if and only if $r \wedge p \leq q$. This is precisely the universal property of exponential objects, which we encountered when talking about the Lawvere fixed point theorem.

For posets, having finite meets is equivalent to having finite limits, and dually having finite joins is equivalent to having finite colimits. A category with finite limits and exponential objects is cartesian closed, and a cartesian closed category with finite coproducts is bicartesian closed. Hence the posets we are interested in are precisely the bicartesian closed posets; these are in turn precisely the **Heyting algebras**.

*Example.* Let $L$ be a lattice with arbitrary joins such that finite meets distribute over arbitrary joins. Then $L$ is cartesian closed and hence a Heyting algebra. This is a consequence of the adjoint functor theorem for posets, and in particular implies that the lattice of open subsets of any topological space $X$ is a Heyting algebra. For open sets $U, V$ the implication turns out to be

$$(U \Rightarrow V) = \text{int}(U^c \cup V).$$

**Not**

Finally, propositional logic should have a notion of negation. The notion of negation we’ll adopt is that the negation of a proposition asserts that it implies false, so

$$\neg p = (p \Rightarrow \bot).$$

Note that there is no reason for double negation $\neg \neg p = p$ to hold in general. There is also no reason for excluded middle $p \vee \neg p = \top$ to hold in general. So we’re really in the realm of intuitionistic logic here.

*Example.* Let $L$ be the lattice of open subsets of a topological space $X$ as above. Then negation takes the form

$$\neg U = \text{int}(U^c)$$

and $\neg \neg U = U$ can easily fail. For example, let $X = \mathbb{R}$ and let $U = \mathbb{R} \setminus \{0\}$. Then $\neg U = \emptyset$, so $\neg \neg U = \mathbb{R} \neq U$. Excluded middle fails even more badly: in any topological space, $U \vee \neg U = X$ iff $U$ is clopen, hence $L$ satisfies excluded middle iff every open subset of $X$ is clopen, in which case $L$ is a Boolean algebra. $X$ is connected iff $L$ never satisfies any nontrivial case of excluded middle.
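These failures are easy to witness in a finite model. The sketch below (hypothetical encoding) works in the lattice of open sets of a small finite topological space, computing the Heyting implication directly from its universal property ($U \Rightarrow W$ is the largest open $V$ with $V \cap U \subseteq W$) and exhibiting an open set for which both double negation and excluded middle fail:

```python
X = frozenset({"l", "m", "r"})
# Open sets of a topology on X (closed under unions and intersections),
# crudely modeling an interval minus its midpoint plus that midpoint.
opens = [frozenset(s) for s in [set(), {"l"}, {"r"}, {"l", "r"}, X]]

def implies(U, W):
    """Heyting implication: the union of all opens V with V ∩ U ⊆ W.

    The union is itself open and satisfies the condition, so it is the
    largest such V, which is exactly the universal property of U ⇒ W."""
    result = frozenset()
    for V in opens:
        if V & U <= W:
            result |= V
    return result

def neg(U):
    return implies(U, frozenset())   # ¬U = (U ⇒ ⊥)

U = frozenset({"l", "r"})
assert neg(U) == frozenset()   # ¬U = int(U^c) = ∅
assert neg(neg(U)) == X        # double negation fails: ¬¬U = X ≠ U
assert U | neg(U) != X         # excluded middle fails
assert implies(U, U) == X      # but p ⇒ p = ⊤ always holds
```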

Note, however, that for topological spaces we always have $U \leq \neg \neg U$, and in fact in any Heyting algebra we always have

$$p \leq \neg \neg p.$$

To see this, observe that by the universal property this implication holds if and only if $p \wedge \neg p \leq \bot$, but this follows from modus ponens.


**Method 1: Mayer-Vietoris**

For this particular method write $T^n = (S^1)^n$. We can compute the cohomology of $T^n$ inductively by regarding it as the union of two copies $A, B$ of $T^{n-1} \times I$ (each homotopy equivalent to $T^{n-1}$) with intersection $A \cap B$ homotopy equivalent to $T^{n-1} \sqcup T^{n-1}$, and using Mayer-Vietoris. The cohomological version of Mayer-Vietoris is a long exact sequence of the form

$$\cdots \to H^k(A \cup B) \to H^k(A) \oplus H^k(B) \to H^k(A \cap B) \to H^{k+1}(A \cup B) \to \cdots.$$

The maps $H^k(T^n) \to H^k(A) \oplus H^k(B)$ are induced by pulling back along the inclusions $A, B \to T^n$, whereas the maps $H^k(A) \oplus H^k(B) \to H^k(A \cap B)$ are induced by the difference between the pullbacks along the inclusions $A \cap B \to A, B$. Because these inclusions are homotopic to the identity map $T^{n-1} \to T^{n-1}$, we can think of the difference map as being given by

$$(a, b) \mapsto (a - b, \, b - a)$$

where $H^k(A \cap B) \cong H^k(T^{n-1}) \oplus H^k(T^{n-1})$, and we can think of the first map as being given by two copies of a single map $H^k(T^n) \to H^k(T^{n-1})$, which we’ll denote by $r$. It follows that the image of the difference map is the antidiagonal copy of $H^k(T^{n-1})$ in $H^k(T^{n-1}) \oplus H^k(T^{n-1})$, hence that the connecting map $H^k(A \cap B) \to H^{k+1}(T^n)$ factors through the quotient map from $H^k(T^{n-1}) \oplus H^k(T^{n-1})$ to $H^k(T^{n-1})$, and $H^{k+1}(T^n)$ contains a copy of $H^k(T^{n-1})$ given by the image of the connecting map.

It also follows that the kernel of the difference map is the diagonal copy of $H^k(T^{n-1})$, hence that $r$ is surjective. Finally, the image of the connecting map is the kernel of $(r, r)$, hence the quotient of $H^k(T^n)$ by this copy of $H^{k-1}(T^{n-1})$ is $H^k(T^{n-1})$. In other words, we have short exact sequences

$$0 \to H^{k-1}(T^{n-1}) \to H^k(T^n) \to H^k(T^{n-1}) \to 0.$$

But inductively it will turn out that all the groups involved are free abelian, so all of these exact sequences split. In fact, inducting on the above relation it follows that the Poincaré polynomials

$$P_n(t) = \sum_{k \geq 0} \text{rank}(H^k(T^n)) \, t^k$$

satisfy $P_n(t) = (1 + t) P_{n-1}(t)$ and $P_0(t) = 1$, hence

$$P_n(t) = (1 + t)^n.$$

So by induction we conclude that $H^k(T^n) \cong \mathbb{Z}^{\binom{n}{k}}$. Note that we have not computed the cup product structure.
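The recursion satisfied by the Poincaré polynomials is easy to check against the binomial coefficients:

```python
from math import comb

def poincare(n):
    """Coefficient list of P_n(t), computed by the Mayer-Vietoris
    recursion P_n(t) = (1 + t) * P_{n-1}(t), starting from P_0(t) = 1."""
    p = [1]
    for _ in range(n):
        # Multiply the polynomial by (1 + t): shift-and-add coefficients.
        p = [a + b for a, b in zip(p + [0], [0] + p)]
    return p

# rank H^k(T^n) = C(n, k), i.e. P_n(t) = (1 + t)^n.
for n in range(6):
    assert poincare(n) == [comb(n, k) for k in range(n + 1)]
```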

**Method 2: the Künneth formula**

This method will compute the cup product structure. $T^n$ is the product of $n$ copies of $S^1$, whose cohomology as a ring is the exterior algebra $\Lambda_{\mathbb{Z}}[x]$ on one generator $x$ of degree $1$; there are no interesting cup products. By the Künneth formula, the cohomology of $T^n$ is the graded tensor product, as algebras, of $n$ copies of $\Lambda_{\mathbb{Z}}[x]$ (since all of the cohomology groups involved are free). This is precisely the exterior algebra $\Lambda_{\mathbb{Z}}[x_1, \dots, x_n]$, with each generator in degree $1$. In particular, $H^k(T^n) \cong \Lambda^k(\mathbb{Z}^n)$ naturally, and under this isomorphism the cup product corresponds to the wedge product.

**Method 3: de Rham cohomology**

This method will compute the cohomology over $\mathbb{R}$ by computing the de Rham cohomology of $T^n$. One particularly nice way to do this is to use the following.

**Theorem:** Let $G$ be a compact connected Lie group acting on a smooth manifold $M$. The inclusion of invariant differential forms into differential forms is a quasi-isomorphism (induces an isomorphism on cohomology).

The idea behind this result is that, since $G$ is compact, there is an averaging operator given by averaging over the action of $G$ with respect to normalized Haar measure on $G$. But since $G$ is connected, the action of any individual element of $G$ is homotopic to the identity, so this average is also homotopic to the identity.

In particular, letting $T^n$ act on itself by translation, we conclude that we can compute its de Rham cohomology using translation-invariant differential forms on $T^n$, or equivalently on its universal cover $\mathbb{R}^n$. But these are precisely the constant-coefficient differential forms obtained by wedging together the $1$-forms $dx_1, \dots, dx_n$. The exterior derivative vanishes on all such forms, so we conclude that the de Rham cohomology of $T^n$ is the exterior algebra on $dx_1, \dots, dx_n$.

**Method 4: Hopf algebras**

This method will compute the cohomology over $\mathbb{Q}$. Since $T^n$ is a topological group, it’s equipped with a product operation $m : T^n \times T^n \to T^n$. The induced map in cohomology has the form

$$m^* : H^\bullet(T^n, \mathbb{Q}) \to H^\bullet(T^n, \mathbb{Q}) \otimes H^\bullet(T^n, \mathbb{Q})$$

by the Künneth formula. This map is coassociative and compatible with cup product, so equips $H^\bullet(T^n, \mathbb{Q})$ with the structure of a bialgebra. Together with the map induced by the inversion map $g \mapsto g^{-1}$ and the identity $e \in T^n$, the cohomology of $T^n$ acquires the structure of a Hopf algebra, and in fact this was Hopf’s motivation for introducing Hopf algebras. Hopf algebras arising in this way satisfy the following very stringent structure theorem.

**Theorem (Hopf):** Let $H$ be a finite-dimensional graded commutative and cocommutative Hopf algebra over a field $k$ of characteristic zero such that $H^0 \cong k$ (the Hopf algebra is connected). Then $H$ is the exterior algebra on a finite collection of generators of odd degrees.

The comultiplication sends each generator $x$ to $x \otimes 1 + 1 \otimes x$, the antipode sends each generator to $-x$, and the counit sends each generator to $0$.

To compute the cohomology of $T^n$ it therefore suffices to determine what the possible generators of the exterior algebra are. For starters, let’s write $T^n$ more abstractly as $V/L$ where $V$ is a finite-dimensional real vector space of dimension $n$ and $L$ is a lattice in $V$ of full rank (the subgroup generated by a basis of $V$). Covering space theory gives us that $\pi_1(T^n) \cong L$. By the Hurewicz theorem, $H_1(T^n) \cong L$, so by the universal coefficient theorem,

$$H^1(T^n, \mathbb{Q}) \cong \text{Hom}(L, \mathbb{Q}) \cong \mathbb{Q}^n.$$

This gives us $n$ generators of degree $1$, one for each element of a basis of $L$, and so at the very least the cohomology contains the exterior algebra on these generators. But now we’re done: the cohomology can’t contain any generators of higher degree because wedging them with the generators we’ve already found would produce nonzero elements of the cohomology of $T^n$ in degrees higher than $n$, and no such elements exist (either because $T^n$ admits a CW-decomposition involving cells of dimension at most $n$ or because the de Rham complex only extends up to degree $n$ for a smooth manifold of dimension $n$).

**Method 5: suspension**

Recall that cohomology is a stable invariant in the sense that

$$\tilde{H}^k(X) \cong \tilde{H}^{k+1}(\Sigma X)$$

where $\Sigma X$ is the (reduced) suspension of $X$ (here a pointed space). Recall also that for nice pointed spaces $X, Y$ the suspension of a product has homotopy type

$$\Sigma (X \times Y) \simeq \Sigma X \vee \Sigma Y \vee \Sigma (X \wedge Y)$$

where $\vee$ is the wedge sum and $\wedge$ is the smash product. Finally, recall that $S^n \wedge S^m \cong S^{n+m}$ and that $\Sigma S^n \simeq S^{n+1}$, so $\Sigma (S^1 \times S^1) \simeq S^2 \vee S^2 \vee S^3$.

Two spaces $X, Y$ are said to be stably homotopy equivalent if $\Sigma^k X \simeq \Sigma^k Y$ for some $k$; in particular, stably homotopy equivalent spaces have isomorphic cohomology. The above result tells us that $S^1 \times S^1$ is stably homotopy equivalent to $S^1 \vee S^1 \vee S^2$ (once we know that suspension commutes with wedge sums). More generally, by induction we conclude that a product $X_1 \times \cdots \times X_n$ is stably homotopy equivalent to a wedge obtained formally by expanding

$$(X_1 \vee 1) \wedge (X_2 \vee 1) \wedge \cdots \wedge (X_n \vee 1),$$

where $1 = S^0$ denotes the unit of the smash product, and removing the unit. It follows that $T^n$ is stably homotopy equivalent to a wedge of $\binom{n}{k}$ copies of the $k$-sphere $S^k$, $1 \leq k \leq n$, and by a simple application of Mayer-Vietoris (for wedge sums), the cohomology of such a wedge is the same as what we’ve computed before.
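The count of wedge summands in the formal expansion is immediate: each nonempty subset of the $n$ smash factors picks the $S^1$ term in those coordinates (and the unit elsewhere), smashing to a single sphere. A quick check (hypothetical helper name):

```python
from itertools import combinations
from math import comb

def sphere_summands(n):
    """Expand the n-fold smash of (S^1 ∨ S^0) formally: each nonempty
    subset of k coordinates choosing the S^1 factor smashes to one S^k."""
    return {k: sum(1 for _ in combinations(range(n), k))
            for k in range(1, n + 1)}

counts = sphere_summands(4)
assert counts == {1: 4, 2: 6, 3: 4, 4: 1}   # C(4, k) copies of S^k
assert all(counts[k] == comb(4, k) for k in counts)
```

This matches the Betti numbers computed by the other methods, as it must.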

This argument does not get us the cup product structure, since the cup product is an unstable phenomenon; after suspension, all cup products are trivial. However, it does describe the stable homotopy type of $T^n$, which contains information that cohomology doesn’t (e.g. about stable homotopy groups).

**Method 6: cellular homology**

To compute the cohomology of $T^n$ it suffices to compute the homology and apply either universal coefficients or Poincaré duality. It is possible to describe fairly concretely what the homology of $T^n$ looks like using cellular homology. Recall that cellular homology describes a chain complex computing the homology of a CW-complex $X$ which in degree $k$ is free abelian on the $k$-cells in a cell decomposition of $X$. Our particular $X = T^n$ admits a cell decomposition with $\binom{n}{k}$ cells of dimension $k$ given by starting with the minimal cell decomposition of $S^1$ into two cells (a $0$-cell and a $1$-cell connecting the $0$-cell to itself) and taking products, where we’re thinking of cubical cells here. Equivalently, we can think of $T^n$ as the cube $[0, 1]^n$ with opposite $(n-1)$-faces identified, and then our cells are the faces of the cube up to this identification.

The boundary maps in the cellular complex are as follows. If $e$ is a $k$-cell and $\varphi : S^{k-1} \to X^{k-1}$ its attaching map (where $X^{k-1}$ here denotes the $(k-1)$-skeleton of $X$), then the differential is

$$\partial e = \sum_f \deg(p_f \circ \varphi) \, f$$

where $f$ runs over an enumeration of all $(k-1)$-cells, $\deg$ denotes the degree, and $p_f : X^{k-1} \to S^{k-1}$ is the map induced by collapsing all of $X^{k-1}$ except the cell $f$ to a point.

In this particular case all of the boundary maps in the cellular complex are trivial, so the homology is free abelian on cells. To see this, note that if $p_f \circ \varphi$ is not surjective, then it necessarily has degree $0$ since it is null-homotopic, so we reduce to the surjective case. In this case the $(k-1)$-cell $f$ must be a face of the $k$-cell $e$, and since we’ve collapsed everything else we can reduce to the case that $X = T^k$, so that $e$ is the top-dimensional cell. At this point we will cheat a little: if $\partial e \neq 0$ in this case, then we would have $H_k(T^k; \mathbb{Z}) = 0$, but $T^k$ is a compact orientable manifold and therefore must satisfy $H_k(T^k; \mathbb{Z}) \cong \mathbb{Z}$.
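Since all differentials vanish, the cellular chain complex immediately yields the Betti numbers: the $k$-th homology is free abelian on the $k$-cells. A short sketch of this bookkeeping (the function name is ours):

```python
from math import comb

def torus_betti_numbers(n):
    """With every differential in the cellular chain complex of the
    n-torus vanishing (as argued above), the k-th homology is free
    abelian on the binomial(n, k) cubical k-cells."""
    return [comb(n, k) for k in range(n + 1)]

betti = torus_betti_numbers(4)
print(betti)       # [1, 4, 6, 4, 1]
print(sum(betti))  # 16, the 2**4 cells in total
```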

In particular, the cell decomposition we gave above is minimal: it is not possible to give a cell decomposition with fewer cells. In addition, by Poincaré duality the cohomology can also be thought of as free abelian on cells, and moreover we can describe the cup product in terms of transverse intersections of submanifolds representing homology classes. We can do this by explicitly intersecting the cells above, but the following description is perhaps more elegant: if we think of $T^n$ as $\mathbb{R}^n / \mathbb{Z}^n$, then the image of a rational subspace $V \subseteq \mathbb{R}^n$ represents a homology class if it is translation-invariant (given by the pushforward of a fundamental class). The images of two such subspaces $V, W$ intersect transversely if $V + W = \mathbb{R}^n$, and then their intersection represents a homology class which Poincaré dualizes to the cup product of the Poincaré duals of $V, W$. In particular, note that the short exact sequence

$$0 \to V \cap W \to V \oplus W \to \mathbb{R}^n \to 0$$

implies that $\dim(V \cap W) = \dim V + \dim W - n$. Its Poincaré dual is therefore a class in $H^{(n - \dim V) + (n - \dim W)}(T^n)$, which has the correct degree.

**Method ???: the Lefschetz fixed point theorem**

This method is not numbered because the argument is incomplete. Consider the map

$$f : T^n \to T^n, \quad f(z_1, \dots, z_n) = (z_1^{q_1}, \dots, z_n^{q_n})$$

where each $q_i$ is a positive integer at least $2$. This map has $\prod_i (q_i - 1)$ fixed points, since in each coordinate the fixed points of $z \mapsto z^{q_i}$ are precisely the $(q_i - 1)$st roots of unity. Each fixed point has index $(-1)^n$. By the Lefschetz fixed point theorem it follows that

$$\prod_{i=1}^n (1 - q_i) = \sum_{k=0}^n (-1)^k \operatorname{tr}\left( f^* \mid H^k(T^n; \mathbb{Q}) \right).$$

Knowing what we already know about the cohomology, it is tempting to identify a monomial $\pm \prod_{i \in S} q_i$ on the LHS with a cohomology class on the RHS on which $f^*$ acts by multiplication by that monomial. We can do this as follows. For any subset $S \subseteq \{1, \dots, n\}$ of indices we have a projection map $T^n \to T^S$. Since $T^S$ is a compact orientable manifold, it has a fundamental class generating its top cohomology. The map $f$ induces a map on $T^S$ such that any point has $\prod_{i \in S} q_i$ preimages, hence has degree $\prod_{i \in S} q_i$ as a map on $T^S$, so $f^*$ acts on the fundamental class by multiplication by $\prod_{i \in S} q_i$. This action induces an action on the pullback of the fundamental class of $T^S$ to $T^n$ which is also by multiplication by $\prod_{i \in S} q_i$.

As the $q_i$ vary this argument shows that the cohomology classes arising in this way are all linearly independent, hence all contribute to the RHS of the Lefschetz fixed point theorem. The sum of the corresponding contributions to the RHS exhausts all terms on the LHS, so if there is any more cohomology to be found then it isn’t being detected by $f^*$.
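We can check the combinatorics behind this numerically: for the map $(z_1, \dots, z_n) \mapsto (z_1^{q_1}, \dots, z_n^{q_n})$, the fixed point count is $\prod_i (q_i - 1)$, while the alternating sum over subsets $S$ of the monomials $\prod_{i \in S} q_i$ collapses to $\prod_i (1 - q_i)$, which differs from the fixed point count by the sign $(-1)^n$. A small sketch (function names and the sign conventions are as assumed here):

```python
from itertools import combinations
from math import prod

def fixed_points(qs):
    # (z_1, ..., z_n) -> (z_1^{q_1}, ..., z_n^{q_n}) fixes the
    # (q_i - 1)st roots of unity in each coordinate independently
    return prod(q - 1 for q in qs)

def lefschetz_number(qs):
    # f* multiplies the class dual to a size-k subset S of the
    # coordinates by prod_{i in S} q_i; take the alternating trace sum
    n = len(qs)
    return sum(
        (-1) ** k * prod(qs[i] for i in S)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    )

qs = (2, 3, 5)
print(fixed_points(qs), lefschetz_number(qs))  # 8 -8
```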

**Method ???: Morse theory**

There is a convenient choice of Morse function on $T^n \cong (\mathbb{R}/\mathbb{Z})^n$ given by

$$f(x_1, \dots, x_n) = \sum_{i=1}^n \cos(2\pi x_i).$$

The gradient of this function is $\nabla f = (-2\pi \sin(2\pi x_1), \dots, -2\pi \sin(2\pi x_n))$, and in particular it vanishes iff $x_i \in \{0, \frac{1}{2}\}$ for all $i$. There are therefore $2^n$ critical points, organized in batches of $\binom{n}{k}$ critical points such that $k$ coordinates are equal to $0$ and $n - k$ coordinates are equal to $\frac{1}{2}$. At such a point the second derivatives of each term are equal to $-4\pi^2$ (at the coordinates equal to $0$) and $4\pi^2$ (at the coordinates equal to $\frac{1}{2}$), with no other contributions to the second-order Taylor series expansion of $f$, so all critical points are nondegenerate (hence we do in fact have a Morse function) with index $k$. Morse theory then guarantees that $T^n$ has the homotopy type of a CW-complex with $\binom{n}{k}$ cells of dimension $k$.

This argument should be placed in the context of a Morse-theoretic proof of the Künneth formula; more generally, if $M_1, M_2$ are manifolds with Morse functions $f_1, f_2$, then $f_1 + f_2$ is a Morse function on the product $M_1 \times M_2$, and critical points of $f_1 + f_2$ are precisely products of critical points of the $f_i$, and so forth.

With more effort Morse theory even provides a complex computing the homology, but I wasn’t able to easily compute the differentials in it (they should all vanish in this case).
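A small enumeration illustrating this count, assuming (as sketched above) the Morse function $\sum_i \cos(2\pi x_i)$, so that critical points have coordinates in $\{0, \frac{1}{2}\}$ and the index is the number of coordinates at the maximum of cosine:

```python
from itertools import product
from math import comb

# Assumption from the text: f(x) = sum_i cos(2 pi x_i) on R^n / Z^n.
# Its gradient vanishes exactly when every coordinate is 0 or 1/2,
# and the index of a critical point is the number of coordinates
# equal to 0 (where cos(2 pi x) is maximized).
def critical_points_by_index(n):
    counts = {k: 0 for k in range(n + 1)}
    for point in product((0.0, 0.5), repeat=n):
        index = sum(1 for x in point if x == 0.0)
        counts[index] += 1
    return counts

counts = critical_points_by_index(4)
print(counts)  # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
print(all(counts[k] == comb(4, k) for k in counts))  # True
```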

**Interpretation**

Our computations admit the following interpretation. Recall that $S^1 \simeq K(\mathbb{Z}, 1)$ is the Eilenberg-MacLane space representing integral cohomology in the sense that there is a natural isomorphism $H^1(X; \mathbb{Z}) \cong [X, S^1]$, where $[X, Y]$ denotes the set of homotopy classes of maps $X \to Y$ (or weak homotopy classes if $X, Y$ are not CW-complexes). It follows that $T^n \cong (S^1)^n$ represents $n$-tuples of cohomology classes in $H^1$. By the Yoneda lemma, cohomology classes in $H^k(T^n; \mathbb{Z})$, or equivalently homotopy classes of maps $T^n \to K(\mathbb{Z}, k)$, can naturally be identified with natural transformations

$$H^1(-; \mathbb{Z})^n \to H^k(-; \mathbb{Z}).$$

Such natural transformations between cohomology functors are called cohomology operations, and the computations we did above imply that the only cohomology operations of this form are generated by cup products under addition. (“Interesting” cohomology operations over $\mathbb{Z}$, not generated by addition and the cup product, require higher cohomology classes as input. The smallest one is a cohomology operation; see this math.SE question.)


Omega places before you two opaque boxes. Box A, it informs you, contains $1,000. Box B, it informs you, contains either $1,000,000 or nothing. You must decide whether to take only Box B or to take both Box A and Box B, with the following caveat: Omega filled Box B with $1,000,000 if and only if it predicted that you would take only Box B.

What do you do?

(If you haven’t heard this problem before, please take a minute to decide on an option before continuing.)

**The paradox**

The paradox is that there appear to be two reasonable arguments about which option to take, but unfortunately the two arguments support opposite conclusions.

The **two-box** argument is that you should clearly take both boxes. You take Box B either way, so the only decision you’re making is whether to also take Box A. No matter what Omega did before offering the boxes to you, Box A is guaranteed to contain $1,000, so taking it is guaranteed to make you $1,000 richer.

The **one-box** argument is that you should clearly take only Box B. By hypothesis, if you take only Box B, Omega will predict that and will fill Box B, so you get $1,000,000; if you take both boxes, Omega will predict that and won’t fill Box B, so you only get $1,000.

The two-boxer might respond to the one-boxer as follows: “it sounds like you think a decision you make in the present, at the moment Omega offers you the boxes, will affect what Omega did in the past, at the moment Omega filled the boxes. That’s absurd.”

The one-boxer might respond to the two-boxer as follows: “it sounds like you think you can just make decisions without Omega predicting them. But by hypothesis he can predict them. That’s absurd.”

Now what do you do?

(Again, please take a minute to reassess your original choice before continuing.)

**The von Neumann-Morgenstern theorem**

Let’s avoid the above question entirely by asking some other questions instead. For example, a question one might want to ask after having thought about Newcomb’s paradox for a bit is “in general, how should I think about the process of making decisions?” This is the subject of **decision theory**, which is roughly about decisions in the same sense that game theory is about games. The things that make decisions in decision theory are abstractions that we will refer to as **agents**. Agents have some preferences about the world and are making decisions in an attempt to satisfy their preferences.

One model of preferences is as follows: there is a set of (mutually exclusive) **outcomes**, and we will model preferences by a binary relation $\preceq$ on outcomes, where $A \preceq B$ means that the agent **weakly prefers** $B$ to $A$. This means either that in a decision between the two the agent would pick $B$ over $A$ (the agent **strictly prefers** $B$ to $A$; we write this as $A \prec B$) or that the agent is indifferent between them. The weak preference relation should be a total preorder; that is, it should satisfy the following axioms:

- **Reflexivity:** $A \preceq A$. (The agent is indifferent between an outcome and itself.)
- **Transitivity:** If $A \preceq B$ and $B \preceq C$, then $A \preceq C$. (The agent’s preferences are transitive.)
- **Totality:** Either $A \preceq B$ or $B \preceq A$. (The agent has a preference about every pair of outcomes.)

If $A \preceq B$ and $B \preceq A$ then this means that the agent is indifferent between the two outcomes; we write this as $A \sim B$. The axioms above imply that indifference is an equivalence relation.

The strong assumptions here are transitivity and totality. One reason totality is a reasonable axiom is that an agent whose preferences aren’t total may be incapable of making a decision if presented with a choice between two outcomes the agent doesn’t have a defined preference between, and this seems undesirable. For example, if we were trying to write a program to make medical decisions, we wouldn’t want the program to crash if faced with the wrong kind of medical crisis.

One reason transitivity is a reasonable axiom is that an agent whose preferences aren’t transitive can be **money pumped**. For example, if an agent strictly prefers apples to oranges, oranges to bananas, and bananas to apples, then I can offer the agent an apple, then offer to trade it a banana for the apple and a penny (say), then offer to trade it an orange for the banana and a penny (say), and so forth. Again, if we were trying to write a program to make important decisions of some kind, this kind of vulnerability would be very dangerous.
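The money pump can be simulated directly; here is a toy sketch (the fruits and penny amounts are illustrative) in which an agent with cyclic strict preferences accepts every trade to a strictly preferred fruit at the cost of one penny, and ends up back where it started but poorer:

```python
# Cyclic strict preferences: apple > orange > banana > apple.
# (left, right) in prefers means the agent strictly prefers left to right.
prefers = {("apple", "orange"), ("orange", "banana"), ("banana", "apple")}

def accepts_trade(offered, held):
    # the agent pays a penny whenever the offered fruit is strictly preferred
    return (offered, held) in prefers

holding, pennies = "apple", 100
for offered in ["banana", "orange", "apple"] * 10:
    if accepts_trade(offered, holding):
        holding, pennies = offered, pennies - 1

print(holding, pennies)  # apple 70 -- same fruit, 30 pennies poorer
```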

In this model, an agent makes decisions as follows. Each time it makes a decision, it must choose from some number of actions. It needs to determine what outcomes result from each of these actions. Then it needs to determine which of these outcomes is greatest in its preference ordering, and it selects the corresponding action.

This is very unsatisfying as a model of decision making because it fails to take into account uncertainty. In practice, agents making decisions cannot completely determine what outcomes result from their actions: instead, they have some uncertainty about possible outcomes, and that uncertainty should be factored into the decision-making process. We will take uncertainty into account as follows. Define a **lottery** over outcomes to be a formal linear combination

$$p_1 A_1 + p_2 A_2 + \cdots + p_k A_k$$

of outcomes, where the $p_i$ are nonnegative real numbers summing to $1$ and $p_i$ should be interpreted as the probability that the outcome $A_i$ occurs. (Equivalently, a lottery is a particularly simple kind of probability measure on the space of outcomes, which is given the discrete $\sigma$-algebra as a measurable space, but we will not need to use this language.) We now want our agent to have preferences over lotteries rather than preferences over outcomes. That is, the agent’s preferences are now modeled by a total preorder on lotteries.

Aside from the axioms defining a total preorder, what other axioms seem reasonable? First, suppose that $A, B$ are two lotteries such that $A \preceq B$. Now consider the modified lotteries $tA + (1-t)C$ and $tB + (1-t)C$, where with probability $t$ the original lotteries occur but with probability $1 - t$ some other fixed lottery $C$ occurs. Whether we are in the first case or not, we either prefer or are indifferent to what happens in the second lottery, so the following seems reasonable.

**Independence:** If $A \preceq B$, then for all $t \in [0, 1]$ and all lotteries $C$ we have $tA + (1-t)C \preceq tB + (1-t)C$. Moreover, if $A \prec B$ and $t \neq 0$ then $tA + (1-t)C \prec tB + (1-t)C$.

Note that by taking the contrapositive of the second part of independence we get a partial converse of the first part: if $t \neq 0$ is such that $tA + (1-t)C \preceq tB + (1-t)C$, then $A \preceq B$. In particular, if $tA + (1-t)C \sim tB + (1-t)C$, then $A \sim B$. This will be useful later.

Another reasonable axiom is the following. Suppose $A \preceq B \preceq C$ are three lotteries. Now consider the family of lotteries $tA + (1-t)C$. When $t = 0$ the agent weakly prefers this lottery to $B$, but when $t = 1$ the agent weakly prefers $B$ to this lottery. What happens for intermediate values of $t$? It seems reasonable for an “intermediate value theorem” to hold here: the agent’s preferences should not jump as $t$ varies. So the following seems reasonable.

**Continuity:** If $A \preceq B \preceq C$, then there exists some $t \in [0, 1]$ such that $tA + (1-t)C \sim B$.

With these axioms we can now state the following foundational theorem.

**Theorem (von Neumann-Morgenstern):** Suppose an agent’s preferences satisfy the above axioms. Then there exists a function $u$ on outcomes, the **utility function** of the agent, such that $p_1 A_1 + \cdots + p_k A_k \preceq q_1 B_1 + \cdots + q_m B_m$ if and only if

$$\sum_i p_i u(A_i) \le \sum_j q_j u(B_j)$$

where the $p_i$ and $q_j$ are the probabilities of the outcomes $A_i$ and $B_j$. The utility function is unique up to affine transformations $u \mapsto au + b$ where $a > 0$.

If $L = p_1 A_1 + \cdots + p_k A_k$ is a lottery, the corresponding sum $\sum_i p_i u(A_i)$ is the **expected utility** with respect to the lottery, so the von Neumann-Morgenstern theorem allows us to describe the goal of an agent (a **VNM-rational agent**) satisfying the above axioms as maximizing expected utility.
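In code, the decision rule of a VNM-rational agent is just: compute expected utilities and compare. A minimal sketch (the representation of lotteries as probability-outcome pairs is ours):

```python
# A lottery is a list of (probability, outcome) pairs; the VNM theorem
# says a rational agent ranks lotteries by expected utility.
def expected_utility(lottery, u):
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9
    return sum(p * u(outcome) for p, outcome in lottery)

u = {"win": 10.0, "draw": 1.0, "lose": 0.0}.get

safe = [(1.0, "draw")]
gamble = [(0.5, "win"), (0.5, "lose")]

# The agent weakly prefers whichever lottery has higher expected utility.
print(expected_utility(safe, u), expected_utility(gamble, u))  # 1.0 5.0
```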

*Proof.* First observe that we can reduce to the case that the set of outcomes is finite. If the theorem were false in the infinite case, then for any proposed utility function $u$ we would be able to find a pair of lotteries $L_1, L_2$ witnessing the failure. But since $L_1, L_2$ in total only involve finitely many outcomes, $u$ restricts to a utility function with the same property on the finitely many outcomes involved in $L_1, L_2$, so the theorem is false in the finite case.

Now for the proof. It is possible to take a fairly concrete but tedious approach by first constructing using continuity and then proving that satisfies the conclusions of the theorem by induction. We will instead take a more abstract approach by appealing to the hyperplane separation theorem. To start with, think of the set of lotteries as sitting inside Euclidean space as the probability simplex . Let be outcomes which are minimal resp. maximal in the agent’s preference ordering. For , let .

We would like to show that the subset

(of lotteries the agent strictly prefers to , but strictly prefers to) and the subset

(of lotteries the agent strictly prefers to but strictly prefers to ) are disjoint convex open subsets of . That they are disjoint follows from the definition of strict preference. That they are convex can be seen as follows: if are two lotteries such that , then by independence we have

for all , hence for all . Applying this argument with and then applying the argument with reversed inequality signs, first with general and then with , gives the desired result.

Finally, that they are open can be seen as follows: let be a lottery such that . By inspection every point in an open ball around has the form where is some other lottery, which can be taken to be either a lottery equivalent to (in that the agent is indifferent between them) or a lottery such that . So it suffices by convexity to show that for any such there exists some such that .

In the case that can be taken to be equivalent to this is straightforward; by independence

.

In the case that can be taken to satisfy , a similar application of independence gives

.

Again, applying the argument with and then applying the argument with reversed inequality signs, first with general and then with , gives the desired result.

Now by the hyperplane separation theorem there exists a hyperplane separating and , where are constants. These constants are in fact independent of and are (up to affine transformation, and in particular we may need to flip their signs) the utility function we seek. To see this, let be two lotteries. Then by independence , and by continuity there is a constant such that

.

If , then , and the separating hyperplane must pass through both and (since are in neither nor , and the complement of their union consists of lotteries equivalent to ), so they have the same utility. Conversely, if a separating hyperplane passes through two lotteries then they must be equivalent to the same and hence must be equivalent.

Otherwise, , and the separating hyperplane separates and . With the correct choice of signs, it follows that as desired. Conversely, if a separating hyperplane separates two lotteries then they cannot have the same expected utility and hence cannot be equivalent; with the correct choice of signs, if then .

It remains to address the uniqueness claim. The above discussion shows that the utility function is uniquely determined by its value on and , subject to the constraint that . To fix the correct choice of signs above we may set ; any other choice is related to this choice by a unique positive affine linear transformation.

**But what about the paradox?**

The relevance of the von Neumann-Morgenstern theorem to Newcomb’s paradox is that a particular interpretation of Newcomb’s paradox in the context of expected utility maximization supports the one-box argument. A VNM-rational agent participating in Newcomb’s paradox should be acting in order to maximize expected utility. For the purposes of recasting Newcomb’s paradox in this framework, it’s reasonable to equate utility with money; agents certainly don’t need to have the property that their utility functions are linear in money, but Newcomb’s paradox can just be restated in units of utility (**utilons**) rather than money.

So, it remains to determine the expected utility of the lottery that occurs if the agent takes one box and the lottery that occurs if the agent takes two boxes. Newcomb’s paradox can be interpreted as saying that in the first lottery, the box contains $1,000,000 with high probability (whatever probability the agent assigns to Omega being an accurate predictor), while in the second lottery, the two boxes together contain $1,000 with high probability. Provided that this probability is sufficiently high, which again can be absorbed into a suitable restatement of Newcomb’s paradox, it seems clear that a VNM-rational agent should take one box. (Note that stating the one-box argument in this way shows that it does not depend on Omega being a perfect predictor; Omega need only be a sufficiently good predictor, where the meaning of “sufficiently” depends on the ratio of the amounts of money in each box.)
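The threshold here can be computed explicitly: writing $p$ for the probability the agent assigns to a correct prediction, one-boxing has higher expected value exactly when $10^6 \cdot p > 10^3 \cdot p + 1{,}001{,}000 \cdot (1 - p)$, i.e. when $p > 0.5005$. A sketch:

```python
def one_box_ev(p):
    # correct prediction -> Box B full -> $1,000,000; wrong -> $0
    return p * 1_000_000 + (1 - p) * 0

def two_box_ev(p):
    # correct prediction -> Box B empty -> $1,000; wrong -> $1,001,000
    return p * 1_000 + (1 - p) * 1_001_000

# one_box_ev(p) > two_box_ev(p) simplifies to 2,000,000 p > 1,001,000
threshold = 1_001_000 / 2_000_000
print(threshold)                          # 0.5005
print(one_box_ev(0.9) > two_box_ev(0.9))  # True
```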

This version of the one-box argument is therefore based on the **principle of expected utility** (to be distinguished from the von Neumann-Morgenstern theorem); roughly speaking, that rational agents should act so as to maximize expected utility. Relative to the definition of expected utility given above this says exactly that rational agents should be VNM-rational.

The two-box argument can also be based on a decision-making principle, namely the **principle of dominance**, which says the following. Suppose an agent is choosing between two options. Say that one option **dominates** the other if there is a way to partition the possible states of the world such that, in each part of the partition, the agent prefers the outcome of the first option to the outcome of the second. (The notion of domination does not depend on having a notion of probability distribution over world states; it requires something much weaker, namely a set of possible world states.) The principle of dominance asserts that rational agents should choose dominant options.

This seems plausible. But it also seems to be the case that taking two boxes dominates taking one box in Newcomb’s paradox:

- If Omega has filled Box B with $1,000,000, then taking both boxes gives you $1,001,000 rather than $1,000,000, so it’s $1,000 better.
- If Omega hasn’t filled Box B with $1,000,000, then taking both boxes gives you $1,000 rather than $0, so it’s still $1,000 better.

One situation in which the principle of dominance doesn’t make sense is if the choice between options itself affects which partition of world-states you’re in. For example, if you chose which boxes to open and then Omega chose whether to fill Box B based on your choice, then the above reasoning doesn’t seem to apply since Omega gets to choose which partition of world-states you’re in after seeing your choice between the two options. But in the setting of Newcomb’s paradox itself this doesn’t seem to be the case: Omega has already made its decision in the past, and it seems absurd to think of the agent’s decision in the present as having an effect on Omega’s past decision.

So Newcomb’s paradox appears to show that the principle of expected utility maximization and the principle of dominance are inconsistent.

Now what do you do?

**Further reading**

Newcomb’s paradox remains, as far as I can tell, a hotly debated topic in the philosophical literature, and in particular is considered unresolved. Campbell and Sowden’s *Paradoxes of Rationality and Cooperation* is a thorough, if somewhat outdated, overview of some aspects of Newcomb’s paradox and its relationship to the prisoner’s dilemma.


Let $X$ be a set and let $f : X \times X \to 2 = \{0, 1\}$ be a function. Then we can write down a function $g : X \to 2$ such that $g(x) = 1 - f(x, x)$. If we **curry** $f$ to obtain a function

$$\tilde{f} : X \to 2^X,$$

it now follows that there cannot exist $x_0 \in X$ such that $\tilde{f}(x_0) = g$, since $\tilde{f}(x_0)(x_0) = f(x_0, x_0) \neq g(x_0)$.

Currying is a fundamental notion. In mathematics, it is constantly implicitly used to talk about function spaces. In computer science, it is how some programming languages like Haskell describe functions which take multiple arguments: such a function is modeled as taking one argument and returning a function which takes further arguments. In type theory, it reproduces function types. In logic, it reproduces material implication.
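In a language like Python, where multi-argument functions are not curried by default (unlike Haskell), currying and uncurrying can be written out explicitly. A minimal sketch:

```python
# Currying: a function of two arguments becomes a function of one
# argument that returns another one-argument function.
def curry(f):
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    return lambda x, y: g(x)(y)

add = lambda x, y: x + y
add3 = curry(add)(3)          # partially applied: "add 3 to things"
print(add3(4))                    # 7
print(uncurry(curry(add))(3, 4))  # 7, round-tripping recovers add
```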

Today we will discuss the appropriate categorical setting for understanding currying, namely that of cartesian closed categories. As an application of the formalism, we will prove the Lawvere fixed point theorem, which generalizes the argument behind Cantor’s theorem to cartesian closed categories.

**Some examples of mathematical currying**

*Example.* A group action of a group $G$ on a set $X$ is often described using a function $G \times X \to X$. Currying gives a function $G \to X^X$; in other words, it associates to every element $g \in G$ a function $X \to X$. It seems more natural to define a group action in this way, but what works in $\text{Set}$ may work less well in other categories; for example, when defining actions of Lie groups on manifolds, we talk about smooth functions $G \times M \to M$ because it is unclear in this setting in what sense the space of smooth functions $M \to M$ is a smooth manifold (hence in what sense we should be asking for smooth functions from $G$ into this space).

*Example.* A vector space $V$ is equipped with a dual pairing $V \times V^* \to k$. Currying gives a function $V \to k^{V^*}$, and the corresponding functions are in fact linear, so we can associate to every $v \in V$ an element of the double dual space $V^{**}$. In other words, currying gives us the double dual map $V \to V^{**}$. There is a similar map in the setting of Pontrjagin duality.

*Example.* A topological space $X$ is equipped with an evaluation map $X \times C(X) \to \mathbb{C}$, where here $C(X)$ denotes the space of continuous complex-valued functions on $X$. Currying gives a function $X \to \mathbb{C}^{C(X)}$ which associates to every $x \in X$ an evaluation map $C(X) \to \mathbb{C}$. When $X$ is compact Hausdorff, every homomorphism of complex algebras $C(X) \to \mathbb{C}$ has this form.

**Cartesian closed categories**

A **cartesian closed category** is a category with finite products in which the product functor $- \times B$ has a right adjoint for every object $B$, the **exponential** $[B, -]$. In other words, there is a natural identification

$$\operatorname{Hom}(A \times B, C) \cong \operatorname{Hom}(A, [B, C]).$$

The notation $[B, C]$ is nonstandard; a more conventional notation is $C^B$, but the notation $[B, C]$ (which is sometimes used for the more general notion of internal hom) emphasizes the fact that a cartesian closed category is in particular a closed monoidal category, and in particular is enriched over itself.

Letting $1$ be the terminal object, we get that there is a natural identification

$$\operatorname{Hom}(1, [B, C]) \cong \operatorname{Hom}(1 \times B, C) \cong \operatorname{Hom}(B, C).$$

In other words, the **global points** (morphisms from $1$, also just called **points**) of $[B, C]$ are naturally identified with the set of morphisms from $B$ to $C$.

More generally, the $A$-points of $[B, C]$, which by definition are naturally identified with $\operatorname{Hom}(A \times B, C)$, should be thought of as “$A$-parameterized families of morphisms from $B$ to $C$.”

Uncurrying the identity map $[B, C] \to [B, C]$, we obtain the **evaluation map**

$$\operatorname{ev} : [B, C] \times B \to C$$

describing, internally, how to evaluate functions on arguments. In computer science, this map is also called **apply**.

*Example.* $\text{Set}$ is cartesian closed, and is the basic example. Here the internal hom $[B, C]$ is the set of functions from $B$ to $C$ and the global points of a set are its points in the ordinary sense. The same applies to the category of finite sets.

*Example.* The category $\text{Cat}$ of (small) categories is cartesian closed. Here the product is the usual product of categories and the internal hom $[C, D]$ is the category of functors from $C$ to $D$, with morphisms given by natural transformations. The global points of a category are its objects.

*Subexample.* In $\text{Cat}$, the subcategory of groupoids is cartesian closed, since the product of groupoids and the functor category between two groupoids both remain groupoids. If $G, H$ are two groups regarded as one-object categories, the functor category $[G, H]$ is the groupoid whose objects are the morphisms $G \to H$ and whose morphisms are given by pointwise conjugation by elements of $H$. Note that the category of groups is not cartesian closed.

*Subexample.* In $\text{Cat}$, the subcategory of posets is cartesian closed, since the product of posets and the functor category between two posets both remain posets. If $P, Q$ are two posets, then $[P, Q]$ is the poset of order-preserving functions $P \to Q$ with $f \le g$ iff $f(x) \le g(x)$ for all $x \in P$.

*Example.* Let $G$ be a group. The category $G\text{-Set}$ is cartesian closed; it has a product inherited from $\text{Set}$, and exponential objects $[X, Y]$ are given by the set of all functions from $X$ to $Y$ together with the $G$-action

$$(g \cdot f)(x) = g f(g^{-1} x).$$

The global points of a $G$-set are its fixed points, and in particular the global points of $[X, Y]$ are the set of $G$-morphisms $X \to Y$.

*Example.* Any Boolean algebra, regarded as a poset and then regarded as a category, is cartesian closed. The product of two propositions $p, q$ is their logical “and” $p \wedge q$, and the exponential object $[p, q]$ is the material implication $p \Rightarrow q$. The currying adjunction

$$\operatorname{Hom}(p \wedge q, r) \cong \operatorname{Hom}(p, [q, r])$$

simply says that $p \wedge q$ implies $r$ if and only if $p$ implies $q \Rightarrow r$. The terminal object is the proposition “true,” and a proposition has a global point if and only if it is a tautology. The evaluation map is an internal description of modus ponens.
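The currying adjunction in this example (the classical law of exportation) can be verified by brute force over all truth assignments. A minimal sketch, with `implies` our name for material implication:

```python
from itertools import product

# In the Boolean algebra {False, True}, the exponential is material
# implication, and currying is the law of exportation:
# (a and b) -> c is equivalent to a -> (b -> c).
def implies(p, q):
    return (not p) or q

ok = all(
    implies(a and b, c) == implies(a, implies(b, c))
    for a, b, c in product([False, True], repeat=3)
)
print(ok)  # True, on all 8 truth assignments
```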

*Non-example.* It is an unfortunate fact about point-set topology that $\text{Top}$ is not cartesian closed (see, for example, this math.SE question). When it exists, the exponential $[X, Y]$ is often given the compact-open topology. This problem is fixed by working instead with a convenient category of topological spaces, such as the category of compactly generated spaces.

*Non-example.* Suppose a cartesian closed category has a zero object $0$ (an object that is both initial and terminal, so $0 \cong 1$). Since there is a unique morphism from $0 \cong 1$ to any other object, it follows that every exponential has a unique global point, hence that there is a unique morphism from any object to any other object (necessarily the zero morphism). Conversely, if a category has a zero object and a nonzero morphism, then it cannot be cartesian closed.

**Proposition:** In a cartesian closed category, products distribute over colimits in both variables, and the exponential $[B, C]$ sends colimits in $B$ to limits and preserves limits in $C$.

**Corollary:** If $C$ is a cartesian closed category with finite coproducts (a **bicartesian closed category**), then letting $+$ denote the coproduct, we have the following natural identifications:

- $a \times (b + c) \cong a \times b + a \times c$ (so $C$ is a distributive category),
- $[b + c, a] \cong [b, a] \times [c, a]$.

*Proof.* These all follow from the natural identifications

$$\operatorname{Hom}(a \times b, c) \cong \operatorname{Hom}(a, [b, c]) \cong \operatorname{Hom}(b, [a, c]).$$

In more detail, $a \times (-)$ is a left adjoint and hence preserves arbitrary colimits, $[a, -]$ is a right adjoint and hence preserves arbitrary limits, and $[-, c]$ is a (contravariant) right adjoint (to itself!) and hence, as a contravariant functor, sends colimits to limits.

Specialized to the cartesian closed category of finite sets, the above result explains from a categorical point of view the algebraic axioms satisfied by addition, multiplication, and exponentiation of non-negative integers.
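For finite sets these identities become familiar arithmetic facts, which we can confirm by literally counting functions (a small sketch; the sizes are illustrative):

```python
from itertools import product

def num_functions(dom_size, cod_size):
    # count functions by enumerating all assignments of codomain
    # elements to domain elements; there are cod_size ** dom_size
    return sum(1 for _ in product(range(cod_size), repeat=dom_size))

a, b, c = 2, 3, 4
# Currying Hom(A x B, C) ~ Hom(A, C^B): c^(a*b) == (c^b)^a
print(num_functions(a * b, c) == num_functions(a, c ** b))  # True
# C^(A + B) ~ C^A x C^B: c^(a+b) == c^a * c^b
print(num_functions(a + b, c) == num_functions(a, c) * num_functions(b, c))  # True
```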

**Corollary:** Let $C$ be a category. Then the category $\widehat{C} = \operatorname{Fun}(C^{op}, \text{Set})$ of presheaves on $C$ is cartesian closed.

This greatly generalizes the example of ; we get, for example, a version of the category of graphs and the category of simplicial sets as special cases.

*Proof.* Products are easy to construct, since limits are computed pointwise. To construct exponentials, suppose that $F, G$ are two presheaves whose exponential $[F, G]$ exists. The universal property and the Yoneda lemma together imply that

$$[F, G](c) \cong \operatorname{Hom}(Y(c), [F, G]) \cong \operatorname{Hom}(Y(c) \times F, G)$$

(where $Y$ denotes the Yoneda embedding), which uniquely defines a presheaf. It remains to check that this presheaf really satisfies the universal property, but this follows from the fact that every presheaf is a colimit of representable presheaves and from the fact that products distribute over colimits, which is true because it is true pointwise; that is, in $\text{Set}$.

The terminal object in $\widehat{C}$ is the presheaf sending every object to a one-element set $1$ and sending every morphism to the unique morphism $1 \to 1$. If $C$ itself has a terminal object $t$, then it represents the terminal presheaf, hence a global point of a presheaf $F$ is just an element of $F(t)$, so we can explicitly verify that $\operatorname{Hom}(1, F) \cong F(t)$. In general, a global point of a presheaf $F$ is a choice of element $x_c \in F(c)$ for each object $c$ which is compatible with every morphism in $C$ in the sense that if $f : c \to d$ is any morphism, then $F(f)(x_d) = x_c$; in other words, it is an element of the limit $\lim F$.

In particular, if $C$ is the category of open subsets of a topological space $X$ (so that a presheaf on $C$ is a presheaf on $X$ in the usual sense), then a global point of a presheaf $F$ is a **global section**. Note that this is equivalently an element of $F(X)$ (since $X$ is the terminal object) or a choice of element $s_U \in F(U)$ for each open $U$ which is compatible with inclusions in the sense that if $U \subseteq V$ then $s_V$ restricts to $s_U$.

The category of sheaves on a topological space is also a cartesian closed category, and moreover is a topos.

**Presheaves on a monoid**

We showed earlier that $G\text{-Set}$ is cartesian closed for $G$ a group, but the explicit description we gave of the exponential requires talking about inverses in $G$. On the other hand, the above theorem implies in particular that $M\text{-Set}$ is cartesian closed for $M$ a monoid which is not necessarily a group. What does the exponential look like in this case?

Let $BM$ be a category with one object whose endomorphism monoid is $M$. Then $\widehat{BM}$ is the category of right $M$-sets, and the unique representable presheaf is $M$ as a right $M$-set over itself. If $X, Y$ are two right $M$-sets, then the above description of the exponential gives

$$[X, Y] \cong \operatorname{Hom}_M(M \times X, Y)$$

with right $M$-action induced from the left $M$-action of $M$ on itself. If $M$ is a group, this is naturally isomorphic to the set of all functions $X \to Y$, since a morphism $M \times X \to Y$ of right $M$-sets is freely and uniquely determined by what it does to elements of the form $(e, x)$ (where $e$ is the identity). This can fail if $M$ is not a group, since the value of such a morphism on $(m, x)$ may not be determined by the value on an element of the form $(e, x')$ if $(m, x)$ is not of the form $(e, x')m$ for any $x'$, and also since the values of a morphism on two elements of the form $(e, x'), (e, x'')$ may be constrained if there exists an $m$ such that $x'm = x''m$.

*Example.* Let be the free monoid on an idempotent, so that . This is the smallest monoid which is not a group. The category of right -sets is the category of sets equipped with idempotent endomorphisms. The subcategory of such sets such that is constant (equivalently, such that has a unique fixed point) is equivalent to the category of pointed sets: a morphism between such -sets is precisely a map of sets which preserves the unique fixed point. Thus if are such -sets of cardinalities respectively, then is again such a set of cardinality , and so there are morphisms . On the other hand, there are maps of sets.

**The Lawvere fixed point theorem**

To motivate the Lawvere fixed point theorem, let’s write the diagonalization argument above in somewhat greater generality. If $f : X \times X \to Y$ is any function, then we can try to find a function $g : X \to Y$ such that $g(x) \neq f(x, x)$ for all $x$. Now we curry $f$ to obtain a function $\tilde{f} : X \to Y^X$. If there exists $x_0$ such that $\tilde{f}(x_0) = g$, then $g(x_0) = f(x_0, x_0)$ as before, a contradiction, so $g$ cannot be in the image of $\tilde{f}$, hence $\tilde{f}$ is not surjective.

The crucial step is the step where we write down the function $g$ such that $g(x) \neq f(x, x)$. A systematic way to do this is to compose $x \mapsto f(x, x)$ with a function $Y \to Y$ with no fixed points. Lawvere realized that, by taking contrapositives, this means the basic argument behind Cantor’s theorem can be recast as the following fixed point theorem.

**Theorem (Lawvere):** Let $X, Y$ be objects in a category with finite products such that the exponential $Y^X$ exists (in particular, this is true for any pair of objects in a cartesian closed category). Let $f : X \to Y^X$ and suppose that $f$ is **surjective on points** in the sense that the induced map $\operatorname{Hom}(1, X) \to \operatorname{Hom}(1, Y^X) \cong \operatorname{Hom}(X, Y)$ is surjective. Then every morphism $\alpha : Y \to Y$ has a **fixed point** in the sense that the induced map on points $\operatorname{Hom}(1, Y) \to \operatorname{Hom}(1, Y)$ has a fixed point; that is, $Y$ has the **fixed point property**.

*Proof.* Let $\sigma : B \to B$ be any morphism and let

$g = \sigma \circ \mathrm{eval} \circ (f \times \mathrm{id}_A) \circ \Delta : A \to B$

where $\Delta : A \to A \times A$ is the diagonal map; see, for example, this blog post. ($g$ specializes to the paradoxical subset constructed in the usual proof of Cantor’s theorem.) By hypothesis, there exists a point $a : 1 \to A$ such that $f \circ a$ is the curried form of $g$ (where, if $h_1, h_2$ are two morphisms in a category with finite products, $h_1 \times h_2$ denotes the product morphism.) But then

$\mathrm{eval} \circ (f \times \mathrm{id}_A) \circ \Delta \circ a = g \circ a$

whereas by definition $g \circ a = \sigma \circ \mathrm{eval} \circ (f \times \mathrm{id}_A) \circ \Delta \circ a$, from which it follows that $g \circ a$ is a fixed point of $\sigma$.

**Taking the contrapositive**

Taking the contrapositive, we conclude that if $B$ is an object in a cartesian closed category such that there exists a morphism $\sigma : B \to B$ with no fixed points, then no morphism $f : A \to B^A$ can be surjective on points. When $B = 2$ in $\text{Set}$ we immediately reproduce Cantor’s theorem, and morally we reproduce Russell’s paradox as well. The proof of the Lawvere fixed point theorem actually provides, for each $f$, a particular morphism $A \to B$ not in the image of $f$; this particular morphism generalizes CantorBot and also gives us the unsolvability of the halting problem.


- Given a Lie group $G$, its tangent space at the identity is *a priori* a vector space, but it ends up having the structure of a Lie algebra.
- Given a space $X$, its cohomology is *a priori* a graded abelian group, but it ends up having the structure of a graded ring.
- Given a space $X$, its cohomology over $\mathbb{F}_p$ is *a priori* a graded abelian group (or a graded ring, once you make the above discovery), but it ends up having the structure of a module over the mod-$p$ Steenrod algebra.

The following question suggests itself: given a construction which we believe to output objects having a certain amount of structure, can we show that in some sense there is no extra structure to be found? For example, can we rule out the possibility that the tangent space to the identity of a Lie group has some mysterious natural trilinear operation that cannot be built out of the Lie bracket?

In this post we will answer this question for the homotopy groups of a space: that is, we will show that, in a suitable sense, each individual homotopy group is “only a group” and does not carry any additional structure. (This is not true about the collection of homotopy groups considered together: there are additional operations here like the Whitehead product.)

**Extra structure on a functor**

The setting in which we will work is the following. Suppose we have some functor which *a priori* takes values in a category . To what extent can we lift to a functor taking values in a “more structured” category equipped with a forgetful functor such that the obvious diagram commutes? As phrased, this question is incredibly general, so we will restrict ourselves to lifts which are described by taking into account structure coming from -ary operations, as follows.

Suppose has finite products. Then we can consider natural transformations to be -ary operations (as in this previous post on Lawvere theories) on the outputs of the functor which equip the objects with extra structure. More precisely, the full subcategory of the functor category on the objects is a Lawvere theory, the **endomorphism Lawvere theory** of (named in analogy with the endomorphism operad). Note that equipping an object in a category with finite products with the structure of a model of a Lawvere theory is equivalent to giving a morphism of Lawvere theories; in particular, itself is tautologically a model of , and this model structure passes to . This lets us lift to a functor taking values in the category of -valued models of , or more precisely the category of product-preserving functors .

If , is representable by some object , and also has finite coproducts, then we can identify natural transformations with morphisms by the Yoneda lemma. Consequently, we can identify with , where is regarded as an object in the opposite category . There is a corresponding story where is a contravariant representable functor; here we just have .

It may be hard to compute the entire endomorphism Lawvere theory of a functor, but any natural transformations that we can find may already provide extra structure that wasn’t there before. More generally it is often possible to identify Lawvere theories and morphisms of Lawvere theories, which allow us to lift to the category of -valued models of . These kinds of observations are already enough to reproduce many familiar examples of extra structure, and generalize the observation that is acted on from the left by the monoid of endomorphisms and from the right by the monoid of endomorphisms .

*Example.* If is a group object in a category with finite products, then the group operation gives a morphism from the Lawvere theory of groups to . Hence naturally acquires the structure of a group. (Conversely, by the Yoneda lemma, if naturally has the structure of a group then is a group object.)

*Example.* Dually, if is a cogroup object in a category with finite coproducts, then the cogroup operation gives a morphism from the Lawvere theory of groups to . Hence naturally acquires the structure of a group. (Again, conversely, by the Yoneda lemma, if naturally has the structure of a group then is a cogroup object.)

*Example.* In the category of schemes over a base ring , the endomorphism Lawvere theory of the affine line is the Lawvere theory of polynomials over , or equivalently the Lawvere theory of commutative -algebras. Hence naturally acquires the structure of a commutative -algebra. (We previously discussed the case for affine schemes in this blog post.)

*Example.* In the category of topological spaces, the space admits addition and multiplication operations in addition to scalar multiplication operations , and these generate the Lawvere theory of polynomials over . Hence naturally acquires the structure of a commutative -algebra.

*Example.* A **distributive category** is a category with finite products and coproducts such that the former naturally distribute over the latter; the standard example is , although and more generally any cartesian closed category also qualify, and and (the category of schemes) are important examples which are not cartesian closed.

In any distributive category, the endomorphism Lawvere theory of the object canonically admits a morphism from the Lawvere theory of Boolean algebras, or equivalently the Lawvere theory of Boolean rings, or equivalently the category of Boolean functions (the full subcategory of on finite sets of size ). Hence naturally acquires the structure of a Boolean algebra, or equivalently a Boolean ring. In this reproduces the lattice of clopen subsets of a topological space. In general I think it should be interpreted as something like the “lattice of decidable properties.”

*Example.* If is an abelian group, then the group operation is itself a morphism in , giving a morphism from the Lawvere theory of abelian groups to . Hence naturally acquires the structure of an abelian group. (We discussed a more general setting in which such an abelian group structure exists in this previous post on semiadditive categories.)

**The homotopy groups are groups**

Recall that the **pointed homotopy category** is the category whose objects are pointed topological spaces and whose morphisms are homotopy classes of continuous maps preserving the base point. Recall also that the homotopy groups $\pi_n$ are a sequence of functors naturally defined on this category and represented by the spheres $S^n$ with some choice of base point, which we will usually omit in our notation. That the homotopy groups are groups is equivalent to the statement that the spheres $S^n$, as objects of the pointed homotopy category, are all cogroup objects.

The basic idea is to observe that a pointed map from $S^n$ to a pointed space $X$ is the same thing as a map from the $n$-cube $[0, 1]^n$ to $X$ such that the boundary is sent to the base point. In general, morphisms from the $n$-cube can be glued together along any pair of $(n-1)$-dimensional faces provided that the images of those faces match. There are $n$ distinguished such gluings coming from gluing together each of the $n$ copies of $[0, 1]$ in the product in the usual way that one glues two intervals together. These gluing operations are natural, associative, and have inverses up to homotopy. They give $n$ compatible group operations on $\pi_n(X)$ which, when $n \ge 2$, make it an abelian group by the Eckmann-Hilton argument.

The appearance of maps out of $[0, 1]^n$ and multiple composition operations suggests a higher-category-theoretic perspective on the situation where we can think of $\pi_n$ as a suitable automorphism group. More precisely, for any $n$ we can associate to an unpointed topological space $X$ its fundamental $n$-groupoid $\Pi_n(X)$, which is the $n$-category whose

- objects are the points of $X$,
- morphisms are the paths between points of $X$,
- $2$-morphisms are the homotopies between paths,
- $3$-morphisms are the homotopies between homotopies,
- …
- $n$-morphisms are the homotopy classes of homotopies between homotopies between…

Note that a $k$-morphism can be thought of as a map $[0, 1]^k \to X$, with its source and its target determined by its restriction to a suitable choice of two copies of $[0, 1]^{k-1}$ in it. $k$-morphisms have $k$ notions of composition given by gluing along the $k$ coordinate directions, generalizing horizontal and vertical composition of $2$-morphisms in $2$-categories (in particular, of natural transformations).

The homotopy group $\pi_n(X, x)$ of a pointed space can then be interpreted as the group of $n$-automorphisms of the identity $(n-1)$-morphism of the identity $(n-2)$-morphism of… of the identity endomorphism of $x$ in the fundamental $n$-groupoid.

**The homotopy groups are only groups**

We would like to show that the homotopy groups are only groups in the sense that the endomorphism Lawvere theories of the functors $\pi_n$ are generated by the Lawvere theory of groups. In fact we will be able to say slightly more than this.

**Theorem:** The endomorphism Lawvere theory of $\pi_1$ is precisely the Lawvere theory of groups.

*Proof.* By the Yoneda lemma, this means we want to show that the full subcategory of the pointed homotopy category on the finite wedge sums of copies of $S^1$ is equivalent, as a category with finite coproducts, to the full subcategory of $\text{Grp}^{op}$ on the finitely generated free groups. To show this it more or less suffices to show that the fundamental group of a wedge of $k$ circles is the free group on $k$ generators, one generated by each circle (strictly speaking we should show that this identification can be made compatible with partial composition, but we already know this because we already know that the fundamental group is a group), but this follows from Seifert-van Kampen.

In the context of a more general result, not only does the wedge of $k$ circles have fundamental group the free group $F_k$ but it is an Eilenberg-MacLane space $K(F_k, 1)$, since its universal cover is a tree, and the subcategory of the homotopy category on Eilenberg-MacLane spaces $K(G, 1)$ (suitably pointed) is known to be equivalent to $\text{Grp}$, with the equivalence given by $\pi_1$.

**Theorem:** The endomorphism Lawvere theory of $\pi_n$, $n \ge 2$, is precisely the Lawvere theory of abelian groups.

*Proof.* By the Yoneda lemma, this means we want to show that the full subcategory of the pointed homotopy category on the finite wedge sums of copies of $S^n$ is equivalent, as a category with finite coproducts, to the subcategory of $\text{Ab}^{op}$ on the finitely generated free abelian groups. To show this it more or less suffices to show that $\pi_n$ of a wedge of $k$ copies of $S^n$ is the free abelian group on $k$ generators, one generated by each inclusion of $S^n$ into the wedge (and, again, strictly speaking we should show compatibility with partial composition, but we already know this).

Since $S^n$ admits a CW-structure with a single $0$-cell, a single $n$-cell, and no other cells, the wedge is $(n-1)$-connected by cellular approximation. By the Hurewicz theorem, it follows that the Hurewicz map $\pi_n \to H_n$ is an isomorphism, so to compute the former it suffices to compute the latter. But $H_n$ of the wedge is $\mathbb{Z}^k$ by Mayer-Vietoris.


**Theorem:** Let $G$ be a finite $p$-group acting on a finite set $X$. Let $X^G$ denote the subset of $X$ consisting of those elements fixed by $G$. Then $|X^G| \equiv |X| \bmod p$; in particular, if $|X| \not\equiv 0 \bmod p$ then $G$ has a fixed point.

Although this theorem is an elementary exercise, it has a surprising number of fundamental corollaries.

**Proof**

$X$ is a disjoint union of orbits for the action of $G$, all of which have the form $G/H$ for some subgroup $H$ and hence all of which have cardinality divisible by $p$ except for the trivial orbits corresponding to fixed points.
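The congruence is easy to test numerically. Here is a minimal sketch in which $\mathbb{Z}/p$ acts by cyclic rotation on strings of length $p$ (the alphabets are arbitrary samples):

```python
from itertools import product

def fixed_point_congruence(p, alphabet):
    """Z/p acts on strings of length p by cyclic rotation; for prime p the
    orbits have size 1 (constant strings) or p, so |X^G| ≡ |X| mod p."""
    X = list(product(alphabet, repeat=p))
    rotate = lambda t: t[1:] + t[:1]
    fixed = [t for t in X if rotate(t) == t]
    return len(X) % p == len(fixed) % p

print(fixed_point_congruence(3, "ab"), fixed_point_congruence(5, "abc"))  # True True
```
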

**Some group-theoretic applications**

**Theorem:** Let $G$ be a finite $p$-group. Then its center $Z(G)$ is nontrivial.

**Corollary:** Every finite $p$-group is nilpotent.

*Proof.* $G$ acts by conjugation on $G \setminus \{e\}$. If $|G| > 1$, this set has cardinality $|G| - 1$, which is not divisible by $p$. Hence $G$ has a fixed point, which is precisely a nontrivial central element.

**Cauchy’s theorem:** Let $G$ be a finite group. Suppose $p$ is a prime dividing $|G|$. Then $G$ has an element of order $p$.

*Proof.* $\mathbb{Z}/p$ acts by cyclic shifts on the set

$X = \{ (g_1, \dots, g_p) \in G^p : g_1 g_2 \cdots g_p = e \}$

since if $g_1 g_2 \cdots g_p = e$ then $g_2 \cdots g_p g_1 = e$. This set has cardinality $|G|^{p-1}$, which is divisible by $p$, and it contains the fixed point $(e, \dots, e)$, so by the theorem it contains at least $p$ fixed points. Any other fixed point is precisely a tuple $(g, g, \dots, g)$ with $g \neq e$ and $g^p = e$, i.e. an element of order $p$.
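This argument can be confirmed by brute force in a small group; the sketch below uses $S_3$ (encoded as permutation tuples, an arbitrary representation) with $p = 3$:

```python
from itertools import product, permutations

# S_3 as permutations of (0, 1, 2), with composition and identity:
G = list(permutations(range(3)))
e = (0, 1, 2)
def mul(a, b):                      # (a*b)(i) = a(b(i))
    return tuple(a[b[i]] for i in range(3))

p = 3
X = [g for g in product(G, repeat=p)
     if mul(mul(g[0], g[1]), g[2]) == e]
assert len(X) == len(G) ** (p - 1)            # |X| = |G|^{p-1} = 36
fixed = [g for g in X if g[0] == g[1] == g[2]]
assert len(fixed) % p == 0                    # divisible by p, and nonempty
order_p = [g[0] for g in fixed if g[0] != e]
print(order_p)                                # the two 3-cycles of S_3
```
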

**Some number-theoretic applications**

**Fermat’s little theorem:** Let $a$ be a non-negative integer and $p$ be a prime. Then $a^p \equiv a \bmod p$.

*Proof.* $\mathbb{Z}/p$ acts by cyclic shifts on the set $X = \mathrm{Hom}(\mathbb{Z}/p, A)$, where $|A| = a$; equivalently, on the set of strings of length $p$ on $a$ letters. (Orbits of this group action are sometimes called **necklaces**; see, for example, this previous blog post.) This set has cardinality $a^p$ and its fixed point set has cardinality $a$, since a function is fixed if and only if it is constant.
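The two counts in this proof can be verified directly for small values (the sample values are arbitrary):

```python
from itertools import product

def necklace_counts(a, p):
    """Strings of length p over a letters, with the Z/p rotation action."""
    strings = list(product(range(a), repeat=p))
    rotate = lambda s: s[1:] + s[:1]
    fixed = [s for s in strings if rotate(s) == s]   # the constant strings
    return len(strings), len(fixed)

total, fixed = necklace_counts(4, 5)
assert (total, fixed) == (4 ** 5, 4)
assert (total - fixed) % 5 == 0    # hence 4^5 ≡ 4 mod 5
print(total, fixed)                # 1024 4
```
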

**Fermat’s little theorem for matrices:** Let $A$ be a square matrix of non-negative integers and let $p$ be a prime. Then $\mathrm{tr}(A^p) \equiv \mathrm{tr}(A) \bmod p$.

This result also appeared in this previous blog post.

*Proof.* Interpret $A$ as the adjacency matrix of a graph. $\mathbb{Z}/p$ acts by cyclic shifts on the set of closed walks of length $p$ on this graph. This set has cardinality $\mathrm{tr}(A^p)$ and its fixed point set has cardinality $\mathrm{tr}(A)$, since a closed walk is fixed if and only if it consists of $p$ repetitions of the same loop at a vertex.
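A quick numerical check of the congruence $\mathrm{tr}(A^p) \equiv \mathrm{tr}(A) \bmod p$, with an arbitrary sample matrix:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, e):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = mat_mul(R, A)
    return R

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# tr(A^p) counts closed walks of length p in the graph with adjacency matrix A.
A = [[2, 1, 0], [0, 3, 1], [1, 1, 1]]   # an arbitrary sample matrix
for p in (2, 3, 5, 7, 11):
    assert (trace(mat_pow(A, p)) - trace(A)) % p == 0
print("tr(A^p) ≡ tr(A) mod p verified")
```
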

In the same way that Fermat’s little theorem can be used to construct a primality test, the Fermat primality test, Fermat’s little theorem for matrices can also be used to construct a primality test, which doesn’t seem to have a name. For example, the Perrin pseudoprimes are the composite numbers that pass this test when

$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$.

More generally, a stronger version of this test gives rise to the notion of Frobenius pseudoprime.
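Here is a sketch of the resulting test in the Perrin case, assuming the standard Perrin recurrence $P(n) = P(n-2) + P(n-3)$ with $P(0) = 3$, $P(1) = 0$, $P(2) = 2$ (this sequence is $\mathrm{tr}(A^n)$ for the companion matrix of $x^3 - x - 1$):

```python
def perrin(n):
    """Perrin sequence: P(0)=3, P(1)=0, P(2)=2, P(n)=P(n-2)+P(n-3)."""
    a, b, c = 3, 0, 2
    if n == 0:
        return a
    if n == 1:
        return b
    for _ in range(n - 2):
        a, b, c = b, c, a + b
    return c

def passes_perrin_test(n):
    return n > 1 and perrin(n) % n == 0

# Every prime passes; the smallest composite that passes is 271441.
print([n for n in range(2, 40) if passes_perrin_test(n)])
```
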

**Wilson’s theorem:** Let $p$ be a prime. Then $(p-1)! \equiv -1 \bmod p$.

*Proof.* Consider the set of total orderings $(a_1, a_2, \dots, a_p)$ of the numbers $0, 1, \dots, p-1$ modulo addition $\bmod p$; that is, we identify the total ordering

$(a_1, a_2, \dots, a_p)$

with the total ordering

$(a_1 + c, a_2 + c, \dots, a_p + c)$

where the addition occurs $\bmod p$.

$\mathbb{Z}/p$ acts by cyclic shifts on this set. It has cardinality $\frac{p!}{p} = (p-1)!$ and its fixed point set has cardinality $p - 1$, since the fixed orderings are precisely the ones such that $a_{i+1} = a_i + d$ for some fixed $d \neq 0$. Hence $(p-1)! \equiv p - 1 \equiv -1 \bmod p$.
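This orbit count can be checked by brute force for a small prime (a sketch; class representatives are normalized to start at $0$):

```python
from itertools import permutations
from math import factorial

def wilson_fixed_classes(p):
    # Representatives of orderings of 0..p-1 modulo adding a constant mod p:
    # normalize so the first entry is 0; there are p!/p = (p-1)! classes.
    classes = [s for s in permutations(range(p)) if s[0] == 0]
    assert len(classes) == factorial(p - 1)
    shift = lambda s: s[1:] + s[:1]                         # the Z/p action
    normalize = lambda s: tuple((x - s[0]) % p for x in s)  # back to the representative
    return len([s for s in classes if normalize(shift(s)) == s])

p = 5
assert wilson_fixed_classes(p) == p - 1     # fixed classes: arithmetic progressions
assert (factorial(p - 1) + 1) % p == 0      # Wilson's theorem: (p-1)! ≡ -1 mod p
print("Wilson's theorem verified for p =", p)
```
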

**Lucas’ theorem:** Let $m, n$ be non-negative integers and $p$ be a prime. Suppose the base-$p$ expansions of $m, n$ are

$m = m_k p^k + \dots + m_1 p + m_0, \quad n = n_k p^k + \dots + n_1 p + n_0$

respectively. Then

${m \choose n} \equiv \prod_{i=0}^{k} {m_i \choose n_i} \bmod p$.

*Proof.* It suffices to show that

${m \choose n} \equiv {m_0 \choose n_0} {\lfloor m/p \rfloor \choose \lfloor n/p \rfloor} \bmod p$

where $m_0, n_0$ are the last base-$p$ digits of $m, n$, and then the desired result follows by induction. To see this, consider a set of size $m$, and partition it into a block of size $m_0$ together with $\lfloor m/p \rfloor$ blocks of size $p$. Identify all of the blocks of size $p$ with $\mathbb{Z}/p$ in some way. Then $\mathbb{Z}/p$ acts by simultaneous cyclic shift on these blocks. This action extends to an action on the set of subsets of size $n$. A fixed point of this action consists of a subset of the block of size $m_0$ together with, for each block of size $p$, either all of that block or none of it. Since the part of a fixed subset lying in the blocks of size $p$ has size divisible by $p$, and since $n \equiv n_0 \bmod p$, it follows that there must be $n_0$ elements in the block of size $m_0$ and $\lfloor n/p \rfloor$ of the $p$-blocks fully contained in the subset. Thus the set of subsets has cardinality ${m \choose n}$, and its fixed point set has cardinality ${m_0 \choose n_0} {\lfloor m/p \rfloor \choose \lfloor n/p \rfloor}$.

(It is also possible to prove this result directly by considering a more complicated -group action. The relevant -group is the Sylow -subgroup of the symmetric group of . It is an iterated wreath product of the cyclic groups , and describing how it acts on requires recursively partitioning blocks in a way that is somewhat tedious to describe.)
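The digit-by-digit congruence gives an efficient way to compute binomial coefficients mod $p$, which we can sanity-check against direct computation:

```python
from math import comb

def lucas_binom(m, n, p):
    """Compute C(m, n) mod p one base-p digit at a time."""
    result = 1
    while m or n:
        m, mi = divmod(m, p)
        n, ni = divmod(n, p)
        result = result * comb(mi, ni) % p   # comb(mi, ni) = 0 when ni > mi
    return result

for m, n, p in [(10, 4, 3), (100, 37, 5), (23, 11, 7)]:
    assert lucas_binom(m, n, p) == comb(m, n) % p
print("Lucas' theorem verified on samples")
```
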

The following proof is due to Don Zagier.

**Fermat’s two-square theorem:** Every prime $p \equiv 1 \bmod 4$ is the sum of two squares.

*Proof.* Let $S$ be the set of all solutions in the positive integers of the equation $x^2 + 4yz = p$. The involution $(x, y, z) \mapsto (x, z, y)$ has the property that its fixed points correspond to solutions to the equation $x^2 + (2y)^2 = p$. We will show that such fixed points exist by showing that there are an odd number of them; to do so, it suffices to show that $|S|$ is odd, and to show that $|S|$ is odd it suffices to exhibit any other involution on $S$ which has an odd number of fixed points. (Notice that this is two applications of the $2$-group fixed point theorem.) We claim that

$(x, y, z) \mapsto \begin{cases} (x + 2z, \, z, \, y - x - z) & \text{if } x < y - z \\ (2y - x, \, y, \, x - y + z) & \text{if } y - z < x < 2y \\ (x - 2y, \, x - y + z, \, y) & \text{if } x > 2y \end{cases}$

is an involution on $S$ with exactly one fixed point, which suffices to prove the desired claim. Verifying that it sends $S$ to $S$ is straightforward. If $(x, y, z)$ is a fixed point, then since $x, y, z$ are all assumed to be positive integers we cannot have $x = x + 2z$, so we are not in the first case. If we are in the second case, we conclude that $x = y$, but then $p = x^2 + 4yz = x(x + 4z)$, and since $p$ is prime we conclude that $x = y = 1$, which gives $p = 4z + 1$. If we are in the third case, we conclude that $y = z$, then that $x = y$, but this contradicts $x > 2y$. Hence $(1, 1, \frac{p-1}{4})$ is the unique fixed point as desired.
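Zagier's involution can be implemented directly; the following brute-force sketch checks its properties and recovers a two-square representation for small primes $p \equiv 1 \bmod 4$:

```python
def zagier_involution(t):
    """Zagier's involution on {(x, y, z) positive : x^2 + 4yz = p}."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)

def two_squares(p):
    """For a prime p ≡ 1 mod 4, find (a, b) with a^2 + b^2 = p by brute force."""
    S = [(x, y, z) for x in range(1, p) for y in range(1, p)
         for z in range(1, p) if x * x + 4 * y * z == p]
    # Zagier's involution has exactly one fixed point, so |S| is odd; hence the
    # swap (x, y, z) -> (x, z, y) also has a fixed point, which satisfies y = z.
    assert sum(1 for t in S if zagier_involution(t) == t) == 1
    assert all(zagier_involution(zagier_involution(t)) == t for t in S)
    x, y, z = next(t for t in S if t[1] == t[2])
    return (x, 2 * y)

print(two_squares(13), two_squares(29))   # (3, 2) (5, 2): 9+4=13, 25+4=29
```
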

The following proof is due to V. Lebesgue.

**Quadratic reciprocity:** Let $p \neq q$ be odd primes. Let $\left( \frac{q}{p} \right)$ denote the Legendre symbol. Then

$\left( \frac{p}{q} \right) \left( \frac{q}{p} \right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$.

*Proof.* Let $p$ be an odd prime and consider the set

$X = \{ (x_1, \dots, x_q) \in \mathbb{F}_p^q : x_1^2 + \dots + x_q^2 = 1 \}$.

For $q$ an odd prime, we will compute $|X| \bmod q$ in two different ways. The first way is to observe that $\mathbb{Z}/q$ acts by cyclic shifts on $X$. The number of fixed points is the number of $x \in \mathbb{F}_p$ such that $qx^2 = 1$, hence

$|X| \equiv 1 + \left( \frac{q}{p} \right) \bmod q$.

The second way is to find a recursive formula for $|X|$ which will end up being expressed in terms of $(-1)^{\frac{p-1}{2}} = \left( \frac{-1}{p} \right)$. The details are as follows.

Write $N(a)$ for the number of solutions to $x^2 + y^2 = a$ over $\mathbb{F}_p$. First, it is clear that $\sum_a N(a) = p^2$. For $a$ nonzero, we claim that $N(a)$ is independent of $a$. First, observe that by the pigeonhole principle (there are $\frac{p+1}{2}$ possible values of $x^2$ and $\frac{p+1}{2}$ possible values of $a - y^2$, hence at least one common value) we always have $N(a) \geq 1$. Second, observe that the map $(x, y) \mapsto x^2 + y^2$ is the norm map from $\mathbb{F}_p[i]$ to $\mathbb{F}_p$; in particular, it restricts to a homomorphism from the unit group of the former ring to the unit group of the latter ring, and we know that this homomorphism is surjective, so each fiber has the same size. The unit group of $\mathbb{F}_p[i]$ has size $(p-1)^2$ if $p \equiv 1 \bmod 4$ (since then we are looking at $\mathbb{F}_p \times \mathbb{F}_p$) and size $p^2 - 1$ if $p \equiv 3 \bmod 4$ (since then we are looking at $\mathbb{F}_{p^2}$). Hence

$N(a) = p - (-1)^{\frac{p-1}{2}}$

for any nonzero $a$. When $a = 0$, the equation always has the solution $(0, 0)$. There are no nonzero solutions if $p \equiv 3 \bmod 4$ (since $-1$ does not have a square root) and $2(p-1)$ nonzero solutions if $p \equiv 1 \bmod 4$ (since for each of the $p-1$ possible nonzero values of $x$ there are $2$ possible values of $y$). Hence

$N(0) = 1 + \left( 1 + (-1)^{\frac{p-1}{2}} \right)(p-1)$.

We will now compute inductively by rewriting as

.

From this it follows that

.

We know that takes the value for possible values of and takes any particular other value for possible values of . Hence we can write

But it’s clear that

since this is just the number of possible tuples . Hence

.

This gives a recursion which we can unwind into an explicit formula for as follows. Substituting the recursion into itself and canceling terms gives

.

This pattern continues, and by induction we can show that

.

Hence if is odd, we conclude that

.

Now let be an odd prime. Then by Fermat’s little theorem and by Euler’s criterion, so we find that

.

On the other hand, we know that . We conclude that

and since the LHS and RHS are both equal to $\pm 1$, this equality holds on the nose.

This proof is in some sense a proof via Gauss sums in disguise; the numbers are closely related to Jacobi sums, which can be written in terms of Gauss sums. More generally, various kinds of exponential sums can be related to counting points on varieties over finite fields. The Weil conjectures give bounds on the latter which translate into bounds on the former, and this has various applications, for example the original proof of Zhang’s theorem on bounded gaps between primes (although my understanding is that this step is unnecessary just to establish the theorem).
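The reciprocity law itself is easy to spot-check numerically; the `legendre` helper below (a name of my own choosing) uses Euler's criterion:

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion, for odd prime p."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23]
for p in odd_primes:
    for q in odd_primes:
        if p != q:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            assert legendre(p, q) * legendre(q, p) == sign
print("quadratic reciprocity verified for all pairs")
```
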

Similar methods can be used to prove the second supplementary law, although strictly speaking we will use more than the -group fixed point theorem.

**Quadratic reciprocity, second supplement:** $\left( \frac{2}{p} \right) = (-1)^{\frac{p^2-1}{8}}$. Equivalently, $\left( \frac{2}{p} \right) = 1$ iff $p \equiv \pm 1 \bmod 8$.

*Proof.* We will compute $|X| \bmod 8$, where

$X = \{ (x, y) \in \mathbb{F}_p^2 : x^2 + y^2 = 1 \}$,

in two different ways. The first is using the formula we found earlier, which gives

$|X| = N(1) = p - (-1)^{\frac{p-1}{2}}$.

The second is to observe that the dihedral group of order $8$ acts on $X$ by rotation and reflection (thinking of the points as being on the “circle” defined by $x^2 + y^2 = 1$). Most orbits of this action have size $8$ and hence are invisible $\bmod 8$. The ones that are not invisible correspond to points with some nontrivial stabilizer. Any stabilizer contains an element of order $2$, and there are five such elements in the dihedral group, so the remaining possible orbits are as follows:

- Points stable under $(x, y) \mapsto (y, x)$. These are precisely the points such that $x = y$, $2x^2 = 1$, hence there are $1 + \left( \frac{2}{p} \right)$ of them.
- Points stable under $(x, y) \mapsto (-y, -x)$. These are precisely the points such that $x = -y$, $2x^2 = 1$, hence there are $1 + \left( \frac{2}{p} \right)$ of them.
- Points stable under $(x, y) \mapsto (x, -y)$. These are precisely the points such that $y = 0$, $x^2 = 1$, hence there are $2$ of them.
- Points stable under $(x, y) \mapsto (-x, -y)$. There are no such points.
- Points stable under $(x, y) \mapsto (-x, y)$. These are precisely the points such that $x = 0$, $y^2 = 1$, hence there are $2$ of them.

In total, we conclude that

$p - (-1)^{\frac{p-1}{2}} = |X| \equiv 2 \left( 1 + \left( \frac{2}{p} \right) \right) + 4 \bmod 8$

and the conclusion follows.
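Both computations in this proof can be checked numerically for small odd primes (`legendre` below again uses Euler's criterion):

```python
def legendre(a, p):
    a %= p
    return 0 if a == 0 else (1 if pow(a, (p - 1) // 2, p) == 1 else -1)

for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]:
    circle = [(x, y) for x in range(p) for y in range(p)
              if (x * x + y * y) % p == 1]
    assert len(circle) == p - (-1) ** ((p - 1) // 2)   # |X| = N(1)
    assert (legendre(2, p) == 1) == (p % 8 in (1, 7))  # 2 is a square iff p ≡ ±1 mod 8
print("second supplement verified")
```
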

A general lesson of the above proofs is that it is a good idea to replace integers with sets because sets can be equipped with more structure, such as group actions, than integers. This is a simple form of categorification.


The goal of this post is to present a rephrasing of the statement and proof of Cantor’s theorem so that it is no longer about sets, but about a particular kind of game related to the prisoner’s dilemma. Rather than showing that there are no surjections , we will show that a particular kind of player in this game can’t exist. This rephrasing may make the proof more transparent and easier to absorb, although it will take some background material about the prisoner’s dilemma to motivate. As a bonus, we will almost by accident run into a proof of the undecidability of the halting problem.

**Cantor’s theorem**

If $X, Y$ are two sets, let $Y^X$ denote the set of functions $X \to Y$. In particular, we will write the power set of $X$ as $2^X$, where $2 = \{ 0, 1 \}$, since we can identify a subset $S \subseteq X$ with its **indicator function**

$1_S : X \to 2, \quad 1_S(x) = \begin{cases} 1 & \text{if } x \in S \\ 0 & \text{otherwise.} \end{cases}$

**Theorem:** Let $X$ be a set. Then there does not exist a surjection $f : X \to 2^X$.

*Proof.* Suppose otherwise, and let $f : X \to 2^X$ be a surjection. Consider the subset

$S = \{ x \in X : x \notin f(x) \}$.

By hypothesis, there exists $s \in X$ such that $f(s) = S$. But now if $s \in S$, then $s \notin f(s)$, so $s \notin S$, but if $s \notin S$, then $s \in f(s)$, so $s \in S$; contradiction. Hence $f$ does not exist.

Note that the proof goes through perfectly fine for $X$ a finite set, where it tells us that $2^n > n$ for all non-negative integers $n$. This is one hint that Cantor’s theorem is not really about infinite sets, even though its main application is proving that infinite sets come in different sizes.
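Since the proof works for finite sets, we can confirm it exhaustively for a three-element set (a brute-force sketch):

```python
from itertools import product

X = [0, 1, 2]
subsets = [set(), {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]

# Check every candidate "surjection" f : X -> 2^X; the diagonal set always escapes.
for choice in product(subsets, repeat=len(X)):
    f = dict(zip(X, choice))
    S = {x for x in X if x not in f[x]}   # the diagonal subset
    assert all(f[x] != S for x in X)      # S is not in the image of f
print("no f : X -> 2^X hits all 2^3 = 8 subsets")
```
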

**The prisoner’s dilemma**

A **prisoner’s dilemma** is any two-player game in which the players move simultaneously; each player has two possible moves, cooperate or defect; and each player prefers outcomes in the following order:

- (I defect, you cooperate), often called **exploiting** the other player, followed by
- (I cooperate, you cooperate), then
- (I defect, you defect), and then
- (I cooperate, you defect), often called being **exploited**.

A common way of analyzing games like the prisoner’s dilemma is to determine their **Nash equilibria**. To define these, we first need the notion of a **mixed strategy**, which is a probability distribution over possible moves. A Nash equilibrium in a game is then a choice of mixed strategy for each player such that each player, upon learning the mixed strategy of the other player, does not wish to change their strategy. Nash famously proved that in a game where each player has finitely many possible moves, a Nash equilibrium always exists. We need to consider mixed rather than pure strategies because Nash equilibria are mixed in general; for example, the unique Nash equilibrium in rock-paper-scissors is that each player chooses rock, paper, or scissors with probability $\frac{1}{3}$ each.

The prisoner’s dilemma roughly models many real-world situations, and is of interest because the only Nash equilibrium in the game is that both players defect (since both players, upon learning what move their opponent is making, would strictly prefer to change their move to defect), but this situation is worse for both players than if both players had cooperated. This raises some interesting questions about how mutual cooperation can occur among rational agents of the kind considered in game theory, and possibly also some questions about how mutual cooperation can evolve in nature.

One idea is to consider the **iterated prisoner’s dilemma**, where two players repeatedly engage in a prisoner’s dilemma for many rounds. Here it is important to assign point values to the various outcomes and specify that both players are trying to maximize the number of points they have, and then it could be a sensible strategy to cooperate rather than defect in the hopes of promoting cooperation on future rounds. Unlike a **one-shot** prisoner’s dilemma (one that lasts one round) where it is assumed that you don’t know anything about your opponent, in an iterated prisoner’s dilemma you and your opponent learn about each other over time, so if you each learn that the other is cooperative then mutual cooperation might occur.

Axelrod famously ran an iterated prisoner’s dilemma tournament played by programs implementing various strategies, and found empirically that greedy strategies (that try to exploit opponents) generally did worse than altruistic strategies (that offered cooperation even with the possibility of being exploited). The most successful deterministic such strategy was **tit for tat**, which is also very simple to describe: cooperate on the first round, then repeat whatever your opponent did on the previous round. Tit for tat achieves mutual cooperation against other players that also cooperate but immediately attempts to punish defection with defection of its own. The iterated prisoner’s dilemma is a plausible rough model of social interaction, and tit for tat (or its more altruistic cousin, **generous tit for tat**, which occasionally forgives defection on the part of its opponent), is a plausible rough model of how cooperation among social animals like humans works.
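A minimal simulation makes the dynamics concrete. The payoff values below are the conventional $(T, R, P, S) = (5, 3, 1, 0)$ choice, used here purely for illustration, not the values from Axelrod's tournament:

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):           # history = opponent's past moves
    return history[-1] if history else "C"

def defect_bot(history):
    return "D"

def play(bot1, bot2, rounds=10):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = bot1(h2), bot2(h1)   # each bot sees the other's history
        p1, p2 = PAYOFF[(m1, m2)]
        s1, s2 = s1 + p1, s2 + p2
        h1.append(m1)
        h2.append(m2)
    return s1, s2

print(play(tit_for_tat, tit_for_tat))  # mutual cooperation: (30, 30)
print(play(tit_for_tat, defect_bot))   # exploited once, then mutual defection: (9, 14)
```

Note how tit for tat loses head-to-head against DefectBot but only by the single first-round exploitation, while earning far more against cooperative opponents.
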

But what about trying to achieve mutual cooperation on the one-shot prisoner’s dilemma? Unlike the iterated prisoner’s dilemma, it seems that you have no chance to learn anything about your opponent. Here a standard argument makes an additional assumption: namely, that you assume your opponent reasons the same way you do and that you both have common knowledge of this. (It is also typical to further assume that the payoff matrix is symmetric.) In this case, it seems plausible that if you cooperate, then so will your opponent, and similarly for defection. In other words, exploitation is asymmetric and therefore seems to be ruled out, and in that case you should cooperate expecting your opponent to have come to the same conclusion. In the philosophy literature this is known as the **symmetry argument** for cooperation; Hofstadter calls this superrationality.

However, this argument is philosophically contentious. It almost seems to be claiming that choosing to cooperate would somehow cause your opponent to cooperate, while choosing to defect would cause your opponent to defect, which seems implausible. The contention here is related to issues surrounding Newcomb’s problem and decision theory which we’ll avoid for the time being. Can we say something less contentious?

**The prisoner’s dilemma with visible source code**

One reason the iterated prisoner’s dilemma is different from the one-shot prisoner’s dilemma is that playing multiple rounds allows you to learn about your opponent. Consider the following alternate setup for learning about your opponent while only playing one round: two players, rather than play against each other directly, write programs to play a one-shot prisoner’s dilemma for them. Moreover, the programs are fed as input each other’s source code. Call this the **prisoner’s dilemma with visible source code**. For a brief introduction to the literature on this game, which gives rise to the notion of program equilibrium, see for example this blog post and the resources therein. I was introduced to it at a workshop run by the Machine Intelligence Research Institute.

We’ll call a program that plays this game a **bot**. The two simplest bots are **CooperateBot** and **DefectBot**, which always cooperate and always defect respectively. CooperateBot is **exploitable** (it gets exploited by at least one other bot), and DefectBot can never achieve mutual cooperation. Can we do better? For starters, does there exist an unexploitable bot that is not DefectBot (equivalently, that achieves mutual cooperation against at least one other bot)?

As it turns out, there is. **CliqueBot** is the bot that checks its opponent’s source code to see if it’s identical to its own source code (which it has access to via quining; see also Kleene’s recursion theorem). If so, it cooperates; otherwise, it defects. CliqueBot defects against every bot except CliqueBot and achieves mutual cooperation with CliqueBot, so it satisfies our requirements.

Another interesting example is **FairBot** (although the version I’m about to define is not the version defined in the blog post I linked to), which executes its opponent’s source code when fed its own source code as input and returns the result as its move. FairBot can be seen as trying to instantiate the symmetry argument for cooperation. However, it ends up running forever when it plays against itself, so it doesn’t in fact achieve mutual cooperation with itself. This can be fixed by modifying FairBot so that it searches for proofs in, say, Peano arithmetic of up to some fixed length that its opponent cooperates and cooperates if it finds such a proof. Somewhat surprisingly, a variant of Löb’s theorem can be used to show that this version of FairBot cooperates with itself (see the blog post previously linked to above for details).

**The total prisoner’s dilemma with visible source code**

To connect back to Cantor’s theorem, we will impose an additional requirement but relax a previous requirement. Namely, we will impose the additional requirement that bots are required to halt: in other words, they can’t run forever but must eventually output either cooperate or defect. However, we will relax the requirement that bots must be programs: in other words, we will allow them to potentially implement arbitrary functions. More precisely, define a **bot environment** to be a set $B$ of bots together with a function

$f : B \times B \to \{ C, D \}$

describing, for each pair of bots $(x, y)$, whether $x$ cooperates or defects against opponent $y$. We place no other constraints on the set $B$ or the function $f$. For example, any set of programs in the prisoner’s dilemma with visible source code which are guaranteed to halt against each other defines a bot environment.

Say that a bot $x$ in a bot environment is a **CantorBot** if it has the property that $f(x, y) = C$ if and only if $f(y, y) = D$; in other words, it cooperates against precisely those bots which do not cooperate with themselves.

**No-CantorBot Theorem:** No bot environment contains a CantorBot.

*Proof.* CantorBot’s move against itself is not well-defined, since its definition requires that $f(x, x) = C$ if and only if $f(x, x) = D$.
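For finite bot environments the theorem can be confirmed exhaustively; the sketch below enumerates every possible cooperation table on a handful of bots (cooperation is encoded as `True`):

```python
from itertools import product

def cantorbot_exists(n):
    """Search every cooperation table f on n bots for a CantorBot:
    a bot x with f(x, y) == (not f(y, y)) for every y."""
    bots = range(n)
    for table in product([True, False], repeat=n * n):
        f = {(x, y): table[n * x + y] for x in bots for y in bots}
        for x in bots:
            if all(f[(x, y)] == (not f[(y, y)]) for y in bots):
                return True   # found a CantorBot
    return False

# No environment on 1, 2 or 3 bots contains one (2^1, 2^4, 2^9 tables checked):
print([cantorbot_exists(n) for n in (1, 2, 3)])   # [False, False, False]
```
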

Cantor’s theorem follows as a corollary. To see this we need to **curry** the function $f$ to obtain a function

$\hat{f} : B \to \{ C, D \}^B$.

By mapping each function $B \to \{ C, D \}$ to the preimage of $C$, we can identify $\{ C, D \}^B$ with the power set of $B$, and then we can think of $\hat{f}$ as the function which describes for each bot the set of all bots it cooperates with. If $\hat{f}$ is a surjection, then there must exist bots which exhibit all possible cooperation behaviors against other bots. The above proof shows that this can’t be the case by showing that there does not exist $x$ such that

$f(x, y) = C \iff f(y, y) = D$

since such an $x$ is precisely a CantorBot. (Strictly speaking, the claim that in any bot environment, not all cooperation behaviors are exhibited is the correct restatement of Cantor’s theorem in this language.)

The reasoning above is essentially the same as in the **Barber paradox**, and hence in **Russell’s paradox**, but with “ cooperates with ” in place of “ shaves ” and “ contains ” respectively. Note that in the Barber paradox and in Russell’s paradox there are extra and unnecessary assumptions placed on the “shaves” and “contains” relations; in the No-CantorBot theorem, “ cooperates with ” is an arbitrary relation on a set .

**The halting problem**

One interesting observation about CantorBot is that it exists in the usual prisoner’s dilemma with visible source code: a bot receiving its opponent’s source code can attempt to compute what its opponent would do against itself and then do the opposite. CantorBot will, of course, run forever against many opponents, including itself, which is why the above proof doesn’t apply to this case. In other words, functions described by a program are not really functions but **partial functions**: they may run forever and not produce an output.

We can think of a partial function $X \to Y$ as an ordinary function $X \to Y \cup \{ \bot \}$, where $\bot$ is the output that corresponds to running forever. And it’s only a very small jump from here to a proof of the **halting theorem**. We just think of the set of programs as a bot environment, equipped with the function

$f : B \times B \to \{ C, D \}$

given by $f(x, y) = C$ if $x$ halts given $y$ as input and $f(x, y) = D$ otherwise. When we run this bot environment through the proof of the No-CantorBot theorem, we get the following. Suppose the halting problem is decidable by a Turing machine. Then we can program a CantorBot which, when fed the source code of another bot $y$, halts if $y$ runs forever given itself as input and runs forever if $y$ halts given itself as input. This CantorBot can neither run forever nor halt given itself as input, and therefore doesn’t exist; contradiction.

Note that Cantor’s theorem, when applied to this situation, only claims that there exists a pattern of halting or running forever when fed various programs as input that isn’t exhibited by an actual program, which implies in particular that there exists an undecidable problem about Turing machines. But the No-CantorBot theorem actually exhibits such a problem, namely the halting problem.
