In this post I’m going to prove the Schroeder-Bernstein theorem and then discuss an order on the class of all sets.
Let $A$ and $B$ be sets. I’ll define $A\leq B$ to mean that there is an injection from $A$ to $B$, and $A \sim B$ will mean there is a bijection from $A$ to $B$. I’m using $\sim$ rather than $=$ to distinguish this from the usual equality between sets, and because it is an equivalence relation.
The Schroeder-Bernstein theorem
Theorem ([Cantor]-Schroeder-Bernstein): for two sets $A$ and $B$, if $A\leq B$ and $B\leq A$ then $A\sim B$.
I like this theorem because even though two sets having the same cardinality requires proving they’re in bijective correspondence, in practice I very rarely actually exhibit a bijection. For example, suppose I want to prove
$$
F := \mathcal P_{fin}(\mathbb N) =\{A \subset \mathbb N : |A| < \infty\}
$$
is countable. I can inject $\mathbb N$ into $F$ via $n\mapsto \{n\}$, and I can inject $F$ into $\mathbb N$ by
$$
A \mapsto \prod_{i \in A} p_i
$$
where $p_1,p_2,\dots$ enumerate the primes; this is well defined since every $A \in F$ is finite so $\prod_{i\in A} p_i$ is too, and it’s an injection by the fundamental theorem of arithmetic. So I’ve just come up with two injections, not a single bijection, but Schroeder-Bernstein lets me stop here because this is sufficient to conclude that a bijection exists.
Pf of theorem: I’m going to construct a bijection $h : A\to B$. I think the form is rather unintuitive when just announced with no context so I’m going to try to build up to it so that it seems like a natural enough thing for a person to have come up with in the first place. Let $f : A\to B$ and $g : B\to A$ be the injections that $|A| \leq |B|$ and $|B| \leq |A|$ guarantee exist.
This means $f$ is an injection and $g^{-1} : g(B) \to B$ is a bijection. If I just take $h = f$ then I’ll miss everything in $B\backslash f(A)$. That makes $h = g^{-1}$ tempting, as $\text{im }g^{-1} = B$, but $h$ needs to be defined on all of $A$ and in general $g^{-1}$ isn’t. Therefore I’m going to consider functions that partition $A$ as $A_f \cup A_g$ and then do
$$
h(a) = \begin{cases}g^{-1}(a) & a \in A_g \\ f(a) & a \in A_f\end{cases}
$$
so I’ll try to make a bijection by getting surjectivity from $g^{-1}$ but adding in some of $f$ to get it defined everywhere and maintain injectivity.
I’ll do this by constructing a sequence of functions $h_0, h_1,\dots$ where each $h_n$ will hopefully improve over the previous ones. In particular, I’ll make the $h_n$ so that they’re always injections but I’ll successively modify the $A_g$ so that in the limit I’m guaranteed a surjection. I’ll make use of a collection of subsets of $B$ which I’ll denote by $B_0,B_1,\dots$, to be defined later.
Because I think it’s a little neater, I will actually test if $f(a)$ is in a certain subset of $B$ rather than if $a \in A_g$ directly, although because $f$ is injective the two are equivalent.
So, since I want to always have injections, I’ll begin with $h_0 = f$, which means I use $g^{-1}$ if $f(a) \in \emptyset$ (so never). This function misses all of $B\backslash f(A)$ but, as promised, it’s at least an injection. Define $B_0 = B\backslash f(A)$.
Now for $h_1$, I want to hit $B_0$ and the only way to do that is via $g^{-1}$, so I’m going to risk missing some of $f(A)$ in order to cover all of $B_0$. But how should I decide if I should apply $g^{-1}$ to a given $a \in A$? I want to apply $g^{-1}$ to exactly the $a \in A$ necessary to cover all of $B_0$. This is exactly $g(B_0)$ ,since $g^{-1}$ is a bijection and $g^{-1}(g(B_0)) = B_0$. Testing $a \in g(B_0)$ is equivalent to testing if $f(a) \in (f\circ g)(B_0)$ so I can take
$$
h_1(a) = \begin{cases}g^{-1}(a) & f(a) \in (f\circ g)(B_0) \\ f(a) & \text{o.w.}\end{cases}
$$
so now $h_1$ hits everything in $B_0$, but misses $f(A \backslash g(B_0))$. Define $B_1 = (f\circ g)(B_0)$ so $h_1$ checks if $f(a) \in B_1$.
Now I need to send more stuff to $f(A)$ in $B$ without missing any of $B_0$. If I reduce the region that I use $g^{-1}$ on then I will miss some points in $B_0$, so my only option is to enlarge the region that I apply $g^{-1}$ to. To that end, consider
$$
h_2(a) = \begin{cases}g^{-1}(a) & f(a) \in B_1\cup B_2 \\ f(a) & \text{o.w.}\end{cases}
$$
where $B_2 = (f\circ g)(B_1)$. Testing $f(a) \in B_1$ still guarantees that I hit all of $B_0$. As for this choice of $B_2$, recall how $h_1$ uses $g^{-1}$ on $g(B_0)$ instead of $f$. This means $f(g(B_0)) = B_1$ is missing from $h_1(A)$ (since $f$ is an injection so $f(A\backslash g(B_0))$ can’t otherwise hit these points). So now, for $h_2$, testing $f(a) \in B_2 = (f\circ g)(B_1)$ is equivalent to testing if $a \in g(B_1)$; and if it is, I apply $g^{-1}$ so now $h_2$ covers $B_1$ which was previously missed. So checking if $f(a) \in B_2 = (f\circ g)(B_1)$ is a tidy way to see if $g^{-1}(a)$ would land in a part of $B$ that I’ve missed.
What $h_2$ in turn misses is $f\left(A\backslash (g(B_1)\cup g(B_2))\right)$. In my very first post I showed that if $g$ is an injection then it commutes with intersections, so I have $g(B_1)\cup g(B_2) = g(B_1\cup B_2)$. This means I can write the set that $h_2$ is missing as
$$
f(A \backslash g(B_1\cup B_2)) \subseteq f(A \backslash g(B_1)).
$$
Thus by monotonically increasing the test region of $f(a)$ from $B_1$ to $B_1\cup B_2$ I have hope that I’m progressively shrinking the region of $f(A)$ that I’m missing.
In general I’ll take $B_{n+1} = (f\circ g)(B_n) = (f\circ g)^{n+1}(B_0)$ and define
$$
h_n(a) = \begin{cases}g^{-1}(a) & f(a) \in \bigcup_{k\leq n} B_k \\ f(a) & \text{o.w.}\end{cases}
$$
so, since $B_1$ is in that union, $h_n$ is still hitting all of $B_0$, and I’m missing
$$
f\left(A\backslash g\left(\bigcup_{k\leq n} B_k\right)\right)
$$
which hopefully gets smaller and smaller.
This gives me my guess for a bijective function: take
$$
h(a) := h_\infty(a) = \begin{cases}g^{-1}(a) & f(a) \in \bigcup_{k \geq 0} B_k \\ f(a) & \text{o.w.}\end{cases}
$$
i.e. I’m testing if $f(a)$ is in the natural upper bound of the collection $\{B_k\}$. Let $A_g = \bigcup_{k \geq 0} B_k$. Also note that I’m now adding $B_0$ in to the test region in $B$, even though $f(a)$ is never in $B_0$. I omitted $B_0$ from the test regions for $h_1$ and $h_2$ just to keep things tidier.
Step 1. $h$ is well-defined
Let $a \in A$. If $f(a) \in A_g$ then $f(a) \in B_k$ for some $k$, so $f(a) = (f\circ g)(b)$ for some $b \in B_{k-1}$. This means $a = g(b)$, so in particular $a \in g(B)$ which means $g^{-1}(a)$ is defined. And if $f(a) \notin A_g$ then $h(a) = f(a)$ which is also well-defined.
Step 2: $h$ is an injection
First, I’ll prove the following lemma.
Lemma: $h(a) \in \cup B_k \iff f(a) \in \cup B_k$
Pf: let $a \in A$. Suppose $h(a) \in \cup B_k$. Assume $f(a) \notin \cup B_k$. Then $h(a) = f(a)$, but $h(a) \in \cup B_k$ so that’s impossible. Therefore $f(a) \in \cup B_k$.
Now suppose $f(a) \in \cup B_k$ so $f(a) = (f\circ g)(b)$ for some $b \in B_{k}$. $f(a) \in \cup B_k$ means $h$ will use $g^{-1}$, so $h(a) = g^{-1}(a) = b \in B_k \subseteq \cup B_k$.
That concludes the proof of the lemma. Now I’ll show $h$ is injective. Let $a_1,a_2\in A$ and assume $h(a_1) = h(a_2)$.
Case I: $f(a_1) \in \cup B_k$.
By the lemma this means $h(a_1) = h(a_2) \in \cup B_k$, and that in turn means $f(a_2) \in \cup B_k$. Therefore
$$
g^{-1}(a_1) = h(a_1) =h(a_2) = g^{-1}(a_2)
$$
so $a_1 = a_2$ by applying $g$ which is well-defined.
Case II: $f(a_1) \notin \cup B_k$
Now by the lemma again $h(a_1) = h(a_2) \notin \cup B_k$ so $f(a_2)\notin \cup B_k$. This means
$$
f(a_1) = h(a_1) =h(a_2) = f(a_2)
$$
so $a_1=a_2$ by $f$ being an injection.
Therefore $h$ is an injection.
Step 3: $h$ is a surjection
Let $b \in B$.
Case I: $b \in \cup B_k$
$b \in B$ so I can apply $g$ to it to get some point $a := g(b) \in A$. This in turn means I can apply $f$ to get $f(a) \in \cup B_k$ by $b \in \cup B_k$; this means $h$ will apply $g^{-1}$ to $a$, so $h(a) = g^{-1}(g(b)) = b$.
Case II: $b \notin \cup B_k$
$B_0 \subseteq \cup B_k$ so this means that $b \notin B_0$, i.e. there is some $a \in A$ with $f(a) = b$. And since $f(a) = b \notin \cup B_k$, $h(a) = f(a) = b$.
So that’s it. $h$ is a bijection.
$\square$
That proof definitely took a lot longer than it could have, but I feel like I at least have a better idea of where $h$ came from. The first proof of this that I saw didn’t motivate the choice of $h$ at all and I felt that that made it much less satisfactory.
Establishing a partial order on the class of all sets
Let $S$ be the proper class of all sets and consider $(S, \leq)$.
Result 1: $(S,\leq)$ is a poset
Pf: I need to show $\leq$ is reflexive, antisymmetric, and transitive on $S$.
Reflexivity: let $A \in S$. Then $\text{id} : A \to A$ gives an injection so $A\leq A$.
Antisymmetry: let $A, B \in S$ and suppose $A\leq B$ and $B \leq A$. Then by Schroeder-Bernstein $A \sim B$, so Schroeder-Bernstein can actually be viewed as proving exactly this, that $\leq$ is antisymmetric on $S$.
Transitivity: let $A,B,C \in S$ and suppose $A \leq B$ and $B \leq C$, so there are injections $f : A \to B$ and $g : B\to C$. Consider $h := g \circ f$ which is a function from $A$ to $C$. Suppose $h(a_1) = h(a_2)$ so $g(f(a_1))=g(f(a_2))$. By $f$ and $g$ being injections this means $a_1=a_2$, so $h$ is an injection too and therefore $A \leq C$.
Thus $(S, \leq)$ is a partial order.
$\square$
The final piece is as follows:
Result 2: if $\sf{AC}$ is assumed then for every pair of sets $A$ and $B$, either $A\leq B$ or $B \leq A$.
Pf: let $\mathscr F$ be the set of all injections from subsets of $A$ to subsets of $B$. A function $f : A’\to B’$ for $A’ \subseteq A$, $B’\subseteq B’$ is equivalent to a subset of $A\times B$ as $f$ formally is the set $\{(a, f(a)) : a \in A’\}$. This means $\mathscr F$ can be thought of as a subset of $\mathcal P(A\times B)$; $\mathcal P(A\times B)$ is partially ordered by $\subseteq$, so $(\mathscr F, \subseteq)$ is also a poset.
For $f, g \in \mathscr F$, $f \subseteq g$ means
$$
(a,b) \in f \implies (a,b) \in g.
$$
So suppose $f(a) = b$ for some $a \in \text{dom}(f)$, and $f \subseteq$. Then $(a,f(a)) \in g$, so $a\in\text{dom}(g)$, and by $g$ being well defined this means $g(a) = f(a)$. So if $f\subseteq g$ then $\text{dom}(f)\subseteq\text{dom}(g)$ and $f =g$ on $f$’s domain. Thus $g$ is an extension of $f$.
So consider a chain $C$ in $\mathscr F$. By being totally ordered this means that $C$ is a collection of functions extending each other, and the domains and ranges of elements of $C$ are chains in $\mathcal P(A)$ and $\mathcal P(B)$ respectively. I want to find an upper bound in $\mathscr F$, so a natural thing to try is to take a function between an upper bound of the domains and the ranges. So for $f \in C$ define $D_f = \text{dom}(f)$ and $E_f = \text{im}(f)$ and consider
$$
h : \bigcup_{f\in C} D_f \to \bigcup_{f\in C} E_f.
$$
For $a \in \text{dom}(h)$, there must be some $f \in C$ such that $a \in D_f$ so define $h(a) = f(a)$ for that $f$. $h$ is well-defined because even though there may be many $f$ such that $a \in D_f$, these functions are all in $C$ so they are all extensions of each other which means they all send $a$ to the same thing. So it doesn’t matter which $f$ is chosen and $h$ is well defined.
Step 1: $h \in \mathscr F$
Clearly $h$ is a function between a subset of $A$ and a subset of $B$, so I just need to show that it is an injection. Suppose $h(a_1) = h(a_2)$. $a_1 \in D_f$ and $a_2 \in D_g$ for some $f,g \in C$.
It is always the case that $h(a) = f(a)$ for some $f \in C$, possibly depending on $a$, so $f(a_1) = h(a_1) = h(a_2) = g(a_2)$ for $f, g \in C$. For $f$ and $g$ either $f \subseteq g$ or $g \subseteq f$ by $C$ being a chain; WLOG take $f \subseteq g$. Then $g$ extends $f$ so $f(a_1) = g(a_1)$. This means $g(a_1) = g(a_2)\implies a_1=a_2$ by $g$ being an injection. Therefore $h$ is an injection so $h \in \mathscr F$.
Step 2: $h$ is an upper bound of $C$
Let $f \in C$ and consider $(a,b) \in f$. $D_f \subset \cup D_f$ so $a \in \text{dom}(h)$ which means $h(a)$ is defined. And by its construction $h(a) = f(a)$ since $a \in D_f$. This means $(a, b) \in h$ so $f \subseteq h$. $f$ was arbitrary so $h$ is an upper bound in $\mathscr F$ for $C$.
Now I can apply Zorn’s lemma to produce a maximal element $t \in \mathscr F$, where $t : A’ \to B’$, say.
Suppose $a_0 \in A\backslash A’$ and $b_0 \in B\backslash B_0$. Then I can extend $t$ to an injection $t’$ on $A’\cup\{a_0\}$ to $B’\cup\{b_0\}$ by setting $t'(a_0) = b_0$, so $t \subseteq t’$ and both are in $\mathscr F$. But $t$ was maximal so this is a contradiction. Therefore it must be that either there is no such $a_0$ or no such $b_0$, which means either $A = A’$ or $B = B’$ (or both). If $A=A’$ then I have an injection from $A$ to $B$ which means $A\leq B$. If $B=B’$ then I have a surjection from $A’$ to $B$, which I can extend to a surjection from $A$ to $B$ by mapping everything in $A\backslash A’$ to some $b \in B$, and therefore $B\leq A$. So either way, at least one of $A\leq B$ and $B\leq A$ holds.
$\square$
Corollary: if $\sf{AC}$ is assumed then $(S,\leq)$ is a total order.
Pf: I need to show that for every $A,B \in S$ either $A \leq B$ or $B \geq A$. This is true from Result 2.
$\square$
So, in conclusion, regardless of $\sf{AC}$ being assumed or not, $(S,\leq)$ is a partial order, i.e. the class of all sets can be partially ordered by cardinality comparisons. But if $\sf{AC}$ is also assumed then every pair of cardinals can be compared so the class of all sets is totally ordered w.r.t. cardinality comparisons.
The standard definition of a cardinal number that I’ve seen is as an equivalence class of $\sim$, so the class of all cardinal numbers would be $S / \sim$. For two cardinal numbers $\mathfrak a$ and $\mathfrak b$, $\mathfrak a \leq_c \mathfrak b$ if and only if $A\leq B$ for $A \in \mathfrak a$ and $B \in \mathfrak b$. Thus this order on $S$ produces an analogous order on $S/ \sim$ so I could have done this with cardinal numbers instead of sets had I wanted to.