I want to take a second to thank everyone who's managed to read this far into my blog posts; I know better than most that mathematics is not the most popular subject amongst the masses, and reading material that one finds mundane can be one of the most difficult tasks. However, I imagine that I have found my target audience after four blog posts (unless you're my mom reading for support and error checking — thanks mom), so I will likely continue down this path.
In my experience of theoretical mathematics, I have failed to find a subject that does not — in some way — incorporate algebra. Now when I say algebra, I am not referring to the high school algebra which tests students on the focus of a parabola, long polynomial division, etc.; algebra (often referred to as abstract algebra by universities) is the study of algebraic structures. An algebraic structure is (in simplified terms) a generalization of our number system and standard operations (i.e. addition, subtraction, multiplication, and division).
Now I'm going to attempt to crash through the definitions of our primary algebraic structures without really stopping to give the background theory of each (at least yet — if I do blog posts on algebraic geometry later on I will need to). To start off, suppose we have some set \(A\); this can be a set of anything from numbers to colors to animals.
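To that end, here is the quick version of the definitions we will need (stated loosely; each deserves far more care than I'm giving it here). A group is a set \(A\) together with an associative binary operation \(A \times A \to A\) that has an identity element and inverses. A ring is a set with two operations, addition and multiplication, such that the set forms an abelian (i.e. commutative) group under addition, and multiplication is associative and distributes over addition. A field is a commutative ring in which every nonzero element has a multiplicative inverse. Finally, an algebra over a field \(k\) is a vector space over \(k\) equipped with a bilinear product.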
Alright, so we're all experts in algebra now. Just in case you need a slightly more concrete explanation, let's go over a few examples.
Let's start with groups — I like to think of groups as mathematical representations of the symmetry of an object. To motivate this idea, let's consider a triangle composed of three vertices: \(a, b\), and \(c\). It's fairly easy to see that there are \(6\) possible configurations for a triangle with these points, each corresponding to a rigid motion of rotation or reflection.
We establish a binary operation between two elements by simply composing one rigid motion after another; thus, rotation by \(120^\circ\) composed with rotation by \(120^\circ\) is the same as rotation by \(240^\circ\), so that \(\rho_1\rho_1 = \rho_2\). This group is commonly known as the dihedral group on \(3\) points and will not be studied in this series. It is worth noting that this is a great example of a non-abelian group since \(\sigma_1\sigma_0 = \rho_1\) while \(\sigma_0\sigma_1 = \rho_2\).
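If you'd like to check the non-commutativity concretely, here is a minimal sketch in Python. Note that which product works out to \(\rho_1\) versus \(\rho_2\) depends on how we label the reflections and which composition convention we use, so treat the specific assignments below as one possible choice; the point is only that the two orders disagree.

```python
# Each rigid motion of the triangle is a permutation of the vertices {a, b, c},
# encoded as a dict sending each vertex to its image.
rho1   = {'a': 'b', 'b': 'c', 'c': 'a'}  # rotation by 120 degrees
rho2   = {'a': 'c', 'b': 'a', 'c': 'b'}  # rotation by 240 degrees
sigma0 = {'a': 'a', 'b': 'c', 'c': 'b'}  # reflection fixing vertex a
sigma1 = {'a': 'c', 'b': 'b', 'c': 'a'}  # reflection fixing vertex b

def compose(f, g):
    """Apply g first, then f (right-to-left composition)."""
    return {v: f[g[v]] for v in g}

print(compose(rho1, rho1) == rho2)   # True: rotating twice by 120 is rotating by 240
print(compose(sigma1, sigma0))       # one of the rotations
print(compose(sigma0, sigma1))       # the *other* rotation
print(compose(sigma1, sigma0) == compose(sigma0, sigma1))  # False: non-abelian
```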
Next, let's consider applications of rings — I like to think of rings as generalizations of polynomials (we will come to see in algebraic geometry that every field has an associated polynomial ring). For example, we can always add and subtract polynomials by looking at terms of the same degree. Moreover, multiplying two polynomials \(f(x)g(x)\) also results in a polynomial; yet it is not the case that \(\frac{1}{f(x)}\) is a polynomial unless \(f(x)\) is a nonzero constant. Therefore, the set of polynomials over a field \(k\) (usually denoted \( k[x] \)) is a ring but not a field.
Lastly, we consider fields and algebras over fields; fields are simply generalizations of our number systems \( \mathbb{R} \) and \( \mathbb{C} \) with the usual addition, subtraction, multiplication, and division. In a similar fashion, an algebra is a vector space equipped with a bilinear product; the classic example is \( \mathbb{R}^3 \) equipped with the standard cross product of vectors \(\times\).
Up to this point, we really haven't used any information from our previous blog posts — in most contexts, algebra and differential geometry are quite distinct. However, what happens when we have a space that is both a smooth manifold AND a group? Suppose \(G\) is a smooth manifold, and \(\oplus : G \times G \to G\) is the binary group operation on \(G\). If the functions
$$ \begin{align} \mu(a, b) &= a \oplus b \\ \iota (a) &= a^{-1} \end{align} $$are both smooth on \(G\), then we call \(G\) a Lie group.
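Two quick sanity checks: \(\mathbb{R}^n\) with ordinary vector addition is a Lie group, since \(\mu(a, b) = a + b\) and \(\iota(a) = -a\) are both smooth. Likewise, the unit circle \(S^1 \subset \mathbb{C}\) under complex multiplication is a Lie group, with \(\mu(e^{i\theta}, e^{i\varphi}) = e^{i(\theta + \varphi)}\) and \(\iota(e^{i\theta}) = e^{-i\theta}\).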
For every element \(g \in G\) we define the "left-translation by \(g\)" function \(l_g : G \to G\) by \(l_g(x) = \mu(g, x) = g \oplus x\) which must also be smooth by smoothness of \(\mu\). Even more interesting, since we defined our inverse function \(\iota(x)\) to be smooth, the function \(l_{g^{-1}}(x) = \mu(\iota(g), x) = g^{-1} \oplus x\) is smooth as well! But since
$$ \begin{align} (l_g \circ l_{g^{-1}})(x) &= g \oplus (g^{-1} \oplus x) \\&= (g \oplus g^{-1}) \oplus x \hspace{2em} (\textrm{associativity}) \\&= e \oplus x \\&= x \hspace{8em} (\textrm{identity}) \end{align} $$is the identity function on \(G\), we must have that \(l_g\) is a diffeomorphism. That's all good and well, but what does this tell us about our tangent vectors? Great question, you really are on the road to becoming active readers! If you recall from my second post, every smooth map \(F\) induces a linear map on tangent spaces via the pushforward \(F_*\); when \(F\) is a diffeomorphism, this linear map is an isomorphism. By definition of the identity \(e\in G\), we have that \(l_g(e) = g\) for any \(g \in G\). Thus, the pushforward of \(l_g\) at the identity gives us a linear isomorphism
$$ (l_g)_{*,e} : T_eG \to T_gG $$Therefore, the tangent space at any point of our Lie group \(G\) is completely determined by what happens in a neighborhood of \(e\)! We wish to make this idea a little bit more concrete: suppose we have some vector field \(X : G \to TG\). Since \(l_g\) is a map from \(G\) to \(G\), our pushforward \((l_g)_*\) takes vector fields on \(G\) to vector fields on \(G\). As one would expect, we call our vector field \(X\) left-invariant if
$$ (l_g)_*(X) = X $$Let \(L(G)\) denote the set of all left-invariant vector fields on \(G\). Now since \((l_g)_*(X_e) = X_g\) for any left-invariant \(X\), we need only know our vector field's value at \(e\). Thus, we have a clear map \(L(G) \to T_e(G)\) given by \(X \mapsto X_e\). Next, we wish to associate each tangent vector \(Y \in T_e(G)\) to a left-invariant vector field. Define the vector field \(\widetilde{Y}\) pointwise by \( (\widetilde{Y})_g = (l_g)_*(Y) \) for each \(g \in G\); then \(\widetilde{Y}\) must be left-invariant since
$$ (l_h)_*(\widetilde{Y}_g) = (l_h)_*((l_g)_*(Y)) = (l_h \circ l_g)_*(Y) = (l_{hg})_*(Y) = \widetilde{Y}_{hg} $$This gives us a map \(T_eG \to L(G)\) defined by \(Y \mapsto \widetilde{Y}\), inverse to the map above. In particular, this tells us that \(T_eG \simeq L(G)\).
"This is all good and well," you may say, "but does \(T_eG\) inherit any algebraic structure from \(G\)?" Great question — as we've done with everything so far, we're going to examine this directly. For starters, how might you define a binary operation on two vector fields \(X\) and \(Y\)? Adding them could work. What about multiplication? Well if you remember from my third post, we require that our tangent vectors be derivations, so that they are linear and satisfy
$$ X(fg) = X(f)g + fX(g) $$(also known as the Leibniz rule) for all smooth functions \(f, g\) on \(G\). HOWEVER, when we try the same with the product (or composition — how this is interpreted is a little foggy) \(XY\), we get the following:
$$ \begin{align} XY(fg) &= X( Y(f)g + fY(g)) \\&= X(Yf)g + Y(f)X(g) + X(f)Y(g) + fXY(g) \end{align} $$This is clearly not the Leibniz rule, due to those pesky \(Y(f)X(g) + X(f)Y(g)\) terms. It is worth noting, though, that if we swapped the order of \(XY\) to \(YX\), we would get the middle terms as \( X(f)Y(g) + Y(f)X(g) \), which is the exact same sum by commutativity of addition! Therefore, we define what is known as the Lie bracket to be the binary operator \( [ \cdot, \cdot] : T_eG \times T_eG \to T_eG \) defined by
$$ [X, Y] = XY - YX $$Ultimately, we will show that the Lie bracket establishes an algebraic structure on \(L(G)\). There are two things to check: that \( [X, Y] \) allows us to construct a derivation from two derivations, and that our binary operation is closed. That is, will the Lie bracket of two left-invariant vector fields in \(L(G)\) give us a left-invariant vector field in \(L(G)\)?
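For the first point, subtracting the expansion of \(YX(fg)\) from that of \(XY(fg)\), the pesky middle terms cancel and we recover the Leibniz rule:
$$ [X, Y](fg) = XY(fg) - YX(fg) = X(Yf)g + fXY(g) - Y(Xf)g - fYX(g) = ([X, Y]f)g + f([X, Y]g) $$
As for closure, we can lean on a general fact about pushforwards (which I'll use without proof): for a diffeomorphism \(F\), the pushforward respects the bracket, \(F_*[X, Y] = [F_*X, F_*Y]\). Applying this with \(F = l_g\) and \(X, Y\) left-invariant gives
$$ (l_g)_*[X, Y] = [(l_g)_*X, (l_g)_*Y] = [X, Y] $$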
This tells us that \( [\cdot, \cdot] : L(G) \times L(G) \to L(G) \) is a closed binary operator over left-invariant derivations. However, since we showed \(L(G) \simeq T_eG\), it is also a closed binary operator over \(T_eG\). You might not find this distinction particularly useful right away, but it is worth noting that \(T_eG\) forms a vector space over \(\mathbb{R}\) (see where this is going?). Hence the Lie bracket \( [\cdot, \cdot] : T_eG \times T_eG \to T_eG \) is bilinear, as our definition makes clear:
$$ \begin{align} [aX + bY, Z] &= (aX + bY)Z - Z(aX + bY) \\&= (aXZ - ZaX) + (bYZ - ZbY) \\&= a(XZ - ZX) + b(YZ - ZY) \\&= a[X, Z] + b[Y, Z] \\ [X, aY + bZ] &= X(aY + bZ) - (aY + bZ)X \\&= (XaY - aYX) + (XbZ - bZX) \\&= a(XY - YX) + b(XZ - ZX) \\&= a[X, Y] + b[X, Z] \end{align} $$for \(a, b \in \mathbb{R}\) and vector fields \(X, Y, Z : G \to TG\). Therefore, the Lie bracket \([ \cdot , \cdot]\) is a binary operator over the vector space \(T_eG\) that satisfies the distributive law (with respect to standard addition) and scalar compatibility, so that \(T_eG\) along with the Lie bracket becomes an algebra! We will normally refer to this as the Lie algebra of the Lie group \(G\) and denote it \(\mathfrak{g}\). Pretty cool stuff, huh?
As a bit of an aside, a general Lie algebra is defined to be a vector space \(\mathfrak{h}\) over some field \(k\), along with a binary operation \([\cdot, \cdot] : \mathfrak{h} \times \mathfrak{h} \to \mathfrak{h}\) that satisfies
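1. Bilinearity in each input;
2. \([X, X] = 0\) for all \(X \in \mathfrak{h}\) (together with bilinearity, this forces the antisymmetry \([X, Y] = -[Y, X]\));
3. The Jacobi identity: \([X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0\).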
I know what you might be wondering, and the answer is no — we did not prove axioms 2 or 3 for our example of the Lie bracket \([X, Y] = XY - YX\) on \( \mathfrak{g} = T_eG\). However, axiom 2 should be pretty obvious since \([X, X] = XX - XX = 0\). I will conclude this section with the following theorem:
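Theorem. Let \(G\) be a Lie group with identity \(e\). Then \(\mathfrak{g} = T_eG \simeq L(G)\), equipped with the Lie bracket \([X, Y] = XY - YX\), is a Lie algebra.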
One would be led to think that since we have covered so much material on smooth manifolds, we are perfectly ready to jump into more advanced content regarding Riemannian manifolds. The overwhelmed reader should be happy to hear that this is not quite the case; in fact, much of what we will be doing for the remainder of this article will simply involve our tangent vectors. From my second post, we know that a vector field over some manifold \(M\) is simply a map \(X : M \to TM \) that acts as a right-inverse to the projection map \(\pi : TM \to M\) (i.e. \(\pi \circ X = id_M\)). However, when we look at our vector field over some coordinate chart \( (U, x_1, \dots, x_n)\), it resembles (and acts like) a differential operator:
$$ X\vert_U = a_1\vert_U \frac{\partial}{\partial x_1} + \dots + a_n\vert_U \frac{\partial}{\partial x_n} $$If \(X\) is a smooth vector field, then the functions \(a_1(p), \dots, a_n(p)\) must be smooth as well (the converse also holds). Ultimately, we denote the set of all smooth vector fields over a manifold \(M\) as \(\mathfrak{X}(M)\). We already explored a Lie algebra structure on vector fields (via the Lie bracket) in the last section, so we will now turn our attention to other operators on \(\mathfrak{X}(M)\).
The first operator we wish to look at should be a familiar concept to those who have taken multivariable calculus of any kind; in particular, we will take advantage of both the vector-like nature and the derivation nature of a tangent vector \(X_p \in T_pM\). In a coordinate neighborhood of \(p\) we denote the tangent vector \(X_p\) as
$$ X_p = a_1(p) \frac{\partial}{\partial x_1} \Big\vert_p + \dots + a_n(p) \frac{\partial}{\partial x_n}\Big\vert_p $$Alternatively, we could represent \(X_p\) in classical vector notation as some vector \(\overrightarrow{u} \in \mathbb{R}^n\)
$$ \overrightarrow{u} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} $$Recall from multivariable calculus that we defined the directional derivative of a function \(f(\overrightarrow{x}) \) along \(\overrightarrow{u}\) at some point \(p \in \mathbb{R}^n\) to be
$$ D_{\overrightarrow{u}} f = \lim_{h \to 0} \frac{ f(p + h\overrightarrow{u}) - f(p) }{h} $$where \( p= (p_1, \dots, p_n)\). However, we may simplify this equation using the chain rule on our local chart to get the following:
$$ \begin{align} D_{X_p} f &= \lim_{h \to 0} \frac{ f(p + h\overrightarrow{u}) - f(p) }{h} \\&= \frac{d}{dt} \Big\vert_{t=0} f(p + t\overrightarrow{u}) \\&= \sum_{i=1}^n \frac{ \partial f}{\partial x_i}\Big\vert_{p} \frac{\partial x_i}{\partial t}\Big\vert_{t=0} \\&= \sum_{i=1}^n \frac{ \partial f}{\partial x_i}\Big\vert_{p} \cdot a_i(p) \\&= \left( \sum_{i=1}^n a_i(p) \frac{ \partial }{\partial x_i}\Big\vert_{p} \right) f \\&= X_p f \end{align} $$Thus, we can see that our tangent vectors already act as directional derivatives on smooth functions!
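If you want to see this identity in action numerically, here is a small Python sketch; the test function \(f\), the point \(p\), and the vector \(\overrightarrow{u}\) below are arbitrary choices of mine.

```python
import numpy as np

def f(x):
    # An arbitrary smooth test function on R^3.
    return x[0]**2 * x[1] + np.sin(x[2])

def grad_f(x):
    # The partial derivatives of f, computed by hand.
    return np.array([2*x[0]*x[1], x[0]**2, np.cos(x[2])])

p = np.array([1.0, 2.0, 0.5])
u = np.array([0.3, -1.0, 2.0])  # the components a_i(p) of the tangent vector X_p

h = 1e-6
limit_def  = (f(p + h*u) - f(p)) / h  # the limit definition of D_u f
chain_rule = grad_f(p) @ u            # sum_i a_i(p) * (df/dx_i) at p

print(limit_def, chain_rule)  # the two values agree up to finite-difference error
```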
We lastly want to extend this notation a little bit further so that \(D_{X_p}\) acts as an operator on other vector fields over \(\mathbb{R}^n\). Thus, if we account for some second smooth vector field \(Y \in \mathfrak{X}(\mathbb{R}^n)\) with
$$ Y = b_1 \frac{\partial}{\partial x_1} + \dots + b_n \frac{\partial}{\partial x_n} $$
then we may apply linearity to see that
$$ \begin{align} D_{X_p}Y &= D_{X_p}b_1 \frac{\partial}{\partial x_1} + \dots + D_{X_p}b_n \frac{\partial}{\partial x_n} \\&= \sum_{i=1}^n (X_p b_i) \frac{\partial}{\partial x_i} \end{align} $$Thus, we have successfully converted the directional derivative into a smooth operator \(D : \mathfrak{X}(\mathbb{R}^n) \times \mathfrak{X}(\mathbb{R}^n) \to \mathfrak{X}(\mathbb{R}^n)\) using the identification \((D_X Y)_p = D_{X_p}Y\). As one typically does, the first thing we want to do is identify the properties of our newly-created operator. So what kind of properties do we want to explore? A pretty easy one is to ask whether the order of our inputs matters — that is, is \(D\) symmetric in the sense that \(D_XY = D_YX\)? Well, if we let \(X = \sum_i a_i \frac{\partial}{\partial x_i}\) and \(Y = \sum_i b_i \frac{\partial}{\partial x_i}\), then we can take
$$ D_XY - D_YX = \sum_i (Xb_i)\frac{\partial}{\partial x_i} - \sum_i (Ya_i)\frac{\partial}{\partial x_i} = XY - YX = [X, Y] $$(the last equality holds because the second-order terms of \(XY\) and \(YX\) cancel, leaving exactly these first-order coefficients) to show that our directional derivative operator is symmetric if and only if our vector fields commute.
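For the skeptical reader, here is a short symbolic check of this identity using sympy; the particular coefficient functions chosen for \(X\) and \(Y\) are arbitrary choices of mine.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Function('f')(x, y)

# Coefficients of X = a1 d/dx + a2 d/dy and Y = b1 d/dx + b2 d/dy (arbitrary).
a = [x*y, sp.sin(x)]
b = [y**2, x + y]

def apply_field(coeffs, func):
    """Apply the vector field sum_i coeffs[i] * d/dx_i to a function."""
    return coeffs[0]*sp.diff(func, x) + coeffs[1]*sp.diff(func, y)

# D_X Y - D_Y X has coefficients X(b_i) - Y(a_i).
torsion_coeffs = [apply_field(a, bi) - apply_field(b, ai) for ai, bi in zip(a, b)]

# [X, Y] f = XY(f) - YX(f); the second-order terms cancel automatically.
bracket_f = apply_field(a, apply_field(b, f)) - apply_field(b, apply_field(a, f))

# The difference simplifies to zero, confirming D_X Y - D_Y X = [X, Y].
print(sp.simplify(bracket_f - apply_field(torsion_coeffs, f)))  # prints 0
```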
Now that we have introduced the Lie bracket to our vector field operations, what can we say about the directional derivative's relationship with the Lie bracket? In particular, is the map \(X \mapsto D_X\) a Lie algebra homomorphism, so that
$$ [D_X, D_Y] = D_{[X, Y]} $$
holds? Expanding the left-hand side, we see that for \(Z = \sum_i c_i \frac{\partial}{\partial x_i} \in \mathfrak{X}(\mathbb{R}^n)\)
$$ \begin{align} D_XD_YZ &= D_X\left( \sum_i (Yc_i) \frac{\partial}{\partial x_i} \right) = \sum_i (XYc_i) \frac{\partial}{\partial x_i} \\ D_YD_XZ &= D_Y\left( \sum_i (Xc_i)\frac{\partial}{\partial x_i} \right) = \sum_i (YXc_i) \frac{\partial}{\partial x_i} \end{align} $$Combining these two equations gives us
$$ D_XD_YZ - D_YD_XZ = \sum_i (XY - YX)c_i \frac{\partial}{\partial x_i} = D_{[X, Y]} Z $$One last formula I want to show for good measure is the following: suppose we have three vector fields over \(\mathbb{R}^n\), say \(X, Y, Z \in \mathfrak{X}(\mathbb{R}^n)\). The reader may recall from multivariable calculus or linear algebra that we defined the dot product between two vectors \(u, v \in \mathbb{R}^n\) to be
$$ \langle u, v \rangle = u \cdot v = \sum_i u_iv_i $$At each point \(p \in \mathbb{R}^n\), our vector fields evaluated at that point are simply tangent vectors \(X_p, Y_p, Z_p\) — this allows us to define an inner product on our vector fields point-wise
$$ (\langle X, Y \rangle)\vert_p = \langle X_p, Y_p \rangle $$In particular, since we can represent \(X = \sum_i a_i \frac{\partial}{\partial x_i}\) and \(Y = \sum_i b_i \frac{\partial}{\partial x_i}\), we may simplify the formula in the obvious way
$$ \langle X, Y \rangle = \sum_i a_ib_i $$This gives us a smooth function on \(\mathbb{R}^n\)! Using this function, the last formula I want to provide for our directional derivative is the following:
$$ \begin{align} X \langle Y, Z \rangle &= X \left( \sum_i b_ic_i \right) \\&= \sum_i X(b_ic_i ) \\&= \sum_i X(b_i) c_i + \sum_i b_i X(c_i) \\&= \langle D_XY, Z \rangle + \langle Y, D_XZ \rangle \end{align} $$Putting the pieces together, we have ultimately shown three things in this section:
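1. \(D_XY - D_YX = [X, Y]\): the failure of \(D\) to be symmetric in its inputs is measured exactly by the Lie bracket;
2. \([D_X, D_Y] = D_{[X, Y]}\): the map \(X \mapsto D_X\) is a Lie algebra homomorphism;
3. \(X\langle Y, Z \rangle = \langle D_XY, Z \rangle + \langle Y, D_XZ \rangle\): the directional derivative satisfies a Leibniz-type rule with respect to the inner product.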
The reader should note that everything we have done in this section has taken place not on a general manifold \(M\) but instead on our prototype \(\mathbb{R}^n\). I feel that it was necessary to explain the directional derivative on \(\mathbb{R}^n\) first, however, since it provides a concrete introduction to affine connections. In most applications, it is enough to consider torsion and curvature from the perspective of the directional derivative on \(\mathbb{R}^n\); however, I hope that after a few more blog posts I will be able to write a second Manifolds Application post introducing curvature in more generality.
You may have noticed by this point that I titled this series of blog posts "Riemannian Manifolds", yet have done nothing thus far to indicate what a Riemannian manifold is. However, after proving the last property of the directional derivative in the previous section (known as metric compatibility), I feel I have set a good enough precedent. For many, one of the first things taught in a multivariable calculus or linear algebra class is the notion of the dot product \(u \cdot v\). Without even bringing manifolds into the discussion, the dot product may be abstracted to what is known as an inner product. As discussed in the link, an inner product \(\langle\, ,\,\rangle : V \times V \to k\) over a (real or complex) vector space \(V\) is simply a way to numerically (i.e. via a scalar) evaluate two vectors in a manner that satisfies:
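1. Symmetry: \(\langle u, v \rangle = \langle v, u \rangle\) (in the complex case, conjugate symmetry \(\langle u, v \rangle = \overline{\langle v, u \rangle}\));
2. Linearity in the first argument: \(\langle au + bw, v \rangle = a\langle u, v \rangle + b\langle w, v \rangle\);
3. Positive-definiteness: \(\langle v, v \rangle \geq 0\), with equality if and only if \(v = 0\).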
As was the case for our dot product, we wish to evaluate vectors over our tangent bundle \(TM\) — unfortunately, our tangent bundle isn't actually a vector space in the majority of cases. However, at each point \(p \in M\), the tangent space \(T_pM\) is a vector space. Therefore, we may assign an inner product \(\langle\, ,\, \rangle_p : T_pM \times T_pM \to \mathbb{R} \) to each tangent space, ultimately giving us a map \( p \mapsto \langle \, , \,\rangle_p\). We denote this assignment by \(g\) (so \(g_p = \langle\, ,\,\rangle_p\)) and say that it is a Riemannian metric if it varies smoothly with \(p\). A Riemannian manifold is simply a pair \((M, g)\) where \(M\) is a smooth manifold and \(g\) is a Riemannian metric on \(M\).
An easy way to think of our Riemannian metric is as a generalization of the dot product on Euclidean space \(\mathbb{R}^n\) — at each point we may compare two tangent vectors \(X_p =\sum_i a_i(p) \frac{\partial}{\partial x_i}\), \(Y_p = \sum_i b_i(p) \frac{\partial}{\partial x_i}\) in some way, for example via the standard dot product
$$ g_p(X_p, Y_p) = \sum_i a_i(p)b_i(p) $$
However, once we allow \(p\) to vary, it need no longer hold that
$$ g(X, Y) = \langle X, Y \rangle = \sum_i a_ib_i $$
since the coefficients \(g_{ij} = g(\frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j})\) of a general metric themselves vary from point to point. This proves to be an immense difficulty when trying to compare tangent vectors that do not lie on the same tangent plane.
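To see this concretely, even flat \(\mathbb{R}^2\) exhibits the phenomenon once we leave Cartesian coordinates: in polar coordinates \((r, \theta)\) on the punctured plane, the ordinary Euclidean inner product of \(X = a_r \frac{\partial}{\partial r} + a_\theta \frac{\partial}{\partial \theta}\) and \(Y = b_r \frac{\partial}{\partial r} + b_\theta \frac{\partial}{\partial \theta}\) works out to
$$ \langle X, Y \rangle = a_rb_r + r^2 a_\theta b_\theta $$
so the coefficients of the metric genuinely depend on the point.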
Indeed, when the directional derivative was introduced in the last section, we showed that properties such as torsion and curvature could be defined using our operator \(D : \mathfrak{X}(\mathbb{R}^n) \times \mathfrak{X}(\mathbb{R}^n) \to \mathfrak{X}(\mathbb{R}^n)\). Since we are no longer dealing with \(\mathbb{R}^n\) but with a general Riemannian manifold \((M, g)\), we wish to find a map
$$ \nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M) $$
that resembles our directional derivative as closely as possible. In particular, recall that we defined a derivation to be a linear map \(d\) that satisfies the Leibniz rule
$$ d(\alpha \beta) = d(\alpha)\beta + \alpha\, d(\beta) $$Now our directional derivative \(D : \mathfrak{X}(\mathbb{R}^n) \times \mathfrak{X}(\mathbb{R}^n) \to \mathfrak{X}(\mathbb{R}^n)\) satisfied a similar property in the second input \(Y = \sum_i b_i \frac{\partial}{\partial x_i}\):
$$ D_X(f\,Y) = \sum_i X (fb_i)\frac{\partial}{\partial x_i} = \sum_i (X\,f)b_i \frac{\partial}{\partial x_i} + f\sum_i (X\,b_i)\frac{\partial}{\partial x_i} = (X\,f)Y + f\,D_XY $$but not in the first input
$$ D_{fX}Y = \sum_i (fX)(b_i)\frac{\partial}{\partial x_i} = f\sum_i X(b_i)\frac{\partial}{\partial x_i} = f\,D_XY $$Therefore, we define an affine connection to be any map \( \nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)\) that satisfies
$$ \begin{align} \nabla_{fX}Y &= f\,\nabla_XY \\ \nabla_X(fY) &= (X\,f)Y + f\nabla_XY \end{align} $$for all smooth functions \(f \in C^\infty(M)\) (along with additivity in each input). We can, however, do a little bit better — I hinted at the terminology earlier, but what we now want to introduce are the torsion and curvature. Mimicking our notation for the directional derivative, given an affine connection \(\nabla\) on a Riemannian manifold \(M\), we define the torsion and curvature to be
$$ \begin{align} T(X, Y) &= \nabla_XY - \nabla_YX - [X, Y] \\ R(X, Y) &= [\nabla_X, \nabla_Y] - \nabla_{[X, Y]} \end{align} $$respectively. Now I should point out that our torsion \(T\) is inherently different from the torsion groups defined in homological algebra; in particular, de Rham cohomology from my third post is a much better tool for determining how much a shape twists. Our torsion map \(T\) instead measures how far the connection is from being symmetric (recall that for the directional derivative, \(D_XY - D_YX = [X, Y]\) exactly), while the curvature \(R\) measures how much \(X \mapsto \nabla_X\) deviates from a Lie algebra homomorphism.
Since we are trying to mimic the directional derivative as closely as possible, we might also wish for an affine connection that acts as a Lie algebra homomorphism; in particular, this would allow us to transport structure from one tangent plane to another as desired. However, the reader should be aware by this point that not every manifold is flat. Consider the sphere, for example — at each point the sphere has a positive (and constant) curvature, though we still wish to classify the sphere as a Riemannian manifold. Therefore, we cannot demand anything of our curvature tensor \(R(X, Y)\) on a general manifold. Ultimately, we define a Riemannian connection to be an affine connection \(\nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)\) that satisfies
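1. Torsion-freeness: \(T(X, Y) = \nabla_XY - \nabla_YX - [X, Y] = 0\);
2. Metric compatibility: \(X\langle Y, Z \rangle = \langle \nabla_XY, Z \rangle + \langle Y, \nabla_XZ \rangle\);
in other words, the two remaining properties that our directional derivative satisfied on \(\mathbb{R}^n\). (It is a standard fact, which I'll state without proof, that every Riemannian manifold admits exactly one such connection, often called the Levi-Civita connection.)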
To wrap this blog post up, I'll finish with a bit of content that hopefully you recognize from multivariable calculus. As some of you may have noticed, I have gone into excessive detail and applications of our tangent vectors, but failed to discuss anything regarding other types of vector fields over a manifold \(M\) aside from \(TM\). In fact, the theory of vector bundles is incredibly extensive and will be covered in several future posts from both the differential and algebraic perspectives of geometry.
Getting back on topic, let's consider the familiar setting of \(\mathbb{R}^3\): given a curve in \(\mathbb{R}^3\), what are the three standard vectors that make up the Frenet frame? I assume that if you didn't get the question then you probably clicked the link — in either case, the answer is the tangent vector \(\mathbf{T}\), the normal vector \(\mathbf{N}\), and the binormal vector \(\mathbf{B}\). We already discussed the tangent bundle \(TM\) consisting of all our tangent vectors, and the binormal vector won't really come into play — however, the unit normal vector will pique some interest.
For starters, given a surface \(M \subset \mathbb{R}^3\) (or, more generally, a hypersurface \(M \subset \mathbb{R}^{n+1}\)), we can easily define the unit normal vector field \(N\) by associating each point \(p \in M\) to the unit normal vector \(N_p\) of our tangent plane \(T_pM\). Now clearly \(N_p\) is not an element of \(T_pM\), so it's bending notation a bit to call \(N\) a vector field, as it isn't a map \(M \to TM\) — we'll ignore this for now.
The most important application of our unit normal vector field \(N\) for the near future is to act as a placeholder for operators we have already defined. Indeed, much of what we have done up to this point is define operators on our vector fields \(\mathfrak{X}(M)\). For starters, given a Riemannian connection \(\nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)\) over a Riemannian manifold \(M\), we define the shape operator \(L : \mathfrak{X}(M) \to \mathfrak{X}(M)\) to be
$$ L(X) = -\nabla_XN $$A nice property of our shape operator \(L\) is that it is self-adjoint with respect to our Riemannian metric \(g\). To see this, fix some smooth vector fields \(X, Y \in \mathfrak{X}(M)\) and note that we must have \(\langle X, N \rangle = \langle Y, N \rangle = 0\) since \(N\) is orthogonal to both \(X\) and \(Y\) by definition. But then
$$ \begin{align} 0 &= X\langle Y, N\rangle = \langle \nabla_XY, N\rangle + \langle Y, \nabla_XN \rangle = \langle \nabla_XY, N\rangle - \langle Y, L(X)\rangle \\ 0 &= Y\langle X, N\rangle = \langle \nabla_YX, N\rangle + \langle X, \nabla_YN \rangle = \langle \nabla_YX, N\rangle - \langle X, L(Y)\rangle \end{align} $$so that by torsion-freeness of \(\nabla\)
$$ \langle L(X), Y \rangle - \langle X, L(Y) \rangle = \langle \nabla_XY, N \rangle - \langle \nabla_YX, N \rangle = \langle [X, Y], N \rangle $$However, \([X, Y]\) is also a tangent vector field on \(M\) so we must have \(\langle [X, Y], N\rangle = 0\). Therefore,
$$ \langle L(X), Y \rangle = \langle X, L(Y)\rangle $$The shape operator \(L\) may be a bit difficult to comprehend geometrically, but I find that the easiest way to grasp it is by using our connection \(D\) (i.e. the directional derivative). Using the directional derivative, we have that \(L(X)\) simplifies to \(L(X) = -D_XN\) — thus, our shape operator would measure how much our unit normal vector \(N_p\) changes along some unit vector field \(X\).
Ultimately, we want to measure this change in our normal vector via some scalar value. Thus, using our Riemannian metric \(g\), we are able to measure how much the change in our normal vector along \(X\) aligns with our original unit vector field \(X\) via
$$ \langle L(X), X \rangle $$We will refer to this map as the normal curvature, denoted \(\kappa : \mathfrak{X}(M) \to \mathbb{R}\). Now the set of all unit tangent vectors at a point \(p \in M\) (where \(M\) is \(n\)-dimensional) forms a sphere \(S^{n-1}\), which is compact; the extreme value theorem then tells us that \(\kappa(X_p)\) must attain a minimum and a maximum — for a surface (where \(n = 2\)) we refer to these values as \(\kappa_1\) and \(\kappa_2\), the principal curvatures. Using these values, we may further define the Gaussian curvature \(K\) to be the value
$$ K = \kappa_1\kappa_2 $$Now the Gaussian curvature indicates an extraordinary amount regarding the geometry of a shape, allowing us to easily classify spaces based on their sign. Assuming we are at some point \(p \in M\): if \(\kappa_1 \lt 0 \lt \kappa_2\) then \(K \lt 0\), and the surface is said to have a saddle point at \(p\). However, when \(\kappa_1\) and \(\kappa_2\) are both nonzero with the same sign, we must have \(K \gt 0 \), so that \(p\) is an elliptic point and the surface locally looks like a dome (think of a point on a sphere). Lastly, when either \(\kappa_1 = 0\) or \(\kappa_2 = 0\), we have \(K = 0\), so that our shape locally looks like a flat plane that has been bent without stretching (think of a cylinder).
Let's consider an example: the sphere \(S^2 \subset \mathbb{R}^3\) of radius \(r \gt 0\). The first thing to do is to identify our unit normal field \(N\) on \(S^2\) — however, this is actually much easier than you might think, since at each point the outward normal vector points in the same direction as the position vector. In other words, every line we draw through the origin is perpendicular to the surface of our sphere where it meets it. Therefore,
$$ N = \frac{x}{r} \frac{\partial}{\partial x} + \frac{y}{r} \frac{\partial}{\partial y} + \frac{z}{r} \frac{\partial}{\partial z} $$If we let \(X = f(x, y, z) \frac{\partial}{\partial x} + g(x, y, z) \frac{\partial}{\partial y} + h(x,y,z) \frac{\partial}{\partial z}\), then a simple calculation gives us
$$ \begin{align} L(X) &= -X(\frac{x}{r}) \frac{\partial}{\partial x} - X(\frac{y}{r}) \frac{\partial}{\partial y} - X(\frac{z}{r}) \frac{\partial}{\partial z} \\&= -\frac{f(x,y,z)}{r}\frac{\partial}{\partial x} - \frac{g(x, y, z)}{r} \frac{\partial}{\partial y} - \frac{h(x, y, z)}{r} \frac{\partial}{\partial z} \\&= \frac{-1}{r}X \end{align} $$If we choose \(X\) to be any unit vector, then the map \(\kappa(X) = \langle L(X), X \rangle = \frac{-1}{r}\langle X, X \rangle = \frac{-1}{r}\) is clearly constant. Thus \(\kappa_1 = \kappa_2 = \frac{-1}{r}\), and the Gaussian curvature of the sphere is \(K = \frac{1}{r^2}\).
The reader should note that if \(e_1, e_2\) is an orthonormal basis for the tangent plane \(T_pM\) at some point \(p\), then we can express
$$ L(e_1) = ae_1 + be_2 \hspace{5em} L(e_2) = be_1 + ce_2 \hspace{2em} (\text{note we use } b \text{ twice by self-adjointness}) $$where
$$ a = \langle L(e_1), e_1 \rangle, \hspace{2em} b = \langle L(e_1), e_2 \rangle = \langle e_1, L(e_2) \rangle, \hspace{2em} c = \langle L(e_2), e_2 \rangle $$However, since our curvature is defined by \(\kappa(X) = \langle L(X), X\rangle\), we have that the principal curvatures \(\kappa_1, \kappa_2\) are, in fact, the eigenvalues of the shape operator \(L\). Thus, our Gaussian curvature \(K\) is precisely
$$ K = \kappa_1\kappa_2 = \text{det}\,L = ac - b^2 = \langle L(e_1), e_1 \rangle \langle L(e_2), e_2 \rangle - \langle L(e_1), e_2 \rangle \langle e_1, L(e_2) \rangle $$At long last, we have come to the pinnacle of this blog post: Gauss' Theorema Egregium. Now to be quite honest, I haven't talked much about the category of Riemannian manifolds \(\textrm{Rm}\) — in other words, I haven't really talked much about how to tell when two Riemannian manifolds are basically the same object. For example, when we look locally at a horizontal cylinder of infinite length, it simply looks like a plane that's been rolled up. If that hand-waving didn't convince you, we could simply consider \(M = \{ (x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 = 1\}\) and look at the local isometry \(\phi : \mathbb{R}^2 \to M\) given by \(\phi(x, y) = (\cos x, \sin x, y)\). If we look at the tangent plane \(T_p\mathbb{R}^2 = \text{span}(e_1, e_2)\), then
$$ d\phi(e_1) = \frac{\partial \phi}{\partial x} = (-\sin x, \cos x, 0) \\ d\phi(e_2) = \frac{\partial \phi}{\partial y} = (0, 0, 1) $$which tells us that \(\langle d\phi(X), d\phi(Y) \rangle = \langle X, Y \rangle \) for all \(X, Y \in T_p\mathbb{R}^2\).
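As a sanity check, we can also compute the cylinder's Gaussian curvature directly with our shape operator: on \(M\) the outward unit normal is \(N = x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y}\), and a computation just like the one for the sphere gives \(L(X) = -X(x)\frac{\partial}{\partial x} - X(y)\frac{\partial}{\partial y}\). The tangent direction \(-y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}\) around the cylinder is an eigenvector with eigenvalue \(-1\), while the vertical direction \(\frac{\partial}{\partial z}\) is killed outright, so \(\kappa_1 = -1\), \(\kappa_2 = 0\), and \(K = \kappa_1\kappa_2 = 0\): exactly the Gaussian curvature of the plane.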
We use this example to motivate what is known as an isometry between Riemannian manifolds — in particular, we define an isometry \(\phi : (M, g) \to (\widetilde{M}, \widetilde{g})\) to be a diffeomorphism which satisfies
$$ g(X, Y) = \widetilde{g}(\phi_*X, \phi_*Y) \hspace{4em} \forall X, Y \in \mathfrak{X}(M) $$This was the last piece of machinery needed to state Gauss' Theorema Egregium:
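Theorem (Gauss' Theorema Egregium). The Gaussian curvature of a surface is invariant under isometry; that is, if \(\phi : (M, g) \to (\widetilde{M}, \widetilde{g})\) is an isometry of surfaces, then \(\widetilde{K}(\phi(p)) = K(p)\) for all \(p \in M\). In other words, the Gaussian curvature is an intrinsic quantity: it can be computed from the metric \(g\) alone, without reference to any embedding in \(\mathbb{R}^3\).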
The proof of this theorem is a rather mechanical one, and for good reason; Gauss' Theorema Egregium is an incredibly important theorem that describes the inherent nature of an object's curvature. Indeed, even if we cut a slit in the top of a ball, there is no way we can force the ball flat onto a plane without tearing or stretching it in some way. As many sources will tell you, Gauss' Theorema Egregium is the primary reason we cannot have a flat map of the Earth without at least distorting some part of the globe.
A final closing example I'll give before I go eat dinner was proposed to me by my master's advisor during my first differential geometry course: consider how you hold a piece of pizza. If you simply support the middle, the front end of the pizza is bound to droop down (on account of gravity). However, if you fold the pizza like a taco (so that the crust forms a parabola-like shape), then the front end of the pizza will not droop down! This is not a physical phenomenon — this is the Theorema Egregium. A flat slice of pizza has zero Gaussian curvature, and bending it without stretching is a local isometry; thus the pizza cannot curve in both a positive and a negative direction at once (positive along the taco fold, negative along the frontal droop), as that would change its Gaussian curvature.
That said, thank you guys for reading and I hope you enjoyed! 😁