30 December 2017

Rotations in spacetime

...or, How to Guess Special Relativity from Geometry.

Non-relativistic physics takes place on a background of 3-dimensional Euclidean space. Time is incorporated into this picture as a global parameter used to describe motion in this space. We can write things in terms of a 4-dimensional "space-time", but this is something of a triviality as space and time have very little to do with each other. If space (and its contents) at any given instant could be described as a flat slice, then our "space-time" would be like a stack of slices with time increasing vertically, but this picture is not really any different to looking at each slice individually (in the correct order). This artificiality is reflected in the Galilean transformations, which describe how inertial (non-accelerating) observers interpret space and time coordinates - in 3 spatial dimensions, these are given by $(t',\mathbf{x}')=(t,\mathbf{x}-\mathbf{v}t)$, with $t',t\in\mathbb{R}$, $\mathbf{x}',\mathbf{x},\mathbf{v}\in\mathbb{R}^3$ for observers with a relative velocity of $\mathbf{v}$. Note that the time coordinate is unaffected by the motion, leading to a sort of asymmetry between space and time.

The Galilean transformations are important from a physical perspective because they relate to how observers (for example, experimental physicists) perceive the universe. However, there are important "pre-physical" geometric transformations of interest as well. The ones of primary interest are the translation in time, the translations in space and the rotations. Translations are given by $(t',\mathbf{x}')=(t+s,\mathbf{x}+\mathbf{y})$, where $s\in\mathbb{R}$ and $\mathbf{y}\in\mathbb{R}^3$ are interpreted as shifts in time and space respectively. We won't focus on these as they're basically the same in the relativistic case (and are arguably harder to deal with). Rotations, however, take the apparently even simpler form $(t',\mathbf{x}')=(t,\mathbf{Rx})$ for $\mathbf{R}\in\mathrm{SO}(3)$, but this is really something of a sleight of hand of notation. To see why, let's investigate rotations a little more deeply.

Rotations can be thought of as (linear) transformations which preserve lengths and angles, so in order to talk about rotations it's necessary to impose some way to measure those things. The way we will do that is using a metric, which is a bilinear form: "bilinear" because it takes two inputs and is linear in both, and "form" because the objects it takes are vectors and what it returns is a scalar. As one can tell from this description, when acting on pairs of vectors the metric looks much like an inner product, and in fact for the case of $\mathbb{R}^n$ the Euclidean metric (which we will henceforth use) defines the standard inner product in this way. Calling our metric $\delta$, for vectors $\mathbf{u}$ and $\mathbf{v}$ we have $\delta(\mathbf{u},\mathbf{v})=\mathbf{u}\cdot\mathbf{v}\equiv\langle\mathbf{u},\mathbf{v}\rangle\equiv\mathbf{u}^\mathrm{T}\mathbf{v}\in\mathbb{R}$, to give an overview of common notations.

First, we note that the norm induced by the inner product/metric is given in the standard way by $\lVert\mathbf{u}\rVert=\sqrt{\delta(\mathbf{u},\mathbf{u})}$. If a transformation (/operator/matrix), denoted by $\mathbf{M}$, preserves the length of $\mathbf{u}$ then $\lVert\mathbf{Mu}\rVert=\lVert\mathbf{u}\rVert$. If $\mathbf{M}$ is length-preserving, then the requirement for it to be angle-preserving is expressed as $\delta(\mathbf{Mu},\mathbf{Mv})=\delta(\mathbf{u},\mathbf{v})$ (note that length-preservation is just the special case $\mathbf{u}=\mathbf{v}$ of this condition). Using transpose/matrix notation, this can be written as $(\mathbf{Mu})^\mathrm{T}(\mathbf{Mv})=\mathbf{u}^\mathrm{T}\mathbf{M}^\mathrm{T}\mathbf{Mv}=\mathbf{u}^\mathrm{T}\mathbf{v}$, which, since it must hold for all $\mathbf{u}$ and $\mathbf{v}$, implies that $\mathbf{M}^\mathrm{T}\mathbf{M}=\mathbf{I}$, or in other words, $\mathbf{M}^\mathrm{T}=\mathbf{M}^{-1}$. Finally, we will also impose that the determinant of $\mathbf{M}$ is equal to $+1$: the condition $\mathbf{M}^\mathrm{T}\mathbf{M}=\mathbf{I}$ already demands $\det(\mathbf{M})^2=1$, i.e. $\det(\mathbf{M})=\pm1$ (since $\det(\mathbf{M}^\mathrm{T})=\det(\mathbf{M})$), but the $-1$ case corresponds to transformations involving a reflection, which we are not interested in here. As a side-note, as a matrix, the identity $\mathbf{I}$ is exactly equivalent to the metric in Cartesian coordinates $\mathbf{\delta}$, so we could equally write this condition in the somewhat unusual form $\mathbf{M}^\mathrm{T}\mathbf{\delta M}=\mathbf{\delta}$ (although one could argue the identity matrix was secretly there all along in the definition of the inner product $\delta(\mathbf{u},\mathbf{v})=\mathbf{u}^\mathrm{T}\delta\mathbf{v}$). In any case, all of the above is what it means for $\mathbf{M}$ to be a rotation, also known as a "special orthogonal transformation", hence the name of the group $\mathrm{SO}$. For a rotation in 3 spatial dimensions, we write this as $\mathbf{M}\in\mathrm{SO}(3)$.
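As a quick numerical sanity check of these conditions, here is a minimal sketch in plain Python (standard library only; the helper names `matmul`, `transpose`, `det3` and `rotation_z`, the angle and the test vectors are illustrative choices of mine, not anything from a particular library). It builds a rotation about the $z$-axis and checks $\mathbf{M}^\mathrm{T}\mathbf{M}=\mathbf{I}$, $\det(\mathbf{M})=+1$ and $\delta(\mathbf{Mu},\mathbf{Mv})=\delta(\mathbf{u},\mathbf{v})$, and contrasts it with a reflection, which is orthogonal but has determinant $-1$.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def dot(u, v):
    """The Euclidean metric delta(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))


def apply(M, u):
    return [dot(row, u) for row in M]


def rotation_z(theta):
    """Rotation by angle theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


M = rotation_z(0.7)
MtM = matmul(transpose(M), M)
is_orthogonal = all(math.isclose(MtM[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
                    for i in range(3) for j in range(3))
u, v = [1.0, 2.0, 3.0], [-0.5, 4.0, 1.5]
preserves_metric = math.isclose(dot(apply(M, u), apply(M, v)), dot(u, v))
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # orthogonal, but det = -1

print(is_orthogonal, preserves_metric)        # True True
print(round(det3(M), 12), det3(reflection))   # 1.0 -1.0
```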

Now, what happens when instead of 3-dimensional Euclidean space, we instead examine 4-dimensional Minkowski space(time)? A priori there is little reason to do this in particular, outside of a certain limited sense of curiosity, but let us continue regardless. In this case, rather than position vectors $\mathbf{x}$, we have position 4-vectors $x$, which we can coordinatise by $x=(x^0,\mathbf{x})$ where $x^0$ is a time coordinate baked into the space, and the spatial coordinates are as in the 3-dimensional Euclidean case. This is like having, rather than a collection of space-slices stacked time-wise, a single block of spacetime that we can slice horizontally, or diagonally, or at whatever angle we like (up to 45°), picking our space and time directions as appropriate. Of course, this picture isn't yet completely justified, but it can be with the formalism we will have developed by the end of this post. In any case, we must note that what distinguishes any viable choice of $x^0$ from $x^i$, $i=1,2,3$ is that, in matrix form, our metric takes the form $\eta=\mathrm{diag}(-1,1,1,1)$ - this is critical, as the fact that the first entry (the "00" component) is negative is what makes the time coordinate different to the spatial ("$ii$") ones (recall for comparison that $\mathbf{I}_3=\delta=\mathrm{diag}(1,1,1)$).

The fact that our metric is different immediately suggests that we are free (or perhaps required) to modify our idea of a rotation, which we defined using the Euclidean 3-metric. What is the Minkowski equivalent? Let's consider first the angle-preservation requirement. Now we have $\eta(Mu,Mv)=\eta(u,v)$, but this time the matrix form of the inner product is given by $\eta(u,v)\equiv u^\mathrm{T}\eta v$, so we have $u^\mathrm{T}M^\mathrm{T}\eta Mv=u^\mathrm{T}\eta v\implies M^\mathrm{T}\eta M=\eta$. This is encouraging, because it's exactly the same as the requirement for angle-preservation in the 3-dimensional case but with the Euclidean metric swapped for the Minkowski metric (and the $3\times3$ rotation matrices swapped out for these mysterious $4\times4$ "rotation" matrices). As before, we will demand that $\mathrm{det}(M)=+1$, and having done so we can write $M\in\mathrm{SO}(1,3)$, the set of "indefinite special orthogonal matrices (of signature (1,3))". A bit more of a mouthful this time around!

Having names and labels is convenient, but it doesn't give us much of a feel for what we've found. To do that, let's simplify, and consider rather than the case of 1 time dimension and 3 space dimensions (a (1,3) signature), the case of 1 time dimension and 1 space dimension (a (1,1) signature). This might seem odd at first, given you can't exactly rotate in 1 spatial dimension, but it will pay off! Let's consider an arbitrary $2\times2$ matrix $A=\left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right)$. The angle-preservation requirement then means
\begin{equation*}
A^\mathrm{T}\eta A=\begin{pmatrix}a&c\\b&d\end{pmatrix}\begin{pmatrix}-1&0\\0&1\end{pmatrix}\begin{pmatrix}a&b\\c&d\end{pmatrix}=\begin{pmatrix}c^2-a^2&cd-ab\\cd-ab&d^2-b^2\end{pmatrix}=\eta=\begin{pmatrix}-1&0\\0&1\end{pmatrix}.
\end{equation*}
This gives us 3 equations:

  1. $c^2-a^2=-1$ or equivalently $a^2-c^2=1$,
  2. $d^2-b^2=1$,
  3. $cd-ab=0$, or equivalently $ab=cd$.
Three equations for four variables means we must express the solution in terms of a parameter, which we will call $\phi$. Guided by the identity $\cosh^2(\phi)-\sinh^2(\phi)=1$, which has exactly the form of equations 1 and 2, we find 2 options (up to signs) which solve the above equations equally well: $a=d=\cosh(\phi)$ and $b=c=\sinh(\phi)$, or $a=-d=\cosh(\phi)$ and $b=-c=\sinh(\phi)$. We can distinguish between them by demanding that $\det(A)=+1$, which is satisfied by the former option (the latter has $\det(A)=-1$). So, now we've found the explicit form for matrices in $\mathrm{SO}(1,1)$... what can we make of them?
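Both candidate solutions can be checked numerically. The sketch below (plain Python; the test value $\phi=0.8$ and the helper names are arbitrary choices for illustration) verifies that each option satisfies $A^\mathrm{T}\eta A=\eta$ and that only the first has determinant $+1$.

```python
import math

ETA = [[-1.0, 0.0], [0.0, 1.0]]  # 1+1-dimensional Minkowski metric, diag(-1, 1)


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def preserves_eta(A, tol=1e-12):
    """Check the angle-preservation condition A^T eta A = eta entry by entry."""
    lhs = matmul(matmul(transpose(A), ETA), A)
    return all(math.isclose(lhs[i][j], ETA[i][j], abs_tol=tol) for i in range(2) for j in range(2))


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


phi = 0.8  # arbitrary test value of the parameter
ch, sh = math.cosh(phi), math.sinh(phi)
option_1 = [[ch, sh], [sh, ch]]     # a = d = cosh(phi), b = c = sinh(phi)
option_2 = [[ch, sh], [-sh, -ch]]   # a = -d = cosh(phi), b = -c = sinh(phi)

for name, A in [("option 1", option_1), ("option 2", option_2)]:
    print(name, preserves_eta(A), round(det2(A), 12))
# option 1 True 1.0
# option 2 True -1.0
```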

Figure 1: On the left, two circular sectors, each of equal Euclidean area subtending an equal circular angle $\theta_0$. Each can be considered as the other one but (circularly) rotated (red arrows).
On the right, three hyperbolic sectors, each of equal Euclidean area subtending an equal hyperbolic angle $\phi_0$. Each can be considered the same as the others but (hyperbolically) rotated (red arrows). Note that the hyperbolic sector subtends varying circular angles as it's hyperbolically rotated through, but the hyperbolic angle subtended by the sector remains constant. Be careful that this does not lead to confusion.

As it happens, the matrices $\left(\begin{smallmatrix}\cosh(\phi)&\sinh(\phi)\\ \sinh(\phi)&\cosh(\phi)\end{smallmatrix}\right)$ describe what are called hyperbolic rotations. Just as a circle can be parametrised by the (circular) angle $\theta\in[0,2\pi)$ and (circular) rotations leave circles invariant, rectangular hyperbolas can be parametrised by the hyperbolic angle $\phi\in(-\infty,+\infty)$ and are left invariant by hyperbolic rotations (Fig. 1). These hyperbolic rotations are sometimes called squeeze mappings, because they preserve (Euclidean) areas while squeezing shapes in one direction and stretching them in the perpendicular one, without rotating or shearing them (Fig. 2). In fact, analogous to the circular case, the parametrisation of a hyperbola by $\phi$ is tied to sector areas: on the unit hyperbola, $\phi$ is twice the Euclidean area of the hyperbolic sector it subtends, just as $\theta$ is twice the area of the corresponding sector of the unit circle, although the details are not important for our purposes.

Figure 2: The reason hyperbolic rotations are sometimes called "squeeze mappings" is that a shape which is hyperbolically rotated becomes 'squeezed' taller and thinner or wider and flatter, depending on the sign of the rotation. One way to think of this is by taking a shape by its bounding box, holding one corner in place and moving its opposite corner along a hyperbola. In any case, note that the Euclidean area is constant throughout the hyperbolic rotation!
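Both properties (rectangular hyperbolas mapped to themselves, Euclidean areas preserved) are easy to check numerically. The sketch below (plain Python; the square, the hyperbolic angle $0.9$ and the helper names are arbitrary illustrative choices) applies a hyperbolic rotation to the corners of a unit square, compares Euclidean areas using the shoelace formula, and confirms that a point on the hyperbola $(x^0)^2-(x^1)^2=1$ stays on it.

```python
import math


def hyperbolic_rotation(phi):
    """Return a function applying [[cosh, sinh], [sinh, cosh]] to a point (x0, x1)."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return lambda p: (ch * p[0] + sh * p[1], sh * p[0] + ch * p[1])


def shoelace_area(points):
    """Euclidean area of a simple polygon whose vertices are listed in order."""
    n = len(points)
    twice_area = sum(points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
                     for i in range(n))
    return abs(twice_area) / 2


def eta_norm_sq(p):
    """eta(p, p) = -(x0)^2 + (x1)^2 for eta = diag(-1, 1)."""
    return -p[0] ** 2 + p[1] ** 2


rotate = hyperbolic_rotation(0.9)
square = [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)]
squeezed = [rotate(p) for p in square]
print(shoelace_area(square), round(shoelace_area(squeezed), 12))  # 1.0 1.0

p = (math.cosh(0.3), math.sinh(0.3))  # a point on the unit hyperbola
print(round(eta_norm_sq(p), 12), round(eta_norm_sq(rotate(p)), 12))  # -1.0 -1.0
```

The area is preserved because the matrix has determinant $\cosh^2(\phi)-\sinh^2(\phi)=1$, while the squeezed square itself ends up stretched along one diagonal and compressed along the other.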
What we are more interested in are the physical interpretations of this hyperbolic rotation. In order to investigate, let's consider the action of $A$ on a position vector $(x^0,x^1)$ in Minkowski space, noting that our Cartesian coordinates are given by $x^0=ct,x^1=x$ in dimensionful units ($c$ is a constant with dimensions of [length/time] (the same as speed) such that $ct$ has units of length to match $x$):
\begin{equation*}
\begin{pmatrix}ct'\\x'\end{pmatrix}=\begin{pmatrix}\cosh(\phi)&\sinh(\phi)\\ \sinh(\phi)&\cosh(\phi)\end{pmatrix}\begin{pmatrix}ct\\x\end{pmatrix}=\begin{pmatrix}\cosh(\phi)ct+\sinh(\phi)x\\\sinh(\phi)ct+\cosh(\phi)x\end{pmatrix}.
\end{equation*}
The meaning of this is not immediately obvious, so let's consider a special case of interest: the world-line of an observer at rest within their own frame, i.e., the line given by $x=0$. This line, when transformed, clearly takes the form $(ct',x')=(\cosh(\phi)ct,\sinh(\phi)ct)$, or written differently, the pair of equations $t'=\cosh(\phi)t$, $x'=\sinh(\phi)ct$. We can divide one equation by the other to give $x'/t'=\tanh(\phi)c$ (noting that, as in the ordinary trigonometric case, $\sinh(\phi)/\cosh(\phi)=\tanh(\phi)$). This is significant because it strongly suggests that $\tanh(\phi)c$ is a speed, in particular the relative speed of the two frames: the unprimed observer's worldline moves at $\tanh(\phi)c$ as measured in the $(ct',x')$ frame of reference (Fig. 3). This is borne out by the standard definition of speed in one dimension, $v\equiv x/t$, which in this case would mean that $\tanh(\phi)=v/c$, which agrees with the fact that $\tanh(\phi)$ ranges strictly between $-1$ and $+1$ as $\phi$ ranges from $-\infty$ to $+\infty$. One should note that if this is correct, then it implies that the speed of an inertial (straight) worldline with respect to a stationary observer ($v$) can never exceed $c$ (because $\lvert\tanh(\phi)\rvert<1$ for all real $\phi$)! Furthermore, successive hyperbolic rotations compose by adding their hyperbolic angles (by the addition formulas for $\cosh$ and $\sinh$, multiplying the matrices for $\alpha$ and $\beta$ gives the matrix for $\alpha+\beta$), and we note that $\tanh(\alpha+\beta)=\frac{\tanh(\alpha)+\tanh(\beta)}{1+\tanh(\alpha)\tanh(\beta)}$, which, with $v=c\tanh(\alpha)$, $w=c\tanh(\beta)$ and $u=c\tanh(\alpha+\beta)$, immediately yields the otherwise peculiar-looking relativistic velocity addition formula $u=\frac{v+w}{1+vw/c^2}$ (contrast with the non-relativistic $u=v+w$).
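The composition rule and the velocity addition formula can be checked numerically. This is a minimal sketch in plain Python, working in units where $c=1$ (an assumption made only to keep the numbers simple); the test rapidities and helper names are arbitrary.

```python
import math


def hyperbolic_rotation(phi):
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, sh], [sh, ch]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def add_velocities(v, w, c=1.0):
    """Relativistic velocity addition u = (v + w) / (1 + v w / c^2)."""
    return (v + w) / (1 + v * w / c ** 2)


alpha, beta = 0.5, 1.1
composed = matmul(hyperbolic_rotation(alpha), hyperbolic_rotation(beta))
direct = hyperbolic_rotation(alpha + beta)
rapidities_add = all(math.isclose(composed[i][j], direct[i][j]) for i in range(2) for j in range(2))

# With c = 1, a rapidity phi corresponds to the speed v = tanh(phi)
u_from_rapidity = math.tanh(alpha + beta)
u_from_formula = add_velocities(math.tanh(alpha), math.tanh(beta))
print(rapidities_add, math.isclose(u_from_rapidity, u_from_formula))  # True True
print(round(add_velocities(0.9, 0.9), 4))  # 0.9945, still below c = 1
```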

Figure 3: Three reference frames, $(ct,x)$ (dark colours), $(ct',x')$ (medium colours) and $(ct'',x'')$ (light colours) on a 1+1-dimensional spacetime diagram. The frames are related by Lorentz transformations/hyperbolic rotations, and correspond to three observers with differing velocities who all pass each other at a point which is the origin of this diagram. Note that for each frame, the time and space coordinate directions are orthogonal with respect to the Minkowski metric (though clearly not with respect to the Euclidean metric - circular angles are not hyperbolic angles!).
Their worldlines correspond to the paths $x=0$, $x'=0$ and $x''=0$ - each considers themselves to be stationary, though clearly their (red) worldlines depict differing velocities. In fact, on this diagram, the dark observer would consider the other two frames to be Lorentz transformed by hyperbolic angles/rapidities of $\phi'\simeq0.28$ and $\phi''\simeq0.60$. These correspond to velocities of approximately $0.27c$ and $0.54c$ respectively. Note that twice the velocity does not give twice the rapidity, as per the velocity addition formula! It should also not be overlooked that in the dark colour frame, the other time coordinates are quite literally hyperbolically rotated into the spatial direction and the other space coordinates are equally hyperbolically rotated into the temporal direction!
Some additional information about this spacetime diagram can be found in the footnotes.
Before going further, it is worth pausing to appreciate just how remarkable this is. From purely geometrical concerns and the coordinate identification $(x^0,x^1)=(ct,x)$ with dimensional arguments, we have managed to link rotation (albeit the strange hyperbolic version of it) with something as concrete as velocity. Stranger still, the (hyperbolic) rotation we're talking about is the rotation of a spatial coordinate into the time direction, and vice versa! Not just that, we seem to have stumbled on a universal speed limit, entirely by accident. That's pretty cool.

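In this context, the hyperbolic angle $\phi=\tanh^{-1}(v/c)$ is referred to as the rapidity. It's natural to ask next what our hyperbolic rotation matrix looks like in terms of velocity rather than rapidity. Dividing the identity $\cosh^2(\phi)-\sinh^2(\phi)=1$ through by $\cosh^2(\phi)$ gives $\cosh(\phi)=1/\sqrt{1-\tanh^2(\phi)}$ (taking the positive root, since $\cosh(\phi)>0$), and $\sinh(\phi)=\tanh(\phi)\cosh(\phi)$, so substituting $\tanh(\phi)=v/c$ we find $\cosh(\phi)=\frac{1}{\sqrt{1-(v/c)^2}}$ and $\sinh(\phi)=\frac{v}{c}\frac{1}{\sqrt{1-(v/c)^2}}$. If we define the "Lorentz gamma" $\gamma=\frac{1}{\sqrt{1-(v/c)^2}}$, then we can write our hyperbolic rotation matrix as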
\begin{equation*}
\Lambda=\begin{pmatrix}\gamma&\gamma v/c\\ \gamma v/c&\gamma\end{pmatrix}.
\end{equation*}
This transformation is known as a Lorentz transformation (hence its renaming $\Lambda$). The Lorentz transformations are the main transformations under which the equations of electromagnetism are invariant, and using this fact one can determine that the speed $c$ in fact corresponds to the speed of light in a vacuum, although that exercise is beyond the scope of this post. In terms of coordinates, $\Lambda$ as written gives $t'=\gamma(t+vx/c^2)$ and $x'=\gamma(x+vt)$, which describes a primed frame moving with velocity $-v$ relative to the unprimed one (consistent with the unprimed observer moving at $+v$ as measured in the primed frame). Flipping the sign of $v$ (equivalently, $\phi\to-\phi$) gives the more familiar form $t'=\gamma(t-vx/c^2)$ and $x'=\gamma(x-vt)$ for a frame moving with velocity $+v$. It is worthwhile to consider these equations in 'natural units' where $c=1$ and compare their forms to the Galilean transformations we shoehorned in at the start of the post. Whereas the Galilean transformations were awkwardly asymmetric between space and time, the Lorentz transformations are beautifully symmetric, reflecting the fact that they were not inserted by hand after the fact but emerged as a natural property of the geometry of Minkowski spacetime. And, in fact, this 1+1-dimensional case extends naturally to the 1+3-dimensional one; rather than $\Lambda\in\mathrm{SO}(1,1)$ we instead consider $\Lambda\in\mathrm{SO}(1,3)$, with all the requisite changes in machinery to go along with the change in spatial dimension. Note that $\mathrm{SO}(1,3)$ (thankfully) contains as a subgroup $\mathrm{SO}(3)$, the group of ordinary rotations in 3 dimensions, just as we expect it should if Minkowski spacetime is to have any meaningful correspondence to our experienced reality! This should give us even more confidence that $\mathrm{SO}(1,3)$ correctly generalises the characterising space of rotations $\mathrm{SO}(3)$ on non-relativistic space to its equivalent on relativistic spacetime.
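To make these relationships concrete, here is a small sketch in plain Python. The value $c=3\times10^8$ m/s, the speed $v=0.8c$, the example event and the helper names are my own illustrative choices. It checks that $\cosh(\phi)=\gamma$ and $\sinh(\phi)=\gamma v/c$ for $\phi=\tanh^{-1}(v/c)$, that the coordinate formulas $t'=\gamma(t-vx/c^2)$, $x'=\gamma(x-vt)$ preserve $-(ct)^2+x^2$, and that these formulas agree with the matrix $\Lambda$ evaluated at $-v$ (that is, at $\phi\to-\phi$).

```python
import math


def gamma(v, c):
    """Lorentz factor 1 / sqrt(1 - (v/c)^2); requires |v| < c."""
    if abs(v) >= c:
        raise ValueError("speed must satisfy |v| < c")
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def lorentz_matrix(v, c):
    """The matrix [[gamma, gamma v/c], [gamma v/c, gamma]] acting on (ct, x)."""
    g = gamma(v, c)
    return [[g, g * v / c], [g * v / c, g]]


def lorentz_coords(t, x, v, c):
    """Coordinate form t' = gamma (t - v x / c^2), x' = gamma (x - v t)."""
    g = gamma(v, c)
    return g * (t - v * x / c ** 2), g * (x - v * t)


def interval(t, x, c):
    """-(ct)^2 + x^2, the Minkowski 'squared length' of (ct, x)."""
    return -(c * t) ** 2 + x ** 2


c = 3.0e8
v = 0.8 * c
phi = math.atanh(v / c)  # rapidity
print(math.isclose(math.cosh(phi), gamma(v, c)),
      math.isclose(math.sinh(phi), gamma(v, c) * v / c))      # True True

t, x = 2.0e-6, 150.0  # an example event (seconds, metres)
t_new, x_new = lorentz_coords(t, x, v, c)
print(math.isclose(interval(t, x, c), interval(t_new, x_new, c)))  # True

boost = lorentz_matrix(-v, c)  # the matrix form with v -> -v
ct_m = boost[0][0] * c * t + boost[0][1] * x
x_m = boost[1][0] * c * t + boost[1][1] * x
print(math.isclose(ct_m / c, t_new), math.isclose(x_m, x_new))  # True True
```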

Now, all of the above is a long, long way from a complete characterisation of the spaces $\mathrm{SO}(p,q)$ and $\mathrm{SO}(n)$ (even in the dimensions we looked at) and their relations to Minkowski and Euclidean space(time)s. It's also a long way from describing the full richness of the theory of relativity and its many important consequences and subtleties. In fact, all of the information I've touched on above is really just the first scratch of the surface a student might get in both of these areas; it just happens that they turn out to be connected in a way which most students won't find out about until much later, and I hope I've managed to spark some interest by bringing that connection to light perhaps a little earlier than might otherwise happen.

Footnotes

The diagonal dotted line corresponds to the speed of light and is at $\phi=+\infty$ according to all observers, and appears exactly diagonal in all observers' spacetime diagrams. This reflects the invariance of the speed of light with respect to all reference frames.

The points where the hyperbolas intersect the coordinate lines denote equal invariant distances from the origin in the time and space directions in each frame. As a side note, the diagram gives an intuitive explanation for length contraction - the distances from the origin to $x_0$, and the origin to $x''_0$ have the same invariant length (they have the same length in their respective rest frames) but $x''_0$ is measured to be shorter than $x_0$ by the dark colour observer and $x_0$ is measured to be shorter than $x_0''$ by the light colour observer. A similar construction can be made for time dilation.

10 March 2015

News (2015/03/10)

This news post will be a short one as there isn't actually all that much news to report. One thing I will say is that in terms of the blog I've been working on an exciting new project, a series of posts on a topic which has interested me for a number of years now, exterior algebra and exterior calculus. It will be a slightly different format to the usual, with shorter posts in greater numbers. This has been much easier for me to write and I hope it will be much easier for you to read. I haven't yet decided on how to release them, but I am thinking of perhaps on a weekly or twice-weekly basis once they are all complete. As such the next post will likely not be for a little while, but after it comes there will be a steady rate of updates for a decent period of time.

The main reason for this news post is that on the 8th of March it was International Women's Day. I know I'm a couple of days late, but if anything, previous news posts should show that that is par for the course when it comes to these things. I am personally a very strong believer in improving the participation and representation of women in science and engineering, which is generally dismal, to say the least (and very well-documented if you are interested in reading more about it, which I strongly suggest). As such I was heartened to see a lot of articles posted on or around International Women's Day that were about great women in science (and especially physics, astronomy and mathematics). I thought I'd share some of them here, along with a couple of others I've seen in the past. I hope that you will learn something about someone you may not have heard of and gain an appreciation for the incredible work that they've done, and perhaps even find yourself a role model. Remember, systemic discouragement of women towards science only turns great minds elsewhere -- these women (and many others) are evidence enough that women can and do excel in science!

Marie Curie (xkcd)
Pioneering Women of Physics (Perimeter Institute)
You probably haven’t heard of these five amazing women scientists – so pay attention (The Conversation)
Chien-Shiung Wu: The First Lady of Physics (From Quarks to Quasars)
Of symmetries, the strong force and Helen Quinn (Symmetry Magazine)
How Ada Lovelace, Lord Byron’s Daughter, Became the World’s First Computer Programmer (Brain Pickings)
Mildred "Millie" Dresselhaus (Physics Central)

23 February 2015

Fourier fun 3: You can trip on my synthesiser

This post is a sequel to my earlier post Fourier Fun 2: The full monty, which derives the full expression for the one-dimensional Fourier series. If you have not read that post, you may wish to do so before reading this one. 

Some practical applications

We have already examined the Fourier series for a number of periodic functions such as the square wave and the triangle wave, and for these waves the Fourier series is (more or less) exactly convergent. One might wonder whether the Fourier series could equally be applied to arbitrary functions. The answer is yes, but as we are expressing the function in terms of sines and cosines, which are periodic, the series can only reproduce the function over the single interval of length $T$ on which we compute it (outside that interval it simply repeats periodically). However, series expansions are generally only useful over some limited region, and so this should not be cause for concern. Quite to the contrary, the Fourier series provides a powerful tool for approximating functions in terms of sines and/or cosines (which may render them much more amenable to analysis) by taking the truncated (finite) Fourier series over the interval of interest. This is one of the tools of Fourier analysis, and is one very useful application of the Fourier series.

Taking the Fourier series of a function and looking at its components is often referred to as Fourier decomposition, and lies at the core of Fourier analysis. It is possible, however, to 'go the other way' and form a function by adding sines or cosines. This process is known as additive synthesis, and is often used, for example, in the context of music as the main principle behind electronic synthesisers. To explain, in music we can think of a note as being a sound with a pitch (a single sound frequency). But, for example, a middle C (C4) will sound different on a piano compared to a guitar. This is because every instrument produces a unique timbre (pronounced "tam-ber") that is determined by other additional frequencies mixed in with the fundamental frequency (the lowest frequency present, which determines the note). Before we go on, let's take a slight diversion.

Complex exponential form

As a reminder, our "full" equation for the Fourier series in one dimension is given by
\begin{equation}
f(t)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\Bigg(a_n\cos\left(\frac{2\pi nt}{T}\right)+b_n\sin\left(\frac{2\pi nt}{T}\right)\Bigg)
\end{equation}
where
\begin{align}
a_n&=\frac{2}{T}\int_{t_0}^{t_0+T}\! f(t)\cos\left(\frac{2\pi nt}{T}\right)\,\mathrm{d}t\\
b_n&=\frac{2}{T}\int_{t_0}^{t_0+T} \! f(t)\sin\left(\frac{2\pi nt}{T}\right)\,\mathrm{d}t
\end{align}
are known as the Fourier coefficients.
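To see these integrals in action, here is a minimal numerical sketch in plain Python (not from the original post). It approximates $a_n$ and $b_n$ with a midpoint-rule sum over one period, for a square wave that is $+1$ on the first half of each period and $-1$ on the second; this particular square wave, with $t_0=0$ and $T=1$, is an assumption for illustration. For this wave one expects $a_n=0$ for all $n$, and $b_n=4/(n\pi)$ for odd $n$ and $b_n=0$ for even $n$.

```python
import math


def fourier_coefficients(f, T, n, t0=0.0, samples=20000):
    """Approximate a_n and b_n by the midpoint rule over one period [t0, t0 + T]."""
    dt = T / samples
    a_sum = b_sum = 0.0
    for k in range(samples):
        t = t0 + (k + 0.5) * dt
        a_sum += f(t) * math.cos(2 * math.pi * n * t / T)
        b_sum += f(t) * math.sin(2 * math.pi * n * t / T)
    return 2 / T * a_sum * dt, 2 / T * b_sum * dt


def square_wave(t, T=1.0):
    """+1 on the first half of each period, -1 on the second half."""
    return 1.0 if (t % T) < T / 2 else -1.0


for n in range(1, 6):
    a_n, b_n = fourier_coefficients(square_wave, 1.0, n)
    expected_b = 4 / (n * math.pi) if n % 2 else 0.0
    print(n, f"a_n = {a_n:+.5f}", f"b_n = {b_n:+.5f}", f"expected b_n = {expected_b:+.5f}")
```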

What we want to do now is use Euler's formula, one of the most famous formulas in all of mathematics (and rightly deserving of a post of its own). Euler's formula states that
\begin{equation}
\cos(\theta)+i\sin(\theta)=e^{i\theta}
\end{equation}
where $i$ is the imaginary unit (the number such that $i^2=-1$). For the purposes of this post we will simply take this formula for granted and leave its deeper mysteries for another time. Now, in Note 2 of the previous Fourier Fun post we showed that we can express the Fourier series in terms of complex exponentials thanks to Euler's formula. I will reproduce a condensed version of that demonstration here:

Let us define a set of terms $c_n$ in terms of the known Fourier coefficients $a_n$ and $b_n$ from the Fourier series formula:
\begin{equation*}
c_n=
\begin{cases}
\frac{1}{2}(a_n-ib_n) & \text{for } n > 0 \\
\frac{a_0}{2} & \text{for } n = 0 \\
\frac{1}{2}(a_{-n}+ib_{-n}) & \text{for } n < 0.
\end{cases}
\end{equation*}
Using these new $c_n$ coefficients, let's define an infinite summation
\begin{equation}
f(\theta)=\sum_{n=-\infty}^\infty c_n e^{in\theta}.
\end{equation}
We now break the sum into the three cases for $n$ (positive, zero and negative) and substitute in Euler's formula:
\begin{align*}
f(\theta)&=\sum_{n=-\infty}^{\infty}c_n e^{in\theta}\\
&=\sum_{n=1}^{\infty}c_{n} e^{in\theta}+c_0 e^0+\sum_{n=-\infty}^{-1}c_{n} e^{in\theta}\\
&=\sum_{n=1}^{\infty}c_{+n} e^{+in\theta}+c_0+\sum_{n=1}^{\infty}c_{-n} e^{-in\theta}\\
&=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(\frac{1}{2}(a_n-ib_n)\big(\cos(n\theta)+i\sin(n\theta)\big)\right.\\
&\qquad \left.+\frac{1}{2}(a_n+ib_n)\big(\cos(-n\theta)+i\sin(-n\theta)\big)\right)\\
&=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(\frac{1}{2}(a_n-ib_n)\big(\cos(n\theta)+i\sin(n\theta)\big)\right.\\
&\qquad \left.+\frac{1}{2}(a_n+ib_n)\big(\cos(n\theta)-i\sin(n\theta)\big)\right)\\
&=\frac{a_0}{2}+\sum_{n=1}^{\infty}\big(a_n\cos(n\theta)+b_n\sin(n\theta)\big),
\end{align*}
where in the third line we have made the change of index $n\mapsto -n$ in the last sum (note that this means that in order for $-n$ to be negative-valued, $n$ must now be positive-valued), in the fourth line we have used the definition of $c_{-n}$ for a negative index (which involves $a_n$ and $b_n$ at the positive index $n$), and in going from the fourth to the fifth line we have exploited the facts that $\cos(-\theta)=\cos(\theta)$ and $\sin(-\theta)=-\sin(\theta)$. Setting $\theta=2\pi t/T$, we have thus demonstrated the equivalence of our previously known Fourier series formula with the infinite sum in equation 5.

This explanation is poor in the sense that it is not well motivated (it would be much more satisfying to go the other way, from our known Fourier series formula deriving the complex exponential form) but for our purposes demonstrating the equivalence will be sufficient for the next section. 
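As a numerical cross-check of this equivalence (a sketch, not part of the original derivation), the code below uses a hypothetical test signal $f(t)=1+2\cos(2\pi t/T)+3\sin(4\pi t/T)$ with $T=1$, so that $a_0=2$, $a_1=2$, $b_2=3$ and every other coefficient vanishes. It computes each $c_n$ directly as $\frac{1}{T}\int_0^T f(t)e^{-2\pi int/T}\,\mathrm{d}t$ (a standard formula for the complex coefficients, assumed here rather than derived), compares it with the case-by-case definition in terms of the coefficients at the positive index $|n|$, and then resums $\sum_n c_n e^{in\theta}$ with $\theta=2\pi t/T$.

```python
import cmath
import math

T = 1.0
SAMPLES = 4096  # equally spaced samples per period (ample for this low-order test signal)
TIMES = [k * T / SAMPLES for k in range(SAMPLES)]


def f(t):
    """Hypothetical test signal with a_0 = 2, a_1 = 2, b_2 = 3 and all other coefficients zero."""
    return 1.0 + 2.0 * math.cos(2 * math.pi * t / T) + 3.0 * math.sin(4 * math.pi * t / T)


def real_coefficients(n):
    """a_n and b_n from their defining integrals, approximated by a rectangle-rule sum."""
    a_n = 2 / SAMPLES * sum(f(t) * math.cos(2 * math.pi * n * t / T) for t in TIMES)
    b_n = 2 / SAMPLES * sum(f(t) * math.sin(2 * math.pi * n * t / T) for t in TIMES)
    return a_n, b_n


def complex_coefficient(n):
    """c_n = (1/T) * integral over one period of f(t) exp(-2 pi i n t / T)."""
    return sum(f(t) * cmath.exp(-2j * math.pi * n * t / T) for t in TIMES) / SAMPLES


def c_from_real(n):
    """The case-by-case definition of c_n, using a and b at the positive index |n|."""
    a_m, b_m = real_coefficients(abs(n))
    if n > 0:
        return 0.5 * (a_m - 1j * b_m)
    if n == 0:
        return a_m / 2
    return 0.5 * (a_m + 1j * b_m)


coefficients = {n: complex_coefficient(n) for n in range(-3, 4)}
definitions_agree = all(abs(coefficients[n] - c_from_real(n)) < 1e-9 for n in coefficients)

t = 0.37
theta = 2 * math.pi * t / T
resummed = sum(c_n * cmath.exp(1j * n * theta) for n, c_n in coefficients.items())
print(definitions_agree, abs(resummed - f(t)) < 1e-9)  # True True
```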

Fourier transforms

We will now introduce the (unitary, temporal frequency) Fourier transform:
\begin{equation}
\tilde{f}(\nu)=\int_{-\infty}^\infty\! f(t)e^{-2\pi it\nu}\,\mathrm{d}t
\end{equation}
where if $t$ is time then $\nu$ is frequency and the inverse Fourier transform is given by
\begin{equation}
f(t)=\int_{-\infty}^\infty\! \tilde{f}(\nu)e^{+2\pi it\nu}\,\mathrm{d}\nu.
\end{equation}
The fact that this particular integral transform is named after Fourier should provide a hint that it is related to Fourier analysis, and in fact it is a central part of it. We can demonstrate this relationship by finding the Fourier transform of a function which is expressed in terms of its Fourier series, which is easiest to compute when we use the complex exponential form.$^1$ This will give 
\begin{align}
\tilde{f}(\nu)&=\int_{-\infty}^\infty\! f(t)e^{-2\pi i t \nu}\,\mathrm{d}t\nonumber\\
&=\int_{-\infty}^\infty\!\sum_{n=-\infty}^\infty c_n e^{2\pi int/T}e^{-2\pi i t \nu}\,\mathrm{d}t\nonumber\\
&=\sum_{n=-\infty}^\infty\left(c_n\int_{-\infty}^\infty\! e^{2\pi it(n/T-\nu)}\,\mathrm{d}t\right)\nonumber\\
&=\sum_{n=-\infty}^\infty c_n\delta\left(\frac{n}{T}-\nu\right)
\end{align}
where in going to the final line we have used the integral representation of the Dirac delta, $\delta(a-\xi)=\int_{-\infty}^\infty \! e^{2\pi ix(a-\xi)}\,\mathrm{d}x$. This final line might look a bit odd, but we can easily make sense of it. The Dirac delta $\delta(a-\xi)$ can be pictured (loosely speaking) as a 'spike' of infinitesimal width and infinite height located at $\xi=a$ and enclosing unit area, so what equation 8 is saying is that the Fourier transform of a periodic function, written as its Fourier series, is an infinite number of Dirac deltas located at $\nu=n/T$ for all integers $n$, each one 'weighted'$^2$ by its corresponding Fourier coefficient $c_n$. This is quite a result: as the Fourier transform automatically gives us the Fourier coefficients of the original function and their corresponding frequencies, it allows us to perform a decomposition on any periodic function (see Fig. 1 for an intuitive visualisation).

Figure 1: A graphical representation of the first 6 terms of the Fourier series for the square wave in the time domain, here denoted by $s_6(t)$, and its Fourier transform in the frequency domain, denoted by $S(f)$. Note that, as per equation 8 and subsequent discussion, if the Fourier series is expressed purely in terms of sine functions, the coefficients $c_n=-ib_n/2$ (for $n>0$), and hence the values of $S(f)$, will be imaginary. Consider this figure with Note 2 in mind. [1]
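A discrete analogue of this result is easy to demonstrate. The sketch below is plain Python with a naive $O(N^2)$ discrete Fourier transform; the 100 Hz fundamental, the sample rate and the three-harmonic test signal (the first three odd terms of the square-wave series, $b_n=4/(n\pi)$) are hypothetical choices. It samples exactly ten periods and shows that the spectrum vanishes except at the harmonic frequencies $n/T$, where the normalised DFT value matches $c_n=-ib_n/2$. Sampling a whole number of periods is what keeps the spikes sharp; otherwise energy leaks into neighbouring frequency bins.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform X_k = sum_m x_m exp(-2 pi i k m / N)."""
    N = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * m / N) for m, x in enumerate(samples))
            for k in range(N)]


T = 0.01     # period in seconds (100 Hz fundamental), hypothetical
fs = 3200    # sample rate in Hz, hypothetical
N = 320      # 0.1 s of signal: exactly 10 periods
times = [m / fs for m in range(N)]
b = {1: 4 / math.pi, 3: 4 / (3 * math.pi), 5: 4 / (5 * math.pi)}  # square-wave sine coefficients
signal = [sum(b_n * math.sin(2 * math.pi * n * t / T) for n, b_n in b.items()) for t in times]

spectrum = [X / N for X in dft(signal)]  # normalised so that the spikes equal c_n
freqs = [k * fs / N for k in range(N)]
peaks = [freqs[k] for k in range(N // 2) if abs(spectrum[k]) > 1e-6]
print(peaks)                              # [100.0, 300.0, 500.0]
print(spectrum[10], -0.5j * b[1])         # both approximately -0.6366j
```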
Now let's consider a simple string that is held taut at both ends and plucked exactly in the middle. The vibrating string will support a standing wave with a wavelength equal to twice the length of the string, such that there are nodes (points of zero amplitude) at the ends and an antinode (point of maximal amplitude) in the centre. However, in principle it could also support standing waves at integer multiples of this frequency, and because the string is plucked exactly in the middle, only the odd multiples are excited (those with three antinodes, or five, and so on, odd-numbered so that there is always an antinode in the centre), along with superpositions (sums) of these waves.$^3$ If this is sounding familiar, it certainly should, because the initial state of such a plucked string, when the centre of the string is pulled away but before it is released, is of course triangular, and the Fourier series of a triangle wave is given by a weighted sum of sine waves with frequencies that are odd multiples of the fundamental frequency.
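To see the odd-harmonic structure concretely, here is a minimal sketch in plain Python (the function names, string length $\ell=1$ and pull height $h=1$ are illustrative assumptions). It sums the Fourier sine series for a string of length $\ell$ pulled up by $h$ at its midpoint, $y(x)=\frac{8h}{\pi^2}\sum_{n\,\text{odd}}\frac{(-1)^{(n-1)/2}}{n^2}\sin\left(\frac{n\pi x}{\ell}\right)$, a standard result quoted here rather than derived, and measures how closely the partial sums approach the triangular shape. Only odd $n$ appear, with weights falling off as $1/n^2$.

```python
import math


def plucked_string(x, length=1.0, height=1.0, terms=50):
    """Partial Fourier sine series for a string pulled up by `height` at its midpoint."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1  # only odd harmonics contribute
        total += (-1) ** k * math.sin(n * math.pi * x / length) / n ** 2
    return 8 * height / math.pi ** 2 * total


def triangle(x, length=1.0, height=1.0):
    """The exact triangular initial shape: zero at both ends, `height` in the middle."""
    return height * (1 - abs(2 * x / length - 1))


grid = [i / 100 for i in range(101)]
for terms in (1, 3, 10, 50):
    worst = max(abs(plucked_string(x, terms=terms) - triangle(x)) for x in grid)
    print(f"{terms:>2} odd harmonics: max deviation from the triangle = {worst:.4f}")
```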

As sound is merely displaced air, the sound produced by the vibrating string will match (approximately) the vibration on the string itself. Thus, if we take the Fourier transform of the sound produced by the string when it is just released and the string is retaining its triangular shape (before it loses energy due to heat dissipation and air resistance) we will find the Fourier coefficients for a triangular wave! In this context, the Fourier components of the triangle wave have a physical and musical significance, as harmonics of the fundamental frequency (the frequency of the overall note). Furthermore, if we take Fourier transforms at later times, as the vibration in the string 'decays', we can see how the Fourier composition of the sound changes with time. The way the composition/vibration changes with time (even over the course of only a second) has a huge impact on the timbre of the string. As an example, consider two strings of equal length, plucked in the exact same way, but made of different materials. We know they will sound different, albeit similar, despite their initial shape being identical. 

If we have a machine which can produce 'overlapping' single-frequency tones, we can play those harmonic frequencies at the appropriate relative amplitudes and thus reproduce the sound of the string at any given time. Do this for a range of times, changing the components over time to match the Fourier decomposition we found for the decaying note, and now you can (re)create the sound of the string being plucked without any string required! Or, if you want to be creative, change all the frequencies and thus the note being played or change only some of the frequencies or change the relative amplitudes to change the timbre altogether. This is additive synthesis in action, and electronic synthesisers work on this principle.$^4$
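Here is a minimal additive-synthesis sketch in plain Python using only the standard library (`math`, `struct`, `wave`, `io`). Each partial is a sine wave at an integer multiple of the fundamental with its own amplitude and exponential decay. The amplitudes ($1/n^2$ for odd $n$, echoing the plucked string) and especially the decay rates are made-up values for illustration, not measurements of any real instrument; the result is encoded as 16-bit mono WAV data in memory.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second


def synthesise(fundamental, partials, duration):
    """Sum decaying sine partials; `partials` maps harmonic n -> (amplitude, decay rate in 1/s)."""
    samples = []
    for m in range(int(duration * SAMPLE_RATE)):
        t = m / SAMPLE_RATE
        samples.append(sum(amp * math.exp(-rate * t) * math.sin(2 * math.pi * n * fundamental * t)
                           for n, (amp, rate) in partials.items()))
    return samples


def to_wav_bytes(samples):
    """Encode samples, scaled to the peak, as 16-bit mono PCM WAV data in memory."""
    peak = max(abs(s) for s in samples) or 1.0
    frames = b"".join(struct.pack("<h", int(32767 * s / peak)) for s in samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)
    return buffer.getvalue()


# Hypothetical "pluck": odd harmonics weighted 1/n^2, higher harmonics dying away faster
pluck = {n: (1 / n ** 2, 3.0 * n) for n in (1, 3, 5, 7, 9)}
samples = synthesise(261.63, pluck, duration=0.5)  # 261.63 Hz is roughly middle C (C4)
wav_data = to_wav_bytes(samples)
# To listen, write the bytes to a file, e.g.:
# with open("pluck.wav", "wb") as fh:
#     fh.write(wav_data)
```

Changing the fundamental changes the note, while changing the amplitudes or decay rates in `pluck` changes the timbre, which is the freedom described above.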

Double-slit diffraction

Fourier transforms in particular can be found all over physics, much more so than Fourier series. One sub-field that makes considerable use of them is the application of Fourier analysis to wave optics, often referred to as Fourier optics. One example I particularly like concerns diffraction in the far-field (Fraunhofer) regime,$^5$ where we look at the intensity profile of light at a large distance from the aperture(s). We will assume that the light is incident on the aperture(s) at a right angle and is also monochromatic (of a single wavelength/frequency). As a matter of terminology, I will refer to the diffraction pattern as being visualised on a screen parallel to the plane of the aperture.

In this case, the amplitude profile of the diffracted light (when we are using Cartesian coordinates) is proportional to the Fourier transform of the amplitude at the aperture. But wait, in previous examples we discussed taking Fourier transforms of functions of time $t$. These resulted in functions of frequency $\nu$, which has units inverse to time. Surely if we take the Fourier transform of a function of distance $x$ (as our aperture amplitude is of course a distribution over space and not time!) we should end up with a function of spatial frequency $\xi$?$^6$ This would have units of inverse distance, but the diffraction pattern amplitude is also distributed over space on the screen, not some "inverse space"! How do we reconcile these ideas?

In this case, spatial frequency has no physical meaning, and this method of using the Fourier transform is just a mathematical tool. So we can choose to define the frequency however we want (so long as we are consistent). We choose to define the spatial frequency as $\xi=x'/\lambda L$ where $x'$ is the usual coordinate distance along the diffraction pattern (we would call it $x$ except we're already using that as the spatial coordinate for the aperture function), $\lambda$ is the wavelength of the light and $L$ is the distance from the aperture to the screen. Checking the dimensions, we see that $\xi$ still has dimensions of inverse distance (since $x'$, $\lambda$ and $L$ all have units of distance) but now we can everywhere substitute $x'/\lambda L$ for $\xi$, and since $\lambda$ and $L$ are both constants, our function of spatial frequency has just become a function of (scaled) $x'$, just as we wanted!

So now we have a way to find our diffraction pattern $U(x')$ from our aperture function $A(x)$:
\begin{equation}
U(x')\propto\int_{-\infty}^\infty \! A(x)e^{-2\pi ixx'/\lambda L}\,\mathrm{d}x
\end{equation}
or, equivalently, $U(x')\propto\tilde{A}(x'/\lambda L)$. To make this more concrete, let's do some examples (these examples will be one-dimensional only, so we neglect the variation of the pattern across the rest of the screen plane -- this is not possible in practice, although for appropriately chosen sources and apertures it is a very good approximation). First, we'll consider a single slit of finite width. This can be represented by the rectangular function $\mathrm{rect}(x/W)$, which is $0$ everywhere except between $x=\pm W/2$, where it is equal to $1$ (clearly here $W$ represents the width of the non-zero part of the function).$^7$ This means that our Fourier transform becomes
\begin{align}
U(x')&\propto\int_{-\infty}^\infty \! \mathrm{rect}\left(\frac{x}{W}\right)e^{-2\pi ixx'/\lambda L}\,\mathrm{d}x=\int_{-W/2}^{W/2}\! e^{-2\pi ixx'/\lambda L}\,\mathrm{d}x\nonumber\\
&\propto\left[\frac{\lambda L}{2\pi ix'}e^{-2\pi ixx'/\lambda L}\right]_{-W/2}^{+W/2}\nonumber\\
&\propto\frac{\lambda L}{2\pi ix'}\left(e^{+\pi iWx'/\lambda L}-e^{-\pi iWx'/\lambda L}\right)\nonumber\\
&\propto\frac{\lambda L}{\pi x'}\sin\left(\frac{\pi Wx'}{\lambda L}\right)=W\mathrm{sinc}\left(\frac{\pi Wx'}{\lambda L}\right)\\
\Rightarrow U(\theta)&\propto W\mathrm{sinc}\left(\frac{\pi W}{\lambda}\sin(\theta)\right).
\end{align}
In the fourth line we have made use of the identity (following immediately from Euler's formula) $\sin(x)=(e^{ix}-e^{-ix})/2i$ and the definition $\mathrm{sinc}(x):=\sin(x)/x$. In the final line we have made use of the small angle approximation to put the diffraction pattern amplitude in terms of $\theta$ instead of $x'$, which is more common.$^8$ Finally, what we are really interested in is not the amplitude but rather the (observable) intensity profile at the screen, which is the squared modulus of the amplitude, $|U|^2$, and thus we find
\begin{align}
I(x')&\propto \mathrm{sinc}^2\left(\frac{\pi Wx'}{\lambda L}\right),\\
I(\theta)&\propto \mathrm{sinc}^2\left(\frac{\pi W}{\lambda}\sin(\theta)\right).
\end{align}
This is, up to a scalar factor, exactly the intensity distribution we observe for single-slit diffraction (Fig. 2)!

Figure 2: Finite-width single-slit diffraction intensity profile $I(x')$ superposed above an experimental single-slit diffraction pattern for illustrative purposes. As the experimental parameters were not known, the units shown are arbitrary. Adapted from [2].
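The single-slit result can be checked numerically by approximating the Fourier integral of the aperture function directly with a midpoint sum and comparing the normalised intensity with $\mathrm{sinc}^2$. This is a sketch only: the helper `fraunhofer_amplitude` is a hypothetical name, and the slit width, wavelength and screen distance are illustrative values.

```python
import cmath
import math

def fraunhofer_amplitude(aperture, x_screen, wavelength, distance, x_min, x_max, n=4000):
    """Midpoint-rule estimate of U(x') = integral of A(x) exp(-2 pi i x x' / (lambda L)) dx,
    taken over [x_min, x_max], outside of which the aperture function is assumed to vanish."""
    dx = (x_max - x_min) / n
    total = 0j
    for j in range(n):
        x = x_min + (j + 0.5) * dx
        total += aperture(x) * cmath.exp(-2j * math.pi * x * x_screen / (wavelength * distance))
    return total * dx

def sinc(u):
    """Unnormalised sinc, sin(u)/u, as defined in the text."""
    return 1.0 if u == 0 else math.sin(u) / u

W, LAM, L = 1e-4, 500e-9, 1.0   # 0.1 mm slit, 500 nm light, screen 1 m away (illustrative)

def slit(x):
    """rect(x / W): 1 inside the slit, 0 outside."""
    return 1.0 if abs(x) <= W / 2 else 0.0

I0 = abs(fraunhofer_amplitude(slit, 0.0, LAM, L, -W / 2, W / 2)) ** 2
for xp in (0.0, 2e-3, 5e-3, 7.5e-3):
    numeric = abs(fraunhofer_amplitude(slit, xp, LAM, L, -W / 2, W / 2)) ** 2 / I0
    analytic = sinc(math.pi * W * xp / (LAM * L)) ** 2
    print(f"x' = {xp * 1e3:.1f} mm: numeric {numeric:.6f}, sinc^2 {analytic:.6f}")
```

With these values the first zero of the pattern falls at $x'=\lambda L/W=5\,\mathrm{mm}$.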
Now let's try something a bit more interesting. What about two slits? First, we'll consider two slits of infinitesimal width located at $x=\pm d/2$ such that the slits are separated by a distance $d$. This gives an aperture function of $A(x)=\delta(x-d/2)+\delta(x+d/2)$.
\begin{align}
U(x')&\propto\int_{-\infty}^\infty\! \delta\left(x-\frac{d}{2}\right)e^{-2\pi ixx'/\lambda L}\,\mathrm{d}x+\int_{-\infty}^\infty\! \delta\left(x+\frac{d}{2}\right)e^{-2\pi ixx'/\lambda L}\,\mathrm{d}x\nonumber\\
&\propto e^{-\pi idx'/\lambda L}+e^{+\pi idx'/\lambda L}\nonumber\\
&\propto 2\cos\left(\frac{\pi dx'}{\lambda L}\right).
\end{align}
In the second line we have exploited the sifting property of the Dirac delta, $\int_{-\infty}^\infty\!\delta(x-a)f(x)\,\mathrm{d}x=f(a)$, and in the third line used the identity $\cos(x)=(e^{ix}+e^{-ix})/2$, which, as for the sine case above, also follows immediately from Euler's formula. Therefore, we find an intensity profile of
\begin{align}
I(x')&\propto\cos^2\left(\frac{\pi dx'}{\lambda L}\right),\\
I(\theta)&\propto\cos^2\left(\frac{\pi d}{\lambda}\sin(\theta)\right).
\end{align}
which is again what we expect for "ideal" double-slit interference. In reality, however, the fringes fade with distance from the centre and, not too far from $x'=\theta=0$, one observes unexpected dark regions where this $\cos^2$ profile predicts bright fringes. What might be the cause of this discrepancy?

If you guessed the fact that we are using infinitesimal slits, that's a pretty good guess! Let's now go through the same derivation, but with two finite slits as in the single-slit example. Before going on, have a think about what you expect the final function to be!
\begin{align}
U(x')&\propto\int_{-\infty}^\infty\! \Bigg(\mathrm{rect}\left(\frac{x-d/2}{W}\right)+\mathrm{rect}\left(\frac{x+d/2}{W}\right)\Bigg)e^{-2\pi ixx'/\lambda L}\,\mathrm{d}x\nonumber\\
&\propto\left[\frac{\lambda L}{2\pi ix'}e^{-2\pi ixx'/\lambda L}\right]_{(d-W)/2}^{(d+W)/2}+\left[\frac{\lambda L}{2\pi ix'}e^{-2\pi ixx'/\lambda L}\right]_{(-d-W)/2}^{(-d+W)/2}\nonumber\\
&\propto\frac{\lambda L}{2\pi ix'}\left( e^{-\pi i(d+W)x'/\lambda L}-e^{-\pi i(d-W)x'/\lambda L}+e^{-\pi i(-d+W)x'/\lambda L}\right.\nonumber\\
&\qquad \left.-e^{-\pi i(-d-W)x'/\lambda L} \right)\nonumber\\
&\propto\frac{-\lambda L}{2\pi ix'}\left(e^{+\pi idx'/\lambda L}+e^{-\pi idx'/\lambda L}\right)\left(e^{+\pi iWx'/\lambda L}-e^{-\pi iWx'/\lambda L}\right)\nonumber\\
&\propto-2W\cos\left(\frac{\pi dx'}{\lambda L}\right)\mathrm{sinc}\left(\frac{\pi Wx'}{\lambda L}\right),
\end{align}
and so our intensity is
\begin{align}
I(x')&\propto\cos^2\left(\frac{\pi dx'}{\lambda L}\right)\mathrm{sinc}^2\left(\frac{\pi Wx'}{\lambda L}\right),\\
I(\theta)&\propto\cos^2\left(\frac{\pi d}{\lambda }\sin(\theta)\right)\mathrm{sinc}^2\left(\frac{\pi W}{\lambda}\sin(\theta)\right).
\end{align}
This is just the infinitesimal double-slit intensity profile modulated by the envelope of the single-slit intensity profile, which is exactly what we observe when we perform the experiment (Fig. 3)! Now, it is possible to use this and other related methods to find the intensity profiles for any arbitrary aperture, but I hope that from these few examples you already appreciate the power of this technique. I am sure that those readers who have derived these expressions through more direct (and convoluted) wave phase and trigonometric arguments certainly will!

Figure 3: Finite-width double-slit diffraction intensity profile $I(x')$ superposed above an experimental double-slit diffraction pattern for illustrative purposes. As the experimental parameters were not known, the units shown are arbitrary. We have chosen $d=7W$ to approximately match the image. Adapted from [2].
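The finite double-slit result can be checked the same way. The sketch below (hypothetical helper names; slit width, wavelength and distance are illustrative, with $d=7W$ as in Fig. 3) integrates across both slits numerically and compares the result with $\left(2W\cos(\pi dx'/\lambda L)\,\mathrm{sinc}(\pi Wx'/\lambda L)\right)^2$, the squared magnitude of the amplitude derived above.

```python
import cmath
import math

def double_slit_intensity_numeric(x_screen, width, separation, wavelength, distance, n=4000):
    """|U(x')|^2 for two slits of the given width centred at x = +/- separation / 2,
    integrating exp(-2 pi i x x' / (lambda L)) across each slit with the midpoint rule."""
    k = 2 * math.pi * x_screen / (wavelength * distance)
    dx = width / n
    total = 0j
    for centre in (-separation / 2, separation / 2):
        for j in range(n):
            x = centre - width / 2 + (j + 0.5) * dx
            total += cmath.exp(-1j * k * x)
    return abs(total * dx) ** 2

def double_slit_intensity_analytic(x_screen, width, separation, wavelength, distance):
    """(2W cos(pi d x' / lambda L) sinc(pi W x' / lambda L))^2, i.e. |U|^2 from the derivation."""
    a = math.pi * separation * x_screen / (wavelength * distance)
    b = math.pi * width * x_screen / (wavelength * distance)
    sinc_b = 1.0 if b == 0 else math.sin(b) / b
    return (2 * width * math.cos(a) * sinc_b) ** 2

W, LAM, L = 1e-4, 500e-9, 1.0   # illustrative slit width, wavelength and screen distance
D = 7 * W                        # d = 7W, as in Fig. 3

I0 = double_slit_intensity_numeric(0.0, W, D, LAM, L)
for xp in (0.0, 3e-4, 1e-3, 2.2e-3, LAM * L / W):
    ratio_numeric = double_slit_intensity_numeric(xp, W, D, LAM, L) / I0
    ratio_analytic = double_slit_intensity_analytic(xp, W, D, LAM, L) / I0
    print(f"x' = {xp * 1e3:.2f} mm: numeric {ratio_numeric:.6f}, analytic {ratio_analytic:.6f}")
```

At $x'=\lambda L/W$ the $\cos^2$ factor on its own would give a bright fringe (the seventh, since $d=7W$), but the single-slit envelope is zero there, so that fringe is missing from the pattern.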

Conclusion

This is the final post of my 3-part Fourier Fun series, a crash-course introduction to Fourier analysis. In it we have covered the basics of the Fourier series and Fourier transform, hopefully to the point where the reader has a little bit of intuitive understanding of where they come from and how (and why) they work, and, with a little practice, will be able to make use of them in calculations. I want to stress, however, that I have only scratched the surface in these posts. In physics alone, the Fourier transform in particular is almost ubiquitous, and its uses and subtleties are many. Even if the reader is unlikely to encounter it in their day-to-day life, I have no doubt that we will see it crop up time and again on the blog. When I first learnt about the Fourier transform, and didn't particularly understand it, that thought would have horrified me! I hope that is not the case for the reader, but even if it is, take solace in the fact that, after a good couple of years of learning how to tame Fourier transforms, I now wouldn't know what to do without them; so it goes with many things! I hope you've enjoyed the series. The next blog post should be coming up in the next week or two, so keep an eye out!

Notes

$1$. Something may be jumping out at you here -- the complex exponential form of the Fourier series may be equivalent to the sine+cosine version we are used to, but what about only a single component? Won't the Fourier transform of $e^{i\xi}=\cos (\xi)+i\sin(\xi)$ be different to, for example, $\cos (\xi)$ by itself? The answer is yes. The Fourier transform of $\cos (\xi)$ is given by two deltas, at $\xi=\pm 1$, while the Fourier transform of $\sin (\xi)$ is given by one positive imaginary delta at $\xi=-1$ and one negative imaginary delta at $\xi=+1$. I leave the questions of what these negative frequency deltas mean and how we end up with only a single positive delta using the complex exponential form to the reader.

$2$. An important property of the Dirac delta is that, despite the fact that it is (in some well-defined sense) infinitesimally thin and infinitely tall, when integrated over one finds that the area underneath it is $1$. Thus even though we often think of and display the Fourier transform of a periodic function as a number of spikes of various finite heights, each given by its particular Fourier coefficient, this is just a sort of short-hand. In truth the 'shape' of the delta does not change, but the area under it (rather than its height) does -- this is why I say it is weighted by $c$. Of course, ordinarily, if we multiply a function by some scalar then not only does its area change by a factor of the scalar but its height does as well. The difference here is that the Dirac delta is not truly a function but is rather a distribution, so our intuitive rules for dealing with functions do not quite apply -- this is one of the dangers of working with infinities!

$3$. The reason that the string can support a superposition of standing waves is because (in mathematical terms) its behaviour is described by the one-dimensional wave equation $\partial_t^2 u=c^2\partial_x^2 u$, which is what's called a "linear" partial differential equation. In this context, linearity describes the fact that if you have any two known solutions to the equation, any linear combination (weighted sum) of the two will also be a solution.
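Linearity can be checked numerically. In this sketch (the wave speed, string length and weights are illustrative assumptions) two standing-wave solutions are combined with arbitrary weights, and the residual $\partial_t^2 u-c^2\partial_x^2 u$ is estimated with central finite differences; it vanishes up to discretisation error.

```python
import math

C, LENGTH = 2.0, 1.0   # wave speed and string length (illustrative values)

def mode(n, x, t):
    """n-th standing wave sin(k x) cos(c k t) with k = n pi / LENGTH, fixed at both ends."""
    k = n * math.pi / LENGTH
    return math.sin(k * x) * math.cos(C * k * t)

def superposition(x, t):
    """An arbitrary weighted sum of two known solutions."""
    return 0.7 * mode(1, x, t) - 0.2 * mode(3, x, t)

def wave_residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_tt - c^2 u_xx at (x, t)."""
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - C**2 * u_xx

for x, t in ((0.3, 0.1), (0.5, 0.37), (0.9, 1.2)):
    print(f"residual at x={x}, t={t}: {wave_residual(superposition, x, t):.2e}")
```

By contrast, a non-linear combination such as the square of a single mode gives a residual of order tens at the same points, so it is not a solution.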

$4$. The exact same process as for simple strings can also be applied to simple woodwind instruments. Instead of a standing mechanical wave on a string, in a woodwind instrument the standing wave is itself sonic, a wave of compressed/rarefied air rather than displaced string. Despite the different medium, mathematically we describe them in much the same way (albeit the woodwind instrument will have different boundary conditions, such as an antinode at one end rather than nodes at both ends). For real instruments, other effects such as sympathetic vibration and echoes will also play a part, but we can still use additive synthesis to reproduce them, even in anharmonic instruments such as drums as the sound waves they produce will still be sums of sine waves -- they just won't be described by Fourier series with neat, evenly spaced frequencies. Beyond the simple method expressed in the main body, improving on sound fidelity by taking into consideration these various effects becomes a computational exercise.

$5$. This can be roughly defined by the condition $L\gg W^2/\lambda$ where $L$ is the distance from the aperture, $W$ is the aperture width and $\lambda$ is the wavelength of the light. To contrast, the near-field (Fresnel) regime can be roughly defined by $L\ll W^2/\lambda$. These regimes are notable in that they both allow analytic (as opposed to numerical) solutions to be found for the Kirchhoff diffraction equation, which describes diffraction generally. For the interested reader, the Kirchhoff equation is given by
\begin{equation*}
U(P)=-\frac{1}{4\pi}\oint_S\!\left(U\frac{\partial}{\partial n}\left(\frac{e^{iks}}{s}\right)-\frac{e^{iks}}{s}\frac{\partial U}{\partial n}\right)\,\mathrm{d}S
\end{equation*}
where $U$ is the (complex phasor) classical electromagnetic field amplitude, $P$ is an arbitrary point of interest, $S$ is an arbitrary closed surface enclosing $P$, $k$ is the wavenumber, $s$ is the distance from $P$ to a point on $S$ and $n$ is the coordinate along the direction normal to $S$. The derivation of this equation, and the Fraunhofer case in turn, is an interesting and informative exercise that would unfortunately require a post of its own, so I will not go into it here.
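The two rough conditions above compare $L$ with $W^2/\lambda$, a comparison often packaged as the dimensionless Fresnel number $F=W^2/\lambda L$, so that the Fraunhofer condition reads $F\ll 1$. A minimal sketch, in which the factor standing in for "much greater than" and the example values are arbitrary assumptions:

```python
def fresnel_number(width, wavelength, distance):
    """F = W^2 / (lambda L); the conditions above are F << 1 (Fraunhofer) and F >> 1 (Fresnel)."""
    return width**2 / (wavelength * distance)

def diffraction_regime(width, wavelength, distance, margin=10.0):
    """Rough classification; treating '<<' and '>>' as a factor of `margin` is an assumption."""
    F = fresnel_number(width, wavelength, distance)
    if F < 1 / margin:
        return "Fraunhofer (far field)"
    if F > margin:
        return "Fresnel (near field)"
    return "intermediate"

print(diffraction_regime(1e-4, 500e-9, 1.0))   # 0.1 mm slit, 500 nm, 1 m -> Fraunhofer (far field)
print(diffraction_regime(1e-2, 500e-9, 1.0))   # 1 cm aperture, same setup -> Fresnel (near field)
```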

$6$. I have used spatial frequency on the blog before. We define spatial frequency in terms of wavelength, $\xi=1/\lambda$, similar to how (temporal) frequency is defined in terms of period, $\nu=1/T$. However, it is not commonly used: it is much more usual to see the wavenumber $k$ used instead of spatial frequency, where $k=2\pi\xi=2\pi/\lambda$. I decided to use spatial frequency here instead to maintain the comparison with temporal frequency and keep things conceptually simpler, but be aware that "in the wild" wavenumber is far more common in almost every circumstance (that I am familiar with, at least!).

$7$. Note that, for the sake of simplicity, we are effectively saying that the amplitude of the light at the single slit is uniformly equal to $1$. We can of course change the amplitude with a scaling factor if we wish, or make the amplitude unevenly distributed by changing the aperture function. For example, if we wanted to model light with a Gaussian distribution over a single slit (or without any slit) all we would need to do is set $A(x)=\mathrm{rect}(x/W)e^{-x^2/2\sigma^2}/\sigma\sqrt{2\pi}$ (or $A(x)=e^{-x^2/2\sigma^2}/\sigma\sqrt{2\pi}$ respectively). We don't do that here because in the far-field (Fraunhofer) regime the aperture must, for practical purposes, be very narrow (recall $L\gg W^2/\lambda$), and so a uniform distribution of light across it is not at all unreasonable.
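As a sketch of the Gaussian case (the values of $\sigma$, $\lambda$ and $L$ and the helper names are illustrative assumptions), the far-field amplitude can be computed numerically and compared with the standard result that the normalised Gaussian above transforms to $e^{-2\pi^2\sigma^2\xi^2}$, with $\xi=x'/\lambda L$. Cutting the Gaussian off with a narrow slit changes the pattern; for example, the central amplitude drops to the fraction of the light that passes through the slit.

```python
import cmath
import math

SIGMA, WAVELENGTH, DISTANCE = 2e-5, 500e-9, 1.0   # illustrative beam width, wavelength, distance

def gaussian_aperture(x, width=None):
    """Normalised Gaussian illumination; if `width` is given, it is cut off by rect(x / width)."""
    if width is not None and abs(x) > width / 2:
        return 0.0
    return math.exp(-x**2 / (2 * SIGMA**2)) / (SIGMA * math.sqrt(2 * math.pi))

def far_field_amplitude(x_screen, width=None, n=4000):
    """Midpoint-rule Fourier integral of the aperture function at xi = x' / (lambda L)."""
    half = 10 * SIGMA if width is None else width / 2   # beyond 10 sigma the Gaussian is negligible
    xi = x_screen / (WAVELENGTH * DISTANCE)
    dx = 2 * half / n
    total = 0j
    for j in range(n):
        x = -half + (j + 0.5) * dx
        total += gaussian_aperture(x, width) * cmath.exp(-2j * math.pi * xi * x)
    return total * dx

for xp in (0.0, 5e-3, 1e-2):
    xi = xp / (WAVELENGTH * DISTANCE)
    expected = math.exp(-2 * math.pi**2 * SIGMA**2 * xi**2)   # transform of the untruncated Gaussian
    print(xp, abs(far_field_amplitude(xp)), expected, abs(far_field_amplitude(xp, width=2 * SIGMA)))
```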

$8$. Because we are in the far-field regime, we expect the distance between the aperture and screen $L$ to be much larger than the relevant horizontal distance across the screen $x'$. Note that we have assumed that $x$ and $x'$ are aligned. We can form a right-angle triangle with the smallest angle located at the aperture ($x=0$), the longer cathetus (non-hypotenuse triangle side) having length $L$ (i.e., joining $x=0$ and $x'=0$) and the shorter cathetus having length $\chi$ (running from $x'=0$ to $x'=\chi$), see Fig. 4. As $L\gg \chi$ for all interesting parts of the diffraction pattern, we can say that the hypotenuse $h$ of the triangle is very close in length to $L$. If we make the approximation $h\approx L$ then we find by basic trigonometry that $\tan(\theta)=\chi/L\approx\sin(\theta)=\chi/h$ where $\theta$ is the smallest angle in the triangle (between $L$ and $h$). Thus we can make the substitution $\chi/L\rightarrow\sin(\theta)$, assuming small $\theta$. Of course, this is a very unusual use of the small angle approximation, but it is done to keep the final expression consistent with that found through other (more direct) methods.

Figure 4: Geometric representation of the diffraction apparatus. The aperture is located at $x=0$, the screen is represented by the $x'$ axis and the two are joined by a (minimal) distance $L$. Diffracted light follows a path of length $h$, which with $L$ forms a right-angled triangle with a shortest side length of $\chi$. $\chi$ is subtended by the angle $\theta$ which, in the far-field regime ($L\gg\chi$) will be small. 

Image credits

[1] Lucas V. Barbosa, Fourier series and transform, public domain

[2] Jordgette, Single and Double Slit 2, licensed under CC BY-SA 3.0

7 February 2015

News (2015/02/07)

Hey everyone! I've got lots of news to share in this post, so let's get to it!

Regarding the blog, first of all, as you may have noticed, I've tweaked the layout of the blog! The biggest change is to the font (okay, technically the typeface, but everyone refers to it as the font so that's what I'm going to do) -- I've changed from using Helvetica to Computer Modern as my default font. I decided after my last entry that I wanted to use a serif font for the main body text just to increase the legibility for the longer posts (and sharpen things up a bit). Given I've been using MathJax to render LaTeX for all the equations, I decided after some experimentation that Computer Modern, which is the default font for LaTeX, would vastly improve the integration of equations into the text and unify the posts a bit, which I think it does excellently. Unfortunately the fonts are hosted on an external mirror, so you may notice a slight delay when loading pages (especially longer posts). Since there's a delay anyway when rendering equations, I don't think it's much of an issue, but please let me know if you disagree!

Otherwise, the only other real change is to the font size (which I've increased). I've also been toying with the idea of changing the post heading/sub-heading style, but we'll see how that goes with the new font. I'm pretty happy with the overall look of the blog (I spent long enough designing it!) so not much else is set to change in the near future.

There are more changes coming in terms of content as well. For quite a while now (since late September 2014) I've had an idea for a new series on the blog named 'Papers, Please', featuring guest posts by friends and colleagues who explain, for a lay audience, research they have completed and had published in peer-reviewed journals. Not only does it give them a bit of exposure (and communication practice), it will also hopefully give readers a more intimate picture of scientific research than the sort of 'big ticket' announcements you get from the news and press releases.

My usual posts take at least weeks to write, not only because of the time needed to write them but also to research, fact-check and edit them, a process which is especially difficult for posts about topics I'm not an expert in myself! In addition to these posts, I have decided to also start writing much shorter posts where, instead of trying to explain a big topic in detail for an inexperienced audience, I'll write about topics geared towards readers who are more experienced in physics or mathematics but may not have come across, for example, a certain interesting equation or operation. Hopefully this will allow me to cover more topics (and clear my ever-increasing back-log of post ideas) and give my readers a little more to read a little more often.

In terms of physics news, the Planck/BICEP2 collaboration results are officially in and the news isn't good for BICEP2's claimed discovery of primordial gravitational waves, a saga I've mentioned in previous News posts on a number of occasions (because I find it very interesting!). As it turns out, the majority (perhaps even all) of the detected B-modes were due to galactic dust. As Sean Carroll points out though, there's still lots of work to be done in this area by both theorists and observational astronomers, so in that sense there is a silver lining to that cloud!

Also, since I occasionally do recommendations on the blog, I thought I'd plug reddit, specifically the discussion subreddits Ask Physics, Ask Science and Ask Science Discussion, which I occasionally contribute responses to, as well as the aggregation subreddits Particle Physics, Physics, Physics .gifs, Quantum, Science and Space. The aggregation subreddits can be a bit hit and miss (if you see something that looks a bit suspect, check the comments!) but the discussion subreddits are excellent if you have a question about physics, conceptual or calculational, that you've always wanted to ask.

Before I sign off I just want to say that the next post will be the conclusion of the 'Fourier Fun' series, and I hope to have it completed shortly. I also hope you enjoy the improvements and additions I'm making to the blog, and I look forward to having them all fully implemented soon!

15 January 2015

Analytic continuation of the zeta function

Introduction

This post is a follow-up to my post on infinite series wherein I discussed the Riemann zeta function as the analytic continuation of the Dirichlet series $D(s)=\sum^\infty_{n=1}a_n/n^s$ in the $a_n=1$ case. Since then I've been asked to write a little bit more about the zeta function and analytic continuation, so in this post I'm going to go through Riemann's second proof of the analytic continuation of the zeta function (based primarily on this presentation, though there are many others on the Internet) and hopefully make it a little bit more accessible. There are a number of other proofs, but I chose this one because it's quite straightforward and doesn't introduce too many techniques foreign to a physics-minded person like myself.

The gamma function

I have a soft spot for the gamma function, so you'll have to indulge me going into a little extra detail here. We'll start by considering an operator we've seen many times before, the factorial operator, which is given by $n!=n\times (n-1)\times (n-2)\times \ldots \times 3\times 2\times 1$ for all non-negative integers $n$ (with $0!=1$ by definition).$^1$ Now, from this definition it is clear that it makes no sense to ask what, say, $1.5!$ or $\pi !$ are, but if we plot $n!$ we notice something interesting: the points seem to follow a sensible 'path' (fig. 1), so it may be possible to interpolate $n!$ sensibly (simply put, we may be able to join the dots in a consistent and useful way).
Figure 1: $n!$ for $n=0,1,\ldots,6$. Note the logarithmic scale on the vertical axis. The plot strongly suggests that $n!$ is interpolable.
Now, almost any set of dots can be interpolated by an arbitrary function. They might be joined by straight lines, or very wavy lines, or anything in between, but most (potentially all) of these interpolations will be of little practical use and, more importantly, will just not be very sensible. If we say that $3!=6$ and $4!=24$ we don't want to say $3.5!=-198$ unless there's a very good reason to. As it turns out, there is a sensible interpolation of the factorial operation, and it's known as the gamma function $\Gamma(s)$ (fig. 2).

Figure 2: a) The Gamma function $\Gamma(s)$ along the real axis (shown in blue) and the factorial operation $(n-1)!$ for integers $n$ (shown in red) overlaid. The Gamma function as the interpolation of the factorial operation is clear from this plot. b) The absolute value of the Gamma function $\Gamma(s)$ over $\sigma\in [-4,4]$ and $t\in [-1,1]$ where $s=\sigma+it$. The locations of the poles at the non-positive integers are clear to see here.
For integers $n\geq 1$ we have $\Gamma(n)=(n-1)!$ which is exactly the factorial operation shifted by $1$ in the positive direction (this shift can be easily eliminated by considering $n\mapsto n+1$ but in practice this is virtually never done, purely due to historical convention).

In fact, the gamma function is not only defined for the real numbers but can itself be analytically continued over the whole complex plane. It is meromorphic, which means it is analytic or holomorphic (complex differentiable) everywhere apart from at a set of isolated singularities called poles$^2$ (see fig. 2b). The poles of the gamma function are simple poles located at the non-positive integers, which means that near each $s=-n$ (for $n=0,1,2,\ldots$) it behaves like a constant multiple of $1/(s+n)$, just as $1/s$ behaves as $s\rightarrow 0$. This fact will become important shortly.

The gamma function is given by the (improper) integral
\begin{equation}
\Gamma(s):=\int^\infty_0 t^{s-1}e^{-t}\mathrm{d}t,\quad\mathrm{Re}(s)>0.
\end{equation}
It is easy to show using integration by parts that the gamma function can be analytically continued using the functional (implicit) equation $\Gamma(s+1)=s\Gamma(s)$. The formula for integration by parts is
\begin{equation}
\int_a^b u(t)v'(t)\mathrm{d}t=\left[u(t)v(t)\right]_a^b-\int_a^b u'(t)v(t)\mathrm{d}t,
\end{equation}
where a prime denotes differentiation with respect to $t$. For this case, $\Gamma(s+1)=\int^\infty_0 t^s e^{-t}\mathrm{d}t$ so we can choose $u(t)=t^s$ and $v'(t)=e^{-t}$ to give
\begin{align}
\Gamma(s+1)&=\int^\infty_0 t^s e^{-t}\mathrm{d}t\nonumber \\
&=\left[-t^s e^{-t}\right]_0^\infty-\int_0^\infty -st^{s-1}e^{-t}\mathrm{d}t\nonumber \\
&=0+s\int_0^\infty t^{s-1}e^{-t}\mathrm{d}t\nonumber \\
&=s\Gamma(s).
\end{align}
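The integral definition and the functional equation can be checked numerically. This sketch uses a plain midpoint rule on a truncated range (the cut-off and step count are illustrative assumptions, and it is restricted to real $s\geq 1$, where the integrand is bounded), compared against Python's standard `math.gamma`:

```python
import math

def gamma_integral(s, upper=60.0, n=200_000):
    """Midpoint-rule estimate of Gamma(s) = integral_0^inf t^(s-1) e^(-t) dt for real s >= 1.
    The cut-off `upper` and step count `n` are illustrative choices."""
    h = upper / n
    return h * sum(((k + 0.5) * h) ** (s - 1) * math.exp(-(k + 0.5) * h) for k in range(n))

for s in (1.5, 2.5, 3.7):
    print(s, gamma_integral(s + 1), s * gamma_integral(s), math.gamma(s + 1))

for n in range(1, 8):
    print(n, math.gamma(n), math.factorial(n - 1))   # Gamma(n) = (n - 1)!
```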

Now we will consider a different but equivalent definition of the gamma function:$^3$
\begin{equation*}
\Gamma(s):=\frac{1}{s}\prod_{n=1}^\infty\frac{\left(1+\frac{1}{n}\right)^s}{1+\frac{s}{n}}.
\end{equation*}
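The product definition can be compared with the integral one numerically. A minimal sketch, assuming real $s>0$ for simplicity and truncating the product after $N$ factors (the function name and $N$ are illustrative); the factors are accumulated as logarithms to avoid overflow:

```python
import math

def gamma_euler_product(s, n_terms=100_000):
    """Truncated Euler product (1/s) * prod_{n=1}^{N} (1 + 1/n)^s / (1 + s/n),
    accumulated as a sum of logarithms to avoid overflow; assumes real s > 0."""
    log_product = sum(s * math.log1p(1 / n) - math.log1p(s / n) for n in range(1, n_terms + 1))
    return math.exp(log_product) / s

for s in (0.5, 1.5, 3.0):
    print(s, gamma_euler_product(s), math.gamma(s))
```

The relative error of the truncated product shrinks only roughly like $1/N$, so this form is useful for algebra (as in the derivation below) rather than for efficient computation.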
Using the functional equation (eqn. 3) for $-s$ we find
\begin{align}
\Gamma(s)\big(\Gamma(1-s)\big)&=\Gamma(s)\big(-s\Gamma(-s)\big)\nonumber\\
&=-s\frac{1}{s}\prod_{n=1}^\infty\frac{\left(1+\frac{1}{n}\right)^s}{1+\frac{s}{n}}\frac{1}{-s}\prod_{k=1}^\infty\frac{\left(1+\frac{1}{k}\right)^{-s}}{1-\frac{s}{k}}\nonumber\\
&=\frac{1}{s}\prod_{n=1}^\infty \frac{1}{1-\frac{s^2}{n^2}}\nonumber\\
&=\frac{1}{s}\frac{\pi s}{\sin(\pi s)}\nonumber\\
&=\frac{\pi}{\sin(\pi s)}\nonumber\\
\Rightarrow \frac{1}{\Gamma(s)}&=\frac{1}{\pi}\sin(\pi s)\Gamma(1-s)
\end{align}
where in the third line we have combined the two infinite products under a single infinite product sign and simplified (this is acceptable because we have a product of products and the ordering of terms is not important; this is analogous to combining two infinite sums added together under a single infinite summation) and in the fourth line we have used the identity$^4$ $\sin(\pi x)/\pi x=\prod_{n=1}^\infty (1-x^2/n^2)$.
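Both the sine product identity and the resulting reflection formula $\Gamma(s)\Gamma(1-s)=\pi/\sin(\pi s)$ can be spot-checked numerically for non-integer $s$. In this sketch `sine_product` is a hypothetical helper that truncates the infinite product after $N$ factors:

```python
import math

def sine_product(x, n_terms=100_000):
    """Truncated product prod_{n=1}^{N} (1 - x^2 / n^2), which tends to sin(pi x) / (pi x)."""
    p = 1.0
    for n in range(1, n_terms + 1):
        p *= 1 - x * x / (n * n)
    return p

for s in (0.1, 0.3, 0.77, -1.3):
    reflection_lhs = math.gamma(s) * math.gamma(1 - s)
    reflection_rhs = math.pi / math.sin(math.pi * s)
    print(s, reflection_lhs, reflection_rhs,
          sine_product(s), math.sin(math.pi * s) / (math.pi * s))
```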

The final line is the important result here. The gamma function is meromorphic, with simple poles at $s=0, -1, -2, \ldots$, which means that $\Gamma(1-s)$ is also meromorphic but with simple poles at $s=1,2,3,\ldots$. However, at exactly those values of $s$ we have $\sin(\pi s)=0$, and these simple zeros cancel the simple poles, so in the end $1/\Gamma(s)$ is entire, that is, analytic over the whole complex plane.
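A quick numerical illustration, using Python's standard `math.gamma`, which raises an error at the poles $s=0,-1,-2,\ldots$: evaluating $1/\Gamma(s)$ through the final line above gives finite values that tend smoothly to zero exactly where $\Gamma(s)$ blows up. The helper `reciprocal_gamma` is hypothetical, written for this illustration only, and handles real $s$ only.

```python
import math

def reciprocal_gamma(s):
    """1/Gamma(s) for real s: 1/math.gamma(s) for s > 0, and the reflection formula
    sin(pi s) Gamma(1 - s) / pi for s <= 0, where math.gamma itself has poles."""
    if s > 0:
        return 1.0 / math.gamma(s)
    return math.sin(math.pi * s) * math.gamma(1 - s) / math.pi

for s in (-3.0, -2.5, -2.0 + 1e-6, -2.0, -1.0, 0.0, 1.5):
    print(s, reciprocal_gamma(s))
```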

The Fourier transform

The Fourier transform is an integral transform we have seen before in the context of quantum mechanics, but is closely related to Fourier series, which we have also seen in some depth. To give a heuristic understanding, in the discrete case the Fourier transform allows us to take a function $f(x)$ which is a sum of waves of certain frequencies with heights given by the corresponding Fourier coefficients, and form a new function $\mathcal{F}\{ f(x)\} =\tilde{f}(\xi)$ made up of a sum of "peaks" with heights given by the Fourier coefficients at locations given by the corresponding wave frequencies$^5$. 

The transform itself is given by$^6$
\begin{equation*}
\tilde{f}(\xi):=\int^\infty_{-\infty} f(x)e^{-2\pi i\xi x}\mathrm{d}x.
\end{equation*}
It is important to note that there is nothing about this definition which restricts its use to 'discrete' Fourier sum cases only. In fact, though the above motivating example uses the case of the Fourier series, any function (even non-periodic functions) can be Fourier transformed, provided of course that they are integrable. 

This is exploited by the Poisson summation formula, which states that, for any sufficiently well-behaved function (for example, one that is continuous and that, along with its Fourier transform, decays quickly enough at infinity), the sum of its values at the integers is equal to the sum of the values of its Fourier transform at the integers, or, mathematically,
\begin{equation}
\sum^\infty_{n=-\infty}f(n)=\sum^\infty_{k=-\infty}\tilde{f}(k).
\end{equation}
This is easy to prove, but as the proof is not essential it is included in the Notes.$^7$
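The formula is easy to test numerically on a function whose transform is known in closed form. The test function $f(x)=1/(1+x^2)$ below is an illustrative choice, not one used in this post; with the convention above its transform is the standard result $\tilde{f}(\xi)=\pi e^{-2\pi|\xi|}$.

```python
import math

def f(x):
    """Illustrative test function 1 / (1 + x^2): continuous and integrable."""
    return 1.0 / (1.0 + x * x)

def f_hat(xi):
    """Its Fourier transform in the e^{-2 pi i xi x} convention: pi e^{-2 pi |xi|}."""
    return math.pi * math.exp(-2 * math.pi * abs(xi))

N = 100_000
lhs = sum(f(n) for n in range(-N, N + 1))      # slowly convergent: the omitted tail is about 2/N
rhs = sum(f_hat(k) for k in range(-20, 21))    # converges extremely quickly
print(lhs, rhs, math.pi / math.tanh(math.pi))  # both sides approach pi * coth(pi)
```

The example also shows part of the formula's appeal: a slowly converging sum on one side can correspond to a rapidly converging one on the other.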

Let us also consider a couple of examples which will prove useful. First, the Fourier transform of some function $g(x)=f(ax)$ for $a>0$. In this case,
\begin{align*}
\tilde{g}(\xi)=\int^\infty_{-\infty}g(x)e^{-2\pi i\xi x}\mathrm{d}x&=\int^\infty_{-\infty}f(ax)e^{-2\pi i\xi x}\mathrm{d}x\\
&=\int^\infty_{-\infty}f(x)e^{-2\pi i\xi \frac{x}{a}}\frac{\mathrm{d}x}{a}\\
&=\frac{1}{a}\int^\infty_{-\infty}f(x)e^{-2\pi i\frac{\xi}{a}x}\mathrm{d}x
\end{align*}
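where in the second line we have made the change of variables $x\mapsto x/a$ (that is, we substituted $x'=ax$, so that $\mathrm{d}x=\mathrm{d}x'/a$, and then relabelled $x'$ as $x$; the limits are unchanged because $a>0$). Recognising the remaining integral as the Fourier transform of $f$ evaluated at $\xi/a$, we can simplify further: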
\begin{align}
\tilde{f}\left(\frac{\xi}{a}\right)&=\int^\infty_{-\infty}f(x)e^{-2\pi i\frac{\xi}{a}x}\mathrm{d}x\nonumber\\
\therefore \tilde{g}(\xi)&=\frac{1}{a}\tilde{f}\left(\frac{\xi}{a}\right).
\end{align}
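The scaling result can be checked numerically. As an illustrative choice (not from this post) take $f(x)=e^{-|x|}$, whose transform in this convention is the standard result $\tilde{f}(\xi)=2/(1+4\pi^2\xi^2)$. The sketch below transforms $g(x)=f(3x)$ directly with a midpoint sum (the truncation to $|x|\le 40$ and the step count are assumptions) and compares it with $\frac{1}{a}\tilde{f}(\xi/a)$:

```python
import cmath
import math

def fourier_transform(func, xi, half_width=40.0, n=80_000):
    """Midpoint-rule estimate of the integral of func(x) e^{-2 pi i xi x} over [-half_width, half_width]."""
    dx = 2 * half_width / n
    total = 0j
    for j in range(n):
        x = -half_width + (j + 0.5) * dx
        total += func(x) * cmath.exp(-2j * math.pi * xi * x)
    return total * dx

def f(x):
    return math.exp(-abs(x))

def f_hat(xi):
    """Known transform of e^{-|x|} in this convention."""
    return 2.0 / (1.0 + 4 * math.pi**2 * xi**2)

A = 3.0   # the scale factor a > 0

def g(x):
    return f(A * x)

for xi in (0.0, 0.2, 1.0):
    print(xi, fourier_transform(g, xi).real, f_hat(xi / A) / A)
```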
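The second example is for the case of $f(x)=e^{-\pi x^2}$. Because this $f$ is even, the substitution $x\mapsto -x$ shows that flipping the sign of the exponent in the definition does not change $\tilde{f}(\xi)$, so for convenience we will work with $e^{+2\pi i\xi x}$ throughout this example. We start by differentiating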
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}\xi}\tilde{f}(\xi)&=\frac{\mathrm{d}}{\mathrm{d}\xi}\int^\infty_{-\infty}f(x)e^{2\pi i\xi x}\mathrm{d}x\\
&=\int^\infty_{-\infty}e^{-\pi x^2}\frac{\mathrm{d}}{\mathrm{d}\xi}\left(e^{2\pi i\xi x}\right)\mathrm{d}x\\
&=\int^\infty_{-\infty}2\pi ixe^{-\pi x^2}e^{2\pi i\xi x}\mathrm{d}x
\end{align*}
where in the second line we have substituted in the definition of $f(x)$ and used the fact that it does not include a $\xi$ term to take it outside of the derivative.$^8$ The resulting integral is certainly daunting, so we will use integration by parts (eqn. 2) to tackle it. If we choose to take $u(x)=e^{2\pi i\xi x}$ and $v'(x)=xe^{-\pi x^2}$ (where the prime denotes differentiation with respect to $x$) it follows that $u'(x)=2\pi i\xi e^{2\pi i\xi x}$ and $v(x)=-e^{-\pi x^2}/2\pi+c$ where $c$ is a constant of integration which we will set to $0$. Substituting these in and evaluating gives$^9$
\begin{align*}
\frac{\mathrm{d}\tilde{f}}{\mathrm{d}\xi}&=2\pi i\Bigg(\left[e^{2\pi i\xi x}\frac{e^{-\pi x^2}}{-2\pi}\right]_{-\infty}^\infty-\int_{-\infty}^\infty 2\pi i\xi e^{2\pi i\xi x}\frac{e^{-\pi x^2}}{-2\pi}\mathrm{d}x\Bigg) \\
&=-2\pi i\int_{-\infty}^\infty 2\pi i\xi e^{2\pi i\xi x}\frac{e^{-\pi x^2}}{-2\pi}\mathrm{d}x \\
&=-2\pi\xi\int_{-\infty}^\infty e^{2\pi i\xi x}e^{-\pi x^2}\mathrm{d}x \\
&=-2\pi\xi\int_{-\infty}^\infty e^{2\pi i\xi x}f(x)\mathrm{d}x=-2\pi\xi\tilde{f}(\xi) \\
\end{align*}
where we have taken out the common constant factor of $2\pi i$ at the beginning. The final result is especially pleasing because it allows us to set up and solve a standard differential equation:$^{10}$
\begin{align*}
\frac{\mathrm{d}\tilde{f}(\xi)}{\mathrm{d}\xi}&=-2\pi\xi\tilde{f}(\xi) \\
\frac{\mathrm{d}\tilde{f}(\xi)}{\mathrm{d}\xi}\frac{1}{\tilde{f}(\xi)}&=-2\pi\xi \\
\Rightarrow\tilde{f}(\xi)&=Ce^{-\pi\xi^2}
\end{align*}
for some constant $C$. It is plain to see that $\tilde{f}(\xi=0)=Ce^0=C$ so from the definition of the Fourier transform we then have $C=\tilde{f}(0)=\int_{-\infty}^{\infty}e^{0}f(x)\mathrm{d}x=\int_{-\infty}^{\infty}e^{-\pi x^2}\mathrm{d}x=1$. Thus we finally and rather anticlimactically find that
\begin{equation*}
\tilde{f}(\xi)=e^{-\pi\xi^2}=f(\xi).
\end{equation*}
The integration by parts and differential equation are treated in much more detail in the Notes. 
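Both results for the Gaussian can also be checked numerically. This is a rough sketch under my own assumptions: trapezoidal quadrature truncated to $[-8,8]$, a central difference with step $10^{-4}$, and hypothetical helper names. It compares the numerical $\tilde{f}(\xi)$ with $e^{-\pi\xi^2}$, and its numerical derivative with $-2\pi\xi\tilde{f}(\xi)$.

```python
import cmath
import math

def fourier_transform(f, xi, half_width=8.0, steps=6400):
    """Trapezoidal approximation of f~(xi) = integral f(x) exp(2 pi i xi x) dx,
    truncated to [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(2j * math.pi * xi * x)
    return total * h

def gaussian(x):
    return math.exp(-math.pi * x * x)

d = 1e-4  # finite-difference step (an arbitrary small choice)
for xi in (0.0, 0.5, 1.3):
    ft = fourier_transform(gaussian, xi)
    # Central-difference estimate of d f~/d xi, to compare with -2 pi xi f~(xi)
    slope = (fourier_transform(gaussian, xi + d) - fourier_transform(gaussian, xi - d)) / (2 * d)
    print(xi, abs(ft - gaussian(xi)), abs(slope + 2 * math.pi * xi * ft))
```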

The theta function

The Jacobi theta function is a function of two complex variables, $z$ across the whole complex plane and $\tau$ across the upper-half of the complex plane (the complex numbers with positive imaginary part), which is frequently used in the theory of (amongst others) elliptic functions. As most of its uses are well beyond the scope of this proof, I will gloss over them here. The theta function is (in generality)$^{11}$ given by
\begin{equation*}
\vartheta(z;\tau):=\sum^\infty_{n=-\infty}e^{i\pi n^2 \tau+2\pi inz}.
\end{equation*}
However, we will only consider the simplified function given by restricting the domain to $z=0$, $\tau=it$ with real $t>0$ (so that $\tau$ lies in the upper half-plane)
\begin{equation}
\vartheta(0;it)\equiv\theta(t)=\sum^\infty_{n=-\infty}e^{-\pi n^2 t}
\end{equation}
where we have introduced the notation $\theta(t)$ for the sake of readability for this proof.
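For experimenting with $\theta(t)$, a truncated sum is enough. The sketch below is a minimal one for real $t>0$; the function name `theta` and the cut-off rule (stop once $e^{-\pi n^2 t}<e^{-40}$, far below double precision) are my own choices, not part of the proof.

```python
import math

def theta(t):
    """Truncated theta(t) = sum over all integers n of exp(-pi n^2 t), for real t > 0."""
    if t <= 0:
        raise ValueError("theta(t) needs t > 0 for the sum to converge")
    # Cut-off N with exp(-pi N^2 t) < exp(-40), negligible in double precision
    cutoff = int(math.ceil(math.sqrt(40.0 / (math.pi * t)))) + 1
    return sum(math.exp(-math.pi * n * n * t) for n in range(-cutoff, cutoff + 1))

print(theta(1.0))   # approximately 1.0864348
print(theta(0.01))  # approximately 10.0, i.e. close to 1/sqrt(0.01)
```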

Now let's define functions $f(n)=e^{-\pi n^2}$ and $g(n)=e^{-\pi n^2 t}$ for some fixed $t>0$. We can see that $g(n)=f(n\sqrt{t})$. This allows us to use the first example from the above section on Fourier transforms (eqn. 6), with $a=\sqrt{t}$, to give the relation
\begin{equation*}
\tilde{g}(k)=\frac{1}{\sqrt{t}}\tilde{f}\left(\frac{k}{\sqrt{t}}\right)
\end{equation*}
We now use the Poisson summation formula (eqn. 5) to find
\begin{equation}
\sum_{k=-\infty}^\infty\tilde{g}(k)=\sum_{n=-\infty}^\infty g(n)=\theta(t).
\end{equation}
From equation 8 it immediately follows that
\begin{align}
\theta(t)&=\sum_{k=-\infty}^\infty\tilde{g}(k)=\sum_{k=-\infty}^\infty\frac{1}{\sqrt{t}}\tilde{f}\left(\frac{k}{\sqrt{t}}\right)\nonumber\\
&=\frac{1}{\sqrt{t}}\sum_{n=-\infty}^\infty\tilde{f}\left(\frac{n}{\sqrt{t}}\right)\nonumber\\
&=\frac{1}{\sqrt{t}}\theta\left(\frac{1}{t}\right)
\end{align}
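where in the second line we have made a trivial index relabelling of $k\rightarrow n$ to keep our notation consistent, and in the third line we have used the second Fourier transform example, $\tilde{f}=f$, so that $\tilde{f}(n/\sqrt{t})=e^{-\pi n^2/t}$ and the sum is exactly $\theta(1/t)$ by equation 7.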
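Equation 9 is easy to spot-check numerically. The sketch below uses a truncated theta sum (the helper name `theta` and its cut-off rule are my own illustrative choices) and compares both sides for a handful of values of $t$; it is evidence, not a proof.

```python
import math

def theta(t):
    """Truncated theta(t) = sum_n exp(-pi n^2 t) over all integers n, for t > 0."""
    cutoff = int(math.ceil(math.sqrt(40.0 / (math.pi * t)))) + 1
    return sum(math.exp(-math.pi * n * n * t) for n in range(-cutoff, cutoff + 1))

for t in (0.05, 0.3, 1.0, 1.7, 4.0):
    lhs = theta(t)
    rhs = theta(1.0 / t) / math.sqrt(t)
    print(f"t={t}: theta(t)={lhs:.15f}, theta(1/t)/sqrt(t)={rhs:.15f}")
```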

We will now consider the behaviour of $\theta(t)$ as $t$ approaches $0$ from above (from the positive-$t$ direction, also denoted $t\rightarrow 0^+$). While it is not yet obvious why, we will need this behaviour later in the proof. We start with the expression
\begin{equation*}
\left|\theta(t)-\frac{1}{\sqrt{t}}\right|=\left|\frac{1}{\sqrt{t}}\left(\theta\left(\frac{1}{t}\right)-1\right)\right|
\end{equation*}
where we have made use of equation 9. Now let's consider the summand in the definition of the theta function, $e^{-\pi n^2 t}$. Due to the squaring of $n$, negative values of $n$ give the same result as positive values, while for $n=0$ we simply have $e^0=1$. This means we can rewrite the definition of the theta function given in equation 7 as the equivalent form
\begin{equation}
\theta(t)=1+2\sum_{n=1}^{\infty}e^{-\pi n^2t}.
\end{equation}
Using this form, we see that
\begin{equation*}
\left|\theta(t)-\frac{1}{\sqrt{t}}\right|=\left|\frac{1}{\sqrt{t}}\left(1+2\sum_{n=1}^{\infty}e^{-\pi n^2/t}-1\right)\right|=\frac{2}{\sqrt{t}}\sum_{n=1}^{\infty}e^{-\pi n^2/t}.
\end{equation*}
As we want to know the behaviour for very small $t$, we can assume that $1/\sqrt{t}<e^{1/t}/4$ (see fig. 3a). This allows us to state that
\begin{equation*}
\left|\theta(t)-\frac{1}{\sqrt{t}}\right|=\frac{2}{\sqrt{t}}\sum_{n=1}^{\infty}e^{-\pi n^2/t}<\frac{1}{2}e^{1/t}\sum_{n=1}^{\infty}e^{-\pi n^2/t}.
\end{equation*}
In turn we can likewise state that $e^{-3\pi/t}<1/2$ (see fig. 3b). Following on from this we find
\begin{align*}
\left|\theta(t)-\frac{1}{\sqrt{t}}\right|&<\frac{1}{2}e^{1/t}\sum_{n=1}^{\infty}e^{-\pi n^2/t}\\
&=\frac{1}{2}e^{1/t}\left(e^{-\pi/t}+e^{-4\pi/t}+e^{-9\pi/t}+\ldots\right)\\
&=\frac{1}{2}e^{-(\pi-1)/t}\left(1+e^{-3\pi/t}+e^{-8\pi/t}+\ldots\right)\\
&<\frac{1}{2}e^{-(\pi-1)/t}\left(1+\frac{1}{2}+\frac{1}{4}+\ldots\right)
\end{align*}
Figure 3a) Plot of $1/\sqrt{t}$ (blue) and $e^{1/t}$ (yellow). Clearly for small $t$, $1/\sqrt{t}<e^{1/t}$.
b) Plot of $e^{-3\pi/t}$. Note the values on the vertical axis -- for small $t$, the function is much smaller than $1/2$.
Many other constants would have worked in these inequalities, but these particular choices simplify the calculation considerably. Note that the final step is a genuine inequality rather than an equality. We have bounded the bracketed series term by term: its $n$-th term is $e^{-\pi(n^2-1)/t}$, and since $n^2-1=(n-1)(n+1)\geq 3(n-1)$ for $n\geq 1$, we have $e^{-\pi(n^2-1)/t}\leq\left(e^{-3\pi/t}\right)^{n-1}<(1/2)^{n-1}$ for $n\geq 2$ (using $e^{-3\pi/t}<1/2$), thus preserving the overall inequality.
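The two assumptions used above, $1/\sqrt{t}<e^{1/t}/4$ and $e^{-3\pi/t}<1/2$, only hold for sufficiently small $t$. A small bisection sketch (the helper names are hypothetical) locates where each one stops holding, which indicates the range of $t$ over which the chain of inequalities is guaranteed. It assumes each condition switches from true to false exactly once on the search interval; this holds here because $\sqrt{t}e^{1/t}$ is decreasing for $0<t<2$ and $e^{-3\pi/t}$ is increasing.

```python
import math

def first_assumption_holds(t):
    # 1/sqrt(t) < e^{1/t}/4, used to replace 2/sqrt(t) by e^{1/t}/2
    return 1.0 / math.sqrt(t) < math.exp(1.0 / t) / 4.0

def second_assumption_holds(t):
    # e^{-3 pi/t} < 1/2, used to compare the bracketed series with 1 + 1/2 + 1/4 + ...
    return math.exp(-3.0 * math.pi / t) < 0.5

def last_true(predicate, lo, hi, iterations=60):
    """Bisection: assumes predicate(lo) is True, predicate(hi) is False and the
    predicate switches exactly once in between; returns roughly the switch point."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if predicate(mid):
            lo = mid
        else:
            hi = mid
    return lo

t1 = last_true(first_assumption_holds, 0.1, 1.0)
t2 = last_true(second_assumption_holds, 0.1, 100.0)
print(t1, t2, 3 * math.pi / math.log(2))
```

On this sketch the first condition is the binding one, failing just above $t\approx 0.61$, while the second holds up to $t=3\pi/\ln 2\approx 13.6$.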

At this point, we note that, famously,
\begin{equation*}
\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\ldots=\sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n=1.
\end{equation*}
Thus our inequality becomes
\begin{align}
\left|\theta(t)-\frac{1}{\sqrt{t}}\right|&<\frac{1}{2}e^{-(\pi-1)/t}\left(1+\frac{1}{2}+\frac{1}{4}+\ldots\right)\nonumber\\
&=\frac{1}{2}e^{-(\pi-1)/t}(1+1)\nonumber\\
&=e^{-(\pi-1)/t}
\end{align}
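This is a very useful result. Since $e^{-(\pi-1)/t}$ is positive and finite for $t\rightarrow 0^+$, it provides a useful upper bound, and it is easy to show that in this limit $e^{-(\pi-1)/t}\rightarrow 0$, faster than any power of $t$. Hence $\theta(t)-1/\sqrt{t}\rightarrow 0$ as $t\rightarrow 0^+$, and it does so quickly enough that multiplying it by any power of $t$ still gives something that vanishes in this limit. Do note, however, that at $t=0$ itself $e^{-(\pi-1)/t}$ is not defined!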
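Equation 11 can be spot-checked numerically for a few values of $t$ chosen small enough that the assumptions behind the derivation hold. This is a sketch with a hypothetical `theta` helper (a truncated sum); it only checks the inequality at sample points and says nothing rigorous about the limit.

```python
import math

def theta(t):
    """Truncated theta(t) = sum_n exp(-pi n^2 t) over all integers n, for t > 0."""
    cutoff = int(math.ceil(math.sqrt(40.0 / (math.pi * t)))) + 1
    return sum(math.exp(-math.pi * n * n * t) for n in range(-cutoff, cutoff + 1))

for t in (0.1, 0.2, 0.3, 0.5):
    # Direct subtraction loses relative accuracy when the gap is tiny, but the
    # bound is many orders of magnitude above rounding error for these t
    gap = abs(theta(t) - 1.0 / math.sqrt(t))
    bound = math.exp(-(math.pi - 1.0) / t)
    print(f"t={t}: |theta(t) - 1/sqrt(t)| = {gap:.3e}, bound = {bound:.3e}, holds: {gap < bound}")
```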

The Mellin transform

The final tool we need to add to our toolbox is the Mellin transform $\mathcal{M}\{f(t)\}=\hat{f}(s)$. The Mellin transform is an integral transform closely related to the Fourier transform (amongst others) and finds uses all over mathematics, from number theory to statistics to, naturally, complex analysis. It is given by
\begin{equation*}
\hat{f}(s):=\int^\infty_0 f(t)t^{s-1}\mathrm{d}t.
\end{equation*}

Let us consider the Mellin transform of $f(t)=e^{-ct}$ for a constant $c>0$ (and $\mathrm{Re}(s)>0$, so that the integral converges at $t=0$):
\begin{equation*}
\hat{f}(s)=\int^\infty_0 e^{-ct}t^{s-1}\mathrm{d}t.
\end{equation*}
We can make the substitution of variables $ct=u$ and $c\mathrm{d}t=\mathrm{d}u$ to give
\begin{align}
\hat{f}(s)&=\int^\infty_0 e^{-u}\left(\frac{u}{c}\right)^{s-1}\frac{\mathrm{d}u}{c}\nonumber\\
&=c^{-s}\int^\infty_0 e^{-u}u^{s-1}\mathrm{d}u\nonumber\\
&=c^{-s}\Gamma(s)
\end{align}
where in the final line we have used the integral definition of the gamma function (eqn. 1), noting that the variable of integration is a dummy variable (it doesn't matter if it is $u$ or $t$ as $\Gamma(s)$ does not depend on it).$^{12}$
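A quick numerical check of equation 12, as a sketch: it uses composite Simpson's rule on $[0,60/c]$, dropping the exponentially small tail, and restricts to real $c>0$ and real $s\geq 1$ so that the integrand stays finite at $t=0$. The helper names are my own.

```python
import math

def simpson(func, a, b, panels=20000):
    """Composite Simpson's rule (panels is rounded up to an even number)."""
    if panels % 2:
        panels += 1
    h = (b - a) / panels
    total = func(a) + func(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def mellin_of_exponential(c, s):
    """Numerical Mellin transform of e^{-ct}, for real c > 0 and real s >= 1.
    The integral is cut off at t = 60/c, where e^{-ct} = e^{-60} is negligible."""
    return simpson(lambda t: math.exp(-c * t) * t ** (s - 1), 0.0, 60.0 / c)

for c, s in ((1.0, 1.0), (2.0, 3.0), (0.5, 4.0), (3.0, 2.5)):
    print(c, s, mellin_of_exponential(c, s), c ** (-s) * math.gamma(s))
```

The restriction to $s\geq 1$ is purely for the quadrature; the identity itself holds for $\mathrm{Re}(s)>0$.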

The proof itself

We begin by considering the Mellin transform of the theta function:
\begin{equation*}
\int_0^\infty \theta(t)t^{s-1}\mathrm{d}t=\int_0^\infty \sum_{n=-\infty}^\infty e^{-\pi n^2 t}t^{s-1}\mathrm{d}t.
\end{equation*}
As $t\rightarrow\infty$, $\theta(t)\rightarrow 1$ since all terms in the sum approach $0$ except for $n=0$. However, as $t\rightarrow 0^+$, we can see from equation 11 that $\theta(t)-1/\sqrt{t}\rightarrow 0$, so $\theta(t)$ behaves like $1/\sqrt{t}$, which diverges as $t\rightarrow 0^+$. Consequently the integrand behaves like $t^{s-1}$ at large $t$ and like $t^{s-\frac{3}{2}}$ near $t=0$, and no single value of $s$ makes both ends converge (the first needs $\mathrm{Re}(s)<0$, the second $\mathrm{Re}(s)>\frac{1}{2}$). We therefore modify our integral by subtracting the offending leading behaviour at each end as a correction term. To do this we break the integral into two parts, the upper integral with the integrand $\theta(t)-1$ and the lower integral with the integrand $\theta(t)-1/\sqrt{t}$. To ensure the integrands are continuous over the full domain of integration, we make the break at $t=1$, where both integrands are equal to $\theta(t=1)-1$. So, mathematically, our integral becomes
\begin{equation*}
\int_0^1 \left(\theta(t)-\frac{1}{\sqrt{t}}\right)t^{s-1}\mathrm{d}t+\int_1^\infty (\theta(t)-1)t^{s-1}\mathrm{d}t.
\end{equation*}
Before we move on, we will make the substitution $s\mapsto s/2$. This ensures the zeta function ends up in terms of $s$ rather than $2s$, a choice made purely for aesthetic appeal. Doing so, and naming the result $\phi(s)$, gives us
\begin{equation*}
\phi(s)=\int_0^1 \left(\theta(t)-\frac{1}{\sqrt{t}}\right)t^{\frac{s}{2}-1}\mathrm{d}t+\int_1^\infty (\theta(t)-1)t^{\frac{s}{2}-1}\mathrm{d}t.
\end{equation*}

Now we calculate the first integral, using the definition of the theta function given by equation 10 and assuming $\mathrm{Re}(s)>1$. This guarantees that the integral of $t^{\frac{s-3}{2}}$ below converges at $t=0$ (at $s=1$ it would be the divergent integral of $1/t$).
\begin{align*}
\int_0^1 \left(\theta(t)-\frac{1}{\sqrt{t}}\right)t^{\frac{s}{2}-1}\mathrm{d}t&=\int_0^1 \left(1+2\sum_{n=1}^\infty e^{-\pi n^2 t}-t^{-\frac{1}{2}}\right)t^{\frac{s}{2}-1}\mathrm{d}t\\
&=\int_0^1 t^{\frac{s}{2}-1}\mathrm{d}t+2\int_0^1\sum_{n=1}^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t-\int_0^1 t^{\frac{s-3}{2}}\mathrm{d}t\\
&=\left[\frac{2}{s}t^{\frac{s}{2}}\right]^1_0+2\int_0^1\sum_{n=1}^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t-\left[\frac{2}{s-1}t^{\frac{s-1}{2}}\right]^1_0\\
&=\frac{2}{s}+2\int_0^1\sum_{n=1}^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t+\frac{2}{1-s}
\end{align*}
Using the same definition of the theta function, the second integral simplifies to
\begin{align*}
\int_1^\infty (\theta(t)-1)t^{\frac{s}{2}-1}\mathrm{d}t&=\int_1^\infty \left(1+2\sum_{n=1}^\infty e^{-\pi n^2 t}-1\right)t^{\frac{s}{2}-1}\mathrm{d}t\\
&=2\int_1^\infty \sum_{n=1}^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t.
\end{align*}
Therefore the overall integral becomes
\begin{align*}
\phi(s)&=\frac{2}{s}+2\int_0^1\sum_{n=1}^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t+\frac{2}{1-s}+2\int_1^\infty \sum_{n=1}^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t\\
&=2\sum_{n=1}^\infty \int_0^\infty e^{-\pi n^2 t}t^{\frac{s}{2}-1}\mathrm{d}t+\frac{2}{s}+\frac{2}{1-s}
\end{align*}
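where we have recombined the integrals over $[0,1]$ and $[1,\infty)$ into a single integral over $[0,\infty)$ (exchanging the order of summation and integration), and we still have $\mathrm{Re}(s)>1$. Now we can employ equation 12, with $c=\pi n^2$ and $s$ replaced by $s/2$, to find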
\begin{align}
\frac{1}{2}\phi(s)&=\sum_{n=1}^\infty \left(\pi n^2\right)^{-\frac{s}{2}}\Gamma\left(\frac{s}{2}\right)+\frac{1}{s}+\frac{1}{1-s}\nonumber\\
&=\pi^{-\frac{s}{2}}\zeta(s)\Gamma\left(\frac{s}{2}\right)+\frac{1}{s}+\frac{1}{1-s}\nonumber\\
\Rightarrow\zeta(s)&=\frac{\pi^{\frac{s}{2}}}{\Gamma\left(\frac{s}{2}\right)}\left(\frac{1}{2}\phi(s)-\frac{1}{s}-\frac{1}{1-s}\right).
\end{align}
Note that in the second line we have identified the Dirichlet series $D(s)=\sum^\infty_{n=1}1/n^s$ with the zeta function $\zeta(s)$. We can do this because we are still working under the $\mathrm{Re}(s)>1$ condition, and so this is true by definition.

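Now we are on the home stretch. We know from equation 4 that $1/\Gamma(s/2)$ is entire, $\pi^{s/2}$ is clearly entire too, and we constructed $\phi(s)$ by hand so that both of its integrals converge for every complex $s$ (near $t=0$ the integrand vanishes faster than any power of $t$ by equation 11, and as $t\rightarrow\infty$ it decays exponentially), so $\phi(s)$ is entire as well. This means that the only possible poles in equation 13 can come from the $1/s$ and $1/(1-s)$ terms. For the $1/s$ term, we expect a possible pole at $s=0$, but in fact when we consider the whole term, $\pi^{s/2}/\big(s\Gamma(s/2)\big)$, we find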
\begin{align*}
\frac{\pi^{\frac{s}{2}}}{s\Gamma\left(\frac{s}{2}\right)}&=\frac{\pi^{\frac{s}{2}}}{2\frac{s}{2}\Gamma\left(\frac{s}{2}\right)}\\
&=\frac{\pi^{\frac{s}{2}}}{2\Gamma\left(\frac{s}{2}+1\right)}\\
&\overset{s\mapsto 0}{=}\frac{1}{2\Gamma(1)}=\frac{1}{2}
\end{align*}
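where we have again made use of the gamma function's functional equation (eqn. 3). This trick does not work for the $1/(1-s)$ term at $s=1$: there the prefactor is $\pi^{\frac{1}{2}}/\Gamma\left(\frac{1}{2}\right)=1$ (since $\Gamma(1/2)=\sqrt{\pi}$), which is non-zero, so the term is genuinely singular and produces a simple pole of $\zeta(s)$ with residue $1$.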
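The cancellation at $s=0$ and the surviving pole at $s=1$ can be seen numerically with the standard library's `math.gamma`. This is only an illustration at a few sample points, and the helper name `prefactor` is hypothetical.

```python
import math

def prefactor(s):
    """pi^{s/2} / Gamma(s/2), the factor in front of the bracket in equation 13.
    math.gamma raises ValueError at s = 0, -2, -4, ..., so avoid those exact points."""
    return math.pi ** (s / 2) / math.gamma(s / 2)

# Near s = 0: prefactor(s)/s approaches 1/2 from both sides, so there is no pole there
for eps in (1e-3, 1e-6):
    print(eps, prefactor(eps) / eps, prefactor(-eps) / (-eps))

# At s = 1: the prefactor is sqrt(pi)/Gamma(1/2) = 1, which is non-zero, so
# -prefactor(s)/(1 - s) blows up like 1/(s - 1): a simple pole with residue 1
print(prefactor(1.0))
```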

Thus equation 13 gives us an expression for $\zeta(s)$ which we know is equal to the Dirichlet series for $\mathrm{Re}(s)>1$ but is also meromorphic over the complex plane with a simple pole at $s=1$. Therefore, at long last, we have completed the analytic continuation of the Riemann zeta function.
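Equation 13 can be turned directly into a numerical sketch. The version below makes several assumptions of my own: real $s$ only, Simpson's rule with fixed panel counts, the $t\geq 1$ integral cut off at $t=20$, only $n\leq 6$ kept in the theta sums, and $\theta(t)-1/\sqrt{t}$ evaluated through equation 9 as $(\theta(1/t)-1)/\sqrt{t}$. All helper names are hypothetical. Because $\phi(s)$ converges for every $s$, the same code also returns values with $\mathrm{Re}(s)<1$, where the Dirichlet series cannot be used. It cannot be called at $s=0$, $s=1$ or the negative even integers, where individual factors in the formula are singular.

```python
import math

N_TERMS = 6  # exp(-pi n^2 t) for n >= 7 and t >= 1 is below 1e-60, negligible here

def simpson(func, a, b, panels):
    """Composite Simpson's rule; panels must be even."""
    h = (b - a) / panels
    total = func(a) + func(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def theta_minus_one(t):
    # Equation 10: theta(t) - 1 = 2 * sum_{n>=1} exp(-pi n^2 t), used for t >= 1
    return 2.0 * sum(math.exp(-math.pi * n * n * t) for n in range(1, N_TERMS + 1))

def phi(s):
    """Numerical phi(s) for real s, following the split of the integral at t = 1."""
    def lower(t):
        if t == 0.0:
            return 0.0  # the integrand vanishes faster than any power of t here
        # Equation 9: theta(t) - 1/sqrt(t) = (theta(1/t) - 1)/sqrt(t)
        return theta_minus_one(1.0 / t) / math.sqrt(t) * t ** (s / 2 - 1)

    def upper(t):
        return theta_minus_one(t) * t ** (s / 2 - 1)

    # The tail beyond t = 20 is of order exp(-20 pi), so it is dropped
    return simpson(lower, 0.0, 1.0, 4000) + simpson(upper, 1.0, 20.0, 8000)

def zeta(s):
    """Equation 13 for real s (not s = 0, 1 or a negative even integer)."""
    return math.pi ** (s / 2) / math.gamma(s / 2) * (phi(s) / 2 - 1 / s - 1 / (1 - s))

print(zeta(2.0), math.pi ** 2 / 6)  # inside the Dirichlet region
print(zeta(-1.0))                   # outside it; compare with -1/12
```

Evaluating $\theta(t)-1/\sqrt{t}$ by direct subtraction would lose almost all significant digits near $t=0$, which is why the sketch routes it through equation 9.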

Before we finish, one question remains unanswered. If $\zeta(s)$ can be analytically continued, how can we find its values for $\mathrm{Re}(s)<1$, where the Dirichlet series is no longer convergent? The simplest answer to this question is that the zeta function satisfies a functional equation (in fact it satisfies a number of them) which can be used to find the values in the $\mathrm{Re}(s)<1$ region. The seminar that this post is based on gives the proof for the functional equation $\Lambda(s)=\Lambda(1-s)$ where $\Lambda(s):=\pi^{-\frac{s}{2}}\Gamma\left(\frac{s}{2}\right)\zeta(s)$. I strongly encourage the interested reader to attempt to prove this functional equation themselves as an exercise -- all of the required techniques have already been presented in this post, so it's certainly achievable! I hope this post was interesting and useful; thank you for reading along, and best of luck with the exercise!

Notes

$1$. The reasoning behind this may not be obvious, but the convention is chosen this way for a number of good reasons that I won't go into here. Just to give one motivating example though, consider the power series expansion of the exponential function
\begin{equation}
e^x=\sum^\infty_{n=0}\frac{x^n}{n!}.\nonumber
\end{equation}
If $0!$ were undefined, or defined differently, this elegant expression would not be possible, and the same is true of many other, unrelated examples.

$2$. First, let's consider a meromorphic function $m(s)$ with a single pole at $s=s_0$. We can express this function as $m(s)=h(s)/(s-s_0)^n$ for some entire (everywhere holomorphic) function $h(s)$ with $h(s_0)\neq 0$ -- effectively the holomorphic function is 'punctured' by a singularity acting like $1/(s-s_0)^n$ located at $s_0$. The behaviour of the pole is determined by its order $n$, with "simple" poles corresponding to poles of order $n=1$. This negative-power behaviour keeps $m(s)$ complex-differentiable everywhere in a punctured neighbourhood of the pole, while the function blows up at $s_0$ itself. Note the similarity of this definition to the Laurent series (mentioned in Note 2 of my Fourier Fun 2 post), which converges on an annulus (here a punctured disc around $s_0$) and is the appropriate expansion near a pole, rather than the Taylor series we might naively expect from real analysis. In fact the Laurent series $m(s)=\sum_{k=-\infty}^\infty a_k (s-s_0)^k$ about a pole at $s_0$ can be broken into two parts: the "regular part" for $k\geq 0$ and the "principal part" for $k<0$. The point $s_0$ is a pole of order $n$ if and only if the principal part has $a_k=0$ for all $k<-n$ and $a_{-n}\neq 0$. For meromorphic functions with multiple poles, we simply make the replacement $m(s)=h(s)/z(s)$ for appropriately chosen $z(s)$ (appropriately chosen in the sense that the behaviour of the poles is the same as in the single pole case and the poles are located in the correct positions).

$3$. I have omitted the proof that this definition is equivalent to the integral representation in equation 1 because it is somewhat non-trivial and is not particularly relevant to the overall proof. If the reader is unsatisfied with taking the definition for granted and would like to research it themselves, this formula is known as the Euler (infinite product) representation of the gamma function. If they are up for a challenge, they may wish to prove it themselves!

$4$. As in the case of Note 3, proving this identity is challenging and not particularly important for the overall proof, so I am omitting it. Coincidentally, as in the case of Note 3, this infinite product definition of the sine function is also due to Euler. Again, I encourage the interested reader to read more about it or attempt a proof of their own.

$5$. This will be covered in more depth in my upcoming post to round out the Fourier Fun series, Part 3.

$6$. This definition, as well as the definition for the inverse Fourier transform which results from it, is a matter of convention. For example, this definition is unitary and uses spatial frequency $\xi$ rather than angular spatial frequency (wavenumber) $k$, which is arguably more common in a physics context. Neither of these properties is necessary, although the corresponding definitions will be slightly different (e.g., factors of $1/\sqrt{2\pi}$ or $1/2\pi$).

$7$. Define a function $f(x)=\sum^\infty_{m=-\infty}g(x+m)$. As the summation runs over all the integers $m$, the function $f(x)$ must have period $1$ (if this is not clear, consider as an example what $f(0)$ and $f(1)$ would look like if the summation were expanded out term by term -- as the summation is infinite, they must be identical) and therefore can be expressed as a Fourier series: $f(x)=\sum^\infty_{n=-\infty}c_n e^{-2\pi inx}$, where we have used the polar/exponential form rather than sines and cosines, with the sign of the exponent chosen to match our Fourier transform convention. Interestingly, the Fourier coefficients are given by
\begin{align*}
c_n&=\int^1_0 f(x)e^{2\pi inx}\mathrm{d}x\\
&=\int^1_0 \sum^\infty_{m=-\infty}g(x+m) e^{2\pi inx}\mathrm{d}x\\
&=\sum^\infty_{m=-\infty}\int^{m+1}_m g(y) e^{2\pi iny}\mathrm{d}y\\
&=\int^\infty_{-\infty} g(x) e^{2\pi inx}\mathrm{d}x\\
&=\tilde{g}(n)
\end{align*}
where in going from the second to the third line we have taken the integral-of-a-sum as a sum-of-integrals and substituted $y=x+m$ in each term (the exponential is unchanged because $e^{2\pi in(y-m)}=e^{2\pi iny}e^{-2\pi inm}=e^{2\pi iny}$ for integers $n$ and $m$; note the change in the limits of integration), and in the fourth line we have joined the integrals over the unit intervals $[m,m+1]$ into a single integral over the whole real line. Following on from this, we have
\begin{align*}
\sum^\infty_{n=-\infty}g(n)=f(0)&=\sum^\infty_{n=-\infty}c_n e^{-2\pi in\cdot 0}\\
&=\sum^\infty_{n=-\infty}c_n=\sum^\infty_{n=-\infty}\tilde{g}(n)
\end{align*}
thus proving the formula.
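The formula is not specific to Gaussians (given suitable decay and smoothness conditions, which the proof above glosses over). As an illustrative check of my own choosing, not one from the original seminar, the sketch below applies it to $g(x)=e^{-|x|}$, whose transform in this post's convention is the standard result $\tilde{g}(\xi)=2/(1+4\pi^2\xi^2)$. Both sides should equal $\coth(1/2)$. The transform side decays only like $1/k^2$, so it is truncated at a large cut-off and agrees only to roughly $10^{-6}$.

```python
import math

def g(x):
    return math.exp(-abs(x))

def g_tilde(xi):
    # Transform of exp(-|x|) with the exp(+2 pi i xi x) convention used in this post
    return 2.0 / (1.0 + 4.0 * math.pi ** 2 * xi ** 2)

K = 200_000  # cut-off for the slowly decaying transform side (tail about 1/(pi^2 K))
lhs = sum(g(n) for n in range(-60, 61))          # terms beyond |n| = 60 are negligible
rhs = sum(g_tilde(k) for k in range(-K, K + 1))
exact = 1.0 / math.tanh(0.5)                     # coth(1/2), the closed form of the lhs
print(lhs, rhs, exact)
```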

$8$. Furthermore, in going from the first to the second line we have interchanged the order of integration and differentiation. The more rigour-minded readers may be somewhat concerned about this. While it is certainly true that there are functions for which the order of integration and differentiation cannot be interchanged (that is, for some $g(x)=\int f(x,y) \mathrm{d}y$, $\mathrm{d}g/\mathrm{d}x\neq\int\mathrm{d}y\,\partial f/\partial x$), these functions are often "pathological", or at the very least discontinuous in $f$ or $\partial f/\partial x$ over the region defined by the limits of integration. The case is similar for interchanging the order of integration (for multiple integrals) or the order of differentiation (for multiple partial derivatives). We will not have to worry about such concerns in this post, or indeed in virtually any of the posts on this blog, except where otherwise noted.

$9$. This part of the derivation is especially tricky so I want to give it a more solid treatment in the notes. First, we start with the function $v'(x)=xe^{-\pi x^2}$. In order to find $v(x)$ we integrate both sides with respect to $x$ to give
\begin{equation*}
\int\frac{\mathrm{d}v(x)}{\mathrm{d}x}\mathrm{d}x=\int xe^{-\pi x^2}\mathrm{d}x
\end{equation*}
Now we simplify the left-hand side and make the substitution of variables $w=-\pi x^2$, $\mathrm{d}w=-2\pi x\,\mathrm{d}x$ (we use $w$ rather than $u$ here to avoid a clash with $u(x)=e^{2\pi i\xi x}$ from the integration by parts):
\begin{align*}
\int\mathrm{d}v&=\int\frac{e^w}{-2\pi}\mathrm{d}w\\
v&=\frac{e^w}{-2\pi}+c\\
v(x)&=\frac{e^{-\pi x^2}}{-2\pi}+c
\end{align*}
where in the second line we have made use of the fact that the integral of $e^w$ is itself, with $c$ a constant of integration, and in the third line we have substituted $-\pi x^2$ back in for $w$.

As the computation of $u(x)$ is straightforward, we are now in a position to substitute in $u(x)$, $v(x)$, $u'(x)$ and $v'(x)$ into the integration by parts formula to yield
\begin{equation*}
\frac{\mathrm{d}\tilde{f}}{\mathrm{d}\xi}=2\pi i\Bigg(\left[e^{2\pi i\xi x}\left(\frac{e^{-\pi x^2}}{-2\pi}+c\right)\right]_{-\infty}^\infty-\int_{-\infty}^\infty 2\pi i\xi e^{2\pi i\xi x}\left(\frac{e^{-\pi x^2}}{-2\pi}+c\right)\mathrm{d}x\Bigg)
\end{equation*}
Momentary consideration of the first term inside the brackets shows why we set the constant of integration to zero -- the expression $ce^{2\pi i\xi x}|^\infty_{-\infty}$ appears, and since $e^{2\pi i\xi x}$ keeps oscillating without settling to a limit as $x\rightarrow\pm\infty$, it cannot be evaluated sensibly. Because any antiderivative of $v'(x)$ works in the integration by parts formula, we are free to remove this term by choosing $c=0$. (As a side-note, in general this constant of integration does not need to be considered -- for definite integrations by parts over finite limits, its contributions to the boundary term and to the remaining integral always cancel. I have included it here for completeness but will not do so for other cases.) The other part of the first term, $e^{2\pi i\xi x}e^{-\pi x^2}|^\infty_{-\infty}$, is automatically zero because $|e^{2\pi i\xi x}|=1$ while $e^{-\pi x^2}\rightarrow 0$ as $x\rightarrow\pm\infty$.

This only leaves what was the second (integral) term inside the brackets, simplified by taking $c=0$. Since the first term was zero, we can remove the brackets to give
\begin{equation*}
\frac{\mathrm{d}\tilde{f}}{\mathrm{d}\xi}=-2\pi i\int_{-\infty}^\infty 2\pi i\xi e^{2\pi i\xi x}\frac{e^{-\pi x^2}}{-2\pi}\mathrm{d}x
\end{equation*}
After cancelling off factors of $2\pi$ and being very careful with signs, substituting $e^{-\pi x^2}=f(x)$ causes the main result to neatly fall out:
\begin{equation*}
\frac{\mathrm{d}\tilde{f}}{\mathrm{d}\xi}=-2\pi\xi\int_{-\infty}^\infty e^{2\pi i\xi x}f(x)\mathrm{d}x=-2\pi\xi\tilde{f}(\xi)
\end{equation*}

$10$. This differential equation is very easy to solve by hand, in case you can't be bothered memorising the standard solutions to first-order ODEs. I presume that if the reader has made it this far they are reasonably comfortable with them, but I have provided the solution below regardless, perhaps in case of a memory lapse.
\begin{align*}
\frac{\mathrm{d}\tilde{f}(\xi)}{\mathrm{d}\xi}&=-2\pi\xi\tilde{f}(\xi) \\
\frac{\mathrm{d}\tilde{f}(\xi)}{\mathrm{d}\xi}\frac{1}{\tilde{f}(\xi)}&=-2\pi\xi \\
\int\frac{\mathrm{d}\tilde{f}(\xi)}{\mathrm{d}\xi}\frac{1}{\tilde{f}(\xi)}\mathrm{d}\xi&=-2\pi\int\xi\mathrm{d}\xi \\
\int\frac{\mathrm{d}\tilde{f}(\xi)}{\tilde{f}(\xi)}&=-2\pi\int\xi\mathrm{d}\xi \\
\log_e(\tilde{f}(\xi))+c_1&=-\pi\xi^2+c_2 \\
\log_e(\tilde{f}(\xi))&=-\pi\xi^2+c_3 \\
\tilde{f}(\xi)&=Ce^{-\pi\xi^2}
\end{align*}
where $c_1$ and $c_2$ are constants of integration, $c_3=c_2-c_1$ and $C=e^{c_3}$.

$11$. Similar to the case of the Bessel functions, there are also three modified versions of the theta function which are known as auxiliary theta functions. When considered in this context, the theta function as described here is referred to as $\vartheta_{00}(z;\tau)$ to distinguish it from the three auxiliary theta functions $\vartheta_{01}$, $\vartheta_{10}$ and $\vartheta_{11}$. Additionally, when the theta function and the three auxiliary theta functions are written as functions of the nome $q=e^{i\pi\tau}$ instead of $\tau$, they are labelled $\theta_1,\ldots,\theta_4$, in which case the theta function in this proof is $\theta_3$. These are only notational subtleties, however, which is why I do not raise them in the main body. I only mention them here to avoid confusion for any interested readers who may wish to find out more about the topic elsewhere.

$12$. Note also that this implies that $\mathcal{M}\{e^{-t}\}=\Gamma(s)$. Furthermore, this method can be used to prove one of the elementary properties of the Mellin transform, that $\mathcal{M}\{f(ct)\}=c^{-s}\hat{f}(s)$ for $c>0$. As this is the generalised form of what we showed, we could also have assumed it from the start to gain the result immediately. Interestingly, this generalised form can also be proven using another related integral transform, the two-sided Laplace transform.