26 June 2014

Fourier fun 1: Sums and square waves

Periodic functions

Please note that this post will assume a familiarity with summation notation. A brief explanation is given in my post on divergent series, which can be found here.

What is a Fourier series? In the simplest of terms, a Fourier series is a method of re-expressing (or approximating) an arbitrary periodic function, that is, a function that repeats itself over and over again (the length of the repeated interval is known as the period), as a sum of simple sine and cosine waves. This might sound very abstract, but it is an extremely powerful tool that shows up in a great many fields, a couple of examples of which will be given later on. So how does it work? Let me show you.

In some sense, the most basic periodic function$^1$ is the sine function, $\sin$, although equally we could say it is the cosine function, $\cos$, which is the same as the sine function but shifted a quarter of a period across (see fig. 1). The natural expression of sine, $\sin(t)$, has a period of $2\pi$, or mathematically $\sin(t+2\pi)=\sin(t)$, but the function can be stretched or compressed to have an arbitrary period $T$ such that $\sin\left(\frac{2\pi}{T}(t+T)\right)=\sin\left(\frac{2\pi}{T}t\right)$. Note that for the specific case of $T=2\pi$ this reverts to the "natural" expression, as expected. With this in mind, we can define a cosine with period $T$ in terms of sine as $\cos\left(\frac{2\pi}{T}t\right)=\sin\left(\frac{2\pi}{T}\left(t+\frac{T}{4}\right)\right)$.
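As a quick numerical sanity check of the two identities $\sin\left(\frac{2\pi}{T}(t+T)\right)=\sin\left(\frac{2\pi}{T}t\right)$ and $\cos\left(\frac{2\pi}{T}t\right)=\sin\left(\frac{2\pi}{T}\left(t+\frac{T}{4}\right)\right)$, the following Python sketch samples $t$ over a few periods and reports the largest deviation from either identity for several choices of $T$. The helper name `max_identity_error` and the sampling scheme are my own illustrative choices.

```python
import math

def max_identity_error(T, samples=200):
    """Largest deviation from the periodicity and quarter-period-shift identities
    for a sine of period T, sampled over t in [-T, 2T)."""
    w = 2 * math.pi / T  # angular frequency giving period T
    worst = 0.0
    for i in range(samples):
        t = -T + 3 * T * i / samples
        # Periodicity: sin(w (t + T)) should equal sin(w t).
        worst = max(worst, abs(math.sin(w * (t + T)) - math.sin(w * t)))
        # Quarter-period shift: cos(w t) should equal sin(w (t + T/4)).
        worst = max(worst, abs(math.cos(w * t) - math.sin(w * (t + T / 4))))
    return worst

for T in (2 * math.pi, 1.0, 5.0):
    print(f"T = {T:.4f}: max error = {max_identity_error(T):.2e}")
```

The reported errors should sit at floating-point rounding level for every $T$.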


Figure 1: $\sin(t)$ is shown in blue and $\cos(t)$ is shown in red. In this plot (and all
others unless otherwise noted), the $t$-axis is shown in units of $\pi$ so that
$1.0$ corresponds to $\pi$, $2.0$ corresponds to $2\pi$, and so on. The fact that the
cosine function is related to the sine function by a quarter-period shift in the
negative direction is clearly demonstrated (recall that for $\sin(t)$ and $\cos(t)$ the
period $T$ is $2\pi$, so a quarter period shift is given by $T/4=2\pi/4=\pi/2$ or
$0.5$ in units of $\pi$ as shown on the plot).

This is all well and good, but there are many other periodic functions we might be interested in besides sine and cosine. One simple example is the square wave $S(t)$, which is the signal you get when, for example, you switch a DC power source on and off (and on and off and so on) at regular intervals. This is an example of a digital signal, and mathematically we would represent it as a function that oscillates evenly between $1$ and $-1$ over every period of length $T$ (see fig. 2)$^2$. What we will make special note of, however, is that the square wave looks very much like a flattened, widened $\sin(t)$ (assuming $S(t)$ is defined such that $T=2\pi$). This raises the question of whether, and how, we might get from a sine wave to a square wave.


Figure 2: $S(t)$ is shown in blue and $\sin(t)$ is shown in red. In this plot,
the similarity between the square wave (as we have defined it) and the
sine wave is apparent.

If we think additively, to get to $S(t)$ from $\sin(t)$ we need to add to the function near the zeroes (but not at them!) to amplify those regions, and subtract from the function near the peaks to flatten them out. What's more, whatever function we add to $\sin(t)$ will also have to be periodic, with a period that fits into $2\pi$ a whole number of times (that is, a period of $2\pi/n$ for some positive integer $n$), so that the alterations we make occur at the right places in every period (so that, for example, we don't add anything to our zero points). It just so happens that we find an excellent candidate in $\sin(3t)$: if we scale $\sin(t)$ and $\sin(3t)$ just right and then add them together, we get a little bit closer to $S(t)$. Repeating the process, we find we get closer still if we add an appropriately scaled $\sin(5t)$, and closer again by adding more and more similar sine functions (see fig. 3). Only the odd multiples $\sin(3t)$, $\sin(5t)$, $\sin(7t)$, and so on are useful here: like the square wave, each of them is symmetric about $t=\pi/2$ over the interval from $0$ to $\pi$, whereas even multiples such as $\sin(2t)$ are antisymmetric about that point and would distort the shape.
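To make "getting closer" concrete, here is a small Python sketch that measures the mean squared difference between $S(t)$ and sums of the first few odd sine terms over one period. It borrows the amplitudes $b_n=4/(\pi n)$ that are derived later in this post, and averages with a simple midpoint rule; the function names are hypothetical. The printed errors should shrink each time another odd harmonic is added.

```python
import math

def square_wave(t):
    """S(t) with period 2*pi: +1 on (0, pi) and -1 on (pi, 2*pi)."""
    return 1.0 if t % (2 * math.pi) < math.pi else -1.0

def partial_sum(t, terms):
    """Sum of the first `terms` odd harmonics, using b_n = 4/(pi n)."""
    return sum(4 / (math.pi * n) * math.sin(n * t) for n in range(1, 2 * terms, 2))

def mean_squared_error(terms, samples=2000):
    """Midpoint-rule average of (S(t) - partial sum)^2 over one period."""
    h = 2 * math.pi / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * h
        total += (square_wave(t) - partial_sum(t, terms)) ** 2
    return total / samples

for terms in range(1, 6):
    print(terms, "terms:", round(mean_squared_error(terms), 4))
```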


Figure 3: For appropriately chosen amplitudes, the square wave $S(t)$ (shown in blue in the left column)
can be approximated by a sum of sine functions $\sin(nt)$ (shown in red in the left column). The right
column shows the sine functions used for each summation to the immediate left (the Fourier components).
The summations used (a) two components; (b) three components; (c) four components; and (d) five
components.

The square wave

As adding more and more sine functions improves our approximation to the square wave, we can reasonably assume$^3$ that as the number of sine functions we add approaches infinity, the approximation to the square wave becomes arbitrarily accurate. As we noted in the previous section, however, we are only interested in sine functions $\sin(nt)$ with odd values of $n$. Thus we can write our definition of the square wave as
\begin{equation} \label{eq:square1} S(t)=\sum_{n\mathrm{ odd}}b_n\sin(nt), \end{equation}
or equivalently,
\begin{equation} \label{eq:square2} S(t)=\sum^\infty_{m=1}b_m\sin\big((2m-1)t\big), \end{equation}
where in the second equation we have just replaced the positive odd numbers $n$ with the positive integers $m$ such that $2m-1=n$ is still always odd. The terms $b_n$ are the Fourier coefficients (think scaling factors or amplitudes) corresponding to each sine function $\sin(nt)$ such that the infinite sum can be expanded as $b_1\sin(t)+b_3\sin(3t)+b_5\sin(5t)+\ldots$.

Equations \ref{eq:square1} and \ref{eq:square2} are both examples of Fourier series. However, they are not very useful until we have found the scaling factors $b_n$. How do we find them? It turns out that's a little more complicated. The derivation of these terms involves a little bit of calculus and so is probably a little beyond the scope of this post, but I strongly encourage the interested reader to turn to the notes, where I have given a sketched-out version.$^4$ The result we find, however, is that
\begin{equation} \label{eq:b} b_n = \frac{2}{T}\int_{t_0}^{t_0+T} \! f(t)\sin\left(\frac{2\pi nt}{T}\right) \, \mathrm{d}t \end{equation}
for the general case of a periodic function $f(t)$ with period $T$, where $t_0$ is any arbitrary $t$-value of our choosing. Note that as this applies to the general case, the $n$ in this expression could be odd or even.
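The formula for $b_n$ above translates directly into a numerical recipe. The Python sketch below approximates $b_n$ for any periodic function `f` of period `T` using the midpoint rule (the name `sine_coefficient` and the discretisation are my own choices, not part of the original derivation). The usage lines test it on $0.5\sin(2\pi t/T)+2\sin(6\pi t/T)$, whose coefficients are known to be $b_1=0.5$, $b_3=2$ and zero otherwise, for two different starting points $t_0$.

```python
import math

def sine_coefficient(f, n, T, t0=0.0, steps=20000):
    """Approximate b_n = (2/T) * integral from t0 to t0+T of f(t) sin(2 pi n t / T) dt
    using the midpoint rule with `steps` subintervals."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * h
        total += f(t) * math.sin(2 * math.pi * n * t / T)
    return 2.0 / T * total * h

T = 3.0

def f(t):
    # Test function with known coefficients: b_1 = 0.5, b_3 = 2, all others 0.
    return 0.5 * math.sin(2 * math.pi * t / T) + 2.0 * math.sin(2 * math.pi * 3 * t / T)

for t0 in (0.0, -1.2):
    print(t0, [round(sine_coefficient(f, n, T, t0), 6) for n in range(1, 5)])
```

Both rows should agree, reflecting the fact that the choice of $t_0$ does not matter.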

Let us now calculate the $b_n$ terms for the square wave [A warning: this section requires a little calculus. It isn't as full-on as that required for the derivation which is why I have decided to include it here. If you have vague memories of calculus from high school you should be able to make it through with some care, but do not worry if you have some trouble or you can't follow it]. The first thing we must note is that both $S(t)$ and $\sin(2\pi nt/T)$ are odd functions, that is, they look upside-down when reflected across the $y$-axis (this is expressed mathematically as $f(-t)=-f(t)$). The product of two odd functions is an even function, that is, a function that looks the same when reflected across the $y$-axis (mathematically, $f(-t)=f(t)$; the cosine function is an example of an even function). That means that the product $S(t)\sin(2\pi nt/T)$ must be an even function. We can use this fact to our advantage! When we evaluate our integral from equation \ref{eq:b} we can choose $t_0=-T/2$ so that the integral ranges from $-T/2$ to $T/2$. Because the integrand is even, and therefore symmetric about $t=0$, the integral from $-T/2$ to $T/2$ is double the integral from $0$ to $T/2$: $\int_{-T/2}^{T/2}S(t)\sin(2\pi nt/T)\mathrm{d}t=2\int_{0}^{T/2}S(t)\sin(2\pi nt/T)\mathrm{d}t$ (recall that the integral calculates the area under the function). Using this fact and substituting in $T=2\pi$, equation \ref{eq:b} becomes
\begin{align} b_n &= \frac{4}{T}\int_{0}^{T/2} \! S(t)\sin\left(\frac{2\pi nt}{T}\right) \, \mathrm{d}t \nonumber \\ &= \frac{2}{\pi}\int_{0}^{\pi} \! S(t)\sin(nt) \, \mathrm{d}t. \end{align}
From fig. 2 we can see that $S(t)=1$ between $t=0$ and $t=\pi$, so the equation becomes simple to solve:
\begin{align*}b_n &= \frac{2}{\pi}\int_{0}^{\pi} \! \sin(nt) \, \mathrm{d}t \\&= \frac{2}{\pi}\bigg[-\frac{\cos(nt)}{n}\bigg]_0^\pi \\&= -\frac{2}{\pi n}\big(\cos(\pi n)-\cos(0)\big).\end{align*}
As $\cos(0)=1$ and $\cos(\pi n)=(-1)^n$, which is $-1$ when $n$ is odd and $+1$ when $n$ is even, we find that
\[b_n=\begin{cases}\frac{4}{\pi n} & \text{when } n \text{ is odd} \\0 & \text{when } n \text{ is even}\end{cases}\]
Thus, as expected, only odd $n$ terms contribute to the sum -- a very neat result to fall out! Now that we know the Fourier coefficients we know the Fourier series for the square wave (varying from $-1$ to $1$ with period $T=2\pi$) in full:
\begin{equation} S(t)=\frac{4}{\pi}\sum_{n\textrm{ odd}} \frac{1}{n} \sin(nt). \end{equation}
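Both halves of this result can be checked numerically. In the Python sketch below (function names are hypothetical, and midpoint-rule integration is my own choice), `b_numeric` evaluates the half-range integral $\frac{2}{\pi}\int_0^\pi\sin(nt)\,\mathrm{d}t$ for comparison with the closed form, while `square_wave_partial_sum` adds up the first few odd terms of the series. At $t=\pi/2$, where $S(t)=1$, the partial sums should creep towards $1$; at $t=0$, where $S(t)$ jumps from $-1$ to $1$, every partial sum is exactly $0$, the midpoint of the jump.

```python
import math

def b_numeric(n, steps=10000):
    """b_n for the square wave: (2/pi) * integral from 0 to pi of sin(n t) dt,
    using S(t) = 1 on (0, pi) and the midpoint rule."""
    h = math.pi / steps
    return 2 / math.pi * h * sum(math.sin(n * (i + 0.5) * h) for i in range(steps))

def b_closed_form(n):
    """4/(pi n) for odd n and 0 for even n."""
    return 4 / (math.pi * n) if n % 2 == 1 else 0.0

def square_wave_partial_sum(t, terms):
    """(4/pi) * sum of sin(n t)/n over the first `terms` odd values of n."""
    return 4 / math.pi * sum(math.sin(n * t) / n for n in range(1, 2 * terms, 2))

for n in range(1, 8):
    print(n, round(b_numeric(n), 6), round(b_closed_form(n), 6))

for terms in (1, 10, 100, 1000):
    print(terms, square_wave_partial_sum(math.pi / 2, terms), square_wave_partial_sum(0.0, terms))
```

At $t=\pi/2$ the odd terms alternate in sign, since $\sin(n\pi/2)=\pm1$, so the series becomes $\frac{4}{\pi}\left(1-\frac{1}{3}+\frac{1}{5}-\ldots\right)=\frac{4}{\pi}\cdot\frac{\pi}{4}=1$ by the Leibniz series, which converges rather slowly.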

This is the first part in a 3-part series on some of the basics of Fourier analysis. Part 2 can be found here.

Notes

$1$. Here "most basic" refers both to the fact that sine and cosine are typically the only periodic functions taught in school and to the more fundamental significance of their relationship to the unit circle.

$2$. Of course, the amplitude of the square wave is only a matter of convention (likewise the period); my preferred representation varies between $0$ and $1$ instead of $-1$ and $1$, but the transformation from the main body text version to my preference is simple: $S(t)\mapsto(S(t)+1)/2$. In the body text I used the $-1$ to $1$ form only because it is more immediately relatable to $\sin(t)$ and therefore, I hope, the Fourier sum becomes less conceptually difficult.

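$3$. Of course, the convergence of Fourier series has been proven rigorously (under suitable conditions on the function), but I leave the investigation of that topic to the interested reader. For the examples shown in this post, convergence can be inferred reasonably safely from the appearance of an ever-improving approximation, except notably near discontinuities such as the jumps in the square wave: there the partial sums overshoot (the so-called Gibbs phenomenon) and, at the jump itself, converge to the midpoint of the two values.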

$4$. Suppose we have a function $f(t)$ that can be expressed purely as a sum of sine functions, similar to the case of $S(t)$. For this arbitrary periodic function we then have \begin{equation*}f(t)=b_1\sin(\omega_1t)+b_2\sin(\omega_2t)+b_3\sin(\omega_3t)+\ldots,\end{equation*} where $\omega_n=2\pi n/T$. Multiplying both sides by $\sin(\omega_nt)$ gives
\begin{align*}f(t)\sin(\omega_nt)&=(b_1\sin(\omega_1t)+b_2\sin(\omega_2t)+b_3\sin(\omega_3t)+\ldots)\sin(\omega_nt) \\ &=b_1\sin(\omega_1t)\sin(\omega_nt)+b_2\sin(\omega_2t)\sin(\omega_nt)+b_3\sin(\omega_3t)\sin(\omega_nt)+\ldots\end{align*}
We then wish to integrate both sides over one period $T$, giving
\begin{equation*} \int_{t_0}^{t_0+T} \! f(t)\sin(\omega_nt) \, \mathrm{d}t=\int_{t_0}^{t_0+T} \! \left(b_1\sin(\omega_1t)\sin(\omega_nt)+b_2\sin(\omega_2t)\sin(\omega_nt)+\ldots\right) \, \mathrm{d}t\end{equation*}
where $t_0$ is any arbitrary $t$-value we wish to choose to begin our integration at (because the function is periodic and the integration is over a period, the choice of starting point is not important).
We now consider the trigonometric identity $\sin(\theta)\sin(\phi)=(\cos(\theta-\phi)-\cos(\theta+\phi))/2$. Using $\omega_nt$ and $\omega_mt$ in place of $\theta$ and $\phi$ respectively, we can integrate over a period beginning at $t_0=0$ (without loss of generality) to give
\begin{align*}\int_{0}^{T} \! \sin(\omega_nt)\sin(\omega_mt) \, \mathrm{d}t &= \int_{0}^{T} \! \sin\left(\frac{2\pi nt}{T}\right)\sin\left(\frac{2\pi mt}{T}\right) \, \mathrm{d}t \\ &= \frac{1}{2} \int_{0}^{T} \! \cos\left(\frac{2\pi (n-m)t}{T}\right)-\cos\left(\frac{2\pi (n+m)t}{T}\right) \, \mathrm{d}t.\end{align*}
We now consider two cases:
$1) \: n=m$
\begin{align*}\int_{0}^{T} \! \sin^2(\omega_nt) \, \mathrm{d}t &= \frac{1}{2}\int_{0}^{T} \! 1-\cos\left(\frac{4\pi nt}{T}\right) \, \mathrm{d}t \\ &= \frac{1}{2}\left[t-\frac{T}{4\pi n}\sin\left(\frac{4\pi nt}{T}\right)\right]^T_0 \\ &= \frac{T}{2}-\frac{T}{8\pi n}\sin(4\pi n) = \frac{T}{2} \end{align*}
as $n$ is a non-zero integer and so $\sin(4\pi n)=0$ for any $n$.
$2) \: n\neq m$
\begin{align*} \int_{0}^{T} \! \sin(\omega_nt)\sin(\omega_mt) \, \mathrm{d}t &= \frac{1}{2}\int_{0}^{T} \! \cos\left(\frac{2\pi (n-m)t}{T}\right)-\cos\left(\frac{2\pi (n+m)t}{T}\right) \, \mathrm{d}t \\ &= \frac{1}{2}\left[\frac{T}{2\pi (n-m)}\sin\left(\frac{2\pi (n-m)t}{T}\right)\right. \\ &\qquad \left.-\frac{T}{2\pi (n+m)}\sin\left(\frac{2\pi (n+m)t}{T}\right)\right]^T_0 \\ &= \frac{T}{4\pi}\left(\frac{1}{(n-m)}\sin(2\pi (n-m))-\frac{1}{(n+m)}\sin(2\pi (n+m))\right) \\ &=0 \end{align*}
as $n$ and $m$ are distinct positive integers, so that $n-m$ and $n+m$ are non-zero integers and $\sin(2\pi (n\pm m))=0$.
These two cases can be summarised as $\int_{0}^{T} \! \sin(\omega_nt)\sin(\omega_mt) \, \mathrm{d}t = T\delta_{nm}/2$ where $\delta_{nm}$ is the Kronecker delta, which gives $1$ when $n=m$ and $0$ otherwise. This property is known as 'orthogonality', and the set of sine functions of the form $\sin(\omega_nt)$ provides an example of such orthogonal functions.
We can use this orthogonality result to evaluate the integral:
\begin{align*} \int_{t_0}^{t_0+T} \! f(t)\sin(\omega_nt) \, \mathrm{d}t &= \int_{t_0}^{t_0+T} \! \left(b_1\sin(\omega_1t)\sin(\omega_nt)+b_2\sin(\omega_2t)\sin(\omega_nt)+\ldots\right) \, \mathrm{d}t \\ &=\int_{t_0}^{t_0+T} \! b_n\sin^2(\omega_nt)  \, \mathrm{d}t \\ &= b_n\frac{T}{2} \\ \Rightarrow b_n &= \frac{2}{T}\int_{t_0}^{t_0+T} \! f(t)\sin(\omega_nt) \, \mathrm{d}t \end{align*}
where in going to the second line we have used the fact that $\int_{t_0}^{t_0+T} \! b_m\sin(\omega_mt)\sin(\omega_nt)  \, \mathrm{d}t=0$ for all $m\neq n$, leaving only the $m=n$ term, whose value (from case 1) is used in going to the third line.
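The orthogonality relation $\int_0^T\sin(\omega_nt)\sin(\omega_mt)\,\mathrm{d}t=T\delta_{nm}/2$ from Note 4 can also be checked numerically. This Python sketch approximates the integral with the midpoint rule for small $n$ and $m$ and an arbitrarily chosen period $T=2.5$ (the function name and parameters are illustrative assumptions); it should print values close to $T/2=1.25$ on the diagonal and close to $0$ elsewhere.

```python
import math

def sine_overlap(n, m, T, steps=4096):
    """Midpoint-rule approximation of the integral from 0 to T of
    sin(omega_n t) * sin(omega_m t) dt, with omega_k = 2 pi k / T."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.sin(2 * math.pi * n * t / T) * math.sin(2 * math.pi * m * t / T)
    return total * h

T = 2.5
for n in range(1, 5):
    print([round(sine_overlap(n, m, T), 6) for m in range(1, 5)])
```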

16 June 2014

News (2014/06/22)

This news post will be a short one, because I think the news should be pretty obvious -- I've started a new blog! Background Independence will be a home for all of my writings on physics, mathematics, science communication, education, and everything in that vein. With some luck I'll be updating it semi-regularly, so you should have good reason to keep track!

An introduction to the blog can be found on the About page, although I'll be running it in much the same way as Philosophia Mea (which I won't be abandoning completely!) so long-time readers won't need much catching up. One small note: I discourage readers from viewing the blog on mobile devices (not including tablets) if they can avoid it. In order to make the mathematics render on mobile I've had to use a mobile template which is not especially user-friendly, and strongly resembles the standard version. Everything is still there, but for the time being you might have to do a lot of reading zoomed-in, with lots of side-scrolling.

In physics news, the controversy regarding the BICEP2 results rages on, with an altered version of their initial pre-print paper being published in Physical Review Letters, a premier physics journal. How significant the alterations are is a matter of perspective, but certainly questions about the accuracy of the group's conclusions are yet to be answered, and likely won't be at least until the Planck group releases their data, which is expected to be more precise, around October.

In somewhat less science-y news, a couple of weeks ago was a rare full moon Friday 13th, which you would think would be off the bad luck charts. I'm not superstitious myself, and as far as I can remember my day went by fairly unremarkably. It was my intention to write a belated post (tongue firmly in cheek) on ghost fields in particle physics to mark the occasion, but unfortunately life has gotten in the way and I haven't had the time, nor likely will I any time soon, so that Wikipedia article will have to do for the time being.

Apart from that, I'll be restarting work on the Fourier post in earnest soon and expect to have it up, in full or in part, within the next couple of weeks, so keep an eye out!

News (2014/05/29)

First of all, I must apologise for the fact that I haven't been updating my blog as much as I'd intended to over the past few weeks. As often happens, real life has gotten in the way, including (for example) my graduation. I can now officially carry the post-nominal letters BScAdv(Hons), although anyone concerned for my ever-inflating ego will no doubt be glad to hear that I have very little intention of doing so.

In terms of the blog, I'm currently working on a post on Fourier analysis and the following post will likely be on the physics of mechanical flight, by popular request. At the current rate I'm going that might not be for a while though!

In physics news, since my last news post some rumours have arisen that the BICEP2 results may not be as sound as initially claimed. Basically, Adam Falkowski has claimed that the BICEP2 collaboration miscalculated the galactic foreground radiation by misinterpreting an image on a Planck collaboration slide, and that most of the signal they attributed to primordial polarisation can be accounted for by this error (a claim that the BICEP2 team strongly denies). A sceptical take on the rumour is provided by Sesh Nadathur, who argues that the issue has been blown far out of proportion. It will be interesting to see how things unfold in the coming months!

Why Heisenberg uncertainty is not that weird

Whenever quantum mechanics (QM) is brought up in a popular context, whether scientific or pseudo-scientific, its 'weirdness' is almost always mentioned, and the Heisenberg uncertainty principle$^1$ is the usual go-to example of that weirdness (although in pseudo-scientific contexts it is almost always misrepresented).

So what is Heisenberg uncertainty? Simply put, it is a restriction on the accuracy of simultaneous measurements of 'observables' (measurable quantities). The prototypical example is position and momentum; Heisenberg uncertainty states that the position and momentum of a particle$^2$ cannot be known simultaneously to arbitrary precision. The mathematical statement of the position-momentum uncertainty principle is
\begin{equation}
\Delta x\Delta p\geq\frac{\hbar}{2}
\end{equation}
where $\hbar$ is the reduced Planck constant and $\Delta x$, $\Delta p$ represent (in some sense)$^3$ the uncertainty in $x$ and $p$ respectively. Effectively, the better you know position, the less well you know momentum, and vice versa. This is not merely an experimental limitation, but a fundamental theoretical one. On first viewing, this sort of restriction does indeed seem very strange and certainly counter-intuitive. I will attempt to convince you that, far from being strange, it is exactly what we should expect.

What is important to remember about QM is that wave mechanics is a central theme. Particles are represented by wave-functions, which are complex solutions to the Schrödinger equation,$^4$ and this wave-nature contributes to a good deal of the quantum weirdness we are familiar with (an example is shown in a recent blog post of mine which relies on superposition and destructive interference of photon waves). With this in mind, let's take a look at some wave-functions.

For illustrative purposes, we will work in one spatial dimension and free space (zero potential everywhere) and only consider time-independent wave-functions. The simplest example of such a wave-function is the plane wave, which takes the form
\begin{equation}
\psi(x)=Ae^{ikx}\equiv Ae^{ipx/\hbar}.
\end{equation}
Here $A$ is the complex-valued amplitude. The amplitude in this case is not important, because any wave-function must be normalisable (as the probability distribution function, which must integrate to 1 over all space to preserve conservation of probability, is given by $|\psi(x)|^2$, known as the Born rule) and so $A$ will need to be scaled anyway. For those who are unfamiliar with complex exponential form, the waviness is more explicit in the less compact form $\exp{(ikx)}\equiv\cos{(kx)}+i\sin{(kx)}$. The $k$ is the wave-number (or wave-vector in higher dimensions), and this term appears naturally in most of the mathematics I'm presenting in this post. For this reason I will include the version of the equations with $k$ alongside the version with the more physically immediate momentum $p$ (the conversion is simply $p=\hbar k$).
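To make the last two points concrete, here is a short Python check that $e^{ikx}=\cos(kx)+i\sin(kx)$, and that the Born-rule density $|\psi(x)|^2=|Ae^{ikx}|^2$ of a plane wave is the same constant, $|A|^2$, at every $x$. It is only a sketch: the wave-number, amplitude and sample points are arbitrary choices of mine.

```python
import cmath
import math

k = 3.0           # wave-number (arbitrary choice)
A = 0.6 - 0.8j    # complex amplitude (arbitrary choice, |A| = 1)
xs = [0.1 * i for i in range(-50, 51)]

def plane_wave(x):
    """psi(x) = A e^{ikx}."""
    return A * cmath.exp(1j * k * x)

# Euler's formula: e^{ikx} = cos(kx) + i sin(kx)
euler_error = max(abs(cmath.exp(1j * k * x) - complex(math.cos(k * x), math.sin(k * x)))
                  for x in xs)

# Born rule: the probability density |psi(x)|^2 does not depend on x
densities = [abs(plane_wave(x)) ** 2 for x in xs]

print("max Euler-formula error:", euler_error)
print("density range:", min(densities), max(densities), "expected:", abs(A) ** 2)
```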

Because the Schrödinger equation is linear, sums of solutions are themselves solutions (this property is known as the superposition principle). That means we can have wave-functions of the form
\begin{equation}
\psi(x)=\sum_{m=0}^{n}A_me^{ip_mx/\hbar}
\end{equation}
for any arbitrary $n$ (finite or infinite). Here the overall scaling of the $A_m$ values is still not important because of normalisation, but their relative magnitudes are, as these determine the relative probability weightings according to the Born rule. However, a sum like this can only ever include a discrete set of modes. If, as in a Fourier series, the momenta are taken to be integer multiples of some fundamental momentum, $p_m=mp_1$, then the corresponding wavelengths $\lambda_m=2\pi/k_m\equiv2\pi\hbar/p_m$ are restricted to those that fit a whole number of times into a single fixed length, and all other modes are left out. In free space all modes are permissible, and so we can take the continuum limit (let the sum turn into an integral):
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{\psi}(k) e^{ikx}\mathrm{d}k\equiv\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\tilde{\psi}(p) e^{ipx/\hbar}\mathrm{d}p.
\end{equation}
Because we have moved away from integer indexing, the discrete set of amplitudes $A_m$ is replaced by the continuous function that is suggestively labelled $\tilde{\psi}$. The function ranges over $p$ because we are integrating over all possible modes/wavelengths/momenta; in a sense $p$ takes over the role of index in the integral from $m$ in the summation. The factor of $1/\sqrt{2\pi}$ is a matter of convention and the factor of $1/\sqrt{\hbar}$ comes from the change of variables from $k$ to $p$.

There is more to $\tilde{\psi}$ than meets the eye. Not only is it the amplitude function for the integral, but it's actually the wave-function itself, except not in physical space like $\psi$ but in momentum space.$^5$ In the context of QM this is known as the momentum space representation of the wave-function, but more broadly the mathematical construct is known as the Fourier transform,$^6$ and Fourier transforms occur very frequently in all manner of physical theories involving waves, be they QM, acoustics, optics, crystallography, signal analysis and so on.

So how do we determine the form of $\tilde{\psi}$? As it turns out, perhaps unsurprisingly, the Fourier transform is invertible, and so we find that
\begin{equation}
\tilde{\psi}(p)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\psi(x) e^{-ipx/\hbar}\mathrm{d}x.
\end{equation}
All well and good, but how do we make sense of it? Well, let's consider a limiting case. We can select out a single mode by using a Dirac delta such that $\tilde{\psi}(p)=\delta(p-p_0)$. The Dirac delta is zero everywhere except at $p=p_0$, where it is infinitely peaked (strictly, it is not an ordinary function at all), while the total area under it is normalised to $1$. Inserting the delta into equation (4) yields$^7$
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\delta(p-p_0)e^{ipx/\hbar}\mathrm{d}p=\frac{e^{ip_0x/\hbar}}{\sqrt{2\pi\hbar}},
\end{equation}
which is, outside a numerical factor, effectively a complex exponential in $x$, or equivalently, a flat (complex-valued) wave across all space (applying the Born rule to a plane wave yields a probability distribution of $|e^{iz}|^2=\text{const.}$ for real $z$). So when the momentum is maximally well-defined (put into a single mode) the position is maximally poorly-defined (the wave-function is spread evenly across all space). By the invertibility of the Fourier transform, we can expect the converse to hold as well (a maximally well-defined position, i.e., a Dirac delta in $x$, will result in a maximally poorly-defined momentum, i.e., an even wave across all momentum space). This relationship lies at the heart of the Heisenberg uncertainty principle.



In this figure, the blue curve is the probability distribution for
a Gaussian wave-packet and the red curve is the Fourier transform
of the blue curve. All sub-figures are to the same (arbitrary) scale.
(i) The blue curve is given by $|\exp{(-x^2+ix)}|^2$. (ii) The blue
curve is given by $|\exp{(-(2x)^2+i(2x))}|^2$. (iii) The blue curve
is given by $|\exp{(-(4x)^2+i(4x))}|^2$. Clearly, as
the width of the blue curve is decreased, the width of its Fourier
transform correspondingly increases. 


It is easy to see why (physically speaking) a Dirac delta in momentum space produces a plane wave in position space, but perhaps not so for the other way around. One way to think of it is by considering the example of a Gaussian wave-packet in position space, which takes the Gaussian distribution as its probability distribution, making it somewhat localised (it's non-zero across all space, but asymptotes to zero approaching $\pm\infty$). It is not hard to show that the Fourier transform of a Gaussian wave-packet is also a Gaussian wave-packet. As we decrease the width of the position space wave-packet, we expect to require higher-momentum modes, as these have shorter wavelengths and thus allow for more cancellation closer to the peak; the momentum space wave-packet must therefore correspondingly increase in width. Taking the limiting case, as the position space wave-packet becomes increasingly peaked the momentum space wave-packet must become increasingly spread-out, leading to the Dirac delta-plane wave correspondence we expected. This argument can also be used in reverse to arrive at the more intuitive result of a Dirac delta in momentum space leading to a plane wave in position space.
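The width trade-off can be checked numerically. The Python sketch below takes $\psi(x)=e^{-x^2/4\sigma^2}$, whose probability distribution $|\psi|^2$ has standard deviation $\sigma$, computes $\tilde{\psi}(k)$ from a discretised version of the Fourier integral, and measures the standard deviation $\sigma_k$ of $|\tilde{\psi}(k)|^2$. The helper names and grid sizes are my own assumptions, and for simplicity it uses a real Gaussian without the phase factor from the figure (that phase shifts the momentum distribution but does not change its width). For a Gaussian, the product $\sigma_x\sigma_k$ should come out as $1/2$ whatever the width, which with $p=\hbar k$ is $\sigma_x\sigma_p=\hbar/2$, the smallest value Heisenberg uncertainty allows.

```python
import cmath
import math

def weighted_std(values, weights):
    """Standard deviation of `values` under the (unnormalised) weights."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum((v - mean) ** 2 * w for v, w in zip(values, weights)) / total
    return math.sqrt(var)

def gaussian_widths(sigma, n_x=401, n_k=201):
    """Return (sigma_x, sigma_k) for psi(x) = exp(-x^2 / (4 sigma^2))."""
    x_max = 8 * sigma                      # covers essentially all of |psi|^2
    dx = 2 * x_max / (n_x - 1)
    xs = [-x_max + i * dx for i in range(n_x)]
    psi = [math.exp(-x * x / (4 * sigma ** 2)) for x in xs]

    k_max = 4 / sigma                      # 8 times the expected width 1/(2 sigma)
    dk = 2 * k_max / (n_k - 1)
    ks = [-k_max + j * dk for j in range(n_k)]
    # Discretised psi_tilde(k) = (1/sqrt(2 pi)) * integral of psi(x) e^{-ikx} dx
    psi_tilde = [sum(p * cmath.exp(-1j * k * x) for p, x in zip(psi, xs)) * dx / math.sqrt(2 * math.pi)
                 for k in ks]

    sigma_x = weighted_std(xs, [p * p for p in psi])
    sigma_k = weighted_std(ks, [abs(q) ** 2 for q in psi_tilde])
    return sigma_x, sigma_k

for sigma in (2.0, 1.0, 0.5):
    sx, sk = gaussian_widths(sigma)
    print(f"sigma_x = {sx:.4f}, sigma_k = {sk:.4f}, product = {sx * sk:.4f}")
```

As the position-space packet narrows, $\sigma_k$ grows in exact proportion, so the product stays fixed.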


As hinted at the start of the post, position and momentum are not the only observables which obey Heisenberg uncertainty. The next most common pairing$^8$ is energy and time,$^9$ although uncertainty relationships can be generated more generally by taking the derivative of the (classical) action (and quantising). For example, momentum is the derivative of the action with respect to position, energy is (up to a sign) the derivative of the action with respect to time, and so on.

Hopefully I have demonstrated that Heisenberg uncertainty is not so strange as might have first appeared. Rather than an arbitrary restriction on how accurately we can know certain measurable quantities, it is in fact a basic and unavoidable feature of any linear wave theory, of which quantum mechanics is only one example, albeit of a more fundamental and therefore perhaps more intuitively challenging sort than most.

Notes

$1$. The Heisenberg uncertainty principle is so ubiquitous in quantum physics that it is frequently referred to simply as 'the uncertainty principle'. Out of habit, however, I tend to use the less common 'Heisenberg uncertainty' to differentiate it from other, admittedly much less common uncertainty relations. As far as I know, all of these usages are considered acceptable.

$2$. I hesitate to use the word 'particle', as it is important when talking about these quantum concepts to be very clear about what one means. A better term might have been 'quantum' or 'wave-particle', as the wave-nature of QM is central to the discussion of Heisenberg uncertainty. However, despite the slightly misleading connotations, 'particle' is by far the most commonly used term and so I will use it also.

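$3$. We could, for example, take $\Delta x$ and $\Delta p$ to be the standard deviations $\sigma_x$ and $\sigma_p$, since $x$ and $p$ are described by continuous probability distributions. With this choice the principle takes the precise form $\sigma_x\sigma_p\geq\hbar/2$.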

$4$. In this post I will use the Schrödinger representation of QM as this makes explicit the wave-nature of the wave-function. However, there are many, many representations of QM, each with their own advantages (and disadvantages) when it comes to analysing real systems, but importantly they are all exactly equivalent and so this wave-nature I have been emphasising is intrinsic to all of them. In that sense I could just as well have chosen any representation and this blog post would otherwise have been identical, although perhaps not as easy to understand.

$5$. If you don't know what momentum space is, the most important thing to understand is that there are mathematical spaces other than the space(time) we are familiar with. Let's consider the flat 3-dimensional "position" space (3-space) of ordinary life, with $x$-, $y$- and $z$-directions. Suppose we have an object at the coordinates $x=1$, $y=0$ and $z=0$. This can be represented by a vector in the 3-space going to the point $(1,0,0)$, thus describing the position. Now suppose the 3-space we are looking at is part of a 4-space that includes time, except we are going to set the time to some instant and freeze it there.

Let's say at that instant the object at $(1,0,0)$ is travelling with a momentum of $0$ in the $x$-direction, $1$ in the $y$-direction and $0$ in the $z$-direction (in arbitrary units). We could then construct a "momentum" 3-space (or 4-space including time) with directions $p_x$, $p_y$ and $p_z$ and at that instant of time the vector corresponding to the object would be at the coordinates $(0,1,0)$. So for any $n$-dimensional position space it's easy to see there is a corresponding $n$-dimensional momentum space. In fact, we can define a $2n$-dimensional space known as the phase space by combining the position and momentum spaces, and that space will describe all possible states of a physical system.

$6$. I want to stress that strictly speaking the momentum representation of the wave-function is not the Fourier transform of the position representation. The Fourier transform dual to position is the wave-vector (or wave-number in 1-D) and not the momentum, although given one is a scalar multiple of the other I feel like we can be a little loose in our communication in this one respect.

$7$. In evaluating equation (6) we have made use of the so-called sifting property of the Dirac delta, $\int_{-\infty}^{\infty}\delta(x-a)f(x)\mathrm{d}x=f(a)$. This property is the integral counterpart of the way the Kronecker delta picks a single term out of a sum, $\sum_m\delta_{nm}a_m=a_n$, and is arguably the Dirac delta's most useful feature.

$8$. These pairs of variables are typically referred to in physics as 'conjugate variables', although in this context we can also call them Fourier transform duals. It is important to remember that we would be less inclined to refer to them as such if we were working in, for example, the Heisenberg representation of QM where the Heisenberg uncertainty principle arises more directly out of the non-commutativity of Hermitian operator matrices. This is only because the Fourier transforms are implicit in that representation; they are still there in some sense due to the equivalence of representations of QM as discussed in Note 4, but are not nearly as obvious.

$9$. The energy-time Heisenberg uncertainty principle is given mathematically as $\Delta E\Delta t\geq\hbar/2$. This is analogous to position-momentum uncertainty in the sense that, just as the Fourier dual of position is the wave-number $k=p/\hbar$ and not momentum directly, the Fourier dual of time is technically the angular frequency $\omega=E/\hbar$ and not energy directly. As in the position-momentum case, however, the difference is only a scalar factor of $\hbar$ and so we can speak reasonably loosely with some impunity.
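The sifting property from Note 7 can be illustrated numerically by replacing the Dirac delta with a very narrow normalised Gaussian, one common way of approximating it. In the Python sketch below (names and parameters are illustrative assumptions), the integral of $\delta_\varepsilon(x-a)f(x)$ with $f=\cos$ approaches $f(a)$ as the width $\varepsilon$ shrinks.

```python
import math

def narrow_gaussian(x, eps):
    """Normalised Gaussian of width eps: unit area, sharply peaked at x = 0."""
    return math.exp(-x * x / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def sift(f, a, eps, steps=4000):
    """Midpoint-rule integral of narrow_gaussian(x - a, eps) * f(x) over a +/- 10 eps,
    outside which the Gaussian is negligible."""
    lo, hi = a - 10 * eps, a + 10 * eps
    h = (hi - lo) / steps
    return h * sum(narrow_gaussian(lo + (i + 0.5) * h - a, eps) * f(lo + (i + 0.5) * h)
                   for i in range(steps))

a = 0.7
for eps in (0.5, 0.1, 0.01):
    print(eps, sift(math.cos, a, eps), "target:", math.cos(a))
```

For this particular choice the exact result is $\cos(a)e^{-\varepsilon^2/2}$, so the error vanishes like $\varepsilon^2$ as the Gaussian narrows towards a true delta.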

News (2014/04/10)

Two items of news to report this week (well, one and a half at least). The half-piece of news is that I'm working on a new blog post which will hopefully be posted some time next week (although it might take until the week after).

The real news is that the LHC has confirmed the existence of Z(4430), a so-called "exotic hadron". Hadrons are composite particles made of quarks (and held together by gluons). According to the quark model, hadrons can only form in one of two ways: as a quark-antiquark pair (known as a meson) or as a quark triplet (known as a baryon).

The most common hadrons in the universe are protons and neutrons which form atomic nuclei. Protons consist of 2 up-type quarks and 1 down-type quark ("uud") while neutrons consist of 1 up quark and 2 down quarks ("udd"). The names and details of quark types, which are known as "flavours", are not something I will go into here, as they are an interesting enough topic to deserve their own post (although a proper explanation for the layperson would need a little more than that, I think).

The quark model is a simple one though and does not describe all of the dynamics permitted by quantum chromodynamics ("QCD", the part of the Standard Model that describes strong interactions). This leaves open the door for exotic hadrons which are not mesons or baryons. Z(4430) is one such exotic hadron.

It was first 'discovered' in 2007 (although 5-sigma confirmation didn't come until 2008) and has now been observed by the LHCb experiment with a significance of 13.9 sigma. This means the chance of the observation being a statistical fluke is $1$ in $1.579\times10^{43}$ (a very, very, very large number indeed). It is believed to be a tetraquark made up of 1 charm quark, 1 charm antiquark, 1 down quark and 1 up antiquark ($c\bar{c}d\bar{u}$).
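The quoted odds can be reproduced from the 13.9-sigma figure if we read it as a two-sided tail probability of a normal distribution (that reading is my assumption; results are sometimes quoted one-sided instead). A minimal sketch using only Python's standard library, with a helper name of my own choosing:

```python
import math


def sigma_to_p(z, two_sided=True):
    """Probability that a standard normal variable lies beyond z sigma."""
    # erfc(z / sqrt(2)) counts both tails (more than z sigma from the mean
    # in either direction); halving it gives a single tail.
    p = math.erfc(z / math.sqrt(2))
    return p if two_sided else p / 2


p = sigma_to_p(13.9)
print(f"p = {p:.3e}, i.e. 1 in {1 / p:.3e}")
```

With the two-sided convention this comes out at about 1 in $1.58\times10^{43}$, consistent with the figure above; the one-sided convention would make the odds roughly twice as long.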

While perhaps not as exciting as, for example, the BICEP2 result recently, this confirmation is still a very interesting result and will hopefully spur on further developments in the search for exotic hadrons.

A very dangerous factory

Suppose a new type of bomb is invented whose detonation device is so incredibly sensitive that if it comes into contact with a single particle it will explode. Putting aside the impracticality of such a weapon (and the obvious factory OH&S issues), the producer wishes to maintain quality control as, with anything, some bombs will be faulty and not have detonation devices attached. The question immediately arises: Is it possible to have some ensemble of bombs which we can guarantee contains no faulty weapons?

This question is known as the Elitzur-Vaidman bomb-testing problem, and although one can arrive after reasonably little thought at the fairly obvious answer that no such ensemble is possible (as any direct observation using light or matter will detonate any working bombs), in actual fact such an ensemble is possible! How can this be the case? The short answer is: quantum effects. The long answer? Read on!

Figure 1: A Mach-Zehnder interferometer with faulty bomb $B$ in place and all branches labelled. Note that the $d$-branch is drawn only for illustrative purposes; the photon cannot be detected along the $d$-branch due to destructive interference (see equation 1). (1) The single-photon source $S$. (2) One of the two 50:50 beam-splitters, which are both assumed to be lossless. (3) One of the two mirrors, which are assumed to be perfectly reflective. (4) One of the two detectors, which are assumed to be perfect detectors.

The solution to this problem involves the use of a Mach-Zehnder interferometer (Fig. 1) with a single-photon source. To see how, let's consider the case of the interferometer without any bomb in place. We then have
\begin{align}\label{eq:MZ}
\left|s\right\rangle &\rightarrow \frac{i}{\sqrt{2}}\left|u\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}\left|c\right\rangle + \frac{i}{\sqrt{2}}\left|d\right\rangle\right) + \frac{1}{\sqrt{2}}\left(\frac{i}{\sqrt{2}}\left|c\right\rangle + \frac{1}{\sqrt{2}}\left|d\right\rangle\right) \nonumber \\
&= \frac{i}{2}\left|c\right\rangle + \frac{-1}{2}\left|d\right\rangle + \frac{i}{2}\left|c\right\rangle + \frac{1}{2}\left|d\right\rangle \nonumber \\
&= i\left|c\right\rangle,
\end{align}
where $\left|a\right\rangle$ represents the quantum state in the $a$-branch of the interferometer (as labelled in Fig. 1) and $i$ is the imaginary unit.$^{1}$ What the above calculation shows$^{2}$ is that (somewhat surprisingly) despite the branching at the second beam-splitter, destructive interference along $d$ and constructive interference along $c$ causes the photon to always be detected at $C$ and never at $D$ (for this alignment).


Figure 2: A Mach-Zehnder interferometer with working bomb $B$ in place and all branches labelled. Note that $B$ blocks the $u$-branch whether the photon interacts with the detector or not (the case of an interaction is not illustrated here as this would correspond to the detonation of the bomb).

Now let's consider the same Mach-Zehnder interferometer but with a bomb placed such that the detector will be along the $u$-branch (as shown in Fig. 2). In this case we have
\begin{align}\label{eq:bomb}
\left|s\right\rangle\left|B_0\right\rangle &\rightarrow \frac{i}{\sqrt{2}}\left|u\right\rangle\left|B_0\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle\left|B_0\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left|X\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle\left|B_0\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left|X\right\rangle + \frac{1}{\sqrt{2}}\left(\frac{i}{\sqrt{2}}\left|c\right\rangle + \frac{1}{\sqrt{2}}\left|d\right\rangle\right)\left|B_0\right\rangle \nonumber \\
&= \frac{i}{\sqrt{2}}\left|X\right\rangle +\frac{i}{2}\left|c\right\rangle\left|B_0\right\rangle + \frac{1}{2}\left|d\right\rangle\left|B_0\right\rangle,
\end{align}
where $\left|B_0\right\rangle$ is the 'primed' or unexploded bomb, $\left|X\right\rangle$ represents the state where the bomb has been detonated$^3$ and $\left|a\right\rangle\left|b\right\rangle\equiv\left|a\right\rangle\otimes\left|b\right\rangle$. Note that for the purposes of this thought experiment we are assuming the detonator is a perfect detector, i.e., the photon wave cannot travel down $u$ without being absorbed.

As is clear from equation 2, the inclusion of the detonator destroys the constructive/destructive interference that caused the simplification in equation 1. Therefore, in the detonator case, rather than having every photon detected at $C$, we have the photon detected at $C$ with a probability of $1/4$, detected at $D$ with a probability of $1/4$ and the bomb detonated with a probability of $1/2$.$^4$
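The amplitudes in equations 1 and 2, and the probabilities just quoted, can be checked numerically. The Python sketch below rests on a few assumptions of my own: it uses the beam-splitter convention derived in Note 1 (transmission amplitude $1/\sqrt{2}$, reflection amplitude $i/\sqrt{2}$), ignores the mirrors (each arm contains one, so they only add an overall phase), and treats a working bomb as a perfect absorber on the $u$-branch. The function names are hypothetical.

```python
import math

T = 1 / math.sqrt(2)    # transmission amplitude (theta_T = 0)
R = 1j / math.sqrt(2)   # reflection amplitude (theta_R = pi/2)


def beam_splitter(psi1, psi2):
    """Apply the 50:50 beam-splitter matrix [[T, R], [R, T]] to two input amplitudes."""
    return T * psi1 + R * psi2, R * psi1 + T * psi2


def run_interferometer(working_bomb):
    """Outcome probabilities for a single photon from the source S."""
    # First beam-splitter: light enters one port only.
    # Transmitted amplitude -> v-branch, reflected amplitude -> u-branch.
    v, u = beam_splitter(1, 0)
    p_explode = 0.0
    if working_bomb:
        # A working detonator on the u-branch absorbs that part of the wave.
        p_explode = abs(u) ** 2
        u = 0
    # Second beam-splitter: u transmits into c, v reflects into c.
    c, d = beam_splitter(u, v)
    return {"C": abs(c) ** 2, "D": abs(d) ** 2, "explode": p_explode}


no_bomb = run_interferometer(working_bomb=False)
with_bomb = run_interferometer(working_bomb=True)
print({k: round(p, 12) for k, p in no_bomb.items()})
print({k: round(p, 12) for k, p in with_bomb.items()})
```

Without a working detonator the probability at $D$ is zero and at $C$ is one (up to rounding); with one, the outcomes $C$, $D$ and detonation come out at $1/4$, $1/4$ and $1/2$, as in equation 2.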

This is what makes it possible to assemble a set of functional bombs without detonating them—if a photon is detected by $D$ then the bomb must have a detonator attached and so we can set it aside knowing it works. If a photon is detected by $C$ then the functionality is indeterminate as we expect a detection at $C$ with non-zero probability in both detonator and no-detonator cases, but this is not a problem as we can simply emit another photon and re-run the test.
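Re-running the test whenever the photon lands at $C$ turns the procedure into a geometric series. For a working bomb each run ends at $D$ with probability $1/4$, in detonation with probability $1/2$, and at $C$ (so we try again) with probability $1/4$, so the probability of eventually certifying it is $\frac{1}{4}\sum_{k=0}^{\infty}\left(\frac{1}{4}\right)^k=\frac{1}{3}$. The Python sketch below checks this exactly and with a seeded simulation. The `max_runs` cap, the seed and the function names are hypothetical choices of mine; the cap is needed because a faulty bomb sends every photon to $C$, so the loop would otherwise never end.

```python
import random
from fractions import Fraction


def exact_certify_probability():
    """Chance that repeated runs eventually certify a working bomb."""
    p_d, p_c = Fraction(1, 4), Fraction(1, 4)
    # D on run 1, or C then D, or C, C then D, ...: p_d * (1 + p_c + p_c**2 + ...)
    return p_d / (1 - p_c)


def screen_bomb(working, rng, max_runs=50):
    """Fire single photons until D clicks, the bomb explodes, or max_runs is reached."""
    for _ in range(max_runs):
        if not working:
            continue                # faulty bomb: every photon reaches C
        r = rng.random()
        if r < 0.25:
            return "certified"      # photon detected at D
        if r < 0.75:
            return "exploded"       # photon absorbed by the detonator
        # otherwise the photon reached C: emit another photon and repeat
    return "undecided"


rng = random.Random(2014)
trials = 100_000
results = [screen_bomb(True, rng) for _ in range(trials)]
print("exact:", exact_certify_probability())
print("simulated:", results.count("certified") / trials)
print("faulty bomb:", screen_bomb(False, rng))
```

With 50:50 beam-splitters, then, about a third of the working bombs end up certified and the other two thirds are detonated along the way.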

Note that while the probabilities above can be derived (in a fairly straightforward manner) from classical principles, we cannot apply a classical interpretation here as the quantum nature of the experiment is indispensable. In the classical (many-photon) run it is possible to both detonate a bomb and make a detection at $D$; this is precluded in the quantum case as the single photon cannot be absorbed by multiple objects. Furthermore, it is the wave-nature of the photon that permits the destructive interference at $D$ in the no-detonator case and thus provides 'detection by $D$' to signify the presence of the detonator and thus successfully make an 'interaction-free' measurement.

If you're unconvinced of this argument because it is based on a purely theoretical consideration, consider that this thought experiment has (equivalently) been carried out in the real world (admittedly using an ordinary detector rather than a bomb) and in fact was first done about a year after this problem was first published. I can't speak to the practical applications, if any exist, but I love this problem regardless for the simple fact that the solution challenges your intuition but can be understood using reasonably straightforward quantum mechanical principles.

Notes

$1$. The inclusion of $i$ in these equations might seem unusual or arbitrary, so I will provide a derivation here that shows where it comes from.

Figure 3: A beam-splitter with two incoming beams ($\psi_1$ and $\psi_2$) and two outgoing beams ($\psi_3$ and $\psi_4$). The incoming and outgoing beams are related by the beam-splitter matrix for the beam-splitter in question, as shown in equation 3. In the note below, the beam-splitter will be assumed to be 50:50 in accordance with the calculations in the main text.

Consider a beam-splitter as shown in Fig. 3. This system can be represented by the matrix equation $\left|\psi_3,\psi_4\right\rangle = \hat{B}\left|\psi_1,\psi_2\right\rangle$, or explicitly,
\begin{equation}\label{eq:BSM}
\begin{pmatrix}
\psi_3 \\ \psi_4
\end{pmatrix}
=
\begin{pmatrix}
T & R \\ R & T
\end{pmatrix}
\begin{pmatrix}
\psi_1 \\ \psi_2
\end{pmatrix},
\end{equation}
where $T$ and $R$ are the transmission and reflection coefficients respectively. In the experiment we assume an ideal, lossless beam-splitter which demands that the beam-splitter matrix be unitary, i.e., $\hat{B}^{\dagger}\hat{B}=\hat{\mathbb{I}}$, or,
\begin{equation}\label{eq:unitary}
\begin{pmatrix}
T^{\ast} & R^{\ast} \\ R^{\ast} & T^{\ast}
\end{pmatrix}
\begin{pmatrix}
T & R \\ R & T
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 \\ 0 & 1
\end{pmatrix}.
\end{equation}
Equation 4 immediately implies the following relations:
\begin{equation}
|T|^2+|R|^2=1,
\end{equation}
\begin{equation}\label{eq:0}
T^{\ast}R+R^{\ast}T=0.
\end{equation}
As $T$ and $R$ are complex numbers, we can represent them in polar form as $T=|T|e^{i\theta_T}$ and $R=|R|e^{i\theta_R}$. For simplicity we choose $\theta_T=0$ and thus $T=|T|\implies T^{\ast}=T$ and so equation 6 becomes
\begin{align}\label{eq:0new}
T|R|e^{i\theta_R}+|R|e^{-i\theta_R}T&=0 \nonumber \\
2T|R|\cos{\left(\theta_R\right)}&=0
\end{align}
where we have made use of the identity $\cos{(\alpha)}=e^{i\alpha}/2+e^{-i\alpha}/2$. Equation 7 is satisfied by $\theta_R=n\pi+\pi/2, n\in\mathbb{Z}$, but we will choose $n=0\implies\theta_R=\pi/2$ for simplicity, which in turn gives $R=|R|e^{i\pi/2}=i|R|$.
Finally, as the beam-splitter is 50:50 (50% transmission, 50% reflection) we demand $|T|=|R|=1/\sqrt{2}$ and so the beam-splitter matrix is given by
\begin{equation}\label{eq:B}
\hat{B}=\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & i \\ i & 1
\end{pmatrix}.
\end{equation}
It should be clear that equation 8 is not a unique representation of $\hat{B}$; another choice of $\theta_T$ and/or $\theta_R$ would yield a different (unitary) matrix that would make no difference to the calculations shown in equations 1 and 2 (I leave proof of this as an exercise for the interested reader). With that said, the reason I like this representation is that it allows $i$ to function as a label for the states that result from a beam-splitter reflection, making it easier to write down interferometer equations directly from the diagram and keep track of where each term comes from. This is, of course, purely a matter of personal preference.
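As a quick numerical sanity check of equation 8, the Python sketch below forms $\hat{B}^{\dagger}\hat{B}$ with plain complex numbers (2×2 matrices stored as nested lists, a choice made only to avoid external libraries; the helper names are mine) and should reproduce the identity matrix up to rounding.

```python
import math

s = 1 / math.sqrt(2)
B = [[s, 1j * s],
     [1j * s, s]]          # equation 8


def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


product = matmul(dagger(B), B)
for row in product:
    print([complex(round(z.real, 12), round(z.imag, 12)) for z in row])
```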

$2$. This equation is an example of quantum superposition in action. For example, the first line says that the photon exists in a superposition of the $\left|u\right\rangle$ and $\left|v\right\rangle$ states, where the two states are equally weighted (each amplitude has magnitude $1/\sqrt{2}$, so the state is normalised). Superposition is a fundamental aspect of quantum mechanics that follows from the linearity of the Schrödinger equation (linear combinations of solutions will themselves be solutions). In this case, the beam-splitter splits the photon probability wave along the two channels and so in some sense the photon travels along both branches, although no measurement can be made which will detect the photon in both channels at once. This is not a consequence of experimental limitations but a restriction that is fundamental to quantum theory. The question of why this is the case is a deep and ongoing one, and I encourage the interested reader to investigate the literature on the philosophy (and especially interpretations) of quantum mechanics.

$3$. I have gone to some pains in this post to avoid using the term "wavefunction collapse" at any point, although for clarity I will say that in the Copenhagen interpretation, the case of the photon interacting with the detonator (or any of the detectors for that matter) is an example of wavefunction collapse.

$4$. This holds so long as the beam-splitters are both 50:50, as we have assumed throughout this blog post. Naturally, other types of beam-splitters will yield different results, and in fact using a more sophisticated apparatus will permit a much better detection level (in theory, the detection fraction can be brought arbitrarily close to 1, although I cannot speak to the practicality of such an apparatus).

News (2014/03/20)

I'm trying to post on my blog much more often this year than I used to, in fact as close to every week as I can manage. Unfortunately, the post I'm working on at the moment isn't nearly ready for publication, so this week I'm instead going to make a little news post, the first item of which will be the thing that I just told you (about the new blog post coming soon)!

The next item of news is not particularly new; last Friday (the 14th of March) was Pi Day. Pi Day is of course silly for a whole bunch of reasons (the main one being it's based on the completely nonsensical American dating system) so I've decided to introduce readers who may not be familiar with it to the concept of tau (τ). Tau has been proposed as an alternative to pi, and while I am not especially partisan on the matter I have to say I am somewhat sympathetic. Here is the case for tau laid out in the Tau Manifesto and for the sake of fairness a counterargument in the Pi Manifesto.

The real news, though, comes in the form of the results of the recent BICEP2 measurements of B-mode polarisation in the CMB, easily the biggest news in physics since the Higgs was announced in 2012 and a major breakthrough for early-universe cosmologists. I can write an explanatory post about the news if there's enough demand for one, but otherwise a lot of good explanations can be found around the place, ranging from the somewhat simplistic to the slightly more technical. This is a very exciting time for fundamental physics and I expect to see some very interesting papers published in the next couple of years based on insights from this new data.

Finally, it isn't technically news, but if you haven't heard of them already, I strongly urge you to check out Brady Haran's science channels, especially Sixty Symbols (physics) and Numberphile (mathematics); I've subscribed to most of them on YouTube and they are absolutely fantastic.

That's all for this quick post, hopefully I'll have a considerably more in-depth number ready for next week! See you then!

Playing with Infinite Series

Sequences, series and summation notation

In mathematical parlance, the term 'sequence' carries a similar meaning to the one it has in regular speech; it refers to an ordered list of numbers that usually follows a rule. For example, the sequence $1, 2, 3, 4, 5...$ (onwards without end) is given by adding 1 to the previous number in the sequence, beginning at 1. The definition of the term 'series' is less obvious: one description is that a series is the sum of a sequence, or in other words, to obtain a series one adds all the terms of a sequence. For a finite series (a series with finitely many terms, or alternatively a series with a last term), the mathematics is typically fairly straightforward and so we will focus instead on the topic of infinite series, but before we do so, I will make a brief digression to discuss notation.
For reasons I am sympathetic to, few people enjoy reading about mathematical notation and it is difficult to write about it in a way which is interesting. However, while it is possible to write a series as, for example, $1+2+3+4+5+...$ this soon becomes cumbersome and is very limiting for finite series with a large number of terms and for series for which the rule or pattern is not obvious from the first few terms. For these reasons, mathematicians have developed a notation for series which I will adopt in the remainder of this post for instructional purposes (side–by–side with the long-hand version for clarity).
The notation is referred to as summation notation or sigma notation and uses a large capital sigma $\Sigma$ to denote the summation. The rule is written to the right of the sigma in terms of an index of summation; the lowest value of the index is written under the sigma and the highest value above it (or '$\infty$' for an infinite series with no final term).$^1$
A simple example of sigma notation in action is the finite series \begin{align} \label{eq:square5} \sum_{n=1}^{5}n^2 & = 1^2+2^2+3^2+4^2+5^2 \nonumber \\ & = 1+4+9+16+25, \end{align} which in this case is equal to $55$. Here, the index of summation $n$ appears prominently in the rule as the number that is being squared. Another example, slightly trickier this time, is the infinite series \begin{align} \sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n & = \left(\frac{1}{2}\right)^1+\left(\frac{1}{2}\right)^2+\left(\frac{1}{2}\right)^3+\left(\frac{1}{2}\right)^4+... \nonumber \label{eq:Zeno} \\ & = \frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{16}+... \end{align} where the index of summation is this time the power that $1/2$ is raised to in each term of the sum.$^2$
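Both examples translate directly into a few lines of Python. In this sketch `range(1, 6)` plays the role of the limits under and over the sigma (Python's upper bound is exclusive, hence the 6), and `fractions.Fraction` keeps the sums of the infinite series exact; of course we can only ever add up finitely many of its terms.

```python
from fractions import Fraction

# Equation 1: the finite series sum_{n=1}^{5} n^2.
# range(1, 6) runs n = 1, ..., 5 (Python's upper bound is exclusive).
squares = sum(n ** 2 for n in range(1, 6))
print(squares)  # 55

# Equation 2 has no last term, so we can only add up its first N terms;
# Fraction keeps each of these sums exact.
for N in (1, 2, 3, 4, 10):
    partial = sum(Fraction(1, 2) ** n for n in range(1, N + 1))
    print(N, partial)  # 1 1/2, 2 3/4, 3 7/8, 4 15/16, 10 1023/1024
```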

Infinite series behaving badly

An infinite series is in many ways a different beast to a finite series. Perhaps the clearest difference is conceptual: finite series are computed fairly easily, in principle at least, as it is simply a case of adding so many numbers together and then reading the value off your calculator. This is not possible with an infinite series, as there is no final term and no opportunity to hit a final '$=$' button on the calculator; the sum goes on forever. This is not always a problem, however. Some series are known as 'convergent', which is to say that they are in effect equal or equivalent to some number. There are many tests for convergence, but suffice it to say that we know a series is convergent when it approaches said number.$^3$ We have already seen a convergent series in equation 2, but another example of a convergent series is \begin{equation} \label{eq:euler} e=\sum^{\infty}_{n=0}\frac{1}{n!}=1+1+\frac{1}{2}+\frac{1}{6}+... \end{equation} where $!$ is the factorial operator.$^4$ This series converges to $e$, a truly marvellous number which happens to be my favourite mathematical constant, but an explanation of that fact would require another post entirely.
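To see the convergence in equation 3 concretely, here is a small Python sketch (the function name is my own) that prints successive truncations of the series and their distance from `math.e`. The error shrinks very quickly because $n!$ grows so fast.

```python
import math


def e_partial_sum(N):
    """Sum of the first N + 1 terms of equation 3: 1/0! + 1/1! + ... + 1/N!."""
    return sum(1 / math.factorial(n) for n in range(N + 1))


for N in (1, 2, 5, 10, 15):
    approx = e_partial_sum(N)
    print(f"N={N:2d}  sum={approx:.15f}  error={math.e - approx:.2e}")
```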
If a series is not convergent, however, then we call it divergent,$^5$ and that is where the trouble starts. A straightforward example is the harmonic series \begin{equation} \label{eq:harmonic} \sum^{\infty}_{n=1}\frac{1}{n}=1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+... \end{equation} which despite its similarity to equation 3 increases indefinitely and does not approach any particular number. This is in some sense easy to understand intuitively; one can see how the sum would just keep getting bigger and bigger. This kind of reasoning is based around partial sums, which are truncations of infinite series. For the harmonic series the partial sum is given by \begin{equation} \label{eq:Hn} H_n=\sum^{n}_{k=1}\frac{1}{k}=1+\frac{1}{2}+\frac{1}{3}+...+\frac{1}{n}, \end{equation} where the $n^{\text{th}}$ partial sum $H_n$ is known as the $n^{\text{th}}$ harmonic number. (Note that here we use $n$ to label the partial sum and so the role of the index of summation is taken over by $k$ to avoid confusion, although we could have chosen $k$ as our label and kept $n$ the index of summation if we wished; it makes no difference). Partial sums are finite series by design and so just like equation 1 there is a final term after which we can hit the metaphorical '$=$' button. If we do so after 2 terms we find $H_2=3/2$, after 3 terms $H_3=11/6$, 4 terms $H_4=25/12$ and so on.$^6$
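The harmonic numbers quoted above are easy to reproduce exactly with Python's `fractions` module. The second loop (in floating point, for speed) hints at why the divergence is hard to see numerically: $H_n$ grows only about as fast as $\ln n$, with the difference $H_n-\ln n$ settling towards roughly $0.577$.

```python
import math
from fractions import Fraction


def harmonic(n):
    """Exact n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


for n in (2, 3, 4):
    print(f"H_{n} = {harmonic(n)}")

# The growth never stops, but it is slow: compare H_n with ln(n).
for n in (10, 1000, 100000):
    h = sum(1 / k for k in range(1, n + 1))
    print(f"H_{n} ~ {h:.4f}, ln({n}) ~ {math.log(n):.4f}")
```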
Now let us consider an altogether different beast known as Grandi's series. Grandi's series is given as \begin{equation} \label{eq:Grandi} \sum^{\infty}_{n=0}(-1)^n=1-1+1-1+1-1... \end{equation} Unlike the harmonic series, the partial sums of Grandi's series do not seem to trend in any particular direction, but rather alternate between two 'accumulation points' at 1 and 0. This peculiarity will cause us some trouble, but to see why first we will make a brief foray into the physical sciences.

A brief foray into the physical sciences

Suppose we have two thin neutral conductive plates placed parallel to each other very close together in a vacuum. According to classical physics (and everyday intuition) absolutely nothing will happen. However, this is not what we observe; in fact, the two plates will experience an attractive force attempting to bring them together. This is known as the Casimir effect, and can only be understood in terms of quantum mechanics. The Casimir effect is typically expressed in terms of quantum electrodynamics, but there is nothing inherently electrodynamic about the effect and so one can consider many analogous scenarios with equivalent Casimir effects. In order to greatly simplify our derivation I will choose to do just that.
One thing that it is essential to be aware of is that in quantum mechanics a vacuum is not 'empty' in the sense that there is nothing at all there; so far as we understand, such an emptiness cannot exist in the physical universe. This possibility is ruled out both experimentally and theoretically (by the uncertainty principle).$^7$ Instead we understand the universe to be filled with quantised fields (mathematically speaking, a field assigns some value(s) to every point in space(time); loosely speaking, a quantised field is one whose energy can only change in discrete amounts, or quanta) with each particle being a localised excitation of the field, e.g., photons (light particles) are excitations in the electromagnetic field, electrons are excitations in the electron field and so on. The rough procedure$^8$ for quantising a field is to treat it as a quantum harmonic oscillator (QHO) at every point in space (one could crudely picture this as an infinite system of connected balls and springs), which naturally results in quantised, discrete energy levels. A 'true' vacuum is therefore the ground state (lowest energy state) in every quantum field. We know that the ground state energy cannot be 0 as that would be the kind of emptiness that cannot exist. Rather, it takes on a value of \begin{equation} \label{eq:energy} E_0=\frac{\hbar\omega}{2}, \end{equation} the ground-state energy of the QHO, where $\hbar$ is the reduced Planck constant and $\omega$ is the angular frequency of the oscillator.
With all this in mind, let's return to the Casimir effect, considering a 1+1-dimensional massless scalar field to simplify and clarify the example. The plates impose what are called 'boundary conditions', which restrict the frequencies that waves in between the plates can take, in this case to standing waves.$^9$ The equation for a standing wave in 1+1 dimensions is \begin{equation} \label{eq:wave} \psi_n(x,t)=e^{-i\omega_nt}\sin{\left(\frac{n\pi x}{a}\right)} \end{equation} where $a$ is the width of the cavity between the plates, $n$ is a natural number (a positive whole number) and $\omega_n$ is the angular frequency given by \begin{equation} \label{eq:omega} \omega_n=\frac{n\pi c}{a} \end{equation} where $c$ is the wave speed. Now, we know that the ground state energy for a QHO is given by equation 7 and we know that in between the plates only standing waves can exist, so we can only have waves with angular frequency $\omega_n=n\pi c/a$. If we wish to find the vacuum energy between the plates, it seems clear then that all we need to do is sum over the possible ground state energies, giving \begin{equation} \label{eq:vacuum} E=\frac{\hbar}{2}\sum^{\infty}_{n=1}\omega_n=\frac{\hbar\pi c}{2a}\sum^{\infty}_{n=1}n. \end{equation} Here we run into a problem. If you have been paying attention, you'll notice that \begin{equation} \label{eq:natural} \sum^{\infty}_{n=1}n=1+2+3+4+... \end{equation} absolutely positively does not converge at all, and yet here it appears in a physics equation relating to a very real and decidedly measurably finite effect. How can this be? We haven't made a mistake, but we have overlooked one crucial fact. No matter what our plates are made of, they cannot confine arbitrarily high energies of the field; those high-energy modes will always be able to escape. So what we need now is to take account of that fact and, in doing so, somehow assign a finite value to equation 10 and rescue our derivation.

Putting divergent series to work

So what we seek is a way of attaching a meaningful finite value to divergent equations. Let's take a look at Grandi's series again and see if we can come up with anything consistent. We can try cancelling off pairs of terms to give \begin{align} \label{eq:gpair1} \sum^{\infty}_{n=0}(-1)^n&=(1-1)+(1-1)+(1-1)+...\nonumber\\ &=0+0+0+...\nonumber\\ &=0 \end{align} but we can just as easily choose different pairings to give \begin{align} \label{eq:gpair2} \sum^{\infty}_{n=0}(-1)^n&=1+(-1+1)+(-1+1)+...\nonumber\\ &=1+0+0+...\nonumber\\ &=1 \end{align} which is certainly not consistent. We can try re-ordering the series to bring all the $+1$s to the front, but this gives \begin{align} \label{eq:inf+1} \sum^{\infty}_{n=0}(-1)^n=1+1+1+...-1-1-1... \end{align} As there are an infinite number of $+1$s we never reach the $-1$s and the series approaches $+\infty$. Trying the same process by arranging all the $-1$s to the front will in the same way cause the series to approach $-\infty$. Rather than find a single consistent way of assigning a number to the series, all we have found is four duds.
The reason these methods are all duds is that operations like cancelling pairs of terms (method of differences) are valid only for convergent series, and reordering terms is safe only for absolutely convergent ones. If we try to apply them to divergent series, the result is clearly a mess. Let's step back and look at the problem from another angle. We want to assign a number to the series; presumably we should be able to manipulate that number algebraically. If we call the series $S$ then after some algebraic juggling we find \begin{align} \label{eq:S} S&=1-1+1-1+1-1+...\nonumber\\ 1-S&=1-(1-1+1-1+1-1+...)\nonumber\\ &=1-1+1-1+1-1+...\nonumber\\ &=S\nonumber\\ 1&=2S\nonumber\\ \Rightarrow S&=1/2 \end{align} Furthermore, we can consider Grandi's series as an example of the infinite geometric series \begin{align} \label{eq:geometricG} \sum_{k=0}^{\infty}ar^k=a+ar+ar^2+ar^3+... \end{align} where $a=1$ and $r=-1$. Even though Grandi's series is divergent, equation 16 is convergent for $|r|<1$ and in that case \begin{align} \label{eq:geometricC} \sum_{k=0}^{\infty}ar^k=\frac{a}{1-r}. \end{align} If we substitute $a=1$ and $r=-1$ into equation 17 then we again find $S=1/2$. Neither of these constitutes solid proof in and of itself, but they are highly suggestive. The tool we are looking for is the Cesàro sum.$^{10}$ A series is Cesàro summable when the mean of its first $n$ partial sums tends to a limit as $n$ grows; that limit is its Cesàro sum. For convergent series the Cesàro sum always equals the number the series converges to, and the Cesàro sum is defined for many divergent series too, including Grandi's series. The partial sums of Grandi's series are $1,0,1,0,...$ and so the terms in the Cesàro sequence are $1, 1/2, 2/3,1/2,3/5,1/2,4/7,...$ which clearly converges to $1/2$ in the limit.$^{11}$ In some sense this is quite a satisfying result as our Cesàro sum lies exactly in between the two accumulation points, serving as a kind of average value.
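Cesàro summation is easy to carry out mechanically. A minimal Python sketch (the helper name `cesaro_means` is my own) computes the running means of the partial sums of Grandi's series with exact fractions:

```python
from fractions import Fraction
from itertools import accumulate


def cesaro_means(terms):
    """Running means of the partial sums of a series (its Cesàro sequence)."""
    means, running_total = [], 0
    for count, partial_sum in enumerate(accumulate(terms), start=1):
        running_total += partial_sum
        means.append(Fraction(running_total, count))
    return means


grandi = [(-1) ** n for n in range(10)]       # 1, -1, 1, -1, ...
print(list(accumulate(grandi)))                # partial sums: 1, 0, 1, 0, ...
print([str(m) for m in cesaro_means(grandi)])  # Cesàro means
```

The means at even positions are exactly $1/2$, while those at odd positions $2k+1$ are $(k+1)/(2k+1)$, so both approach $1/2$.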

Having fun with zeta function regularisation

The partial sums of equation 11 are the triangular numbers $T_n$ (so named because they give the numbers of objects that can be arranged into equilateral triangles) $1, 3, 6, 10, 15,...$, so if we calculate the Cesàro sequence of equation 11 we find it goes $1, 2, 10/3, 5, 7,...$, growing without bound. Once again, just as we stop to bask in our moment of triumph, we find our job isn't quite done yet: equation 11 is not Cesàro summable. We must find another, more sophisticated method for attaching a number to that series.
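The same mechanical procedure shows the Cesàro means of equation 11 running away. In fact the $n^{\text{th}}$ mean of the triangular numbers is $\frac{1}{n}\sum_{k=1}^{n}\frac{k(k+1)}{2}=\frac{(n+1)(n+2)}{6}$, which the sketch below (helper name mine) checks for the first ten terms:

```python
from fractions import Fraction
from itertools import accumulate


def cesaro_means(terms):
    """Running means of the partial sums of a series (its Cesàro sequence)."""
    means, running_total = [], 0
    for count, partial_sum in enumerate(accumulate(terms), start=1):
        running_total += partial_sum
        means.append(Fraction(running_total, count))
    return means


naturals = range(1, 11)
print(list(accumulate(naturals)))   # triangular numbers T_n
means = cesaro_means(naturals)
print([str(m) for m in means])
# The n-th mean matches the closed form (n + 1)(n + 2) / 6, which is unbounded.
assert all(m == Fraction((n + 1) * (n + 2), 6) for n, m in enumerate(means, start=1))
```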
Let us consider the series \begin{align} \label{eq:dirichlet} D(s)=\sum^{\infty}_{n=1}n^{-s}, \text{ Re}(s)>1 \end{align} where $s$ is a complex number and $\text{Re}(s)$ denotes the real part of $s$. For $s=-1$ this series would be exactly the same as equation 11, but the series is not defined for $s=-1$.$^{12}$ However, we saw back in equation 17 that on that occasion if we applied the convergent case equation to Grandi's series we got the result of $1/2$ that turned out to be right; perhaps we could do something similar here? As it would happen, we can, but first I would implore you to, in the great words of John Arnold, "Hold on to your butts".
In the domain $\text{Re}(s)>1$, $\zeta(s)=D(s)$ where $\zeta(s)$ is known as the Riemann zeta function.$^{13}$ Unlike $D(s)$, $\zeta(s)$ is defined over the entire complex plane (apart from a single pole at $s=1$) and is known as an analytic continuation of $D(s)$. Analytic continuation is a wonderfully useful (and perplexing) tool of complex analysis whereby the domain of an analytic ('well-behaved') function can be extended. While this may not seem like a big deal, it raises the question of whether or not a function can be continued arbitrarily; if our original function is only defined on some small domain and we wish to extend that domain, what is to stop us giving it such–and–such value in the extended domain instead of some other value? As it would happen, the identity theorem states (very roughly) that any two holomorphic functions (all complex analytic functions are holomorphic) that agree on some small region of a connected domain (more precisely, on any set with a limit point in that domain) must agree over the entire domain, and thus there is only one way to analytically continue a function.
We wish to know what $\zeta(-1)$ is so we can assign that value to $D(-1)$ through the magic of analytic continuation. For negative whole numbers $n<0$, \begin{equation} \label{eq:negzeta} \zeta(n)=-\frac{B_{1-n}}{1-n} \end{equation} where $B_{1-n}$ is the '$1-n$'$^{\text{th}}$ Bernoulli number.$^{14}$ In the case of $n=-1$ we have  \begin{equation} \label{eq:-1zeta} \zeta(-1)=-\frac{B_2}{2}=-\frac{1}{12}. \end{equation} Thus we can assign to the divergent series $1+2+3+4+...$ the value of $-1/12$. If this strikes you as strange or even suspicious then I applaud your scepticism; there is indeed something plainly odd about assigning the value of a small, fractional negative number to a series of ever-increasing positive whole numbers. This is not at all like the neat case of Grandi's series where we had our value lying neatly between the accumulation points. But before we throw our hands up in despair, recall our motivation for this investigation, the Casimir effect. What happens if we use our value of $-1/12$ there?
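Equation 19 can be evaluated exactly once the Bernoulli numbers are known. The Python sketch below generates them from the standard recurrence $\sum_{k=0}^{m}\binom{m+1}{k}B_k=0$ (for $m\geq1$, starting from $B_0=1$, which gives the convention $B_1=-1/2$; $B_1$ is never used here) and then applies equation 19. It is a pedagogical sketch with function names of my own, not how a numerical library would evaluate $\zeta$.

```python
from fractions import Fraction
from math import comb


def bernoulli(m_max):
    """Bernoulli numbers B_0, ..., B_{m_max} as exact fractions (B_1 = -1/2).

    Uses the recurrence sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1.
    """
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def zeta_negative(n, B):
    """zeta(n) = -B_{1-n} / (1-n) for negative integers n (equation 19)."""
    assert n < 0
    return -B[1 - n] / (1 - n)


B = bernoulli(8)
for n in (-1, -2, -3, -5, -7):
    print(f"zeta({n}) = {zeta_negative(n, B)}")
```

The first line printed is $\zeta(-1)=-1/12$, in agreement with equation 20; the even negative integers give zero.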
As it would happen, to do so is to use a technique known as zeta function regularisation. We replace a divergent series with a 'regulator' in the form of a zeta function (although other regulators exist, each with different strengths and weaknesses) and in doing so remove unphysical infinities from our theory. If we have regularised correctly, then by the time we have reached our final result, the regulator will have disappeared—it is nothing more than a 'trick' for calculating the correct value and so it should not still appear at the last step. \begin{equation} \label{eq:zetanorm} E=\frac{\hbar\pi c}{2a}\sum^{\infty}_{n=1}n=\frac{\hbar\pi c}{2a}\zeta(-1)=-\frac{\hbar\pi c}{24a}. \end{equation} The force between the two plates is given by the negative gradient of the energy: \begin{equation} \label{eq:force} F=-\frac{\partial E}{\partial a}=-\frac{\partial}{\partial a}\left(-\frac{\hbar\pi c}{24a}\right)=-\frac{\hbar\pi c}{24a^2} \end{equation} which, lo and behold, is exactly the right result.
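Zeta function regularisation is not the only regulator that produces $-1/12$. As a cross-check using a different one (an exponential cutoff, which is my substitution here and not the method used above), we can damp the high-frequency modes by a factor $e^{-\epsilon n}$, in the spirit of the earlier remark that the plates cannot confine arbitrarily high energies. The damped sum is $\sum_{n=1}^{\infty}ne^{-\epsilon n}=\frac{e^{-\epsilon}}{(1-e^{-\epsilon})^2}=\frac{1}{\epsilon^2}-\frac{1}{12}+O(\epsilon^2)$. The $1/\epsilon^2$ piece depends on the cutoff (in the usual treatment it cancels against the vacuum energy outside the plates), and what remains as $\epsilon\to0$ is $-1/12$. A minimal numerical sketch:

```python
import math


def damped_sum(eps, n_max=None):
    """Sum of n * exp(-eps * n) over n >= 1, computed term by term."""
    if n_max is None:
        n_max = int(60 / eps)  # beyond this, exp(-eps * n) < exp(-60): negligible
    return math.fsum(n * math.exp(-eps * n) for n in range(1, n_max + 1))


for eps in (0.1, 0.03, 0.01):
    finite_part = damped_sum(eps) - 1 / eps ** 2
    print(f"eps={eps}: sum - 1/eps^2 = {finite_part:.8f}")

print(f"-1/12            = {-1 / 12:.8f}")
```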

Final thoughts

The prompt for my writing this lengthy explanation of infinite series, divergences, regularisation and so on was a minor flurry on the Internet a little while ago due to a somewhat dodgy derivation of the result $1+2+3+4+...=-1/12$. In order to get this result a number of those forbidden–for–divergent–series operations were used for simplicity's sake, but in doing so I felt the important subtlety between a convergent series equalling a number and a divergent series being assigned a value (albeit in a rigorous way) was lost, and that is an important distinction to make. You cannot keep adding $1+2+3+4+...$ and then through the magic of infinity come up with a $-1/12$ at the end; that series will always diverge and will always approach $+\infty$, but as we have seen we can rigorously assign the value of $-1/12$ to it for the purposes of removing infinities from our calculations using, in this case, the Riemann zeta function.
While writing this post I did ponder a question which continues to interest me, though. We saw from the example of Grandi's series that some mathematical operations and manipulations which would be fine in 'normal' mathematics suddenly become verboten in the specific context of a divergent series. The question is: is this a fundamental property of the mathematics involved, that is to say, of the underlying patterns and structures, or does it emerge from the limits of our notation? Is a rearrangement of terms in a divergent series fundamentally different from a rearrangement of terms in a convergent series, or is it the same operation producing different results in different contexts? I am not sure this question can be answered sensibly, but for my money it recalls the old and still unresolved question in the philosophy of mathematics: is mathematics discovered or invented? Now there is truly some food for thought.

Notes

$1$. There are other ways to use sigma notation. For example, equation 11 can also be written as $\sum_{n\geq 1}n$ (that $n$ is a whole number is implied by the use of a discrete sum rather than a continuous integral) or as $\sum_{n\in\mathbb{N}}n$. Less common examples of sigma notation include $\sum_{p\text{ prime}}\frac{1}{p}$, the sum of the reciprocals of all prime numbers, and $\sum_{d \vert n}d^x$, the divisor function $\sigma_x(n)$, where '$d \vert n$' means that $d$ divides $n$ exactly.
$2$. The sum shown here has the interesting property of being a series representation of the number 1, or in the given notation, $\sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n=1$. For any readers who are passingly familiar with Ancient Greek philosophy, this fact can be viewed as a solution to Zeno's dichotomy paradox, albeit one which ignores some nuances which I will address at a later date when I cover supertasks, one of the many intersections of philosophy and mathematics.
$3$. Formally, there exists a limit $S$ such that for any (arbitrarily small) number $\epsilon>0$ there is a number $N$ such that $|S_n-S|<\epsilon$ for all $n>N$, where $S_n$ is the $n^{\text{th}}$ partial sum. Informally, there is a number $S$ such that, once $n$ is large enough, the partial sum $S_n$ is, and stays, as close to $S$ as we like.
$4$. The factorial operator is defined as $n!=n(n-1)(n-2)\cdots 1$; in words, $n!$ (read '$n$ factorial') is the product of $n$ and every whole number below it, down to $1$. As an example, $5!=5\cdot4\cdot3\cdot2\cdot1=120$.
$5$. I will not delve here into the depths of conditional and absolute convergence, almost convergence, and so on. Suffice it to say that a great many infinite series have interesting convergence behaviour beyond the simple cases shown here. If you are especially interested in the topic, I recommend investigating the Riemann series theorem for a very interesting and surprising property of conditionally convergent series.
$6$. The harmonic series is deceptively interesting and there are many, many different and varied ways of calculating the harmonic numbers. One example straight out of equation 4 is the recurrence relation (an equation which gives one term in a sequence in terms of a previous one) $H_n=H_{n-1}+1/n$. I encourage you to investigate others and see where it leads you!
$7$. The value of a field and the value of the derivative of the field at a given point in space cannot both be known to arbitrary accuracy; the more precisely one is known, the less precisely the other can be. Though the uncertainty principle is often presented as a weird and wonderful result of quantum mechanics, it is in fact a feature of any wave theory and is intimately linked with Fourier transforms, although a thorough demonstration of this is sadly beyond the scope of this note.
$8$. The complexities of quantum field theory should by no means be underestimated; what I am presenting here is an extraordinarily simplified version that, while instructive, would not necessarily be very useful in practice.
$9$. Standing waves, as used here, are waves with nodes (points of zero displacement) at both endpoints. An example would be a plucked guitar string, which is restricted from moving at the bridge and the nut. This restriction means that a whole number of half-wavelengths must fit into the length $a$, so the only possible wavelengths are $\lambda_n=2a/n$, where $n$ is a natural number. Using the wave relationship $c=\nu_n\lambda_n$ we find equivalently $\nu_n=nc/(2a)$ (or equation 9, since $\omega=2\pi\nu$), which gives the frequency $\nu_n$ of the $n^{\text{th}}$ harmonic.
$10$. Or rather, one of the tools, as we could equally have chosen the Abel sum, the Borel sum, the $1/x$ series method, or a number of others. Cesàro summation is far from the only rigorous way of dealing with Grandi's series, but what is important is that it gives a value of 1/2, as do the methods I listed above—this consistency is a big hint that we have picked the right number to assign to the series.
$11$. It is worth noting that if we 'dilute' the series by adding in $+0$s we change the value of the Cesàro sum (although the summability is not affected). This illuminates yet another mathematical manipulation which would be fine for a convergent series but is not for a divergent series.
$12$. Precisely because it would become divergent.
$13$. If $e$ is my favourite number then $\zeta(s)$ is surely my favourite function, but in exactly the same way I could not possibly hope to explain why except in another post devoted to it exclusively.
$14$. For fear of overwhelming you with yet more beautiful mathematics in an already over-long post I will avoid the temptation of discussing the Bernoulli numbers, although as ever I encourage the interested reader to investigate for themselves!
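As a concrete illustration of the divisor-function notation in note 1, here is a brute-force sketch. The function name `divisor_function` is hypothetical, and testing every candidate divisor up to $n$ is chosen for clarity rather than speed.

```python
def divisor_function(n, x):
    """sigma_x(n): the sum of d**x over every positive whole number d that divides n exactly."""
    if n < 1:
        raise ValueError("n must be a positive whole number")
    return sum(d**x for d in range(1, n + 1) if n % d == 0)

print(divisor_function(12, 1))  # 1 + 2 + 3 + 4 + 6 + 12 = 28
print(divisor_function(12, 0))  # x = 0 simply counts the divisors: 6
```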
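The formal definition in note 3 can be made concrete with the series from note 2, whose partial sums are $S_n=1-(1/2)^n$ and whose limit is $S=1$. The sketch below uses a hypothetical helper and exact fractions to avoid rounding; for a given $\epsilon$ it finds the first partial sum lying within $\epsilon$ of the limit.

```python
from fractions import Fraction

def first_index_within(epsilon):
    """Smallest n with |S_n - 1| < epsilon, where S_n = sum_{k=1}^{n} (1/2)^k.

    Since |S_n - 1| = (1/2)^n shrinks as n grows, every later partial sum is
    also within epsilon, so N = n - 1 satisfies the formal definition.
    """
    n, s_n = 0, Fraction(0)
    while abs(s_n - 1) >= epsilon:
        n += 1
        s_n += Fraction(1, 2**n)
    return n

for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(eps, first_index_within(eps))  # 4, then 10, then 20
```

Shrinking $\epsilon$ pushes $N$ further out but never makes it infinite, which is exactly what the definition demands.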
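The recurrence relation in note 6 translates directly into a loop. This is a minimal sketch with a hypothetical function name, using exact fractions so that the harmonic numbers come out as exact rationals rather than rounded decimals.

```python
from fractions import Fraction

def harmonic_numbers(n_max):
    """Return [H_1, ..., H_{n_max}] using H_n = H_{n-1} + 1/n, starting from H_0 = 0."""
    values, h = [], Fraction(0)
    for n in range(1, n_max + 1):
        h += Fraction(1, n)
        values.append(h)
    return values

print([str(h) for h in harmonic_numbers(4)])  # ['1', '3/2', '11/6', '25/12']
```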
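The relationships in note 9 can be tabulated for the first few harmonics. The string length and wave speed below are illustrative numbers chosen for this sketch, not measurements, and the function name is hypothetical.

```python
import math

def standing_wave_modes(a, c, n_max):
    """(n, wavelength, frequency, angular frequency) for waves with nodes at both ends of a length a.

    lambda_n = 2a/n, nu_n = c/lambda_n = n c/(2a), omega_n = 2*pi*nu_n = n*pi*c/a.
    """
    modes = []
    for n in range(1, n_max + 1):
        wavelength = 2 * a / n
        frequency = c / wavelength
        modes.append((n, wavelength, frequency, 2 * math.pi * frequency))
    return modes

# Hypothetical string: length 0.65 m, wave speed 286 m/s (illustrative values only).
for n, lam, nu, omega in standing_wave_modes(0.65, 286.0, 3):
    print(f"n={n}: wavelength={lam:.3f} m, frequency={nu:.1f} Hz")
```

Each harmonic's frequency is a whole-number multiple of the fundamental, which is the property that makes the sum over modes in the Casimir calculation a sum over the natural numbers.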
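Notes 10 and 11 can both be illustrated by computing Cesàro means, that is, averages of the first $N$ partial sums, for Grandi's series and for a version diluted with $+0$s. This is a sketch with a hypothetical helper. The Cesàro sum itself is the limit of these means as $N\to\infty$; the particular values of $N$ below are chosen so that the finite averages already equal those limits exactly.

```python
from fractions import Fraction
from itertools import cycle, islice

def cesaro_mean(terms, n_terms):
    """Average of the first n_terms partial sums of a series (its n-th Cesaro mean)."""
    partial, running_total = Fraction(0), Fraction(0)
    for term in islice(terms, n_terms):
        partial += term
        running_total += partial
    return running_total / n_terms

grandi = cycle([1, -1])       # 1 - 1 + 1 - 1 + ...
diluted = cycle([1, 0, -1])   # 1 + 0 - 1 + 1 + 0 - 1 + ...

print(cesaro_mean(grandi, 10_000))   # 1/2
print(cesaro_mean(diluted, 9_999))   # 2/3
```

Inserting the zeros changes the partial sums from $1,0,1,0,\ldots$ to $1,1,0,1,1,0,\ldots$, so their average moves from $1/2$ to $2/3$ even though the series still has a Cesàro sum.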