30 October 2014

News (2014/10/30)

Hi all,

Just a quick update on what's going on.

On the personal side of things, I've now completed both the general GRE and the physics GRE (which I travelled all the way to Perth for!) and so I have a little bit of time on my hands again. That means a return to blogging! I was originally intending to make a post about Faddeev-Popov ghosts in time for Halloween tomorrow (an idea I first had way back in June when we had a full moon Friday the 13th). However, it soon became very apparent that making the topic accessible would require much more than a single post and much more than a single week of writing, so you may have to wait a little bit to see that one!

While I wasn't blogging I also had the idea to get some friends of mine to write guest posts explaining scientific papers they've published to give them a bit of the limelight and promote their hard work. I've had a couple of people indicate some interest so hopefully over the next few months we'll be seeing a few of those in addition to the usual content!

The next blog post will hopefully be on its way soon (give it a week or three) but in the meantime I thought I'd offer up some more recommendations, this time for Quanta Magazine, a general science online magazine published by the Simons Foundation which has some excellent physics coverage, and Symmetry Magazine, which is published by Fermilab and SLAC and is more focussed on particle physics. Take a look, I hope you enjoy them!

23 September 2014

News (2014/09/23)

Hey guys,

Not a lot of news to announce regarding the blog; no work has gone into the next post since my last news post about a month ago. I've been spending most of my time working on preparing for the GRE and doing some personal quantum field theory study for shits and giggles, along with various and sundry odd-jobs.

The real reason for this news post is that there's real news to report, that is to say, news in the outside world of physics. The Planck Collaboration have published on arXiv their intermediate results regarding polarised dust emissions. Those of you who remember the BICEP2 Collaboration's results regarding their measurement of primordial light polarisation, which implied gravitational waves evidencing cosmic inflation (which I've written about in previous news posts here, here and here) will recall that many people were choosing to be somewhat cautious until independent confirmation could be made by Planck. Well, the news is in and it isn't good.

As before I'll leave the explanation to someone better qualified but to put it simply, at the moment it seems to be the case that there's a lot more galactic dust in the region of the sky that BICEP2 was looking at than they thought and the majority of the signal they were looking at was in fact not primordial light polarisation but light polarisation due to interactions with galactic dust.

No doubt there's still lots yet to come from this but for the moment I think it's safe to say that the jury is still out on the details of inflation and we're back to where we were before the BICEP2 announcement in March. Alas, sometimes science is messy and that's just how it goes. Even so, it also means that there's much more work to be done in this area (which is a good thing!), and hopefully the next results that come in will stand up better under scrutiny.

6 September 2014

Historical motivations for special relativity

This post was originally written to be the first part of a text companion for a series of introductory lessons on the basics of special relativity. Ultimately I decided to write a different introduction and the lesson series didn't eventuate anyway, so I thought I might as well post it here since I'm not currently working on a blog post. Keep in mind it's not 100% perfect, I haven't fully edited it or added any illustrations, but if I did that I might as well write a new post altogether. So anyway, enjoy!

Let's begin by examining non-relativistic classical mechanics. This is sometimes referred to as Newtonian mechanics because its foundations are given by Newton's three laws of motion. For our discussion, the relevant laws are the closely-related first and second laws. As a refresher, they are given by:

First Law
Unless acted upon by an external force, an object will remain in constant motion.

Second Law
$\mathbf{F}_{\text{net}}=m\mathbf{a}$

An object in constant motion is one whose velocity (speed and direction) is not changing, or one that is not moving at all. The first law thus defines the domain of validity of Newton's laws, saying that they only hold in inertial reference frames. For our purposes, a reference frame is a coordinate system with an associated clock, and an inertial reference frame is such a reference frame which is not accelerating with respect to any other inertial reference frame. $\mathbf{F}_{\text{net}}$ is the net force, or the sum of all forces acting on an object (equivalently the resultant or overall force).

The difference between inertial and non-inertial reference frames is easy to understand given a simple example; two friends are on a straight road, with one standing and the other riding a rocket-powered skateboard towards him. If the standing friend asks the skating friend whether she feels a force pushing her closer to him, she will most certainly answer that she does. Thus the standing friend, who observes from an inertial reference frame, correctly associates force with acceleration (Newton's second law). However, if the skating friend asks the standing friend if he feels any force, he will answer that he does not, even though he appears to be accelerating towards her. This is because the skating friend occupies a non-inertial reference frame, and so Newton's second law no longer applies.

Any two inertial reference frames can be put into so-called 'standard configuration' by choosing a Cartesian coordinate system for both and, when their origins overlap, aligning the spatial coordinate axes (ensuring that the relative velocity is in the $x$-direction) and setting both clocks to 0. Having frames in standard configuration will prove very useful for simplifying calculations, as we will now see. We can generally choose to put inertial frames in standard configuration because our coordinatisation of space should not affect the physics of a given situation, but merely describe it.

Mathematically speaking, it is natural at this point to ask how we get from one inertial reference frame to another. In the context of Newtonian mechanics, we want to be able to transform spatial coordinates (and clocks, represented by a time coordinate) in such a way that Newton's second law holds, that is, the force on an object measured in one inertial frame is identical to that measured by any other inertial frame. The most intuitively obvious transformation laws we might consider are the Galileo transformations, which for two frames in standard configuration (a frame $S$ and a 'primed' frame $S'$) are given by
\begin{align*}
t'&=t\\
x'&=x-vt\\
y'&=y\\
z'&=z
\end{align*}
where $v$ is the relative velocity between the two frames and the primes denote the coordinates belonging to the $S'$ frame. These coordinate transformations immediately lead to a velocity transformation (for an object travelling in the $x$-direction):
\begin{align*}
u'&=\frac{\mathrm{d}x'}{\mathrm{d}t'}\\
&=\frac{\mathrm{d}}{\mathrm{d}t}(x-vt)\\
&=\frac{\mathrm{d}x}{\mathrm{d}t}-v\\
&=u-v
\end{align*}
where we have used the fact that $t'=t$. It is now not difficult to see how Newton's second law behaves under Galileo transformations. Let's consider an object accelerated parallel to the relative velocity of two frames in standard configuration (it is not difficult to generalise this case to acceleration in an arbitrary direction):
\begin{align*}
F'&=ma' \\
&=m\frac{\mathrm{d}u'}{\mathrm{d}t'} \\
&=m\frac{\mathrm{d}}{\mathrm{d}t'}(u-v) \\
&=m\frac{\mathrm{d}u}{\mathrm{d}t} \\
&=ma=F
\end{align*}
where we have assumed that masses are unchanged in different frames and used the fact that $v$ is constant. That Newton's second law is unchanged by Galileo transformations is what makes Newtonian mechanics 'Galileo-invariant'.
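
For those who like to see this sort of thing checked numerically, here is a minimal Python sketch of the argument above. It assumes one-dimensional motion, frames in standard configuration and a mass that is the same in both frames; the function names and the particular numbers are hypothetical choices of mine, picked only for illustration. It estimates the acceleration of the same object in $S$ and in $S'$ by finite differences and compares the resulting forces.

```python
def galilean(t, x, v):
    """Map an event (t, x) in frame S to (t', x') in frame S' (standard configuration)."""
    return t, x - v * t


def acceleration(position, t, h=1e-3):
    """Central finite-difference estimate of the second derivative of position(t)."""
    return (position(t + h) - 2.0 * position(t) + position(t - h)) / h**2


# Hypothetical example values: constant acceleration a_true as seen from S,
# initial velocity u0, relative frame velocity v_rel, and the object's mass.
a_true, u0, v_rel, mass = 3.0, 2.0, 5.0, 1.5


def x_in_S(t):
    """Trajectory of the object in frame S."""
    return u0 * t + 0.5 * a_true * t**2


def x_in_S_prime(t):
    """Same trajectory in S'. Since t' = t, the same clock reading labels both frames."""
    return galilean(t, x_in_S(t), v_rel)[1]


force_S = mass * acceleration(x_in_S, 1.0)
force_S_prime = mass * acceleration(x_in_S_prime, 1.0)
```

The two forces agree up to rounding error, because the $-vt$ term is linear in $t$ and so vanishes when differentiated twice.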

Let us now make a slight diversion into electrodynamics. Electrodynamics, naively the study of electricity and magnetism, is governed by Maxwell's equations. The equation for electromagnetic force can be derived from these equations, and is known as the Lorentz force law:
\begin{equation*}
\mathbf{F}_{\textrm{em}}=q(\mathbf{E}+\mathbf{u}\times\mathbf{B}).
\end{equation*}
It is immediately obvious that unlike Newton's second law, the Lorentz force law is not Galileo-invariant, as the velocity term $\mathbf{u}$ will be transformed by the Galileo transformations to $\mathbf{u}'=\mathbf{u}-\mathbf{v}$. Another issue with Maxwell's equations is that they permit wave-like solutions for both the electric and the magnetic field. In a vacuum, the two fields obey the equations
\begin{align*}
\nabla^2\mathbf{E}&=\epsilon_0\mu_0\frac{\partial^2\mathbf{E}}{\partial t^2} \\
\nabla^2\mathbf{B}&=\epsilon_0\mu_0\frac{\partial^2\mathbf{B}}{\partial t^2}
\end{align*}
where $\mathbf{E}$ and $\mathbf{B}$ are the electric and magnetic fields respectively, and $\epsilon_0$ and $\mu_0$ are physical constants known as the vacuum permittivity and the vacuum permeability respectively. To compare, the wave equation for an arbitrary function $\phi$ is given by
\begin{equation*}
\nabla^2\phi=\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2}
\end{equation*}
where $c$ is a constant interpreted as the wave speed. We can therefore infer that for the wave-like solutions of Maxwell's equations, the wave speeds of the electric and magnetic waves in a vacuum are both given by $1/\sqrt{\epsilon_0\mu_0}=c$. As $\epsilon_0$ and $\mu_0$ are physical constants, so too must $c$ be. Physical constants cannot change value depending on your reference frame, so just as the Lorentz force law is not preserved under Galileo transformations, clearly neither are these wave-like solutions.
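
As a quick sanity check on that wave speed, here is a small Python sketch that evaluates $1/\sqrt{\epsilon_0\mu_0}$. The numerical values are the CODATA 2018 recommended values, which are my addition and do not appear in the argument above; the function name is likewise just a label I have chosen.

```python
import math

# CODATA 2018 recommended values in SI units (an assumption of this sketch).
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU_0 = 1.25663706212e-6        # vacuum permeability, N/A^2


def wave_speed(epsilon, mu):
    """Speed of the waves permitted by the wave equation with 1/c^2 = epsilon * mu."""
    return 1.0 / math.sqrt(epsilon * mu)


c = wave_speed(EPSILON_0, MU_0)
print(f"c = {c:.6e} m/s")
```

The result lands on the familiar speed of light, roughly $3\times10^8$ metres per second.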

The resolution to this issue that was most favoured before the advent of relativity was that of the aether. It was postulated that Maxwell's equations were only valid in the reference frame of the luminiferous ("light-bearing") aether, a strange rigid fluid that permeated all space. The properties of the aether needed to be tweaked over time to make them come into line with experimental observations, but this could be managed -- at the price of a large number of assumptions about this apparently unobservable fluid, sometimes seemingly ad hoc. Special relativity offered an alternative which not only matched the data just as well, but was simpler, more predictive and much more elegant.

27 August 2014

News (2014/08/27)

This post might be slightly misleading in the sense that there isn't actually much news to report as of yet; I just thought I would keep my readers in the loop with regards to what's going on.

My next blog post is a work in progress. I don't want to spoil the surprise, but it will be related to the Riemann zeta function, which we have seen before in the first post on this blog, Playing with infinite series. Don't worry though, the post after that will be the final part of the 3-part basics of Fourier analysis series. Fair warning though, these posts might be coming a little later than usual (hence this blog post) as I'm anticipating I'll be quite busy in the next couple of months. There is a chance that I won't make any proper posts in that time, although I'll try to make occasional news posts like this to let everyone know what's going on.

By way of an advance apology, I'm making a recommendation, just as I did in my first news post way back in March of this year. If you haven't heard of it, I highly recommend you take a look at The Conversation, which I'll naively describe as an online newspaper written by academics. There is a UK edition and an Australian edition (the latter of which I've linked to) and the Science + Technology section is usually a good read, although the depth of analysis is excellent across the board, so those with wider tastes should also appreciate it.

Hopefully the next proper post will be on its way soon, but if not I'll be sure the next news post has some proper news to report instead!

15 August 2014

Fourier fun 2: The full monty

Some other waveforms

This post is a sequel to my earlier post Fourier Fun 1: Sums and square waves, which gives a very basic first introduction to Fourier series. If you have not read that post, you may wish to do so before reading this one. 

We finished our last post on Fourier series by examining the Fourier series of the square wave, where we found for a square wave with an amplitude of $1$ and a period of $2\pi$ the Fourier series was given by
\begin{equation}
S(t)=\frac{4}{\pi}\sum_{n\textrm{ odd}} \frac{1}{n} \sin(nt).
\end{equation}
The square wave is one of the standard examples used when introducing Fourier series, but there are other waveforms we might want to consider. Two other standard examples are the triangle wave and the sawtooth wave (fig. 1). The triangle wave is similar to the square wave in that only odd components of the sum contribute:
\begin{equation}
T(t)=\frac{8}{\pi^2}\sum_{n\textrm{ odd}}\frac{(-1)^{(n-1)/2}}{n^2} \sin(nt).
\end{equation}
Apart from a couple of different terms, the main difference between the Fourier series of $S(t)$ and $T(t)$ is the sign term $(-1)^{(n-1)/2}$, which gives $+1$ for $n=1, 5, 9,\ldots$ and $-1$ for $n=3, 7, 11,\ldots$. This ensures the series gives a function which is strongly peaked rather than flat (fig. 2).

Figure 1: Plots of (a) a triangle wave $T(t)$ and (b) a sawtooth wave $W(t)$. 
Note that in this plot (and all others), the $t$-axis is shown in units of $\pi$ such 
that the period of both waves is $2\pi$. 

Figure 2: The Fourier components of the triangle wave (not to scale), with
positive components in blue and negative components in red. The alternating
positive and negative odd sine functions result in constructive interference
at the maximum of the lowest frequency mode. This causes the peak of the final
waveform to be very sharp.

The sawtooth wave is a degree removed from both the square wave and the triangle wave in that, unlike the other two, it is not symmetric about its peaks. Part of the reason both the square wave and triangle wave have no even-numbered components is that even-numbered components have zeroes ($t$-axis crossings) where the square and triangle waves have maxima. Any sine wave that is not displaced from the $t$-axis will have a positive region and a negative region on either side of every zero (fig. 3). If the square and triangle waves had even-numbered components, and those even-numbered components had zeroes aligned with the wave's peaks, reflection symmetry about the peaks would not be possible; the wave would necessarily be 'more positive' on one side of the peak and 'more negative' on the other. As the sawtooth wave is not symmetric about its peaks, we should therefore expect it to include even-numbered components in its Fourier series, and this is indeed the case:
\begin{equation}
W(t)=\frac{2}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n} \sin(nt).
\end{equation}

Figure 3: A half-period of the triangle wave is shown in blue, centred on a
peak. Overlaid in magenta is a sine function with twice the frequency
(half the period). A vertical red line denotes the line of reflection symmetry
of the triangle wave half-period. However, the sine wave is clearly positive
to the left of this line and negative to the right of it. As this is not
consistent with the symmetry of the triangle wave, we can deduce that it
cannot contribute to the Fourier series. 
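
If you would like to play with these three series yourself, the following is a minimal Python sketch of their partial sums, written directly from the formulas for $S(t)$, $T(t)$ and $W(t)$ above. The function names and the `terms` parameter (how many terms of the sum to keep) are my own choices for illustration.

```python
import math


def square_partial(t, terms):
    """Sum the first `terms` odd harmonics of the square wave series S(t)."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1                      # n = 1, 3, 5, ...
        total += math.sin(n * t) / n
    return 4.0 / math.pi * total


def triangle_partial(t, terms):
    """Sum the first `terms` odd harmonics of the triangle wave series T(t)."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1
        # The sign term (-1)^((n-1)/2) is simply (-1)^k here.
        total += (-1) ** k * math.sin(n * t) / n**2
    return 8.0 / math.pi**2 * total


def sawtooth_partial(t, terms):
    """Sum the first `terms` harmonics (odd and even) of the sawtooth series W(t)."""
    total = 0.0
    for n in range(1, terms + 1):
        total += (-1) ** (n - 1) * math.sin(n * t) / n
    return 2.0 / math.pi * total


# Evaluate each series a quarter of the way through its period.
for name, series in [("square", square_partial),
                     ("triangle", triangle_partial),
                     ("sawtooth", sawtooth_partial)]:
    print(name, series(math.pi / 2, 2000))
```

At $t=\pi/2$ the square and triangle partial sums approach $1$ while the sawtooth approaches $1/2$, as you would expect from the plots. Note how much faster the triangle wave converges, thanks to the $1/n^2$ in its coefficients.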

The Fourier series

Even so, all three of the examples we have examined have some things in common that may seem to raise some suspicions. They all have amplitude $1$ and period $2\pi$, yes, but these things can easily be altered by multiplying the amplitude by some scalar factor and changing the period term $T$ in the series respectively. In other words, these things can be changed by tweaking the numbers in the equations we have already encountered. What about the vertical displacement? If we want to raise or lower our function with respect to the $t$-axis, there is no way to do that using the equations we have found thus far.

There is a simple fix for this, as we will see, but before we come to it there is something else considerably more suspicious we have yet to consider. All of the waves we have looked at 'begin'$^1$ in the same place, or rather have the same horizontal displacement, that is to say they have the same phase $\varphi$. What if, rather than having a zero at the origin, we wanted our waves to have a maximum? The solution seems pretty straightforward. If you have an arbitrary function $f(x)$ that you want to shift along the $x$-axis by some distance $\pm x'$, your function then becomes $f(x\mp x')$. The sine function $\sin(2n\pi t/T)$ has period $T/n$, so we shift it a quarter of that period, $T/(4n)$, in the negative direction to give $\sin\big(2n\pi(t+T/(4n))/T\big)=\sin(2n\pi t/T+\pi/2)$. But this, unsurprisingly, is simply our old friend $\cos(2n\pi t/T)$.

So what do we do? Sure, we can replace all of our sines with cosines and recast our series that way, but all we have achieved then is an ability to 'begin' our waves a quarter-period over. It takes us from having only a single 'start' point, to two, which is an improvement without doubt but not incredibly more useful. What we really want, I should think, is the ability to 'start' our wave with whatever horizontal displacement we like. Given our previous accomplishment, this is only a further extension of the same thinking; rather than a quarter-period shift in the negative $t$-direction we choose any arbitrary phase shift $\varphi$ (with the sign of $\varphi$ indicating the direction of the shift) so that our Fourier series is now
\begin{equation*}
f(t)=\sum_{n=1}^{\infty}B_n\sin\left(\frac{2\pi nt}{T}-\varphi\right),
\end{equation*}
where we have called the amplitude $B_n$ to differentiate it from the $b_n$ amplitude that applies to un-shifted sine components.

This is all well and good, but we can do better by making a small but very significant change. What if, rather than applying a single phase shift $\varphi$ to every series component, we apply a different phase shift to each component, such that $\varphi_n$ is the phase shift for the $n^{\mathrm{th}}$ component? The increased flexibility of such a change is immediately obvious. Given this improvement, we can recast our Fourier series yet again into the form
\begin{equation}
\label{eq:fourierprim}
f(t)=\sum_{n=1}^{\infty}B_n\sin\left(\frac{2\pi nt}{T}-\varphi_n\right).
\end{equation}

For the most part, this expression is fine, but let's say for purely aesthetic reasons$^2$ we don't like having that $\varphi_n$ term there messing up our clean sine function notation. This can be resolved using a trigonometric identity:
\begin{align*}
\sin(\theta_1-\theta_2)&=\sin(\theta_1)\cos(\theta_2)-\cos(\theta_1)\sin(\theta_2)\\
\sin\left(\frac{2\pi nt}{T}-\varphi_n\right)&=-\sin(\varphi_n)\cos\left(\frac{2\pi nt}{T}\right)+\cos(\varphi_n)\sin\left(\frac{2\pi nt}{T}\right)\\
&=\alpha_n\cos\left(\frac{2\pi nt}{T}\right)+\beta_n\sin\left(\frac{2\pi nt}{T}\right)
\end{align*}
where $\alpha_n=-\sin(\varphi_n)$ and $\beta_n=\cos(\varphi_n)$. Inserting this result into equation \ref{eq:fourierprim} gives
\begin{align}
f(t)&=\sum_{n=1}^{\infty}B_n\Bigg(\alpha_n\cos\left(\frac{2\pi nt}{T}\right)+\beta_n\sin\left(\frac{2\pi nt}{T}\right)\Bigg)\nonumber\\
&=\sum_{n=1}^{\infty}\Bigg(a_n\cos\left(\frac{2\pi nt}{T}\right)+b_n\sin\left(\frac{2\pi nt}{T}\right)\Bigg),
\end{align}
where we have defined $a_n=B_n\alpha_n$ and $b_n=B_n\beta_n$, such that this $b_n$ is the same one we encountered previously. It can be shown$^3$ that, similar to $b_n$, the Fourier components $a_n$ are given by
\begin{equation}
a_n=\frac{2}{T}\int_{t_0}^{t_0+T}\! f(t)\cos\left(\frac{2\pi nt}{T}\right)\,\mathrm{d}t.
\end{equation}
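
To make the recasting from amplitudes and phases to sines and cosines a little more concrete, here is a minimal Python sketch of the conversion for a single component. It assumes the sign convention of the phase-shifted series above, $B_n\sin(2\pi nt/T-\varphi_n)$, for which the cosine coefficient works out to $-B_n\sin(\varphi_n)$ and the sine coefficient to $B_n\cos(\varphi_n)$. The function names are hypothetical, and `x` stands in for $2\pi nt/T$.

```python
import math


def to_sine_cosine(B, phi):
    """Return (a, b) such that B*sin(x - phi) == a*cos(x) + b*sin(x) for all x."""
    return -B * math.sin(phi), B * math.cos(phi)


def to_amplitude_phase(a, b):
    """Inverse conversion: recover (B, phi) with B >= 0 and phi in (-pi, pi]."""
    return math.hypot(a, b), math.atan2(-a, b)


# Hypothetical example component: amplitude 2 with a phase shift of 0.7 radians.
B, phi = 2.0, 0.7
a, b = to_sine_cosine(B, phi)

# Largest disagreement between the two forms on a grid of sample points.
max_error = max(
    abs(B * math.sin(x - phi) - (a * math.cos(x) + b * math.sin(x)))
    for x in [0.1 * k for k in range(100)]
)

# A quarter-period shift in the negative direction (phi = -pi/2) gives a pure cosine.
a_quarter, b_quarter = to_sine_cosine(1.0, -math.pi / 2)
```

The inverse conversion is there only to show that no information is lost: the pair $(a_n, b_n)$ carries exactly the same content as the pair $(B_n, \varphi_n)$.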

Now we can, at last, return to the matter of the vertical displacement. Just as we considered how to translate an arbitrary function to determine the horizontal displacement, we can do the same for vertical displacement. For an arbitrary function $f(x)$, we shift it vertically by an amount $\pm C$ by making the very straightforward transformation $f(x)\mapsto f(x)\pm C$. Following this, we can modify our Fourier series formula to include a vertical displacement term:
\begin{equation}
\label{eq:Fourier}
f(t)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\Bigg(a_n\cos\left(\frac{2\pi nt}{T}\right)+b_n\sin\left(\frac{2\pi nt}{T}\right)\Bigg),
\end{equation}
where our vertical displacement term, given by $C=a_0/2$, is a matter of convention.$^4$ This is the 'full' Fourier series formula, and with it we can find the Fourier series for any periodic function (in one dimension) with the freedom to pick any finite amplitude, period, vertical or horizontal displacement we choose.
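
As a rough illustration of the full formula in action, the Python sketch below approximates $a_n$ and $b_n$ by a simple midpoint-rule integration over one period and then rebuilds the function from them. The test signal, the sample count and all the function names are hypothetical choices of mine, and the $b_n$ integral is assumed to have the same form as the $a_n$ integral with the cosine swapped for a sine.

```python
import math


def fourier_coefficients(f, period, n_max, samples=4096, t0=0.0):
    """Approximate a_0..a_n_max and b_0..b_n_max by the midpoint rule over one period."""
    dt = period / samples
    ts = [t0 + (k + 0.5) * dt for k in range(samples)]
    a, b = [], []
    for n in range(n_max + 1):
        w = 2.0 * math.pi * n / period
        a.append(2.0 / period * sum(f(t) * math.cos(w * t) for t in ts) * dt)
        b.append(2.0 / period * sum(f(t) * math.sin(w * t) for t in ts) * dt)
    return a, b


def partial_sum(a, b, period, t):
    """Evaluate a_0/2 plus the cosine and sine terms up to n = len(a) - 1."""
    total = a[0] / 2.0
    for n in range(1, len(a)):
        w = 2.0 * math.pi * n / period
        total += a[n] * math.cos(w * t) + b[n] * math.sin(w * t)
    return total


def signal(t):
    """Hypothetical test signal: vertical displacement 0.75 and period 3."""
    return (0.75
            + 2.0 * math.sin(2.0 * math.pi * t / 3.0)
            - 0.5 * math.cos(4.0 * math.pi * t / 3.0))


a, b = fourier_coefficients(signal, period=3.0, n_max=3)
print("vertical displacement a_0/2 =", a[0] / 2.0)
```

For this signal the sketch recovers $a_0/2$ as the vertical displacement, $b_1$ as the sine amplitude and $a_2$ as the cosine amplitude, with everything else coming out as zero to within rounding error.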

This is the second part in a 3-part series on some of the basics of Fourier analysis. Part 3 can be found here.

Notes

$1$. When I say "begin" or "start" in this context, I put the terms in inverted commas because clearly periodic functions have no true beginning or end, they extend indefinitely in however many dimensions they are defined (that is, of course, unless their domain is limited, as is discussed in the "Making approximations" section). I use the terms here to refer to any point we choose to consider as a reference.

$2$. I admit I am being somewhat duplicitous here, as there is another reason to write the Fourier series as a sum of both sines and cosines, which I detail below. It is quite mathematically full-on compared to the rest of this post, so for the faint of heart, you can rest easy in the knowledge that, like much mathematics, there is still an element of aesthetics involved, it is just somewhat more 'deep' than that provided in the main body.

Consider a complex function $g(z)$ such that we can take the Laurent expansion around zero:
\begin{equation*}
g(z)=\sum_{n=-\infty}^{\infty}c_n z^n,
\end{equation*}
where the Laurent coefficients $c_n$ are given by
\begin{equation*}
c_n=\frac{1}{2\pi i}\oint_{\gamma}\!\frac{g(z)}{z^{n+1}}\,\mathrm{d}z
\end{equation*}
where $\gamma$ denotes a non-self-intersecting contour which goes anticlockwise about a closed path enclosing $0$ within an annulus over which $g(z)$ is holomorphic (effectively smooth and well-behaved). What a mouthful! Now let's choose the contour given by $|z|=1$, that is, the unit circle in the complex plane. We can now choose the mapping $z=e^{i\theta}$ ($0\leq\theta\leq 2\pi$) which takes that annulus to a strip surrounding the real axis (and the unit circle to the real number line). We can similarly define a function $f(\theta)$ on this strip such that $f(\theta)=g(z)$. Making the substitution into the Laurent series we find
\begin{equation*}
f(\theta)=g(e^{i\theta})=\sum_{n=-\infty}^{\infty}c_n e^{in\theta}.
\end{equation*}
Considering the Laurent coefficients, we can make a change of variables by taking the derivative $\mathrm{d}z/\mathrm{d}\theta=ie^{i\theta}\rightarrow\mathrm{d}z=ie^{i\theta}\mathrm{d}\theta=iz\,\mathrm{d}\theta$:
\begin{align*}
c_n&=\frac{1}{2\pi i}\oint_{|z|=1}\! \frac{g(z)}{z^{n+1}}\,\mathrm{d}z\\
&=\frac{1}{2\pi i}\int_{0}^{2\pi}\! \frac{f(\theta)}{z^{n+1}}iz\,\mathrm{d}\theta\\
&=\frac{1}{2\pi}\int_{0}^{2\pi}\! \frac{f(\theta)}{z^n}\,\mathrm{d}\theta\\
&=\frac{1}{2\pi}\int_{0}^{2\pi}\! f(\theta)e^{-in\theta}\,\mathrm{d}\theta.
\end{align*}
As in the series $n$ runs from $-\infty$ to $\infty$, we can break up $n$ into positive, negative and $0$ cases. Let's apply Euler's formula $e^{ix}=\cos(x)+i\sin(x)$ to these three cases for $c_n$:
\begin{equation*}
c_n=
\begin{cases}
\frac{1}{2\pi}\int_{0}^{2\pi}\! f(\theta)\big(\cos(n\theta)-i\sin(n\theta)\big)\,\mathrm{d}\theta & \text{for } n > 0 \\
\frac{1}{2\pi}\int_{0}^{2\pi}\! f(\theta)\,\mathrm{d}\theta & \text{for } n = 0 \\
\frac{1}{2\pi}\int_{0}^{2\pi}\! f(\theta)\big(\cos(|n|\theta)+i\sin(|n|\theta)\big)\,\mathrm{d}\theta & \text{for } n < 0.
\end{cases}
\end{equation*}
If we compare these cases to the Fourier component equations for a function $f(\theta)$ with period $T=2\pi$, it is easy to formulate $c_n$ in terms of $a_n$ and $b_n$:
\begin{equation*}
c_n=
\begin{cases}
\frac{1}{2}(a_n-ib_n) & \text{for } n > 0 \\
\frac{a_0}{2} & \text{for } n = 0 \\
\frac{1}{2}(a_{|n|}+ib_{|n|}) & \text{for } n < 0.
\end{cases}
\end{equation*}
This is quite an interesting coincidence! Let's apply these definitions of $c_n$ and Euler's formula to the Laurent series for $f(\theta)$:
\begin{align*}
f(\theta)=\sum_{n=-\infty}^{\infty}c_n e^{in\theta}&=\sum_{n=1}^{\infty}c_{+n} e^{+in\theta}+c_0 e^0+\sum_{n=1}^{\infty}c_{-n} e^{-in\theta}\\
&=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(\frac{1}{2}(a_n-ib_n)\big(\cos(n\theta)+i\sin(n\theta)\big)\right.\\
&\qquad \left.+\frac{1}{2}(a_n+ib_n)\big(\cos(n\theta)-i\sin(n\theta)\big)\right)\\
&=\frac{a_0}{2}+\sum_{n=1}^{\infty}\big(a_n\cos(n\theta)+b_n\sin(n\theta)\big).
\end{align*}
Lo and behold, we have recovered the Fourier series! In doing so we have gone from the Laurent series for a complex function on the unit circle $S^1$ in a very natural way to the Fourier series for a (real) $2\pi$-periodic function, expressed in terms of complex exponentials which, in turn, can be very naturally expressed in terms of sines and cosines thanks to Euler's formula. Now the real meat of why this is significant is beyond the scope of this endnote, but I think for the time being it suffices to say that the link between the Laurent series of complex analysis and periodic functions, expressible through complex exponentials which are then very naturally expressible in terms of sine and cosine functions, is a good argument in favour of representing Fourier series in terms of these sine and cosine functions rather than in terms of sine functions with various phase shifts.
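
For the sceptical, the relationship between $c_n$, $a_n$ and $b_n$ can be checked numerically. The following Python sketch approximates the integral for $c_n$ with a midpoint rule, using a hypothetical real $2\pi$-periodic test function whose coefficients are known in advance; the function names and sample count are assumptions of mine.

```python
import cmath
import math


def complex_coefficient(f, n, samples=2048):
    """Approximate c_n = (1/2pi) * integral of f(theta)*exp(-i*n*theta) over [0, 2pi]."""
    d_theta = 2.0 * math.pi / samples
    total = 0j
    for k in range(samples):
        theta = (k + 0.5) * d_theta
        total += f(theta) * cmath.exp(-1j * n * theta)
    return total * d_theta / (2.0 * math.pi)


def f(theta):
    """Hypothetical test function with a_0/2 = 0.5, a_1 = 1.2 and b_2 = -0.8."""
    return 0.5 + 1.2 * math.cos(theta) - 0.8 * math.sin(2.0 * theta)


c = {n: complex_coefficient(f, n) for n in range(-2, 3)}

# Recover the real coefficients for n >= 1 from c_n = (a_n - i*b_n)/2.
a_1, b_1 = 2.0 * c[1].real, -2.0 * c[1].imag
a_2, b_2 = 2.0 * c[2].real, -2.0 * c[2].imag
```

Because the test function is real, the coefficients for negative $n$ come out as the complex conjugates of those for positive $n$, which is exactly the pairing that lets the imaginary parts cancel in the final sum.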

In truth, this is actually a good argument in favour of sines and cosines instead of any other set of periodic functions, rather than instead of only sines specifically. Even so, this is not the only reason why we might prefer to use both sines and cosines. Another mathematical reason is that sines and cosines are mutually orthogonal over a full period ($\int_0^{2\pi}\sin(nx)\cos(mx)\,\mathrm{d}x=0$, since $\sin(nx)\cos(mx)$ is an odd function and, by periodicity, integrating over $[0,2\pi]$ is the same as integrating over $[-\pi,\pi]$). Of course, there are also very compelling non-mathematical reasons, both in terms of historical development and practicality. The history of Fourier analysis I won't go into, but some practical elements will be discussed in Part 3 and, viewed in that light, the preference for using sines and cosines may become clearer.

$3$. As we have not bothered to define any halfway-rigorous method of determining $B_n$ or $\varphi_n$ for a function $f(t)$, trying to determine the Fourier component $a_n$ from these quantities and $b_n$ is at the very least inadvisable. The best way to find $a_n$ is instead to follow the derivation of $b_n$, which I provided in Note 4 of Part 1 of this series, making suitable substitutions where appropriate. Due to the similarity between the two derivations, I leave this as an exercise for the reader.

$4$. To understand why this convention is chosen, consider the fact that the summation in equation \ref{eq:Fourier} begins at $n=1$ and not $n=0$. What happens if we take our formulae for $a_n$ and $b_n$ and set $n=0$? We quickly find that $b_0=0$ due to the $\sin(0)$ term. However, for $a_0$, we find
\begin{equation*}
a_0=\frac{2}{T}\int_{t_0}^{t_0+T}\!f(t)\,\mathrm{d}t.
\end{equation*}
Let us define $f(t)=g(t)+C$ such that $g(t)$ is the same function as $f(t)$ but shifted vertically so that it is as much above the $t$-axis as below; this will give $C$ as the vertical displacement of $f(t)$.
\begin{align*}
a_0&=\frac{2}{T}\int_{t_0}^{t_0+T}\!\big(g(t)+C\big)\,\mathrm{d}t\\
&=\frac{2}{T}\Big(G(t_0+T)+(t_0+T)C-\big(G(t_0)+t_0 C\big)\Big)\\
&=\frac{2}{T}\big(G(t_0+T)-G(t_0)+TC\big)
\end{align*}
where $G(t):=\int\!g(t)\,\mathrm{d}t$. We can see that $G(t_0+T)-G(t_0)=0$ because the integral effectively gives how much of a function is above or below the axis, and as $g(t)$ is as much above as below the axis over one period (since we defined it that way) the integral over the period $T$ must be $0$. Alternatively, we can exploit the periodicity of $g(t)$ (and hence $G(t)$) to argue that $G(t_0+T)=G(t_0)$ necessarily. Ultimately the effect is the same, that being we find $a_0=2TC/T=2C$ or, equivalently, $C=a_0/2$. Thus we find that the reason for this convention is that it makes the vertical displacement term consistent with the definitions of $a_n$ and $b_n$ that we have already established.

30 July 2014

How do paper planes work?

At a basic level, the flight of all aircraft is determined by four forces: lift (upwards force), thrust (forwards or 'accelerating' force), weight (downwards force, the force of gravity) and drag (backwards or slowing force). Paper aeroplanes are no exception, so this is a good place to start. They are un-powered after they are released, so they experience zero thrust during flight. As they are heavier than air, this makes them gliders, so their flight is best described as gliding (or perhaps falling with style).

Furthermore, the wings of paper aeroplanes provide very little lift. In large aircraft, the wings generate lift by exploiting Bernoulli's principle--a fluid (such as air) will have lower pressure the faster it moves, and vice versa. This is done primarily through the wing shape and the angle of the wing (the angle of attack) causing air to move over the wing faster than under it, causing a pressure differential which makes the wing rise. Do note, however, that the reason for the air moving over the wing faster than under is complicated and is not due to the air going over having to 'catch up' to the air going under (sometimes referred to as the equal transit-time fallacy), which is what I was taught as a kid!

Paper aeroplane wings are typically not curved (cambered) the way mechanical aeroplane wings are. While slight cambering can help provide more lift, not much cambering is required for the wings to generate a lot of drag, and the thinness of the paper minimising drag is a large reason for paper aeroplanes being able to stay airborne for as long as they do. Paper is also light, which helps to minimise the weight force that acts against the lift. 

While paper as a material clearly has some positives when it comes to flight, there are also drawbacks. For example, because paper is weak and most paper planes are constructed primarily by folding, the aspect ratio of paper aeroplane wings generally has to be very low. The aspect ratio of a wing is (very roughly) the ratio of its span (how far it stretches away from the body) to its chord (how far it stretches parallel to the body, see Fig. 1). Low aspect ratio wings, like those on paper aeroplanes, can also be found on fighter jets, while high aspect ratio wings are more common on commuter aircraft. This is because lower aspect ratio wings are generally better for faster flight and a higher aspect ratio is better for slower flight, so a slow paper aeroplane with low aspect ratio wings is disadvantaged. 


Figure 1: Examples of (a) a low aspect ratio wing and (b) a high aspect ratio wing.
The spans are denoted by $b$ while the chords at the fuselage are denoted by $c$.
In general the aspect ratio $AR$ is given by $AR=b^2/S$ where $S$ is the plan area.
Note that this reduces to $AR=b/c$ for a rectangular wing.
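
To put some rough numbers on the aspect ratio formula in the caption above, here is a tiny Python sketch. The wing dimensions are hypothetical values I have made up for illustration (in metres), and both wings are assumed to be rectangular so that the plan area is simply span times chord.

```python
def aspect_ratio(span, plan_area):
    """AR = b^2 / S for a wing of span b and plan area S."""
    return span**2 / plan_area


def rectangular_aspect_ratio(span, chord):
    """For a rectangular wing S = b * c, so AR reduces to b / c."""
    return aspect_ratio(span, span * chord)


# Hypothetical wings: a stubby dart-style paper plane and a long, thin glider-style wing.
dart = rectangular_aspect_ratio(span=0.10, chord=0.25)
glider = rectangular_aspect_ratio(span=0.30, chord=0.05)
print(f"dart AR = {dart:.2f}, glider AR = {glider:.2f}")
```

The stubby wing ends up with an aspect ratio well below 1 and the long, thin wing with one well above it, which is the distinction between the two wings sketched in the figure.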

While the four forces of flight provide a good introduction to how flight works at a basic level, there are many other factors that come into play. One such factor is the stability of the aircraft, which is how well it handles small disruptions. At its simplest, an unstable aircraft will exaggerate the effect of a small disturbance, while a stable aircraft will naturally return to level flight. At neutral stability, a small disturbance would not be exaggerated but the plane would not rectify itself either; it would just continue onwards in the new direction. 

A good paper aeroplane must be stable or it will likely not be able to regularly glide very far. Many paper aeroplanes naturally have a lot of paper folded up near the front of the plane, which helps shift the centre of gravity (the balancing point) forward of the neutral point (the place where, if the centre of gravity were located there, the aircraft would have neutral stability). This helps give the aircraft stability, though too much weight forward will keep the nose down and paper aeroplanes don't have elevators to counteract this. This is only one type of stability (known as longitudinal static stability) and there are a number of others which good paper aeroplane design naturally accounts for. 

One such example is dihedral (bent-up) wings, which give the aeroplane lateral (roll) stability and prevent it from turning onto its back or spiralling, something which anhedral (bent-down) wings may encourage. Another example is winglets (small fins on the wing tips), which provide directional stability, i.e., they help keep the aeroplane headed in one direction, which can help compensate for the lack of a tail fin. For practical purposes, however, adding winglets can induce considerable drag (and thus actually reduce stability) if not angled straight-on. I won't go into more examples here, but as always I encourage the interested reader to pursue the topic further (there is a lot of very approachable material out there!). 

I hope this post has gone some way in explaining how a paper aeroplane flies. I think the best summary of what we've covered here is that it is very similar to how a full-scale aeroplane flies, but dramatically simplified. This is in retrospect somewhat obvious but, to me at least, it was very surprising! A final note: This is an excellent question, and when it was posed to me I couldn't give an answer that I would consider satisfactory (I still have some niggling doubts!). Since then I have gone to a number of Internet sources, listed below, from which I have drawn much of this post. I also solicited the help of my friend Kevin Yost, who knows a good deal more on the topic of aerospace engineering than I do. If you are interested in finding out more, I encourage you to do your own research, but be warned that most of what is out there is lacking in detail, so you may have to do some trawling.

Paper Aeroplanes and Wing-Tip Fins (this site may not be safe)
Wikipedia also has a number of very good pages on aerodynamics

26 June 2014

Fourier fun 1: Sums and square waves

Periodic functions

Please note that this post will assume a familiarity with summation notation. A brief explanation is given in my post on divergent series, which can be found here.

What is a Fourier series? In the simplest of terms, a Fourier series is a method of re-expressing (or approximating) any arbitrary periodic function, that is, a function that repeats itself over and over again (the length of the repeated interval is known as the period), as a sum of sine and cosine functions. This might sound very abstract, but it is an extremely powerful tool that asserts itself in very many fields, a couple of examples of which will be given later on. So how does it work? Let me show you.

In some sense, the most basic periodic function$^1$ is the sine function: $\sin$, although equally we could say it is the cosine function: $\cos$, which is the same as the sine function but shifted a quarter of a period across (see fig. 1). The natural expression of sine ($\sin(t)$) has a period of $2\pi$, or mathematically $\sin(t+2\pi)=\sin(t)$, but the function can be stretched or compressed to have an arbitrary period $T$ such that $\sin\left(\frac{2\pi}{T}(t+T)\right)=\sin\left(\frac{2\pi}{T}t\right)$. Note that for the specific case of $T=2\pi$ this reverts to the "natural" expression as expected. With this in mind, we can define a cosine with period $T$ in terms of sine as $\cos\left(\frac{2\pi}{T}t\right)=\sin\left(\frac{2\pi}{T}\left(t+\frac{T}{4}\right)\right)$.
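
For readers who like to check such things numerically, here is a minimal Python sketch. The period $T=3$ and the sample points are arbitrary choices of mine; the sketch confirms that a sine of period $T$, written as $\sin(2\pi t/T)$, is unchanged when $t$ is shifted by $T$, and that shifting it by $T/4$ gives the cosine of the same period.

```python
import math

def sine_with_period(t, T):
    """Sine stretched or compressed to have period T."""
    return math.sin(2 * math.pi * t / T)

def cosine_with_period(t, T):
    """Cosine stretched or compressed to have period T."""
    return math.cos(2 * math.pi * t / T)

T = 3.0                          # arbitrary period chosen for the example
samples = [0.0, 0.4, 1.1, 2.7]   # arbitrary sample points

# Shifting t by a full period should leave the sine unchanged.
max_period_error = max(
    abs(sine_with_period(t + T, T) - sine_with_period(t, T)) for t in samples
)

# Shifting t by a quarter period should turn the sine into the cosine.
max_shift_error = max(
    abs(cosine_with_period(t, T) - sine_with_period(t + T / 4, T)) for t in samples
)

print(max_period_error, max_shift_error)
```

Both errors should come out at the level of floating-point rounding.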


Figure 1: $\sin(t)$ is shown in blue and $\cos(t)$ is shown in red. In this plot (and all
others unless otherwise noted), the $t$-axis is shown in units of $\pi$ so that
$1.0$ corresponds to $\pi$, $2.0$ corresponds to $2\pi$, and so on. The fact that the
cosine function is related to the sine function by a quarter-period shift in the
negative direction is clearly demonstrated (recall that for $\sin(t)$ and $\cos(t)$ the
period $T$ is $2\pi$, so a quarter period shift is given by $T/4=2\pi/4=\pi/2$ or
$0.5$ in units of $\pi$ as shown on the plot).

This is all well and good, but there are many other periodic functions we might be interested in besides sine and cosine. One simple example is the square wave $S(t)$, which is the signal you get when, for example, you switch a DC power source on and off (and on and off and so on) at regular intervals. This is an example of a digital signal, and mathematically we would represent it as a function that oscillates evenly between $1$ and $-1$ over every period of length $T$ (see fig. 2)$^2$. What we will make special note of, however, is that the square wave looks very much like a flattened, widened $\sin(t)$ (assuming $S(t)$ is defined such that $T=2\pi$). This suggests the question of if and how we might be able to get from a sine wave to a square wave.


Figure 2: $S(t)$ is shown in blue and $\sin(t)$ is shown in red. In this plot,
the similarity between the square wave (as we have defined it) and the
sine wave is apparent.

If we think additively, to get to $S(t)$ from $\sin(t)$ we need to add to the function near the zeroes (but not at them!) to amplify those regions and subtract from the function near the peaks to flatten them out. What's more, whatever function we add to $\sin(t)$ will also have to be periodic with a period that fits a whole number of times into $2\pi$ (that is, a period of $2\pi/n$ for some integer $n$) so that the alterations we make occur everywhere at the right places (so that, for example, we don't add anything to our zero points). It just so happens that we find an excellent candidate in $\sin(3t)$: if we scale $\sin(t)$ and $\sin(3t)$ just right and then add them together, we get a little bit closer to $S(t)$. Repeating the process, we find we get closer if we add an appropriately scaled $\sin(5t)$ and closer again by adding more and more similar sine functions (see fig. 3).


Figure 3: For appropriately chosen amplitudes, the square wave $S(t)$ (shown in blue in the left column)
can be approximated by a sum of sine functions $\sin(nt)$ (shown in red in the left column). The right
column shows the sine functions used for each summation to the immediate left (the Fourier components).
The summations used (a) two components; (b) three components; (c) four components; and (d) five
components.

The square wave

As adding more and more sine functions improves our approximation to the square wave, we can reasonably assume$^3$ that as the number of sine functions we add approaches infinity, the approximation to the square wave becomes arbitrarily accurate. As we noted in the previous section, however, we are only interested in sine functions $\sin(nt)$ with odd values of $n$. Thus we can write our definition of the square wave as
\begin{equation} \label{eq:square1} S(t)=\sum_{n\mathrm{ odd}}b_n\sin(nt), \end{equation}
or equivalently,
\begin{equation} \label{eq:square2} S(t)=\sum^\infty_{m=1}b_{2m-1}\sin\big((2m-1)t\big), \end{equation}
where in the second equation we have just replaced the positive odd numbers $n$ with the positive integers $m$ such that $2m-1=n$ is still always odd. The terms $b_n$ are the Fourier coefficients (think scaling factors or amplitudes) corresponding to each sine function $\sin(nt)$ such that the infinite sum can be expanded as $b_1\sin(t)+b_3\sin(3t)+b_5\sin(5t)+\ldots$.

Equations \ref{eq:square1} and \ref{eq:square2} are both examples of Fourier series. However, they are not very useful as we haven't found the scaling factors $b_n$. How do we find them? It turns out that's a little more complicated. The derivation for these terms involves a little bit of calculus and so is probably a little beyond the scope of this post, but I strongly encourage the interested reader to turn to the notes where I have given a sketched-out version.$^4$ The result we find, however, is that
\begin{equation} \label{eq:b} b_n = \frac{2}{T}\int_{t_0}^{t_0+T} \! f(t)\sin\left(\frac{2\pi nt}{T}\right) \, \mathrm{d}t \end{equation}
for the general case of a periodic function $f(t)$ with period $T$, where $t_0$ is any arbitrary $t$-value of our choosing. Note that as this applies to the general case, the $n$ in this expression could be odd or even.

Let us now calculate the $b_n$ terms for the square wave [A warning: this section requires a little calculus. It isn't as full-on as that required for the derivation which is why I have decided to include it here. If you have vague memories of calculus from high school you should be able to make it through with some care, but do not worry if you have some trouble or you can't follow it]. The first thing we must note is that both $S(t)$ and $\sin(2\pi nt/T)$ are odd functions, that is, they look upside-down when reflected across the $y$-axis (this is expressed mathematically as $f(-t)=-f(t)$). The product of two odd functions is an even function, that is, a function that looks the same when reflected across the $y$-axis (mathematically, $f(-t)=f(t)$; the cosine function is an example of an even function). That means that the product $S(t)\sin(2\pi nt/T)$ must be an even function. We can use this fact to our advantage! When we evaluate our integral from equation \ref{eq:b} we can choose $t_0=-T/2$ so that the integral ranges from $-T/2$ to $T/2$. Because the integrand is even, and therefore symmetric about $t=0$, the integral from $-T/2$ to $T/2$ is double the integral from $0$ to $T/2$: $\int_{-T/2}^{T/2}S(t)\sin(2\pi nt/T)\mathrm{d}t=2\int_{0}^{T/2}S(t)\sin(2\pi nt/T)\mathrm{d}t$ (recall that the integral calculates the area under the function). Using this fact and substituting in $T=2\pi$, equation \ref{eq:b} becomes
\begin{align} b_n &= \frac{4}{T}\int_{0}^{T/2} \! S(t)\sin\left(\frac{2\pi nt}{T}\right) \, \mathrm{d}t \nonumber \\ &= \frac{2}{\pi}\int_{0}^{\pi} \! S(t)\sin(nt) \, \mathrm{d}t. \end{align}
However, as we can see from fig. 2, between $t=0$ and $t=\pi$, $S(t)=1$. This equation now becomes simple to solve:
\begin{align*}b_n &= \frac{2}{\pi}\int_{0}^{\pi} \! \sin(nt) \, \mathrm{d}t \\&= \frac{2}{\pi}\bigg[-\frac{\cos(nt)}{n}\bigg]_0^\pi \\&= -\frac{2}{\pi n}\big(\cos(\pi n)-\cos(0)\big).\end{align*}
As $\cos(0)=1$, while $\cos(\pi n)=-1$ when $n$ is odd and $\cos(\pi n)=+1$ when $n$ is even, we find that
\[b_n=\begin{cases}4/(\pi n) & \text{when } n \text{ is odd} \\0 & \text{when } n \text{ is even}\end{cases}\]
Thus, as expected, only odd $n$ terms contribute to the sum -- a very neat result to fall out! Now that we know the Fourier coefficients we know the Fourier series for the square wave (varying from $-1$ to $1$ with period $T=2\pi$) in full:
\begin{equation} S(t)=\frac{4}{\pi}\sum_{n\textrm{ odd}} \frac{1}{n} \sin(nt). \end{equation}
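
If you would rather see this with numbers than with calculus, here is a minimal Python sketch using NumPy. It approximates the integral for $b_n$ with a simple midpoint sum over one period (starting at $t_0=0$ rather than $-T/2$, which is allowed since $t_0$ is arbitrary) and compares the result with $4/(\pi n)$ for odd $n$ and $0$ for even $n$. It also evaluates partial sums of the series. The function names and the number of sample points are my own choices, not anything canonical.

```python
import numpy as np

def square_wave(t):
    """Square wave of period 2*pi: +1 on [0, pi), -1 on [pi, 2*pi)."""
    return np.where(np.mod(t, 2 * np.pi) < np.pi, 1.0, -1.0)

def sine_coefficient(f, n, T=2 * np.pi, samples=200_000):
    """Midpoint-rule estimate of b_n = (2/T) * integral of f(t) sin(2 pi n t / T) over one period."""
    dt = T / samples
    t = (np.arange(samples) + 0.5) * dt  # midpoints of each sub-interval on [0, T]
    return (2.0 / T) * np.sum(f(t) * np.sin(2 * np.pi * n * t / T)) * dt

def partial_sum(t, n_terms):
    """Sum of the first n_terms odd-n terms of (4/pi) * sin(n t) / n."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    odd_n = 2 * np.arange(1, n_terms + 1) - 1        # 1, 3, 5, ...
    terms = np.sin(np.outer(odd_n, t)) / odd_n[:, None]
    return (4 / np.pi) * terms.sum(axis=0)

# Compare numerical coefficients with the closed form for n = 1, ..., 6.
numeric_b = [sine_coefficient(square_wave, n) for n in range(1, 7)]
exact_b = [4 / (np.pi * n) if n % 2 else 0.0 for n in range(1, 7)]

# The partial sums at t = pi/2 should approach S(pi/2) = 1 as terms are added.
error_5_terms = abs(partial_sum(np.pi / 2, 5)[0] - 1.0)
error_200_terms = abs(partial_sum(np.pi / 2, 200)[0] - 1.0)

print(numeric_b, exact_b, error_5_terms, error_200_terms)
```

The even-$n$ coefficients come out as zero to within rounding, and the error at $t=\pi/2$ shrinks as more terms are included, in line with the behaviour shown in fig. 3.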

This is the first part in a 3-part series on some of the basics of Fourier analysis. Part 2 can be found here.

Notes

$1$. Here "most basic" refers both to the fact that sine and cosine are typically the only periodic functions taught in school and the more fundamental significance of their relationship to the unit circle.

$2$. Of course, the amplitude of the square wave is only a matter of convention (likewise the period); my preferred representation varies between $0$ and $1$ instead of $-1$ and $1$, but the transformation from the main body text version to my preference is simple: $S(t)\mapsto(S(t)+1)/2$. In the body text I used the $-1$ to $1$ form only because it is more immediately relatable to $\sin(t)$ and therefore, I hope, the Fourier sum becomes less conceptually difficult.

$3$. Of course, the convergence of Fourier series has been proven rigorously, but I leave the investigation of that topic to the interested reader. For the examples shown in this post, convergence can be inferred reasonably safely from the appearance of an ever-improving approximation, except notably at non-differentiable (sharp or discontinuous) points.

$4$. Suppose we have a function $f(t)$ that can be expressed purely as a sum of sine functions, similar to the case of $S(t)$. For this arbitrary periodic function we then have \begin{equation*}f(t)=b_1\sin(\omega_1t)+b_2\sin(\omega_2t)+b_3\sin(\omega_3t)+\ldots,\end{equation*} where $\omega_n=2\pi n/T$. Multiplying both sides by $\sin(\omega_nt)$ gives
\begin{align*}f(t)\sin(\omega_nt)&=(b_1\sin(\omega_1t)+b_2\sin(\omega_2t)+b_3\sin(\omega_3t)+\ldots)\sin(\omega_nt) \\ &=b_1\sin(\omega_1t)\sin(\omega_nt)+b_2\sin(\omega_2t)\sin(\omega_nt)+b_3\sin(\omega_3t)\sin(\omega_nt)+\ldots\end{align*}
We then wish to integrate both sides over one period $T$, giving
\begin{equation*} \int_{t_0}^{t_0+T} \! f(t)\sin(\omega_nt) \, \mathrm{d}t=\int_{t_0}^{t_0+T} \! \left(b_1\sin(\omega_1t)\sin(\omega_nt)+b_2\sin(\omega_2t)\sin(\omega_nt)+\ldots\right) \, \mathrm{d}t\end{equation*}
where $t_0$ is any arbitrary $t$-value we wish to choose to begin our integration at (because the function is periodic and the integration is over a period, the choice of starting point is not important).
We now consider the trigonometric identity $\sin(\theta)\sin(\phi)=(\cos(\theta-\phi)-\cos(\theta+\phi))/2$. Using $\omega_nt$ and $\omega_mt$ in place of $\theta$ and $\phi$ respectively we can integrate over a period beginning at $t_0=0$ (without loss of generality) to give
\begin{align*}\int_{0}^{T} \! \sin(\omega_nt)\sin(\omega_mt) \, \mathrm{d}t &= \int_{0}^{T} \! \sin\left(\frac{2\pi nt}{T}\right)\sin\left(\frac{2\pi mt}{T}\right) \, \mathrm{d}t \\ &= \frac{1}{2} \int_{0}^{T} \! \cos\left(\frac{2\pi (n-m)t}{T}\right)-\cos\left(\frac{2\pi (n+m)t}{T}\right) \, \mathrm{d}t.\end{align*}
We now consider two cases:
$1) \: n=m$
\begin{align*}\int_{0}^{T} \! \sin^2(\omega_nt) \, \mathrm{d}t &= \frac{1}{2}\int_{0}^{T} \! 1-\cos\left(\frac{4\pi nt}{T}\right) \, \mathrm{d}t \\ &= \frac{1}{2}\left[t-\frac{T}{4\pi n}\sin\left(\frac{4\pi nt}{T}\right)\right]^T_0 \\ &= \frac{T}{2}-\frac{T}{8\pi n}\sin(4\pi n) = \frac{T}{2} \end{align*}
as $n$ is a non-zero integer and so $\sin(4\pi n)=0$ for any $n$.
$2) \: n\neq m$
\begin{align*} \int_{0}^{T} \! \sin(\omega_nt)\sin(\omega_mt) \, \mathrm{d}t &= \frac{1}{2}\int_{0}^{T} \! \cos\left(\frac{2\pi (n-m)t}{T}\right)-\cos\left(\frac{2\pi (n+m)t}{T}\right) \, \mathrm{d}t \\ &= \frac{1}{2}\left[\frac{T}{2\pi (n-m)}\sin\left(\frac{2\pi (n-m)t}{T}\right)\right. \\ &\qquad \left.-\frac{T}{2\pi (n+m)}\sin\left(\frac{2\pi (n+m)t}{T}\right)\right]^T_0 \\ &= \frac{T}{4\pi}\left(\frac{1}{(n-m)}\sin(2\pi (n-m))-\frac{1}{(n+m)}\sin(2\pi (n+m))\right) \\ &=0 \end{align*}
as $n$ and $m$ are non-zero integers and so $\sin(2\pi (n\pm m))=0$ for any $n$ and $m$.
These two cases can be summarised as $\int_{0}^{T} \! \sin(\omega_nt)\sin(\omega_mt) \, \mathrm{d}t = T\delta_{nm}/2$ where $\delta_{nm}$ is the Kronecker delta, which gives $1$ when $n=m$ and $0$ otherwise. This property is known as 'orthogonality', and the set of sine functions of the form $\sin(\omega_nt)$ provides an example of such orthogonal functions.
We can use this orthogonality result to evaluate the integral:
\begin{align*} \int_{t_0}^{t_0+T} \! f(t)\sin(\omega_nt) \, \mathrm{d}t &= \int_{t_0}^{t_0+T} \! \left(b_1\sin(\omega_1t)\sin(\omega_nt)+b_2\sin(\omega_2t)\sin(\omega_nt)+\ldots\right) \, \mathrm{d}t \\ &=\int_{t_0}^{t_0+T} \! b_n\sin^2(\omega_nt)  \, \mathrm{d}t \\ &= b_n\frac{T}{2} \\ \Rightarrow b_n &= \frac{2}{T}\int_{t_0}^{t_0+T} \! f(t)\sin(\omega_nt) \, \mathrm{d}t \end{align*}
where in going to the second line we have used the fact that $\int_{t_0}^{t_0+T} \! b_m\sin(\omega_mt)\sin(\omega_nt)  \, \mathrm{d}t=0$ for all $m\neq n$, leaving only the $m=n$ case relevant, the result from which is used in going to the third line.
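
As a numerical sanity check on the orthogonality result in Note 4, here is a minimal Python sketch using NumPy. It approximates $\int_0^T\sin(\omega_nt)\sin(\omega_mt)\,\mathrm{d}t$ with a midpoint sum; the period $T=2$ and the particular values of $n$ and $m$ are arbitrary choices of mine.

```python
import numpy as np

def sine_overlap(n, m, T=2.0, samples=100_000):
    """Midpoint-rule estimate of the integral of sin(w_n t) sin(w_m t) over [0, T], with w_n = 2 pi n / T."""
    dt = T / samples
    t = (np.arange(samples) + 0.5) * dt  # midpoints of each sub-interval
    return np.sum(np.sin(2 * np.pi * n * t / T) * np.sin(2 * np.pi * m * t / T)) * dt

same_index = sine_overlap(3, 3)        # expected to be close to T/2
different_index = sine_overlap(2, 5)   # expected to be close to 0

print(same_index, different_index)
```

With $T=2$ the $n=m$ case should come out close to $T/2=1$ and the $n\neq m$ case close to $0$, matching $T\delta_{nm}/2$.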

16 June 2014

News (2014/06/22)

This news post will be a short one, because I think the news should be pretty obvious -- I've started a new blog! Background Independence will be a home for all of my writings on physics, mathematics, science communication, education, and everything in that vein. With some luck I'll be updating it semi-regularly, so you should have good reason to keep track!

An introduction to the blog can be found on the About page, although I'll be running it in much the same way as Philosophia Mea (which I won't be abandoning completely!) so long-time readers won't need much catching up. One small note, I discourage readers from viewing the blog on mobile devices (not including tablets) if they can. In order to make the mathematics render on mobile I've had to use a mobile template which is not especially user-friendly, and strongly resembles the standard version. Everything is still there, but for the time being you might have to do a lot of reading zoomed-in, with lots of side-scrolling.

In physics news, the controversy regarding the BICEP2 results rages on, with an altered version of their initial pre-print paper being published in Physical Review Letters, a premier physics journal. How significant the alterations are is a matter of perspective, but certainly questions about the accuracy of the group's conclusions are yet to be answered, and likely won't be at least until the Planck group releases their data, which is expected to be more precise, around October.

In somewhat less science-y news, a couple of weeks ago was a rare full moon Friday 13th, which you would think would be off the bad luck charts. I'm not superstitious myself, and as far as I can remember my day went by fairly unremarkably. It was my intention to write a belated post (tongue firmly in cheek) on ghost fields in particle physics to mark the occasion, but unfortunately life has gotten in the way and I haven't had the time, nor likely will I any time soon, so that Wikipedia article will have to do for the time being.

Apart from that, I'll be restarting work on the Fourier post in earnest soon, so I expect to have it up in full or in part within the next couple of weeks, so keep an eye out!

News (2014/05/29)

First of all, I must apologise for the fact that I haven't been updating my blog as much as I'd intended to over the past few weeks. As often happens, real life has gotten in the way, including (for example) my graduation. I can now officially carry the post-nominal letters BScAdv(Hons), although anyone concerned for my ever-inflating ego will no doubt be glad to hear that I have very little intention of doing so.

In terms of the blog, I'm currently working on a post on Fourier analysis and the following post will likely be on the physics of mechanical flight, by popular request. At the current rate I'm going that might not be for a while though!

In physics news, since my last news post some rumours have arisen that the BICEP2 results may not be as sound as initially claimed. Basically, Adam Falkowski has claimed that the BICEP2 collaboration have miscalculated the galactic foreground radiation by misinterpreting an image on a Planck collaboration slide, and their primordial polarisations can mostly be accounted for due to this error (a claim that the BICEP2 team strongly denies). A sceptical take on the rumour is provided by Sesh Nadathur, who argues that the issue has been blown far out of proportion. It will be interesting to see how things unfold in the coming months!

Why Heisenberg uncertainty is not that weird

Whenever quantum mechanics (QM) is brought up in a popular context, in a scientific or pseudo-scientific way, the 'weirdness' of it is almost always mentioned, and the Heisenberg uncertainty principle$^1$ is almost always the go-to example of the weirdness (although in pseudo-scientific contexts it is almost always misrepresented).

So what is Heisenberg uncertainty? Simply put, it is a restriction on the accuracy of simultaneous measurements of 'observables' (measurable quantities). The prototypical example is position and momentum; Heisenberg uncertainty states that the position and momentum of a particle$^2$ cannot be known simultaneously to arbitrary precision. The mathematical statement of the position-momentum uncertainty principle is
\begin{equation}
\Delta x\Delta p\geq\frac{\hbar}{2}
\end{equation}
where $\hbar$ is the reduced Planck constant and $\Delta x$, $\Delta p$ represent (in some sense)$^3$ the uncertainty in $x$ and $p$ respectively. Effectively, the better you know position, the less well you know momentum, and vice versa. This is not a specifically experimental limitation, but a fundamental theoretical one. This sort of restriction, on first viewing, indeed seems very strange and certainly counter-intuitive. I will attempt to convince you that not only is it not necessarily strange, but expected.

What is important to remember about QM is that wave mechanics is a central theme. Particles are represented by wave-functions, which are complex solutions to the Schrödinger equation,$^4$ and this wave-nature contributes to a good deal of the quantum weirdness we are familiar with (an example is shown in a recent blog post of mine which relies on superposition and destructive interference of photon waves). With this in mind, let's take a look at some wave-functions.

For illustrative purposes, we will work in one spatial dimension and free space (zero potential everywhere) and only consider time-independent wave-functions. The simplest example of such a wave-function is the plane wave, which takes the form
\begin{equation}
\psi(x)=Ae^{ikx}\equiv Ae^{ipx/\hbar}.
\end{equation}
Here $A$ is the complex-valued amplitude. The amplitude in this case is not important, because any wave-function must be normalisable (as the probability distribution function, which must integrate to 1 over all space to preserve conservation of probability, is given by $|\psi(x)|^2$, known as the Born rule) and so $A$ will need to be scaled anyway. For those who are unfamiliar with complex exponential form, the waviness is more explicit in the less compact form $\exp{(ikx)}\equiv\cos{(kx)}+i\sin{(kx)}$. The $k$ is the wave-number (or wave-vector in higher dimensions), and this term appears naturally in most of the mathematics I'm presenting in this post. For this reason I will include the version of the equations with $k$ alongside the version with the more physically immediate momentum $p$ (the conversion is simply $p=\hbar k$).

Because the Schrödinger equation is linear, sums of solutions are themselves solutions (this property is known as the superposition principle). That means we can have wave-functions of the form
\begin{equation}
\psi(x)=\sum_{m=0}^{n}A_me^{ip_mx/\hbar}
\end{equation}
for any arbitrary $n$ (finite or infinite). Here the scaling of the $A$ values is still not important because of normalisation, but the relative magnitude of them is, as this determines the relative probability weightings according to the Born rule. However, because we have wavelength $\lambda_m=2\pi/k_m\equiv2\pi\hbar/p_m$, we can see that the above formulation does not capture the full number of possible modes; only integer multiples of the $m=0$ mode are captured. In free space all modes are permissible and so we can take the continuum limit (let the sum turn into an integral):
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{\psi}(k) e^{ikx}\mathrm{d}k\equiv\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\tilde{\psi}(p) e^{ipx/\hbar}\mathrm{d}p.
\end{equation}
Because we have moved away from integer indexing, the discrete set of amplitudes $A_m$ is replaced by the continuous function that is suggestively symbolled $\tilde{\psi}$. The function ranges over $p$ because we are integrating over all possible modes/wavelengths/momenta—in a sense $p$ takes over the role of index in the integral from $m$ in the summation. The factor of $1/\sqrt{2\pi}$ is a matter of convention and the factor of $1/\sqrt{\hbar}$ comes from the change of $k$ to $p$.

There is more to $\tilde{\psi}$ than meets the eye. Not only is it the amplitude function for the integral, but it's actually the wave-function itself, except not in physical space like $\psi$ but in momentum space.$^5$ In the context of QM this is known as the momentum space representation of the wave-function, but more broadly the mathematical construct is known as the Fourier transform,$^6$ and Fourier transforms occur very frequently in all manner of physical theories involving waves, be they QM, acoustics, optics, crystallography, signal analysis and so on.

So how do we determine the form of $\tilde{\psi}$? As it turns out, perhaps unsurprisingly, the Fourier transform is invertible, and so we find that
\begin{equation}
\tilde{\psi}(p)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\psi(x) e^{-ipx/\hbar}\mathrm{d}x.
\end{equation}
All well and good, but how do we make sense of it? Well, let's consider a limiting case. We can select out a single mode by using a Dirac delta such that $\tilde{\psi}(p)=\delta(p-p_0)$. This Dirac delta is zero everywhere except at $p=p_0$, where it is undefined but the area under the Dirac delta is always normalised to $1$. Inserting the delta into equation (4) yields$^7$
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\delta(p-p_0)e^{ipx/\hbar}\mathrm{d}p=\frac{e^{ip_0x/\hbar}}{\sqrt{2\pi\hbar}},
\end{equation}
which is, outside a numerical factor, effectively a complex exponential in $x$, or equivalently, a flat (complex-valued) wave across all space (applying the Born rule to a plane wave yields a probability distribution of $|e^{iz}|^2=\text{const.}$). So when the momentum is maximally well-defined (put into a single mode) the position is maximally poorly-defined (the wave-function is spread evenly across all space). By the invertibility of the Fourier transform, we can expect the vice versa case to hold (a maximally defined position, i.e., Dirac delta in $x$, will result in a maximally poorly-defined momentum, i.e., an even wave across all momentum space). This relationship lies at the heart of the Heisenberg uncertainty principle.



In this figure, the blue curve is the probability distribution for
a Gaussian wave-packet and the red curve is the Fourier transform
of the blue curve. All sub-figures are to the same (arbitrary) scale.
(i) The blue curve is given by $|\exp{(-x^2+ix)}|^2$. (ii) The blue
curve is given by $|\exp{(-(2x)^2+i(2x))}|^2$. (iii) The blue curve
is given by $|\exp{(-(4x)^2+i(4x))}|^2$. It is clear to see that as
the width of the blue curve is decreased, the width of its Fourier
transform correspondingly increases. 


It is easy to see why (physically speaking) a Dirac delta in momentum space produces a plane wave in position space, but perhaps not so for the other way around. One way to think of it is by considering the example of a Gaussian wave-packet in position space, which takes the Gaussian distribution as its probability distribution, making it somewhat localised (it's non-zero across all space, but asymptotes to zero approaching $\pm\infty$). It is not hard to show that the Fourier transform of the Gaussian wave-packet is also a Gaussian wave-packet. As we decrease the width of the position space wave-packet, we expect to require higher-frequency momentum modes as these will have shorter wavelengths and thus allow for more cancellation closer to the peak, therefore the momentum space wave-packet must correspondingly increase in width. Taking the limiting case, as the position space wave-packet becomes increasingly peaked the momentum space wave-packet must become increasingly spread-out, leading to the Dirac delta-plane wave correspondence we expected. This argument can also be used in reverse to achieve the more intuitive result of a Dirac delta in momentum space leading to a plane wave in position space.
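
The trade-off between the two widths can also be checked numerically. The following is a minimal Python sketch using NumPy, not a rigorous calculation: it takes the three wave-packets from the figure, $\exp(-(ax)^2+iax)$ with $a=1,2,4$, evaluates the Fourier transform by a direct Riemann sum, and computes the standard deviations of $|\psi(x)|^2$ and $|\tilde{\psi}(k)|^2$. I work in wave-number $k$ rather than momentum (recall $p=\hbar k$), so the known Gaussian result $\sigma_x\sigma_p=\hbar/2$ reads $\sigma_x\sigma_k=1/2$. The grid sizes and helper names are my own assumptions.

```python
import numpy as np

X = np.linspace(-6.0, 6.0, 1201)    # position grid (arbitrary units)
K = np.linspace(-40.0, 40.0, 801)   # wave-number grid (arbitrary units)

def std_dev(grid, density):
    """Standard deviation of a (not necessarily normalised) density sampled on a uniform grid."""
    weights = density / np.sum(density)
    mean = np.sum(grid * weights)
    return np.sqrt(np.sum((grid - mean) ** 2 * weights))

def packet_widths(a):
    """Return (sigma_x, sigma_k) for the wave-packet exp(-(a x)^2 + i a x)."""
    psi_x = np.exp(-(a * X) ** 2 + 1j * a * X)
    dx = X[1] - X[0]
    # Fourier transform as a Riemann sum: psi_k(k) = (1/sqrt(2 pi)) * sum of psi(x) exp(-i k x) dx
    psi_k = (np.exp(-1j * np.outer(K, X)) @ psi_x) * dx / np.sqrt(2 * np.pi)
    return std_dev(X, np.abs(psi_x) ** 2), std_dev(K, np.abs(psi_k) ** 2)

widths = {a: packet_widths(a) for a in (1, 2, 4)}
products = {a: sx * sk for a, (sx, sk) in widths.items()}

print(widths)
print(products)
```

As $a$ increases the position-space width shrinks and the wave-number width grows, while their product stays close to $1/2$.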


As hinted at the start of the post, position and momentum are not the only observables which obey Heisenberg uncertainty. The next most common pairing$^8$ is energy and time,$^9$ although uncertainty relationships can be generated more generally by taking the derivative of the (classical) action (and quantising). For example, momentum is the derivative of the action with respect to position, energy is the derivative of the action with respect to time, and so on.

Hopefully I have demonstrated that Heisenberg uncertainty is not so strange as might have first appeared. Rather than an arbitrary restriction on how accurately we can know certain measurable quantities, it is in fact a basic and unavoidable feature of any linear wave theory, of which quantum mechanics is only one example, albeit of a more fundamental and therefore perhaps more intuitively challenging sort than most.

Notes

$1$. The Heisenberg uncertainty principle is so ubiquitous in quantum physics that it is frequently referred to simply as 'the uncertainty principle'. Out of habit, however, I tend to use the less common 'Heisenberg uncertainty' to differentiate it from other, admittedly much less common uncertainty relations. As far as I know, any of these uses are considered acceptable.

$2$. I hesitate to use the word 'particle', as it is important when talking about these quantum concepts to be very clear about what one means. A better term might have been 'quantum' or 'wave-particle', as the wave-nature of QM is central to the discussion of Heisenberg uncertainty. However, despite the slightly misleading connotations, 'particle' is by far the most commonly used term and so I will use it also.

$3$. We could, for example, take the standard deviation in $x$ and $p$, $\sigma_x$ and $\sigma_p$, as these will be represented by continuous distributions.

$4$. In this post I will use the Schrödinger representation of QM as this makes explicit the wave-nature of the wave-function. However, there are many, many representations of QM, each with their own advantages (and disadvantages) when it comes to analysing real systems, but importantly they are all exactly equivalent and so this wave-nature I have been emphasising is intrinsic to all of them. In that sense I could just as well have chosen any representation and this blog post would otherwise have been identical, although perhaps not as easy to understand.

$5$. If you don't know what momentum space is, the most important thing to understand is that there are mathematical spaces other than the space(time) we are familiar with. Let's consider the flat 3-dimensional "position" space (3-space) of ordinary life, with $x$-, $y$- and $z$-directions. Suppose we have an object at the coordinates $x=1$, $y=0$ and $z=0$. This can be represented by a vector in the 3-space going to the point $(1,0,0)$, thus describing the position. Now suppose the 3-space we are looking at is part of a 4-space that includes time, except we are going to set the time to some instant and freeze it there.

Let's say at that instant the object at $(1,0,0)$ is travelling with a momentum of $0$ in the $x$-direction, $1$ in the $y$-direction and $0$ in the $z$-direction (in arbitrary units). We could then construct a "momentum" 3-space (or 4-space including time) with directions $p_x$, $p_y$ and $p_z$ and at that instant of time the vector corresponding to the object would be at the coordinates $(0,1,0)$. So for any $n$-dimensional position space it's easy to see there is a corresponding $n$-dimensional momentum space. In fact, we can define a $2n$-dimensional space known as the phase space by combining the position and momentum spaces, and that space will describe all possible states of a physical system.

$6$. I want to stress that strictly speaking the momentum representation of the wave-function is not the Fourier transform of the position representation. The Fourier transform dual to position is the wave-vector (or wave-number in 1-D) and not the momentum, although given one is a scalar multiple of the other I feel like we can be a little loose in our communication in this one respect.

$7$. In evaluating equation (6) we have made use of the so-called sifting property of the Dirac delta, where $\int_{-\infty}^{\infty}\delta(x-a)f(x)\mathrm{d}x=f(a)$. This property is analogous to that of the Kronecker delta, but applies to integrals rather than sums, and is arguably the Dirac delta's most useful feature.
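The sifting property can be checked numerically. The sketch below is only an illustration: it assumes the Dirac delta can be stood in for by a very narrow normalised Gaussian of width `eps`. The function names, the test function $\cos(x)$ and the point $a=0.5$ are all arbitrary choices made for the example.

```python
import math

def narrow_gaussian(x, a, eps):
    """Normalised Gaussian centred on a; stands in for delta(x - a) as eps -> 0."""
    return math.exp(-((x - a) ** 2) / (2 * eps ** 2)) / (eps * math.sqrt(2 * math.pi))

def sift(f, a, eps=1e-3, half_width=10, steps=20001):
    """Trapezoid-rule estimate of the integral of delta(x - a) f(x) dx.

    The integration window is a +/- half_width * eps, which is where
    essentially all of the Gaussian's weight lives.
    """
    lo = a - half_width * eps
    h = 2 * half_width * eps / (steps - 1)
    total = 0.0
    for n in range(steps):
        x = lo + n * h
        weight = 0.5 if n in (0, steps - 1) else 1.0
        total += weight * narrow_gaussian(x, a, eps) * f(x)
    return total * h

estimate = sift(math.cos, 0.5)
print(estimate, math.cos(0.5))  # the two numbers should agree closely
```

Shrinking `eps` brings the estimate closer to $f(a)$, which is the sense in which the delta "picks out" the value of the function at a single point.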

$8$. These pairs of variables are typically referred to in physics as 'conjugate variables', although in this context we can also call them Fourier transform duals. It is important to remember that we would be less inclined to refer to them as such if we were working in, for example, the Heisenberg representation of QM where the Heisenberg uncertainty principle arises more directly out of the non-commutativity of Hermitian operator matrices. This is only because the Fourier transforms are implicit in that representation; they are still there in some sense due to the equivalence of representations of QM as discussed in Note 4, but are not nearly as obvious.

$9$. The energy-time Heisenberg uncertainty principle is given mathematically as $\Delta E\Delta t\geq\hbar/2$. This is analogous to the position-momentum case: just as the Fourier dual of position is the wave-number $k=p/\hbar$ and not momentum directly, the Fourier dual of time is technically the angular frequency $\omega=E/\hbar$ and not energy directly. As in the position-momentum case, however, the difference is only a scalar factor of $\hbar$ and so we can speak reasonably loosely with some impunity.

News (2014/04/10)

Two items of news to report this week (well, one and a half at least). The half-piece of news is that I'm working on a new blog post which will hopefully be posted some time next week (although it might take until the week after).

The real news is that the LHC has confirmed the existence of Z(4430), a so-called "exotic hadron". Hadrons are composite particles made of quarks (and held together by gluons). According to the quark model, hadrons can only form in one of two ways: as a quark-antiquark pairing (known as a meson) or as a quark triplet (known as a baryon).

The most common hadrons in the universe are protons and neutrons, which form atomic nuclei. Protons consist of 2 up-type quarks and 1 down-type quark ("uud") while neutrons consist of 1 up quark and 2 down quarks ("udd"). The names and details of quark types, which are known as "flavours", are not something I will go into here, as it is an interesting enough topic to deserve its own post (although a proper explanation for the layperson would need a little more than that I think).

The quark model is a simple one though and does not describe all of the dynamics permitted by quantum chromodynamics ("QCD", the part of the Standard Model that describes strong interactions). This leaves open the door for exotic hadrons which are not mesons or baryons. Z(4430) is one such exotic hadron.

It was first 'discovered' in 2007 (although 5-sigma confirmation didn't come until 2008) and has now been observed at the LHCb experiment with a significance of 13.9 sigma. This means the chances of the observation being a statistical fluke are $1$ in $1.579\times10^{43}$ (a very, very, very large number indeed). It is believed to be a tetraquark made up of 1 charm quark, 1 charm antiquark, 1 down quark and 1 up antiquark ("$c\bar{c}d\bar{u}$").
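For the curious, the conversion from a number of sigma to "1 in N" odds can be sketched in a few lines of Python. This assumes the fluctuation is Gaussian and that the quoted odds use the two-sided tail probability; that convention is an assumption here, chosen because it reproduces a number close to the figure quoted above. The function name is just for illustration.

```python
import math

def two_sided_p_value(n_sigma):
    """Probability that a Gaussian variable lands more than n_sigma standard
    deviations from its mean, in either direction (assumed convention)."""
    return math.erfc(n_sigma / math.sqrt(2))

p = two_sided_p_value(13.9)
print(f"p-value: {p:.3e}")
print(f"odds: 1 in {1 / p:.3e}")
```

Using the one-sided tail instead would halve the p-value and double the odds, so the convention matters when comparing quoted figures.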

While perhaps not as exciting as, for example, the BICEP2 result recently, this confirmation is still a very interesting result and will hopefully spur on further developments in the search for exotic hadrons.

A very dangerous factory

Suppose a new type of bomb is invented whose detonation device is so incredibly sensitive that if it comes into contact with a single particle it will explode. Putting aside the impracticality of such a weapon (and the obvious factory OH&S issues), the producer wishes to maintain quality control as, with anything, some bombs will be faulty and not have detonation devices attached. The question immediately arises: Is it possible to have some ensemble of bombs which we can guarantee contains no faulty weapons?

This question is known as the Elitzur-Vaidman bomb-testing problem, and although one can arrive after reasonably little thought at the fairly obvious answer that no such ensemble is possible (as any direct observation using light or matter will detonate any working bombs), in actual fact such an ensemble is possible! How can this be the case? The short answer is: quantum effects. The long answer? Read on!

Figure 1: A Mach-Zehnder interferometer with faulty bomb $B$ in place and all branches labelled. Note that the $d$-branch is drawn only for illustrative purposes; the photon cannot be detected along the $d$-branch due to destructive interference (see equation 1). (1) The single-photon source $S$. (2) One of the two 50:50 beam-splitters, which are both assumed to be lossless. (3) One of the two mirrors, which are assumed to be perfectly reflective. (4) One of the two detectors, which are assumed to be perfect detectors.

The solution to this problem involves the use of a Mach-Zehnder interferometer (Fig. 1) with a single-photon source. To see how, let's consider the case of the interferometer without any bomb in place. We then have
\begin{align}\label{eq:MZ}
\left|s\right\rangle &\rightarrow \frac{i}{\sqrt{2}}\left|u\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}\left|c\right\rangle + \frac{i}{\sqrt{2}}\left|d\right\rangle\right) + \frac{1}{\sqrt{2}}\left(\frac{i}{\sqrt{2}}\left|c\right\rangle + \frac{1}{\sqrt{2}}\left|d\right\rangle\right) \nonumber \\
&= \frac{i}{2}\left|c\right\rangle + \frac{-1}{2}\left|d\right\rangle + \frac{i}{2}\left|c\right\rangle + \frac{1}{2}\left|d\right\rangle \nonumber \\
&= i\left|c\right\rangle,
\end{align}
where $\left|a\right\rangle$ represents the quantum state in the $a$-branch of the interferometer (as labelled in Fig. 1) and $i$ is the imaginary unit.$^{1}$ What the above calculation shows$^{2}$ is that (somewhat surprisingly) despite the branching at the second beam-splitter, destructive interference along $d$ and constructive interference along $c$ cause the photon to always be detected at $C$ and never at $D$ (for this alignment).
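The bookkeeping in equation 1 is easy to reproduce with complex arithmetic. The following is a minimal Python sketch, assuming the same convention as the equation (a factor of $1/\sqrt{2}$ on transmission and $i/\sqrt{2}$ on reflection) and ignoring the mirrors, which only contribute a phase common to both arms. The function and variable names are purely illustrative.

```python
import math

# Convention from equation 1: transmission -> 1/sqrt(2), reflection -> i/sqrt(2).
T = 1 / math.sqrt(2)
R = 1j / math.sqrt(2)

def first_splitter(amp_s):
    """|s> -> R|u> + T|v>."""
    return {"u": R * amp_s, "v": T * amp_s}

def second_splitter(amps):
    """|u> -> T|c> + R|d>  and  |v> -> R|c> + T|d>."""
    return {
        "c": T * amps["u"] + R * amps["v"],
        "d": R * amps["u"] + T * amps["v"],
    }

# Empty interferometer: start with amplitude 1 in the source branch.
amps = second_splitter(first_splitter(1.0))
probs = {branch: abs(a) ** 2 for branch, a in amps.items()}

print(amps)   # amplitude close to i in c, close to 0 in d
print(probs)  # probability close to 1 at C, close to 0 at D
```

The two contributions to the $d$-branch amplitude have opposite signs and cancel, while the two contributions to the $c$-branch add.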


Figure 2: A Mach-Zehnder interferometer with working bomb $B$ in place and all branches labelled. Note that $B$ blocks the $u$-branch whether the photon interacts with the detector or not (the case of an interaction is not illustrated here as this would correspond to the detonation of the bomb).

Now let's consider the same Mach-Zehnder interferometer but with a bomb placed such that the detector will be along the $u$-branch (as shown in Fig. 2). In this case we have
\begin{align}\label{eq:bomb}
\left|s\right\rangle\left|B_0\right\rangle &\rightarrow \frac{i}{\sqrt{2}}\left|u\right\rangle\left|B_0\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle\left|B_0\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left|X\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle\left|B_0\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left|X\right\rangle + \frac{1}{\sqrt{2}}\left(\frac{i}{\sqrt{2}}\left|c\right\rangle + \frac{1}{\sqrt{2}}\left|d\right\rangle\right)\left|B_0\right\rangle \nonumber \\
&= \frac{i}{\sqrt{2}}\left|X\right\rangle +\frac{i}{2}\left|c\right\rangle\left|B_0\right\rangle + \frac{1}{2}\left|d\right\rangle\left|B_0\right\rangle,
\end{align}
where $\left|B_0\right\rangle$ is the 'primed' or unexploded bomb, $\left|X\right\rangle$ represents the state where the bomb has been detonated$^3$ and $\left|a\right\rangle\left|b\right\rangle\equiv\left|a\right\rangle\otimes\left|b\right\rangle$. Note that for the purposes of this thought experiment we are assuming the detonator is a perfect detector, i.e., the photon wave cannot travel down $u$ without being absorbed.

As is clear from equation 2, the inclusion of the detonator destroys the constructive/destructive interference that caused the simplification in equation 1. Therefore, in the detonator case, rather than having every photon detected at $C$, we have the photon detected at $C$ with a probability of $1/4$, detected at $D$ with a probability of $1/4$ and the bomb detonated with a probability of $1/2$.$^4$
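These probabilities can be checked with a short Python sketch that follows equation 2 step by step. It assumes the same beam-splitter convention as above and, as in the text, treats the detonator as a perfect absorber on the $u$-branch. The function name and the dictionary keys are hypothetical labels chosen for the example.

```python
import math

# Same convention as equations 1 and 2.
T = 1 / math.sqrt(2)
R = 1j / math.sqrt(2)

def bomb_outcome_probabilities(bomb_is_live):
    """Outcome probabilities for one photon sent through the interferometer."""
    # After the first beam-splitter: |s> -> R|u> + T|v>.
    amp_u, amp_v = R, T

    if bomb_is_live:
        # A perfect detonator absorbs everything in the u-branch: |u>|B_0> -> |X>.
        amp_x, amp_u = amp_u, 0.0
    else:
        amp_x = 0.0

    # Second beam-splitter acts on whatever amplitude is left in u and v.
    amp_c = T * amp_u + R * amp_v
    amp_d = R * amp_u + T * amp_v

    return {
        "explodes": abs(amp_x) ** 2,
        "C": abs(amp_c) ** 2,
        "D": abs(amp_d) ** 2,
    }

print(bomb_outcome_probabilities(bomb_is_live=True))   # about 1/2, 1/4, 1/4
print(bomb_outcome_probabilities(bomb_is_live=False))  # about 0, 1, 0
```

With a faulty bomb the function reduces to the empty interferometer of equation 1, so a click at $D$ can only happen when a working detonator is present.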

This is what makes it possible to assemble a set of functional bombs without detonating them—if a photon is detected by $D$ then the bomb must have a detonator attached and so we can set it aside knowing it works. If a photon is detected by $C$ then the functionality is indeterminate as we expect a detection at $C$ with non-zero probability in both detonator and no-detonator cases, but this is not a problem as we can simply emit another photon and re-run the test.
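To see what re-running the test buys us, here is a small Python sketch that tracks a working bomb through repeated photons. It assumes the per-photon probabilities from the main text ($1/2$ detonation, $1/4$ at $C$, $1/4$ at $D$) and that we fire another photon after every detection at $C$. The function name and the cap on the number of photons are illustrative choices.

```python
from fractions import Fraction

# Per-photon probabilities for a working bomb, taken from the main text.
P_EXPLODE = Fraction(1, 2)
P_C = Fraction(1, 4)
P_D = Fraction(1, 4)

def outcome_after_retries(max_photons):
    """Return (certified, exploded, still_untested) probabilities for a working
    bomb when we re-test after every click at C, up to max_photons photons."""
    certified = Fraction(0)
    exploded = Fraction(0)
    still_untested = Fraction(1)
    for _ in range(max_photons):
        certified += still_untested * P_D
        exploded += still_untested * P_EXPLODE
        still_untested *= P_C
    return certified, exploded, still_untested

for n in (1, 2, 5, 20):
    certified, exploded, untested = outcome_after_retries(n)
    print(n, float(certified), float(exploded), float(untested))
```

The certified fraction is a geometric series, $\frac{1}{4}\sum_{k=0}^{\infty}\left(\frac{1}{4}\right)^k=\frac{1}{3}$, so with this simple set-up about one third of the working bombs end up in the guaranteed ensemble and the remaining two thirds are detonated.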

Note that while the probabilities above can be derived (in a fairly straightforward manner) from classical principles, we cannot apply a classical interpretation here as the quantum nature of the experiment is indispensable. In the classical (many-photon) run it is possible to both detonate a bomb and make a detection at $D$; this is precluded in the quantum case as the single photon cannot be absorbed by multiple objects. Furthermore, it is the wave-nature of the photon that produces the destructive interference at $D$ in the no-detonator case, and this is what allows a detection at $D$ to signify the presence of the detonator and so constitute a successful 'interaction-free' measurement.

If you're unconvinced of this argument because it is based on a purely theoretical consideration, consider that this thought experiment has (equivalently) been carried out in the real world (admittedly using an ordinary detector rather than a bomb) and in fact was first done about a year after this problem was first published. I can't speak to the practical applications, if any exist, but I love this problem regardless for the simple fact that the solution challenges your intuition but can be understood using reasonably straightforward quantum mechanical principles.

Notes

$1$. The inclusion of $i$ in these equations might seem unusual or arbitrary, so I will provide a derivation here that shows where it comes from.

Figure 3: A beam-splitter with two incoming beams ($\psi_1$ and $\psi_2$) and two outgoing beams ($\psi_3$ and $\psi_4$). The incoming and outgoing beams are related by the beam-splitter matrix for the beam-splitter in question, as shown in equation 3. In the note below, the beam-splitter will be assumed to be 50:50 in accordance with the calculations in the main text.

Consider a beam-splitter as shown in Fig. 3. This system can be represented by the matrix equation $\left|\psi_3,\psi_4\right\rangle = \hat{B}\left|\psi_1,\psi_2\right\rangle$, or explicitly,
\begin{equation}\label{eq:BSM}
\begin{pmatrix}
\psi_3 \\ \psi_4
\end{pmatrix}
=
\begin{pmatrix}
T & R \\ R & T
\end{pmatrix}
\begin{pmatrix}
\psi_1 \\ \psi_2
\end{pmatrix},
\end{equation}
where $T$ and $R$ are the transmission and reflection coefficients respectively. In the experiment we assume an ideal, lossless beam-splitter which demands that the beam-splitter matrix be unitary, i.e., $\hat{B}^{\dagger}\hat{B}=\hat{\mathbb{I}}$, or,
\begin{equation}\label{eq:unitary}
\begin{pmatrix}
T^{\ast} & R^{\ast} \\ R^{\ast} & T^{\ast}
\end{pmatrix}
\begin{pmatrix}
T & R \\ R & T
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 \\ 0 & 1
\end{pmatrix}.
\end{equation}
Equation 4 immediately implies the following relations:
\begin{equation}
|T|^2+|R|^2=1,
\end{equation}
\begin{equation}\label{eq:0}
T^{\ast}R+R^{\ast}T=0.
\end{equation}
As $T$ and $R$ are complex numbers, we can represent them in polar form as $T=|T|e^{i\theta_T}$ and $R=|R|e^{i\theta_R}$. For simplicity we choose $\theta_T=0$ and thus $T=|T|\implies T^{\ast}=T$ and so equation 6 becomes
\begin{align}\label{eq:0new}
T|R|e^{i\theta_R}+|R|e^{-i\theta_R}T&=0 \nonumber \\
2T|R|\cos{\left(\theta_R\right)}&=0,
\end{align}
where we have made use of the identity $\cos{(\alpha)}=e^{i\alpha}/2+e^{-i\alpha}/2$. Equation 7 is satisfied by $\theta_R=n\pi+\pi/2, n\in\mathbb{Z}$, but we will choose $n=0\implies\theta_R=\pi/2$ for simplicity, which in turn gives $R=|R|e^{i\pi/2}=i|R|$.
Finally, as the beam-splitter is 50:50 (50% transmission, 50% reflection) we demand $|T|=|R|=1/\sqrt{2}$ and so the beam-splitter matrix is given by
\begin{equation}\label{eq:B}
\hat{B}=\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & i \\ i & 1
\end{pmatrix}.
\end{equation}
It should be clear that equation 8 is not a unique representation of $\hat{B}$; another choice of $\theta_T$ and/or $\theta_R$ would yield a different (unitary) matrix that would make no difference to the calculations shown in equations 1 and 2 (I leave proof of this as an exercise for the interested reader). With that said, the reason I like this representation is that it allows $i$ to function as a label for the states that result from a beam-splitter reflection, making it easier to write down interferometer equations directly from the diagram and keep track of where each term comes from. This is, of course, purely a matter of personal preference.
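The unitarity condition can also be checked numerically. The sketch below is a plain-Python illustration that builds the 50:50 beam-splitter matrix for a given pair of phases and tests whether $\hat{B}^{\dagger}\hat{B}=\hat{\mathbb{I}}$. The helper names (`beam_splitter`, `dagger`, `matmul`, `is_unitary`) are hypothetical and exist only for this example.

```python
import cmath
import math

def beam_splitter(theta_t=0.0, theta_r=math.pi / 2):
    """50:50 beam-splitter matrix with |T| = |R| = 1/sqrt(2) and the given phases."""
    t = cmath.exp(1j * theta_t) / math.sqrt(2)
    r = cmath.exp(1j * theta_r) / math.sqrt(2)
    return [[t, r], [r, t]]

def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_unitary(m, tol=1e-12):
    """Check B^dagger B = identity to within a numerical tolerance."""
    product = matmul(dagger(m), m)
    identity = [[1, 0], [0, 1]]
    return all(abs(product[i][j] - identity[i][j]) < tol
               for i in range(2) for j in range(2))

print(is_unitary(beam_splitter()))                       # the choice in equation 8
print(is_unitary(beam_splitter(0.3, 0.3 + math.pi / 2)))  # same phase difference
print(is_unitary(beam_splitter(0.0, 0.0)))                # no phase difference
```

Only the phase difference $\theta_R-\theta_T$ matters for unitarity here: any choice with $\theta_R-\theta_T=\pi/2+n\pi$ passes, while equal phases fail because the off-diagonal terms of equation 6 no longer cancel.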

$2$. This equation is an example of quantum superposition in action. For example, the first line says that the photon exists in a superposition of the $\left|u\right\rangle$ and $\left|v\right\rangle$ states where the states are equally weighted (as we are assuming normalisation). Superposition is a fundamental aspect of quantum mechanics that follows from the linearity of the Schrödinger equation (linear combinations of solutions will themselves be solutions). In this case, the beam-splitter splits the photon probability wave along the two channels and so in some sense the photon travels along both branches, although no measurement can be made which will detect the photon in both channels at once—this is not a consequence of experimental limitations but is a restriction that is fundamental to quantum theory. The question of why this is the case is a deep and ongoing one, and I encourage the interested reader to investigate the literature on the philosophy (and especially interpretations) of quantum mechanics.

$3$. I have gone to some pains in this post to avoid using the term "wavefunction collapse" at any point, although for clarity I will say that in the Copenhagen interpretation, the case of the photon interacting with the detonator (or any of the detectors for that matter) is an example of wavefunction collapse.

$4$. So long as the beam-splitters are both 50:50, as we have assumed throughout this blog post. Naturally, other types of beam-splitters will yield different results, and in fact using a more sophisticated apparatus will permit a much better detection level (in theory, the detection fraction can be brought arbitrarily close to 1, although I cannot speak to the practicality of such an apparatus).