# A Cat’s Thought Experiment

This post was written for the Q# Advent Calendar 2018. Check out the calendar for other posts.

Measurements of a quantum system are weird. It is mind-boggling that our classical world is somehow a manifestation of the counter-intuitive rules of quantum mechanics. When the classical world attempts to measure what’s in a quantum world, we obtain classical information and inevitably disturb the original system. (And we engineers are trying to build a computer out of this phenomenon!)

There have been many interpretations of measurement. Interference as an explanation was one of the topics that came up in discussions among our colleagues. As a “classically trained” physicist (pun intended) – that is, trained on approaches using wavefunctions, the Schrödinger equation, operators, the Copenhagen interpretation, etc. – interference was an unfamiliar concept to me. I started learning from Scott Aaronson’s lectures about a new approach to teaching quantum mechanics that emphasizes the difference between classical probabilities and quantum probabilities. As I’ll explain later, this difference is what gives rise to interference effects in quantum mechanics. We’ll fast-forward past some fundamentals and talk about measurements and interference.

## Learning together

The writing below is part of the tutorial for the Quantum Computing Study Group organized by The Garage at Microsoft. This is a community of learners meeting to study different aspects of quantum computing, spanning software, hardware, mathematics and physics (see syllabus below). I’ve had the pleasure of leading sessions in Silicon Valley and writing the tutorial. We are collaborating across the globe with our colleagues in the India Garage. Through these study groups, employees who began with no background in quantum computing worked their way up to teaching the subject and writing programs in Q#. The Garage also runs Microsoft’s annual hackathon, which featured some quantum projects this year. I want to give a shout out to a few colleagues from our Silicon Valley study group, Mario Inchiosa, Difu Su, Johnathan Tan and Hans Wang, who contributed to the Superdense Coding kata in the Quantum Katas, which are public programming exercises for learning Q# and quantum computing.

Two discoveries I made during my study: 1. Even if you are a physicist by profession, you may not know how quantum computers work. 2. You don’t have to have a degree in physics to study and understand quantum computing. This is an intellectually interesting subject made into something practical. I would recommend it to anyone.

## TL;DR

Sections 1 and 2 explain the conventional construction of quantum mechanics and background on measurement. Feel free to skip these to look at the more interesting (or more unfamiliar) sections 3 and 4 on probability theory and interference. Section 5 shows where to find Q# exercises on measurement.

## 1. Some background

Typically, physicists learn the subject in a chronological order – how the field of Physics has developed through time. Because physicists were used to classical phenomena, experimental results of quantum mechanical phenomena appeared to be surprising when they were first discovered (look up the “UV Catastrophe”, “photoelectric effect”, “Compton Effect” and “interference of light at low intensity”). Physicists in the early 1900s naturally attempted to explain the results using classical methods.

One of the historical approaches to describe quantum phenomena is using wavefunctions. Physicists in the early 1900s found that quantum particles behave like waves. This means a quantum system can be described using a wave equation – the Schrödinger equation. Here is the Schrödinger equation for a non-relativistic particle in an external potential:

$-\frac{\hbar^2}{2m} \nabla^2 \Psi \left( r, t \right) + V \! \left( r, t \right) \Psi \left( r, t \right) = i \hbar \frac{\partial \Psi \left( r, t \right)}{\partial t}$

where $\Psi$, called the “wavefunction”, describes the state of the particle (technically, $\left| \Psi \left( r, t \right) \right|^2$ is the probability density of finding the particle at position r at time t); ħ is the reduced Planck constant (~$1.05 \times 10^{-34}$ J·s); m is the mass of the quantum particle; and V is an external potential (such as an electric or gravitational field). In a system that doesn’t vary with time, the right-hand side of the equation equals $E \Psi \left( r, t \right)$, with $E$ being the energy of the system. The above equation reflects conservation of energy, as the first and second terms on the left-hand side describe the kinetic energy and potential energy of the system, respectively. Something very important to note is that in the above equation, $\Psi \left( r, t \right)$ is a field that fills all of space and evolves with time, and it determines the probability of finding the particle anywhere at a given time. This is very different from classical mechanics, where we deal with the exact locations of particles.

The Schrödinger equation has a very interesting property called “linearity”. If we find two solutions $\Psi_1$ and $\Psi_2$, then linear combinations of them are also valid solutions. For example, $\frac{1}{\sqrt{2}} \left( \Psi_1 + \Psi_2 \right)$ is a solution (the factor of $\frac{1}{\sqrt{2}}$ keeps the state normalized when $\Psi_1$ and $\Psi_2$ are orthonormal). For any system, physicists find a set of “basis states”, which are sufficient to fully describe the system. For example, there might be one basis state for each position the particle can be in. But now, since the Schrödinger equation is linear, a combination of multiple basis states is also a valid state. For example, the combination of the states representing “particle is at position 0” and “particle is at position 1” is valid, and represents a state in which the particle can be found at either position 0 or position 1. This is called the “superposition principle”.

The superposition behavior is written as

$\Psi \left( x \right) = \sum_i c_i \phi_i \left( x \right)$,

where we’re working in 1D (hence, x instead of r) for simplicity. Here, $\phi_i$ is the ith basis state in the system with coefficient $c_i$ being the “amplitude” of the state $\phi_i$ . The amplitude squared, $\left| c_i \right|^2$, gives the probability of the system being in the ith state, $\phi_i$. Any wavefunction can be expanded in terms of the basis states, $\phi_i \left( x \right)$.

For a qubit, the above equation takes the form $\left| \psi \right> = c_0 \left| 0 \right> + c_1 \left| 1 \right>$: a qubit system has only two basis states.
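As a numeric sanity check on the Born rule described above, here is a small Python sketch (the amplitudes $c_0 = 0.6$ and $c_1 = 0.8$ are illustrative choices, not from the post): the probabilities are the squared magnitudes of the amplitudes, they sum to 1, and repeated simulated measurements reproduce them.

```python
import random

# A hypothetical qubit state |psi> = c0|0> + c1|1> (illustrative amplitudes).
c0 = 0.6
c1 = 0.8

p0, p1 = abs(c0) ** 2, abs(c1) ** 2   # Born rule: probability = |amplitude|^2
assert abs(p0 + p1 - 1.0) < 1e-9      # the normalization condition

# Simulate many measurements in the computational basis.
random.seed(0)
outcomes = random.choices([0, 1], weights=[p0, p1], k=100_000)
print("P(0) =", p0, " empirical:", outcomes.count(0) / len(outcomes))
```

The empirical frequency of outcome 0 approaches $\left| c_0 \right|^2$ as the number of simulated measurements grows.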

To obtain the value of the coefficient of each possible basis state, one needs to find how much overlap there is between that basis state $\phi_j$ and the overall state $\psi$:

$c_j = \int_{-\infty}^{+\infty} \phi_j^{\ast} \left( x \right) \psi \left( x \right) dx = \sum_i c_i \int_{-\infty}^{+\infty} \phi_j^{\ast} \left( x \right) \phi_i \left( x \right) dx$,

where the last step expands $\psi$ in the basis states. By orthonormality, the remaining integral equals 1 when $i = j$ and 0 otherwise, so only the term $c_j$ survives.

In Dirac notation, $\left| \psi \right> = \sum_i c_i \left| \phi_i \right>$, where $c_j = \left< \phi_j \bigr| \psi \right>$.
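For a finite-dimensional state, the overlap integral becomes an ordinary inner product: conjugate the basis vector and take the dot product. A Python sketch (the state $(\left|0\right> + i\left|1\right>)/\sqrt{2}$ is an illustrative example):

```python
# Discrete analogue of c_j = <phi_j|psi>: conjugate the bra vector, dot with the ket.
def inner(bra, ket):
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

# Computational basis of a two-level system.
phi0 = [1, 0]
phi1 = [0, 1]

psi = [1 / 2 ** 0.5, 1j / 2 ** 0.5]   # example state (|0> + i|1>)/sqrt(2)

c0 = inner(phi0, psi)                  # amplitude on |0>
c1 = inner(phi1, psi)                  # amplitude on |1>
print(c0, c1)
print(abs(c0) ** 2 + abs(c1) ** 2)     # the squared magnitudes sum to 1
```

Note that the conjugation matters: for complex amplitudes, $\left< \phi_j \bigr| \psi \right>$ uses the complex conjugate of the basis vector's components.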

## 2. Measurement – not a gate

Measurements are different from gates. Gates are unitary, which makes them reversible. A gate acting upon a quantum system does not destroy the system or connect it to the outside world; applying a gate loses no information. By contrast, when we in the outside world want to obtain information from the system, we need to measure it. Measuring a system brings it into contact with the outside world, irrevocably destroying information. Thus, measurement is, in general, not reversible – the system cannot be brought back to its state before the measurement. One consequence of this difference is that a measurement cannot be implemented as a gate; it is an interaction between the system and its environment. A parameter or physical variable measured from a quantum system is called an “observable” (because we can observe it). Momentum, mass, velocity and energy are all observables.

If we use the wavefunction approach, we can derive the average value we’d expect over a large number of measurements of a given observable, M. This expectation value can be obtained as

$\left< M \right> = \left< \psi \bigr| M \bigr| \psi \right> = \sum_j m_j \left| c_j \right|^2$,

where $m_j$ is a possible measurement result of $M$, and $\left| c_j \right|^2 = P \left( m_j \right)$ is the probability of getting the result $m_j$. Obtaining $m_j$ leaves the system in the corresponding state, $\left| \psi_j \right>$. This unavoidable disturbance of the system caused by the measurement process is often described as a “collapse”, a “projection” or a “reduction” of the wavefunction.
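The weighted sum $\sum_j m_j \left| c_j \right|^2$ is easy to evaluate directly. A Python sketch for measuring Pauli Z on a single qubit (the amplitudes are illustrative; Z's eigenvalues are +1 for $\left|0\right>$ and -1 for $\left|1\right>$):

```python
# Expectation value <M> = sum_j m_j |c_j|^2, for M = Pauli Z on one qubit.
# Z has eigenvalue +1 on |0> and -1 on |1>.
amplitudes = {+1: 0.6, -1: 0.8}       # illustrative amplitudes on the eigenstates

expectation = sum(m * abs(a) ** 2 for m, a in amplitudes.items())
print(expectation)                     # 0.36 * (+1) + 0.64 * (-1) = -0.28
```

Each eigenvalue contributes in proportion to the probability of observing it, which is exactly what an average over many repeated measurements would give.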

In our two-qubit system, a state is described by a superposition of four basis states, i.e.

$\left| \psi \right> = c_{00} \left| 00 \right> + c_{01} \left| 01 \right> + c_{10} \left| 10 \right> + c_{11} \left| 11 \right>$,

with coefficients $c_{ij}$. Now, what happens if we do a measurement on the system – say, if we measure the first qubit and find it to be 0? There are two possible states with the first qubit being 0. The probability of getting a 0 in the first qubit is

$P = \left| c_{00} \right|^2 + \left| c_{01} \right|^2$.

After such a measurement, there are only two possible states that can exist in the system. The state becomes

$\left| \psi^{\prime} \right> = \frac{ c_{00} \left| 00 \right> + c_{01} \left| 01 \right>}{\sqrt{P}}$.

The denominator is there to keep the probability normalized. A measurement collapses a state $\left| \psi \right>$ into a basis state with a definite value of the observable (or “operator”) being measured.
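The collapse-and-renormalize recipe above is straightforward to simulate. A Python sketch for a two-qubit state (the equal amplitudes are an illustrative choice): after finding the first qubit to be 0, we keep only the consistent branches and divide by $\sqrt{P}$.

```python
from math import sqrt

# Two-qubit state c00|00> + c01|01> + c10|10> + c11|11> (illustrative amplitudes).
amps = {'00': 0.5, '01': 0.5, '10': 0.5, '11': 0.5}

# Probability that measuring the first qubit yields 0.
P = abs(amps['00']) ** 2 + abs(amps['01']) ** 2
print("P(first qubit = 0) =", P)       # 0.25 + 0.25 = 0.5

# Post-measurement state: keep the branches with first qubit 0, renormalize by sqrt(P).
collapsed = {k: v / sqrt(P) for k, v in amps.items() if k[0] == '0'}
print(collapsed)
assert abs(sum(abs(v) ** 2 for v in collapsed.values()) - 1.0) < 1e-12
```

The final assertion confirms that dividing by $\sqrt{P}$ restores the 2-norm condition on the surviving amplitudes.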

## Math insert – Hermitian operator

Recall that a gate is a unitary matrix, U, such that $U^{\dagger} U = I$, or $U^{\dagger} = U^{-1}$.

An observable corresponds to a Hermitian matrix, M, such that $M^{\dagger} = M$. Technically, the basis states that have a definite value of the observable M are the “eigenstates” of M, and the observed values of the operator are its “eigenvalues”. Operating with the matrix M on an eigenstate yields a number m times the eigenstate:

$M \left| \psi \right> = m \left| \psi \right>$.

This number, m, is the value of the observable M that one would measure if the system were in state $\psi$.

The operator corresponding to an observable is required to be Hermitian, because measurement results need to be real numbers. Here’s the argument that the measurement operator must be Hermitian. The expectation value is

$\left< M \right> = \left< \psi \bigr| M \bigr| \psi \right>$.

Since $\left< M \right>$ is a number, taking its dagger is the same as taking its complex conjugate. If $\left< M \right>$ is real, then $\left< M \right>^{\dagger} = \left< M \right>$. Therefore,

$\left< M \right>^{\dagger} = \left< \psi \bigr| M^{\dagger} \bigr| \psi \right> = \left< \psi \bigr| M \bigr| \psi \right>$.

This must hold for any state $\left| \psi \right>$. Thus, $M^{\dagger} = M$.
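Both properties are easy to check numerically for a small matrix. A Python sketch using Pauli Y, a Hermitian matrix with complex entries (the test state is an illustrative choice, in fact a +1 eigenstate of Y):

```python
# Check M^dagger = M for a small matrix, and that <psi|M|psi> comes out real.
def dagger(M):
    # Conjugate transpose: dagger(M)[i][j] = conj(M[j][i]).
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

# Pauli Y is Hermitian despite having complex entries.
Y = [[0, -1j],
     [1j, 0]]
assert dagger(Y) == Y

psi = [1 / 2 ** 0.5, 1j / 2 ** 0.5]    # illustrative state (|0> + i|1>)/sqrt(2)
Ypsi = [sum(Y[i][j] * psi[j] for j in range(2)) for i in range(2)]
expectation = sum(psi[i].conjugate() * Ypsi[i] for i in range(2))
print(expectation)                      # imaginary part is (numerically) zero
```

Despite the imaginary entries in Y, the expectation value $\left< \psi \bigr| Y \bigr| \psi \right>$ is real, as the Hermiticity argument requires.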

## 3. The 2-norm approach – an alternative way to teach and learn quantum mechanics

Now that we are in the 21st century, quantum phenomena are no longer so strange to physicists. With all the knowledge we have accumulated from the past, perhaps there is a more straightforward way to learn quantum mechanics, without the need to think about wavefunctions right away.

Essentially, we can start by generalizing probability theory. In our experience, probabilities are always positive and sum to 1. This is called the “1-norm” condition:

$\sum_i p_i = 1$.

In quantum mechanics, we don’t work directly with probabilities. Instead, we work with “amplitudes”. The square of an amplitude is a probability, so we require that squares of the amplitudes sum to 1. This is called the “2-norm” condition (where “2” refers to the fact that we’re squaring the amplitudes):

$\sum_i \left| a_i \right|^2 = 1$.

One thing to note here is that when we talk about “squaring” a number, we actually mean taking the “modulus squared” (or the “square of the magnitude”), which is done by multiplying it by its complex conjugate: $\left| a \right|^2 = a^{\ast} a$. For a real number, taking the modulus squared and taking the square are the same thing, but for a complex number, they’re different.

Since only the squared magnitudes enter the 2-norm condition, an amplitude can be a positive, negative or even complex number. In the example above, we wrote the amplitudes as $c_j$. To go from an amplitude to a probability, we take the square of its magnitude; that is why it is the squared magnitudes of the amplitudes, not the amplitudes themselves, that must sum to 1.

The complex number $c_j$ can be written as $r_j e^{i \varphi_j}$, with $r_j$ being the magnitude and $\varphi_j$ the phase. Both $r_j$ and $\varphi_j$ are real numbers. As we’ve seen in Pavan Kumar’s blog post, “Bloch Sphere in Quantum Computing”, probability only depends on the magnitude of the amplitude. For two amplitudes, the normalization condition is $\left| c_0 \right|^2 + \left| c_1 \right|^2 = \left| r_0 e^{i \varphi_0} \right|^2 + \left| r_1 e^{i \varphi_1} \right|^2 = r_0^2 + r_1^2 = 1$.
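A Python sketch of this point (the magnitudes 0.6 and 0.8 and the phase $\pi/3$ are illustrative choices): the phases drop out when we take the squared magnitudes, so normalization depends only on $r_0$ and $r_1$.

```python
import cmath

# Two amplitudes written as r * e^{i*phi}; only the magnitudes r affect probabilities.
r0, ph0 = 0.6, 0.0
r1, ph1 = 0.8, cmath.pi / 3            # an arbitrary relative phase

c0 = r0 * cmath.exp(1j * ph0)
c1 = r1 * cmath.exp(1j * ph1)

total = abs(c0) ** 2 + abs(c1) ** 2    # = r0^2 + r1^2, the phases cancel
print(total)                           # 1.0 up to floating-point error
```

Changing $\varphi_1$ to any other value leaves `total` unchanged, which is the sense in which a single amplitude's phase is unobservable; phases only matter when amplitudes are added, as the next section shows.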

This alternative quantum mechanics introduction bypasses wavefunction derivations. It puts up front that the universe behaves according to the 2-norm condition with a set of axioms. This allows us to see some fundamental quantum mechanical behaviors without dealing with more complicated systems, such as particles in an external potential.

## 4. Interference

The key to the generalized probability theory is that probability equals the squared magnitude of an amplitude, so each probability is a number between 0 and 1. Here, we’re going to look at the idea of interference between different possibilities, which is typical of quantum mechanics but absent from classical probability theory. Interference occurs because we use amplitudes, rather than probabilities, and amplitudes can be negative or complex. Let’s use a metaphorical example to illustrate this abstract idea.

Say Bob is someone who may like sports and/or computer games. He has a 70% chance of liking sports ($S$) and a 30% chance of not liking sports ($\bar{S}$). If he likes sports, there is an 80% chance that he likes computer games ($C$) and a 20% chance that he does not like computer games ($\bar{C}$). If he does not like sports, there is a 10% chance he likes computer games, and a 90% chance that he doesn’t. This situation is depicted as a tree on the left of Figure 1.

Figure 1. Probability trees in classical probability theory according to 1-norm (left) and in quantum mechanics according to 2-norm (right). We begin at the top of the tree, and ask, “Does Bob like sports ($S$) or not like sports ($\bar{S}$)?” Then we get to the next bifurcation, and ask, “Does Bob like computer games ($C$) or not like computer games ($\bar{C}$)?” On the left, we show the probabilities for each option. On the right, we show the amplitudes.

What is the chance that Bob likes computer games? We can compute this by aid of the probability tree. There are two paths we can go down the tree to end up at $C$: $S \rightarrow C$ and $\bar{S} \rightarrow C$. We can multiply the branching probabilities down each path, and then add the paths together. In this case, the probability of $C$ is given by

$P \left( C \right) = 0.7 \times 0.8 + 0.3 \times 0.1 = 0.59$

or 59%. But what if Bob were a quantum system? In that case, we would work with amplitudes. The tree on the right of Figure 1 shows one possible set of amplitudes, which give the same branching possibilities. Instead of calculating the probability of Bob liking computer games, we’ll first calculate the amplitude. If we switch the word “probability” to “amplitude” in the above calculation, then everything is the same. We find all the paths to get to $C$, and multiply the amplitudes along the path, and then sum the amplitudes of the paths: $a_C = \sqrt{0.7} \times \sqrt{0.8} + \sqrt{0.3} \times \left(-\sqrt{0.1} \right)$.

To get the probability, we take the modulus squared of the amplitude:

$P \left( C \right) = \left| a_C \right|^2 \approx 0.331$,

or about 33%. The second path actually subtracts from the first path, causing the overall probability to decrease. Very strangely, the fact that there’s a second way for Bob to like computer games decreases the probability that he likes them! This is because the amplitude of the second path has the opposite sign from the amplitude of the first path. We call this “destructive interference”. If the amplitudes had the same sign, the second path would reinforce the first, and we would get “constructive interference”. Interference is fundamentally caused by the fact that amplitudes can be negative, or even complex, allowing them to cancel one another out. Interference is at the heart of much of the “strangeness” of quantum mechanics. Conceptually, however, it’s rather simple, as we can see from the trees in Figure 1.
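The two tree calculations above can be checked in a few lines of Python. Same branching probabilities in both trees; the only difference is that the quantum tree sums signed amplitudes before squaring (the sign choice follows the example above).

```python
from math import sqrt

# Classical 1-norm tree: multiply probabilities along each path to C, then add paths.
p_classical = 0.7 * 0.8 + 0.3 * 0.1
print(p_classical)                     # ~0.59

# Quantum 2-norm tree: same branching probabilities, but the second path carries a
# negative amplitude, as in the example above.
a_C = sqrt(0.7) * sqrt(0.8) + sqrt(0.3) * (-sqrt(0.1))
p_quantum = abs(a_C) ** 2
print(p_quantum)                       # ~0.33: destructive interference
```

Flipping the sign of the second path's amplitude turns the cancellation into reinforcement, pushing the probability above the classical 0.59 (constructive interference).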

In quantum mechanics, why do we multiply the amplitudes, rather than the probabilities? It follows from the mathematics of quantum mechanics. Let’s check with an arbitrary qubit state, $\left| \psi \right> = c_0 \left| 0 \right> + c_1 \left| 1 \right>$. It has amplitude $c_0$ of being in $\left|0 \right>$ and $c_1$ of being in $\left|1 \right>$. Both $\left|0 \right>$ and $\left|1 \right>$ have overlap with $\left| + \right>$ and $\left| - \right>$. How much of $\left|+ \right>$ and $\left| - \right>$ is in $\left| \psi \right>$? Knowing $c_j = \left< \phi_j \bigr| \psi \right>$, we compute the overlap integrals $\left< + \bigr| \psi \right>$ and $\left< - \bigr| \psi \right>$ to get the corresponding amplitudes, and square them to get the probabilities:

$c_{\pm} = \left< \pm \bigr| \psi \right> = c_0 \left< \pm \bigr| 0 \right> + c_1 \left< \pm \bigr| 1 \right>$.

$P \left( \left| \pm \right> \right) = \left| c_{\pm} \right|^2$.

The square happens only after the amplitudes add up. This turns out to be equivalent to multiplying amplitudes down the 2-norm probability tree in Figure 2, consistent with our metaphorical example for Bob above. Quantum states are intrinsically described with amplitudes.

Figure 2. Multiplying amplitudes down the 2-norm probability tree is equivalent to calculating the overlap integrals.
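The add-then-square order of operations is easy to verify directly. A Python sketch (the real amplitudes 0.6 and 0.8 are illustrative) computing $c_{\pm} = \left< \pm \bigr| \psi \right>$ in the computational basis:

```python
from math import sqrt

# |+> = (|0> + |1>)/sqrt(2) and |-> = (|0> - |1>)/sqrt(2) in the computational basis.
plus  = [1 / sqrt(2),  1 / sqrt(2)]
minus = [1 / sqrt(2), -1 / sqrt(2)]

def inner(bra, ket):
    return sum(complex(b).conjugate() * k for b, k in zip(bra, ket))

psi = [0.6, 0.8]                       # c0|0> + c1|1>, illustrative real amplitudes

c_plus, c_minus = inner(plus, psi), inner(minus, psi)
# The amplitudes add first; squaring happens only at the end.
print(abs(c_plus) ** 2, abs(c_minus) ** 2)
```

Here $c_-$ is a difference of two terms, so the $\left| - \right>$ outcome is suppressed by destructive interference, while the two probabilities still sum to 1.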

The 2-norm of quantum mechanics can reduce to the 1-norm of classical probability theory when paths whose amplitudes share the same complex phase dominate (constructive interference) and paths with differing complex phases cancel each other out (destructive interference). This is explained very nicely in Richard Feynman’s lectures on quantum electrodynamics. We are more likely to get a particular measurement result if there are many paths to it that constructively interfere.

Interference can also be seen in a gate operation (a unitary matrix), as Scott Aaronson showed in his lecture (https://www.scottaaronson.com/democritus/lec9.html) using a randomizing matrix.

## 5. Q# exercise: Measurement using M operation

The Q# operation for measurement takes care of the fact that $m_j$ is measured with probability $\left| c_j \right|^2$ and that the system is left in the state $\left| \psi_j \right>$. There are a lot of exercises on measurements in the Quantum Katas.

1. Go to Quantum Katas: https://github.com/Microsoft/QuantumKatas introduced earlier in the CNOT exercise.
2. In Visual Studio (Code) open folder “Measurements” and then Tasks.qs.
3. Look at Task 1.1. It can be solved with a single call to the operation M(q), which measures the state of the qubit q.
4. Now use M(q) to finish as many tasks as you can. The later tasks also require other gates you have learned in quantum computing, to put the qubits into states you can distinguish by measurement.