%%% andrew childs, amchilds@cs.caltech.edu
%%% cs20c final project report

\documentstyle[11pt,graphics]{article}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\marginparwidth}{0pt}
\setlength{\marginparsep}{0pt}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9in}
\setlength{\topmargin}{0pt}
\setlength{\headheight}{0pt}
\setlength{\headsep}{0pt}

\begin{document}

\newcommand{\mod}{{\rm~mod~}}

%%% HEADER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% some of this might be used by LaTeX2HTML
\title{Simulating quantum computation}
\author{Andrew Childs}
\date{11 June 1998}

\leftline{\LARGE\bf Simulating quantum computation}
\bigskip
\leftline{ Andrew Childs ({\tt amchilds@cs.caltech.edu})}
\leftline{ CS 20c Final Project \hfill 11 June 1998}
\smallskip
\hrule

%%% ABSTRACT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bigskip

\begin{abstract}

By exploiting the features of quantum mechanics, it is possible to represent
and process information in such a way that computation may be done more
efficiently on a quantutum computer than on any Turing-equivalent classical
information processing device.  This paper summarizes the principles of quantum
computation and outlines Shor's algorithm for efficient factoring.  I describe
a library of classes for simulating quantum computation in Java and their
application to a simulation of Shor's algorithm.  Finally, I present results of
the simulation for several small integers.

\end{abstract}

%%% MAIN PAPER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Introduction}

The description of algorithmic complexity in classical computer science relies
on the Turing model of computation, a specific description of a computing
device.  Church's thesis asserts that the Turing model of computation is
universal for describing ``reasonable'' computation, which can be taken to mean
any computation which proceeds by mechanical application of a set of predefined
rules.  This assertion, which has been found to hold for all feasible classical
methods of computation, allows the establishment of complexity classes that
describe how the time to execute an algorithm scales with the input size
regardless of the machine used.

However, in the Turing model, information is taken as an entirely local
quantity.  On the other hand, quantum mechanical systems allow the
establishment of correlations between distinct parts of a system, so that
information can be stored in nonlocal correleations among all parts of a
system~\cite{pre}.  Fundamentally, information is a physical quantity, so we
believe it should be governed by the laws of quantum mechanics.  Thus a quantum
description of information admits the possibility of new definitions of
complexity.  As Peter Shor first showed in 1994~\cite{shor}, this is in fact
the case: there exist problems which we believe can be solved asymptotically
faster on a quantum computer than on any classical computer.

This paper describes the design and implementation of a system for simulating
quantum computation on a classical computer in the Java programming language.
I have written a library of routines for representing quantum information and
computational operations.  This library was then used to implement a simulation
of Shor's efficient factoring algorithm.

\section{Foundations of quantum computation}

In quantum mechanics, the state of a system is represented by a complex-valued
wavefunction whose amplitude squared gives the probability that, upon
measurement, the system will be observed in a particular state.  Then the
general state of a two-state quantum system can be written
$$
|\psi\rangle = \alpha |0\rangle + \beta |1\rangle
$$
with the normalization condition $|\alpha|^2 + |\beta|^2 = 1$, where $\alpha$
and $\beta$ are complex coefficients.  This is only one possible representation
of the state; the choice of the orthonormal basis $\{|0\rangle, |1\rangle\}$ is
arbitrary, but this standard basis will suffice for present purposes (note that
any other representation can be achieved by a unitary transformation, i.e.\ by
performing some computation).  The normalization condition assures that total
measurement probability is unity.

With $n$ qubits, the system can be in any arbitary superposition of basis
states for a Hilbert space of dimension $2^n$.  For example, a three-qubit
state has the form
$$
|\psi\rangle = \alpha_0 |000\rangle + \alpha_1 |001\rangle + 
               \alpha_2 |010\rangle + \alpha_3 |011\rangle +
	       \alpha_4 |100\rangle + \alpha_5 |101\rangle +
	       \alpha_6 |110\rangle + \alpha_7 |111\rangle,
$$
where the $\alpha_j$ are complex numbers with $\sum_{j=0}^7 \alpha_j = 1$.
Using base ten shorthand, we can also write this state as
$$
|\psi\rangle = \sum_{j=0}^7 \alpha_j |j\rangle.
$$

By the measurement postulate of quantum mechanics, measurement of a state
yields the basis state $|j\rangle$ with probability $|\alpha_j|^2$ and puts the
system in the state $|j\rangle$.  In other words, subsequent measurement will
yield that state $|j\rangle$ with certainty.  Furthermore, by the ``no cloning
theorem,'' it is not possible to copy a general quantum state~\cite{ste}.
Doing so would allow making many copies of the state, which could then be used
to determine the magnitudes of the $\alpha_j$ to any desired accuracy.
Although it is possible to prepare states which are in a superposition of basis
states, it is not possible to observe a general state to determine all of the
coefficients $\alpha_j$.  In some sense, it is easier to put information into a
quantum state than it is to extract it, a limitation that requires a certain
cleverness in the design of quantum algorithms.

Although it is possible to formulate a quantum analog of the Turing
machine~\cite{eke}, a more useful model of computation for describing real
algorithms is the quantum gate array, in which a succession of transformations
(gates) are applied to the set of qubits.  In quantum mechanics, the time
evolution of a system is governed by the action of some Hamiltonian which
produces a unitary transformation on the system.  This restriction is
equivalent to the statement that all operations must be reversible.  In quantum
computation, these unitary transformations, usually applied to some small
subset of a system, are refered to as quantum gates.

A general $n$-qubit gate has the form
$$
\hat G=\sum_{j=1}^{2^n-1}\sum_{k=1}^{2^n-1} a_{jk} | k \rangle\langle j |,
$$
where $a_{jk}$ is a complex coefficient that describes how the $j$th basis
state of the input affects the $k$th basis state of the output.  If we
represent states as complex column vectors of the form
$$
|\psi\rangle =
\left( \matrix{ \alpha_0 \cr
                \vdots   \cr
		\alpha_n } \right)
$$
then a general gate has the form
$$
\hat G=\left( \matrix{ a_{00} & \cdots & a_{n0} \cr
                       \vdots & \ddots & \vdots \cr
		       a_{0n} & \cdots & a_{nn} \cr} \right).
$$
The physical restriction to unitary transformations means that this matrix must
be unitary, i.e.\ that its conjugate transpose must be its inverse.  Some
specific examples of gates will be discussed in the description of Shor's
algorithm.

\section{Simulating quantum computation}

The formalisms of quantum computation in the standard basis $\{|0\rangle,
|1\rangle\}$ were implemented in Java.  Although this system was specifically
intended for implementing a simulation of Shor's factoring algorithm, the
system was made as general as possible to allow potential future use in
simulating other algorithms.  All code is available on the web at {\tt
http://www.cs.caltech.edu/\~\negthinspace\thinspace amchilds/final}.

Basis states $|j\rangle$ are represented by the {\tt BasisState} class.  States
are represented as arrays of boolean values that tell whether each individual
qubit is 0 or 1 for that basis state.  Most importantly, the {\tt BasisState}
class has methods for returning the representation of a given basis state on
some subspace specified by an array of qubit indices.  These methods are very
useful for performing gate applications, measurents, and projections on a
subspace of the system.

The core information data structure is the {\tt Qubits} class, which represents
a fixed-size set of qubits in terms of the complex coefficients for all
possible basis states.  For a space of $n$ qubits, $2^n$ complex numbers are
stored.  In an algorithm, qubits can be created and manipulated, but the state
amplitudes cannot be direcly accessed.  In keeping with this physical
restriction, the array of state coefficients is a private data member.  An
amplitude can only be extracted by performing a measurement, which modifies the
system by projecting the measured subspace onto a definite basis state.
Methods are provided for printing all state coefficents or amplitudes, but only
for purposes of debugging or algorithm analysis.  This information is not
physically accessible in a single computation, although it can be thought of as
the average result of many identitcally prepared computers.

Because only the {\tt Qubits} class has access to the coefficients, it must
perform all gate applications.  Applying a gate of the same size as the number
of qubits is trivial matrix multiplication, but determing how to apply gates to
a subspace was one of the more challeing aspects of the project.  So that
complexity is readily quantifiable, algorithms must employ fixed-size gates;
for example, Shor demonstrated the construction of a quantum discrete Fourier
transform from one- and two-qubit gates.  Therefore, essentially all
computation operates on a small subspace of the entire system.

Initially, I wrote a method in the {\tt Gate} class which converted a given
gate into a gate on a larger space.  However, I soon learned that this is not a
feasible implementation for large systems.  The smallest instance of Shor's
algorithm uses twelve qubits represented by $2^{12}=4096$ complex numbers.  Thus
a transformation of the entire space requires $(2^{12})^2 \approx 1.68 \times
10^7$ complex numbers, creating an unreasonably large structure.  Instead, the
{\tt Qubits} class contains a method which performs the transformation on a
subspace by looking up the appropriate entries in the transformation matrix
during the multiplication.  The time to perform the operation remains
unavoidably long, but the space requirements are drastically reduced.

Another difficulty was determing exactly how to map the gate matrix elements
onto the larger space.  The complete space can be thought of as a tensor
product (the Hilbert space version of a Cartesian product) of $n$
two-dimensional spaces; even general works from the literature tend to simply
state that gates can be applied to a subspace without exactly stating the
effect on the qubits not involved in the transformation~\cite{ste}.  Through
educated guessing and trial and error, I found that the gate transformation
corresponds to a block diagonal expansion when the gate operates on the first
qubit.  In terms of the basis states, if we let $S[j]$ denote the number formed
by selecting the bits of the binary expansion of $j$ that correspond to the
qubits that the gate operates on and let $\bar S[j]$ denote the number formed
by removing those bits, then
$$
    \sum_{j=1}^{2^n-1}
    \sum_{k=1}^{2^n-1} a_{jk} | k \rangle\langle j |
\to \sum_{j=1}^{2^N-1}
    \sum_{k=1}^{2^N-1} \bigg\{ \matrix { a_{jk} | S[k] \rangle\langle S[j] | &
                                        \bar S[j] = \bar S[k]\cr
                                        0 & \bar S[j] \ne \bar S[k] },
$$
where $N$ is the number of qubits the gate operates on.

The {\tt Qubits} class contains a counter for the number of gate applications.
As long as fixed-size gates are used, this provides a measure of the running
time of an algorithm that can be used to evaluate computational complexity.

In order to perform a special kind of operation which simplifies the simulation
of many algorithms, the {\tt Qubits} class also contains a method for
reversibly computing a function that maps integers to integers.  Assuming that
the function can be computed classically in polynomial time, computing it
directly via this method will not change whether the simulated algorithm can
be performed in polynomial time, and it may vastly simplify the description of
the algorithm.  Jozsa describes both Shor's algorithm and Simon's algorithm in
terms of this kind of general function~\cite{joz}.  These functions are
represented by the {\tt IntFunction} class, an abstract base class that can be
subclassed to implement specific functions.

General transformations are represented by the {\tt Gate} class.  Because all
operations that require access to quantum information must be performed by the
{\tt Qubits} class, the {\tt Gate} class is relatively simple: it is primarily
a data structure that contains the matrix elements of a transformation.  There
are also methods to check whether an arbitrary gate is unitary, but this
functionality is not often needed.

Along with the classes that simulate computation, there are two library classes
that provide tools to be used in algorithms.  The {\tt GateLib} class is a
library of common quantum gates.  It has static methods which return {\tt Gate}
objects for several common one-, two-, and three-qubit gates.  The {\tt
AlgoLib} class is a library of algorithms which might be useful as subroutines
in simulations of larger algorithms.  It contains only the quantum discrete
Fourier transform as constructed by Shor~\cite{shor}.  Both libraries are
easily extensible as new gates or subroutines become useful.

\section{Shor's factoring algorithm}

Shor's discovery of an algorithm for polynomial-time factoring was the first
evidence that quantum computers can be asymptotically faster than classical
ones.  Factoring is certinaly not the only application of quantum computers; in
particular, Grover's algorithm for searching an unstructured database provides
a quadratic speedup over classical searching, although it does not change the
complexity classification~\cite{ste}.  However, Shor's algorithm is both widely
studied and relatively simple to understand, so it was chosen for
implementation.

Shor actually solved the problem of factoring by reduction from another
problem, that of finding the order $r$ of $x$ mod $n$, the least $r$ such that
$x^r \equiv 1 \mod n$~\cite{shor}.  Using some basic number theory, one can
show that given a random $x \mod n$ coprime to $n$, the greatest common divisor
of $n$ and $x^{r/2}-1$ will be a nontrivial factor of $n$ with a probability
that approaces unity as $n$ increases.  This condition holds as long as $n$ is
odd and not a power of a prime number; in these cases, factors can be found
efficiently by classical computation.  All of the postprocessing can be done
in polynomial time on a classical computer, so an efficient quantum algorithm
for finding $r$ is effectively an efficient algorithm for factorizing $n$.

In Shor's orignial version of the algorithm, no measurements are performed
until the computation is complete, whereupon the whole system is measured and
part of the result is used to find the order.  In a slight modification used by
by Jozsa~\cite{eke,joz}, part of the system is measured during the computation,
projecting the system down to the subspace of the other part.  This
modification makes the algorithm more intuitive and faster to simulate without
changing its structure, so it will be used in what follows.

The algorithm for finding $r$ uses two quantum registers, the first of size
$\lceil \log n^2 \rceil$ and the second of size $\lceil \log n \rceil$.
For convenience, we define $q \equiv \lceil n^2 \rceil$.  To start the
computation, the first register is prepared in an even superposition of all
states and the function $x^j \mod n$ is computed in the second register,
preparing the state
$$
{1 \over \sqrt{q}} \sum_{j=0}^{q-1} | j \rangle | x^j \mod n \rangle.
$$
Shor describes how this function could be computed on a quantum computer,
noting that keeping the input in a separate register makes the operation
reversible and therefore allowed~\cite{shor}.  Furthermore, although it is the
most computationally intensive step of the algorithm, modular exponentiation
can be done in polynomial time on a reversible classical computer, so it is
clear that the quantum version can be done in polynomial time (specifically,
using the algorithm given by Shor~\cite{shor}, in $O((\log n)^2 (\log \log n)
(\log \log \log n))$).

After preparing this state, the second register is measured.  As described by
Ekert and Jozsa~\cite{eke}, this picks out all the values for which $x^j \mod
n$ is some arbitrary constant.  Since the order of $x \mod n$ is $r$, these
values have the form $x^{j+kr} \mod n$ for constant $j$ and all appropriate
$k$.  For simplicity, consider the case where $r$ divides $q$ exactly.  In this
case, the state after measurement has the form
$$
{\sqrt{r \over q}} \sum_{k=0}^{q/r-1} |j+kr\rangle;
$$
in other words, it is periodic with period $r$, but with a randomly chosen
offset $j$.

Next, the crucial step in Shor's algorithm is to perform the quantum discrete
Fourier transform (QFT)
$$
|j\rangle \to 
{1 \over \sqrt{q}} \sum_{l=0}^{q-1} 
\exp\left(2 \pi i {j l \over q} \right) | l \rangle 
$$
to pick out the peridoicity $r$ in the state of the second register.  Following
the treatment of Ekert and Jozsa~\cite{eke}, this operation takes the state of
the machine to
$$
{\sqrt{r \over q}} \sum_{k=0}^{q/r-1} 
{1 \over \sqrt{q}} \sum_{l=0}^{q-1} 
\exp\left(2 \pi i {(j+kr) l \over q}\right) | l \rangle
=
{\sqrt{r} \over q} \sum_{l=0}^{q-1}  \exp\left(2 \pi i {j l \over q}\right)
                   \sum_{k=0}^{q/r-q}\exp\left(2 \pi i {k r l \over q}\right)
$$
The inner sum is zero unless $l$ is a multiple of $q/r$, say $l=j q/r$, in
which case the sum is equal to $q/r$.  Thus the state is
$$
{1 \over \sqrt{r}} \sum_{j=0}^{r-1} 
\exp(2 \pi i j l/r) \left| j q/r \right\rangle 
$$

As Shor shows~\cite{shor}, the QFT can be constructed using two gates:
$$
R_j = {1 \over \sqrt{2}} \left( \matrix{ 1 & 1 \cr
                                         1 & -1} \right),
$$
the normalized Hadamard transformation on the $j$th qubit, and
$$
S_{j,k} = \left( \matrix{ 1 & 0 & 0 & 0 \cr
                         0 & 1 & 0 & 0 \cr
			 0 & 0 & 1 & 0 \cr
			 0 & 0 & 0 & \exp(i \pi / 2^{k-j})} \right),
$$
on the $j$th and $k$th qubits, $j<k$.  Taking $2^q \equiv l$, the QFT is
achieved using the sequence of gates
$$
R_{l-1} S_{l-2,l-1} R_{l-2} S_{l-3,l-1} S_{l-3,l-2} R_{l-3} \cdots
R_1 S_{0,l-1} S_{0,l-2} \cdots S_{0,2} S_{0,1} R_0
$$
in order from left to right.  Thus the QFT can be computed in $O((\log n)^2)$;
asymptotially, this operation is dominated by the modular exponentiation.

After performing the QFT, the register is measured, yielding some value $l=j
q/r$ with $j$ chosen equiprobably.  Knowing this, the denominator of $l/q$
reduced to lowest terms will give the order $r$ (since the probability that the
greatest common divisor of $l$ and $r$ is 1 asymptotically approaches unity).

In general, $r$ will not divide $q$ exactly, but it can be shown that the same
procedure works if the fraction $l/q$ is reduced using the method of rational
approximation by continued fractions, choosing the best approximation for which
$r \le n$.  Greater detail on algorithms for continued fraction expansions can
be found in Ekert and Jozsa~\cite{eke} and Hardy and Wright~\cite{har}; the
algorithm can be performed in polynomial time by a classical computer used for
postprocessing.

\section{Simulating Shor's algorithm}

Shor's algorithm was implemented in the {\tt Shor} class.  It performs the
quantum operations used to find the order $r$ as well as the necessary
classical preprocessing and postprocessing for factorization.  In addition to
the main class, the implementation of Shor's algorithm uses a subclass of {\tt
IntFunction} called {\tt ModExp} which performs modular exponentiation on
integers.

To make the algorithm more robust, factors of two (which cannot be dealt with
by Shor's algorithm) are handled in preprocessing.  Numbers below 15 are
handled by a lookup table as 15 is the smallest odd number which is not a prime
power.

The most challenging aspect of simulating the algorithm itself was learning the
relevant number theory to understand and code the method for finding a rational
approximation by continued fraction expansion.  Also, I found that modular
exponentiation and the final calculation of $r$ produce numbers which do not
fit in 32-bit integers even for small instances of the algorithm, so these
operations were perormed using Java's {\tt java.math.BigInteger} class.  Other
than these minor difficulties, coding the algorithm itself was relatively
simple compared to the general computation routines.

To increase simulation speed, the program was compiled into native executable
code using IBM's High Performance Compiler for Java~\cite{ibm}.  Unfortunately,
this compiler is available only for Windows 95, Windows NT, AIX, and OS/2.  The
same code ran about seven times faster on a 233 MHz AMD K6 running Windows 95
than on a Caltech CS cluster machine using Sun's javac under NetBSD.  An
instance of the algorithm to factor 15 (an operation dominated by the
calculation of an eight qubit QFT) took 105 seconds per iteration with the IBM
compiler and 767 seconds per iteration using javac.

A typical output of the program {\tt Shor} for the input 15 is as follows:

\begin{quote}
\begin{verbatim}
Shor initialized: factoring 15
-> Finding order of 2 mod 15
   Initializing quantum computation
   Allocating 8 qubits for first register, 4 qubits for second register
   Performing modular exponentiation on second register
   Measure second register: |8>
   QFT on first register
   Measure first register: |192>
   Finished in 29 quantum gate applications
   Rational approximation to 192 / 2^8 is 3 / 4
-> Order is 4
-> Possible divisor: 3
=> Found a divisor: 3
Factorization complete: 15 = 3 * 5
\end{verbatim}
\end{quote}

Plotting the state of the first register as the computation proceeds clearly
shows how the algorithm works.  It is important to note that the information
contained in these plots does not correspond to physically accessible
information for any single instance of the algorithm, but it can be taken to be
an average over many instances for which all measurements yield the same value.

The random number coprime to $n=15$ is chosen as $x=2$, and $\sum |2^j \mod
15\rangle$ is computed in the second register.  In other words, every basis
state for which the second register is the result of modular exponentiation on
the first register is given an equal amplitude, and all other basis states have
zero amplitude.  Measuring the second register chooses a particular value for
$j$ (in this case, 8), leading to an offset in an otherwise periodic
distribution, as shown in Figure 1.

\begin{figure}[bht]
\smallskip
\includegraphics{measured.eps}
\caption{\small Contents of the first register following measurement of the
second register in the factorization of 15.}
\end{figure}

Next, the QFT is performed on the first register.  As Figure 2 shows, the
distribution of peaks has become completely periodic, and the four peaks
correspond to the spacing between peaks in Figure 1.  Measurement finds the
peak at 192, which gives the proper order of 4.  A factor of 15 is given by the
GCD of 15 and $2^{4/2}-1=3$, which is 3, a factor of 15.  Had the measurement
picked out either the peak at 0 or the peak at 128, the order would be
incorrectly determined, and the computation would have to be repeated.

\begin{figure}[thb]
\includegraphics{qft.eps}
\caption{\small Contents of the first register following application of the QFT
in the factorization of 15.}
\end{figure}

Unfortunately, due to the exponential increase in the representation of a
general quantum system, the program could not be run on very many inputs.  The
numbers 15, 21, 33, 45, and 55 were successfully factored.  The factors of 55
were found with two iterations in about 38.6 hours; larger inputs were not
feasible given software, hardware, and time constraints.

A logarithmic plot of the time per iteration to factor these five numbers
(Figure 3) shows that the simulation time is exponential in input size $\log
n$.  A good general rule is that the calculation takes roughly five times
longer for each additional qubit.  The running time for 33 and 45 is
essentially the same because both require 11 qubits for the first register.

\begin{figure}[bht]
\scalebox{.45}{\includegraphics{time.eps}}
\hskip 24pt
\scalebox{.45}{\includegraphics{qops.eps}}
\caption{\small (at left) Time required to factor 15, 21, 33, 45, and 55 on a
233 MHz AMD K6 using the IBM compiler.}
\caption{\small (at right) Quantum operations required to perform the QFT when
factoring numbers from 15 to 128, inclusive.}
\end{figure}

Because the algorithm is probabilistic, the number of iterations varies from
instance to instance.  A more complete description of the running time would
include the total time for an average instance, but this requires running each
instance many times to acquire good statistics.  The few instances which were
repeated seem to indicate a fairly weak dependence of the number of instances
on the input size.  Shor shows that the dependence is logarithmic in input size
(though it can be made constant using suitable postprocessing)~\cite{shor}.
Therefore, this effect will not change whether the simulation time or the
number of quantum operations is exponential or polynomial.

Without taking into account the mean number of iterations to factor a give
number, the number of quantum in the QFT is completely determined by the number
of bits needed to perform the computation.  Figure 4 plots this relationship
for integers between 15 and 128.  The need to use entire qubits introduces
discretization, but the relationship is clearly linear.  Full simulation would
include the time to calculate modular exponentiation, which asymptotically
dominates the time to perform the QFT but which is also polynomial.

\section{Conclusion}

By developing a library of Java classes for performing simulation of general
quantum computation, I have learned a lot about how computation can be
performed using a quantum mechanical representation of information.  I have
also learned in detail how this sort of computation differs from classical
computation.  Simulating Shor's algorithm has given me a firm understanding of
how quantum computers can provide a theoretical speedup over classical
computers.

%%% REFERENCES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{thebibliography}{[10]}

\bibitem{pre}Preskill, J. (1997), Chapter 1 of lecture notes for Ph 229, {\tt
http://www.theory.caltech.edu/ \allowbreak
people/preskill/ph229/notes/chap6.ps}.

\bibitem{shor}Shor, P. W. (1994), {\it Algorithms for quantum computation:
Discrete logarithms and factoring}, in {\it Proceedings of the 35th Annual
Symposium on Foundations of Computer Science}, IEEE Computer Society Press.

\bibitem{ste}Steane, A. (1997), {\it Quantum computing}, {\tt
http://xxx.lanl.gov/ps/quant-ph/9708022}.

\bibitem{eke}Ekert, A. and R. Jozsa (1996), {\it Quantum computation and Shor's
factoring algorithm}, Rev. Mod. Phys. {\bf 68} (3) 733--53.

\bibitem{joz}Jozsa, R. (1996), {\it Quantum algorithms and the Fourier
transform}, {\tt http://xxx.lanl.gov/ \allowbreak ps/quant-ph/9707034}.

\bibitem{har}Hardy, G. H., and E. M. Wright (1979), {\it An Introduction to the
Theory of Numbers}, Clarendon Press, Oxford.

\bibitem{ibm}IBM Corporation, High Performance Compiler for Java, {\tt
http://www.alphaworks.ibm.com/\allowbreak alphapreview\_tools/}.

\end{thebibliography}

\end{document}