\documentstyle{article} \newtheorem{theorem}{Theorem} \def\a{A} \def\b{B} \def\c{C} \def\d{D} \def\e{E} \def\T{T} \newcommand{\ket}[1]{\left|#1\right>} \def\ep{\epsilon} \newcommand{\comment}[1]{} \begin{document} \title{ Scalable NMR Quantum Computation } \author{Leonard J. Schulman \\ College of Computing \\ Georgia Institute of Technology \\ Atlanta GA 30332 \and Umesh Vazirani\thanks{Supported in part by a JSEP grant.} \\ Computer Science Division \\ U.\ C.\ Berkeley \\ Berkeley CA 94720 } \maketitle \begin{center} {\large Preliminary Draft\footnote{Comments welcome to schulman@cc.gatech.edu and vazirani@cs.berkeley.edu.} } \end{center} \begin{abstract} Nuclear magnetic resonance offers an appealing prospect for implementation of quantum computers, because of the long coherence times associated with nuclear spins, and extensive laboratory experience in manipulating the spins with radio frequency pulses. Existing proposals, however, suffer from a signal-to-noise ratio that decays exponentially in the number of qubits in the quantum computer. This places a severe limit on the size of the computations that can be performed by such a computer; estimates of that limit are well within the range in which a conventional computer taking exponentially more steps would still be practical. We give an NMR implementation in which the signal-to-noise ratio depends only on features of NMR technology, not the size of the computer. This provides a means for NMR computation techniques to scale to sizes at which the exponential speedup enables quantum computation to solve problems beyond the capabilities of classical computers. 
\end{abstract}

A sequence of results over the last decade~\cite{DJ, BV, Si, Sh} has provided the first credible challenge to the widely accepted notion that all physically ``reasonable'' computer models are roughly computationally equivalent, i.e.\ a problem is tractable (e.g.\ solvable in polynomial time) in one model if and only if it is tractable in another. At issue is the ability of computers based on quantum physics to perform certain computations (such as factorization~\cite{Sh}) exponentially faster than classical computers. However, realizing quantum computation in the laboratory has proved to be a formidable challenge, since it requires isolating the computer from the effects of environmentally induced decoherence while still being able to operate upon its state to perform elementary operations. Nevertheless, several proposals to realize quantum computation in the laboratory have been made, using a variety of systems such as cavity quantum electrodynamics~\cite{DRBH, CY, Tu}, trapped ions~\cite{CZ}, and most recently nuclear magnetic resonance~\cite{GC, CFH}.

The last proposal is particularly interesting for a number of reasons. Nuclear spins exhibit long relaxation times, with coherence times as long as thousands of seconds~\cite{CORW}. Moreover, NMR laboratory techniques routinely manipulate nuclear spins with sequences of hundreds of radio frequency pulses, and therefore provide a very attractive setting for carrying out a sequence of computational steps.

However, for NMR techniques to be useful in quantum computation, there is a major obstacle that has to be overcome~\cite{Ll, Di}: initializing the system in (or near) a known initial state (say $\ket{0^n}$). By contrast, conventional NMR systems use macroscopic samples, which at room temperature and in thermal equilibrium must be regarded as constituting a statistical mixture of pure states.
Of course, if single nuclear spins could be individually addressed, this state preparation problem could be solved~\cite{Wa}. However, this appears to be quite difficult to realize.

A major breakthrough in the use of NMR techniques in quantum computation came about in~\cite{GC, CFH}, where schemes for performing small scale NMR quantum computation using bulk samples were first introduced. The main idea in~\cite{GC} is to embed a low-dimensional ``virtual'' pure state within the density matrix describing the bulk sample, by exploiting the structure present in thermal equilibrium. Solving the initial state preparation problem in this way paves the way for experimental realization of quantum computation using off-the-shelf equipment for conventional pulsed NMR. Indeed, this approach has been used in the laboratory to implement $2$-qubit (quantum bit) prototypes of a quantum computer: over $100$ consecutive logic steps were performed on a $2$-qubit computer, and the basic steps of Grover's search algorithm were carried out~\cite{Gr,CGK,CGKL}.

Although this approach provides a very important ``proof of concept'' demonstration for quantum computation, it does not scale: the strength of the signal output by the NMR quantum computer degrades exponentially in the number of quantum bits $n$ in the system. Thus the exponential speedup promised by quantum computation is offset by an exponential increase in the effort required to detect the output signal. The most optimistic predictions are that the output signal will be undetectable for computers of about $30$ qubits. Quantum computations of this size could be quite efficiently simulated on conventional computers.

In this paper, we give a new technique for preparing the initial state of the NMR system, where the output signal strength does not degrade as the number of qubits in the system is increased.
We believe this is therefore the first proposal for a quantum computer which has long decoherence time, scales to large numbers of qubits, and does not suffer a corresponding decay in signal strength.

NMR technology requires a ``bulk'' sample in order to create a readable signal in the output coils. A quantum computer will need to use enough macromolecules to create this signal; this is a matter for experimental considerations, which we will not discuss here. However, we avoid increases in sample size related to the complexity of the computation; this has the desirable aspect that a relatively small sample, which may be subjected to extreme conditions (of cold, magnetic field, etc.), may offer further opportunities for increasing the efficiency of the process.

\subsection*{Polarization Process}

Before describing our proposal, we briefly discuss the physical setting. The outline of the implementation is founded in standard liquid NMR physics, although as the calculations will indicate, development of a useful NMR quantum computer will require more.

We consider a collection of macromolecules, each containing $n$ atoms with nuclear spin $1/2$ and nuclear magnetic moment $\mu$, suspended in a liquid medium at temperature $\T$, so that the relaxation (coherence) time between the particles and the surrounding liquid is on the order of seconds or thousands of seconds. The liquid is subjected to a magnetic field $B_0$. Upon reaching thermal equilibrium, the difference between the fraction of particles oriented in the direction of the field, and those oriented in the opposite direction, is
\[ \ep = {\mu B_0 \over kT} \]
where $k$ is Boltzmann's constant, approximately $10^{-16}$~erg/K in CGS units. A typical magnetic field $B_0$ is approximately $10^5$ Gauss. A nuclear magnetic moment such as that of the proton is approximately $10^{-23}$~erg/Gauss in CGS units.
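As an order-of-magnitude sketch, substituting these rough values into the formula (with $\T$ measured in kelvin, and using only the approximate constants quoted above rather than precise ones) gives
\[ \ep \approx {10^{-23} \cdot 10^{5} \over 10^{-16} \, \T} = {10^{-2} \over \T} , \]
so the attainable bias scales inversely with the temperature of the sample.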
At room temperature ($\T = 300$~K), with an especially strong magnet, we can therefore obtain $\ep \approx 3 \times 10^{-5}$. As will be explained later, the number of qubits upon which a quantum computation can be performed, is approximately $\ep^2 n$, where $n$ is the number of spin $1/2$ particles in the macromolecule we employ as our quantum computer. For $\ep$ in the range obtained above, and in order to carry out a quantum computation on a useful number of qubits (e.g.\ $10^2$), this would require an impractically large macromolecule of size about $10^{11}$. Hence it is imperative to create a stronger initial polarization $\ep$. An obvious parameter to consider is temperature. Reducing the temperature to $10^{-1}$~K gives $\ep \approx 10^{-1}$, therefore quantum computations on $10^2$ qubits become possible using a molecule of size about $10^4$. However, it is difficult to obtain long coherence times at these low temperatures. Perhaps a more promising avenue is the use of optical pumping techniques for boosting the value of $\ep$. Until recently this technique has been confined to atomic gases, particularly xenon \cite{BCGHHN,Pines}; values of $\ep$ exceeding $1/2$ have been attained. There are plans at IBM to explore these techniques for molecules that may be suitable for quantum computation. With a value of $\ep$ in this range, the size of the molecules needed for a quantum computation on $10^2$ qubits would be under $10^3$. In the remainder of the paper we will simply assume that some $\ep$ has been provided by the polarizing process, and from that starting point we will show how to initialize the computer so that it can carry out any desired computation. \subsection*{Abstract Setting} We start by describing an abstract computational model that describes an NMR quantum computer. An ``NMR quantum computer'' is described by four parameters: $n,\ell, k$, and $\epsilon$. 
$n$ is the number of qubits in the computer (it is the number of spins available for computation in each molecule of the NMR sample). Initially the $n$ qubits are in a thermal mixture which deviates slightly from a uniform distribution. $\ep$ is the bias induced, at the start, by the external polarizing process. Namely, if any given bit of the computer is measured, the probability that $\ket{0}$ is observed is ${1+\ep \over 2}$. We assume that the statistical correlation between any two bits on a molecule, falls off exponentially with the distance between those bits. $\ell$ is the ``correlation distance'', the distance such that the correlation falls below some prescribed threshold such as $1/10$. We will use the term $\ep$-biased distribution to refer to such a thermal mixture. If there were no correlations, the distribution on the bits would be binomial; in the more realistic case which we consider, we will be able to obtain all the same essential results as if the distribution was binomial. Only the analysis will be a little more difficult, and the numbers a little worse, than for the binomial distribution. Why is it sufficient to specify the distribution that results when we measure the $n$ qubits in the computational basis? To properly describe the bulk sample in thermal equilibrium, we would have to specify the density matrix associated with the bulk sample. Different mixtures of pure states with the same density matrix are indistinguishable by any measurement (so long as that measurement is applied to the whole ensemble, not to individual members of the ensemble), and therefore by any quantum computation followed by a measurement in the computational basis. However, we will further restrict the quantum computation that we will allow during the state initialization process. The state initialization will be carried out by a computation that can only permute the computational basis states (i.e.\ by essentially a classical computation). 
Under these restrictions, it is sufficient to specify only the probability distribution that results when we measure the initial state of the sample in the computational basis. This is because different mixtures of pure states with different density matrices, but with the same resulting probability distribution, yield the same result under a basis state permutation followed by a measurement in the computational basis. Since at the end of our initialization process we plan to obtain $O(n)$ qubits in the all $\ket{0}$ state, any further (general) quantum computation that is restricted to these qubits yields the same results that it would if started on a $\ket{\bar{0}}$ state.

In addition to the operation of initializing the thermal mixture to an $\epsilon$-biased distribution, there are four primitive computational operations that an NMR quantum computer supports:
\begin{itemize}
\item[(a)] Cyclically shift the $n$ bits clockwise or counterclockwise one position.
\item[(b)] Apply an arbitrary two-bit operation to the first two bits.
\item[(c)] Measure the first bit (in some fixed basis).
\item[(d)] (For a quantum cellular automaton) For some fixed value of $k$ (depending upon the structure of the molecule chosen for the NMR experiment), apply an arbitrary two-bit operation to all pairs of bits with indices $lk$ and $lk +1$.
\end{itemize}

Notes:
\begin{enumerate}
\item Operation (a) does not require that the macromolecule have a cyclic topology. Our operative assumption is a linear topology. The implementation of the cyclic shift operation is given in the ``Architecture'' section, below.
\item As stated at the outset, these operations are a model of an NMR quantum computer. It must be understood that there is considerable flexibility in the design of the model, and that for the sake of specificity, we have made some arbitrary choices; proper choices must eventually be made on the basis of experimental considerations. In fact, there can be substantial reward for enriching the above operations.
\end{enumerate}
The machine architecture given by operations (a)-(c) corresponds to that of a $1$-tape Turing Machine. (We will speak of the site where we can execute arbitrary operations on the pair of bits as the ``tape head''.) Later in the paper, after describing designs which yield operations (a)-(c), we will also briefly describe how a slight variation of the design can in fact yield the equivalent of a $2$-tape Turing Machine, still on a linear molecule. With such a machine, the run time of our algorithm can be significantly improved.

\subsection*{Overall Scheme}

An ideal NMR quantum computer would have its $n$ qubit register initialized to $\ket{0^n}$. The main goal of this paper is to describe an efficient simulation of an ideal NMR quantum computer using an NMR quantum computer.

Notice that if the bias $\epsilon$, in the initial state of the NMR quantum computer, were $0$ then the density matrix of the mixture (of the $n$ qubit computers) would remain unchanged by any sequence of computational steps. Therefore an NMR quantum computer with parameter $\epsilon =0$ is incapable of supporting any computation. Our goal is to use the small but constant bias $\epsilon >0$ to isolate $m = \Theta(n)$ qubits such that the reduced density matrix of these $m$ qubits is very close to the density matrix corresponding to the pure state $\ket{ 0^m}$.

What we need in order to achieve this goal is quite simple: we wish to carry out a permutation of the computational basis states $x \in \{0,1\}^n$ such that states with low Hamming weight are reencoded with a long prefix of $0$'s.

A similar task has been addressed previously by a quantum computation~\cite{CD}. However, in that method, the necessary permutations are accomplished with the aid of a quantum computer which already has at its disposal a clean workspace, i.e.\ a sequence of qubits in a known initial state (of size about $n^{1/2}$).
Obtaining such a clean workspace, in an NMR computer, is precisely the problem which needs to be addressed in order to make NMR quantum computing possible in the first place. In other words, what complicates the construction of these permutations, for us, is that we cannot assume that we have any clean bits at all (i.e.\ bits whose distribution is almost entirely supported on $\ket{0}$ or $\ket{1}$) to store intermediate results of our computation, since all the available qubits are in the thermal state. Consequently, and because of the restricted set of primitive operations allowed on an NMR quantum computer (necessary because of the physical limitations), we are initially hampered in the kinds of logical operations we can implement in our computer. What we provide is an ``end-to-end'' procedure: we start with only a string of qubits in a thermal mixture, and we end with a string of qubits that with high probability are all in the $\ket{0}$ state. \begin{theorem} \label{thm1} Assume that the thermal mixture is in an $\epsilon$-biased distribution. Then there is a constant $c$ such that, using primitives (a) and (b), we can convert the given mixture to one in which $1-o(1)$ of the probability is supported on strings which begin with a run of $c \epsilon^2 n$ $0$'s. \end{theorem} The process which we will describe uses $O(n^2)$ steps. We will show how to obtain a value of approximately $20$ for $c$. A slightly more complicated implementation of our method (esp.\ by using blocks of size greater than $2$ in phase 2, see below) can decrease this constant further. \noindent{\bf Proof} We begin by permuting the bits; if we wish to minimize our reliance on any assumptions concerning the dependencies among spins in the original mixture, then the permutation of $\{1,...,n\}$ is chosen at random, uniformly, by the experimenter. 
If (as is more likely, and as was assumed in the previous section) we can assume only local correlations then it is enough to ``shuffle'' the bits in any predetermined manner that guarantees that all bits that start out close to each other (within distance $n^{1/3}$) end up far apart (at least distance $n^{1/3}$.) If we can really assume a binomial distribution on strings, then this initial permutation is unnecessary. Under weaker assumptions, the permutation is necessary in order for the probability bounds of the analysis to be valid. There are a variety of ways to carry out the permutation; using operations (a) and (b) it can be accomplished without difficulty using (to within a constant factor) the optimum number of transpositions. Typically, and in the worst case, this number will be on the order of $n^2$. We will analyze weak (i.e.\ locally correlated) distributions as follows. The initialization algorithm has the property that it partitions the $n$ bits into blocks of size $n^{1/3}$, and each processed bit output by the algorithm depends only on one of the blocks. Now, since the $n$ bits were randomly permuted, with high probability no two bits in any block started out at distance less than $n^{1/3}$. This implies (even under very weak assumptions on the manner in which local correlations decay) that the distribution on each block is very closely approximated by the binomial distribution. (Under the assumption that local correlations decay exponentially in distance, the distribution in the block will have exponentially small distance to the binomial distribution, in the $L_1$ norm.) After the initial permutation, we carry out the preparation of the initial segment of bits. This process will proceed in three phases. \begin{enumerate} \item Boosting to constant bias: In this phase we extract, from $n$ bits with bias $\ep$, $\Theta(\ep^2 n)$ bits which have large constant (i.e.\ independent of $n$) bias. 
This process is efficient (in terms of how many bits of output are produced) up to a constant factor. \item Obtaining polynomially small $\delta=(1-\ep)/2$ by increasing block sizes. \item Boosting to obtain a nearly perfect block of bits: In the final phase, while keeping the block size beneath $n^{1/2}$, we reduce $\delta$ beneath $n^{-10}$. The union bound then implies that a computation can then begin, working on the assumption that all bits are $0$'s, and incur only a polynomially small ($n^{-9}$) probability of error due to possible bad initialization. \end{enumerate} \subsection*{Phase 1: Amplification to constant bias} In phases 1-3 we partition the $n$ bits into blocks of size $n^{1/3}$. All computations of phases 1-3 are conducted internally within these blocks, until after phase 3 the clean bits are finally collected together in one location for use in a subsequent computation. In this way we ensure that we can use near-independence of the bits within each block. If the original probability distribution was binomial (rather than having local correlations), there is no need for this device. {\it {\bf Theorem \ref{thm1}, phase 1: } Starting with $n$ $\ep$-biased bits, and using operations (a),(b), we can with probability $1-o(1)$ obtain $\Omega(\ep^2 n)$ bits with bias at least $0.856$. } We will go through several rounds of amplification; as soon as $\ep$ exceeding $0.856$ is achieved, we stop using this process and switch to phase 2. The amplification scheme is very simple. Partition the bits into pairs. If the bits in a pair are different discard both. Else discard one. The expected bias towards $0$ among the surviving bits is $2 \epsilon \over 1 + \epsilon^2$. Also, the expected number of bits that survive is $n {1 + \epsilon^2 \over 4}$. 
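As a sketch of where these two expressions come from, assume for illustration that the two bits of a pair are exactly independent, each equal to $0$ with probability $p = {1+\epsilon \over 2}$ and to $1$ with probability $q = {1-\epsilon \over 2}$ (the symbols $p$ and $q$ are introduced only for this calculation). The pair survives when its two bits agree, which happens with probability
\[ p^2 + q^2 = {1+\epsilon^2 \over 2} , \]
and, conditioned on agreement, the bias of the retained bit is
\[ {p^2 - q^2 \over p^2 + q^2} = {\epsilon \over (1+\epsilon^2)/2} = {2 \epsilon \over 1+\epsilon^2} , \]
using $p^2-q^2 = (p-q)(p+q) = \epsilon$. Since there are $n/2$ pairs and each surviving pair contributes one bit, the expected number of survivors is ${n \over 2} \cdot {1+\epsilon^2 \over 2} = n {1+\epsilon^2 \over 4}$.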
Since the bits are nearly independent (they would be completely independent if the original distribution was binomial), a large deviation bound now implies that with probability at least $1 - e^{-n/3}$, the number of bits surviving is at least ${1 \over 4} n - n^{2/3}$. As we go through several ($k$) rounds, the probability that we wind up with less than $n 4^{-k} (1-n^{-1/3})^k$ bits is at most $ke^{-n/3}$. This is negligible. A little more complicated question is, can we wind up with bits with a constant ($0.856$) bias while bounding $4^{-k}$ from below by $\Omega(\epsilon^{2})$? A positive answer comes from the following analysis. From the formula $\epsilon_{i+1} = { 2 \epsilon_i \over 1 + \epsilon_i^2}$ we obtain two things. First, \[\epsilon_i = \epsilon_0 2^i / \prod_{j=0}^{i-1} (1+\epsilon_j^2).\] So we can rephrase our goal: we wish to upper bound $\prod_{j=0}^{i-1} (1+\epsilon_j^2)$ (where $\epsilon_i=\hat{\epsilon}$). In an ideal process in which $\epsilon$ doubled in each round, we would need $k=\lg (\hat{\epsilon}/\epsilon_0)$; in the true process we need to increase $k$ over this ideal quantity by $\lg \prod_{j=0}^{i-1} (1+\epsilon_j^2)$. In other words, the multiplicative effect on $4^k$ (over the optimal factor), is at most $(\prod_{j=0}^{i-1} (1+\epsilon_j^2))^2$. Second, \[ \epsilon_i = {1 - \sqrt{1 - \epsilon_{i+1}^2} \over \epsilon_{i+1}}.\] The remainder of this analysis is broken into two parts: the rounds until $\ep>1/100$, and the remaining rounds until $\ep>0.856$. For the first part we use the inequality \[ x \leq 0.02 \mbox{ implies } \sqrt{1-x} \geq 1- {1 \over 2} x - {1 \over 4} x^2 \] to show that \[ \epsilon_i \leq {1 \over 2} \epsilon_{i+1} (1 + {1 \over 2} \epsilon_{i+1}^2).\] In particular note that this implies \[ \epsilon_i \leq 0.5004 \epsilon_{i+1} \] so long as $\epsilon_i$ is beneath our threshold for using this analysis. Now, $\prod (1+\epsilon_j^2) \leq e^{\sum \epsilon_j^2}$. 
Consequently $\prod (1+\epsilon_j^2) \leq e^{{0.02}^2 {1 \over 1 - 0.5004}}$ and so the multiplicative effect on $4^k$ in these rounds (the factor for how many bits we are losing) is bounded by $e^{{0.02}^2 {2 \over 1 - 0.5004}} < 1.0017$. In the remaining sequence of rounds we have $0.01 < \ep_i \leq 0.856$. We obtain an upper bound on $(\prod_{j=0}^{i-1} (1+\epsilon_j^2))^2$ by explicitly calculating it beginning with the term corresponding to $0.856$ and working down, until and including the first term that is less than $0.01$ (which is the seventh iterate, equal to approximately $0.009985$). This product is less than $6.7$. \\ \noindent {\bf Implementation: } We have to be somewhat careful to implement the amplification scheme using the computational primitives described above. We can think of the machine given by primitives (a),(b), as a Turing machine, whose ``tape head'' is at the site at which arbitrary unitary operations can be implemented on a pair of adjacent bits. We will want to speak of the tape head carrying with it a small ``register'' of several bits: this is easily implemented, by interspersing rotations of the tape with transpositions at the site of the ``tape head''. We will use a two-bit register labelled $y_1,y_2$. We will perform the amplification in stages. Start with arbitrary bits in the two-bit register. For $m$ ranging from $1$ up to $N/2$ (where $N$ is the current number of bits left in the process --- initially $O(n^{1/3})$), carry out the two-bit operation ``are they equal?'', namely $\ket{01} \rightarrow \ket{11}$, $\ket{11} \rightarrow \ket{01}$, on the pair of bits, which we will call $x_{m,1}, x_{m,2}$. Now for $m$ ranging from $1$ up to $N/2$, do the following. Exchange $x_{m,1}$ with $y_1$, and $x_{m,2}$ with $y_2$. Now move the tape head back to the first pair, $x_{1,1},x_{1,2}$. For $i$ from $1$ to $m-1$, do the following: if $y_1=0$, exchange $y_2$ with $x_{i,2}$. 
Finally, move the tape head to pair $m$, and exchange $x_{m,1}$ with $y_1$, and $x_{m,2}$ with $y_2$. After $m$ reaches $N/2$, and before the next iteration, exchange each pair of bits $x_{j,1}, x_{2j,2}$ for $1 \leq j \leq N {1+\ep^2 \over 4} (1-o(1))$. This brings all the ``good'' bits to the initial segment of length $N {1+\ep^2 \over 4} (1-o(1))$. This will be the value of $N$ in the next stage. (The $1-o(1)$ term, derived from a law of large numbers, is chosen so that with high probability all bits in the segment are in fact ``good'' bits.) The total number of steps in all stages of all rounds is quadratic in the block size, hence $O(n^{2/3})$. At the end of the process, the $\Theta(n^{1/3} \ep^2)$ good bits lie in a segment at the start of the block.\\ Why is it necessary to switch to phase 2 once the bias of the bits is high? Because once the bits have high bias, the bit that is discarded in a phase 1 computation itself has substantial bias. Consequently the method is wasteful; if we continued with phase 1 to the end, the ratio of clean bits obtained to the number we started with, would tend to $0$ in $n$ (rather than being the fixed quantity $\Omega(\ep^2)$, independent of $n$). In phases 2 and 3 we use blocks that, instead of being of the fixed size $2$, increase together with the bias. Only one or a constant number of bits are discarded from each block of the computation, and it becomes possible to discard a small fraction of the bits, while still amplifying those that remain. \subsection*{Phase 2: obtaining polynomially small $\delta$.} {\it {\bf Theorem \ref{thm1}, phase 2: } Starting with $n$ bits of bias at least $0.856$, and using operations (a),(b), we can obtain $\Omega(n)$ bits with $\delta 0$, how many such bits can we hope to prepare? If we ask that with high probability $k$ bits are all $0$'s, then the central limit theorem places a limit on $k$ of $n(1-H_2({1+\epsilon \over 2}))$ which, for small $\epsilon$, is approximately $n \epsilon^2$. 
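To make the small-$\epsilon$ approximation explicit (a standard Taylor expansion, taking $H_2$ to be the binary entropy function measured in bits), one has
\[ H_2\left({1+\epsilon \over 2}\right) = 1 - {\epsilon^2 \over 2 \ln 2} - O(\epsilon^4), \qquad \mbox{so} \qquad n\left(1-H_2\left({1+\epsilon \over 2}\right)\right) \approx {n \epsilon^2 \over 2 \ln 2} \approx 0.72 \, n \epsilon^2 , \]
which agrees with $n \epsilon^2$ up to a constant factor.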
To prepare just one good bit, therefore, we must use about $\epsilon^{-2}$ bits with bias $\epsilon$. \section*{Architecture} We now discuss how the computational primitives (a),(b), and some extensions, can be implemented on polymers with certain kinds of periodic structures. \subsection*{Turing machine: } Normally one imagines a Turing machine having a ``head'' which implements computations locally, i.e.\ involving the state of the ``tape'' in the vicinity of the head. We implement this abstraction (but without any moving parts) in the following way. (It must be understood that there is considerable flexibility in the design, and that for the sake of specificity, we are making some arbitrary choices; the proper choices must eventually be made on the basis of experimental considerations.) The tape will not of course be infinite, but a ring of $n$ qubits. These will be realized in the nuclear spins of a linear polymer. The polymer will consist of $n/3$ repetitions of the sequence $\a\b\c$, thus $\a\b\c\a\b\c\a\b\c\a\b\c...$; the atoms $\a,\b,\c$ have spin $1/2$ nuclei. In addition, at one point in the chain, another atom, $\d$, is adjacent to the chain, near a neighboring pair of $\c$ and $\a$ atoms; it induces a chemical shift in some of the energy levels at these two neighboring atoms. (Note: it is not actually necessary for $\a$,$\b$ and $\c$ to be different types of nuclei; they could all be of one kind, if the periodic structure resides in adjacent atoms that induce suitable chemical shifts in the energy levels.) Five resonant frequencies will be such that we can implement the following five operations: \begin{enumerate} \item Frequency 1: transposition of the qubits in all adjacent $\a\b$ pairs. \item Frequency 2: transposition of the qubits in all adjacent $\b\c$ pairs. \item Frequency 3: transposition of the qubits in all adjacent $\c\a$ pairs. \item Frequencies 4,5: these resonate only with energy levels shifted by the presence of atom $\d$. 
Hence they induce a unitary operator only on the pair of qubits at the $\c$ and $\a$ atoms immediately adjacent to atom $\d$. We assume that the combinations of frequencies 4 and 5 generate the group of all unitary transformations of that $4$-dimensional Hilbert space.
\end{enumerate}

Arbitrary ``oblivious'' quantum computations can be performed on this machine. By an ``oblivious'' computation we mean one in which the sequence of movements of the tape head is the same in all the superposed ``copies'' of the machine in the quantum computation.

A cyclic shift of the tape by one position is implemented by the following sequence of transpositions: $(\a,\b)$, $(\c,\a)$, and then $(\b, \c)$. (Each such transposition can be implemented by three CNOT gates: for example $(\a,\b)$ can be implemented by the sequence $[\a \rightarrow \b], [\b \rightarrow \a], [\a \rightarrow \b]$.) A succession of such triples of transpositions will bring any desired pair of adjoining qubits next to the tape head.

\subsection*{Cellular automaton with distinguished site: }

Lloyd~\cite{Ll} has proposed implementing a quantum cellular automaton. We propose an architecture similar to what we have described above, but now we use five kinds of atoms: three $(\a,\b,\c)$ have spin $1/2$ nuclei and two $(\d,\e)$ induce chemical shifts in resonant frequencies of nearby atoms of the first three types. We assume that $k \mid n$. The ring consists of repetitions of the pattern $\a \b \c$; after every $k$ atoms of type $\a,\c$, one atom of type $\d$ adjoins the chain and induces local chemical shifts. At one site an $\e$ atom adjoins the chain and induces chemical shifts, which are different from those induced by $\d$.
One step of the computation is implemented by a pulse at a frequency that involves a $\d$ atom and the two adjacent spin $1/2$ atoms; rotations of the tape are implemented as above, small rotations allow information to be sent between adjacent ``cells'' of the cellular automaton, while global rotations bring the tape contents past the $\e$ site, where individual operations may be implemented. \subsection*{Two-tape Turing machine: } To implement a two-tape Turing machine we need to enable the head to move independently on each of the tapes. Equivalently, in our implementation, we need to have two cycles of bits, which can independently be cyclically shifted past the head. Let the molecule consist of $n$ repetitions of the sequence $\a\b\c\d$. (As above, these are spin $1/2$ nuclei and each adjacent type of pair can be addressed with distinctive frequencies.) The $\a$ and $\c$ nuclei will carry one tape, the $\b$ and $\d$ nuclei the other. (Note that the nuclei of any given type carry a contiguous segment of half a tape, not every other bit.) The sequence of transpositions $(\a\b)(\b\c)(\a\b)(\c\d)(\a\d)(\c\d)$ rotates the $\a\c$-tape by one position, while leaving the $\b\d$-tape fixed. The most time-consuming stages of our procedure are the initial permutation of the bits and the final collecting of the clean bits, each requiring time $O(n^2)$. In fact, these are the only stages which require more than time $O(n^{4/3})$. The terminal permutation is very simple; the initial permutation can be very simple, as well, so long as we make the ``local correlations'' assumption on our initial $\ep$-biased distribution, in which case we can use the permutation which sends bit $r n^{1/3} + s$ (for $0 \leq s < n^{1/3}$) to position $(r+s)n^{1/3}+s$. In this case, the initial permutation can be performed in time $O(n^{4/3})$, and the final permutation in linear time, on the $2$-tape architecture. Consequently, the entire procedure can be implemented in time $O(n^{4/3})$. 
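As a small illustration of the shuffle permutation described above, take $n=27$, so that blocks have size $n^{1/3}=3$ and there are $n^{2/3}=9$ blocks. We assume here, since it is not spelled out above, that the block index $r+s$ is reduced modulo $n^{2/3}$. Bit $3r+s$ is then sent to position $3((r+s) \bmod 9)+s$, so the three bits $0,1,2$ of block $r=0$ land at positions $0,4,8$: they lie in three different blocks and are pairwise at distance at least $n^{1/3}$. Under this assumption the map $(r,s) \mapsto ((r+s) \bmod 9, \, s)$ is invertible, so it is indeed a permutation of the $27$ positions.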
If we further augment our device by combining the features of a $2$-tape machine with those of a cellular automaton, with $k=n^{1/3}$, then the initial permutation can be performed in linear time, and in phase 3 and the latter part of phase 2 we can gain time by working in parallel within each $n^{1/3}$-size block. The overall runtime reduces to linear. Thus there is substantial benefit in implementing slightly stronger primitives than the minimal list of operations (a)--(c). \subsection*{Acknowledgments} Thanks to Isaac Chuang and Richard Singerman for helpful discussions. \begin{thebibliography}{99} \bibitem{BCGHHN} Benton,~D.~R., Cates,~G.~D., Gatzke,~M., Happer,~W., Hasson,~K.~C. and Newbury,~N.~R., ``Laser Production of Large Nuclear Spin Polarization in Frozen Xenon'', {\em Phys. Rev. Lett.} 65, 2591 (1990). \bibitem{BV} Bernstein,~E. and Vazirani,~U., ``Quantum complexity theory'', {\em Proceedings of the 25th Annual ACM Symposium on Theory of Computing}, 1993, \mbox{pp.~11\,--\,20}. Special issue of {\em SIAM J. Comp.}, October, 1997. \bibitem{CZ} Cirac,~J. and Zoller,~P., {\em Phys. Rev. Lett.} 74, p.~4091 (1995). \bibitem{CGK} Chuang,~I.~L., Gershenfeld,~N. and Kubinec,~M., ``Experimental Implementation of Fast Quantum Searching'', {\em Phys. Rev. Lett.} 80 (15), 3408--3411, April 13, 1998. \bibitem{CGKL} Chuang,~I.~L., Gershenfeld,~N., Kubinec,~M.~G. and Leung,~D.~W., ``Bulk Quantum Computation with Nuclear Magnetic Resonance: Theory and Experiment'', {\em Proc. R. Soc. Lond. A} 454, 447--467 (1998). \bibitem{CY} Chuang,~I. and Yamamoto,~Y., {\em Phys. Rev. A} 52, p.~3489 (1995). \bibitem{CORW} Chupp,~T.~E., Oteiza,~E.~R., Richardson,~J.~M., and White,~T.~R., ``Precision frequency measurements with polarized $\mbox{}^3$He, $\mbox{}^{21}$Ne, and $\mbox{}^{129}$Xe atoms'', {\em Phys. Rev. A} 38, 3998--4003 (1988). \bibitem{CD} Cleve,~R., and DiVincenzo,~D., ``Schumacher's quantum data compression as a quantum computation'', {\em Phys. Rev. A} 54, 1636 (1996).
\bibitem{CFH} Cory,~D.~G., Fahmy,~A.~F., and Havel,~T.~F., ``Ensemble Quantum Computing by Nuclear Magnetic Resonance Spectroscopy'', {\em Proc.\ Natl.\ Acad.\ Sci.}\ 94, 1634--1639 (1997). \bibitem{De} Deutsch,~D., ``Quantum theory, the Church-Turing principle and the universal quantum computer'', {\em Proceedings of the Royal Society, London}, vol.~A400, 1985, \mbox{pp.~97\,--\,117}. \bibitem{DJ} Deutsch,~D. and Jozsa,~R., ``Rapid solution of problems by quantum computation'', {\em Proceedings of the Royal Society, London}, vol.~A439, 1992, \mbox{pp.~553\,--\,558}. \bibitem{Di} DiVincenzo,~D., {\em Phys. Rev. A} 51, p.~1015 (1995). \bibitem{DRBH} Domokos,~P., Raimond,~J., Brune,~M. and Haroche,~S., {\em Phys. Rev. A} 52, p.~3554 (1995). \bibitem{Fe} Feynman,~R., ``Simulating physics with computers'', {\em International Journal of Theoretical Physics}, vol.~21, nos.~6/7, 1982, \mbox{pp.~467\,--\,488}. \bibitem{GC} Gershenfeld,~N. and Chuang,~I., {\em Science} 275, p.~350 (1997). \bibitem{Gr} Grover,~L.~K., ``A Fast Quantum Mechanical Algorithm for Database Search'', {\em Proc.\ 28th Symp.\ Theory Comput.}, 212--219, 1996. \bibitem{Ll} Lloyd,~S., {\em Science} 261, p.~1569 (1993). \bibitem{Mo} Monroe,~C., {\em Phys. Rev. Lett.} 75, p.~4714 (1995). \bibitem{Pines} Raftery,~D., Long,~H., Grandinetti,~P.~J., Shykind,~D. and Pines,~A., ``Multiple-Pulse Nuclear Magnetic Resonance of Optically Pumped Xenon in Low Magnetic Field'', {\em Phys. Rev. A} 50, 567--574 (1994). \bibitem{Sh} Shor,~P.\,W., ``Algorithms for quantum computation: Discrete logarithms and \mbox{factoring}'', {\em Proceedings of the 35th Annual IEEE Symposium on Foundations of \mbox{Computer} Science}, 1994, \mbox{pp.~124\,--\,134}. Special issue of {\em SIAM J. Comp.}, October, 1997. \bibitem{Si} Simon,~D., ``On~the power of quantum computation'', {\em Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science}, 1994, \mbox{pp.~116\,--\,123}. Special issue of {\em SIAM J. Comp.}, October, 1997.
\bibitem{Tu} Turchette,~Q., {\em Phys. Rev. Lett.} 75, p.~4710 (1995). \bibitem{Wa} Wago,~K., {\em J. Vac. Sci. Technol. B} 14, p.~1197 (1996). \end{thebibliography} \end{document}