A Hybrid Atom-Array Architecture for Universal Quantum Computation: The FCC Sheet Code with 3D Color-Code Magic-State Factory

Raghu Kulkarni
SSMTheory Group, IDrive Inc.
June 15, 2026
Abstract
Universal fault-tolerant quantum computation requires magic-state injection at a rate matched to algorithmic $T$-gate density. The injection itself is a gate-teleportation primitive that reduces to parallel logical CNOTs between ancilla logicals (holding magic states) and target logicals (in memory). In existing 3D-code architectures for reconfigurable neutral-atom arrays, the memory side of this interface throttles total throughput: surface code, BB, and 3D color code all support at most $O(1)$ logical CNOTs per physical-gate layer at the memory-factory interface, executed sequentially via lattice surgery or code switching. We propose a hybrid atom-array architecture in which the FCC sheet code serves as the bulk memory and Clifford layer and the 3D color code serves as the $T$-state factory, with the memory-factory interface implemented by a high-throughput parallel-teleportation primitive specific to the FCC sheet code. The FCC sheet code is a $[[3L^3, 6L, L]]$ CSS code on the three triad sheets of the face-centered-cubic lattice. Its three sheets are related by an order-three rotational symmetry $R : (x, y, z) \mapsto (y, z, x)$, which we verify computationally at $L = 4, 6, 8$ to be a CSS isomorphism whose induced action on logicals is a $(2L) \times (2L)$ symplectic permutation. A single transversal layer of $R$-paired physical CNOTs implements $2L$ parallel logical CNOTs between two sheets in one physical-gate layer, with no merge window, no gauge bits, and trivial $d_{\mathrm{each}}$ scaling. Circuit-level simulation on the full sheet code at $L = 4, 6, 8$ shows that the transversal layer preserves the approximately 1% memory threshold under correlated (feed-forward) decoding; a naive per-sheet matching decoder instead collapses on the target sheet, a decoder artifact that we trace to the control-to-target error spread and resolve with the feed-forward decoder provided here. This is the parallel-teleportation primitive that the architecture’s memory-factory interface uses: $2L$ magic states are injected into $2L$ memory logicals in a single physical-gate layer, with throughput that scales with $L$. Within-sheet local Cliffords (memory, $H$, $S$, single-logical Pauli) and cross-sheet ZZ-/XX-merge surgery (Section 4) handle the remaining Clifford operations. We provide an explicit triangle-ribbon construction for cross-sheet ZZ-merge, verified at $L = 4, 6, 8$, and show that its merge footprint is $3L/2$ atoms, approximately $4/3\times$ smaller than the surface-code routing-channel equivalent. The architecture targets QuEra Gemini-class, Atom Computing, and Pasqal hardware, where Rydberg-based interactions and tweezer rearrangement realize the required $K = 12$ FCC connectivity and the $R$-respecting inter-sheet coupling naturally.
1 Introduction
Reconfigurable neutral-atom arrays have, over the past three years, become a leading hardware platform for early fault-tolerant quantum computation. QuEra’s Gemini-2 (256 atoms, deployed 2025) and Atom Computing’s 1180-atom system (announced late 2024) offer commercial access to arrays whose connectivity is set by Rydberg-blockade radius rather than fabricated couplers, and whose qubits are physically rearrangeable on microsecond timescales between gate layers [11, 12, 16, 17, 18]. Bluvstein and collaborators have demonstrated below-threshold surface-code memory and universal-logic primitives in a single 448-atom architecture [12]. These platforms have the structural property that three-dimensional codes, which require physically realized inter-block connectivity beyond what planar 2D superconducting chips provide, can be deployed natively by arranging atoms in the required geometry.
The architectural question is then: which 3D code is the right one to use, and for what purpose? Three contenders have emerged. (i) Quantum LDPC codes such as the bivariate-bicycle (BB) “gross code” [8] achieve a logical-qubit rate $k/n$ approximately ten times that of the surface code at matched distance, but their connectivity requirements (Tanner-graph degree six, with non-local edges) and their lack of a transversal non-Clifford gate make them best suited to dense bulk memory. (ii) The 3D color code possesses a transversal $T$ gate (or transversal CCZ on the 3-torus) [10], eliminating most of the overhead of magic-state distillation; it is the natural choice for a non-Clifford factory [12], but its high-weight stabilizers (typically $w \geq 6$) and specialized restriction or BP-OSD decoders make it heavier as bulk memory. (iii) The 3D toric code is single-shot [15] but has fixed $k = 3$ independent of system size and a poor logical-density scaling.
In a universal architecture, no single code suffices for both roles, and the standard pattern is to pair a memory code with a factory code via a magic-state injection interface. In every existing 3D-code architecture, the interface is the throttle. Each magic-state injection is a gate-teleportation primitive that reduces to a logical CNOT between an ancilla logical (holding the magic state) and a target logical (in memory). When the algorithm calls for $N$ parallel $T$-gates, the interface must execute $N$ logical CNOTs in parallel. Surface code and 3D color code architectures execute these CNOTs sequentially via lattice surgery or code switching, at $d$ rounds per CNOT; BB codes execute them via Tour-de-Gross routing, at higher constant overhead. The factory’s intrinsic output rate is rarely the bottleneck for typical $T$-densities; the memory-side interface throughput is.
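As a reminder of why each injection costs one logical CNOT, the following is the standard textbook single-qubit derivation of $T$-gate teleportation; it is not specific to this architecture. It assumes the convention in which the memory qubit is the CNOT control and the magic-state ancilla is the CNOT target (the opposite convention also appears in the literature), and the subscript on CNOT is notation introduced here.

```latex
\begin{align*}
|\psi\rangle &= a|0\rangle + b|1\rangle, \qquad
|T\rangle = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle + e^{i\pi/4}|1\rangle\bigr),\\
\mathrm{CNOT}_{\psi \to T}\,|\psi\rangle|T\rangle
 &= \tfrac{1}{\sqrt{2}}\Bigl[\bigl(a|0\rangle + e^{i\pi/4} b|1\rangle\bigr)|0\rangle
  + \bigl(e^{i\pi/4} a|0\rangle + b|1\rangle\bigr)|1\rangle\Bigr],\\
\text{ancilla outcome } 0:&\quad a|0\rangle + e^{i\pi/4} b|1\rangle = T|\psi\rangle,\\
\text{ancilla outcome } 1:&\quad e^{i\pi/4}\bigl(a|0\rangle + e^{-i\pi/4} b|1\rangle\bigr)
  = e^{i\pi/4}\,T^{\dagger}|\psi\rangle
  \;\xrightarrow{\;S\;}\; e^{i\pi/4}\,T|\psi\rangle .
\end{align*}
```

The ancilla is measured in the $Z$ basis; outcome 1 requires a Clifford $S$ correction, since $S T^{\dagger} = T$. Executing $N$ such injections in parallel therefore requires $N$ simultaneous logical CNOTs, which is the quantity the interface throughput bounds.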
This paper. We argue that the FCC sheet code fills the missing role by supplying a high-throughput parallel-teleportation primitive that does not exist in any other 3D code in the current literature. The primitive arises from the FCC lattice’s threefold rotational symmetry $R : (x, y, z) \mapsto (y, z, x)$, which we show is a CSS isomorphism of the FCC sheet code with a clean induced action on logicals. Applied as an inter-sheet pairing, $R$ implements $2L$ parallel logical CNOTs between two sheets in one physical-gate layer. This is exactly the primitive a parallel magic-state injection interface needs. Combined with a 3D color-code factory operating in a separate spatial zone, the resulting architecture supports universal fault-tolerant computation with memory-factory interface throughput that scales with $L$ rather than being throttled at $O(1)$ logical CNOTs per layer.
Summary of the architecture. We propose a two-zone atom-array architecture (Figure 1):

FCC sheet code memory zone. Three triad sheets at $L \geq 4$ encode $6L$ logical qubits (24 at $L = 4$) at a logical density of approximately 6.3% per atom. Stabilizers are weight 4; memory and the transversal CNOT are MWPM-decoded (the latter with the feed-forward correction of Section 5), while cross-sheet surgery uses BP+OSD because its weight-3 ribbon operators create hyperedge errors; each sheet decomposes into $L$ independent stacked toric codes. Within-sheet local Cliffords use standard 2D toric/surface-code primitives. Cross-sheet joint Pauli measurements use a triangle-ribbon surgery primitive with merge footprint $3L/2$ atoms, verified at $L = 4, 6, 8$.
[Figure 1 graphic not reproduced. Recoverable labels: memory zone (FCC sheet code) with sheets $S_{xy}$, $S_{xz}$, $S_{yz}$ cycled by $R : (x, y, z) \mapsto (y, z, x)$ and linked by the transversal CNOT; $[[3L^3, 6L, L]]$, weight-4 stabilizers, MWPM-decodable, $2L$ logicals per sheet, 24 total at $L = 4$, $K = 4$ ancilla. Factory (3D color code): four $[[15, 1, 3]]$ blocks, 15-to-1 distillation, transversal $T$ with cubic error suppression, $|T\rangle$ output stream. Interface: parallel teleport, $2L$ injections per layer. At $L = 4$: approximately 384 atoms of memory plus approximately 100 to 200 atoms of factory.]

Figure 1: Hybrid atom-array architecture. Left: the FCC sheet code memory zone hosts $6L$ logical qubits across three interpenetrating triad sheets ($S_{xy}$, $S_{xz}$, $S_{yz}$), $2L$ per sheet. Within-sheet local Cliffords use standard layer-wise toric-code primitives at $K = 4$. Cross-sheet operations use the FCC threefold rotation $R$. Right: a $T$-state factory built from small 3D color-code blocks distils $|T\rangle$ states using transversal $T$. Center (the architecture’s new primitive): the parallel-teleportation interface, implemented by the $R$-paired transversal-CNOT layer of Section 3, injects up to $2L$ magic states from the factory into the memory in a single physical-gate layer. Throughput scales with $L$, in contrast to the $O(1)$ throughput of surgery-based interfaces.
Parallel-teleportation interface (the new primitive). $2L$ parallel logical CNOTs between two sheets in a single physical-gate layer, paired by the FCC threefold rotation $R$. This is the interface primitive used for parallel magic-state injection from the factory into memory. Throughput scales linearly with $L$, in contrast to the $O(1)$ throughput of surgery-based interfaces in surface-code, BB, and 3D color-code architectures.

3D color-code factory zone. Small $[[15, 1, 3]]$ tetrahedral color codes prepare $|T\rangle$ states via transversal $T$, distilled by 15-to-1 routines [13]. A bank of factories produces magic states in parallel at a rate that scales with the number of factory blocks, designed to match the memory’s $2L$-parallel injection capacity.

Atom-array realization. The FCC lattice and three sheets are laid out as three interpenetrating tweezer arrays. The parallel-teleportation interface is realized by tweezer rearrangement that brings $R$-paired data atoms into Rydberg-blockade range, followed by a single global gate pulse. Magic states are produced in a separate factory zone and shuttled to the memory via the array’s native rearrangement primitive.
Outline. Section 2 reviews the FCC sheet code’s construction, code parameters, and decomposition into stacked toric codes. Section 3 establishes the threefold rotation as a CSS isomorphism and develops the transversal-CNOT construction. Section 4 presents the cross-sheet surgery primitives (ZZ-merge and XX-merge) and their boundary-aware planar variants. Section 5 consolidates the threshold simulations across all primitives. Section 6 develops the full hybrid architecture, including the magic-state factory interface. Section 7 compares architectural cost against the surface-code-plus-15-to-1 baseline. Section 8 maps the architecture onto reconfigurable atom-array hardware. Section 9 discusses scope, limitations, and open questions.
Scope and what this paper is not. This paper proposes an architecture; it is not a hardware demonstration, nor a claim that the FCC sheet code is universally optimal across all metrics. On rate at very high distance (say $d \geq 12$), bivariate-bicycle codes pull ahead and remain the better memory choice [8, 9]. On transversal non-Clifford gates, the 3D color code dominates and remains the right factory choice [10, 12]. The FCC sheet code does not provide a general-purpose “faster Clifford CNOT” for arbitrary single-CNOT patterns; those are best handled by within-sheet local Cliffords or cross-sheet surgery. Our claim is narrower: at the parallel magic-state injection interface, where many logical CNOTs in a structured pattern are needed simultaneously, the FCC transversal CNOT provides $2L$-parallel throughput that no other code in this survey matches. When paired with a factory that produces magic states at a scaling rate, this primitive turns the memory-factory interface from an $O(1)$-throughput bottleneck into an $O(L)$-throughput pipeline. The architecture’s value comes from this specific combination, not from any single code in isolation.
2 The FCC Sheet Code
The FCC sheet code introduced here is closely related to, but distinct from, the high-rate FCC-edge CSS code of [1]. Both codes place qubits on edges of the FCC lattice; the difference is in the stabilizer choice. The high-rate code uses weight-12 stabilizers (vertex $Z$ over all 12 incident edges; octahedral $X$ over all 12 edges of a void), yielding a constant-distance $[[3L^3, 2L^3 + 2, 3]]$ code with rate approaching $2/3$. The sheet code we develop here restricts each stabilizer to a single triad sheet of $K = 4$ edges, yielding a $[[3L^3, 6L, L]]$ code with weight-4 stabilizers, lower rate (approximately 6% logical density per atom at $L = 4$), and distance growing as $L$. These are two different points on the rate–distance frontier of the FCC-edge lattice: the high-rate code maximizes $k/n$ at fixed $d$, while the sheet code maximizes $d$ at fixed local stabilizer weight. The architectural primitives of this paper (the layer decomposition, the cross-sheet surgery, and the rotation-derived transversal CNOT) rely on the sheet decomposition and do not apply to the weight-12 code.
2.1 Triad Decomposition
The face-centered-cubic lattice has $K = 12$ nearest-neighbor vectors, partitioning into three orthogonal sheets of four:

$$S_{xy} : (\pm 1, \pm 1, 0), \qquad S_{xz} : (\pm 1, 0, \pm 1), \qquad S_{yz} : (0, \pm 1, \pm 1). \tag{1}$$

Each FCC edge belongs to exactly one sheet. At lattice size $L$ (even, for toric boundaries), each sheet contains $L^3$ edges, and restricted to a single sheet each vertex has $K = 4$ incident edges.
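The counts above can be checked by direct enumeration. The sketch below is a minimal illustration, not the authors' code: it assumes FCC sites are the integer points of the periodic $L \times L \times L$ cube with even coordinate sum, and the names `fcc_vertices` and `sheet_edges` are introduced here for illustration.

```python
from itertools import product

# The three triad sheets of Eq. (1), as sets of four displacement vectors each.
SHEETS = {
    "xy": [(sx, sy, 0) for sx in (1, -1) for sy in (1, -1)],
    "xz": [(sx, 0, sz) for sx in (1, -1) for sz in (1, -1)],
    "yz": [(0, sy, sz) for sy in (1, -1) for sz in (1, -1)],
}

def fcc_vertices(L):
    """FCC sites (assumed convention): points of the periodic cube with even coordinate sum."""
    return [v for v in product(range(L), repeat=3) if sum(v) % 2 == 0]

def sheet_edges(L, sheet):
    """Undirected edges of one triad sheet, each a frozenset of two vertices (periodic in L)."""
    edges = set()
    for v in fcc_vertices(L):
        for d in SHEETS[sheet]:
            w = tuple((a + b) % L for a, b in zip(v, d))
            edges.add(frozenset((v, w)))
    return edges

L = 4
counts = {s: len(sheet_edges(L, s)) for s in SHEETS}
total = len(set().union(*(sheet_edges(L, s) for s in SHEETS)))
print(len(fcc_vertices(L)), counts, total)
```

At $L = 4$ this gives $L^3/2 = 32$ vertices and $L^3 = 64$ edges per sheet, and the three edge sets are disjoint, so the full lattice carries $3L^3 = 192$ edges.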
2.2 Sheet Code Stabilizers
Fix one triad sheet $S$. Place one physical qubit on each edge in $S$, giving $n = L^3$ qubits. The stabilizers are:

$Z$-stabilizers: for each vertex $v$, apply $Z$ to the 4 edges of $S$ incident to $v$.

$X$-stabilizers: for each octahedral void $o$, apply $X$ to the 4 edges of $S$ connecting the 6 vertices surrounding $o$.

Both stabilizer types have uniform weight 4. The CSS condition $H_X H_Z^{\top} = 0$ over $\mathbb{F}_2$ is satisfied because each edge in $S$ participates in exactly two vertices and exactly two octahedral voids restricted to $S$, so any vertex-void overlap is even.
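The weight and commutation claims can be checked on a small instance. The following sketch builds the stabilizer supports of $S_{xy}$ on the periodic lattice under two assumptions that are ours rather than the paper's: FCC vertices are the even-parity integer sites, and octahedral voids sit at the odd-parity sites. The function name `sxy_stabilizers` is hypothetical.

```python
from itertools import product

DIRS_XY = [(1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)]

def add(v, d, L):
    """Translate site v by d with periodic boundary conditions."""
    return tuple((a + b) % L for a, b in zip(v, d))

def sxy_stabilizers(L):
    """Return (z_stabs, x_stabs) for sheet S_xy; each stabilizer is a frozenset of edges."""
    sites = list(product(range(L), repeat=3))
    vertices = [v for v in sites if sum(v) % 2 == 0]   # FCC sites (assumed convention)
    voids = [o for o in sites if sum(o) % 2 == 1]      # octahedral voids (assumed convention)
    # Vertex star: the four S_xy edges incident to v.
    z_stabs = [frozenset(frozenset((v, add(v, d, L))) for d in DIRS_XY) for v in vertices]
    x_stabs = []
    for o in voids:
        xs = [add(o, (s, 0, 0), L) for s in (1, -1)]   # void neighbours along x
        ys = [add(o, (0, s, 0), L) for s in (1, -1)]   # void neighbours along y
        # The S_xy edges among the six void neighbours join an x-neighbour to a y-neighbour.
        x_stabs.append(frozenset(frozenset((a, b)) for a in xs for b in ys))
    return z_stabs, x_stabs

z_stabs, x_stabs = sxy_stabilizers(4)
weights = {len(s) for s in z_stabs + x_stabs}
overlaps = {len(z & x) for z in z_stabs for x in x_stabs}
print(weights, overlaps)
```

Every support has weight 4 and every vertex-void overlap is 0 or 2, which is the even-overlap statement behind $H_X H_Z^{\top} = 0$.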
2.3 Code Parameters and Layer Decomposition
Theorem 1 (Sheet code parameters). At even $L$, the FCC sheet code on a single triad sheet has parameters $[[L^3, 2L, L]]$.

The full three-sheet code, with all three sheets deployed simultaneously on the same FCC lattice, has parameters $[[3L^3, 6L, L]]$: three independent copies of $[[L^3, 2L, L]]$ on disjoint data-qubit sets that share the underlying lattice geometry.
The proof of Theorem 1 proceeds via the layer decomposition that is, in our view, the most
important structural fact about this code.
Theorem 2 (Layer decomposition). The FCC sheet code on $S_{xy}$ at even $L$ is isomorphic, as a stabilizer code, to $L$ disjoint 2D toric codes, each on an $L \times L$ rotated square lattice with $L^2$ data qubits, $k = 2$ logicals, and distance $L$. Analogous decompositions hold for $S_{xz}$ (layered by $y$) and $S_{yz}$ (layered by $x$).
Proof. Each $S_{xy}$ edge $(v_1, v_2)$ has $z(v_1) = z(v_2)$ because the displacement vector $v_2 - v_1 \in \{(\pm 1, \pm 1, 0)\}$ has $dz = 0$. Define $\mathrm{layer}(e) = z(v_1)$. This partitions the $L^3$ edges of $S_{xy}$ into $L$ disjoint sets of $L^2$ edges each. A vertex $Z$-stabilizer at $v$ acts on the four $S_{xy}$-edges incident to $v$, all sharing the $z$-coordinate of $v$, so it is supported entirely within one layer. The same argument applies to octahedral-void $X$-stabilizers within $S_{xy}$. Within layer $z = z_0$, the $L^2$ edges connect vertices $\{(x, y, z_0) : x + y \equiv z_0 \pmod 2\}$ via $(\pm 1, \pm 1, 0)$; this is the standard rotated $L \times L$ toric code. The 2D toric code on $L^2$ qubits has $k = 2$ and $d = L$. Summing across $L$ layers gives $\mathrm{rank}(H_Z) = \mathrm{rank}(H_X) = L(L^2/2 - 1) = (L^3 - 2L)/2$ and $k = L^3 - 2(L^3 - 2L)/2 = 2L$. The minimum-weight logicals are the non-contractible cycles within layers, of length $L$.
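The rank count in the proof can be reproduced numerically. The sketch below is an illustration under assumed conventions (even-parity sites as vertices, odd-parity sites as voids), with hypothetical helper names `sxy_checks` and `gf2_rank`; it stores each check as an integer bitmask and computes ranks over $\mathbb{F}_2$ by elimination.

```python
from itertools import product

def sxy_checks(L):
    """Bitmask rows of H_Z and H_X for sheet S_xy on the periodic lattice (one bit per edge)."""
    def shift(v, d):
        return tuple((a + b) % L for a, b in zip(v, d))
    index = {}                                    # edge -> bit position
    def bit(a, b):
        return 1 << index.setdefault(frozenset((a, b)), len(index))
    signs = [(dx, dy) for dx in (1, -1) for dy in (1, -1)]
    hz, hx = [], []
    for s in product(range(L), repeat=3):
        if sum(s) % 2 == 0:                       # FCC vertex: star of four S_xy edges
            hz.append(sum(bit(s, shift(s, (dx, dy, 0))) for dx, dy in signs))
        else:                                     # octahedral void (assumed at odd-parity sites)
            hx.append(sum(bit(shift(s, (dx, 0, 0)), shift(s, (0, dy, 0))) for dx, dy in signs))
    return hz, hx, len(index)

def gf2_rank(rows):
    """Rank over GF(2) of integer bitmask rows, eliminating on the highest set bit."""
    pivots = {}                                   # highest set bit -> reduced row
    for r in rows:
        while r:
            h = r.bit_length() - 1
            if h not in pivots:
                pivots[h] = r
                break
            r ^= pivots[h]
    return len(pivots)

for L in (4, 6):
    hz, hx, n = sxy_checks(L)
    rz, rx = gf2_rank(hz), gf2_rank(hx)
    print(f"L={L}: n={n}, rank(H_Z)={rz}, rank(H_X)={rx}, k={n - rz - rx}")
```

For $L = 4$ the ranks are $(L^3 - 2L)/2 = 28$ each and $k = 8 = 2L$; for $L = 6$ they are 102 each and $k = 12$, in agreement with the formula.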
The layer decomposition has several consequences worth flagging. First, the rank formula holds for every even $L$ without per-$L$ verification. Second, decoding the FCC sheet code reduces to running an independent 2D toric-code decoder (e.g. MWPM) on each layer in parallel; no specialized 3D decoder is required. Third, the code’s distance scaling is $d = L$ with $n = L^3$ per sheet, rather than the surface code’s $d^2 \sim n$; the rate is $k/n = 2/L^2$ per sheet, identical to the 2D toric code at distance $L$.
2.4 Planar Variant
For deployment without periodic boundary conditions, each layer becomes a rotated surface code $[[(L-1)^2, 1, L-1]]$ via standard boundary engineering [4, 2]. The planar sheet code per sheet has parameters

$$[[L(L-1)^2,\; L,\; L-1]] \quad \text{(planar boundaries)}. \tag{2}$$

The distance drops by one and the number of logical qubits halves from $2L$ to $L$ per sheet. We will use the planar variant in some threshold simulations (Section 5); for the architectural proposal, either variant is acceptable, with the toric variant offering higher logical density and the planar variant simpler atom-array layout.
2.5 Hardware footprint at L = 4 and the role of three sheets
To anchor the rest of the paper in concrete numbers, the toric variant at $L = 4$ gives one sheet with parameters $[[64, 8, 4]]$: 64 data qubits, 8 logical qubits, distance 4. This is 8 atoms per logical qubit on the data side, or approximately 16 atoms per logical including ancillas, comparable to the best 3D-code rates published at this scale. The three-sheet deployment on the same FCC lattice gives $[[192, 24, 4]]$: 24 individually addressable logicals in 192 data qubits. The role of having three sheets (rather than just one) is twofold: it triples the logical density per atom-array footprint, and it enables the cross-sheet transversal CNOT that we develop in Section 3.
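The footprint arithmetic can be tabulated from Theorem 1 and Eq. (2). The helper names below are hypothetical, and the ancilla count assumes one ancilla atom per weight-4 stabilizer ($L^3/2$ vertex plus $L^3/2$ void checks per sheet), which is our reading of the "including ancillas" figure rather than a stated layout.

```python
def sheet_parameters(L, sheets=3, planar=False):
    """[[n, k, d]] for `sheets` triad sheets: Theorem 1 (toric) or Eq. (2) (planar)."""
    if planar:
        n, k, d = L * (L - 1) ** 2, L, L - 1
    else:
        n, k, d = L ** 3, 2 * L, L
    return sheets * n, sheets * k, d

def atoms_per_logical(L, sheets=3):
    """Toric variant: (data atoms, data + ancilla atoms) per logical qubit.

    Assumes one ancilla per stabilizer, i.e. as many ancillas as data qubits.
    """
    n, k, _ = sheet_parameters(L, sheets)
    ancillas = n
    return n / k, (n + ancillas) / k

for L in (4, 6, 8):
    n, k, d = sheet_parameters(L)
    data, total = atoms_per_logical(L)
    print(f"L={L}: [[{n}, {k}, {d}]], {data:.1f} data atoms and {total:.1f} total atoms per logical")
```

At $L = 4$ this reproduces $[[192, 24, 4]]$ with 8 data atoms and 16 total atoms per logical, that is 384 atoms for 24 logicals, or a density of 6.25% per atom.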
3 The Threefold Rotation as a CSS Isomorphism
The three triad sheets of the FCC lattice are interrelated by an order-three rotational symmetry
of the FCC point group. We show in this section that this symmetry lifts to a CSS isomorphism
between the three single-sheet codes, with a clean induced action on logical operators, and use it to
construct a strictly fault-tolerant transversal CNOT between sheets.
3.1 The rotation
Define

$$R : (x, y, z) \mapsto (y, z, x). \tag{3}$$

$R$ is an order-3 element of the cubic point group, hence an automorphism of $\mathbb{Z}_L^3$ that preserves parity sums and maps FCC vertices to FCC vertices. Its action on triad displacement vectors is $(d_x, d_y, d_z) \mapsto (d_y, d_z, d_x)$, which cycles the three sheets:

$$S_{xy} \xrightarrow{R} S_{xz} \xrightarrow{R} S_{yz} \xrightarrow{R} S_{xy}. \tag{4}$$
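The sheet cycle of Eq. (4) follows directly from applying $R$ to the displacement sets of Eq. (1); a short check, with names chosen here for illustration:

```python
def R(v):
    """Threefold rotation of Eq. (3): (x, y, z) -> (y, z, x)."""
    x, y, z = v
    return (y, z, x)

# Displacement vectors of the three triad sheets, Eq. (1).
SHEETS = {
    "xy": {(sx, sy, 0) for sx in (1, -1) for sy in (1, -1)},
    "xz": {(sx, 0, sz) for sx in (1, -1) for sz in (1, -1)},
    "yz": {(0, sy, sz) for sy in (1, -1) for sz in (1, -1)},
}

def image_sheet(name):
    """Name of the sheet whose displacement set equals R applied to sheet `name`."""
    rotated = {R(d) for d in SHEETS[name]}
    return next(other for other, dirs in SHEETS.items() if dirs == rotated)

cycle = {name: image_sheet(name) for name in SHEETS}
print(cycle)   # {'xy': 'xz', 'xz': 'yz', 'yz': 'xy'}
```

Because $R$ only permutes coordinates, it also preserves the coordinate sum, so even-parity (FCC) sites map to even-parity sites, and $R^3$ is the identity.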
Computational Result 1 (CSS isomorphism, verified at $L = 4, 6, 8$). The column permutation $\pi$ on the data qubit set induced by $R$ has the following properties, verified computationally at $L = 4, 6, 8$:

Edge permutation. $\pi$ is a permutation of the full edge set $E$ with $\pi(\text{edges of } T) = \text{edges of } R(T)$ for each sheet $T \in \{S_{xy}, S_{xz}, S_{yz}\}$.

Stabilizer preservation. For every vertex $Z$-stabilizer $g$ of sheet $T$, $\pi(g)$ is identically a vertex $Z$-stabilizer of sheet $R(T)$. Similarly for every octahedral $X$-stabilizer.

Logical permutation. The induced action of $R$ on logical operators is a $(2L) \times (2L)$ permutation matrix. At $L = 4$, each logical of sheet $T$ (there are 8 per sheet, 24 total) maps to a single logical operator of sheet $R(T)$, preserving the $X/Z$ type, the weight, and the symplectic pairing $M_X M_Z^{\top} = I$.
The stabilizer-preservation and edge-permutation parts follow structurally from the fact that $R$ is a rotational symmetry of the FCC lattice; these parts are expected to hold for all $L$, with the computational verification serving as confirmation. The logical-permutation part is verified at the canonical basis level only at $L = 4, 6, 8$; a general proof would require a basis-independent argument, which we leave open.
The structural counts at $L = 4$ are: 96 vertex $Z$-stabilizers and 96 octahedral $X$-stabilizers across the three sheets, all matched by $\pi$; 24 logical operators (8 per sheet $\times$ 3 sheets), all mapped to single logical operators of the next sheet. The same matching counts hold at $L = 6$ (72 logicals, all matched) and $L = 8$ (96 logicals, all matched).
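The stabilizer-preservation part can be re-derived independently on a small lattice. The sketch below is not the authors' verification script: it assumes even-parity sites are FCC vertices and odd-parity sites are octahedral voids, represents each stabilizer as a set of edges, and checks that applying $R$ to every stabilizer of a sheet yields exactly the stabilizer set of the next sheet. All function names are hypothetical.

```python
from itertools import product

AXES = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}     # the two nonzero axes of each sheet
NEXT = {"xy": "xz", "xz": "yz", "yz": "xy"}           # sheet cycle of Eq. (4)

def R(v):
    x, y, z = v
    return (y, z, x)

def unit(axis, sign):
    return tuple(sign if i == axis else 0 for i in range(3))

def shift(v, d, L):
    return tuple((a + b) % L for a, b in zip(v, d))

def stabilizers(L, sheet):
    """Sets of vertex (Z) and void (X) stabilizers of one sheet, as frozensets of edges."""
    a, b = AXES[sheet]
    z_stabs, x_stabs = set(), set()
    for s in product(range(L), repeat=3):
        ends_a = [shift(s, unit(a, sg), L) for sg in (1, -1)]
        if sum(s) % 2 == 0:
            # Vertex star: edges from s to s +/- e_a +/- e_b.
            z_stabs.add(frozenset(frozenset((s, shift(p, unit(b, sg), L)))
                                  for p in ends_a for sg in (1, -1)))
        else:
            # Void plaquette: edges joining the a-neighbours of s to its b-neighbours.
            ends_b = [shift(s, unit(b, sg), L) for sg in (1, -1)]
            x_stabs.add(frozenset(frozenset((p, q)) for p in ends_a for q in ends_b))
    return z_stabs, x_stabs

def rotate(stab):
    """Image of a stabilizer support under the edge permutation induced by R."""
    return frozenset(frozenset(R(v) for v in edge) for edge in stab)

L = 4
for sheet, nxt in NEXT.items():
    z, x = stabilizers(L, sheet)
    z_next, x_next = stabilizers(L, nxt)
    ok = {rotate(g) for g in z} == z_next and {rotate(g) for g in x} == x_next
    print(sheet, "->", nxt, len(z), len(x), ok)
```

At $L = 4$ each sheet has 32 vertex and 32 void stabilizers (96 of each across the three sheets), and all are matched. The logical-permutation part of the result is not covered by this check.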
3.2 Logical action of R
For each sheet $T$ we extract a canonical logical basis $\{(\bar{X}^T_i, \bar{Z}^T_i)\}_{i=1}^{2L}$ by Gaussian elimination in $\mathbb{F}_2$, starting from generators of $\ker(S^T_Z)/S^T_X$ for $\bar{X}$ and $\ker(S^T_X)/S^T_Z$ for $\bar{Z}$. We then apply row and column operations to bring the symplectic matrix $M_{ij} = \langle \bar{X}^T_i, \bar{Z}^T_j \rangle \bmod 2$ to the identity. All resulting logicals have weight $L$.

For each logical $\bar{O}^T_i$ ($O \in \{X, Z\}$), we apply $\pi$ and re-express the image in the logical basis of sheet $R(T)$ via the linear-algebra solver

$$\pi(\bar{O}^T_i) \overset{?}{=} \sum_j (\sigma_O)_{ji}\, \bar{O}^{R(T)}_j + (\text{stabilizer of } R(T)). \tag{5}$$

At $L = 4, 6, 8$, the matrices $\sigma_X$ and $\sigma_Z$ are equal and are $(2L) \times (2L)$ permutation matrices. The symplectic pairing $\sigma_X \sigma_Z^{\top} = I$ holds identically, confirming that the induced logical action preserves the symplectic structure of the code’s logical algebra.
Proposition 1. Applied as a SWAP layer cycling the three sheets, the rotation $R$ implements a logical permutation of the $6L$ logical qubits that preserves the $X/Z$ partition and the symplectic structure.
3.3 Transversal CNOT between two sheets
We turn $R$ into an entangling operation by using it not as a SWAP but as an inter-sheet qubit pairing. Pair each data qubit $q$ on sheet $T$ with the data qubit $\pi(q)$ on sheet $R(T)$, and apply a physical CNOT $\mathrm{CX}_{q \to \pi(q)}$ in parallel across all $L^3$ qubits of $T$. Call this transversal layer $U_R^{T \to R(T)}$.
Computational Result 2 (Transversal CNOT, verified at $L = 4, 6, 8$). At $L = 4, 6, 8$:

Stabilizer preservation. $U_R^{T \to R(T)}$ maps the joint stabilizer group of $T \cup R(T)$ to itself.

Logical action. On the logical level, $U_R^{T \to R(T)}$ implements $2L$ parallel logical CNOTs, with sheet $T$ as control and sheet $R(T)$ as target. The pairing of logicals between control and target sheets is the same permutation $\sigma$ from Computational Result 1.

Fault tolerance. A single-qubit fault on the control (resp. target) side of any individual CX propagates to at most one additional qubit on the target (resp. control) side. This is the standard transversal-CNOT spread, bounded by weight two.
The fault-tolerance argument is the same as for the standard transversal CNOT between two surface-code patches [2]: errors propagate weight-2 and remain local, so the post-CNOT code distance equals $\min(d_T, d_{R(T)}) = L$. Realizing this distance in decoding requires accounting for the control-to-target error spread: a correlated (feed-forward) decoder achieves it, whereas independent per-sheet matching does not (Section 5).
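The stabilizer-preservation and logical-action statements can be traced through the standard CNOT conjugation rules. The derivation below is a sketch that takes the stabilizer-preservation and logical-permutation parts of Computational Result 1 as given; the notation $X(A)$, $Z(B)$ for Pauli products on a set of qubits is introduced here for convenience.

```latex
\begin{align*}
&\mathrm{CX}_{q\to\pi(q)}:\quad
X_q \mapsto X_q X_{\pi(q)},\quad
Z_{\pi(q)} \mapsto Z_q Z_{\pi(q)},\quad
Z_q \mapsto Z_q,\quad
X_{\pi(q)} \mapsto X_{\pi(q)},\\[2pt]
&X(A) \mapsto X(A)\,X(\pi(A)) \quad (A \subseteq T), \qquad
Z(B) \mapsto Z(\pi^{-1}(B))\,Z(B) \quad (B \subseteq R(T)),\\[2pt]
&\bar{X}^{T}_i \mapsto \bar{X}^{T}_i\,\pi(\bar{X}^{T}_i), \qquad
\bar{Z}^{R(T)}_j \mapsto \pi^{-1}(\bar{Z}^{R(T)}_j)\,\bar{Z}^{R(T)}_j .
\end{align*}
```

If $A$ is the support of an $X$-stabilizer of $T$, then $\pi(A)$ is the support of an $X$-stabilizer of $R(T)$, so the image is a product of two stabilizers; the same holds for $Z$-stabilizers of $R(T)$ because $\pi$ is a bijection between the two stabilizer sets. $Z$-stabilizers of $T$ and $X$-stabilizers of $R(T)$ are unchanged. Since $\pi(\bar{X}^T_i)$ equals a single logical $\bar{X}^{R(T)}_{\sigma(i)}$ up to stabilizers, the last line is the action of a logical CNOT from logical $i$ of $T$ to logical $\sigma(i)$ of $R(T)$, for all $2L$ values of $i$ at once.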
3.4 Composition gives full Clifford universality
Proposition 2. Let $\mathcal{C}_{\mathrm{single}}$ denote the group of single-sheet Clifford operations generated by the local logical primitives of Section 4 (memory, $H$, $S$ on individual logicals; within-sheet $ZZ$- and $XX$-merge surgery). Let $U_R$ be the transversal-CNOT layer between any pair of sheets. Then $\langle \mathcal{C}_{\mathrm{single}}, U_R \rangle$ is the full Clifford group on the $6L$-logical system.

The proof is standard: within-sheet Cliffords generate all single-sheet Clifford operations; the transversal CNOT supplies inter-sheet entanglement; together they suffice to generate $\mathrm{Cliff}(6L)$ by the standard generation theorems [7]. We emphasize that the entire Clifford layer of the architecture is therefore covered: bulk single-logical gates by single-sheet primitives, bulk entangling by the transversal CNOT. Non-Clifford gates require an external resource, which is the role of the magic-state factory in Section 6.
4 Cross-Sheet Surgery Primitives
While the transversal CNOT of Section 3 is the architectural workhorse for unitary inter-sheet
entangling gates, the architecture also requires joint Pauli measurements between logicals for state
preparation, gate teleportation, and the magic-state injection interface (Section 6). We develop the
cross-sheet joint Pauli measurement primitives in this section.
4.1 FCC triangles
Lemma 1 (Triangle structure). Every triangle (3-cycle) in the FCC nearest-neighbor graph has one edge in each of the three triad sheets. At lattice size $L$, the FCC graph contains $4L^3$ triangles, and each FCC edge participates in exactly four triangles.

Proof. For three mutually adjacent FCC vertices $v_a, v_b, v_c$, the three edge-vectors must each be FCC neighbor vectors, and around the triangle they sum to zero. Direct case analysis on the 12 neighbor vectors shows that any three neighbor vectors summing to zero lie in distinct sheets. Counting: each of the $L^3/2$ vertices is in 24 triangles and each triangle has 3 vertices, so there are $24 \cdot (L^3/2)/3 = 4L^3$ triangles. Each edge appears in $4L^3 \cdot 3/(3L^3) = 4$ triangles.
4.2 Triangle operators and cross-sheet logicals
For an FCC triangle $T$ with edges $e_{xy} \in S_{xy}$, $e_{xz} \in S_{xz}$, $e_{yz} \in S_{yz}$, define

$$Z_T = Z_{e_{xy}} Z_{e_{xz}} Z_{e_{yz}}, \qquad X_T = X_{e_{xy}} X_{e_{xz}} X_{e_{yz}}. \tag{6}$$

Computational Result 3 (Cross-sheet reachability, verified at $L = 4, 6$). The space of triangle products that commute with all stabilizers has dimension $6L - 3$ modulo stabilizers, and every nonzero element of this space is supported on exactly two of the three sheets.

Verified at $L = 4$ (21 cross-sheet logicals out of $6L = 24$) and $L = 6$ (33 out of 36). The missing 3 logicals at each $L$ are global homological cycles that no finite triangle product can form; these are accessible via standard ancilla-logical routing techniques [2].
4.3 ZZ-merge primitive (toric)
The cross-sheet joint $Z$-measurement is implemented as follows. Place an ancilla qubit at each triangle on a ribbon connecting two cross-sheet logicals. In the ribbon, each ancilla couples (via CNOT) to its three triangle edges, one per sheet, then is measured in the $Z$ basis. The XOR of the measurement outcomes along the ribbon yields the joint $\bar{Z}_A \bar{Z}_B$ eigenvalue.

Proposition 3 ($K = 4$ ancilla connectivity). The ZZ-merge ribbon protocol can be scheduled so that each ancilla qubit couples to at most 4 neighbors during the merge: three triangle edges plus the syndrome extraction circuit’s ancilla-ancilla coupling (if used).

The proof is short: each triangle ancilla couples to the three edges of its triangle (one data atom per sheet) and to the syndrome-extraction ancilla-ancilla coupling, if present, in the standard 4-qubit syndrome circuit [4]. Both contributions are bounded by the triangle’s local geometry; the ribbon’s triangles are disjoint by construction, so each ancilla’s neighborhood is the same size whether the ribbon contains one triangle or many. This is the same $K = 4$ as the underlying surface code’s stabilizer extraction.
4.4 XX-merge (CSS dual)
By the CSS duality of the sheet code (vertex $Z$-stabilizers exchanged with octahedral $X$-stabilizers, modulo a lattice-dual relabeling), the same triangle-ribbon protocol with all roles dualized implements a cross-sheet joint $\bar{X}_A \bar{X}_B$ measurement. Threshold and overhead are equivalent (Section 5).
4.5 Planar boundary-aware surgery
For the planar variant (Section 2), the ribbon must terminate at the rough or smooth boundaries of the affected layers rather than wrapping around. We use a boundary-aware ribbon that anchors to the appropriate rough/smooth boundary edges of each layer of each sheet. At $L = 4, 6, 8$ this preserves the per-sheet distance $d = L - 1$; the 0.76(5)% figure previously quoted for the planar ZZ-merge rests on a surface-code memory proxy and has not yet been rebuilt on a boundary-aware real circuit (Section 5).
4.6 Role in the architecture
The surgery primitives are not the architecture’s primary inter-sheet entangling mechanism; that role belongs to the transversal CNOT of Section 3. Their architectural role is twofold:

1. Joint Pauli measurements. Used for state preparation (e.g., logical Bell-pair preparation between sheets, ancilla-state preparation for the magic-state injection interface) and for any algorithmic step that calls for a non-destructive joint Pauli measurement.

2. Magic-state injection. The factory-to-memory interface (Section 6) uses gate teleportation, which fundamentally rests on joint Pauli measurements. The ZZ-merge primitive is the natural way to perform these on the FCC sheet code side.

For bulk Clifford circuits, the transversal CNOT layer is preferred for its lower depth, absence of gauge-bit overhead, and trivial $d_{\mathrm{each}}$ scaling. The surgery primitives are reserved for cases where the algorithm specifically requires a joint measurement.
5 Threshold Simulations
We summarize the threshold results across all primitives needed by the architecture. Stim [5] is used for all circuit construction; PyMatching’s MWPM decoder [6] is used for decoding. All sweeps use circuit-level depolarizing noise with one-qubit and two-qubit error rate $p$, including idle, gate, reset, and measurement noise. Sample sizes are several $\times 10^3$ shots per point for $L = 4, 6$ (typically 3 to 8 $\times 10^3$; exact per-point counts accompany the shipped data) and of order $10^3$ for $L = 8$ corroboration sweeps.
5.1 Memory thresholds
Operating each sheet’s stabilizer extraction without any inter-sheet operation yields the baseline memory threshold. By Theorem 2, each sheet decomposes into $L$ independent 2D toric (or surface) codes, so the memory threshold should be inherited directly from the 2D code [3], modulo finite-size effects. We verify this on the full sheet code rather than on a single-layer stand-in. Building the complete weight-4 stabilizer-extraction circuit for one sheet directly from the lattice stabilizers, with a direction-ordered schedule and circuit-level depolarizing noise, the decoding graph separates into exactly $L$ connected components at $L = 4, 6, 8$, one per layer. The layer decomposition therefore survives a concrete syndrome-extraction schedule and circuit-level noise, not merely the stabilizer algebra; this is the dynamical counterpart of the static check in Theorem 2. The resulting threshold, measured over all $2L$ logicals of the sheet, is $p^{\mathrm{mem,toric}}_{\mathrm{th}} \approx 1.1\%$ (Figure 2a), consistent with the standard 2D toric threshold under the same noise model. The planar variant is slightly lower, $p^{\mathrm{mem,planar}}_{\mathrm{th}} \approx 0.9\%$, owing to boundary effects at small $L$.
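The static counterpart of the component count is easy to reproduce. The sketch below checks only the Tanner graph of the stabilizer code, not the circuit-level decoding graph with a syndrome-extraction schedule described above; it uses an assumed lattice convention (even-parity vertices, odd-parity voids) and hypothetical helper names.

```python
from itertools import product

def sxy_supports(L):
    """Edge supports of all vertex and void stabilizers of S_xy on the periodic lattice."""
    def shift(v, d):
        return tuple((a + b) % L for a, b in zip(v, d))
    signs = [(dx, dy) for dx in (1, -1) for dy in (1, -1)]
    supports = []
    for s in product(range(L), repeat=3):
        if sum(s) % 2 == 0:      # vertex star
            supports.append([frozenset((s, shift(s, (dx, dy, 0)))) for dx, dy in signs])
        else:                    # void plaquette (voids assumed at odd-parity sites)
            supports.append([frozenset((shift(s, (dx, 0, 0)), shift(s, (0, dy, 0))))
                             for dx, dy in signs])
    return supports

def count_components(supports):
    """Number of connected components of qubits linked by shared stabilizers (union-find)."""
    parent = {}
    def find(q):
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]   # path halving
            q = parent[q]
        return q
    for support in supports:
        root = find(support[0])
        for q in support[1:]:
            parent[find(q)] = root
    return len({find(q) for q in list(parent)})

for L in (4, 6):
    print(L, count_components(sxy_supports(L)))
```

The number of components equals $L$ in both cases, one per $z$-layer, matching the layer decomposition of Theorem 2.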
5.2 Cross-sheet surgery thresholds
Table 1 lists the circuit-level depolarizing thresholds for the ZZ-merge and XX-merge primitives.

Primitive   Variant                   Threshold          FSS-extrapolated
ZZ-merge    toric                     1.2%               d = L at L = 4, 6, 8
ZZ-merge    planar (boundary-aware)   0.76% (proxy)
XX-merge    toric                     1.2% (CSS dual)

Table 1: Circuit-level depolarizing thresholds for the cross-sheet surgery primitives. The toric ZZ-merge is the real triangle-ribbon circuit (fcc_surgery_circuit.py; its joint-parity circuit distance is verified to equal $L$ at $L = 4, 6, 8$ by Stim’s minimum-weight graphlike-error search), decoded with BP+OSD because the weight-3 triangle operators produce hyperedge (three-symptom) errors that MWPM cannot decompose. The planar entry is the earlier surface-code memory proxy and has not been rebuilt on a boundary-aware real circuit; the XX-merge follows by CSS duality and was not separately simulated. Sample sizes are 8000/5000/3000 shots per point at $L = 4/6/8$. Full sweep data and scripts are in the public code repository (see Code and Data Availability).
On the real triangle-ribbon circuit, decoded with BP+OSD, the toric ZZ-merge has a threshold of approximately 1.2%. The joint-parity logical error decreases with $L$ at every point from $p = 0.6\%$ to $1.1\%$ (the ordering is $L{=}8 < L{=}6 < L{=}4$ throughout), and the $L = 6$ and $L = 8$ curves meet at $p \approx 1.2\%$ (8000/5000/3000 shots at $L = 4/6/8$). This is consistent with the approximately 1.1% memory threshold, indicating that the merge does not degrade the code’s error tolerance. An earlier harness estimated this primitive with a surface-code memory proxy, which returned a compatible 1.07%; the value quoted here instead rests on the actual merge circuit. The planar variant’s 0.76% still rests on that proxy and would need its own boundary-aware real circuit to confirm.
5.3 Transversal-CNOT thresholds
For the transversal-CNOT layer between two sheets we run joint two-sheet experiments on the full sheet code at $L = 4, 6, 8$, with control sheet $T$ and target sheet $R(T)$ paired by the actual rotation permutation $\pi$ of Computational Result 1. Each shot consists of $L$ rounds of stabilizer extraction on both sheets, one transversal-CNOT layer $U_R^{T \to R(T)}$, and a further $L$ rounds, with circuit-level depolarizing noise throughout; we measure the per-sheet logical error over all $2L$ logicals of each sheet. The circuits, the $\pi$ pairing, and both decoders are constructed directly from the lattice stabilizers (see Code and Data Availability), so the thresholds quoted here are for the code itself, not a per-layer reduction. As an independent, decoder-free check, Stim’s minimum-weight graphlike-error search confirms that both the memory and the transversal-CNOT circuits have circuit distance exactly $L$ at $L = 4, 6, 8$; the transversal layer preserves the full code distance, so any threshold degradation observed under a given decoder is a property of that decoder rather than of the circuit.
[Figure 2: three panels of logical error rate (log scale, 10⁻³ to 10⁰) versus physical error rate p (%), each with curves for L = 4, 6, 8. (a) Memory, p from 0.6% to 1.4%, with p_th ≈ 1.1% marked. (b) Transversal CNOT, naive MWPM, p from 0.2% to 0.8%, annotated "worsens with L". (c) Transversal CNOT, feed-forward, p from 0.2% to 0.8%, annotated "improves with L", with a memory (L = 6) reference curve.]
Figure 2: Threshold of the transversal CNOT on the full FCC sheet code, L ∈ {4, 6, 8}, circuit-level depolarizing noise. (a) Baseline memory threshold per sheet, p_th ≈ 1.1%; the decoding graph separates into L independent layers (Section 5). (b) Target-sheet logical error after the transversal CNOT under naive per-sheet MWPM: the curves worsen with L, i.e. the apparent threshold has collapsed. (c) The same experiment decoded with the feed-forward decoder of the text: the target-sheet error now decreases with L and tracks the memory reference (dotted), recovering a threshold near the memory value. Error bars are binomial (1σ); L = 8 points in (b,c) use fewer shots and are corroborating.
Naive decoding collapses on the target sheet. Decoding the two sheets with independent MWPM produces a sharply asymmetric result (Figure 2b). The control-sheet logical error is unchanged from memory and improves with L: the conjugation X ↦ X ⊗ X sends control errors onto the target, so the control logical only ever sees its own errors and decodes at the 1.1% memory threshold. The target-sheet logical error, by contrast, worsens with L across the sweep, with an apparent threshold near 0.3%. The cause is the transversal CNOT itself: it copies the control sheet's residual X-field onto the paired target qubits, so the target decoder is presented with its own errors superimposed on an inherited field it cannot account for. This is the standard error spread of a transversal CNOT (Computational Result 2, weight-2), and the resulting collapse is decoder-induced rather than a property of the code; the control side establishes that the post-CNOT code distance is intact.
Feed-forward decoding restores the threshold. The inherited field is correctable once the control errors are known, which motivates a feed-forward (correlated) decoder. Decode the control sheet first; read off its inferred X-pattern at the instant of the CNOT; propagate that pattern through the qubit-aligned pairing π onto the target qubits; remove its syndrome footprint from the target detectors; and decode the target residual, folding the inherited contribution back into the target logical. This is the matching-decoder analogue of correlated decoding for transversal gates [2]. We verify the bookkeeping exactly in a controlled run with the target syndrome-extraction noise disabled: with only the inherited field present, the predicted footprint reproduces the observed target detectors in every shot (0 mismatches in 1500 shots at L = 4, 6).
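The footprint-removal bookkeeping can be illustrated with a small toy model. This is a sketch under strong simplifying assumptions, not the decoder used for the results above: it takes a single round of perfect syndrome readout on a ring of n qubits with weight-2 Z-type checks, a made-up permutation standing in for π, and exact knowledge of the control X-pattern. All function and variable names are hypothetical.

```python
import random

def syndrome(checks, x_errors):
    """Parity of an X-error pattern on each Z-type check (tuple of qubit indices)."""
    return [sum(x_errors[q] for q in check) % 2 for check in checks]

def propagate(x_control, pairing):
    """Copy the control X-pattern onto the target through the qubit pairing."""
    inherited = [0] * len(x_control)
    for q, bit in enumerate(x_control):
        inherited[pairing[q]] ^= bit
    return inherited

def remove_footprint(observed, checks, x_control_estimate, pairing):
    """Subtract (XOR) the predicted syndrome of the inherited field."""
    inherited = propagate(x_control_estimate, pairing)
    footprint = syndrome(checks, inherited)
    residual = [a ^ b for a, b in zip(observed, footprint)]
    # 'inherited' is returned so it can be folded back into the target correction.
    return residual, inherited

n = 8
checks = [(i, (i + 1) % n) for i in range(n)]      # toy ring code, not the FCC checks
pairing = [(3 * q + 1) % n for q in range(n)]      # placeholder permutation, not the real pi

rng = random.Random(0)
mismatches_inherited_only = 0
mismatches_with_own_errors = 0
for _ in range(200):
    x_control = [rng.randint(0, 1) for _ in range(n)]
    inherited = propagate(x_control, pairing)

    # Case 1: target noise disabled, only the inherited field is present.
    observed = syndrome(checks, inherited)
    residual, _ = remove_footprint(observed, checks, x_control, pairing)
    mismatches_inherited_only += any(residual)

    # Case 2: target also has its own errors; the residual should be their syndrome.
    x_own = [rng.randint(0, 1) for _ in range(n)]
    total = [a ^ b for a, b in zip(x_own, inherited)]
    residual, _ = remove_footprint(syndrome(checks, total), checks, x_control, pairing)
    mismatches_with_own_errors += residual != syndrome(checks, x_own)

print(mismatches_inherited_only, mismatches_with_own_errors)
```

Because the syndrome map is linear over GF(2), both counters stay at zero whenever the control estimate is exact; in a real run the estimate comes from decoding the control sheet, and its errors are what separate feed-forward from the idealized variant.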
Under feed-forward decoding the target-sheet error decreases monotonically with L (Figure 2c) and tracks the memory reference; the feed-forward curves remain ordered in L throughout the swept range, placing the threshold above 0.8%. An idealized variant that removes the true control errors (extracted from the simulator's error frame) recovers the full 1.1% memory threshold: the target logical error then equals the control/memory error to within statistics at every point (for example 0.016 versus 0.013 at L = 6, p = 0.003), confirming that the residual gap between feed-forward and memory is decoder-estimate quality, not a code property. The numerical results support three claims.
(R1) Threshold preservation under correlated decoding. With feed-forward decoding the transversal CNOT preserves the 1.1% memory threshold: both sheets improve with L, and the idealized decoder reproduces the memory threshold to within statistics. Under naive independent MWPM the target threshold collapses; threshold preservation is therefore a statement about the decoder, not the code.
(R2) Bounded spread. A single fault spreads to weight at most two across the CNOT (Computational Result 2); the controlled-noise test confirms that the inherited field is exactly the π-image of the control X-pattern, with no additional growth.
(R3) Distance scaling. At p = 0.004 (below threshold), the feed-forward target logical error scales 0.164 → 0.069 → 0.027 at L = 4, 6, 8, a suppression of a factor ≈ 2.5 per increment ΔL = 2; under naive decoding the same quantity moves the wrong way (0.299 → 0.361 → 0.481). The control sheet scales as memory throughout.
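The suppression factor quoted in (R3) follows directly from the listed error rates. The short check below uses only the numbers stated above; the helper name is illustrative.

```python
import math

# Target-sheet logical error at p = 0.004, as quoted in (R3).
feed_forward = {4: 0.164, 6: 0.069, 8: 0.027}
naive = {4: 0.299, 6: 0.361, 8: 0.481}

def step_ratios(errors):
    """Ratio error(L) / error(L + 2) for consecutive sizes; > 1 means suppression."""
    sizes = sorted(errors)
    return [errors[a] / errors[b] for a, b in zip(sizes, sizes[1:])]

ff_ratios = step_ratios(feed_forward)
naive_ratios = step_ratios(naive)
# Geometric mean of the two feed-forward steps.
ff_mean = math.sqrt(feed_forward[4] / feed_forward[8])

print([round(r, 2) for r in ff_ratios], round(ff_mean, 2))
print([round(r, 2) for r in naive_ratios])
```

Both feed-forward steps give ratios above 2 (geometric mean close to 2.5), while both naive-decoding ratios are below 1, i.e. the error grows with L.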
Sample-size note for L = 8. The L = 8 points use fewer shots than the L = 4, 6 sweeps (of order 10³ versus several × 10³) and serve as a corroborating three-way scaling check; the qualitative conclusions (R1)–(R3) are established at L = 4, 6 and reinforced, not carried, by L = 8.
5.4 Summary of threshold landscape
Table 2 consolidates the thresholds across all primitives needed by the architecture.
Operation | Threshold | Notes
Memory (toric) | 1.1% | Inherited from 2D toric code
Memory (planar) | 0.9% | Small-L boundary effects
ZZ-merge (toric) | 1.2% | FT, real ribbon circuit, BP+OSD, d = L
ZZ-merge (planar) | 0.76% (proxy) | Boundary-aware variant, not yet rebuilt
XX-merge (toric) | 1.2% | CSS dual of ZZ-merge
Transversal CNOT | 1.1% | Inherits memory threshold under feed-forward; naive MWPM collapses
Table 2: Threshold values for all primitives in the architecture. All thresholds are circuit-level depolarizing. Memory is MWPM-decoded; the transversal CNOT uses the correlated feed-forward decoder of Section 5 (naive per-sheet MWPM collapses on the target sheet); cross-sheet surgery uses BP+OSD, since its weight-3 ribbon operators create hyperedges that MWPM cannot decompose. The transversal-CNOT and toric ZZ-merge circuits are independently verified to have circuit distance d = L; the within-sheet local Cliffords are inherited from the standard 2D toric/surface code.
The key architectural takeaway is that all primitives operate at thresholds near 1% under circuit-level depolarizing noise, which is comfortably within the operating regime of current atom-array hardware (gate errors at the 10⁻³ level, with continuing improvement). For the transversal CNOT this holds under correlated (feed-forward) decoding; naive per-sheet matching collapses on the target sheet, so the parallel-teleportation interface should be paired with the feed-forward decoder of Section 5.
6 The Hybrid Architecture: FCC Memory + Color-Code Factory
We now assemble the full architectural proposal. The structure follows the established memory + factory division [4, 13], but with the memory-factory interface elevated to a first-class architectural element rather than treated as an incidental connection between two zones.
6.1 The interface throughput problem
In any universal architecture built from a Clifford memory paired with a magic-state factory, every T-gate insertion in the algorithm requires a magic-state injection at the interface. Each injection reduces, via gate teleportation, to a logical CNOT between an ancilla logical (holding the magic state) and a target logical (in memory). When the algorithm calls for N parallel T-gates (common in unrolled phase-estimation circuits, parallel quantum chemistry primitives, and lattice-Hamiltonian simulation by Trotterization), the interface must execute N logical CNOTs in parallel.
Existing 3D-code architectures throttle at this interface. Surface-code interfaces execute one teleportation per lattice-surgery cycle (d rounds), so N parallel injections take N · d rounds unless multiple memory-factory boundary regions are provisioned in parallel. 3D color-code memory architectures use code switching, which itself takes several rounds. Bivariate-bicycle code architectures use the Tour-de-Gross protocol [9] with multi-block routing. In all cases, the interface throughput is O(1) logical CNOTs per physical-gate layer regardless of memory size.
6.2 Architectural overview
The architecture comprises three elements occupying distinct regions of the atom array:
Memory zone. The FCC sheet code on three triad sheets, at a chosen code distance L. This zone holds the active logical qubits during algorithm execution. Operations:
• Single-sheet local Clifford operations (memory cycles, H, S, and single-logical Pauli on 2L logicals per sheet) using standard 2D toric/surface-code primitives applied layer-wise.
• Within-sheet cross-patch surgery for joint Pauli measurements between logicals on the same sheet.
• Cross-sheet surgery via the triangle-ribbon primitive (Section 4) for arbitrary inter-sheet joint Pauli measurements.
Factory zone. A bank of magic-state distillation factories using small 3D color-code blocks. Operations:
• Encoded |T⟩ state preparation via transversal T on [[15, 1, 3]] tetrahedral color codes.
• 15-to-1 magic-state distillation [13]; higher-order distillation as needed [14].
• Parallel operation: N_fac factory blocks produce N_fac magic states per distillation cycle.
Parallel-teleportation interface. This is the architecturally new element. Magic states are injected into the memory zone via 2L-parallel gate teleportation in a single physical-gate layer, implemented by the FCC threefold-rotation transversal CNOT of Section 3.
The interface uses the standard T-gate teleportation gadget [7], executed in parallel across 2L ancilla–target pairs. For one magic state into one target, the gadget is:
1. Prepare an encoded |T⟩ = (|0⟩ + e^{iπ/4}|1⟩)/√2 in an ancilla logical (color-code block, exiting the factory).
2. CNOT from target to ancilla: |ψ⟩_tgt |T⟩_anc → CX_{tgt→anc} ···.
3. Measure the ancilla in the Z (computational) basis.
4. Apply a conditional S correction on the target depending on the measurement outcome.
The parallel version simply executes 2L copies of this gadget concurrently. The transversal CNOT layer of Section 3 performs step (2) for all 2L pairs in a single physical-gate layer. Steps (1), (3), and (4) parallelize trivially: they are within-block operations on the ancilla side (preparation, single-qubit measurement) or single-logical S corrections on the target side.
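The single-pair gadget can be checked numerically on unencoded qubits. The sketch below is a two-qubit state-vector toy, not an encoded simulation. It uses the textbook form of the gadget, with the CNOT controlled on the target qubit, the ancilla read out in the computational (Z) basis, and an S correction applied on outcome 1. The function names are illustrative.

```python
import cmath
import math

W = cmath.exp(1j * math.pi / 4)  # e^{i pi / 4}, the T-gate phase

def t_gadget(alpha, beta, outcome):
    """Target state after the gadget, given the ancilla Z-basis outcome (0 or 1)."""
    psi = [alpha, beta]                             # target |psi> = alpha|0> + beta|1>
    magic = [1 / math.sqrt(2), W / math.sqrt(2)]    # ancilla |T>
    # Joint amplitudes amp[t][a] for target bit t and ancilla bit a.
    amp = [[psi[t] * magic[a] for a in range(2)] for t in range(2)]
    # CNOT controlled on the target: flip the ancilla bit when t == 1.
    amp[1][0], amp[1][1] = amp[1][1], amp[1][0]
    # Project the ancilla onto |outcome> and renormalize the target state.
    out = [amp[0][outcome], amp[1][outcome]]
    norm = math.sqrt(sum(abs(c) ** 2 for c in out))
    out = [c / norm for c in out]
    # Conditional S correction (phase i on |1>) when the outcome is 1.
    if outcome == 1:
        out[1] *= 1j
    return out

def fidelity_with_t(alpha, beta, state):
    """|<T psi | state>|, insensitive to global phase."""
    expected = [alpha, beta * W]
    return abs(sum(e.conjugate() * s for e, s in zip(expected, state)))

alpha, beta = 0.6, 0.8j
f0 = fidelity_with_t(alpha, beta, t_gadget(alpha, beta, 0))
f1 = fidelity_with_t(alpha, beta, t_gadget(alpha, beta, 1))
print(round(f0, 6), round(f1, 6))
```

Both outcomes return T|ψ⟩ up to a global phase, so each of the 2L parallel copies needs only its own one-bit measurement record to choose the correction.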
The full interface operation is:
1. Up to 2L factory blocks each prepare an encoded |T⟩ in parallel; tweezer rearrangement places them in the ancilla positions of one FCC memory sheet.
2. One transversal-CNOT layer (paired by R) executes step (2) of the gadget for all 2L pairs in one physical-gate layer.
3. 2L ancilla logical measurements in parallel (one Z-basis measurement per ancilla logical, in turn implemented by transversal single-qubit Z-basis measurements on the data atoms of the ancilla logical).
4. Per-target conditional S corrections complete the 2L gate teleportations.
The interface throughput is 2L logical CNOTs per physical-gate layer, which is linear in code distance, compared to the O(1) throughput of surgery-based interfaces. At L = 4 this is 8 parallel injections per layer; at L = 8, 16 parallel injections.
We note that the surgery-based ZZ-merge primitive of Section 4 provides an alternative route to the same gate teleportation: instead of a transversal CNOT in step (2), one performs a joint Z̄_tgt Z̄_anc measurement via the triangle ribbon, with the ancilla then read out in the X basis. This implements the same teleportation but at O(1) throughput (one teleportation per ribbon merge cycle). The parallel-teleportation interface is therefore strictly more parallel than the surgery alternative, with both available in the same architecture.
6.3 When the parallel-teleportation primitive applies and when it does not
The 2L-parallel transversal CNOT is best understood as a structured-parallel CNOT primitive, not a general-purpose Clifford gate. Its 2L logical CNOT pairings are fixed by the FCC rotation σ (a symplectic permutation determined by the lattice, Computational Result 1); the user cannot choose which pairs of logicals are involved.
For magic-state injection specifically, this is not a constraint. Magic states are prepared states whose identity is determined at the moment of preparation, so the compiler can prepare magic state i in whichever ancilla slot maps to the desired target under σ. The pairing constraint becomes a placement constraint, which is satisfiable for any subset of target logicals up to the 2L-per-layer limit.
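The placement rule amounts to inverting the pairing. The sketch below assumes σ is given as a list mapping each ancilla slot to the target logical it is paired with; the cyclic shift used here is a placeholder for illustration, not the actual FCC permutation, and the function name is hypothetical.

```python
def ancilla_slots(sigma, targets):
    """For each desired target logical, return the ancilla slot that sigma pairs with it."""
    if len(set(targets)) != len(targets):
        raise ValueError("each target can receive at most one magic state per layer")
    if len(targets) > len(sigma):
        raise ValueError("more targets than the 2L-per-layer limit")
    inverse = {tgt: slot for slot, tgt in enumerate(sigma)}  # sigma^{-1}
    return {tgt: inverse[tgt] for tgt in targets}

L = 4
# Placeholder pairing on 2L = 8 logicals (NOT the real rotation-induced permutation).
sigma = [(slot + 2) % (2 * L) for slot in range(2 * L)]

slots = ancilla_slots(sigma, targets=[0, 3, 5])
print(slots)
```

Because σ is a permutation, the inverse always exists and distinct targets receive distinct slots, which is why any subset of up to 2L targets can be served in one layer.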
For arbitrary single-CNOT Clifford patterns, by contrast, the transversal CNOT is over-provisioned: it applies CNOTs to all 2L paired logicals, not just the one or two the algorithm wants. These extra CNOTs either need to be absorbed (applied to control logicals in |0⟩, where they have no effect) or undone by subsequent operations. For general-purpose Clifford circuits with arbitrary CNOT patterns, the surgery primitive (Section 4) is the appropriate tool.
The cleanest statement of the architecture's claim is therefore:
The FCC sheet code provides the highest published parallel-teleportation throughput among 3D codes suitable for atom-array hardware, at 2L logical CNOTs per physical-gate layer. The throughput is realized at the magic-state injection interface, where structured parallel CNOTs are exactly the required operation. For general-purpose Clifford gates between arbitrary logical pairs, the architecture uses cross-sheet surgery (footprint 3L/2 atoms), still competitive with surface-code surgery but without the L-scaling advantage of the parallel-teleportation case.
6.4 What each layer of the architecture contributes
Before comparing against competing codes, we make explicit what each architectural ingredient
adds. Table 3 reads top to bottom as a buildup: the minimal storage-only configuration through to
the full universal-computation architecture.
Two readings of the table are useful. Read row by row, it answers the question “what does
each architectural ingredient buy you?” Read by comparison, it answers the question “at which
row does the FCC structure start to matter beyond just being a packing of three independent toric
codes?” The answer to the second question is row (d): everything above row (d) is achievable by
three independent stacked toric codes on disjoint atom regions. Rows (d) and (e) are where the
FCC structure earns its place in the architecture, by supplying cross-sheet operations that do not
exist for independent stacks. Row (f) brings in an external factory, which is needed for universality
regardless of the memory code choice.
6.5 Cross-architecture survey
Table 4 situates the FCC sheet code architecture in the landscape of 3D and atom-array-suitable QEC codes. We compare on metrics that are decisive for atom-array deployment at near-term scale (d ≤ 8, 10²–10³ atoms).
The table establishes three claims. First, in the surgery-only configuration (no transversal CNOT), the FCC sheet code is competitive with the surface code on threshold and rate, and the triangle-ribbon merge footprint (3L/2 atoms) is approximately 4/3× smaller than the surface-code routing-channel equivalent. Second, the transversal-CNOT layer (Section 3) supplies an interface primitive with L-scaling throughput, the only L-scaling cross-block entangling primitive in the survey; it inherits the memory threshold under feed-forward decoding (Section 5). Third, the full architecture pairs the FCC memory layer with the 3D color-code factory, which is the only mature route to transversal T.
We note explicitly what the table does not show. At higher distance (d ≥ 12), bivariate-bicycle codes pull ahead on rate, and remain the more efficient bulk-memory choice once the system grows beyond the near-term atom-array regime. The FCC architecture's advantage is specifically for the d ≤ 8 regime where atom-array systems will operate over the next several years.
Configuration | Logical qubits (L=4) | Operations available | Gate set generated | Universal? | Section
(a) One sheet, memory only | 8 | memory cycles | none beyond Pauli frame | no | §2
(b) One sheet + within-sheet local Cliffords | 8 | + H, S, single-logical Pauli | single-sheet Cliffords only (no entangling) | no | §2
(c) Three sheets + within-sheet Cliffords (no cross-sheet ops) | 24 | three independent copies of (b) | three disjoint Clifford groups, no entangling between sheets | no | §2
(d) (c) + cross-sheet ZZ/XX-merge surgery | 24 | + inter-sheet joint Pauli measurements | full Clifford group on 24 logicals via surgery | no (no T) | §4
(e) (d) + transversal CNOT via R | 24 | + 2L-parallel cross-sheet CNOT in one layer | same Clifford group, much lower depth | no (no T) | §3
(f) (e) + 3D color-code T-factory | 24 (memory) | + T-state injection via gate teleportation | full Clifford + T | yes | §6
Table 3: Capability ladder for the FCC sheet code architecture, with logical-qubit counts at L = 4. Each row adds one architectural ingredient. Rows (a)–(c) are inherited directly from the layer decomposition (Theorem 2) and do not exploit the FCC structure beyond geometric packing. Rows (d) and (e) are the FCC-specific contributions: cross-sheet triangle surgery and the threefold-rotation transversal CNOT. Row (f) completes universal computation by adding the magic-state factory. At each row, the new operations either expand the available gate set or reduce the depth of operations already available; the architecture's value comes from the cumulative stack.
6.6 Why this division of labor is more than “two codes on one chip”
Several design choices distinguish this architecture from a trivial composition of two codes. First, both codes are 3D, so they share the same hardware requirements (Rydberg connectivity, tweezer rearrangement) and can occupy the same atom-array footprint at different times via tweezer reconfiguration. Second, the parallel-teleportation interface (Section 6.2) is implemented by an FCC-native primitive, the transversal CNOT via R, rather than by general-purpose surgery or code switching. This is what converts the interface from an O(1)-throughput bottleneck into an O(L)-throughput pipeline. Third, both codes are MWPM- or BP-OSD-decodable in well-understood circuits, so the architecture does not require novel decoder development.
6.7 Algorithmic throughput model
For an algorithm with N_T total T-gates and per-layer T-density ρ_T (fraction of memory logicals receiving a T-gate per Clifford layer), the throughput is bottlenecked by the slower of the memory-side interface and the factory output. In our architecture:
Architecture | Rate k/n (data) at d=4 | Threshold (circuit) | Decoder | Parallel teleport per layer | Transversal T? | Atom-array demo'd?
Rotated surface code (independent patches) | 1/d² ≈ 6% per patch | 1% | MWPM | O(1) (per surgery channel) | no | yes (Bluvstein 2025)
Stacked 2D toric (3 independent sheets) | 2/L² per sheet | 1% | MWPM, per layer | O(1) (per routing channel) | no | no (concept)
Bivariate-bicycle, gross [[144,12,12]] | 12/144 ≈ 8% | 0.7–0.8% | BP-OSD | O(1) (Tour de Gross) | no | no (theory)
3D color code, [[15,1,3]] copies | 1/15 ≈ 7% | 1.5% (code capacity) | restriction / BP-OSD | O(1) (code switching) | yes | yes (Bluvstein 2025, as factory)
3D toric | 3/L³ (fixed k = 3) | 1% | MWPM | O(1) (surgery) | no | no
FCC sheet, surgery-only (rows c+d) | 2/L² per sheet | 1.2% | BP+OSD | O(1) (triangle ribbon) | no | no (proposed)
FCC sheet, + transversal CNOT | same | same | MWPM + FF | 2L (transversal layer) | no | no (proposed)
FCC + color-code factory (full architecture) | memory: same; factory: ≈ 7% | memory: 1%; factory: 1.5% | MWPM + FF + BP-OSD | 2L | yes (factory) | no (proposed)
Table 4: Cross-architecture survey at L = 4 (where applicable). Top block (first five rows): competing codes/architectures. Bottom block (last three rows): this paper's architecture at three levels of inclusion. The "parallel teleport per layer" column counts the number of logical CNOTs at the memory-factory interface that can execute in a single physical-gate layer, the bottleneck operation for parallel magic-state injection. All competing architectures yield O(1) per layer because their cross-block CNOTs are sequential (surgery, code switching, or constant-overhead routing). The FCC sheet code with the transversal-CNOT layer yields 2L parallel injections per layer, the only L-scaling primitive in the table. At L = 4 this is 8× the interface throughput of the surface-code baseline; at L = 8, 16×. Decoder abbreviation: FF = feed-forward (correlated) decoder of Section 5, required for the transversal CNOT since naive per-sheet matching collapses on the target sheet.
Memory-side interface throughput. 2L parallel teleportations per physical-gate layer (the transversal CNOT primitive of Section 3). At L = 4, this is 8 injections per layer; at L = 8, 16 injections per layer. This is the FCC architecture's signature contribution.
Factory output throughput. Each color-code factory block produces approximately one |T⟩ per d_factory syndrome cycles. With N_fac parallel factory blocks, total throughput is N_fac/d_factory states per syndrome cycle. For early-fault-tolerant atom-array systems (≲ 10³ atoms), small factories at d_factory = 3–5 allow N_fac ≈ 4–8 parallel blocks [12].
The two throughputs become matched when N_fac/d_factory ≈ 2L states per syndrome cycle (since one transversal-CNOT layer takes one physical-gate layer, which is a fraction of a syndrome cycle). For L = 4, d_factory = 3, the matched factory bank has N_fac ≈ 24 blocks, occupying ≈ 24 × 30 = 720 atoms. Combined with the 384-atom memory zone at L = 4, the matched-throughput system fits within ≈ 1100 atoms, comparable to the Atom Computing 1180-atom system or near-term scale-ups of QuEra Gemini.
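The matched-throughput arithmetic can be reproduced in a few lines. The sketch below uses only quantities stated in the text (2L injections per layer, about 30 atoms per [[15,1,3]] factory block including ancillas, and a 6L³-atom memory zone); the function name and return layout are illustrative.

```python
def matched_factory_bank(L, d_factory, atoms_per_block=30):
    """Factory bank size at which N_fac / d_factory equals 2L states per syndrome cycle."""
    n_fac = 2 * L * d_factory                 # blocks needed to feed 2L injections per cycle
    factory_atoms = n_fac * atoms_per_block   # ~30 atoms per small color-code block
    memory_atoms = 6 * L ** 3                 # three sheets, data + ancilla
    return n_fac, factory_atoms, memory_atoms, factory_atoms + memory_atoms

n_fac, factory_atoms, memory_atoms, total_atoms = matched_factory_bank(L=4, d_factory=3)
print(n_fac, factory_atoms, memory_atoms, total_atoms)
```

For L = 4 and d_factory = 3 this gives 24 blocks, 720 factory atoms, and 1104 atoms in total, matching the ≈ 1100 figure above. Since the factory bank grows as L · d_factory and the memory zone as L³, the memory zone dominates at larger L for fixed d_factory.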
For algorithms with lower T-density, the factory bank can be smaller and the interface is over-provisioned (the FCC architecture pays no penalty for unused interface capacity). For algorithms with higher T-density approaching ρ_T ≈ 1 (e.g., dense phase-estimation circuits), the FCC architecture's O(L) interface throughput is the only way to avoid serializing magic-state injection; competing architectures with O(1) interfaces become the bottleneck regardless of factory provisioning.
7 Resource Estimates vs. Surface Code + 15-to-1
We compare the proposed architecture against the standard surface-code-plus-15-to-1 baseline at
matched logical-qubit count k and matched distance d.
7.1 Atom-count comparison
Counting both data and ancilla atoms, the resource demand for k logical qubits at distance d is:
• Surface code baseline. Each logical occupies an independent rotated surface-code patch of 2d² − 1 ≈ 2d² atoms (d² data + d² − 1 ancilla). For k logicals: ≈ k · 2d² atoms.
• FCC sheet code (three sheets, toric). At L = d, the three-sheet system encodes k = 6L logicals in ≈ 6L³ atoms (data + ancilla). For arbitrary k ≥ 6L: ⌈k/(6L)⌉ FCC blocks, each ≈ 6L³ atoms.
Table 5 compares the two at matched distance and logical count.
d | k | Surface (atoms) | FCC (atoms) | Surface/FCC ratio | Notes
4 | 24 | 24 · 32 = 768 | 384 | 2.0× | 1 FCC block
4 | 48 | 48 · 32 = 1536 | 768 | 2.0× | 2 FCC blocks
6 | 36 | 36 · 72 = 2592 | 1296 | 2.0× | 1 FCC block
8 | 48 | 48 · 128 = 6144 | 3072 | 2.0× | 1 FCC block
Table 5: Memory-zone atom counts (data + ancilla) for the surface-code baseline versus the FCC sheet code. Both use the rotated variant: surface code at ≈ 2d² atoms per logical (data + ancilla), FCC at ≈ 6L³ atoms total for 6L logicals (data + ancilla, three sheets). The FCC sheet code consistently uses approximately half the atoms per logical. Counts here include ancillas; the data-only counts are half of these values (e.g., FCC at L = 4 has 192 data atoms in 384 total).
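The rows of Table 5 follow from the two counting rules above. The sketch below recomputes them using the approximations stated in the text (2d² atoms per surface-code logical, 6L³ atoms per FCC block holding 6L logicals); the function names are illustrative.

```python
import math

def surface_atoms(k, d):
    """Rotated surface code: about 2 d^2 atoms (data + ancilla) per logical."""
    return k * 2 * d * d

def fcc_atoms(k, L):
    """FCC sheet code: ceil(k / 6L) blocks of about 6 L^3 atoms each."""
    blocks = math.ceil(k / (6 * L))
    return blocks * 6 * L ** 3

rows = [(4, 24), (4, 48), (6, 36), (8, 48)]   # (d, k) pairs from Table 5
table = [(d, k, surface_atoms(k, d), fcc_atoms(k, d)) for d, k in rows]
for d, k, s, f in table:
    print(d, k, s, f, round(s / f, 1))
```

The 2.0× ratio holds exactly whenever k is a multiple of 6L; for other k the ceiling rounds the FCC count up to a whole block, so the ratio drops below 2.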
The 2× memory-zone savings at all distances reflects the underlying 2 : 1 rate advantage of the 2D toric code over the rotated surface code, which the FCC sheet code inherits via its per-layer decomposition. At the small distances relevant to near-term atom-array experiments (d ≤ 6), this savings represents a meaningful reduction in atom-count requirements.
7.2 Clifford-gate cost
Table 6 compares the depth cost of logical Clifford-gate primitives.
For algorithms with high Clifford density, the surface code spends
d
rounds per non-adjacent
CNOT (lattice surgery) or requires careful patch layout for transversal CNOTs to be feasible. The
FCC sheet code’s transversal CNOT layer executes a full block of 2
L
inter-sheet CNOTs in one
physical-gate layer, with no merge window. In algorithms whose CNOTs are not pre-arranged
for adjacency (most quantum simulation circuits, most QFT-like structures), this is a meaningful
constant-factor speedup.
Operation | Surface code baseline | FCC sheet code
Logical CNOT between two adjacent patches | 1 transversal layer | (between two surface patches: not applicable)
Logical CNOT, cross-sheet | - | 1 transversal layer (via rotation)
Logical CNOT, non-adjacent patches | surgery: d-round merge | -
2L-parallel CNOT layer | - | 1 transversal layer (rotation), 2L logical CNOTs
Table 6: Comparison of logical Clifford-gate primitives. The transversal CNOT via FCC rotation gives 2L logical CNOTs in one physical layer, versus the surface code's adjacency requirement for transversal CNOTs or its d-round merge for non-adjacent patches.
7.3 Factory footprint
In both architectures, the T-state factory dominates atom count for algorithms with non-trivial T-gate counts. We assume both architectures use a 15-to-1 distillation routine. The factory codes differ:
• Surface code baseline. Magic-state distillation on the surface code requires ≈ 15 · d_factory² atoms per distillation block at distance d_factory.
• FCC + color code. The factory uses small color-code blocks at [[15, 1, 3]], requiring 15 atoms per block plus ancillas, for a total of ≈ 30 atoms per distillation primitive. At higher factory distances, cascaded distillation in color codes scales as ≈ 15 · d_factory³ atoms (the color code's 3D scaling), but the constant factor is small at low d_factory.
For low-distance factories (d_factory ≤ 5), the color-code factory is substantially smaller than the surface-code equivalent because the transversal T eliminates the need for state-injection blocks. For high-distance factories, the surface-code baseline catches up in atom count but loses on factory depth (the surface code needs more rounds per distillation cycle than the color code does).
7.4 Merge-Footprint Comparison: Triangle Ribbon vs Surface-Code Surgery
For algorithms that involve frequent cross-block joint Pauli measurements (state preparation, magic-
state injection, multi-logical entangling operations), the spatial footprint of a single merge operation
determines how many parallel merges can be running on a fixed atom budget. We compare the
triangle-ribbon cross-sheet ZZ-merge against the surface-code alternatives at matched code distance.
Triangle ribbon: verified construction. We construct an explicit length-L triangle ribbon for the cross-sheet Z̄_A Z̄_B measurement at L = 4, 6, 8 as follows. Take Z̄_A to be the length-L Z-logical on sheet S_xy in layer z = 0, and Z̄_B the length-L Z-logical on sheet S_xz in layer y = 0. The ribbon is the sequence of L triangles whose S_xy edge lies on Z̄_A and whose S_xz edge lies on Z̄_B. Whether L triangles is the global minimum across all valid ribbons connecting the same logical pair is an open question; our construction establishes feasibility and the structural counts shown in Table 7, verified by direct computation (reproducible code in the bundle):
L | Triangles in ribbon | New ancillas | Sheet-S_xy atoms | Sheet-S_xz atoms | Sheet-S_yz atoms
4 | 4 | 4 | 4 | 4 | 2
6 | 6 | 6 | 6 | 6 | 3
8 | 8 | 8 | 8 | 8 | 4
Table 7: Triangle ribbon for cross-sheet ZZ-merge, verified at L = 4, 6, 8. The ribbon has length L triangles, contributing L new ancillas. The data atoms touched in the two source sheets (S_xy and S_xz) are precisely the L atoms of the source logicals (not overhead). The third sheet (S_yz) is touched at L/2 distinct atoms; each touched atom appears twice in the ribbon (the two yz-edge contributions cancel pairwise mod 2), giving zero net effect on the third sheet's logical content but reserving those atoms during the merge measurement window.
Resource accounting. For the FCC triangle ribbon, the atoms reserved during the merge (beyond the source/target logicals themselves) are:
• L new ancilla atoms, one per triangle on the ribbon.
• L/2 data atoms in the third sheet, borrowed for the measurement window but returned to the third sheet's code afterward (their yz-edge contributions cancel mod 2).
Total transient overhead: 3L/2 atoms.
For the surface-code alternatives operating between patches in different planes of an atom array (analogous to different sheets):
• Routing channel approach. A length-L surface-code path connects the two patches. Channel data atoms: L. Channel ancillas: L. Total transient overhead: 2L atoms.
• Face-to-face approach. Use atom rearrangement to bring the two patches into adjacency, then run standard merge. Seam ancillas: L. Total transient overhead: L atoms, plus rearrangement time (typically 100 µs per move, against 5 µs per syndrome cycle).
Comparison at matched L. Table 8 compares the atom counts reserved during a single cross-block ZZ-merge across the three protocols.
L | FCC triangle ribbon | Surface (routing channel) | Surface (face-to-face)
4 | 6 atoms | 8 atoms | 4 atoms + rearrangement
6 | 9 atoms | 12 atoms | 6 atoms + rearrangement
8 | 12 atoms | 16 atoms | 8 atoms + rearrangement
Table 8: Atoms reserved during a single cross-block ZZ-merge, by code and protocol. The FCC triangle ribbon's 3L/2 overhead beats the surface-code routing-channel approach by a factor of approximately 4/3 at all L, and is competitive in atom count with the surface-code face-to-face approach while not requiring atom rearrangement.
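The entries of Table 8 are simple functions of L. The sketch below recomputes them from the overhead formulas given above (L ancillas plus L/2 borrowed third-sheet atoms for the ribbon, 2L for a routing channel, L for face-to-face), assuming even L as in the verified cases; the function and key names are illustrative.

```python
def merge_overhead(L):
    """Atoms reserved during one cross-block ZZ-merge, per protocol (even L assumed)."""
    if L % 2:
        raise ValueError("the counts in the text are stated for even L")
    return {
        "fcc_triangle_ribbon": L + L // 2,   # L new ancillas + L/2 borrowed S_yz atoms
        "surface_routing_channel": 2 * L,    # L channel data + L channel ancillas
        "surface_face_to_face": L,           # L seam ancillas (plus rearrangement time)
    }

overheads = {L: merge_overhead(L) for L in (4, 6, 8)}
for L, row in overheads.items():
    ratio = row["surface_routing_channel"] / row["fcc_triangle_ribbon"]
    print(L, row, round(ratio, 3))
```

The routing-channel to ribbon ratio is 2L / (3L/2) = 4/3 for every L, which is the factor quoted in the caption.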
Qualitative advantages beyond atom count. Three additional advantages of the triangle ribbon are worth noting:
1. No rearrangement required. The ribbon operates in place, with source and target logicals remaining in their canonical lattice positions. Surface-code face-to-face surgery requires moving one of the patches into adjacency, costing L parallel tweezer moves at ≈ 100 µs each. In wall-clock terms, the ribbon merge is faster by roughly a factor of 20 at L = 4.
2. Three-body Rydberg measurements as a hardware fit. Each triangle ancilla is placed at the geometric center of an FCC triangle, where it couples to all three triangle edges within Rydberg-blockade range. This is the natural multi-body measurement primitive of atom-array hardware [12] and exploits a capability surface-code surgery does not require.
3. Parallel merge throughput. Disjoint triangle ribbons can run simultaneously on the same lattice without conflicting. At L = 4, the lattice has 4L³ = 256 triangles, of which a ribbon uses L = 4; the remaining 252 triangles are available for other concurrent ribbons. Surface-code routing channels conflict whenever they share routing regions, limiting the number of simultaneous merges.
Implications for memory-only architectures. Even if one removes the transversal-CNOT
result of Section 3 from consideration and uses only the within-sheet local Cliffords plus triangle-based
cross-sheet surgery, the architecture retains a measurable advantage over independent surface-code
stacks for any computation involving cross-block joint Pauli measurements. This is the regime
relevant to magic-state injection from the color-code factory (Section 6), where the gate-teleportation
interface is itself a cross-block joint measurement. Thus the FCC structure contributes architectural
value even in a hypothetical surgery-only configuration, distinct from its more dramatic contribution
via the transversal-CNOT layer.
7.5 Summary resource picture
For a representative scenario (24 logical qubits at distance 4, with a low-distance factory supplying
$|T\rangle$ states at the rate needed for moderate-$T$-count algorithms), the FCC + color-code
architecture uses approximately:
Memory: 384 atoms (vs. 768 for surface code).
Factory: 100–200 atoms (vs. 500–1000 for surface code factory).
Total: 500–600 atoms (vs. 1300–1800 for surface code).
This is within the current footprint of QuEra Gemini-2 (256 atoms) at moderate $k$, and well within
Atom Computing’s 1180-atom system at the full $k = 24$ scale.
We emphasize that these are first-order estimates intended to set the scale; full circuit-level
resource estimates accounting for decoder latency, factory throughput limitations, and inter-zone
teleportation overhead are left to future work. The qualitative conclusion that the proposed
architecture meaningfully reduces atom-count requirements relative to the surface-code baseline at
near-term scales holds across the range of assumptions we examined, with a memory-zone saving
of roughly 2× at $d \le 8$.
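As a quick consistency check on the summary figures, the sketch below adds the quoted memory and factory ranges and recovers the memory-zone ratio. All numbers are the ones stated in this section; the dictionary layout and the helper name `total` are illustrative. The quoted totals (500–600 and 1300–1800 atoms) are rounded versions of the sums computed here.

```python
# Atom counts quoted in the summary, as (low, high) ranges.
FCC = {"memory": (384, 384), "factory": (100, 200)}
SURFACE = {"memory": (768, 768), "factory": (500, 1000)}
LOGICAL_QUBITS = 24  # representative scenario from the text


def total(budget: dict) -> tuple:
    """Add the memory and factory ranges end to end."""
    low = sum(r[0] for r in budget.values())
    high = sum(r[1] for r in budget.values())
    return low, high


fcc_total = total(FCC)
surface_total = total(SURFACE)
memory_saving = SURFACE["memory"][0] / FCC["memory"][0]
surface_atoms_per_logical = SURFACE["memory"][0] / LOGICAL_QUBITS

print("FCC + color code total:", fcc_total)
print("surface code total:    ", surface_total)
print("memory-zone saving:    ", memory_saving)
print("surface atoms/logical: ", surface_atoms_per_logical)
```

The surface-code memory figure works out to 32 atoms per logical qubit, which is consistent with a $2d^2$ patch at $d = 4$; that reading is an inference from the numbers rather than something stated in the text.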
8 Mapping to Reconfigurable Atom-Array Hardware
The architecture targets reconfigurable neutral-atom arrays specifically. We describe how each
primitive maps to the operations available on QuEra Gemini-class [16], Atom Computing [17], and
Pasqal [18] platforms.
8.1 Lattice layout
The three triad sheets of the FCC lattice are realized as three interpenetrating tweezer arrays. Each
sheet’s data qubits occupy a sub-array of $L^3$ atoms; the sub-arrays share the same FCC lattice
positions but address different edges of the lattice (corresponding to the three triads
$S_{xy}$, $S_{xz}$, $S_{yz}$).
Ancilla atoms for stabilizer extraction are placed at lattice vertices (for $Z$-stabilizers) and at
octahedral void centers (for $X$-stabilizers).
The full memory zone has footprint $3L^3$ data atoms and $3L^3$ ancilla atoms, totalling $6L^3$
atoms. At $L = 4$: 384 atoms. At $L = 6$: 1296 atoms. At $L = 8$: 3072 atoms.
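The footprint formula can be tabulated against the array sizes quoted elsewhere in this paper. This is a minimal sketch: the names `memory_zone_atoms` and `platforms_that_fit` are hypothetical, the platform sizes are the ones cited in the text, and the comparison covers the memory zone only (no factory or routing atoms).

```python
# Array sizes as quoted in this paper (atoms).
PLATFORM_ATOMS = {
    "QuEra Gemini-2": 256,
    "Bluvstein 2025 demonstration": 448,
    "Atom Computing": 1180,
}


def memory_zone_atoms(L: int) -> dict:
    """Data, ancilla, and total atoms for the three-sheet memory zone."""
    data = 3 * L**3       # three sheets of L^3 data atoms each
    ancilla = 3 * L**3    # matching number of stabilizer-extraction ancillas
    return {"data": data, "ancilla": ancilla, "total": data + ancilla}


def platforms_that_fit(L: int) -> list:
    """Platforms whose quoted size covers the memory zone alone."""
    need = memory_zone_atoms(L)["total"]
    return [name for name, size in PLATFORM_ATOMS.items() if size >= need]


for L in (4, 6, 8):
    print(L, memory_zone_atoms(L), platforms_that_fit(L))
```

Under these assumptions only the $L = 4$ memory zone fits within the quoted array sizes; larger $L$ relies on the next-generation systems discussed in the Outlook.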
8.2 Native gate primitives
Stabilizer extraction. Within each sheet, the weight-4 vertex $Z$-stabilizers and octahedral-void
$X$-stabilizers are extracted using the standard 4-qubit syndrome extraction circuit [4], executed in
parallel across all stabilizers of the sheet using global Rydberg pulses or per-pair tweezer addressing.
The Rydberg blockade radius is tuned to give nearest-neighbor coupling within the FCC sheet’s
K=4 connectivity; this is the same regime as current surface-code atom-array demos [12].
Transversal CNOT via tweezer rearrangement. The threefold-rotation transversal CNOT
(Section 3) is implemented in three steps:
1. Tweezer rearrangement brings each control-qubit atom in sheet $T$ adjacent to its $\pi$-paired
target atom in sheet $R(T)$. This rearrangement is the FCC threefold rotation realized as a
physical atom motion.
2. A single global Rydberg pulse implements $L^3$ parallel physical CNOTs between paired atoms.
3. Tweezer rearrangement restores the atoms to their original FCC positions.
The rearrangement step is the dominant time cost. On QuEra hardware, atom rearrangement takes
100 µs for moves of a few lattice spacings, comparable to a single syndrome-extraction round [11].
The parallel CNOT pulse itself is < 1 µs.
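The three-step procedure can be illustrated with a toy model of the rotation pairing and its time budget. This sketch makes a simplifying assumption that should be read loudly: qubits are labeled by cubic coordinates $(x, y, z)$ with sheets numbered 0, 1, 2, whereas the paper's data qubits live on FCC edges. The function names and the sheet-cycling convention are hypothetical; only the map $R: (x, y, z) \mapsto (y, z, x)$, the count of $L^3$ pairs, and the timing figures come from the text.

```python
from itertools import product


def rotate(p):
    """Threefold rotation R: (x, y, z) -> (y, z, x)."""
    x, y, z = p
    return (y, z, x)


def rotation_pairing(L, control_sheet=0):
    """Pair each site of one sheet with its rotated image in the next sheet.

    Toy labeling (assumption): a qubit is (sheet, (x, y, z)), and R cycles
    the sheet index 0 -> 1 -> 2 -> 0.
    """
    target_sheet = (control_sheet + 1) % 3
    return [
        ((control_sheet, p), (target_sheet, rotate(p)))
        for p in product(range(L), repeat=3)
    ]


def layer_time_us(move_us=100.0, pulse_us=1.0):
    """Move in, pulse, move back; pulse_us is the upper bound from the text."""
    return 2 * move_us + pulse_us


pairs = rotation_pairing(4)
controls = {c for c, _ in pairs}
targets = {t for _, t in pairs}
print(len(pairs), len(controls), len(targets))
print(layer_time_us())
```

Because `rotate` is a bijection on the coordinate grid, every atom appears in exactly one pair, which is what allows a single global pulse to drive all $L^3$ physical CNOTs at once. The time estimate is dominated by the two rearrangement steps, as stated above.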
Within-sheet surgery. Each sheet’s within-sheet surgery primitives (ZZ-merge and XX-merge
between logicals on the same sheet) are standard 2D toric/surface code lattice surgery, applied
layer-wise via the layer decomposition. No specialized atom-array primitive is needed beyond what
current surface-code demos already use.
Cross-sheet surgery. Cross-sheet ZZ-merge uses ancilla atoms placed at triangle centers, coupling
to three data atoms (one per sheet) within Rydberg-blockade range. This requires the ancilla atom
to sit near the geometric center of the triangle, which is a single position in the FCC lattice. No
long-range couplings are needed.
Magic-state injection. Two routes are available, depending on the algorithm’s demand:
Parallel injection (primary route, Section 6). Up to $2L$ factory blocks each prepare an encoded
$|T\rangle$ in parallel; tweezer rearrangement places them in the ancilla positions of one FCC memory
sheet. One transversal-CNOT layer (paired by $R$) executes the CNOT step of the gate-teleportation
gadget for all $2L$ pairs in a single physical-gate layer. $2L$ ancilla logical $X$-basis
measurements and per-target conditional $S$ corrections complete the $2L$ teleportations. This
is the route that realizes the architecture’s $O(L)$ interface throughput.
Single-state injection (fallback route, Section 4). When the algorithm needs only one or a few
magic states injected into specific targets that do not match the FCC rotation pairing, tweezer
rearrangement brings the color-code factory’s prepared $|T\rangle$ atom into proximity with the target,
and the cross-sheet ZZ-merge surgery primitive performs a joint $\bar{Z}_{\mathrm{tgt}} \bar{Z}_{\mathrm{anc}}$
measurement to drive the same gate teleportation at $O(1)$ throughput. This route is the appropriate
primitive for sparse, irregular $T$-gate patterns.
The two routes share the same hardware primitives (Rydberg coupling, tweezer rearrangement) and
differ only in the choice of gadget step (2) implementation: transversal CNOT versus joint Pauli
measurement.
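The throughput difference between the two routes can be made concrete with a simple layer count. The sketch below is a back-of-the-envelope model, not a scheduler: `injection_layers` is a hypothetical helper, the parallel route is assumed to have all targets matched to the rotation pairing and enough factory blocks available, and the $O(1)$ route is modeled as exactly one teleportation per step.

```python
import math


def injection_layers(num_states: int, L: int, route: str = "parallel") -> int:
    """Interface steps needed to inject `num_states` magic states.

    parallel: up to 2*L teleportations per transversal-CNOT layer (from the text).
    single:   O(1) throughput, modeled here as one teleportation per
              joint-measurement step (assumption).
    """
    if num_states < 0:
        raise ValueError("num_states must be non-negative")
    if route not in ("parallel", "single"):
        raise ValueError("route must be 'parallel' or 'single'")
    per_step = 2 * L if route == "parallel" else 1
    return math.ceil(num_states / per_step)


for L in (4, 6, 8):
    print(L, injection_layers(100, L, "parallel"), injection_layers(100, L, "single"))
```

For 100 magic states at $L = 4$ the model gives 13 parallel layers against 100 sequential steps. The gap grows linearly in $L$, which is the $O(L)$ versus $O(1)$ distinction drawn above; it says nothing about the wall-clock duration of each step.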
8.3 Comparison to current demonstrations
The Bluvstein 2025 architecture [12] demonstrated:
Surface codes up to $d = 7$, with $\Lambda \approx 2.14$ below-threshold suppression.
Transversal teleportation with [[15, 1, 3]] 3D color codes for non-Clifford operations.
Atom rearrangement and zoned architecture across 448 atoms.
The proposed architecture uses essentially the same hardware primitives (Rydberg blockade, atom
rearrangement, zoned spatial organization) and replaces the surface-code memory layer with the
FCC sheet code, retaining the 3D color code in its specialized role as the $T$-state factory. No new
hardware capability is needed beyond what the Bluvstein 2025 demonstration already exercises.
8.4 Near-term demonstration milestones
A staged demonstration on atom-array hardware would proceed as follows:
1. Single-sheet memory at $L = 4$. 128 atoms (one sheet, 64 data + 64 ancilla), demonstrating
below-threshold operation of a single [[64, 8, 4]] block. Comparable scale to Bluvstein 2025’s
$d = 7$ surface-code demo.
2. Three-sheet memory at $L = 4$. 384 atoms. Demonstrates the full three-sheet structure and
verifies the FCC lattice layout.
3. Transversal CNOT between two sheets at $L = 4$. Demonstrates the rotation-paired CNOT
layer and verifies the threshold-preservation result (R1).
4. Full Clifford universality at $L = 4$. Composition of within-sheet local Cliffords with the
cross-sheet transversal CNOT. Demonstrates Proposition 2.
5. Magic-state injection from a color-code factory. Couples the [[15, 1, 3]] color-code factory to
the FCC sheet code memory via gate teleportation, completing the universal gate set.
Each milestone is achievable on hardware in the
256–1000 atom range, which is the current state
of the art on QuEra and Atom Computing systems.
9 Discussion
9.1 What this architecture is, and is not
The proposed architecture’s principal contribution is a parallel-teleportation primitive at the memory-
factory interface, derived from the FCC threefold rotation, with throughput that scales linearly with
code distance. No other 3D code in the current literature provides an $L$-scaling parallel-teleportation
primitive. Combined with a 3D color-code magic-state factory, the resulting architecture is universal
and, in the regime where the factory throughput can be scaled to match the memory’s $2L$-parallel
injection capacity, removes a bottleneck that constrains all $O(1)$-interface architectures.
We have not claimed:
Optimality at high distance. Bivariate-bicycle codes pull ahead on rate at $d \ge 12$ [8, 9],
and remain the better bulk-memory choice at that scale. The FCC sheet code’s parallel-teleportation
advantage is specifically for the small-to-moderate distance regime ($d \le 8$)
relevant to near-term atom-array experiments.
Optimality on non-Clifford operations. The 3D color code’s transversal $T$ is what we use as
the factory, not what we claim to improve upon.
A general-purpose Clifford speedup. The transversal CNOT applies $2L$ CNOTs in a fixed
pairing pattern; for arbitrary single-CNOT operations between specific logical pairs, the
architecture uses cross-sheet surgery (footprint $3L/2$ atoms), which is competitive with but
not dramatically better than surface-code surgery. The dramatic advantage is specifically
at the magic-state injection interface, which is the natural use case for structured parallel
CNOTs.
Deployability on K=4 2D superconducting hardware. The architecture requires 3D-native
connectivity that planar superconducting chips do not provide. Surface code remains the right
choice for that platform.
9.2 Scope of validity
The architecture’s validity rests on the structural and numerical results of Sections 2–5, which are
independent of the architectural framing. The structural results (CSS isomorphism of the threefold
rotation, layer decomposition, triangle algebra, ribbon construction) are verified at $L = 4, 6, 8$
by direct computation. The bundle accompanying this paper includes the triangle-enumeration,
triangle-ribbon-construction, and merge-footprint verification scripts; the CSS-isomorphism,
layer-decomposition, and cross-sheet reachability checks are available in the public code repository (see
Code and Data Availability). The threshold results are circuit-level depolarizing simulations, with sample
sizes of several $\times 10^3$ shots per point at $L = 4, 6$ and of order $10^3$ at $L = 8$; the simulation drivers,
Stim circuits, and raw shot data are in the public code repository. Memory is decoded with MWPM;
cross-sheet surgery is decoded with BP+OSD (its weight-3 ribbon operators create hyperedges that
MWPM cannot decompose); the transversal CNOT requires the correlated (feed-forward) decoder
of Section 5, since naive per-sheet matching collapses on the target sheet.
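The decoder split described above follows from whether every error mechanism flips at most two detectors. The sketch below illustrates that test on toy data. The detector indices are hypothetical, the function names are illustrative, and the dispatch rule is a simplification of the paper's actual pipeline (which works from Stim detector error models).

```python
def is_graphlike(error_mechanisms) -> bool:
    """True if every error mechanism flips at most two detectors."""
    return all(len(set(dets)) <= 2 for dets in error_mechanisms)


def hyperedges(error_mechanisms) -> list:
    """Mechanisms that flip three or more detectors."""
    return [
        tuple(sorted(set(dets)))
        for dets in error_mechanisms
        if len(set(dets)) > 2
    ]


def choose_decoder(error_mechanisms) -> str:
    """Matching needs a graph; fall back to BP+OSD when hyperedges appear."""
    return "MWPM" if is_graphlike(error_mechanisms) else "BP+OSD"


# Toy detector sets (hypothetical indices, for illustration only).
memory_like = [(0, 1), (1, 2), (0,), (2,)]     # edges and boundary edges
surgery_like = memory_like + [(0, 1, 2)]       # one weight-3 triangle term

print(choose_decoder(memory_like), hyperedges(memory_like))
print(choose_decoder(surgery_like), hyperedges(surgery_like))
```

A single weight-3 mechanism is enough to make the model non-graphlike, which is why the surgery primitive is routed to BP+OSD while plain memory is decoded by matching.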
What requires further work for a complete architectural validation:
A real-time implementation of the feed-forward decoder for the transversal CNOT. We
demonstrate in simulation that correlated decoding restores the threshold and validate the
inherited-field bookkeeping exactly, but the latency of a hardware-speed correlated decoder,
and its interaction with the factory’s own decoding, is not modeled here.
Full end-to-end resource and time estimates including decoder latency, factory output rate
variance, and inter-zone teleportation overhead. Section 7 gives first-order atom-count estimates
and matched-throughput analysis; a precise time-overhead model remains open.
Detailed circuit-level threshold for the parallel-teleportation interface specifically. We have
characterized the FCC side’s surgery thresholds and inherited the color-code factory’s thresholds
from the literature, but the combined interface (transversal CNOT applied between an FCC
sheet’s ancilla slots and target slots) has not been simulated end-to-end with the color code
attached.
Direct comparison with the BB code architecture [9] at matched scale, including a parallel-teleportation
comparison at fixed factory provisioning.
Compiler-level question: for algorithms whose $T$-gate patterns do not naturally match the
FCC pairing $\sigma$, the slot-permutation problem must be solved at compile time. The cost of
this slot assignment for typical algorithm structures is an open question.
9.3 Open structural questions
Several structural questions about the FCC sheet code remain open and could strengthen the
architecture if resolved positively:
Additional transversal gates. The full octahedral symmetry group of FCC has order 48 and
contains order-2 and order-4 elements in addition to the order-3 rotation we use. Whether any
order-4 element induces a CSS isomorphism at the logical level remains open. An affirmative
answer would add transversal H or S on top of the existing transversal CNOT.
Single-shot QEC. The FCC sheet code is not single-shot in the conventional sense (it inherits
the 2D toric code’s $d$-round requirement). Whether the cross-sheet triangle structure can be
exploited to recover single-shot capability for cross-sheet operations is open.
Higher-level distillation routines. The Litinski 12-to-1 routines [14] achieve better magic-state
output rates than 15-to-1; adapting them to the color-code factory and the FCC interface
is open.
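The group-theoretic statement in the first item above (order 48, with elements of order 2, 3, and 4) can be checked by brute force. The sketch below enumerates the full octahedral group as the 48 signed $3 \times 3$ permutation matrices, a standard realization, and tallies element orders. It only classifies the point-group elements; whether any of them induces a CSS isomorphism of the sheet code is exactly the open question and is not tested here.

```python
from collections import Counter
from itertools import permutations, product

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def matmul(a, b):
    """Product of two 3x3 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def signed_permutation_matrices():
    """All 3! * 2^3 = 48 signed permutation matrices."""
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            yield tuple(
                tuple(signs[i] if j == perm[i] else 0 for j in range(3))
                for i in range(3)
            )


def order(m):
    """Smallest n >= 1 with m^n equal to the identity."""
    n, power = 1, m
    while power != IDENTITY:
        power = matmul(power, m)
        n += 1
    return n


group = list(signed_permutation_matrices())
orders = Counter(order(m) for m in group)
R = ((0, 1, 0), (0, 0, 1), (1, 0, 0))  # (x, y, z) -> (y, z, x)

print(len(group), dict(sorted(orders.items())))
print(R in group, order(R))
```

The enumeration finds 8 elements of order 3 (including $R$) and 12 of order 4, the latter being the candidates referred to above. It also turns up elements of order 6, which combine a threefold rotation with inversion.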
9.4 Outlook
The atom-array QEC landscape is moving fast. Within the past two years, surface-code-plus-3D-
color-code architectures have moved from theory to operating demonstration on 448 atoms. By the
time the present manuscript appears, the next-generation systems at QuEra, Atom Computing,
and Pasqal are likely to be operating at 1000–3000 atoms, well within the regime where the FCC
sheet code’s $L \in \{4, 6, 8\}$ memory blocks become demonstrably feasible. We expect that direct
experimental implementations of the architecture proposed here will become attainable within the next
several years on existing hardware roadmaps, and we view the structural and threshold results of
this paper as the analytical foundation for those demonstrations.
Code and Data Availability
The reproducibility bundle accompanying this paper includes the structural verification scripts
(triangle enumeration, triangle-ribbon construction, atom-count comparison for the merge-footprint
analysis), the lattice modules, and the figure-generation code for Figure 1.
The repository-side scripts that verify the remaining structural claims (CSS isomorphism, layer
decomposition, cross-sheet reachability) and the threshold-sweep drivers and figure-generation
code for Figure 2 are packaged as ssmtheory_repo_scripts.zip in the public repository at
github.com/raghu91302/ssmtheory. The zip’s README.md maps each script to the specific paper
claim it supports:
verify_css_isomorphism.py: Computational Result 1, verified at $L = 4, 6, 8$.
verify_layer_decomposition.py: Theorem 2, verified at $L = 4, 6, 8$ for all three sheets.
verify_cross_sheet_reachability.py: Computational Result 3, verified at $L = 4, 6$.
sweep_memory_threshold.py: memory threshold sweep (exact: each sheet decomposes into
stacked 2D toric codes).
fcc_sheet_circuit.py: full sheet-code circuits built directly from the lattice stabilizers, namely the
weight-4 memory circuit (Figure 2a) and the two-sheet transversal-CNOT circuit with the
real rotation pairing $\pi$ (Figure 2b,c).
fcc_feedforward_real.py: the feed-forward (correlated) decoder of Section 5, the naive-MWPM
baseline, the idealized error-frame (genie) decoder, and the controlled-noise validation
of the inherited-field bookkeeping.
indep_checks.py: independent re-derivation, comprising circuit distance via Stim’s minimum-weight
graphlike-error search ($d = L$ at $L = 4, 6, 8$), a decoder-free symplectic check of the transversal
CX (weight-2 spread, stabilizer-group preservation), and a from-scratch phenomenological
feed-forward reproducing the controlled-noise invariant.
fcc_surgery_circuit.py: the real Z-basis cross-sheet ZZ-merge (triangle-ribbon) circuit,
replacing the earlier surface-code memory proxy; joint-parity distance $d = L$ verified at
$L = 4, 6, 8$.
bposd_decode.py: the Stim-DEM-to-BP+OSD decoder used for the surgery primitive (required
because the weight-3 triangle operators are not graphlike), with gen_data_surgery.py
and data_surgery_real.json as the sweep driver and raw results.
sweep_zz_merge_threshold.py: ZZ-merge threshold sweep driver (toric and planar variants).
sweep_transversal_cnot_threshold.py: transversal-CNOT threshold sweep driver.
make_fig_transversal_cnot.py: Figure 2 generation from JSON sweep data.
Raw shot data from the threshold sweeps reported in Section 5 is also archived in the same repository.
Declarations
Clinical trial registration: not applicable. This study is theoretical and does not constitute a
clinical trial.
Consent to Publish declaration: not applicable.
Ethics and Consent to Participate declarations: not applicable.
References
[1] R. Kulkarni, “A 67%-Rate CSS Code on the FCC Lattice: [[192, 130, 3]] from Weight-12
Stabilizers,” (2026). arXiv:2603.20294.
[2] C. Horsman, A. G. Fowler, S. Devitt, R. Van Meter, “Surface code quantum computing
by lattice surgery,” New J. Phys. 14, 123011 (2012). doi:10.1088/1367-2630/14/12/123011.
arXiv:1111.4022.
[3] E. Dennis, A. Kitaev, A. Landahl, J. Preskill, “Topological quantum memory,” J. Math. Phys.
43, 4452 (2002). doi:10.1063/1.1499754. arXiv:quant-ph/0110143.
[4] A. G. Fowler, M. Mariantoni, J. M. Martinis, A. N. Cleland, “Surface codes: Towards
practical large-scale quantum computation,” Phys. Rev. A 86, 032324 (2012).
doi:10.1103/PhysRevA.86.032324. arXiv:1208.0928.
[5] C. Gidney, “Stim: a fast stabilizer circuit simulator,” Quantum 5, 497 (2021).
doi:10.22331/q-2021-07-06-497. arXiv:2103.02202.
[6] O. Higgott, “PyMatching: A Python package for decoding quantum codes with minimum-weight
perfect matching,” ACM Trans. Quantum Comput. 3, 16 (2022). doi:10.1145/3505637.
arXiv:2105.13082.
[7] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge
University Press, 10th anniv. ed. (2010). doi:10.1017/CBO9780511976667.
[8] S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, T. J. Yoder, “High-threshold and
low-overhead fault-tolerant quantum memory,” Nature 627, 778 (2024).
doi:10.1038/s41586-024-07107-7. arXiv:2308.07915.
[9] T. J. Yoder, E. Schoute, P. Rall, E. Pritchett, J. M. Gambetta, A. W. Cross, M. Carroll,
M. E. Beverland, “Tour de gross: A modular quantum computer based on bivariate bicycle
codes,” (2025). arXiv:2506.03094.
[10] H. Bombín, M. A. Martin-Delgado, “Topological computation without braiding,” Phys. Rev.
Lett. 98, 160502 (2007). doi:10.1103/PhysRevLett.98.160502. arXiv:quant-ph/0610024.
[11] D. Bluvstein et al., “Logical quantum processor based on reconfigurable atom arrays,” Nature
626, 58 (2024). doi:10.1038/s41586-023-06927-3. arXiv:2312.03982.
[12] D. Bluvstein et al., “A fault-tolerant neutral-atom architecture for universal quantum
computation,” Nature 649, 39 (2026). doi:10.1038/s41586-025-09848-5.
[13] S. Bravyi, J. Haah, “Magic-state distillation with low overhead,” Phys. Rev. A 86, 052329
(2012). doi:10.1103/PhysRevA.86.052329. arXiv:1209.2426.
[14] D. Litinski, “A game of surface codes: Large-scale quantum computing with lattice surgery,”
Quantum 3, 128 (2019). doi:10.22331/q-2019-03-05-128. arXiv:1808.02892.
[15] A. O. Quintavalle, M. Vasmer, J. Roffe, E. T. Campbell, “Single-shot error correction
of three-dimensional homological product codes,” PRX Quantum 2, 020340 (2021).
doi:10.1103/PRXQuantum.2.020340. arXiv:2009.11790.
[16] QuEra Computing, “Gemini-2: 256-qubit neutral-atom quantum computer,” product
specification, 2025. https://www.quera.com/gemini.
[17] Atom Computing, “1180-atom neutral-atom quantum computer,” press release, October 2024.
[18] Pasqal, “Orion Beta: Commercial neutral-atom quantum processor,” product specification,
2025.