Parallel Teleportation on the FCC Sheet Code for High Throughput Magic State Injection in Atom Arrays


Raghu Kulkarni

SSMTheory Group, IDrive Inc., Calabasas, CA, USA

Corresponding author: raghu@idrive.com

June 23, 2026

Abstract

Universal fault-tolerant quantum computation requires magic-state injection at a rate matched to algorithmic $T$-gate density. The injection itself is a gate-teleportation primitive that reduces to parallel logical CNOTs between ancilla logicals (holding magic states) and target logicals (in memory). In existing 3D-code architectures for reconfigurable neutral-atom arrays, the memory side of this interface throttles total throughput: surface code, BB, and 3D color code all support at most $O(1)$ logical CNOTs per physical-gate layer at the memory-factory interface, executed sequentially via lattice surgery or code switching. We propose a hybrid atom-array architecture in which the FCC sheet code serves as the bulk memory and Clifford layer and the 3D color code serves as the $T$-state factory, with the memory-factory interface implemented by a high-throughput parallel-teleportation primitive specific to the FCC sheet code. The FCC sheet code is a $[[3L^3, 6L, L]]$ CSS code on the three triad sheets of the face-centered-cubic lattice. Its three sheets are related by an order-three rotational symmetry $R: (x, y, z) \mapsto (y, z, x)$, which we verify computationally at $L = 4, 6, 8$ is a CSS isomorphism whose induced action on logicals is a $(2L) \times (2L)$ symplectic permutation. A single transversal layer of $R$-paired physical CNOTs implements $2L$ parallel logical CNOTs between two sheets in one physical-gate layer, with no merge window, no gauge bits, and trivial each scaling. Circuit-level simulation on the full sheet code at $L = 4, 6, 8$ shows that the transversal layer preserves the $\sim 1\%$ memory threshold under correlated (feed-forward) decoding; a naive per-sheet matching decoder instead collapses on the target sheet, a decoder artifact that we trace to the control-to-target error spread and resolve with the feed-forward decoder provided here. This is the parallel-teleportation primitive that the architecture's memory-factory interface uses: $2L$ magic states are injected into $2L$ memory logicals in a single physical-gate layer, with throughput that scales with $L$. Within-sheet local Cliffords (memory, , single-logical Pauli) and cross-sheet ZZ-/XX-merge surgery (Section 4) handle the remaining Clifford operations. We provide an explicit triangle-ribbon construction for cross-sheet ZZ-merge, verified at $L = 4, 6, 8$, and show that its merge footprint is 3 2 atoms – approximately $4\times$ smaller than the surface-code routing-channel equivalent. The architecture targets QuEra Gemini-class, Atom Computing, and Pasqal hardware, where Rydberg-based interactions and tweezer rearrangement realize the required $K = 12$ FCC connectivity and the $R$-respecting inter-sheet coupling naturally.

1 Introduction

Reconﬁgurable neutral-atom arrays have, over the past three years, become a leading hardware

platform for early fault-tolerant quantum computation. QuEra’s Gemini-2 (256 atoms, deployed 2025)

and Atom Computing’s 1180-atom system (announced late 2024) oﬀer commercial access to arrays

whose connectivity is set by Rydberg-blockade radius rather than fabricated couplers, and whose

qubits are physically rearrangeable on microsecond timescales between gate layers [ ].

Bluvstein and collaborators have demonstrated below-threshold surface-code memory and universal-

logic primitives in a single 448-atom architecture [

]. These platforms have the structural property

that three-dimensional codes – which require physically realized inter-block connectivity beyond

what planar 2D superconducting chips provide – can be deployed natively, by simply arranging

atoms in the required geometry.

The architectural question is then: which 3D code is the right one to use, and for what purpose?

Three contenders have emerged. (i) Quantum LDPC codes such as the bivariate-bicycle (BB) “gross code” [ ] achieve a logical-qubit rate $k/n$ approximately ten times that of the surface code at matched distance, but their connectivity requirements (Tanner-graph degree six, with non-local edges) and their lack of a transversal non-Clifford gate make them best suited to dense bulk memory. (ii) The 3D color code possesses a transversal $T$ gate (or transversal CCZ on the 3-torus) [ ], eliminating most of the overhead of magic-state distillation; it is the natural choice for a non-Clifford factory [ ], but its high-weight stabilizers (typically $w \geq 6$) and specialized restriction or BP-OSD decoders make it heavier as bulk memory. (iii) The 3D toric code is single-shot [ ] but has fixed $k = 3$ independent of system size and a poor logical-density scaling.

In a universal architecture, no single code suﬃces for both roles, and the standard pattern is

to pair a memory code with a factory code via a magic-state injection interface. In every existing

3D-code architecture, the interface is the throttle. Each magic-state injection is a gate-teleportation

primitive that reduces to a logical CNOT between an ancilla logical (holding the magic state) and a

target logical (in memory). When the algorithm calls for many parallel $T$-gates, the interface must execute the same number of logical CNOTs in parallel. Surface code and 3D color code architectures execute these CNOTs sequentially via lattice surgery or code switching, at $\sim d$ rounds per CNOT; BB codes execute them via Tour-de-Gross routing, at higher constant overhead. The factory's intrinsic output rate is rarely the bottleneck for typical $T$-densities; the memory-side interface throughput is.

This paper. We argue that the FCC sheet code ﬁlls the missing role by supplying a high-throughput

parallel-teleportation primitive that does not exist in any other 3D code in the current literature.

The primitive arises from the FCC lattice's threefold rotational symmetry $R: (x, y, z) \mapsto (y, z, x)$, which we show is a CSS isomorphism of the FCC sheet code with a clean induced action on logicals. Applied as an inter-sheet pairing, $R$ implements $2L$ parallel logical CNOTs between two sheets

in one physical-gate layer. This is exactly the primitive a parallel magic-state injection interface

needs. Combined with a 3D color-code factory operating in a separate spatial zone, the resulting

architecture supports universal fault-tolerant computation with memory-factory interface throughput

that scales with L rather than being throttled at O(1) logical CNOTs per layer.

Summary of the architecture. We propose a two-zone atom-array architecture (Figure 1):

• FCC sheet code memory zone. Three triad sheets at $L \geq 4$ encode $6L$ logical qubits (24 at $L = 4$) with rate ≈ 3% logical density per atom. Stabilizers are weight 4; memory and the transversal CNOT are MWPM-decoded (the latter with the feed-forward correction of Section 5), while cross-sheet surgery uses BP+OSD because its weight-3 ribbon operators create hyperedge errors; each sheet decomposes into $L$ independent stacked toric codes. Within-sheet local Cliffords use standard 2D toric/surface-code primitives. Cross-sheet joint Pauli measurements use a triangle-ribbon surgery primitive with merge footprint 3 2 atoms, verified at $L = 4, 6, 8$.

• Parallel-teleportation interface (the new primitive). $2L$ parallel logical CNOTs between two sheets in a single physical-gate layer, paired by the FCC threefold rotation $R$. This is the interface primitive used for parallel magic-state injection from the factory into memory. Throughput scales linearly with $L$, in contrast to the $O(1)$ throughput of surgery-based interfaces in surface-code, BB, and 3D color-code architectures.

• 3D color-code factory zone. Small $[[15, 1, 3]]$ tetrahedral color codes prepare $|T\rangle$ states via transversal $T$, distilled by 15-to-1 routines [ ]. A bank of factories produces magic states in parallel at a rate that scales with the number of factory blocks, designed to match the memory's $2L$-parallel injection capacity.

• Atom-array realization. The FCC lattice and three sheets are laid out as three interpenetrating tweezer arrays. The parallel-teleportation interface is realized by tweezer rearrangement that brings $R$-paired data atoms into Rydberg-blockade range, followed by a single global gate pulse. Magic states are produced in a separate factory zone and shuttled to the memory via the array's native rearrangement primitive.

[Figure 1 (diagram). Memory zone: FCC sheet code, $[[3L^3, 6L, L]]$, weight-4 stabilizers, MWPM-decodable, $2L$ logicals per sheet, 24 total at $L = 4$, $K = 4$ ancilla; transversal CNOT via $R: (x, y, z) \to (y, z, x)$. Factory: 3D color code, $[[15, 1, 3]]$ blocks, 15-to-1 distillation, transversal $T$ with cubic error suppression, output stream. Interface: parallel teleport, $2L$ injections per layer. At $L = 4$: ≈ 384 atoms memory + ≈ 100–200 atoms factory.]

Figure 1: Hybrid atom-array architecture. Left: the FCC sheet code memory zone hosts $6L$ logical qubits across three interpenetrating triad sheets ($S_1$, $S_2$, $S_3$), $2L$ per sheet. Within-sheet local Cliffords use standard layer-wise toric-code primitives at $K = 4$. Cross-sheet operations use the FCC threefold rotation $R$. Right: a $T$-state factory built from small 3D color-code blocks distils $|T\rangle$ states using transversal $T$. Center (the architecture's new primitive): the parallel-teleportation interface, implemented by the $R$-paired transversal-CNOT layer of Section 3, injects up to $2L$ magic states from the factory into the memory in a single physical-gate layer. Throughput scales with $L$, in contrast to the $O(1)$ throughput of surgery-based interfaces.

Outline. Section 2 reviews the FCC sheet code’s construction, code parameters, and decomposition

into stacked toric codes. Section 3 establishes the threefold rotation as a CSS isomorphism and

develops the transversal-CNOT construction. Section 4 presents the cross-sheet surgery primitives

(ZZ-merge and XX-merge) and their boundary-aware planar variants. Section 5 consolidates the

threshold simulations across all primitives. Section 6 develops the full hybrid architecture, including

the magic-state factory interface. Section 7 compares architectural cost against the surface-code-

plus-15-to-1 baseline. Section 8 maps the architecture onto reconﬁgurable atom-array hardware.

Section 9 discusses scope, limitations, and open questions.

Scope and what this paper is not. This paper proposes an architecture; it is not a hardware demonstration, nor a claim that the FCC sheet code is universally optimal across all metrics. On rate at very high distance (say $d \geq 12$), bivariate-bicycle codes pull ahead and remain the better memory choice [ ]. On transversal non-Clifford gates, the 3D color code dominates and remains the right factory choice [ ]. The FCC sheet code does not provide a general-purpose “faster Clifford CNOT” for arbitrary single-CNOT patterns – those are best handled by within-sheet local Cliffords or cross-sheet surgery. Our claim is narrower: at the parallel magic-state injection interface, where many logical CNOTs in a structured pattern are needed simultaneously, the FCC transversal CNOT provides $2L$-parallel throughput that no other code in this survey matches. When paired with a factory that produces magic states at a scaling rate, this primitive turns the memory-factory interface from an $O(1)$-throughput bottleneck into an $O(L)$-throughput pipeline. The architecture's value comes from this specific combination, not from any single code in isolation.

2 The FCC Sheet Code

The FCC sheet code introduced here is closely related to, but distinct from, the high-rate FCC-edge CSS code of [ ]. Both codes place qubits on edges of the FCC lattice; the difference is in the stabilizer choice. The high-rate code uses weight-12 stabilizers (vertex $Z$ over all 12 incident edges; octahedral $X$ over all 12 edges of a void), yielding a constant-distance $[[3L^3, k, 3]]$ code with rate $k/n$ approaching $2/3$. The sheet code we develop here restricts each stabilizer to a single triad sheet of $K = 4$ edges, yielding a $[[3L^3, 6L, L]]$ code with weight-4 stabilizers, lower rate (≈ 6% logical density per atom at $L = 4$), and distance growing as $L$. These are two different points on the rate–distance frontier of the FCC-edge lattice: the high-rate code maximizes $k/n$ at fixed $d$, the sheet code maximizes $d$ at fixed local stabilizer weight. The architectural primitives of this paper – the layer decomposition, the cross-sheet surgery, the rotation-derived transversal CNOT – rely on the sheet decomposition and do not apply to the weight-12 code.

2.1 Triad Decomposition

The face-centered-cubic lattice has $K = 12$ nearest-neighbor vectors, partitioning into three orthogonal sheets of four:

$$S_1: (\pm 1, \pm 1, 0), \qquad S_2: (\pm 1, 0, \pm 1), \qquad S_3: (0, \pm 1, \pm 1). \tag{1}$$

Each FCC edge belongs to exactly one sheet. At lattice size $L$ (even, for toric boundaries), each sheet contains $L^3$ edges, and restricted to a single sheet each vertex has $K = 4$ incident edges.
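The partition and the per-sheet edge count can be checked by direct enumeration. The following is a minimal sketch, not the paper's own code: it assumes FCC vertices are the even-parity points of a periodic $L \times L \times L$ cube, and it labels each sheet by the axis along which its displacements vanish (a labeling chosen here purely for illustration). The function names are hypothetical.

```python
from itertools import product

# The 12 FCC nearest-neighbour vectors: two entries of +/-1 and one zero.
NEIGHBORS = [v for v in product((-1, 0, 1), repeat=3)
             if sorted(map(abs, v)) == [0, 1, 1]]


def sheet_index(d):
    """Illustrative sheet label: position of the zero entry of displacement d.

    2 -> (+-1, +-1, 0), 1 -> (+-1, 0, +-1), 0 -> (0, +-1, +-1).
    """
    return d.index(0)


def fcc_vertices(L):
    """Even-parity points of the periodic L x L x L cube (assumes L even, L >= 4)."""
    return [p for p in product(range(L), repeat=3) if sum(p) % 2 == 0]


def sheet_edges(L):
    """Map sheet label -> set of undirected edges (frozensets of two vertices)."""
    edges = {0: set(), 1: set(), 2: set()}
    for v in fcc_vertices(L):
        for d in NEIGHBORS:
            w = tuple((a + b) % L for a, b in zip(v, d))
            edges[sheet_index(d)].add(frozenset((v, w)))
    return edges


if __name__ == "__main__":
    for L in (4, 6):
        counts = {s: len(e) for s, e in sheet_edges(L).items()}
        print(L, counts)  # each sheet should hold L**3 edges
```

Each vertex contributes four edges to a sheet and every edge is shared by two vertices, so the $L^3/2$ vertices give $L^3$ edges per sheet, which is what the enumeration counts.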

2.2 Sheet Code Stabilizers

Fix one triad sheet $S$. Place one physical qubit on each edge in $S$, giving $L^3$ qubits. The stabilizers are:

• Z-stabilizers: for each vertex $v$, apply $Z$ to the 4 edges of $S$ incident to $v$.

• X-stabilizers: for each octahedral void $o$, apply $X$ to the 4 edges of $S$ connecting the 6 vertices surrounding $o$.

Both stabilizer types have uniform weight 4. The CSS condition $H_X H_Z^\top = 0$ over GF(2) is satisfied because each edge in $S$ participates in exactly two vertices and exactly two octahedral voids restricted to $S$, so any vertex-void overlap is even.
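The CSS condition and the logical count can be checked numerically on a small lattice. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes the sheet with displacements $(\pm 1, \pm 1, 0)$, places octahedral voids at the odd-parity points of the periodic cube, and reads the void check as the square of four sheet edges joining the void's four in-plane neighbours. All function names are hypothetical.

```python
from itertools import product


def build_sheet_code(L):
    """Return (n, z_checks, x_checks) for the (+-1, +-1, 0) sheet.

    Checks are sets of edge ids. Assumes L even and L >= 4.
    """
    disps = [(1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)]

    def add(p, d):
        return tuple((a + b) % L for a, b in zip(p, d))

    points = list(product(range(L), repeat=3))
    vertices = [p for p in points if sum(p) % 2 == 0]
    voids = [p for p in points if sum(p) % 2 == 1]  # assumed void positions

    edge_id = {}

    def eid(u, w):
        return edge_id.setdefault(frozenset((u, w)), len(edge_id))

    # Vertex Z-checks: the four sheet edges incident to each vertex.
    z_checks = [{eid(v, add(v, d)) for d in disps} for v in vertices]
    # Void X-checks: the 4-cycle of sheet edges around the void, in its plane.
    x_checks = []
    for o in voids:
        ring = [add(o, s) for s in ((1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0))]
        x_checks.append({eid(ring[i], ring[(i + 1) % 4]) for i in range(4)})
    return len(edge_id), z_checks, x_checks


def gf2_rank(rows):
    """Rank over GF(2) of rows given as sets of column indices (XOR basis)."""
    pivots = {}
    for row in rows:
        r = sum(1 << c for c in row)
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]
    return len(pivots)


def check(L):
    """Return (n, k, css_ok) for one sheet at lattice size L."""
    n, zc, xc = build_sheet_code(L)
    css_ok = all(len(z & x) % 2 == 0 for z in zc for x in xc)
    k = n - gf2_rank(zc) - gf2_rank(xc)
    return n, k, css_ok


if __name__ == "__main__":
    print(check(4))
```

Under these assumptions the check at $L = 4$ reports 64 qubits, 8 logical qubits, and every vertex-void overlap even; it does not compute the distance.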

2.3 Code Parameters and Layer Decomposition

Theorem 1 (Sheet code parameters). At even $L$, the FCC sheet code on a single triad sheet has parameters $[[L^3, 2L, L]]$.

The full three-sheet code, with all three sheets deployed simultaneously on the same FCC lattice, has parameters $[[3L^3, 6L, L]]$ – three independent copies of $[[L^3, 2L, L]]$ on disjoint data-qubit sets that share the underlying lattice geometry.

The proof of Theorem 1 proceeds via the layer decomposition that is, in our view, the most

important structural fact about this code.

Theorem 2 (Layer decomposition). The FCC sheet code on $S_1$ at even $L$ is isomorphic, as a stabilizer code, to $L$ disjoint 2D toric codes, each on an $L \times L$ rotated square lattice with $L^2$ data qubits, $k = 2$ logicals, and distance $L$. Analogous decompositions hold for $S_2$ (layered by $y$) and $S_3$ (layered by $x$).

Proof. Each $S_1$ edge $(u, v)$ has $z(u) = z(v)$ because the displacement vector $u - v \in \{(\pm 1, \pm 1, 0)\}$ has $\Delta z = 0$. Define $\mathrm{layer}(e) = z(u)$. This partitions the $L^3$ edges of $S_1$ into $L$ disjoint sets of $L^2$ edges each. A vertex $Z$-stabilizer at $v$ acts on the four $S_1$-edges incident to $v$, all sharing the $z$-coordinate of $v$, so it is supported entirely within one layer. The same argument applies to octahedral-void $X$-stabilizers within $S_1$. Within the layer at height $z$, the $S_1$ edges connect vertices $\{(x, y, z) : x + y \equiv z \pmod 2\}$ via $(\pm 1, \pm 1, 0)$; this is the standard rotated $L \times L$ toric code. The 2D toric code on $L^2$ qubits has $k = 2$ and $d = L$. Summing across $L$ layers gives $\mathrm{rank}(H_Z) = \mathrm{rank}(H_X) = L(L^2/2 - 1) = (L^3 - 2L)/2$ and $k = L^3 - 2 \cdot (L^3 - 2L)/2 = 2L$. The minimum-weight logicals are the non-contractible cycles within layers, length $L$.
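As a worked instance of the counting in the proof, the numbers at $L = 4$ can be written out explicitly. This is plain arithmetic on the formulas above; the per-layer symbols $n_{\text{layer}}$ and $k_{\text{layer}}$ are notation introduced here for illustration.

```latex
\begin{align*}
\text{per layer:}\quad
  & n_{\text{layer}} = L^2 = 16, \qquad
    \#\{\text{vertex checks}\} = \#\{\text{void checks}\} = L^2/2 = 8, \\
  & \operatorname{rank} = L^2/2 - 1 = 7 \ \text{(each type)}, \qquad
    k_{\text{layer}} = 16 - 2 \cdot 7 = 2, \\[4pt]
\text{per sheet:}\quad
  & n = L \cdot L^2 = 64, \qquad
    \operatorname{rank}(H_Z) = \operatorname{rank}(H_X) = 4 \cdot 7 = 28, \\
  & k = 64 - 2 \cdot 28 = 8 = 2L .
\end{align*}
```

The 8 vertex checks per layer, times 4 layers, times 3 sheets, reproduces the 96 vertex stabilizers counted at $L = 4$ in Section 3.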

The layer decomposition has several consequences worth flagging. First, the rank formula holds for every even $L$ without per-$L$ verification. Second, decoding the FCC sheet code reduces to running an independent 2D toric-code decoder (e.g. MWPM) on each layer in parallel; no specialized 3D decoder is required. Third, the code's distance scaling is $d \sim n^{1/3}$ rather than the surface code's $d \sim \sqrt{n}$; the rate is $k/n = 2/L^2$ per sheet, identical to the 2D toric code at distance $L$.

2.4 Planar Variant

For deployment without periodic boundary conditions, each layer becomes a rotated surface code $[[(L-1)^2, 1, L-1]]$ via standard boundary engineering [ ]. The planar sheet code per sheet has parameters

$$[[L(L-1)^2,\ L,\ L-1]] \quad \text{(planar boundaries)}. \tag{2}$$

The distance drops by one and the rate falls from $2/L^2$ to $1/(L-1)^2$ per sheet, which halves it at large $L$. We will use the planar variant in some threshold simulations (Section 5); for the architectural proposal, either variant is acceptable, with the toric variant offering higher logical density and the planar variant simpler atom-array layout.

2.5 Hardware footprint at L = 4 and the role of three sheets

To anchor the rest of the paper in concrete numbers, the toric variant at $L = 4$ gives one sheet with parameters $[[64, 8, 4]]$: 64 data qubits, 8 logical qubits, distance 4. This is 8 atoms per logical qubit on the data side, or ≈ 16 atoms per logical including ancillas, comparable to the best 3D-code rates published at this scale. The three-sheet deployment on the same FCC lattice gives $[[192, 24, 4]]$ – 24 individually addressable logicals in 192 data qubits. The role of having three sheets (rather than just one) is twofold: it triples the logical density per atom-array footprint, and it enables the cross-sheet transversal CNOT that we develop in Section 3.
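The footprint numbers follow directly from the code parameters. The short sketch below tabulates them; it is illustrative only, with hypothetical function names, and the ancilla count assumes one syndrome ancilla per stabilizer generator ($L^3/2$ vertex checks plus $L^3/2$ void checks per sheet), which is an assumption made here rather than a layout stated in the text.

```python
def sheet_code_parameters(L, sheets=3, planar=False):
    """Return (n, k, d) for `sheets` triad sheets at lattice size L."""
    if planar:
        n, k, d = L * (L - 1) ** 2, L, L - 1
    else:
        if L % 2:
            raise ValueError("the toric variant needs even L")
        n, k, d = L ** 3, 2 * L, L
    return sheets * n, sheets * k, d


def atoms_with_ancillas(L, sheets=3):
    """Data atoms plus one ancilla per stabilizer (assumed), toric variant."""
    data = sheets * L ** 3
    ancillas = sheets * L ** 3  # L^3/2 vertex checks + L^3/2 void checks per sheet
    return data + ancillas


if __name__ == "__main__":
    n, k, d = sheet_code_parameters(4, sheets=1)
    print(n, k, d, n / k)                         # data atoms per logical
    print(atoms_with_ancillas(4, sheets=1) / k)   # atoms per logical with ancillas
    print(sheet_code_parameters(4, sheets=3))
    print(atoms_with_ancillas(4, sheets=3))
```

With this ancilla assumption the three-sheet memory at $L = 4$ comes to 384 atoms, matching the memory-zone figure quoted for Figure 1.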

3 The Threefold Rotation as a CSS Isomorphism

The three triad sheets of the FCC lattice are interrelated by an order-three rotational symmetry

of the FCC point group. We show in this section that this symmetry lifts to a CSS isomorphism

between the three single-sheet codes, with a clean induced action on logical operators, and use it to

construct a strictly fault-tolerant transversal CNOT between sheets.

3.1 The rotation

Deﬁne

$$R : (x, y, z) \mapsto (y, z, x). \tag{3}$$

$R$ is an order-3 element of the cubic point group, hence an automorphism of $\mathbb{Z}^3$ that preserves parity sums and maps FCC vertices to FCC vertices. Its action on triad displacement vectors is $(d_x, d_y, d_z) \mapsto (d_y, d_z, d_x)$, which cycles the three sheets:

$$S_1 \longrightarrow S_2 \longrightarrow S_3 \longrightarrow S_1. \tag{4}$$

Computational Result 1 (CSS isomorphism, verified at $L = 4, 6, 8$). The column permutation $\pi$ on the data qubit set induced by $R$ has the following properties, verified computationally at $L = 4, 6, 8$:

• Edge permutation. $\pi$ is a permutation of the full edge set with $\pi(\text{edges of } T) = \text{edges of } R(T)$ for each sheet $T \in \{S_1, S_2, S_3\}$.

• Stabilizer preservation. For every vertex $Z$-stabilizer $s$ of sheet $T$, $\pi(s)$ is identically a vertex $Z$-stabilizer of sheet $R(T)$. Similarly for every octahedral $X$-stabilizer.

• Logical permutation. The induced action of $R$ on logical operators is a $(2L) \times (2L)$ permutation matrix. At $L = 4$, each logical of sheet $T$ (there are 8 per sheet, 24 total) maps to a single logical operator of sheet $R(T)$, preserving the $X/Z$ type, the weight, and the symplectic pairing $\sigma_X \sigma_Z^\top = I$.

The stabilizer-preservation and edge-permutation parts follow structurally from the fact that $R$ is a rotational symmetry of the FCC lattice; these parts are expected to hold for all $L$, with the computational verification serving as confirmation. The logical-permutation part is verified at the canonical basis level only at $L = 4, 6, 8$; a general proof would require a basis-independent argument,

which we leave open.

The structural counts at $L = 4$ are: 96 vertex $Z$-stabilizers and 96 octahedral $X$-stabilizers across the three sheets, all matched by $\pi$; 24 logical operators (8 per sheet $\times$ 3 sheets), all mapped to single logical operators of the next sheet. The same matching counts hold at $L = 6$ (72 logicals, all matched) and $L = 8$ (96 logicals, all matched).
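The edge-permutation and vertex-stabilizer parts of this result can be reproduced with a few lines of enumeration. The sketch below is an illustration rather than the authors' verification script: it assumes FCC vertices are the even-parity points of a periodic cube, labels each sheet by the axis along which its displacements vanish (a labeling chosen here for convenience), and does not attempt the logical-permutation check. Function names are hypothetical.

```python
from itertools import product

NEIGHBORS = [d for d in product((-1, 0, 1), repeat=3)
             if sorted(map(abs, d)) == [0, 1, 1]]


def rotate(p):
    """R: (x, y, z) -> (y, z, x)."""
    x, y, z = p
    return (y, z, x)


def rotate_edge(e):
    """Image of an undirected edge under R."""
    return frozenset(rotate(p) for p in e)


def step(v, d, L):
    return tuple((a + b) % L for a, b in zip(v, d))


def vertex_check(v, s, L):
    """The four edges at vertex v whose displacement has its zero at index s."""
    return {frozenset((v, step(v, d, L))) for d in NEIGHBORS if d.index(0) == s}


def edges_by_sheet(L):
    """Sheet label (index of the zero displacement entry) -> set of edges."""
    sheets = {0: set(), 1: set(), 2: set()}
    for v in product(range(L), repeat=3):
        if sum(v) % 2 == 0:
            for s in range(3):
                sheets[s] |= vertex_check(v, s, L)
    return sheets


def rotation_cycles_sheets(L):
    """True if R maps sheet s onto sheet (s - 1) mod 3, edges and vertex checks alike."""
    sheets = edges_by_sheet(L)
    for s in range(3):
        if {rotate_edge(e) for e in sheets[s]} != sheets[(s - 1) % 3]:
            return False
    for v in product(range(L), repeat=3):
        if sum(v) % 2:
            continue
        for s in range(3):
            image = {rotate_edge(e) for e in vertex_check(v, s, L)}
            if image != vertex_check(rotate(v), (s - 1) % 3, L):
                return False
    return True


if __name__ == "__main__":
    print(rotation_cycles_sheets(4))
```

In this labeling the sheet with displacements $(\pm 1, \pm 1, 0)$ is sent to the sheet with $(\pm 1, 0, \pm 1)$, then to $(0, \pm 1, \pm 1)$, and back, which is the three-cycle described in the text.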

3.2 Logical action of R

For each sheet $T$ we extract a canonical logical basis $\{(\bar{X}_i^{T}, \bar{Z}_i^{T})\}_{i=1}^{2L}$ by Gaussian elimination in GF(2), starting from generators of $\ker(H_Z)$ for $\bar{X}$ and $\ker(H_X)$ for $\bar{Z}$. We then apply row and column operations to bring the symplectic matrix $\langle \bar{X}_i, \bar{Z}_j \rangle \bmod 2$ to the identity. All resulting logicals have weight $L$.

For each logical $\bar{O}_i^{T}$ ($O \in \{X, Z\}$), we apply $\pi$ and re-express the image in the logical basis of sheet $R(T)$ via the linear-algebra solver

$$\pi(\bar{O}_i^{T}) = \sum_j (\sigma_O)_{ij}\, \bar{O}_j^{R(T)} + (\text{stabilizer of } R(T)). \tag{5}$$

At $L = 4, 6, 8$, the matrices $\sigma_X$ and $\sigma_Z$ are equal and are $(2L) \times (2L)$ permutation matrices. The symplectic pairing $\sigma_X \sigma_Z^\top = I$ holds identically, confirming that the induced logical action preserves the symplectic structure of the code's logical algebra.

Proposition 1. Applied as a SWAP layer cycling the three sheets, the rotation $R$ implements a logical permutation of the $6L$ logical qubits that preserves the $X/Z$ partition and the symplectic structure.

3.3 Transversal CNOT between two sheets

We turn $R$ into an entangling operation by using it not as a SWAP but as an inter-sheet qubit pairing. Pair each data qubit $q$ on sheet $T$ with the data qubit $\pi(q)$ on sheet $R(T)$, and apply a physical CNOT $\mathrm{CX}_{q,\pi(q)}$ in parallel across all $L^3$ qubits of $T$. Call this transversal layer $U_{T \to R(T)}$.

Computational Result 2 (Transversal CNOT, verified at $L = 4, 6, 8$). At $L = 4, 6, 8$:

• Stabilizer preservation. $U_{T \to R(T)}$ maps the joint stabilizer group of $T \cup R(T)$ to itself.

• Logical action. On the logical level, $U_{T \to R(T)}$ implements $2L$ parallel logical CNOTs, with sheet $T$ as control and sheet $R(T)$ as target. The pairing of logicals between control and target sheets is the same permutation $\sigma$ from Computational Result 1.

• Fault tolerance. A single-qubit fault on the control (resp. target) side of any individual $\mathrm{CX}_{q,\pi(q)}$ propagates to at most one additional qubit on the target (resp. control) side. This is the standard transversal-CNOT spread, bounded by weight two.

The fault-tolerance argument is the same as for the standard transversal CNOT between two surface-code patches [ ]: each error spreads to weight at most two and remains local, so the post-CNOT code distance equals $\min(d_T, d_{R(T)}) = L$. Realizing this distance in decoding requires accounting for the control-to-target error spread: a correlated (feed-forward) decoder achieves it, whereas independent per-sheet matching does not (Section 5).
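The weight-two spread is just the usual CNOT conjugation rule applied qubit by qubit. The sketch below illustrates it on Pauli supports, ignoring signs; the function name and the toy pairing are hypothetical, and the pairing dictionary stands in for the permutation $\pi$ restricted to one control sheet.

```python
def propagate_through_cx_layer(x_ctrl, z_ctrl, x_tgt, z_tgt, pairing):
    """Conjugate a Pauli by CX(q -> pairing[q]) applied to every control qubit q.

    The Pauli is given by its X and Z supports on the control and target
    sheets. Rules used (signs ignored):
        X_c -> X_c X_t,   Z_c -> Z_c,   X_t -> X_t,   Z_t -> Z_c Z_t.
    Returns the new (x_ctrl, z_ctrl, x_tgt, z_tgt) supports.
    """
    inverse = {t: c for c, t in pairing.items()}
    # X on a control qubit is copied onto its paired target qubit.
    new_x_tgt = set(x_tgt) ^ {pairing[q] for q in x_ctrl}
    # Z on a target qubit is copied back onto its paired control qubit.
    new_z_ctrl = set(z_ctrl) ^ {inverse[t] for t in z_tgt}
    return set(x_ctrl), new_z_ctrl, new_x_tgt, set(z_tgt)


if __name__ == "__main__":
    toy_pairing = {0: "a", 1: "b", 2: "c"}  # hypothetical stand-in for pi
    # A single X fault on control qubit 0 becomes X on 0 and on target "a".
    print(propagate_through_cx_layer({0}, set(), set(), set(), toy_pairing))
    # A single Z fault on target "b" becomes Z on "b" and on control 1.
    print(propagate_through_cx_layer(set(), set(), set(), {"b"}, toy_pairing))
```

The same rule explains stabilizer preservation: an $X$-check of the control sheet picks up its $\pi$-image on the target sheet, which is again an $X$-check there whenever $\pi$ maps stabilizers to stabilizers.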

3.4 Composition gives full Cliﬀord universality

Proposition 2. Let $C_{\mathrm{single}}$ denote the group of single-sheet Clifford operations generated by the

local logical primitives of Section 4 (memory,

on individual logicals; within-sheet

$ZZ$- and $XX$-merge surgery). Let $U$ be the transversal-CNOT layer between any pair of sheets. Then $\langle C_{\mathrm{single}}, U \rangle$ is the full Clifford group on the $6L$-logical system.

The proof is standard: within-sheet Cliffords generate all single-sheet Clifford operations; the transversal CNOT supplies inter-sheet entanglement; together they suffice to generate $\mathrm{Cliff}(6L)$ by the standard generation theorems [ ]. We emphasize that the entire Clifford layer of the

architecture is therefore covered: bulk single-logical gates by single-sheet primitives, bulk entangling

by the transversal CNOT. Non-Cliﬀord gates require an external resource, which is the role of the

magic-state factory in Section 6.

4 Cross-Sheet Surgery Primitives

While the transversal CNOT of Section 3 is the architectural workhorse for unitary inter-sheet

entangling gates, the architecture also requires joint Pauli measurements between logicals – for state

preparation, gate teleportation, and the magic-state injection interface (Section 6). We develop the

cross-sheet joint Pauli measurement primitives in this section.

4.1 FCC triangles

Lemma 1 (Triangle structure). Every triangle (3-cycle) in the FCC nearest-neighbor graph has one edge in each of the three triad sheets. At lattice size $L$, the FCC graph contains $4L^3$ triangles, and each FCC edge participates in exactly four triangles.

Proof. For three mutually adjacent FCC vertices $u, v, w$, the three edge-vectors must each be FCC neighbor vectors. Direct case analysis on the 12 neighbor vectors shows that any three neighbor vectors summing to zero lie in distinct sheets. Counting: each of the $L^3/2$ vertices is in 24 triangles and each triangle has three vertices, giving $24 \cdot (L^3/2)/3 = 4L^3$ triangles. Each triangle has three edges and there are $3L^3$ edges, so each edge appears in $4L^3 \cdot 3/(3L^3) = 4$ triangles.
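The lemma's counts are small enough to confirm by brute force. The following sketch enumerates triangles on a periodic lattice; it assumes FCC vertices are the even-parity points of an $L \times L \times L$ torus with $L \geq 4$ and labels the sheet of a displacement by the position of its zero entry. The function names are hypothetical.

```python
from itertools import combinations, product

NEIGHBORS = [d for d in product((-1, 0, 1), repeat=3)
             if sorted(map(abs, d)) == [0, 1, 1]]


def fcc_triangles(L):
    """Map each triangle (frozenset of 3 vertices) to the set of sheets of its edges."""
    def step(p, d):
        return tuple((a + b) % L for a, b in zip(p, d))

    triangles = {}
    for v in product(range(L), repeat=3):
        if sum(v) % 2:
            continue  # keep even-parity (FCC) vertices only
        for d1, d2 in combinations(NEIGHBORS, 2):
            d3 = tuple(b - a for a, b in zip(d1, d2))  # third side: (v+d1) -> (v+d2)
            if d3 in NEIGHBORS:
                tri = frozenset((v, step(v, d1), step(v, d2)))
                triangles[tri] = {d1.index(0), d2.index(0), d3.index(0)}
    return triangles


def edge_triangle_counts(triangles):
    """Map each edge to the number of triangles that contain it."""
    counts = {}
    for tri in triangles:
        for pair in combinations(sorted(tri), 2):
            e = frozenset(pair)
            counts[e] = counts.get(e, 0) + 1
    return counts


if __name__ == "__main__":
    L = 4
    tris = fcc_triangles(L)
    per_edge = edge_triangle_counts(tris)
    print(len(tris), len(per_edge), set(per_edge.values()))
```

Each triangle is found once from each of its three vertices and deduplicated by the dictionary key, so the dictionary size is the triangle count itself.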

4.2 Triangle operators and cross-sheet logicals

For an FCC triangle $T$ with edges $e_1 \in S_1$, $e_2 \in S_2$, $e_3 \in S_3$, define

$$Z_T = Z_{e_1} \otimes Z_{e_2} \otimes Z_{e_3}, \qquad X_T = X_{e_1} \otimes X_{e_2} \otimes X_{e_3}. \tag{6}$$

Computational Result 3 (Cross-sheet reachability, verified at $L = 4, 6$). The space of triangle products that commute with all stabilizers has dimension $6L - 3$ modulo stabilizers, and every nonzero element of this space is supported on exactly two of the three sheets.

Verified at $L = 4$ (21 cross-sheet logicals out of $6L = 24$) and $L = 6$ (33 out of 36). The missing 3 logicals at each $L$ are global homological cycles that no finite triangle product can form; these are accessible via standard ancilla-logical routing techniques [2].

4.3 ZZ-merge primitive (toric)

The cross-sheet joint $ZZ$-measurement is implemented as follows. Place an ancilla qubit at each triangle on a ribbon connecting two cross-sheet logicals. In the ribbon, each ancilla couples (via CNOT) to its three triangle edges, one per sheet, then is measured in the $Z$ basis. The XOR of the measurement outcomes along the ribbon yields the joint $\bar{Z} \otimes \bar{Z}$ eigenvalue.

Proposition 3 ($K = 4$ ancilla connectivity). The ZZ-merge ribbon protocol can be scheduled so that each ancilla qubit couples to at most 4 neighbors during the merge: three triangle edges plus the syndrome extraction circuit's ancilla-ancilla coupling (if used).

The proof is short: each triangle ancilla couples to the three edges of its triangle (one data atom per sheet) and to the syndrome-extraction ancilla-ancilla coupling, if present, in the standard 4-qubit syndrome circuit [ ]. Both contributions are bounded by the triangle's local geometry; the ribbon's triangles are disjoint by construction, so each ancilla's neighborhood is the same size whether the ribbon contains one triangle or many. This is the same $K = 4$ as the underlying surface code's stabilizer extraction.

4.4 XX-merge (CSS dual)

By the CSS duality of the sheet code (vertex $Z$-stabilizers exchanged with octahedral $X$-stabilizers, modulo a lattice-dual relabeling), the same triangle-ribbon protocol with all roles dualized implements a cross-sheet joint $\bar{X} \otimes \bar{X}$ measurement. Threshold and overhead are equivalent (Section 5).

4.5 Planar boundary-aware surgery

For the planar variant (Section 2), the ribbon must terminate at the rough or smooth boundaries of

the aﬀected layers rather than wrapping around. We use a boundary-aware ribbon that anchors

to the appropriate rough/smooth boundary edges of each layer of each sheet. At $L = 4, 6, 8$ this preserves the per-sheet distance $L - 1$; the 0.76(5)% figure previously quoted for the planar

ZZ-merge rests on a surface-code memory proxy and has not yet been rebuilt on a boundary-aware

real circuit (Section 5).

4.6 Role in the architecture

The surgery primitives are not the architecture’s primary inter-sheet entangling mechanism – that

role belongs to the transversal CNOT of Section 3. Their architectural role is twofold:

Joint Pauli measurements. Used for state preparation (e.g., logical Bell-pair preparation

between sheets, ancilla-state preparation for the magic-state injection interface) and for any

algorithmic step that calls for a non-destructive joint Pauli measurement.

Magic-state injection. The factory-to-memory interface (Section 6) uses gate teleportation,

which fundamentally rests on joint Pauli measurements. The ZZ-merge primitive is the natural

way to perform these on the FCC sheet code side.

For bulk Cliﬀord circuits, the transversal CNOT layer is preferred for its lower depth, no gauge-bit

overhead, and trivial

each

scaling. The surgery primitives are reserved for when a joint measurement

speciﬁcally is what the algorithm requires.

5 Threshold Simulations

We summarize the threshold results across all primitives needed by the architecture. Stim [ ] is used for all circuit construction; PyMatching's MWPM decoder [ ] is used for decoding, except for the cross-sheet surgery circuits, which are decoded with BP+OSD. All sweeps use circuit-level depolarizing noise with one-qubit and two-qubit error rate $p$, including idle,

gate, reset, and measurement noise. Sample sizes are several

shots per point for

= 4

(typically 3–8

; exact per-point counts accompany the shipped data) and of order 10

for

= 8

corroboration sweeps.

5.1 Memory thresholds

Operating each sheet's stabilizer extraction without any inter-sheet operation yields the baseline memory threshold. By Theorem 2, each sheet decomposes into $L$ independent 2D toric (or surface) codes, so the memory threshold should be inherited directly from the 2D code [ ], modulo finite-size effects. We verify this on the full sheet code rather than on a single-layer stand-in. Building the complete weight-4 stabilizer-extraction circuit for one sheet directly from the lattice stabilizers, with a direction-ordered schedule and circuit-level depolarizing noise, the decoding graph separates into exactly $L$ connected components at $L = 4, 6, 8$, one per layer. The layer decomposition therefore survives a concrete syndrome-extraction schedule and circuit-level noise, not merely the stabilizer algebra; this is the dynamical counterpart of the static check in Theorem 2. The resulting threshold, measured over all $2L$ logicals of the sheet, is $p_{\mathrm{mem,toric}} \approx 1\%$ (Figure 2a), consistent with the standard 2D toric threshold under the same noise model. The planar variant is slightly lower, $p_{\mathrm{mem,planar}} \approx 0.9\%$, owing to boundary effects at small $L$.

5.2 Cross-sheet surgery thresholds

Table 1 lists the circuit-level depolarizing thresholds for the ZZ-merge and XX-merge primitives.

Primitive    Variant                     Threshold             FSS-extrapolated
ZZ-merge     toric                       ≈ 1.2%                d = L at L = 4, 6, 8
ZZ-merge     planar (boundary-aware)     0.76% (proxy)         n/a
XX-merge     toric                       ≈ 1.2% (CSS dual)     n/a

Table 1: Circuit-level depolarizing thresholds for the cross-sheet surgery primitives. The toric ZZ-merge is the real triangle-ribbon circuit (fcc_surgery_circuit.py; its joint-parity circuit distance is verified to equal $L$ at $L = 4, 6, 8$ by Stim's minimum-weight graphlike-error search), decoded with BP+OSD because the weight-3 triangle operators produce hyperedge (three-symptom) errors that MWPM cannot decompose. The planar entry is the earlier surface-code memory proxy and has not been rebuilt on a boundary-aware real circuit; the XX-merge follows by CSS duality and was not separately simulated. Sample sizes are 8000/5000/3000 shots per point at $L = 4/6/8$. Full sweep data and scripts are in the public code repository (see Code and Data Availability).

On the real triangle-ribbon circuit, decoded with BP+OSD, the toric ZZ-merge has a threshold $p_{\mathrm{th}} \approx 1.2\%$ (Table 1). The joint-parity logical error decreases with $L$ at every point from $p = 0.6\%$ to 1 (the ordering is $L{=}8 < L{=}6 < L{=}4$ throughout), and the $L = 6$ and $L = 8$ curves meet at $p \approx$ (8000/5000/3000 shots at $L = 4/6/8$). This is consistent with the ≈ 1% memory threshold, indicating that the merge does not degrade the code's error tolerance. An earlier harness estimated this primitive with a surface-code memory proxy, which returned a compatible 1.07%; the value quoted here instead rests on the actual merge circuit. The planar variant's 0.76% still rests on that proxy and would need its own boundary-aware real circuit to confirm.

5.3 Transversal-CNOT thresholds

For the transversal-CNOT layer between two sheets we run joint two-sheet experiments on the full sheet code at $L = 4, 6, 8$, with control sheet $T$ and target sheet $R(T)$ paired by the actual rotation permutation $\pi$ of Computational Result 1. Each shot consists of a block of stabilizer-extraction rounds on both sheets, one transversal-CNOT layer $U_{T \to R(T)}$, and a further block of rounds, with circuit-level depolarizing noise throughout; we measure the per-sheet logical error over all $2L$ logicals of each sheet. The circuits, the $\pi$ pairing, and both decoders are constructed directly from the lattice stabilizers (see Code and Data Availability), so the thresholds quoted here are for the code itself, not a per-layer reduction. As an independent, decoder-free check, Stim's minimum-weight graphlike-error search confirms that both the memory and the transversal-CNOT circuits have circuit distance exactly $L$ at $L = 4, 6, 8$; the transversal layer preserves the full code distance, so any threshold degradation observed under a given decoder is a property of that decoder rather than of the circuit.

[Figure 2 panels: logical error rate versus physical error rate p (%), with curves for L = 4, 6, 8. (a) Memory, threshold marked at 1.1%. (b) Transversal CNOT, naive MWPM: error worsens with L. (c) Transversal CNOT, feed-forward decoding: error improves with L, with the memory (L = 6) curve shown for reference.]

Figure 2: Threshold of the transversal CNOT on the full FCC sheet code, L ∈ {4, 6, 8}, circuit-level depolarizing noise. (a) Baseline memory threshold per sheet, ≈ 1.1%; the decoding graph separates into L independent layers (Section 5). (b) Target-sheet logical error after the transversal CNOT under naive per-sheet MWPM: the curves worsen with L, i.e. the apparent threshold has collapsed. (c) The same experiment decoded with the feed-forward decoder of the text: the target-sheet error now decreases with L and tracks the memory reference (dotted), recovering a threshold near the memory value. Error bars are binomial (1σ); L = 8 points in (b,c) use fewer shots and are corroborating.

Naive decoding collapses on the target sheet. Decoding the two sheets with independent MWPM produces a sharply asymmetric result (Figure 2b). The control-sheet logical error is unchanged from memory and improves with L: the conjugation X ↦ X ⊗ X sends control errors onto the target, so the control logical only ever sees its own errors and decodes at the memory threshold. The target-sheet logical error, by contrast, worsens with L across the sweep, with an apparent threshold near 0.3%. The cause is the transversal CNOT itself: it copies the control sheet's residual X-field onto the paired target qubits, so the target decoder is presented with its own errors superimposed on an inherited field it cannot account for. This is the standard error spread of a transversal CNOT (Computational Result 2, weight-2), and the resulting collapse is decoder-induced rather than a property of the code; the control side establishes that the post-CNOT code distance is intact.

Feed-forward decoding restores the threshold. The inherited field is correctable once the control errors are known, which motivates a feed-forward (correlated) decoder. Decode the control sheet first; read off its inferred X-pattern at the instant of the CNOT; propagate that pattern through the qubit-aligned pairing π onto the target qubits; remove its syndrome footprint from the target detectors; and decode the target residual, folding the inherited contribution back into the

target logical. This is the matching-decoder analogue of correlated decoding for transversal gates [

We verify the bookkeeping exactly in a controlled run with the target syndrome-extraction noise

disabled: with only the inherited ﬁeld present, the predicted footprint reproduces the observed

target detectors in every shot (0 mismatches in 1500 shots at L = 4, 6).
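The footprint bookkeeping described above can be illustrated on a toy example. The sketch below is a minimal single-round caricature, not the paper's decoder: the check matrix `H_target`, the pairing `pi`, and both error patterns are hypothetical values chosen for illustration, and real detectors compare syndromes across rounds rather than reading one static syndrome. It shows only the linear-algebra step: once the inherited field is predicted from the control side, subtracting its syndrome leaves exactly the syndrome of the target's own errors.

```python
from typing import List

Bits = List[int]

def syndrome(H: List[Bits], err: Bits) -> Bits:
    """Parity-check syndrome H * err over GF(2)."""
    return [sum(h * e for h, e in zip(row, err)) % 2 for row in H]

def propagate(control_x: Bits, pi: List[int]) -> Bits:
    """Push the control X-pattern through the pairing: control qubit q -> target qubit pi[q]."""
    out = [0] * len(control_x)
    for q, bit in enumerate(control_x):
        out[pi[q]] ^= bit
    return out

def xor(a: Bits, b: Bits) -> Bits:
    return [x ^ y for x, y in zip(a, b)]

# Toy data (all hypothetical): 4 target qubits on a ring with 4 neighbour parity checks.
H_target = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
pi = [2, 0, 3, 1]                  # assumed pairing permutation
control_x_inferred = [1, 0, 0, 1]  # X-pattern inferred by the control decoder at the CNOT
target_own_x = [0, 1, 0, 0]        # the target's own X errors (unknown to the decoder)

inherited = propagate(control_x_inferred, pi)
observed = syndrome(H_target, xor(target_own_x, inherited))
residual = xor(observed, syndrome(H_target, inherited))

# After removing the footprint, the target decoder sees only its own errors.
assert residual == syndrome(H_target, target_own_x)
```

The identity holds by linearity of the syndrome map, which is why the controlled run with target noise disabled is a clean test of the pairing and footprint code paths.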

Under feed-forward decoding the target-sheet error decreases monotonically with L (Figure 2c) and tracks the memory reference; the feed-forward curves remain ordered in L throughout the swept range, placing the threshold above ≈ 0.8%. An idealized variant that removes the true control errors (extracted from the simulator's error frame) recovers the full ≈ 1.1% memory threshold: the target logical error then equals the control/memory error to within statistics at every point (for example 0.016 versus 0.013 at L = 6, p = 0.003), confirming that the residual gap between feed-forward and memory is decoder-estimate quality, not a code property. The numerical results support three claims.

• (R1) Threshold preservation under correlated decoding. With feed-forward decoding the transversal CNOT preserves the ≈ 1.1% memory threshold: both sheets improve with L, and the idealized decoder reproduces the memory threshold exactly. Under naive independent MWPM the target threshold collapses; threshold preservation is therefore a statement about the decoder, not the code.

• (R2) Bounded spread. A single fault spreads to weight at most two across the CNOT (Computational Result 2); the controlled-noise test confirms that the inherited field is exactly the π-image of the control X-pattern, with no additional growth.

• (R3) Distance scaling. At p = 0.004 (below threshold), the feed-forward target logical error scales 0.164 → 0.069 → 0.027 at L = 4, 6, 8, a suppression of a factor ≈ 2.5 per increment ΔL = 2; under naive decoding the same quantity moves the wrong way (0.299 → 0.361 → 0.481). The control sheet scales as memory throughout.
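As a quick arithmetic check on (R3), the per-step suppression can be recomputed from the quoted logical error rates. The snippet uses only the six numbers stated above; the helper name `step_ratios` is ours.

```python
# Target-sheet logical error at p = 0.004, as quoted in (R3).
feed_forward = {4: 0.164, 6: 0.069, 8: 0.027}
naive = {4: 0.299, 6: 0.361, 8: 0.481}

def step_ratios(errors):
    """Ratio error(L) / error(L + 2) for consecutive sizes; > 1 means suppression."""
    sizes = sorted(errors)
    return [errors[a] / errors[b] for a, b in zip(sizes, sizes[1:])]

ff_ratios = step_ratios(feed_forward)   # roughly 2.4 and 2.6
naive_ratios = step_ratios(naive)       # both below 1: error grows with L

print([round(r, 2) for r in ff_ratios], [round(r, 2) for r in naive_ratios])
```

Both feed-forward ratios sit close to 2.5, while both naive ratios are below one, matching the qualitative statement that naive decoding moves the wrong way.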

Sample-size note for L = 8. The L = 8 points use fewer shots than the L = 4, 6 sweeps and serve as a corroborating three-way scaling check; the qualitative conclusions (R1)–(R3) are established at L = 4, 6 and reinforced, not carried, by L = 8.

5.4 Summary of threshold landscape

Table 2 consolidates the thresholds across all primitives needed by the architecture.

Operation          | Threshold     | Notes
-------------------|---------------|------------------------------------------
Memory (toric)     | ≈ 1.1%        | Inherited from 2D toric code
Memory (planar)    | ≈ 0.9%        | Small-L boundary effects
ZZ-merge (toric)   | ≈ 1.2%        | FT, real ribbon circuit, BP+OSD, d = L
ZZ-merge (planar)  | 0.76% (proxy) | Boundary-aware variant, not yet rebuilt
XX-merge (toric)   | ≈ 1.2%        | CSS dual of ZZ-merge
Transversal CNOT   | ≈ 1.1%        | Inherits memory threshold under feed-forward; naive MWPM collapses

Table 2: Threshold values for all primitives in the architecture. All thresholds are circuit-level

depolarizing. Memory is MWPM-decoded; the transversal CNOT uses the correlated feed-forward

decoder of Section 5 (naive per-sheet MWPM collapses on the target sheet); cross-sheet surgery uses

BP+OSD, since its weight-3 ribbon operators create hyperedges that MWPM cannot decompose.

The transversal-CNOT and toric ZZ-merge circuits are independently veriﬁed to have circuit distance

d = L; the within-sheet local Cliﬀords are inherited from the standard 2D toric/surface code.

The key architectural takeaway is that all primitives operate at thresholds near 1% under circuit-level depolarizing noise, which sits comfortably above the error rates of current atom-array hardware (gate errors at the $10^{-3}$ level, with continuing improvement). For the transversal CNOT this holds under correlated (feed-forward) decoding; naive per-sheet matching collapses on the target sheet, so the parallel-teleportation interface should be paired with the feed-forward decoder of Section 5.

6 The Hybrid Architecture: FCC Memory + Color-Code Factory

We now assemble the full architectural proposal. The structure follows the established memory +

factory division [

], but with the memory-factory interface elevated to a ﬁrst-class architectural

element rather than treated as an incidental connection between two zones.

6.1 The interface throughput problem

In any universal architecture built from a Clifford memory paired with a magic-state factory, every T-gate insertion in the algorithm requires a magic-state injection at the interface. Each injection reduces, via gate teleportation, to a logical CNOT between an ancilla logical (holding the magic state) and a target logical (in memory). When the algorithm calls for N parallel T-gates (common in unrolled phase-estimation circuits, parallel quantum chemistry primitives, and lattice-Hamiltonian simulation by Trotterization), the interface must execute N logical CNOTs in parallel.

Existing 3D-code architectures throttle at this interface. Surface-code interfaces execute one teleportation per lattice-surgery cycle (∼ d rounds), so N parallel injections take N · d rounds unless multiple memory-factory boundary regions are provisioned in parallel. 3D color-code memory architectures use code switching, which itself takes several rounds. Bivariate-bicycle code architectures use the Tour-de-Gross protocol [ ] with multi-block routing. In all cases, the interface throughput is O(1) logical CNOTs per physical-gate layer regardless of memory size.

6.2 Architectural overview

The architecture comprises three elements occupying distinct regions of the atom array:

Memory zone. The FCC sheet code on three triad sheets, at a chosen code distance d = L. This zone holds the active logical qubits during algorithm execution. Operations:

• Single-sheet local Clifford operations (memory cycles, single-logical Clifford gates, and single-logical Paulis on the 2L logicals per sheet) using standard 2D toric/surface-code primitives applied layer-wise.

• Within-sheet cross-patch surgery for joint Pauli measurements between logicals on the same sheet.

• Cross-sheet surgery via the triangle-ribbon primitive (Section 4) for arbitrary inter-sheet joint Pauli measurements.

Factory zone. A bank of magic-state distillation factories using small 3D color-code blocks.

Operations:

• Encoded |T ⟩ state preparation via transversal T on [[15, 1, 3]] tetrahedral color codes.

• 15-to-1 magic-state distillation [13]; higher-order distillation as needed [14].

• Parallel operation: $N_{\rm fac}$ factory blocks produce $N_{\rm fac}$ magic states per distillation cycle.

Parallel-teleportation interface. This is the architecturally new element. Magic states are injected into the memory zone via 2L-parallel gate teleportation in a single physical-gate layer, implemented by the FCC threefold-rotation transversal CNOT of Section 3.

The interface uses the standard T-gate teleportation gadget [ ], executed in parallel across 2L ancilla-target pairs. For one magic state into one target, the gadget is:

1. Prepare an encoded $|T\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ in an ancilla logical (color-code block, exiting the factory).

2. CNOT from target to ancilla: $|\psi\rangle_{\rm tgt} \otimes |T\rangle_{\rm anc} \xrightarrow{\;{\rm tgt}\to{\rm anc}\;} \cdots$.

3. Measure the ancilla in the Z (computational) basis.

4. Apply a conditional S correction on the target depending on the measurement outcome.

The parallel version simply executes 2L copies of this gadget concurrently. The transversal CNOT layer of Section 3 performs step (2) for all 2L pairs in a single physical-gate layer. Steps (1), (3), and (4) parallelize trivially: they are within-block operations on the ancilla side (preparation, single-qubit measurement) or single-logical Clifford (S) corrections on the target side.
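The single-pair gadget can be checked on unencoded qubits with a few lines of state-vector arithmetic. The sketch below is illustrative only and makes one explicit assumption: the ancilla is read out in the computational (Z) basis, which is the readout for which a CNOT controlled on the target yields T|ψ⟩ on outcome 0 and T|ψ⟩ after an S correction on outcome 1. The function names are ours, and no encoding or noise is modelled.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)   # e^{i pi/4}, the phase applied by T

def teleport_t(alpha: complex, beta: complex, outcome: int):
    """Run the gadget on |psi> = alpha|0> + beta|1>, post-selecting the ancilla outcome.

    Returns the unnormalized target state after the conditional S correction.
    """
    r = 1 / math.sqrt(2)
    # Steps 1-2: |psi>_tgt (x) |T>_anc, stored as amp[t][a] for target bit t, ancilla bit a.
    amp = [[alpha * r, alpha * r * OMEGA],
           [beta * r, beta * r * OMEGA]]
    amp[1][0], amp[1][1] = amp[1][1], amp[1][0]   # CNOT flips the ancilla when t = 1
    # Step 3: computational-basis readout of the ancilla (assumption stated above).
    out = [amp[0][outcome], amp[1][outcome]]
    # Step 4: conditional S = diag(1, i) when the outcome is 1.
    if outcome == 1:
        out[1] *= 1j
    return out

def same_up_to_scalar(u, v, tol=1e-12):
    """True if two 2-component states are proportional."""
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

alpha, beta = 0.6, 0.8j
expected = [alpha, beta * OMEGA]      # T|psi>
ok = all(same_up_to_scalar(teleport_t(alpha, beta, m), expected) for m in (0, 1))
```

Both measurement branches return a state proportional to T|ψ⟩, which is the property the 2L-parallel version relies on for each ancilla-target pair.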

The full interface operation is:

1. Up to 2L factory blocks each prepare an encoded |T⟩ in parallel; tweezer rearrangement places them in the ancilla positions of one FCC memory sheet.

2. One transversal-CNOT layer (paired by π) executes step (2) of the gadget for all 2L pairs in one physical-gate layer.

3. 2L ancilla logical measurements in parallel (one logical measurement per ancilla, in the basis of step (3) of the gadget, implemented by transversal single-qubit measurements in the same basis on the data atoms of the ancilla logical).

4. Per-target conditional S corrections complete the 2L gate teleportations.

The interface throughput is 2L logical CNOTs per physical-gate layer, linear in code distance, compared to the O(1) throughput of surgery-based interfaces. At L = 4 this is 8 parallel injections per layer; at L = 8, 16 parallel injections.
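To make the comparison concrete, the following sketch counts interface time for a batch of injections under the two models described in the text. It assumes a single surgery channel on the surface-code side (the text notes that provisioning several boundary regions changes this) and reports different units for the two interfaces, syndrome rounds versus physical-gate layers, so the numbers are indicative rather than directly comparable. Function names are ours.

```python
import math

def surgery_rounds(n_injections: int, d: int) -> int:
    """Sequential lattice-surgery interface: one teleportation per ~d-round merge."""
    return n_injections * d

def transversal_layers(n_injections: int, L: int) -> int:
    """Parallel-teleportation interface: up to 2L injections per transversal-CNOT layer."""
    return math.ceil(n_injections / (2 * L))

for L in (4, 8):
    n = 2 * L  # one full batch of injections
    print(f"L={L}: {n} injections -> {surgery_rounds(n, L)} surgery rounds "
          f"vs {transversal_layers(n, L)} transversal layer(s)")
```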

We note that the surgery-based ZZ-merge primitive of Section 4 provides an alternative route to the same gate teleportation: instead of a transversal CNOT in step (2), one performs a joint $Z_{\rm tgt} \otimes Z_{\rm anc}$ measurement via the triangle ribbon. This implements the same teleportation but at O(1) throughput (one teleportation per ribbon merge cycle). The parallel-teleportation interface is therefore strictly more parallel than the surgery alternative, with both available in the same architecture.

6.3 When the parallel-teleportation primitive applies and when it does not

The 2L-parallel transversal CNOT is best understood as a structured-parallel CNOT primitive, not a general-purpose Clifford gate. Its 2L logical CNOT pairings are fixed by the FCC rotation R (the symplectic permutation determined by the lattice, Computational Result 1); the user cannot choose which pairs of logicals are involved.

For magic-state injection specifically, this is not a constraint. Magic states are prepared states whose identity is determined at the moment of preparation, so the compiler can prepare each magic state in whichever ancilla slot maps to the desired target under the rotation-induced pairing. The pairing constraint becomes a placement constraint, which is satisfiable for any subset of target logicals up to the 2L-per-layer limit.
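The placement argument amounts to inverting the pairing. The sketch below assumes the pairing is available as a list mapping each ancilla slot to the target logical it hits; the specific permutation shown is hypothetical (the real one is fixed by the lattice), and `ancilla_slots_for` is our name for the helper.

```python
def ancilla_slots_for(targets, pairing):
    """Return, for each desired target logical, the ancilla slot that is paired with it.

    pairing[a] is the target logical acted on by the CNOT from ancilla slot a.
    """
    inverse = {t: a for a, t in enumerate(pairing)}
    return {t: inverse[t] for t in targets}

# Hypothetical pairing for L = 4 (2L = 8 slots); not the paper's actual permutation.
pairing = [3, 0, 5, 1, 7, 2, 6, 4]
slots = ancilla_slots_for([5, 0, 4], pairing)
print(slots)   # which slot to load with |T> for each requested target
```

Because the pairing is a permutation, every subset of targets has a unique slot assignment, which is why the constraint reduces to placement.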

For arbitrary single-CNOT Clifford patterns, by contrast, the transversal CNOT is over-provisioned: it applies CNOTs to all 2L paired logicals, not just the one or two the algorithm wants. These extra CNOTs either need to be absorbed (applied to logicals held in a state on which the CNOT acts trivially, so they have no effect) or undone by subsequent operations. For general-purpose Clifford circuits with arbitrary CNOT patterns, the surgery primitive (Section 4) is the appropriate tool.

The cleanest statement of the architecture's claim is therefore:

The FCC sheet code provides the highest published parallel-teleportation throughput among 3D codes suitable for atom-array hardware, at 2L logical CNOTs per physical-gate layer. The throughput is realized at the magic-state injection interface, where structured parallel CNOTs are exactly the required operation. For general-purpose Clifford gates between arbitrary logical pairs, the architecture uses cross-sheet surgery (footprint 3L/2 atoms), still competitive with surface-code surgery but without the L-scaling advantage of the parallel-teleportation case.

6.4 What each layer of the architecture contributes

Before comparing against competing codes, we make explicit what each architectural ingredient

adds. Table 3 reads top to bottom as a buildup: the minimal storage-only conﬁguration through to

the full universal-computation architecture.

Two readings of the table are useful. Read row by row, it answers the question “what does

each architectural ingredient buy you?” Read by comparison, it answers the question “at which

row does the FCC structure start to matter beyond just being a packing of three independent toric

codes?” The answer to the second question is row (d): everything above row (d) is achievable by

three independent stacked toric codes on disjoint atom regions. Rows (d) and (e) are where the

FCC structure earns its place in the architecture, by supplying cross-sheet operations that do not

exist for independent stacks. Row (f) brings in an external factory, which is needed for universality

regardless of the memory code choice.

6.5 Cross-architecture survey

Table 4 situates the FCC sheet code architecture in the landscape of 3D and atom-array-suitable

QEC codes. We compare on metrics that are decisive for atom-array deployment at near-term scale

(d ≤ 8, ∼ 10

–10

atoms).

The table establishes three claims. First, in the surgery-only configuration (no transversal CNOT), the FCC sheet code is competitive with the surface code on threshold and rate, and the triangle-ribbon merge footprint (3L/2 atoms) is smaller than the surface-code routing-channel equivalent (2L atoms) by a factor of approximately 4/3. Second, the transversal-CNOT layer (Section 3) supplies an interface primitive with L-scaling throughput, the only L-scaling cross-block entangling primitive in the survey; it inherits the memory threshold under feed-forward decoding (Section 5). Third, the full architecture pairs the FCC memory layer with the 3D color-code factory, which is the only mature route to transversal T.

We note explicitly what the table does not show. At higher distance (

d ≥

12), bivariate-bicycle

codes pull ahead on rate, and remain the more eﬃcient bulk-memory choice once the system grows

beyond the near-term atom-array regime. The FCC architecture’s advantage is speciﬁcally for the

d ≤ 8 regime where atom-array systems will operate over the next several years.

Configuration | Logical qubits (L = 4) | Operations available | Gate set generated | Universal? | Section
--------------|------------------------|----------------------|--------------------|------------|--------
(a) One sheet, memory only | 8 | memory cycles | none beyond Pauli frame | no | §2
(b) One sheet + within-sheet local Cliffords | 8 | local Clifford gates, single-logical Pauli | single-sheet Cliffords only (no entangling) | no | §2
(c) Three sheets + within-sheet Cliffords (no cross-sheet ops) | 24 | three independent copies of (b) | three disjoint Clifford groups, no entangling between sheets | no | §2
(d) (c) + cross-sheet ZZ/XX-merge surgery | 24 | + inter-sheet joint Pauli measurements | full Clifford group on 24 logicals via surgery | no (no T) | §4
(e) (d) + transversal CNOT via R | 24 | + 2L-parallel cross-sheet CNOT in one layer | same Clifford group, much lower depth | no (no T) | §3
(f) (e) + 3D color-code factory | 24 (memory) | T-state injection via gate teleportation | full Clifford + T | yes | §6

Table 3: Capability ladder for the FCC sheet code architecture, with logical-qubit counts at L = 4. Each row adds one architectural ingredient. Rows (a)–(c) are inherited directly from the layer decomposition (Theorem 2) and do not exploit the FCC structure beyond geometric packing. Rows (d) and (e) are the FCC-specific contributions: cross-sheet triangle surgery and the threefold-rotation transversal CNOT. Row (f) completes universal computation by adding the magic-state factory. At each row, the new operations either expand the available gate set or reduce the depth of operations already available; the architecture's value comes from the cumulative stack.

6.6 Why this division of labor is more than “two codes on one chip”

Several design choices distinguish this architecture from a trivial composition of two codes. First, both codes are 3D, so they share the same hardware requirements (Rydberg connectivity, tweezer rearrangement) and can occupy the same atom-array footprint at different times via tweezer reconfiguration. Second, the parallel-teleportation interface (Section 6.2) is implemented by an FCC-native primitive, the transversal CNOT via R, rather than by general-purpose surgery or code switching. This is what converts the interface from an O(1)-throughput bottleneck into an O(L)-throughput pipeline. Third, both codes are MWPM- or BP-OSD-decodable in well-understood circuits, so the architecture does not require novel decoder development beyond the feed-forward step of Section 5.

6.7 Algorithmic throughput model

For an algorithm with a given total number of T-gates and per-layer T-density (the fraction of memory logicals receiving a T-gate per Clifford layer), the throughput is bottlenecked by the slower of the memory-side interface and the factory output. In our architecture:

Architecture | Rate k/n (data) at d = 4 | Threshold (circuit) | Decoder | Parallel teleport per layer | Transversal T? | Atom-array demo'd?
-------------|--------------------------|---------------------|---------|-----------------------------|----------------|-------------------
Rotated surface code (independent patches) | ≈ 6% per patch | ≈ 1% | MWPM | O(1) (per surgery channel) | no | yes (Bluvstein 2025)
Stacked 2D toric (3 independent sheets) | … per sheet | ≈ 1% | MWPM, per layer | O(1) (per routing channel) | no | no (concept)
Bivariate-bicycle, gross [[144,12,12]] | 12/144 | ≈ 0.7–0.8% | BP-OSD | O(1) (Tour de Gross) | no | no (theory)
3D color code, [[15,1,3]] copies | 1/15 | ≈ … (code capacity) | restriction / BP-OSD | O(1) (code switching) | yes | yes (Bluvstein 2025, as factory)
3D toric | (fixed k = 3) | ≈ 1% | MWPM | O(1) (surgery) | no | no
FCC sheet, surgery-only (rows c+d) | $2/L^2$ per sheet | ≈ 1.2% | BP+OSD | O(1) (triangle ribbon) | no | no (proposed)
FCC sheet, + transversal CNOT | same | same | MWPM + ff | 2L (transversal layer) | no | no (proposed)
FCC + color-code factory (full architecture) | memory: same; factory: … | memory: 1%; factory: 1.5% | MWPM + ff + BP-OSD | 2L | yes (factory) | no (proposed)

Table 4: Cross-architecture survey at d = 4 (where applicable). Top block: competing codes/architectures. Bottom block: this paper's architecture at three levels of inclusion. The "parallel teleport per layer" column counts the number of logical CNOTs at the memory-factory interface that can execute in a single physical-gate layer, the bottleneck operation for parallel magic-state injection. All competing architectures yield O(1) per layer because their cross-block CNOTs are sequential (surgery, code switching, or constant-overhead routing). The FCC sheet code with the transversal-CNOT layer yields 2L parallel injections per layer, the only L-scaling primitive in the table. At L = 4 this is 8× the interface throughput of the surface-code baseline; at L = 8, 16×. Decoder abbreviation: ff = feed-forward (correlated) decoder of Section 5, required for the transversal CNOT since naive per-sheet matching collapses on the target sheet.

• Memory-side interface throughput. 2L parallel teleportations per physical-gate layer (the transversal CNOT primitive of Section 3). At L = 4, this is 8 injections per layer; at L = 8, 16 injections per layer. This is the FCC architecture's signature contribution.

• Factory output throughput. Each color-code factory block produces approximately one |T⟩ per ∼ $d_{\rm factory}$ syndrome cycles. With $N_{\rm fac}$ parallel factory blocks, total throughput is $N_{\rm fac}/d_{\rm factory}$ states per syndrome cycle. For early-fault-tolerant atom-array systems, small factories at $d_{\rm factory}$ = 3–5 allow $N_{\rm fac}$ ∼ 4–8 parallel blocks [12].

The two throughputs become matched when $N_{\rm fac}/d_{\rm factory} \approx 2L$ states per syndrome cycle (since one transversal-CNOT layer takes one physical-gate layer, which is a fraction of a syndrome cycle). For L = 4 and $d_{\rm factory} = 3$, the matched factory bank has $N_{\rm fac} \approx 24$ blocks, occupying ∼ 24 × 30 = 720 atoms. Combined with the 384-atom memory zone at L = 4, the matched-throughput system fits within ∼ 1100 atoms, comparable to the Atom Computing 1180-atom system or near-term scale-ups of QuEra Gemini.
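The matched-throughput sizing can be reproduced with a short calculation. This sketch takes the matching condition $N_{\rm fac}/d_{\rm factory} = 2L$ at face value, uses the ∼ 30 atoms per factory block quoted in the text, and assumes the memory zone holds $6L^3$ atoms, a formula inferred here from the quoted counts (384, 1296, 3072 at L = 4, 6, 8). Function names are ours.

```python
def matched_factory_blocks(L: int, d_factory: int) -> int:
    """Factory blocks needed so that N_fac / d_factory equals 2L states per syndrome cycle."""
    return 2 * L * d_factory

def system_atoms(L: int, d_factory: int, atoms_per_block: int = 30) -> int:
    """Memory zone plus matched factory bank, in atoms (first-order estimate)."""
    memory = 6 * L**3   # assumed data + ancilla count for the three-sheet memory zone
    return memory + matched_factory_blocks(L, d_factory) * atoms_per_block

blocks = matched_factory_blocks(4, 3)
total = system_atoms(4, 3)
print(blocks, blocks * 30, total)
```

For L = 4 and a distance-3 factory this gives 24 blocks, 720 factory atoms, and 1104 atoms in total, in line with the ∼ 1100 figure above.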

For algorithms with lower T-density, the factory bank can be smaller and the interface is over-provisioned (the FCC architecture pays no penalty for unused interface capacity). For algorithms with higher T-density approaching ∼ 1 (e.g., dense phase-estimation circuits), the FCC architecture's O(L) interface throughput is the only option in the survey of Table 4 that avoids serializing magic-state injection; competing architectures with O(1) interfaces become the bottleneck regardless of factory provisioning.

7 Resource Estimates vs. Surface Code + 15-to-1

We compare the proposed architecture against the standard surface-code-plus-15-to-1 baseline at

matched logical-qubit count k and matched distance d.

7.1 Atom-count comparison

Counting both data and ancilla atoms, the resource demand for k logical qubits at distance d is:

• Surface code baseline. Each logical occupies an independent rotated surface-code patch of $2d^2 - 1 \approx 2d^2$ atoms (data + ancilla). For k logicals: $k \cdot 2d^2$ atoms.

• FCC sheet code (three sheets, toric). At d = L, the three-sheet system encodes k = 6L logicals in ≈ $6L^3$ atoms (data + ancilla). For arbitrary k: ⌈k/(6L)⌉ FCC blocks, each of ≈ $6L^3$ atoms.

Table 5 compares the two at matched distance and logical count.

d k Surface (atoms) FCC (atoms) Surface/FCC ratio Notes

4 24 24 · 32 = 768 ≈ 384 2.0× 1 FCC block

4 48 48 · 32 = 1536 ≈ 768 2.0× 2 FCC blocks

6 36 36 · 72 = 2592 ≈ 1296 2.0× 1 FCC block

8 48 48 · 128 = 6144 ≈ 3072 2.0× 1 FCC block

Table 5: Memory-zone atom counts (data + ancilla) for the surface-code baseline versus the FCC sheet code. Both use the rotated variant: surface code at ∼ $2d^2$ atoms per logical (data + ancilla), FCC at ∼ $6L^3$ atoms total for 6L logicals (data + ancilla, three sheets). The FCC sheet code consistently uses approximately half the atoms per logical. Counts here include ancillas; the data-only counts are half of these values (e.g., FCC at L = 4 has 192 data atoms in 384 total).
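The entries of Table 5 follow from two closed-form counts, which the sketch below recomputes. It uses the approximate per-patch count $2d^2$ and per-block count $6L^3$ from the caption; the function names are ours.

```python
import math

def surface_atoms(k: int, d: int) -> int:
    """Rotated surface code: about 2 d^2 atoms (data + ancilla) per logical patch."""
    return k * 2 * d * d

def fcc_atoms(k: int, L: int) -> int:
    """FCC sheet code: blocks of 6L logicals, about 6 L^3 atoms (data + ancilla) each."""
    blocks = math.ceil(k / (6 * L))
    return blocks * 6 * L**3

for d, k in [(4, 24), (4, 48), (6, 36), (8, 48)]:
    s, f = surface_atoms(k, d), fcc_atoms(k, d)
    print(f"d={d} k={k}: surface={s} fcc={f} ratio={s / f:.1f}x")
```

Note that the ceiling in `fcc_atoms` means the 2× ratio holds exactly only when k is a multiple of 6L; for other k the last FCC block is partly unused.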

The ≈ 2× memory-zone saving at all distances reflects the underlying 2 : 1 rate advantage of the 2D toric code over the rotated surface code, which the FCC sheet code inherits via its per-layer decomposition. At the small distances relevant to near-term atom-array experiments (d ≤ 6), this saving represents a meaningful reduction in atom-count requirements.

7.2 Cliﬀord-gate cost

Table 6 compares the depth cost of logical Cliﬀord-gate primitives.

For algorithms with high Clifford density, the surface code spends ∼ d rounds per non-adjacent CNOT (lattice surgery) or requires careful patch layout for transversal CNOTs to be feasible. The FCC sheet code's transversal CNOT layer executes a full block of 2L inter-sheet CNOTs in one physical-gate layer, with no merge window. In algorithms whose CNOTs are not pre-arranged for adjacency (most quantum simulation circuits, most QFT-like structures), this is a meaningful constant-factor speedup.

Operation | Surface code baseline | FCC sheet code
----------|-----------------------|---------------
Logical CNOT between two adjacent patches | 1 transversal layer | n/a (operation is between two surface patches)
Logical CNOT, cross-sheet | n/a | 1 transversal layer (via rotation)
Logical CNOT, non-adjacent patches | surgery: d-round merge | n/a
2L-parallel CNOT layer | n/a | 1 transversal layer (rotation), 2L logical CNOTs

Table 6: Comparison of logical Clifford-gate primitives. The transversal CNOT via FCC rotation gives 2L logical CNOTs in one physical layer, versus the surface code's adjacency requirement for transversal CNOTs or its d-round merge for non-adjacent patches.

7.3 Factory footprint

In both architectures, the T-state factory dominates atom count for algorithms with non-trivial T-gate counts. We assume both architectures use a 15-to-1 distillation routine. The factory codes differ:

• Surface code baseline. Magic-state distillation on the surface code requires a number of atoms per distillation block that grows as $d_{\rm factory}^2$ at factory distance $d_{\rm factory}$.

• FCC + color code. The factory uses small color-code blocks at [[15, 1, 3]], requiring 15 atoms per block plus ancillas, for a total of ∼ 30 atoms per distillation primitive. At higher factory distances, cascaded distillation in color codes grows as $d_{\rm factory}^3$ atoms (the color code's 3D scaling), but the constant factor is small at low $d_{\rm factory}$.

For low-distance factories ($d_{\rm factory} \le 5$), the color-code factory is substantially smaller than the surface-code equivalent because the transversal T eliminates the need for state-injection blocks. For high-distance factories, the surface-code baseline catches up in atom count but loses on factory depth (the surface code needs more rounds per distillation cycle than the color code does).

7.4 Merge-Footprint Comparison: Triangle Ribbon vs Surface-Code Surgery

For algorithms that involve frequent cross-block joint Pauli measurements (state preparation, magic-

state injection, multi-logical entangling operations), the spatial footprint of a single merge operation

determines how many parallel merges can be running on a ﬁxed atom budget. We compare the

triangle-ribbon cross-sheet ZZ-merge against the surface-code alternatives at matched code distance.

Triangle ribbon: verified construction. We construct an explicit length-L triangle ribbon for the cross-sheet Z ⊗ Z measurement at L = 4, 6, 8 as follows. Take the first source logical to be the length-L Z-logical on one source sheet, in its layer 0, and the second source logical to be the length-L Z-logical on the other source sheet, in its layer 0. The ribbon is the sequence of L triangles that each have one edge on the first logical and one edge on the second. Whether L triangles is the global minimum across all valid ribbons connecting the same logical pair is an open question; our construction establishes feasibility and the structural counts shown in Table 7, verified by direct computation (reproducible code in the bundle).

Resource accounting. For the FCC triangle ribbon, the atoms reserved during the merge (beyond

the source/target logicals themselves) are:

L | Triangles in ribbon | New ancillas | First source sheet atoms | Second source sheet atoms | Third sheet atoms
--|---------------------|--------------|--------------------------|---------------------------|------------------
4 | 4 | 4 | 4 | 4 | 2
6 | 6 | 6 | 6 | 6 | 3
8 | 8 | 8 | 8 | 8 | 4

Table 7: Triangle ribbon for cross-sheet ZZ-merge, verified at L = 4, 6, 8. The ribbon has length L triangles, contributing L new ancillas. The data atoms touched in the two source sheets are precisely the L atoms of each source logical (not overhead). The third sheet is touched at L/2 distinct atoms; each touched atom appears twice in the ribbon (the two yz-edge contributions cancel pairwise mod 2), giving zero net effect on the third sheet's logical content but reserving those atoms during the merge measurement window.

• L new ancilla atoms, one per triangle on the ribbon.

• L/2 data atoms in the third sheet, borrowed for the measurement window but returned to the third sheet's code afterward (their yz-edge contributions cancel mod 2).

Total transient overhead: 3L/2 atoms.

For the surface-code alternatives operating between patches in diﬀerent planes of an atom array

(analogous to diﬀerent sheets):

• Routing channel approach. A length-L surface-code path connects the two patches. Channel

data atoms: L. Channel ancillas: L. Total transient overhead: 2L atoms.

• Face-to-face approach. Use atom rearrangement to bring the two patches into adjacency, then run standard merge. Seam ancillas: L. Total transient overhead: L atoms, plus rearrangement time (typically ∼ 100 µs per move, against ∼ 5 µs per syndrome cycle).

Comparison at matched L. Table 8 compares the atom counts reserved during a single cross-block ZZ-merge across the three protocols.

L FCC triangle ribbon Surface (routing channel) Surface (face-to-face)

4 6 atoms 8 atoms 4 atoms + rearrangement

6 9 atoms 12 atoms 6 atoms + rearrangement

8 12 atoms 16 atoms 8 atoms + rearrangement

Table 8: Atoms reserved during a single cross-block ZZ-merge, by code and protocol. The FCC triangle ribbon's 3L/2 overhead beats the surface-code routing-channel approach by a factor of approximately 4/3 at all L, and is competitive in atom count with the surface-code face-to-face approach while not requiring atom rearrangement.
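The three columns of Table 8 come directly from the per-protocol counts given in the resource accounting. The sketch below recomputes them for even L (the only sizes considered here); the dictionary keys and function name are ours.

```python
from fractions import Fraction

def merge_overhead(L: int) -> dict:
    """Atoms reserved during one cross-block ZZ-merge, per protocol (L assumed even)."""
    return {
        "fcc_triangle_ribbon": L + L // 2,   # L new ancillas + L/2 borrowed third-sheet atoms
        "surface_routing_channel": 2 * L,    # L channel data atoms + L channel ancillas
        "surface_face_to_face": L,           # L seam ancillas (rearrangement time not counted)
    }

for L in (4, 6, 8):
    o = merge_overhead(L)
    ratio = Fraction(o["surface_routing_channel"], o["fcc_triangle_ribbon"])
    print(L, o, "routing/ribbon =", ratio)
```

The routing-channel to ribbon ratio is exactly 4/3 at every even L, while the face-to-face count is lower in atoms but carries the rearrangement time noted in the text.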

Qualitative advantages beyond atom count. Three additional advantages of the triangle ribbon are worth noting:

1. No rearrangement required. The ribbon operates in place, with source and target logicals remaining in their canonical lattice positions. Surface-code face-to-face surgery requires moving one of the patches into adjacency, costing ∼ L parallel tweezer moves at ∼ 100 µs each. In wall-clock terms, the ribbon merge is faster by roughly a factor of ∼ 20 at L = 4.

2. Three-body Rydberg measurements as a hardware fit. Each triangle ancilla is placed at the geometric center of an FCC triangle, where it couples to all three triangle edges within Rydberg-blockade range. This is the natural multi-body measurement primitive of atom-array hardware [12] and exploits a capability surface-code surgery does not require.

3. Parallel merge throughput. Disjoint triangle ribbons can run simultaneously on the same lattice without conflicting. At L = 4, the lattice has $4L^3 = 256$ triangles, of which a ribbon uses L = 4; the remaining 252 triangles are available for other concurrent ribbons. Surface-code routing channels conflict whenever they share routing regions, limiting the number of simultaneous merges.

Implications for memory-only architectures. Even if one removes the transversal-CNOT

result of Section 3 from consideration and uses only the within-sheet local Cliﬀords plus triangle-based

cross-sheet surgery, the architecture retains a measurable advantage over independent surface-code

stacks for any computation involving cross-block joint Pauli measurements. This is the regime

relevant to magic-state injection from the color-code factory (Section 6), where the gate-teleportation

interface is itself a cross-block joint measurement. Thus the FCC structure contributes architectural

value even in a hypothetical surgery-only conﬁguration, distinct from its more dramatic contribution

via the transversal-CNOT layer.

7.5 Summary resource picture

For a representative scenario (24 logical qubits at distance 4, with a low-distance factory supplying |T⟩ states at the rate needed for moderate-T-count algorithms), the FCC + color-code architecture uses approximately:

• Memory: ∼ 384 atoms (vs. ∼ 768 for surface code).

• Factory: ∼ 100–200 atoms (vs. ∼ 500–1000 for surface code factory).

• Total: ∼ 500–600 atoms (vs. ∼ 1300–1800 for surface code).

This is within the current footprint of QuEra Gemini-2 (256 atoms) at moderate k, and well within Atom Computing's 1180-atom system at the full k = 24 scale.

We emphasize that these are ﬁrst-order estimates intended to set the scale; full circuit-level

resource estimates accounting for decoder latency, factory throughput limitations, and inter-zone

teleportation overhead are left to future work. The qualitative conclusion – that the proposed

architecture meaningfully reduces atom-count requirements relative to the surface-code baseline at

near-term scales – holds across the range of assumptions we examined, with a memory-zone saving

of roughly 2× at d ≤ 8.
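The totals in the resource list follow from adding the memory and factory figures. A minimal sketch of that arithmetic, using only the approximate atom counts quoted in this subsection (the dictionary layout and function name are illustrative, not part of the paper's bundle):

```python
# First-order atom counts quoted in Section 7.5 (24 logical qubits, distance 4).
# Each entry is a (low, high) range; all figures are the paper's approximate values.
FCC = {"memory": (384, 384), "factory": (100, 200)}
SURFACE = {"memory": (768, 768), "factory": (500, 1000)}

def total_range(arch):
    """Sum the low ends and the high ends of every zone in an architecture."""
    low = sum(lo for lo, _ in arch.values())
    high = sum(hi for _, hi in arch.values())
    return low, high

fcc_total = total_range(FCC)            # (484, 584), quoted as ~500-600
surface_total = total_range(SURFACE)    # (1268, 1768), quoted as ~1300-1800
memory_saving = SURFACE["memory"][0] / FCC["memory"][0]   # 2.0

print(fcc_total, surface_total, memory_saving)
```

The rounded ranges in the list match these sums, and the memory-zone ratio reproduces the roughly 2× saving stated above.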

8 Mapping to Reconﬁgurable Atom-Array Hardware

The architecture targets reconfigurable neutral-atom arrays specifically. We describe how each primitive maps to the operations available on QuEra Gemini-class [16], Atom Computing [17], and Pasqal [18] platforms.

8.1 Lattice layout

The three triad sheets of the FCC lattice are realized as three interpenetrating tweezer arrays. Each sheet's data qubits occupy a sub-array of L^3 atoms; the sub-arrays share the same FCC lattice positions but address different edges of the lattice (corresponding to the three triads). Ancilla atoms for stabilizer extraction are placed at lattice vertices (for Z-stabilizers) and at octahedral void centers (for X-stabilizers).

The full memory zone has footprint 3L^3 data atoms and ∼3L^3 ancilla atoms, totalling ∼6L^3 atoms. At L = 4: ∼384 atoms. At L = 6: ∼1296 atoms. At L = 8: ∼3072 atoms.
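The quoted footprints can be reproduced with a short calculation. The sketch below assumes L^3 data atoms per sheet and an equal number of ancilla atoms, an assumption chosen because it matches the 64 data + 64 ancilla single-sheet figure of Section 8.4 and the totals above; the function name is illustrative.

```python
def memory_zone_atoms(L, sheets=3):
    """Approximate atom count of the FCC memory zone at linear size L.

    Assumes L**3 data atoms per sheet and one ancilla per data atom,
    which reproduces the totals quoted in the text.
    """
    data = sheets * L**3
    ancilla = sheets * L**3
    return {"data": data, "ancilla": ancilla, "total": data + ancilla}

for L in (4, 6, 8):
    print(L, memory_zone_atoms(L))
# totals: 384 at L = 4, 1296 at L = 6, 3072 at L = 8

# A single sheet at L = 4 gives the 128-atom first milestone.
single_sheet = memory_zone_atoms(4, sheets=1)
```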

8.2 Native gate primitives

Stabilizer extraction. Within each sheet, the weight-4 vertex Z-stabilizers and octahedral-void X-stabilizers are extracted using the standard 4-qubit syndrome extraction circuit [

], executed in

parallel across all stabilizers of the sheet using global Rydberg pulses or per-pair tweezer addressing.

The Rydberg blockade radius is tuned to give nearest-neighbor coupling within the FCC sheet’s

K=4 connectivity; this is the same regime as current surface-code atom-array demos [12].

Transversal CNOT via tweezer rearrangement. The threefold-rotation transversal CNOT

(Section 3) is implemented in three steps:

1. Tweezer rearrangement brings each control-qubit atom in the control sheet adjacent to its rotation-paired target atom in the target sheet. This rearrangement is the FCC threefold rotation realized as a physical atom motion.
2. A single global Rydberg pulse implements the L^3 parallel physical CNOTs between paired atoms.
3. Tweezer rearrangement restores the atoms to their original FCC positions.
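The pairing used in step 1 comes from the threefold rotation (x, y, z) ↦ (y, z, x) stated in the abstract. As a rough illustration, the sketch below labels atoms by integer coordinates on an L × L × L periodic grid, which is a hypothetical labeling chosen for simplicity rather than the paper's edge indexing, and checks that the rotation gives a one-to-one, order-three pairing.

```python
from itertools import product

def rotate(site):
    """Threefold rotation (x, y, z) -> (y, z, x)."""
    x, y, z = site
    return (y, z, x)

def rotation_pairing(L):
    """Map every site of an L x L x L grid to its rotation partner.

    Hypothetical labeling: the control atom at `site` in one sheet is
    paired with the target atom at `rotate(site)` in the other sheet.
    """
    sites = product(range(L), repeat=3)
    return {s: rotate(s) for s in sites}

pairing = rotation_pairing(4)

# Each target is hit exactly once, so all CNOTs can fire in one layer.
is_bijection = len(set(pairing.values())) == len(pairing)

# Applying the rotation three times returns every site to itself.
is_order_three = all(rotate(rotate(rotate(s))) == s for s in pairing)

print(len(pairing), is_bijection, is_order_three)
```

Because the map is a bijection, no atom takes part in more than one CNOT, which is what allows the whole layer to run under a single global pulse.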

The rearrangement step is the dominant time cost. On QuEra hardware, atom rearrangement takes ∼100 µs for moves of a few lattice spacings, comparable to a single syndrome-extraction round [

The parallel CNOT pulse itself is < 1 µs.

Within-sheet surgery. Each sheet’s within-sheet surgery primitives (ZZ-merge and XX-merge

between logicals on the same sheet) are standard 2D toric/surface code lattice surgery, applied

layer-wise via the layer decomposition. No specialized atom-array primitive is needed beyond what

current surface-code demos already use.

Cross-sheet surgery. Cross-sheet ZZ-merge uses ancilla atoms placed at triangle centers, coupling

to three data atoms (one per sheet) within Rydberg-blockade range. This requires the ancilla atom

to sit near the geometric center of the triangle, which is a single position in the FCC lattice. No

long-range couplings are needed.

Magic-state injection. Two routes are available, depending on the algorithm’s demand:

• Parallel injection (primary route, Section 6). Up to 2L factory blocks each prepare an encoded |T⟩ in parallel; tweezer rearrangement places them in the ancilla positions of one FCC memory sheet. One transversal-CNOT layer (paired by the threefold rotation) executes the CNOT step of the gate-teleportation gadget for all 2L pairs in a single physical-gate layer. 2L ancilla logical Z-basis measurements and per-target conditional S corrections complete the 2L teleportations. This is the route that realizes the architecture's O(L) interface throughput.

• Single-state injection (fallback route, Section 4). When the algorithm needs only one or a few magic states injected into specific targets that do not match the FCC rotation pairing, tweezer rearrangement brings the color-code factory's prepared |T⟩ atom into proximity with the target, and the cross-sheet ZZ-merge surgery primitive performs a joint Z_tgt ⊗ Z_anc measurement to drive the same gate teleportation at O(1) throughput. This route is the appropriate primitive for sparse, irregular T-gate patterns.

The two routes share the same hardware primitives (Rydberg coupling, tweezer rearrangement) and

diﬀer only in the choice of gadget step (2) implementation: transversal CNOT versus joint Pauli

measurement.
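For readers less familiar with the gadget both routes implement, the following single-qubit sketch checks the textbook CNOT-based T-gate teleportation on unencoded amplitudes. It assumes the standard form of the gadget (target as CNOT control, |T⟩ ancilla as CNOT target, ancilla measured in the Z basis, S correction on outcome 1); it is not a simulation of the encoded FCC or color-code blocks, and the function names are illustrative.

```python
import cmath
import math

E = cmath.exp(1j * math.pi / 4)   # T-gate phase exp(i*pi/4)

def teleport_t(psi, outcome):
    """Return the target state after the gadget, given the ancilla Z outcome."""
    magic = (1 / math.sqrt(2), E / math.sqrt(2))        # |T> = T|+>
    # Joint amplitudes indexed as amp[target][ancilla].
    amp = [[psi[t] * magic[a] for a in (0, 1)] for t in (0, 1)]
    amp[1] = [amp[1][1], amp[1][0]]                     # CNOT: flip ancilla if target is 1
    out = [amp[0][outcome], amp[1][outcome]]            # project ancilla onto |outcome>
    if outcome == 1:
        out = [out[0], 1j * out[1]]                     # conditional S correction
    norm = math.sqrt(sum(abs(c) ** 2 for c in out))
    return [c / norm for c in out]

def overlap(u, v):
    """|<u|v>|, equal to 1 when the states agree up to a global phase."""
    return abs(sum(complex(x).conjugate() * y for x, y in zip(u, v)))

psi = [0.6, 0.8j]                      # arbitrary normalized target state
expected = [psi[0], E * psi[1]]        # T applied directly to psi

for m in (0, 1):
    print(m, round(overlap(teleport_t(psi, m), expected), 9))
```

Both measurement outcomes give the same state as applying T directly, up to a global phase. In the parallel route this check applies independently to each of the rotation-paired logical qubits.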

8.3 Comparison to current demonstrations

The Bluvstein 2025 architecture [12] demonstrated:

• Surface codes up to d = 7, with Λ ≈ 2.14 below-threshold suppression.

• Transversal teleportation with [[15, 1, 3]] 3D color codes for non-Cliﬀord operations.

• Atom rearrangement and zoned architecture across 448 atoms.

The proposed architecture uses essentially the same hardware primitives – Rydberg blockade, atom

rearrangement, zoned spatial organization – and replaces the surface-code memory layer with the

FCC sheet code, retaining the 3D color code in its specialized role as the T-state factory. No new

hardware capability is needed beyond what the Bluvstein 2025 demonstration already exercises.

8.4 Near-term demonstration milestones

A staged demonstration on atom-array hardware would proceed as follows:

1. Single-sheet memory at L = 4. ∼128 atoms (one sheet, 64 data + 64 ancilla), demonstrating below-threshold operation of a single [[64, 8, 4]] block. Comparable scale to Bluvstein 2025's d = 7 surface-code demo.
2. Three-sheet memory at L = 4. ∼384 atoms. Demonstrates the full three-sheet structure and verifies the FCC lattice layout.
3. Transversal CNOT between two sheets at L = 4. Demonstrates the rotation-paired CNOT layer and verifies the threshold-preservation result (R1).
4. Full Clifford universality at L = 4. Composition of within-sheet local Cliffords with the cross-sheet transversal CNOT. Demonstrates Proposition 2.
5. Magic-state injection from a color-code factory. Couples the [[15, 1, 3]] color-code factory to the FCC sheet code memory via gate teleportation, completing the universal gate set.

Each milestone is achievable on hardware in the

∼

256–1000 atom range, which is the current state

of the art on QuEra and Atom Computing systems.

9 Discussion

9.1 What this architecture is, and is not

The proposed architecture’s principal contribution is a parallel-teleportation primitive at the memory-

factory interface, derived from the FCC threefold rotation, with throughput that scales linearly with

code distance. No other 3D code in the current literature provides an L-scaling parallel-teleportation primitive. Combined with a 3D color-code magic-state factory, the resulting architecture is universal and – in the regime where the factory throughput can be scaled to match the memory's 2L-parallel

injection capacity – removes a bottleneck that constrains all O(1)-interface architectures.

We have not claimed:

•

Optimality at high distance. Bivariate-bicycle codes pull ahead on rate at

d ≥

12 [

and remain the better bulk-memory choice at that scale. The FCC sheet code’s parallel-

teleportation advantage is speciﬁcally for the small-to-moderate distance regime (

d ≤

relevant to near-term atom-array experiments.

• Optimality on non-Clifford operations. The 3D color code's transversal T is what we use as the factory, not what we claim to improve upon.

• A general-purpose Clifford speedup. The transversal CNOT applies 2L CNOTs in a fixed

pairing pattern; for arbitrary single-CNOT operations between speciﬁc logical pairs, the

architecture uses cross-sheet surgery (footprint 3

2 atoms), which is competitive with but

not dramatically better than surface-code surgery. The dramatic advantage is speciﬁcally

at the magic-state injection interface, which is the natural use case for structured parallel

CNOTs.

•

Deployability on K=4 2D superconducting hardware. The architecture requires 3D-native

connectivity that planar superconducting chips do not provide. Surface code remains the right

choice for that platform.

9.2 Scope of validity

The architecture’s validity rests on the structural and numerical results of Sections 2–5, which are

independent of the architectural framing. The structural results (CSS isomorphism of the threefold

rotation, layer decomposition, triangle algebra, ribbon construction) are verified at L = 4 by direct computation. The bundle accompanying this paper includes the triangle-enumeration,

triangle-ribbon-construction, and merge-footprint veriﬁcation scripts; the CSS-isomorphism, layer-

decomposition, and cross-sheet reachability checks are available in the public code repository (see

Code and Data Availability). The threshold results are circuit-level depolarizing simulations, sample

sizes of several

shots per point at

= 4

6 and of order 10

= 8; the simulation drivers,

Stim circuits, and raw shot data are in the public code repository. Memory is decoded with MWPM;

cross-sheet surgery is decoded with BP+OSD (its weight-3 ribbon operators create hyperedges that

MWPM cannot decompose); the transversal CNOT requires the correlated (feed-forward) decoder

of Section 5, since naive per-sheet matching collapses on the target sheet.
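The decoder split above turns on whether each error mechanism is graphlike, meaning it flips at most two detectors and can therefore be drawn as an edge (or boundary edge) of a matching graph. A toy sketch of that classification follows; the detector indices are made up for illustration and are not taken from the paper's circuits.

```python
def split_graphlike(mechanisms):
    """Separate error mechanisms into matching-graph edges and hyperedges.

    `mechanisms` is an iterable of sets of detector indices, one set per
    independent error mechanism. A mechanism flipping one or two detectors
    is graphlike; three or more makes it a hyperedge.
    """
    graphlike, hyperedges = [], []
    for detectors in mechanisms:
        bucket = graphlike if len(detectors) <= 2 else hyperedges
        bucket.append(frozenset(detectors))
    return graphlike, hyperedges

# Hypothetical detector sets: three ordinary edges and one weight-3 event.
toy_model = [{0, 1}, {1}, {2, 3}, {0, 2, 4}]
edges, hyper = split_graphlike(toy_model)

needs_hypergraph_decoder = len(hyper) > 0
print(len(edges), len(hyper), needs_hypergraph_decoder)
```

When the hyperedge list is non-empty and cannot be decomposed into edges, a matching decoder no longer sees the full error model, which is the situation in which a hypergraph decoder such as BP+OSD is used instead.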

What requires further work for a complete architectural validation:

•

A real-time implementation of the feed-forward decoder for the transversal CNOT. We

demonstrate in simulation that correlated decoding restores the threshold and validate the

inherited-ﬁeld bookkeeping exactly, but the latency of a hardware-speed correlated decoder,

and its interaction with the factory’s own decoding, is not modeled here.

•

Full end-to-end resource and time estimates including decoder latency, factory output rate

variance, and inter-zone teleportation overhead. Section 7 gives ﬁrst-order atom-count estimates

and matched-throughput analysis; a precise time-overhead model remains open.

•

Detailed circuit-level threshold for the parallel-teleportation interface speciﬁcally. We have

characterized the FCC side’s surgery thresholds and inherited the color-code factory’s thresholds

from the literature, but the combined interface (transversal CNOT applied between an FCC

sheet’s ancilla slots and target slots) has not been simulated end-to-end with the color code

attached.

•

Direct comparison with the BB code architecture [

] at matched scale, including a parallel-

teleportation comparison at ﬁxed factory provisioning.

• Compiler-level question: for algorithms whose T-gate patterns do not naturally match the FCC rotation pairing, the slot-permutation problem must be solved at compile time. The cost of this slot assignment for typical algorithm structures is an open question.

9.3 Open structural questions

Several structural questions about the FCC sheet code remain open and could strengthen the

architecture if resolved positively:

•

Additional transversal gates. The full octahedral symmetry group of FCC has order 48 and

contains order-2 and order-4 elements in addition to the order-3 rotation we use. Whether any

order-4 element induces a CSS isomorphism at the logical level remains open. An aﬃrmative

answer would add transversal H or S on top of the existing transversal CNOT.

• Single-shot QEC. The FCC sheet code is not single-shot in the conventional sense (it inherits the 2D toric code's d-round requirement). Whether the cross-sheet triangle structure can be exploited to recover single-shot capability for cross-sheet operations is open.

• Higher-level distillation routines. The Litinski ∼12-to-1 routines [14] achieve better magic-state output rates than 15-to-1; adapting them to the color-code factory and the FCC interface is open.

9.4 Outlook

The atom-array QEC landscape is moving fast. Within the past two years, surface-code-plus-3D-

color-code architectures have moved from theory to operating demonstration on 448 atoms. By the

time the present manuscript appears, the next-generation systems at QuEra, Atom Computing,

and Pasqal are likely to be operating at 1000–3000 atoms, well within the regime where the FCC

sheet code’s

L ∈ {

}

memory blocks become demonstrably feasible. We expect that direct

experimental implementations of the architecture proposed here become attainable within the next

several years on existing hardware roadmaps, and we view the structural and threshold results of

this paper as the analytical foundation for those demonstrations.

Code and Data Availability

The reproducibility bundle accompanying this paper includes the structural veriﬁcation scripts

(triangle enumeration, triangle-ribbon construction, atom-count comparison for the merge-footprint

analysis), the lattice modules, and the ﬁgure-generation code for Figure 1.

The repository-side scripts that verify the remaining structural claims (CSS isomorphism, layer decomposition, cross-sheet reachability) and the threshold-sweep drivers and figure-generation code for Figure 2 are packaged as ssmtheory_repo_scripts.zip in the public repository at github.com/raghu91302/ssmtheory. The zip's README.md maps each script to the specific paper claim it supports:

• verify_css_isomorphism.py – Computational Result 1, verified at L = 4, 6, 8.

• verify_layer_decomposition.py – Theorem 2, verified at L = 4, 6, 8 for all three sheets.

• verify_cross_sheet_reachability.py – Computational Result 3, verified at L = 4, 6.

• sweep_memory_threshold.py – memory threshold sweep (exact: each sheet decomposes into stacked 2D toric codes).

• fcc_sheet_circuit.py – full sheet-code circuits built directly from the lattice stabilizers: the weight-4 memory circuit (Figure 2a) and the two-sheet transversal-CNOT circuit with the real rotation pairing π (Figure 2b,c).

• fcc_feedforward_real.py – the feed-forward (correlated) decoder of Section 5, the naive-MWPM baseline, the idealized error-frame (genie) decoder, and the controlled-noise validation of the inherited-field bookkeeping.

• indep_checks.py – independent re-derivation: circuit distance via Stim's minimum-weight graphlike-error search (L = 4 to 8), a decoder-free symplectic check of the transversal CX (weight-2 spread, stabilizer-group preservation), and a from-scratch phenomenological feed-forward reproducing the controlled-noise invariant.

• fcc_surgery_circuit.py – the real Z-basis cross-sheet ZZ-merge (triangle-ribbon) circuit, replacing the earlier surface-code memory proxy; joint-parity distance verified at L = 4, 6, 8.

• bposd_decode.py – the Stim-DEM-to-BP+OSD decoder used for the surgery primitive (required because the weight-3 triangle operators are not graphlike), with gen_data_surgery.py and data_surgery_real.json the sweep driver and raw results.

• sweep_zz_merge_threshold.py – ZZ-merge threshold sweep driver (toric and planar variants).

• sweep_transversal_cnot_threshold.py – transversal-CNOT threshold sweep driver.

• make_fig_transversal_cnot.py – Figure 2 generation from JSON sweep data.

Raw shot data from the threshold sweeps reported in Section 5 is also archived in the same repository. A frozen snapshot of the full script and data bundle is archived on Zenodo at https://doi.org/10.5281/zenodo.20808547.

Declarations

Data availability: The datasets generated and analysed during the current study are available in

the Zenodo repository, https://doi.org/10.5281/zenodo.20808547.

Funding: not applicable.

Clinical trial registration: not applicable. This study is theoretical and does not constitute a

clinical trial.

Consent to Publish declaration: not applicable.

Ethics and Consent to Participate declarations: not applicable.

References

[1] R. Kulkarni, "A 67%-Rate CSS Code on the FCC Lattice: [[192, 130, 3]] from Weight-12 Stabilizers," (2026). arXiv:2603.20294.

[2] C. Horsman, A. G. Fowler, S. Devitt, R. Van Meter, "Surface code quantum computing by lattice surgery," New J. Phys. 14, 123011 (2012). doi:10.1088/1367-2630/14/12/123011. arXiv:1111.4022.

[3] E. Dennis, A. Kitaev, A. Landahl, J. Preskill, "Topological quantum memory," J. Math. Phys. 43, 4452 (2002). doi:10.1063/1.1499754. arXiv:quant-ph/0110143.

[4] A. G. Fowler, M. Mariantoni, J. M. Martinis, A. N. Cleland, "Surface codes: Towards practical large-scale quantum computation," Phys. Rev. A 86, 032324 (2012). doi:10.1103/PhysRevA.86.032324. arXiv:1208.0928.

[5] C. Gidney, "Stim: a fast stabilizer circuit simulator," Quantum 5, 497 (2021). doi:10.22331/q-2021-07-06-497. arXiv:2103.02202.

[6] O. Higgott, "PyMatching: A Python package for decoding quantum codes with minimum-weight perfect matching," ACM Trans. Quantum Comput. 3, 16 (2022). doi:10.1145/3505637. arXiv:2105.13082.

[7] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 10th anniv. ed. (2010). doi:10.1017/CBO9780511976667.

[8] S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, T. J. Yoder, "High-threshold and low-overhead fault-tolerant quantum memory," Nature 627, 778 (2024). doi:10.1038/s41586-024-07107-7. arXiv:2308.07915.

[9] T. J. Yoder, E. Schoute, P. Rall, E. Pritchett, J. M. Gambetta, A. W. Cross, M. Carroll, M. E. Beverland, "Tour de gross: A modular quantum computer based on bivariate bicycle codes," (2025). arXiv:2506.03094.

[10] H. Bombín, M. A. Martin-Delgado, "Topological computation without braiding," Phys. Rev. Lett. 98, 160502 (2007). doi:10.1103/PhysRevLett.98.160502. arXiv:quant-ph/0610024.

[11] D. Bluvstein et al., "Logical quantum processor based on reconfigurable atom arrays," Nature 626, 58 (2024). doi:10.1038/s41586-023-06927-3. arXiv:2312.03982.

[12] D. Bluvstein et al., "A fault-tolerant neutral-atom architecture for universal quantum computation," Nature 649, 39 (2026). doi:10.1038/s41586-025-09848-5.

[13] S. Bravyi, J. Haah, "Magic-state distillation with low overhead," Phys. Rev. A 86, 052329 (2012). doi:10.1103/PhysRevA.86.052329. arXiv:1209.2426.

[14] D. Litinski, "A game of surface codes: Large-scale quantum computing with lattice surgery," Quantum 3, 128 (2019). doi:10.22331/q-2019-03-05-128. arXiv:1808.02892.

[15] A. O. Quintavalle, M. Vasmer, J. Roffe, E. T. Campbell, "Single-shot error correction of three-dimensional homological product codes," PRX Quantum 2, 020340 (2021). doi:10.1103/PRXQuantum.2.020340. arXiv:2009.11790.

[16] QuEra Computing, "Gemini-2: 256-qubit neutral-atom quantum computer," product specification, 2025. https://www.quera.com/gemini.

[17] Atom Computing, "1180-atom neutral-atom quantum computer," press release, October 2024.

[18] Pasqal, "Orion Beta: Commercial neutral-atom quantum processor," product specification, 2025.