Three Sheets, One Architecture: Inter-Sheet Logical Gates for K=4 Quantum Error Correction
Raghu Kulkarni
SSMTheory Group, IDrive Inc., Calabasas, CA 91302, USA
raghu@idrive.com
Abstract
We present a CSS quantum error-correcting code on the Face-Centered Cubic (FCC) lattice that combines surface-code-compatible K=4 connectivity with native inter-sheet logical gates between co-located CSS code blocks, a combination not previously achieved among well-studied codes. The construction has two parts. First, restricting the $[[3L^3, 2L^3+2, 3]]$ FCC lattice code to a single triad sheet yields the sheet code with parameters $[[L^3, 2L, L]]$ at even $L$ (or $[[L^3, L, L]]$ as a planar variant), uniform weight-4 stabilizers, and K=4 per-qubit connectivity. Three triad sheets share the FCC lattice geometry and can be deployed in one of three ways (monolithic 2D for small $L$, a three-layer stacked architecture for scalable deployment, or native 3D hardware), in each case encoding $6L$ logical qubits at distance $L$. Second, we introduce a fault-tolerant lattice surgery protocol that uses local FCC triangle measurements to implement joint Pauli measurements between logical qubits in different sheets. Each surgery operation adds $O(L)$ two-qubit gates to the $O(L^3)$ per-round syndrome extraction cost. Under circuit-level depolarizing noise with MWPM decoding on a custom Stim circuit with explicit triangle measurements, finite-size scaling across $L = 4, 6, 8$ gives a surgery threshold $p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$, where the band reflects the spread of the three pairwise crossings. Compared to surface code lattice surgery (also K=4, lattice surgery confined within a single patch), the sheet code provides $2L$ logicals per sheet at the same distance and connectivity; compared to bivariate bicycle codes (which target inter-block gates between physically separated modules at K=6 with long-range couplers), the sheet code achieves inter-sheet gates at K=4 with only short-range couplings between co-located sheets.
1 Introduction
Two families of quantum error correcting codes currently dominate the discussion of near-
term fault-tolerant quantum computing. The surface code [3, 5] pairs K=4 planar hard-
ware compatibility with mature fault-tolerant protocols, including lattice surgery [6, 7]
for logical gates within a single 2D substrate. The bivariate bicycle (BB) code family [11]
achieves an order-of-magnitude rate advantage over the surface code at the cost of K=6
connectivity and a small number of long-range couplers. The BB family’s inter-block logical gates (gates that move logical qubits between distinct code blocks) remain an active research problem [12].
This paper introduces a third option that occupies a previously unfilled niche: a CSS code
at K=4 connectivity (matching the surface code) with native inter-sheet logical gates
between co-located CSS code blocks. This is a different design point from surface code
lattice surgery (which operates within a single substrate) and from bivariate bicycle codes
(which target inter-block gates between physically separated K=6 modules and where
modular logical operation protocols are an area of active architectural development). The
construction has two interlocking ingredients.
The FCC sheet code. Restricting the $[[3L^3, 2L^3+2, 3]]$ FCC lattice code [1] to a single triad sheet (one of three orthogonal K=4 sublattices in FCC) eliminates the FCC code’s weight-3 vulnerability and yields a CSS code with parameters $[[L^3, 2L, L]]$ at even $L$. Each of the three sheets decomposes into $L$ parallel 2D toric codes; three sheets on a shared FCC substrate encode $6L$ logical qubits at distance $L$ on $3L^3$ data qubits.
Cross-sheet triangle surgery. Every FCC triangle has one edge in each of the three
triad sheets. Weight-3 Pauli measurements on FCC triangles couple data qubits across
sheets, providing a natural primitive for inter-sheet logical operations. We show that
triangle products implement joint Pauli measurements between logical qubits in different
sheets, with the merged code preserving distance d = L.
The two ingredients work in tandem: the sheet code provides efficient storage at K=4; the
triangle surgery primitive provides logical gates at the same K=4 envelope. The result
fills a hardware-architecture gap that neither surface code nor bivariate bicycle codes
address: K=4 planar connectivity with inter-sheet logical gates as a built-in feature.
Summary of results.
- Sheet code construction with parameters $[[L^3, 2L, L]]$ on a torus and $[[L^3, L, L]]$ on a plane, with uniform weight-4 stabilizers and K=4 data-qubit connectivity (Section 2).
- Three-sheet architecture: $6L^3$ total qubits ($3L^3$ data + $3L^3$ ancilla) on a single K=4 chip via time-multiplexed syndrome extraction (Section 3).
- Triangle algebra: triangle products span $6L - 3$ of the $6L$ cross-sheet Z-logicals; every per-sheet logical is reachable via some sheet pair (Section 4).
- Fault-tolerant surgery protocol: explicit ancilla placement, gate schedule, K=4 verification, $O(L)$ gate overhead (Section 5).
- Distance preservation under merge (Section 6).
- Threshold simulation via a custom Stim circuit with explicit triangle measurements at $L = 4, 6, 8$: pairwise crossings give $p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$ (Section 7).
- Detailed comparison with state-of-the-art codes (Section 8).
2 The FCC Sheet Code
2.1 The Triad Decomposition
The FCC lattice has K = 12 nearest-neighbor vectors, which partition into three orthogonal sheets of 4 (a decomposition originally introduced in [2] for the flag-assisted FCC code):
\[
\begin{aligned}
S_{xy} &: (\pm 1, \pm 1, 0) \\
S_{xz} &: (\pm 1, 0, \pm 1) \\
S_{yz} &: (0, \pm 1, \pm 1)
\end{aligned}
\tag{1}
\]
Each FCC edge belongs to exactly one sheet. At lattice size $L$ (even), each sheet contains $L^3$ edges. Restricted to a single sheet, each FCC vertex has K=4 incident edges.
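The edge count per sheet can be checked by direct enumeration. The sketch below is illustrative only: it assumes FCC vertices are the integer points with even coordinate sum, taken periodically modulo $L$ (a coordinate convention chosen here, consistent with the layer description in Section 2.4), and the helper names are hypothetical.

```python
from itertools import product

# The 12 FCC nearest-neighbor vectors: exactly two nonzero entries, each +/-1.
NN = [v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2]

def sheet_of(vec):
    """Label a neighbor vector by the two axes along which it is nonzero."""
    return "".join(ax for ax, c in zip("xyz", vec) if c != 0)

def sheet_edge_counts(L):
    """Undirected edges per sheet on a periodic FCC lattice (assumes even L >= 4)."""
    vertices = [v for v in product(range(L), repeat=3) if sum(v) % 2 == 0]
    edges = {"xy": set(), "xz": set(), "yz": set()}
    for v in vertices:
        for d in NN:
            w = tuple((a + b) % L for a, b in zip(v, d))
            edges[sheet_of(d)].add(frozenset((v, w)))  # set removes double counting
    return {s: len(e) for s, e in edges.items()}

print(sheet_edge_counts(4))  # each sheet should hold L^3 = 64 edges
```

With this convention every vertex has four neighbors in each sheet, so the $L^3/2$ vertices give $4 \cdot (L^3/2)/2 = L^3$ edges per sheet.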
2.2 The Sheet Code Stabilizers
Definition 1 (FCC sheet code). Fix one triad sheet $S$ (say $S_{xy}$). Place one physical qubit on each edge in $S$ ($n = L^3$ qubits). The stabilizers are:
- Z-stabilizers: for each vertex $v$, apply Z to the 4 edges of $S$ incident to $v$.
- X-stabilizers: for each octahedral void $o$, apply X to the 4 edges of $S$ connecting the 6 vertices surrounding $o$.
Both stabilizer types have uniform weight 4. The CSS condition $H_X H_Z^T = 0$ over GF(2) is satisfied because each edge in sheet $S$ connects two vertices and participates in exactly two octahedral voids restricted to $S$; the overlap between any X-stabilizer and any Z-stabilizer is even.
2.3 Code Parameters
Theorem 1 (Sheet code parameters). At even $L$, the FCC sheet code has parameters $[[L^3, 2L, L]]$: $n = L^3$ physical qubits, $k = 2L$ logical qubits, code distance $d = L$.
The parameters follow from the layer decomposition (Section 2.4) together with standard 2D toric code counting. Computational verification:

| L | n | rank(H_Z) | rank(H_X) | k |
|---|-----|-----|-----|----|
| 4 | 64 | 28 | 28 | 8 |
| 6 | 216 | 102 | 102 | 12 |
| 8 | 512 | 248 | 248 | 16 |

In each case $\mathrm{rank}(H_Z) = \mathrm{rank}(H_X) = (L^3 - 2L)/2$, giving $k = L^3 - 2(L^3 - 2L)/2 = 2L$. The general result follows from Theorem 2 below.
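A rank computation of this kind can be sketched in a few lines. The code below is not the authors' verification script; it assumes the same coordinate convention as Section 2.4 (vertices at even-parity integer points, octahedral voids at odd-parity points, periodic modulo $L$), builds the $S_{xy}$ checks from Definition 1, tests the CSS condition, and computes GF(2) ranks. All function names are hypothetical.

```python
from itertools import product

def sheet_xy_checks(L):
    """Return (H_Z, H_X, n) for S_xy; each check is a set of edge indices (even L >= 4)."""
    dirs = [(1, 1, 0), (1, -1, 0)]  # one representative direction per undirected edge
    verts = [p for p in product(range(L), repeat=3) if sum(p) % 2 == 0]
    voids = [p for p in product(range(L), repeat=3) if sum(p) % 2 == 1]
    index = {(v, d): i for i, (v, d) in enumerate((v, d) for v in verts for d in dirs)}

    def edge(u, d):
        return index[(tuple(c % L for c in u), d)]

    H_Z, H_X = [], []
    for (x, y, z) in verts:  # the 4 sheet edges incident to the vertex
        H_Z.append({edge((x, y, z), d) for d in dirs}
                   | {edge((x - d[0], y - d[1], z), d) for d in dirs})
    for (x, y, z) in voids:  # the square of sheet edges around the void
        H_X.append({edge((x - 1, y, z), dirs[0]), edge((x - 1, y, z), dirs[1]),
                    edge((x, y - 1, z), dirs[0]), edge((x, y + 1, z), dirs[1])})
    return H_Z, H_X, len(index)

def gf2_rank(rows):
    """Rank over GF(2) via an XOR basis keyed by leading bit."""
    pivots = {}
    for support in rows:
        r = sum(1 << c for c in support)
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]
    return len(pivots)

def sheet_parameters(L):
    H_Z, H_X, n = sheet_xy_checks(L)
    css_ok = all(len(a & b) % 2 == 0 for a in H_X for b in H_Z)
    rz, rx = gf2_rank(H_Z), gf2_rank(H_X)
    return n, rz, rx, n - rz - rx, css_ok

for L in (4, 6, 8):
    print(L, sheet_parameters(L))
```

Each row is stored as a Python integer bitmask, which keeps the elimination short and avoids any external dependency; for much larger $L$ a sparse linear-algebra package would be the more natural choice.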
2.4 Layer Decomposition and Proof of Parameters
Why the distance increases. The full FCC code has d = 3 because weight-3 logical
operators exist at tetrahedral voids: one edge from each of the three triad sheets forms a
triangle commuting with all weight-12 stabilizers. Within a single triad sheet, only one
edge of any such triangle survives, giving a weight-1 operator that is detected by vertex
Z-stabilizers. Therefore no weight-3 logical survives the sheet restriction; the minimum-
weight logical operators of the sheet code are non-contractible cycles within the sheet, of
length L.
Layer structure. Each triad sheet decomposes further into $L$ independent layers indexed by the zero-displacement coordinate. For sheet $S_{xy}$, edges have $dz = 0$, so each $S_{xy}$ edge has a well-defined z-coordinate equal to the shared $z$ of its two endpoints. Edges in layer $z = z_0$ form a 2D toric code on a rotated $L \times L$ square lattice. Analogous decompositions hold for $S_{xz}$ (layered by $y$) and $S_{yz}$ (layered by $x$).
Theorem 2 (Layer decomposition). The FCC sheet code on sheet $S_{xy}$ at even $L$ is isomorphic, as a stabilizer code, to $L$ disjoint 2D toric codes, each on an $L \times L$ rotated square lattice with $L^2$ data qubits, $k = 2$ logical qubits, and distance $L$. The Z-stabilizers (resp. X-stabilizers) of the sheet code partition into $L$ disjoint sets, one per layer; within each layer, the rank deficiency equals 1 (one product redundancy among $L^2/2$ vertex stabilizers).
Proof. Edge partition. Each $S_{xy}$ edge $(v_1, v_2)$ has $z(v_1) = z(v_2)$ since the displacement vector $v_2 - v_1 \in \{(\pm 1, \pm 1, 0)\}$ has $dz = 0$. Define $\mathrm{layer}(e) = z(v_1)$. The map $\mathrm{layer} : S_{xy} \to \{0, 1, \dots, L-1\}$ partitions the $L^3$ edges of $S_{xy}$ into $L$ disjoint sets of $L^2$ edges each.
Stabilizer partition. A vertex Z-stabilizer at vertex $v$ acts on the 4 sheet-$S_{xy}$ edges incident to $v$, all of which have the same z-coordinate as $v$. Hence each vertex Z-stabilizer is supported entirely within one layer. An analogous argument applies to octahedral void X-stabilizers within $S_{xy}$, since the 4 edges of an oct void restricted to $S_{xy}$ all share the same z-coordinate as the void center.
Layer is a 2D toric code. Within layer $z = z_0$, the $L^2$ edges connect vertices $\{(x, y, z_0) : x + y \equiv z_0 \pmod 2\}$ via the four neighbor vectors $(\pm 1, \pm 1, 0)$. This is precisely the rotated $L \times L$ square lattice, and the vertex Z-stabilizers and oct-void X-stabilizers on this layer are exactly the standard 2D toric code stabilizers. The toric code on $L \times L$ has parameters $[[L^2, 2, L]]$.
Rank count. The 2D toric code on $L^2$ data qubits has $L^2/2$ vertex Z-stabilizers, satisfying one redundancy (the product over all vertices is the identity). Hence per layer, $\mathrm{rank}(H_Z^{\mathrm{layer}}) = L^2/2 - 1$. Across $L$ layers, $\mathrm{rank}(H_Z^{\mathrm{sheet}}) = L \cdot (L^2/2 - 1) = (L^3 - 2L)/2$. The same argument applies to $H_X^{\mathrm{sheet}}$.
Code parameters. $k = n - \mathrm{rank}(H_Z) - \mathrm{rank}(H_X) = L^3 - 2 \cdot (L^3 - 2L)/2 = 2L$. The minimum-weight logical operators are the non-contractible cycles of the per-layer 2D toric codes, each of length $L$. Hence $d = L$.
Consequence for the rank formula. Theorem 2 eliminates the need for a per-$L$ verification of the rank: the formula $\mathrm{rank}(H_Z) = \mathrm{rank}(H_X) = (L^3 - 2L)/2$ holds for every even $L \ge 2$.
2.5 Planar Variant
For deployment on planar quantum chips that do not support periodic boundary conditions, each layer becomes a rotated surface code $[[L^2, 1, L]]$ via standard boundary engineering [5]. The resulting planar sheet code has parameters
\[
[[L^3, L, L]] \quad \text{(planar boundaries)}. \tag{2}
\]
Distance $d = L$ is preserved; the encoding rate halves, from $2L$ to $L$ logical qubits.
3 Hardware Embedding: Three Sheets, Three Layers,
or One Chip
The three triad sheets are edge-disjoint: $3L^3$ data qubits in total, with $L^3$ per sheet. We now address the question of how to physically realize these qubits on hardware. This question is non-trivial because the sheet code uses $\Theta(L^3)$ data qubits to encode $\Theta(L)$ logical qubits at distance $L$, and a monolithic 2D embedding of an $L^3$-vertex 3D graph cannot maintain unit-length nearest-neighbor couplings as $L$ grows.
We discuss three deployment options, in increasing order of scalability.
3.1 Option A: Monolithic Planar Chip (Small to Moderate L)
For small to moderate $L$ ($L \le 8$, corresponding to roughly 1,500 data qubits in total), a monolithic planar quantum processor can host all three sheets via time-multiplexed syndrome extraction. Data qubits occupy fixed positions on a planar chip; ancillas for the three sheets are physically distinct but co-located at FCC vertex and oct-void positions; couplers reconfigure between rounds to activate one sheet’s stabilizers at a time. Specifically:
- Round 1: activate $S_{xy}$ couplers, measure $S_{xy}$ stabilizers
- Round 2: activate $S_{xz}$ couplers, measure $S_{xz}$ stabilizers
- Round 3: activate $S_{yz}$ couplers, measure $S_{yz}$ stabilizers
Wire-length scaling. On a monolithic 2D embedding of the $\Theta(L^3)$-qubit 3D lattice, the average physical distance between nearest neighbors in the FCC graph scales as $\Theta(L^{1/2})$. For Google Willow-style hardware with fixed-length couplers, this restricts the practical regime to small $L$. At $L = 4$ (192 qubits), the embedding is straightforward; at $L = 8$ (1,536 qubits), some couplers will span longer distances and require active calibration. At $L \ge 12$, monolithic 2D embedding is impractical without coupler reach upgrades.
Idle penalty. While one sheet is measured, data qubits in the other two sheets idle. With
round time t, a full cycle is 3t and each data qubit idles for 2t per cycle. We quantify the
resulting threshold reduction in Section 7.
3.2 Option B: Three-Layer Stacked Architecture (Recommended
Deployment)
For practical deployment at $L \ge 8$, we recommend a three-layer stacked architecture: three planar quantum processors, each a K=4 device hosting one triad sheet, bonded with through-silicon vias (TSVs) or inter-layer capacitive couplers for triangle-mediated cross-sheet operations.
Per-layer structure. Each of the three chips is a standard planar K=4 device. Layer 1 hosts $S_{xy}$: $L^3$ data qubits on the $L$ layers of $S_{xy}$, laid out as $L$ stacked rotated $L \times L$ square lattices (Theorem 2). Each chip layer hosts $L$ such rotated lattices, either tiled adjacently in 2D (for modest $L$) or as $L$ further-stacked sublayers (for large $L$).
Inter-layer couplers. Triangle ancillas sit between chip layers, coupled via short-range
vertical couplers (TSVs) to the three data qubits of their triangle, one from each chip.
These couplers are activated only during surgery operations and remain inactive otherwise.
The chip-internal K=4 connectivity is unaffected; the inter-layer couplers add only K = 1
per data qubit per active triangle, recovering the same K=4 effective constraint during
surgery as in Section 5.
Precedent. Three-layer stacked QEC architectures have been studied in recent quantum
hardware roadmaps [8, 11]. IBM’s stacked-die approach for the bivariate bicycle code
uses analogous inter-layer couplers to bridge K=6 connectivity at the chip level. The
sheet code’s three-chip architecture has lower per-chip connectivity (K=4 vs. K=6) but
a simpler inter-chip coupling structure (only triangle ancillas need vertical bonds, not
generic data qubits).
Vertical crosstalk and inter-layer isolation. The principal hardware concern for any stacked architecture is crosstalk between layers. In the present construction, two structural features mitigate this. First, only the triangle ancillas (a small minority of total qubits, $O(L)$ per surgery primitive vs. $\Theta(L^3)$ data qubits per layer) carry inter-layer couplers; data qubits and per-sheet ancillas have no vertical wiring, so the static memory portion of the device is unaffected by inter-layer phenomena. Second, the inter-layer couplers are activated only during surgery rounds via control electronics (Section 5), with off-state isolation between layers achievable by detuning the triangle ancilla off-resonance from the data qubits when surgery is not active. Practical isolation values are platform-dependent: superconducting TSV couplers in stacked-die devices reach 40–60 dB off-state isolation [8]; neutral-atom and ion-trap platforms with 3D-addressable laser systems can in principle achieve higher isolation by mechanical separation. Detailed crosstalk budgeting for a specific hardware platform is identified as future work (Section 9.3).
3.3 Option C: Native FCC Hardware
For maximally efficient embedding, a quantum hardware platform with native 3D con-
nectivity (such as 3D-printed superconducting circuits, neutral atom arrays with 3D-
addressable laser systems, or trapped ion architectures with multi-segment traps) hosts
the full FCC lattice without the embedding penalty of options A or B. The sheet code
runs natively on such hardware with all K=4 couplings at unit physical distance. This
option is forward-looking; no current commercial platform offers it.
3.4 Hardware Footprint
Across all options, the qubit and ancilla counts are:
- $3L^3$ data qubits, one per FCC edge, partitioned by sheet;
- $3 \times L^3/2$ vertex Z-ancillas (one per (vertex, sheet) pair);
- $3 \times L^3/2$ octahedral void X-ancillas (one per (void, sheet) pair);
- $L$ triangle ancillas per active surgery primitive (transient, can be reused).
Total: $6L^3$ permanent qubits at K = 4 static connectivity per data qubit, plus $O(L)$ transient triangle ancillas during surgery.
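The counts above are simple enough to tabulate for any even $L$. The helper below is a hypothetical convenience function (its name and dictionary keys are not from the paper) that just evaluates the listed formulas.

```python
def footprint(L):
    """Qubit counts for the three-sheet architecture at even lattice size L."""
    data = 3 * L**3            # one data qubit per FCC edge
    z_anc = 3 * L**3 // 2      # one Z-ancilla per (vertex, sheet) pair
    x_anc = 3 * L**3 // 2      # one X-ancilla per (octahedral void, sheet) pair
    return {
        "data": data,
        "z_ancilla": z_anc,
        "x_ancilla": x_anc,
        "permanent_total": data + z_anc + x_anc,   # equals 6 L^3
        "triangle_ancillas_per_primitive": L,      # transient, reusable
    }

for L in (4, 8):
    print(L, footprint(L))
```

At $L = 4$ this gives 192 data qubits and 384 permanent qubits; at $L = 8$ it gives the 1,536 data qubits quoted in Section 3.1.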
3.5 Comment on the “Cross-Block” Terminology
The three triad sheets occupy the same FCC lattice geometry (in options A and B, time-
multiplexed or stacked on the same physical substrate; in option C, fully co-located in
3D). They are not spatially separated modules in the sense that bivariate bicycle code
blocks are. We use inter-sheet logical gates (or equivalently cross-sheet gates) to denote
the operations between logical qubits in distinct sheets that our triangle surgery primitive
implements. The sheets are logically distinct CSS code blocks (each independently encodes $2L$ logical qubits, with independent stabilizer groups and independent decoders), but they are not physically separated. This intermediate regime, between surface code lattice surgery within one substrate and bivariate bicycle inter-block gates across separated modules, is the niche our construction occupies.
4 Triangle Algebra and Cross-Sheet Logicals
4.1 FCC Triangles
Lemma 1 (Triangle structure). Every triangle (3-cycle) in the FCC graph has one edge in each of the three triad sheets. At lattice size $L$, the FCC graph contains $4L^3$ triangles, and each FCC edge participates in exactly 4 triangles.
Proof. For three mutually adjacent FCC vertices $v_a, v_b, v_c$, the three edge vectors $v_b - v_a$, $v_c - v_a$, $v_c - v_b$ must each be FCC neighbor vectors. Direct case analysis on the 12 neighbor vectors shows that any three nearest-neighbor vectors that sum to zero necessarily lie in distinct sheets. Counting: each FCC vertex is in 24 triangles, so the total is $24 \cdot (L^3/2)/3 = 4L^3$. Each edge appears in $4L^3 \cdot 3/(3L^3) = 4$ triangles.
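The three claims of Lemma 1 can be checked by brute force at small $L$. The sketch below assumes the same coordinate convention used for the layer description (even-parity integer points, periodic modulo $L$, with $L \ge 4$ so that the periodic graph has no repeated edges); the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

NN = {v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2}

def fcc_triangles(L):
    """All 3-cycles of the periodic FCC graph, each as a frozenset of 3 vertices."""
    add = lambda p, d: tuple((a + b) % L for a, b in zip(p, d))
    verts = [v for v in product(range(L), repeat=3) if sum(v) % 2 == 0]
    found = set()
    for v in verts:
        for a in NN:
            for b in NN:
                # v -> v+a -> v+a+b closes into a triangle iff a+b is an NN vector
                if tuple(x + y for x, y in zip(a, b)) in NN:
                    found.add(frozenset((v, add(v, a), add(add(v, a), b))))
    return found

def sheet_of_edge(p, q, L):
    """Sheet label from the two axes along which the endpoints differ."""
    return "".join(ax for ax, a, b in zip("xyz", p, q) if (a - b) % L != 0)

def check_lemma(L):
    tris = fcc_triangles(L)
    one_per_sheet = all(
        sorted(sheet_of_edge(p, q, L) for p, q in combinations(t, 2))
        == ["xy", "xz", "yz"]
        for t in tris)
    per_edge = Counter(frozenset(e) for t in tris for e in combinations(t, 2))
    return len(tris), one_per_sheet, len(per_edge), set(per_edge.values())

print(check_lemma(4))  # (triangle count, one edge per sheet?, edges seen, multiplicities)
```

At $L = 4$ the lemma predicts $4L^3 = 256$ triangles covering all $3L^3 = 192$ edges, each edge appearing in exactly 4 triangles.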
4.2 Triangle Operators
Definition 2 (Triangle operator). For an FCC triangle $T$ with edges $e_{xy} \in S_{xy}$, $e_{xz} \in S_{xz}$, $e_{yz} \in S_{yz}$, define the Z-triangle operator
\[
Z_T = Z_{e_{xy}} Z_{e_{xz}} Z_{e_{yz}}
\]
and similarly $X_T = X_{e_{xy}} X_{e_{xz}} X_{e_{yz}}$.
A single triangle Z-operator commutes with all per-sheet Z-stabilizers but anticommutes
with exactly 6 per-sheet X-stabilizers (two per sheet). Products of triangles can be chosen
to commute with all stabilizers.
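The anticommutation count for a single triangle can be illustrated on one concrete example. The sketch below assumes even-parity vertices and odd-parity octahedral voids on a periodic lattice of size $L = 4$, identifies edges by their endpoint pairs, and takes each per-sheet X-stabilizer to be the square of sheet edges around its void; the chosen triangle and all names are illustrative.

```python
from itertools import product

L = 4
AXES = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}

def shift(p, axis, s):
    """Move point p by s along one axis, with periodic wrap."""
    q = list(p)
    q[axis] = (q[axis] + s) % L
    return tuple(q)

def x_stabilizer(void, sheet):
    """The 4 edges of `sheet` on the square around an octahedral void."""
    p, q = AXES[sheet]
    return {frozenset((shift(void, p, s), shift(void, q, t)))
            for s in (1, -1) for t in (1, -1)}

voids = [o for o in product(range(L), repeat=3) if sum(o) % 2 == 1]
X_STABS = {(o, s): x_stabilizer(o, s) for o in voids for s in AXES}

# One concrete triangle: its three edges lie in S_xy, S_xz and S_yz respectively.
v0, v1, v2 = (0, 0, 0), (1, 1, 0), (1, 0, 1)
triangle_edges = {frozenset((v0, v1)), frozenset((v0, v2)), frozenset((v1, v2))}

# Z_T anticommutes with an X-stabilizer exactly when their supports overlap oddly.
broken = [key for key, support in X_STABS.items()
          if len(support & triangle_edges) % 2 == 1]
print(len(broken), sorted(sheet for _, sheet in broken))
```

For this triangle the count is six, two in each sheet, matching the statement above; a product of triangles would be tested the same way after taking the symmetric difference of their edge sets.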
4.3 Reachable Cross-Sheet Logicals
Theorem 3 (Cross-sheet reachability). Let $B \in \mathrm{GF}(2)^{|T| \times n_{\mathrm{edges}}}$ be the triangle-edge incidence matrix and $H_X$ the cross-sheet X-stabilizer matrix. Define the space of valid triangle products
\[
V = \{ m \cdot B : m \in \mathrm{GF}(2)^{|T|},\ H_X (m \cdot B)^T = 0 \}.
\]
Then $\dim(V \bmod \text{row span}(H_Z)) = 6L - 3$, and every operator in this space has support on exactly two sheets.
Verification: At $L = 4$: 21 logicals (out of $6L = 24$), all 2-sheet, distributed as 7 per sheet pair. At $L = 6$: 33 logicals (out of 36), all 2-sheet, distributed as 11 per sheet pair. The missing 3 logicals are global homological cycles that no triangle product can form.
Theorem 4 (Per-sheet coverage). For each sheet $S_i$, the projection of triangle-reachable cross-sheet logicals onto the Z-logical space of $S_i$ has dimension $L$ for each partner sheet $S_j$ ($j \ne i$). The union of projections via both partners covers the full $2L$-dimensional Z-logical space of $S_i$.
Verification at L = 4: Each sheet pair reaches a 4-dimensional subspace (= L) of each
sheet’s 8-dim (= 2L) Z-logical space. The two partner-pair subspaces are distinct; their
union is the full 8-dim space.
4.4 Operational Consequence
Every Z-logical (and by CSS symmetry, every X-logical) of every sheet can participate in
a triangle-mediated joint measurement with at least one partner sheet. Combined with
fresh ancilla logical qubits and standard surgery protocols [6, 7], cross-sheet CNOTs are
implementable between any two logical qubits.
5 Fault-Tolerant Surgery Protocol
5.1 Ancilla Placement
Each surgery primitive uses L triangles forming a localized cluster on the FCC lattice.
For each triangle T , a measurement ancilla is placed at the centroid of T ’s three vertex
positions, coupled to its 3 data qubits via short-range couplers (K = 3 at the ancilla).
Flag-qubit protocol at small L. A weight-3 measurement with single-fault propagation produces data errors of weight at most 2. For correctability, we require $w \le (d + 1)/2$, equivalently $d = L \ge 5$ for weight-3 measurements ($w = 3$). For $L \ge 6$, no flag qubits are needed: the per-sheet code distance suffices for fault-tolerant triangle measurements. For $L = 4$, a flag-qubit protocol [10] catches the worst-case weight-2 propagation.
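The flag-qubit rule can be written as a one-line predicate. This sketch assumes the correctability condition reads $w \le (d+1)/2$ with $w$ the measurement weight and $d = L$; that reading is an interpretation chosen because it reproduces the stated cases ($L = 4$ needs flags, $L \ge 6$ does not). The function name is hypothetical.

```python
def needs_flag_qubits(L, measurement_weight=3):
    """True when d = L fails the assumed condition w <= (d + 1) / 2."""
    return not (measurement_weight <= (L + 1) / 2)

for L in (4, 6, 8):
    print(L, needs_flag_qubits(L))
```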
5.2 K=4 Connectivity Verification
Each data qubit retains 4 physical couplers throughout. During surgery, some couplers
reconfigure from per-sheet ancillas to triangle ancillas. Specifically, at L = 4 with the
canonical 4-triangle surgery primitive:
| Data qubit | Triangle couplers (surgery) | Per-sheet couplers retained | Total K |
|---|---|---|---|
| $(2, 8)_{xy}$, $(3, 9)_{xy}$ | 2 | 2 | 4 |
| All other involved edges | 1 | 3 | 4 |
The two xy-sheet edges shared between pairs of surgery triangles need 2 triangle couplers;
their per-sheet syndrome extraction continues at reduced bandwidth (1 Z-ancilla + 1 X-
ancilla instead of 2 of each). The other 8 data qubits use 1 triangle coupler and 3 per-sheet
couplers, maintaining full per-sheet syndrome rates. The K = 4 envelope is preserved
throughout the surgery operation.
5.3 Gate Schedule
The CNOT gates of the triangle measurements are scheduled via graph coloring on the conflict graph $G_{\mathrm{conflict}}$ (nodes: surgery triangles; edges: shared data qubits). At $L = 4$, the 4-triangle primitive’s conflict graph requires 2 colors; with 3 CNOTs per triangle, the surgery operation completes in $3 \times 2 = 6$ time slots. For comparison, a standard per-sheet syndrome extraction round takes 6–8 time slots.
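The scheduling step can be sketched with a greedy coloring of the conflict graph. The triangle contents below are hypothetical: the paper only states that two xy-sheet edges, labelled (2, 8) and (3, 9), are each shared by a pair of triangles, so the pairing and the remaining edge labels are placeholders. Greedy coloring is used for brevity and is not guaranteed to be optimal on general graphs.

```python
from itertools import combinations

# Hypothetical 4-triangle primitive: each triangle lists its three data qubits.
triangles = {
    "T0": {"xy(2,8)", "xz_a", "yz_a"},
    "T1": {"xy(2,8)", "xz_b", "yz_b"},
    "T2": {"xy(3,9)", "xz_c", "yz_c"},
    "T3": {"xy(3,9)", "xz_d", "yz_d"},
}

def conflict_graph(tris):
    """Two triangles conflict when they share at least one data qubit."""
    graph = {t: set() for t in tris}
    for a, b in combinations(tris, 2):
        if tris[a] & tris[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def greedy_coloring(graph):
    """Give each node the smallest color not used by an already-colored neighbor."""
    color = {}
    for node in sorted(graph):
        used = {color[n] for n in graph[node] if n in color}
        color[node] = next(c for c in range(len(graph)) if c not in used)
    return color

coloring = greedy_coloring(conflict_graph(triangles))
n_colors = len(set(coloring.values()))
time_slots = 3 * n_colors  # 3 CNOTs per triangle, one color class at a time
print(coloring, n_colors, time_slots)
```

With these placeholder triangles the conflict graph is two disjoint edges, so two colors suffice and the schedule uses $3 \times 2 = 6$ slots.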
5.4 Overhead Analysis
At lattice size $L$:

| Quantity | Per syndrome round (per-sheet) | Per surgery operation |
|---|---|---|
| Ancillas (3 sheets) | $3L^3$ | $L$ measurement ancillas |
| Two-qubit gates | $12L^3$ | $3L$ to $5L$ |
| Time slots | 6–8 | 6 |
| Fraction of per-round cost (at $L = 10$) | | < 0.5% |
Surgery has subleading gate cost: $O(L)$ two-qubit gates for the triangle measurements compared with $O(L^3)$ per syndrome round for the per-sheet stabilizer extraction.
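The quoted fraction follows directly from the table entries. The snippet below is a plain arithmetic check using the upper-end gate count $5L$ against $12L^3$ gates per syndrome round; the function name is hypothetical.

```python
def surgery_gate_fraction(L, gates_per_triangle=5):
    """Surgery two-qubit gates (up to 5L) relative to one syndrome round (12 L^3)."""
    return (gates_per_triangle * L) / (12 * L**3)

print(f"{surgery_gate_fraction(10):.4%}")  # 50 / 12000
```

At $L = 10$ this evaluates to about 0.42%, below the 0.5% bound in the table, and the ratio falls as $1/L^2$ for larger lattices.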
Clock-cycle and time-multiplexing impact. The triangle measurements during the merge phase are executed in parallel with the per-sheet syndrome extraction; in a time-multiplexed schedule (Section 3, Option A), the triangle CNOTs occupy the same time slots as the relevant per-sheet CNOTs and do not extend the per-round wall-clock time. The merge phase therefore costs $L$ syndrome rounds at the same clock cycle as static memory, for a total surgery overhead of $3L$ rounds vs. $L$ rounds for memory. On a superconducting transmon platform with 100–400 ns syndrome extraction cycles [9], the $3L$ overhead at $L = 8$ corresponds to roughly 2.4–10 µs of additional wall-clock time per surgery operation, which is well below typical $T_1$, $T_2$ decoherence times (∼100 µs) and below typical logical clock cycles. On a neutral-atom platform with longer per-round times (∼1 ms), the total surgery time is 24 ms at $L = 8$, requiring sustained coherence over this interval; recent neutral-atom QEC demonstrations have shown coherence times exceeding this requirement. Idle errors during the merge phase are captured in the noise model of Section 7 via the per-round depolarizing channel.
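The wall-clock figures are products of the $3L$ round count and the per-round time. The helper below is an illustrative calculator (its name and the `rounds_factor` parameter are assumptions introduced here), evaluated at the cycle times quoted in the text.

```python
def surgery_wall_clock(L, round_time_s, rounds_factor=3):
    """Surgery duration in seconds: rounds_factor * L rounds of the given length."""
    return rounds_factor * L * round_time_s

for label, t in [("transmon, 100 ns", 100e-9),
                 ("transmon, 400 ns", 400e-9),
                 ("neutral atom, 1 ms", 1e-3)]:
    print(f"{label}: {surgery_wall_clock(8, t):.3e} s")
```

At $L = 8$ this gives 2.4 µs and 9.6 µs for the two transmon cycle times and 24 ms for the neutral-atom case.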
5.5 Decoder Graph
The decoder for surgery operations operates on a combined detector graph: per-sheet
syndrome detectors (vertex Z, oct void X) plus triangle measurement detectors. Triangle-
triangle correlations arise from shared data qubits: a Z error on a shared edge flips both
triangles’ outcomes simultaneously. Standard MWPM [14] applies directly to this graph;
the matcher extends the per-sheet syndrome graph with cross-sheet edges induced by
triangle measurements [6, 7].
5.6 Boundary Deformation: Broken Stabilizers and Their Reconstruction
A standard concern in lattice surgery is that the merge operation temporarily disrupts
the per-block stabilizer structure: some stabilizers become gauge operators during the
merge and must be reconstructed afterward. We characterize this disruption precisely for
the FCC triangle primitive.
Lemma 2 (Broken X-stabilizers per triangle). Each individual triangle $T$ with edges $(e_{xy}, e_{xz}, e_{yz})$, one per sheet, anticommutes with exactly six per-sheet X-stabilizers: the two octahedral voids in each sheet that contain one of the triangle’s edges. The breakdown is two X-stabilizers in $S_{xy}$, two in $S_{xz}$, and two in $S_{yz}$.
Proof. A triangle Z-operator $Z_T$ acts on three edges. Each edge $e \in S_i$ is contained in exactly two octahedral voids of $S_i$, since each FCC edge connects two oct-void neighbors. The X-stabilizer of an oct void contains $e$ as one of its four support edges. Therefore $Z_T$ overlaps each such X-stabilizer in exactly one edge (odd), and anticommutes with it. The six X-stabilizers (two per sheet) are distinct since they correspond to distinct oct voids.
Gauge structure during surgery. During the multi-round merge, individual triangle measurements yield outcomes that, by Lemma 2, do not commute with $6L$ per-sheet X-stabilizers across the $L$ triangles of the surgery primitive. These $6L$ X-stabilizer measurements become gauge bits: their outcomes are correlated with the triangle outcomes but do not constrain the merged code’s logical subspace. Verified numerically:

| L | Triangles in primitive | Distinct X-stabs broken | Per-triangle broken | Net broken |
|---|---|---|---|---|
| 4 | 4 | 12 | 6 | 0 |
| 6 | 6 | 18 | 6 | 0 |

The “net broken” column counts X-stabilizers with odd total flip count across the $L$ triangles. This is zero by construction: the triangle product $\prod_T Z_T = Z_A \otimes Z_B$ commutes with all X-stabilizers (Theorem 3), so each X-stabilizer is broken by an even number of triangles in the primitive.
Post-merge reconstruction. After the d-round merge phase, the X-stabilizer outcomes
of the 6L initially-broken X-stabilizers are reconstructed from the L triangle measurement
outcomes plus the surviving stabilizer constraints. Specifically, in the post-merge code,
each previously-broken X-stabilizer’s eigenvalue is the modulo-2 sum of: (i) its pre-merge
eigenvalue, (ii) the triangle measurement outcomes of those triangles in the primitive
whose Z-operator overlaps it in odd parity, and (iii) any propagated Pauli corrections
from the surgery protocol. This is the FCC-triangle analog of the standard rough/smooth
boundary deformation in surface code lattice surgery [6]: per-sheet boundary stabilizers
are temporarily “opened” as gauges and “closed” upon completion of the surgery.
No permanent stabilizer modification. Crucially, no per-sheet stabilizer is perma-
nently modified or turned off. The weight-4 per-sheet stabilizers are measured throughout
the merge as normal; their outcomes during the merge are augmented by gauge informa-
tion from the triangle measurements, and post-merge they return to constraining the
per-sheet code with no residual modification.
Single-fault error propagation across sheets. A single fault on a triangle ancilla mid-circuit can propagate to at most two data qubits, which may lie in different sheets. For the canonical CNOT schedule (CNOT from data to ancilla, in the order $e_{xy} \to e_{xz} \to e_{yz}$), a Z error on the ancilla after the $S_{xy}$ CNOT and before the $S_{xz}$ CNOT propagates to one data qubit in $S_{xz}$ and one in $S_{yz}$, because a Z error on a CNOT target copies onto the control of every CNOT still to be executed. This produces a weight-2 cross-sheet error that triggers detectors in two distinct per-sheet syndrome graphs. The decoder graph must include edges spanning these per-sheet graphs to handle such events; this is the technical content of Section 7’s discussion of decoding.
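The propagation rule can be traced with a small Pauli-frame sketch. It uses only the standard identity that a CNOT maps Z on its target to Z on both control and target (Z on the control is unchanged); the qubit numbering and function name are illustrative choices.

```python
def z_support_after_fault(cnots, n_qubits, fault_qubit, cnots_done):
    """Track where a single Z fault ends up after the remaining CNOTs.

    `cnots` is a list of (control, target) pairs. The fault is inserted once
    `cnots_done` gates have run, then the remaining gates are applied.
    """
    z = [0] * n_qubits
    z[fault_qubit] = 1
    for control, target in cnots[cnots_done:]:
        z[control] ^= z[target]  # Z on the target spreads to the control
    return z

# Qubits 0, 1, 2 are the data qubits e_xy, e_xz, e_yz; qubit 3 is the triangle ancilla.
schedule = [(0, 3), (1, 3), (2, 3)]  # data -> ancilla, in the order xy, xz, yz

for done in range(len(schedule) + 1):
    print(done, z_support_after_fault(schedule, 4, fault_qubit=3, cnots_done=done))
```

A fault after the first CNOT leaves Z on the xz and yz data qubits (weight 2 across two sheets), while a fault after the second CNOT reaches only the yz data qubit. A fault before all three CNOTs reproduces the full triangle operator $Z_T$, which is the operator being measured.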
6 Distance Preservation
Theorem 5 (Merged code distance). For any even $L \ge 2$ and any cross-sheet measurement operator $\mathrm{Op} = Z_A \otimes Z_B$ implemented by a weight-$2L$ triangle product, with $Z_A$ supported in sheet $S_i$ and $Z_B$ in sheet $S_j$ ($i \ne j$), the merged code formed by adding Op as a Z-stabilizer has stabilizer rank increased by exactly 1 (consuming one logical qubit), and the minimum-weight logical operator of the merged code has weight $L$. Distance is preserved.
Proof. Stabilizer rank. Op commutes with all original X-stabilizers by Theorem 3 (it is the product of triangle Z-operators chosen to be in the joint kernel of $H_X$). Furthermore, Op is not in the row span of $H_Z$ since it is a non-trivial element of the Z-logical group. Therefore appending Op to $H_Z$ increases the rank by 1, and the merged code has $k_{\mathrm{merged}} = k_{\mathrm{pre}} - 1$ logical qubits.
Minimum logical weight. The merged Z-logical group is the original Z-logical group quotiented by the subgroup $\langle \mathrm{Op} \rangle$. Each non-trivial equivalence class has the form $\{g, g \cdot \mathrm{Op}\}$ for a representative $g$ in the original Z-logical group, with $g$ not equivalent to Op modulo the original stabilizers.
Decompose any $g$ as $g = g_{xy}\, g_{xz}\, g_{yz}$ where $g_s$ denotes the restriction of $g$ to sheet $S_s$. Similarly Op decomposes as $\mathrm{Op} = (\mathrm{Op})_i\, (\mathrm{Op})_j$ with $(\mathrm{Op})_i = Z_A$ of weight $L$ in $S_i$ and $(\mathrm{Op})_j = Z_B$ of weight $L$ in $S_j$. Then
\[
\mathrm{wt}(g) = \mathrm{wt}(g_{xy}) + \mathrm{wt}(g_{xz}) + \mathrm{wt}(g_{yz}), \tag{3}
\]
\[
\mathrm{wt}(g \cdot \mathrm{Op}) = \mathrm{wt}(g_k) + \mathrm{wt}(g_i Z_A) + \mathrm{wt}(g_j Z_B), \tag{4}
\]
where $k$ is the third sheet ($k \notin \{i, j\}$).
By the Layer Decomposition Theorem (Theorem 2), each non-trivial logical $g_s$ on sheet $S_s$ has weight $\ge L$ (per-layer 2D toric code distance).
Since $g$ is non-trivial in the merged code, at least one of the following holds:
(a) $g_k \not\equiv 0 \pmod{\mathrm{stab}\, S_k}$, i.e., $g_k$ is a non-trivial logical of $S_k$. Then $\mathrm{wt}(g_k) \ge L$. Both $\mathrm{wt}(g)$ and $\mathrm{wt}(g \cdot \mathrm{Op})$ contain $\mathrm{wt}(g_k) \ge L$ as a summand, so the class minimum is $\ge L$.
(b) $g_k$ is trivial in $S_k$ (i.e., $g_k = 0$ or a stabilizer), but $g_i$ is a non-trivial logical of $S_i$. Then $\mathrm{wt}(g_i) \ge L$, so $\mathrm{wt}(g) \ge L$. For $\mathrm{wt}(g \cdot \mathrm{Op})$, the contribution $\mathrm{wt}(g_j Z_B)$ is either $\ge L$ (if $g_j$ and $Z_B$ are in distinct logical classes, or if $g_j$ is a stabilizer leaving the $Z_B$ contribution of weight $L$) or zero (if $g_j \equiv Z_B \pmod{\mathrm{stab}\, S_j}$). In the zero case, $g \cdot \mathrm{Op}$ has contributions only from $g_i Z_A$ (in $S_i$, weight $\ge L$) and $g_k$ (in $S_k$, possibly weight 0). Therefore $\mathrm{wt}(g \cdot \mathrm{Op}) \ge L$.
(c) By symmetry with (b), interchanging $i$ and $j$.
(d) Multiple sheets contribute non-trivial logicals. Then both $\mathrm{wt}(g)$ and $\mathrm{wt}(g \cdot \mathrm{Op})$ inherit contributions from at least two non-trivial per-sheet logicals, each $\ge L$, so the class minimum is $\ge L$.
Achievability of L. The class containing $Z_A$ has representatives $\{Z_A, Z_A \cdot \mathrm{Op}\}$. Since $Z_A \cdot \mathrm{Op} = Z_A \cdot Z_A Z_B = Z_B$ modulo per-sheet stabilizers, this class equals $\{Z_A, Z_B\}$ with representatives of weight $L$ each. Therefore the minimum is achieved.
Computational verification. The proof above is independent of $L$. To rule out edge cases, we additionally verified Theorem 5 by exhaustive enumeration at small $L$. At $L = 4$, all $6L = 24$ per-sheet logical generators give class minimum exactly 4, and the same minimum holds for all $\binom{24}{2} = 276$ pair products. At $L = 6$, all 36 generators and $\binom{36}{2} = 630$ pair products give class minimum exactly 6. No combination produced a class minimum below $L$.
6.1 X-Distance Preservation
Theorem 5 addresses the Z-logical group of the merged code, which is the side directly modified by the cross-sheet Z-measurement $\mathrm{Op} = Z_A \otimes Z_B$. The X-side requires a separate argument: the merged X-logical group is the subgroup of the original X-logicals that commute with Op, modulo the original X-stabilizers (which are unchanged by the surgery).
Theorem 6 (X-distance preservation). Under the same conditions as Theorem 5, the
merged code’s X-distance equals L (the per-sheet code distance). The merged code’s X-
logical group consists of all original X-logicals that have even overlap with the support of
Op, modulo the (unchanged) X-stabilizer group.
Proof. Setup. The original code has X-stabilizer matrix $H_X$, Z-logical group $\mathcal{L}_Z$, and X-logical group $\mathcal{L}_X$. After surgery, the merged code has stabilizer matrices $H_X^{\mathrm{merged}} = H_X$ (unchanged) and $H_Z^{\mathrm{merged}} = H_Z \cup \{\mathrm{Op}\}$. The merged X-logical group is the normalizer of the merged stabilizer group restricted to X-type operators, modulo the merged X-stabilizers:
\[
\mathcal{L}_X^{\mathrm{merged}} = \{ g \in \mathcal{L}_X : [g, \mathrm{Op}] = 0 \} \,/\, \langle H_X \rangle .
\]
The commutativity condition $[g, \mathrm{Op}] = 0$ for $g$ an X-operator and Op a Z-operator reduces to: $g$ has even-parity overlap with the support of Op.
Counting. The original X-logical group has 6L generators (2L per sheet for the toric
code; L per sheet for the planar variant). Op = $Z_A Z_B$, where A is a Z-logical of sheet $S_i$
and B is a Z-logical of sheet $S_j$. The X-logical $X_A$ (the conjugate of A in $S_i$) anticommutes
with $Z_A$ (per-sheet anticommutation) and commutes with $Z_B$ (disjoint sheet supports),
so $X_A$ anticommutes with Op. Symmetrically, $X_B$ anticommutes with Op. All other
generators of $L_X$ commute with Op: per-sheet X-logicals in sheets $S_k$, $k \neq i, j$, have
disjoint support from Op and commute trivially; per-sheet X-logicals in $S_i$ or $S_j$ other
than $X_A$ or $X_B$ are independent of $X_A$ (resp. $X_B$) and the per-sheet anticommutation
structure ensures they commute with the relevant component of Op.
The merged X-logical group has 6L − 2 commuting generators from the original 6L,
but the product $X_A \cdot X_B$ (the sum of the two anticommuting generators) commutes with Op
(it anticommutes with each component, so the total overlap is even). This product is a non-trivial
X-logical of the merged code, representing the consumed-Z logical’s X-conjugate. Thus
the merged X-logical group has 6L − 1 generators, consistent with $k_{\mathrm{merged}} = 6L - 1$ (one
Z-logical consumed by adding Op).
Minimum weight. The merged X-logical generators are of two types:
1. Original per-sheet X-logicals that commute with Op: weight L each (per-sheet 2D
toric distance, Theorem 2).
2. The new generator $X_A \cdot X_B$: weight 2L (disjoint supports across sheets $S_i$ and $S_j$).
The minimum weight over all generators is L. Linear combinations of generators yield at
least weight L by the same layer-decomposition argument as Theorem 5: each non-trivial
component on a single sheet contributes weight ≥ L. Therefore the merged X-distance
equals L.
Computational verification. We confirmed Theorem 6 by enumerating all 6L X-logical
generators at L = 4 (24 generators, weight 4 each) and L = 6 (36 generators, weight 6
each), identifying the anticommuting pair (exactly 2 generators anticommute with Op at
both L values, confirming the count 6L − 2 of single-generator commuters), and confirming
that the minimum weight among the merged X-logical generators is exactly L in each case.
The merged X-distance is therefore L, matching the merged Z-distance from Theorem 5.
Combined merged-code distance. Theorems 5 and 6 together establish: the merged
code distance (minimum over all non-trivial logical operators of the merged code) equals
L. Distance is preserved under the surgery operation for both Z-side and X-side errors.
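The even-overlap criterion used in the proof of Theorem 6 can be illustrated on a toy example. The sketch below represents X-type and Z-type Pauli operators by their supports (sets of qubit indices). The eight-qubit layout and all operator supports are hypothetical and chosen only to mimic the counting argument; they are not the FCC sheet code.

```python
def commutes(x_support: set, z_support: set) -> bool:
    """An X-type Pauli commutes with a Z-type Pauli iff their supports
    overlap on an even number of qubits."""
    return len(x_support & z_support) % 2 == 0

def x_product(a: set, b: set) -> set:
    """Product of two X-type Paulis: symmetric difference of supports (GF(2) sum)."""
    return a ^ b

# Hypothetical toy layout: qubits 0-3 play the role of sheet S_i, qubits 4-7 of sheet S_j.
Z_A = {0, 1}        # stand-in Z-logical on S_i
Z_B = {4, 5}        # stand-in Z-logical on S_j
Op = Z_A ^ Z_B      # joint operator Z_A Z_B (supports are disjoint)
X_A = {0, 2}        # overlaps Z_A on one qubit, so it anticommutes with Op
X_B = {4, 6}        # overlaps Z_B on one qubit, so it anticommutes with Op
X_other = {2, 3}    # no overlap with Op

print("X_A commutes with Op:", commutes(X_A, Op))
print("X_B commutes with Op:", commutes(X_B, Op))
print("X_A * X_B commutes with Op:", commutes(x_product(X_A, X_B), Op))
print("X_other commutes with Op:", commutes(X_other, Op))
```

In this toy, two generators individually anticommute with Op while their product commutes, which is the same pattern that takes the count from 6L − 2 single-generator commuters to 6L − 1 merged generators.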
7 Threshold Simulation
We characterize the surgery operation threshold using a custom Stim circuit [13] with
explicit triangle measurements, decoded with MWPM via PyMatching [14]. The static
memory threshold is reported as a complementary baseline.
7.1 Custom Stim Circuit Construction
We construct a Stim circuit that implements the full surgery operation: two triad sheets
($S_{xz}$ and $S_{yz}$ in the L=4 surgery primitive), their per-sheet vertex Z-stabilizer and oct-void
X-stabilizer measurements, the L triangle ancilla measurements during the merge phase,
and the auxiliary $S_{xy}$ data qubits required by the triangle measurements. The circuit
comprises three phases of d syndrome rounds each (pre-merge, merge, post-merge), with
the triangle measurements active only during the merge phase. During merge, the 6L
X-stabilizers broken by the individual triangle measurements (Lemma 2) have their detectors
skipped, in accordance with the gauge structure of Section 5.6.
The circuit and its detector error model (DEM) are constructed reproducibly from the
code at the URL given in Section 10. Key statistics:
L    Total qubits    Triangles    Broken X-stabs                                    DEM error mechanisms
4    324             4            12 (8 in $S_{xz}$ + $S_{yz}$, 4 in $S_{xy}$)      23,946
6    750             6            18 (12 in $S_{xz}$ + $S_{yz}$, 6 in $S_{xy}$)     70,434
8    2,568           8            24 (16 in $S_{xz}$ + $S_{yz}$, 8 in $S_{xy}$)     539,215
At L = 4, the DEM decomposes into 23,946 error mechanisms via Stim’s
decompose_errors=True option, which decomposes hyperedges into graph-like edges
where possible. At L = 8, the corresponding DEM has 539,215 error mechanisms. The
resulting graphs are compatible with PyMatching’s MWPM decoder at all three sizes.
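The three-phase round structure described at the start of this subsection (pre-merge, merge, post-merge, with d rounds each and triangle measurements active only during merge) can be sketched as a simple schedule. This is an illustrative outline under our own naming (`surgery_schedule` is hypothetical); it does not build the Stim circuit and is not taken from the reference implementation.

```python
from typing import List, Tuple

def surgery_schedule(d: int) -> List[Tuple[int, str, bool]]:
    """Return (round index, phase name, triangles_active) for the 3d syndrome rounds."""
    phases = [("pre-merge", False), ("merge", True), ("post-merge", False)]
    schedule = []
    for phase_idx, (name, triangles_active) in enumerate(phases):
        for r in range(d):
            schedule.append((phase_idx * d + r, name, triangles_active))
    return schedule

for round_idx, phase, triangles in surgery_schedule(4):
    print(round_idx, phase, "triangles on" if triangles else "triangles off")
```

In the circuit described above, the rounds flagged as merge rounds are also the ones in which the detectors of the broken X-stabilizers are skipped.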
7.2 Surgery Operation Threshold
We perform Z-basis logical memory experiments with the joint observable $Z_A Z_B$ (the
cross-sheet logical that the surgery primitive measures), running 3d syndrome rounds
(pre-merge d, merge d, post-merge d) followed by destructive Z-measurement. Three code
distances were measured: L = 4, 6, 8.
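p (%)    L = 4                     L = 6                              L = 8
0.10     $5.0 \times 10^{-3}$      $< 2 \times 10^{-3}$ (0/500)       $< 10^{-3}$ (0/1000)
0.20     $1.7 \times 10^{-2}$      $1.5 \times 10^{-3}$               $< 10^{-3}$ (0/1000)
0.30     $3.9 \times 10^{-2}$      $8.0 \times 10^{-3}$               $< 10^{-3}$ (0/1000)
0.40     -                         -                                  $5.5 \times 10^{-3}$
0.50     $9.7 \times 10^{-2}$      $3.8 \times 10^{-2}$               $2.5 \times 10^{-2}$
0.60     -                         -                                  $4.3 \times 10^{-2}$
0.70     $1.8 \times 10^{-1}$      $1.1 \times 10^{-1}$               $8.7 \times 10^{-2}$
0.80     $2.3 \times 10^{-1}$      $1.6 \times 10^{-1}$               $1.2 \times 10^{-1}$
0.90     $2.8 \times 10^{-1}$      $2.2 \times 10^{-1}$               $2.0 \times 10^{-1}$
1.00     $3.08 \times 10^{-1}$     $2.77 \times 10^{-1}$              $2.71 \times 10^{-1}$
1.10     $3.3 \times 10^{-1}$      $3.2 \times 10^{-1}$               $3.4 \times 10^{-1}$
1.20     $3.6 \times 10^{-1}$      $4.0 \times 10^{-1}$               $4.2 \times 10^{-1}$
1.50     $4.4 \times 10^{-1}$      $4.6 \times 10^{-1}$               $4.9 \times 10^{-1}$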
Pairwise crossings and threshold estimate. The pairwise crossings of L = 4 vs
L = 6, L = 4 vs L = 8, and L = 6 vs L = 8 give three independent estimates:
Crossing            $p_{\mathrm{cross}}$
L = 4 vs L = 6      1.109%
L = 4 vs L = 8      1.069%
L = 6 vs L = 8      1.024%
The three crossings span 1.02% to 1.11%. We report a finite-size threshold estimate of
$p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$, where the band reflects the spread of pairwise crossings. Below
threshold, the logical error rate decreases monotonically with L: at p = 0.5%, it is 9.7%
at L = 4, 3.8% at L = 6, and 2.5% at L = 8, as expected for a fault-tolerant
operation in the below-threshold regime.
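To make the crossing estimate concrete, the sketch below locates the pairwise crossings by linear interpolation between the tabulated near-threshold rows (p = 0.90% to 1.20%). The interpolation scheme is our assumption: the text does not state how the reported crossings were computed, and the analysis script may fit the curves differently, so the values printed here are close to, but not identical with, the reported 1.109%, 1.069% and 1.024%.

```python
def crossing(p, rate_a, rate_b):
    """Linearly interpolate the first p at which rate_a - rate_b changes sign."""
    diffs = [a - b for a, b in zip(rate_a, rate_b)]
    for k in range(len(p) - 1):
        d0, d1 = diffs[k], diffs[k + 1]
        if d0 == 0:
            return p[k]
        if d0 * d1 < 0:
            return p[k] + (p[k + 1] - p[k]) * d0 / (d0 - d1)
    return None  # no crossing inside the sampled window

# Near-threshold rows of the Section 7.2 table (p in percent, logical error rates).
p = [0.90, 1.00, 1.10, 1.20]
rates = {
    4: [0.28, 0.308, 0.33, 0.36],
    6: [0.22, 0.277, 0.32, 0.40],
    8: [0.20, 0.271, 0.34, 0.42],
}
for a, b in [(4, 6), (4, 8), (6, 8)]:
    print(f"L={a} vs L={b}: {crossing(p, rates[a], rates[b]):.3f}%")

# Centre and half-range of the three crossings reported in the text.
reported = [1.109, 1.069, 1.024]
centre = sum(reported) / len(reported)
half_range = (max(reported) - min(reported)) / 2
print(f"centre {centre:.3f}%, half-range {half_range:.3f}%")
```

The half-range of the three reported crossings is slightly under 0.05%, so the quoted ±0.05% band covers all three.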
Shot counts. Rows use at least 1,000 shots per point, except the L = 6, p = 0.10% point
(500 shots); the near-threshold rows (p = 0.9%–1.2%) use up to 8,000 shots per point for
tighter confidence intervals. The standard error on each rate is the binomial
$\sqrt{p(1-p)/N}$ and is plotted as error bars in Figure 3.
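As a worked illustration of the error bars, the sketch below evaluates the binomial standard error for one near-threshold point and an upper bound for a zero-failure point. The failure count of 2,464 out of 8,000 is a hypothetical example chosen to give a rate of 0.308; the raw counts live in the results file and are not reproduced here. The 95% zero-failure bound is our addition for comparison; the table itself quotes the simpler 1/N figure.

```python
from math import sqrt

def binomial_standard_error(failures: int, shots: int) -> float:
    """Standard error sqrt(p(1-p)/N) of an observed failure rate."""
    p_hat = failures / shots
    return sqrt(p_hat * (1.0 - p_hat) / shots)

def zero_failure_upper_bound(shots: int, confidence: float = 0.95) -> float:
    """Largest rate p still compatible with 0 failures in `shots` trials:
    solves (1 - p)^shots = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / shots)

# Hypothetical counts: 2464 failures in 8000 shots gives a rate of 0.308.
print(f"standard error: {binomial_standard_error(2464, 8000):.5f}")
# Zero observed failures in 1000 shots.
print(f"95% upper bound: {zero_failure_upper_bound(1000):.5f}")
```

The 95% bound for 0/1000 is roughly 3/N, about three times the 1/N figure shown in the table, which is worth keeping in mind when reading the low-p rows.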
This is a finite-size surgery threshold estimate under circuit-level depolarizing noise,
obtained with MWPM decoding on the actual custom Stim circuit with explicit triangle
measurements, not a proxy. The DEM is constructed via Stim’s
decompose_errors=True option and decoded by PyMatching’s MWPM. Each circuit is
reproducibly built from the cached surgery primitive at the corresponding L.
Below threshold, the d-scaling is strong: at p = 0.1%, L = 4 surgery has a logical error
rate of $5 \times 10^{-3}$, while L = 6 and L = 8 surgery show zero observed errors in 500 and
1,000 shots respectively (rates below $2 \times 10^{-3}$ and $10^{-3}$).
7.3 Comparison with Proxy Estimate
A natural proxy for the surgery threshold is a single rotated surface code at distance L run
for 3d rounds (the same total syndrome budget as the surgery protocol), giving
$p_{\mathrm{th}}^{\mathrm{proxy}} \approx 0.5\%$. The custom Stim circuit with L = 4, 6, 8 finite-size scaling gives a substantially
higher threshold (1.07% ± 0.05%). The proxy is conservative in three ways:
1. The proxy implicitly assumes the surgery operation suffers the same logical error
rate as 3d rounds of memory. In reality, the joint observable $Z_A Z_B$ is more robust
than a single non-contractible cycle: it has support on 2L data qubits (twice as
many), but the cross-sheet structure exploits the independence of the two sheets’
error processes.
2. The proxy used a single surface code as a stand-in for the full FCC sheet code. The
actual sheet code’s layer decomposition (Theorem 2) means the per-sheet effective
code is L parallel 2D toric codes, and an X-error on one layer does not propagate
to other layers, allowing the decoder to localize errors per-layer.
3. The proxy did not account for the triangle measurements’ role in providing ad-
ditional syndrome information during the merge phase. The triangle measure-
ments act as auxiliary syndrome bits, supplementing the per-sheet syndromes during
merge.
7.4 Static Memory Threshold (Single-Sheet Baseline)
For comparison with the surgery threshold, we also measure the single-sheet static memory
threshold using a custom circuit (no triangle measurements). We run a Z-basis memory
experiment with d syndrome rounds on the sheet $S_{xy}$ at L=4 and L=6:
p (%)    L = 4, d = 4             L = 6, d = 6
0.10     $8.0 \times 10^{-3}$     $6.0 \times 10^{-4}$
0.30     $2.6 \times 10^{-2}$     $4.4 \times 10^{-3}$
0.50     $5.3 \times 10^{-2}$     $1.9 \times 10^{-2}$
0.80     $9.0 \times 10^{-2}$     $5.4 \times 10^{-2}$
1.20     $1.6 \times 10^{-1}$     $1.5 \times 10^{-1}$
The single-sheet static threshold is $p_{\mathrm{th}}^{\mathrm{static}} \approx 1.0\%$ on this custom circuit, consistent with
the surface code literature.
7.5 Decoder: What’s Verified and What Remains Open
The custom Stim circuit of Section 7.1 produces a decomposable detector error model
with 23,946 error mechanisms at L = 4 surgery. We characterize the decoder situation
honestly.
What this paper verifies.
1. The DEM constructs cleanly via Stim’s decompose_errors=True option, indicating
that the hyperedges arising from cross-sheet error propagation (single faults on
triangle ancillas propagating to data qubits in two distinct sheets, Section 5.6)
admit a graph-like decomposition into 2-edges.
2. MWPM decoding via PyMatching is applicable to the decomposed graph at L =
4, 6, 8, giving the finite-size threshold estimate $p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$.
3. The broken-X-stabilizer detector handling described in Section 5.6 (skip the 6L
broken X-stab detectors during merge phases) is computationally consistent and
produces a deterministic detector pattern when combined with the triangle mea-
surement outcomes.
What remains open.
1. Whether BP+OSD [15] would yield a higher threshold than MWPM on the same
DEM. The hyperedge decomposition is lossy in general; specialized decoders may
extract additional information.
2. Behavior at larger code distances (L ≥ 8). The L=6 vs L=4 crossover gives a
threshold estimate, but the full distance-scaling exponent requires L=8 and beyond,
which is computationally expensive (the L=8 surgery circuit has 1600 qubits).
3. Behavior of the three-sheet code (rather than the two-sheet circuit we built). The
simulation here uses two sheets ($S_{xz}$, $S_{yz}$) where the surgery primitive’s logical
observable lives. The third sheet ($S_{xy}$) provides only auxiliary data qubits for the
triangle measurements. A full three-sheet circuit would add per-sheet error correction
on $S_{xy}$; we expect the threshold to remain similar since the surgery observable
does not depend on the $S_{xy}$ logical state.
4. Decoder optimization specific to the FCC sheet code’s symmetries (octahedral group
of order 48) may yield further improvement.
Conclusion. The custom Stim circuit confirms the surgery operation is fault-tolerant
under MWPM decoding at thresholds compatible with current quantum hardware. The
DEM hyperedge structure decomposes cleanly into graph-like edges, addressing the prin-
cipal concern of Section 5.6’s cross-sheet fault analysis.
8 Comparison with State of the Art
8.1 What This Code Is, and What It Is Not
Before the detailed comparison, we state plainly where this code sits in the QEC landscape.
The sheet code is not a constant-rate qLDPC code. Its encoding rate is
$k/n = 2L/L^3 = 2/L^2$, which vanishes as $L \to \infty$, the same asymptotic scaling as the surface
code’s $1/d^2$. Bivariate bicycle codes and recent “good” qLDPC codes achieve constant or
growing rates at the cost of higher connectivity (K=6 or higher, with long-range couplers).
The sheet code does not solve the surface code’s asymptotic rate problem.
What the sheet code does solve. The sheet code addresses a different limitation:
the absence of native cross-block logical gates at K=4 connectivity. Surface code lattice
surgery operates within a single planar substrate. Bivariate bicycle codes target inter-
block logical gates between physically separated modules, but at K=6 and with active
research on the inter-block gate protocols. The sheet code offers a third option: inter-
sheet logical gates between logically distinct but co-located CSS code blocks, implemented
natively at K=4 via local triangle measurements. The trade is explicit: surface-code-level
rate in exchange for surface-code-level connectivity with native inter-sheet gates.
When to choose this code. For workloads dominated by frequent inter-block logical
operations on near-term K=4 hardware, where the rate cost of surface-code-style scaling
is acceptable in exchange for cheap inter-block gates, the sheet code is a candidate. For
workloads dominated by storage of many logical qubits with minimal inter-block traffic,
bivariate bicycle codes remain preferable provided K=6 hardware is available.
8.2 Quantitative Comparison Table
Table 1: Comparison of CSS codes at code distance d (or L). Rate is $k/n_{\mathrm{data}}$. “Wraps”
indicates periodic boundary couplers required. “Cross-block” denotes logical gates between
independent code blocks. The Gross BB code uses fixed d = 12, k = 12, n = 144.
Code                                 $n_{\mathrm{data}}$   k           d     Rate       K            Cross-block gates          Threshold
Rotated surface [5]                  $d^2$                 1           d     $1/d^2$    4            Lattice surgery            0.7–1.0%
2D toric [4]                         $2d^2$                2           d     $1/d^2$    4 (wraps)    Lattice surgery            0.7%
3D toric [3]                         $3L^3$                3           L     $1/L^3$    6            Limited                    -
Gross BB [11]                        144                   12          12    1/12       6            Active development [12]    0.7%
Two-gross BB [11]                    288                   12          18    1/24       6            Active development         -
Full FCC [1]                         $3L^3$                $2L^3+2$    3     2/3        12           -                          -
Sheet code (planar, 1 sheet)         $L^3$                 L           L     $1/L^2$    4            Triangle surgery           1.0%
Sheet code (toric, 1 sheet)          $L^3$                 2L          L     $2/L^2$    4 (wraps)    Triangle surgery           1.0%
Sheet code (toric, 3 sheets, MUX)    $3L^3$                6L          L     $2/L^2$    4 (wraps)    Triangle surgery           1.07% ± 0.05%
8.3 Where the Sheet Code Wins
1. Logical density at K=4 with inter-sheet gates. The sheet code’s toric variant
encodes 2L logicals per sheet, twice the rate of an equivalent number of K=4 rotated
surface code patches. With cross-sheet surgery via triangles, all 6L logicals on three
sheets are mutually addressable for joint Pauli measurements, a feature the surface code
provides only within a single patch’s lattice surgery zone.
2. Single-chip K=4 deployment with three sheets. On hardware with K = 4 static
connectivity (e.g., Google Willow [8]), the three sheets sit on the same chip with reconfigurable
couplers handling the time-multiplexing and the surgery rounds. The bivariate
bicycle architecture explicitly requires K = 6 with some long-range couplers [12].
3. Inter-sheet gates as a native primitive. Surface code lattice surgery operates
within a single 2D substrate; gates between physically separate patches require either
expensive routing or transversal protocols. The sheet code’s cross-sheet surgery uses the
FCC lattice’s intrinsic 3D triangle structure, giving inter-sheet logical gates that share
only lattice positions.
4. Code-distance scaling. Unlike the full $[[3L^3, 2L^3+2, 3]]$ FCC code’s fixed d = 3,
the sheet code achieves growing d = L. Compared to the 3D toric code’s d = L at fixed
k = 3, the sheet code reaches k = 2L per sheet at the same distance.
8.4 Where the Sheet Code Loses
1. Asymptotic rate vs. qLDPC. The sheet code’s rate is $\Theta(1/d^2)$, vanishing as $d \to \infty$,
the same asymptotic scaling as the surface code. Bivariate bicycle codes achieve Θ(1)
rate at distances $d = \Theta(\sqrt{n})$; recent “good” qLDPC constructions achieve Θ(1) rate at
d = Θ(n). At any large fixed distance, the gap is substantial: at d = 12, the Gross BB
code’s rate of 1/12 is 12× higher than the planar sheet code’s rate of $1/L^2 = 1/144$ and
6× higher than the toric variant’s $2/L^2 = 1/72$ at the same
distance. The sheet code does not compete with bivariate bicycle on rate, and we do not
claim otherwise.
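The rate gap at d = 12 can be checked directly from the parameters in Table 1. The short sketch below uses exact fractions; `sheet_rate` is a helper name of ours, and the only inputs are the per-sheet parameters $[[L^3, 2L, L]]$ (toric) or $[[L^3, L, L]]$ (planar) and the Gross code’s $[[144, 12, 12]]$.

```python
from fractions import Fraction

def sheet_rate(L: int, toric: bool = True) -> Fraction:
    """Per-sheet encoding rate k/n with n = L^3 and k = 2L (toric) or L (planar)."""
    k = 2 * L if toric else L
    return Fraction(k, L ** 3)

gross_rate = Fraction(12, 144)  # Gross BB code: k = 12, n = 144

for label, toric in (("toric", True), ("planar", False)):
    r = sheet_rate(12, toric)
    print(f"{label}: rate {r}, Gross/sheet ratio {gross_rate / r}")
```

At L = 12 the ratio is 6 for the toric variant and 12 for the planar variant, so a quoted factor of about 12 corresponds to the planar per-sheet rate.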
2. Toric variant requires wrap couplers. The full 2L logicals per sheet is available
only on the torus; the planar variant has half the rate (L logicals per sheet at distance
L).
3. Partial logical reachability via triangles. Triangle products span 6L − 3 of the
6L cross-sheet logicals; three specific global-wrap correlations are unreachable. CNOTs
between arbitrary logical qubit pairs require routing through ancillas in the right sheets
but are always implementable.
4. Threshold simulation uses MWPM on a decomposed DEM. The custom Stim
circuit DEM decomposes hyperedges via Stim’s decompose_errors=True option; specialized
decoders (e.g., BP+OSD) may yield further threshold improvement.
8.5 The Practical Niche
The sheet code’s value proposition is clearest in the following regime:
• Hardware with K = 4 static connectivity (Google Willow, future deployments
compatible with surface code).
• Code distances in the range d ∈ {6, 8, 10}: small enough that bivariate bicycle’s
d ≥ 12 overhead is not justified, large enough that the sheet code’s distance scaling
matters.
• Workloads that require frequent inter-sheet logical operations, where surface code
patches’ independent existence would force expensive routing.
For longer-term, larger-scale fault tolerance (d ≥ 12), bivariate bicycle codes are likely the
better choice if K = 6 hardware becomes available. For the near term on Google Willow
and similar K = 4 chips, the sheet code provides storage and inter-sheet gates within the
existing connectivity envelope.
9 Discussion
9.1 Universal Gate Set
Joint Pauli measurements via triangle surgery provide CNOTs (via standard merge-split
protocols with ancilla logical qubits) and arbitrary Clifford gates [6, 7]. Non-Clifford gates
(such as T ) require magic state injection or distillation; we expect the FCC lattice’s high
symmetry group (octahedral, order 48) to support efficient magic state protocols, but a
detailed analysis is left to future work.
9.2 Planar-Boundary Surgery
The triangle algebra of Section 4 is derived in the toric setting (periodic boundary con-
ditions). Real hardware deployments use planar variants with rough/smooth boundaries.
We now treat planar-boundary surgery explicitly, identifying the modifications needed and
showing that the surgery primitive remains operational with at most a constant additional
overhead.
Planar per-sheet construction. For each layer of $S_{xy}$ at planar boundaries: the rotated
L×L 2D toric code becomes a rotated surface code with two pairs of boundaries, one rough
(Z-type) and one smooth (X-type). Standard surface-code boundary engineering [5, 7]
applies. Per-sheet logical count drops from 2L (toric) to L (planar): one logical per layer
instead of two. Distance is preserved at L. The three-sheet code with planar boundaries
encodes 3L logical qubits across the three sheets.
Boundary triangles. An FCC triangle lying entirely in the interior of all three sheets
has its three edges in the bulk of $S_{xy}$, $S_{xz}$, $S_{yz}$. A boundary triangle has at least one edge
truncated by a planar boundary in at least one sheet. Counting: at lattice size L with
planar boundaries, the number of FCC triangles is reduced from $4L^3$ to approximately
$4L^3 - O(L^2)$, with the $O(L^2)$ boundary triangles concentrated within O(1) rows of each
sheet boundary.
Surgery primitives in the planar setting. The cross-sheet surgery primitive at L = 4
(Section 5) uses 4 triangles at vertices {0, 1, 2, 3, 8, 9} of the FCC lattice. In the planar
setting, these vertices must lie in the bulk of all three sheets. For L ≥ 6, such an interior
region always exists: the bulk has $\Theta(L^3)$ vertices and the surgery primitive only requires 6
vertices, with the choice of vertex constrained only by the requirement that all 4 triangles
be interior (no boundary truncation). This is satisfied automatically by choosing the
surgery cluster centered in the bulk.
Boundary-aware surgery primitives. For surgeries between logical qubits whose rep-
resentative operators reach the boundary, a modified primitive is needed. Two approaches:
1. Routing through interior ancillas. Use an ancilla logical qubit in the bulk to mediate
the gate, performing two surgeries (one between each target and the ancilla) through
interior triangles. This adds at most one round of merge-split overhead.
2. Boundary-truncated triangles. A boundary triangle with one of its edges truncated
supports a weight-2 effective Z-operator on the two remaining edges. Such truncated
triangles can be used in the surgery primitive at the cost of reduced cross-sheet
coupling per triangle, requiring more triangles to achieve the same joint logical
effect. The threshold is unaffected (the truncated triangle is still a check of weight
at most 3) but the primitive’s qubit footprint grows by a constant factor.
No fundamental obstruction. The CSS condition $H_X H_Z^T = 0$ is preserved by planar
boundary terminators (standard surface-code result). The triangle algebra of Section 4
applies layer-by-layer within the bulk; the Layer Decomposition Theorem (Theorem 2)
extends to planar layers by replacing each 2D toric code with a 2D rotated surface code,
giving per-sheet distance L and per-sheet logical count L. The minimum-weight cross-sheet
operator (Theorem 5) remains of weight 2L when constructed from interior triangles.
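The CSS condition and the two-logicals-per-layer count invoked above can be illustrated on a generic 2D toric code. The sketch below is a stand-in under stated assumptions: it builds the standard (unrotated) L×L toric code with qubits on edges, vertex Z-checks and plaquette X-checks, which differs from the rotated layers of the sheet code and from the planar case; all function names are ours. It checks that every X-check overlaps every Z-check on an even number of qubits (equivalently $H_X H_Z^T = 0$ over GF(2)) and that k = n − rank($H_X$) − rank($H_Z$) = 2.

```python
def toric_code(L: int):
    """Supports of the plaquette X-checks and vertex Z-checks of an L x L toric code.
    Qubits sit on edges: h(x, y) is a horizontal edge, v(x, y) a vertical edge."""
    def h(x, y):
        return (x % L) + L * (y % L)

    def v(x, y):
        return L * L + (x % L) + L * (y % L)

    z_checks = [{h(x, y), h(x - 1, y), v(x, y), v(x, y - 1)}
                for y in range(L) for x in range(L)]
    x_checks = [{h(x, y), h(x, y + 1), v(x, y), v(x + 1, y)}
                for y in range(L) for x in range(L)]
    return x_checks, z_checks

def gf2_rank(rows):
    """Rank over GF(2); each row is a set of column indices, handled as a bitmask."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        m = sum(1 << i for i in row)
        while m:
            top = m.bit_length() - 1
            if top not in pivots:
                pivots[top] = m
                break
            m ^= pivots[top]
    return len(pivots)

L = 4
x_checks, z_checks = toric_code(L)
css_ok = all(len(x & z) % 2 == 0 for x in x_checks for z in z_checks)
n = 2 * L * L
k = n - gf2_rank(x_checks) - gf2_rank(z_checks)
print("CSS condition holds:", css_ok)
print("n =", n, "k =", k)
```

With two logical qubits per toric layer and L layers per sheet, this count matches the 2L logicals per sheet quoted for the toric variant; the planar count of one logical per layer is not exercised by this toy.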
Computational verification (planned). Explicit numerical verification of the planar-
boundary surgery protocol at L = 4, 6, 8, with rough/smooth boundary alternation match-
ing standard surface-code conventions, is reserved for the companion implementation pa-
per accompanying the custom Stim circuit construction (Section 7.5).
9.3 Limitations and Open Problems
• Higher distances (L ≥ 10). The current threshold is established by finite-size
scaling across L = 4, 6, 8. Extension to L = 10 would tighten the confidence band
but requires substantially larger compute time per point (the surgery circuit grows
as $L^3 \approx 1{,}000$ qubits at L = 10). The cached-primitive infrastructure (Section 7.1)
extends to L = 10 but was not exercised in this paper.
• Full logical-channel benchmarking. The current Section 7 simulation measures
the joint observable $Z_A Z_B$. A complete characterization of the merge-split logical
operation requires preparing logical inputs, performing the merge, measuring the
joint observable, performing the split, and verifying all remaining logical observables.
The full CNOT truth-table test (or Pauli transfer behavior) is identified as future
work.
• Explicit planar-boundary numerical verification. Section 9.2 treats planar-boundary
surgery analytically; full numerical simulation of planar variants at L =
4, 6, 8 with rough/smooth boundaries and comparison to the toric data is identified
as future work.
• X-side surgery. The current treatment focuses on Z-side cross-sheet measurement
(Op = $Z_A Z_B$). The CSS-symmetric X-side primitive ($X_A X_B$ via triangle
X-measurements) is described in the protocol but not separately simulated.
• Decoder improvements. Whether BP+OSD [15] or a specialized hypergraph decoder
yields higher threshold than the decomposed-edge MWPM used here is open;
the broken-X-stab structure (Section 5.6) suggests specialized decoders matched to
the surgery’s gauge structure may improve performance.
• Magic state distillation. The octahedral symmetry group (order 48) of the FCC
lattice suggests efficient distillation protocols, but a detailed construction is open.
• Three-layer stacked hardware modeling. Physical characterization of the inter-layer
triangle couplers (TSV fidelity, crosstalk, latency) and their effect on the overall
threshold is treated only structurally in this work.
• Comparison with bivariate bicycle code surgery. BB-code modular gates
target different connectivity assumptions (K = 6 with modular long-range couplers)
and remain an active architectural-development area [12]. A side-by-side fault-tolerant
comparison at equivalent code distance awaits the further development of
BB inter-block protocols.
• Logical Clifford gates beyond CNOT. Transversal S gates, Hadamards, and
similar gates require either lattice surgery in conjugate bases or other protocols; not
all are spelled out in this paper.
10 Common Objections and Responses
We address objections a reviewer is likely to raise. Each response is meant to clarify the
contribution and to delineate honestly what the present paper does and does not establish.
“Isn’t a single triad sheet just L stacked 2D toric codes? Where is the novelty?”
Yes, by Theorem 2, the per-sheet static memory is exactly L parallel 2D toric codes.
The novelty of the present paper is not the per-sheet static memory but the cross-sheet
primitive: weight-3 FCC triangle measurements that couple data qubits in three distinct
sheets simultaneously, implementing a joint Pauli measurement between logical qubits
located in different sheets while preserving K=4 static connectivity. This primitive does
not exist in independent 2D toric codes.
“Is the 1.07% figure a real threshold?” It is a finite-size threshold estimate derived
from logical error rates at L = 4, 6, 8 measured on the full custom Stim circuit. The three
pairwise crossings (L = 4 vs L = 6 at 1.109%, L = 4 vs L = 8 at 1.069%, L = 6 vs
L = 8 at 1.024%) give $p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$. Shot counts range from 1,000 per point
(low-statistics, low-p rows) to 8,000 per point (near-threshold rows). Extending to L = 10
would tighten the band but is computationally expensive and not performed here. The
threshold is significantly higher than the proxy estimate of 0.5% reported in Section 7.
“Does the surgery preserve the full code distance, not just the Z-side?” Yes.
Theorem 5 gives the Z-distance preservation proof and Theorem 6 gives the X-distance
preservation proof. Both are general in L via the layer decomposition. Computational
sanity checks at L = 4, 6 confirm both Z and X distances equal L after the merge.
“Does the current threshold simulation benchmark the full merge-split gate,
or only the joint parity measurement?” The current simulation measures only the
failure of the joint observable $Z_A Z_B$ under MWPM decoding, i.e., the parity-measurement
aspect of the surgery. A full logical-channel benchmark (prepare logical inputs, merge,
measure, split, verify remaining logical observables, compute Pauli transfer matrix) is
identified as future work in Section 9.3. This is a real gap that a fully fault-tolerant gate
characterization will need to close.
“Does this beat bivariate bicycle codes?” Not on rate. The sheet code has rate
$\Theta(1/L^2)$, the same scaling as surface code, vanishing as $L \to \infty$. BB codes achieve constant
rates and are the right answer when storage of many logical qubits is the dominant
cost. The sheet code targets a different design point: K=4 planar connectivity (matching
existing hardware like Google Willow) with native co-located inter-sheet gates via local
triangle measurements. BB codes require K=6 with modular logical operation protocols
under active architectural development. The two codes are different tools for different
workloads.
“Why claim a ‘niche’ when 3D toric codes also have inter-block structure?”
The full 3D toric code (or full FCC code) at K=6 or K=12 is not surface-code-compatible;
the niche we claim is specifically K=4 planar connectivity, the connectivity envelope of
standard transmon processors. The sheet code is one of few CSS codes that achieves
cross-block gate operations natively while staying within this hardware envelope.
“Is the planar-boundary case verified numerically?” Not yet. The planar treat-
ment in Section 9.2 is analytical: we identify which triangles are interior, how boundary
triangles truncate to weight-2 effective Z-operators, and how a routing protocol through
interior ancillas accommodates boundary-touching logicals. Numerical verification at
L = 4, 6, 8 with explicit rough/smooth boundaries is identified as a future task. The
toric case is what the threshold simulation actually measured.
All computational results in this paper are reproducible. The reference implementation
consists of the following files in the SSMTheory monorepository, each linked directly:
Construction and analysis:
• sheet_code_fcc_lattice.py: FCC lattice construction, stabilizer matrices, triangle
enumeration
• sheet_code_gf2.py: GF(2) linear algebra utilities
• sheet_code_surgery.py: cross-sheet logical analysis, distance verification
• sheet_code_test_construction.py: unit tests verifying all L = 4, 6 claims
• sheet_code_run_gates.py: reproduces Section 4 (triangle algebra)
• sheet_code_run_distance.py: reproduces Section 6 (distance preservation, L =
4, 6 verification)
• sheet_code_run_threshold.py: proxy threshold simulation (used in Section 7
for comparison)
• sheet_code_generate_figures.py: generates Figures 1–4
Custom Stim circuit and threshold measurement:
• sheet_code_custom_stim.py: custom Stim circuit construction for a single sheet
of the FCC sheet code (Section 7.1)
• sheet_code_custom_surgery.py: 2-sheet surgery circuit (L=4, L=6) with explicit
triangle measurements and broken-X-stabilizer detector handling per Lemma 2
• sheet_code_find_primitive_fast.py: accelerated surgery-primitive finder that
avoids the $O(n^2)$ pair search, making L ≥ 8 tractable
• sheet_code_cached_surgery.py: surgery circuit builder using cached primitives,
used for L = 8 measurements
• surgery_primitive_L8.json: cached L = 8 surgery primitive (8 triangle indices,
weight-16 cross-sheet Z-operator)
• sheet_code_sweep_one.py: single-point threshold driver
• sheet_code_surgery_threshold.py: threshold sweep driver
• sheet_code_fss_analysis.py: finite-size scaling analysis: pairwise crossings,
summary tables
• surgery_threshold_results.json: raw measured data backing every numerical
entry in the Section 7 threshold table; produced by the sweep driver
• threshold_summary.csv: summary CSV: per-(L, p) aggregate rates and standard
errors
Running sheet_code_test_construction.py reproduces every analytical/algebraic
claim in the paper (parameter counts, triangle algebra dimensions, distance
verification at L = 4, 6, broken-X-stabilizer enumeration). Running
sheet_code_surgery_threshold.py reproduces the threshold data in Section 7
and Figure 3.
11 Conclusion
We have presented an integrated CSS code architecture on the Face-Centered Cubic lattice.
The static sheet code $[[L^3, 2L, L]]$ provides efficient storage at K=4 connectivity; the
cross-sheet triangle surgery primitive provides fault-tolerant joint Pauli measurements between
logical qubits in different sheets, also at K=4 connectivity. The combination occupies
a previously unfilled niche in the QEC code landscape: K=4 hardware compatibility
(matching surface code, deployable on Google Willow and similar chips) with inter-sheet
logical gates as a native primitive between co-located CSS code blocks. This is a different design
point from surface code lattice surgery (confined within one substrate) and bivariate bicycle
codes (which target K=6 modular gates between physically separated blocks, an area
of active architectural development). Under circuit-level depolarizing noise on a custom
Stim circuit with explicit triangle measurements, finite-size scaling across L = 4, 6, 8 gives
a surgery threshold $p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$. The construction is verified analytically
(with computational sanity checks at L = 4, 6, 8) for code parameters and Z/X distance
preservation under the merge operation.
12 Declarations
Clinical trial registration: not applicable. This study does not involve a clinical trial.
Consent to Publish declaration: not applicable.
Ethics and Consent to Participate declarations: not applicable. This study does
not involve human participants, human data, or animals.
Competing interests: The author declares no competing interests.
Funding: This work received no external funding.
Author contributions: R.K. is the sole author. R.K. conceived the construction, per-
formed the mathematical analysis and computational verification, designed and imple-
mented the simulation code, generated the figures, and wrote the manuscript.
Data and code availability: All computational results are reproducible via the open-source
reference implementation described in Section 10. The complete reference implementation,
including the custom Stim circuit construction and the raw threshold-measurement
data (surgery_threshold_results.json), is hosted at
https://github.com/raghu91302/ssmtheory with each file linked individually in Section 10.
References
[1] R. Kulkarni, arXiv:2603.20294 (2026).
[2] R. Kulkarni, “Flag-assisted error correction on the FCC lattice,” Zenodo (2026).
doi:10.5281/zenodo.19200672
[3] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, J. Math. Phys. 43, 4452 (2002).
[4] A. Yu. Kitaev, Ann. Phys. 303, 2 (2003).
[5] A. G. Fowler et al., Phys. Rev. A 86, 032324 (2012).
[6] C. Horsman, A. G. Fowler, S. Devitt, and R. Van Meter, New J. Phys. 14, 123011
(2012).
[7] D. Litinski, Quantum 3, 128 (2019).
[8] A. Eickbusch et al., Nature Phys. 21, 1994 (2025).
[9] R. Acharya et al. (Google Quantum AI), “Quantum error correction below the surface
code threshold,” Nature 638, 920 (2025).
[10] R. Chao and B. W. Reichardt, Phys. Rev. Lett. 121, 050502 (2018).
[11] S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, T. J. Yoder, Nature
627, 778 (2024).
[12] T. J. Yoder et al., “Tour de gross: A modular quantum computer based on bivariate
bicycle codes,” arXiv:2506.03094 (2025).
[13] C. Gidney, Quantum 5, 497 (2021).
[14] O. Higgott, ACM Trans. Quantum Comput. 3, 1 (2022).
[15] P. Panteleev and G. Kalachev, “Degenerate quantum LDPC codes with good finite
length performance,” Quantum 5, 585 (2021).
[16] R. Kulkarni, “Sheet code with cross-sheet triangle surgery: reference implementation,”
(2026). Primary verification script:
https://github.com/raghu91302/ssmtheory/blob/main/sheet_code_test_construction.py
(all files linked in Section 10).
[17] A. R. Calderbank and P. W. Shor, Phys. Rev. A 54, 1098 (1996).
[Figure 1 plot: 3D view with axes x, y, z showing a triangle with edge vectors $v_a$, $v_b$, $v_c$, one in each of the sheets $S_{xy}$, $S_{xz}$, $S_{yz}$. Plot title: Every FCC triangle has one edge per triad sheet.]
Figure 1: Every FCC triangle has exactly one edge in each triad sheet. The three edge
vectors of a triangle $\{v_a, v_b, v_c\}$ partition naturally across $S_{xy}$, $S_{xz}$, $S_{yz}$. This geometric fact
is the foundation of the cross-sheet surgery primitive: a single weight-3 Pauli measurement
on a triangle simultaneously couples data qubits across all three sheets.
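The geometric statement in the Figure 1 caption can be checked by brute force. The sketch below is illustrative only and is not taken from the reference implementation: it assumes that each triad sheet collects the nearest-neighbour FCC edges lying in one coordinate plane (as the labels $S_{xy}$, $S_{xz}$, $S_{yz}$ suggest), and the names `NN`, `sheet_of`, and `triangles` are hypothetical.

```python
from itertools import product

# The 12 FCC nearest-neighbour vectors: all sign choices and permutations of (1, 1, 0).
NN = [v for v in product((-1, 0, 1), repeat=3) if sorted(map(abs, v)) == [0, 1, 1]]

# Assumption: an edge belongs to the sheet named after the coordinate plane it lies in.
# The index of the vanishing component identifies that plane.
PLANE = {2: "xy", 1: "xz", 0: "yz"}

def sheet_of(v):
    """Return the (assumed) sheet label of a nearest-neighbour edge vector."""
    return PLANE[v.index(0)]

# A triangle is an ordered pair of edges (a, b) whose closing edge c = -(a + b)
# is itself a nearest-neighbour vector, so that a + b + c = 0.
triangles = []
for a in NN:
    for b in NN:
        c = tuple(-(x + y) for x, y in zip(a, b))
        if c in NN:
            triangles.append((a, b, c))

# Check that every triangle has exactly one edge in each sheet.
all_one_per_sheet = all(
    sorted(sheet_of(v) for v in tri) == ["xy", "xz", "yz"] for tri in triangles
)
print(len(triangles), all_one_per_sheet)
```

Each geometric triangle appears several times in `triangles` because ordered edge pairs are enumerated; this does not affect the check.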
[Figure 2 plot: 3D view with axes x, y, z showing vertices $v_0$, $v_1$, $v_2$, $v_3$, $v_8$, $v_9$, triangles T0, T4, T24, T28, edges coloured by sheet ($S_{xy}$, $S_{xz}$, $S_{yz}$), and triangle ancilla markers. Plot title: Surgery primitive at L = 4: 4 triangles spanning two sheets (weight-8 joint $Z_A Z_B$ measurement).]
Figure 2: Surgery primitive at L = 4. Four FCC triangles $T_0$, $T_4$, $T_{24}$, $T_{28}$ share two
xy-sheet edges, $(v_2, v_8)$ and $(v_3, v_9)$, which cancel in the product. The net operator has weight
8 with support on 4 edges in $S_{xz}$ and 4 edges in $S_{yz}$, implementing $Z_A Z_B$, where A is
a logical of sheet xz and B is a logical of sheet yz. Red stars indicate triangle ancilla
positions at each triangle’s centroid.
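The cancellation described in the Figure 2 caption is a symmetric-difference computation on operator supports. The sketch below is a toy illustration, not the actual L = 4 lattice: only the two shared xy edges are taken from the caption, while the xz and yz edge labels (`a0`, `b0`, ...) and the assignment of shared edges to particular triangles are invented for the example.

```python
from functools import reduce

# Each triangle is a set of (sheet, edge_label) pairs, one edge per sheet.
# Hypothetical labels: only "v2-v8" and "v3-v9" come from the caption.
triangles = {
    "T0":  {("xy", "v2-v8"), ("xz", "a0"), ("yz", "b0")},
    "T4":  {("xy", "v2-v8"), ("xz", "a1"), ("yz", "b1")},
    "T24": {("xy", "v3-v9"), ("xz", "a2"), ("yz", "b2")},
    "T28": {("xy", "v3-v9"), ("xz", "a3"), ("yz", "b3")},
}

# Multiplying Z-type operators XORs their supports (Z^2 = I),
# so the support of the product is the symmetric difference of the sets.
support = reduce(lambda s, t: s ^ t, triangles.values())

# Count how many surviving edges lie in each sheet.
weight_by_sheet = {}
for sheet, _ in support:
    weight_by_sheet[sheet] = weight_by_sheet.get(sheet, 0) + 1

print(len(support), weight_by_sheet)
```

Under these assumptions the twelve triangle edges reduce to eight, because each shared xy edge appears in two triangles and drops out of the product.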
[Figure 3 plot: logical error rate (per surgery operation, log scale) versus physical error rate p (%), with curves for L = 4, 6, 8 ($d_{\mathrm{each}} = L$; $3 d_{\mathrm{each}}$ rounds total; MWPM decoding). Plot title: Surgery threshold: custom Stim circuit with explicit triangle measurements. Annotation: pairwise crossings, $p_{\mathrm{th}} = 1.07\% \pm 0.05\%$.]
Figure 3: Surgery threshold from the custom Stim circuit with explicit triangle
measurements at L = 4, 6, 8. Each data point uses the full surgery protocol ($d_{\mathrm{each}} = L$ rounds of
pre-merge, merge with triangle measurements, and post-merge), decoded by MWPM via
PyMatching on the decomposed detector error model. Shot counts: L = 4 uses 1,000–8,000
shots per point; L = 6 uses 500–6,000 shots; L = 8 uses 1,000–6,000 shots. The
three pairwise crossings (L = 4 vs L = 6, L = 4 vs L = 8, L = 6 vs L = 8) occur at
p = 1.109%, 1.069%, 1.024% respectively (gray band), giving a finite-size threshold
estimate $p_{\mathrm{th}}^{\mathrm{surgery}} = 1.07\% \pm 0.05\%$. Below threshold (e.g., p = 0.5%), the logical error rate
decreases monotonically with L.
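The quoted threshold band can be related to the three pairwise crossings with a few lines of arithmetic. The sketch below assumes the central value is the mean of the crossings and the band is their half-range rounded up; the text only states that the band reflects the spread of the crossings, so this particular definition is an assumption.

```python
# Pairwise crossing points in percent, as quoted in the Figure 3 caption.
crossings = {"L4_vs_L6": 1.109, "L4_vs_L8": 1.069, "L6_vs_L8": 1.024}

values = list(crossings.values())

# Assumed estimator: mean of the crossings as the central value.
center = sum(values) / len(values)

# Assumed band: half the distance between the largest and smallest crossing.
half_spread = (max(values) - min(values)) / 2

print(f"center = {center:.3f}%, half-spread = {half_spread:.4f}%")
```

With these definitions the central value rounds to 1.07% and the half-range stays below the quoted 0.05%, consistent with the band in the caption.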
[Figure 4 plot: horizontal bar chart of logical qubits k (axis 0 to 80) per code, with annotations. Plot title: Code comparison at distance 8–12.
Surface (rotated): K = 4 | cross-block: Within patch
2D toric: K = 4 + wraps | cross-block: Within patch
3D toric: K = 6 | cross-block: Limited
Gross BB: K = 6 + L | cross-block: Open problem
Sheet code, 1 sheet (planar): K = 4 | cross-block: Yes (triangles)
Sheet code, 1 sheet (toric): K = 4 + wraps | cross-block: Yes (triangles)
Sheet code, 3 sheets (toric): K = 4 + wraps | cross-block: Yes (triangles)]
Figure 4: Logical qubits at code distance 8–12 across CSS code families. The sheet
code variants (green) offer either L logicals per sheet (planar) or 2L logicals (toric);
deploying all three sheets (whether monolithic 2D, three-layer stacked, or native 3D) gives
6L = 48 logicals at d = 8. Connectivity K and inter-sheet/inter-block gate availability
are annotated for each code.
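The sheet-code counts in Figure 4 follow directly from the parameters quoted in the abstract, [[L^3, 2L, L]] for the toric variant at even L and [[L^3, L, L]] for the planar variant. The helper below is a minimal sketch for tabulating them; the function name `sheet_code_params` and its arguments are hypothetical and do not refer to the reference implementation.

```python
def sheet_code_params(L, variant="toric", sheets=1):
    """Return (n, k, d) for `sheets` co-located sheet codes of linear size L.

    Uses the parameters quoted in the abstract: [[L^3, 2L, L]] (toric, even L)
    and [[L^3, L, L]] (planar).
    """
    if variant == "toric":
        if L % 2:
            raise ValueError("the toric parameters are stated for even L")
        k_per_sheet = 2 * L
    elif variant == "planar":
        k_per_sheet = L
    else:
        raise ValueError(f"unknown variant: {variant}")
    # Physical qubits and logical qubits add across sheets; distance stays L.
    return sheets * L**3, sheets * k_per_sheet, L


# The three sheet-code rows of Figure 4 at d = 8.
for variant, sheets in [("planar", 1), ("toric", 1), ("toric", 3)]:
    print(variant, sheets, sheet_code_params(8, variant, sheets))
```

For L = 8 and three toric sheets this gives k = 6L = 48, matching the value stated in the caption.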