1 Introduction

The integration of quantum computing with machine learning may help meet the growing demand for computational power and efficiency across a wide range of complex machine learning tasks (Biamonte et al. 2017; Cong et al. 2019; Havlíček et al. 2019; Schuld and Killoran 2019; Beer et al. 2020). Quantum machine learning (QML) algorithms aim to exploit fundamental quantum phenomena, such as superposition and entanglement, that are unavailable to classical machine learning algorithms. By harnessing these properties of quantum mechanics, QML may offer computational speed-ups and improved model performance for certain classes of problems (Lloyd et al. 2013; Rebentrost et al. 2014; Lloyd et al. 2014), including enhanced pattern recognition through quantum feature maps that yield quantum speed-ups (Liu et al. 2021; Wang et al. 2025), as well as greater robustness to adversarial attacks (Wu et al. 2023; West et al. 2023, 2024; Dowling et al. 2024; Khatun and Usman 2024), compared to classical machine learning models.

Because the vast majority of QML studies have so far been conducted in ideal classical simulation environments, it is unclear whether the predicted advantages will be retained when QML algorithms are implemented on realistic quantum hardware, such as Noisy Intermediate-Scale Quantum (NISQ) devices (Preskill 2018; Bharti et al. 2022). As with all quantum algorithms, one of the biggest obstacles to the practical implementation of QML on quantum devices is vulnerability to noise, which introduces errors that distort computations and can render the outputs meaningless.

Numerous strategies have been developed to suppress and mitigate noise with the goal of achieving quantum advantage before the advent of fully fault-tolerant quantum systems, including approximate algorithms such as the Quantum Approximate Optimisation Algorithm (Farhi et al. 2014), heuristic approaches (Biamonte et al. 2017) and approximate amplitude encoding (West et al. 2024), all of which reduce circuit depths and minimise noise accumulation. Noise-induced errors can also be reduced using virtual distillation (Huggins et al. 2021; Karim et al. 2024), which suppresses errors by combining multiple noisy copies of a quantum state, and dynamical decoupling (Quantum Information and Computation for Chemistry 2014; Ezzell et al. 2023; Ji and Polian 2024), which preserves coherence during computation by applying carefully timed pulses to qubits. While these methods may be applied to QML, and some have already proven effective in implementations of QML on a quantum device (West et al. 2024), achieving scalability and fault tolerance for quantum algorithms ultimately requires the adoption of Quantum Error Correction (QEC) codes (Gottesman 1998).

Early QEC research focused on stabiliser codes (Calderbank and Shor 1996; Steane 1996; Shor 1997), and established the Threshold Theorem, which states that fault-tolerant quantum computation is possible provided physical error rates in quantum hardware remain below a finite threshold (Aharonov and Ben-Or 1996; Kitaev 1997). Since then, QEC has expanded to include diverse codes tailored for different advantages and environments, such as surface codes (Dennis et al. 2002), Bacon-Shor codes (Bacon 2006) and 3D color codes (Bombin and Martin-Delgado 2006). Very recently we have begun to see experimental verification of the effectiveness of these codes, such as in the landmark demonstration of the surface code operating below its critical threshold on superconducting processors (Acharya et al. 2024). However, the present challenge with QEC codes is their high resource overhead, with practical implementations of useful algorithms potentially requiring hundreds of thousands of qubits (Kivlichan et al. 2020).

As a result, the theoretical and experimental implementation of QEC codes is still in its early stages and has largely been limited to the smallest codes or the simplest operations, with QEC codes applied successfully to only a small number of quantum operations in both theoretical and experimental settings. In particular, the [[4,2,2]] stabiliser code has seen widespread use in recent years across a range of applications, including magic state preparation (Gupta et al. 2024), quantum chemistry (van Dam et al. 2024; Bedalov et al. 2024), Variational Quantum Eigensolvers (Urbanek et al. 2020; Zhang et al. 2022; Gowrishankar et al. 2025), and the implementation of diverse quantum circuits (Cane et al. 2021; Sun et al. 2022; Reichardt et al. 2024). Codes with stronger error correction capabilities and greater resource requirements, including the Steane code (Steane 1996a, b), repetition code (Kelly et al. 2015), Shor code (Shor 1995), Bacon-Shor code and surface code, have primarily been applied to error correction on a single logical qubit (Hilder et al. 2021; Egan et al. 2021; Zhao et al. 2022; Acharya et al. 2023) or to simple operations on one or two logical qubits (Bluvstein et al. 2023; Hetényi and Wootton 2024; Paetznick et al. 2024; Kim et al. 2025), though notably the Steane code was recently used to logically encode three-qubit circuits for the quantum Fourier transform (Mayer et al. 2024). Despite these advances, no experiment or theoretical simulation has yet demonstrated the application of a QEC code to a QML problem.

We present the first study of a QML algorithm implemented with a stabiliser code. Specifically, we classically simulate a simple Variational Quantum Classifier (VQC) encoded with the [[4,2,2]] stabiliser code. We selected the simplest stabiliser code and a minimal QML algorithm to minimise the resource overhead of our simulations, which nevertheless require 20 qubits to implement five rounds of syndrome extraction. In Sections 2 and 3, we explain how we logically encoded the VQC with the [[4,2,2]] code, introduce the noise models we used to simulate realistic noisy environments, and detail our simulation parameters. We then present and discuss our results in Section 4, showing that there exists a threshold error rate for the ancilla qubits above which the stabiliser code can no longer ensure good training accuracies and state fidelities. We conclude in Section 5 with a discussion of the implications of our findings for the practical implementation of QML algorithms in the NISQ era.

2 Implementation framework

We chose a very simple two-qubit VQC for our simulations (shown in Fig. 1) to minimise both the resource overhead and the computational time required. The VQC takes as input two classical bits, basis-encoded into two qubits, and classifies their parity by measuring the first qubit in the Z basis; the classifier output is the expectation value estimated over 1000 shots. We use only one rotational parameter, \(\theta\), to train the classifier, as using more than one leads to overfitting. The classifier reaches an accuracy of 1.0 within 100 training iterations.

Fig. 1

The Variational Quantum Classifier (VQC) with an example input state of \(|10\rangle\) and rotational parameter \(\theta\)

2.1 Logical encoding

We chose the [[4,2,2]] four-qubit Calderbank-Shor-Steane (CSS) stabiliser code for our encoding and error detection, as it is the simplest stabiliser code that protects against single-qubit X and Z errors (Vaidman et al. 1996). It encodes 2 logical qubits in 4 physical qubits and can only detect (not correct) single-qubit errors. As with all stabiliser codes, errors are detected through syndrome extraction: the stabilisers are applied between the physical qubits and ancilla qubits, and the ancilla qubits are then measured.

We used the following mapping to encode 2 logical qubits with 4 physical qubits:

$$\begin{aligned} |{00}\rangle _{L}&= \frac{1}{\sqrt{2}}(|{0000}\rangle + |{1111}\rangle )\end{aligned}$$
(1)
$$\begin{aligned} |{01}\rangle _{L}&= \frac{1}{\sqrt{2}}(|{0011}\rangle + |{1100}\rangle )\end{aligned}$$
(2)
$$\begin{aligned} |{10}\rangle _{L}&= \frac{1}{\sqrt{2}}(|{0101}\rangle + |{1010}\rangle )\end{aligned}$$
(3)
$$\begin{aligned} |{11}\rangle _{L}&= \frac{1}{\sqrt{2}}(|{0110}\rangle + |{1001}\rangle ), \end{aligned}$$
(4)

where the left-hand side lists the four two-qubit logical basis states (indicated by the subscript L), and the right-hand side gives the corresponding states of the four physical qubits.
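As a quick consistency check on Eqs. (1)-(4), the Python sketch below builds the four logical codewords as 16-amplitude statevectors and confirms that each is a +1 eigenstate of \(XXXX\) and \(ZZZZ\). These two operators are the standard stabiliser generators of the [[4,2,2]] code; they are not written out in the text, so treating them as the generators used here is an assumption. The sketch also checks that every single-qubit X, Y or Z error anticommutes with at least one generator, which is what makes such errors detectable. Physical qubit \(q_0\) is taken to be the leftmost bit of each ket.

```python
from functools import reduce
import numpy as np

# Single-qubit Pauli matrices.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    """Tensor product of a list of operators (qubit 0 leftmost)."""
    return reduce(np.kron, ops)

def ket(bits):
    """Computational basis state for a bitstring such as '0101'."""
    v = np.zeros(2 ** len(bits), dtype=complex)
    v[int(bits, 2)] = 1.0
    return v

# Logical codewords from Eqs. (1)-(4).
CODEWORDS = {
    "00": ("0000", "1111"),
    "01": ("0011", "1100"),
    "10": ("0101", "1010"),
    "11": ("0110", "1001"),
}
logical = {k: (ket(a) + ket(b)) / np.sqrt(2) for k, (a, b) in CODEWORDS.items()}

# Assumed stabiliser generators of the [[4,2,2]] code.
XXXX = kron_all([X] * 4)
ZZZZ = kron_all([Z] * 4)

def is_plus_one_eigenstate(op, state):
    return np.allclose(op @ state, state)

def detected(error):
    """An error flips a syndrome bit if it anticommutes with some generator."""
    return any(np.allclose(error @ s, -s @ error) for s in (XXXX, ZZZZ))

# All 12 single-qubit Pauli errors (X, Y, Z on each of q0..q3).
single_qubit_errors = [
    kron_all([P if k == q else I for k in range(4)])
    for q in range(4) for P in (X, Y, Z)
]

for name, state in logical.items():
    print(name, is_plus_one_eigenstate(XXXX, state), is_plus_one_eigenstate(ZZZZ, state))
print(all(detected(e) for e in single_qubit_errors))  # True
```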

With these definitions, the logical CNOT gate in the VQC is implemented by a SWAP gate between the first two physical qubits. The logical rotation gates require additional ancilla qubits (one ancilla per rotation gate to be encoded), which undergo the rotation in place of the physical qubits. This process generally requires a series of CNOT gates before and after each rotation gate, which entangle the physical (and hence logical) qubits with the ancilla qubits.
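The claim that a SWAP of the first two physical qubits acts as the logical CNOT can be checked directly on the bitstrings of Eqs. (1)-(4), since each codeword is an equal superposition of two bitstrings. The sketch below assumes that the first logical qubit is the control and that the first two physical qubits are the two leftmost bits; the helper names are illustrative only.

```python
# Logical codewords from Eqs. (1)-(4): each is an equal superposition of two bitstrings.
CODEWORDS = {
    "00": ("0000", "1111"),
    "01": ("0011", "1100"),
    "10": ("0101", "1010"),
    "11": ("0110", "1001"),
}

def swap_first_two(bits):
    """SWAP physical qubits q0 and q1 (the two leftmost bits)."""
    return bits[1] + bits[0] + bits[2:]

def logical_cnot(label):
    """Ideal CNOT on a logical label: the first logical qubit controls the second."""
    control, target = label
    return control + str(int(target) ^ int(control))

for label, pair in CODEWORDS.items():
    image = {swap_first_two(bits) for bits in pair}
    expected = set(CODEWORDS[logical_cnot(label)])
    print(f"|{label}>_L -> |{logical_cnot(label)}>_L:", image == expected)  # True for each
```

Because equal amplitudes are carried along with the bitstrings, set equality here is enough to show that the encoded states map onto each other exactly.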

In our logical encoding, the quantum state of the full qubit system, including ancilla qubits, is always of the form:

$$\begin{aligned} |{\Psi }\rangle = \sum _{i=0}^{3} c_i |{\psi _i}\rangle _L \bigotimes _{j=0}^{n-1}\,|{\phi _{i}}\rangle _{a_j}, \end{aligned}$$
(5)

where \(|{\psi _i}\rangle _L\) represents each of the four possible logical basis states, \(c_i\) is the complex coefficient associated with the i-th logical basis state, \(|{\phi _i}\rangle _{a_j}\) represents the state of the j-th ancilla qubit, \(a_j\), associated with the i-th logical basis state, and n is the total number of ancilla qubits that have been introduced into the system.

To illustrate, for the input state \(|{01}\rangle _L\), the full quantum state after the first two \(R_{X}\) rotations is given by:

$$\begin{aligned} |{\Psi }\rangle&= -i\textrm{cs}|{00}\rangle _L |{0}\rangle _{a_1}|{0}\rangle _{a_2} + \mathrm {c^{2}}|{01}\rangle _L|{0}\rangle _{a_1}|{1}\rangle _{a_2} \nonumber \\&- \mathrm {s^{2}}|{10}\rangle _L|{1}\rangle _{a_1}|{0}\rangle _{a_2} - i\textrm{sc}|{11}\rangle _L|{1}\rangle _{a_1}|{1}\rangle _{a_2}, \end{aligned}$$
(6)

where, with the standard convention \(R_X(\theta ) = e^{-i\theta X/2}\), c and s are shorthand for \(\cos (\theta /2)\) and \(\sin (\theta /2)\), respectively; the logical states carry the subscript L and the ancilla states carry the subscript \(a_j\). Two ancilla qubits (\(a_1\), \(a_2\)) appear because two rotations have been applied. By the end of each logical operation, the ancilla qubits in each term mirror the logical state of that term.

The steps we used to logically perform the double \(R_Y\) and \(R_X\) gates are outlined below:

1. New ancilla initialisation: If no rotations have been performed in previous steps, initialise two new ancilla qubits to match the initial input logical state of the circuit. For example, if the initial input logical state is \(|{01}\rangle _L\), the two new ancilla qubits, \(a_j\) and \(a_{j+1}\), should be in the states \(|{0}\rangle\) and \(|{1}\rangle\), respectively. If double rotations have been performed in previous steps, initialise the new ancilla qubits in the \(|{0}\rangle\) state, then match them to the two ancilla qubits used for the most recent double rotation by applying CNOT gates to the new ancilla qubits, controlled by those two previous ancilla qubits.

2. Change previous ancilla states: If double rotations have been performed in previous steps, apply CNOT gates to each of the previously initialised ancilla qubits, controlled by the new ancilla qubits. For example, for newly initialised ancilla qubits \(a_j\) and \(a_{j+1}\), we apply CNOT gates to the previous ancillas \(a_{j-2}\), \(a_{j-4}\), ..., \(a_0\) controlled by \(a_j\), and to \(a_{j-1}\), \(a_{j-3}\), ..., \(a_1\) controlled by \(a_{j+1}\).

3. Change logical state: Apply CNOT gates to physical qubits \(q_1\) and \(q_3\) controlled by the newly initialised ancilla \(a_j\), and CNOT gates to \(q_2\) and \(q_3\) controlled by the newly initialised ancilla \(a_{j+1}\).

4. Apply rotation gates: Apply the relevant rotation gate to each of the two new ancilla qubits.

5. Undo logical state change: Apply the same set of CNOT gates as in Step 3, controlled by the two new ancilla qubits and targeting the physical qubits.

6. Undo previous ancilla state change: Apply the same set of CNOT gates as in Step 2, controlled by the new ancilla qubits and targeting the ancilla qubits initialised for previous rotations.
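The following NumPy sketch applies Steps 3-5 for the first double \(R_X\) rotation (Steps 2 and 6 are not needed because no earlier rotation exists) and compares the result with Eq. (6). It assumes physical qubits \(q_0,\dots ,q_3\) followed by ancillas \(a_1, a_2\), with qubit 0 as the most significant bit, the input state \(|{01}\rangle _L\), and the PennyLane convention \(R_X(\theta )=e^{-i\theta X/2}\), so that \(c=\cos (\theta /2)\) and \(s=\sin (\theta /2)\). It is a plain statevector illustration, not the authors' simulation code.

```python
import numpy as np

# Assumed layout: physical qubits q0..q3 at indices 0-3, ancillas a1, a2 at
# indices 4 and 5; qubit 0 is the leftmost (most significant) bit.
N_QUBITS = 6
A1, A2 = 4, 5

# Logical codewords from Eqs. (1)-(4).
CODEWORDS = {
    "00": ("0000", "1111"),
    "01": ("0011", "1100"),
    "10": ("0101", "1010"),
    "11": ("0110", "1001"),
}

def rx(theta):
    """R_X(theta) = exp(-i theta X / 2), the PennyLane convention."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def apply_single(state, gate, qubit, n=N_QUBITS):
    """Apply a 2x2 gate to one qubit of an n-qubit statevector."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cnot(state, control, target, n=N_QUBITS):
    """Apply CNOT as a permutation of basis-state amplitudes."""
    out = np.empty_like(state)
    for idx in range(2 ** n):
        if (idx >> (n - 1 - control)) & 1:
            out[idx ^ (1 << (n - 1 - target))] = state[idx]
        else:
            out[idx] = state[idx]
    return out

def encoded(logical, ancillas):
    """|logical>_L (x) |ancillas> as a 64-amplitude statevector."""
    state = np.zeros(2 ** N_QUBITS, dtype=complex)
    for bits in CODEWORDS[logical]:
        state[int(bits + ancillas, 2)] += 1 / np.sqrt(2)
    return state

# Steps 3 and 5: a1 controls X on (q1, q3); a2 controls X on (q2, q3).
LOGICAL_FLIPS = [(A1, 1), (A1, 3), (A2, 2), (A2, 3)]

def logical_double_rx(state, theta):
    for control, target in LOGICAL_FLIPS:      # Step 3: change logical state
        state = apply_cnot(state, control, target)
    for anc in (A1, A2):                       # Step 4: rotate the ancillas
        state = apply_single(state, rx(theta), anc)
    for control, target in LOGICAL_FLIPS:      # Step 5: undo logical change
        state = apply_cnot(state, control, target)
    return state

theta = 0.7
# Step 1: ancillas initialised to match the input |01>_L.
out = logical_double_rx(encoded("01", "01"), theta)

c, s = np.cos(theta / 2), np.sin(theta / 2)
eq6 = (-1j * c * s * encoded("00", "00") + c**2 * encoded("01", "01")
       - s**2 * encoded("10", "10") - 1j * s * c * encoded("11", "11"))
print(np.allclose(out, eq6))  # True
```

Ancilla \(a_1\) toggles \(q_1, q_3\) and \(a_2\) toggles \(q_2, q_3\) because, under Eqs. (1)-(4), these pairs of bit flips implement a logical X on the first and second logical qubit, respectively. The CNOTs therefore move each logical bit onto its ancilla, where the rotation acts, and then move it back.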

The steps for implementing the \(R_Z\) gates are much simpler and do not require as many CNOT gates between physical and ancilla qubits. We only match the newly introduced ancilla qubits to the previous two ancilla qubits, then apply the \(R_Z\) rotation gate to each new ancilla qubit. In Fig. 2, we show the full set of operations we used to perform the logical equivalent of the double \(R_X\), \(R_Z\) and \(R_Y\) gates shown in Fig. 1. For the logical CNOT gate, we applied additional CNOT gates after the SWAP gate to ensure matching between the ancilla and logical states in each term of the full quantum state of the system.

The above steps can also be used to logically implement single-qubit rotations, in which case only one ancilla qubit needs to be introduced each time. Logically rotating a logical qubit generally requires initialising a new ancilla qubit, transforming the states of the ancillas previously used to rotate that qubit, transforming the logical qubit state itself, applying the relevant rotation gate to the new ancilla, and finally re-applying to the ancillas and logical qubit the same operations that were applied before the rotation. This approach to logical rotation gates can be generalised to circuits with any number of rotation gates.

The logical circuit does not permit a direct Z-basis measurement of the first logical qubit, so we performed the logical equivalent: we measured the probability distribution over the 16 computational basis states of the four physical qubits in the Z basis, converted it into the probability distribution over the four logical two-qubit states, and from this computed the expectation value of a Z-basis measurement on the first logical qubit.
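A minimal sketch of this post-processing step is given below. It maps each of the 16 physical bitstrings to its logical label using Eqs. (1)-(4), then computes \(\langle Z\rangle\) on the first logical qubit as the probability that the first logical bit is 0 minus the probability that it is 1. How bitstrings outside the code space were treated is not stated in the text; dropping them and renormalising is an assumption of this sketch. As a check, it uses the logical probabilities of Eq. (6) with \(c=\cos (\theta /2)\) and \(s=\sin (\theta /2)\), for which \(\langle Z\rangle = c^4 - s^4 = \cos \theta\).

```python
import numpy as np

# Logical codewords from Eqs. (1)-(4).
CODEWORDS = {
    "00": ("0000", "1111"),
    "01": ("0011", "1100"),
    "10": ("0101", "1010"),
    "11": ("0110", "1001"),
}
# Map each physical bitstring (q0 leftmost) to its logical label.
TO_LOGICAL = {bits: label for label, pair in CODEWORDS.items() for bits in pair}

def logical_distribution(physical_probs):
    """Collapse 16 physical Z-basis probabilities onto the 4 logical states.

    Assumption: bitstrings outside the code space are dropped and the
    remaining probabilities are renormalised.
    """
    logical = dict.fromkeys(CODEWORDS, 0.0)
    for index, prob in enumerate(physical_probs):
        label = TO_LOGICAL.get(format(index, "04b"))
        if label is not None:
            logical[label] += prob
    total = sum(logical.values())
    return {label: prob / total for label, prob in logical.items()}

def expval_z_first_logical(physical_probs):
    """<Z> on the first logical qubit: P(first bit 0) - P(first bit 1)."""
    p = logical_distribution(physical_probs)
    return (p["00"] + p["01"]) - (p["10"] + p["11"])

# Example: logical probabilities of Eq. (6), each split equally between
# the two bitstrings of its codeword.
theta = 0.7
c, s = np.cos(theta / 2), np.sin(theta / 2)
logical_probs = {"00": (c * s) ** 2, "01": c**4, "10": s**4, "11": (s * c) ** 2}
physical_probs = np.zeros(16)
for label, prob in logical_probs.items():
    for bits in CODEWORDS[label]:
        physical_probs[int(bits, 2)] = prob / 2
print(np.isclose(expval_z_first_logical(physical_probs), np.cos(theta)))  # True
```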

Fig. 2

Logical rotation gates implemented with the [[4,2,2]] encoding: (a) logical \(R_{X}(\theta )\), (b) logical \(R_{Z}(\theta )\) and (c) logical \(R_{Y}(\theta )\) rotations. In each subfigure, the physical qubits are denoted by \(|{q_i}\rangle\) and the ancilla qubits are denoted by \(|{a_i}\rangle\)

Although alternative methods for implementing logical rotations exist in fault-tolerant architectures (e.g., magic state distillation), these methods are challenging to simulate and infeasible on NISQ-era hardware because of their extreme resource requirements. In the NISQ regime, our ancilla-assisted logical rotations offer an alternative that could enable the physical implementation of QML algorithms before fault tolerance is achieved.

2.2 Noise and error models

We considered two types of incoherent noise that can occur in a quantum circuit: probabilistic gate noise and environmental noise. Gate noise can arise from imperfections in hardware or control signals, qubit cross-talk during multi-qubit operations, and other non-ideal behaviour of the qubit system whenever a gate is implemented. Environmental noise typically originates outside the qubit system, for example from stray electromagnetic (EM) fields, photons, and mechanical vibrations. These types of noise are usually modelled by their impact on the qubits: thermal relaxation, in which energy in the qubits dissipates through interaction with the thermal environment, and dephasing, in which the relative phase between quantum states becomes randomised due to external EM fields, slow environmental changes or noise in the control systems. In this study, we do not consider errors associated with state preparation or measurement read-out.

As stabiliser codes can only detect and correct combinations of X and Z errors, we only consider X, Y and Z errors for our noise models, in the form of probabilistic gate noise and depolarising noise. This means that our noise models are inherently unable to capture the full range of noise and errors that might arise in physical NISQ systems. However, since our aim is to evaluate the effectiveness of the [[4,2,2]] stabiliser code in improving training outcomes, we only need to simulate noise that the code is theoretically capable of detecting.

We implement the gate noise model with single-qubit “error” gates applied after each single-qubit gate: with probability p (the Pauli Error Rate, \(0< p < 1\)), an X, Y or Z error occurs, and with probability \(1-p\) no error occurs. We also apply these single-qubit error gates after each two-qubit gate, on each of the qubits that the two-qubit gate acts on. For two-qubit gates, we double the Pauli Error Rate relative to single-qubit gates, to better model the higher error rates of multi-qubit gates.
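The sampling rule of the gate noise model can be sketched as follows. The circuit representation, the function names and the assumption that X, Y and Z are equally likely once an error occurs are illustrative choices, not details taken from the authors' implementation. Each gate is followed by an independent error draw on every qubit it touches, with rate \(p\) for single-qubit gates and \(2p\) for two-qubit gates (so \(p \le 0.5\) keeps the doubled rate a valid probability).

```python
import random

PAULIS = ("X", "Y", "Z")

def sample_gate_errors(circuit, p, rng):
    """Sample the Pauli 'error gates' to insert after each gate.

    circuit: list of (name, qubits) tuples, e.g. ("CNOT", (4, 1)).
    p: single-qubit Pauli Error Rate; two-qubit gates use 2 * p.
    Returns (gate_index, pauli, qubit) triples.
    """
    insertions = []
    for index, (_name, qubits) in enumerate(circuit):
        rate = p if len(qubits) == 1 else 2 * p
        for qubit in qubits:  # one error gate on every qubit the gate touches
            if rng.random() < rate:
                # Assumption: X, Y and Z are equally likely given an error.
                insertions.append((index, rng.choice(PAULIS), qubit))
    return insertions

circuit = [("RX", (4,)), ("CNOT", (4, 1)), ("CNOT", (4, 3))]
print(sample_gate_errors(circuit, p=0.01, rng=random.Random(0)))
```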

Our environmental noise model is a highly simplified model that again consists of X, Y and Z errors. We inject Pauli errors into the system at regular intervals throughout the circuit, on every physical and ancilla qubit simultaneously, with the Pauli Error Rate of each injection defined as for the gate noise model. Applying noise at regular intervals mimics the cumulative build-up of errors caused by environmental noise in quantum circuits, with the injection interval reflecting the typical relaxation and dephasing times of the system. Although this model does not include amplitude damping noise, dephasing noise, or the full range of complex errors that could arise in realistic noise models, it captures a range of alterations that may occur to the qubits as a result of energy loss and decoherence. The model is also compatible with our choice of error-detecting code.
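A sketch of the injection schedule, under the assumption that gates are counted sequentially through the circuit and that an injection follows every `interval`-th gate. With the interval of 4 gates chosen for the simulations in Section 3, a 12-gate sequence would receive injections after gates 4, 8 and 12 (0-based indices 3, 7 and 11). The function names and the qubit list are hypothetical.

```python
import random

def injection_points(n_gates, interval):
    """0-based gate indices after which noise is injected (every `interval` gates)."""
    return list(range(interval - 1, n_gates, interval))

def environmental_injections(n_gates, qubits, p, interval, rng):
    """At each injection point, give every qubit an independent chance p of a
    Pauli error. `qubits` should cover both physical and ancilla qubits."""
    insertions = []
    for index in injection_points(n_gates, interval):
        for qubit in qubits:
            if rng.random() < p:
                # Assumption: X, Y and Z are equally likely given an error.
                insertions.append((index, rng.choice(("X", "Y", "Z")), qubit))
    return insertions

print(injection_points(12, 4))  # [3, 7, 11]
print(environmental_injections(12, range(10), 0.01, 4, random.Random(0)))
```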

3 Simulations

We ran simulations of the logically-encoded circuit under the gate noise and environmental noise models in a classical high-performance computing environment, using Xanadu’s PennyLane library (Bergholm et al. 2022) in Python 3.12.3. Each stabiliser measured during syndrome extraction adds one extra qubit to the system (two per round), hence the resource overhead of the simulations ranged from 12 to 20 qubits depending on the number of syndrome extraction rounds performed. We note that on physical hardware, the ancillas used for syndrome extraction can in principle be reset and reused within the same computation, which would reduce the physical overhead compared to our statevector-based simulations, where all ancillas are kept throughout the computation. However, only the syndrome ancillas can be reused: we still require 10 qubits in total for the logical encoding of the quantum states and rotation gates, plus a minimum of 2 additional qubits for syndrome extraction.
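The qubit counts above follow from simple bookkeeping, sketched below under the assumption that each syndrome extraction round measures two stabilisers (consistent with the 12-to-20-qubit range for one to five rounds) and that the six rotation ancillas come from the three double rotations \(R_X\), \(R_Z\) and \(R_Y\). The constant and function names are illustrative.

```python
N_PHYSICAL = 4             # data qubits of the [[4,2,2]] code
N_ROTATION_ANCILLAS = 6    # two per double rotation: R_X, R_Z, R_Y
STABILISERS_PER_ROUND = 2  # assumed: one syndrome qubit per stabiliser

def simulated_qubits(rounds):
    """Total qubits when every syndrome qubit is kept (no reset or reuse)."""
    return N_PHYSICAL + N_ROTATION_ANCILLAS + STABILISERS_PER_ROUND * rounds

def hardware_qubits_with_reuse(rounds):
    """Qubits needed if the syndrome qubits are reset and reused each round."""
    return N_PHYSICAL + N_ROTATION_ANCILLAS + (STABILISERS_PER_ROUND if rounds else 0)

for rounds in range(1, 6):
    print(rounds, simulated_qubits(rounds), hardware_qubits_with_reuse(rounds))
```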

Since there are only four unique data points that can be used to train the VQC (namely, [0,0,0], [0,1,1], [1,0,1] and [1,1,0], each giving the two input bits followed by their parity label), we duplicated the set to produce 40 samples and split these into 24 training samples and 16 test samples. We used a batch size of 8 and ran each training for 100 iterations, which was more than sufficient for convergence in a noise-free environment. Since the [[4,2,2]] stabiliser code cannot correct errors, we discarded any shot in which at least one X or Z error was detected, and reran shots until no errors were detected.
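The data preparation and the discard-and-rerun rule can be sketched as follows. Shuffling before the 24/16 split is an assumption, and `run_shot` and `toy_shot` are hypothetical stand-ins for one noisy circuit execution with syndrome extraction; the toy error probability is arbitrary and is not a result of this study.

```python
import random
import numpy as np

# The four unique samples (x1, x2, parity label).
UNIQUE = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])

def make_splits(seed=0):
    """Duplicate to 40 samples, shuffle (assumed), split 24 train / 16 test."""
    data = np.tile(UNIQUE, (10, 1))
    np.random.default_rng(seed).shuffle(data)  # shuffles rows in place
    return data[:24], data[24:]

def batches(samples, batch_size=8):
    """Yield consecutive mini-batches of the given size."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def collect_accepted_shots(run_shot, n_shots, rng):
    """Post-selection: rerun shots until n_shots pass every syndrome check.

    run_shot(rng) -> (error_detected, outcome) is a hypothetical callable.
    Returns the accepted outcomes and the number of discarded shots.
    """
    accepted, discarded = [], 0
    while len(accepted) < n_shots:
        error_detected, outcome = run_shot(rng)
        if error_detected:
            discarded += 1
        else:
            accepted.append(outcome)
    return accepted, discarded

def toy_shot(rng):
    """Toy shot: 10% chance of a flagged error, outcome +1 or -1."""
    return rng.random() < 0.1, rng.choice((1, -1))

train_set, test_set = make_splits()
shots, n_discarded = collect_accepted_shots(toy_shot, 1000, random.Random(0))
print(len(shots), n_discarded / (n_discarded + len(shots)))
```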

We applied the noise models to both ancilla and physical qubits in the system, but kept the syndrome extraction qubits noise-free. We chose Pauli Error Rates ranging from 0.001 to 0.01 for both models, consistent with current NISQ device capabilities (Arute et al. 2019). For the environmental noise model, we injected Pauli errors once every 4 gates, with the same Pauli Error Rates as for the gate noise model. Since most of our gates are two-qubit gates, with an estimated completion time of \(\approx 10-200\) ns (Kubo and Goto 2023; Howard et al. 2023; Kubo et al. 2024), and since our Pauli Error Rates produce one error every 100 to 1000 injections, equivalent to one error every 400 to 4000 gates, we estimate the time interval between errors to be \(\approx 4-800\ \mu\)s. This is consistent with realistic relaxation and coherence times, which are of order \(10-1000\ \mu\)s in superconducting qubits (Urbanek et al. 2020; Somoroff et al. 2023).
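The arithmetic behind the \(\approx 4-800\ \mu\)s estimate can be written out as a small function (the function name is illustrative):

```python
def interval_between_errors_us(p, gate_time_ns, gates_per_injection=4):
    """Mean time between injected errors on a qubit, in microseconds.

    With error rate p, one error occurs every 1/p injections, i.e. every
    gates_per_injection / p gates, each lasting gate_time_ns.
    """
    gates_per_error = gates_per_injection / p
    return gates_per_error * gate_time_ns / 1000.0  # ns -> us

shortest = interval_between_errors_us(p=0.01, gate_time_ns=10)    # ~4 us
longest = interval_between_errors_us(p=0.001, gate_time_ns=200)   # ~800 us
print(f"{shortest:.0f}-{longest:.0f} us")
```

The shortest interval pairs the highest error rate with the fastest gates, and the longest pairs the lowest error rate with the slowest gates.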

Incoherent noise is normally modelled with mixed states and density matrices, as this formulation captures how the quantum state evolves under random noise. However, because simulating the QML model training with density matrices carries a large computational overhead (a density matrix for n qubits has \(4^n\) entries, compared with \(2^n\) amplitudes for a statevector), we modelled the system using statevectors instead. Although statevectors do not capture the decoherence of the original pure state, for our purposes they correctly predicted the probability vector output at the end of each circuit, while requiring much less computational time to simulate.
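One standard way a statevector simulation can reproduce the output probabilities of a Pauli noise channel is to treat each sampled error pattern as a separate pure-state trajectory and average the resulting probabilities; whether this matches the exact procedure used here is an assumption. The single-qubit sketch below compares that average with the density-matrix result, assuming a channel with total error probability \(p\) split equally between X, Y and Z, and also prints the memory needed for a 20-qubit statevector versus a 20-qubit density matrix stored as complex128.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def density_matrix_probs(psi, p):
    """Z-basis probabilities after the Pauli channel, via the density matrix."""
    rho = np.outer(psi, psi.conj())
    rho = (1 - p) * rho + (p / 3) * sum(P @ rho @ P.conj().T for P in (X, Y, Z))
    return np.real(np.diag(rho))

def trajectory_probs(psi, p, n_samples=None, rng=None):
    """Same probabilities from pure statevectors, one per Pauli branch."""
    branches = [(1 - p, I), (p / 3, X), (p / 3, Y), (p / 3, Z)]
    if n_samples is None:
        # Exact weighted average over the four branches.
        return sum(w * np.abs(P @ psi) ** 2 for w, P in branches)
    # Monte Carlo: sample one branch per trajectory, then average.
    picks = rng.choice(len(branches), size=n_samples, p=[w for w, _ in branches])
    return np.mean([np.abs(branches[k][1] @ psi) ** 2 for k in picks], axis=0)

psi = rx(0.9) @ np.array([1, 0], dtype=complex)
exact = density_matrix_probs(psi, p=0.05)
print(np.allclose(exact, trajectory_probs(psi, p=0.05)))  # True

# Memory for n = 20 qubits with 16-byte complex128 entries.
n = 20
print(2**n * 16 / 2**20, "MiB statevector;", 4**n * 16 / 2**40, "TiB density matrix")
```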

4 Results and discussion

We first present the impact of the noise models on training accuracy with and without error detection, focusing on the effectiveness of the [[4,2,2]] stabiliser code in detecting errors and protecting the training accuracy. We then show the impact of ancilla qubit noise on the fidelities of the physical qubit states and, consequently, on the training accuracy. We also show that ancilla qubit noise limits the effectiveness of the stabiliser code, and use our results to define a threshold for the maximum ancilla Pauli Error Rate and the minimum ancilla fidelity required for reliable error detection and the best training accuracies.

4.1 Training with noise without error detection

In Fig. 3, we show the evolution of the mean training accuracy achieved during training (where the mean was calculated from 10 simulations with different starting seeds), under both noise models, with varying noise levels expressed as Pauli Error Rates and without error detection.

As expected, Fig. 3 shows that the higher the noise level, the lower the final training accuracy. For noise levels \(p\ge 0.005\), the training curves do not exhibit the significant jump in accuracy within the first 30 iterations seen at lower noise levels. Instead, they stay close to their initial training accuracy throughout training, indicating that the patterns in the data required for training are lost in the noise. At noise levels \(p\le 0.0025\), learning appears hampered but not impossible.

Fig. 3

Mean training accuracy of the logically-encoded VQC when training under different levels of gate (left) and environmental (right) noise, ranging from \(p=0.001-0.01\). The black dashed line indicates the accuracy obtained from training without noise

These results indicate that our simple VQC model is fairly susceptible to noise, which we suggest is due to its reliance on only two qubits to learn and make predictions. While larger and more complex QML algorithms generally have greater learning capacities and may be more robust to noise and errors, they also require more qubits and gates, which increases the potential for errors. Consequently, training and inference for both simple and complex QML algorithms on NISQ devices (where noise levels can exceed \(p=0.005\)) will most likely require a combination of error correction and mitigation. For some QML algorithms and applications, fully fault-tolerant quantum computation will still be necessary to achieve desirable levels of accuracy.

4.2 Training with noise and error detection

In Fig. 4, we display the effect of implementing the [[4,2,2]] stabiliser code with different numbers of syndrome extraction rounds on the final training accuracy achieved after model convergence. In this and subsequent figures, we are interested in the mean final training accuracy, which we calculated by taking the mean of the accuracies recorded over the last 40 iterations of training (as we can assume the training has stabilised by this point), and averaging this mean over 10 simulations of VQC training. We also report the first standard deviation associated with this mean.
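The summary statistic described above can be written as below. Taking the standard deviation across the 10 per-run means, and using the population form (`ddof=0`), are assumptions, since the text does not specify them.

```python
import numpy as np

def mean_final_accuracy(acc, last=40):
    """Mean final training accuracy and its spread across runs.

    acc: array of shape (n_runs, n_iterations) of training accuracies.
    Each run is first averaged over its final `last` iterations; the result
    is the mean and (population) standard deviation of those per-run values.
    """
    acc = np.asarray(acc, dtype=float)
    per_run = acc[:, -last:].mean(axis=1)
    return per_run.mean(), per_run.std()

# Example with synthetic accuracies: 10 runs of 100 iterations.
rng = np.random.default_rng(0)
synthetic = np.clip(0.9 + 0.05 * rng.standard_normal((10, 100)), 0.0, 1.0)
print(mean_final_accuracy(synthetic))
```

The synthetic accuracies only exercise the function; they are not results from this study.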

Under both gate noise and environmental noise models, we observe that for low noise levels (which we define as \(p \le 0.0025\)), the training accuracy always improves with increasing number of syndrome extraction rounds and there is high consistency in the final training accuracy recorded across the 10 simulations. However, for higher levels of noise (specifically, \(0.005 \le p \le 0.01\)), the training accuracy is fairly inconsistent (greater spread in values), and occasionally worsens with more syndrome extractions. This runs counter to our expectation that more syndrome extractions should lead to the detection of more errors. Our results show that at higher noise levels, the effectiveness of syndrome extractions (and error detection) is limited.

There is also evidence that error detection is of limited effectiveness at low noise levels. Even for \(p\le 0.0025\), increasing the number of syndrome extractions beyond two rounds produces no clear further increase in final training accuracy, which does not reach 1.00 even with five rounds of syndrome extraction.

For the gate noise model with five syndrome measurements, we discarded approximately \(6\%\) of runs at \(p=0.001\) and \(44\%\) at \(p=0.01\) because errors were detected; with only one syndrome measurement, the discard rates were \(6\%\) and \(41\%\), respectively. For the environmental noise model, the discard rate ranged from \(5\%\) at \(p=0.001\) to \(37\%\) at \(p=0.01\) with five syndrome measurements, and from \(4\%\) at \(p=0.001\) to \(35\%\) at \(p=0.01\) with only one syndrome measurement per run. For both noise models, the discard rate increases with the error rate. It also generally increases with the number of syndrome measurements, though very slowly: the largest change occurs between one and two syndrome measurements, while between two and five it remains fairly constant, apart from occasional small decreases of less than \(1\%\) as the number of syndrome extractions increases. This is consistent with the results above: at low noise levels, the training accuracy stays relatively constant after just one round of syndrome extraction, while at high noise levels it occasionally dips as the number of syndrome extractions increases.

Fig. 4

Mean final training accuracy of the logically-encoded VQC under different levels of gate (left) and environmental (right) noise ranging from \(p=0.001-0.01\), and with different degrees of error detection implemented from 0 to 5 rounds of syndrome extractions. The mean final training accuracy is calculated from the training accuracies attained by the VQC over the final 40 iterations of training, after convergence. The standard deviation of the mean final training accuracy from 10 training runs is represented by the shaded regions

These results suggest that the logically-encoded circuit contains noise that syndrome measurements cannot detect. The only possible source of this noise is the ancilla qubits, which are not subject to syndrome extraction but are entangled with the physical qubits via multiple CNOT gates.

To test our hypothesis that ancilla qubit errors are responsible for the limited effectiveness of the [[4,2,2]] stabiliser code in detecting errors in the system, we ran the training with different levels of ancilla qubit noise. Figs. 5 (gate noise model) and 6 (environmental noise model) show how the mean final training accuracy evolves as the number of syndrome extractions increases, for different levels of ancilla qubit noise. The fraction \(f_{anc}\) denotes the fraction of the physical qubit Pauli Error Rate that we apply to the ancilla qubits. For example, \(f_{anc}=0.0\) means that there is no ancilla qubit noise, while \(f_{anc}=0.5\) means that the ancilla Pauli Error Rate is half the physical Pauli Error Rate.

Fig. 5

Impact of ancilla qubit error rate on the mean final training accuracy, under the gate noise model with physical qubit error probability ranging from \(p=0.001-0.01\). Each subplot displays the variation in mean final training accuracy with the number of rounds of syndrome extraction, for a specific combination of ancilla and physical qubit error probabilities. The ancilla error rates are expressed as a fraction of the physical error rates, denoted by the fraction \(f_{anc}\). The standard deviation in the mean final training accuracies, each calculated from 10 training runs, is shown by the shaded regions

Fig. 6

Impact of ancilla qubit error rate on the mean final training accuracy, under the environmental noise model with error probability ranging from \(p=0.001-0.01\). Each subplot shows the variation in mean final training accuracy with the number of rounds of syndrome extraction, for a specific combination of ancilla and physical qubit error probabilities. The ancilla error rates are expressed as a fraction of the physical error rates, denoted by the fraction \(f_{anc}\). The standard deviation in the mean final training accuracies, each calculated from 10 training runs, is shown by the shaded regions

It is clear from the plots that as the ancilla Pauli Error Rate increases, the syndrome extractions become less effective and less reliable, confirming our earlier hypothesis. A higher ancilla error rate produces a greater spread in the final training accuracy, as well as more non-monotonicity (which we suggest is a consequence of the greater variance). For \(f_{anc} \ne 0.0\), lower noise levels produce better training outcomes than higher noise levels: there is a far smaller chance of non-monotonicity in the final training accuracy as the number of syndrome extractions increases, and a far smaller variance in the final training accuracy. Notably, when there is no noise on the ancilla qubits, the error detection works as expected, and even at the highest noise levels the training accuracy reaches 1.0 within five rounds of syndrome extraction. Whenever the ancilla qubits are subject to any noise, the error detection loses effectiveness, and at higher ancilla noise levels (i.e., \(p_{anc}\ge 0.005\)) it becomes unreliable.

Thus, a likely explanation for the high variability in final training accuracy at high ancilla qubit error rates is that the ancilla errors are not well detected in this set-up, so they cannot be reliably eliminated from the training. We do not add syndrome extractions to the ancilla qubits, since this would require an additional encoding of the ancilla qubits, introducing further ancillary qubits on which we could not perform syndrome extractions. Because the ancilla qubits are entangled with the physical qubits, their errors spread easily into the physical qubits through the CNOT gates (see Fig. 2). While some of these errors will be detectable by syndrome measurements on the physical qubits, most of them will be non-Pauli. Hence, as the noise level increases, it becomes more difficult to protect the training through syndrome extractions, leading to lower final accuracies and greater variance in their values. As we will demonstrate in Section 4.3.1, high noise levels result in a large spread in physical qubit state fidelities, which produces the higher variance and non-monotonicity in the final training accuracies.
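A concrete Pauli example, offered as one mechanism consistent with this explanation rather than as the dominant one, shows how an ancilla error can become invisible to the code. Assume an X error on an ancilla just before the Step-5 CNOTs that target \(q_1\) and \(q_3\): the error propagates to \(X_{q_1}X_{q_3}\) on the physical qubits. Because the [[4,2,2]] code has distance 2, this weight-2 operator commutes with the (assumed) stabilisers \(XXXX\) and \(ZZZZ\) and acts as a logical X on the first logical qubit, so it produces no syndrome. The qubit layout is an illustrative assumption.

```python
from functools import reduce
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op_on(ops_by_qubit, n):
    """Tensor product with the given single-qubit ops and identity elsewhere."""
    return reduce(np.kron, [ops_by_qubit.get(k, I) for k in range(n)])

def cnot(control, target, n):
    """CNOT as a permutation matrix, qubit 0 being the most significant bit."""
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for idx in range(dim):
        flip = (idx >> (n - 1 - control)) & 1
        U[idx ^ (flip << (n - 1 - target)), idx] = 1.0
    return U

N, ANC = 5, 4                          # q0..q3 plus one ancilla (assumed layout)
U = cnot(ANC, 1, N) @ cnot(ANC, 3, N)  # ancilla-controlled flips of q1 and q3

# An X error on the ancilla just before these CNOTs ...
propagated = U @ op_on({ANC: X}, N) @ U.T  # U is a real permutation: U^-1 = U.T
# ... emerges as X on the ancilla and X on both physical targets.
print(np.array_equal(propagated, op_on({ANC: X, 1: X, 3: X}, N)))  # True

# The weight-2 physical part X_q1 X_q3 commutes with both assumed stabilisers,
# so syndrome extraction on the physical qubits cannot flag it.
phys_error = op_on({1: X, 3: X}, 4)
stabs = [op_on({k: X for k in range(4)}, 4), op_on({k: Z for k in range(4)}, 4)]
print([np.array_equal(phys_error @ s, s @ phys_error) for s in stabs])  # [True, True]
```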

We again observe that for \(p_{anc}> 0\) the mean final training accuracy tends to plateau rather than continue to increase with more rounds of syndrome extraction. The plateau appears in the subplots of Fig. 6 where the ancilla qubits are subject to noise (\(f_{anc} \ne 0\)), and is most apparent where the ancilla Pauli Error Rate is below approximately 0.003. We also find that the higher the noise level, the lower the accuracy at which the plateau occurs. At higher levels of ancilla noise, the plateau is masked by the greater variance and non-monotonicity in the evolution of the final training accuracy. The plateau indicates that, for a given Pauli Error Rate, there is a limit to how many ancilla-induced errors the stabiliser code can detect and remove from the physical qubits, leaving only errors that syndrome measurements cannot detect. Once that limit is reached, adding more syndrome extractions does not detect any further errors, producing the plateau.
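The onset of the plateau can also be located programmatically. The sketch below uses a hypothetical helper, `plateau_onset` (not part of the study's code), which returns the first round after which every additional round of syndrome extraction improves the mean final training accuracy by less than a tolerance. Both the tolerance and the toy accuracy sequence are illustrative assumptions, not results from Fig. 6.

```python
from typing import Optional, Sequence

def plateau_onset(mean_acc: Sequence[float], tol: float = 0.01) -> Optional[int]:
    """Return the first index k after which every further round improves the
    mean accuracy by less than `tol`, or None if no such index exists.

    mean_acc[k] is the mean final training accuracy after k rounds of
    syndrome extraction; `tol` is an assumed, illustrative threshold.
    """
    # Round-to-round changes in mean accuracy
    gains = [b - a for a, b in zip(mean_acc, mean_acc[1:])]
    for start in range(len(gains)):
        if all(g < tol for g in gains[start:]):
            return start
    return None

# Toy sequence for illustration only
print(plateau_onset([0.70, 0.80, 0.86, 0.87, 0.87, 0.872], tol=0.02))  # 2
```

Because declines also count as gains below `tol`, non-monotonic sequences at high ancilla noise would need a more robust criterion, for example one based on averages over several training runs.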

The plateau is also present in the simulation results under gate noise (see Fig. 5), though it is less clearly visible because high variability sets in at a lower ancilla noise level than in the environmental noise simulations. The plateau is most visible for \(0.002 \lesssim p_{anc} \lesssim 0.004\), where the ancilla noise is high enough to produce a plateau but low enough that it is not masked by the high variability in final training accuracy. Interestingly, the plateau begins at a higher number of syndrome extraction rounds under the gate noise model than under the environmental noise model, suggesting that fewer detectable errors spread to the physical qubits under the environmental noise model. Additionally, for the same ancilla and physical Pauli Error Rates, the gate noise model reaches a lower final training accuracy than the environmental noise model, reflecting the difference in error levels between the two models. Given that the gate noise model produces errors more frequently and assigns a higher error rate to 2-qubit gates, it is unsurprising that the level of noise in the physical qubits, both detectable and undetectable, is greater under the gate noise model than under the environmental noise model.

Our results for both noise models motivate the definition of a threshold Pauli Error Rate for ancilla qubits, above which error correction may not be effective: adding more syndrome extractions may not improve training, the final training accuracy is highly variable, and the maximum mean final training accuracy is considerably lower than 1.0. In determining the threshold Pauli Error Rate for our system, we also excluded ancilla error rates that produce a plateau at a final training accuracy below 0.90, even where high variability is not an issue. Taking all these requirements into account, we arrive at a threshold Pauli Error Rate of \(p = 0.003\) for the gate noise model and \(p=0.004\) for the environmental depolarising noise model. For comparison, the current lowest single-qubit gate error rate exhibited by a NISQ device is \(0.15\%\), i.e., \(0.0015\) (Arute et al. 2019), so we may be able to run our simple VQC with the [[4,2,2]] code on the least noisy NISQ devices under special circumstances (for example, if gate noise dominates and environmental noise is very low). However, once real-world environmental noise is added to the gate noise, the error threshold may be too low for currently available NISQ devices to meet.
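The selection rule just described can be written as a short filter. In this sketch the 0.90 accuracy floor comes from the text, while the spread limit `max_std` is a hypothetical stand-in for "high variability", which the study does not quantify numerically. `AncillaRun` and `threshold_rate` are illustrative names, and the toy values are not the study's data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class AncillaRun:
    """Summary of training runs at one ancilla Pauli Error Rate (hypothetical structure)."""
    p_anc: float        # ancilla Pauli Error Rate
    plateau_acc: float  # mean final training accuracy at the plateau
    acc_std: float      # spread of the final training accuracy across runs

def threshold_rate(runs: Sequence[AncillaRun],
                   min_acc: float = 0.90,
                   max_std: float = 0.05) -> Optional[float]:
    """Largest p_anc such that it, and every lower rate, meets both criteria.

    min_acc = 0.90 follows the text; max_std is an assumed stand-in for
    "high variability", which the study does not quantify.
    """
    threshold = None
    for run in sorted(runs, key=lambda r: r.p_anc):
        if run.plateau_acc < min_acc or run.acc_std > max_std:
            break  # first failing rate: every higher rate is excluded too
        threshold = run.p_anc
    return threshold

# Toy inputs for illustration only (not results from Figs. 5 or 6)
toy = [AncillaRun(0.001, 0.98, 0.01), AncillaRun(0.002, 0.95, 0.02),
       AncillaRun(0.003, 0.91, 0.03), AncillaRun(0.004, 0.85, 0.08)]
print(threshold_rate(toy))  # 0.003
```

Stopping at the first failure enforces that the threshold is a rate below which every tested rate is acceptable, rather than an isolated good point.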

There is a parallel between the threshold we have defined and the Threshold Theorem for quantum error correction, which asserts that there is a critical physical error rate below which sufficiently good quantum error correcting codes can suppress errors to arbitrarily low levels. For error rates above this threshold, errors accumulate too quickly for error correction to be effective. However, the Threshold Theorem does not explicitly address the phenomenon of ancilla errors spreading to the physical qubits and reducing the effectiveness of the error correcting code.
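For comparison, one standard textbook form of the threshold result, for a code that corrects any single error and is concatenated \(k\) times, expresses the logical error rate \(p^{(k)}\) in terms of the physical error rate \(p\) and the threshold \(p_{th}\):

$$\begin{aligned} p^{(k)} = p_{th}\left(\frac{p}{p_{th}}\right)^{2^{k}}. \end{aligned}$$

For \(p < p_{th}\) the logical error rate falls doubly exponentially with \(k\), whereas for \(p > p_{th}\) it grows with each level of concatenation. This idealised scaling assumes independent faults, each occurring with probability at most \(p\), and does not model the ancilla-to-physical error propagation discussed here.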

Though the threshold values we found are specific to our system and cannot be generalised to other systems, both the limit on the maximum achievable training accuracy and the existence of a threshold error rate for ancilla qubits should generalise to other combinations of QML algorithms and QEC codes. Virtually all QML algorithms contain rotation gates. When such algorithms are implemented with QEC codes that need ancilla qubits to logically encode rotation gates, the ancilla and physical qubits become entangled, and we can expect errors to propagate between the ancilla and physical qubit registers. If the QEC code cannot correct the full range of complex errors that may arise from such propagation, its effectiveness will be limited in noisy environments, leading to a maximum achievable training accuracy and a threshold error rate.

Given this, we expect the general issue of error propagation to persist as QML architectures scale, with larger and deeper circuits exhibiting a lower threshold error rate due to the greater number of logical rotation gates and consequently the increased opportunities for errors in the ancilla register to accumulate and propagate. More sophisticated QEC codes would generally be applied to these circuits, and with higher code distances and better detection capabilities, they could offset some of the impact of increased error accumulation. Further investigation is needed to determine the exact scaling behaviour of these thresholds when circuit size and code capabilities are increased.

Our findings also highlight the need for additional considerations when implementing QML algorithms on quantum hardware. In the fault-tolerant regime, methods such as magic states (Bravyi and Kitaev 2005) are viable alternatives for implementing logical rotations. In the NISQ era, quantum machine learning may only be achievable with QEC if additional error mitigation techniques are used alongside it. For example, performing rotations with error mitigation rather than as logical operations (such as zero-noise extrapolation (Pascuzzi et al. 2022) or dynamical gate error correction (Khodjasteh and Viola 2009)), and placing flag qubits, normally applied to syndrome qubits (Chamberland et al. 2020; Chao and Reichardt 2020), in the ancilla register to catch errors before they propagate, may help minimise uncorrectable errors. Another possible direction is partial QEC, which significantly reduces resource requirements, though it still requires thousands of physical qubits (Kang et al. 2025).

4.3 State fidelities

We next present our results showing the effect that noise and the implementation of error detection have on the state fidelities of physical and ancilla qubits, and use these results to further explain our training accuracy results above. All state fidelities (F) that we present in this subsection were calculated using the following definition:

$$\begin{aligned} F(\rho , \sigma ) = |\langle \psi _{\rho } | \psi _{\sigma } \rangle |^2, \end{aligned} \tag{7}$$

where \(\rho = |\psi_{\rho}\rangle\langle\psi_{\rho}|\) and \(\sigma = |\psi_{\sigma}\rangle\langle\psi_{\sigma}|\) are the two quantum states being compared. Eq. (7) applies because we represent all quantum states as pure states rather than mixed states. As the ideal states against which all other states are compared, we use the error-free physical and ancilla states at the end of the logically-encoded circuit, just before measurement.
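For state vectors, Eq. (7) reduces to a squared inner product. A minimal NumPy sketch follows; the function name `pure_state_fidelity` is illustrative, and the inputs are assumed to be normalised state vectors written in the same basis ordering.

```python
import numpy as np

def pure_state_fidelity(psi_rho: np.ndarray, psi_sigma: np.ndarray) -> float:
    """F = |<psi_rho|psi_sigma>|^2 for normalised state vectors, as in Eq. (7)."""
    overlap = np.vdot(psi_rho, psi_sigma)  # np.vdot conjugates its first argument
    return float(abs(overlap) ** 2)

zero = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
print(pure_state_fidelity(zero, plus))   # approximately 0.5
print(pure_state_fidelity(plus, -plus))  # approximately 1.0: an overall sign does not change F
```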

We report only the absolute values of fidelities in this study. This is because we used probabilities to calculate expectation values during the training process, so any negative amplitudes just before measurement disappear and do not impact training.
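The point that sign changes in the amplitudes do not affect training can be checked directly: expectation values computed from measurement probabilities depend only on the squared magnitudes of the amplitudes. The two-qubit amplitudes below are hypothetical, and Z on the first qubit is an assumed observable chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-qubit amplitudes just before measurement (basis order |00>, |01>, |10>, |11>),
# and the same state after a phase error has flipped the sign of the |11> amplitude
state = np.array([0.6, 0.0, 0.0, 0.8])
flipped = np.array([0.6, 0.0, 0.0, -0.8])

# Measurement probabilities depend only on |amplitude|^2
probs = np.abs(state) ** 2
probs_flipped = np.abs(flipped) ** 2

# Assumed observable: Z on the first qubit (+1 for |00>, |01>; -1 for |10>, |11>)
z_first = np.array([1.0, 1.0, -1.0, -1.0])
print(np.allclose(probs, probs_flipped))                     # True
print(np.isclose(z_first @ probs, z_first @ probs_flipped))  # True
```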

4.3.1 State fidelity distributions

To determine state fidelity distributions under varying error rates and levels of error detection, we simulated the logically-encoded circuit with 4000 shots, using each of the four possible inputs as initial states for 1000 shots each. The analysis revealed roughly bimodal distributions, with peaks at 1.0 and 0.0 and a small proportion of intermediate fidelities (Footnote 1).

Fig. 7

Distribution of physical state fidelities, shown as histograms with a bin width of 0.02, from the 4000-shot simulations, with a Pauli Error Rate of \(p=0.01\) for both physical and ancilla qubits. Left: without error detection. Right: with error detection (three rounds of syndrome extraction)

The distributions remain approximately bimodal for all error rates, with higher error rates producing more fidelities near 0 (and between 0 and 1) and fewer fidelities close to 1. As shown in Fig. 7, applying error detection increases the proportion of fidelities near 1 by removing states with detected errors from the distribution. The near-bimodality of the distributions suggests that errors in physical or ancilla qubits often yield states orthogonal to the correct state, which we attribute to the errors consisting only of bit flips and phase flips. Bit flips in particular are far more likely to create orthogonal states, since flipping one or more qubits generally maps the state onto an orthogonal one. For example, if the physical qubits were originally in the logical \(|{00}\rangle _{L}\) state, the only non-orthogonal state that can result from bit flips is the original state itself. Subsequent rotation gates preserve the orthogonality of the errored state relative to the correct state, since rotations are unitary and unitary operations preserve inner products. Phase flips have a smaller and less predictable impact on state fidelities; in principle they can create both orthogonal and non-orthogonal states, adding to the number of states with fidelities near 0 and introducing states with fidelities between 0 and 1. Bit flips and phase flips occurring on ancilla qubits spread to the physical qubits without necessarily preserving orthogonality, leading to physical qubit states with intermediate fidelities (and vice versa).
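The orthogonality argument can be checked numerically. The sketch below assumes the common [[4,2,2]] convention \(|00\rangle_L = (|0000\rangle + |1111\rangle)/\sqrt{2}\), applies bit and phase flips, and evaluates Eq. (7). The single-qubit state and angle used for the intermediate case are arbitrary illustrative choices.

```python
import numpy as np
from functools import reduce

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def on_qubits(op, targets, n=4):
    """n-qubit operator applying `op` to each qubit in `targets` and identity elsewhere."""
    return reduce(np.kron, [op if q in targets else I for q in range(n)])

def fidelity(a, b):
    """Eq. (7) for normalised state vectors."""
    return abs(np.vdot(a, b)) ** 2

# Assumed [[4,2,2]] convention: |00>_L = (|0000> + |1111>)/sqrt(2)
logical_00 = np.zeros(16)
logical_00[0b0000] = logical_00[0b1111] = 1 / np.sqrt(2)

f_single_x = fidelity(logical_00, on_qubits(X, {0}) @ logical_00)        # orthogonal: 0
f_all_x = fidelity(logical_00, on_qubits(X, {0, 1, 2, 3}) @ logical_00)  # same state: about 1
f_single_z = fidelity(logical_00, on_qubits(Z, {0}) @ logical_00)        # orthogonal: 0

# Hypothetical single-qubit superposition: a phase flip gives an intermediate fidelity
theta = np.pi / 3
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
f_phase = fidelity(psi, Z @ psi)  # cos(theta)^2, about 0.25
print(f_single_x, f_all_x, f_single_z, f_phase)
```

Only the all-qubit bit flip, which maps the encoded state onto itself, leaves a non-zero overlap, matching the statement about \(|00\rangle_L\) above.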

4.3.2 Pauli error rates and state fidelities

Table 1 presents the mean fidelities (with standard deviations) for ancilla and physical qubits under differing Pauli Error Rates and without error detection, calculated from the 4000-shot simulations described above. Since the fidelity distributions are non-Gaussian, we report both the mean (and standard deviation, Footnote 2) and the fractions of fidelities below 0.02 and above 0.98. We also provide the corresponding mean final training accuracies (with standard deviations), calculated from 10 independent training runs, each initialised with a unique random seed.

Table 1 Ancilla qubit fidelities and physical qubit fidelities under selected ancilla and physical qubit Pauli Error Rates, and without error detection
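Because the distributions are strongly non-Gaussian, the mean alone can hide the split between near-orthogonal and near-ideal states. The sketch below computes summary quantities of the kind reported in Table 1 from a sample of per-shot fidelities. The helper name, the use of the sample standard deviation (`ddof=1`) and the fixed grid of 0.02-wide bins (matching the bin width of Fig. 7) are assumptions.

```python
import numpy as np

def fidelity_summary(fids, low=0.02, high=0.98):
    """Mean, standard deviation and tail fractions of a fidelity sample,
    plus histogram counts on an assumed grid of 50 bins of width 0.02."""
    fids = np.asarray(fids, dtype=float)
    counts, _ = np.histogram(fids, bins=np.linspace(0.0, 1.0, 51))
    return {
        "mean": fids.mean(),
        "std": fids.std(ddof=1),              # sample standard deviation (assumption)
        "frac_below_low": np.mean(fids < low),
        "frac_above_high": np.mean(fids > high),
        "hist_counts": counts,
    }

# Toy sample for illustration only: 8 ideal states, 1 orthogonal, 1 intermediate
s = fidelity_summary([1.0] * 8 + [0.0] + [0.5])
print(s["mean"], s["frac_below_low"], s["frac_above_high"])
```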

Our results reveal several noteworthy trends. Firstly, ancilla and physical qubit fidelities are primarily affected by their own Pauli Error Rates, but are also slightly affected by the error rate of the other register. For example, when the physical error rate is non-zero and the ancilla error rate is zero, the ancilla fidelities still fall below 1.0, as physical qubit errors can propagate to ancilla qubits through the CNOT gates. We also observe this under the gate noise model: when the physical and ancilla error rates are both 0.001, the proportion of ancilla fidelities with \(F_{anc}> 0.98\) is 0.91 (with \(\bar{F}_{anc} = 0.93\)), but when the physical error rate is increased to 0.010 while the ancilla error rate is kept the same, this proportion falls to 0.65 (with \(\bar{F}_{anc} = 0.89\)). Conversely, increasing the ancilla error fraction reduces the physical state fidelities even when the physical error rate is unchanged.

Secondly, the mean final training accuracy is not especially robust to reductions in the physical state fidelities, which is consistent with our earlier observation that training accuracies are affected by even low levels of noise. To achieve an accuracy above 0.95, the mean physical state fidelity needs to be at least roughly 0.95 (Footnote 3), corresponding to approximately \(93\%\) to \(94\%\) of states with \(F> 0.98\) and \(3\%\) to \(4\%\) with \(F < 0.02\). Apart from reducing the mean training accuracy, errors also increase its standard deviation by increasing the number of very-low-fidelity states that occur during training. The training accuracies are much closer to a Gaussian distribution than the fidelities, because each training run uses many states, which tends to average out the impact of individual errors on training.
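The relationship between the mean fidelity and these tail fractions can be checked with simple bounds. Writing \(f_{hi}\), \(f_{lo}\) and \(f_{mid}\) for the fractions of states with \(F > 0.98\), \(F < 0.02\) and \(0.02 \le F \le 0.98\), respectively, the mean fidelity \(\bar{F}\) satisfies

$$\begin{aligned} 0.98\, f_{hi} + 0.02\, f_{mid} \le \bar{F} \le f_{hi} + 0.02\, f_{lo} + 0.98\, f_{mid}. \end{aligned}$$

Taking \(f_{hi} = 0.93\), \(f_{lo} = 0.04\) and, by assumption, the remaining \(f_{mid} = 0.03\) gives \(0.912 \le \bar{F} \le 0.960\), so a mean physical state fidelity of about 0.95 is consistent with the quoted fractions.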

We would expect the training accuracy under both noise models to closely track the physical state fidelities at the end of the circuit, since it is the physical state that is fed to the algorithm for optimisation. However, our results deviate slightly from this expectation. For example, under the environmental noise model with \(p_{phys} = 0.005\), we find that when \(p_{anc} = 0.0025\), the physical state fidelity is 0.89 and the mean final training accuracy is \(0.82 \pm 0.07\), whereas when \(p_{anc}=0.004\), the physical state fidelity is 0.82 and the mean final training accuracy is \(0.83 \pm 0.02\). The similar final training accuracies despite the clear difference in physical state fidelity most likely arise from using only 10 training runs to estimate the mean and standard deviation of the training accuracy.
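The effect of using only 10 runs can be quantified with the standard error of the mean. The sketch below assumes that the quoted ± values are sample standard deviations (as stated for Table 1) and that the training runs are independent; `standard_error` is an illustrative helper.

```python
import math

def standard_error(std: float, n_runs: int) -> float:
    """Standard error of a mean estimated from n_runs independent runs."""
    return std / math.sqrt(n_runs)

# Values quoted above (environmental noise model, p_phys = 0.005, 10 runs each)
se_a = standard_error(0.07, 10)   # p_anc = 0.0025: mean accuracy 0.82
se_b = standard_error(0.02, 10)   # p_anc = 0.004:  mean accuracy 0.83
se_diff = math.hypot(se_a, se_b)  # standard error of the difference of the two means
print(round(se_a, 3), round(se_b, 3), round(se_diff, 3))  # 0.022 0.006 0.023
```

The observed difference in mean accuracy (0.01) is smaller than one standard error of the difference (about 0.023), so the two means cannot be distinguished with this number of runs.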

Finally, we note that under both noise models, ancilla fidelities are lower than physical fidelities at the same error rates. This is because more gate operations are applied to the ancilla register than to the physical register, which increases both the chance of gate errors and the amount of simulated environmental noise in the ancilla register. The ancilla register therefore accumulates more errors than the physical register under both noise models.

4.3.3 Error detection and state fidelities

Figs. 8 and 9 show the impact of error detection on the mean physical and ancilla fidelities, respectively, under the gate noise model with varying Pauli Error Rates on both registers. Ancilla fidelity improves slightly after one round of syndrome extraction, likely because errors that would otherwise have propagated from the physical to the ancilla register are removed, but further rounds of syndrome extraction on the physical qubits have minimal effect on ancilla fidelities. The physical fidelities also plateau after approximately one round of syndrome extraction, and at their respective plateaus, lower mean ancilla fidelities are associated with lower mean physical fidelities. These observations support our earlier assertion that ancilla errors affect the physical qubits in ways that cannot be detected by the [[4,2,2]] stabiliser code. More fundamentally, Pauli errors that originate in the ancilla register and propagate to the physical register through entangling gates may not remain Pauli errors by the time they reach the physical qubits. It is these non-Pauli errors that the stabiliser code is unable to detect effectively, resulting in a maximum mean physical qubit fidelity that cannot be exceeded even with error detection, and consequently a maximum mean training accuracy. We observe very similar trends under the environmental noise model.
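How a Pauli error can stop being a single Pauli error after passing a rotation gate can be illustrated with a single-qubit identity: moving an \(X\) error from before a rotation \(R_z(\theta)\) to after it yields \(\cos\theta\, X + \sin\theta\, Y\), a coherent combination of Paulis rather than one Pauli operator unless \(\theta\) is a multiple of \(\pi/2\). This NumPy sketch assumes the convention \(R_z(\theta) = e^{-i\theta Z/2}\) and an arbitrary angle; it is not a model of the logical rotations in our circuit.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rz(theta: float) -> np.ndarray:
    """Assumed convention: R_z(theta) = exp(-i theta Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

theta = 0.4  # arbitrary, non-Clifford rotation angle
# R_z X = E R_z, so an X error before the rotation acts as E after it
effective = rz(theta) @ X @ rz(theta).conj().T
print(np.allclose(effective, np.cos(theta) * X + np.sin(theta) * Y))  # True

# Overlap with each Pauli: no single Pauli accounts for the whole error
weights = [abs(np.trace(P.conj().T @ effective)) / 2 for P in (I2, X, Y, Z)]
print(np.round(weights, 3))
```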

Fig. 8

Mean physical state fidelity as a function of the number of syndrome extraction rounds under the gate noise model, for different ancilla error rates. The means are calculated from 10 training runs per combination of ancilla and physical error rate, where the physical error rate ranges from \(p=0.001\) to \(p=0.01\) and the ancilla error rate is expressed as a fraction of the physical error rate, denoted by \(f_{anc}\). Standard deviations in fidelity are omitted for clarity

Fig. 9

Mean ancilla state fidelity as a function of the number of syndrome extraction rounds under the gate noise model, for different ancilla error rates. The means are calculated from 10 training runs per combination of ancilla and physical error rate, where the physical error rate ranges from \(p=0.001\) to \(p=0.01\) and the ancilla error rate is expressed as a fraction of the physical error rate, denoted by \(f_{anc}\). Standard deviations in fidelity are omitted for clarity

We identified the ancilla fidelities corresponding to the threshold Pauli Error Rates established earlier: \(p_{anc} = 0.003\) for the gate noise model and \(p_{anc} = 0.004\) for the environmental noise model. Since ancilla fidelities plateau after one round of syndrome extraction, we determined the threshold ancilla fidelities by averaging the fidelities measured after 1, 2, 3 and 5 rounds of syndrome extraction, all of which should be close to the true ancilla fidelity at the threshold error rate. Table 2 reports the mean ancilla and physical fidelities obtained in this way from the 4000-shot simulations, at ancilla error rates slightly above, equal to and slightly below the thresholds. Some ancilla error rates appear more than once, paired with different physical Pauli Error Rates. We generally find that the ancilla fidelities remain roughly the same despite differences in the physical Pauli Error Rates. This is because after one round of syndrome extraction, the bulk of the effect of errors in the physical register has been removed, leaving non-Pauli errors that cannot be removed. We conclude that the mean ancilla fidelity thresholds corresponding to the Pauli Error Rate thresholds are 0.85 and 0.83 for the gate noise model and the environmental noise model, respectively. At the thresholds, the proportions of states with fidelities near 0 and near 1 are \(12\%\) and \(82\%\) for the gate noise model, and \(14\%\) and \(82\%\) for the environmental noise model.

Table 2 Ancilla qubit fidelities (\(F_{anc}\)) and physical qubit fidelities (\(F_{phys}\)) for selected ancilla (\(p_{anc}\)) and physical (\(p_{phys}\)) qubit Pauli Error Rates, with threshold ancilla fidelities highlighted in bold text

It is important to highlight that these threshold fidelities and Pauli Error Rates depend on the models used in this study. For different noise models and a different VQC, we cannot guarantee that the threshold fidelities and error rates will be of similar magnitude. However, the interaction and propagation of errors between the ancilla and physical registers, their limiting effect on error correction schemes, and the existence of a threshold error rate for ancilla qubits (and an associated ancilla fidelity) are highly relevant to the general implementation of VQCs with error correcting codes on NISQ devices. QEC implementations that rely on ancilla qubits to encode logical rotations and have limited capacity to detect and correct propagated errors will have a limiting error rate above which error correction cannot effectively protect the training and prediction processes. This limit is in addition to the limit accounted for by the Threshold Theorem.

5 Conclusions

Through classical simulations of a 2-qubit VQC implemented with the [[4,2,2]] code, we have demonstrated a proof of concept for integrating error detection within QML circuits, and shown that quantum error detection can improve QML training accuracies in noisy environments. Since QEC is an extension of quantum error detection, our results suggest that QEC may also be useful for near-term, non-fault-tolerant QML implementations. However, the effectiveness of error detection and correction is limited by the error rate of the ancilla qubits. A threshold ancilla error rate may be defined such that, below it, error detection (and by extension, error correction) can reliably ensure a reasonable final training accuracy, while above it, training accuracies may be poor and highly variable.

Under our gate noise and environmental noise models, we determined this threshold to be \(p_{anc}=0.003\) and \(p_{anc}=0.004\), respectively, for a desired minimum training accuracy of 0.90. The corresponding ancilla fidelities are 0.85 and 0.83, respectively. The threshold error rate for the gate noise model compares favourably with the lowest single-qubit gate error rate exhibited by a NISQ device, \(0.15\%\) (i.e., \(0.0015\)). Under a more complex noise model combining gate noise and environmental noise, we would likely see a lower threshold, since more errors would propagate from the ancilla to the physical register, making it harder for the stabiliser code to detect errors effectively. Moreover, both models are simplified representations of realistic gate noise and environmental noise. Including other forms of environmental noise-induced errors, such as amplitude damping, or non-Pauli errors from rotation gates, would likely lead to lower threshold error rates, potentially below the lowest error rates exhibited by current NISQ systems.

Our proposed explanation for the observed threshold is the propagation of Pauli errors from ancilla qubits to physical qubits through the combination of CNOT gates and rotation gates, which transforms the Pauli errors into non-Pauli errors that the stabiliser code cannot detect. Although our results come from a single system consisting of a 2-qubit VQC and the [[4,2,2]] error-detecting code, the physical phenomena that give rise to them are generalisable to other combinations of QML and QEC algorithms. We conclude that any implementation of a QML algorithm with a QEC code in which rotation gates must be encoded non-transversally using ancilla qubits will allow errors to propagate between ancilla and physical qubits, leading to the formation of exotic errors in the physical qubits. If the QEC code cannot detect or correct these types of errors, its effectiveness is significantly hampered at high error rates, and the training accuracy achievable by the QML system is also limited. The specific limit on achievable accuracy depends on the ancilla error rate and the capabilities of the QEC code.

In fully fault-tolerant settings, logical rotations would likely be implemented using alternative techniques, such as magic states. However, these approaches remain highly resource-intensive and beyond the capability of NISQ-era hardware. We believe that, well before fault tolerance is achieved, QML implementations with small-scale error detection or correction codes such as [[4,2,2]], [[7,1,3]] and [[9,1,3]] will play an important role in providing key insights into error suppression during training and learning. Our findings indicate that practical implementation of QML algorithms on NISQ systems requires consideration of both the logical encoding associated with the QEC code and the code’s capacity to detect a wide range of error types, as well as error mitigation approaches used in conjunction with the QEC code. Given the limitations of a purely QEC-based approach, it is worth exploring complementary methods to use alongside QEC codes. For example, flag qubits may be employed to address errors on the ancilla qubits before they have a chance to propagate to the physical register.