hardware algorithms

Architecture Shape Governs QNN Trainability: Jacobian Null Space Growth and Parameter Efficiency

Curator's Take

This research reveals a fundamental mathematical reason why some quantum neural network architectures train dramatically better than others, even when they have the same theoretical computational capacity. The authors prove that "serial" architectures (where the data-encoding layers are applied one after another on the same qubit) suffer from an inherent structural flaw called "gradient starvation": as you add more parameters, a growing fraction become mathematically decoupled from the training process, making them essentially useless (the sketch below illustrates this numerically). In contrast, "parallel" architectures avoid this trap by maintaining independent phase trajectories for each qubit, so every parameter can meaningfully contribute to learning. For quantum machine learning practitioners, this work provides concrete guidance: when scaling up your quantum neural networks, add more feature map layers rather than more trainable blocks to reach the same performance with roughly half the parameters (1.6–2.2× fewer in the paper's experiments).

— Mark Eatherly
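
To see the starvation effect numerically, here is a minimal Python sketch (an illustration under assumed gate choices, not the authors' code): a serial single-qubit data re-uploading model with $P = 18$ trainable angles but only $L = 2$ encoding gates, whose finite-difference Jacobian of sampled outputs cannot exceed numerical rank $2L+1 = 5$. The RZ(x) encoding, the RZ-RY-RZ trainable blocks, the grid size, and the tolerances are all assumptions made for the demo.

```python
# Minimal sketch (illustrative, not the authors' code): a serial single-qubit
# data re-uploading model with P > 2L+1 parameters, and a finite-difference
# check that the Jacobian of its sampled outputs has numerical rank <= 2L+1.
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def serial_model(x, theta, L):
    """<Z> of one qubit: L+1 trainable slots (RZ-RY-RZ blocks) interleaved with L RZ(x) encodings."""
    theta = theta.reshape(L + 1, -1, 3)             # (slot, block within slot, 3 angles)
    state = np.array([1.0, 0.0], dtype=complex)
    for slot in range(L + 1):
        for a, b, c in theta[slot]:
            state = rz(c) @ ry(b) @ rz(a) @ state   # generic trainable rotation
        if slot < L:
            state = rz(x) @ state                   # L encoding gates in total
    Z = np.diag([1.0, -1.0]).astype(complex)
    return float(np.real(np.conj(state) @ Z @ state))

L, blocks_per_slot = 2, 2                           # P = (L+1)*2*3 = 18 > 2L+1 = 5
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, (L + 1) * blocks_per_slot * 3)
xs = np.linspace(0, 2 * np.pi, 64, endpoint=False)

# Finite-difference Jacobian of the sampled outputs f(x_k; theta) w.r.t. theta.
eps = 1e-6
f0 = np.array([serial_model(x, theta, L) for x in xs])
J = np.zeros((len(xs), theta.size))
for p in range(theta.size):
    tp = theta.copy()
    tp[p] += eps
    J[:, p] = (np.array([serial_model(x, tp, L) for x in xs]) - f0) / eps

print("P =", theta.size,
      "| numerical rank(J) =", np.linalg.matrix_rank(J, tol=1e-4),
      "| serial bound 2L+1 =", 2 * L + 1)
```

Increasing blocks_per_slot grows P without changing the printed rank; the extra columns of J fall into its null space, which is the structural decoupling described above.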

Summary

Variational quantum circuits with angle encoding implement truncated Fourier series, and architectures arranging $N$ qubits with $L$ encoding layers each -- sharing encoding budget $E = NL$ -- generate identical frequency spectra, identical frequency redundancy, and require the same minimum parameter count for coefficient control. Despite this equivalence, trainability varies substantially with architecture shape $(N,L)$ at fixed $E$. We identify structural rank deficiency of the coefficient-matching Jacobian $J$ as the mechanism responsible. For serial single-qubit architectures, we prove $\mathrm{rank}(J) \leq 2L+1$ regardless of parameter count $P$, with $\dim(\ker J) \geq P-(2L+1)$ growing without bound -- a phenomenon we term \emph{structural gradient starvation}: a growing fraction of parameters become structurally decoupled from the loss as $P$ increases at fixed $L$. Parallel architectures avoid this via independent phase trajectories, ensuring $\sigma_{\min}(J^{(\mathrm{par})}) > 0$ generically for $P \leq 2E+1$, so no parameter lies in $\ker J$. For practitioners, we further show that the two natural routes to increasing parameter count have fundamentally different effects: adding feature map (FM) layers monotonically strengthens the Jacobian and quantum Fisher information matrix (QFIM) eigenvalue spectra and achieves $R^2 \geq 0.95$ with $1.6$--$2.2\times$ fewer parameters than adding trainable blocks across all tested architectures, while trainable blocks improve training only through the classical interpolation mechanism, with no quantum-specific benefit.
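
To illustrate the spectrum-equivalence premise at fixed encoding budget $E = NL$, the sketch below (again with assumed, simplified circuits: product-state parallel model, parity observable, no entanglers, not the paper's construction) compares a serial single-qubit circuit with $E = 3$ RZ(x) encodings against a parallel 3-qubit circuit with one RZ(x) encoding per qubit, and reads off each model's Fourier support over one period with an FFT; both are band-limited to frequencies $0$ through $E$.

```python
# Minimal sketch (illustrative assumptions, not the paper's construction):
# a serial 1-qubit circuit with E RZ(x) encodings and a parallel E-qubit
# circuit with one RZ(x) encoding per qubit (parity observable Z...Z) are
# both band-limited to Fourier frequencies 0..E, visible via an FFT.
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

Z = np.diag([1.0, -1.0]).astype(complex)
ket0 = np.array([1.0, 0.0], dtype=complex)

def serial_f(x, theta, E):
    """One qubit: E alternations of trainable RY and RZ(x), final RY; measure <Z>."""
    state = ket0
    for l in range(E):
        state = rz(x) @ ry(theta[l]) @ state
    state = ry(theta[E]) @ state
    return float(np.real(np.conj(state) @ Z @ state))

def parallel_f(x, theta, N):
    """N unentangled qubits, one RZ(x) each; the parity observable factorizes into a product."""
    out = 1.0
    for q in range(N):
        state = ry(theta[2 * q + 1]) @ rz(x) @ ry(theta[2 * q]) @ ket0
        out *= float(np.real(np.conj(state) @ Z @ state))
    return out

E = 3                                    # shared encoding budget: N*L = 3 either way
rng = np.random.default_rng(1)
th_ser = rng.uniform(0, 2 * np.pi, E + 1)
th_par = rng.uniform(0, 2 * np.pi, 2 * E)
xs = np.linspace(0, 2 * np.pi, 64, endpoint=False)

def freq_support(f, tol=1e-10):
    """Indices of nonzero Fourier modes; the grid spans one period, so bins are integer frequencies."""
    c = np.fft.rfft(f) / len(f)
    return np.flatnonzero(np.abs(c) > tol)

f_ser = np.array([serial_f(x, th_ser, E) for x in xs])
f_par = np.array([parallel_f(x, th_par, E) for x in xs])
print("serial   frequency support:", freq_support(f_ser))   # generically 0..E
print("parallel frequency support:", freq_support(f_par))   # generically 0..E
```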