hardware machine_learning policy

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

Curator's Take

This article applies reinforcement learning to Clifford circuit synthesis, a core step in quantum circuit compilation, and reports near-optimal results. The key innovation is an equivariant neural network architecture whose single learned policy applies across qubit counts without circuit splicing or network reparameterization. On 6-qubit problems, the largest size for which optimal references exist, it finds optimal circuits in 99.2% of instances. After continued training on 10-qubit instances, it scales to unseen 30-qubit targets generated from circuits with over a thousand Clifford gates, where it achieves lower average two-qubit gate counts than Qiskit's Aaronson-Gottesman and greedy Clifford synthesizers. Circuit compilation is a bottleneck in quantum computing: its efficiency directly affects whether quantum algorithms are feasible on real hardware, especially as devices grow larger and more complex. Synthesizing Clifford circuits within one two-qubit gate of optimal in milliseconds, as reported for six qubits, could significantly accelerate quantum error correction protocols and fault-tolerant quantum computing implementations.

— Mark Eatherly

Summary

We consider the problem of synthesizing Clifford quantum circuits for devices with all-to-all qubit connectivity. We approach this task as a reinforcement learning problem in which an agent learns to discover a sequence of elementary Clifford gates that reduces a given symplectic matrix representation of a Clifford circuit to the identity. This formulation permits a simple learning curriculum based on random walks from the identity. We introduce a novel neural network architecture that is equivariant to qubit relabelings of the symplectic matrix representation, and which is size-agnostic, allowing a single learned policy to be applied across different qubit counts without circuit splicing or network reparameterization. On six-qubit Clifford circuits, the largest regime for which optimal references are available, our agent finds circuits within one two-qubit gate of optimality in milliseconds per instance, and finds optimal circuits in 99.2% of instances within seconds per instance. After continued training on ten-qubit instances, the agent scales to unseen Clifford tableaus with up to thirty qubits, including targets generated from circuits with over a thousand Clifford gates, where it achieves lower average two-qubit gate counts than Qiskit's Aaronson-Gottesman and greedy Clifford synthesizers.
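The formulation above can be illustrated with a minimal sketch. It is not the authors' implementation: the gate set {H, S, CNOT}, the column ordering (x columns first, then z columns), and every function name are assumptions made for illustration. Phase bits are ignored, so at this level each gate is its own inverse. The sketch builds a training instance by a random walk from the identity, and provides a qubit-relabeling helper of the kind an equivariant policy would have to respect; the policy network and reward are not shown.

```python
import numpy as np

def identity_tableau(n):
    """2n x 2n identity over GF(2); columns ordered x_0..x_{n-1}, z_0..z_{n-1}."""
    return np.eye(2 * n, dtype=np.uint8)

def apply_gate(M, gate):
    """Return a copy of M with one elementary gate applied as a column update."""
    M = M.copy()
    n = M.shape[0] // 2
    name, qubits = gate
    if name == "H":        # swap the x and z columns of qubit q
        (q,) = qubits
        M[:, [q, n + q]] = M[:, [n + q, q]]
    elif name == "S":      # z_q <- z_q + x_q (mod 2)
        (q,) = qubits
        M[:, n + q] ^= M[:, q]
    elif name == "CNOT":   # x_t <- x_t + x_c and z_c <- z_c + z_t (mod 2)
        c, t = qubits
        M[:, t] ^= M[:, c]
        M[:, n + c] ^= M[:, n + t]
    else:
        raise ValueError(f"unknown gate {name!r}")
    return M

def is_symplectic(M):
    """Check M @ Omega @ M.T == Omega (mod 2) with Omega = [[0, I], [I, 0]]."""
    n = M.shape[0] // 2
    omega = np.zeros((2 * n, 2 * n), dtype=int)
    omega[:n, n:] = np.eye(n, dtype=int)
    omega[n:, :n] = np.eye(n, dtype=int)
    Mi = M.astype(int)
    return np.array_equal((Mi @ omega @ Mi.T) % 2, omega)

def random_walk_instance(n, steps, rng):
    """Curriculum sample: walk `steps` random gates away from the identity (n >= 2)."""
    M, gates = identity_tableau(n), []
    for _ in range(steps):
        kind = int(rng.integers(3))
        if kind == 2:
            c, t = (int(q) for q in rng.choice(n, size=2, replace=False))
            gate = ("CNOT", (c, t))
        else:
            gate = (("H", "S")[kind], (int(rng.integers(n)),))
        M = apply_gate(M, gate)
        gates.append(gate)
    return M, gates

def relabel(M, perm):
    """Relabel qubits: new qubit a is old qubit perm[a] (rows and columns alike)."""
    perm = np.asarray(perm)
    n = M.shape[0] // 2
    idx = np.concatenate([perm, perm + n])
    return M[np.ix_(idx, idx)]

def relabel_gate(gate, perm):
    """Express a gate on old qubit labels in terms of the new labels."""
    inv = np.argsort(np.asarray(perm))  # inv[old] = new
    name, qubits = gate
    return (name, tuple(int(inv[q]) for q in qubits))

rng = np.random.default_rng(0)
M, gates = random_walk_instance(n=4, steps=12, rng=rng)
print("symplectic:", is_symplectic(M))

# Replaying the walk in reverse reduces the target back to the identity.
R = M
for gate in reversed(gates):
    R = apply_gate(R, gate)
print("reduced to identity:", np.array_equal(R, identity_tableau(4)))
```

In this sketch the walk length acts as a difficulty knob: a target produced by k steps is guaranteed to be solvable in at most k gates, which is what makes a simple curriculum possible. The `relabel` and `relabel_gate` helpers express the symmetry in question: relabeling the target and then applying the relabeled gate gives the same matrix as applying the gate first and relabeling afterwards.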