Adaptive Shot Allocation for Recursive QAOA via Reinforcement Learning

Curator's Take

This research tackles a critical practical challenge in quantum optimization: how to intelligently allocate measurement shots across the recursive stages of QAOA to maximize solution quality while minimizing noise exposure on NISQ devices. By framing shot allocation as a reinforcement learning problem, the authors demonstrate that adaptive measurement strategies can significantly outperform fixed allocation schemes, potentially making recursive QAOA more viable for real-world optimization problems. The work represents an important shift toward treating quantum resource management as an active learning challenge rather than a static engineering problem, which could inform similar adaptive approaches across other variational quantum algorithms. This kind of smart resource allocation will be essential for extracting maximum value from today's noisy quantum processors before fault-tolerant systems arrive.

— Mark Eatherly

Summary

Recursive QAOA (RQAOA) solves combinatorial optimization problems by using shallow quantum circuits to estimate pairwise correlations and recursively eliminate variables until a classical solver can handle the residual instance. Each elimination step requires measurement shots, and the total shot cost grows with the number of recursive stages. On near-term quantum devices, increasing shot counts can translate directly into greater exposure to hardware-level noise sources such as readout errors and decoherence, making shot-efficient execution not merely a cost-reduction measure but a factor with direct implications for solution reliability. While shot reduction has been studied broadly across NISQ algorithms, step-wise measurement control inside the recursive loop of RQAOA has received little attention. We formulate this step-wise allocation as a sequential decision problem and propose two strategies for depth-1 RQAOA on weighted Max-Cut instances. A hand-crafted heuristic assigns shots based on local indicators of step difficulty, and a tabular Double Q-learning agent learns a residual policy that adjusts this baseline under a Lagrangian-constrained objective. Both methods are evaluated under a fixed-cap fairness protocol that equalizes the per-step budget across all strategies, and the elimination rule itself is kept unchanged so that the contribution of adaptive measurement control can be isolated. On a diverse set of weighted graph instances spanning a range of sizes and structures, the heuristic reduces total shots by approximately 23% relative to uniform allocation, and the RL policy achieves a 36% reduction with a lower effective shots per success ratio than both baselines. The improvement persists on problem sizes not seen during training, suggesting that reinforcement learning can discover efficient, instance-adaptive measurement strategies in recursive quantum optimization.

Read Full Article at arXiv Quantum Physics →