Quantum Reinforcement Learning Breakthrough on IonQ’s Forte-1 Quantum Computer

Tuesday 07 October 2025

In a major breakthrough, scientists have successfully integrated two cutting-edge technologies – quantum machine learning and quantum policy evaluation – to demonstrate the feasibility of using actual quantum hardware for reinforcement learning.

The experiment, which took place on IonQ’s Forte-1 quantum computer, used quantum machine learning (QML) to automatically learn the parameters of a quantum environment from classical offline reinforcement learning batch data. Quantum policy evaluation (QPE) then used these parameters to determine policy values.

Reinforcement learning is a type of machine learning in which an agent learns to make decisions based on the rewards or penalties it receives for its actions. Quantum computers have long been thought to hold great potential for speeding up this process, but most work so far has focused on theoretical models rather than practical applications.

The researchers behind this breakthrough used a two-armed bandit environment, where an agent must decide which arm to pull in order to receive a reward. The left arm yielded a reward with 70% probability, while the right arm had a 20% chance of providing a reward.
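The bandit described above is simple enough to simulate classically. The following minimal Python sketch generates offline batch data of (arm, reward) pairs and recovers the reward probabilities from it. The uniform random behavior policy, the function names, and the batch size are assumptions made for illustration and are not taken from the study.

```python
import random

# Reward probabilities stated in the article: left arm 70%, right arm 20%.
REWARD_PROB = {"left": 0.7, "right": 0.2}


def pull(arm, rng):
    """Return 1 with the arm's reward probability, otherwise 0."""
    return 1 if rng.random() < REWARD_PROB[arm] else 0


def collect_batch(n_samples, rng):
    """Collect (arm, reward) pairs under a uniform random behavior policy (assumed)."""
    batch = []
    for _ in range(n_samples):
        arm = rng.choice(["left", "right"])
        batch.append((arm, pull(arm, rng)))
    return batch


def empirical_reward_rates(batch):
    """Estimate each arm's reward probability from the batch."""
    rates = {}
    for arm in REWARD_PROB:
        rewards = [r for a, r in batch if a == arm]
        rates[arm] = sum(rewards) / len(rewards)
    return rates


rng = random.Random(0)  # fixed seed so the run is reproducible
batch = collect_batch(10_000, rng)
rates = empirical_reward_rates(batch)
print(rates)
```

With a batch of this size, the estimated rates should land close to 0.7 and 0.2. Data of this kind is what the article refers to as classical offline batch data.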

To test their approach, the team first ran QML on a noiseless simulator, then repeated the experiment on a noisy simulator that mimicked the real device. Finally, they ran the same experiment on the actual Forte-1 quantum computer to see how it would perform in practice.
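To give a feel for what learning a circuit parameter on a noiseless simulator can look like, here is a minimal Python sketch. The article does not describe the circuit or the optimizer, so everything here is an assumption: a single qubit prepared with one RY(theta) rotation, a squared-error loss against a target reward probability, and gradients obtained with the parameter-shift rule. The measurement probability is computed analytically rather than sampled.

```python
import math


def prob_one(theta):
    """P(measure 1) after RY(theta) on |0>, which is sin^2(theta / 2)."""
    return math.sin(theta / 2) ** 2


def parameter_shift_grad(theta):
    """Gradient of prob_one with respect to theta via the parameter-shift rule."""
    return 0.5 * (prob_one(theta + math.pi / 2) - prob_one(theta - math.pi / 2))


def fit_theta(target_prob, lr=0.5, steps=500):
    """Gradient descent on the squared error (prob_one(theta) - target_prob)^2."""
    theta = math.pi / 2  # start from a 50/50 measurement outcome
    for _ in range(steps):
        error = prob_one(theta) - target_prob
        theta -= lr * 2 * error * parameter_shift_grad(theta)
    return theta


# Fit one rotation angle per arm (hypothetical one-qubit-per-arm encoding).
theta_left = fit_theta(0.7)
theta_right = fit_theta(0.2)
print(theta_left, prob_one(theta_left))
print(theta_right, prob_one(theta_right))
```

In this toy setting the answer is also available in closed form, theta = 2 * asin(sqrt(p)), which makes it easy to check that the optimizer converges to the right angle.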

The results showed that, despite challenges such as noise and gate errors, the team was still able to learn circuit parameters that yielded satisfactory model performance.
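One way to see why usable parameters can still be learned under noise is a toy depolarizing model, sketched below in Python. This is an illustrative assumption, not the noise model of Forte-1 or of the study's simulator: a single-qubit depolarizing channel of strength `lam` pulls the measured probability toward 0.5, and a parameter trained against the noisy output can partly absorb that shift. The value `lam = 0.1` is arbitrary.

```python
import math


def ideal_prob_one(theta):
    """Noiseless P(measure 1) after RY(theta) on |0>."""
    return math.sin(theta / 2) ** 2


def noisy_prob_one(theta, lam):
    """P(measure 1) after a depolarizing channel of strength lam (toy model)."""
    return (1 - lam) * ideal_prob_one(theta) + lam / 2


def compensating_theta(target_prob, lam):
    """Angle whose noisy output equals target_prob, if such an angle exists."""
    ideal_needed = (target_prob - lam / 2) / (1 - lam)
    if not 0.0 <= ideal_needed <= 1.0:
        raise ValueError("target cannot be reached at this noise strength")
    return 2 * math.asin(math.sqrt(ideal_needed))


lam = 0.1  # assumed noise strength, for illustration only
theta_naive = 2 * math.asin(math.sqrt(0.7))  # correct only without noise
theta_comp = compensating_theta(0.7, lam)

print(noisy_prob_one(theta_naive, lam))  # biased toward 0.5
print(noisy_prob_one(theta_comp, lam))   # back near the 0.7 target
```

The compensation only works while the target stays within the range the noisy circuit can reach, which shrinks as `lam` grows.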

One of the key findings was that only the most basic circuit configuration produced the expected outcomes. This suggests that more complex circuits may not be necessary for quantum reinforcement learning, at least not yet.

The researchers also found that policy Π0, which chose the left arm with 50% probability, performed better than policy Π50, which selected the left arm with 0% probability. This is expected: the former pulls the high-reward left arm half the time, for an expected reward of 0.5 × 0.7 + 0.5 × 0.2 = 0.45 per pull, while the latter always pulls the right arm and so earns only 0.2 on average.
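The ranking of the two policies can be checked with a classical calculation. The short Python sketch below computes the expected one-step reward of a policy from its probability of pulling the left arm, using the reward probabilities given earlier in the article. The function name is hypothetical, and this is the exact classical value, not the quantum estimate produced by QPE.

```python
# Reward probabilities stated in the article.
REWARD_PROB = {"left": 0.7, "right": 0.2}


def policy_value(p_left):
    """Expected reward per pull for a policy that picks left with probability p_left."""
    return p_left * REWARD_PROB["left"] + (1 - p_left) * REWARD_PROB["right"]


value_half_left = policy_value(0.5)   # pulls left half the time
value_never_left = policy_value(0.0)  # always pulls right

print(f"left 50% of the time: {value_half_left:.2f}")
print(f"left 0% of the time:  {value_never_left:.2f}")
```

These exact values give a reference point against which estimates from a simulator or from hardware can be compared.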

While this breakthrough is an important step towards practical applications of quantum reinforcement learning, there are still many challenges that must be overcome before it can be used in real-world scenarios. However, with continued advancements in both hardware and error mitigation techniques, the potential for quantum advantage in this area remains strong.

Cite this article: “Quantum Reinforcement Learning Breakthrough on IonQ’s Forte-1 Quantum Computer”, The Science Archive, 2025.

Quantum Machine Learning, Quantum Policy Evaluation, Reinforcement Learning, IonQ, Forte-1, Quantum Computer, Two-Armed Bandit, Circuit Parameters, Policy Values, Error Mitigation

Reference: Daniel Hein, Simon Wiedemann, Markus Baumann, Patrik Felbinger, Justin Klein, Maximilian Schieder, Jonas Stein, Daniëlle Schuman, Thomas Cope, Steffen Udluft, “From Classical Data to Quantum Advantage — Quantum Policy Evaluation on Quantum Hardware” (2025).
