Reinforcement learning (RL) is a category of machine learning that learns through trial and error. It is a more goal-directed approach than either supervised or unsupervised machine learning.
Reinforcement learning is a powerful means for solving business problems that lack a large historical dataset for training, because it uses a dynamic model with rewards and penalties. Reinforcement learning models learn from interaction – an entirely different approach from supervised and unsupervised techniques, which learn from historical data to predict the future.
Reinforcement learning models use a reward mechanism to update model actions (outputs) based on feedback (rewards or penalties) from previous actions. The model is not told what actions to take, but rather discovers what actions yield the most reward by trying different options. A reinforcement learning model (“agent”) interacts with its environment to choose an action, and then moves to a new state in the environment. In the transition to the new state, the model receives a reward (or punishment) that is associated with its previous action. The objective of the model is to maximize its reward, thereby allowing the model to improve continually with each new action and observation.
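The loop described above – choose an action, observe the new state and reward, update – can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. The environment here is an assumption for illustration: a five-cell corridor in which the agent starts in cell 0 and earns a reward of +1 for reaching cell 4. All names (`step`, `train`, the hyperparameters) are hypothetical, not from any particular library.

```python
import random

# A five-cell corridor: the agent starts at cell 0, the goal is cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left, move right

def step(state, action):
    """Apply an action, returning (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    # Q[state][action_index]: estimated long-run reward of each action.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: mostly exploit the best-known action,
            # but try a random one with probability epsilon.
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = max(range(2), key=lambda i: Q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Update the action's value from the reward it produced.
            Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
            state = next_state
    return Q

Q = train()
policy = ["left" if q[0] > q[1] else "right" for q in Q[:GOAL]]
print(policy)  # the learned policy moves right, toward the reward
```

The model is never told that "right" is correct; the preference emerges purely from the rewards its own actions generate, which is the defining trait of reinforcement learning.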
For example, if you want to train a machine learning model to play checkers, you are unlikely to have a game tree that models all possible moves in a game or a comprehensive historical dataset of past games (checkers has roughly 10^20 possible board positions). Instead, a reinforcement learning model can learn game strategy through rewards and punishments.
To test this approach, a team from the AI company DeepMind trained a reinforcement learning model to play the strategy board game Go. With a game-tree complexity of roughly 10^360 possible sequences of moves, Go is hundreds of orders of magnitude more complex than checkers. The DeepMind team's model, AlphaGo, went on to defeat top professional Go player Lee Sedol in 2016.