a type of machine learning where an AI agent (such as a generative AI chatbot) learns to make better decisions to achieve a goal like 'sounding human.' The agent receives rewards or penalties for the actions it takes—such as thumbs up or thumbs down from a user—and learns to maximize the total reward over time.
"We're using reinforcement learning to train our chatbot to provide the right amount of technical information in answers based on the type of user interacting with it."
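
To make the reward loop in the definition concrete, here is a minimal sketch of one simple reinforcement learning method, an epsilon-greedy bandit. It is an illustration only, not how any particular chatbot is trained. The answer styles, their thumbs-up probabilities, and the function name `train_bandit` are all hypothetical, and user feedback is simulated with random numbers.

```python
import random

def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn which action earns the most reward."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions     # how many times each action was tried
    values = [0.0] * n_actions   # running average reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best average so far
            action = max(range(n_actions), key=lambda a: values[a])
        # Simulated user feedback: +1 for thumbs up, -1 for thumbs down
        reward = 1 if rng.random() < reward_probs[action] else -1
        counts[action] += 1
        # Incremental update of the running average for this action
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward

# Hypothetical chance of a thumbs up for each answer style
styles = {"brief": 0.3, "balanced": 0.7, "highly technical": 0.5}
values, counts, total = train_bandit(list(styles.values()))
best = max(range(len(values)), key=lambda a: values[a])
print("Preferred style:", list(styles)[best])
```

The agent is never told which style is best. It tries actions, records the average reward each one earns, and gradually favors the one that pays off most, which is the "maximize the total reward over time" part of the definition.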