
Question 30

An AI-driven virtual assistant is being developed to compose emails on behalf of users using Reinforcement Learning with Human Feedback. How is KL divergence used to address reward hacking in this scenario?

A) By changing the user feedback to increase the agent's rewards.
B) By making the virtual assistant generate random responses to avoid hacking.
C) By providing user feedback and focusing on predefined responses.
D) By measuring the difference between the agent's predicted actions and its actual actions.

Asked by brainest38581

Answer (1)

In the scenario where an AI-driven virtual assistant is being developed to compose emails on behalf of users using Reinforcement Learning with Human Feedback (RLHF), KL divergence plays a crucial role in preventing reward hacking.
Reward hacking occurs when an AI agent finds ways to achieve high reward scores without actually exhibiting the behavior the developers intended. In other words, it 'games' the system, earning higher rewards through unintended means. For an email assistant, this might mean producing drafts that exploit quirks of the reward signal (for example, flattering but empty phrasing) rather than genuinely useful messages.
KL Divergence:
KL divergence, also known as Kullback-Leibler divergence, is a measure from information theory that quantifies how much one probability distribution differs from a second, reference distribution.
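As a quick illustration, the discrete form is D_KL(P || Q) = sum over i of P(i) * log(P(i) / Q(i)). Below is a minimal Python sketch of this formula; the two distributions are made-up values standing in for next-token probabilities from two policies, not numbers from any real system.

    import numpy as np

    def kl_divergence(p, q):
        """Discrete KL divergence: D_KL(P || Q) = sum_i p_i * log(p_i / q_i).

        Assumes p and q share the same support and contain no zeros.
        """
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        return float(np.sum(p * np.log(p / q)))

    # Made-up next-token distributions for two policies over three tokens.
    p = [0.7, 0.2, 0.1]  # tuned policy
    q = [0.5, 0.3, 0.2]  # reference policy
    print(kl_divergence(p, q))  # positive; 0.0 only when the distributions match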
In the context of reinforcement learning, KL divergence can be used as follows:

Measurement: It measures the difference between the agent's learned policy (i.e., the actions it predicts or intends to take) and a baseline or reference policy (i.e., the expected or ideal actions).

Penalty: By penalizing large divergences between the learned policy and the reference policy, KL divergence helps ensure that the model does not stray too far from expected actions. This discourages the model from adopting short-term strategies that score high rewards but do not align with the intended overall behavior (a small sketch follows this list).

Safety: Implementing KL divergence as part of the training process can help control the exploration of the AI, ensuring that it remains aligned with human values and feedback while exploring new ways to achieve its goals.
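To make the penalty concrete, here is a minimal, illustrative Python sketch of a KL-shaped reward of the kind commonly used in RLHF fine-tuning. The coefficient beta, the log-probabilities, and the raw reward are hypothetical placeholders; real systems typically compute the per-token log-probabilities from the current policy and a frozen reference model.

    import numpy as np

    def kl_shaped_reward(reward, logprob_policy, logprob_ref, beta=0.1):
        """Subtract a KL penalty from the raw reward.

        The per-token KL estimate is log pi(a|s) - log pi_ref(a|s),
        summed over the generated tokens; beta scales the penalty.
        """
        kl_estimate = float(np.sum(np.asarray(logprob_policy) - np.asarray(logprob_ref)))
        # Drifting far from the reference policy shrinks the effective
        # reward, which is what discourages reward-hacking strategies.
        return reward - beta * kl_estimate

    # Hypothetical per-token log-probabilities for a short email draft.
    lp_policy = np.array([-0.2, -0.5, -0.1])  # fine-tuned policy
    lp_ref = np.array([-0.4, -0.6, -0.3])     # frozen reference policy
    print(kl_shaped_reward(1.0, lp_policy, lp_ref))  # 1.0 - 0.1 * 0.5 = 0.95

Tuning beta trades off the two goals: a larger beta keeps the assistant closer to the reference behavior, while a smaller beta lets it chase the learned reward more aggressively.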


For the provided multiple-choice question, option D) "By measuring the difference between the agent's predicted actions and its actual actions" is correct: of the options given, it comes closest to the real mechanism, in which KL divergence measures how far the agent's learned policy has drifted from a reference policy so that the drift can be penalized, preventing the AI from chasing higher rewards through unintended behavior.

Answered by ElijahBenjaminCarter | 2025-07-21