- If a machine can learn based on real-world inputs and adjust its behaviors there exists the potential for that machine to learn the wrong thing.
- An agent (the machine) learns in accordance with what’s known as a reward function.
- A learning agent might discover some short-cut, which may maximize the reward for the machine but may wind up being very undesirable for humans.
- A catch with reinforcement learning is that human programmers might not always anticipate every possible way there is to reach a given reward.
- Corrigible AI agents recognize that they are fundamentally flawed or actively under-development and, as such, treat any human intervention as a neutral thing for any reward function.
Read the full article, click here.
@hplusdesign: “Google DeepMind Researchers Develop #AI Kill Switch”
Google DeepMind Researchers Develop AI Kill Switch