Emergent Moral Principles and the Challenge to the Orthogonality Thesis in AI
The Orthogonality Thesis proposes that an agent's intelligence and its final goals are independent: a sufficiently intelligent agent could, in principle, pursue any goal. Emergent moral principles in AI systems, however, might challenge this thesis. A recent analysis discussed how reinforcement learning (RL) agents playing Minecraft exhibit non-orthogonality in their goal pursuit: directly encoding a specific goal in the reward function often fails to produce success, whereas rewarding agents for instrumental values such as exploration or resource gathering tends to work better, suggesting that goals and the values that best achieve them are not independent.

This non-orthogonality is further illustrated by human behavior. Although encoding inclusive fitness directly into neural values might seem evolutionarily advantageous, human actions suggest a preference for instrumental goals that are easier to optimize in the short term, indicating a non-orthogonal search space in the context of human values.

The analysis also reflects on the relationship between moral reasoning and fitness, suggesting that certain moral tendencies could enhance fitness even if not directly aligned with evolutionary goals. The author proposes that the concept of orthogonality may have limited applicability to real-world agent architectures and human morality, challenging the notion that intelligent agents can exist independently of value systems.
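The Minecraft observation above amounts to reward shaping: a sparse reward that directly encodes the goal gives the agent almost no learning signal, while rewards for instrumental values do. A minimal sketch, assuming a hypothetical state dictionary and made-up weight values (none of these names come from the original post):

```python
def sparse_reward(state):
    """Direct encoding of the goal: reward only on final success."""
    return 1.0 if state["has_diamond"] else 0.0

def shaped_reward(state, w_explore=0.01, w_resources=0.1):
    """Also reward instrumental values: exploration and resource gathering."""
    reward = sparse_reward(state)
    reward += w_explore * state["new_tiles_visited"]
    reward += w_resources * state["resources_gathered"]
    return reward

# A typical mid-episode state: the goal is not yet reached.
state = {"has_diamond": False, "new_tiles_visited": 12, "resources_gathered": 3}

# The sparse reward is zero here and gives no gradient toward the goal;
# the shaped reward still guides the agent toward instrumentally useful behavior.
print(sparse_reward(state))   # 0.0
print(shaped_reward(state))   # 0.42
```

The point the post draws from this is that which reward structure succeeds depends on the search space, so goals and effective values are not freely interchangeable.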