Human-relevant behaviours emerge in OpenAI's experiments with multi-agent interaction in physically grounded environments
OpenAI, a research and deployment company based in San Francisco, has developed a digital simulation in which intelligent artificial agents compete in a two-team hide-and-seek game within a physics-based environment, in order to study the behaviours that emerge. The project investigates the potential of multi-agent autocurricula in a physically grounded environment to study and simulate human-relevant behaviours. The study opens the door to a generation of digital simulation that goes beyond performance-driven or physics-driven simulation processes.
There are several examples of successful applications of multi-agent reinforcement learning to multiplayer games. The idea behind this type of algorithm, based on self-play, is that an agent learns to exploit its own errors, challenging itself in subsequent rounds to correct those errors and discover new strategies. A well-known example is TD-Gammon, a self-play algorithm applied to the game of backgammon (Tesauro, 1995). Such algorithms have limitations that can be reduced once training is deployed at larger scale over diverse sets of scenarios with multiple agents. Under these conditions very complex simulations become possible, but many of these examples still lack a relationship between the agents and the physical world.
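The self-play idea described above can be sketched in a few lines. The following is a hypothetical toy illustration, not OpenAI's or Tesauro's actual code: an agent plays a simple two-action game against a frozen snapshot of its own policy, nudges its action probabilities toward whatever beat that snapshot, and periodically refreshes the snapshot so it always faces its most recent self. All names and the game itself are invented for illustration.

```python
import random

ACTIONS = [0, 1]  # two abstract moves in a toy mismatching game

def play(policy_a, policy_b, rng):
    """One round: agent A wins (+1) if it picks a different action than B."""
    a = rng.choices(ACTIONS, weights=policy_a)[0]
    b = rng.choices(ACTIONS, weights=policy_b)[0]
    return (1 if a != b else -1), a

def train_self_play(rounds=5000, lr=0.01, snapshot_every=500, seed=0):
    rng = random.Random(seed)
    policy = [0.9, 0.1]       # deliberately biased starting policy
    opponent = list(policy)   # frozen copy of the agent's own policy
    for t in range(rounds):
        reward, action = play(policy, opponent, rng)
        # push the chosen action's probability up or down by the reward
        policy[action] = min(max(policy[action] + lr * reward, 0.01), 0.99)
        policy[1 - action] = 1.0 - policy[action]
        if t % snapshot_every == 0:
            opponent = list(policy)  # the agent now faces its newer self
    return policy
```

Because the opponent is always a recent copy of the agent itself, any bias in the policy is a weakness the agent learns to punish, which in turn pressures the next snapshot to be less biased: a miniature version of the curriculum effect the article describes.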
One of the exciting characteristics of this innovative approach by the OpenAI team is that the agents are trained in a mixed competitive and cooperative physics-based environment through a simple game. Driven by a simple visibility-based reward function and by competition, agents develop emergent skills and learn to create and use new tools to respond and interact with the environment.
In the environment developed by OpenAI, agents play a team-based hide-and-seek game. Hiders try to stay out of the seekers' line of sight, while seekers aim to find the hiders. The scene has boundaries that the agents cannot cross and contains several objects the agents can interact with, as well as randomly placed obstacles that the agents cannot move or modify.
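The visibility-based reward can be made concrete with a minimal sketch. This is an assumption-laden simplification, not OpenAI's implementation: visibility is reduced to a 2D view cone with no occlusion checks, the 135° field of view is an illustrative default, and the zero-sum team reward simply flips sign depending on whether any seeker sees any hider.

```python
import math

def in_view_cone(seeker_pos, seeker_dir, hider_pos, fov_deg=135.0):
    """True if the hider lies within the seeker's field-of-view cone.
    seeker_dir is the facing angle in radians; fov_deg is hypothetical."""
    dx = hider_pos[0] - seeker_pos[0]
    dy = hider_pos[1] - seeker_pos[1]
    angle_to_hider = math.atan2(dy, dx)
    # smallest signed angle between facing direction and direction to hider
    diff = abs((angle_to_hider - seeker_dir + math.pi) % (2 * math.pi) - math.pi)
    return diff <= math.radians(fov_deg) / 2

def team_rewards(seekers, hiders):
    """Return (hider_reward, seeker_reward) for one timestep.
    seekers: list of ((x, y), facing_angle); hiders: list of (x, y)."""
    seen = any(
        in_view_cone(pos, facing, h)
        for pos, facing in seekers
        for h in hiders
    )
    return (-1, 1) if seen else (1, -1)
```

A reward this simple says nothing about boxes, ramps, or shelters; all of the tool use the article goes on to describe emerges only as an instrumental strategy for changing which branch of this reward fires.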
Initially, hiders and seekers simply learn to run away from and chase each other. After 25 million iterations, the agents begin to use the tools available within the environment, for example building a shelter. After approximately 75 million iterations, they improve their interaction with the environment to achieve their goals, moving boxes and using ramps. Hiders respond to these new initiatives by developing new strategies of their own. After 380 million iterations, more surprising and unexpected behaviours emerge: seekers start surfing on top of boxes to find the hiders, exploiting the physical constraints of the environment. Large-scale training was critical to enabling these emergent behaviours. Each new strategy that emerges creates new pressure for the opposing team to adapt and learn further strategies to reach its goals.
By combining game rules, multi-agent competition, and standard reinforcement learning algorithms over a large set of training scenarios, the OpenAI team demonstrated that agents can learn and develop complex strategies and skills. The team observed six distinct emergent behaviours, suggesting the potential for open-ended growth in complexity. This use of digital simulation has direct applications in fields such as the game industry, and it represents an important design opportunity, from industrial design to architecture and urban design. Designers may gain the ability to study the performance of the built environment not only from a structural or environmental perspective but from a social perspective as well.