cageymaru
Fully [H]
- Joined
- Apr 10, 2003
- Messages
- 22,423
Former professional and celebrity Dota 2 players were pitted against the OpenAI Five bots in a best-of-three match. The bots were trained with machine learning: they taught themselves to play while constrained by the rules of the game. The hero pool for the match was limited to 18 heroes because the machine learning algorithm had only trained with those combinations.
The humans were absolutely destroyed in the first match as the OpenAI Five bots routed them with precise long-range sniper shots, slows, and silences. The humans fared much better in the second match, using tricks to confuse the bots, but a couple of simple mistakes during team fights allowed the bots to raze the human base. In a normal match, simple mistakes might get punished, but it would take a lot of them to decide the winner. Against the bots, a single mistake could end the match. In the third match, the Twitch chat audience chose the worst combination of heroes for the bots to play, and the A.I. predicted a 2.9% chance of winning. The humans ran over them with fire in their eyes and pure vengeance in their hearts.
OpenAI Five plays 180 years worth of games against itself every day, learning via self-play. It trains using a scaled-up version of Proximal Policy Optimization running on 256 GPUs and 128,000 CPU cores -- a larger-scale version of the system we built to play the much-simpler solo variant of the game last year. Using a separate LSTM for each hero and no human data, it learns recognizable strategies.
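For anyone wondering what Proximal Policy Optimization actually optimizes, here is a minimal sketch of the clipped surrogate objective from the general PPO algorithm. This is not OpenAI Five's code: the function name `ppo_clipped_objective` is made up for illustration, and `clip_eps=0.2` is just a commonly used default, not a setting confirmed for these bots.

```python
import math

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of each sampled action under the
    current policy and under the policy that collected the data.
    advantages: estimated advantage of each sampled action.
    clip_eps: clipping range (assumed value, for illustration only).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio within [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Training maximizes this value. The clipping removes any incentive to move the policy far from the one that generated the games in a single update, which is part of what keeps learning stable at large scale.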