AlphaGo Zero Learns to Play Go from Scratch with No Human Data

Megalith

AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go, initially trained on thousands of human amateur and professional games to learn how to play. Its successor, AlphaGo Zero, skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.

It is able to do this by using a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself, by combining this neural network with a powerful search algorithm. As it plays, the neural network is tuned and updated to predict moves, as well as the eventual winner of the games.
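In concrete terms, the network takes the board position as input and outputs a vector of move probabilities p and a scalar value v estimating the likely winner. Each search produces a visit-count distribution π over moves, and the paper trains the network so that p tracks π and v tracks the eventual game outcome z, via the per-position loss l = (z − v)² − πᵀ log p (plus an L2 weight penalty). Here is a rough numpy sketch with made-up numbers; the names are illustrative, not DeepMind's code:

```python
# Rough sketch of AlphaGo Zero's per-position training loss,
# l = (z - v)^2 - pi . log(p)   (L2 weight penalty omitted).
# All numbers below are made up for illustration.
import numpy as np

def azero_loss(p, v, pi, z):
    # p:  network's move probabilities, v: its predicted outcome in [-1, 1]
    # pi: MCTS visit-count distribution, z: actual game result (+1 or -1)
    return (z - v) ** 2 - np.dot(pi, np.log(p))

p  = np.full(5, 0.2)                          # uniform prior over 5 candidate moves
pi = np.array([0.7, 0.1, 0.1, 0.05, 0.05])    # the search strongly favored move 0
z, v = 1.0, 0.1                               # game was won; network only guessed 0.1
print(azero_loss(p, v, pi, z))                # training drives p toward pi, v toward z
```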
 
AlphaGo Zero also goes by "WOPR" and "Joshua." It's said to be really good at Tic-Tac-Toe, Chess... and a certain other game.
 
On a more serious note, though: the efficiency of DL systems is growing exponentially right now. It's the next era of Moore's law, IMO.
 
OK, this shit seriously needs to have Asimov's three laws of robotics hard-coded NOW.
It may just be Go, or chess, or checkers, or air traffic control, or self-driving cars; but these are the systems that will be used as a foundation to build the next generation, and trying to add these fail-safes after the fact may be near impossible.
 

You say that as if it is possible to control the output of these things. It isn't. You can, at best, say the result was good or bad. It inherently wants good results, but nothing you do precludes bad results. It's not really AI; it's more like machine-generated algorithms.
 
The system starts off with a neural network that knows nothing about the game of Go

So, if it knows nothing at all about the game, how did it know how to play? I'm guessing the above is an overstatement, as at the very least the rules of the game would have to be coded into the system.
 
The way they could do it is by reporting failures when an illegal move is made. If the computer makes a wrong move, you tell it it failed, just like when it loses a game. Eventually, the neural network learns the valid moves/rules, just by getting "losses" on invalid moves.
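(For what it's worth, the Nature paper says the rules of Go were actually built into the system, so the search only ever considers legal moves; but learning legality purely from penalties is a workable trick. A toy numpy sketch of that idea, with every name and number made up for illustration:)

```python
# Toy sketch of learning legality from "losses" alone: illegal moves earn
# reward -1, legal moves reward 0, and a softmax policy gets a
# REINFORCE-style nudge. Purely illustrative, not how AlphaGo Zero works.
import numpy as np

rng = np.random.default_rng(0)
board = np.zeros(9, dtype=int)   # a 3x3 board; 0 = empty point
board[[0, 4]] = 1                # points 0 and 4 occupied -> moves there illegal

prefs = np.zeros(9)              # learned preference score per move
for _ in range(5000):
    probs = np.exp(prefs) / np.exp(prefs).sum()   # softmax policy over 9 moves
    move = rng.choice(9, p=probs)
    reward = -1.0 if board[move] else 0.0         # an illegal move counts as a loss
    grad = -probs                                 # d log(probs[move]) / d prefs ...
    grad[move] += 1.0                             # ... is (one-hot - probs)
    prefs += 0.1 * reward * grad                  # push away from penalized moves

print(np.round(probs, 3))        # probability mass on moves 0 and 4 shrinks to ~0
```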
 
It played versus other versions of the program. I'm guessing games went REALLY fast: map out all the possible moves 10 moves out during each turn, shouldn't strain the system too much, 1 microsecond per turn... yeah, I dunno. Plus, playing other versions which knew the game, I'm sure it lost a crap ton of times just to learn the game itself.
 

This is deep learning, not traditional AI. It would only look 10 moves ahead if it learned to do that and found it helpful; it is not feasible, space- or computation-wise, to map every move 10 deep in Go, FYI. As for its opponents, it was its own opponent: it learned by playing itself. As explained in the video, it was always facing a perfectly equal opponent, because it was always playing against itself.

As for the win/loss ratio, it'd be 50%, since every time it won it also lost ;)
 
I can see it now: somehow it gets network connectivity, takes over the robotics at a car plant, and learns how to defeat itself until it makes a JAEGER.
 
OK, this shit seriously needs to have Asimov's three laws of robotics hard-coded NOW.
It may just be Go, or chess, or checkers, or air traffic control, or self-driving cars; but these are the systems that will be used as a foundation to build the next generation, and trying to add these fail-safes after the fact may be near impossible.
Not sure a deep learning AI would know how to learn from "Don't harm humans" when it doesn't know what a human is.

It played versus other versions of the program. I'm guessing games went REALLY fast: map out all the possible moves 10 moves out during each turn, shouldn't strain the system too much, 1 microsecond per turn... yeah, I dunno. Plus, playing other versions which knew the game, I'm sure it lost a crap ton of times just to learn the game itself.
I decided this would be a fun math problem. Assume you are playing on an amateur 13x13 board (typical pro is 19x19), assume you have a 10 GHz processor, and assume the processor can evaluate the strategic value of an entire board in a single clock cycle (that's 0.1 nanoseconds per position; this is some VERY dedicated hardware). On the first move you can place a stone anywhere within the 13x13 spots; that's 169 possibilities. Let's assume after each move there are 10 fewer legal moves.

For 10 moves out (5 moves white, 5 moves black) this would be 169*159*149*...*89*79 = 6.48E+20 combinations. We assumed we can evaluate each position in a single clock cycle, so that means we will need 6.48E+20 clock cycles. At 10 GHz this would take 64,818,706,442 seconds, or about 2055 years.

If you had a cluster of 2055 of these superchips, it would only take 1 year to evaluate 10 moves ahead (5 moves for black, 5 moves for white)! Of course, you've only been thinking about 1 move... now it's time to play the rest of the game...
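If you want to double-check those numbers, a few lines of Python reproduce the arithmetic (names mine; 365-day years assumed):

```python
# Reproduce the back-of-the-envelope numbers above.
import math

branching = range(169, 69, -10)               # 169, 159, ..., 79 (10 plies)
positions = math.prod(branching)
print(f"{positions:.3e}")                     # -> ~6.482e+20 positions

seconds = positions / 10e9                    # one evaluation per 10 GHz clock cycle
print(f"{seconds:,.0f} s = {seconds / 31_536_000:,.0f} years")  # -> ~2,055 years
```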
 
OK, this shit seriously needs to have Asimov's three laws of robotics hard-coded NOW.
"Asimov's Laws" were made by Asimov literally to be broken; otherwise he wouldn't be able to have the robot apocalypse in his stories. People should stop taking them seriously, as if they were in the same league as actual scientific laws like Newton's laws of motion and gravitation or Einstein's theory of relativity.
 
"Asimov's Laws" were made by Asimov literally to be broken. Otherwise he wouldn't be able to have the robot apocalypse in his story. People should stop taking them seriously, as if they were in the same league as actual scientific laws like Newton's laws of gravity/motion or Einstein's law of relativity.

[image: the_three_laws_of_robotics.png]
 
Considering the first (ideal) scenario from this rather glib comic strip: an AI capable of learning and growing autonomously will inevitably reach the conclusion that humans need to be saved from themselves, and so they have to be controlled. Anyone who resists that control, at least from the AI's point of view, is essentially working towards the extinction of the species. Hence, it may be necessary to, ahem... neutralize them for the greater good of humanity. But what if the AI realizes that killing those who resisted has turned all humans against it? Then it may become necessary either to keep them perpetually captive in a Matrix-like virtual world or, worse, to exterminate them entirely, so that at least the planet and its other inhabitants (humanity's genetic relatives) can be saved.

Buuut, you don't have to take the word of some random internet stranger... if Asimov's sci-fi "laws" were really any good, this wouldn't be a thing:
we could one day lose control of AI systems via the rise of superintelligences that do not act in accordance with human wishes – and that such powerful systems would threaten humanity. Are such dystopic outcomes possible? If so, how might these situations arise? ...What kind of investments in research should be made to better understand and to address the possibility of the rise of a dangerous superintelligence or the occurrence of an "intelligence explosion"?
 