Some time ago, I mentioned AlphaGo beating one of the best Go players in the world. The first thing it brought to mind was IBM's Deep Blue defeating Kasparov in 1997. At the time, that victory was widely described
as a milestone in artificial intelligence (or AI for short). But Deep Blue's technology
turned out to be useful for chess and not much else. Computer science
did not undergo a revolution. AlphaGo is different.
The Go-playing program captures elements of human intuition, an advance that promises far-reaching consequences. I repeat: elements of human intuition.
When the news broke, people asked whether AlphaGo, the Go-playing system that recently defeated one of the strongest Go players in history, would be any different from previous AI. I believe the answer is yes, but not for the reasons you may have
heard or thought of.
To understand this advance in AI, you first need to know what Go really is. Go is an abstract strategy board game for two players, in which the aim is to surround more territory than the opponent. The game originated in ancient China more than 2,500 years ago, and is one of the oldest board games still played today. In antiquity it was considered one of the four essential arts of a cultured Chinese scholar.
Go plays. Image credit: deepmind.com, Google
Despite its relatively simple rules,
Go is considered more complex than chess, having both a larger board
with more scope for play and longer games, and, on average, more
alternatives to consider per move.
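To get a rough sense of how much larger the game is, here is a back-of-the-envelope calculation in Python. The figures used, roughly 35 legal moves and 80 plies per game for chess versus roughly 250 legal moves and 150 plies for Go, are commonly cited approximations, not exact counts.

```python
import math

# Commonly cited rough figures (approximations, not exact counts):
#   chess: ~35 legal moves per position, games of ~80 plies
#   Go:    ~250 legal moves per position, games of ~150 plies
chess_log10 = 80 * math.log10(35)     # exponent of the chess game tree, ~124
go_log10 = 150 * math.log10(250)      # exponent of the Go game tree, ~360

print(f"Chess game tree: roughly 10^{chess_log10:.0f} move sequences")
print(f"Go game tree:    roughly 10^{go_log10:.0f} move sequences")
```

Even granting that these numbers are crude, a gap of more than two hundred orders of magnitude is a hint of why brute-force search gets you much less far in Go.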
Many articles proffer expert testimony that Go is harder than
chess, making this victory more impressive. Or they say that we didn’t
expect computers to win at Go for another 10 years, so this is a bigger
breakthrough. Some articles make the reasonable observation that there
are more potential positions in Go than in chess, but they don’t explain
why this should cause more difficulty for computers than for humans.
Now you will see that there is more to it than the news told you: the advances behind AlphaGo are qualitatively different,
and more important, than those that led to Deep Blue.
In chess, beginning players are taught a notion of a chess piece’s
value. In one system, a knight or bishop is worth three pawns. A rook,
which has greater range of movement, is worth five pawns. And the queen,
which has the greatest range of all, is worth nine pawns. A king has
infinite value, since losing it means losing the game. You can use these values to assess potential moves.
The notion of value is crucial in computer chess. Most computer chess
programs search through millions or billions of combinations of moves
and countermoves. The goal is for the program to find a sequence of
moves that maximizes the final value of the program’s board position, no
matter what sequence of moves is played by the opponent.
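As a rough illustration of these two ingredients (not Deep Blue's actual code), here is a minimal Python sketch of a piece-value evaluation function combined with a minimax search. The board object and its methods (`position.pieces()`, `position.legal_moves()`, and so on) are hypothetical stand-ins for a real chess engine's API.

```python
# A minimal sketch of the two ideas above: a hand-crafted evaluation
# function built from piece values, plus a search that assumes the
# opponent also plays as well as possible. Illustrative only; the
# board API used here is hypothetical.

PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(position):
    """Score a position as (my material) minus (opponent's material)."""
    score = 0
    for piece, owner in position.pieces():          # hypothetical board API
        value = PIECE_VALUES.get(piece, 0)
        score += value if owner == "me" else -value
    return score

def minimax(position, depth, maximizing=True):
    """Look `depth` plies ahead and return the best score reachable,
    assuming the opponent always picks the reply that is worst for us."""
    if depth == 0 or position.is_game_over():
        return evaluate(position)
    scores = [
        minimax(position.play(move), depth - 1, not maximizing)
        for move in position.legal_moves()
    ]
    return max(scores) if maximizing else min(scores)
```

Deep Blue's real evaluation function was vastly more elaborate and its search ran on specialized hardware, but the overall shape of the strategy is the one sketched here.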
Ideas like this depend on detailed knowledge of chess and were
crucial to Deep Blue’s success. According to the technical paper written
by the Deep Blue team, refined chess notions like the "semitransparent levered pawn"
were crucial to Deep Blue's play in the second game against Kasparov.
Ultimately, the Deep Blue developers used two main ideas. The first
was to build a function that incorporated lots of detailed chess
knowledge to evaluate any given board position. The second was to use
immense computing power to evaluate lots of possible positions, picking
out the move that would force the best possible final board position. So, what happens if you apply this strategy to Go?
It turns out that you will run into a difficult problem when you try.
The problem lies in figuring out how to evaluate board positions. Top
Go players use a lot of intuition in judging how good a particular board
position is. They will, for instance, make vague-sounding statements
about a board position having good shape. And it’s not
clear how to express this intuition in well-defined systems like
the valuation of chess pieces.
Now you might think it’s just a question of working hard and coming
up with a good way of evaluating board positions. Unfortunately, even
after decades of attempts to do this using conventional approaches,
there was still no obvious way to apply the search strategy that was so
successful for chess, and Go programs remained disappointing. This began
to change in 2006, with the introduction of Monte Carlo tree
search algorithms, which tried a new approach to evaluation based on a
clever way of randomly simulating games. But Go programs still fell far
short of human players in ability. It seemed as though a strong
intuitive sense of board position was essential to success.
(Speaking of Monte Carlo: it was my first proposed undergraduate project, one I had to whole-heartedly let go of for another field. Now you know why I chose this topic.)
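The core trick of the Monte Carlo approach can be sketched in a few lines: estimate how promising a position is by playing many games of random legal moves from it to the end and averaging the results. The sketch below assumes the same kind of hypothetical board API as before and leaves out the tree-search machinery, which steers simulations toward the most promising moves (for example with the UCT rule).

```python
import random

def random_playout(position):
    """Play random legal moves until the game ends; return 1 if the
    player to move at the start wins, else 0 (hypothetical board API)."""
    player = position.player_to_move()
    current = position
    while not current.is_game_over():
        current = current.play(random.choice(current.legal_moves()))
    return 1 if current.winner() == player else 0

def monte_carlo_value(position, n_playouts=1000):
    """Estimate how good `position` is by averaging many random playouts."""
    wins = sum(random_playout(position) for _ in range(n_playouts))
    return wins / n_playouts
```

This sidesteps the need for a hand-crafted evaluation function, which is why it helped in Go; but random playouts alone are a crude judge of a position, which is why programs built on this idea still fell short of strong human players.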
What’s new and important about AlphaGo is that its developers have figured out a way of accommodating something very like that intuitive sense. To explain how it works, let me describe the AlphaGo system, as outlined in the paper
the AlphaGo team published in January. (The details of the system were
somewhat improved for AlphaGo’s match against Lee Sedol, but the broad
governing principles remain the same.)
AlphaGo took 150,000 games played by good human players and used an artificial neural network (a pattern-recognition model trained through optimization)
to find patterns in those games. In particular, it learned to predict
with high probability what move a human player would take in any given
position. AlphaGo’s designers then improved the neural network by
repeatedly playing it against earlier versions of itself, adjusting the
network so it gradually improved its chance of winning.
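As a toy illustration of the first, supervised stage, here is a sketch of a tiny policy network trained to predict which move a human played in a given position, written in PyTorch. The number of layers, the two-plane board encoding, and the dummy batch are my own illustrative assumptions; AlphaGo's actual policy network was a much deeper convolutional network trained on vastly more data.

```python
import torch
import torch.nn as nn

BOARD = 19  # a 19x19 Go board

# A small convolutional policy network: input is the board encoded as two
# planes (my stones, opponent's stones); output is one logit per board point.
policy_net = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * BOARD * BOARD, BOARD * BOARD),
)

optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(positions, human_moves):
    """positions: (N, 2, 19, 19) float tensor; human_moves: (N,) move indices.
    Nudges the network toward predicting the human's move."""
    optimizer.zero_grad()
    logits = policy_net(positions)
    loss = loss_fn(logits, human_moves)
    loss.backward()
    optimizer.step()
    return loss.item()

# A dummy batch, just to show the shapes involved.
fake_positions = torch.randn(8, 2, BOARD, BOARD)
fake_moves = torch.randint(0, BOARD * BOARD, (8,))
print(train_step(fake_positions, fake_moves))
```

The self-play stage then keeps the same kind of network but adjusts its weights by reinforcement, rewarding the moves that led to wins against earlier versions of itself.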
We can see from this that AlphaGo didn’t start out with a valuation
system based on lots of detailed knowledge of Go, the way Deep Blue did
for chess. Instead, by analyzing thousands of prior games and engaging
in a lot of self-play, AlphaGo created a policy network through billions
of tiny adjustments, each intended to make just a tiny incremental
improvement. That, in turn, helped AlphaGo build a valuation system that
captures something very similar to a good Go player’s intuition about
the value of different board positions.
Or to put it another way, it mimics the human learning process, but only within a narrow domain.
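The valuation side can be sketched in the same toy style: a "value network" that maps a board position directly to an estimated probability of winning, trained on the outcomes of self-play games. Again, the sizes and encoding here are illustrative assumptions rather than AlphaGo's actual design.

```python
import torch
import torch.nn as nn

BOARD = 19

# A small value network: board position in, estimated win probability out.
value_net = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * BOARD * BOARD, 1),
    nn.Sigmoid(),
)

optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

def train_step(positions, outcomes):
    """positions: (N, 2, 19, 19); outcomes: (N, 1) with 1.0 = won, 0.0 = lost,
    taken from the results of self-play games."""
    optimizer.zero_grad()
    predictions = value_net(positions)
    loss = loss_fn(predictions, outcomes)
    loss.backward()
    optimizer.step()
    return loss.item()
```

It is this learned valuation, rather than a hand-crafted formula like the chess piece values, that plays the role of the intuitive judgment described above.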
In this way, AlphaGo is much more radical than Deep Blue. Since the
earliest days of computing, computers have been used to search out ways
of optimizing known functions. Deep Blue’s approach was just that: a
search aimed at optimizing a function whose form, while complex, mostly
expressed existing chess knowledge. It was clever about how it did this
search, but it wasn’t that different from many programs written in the
1960s.
This ability to replicate intuitive pattern recognition is a big deal. Just so you know, over the past few years neural networks have been used to capture
intuition and recognize patterns across many domains. So it's not entirely new. Many of the
projects employing these networks have been visual in nature, involving
tasks such as recognizing artistic style or developing good video-game
strategy. But there are also striking examples of networks simulating
intuition in very different domains, including audio and natural
language.
But because of this versatility, I see AlphaGo not as a revolutionary
breakthrough in itself, but rather as the leading edge of an extremely
important development: the ability to build systems that can capture
intuition and learn to recognize patterns.
With that said, systems like AlphaGo are genuinely exciting. Humans have
learned to use computer systems to reproduce at least some forms of
human intuition. Now we’ve got so many wonderful challenges ahead: to
expand the range of intuition types we can represent, to make the
systems stable, to understand why and how they work, and to learn better
ways to combine them with the existing strengths of computer systems.
Might we soon learn to capture some of the intuitive judgment that goes
into writing mathematical proofs, or into writing stories or good
explanations? It’s a tremendously promising time for artificial
intelligence.
The exponential pace of our civilization's progress is real.