AlphaGo and Google DeepMind: (Un)Settling the Score between Human and Artificial Intelligence
By Katie L. Strong, PhD
In a quiet room in a London office building, artificial intelligence history was made last October as reigning European Champion Fan Hui played Go, a strategy-based game he had played countless times before. This particular match was different from the others though – not only was Fan Hui losing, but he was losing against a machine.
The machine was a novel artificial intelligence system named AlphaGo, developed by Google DeepMind. DeepMind, which Google acquired in 2014 for a reported $617 million (its largest European acquisition to date), is a company focused on developing machines that can learn new tasks for themselves. DeepMind is most interested in artificial “general” intelligence: AI that adapts to the task at hand and can accomplish new goals with little or no preprogramming. DeepMind programs essentially have a kind of short-term working memory that allows them to manipulate and adapt information to make decisions. This is in contrast to AI that may be very adept at a specific job but cannot translate those skills to a different task without human intervention. For the researchers at DeepMind, the perfect platform for testing this kind of sophisticated AI is computer and board games.
DeepMind had set its sights high with Go; ever since IBM’s chess-playing Deep Blue beat Garry Kasparov in 1997, Go has been considered the holy grail of artificial intelligence, and many experts had predicted that humans would remain undefeated for at least another 10 years. Go is a relatively straightforward game with few rules, but the number of possibilities on the board makes for complex, interesting play that requires long-term planning; on the typical 19x19 grid, according to the DeepMind website, there are more legal game positions “than there are atoms in the universe.” Players take turns strategically placing stones (black for the first player, white for the second) on the grid intersections in an effort to form territories. Passing is an alternative to taking a turn, and the game ultimately ends when both players have passed because no unclaimed territory remains. Often, though, towards the end of the game, one player will resign rather than play to the very end.
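A back-of-the-envelope calculation gives a feel for the scale of that claim: each of the 361 intersections on a 19x19 board can be empty, black, or white, which bounds the number of board configurations at 3^361. A quick, purely illustrative sketch in Python:

```python
# Crude upper bound on Go board configurations: each of the
# 19 x 19 = 361 intersections is empty, black, or white.
# (Most of these configurations are not legal positions; the true
# count of legal positions, roughly 2 x 10^170, is smaller but
# still astronomically large.)
configurations = 3 ** 361

ATOMS_IN_UNIVERSE = 10 ** 80  # commonly cited rough estimate

print(f"Board configurations (upper bound): {configurations:.2e}")
print(f"Ratio to atoms in the universe:     {configurations / ATOMS_IN_UNIVERSE:.2e}")
```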
In a Nature paper published in January of this year, researchers at DeepMind reported the development of an AI agent that could beat other computer Go programs with a winning rate of 99.8%. Buried in the text, in a single paragraph of the Results section, the authors also briefly describe the epic match between AlphaGo and Fan Hui, which ultimately resulted in a 5-0 win for artificial intelligence.
With that significant win in hand, DeepMind took a much bolder approach to showcasing AlphaGo’s capabilities and invited Lee Sedol, the top Go player in the world for the last decade, to compete in a five-match tournament during the week of March 9th–15th. Instead of a private match at DeepMind’s headquarters, this contest was live-streamed to the world through YouTube and came with a $1 million prize. Despite the defeat of Fan Hui and the backing of Google, Lee Sedol was still fairly confident in his skills and said in a statement in late February, “I have heard that Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win at least this time.”
Three and a half hours into the first match on March 9th, though, Lee Sedol resigned, forfeiting the match. He resigned the second and third matches as well. Speaking at a press conference following the third game, Lee Sedol said he had underestimated the program in game one, made mistakes in game two, and been under extreme pressure in game three.
However, in a win for humanity, Lee Sedol won the fourth game. Interestingly, the first 11 moves of the fourth game were exactly the same as in the second game, and perhaps Lee Sedol was able to capitalize on what he had learned from the previous three. According to the English-language commentator Michael Redmond, Move 78 (a move by Lee Sedol) elicited a miscalculation from AlphaGo, and the game was essentially over from that point. In both of these games, Lee Sedol played second (the white stones), and he stated at the press conference following the fourth game that AlphaGo is weaker when the machine goes first.
Whether AlphaGo is actually weaker when it plays first is difficult to know, since Lee Sedol may be the only person who can attest to this. At that same press conference, DeepMind cofounder Demis Hassabis stated that Lee Sedol’s win was valuable to the algorithm and that the researchers would take AlphaGo back to the UK to study what had happened, so this weakness could be confirmed (and presumably fixed). One important point of Go play that may have influenced the outcome, though, is that AlphaGo plays moves that maximize its chances of winning, irrespective of how a move influences the margin of victory. Whether this is a weakness is probably up for debate as well, but in this sense AlphaGo is not playing like a professional human player. Go has a long history of being respected for its elegance and simplicity, but AlphaGo is not concerned with the sophistication or complexity of the game – it just wants to win.
Lee Sedol requested and was granted the opportunity to play black (the first move) in the fifth and final match-up, even though the rules of the tournament stated that color would be randomly assigned. “I really do hope I can win with black,” Lee Sedol said after winning game four, “because winning with black is much more valuable.” The fifth match lasted a grueling five hours, but eventually Lee Sedol did resign. After almost a week of play, the championship concluded with a 4-1 score in favor of artificial intelligence.
When AlphaGo played Fan Hui in October 2015, the agent beat a professional 2-dan player; Lee Sedol ranks higher than Fan Hui as a 9-dan professional player. (Those who have mastered the game of Go are ranked on a scale known as dan, which begins at 1-dan and continues to 9-dan.) To put this into perspective, Lee Sedol was a 2-dan professional player in 1998, and it wasn’t until 2003 that he reached 9-dan status. Climbing from 2-dan to 9-dan took Lee Sedol five years, but AlphaGo scaled that ladder in only five months. DeepMind was able to build an artificial intelligence agent with these capabilities by combining two important concepts: deep neural networks and reinforcement learning. Earlier AI agents typically relied on tree search to review possible outcomes, but this brute-force approach, in which the AI considers the effect of every possible move on the outcome of the game, is not feasible in Go. The first black stone played could lead to hundreds of potential moves by white, which in turn could lead to hundreds of potential moves by black. Humans have mastered Go without mentally running through every possible play during each turn and without mentally finishing the game after every move by an opponent. Humans rely on imagination and intuition to master complex skills, and AlphaGo is designed to mimic these very complex cognitive functions.
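The arithmetic of that explosion is easy to sketch. Using the commonly cited average branching factors of roughly 35 legal moves per turn in chess and roughly 250 in 19x19 Go (approximate figures, used here only for illustration), the game tree grows unmanageably fast:

```python
# Illustrative game-tree growth using commonly cited average branching
# factors: ~35 legal moves per turn in chess, ~250 in 19x19 Go.
def tree_size(branching_factor: int, depth: int) -> int:
    """Leaf positions in a complete game tree explored to `depth` plies."""
    return branching_factor ** depth

for depth in (2, 4, 6, 8):
    print(f"{depth} plies ahead: chess ~{tree_size(35, depth):.1e} positions, "
          f"Go ~{tree_size(250, depth):.1e} positions")
```

Just eight plies ahead, Go's tree is already millions of times larger than chess's, which is why exhaustive lookahead was never an option.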
Deep neural networks are loosely based on how the neural connections in our brains work, and they have been used for years to optimize Google searches and improve voice recognition in smartphones. Analogous to synaptic plasticity, where synaptic strength increases or decreases over a lifetime, computer neural networks change and strengthen when presented with many examples. In this type of processing, the network is organized into layers, and each layer is responsible for constructing only a single level of information. For example, in facial recognition software, the first layer of the network may pick up only on pixels, and the second layer may reconstruct simple shapes, while more sophisticated layers recognize complex features (e.g., eyes and mouths). The layers continue to build in complexity until the software can recognize faces.
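A minimal sketch can make the layering concrete. Everything below is hypothetical (the layer sizes and random weights are stand-ins, not AlphaGo's architecture); the point is only that each layer re-represents the output of the layer beneath it:

```python
import numpy as np

def relu(x):
    """Simple nonlinearity applied between layers."""
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# Hypothetical layer sizes: a flattened 19x19 board in, two scores out.
layer_sizes = [361, 128, 64, 2]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    # Each hidden layer transforms the previous layer's representation
    # into a slightly more abstract one; training would tune the weights.
    for w in weights[:-1]:
        x = relu(x @ w)
    return x @ weights[-1]

print(forward(rng.standard_normal(361)))
```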
AlphaGo has two neural networks: a policy network to select the next move, and a value network to estimate which player will win the game. AlphaGo takes the Go board as input and processes it through 12 layers of neural networks to determine the best move. To train the neural networks, researchers used 30 million moves from games played on the KGS Go server, and this alone led to an agent that could predict the human move 57% of the time. The goal was not to play at the level of humans, though; the goal was to beat humans, and to do that the researchers turned to reinforcement learning, in which AlphaGo was split in two and played thousands of games against itself. With this, AlphaGo was able to win at a rate of 99.8% against commercial Go programs.
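A toy self-play loop shows the shape of that training signal. Everything below is a hypothetical stand-in – the "game" is a trivial take-1-or-2-stones variant of Nim, chosen only so the sketch runs end to end – but it illustrates how playing against yourself turns every position into a labeled training example:

```python
import random

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def policy(stones):
    # Stand-in for a policy network: a probability for each legal move.
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def self_play(stones=10):
    """Play one game against oneself; label every position with the result."""
    history, player = [], 0
    while stones > 0:
        probs = policy(stones)
        move = random.choices(list(probs), weights=list(probs.values()))[0]
        history.append((stones, player))
        stones -= move
        player = 1 - player
    winner = 1 - player  # whoever took the last stone wins
    # Each (position, outcome) pair is the kind of example a value
    # network learns from; reinforcement learning also nudges the
    # policy toward the moves that led to wins.
    return [(pos, 1.0 if who == winner else 0.0) for pos, who in history]

print(self_play())
```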
These neural networks mean that AlphaGo doesn’t search through every possible position to determine the best move before it plays, and it doesn’t simulate entire games to help make a choice either. Instead, when confronted with a decision, AlphaGo considers only a few potential moves and only their more immediate consequences. Remarkably, even though Go has vastly more legal moves than chess, AlphaGo evaluated thousands of times fewer positions than Deep Blue did in 1997. AlphaGo is more human-like in that it makes these choices intelligently and precisely. According to AlphaGo developer David Silver in this video, “the search process itself is not based on brute force. It’s based on something more akin to imagination.”
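The Nature paper describes the real mechanism as Monte Carlo tree search guided by the two networks; the toy sketch below (reusing the hypothetical Nim-style game, with stand-in networks) captures only the selective flavor of it: expand just the policy's top-ranked moves, and at the search horizon ask a value estimate to judge the position rather than playing the game out.

```python
def policy_net(stones):
    # Hypothetical stand-in: uniform probabilities over taking 1-3 stones.
    moves = [m for m in (1, 2, 3) if m <= stones]
    return {m: 1.0 / len(moves) for m in moves}

def value_net(stones):
    # Hypothetical stand-in for a learned evaluator. In "take 1-3 stones,
    # last stone wins", positions with stones % 4 == 0 lose for the player
    # to move; a trained value network would approximate such knowledge.
    return 0.0 if stones % 4 == 0 else 1.0

def search(stones, depth, top_k=2):
    """Depth-limited search over only the policy's top-k moves, falling
    back on the value estimate at the horizon instead of a full playout."""
    if stones == 0:
        return 0.0  # no stones left: the player to move has already lost
    if depth == 0:
        return value_net(stones)
    probs = policy_net(stones)
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    # A child position's value for the opponent is flipped for us.
    return max(1.0 - search(stones - m, depth - 1, top_k) for m in candidates)

print(search(stones=10, depth=3))  # estimated win probability for the mover
```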
This computing power is not reserved strictly for games; DeepMind’s website declares that the company would like to “solve intelligence” and “use it to make the world a better place.” Games are just the beginning: deep neural networks may one day model disease states, pandemics, or climate change and teach us to think differently about the world’s toughest problems. (DeepMind Health was announced on February 24th of this year.) Many of the moves AlphaGo made early in the matches baffled Go professionals because they seemed like mistakes, but AlphaGo ultimately won. Were these really mistakes that AlphaGo was able to fix later, or were these moves simply beyond our current comprehension? How many potential Go moves have never before been considered or played out in a game?
If AlphaGo’s choices of moves could surprise Go professionals and even the masterminds behind AlphaGo, should we fear that AlphaGo is an early version of a machine that could spontaneously evolve into a conscious AI? Today, we probably have very little to be concerned about. Although the technology behind AlphaGo could be applied to many other games, AlphaGo’s learning progress was hardly effortless: it required millions of games of training. But how will we know when we do need to worry? Games have provided us with a convenient benchmark for measuring the progress of AI, from backgammon in 1979 to the recent Go match, but if Go was a final frontier for AI, where do we go from here?
Measuring emerging consciousness in AI agents that simulate the human brain will be challenging, according to a paper by Kathinka Evers and Yadin Dudai of the Human Brain Project. We could use a Turing test, although the authors note that it seems highly plausible that an intelligent AI could pass the Turing test without having consciousness. We could also try to detect in silico signatures similar to the brain signatures that denote consciousness in humans, but we are at a loss for what those signals may be and how well they actually represent human consciousness. If consciousness is more than just well-defined organization and requires biological entities, then computers will never be conscious in the same sense that we are and will instead exhibit only an artificial consciousness. Furthermore, Giulio Tononi and Christof Koch, leading proponents of integrated information theory (IIT), have argued in this paper that a simulation of consciousness is not the same as consciousness, and that “IIT implies that digital computers, even if their behaviour were to be functionally equivalent to ours, and even if they were to run faithful simulations of the human brain, would experience next to nothing.”
Regardless of how we debate machine consciousness, neural networks that mimic human learning are being used by most of the major companies that dominate our society, including Facebook, Google, and Microsoft. We will probably continue to see deep reinforcement learning of the kind developed by DeepMind used to improve voice recognition, translation, YouTube, and image search. Deep reinforcement learning could also power self-driving cars, train robots, and, as Hassabis envisions for the future, produce scientist AIs that work alongside humans. Without a well-defined metric for machine intelligence and consciousness, time will tell which of these milestones marks the next great achievement in AI, how we measure its significance, and whether it warrants anxiety. The mysterious ethics board that Hassabis negotiated with Google probably reflects the company’s awareness of the ambiguous state of future AI research.
As uncertain and even scary as the future may seem, though, it is important to remember that AlphaGo lost one of the matches, and that loss matters. Prior to the tournament, AlphaGo had played millions and millions of Go games, many more than Lee Sedol could ever play in a lifetime. AlphaGo never got tired, it was never intimidated by Lee Sedol’s 18 international titles, and it never wrestled with self-doubt. AlphaGo’s indifference to the stakes of the games worked in its favor; Lee Sedol admitted he was under too much pressure during the third match.
For all of these advantages, though, AlphaGo couldn’t adapt quickly or learn fast enough from Lee Sedol to change how it played. For AlphaGo to get better, it must play millions of games – not just a few. Lee Sedol, by contrast, was able to play the first three matches, learn from AlphaGo, and exploit what he thought was a weakness. He believed AlphaGo played more weakly when it played black, and he took advantage of this with a move that many consider brilliant and unexpected. AlphaGo challenged Lee Sedol and brought out the best in him. And, when it comes to the future, the outcome of the fourth match raises the question: how can AI bring out the best in us?
Want to cite this post?
Strong, K.L. (2016). AlphaGo and Google DeepMind: (Un)Settling the Score between Human and Artificial Intelligence. The Neuroethics Blog. Retrieved on , from http://www.theneuroethicsblog.com/2016/03/alphago-and-google-deepmind-unsettling.html
Thanks, Katie, for your excellent and thoughtful review of this momentous occasion in AI history. You covered the topic well, including mentioning Tononi's Integrated Information Theory, which to me provides the best measure to date of conscious neural processing in artificial systems. Notably, it can be used to give a quantitative measure of just how much consciousness an AI has. So I ask you and your readers to begin to think of consciousness not as a lightbulb that suddenly switches on when an AI gets smart enough, but as a continuum. Or even as a complex set of evolutionary adaptations for doing well in whatever environment the creature in question evolved in. AlphaGo "evolved" in a rarefied environment, as you pointed out, without emotional considerations, just an incentive to win the game. Therefore, its level of consciousness is immature compared to a human's, but it is certainly there. AIs are all conscious already, to some degree. We can continue to evolve them to be more human-like and give them emotional awareness if we want (as Rosalind Picard has promoted: http://web.media.mit.edu/~picard/). Don't assume that any AI will suddenly become more powerful. Like animals, they will gradually advance in their capabilities and level of consciousness as we put them in ever more challenging environments and give them ever more sophisticated sensory systems, including those that can sense what is going on inside their own neural networks – introspection and interoception.
– Steve M. Potter