What it’s like to match wits with a supercomputer

I spent most of the May 1997 rematch between chess world champion Garry Kasparov and IBM’s Deep Blue supercomputer sitting in a grad school classroom. I think it was Intro to Human-Computer Interaction, ironically enough. The professor projected a clunky Java-powered chess board “webcast” (the term was new, as was the web) so we could follow the match. The pace of chess being deliberative and glacial, it really wasn’t a distraction. Not to mention that, at the time, I didn’t know how to play chess. But I do remember people caring deeply about the outcome. I went to work for IBM the following year.

Deep Blue’s descendant, if not in code or microchips then in the style of its coming-out party, is Watson, a massively parallel assemblage of Power 7 processors and natural language-parsing algorithms. Watson, if you’re not a geek or a game show enthusiast, was the computer that played Ken Jennings and Brad Rutter on Jeopardy Feb. 14-16 of this year. Watson won.

Wednesday of last week I got a chance to play Watson on the Jeopardy set built at our research facility for the show. I did not win.

But I did hold the lead for a time and, in fact, I beat Watson during an unrecorded practice round. Honest!

Jeffrey Plaut of Global Strategy and I were the two human competitors selected to go up against Watson in a demonstration match. We did so at the culmination of a few hours of discussion with leaders from the humanitarian sector on how to expand Watson’s repertoire to put it to work in areas that matter. (More on that in a bit.)

IBM built a complete Jeopardy set for the actual televised match. Sony has lots of experience with this, as Jeopardy often goes on the road. But it’s clearly a hack: TV made the set look a lot bigger than it really is and the show’s producers had to jump through hoops to provide dressing room space and keep the contestants segregated from interacting with IBM’ers (to avoid claims of collusion, I suppose). Ken Jennings has some typically humorous insight on this.

Trebek was long gone, so we had the project manager for Watson host the session I competed in. He’s actually very good, as Watson went through a year of training with past winners and stand-in hosts. I was to play one round of Jeopardy. The rules were the same as the real game and Watson was at full computing capacity, with two exceptions. We were told that we could ring in and then appeal to the audience for help and, most importantly, Watson’s ring-in time was slowed down by a quarter second. The first I took as an insult — if I was going to compete against a computer I was going to do it myself — the second was a blessing.

Standing at the podium is certainly nerve-wracking. There’s a small screen and light pen for scrawling your name and then the buzzer. I stood in Jennings’ spot and it was striking to see how worn the paint was on the buzzer. From sweat? Who knows, but that thing looked like it had been squeezed to death. Contestants can see the clue board and the host, of course, but there’s also a blue bar of light underneath the clues which is triggered manually by a producer once the host finishes reading the last syllable of the clue. This is the most important moment, as ringing in before the blue bar appears locks you out temporarily. Watson had to wait a quarter second at this point and I am convinced it is the only reason we humans were able to get an answer in edgewise.

In a way, this moment is as much human-versus-human as anything. You’re trying to predict exactly when the producer will trigger the go light. Factor in some electrical delay for the plunger and it can be a real crapshoot. This is why past champions perfect their buzzer technique and ring in no matter what. They just assume they will know the answer and be able to retrieve it in the three seconds they are given.
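
For the programmers in the room, the ring-in rule described above can be sketched in a few lines of Python. This is only a toy model, not how the Jeopardy signaling system or Watson actually works. The quarter-second Watson delay comes from the demonstration rules, but the length of the lockout, the choice to ignore presses made during a lockout, and the names `first_valid_press` and `who_rings_in` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

LOCKOUT_S = 0.25       # assumed lockout length; the rules only say "temporarily"
WATSON_DELAY_S = 0.25  # the handicap Watson played under in the demonstration


def first_valid_press(presses: List[float],
                      lockout: float = LOCKOUT_S) -> Optional[float]:
    """Return the time of the first press that counts, or None if none does.

    Times are in seconds relative to the go light (0.0 = light turns on).
    A press before the light starts a lockout; presses made while locked
    out are ignored (an assumption of this sketch).
    """
    locked_until = float("-inf")
    for t in sorted(presses):
        if t < locked_until:
            continue                    # still locked out: press is ignored
        if t < 0:
            locked_until = t + lockout  # jumped the light: start a lockout
            continue
        return t                        # first press at or after the light
    return None


def who_rings_in(presses_by_player: Dict[str, List[float]]) -> Optional[str]:
    """Return the player whose valid press lands first, or None."""
    valid = {}
    for name, presses in presses_by_player.items():
        t = first_valid_press(presses)
        if t is not None:
            valid[name] = t
    return min(valid, key=valid.get) if valid else None


# Hypothetical scenario: Watson would press the instant the light comes on,
# so with the handicap its press lands at the quarter-second mark.
watson = [0.0 + WATSON_DELAY_S]
clean_buzz = who_rings_in({"human": [0.10], "watson": watson})
early_buzz = who_rings_in({"human": [-0.05, 0.10], "watson": watson})
```

With these assumed numbers, a human who presses a tenth of a second after the light beats Watson (`clean_buzz` is `"human"`), while a human who jumps the light by 50 milliseconds and presses again too soon has nothing registered, so `early_buzz` is `"watson"`.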

I got a bit of a roll in the category called “Saints Be Praised”. My Catholic upbringing, study in Rome, and fascination with weird forms of martyrdom finally paid dividends. (I also learned after the match that my human competitor was Jewish and largely clueless about the category.) The video above shows me answering a question correctly — something that seems to have shocked my colleagues and the audience. (And I would have disgraced every facet of personal heritage had I messed up a question about an Italian Catholic from Chicago.)

Question

This clue was more interesting as Watson and I both got it wrong. The category was “What are you … chicken?” about chicken-based foods. Maybe my brain was still in Italian mode as I incorrectly responded “Marsala”, but Watson’s answer — “What is sauce?” — was way wrong, categorically so. This is instructive. For one, the answer, “What is Chicken A La King,” if Watson had come across it at all, was likely confusing since “king” can have so many other contexts in natural language. But Watson was confident enough to ring in anyway and its answer was basically a description of what makes Chicken A La King different from regular chicken. Note that the word “sauce” does not appear in the clue. Watson was finishing the sentence.

What’s most important and too-infrequently mentioned is that Watson is not connected to the Internet. And even if it were, because of the puns, word play, and often contorted syntax of Jeopardy clues, Google wouldn’t be very useful anyway. Try searching on the clue above and you’ll get one hit — and that only because we were apparently playing a category that had already been played and logged online by Jeopardy fans. The actual questions during the Jennings-Rutter match were brand new. The Internet is no lifeline for questions posed in natural language.

At one point I had less than zero (I blew a Daily Double) while Jeff got on a roll asking the audience for help. And the audience was nearly always right. Call it human parallel processing. But if I was going to go down in flames to a computer I was damn sure not going to lose to another bag of carbon and water. I did squeak out a victory with a small “v” — and Watson was even gracious about it.

Thinking back it is interesting to note that nearly all my correct answers were from things I had learned through experience, not book-ingested facts. I would not have known the components of Chicken Tetrazzini did I not love to eat it. I would probably not know Mother Cabrini if I didn’t take the L past the Cabrini-Green housing project every day on the way to work. This is the biggest difference between human intelligence and Watson, it seems to me. Watson does learn and make connections between concepts — and this is clearly what makes it so unique — but it does not learn in an embodied way. That is, it does not experience anything. It has no capacity for a fact to be strongly imprinted on it because of physical sensation, or habit, or happenstance — all major factors in the human act of learning.

In Watson’s most-discussed screw-up on the actual show, where it answered “Toronto” when given two clues about Chicago’s airports, there’s IBM’s very valid explanation (weak category indicator, cities in the US called Toronto, difficult phrasing), but it was also noted that Watson has never been stuck at O’Hare, as virtually every air traveler has. (The UK-born author of this piece has actually been stranded for so long that he wandered around the airport and learned that it was named for the WWII aviator Butch O’Hare.) Which isn’t to say that a computer could never achieve embodied knowledge, but that’s not where we are now.

But all of it was just icing on the cake. The audience was not there to see me make a fool of myself (though perhaps a few co-workers were). We were there to discuss the practical, socially-relevant applications of Watson’s natural-language processing in fields directly benefiting humanity.

Healthcare is a primary focus. It isn’t a huge leap to see a patient’s own description of what ails him or her as the (vague, weakly-indicating) clue in Jeopardy. Run the matching algorithm against the huge corpus of medical literature and you have a diagnostic aid. This is especially useful in that Watson could provide the physician its confidence level and the logical chain of “evidence” that it used to arrive at the possible diagnoses. Work to create a “Doctor” Watson is well underway.

As interesting to my colleagues and me are applications of Watson to social services, education, and city management. Imagine setting Watson to work on the huge database of past 311 service call requests. We could potentially move beyond interesting visualizations and correlations to more efficient ways to deploy resources. This isn’t about replacing call centers but about enabling them to view 311 requests — a kind of massive, hyperlocal index of what a city cares about — as an interconnected system of causes and effects. And that’s merely the application most interesting to me. There are dozens of areas to apply Watson, immediately.

The cover story of The Atlantic this month, Mind vs. Machine, is all about humanity’s half-century attempt to create a computer that would pass the Turing Test — which would, in other words, be able to pass itself off as a human, convincingly. (We’re not there yet, though we’ve come tantalizingly close.) Watson does not pass the Turing Test, for all sorts of reasons, but the truth is that what we’ve learned from it — what I learned personally in a single round of Jeopardy — is that the closer we get to creating human-like intelligence in a machine, the more finely-nuanced our understanding of our own cognitive faculties becomes. The last mile to true AI will be the most difficult, primarily because we’re simultaneously trying to crack a technical problem and figure out what, in the end, makes human intelligence human.