Deep Learning: Fascinating Tales of a Strange Tomorrow
March 11, 2017
Presentation by Julien Simon, AWS Principal Technical Evangelist. Filmed at the AWS User Group in Wellington.
Transcript
So, my title is "Fascinating Tales of a Strange Tomorrow." You'll get it in a second. Yes, I'm a big science fiction fan, and I reckon I'm not the only one here. Of course, I've read all those books, and science fiction really took off in the 50s with Asimov, "Forbidden Planet," and so on. They were obsessed with one thing: creating artificial life and artificial intelligence, mostly robots at the time. Look at this title: "Man-Like Machines Rule the World: Fascinating Tales of a Strange Tomorrow." That's from 60, almost 70 years ago. Were they afraid of machine rule, like some of us are today? I don't know, but it was on their minds. And it wasn't just science fiction and movie writers; plenty of very serious people, scientists and some of the brightest minds of the time, went into artificial intelligence.
There's this seminal event in 1956 at Dartmouth where those geniuses invented artificial intelligence. They got all pumped up and started making predictions. In 1958, they said that within 10 years, a digital computer would be the world chess champion and would discover and prove a major mathematical theorem. That didn't go well. In 1965, Mr. Simon, not a relative as far as I know, predicted that within 20 years, machines would be capable of doing any work a man can do. In 1967, Marvin Minsky, the godfather of artificial intelligence, said that within a generation, the problem of creating artificial intelligence would be substantially solved. In the seventies, he predicted that in three to eight years, we would have a machine with the general intelligence of an average human being. That one didn't work out either; we're still not there. These guys, the brightest minds of their generation, got it totally wrong.
Eventually, it did happen. In 1997, Kasparov was defeated by Deep Blue, and last year, a Google AI defeated the Go world champion, and Go is one of the hardest games there is. But this happened 50 to 60 years later than those guys expected. So, back to the good old days. I'm not finger-pointing; who am I to criticize those guys? But as it turns out, not much really happened in artificial intelligence for a long time, and there were several reasons. Systems for problems like natural language processing and image recognition worked okay on very small data sets, but they couldn't scale. Most of these are exponential-time problems, which require a huge amount of computing power, and that power simply didn't exist in the 60s and 70s.
There was another problem: the common sense issue. You build an artificial intelligence system, and even if it's quite smart, it will get some answers totally wrong that a four-year-old child would get right. A four-year-old knows the difference between a dog, a cat, a horse, a cow, and a lion. A kid already knows millions of things that you would still have to teach a computer, and if you didn't, the system would look stupid. At the end of the day, there was a lot of research but not many meaningful applications, mostly what we call toy applications. The people funding this research, like the U.S. government and others, decided that enough was enough. Around the mid-70s, they cut all the funding to those projects, leading to the first AI winter.
In the mid-80s, these guys invented Lisp machines. This is really old stuff. Did anyone here actually see one of those? Thank you. Lisp was a popular language at the time, and some people have tried to convince me it still is, mostly in academic circles. The idea was to use functional programming and Lisp with custom hardware to get around the lack of computing power: implement Lisp instructions in hardware to run faster, handle bigger data sets, and redeem themselves. These machines were super expensive, and not many people bought them. The market was fragmented, with different companies building different machines, and most of the people building and using them disagreed on what to do with them and how to build them. Richard Stallman, the father of the GNU project, was already there, and things weren't going quite right. A few years later, Moore's law wiped them out: companies like Sun Microsystems invented the general-purpose workstation, which was much cheaper and faster and ran circles around those powerful, complex, and expensive Lisp machines when running Lisp programs. So, Lisp machines became museum pieces, and the second AI winter began.
By that time, artificial intelligence was still being taught. If you were at university in the 90s, the general message was that you could do this and that, and it was cool, but there were no real use cases. It was all theory. Some of my friends pursued AI PhDs and did pretty cool stuff that was totally worthless because there were no applications at the time. They ended up working in subcontracting companies writing C++. It went okay in the end, but there was a lot of disillusion. Marvin Minsky, who passed away last year, is the absolute godfather of AI. In 2001, he wrote a paper and gave a talk called "It's 2001. Where Is HAL?" It's significant because he worked with Stanley Kubrick as an advisor on "2001: A Space Odyssey." The paper goes through several reasons why these dreams never panned out.
Something happened around 2000: the web. The first big players, like Yahoo, Google, and Amazon, came out of nowhere and quickly had millions of users. Today, it's probably hundreds of millions or even billions. Many other companies followed, including Facebook. They had tons of data, commodity hardware, lots of engineers, and a desperate need to make money. To me, looking back, all this stuff was just gasoline waiting for a match. I think this is where the machine learning boom came from: booming web companies with tons of data and a desperate need to make money.
In December 2004, Google published the MapReduce paper. A year or so later, the Yahoo guys, who had read that paper and had a lot of data, released Hadoop 0.1 to crunch web logs and generate advertising revenue. In 2009, they demonstrated how they could sort a terabyte in about a minute, which was a significant benchmark at the time. It was fast, and things took off in all possible directions. Today, we have a ton of Apache projects that originated from that Hadoop project.
Fast forward five or six years, and everyone is doing machine learning in some way. It's a core skill now. Being able to fire up a cluster, run some queries in Hive, and so on is pretty much a solved problem, especially if you use EMR. We got really good at crunching petabytes of web logs, getting people to click on ads, and building real-time prediction models. But did we make a lot of progress in computer vision, speech, and natural language processing? Not really. The reason is that traditional machine learning doesn't work well for those use cases; you cannot just throw traditional algorithms at these problems.
Training a machine learning model to predict clicks on banners is one thing, and you have a lot of data to do it. Training an algorithm to recognize my voice versus yours is a totally different thing. The second problem is feature extraction. When you do machine learning, the first thing you need to do is identify the features that are really meaningful. For example, in an Apache log with 50 to 60 different variables, some are more meaningful than others for predicting clicks. If you inject all 60 variables into the model, it won't perform well. You need to find the important features, which is what data scientists do. This works for structured data or even unstructured data, but what about faces and voices? What are the important features in my voice? It's more complicated than saying, "Well, I think we should take the time of day, the user agent, and the publisher ID."
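To make the feature-extraction point concrete, here is a minimal sketch of the kind of classic feature work a data scientist might do on click logs, using scikit-learn. The column names and data are purely illustrative, not from the talk.

```python
# A hedged sketch of classic feature selection on click-log data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical log extract: a handful of the 50-60 raw variables.
logs = pd.DataFrame({
    "hour_of_day":   [9, 14, 23, 11, 20],
    "user_agent_id": [1, 3, 2, 1, 4],
    "publisher_id":  [10, 12, 10, 15, 12],
    "banner_size":   [300, 728, 300, 160, 728],
    "clicked":       [0, 1, 0, 0, 1],
})

X = logs.drop(columns=["clicked"])
y = logs["clicked"]

# Fit a quick model and rank the variables by importance: this is the kind of
# manual feature work described above, and exactly what deep learning automates.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
for name, score in sorted(zip(X.columns, model.feature_importances_),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.2f}")
```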
So, do we have to lose hope that computers can be smarter at seeing, speaking, and listening? Of course not. The solution came from the past, in the form of neural networks, which many of us were taught about years ago but which never really worked. Things are changing now. The best definition I've seen of neural networks is that a neural network is a universal approximation machine. This definition is by Andrew Ng, co-founder of Coursera and chief scientist at Baidu. He's a brilliant speaker on artificial intelligence. If you haven't seen his recent talk, "Artificial Intelligence is the New Electricity," please watch it. It's not very technical, but it's 45 minutes of brilliance.
What he means is that you can throw any data at a neural network, and the network will figure it out, learning to approximate the correct answer as closely as possible. You train a network by showing it data along with the expected output, and letting it learn how that input should produce that output. The cool thing about neural networks is that features are found automatically. The math is shockingly simple, but they require a lot of computing power. And the more data you train the network on, the better it gets, unlike traditional machine learning, where after a certain point your model stops improving because the extra data is redundant.
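As an illustration of "show the network inputs and expected outputs and let it learn," here is a minimal Keras sketch on synthetic data (Keras is the library used in the demo later). The architecture and numbers are arbitrary, not from the talk.

```python
# A minimal sketch: a tiny network learns a rule from examples alone.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Synthetic problem: y = 1 if x1 + x2 > 1, else 0. The network is never told
# this rule; it only sees inputs and expected outputs.
X = np.random.rand(1000, 2)
y = (X.sum(axis=1) > 1.0).astype("float32")

model = Sequential([
    Dense(16, activation="relu", input_shape=(2,)),  # features are learned here
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training = repeatedly showing inputs and their expected outputs.
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
print(model.predict(np.array([[0.9, 0.8], [0.1, 0.2]])))
```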
The theory of neural networks is not new. One of the key papers, on the Perceptron, is from 1958, and the all-important backpropagation algorithm, which propagates prediction errors backwards through the network so the weights can be corrected, dates back to 1975. The problem those researchers had was not the theory but the lack of computing power to run these things at scale. Today we can take that 50-year-old theory, scale it, and get results.
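For the curious, here is a toy numpy version of those two ideas: a tiny two-layer network trained with backpropagation, using nothing but matrix operations. It is a sketch on synthetic data, not anything from the talk.

```python
# A toy two-layer network trained with backpropagation on synthetic data.
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(200, 3)                                   # 200 samples, 3 features
y = ((X @ np.array([1.0, -2.0, 0.5])) > 0).astype(float).reshape(-1, 1)

W1 = rng.randn(3, 8) * 0.1
W2 = rng.randn(8, 1) * 0.1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr, n = 0.5, len(X)

for _ in range(5000):
    # Forward pass: matrix multiplications plus a non-linearity.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: propagate the error back through the layers
    # and nudge the weights in the right direction.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out) / n
    W1 -= lr * (X.T @ d_h) / n

print("training accuracy:", float(((out > 0.5) == y).mean()))
```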
Training and running a network means doing very large matrix operations, which are easy to run in parallel. But scale gets in the way. Imagine product recommendation at Amazon, with a matrix of all Amazon users as rows and all Amazon products as columns, where each cell represents a user-product interaction. As you add more users and more products, the matrix keeps growing. At the other end of the spectrum, building skills for devices like Alexa is surprisingly easy. I started doing it a few weeks ago, and it's uncanny how easy it is to build something with natural language processing and voice and get good results.
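To back up the point about Alexa skills, here is a minimal sketch of a custom-skill handler written as an AWS Lambda function. The skill and intent names are hypothetical; the request and response shapes follow the Alexa Skills Kit format.

```python
# A minimal Alexa custom-skill handler sketch (hypothetical skill and intent).
def lambda_handler(event, context):
    request = event["request"]

    if request["type"] == "LaunchRequest":
        text = "Welcome. Ask me anything."
    elif (request["type"] == "IntentRequest"
          and request["intent"]["name"] == "HelloIntent"):
        text = "Hello from Wellington."
    else:
        text = "Goodbye."

    # Alexa turns this plain text into speech on the device.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```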
So, what do we do with this? We're building the machine learning and deep learning stack at AWS. It's far from perfect, but this is what it looks like today. On the hardware side, we have EC2 instances, including GPU instances, and now Elastic GPUs as well: you can attach a GPU to an instance for a couple of hours and then detach it, which is cost-effective. We also have FPGA instances and Greengrass for IoT. On the software side, you can run your favorite deep learning frameworks, MapReduce, Spark, and use notebooks to build and store models. You can also use newer services that simplify the process.
There's a trade-off between simplicity and control. Higher-level services are simpler but more boxed in, while lower-level services give you more control but require more expertise and more operational workload. GPU instances have been out for a while, in two families: G2 and P2. The largest one, the p2.16xlarge, has 16 NVIDIA K80 GPUs, about 40,000 CUDA cores in total, and costs about $14 per hour. Setting up the software, including CUDA and the deep learning libraries, can be tricky, so the Deep Learning AMI on the marketplace is a good starting point: it comes with all the necessary frameworks and dependencies pre-installed.
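For reference, here is a hedged sketch of launching a GPU instance from the Deep Learning AMI with boto3. The AMI ID and key pair name are placeholders you would look up for your region and the marketplace listing.

```python
# A hedged sketch of launching a P2 instance from the Deep Learning AMI.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder: Deep Learning AMI ID for your region
    InstanceType="p2.xlarge",    # one K80 GPU; use p2.16xlarge for 16 GPUs
    KeyName="my-keypair",        # placeholder key pair
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```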
MXNet is our deep learning framework of choice. We're supporting and contributing to it, and it's now part of the Apache incubator. Before we go to the demos, the hard questions are always the same: Should I build my own network? Should I use a pre-trained model? Should I use a high-level service? These are the same questions we asked about machine learning. For 99% of people, using a high-level service or pre-trained model is the best approach.
Let's look at a demo using a pre-trained model to recognize dogs and cats. I'm using a P2 instance and the Keras library. I have a small data set with a few hundred images. I'm using a pre-trained VGG16 model, which is off the shelf. I instantiate the model, get my images locally, and fine-tune it for my specific categories. The model can recognize over a thousand categories, but I only need dogs and cats. After fine-tuning, I can do predictions with just a few lines of code. The accuracy is low with a small data set, but it shows how easy it is to use a pre-trained model.
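The demo code isn't reproduced line by line in this transcript, but a hedged sketch of the approach described, loading a pre-trained VGG16 in Keras and fine-tuning a new two-class head, could look like this. The directory layout and hyperparameters are illustrative.

```python
# A hedged sketch of fine-tuning a pre-trained VGG16 for dogs vs. cats.
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Dense, Flatten
from keras.preprocessing.image import ImageDataGenerator

# Pre-trained convolutional base, without the 1000-class ImageNet classifier.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False              # keep the pre-trained features frozen

# New classifier head for our two categories only.
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
output = Dense(2, activation="softmax")(x)
model = Model(inputs=base.input, outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# A few hundred local images, organized as data/train/dogs and data/train/cats
# (illustrative paths).
train = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=16)
model.fit_generator(train, steps_per_epoch=train.samples // 16, epochs=5)
```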
Moving up the stack, let's look at our AI services. For image recognition, you can use Amazon Rekognition. You can try it in the console or use the API. I wrote a small script to process images stored in S3. For example, it can detect faces, gender, age range, and emotions. It works well, even with abstract or challenging images.
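Here is a minimal sketch of the kind of script described, calling Amazon Rekognition on an image stored in S3 with boto3. The bucket and key names are placeholders.

```python
# A minimal sketch of face detection with Amazon Rekognition on an S3 image.
import boto3

rekognition = boto3.client("rekognition")

response = rekognition.detect_faces(
    Image={"S3Object": {"Bucket": "my-demo-bucket", "Name": "photos/face.jpg"}},
    Attributes=["ALL"],      # include age range, gender, emotions, etc.
)

for face in response["FaceDetails"]:
    print(face["Gender"]["Value"],
          face["AgeRange"]["Low"], "-", face["AgeRange"]["High"],
          [e["Type"] for e in face["Emotions"] if e["Confidence"] > 50])
```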
Another service is Polly, which converts text to speech. You can add SSML tags to customize pronunciation. It's very fast and can be used for all kinds of applications, like having a build server read out messages when continuous integration breaks.
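Here is a minimal sketch of calling Polly from Python with a bit of SSML. The voice and the message are illustrative, not from the demo.

```python
# A minimal sketch of text-to-speech with Amazon Polly, using SSML.
import boto3

polly = boto3.client("polly")

response = polly.synthesize_speech(
    Text="<speak>The build is <emphasis>broken</emphasis> again.</speak>",
    TextType="ssml",
    VoiceId="Joanna",
    OutputFormat="mp3",
)

# Save the synthesized audio to a local file.
with open("alert.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```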
The last service I want to show is Lex, a chatbot service. I built a chatbot for a flower shop and integrated it with Facebook. You can interact with it naturally, using text or voice. For example, you can order tulips, specify the color, pick-up date, and time, and receive a confirmation SMS. The deep learning behind the bot makes the interaction feel natural and intelligent.
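Here is a hedged sketch of talking to a Lex bot over text with the runtime API. The bot name, alias, and user ID are placeholders standing in for the flower-shop demo bot.

```python
# A hedged sketch of sending text to a Lex bot and reading its reply.
import boto3

lex = boto3.client("lex-runtime")

response = lex.post_text(
    botName="FlowerShopBot",        # hypothetical bot name
    botAlias="prod",                # hypothetical alias
    userId="demo-user-42",
    inputText="I'd like to order some tulips",
)

print(response["message"])          # the bot's reply, e.g. asking for a color
print(response.get("slots"))        # slots Lex has filled in so far
```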
And then I have a Lambda function, because everyone needs a Lambda function, to do validation and fulfillment. So when I said "give me black tulips," the Lambda function got that message and said, "Sorry, black is impossible." Let's see how. Here, obviously, it's a demo, so I'm hard-coding flower types and colors, and you can see black is not among them. So I get feedback telling me, "Sorry, no black." But of course, you could use the SDK and go into one of your backends to check whether black flowers are available or not. If you were building a proper application, this would all be part of the code. The last thing I want to show you is how easy it is to send a message. Isn't this the most beautiful line of code ever? Python helps to make it beautiful, but this is what I love about some of our APIs: they're just so easy to use. This is the phone number, that's the message, I just publish, and that's it. It goes off, and you can spam the whole planet. Don't tell them I told you.
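Here is a hedged sketch of the validation and fulfillment logic described above: flower types and colors are hard-coded as in the demo, and the confirmation SMS is a single SNS publish call. All names and numbers are placeholders.

```python
# A hedged sketch of validation and fulfillment with hard-coded catalog data.
import boto3

VALID_FLOWERS = {"roses", "tulips", "lilies"}   # illustrative catalog
VALID_COLORS = {"red", "white", "yellow"}       # no black tulips here

sns = boto3.client("sns")

def validate_order(flower_type, color):
    """Return an error message if the order can't be fulfilled, else None."""
    if flower_type not in VALID_FLOWERS:
        return f"Sorry, we don't sell {flower_type}."
    if color not in VALID_COLORS:
        return f"Sorry, {color} is not available."
    return None

def send_confirmation(phone_number, message):
    # The "one beautiful line": publish a message straight to a phone number.
    sns.publish(PhoneNumber=phone_number, Message=message)

# Example: a rejected order, then a confirmation SMS (placeholder number).
print(validate_order("tulips", "black"))
send_confirmation("+64211234567", "Your tulips will be ready tomorrow at 10am.")
```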
In a nutshell, this is how Lex works. So, to sum things up: I don't think this is just another AI cycle like the two or three previous ones. I think it's for real. The planets have aligned, and you can go and build real things with it. Whether you want to do it with deep learning libraries, or keep using EMR with Spark and Spark ML, that's one way to go. Or you can now use those higher-level services if they fit your use cases: if you need image recognition, or you want chatbots and don't want to spend weeks and months building them, these new services may be good enough for you. They make it much easier to build and deploy AI applications, from image recognition to natural language processing and chatbots. Please try them; they might save you a lot of trouble. The future of AI is exciting, and we're just getting started.
That's the end. Thank you so much for listening to me. This is my email address: Julien from Arcee at Amazon dot com. That works too. And this is my Twitter handle. If you have questions later on, please feel free to reach out. I do travel a lot and don't always have Wi-Fi on the plane, but I try to answer everyone. So again, feel free to ask questions, and if you have questions now, I'll be happy to answer them. Thanks for inviting me. This is the meetup at the end of the world, so I should take your picture. You're all going into my global database of faces for recognition, which is my hidden agenda. Thanks for coming out just for this. Thank you.