People of ACM - Fei-Fei Li
May 30, 2017
Why is this an exciting time to be working in the areas of deep neural networks and artificial intelligence?
AI as a field is more than 60 years old. It started with a lofty goal of building intelligent machines. Since then, researchers in mostly academic labs and institutions have worked to lay the foundation of the AI field—the problem formulation, the evaluation metrics, the algorithms, and the important subfields that serve as pillars of AI (e.g., robotics, computer vision, natural language processing, and machine learning). So I call this period “AI in vitro.” What’s exciting is that around 2010, our field entered a new phase that I call “AI in vivo.” We’ve now entered an age in which AI applications are changing the way computing is done in real-world scenarios, from transportation, to image processing, to healthcare, and more. Because of advances in algorithms (such as neural network-based deep learning methods), computing (such as Moore’s Law, Graphics Processing Units (GPUs), and soon Tensor Processing Units (TPUs)), and the availability of data (such as ImageNet), AI applications are making a real difference. In fact, this is just the beginning. I see AI as the most important driving force of the Fourth Industrial Revolution and expect it to change industry as we know it. This is what makes this field exciting now.
You have said that computer vision—how computers identify and understand images—will be the cornerstone technology of AI advances going forward. Can you explain why computer vision is so central?
Let’s borrow a metaphor from nature. About 540 million years ago, the first pair of eyes in the animal world became the biggest driving force of evolution’s Big Bang—the Cambrian Explosion, during which the number of animal species exploded. Vision was the main factor in changing the behaviors of animals and accelerating the development of brains. Humans, the most intelligent animals we know, use vision as the main sensory system for survival, navigation, manipulation, communication and everything else. In fact, it is estimated that more than half of the brain is involved in visual processing and visual intelligence. Now let’s look at our digital world. Due to the explosion of sensors, smartphones, self-driving cars, etc., the predominant form of data in cyberspace is pixels. Cisco has estimated that more than 80% of the data on the Internet is in some kind of pixel form. YouTube alone sees hundreds of hours of video uploaded to its servers every 60 seconds! So just like in the animal world, vision is going to be a major driving force of the evolution of machine intelligence. It is a universal application across every single vertical industry, and it is one of the most important capabilities an intelligent machine should have, whether it is a self-driving car, a healthcare diagnostic system, a smart classroom, or the future of manufacturing.
In a recent project at Stanford, you and your students developed technology in which computers generate sentences to describe the images they “see.” What key insights/advances made this project possible?
One of the most fascinating abilities of humans is to tell a story when seeing a visual scene. We all know the saying “a picture is worth a thousand words.” So since the beginning of my career as a computer vision scientist, I’ve been working on this problem of telling stories about pictures (and videos). The recent work on image captioning (and, later, paragraph generation upon seeing a picture) is, again, an illustration of the power of deep learning methods. Given enough training with pairs of pictures and captions, our algorithm was able to learn the pairing of words or word phrases with visual content, and then generate human-like sentences. This is an incredible result given that nature took hundreds of millions of years to create only one animal capable of doing so (i.e., ourselves). Yet computers came to have that ability in just 50 or 60 years! And what’s remarkable is that today hundreds of millions of people are using this technology in the Google Photos app, which allows you to search for specific things, like “beach” or “sky,” and pulls up all your photos containing them. Google Photos is a great example of how a company like Google has been able to develop products building on this specific research project.
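To make the “pairs of pictures and captions” idea concrete, here is a minimal sketch of the general encoder-decoder approach to image captioning that this passage describes: a convolutional network embeds the image, and a recurrent network trained on paired images and captions generates words one at a time. This is an illustrative toy model written in PyTorch, not the Stanford group’s actual architecture; all layer sizes, names, and the choice of backbone are assumptions.

```python
# Sketch of a CNN-encoder / RNN-decoder captioning model (assumed architecture,
# for illustration only; not the model from the Stanford project).
import torch
import torch.nn as nn
import torchvision.models as models


class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN image encoder with the final classifier layer removed.
        cnn = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])
        self.img_proj = nn.Linear(cnn.fc.in_features, embed_dim)
        # Word embedding + LSTM decoder that emits a distribution over the vocabulary.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) integer word indices.
        feats = self.encoder(images).flatten(1)        # (B, cnn_features)
        img_embed = self.img_proj(feats).unsqueeze(1)  # (B, 1, embed_dim)
        word_embeds = self.embed(captions)             # (B, T, embed_dim)
        # Condition the decoder by prepending the image embedding to the word sequence.
        inputs = torch.cat([img_embed, word_embeds], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                        # (B, T+1, vocab_size)


# Training minimizes cross-entropy between the predicted next word and the
# caption that was paired with each image (dummy data shown here).
model = CaptionModel(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
logits = model(images, captions)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 10000), captions.reshape(-1))
```

At generation time the same decoder is run one word at a time, feeding each predicted word back in until an end-of-sentence token is produced, which is how such a model turns a picture into a human-like sentence.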
Among the many industries that will be impacted by advances in AI, you have singled out healthcare as the area that will undergo the most sweeping transformations. What are some interesting applications of AI technologies to healthcare that we may see 20 years from now?
As I said before, AI is changing, and will be changing, every single vertical industry. It’s a new way of computing with information and data, which is at the essence of every single business. I’m very excited by AI’s potential in healthcare because I believe in the democratization of AI for everyone, not just the privileged few. In healthcare, I’m particularly excited by two areas of application. One is AI-assisted diagnosis. This is the bedrock of healthcare. Many of our doctors are overwhelmed by large amounts of data and information, yet many regions lack adequate access to trained clinicians who can make critical diagnoses for patients and decide what form of treatment they need. Moreover, it is well known that the earlier we detect a medical problem, the better the prognosis is for our patients. We’ve already seen how AI can better predict diabetic eye disease and help pathologists detect breast cancer. AI-powered medical diagnostics can become a tireless assistant that helps doctors do their work, from early preventive diagnostics, to triaging, to precision medicine. Whether it’s pathology or radiology, much of the data in healthcare is in pixel form. So computer vision can play a critical role in advancing solutions.
Another area I’m extremely excited about is improving healthcare quality and reducing cost by improving workflow. From primary care clinics, to emergency departments, to operating rooms, to ICUs, to home care, every part of patient care and healthcare management can benefit from AI-assisted workflow. Here is a small example. Every year, around $30 billion is spent in the US to treat hospital-acquired infections, a prevalent problem mostly caused by poor hand hygiene practices among clinicians. At Stanford, students in my lab are collaborating with colleagues at the medical school and the Lucile Packard Children’s Hospital on this problem. We installed inexpensive depth sensors (which are privacy-preserving) in a hospital inpatient unit to help track and monitor hand hygiene practice. This is the first time a smart sensing system has been deployed to tackle this problem. Compared to the traditional method of deploying human monitors (called “secret shoppers”), the AI system is continuous, cheap, unbiased, and more accurate. This is just one example of AI’s use in workflow. Similar technology can help clinicians and doctors with medical transcription, operating room protocol monitoring, emergency department monitoring, and much more.
Fei-Fei Li is an Associate Professor of Computer Science at Stanford University. She is also the Director of the Stanford Artificial Intelligence Laboratory (SAIL) and the Stanford Vision Lab, where she works with students and colleagues to build algorithms that enable computers and robots to see and think. Cognitive neuroscience has been one of her primary research focus areas, and she has applied insights on how the human brain functions to her work in artificial intelligence.
Li has published more than 150 articles in scholarly journals and conference proceedings. She has been recognized with the J.K. Aggarwal Prize from IAPR (2016), the IBM Faculty Fellow Award (2014), the Alfred P. Sloan Faculty Award (2011), and the National Science Foundation (NSF) CAREER Award (2009). She is a TED speaker (2015) and a Great Immigrants of America honoree of the Carnegie Foundation (2016). At ACM’s 50 Years of the Turing Award Celebration in June, Li will participate in the panel “Advances in Deep Neural Networks.”