People of ACM - Sanjay Ghemawat

October 17, 2013

Sanjay Ghemawat is a Google Fellow in the Systems Infrastructure Group, where he has worked since late 1999 on distributed systems, performance tools, indexing systems, compression schemes, memory management, data representation languages, RPC systems, and other systems infrastructure projects. A former researcher at Digital Equipment Corporation's Systems Research Center, he was elected to the National Academy of Engineering in 2009. He is a co-recipient (with Google Fellow Jeff Dean) of the 2012 ACM-Infosys Foundation Award in the Computing Sciences. Ghemawat earned a B.S. degree from Cornell University and M.S. and Ph.D. degrees in Computer Science from the Massachusetts Institute of Technology.

In your essay (with Jeff Dean) for Beautiful Code: Leading Programmers Explain How They Think, what was the "beautiful code" that you described in your case study on Distributed Programming with MapReduce, and what did it reveal about how you found solutions to software development problems?

The MapReduce chapter in Beautiful Code was more a description of a system than actual code. The description focused on the main problem we were trying to solve (processing a large amount of data quickly in the presence of failures). It described how a simple framework that could be efficiently implemented on a large cluster of machines could be powerful enough to solve a large variety of problems. The solution was motivated by practical issues we had been running into when trying to solve such problems at Google.
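As a rough illustration of the kind of simple framework described above, here is a minimal single-process sketch of the map/group/reduce programming model in Python. It is not the code from the chapter or Google's implementation: the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical, and the sketch leaves out everything that makes the real system interesting at scale (distribution across a cluster, scheduling, and recovery from machine failures).

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Toy, in-memory version of the map -> group-by-key -> reduce pattern."""
    # Map phase: each input record may emit any number of (key, value) pairs.
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            # Grouping ("shuffle") step: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: combine the values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    # Sum the per-occurrence counts for one word.
    return sum(counts)


lines = ["the quick fox", "the lazy dog", "The end"]
counts = map_reduce(lines, word_count_mapper, word_count_reducer)
print(counts["the"])  # 3
```

The user supplies only the two small functions; in a real deployment the framework would be responsible for running them in parallel and re-running work lost to failures.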

From your vantage point of providing practical access to large computing, what do you see as the next big thing for the age of high-volume, high-velocity, and/or high-variety information assets that require new forms of processing?

Speech processing and computer vision, along with the machine learning used to solve those tasks, will provide the next challenge for big data. Large training sets are available, but they require a lot of processing to yield meaningful results. I expect that there will be big gains in these areas as more and more processing power is applied to these problems.

What factors led you to collaborate on creating the platform that supports Google's revolutionary software infrastructure, which has contributed so much to its success?

The main motivation behind the development of much of Google's infrastructure was the challenge of keeping up with ever-growing data sets. For example, at the same time Google's web search was gaining usage very quickly, we were also scaling up the size of our index dramatically and rebuilding it more often. This implied that we had to be able to process a larger amount of data efficiently in a smaller amount of time. This need directly led to the development of many of our infrastructure systems.

What mentors did you rely on throughout your career to achieve the breakthroughs that have made such a difference in people's lives?

Many people have had a huge influence on my career over the years. To point out a few: my uncle Ashok Mehta, who got me interested in engineering when I was growing up; my grad school advisor, Professor Barbara Liskov; and my frequent colleague Jeff Dean.

What advice would you give to budding technologists in the era of big data who are considering careers in computing?

I would suggest a few different things:

Learn by doing; build systems.
Read about existing systems, in particular papers from systems and database conferences.
Practice back-of-the-envelope calculations; some simple modeling can be an effective way to pick between different system designs.
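
To make the last suggestion concrete, here is a small sketch of a back-of-the-envelope calculation in Python. Every number in it is an illustrative assumption rather than a figure from the interview: a 100 TB data set, roughly 100 MB/s of sequential read throughput per machine, and perfectly parallel scanning with no coordination overhead. The helper `scan_time_seconds` is hypothetical.

```python
TB = 10**12  # bytes in a terabyte (decimal)
MB = 10**6   # bytes in a megabyte (decimal)


def scan_time_seconds(data_bytes, machines, bytes_per_sec_per_machine):
    """Estimate the time to read a data set once, assuming ideal parallelism."""
    return data_bytes / (machines * bytes_per_sec_per_machine)


data_size = 100 * TB        # assumed data set size
disk_throughput = 100 * MB  # assumed per-machine sequential read rate

single = scan_time_seconds(data_size, 1, disk_throughput)
cluster = scan_time_seconds(data_size, 1000, disk_throughput)

print(f"1 machine:     {single / 86400:.1f} days")
print(f"1000 machines: {cluster / 60:.1f} minutes")
```

Under these assumptions, one machine needs about a million seconds (more than eleven days) just to read the data, while a thousand machines need about a thousand seconds. A rough model like this is often enough to rule out a design before building anything.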