People of ACM - Andrew A. Chien
July 11, 2017
At CERES, you are working to create a partnership between leading companies and university faculty and students. Why do you think collaboration between academia and industry is important?
The computing community is special in its unusually deep ties and crossflow between academia and industry. Researchers and leaders move back and forth (as I have), reflecting mutual respect and powerful synergies between academic insights and practical challenges of exploitation and scale. I have come to believe that many of the dramatic advances in computing arise from this fertile mixing.
Of course, this insight is not original; I was fortunate to be named an NSF Young Investigator in 1994. This program was launched by the visionary Erich Bloch, National Science Foundation Director from 1984 to 1990, who instituted the requirement that a portion of the NSF funds must be matched by industry. His goal, based on his experience at IBM Research, was to connect leading young scientists to problems of industrial importance. Simply put, that experience changed the direction of my career, and I've worked at the boundary of academic and industrial efforts ever since!
A major goal of CERES is to eliminate the possibility of computing systems failing without warning or cutting out intermittently. Why is achieving better resiliency difficult in the current landscape?
We started CERES with the focus on “Unstoppable” because computing has insinuated itself into the most critical parts of the economy (e-commerce, banking, logistics, etc.) and society (voting, government, law, personal relationships, etc.), yet computer systems fail all of the time! Everyone has experienced a phone crash, a network outage, and even a cloud service failure.
Achieving resiliency is difficult for two basic reasons. First, in computing, resiliency is not an integral property or goal of our system specifications and designs, and it is not systematically evaluated and improved. Fundamental advances in our models and tools are needed to enable systematic resilience modeling and engineering. Second, our industry is structured so that today, in most settings, resilience doesn’t pay; features and functions do. We need intellectual advances to address the first reason and, for the second, maturing customer expectations and understanding that lead to resilience becoming an agreed, measurable, and valued property.
One of your interests has been programming models and tools for post-Moore’s Law computing substrates. What do you predict will be some of the major characteristics of post-Moore’s Law hardware and software?
We are seeing the rise of accelerated computing: increasingly, software must be customized to efficient hardware structures. That’s because parallelism, data locality, and well-matched special operations must be artfully combined to achieve the best performance. Such customization is a dramatic departure from dominant trends in software practice over the past 40 years. Combined with software of extraordinary complexity, it poses a formidable set of challenges for software architecture, programming models, and programming tools.
On mobile and embedded platforms (e.g., smartphones), accelerators are encapsulated into very large components or libraries, and it’s already true that most of the computing operations happen in accelerators (not the general-purpose CPUs on those systems-on-chip). This incurs significant reprogramming costs for each new generation of accelerators. Will mainstream cloud software, and even laptop software, take this route? New programming tools and perhaps models are needed that enable both portable programming and flexible, efficient implementation for this new and rapidly changing world of “deep heterogeneity.” Ideally, software should be portable to a broad variety of accelerators (types, generations), yet achieve close-to-optimal performance. Meeting this challenge requires significant intellectual advances in both compilation and runtime techniques.
You’ve recently begun to work on renewable energy and cloud computing. What are the opportunities and challenges there?
When I began talking with colleagues at Argonne, I learned that computing has become a major consumer of electrical power, approaching 7 percent for ICT overall with 2 to 3 percent for data centers. So, in a substantial way, we have become part of the “energy problem” that is causing climate change. However, driving a power grid with a high percentage of volatile renewables (i.e., solar and wind beyond 25 percent) is not only expensive, but presents major and growing challenges for stability, reliability, and excess capacity requirements. Surprisingly, adding renewables to a data center can make this situation worse! The problem manifests as “uneconomic power,” which is sold to the grid at a negative price; so yes, the generators pay the grid to take the power. As computer scientists, we are trying to figure out if data centers can be managed to help the power grid solve this problem, by making computing’s power demands responsive to the power grid’s instantaneous constraints, and thereby support a grid with high-fraction renewable generation and commensurately lower carbon emissions.
Andrew A. Chien is the Editor-in-Chief of Communications of the ACM (read his inaugural Editor’s Letter). He is the William Eckhardt Distinguished Service Professor of Computer Science and the Director of the CERES Center for Unstoppable Computing at the University of Chicago. Chien is also a Senior Computer Scientist at the Argonne National Laboratory.
His research interests include cloud systems, architecture, high performance computing, mobile computing, and recently renewable energy. Chien has founded two industry-funded research centers (Center for Networked Systems at the University of California, San Diego and CERES), and served as Intel Corporation’s Vice President of Research from 2005 to 2010. He has received numerous awards for research papers and research excellence, including an NSF Young Investigator award. Chien is a Fellow of ACM, IEEE, and AAAS.