People of ACM - Brandon Lucia
January 25, 2017
How did your interest in concurrent and parallel systems lead to research in intermittent, energy-harvesting computers?
My research focuses on the problems associated with making computers easy to program and debug, correct and reliable in their operation, and efficient in their use of resources. What interests me about this set of problem areas is that they do not affect only one part of the computer system. Solving these problems requires us to think across the layers of the system stack, about how the interplay of programming languages, system software and tools, and hardware architecture dictates a system’s programmability, reliability, and efficiency. I have taken this “across the layers” view in my work on concurrent and parallel systems. For example, in my lab we develop new software systems and architectural features that help define the next generation of parallel programming languages, making sure they are both reliable and efficient.
When we started studying energy-harvesting systems, we realized that they lacked some very important basic system support for programming, runtime reliability, debugging and profiling, and program compilation. We thought we could best fill in these missing pieces by thinking across the layers of the system stack. When we dug in and started working with intermittent systems, we found many abstract similarities to problems in concurrency and parallelism. As with concurrency, intermittent systems lacked a connection between the programming language and the memory model of the underlying device’s runtime software and hardware. We developed Chain, our new programming language, along with new runtime-system support, to bridge this gap, giving programmers a simpler way to use an intermittent computer’s memory and better guarantees about the behavior of their applications. Another similarity between intermittence and concurrency is that both create new categories of software bugs that are difficult to find and can lead to costly failures. Informed by our previous experience taking on concurrency debugging, we developed the hardware and software of the first debugger expressly designed for intermittent systems, filling another important gap and enabling programmers to avoid costly failures by writing reliable, bug-free code.
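To make this new category of bugs concrete, here is a minimal C sketch, not drawn from Chain or its debugger, of how intermittent execution can corrupt program state. It assumes a hypothetical checkpointing runtime with a checkpoint() primitive that snapshots registers and the stack, and a linker section .nv that places variables in non-volatile memory such as FRAM; bounds checks are omitted.

```c
/* Hypothetical sketch of an intermittence bug; names and primitives are
 * illustrative assumptions, not an actual intermittent-computing API. */
#include <stdint.h>

#define NV __attribute__((section(".nv")))  /* assumption: .nv mapped to FRAM */

NV static uint16_t buf[64];   /* survives power failures */
NV static uint16_t len = 0;   /* survives power failures */

extern void checkpoint(void); /* assumed primitive: snapshot registers/stack */

void log_sample(uint16_t sample) {
    checkpoint();
    len = len + 1;            /* non-volatile write #1 (read-then-write) */
    buf[len] = sample;        /* non-volatile write #2                   */
    /* If power fails after write #1 but before the next checkpoint, len
     * and buf keep their new values, yet execution resumes from the
     * checkpoint above and increments len AGAIN.  Depending on where the
     * failure hits, the sample is logged twice or len points at a slot
     * that was never written.  On continuous power this code is correct,
     * which is what makes such bugs so hard to find and reproduce.      */
    checkpoint();
}
```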
Energy harvesting is not a new area of research, but there have been significant advances in recent years. What new hardware and software innovations are responsible for making this technology more prevalent?
The current surge of interest in intermittent, energy-harvesting computer systems is motivated from the top down by applications and from the bottom up by improving technologies. Recently we have seen major developments in the Internet of Things (IoT), from smart homes, to wearables, to environmental and infrastructure sensors, to implantable and ingestible medical devices, and even to tiny satellites that we can send cheaply into Earth’s orbit. The desire (and in some cases the need) for perpetual, battery-free, maintenance-free operation has created a real demand for reliable energy-harvesting computer systems, and the market for these devices grows larger every year.
From the bottom up, technology has been improving to match this demand. Commercially available, ultra-low-power microcontrollers have dropped their power consumption to such a level that intermittent operation entirely using harvested energy has become viable. We’re seeing cheap, tiny solar panels, commercially available radio wave harvesters, vibration harvesters, and interaction-powered user interfaces. We’ve seen researchers developing amazing, general-purpose research platforms, like the Wireless Identification and Sensing Platform (WISP), bringing together low-level components to make an enabling platform for energy-harvesting systems research. A few big technology companies are even beginning to produce microcontrollers with basic hardware features supporting energy harvesting.
The growing collection of exciting applications and the ability to rely on increasingly mature, low-level technologies have motivated and positioned us to develop the system support that will bring intermittent computing from an ad hoc niche to a reliable, mainstream computing domain.
In early 2017, Chain, your new programming language, will be deployed on a postage-stamp-sized satellite that will circle Earth in low Earth orbit. Can you briefly tell us how Chain works?
Chain makes intermittent software reliable by making sure that programs make progress and that all of their state remains consistent. Chain allows programmers to build their applications up as a collection of computational tasks. Each task performs some basic operations, like reading a sensor and archiving the reading, compressing some data, coding a message to be sent over a radio, or sending a radio message. We built Chain so that tasks are “atomic,” which means that a task is “all-or-nothing” and never commits only a part of its effects to the system. Chain lets the programmer send a value computed in one task to be used in another, later task through a language abstraction called a channel. Chain and its channels keep all of a task’s input values separate from all of its output values and ensure that a task’s input values are stored in “non-volatile” memory that retains its contents when the computer’s power turns off. Separating a task’s inputs and outputs means the task can repeatedly try to execute to completion, as energy permits; the task always knows where to look for its inputs and will never overwrite them. This technique, which we call “static multi-versioning,” is the heart of Chain’s channel abstraction. Non-volatile channels are the main reason why Chain guarantees that applications are correct, despite operating on devices powered by radio waves that might reset 10 or 100 times every second, losing the contents of the system’s registers and volatile memory.
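To give a rough feel for how tasks and channels fit together, here is a simplified C sketch of the model described above; the names, layout, and scheduler are illustrative assumptions, not the actual Chain API. Each channel is a slot in non-volatile memory written only by its sending task, each task reads only its incoming channels, and the scheduler advances to the next task only after the current one runs to completion.

```c
/* Simplified sketch of the task/channel idea (hypothetical names, not the
 * actual Chain API).  Channels and the current-task marker live in
 * non-volatile memory; a task writes only to its outgoing channels, so its
 * inputs are never clobbered ("static multi-versioning"). */
#include <stdint.h>

#define NV __attribute__((section(".nv")))    /* assumption: .nv -> FRAM */

enum task_id { TASK_SAMPLE, TASK_COMPRESS, TASK_SEND };

typedef struct { uint16_t value; } chan_u16;  /* a one-value channel */

NV static chan_u16 ch_sample_to_compress;     /* sample   -> compress */
NV static chan_u16 ch_compress_to_send;       /* compress -> send     */
NV static enum task_id cur_task = TASK_SAMPLE;

extern uint16_t read_sensor(void);
extern uint16_t compress_reading(uint16_t raw);
extern void     radio_send(uint16_t payload);

/* Each task reads only incoming channels and writes only outgoing ones. */
static void task_sample(void) {
    ch_sample_to_compress.value = read_sensor();
}
static void task_compress(void) {
    ch_compress_to_send.value = compress_reading(ch_sample_to_compress.value);
}
static void task_send(void) {
    radio_send(ch_compress_to_send.value);
}

/* Advancing cur_task acts as the commit point: the downstream task runs
 * only after the upstream task has finished, so if power fails mid-task,
 * the device reboots and re-runs the same task from its unmodified inputs. */
void scheduler_step(void) {
    switch (cur_task) {
    case TASK_SAMPLE:   task_sample();   cur_task = TASK_COMPRESS; break;
    case TASK_COMPRESS: task_compress(); cur_task = TASK_SEND;     break;
    case TASK_SEND:     task_send();     cur_task = TASK_SAMPLE;   break;
    }
}
```

Because task_compress never overwrites ch_sample_to_compress, a power failure in the middle of that task simply causes the whole task to run again from inputs that are still intact.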
Our field deployment of Chain will put this correctness guarantee to the test. We are working with KickSat, a nano-satellite startup, and with a collaborator from Disney Research Pittsburgh to put two devices into low Earth orbit. The Chain code deployed on these devices aggregates sensor readings of the temperature in space and Earth’s magnetic field. Aggregation requires basic signal processing and compression, which are difficult to write for an intermittent device without Chain, because single-bit flips can cause unrecoverable (i.e., undecompressible) results. To send results to Earth, the Chain code will encode data summaries for radio transmission, and that encoding needs to be correct to maximize the likelihood of getting results back on Earth. With the application written in Chain, we integrated custom hardware and software to profile the system’s behavior in orbit. Beyond the fact that sending things to space is cool, we are excited to see the scientific results of our deployment. Our satellite will send back invaluable reliability and energy profiles that uniquely characterize our Chain application in its actual orbital environment.
Aside from work in intermittent computing/ambient energy harvesting, what is another area at the intersection of computer architecture and programming languages that will yield major advances in coming years?
In the next five to 10 years, I see a lot of potential in new, dense, non-volatile memory technologies tightly integrated with heterogeneous computing components using 3D-stacked fabrication. This cross-cutting technology change will be a disruption because it affects the capacity and capability, latency and bandwidth, energy and power, and manufacturability of computing and memory components. This disruption could yield orders-of-magnitude improvements in system power and application performance, but only if we learn how to put these new pieces together efficiently without creating a system so complex that it cannot be programmed. Realizing the potential of non-volatile, heterogeneous, 3D-integrated systems requires us to rethink computer architectures, system software, and programming languages together.
In the longer term, I am excited about the idea of designing machines that embrace completely alternative computing technology. Biological computing and data storage are coming into their own, but with only the most basic programming interfaces and execution models with which to reason about a system’s behavior. A biological embedding, in DNA or protein networks, of today’s most sophisticated deep learning models yields a level of complexity that is beyond our current ability to reason about. Today it is difficult or impossible to precisely interpret any sub-computation in a deep neural network, and biological computations are stochastic and difficult to understand, even probabilistically. One compelling future research problem is to define the programming abstractions, system architectures, and behavioral specification techniques that will enable future biological programmers to direct such stochastic, biological systems to carry out complex computations.
Brandon Lucia is an Assistant Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research focuses on the boundaries between computer architecture, computer systems, and programming languages. He is particularly interested in intermittent, energy-harvesting computers and in concurrent and parallel computer systems. Lucia’s research group works to develop the basic programming, runtime system, and debugging support that makes computer systems easier to program, more efficient and reliable, and less susceptible to costly errors. Recently, at ACM SPLASH 2016, Lucia’s research group presented a new programming language called Chain, the first language designed to ensure that energy-harvesting computers operate reliably, even with extremely frequent interruptions of the solar or radio wave energy that they harvest. Chain addresses a key challenge of energy-harvesting computers: software executes intermittently, only when energy is available, and Chain ensures that it nonetheless executes correctly. Without system support like Chain and the other efforts of Lucia’s group, intermittently operating software is unpredictable and applications are unreliable.
Lucia received the 2015 Bell Labs Prize for foundational work on intermittent computing; a Distinguished Paper Award and a Distinguished Artifact Award at SPLASH/OOPSLA 2015; and a prestigious Summer 2015 Google Research Award.