People of ACM - Yong Rui
March 1, 2018
How did your career path lead you to specialize in multimedia computing?
While doing my BS and MS degrees, I studied control theory and large-scale system optimization, which later contributed considerably to my research in multimedia areas such as relevance feedback, neural networks and deep learning.
When I was a doctoral student at the University of Illinois at Urbana-Champaign, I began to conduct multimedia analysis and retrieval research. It was an era when the internet was in its infancy. The web browser was just emerging, and search engines were not yet born. The idea of automatically searching for images was quite avant-garde at the time.
A great opportunity came my way. The National Science Foundation was working on a digital libraries project, and I was lucky to be involved in it. Thanks to that project, and through extensive interdisciplinary research integrating control theory, information retrieval and computer vision, I became one of the first researchers to introduce relevance feedback into image search, creating a new paradigm for searching images. Relevance feedback improves search results by using users' responses to earlier results, such as which results they mark as relevant, to refine the query.
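To make the idea of relevance feedback concrete, here is a minimal sketch using Rocchio's algorithm, a classic textbook formulation from text retrieval. It is not necessarily the exact method used in Rui's image-search work. The query and each image are represented as feature vectors. After the user marks some results as relevant or not relevant, the query moves toward the centroid of the relevant examples and away from the centroid of the non-relevant ones. The weights alpha, beta and gamma are common illustrative defaults, and the two-dimensional "color/texture" features and image names are hypothetical.

```python
from math import sqrt


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    dim = len(query)

    def centroid(vectors):
        # An empty feedback set contributes nothing to the update.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i] for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, collection):
    """Return image ids sorted by decreasing similarity to the query."""
    return sorted(collection, key=lambda k: cosine(query, collection[k]), reverse=True)


# Hypothetical 2-D image features: [color score, texture score].
collection = {"beach": [0.9, 0.1], "forest": [0.2, 0.9], "sunset": [0.8, 0.3]}
query = [0.5, 0.5]
print(rank(query, collection))  # ['sunset', 'forest', 'beach']

# The user marks "beach" as relevant and "forest" as not relevant.
refined = rocchio(query, relevant=[collection["beach"]], nonrelevant=[collection["forest"]])
print(rank(refined, collection))  # ['sunset', 'beach', 'forest']
```

After one round of feedback, the image the user liked rises from last to second place, and the rejected one drops to the bottom. Repeating this loop is what lets the system converge on what the user actually means, which is hard to express with a single initial query.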
After I received my PhD, I began an 18-year career at Microsoft, where I continued to focus on research areas such as multimedia analysis, understanding and retrieval; machine learning; computer vision; and pattern recognition.
Now, as Lenovo’s CTO and leader of Lenovo Research, one of Lenovo’s most important innovation and R&D engines, I will continue working to advance multimedia computing and to build state-of-the-art multimedia technology into Lenovo’s products and services.
Given the volume of multimedia that is being recorded and made available every day, what is an emerging application of multimedia analysis and retrieval that could yield important benefits to society in the coming years?
Technologically, AI algorithms, most notably deep learning, have bolstered and will continue to bolster multimedia research. In particular, deep learning has led to multimodal algorithm frameworks that enable the effective fusion, use and retrieval of cross-domain multimedia data.
Take image and video captioning, for example. A couple of years ago, tagging was the only way to describe images and videos. Now, deep learning has helped connect computer vision with natural language processing (NLP), so tags are being replaced by coherent, automatically generated natural-language sentences.
As related fields and hardware continue to develop, the generation of multiple sentences or even entire paragraphs could become a reality for image and video captioning. Moreover, more natural user interaction systems could become available, and modalities might not be limited to computer vision and NLP. Instead, voice, depth, and text features could be further incorporated into the deep learning pipeline.
Smartphones, of which Lenovo is a major world producer, are one of the main ways people consume multimedia content today. Based on current research and product development, how will smartphones be different in the future?
From a technical point of view, technologies such as artificial intelligence (AI), virtual and augmented reality (VR/AR), 5G, instantaneous translation/interpretation, advanced batteries and holographic technology will profoundly transform smartphones and the user experience.
Specifically, smartphones will be equipped with bezel-free infinity screens, neural processing units and more sensors, including biometric sensors, depth cameras and multiple cameras, along with improved computer vision. Moreover, 5G will bring roughly 10x the bandwidth and near-zero latency.
Smartphones’ form factors will likely change considerably. One of the possibilities is the bendable phone. Lenovo Research developed the industry’s first workable bendable phone prototypes in 2016: CPlus and Folio. CPlus can switch modes between a cell phone and a wristwatch, while Folio can change from a tablet into a phone and vice versa.
Lenovo appears to be expanding into the AR/VR area, with both the Lenovo VR Classroom and the Disney/Lenovo Star Wars: Jedi Challenge. AR/VR technologies have been around for decades; why do you believe they are now poised to go mainstream?
Yes, AR/VR technologies have been around for decades, but technical breakthroughs in the past few years, in areas such as optical lenses, computer vision and SLAM (simultaneous localization and mapping), have considerably accelerated the development of AR/VR and begun to unlock their huge potential. At the same time, VR/AR can help people solve many difficult problems, or entertain them in a brand-new way.
Personally, I think AR will probably be a bigger and more promising platform than VR in the future. In particular, AR has huge potential when it is integrated with vertical markets such as education, training and industrial maintenance. For instance, at Lenovo Tech World 2017, we showcased how an engineer fixed a broken aircraft engine using daystAR, an AR glasses prototype developed by Lenovo Research, together with our AR platform, demonstrating the promise of AR technology in vertical sectors.
Yong Rui is the Chief Technology Officer and Senior Vice President of Lenovo Group, a Chinese multinational technology company. His role includes setting the technical strategy for Lenovo Group and leading Lenovo Research, which is focused on areas of research like intelligent devices, artificial intelligence, cloud computing, 5G and smart lifestyle technologies.
Rui is a recipient of many awards, including the 2017 IEEE SMC Society Andrew P. Sage Best Transactions Paper Award, the 2017 ACM TOMM Nicolas Georganas Best Paper Award, the 2016 IEEE Computer Society Technical Achievement Award, and the 2010 Most Cited Paper of the Decade Award from the Journal of Visual Communication and Image Representation. Rui holds 65 issued US and international patents. He was recently named an ACM Fellow for his contributions to image, video and multimedia analysis, understanding and retrieval.