People of ACM - Xin Luna Dong

December 3, 2024

Many of us use knowledge graphs on a regular basis in applications such as search engines (such as Google or Bing) or when we ask questions of a virtual assistant (such as Amazon Alexa or Apple Siri). Will you give us a general definition of what knowledge graphs are and how they make these applications possible?

A knowledge graph (KG) is a graph where each node represents a real-world entity and each edge represents a relation between entities. Knowledge in a KG is considered semi-structured data. On the one hand, it enjoys the clean semantics of structured data, powered by the rigidity of a schema called an ontology. On the other hand, it embraces the flexibility of unstructured data by allowing new classes and relationships to be added easily. The graph form best mimics how human beings view the world: entities and the connections between them. It also allows the KG to seamlessly connect a large number of domains through common entities across these domains, such as people who are both musical artists and movie stars.
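To make the structure concrete, here is a minimal sketch (not from the interview) of a KG stored as (subject, relation, object) triples; the entity and relation names are purely illustrative:

```python
# A tiny, hypothetical knowledge graph: nodes are real-world entities,
# edges are (subject, relation, object) triples.
triples = [
    ("Lady Gaga", "instance_of", "MusicalArtist"),
    ("Lady Gaga", "instance_of", "Actor"),
    ("Lady Gaga", "acted_in", "A Star Is Born"),
    ("A Star Is Born", "instance_of", "Film"),
]

def neighbors(entity):
    """Return all (relation, object) edges leaving an entity."""
    return [(r, o) for s, r, o in triples if s == entity]

# A single node connects the music and film domains, as described above.
print(neighbors("Lady Gaga"))
# [('instance_of', 'MusicalArtist'), ('instance_of', 'Actor'), ('acted_in', 'A Star Is Born')]
```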

These advantages give KGs a unique position in supporting real-world applications. The ontology and structure make KGs understandable to machines, so semantic understanding can be facilitated in search, question answering, dialogs, and recommendation. For example, given the search query “k-cups dunkin donuts dark”, the search engine can figure out that “k-cups” is a product type, “dunkin donuts” is a brand, and “dark” is a roast type. The search engine will therefore refrain from returning coffee from the “Donut Shop” brand or medium-roasted coffee. The graph structure also makes KGs easily understandable to human beings, and thus convenient for displaying information, such as attribute-value pairs for understanding or paths in the graph for explanations.
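As a hedged illustration of this query-understanding step (a sketch, not Meta's or any search engine's actual implementation), query spans could be matched against KG attribute-value pairs to turn the free-text query into structured constraints:

```python
# Hypothetical sketch: interpret "k-cups dunkin donuts dark" using a tiny
# product KG that maps text spans to (attribute, value) pairs.
kg_attributes = {
    "k-cups":        ("product_type", "K-Cup"),
    "dunkin donuts": ("brand", "Dunkin' Donuts"),
    "donut shop":    ("brand", "Donut Shop"),
    "dark":          ("roast", "Dark"),
    "medium":        ("roast", "Medium"),
}

def interpret(query):
    """Greedily match query spans to KG attributes, longest span first."""
    constraints, text = {}, query.lower()
    for span in sorted(kg_attributes, key=len, reverse=True):
        if span in text:
            attribute, value = kg_attributes[span]
            constraints[attribute] = value
            text = text.replace(span, " ")
    return constraints

print(interpret("k-cups dunkin donuts dark"))
# {'brand': "Dunkin' Donuts", 'product_type': 'K-Cup', 'roast': 'Dark'}
# With these constraints, results from the "Donut Shop" brand or with a
# "Medium" roast can be filtered out.
```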

Your most downloaded paper in the ACM DL is “Towards Next-Generation Intelligent Assistants Leveraging LLM Techniques.” What are large language models (LLMs) and why do they complement rather than compete with knowledge graphs to improve intelligent assistants?

Large language models (LLMs) are advanced machine learning models that can understand and generate text in order to complete a wide range of NLP tasks such as translation, summarization, and question answering. Recently, LLMs have advanced well beyond basic NLP tasks, gaining capabilities in reasoning, tool use, and mastery of multi-modal content. These strong capabilities make LLMs critical for powering intelligent assistants.

LLMs obtain their knowledge mainly through pre-training, where they have access to massive datasets and learn to predict the next token. As such, LLMs can hardly learn knowledge that is lacking or sparse in the pre-training datasets. Recent studies show that LLMs lack knowledge that changes over time or knowledge about less popular entities, since such knowledge does not occur often in the pre-training data. Even worse, LLMs can hallucinate when asked for such information. This is where KGs can complement LLMs as additional data sources. This is similar to how humans function: we are capable of understanding and creating, but do not know or memorize all information, so we need to refer to dictionaries, encyclopedias, books, and the web. Similarly, LLMs also need to refer to external data sources, a process called retrieval-augmented generation (RAG), and KGs can be an important external source to power RAG. Our practice in building RAG systems shows that the crisp and precise answers from KGs can significantly increase LLM question-answering (QA) quality and reduce latency.
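As a minimal sketch of how a KG can power RAG (the triples are illustrative and `call_llm` is a placeholder for whichever LLM endpoint is actually used, not a real API), facts about the queried entity are retrieved from the KG and prepended to the prompt:

```python
# Minimal KG-backed RAG sketch: retrieve facts about the queried entity and
# prepend them to the prompt, so the model answers from retrieved knowledge
# rather than from (possibly stale or sparse) pre-training data alone.
triples = [
    ("Space Needle", "located_in", "Seattle"),
    ("Space Needle", "height_m", "184"),
    ("Space Needle", "opened", "1962"),
]

def retrieve(entity):
    """Fetch KG facts about the entity (a stand-in for real KG retrieval)."""
    return [f"{s} | {r} | {o}" for s, r, o in triples if s.lower() == entity.lower()]

def call_llm(prompt):
    # Placeholder: a real system would send `prompt` to an LLM API here.
    return "(model answer grounded in the retrieved facts)"

def answer(question, entity):
    facts = "\n".join(retrieve(entity))
    prompt = f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("When did the Space Needle open?", "Space Needle"))
```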

Will you tell us how the personal assistant you are developing at Meta will be an improvement over existing state-of-the-art personal assistants?

Existing virtual assistants are typically voice assistants, taking voice commands from users and acting on them accordingly. They are essentially an interface to existing features and serve users in a reactive fashion.

The virtual assistants we are researching for wearable devices, such as smart glasses, can have many more advantages. First, as the glasses have cameras, they can take visual input and answer questions like: “What’s the name of this cathedral?” In other words, they evolve from voice-only to multi-modal. Second, as we wear the glasses, they understand context much better and can remind us to buy grocery items on our to-do list when we are close to a grocery store, so they evolve from context-agnostic to context-aware. Third, as we may wear the glasses for a long time, they get to know us better and can serve us proactively, such as playing our favorite music during our morning routine, so they evolve from reactive to proactive. Finally, with the strong capabilities of LLMs and the RAG techniques mentioned above, they are able to provide much richer information more accurately, so they go from an interface to a know-it-all. Who could turn down an assistant that knows us and the world so well, and can provide the right services at the right time?

One example of a wearable personal assistant your team is working on is the Ray-Ban Meta Smart Glasses. Other technology companies have introduced wearable glasses in the past that were not embraced by consumers. What makes the Ray-Ban Meta Smart Glasses unique?

Compared with smart glasses developed decades ago, the timing is much better now, with much better hardware and ML technologies. To name a few features we have enabled on Ray-Ban Meta that would have been impossible a decade ago: we can look at some vegetables in the grocery store and ask the glasses to suggest recipes; we can look at a Spanish menu and ask for a translation into English; we can look at a building and ask who designed it. These are all enabled by recent technological advances in both software and hardware.

There are many other reasons regarding product design. This is not my expertise but I can clearly see the differences. Ray-Ban Meta is designed as stylish glasses that users are willing to wear in their daily lives. The glasses are not too heavy and so are reasonably comfortable to wear. They have reasonable battery life and so are fairly practical. Also, they have foundational features like taking pictures, listening to music, and making calls, so they are more useful in our daily lives. Techniques are important, but what makes a good product often goes way beyond techniques.

What is another area of your field that is poised to make a big impact in the near future?

I’d like to mention personal information management and personalization. Vannevar Bush proposed the Memex (memory expansion) in 1945, a device with which people could record their lives and easily access those records afterwards. Inspired by this idea, my PhD thesis was about personal information management. But we managed only digital information, that is, information already in digital form. We were not able to record and manage physical personal information: what people see, hear, and feel.

Indeed, what Bush depicted in his sketch is basically a wearable device. Such devices will eventually offer two sides of one coin: they can record our physical lives and serve as an assistant, just like a “second brain.” At Meta we recently launched a “reminder” feature, where a user can explicitly ask the assistant to “remember my parking lot” and later ask it to “remind me where I parked.” This is a starting point; gradually we may remember more, possibly proactively, with the user’s opt-in. We may also use this information to make better recommendations, such as recommending a train museum to a user who has a train collection in her living room.
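As a hedged sketch of this remember-and-recall pattern (not Meta's implementation; all names and fields below are illustrative), an assistant could store explicit, opt-in user memories and later surface them by keyword overlap with the user's question:

```python
# Hypothetical "reminder" pattern: store an explicit user memory,
# then retrieve it when the user later asks about it.
from datetime import datetime

memories = []  # per-user store; opt-in in a real assistant

def remember(user, text):
    memories.append({"user": user, "text": text, "time": datetime.now()})

def recall(user, query):
    """Return the user's stored memories sharing words with the query, newest first."""
    query_words = set(query.lower().split())
    hits = [m for m in memories
            if m["user"] == user and query_words & set(m["text"].lower().split())]
    return sorted(hits, key=lambda m: m["time"], reverse=True)

remember("luna", "parked on level 3, row C")
print(recall("luna", "remind me where I parked"))
```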

 

Xin Luna Dong is a Principal Research Scientist at Meta Reality Labs, where she leads the machine learning efforts to build an intelligent personal assistant. This work includes contextual AI, multimodal conversation, search and question answering, and recommendation and personalization. Her research interests include databases, data mining, natural language processing (NLP), and machine learning. Prior to joining Meta, she spent a decade working on knowledge graphs at Amazon and Google.

Dong’s honors include the VLDB Women in Database Research Award and the VLDB Early Career Research Contribution Award. This year she was named an ACM Fellow and an IEEE Fellow for contributions to knowledge graph construction and data integration.

She was recently featured on an episode of ACM ByteCast, where she discussed how early experiences growing up in China sparked her interest in computing, how her PhD work in data integration laid the groundwork for her later work on knowledge graphs, and much more.