People of ACM - Tova Milo
August 22, 2017
What made you decide to pursue a career in data science?
Data is the foundation of everything: from better pharmaceutical research to better transportation systems to more productive enterprises. Early in my career, we thought about how to deal with new types of data: structured, unstructured, XML. Then the rapid growth in the volume of data created a new range of problems, and now I am working on data that is tough to get hold of: knowledge that resides in people’s heads.
The MoDaS (Mob Data Sourcing) program, of which you are project lead, is seeking to establish scientific foundations for web-scale data sourcing. How could a more theoretically-grounded approach improve current crowdsourcing practices via the web? What are some examples of breakthroughs we might see from theoretically-grounded web-sourcing 10 years from now?
Crowd-based data sourcing democratizes data collection, cutting companies' and researchers' reliance on stagnant, overused datasets, and it bears great potential for revolutionizing our information world. Yet success has so far been limited to a handful of projects such as Wikipedia and IMDb, or to large companies such as Google or Amazon. This stems, notably, from the difficulty of managing huge volumes of data and users of questionable quality and reliability. Every single initiative has had to battle, almost from scratch, the same non-trivial challenges. The ad hoc solutions, even when successful, are application-specific and rarely sharable.
In the MoDaS project we are developing solid scientific foundations for web-scale data sourcing. We believe that such a principled approach is essential to obtaining knowledge of superior quality, to realizing the task more effectively and automatically, and to being able to reuse solutions, thereby accelerating the practical adoption of this technology that is revolutionizing our lives. This will open the way for developing a new and otherwise unattainable universe of knowledge in a wide range of applications, from scientific fields to social and economic ones.
Analysts predict that the collection and analysis of data will continue to explode in the coming years, but data integration remains a challenge. Are there any exciting recent research developments that might impact the wider data integration effort in the coming years?
Human-machine integration has been a dream (and a fear) for many years. Taking a data-centric perspective, what we are aiming for is a world where digital data and human knowledge can be integrally and dynamically brought to bear on a range of hard problems.
Crowd-mining frameworks, such as the one that we are developing in the MoDaS project, combine general knowledge (which can refer to an ontology or information in a database) with individual knowledge obtained from the crowd (which captures, for example, habits, insights, and preferences). To account for such mixed knowledge, along with user interaction and optimization issues, a complex process of reasoning, automatic crowd task generation, and result analysis needs to be employed. While some significant progress has been made, much is still left to be done to achieve the ultimate global knowledge-based dream.
You are recognized as a prolific author. How do your publishing activities mesh with your other professional responsibilities? Any advice on how to balance the two?
I spend about half of my time on writing, rewriting and then rewriting again. Frankly, this is the part of my work that I like the least, and I don’t think that I am very good at it. But even the best research findings don’t create an impact if they are poorly articulated. So I spend a lot of time with my students on ensuring that our work is sound and accessible.
Tova Milo is a Professor of Computer Science and Lead of the Data Management Group at Tel Aviv University. Her research interests include databases, web data management, data-centric business processes and crowd-based data sourcing. Milo has co-authored more than 200 papers and written a book on business processes. Along with co-authors Victor Vianu and Dan Suciu, she received the ACM PODS Alberto O. Mendelzon Test-of-Time Award for the paper Typechecking for XML Transformers.
Milo has served on the editorial board of ACM Transactions on Database Systems (TODS) and was a Program Chair for multiple leading database conferences, including the ACM Symposium on Principles of Database Systems (PODS). Her honors include being selected as an ACM Fellow and as a member of the Academy of Europe. In 2017 she won the Weizmann Prize for Exact Sciences as well as the VLDB Women in Database Research award.