Discovering the principles of crowd work - Wikipedia and beyond

Systems such as Wikipedia or Linux involve hundreds of thousands of contributors with different knowledge, views, and motivations directly interacting with and building on each other’s work. As such, these systems violate simple “wisdom of crowds” assumptions of independent judgments and automatic aggregation. To understand how these systems function, we draw on social science theories about how people organize in the offline world to find effective algorithms for crowd work. For example, using parallel computing to analyze the entire history of Wikipedia, we discovered that adding more contributors to an article on average actually decreases article quality, rather than improving it as we (and many others) expected. To address these issues, we have drawn on concepts from organizational behavior such as group identification, motivation, and coordination to identify social algorithms that effectively combine the efforts of many contributors. Our models have been validated in thousands of online communities, and extended to understand domains such as massively collaborative scientific discovery (e.g., the Polymath Projects) and crowdsourcing task markets (e.g., Amazon’s Mechanical Turk).

Improving crowd sensemaking

Building on these principles, we are developing tools to increase the ability of the crowd to understand each other’s work and effectively contribute. As the size and complexity of crowd-built artifacts grow, so do challenges to workers’ ability to effectively contribute to them. For example, despite early exponential growth, Wikipedia has been losing members, due in part to the difficulty of adding to articles that thousands have edited before. We are developing intelligent interfaces for Wikipedia combining visualization and machine learning that enable users to quickly understand thousands of differing viewpoints; predict conflict across millions of pages; facilitate interactions between editors; and help newcomers understand how to improve their contributions. We are now partnering with the Wikimedia Foundation to create a novel research methodology for Wikipedia that enables researchers to design, deploy, and field test new interfaces without interfering with the ongoing function of the live site. Building on this experience, we are also developing new distributed sensemaking systems in the context of scientific discovery to enable greater interdisciplinary insights (cognitiveatlas.org).

Enabling new crowd abilities - Crowdsourcing complex work

We are developing ways to accomplish complex, interdependent and creative tasks in crowdsourcing markets such as Amazon’s Mechanical Turk, which have previously been limited to simple, independent, and objective tasks. Some examples include using the crowd for article writing, poetry translation, and science journalism.

Quality control for crowdsourcing
Enabling more complex and creative work raises new challenges, such as maintaining quality control for subjective tasks that may have many valid answers or are difficult to evaluate. Our initial work in this area developed new task design methods based on the idea of making believable invalid answers more effortful than good faith responses, which reduced cheating by a factor of 20 and more than doubled time-on-task. More recently, we have developed machine learning systems that can predict crowd workers’ quality based only on how they do the work (e.g., scrolling, mouse movements) without knowledge of their actual output, enabling quality control for subjective tasks where traditional approaches such as gold standards or worker agreement are impossible.
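As a rough illustration of predicting quality from behavior alone, the sketch below labels a work session from its interaction trace without ever looking at the submitted answer. It is not our production model: the nearest-centroid classifier, the three features, and every number in the toy data are assumptions made up for this example.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical behavioral features for one work session:
# (seconds on task, number of scroll events, number of mouse moves).
Features = Tuple[float, float, float]


def centroid(rows: List[Features]) -> Features:
    """Average each feature column across a list of sessions."""
    return tuple(mean(column) for column in zip(*rows))


def train(examples: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Compute one centroid per quality label from labeled sessions."""
    by_label: Dict[str, List[Features]] = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model: Dict[str, Features], features: Features) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(a: Features, b: Features) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(model, key=lambda label: squared_distance(model[label], features))


# Made-up toy traces, for illustration only (not real worker data).
TOY_SESSIONS = [
    ((120.0, 14.0, 300.0), "good"),
    ((95.0, 10.0, 240.0), "good"),
    ((8.0, 0.0, 12.0), "poor"),
    ((15.0, 1.0, 30.0), "poor"),
]

model = train(TOY_SESSIONS)
```

Calling `predict(model, (100.0, 12.0, 260.0))` classifies a new session using only its trace. A realistic system would at least rescale the features, since raw mouse-move counts dominate the distance here, and would use a richer model.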

Coordination and crowdsourcing
Another key challenge is coordinating many small micro-contributions to create a complex, interdependent artifact, such as writing an article or coding software. For example, imagine trying to write a coherent article with a hundred contributors, each of whom provides only a few minutes of work. Inspired by distributed computing models such as Google’s MapReduce, we have developed systems that manage the dependencies between contributors, producing crowd-written articles rated better than individually produced ones and as good as Wikipedia articles. In collaboration with professional journalists, we are now experimenting with crowdsourcing the science journalism process (e.g., mybossisarobot.com), with the dual benefits of involving citizens as participants in the research process and increasing the amount of scientific information consumable by the general public.
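To make the MapReduce analogy concrete, here is a minimal sketch of how an article might be split into micro-tasks and reassembled. The three-step partition/map/reduce split and all function names are assumptions for illustration, not a description of our actual system; the worker functions are placeholders where a real workflow would post tasks to a market such as Mechanical Turk and wait for responses.

```python
from typing import Callable, List


def partition(topic: str) -> List[str]:
    """Partition step (placeholder): a worker proposes section headings."""
    return [
        f"{topic}: background",
        f"{topic}: key facts",
        f"{topic}: open questions",
    ]


def map_section(heading: str) -> str:
    """Map step (placeholder): a different worker drafts each section."""
    return f"[paragraph about {heading}]"


def reduce_article(topic: str, paragraphs: List[str]) -> str:
    """Reduce step (placeholder): a worker merges the drafts into one article."""
    return topic + "\n\n" + "\n\n".join(paragraphs)


def run_workflow(
    topic: str,
    partition_fn: Callable[[str], List[str]] = partition,
    map_fn: Callable[[str], str] = map_section,
    reduce_fn: Callable[[str, List[str]], str] = reduce_article,
) -> str:
    """Run the steps in dependency order: outline, then sections, then merge."""
    headings = partition_fn(topic)
    # The map tasks do not depend on each other, so a real system could run
    # them in parallel; they are run sequentially here for simplicity.
    paragraphs = [map_fn(heading) for heading in headings]
    return reduce_fn(topic, paragraphs)


article = run_workflow("Crowd work")
```

The point of the structure is that no single contributor needs to see the whole article: each one receives only a heading or a set of drafts, which keeps every task within a few minutes of work.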

Augmenting scientific discovery and insight

Collaborative ontology building
The rapid growth of medical and neuroscientific data requires structured ontologies that can help people find and make use of large data stores. However, traditional bioinformatics approaches to ontology building are insufficient for many domains where rapid advances mean that knowledge is both dynamic and distributed among many sources. Other drawbacks include high participation costs, limited longevity, and adoption issues. In contrast, we are developing a system for ontology building that combines the bazaar-like aspects of emerging collaborative online paradigms such as Wikipedia with cathedral-like support for structured data. The goal of this research is to enable researchers to make robust cross-disciplinary inferences (such as tying together cognitive, behavioral, and neuroimaging data) in a system that brings together knowledge from many researchers while maintaining low participation costs. A prototype of the system can be found at cognitiveatlas.org.
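One way to picture the combination of bazaar and cathedral is a concept page whose free-text description anyone can edit, paired with typed relations that are checked against a fixed vocabulary. The sketch below is a hypothetical data model written for this page; the class names, the relation vocabulary, and the example concepts are assumptions and do not reflect the schema used at cognitiveatlas.org.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical controlled vocabulary: the "cathedral" part of the design.
ALLOWED_RELATIONS = {"is_a", "part_of", "measured_by"}


@dataclass
class Concept:
    name: str
    description: str = ""  # free text, editable wiki-style (the "bazaar" part)
    relations: List[Tuple[str, str]] = field(default_factory=list)


class Ontology:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def edit_description(self, name: str, text: str) -> None:
        """Low-cost contribution: create or overwrite a free-text description."""
        self.concepts.setdefault(name, Concept(name)).description = text

    def relate(self, source: str, relation: str, target: str) -> None:
        """Structured contribution: add a typed link, rejecting unknown types."""
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation type: {relation}")
        self.concepts.setdefault(target, Concept(target))
        source_concept = self.concepts.setdefault(source, Concept(source))
        source_concept.relations.append((relation, target))

    def reachable(self, start: str) -> Set[str]:
        """Follow typed links outward from a concept to find everything it ties to."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            concept = self.concepts.get(stack.pop())
            if concept is None:
                continue
            for _, target in concept.relations:
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


onto = Ontology()
onto.edit_description("working memory", "Short-term maintenance of information.")
onto.relate("working memory", "measured_by", "n-back task")
onto.relate("n-back task", "is_a", "cognitive task")
```

Here `onto.reachable("working memory")` walks from a cognitive concept to the tasks linked to it, which is the kind of traversal a cross-disciplinary inference would build on.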

Literature discovery
The recent rapid advancement of science has been accompanied by an explosion in the scientific literature. Nowadays, to make a significant advance, it seems necessary to study a single area of science for many years; yet some of the most important discoveries are made by bridging between fields. To help people understand a new field more quickly, we are developing tools that combine visualization and large-scale data mining to help researchers build mental models of information spaces.

Collaborative knowledge mapping
Increasingly specialized knowledge is required to comprehend our complex world. Making sense of consumer choices, scientific advancements, and social issues often involves frequently changing information, rapid learning, and competing viewpoints. This project aims to design a web-based knowledge building environment for the collaborative creation of knowledge maps. Knowledge maps are interactive visualizations that capture and convey the information, argumentation, and perspective that underlie people's informal theories about the empirical world.