Digging Into Data

Two professors detail how we came to live in a world driven by algorithms.

Professors Matthew L. Jones and Chris Wiggins ’93.


On a bright, chilly Tuesday in January, just after the start of the Spring semester, 100 or so College and Engineering students filed into a Schermerhorn lecture hall for their second week of “Data: Past, Present and Future,” taught by Professors Chris Wiggins ’93 and Matthew L. Jones. The two-part class has exploded in popularity since being introduced in 2017, fueled by the need for fluency in a culture increasingly saturated and shaped by data. Wiggins was speaking from the podium that day; Jones would take center stage on Thursday to lead a hands-on coding lab.

Sitting among the undergrads in a middle row, taking in what Wiggins was saying about the assumed superiority of numerical data, I quietly had my mind blown. I had always believed that math was sacrosanct, that quantitative information was king, that numbers equaled facts. But that morning, Wiggins, an associate professor of applied mathematics and systems biology and the chief data scientist at The New York Times, informed us that that notion was simply not correct.

“Data comes with truthiness that is unwarranted,” he said. “With data comes power, including the power to shape what is perceived to be true.”

In fact, Wiggins and Jones, the James R. Barker Professor of Contemporary Civilization, discuss truth and power all semester. The class aims to get students thinking critically about how mathematical analysis came to be the dominant way to understand — and control — the world; to investigate how data-empowered algorithms have come to shape our personal, professional and political lives; and to consider the ethical stakes of a data-mediated reality.

Later that week, during the first computer lab exercise, Jones demonstrated the shadiness of numerical objectivity. He had the class look at data sets — lists of numbers or values that relate to a particular subject; in this case, the examples ranged from wine tasting to pharmaceutical use. Then they used coding tools to take the rows and columns apart and determine where the info came from. “The students see that numbers are artifacts of choices that people make,” Jones says. “Which doesn’t mean that they’re useless, or throw us into complete skepticism. But it’s where we begin building.”

Teaching students to think critically about data situates “Data: Past, Present and Future” in the timely intersection of science and the humanities — the techy and the fuzzy, to use an analogy Wiggins likes. And as a data scientist and a historian of science and technology, Wiggins and Jones offer fascinatingly complementary perspectives. The professors have recently expanded on their course in the book How Data Happened: A History from the Age of Reason to the Age of Algorithms (W.W. Norton & Co., $30), released in March.

“Our goal is to provide a framework for understanding the persistent role of data in rearranging power,” Wiggins and Jones write in their introduction. “We hope to show how we collectively got here ... this will, in turn, help us picture how we can break and reset the bones of systems that sometimes empower the defenseless — yet have more often strengthened the empowered.”

Though the title might suggest a heavy math dive, How Data Happened is actually a lot about people. From Belgian astronomer Adolphe Quetelet to statistics pioneer Florence Nightingale (yes, that Florence Nightingale!) to the O.G. of algorithms, Alan Turing, Wiggins and Jones engagingly describe not only how data happened, but why and by whom.

“Scientists are human beings, and science is built of very human stories — it’s just we don’t usually foreground that because we think about ‘science’ as objectivity and ideas and technique,” Wiggins says. “We don’t think about the people, and what their interests are in creating and arguing for those techniques.

“The point of the book — and what motivated the class — is that I really do believe it should be known that data is created by real people,” Wiggins continues. “It’s the recurring theme of subjective design choices and the way we make decisions about something that we then present as being objective.”

The professors met in 2013, after Wiggins attended Jones’ lecture on the history of machine learning (“His talk had a lot about the government mining of personal data — and then the Snowden revelations happened,” Wiggins recalls. “I said to him, ‘Oh my god, you skated toward exactly where the puck was going to go!’”); they collaborated for the first time a year later, when they were invited by the Journalism School to craft a class for data journalists.

“It was a great experience,” Jones says. “It really contributed to our sense of the way in which you can teach people a set of tools, and at the same time, teach them the limitations of it.”

Jones and Wiggins at work.


Wiggins and Jones launched “Data: Past, Present and Future” in January 2017, with support from the Collaboratory at Columbia, a joint initiative of the Columbia Data Science Institute and Columbia Entrepreneurship that aims to promote digital literacy by integrating data and computer science into other areas of study.

This was right after the contentious 2016 election; the course kicked off as President Trump was inaugurated, a time when narratives around data, information and reality were shifting significantly (remember, this was the first time we heard the term “fake news”). The course started as a small seminar, but blew up quickly in size and content at the same time Facebook CEO Mark Zuckerberg was being called to discuss data privacy before the Senate. “It became clear that the students did not just want a history of data science,” Wiggins says. “They really wanted to understand the relationship between data and truth, and data and power.”

The professors say they consistently update their course material in order to stay current and challenge their two constituencies: students with a mathematical or computational background and those who major in the humanities. The duality of lecture and lab provides discussion one day (students read articles by computer scientists, social scientists and even some sections of How Data Happened) and hands-on learning with modern tools the next. Assignments are split on two tracks to get people outside their comfort zones: The techies get longer, more humanistic papers to write; the fuzzies do more technical problem sets. There’s also an unusually upbeat class Slack channel, with conversation flowing among the professors, students and alumni of the course (who are welcome to stick around) — some even opt to stay connected as coding homework graders.

The students were lively and engaged on the days I attended, and the rapport between Wiggins and Jones was palpable. Counterintuitively, Wiggins, the scientist, gives the lecture, while Jones, the historian, teaches the coding. (Their professorial styles also differ notably: Wiggins is compelling and droll but stationary, while Jones is more animated, pacing and gesturing, dropping TikToks into his lab presentation. It was apparent the students enjoy them both.)

Though not limited to College students, “Data: Past, Present and Future” has deep connections to the Core Curriculum. The class satisfies a Core science requirement; Jones was chair of Contemporary Civilization from 2009 to 2012 and still teaches CC; and Wiggins is an alumnus who completed the curriculum. “I see a lot of continuity to CC and Lit Hum and other components of the Core in the way that we teach this class,” Jones says. “For me, it’s a particular privilege, because it allows you to think of the history of science as something that is deeply engaged with questions of knowledgeable citizenry.”

Wiggins doubles down on the subject of critical thinking. “I didn’t realize when we started teaching the class how much there is to know about how we know what we know,” he says. “What even is science? How do we collectively construct a consensus on what is true?”

The warm dynamic Wiggins and Jones have in the classroom comes through in their writing, but when asked how How Data Happened happened, both authors are modest. They knew once they began the class that they had a good story to tell — for example, Jones says that the development of statistics up to WWII is well known among historians but less so outside those circles. “And the students were really gaining a lot from it,” he says.

Jones, however, says he had no idea how to write a non-academic book; Wiggins claims he didn’t know how to write a book, period. Jones’ colleague, historian and author Adam Tooze, suggested they take their inspiration from the genre of a great undergraduate lecture. “The right scholarly depth with expositional clarity,” Jones recalls. “And so the way we were teaching the lecture component of the class became the framework for most of the chapters.”

How Data Happened is laid out in three parts: the history of data, its evolution and finally, how we might bend data’s current trajectory to better ends. Wiggins and Jones had a lot to work with, in terms of the academic literature they had already assigned to their students. The later chapters include their own scholarly work; they discuss how data and power moved from a state concern to a corporate one, and close the book with a reminder that just because we can use algorithmic decision-making systems doesn’t mean we must (“ads based on mass surveillance are not necessary elements of our society”). Societal change may be slow, they say. But it’s possible.

“There are alternatives ... it can be done,” the book’s final pages read. “Many potential forces, large and small, are available to us, directly and indirectly, to shape the relationships among technology and norms, laws and markets, and data’s role in it all.”