Time: 06:00 PM
Location: CIC Ideation Studio
On Thursday 24 November, we welcomed guest speaker Ryan Baker, one of the world’s leading researchers and consultants on Educational Data Mining (EDM). He hosted a conversation with our Master of Data Science & Innovation students to explore how the deeply technical meshes with the ethical and how machine and human intelligence can be mixed in a complex social setting (in this case, education).
Abstract of Event
We’ve started to answer the questions of what we can model through EDM, and we’re getting better and better at modelling each year. We publish papers that present solid numbers under reasonably stringent cross-validation, and we find that our models don’t just agree with training labels, but can predict future constructs as well. We’re making progress as a field in figuring out how to use these models to drive and support intervention, although there’s a whole lot more to learn.
But when and where can we trust our models? One of the greatest powers of EDM models is that we can use them outside the contexts in which they were originally developed, but how can we trust that we’re doing so wisely and safely? Theory from machine learning and statistics tells us about generalizability, and we know empirically that models developed with explicit attention to generalizability and construct validity are more likely to generalise and to be valid. But our conceptions and characterizations of population and context remain insufficient to fully answer the question of whether a model will be valid where we will apply it. What’s worse, the world is constantly changing; the model that works today may not work tomorrow if the context changes in important ways, and we don’t yet know which changes matter.
In this talk, Ryan Baker illustrated these issues by discussing his work to develop models that generalise across urban, rural, and suburban settings in the United States, and to study model generalizability internationally. He discussed work from other groups that have started to think more carefully about characterising context and population in a concrete and precise fashion, including where this work is successful and where it remains incomplete. By considering these issues more thoroughly, we can become increasingly confident in the applicability, validity, and usefulness of our models for broad and general use, a necessity for using EDM in a complex and changing world.
Ryan Baker Biography
Ryan Baker is Associate Professor at the University of Pennsylvania, and Director of the Penn Center for Learning Analytics. His lab conducts research on engagement and robust learning within online and blended learning, seeking to find actionable indicators that can be used today but which predict future student outcomes.
Baker has developed models that can automatically detect student engagement in over a dozen online learning environments and has led the development of an observational protocol and app for field observation of student engagement that has been used by over 150 researchers in four countries.
He was the founding president of the International Educational Data Mining Society, is currently serving as Associate Editor of three journals, and was the first technical director of the Pittsburgh Science of Learning Center DataShop, the world’s largest public repository for data on the interactions between learners and online learning environments. Baker has co-authored published papers with over 250 colleagues.