The CIC team recently gave an overview of our LLM work to our Qualitative Data Analysis colleagues, as part of the Aspire QDA series.
Simon Buckingham Shum (Connected Intelligence Centre), Antonette Shibani (TD School), Lisa-Angelique Lim (Connected Intelligence Centre) & Ram Ramanathan (Connected Intelligence Centre). This work comes from collaborations with Aneesha Bakharia, Trish McCluskey & Nazanin Rezazadeh Mottaghi.
ABSTRACT: Until recently, qualitative data analysis (QDA), such as the deductive and inductive coding of textual data, was considered the preserve of human researchers. The nuanced judgements required to apply a complex coding scheme, or to discern themes that evolve into a coding scheme, were beyond algorithms. However, the emergence and mainstream availability of large language models (LLMs: e.g., GPT, Gemini, Claude, Llama) have catalysed rigorous research into their ability to perform such QDA in minutes. This is accompanied by healthy debate on whether this could lead to the full automation of certain kinds of analysis, or to the augmentation of researchers' work through productive, hybrid analysis with a new generation of interactive QDA tools. Using LLMs hosted on secure, privacy-respecting university instances, we have been testing them for both inductive and deductive coding, and we welcome your thoughts on how we address important considerations, including:
- How can we translate a theory-grounded codebook into a system prompt guiding the LLM? (see the first sketch after this list)
- How do we evaluate the quality of the coding compared to human researchers? (see the second sketch below)
- Since (like humans) LLMs are intrinsically variable in their coding, how do we understand and manage this variability? (see the third sketch below)
- How can an LLM provide a transparent account of its inductive coding of a corpus so humans can understand it?
- How will human and machine analysts work together in the future, harnessing their respective strengths?
- What concerns do researchers have about automated coding, and can these be addressed?
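To make the first question concrete, here is a minimal sketch, assuming a simple codebook of named codes with short definitions, of how a codebook might be rendered as a system prompt. The code names, definitions, and prompt wording are illustrative placeholders, not the prompts from our published work.

```python
# Hypothetical sketch: turning a theory-grounded codebook into a system prompt.
# The codebook below is illustrative only, not a published coding scheme.

codebook = {
    "Social Belonging": "References to peer relationships, friendship, or feeling part of a group.",
    "Academic Belonging": "References to feeling capable, supported, or engaged in one's studies.",
    "Institutional Belonging": "References to identification with the university as a whole.",
}

def codebook_to_system_prompt(codebook: dict[str, str]) -> str:
    """Render each code name and definition as coding instructions for the LLM."""
    lines = [
        "You are a qualitative coder. Apply the codebook below to each text segment.",
        "Return only the names of codes that apply; return 'None' if no code applies.",
        "",
        "CODEBOOK:",
    ]
    for name, definition in codebook.items():
        lines.append(f"- {name}: {definition}")
    return "\n".join(lines)

print(codebook_to_system_prompt(codebook))
```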
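For the second question, one common approach is to treat the LLM as an additional coder and compute inter-rater agreement against a human-coded sample. A minimal sketch, using scikit-learn's Cohen's kappa on made-up labels:

```python
# Hypothetical sketch: comparing LLM codes against a human coder with Cohen's kappa.
# The labels are invented; real evaluation would use a held-out, human-coded sample.
from sklearn.metrics import cohen_kappa_score

human = ["Social", "Academic", "None",   "Social", "Academic", "None"]
llm   = ["Social", "Academic", "Social", "Social", "None",     "None"]

kappa = cohen_kappa_score(human, llm)
print(f"Cohen's kappa (human vs LLM): {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters when some codes dominate the corpus.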
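For the third question, a simple way to surface variability is to code the same segment repeatedly and inspect the vote distribution. A minimal sketch, with the LLM call stubbed by a random draw purely to simulate run-to-run variation:

```python
# Hypothetical sketch: probing run-to-run variability by coding one segment several
# times and taking a majority vote. code_segment() is a stand-in for a real LLM call.
import random
from collections import Counter

def code_segment(segment: str) -> str:
    """Stand-in for an LLM coding call; stubbed here to simulate variability."""
    return random.choice(["Social", "Social", "Social", "Academic"])

runs = [code_segment("I finally feel like I fit in with my tutorial group.") for _ in range(9)]
votes = Counter(runs)
label, count = votes.most_common(1)[0]
print(f"Votes: {dict(votes)} -> majority label: {label} ({count}/{len(runs)} runs)")
```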
For the details, see these publications…
Bakharia, A., Shibani, A., Lim, L.-A., McCluskey, T., & Buckingham Shum, S. (2025). From Transcripts to Themes: A Trustworthy Workflow for Qualitative Analysis Using Large Language Models. Proceedings From Data to Discovery: LLMs for Qualitative Analysis in Education (Workshop, LAK25), Dublin, Ireland. [Preprint]
Ramanathan, S., Lim, L.-A., Mottaghi, N. R., & Buckingham Shum, S. (2025). When the Prompt Becomes the Codebook: Grounded Prompt Engineering (GROPROE) and its Application to Belonging Analytics. Proceedings LAK25: 15th International Conference on Learning Analytics & Knowledge, Dublin, Ireland. https://doi.org/10.1145/3706468.3706564