Honors Bachelor of Arts
This paper presents a comparative analysis of human and AI performance on a sentiment analysis task involving the coding of qualitative data from community program transcripts. The results show promising but imperfect agreement between two AI models, Claude and Bing, and a human baseline of three annotators and one expert annotator, all coding with the Community Capitals framework categories. While both models achieved fair alignment with human judgment, confusion patterns emerged around metaphorical language and text that overlaps multiple categories. The findings provide a case study in benchmarking conversational AI systems against human baselines to reveal limitations and target improvements. The key gaps involve distinguishing between social and human elements and handling cultural references. Expanded testing on more diverse datasets could further quantify differences in classification capabilities. Overall, the analysis exposes well-defined areas where machines still struggle compared to humans, pointing to productive research directions for eventually matching human performance across diverse language inputs. As AI systems enter real-world applications, human-AI comparative studies can help define the boundary between robust statistics-based learning and adaptive human cognition.
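As an illustration of how agreement between a model and a human baseline might be quantified, the sketch below computes Cohen's kappa and counts which category pairs are confused. The abstract does not state which agreement statistic the thesis uses, so kappa is an assumption here, and the function names, labels, and data are hypothetical examples rather than material from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labeled independently at their own rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def confusion_pairs(reference, predicted):
    """Count (reference, predicted) pairs where the two labels disagree."""
    return Counter((r, p) for r, p in zip(reference, predicted) if r != p)


# Hypothetical labels (not data from the thesis): one human-coded sequence
# and one model-coded sequence over the same six transcript segments.
human = ["social", "human", "social", "cultural", "human", "social"]
model = ["social", "social", "social", "cultural", "human", "human"]

kappa = cohen_kappa(human, model)
confusions = confusion_pairs(human, model)
print(f"kappa = {kappa:.3f}")
print(confusions.most_common())
```

Counting the disagreeing pairs separately from the overall score is what would surface a pattern such as social and human categories being mistaken for each other, which a single agreement number hides.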
Shah, Aakriti, "Evaluating AI Sentiment Analysis" (2023). Honors Program Theses. 218.