Language Models Generate Multiple-Choice Questions with Artifacts
Atrey Desai, Nishant Balepur, Rachel Rudinger
Atrey Desai, Nishant Balepur, Rachel Rudinger
Chace Hayhurst, Hyojae Park, Atrey Desai, Suheidy De Los Santos, Michael Littman
Nishant Balepur, Atrey Desai, Rachel Rudinger
While choices-only success is often deemed problematic, reasoning traces reveal that LLMs rely on more benign strategies, such as inferring the missing question, challenging the claim that partial-input success is always a flaw. Reasoning traces could therefore help distinguish problematic data from relatively benign reasoning.
Nishant Balepur, Bhavya Rajasekaran, Jane Oh, Michael Xie, Atrey Desai, Jordan Boyd-Graber
Atrey Desai, Tirza Panunto, Lindsay Pike, Theron S. Wang, Tuan M. Dang, Hridayesh Lekhak, Kenny Q. Zhu
Atrey Desai, Sathvik Nair
Atrey Desai
We analyze how YouTube "BookTube" creators adapt their phonetic features over time, finding that they significantly reduce vocal fry and increase pitch range as they gain experience. These changes correlate with higher audience engagement, suggesting that long-form content creators professionalize their speech to signal competence and credibility.
Atrey Desai, Leo Du, James van Doorn, Kamala Sreepada
We introduce a multilingual benchmark to evaluate how Large Language Models (LLMs) understand multi-turn humor in English and Spanish through joke classification and line-purpose identification tasks. We find that while larger open-source models (32B) significantly outperform smaller ones, all models struggle with nuanced humor comprehension and are highly sensitive to adversarial perturbations and cross-cultural linguistic differences.
The University of Texas at Austin · Feb 2026
University of Maryland, College Park · Jul 2025
University of Maryland, College Park · Apr 2025
University of Maryland, College Park · Nov 2024
The University of Texas at Arlington · Jul 2024