research

Publications, preprints, class projects, and talks in one place.


publications

2025

MASC-SLL 2025

Language Models Generate Multiple-Choice Questions with Artifacts

Atrey Desai, Nishant Balepur, Rachel Rudinger

NLP · Benchmarking

2022

AAAI IML Workshop 2022, RLDM 2022

Reinforcement Learning As End-User Trigger-Action Programming

Chace Hayhurst, Hyojae Park, Atrey Desai, Suheidy De Los Santos, Michael Littman

Reinforcement Learning · Human-AI Interaction

preprints

Under Review at ARR 2025 · preprint

Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Nishant Balepur, Atrey Desai, Rachel Rudinger

While choices-only success is often deemed problematic, reasoning traces reveal that LLMs use less problematic strategies, such as inferring the missing question. This challenges the claim that partial-input success is always a flaw, and suggests that reasoning traces could help separate problematic data from less problematic reasoning.

NLP · Reasoning
Under Review at Computational Linguistics 2025 · preprint

A Preview of Computational Animal Linguistics

Atrey Desai, Tirza Panunto, Lindsay Pike, Theron S. Wang, Tuan M. Dang, Hridayesh Lekhak, Kenny Q. Zhu

NLP · Animal Communication
TLS (Oral), Under Review at ARR 2025 · preprint

Filling in the Mechanisms: How do LMs Learn Filler-Gap Dependencies under Developmental Constraints?

Atrey Desai, Sathvik Nair

NLP · Linguistics

class projects

2025

Longitudinal Phonetic Adaptation in YouTube BookTube Creators

Atrey Desai

We analyze how YouTube "BookTube" creators adapt their phonetic features over time, finding that they significantly reduce vocal fry and increase pitch range as they gain experience. These changes correlate with higher audience engagement, suggesting that long-form content creators professionalize their speech to signal competence and credibility.

Linguistics · Phonetics

From LOL to LLM: Measuring Multilingual Multi-Turn Humor Understanding in AI

Atrey Desai, Leo Du, James van Doorn, Kamala Sreepada

We introduce a multilingual benchmark to evaluate how Large Language Models (LLMs) understand multi-turn humor in English and Spanish through joke classification and line-purpose identification tasks. We find that while larger open-source models (32B) significantly outperform smaller ones, all models struggle with nuanced humor comprehension and are highly sensitive to adversarial perturbations and cross-cultural linguistic differences.

NLP · Humor · Multilingual

talks & presentations

research talk

Filler-Gap Dependencies under Developmental Constraints in LMs

The University of Texas at Austin · Feb 2026

poster

Language Models Generate Multiple-Choice Questions with Artifacts

University of Maryland, College Park · Jul 2025

University of Maryland, College Park · Apr 2025

research talk

Adaptor Grammars and Neural Networks for Feline Lexical Discovery

University of Maryland, College Park · Nov 2024

The University of Texas at Arlington · Jul 2024

Washington, DC
Last updated Mar 24, 2026