research

Publications, preprints, class projects, and talks in one place.


publications

2025

MASC-SLL 2025

Language Models Generate Multiple-Choice Questions with Artifacts

Atrey Desai, Nishant Balepur, Rachel Rudinger

NLP · Benchmarking

2022

AAAI IML Workshop 2022, RLDM 2022

Reinforcement Learning As End-User Trigger-Action Programming

Chace Hayhurst, Hyojae Park, Atrey Desai, Suheidy De Los Santos, Michael Littman

Reinforcement Learning · Human-AI Interaction

preprints

Under Review at ARR 2025 · preprint

Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Nishant Balepur, Atrey Desai, Rachel Rudinger

While choices-only success is often deemed problematic, reasoning traces reveal that LLMs use less problematic strategies, such as inferring the missing question. This challenges the claim that partial-input success is always a flaw, and suggests that reasoning traces could help separate problematic data from less problematic reasoning.

NLP · Reasoning
Under Review at Computational Linguistics 2025 · preprint

A Preview of Computational Animal Linguistics

Atrey Desai, Tirza Panunto, Lindsay Pike, Theron S. Wang, Tuan M. Dang, Hridayesh Lekhak, Kenny Q. Zhu

NLP · Animal Communication
TLS (Oral), Under Review at ARR 2025 · preprint

Filling in the Mechanisms: How do LMs Learn Filler-Gap Dependencies under Developmental Constraints?

Atrey Desai, Sathvik Nair

NLP · Linguistics

class projects

2025

Longitudinal Phonetic Adaptation in YouTube BookTube Creators

Atrey Desai

We analyze how YouTube "BookTube" creators adapt their phonetic features over time, finding that they significantly reduce vocal fry and increase pitch range as they gain experience. These changes correlate with higher audience engagement, suggesting that long-form content creators professionalize their speech to signal competence and credibility.

Linguistics · Phonetics

From LOL to LLM: Measuring Multilingual Multi-Turn Humor Understanding in AI

Atrey Desai, Leo Du, James van Doorn, Kamala Sreepada

We introduce a multilingual benchmark to evaluate how Large Language Models (LLMs) understand multi-turn humor in English and Spanish through joke classification and line-purpose identification tasks. We find that while larger open-source models (32B) significantly outperform smaller ones, all models struggle with nuanced humor comprehension and are highly sensitive to adversarial perturbations and cross-cultural linguistic differences.

NLP · Humor · Multilingual

talks & presentations

research talk

Filler-Gap Dependencies under Developmental Constraints in LMs

The University of Texas at Austin · Feb 2026

poster

Language Models Generate Multiple-Choice Questions with Artifacts

University of Maryland, College Park · Jul 2025

University of Maryland, College Park · Apr 2025

research talk

Adaptor Grammars and Neural Networks for Feline Lexical Discovery

University of Maryland, College Park · Nov 2024

The University of Texas at Arlington · Jul 2024

Washington, DC
Last updated Mar 24, 2026