hey, i'm atrey desai.

I am a third-year undergraduate student double majoring in Computer Science and Linguistics with a minor in Korean Studies at the University of Maryland.

I am fortunate to be advised by Professor Rachel Rudinger and Professor Jordan Boyd-Graber.

I am a member of the technical staff of Learn Prompting.

research interests: natural language processing, particularly:

  1. How can we verify validity and robustness of existing benchmarks?
  2. How can humans and AI collaborate in data creation?
  3. How can we create new evaluation methods that probe multimodal, linguistic, and spatiotemporal understanding?

research


Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Nishant Balepur, Atrey Desai, Rachel Rudinger

Under Review at ACL Rolling Review, 2025 preprint

TL;DR: While choices-only success is often deemed problematic, reasoning traces reveal that LLMs use less problematic strategies like inferring missing questions, challenging claims that partial-input success is always a flaw. Consequently, reasoning traces could help separate problematic data from less problematic reasoning.

NLP, Reasoning

BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks

Nishant Balepur, Bhavya Rajasekaran, Jane Oh, Michael Xie, Atrey Desai, Jordan Boyd-Graber

Under Review at ACL, 2025 preprint

NLP, Benchmarking

Language Models Generate Multiple-Choice Questions with Artifacts

Atrey Desai, Nishant Balepur, Rachel Rudinger

MASC-SLL, 2025

NLP, Benchmarking
Washington, DC