Quick, Create a Distractor! Evaluating LLM Distractors for Multiple-Choice Benchmarks
Atrey Desai, Nishant Balepur, Rachel Rudinger
I am a third-year undergraduate at the University of Maryland studying computer science and linguistics, with a minor in Korean studies.
I am fortunate to be advised by Professors Rachel Rudinger and Jordan Boyd-Graber.
Language models are increasingly capable, but our methods for measuring and building that capability lag behind. I work on evaluation and data pipelines for reliable NLP, focusing on:
1. Benchmark validity and the limits of what our evaluations actually measure
2. Human-AI collaboration in synthetic data creation and annotation
3. Evaluation of systems that reason about and perceive the world
[IP] = in progress
Atrey Desai, Nishant Balepur, Rachel Rudinger
Nishant Balepur, Atrey Desai, Rachel Rudinger
Atrey Desai, Sathvik Nair