FreshStack Metrics vs. Model Parameters
Average scores across 5 domains vs. model parameter size; points are colored by model family.
FreshStack Metrics vs. Model Release Date
Average scores across 5 domains vs. model release date; points are colored by model family.
Cite FreshStack
@inproceedings{thakur2025freshstack,
  title={FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents},
  author={Nandan Thakur and Jimmy Lin and Sam Havens and Michael Carbin and Omar Khattab and Andrew Drozdov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2025},
  url={https://openreview.net/forum?id=54TTgXlS2U}
}