Evaluating Long-Context Question & Answer Systems
Eugene Yan
JUNE 21, 2025
Loong evaluates a model’s ability to locate, compare, cluster, and reason over evidence spread across multiple documents, typically ranging from 10,000 to over 250,000 tokens.
Clustering: Aggregating and grouping relevant information from multiple sources based on specific criteria.
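To make the clustering task concrete, here is a minimal sketch of how a predicted grouping could be scored against a gold grouping. Everything in it is an assumption for illustration: the document ids, the `sector` criterion, the helper names `group_by_criterion` and `cluster_accuracy`, and the exact-match scoring rule are hypothetical and are not Loong's actual data format or metric.

```python
from collections import defaultdict


def group_by_criterion(records, key):
    """Group document ids by the value each document holds for `key`."""
    groups = defaultdict(set)
    for doc_id, fields in records.items():
        groups[fields[key]].add(doc_id)
    return dict(groups)


def cluster_accuracy(predicted, gold):
    """Fraction of gold groups that the prediction reproduces exactly.

    This exact-match rule is an illustrative choice, not a published metric.
    """
    if not gold:
        return 0.0
    matched = sum(
        1 for label, members in gold.items() if predicted.get(label) == members
    )
    return matched / len(gold)


# Hypothetical toy corpus: four documents, each tagged with one sector.
gold_records = {
    "doc_a": {"sector": "energy"},
    "doc_b": {"sector": "retail"},
    "doc_c": {"sector": "energy"},
    "doc_d": {"sector": "retail"},
}
gold = group_by_criterion(gold_records, "sector")

# Hypothetical model output: the energy group is right, doc_d is misplaced.
predicted = {
    "energy": {"doc_a", "doc_c"},
    "retail": {"doc_b"},
    "other": {"doc_d"},
}

score = cluster_accuracy(predicted, gold)
print(f"cluster accuracy: {score:.2f}")
```

Exact-match scoring is strict: one misplaced document fails the whole group. A partial-credit measure, such as per-document overlap, would be a reasonable alternative if near-misses should count.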