Dataset Evaluators
Overview
The DatasetEvaluator classes in the RAGchain framework are used to evaluate metrics on question answering datasets. The currently supported datasets are StrategyQA, Ko-StrategyQA, Qasper, MSMARCO, and Mr. TyDi.
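Below is a minimal sketch of how a dataset evaluator is typically used. The import path, the StrategyQAEvaluator class name, the evaluate_size parameter, and the evaluate() call are assumptions for illustration and may differ from the actual RAGchain API.

```python
# A minimal sketch, assuming a StrategyQAEvaluator class with an
# evaluate() method; the import path and parameters shown here are
# illustrative and may not match the real RAGchain API exactly.
from RAGchain.benchmark.dataset import StrategyQAEvaluator

# `pipeline` is assumed to be a RAGchain pipeline (retriever + LLM)
# that has already been configured elsewhere in your code.
evaluator = StrategyQAEvaluator(evaluate_size=50)  # assumed: limit evaluation to 50 questions

# Run the evaluation against the pipeline and inspect the metric scores.
result = evaluator.evaluate(pipeline)
print(result)
```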
Supported Datasets
StrategyQA
A multi-hop open-domain question answering dataset. It contains paragraphs from Wikipedia, and every question is a multi-hop question that requires retrieving multiple paragraphs to answer. All answers are True/False. We do not support an answer evaluator for this dataset yet, but you can easily check answers yourself.
Ko-StrategyQA
The Korean version of the StrategyQA dataset.
Qasper
A dataset for question answering on scientific (NLP) papers. It can evaluate retrieval performance within a single NLP paper document, as well as answer performance.
MSMARCO
MSMARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. The passages are the top-k results returned by the Bing search engine for each question.
Mr. TyDi
Mr. TyDi is a multilingual benchmark dataset built on TyDi. It covers 11 typologically diverse languages, plus a combined option that includes all of them.