
Benchmark

Evaluate your pipeline


Last updated 1 year ago

Overview

The RAGchain Benchmark module is a tool for evaluating Retrieval-Augmented Generation (RAG) workflows. It lets developers measure their pipeline's performance on different datasets and on their own questions.

Supporting Evaluators

The Benchmark module supports two types of evaluators: AutoEvaluator and DatasetEvaluator.

  • AutoEvaluator: Evaluates your pipeline on your own questions, without a dataset. It can score retrieved passages and generated answers even when no ground-truth answers or ground-truth passages exist.

  • DatasetEvaluator: Evaluates your pipeline on question-answering datasets.
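
The difference between the two evaluation styles can be sketched in plain Python. This is an illustration only — `dataset_style_eval`, `auto_style_eval`, and the `judge` callback are hypothetical names, not RAGchain's actual API:

```python
from typing import Callable, Dict, List

def dataset_style_eval(pipeline: Callable[[str], str],
                       questions: List[str],
                       gold_answers: List[str]) -> Dict[str, float]:
    """Dataset-style evaluation: compare predictions against ground-truth answers."""
    hits = sum(pipeline(q).strip().lower() == a.strip().lower()
               for q, a in zip(questions, gold_answers))
    return {"exact_match": hits / len(questions)}

def auto_style_eval(pipeline: Callable[[str], str],
                    questions: List[str],
                    judge: Callable[[str, str], float]) -> Dict[str, float]:
    """Auto-style evaluation: no gold data; a reference-free judge
    (e.g. an LLM grader) scores each (question, answer) pair."""
    scores = [judge(q, pipeline(q)) for q in questions]
    return {"mean_judge_score": sum(scores) / len(scores)}
```

The key design point is the second signature: because AutoEvaluator-style scoring needs no gold answers, it only requires the pipeline and a reference-free scoring function.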

Supporting Datasets

The DatasetEvaluator supports the following question-answering datasets:

  • StrategyQA
  • Ko-StrategyQA
  • Qasper
  • MS-MARCO
  • Mr-Tydi

Supporting Metrics

Each evaluator supports a different set of metrics.

✅ means the evaluator supports the metric, ❌ means it does not, and 🚧 means support is planned for the future.

| Metric | AutoEvaluator | Qasper | Ko-Strategy-QA | Strategy-QA | ms-marco | mr-tydi |
| --- | --- | --- | --- | --- | --- | --- |
| AP | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| NDCG | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| CG | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Ind_DCG | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| IDCG | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Recall | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Precision | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RR | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Hole | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Accuracy | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| EM | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| F1 | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ragas context-recall | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ragas context-precision | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ragas answer-relevancy | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| ragas faithfulness | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| BLEU | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ |
| ROUGE-L | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
| KF1 | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ |
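
For reference, the ranking metrics above (Precision, Recall, RR, AP, NDCG) follow their standard information-retrieval definitions. A minimal sketch over lists of passage IDs — RAGchain's own implementations may differ in detail:

```python
import math
from typing import List, Set, Tuple

def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    # Precision: fraction of retrieved passages that are relevant.
    # Recall: fraction of relevant passages that were retrieved.
    hit = sum(1 for p in retrieved if p in relevant)
    return hit / len(retrieved), hit / len(relevant)

def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    # RR: inverse rank of the first relevant passage (0.0 if none retrieved).
    for rank, p in enumerate(retrieved, start=1):
        if p in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(retrieved: List[str], relevant: Set[str]) -> float:
    # AP: mean of precision@k over the ranks k where a relevant passage appears.
    hits, total = 0, 0.0
    for rank, p in enumerate(retrieved, start=1):
        if p in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(retrieved: List[str], relevant: Set[str]) -> float:
    # Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG.
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, p in enumerate(retrieved, start=1) if p in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), len(retrieved)) + 1))
    return dcg / ideal if ideal else 0.0
```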

Role of the Evaluator in the Framework

The Benchmark module evaluates the performance of a RAG workflow from its generated answers and retrieved passages. Performance is measured with more than 10 metrics, giving a holistic view of the pipeline's behavior and helping identify areas for improvement.
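
On the answer side, EM and F1 are conventionally computed over normalized token overlap. A simplified sketch — the library's exact normalization rules may differ:

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase, strip punctuation, and collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    # EM: 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    # F1: harmonic mean of precision and recall over shared tokens.
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```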

Advantages of Evaluators

  • Comprehensive Evaluation: The evaluators provide a thorough and detailed evaluation of the RAG workflow. They assess the performance of the workflow from multiple aspects and provide a comprehensive view of its effectiveness.

  • Flexible: You can evaluate with your own questions or with question-answering datasets. Plus, you can create your own Evaluator for new datasets.

  • Easy to Use: The evaluators come with a simple interface: provide your pipeline, and they take care of the rest. You don't even have to download the datasets; the evaluators download them automatically.
