RAGChain Docs

BM25 Reranker

Overview

The BM25Reranker is a class that leverages the BM25 ranking function to rerank a list of passages based on their relevance to a given query.

Usage

Initialization

Create an instance of BM25Reranker. The tokenizer_name parameter specifies which tokenizer to use and defaults to "gpt2" when omitted. Any tokenizer name available on Hugging Face can be used.

from RAGchain.reranker import BM25Reranker

reranker = BM25Reranker(tokenizer_name="gpt2")

Rerank

Call the rerank method on your BM25Reranker instance to rerank a list of passages. This method takes as input a query string and a list of Passage objects, and returns a list of Passage objects sorted by their relevance to the query.

query = "What is query decomposition?"
passages = list_of_passages  # Assume list_of_passages (a list of Passage objects) was retrieved earlier

# Use the BM25Reranker instance created in the Initialization step
rerank_passages = reranker.rerank(query, passages)
print(rerank_passages)

In the rerank method, the contents of the passages are first tokenized. Then, a BM25Okapi instance is created with the tokenized content. The BM25 scores of the tokenized query with respect to each tokenized content are calculated. The passages are then sorted by their scores in descending order.
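The steps described above can be sketched in plain Python. This is only an illustration and not the RAGchain implementation. It makes three assumptions: a regex word tokenizer stands in for the Hugging Face tokenizer, plain strings stand in for Passage objects, and a hand-written Okapi BM25 scorer (with a common non-negative IDF variant and the hypothetical defaults k1=1.5, b=0.75) stands in for the BM25Okapi instance. The function names tokenize, bm25_scores, and rerank_contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase word tokenizer, standing in for a Hugging Face tokenizer.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    # Okapi BM25 score of the query against each tokenized document.
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Assumption: non-negative IDF variant; the library's exact formula may differ.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def rerank_contents(query, contents):
    # 1) tokenize contents, 2) score against the tokenized query, 3) sort descending.
    docs_tokens = [tokenize(content) for content in contents]
    scores = bm25_scores(tokenize(query), docs_tokens)
    ranked = sorted(zip(contents, scores), key=lambda pair: pair[1], reverse=True)
    return [content for content, _ in ranked]


contents = [
    "Rerankers reorder retrieved passages.",
    "Query decomposition splits a complex query into simpler sub-queries.",
    "MongoDB stores passages as documents.",
]
reranked = rerank_contents("What is query decomposition?", contents)
print(reranked)
```

In this sketch, only the second string shares terms with the query, so it moves to the front of the list. The other two strings score zero and keep their original relative order because Python's sort is stable.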


Last updated 1 year ago