
Hybrid Retrieval

HybridRetrieval Class Documentation

Overview

The HybridRetrieval class retrieves passages from multiple retrievals and combines their scores using either the Reciprocal Rank Fusion (RRF) algorithm or the Convex Combination (CC) algorithm. RRF calculates the final similarity score of a passage from its rank in each retrieval. CC calculates the final score as a weighted combination, so each retrieval can be given a different weight.

Usage

Initialize

To create an instance of the HybridRetrieval class, you need to provide a list of Retrieval objects.

You can provide a p value, which is the number of passages retrieved from each retrieval before the RRF or CC algorithm runs. If p is too small, the hybrid retrieval may not gather enough passages to reach the top_k value. Use a larger p if your retrievals hold a large number of passages.
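
To see why a small p can fall short of top_k, here is a minimal sketch in plain Python. It assumes that fusion works on the distinct passages found in the top-p results of each retrieval; the helper name candidate_pool and the passage ids are hypothetical and are not part of RAGchain.

```python
def candidate_pool(rankings, p):
    """Collect the distinct passage ids found in the top-p results of each retrieval."""
    pool = []
    seen = set()
    for ranking in rankings:
        for passage_id in ranking[:p]:
            if passage_id not in seen:
                seen.add(passage_id)
                pool.append(passage_id)
    return pool


# Hypothetical rankings from two retrievals, best passage first
bm25_ranking = ["p1", "p2", "p3", "p4", "p5", "p6"]
vector_ranking = ["p2", "p1", "p3", "p7", "p8", "p9"]
top_k = 5

# With p=3 both retrievals return the same three passages, so fewer than top_k candidates exist
small_pool = candidate_pool([bm25_ranking, vector_ranking], p=3)
# With p=6 the pool is large enough to fill top_k
large_pool = candidate_pool([bm25_ranking, vector_ranking], p=6)

print(len(small_pool), len(small_pool) >= top_k)
print(len(large_pool), len(large_pool) >= top_k)
```

When the retrievals overlap heavily, the pool can hold as few as p distinct passages, so choosing p well above top_k is the safer option.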

If you want to use the RRF algorithm, you can provide the rrf_k value, which is a hyperparameter of the RRF algorithm.
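
The role of rrf_k can be illustrated with a standalone sketch of Reciprocal Rank Fusion. This is not RAGchain's internal code: it assumes the common formulation in which each passage scores the sum of 1 / (rrf_k + rank) over the retrievals that returned it, and the function name and passage ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rrf_k=60, top_k=5):
    """Fuse ranked lists of passage ids (best first) with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the best passage in a list adds 1 / (rrf_k + 1)
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties are broken by passage id for determinism
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


# Hypothetical rankings from two retrievals
bm25_ranking = ["p1", "p2", "p3"]
vector_ranking = ["p3", "p1", "p4"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking], rrf_k=60, top_k=3)
for passage_id, score in fused:
    print(passage_id, round(score, 6))
```

A larger rrf_k flattens the difference between neighboring ranks, while a smaller one gives more influence to the passages at the very top of each list.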

If you want to use the CC algorithm, you can provide a list of weights, one for each retrieval, in the same order as the retrievals. The weights should sum to 1.0.
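
As a rough illustration of how such weights can be applied, the sketch below computes a convex combination of scores in plain Python. It rests on assumptions that are not confirmed by this page: scores are min-max normalized per retrieval before weighting, and a passage missing from a retrieval contributes 0 for that retrieval. All function names and scores are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {passage_id: score} dict into the [0, 1] range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are equal, so treat every passage as a full match
        return {pid: 1.0 for pid in scores}
    return {pid: (score - low) / (high - low) for pid, score in scores.items()}


def convex_combination(score_maps, weights, top_k=5):
    """Fuse per-retrieval scores as a weighted sum of normalized scores."""
    if len(score_maps) != len(weights):
        raise ValueError("Provide exactly one weight per retrieval.")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("Weights should sum to 1.0.")
    fused = {}
    for scores, weight in zip(score_maps, weights):
        for pid, score in min_max_normalize(scores).items():
            fused[pid] = fused.get(pid, 0.0) + weight * score
    # Highest fused score first; ties are broken by passage id for determinism
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical raw scores on different scales
bm25_scores = {"p1": 12.0, "p2": 7.0, "p3": 2.0}
vector_scores = {"p3": 0.9, "p1": 0.7, "p4": 0.5}

fused = convex_combination([bm25_scores, vector_scores], weights=[0.6, 0.4])
for pid, score in fused:
    print(pid, round(score, 3))
```

Normalizing first keeps a retrieval with a large raw score range, such as BM25, from dominating the weighted sum regardless of its weight.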

import chromadb
from RAGchain.retrieval import BM25Retrieval, VectorDBRetrieval, HybridRetrieval
from RAGchain.utils.vectorstore import ChromaSlim
from RAGchain.utils.embed import EmbeddingFactory

# initialize various retrievals
chroma = ChromaSlim(
    client=chromadb.PersistentClient(path='your/path/to/chroma'),
    collection_name='your_collection_name',
    embedding_function=EmbeddingFactory('openai').get()
)
bm25_retrieval = BM25Retrieval(save_path='your/path/to/bm25_file.pkl')
vectordb_retrieval = VectorDBRetrieval(chroma)
retrievals = [bm25_retrieval, vectordb_retrieval]

# if you want to use rrf algorithm
hybrid_retrieval_rrf = HybridRetrieval(retrievals, p=100, method='rrf', rrf_k=60)

# if you want to use cc algorithm
weights = [0.6, 0.4]
hybrid_retrieval_cc = HybridRetrieval(retrievals, p=100, method='cc', weights=weights)

Ingest

passages = [ ... ] # your list of Passage objects
hybrid_retrieval_cc.ingest(passages)

Retrieve

Retrieve top-k passages for a given query.

query = "What is AI?"
top_k_passages = hybrid_retrieval_cc.retrieve(query)

Retrieve with filter

You can also filter the retrieved passages. Use the retrieve_with_filter method and provide the query, top-k value, and a list of content, filepath, or metadata values to filter by.

Here's an example:

top_k = 5
filtered_passages = hybrid_retrieval_cc.retrieve_with_filter(query, top_k, filepath=["filepath1", "filepath3"])
# This code searches for the top-5 most similar passages whose filepath is filepath1 or filepath3
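
The effect of a filepath filter on a ranked result can be pictured with the plain-Python sketch below. It shows only the intended outcome, not how RAGchain performs the filtering: the dictionaries stand in for Passage objects, and the helper filter_by_filepath is hypothetical.

```python
def filter_by_filepath(ranked_passages, filepaths, top_k):
    """Keep the first top_k passages whose filepath is in the allowed list."""
    allowed = set(filepaths)
    kept = [passage for passage in ranked_passages if passage["filepath"] in allowed]
    return kept[:top_k]


# Hypothetical ranked result, most similar passage first
ranked_passages = [
    {"id": "a", "filepath": "filepath1"},
    {"id": "b", "filepath": "filepath2"},
    {"id": "c", "filepath": "filepath3"},
    {"id": "d", "filepath": "filepath1"},
]

filtered = filter_by_filepath(ranked_passages, ["filepath1", "filepath3"], top_k=2)
print([passage["id"] for passage in filtered])
```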

The ingest method ingests a list of Passage objects into all retrievals in the hybrid retrieval.

The retrieve_with_filter method uses the DB.search method. Please refer to the DB documentation for further information.