RAGChain Docs
  • Introduction
  • Quick Start
  • Installation
  • RAGchain Structure
    • File Loader
      • Dataset Loader
        • Ko-Strategy-QA Loader
      • Hwp Loader
      • Rust Hwp Loader
      • Win32 Hwp Loader
      • OCR
        • Nougat Loader
        • Mathpix Markdown Loader
        • Deepdoctection Loader
    • Text Spliter
      • Recursive Text Splitter
      • Markdown Header Splitter
      • HTML Header splitter
      • Code splitter
      • Token splitter
    • Retrieval
      • BM25 Retrieval
      • Hybrid Retrieval
      • Hyde Retrieval
      • VectorDB Retrieval
    • LLM
    • DB
      • MongoDB
      • Pickle DB
    • Reranker
      • BM25 Reranker
      • UPR Reranker
      • TART Reranker
      • MonoT5 Reranker
      • LLM Reranker
    • Benchmark
      • Auto Evaluator
      • Dataset Evaluators
        • Qasper
        • Ko-Strategy-QA
        • Strategy-QA
        • ms-marco
  • Utils
    • Query Decomposition
    • Evidence Extractor
    • Embedding
    • Slim Vector Store
      • Pinecone Slim
      • Chroma Slim
    • File Cache
    • Linker
      • Redis Linker
      • Dynamo Linker
      • Json Linker
    • REDE Search Detector
    • Semantic Clustering
  • Pipeline
    • BasicIngestPipeline
    • BasicRunPipeline
    • RerankRunPipeline
    • ViscondeRunPipeline
  • For Advanced RAG
    • Time-Aware RAG
    • Importance-Aware RAG
Powered by GitBook
On this page
  • Overview
  • Supporting Retrievals
  • Role of the Retrievals in RAGchain
  • Advantages of Retrievals
  1. RAGchain Structure

Retrieval

Retrieval Module Documentation

PreviousToken splitterNextBM25 Retrieval

Last updated 1 year ago

Overview

With Retrieval module, you can ingest passage contents to vector representations. Then the module can retrieve the most similar passages for answering to given questions.

Supporting Retrievals

1.

BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document. It uses term frequency (TF) and inverse document frequency (IDF) to calculate a weight for each word in a document.

2.

Hybrid Retrieval combines multiple retrieval methods into one system. It allows you to leverage the strengths of multiple methods at once, potentially leading to better results than using any single method alone.

3.

Hyde is a passage retrieval method that generates hypothetical passages using language models and retrieves passages based on that generated passages.

4.

VectorDBRetrieval uses VectorDB as a backend for storing embedded vectors of passages and retrieving similar vectors given an input query using similarity search.

Role of the Retrievals in RAGchain

The role of these retrievals within our framework is essential as they serve as the backbone for information extraction from large amounts of data or documents based on user queries or needs. They provide efficient ways to search through data by embedding or ranking information so that relevant results can be returned quickly even with massive amounts of data.

Advantages of Retrievals

Each type of retrieval system offers its own unique advantages:

  • BM25: This method is simple yet effective, especially when dealing with unstructured text data where term frequency can indicate relevance.

  • Hybrid: By combining multiple methods, this approach can take advantage of their individual strengths while mitigating their weaknesses.

  • Hyde: This method's use of hypotheticals allows it to perform more precise search.

  • VectorDB: The use of vector embeddings enables this method to capture semantic relationships between words and phrases beyond simple keyword matching.

Remember that no single method will be best for all tasks; you should choose your retrieval system based on your specific needs and constraints.

BM25 Retrieval
Hybrid Retrieval
Hyde Retrieval
VectorDB Retrieval