RAGChain Docs

Hyde Retrieval

HyDERetrieval Class Documentation

Last updated 1 year ago

Overview

The HyDERetrieval class is inspired by the paper "Precise Zero-shot Dense Retrieval without Relevance Labels". It uses a language model to generate a hypothetical passage for a given query, then retrieves passages using that hypothetical passage as the query in place of the original one.
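
To make the idea concrete, here is a minimal, library-free sketch of the two steps described above. It is not RAGchain's implementation: `fake_llm`, `overlap_score`, and `hyde_retrieve` are hypothetical names invented for illustration, the "language model" is a hard-coded stub, and a simple token-overlap score stands in for a real retrieval method such as BM25.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def overlap_score(query: str, passage: str) -> int:
    """Count the tokens shared by query and passage, respecting multiplicity."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(count, p[token]) for token, count in q.items())


def hyde_retrieve(query: str, corpus: List[str],
                  generate: Callable[[str], str], top_k: int = 2) -> List[str]:
    """Step 1: generate a hypothetical passage. Step 2: retrieve with it."""
    hypothetical = generate(query)
    ranked = sorted(corpus,
                    key=lambda passage: overlap_score(hypothetical, passage),
                    reverse=True)
    return ranked[:top_k]


def fake_llm(query: str) -> str:
    # Hypothetical stand-in for a real language model call.
    return "hyde generates a hypothetical document and embeds it for dense retrieval"


corpus = [
    "bm25 is a sparse lexical ranking function",
    "a hypothetical document is generated and used for retrieval",
    "mongodb stores passages as documents",
]
query = "How does HyDE work?"
results = hyde_retrieve(query, corpus, fake_llm, top_k=1)
print(results)
```

In this toy corpus the raw query shares no tokens with any passage, while the hypothetical passage overlaps strongly with the relevant one. That gap between how questions and answers are worded is what the hypothetical passage is meant to bridge.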

Usage

Initialize

First, prepare the retrieval instance you want to use and set up your system prompt. In this example, we are using BM25Retrieval as the base retrieval method and setting up a custom system prompt.

from RAGchain.retrieval import BM25Retrieval, HyDERetrieval

test_prompt = "Please write a scientific paper passage to answer the question"
bm25_retrieval = BM25Retrieval(save_path="path/to/your/bm25/save_path")
hyde_retrieval = HyDERetrieval(bm25_retrieval, system_prompt=test_prompt)

Ingest

Ingest a list of Passage objects into the base retrieval wrapped by the HyDE retrieval.

passages = [ ... ] # your list of Passage objects
hyde_retrieval.ingest(passages)

Retrieve

Retrieve the top-k passages for a given query. You can also pass model kwargs, such as max_tokens, to the model that generates the hypothetical passage. For the available model kwargs, see the OpenAI API docs.

query = "What is visconde structure?"
top_k = 5
top_k_passages = hyde_retrieval.retrieve(query, top_k=top_k, model_kwargs={'max_tokens': 64})

Retrieve with filter

You can also filter the retrieved passages. Use the retrieve_with_filter method and provide the query, top-k value, and a list of content, filepath, or metadata values to filter by.

Here's an example:

filtered_passages = hyde_retrieval.retrieve_with_filter(query, top_k, filepath=["filepath1", "filepath3"])
# Returns the 5 passages most similar to the query among those whose filepath is filepath1 or filepath3

This method uses the DB.search method. Please refer to the documentation of DB.search for further information.
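
The filtering idea can be sketched without the library. This page does not say whether RAGchain filters before or after ranking, so the example below simply assumes a post-filter over an already ranked list. `ToyPassage` and `filter_ranked` are hypothetical names for illustration, not part of RAGchain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyPassage:
    # Hypothetical stand-in, not RAGchain's Passage class.
    content: str
    filepath: str


def filter_ranked(ranked: List[ToyPassage], top_k: int,
                  filepath: Optional[List[str]] = None) -> List[ToyPassage]:
    """Keep rank order, drop passages with a disallowed filepath, cut to top_k."""
    if filepath is None:
        return ranked[:top_k]
    allowed = set(filepath)
    return [p for p in ranked if p.filepath in allowed][:top_k]


# Assume this list is already sorted from most to least similar.
ranked = [
    ToyPassage("a", "filepath2"),
    ToyPassage("b", "filepath1"),
    ToyPassage("c", "filepath3"),
    ToyPassage("d", "filepath1"),
]
kept = filter_ranked(ranked, top_k=2, filepath=["filepath1", "filepath3"])
print([p.content for p in kept])
```

A post-filter like this can return fewer than top_k passages when few candidates match, which is one reason an implementation might fetch more candidates than top_k before filtering.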

Reference: Precise Zero-shot Dense Retrieval without Relevance Labels (the HyDE paper)