HyDE Retrieval

HyDERetrieval Class Documentation

Overview

The HyDERetrieval class is inspired by the paper "Precise Zero-shot Dense Retrieval without Relevance Labels". It uses a language model to generate a hypothetical passage for a given query and then retrieves passages using this hypothetical passage as the query.
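
The two-step idea described above can be sketched without any library. This is a minimal illustration, not RAGchain's implementation: `hyde_retrieve`, `fake_llm`, `keyword_search`, and `CORPUS` are hypothetical names, the "language model" is a stub that returns a fixed passage, and the search is a naive word-overlap ranking standing in for a real retrieval such as BM25.

```python
from typing import Callable, List


def hyde_retrieve(
    query: str,
    generate: Callable[[str], str],
    search: Callable[[str, int], List[str]],
    top_k: int = 5,
) -> List[str]:
    """Generate a hypothetical passage for the query, then search with it."""
    hypothetical_passage = generate(query)  # in practice, a language model call
    # The hypothetical passage, not the original query, is what gets searched.
    return search(hypothetical_passage, top_k)


# Toy stand-ins below are assumptions for illustration only.
CORPUS = [
    "Water boils at 100 degrees Celsius at sea level.",
    "BM25 ranks documents by term frequency.",
    "Bananas are rich in potassium.",
]


def fake_llm(query: str) -> str:
    # A real model would write a plausible answer passage for the query.
    return "Water boils at 100 degrees Celsius."


def keyword_search(text: str, top_k: int) -> List[str]:
    # Rank corpus passages by the number of words shared with the search text.
    words = set(text.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda p: len(words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


results = hyde_retrieve(
    "At what temperature does water boil?", fake_llm, keyword_search, top_k=1
)
print(results)
```

Passing the generator and the search function as arguments mirrors how HyDERetrieval wraps an existing retrieval instance: the base retrieval stays unchanged, and only the text it is asked to search with differs.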

Usage

Initialize

First, prepare the retrieval instance you want to use and set up your system prompt. In this example, we are using BM25Retrieval as the base retrieval method and setting up a custom system prompt.

from RAGchain.retrieval import BM25Retrieval, HyDERetrieval

test_prompt = "Please write a scientific paper passage to answer the question"
bm25_retrieval = BM25Retrieval(save_path="path/to/your/bm25/save_path")
hyde_retrieval = HyDERetrieval(bm25_retrieval, system_prompt=test_prompt)

Ingest

Ingest a list of Passage objects into the underlying retrieval through the HyDE retrieval.

passages = [ ... ] # your list of Passage objects
hyde_retrieval.ingest(passages)

Retrieve

Retrieve the top-k passages for a given query. You can also pass model kwargs, such as max_tokens, to the model that generates the hypothetical passage. See the OpenAI API docs for the available model kwargs.

query = "What is visconde structure?"
top_k = 5
top_k_passages = hyde_retrieval.retrieve(query, top_k=top_k, model_kwargs={'max_tokens': 64})

Retrieve with filter

You can also filter the retrieved passages. Use the retrieve_with_filter method and provide the query, top-k value, and a list of content, filepath, or metadata values to filter by.

This method uses the DB.search method. Please refer here for further information.

Here's an example:

filtered_passages = hyde_retrieval.retrieve_with_filter(query, top_k, filepath=["filepath1", "filepath3"])
# Searches for the top-5 most similar passages whose filepath is filepath1 or filepath3
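
To make the filtering semantics concrete, here is a small library-free sketch. It assumes that a passage is kept when its filepath matches any value in the list, and that filtering is applied to an already ranked list before cutting to top-k. The source does not state how RAGchain orders these steps, so treat this as an illustration only. `ToyPassage` and `filter_then_top_k` are hypothetical names, not part of RAGchain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyPassage:  # hypothetical stand-in, not RAGchain's Passage class
    content: str
    filepath: str


def filter_then_top_k(
    ranked: List[ToyPassage],
    top_k: int,
    filepath: Optional[List[str]] = None,
) -> List[ToyPassage]:
    """Keep passages whose filepath is in the allowed list, then cut to top_k."""
    if filepath is not None:
        allowed = set(filepath)
        ranked = [p for p in ranked if p.filepath in allowed]
    return ranked[:top_k]


# Passages listed from most to least similar (order assumed for the example).
ranked_passages = [
    ToyPassage("a", "filepath2"),
    ToyPassage("b", "filepath1"),
    ToyPassage("c", "filepath3"),
    ToyPassage("d", "filepath1"),
]

kept = filter_then_top_k(ranked_passages, top_k=2, filepath=["filepath1", "filepath3"])
print([p.content for p in kept])
```

Under these assumptions, the passage from filepath2 is dropped even though it ranks first, and the remaining passages keep their relative order.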
