Hyde Retrieval
HyDERetrieval Class Documentation
Overview
The HyDERetrieval
class is inspired by the paper "Precise Zero-shot Dense Retrieval without Relevance Labels". It uses a language model to generate a hypothetical passage for a given query and then retrieves passages using this hypothetical passage as the query.
Usage
Initialize
First, prepare the retrieval instance you want to use and set up your system prompt. In this example, we are using BM25Retrieval
as the base retrieval method and setting up a custom system prompt.
from RAGchain.retrieval import BM25Retrieval, HyDERetrieval
test_prompt = "Please write a scientific paper passage to answer the question"
bm25_retrieval = BM25Retrieval(save_path="path/to/your/bm25/save_path")
hyde_retrieval = HyDERetrieval(bm25_retrieval, system_prompt=test_prompt)
Ingest
Ingest a list of Passage
s into the retrieval in the HyDE retrieval.
passages = [ ... ] # your list of Passage objects
hyde_retrieval.ingest(passages)
Retrieve
Retrieve top-k passages for a given query. You can also specify model kwargs such as max tokens for hypothetical passage generation model. modle kwargs reference is in openai api docs.
query = "What is visconde structure?"
top_k = 5
top_k_passages = hyde_retrieval.retrieve(query, top_k=top_k, model_kwargs={'max_tokens': 64})
Retrieve with filter
You can also filter the retrieved passages. Use the retrieve_with_filter
method and provide the query, top-k value, and a list of content, filepath, or metadata values to filter by.
In this method uses DB.search
method. Please refer here for further information.
Here's an example:
filtered_passages = hyde_retrieval.retrieve_with_filter(query, top_k, filepath=["filepath1", "filepath3"])
# This code will search top-5 most similar passages with filepath1 and filepath3
Last updated