RAGChain Docs

VectorDB Retrieval

VectorDBRetrieval Class Documentation


Last updated 1 year ago

Overview

VectorDBRetrieval is a retrieval class that uses a VectorDB as its backend. It is the easiest way to use dense retrieval in RAGchain.

During ingestion, it embeds each passage's content with an embedding model and stores the resulting vector in the VectorDB. At retrieval time, it embeds the query, searches the VectorDB for the most similar vectors, and returns the passages those vectors belong to.
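
The embed, store, and search flow described above can be sketched in plain Python. This is only a toy illustration, not RAGchain's implementation: `VOCAB`, `toy_embed`, and `ToyVectorStore` are hypothetical names, and the word-count "embedding" stands in for a real embedding model.

```python
import math

# Hypothetical toy vocabulary; a real embedding model needs no such list.
VOCAB = ["chroma", "embedding", "vectors", "disk",
         "bm25", "term", "frequency", "cat", "mat"]


def toy_embed(text):
    """Stand-in for an embedding model: vocabulary word counts, L2-normalized."""
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyVectorStore:
    """In-memory stand-in for a VectorDB."""

    def __init__(self):
        self.rows = []  # (passage_id, vector, content)

    def add(self, passage_id, vector, content):
        self.rows.append((passage_id, vector, content))

    def search(self, query_vector, top_k):
        scored = []
        for passage_id, vector, content in self.rows:
            # Dot product of normalized vectors is their cosine similarity.
            score = sum(q * v for q, v in zip(query_vector, vector))
            scored.append((score, passage_id, content))
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


passages = {
    "p1": "chroma stores embedding vectors on disk",
    "p2": "bm25 ranks passages by term frequency",
    "p3": "the cat sat on the mat",
}

# Ingest: embed every passage and store its vector.
store = ToyVectorStore()
for passage_id, content in passages.items():
    store.add(passage_id, toy_embed(content), content)

# Retrieve: embed the query and look up the most similar vectors.
query = "where are embedding vectors kept"
top_hits = store.search(toy_embed(query), top_k=2)
for score, passage_id, content in top_hits:
    print(f"{passage_id} ({score:.3f}): {content}")
```

The passage that shares the most vocabulary with the query ranks first. With a real embedding model, the same ranking step works on semantic similarity rather than word overlap.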

Usage

Initialize

First, prepare the VectorDB instance you want to use and set up your database. In this example, we use Chroma and ChromaSlim as our VectorDB. ChromaSlim is a VectorStore that stores only the passage ID and the embedding vector, which is optimized for RAGchain. The Chroma client requires a path to save the database and an embedding function.

  • Using Langchain Chroma VectorStore

from langchain.vectorstores import Chroma
import chromadb
from RAGchain.utils.embed import EmbeddingFactory
from RAGchain.retrieval import VectorDBRetrieval

chroma_path = "path/to/your/chroma"
embedding_function = EmbeddingFactory('openai').get()
client = chromadb.PersistentClient(path=chroma_path)

chroma = Chroma(client=client,
                collection_name='your_collection_name',
                embedding_function=embedding_function)

vectordb_retrieval = VectorDBRetrieval(vectordb=chroma)

  • Using RAGchain SlimVectorStore

from RAGchain.utils.vectorstore import ChromaSlim
import chromadb
from RAGchain.utils.embed import EmbeddingFactory
from RAGchain.retrieval import VectorDBRetrieval

chroma_path = "path/to/your/chroma"
embedding_function = EmbeddingFactory('openai').get()
client = chromadb.PersistentClient(path=chroma_path)

chroma = ChromaSlim(client=client,
                    collection_name='your_collection_name',
                    embedding_function=embedding_function)

slim_vectordb_retrieval = VectorDBRetrieval(vectordb=chroma)
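
To make the difference between the two stores concrete, here is a minimal sketch of the idea behind a slim vector store. It assumes the passage text lives in a separate passage DB and is looked up by ID after the vector search. `FullRecord`, `SlimRecord`, and `passage_db` are hypothetical names for illustration, not RAGchain classes.

```python
from dataclasses import dataclass


@dataclass
class FullRecord:
    """What a regular vector store typically keeps for each passage."""
    passage_id: str
    vector: list
    content: str
    metadata: dict


@dataclass
class SlimRecord:
    """What a slim store keeps: only the ID and the embedding vector."""
    passage_id: str
    vector: list


# Hypothetical passage DB that owns the text (assumption for this sketch).
passage_db = {
    "p1": "Slim stores keep only IDs and vectors.",
    "p2": "BM25 is a sparse retrieval method.",
}

# Hand-written 2-dimensional vectors instead of real embeddings.
slim_index = [
    SlimRecord(passage_id="p1", vector=[1.0, 0.0]),
    SlimRecord(passage_id="p2", vector=[0.0, 1.0]),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Step 1: the vector search returns only a passage ID.
query_vector = [0.9, 0.1]
best = max(slim_index, key=lambda record: dot(query_vector, record.vector))
best_id = best.passage_id

# Step 2: the content is resolved from the passage DB by that ID.
best_content = passage_db[best_id]
print(best_id, best_content)
```

Because the text is not duplicated inside the vector store, each stored record stays small, at the cost of one extra lookup by ID.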

Ingest

Ingest a list of passages into your retrieval system.

passages = [...] # your list of passages here
slim_vectordb_retrieval.ingest(passages)

Retrieve

Retrieve top-k passages for a given query.

query = "What's the main advantage of using Slim Vector store?"
top_k_passages = slim_vectordb_retrieval.retrieve(query=query, top_k=5)

Retrieve with filter

You can also filter the retrieved passages. Use the retrieve_with_filter method and provide the query, top-k value, and a list of content, filepath, or metadata values to filter by.

Here's an example:

top_k = 5
filtered_passages = slim_vectordb_retrieval.retrieve_with_filter(query, top_k, filepath=["filepath1", "filepath3"])
# Returns the top-5 most similar passages whose filepath is filepath1 or filepath3
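
The filter semantics can be illustrated with a small, self-contained sketch. It assumes that a list of values means "match any of these" and that filtering is applied to results already ranked by similarity. `ToyPassage` and `toy_retrieve_with_filter` are hypothetical helpers, not the RAGchain implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyPassage:
    id: str
    content: str
    filepath: str


def toy_retrieve_with_filter(ranked, top_k, filepath=None):
    """Walk passages in ranked order and keep those with an allowed filepath."""
    allowed = set(filepath) if filepath is not None else None
    kept = []
    for passage in ranked:
        if len(kept) >= top_k:
            break
        if allowed is None or passage.filepath in allowed:
            kept.append(passage)
    return kept


# Passages listed from most to least similar to the query.
ranked = [
    ToyPassage("a", "most similar passage", "filepath2"),
    ToyPassage("b", "second most similar", "filepath1"),
    ToyPassage("c", "third most similar", "filepath3"),
    ToyPassage("d", "fourth most similar", "filepath1"),
]

filtered = toy_retrieve_with_filter(ranked, top_k=2,
                                    filepath=["filepath1", "filepath3"])
print([passage.id for passage in filtered])
```

Passage "a" is the closest match overall but is skipped because its filepath is not in the list, so the result is filled from the next matching passages in rank order.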

This method uses the DB.search method internally. Please refer to the DB documentation for further information.
