VectorDB Retrieval

VectorDBRetrieval Class Documentation

Overview

The VectorDBRetrieval class is a retrieval class that uses a VectorDB as its backend. It lets you use dense retrieval easily.

It first embeds the passage content using an embedding model, then stores the embedded vectors in the VectorDB. When retrieving, it embeds the query, searches the VectorDB for the most similar vectors, and returns the passages associated with those vectors.

Usage

Initialize

First, prepare the VectorDB instance you want to use and set up your database. In this example, we use Chroma and ChromaSlim as our VectorDB. ChromaSlim is a VectorStore that stores only the passage ID and embedding vector, which is optimized for RAGchain. The Chroma client requires a path to save the database and an embedding function.

  • Using Langchain Chroma VectorStore

from langchain.vectorstores import Chroma
import chromadb
from RAGchain.utils.embed import EmbeddingFactory
from RAGchain.retrieval import VectorDBRetrieval

# Path where the Chroma database will be persisted
chroma_path = "path/to/your/chroma"
# Embedding function used to embed both passages and queries
embedding_function = EmbeddingFactory('openai').get()
client = chromadb.PersistentClient(path=chroma_path)

# Wrap the Chroma client in a Langchain Chroma VectorStore
chroma = Chroma(client=client,
                collection_name='your_collection_name',
                embedding_function=embedding_function)

vectordb_retrieval = VectorDBRetrieval(vectordb=chroma)

  • Using RAGchain SlimVectorStore
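
The sketch below mirrors the Chroma example above but wraps the same client in ChromaSlim. Note that the import path RAGchain.utils.vectorstore.ChromaSlim and the constructor arguments are assumptions based on the Chroma example; check the RAGchain source for the exact signature.

from RAGchain.utils.vectorstore import ChromaSlim  # assumed import path
from RAGchain.utils.embed import EmbeddingFactory
from RAGchain.retrieval import VectorDBRetrieval
import chromadb

chroma_path = "path/to/your/chroma"
embedding_function = EmbeddingFactory('openai').get()
client = chromadb.PersistentClient(path=chroma_path)

# ChromaSlim stores only passage IDs and embedding vectors
# (assumed constructor, mirroring the Chroma example above)
chroma_slim = ChromaSlim(client=client,
                         collection_name='your_collection_name',
                         embedding_function=embedding_function)

vectordb_retrieval = VectorDBRetrieval(vectordb=chroma_slim)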

Ingest

Ingest a list of passages into your retrieval system.
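
A minimal sketch, assuming passages is a list of RAGchain Passage objects produced elsewhere (for example by a file loader and a text splitter):

# passages: a list of RAGchain Passage objects created elsewhere,
# e.g. by a file loader and a text splitter
vectordb_retrieval.ingest(passages)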

Retrieve

Retrieve top-k passages for a given query.
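
A minimal sketch of a retrieval call, using the retriever created in the Initialize example above; it assumes each returned passage exposes a content attribute as described in the Overview:

query = "your question here"
# Retrieve the 5 passages most similar to the query
retrieved_passages = vectordb_retrieval.retrieve(query, top_k=5)
for passage in retrieved_passages:
    print(passage.content)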

Retrieve with filter

You can also filter the retrieved passages. Use the retrieve_with_filter method and provide the query, top-k value, and a list of content, filepath, or metadata values to filter by.

This method uses the DB.search method internally. Please refer to the DB documentation for further information.

Here's an example:
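
This is a minimal sketch; the content and filepath keyword arguments are assumptions based on the description above, so check the RAGchain source for the exact parameter names.

# Retrieve the top 5 passages for the query, keeping only passages whose
# content or filepath matches one of the given values (assumed keyword names)
filtered_passages = vectordb_retrieval.retrieve_with_filter(
    "your question here",
    top_k=5,
    content=["exact content to match"],
    filepath=["path/to/your/file.txt"]
)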
