BasicIngestPipeline
BasicIngestPipeline Class Documentation
Overview
The BasicIngestPipeline class handles the ingestion of documents into a database and a retrieval system. It is a simple pipeline intended for beginners. It loads files from a directory using a file loader, splits each document into passages using a text splitter, saves the passages to a database, and ingests the passages into a retrieval module.
Usage
Initialize
The BasicIngestPipeline class is initialized with the following parameters:
file_loader: File loader used to load documents. You can use any file loader from Langchain and RAGchain.
db: Database to save passages.
retrieval: Retrieval module to ingest passages.
text_splitter: Text splitter to split documents into passages. Default is RecursiveTextSplitter.
ignore_existed_file: If True, ignore files that already exist in the database. Default is True. It uses FileCache internally.
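The effect of ignore_existed_file can be pictured with a small sketch. This is not RAGchain's FileCache implementation; the function name skip_existing_files and the sample paths are hypothetical, and the real cache may identify files differently. It only illustrates the idea of dropping files that were already ingested before the rest of the pipeline runs.

```python
from typing import Iterable, List, Set


def skip_existing_files(paths: Iterable[str], known_paths: Set[str]) -> List[str]:
    """Return only the paths that are not already recorded as ingested.

    Hypothetical helper for illustration; not part of RAGchain.
    """
    return [path for path in paths if path not in known_paths]


# Hypothetical example: "docs/a.txt" was ingested in an earlier run.
already_ingested = {"docs/a.txt"}
candidates = ["docs/a.txt", "docs/b.txt"]
new_files = skip_existing_files(candidates, already_ingested)
print(new_files)  # ['docs/b.txt']
```

With ignore_existed_file set to False, the equivalent of this filtering step would simply be skipped and every loaded file would be processed again.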
Run
The run method executes the ingest pipeline. It takes an optional target_dir parameter, which specifies the directory to load documents from. If target_dir is not provided, it uses the target_dir from the file loader that was passed in when the pipeline was initialized.
This method will load the documents, split them into passages, save the passages to the database, and ingest the passages into the retrieval module.
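The four steps of run (load, split, save, ingest) and the target_dir fallback can be sketched with stand-in objects. Everything below is an assumption made for illustration: TextFileLoader, split_into_passages, and SketchIngestPipeline are hypothetical names, plain lists stand in for the database and the retrieval module, and none of this is RAGchain's actual code.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


class TextFileLoader:
    """Hypothetical loader: reads every .txt file in a directory."""

    def __init__(self, target_dir: str) -> None:
        self.target_dir = target_dir

    def load(self, target_dir: Optional[str] = None) -> List[str]:
        # Fall back to the directory given at construction time.
        directory = Path(target_dir or self.target_dir)
        return [p.read_text(encoding="utf-8") for p in sorted(directory.glob("*.txt"))]


def split_into_passages(document: str, chunk_size: int = 20) -> List[str]:
    """Naive fixed-width splitter, standing in for a real text splitter."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]


class SketchIngestPipeline:
    """Hypothetical pipeline mirroring the steps described above."""

    def __init__(self, file_loader: TextFileLoader, db: list, retrieval: list) -> None:
        self.file_loader = file_loader
        self.db = db                # a plain list stands in for the database
        self.retrieval = retrieval  # a plain list stands in for the retrieval module

    def run(self, target_dir: Optional[str] = None) -> None:
        documents = self.file_loader.load(target_dir)  # 1. load documents
        passages = [p for doc in documents for p in split_into_passages(doc)]  # 2. split
        self.db.extend(passages)         # 3. save passages to the database
        self.retrieval.extend(passages)  # 4. ingest passages into retrieval


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "note.txt").write_text("x" * 45, encoding="utf-8")
    db, retrieval = [], []
    pipeline = SketchIngestPipeline(TextFileLoader(tmp), db, retrieval)
    pipeline.run()  # no target_dir given, so the loader's own directory is used
    print(len(db), len(retrieval))  # 3 3
```

The same passages go to both the database and the retrieval stand-in, which reflects the description above: the database keeps the passage contents, while the retrieval module indexes them for search.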