
BasicIngestPipeline

BasicIngestPipeline Class Documentation


Last updated 1 year ago

Overview

The BasicIngestPipeline class handles the ingestion of documents into a DB and a retrieval system. It is a simple pipeline for beginners. It loads files from a directory using a file loader, splits each document into passages using a text splitter, saves the passages to a database, and ingests the passages into a retrieval module.
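
To make the four stages concrete, here is a minimal, library-free sketch of the same load, split, save, ingest flow. Every name in it (ToyPassage, toy_load, toy_split, ToyIngest) is a hypothetical stand-in invented for illustration, not a RAGchain class, and the fixed-size splitter and word index are assumptions standing in for a real text splitter and retrieval module.

```python
# Toy illustration only: none of these names belong to RAGchain.
from dataclasses import dataclass, field


@dataclass
class ToyPassage:
    source: str   # file the passage came from
    content: str  # passage text


def toy_load(files: dict) -> list:
    """Stage 1: 'load' documents as (source, text) pairs."""
    return list(files.items())


def toy_split(source: str, text: str, chunk_size: int = 20) -> list:
    """Stage 2: split one document into fixed-size passages."""
    return [ToyPassage(source, text[i:i + chunk_size])
            for i in range(0, len(text), chunk_size)]


@dataclass
class ToyIngest:
    db: list = field(default_factory=list)     # stage 3 target: passage store
    index: dict = field(default_factory=dict)  # stage 4 target: word -> passage ids

    def run(self, files: dict) -> None:
        for source, text in toy_load(files):
            for passage in toy_split(source, text):
                # Stage 3: save the passage to the "database".
                self.db.append(passage)
                passage_id = len(self.db) - 1
                # Stage 4: ingest the passage into a toy word index.
                for word in passage.content.lower().split():
                    self.index.setdefault(word, set()).add(passage_id)


toy = ToyIngest()
toy.run({"a.txt": "hello world " * 5})
print(len(toy.db), sorted(toy.index["hello"]))
```

The point of the sketch is the ordering: passages are written to the store before they are indexed, so the index only ever refers to passages that already exist in the store.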

Usage

Initialize

The BasicIngestPipeline class is initialized with the following parameters:

  • file_loader: File loader to load documents. You can use any file loader from Langchain and RAGchain.

  • db: Database to save passages.

  • retrieval: Retrieval module to ingest passages.

  • text_splitter: Text splitter to split documents into passages. Default is RecursiveTextSplitter.

  • ignore_existed_file: If True, skip files that already exist in the database. Default is True. It uses FileCache internally.

from RAGchain.pipeline.basic import BasicIngestPipeline
from RAGchain.DB import PickleDB
from RAGchain.retrieval import BM25Retrieval
from RAGchain.preprocess.loader import FileLoader

file_loader = FileLoader(target_dir="your/path/to/file/dir")
db = PickleDB("your/path/to/pickle.pkl")
retrieval = BM25Retrieval(save_path="your/path/to/bm25.pkl")
pipeline = BasicIngestPipeline(file_loader=file_loader, db=db, retrieval=retrieval)
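
The effect of ignore_existed_file can be pictured with a small sketch. The helper below, filter_new_files, is hypothetical and is not how the library implements the check (the page only says it is handled internally); it simply assumes that "existing" means the file path was already recorded during an earlier ingest.

```python
# Hypothetical helper for illustration; not part of RAGchain.
def filter_new_files(candidate_paths, ingested_paths, ignore_existed_file=True):
    """Return the paths that should be ingested on this run."""
    if not ignore_existed_file:
        # Re-ingest everything, including files seen before.
        return list(candidate_paths)
    seen = set(ingested_paths)
    # Keep original order, dropping anything already ingested.
    return [path for path in candidate_paths if path not in seen]


already_ingested = ["docs/a.pdf"]
found_on_disk = ["docs/a.pdf", "docs/b.pdf"]
print(filter_new_files(found_on_disk, already_ingested))
```

With the default of True, re-running an ingest over the same directory should only process new files, which keeps duplicate passages out of the database.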

Run

The run method executes the ingest pipeline. It takes an optional target_dir parameter, which specifies the target directory to load documents from. If target_dir is not provided, it uses the target_dir from the file loader that was passed in during the initialization of the pipeline.

pipeline.run()

This method will load the documents, split them into passages, save the passages to the database, and ingest the passages into the retrieval module.
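
The target_dir fallback described for run amounts to a simple "argument wins, otherwise use the loader's value" rule. The function below is a hypothetical illustration of that rule only; the name resolve_target_dir is an assumption and does not exist in RAGchain.

```python
from typing import Optional


# Hypothetical illustration of the fallback rule; not a RAGchain function.
def resolve_target_dir(loader_target_dir: str, target_dir: Optional[str] = None) -> str:
    """Pick the directory an ingest run would read from."""
    if target_dir is not None:
        return target_dir       # explicit argument to run() takes priority
    return loader_target_dir    # otherwise fall back to the file loader's directory


print(resolve_target_dir("your/path/to/file/dir"))
print(resolve_target_dir("your/path/to/file/dir", "another/dir"))
```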
