# BasicIngestPipeline

## Overview

The `BasicIngestPipeline` class ingests documents into a database and a retrieval system. It is a simple pipeline intended for beginners. It runs four steps in order: it loads files from a directory using a file loader, splits each document into passages using a text splitter, saves the passages to a database, and ingests the passages into a retrieval module.
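To make the order of the four steps concrete, here is a minimal sketch that uses only the Python standard library. It is not RAGchain code: `load_files`, `split_text`, `toy_ingest`, the fixed-size splitting, and the plain lists standing in for the database and the retrieval index are all hypothetical stand-ins chosen for illustration.

```python
import tempfile
from pathlib import Path


def load_files(target_dir):
    """Step 1 (hypothetical loader): read every .txt file as (path, text)."""
    return [(str(p), p.read_text(encoding="utf-8"))
            for p in sorted(Path(target_dir).rglob("*.txt"))]


def split_text(text, chunk_size=50):
    """Step 2 (hypothetical splitter): cut text into fixed-size passages."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_ingest(target_dir, db, index, chunk_size=50):
    """Run load -> split -> save -> ingest; db and index are plain lists."""
    for path, text in load_files(target_dir):
        passages = [{"filepath": path, "content": chunk}
                    for chunk in split_text(text, chunk_size)]
        db.extend(passages)                            # Step 3: save passages
        index.extend(p["content"] for p in passages)   # Step 4: ingest passages


# Try it on a temporary directory holding one 120-character file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("x" * 120, encoding="utf-8")
    toy_db, toy_index = [], []
    toy_ingest(tmp, toy_db, toy_index)
    passage_count = len(toy_db)
```

In the real pipeline each stand-in is replaced by the component passed to the constructor, so the loader, splitter, database, and retrieval module can be swapped independently.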

## Usage

### Initialize

The `BasicIngestPipeline` class is initialized with the following parameters:

* [`file_loader`](/ragchain-docs/ragchain-structure/file-loader.md): File loader used to load documents. You can use any file loader from Langchain or RAGchain.
* [`db`](/ragchain-docs/ragchain-structure/db.md): Database in which passages are saved.
* [`retrieval`](/ragchain-docs/ragchain-structure/retrieval.md): Retrieval module into which passages are ingested.
* [`text_splitter`](/ragchain-docs/ragchain-structure/text-splitter.md): Text splitter used to split documents into passages. Default is [`RecursiveTextSplitter`](/ragchain-docs/ragchain-structure/text-splitter/recursive-text-splitter.md).
* `ignore_existed_file`: If True, files that already exist in the database are skipped. Default is True. It uses [`FileCache`](/ragchain-docs/utils/file-cache.md) internally.

```python
from RAGchain.pipeline.basic import BasicIngestPipeline
from RAGchain.DB import PickleDB
from RAGchain.retrieval import BM25Retrieval
from RAGchain.preprocess.loader import FileLoader

# Loader that reads documents from the given directory
file_loader = FileLoader(target_dir="your/path/to/file/dir")
# Database where the split passages are saved
db = PickleDB("your/path/to/pickle.pkl")
# Retrieval module that the passages are ingested into
retrieval = BM25Retrieval(save_path="your/path/to/bm25.pkl")
# text_splitter and ignore_existed_file keep their defaults here
pipeline = BasicIngestPipeline(file_loader=file_loader, db=db, retrieval=retrieval)
```
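The effect of `ignore_existed_file` can be pictured as a filter applied to loaded documents before they are split. The sketch below is only an illustration under an assumption: it compares documents by file path, and `drop_existing`, `stored_paths`, and the dictionary layout are hypothetical names, not the `FileCache` API.

```python
def drop_existing(documents, stored_paths):
    """Keep only documents whose source path is not already stored.

    Assumption for illustration: a file "exists" if its path is known.
    """
    return [doc for doc in documents if doc["filepath"] not in stored_paths]


# One file was ingested earlier; a second one is new.
stored_paths = {"docs/a.txt"}
loaded = [
    {"filepath": "docs/a.txt", "content": "already ingested"},
    {"filepath": "docs/b.txt", "content": "new document"},
]
fresh = drop_existing(loaded, stored_paths)
fresh_paths = [doc["filepath"] for doc in fresh]
```

Skipping known files this way means the pipeline can be re-run on the same directory without saving the same passages twice.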

### Run

The `run` method executes the ingest pipeline. It takes an optional `target_dir` parameter that specifies the directory to load documents from. If `target_dir` is not provided, the pipeline uses the `target_dir` of the file loader that was passed in when the pipeline was initialized.

```python
pipeline.run()
```

This method will load the documents, split them into passages, save the passages to the database, and ingest the passages into the retrieval module.
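The `target_dir` fallback described for `run` follows a common pattern: an explicit argument wins, otherwise the value stored on the loader is used. The sketch below shows that pattern with hypothetical classes (`ToyLoader`, `ToyPipeline`, `resolve_target_dir`); it is not the library's implementation.

```python
class ToyLoader:
    """Hypothetical stand-in for a file loader that remembers its directory."""

    def __init__(self, target_dir):
        self.target_dir = target_dir


class ToyPipeline:
    """Hypothetical stand-in showing only the directory fallback."""

    def __init__(self, file_loader):
        self.file_loader = file_loader

    def resolve_target_dir(self, target_dir=None):
        # Use the explicit argument if given, else the loader's directory.
        if target_dir is not None:
            return target_dir
        return self.file_loader.target_dir


toy = ToyPipeline(ToyLoader(target_dir="default/dir"))
default_choice = toy.resolve_target_dir()
override_choice = toy.resolve_target_dir("other/dir")
```

With the real pipeline, the two cases correspond to calling `pipeline.run()` and calling `run` with a `target_dir` argument.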

