RAGChain Docs

Recursive Text Splitter


Last updated 1 year ago

Overview

RecursiveTextSplitter splits a document into passages by recursively splitting it on a list of separators. You can also set a chunk size and an overlap size (chunk_size and chunk_overlap), so the document is split into passages that overlap with their neighbors.

Most of its features are similar to LangChain's RecursiveCharacterTextSplitter.
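The recursive strategy can be illustrated with a simplified, self-contained sketch. It is not RAGchain's or LangChain's source code: the functions recursive_split and _merge_pieces are hypothetical, lengths are assumed to be counted in characters, and the separator list is an assumption modeled on LangChain's defaults. The sketch tries the coarsest separator present in the text, retries any piece still longer than chunk_size with finer separators, and merges small neighboring pieces into chunks that carry up to chunk_overlap characters over from the previous chunk.

```python
from typing import List, Optional

# Hypothetical sketch, not RAGchain's implementation.
# Assumed default separators, modeled on LangChain's RecursiveCharacterTextSplitter:
# paragraphs, then lines, then words, then single characters.
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def _merge_pieces(pieces: List[str], sep: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedily join small pieces into chunks, carrying a tail of up to chunk_overlap chars forward."""
    chunks: List[str] = []
    window: List[str] = []
    length = 0  # always equals len(sep.join(window))
    for piece in pieces:
        added = len(piece) + (len(sep) if window else 0)
        if window and length + added > chunk_size:
            chunks.append(sep.join(window))
            # Drop pieces from the front until the kept tail fits the overlap budget
            # and there is room for the incoming piece.
            while window and (length > chunk_overlap or length + len(sep) + len(piece) > chunk_size):
                length -= len(window[0]) + (len(sep) if len(window) > 1 else 0)
                window.pop(0)
            added = len(piece) + (len(sep) if window else 0)
        window.append(piece)
        length += added
    if window:
        chunks.append(sep.join(window))
    return chunks


def recursive_split(text: str, chunk_size: int, chunk_overlap: int,
                    separators: Optional[List[str]] = None) -> List[str]:
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    separators = DEFAULT_SEPARATORS if separators is None else separators
    # Use the coarsest separator that actually occurs in the text ("" always matches).
    sep, finer = separators[-1], []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break
    pieces = [p for p in (list(text) if sep == "" else text.split(sep)) if p]
    chunks: List[str] = []
    small: List[str] = []  # pieces that already fit, waiting to be merged
    for piece in pieces:
        if len(piece) <= chunk_size:
            small.append(piece)
            continue
        # Oversized piece: flush the pending pieces, then retry it with finer separators.
        chunks.extend(_merge_pieces(small, sep, chunk_size, chunk_overlap))
        small = []
        if finer:
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, finer))
        else:
            chunks.append(piece)  # no finer separator left; keep the piece whole
    chunks.extend(_merge_pieces(small, sep, chunk_size, chunk_overlap))
    return chunks


print(recursive_split("aaa bbb ccc ddd", chunk_size=7, chunk_overlap=3))
# ['aaa bbb', 'bbb ccc', 'ccc ddd']
```

Here no paragraph or line break is present, so the sketch falls back to splitting on spaces, and each chunk repeats the last word of the previous one because that word fits within the 3-character overlap.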

Usage

Initialization

First, initialize an instance of RecursiveTextSplitter. For example:

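```python
from RAGchain.preprocess.text_splitter import RecursiveTextSplitter

# chunk_size: maximum passage length; chunk_overlap: length shared by neighboring passages
splitter = RecursiveTextSplitter(chunk_size=500, chunk_overlap=50)
```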
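To get a feel for what chunk_size=500 and chunk_overlap=50 mean, the sketch below uses a plain fixed-size sliding window over character offsets. This is a deliberate simplification, not RecursiveTextSplitter's algorithm: the real splitter cuts at separators, so its boundaries fall elsewhere and neighboring passages share at most chunk_overlap characters rather than exactly that many. The helper window_spans is hypothetical, and lengths are assumed to be measured in characters.

```python
from typing import List, Tuple


def window_spans(text_length: int, chunk_size: int = 500, chunk_overlap: int = 50) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) offsets of fixed-size windows over a text.

    Simplification: ignores separators, so each step advances by exactly
    chunk_size - chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if text_length == 0:
        return []
    stride = chunk_size - chunk_overlap
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        if end == text_length:
            break
        start += stride
    return spans


# With the example settings, a 1,200-character text yields three windows,
# each sharing 50 characters with its neighbor.
print(window_spans(1200))  # [(0, 500), (450, 950), (900, 1200)]
```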

Split document

Split a document with the split_document() method, which returns a list of Passage objects. For example:

```python
# `document` is a loaded document, e.g. the output of a RAGchain file loader
passages = splitter.split_document(document)
```