Recursive Text Splitter

Overview

The RecursiveTextSplitter splits a document into passages by recursively splitting on a list of separators. You can also specify a chunk size and an overlap size to split the document into overlapping passages.
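To make the idea of recursive splitting on a list of separators concrete, here is a minimal sketch in plain Python. It is not RAGchain's implementation: the function name recursive_split, the separator list, and the character-based length check are all assumptions made for illustration, and the real class may merge small pieces or measure length differently.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Hypothetical helper: split text into pieces of at most chunk_size characters.

    Separators are tried in order; a piece that is still too long is split
    again with the next separator. Separators must be non-empty strings.
    """
    if len(text) <= chunk_size:
        # Short enough already; drop empty pieces.
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    first, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(first):
        # Recurse with the remaining, finer-grained separators.
        pieces.extend(recursive_split(part, rest, chunk_size))
    return pieces


example = "aaa bbb\n\nccc ddd eee"
pieces = recursive_split(example, ["\n\n", " "], chunk_size=7)
```

In this sketch the first paragraph fits within the limit and is kept whole, while the second paragraph is too long and is split further on spaces.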

Most of its features are similar to LangChain's RecursiveCharacterTextSplitter.

Usage

Initialization

First, initialize an instance of RecursiveTextSplitter. For example:

from RAGchain.preprocess.text_splitter import RecursiveTextSplitter

splitter = RecursiveTextSplitter(chunk_size=500, chunk_overlap=50)
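As a rough illustration of how a chunk size and an overlap interact, the sketch below cuts a string into fixed-size character windows where each window repeats the tail of the previous one. The helper overlapping_windows is hypothetical and works on raw characters only; it is meant to show the general idea, not how RecursiveTextSplitter computes its passages.

```python
from typing import List


def overlapping_windows(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Hypothetical helper: fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


windows = overlapping_windows("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With a larger overlap, neighbouring passages share more context at the cost of more passages overall.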

Split document

You can split a document using the split_document() method. It returns a list of Passage objects. For example:

passages = splitter.split_document(document)
