RAGChain Docs

Markdown Header Splitter

Last updated 1 year ago

Overview

The MarkDownHeaderSplitter splits a document into passages based on the document's header information, using a list of header separators. Its behavior is largely similar to Langchain's MarkdownHeaderTextSplitter: it splits on headers.

The metadata_etc of each Passage contains the header information and the original document's information. metadata_etc is updated in two cases. First, whenever a new header appears in the document, the new header's information is appended to metadata_etc. Second, when a header at the same level as an already recorded header appears, the metadata is reset and the newly appeared header is included in it.
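
The two update rules can be illustrated with a small standalone sketch. This is not RAGchain's implementation: the function name split_by_headers, the "Header N" metadata keys, and the choice to drop only headers at the same or a deeper level on reset are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Matches ATX-style markdown headers such as "# Title" or "## Section".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headers(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Hypothetical helper: split markdown text into (content, metadata) pairs."""
    passages: List[Tuple[str, Dict[str, str]]] = []
    active: Dict[int, str] = {}  # header level -> header title currently in effect
    buffer: List[str] = []       # body lines collected since the last header

    def flush() -> None:
        # Emit the buffered body text together with a snapshot of the active headers.
        content = "\n".join(buffer).strip()
        if content:
            metadata = {f"Header {lvl}": title for lvl, title in sorted(active.items())}
            passages.append((content, metadata))
        buffer.clear()

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Case 2 (assumed scope): a header at an existing level resets
            # that level and anything nested below it.
            for tracked in [lvl for lvl in active if lvl >= level]:
                del active[tracked]
            # Case 1: the newly seen header is added to the metadata.
            active[level] = match.group(2)
        else:
            buffer.append(line)
    flush()
    return passages


sample = """# Guide
Intro text.
## Install
Run the installer.
## Usage
Call the splitter.
"""

for content, metadata in split_by_headers(sample):
    print(metadata, "->", content)
```

In this sketch the passage under "## Usage" carries "Guide" and "Usage" but no longer "Install", because the second level-2 header replaced the first.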

Usage

Initialization

First, initialize an instance of MarkDownHeaderSplitter. For example:

from RAGchain.preprocess.text_splitter import MarkDownHeaderSplitter

markdown_header_splitter = MarkDownHeaderSplitter()

Split document

You can split a document using the split_document() method. It returns a list of Passage objects. For example:

passages = markdown_header_splitter.split_document(document)