Hwp Loader

Documentation for the HwpLoader class


Last updated 1 year ago

Overview

The HwpLoader class is a dedicated loader for HWP files, a document format widely used in South Korea. It loads HWP files and converts them to text using an external API, the hwp-converter-api.

The hwp-converter-api is a service that converts HWP files into plain text. See the hwp-converter-api project for more information about this API.

Usage

To use this class, instantiate it with the required parameters, then call its load or lazy_load method:

from RAGchain.preprocess.loader import HwpLoader

# path: the .hwp file to load
# hwp_host_url: URL of a running hwp-converter-api instance
loader = HwpLoader(path="path_to_your_file.hwp",
                   hwp_host_url="http://your_hwp_converter_api_url")
documents = loader.load()

This returns a list of Document objects extracted from the original HWP file.

Please note that currently only .hwp files are supported; .hwpx files are not supported yet by this loader.

Also note that the aiohttp library must be installed in your environment.

Lastly, ensure that the provided URL points to a running instance of hwp-converter-api.
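The three notes above can be checked before the loader is constructed. The following is a minimal sketch, not part of RAGchain: the helper name check_hwp_prerequisites and its messages are hypothetical, and the URL check only looks at the scheme, so it does not confirm that the service is actually running.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import List


def check_hwp_prerequisites(path: str, hwp_host_url: str) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means none were found."""
    problems = []

    # Only .hwp files are supported; .hwpx files are not.
    suffix = Path(path).suffix.lower()
    if suffix != ".hwp":
        problems.append(f"unsupported file type: {suffix or '(none)'}")

    # The converter API is reached over HTTP, so expect an http(s) URL.
    # This does not verify that the service is reachable.
    if not hwp_host_url.startswith(("http://", "https://")):
        problems.append("hwp_host_url should be an http(s) URL")

    # The loader needs aiohttp to be installed.
    if find_spec("aiohttp") is None:
        problems.append("aiohttp is not installed")

    return problems
```

If the returned list is empty, it should be reasonable to go on and create the HwpLoader as shown in the usage example; otherwise the messages point at which of the notes above was not satisfied.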
