# Rust Hwp Loader

## Overview

The `RustHwpLoader` is a Python class that loads HWP files using the [`libhwp`](https://blog.hanlee.io/2022/hwp-rs) library. It works on all operating systems.

This loader extracts all paragraphs and tables from the HWP file and returns them as a list of `Document` objects. Each `Document` object includes the content of the paragraphs or table and some associated metadata. The first `Document` contains all paragraphs from the HWP file, including the text within each table. Each subsequent `Document` contains the paragraphs of a single table.

Unfortunately, this loader does not distinguish between rows and columns in a table.

The `metadata` attribute of each `Document` includes the file path (under the `source` key) and the page type (under the `page_type` key, either `text` or `table`).
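
As a rough illustration of the output layout described above, the following sketch separates the text document from the table documents by their page type. `FakeDocument` and `split_by_page_type` are hypothetical names introduced here so the snippet runs without an HWP file; in practice the list would come from the loader, and the `page_type` key is the one documented on this page.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for the Document objects the loader returns."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_by_page_type(documents):
    """Return (text_docs, table_docs) based on metadata['page_type']."""
    text_docs = [d for d in documents if d.metadata.get("page_type") == "text"]
    table_docs = [d for d in documents if d.metadata.get("page_type") == "table"]
    return text_docs, table_docs


# Sample data mimicking the described output: one text document, then one per table.
sample = [
    FakeDocument("Intro paragraph\nCell A\nCell B",
                 {"source": "report.hwp", "page_type": "text"}),
    FakeDocument("Cell A\nCell B",
                 {"source": "report.hwp", "page_type": "table"}),
]

text_docs, table_docs = split_by_page_type(sample)
print(len(text_docs), len(table_docs))
```

Because table text also appears in the first document, keeping the two groups separate avoids processing the same table content twice.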

While other HWP loaders may offer more features, `RustHwpLoader` is a good option for macOS and Linux users because it does not require an external HWP loader server or a Windows-only HWP program.

## Usage

### Initialization

To use the `RustHwpLoader`, you first need to initialize it with the path to the HWP file:

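```python
# Import path assumed from the RAGchain package layout; adjust if it differs.
from RAGchain.preprocess.loader import RustHwpLoader

loader = RustHwpLoader("/path/to/hwp/file")
```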

If the `libhwp` library is not installed, an `ImportError` will be raised with a message asking you to install it using pip:

```bash
pip install libhwp
```
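
If you would rather check for the dependency up front than catch the `ImportError`, a minimal sketch using only the standard library might look like the following. `is_module_available` is a hypothetical helper name, and the snippet assumes the importable module is named `libhwp`, like the pip package.

```python
from importlib.util import find_spec


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# Assumption: the importable module name matches the pip package name.
if not is_module_available("libhwp"):
    print("libhwp is not installed; run `pip install libhwp` first.")
```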

### Loading Documents

The `RustHwpLoader` provides two methods to load `Document` objects from the HWP file: `load` and `lazy_load`.

#### `load`

The `load` method returns a list of all `Document` objects:

```python
documents = loader.load()
```

#### `lazy_load`

The `lazy_load` method is a generator that lazily yields `Document` objects:

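```python
for document in loader.lazy_load():
    # Process each document as it is yielded, for example:
    print(document.metadata["page_type"], len(document.page_content))
```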

This method is useful when working with large HWP files that could consume a lot of memory if fully loaded into a list.
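
The memory benefit comes from consuming the generator one item at a time instead of building a list first. The sketch below shows that pattern with a hypothetical `fake_lazy_load` generator standing in for `loader.lazy_load()`, so it runs without an HWP file; `count_tables` is likewise an illustrative helper, not part of the loader.

```python
from itertools import islice
from types import SimpleNamespace


def fake_lazy_load():
    """Hypothetical stand-in for loader.lazy_load(); yields one document at a time."""
    yield SimpleNamespace(page_content="All paragraphs", metadata={"page_type": "text"})
    for i in range(3):
        yield SimpleNamespace(page_content=f"Table {i}", metadata={"page_type": "table"})


def count_tables(documents):
    """Count table documents without materializing the whole sequence."""
    return sum(1 for d in documents if d.metadata["page_type"] == "table")


# Stream through every document, holding only one in memory at a time.
table_count = count_tables(fake_lazy_load())

# Or stop early: take only the first two documents from a fresh generator.
first_two = list(islice(fake_lazy_load(), 2))
```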

Each `Document` returned by `load` or yielded by `lazy_load` contains a `page_content` string and a `metadata` dictionary. The `metadata` includes the `source` key (the file path) and the `page_type` key (either `text` or `table`).

