# Win32 Hwp Loader

## Overview

The `Win32HwpLoader` class is a document loader for HWP files in a Windows environment. It relies on the `pywin32` library and handles both the `.hwp` and `.hwpx` file formats.

The primary use of this class is to extract all paragraphs and tables from an HWP or HWPX file. It returns a list of `Document` objects. The first `Document` contains every paragraph except text inside tables, and each subsequent `Document` holds one table from the original file, converted to HTML. HTML keeps each table's rows and cells intact, which makes complex table structures easier to handle than flattened text.
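
Each table `Document` holds HTML rather than plain text. If you need the cell values as Python lists, a minimal sketch using the standard library's `html.parser` could look like the following. It assumes the tables use ordinary `<tr>`, `<th>` and `<td>` tags (the exact markup the loader emits is not shown on this page), and `TableRowParser` and `html_table_to_rows` are hypothetical helpers, not part of RAGchain.

```python3
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []  # start a fresh row list
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Join text fragments inside the cell and trim surrounding whitespace
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def html_table_to_rows(html: str) -> List[List[str]]:
    """Turn an HTML table string into a list of rows of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Example usage with a hand-written table string
rows = html_table_to_rows(
    "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr></table>"
)
print(rows)  # [['Name', 'Qty'], ['Apple', '3']]
```

This sketch does not expand merged cells (`colspan`/`rowspan`): each cell's text appears once, in the row where the cell is declared.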

Each `Document` also carries metadata, including `source` (the file path) and `page_type`, which is either `'text'` or `'table'`.
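
Because `page_type` distinguishes body text from tables, you can separate the two after loading. The sketch below assumes the loader returns LangChain-style `Document` objects with `page_content` and `metadata` attributes; it defines a small dataclass stand-in with those two attributes so that it runs without Windows or an HWP file. `split_by_page_type` is a hypothetical helper.

```python3
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Stand-in for the Document class, used only so this sketch runs anywhere.
@dataclass
class Document:
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_by_page_type(documents: List[Document]) -> Tuple[List[Document], List[Document]]:
    """Separate body-text Documents from table Documents via metadata['page_type']."""
    texts = [d for d in documents if d.metadata.get("page_type") == "text"]
    tables = [d for d in documents if d.metadata.get("page_type") == "table"]
    return texts, tables


# Example usage with hand-built Documents shaped like the loader's output
docs = [
    Document("Intro paragraph.", {"source": "report.hwp", "page_type": "text"}),
    Document("<table><tr><td>1</td></tr></table>", {"source": "report.hwp", "page_type": "table"}),
]
texts, tables = split_by_page_type(docs)
print(len(texts), len(tables))  # 1 1
```

Filtering on metadata rather than on list position keeps the code working even if the ordering of the returned list changes.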

However, please note that `Win32HwpLoader` is only suitable for Windows. If you need to handle HWP files on macOS or Linux, consider using [`RustHwpLoader`](/ragchain-docs/ragchain-structure/file-loader/rust-hwp-loader.md).
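
If the same code must run on several operating systems, one option is to choose the loader at runtime. This sketch returns only the class name so that it runs anywhere; in practice you would import and instantiate the chosen class. `pick_hwp_loader_name` is a hypothetical helper. `platform.system()` reports `'Windows'`, `'Darwin'` (macOS) or `'Linux'`.

```python3
import platform


def pick_hwp_loader_name(system: str = "") -> str:
    """Return the name of the HWP loader suited to the given OS (hypothetical helper).

    `system` defaults to platform.system() for the current machine.
    """
    system = system or platform.system()
    # Win32HwpLoader needs Windows; RustHwpLoader covers macOS and Linux.
    return "Win32HwpLoader" if system == "Windows" else "RustHwpLoader"


print(pick_hwp_loader_name("Linux"))  # RustHwpLoader
```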

## Usage

To use the `Win32HwpLoader` class, you need to initialize it with the path to the HWP file:

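```python3
from RAGchain.preprocess.loader import Win32HwpLoader  # import path may differ between RAGchain versions

loader = Win32HwpLoader('path/to/file.hwp')  # the path must end in .hwp or .hwpx
```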

After initializing the loader, you can call either the `load` or `lazy_load` method to extract the documents:

```python3
documents = loader.load()
```

or

```python3
for document in loader.lazy_load():
    # process each Document as it is yielded
    print(document.metadata['page_type'])
```

The `load` method loads all documents into a list at once, while the `lazy_load` method returns a generator that yields one `Document` at a time. For larger files, this lets you process each `Document` as it is produced, which can reduce memory usage.
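
The difference between the two methods follows a common Python pattern: the lazy method is a generator, and the eager method drains it into a list. `ToyLoader` below is a hypothetical class that illustrates this pattern only; it is not `Win32HwpLoader`'s implementation.

```python3
from typing import Iterator, List


class ToyLoader:
    """Hypothetical loader illustrating the load/lazy_load relationship."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks
        self.produced = 0  # how many items have been generated so far

    def lazy_load(self) -> Iterator[str]:
        # Generator: each item is produced only when the caller asks for it.
        for chunk in self.chunks:
            self.produced += 1
            yield chunk

    def load(self) -> List[str]:
        # Eager variant: consumes the whole generator and returns a list.
        return list(self.lazy_load())


loader = ToyLoader(["text", "table 1", "table 2"])
first = next(loader.lazy_load())
print(first, loader.produced)  # text 1
```

Only one item has been produced after the first `next()` call, whereas `load()` would have produced all three before returning.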

Please note that the `preprocessor` method is called internally by `load` and `lazy_load` to handle the actual extraction and conversion of the HWP file content. It's not intended to be called directly.

If the file extension is neither `.hwp` nor `.hwpx`, a `ValueError` is raised.
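
To fail fast, for example when scanning a folder of mixed files, you can check the extension yourself before constructing the loader. This is a minimal sketch; `check_hwp_path` is a hypothetical helper that mirrors the documented rule. It lower-cases the suffix, and whether the loader itself accepts upper-case extensions such as `.HWP` is an assumption to verify.

```python3
from pathlib import Path

SUPPORTED_EXTENSIONS = {".hwp", ".hwpx"}


def check_hwp_path(path: str) -> str:
    """Return the normalized suffix, or raise ValueError for unsupported files.

    Hypothetical helper; the case-insensitive comparison is an assumption.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension {suffix!r}: expected .hwp or .hwpx")
    return suffix


# Example usage: skip files the loader would reject
for p in ["report.hwp", "slides.hwpx", "notes.txt"]:
    try:
        check_hwp_path(p)
        print("ok:", p)
    except ValueError as err:
        print("skipped:", p, err)
```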
