
# Mathpix Markdown Loader

## Overview

`MathpixMarkdownLoader` loads and processes Mathpix Markdown files (`.mmd`), a Markdown variant designed for scientific papers. For a detailed explanation of Mathpix Markdown, see the [Mathpix website](https://mathpix.com/docs/mathpix-markdown/overview).

The class can also split the loaded file into sections and tables, so each part of the scientific content can be processed and analyzed separately.

## Usage

### Initialization

To initialize the `MathpixMarkdownLoader` class, you need to provide a path to an existing Mathpix Markdown file (`.mmd`).

```python
from RAGchain.preprocess.loader import MathpixMarkdownLoader

loader = MathpixMarkdownLoader(filepath="/path/to/your/file.mmd")
```

If the provided filepath does not point to an existing file, a `ValueError` will be raised.
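
If you want to check the path yourself before constructing the loader (for example, to log a friendlier message), a pre-check like the one below reproduces the documented behavior. `validate_mmd_path` is a hypothetical helper written for this page, not part of RAGchain. It uses `os.path.isfile`, so it also rejects directories.

```python
import os


def validate_mmd_path(filepath: str) -> str:
    """Hypothetical pre-check: raise ValueError if filepath is not an existing file."""
    # os.path.isfile is False both for missing paths and for directories
    if not os.path.isfile(filepath):
        raise ValueError(f"{filepath} does not exist or is not a file.")
    return filepath


try:
    validate_mmd_path("/path/that/does/not/exist.mmd")
except ValueError as err:
    print(f"Cannot load: {err}")
```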

### Loading Data

You can load data from the `.mmd` file in two ways: `load()` or `lazy_load()`. Both accept the `split_section` and `split_table` flags.

Example:

```python
documents = loader.load(split_section=True, split_table=True)
```

or

```python
for document in loader.lazy_load(split_section=True, split_table=True):
    # process each Document here...
    pass
```

`load()` returns a list of `Document` objects representing each section or table in the original `.mmd` file, while `lazy_load()` yields the same `Document` objects one at a time.
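
The two methods follow the familiar eager-versus-lazy pattern: `lazy_load()` hands back documents one at a time as you iterate, and `load()` gives you all of them in a list up front. The sketch below shows that relationship with a hypothetical `ToySectionLoader` that splits on `#` header lines and yields plain strings instead of RAGchain `Document` objects. It is an illustration, not RAGchain's code.

```python
from typing import Iterator, List


class ToySectionLoader:
    """Hypothetical stand-in for a loader; yields section strings, not Documents."""

    def __init__(self, content: str):
        self.content = content

    def lazy_load(self) -> Iterator[str]:
        # Build and yield one section at a time; later sections are not built yet
        current: List[str] = []
        for line in self.content.splitlines(keepends=True):
            if line.startswith("#") and current:
                yield "".join(current)
                current = []
            current.append(line)
        if current:
            yield "".join(current)

    def load(self) -> List[str]:
        # Eager variant: materialize every section in memory at once
        return list(self.lazy_load())


loader = ToySectionLoader("# Intro\nHello.\n# Methods\nWe did things.\n")
for section in loader.lazy_load():
    print(repr(section))
print(loader.load())
```

As a rule of thumb, `lazy_load()` suits large files or pipelines that may stop early, while `load()` is convenient when you need the total count or random access to the results.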

### Splitting Sections and Tables

The loader provides options for splitting content into sections and tables:

* `split_section`: Splits the Markdown content into separate sections at `#` headers.
* `split_table`: Splits the Markdown content into pieces at LaTeX table environments (`\begin{table}` ... `\end{table}`). The returned list alternates between non-table text and table text.
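
To make the two splitting rules concrete, here is a rough, regex-based approximation of each. `approx_split_section` and `approx_split_table` are hypothetical names written for this page; they are not RAGchain's implementation and ignore edge cases such as `#` lines inside code blocks or nested environments.

```python
import re
from typing import List

# A Markdown ATX header: one to six '#' at the start of a line, followed by a space
HEADER_RE = re.compile(r"^#{1,6} ", re.MULTILINE)
# A LaTeX table environment; DOTALL lets '.' match the newlines inside the table
TABLE_RE = re.compile(r"\\begin\{table\}.*?\\end\{table\}", re.DOTALL)


def approx_split_section(content: str) -> List[str]:
    """Split before every header line, keeping each header with its section body."""
    starts = [m.start() for m in HEADER_RE.finditer(content)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first header
    bounds = starts + [len(content)]
    # Drop whitespace-only slices so blank preambles do not become sections
    return [content[a:b] for a, b in zip(bounds, bounds[1:]) if content[a:b].strip()]


def approx_split_table(content: str) -> List[str]:
    """Return pieces that alternate between non-table text and table text."""
    pieces: List[str] = []
    pos = 0
    for m in TABLE_RE.finditer(content):
        pieces.append(content[pos:m.start()])  # text before this table (may be empty)
        pieces.append(m.group(0))  # the table environment itself
        pos = m.end()
    pieces.append(content[pos:])  # trailing text after the last table
    return pieces


sample = "Preamble.\n# Results\nSee below.\n\\begin{table}\na & b\n\\end{table}\nDone.\n"
print(approx_split_section(sample))
print(approx_split_table(sample))
```

In this sketch, keeping empty text pieces in `approx_split_table` preserves strict alternation (even indices are text, odd indices are tables), and joining the pieces reproduces the input exactly.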

Both splitters can also be called directly on a string, without loading a file:

```python
from RAGchain.preprocess.loader import MathpixMarkdownLoader

content = "your-markdown-string"
sections = MathpixMarkdownLoader.split_section(content)
tables_and_text = MathpixMarkdownLoader.split_table(content)
```

Note: Sections and tables appear in the returned lists in the same order as in the original `.mmd` file.
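
Because the pieces come back in document order, you can label them and, for example, route tables to a separate handler without losing their position. `label_pieces` below is a hypothetical helper that works on any list of strings. It identifies a table by a leading `\begin{table}` rather than by list position, so it does not depend on whether the list starts with text or with a table.

```python
from typing import List, Tuple


def label_pieces(pieces: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: tag each piece as 'table' or 'text', preserving order."""
    labeled: List[Tuple[str, str]] = []
    for piece in pieces:
        if not piece.strip():
            continue  # skip empty or whitespace-only pieces, e.g. between adjacent tables
        kind = "table" if piece.lstrip().startswith("\\begin{table}") else "text"
        labeled.append((kind, piece))
    return labeled


# Example input shaped like an alternating text/table list
pieces = ["Intro text.\n", "\\begin{table}\na & b\n\\end{table}", "\nMore text.\n"]
labeled = label_pieces(pieces)
tables = [piece for kind, piece in labeled if kind == "table"]
for kind, piece in labeled:
    print(kind, repr(piece))
```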


