Token splitter
Overview
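As a rough illustration of what a token splitter does: the text is tokenized, and the tokens are grouped into chunks of at most a fixed number of tokens, with a few tokens shared between neighboring chunks. The sketch below assumes whitespace-separated words as tokens, which is only a stand-in for a real tokenizer such as tiktoken. The names `split_by_tokens`, `chunk_size`, and `chunk_overlap` are illustrative and do not belong to any library's API.

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> List[str]:
    """Split text into chunks of at most chunk_size tokens.

    Tokens here are whitespace-separated words (an assumption made for
    illustration); a real splitter would count a tokenizer's token ids.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    # Each new window starts this many tokens after the previous one.
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_by_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

The overlap means the last token of one chunk reappears at the start of the next, which helps preserve context across chunk boundaries at the cost of some duplication.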
Usage
Initialization
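import os
import pathlib

# Assumption: Document is LangChain's document class; adjust the import if your project uses a different one.
from langchain_core.documents import Document

# Resolve the repository root (three levels above this file's directory).
root_dir = pathlib.PurePath(os.path.dirname(os.path.realpath(__file__))).parent.parent.parent
file_path = os.path.join(root_dir, "resources", "sample_test_document.txt")

with open(file_path) as f:
    state_of_the_union = f.read()

TEST_DOCUMENT = Document(
    page_content=state_of_the_union,
    metadata={
        'source': 'test_source',
        'Data information': 'The Man United haters are trembling with rage',
        'What is it?': 'This is token splitter'
    }
)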
Tiktoken
Split document(tiktoken)
spaCy
Split document(spaCy)
SentenceTransformers
Split document(SentenceTransformers)
NLTK
Split document(NLTK)
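Sentence-based splitters work differently from fixed token windows: a common approach, assumed here, is to segment the text into sentences first and then pack whole sentences into a chunk until a size budget is reached. The sketch below uses a naive regular expression as a stand-in for NLTK's sentence tokenizer and counts whitespace-separated words instead of real tokens. `split_by_sentences` and `max_tokens` are hypothetical names used only for illustration.

```python
import re
from typing import List


def split_by_sentences(text: str, max_tokens: int = 12) -> List[str]:
    """Pack whole sentences into chunks of roughly max_tokens words.

    Sentence boundaries are detected with a naive regex (an assumption for
    illustration); a single sentence longer than max_tokens is kept whole.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine. Ten."
    for chunk in split_by_sentences(sample, max_tokens=5):
        print(chunk)
```

Because sentences are never cut in the middle, chunk sizes vary, and an overly long sentence can exceed the budget on its own.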
Troubleshooting(NLTK)
HuggingFace
Split document(HuggingFace)