File Cache
The purpose of FileCache is to remove duplicate documents, since users are likely to load the same document files again when they ingest files multiple times.
FileCache is a utility that checks the DB for files that have already been saved.
Usage
First, import FileCache.
To illustrate, we intentionally create duplicate files that are already saved in the DB.
Create an instance and pass in the required parameter. In this example, we use PickleDB.
Then, use delete_duplicate() to detect which files are already saved in the DB. It returns a List[Document] with the duplicate documents removed.
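The flow above can be sketched in a self-contained way. This is not the library's implementation: `Document`, `InMemoryDB`, and `SketchFileCache` are hypothetical stand-ins written for this illustration, and the sketch assumes a duplicate is identified by the file path stored under a `source` metadata key. Only the method name delete_duplicate() comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryDB:
    """Hypothetical stand-in for a DB such as PickleDB; it only tracks saved file paths."""

    def __init__(self) -> None:
        self._saved_paths: Set[str] = set()

    def save(self, documents: List[Document]) -> None:
        # Assumption: each document records its file path under metadata["source"].
        for doc in documents:
            self._saved_paths.add(doc.metadata["source"])

    def contains(self, path: str) -> bool:
        return path in self._saved_paths


class SketchFileCache:
    """Illustrative cache: drops documents whose source file is already in the DB."""

    def __init__(self, db: InMemoryDB) -> None:
        self.db = db

    def delete_duplicate(self, documents: List[Document]) -> List[Document]:
        # Return a new list and leave the caller's list untouched.
        return [d for d in documents if not self.db.contains(d.metadata["source"])]


# Simulate a file that was saved during an earlier ingestion.
db = InMemoryDB()
db.save([Document("old text", {"source": "a.txt"})])

# Ingest again: one file is a duplicate, one is new.
cache = SketchFileCache(db)
incoming = [
    Document("old text", {"source": "a.txt"}),
    Document("new text", {"source": "b.txt"}),
]
fresh = cache.delete_duplicate(incoming)
print([d.metadata["source"] for d in fresh])  # prints ['b.txt']
```

Filtering by file path rather than by content keeps the check cheap, but it would miss a file that was renamed between ingestions; whether the real FileCache behaves this way should be confirmed against the library itself.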