REDE Search Detector

This class is an implementation of REDE, a method for detecting knowledge-seeking turns in a few-shot setting. In other words, you can detect search (retrieval) intent in dialogues using only a few knowledge-seeking examples. It provides a train function for your custom model and an inference function for detecting knowledge-seeking turns. You will need non-knowledge-seeking dialogues and a few knowledge-seeking dialogues. The method is described in detail in the paper, or you can read the Korean blog post about it.

How to Use

Train

First, prepare non-knowledge-seeking dialogues and a few knowledge-seeking dialogues. Next, find the representation transform and train the density estimation. Finally, find the threshold for detecting knowledge-seeking turns using validation data. To train the density estimation, you need to pass a GaussianMixture model from scikit-learn. You can save the trained model for later inference.

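from sklearn.mixture import GaussianMixture
# RedeSearchDetector is provided by RAGchain; import it from your installed version.

knowledge_seeking_sentences = ['your sentences']
non_knowledge_seeking_sentences = ['your sentences']

detector = RedeSearchDetector()
# Find the representation transform from the few knowledge-seeking examples.
detector.find_representation_transform(knowledge_seeking_sentences)
# Train the density estimation on the non-knowledge-seeking examples.
detector.train_density_estimation(GaussianMixture(n_components=1), non_knowledge_seeking_sentences)

# Find the detection threshold using held-out validation data.
valid_knowledge_seeking_sentences = ['your sentences']
valid_non_knowledge_seeking_sentences = ['your sentences']
threshold = detector.find_threshold(valid_knowledge_seeking_sentences, valid_non_knowledge_seeking_sentences)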
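
To make the density estimation and threshold steps more concrete, here is a minimal sketch that uses only the Python standard library. It makes several assumptions: random 1-D numbers stand in for real sentence embeddings, the representation transform is skipped entirely, a single hand-fitted Gaussian replaces scikit-learn's GaussianMixture, and `log_density` and `pick_threshold` are hypothetical helpers. It also assumes that a low likelihood under the non-knowledge-seeking model indicates a knowledge-seeking turn. It illustrates the general idea only and is not RAGchain's actual implementation.

```python
import math
import random
import statistics

random.seed(0)

# Assumption: 1-D numbers stand in for sentence embeddings.
non_seeking_train = [random.gauss(0.0, 1.0) for _ in range(200)]
non_seeking_valid = [random.gauss(0.0, 1.0) for _ in range(50)]
seeking_valid = [random.gauss(6.0, 1.0) for _ in range(50)]

# Fit a single Gaussian on non-knowledge-seeking examples only
# (a hand-rolled stand-in for GaussianMixture(n_components=1)).
mean = statistics.fmean(non_seeking_train)
std = statistics.stdev(non_seeking_train)


def log_density(x):
    """Log-likelihood of x under the fitted Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def pick_threshold(seeking, non_seeking):
    """Hypothetical helper: pick the cut-off with the best validation accuracy."""
    seeking_scores = [log_density(x) for x in seeking]
    non_seeking_scores = [log_density(x) for x in non_seeking]
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted(seeking_scores + non_seeking_scores):
        # Low density under the non-seeking model counts as knowledge-seeking.
        correct = sum(s < candidate for s in seeking_scores)
        correct += sum(s >= candidate for s in non_seeking_scores)
        accuracy = correct / (len(seeking_scores) + len(non_seeking_scores))
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


threshold, accuracy = pick_threshold(seeking_valid, non_seeking_valid)

# Flag validation points whose density falls below the chosen threshold.
is_seeking = [log_density(x) < threshold for x in seeking_valid]
```

Choosing the threshold on validation data rather than on the training data keeps the cut-off from being tuned to the same examples the density model was fitted on.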

Inference

After training, you can use the inference function to detect knowledge-seeking turns, as shown below.

results = detector.detect(['your sentences'])

If a result is True, the sentence is a knowledge-seeking turn. Otherwise, it is a non-knowledge-seeking turn.
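
One way to use these flags is to decide, per sentence, whether to run retrieval. The sketch below assumes `detect` returns one boolean per input sentence, in input order, as the description above suggests. The `results` list is hard-coded so the snippet runs without a trained detector, and `route_sentences` is a hypothetical helper, not part of RAGchain.

```python
def route_sentences(sentences, results):
    """Hypothetical helper: split sentences by the detector's boolean flags."""
    if len(sentences) != len(results):
        raise ValueError("expected one result per sentence")
    needs_search = [s for s, flag in zip(sentences, results) if flag]
    no_search = [s for s, flag in zip(sentences, results) if not flag]
    return needs_search, no_search


sentences = ["What time does the hotel pool close?", "Thanks, that sounds good."]
# Assumption: stand-in for detector.detect(sentences).
results = [True, False]

needs_search, no_search = route_sentences(sentences, results)
print(needs_search)  # ['What time does the hotel pool close?']
```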

Evaluate

1. Use your own test data

test_knowledge_seeking_sentences = ['your sentences']
test_non_knowledge_seeking_sentences = ['your sentences']

precision, recall, f1 = detector.evaluate(test_knowledge_seeking_sentences, test_non_knowledge_seeking_sentences)

You can get precision, recall, and F1 scores using your own test dataset.

2. Use the DSTC11-Track5 dataset

evaluator = SearchDetectorEvaluator(detector)
precision, recall, f1 = evaluator.evaluate()

You can also get precision, recall, and F1 scores using the DSTC11-Track5 dataset, which consists of hotel and restaurant reservation dialogues. It is therefore not a good benchmark for a detector trained on other domains. We recommend using the train function of the SearchDetectorEvaluator instance to train on the DSTC11-Track5 dataset first. You may then get better results.

In summary, you can evaluate your model in two ways: with your own test data, or with the DSTC11-Track5 dataset through a SearchDetectorEvaluator instance.
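
For reference, precision, recall, and F1 can be computed from boolean predictions as in the sketch below. It assumes knowledge-seeking is the positive class and uses made-up predictions and a hypothetical helper, `precision_recall_f1`. It shows the standard definitions, which may differ from how `evaluate` is implemented internally.

```python
def precision_recall_f1(seeking_predictions, non_seeking_predictions):
    """Hypothetical helper: knowledge-seeking (True) is the positive class."""
    true_positive = sum(seeking_predictions)
    false_negative = len(seeking_predictions) - true_positive
    false_positive = sum(non_seeking_predictions)

    predicted_positive = true_positive + false_positive
    actual_positive = true_positive + false_negative
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    recall = true_positive / actual_positive if actual_positive else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Assumption: made-up detector outputs for illustration.
seeking_predictions = [True, True, True, False]        # 3 of 4 seeking turns caught
non_seeking_predictions = [False, False, True, False]  # 1 false alarm

precision, recall, f1 = precision_recall_f1(seeking_predictions, non_seeking_predictions)
```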
