Importance-Aware RAG
In some cases, you may want to prioritize certain passages over others. For example, when you look for specific information with Google Search, you trust some websites more than others, and there are some websites you would rather not see at all. Importance-aware RAG covers this case.
Here, we introduce some useful tools for importance-aware RAG.
Passage importance
You can set a passage's importance through the importance field of Passage. The default value of importance is 0, and you can set it to any int value. A negative value marks the passage as less important than default passages, and a positive value marks it as more important. With this field, retrieval can take document importance into account.
SimpleImportanceReranker
This is the simplest way to implement importance-aware RAG. After retrieving passages, you can sort them by importance.
query = "What is the capital of Korea?"
retrieval = BM25Retrieval('/path/to/your/bm25.pkl')
passages = retrieval.retrieve(query)
# rerank passages by importance
reranker = SimpleImportanceReranker()
reranked_passages = reranker.rerank(passages)
Note that you don't have to pass a query to the rerank method, because SimpleImportanceReranker doesn't inherit from the BaseReranker class.
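To make "sort them by importance" concrete, here is a minimal standalone sketch of the idea. The Passage dataclass and the sort_by_importance helper below are hypothetical stand-ins written for this example, not the library's own classes, and the actual SimpleImportanceReranker may break ties differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    """Hypothetical stand-in for the library's Passage (only the fields used here)."""
    content: str
    importance: int = 0  # default importance is 0, as described above


def sort_by_importance(passages: List[Passage]) -> List[Passage]:
    """Return passages ordered from most to least important.

    sorted() is stable, so passages with equal importance keep
    the order in which the retriever returned them.
    """
    return sorted(passages, key=lambda p: p.importance, reverse=True)


passages = [
    Passage("blog post", importance=-1),     # less important than default
    Passage("official docs", importance=2),  # more important than default
    Passage("forum answer"),                 # default importance (0)
    Passage("news article"),                 # default importance (0)
]
reranked = sort_by_importance(passages)
print([p.content for p in reranked])
# ['official docs', 'forum answer', 'news article', 'blog post']
```

Because only the importance field is used, no query is needed, which matches the note above.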
WeightedImportanceReranker
If you want to mix importance with relevance scores, you can use WeightedImportanceReranker. It is similar to WeightedTimeReranker.
The algorithm to rerank passages is as follows:
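The exact formula is not reproduced on this page. As an illustration only, one common way to mix importance with relevance is a weighted sum of min-max normalized values. The sketch below assumes that approach; the function names and the importance_weight parameter are hypothetical and are not confirmed to match what WeightedImportanceReranker does internally.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1]; if all values are equal, return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_importance_order(
    scores: Sequence[float],
    importances: Sequence[int],
    importance_weight: float = 0.5,  # assumed knob: 0 = relevance only, 1 = importance only
) -> List[int]:
    """Return passage indices ordered by the combined score, best first."""
    if len(scores) != len(importances):
        raise ValueError("scores and importances must have the same length")
    if not scores:
        return []
    norm_scores = min_max(scores)
    norm_importances = min_max(importances)
    combined = [
        (1 - importance_weight) * s + importance_weight * i
        for s, i in zip(norm_scores, norm_importances)
    ]
    return sorted(range(len(combined)), key=lambda idx: combined[idx], reverse=True)


scores = [0.9, 0.5, 0.1]   # relevance scores from a retriever
importances = [-1, 0, 3]   # importance values of the same passages

print(weighted_importance_order(scores, importances, importance_weight=0.3))  # [0, 1, 2]
print(weighted_importance_order(scores, importances, importance_weight=0.8))  # [2, 1, 0]
```

With a small weight the relevance ranking is preserved, while a large weight lets the most important passage move to the top.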
You can easily use WeightedImportanceReranker as follows:
Set a Hard Limit of Passage importance
Another simple yet powerful way to implement importance-aware RAG is to set a hard limit on passage importance. You can do this with retrieve_with_filter, which is available on any Retrieval class, by specifying the importance values you want to retrieve.
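The effect of a hard limit can be sketched without the library. The Passage dataclass and the filter_by_importance helper below are hypothetical and only show the filtering idea. They do not reproduce the signature of retrieve_with_filter, so check the Retrieval class you use for the actual parameters.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Passage:
    """Hypothetical stand-in for the library's Passage (only the fields used here)."""
    content: str
    importance: int = 0


def filter_by_importance(passages: List[Passage], allowed: Iterable[int]) -> List[Passage]:
    """Keep only passages whose importance is one of the allowed values."""
    allowed_set = set(allowed)
    return [p for p in passages if p.importance in allowed_set]


passages = [
    Passage("untrusted site", importance=-2),
    Passage("regular site"),
    Passage("trusted site", importance=1),
]

# Hard limit: drop everything below the default importance.
kept = filter_by_importance(passages, allowed=[0, 1, 2])
print([p.content for p in kept])
# ['regular site', 'trusted site']
```

Unlike reranking, a hard limit removes unwanted passages entirely instead of pushing them down the list.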