Time-Aware RAG
In real-world applications, you'll often find that the latest information is more valuable than outdated data. However, manually discarding old information can be challenging and potentially harmful, as this information might be useful for other queries. Therefore, time-aware RAG is crucial for certain applications.
Here, we introduce some useful tools for time-aware RAG.
Passage content_datetime
First, the Passage schema has a content_datetime field, available from v0.2.3. This field is automatically filled with the datetime at which the Passage is created, or you can set it manually. When you edit the passage, content_datetime is updated to the current datetime. You can use this field for time-aware RAG.
SimpleTimeReranker
This is the simplest way to implement time-aware RAG. After retrieving passages, you can sort them by content_datetime. Note that you don't need to pass a query to the rerank method, because SimpleTimeReranker doesn't inherit from the BaseReranker class.
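To illustrate the idea, here is a minimal standalone sketch of sorting by datetime. It does not use the RAGchain API: the Doc class and the sort_by_datetime function are hypothetical stand-ins, and newest-first ordering is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Doc:
    # Hypothetical stand-in for a Passage; only the fields needed here.
    content: str
    content_datetime: datetime


def sort_by_datetime(docs):
    """Return docs ordered newest first. No query is involved."""
    return sorted(docs, key=lambda d: d.content_datetime, reverse=True)


docs = [
    Doc("2021 report", datetime(2021, 5, 1)),
    Doc("2023 report", datetime(2023, 5, 1)),
    Doc("2022 report", datetime(2022, 5, 1)),
]
ordered = sort_by_datetime(docs)
print([d.content for d in ordered])
```

Because the ordering depends only on content_datetime, the query never enters the picture, which matches the note above about the rerank method.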
WeightedTimeReranker
If you want to combine content_datetime with relevance scores, you can use WeightedTimeReranker. It is similar to LangChain's TimeWeightedVectorStoreRetriever, but you can use WeightedTimeReranker with any retrieval in RAGchain.
So, the algorithm to rerank passages is as follows:
In this algorithm, relevance_score is normalized.
You can easily use WeightedTimeReranker as follows:
Set a Hard Limit of Passage datetime
Another simple yet powerful way to implement time-aware RAG is to set a hard limit on passage content_datetime. You can achieve this by calling retrieve_with_filter on any Retrieval class. You can set multiple time ranges at once, in which case only passages that fall within those ranges are retrieved.
ClusterTimeCompressor
If you want to use recent information but keep old information that is unique, you can use ClusterTimeCompressor. This class semantically clusters passages and keeps only the latest passage in each cluster. For clustering passages, it uses SemanticClustering.
You can set the split_by_sentences option when initializing ClusterTimeCompressor. If this option is True, it splits passages into sentences and clusters the sentences. This option can help when passages are large or when a single passage covers several unrelated topics.
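The keep-the-latest-per-cluster step described above can be sketched as follows. This is not the ClusterTimeCompressor implementation: the cluster ids are supplied by hand here in place of a real semantic clustering step, and the function name and dict-based passages are hypothetical.

```python
from datetime import datetime


def keep_latest_per_cluster(passages, cluster_ids):
    """For each cluster id, keep only the passage with the newest content_datetime."""
    latest = {}
    for passage, cluster_id in zip(passages, cluster_ids):
        current = latest.get(cluster_id)
        if current is None or passage["content_datetime"] > current["content_datetime"]:
            latest[cluster_id] = passage
    return list(latest.values())


passages = [
    {"content": "Price is $10", "content_datetime": datetime(2021, 1, 1)},
    {"content": "Price is $12", "content_datetime": datetime(2023, 1, 1)},
    {"content": "Founded in 1999", "content_datetime": datetime(2020, 1, 1)},
]

# Assumed output of a clustering step: the two price passages share a cluster.
cluster_ids = [0, 0, 1]

compressed = keep_latest_per_cluster(passages, cluster_ids)
print([p["content"] for p in compressed])
```

The outdated price is dropped because a newer passage exists in the same cluster, while the older founding-date passage survives because nothing else covers that topic.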