
2025

BM25 Explained: A Better Ranking Algorithm than TF-IDF

BM25 is a popular ranking function used in information retrieval tasks such as search engines. It has also become increasingly popular in RAG (Retrieval-Augmented Generation) systems, where it ranks documents by their relevance to a query.

BM25 is an improved version of the TF-IDF algorithm that addresses some of its limitations. In this article, we will explore the BM25 algorithm in detail, understand its components, compare it with TF-IDF, and implement it from scratch. But before diving into BM25, it is essential to understand the nitty-gritty of TF-IDF, which you can find in my previous article.

A Comprehensive Guide on TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) is a popular technique in Natural Language Processing (NLP) for text analysis and information retrieval. It is used to evaluate the importance of a word in a document relative to a collection of documents (corpus). This guide provides an in-depth explanation of TF-IDF, its intuition, mathematical formulation, and implementation from scratch.
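
As a quick refresher, here is a minimal sketch of the TF-IDF idea described above. It assumes one common variant: term frequency as a term's share of the document's tokens, and an unsmoothed IDF of log(N / df). Other formulations add smoothing or use raw counts. The function name `tf_idf` and the toy corpus are made up for illustration and are not taken from the linked guide.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per document in a tokenized corpus.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (hypothetical), already lowercased and split on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(docs)
```

In this toy corpus, "the" appears in every document, so its IDF is log(3/3) = 0 and its weight is zero everywhere. "mat" appears in only one document and therefore outweighs "cat", which appears in two.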