IndexZero

Build a search engine from scratch in Python

Ten modules. One Python codebase that grows. You start with ranking judgment and end with a working search API. Along the way, you build every piece yourself: tokenizer, inverted index, BM25 scorer, eval harness, vector retrieval, and hybrid search.

Build arc

One codebase, one chain of system states

Each module changes the system state. The next module starts from the artifact you just made.

  1. M0 (The Problem): Raw titles become ranking hypotheses
  2. M1 (Text Processing): Text becomes tokens
  3. M2 (The Index): Tokens become postings
  4. M3 (Ranking): Postings become ranked results
  5. M4 (Did It Work?): Rankings become metrics
  6. M5 (Smarter Queries): Queries become structured
  7. M6 (Meaning, Not Words): Meaning becomes vectors
  8. M7 (Both Together): Signals become one ranked list
  9. M8 (Keeping It Alive): Indexes become live systems
  10. M9 (The Full System): System becomes an API

Modules

Ten modules from ranking judgment to search API

The free modules get the system started. The cohort continues through indexing, ranking, evaluation, query processing, and deployment.

M0

Available Free

The Problem

Ranking audit

You can articulate why relevance is hard and what signals a ranker needs

  • Effort: 1-2 hours
  • Tests: 0
  • Assessments: 3

M1

Available Free

Text Processing

Tokenizer + vocabulary

A working tokenizer that turns raw text into clean token streams

  • Effort: 4-6 hours
  • Tests: 44
  • Assessments: 7
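
To give a feel for what "raw text into clean token streams" can mean, here is a minimal sketch. It is not the course's tokenizer: the regex, the lowercasing rule, the helper names `tokenize` and `build_vocabulary`, and the sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a token is any run of lowercase letters or digits.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def build_vocabulary(docs: list[str]) -> Counter:
    """Count how often each token occurs across all documents."""
    vocab = Counter()
    for doc in docs:
        vocab.update(tokenize(doc))
    return vocab


# Hypothetical product titles, not taken from the course dataset.
titles = ["USB-C Charging Cable, 2m", "Wireless Charging Pad"]
tokens = tokenize(titles[0])
vocab = build_vocabulary(titles)
```

Even this small version forces decisions a real tokenizer has to make, such as whether "USB-C" should stay one token or split into two.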

M2

Available Cohort

The Index

Inverted index + lookup

A searchable inverted index that maps terms to documents

  • Effort: 4-6 hours
  • Tests: 26
  • Assessments: 6
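
As a rough illustration of an index that maps terms to documents, the sketch below builds postings lists from pre-tokenized documents and answers AND queries. The function names `build_index` and `lookup`, the integer document ids, and the sample data are hypothetical; the module's actual design may differ.

```python
from collections import defaultdict


def build_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def lookup(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Return ids of documents that contain every query term (AND semantics)."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical tokenized documents keyed by document id.
docs = {
    0: ["usb", "charging", "cable"],
    1: ["wireless", "charging", "pad"],
    2: ["usb", "hub"],
}
index = build_index(docs)
matches = lookup(index, ["usb", "charging"])
```

Keeping each postings list sorted is a common choice because it allows merge-style intersection later, although this sketch simply uses set intersection.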

M3

Available Cohort

Ranking

BM25 scorer

A BM25 scorer that ranks documents by relevance to a query

  • Effort: 4-5 hours
  • Tests: 27
  • Assessments: 5
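
For readers who have not seen BM25 before, the following is a compact sketch of the standard Okapi BM25 formula over tokenized documents. The parameter defaults (`k1=1.2`, `b=0.75`), the smoothed IDF variant, the function name `bm25_scores`, and the sample data are assumptions chosen for illustration, not the module's reference solution.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Assumption: smoothed IDF that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical tokenized documents and query.
docs = [
    ["usb", "charging", "cable"],
    ["wireless", "charging", "pad"],
    ["usb", "hub"],
]
scores = bm25_scores(["usb", "cable"], docs)
```

Here the first document matches both query terms, the third matches only the more common one, and the second matches neither, so the scores fall in that order.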

M4

Planned Cohort

Did It Work?

Eval harness

An evaluation harness that measures ranking quality with labeled queries

  • Release: Planned
  • Assessments: To be announced
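
This module is still planned, so its metrics are not specified here. As one possible shape for an evaluation harness over labeled queries, the sketch below computes precision@k and mean reciprocal rank. The metric choice, the function names, and the sample runs and labels are all assumptions.

```python
def precision_at_k(ranked: list[int], relevant: set[int], k: int) -> float:
    """Fraction of the top-k results that are labeled relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def reciprocal_rank(ranked: list[int], relevant: set[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: dict[str, list[int]], labels: dict[str, set[int]],
             k: int = 3) -> dict[str, float]:
    """Average both metrics over every labeled query."""
    if not labels:
        return {"precision_at_k": 0.0, "mrr": 0.0}
    queries = list(labels)
    p = sum(precision_at_k(runs.get(q, []), labels[q], k) for q in queries)
    rr = sum(reciprocal_rank(runs.get(q, []), labels[q]) for q in queries)
    return {"precision_at_k": p / len(queries), "mrr": rr / len(queries)}


# Hypothetical ranker output and relevance labels.
runs = {"usb cable": [2, 0, 1], "charging pad": [1, 0, 2]}
labels = {"usb cable": {0}, "charging pad": {1}}
metrics = evaluate(runs, labels, k=3)
```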

M5

Planned Cohort

Smarter Queries

Query processor

A query processor that handles multi-word and structured queries

  • Release: Planned
  • Assessments: To be announced

M6

Planned Cohort

Meaning, Not Words

Vector retrieval

Vector embeddings and approximate nearest neighbor search over the same corpus

  • Release: Planned
  • Assessments: To be announced

M7

Planned Cohort

Both Together

Hybrid retrieval

A hybrid retriever that combines lexical and semantic signals

  • Release: Planned
  • Assessments: To be announced
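
The fusion method for this planned module is not stated here. One widely used way to combine a lexical ranking with a semantic one is reciprocal rank fusion (RRF), sketched below. The choice of RRF, the constant `k=60`, the function name, and the sample rankings are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several ranked lists of document ids into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by document id.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical outputs of a lexical ranker and a vector ranker.
lexical = [0, 2, 1]
semantic = [1, 0, 3]
combined = reciprocal_rank_fusion([lexical, semantic])
```

RRF uses only rank positions, so it avoids having to put BM25 scores and vector similarities on a common scale.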

M8

Planned Cohort

Keeping It Alive

Index pipeline

An index that handles document changes without full rebuilds

  • Release: Planned
  • Assessments: To be announced

M9

Planned Cohort

The Full System

Search API

A complete search API serving queries over your index

  • Release: Planned
  • Assessments: To be announced

Start free

M0 and M1

Start with M0 and M1. You define the ranking problem, build the tokenizer, and see how the corpus changes before indexing starts.

  • Ranking audit
  • Tokenizer and vocabulary
  • Tests and assessments included in the repo

Join the cohort

M2 and beyond

Continue with M2 and beyond through the guided cohort. You get code reviews, deadline structure, discussion, and a workshop after the ranking modules.

  • Inverted index and ranking modules
  • Code reviews and deadline structure
  • Workshop after the ranking modules

See cohort details

Evidence

Concrete workload, one dataset, compounding artifacts

One codebase across 10 modules. M1: 44 tests, 7 assessments. M2: 26 tests, 6 assessments. M3: 27 tests, 5 assessments. Real product dataset. Each module feeds the next.

Who built this

About the instructor

IndexZero is built by Sumit Garg. Sumit spent years building search infrastructure at Microsoft, working on the systems behind Azure AI Search that handle billions of queries. He built this course because most developers use hosted search without understanding the retrieval mechanics underneath. IndexZero makes those mechanics visible in code you write yourself.