About CineMatch

A portfolio-grade movie discovery platform — semantic search, hybrid ML recommendations, and an AI chat assistant, built on a production-hardened full-stack architecture.

What is CineMatch?

CineMatch is a full-stack movie recommendation platform demonstrating how semantic search, machine learning recommendation engines, and modern web architecture work together as a real product — not a tutorial project.

The recommendation engine uses a hybrid cosine similarity + BM25 approach trained on metadata, genres, cast, and directors from 5,000 TMDB films. The semantic search pipeline combines TF-IDF, BM25, fuzzy matching, and franchise alias expansion to understand natural language queries.
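
The query-side steps can be illustrated with a minimal sketch. It assumes a small hand-written alias table and uses the standard library's difflib as a stand-in for whatever fuzzy matcher the project actually uses; FRANCHISE_ALIASES, expand_query, and fuzzy_title_match are hypothetical names, not taken from the CineMatch codebase.

```python
from difflib import SequenceMatcher

# Hypothetical alias table; the real franchise mappings are not given in the source.
FRANCHISE_ALIASES = {
    "mcu": ["marvel", "avengers"],
    "lotr": ["lord of the rings"],
}


def expand_query(query: str) -> list[str]:
    """Lower-case and tokenise the query, then append expansions for known aliases."""
    tokens = query.lower().split()
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(FRANCHISE_ALIASES.get(token, []))
    return expanded


def fuzzy_title_match(query: str, titles: list[str], threshold: float = 0.8) -> list[str]:
    """Return titles whose similarity ratio to the query meets the threshold, best first."""
    q = query.lower()
    scored = [(SequenceMatcher(None, q, title.lower()).ratio(), title) for title in titles]
    return [title for score, title in sorted(scored, reverse=True) if score >= threshold]
```

In a pipeline like the one described, the expanded tokens would feed the TF-IDF and BM25 scorers, while the fuzzy matcher would catch misspelled titles such as "inceptoin".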

Authentication, user data, and role management are powered by Supabase. The AI chat assistant runs on Groq LLM (Llama 3.1), grounded to the dataset for accurate suggestions.

Features

🔍

Semantic Search

Describe a vibe, emotion, or plot — the engine understands natural language and finds the right film.

Hybrid Recommendations

Custom scoring formula combining content similarity, genre overlap, cast, and quality signals.

🤖

AI Chat

Chat with CineMatch — ask for recommendations, trivia, or explore genres through conversation.

🎬

5,000+ Films

Sourced from the TMDB dataset with enriched metadata, posters, ratings, and full cast data.

🔖

Personal Watchlist

Sign in to save films to your watchlist, persisted across devices with Supabase.

🕓

Recently Viewed

Your viewing history is tracked locally and synced to your account when you are signed in.

Architecture overview

Next.js → FastAPI → Supabase Postgres — each layer decoupled and independently deployable.

Frontend

  • Next.js 14 — App Router, SSR, API routes
  • React 18 — Context API for auth & watchlist
  • Tailwind CSS — utility-first styling
  • Supabase JS — client-side auth + real-time

Backend

  • Python 3.11 — FastAPI + Uvicorn
  • slowapi — per-endpoint rate limiting
  • Structured JSON logging with latency
  • Pydantic v2 — request validation
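
As a rough illustration of structured JSON logging with latency, the sketch below uses only the standard library. The field names (endpoint, latency_ms), the logger name, and the timed helper are assumptions made for this example; the project's actual log schema is not described in the source.

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # These two fields are attached through the `extra` argument below.
            "endpoint": getattr(record, "endpoint", None),
            "latency_ms": getattr(record, "latency_ms", None),
        }
        return json.dumps(payload)


logger = logging.getLogger("cinematch.sketch")  # hypothetical logger name
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)


@contextmanager
def timed(endpoint: str):
    """Log one JSON line with the elapsed time of the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latency_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info(
            "request completed",
            extra={"endpoint": endpoint, "latency_ms": latency_ms},
        )
```

Wrapping a handler body in `with timed("/search"):` would emit one JSON line per request. In FastAPI the same idea would more likely live in a middleware.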

ML / Search

  • TF-IDF + cosine similarity (scikit-learn)
  • BM25 keyword scoring (rank-bm25)
  • Fuzzy matching + franchise alias expansion
  • Hybrid re-ranking formula

Auth & Data

  • Supabase Auth — email + Google OAuth
  • Supabase Postgres — managed DB
  • Row-Level Security on all tables
  • RBAC — user / admin roles

Recommendation engine

The engine computes a weighted hybrid score for every candidate film:

  • TF-IDF: Vectorises plot overviews and computes cosine similarity across the corpus.
  • BM25: Probabilistic keyword relevance on title, genre, and cast tokens.
  • Fuzzy: Tolerates typos and alternate spellings via token-set-ratio matching.
  • Quality: Vote count and average rating signals boost popular consensus picks.

Results are re-ranked by genre overlap and franchise membership before the final top-N are returned to the client.
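
The scoring and re-ranking steps above can be sketched as follows. The weights and boost values are placeholders chosen for illustration (the source does not publish the real formula), each signal is assumed to be pre-normalised to the range 0 to 1, and Candidate, hybrid_score, and recommend are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; the source does not state the real values.
WEIGHTS = {"tfidf": 0.4, "bm25": 0.3, "fuzzy": 0.1, "quality": 0.2}
GENRE_BOOST = 0.05      # assumed bonus per genre shared with the seed film
FRANCHISE_BOOST = 0.10  # assumed bonus when the franchise matches the seed film


@dataclass
class Candidate:
    title: str
    genres: set
    franchise: Optional[str]
    signals: dict  # signal name -> score, assumed pre-normalised to [0, 1]


def hybrid_score(candidate: Candidate) -> float:
    """Weighted sum of the four signals; a missing signal counts as zero."""
    return sum(WEIGHTS[name] * candidate.signals.get(name, 0.0) for name in WEIGHTS)


def recommend(seed_genres: set, seed_franchise: Optional[str],
              candidates: list, top_n: int = 10) -> list:
    """Re-rank by genre overlap and franchise membership, then keep the top N titles."""

    def final_score(candidate: Candidate) -> float:
        score = hybrid_score(candidate)
        score += GENRE_BOOST * len(candidate.genres & seed_genres)
        if seed_franchise and candidate.franchise == seed_franchise:
            score += FRANCHISE_BOOST
        return score

    ranked = sorted(candidates, key=final_score, reverse=True)
    return [candidate.title for candidate in ranked[:top_n]]
```

With these placeholder values, a film with a slightly lower hybrid score can still outrank another when it shares more genres or the same franchise with the seed film, which is the effect the re-ranking step describes.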

Security

Row-Level Security (RLS)

All Supabase tables enforce RLS policies — users can only access their own data at the database layer.

Rate Limiting

slowapi guards every endpoint: /chat (10/min), /recommend (20/min), /search (30/min).
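
To make the per-endpoint limits concrete, here is a minimal sliding-window limiter written with the standard library only. It is a stand-in for illustration, not slowapi's API, and slowapi's own counting strategy may differ; SlidingWindowLimiter and the injectable clock parameter are assumptions of this sketch.

```python
import time
from collections import defaultdict, deque

# Limits quoted in the text, as (max requests, window in seconds).
LIMITS = {"/chat": (10, 60), "/recommend": (20, 60), "/search": (30, 60)}


class SlidingWindowLimiter:
    """Allow at most N requests per client and endpoint within a rolling window."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits
        self.clock = clock               # injectable so the limiter can be tested
        self.hits = defaultdict(deque)   # (client, endpoint) -> request timestamps

    def allow(self, client: str, endpoint: str) -> bool:
        max_requests, window = self.limits[endpoint]
        now = self.clock()
        timestamps = self.hits[(client, endpoint)]
        # Discard timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= window:
            timestamps.popleft()
        if len(timestamps) >= max_requests:
            return False
        timestamps.append(now)
        return True
```

Each endpoint keeps its own counter per client, so exhausting the /chat budget does not affect /search.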

Admin RBAC

A role column on profiles gates /admin via Next.js middleware before any page code runs.

Two-Factor Authentication

Admins can enable TOTP (QR code via pyotp) or Telegram OTP (real Bot API via BotFather) as a second factor.
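
For readers unfamiliar with TOTP, the sketch below implements the RFC 6238 algorithm (HMAC-SHA1, 30-second steps) with the standard library. It is not the pyotp API and not CineMatch's code; the function names and the one-step drift tolerance in verify are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password using HMAC-SHA1."""
    key = base64.b32decode(secret_b32, casefold=True)
    timestamp = time.time() if at is None else at
    counter = int(timestamp // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret_b32: str, submitted: str, at: Optional[float] = None, step: int = 30) -> bool:
    """Accept the current code plus one step either side to tolerate clock drift (assumed policy)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + drift * step, step), submitted)
        for drift in (-1, 0, 1)
    )
```

The shared secret is what the QR code encodes during enrolment; an authenticator app and the server then derive the same code independently from the current time.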

Input Validation

Pydantic models sanitise all inputs. Queries are length-capped and whitespace-stripped server-side.
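
The validation rules can be pictured with a small standard-library stand-in for the Pydantic model. The 200-character cap, the choice to reject over-long queries rather than truncate them, and the SearchRequest name are all assumptions; the source only says that queries are length-capped and whitespace-stripped.

```python
from dataclasses import dataclass

MAX_QUERY_LENGTH = 200  # assumed cap; the real limit is not stated in the source


@dataclass
class SearchRequest:
    """Stand-in for a request model: strips whitespace and enforces a length cap."""

    query: str

    def __post_init__(self) -> None:
        if not isinstance(self.query, str):
            raise ValueError("query must be a string")
        cleaned = self.query.strip()
        if not cleaned:
            raise ValueError("query must not be empty")
        if len(cleaned) > MAX_QUERY_LENGTH:
            raise ValueError(f"query must be at most {MAX_QUERY_LENGTH} characters")
        # Store the cleaned value so downstream code never sees raw input.
        self.query = cleaned
```

In Pydantic v2 the same constraints would typically be declared on the field itself, so FastAPI rejects invalid requests before the handler runs.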

Standardised Error Envelopes

{ success, error: { code, message } } ensures no internal details leak to the client.
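
One way to produce that envelope is sketched below. The error codes, the generic message, the data field on success, and the helper names are assumptions made for illustration; only the { success, error: { code, message } } shape comes from the text.

```python
import logging

logger = logging.getLogger("cinematch.sketch.errors")  # hypothetical logger name


def error_envelope(code: str, message: str) -> dict:
    """Build the client-facing error shape described in the text."""
    return {"success": False, "error": {"code": code, "message": message}}


def safe_call(handler, *args) -> dict:
    """Run a handler and translate any exception into an envelope.

    Validation errors carry their own message; anything unexpected is logged
    server-side and replaced with a generic message for the client.
    """
    try:
        # The "data" key on success is an assumption of this sketch.
        return {"success": True, "data": handler(*args)}
    except ValueError as exc:
        return error_envelope("INVALID_INPUT", str(exc))
    except Exception:
        logger.exception("unhandled error in %r", handler)
        return error_envelope("INTERNAL_ERROR", "Something went wrong.")
```

The stack trace goes to the server log, while the client only ever sees a stable code and a safe message.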

Testing & CI

  • pytest (search): tests/test_api_search.py
  • pytest (recommendations): tests/test_api_recommend.py
  • pytest (chat + admin/2FA): tests/test_api_chat.py
  • Playwright E2E (search flow): tests/e2e/search.spec.js
  • Playwright E2E (chat interface): tests/e2e/chat.spec.js
  • CI pipeline (GitHub Actions): .github/workflows/ci.yml

The GitHub Actions pipeline runs pytest, builds Next.js, and executes Playwright smoke tests on every push to main. Any failing step blocks the merge.

Deployment

Cloud (default)

  • Vercel — Next.js frontend
  • Render — FastAPI backend
  • Supabase Cloud — DB + Auth

Self-hosted (Docker)

  • Dockerfile.backend — multi-stage Python image
  • Dockerfile.frontend — multi-stage Node image
  • docker-compose.yml — single-command local stack

Roadmap & future improvements

  • Watch provider lookup (JustWatch API): planned
  • Personalised recommendations from watchlist ML: planned
  • Director & actor filmography pages: planned
  • Redis caching for hot search paths: planned
  • Microservices split (search / recommendations / chat): future
  • Dedicated search infrastructure (Meilisearch / Typesense): future
  • Mobile app (React Native): future

Open Source Project

Support the project

Help shape what comes next — every contribution helps keep CineMatch free and improving.
