Java + Astro LIVE

Classroom Clarity

AI-powered semantic Q&A for school documents — ask questions, get cited answers.

Overview

Classroom Clarity is a full-stack retrieval-augmented generation (RAG) application built for education. Teachers, administrators, and parents can upload school documents (curriculum maps, student handbooks, policy guides) and immediately ask natural-language questions. The system retrieves the most relevant passages using vector similarity search, then generates grounded answers with source citations so users can check each claim against the original text.

The backend is a Java 21 Spring Boot service using Spring AI 1.1 with the RetrievalAugmentationAdvisor pattern, backed by PostgreSQL 16 with pgvector for combined relational and vector storage. The frontend is an Astro 5 static site styled with Tailwind CSS, hosted on Firebase with a Cloud Functions API proxy. The entire stack runs on Google Cloud: Cloud Run for the API, Cloud SQL for the database, Cloud Storage for PDFs, and Vertex AI for embeddings and chat.

The Problem

School districts produce hundreds of pages of curriculum maps, handbooks, and policy documents every year. These critical documents live as static PDFs buried in shared drives and websites. Finding a specific policy, checking a standards alignment, or answering a parent's question means manually searching through dozens of pages. Teachers waste valuable planning time hunting for information that should be instantly accessible. Traditional keyword search fails because education documents use specialized terminology and the same concept is often described differently across documents.

The Approach

The system uses two distinct pipelines.

During document ingestion, uploaded PDFs are stored in Cloud Storage. Text is then extracted with Apache PDFBox and split into overlapping chunks (~512 tokens with ~100 tokens of overlap), and each chunk is embedded into a 768-dimension vector using Vertex AI text-embedding-005. The chunks and their embeddings are stored in PostgreSQL with pgvector and indexed with HNSW for fast approximate nearest-neighbor search.

At query time, the user's question is embedded with the same model, a cosine similarity search retrieves the top-K most relevant chunks, and those chunks are assembled into a context prompt sent to Vertex AI Gemini 2.0 Flash for answer generation. The response includes the AI-generated answer alongside the specific source chunks, with document titles and page numbers, so every answer can be traced back to its source.

Rate limiting via Bucket4j protects the API, and document retention policies automatically clean up expired uploads.
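
To make the rate-limiting idea concrete, here is a minimal hand-rolled token bucket in plain Java. It is not Bucket4j's API and not the project's configuration; the class name `TokenBucketSketch`, the capacity, and the refill rate are all assumptions chosen for illustration. Time is passed in explicitly so the behavior is easy to reason about; a real service would more likely read a monotonic clock such as `System.nanoTime()`.

```java
/** Illustrative token bucket (hypothetical class, not the Bucket4j API). */
final class TokenBucketSketch {
    private final long capacity;          // maximum burst size
    private final long refillTokens;      // tokens added per refill period
    private final long refillPeriodNanos; // length of one refill period
    private double tokens;                // tokens currently available
    private long lastRefillNanos;         // timestamp of the last refill

    TokenBucketSketch(long capacity, long refillTokens, long refillPeriodNanos, long nowNanos) {
        if (capacity <= 0 || refillTokens <= 0 || refillPeriodNanos <= 0) {
            throw new IllegalArgumentException("capacity, refillTokens and refillPeriodNanos must be positive");
        }
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.refillPeriodNanos = refillPeriodNanos;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Spends one token and returns true if a request at nowNanos is allowed; otherwise returns false. */
    synchronized boolean tryConsume(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastRefillNanos);
        // Refill in proportion to elapsed time, never exceeding capacity.
        double refill = elapsed * (double) refillTokens / (double) refillPeriodNanos;
        tokens = Math.min((double) capacity, tokens + refill);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In an API like this one, a bucket of this kind would typically be kept per client (for example per IP address or API key), with rejected requests answered with HTTP 429.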

Architecture

Document Ingestion Flow

How school documents become searchable knowledge
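
The overlapping-chunk step of ingestion can be sketched as a sliding window. The example below is a simplified illustration, not the project's splitter: it treats whitespace-separated words as a stand-in for model tokens, and the class name `OverlappingChunker` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: splits text into overlapping windows of whitespace-delimited words. */
final class OverlappingChunker {

    /**
     * @param text      extracted document text
     * @param chunkSize maximum words per chunk (assumed stand-in for ~512 tokens)
     * @param overlap   words shared by consecutive chunks (assumed stand-in for ~100 tokens)
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each iteration
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.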

[Diagram: Document Ingestion Flow]

Query Workflow

How a question becomes a grounded answer with source citations
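
The retrieval and prompt-assembly steps of the query workflow can be sketched in memory as follows. This is an exact brute-force scan written for illustration, whereas the application delegates the search to pgvector's approximate HNSW index. The `Chunk` record shape, the prompt wording, and all names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative in-memory retrieval; the real system runs this search inside PostgreSQL. */
final class RetrievalSketch {

    /** A stored chunk with citation metadata and its embedding (hypothetical shape). */
    record Chunk(String documentTitle, int page, String text, double[] embedding) {}

    /** A chunk paired with its cosine similarity to the query. */
    record Scored(Chunk chunk, double score) {}

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // treat a zero vector as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every chunk against the query and returns the k highest-scoring ones. */
    static List<Scored> topK(double[] query, List<Chunk> chunks, int k) {
        List<Scored> scored = new ArrayList<>();
        for (Chunk c : chunks) {
            scored.add(new Scored(c, cosine(query, c.embedding())));
        }
        scored.sort((x, y) -> Double.compare(y.score(), x.score())); // descending
        return new ArrayList<>(scored.subList(0, Math.min(Math.max(k, 0), scored.size())));
    }

    /** Assembles numbered, citable context passages followed by the user's question. */
    static String buildPrompt(String question, List<Scored> hits) {
        StringBuilder sb = new StringBuilder(
                "Answer using only the context below and cite sources by number.\n\n");
        for (int i = 0; i < hits.size(); i++) {
            Chunk c = hits.get(i).chunk();
            sb.append('[').append(i + 1).append("] ")
              .append(c.documentTitle()).append(", p. ").append(c.page()).append('\n')
              .append(c.text()).append("\n\n");
        }
        return sb.append("Question: ").append(question).toString();
    }
}
```

Keeping the document title and page number on each retrieved chunk is what lets the response list its sources next to the generated answer.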

[Diagram: Query Workflow]

Tech Stack

Medium-High effort

Spring Boot 3.5

Java 21 Backend

Spring AI 1.1

RAG Orchestration

pgvector

Vector Search (HNSW)

Gemini 2.0 Flash

Answer Generation

text-embedding-005

768-dim Embeddings

Cloud Run

Serverless Deploy

Astro 5

Static Frontend

Cloud SQL + GCS

Managed Data Layer

Engineering Highlights

Full-Stack RAG, Vector Similarity Search, Source Citations, PDF ETL Pipeline, HNSW Indexing, Rate Limiting, Testcontainers, Cloud-Native