Classroom Clarity
AI-powered semantic Q&A for school documents — ask questions, get cited answers.
Overview
Classroom Clarity is a full-stack retrieval-augmented generation (RAG) application purpose-built for the education space. Teachers, administrators, and parents can upload school documents (curriculum maps, student handbooks, policy guides) and immediately ask natural-language questions. The system retrieves the most relevant passages using vector similarity search, then generates grounded answers with source citations so users can verify every claim.
The backend is a Java 21 Spring Boot service using Spring AI 1.1 with the RetrievalAugmentationAdvisor pattern, backed by PostgreSQL 16 with pgvector for combined relational and vector storage. The frontend is an Astro 5 static site with Tailwind CSS, hosted on Firebase with a Cloud Functions API proxy. The entire stack runs on Google Cloud: Cloud Run for the API, Cloud SQL for the database, Cloud Storage for PDFs, and Vertex AI for embeddings and chat.
The Problem
School districts produce hundreds of pages of curriculum maps, handbooks, and policy documents every year. These critical documents live as static PDFs buried in shared drives and websites. Finding a specific policy, checking a standards alignment, or answering a parent's question means manually searching through dozens of pages. Teachers waste valuable planning time hunting for information that should be instantly accessible. Traditional keyword search fails because education documents use specialized terminology and the same concept is often described differently across documents.
The Approach
The system uses two distinct pipelines: document ingestion and query answering.
During document ingestion, uploaded PDFs are stored in Cloud Storage. Text is then extracted using Apache PDFBox and split into overlapping chunks (~512 tokens with ~100-token overlap), and each chunk is embedded into a 768-dimension vector using Vertex AI text-embedding-005. The chunks and their embeddings are stored in PostgreSQL with pgvector, indexed using HNSW for fast approximate nearest-neighbor search.
At query time, the user's question is embedded with the same model, a cosine similarity search retrieves the top-K most relevant chunks, and those chunks are assembled into a context prompt sent to Vertex AI Gemini 2.0 Flash for answer generation. The response includes the AI-generated answer alongside the specific source chunks with document titles and page numbers, so every answer can be traced back to its source.
Rate limiting via Bucket4j protects the API, and document retention policies auto-clean expired uploads.
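Bucket4j is built on the token-bucket algorithm, and the idea behind the rate limiting mentioned above can be illustrated in plain Java. The sketch below is not Bucket4j's API and not the project's configuration; the class name TokenBucket, its constructor parameters, and the choice to pass the clock in explicitly are all assumptions made for illustration.

```java
import java.time.Duration;

/**
 * Illustrative token bucket (NOT Bucket4j's API): holds up to `capacity`
 * tokens and refills at a steady rate; each request spends one token.
 */
final class TokenBucket {
    private final long capacity;
    private final long refillTokens;
    private final long periodNanos;
    private double tokens;
    private long lastRefillNanos;

    /** Allows `refillTokens` new tokens per `period`, up to `capacity` stored tokens. */
    TokenBucket(long capacity, long refillTokens, Duration period, long nowNanos) {
        if (capacity <= 0 || refillTokens <= 0 || period.isZero() || period.isNegative()) {
            throw new IllegalArgumentException("capacity, refill and period must be positive");
        }
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.periodNanos = period.toNanos();
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Returns true and spends one token if a request is allowed at time nowNanos. */
    synchronized boolean tryConsume(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            // Add tokens in proportion to the time that has passed, capped at capacity.
            double added = (double) elapsed * refillTokens / periodNanos;
            tokens = Math.min(capacity, tokens + added);
            lastRefillNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In a running service the caller would pass System.nanoTime() and typically answer a rejected request with HTTP 429. Taking the timestamp as a parameter simply keeps the sketch deterministic.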
Architecture
Document Ingestion Flow
How school documents become searchable knowledge
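The overlapping-chunk step of the ingestion flow can be sketched as a sliding window. This is a minimal illustration rather than the project's actual splitter: the class name OverlappingChunker is hypothetical, and tokens are approximated by whitespace-separated words, whereas a model tokenizer would count differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sliding-window chunker: each chunk repeats the last `overlap` tokens of the previous one. */
final class OverlappingChunker {

    private OverlappingChunker() {}

    /**
     * Splits text into chunks of at most chunkSize tokens with `overlap` shared tokens.
     * ASSUMPTION: a "token" here is a whitespace-separated word, not a model token.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] tokens = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each iteration
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + chunkSize, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) {
                break; // the window has reached the end of the text
            }
        }
        return chunks;
    }
}
```

With the sizes described earlier, a call would look like OverlappingChunker.chunk(pageText, 512, 100), so each window advances by 412 tokens. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.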
Query Workflow
How a question becomes a grounded answer with source citations
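The retrieval and prompt-assembly steps of the query workflow can be sketched in memory. This is an illustration under stated assumptions, not the project's code: the real system delegates the search to pgvector with an HNSW index (approximate search), while this sketch does an exact brute-force scan. The names TopKRetriever, Chunk, Scored, and buildPrompt, as well as the prompt wording, are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

final class TopKRetriever {

    /** A stored chunk with citation metadata and its embedding (hypothetical shape). */
    record Chunk(String documentTitle, int page, String text, float[] embedding) {}

    /** A retrieved chunk paired with its similarity to the query. */
    record Scored(Chunk chunk, double score) {}

    private TopKRetriever() {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Exact top-K by cosine similarity, highest score first (brute force, no index). */
    static List<Scored> topK(float[] query, List<Chunk> chunks, int k) {
        return chunks.stream()
                .map(c -> new Scored(c, cosine(query, c.embedding())))
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(k)
                .toList();
    }

    /** Assembles numbered sources with title and page so the answer can cite them. */
    static String buildPrompt(String question, List<Scored> hits) {
        StringBuilder sb = new StringBuilder("Answer using only the sources below and cite them.\n\n");
        for (int i = 0; i < hits.size(); i++) {
            Chunk c = hits.get(i).chunk();
            sb.append('[').append(i + 1).append("] ")
              .append(c.documentTitle()).append(", p. ").append(c.page())
              .append(": ").append(c.text()).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }
}
```

Keeping the document title and page number on each chunk is what lets the response return its sources next to the generated answer.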
Tech Stack
Spring Boot 3.5: Java 21 Backend
Spring AI 1.1: RAG Orchestration
pgvector: Vector Search (HNSW)
Gemini 2.0 Flash: Answer Generation
text-embedding-005: 768-dim Embeddings
Cloud Run: Serverless Deploy
Astro 5: Static Frontend
Cloud SQL + GCS: Managed Data Layer