Synthetic Student Generator
Generate realistic student writing samples for grading calibration — in seconds, not hours.
Overview
The Synthetic Student Generator is a full-stack calibration tool for teachers that produces realistic student writing samples at four proficiency levels: Below, Approaching, Proficient, and Exemplary. Teachers choose one of four built-in rubric templates (6-Trait Writing, Argumentative/Persuasive, Narrative, Informational/Explanatory) or paste their own freeform rubric text from any LMS or document. After they enter an assignment prompt and grade level, the system generates a complete calibration set in which each sample reflects a distinct student persona with a consistent voice, skill level, intentional error patterns, and rubric-aligned proficiency scores.
The backend is a Python FastAPI service that uses the Google Gen AI SDK with Vertex AI to call Gemini 2.5 Flash with enforced structured JSON output. It runs on Cloud Run, scales to zero when idle, and authenticates with Application Default Credentials. The frontend is a SvelteKit 2 single-page application built with Svelte 5 runes and Tailwind CSS; it is built as static files and served from Firebase Hosting, with API requests routed through a Cloud Functions proxy.
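The inputs a teacher provides (assignment prompt, grade level, and either a built-in template or pasted rubric text), plus the proficiency levels to generate, can be modeled as a small typed request. This is a minimal Python sketch, not the project's actual API: the class and field names, the `CUSTOM` template option, and the grades 1-12 range are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProficiencyLevel(str, Enum):
    """The four proficiency levels, ordered from weakest to strongest."""
    BELOW = "Below"
    APPROACHING = "Approaching"
    PROFICIENT = "Proficient"
    EXEMPLARY = "Exemplary"


class RubricTemplate(str, Enum):
    """The four built-in templates plus a hypothetical CUSTOM option for pasted rubric text."""
    SIX_TRAIT = "6-Trait Writing"
    ARGUMENTATIVE = "Argumentative/Persuasive"
    NARRATIVE = "Narrative"
    INFORMATIONAL = "Informational/Explanatory"
    CUSTOM = "Custom"


@dataclass
class CalibrationRequest:
    """Hypothetical request shape; names are illustrative, not the project's real API."""
    prompt: str
    grade_level: int
    template: RubricTemplate
    custom_rubric_text: str = ""
    # Default to generating all four levels, in order.
    levels: list[ProficiencyLevel] = field(default_factory=lambda: list(ProficiencyLevel))

    def validate(self) -> list[str]:
        """Return every problem found; an empty list means the request is usable."""
        problems: list[str] = []
        if not self.prompt.strip():
            problems.append("assignment prompt is empty")
        if not 1 <= self.grade_level <= 12:  # assumed range: grades 1-12
            problems.append("grade level must be between 1 and 12")
        if self.template is RubricTemplate.CUSTOM and not self.custom_rubric_text.strip():
            problems.append("custom rubric selected but no rubric text was pasted")
        if not self.levels:
            problems.append("select at least one proficiency level")
        return problems
```

Returning a list of problems instead of raising on the first one lets a form show every issue at once.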
The Problem
Teachers need diverse examples of student work at different proficiency levels to calibrate their grading: to ensure consistency across a department, train new teachers, or prepare for standardized scoring sessions. Writing these calibration samples by hand is tedious. A typical set needs four distinct samples at different quality levels, each of which must authentically reflect how a real student at that level writes, and producing one set can take 3-4 hours per assignment prompt. As a result, calibration exercises often reuse stale samples, lack variety, or are skipped entirely, which leads to inconsistent grading that directly affects students.
The Approach
The system calls Gemini 2.5 Flash through the Google Gen AI SDK on Vertex AI and relies on two of its capabilities: detailed system instructions for persona modeling, and native structured JSON output for consistent formatting. For each proficiency level, a system instruction is assembled from a persona template that specifies the student's grade level, writing strengths and weaknesses, the error patterns to apply (e.g., run-on sentences, weak thesis statements), and the target proficiency for each rubric dimension. The model is constrained to return JSON that conforms to a strict schema (GENERATED_SAMPLE_SCHEMA) containing the student response, per-dimension proficiency scores, persona notes, and writing trait metrics.
Freeform rubric text, pasted directly from an LMS, Google Doc, or PDF, is parsed into structured dimensions by a separate Gemini call before generation, so teachers can use a rubric in any format without re-entering it by hand.
The SvelteKit frontend sends requests in parallel (one per selected proficiency level), producing a full calibration set in about 15 seconds. A failed generation is retried automatically once; if it fails again, a manual retry button is shown. Each response also reports token usage counts and an estimated cost breakdown, which makes the tool a hands-on way to learn how AI API costs add up.
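The per-level system instruction described above can be sketched as plain string assembly from a persona template. This is a minimal illustration under stated assumptions: the `Persona` fields, the single target score per level, the 1-4 score scale, and the instruction wording are all hypothetical. The rubric dimension names would come from a built-in template or from the parsed freeform rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona template for one proficiency level."""
    level: str
    strengths: tuple[str, ...]
    weaknesses: tuple[str, ...]
    error_patterns: tuple[str, ...]
    target_score: int  # assumed 1-4 scale, applied to every rubric dimension


def build_system_instruction(persona: Persona, grade_level: int, dimensions: list[str]) -> str:
    """Assemble the system instruction for one level from its persona template."""
    targets = "\n".join(f"- {dim}: score {persona.target_score}" for dim in dimensions)
    return (
        f"You are a grade {grade_level} student writing at the '{persona.level}' level.\n"
        f"Strengths: {', '.join(persona.strengths)}.\n"
        f"Weaknesses: {', '.join(persona.weaknesses)}.\n"
        f"Deliberately include these error patterns: {', '.join(persona.error_patterns)}.\n"
        f"Write a response that would earn these rubric scores:\n{targets}\n"
        "Stay in character as the student throughout."
    )


# Example persona for the "Below" level, using error patterns named in the text.
below = Persona(
    level="Below",
    strengths=("states a clear opinion",),
    weaknesses=("little supporting evidence", "limited vocabulary"),
    error_patterns=("run-on sentences", "weak thesis statement"),
    target_score=1,
)
instruction = build_system_instruction(below, grade_level=7, dimensions=["Ideas", "Organization"])
```

Keeping each level's persona as data rather than hand-written prose makes it easy to generate the four instructions in a loop and to keep them consistent with one another.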
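The text names GENERATED_SAMPLE_SCHEMA but not its exact fields. The sketch below assumes one plausible layout covering the four parts listed (student response, per-dimension scores, persona notes, writing trait metrics) as a JSON-Schema-style dict, plus a minimal stdlib-only conformance check. In the Google Gen AI SDK, a schema along these lines would be supplied through the `response_schema` field of `GenerateContentConfig` alongside `response_mime_type="application/json"`; the server-side check here is an optional defensive layer, not something the project is known to do.

```python
import json

# Assumed field layout: only the name GENERATED_SAMPLE_SCHEMA comes from the project description.
GENERATED_SAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "student_response": {"type": "string"},
        "dimension_scores": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "dimension": {"type": "string"},
                    "score": {"type": "integer"},
                    "rationale": {"type": "string"},
                },
                "required": ["dimension", "score", "rationale"],
            },
        },
        "persona_notes": {"type": "string"},
        "trait_metrics": {
            "type": "object",
            "properties": {
                "word_count": {"type": "integer"},
                "avg_sentence_length": {"type": "number"},
            },
            "required": ["word_count", "avg_sentence_length"],
        },
    },
    "required": ["student_response", "dimension_scores", "persona_notes", "trait_metrics"],
}

# Python types accepted for each schema type name used above.
_PY_TYPES = {"object": dict, "array": list, "string": str, "integer": int, "number": (int, float)}


def conforms(value, schema: dict) -> bool:
    """Recursively check `value` against the small JSON Schema subset used above."""
    kind = schema["type"]
    if not isinstance(value, _PY_TYPES[kind]):
        return False
    if kind in ("integer", "number") and isinstance(value, bool):
        return False  # bool is a subclass of int in Python, but not a JSON number
    if kind == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(conforms(value[k], sub) for k, sub in props.items() if k in value)
    if kind == "array":
        return all(conforms(item, schema["items"]) for item in value)
    return True


def parse_sample(raw: str) -> dict:
    """Decode the model's JSON text and reject it if it does not match the schema."""
    sample = json.loads(raw)
    if not conforms(sample, GENERATED_SAMPLE_SCHEMA):
        raise ValueError("generated sample does not match GENERATED_SAMPLE_SCHEMA")
    return sample
```

Score ranges (for example, rejecting a 7 on a 1-4 rubric) are not expressed in this subset and would need a separate check.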
Architecture
Production Architecture
How the system deploys and serves requests from browser to Gemini
Generation Pipeline
How a teacher's prompt becomes a calibration set of student writing samples
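In the project, the one-request-per-level fan-out and the single automatic retry live in the SvelteKit frontend. The sketch below illustrates the same control flow in Python with `asyncio.gather`; `generate` is a hypothetical stand-in for the real backend call, and the result dict shape is assumed. A level that still fails after its retry is returned as an error entry instead of failing the whole set, which is what makes a per-sample manual retry button possible.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

LEVELS = ("Below", "Approaching", "Proficient", "Exemplary")
GenerateFn = Callable[[str], Awaitable[dict]]  # hypothetical: level name -> generated sample


async def generate_with_retry(generate: GenerateFn, level: str, retries: int = 1) -> dict:
    """Generate one level's sample, retrying automatically before giving up."""
    last_error: Exception | None = None
    for _ in range(retries + 1):
        try:
            return {"level": level, "ok": True, "sample": await generate(level)}
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
    # Still failing: report the error so the UI can offer a manual retry for this level.
    return {"level": level, "ok": False, "error": str(last_error)}


async def generate_calibration_set(generate: GenerateFn, levels: Sequence[str] = LEVELS) -> list[dict]:
    """Run one request per selected level concurrently; results keep the input order."""
    return list(await asyncio.gather(*(generate_with_retry(generate, lvl) for lvl in levels)))


# Demo with a fake backend call that fails on its first attempt for "Proficient".
attempts: dict[str, int] = {}


async def flaky_generate(level: str) -> dict:
    attempts[level] = attempts.get(level, 0) + 1
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    if level == "Proficient" and attempts[level] == 1:
        raise RuntimeError("transient error")
    return {"student_response": f"<{level} sample>"}


results = asyncio.run(generate_calibration_set(flaky_generate))
```

Because the requests run concurrently, the wall-clock time for a full set is roughly that of the slowest single level rather than the sum of all four.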
Tech Stack
FastAPI: Python 3.13 Backend
Gemini 2.5 Flash: Structured Generation
Google Gen AI SDK: Vertex AI Integration
Structured Output: JSON Schema Enforcement
SvelteKit 2: Svelte 5 Runes + Tailwind
Cloud Run: Scale-to-Zero Deploy
Firebase Hosting: Static SPA + API Proxy
Token Tracking: Usage & Cost Estimates
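The cost estimate attached to each response is simple arithmetic: multiply input and output token counts by their per-million-token rates, sum them, and divide by one million. The sketch below takes the rates as parameters; the 0.30 and 2.50 USD figures in the example are placeholders, so check current Vertex AI pricing for Gemini 2.5 Flash before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts reported for one generation request."""
    input_tokens: int
    output_tokens: int


def estimate_cost_usd(usage: Usage, input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost = (input tokens x input rate + output tokens x output rate) / 1,000,000."""
    return (usage.input_tokens * input_rate_per_m + usage.output_tokens * output_rate_per_m) / 1_000_000


# Example: a four-sample set, each sample using 1,200 input and 900 output tokens.
# The rates (USD per million tokens) are placeholders, not authoritative pricing.
INPUT_RATE, OUTPUT_RATE = 0.30, 2.50
calibration_set = [Usage(1_200, 900)] * 4
per_sample = estimate_cost_usd(calibration_set[0], INPUT_RATE, OUTPUT_RATE)
total = sum(estimate_cost_usd(u, INPUT_RATE, OUTPUT_RATE) for u in calibration_set)
```

At those placeholder rates this works out to $0.00261 per sample and $0.01044 for the set. If the API reports thinking tokens separately from output tokens, they would presumably need to be added to the output count; treat that as an assumption to verify against the pricing page.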