Synthetic Student Generator

Overview

The Synthetic Student Generator is a full-stack calibration tool for teachers. It produces realistic student writing samples at four proficiency levels: Below, Approaching, Proficient, and Exemplary. Teachers choose one of four built-in rubric templates (6-Trait Writing, Argumentative/Persuasive, Narrative, Informational/Explanatory) or paste their own freeform rubric text from any LMS or document. The teacher then enters an assignment prompt and grade level, and the system generates a complete calibration set. Each sample in the set reflects a distinct student persona with a consistent voice and skill level, intentional error patterns, and rubric-aligned proficiency scores.

The backend is a Python FastAPI service. It uses the Google Gen AI SDK with Vertex AI to call Gemini 3.1 Pro with enforced structured JSON output. It runs on Cloud Run, scales to zero when idle, and authenticates with Application Default Credentials. The frontend is a SvelteKit 2 single-page application built with Svelte 5 runes and Tailwind CSS. It is built as static files and served from Firebase Hosting, and a Cloud Functions proxy forwards its API requests to the backend.
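
The request the overview describes (a prompt, a grade level, and exactly one rubric source) could be modeled roughly as follows. This is a minimal stdlib sketch. It is not the backend's actual FastAPI/Pydantic model. `GenerationRequest`, `resolve_dimensions`, and `BUILTIN_TEMPLATES` are hypothetical names, and the dimension lists are illustrative placeholders. The freeform parser is passed in as a callable because, in the real system, that step is a separate Gemini call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# The four proficiency levels named in the overview.
LEVELS = ("Below", "Approaching", "Proficient", "Exemplary")

# Illustrative dimension lists for the four built-in templates
# (hypothetical: the project's actual template contents are not shown).
BUILTIN_TEMPLATES: dict[str, list[str]] = {
    "6-Trait Writing": ["Ideas", "Organization", "Voice", "Word Choice",
                        "Sentence Fluency", "Conventions"],
    "Argumentative/Persuasive": ["Claim", "Evidence", "Reasoning",
                                 "Counterclaim", "Conventions"],
    "Narrative": ["Plot", "Characterization", "Descriptive Detail",
                  "Conventions"],
    "Informational/Explanatory": ["Focus", "Development", "Organization",
                                  "Conventions"],
}


@dataclass
class GenerationRequest:
    prompt: str
    grade_level: int
    template: str | None = None         # name of a built-in template
    freeform_rubric: str | None = None  # rubric text pasted from an LMS or document
    levels: tuple[str, ...] = LEVELS    # proficiency levels to generate

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("an assignment prompt is required")
        # Exactly one rubric source: a built-in template or pasted freeform text.
        if (self.template is None) == (self.freeform_rubric is None):
            raise ValueError("provide exactly one of template or freeform_rubric")
        if self.template is not None and self.template not in BUILTIN_TEMPLATES:
            raise ValueError(f"unknown template: {self.template!r}")
        if not self.levels or not set(self.levels) <= set(LEVELS):
            raise ValueError(f"levels must be a non-empty subset of {LEVELS}")


def resolve_dimensions(
    req: GenerationRequest,
    parse_freeform: Callable[[str], list[str]],
) -> list[str]:
    """Return the rubric dimension names for a request.

    Built-in templates resolve locally. Freeform text is handed to
    `parse_freeform`, which stands in for the separate Gemini parsing call.
    """
    if req.template is not None:
        return list(BUILTIN_TEMPLATES[req.template])
    assert req.freeform_rubric is not None  # guaranteed by __post_init__
    return parse_freeform(req.freeform_rubric)
```

When the either/or rubric choice is validated up front, the later generation steps can assume a single, already resolved list of dimensions.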

The Problem

Teachers need diverse examples of student work at different proficiency levels to calibrate their grading — ensuring consistency across a department, training new teachers, or preparing for standardized scoring sessions. Creating these calibration samples by hand is tedious and time-consuming: a typical set requires writing four distinct samples at different quality levels, each needing to authentically reflect how a real student at that level would write. This can take 3-4 hours per assignment prompt. The result is that calibration exercises often reuse stale samples, lack variety, or get skipped entirely — leading to inconsistent grading that directly impacts students.

The Approach

The system uses Gemini 3.1 Pro through the Google Gen AI SDK with Vertex AI. It relies on two model capabilities: detailed system instructions for persona modeling, and native structured JSON output for consistent formatting. For each proficiency level, a system instruction is assembled from a persona template. The template specifies the student's grade level, writing strengths and weaknesses, the error patterns to apply (e.g., run-on sentences, weak thesis statements), and a target for each rubric dimension. The model must output a strict JSON schema (GENERATED_SAMPLE_SCHEMA) that contains the student response, per-dimension proficiency scores, persona notes, and writing trait metrics.

Freeform rubric text, pasted directly from an LMS, Google Doc, or PDF, is first parsed into structured dimensions by a separate Gemini call. This makes the system compatible with any rubric format without manual re-entry.

The SvelteKit frontend sends one request per selected proficiency level in parallel, so the levels of a calibration set are generated concurrently rather than one after another. A failed generation is retried automatically once, and a manual retry button is offered for persistent failures. Each response includes token usage counts and an estimated cost breakdown, so the tool also shows teachers how AI usage is priced.
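
A persona template and an output schema of the kind described above might look roughly like the following. This is a sketch, not the project's code. `Persona`, `build_system_instruction`, `SAMPLE_SCHEMA_SKETCH`, and `check_sample` are hypothetical names. The field names only approximate GENERATED_SAMPLE_SCHEMA, whose real definition is not shown, and the 1-to-4 score scale is an assumption.

```python
import json
from dataclasses import dataclass
from string import Template


@dataclass
class Persona:
    level: str                 # e.g. "Approaching"
    strengths: list[str]
    weaknesses: list[str]
    error_patterns: list[str]  # errors the model should make on purpose
    target_score: int          # hypothetical 1-4 rubric scale


PERSONA_TEMPLATE = Template(
    "You are a grade $grade student whose writing is at the '$level' level.\n"
    "Strengths: $strengths\n"
    "Weaknesses: $weaknesses\n"
    "Deliberately include these error patterns: $errors\n"
    "Aim for a score of $score out of 4 on each rubric dimension: $dimensions\n"
    "Write only as this student would; stay in character throughout."
)


def build_system_instruction(persona: Persona, grade: int, dimensions: list[str]) -> str:
    """Fill the persona template for one proficiency level."""
    return PERSONA_TEMPLATE.substitute(
        grade=grade,
        level=persona.level,
        strengths=", ".join(persona.strengths),
        weaknesses=", ".join(persona.weaknesses),
        errors=", ".join(persona.error_patterns),
        score=persona.target_score,
        dimensions=", ".join(dimensions),
    )


# Hypothetical stand-in for GENERATED_SAMPLE_SCHEMA. Scores are a list of
# {dimension, score} objects rather than a map keyed by dimension name.
SAMPLE_SCHEMA_SKETCH = {
    "type": "object",
    "properties": {
        "student_response": {"type": "string"},
        "scores": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "dimension": {"type": "string"},
                    "score": {"type": "integer"},
                },
                "required": ["dimension", "score"],
            },
        },
        "persona_notes": {"type": "string"},
        "trait_metrics": {
            "type": "object",
            "properties": {
                "word_count": {"type": "integer"},
                "avg_sentence_length": {"type": "number"},
            },
        },
    },
    "required": ["student_response", "scores", "persona_notes", "trait_metrics"],
}


def check_sample(raw: str, dimensions: list[str]) -> dict:
    """Parse the model's JSON and confirm that every rubric dimension was scored."""
    data = json.loads(raw)
    missing = [key for key in SAMPLE_SCHEMA_SKETCH["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    scored = {entry["dimension"] for entry in data["scores"]}
    if scored != set(dimensions):
        raise ValueError(f"scored {sorted(scored)}, expected {sorted(dimensions)}")
    return data
```

In the Google Gen AI SDK, an instruction and schema like these would typically be passed through `types.GenerateContentConfig` (`system_instruction=...`, `response_mime_type="application/json"`, `response_schema=...`). The project's actual call is not shown here. Scores are modeled as an array because a map keyed by arbitrary dimension names is awkward to express in the schema subset that Gemini accepts.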

Architecture

Production Architecture

How the system deploys and serves requests from browser to Gemini

Generation Pipeline

How a teacher's prompt becomes a calibration set of student writing samples
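
In the project, the per-level fan-out and automatic retry run in the SvelteKit frontend. The following is a Python asyncio sketch of the same pattern, not a port of that code. `generate_with_retry` and `generate_set` are hypothetical names, and `generate` stands in for one backend request that produces the sample for a single proficiency level.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Stand-in for one backend request that generates a sample for a single level.
Generate = Callable[[str], Awaitable[dict]]


async def generate_with_retry(level: str, generate: Generate, retries: int = 1) -> dict:
    """Call generate(level), retrying up to `retries` extra times on failure."""
    for attempt in range(retries + 1):
        try:
            return await generate(level)
        except Exception:
            if attempt == retries:
                raise  # persistent failure: surface it to the caller
    raise ValueError("retries must be >= 0")


async def generate_set(levels: list[str], generate: Generate) -> dict[str, dict | BaseException]:
    """Send one request per level concurrently.

    With return_exceptions=True, a level that still fails after its automatic
    retry comes back as an exception object rather than making the whole
    gather() call raise. Successful samples are kept, and only the failed
    level needs a manual retry.
    """
    results = await asyncio.gather(
        *(generate_with_retry(level, generate) for level in levels),
        return_exceptions=True,
    )
    return dict(zip(levels, results))
```

Keeping per-level results separate is what lets a UI show a manual retry button for one failed level without regenerating the samples that already succeeded.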

Tech Stack

Medium effort

FastAPI: Python 3.13 Backend
Gemini 3.1 Pro: Structured Generation
Google Gen AI SDK: Vertex AI Integration
Structured Output: JSON Schema Enforcement
SvelteKit 2: Svelte 5 Runes + Tailwind
Cloud Run: Scale-to-Zero Deploy
Firebase Hosting: Static SPA + API Proxy
Token Tracking: Usage & Cost Estimates
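
The cost estimate reduces to multiplying token counts by per-token prices. The following is a minimal sketch. It assumes the backend reads input and output counts from the SDK response's `usage_metadata` (for example `prompt_token_count` and `candidates_token_count`). The prices are placeholders, not published Gemini 3.1 Pro rates, and `Usage`, `estimate_cost`, and `estimate_set_cost` are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens (NOT published Gemini 3.1 Pro
# rates); replace with current Vertex AI pricing for the model in use.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 12.00


@dataclass
class Usage:
    input_tokens: int   # e.g. usage_metadata.prompt_token_count
    output_tokens: int  # e.g. usage_metadata.candidates_token_count


def estimate_cost(
    usage: Usage,
    input_price_per_m: float = INPUT_PRICE_PER_M,
    output_price_per_m: float = OUTPUT_PRICE_PER_M,
) -> dict[str, float]:
    """Break one request's estimated cost into input, output, and total (USD)."""
    input_usd = usage.input_tokens * input_price_per_m / 1_000_000
    output_usd = usage.output_tokens * output_price_per_m / 1_000_000
    return {"input_usd": input_usd, "output_usd": output_usd, "total_usd": input_usd + output_usd}


def estimate_set_cost(usages: list[Usage]) -> float:
    """Total estimated cost of a calibration set (one Usage per level)."""
    return sum(estimate_cost(u)["total_usd"] for u in usages)
```

At these placeholder rates, a request with 1,000 input tokens and 2,000 output tokens comes to $0.026, so a four-level set with that profile comes to $0.104. Thinking models also report `thoughts_token_count` separately, so check how those tokens are billed before you leave them out of the estimate.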

Engineering Highlights

Structured Output, Persona Engineering, Freeform Rubric Parsing, Calibration Sets, Parallel Generation, Auto-Retry, Token Usage Tracking, Cloud-Native