Vision-First Study Buddy
Snap your notes, get a study guide — multimodal AI for students.
Overview
Vision-First Study Buddy is a mobile-friendly web application that turns messy handwritten notes, whiteboard photos, PDFs, and EPUBs into organized study materials. Students snap a photo of their notes or upload a document, and the system uses Gemini 1.5 Flash's native multimodal understanding to extract the content (handwriting, diagrams, and equations included), then generates structured study guides and interactive quizzes. The backend is a Python FastAPI service that calls Gemini through Vertex AI and relies on the model's long context window to process entire documents in a single pass. Uploaded materials are stored in Firebase Storage, and the React frontend, built with Material UI, provides a mobile-first experience optimized for on-the-go studying.
The Problem
Students accumulate mountains of handwritten notes, whiteboard photos, and PDF handouts throughout a semester. When it comes time to study, they face a disorganized pile of materials in different formats with no easy way to synthesize them. Manually creating study guides is tedious, and most note-taking apps can't read handwriting or extract meaning from diagrams. Existing OCR tools produce raw text without understanding the educational context, leaving students to do the hard work of organizing and connecting concepts themselves.
The Approach
The system leverages Gemini 1.5 Flash's native multimodal capabilities to bypass traditional OCR pipelines entirely. When a student uploads a photo of handwritten notes or a PDF, the raw image or document is sent directly to Gemini along with the full context of previously uploaded materials for that study session. The model's 1M+ token context window means entire textbook chapters can be processed alongside handwritten annotations without chunking. For study guide generation, a specialized prompt template instructs the model to identify key concepts, create hierarchical outlines, and highlight relationships between topics. For quiz generation, a separate template produces varied question types (multiple choice, short answer, true/false) calibrated to the material's complexity. The React frontend uses Material UI components for a clean, mobile-first experience with drag-and-drop upload and native camera capture for snapping photos directly from the app.
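The request assembly described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the template wording, the `Material` and `StudySession` classes, and the dictionary shape of each request part are all assumptions made for the example. It shows only how every material in a session travels with each prompt, so the model sees the full context without chunking. The actual call to Gemini is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical prompt templates; the project's real wording is not published.
STUDY_GUIDE_TEMPLATE = (
    "Using every attached material, identify the key concepts, "
    "build a hierarchical outline, and highlight relationships between topics."
)
QUIZ_TEMPLATE = (
    "Using every attached material, write {count} quiz questions. "
    "Mix these question types: {types}. "
    "Match the difficulty to the complexity of the material."
)
QUESTION_TYPES = ("multiple choice", "short answer", "true/false")


@dataclass
class Material:
    """One uploaded file: raw bytes plus its MIME type."""
    data: bytes
    mime_type: str  # e.g. "image/jpeg" or "application/pdf"


@dataclass
class StudySession:
    """Accumulates every material uploaded during one study session."""
    materials: List[Material] = field(default_factory=list)

    def add(self, data: bytes, mime_type: str) -> None:
        self.materials.append(Material(data, mime_type))


def build_request_parts(session: StudySession, prompt: str) -> List[Dict]:
    """Return the ordered parts of one multimodal request: all files, then the prompt."""
    parts: List[Dict] = [
        {"mime_type": m.mime_type, "data": m.data} for m in session.materials
    ]
    parts.append({"text": prompt})
    return parts


def study_guide_request(session: StudySession) -> List[Dict]:
    """Parts for a study guide request over the whole session."""
    return build_request_parts(session, STUDY_GUIDE_TEMPLATE)


def quiz_request(session: StudySession, count: int = 5) -> List[Dict]:
    """Parts for a quiz request; a separate template from the study guide."""
    prompt = QUIZ_TEMPLATE.format(count=count, types=", ".join(QUESTION_TYPES))
    return build_request_parts(session, prompt)
```

In a real backend, each part would be converted to whatever type the chosen Gemini client library expects before sending. Keeping the templates separate from the session state means a study guide and a quiz can be generated from the same uploads without re-sending anything to storage.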
Tech Stack
FastAPI
Python API Framework
Gemini 1.5 Flash
Multimodal Vision + Generation
Firebase Storage
Image & Document Hosting
React + MUI
Mobile-First Frontend
Camera Capture
Native Photo Input
Quiz Engine
Auto-Generated Assessments
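As a rough sketch of what the Quiz Engine might do with model output, the snippet below parses and validates a quiz, then scores a student's responses. It assumes the model is asked to return a JSON array of question objects with `type`, `prompt`, `answer`, and optional `choices` fields. That schema, the `Question` class, and both function names are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# The three question types named in the approach, as assumed JSON identifiers.
ALLOWED_TYPES = {"multiple_choice", "short_answer", "true_false"}


@dataclass
class Question:
    type: str
    prompt: str
    answer: str
    choices: Optional[List[str]] = None


def parse_quiz(raw_json: str) -> List[Question]:
    """Parse model output (assumed to be a JSON array) and reject malformed questions."""
    questions: List[Question] = []
    for item in json.loads(raw_json):
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown question type: {item['type']}")
        choices = item.get("choices") or []
        if item["type"] == "multiple_choice" and item["answer"] not in choices:
            raise ValueError("multiple-choice answer must be one of the choices")
        questions.append(
            Question(item["type"], item["prompt"], item["answer"], item.get("choices"))
        )
    return questions


def score(questions: List[Question], responses: List[str]) -> float:
    """Fraction of questions answered correctly, ignoring case and surrounding whitespace."""
    if not questions:
        return 0.0
    correct = sum(
        q.answer.strip().lower() == r.strip().lower()
        for q, r in zip(questions, responses)
    )
    return correct / len(questions)
```

Exact string matching is adequate for multiple choice and true/false questions. Short answers would likely need a more forgiving comparison, such as asking the model to grade them, which this sketch does not attempt.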