Python + React · IN DESIGN

Vision-First Study Buddy

Snap your notes, get a study guide — multimodal AI for students.

Overview

Vision-First Study Buddy is a mobile-friendly web application that turns messy handwritten notes, whiteboard photos, PDFs, and EPUBs into organized study materials. Students snap a photo of their notes or upload a document, and the system uses Gemini 1.5 Flash's native multimodal understanding to extract content, including handwriting, diagrams, and equations, then generates structured study guides and interactive quizzes. The backend is a Python FastAPI service that leverages Vertex AI's long context window to process entire documents in a single pass. Uploaded materials are stored in Firebase Storage, and the React frontend, built with Material UI, provides a mobile-first experience optimized for studying on the go.
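The single-pass pipeline above can be sketched as a small data model: one study session accumulates every upload, and each request to the model is assembled from all of them plus a text instruction. The `Material` and `StudySession` names and the dict-shaped request parts are illustrative assumptions, not the project's actual types.

```python
from dataclasses import dataclass, field

@dataclass
class Material:
    """One uploaded file in a study session (hypothetical model)."""
    filename: str
    mime_type: str    # e.g. "image/jpeg" for note photos, "application/pdf"
    storage_url: str  # where the raw file lives (Firebase Storage in this design)

@dataclass
class StudySession:
    session_id: str
    materials: list[Material] = field(default_factory=list)

    def add(self, material: Material) -> None:
        self.materials.append(material)

def build_request_parts(session: StudySession, instruction: str) -> list[dict]:
    """Assemble one multimodal request: every previously uploaded file
    plus the text instruction, ordered oldest-first so the model reads
    the session in upload order."""
    parts: list[dict] = [
        {"file_uri": m.storage_url, "mime_type": m.mime_type}
        for m in session.materials
    ]
    parts.append({"text": instruction})
    return parts
```

Keeping all materials in one request is what avoids chunking: the model sees the photo of Tuesday's whiteboard next to the chapter PDF it annotates.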

The Problem

Students accumulate mountains of handwritten notes, whiteboard photos, and PDF handouts throughout a semester. When it comes time to study, they face a disorganized pile of materials in different formats with no easy way to synthesize them. Manually creating study guides is tedious, and most note-taking apps can't read handwriting or extract meaning from diagrams. Existing OCR tools produce raw text without understanding the educational context, leaving students to do the hard work of organizing and connecting concepts themselves.

The Approach

The system leverages Gemini 1.5 Flash's native multimodal capabilities to bypass traditional OCR pipelines entirely. When a student uploads a photo of handwritten notes or a PDF, the raw image or document is sent directly to Gemini along with the full context of previously uploaded materials for that study session. The model's 1M+ token context window means entire textbook chapters can be processed alongside handwritten annotations without chunking. For study guide generation, a specialized prompt template instructs the model to identify key concepts, create hierarchical outlines, and highlight relationships between topics. For quiz generation, a separate template produces varied question types (multiple choice, short answer, true/false) calibrated to the material's complexity. The React frontend uses Material UI components for a clean, mobile-first experience with drag-and-drop upload and native camera capture for snapping photos directly from the app.
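The quiz path described above might use a template like the following sketch, paired with a defensive parser for the model's reply. The exact wording, question-type mix, and JSON schema are assumptions for illustration, not the project's actual prompts.

```python
import json

# Hypothetical prompt template for quiz generation; attached files
# (note photos, PDFs) travel alongside this text in the same request.
QUIZ_TEMPLATE = (
    "You are generating a quiz from the attached study materials.\n"
    "Write {n} questions mixing these types: multiple_choice, short_answer, true_false.\n"
    "Calibrate difficulty to the material's complexity.\n"
    'Respond with JSON only: {{"questions": [{{"type": "...", "prompt": "...", '
    '"choices": ["..."], "answer": "..."}}]}}'
)

def build_quiz_prompt(n_questions: int) -> str:
    return QUIZ_TEMPLATE.format(n=n_questions)

def parse_quiz(raw: str) -> list[dict]:
    """Parse the model's JSON reply, keeping only well-formed questions."""
    data = json.loads(raw)
    valid_types = {"multiple_choice", "short_answer", "true_false"}
    return [
        q for q in data.get("questions", [])
        if q.get("type") in valid_types and q.get("prompt") and "answer" in q
    ]
```

Filtering malformed questions on the way in keeps the frontend's quiz renderer simple: it only ever sees the three question types it knows how to display.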

Tech Stack

Medium effort
- FastAPI: Python API Framework
- Gemini 1.5 Flash: Multimodal Vision + Generation
- Firebase Storage: Image & Document Hosting
- React + MUI: Mobile-First Frontend
- Camera Capture: Native Photo Input
- Quiz Engine: Auto-Generated Assessments
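A quiz engine also needs a grading step once the student submits answers. A minimal sketch, assuming exact-match scoring after normalization (the real engine's scoring rules are not specified here):

```python
def grade(questions: list[dict], responses: list[str]) -> float:
    """Return the fraction of correct answers, comparing each response
    to the stored answer case-insensitively with whitespace stripped.
    This exact-match rule is an illustrative assumption."""
    if not questions:
        return 0.0
    correct = 0
    for q, given in zip(questions, responses):
        expected = str(q["answer"]).strip().lower()
        if str(given).strip().lower() == expected:
            correct += 1
    return correct / len(questions)
```

Exact matching suits multiple choice and true/false; short-answer grading would likely need fuzzier comparison, or another model call.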

Engineering Highlights

Multimodal Vision · Handwriting Recognition · Long-Context Processing · Study Guide Generation · Quiz Generation · Mobile Camera Capture