Performance at the Speed of Sight.

A Multimodal Gaze-Controlled Instrument & Remote Conductor

GazeComposer is a webcam-based performance environment. Gaze maps to pitch via GridKey, mouth gestures shape dynamics or trills, and hands trigger chords. In Conductor mode, the same webcam signals become a remote conducting surface that sends visual cues to laptop performers in other locations.

After each take, the system renders your session into standard notation (MusicXML) and exports MIDI for immediate score generation.


New Modes: Poet & Landscape


Poet Mode

Poet Mode transforms a poem or lyrics into a playable musical landscape. Words from the text are placed across the GridKey at random, and as someone reads the poem aloud, you follow the words with your gaze—turning that search into music. Even the words that drift by as “detours” become part of the piece.

In short: text becomes a score you navigate with attention.
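As a rough sketch of the random word placement described above, assuming the 5×10 GridKey layout and illustrative function names (not the project's actual code):

```python
import random

GRID_ROWS, GRID_COLS = 5, 10  # GridKey dimensions (see "GridKey & Tuning" below)

def place_words_on_grid(text, seed=None):
    """Scatter the poem's words over grid cells at random, one word per cell."""
    rng = random.Random(seed)
    words = text.split()
    cells = [(row, col) for row in range(GRID_ROWS) for col in range(GRID_COLS)]
    rng.shuffle(cells)
    # Pair each word with a random cell; words beyond the 50 available cells are dropped here.
    return dict(zip(cells, words))

layout = place_words_on_grid("Tyger Tyger burning bright in the forests of the night", seed=7)
for cell, word in sorted(layout.items()):
    print(cell, word)
```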


Landscape Mode

Landscape Mode uses a dual-camera setup to replace the self-view with the outside world. Instead of looking at your own face, you watch a live external scene and compose by pursuing whatever draws your attention—moment by moment.

In short: environment becomes an instrument.

What’s new in this build

While developing these modes, I also introduced small but meaningful refinements to gaze measurement stability and UI clarity (calibration feedback, smoother tracking feel, and more readable overlays). Both Poet and Landscape extend the same core idea: translating the visual into the musical.

Why GazeComposer?

Expressive Control

A true multimodal instrument: eyes for melody, mouth for phrase shaping (dynamics/trill), and hands for texture (chords). In Conductor mode, the same signals also steer ensemble density and register across the network.
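To make the mouth-as-phrase-shaper idea concrete, here is a hedged sketch of the two mouth profiles listed in the architecture section below; the velocity range, trill threshold, and trill-rate range are placeholder numbers, not GazeComposer's tuned values.

```python
def profile_a_dynamics(openness: float) -> int:
    """Profile A: mouth openness (0.0 closed to 1.0 wide open) mapped to MIDI velocity."""
    openness = max(0.0, min(1.0, openness))
    return int(30 + openness * (120 - 30))  # assumed quiet floor of 30, ceiling of 120

def profile_b_trill(openness: float, on_threshold: float = 0.25):
    """Profile B: openness past a threshold switches the trill on and scales its rate (Hz)."""
    if openness < on_threshold:
        return False, 0.0
    span = (openness - on_threshold) / (1.0 - on_threshold)
    return True, 4.0 + span * 8.0  # roughly 4-12 Hz, an assumed range

print(profile_a_dynamics(0.6))   # -> 84
print(profile_b_trill(0.5))      # -> (True, ~6.7)
```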

GridKey & Tuning

The screen becomes a tuned pitch grid (5×10). Scale-aware cells ensure every gaze fixation lands in tune, enabling confident leaps.
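A minimal sketch of how a cell in a 5×10 grid could be snapped to an in-scale pitch; the C-major layout and register mapping here are assumptions for illustration, not the instrument's actual tuning table.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a major scale

def cell_to_midi(row: int, col: int, tonic_midi: int = 60, rows: int = 5) -> int:
    """Map a GridKey cell to a scale-locked MIDI pitch.

    Columns step through scale degrees; rows shift the register, so every
    fixation lands on an in-scale note (here C major, top row highest).
    """
    degree = col % len(MAJOR_STEPS)
    octave_shift = (rows - 1 - row) + col // len(MAJOR_STEPS)
    return tonic_midi + 12 * octave_shift + MAJOR_STEPS[degree]

print(cell_to_midi(4, 0))  # bottom-left cell -> 60 (middle C)
print(cell_to_midi(0, 9))  # top-right cell -> an in-scale note several octaves up
```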

Research Ready

Logs every event (pitch, dwell, vibrato) to CSV. Compare "Legacy Mouse" vs. "Mouth Profile" control modes for HCI studies.
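A hedged sketch of per-note CSV logging in this spirit; the column names are illustrative and may not match GazeComposer's actual log format.

```python
import csv, time

FIELDNAMES = ["timestamp", "mode", "midi_pitch", "dwell_ms", "vibrato_depth"]

class EventLogger:
    """Append one row per note event so takes can be compared across control modes later."""

    def __init__(self, path: str, mode: str):
        self.mode = mode
        self.file = open(path, "a", newline="")
        self.writer = csv.DictWriter(self.file, fieldnames=FIELDNAMES)
        if self.file.tell() == 0:          # write the header only for a fresh file
            self.writer.writeheader()

    def log_note(self, midi_pitch: int, dwell_ms: float, vibrato_depth: float) -> None:
        self.writer.writerow({
            "timestamp": time.time(),
            "mode": self.mode,             # e.g. "mouth_profile" vs "legacy_mouse"
            "midi_pitch": midi_pitch,
            "dwell_ms": dwell_ms,
            "vibrato_depth": vibrato_depth,
        })
        self.file.flush()

logger = EventLogger("session_log.csv", mode="mouth_profile")
logger.log_note(midi_pitch=67, dwell_ms=172.0, vibrato_depth=0.0)
```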

Score Gen

Don't just play—compose. The system quantizes your dwell times and renders a professional MusicXML score and MIDI file instantly.
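GazeComposer's own renderer isn't reproduced here, but as a sketch, assuming the music21 library, quantizing dwell times to a sixteenth-note grid and writing MusicXML plus MIDI could look like this:

```python
from music21 import note, stream  # assuming music21 for the notation export

def quantize_quarter_length(dwell_ms: float, ms_per_quarter: float = 500.0, grid: float = 0.25) -> float:
    """Snap a dwell time to the nearest sixteenth of a quarter note (minimum one grid step)."""
    quarters = dwell_ms / ms_per_quarter
    return max(grid, round(quarters / grid) * grid)

def render_score(events, xml_path="take.musicxml", midi_path="take.mid"):
    """events: list of (midi_pitch, dwell_ms) tuples from a logged take."""
    part = stream.Stream()
    for midi_pitch, dwell_ms in events:
        n = note.Note()
        n.pitch.midi = midi_pitch
        n.quarterLength = quantize_quarter_length(dwell_ms)
        part.append(n)
    part.write("musicxml", fp=xml_path)
    part.write("midi", fp=midi_path)

render_score([(60, 180), (64, 510), (67, 240)])
```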

System Architecture

GazeComposer integrates MediaPipe tracking with a custom Python audio engine and a Node.js session server that supports remote Conductor mode sessions.

  • Input: Webcam Feed + Networked Conductor (Socket.io)
  • Tracking: Face Mesh (Iris/Lips) + Hand Landmarks
  • Calibration: Polynomial regression → GridKey (Tuned Pitch Grid); a sketch follows the diagrams below
  • Profiles:
    • Profile A: Mouth Openness → Dynamics
    • Profile B: Mouth Openness → Trill (On/Rate)
    • Legacy: Mouse Position → Dynamics (Comparison)
  • Conductor Mode: Remote cueing surface for laptop ensembles (entries, cutoffs, holds, key changes, density hints).
  • Output: Real-time Audio, MIDI, CSV Logs
  • Post-Process: Score Renderer (Quantization → MusicXML)
System diagrams: Part 1 and Part 2.
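As a rough illustration of the calibration step above, assuming scikit-learn and a feature vector of normalized iris coordinates (the actual features and polynomial degree may differ), a polynomial map from gaze to screen position, snapped to a GridKey cell:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

GRID_ROWS, GRID_COLS = 5, 10

def fit_calibration(iris_xy: np.ndarray, screen_xy: np.ndarray, degree: int = 2):
    """Fit a polynomial map from iris coordinates to normalized screen coordinates (0-1)."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(iris_xy, screen_xy)
    return model

def gaze_to_cell(model, iris_xy_now: np.ndarray):
    """Predict a screen point and snap it to the GridKey cell under the gaze."""
    x, y = np.clip(model.predict(iris_xy_now.reshape(1, -1))[0], 0.0, 0.999)
    return int(y * GRID_ROWS), int(x * GRID_COLS)   # (row, col)

# Toy calibration: nine fixation targets with matching (made-up) iris readings.
targets = np.array([[x, y] for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)])
iris = targets * 0.2 + 0.4 + np.random.default_rng(0).normal(0, 0.005, targets.shape)
model = fit_calibration(iris, targets)
print(gaze_to_cell(model, iris[4]))  # center target -> roughly the middle cell
```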

Evolution Timeline

v5 GridKey + Chords

Texture and Sparkle

Introduces GridKey for scale-locked pitch accuracy. Hand gestures now spawn chords for harmonic support, while a mouth-driven trill profile adds ornamentation. Includes post-performance Score Export.

v4 Unison & Cells

Musical Cells

Development of the "Tuned Grid" concept. Supports mid-piece Key Changes triggered by the conductor, allowing multiple performers to play in unison across the network.

v3 Remote Conductor

Boston–Seoul Connection

Proof-of-concept for remote laptop ensembles. Conductor mode turns a single webcam into a control surface that broadcasts phrase-level cues (entries, cutoffs, holds, key changes) via a session server to players in different cities. Successfully tested in sessions between Boston and Seoul over typical home-internet latency.
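The session server's protocol isn't documented on this page, so purely as a sketch, assuming the python-socketio client and an invented "cue" event name and payload, broadcasting a phrase-level cue from the conductor's machine might look like:

```python
import time
import socketio  # python-socketio client; the Node.js session server speaks Socket.io

sio = socketio.Client()

def send_cue(cue_type: str, targets, **params) -> None:
    """Emit a phrase-level cue (entry, cutoff, hold, key change, density hint) to selected players."""
    sio.emit("cue", {                      # "cue" is an assumed event name
        "type": cue_type,                  # e.g. "entry", "cutoff", "hold", "key_change"
        "targets": targets,                # player ids, or ["all"]
        "sent_at": time.time(),            # lets clients show latency-aware countdowns
        **params,
    })

if __name__ == "__main__":
    sio.connect("http://localhost:3000")   # assumed session-server address
    send_cue("entry", ["player-seoul-1"], measure=12)
    send_cue("key_change", ["all"], new_key="G major")
    sio.disconnect()
```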

Works Made with These Modes


Charlie Chaplin — The Cure: Two Gaze Melodies → One New Piece

While watching The Cure, I captured two distinct melodic traces: one formed by following the protagonist, and another formed by shifting attention to surrounding characters. Layered together, these gaze paths become a new composition—born from focus, distraction, and everything in between.

Research & Process

GazeComposer: A Pilot Study of Dwell-Based Vibrato Mapping

A single-performer pilot that uses GazeComposer’s GridKey mode to probe how dwell time can safely trigger vibrato without breaking timing.

This Phase 1 pilot treats GazeComposer’s GridKey mode as a testbed for dwell-based vibrato mapping. As a single designer-performer, I recorded a structured task under several vibrato presets, logging every note event (dwell time, vibrato depth, and per-note flags) to CSV. Across more than 800 notes, mean dwell time stayed tightly clustered around 160–175 ms for all presets, confirming that the mapping preserves a consistent sense of tempo. However, vibrato depth remained extremely small, and more than 99% of notes had zero vibrato, even under more “aggressive” settings. This suggests that the main design problem is the shape of the dwell-to-vibrato curve and its thresholds, rather than the maximum depth itself. The paper contributes a reusable logging and analysis pipeline for tuning vibrato presets in a perceptually informed way before moving to multi-participant studies.
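To make the curve-and-thresholds point concrete, here is an illustrative dwell-to-vibrato mapping; the onset threshold, exponent, and maximum depth are placeholder values, not the presets evaluated in the pilot.

```python
def vibrato_depth(dwell_ms: float, onset_ms: float = 200.0, full_ms: float = 800.0,
                  max_depth: float = 0.5, shape: float = 1.5) -> float:
    """Dwell below the onset threshold gives no vibrato; longer dwells ease toward max_depth."""
    if dwell_ms <= onset_ms:
        return 0.0
    t = min(1.0, (dwell_ms - onset_ms) / (full_ms - onset_ms))
    return max_depth * (t ** shape)

# With mean dwells around 160-175 ms, nearly every note falls below a 200 ms onset,
# which is one way a preset can leave >99% of notes with zero vibrato.
for dwell in (170, 300, 600, 900):
    print(dwell, round(vibrato_depth(dwell), 3))
```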

Read Pilot Study

GazeComposer Conductor Mode

A system paper on using gaze, mouth, and hand gestures as a remote conducting surface for laptop ensembles spread across cities.

This paper presents conductor mode, a multimodal webcam-based interface for distributed orchestral performance built on top of GazeComposer. A single conductor uses gaze, mouth dynamics, and hand gestures as a control surface, while performers in different locations join the same session with only a laptop, webcam, headphones, and the client software. A lightweight client–server architecture maintains a shared timeline and relays visual cues—entries, cutoffs, holds, key changes, and texture instructions—from the conductor to selected players, who still shape their own lines locally with gaze and facial mappings. Case studies with musicians in Seoul and Boston explore how conductor cues can steer density, register, and unison passages despite typical home-internet latency, focusing on phrase-level timing rather than beat-perfect synchronization. The paper argues that this approach offers a more musical alternative to click tracks or fixed backing tracks, and outlines design considerations for future large-ensemble, cross-continental performances.

Read System Paper

Meet the Creator

Doohyun Jung

Classical Singer & Music Technologist

I am a classical singer who has moved into music technology. With a B.M. in Voice from Seoul National University and an M.M. from Boston University, my work sits between performance, instrument design, and questions of access.

GazeComposer began with a simple question: "How can someone with only a laptop and a camera still shape phrases, not just trigger sounds?" Seeing the gap between the desire to make music and the barriers of cost and space, I developed this system to turn embodied gestures into continuous, expressive performance.

Get in Touch

Interested in using GazeComposer for a performance or research study?

gazecomposer@gmail.com