SoundSeg

Featured

An AI-powered sound segmentation tool that separates songs from YouTube into downloadable components (vocals, drums, etc.)

React
TypeScript
Material UI
Firebase
Python

What problem does this solve?

Musicians, producers, and audio enthusiasts often need isolated instrument tracks from existing songs—whether to learn a guitar riff, remix a track, practice drumming along to a song, or create karaoke versions. Traditionally, this required access to the original multi-track recordings, which are almost never publicly available. Professional-grade source separation software exists but is expensive and complex.

SoundSeg uses Convolutional Neural Networks and spectral signal processing to automatically separate any song into its component parts: drums, bass, vocals, and other instruments. Users simply paste a YouTube URL or upload an audio file, and within about a minute, they receive downloadable isolated tracks.

Who is it for?

  • Musicians who want to learn specific parts of a song by ear
  • Producers & DJs creating remixes, mashups, or samples
  • Content creators who need instrumental or a cappella versions
  • Music teachers isolating parts for students
  • Anyone wanting a karaoke-style experience

Why did I build it?

This was a collaborative project with a friend who developed the ML model and serverless backend and co-authored a research paper on spectral normalization techniques. My role was to design and build the entire frontend, turning a research tool into a polished, accessible web application. I built the React UI with Material UI, implemented Google authentication, crafted a custom dark theme with subtle animations, and focused on making advanced audio processing feel effortless for non-technical users.

Stack

  • React 18 + TypeScript with Vite
  • Material UI v5 with a heavily customized theme
  • Firebase for Google auth and storage
  • React Hook Form for form handling
  • React Context for auth and modal state

Architecture Decisions

Custom Theme System

  • Extended MUI's palette with a custom accent color via TypeScript module augmentation
  • Dark glassmorphism aesthetic—semi-transparent cards, backdrop blur, layered gradient glows
  • Centralized component overrides keep styling consistent across the app
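
The palette extension and centralized overrides described above might look roughly like the sketch below. It assumes `@mui/material` v5 is installed and follows MUI's documented module-augmentation pattern. The `accent` key name, the color values, and the blur settings are placeholders chosen for illustration, not the app's real theme.

```typescript
import { createTheme } from "@mui/material/styles";

// Module augmentation: teach MUI's types about an extra palette entry.
declare module "@mui/material/styles" {
  interface Palette {
    accent: Palette["primary"];
  }
  interface PaletteOptions {
    accent?: PaletteOptions["primary"];
  }
}

// All values below are placeholders, not taken from the real project.
export const theme = createTheme({
  palette: {
    mode: "dark",
    accent: { main: "#7c4dff", contrastText: "#ffffff" },
  },
  components: {
    // Centralized override: every Card gets the glass look in one place.
    MuiCard: {
      styleOverrides: {
        root: {
          backgroundColor: "rgba(255, 255, 255, 0.06)",
          backdropFilter: "blur(12px)",
          border: "1px solid rgba(255, 255, 255, 0.12)",
        },
      },
    },
  },
});
```

With the augmentation in place, `theme.palette.accent.main` type-checks inside `styled()` callbacks instead of requiring a cast.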

Styled Components Pattern

  • All styled components defined at the top of each file, keeping JSX clean
  • Used MUI's styled() API with full theme access

Context Over Redux

  • Auth and modals managed via lightweight React Context—right-sized for this app's scope

Interesting Challenges

Deferred Authentication

  • Users can fill the form before signing in to reduce friction
  • On submit, unauthenticated users see a login modal
  • Form data is captured in a closure—after sign-in, the submission continues automatically
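
The deferred flow above can be sketched without React or Firebase. This is a minimal illustration, not the app's actual code: `createAuthGate`, `handleSubmit`, `SeparationRequest`, and the counters standing in for the login modal and the backend call are all hypothetical names.

```typescript
// Hypothetical shape; the app's real form fields are not shown in the source.
type SeparationRequest = { youtubeUrl: string };

function createAuthGate(showLoginModal: () => void) {
  let signedIn = false;
  let pending: (() => void) | null = null;

  return {
    // Run the action now if signed in; otherwise park it and prompt for login.
    run(action: () => void): void {
      if (signedIn) {
        action();
        return;
      }
      pending = action;
      showLoginModal();
    },
    // Call this from the auth provider's sign-in success handler.
    onSignedIn(): void {
      signedIn = true;
      const action = pending;
      pending = null;
      if (action) action();
    },
  };
}

const sent: SeparationRequest[] = [];
let loginPrompts = 0;
const gate = createAuthGate(() => {
  loginPrompts += 1; // stand-in for opening the login modal
});

function handleSubmit(formData: SeparationRequest): void {
  // The callback closes over formData, so nothing has to be re-entered later.
  gate.run(() => {
    sent.push(formData); // stand-in for the real backend call
  });
}

handleSubmit({ youtubeUrl: "https://example.com/watch?v=demo" });
// Not signed in yet: the request is parked and the login prompt was shown once.
gate.onSignedIn();
// The parked submission has now run with the original form data.
```

Only the most recent pending action is kept in this sketch, which matches a single submit button. A form with several gated actions would need a list instead.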

Queueable Modal System

  • Custom modal context supporting a queue with prepend/append positioning
  • Needed for potential sequential modals (error → login → success)
  • Each modal has a unique ID for targeted closing
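
One way to model such a queue is as a plain array with pure update functions, which could then sit behind a React Context. The sketch below is an assumption about the shape, not the project's implementation: `QueuedModal`, `enqueue`, `closeModal`, and `currentModal` are illustrative names.

```typescript
type Position = "prepend" | "append";

interface QueuedModal {
  id: string; // unique, used for targeted closing
  kind: string; // e.g. "error", "login", "success"
}

let nextId = 0;

// Add a modal at the front or back of the queue and return its generated ID.
function enqueue(
  queue: readonly QueuedModal[],
  kind: string,
  position: Position = "append",
): { queue: QueuedModal[]; id: string } {
  const modal: QueuedModal = { id: `modal-${nextId++}`, kind };
  const next = position === "prepend" ? [modal, ...queue] : [...queue, modal];
  return { queue: next, id: modal.id };
}

// Remove one specific modal, wherever it sits in the queue.
function closeModal(queue: readonly QueuedModal[], id: string): QueuedModal[] {
  return queue.filter((modal) => modal.id !== id);
}

// Only the head of the queue is visible at any time.
function currentModal(queue: readonly QueuedModal[]): QueuedModal | undefined {
  return queue[0];
}

let queue: QueuedModal[] = [];
const login = enqueue(queue, "login");
queue = login.queue;
const success = enqueue(queue, "success");
queue = success.queue;
const error = enqueue(queue, "error", "prepend"); // jumps ahead of login
queue = error.queue;
// Order is now: error, login, success.
queue = closeModal(queue, login.id);
// Order is now: error, success.
```

Keeping the functions pure means the same logic works inside a `useReducer` or `useState` updater without changes.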

File Uploads

  • Audio converted to base64 client-side before sending to Cloud Functions
  • Client-side size validation (10 MB limit) to fail fast
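
Base64 encodes every 3 bytes as 4 characters, so the payload sent to the backend is roughly a third larger than the file itself. The sketch below shows a size check plus that arithmetic. It assumes the limit means 10 × 1024 × 1024 bytes, which the source does not specify, and `validateUpload` and `base64Length` are hypothetical helper names. In a browser the encoding itself is commonly done with `FileReader.readAsDataURL`; which API the project uses is not stated.

```typescript
// Assumption: "10 MB" is treated as binary megabytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// Length of the padded base64 string for a given number of raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

type UploadCheck =
  | { ok: true; encodedBytes: number }
  | { ok: false; reason: string };

// Reject bad files before any encoding or network work happens.
function validateUpload(sizeBytes: number): UploadCheck {
  if (sizeBytes <= 0) {
    return { ok: false, reason: "File is empty." };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    const sizeMb = (sizeBytes / (1024 * 1024)).toFixed(1);
    return { ok: false, reason: `File is ${sizeMb} MB; the limit is 10 MB.` };
  }
  return { ok: true, encodedBytes: base64Length(sizeBytes) };
}

// A file exactly at the limit passes and encodes to about 13.3 MB of text.
const atLimit = validateUpload(MAX_UPLOAD_BYTES);
const tooBig = validateUpload(MAX_UPLOAD_BYTES + 1);
```

In a browser, `sizeBytes` would come from `file.size` on the selected `File`, so the check runs before the file is read.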

Animations

  • CSS keyframe animations for floating logo, pulsing loader, glowing icons, staggered fade-in reveals

Audio Source Separation

  • Upload any audio file or paste a YouTube URL
  • AI separates the track into individual stems: drums, bass, vocals, and other instruments
  • Download each isolated track directly from the browser

Flexible Input Options

  • Supports direct audio file uploads (MP3, WAV, etc.)
  • Accepts YouTube and other video platform URLs, automatically extracting and processing the audio
  • Built-in demo mode to preview the output without uploading anything

Frictionless Authentication

  • No sign-in required to explore the interface
  • Google authentication only prompted when submitting a request
  • After sign-in, the original action continues automatically, with no re-entry needed

Polished Dark UI

  • Custom glassmorphism theme with layered gradients and subtle animations
  • Responsive design adapts to mobile and desktop
  • Real-time loading feedback with progress indicators

Instant Playback & Download

  • Processed stems are available for immediate playback in-browser
  • One-click download for each isolated track

🏆 Achievements

Backdrop Build v4 Winner

Built and shipped a product during a competitive 4-week builder cohort focused on rapid iteration and public feedback

Collaborating Across Disciplines

  • Worked closely with a friend focused on ML research—learned to bridge the gap between a research tool and a user-facing product
  • Gained appreciation for API contract design and clear communication when building on top of someone else's backend

Handling Long-Running Async Operations

  • Audio processing takes 30–60 seconds; learned to design around that with clear loading states and user feedback
  • Reinforced the importance of managing user expectations when you can't make something faster

Shipping a Complete Product

  • Small project, but end-to-end: design, build, deploy, and actually put it in front of users
  • Reinforced that "done" beats "perfect"