Homework #2: Generative Adversarial Networks and Variational Autoencoders
EE 641: Fall 2025
Assigned: 24 September
Due: Tuesday, 07 October at 23:59
Submission: Gradescope via GitHub repository
Requirements:
- PyTorch >= 2.0 must be installed
- Allowed libraries: PyTorch, NumPy, Pillow (PIL), matplotlib, librosa (for audio only), and Python standard library
- No other external libraries permitted (including no torchvision.models, pre-trained models, or GAN-specific libraries)
Overview
In this assignment you will implement and analyze two fundamental generative models: Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). The first problem explores GAN training dynamics and mode collapse through font generation. The second problem implements a hierarchical VAE for drum pattern generation with style control.
Getting Started
Download the starter code: hw2-starter.zip
Extract and generate the datasets:
unzip hw2-starter.zip
cd hw2-starter
python setup_data.py --seed 641

This creates a data/ directory with:
- fonts/: Synthetic font dataset (28×28 grayscale letter images)
- drums/: Drum pattern dataset (16×9 binary matrices)
Use seed 641 to ensure consistent results across submissions.
Problem 1: Font Generation GAN - Understanding Mode Collapse
Build a GAN that generates letter images and observe mode collapse firsthand. You will implement diagnostic tools and test different stabilization techniques.
Part A: Dataset and Data Loading
You will work with a font dataset containing grayscale images of letters A-Z in 10 different fonts.
The dataset is provided in the following structure:
- Images: 28×28 grayscale, normalized to [0, 1]
- Classes: 26 letters × 10 fonts = 260 unique letter-font combinations
- Training: 200 samples per letter (mixed fonts)
- Validation: 60 samples per letter
The starter code provides dataset.py with the data loader:
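The starter code defines the actual loader; as a rough sketch of the interface it likely exposes (the class name, constructor arguments, and [0, 1] float format here are assumptions), a minimal map-style dataset for the font images might look like:

```python
import torch
from torch.utils.data import Dataset

class FontDataset(Dataset):
    """Hypothetical sketch: 28x28 grayscale letters with labels 0-25."""

    def __init__(self, images, labels):
        # images: [N, 28, 28] floats in [0, 1]; labels: [N] letter ids
        self.images = torch.as_tensor(images, dtype=torch.float32)
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Add a channel dimension so each image is [1, 28, 28] for conv layers
        return self.images[idx].unsqueeze(0), self.labels[idx]
```

Wrapping this in a `DataLoader` with `batch_size=64, shuffle=True` then yields `[64, 1, 28, 28]` batches suitable for the GAN below.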
Part B: GAN Architecture
Implement models.py with Generator and Discriminator networks.
The starter code provides architecture skeletons that work well for 28×28 grayscale images:
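The exact skeletons live in the starter code; as a hedged sketch of the kind of DCGAN-style pair that works at this resolution (layer widths and `z_dim=100` are assumptions, not the official architecture):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a latent vector z to a 28x28 image in [0, 1]."""

    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (128, 7, 7)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 7 -> 14
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),    # 14 -> 28
            nn.Sigmoid(),  # match the [0, 1] data range
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a [1, 28, 28] image to a single real/fake logit."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1),    # 28 -> 14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 14 -> 7
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 1),  # raw logit; pair with BCEWithLogitsLoss
        )

    def forward(self, x):
        return self.net(x)
```

Returning a raw logit from the discriminator (rather than a sigmoid output) keeps the loss numerically stable with `binary_cross_entropy_with_logits`.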
Part C: Training Dynamics and Mode Collapse
Implement the training loop in training_dynamics.py to observe mode collapse:
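The core of that loop is the alternating discriminator/generator update; a minimal sketch of one vanilla step with the non-saturating generator loss (the function name and `z_dim` default are illustrative, and you would additionally log mode coverage per epoch):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, g_opt, d_opt, real, z_dim=100):
    """One vanilla GAN update: D on real+fake, then G (non-saturating loss)."""
    device, bs = real.device, real.size(0)
    ones = torch.ones(bs, 1, device=device)
    zeros = torch.zeros(bs, 1, device=device)

    # --- Discriminator step ---
    z = torch.randn(bs, z_dim, device=device)
    fake = G(z).detach()  # detach so D's step does not update G
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones) +
              F.binary_cross_entropy_with_logits(D(fake), zeros))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- Generator step: maximize log D(G(z)) ---
    z = torch.randn(bs, z_dim, device=device)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Recording both losses every step (and generated samples every few epochs) is what lets you pinpoint when the mode-collapse analysis in Part E says collapse begins.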
Part D: Implementing Fixes for Mode Collapse
The starter code provides three techniques to combat mode collapse. Choose ONE to implement in fixes.py:
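One widely used stabilization technique is feature matching (Salimans et al., 2016); whether it is among the three provided options is an assumption, but a sketch of its loss shows the general shape of such a fix:

```python
import torch

def feature_matching_loss(d_features_real, d_features_fake):
    """Feature matching: instead of fooling D directly, G minimizes the
    distance between the batch-mean intermediate D activations on real
    vs. generated images. Matching distribution statistics rather than
    per-sample labels discourages G from concentrating on a few modes.
    """
    return torch.mean(
        (d_features_real.mean(dim=0) - d_features_fake.mean(dim=0)) ** 2
    )
```

To use it, the discriminator must expose an intermediate feature tensor (e.g. the flattened activations before its final linear layer), and the generator is trained on this loss instead of the standard adversarial one.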
Part E: Analysis and Experiments
Complete evaluate.py to analyze your trained models:
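A standard way to quantify mode collapse here is to classify generated samples with a small letter classifier trained on the real font data (within the allowed libraries) and count which classes still appear; the helper below is a hypothetical sketch of that metric, not the starter code's exact API:

```python
import torch

def mode_coverage(letter_preds, num_classes=26):
    """Fraction of letter classes appearing at least once in a sample batch.

    letter_preds: [N] tensor of predicted class ids (0-25) for generated
    images, produced by a classifier trained on the real dataset.
    Returns (coverage fraction, per-class counts) - the counts feed the
    mode coverage histogram deliverable.
    """
    counts = torch.bincount(letter_preds, minlength=num_classes)
    covered = (counts > 0).sum().item()
    return covered / num_classes, counts
```

Tracking this coverage over training epochs gives the quantitative before/after comparison the report asks for.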
Deliverables
Your problem1/ directory must contain:
- All code files as specified above
- results/training_log.json with loss curves and mode coverage metrics
- results/best_generator.pth - saved model weights
- results/mode_collapse_analysis.png - visualization of mode collapse
- results/visualizations/ containing:
  - Generated letter grids at epochs 10, 30, 50, 100
  - Mode coverage histogram (which letters survive)
  - Interpolation sequences
  - Comparison of vanilla vs. fixed GAN
Your report must include analysis of:
- Why certain letters (like O, A) survive mode collapse while others (Q, X, Z) disappear
- Quantitative comparison of mode coverage with and without your chosen fix
- Discussion of training dynamics: when does collapse begin?
- Evaluation of your chosen stabilization technique’s effectiveness
Problem 2: Hierarchical VAE for Music Generation
Build a Variational Autoencoder that learns to generate drum patterns with controllable style. You will implement hierarchical latent variables, handle discrete outputs, and prevent posterior collapse.
Part A: Dataset and Representation
You will work with a dataset of drum patterns represented as binary matrices.
The dataset contains:
- Format: 16×9 binary matrices (16 timesteps, 9 drum instruments)
- Instruments: Kick, Snare, Closed Hi-hat, Open Hi-hat, Tom1, Tom2, Crash, Ride, Clap
- Styles: Rock, Jazz, Hip-hop, Electronic, Latin (200 patterns each)
- Total: 1000 unique drum patterns from MIDI files
The starter code provides dataset.py with the data loader:
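The actual loader is in the starter code; a minimal sketch of the interface implied by the training loop's `(patterns, styles, _)` unpacking (the class name and third return element are assumptions):

```python
import torch
from torch.utils.data import Dataset

class DrumPatternDataset(Dataset):
    """Hypothetical sketch: 16x9 binary patterns with style labels 0-4."""

    def __init__(self, patterns, styles):
        # patterns: [N, 16, 9] binary; styles: [N] style ids
        self.patterns = torch.as_tensor(patterns, dtype=torch.float32)
        self.styles = torch.as_tensor(styles, dtype=torch.long)

    def __len__(self):
        return len(self.patterns)

    def __getitem__(self, idx):
        # Third element mirrors the (patterns, styles, _) unpacking in train.py
        return self.patterns[idx], self.styles[idx], idx
```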
Part B: Hierarchical VAE Architecture
A hierarchical VAE uses multiple levels of latent variables to capture different aspects of the data. The provided architecture design separates high-level style from low-level variations:
Implement the VAE architecture in hierarchical_vae.py:
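The inference path suggested by the ELBO in train.py is x → q(z_low|x) → q(z_high|z_low), with a learned conditional prior p(z_low|z_high) on the decoding side. A hedged structural sketch (latent sizes and layer widths are assumptions; the provided design may differ):

```python
import torch
import torch.nn as nn

class HierarchicalVAE(nn.Module):
    """Sketch: z_high captures style, z_low captures per-pattern variation."""

    def __init__(self, z_low=16, z_high=4):
        super().__init__()
        self.enc_low = nn.Sequential(nn.Flatten(), nn.Linear(16 * 9, 128),
                                     nn.ReLU(), nn.Linear(128, 2 * z_low))
        self.enc_high = nn.Linear(z_low, 2 * z_high)
        self.prior_low = nn.Linear(z_high, 2 * z_low)  # p(z_low | z_high) for the KL term
        self.dec = nn.Sequential(nn.Linear(z_low + z_high, 128), nn.ReLU(),
                                 nn.Linear(128, 16 * 9))

    @staticmethod
    def reparameterize(mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x):
        mu_low, logvar_low = self.enc_low(x).chunk(2, dim=-1)
        z_low = self.reparameterize(mu_low, logvar_low)
        mu_high, logvar_high = self.enc_high(z_low).chunk(2, dim=-1)
        z_high = self.reparameterize(mu_high, logvar_high)
        logits = self.dec(torch.cat([z_low, z_high], dim=-1)).view(-1, 16, 9)
        return logits, mu_low, logvar_low, mu_high, logvar_high
```

Note the decoder emits logits, which is what `binary_cross_entropy_with_logits` in `compute_hierarchical_elbo` expects.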
Part C: Training Techniques for Discrete Data
Discrete outputs and posterior collapse are major challenges for VAEs. The starter code provides proven techniques:
Use the provided utilities in training_utils.py:
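The training loop below calls `kl_annealing_schedule` and `temperature_annealing_schedule` from these utilities; their exact definitions are in training_utils.py, but plausible versions (cycle length, decay rate, and bounds are assumptions) look like:

```python
def kl_annealing_schedule(epoch, method='cyclical', cycle_len=20, max_beta=1.0):
    """Anneal the KL weight beta to fight posterior collapse.

    'cyclical' ramps beta from 0 to max_beta over the first half of each
    cycle, then holds, then resets - repeatedly giving the decoder a
    low-KL phase in which using the latents is cheap.
    """
    if method == 'linear':
        return min(max_beta, epoch / cycle_len * max_beta)
    t = (epoch % cycle_len) / cycle_len  # position within the current cycle
    return max_beta * min(1.0, 2.0 * t)

def temperature_annealing_schedule(epoch, t_start=2.0, t_end=0.5, decay=0.95):
    """Exponentially decay the sampling temperature toward t_end,
    sharpening the (relaxed) discrete outputs as training progresses."""
    return max(t_end, t_start * decay ** epoch)
```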
Part D: Training Implementation
The starter code provides train.py with the training loop structure:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import json

def compute_hierarchical_elbo(recon_x, x, mu_low, logvar_low, mu_high, logvar_high, beta=1.0):
    """
    Compute ELBO for hierarchical VAE.

    ELBO = E[log p(x|z_low)] - beta * KL(q(z_low|x) || p(z_low|z_high))
                             - beta * KL(q(z_high|z_low) || p(z_high))

    Args:
        recon_x: Reconstructed pattern logits [batch, 16, 9]
        x: Original patterns [batch, 16, 9]
        mu_low, logvar_low: Low-level latent parameters
        mu_high, logvar_high: High-level latent parameters
        beta: KL weight

    Returns:
        loss: Total loss
        recon_loss: Reconstruction component
        kl_low: KL for low-level latent
        kl_high: KL for high-level latent
    """
    # Reconstruction loss (binary cross-entropy on logits)
    recon_loss = F.binary_cross_entropy_with_logits(
        recon_x.view(-1), x.view(-1), reduction='sum'
    )

    # KL(q(z_high) || p(z_high)) where p(z_high) = N(0, I)
    kl_high = -0.5 * torch.sum(1 + logvar_high - mu_high.pow(2) - logvar_high.exp())

    # TODO: KL(q(z_low) || p(z_low|z_high))
    # This is more complex - you may simplify to the standard KL against N(0, I) for now
    kl_low = -0.5 * torch.sum(1 + logvar_low - mu_low.pow(2) - logvar_low.exp())

    return recon_loss + beta * (kl_low + kl_high), recon_loss, kl_low, kl_high

def train_epoch(model, data_loader, optimizer, epoch, device):
    """
    Train for one epoch with annealing schedules.
    """
    model.train()
    total_loss = 0

    # Get annealing parameters for this epoch
    beta = kl_annealing_schedule(epoch, method='cyclical')
    temperature = temperature_annealing_schedule(epoch)

    for batch_idx, (patterns, styles, _) in enumerate(data_loader):
        patterns = patterns.to(device)

        # TODO: Forward pass
        # TODO: Compute loss with current beta (assign it to `loss`)
        # TODO: Backward and optimize

        # Log progress
        if batch_idx % 10 == 0:
            print(f'Epoch {epoch}, Batch {batch_idx}: Loss = {loss.item():.4f}, '
                  f'Beta = {beta:.3f}, Temp = {temperature:.2f}')

    return total_loss / len(data_loader)

def main():
    # Configuration
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    batch_size = 32
    num_epochs = 100
    learning_rate = 0.001

    # TODO: Initialize dataset, model, optimizer
    # TODO: Training loop with logging
    # TODO: Save checkpoints and final model
    pass

if __name__ == '__main__':
    main()

Part E: Analysis and Music Generation
Complete analyze_latent.py to analyze the trained model and generate music:
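Two building blocks you will likely need there are latent interpolation and turning decoder logits back into binary patterns; a hedged sketch (function names are illustrative, and the actual starter utilities may differ):

```python
import torch

def interpolate_latents(z_a, z_b, steps=8):
    """Linearly interpolate between two latent codes z_a -> z_b.

    z_a, z_b: [1, D] tensors. Returns [steps, D], one latent per step;
    decoding each step yields a smooth sequence of drum patterns.
    """
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1 - alphas) * z_a + alphas * z_b

def sample_to_pattern(logits, threshold=0.5):
    """Convert decoder logits into a binary 16x9 drum pattern."""
    return (torch.sigmoid(logits) > threshold).float()
```

Interpolating in z_high while holding z_low fixed is the natural way to probe style transitions; interpolating z_low probes variation within a style.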
Part F: Creative Experiments
Create a notebook experiments.ipynb with the following analyses:
- Genre Blending: Interpolate between jazz and rock patterns
- Complexity Control: Find latent dimensions that control pattern density
- Humanization: Add controlled variations to mechanical patterns
- Style Consistency: Generate full drum tracks with consistent style
Deliverables
Your problem2/ directory must contain:
- All code files as specified above
- results/training_log.json with loss curves and KL values
- results/best_model.pth - saved model weights
- results/generated_patterns/ containing:
  - 10 samples from each style
  - Interpolation sequences
  - Style transfer examples
- results/latent_analysis/ containing:
  - t-SNE visualization of latent space
  - Disentanglement analysis
  - Dimension interpretation results
- results/audio_samples/ with generated drum loops (optional but encouraged)
Your report must include analysis of:
- Evidence of posterior collapse and how annealing prevented it
- Interpretation of what each latent dimension learned to control
- Quality assessment: Do generated patterns sound musical?
- Comparison of different annealing strategies
- Success rate of style transfer while preserving rhythm
Submission Requirements
Your GitHub repository must follow this exact structure:
ee641-hw2-[username]/
├── problem1/
│ ├── dataset.py
│ ├── models.py
│ ├── training_dynamics.py
│ ├── fixes.py
│ ├── train.py
│ ├── evaluate.py
│ └── results/
│ ├── training_log.json
│ ├── best_generator.pth
│ ├── mode_collapse_analysis.png
│ └── visualizations/
├── problem2/
│ ├── dataset.py
│ ├── hierarchical_vae.py
│ ├── training_utils.py
│ ├── train.py
│ ├── analyze_latent.py
│ └── results/
│ ├── training_log.json
│ ├── best_model.pth
│ ├── generated_patterns/
│ └── latent_analysis/
├── report.pdf
└── README.md
The README.md in your repository root must contain:
- Your full name
- USC email address
- Instructions to run each problem if they differ from the standard commands
- Any implementation notes
Before submitting:
- Your repository structure must match the requirement exactly
- python train.py must run without errors in each problem directory
- All output files must be generated in the correct locations