hw02-q1

Problem 1: Font Generation GAN - Understanding Mode Collapse

Build a GAN that generates letter images and observe mode collapse firsthand. You will implement diagnostic tools and test different stabilization techniques.

Part A: Dataset and Data Loading

You will work with a font dataset containing grayscale images of letters A-Z in 10 different fonts.

The dataset is provided in the following structure:
- Images: 28×28 grayscale, normalized to [0, 1]
- Classes: 26 letters × 10 fonts = 260 unique letter-font combinations
- Training: 200 samples per letter (mixed fonts)
- Validation: 60 samples per letter
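The figures above imply the overall dataset sizes. A small sketch of the arithmetic follows; the constant names are illustrative, and the per-combination figure assumes fonts are spread evenly within each letter, which the description ("mixed fonts") does not guarantee.

```python
# Sizes implied by the dataset description. Names are illustrative only.
NUM_LETTERS = 26
NUM_FONTS = 10
TRAIN_PER_LETTER = 200
VAL_PER_LETTER = 60
IMAGE_SIDE = 28

num_combinations = NUM_LETTERS * NUM_FONTS   # letter-font pairs
num_train = NUM_LETTERS * TRAIN_PER_LETTER   # total training images
num_val = NUM_LETTERS * VAL_PER_LETTER       # total validation images
pixels_per_image = IMAGE_SIDE * IMAGE_SIDE   # flattened image length

# ASSUMPTION: fonts are evenly represented within each letter.
train_per_combination = TRAIN_PER_LETTER // NUM_FONTS

print(num_combinations, num_train, num_val, pixels_per_image, train_per_combination)
# 260 5200 1560 784 20
```

With roughly 20 training images per letter-font combination under that assumption, rare-looking glyphs have little support, which is worth keeping in mind when interpreting which modes disappear.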

The starter code provides dataset.py, which contains the data loader.

Part B: GAN Architecture

Implement models.py with Generator and Discriminator networks.

The starter code provides architecture skeletons that work well for 28×28 grayscale images.
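Independently of the provided skeletons, it can help to check layer shapes on paper before writing any model code. The sketch below assumes a common DCGAN-style layout (not confirmed by this handout): the generator upsamples 7 → 14 → 28 with transposed convolutions, and the discriminator downsamples 28 → 14 → 7 with strided convolutions, both using kernel 4, stride 2, padding 1. The helper names are hypothetical; the formulas are the standard output-size rules for dilation 1 and no output padding.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a standard convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution
    (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# ASSUMED generator path: start from a 7x7 feature map, upsample twice.
gen_sizes = [7]
for _ in range(2):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], kernel=4, stride=2, padding=1))

# ASSUMED discriminator path: start from the 28x28 image, downsample twice.
disc_sizes = [28]
for _ in range(2):
    disc_sizes.append(conv_out(disc_sizes[-1], kernel=4, stride=2, padding=1))

print(gen_sizes)   # [7, 14, 28]
print(disc_sizes)  # [28, 14, 7]
```

If the skeleton in models.py uses different kernel sizes or strides, the same two functions can be reused to confirm that the generator output is exactly 28×28.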

Part C: Training Dynamics and Mode Collapse

Implement the training loop in training_dynamics.py to observe mode collapse.
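Observing mode collapse requires some way to count which letters the generator still produces. One possible diagnostic, sketched below and not taken from the starter code, assumes each generated image has already been assigned a letter label by some classifier (that classifier is hypothetical and not shown). The function name `mode_coverage` and the `min_count` parameter are illustrative.

```python
from collections import Counter
import string


def mode_coverage(predicted_letters, min_count=1):
    """Summarize which of the 26 letters appear among generated samples.

    predicted_letters: iterable of uppercase letters, one per generated image.
    min_count: a letter counts as 'surviving' if it appears at least this often.
    Returns (coverage_fraction, histogram, surviving_letters).
    """
    counts = Counter(predicted_letters)
    # Include letters with zero samples so missing modes are explicit.
    histogram = {letter: counts.get(letter, 0) for letter in string.ascii_uppercase}
    surviving = [letter for letter, n in histogram.items() if n >= min_count]
    coverage = len(surviving) / len(string.ascii_uppercase)
    return coverage, histogram, surviving


# Synthetic labels for illustration only, not real results.
labels = ["O"] * 70 + ["A"] * 25 + ["C"] * 5
coverage, histogram, surviving = mode_coverage(labels)
print(surviving)       # ['A', 'C', 'O']
print(histogram["Q"])  # 0
```

Calling this every few epochs on a fixed batch of generated samples gives a coverage curve over training; raising `min_count` makes the measure less sensitive to occasional misclassifications.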

Part D: Implementing Fixes for Mode Collapse

The starter code provides three techniques to combat mode collapse. Choose ONE to implement in fixes.py.

Part E: Analysis and Experiments

Complete evaluate.py to analyze your trained models.

Deliverables

Your problem1/ directory must contain:

All code files as specified above
results/training_log.json with loss curves and mode coverage metrics
results/best_generator.pth - saved model weights
results/mode_collapse_analysis.png - visualization of mode collapse
results/visualizations/ containing:
- Generated letter grids at epochs 10, 30, 50, 100
- Mode coverage histogram (which letters survive)
- Interpolation sequences
- Comparison of vanilla vs fixed GAN
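
For the interpolation sequences, a common approach is to interpolate linearly between two latent vectors and pass each intermediate vector through the generator. The sketch below covers only the latent-space step and uses plain Python lists; the function name `interpolate_latents` is hypothetical, and in practice the vectors would be tensors fed to your Generator.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from
    z_start to z_end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    sequence = []
    for i in range(steps):
        alpha = i / (steps - 1)  # 0.0 at z_start, 1.0 at z_end
        sequence.append([(1 - alpha) * a + alpha * b for a, b in zip(z_start, z_end)])
    return sequence


# Toy 2-dimensional latent vectors for illustration.
path = interpolate_latents([0.0, 2.0], [1.0, 0.0], steps=5)
print(path[0])   # [0.0, 2.0]
print(path[2])   # [0.5, 1.0]
print(path[-1])  # [1.0, 0.0]
```

Smooth visual transitions along such a path suggest the generator has learned a continuous latent space; abrupt jumps or near-identical frames can be another sign of collapsed modes.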

Your report must include analysis of:

Why certain letters (like O, A) survive mode collapse while others (Q, X, Z) disappear
Quantitative comparison of mode coverage with and without your chosen fix
Discussion of training dynamics: when does collapse begin?
Evaluation of your chosen stabilization technique’s effectiveness
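
To make the questions about collapse onset and the vanilla-versus-fixed comparison concrete, one option is to log the number of surviving letters per epoch and look for the first sustained drop. The sketch below is only one possible definition of "when collapse begins"; the function name, the threshold of 13 letters, and the patience of 3 epochs are all assumptions, and the numbers are synthetic placeholders rather than real results.

```python
def collapse_onset(surviving_per_epoch, threshold=13, patience=3):
    """Return the first epoch (1-indexed) at which the number of surviving
    letters falls below `threshold` and stays below it for `patience`
    consecutive epochs. Returns None if that never happens."""
    run = 0
    for epoch, n_letters in enumerate(surviving_per_epoch, start=1):
        if n_letters < threshold:
            run += 1
            if run == patience:
                return epoch - patience + 1  # start of the sustained drop
        else:
            run = 0  # a recovery resets the streak
    return None


# Synthetic per-epoch counts of surviving letters (out of 26), illustration only.
vanilla = [24, 21, 12, 16, 10, 8, 5, 5]
fixed = [24, 23, 22, 21, 22, 21, 22, 21]

print(collapse_onset(vanilla))  # 5
print(collapse_onset(fixed))    # None
print(fixed[-1] - vanilla[-1])  # 16
```

Requiring several consecutive low epochs avoids flagging a single noisy dip (epoch 3 in the synthetic vanilla run) as the start of collapse.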