Problem 1: Font Generation GAN - Understanding Mode Collapse
Build a GAN that generates letter images and observe mode collapse firsthand. You will implement diagnostic tools and test different stabilization techniques.
Part A: Dataset and Data Loading
You will work with a font dataset containing grayscale images of letters A-Z in 10 different fonts.
The dataset is provided in the following structure:
- Images: 28×28 grayscale, normalized to [0, 1]
- Classes: 26 letters × 10 fonts = 260 unique letter-font combinations
- Training: 200 samples per letter (mixed fonts)
- Validation: 60 samples per letter
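The totals implied by these figures are worth working out before training, since they determine how many batches make up an epoch. The sketch below derives them from the numbers above. The per-combination count assumes fonts are mixed evenly within each letter, which the description does not state, and the helper name `dataset_sizes` is illustrative only.

```python
# Figures taken from the dataset description.
NUM_LETTERS = 26
NUM_FONTS = 10
TRAIN_PER_LETTER = 200
VAL_PER_LETTER = 60


def dataset_sizes():
    """Derive total sample counts from the per-letter figures."""
    return {
        "combos": NUM_LETTERS * NUM_FONTS,
        "train_total": NUM_LETTERS * TRAIN_PER_LETTER,
        "val_total": NUM_LETTERS * VAL_PER_LETTER,
        # ASSUMPTION: fonts are evenly mixed within each letter.
        "train_per_combo": TRAIN_PER_LETTER / NUM_FONTS,
    }


sizes = dataset_sizes()
print(sizes)
```

If the even-mix assumption holds, each letter-font combination has only about 20 training images, which is one reason rare modes may be easy for the generator to drop.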
The starter code provides dataset.py with the data loader:
Part B: GAN Architecture
Implement models.py with Generator and Discriminator networks.
The starter code provides architecture skeletons that work well for 28×28 grayscale images:
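When filling in the skeletons, it can help to verify layer shapes by hand before running anything. The sketch below applies the standard output-size formulas for strided and transposed convolutions. The layer settings (kernel 4, stride 2, padding 1, starting from a 7×7 feature map) are an assumed configuration chosen for illustration, not necessarily what the starter code uses.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0, dilation=1):
    """Spatial output size of a transposed convolution (upsampling)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def conv_out(size, kernel, stride, padding, dilation=1):
    """Spatial output size of a regular strided convolution (downsampling)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# ASSUMPTION: two layers with kernel=4, stride=2, padding=1 in each network.
gen_sizes = [7]
for _ in range(2):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], kernel=4, stride=2, padding=1))

disc_sizes = [28]
for _ in range(2):
    disc_sizes.append(conv_out(disc_sizes[-1], kernel=4, stride=2, padding=1))

print("generator feature map sizes:", gen_sizes)
print("discriminator feature map sizes:", disc_sizes)
```

Under these assumed settings the generator upsamples to 28×28 in two steps and the discriminator mirrors it back down, so the two networks have matching spatial resolutions at each stage.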
Part C: Training Dynamics and Mode Collapse
Implement the training loop in training_dynamics.py to observe mode collapse:
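To observe collapse as a number rather than only by eye, one option is to track which letters appear among generated samples at each checkpoint. The sketch below is a minimal version of such a metric. It assumes you already have a predicted letter for every generated image (for example from a separately trained letter classifier, which is hypothetical here), and the name `mode_coverage` and the `min_count` parameter are illustrative.

```python
from collections import Counter
import string

LETTERS = string.ascii_uppercase  # the 26 letter modes, A-Z


def mode_coverage(predicted_letters, min_count=1):
    """Fraction of letters that appear at least `min_count` times.

    `predicted_letters` is an iterable of single-character labels,
    one per generated image (ASSUMED to come from some classifier).
    """
    counts = Counter(predicted_letters)
    surviving = [c for c in LETTERS if counts[c] >= min_count]
    missing = [c for c in LETTERS if counts[c] < min_count]
    return {
        "coverage": len(surviving) / len(LETTERS),
        "surviving": surviving,
        "missing": missing,
    }


# Toy labels for illustration only, not real experiment output.
toy_predictions = ["O"] * 70 + ["A"] * 25 + ["E"] * 5
result = mode_coverage(toy_predictions)
print(result["coverage"], result["surviving"])
```

Raising `min_count` makes the metric stricter, so a letter that shows up only a handful of times in a large batch is not counted as a surviving mode.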
Part D: Implementing Fixes for Mode Collapse
The starter code provides three techniques to combat mode collapse. Choose ONE to implement in fixes.py:
Part E: Analysis and Experiments
Complete evaluate.py to analyze your trained models:
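The deliverables include interpolation sequences, which are usually produced by walking in a straight line between two latent vectors and decoding each intermediate point with the generator. The sketch below covers only the latent-space part, using plain Python lists in place of tensors. The function names and the choice of linear (rather than spherical) interpolation are assumptions made for illustration.

```python
def lerp(z_start, z_end, t):
    """Linear interpolation between two latent vectors at position t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(z_start, z_end)]


def interpolation_sequence(z_start, z_end, steps=8):
    """Return `steps` latent vectors evenly spaced from z_start to z_end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    return [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]


# Tiny 2-dimensional example; a real latent vector would be much longer.
seq = interpolation_sequence([0.0, 2.0], [1.0, 0.0], steps=5)
for z in seq:
    print(z)
```

Each vector in the sequence would then be passed through the trained generator. Abrupt jumps between neighbouring images, or long stretches that decode to the same letter, can both hint at collapsed regions of the latent space.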
Deliverables
Your problem1/ directory must contain:
- All code files as specified above
- results/training_log.json with loss curves and mode coverage metrics
- results/best_generator.pth - saved model weights
- results/mode_collapse_analysis.png - visualization of mode collapse
- results/visualizations/ containing:
  - Generated letter grids at epochs 10, 30, 50, 100
  - Mode coverage histogram (which letters survive)
  - Interpolation sequences
  - Comparison of vanilla vs fixed GAN
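The exact schema of training_log.json is not specified, so the following is only one possible layout: a list of per-epoch records holding both losses and the mode coverage value. The field names (`epoch`, `d_loss`, `g_loss`, `mode_coverage`) and helper functions are hypothetical, so check them against what the starter code or grader expects.

```python
import json
from pathlib import Path


def new_log():
    """Create an empty log. The structure is an ASSUMED schema, not a required one."""
    return {"epochs": []}


def append_epoch(log, epoch, d_loss, g_loss, mode_coverage):
    """Record one epoch's discriminator loss, generator loss, and mode coverage."""
    log["epochs"].append({
        "epoch": int(epoch),
        "d_loss": float(d_loss),
        "g_loss": float(g_loss),
        "mode_coverage": float(mode_coverage),
    })


def serialize_log(log):
    """Render the log as a JSON string."""
    return json.dumps(log, indent=2)


def save_log(log, path="results/training_log.json"):
    """Write the log to disk, creating the results/ directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(serialize_log(log))
```

Calling `append_epoch` once per epoch and `save_log` at each checkpoint means a partial log survives even if a run is interrupted.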
Your report must include analysis of:
- Why certain letters (like O, A) survive mode collapse while others (Q, X, Z) disappear
- Quantitative comparison of mode coverage with and without your chosen fix
- Discussion of training dynamics: when does collapse begin?
- Evaluation of your chosen stabilization technique’s effectiveness
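For the question of when collapse begins, one simple approach is to scan the logged coverage values for the first point where coverage drops below a threshold and stays there. The sketch below assumes coverage is logged as (epoch, coverage) pairs. The `threshold` and `patience` defaults are arbitrary illustrative choices, and the numbers in the example are made up to exercise the function, not experimental results.

```python
def collapse_onset(coverage_by_epoch, threshold=0.5, patience=3):
    """Return the first epoch of a run of `patience` consecutive logged
    points with coverage below `threshold`, or None if no such run exists.

    `coverage_by_epoch` is a list of (epoch, coverage) pairs in epoch order.
    """
    run_start = None
    run_len = 0
    for epoch, coverage in coverage_by_epoch:
        if coverage < threshold:
            if run_len == 0:
                run_start = epoch  # candidate onset
            run_len += 1
            if run_len >= patience:
                return run_start
        else:
            run_len = 0  # coverage recovered, discard the candidate
    return None


# Made-up curves for illustration only.
vanilla_curve = [(10, 0.9), (20, 0.6), (30, 0.4), (40, 0.3), (50, 0.2)]
fixed_curve = [(10, 0.9), (20, 0.85), (30, 0.8), (40, 0.8), (50, 0.75)]

print("vanilla onset:", collapse_onset(vanilla_curve))
print("fixed onset:", collapse_onset(fixed_curve))
```

Requiring several consecutive low readings avoids flagging a single noisy checkpoint as the onset. Reporting the onset epoch alongside final coverage for both the vanilla and the fixed run gives a compact quantitative comparison.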