Final Project Guidelines

EE 641: A Computational Introduction to Deep Learning

See project deliverables for submission requirements and deadlines.

Overview

The final project requires teams of two students to design and implement a deep learning system that addresses a substantial problem in machine learning. Your project must demonstrate mastery of problem specification, model selection, data analysis, experimental design, and rigorous evaluation. This is not a survey of existing techniques or a simple application of course material to a standard benchmark. You are expected to tackle a challenging problem that requires thoughtful design decisions, careful implementation, and thorough analysis.

Teams will propose their approach, implement and train their models, evaluate results comprehensively, and present both technical findings and broader insights.

Technical Requirements

Your project must demonstrate depth in the following areas:

Problem Specification
Define a clear problem with measurable objectives. This might be improving performance on a challenging task, exploring architectural variations, generating novel datasets, or investigating theoretical properties of models. The problem should require more than applying an existing architecture to a standard benchmark. Vague goals like “improve image classification” or “build a chatbot” are insufficient. Specific problems like “develop a diffusion model for high-resolution medical image synthesis with controllable anatomical features” or “investigate the impact of attention head pruning on transformer generalization across distribution shifts” demonstrate appropriate scope.
Model Selection and Architecture
Make deliberate choices about model architecture based on problem requirements. This might involve adapting existing architectures, combining multiple approaches, or designing novel components. Justify your architectural decisions with reference to the problem characteristics, computational constraints, and theoretical considerations. Simply using a pre-trained model or following a tutorial implementation does not meet this requirement.
Data Analysis and Preparation
Demonstrate understanding of your data through exploratory analysis. This includes characterizing distributions, identifying biases, analyzing failure modes, and understanding what makes the problem challenging. Some projects may involve generating synthetic datasets, curating specialized collections, or developing augmentation strategies. Data work should reflect deep engagement with the problem, not just downloading a standard dataset.
Training and Optimization
Implement training procedures that reflect understanding of optimization, regularization, and convergence behavior. This includes hyperparameter selection with justification, learning rate scheduling, monitoring convergence, and diagnosing training issues. Document your training process with loss curves, validation metrics, and computational costs. Show that you understand why training succeeds or fails, not just that it runs.
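As one illustration of a schedule you could select and justify, a linear-warmup plus cosine-decay learning-rate schedule (a common default for transformer-style training, not a course requirement) takes only a few lines of plain Python; the specific values here are illustrative, not recommendations:

```python
import math

def lr_at_step(step, total_steps, base_lr=3e-4, warmup_steps=500):
    """Linear warmup to base_lr, then cosine decay to zero.

    base_lr=3e-4 and warmup_steps=500 are placeholder defaults;
    justify your own choices from your problem and budget.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear ramp
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Logging this value alongside the loss at every step makes it easy to correlate convergence behavior with the schedule when you document your training process.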
Evaluation and Analysis
Evaluate your model rigorously using appropriate metrics and baselines. This goes beyond reporting a single number—analyze performance across different conditions, failure modes, computational requirements, and limitations. Compare against relevant baselines with statistical significance where appropriate. Understand what your model learned and what it failed to learn.
Critical Reflection
Demonstrate critical thinking about your approach and results. What worked and why? What failed and why? What assumptions did you make and how did they hold? What would you do differently? What questions remain unanswered? This reflection separates projects that execute a plan from projects that develop understanding.

Project Scope

Projects should demonstrate significant technical depth appropriate for students completing an advanced deep learning course. You have substantial background in architectures, optimization, and implementation. Your project should reflect this expertise.

What Your Project Is Not

Not a tutorial implementation
Following a blog post or paper implementation without modification or deep understanding is insufficient. If you could complete the core work in a weekend by following existing code, the scope is too small.
Not a simple benchmark comparison
Training three architectures on CIFAR-10 and comparing accuracy does not demonstrate the required depth. Benchmark comparisons may be part of a larger project but cannot be the entire contribution.
Not a survey or literature review
Implementing multiple existing techniques and reporting results is not a project. Your work should involve substantial technical contribution beyond reproducing known results.
Not an application without insight
Applying existing models to a new domain without adaptation, analysis, or understanding of why the approach works (or doesn’t) lacks the required depth.

What Your Project Is

A deep investigation of a specific problem
Your project should explore a problem thoroughly enough to develop genuine insight. This might mean understanding why certain architectural choices matter, how data characteristics affect performance, or what theoretical properties emerge from specific designs.
A demonstration of technical mastery
Your implementation, training, and evaluation should show that you understand deep learning systems at a fundamental level. You should be able to diagnose issues, make informed design decisions, and justify your choices with both theoretical and empirical reasoning.
An exercise in critical analysis
The strongest projects don’t just solve a problem—they develop understanding about why solutions work, where they fail, and what this reveals about the underlying phenomena.
A complete system with reproducible results
Your project should be implemented carefully enough that results are reliable and reproducible. This includes proper experimental design, controlling for randomness, and documenting all decisions that affect outcomes.
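Controlling randomness usually starts with seeding every generator the experiment touches. A minimal sketch using the standard library and NumPy (in a PyTorch project you would also seed torch, as noted in the comments):

```python
import os
import random

import numpy as np

def set_seed(seed: int) -> None:
    """Seed the random number generators a typical experiment uses.

    In a PyTorch project you would additionally call
    torch.manual_seed(seed) and torch.cuda.manual_seed_all(seed),
    and consider torch.use_deterministic_algorithms(True) when exact
    reproducibility matters more than speed.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
```

Record the seeds in your experiment logs: reproducible means someone else can rerun the exact configuration, seed included.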

Choosing a Project

Select a problem that allows you to demonstrate depth rather than breadth. It is better to thoroughly investigate one aspect of deep learning than to superficially touch many topics.

Several project types work well for this course:

Architectural Investigation
Explore how specific architectural choices affect model behavior. This might involve systematic ablations of transformer components, investigating different attention mechanisms, or analyzing how architectural constraints affect learned representations. The goal is understanding why architectures work, not just demonstrating that they do.
Generative Modeling
Develop generative models for challenging domains. This might involve diffusion models for specific data types, GANs with novel training procedures, or VAEs with specialized encoders. Strong projects in this area show careful analysis of generation quality, mode coverage, and failure modes.
Dataset Development and Analysis
Create synthetic datasets that test specific capabilities, curate specialized collections that fill gaps in existing benchmarks, or develop data augmentation strategies for challenging domains. Dataset projects require rigorous validation that the data serves its intended purpose.
Algorithmic Development
Investigate training procedures, optimization techniques, or learning algorithms. This might involve analyzing convergence behavior under different conditions, developing adaptive learning rate schedules, or exploring regularization strategies. These projects require strong theoretical grounding combined with empirical validation.
Applied Research
Tackle challenging problems in specific domains that require adapting or extending existing techniques. This might involve medical imaging, scientific computing, reinforcement learning for complex tasks, or multimodal learning. The application should drive technical innovation, not just demonstrate existing methods.
Theoretical Investigation with Empirical Validation
Explore theoretical properties of models through carefully designed experiments. This might involve studying generalization behavior, analyzing learned representations, or investigating optimization landscapes. These projects require precise experimental design to test specific hypotheses.

The key is identifying a problem where you can make meaningful progress while developing deep understanding. Your project should feel like research, even if it’s not publishable—it should investigate questions rather than simply apply known solutions.

Technical Depth Expectations

Your project should demonstrate depth appropriate for students completing an advanced deep learning course.

Model Implementation

You should implement significant components yourself rather than only using high-level APIs. This doesn’t mean writing everything from scratch—PyTorch provides essential building blocks. But you should understand and implement the core technical components of your approach.

If you use pre-trained models, your project should involve substantial modification, fine-tuning with careful analysis, or using them as components in a larger system you design. Simply loading a pre-trained model and running inference is insufficient.

Experimental Design

Design experiments that test specific hypotheses or answer specific questions. This requires controlling for confounding factors, using appropriate train/validation/test splits, running multiple trials where randomness matters, measuring what matters for your specific problem, and comparing against meaningful baselines. Thoughtful experimental design distinguishes rigorous work from casual exploration.

Computational Considerations

Demonstrate awareness of computational requirements and constraints. This includes estimating and reporting training costs (GPU-hours, memory requirements), making architectural choices appropriate for available compute, optimizing where it matters for your problem, and understanding trade-offs between model capacity and training time. You don’t need access to massive compute resources, but you should work within your constraints intelligently and document your choices.
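A back-of-the-envelope memory estimate is often enough to justify an architectural choice. The sketch below assumes fp32 training with Adam, which keeps roughly four floats per parameter (weights, gradients, and two moment buffers); activation memory is workload-dependent and deliberately excluded:

```python
def training_memory_gb(n_params: int, bytes_per_float: int = 4,
                       states_per_param: int = 4) -> float:
    """Rough lower bound on GPU memory for model plus optimizer state.

    states_per_param=4 models fp32 Adam: one copy each for weights
    and gradients plus two moment buffers. Activations, framework
    overhead, and batch-size effects are not included, so treat the
    result as a floor, not a budget.
    """
    return n_params * bytes_per_float * states_per_param / 1024**3
```

For example, a 125M-parameter model already needs roughly 2 GB before any activations, which may rule out some architectures on a single consumer GPU.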

Analysis Depth

Go beyond surface-level metrics. Analyze what your model learned (visualization of learned features, attention patterns, generated samples), where it fails (failure mode analysis with examples), why it behaves as it does (ablations, controlled experiments), how sensitive results are to design choices (hyperparameter studies with trends, not just final values), and what the results mean (interpretation with connection to theory and prior work).

Data and Datasets

Your work with data should demonstrate sophistication beyond downloading a standard benchmark.

If using existing datasets: Analyze them carefully. Understand their characteristics, biases, and limitations. Show that you’ve thought about train/test distribution differences, class imbalances, or annotation quality. Your data analysis should reveal something about the problem, not just report statistics.
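As a small example of looking past raw statistics, the sketch below summarizes class imbalance with the ratio between the most and least frequent classes; the labels are placeholders, and a real analysis would follow up on *why* the imbalance exists and how it interacts with your loss and metrics:

```python
from collections import Counter

def imbalance_report(labels):
    """Summarize class frequencies and the max/min imbalance ratio."""
    counts = Counter(labels)
    most = max(counts.values())
    least = min(counts.values())
    return {
        "counts": dict(counts),
        "imbalance_ratio": most / least,  # 1.0 means perfectly balanced
    }
```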

If generating synthetic data: Validate that your synthetic data serves its intended purpose. This might mean showing it tests specific capabilities, matches real data characteristics in important ways, or explores edge cases that natural data doesn’t cover. Synthetic data projects require justification for why generation is necessary and validation that it’s appropriate.

If curating specialized datasets: Document your collection and curation process. Explain what makes this dataset valuable, how it differs from existing resources, and how you ensure quality. Dataset curation projects should produce resources that enable research, not just gather examples.

For all projects: Explain why your data choices are appropriate for your problem. What aspects of the data make the problem challenging? How do data characteristics affect modeling choices? What would happen with different data?

Evaluation and Validation

Rigorous evaluation is not optional—it’s central to demonstrating that you understand what you’ve built.

Metrics: Choose metrics appropriate for your problem and justify your choices. Understand what metrics measure and what they miss. Report multiple metrics when different aspects of performance matter. Include uncertainty estimates (standard deviations across runs, confidence intervals) where randomness affects results.
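One concrete form of uncertainty estimate: the mean and an approximate 95% confidence interval over repeated runs with different seeds. The sketch below uses the normal approximation; with only three to five runs, a Student-t multiplier would be more honest, but the point is to report spread rather than a single best number:

```python
import math

def mean_with_ci(values, z=1.96):
    """Mean and approximate 95% CI half-width across repeated runs.

    z=1.96 assumes a roughly normal sampling distribution; for very
    few runs, substitute the appropriate Student-t critical value.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    half_width = z * math.sqrt(var / n)
    return mean, half_width

# report results as "mean ± half_width", not the single best run
```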

Baselines: Compare against meaningful baselines that contextualize your results. This might include standard architectures for your task, simpler versions of your approach (ablations), prior work on the same or similar problems, or random or heuristic baselines that establish lower bounds. Explain what each baseline tests and what comparisons reveal.
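When you do claim a significant improvement over a baseline, one simple option for small numbers of runs is a paired bootstrap over per-seed score differences. A hypothetical sketch (your own matched per-seed scores would replace the arguments):

```python
import random

def paired_bootstrap_pvalue(model_scores, baseline_scores,
                            n_resamples=10_000, seed=0):
    """One-sided paired bootstrap: estimates P(mean difference <= 0).

    model_scores and baseline_scores must be matched per seed/run.
    A small p-value suggests the improvement is unlikely to be an
    artifact of which seeds happened to be drawn.
    """
    rng = random.Random(seed)
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    hits = 0
    for _ in range(n_resamples):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(resample) / n <= 0:
            hits += 1
    return hits / n_resamples
```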

Failure Analysis: Understand where your approach fails. Analyze failure modes systematically. Show examples of failures and explain why they occur. Understanding failures often reveals more than analyzing successes.

Ablation Studies: For any non-trivial design, conduct ablations to understand which components matter. This might mean removing architectural components, simplifying training procedures, or varying data augmentation. Ablations should test specific hypotheses about what drives performance.
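Ablations stay systematic when each variant is an explicit single-change diff from a base configuration. A hypothetical sketch, where train_and_eval stands in for your own training entry point and the config keys are illustrative:

```python
BASE = {"attention": True, "augmentation": True, "dropout": 0.1}

ABLATIONS = {
    "no_attention": {"attention": False},
    "no_augmentation": {"augmentation": False},
    "no_dropout": {"dropout": 0.0},
}

def run_ablations(train_and_eval, base=BASE, ablations=ABLATIONS):
    """Run the base config plus each single-change variant.

    train_and_eval(config) -> scalar metric, supplied by your project.
    Changing one factor at a time keeps each comparison interpretable.
    """
    results = {"base": train_and_eval(dict(base))}
    for name, override in ablations.items():
        results[name] = train_and_eval({**base, **override})
    return results
```

Each variant then maps directly onto a hypothesis ("attention drives performance on long inputs") rather than an unstructured grid of runs.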

Generalization Analysis: Understand how your model generalizes beyond training data. This might involve out-of-distribution testing, cross-dataset evaluation, analyzing learned representations, or studying behavior on edge cases.

Research Projects and Novel Contributions

Some projects may pursue novel contributions or preliminary research. These projects have additional considerations:

Research projects must maintain the same technical rigor as applied projects. Novel ideas don’t excuse poor implementation or weak evaluation. In fact, they require more careful validation since there’s no prior work to follow.

Position your contribution clearly. What is new in your approach? What builds on existing work? What assumptions are you making? Research projects should acknowledge both novelty and limitations honestly.

Provide stronger baselines. If you claim improvement over existing approaches, demonstrate it rigorously. This means implementing baselines carefully, not just citing reported numbers from papers trained under different conditions.

Document failures and negative results. Research projects often encounter dead ends. Document what you tried and why it didn’t work—these insights demonstrate understanding even when results disappoint.

Evaluation Criteria

Projects will be evaluated on technical depth, experimental rigor, critical analysis, and quality of execution.

Technical Depth: Does your project demonstrate sophisticated understanding of deep learning? Have you made thoughtful design decisions and justified them? Is your implementation robust and carefully done?

Experimental Rigor: Are your experiments well-designed to test specific hypotheses? Have you controlled for confounding factors? Are your results reliable and reproducible?

Analysis and Insight: Do you understand what your results mean? Have you analyzed failures as well as successes? Do you demonstrate critical thinking about your approach and findings?

Scope and Execution: Is your project appropriately ambitious for two students over the project period? Have you completed what you set out to do? Is your work thorough rather than superficial?

Evaluation emphasizes depth of understanding over breadth of coverage. A focused project executed rigorously with thoughtful analysis will outperform an ambitious project that touches many topics superficially.

Academic Integrity

All code must be written by your team unless explicitly documented otherwise. You may use standard libraries (PyTorch, NumPy, etc.) and build on published architectures. You may reference papers, tutorials, and documentation. You may use AI assistants for understanding concepts or debugging.

You may not:
- Copy substantial code from other projects without attribution and understanding
- Use pre-trained models as your primary contribution without significant modification
- Submit work you don’t understand because AI or others generated it
- Misrepresent others’ work as your own

When using code from papers, repositories, or other sources, document it clearly and explain what you added or modified. Your contribution should be substantial and clearly identifiable.

The goal is developing deep understanding, not just producing results. If you can’t explain every component of your project in technical detail, you haven’t met this standard.