hw03-q2

Problem 2: Positional Encoding and Length Extrapolation

Implement three positional encoding strategies and test their ability to generalize to sequences longer than those seen during training.

Part A: Dataset and Data Loading

You will work with a sorting detection dataset where the model must determine if a sequence of integers is sorted in ascending order.

Generate the dataset:

cd problem2
python scripts/generate_data.py --seed 641

The dataset contains:

- Task: Binary classification (1 = sorted, 0 = unsorted)
- Training: Sequences of length 8-16 with integers 0-99
- Testing: Sequences of length 32, 64, 128, 256 (for extrapolation analysis)
- Training samples: 10,000 (50% sorted, 50% unsorted)
- Validation samples: 2,000

Example training samples:

{"sequence": [3, 15, 27, 41, 58, 72, 83, 94], "is_sorted": 1, "length": 8}
{"sequence": [15, 3, 72, 27, 41, 94, 58, 83], "is_sorted": 0, "length": 8}
{"sequence": [1, 5, 12, 18, 23, 29, 34, 40, 47, 53, 61, 68], "is_sorted": 1, "length": 12}

The starter code provides src/dataset.py with the data loader:
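
Because training sequences range from 8 to 16 tokens, batching them requires padding to a common length. The following is a minimal sketch of that step only, assuming records in the JSON format shown earlier. The name pad_batch and the choice of 0 as the padding value are hypothetical and may not match what src/dataset.py actually does.

```python
def pad_batch(samples, pad_value=0):
    """Pad variable-length sequences to the longest one in the batch.

    Returns (tokens, mask, labels). mask is 1 for real tokens and 0 for
    padding. The mask matters here because pad_value=0 is also a valid
    integer in the data (assumed padding scheme, not the starter code's).
    """
    max_len = max(len(s["sequence"]) for s in samples)
    tokens, mask, labels = [], [], []
    for s in samples:
        n_real = len(s["sequence"])
        n_pad = max_len - n_real
        tokens.append(s["sequence"] + [pad_value] * n_pad)
        mask.append([1] * n_real + [0] * n_pad)
        labels.append(s["is_sorted"])
    return tokens, mask, labels
```

With this layout, padded positions can be excluded from attention and pooling by consulting the mask rather than the token values.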

Part B: Positional Encoding Implementations

Implement src/positional_encoding.py with three encoding strategies:
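
Judging by the results directories listed under Deliverables, the three strategies are sinusoidal, learned, and none. As an illustration of the first, here is a dependency-free sketch of the standard sinusoidal table. The function name sinusoidal_encoding, the base of 10000, and the interleaved sin/cos layout are assumptions and may differ from the interface the starter code expects.

```python
import math


def sinusoidal_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # i is the even dimension index, so the exponent is i / d_model.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table


if __name__ == "__main__":
    short = sinusoidal_encoding(16, 8)
    long = sinusoidal_encoding(256, 8)
    # The table for longer inputs extends the shorter one without changing it.
    print(long[:16] == short)
```

The table has no trainable parameters and can be computed for any position, which is the property the extrapolation experiment in Part E is meant to probe.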

Part C: Model Architecture

The transformer encoder is provided in src/model.py:

Part D: Training

Train your model using the provided training script:

Part E: Extrapolation Analysis

Implement analyze.py to test models on longer sequences:
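
The core of the analysis is grouping test samples by length and computing accuracy per group. A minimal sketch of that bookkeeping follows, assuming records in the JSON format from Part A. The names accuracy_by_length and predict are hypothetical, and predict stands in for whatever wrapper you write around a trained model.

```python
def accuracy_by_length(predict, samples):
    """Group samples by their "length" field and compute accuracy per group.

    predict: callable taking a list of ints and returning 0 or 1
    (hypothetical; in practice this would wrap a trained model).
    """
    correct, total = {}, {}
    for s in samples:
        n = s["length"]
        total[n] = total.get(n, 0) + 1
        hit = int(predict(s["sequence"]) == s["is_sorted"])
        correct[n] = correct.get(n, 0) + hit
    # Sort by length so the result plots directly as accuracy vs. length.
    return {n: correct[n] / total[n] for n in sorted(total)}


if __name__ == "__main__":
    samples = [
        {"sequence": list(range(32)), "is_sorted": 1, "length": 32},
        {"sequence": list(range(31, -1, -1)), "is_sorted": 0, "length": 32},
    ]
    # A constant predictor is a useful baseline: on balanced data it scores 0.5.
    print(accuracy_by_length(lambda seq: 1, samples))
```

Comparing each model against the constant-prediction baseline makes it easier to tell genuine extrapolation from chance-level accuracy.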

Deliverables

Your problem2/ directory must contain:

All code files as specified above
results/sinusoidal/ containing:
- best_model.pth - saved model weights
- training_log.json - loss curves and accuracy
- training_curves.png - visualization
results/learned/ with same structure
results/none/ with same structure
results/extrapolation/ containing:
- extrapolation_results.json - accuracy data
- extrapolation_curves.png - main result plot
- learned_position_embeddings.png - position visualization

Your report must include analysis of:

Extrapolation curves showing accuracy vs. sequence length for all three methods
Mathematical explanation: Why does sinusoidal encoding extrapolate while learned encoding fails?
Position embedding visualization for learned encoding
Quantitative comparison: accuracy at lengths 32, 64, 128, 256
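
As a starting point for the mathematical explanation, the standard sinusoidal definition and its shift property can be written out. The symbols p (position), k (offset), and the frequency \omega_i are notation introduced here, and the base 10000 is the conventional choice rather than something this assignment specifies.

```latex
\[
PE_{(p,\,2i)} = \sin(\omega_i p), \qquad
PE_{(p,\,2i+1)} = \cos(\omega_i p), \qquad
\omega_i = 10000^{-2i/d_{\text{model}}}
\]
\[
\begin{pmatrix} \sin(\omega_i (p+k)) \\ \cos(\omega_i (p+k)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i p) \\ \cos(\omega_i p) \end{pmatrix}
\]
```

The encoding is defined for every position p, and moving by an offset k applies a fixed rotation that does not depend on p. A learned embedding table indexed by absolute position, by contrast, only receives gradient updates for positions that occur in training (lengths up to 16 here), so rows for later positions, if they exist at all, stay at their initial values. How strongly this shows up in accuracy is what your experiments should measure.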