Problem 2: Positional Encoding and Length Extrapolation
Implement three positional encoding strategies and test their ability to generalize to sequences longer than those seen during training.
Part A: Dataset and Data Loading
You will work with a sorting detection dataset where the model must determine if a sequence of integers is sorted in ascending order.
Generate the dataset:
cd problem2
python scripts/generate_data.py --seed 641
The dataset contains:
- Task: Binary classification (1 = sorted, 0 = unsorted)
- Training: Sequences of length 8-16 with integers 0-99
- Testing: Sequences of length 32, 64, 128, 256 (for extrapolation analysis)
- Training samples: 10,000 (50% sorted, 50% unsorted)
- Validation samples: 2,000
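To make the dataset description concrete, here is a minimal sketch of how records of this shape could be produced. It is not the contents of scripts/generate_data.py; the function names `make_sample` and `make_split` are hypothetical, and it assumes "sorted" means non-decreasing (ties allowed).

```python
import json
import random


def make_sample(rng: random.Random, make_sorted: bool,
                min_len: int = 8, max_len: int = 16, max_value: int = 99) -> dict:
    """Build one record with the fields sequence, is_sorted, and length."""
    length = rng.randint(min_len, max_len)
    seq = [rng.randint(0, max_value) for _ in range(length)]
    if make_sorted:
        seq.sort()
    else:
        # Redraw until the sequence is not in ascending order
        # (assumption: non-decreasing order counts as sorted).
        while seq == sorted(seq):
            seq = [rng.randint(0, max_value) for _ in range(length)]
    return {"sequence": seq, "is_sorted": int(make_sorted), "length": length}


def make_split(n: int, seed: int) -> list:
    """Alternate labels so the split is exactly half sorted, half unsorted."""
    rng = random.Random(seed)
    return [make_sample(rng, make_sorted=(i % 2 == 0)) for i in range(n)]


samples = make_split(10, seed=641)
print(json.dumps(samples[0]))
```

Seeding a local `random.Random` rather than the global generator keeps the split reproducible regardless of what other code draws random numbers.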
Example training samples:
{"sequence": [3, 15, 27, 41, 58, 72, 83, 94], "is_sorted": 1, "length": 8}
{"sequence": [15, 3, 72, 27, 41, 94, 58, 83], "is_sorted": 0, "length": 8}
{"sequence": [1, 5, 12, 18, 23, 29, 34, 40, 47, 53, 61, 68], "is_sorted": 1, "length": 12}
The starter code provides src/dataset.py with the data loader:
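The loader code does not appear in this handout. As a rough illustration only, the sketch below shows the kind of work such a loader has to do for variable-length sequences: parse the records and pad each batch to a common length. It assumes one JSON record per line; `read_jsonl`, `collate`, and `PAD_ID` are hypothetical names and may not match src/dataset.py.

```python
import json
from typing import Iterable, List, Tuple

PAD_ID = 100  # assumption: one id past the 0-99 value range, reserved for padding


def read_jsonl(lines: Iterable[str]) -> List[dict]:
    """Parse one JSON record per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def collate(batch: List[dict]) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Pad every sequence to the longest one in the batch.

    Returns (padded token ids, mask with 1 = real token and 0 = padding, labels).
    """
    max_len = max(len(r["sequence"]) for r in batch)
    tokens, mask, labels = [], [], []
    for r in batch:
        seq = r["sequence"]
        pad = max_len - len(seq)
        tokens.append(seq + [PAD_ID] * pad)
        mask.append([1] * len(seq) + [0] * pad)
        labels.append(r["is_sorted"])
    return tokens, mask, labels


raw = [
    '{"sequence": [3, 15, 27, 41, 58, 72, 83, 94], "is_sorted": 1, "length": 8}',
    '{"sequence": [1, 5, 12, 18, 23, 29, 34, 40, 47, 53, 61, 68], "is_sorted": 1, "length": 12}',
]
tokens, mask, labels = collate(read_jsonl(raw))
print(tokens[0])
```

The mask matters for this task because padded positions must not be attended to; otherwise a trailing run of pad ids could change whether a sequence looks sorted.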
Part B: Positional Encoding Implementations
Implement src/positional_encoding.py with three encoding strategies:
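Judging by the results directories listed under Deliverables, the three strategies are sinusoidal, learned, and none. The following framework-free sketch illustrates what each one computes; it is not the required interface for src/positional_encoding.py. The names `sinusoidal_encoding`, `LearnedEncoding`, and `no_encoding` are hypothetical, the sinusoidal base of 10000 is the conventional choice rather than something this handout specifies, and the "learned" table is only randomly initialised here because no training takes place.

```python
import math
import random
from typing import List


def sinusoidal_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Even columns hold sin(pos / 10000^(i/d_model)), odd columns the matching cos."""
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):  # i is the even column index
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            if i + 1 < d_model:
                row[i + 1] = math.cos(angle)
        table.append(row)
    return table


class LearnedEncoding:
    """One vector per position up to max_len; these would be trained parameters."""

    def __init__(self, max_len: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                      for _ in range(max_len)]

    def __call__(self, seq_len: int) -> List[List[float]]:
        if seq_len > len(self.table):
            raise ValueError(f"seq_len {seq_len} exceeds max_len {len(self.table)}")
        return self.table[:seq_len]


def no_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Baseline: all zeros, so adding it leaves the token embeddings unchanged."""
    return [[0.0] * d_model for _ in range(seq_len)]


pe = sinusoidal_encoding(seq_len=256, d_model=8)  # computed on demand for any length
learned = LearnedEncoding(max_len=16, d_model=8)  # fixed number of rows
print(len(pe), len(learned(16)))
```

The sinusoidal function has no parameters and accepts any `seq_len`, whereas the learned table has a fixed number of rows, which is the structural difference the extrapolation experiment probes.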
Part C: Model Architecture
The transformer encoder is provided in src/model.py:
Part D: Training
Train your model using the provided training script:
Part E: Extrapolation Analysis
Implement analyze.py to test models on longer sequences:
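The core of the analysis is an accuracy figure per test length. A minimal sketch of that bookkeeping follows, under stated assumptions: `predict` is a hypothetical callable standing in for a trained model, `oracle` is a rule-based placeholder used only so the example runs, and the layout of the printed JSON is a guess rather than the required format of extrapolation_results.json.

```python
import json
import random
from typing import Callable, Dict, List

TEST_LENGTHS = [32, 64, 128, 256]


def accuracy_by_length(predict: Callable[[List[int]], int],
                       samples: List[dict]) -> Dict[int, float]:
    """Group samples by their length field and compute accuracy within each group."""
    correct: Dict[int, int] = {}
    total: Dict[int, int] = {}
    for s in samples:
        n = s["length"]
        total[n] = total.get(n, 0) + 1
        correct[n] = correct.get(n, 0) + int(predict(s["sequence"]) == s["is_sorted"])
    return {n: correct[n] / total[n] for n in sorted(total)}


def oracle(seq: List[int]) -> int:
    """Placeholder for a trained model: an exact check for ascending order."""
    return int(all(a <= b for a, b in zip(seq, seq[1:])))


# Build a small synthetic test set; labels come from the exact check above.
rng = random.Random(0)
samples = []
for n in TEST_LENGTHS:
    for k in range(20):
        seq = [rng.randint(0, 99) for _ in range(n)]
        if k % 2 == 0:
            seq.sort()
        samples.append({"sequence": seq, "is_sorted": oracle(seq), "length": n})

results = {"oracle": accuracy_by_length(oracle, samples)}
print(json.dumps(results, indent=2))
```

In a real run, one entry per encoding method would replace the `"oracle"` key, with `predict` wrapping the corresponding trained model.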
Deliverables
Your problem2/ directory must contain:
- All code files as specified above
- results/sinusoidal/ containing:
  - best_model.pth - saved model weights
  - training_log.json - loss curves and accuracy
  - training_curves.png - visualization
- results/learned/ with same structure
- results/none/ with same structure
- results/extrapolation/ containing:
  - extrapolation_results.json - accuracy data
  - extrapolation_curves.png - main result plot
  - learned_position_embeddings.png - position visualization
Your report must include analysis of:
- Extrapolation curves showing accuracy vs. sequence length for all three methods
- Mathematical explanation: Why does sinusoidal encoding extrapolate while learned encoding fails?
- Position embedding visualization for learned encoding
- Quantitative comparison: accuracy at lengths 32, 64, 128, 256