# Official PyTorch Reading

This page is the research spine behind the docs set. All links below point to official PyTorch documentation.
## Distributed Training Core

| Topic | Why it matters |
|---|---|
| DistributedDataParallel | Current DDP behavior, including the fact that DDP does not shard inputs for you. |
| Distributed Data Parallel Notes | Deeper background on how DDP works. |
| FullyShardedDataParallel | Current FSDP surface and sharding model. |
| Distributed Overview | Reference map for the larger distributed stack. |
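DDP keeps a full model replica on every rank and synchronizes by averaging gradients with an all-reduce after backward. As a rough mental model (a pure-Python sketch only; real DDP overlaps bucketed all-reduces with backward via `torch.distributed`, and `allreduce_mean` here is an illustrative helper, not a torch API):

```python
# Simplified model of DDP gradient synchronization: after backward, each
# rank holds local gradients; an all-reduce averages them elementwise so
# every rank applies the identical update.

def allreduce_mean(per_rank_grads):
    """Average gradients across ranks; every rank receives the result."""
    world_size = len(per_rank_grads)
    averaged = [
        sum(rank_grad[i] for rank_grad in per_rank_grads) / world_size
        for i in range(len(per_rank_grads[0]))
    ]
    # An all-reduce leaves each rank with an identical copy.
    return [list(averaged) for _ in range(world_size)]
```

Note the averaging is the whole synchronization story: DDP does not shard your inputs, which is why each rank must feed it a distinct slice of the data (see the Data Plane section).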
## Launch and Runtime

| Topic | Why it matters |
|---|---|
| torchrun | One-process-per-GPU launch model and current --local-rank behavior. |
| NCCL environment variables | Useful when discussing operational debugging and communication failures. |
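torchrun exports rank information to each worker process through environment variables (RANK, LOCAL_RANK, WORLD_SIZE), and reading those is the current pattern rather than parsing a --local-rank argument. A minimal sketch (`rank_info` is an illustrative helper, not a torch API):

```python
import os

def rank_info():
    """Read the rank variables that torchrun sets for each worker.
    Defaults correspond to a single-process, non-distributed run."""
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
    }
```

In a typical script, local_rank selects the CUDA device for the process, while rank and world_size feed process-group initialization.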
## Data Plane

| Topic | Why it matters |
|---|---|
| torch.utils.data | DataLoader, DistributedSampler, prefetch_factor, persistent_workers, and set_epoch. |
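DistributedSampler is what actually shards the data across ranks: every rank shuffles with the same seed plus the epoch number (which is why set_epoch must be called each epoch), then takes a disjoint slice. A simplified pure-Python model of that behavior (`shard_indices` is illustrative; the real sampler lives in torch.utils.data and also pads so all ranks receive equal-length shards):

```python
import random

def shard_indices(dataset_len, rank, world_size, epoch, shuffle=True, seed=0):
    """Simplified model of DistributedSampler: all ranks shuffle with the
    same seed + epoch, then each takes a disjoint strided slice."""
    indices = list(range(dataset_len))
    if shuffle:
        # Folding the epoch into the seed is what set_epoch accomplishes:
        # a new, but rank-consistent, permutation every epoch.
        random.Random(seed + epoch).shuffle(indices)
    return indices[rank::world_size]
```

Forgetting set_epoch means every epoch replays the epoch-0 permutation, which quietly degrades shuffling.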
## Checkpointing and Recovery

| Topic | Why it matters |
|---|---|
| Distributed Checkpoint | Multi-rank checkpointing and load-time resharding. |
| Distributed Checkpoint Recipe | Practical implementation guidance. |
## Performance and Memory

| Topic | Why it matters |
|---|---|
| AMP | Current mixed-precision guidance. |
| Activation checkpointing | Memory/compute tradeoff and RNG-state implications. |
| torch.compile end-to-end tutorial | Useful if you want one more modern optimization angle in the discussion. |
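The non-obvious part of AMP is the dynamic loss-scaling loop: scale the loss so small fp16 gradients survive, then skip the optimizer step and shrink the scale whenever gradients overflow, and grow the scale again after a run of clean steps. A toy pure-Python model of that control flow (`ToyGradScaler` is illustrative, not the torch.cuda.amp.GradScaler API; backoff/growth constants are assumptions for the sketch):

```python
import math

class ToyGradScaler:
    """Simplified model of AMP dynamic loss scaling: halve the scale and
    skip the step on overflow, double it after sustained clean steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, scaled_grads):
        """Return unscaled gradients, or None when the step must be skipped."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            self.scale *= 0.5        # back off and skip this update
            self._good_steps = 0
            return None
        unscaled = [g / self.scale for g in scaled_grads]
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            self.scale *= 2.0        # grow after sustained success
            self._good_steps = 0
        return unscaled
```

The skipped-step behavior is why naive step counting can drift under AMP: some iterations legitimately apply no update.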
## Biotech and Life Sciences

External references for the biotech-specific sections of this site. These are not PyTorch docs, but they are the standard references for the domain.
| Topic | Why it matters |
|---|---|
| ESM protein language models | Reference implementation for large-scale protein sequence pretraining. |
| PyTorch Geometric DataLoader | Graph-level batching for molecular GNNs; the batch vector semantics. |
| AlphaFold 2 technical report | Source for pair representation design and Evoformer memory characteristics. |
| Megatron-LM sequence parallelism | The communication pattern behind sequence-parallel attention. |
| EGNN equivariant network | Practical SE(3)-equivariant message passing for molecular graphs. |
| MoleculeNet splits | Scaffold, random, and scaffold-stratified split definitions used in benchmarking. |
| BEDROC enrichment metric | Why EF@1% and BEDROC matter more than AUROC for virtual screening evaluation. |
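The enrichment factor behind EF@1% is simple to state: the active rate in the top-scored fraction of the library, divided by the active rate overall, so a random ranker scores 1 and early retrieval is rewarded in a way AUROC is not. A minimal sketch of the standard definition (`enrichment_factor` is an illustrative helper; BEDROC additionally applies exponential weighting over the whole ranking):

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF@fraction: active rate in the top-scored fraction divided by the
    active rate in the full set. EF@1% corresponds to fraction=0.01."""
    n = len(scores)
    n_top = max(1, int(n * fraction))
    # Rank compounds by predicted score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    actives_top = sum(label for _, label in ranked[:n_top])
    actives_total = sum(labels)
    return (actives_top / n_top) / (actives_total / n)
```

With 10 actives in a 100-compound library, placing one active at rank 1 already gives EF@1% = 10, the maximum for that prevalence.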