Official PyTorch Reading

This page is the research spine behind the docs set. The PyTorch tables below link to official PyTorch documentation; a final table collects external, domain-specific references.

| Topic | Why it matters |
| --- | --- |
| DistributedDataParallel | Current DDP behavior, including the fact that DDP does not shard inputs for you. |
| Distributed Data Parallel Notes | Deeper background on how DDP works. |
| FullyShardedDataParallel | Current FSDP surface and sharding model. |
| Distributed Overview | Reference map for the larger distributed stack. |
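The DDP point above is worth seeing in code. This is a minimal single-process sketch (gloo backend, `world_size=1`, so it runs on CPU without a launcher): DDP replicates the model and all-reduces gradients during `backward()`, but it does not split the input batch — each rank must load its own shard of the data.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def ddp_forward_backward():
    # A file store lets us initialize a process group without torchrun.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        rank=0,
        world_size=1,
    )
    model = DDP(torch.nn.Linear(8, 2))  # gradient-sync hooks registered here
    x = torch.randn(4, 8)               # this rank's batch; DDP will not split it
    model(x).sum().backward()           # gradient all-reduce fires during backward
    dist.destroy_process_group()
    return model.module.weight.grad.shape

print(ddp_forward_backward())  # torch.Size([2, 8])
```

At a real `world_size > 1`, the only change is that each rank's `x` comes from its own data shard — typically via a `DistributedSampler`.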
| Topic | Why it matters |
| --- | --- |
| `torchrun` | One-process-per-GPU launch model and current `--local-rank` behavior. |
| NCCL environment variables | Useful when discussing operational debugging and communication failures. |
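A hedged sketch of the `torchrun` worker contract: `torchrun` launches one process per GPU and exports rank information as environment variables. Current guidance is to read `LOCAL_RANK` from the environment rather than parsing the legacy `--local-rank` argument.

```python
import os

def ranks_from_env():
    # torchrun sets LOCAL_RANK, RANK, and WORLD_SIZE for every worker process;
    # the defaults here make the script also runnable as a plain single process.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return local_rank, rank, world_size

os.environ.update(LOCAL_RANK="1", RANK="5", WORLD_SIZE="8")  # simulate torchrun
print(ranks_from_env())  # (1, 5, 8)
```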
| Topic | Why it matters |
| --- | --- |
| `torch.utils.data` | `DataLoader`, `DistributedSampler`, `prefetch_factor`, `persistent_workers`, and `set_epoch`. |
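A sketch of per-rank sharding with `DistributedSampler`. Passing `num_replicas` and `rank` explicitly keeps the example runnable without an initialized process group; under `torchrun` you would normally omit them and let the sampler read the process group. `set_epoch` reseeds the shuffle so each epoch uses a fresh order.

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(8).float())
# This simulates rank 0 of a 2-rank job: the sampler hands this rank 4 of 8 samples.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # without this, every epoch replays the epoch-0 order
    for (batch,) in loader:
        assert batch.shape[0] == 2
```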
| Topic | Why it matters |
| --- | --- |
| Distributed Checkpoint | Multi-rank checkpointing and load-time resharding. |
| Distributed Checkpoint Recipe | Practical implementation guidance. |
| Topic | Why it matters |
| --- | --- |
| AMP | Current mixed-precision guidance. |
| Activation checkpointing | Memory/compute tradeoff and RNG-state implications. |
| `torch.compile` end-to-end tutorial | Useful if you want one more modern optimization angle in the discussion. |
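A hedged CPU sketch of autocast mixed precision: ops inside the autocast region run in lower precision where safe, while parameters and gradients stay in float32. bfloat16 on CPU keeps the example runnable without a GPU; on CUDA you would pair `torch.autocast("cuda")` with a `GradScaler` when using float16.

```python
import torch

model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(2, 16)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)            # the matmul runs in bfloat16 inside the region
    loss = out.float().sum()  # reduce in float32 for numerical safety
loss.backward()               # gradients land in the float32 parameters
opt.step()
```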

External references for the biotech-specific sections of this site. These are not PyTorch docs, but they are the standard references for the domain.

| Topic | Why it matters |
| --- | --- |
| ESM protein language models | Reference implementation for large-scale protein sequence pretraining. |
| PyTorch Geometric DataLoader | Graph-level batching for molecular GNNs; the `batch` vector semantics. |
| AlphaFold 2 technical report | Source for pair representation design and Evoformer memory characteristics. |
| Megatron-LM sequence parallelism | The communication pattern behind sequence-parallel attention. |
| EGNN equivariant network | Practical SE(3)-equivariant message passing for molecular graphs. |
| MoleculeNet splits | Scaffold, random, and scaffold-stratified split definitions used in benchmarking. |
| BEDROC enrichment metric | Why EF@1% and BEDROC matter more than AUROC for virtual screening evaluation. |
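To make the last row concrete, here is an illustrative (made-up data) sketch of the enrichment factor EF@x%: the active rate in the top-scoring x% of a ranked screen divided by the active rate in the whole library. Unlike AUROC, it only rewards actives ranked at the very top, which is what matters when you can assay only a tiny fraction of hits.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF@fraction; labels: 1 = active, 0 = inactive; higher score = more active."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    actives_top = sum(label for _, label in ranked[:n_top])
    return (actives_top / n_top) / (sum(labels) / len(labels))

# 100 compounds, 10 actives, and a scorer that ranks every active first:
scores = [1.0] * 10 + [0.0] * 90
labels = [1] * 10 + [0] * 90
print(enrichment_factor(scores, labels))  # 10.0, the maximum at a 10% hit rate
```

BEDROC generalizes this idea with an exponential weight over ranks instead of a hard top-x% cutoff, so early recognition is rewarded smoothly.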