Interview Cheatsheet

Distributed training defaults

  • one process per GPU
  • DDP before FSDP unless memory forces otherwise
  • deterministic distributed sampler
  • rank-0 concise logs, all-rank structured errors
  • full-job restart from latest good checkpoint

Checkpoint contents

  • model
  • optimizer
  • scheduler
  • AMP scaler
  • sampler progress
  • RNG state
  • step / epoch / config metadata

Metrics to watch

  • step time by phase
  • samples/sec
  • data-loader wait
  • all-reduce time
  • checkpoint latency
  • restart count

High-signal sentences

  • “I’m optimizing for the smallest correct distributed baseline first.”
  • “Data partitioning and resume semantics are correctness problems, not just performance details.”
  • “I would only introduce a more complex parallelism axis when the current bottleneck is explicit.”
  • “Checkpoint frequency is an RPO and cost decision.”
  • “In a notebook I’ll preserve production boundaries even if I mock the backing systems.”
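The checkpoint and restart defaults above can be sketched framework-free. This is a minimal sketch, not a production implementation: the helper names (`save_checkpoint`, `latest_checkpoint`, `restore`) are mine, plain dicts stand in for the framework's `state_dict()` objects, and `pickle` stands in for `torch.save`.

```python
import os
import pickle
import re
import tempfile

def save_checkpoint(ckpt_dir, step, state):
    """Atomically write one checkpoint file per step.

    `state` is the full restart bundle listed above: model, optimizer,
    scheduler, AMP scaler, sampler progress, RNG state, and
    step/epoch/config metadata (plain dicts here, state_dict() objects
    in a real job).
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    final = os.path.join(ckpt_dir, f"ckpt_{step:08d}.pkl")
    # Write to a temp file, then rename: os.replace is atomic on POSIX,
    # so a crash mid-write can never leave a truncated "latest" file.
    fd, tmp = tempfile.mkstemp(dir=ckpt_dir)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, final)
    return final

def latest_checkpoint(ckpt_dir):
    """Return the path of the newest complete checkpoint, or None."""
    pat = re.compile(r"ckpt_(\d+)\.pkl$")
    steps = [int(m.group(1)) for m in map(pat.match, os.listdir(ckpt_dir)) if m]
    if not steps:
        return None
    return os.path.join(ckpt_dir, f"ckpt_{max(steps):08d}.pkl")

def restore(ckpt_dir):
    """Full-job restart: load the latest good checkpoint, else cold-start."""
    path = latest_checkpoint(ckpt_dir)
    if path is None:
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

In a real PyTorch job only rank 0 would call `save_checkpoint`, the bundle would hold `model.state_dict()`, `optimizer.state_dict()`, and friends, and every rank would call `restore` before resuming the sampler at the saved step.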

Additional defaults for roles involving protein, molecular, or genomic data.

Biotech data defaults

  • scaffold split over random split; time split for prospective validation
  • missing bioassay labels are not negatives—mask them out of the loss
  • per-task AUROC, BEDROC, and EF@1% over accuracy and raw loss
  • sparse matrix row-slicing in __getitem__ for genomic datasets
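The label-masking rule above can be shown framework-agnostically. A minimal NumPy sketch, assuming NaN marks a compound–assay pair that was never measured; the function name `masked_bce` is mine, and in PyTorch the same idea is a mask tensor multiplied into a `reduction='none'` loss before averaging.

```python
import numpy as np

def masked_bce(logits, labels):
    """Binary cross-entropy over observed bioassay entries only.

    labels: 1.0 (active), 0.0 (inactive), or NaN (never measured).
    NaN entries are masked out of the loss rather than treated as
    negatives, which would be a silent training error.
    """
    mask = ~np.isnan(labels)
    y = np.where(mask, labels, 0.0)  # placeholder value in masked slots
    # Numerically stable BCE-with-logits:
    # max(x, 0) - x*y + log(1 + exp(-|x|))
    per_elem = (np.maximum(logits, 0.0)
                - logits * y
                + np.log1p(np.exp(-np.abs(logits))))
    # Average only over measured entries.
    return (per_elem * mask).sum() / mask.sum()
```

The same mask is typically precomputed per task so that per-task AUROC and EF@1% are likewise evaluated only on measured pairs.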

Biotech model defaults

  • bfloat16 over float16 for long-sequence stability
  • use_reentrant=False in activation checkpointing for transformer blocks
  • layer normalization only in equivariant networks; batch norm breaks equivariance
  • sequence length curriculum from short to long to avoid early OOM events
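The short-to-long curriculum can be sketched as a token-budgeted batcher. This is a minimal illustration, not a full loader: the name `curriculum_batches` is mine, and a real pipeline would shuffle within length buckets and compose with a deterministic distributed sampler.

```python
def curriculum_batches(lengths, max_tokens):
    """Yield batches of dataset indices, shortest sequences first.

    Sorting short-to-long keeps early steps cheap in activation memory,
    so the memory high-water mark ramps up predictably instead of a
    worst-case OOM on a random early step. Each batch is packed so its
    padded size (longest member * batch size) stays under max_tokens.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch = []
    for i in order:
        # Because `order` is sorted, lengths[i] is the batch's longest
        # sequence, hence the padded width of the whole batch.
        if batch and lengths[i] * (len(batch) + 1) > max_tokens:
            yield batch
            batch = []
        batch.append(i)  # a single over-budget sequence still gets its own batch
    if batch:
        yield batch
```

For example, `list(curriculum_batches([5, 2, 9, 3], max_tokens=8))` packs the two short sequences together and leaves the longest alone at the end.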

Biotech high-signal sentences

  • “Scaffold split is the correctness bar; random split is the optimism bar.”
  • “Missing labels in a bioassay database are not negatives—treating them as negatives is a silent training error.”
  • “Equivariant network training imposes preprocessing contracts: frame canonicalization must precede data sharding.”
  • “Wet lab hit rate is the only metric the business actually cares about; computational metrics exist to let us move faster while we wait for it.”
  • “Active learning turns the dataset contract from a precondition into an ongoing invariant managed by the orchestrator.”