From 66d9566cbf98be6aa3cfa0d4a0968bb4a0682fd5 Mon Sep 17 00:00:00 2001
From: Quentin Anthony
Date: Sat, 30 Apr 2022 07:11:25 -0500
Subject: [PATCH] Add distributed example without multiprocessing to README

---
 imagenet/README.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/imagenet/README.md b/imagenet/README.md
index 0680c0bd28..699a5ccd18 100644
--- a/imagenet/README.md
+++ b/imagenet/README.md
@@ -45,6 +45,23 @@ Node 1:
 python main.py -a resnet50 --dist-url 'tcp://IP_OF_NODE0:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1 [imagenet-folder with train and val folders]
 ```
 
+## Distributed Data Parallel Training
+
+If you wish to bypass PyTorch's multiprocessing module and manage the processes yourself (e.g. with MPI), you must specify the `--world-size`, `--rank`, and `--gpu` values for each process explicitly. For example, as two backgrounded shell jobs on a single node:
+
+```bash
+python main.py ... --world-size 2 --rank 0 --gpu 0 [imagenet-folder with train and val folders] &
+python main.py ... --world-size 2 --rank 1 --gpu 1 [imagenet-folder with train and val folders] &
+```
+
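+When launching with MPI instead of backgrounded shell jobs, each rank can read its coordinates from the environment. Below is a minimal sketch assuming Open MPI, which exports `OMPI_COMM_WORLD_SIZE`, `OMPI_COMM_WORLD_RANK`, and `OMPI_COMM_WORLD_LOCAL_RANK` to every process it launches:
+
+```bash
+# Hypothetical Open MPI launch: one process per GPU. Single quotes defer
+# variable expansion to the launched shells, where mpirun has set the values.
+mpirun -np 2 bash -c 'python main.py ... --world-size $OMPI_COMM_WORLD_SIZE --rank $OMPI_COMM_WORLD_RANK --gpu $OMPI_COMM_WORLD_LOCAL_RANK [imagenet-folder with train and val folders]'
+```
+
 ## Usage
 
 ```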