change batches_ahead for Loader

Commit c00d6590 authored 1 year ago by Wu, Jiantao (PG/R - Comp Sci & Elec Eng)
--- a/vitookit/evaluation/README.md
+++ b/vitookit/evaluation/README.md
+## Classification

-### Finetune Classification
-To evaluate finetuning stage for a pretrained model, run:
-```
-torchrun --master_port=29501 --nproc_per_node=2 evaluation/eval_cls.py --pretrained_weights <weight> --data_location <data_path> --data_set <data_set> --output_dir <output_dir> --epochs=1000 --arch vit_base --batch_size 64 --layer_decay 0
-```
+### Available datasets
+The argument `--data_set` can be one of the following:
+- IN1K
+- ominiglot
+- STL
+- CIFAR10
+- CIFAR100
+- Cars
+- Pets
+- Aircraft
+- Flowers
+- Folder

-Use one of the following settings:
- data_set: Cars, Pets, Aircraft, CIFAR10/100, ImageFolder (train/validation).
- epochs: 200 for IN100, 100 for IN1K, 1000 for other small datasets.
- head_type: 0 for CLS token only (DINO, iBOT), 1 for mean patch tokens (MAE, BEiT), 2 for concatnating the CLS token and mean patch tokens (dSiT)
- layer_decay: 0 for small datasets, 0.75 for large datasets (IN1K).
- arch: vit_small, _base ...
+### Finetune: `eval_cls.py` `eval_cls_ffcv.py`

-Example:
-```
-WANDB_PROJECT=SiT WANDB_NAME=threeaug torchrun --master_port=29501 --nproc_per_node=2  evaluation/eval_cls.py  --pretrained_weights ../related_work/SiT/outputs/imagenet/vit_base/checkpoint.pth --data_location ../data --data_set Cars --output_dir ../related_work/SiT/outputs/imagenet/vit_base/eval/finetune_ibot2-cars --epochs=1000 --arch vit_base --batch_size 64 --layer_decay 0 --ThreeAugment --weight_decay 0.02
-```
+We reproduced the results of MAE on ImageNet. The results are as follows:
+| **ImageNet Accuracy** | ViT-Base | ViT-Large | ViT-Huge |
+|-----------------------|----------|-----------|----------|
+| MAE repo              | 83.664   | 85.952    | 86.928   |
+| Our repo              |          |           |          |
+| Our repo (dres)       | 83.302   |           |          |

-ImageNet 4GPUs:
-```
--batch_size 64 --layer_decay 0.75 --weight_decay 0.05 --head_type=2
-```
+Training time is ~7h11m on 32 V100 GPUs for the MAE repo.

-### k-NN Classification 
-To evaluate k-NN classification on the frozen features, run:
+
+To launch the evaluation, use `vitrun` or `submitit`. For example, to finetune a pre-trained model on ImageNet, run:
 ```
-python -m torch.distributed.launch  --master_port=29501 --nproc_per_node=2 evaluation/eval_knn.py --pretrained_weights <weight> --data_location <data_path> --data_set <data_set> --output_dir <output_dir> --head_type <> --dis_fn <cosine/euclidean>
+submitit  --module vitookit.evaluation.eval_cls_ffcv   --train_path ~/data/ffcv/IN1K_train_500_95.ffcv --val_path  ~/data/ffcv/IN1K_val_500_100.ffcv --fast_dir /raid/local_scratch/jxw30-hxc19/ --gin VisionTransformer.global_pool='"avg"'   --blr 5e-4 --layer_decay 0.65 --weight_decay 0.05 --drop_path 0.1 --checkpoint_key=model -w ~/models/mae_pretrain_vit_base.pth 
 ```

-Use one of the following settings:
- head_type: 0 for CLS token only (DINO, iBOT), 1 for mean patch tokens (MAE, BEiT), 2 for concatnating the CLS token and mean patch tokens (dSiT)
- dis_fn: cosine is always better than euclidean
+Here the effective batch size is 128 (batch_size per gpu) * 8 (gpus per node) = 1024.

-
-### Linear Probing on ImageNet
-
-To train a single classifier on frozen weights with customized learning rate, run:
-```
-torchrun evaluation/eval_linear.py --batch_size 512 --blr 0.1 --weight_decay 0.0 --accum_iter=4 --arch vit_small --data_location=<> 
+**dres** Dynamic Resolution for Efficient Supervised Learning.
+```bash
+vitrun --nproc_per_node=8 eval_cls_ffcv.py --train_path <> --val_path <> -w ~/models/mae_pretrain_vit_base.pth --checkpoint_key=model --layer_decay=0.65 --gin VisionTransformer.global_pool='"avg"' DynamicResolution.start_ramp=0 DynamicResolution.end_ramp=60 DynamicResolution.scheme=1 --dynamic_resolution
 ```
-We follow the  MAE recipe to train the linear classifier. Note that:
+
+### Linear Probing
+We follow the MAE recipe to train the linear classifier. Note that:
 - The effective batch size is 16384 = 512 (batch_size per gpu) * 1 (nodes) * 8 (gpus per node) * 4 (accum_iter).
 - The actual `lr` is computed as `lr` = `blr` * effective batch size / 256.
 - Training time is ~2h20m for 90 epochs on 32 V100 GPUs.
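
The batch-size and learning-rate arithmetic in the notes above can be checked with a few lines of Python. This is a minimal sketch, not code from the repository: the function names are illustrative, `blr = 0.1` is taken from the linear-probing command in this README, and the second configuration assumes a single node with 8 GPUs.

```python
def effective_batch_size(batch_size, nodes, gpus_per_node, accum_iter):
    """Number of samples contributing to one optimizer step."""
    return batch_size * nodes * gpus_per_node * accum_iter


def scaled_lr(blr, eff_batch_size):
    """Linear scaling rule from the notes: lr = blr * effective batch size / 256."""
    return blr * eff_batch_size / 256


# Settings quoted in the notes: 512 per GPU, 1 node, 8 GPUs, accum_iter=4.
eff = effective_batch_size(batch_size=512, nodes=1, gpus_per_node=8, accum_iter=4)
lr = scaled_lr(blr=0.1, eff_batch_size=eff)

# New defaults in this commit (batch_size=128, accum_iter=16);
# one node with 8 GPUs is an assumption here.
eff_new = effective_batch_size(batch_size=128, nodes=1, gpus_per_node=8, accum_iter=16)

print(eff, eff_new, lr)
```

Under that assumption both configurations give the same effective batch size, so the same `blr` yields the same actual learning rate.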
@@ -48,37 +47,19 @@ Reference results for [MAE in linear probing](https://github.com/facebookresearc
 |                    | ViT-Base | ViT-Large | ViT-Huge |
 |:------------------:|:--------:|:---------:|:--------:|
 | paper (TF/TPU)     | 68.0     | 75.8      | 76.6     |
-| this repo (PT/GPU) | 67.8     | 76.0      | 77.2     |
-
+|  MAE repo (PT/GPU) | 67.8     | 76.0      | 77.2     |
+|  Our repo (PT/GPU) | 67.8     | 76.0      | 77.2     |

-### Fine-Tuning on ImageNet

-To fine-tune the pre-trained model, we apply layerwise decay and sweep the learning rate. 
-
-To train ViT-S/16 with 200 epochs, run:
-```
-./run.sh imagenet_cls $JOB_NAME vit_small teacher 8 \
-  --epochs 200 \
-  --drop_path 0.1 \
-  --layer_decay 0.75
-```
-To train ViT-B/16 with 100 epochs, run:
+To train a single classifier on frozen weights, run:
 ```
-./run.sh imagenet_cls $JOB_NAME vit_base teacher 8 \
-  --epochs 100 \
-  --drop_path 0.2 \
-  --layer_decay 0.65
+submitit --module vitookit.evaluation.eval_linear_ffcv --train_path ~/data/ffcv/IN1K_train_500_95.ffcv --val_path ~/data/ffcv/IN1K_val_500_95.ffcv  -w ~/models/mae_pretrain_vit_base.pth --checkpoint_key=model  --gin VisionTransformer.global_pool='"avg"'  --fast_dir /raid/local_scratch/jxw30-hxc19/ --batch_size=128 --accum_iter=16 --blr=0.1
 ```
-To train ViT-L/16 with 50 epochs, run:
+
+### k-NN Classification 
+To evaluate k-NN classification on the frozen features, run:
 ```
-./run.sh imagenet_cls $JOB_NAME vit_large teacher 8 \
-  --epochs 50 \
-  --drop_path 0.4 \
-  --layer_decay 0.75 \
-  --batch_size 64 \
-  --enable_deepspeed \
-  --warmup_epochs 5 \
-  --update_freq 2
+python -m torch.distributed.launch  --master_port=29501 --nproc_per_node=2 evaluation/eval_knn.py --pretrained_weights <weight> --data_location <data_path> --data_set <data_set> --output_dir <output_dir> --head_type <> --dis_fn <cosine/euclidean>
 ```

 ### Unsupervised Classification on ImageNet
@@ -227,4 +208,5 @@ for m in model_dir:
    cmd = f"python evaluation/eval_fewshot_cls.py --data_location ../data --arch=vit_base -w {m}/checkpoint.pth --output_dir={m}/eval/fewshot --data_set={ds}"
    print(cmd)
    os.system(cmd)
-```
\ No newline at end of file
+```
+
--- a/README.MD
+++ b/README.MD
@@ -7,7 +7,49 @@ Install the package by
 pip install git+https://gitlab.surrey.ac.uk/jw02425/vitoolkit.git
 ```

-## Evaluation
+# Run on HPC
+See the available evaluations in [evaluation protocols](EVALUATION.md).
+
+## commands
+
+```bash
+vitrun train_cls.py --data_location=../data/IMNET --gin VisionTransformer.global_pool='"avg"' -w wandb:dlib/EfficientSSL/lsx2qmys 
+```
+
+## condor
+
+```bash
+condor_submit condor/eval_weka_cls.submit model_dir=outputs/dinosara/base ARCH=vit_base
+```
+
+## Slurm
+
+```text
+
+usage: submitit for evaluation [-h] [--module MODULE] [--ngpus NGPUS] [--nodes NODES] [-t TIMEOUT] [--mem MEM] [--partition PARTITION] [--comment COMMENT] [--job_dir JOB_DIR] [--fast_dir FAST_DIR]
+
+options:
+  -h, --help            show this help message and exit
+  --module MODULE       Module to run
+  --ngpus NGPUS         Number of gpus to request on each node
+  --nodes NODES         Number of nodes to request
+  -t TIMEOUT, --timeout TIMEOUT
+                        Duration of the job
+  --mem MEM             Memory to request
+  --partition PARTITION
+                        Partition where to submit
+  --comment COMMENT     Comment to pass to scheduler
+  --job_dir JOB_DIR
+  --fast_dir FAST_DIR   The dictory of fast disk to load the datasets
+
+```
+
+Dataset files are moved to **FAST_DIR**, the fast-disk directory set with `--fast_dir`, and loaded from there. For example, to finetune a pre-trained model on ImageNet, run:
+```bash
+submitit  --module vitookit.evaluation.eval_cls_ffcv   --train_path  ~/data/ffcv/IN1K_train_500_95.ffcv --val_path  ~/data/ffcv/IN1K_val_500_95.ffcv --gin VisionTransformer.global_pool='"avg"' -w wandb:dlib/EfficientSSL/lsx2qmys 
+```
+
+# Evaluation

 There are many protocols for evaluating the performance of a model. We provide a set of evaluation scripts for different tasks. Use `vitrun` to launch the evaluation.

@@ -36,32 +78,6 @@ The pretrained weights can be one of the following:

 You can further specify the *key* and *prefix* to extract the weights from a checkpoint file. For example, `--pretrained_weights=ckpt.pth --checkpoint_key model --prefix module.` will extract the state dict from the key "model" in the checkpoint file and remove its prefix "module." in the keys.
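
The key and prefix handling described above can be illustrated with plain dictionaries. This is a hedged sketch rather than the package's loader: `extract_state_dict` is a hypothetical helper, and the toy checkpoint stands in for what `torch.load` would return.

```python
def extract_state_dict(checkpoint, checkpoint_key=None, prefix=None):
    """Select a sub-dict by key, then strip a prefix from parameter names."""
    state = checkpoint[checkpoint_key] if checkpoint_key else checkpoint
    if prefix:
        state = {
            (name[len(prefix):] if name.startswith(prefix) else name): value
            for name, value in state.items()
        }
    return state


# Toy checkpoint; real values would be tensors.
checkpoint = {
    "model": {"module.head.weight": 1, "module.head.bias": 2},
    "epoch": 10,
}
state = extract_state_dict(checkpoint, checkpoint_key="model", prefix="module.")
print(sorted(state))
```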

-# HPC
-
-## commands
-
-[training commands](evaluation/README.md)
-## condor
-```bash
-condor_submit condor/eval_weka_cls.submit model_dir=outputs/dinosara/base ARCH=vit_base
-condor_submit condor/eval_weka_seg.submit model_dir=outputs/dinosara/base
-```
-
-## Slurm
-
-```bash
-bin/submitit  --module vitookit.evaluation.eval_cls_ffcv   --train_path  ~/data/ffcv/IN1K_train_500_95.ffcv --val_path  ~/data/ffcv/IN1K_val_500_95.ffcv --gin VisionTransformer.global_pool='\"avg\"' -w wandb:dlib/EfficientSSL/lsx2qmys 
-```
-
-## Test examples
-
-We provide some simple examples.
-
-<p float="center">
-  <img src="imgs/sample1.JPEG" width="32%" />
-  <img src="imgs/sample2.JPEG" width="32%" /> 
-  <img src="imgs/sample3.jpg" width="32%" />
-</p>


 ## cluster

--- a/bin/submitit
+++ b/bin/submitit
@@ -142,8 +142,8 @@ def main():
        slurm_signal_delay_s=120,
        **kwargs
    )
-
-    executor.update_parameters(name="eval")
+    evaluation = args.module.split(".")[-1]
+    executor.update_parameters(name=evaluation)
    args.dist_url = get_init_file(args.job_dir).as_uri()
    print("args:", args)
    trainer = Trainer(args)
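
As a quick illustration of the hunk above, the scheduler job name is now the last component of the dotted module path instead of the fixed string "eval". A minimal sketch, where `job_name_from_module` is a hypothetical wrapper around the same expression:

```python
def job_name_from_module(module):
    """Return the last dotted component, mirroring args.module.split(".")[-1]."""
    return module.split(".")[-1]


job_name = job_name_from_module("vitookit.evaluation.eval_cls_ffcv")
print(job_name)
```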

--- a/bin/vitrun
+++ b/bin/vitrun
@@ -20,9 +20,19 @@ pack_path = pkg_resources.get_distribution('vitookit').location

 import re
 import sys, os
-from torch.distributed.run import main
+from torch.distributed.run import parse_args, config_from_args, elastic_launch, uuid
 if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
-    sys.argv[1] = os.path.join(pack_path,'vitookit','evaluation',sys.argv[1])
-    print(sys.argv)
-    sys.exit(main())
+    
+    args = parse_args(None)
+    if args.standalone:
+        args.rdzv_backend = "c10d"
+        args.rdzv_endpoint = "localhost:0"
+        args.rdzv_id = str(uuid.uuid4())
+    config, cmd, cmd_args = config_from_args(args)
+    cmd_args[1] = os.path.join(pack_path, 'vitookit', 'evaluation', cmd_args[1])
+    # print(cmd, cmd_args)
+    elastic_launch(
+        config=config,
+        entrypoint=cmd,
+    )(*cmd_args)
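
The path rewrite in the hunk above can be sketched in isolation. This assumes the launcher's argument list has a Python flag at index 0 and the script name at index 1, which is what the patch's indexing implies; `resolve_script` and the `pack_path` value are hypothetical and used only for illustration.

```python
import os


def resolve_script(cmd_args, pack_path):
    """Return a copy of cmd_args with the script at index 1 made absolute."""
    resolved = list(cmd_args)
    resolved[1] = os.path.join(pack_path, "vitookit", "evaluation", resolved[1])
    return resolved


pack_path = "/opt/site-packages"  # hypothetical install location
cmd_args = ["-u", "eval_cls_ffcv.py", "--batch_size", "128"]
resolved = resolve_script(cmd_args, pack_path)
print(resolved)
```

If the list were laid out differently (for example with an extra flag before the script name), index 1 would no longer be the script, so the assumption is worth checking against the installed PyTorch version.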
--- a/vitookit/evaluation/eval_cls_ffcv.py
+++ b/vitookit/evaluation/eval_cls_ffcv.py
@@ -232,13 +232,13 @@ def main(args):
    
    
    order = OrderOption.RANDOM if args.distributed else OrderOption.QUASI_RANDOM
-    data_loader_train =  Loader(args.train_path, pipelines=ThreeAugmentPipeline(),
+    data_loader_train =  Loader(args.train_path, pipelines=ThreeAugmentPipeline(),batches_ahead=1,
                        batch_size=args.batch_size, num_workers=args.num_workers, 
                        order=order, distributed=args.distributed,seed=args.seed)
    

    data_loader_val =  Loader(args.val_path, pipelines=ValPipeline(),
-                        batch_size=args.batch_size, num_workers=args.num_workers, 
+                        batch_size=args.batch_size, num_workers=args.num_workers, batches_ahead=1,
                        distributed=args.distributed,seed=args.seed)

    mixup_fn = None

--- a/vitookit/evaluation/eval_linear_ffcv.py
+++ b/vitookit/evaluation/eval_linear_ffcv.py
@@ -39,21 +39,18 @@ from ffcv.loader import OrderOption

 def get_args_parser():
    parser = argparse.ArgumentParser('MAE linear probing for image classification', add_help=False)
-    parser.add_argument('--batch_size', default=512, type=int,
+    parser.add_argument('--batch_size', default=128, type=int,
                         help='Batch size per GPU (effective batch size is batch_size * accum_iter * # gpus)')
    parser.add_argument('--epochs', default=90, type=int)
    parser.add_argument('--ckpt_freq', default=5, type=int)
-    parser.add_argument('--accum_iter', default=1, type=int,
+    parser.add_argument('--accum_iter', default=16, type=int,
                        help='Accumulate gradient iterations (for increasing the effective batch size under memory constraints)')

    # Model parameters
    parser.add_argument("--compile", action='store_true', default=False, help="compile model with PyTorch 2.0")
    parser.add_argument("--checkpoint_key", default=None, type=str, help="checkpoint key to load")
    parser.add_argument("--prefix", default=None, type=str, help="prefix of the model name")
-    parser.add_argument('--head_type', default=0, choices=[0, 1 ,2], type=int,
-        help="""How to aggress global information.
-        We typically set this to 0 for models with [CLS] token (e.g., DINO), 1 for models encouraging patch semantics e.g. BEiT, 2 for combining mean pool and CLS. 2 works well for all cases. """)
-
+    
    parser.add_argument('--input_size', default=224, type=int,
                        help='images input size')

@@ -138,7 +135,7 @@ def main(args):
    cudnn.benchmark = True
    
    data_loader_val =  Loader(args.val_path, pipelines=ValPipeline(),
-                        batch_size=args.batch_size, num_workers=args.num_workers, 
+                        batch_size=args.batch_size, num_workers=args.num_workers, batches_ahead=1,
                        distributed=args.distributed,seed=args.seed)
    
    global_rank = misc.get_rank()
@@ -234,7 +231,7 @@ def main(args):
    
    order = OrderOption.RANDOM if args.distributed else OrderOption.QUASI_RANDOM
    data_loader_train =  Loader(args.train_path, pipelines=SimplePipeline(),
-                        batch_size=args.batch_size, num_workers=args.num_workers, 
+                        batch_size=args.batch_size, num_workers=args.num_workers, batches_ahead=1,
                        order=order, distributed=args.distributed,seed=args.seed)
    
    for epoch in range(args.start_epoch, args.epochs):