gent authored
20b883cb

ViT Toolkit

We aim to provide a toolkit for evaluating and analyzing Vision Transformers. Install the package with

pip install git+https://gitlab.surrey.ac.uk/jw02425/vitoolkit.git

Evaluation

There are many protocols for evaluating the performance of a model. We provide a set of evaluation scripts for different tasks. Use vitrun to launch the evaluation.

Finetune Protocol for Image Classification

vitrun eval_cls.py --data_location=$data_path -w <weights.pth> --gin key=value

Linear Probe Protocol for Image Classification

vitrun eval_linear.py --data_location=$data_path -w <weights.pth> --gin key=value

More evaluation scripts can be found in the evaluation directory.

Flexible Configuration

We use gin-config to configure the model and the training process. You can change the configuration either by passing gin files via --cfgs <file1> <file2> ... or by overriding bindings directly with --gin key=value.
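A --gin key=value override is just an assignment of a value to a configuration key. As a rough illustration of how such flags map onto a configuration dict (parse_overrides is a hypothetical helper for this sketch, not part of the toolkit or of gin-config):

```python
import ast

def parse_overrides(pairs):
    """Parse a list of "key=value" strings into a dict.

    Values are interpreted with ast.literal_eval when possible
    (so "0.001" and "True" become a float and a bool), otherwise
    they are kept as raw strings.
    """
    overrides = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        try:
            overrides[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            overrides[key] = value
    return overrides

# e.g. the flags --gin lr=0.001 --gin model=vit_base become:
print(parse_overrides(["lr=0.001", "model=vit_base"]))
```

gin itself supports far richer bindings (scopes, macros, references); this only shows the key=value case used on the command line above.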

Pretrained weights

The pretrained weights can be one of the following:

  • a local file
  • a url starting with https://
  • an artifact path starting with artifact:
  • a run path starting with wandb:<entity>/<project>/<run>, where weights.pth will be used as the weights file.
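The four source forms above can be told apart by their prefix. A minimal dispatch sketch (the function name and return labels are illustrative assumptions, not the toolkit's API):

```python
def classify_weights_source(path):
    """Return which of the four supported weight-source kinds `path` is."""
    if path.startswith("https://"):
        return "url"
    if path.startswith("artifact:"):
        return "artifact"
    if path.startswith("wandb:"):
        # wandb:<entity>/<project>/<run>; weights.pth inside the run is used
        return "wandb_run"
    # anything else is treated as a local file path
    return "local_file"
```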

You can further specify a key and a prefix to extract the weights from a checkpoint file. For example, --pretrained_weights=ckpt.pth --checkpoint_key model --prefix module. will extract the state dict stored under the key "model" in the checkpoint file and strip the prefix "module." from its parameter names.
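The extraction described above — take the state dict stored under the checkpoint key and drop the prefix from every parameter name — can be sketched in plain Python (shown on a plain dict for brevity; a real checkpoint would be loaded with torch.load, and extract_state_dict is a hypothetical helper, not the toolkit's function):

```python
def extract_state_dict(checkpoint, key="model", prefix="module."):
    """Pull the state dict stored under `key` (falling back to the
    whole checkpoint) and strip `prefix` from parameter names."""
    state = checkpoint.get(key, checkpoint)
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state.items()
    }

# A toy checkpoint as saved by a DataParallel training run:
ckpt = {"model": {"module.head.weight": 1, "module.head.bias": 2}}
print(extract_state_dict(ckpt))  # → {'head.weight': 1, 'head.bias': 2}
```

The "module." prefix typically comes from torch.nn.DataParallel wrapping, which is why stripping it is needed before loading the weights into a bare model.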

HPC

Condor

Submit jobs with condor_submit, for example:

condor_submit condor/eval_weka_cls.submit model_dir=outputs/dinosara/base ARCH=vit_base
condor_submit condor/eval_weka_seg.submit model_dir=outputs/dinosara/base

Test examples

We provide some simple examples.

Attention Visualization

To see the self-attention map and the feature map of a given image, run

python bin/viz_vit.py --arch vit_base --pretrained_weights <checkpoint.pth> --img imgs/sample1.JPEG

CAM Visualization

Grad-CAM is a useful tool for diagnosing model predictions. We use pytorch-grad-cam to visualize the image regions the model focuses on for classification.

python bin/grad_cam.py --arch=vit_base --method=scorecam --pretrained_weights=<> --img imgs/sample1.JPEG --output_img=<>

Evaluation on Condor

Run a group of experiments:

condor_submit condor/eval_stornext_cls.submit model_dir=../SiT/outputs/imagenet/sit-ViT_B head_type=0