ViT toolkit
We aim to provide a toolkit for evaluating and analyzing Vision Transformers. Install the package with:
pip install git+https://gitlab.surrey.ac.uk/jw02425/vitoolkit.git
Evaluation
There are many protocols for evaluating the performance of a model. We provide a set of evaluation scripts for different tasks; use vitrun to launch them.
Finetune Protocol for Image Classification
vitrun eval_cls.py --data_location=$data_path -w <weights.pth> --gin key=value
Linear Probe Protocol for Image Classification
vitrun eval_linear.py --data_location=$data_path -w <weights.pth> --gin key=value
More evaluation scripts can be found in evaluation.
Flexible Configuration
We use gin-config to configure the model and the training process. You can change the configuration by passing gin files with --cfgs <file1> <file2> ... or by overriding individual bindings with --gin key=value.
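For reference, a gin file passed via --cfgs could look like the fragment below. The binding names here are hypothetical, shown only to illustrate the key=value syntax; they are not the toolkit's actual configuration keys.

```
# Hypothetical gin file (e.g. configs/finetune.gin).
# Each line binds a value to a configurable parameter.
train.lr = 1e-3
train.epochs = 100
Model.arch = 'vit_base'
```

The same bindings can be set on the command line, e.g. --gin train.lr=1e-3.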
Pretrained weights
The pretrained weights can be one of the following:
- a local file
- a URL starting with https://
- an artifact path starting with artifact:
- a run path starting with wandb:<entity>/<project>/<run>, where weights.pth will be used as the weights file
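A weights argument can therefore be dispatched on its prefix. A minimal sketch of such a resolver, using a hypothetical helper name (not the toolkit's actual loading code):

```python
def classify_weights_source(path: str) -> str:
    """Classify where a pretrained-weights argument points.

    Hypothetical helper illustrating the supported source types;
    the toolkit's real resolver may differ.
    """
    if path.startswith("https://"):
        return "url"          # downloaded from the web
    if path.startswith("artifact:"):
        return "artifact"     # fetched from an artifact store
    if path.startswith("wandb:"):
        return "wandb_run"    # weights.pth pulled from the W&B run
    return "local"            # treated as a local file path
```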
You can further specify a key and prefix to extract the weights from a checkpoint file. For example, --pretrained_weights=ckpt.pth --checkpoint_key model --prefix module. will extract the state dict stored under the key "model" in the checkpoint and strip the prefix "module." from its keys.
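The extraction step can be sketched in plain Python; `extract_state_dict` is a hypothetical name, shown only to illustrate what the key and prefix options do:

```python
def extract_state_dict(ckpt: dict, checkpoint_key: str = "model",
                       prefix: str = "module.") -> dict:
    """Pull a state dict out of a checkpoint and strip a key prefix.

    Hypothetical helper mirroring --checkpoint_key / --prefix;
    falls back to the whole checkpoint if the key is absent.
    """
    state = ckpt.get(checkpoint_key, ckpt)
    return {
        k[len(prefix):] if k.startswith(prefix) else k: v
        for k, v in state.items()
    }
```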
HPC commands
Condor
condor_submit condor/eval_weka_cls.submit model_dir=outputs/dinosara/base ARCH=vit_base
condor_submit condor/eval_weka_seg.submit model_dir=outputs/dinosara/base
Test examples
We provide some simple examples.
To see the self-attention map and the feature map of a given image, run
python bin/viz_vit.py --arch vit_base --pretrained_weights <checkpoint.pth> --img imgs/sample1.JPEG
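Conceptually, the self-attention map the script renders is the row-softmax of scaled query-key dot products. A minimal, framework-free sketch of that computation (not the script's actual code):

```python
import math

def attention_map(q, k):
    """Row-softmax attention matrix from query/key vectors.

    q, k: lists of equal-length vectors, one per token.
    Returns A where A[i][j] is how much token i attends to token j.
    Conceptual sketch only; the toolkit uses the model's own layers.
    """
    d = len(q[0])
    # scaled dot-product scores
    scores = [[sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d)
               for j in range(len(k))] for i in range(len(q))]
    # numerically stable softmax over each row
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out
```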
CAM Visualization
Grad-CAM is an important tool for diagnosing model predictions. We use pytorch-grad-cam to visualize the image regions the model focuses on for classification.
python bin/grad_cam.py --arch=vit_base --method=scorecam --pretrained_weights=<> --img imgs/sample1.JPEG --output_img=<>
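For intuition, Grad-CAM weights each activation channel by its average gradient, sums the weighted channels, and applies a ReLU. A toy sketch on plain nested lists (a hypothetical helper illustrating the idea, not the pytorch-grad-cam API):

```python
def grad_cam(activations, grads):
    """Toy Grad-CAM over lists of 2D channel maps.

    activations, grads: lists of HxW maps (lists of lists), one per
    channel. Each channel is weighted by its global-average-pooled
    gradient, the weighted maps are summed, and a ReLU keeps only
    positively contributing regions.
    """
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for act, grad in zip(activations, grads):
        # channel weight = mean of the gradient over the map
        wk = sum(sum(row) for row in grad) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * act[i][j]
    # ReLU: keep regions that positively support the class
    return [[max(0.0, v) for v in row] for row in cam]
```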
Evaluation on Condor
Run a group of experiments:
condor_submit condor/eval_stornext_cls.submit model_dir=../SiT/outputs/imagenet/sit-ViT_B head_type=0