Unverified commit 2c79b01a authored by Georgios Pavlakos, committed by GitHub

Initial commit

parent b91ed5ca
Showing 733 additions and 1 deletion
MIT License
Copyright (c) 2023 UC Regents, Georgios Pavlakos
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# HaMeR: Hand Mesh Recovery
Code repository for the paper:
**Reconstructing Hands in 3D with Transformers**
[Georgios Pavlakos](https://geopavlakos.github.io/), [Dandan Shan](https://ddshan.github.io/), [Ilija Radosavovic](https://people.eecs.berkeley.edu/~ilija/), [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/), [David Fouhey](https://cs.nyu.edu/~fouhey/), [Jitendra Malik](http://people.eecs.berkeley.edu/~malik/)
[![arXiv](https://img.shields.io/badge/arXiv-2312.05251-00ff00.svg)](https://arxiv.org/pdf/2312.05251.pdf) [![Website shields.io](https://img.shields.io/website-up-down-green-red/http/shields.io.svg)](https://geopavlakos.github.io/hamer/) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1rQbQzegFWGVOm1n1d-S6koOWDo7F2ucu?usp=sharing) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/geopavlakos/HaMeR)
![teaser](assets/teaser.jpg)
## Installation
First, you need to clone the repo:
```bash
git clone --recursive git@github.com:geopavlakos/hamer.git
cd hamer
```
We recommend creating a virtual environment for HaMeR. You can use venv:
```bash
python3.10 -m venv .hamer
source .hamer/bin/activate
```
or alternatively conda:
```bash
conda create --name hamer python=3.10
conda activate hamer
```
Then, you can install the rest of the dependencies. This is for CUDA 11.7, but you can adapt accordingly:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
pip install -e .[all]
pip install -v -e third-party/ViTPose
```
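For example, on a machine with CUDA 11.8 the same steps would look like the sketch below; the `cu118` wheel index is only an illustration, so pick the tag that matches your local CUDA/driver setup:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -e .[all]
pip install -v -e third-party/ViTPose
```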
You also need to download the trained models:
```bash
bash fetch_demo_data.sh
```
Besides these files, you also need to download the MANO model. Please visit the [MANO website](https://mano.is.tue.mpg.de) and register to get access to the downloads section. We only require the right hand model. You need to put `MANO_RIGHT.pkl` under the `_DATA/data/mano` folder.
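As a reference, here is a minimal sketch of placing the MANO file, assuming you extracted the official MANO download to `/path/to/mano_v1_2` (that path is illustrative):
```bash
mkdir -p _DATA/data/mano
# copy the right-hand model from wherever you extracted the MANO download
cp /path/to/mano_v1_2/models/MANO_RIGHT.pkl _DATA/data/mano/
```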
## Demo
```bash
python demo.py \
    --img_folder example_data --out_folder demo_out \
    --batch_size=48 --side_view --save_mesh --full_frame
```
## Training
First, download the training data to `./hamer_training_data/` by running:
```bash
bash fetch_training_data.sh
```
Then you can start training using the following command:
```bash
python train.py exp_name=hamer data=mix_all experiment=hamer_vit_transformer trainer=gpu launcher=local
```
Checkpoints and logs will be saved to `./logs/`.
## Acknowledgements
Parts of the code are taken or adapted from the following repos:
- [4DHumans](https://github.com/shubham-goel/4D-Humans)
- [SLAHMR](https://github.com/vye16/slahmr)
- [ProHMR](https://github.com/nkolot/ProHMR)
- [SPIN](https://github.com/nkolot/SPIN)
- [SMPLify-X](https://github.com/vchoutas/smplify-x)
- [HMR](https://github.com/akanazawa/hmr)
- [ViTPose](https://github.com/ViTAE-Transformer/ViTPose)
- [Detectron2](https://github.com/facebookresearch/detectron2)
Additionally, we thank [StabilityAI](https://stability.ai/) for a generous compute grant that enabled this work.
## Citing
If you find this code useful for your research, please consider citing the following paper:
```bibtex
@inproceedings{pavlakos2023reconstructing,
title={Reconstructing Hands in 3{D} with Transformers},
author={Pavlakos, Georgios and Shan, Dandan and Radosavovic, Ilija and Kanazawa, Angjoo and Fouhey, David and Malik, Jitendra},
booktitle={arxiv},
year={2023}
}
```
assets/teaser.jpg (binary image, 1.53 MiB)
demo.py 0 → 100644
from pathlib import Path
import torch
import argparse
import os
import cv2
import numpy as np

from hamer.configs import CACHE_DIR_HAMER
from hamer.models import HAMER, download_models, load_hamer, DEFAULT_CHECKPOINT
from hamer.utils import recursive_to
from hamer.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD
from hamer.utils.renderer import Renderer, cam_crop_to_full

LIGHT_BLUE=(0.65098039, 0.74117647, 0.85882353)

from vitpose_model import ViTPoseModel

import json
from typing import Dict, Optional

def main():
    parser = argparse.ArgumentParser(description='HaMeR demo code')
    parser.add_argument('--checkpoint', type=str, default=DEFAULT_CHECKPOINT, help='Path to pretrained model checkpoint')
    parser.add_argument('--img_folder', type=str, default='images', help='Folder with input images')
    parser.add_argument('--out_folder', type=str, default='out_demo', help='Output folder to save rendered results')
    parser.add_argument('--side_view', dest='side_view', action='store_true', default=False, help='If set, render side view also')
    parser.add_argument('--full_frame', dest='full_frame', action='store_true', default=True, help='If set, render all people together also')
    parser.add_argument('--save_mesh', dest='save_mesh', action='store_true', default=False, help='If set, save meshes to disk also')
    parser.add_argument('--batch_size', type=int, default=1, help='Batch size for inference/fitting')
    parser.add_argument('--rescale_factor', type=float, default=2.0, help='Factor for padding the bbox')
    parser.add_argument('--file_type', nargs='+', default=['*.jpg', '*.png'], help='List of file extensions to consider')
    args = parser.parse_args()

    # Download and load checkpoints
    #download_models(CACHE_DIR_HAMER)
    model, model_cfg = load_hamer(args.checkpoint)

    # Setup HaMeR model
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    model = model.to(device)
    model.eval()

    # Load detector
    from hamer.utils.utils_detectron2 import DefaultPredictor_Lazy
    from detectron2.config import LazyConfig
    import hamer
    cfg_path = Path(hamer.__file__).parent/'configs'/'cascade_mask_rcnn_vitdet_h_75ep.py'
    detectron2_cfg = LazyConfig.load(str(cfg_path))
    detectron2_cfg.train.init_checkpoint = "https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl"
    for i in range(3):
        detectron2_cfg.model.roi_heads.box_predictors[i].test_score_thresh = 0.25
    detector = DefaultPredictor_Lazy(detectron2_cfg)

    # keypoint detector
    cpm = ViTPoseModel(device)

    # Setup the renderer
    renderer = Renderer(model_cfg, faces=model.mano.faces)

    # Make output directory if it does not exist
    os.makedirs(args.out_folder, exist_ok=True)

    # Get all demo images that end with .jpg or .png
    img_paths = [img for end in args.file_type for img in Path(args.img_folder).glob(end)]

    # Iterate over all images in folder
    for img_path in img_paths:
        img_cv2 = cv2.imread(str(img_path))

        # Detect humans in image
        det_out = detector(img_cv2)
        img = img_cv2.copy()[:, :, ::-1]

        det_instances = det_out['instances']
        valid_idx = (det_instances.pred_classes==0) & (det_instances.scores > 0.5)
        pred_bboxes=det_instances.pred_boxes.tensor[valid_idx].cpu().numpy()
        pred_scores=det_instances.scores[valid_idx].cpu().numpy()

        # Detect human keypoints for each person
        vitposes_out = cpm.predict_pose(
            img_cv2,
            [np.concatenate([pred_bboxes, pred_scores[:, None]], axis=1)],
        )

        bboxes = []
        is_right = []

        # Use hands based on hand keypoint detections
        for vitposes in vitposes_out:
            left_hand_keyp = vitposes['keypoints'][-42:-21]
            right_hand_keyp = vitposes['keypoints'][-21:]

            # Reject low-confidence detections
            keyp = left_hand_keyp
            valid = keyp[:,2] > 0.5
            if sum(valid) > 3:
                bbox = [keyp[valid,0].min(), keyp[valid,1].min(), keyp[valid,0].max(), keyp[valid,1].max()]
                bboxes.append(bbox)
                is_right.append(0)
            keyp = right_hand_keyp
            valid = keyp[:,2] > 0.5
            if sum(valid) > 3:
                bbox = [keyp[valid,0].min(), keyp[valid,1].min(), keyp[valid,0].max(), keyp[valid,1].max()]
                bboxes.append(bbox)
                is_right.append(1)

        if len(bboxes) == 0:
            continue

        boxes = np.stack(bboxes)
        right = np.stack(is_right)

        # Run reconstruction on all detected hands
        dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=args.rescale_factor)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=args.batch_size, shuffle=False, num_workers=0)

        all_verts = []
        all_cam_t = []
        all_right = []

        for batch in dataloader:
            batch = recursive_to(batch, device)
            with torch.no_grad():
                out = model(batch)

            multiplier = (2*batch['right']-1)
            pred_cam = out['pred_cam']
            pred_cam[:,1] = multiplier*pred_cam[:,1]
            box_center = batch["box_center"].float()
            box_size = batch["box_size"].float()
            img_size = batch["img_size"].float()
            multiplier = (2*batch['right']-1)
            scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
            pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy()

            # Render the result
            batch_size = batch['img'].shape[0]
            for n in range(batch_size):
                # Get filename from path img_path
                img_fn, _ = os.path.splitext(os.path.basename(img_path))
                person_id = int(batch['personid'][n])
                white_img = (torch.ones_like(batch['img'][n]).cpu() - DEFAULT_MEAN[:,None,None]/255) / (DEFAULT_STD[:,None,None]/255)
                input_patch = batch['img'][n].cpu() * (DEFAULT_STD[:,None,None]/255) + (DEFAULT_MEAN[:,None,None]/255)
                input_patch = input_patch.permute(1,2,0).numpy()

                regression_img = renderer(out['pred_vertices'][n].detach().cpu().numpy(),
                                          out['pred_cam_t'][n].detach().cpu().numpy(),
                                          batch['img'][n],
                                          mesh_base_color=LIGHT_BLUE,
                                          scene_bg_color=(1, 1, 1),
                                          )

                if args.side_view:
                    side_img = renderer(out['pred_vertices'][n].detach().cpu().numpy(),
                                        out['pred_cam_t'][n].detach().cpu().numpy(),
                                        white_img,
                                        mesh_base_color=LIGHT_BLUE,
                                        scene_bg_color=(1, 1, 1),
                                        side_view=True)
                    final_img = np.concatenate([input_patch, regression_img, side_img], axis=1)
                else:
                    final_img = np.concatenate([input_patch, regression_img], axis=1)

                cv2.imwrite(os.path.join(args.out_folder, f'{img_fn}_{person_id}.png'), 255*final_img[:, :, ::-1])

                # Add all verts and cams to list
                verts = out['pred_vertices'][n].detach().cpu().numpy()
                is_right = batch['right'][n].cpu().numpy()
                verts[:,0] = (2*is_right-1)*verts[:,0]
                cam_t = pred_cam_t_full[n]
                all_verts.append(verts)
                all_cam_t.append(cam_t)
                all_right.append(is_right)

                # Save all meshes to disk
                if args.save_mesh:
                    camera_translation = cam_t.copy()
                    tmesh = renderer.vertices_to_trimesh(verts, camera_translation, LIGHT_BLUE, is_right=is_right)
                    tmesh.export(os.path.join(args.out_folder, f'{img_fn}_{person_id}.obj'))

        # Render front view
        if args.full_frame and len(all_verts) > 0:
            misc_args = dict(
                mesh_base_color=LIGHT_BLUE,
                scene_bg_color=(1, 1, 1),
                focal_length=scaled_focal_length,
            )
            cam_view = renderer.render_rgba_multiple(all_verts, cam_t=all_cam_t, render_res=img_size[n], is_right=all_right, **misc_args)

            # Overlay image
            input_img = img_cv2.astype(np.float32)[:,:,::-1]/255.0
            input_img = np.concatenate([input_img, np.ones_like(input_img[:,:,:1])], axis=2) # Add alpha channel
            input_img_overlay = input_img[:,:,:3] * (1-cam_view[:,:,3:]) + cam_view[:,:,:3] * cam_view[:,:,3:]

            cv2.imwrite(os.path.join(args.out_folder, f'{img_fn}_all.jpg'), 255*input_img_overlay[:, :, ::-1])


if __name__ == '__main__':
    main()
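If `--save_mesh` is set, the demo writes one `.obj` file per detected hand into the output folder. A minimal sketch for inspecting such a file with trimesh (the filename below is illustrative, and `mesh.show()` assumes a viewer backend such as pyglet is installed):
```python
import trimesh

# Illustrative path: produced by the demo above for image test1.jpg, person/hand id 0
mesh = trimesh.load('demo_out/test1_0.obj')
print(mesh.vertices.shape, mesh.faces.shape)  # MANO hand meshes have 778 vertices
mesh.show()  # opens an interactive viewer if a backend is available
```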
example_data/test1.jpg (101 KiB)
example_data/test2.jpg (32.9 KiB)
example_data/test3.jpg (86.2 KiB)
example_data/test4.jpg (488 KiB)
example_data/test5.jpg (166 KiB)
fetch_demo_data.sh
wget https://www.dropbox.com/s/6zejmxu0aur3568/hamer_demo_data.tar.gz
tar --warning=no-unknown-keyword --exclude=".*" -xvf hamer_demo_data.tar.gz
fetch_training_data.sh
# Downloading all tars
wget -O hamer_training_data_part1.tar.gz https://www.dropbox.com/scl/fi/f249h32hd35x78l058ofy/hamer_training_data_part1.tar.gz?rlkey=puuvwg5ngueaxl4xxwf3yd15a
wget -O hamer_training_data_part2.tar.gz https://www.dropbox.com/scl/fi/l9l5udalchu0mh4qxnw2t/hamer_training_data_part2.tar.gz?rlkey=i0n2lzix4q6jxmhm4sr5rtmkt
wget -O hamer_training_data_part3.tar.gz https://www.dropbox.com/scl/fi/6lamcbwt79ri0oj4knwm3/hamer_training_data_part3.tar.gz?rlkey=j5y7ea7xrlu440ud12otaj2ne
wget -O hamer_training_data_part4a.tar.gz https://www.dropbox.com/scl/fi/vp6cw7he8t0eigjf6001l/hamer_training_data_part4a.tar.gz?rlkey=wylmufft4a5nq3yxep2olifrk
wget -O hamer_training_data_part4b.tar.gz https://www.dropbox.com/scl/fi/vyjasngr67ru14fb8s108/hamer_training_data_part4b.tar.gz?rlkey=qgotg1v9lkgo5eu78gh8b007t
wget -O hamer_training_data_part4c.tar.gz https://www.dropbox.com/scl/fi/nfvz5zpcmhz8hkwzc6ji4/hamer_training_data_part4c.tar.gz?rlkey=ygh0wvse04twhh1ri3xiw2sag
# Extracting all tars (the downloaded archives end in .tar.gz)
for f in hamer_training_data_part*.tar.gz; do
    tar --warning=no-unknown-keyword --exclude=".*" -xvf $f
done
import os
from typing import Dict
from yacs.config import CfgNode as CN
CACHE_DIR_HAMER = "./_DATA"
def to_lower(x: Dict) -> Dict:
    """
    Convert all dictionary keys to lowercase
    Args:
      x (dict): Input dictionary
    Returns:
      dict: Output dictionary with all keys converted to lowercase
    """
    return {k.lower(): v for k, v in x.items()}
_C = CN(new_allowed=True)
_C.GENERAL = CN(new_allowed=True)
_C.GENERAL.RESUME = True
_C.GENERAL.TIME_TO_RUN = 3300
_C.GENERAL.VAL_STEPS = 100
_C.GENERAL.LOG_STEPS = 100
_C.GENERAL.CHECKPOINT_STEPS = 20000
_C.GENERAL.CHECKPOINT_DIR = "checkpoints"
_C.GENERAL.SUMMARY_DIR = "tensorboard"
_C.GENERAL.NUM_GPUS = 1
_C.GENERAL.NUM_WORKERS = 4
_C.GENERAL.MIXED_PRECISION = True
_C.GENERAL.ALLOW_CUDA = True
_C.GENERAL.PIN_MEMORY = False
_C.GENERAL.DISTRIBUTED = False
_C.GENERAL.LOCAL_RANK = 0
_C.GENERAL.USE_SYNCBN = False
_C.GENERAL.WORLD_SIZE = 1
_C.TRAIN = CN(new_allowed=True)
_C.TRAIN.NUM_EPOCHS = 100
_C.TRAIN.BATCH_SIZE = 32
_C.TRAIN.SHUFFLE = True
_C.TRAIN.WARMUP = False
_C.TRAIN.NORMALIZE_PER_IMAGE = False
_C.TRAIN.CLIP_GRAD = False
_C.TRAIN.CLIP_GRAD_VALUE = 1.0
_C.LOSS_WEIGHTS = CN(new_allowed=True)
_C.DATASETS = CN(new_allowed=True)
_C.MODEL = CN(new_allowed=True)
_C.MODEL.IMAGE_SIZE = 224
_C.EXTRA = CN(new_allowed=True)
_C.EXTRA.FOCAL_LENGTH = 5000
_C.DATASETS.CONFIG = CN(new_allowed=True)
_C.DATASETS.CONFIG.SCALE_FACTOR = 0.3
_C.DATASETS.CONFIG.ROT_FACTOR = 30
_C.DATASETS.CONFIG.TRANS_FACTOR = 0.02
_C.DATASETS.CONFIG.COLOR_SCALE = 0.2
_C.DATASETS.CONFIG.ROT_AUG_RATE = 0.6
_C.DATASETS.CONFIG.TRANS_AUG_RATE = 0.5
_C.DATASETS.CONFIG.DO_FLIP = False
_C.DATASETS.CONFIG.FLIP_AUG_RATE = 0.5
_C.DATASETS.CONFIG.EXTREME_CROP_AUG_RATE = 0.10
def default_config() -> CN:
    """
    Get a yacs CfgNode object with the default config values.
    """
    # Return a clone so that the defaults will not be altered
    # This is for the "local variable" use pattern
    return _C.clone()
def dataset_config() -> CN:
    """
    Get dataset config file
    Returns:
      CfgNode: Dataset config as a yacs CfgNode object.
    """
    cfg = CN(new_allowed=True)
    config_file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'datasets_tar.yaml')
    cfg.merge_from_file(config_file)
    cfg.freeze()
    return cfg
def get_config(config_file: str, merge: bool = True, update_cachedir: bool = False) -> CN:
    """
    Read a config file and optionally merge it with the default config file.
    Args:
      config_file (str): Path to config file.
      merge (bool): Whether to merge with the default config or not.
    Returns:
      CfgNode: Config as a yacs CfgNode object.
    """
    if merge:
        cfg = default_config()
    else:
        cfg = CN(new_allowed=True)
    cfg.merge_from_file(config_file)

    if update_cachedir:
        def update_path(path: str) -> str:
            if os.path.isabs(path):
                return path
            return os.path.join(CACHE_DIR_HAMER, path)

        cfg.MANO.MODEL_PATH = update_path(cfg.MANO.MODEL_PATH)
        cfg.MANO.MEAN_PARAMS = update_path(cfg.MANO.MEAN_PARAMS)

    cfg.freeze()
    return cfg
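A short sketch of how these helpers might be used; `logs/my_run/model_config.yaml` is a hypothetical path, and any yacs-compatible config file that defines a `MANO` section would work for the `update_cachedir` branch:
```python
from hamer.configs import default_config, dataset_config, get_config

# Built-in defaults (a clone of _C, so local edits do not leak into the module)
cfg = default_config()
print(cfg.MODEL.IMAGE_SIZE, cfg.EXTRA.FOCAL_LENGTH)  # 224 5000

# Webdataset shard definitions from datasets_tar.yaml
datasets = dataset_config()

# Merge an experiment config over the defaults and remap MANO paths into ./_DATA
cfg = get_config('logs/my_run/model_config.yaml', merge=True, update_cachedir=True)
```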
## coco_loader_lsj.py
import detectron2.data.transforms as T
from detectron2 import model_zoo
from detectron2.config import LazyCall as L
# Data using LSJ
image_size = 1024
dataloader = model_zoo.get_config("common/data/coco.py").dataloader
dataloader.train.mapper.augmentations = [
    L(T.RandomFlip)(horizontal=True),  # flip first
    L(T.ResizeScale)(
        min_scale=0.1, max_scale=2.0, target_height=image_size, target_width=image_size
    ),
    L(T.FixedSizeCrop)(crop_size=(image_size, image_size), pad=False),
]
dataloader.train.mapper.image_format = "RGB"
dataloader.train.total_batch_size = 64
# recompute boxes due to cropping
dataloader.train.mapper.recompute_boxes = True
dataloader.test.mapper.augmentations = [
    L(T.ResizeShortestEdge)(short_edge_length=image_size, max_size=image_size),
]
from functools import partial
from fvcore.common.param_scheduler import MultiStepParamScheduler
from detectron2 import model_zoo
from detectron2.config import LazyCall as L
from detectron2.solver import WarmupParamScheduler
from detectron2.modeling.backbone.vit import get_vit_lr_decay_rate
# mask_rcnn_vitdet_b_100ep.py
model = model_zoo.get_config("common/models/mask_rcnn_vitdet.py").model
# Initialization and trainer settings
train = model_zoo.get_config("common/train.py").train
train.amp.enabled = True
train.ddp.fp16_compression = True
train.init_checkpoint = "detectron2://ImageNetPretrained/MAE/mae_pretrain_vit_base.pth"
# Schedule
# 100 ep = 184375 iters * 64 images/iter / 118000 images/ep
train.max_iter = 184375
lr_multiplier = L(WarmupParamScheduler)(
    scheduler=L(MultiStepParamScheduler)(
        values=[1.0, 0.1, 0.01],
        milestones=[163889, 177546],
        num_updates=train.max_iter,
    ),
    warmup_length=250 / train.max_iter,
    warmup_factor=0.001,
)
# Optimizer
optimizer = model_zoo.get_config("common/optim.py").AdamW
optimizer.params.lr_factor_func = partial(get_vit_lr_decay_rate, num_layers=12, lr_decay_rate=0.7)
optimizer.params.overrides = {"pos_embed": {"weight_decay": 0.0}}
# cascade_mask_rcnn_vitdet_b_100ep.py
from detectron2.config import LazyCall as L
from detectron2.layers import ShapeSpec
from detectron2.modeling.box_regression import Box2BoxTransform
from detectron2.modeling.matcher import Matcher
from detectron2.modeling.roi_heads import (
    FastRCNNOutputLayers,
    FastRCNNConvFCHead,
    CascadeROIHeads,
)
# arguments that don't exist for Cascade R-CNN
[model.roi_heads.pop(k) for k in ["box_head", "box_predictor", "proposal_matcher"]]
model.roi_heads.update(
    _target_=CascadeROIHeads,
    box_heads=[
        L(FastRCNNConvFCHead)(
            input_shape=ShapeSpec(channels=256, height=7, width=7),
            conv_dims=[256, 256, 256, 256],
            fc_dims=[1024],
            conv_norm="LN",
        )
        for _ in range(3)
    ],
    box_predictors=[
        L(FastRCNNOutputLayers)(
            input_shape=ShapeSpec(channels=1024),
            test_score_thresh=0.05,
            box2box_transform=L(Box2BoxTransform)(weights=(w1, w1, w2, w2)),
            cls_agnostic_bbox_reg=True,
            num_classes="${...num_classes}",
        )
        for (w1, w2) in [(10, 5), (20, 10), (30, 15)]
    ],
    proposal_matchers=[
        L(Matcher)(thresholds=[th], labels=[0, 1], allow_low_quality_matches=False)
        for th in [0.5, 0.6, 0.7]
    ],
)
# cascade_mask_rcnn_vitdet_h_75ep.py
from functools import partial
train.init_checkpoint = "detectron2://ImageNetPretrained/MAE/mae_pretrain_vit_huge_p14to16.pth"
model.backbone.net.embed_dim = 1280
model.backbone.net.depth = 32
model.backbone.net.num_heads = 16
model.backbone.net.drop_path_rate = 0.5
# 7, 15, 23, 31 for global attention
model.backbone.net.window_block_indexes = (
    list(range(0, 7)) + list(range(8, 15)) + list(range(16, 23)) + list(range(24, 31))
)
optimizer.params.lr_factor_func = partial(get_vit_lr_decay_rate, lr_decay_rate=0.9, num_layers=32)
optimizer.params.overrides = {}
optimizer.params.weight_decay_norm = None
train.max_iter = train.max_iter * 3 // 4 # 100ep -> 75ep
lr_multiplier.scheduler.milestones = [
    milestone * 3 // 4 for milestone in lr_multiplier.scheduler.milestones
]
lr_multiplier.scheduler.num_updates = train.max_iter
datasets_tar.yaml
FREIHAND-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/freihand-train/{000000..000130}.tar
  epoch_size: 130_240
INTERHAND26M-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/interhand26m-train/{000000..001056}.tar
  epoch_size: 1_424_632
HALPE-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/halpe-train/{000000..000022}.tar
  epoch_size: 34_289
COCOW-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/cocow-train/{000000..000036}.tar
  epoch_size: 78_666
MTC-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/mtc-train/{000000..000306}.tar
  epoch_size: 363_947
RHD-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/rhd-train/{000000..000041}.tar
  epoch_size: 61_705
MPIINZSL-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/mpiinzsl-train/{000000..000015}.tar
  epoch_size: 15_184
HO3D-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/ho3d-train/{000000..000083}.tar
  epoch_size: 83_325
H2O3D-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/h2o3d-train/{000000..000060}.tar
  epoch_size: 121_996
DEX-TRAIN:
  TYPE: ImageDataset
  URLS: hamer_training_data/dataset_tars/dex-train/{000000..000406}.tar
  epoch_size: 406_888
FREIHAND-MOCAP:
  DATASET_FILE: hamer_training_data/freihand_mocap.npz
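The `URLS` entries describe ranges of tar shards with a `{start..end}` pattern. A small, self-contained sketch of expanding such a pattern into explicit shard paths (this is illustrative plain Python, not the loader the training code actually uses):
```python
import re

def expand_shards(pattern: str) -> list[str]:
    """Expand a '{000000..000130}'-style range into an explicit list of shard paths."""
    m = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if m is None:
        return [pattern]
    lo, hi = m.group(1), m.group(2)
    width = len(lo)  # keep the zero-padding of the original pattern
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    return [prefix + f"{i:0{width}d}" + suffix for i in range(int(lo), int(hi) + 1)]

shards = expand_shards("hamer_training_data/dataset_tars/freihand-train/{000000..000130}.tar")
print(len(shards), shards[0])  # 131 hamer_training_data/dataset_tars/freihand-train/000000.tar
```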
# @package _global_
defaults:
  - /data_filtering: low1

DATASETS:
  TRAIN:
    FREIHAND-TRAIN:
      WEIGHT: 0.25
    INTERHAND26M-TRAIN:
      WEIGHT: 0.25
    MTC-TRAIN:
      WEIGHT: 0.1
    RHD-TRAIN:
      WEIGHT: 0.05
    COCOW-TRAIN:
      WEIGHT: 0.1
    HALPE-TRAIN:
      WEIGHT: 0.05
    MPIINZSL-TRAIN:
      WEIGHT: 0.05
    HO3D-TRAIN:
      WEIGHT: 0.05
    H2O3D-TRAIN:
      WEIGHT: 0.05
    DEX-TRAIN:
      WEIGHT: 0.05
  VAL:
    FREIHAND-TRAIN:
      WEIGHT: 1.0
  MOCAP: FREIHAND-MOCAP
# @package _global_
DATASETS:
  # Data filtering during training
  SUPPRESS_KP_CONF_THRESH: 0.3
  FILTER_NUM_KP: 4
  FILTER_NUM_KP_THRESH: 0.0
  FILTER_REPROJ_THRESH: 31000
  SUPPRESS_BETAS_THRESH: 3.0
  SUPPRESS_BAD_POSES: False
  POSES_BETAS_SIMULTANEOUS: True
  FILTER_NO_POSES: False  # If True, filters images that don't have poses
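Roughly speaking, thresholds like these control keypoint suppression and per-sample filtering. The snippet below is only a plausible illustration of that idea, not the repository's actual filtering code:
```python
import numpy as np

SUPPRESS_KP_CONF_THRESH = 0.3  # zero out keypoint confidences below this value
FILTER_NUM_KP = 4              # require at least this many confident keypoints
FILTER_NUM_KP_THRESH = 0.0     # confidence level used for the count above

def filter_sample(keypoints_2d: np.ndarray):
    """keypoints_2d: (N, 3) array of (x, y, conf). Returns cleaned keypoints, or None to drop the sample."""
    kp = keypoints_2d.copy()
    # Suppress unreliable detections instead of trusting noisy annotations
    kp[kp[:, 2] < SUPPRESS_KP_CONF_THRESH, 2] = 0.0
    # Drop the sample if too few confident keypoints remain
    if (kp[:, 2] > FILTER_NUM_KP_THRESH).sum() < FILTER_NUM_KP:
        return None
    return kp
```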
# @package _global_
MANO:
  DATA_DIR: _DATA/data/
  MODEL_PATH: ${MANO.DATA_DIR}/mano
  GENDER: neutral
  NUM_HAND_JOINTS: 15
  MEAN_PARAMS: ${MANO.DATA_DIR}/mano_mean_params.npz
  CREATE_BODY_POSE: FALSE
EXTRA:
  FOCAL_LENGTH: 5000
  NUM_LOG_IMAGES: 4
  NUM_LOG_SAMPLES_PER_IMAGE: 8
  PELVIS_IND: 0
DATASETS:
  BETAS_REG: True
  CONFIG:
    SCALE_FACTOR: 0.3
    ROT_FACTOR: 30
    TRANS_FACTOR: 0.02
    COLOR_SCALE: 0.2
    ROT_AUG_RATE: 0.6
    TRANS_AUG_RATE: 0.5
    DO_FLIP: False
    FLIP_AUG_RATE: 0.0
    EXTREME_CROP_AUG_RATE: 0.0
    EXTREME_CROP_AUG_LEVEL: 1
# @package _global_
defaults:
  - default.yaml

GENERAL:
  TOTAL_STEPS: 1_000_000
  LOG_STEPS: 1000
  VAL_STEPS: 1000
  CHECKPOINT_STEPS: 1000
  CHECKPOINT_SAVE_TOP_K: 1
  NUM_WORKERS: 25
  PREFETCH_FACTOR: 2

TRAIN:
  LR: 1e-5
  WEIGHT_DECAY: 1e-4
  BATCH_SIZE: 8
  LOSS_REDUCTION: mean
  NUM_TRAIN_SAMPLES: 2
  NUM_TEST_SAMPLES: 64
  POSE_2D_NOISE_RATIO: 0.01
  SMPL_PARAM_NOISE_RATIO: 0.005

MODEL:
  IMAGE_SIZE: 256
  IMAGE_MEAN: [0.485, 0.456, 0.406]
  IMAGE_STD: [0.229, 0.224, 0.225]
  BACKBONE:
    TYPE: vit
    PRETRAINED_WEIGHTS: hamer_training_data/vitpose_backbone.pth
  MANO_HEAD:
    TYPE: transformer_decoder
    IN_CHANNELS: 2048
    TRANSFORMER_DECODER:
      depth: 6
      heads: 8
      mlp_dim: 1024
      dim_head: 64
      dropout: 0.0
      emb_dropout: 0.0
      norm: layer
      context_dim: 1280 # from vitpose-H

LOSS_WEIGHTS:
  KEYPOINTS_3D: 0.05
  KEYPOINTS_2D: 0.01
  GLOBAL_ORIENT: 0.001
  HAND_POSE: 0.001
  BETAS: 0.0005
  ADVERSARIAL: 0.0005
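These weights scale the individual loss terms before they are summed into the training objective. A generic sketch of that pattern (the tensors below are placeholders; the real terms come from the model's predictions and ground truth):
```python
import torch

loss_weights = {
    'KEYPOINTS_3D': 0.05, 'KEYPOINTS_2D': 0.01,
    'GLOBAL_ORIENT': 0.001, 'HAND_POSE': 0.001,
    'BETAS': 0.0005, 'ADVERSARIAL': 0.0005,
}
# Placeholder per-term losses; during training these are computed per batch
losses = {name: torch.tensor(1.0) for name in loss_weights}
total_loss = sum(weight * losses[name] for name, weight in loss_weights.items())
```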
# disable python warnings if they annoy you
ignore_warnings: False
# ask user for tags if none are provided in the config
enforce_tags: True
# pretty print config tree at the start of the run using Rich library
print_config: True