diff --git a/README.md b/README.md
index 9bffda427fbe25a4e1cf14bc380455261824acf7..0367d86f5d38270a1f15ea43af47a2ed850a498a 100644
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@ This repository is the official implementation of the [CVPR 2022](https://cvpr20
 EMOCA takes a single in-the-wild image as input and reconstructs a 3D face with sufficient facial expression detail to convey the emotional state of the input image. EMOCA advances the state-of-the-art monocular face reconstruction in-the-wild, putting emphasis on accurate capture of emotional content. The official project page is [here](https://emoca.is.tue.mpg.de/index.html).
 
-## !!! UPDATES !!!
+## !!! UPDATE !!!
 
 **EMOCA v2 is now out.**
 
 Compared to the original model it produces:
@@ -68,7 +68,7 @@ This is achieved by:
 2) Using absolute landmark loss in combination with the relative losses (as opposed to only relative landmark losses in EMOCA v1)
 3) Incorporating perceptual lip reading loss. Inspired by [spectre](https://filby89.github.io/spectre/). Big shout-out to these guys!
 
-You will have to upgrade to the new environment in order to use EMOCA v2. Please follow the steps bellow to install the package. Then, go to [EMOCA](gdl_apps/EMOCA) .
+You will have to upgrade to the new environment in order to use EMOCA v2. Please follow the steps below to install the package. Then, go to the [EMOCA](gdl_apps/EMOCA) subfolder and follow the steps described there.
 
 ## EMOCA project
 The training and testing script for EMOCA can be found in this subfolder:
diff --git a/gdl_apps/EMOCA/README.md b/gdl_apps/EMOCA/README.md
index 77106de37634e4dde6e8b95179b4183d7d607918..446467a0a36c3a2db030d806275b7c7449737229 100644
--- a/gdl_apps/EMOCA/README.md
+++ b/gdl_apps/EMOCA/README.md
@@ -14,6 +14,31 @@ input image.
 
 <p align="center">Top row: input images. Middle row: coarse shape reconstruction. Bottom row: reconstruction with detailed displacements.<p align="center">
 
+## !!! UPDATE !!!
+**EMOCA v2 is now out.**
+
+Compared to the original model it produces:
+
+1) Much better lip and eye alignment
+2) Much better lip articulation
+
+This is achieved by:
+1) Using a subset of mediapipe landmarks for mouth, eyes and eyebrows (as opposed to FAN landmarks that EMOCA v1 uses)
+2) Using absolute landmark loss in combination with the relative losses (as opposed to only relative landmark losses in EMOCA v1)
+3) Incorporating perceptual lip reading loss. Inspired by [spectre](https://filby89.github.io/spectre/). Big shout-out to these guys!
+
+There are now multiple EMOCA models that you can use for reconstruction. You can specify their names in the demo scripts using the `--model_name <model>` flag.
+
+The available models are:
+1) `EMOCA` - the original `EMOCA` version
+2) `EMOCA_v2_mp` - EMOCA v2 trained with mediapipe landmarks (instead of FAN) but without the lip reading loss
+3) `EMOCA_v2_lr_cos_1.5` - EMOCA v2 trained with mediapipe landmarks and with the lip reading loss (cosine similarity on lip reading features, similarly to SPECTRE)
+4) `EMOCA_v2_lr_mse_20` - (default) EMOCA v2 trained with mediapipe landmarks and with the lip reading loss (MSE on lip reading features)
+
+Notes:
+The SPECTRE paper uses a cosine similarity metric on lip reading features for supervision. In practice, we found that the cosine similarity loss can sometimes be artifact-prone (over-exaggerated lip motion). This is the `EMOCA_v2_lr_cos_1.5` model. We found supervision by the mean squared error metric to be more stable in this regard and hence we recommend using the `EMOCA_v2_lr_mse_20` model. If you find that even this one produces undesirable artifacts, we suggest using `EMOCA_v2_mp`, which does not use the lip reading loss but is still much better than the original `EMOCA` model.
+
+
 ## Installation
 
 1) Follow the steps at the [root of this repo](../..). If for some reason the environment from there is not valid, create one using a `.yml` file from `envs`.
@@ -31,7 +56,7 @@ bash download_assets.sh
 
 Then activate your environment:
 
 ```bash
-conda activate work36_cu11
+conda activate work38
 ```
 
@@ -109,7 +134,18 @@ As EMOCA builds on top of [DECA](https://github.com/YadiraF/DECA) and uses parts
   url = {https://doi.org/10.1145/3450626.3459936}
 }
 ```
-
+Furthermore, if you use EMOCA v2, please also cite [SPECTRE](https://filby89.github.io/spectre/):
+```
+@article{filntisis2022visual,
+  title = {Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos},
+  author = {Filntisis, Panagiotis P. and Retsinas, George and Paraperas-Papantoniou, Foivos and Katsamanis, Athanasios and Roussos, Anastasios and Maragos, Petros},
+  journal = {arXiv preprint arXiv:2207.11094},
+  publisher = {arXiv},
+  year = {2022},
+}
+```
+
+
 ## License
 This code and model are **available for non-commercial scientific research purposes** as defined in the [LICENSE](https://emoca.is.tue.mpg.de/license.html) file. By downloading and using the code and model you agree to the terms of this license.
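
For reference, here is a minimal sketch of how the `--model_name <model>` flag documented in the patched README might be used from the command line. Only the `work38` environment name and the `--model_name` flag (with the model names listed above) come from the diff; the demo script path `demos/test_emoca_on_images.py` and the `--input_folder`/`--output_folder` flags are assumptions about the repository layout and may differ in your checkout.

```bash
# Sketch only: reconstruct faces from a folder of images with one of the
# EMOCA v2 models listed in the README. Script path and input/output flags
# are assumptions; --model_name and the model names are from the README.
conda activate work38      # environment name from the updated README

cd gdl_apps/EMOCA          # assumed location of the EMOCA demo scripts

python demos/test_emoca_on_images.py \
    --input_folder <path_to_input_images> \
    --output_folder <path_to_results> \
    --model_name EMOCA_v2_lr_mse_20
```

Swapping `EMOCA_v2_lr_mse_20` for `EMOCA_v2_mp` or `EMOCA_v2_lr_cos_1.5` selects the other variants described in the update notes above.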