---
library_name: transformers
tags:
- video
- feature
- face
license: cc-by-nc-4.0
base_model:
- ControlNet/MARLIN
pipeline_tag: feature-extraction
---

# MARLIN: Masked Autoencoder for facial video Representation LearnINg

This repo is the official PyTorch implementation for the paper
[MARLIN: Masked Autoencoder for facial video Representation LearnINg](https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper) (CVPR 2023) ([arXiv](https://arxiv.org/abs/2211.06627)).
|
|
|
|
## Use `transformers` (HuggingFace) for Feature Extraction

Requirements:
- Python
- PyTorch
- transformers
- einops

Currently, the Hugging Face model supports direct feature extraction only; it does not perform any video pre-processing (e.g., face detection, cropping, or strided-window sampling).
|
|
|
|
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_base_ytf",  # or other variants
    trust_remote_code=True
)
tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
output = model(tensor)  # torch.Size([1, 1568, 384])
```
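Since pre-processing is left to the caller, here is a minimal sketch of turning raw `uint8` frames into the expected `(B, C, T, H, W)` input and mean-pooling the per-token output into a single clip-level feature. The `frames_to_clip` helper, the `[0, 1]` scaling, and the mean-pooling choice are illustrative assumptions, not part of the official pipeline; it assumes you already have 16 face-cropped RGB frames resized to 224x224:

```python
import torch

def frames_to_clip(frames: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (T, H, W, C) uint8 frames -> (1, C, T, H, W) float clip."""
    clip = frames.float() / 255.0    # scale pixel values to [0, 1] (assumed convention)
    clip = clip.permute(3, 0, 1, 2)  # (T, H, W, C) -> (C, T, H, W)
    return clip.unsqueeze(0)         # add batch dim -> (1, C, T, H, W)

# Stand-in for 16 face-cropped RGB frames at 224x224
frames = torch.randint(0, 256, (16, 224, 224, 3), dtype=torch.uint8)
clip = frames_to_clip(frames)        # torch.Size([1, 3, 16, 224, 224])

# With the model loaded as above, one common (but assumed) way to get a
# single clip-level embedding is to mean-pool over the token dimension:
# tokens = model(clip)               # (1, num_tokens, feature_dim)
# clip_feature = tokens.mean(dim=1)  # (1, feature_dim)
```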
|
|
## License

This project is under the CC BY-NC 4.0 license. See [LICENSE](LICENSE) for details.
|
|
## References

If you find this work useful for your research, please consider citing it.

```bibtex
@inproceedings{cai2022marlin,
  title     = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author    = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
  month     = {June},
  pages     = {1493-1504},
  doi       = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}
```
|
|