Portfolio - Thanos Charisoudis

Publications

Masked Autoencoders Are Small Scale Vision Learners: A Reproduction Under Resource Constraints

[Presented as a poster at NeurIPS 2023] The Masked Autoencoder (MAE) was recently proposed as a framework for efficient self-supervised pre-training in computer vision [1]. In this paper, we attempt a replication of the MAE under significant computational constraints. Specifically, we target the claim that masking out a large part of the input image yields a nontrivial and meaningful self-supervisory task, which allows training models that generalize well. We also present the Semantic Masked Autoencoder (SMAE), a novel yet simple extension of the MAE that uses a perceptual loss to improve encoder embeddings.

Methodology: The datasets and backbones we rely on are significantly smaller than those used by [1]. Our main experiments are performed on Tiny ImageNet (TIN) [2], and transfer learning is performed on a low-resolution version of CUB-200-2011 [3]. We use a ViT-Lite [4] as the backbone. We also compare the MAE to DINO, an alternative framework for self-supervised learning [5]. The ViT, the MAE, and the perceptual loss were implemented from scratch, without consulting the original authors' code. Our code is available at https://github.com/MLReproHub/SMAE. The computational budget for our reproduction and extension was approximately 150 GPU hours.

Results: This paper successfully reproduces the claim that the MAE poses a nontrivial and meaningful self-supervisory task. We show that models trained with this framework generalize well to new datasets and conclude that the MAE is reproducible, with the exception of some hyperparameter choices. We also demonstrate that the MAE performs well with smaller backbones and datasets. Finally, our results suggest that the SMAE extension improves the downstream classification accuracy of the MAE on CUB (+5 pp) when coupled with an appropriate masking strategy.

What was easy: Given prior experience with a deep learning framework, re-implementing the paper was relatively straightforward, as the paper provides sufficient detail.

What was difficult: We faced challenges implementing efficient patch shuffling and tuning hyperparameters. The hyperparameter choices from [1] did not translate well to a smaller dataset and backbone.
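
The patch shuffling mentioned above can be illustrated with a minimal NumPy sketch of per-sample random masking. This is not the code from the SMAE repository. The function name `random_masking`, the argsort-of-noise approach, and the 0.75 default mask ratio are assumptions chosen for illustration, and the abstract itself notes that hyperparameters from [1] did not transfer well to the smaller setting.

```python
import numpy as np


def random_masking(patches, mask_ratio=0.75, rng=None):
    """Randomly hide a fraction of patches for each sample (illustrative sketch).

    patches: array of shape (batch, num_patches, dim).
    Returns (visible, mask, ids_restore), where mask is 1 for hidden patches
    and 0 for visible ones, in the original patch order.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, num_patches, _ = patches.shape
    num_keep = int(num_patches * (1 - mask_ratio))

    # Sorting random noise gives an independent permutation per sample,
    # which avoids a Python loop over the batch.
    noise = rng.random((batch, num_patches))
    ids_shuffle = np.argsort(noise, axis=1)
    # The argsort of a permutation is its inverse: it undoes the shuffle.
    ids_restore = np.argsort(ids_shuffle, axis=1)

    # Keep the first num_keep patches of each shuffled sequence.
    ids_keep = ids_shuffle[:, :num_keep]
    visible = np.take_along_axis(patches, ids_keep[:, :, None], axis=1)

    # Build the mask in shuffled order, then map it back to the original order.
    mask = np.ones((batch, num_patches), dtype=np.float32)
    mask[:, :num_keep] = 0.0
    mask = np.take_along_axis(mask, ids_restore, axis=1)
    return visible, mask, ids_restore
```

In this sketch only `visible` would be passed to the encoder, while `ids_restore` lets a decoder place mask tokens back at the original patch positions. With 16 patches per image and a mask ratio of 0.75, each sample keeps 4 patches.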

Course Projects

Personal Projects

Masked Autoencoders Are Small Scale Vision Learners: A Reproduction Under Resource Constraints

Generative Adversarial Networks for Biological Image Synthesis

Generative Adversarial Networks for Pose and Style Selection in Fashion Design Applications

Ad-hoc peer-to-peer wireless communications using Raspberry Pis

Custom AAC Codec in MATLAB

Eurotechnik.gr Frontend, CMS, Custom CRM & ERP