Masked Autoencoders Are Small Scale Vision Learners: A Reproduction Under Resource Constraints
[Presented as a poster at NeurIPS 2023] The Masked Autoencoder (MAE) was recently proposed as a framework for efficient self-supervised pre-training in computer vision [1]. In this paper, we attempt a replication of the MAE under significant computational constraints. Specifically, we target the claim that masking out a large part of the input image yields a nontrivial and meaningful self-supervisory task, which allows training models that generalize well. We also present the Semantic Masked Autoencoder (SMAE), a novel yet simple extension of the MAE that uses a perceptual loss to improve encoder embeddings.

Methodology: The datasets and backbones we rely on are significantly smaller than those used by [1]. Our main experiments are performed on Tiny ImageNet (TIN) [2], and transfer learning is performed on a low-resolution version of CUB-200-2011 [3]. We use a ViT-Lite [4] as the backbone. We also compare the MAE to DINO, an alternative framework for self-supervised learning [5]. The ViT, the MAE, and the perceptual loss were implemented from scratch, without consulting the original authors' code. Our code is available at https://github.com/MLReproHub/SMAE. The computational budget for our reproduction and extension was approximately 150 GPU hours.

Results: This paper successfully reproduces the claim that the MAE poses a nontrivial and meaningful self-supervisory task. We show that models trained with this framework generalize well to new datasets and conclude that the MAE is reproducible, with the exception of some hyperparameter choices. We also demonstrate that the MAE performs well with smaller backbones and datasets. Finally, our results suggest that the SMAE extension improves the downstream classification accuracy of the MAE on CUB (+5 pp) when coupled with an appropriate masking strategy.

What was easy: Given prior experience with a deep learning framework, re-implementing the paper was relatively straightforward, as the paper provides sufficient detail.

What was difficult: We faced challenges implementing efficient patch shuffling and tuning hyperparameters. The hyperparameter choices from [1] did not translate well to a smaller dataset and backbone.
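The patch shuffling mentioned above can be illustrated with a minimal, framework-free sketch. It is not the code from the linked repository: the function names `random_masking` and `unshuffle` and the mask ratio of 0.75 are assumptions chosen for illustration. The idea is to shuffle the patch indices once, keep the first few as visible patches for the encoder, and store the inverse permutation so the decoder input can be put back in the original order.

```python
import random

def random_masking(num_patches, mask_ratio, seed=None):
    """Pick visible and masked patch indices by shuffling (illustrative only)."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    shuffle = list(range(num_patches))
    rng.shuffle(shuffle)
    keep = shuffle[:num_keep]      # indices of patches the encoder sees
    masked = shuffle[num_keep:]    # indices of patches to reconstruct
    # restore[i] is the position of original patch i in the shuffled order.
    restore = [0] * num_patches
    for pos, idx in enumerate(shuffle):
        restore[idx] = pos
    return keep, masked, restore

def unshuffle(visible_tokens, mask_token, restore):
    """Append mask tokens, then undo the shuffle to recover the patch order."""
    num_masked = len(restore) - len(visible_tokens)
    shuffled = list(visible_tokens) + [mask_token] * num_masked
    return [shuffled[pos] for pos in restore]

# Usage: 16 patches, 75% masked (hypothetical values).
patches = [f"p{i}" for i in range(16)]
keep, masked, restore = random_masking(len(patches), 0.75, seed=0)
visible = [patches[i] for i in keep]
full = unshuffle(visible, None, restore)
```

In a tensor framework the same effect is usually obtained with a sort over random noise and a gather, which avoids Python loops; the list version here only shows the index bookkeeping.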
Generative Adversarial Networks for Biological Image Synthesis
Reproduced and extended the work of Osokin et al., “GANs for Biological Image Synthesis”. In this work, we also examine how effectively GANs recreate cellular evolution images as they appear under a fluorescence microscope's lens.
Generative Adversarial Networks for Pose and Style Selection in Fashion Design Applications
Conducted this research-based thesis in the ISSEL group, Aristotle University, where we implemented state-of-the-art GAN architectures to perform realistic transformations on images of people wearing fashion garments. Explored and experimented with models based on StyleGAN, CycleGAN, and pix2pix, among others. With over 15.5K lines of Python code (PyTorch) and over 500 hours of GPU training, I consider this thesis the best possible start to deep learning research.
Ad-hoc peer-to-peer wireless communications using Raspberry Pis
The setup consists of a number of Wi-Fi-capable Raspberry Pi Zeros, each of which binds its TCP sockets to a specified IP address whose last digits are the student's number (SSN). Each device runs the same app, which on initialization creates three POSIX threads: one to listen for inbound connection requests (server thread), one to search via polling for other active devices in wireless range, and a third that produces a message with a random recipient at a random interval.
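The three-thread layout described above can be sketched as follows. The original app was written in C99 with POSIX threads; this Python version is only a structural illustration, with the socket work replaced by log entries. The function `run_node`, its parameters, and the timing values are all hypothetical.

```python
import random
import threading
import time

def run_node(node_id, peers, duration=0.2):
    """Run server, scanner, and producer threads briefly; return their log."""
    log = []
    lock = threading.Lock()
    stop = threading.Event()

    def record(role, text):
        with lock:
            log.append((role, text))

    def server():
        # Stand-in for a loop that accepts inbound TCP connections.
        while not stop.is_set():
            record("server", "waiting for inbound connections")
            stop.wait(0.05)

    def scanner():
        # Stand-in for polling the known addresses for active devices.
        while not stop.is_set():
            for peer in peers:
                record("scanner", f"probing {peer}")
            stop.wait(0.05)

    def producer():
        # Creates a message for a random recipient at a random interval.
        while not stop.is_set():
            recipient = random.choice(peers)
            record("producer", f"message from {node_id} to {recipient}")
            stop.wait(random.uniform(0.01, 0.05))

    threads = [threading.Thread(target=fn) for fn in (server, scanner, producer)]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return log

# Usage with hypothetical device numbers.
events = run_node(node_id=8001, peers=[8002, 8003])
```

Using a shared stop event with `wait` instead of a plain sleep lets all three loops exit promptly when the node shuts down.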
When devices connect, they exchange messages. A device sends to the other device all the messages not intended for it and receives the corresponding messages from the other device. The purpose is for a message to reach its recipient after zero or more intermediate recipients. The final recipient stores its messages in an internal inbox (in memory).
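The exchange described above behaves like store-and-forward routing, which the following sketch simulates in memory without sockets. It is not the original C99 code. The `Message` and `Device` classes and the `exchange` function are hypothetical names, and two details are assumptions rather than facts from the project: devices keep a copy of each forwarded message, and duplicates are ignored.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    sender: int
    recipient: int
    body: str

@dataclass
class Device:
    device_id: int
    buffer: list = field(default_factory=list)  # messages for other devices
    inbox: list = field(default_factory=list)   # messages addressed to this device

    def receive(self, messages):
        for msg in messages:
            if msg.recipient == self.device_id:
                # Final recipient: store in the in-memory inbox.
                if msg not in self.inbox:
                    self.inbox.append(msg)
            elif msg not in self.buffer:
                # Intermediate recipient: hold the message for later forwarding.
                self.buffer.append(msg)

def exchange(a, b):
    """Each device sends the other every message it holds for someone else."""
    outgoing_a = list(a.buffer)  # snapshot so both sides send their pre-contact state
    outgoing_b = list(b.buffer)
    b.receive(outgoing_a)
    a.receive(outgoing_b)

# Usage: a message from device 1 reaches device 3 through device 2.
d1, d2, d3 = Device(1), Device(2), Device(3)
msg = Message(sender=1, recipient=3, body="hello")
d1.buffer.append(msg)
exchange(d1, d2)
exchange(d2, d3)
```

After the two contacts, the message sits in the inbox of device 3 and remains buffered on devices 1 and 2, which matches the "zero or more intermediate recipients" behaviour.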
A C99 app was developed and cross-compiled to run on each Raspberry Pi in the experimental setup. We used TCP sockets to handle the actual information exchange and time sync between devices. All devices kept communication event logs, from which useful communication statistics were extracted. The statistics for each device were presented using a custom JavaScript application that was also developed for the course project.
CUSTOM AAC CODEC IN MATLAB
A 3-stage AAC audio codec was developed in MATLAB for the course “Multimedia Systems and Virtual Reality”, taught by Prof. Anastasios Delopoulos.
EUROTECHNIK.GR FRONTEND, CMS, CUSTOM CRM & ERP
Developed the eurotechnik.gr website (now in its 4th revision) using PHP, JS, CSS, and AJAX. A custom CMS was also built. A fully working custom CRM + ERP was developed in PHP on top of the Laravel framework (at admin.eurotechnik.gr). Deployed on an initially unmanaged dedicated server. The latest addition is the automated bid-creation subsystem, which uses AI to predict customer demand and supply shortages.