I originally made this back in 2021 for myself. I am currently updating the code for all my papers and publishing each one as I get it working.
This is a repo of deep learning papers I implemented to practice reading, understanding, and implementing deep learning papers. Every model I train will have a demo you can load up locally to try the model. This is pretty much my main repo for practicing building ML pipelines, from data collection to MLOps. Each model has its own branch. To see the template I used for each project, see the project_template branch.
Here is the machine I used to train all these models:
- cpu: Threadripper 1920X
- gpu: dual RTX 3090s
- ram: 120 GB of DDR4 (RGB to make it faster)
- os: Ubuntu 22.04 LTS/20.04 LTS
- editor: VS Code
- "Gradient-Based Learning Applied to Document Recognition (LeNet-5)" by Yann LeCun et al. (1998): https://ieeexplore.ieee.org/document/726791
- "ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)" by Alex Krizhevsky et al. (2012): https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
- "Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)" by Karen Simonyan and Andrew Zisserman (2014)
- "Going Deeper with Convolutions (GoogLeNet/Inception)" by Christian Szegedy et al. (2014)
- "Deep Residual Learning for Image Recognition (ResNet)" by Kaiming He et al. (2015)
- "U-Net: Convolutional Networks for Biomedical Image Segmentation" by Olaf Ronneberger et al. (2015)
- "You Only Look Once: Unified, Real-Time Object Detection (YOLO)" by Joseph Redmon et al. (2016)
- "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" by Andrew G. Howard et al. (2017)
- "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" by Mingxing Tan and Quoc V. Le (2019)
- "Generative Adversarial Nets (GANs)" by Ian Goodfellow et al. (2014)
- "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)" by Alexey Dosovitskiy et al. (2020)
- "Learning Transferable Visual Models From Natural Language Supervision (CLIP)" by Alec Radford et al. (2021)
- "WaveNet: A Generative Model for Raw Audio" by Aäron van den Oord et al. (2016)
- "Deep Speech: Scaling up end-to-end speech recognition" by Awni Hannun et al. (2014)
- "Tacotron: Towards End-to-End Speech Synthesis" by Yuxuan Wang et al. (2017)
- "Conformer: Convolution-augmented Transformer for Speech Recognition" by Anmol Gulati et al. (2020)
- "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis" by Jungil Kong et al. (2020)
- valle...
- "Sequence to Sequence Learning with Neural Networks" by Ilya Sutskever et al. (2014)
- "Attention Is All You Need (Transformer)" by Ashish Vaswani et al. (2017)
- "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin et al. (2018)
- "Language Models are Unsupervised Multitask Learners (GPT-2)" by Alec Radford et al. (2019)
- "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai et al. (2019)
- "XLNet: Generalized Autoregressive Pretraining for Language Understanding" by Zhilin Yang et al. (2019)
- "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Victor Sanh et al. (2019)
- "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)" by Colin Raffel et al. (2019)