Deep Learning Paper Implementations

I originally made this back in 2021 for myself. I'm currently updating the code for all my papers and publishing them as I get them working.

This is a repo of deep learning papers I implemented to practice reading, understanding, and implementing deep learning papers. Every model I train will have a demo you can load up locally to test/try the model. This is also my main repo for practicing building ML pipelines, from data collection to MLOps. Each model has its own branch. To see the template I used for each project, see the project_template branch.
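
To give a rough idea of the data-to-demo flow each branch follows, here is a minimal, hypothetical sketch assuming a PyTorch workflow. None of the names below come from the actual branches or the project_template; they just stand in for the pipeline described above.

```python
# Hypothetical sketch of a branch's pipeline: data -> model -> training -> local demo.
# This is NOT the repo's actual code, just an illustration of the flow.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# "Data collection" stand-in: random tensors in place of a real dataset.
features = torch.randn(256, 28 * 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# A tiny classifier standing in for whichever paper the branch implements.
model = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop.
for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

# "Local demo" stand-in: run inference on one example.
with torch.no_grad():
    prediction = model(features[:1]).argmax(dim=1).item()
print(f"predicted class: {prediction}")
```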

Machine setup

Here is the machine I used to train all of these models:

  • CPU: Threadripper 1920X
  • GPU: dual RTX 3090s
  • RAM: 120 GB of DDR4 (RGB to make it faster)
  • OS: Ubuntu 22.04 LTS / 20.04 LTS
  • Editor: VS Code

Papers in repo

Image

  • "Gradient-Based Learning Applied to Document Recognition (LeNet-5)" by Yann LeCun et al. (1998): https://ieeexplore.ieee.org/document/726791
  • "ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)" by Alex Krizhevsky et al. (2012): https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
  • "Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)" by Karen Simonyan and Andrew Zisserman (2014)
  • "Going Deeper with Convolutions (GoogLeNet/Inception)" by Christian Szegedy et al. (2014)
  • "Deep Residual Learning for Image Recognition (ResNet)" by Kaiming He et al. (2015)
  • "U-Net: Convolutional Networks for Biomedical Image Segmentation" by Olaf Ronneberger et al. (2015)
  • "YOLO: Unified, Real-Time Object Detection" by Joseph Redmon et al. (2016)
  • "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" by Andrew G. Howard et al. (2017)
  • "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" by Mingxing Tan and Quoc V. Le (2019)
  • "Generative Adversarial Nets (GANs)" by Ian Goodfellow et al. (2014)
  • "Vision Transformers (ViT): An Image is Worth 16x16 Words" by Alexey Dosovitskiy et al. (2020)
  • "CLIP: Learning Transferable Visual Models From Natural Language Supervision" by Alec Radford et al. (2021)

Audio

  • "WaveNet: A Generative Model for Raw Audio" by Aäron van den Oord et al. (2016)
  • "Deep Speech: Scaling up end-to-end speech recognition" by Awni Hannun et al. (2014)
  • "Tacotron: Towards End-to-End Speech Synthesis" by Yuxuan Wang et al. (2017)
  • "Conformer: Convolution-augmented Transformer for Speech Recognition" by Anmol Gulati et al. (2020)
  • "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis" by Jungil Kong et al. (2020)
  • VALL-E...

Text

  • "Sequence to Sequence Learning with Neural Networks" by Ilya Sutskever et al. (2014)
  • "Attention Is All You Need (Transformer)" by Ashish Vaswani et al. (2017)
  • "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin et al. (2018)
  • "GPT-2: Language Models are Unsupervised Multitask Learners" by Alec Radford et al. (2019)
  • "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai et al. (2019)
  • "XLNet: Generalized Autoregressive Pretraining for Language Understanding" by Zhilin Yang et al. (2019)
  • "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Victor Sanh et al. (2019)
  • "T5: Text-To-Text Transfer Transformer" by Colin Raffel et al. (2019)

Video