Deep Learning Paper Implementations

I originally made this back in 2021 for myself. I'm currently updating the code for all my papers and publishing them as I get them working.

This is a repo of deep learning papers I implemented to practice reading, understanding, and implementing deep learning papers. Every model I train will have a demo you can load up locally to test/try the model. This is also my main repo for practicing building ML pipelines, from data collection to MLOps. Each model has its own branch. To see the template I used for each project, see the project_template branch.
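
To give a rough idea of the data-to-demo flow each branch follows, here is a minimal, hypothetical sketch assuming a PyTorch workflow. None of the names below come from the actual branches or the project_template; they just stand in for the pipeline described above.

```python
# Hypothetical sketch of a branch's pipeline: data -> model -> training -> local demo.
# This is NOT the repo's actual code, just an illustration of the flow.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# "Data collection" stand-in: random tensors in place of a real dataset.
features = torch.randn(256, 28 * 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# A tiny classifier standing in for whichever paper the branch implements.
model = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop.
for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

# "Local demo" stand-in: run inference on one example.
with torch.no_grad():
    prediction = model(features[:1]).argmax(dim=1).item()
print(f"predicted class: {prediction}")
```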

Machine setup

Here is the machine I used to train all of these models:

  • CPU: Threadripper 1920X
  • GPU: dual RTX 3090s
  • RAM: 120 GB of DDR4 (RGB to make it faster)
  • OS: Ubuntu 22.04 LTS / 20.04 LTS
  • Editor: VS Code

Papers in repo

Image

  • "Gradient-Based Learning Applied to Document Recognition (LeNet-5)" by Yann LeCun et al. (1998): https://ieeexplore.ieee.org/document/726791
  • "ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)" by Alex Krizhevsky et al. (2012): https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
  • "Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)" by Karen Simonyan and Andrew Zisserman (2014)
  • "Going Deeper with Convolutions (GoogLeNet/Inception)" by Christian Szegedy et al. (2014)
  • "Deep Residual Learning for Image Recognition (ResNet)" by Kaiming He et al. (2015)
  • "U-Net: Convolutional Networks for Biomedical Image Segmentation" by Olaf Ronneberger et al. (2015)
  • "YOLO: Unified, Real-Time Object Detection" by Joseph Redmon et al. (2016)
  • "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" by Andrew G. Howard et al. (2017)
  • "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" by Mingxing Tan and Quoc V. Le (2019)
  • "Generative Adversarial Nets (GANs)" by Ian Goodfellow et al. (2014)
  • "Vision Transformers (ViT): An Image is Worth 16x16 Words" by Alexey Dosovitskiy et al. (2020)
  • "CLIP: Learning Transferable Visual Models From Natural Language Supervision" by Alec Radford et al. (2021)

Audio

  • "WaveNet: A Generative Model for Raw Audio" by Aäron van den Oord et al. (2016)
  • "Deep Speech: Scaling up end-to-end speech recognition" by Awni Hannun et al. (2014)
  • "Tacotron: Towards End-to-End Speech Synthesis" by Yuxuan Wang et al. (2017)
  • "Conformer: Convolution-augmented Transformer for Speech Recognition" by Anmol Gulati et al. (2020)
  • "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis" by Jungil Kong et al. (2020)
  • VALL-E...

Text

  • "Sequence to Sequence Learning with Neural Networks" by Ilya Sutskever et al. (2014)
  • "Attention Is All You Need (Transformer)" by Ashish Vaswani et al. (2017)
  • "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin et al. (2018)
  • "GPT-2: Language Models are Unsupervised Multitask Learners" by Alec Radford et al. (2019)
  • "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai et al. (2019)
  • "XLNet: Generalized Autoregressive Pretraining for Language Understanding" by Zhilin Yang et al. (2019)
  • "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Victor Sanh et al. (2019)
  • "T5: Text-To-Text Transfer Transformer" by Colin Raffel et al. (2019)

Video