This repository documents my structured learning path toward GPU and ML systems engineering, with a focus on high-performance computing, CUDA programming, and ML infrastructure. The journey combines theoretical learning with hands-on projects, covering modern C++, CUDA, computer architecture, and ML systems design.
- Master modern C++ and GPU programming principles
- Develop expertise in CUDA and parallel computing
- Understand ML system architecture and optimization
- Build practical experience with distributed ML systems
- Gain deep knowledge of computer architecture
Implementation exercises focusing on:
- Memory management and RAII
- Template metaprogramming
- Concurrent programming
- Performance optimization
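As a flavor of the RAII exercises, here is a minimal sketch (the class name and use of `FILE*` are illustrative, not the actual exercise code): a wrapper that owns a C file handle and releases it deterministically, with move-only ownership semantics.

```cpp
#include <cstdio>
#include <stdexcept>
#include <utility>

// Illustrative RAII exercise: a wrapper that owns a FILE* and releases it
// deterministically, even when exceptions are thrown.
class File {
public:
    explicit File(const char* path, const char* mode)
        : handle_(std::fopen(path, mode)) {
        if (!handle_) throw std::runtime_error("fopen failed");
    }
    ~File() { if (handle_) std::fclose(handle_); }

    // Movable but not copyable: exactly one owner at a time.
    File(const File&) = delete;
    File& operator=(const File&) = delete;
    File(File&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}
    File& operator=(File&& other) noexcept {
        if (this != &other) {
            if (handle_) std::fclose(handle_);
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }

    std::FILE* get() const { return handle_; }

private:
    std::FILE* handle_;
};
```

The same pattern (acquire in the constructor, release in the destructor, delete copies, transfer via move) carries over directly to GPU resources such as device memory and CUDA streams.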
Hands-on GPU programming including:
- Matrix operations library
- Custom ML kernels
- Performance profiling
- Multi-GPU implementations
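For the matrix operations library, a useful first step is a CPU reference implementation to validate GPU kernel output against. A hypothetical sketch (function name and layout conventions are assumptions, not the project's actual API):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the matrix-operations library: a naive
// row-major matrix multiply C = A * B, used to check GPU kernels for
// correctness. A is m x k, B is k x n, C is m x n, all flat row-major.
std::vector<float> matmul_ref(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p) {
            float a = A[i * k + p];  // reuse one A element across the row of B
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
    return C;
}
```

Comparing a CUDA kernel's output against this reference (within a floating-point tolerance) makes profiling-driven optimization much safer, since every speedup can be checked for correctness.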
End-to-end ML infrastructure projects:
- Inference engine
- Distributed training system
- Model serving infrastructure
- Performance optimization tools
Detailed notes from:
- Technical books
- Online courses
- Conference talks
- Research papers
Technical writing documenting:
- Learning progress
- Project insights
- Performance analyses
- Architecture decisions
This is a six-month structured learning plan (November 2024 – May 2025).
Month 1: C++ Foundations & CUDA Basics
- Modern C++ review
- Basic CUDA programming
- Performance profiling
Month 2: Advanced CUDA & Computer Architecture
- GPU architecture deep dive
- Memory optimization
- Cache-friendly algorithms
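The cache-friendly algorithms topic can be sketched with a classic example: two loops that compute the same sum over a row-major matrix, where only the traversal order differs (function names here are illustrative).

```cpp
#include <cstddef>
#include <vector>

// For a row-major matrix, a row-outer traversal touches memory with unit
// stride, so each cache line is fully used before eviction.
double sum_row_major(const std::vector<float>& a,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t i = 0; i < rows; ++i)      // outer: rows
        for (std::size_t j = 0; j < cols; ++j)  // inner: contiguous elements
            s += a[i * cols + j];
    return s;
}

// Same result, but each access jumps `cols` floats, wasting most of every
// cache line that is fetched; on large matrices this is measurably slower.
double sum_col_major(const std::vector<float>& a,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            s += a[i * cols + j];
    return s;
}
```

The same principle drives memory optimization on GPUs, where the analogous goal is coalesced global memory access across a warp.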
Month 3: ML Operations & GPU Optimization
- Custom ML operators
- CUDA kernel optimization
- PyTorch C++ integration
Month 4: Distributed Systems & ML Infrastructure
- Multi-GPU programming
- Distributed training
- Network optimization
Month 5: ML Compilation & Advanced Optimization
- Kernel fusion
- Compilation techniques
- Advanced optimization
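The idea behind kernel fusion can be illustrated on the CPU with a hypothetical scale-then-ReLU pipeline (names and operations chosen for illustration): instead of two passes over the data, the fused version applies both operations in a single traversal.

```cpp
#include <algorithm>
#include <vector>

// Two separate passes: on a GPU these would be two kernel launches, with the
// intermediate result written to and re-read from global memory.
std::vector<float> scale_then_relu_unfused(std::vector<float> x, float s) {
    for (auto& v : x) v *= s;                     // pass 1: scale
    for (auto& v : x) v = std::max(v, 0.0f);      // pass 2: ReLU
    return x;
}

// Fused: one pass, no intermediate buffer. On a GPU this saves a kernel
// launch and a round trip through global memory.
std::vector<float> scale_then_relu_fused(std::vector<float> x, float s) {
    for (auto& v : x) v = std::max(v * s, 0.0f);
    return x;
}
```

ML compilers apply this transformation automatically to chains of elementwise operators, which is why fusion is a core topic alongside compilation techniques.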
Month 6: Production ML Systems
- Model serving
- Production monitoring
- System optimization
- "A Tour of C++" by Bjarne Stroustrup
- "Effective Modern C++" by Scott Meyers
- "Programming Massively Parallel Processors" by David Kirk and Wen-mei Hwu
- "Computer Architecture: A Quantitative Approach" by John Hennessy and David Patterson
- Georgia Tech High Performance Computing
- NVIDIA CUDA Programming Course
- Various ML systems courses
- NVIDIA Nsight Systems
- CUDA Toolkit
- Modern C++ development environment
- Performance profiling tools
- Repository setup
- Learning plan development
- C++ refresher on Exercism
- Initial chapters of "A Tour of C++"
- Smart pointer implementation
- Basic CUDA setup
- ...
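The smart pointer exercise above roughly took this shape (a minimal sketch, not the actual exercise code): a `std::unique_ptr`-like owner that deletes its object exactly once.

```cpp
#include <utility>

// Minimal unique-ownership smart pointer: copying is forbidden, moving
// transfers the pointer, and the destructor deletes the object exactly once.
template <typename T>
class UniquePtr {
public:
    explicit UniquePtr(T* p = nullptr) : ptr_(p) {}
    ~UniquePtr() { delete ptr_; }

    UniquePtr(const UniquePtr&) = delete;
    UniquePtr& operator=(const UniquePtr&) = delete;

    UniquePtr(UniquePtr&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)) {}
    UniquePtr& operator=(UniquePtr&& other) noexcept {
        if (this != &other) {
            delete ptr_;
            ptr_ = std::exchange(other.ptr_, nullptr);
        }
        return *this;
    }

    T* get() const { return ptr_; }
    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};
```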
While this is a personal learning repository, I welcome suggestions and feedback through GitHub issues and discussions.
This project is licensed under the MIT License - see the LICENSE file for details.
This is a living document that will be updated as the learning journey progresses.