Review code structure

MLOps-Courses · Mar 21, 2024 · 193c43a · 193c43a
1 parent 8c0d1c3
commit 193c43a
Showing 58 changed files with 137 additions and 117 deletions.
diff --git a/docs/0. Overview/0.0. Summary.md → docs/0. Overview/0.0. Course.md b/docs/0. Overview/0.0. Summary.md → docs/0. Overview/0.0. Course.md
@@ -1,15 +1,17 @@
-# The MLOPS template course
+# 0.0. Course
 
 ## In few words
 
 
 ## Intended Audience
 
+## Prerequisite knowledge
+
 
 ## How to read ?
 
 
 ## Technology
 
 
-Copyright 
+Copyright
diff --git a/docs/0. Overview/0.1. Projects.md b/docs/0. Overview/0.1. Projects.md
@@ -0,0 +1 @@
+# 0.1. Projects
diff --git a/docs/0. Overview/0.1. Datasets.md → docs/0. Overview/0.2. Datasets.md b/docs/0. Overview/0.1. Datasets.md → docs/0. Overview/0.2. Datasets.md
@@ -1,4 +1,4 @@
-# 0.1. Data
+# 0.2. Datasets
 
 Data is often referred to as the fuel for Machine Learning, and although this course focuses on MLOps, it's crucial to have access to data to fully grasp the various concepts and technologies involved.
 
@@ -11,13 +11,13 @@ Briefly we can note the following data types:
 ### Structured Data
 Structured data adheres to a predefined model, making it easier to search and organize.
 
-* *Tabular*: 
+* *Tabular*:
     Perhaps the most common type of data, where data is organized in rows and columns.
     * Columns are homogeneous in terms of type
-    * Typically CSV files and Relational database 
+    * Typically CSV files and Relational database
 * *Time Series*:
-    Sequence of data points collected or recorded at successive points in time, usually at consistent intervals. 
-    * Characterized by temporal order, meaning the sequence of observations is crucial, and changing the order can alter the meaning or interpretation of the data. 
+    Sequence of data points collected or recorded at successive points in time, usually at consistent intervals.
+    * Characterized by temporal order, meaning the sequence of observations is crucial, and changing the order can alter the meaning or interpretation of the data.
     * Typically financial and energy data
 * *Geospatial*:
     Data representing a specific location or geographic area on earth
@@ -39,7 +39,7 @@ Unstructured data does not follow a predefined model, making it more complex to
     * characterized by a high number of unique words, contextual meaning, and ambiguity
 * *Multimedia*:
     Refers to picture, sound, and video data
-    * challenging due to the high dimensionality, large file sizes, and the complexity of extracting meaningful patterns. 
+    * challenging due to the high dimensionality, large file sizes, and the complexity of extracting meaningful patterns.
 
 ### Semi Structured Data
 
@@ -50,7 +50,7 @@ Examples are XML and JSON files.
 
 ## Which data should I use?
 
-The question of which dataset to use is common, and honestly, the best dataset is the one you're most familiar with. 
+The question of which dataset to use is common, and honestly, the best dataset is the one you're most familiar with.
 While the vast array of data types and their diverse applications might seem overwhelming, it's important to remember that many MLOps concepts are universal and can be applied across different domains.
 
 We will look into the specificities of certain types of applications later in the course. For now, we offer two options for getting started.

diff --git a/docs/0. Overview/0.2. Architecture.md → docs/0. Overview/0.3. Platforms.md b/docs/0. Overview/0.2. Architecture.md → docs/0. Overview/0.3. Platforms.md
@@ -1,4 +1,4 @@
-# Why not specific tools?
+# 0.3. Platforms
 
 Databricks, metaflow ...
 

diff --git a/docs/0. Overview/0.4. Mentoring.md b/docs/0. Overview/0.4. Mentoring.md
@@ -0,0 +1,3 @@
+# 0.4. Mentoring
+
+Mentoring
diff --git a/docs/0. Overview/0.5. Assistants.md b/docs/0. Overview/0.5. Assistants.md
@@ -0,0 +1 @@
+# 0.5. Assistants
diff --git a/docs/0. Overview/0.6. Resources.md b/docs/0. Overview/0.6. Resources.md
@@ -0,0 +1 @@
+# 0.6. Resources
diff --git a/docs/0. Overview/index.md b/docs/0. Overview/index.md
@@ -0,0 +1 @@
+# 0. Overview
diff --git a/docs/1. Initializing/1.0. System.md b/docs/1. Initializing/1.0. System.md
@@ -0,0 +1 @@
+# 1.0. System
diff --git a/docs/1. Initializing/1.3. pyenv.md → docs/1. Initializing/1.1. pyenv.md b/docs/1. Initializing/1.3. pyenv.md → docs/1. Initializing/1.1. pyenv.md
@@ -1,4 +1,4 @@
-# 1.2. pyenv
+# 1.1. pyenv
 
 ## What is pyenv?
 

diff --git a/docs/1. Initializing/1.2. Python.md b/docs/1. Initializing/1.2. Python.md
@@ -0,0 +1 @@
+# 1.2. Python
diff --git a/docs/1. Initializing/1.4. poetry.md → docs/1. Initializing/1.3. Poetry.md b/docs/1. Initializing/1.4. poetry.md → docs/1. Initializing/1.3. Poetry.md
@@ -1,4 +1,4 @@
-# 1.3. poetry
+# 1.3. Poetry
 
 # 1.3. poetry
 

diff --git a/docs/1. Initializing/1.5. git.md → docs/1. Initializing/1.4. git.md b/docs/1. Initializing/1.5. git.md → docs/1. Initializing/1.4. git.md
diff --git a/docs/1. Initializing/1.6. GitHub.md → docs/1. Initializing/1.5. GitHub.md b/docs/1. Initializing/1.6. GitHub.md → docs/1. Initializing/1.5. GitHub.md
diff --git a/docs/1. Initializing/1.6. VS Code.md b/docs/1. Initializing/1.6. VS Code.md
@@ -0,0 +1 @@
+# 1.6. Visual Studio Code
diff --git a/docs/1. Initializing/1.7. VS Code.md b/docs/1. Initializing/1.7. VS Code.md
diff --git a/docs/1. Initializing/1.0. Starting up.md → docs/1. Initializing/index.md b/docs/1. Initializing/1.0. Starting up.md → docs/1. Initializing/index.md
@@ -1,4 +1,4 @@
-# Initialization
+# 1. Initializing
 
 This section introduces many basic concepts common to all software projects, which also apply to MLOps
 

diff --git a/docs/2. Prototyping/2.0. Notebook.md → docs/2. Prototyping/2.0. Notebooks.md b/docs/2. Prototyping/2.0. Notebook.md → docs/2. Prototyping/2.0. Notebooks.md
@@ -1,4 +1,4 @@
-# 2.0. Notebook
+# 2.0. Notebooks
 
 ## What is a notebook?
 

diff --git a/docs/2. Prototyping/2.2. Configs.md b/docs/2. Prototyping/2.2. Configs.md
@@ -74,4 +74,64 @@ How to load/transform datasets ...
 ## Pipelines
 
 How to define/run model pipelines ...
-```
+```
+
+## What are options?
+
+Options in a data science environment, such as a Jupyter notebook, are configurations that tailor the behavior and appearance of libraries like pandas, matplotlib, and scikit-learn. These options allow you to control aspects like display settings and output formats.
+
+Example of options in a notebook:
+
+```python
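+import pandas as pd  # Pandas: show all rows and columns
+pd.options.display.max_rows = None
+pd.options.display.max_columns = None
+from sklearn import set_config  # Sklearn: return DataFrames from transformers
+set_config(transform_output="pandas")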
+```
+
+## Why do I need to pass options?
+
+Default settings of libraries may not always align with your specific needs. For example:
+- Pandas may hide some columns or rows by default, limiting the visibility of data.
+- Matplotlib's default figure sizes might be too small for detailed analysis.
+
+Adjusting these options ensures your environment is optimized for your workflow.
+
+## How should I configure Pandas options?
+
+Pandas offers a variety of options for customizing data display. Check the [Pandas Options and Settings documentation](https://pandas.pydata.org/docs/user_guide/options.html) for a comprehensive guide.
+
+```python
+import pandas as pd
+
+# Show all rows and columns (None removes the display limit)
+pd.options.display.max_rows = None
+pd.options.display.max_columns = None
+# Do not truncate the content of wide columns
+pd.options.display.max_colwidth = None
+```
+
+## How should I configure matplotlib options?
+
+Matplotlib's appearance can be customized as per your requirements. Refer to the [Matplotlib Customizing Guide](https://matplotlib.org/stable/users/explain/customizing.html) for detailed options.
+
+```python
+import matplotlib.pyplot as plt
+
+# Set default figure size
+plt.rcParams['figure.figsize'] = (20, 10)
+```
+
+## How should I configure scikit-learn options?
+
+Scikit-learn provides configurations to modify how outputs are displayed or handled. The [official documentation](https://scikit-learn.org/stable/modules/generated/sklearn.set_config.html#sklearn.set_config) outlines these options.
+
+```python
+import sklearn
+
+# Make transformers return pandas DataFrames instead of NumPy arrays
+sklearn.set_config(transform_output='pandas')
+```
+
+Setting these options at the beginning of your notebook ensures a consistent and tailored working environment throughout your analysis.
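
Review note on 2.2. Configs.md: the options added in this file are set globally for the whole notebook session. As a complement, here is a minimal sketch (not part of this commit) of scoping a pandas display option to a single block with `pd.option_context`, so the defaults stay untouched elsewhere. The DataFrame `df` and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"feature": range(100), "target": range(100)})

# Remember the current global setting so it can be compared afterwards.
default_max_rows = pd.get_option("display.max_rows")

# Inside the block the row limit is lifted; it is restored on exit.
with pd.option_context("display.max_rows", None):
    inside_max_rows = pd.get_option("display.max_rows")
    full_repr = repr(df)

restored_max_rows = pd.get_option("display.max_rows")
```

Global assignment, as in the commit, suits a notebook where every cell should share the same display settings. The context manager may be preferable when only one large table needs to be shown in full.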
diff --git a/docs/2. Prototyping/2.4. Datasets.md → docs/2. Prototyping/2.3. Datasets.md b/docs/2. Prototyping/2.4. Datasets.md → docs/2. Prototyping/2.3. Datasets.md
@@ -1,4 +1,4 @@
-# 2.4. Datasets
+# 2.3. Datasets
 
 ## What are datasets?
 
@@ -33,15 +33,6 @@ Selecting a file format for your dataset involves considering several factors:
     - **Dense**: Every data point is stored (e.g., CSV, Parquet).
     - **Sparse**: Only non-zero values are stored, useful for data with many empty values (e.g., SciPy sparse matrices).
 
-## How can I explore my dataset content?
-
-Pandas is a popular tool for exploring datasets in Python. Common methods include:
-- `.info()`: Overview of types, non-null values, and memory usage.
-- `.shape`: Dimensions of the dataframe.
-- `.describe()`: Descriptive statistics.
-
-For visual exploration, libraries like [plotly.express](https://plotly.com/python/plotly-express/), [matplotlib](https://matplotlib.org/), and [seaborn](https://seaborn.pydata.org/), and [ydata-profiling](https://github.com/ydataai/ydata-profiling) are useful.
-
 ## How can I optimize the dataset loading process?
 
 To improve dataset loading and handling:

diff --git a/docs/2. Prototyping/2.3. Options.md b/docs/2. Prototyping/2.3. Options.md
diff --git a/docs/2. Prototyping/2.4. Analysis.md b/docs/2. Prototyping/2.4. Analysis.md
@@ -0,0 +1,12 @@
+# 2.4. Analysis
+
+## pandas profiling
+
+## How can I explore my dataset content?
+
+Pandas is a popular tool for exploring datasets in Python. Common methods include:
+- `.info()`: Overview of types, non-null values, and memory usage.
+- `.shape`: Dimensions of the dataframe.
+- `.describe()`: Descriptive statistics.
+
+For visual exploration, libraries like [plotly.express](https://plotly.com/python/plotly-express/), [matplotlib](https://matplotlib.org/), [seaborn](https://seaborn.pydata.org/), and [ydata-profiling](https://github.com/ydataai/ydata-profiling) are useful.
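
Review note on 2.4. Analysis.md: the three pandas methods listed in this file can be sketched on a toy DataFrame as follows. This is not part of the commit, and the column names and values are invented for illustration rather than taken from the course datasets.

```python
import io

import pandas as pd

# Toy dataset, invented for illustration only.
df = pd.DataFrame(
    {
        "temperature": [20.5, 22.0, None, 19.5],
        "city": ["Paris", "Lyon", "Nice", "Lille"],
    }
)

# .shape: (number of rows, number of columns)
n_rows, n_cols = df.shape

# .info(): column types, non-null counts, and memory usage.
# It prints to stdout by default; a buffer is used here to capture the text.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

# .describe(): descriptive statistics, restricted to numeric columns by default.
stats = df.describe()
temperature_count = stats.loc["count", "temperature"]
mean_temperature = stats.loc["mean", "temperature"]
```

Missing values are excluded from the statistics, so `temperature_count` counts only the non-null entries.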
diff --git a/docs/2. Prototyping/2.5. Pipelines.md → docs/2. Prototyping/2.5. Modeling.md b/docs/2. Prototyping/2.5. Pipelines.md → docs/2. Prototyping/2.5. Modeling.md
@@ -1,4 +1,4 @@
-# 2.5. Pipelines
+# 2.5. Modeling
 
 ## What are pipelines?
 

diff --git a/docs/2. Prototyping/2.6. Evaluations.md b/docs/2. Prototyping/2.6. Evaluations.md
@@ -1,4 +1,4 @@
-# 2.6. Evaluations
+# 2.6. Evaluation
 
 ## What is an evaluation?
 

diff --git a/docs/2. Prototyping/index.md b/docs/2. Prototyping/index.md
@@ -0,0 +1 @@
+# 2. Prototyping
diff --git a/docs/3. Refactoring/3.2. Functions.md → docs/3. Refactoring/3.2. Paradigms.md b/docs/3. Refactoring/3.2. Functions.md → docs/3. Refactoring/3.2. Paradigms.md
@@ -1,4 +1,4 @@
-# 3.2. Functions
+# 3.2. Paradigms
 
 ## What is a function?
 

diff --git a/docs/3. Refactoring/index.md b/docs/3. Refactoring/index.md
@@ -0,0 +1 @@
+# 3. Refactoring
diff --git a/docs/4. Validating/4.0. Checkers.md b/docs/4. Validating/4.0. Checkers.md
diff --git a/docs/4. Validating/4.1. Typing.md → docs/4. Validating/4.0. Typing.md b/docs/4. Validating/4.1. Typing.md → docs/4. Validating/4.0. Typing.md
@@ -1,4 +1,4 @@
-# 4.1. Typing
+# 4.0. Typing
 
 ## What is Typing in Python?
 
@@ -30,4 +30,6 @@ The importance of typing in Python projects, particularly large-scale or complex
 5. **Integrate with CI/CD Pipelines**: Incorporate mypy checks into your continuous integration/continuous deployment workflows to automatically catch type issues before they make it to production.
 6. **Team Guidelines**: Establish team guidelines on how and when to use type annotations to maintain consistency across the codebase.
 7. **Regular Reviews**: Regularly review the type annotations in your code, especially after major refactoring or updates to Python’s typing module, to ensure they remain accurate and useful.
-8. **Leverage Advanced Features**: Explore advanced features of mypy, such as type inference, generic types, and custom type definitions, to handle more complex typing scenarios.
+8. **Leverage Advanced Features**: Explore advanced features of mypy, such as type inference, generic types, and custom type definitions, to handle more complex typing scenarios.
+
+TODO: Pandera, Pydantic
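
Review note on 4.0. Typing.md: item 8 mentions generic types and custom type definitions. A minimal standard-library sketch of both is given below. It is not part of this commit, and the names `UserId` and `first_or_default` and the sample values are hypothetical, chosen only for illustration.

```python
from typing import NewType, Optional, Sequence, TypeVar

# Custom type definition: a distinct name for integers used as user identifiers.
UserId = NewType("UserId", int)

# Generic type variable: the function below works for any element type T.
T = TypeVar("T")


def first_or_default(items: Sequence[T], default: Optional[T] = None) -> Optional[T]:
    """Return the first element of items, or default when items is empty."""
    return items[0] if items else default


user_ids = [UserId(42), UserId(7)]
# A type checker can infer Optional[UserId] here without an annotation.
first_user = first_or_default(user_ids)
no_user = first_or_default([], default=UserId(0))
```

At runtime `UserId` values are plain integers; the distinction only exists for the type checker.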
diff --git a/docs/4. Validating/4.2. Linting.md → docs/4. Validating/4.1. Linting.md b/docs/4. Validating/4.2. Linting.md → docs/4. Validating/4.1. Linting.md
@@ -1,4 +1,4 @@
-# 4.2. Linting
+# 4.1. Linting
 
 ## What is Linting in Python?
 

diff --git a/docs/4. Validating/4.3. Testing.md → docs/4. Validating/4.2. Testing.md b/docs/4. Validating/4.3. Testing.md → docs/4. Validating/4.2. Testing.md
@@ -1,4 +1,4 @@
-# 4.3. Tests
+# 4.2. Testing
 
 ## What are Tests in Python?
 

diff --git a/docs/4. Validating/4.4. Logging.md → docs/4. Validating/4.3. Logging.md b/docs/4. Validating/4.4. Logging.md → docs/4. Validating/4.3. Logging.md
@@ -1,4 +1,4 @@
-# 4.4. Logging
+# 4.3. Logging
 
 ## What is Logging in Python?
 

diff --git a/docs/4. Validating/4.4. Security.md b/docs/4. Validating/4.4. Security.md
@@ -0,0 +1,3 @@
+# 4.4. Security
+
+ruff, bandit
diff --git a/docs/4. Validating/index.md b/docs/4. Validating/index.md
@@ -0,0 +1 @@
+# 4. Validating
diff --git a/docs/5. Refining/5.0. Patterns.md b/docs/5. Refining/5.0. Patterns.md
@@ -1,3 +1,3 @@
-# 5.6. Security
+# 5.0. Patterns
 
 Pydantic
diff --git a/docs/5. Refining/5.4. Containers.md b/docs/5. Refining/5.4. Containers.md
@@ -0,0 +1,3 @@
+# 5.4. Containers
+
+Docker
diff --git a/docs/5. Refining/5.4. Versions.md b/docs/5. Refining/5.4. Versions.md
diff --git a/docs/5. Refining/5.5. Containers.md b/docs/5. Refining/5.5. Containers.md
diff --git a/docs/5. Refining/5.5. Experiments.md b/docs/5. Refining/5.5. Experiments.md
@@ -0,0 +1 @@
+# 5.5. Experiments
diff --git a/docs/5. Refining/5.6. Model Registries.md b/docs/5. Refining/5.6. Model Registries.md
@@ -0,0 +1 @@
+# 5.6. Model Registries
diff --git a/docs/5. Refining/5.6. Security.md b/docs/5. Refining/5.6. Security.md
diff --git a/docs/5. Refining/index.md b/docs/5. Refining/index.md
@@ -0,0 +1 @@
+# 5. Refining
diff --git a/docs/6. Collaborating/6.0. Repository.md b/docs/6. Collaborating/6.0. Repository.md
@@ -0,0 +1 @@
+# 6.0. Repository