Commit

2. done
fmind committed Mar 26, 2024
1 parent 54a589c commit e6cfba5
Showing 9 changed files with 441 additions and 272 deletions.
6 changes: 5 additions & 1 deletion .vscode/settings.json
@@ -3,7 +3,11 @@
"codebases",
"Codespaces",
"Colab",
"dataframes",
"hyperparameters",
"overfitting",
"prohib",
"pyenv"
"pyenv",
"scikit"
]
}
40 changes: 24 additions & 16 deletions docs/2. Prototyping/2.0. Notebooks.md
````diff
@@ -1,28 +1,36 @@
 # 2.0. Notebooks
 
-## What is a notebook?
+## What is a Python notebook?
 
-A notebook, typically with the extension `.ipynb`, is an interactive document combining source code, explanatory text, and output. Stored as a JSON file on the disk, it transforms into an intuitive interface in IDEs, displaying code cells, narrative text, visualizations, and results in an integrated format.
+A Python notebook, often referred to simply as a "notebook," is an interactive computing environment that allows users to combine executable code, rich text, visuals, and other multimedia resources in a single document. This tool is invaluable for data analysis, machine learning projects, documentation, and educational purposes, among others. Notebooks are structured in a cell-based format, where each cell can contain either code or text. When code cells are executed, the output is displayed directly beneath them, facilitating a seamless integration of code and content.
 
+## Where can I learn how to use notebooks?
+
+Learning how to use notebooks is straightforward, thanks to a plethora of online resources. Beginners can start with the official documentation of popular notebook applications like Jupyter (Jupyter Documentation) or Google Colab. For more interactive learning, platforms such as Coursera, Udacity, and edX offer courses specifically tailored to using Python notebooks for data science and machine learning projects. YouTube channels dedicated to data science and Python programming also frequently cover notebooks, providing valuable tips and tutorials for both beginners and advanced users.
+
 ## Why should I use a notebook for prototyping?
 
-Notebooks are particularly suited for prototyping due to their interactive nature:
-- **Interactive Development**: Allows for real-time code execution and immediate feedback.
-- **Exploratory Analysis**: Ideal for data scientists to experiment with different approaches swiftly.
-- **Visualization Support**: Seamlessly integrates data visualizations alongside code.
+Notebooks offer an unparalleled environment for prototyping due to their unique blend of features:
+
+- **Interactive Development**: Notebooks allow for real-time code execution, offering immediate feedback on code functionality. This interactivity is especially beneficial when testing new ideas or debugging.
+- **Exploratory Analysis**: The ability to quickly iterate over different analytical approaches and visualize results makes notebooks an ideal tool for exploratory data analysis.
+- **Productive Environment**: The integrated environment of notebooks helps maintain focus by minimizing the need to switch between tools or windows. This consolidation of resources boosts productivity and streamlines the development process.
+
+In addition, the narrative structure of notebooks supports a logical flow of ideas, facilitating the documentation of thought processes and methodologies. This makes it easier to share insights with peers or stakeholders and foster collaboration.
+
+As an alternative to notebooks, consider using the [Python Interactive Window](https://code.visualstudio.com/docs/python/jupyter-support-py) in Visual Studio Code or other text editors. These environments combine the interactivity and productivity benefits of notebooks with the robustness and feature set of an integrated development environment (IDE), such as source control integration, advanced editing tools, and a wide range of extensions for additional functionality.
 
-*Note*: As an alternative, consider the [Python Interactive Window](https://code.visualstudio.com/docs/python/jupyter-support-py) in VS Code, if your code editor supports it.
+## Can I use my notebook in production instead of creating a Python package?
 
-## Can I use my notebook instead of creating a Python package?
+Using notebooks in the early stages of development offers many advantages; however, they are not well-suited for production environments due to several limitations:
 
-While notebooks are convenient for initial stages, they have limitations for scalable software development:
-- **Mixed Content**: Merging code and output can be messy and less readable.
-- **Non-Sequential Flow**: The execution order isn't inherently linear, leading to potential confusion.
-- **Code Review Challenges**: Difficult to conduct thorough code reviews and implement unit tests.
-- **Lack of Reusability**: Doesn't naturally encourage the development of reusable code structures like classes and functions.
+- **Lack of Integration**: Notebooks often do not integrate seamlessly with tools commonly used in the Python software development ecosystem, such as testing frameworks (pytest), linting tools (ruff), and package managers (poetry).
+- **Mixed Content**: The intermingling of code, output, and narrative in a single document can complicate version control and maintenance, especially with complex projects.
+- **Non-Sequential Flow**: Notebooks do not enforce a linear execution order, which can lead to confusion and errors if cells are run out of sequence.
+- **Lack of Reusability**: The format of notebooks does not naturally encourage the development of reusable and modular code, such as functions, classes, or packages.
 
-For robust development, transitioning from notebooks to more structured Python packages is advisable.
+For these reasons, it is advisable to transition from notebooks to structured Python packages for production. Doing so enables better software development practices, such as unit testing, continuous integration, and deployment, thereby enhancing code quality and maintainability.
 
-## Do I need to review this section even if I know how to use notebooks?
+## Do I need to review this chapter even if I know how to use notebooks?
 
-Even experienced users may benefit from revisiting this section. It's a chance to refresh knowledge, discover new features or tools, and stay updated with best practices in notebook usage.
+Yes, even seasoned users can benefit from reviewing this chapter. It introduces advanced techniques, new features, and tools that you may not know about. Furthermore, the chapter emphasizes structuring notebooks effectively and applying best practices to improve readability, collaboration, and overall efficiency.
````
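
The Python Interactive Window recommended in this diff runs ordinary `.py` files that are split into cells by `# %%` comments. The sketch below is a hypothetical prototyping script in that format; the `prices` data and the `mean_price` name are illustrative assumptions, not part of the course material. Because the file is plain Python, it can also be executed top to bottom with `python`, which avoids the out-of-order execution issue listed under **Non-Sequential Flow**.

```python
# %% Load data (hypothetical in-memory sample standing in for a real dataset)
prices = [12.5, 9.0, 15.25, 11.0]

# %% Compute a summary statistic
mean_price = sum(prices) / len(prices)

# %% Display the result (the Interactive Window shows output next to the code)
print(f"mean price: {mean_price:.2f}")
```

Each `# %%` block can be run on its own from the editor, while the file itself stays diff-friendly plain text under version control.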
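
To make the recommended move from notebooks to a Python package concrete, the sketch below turns a notebook computation into a small function plus pytest-style tests. The module path `my_project/metrics.py`, the test file name, and the `mean` function are hypothetical; in a real project the test file would import the function (for example `from my_project.metrics import mean`) instead of living in the same file.

```python
# Hypothetical module: my_project/metrics.py
def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# Hypothetical test file: tests/test_metrics.py
def test_mean() -> None:
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_rejects_empty_list() -> None:
    # Plain try/except keeps this sketch free of a pytest import.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an empty list")
```

By default, pytest collects functions whose names start with `test_`, so running `pytest` in such a project would execute both checks, and the same `mean` function could still be imported from a notebook during exploration.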
64 changes: 34 additions & 30 deletions docs/2. Prototyping/2.1. Imports.md
````diff
@@ -2,79 +2,83 @@
 
 ## What are code imports?
 
-Code imports in Python are directives that allow you to include and use functionality from external libraries or modules within your project. They are essential for accessing a wide range of capabilities that Python and its ecosystem offer.
+In Python, **code imports** are statements that let you include functionality from other libraries or modules into your current project. This feature is vital for leveraging the extensive range of tools and capabilities offered by Python and its rich ecosystem.
 
-According to [PEP 8](https://peps.python.org/pep-0008/#imports), imports should be grouped in the following order:
-1. **Standard Library Imports**: Built-in Python modules (e.g., os, sys, math).
-2. **Related Third Party Imports**: External libraries installed via package managers (e.g., numpy, pandas).
-3. **Local Application/Library Specific Imports**: Modules or packages specific to your project.
+As outlined by [PEP 8](https://peps.python.org/pep-0008/#imports), the Python community recommends organizing imports in a specific order for clarity and maintenance:
 
-Example of code imports in a notebook:
+1. **Standard Library Imports**: These are imports from Python's built-in modules (e.g., `os`, `sys`, `math`). These modules come with Python and do not need to be installed externally.
+2. **Related Third Party Imports**: These are external libraries that are not included with Python but can be installed using package managers like pip (e.g., `numpy`, `pandas`). They extend Python's functionality significantly.
+3. **Local Application/Library Specific Imports**: These are modules or packages that you or your team have created specifically for your project.
+
+Here's an example to illustrate how imports might look in a Python script or notebook:
 
 ```python
-import os # standard
-import pandas as pd # external
-from my_project import my_module # local
+import os # Standard library module
+import pandas as pd # External library module
+from my_project import my_module # Internal project module
 ```
 
 ## Which packages do I need for my project?
 
-For a data science project, there are several key packages available on [PyPI](https://pypi.org/):
+In the realm of data science, a few key Python packages form the backbone of most projects, enabling data manipulation, visualization, and machine learning. Essential packages include:
 
-- **[pandas](https://pandas.pydata.org/)**: Essential for data manipulation and analysis.
-- **[plotly.express](https://plotly.com/python/plotly-express/)**: For creating interactive visualizations.
-- **[scikit-learn](https://scikit-learn.org/)**: A versatile library for machine learning.
+- **Pandas**: For data manipulation and analysis.
+- **NumPy**: For numerical computing and array manipulation.
+- **Matplotlib**: For creating static, interactive, and animated visualizations.
+- **Scikit-learn**: For machine learning, providing simple and efficient tools for data analysis and modeling.
+- **Plotly**: For interactive and aesthetically pleasing visualizations.
 
-Install these using poetry:
+To integrate these packages into your project using poetry, you can execute the following command in your terminal:
 
 ```bash
-$ poetry add pandas plotly scikit-learn
+poetry add pandas numpy matplotlib scikit-learn plotly
 ```
 
-*Note*: Ensure you're in the correct [virtual environment](https://peps.python.org/pep-0405/) linked to your project.
+This command tells poetry to download and install these packages, along with their dependencies, into your project environment, ensuring version compatibility and easy package management.
 
 ## How should I organize my imports to facilitate my work?
 
-Organizing imports is a matter of preference and project standards. Importing entire modules (e.g., `import pandas as pd`) is often recommended for clarity. It helps in identifying the module origin of functions/classes and in adjusting imports as your code evolves.
+Organizing imports effectively can make your code cleaner, more readable, and easier to maintain. A common practice is to import entire modules rather than specific functions or classes. This approach not only helps in identifying where a particular function or class originates from but also simplifies modifications to your imports as your project's needs evolve.
 
+Consider the following examples:
+
 ```python
-# import module
+# Importing entire modules (recommended)
 import pandas as pd
 from sklearn import ensemble
 model = ensemble.RandomForestClassifier()
 
-# import functions/classes
+# Importing specific functions/classes
 from sklearn.ensemble import RandomForestClassifier
-model = RandomForestClassifier
+model = RandomForestClassifier()
 ```
 
+Importing entire modules (`import pandas as pd`) is generally recommended for clarity, as it makes it easier to track the source of various functions and classes used in your code.
+
 ## Are there any side effects when importing modules in Python?
 
-Yes, importing a module in Python can have side effects because the entire module's code is executed upon import. This behavior can be beneficial or potentially harmful. Hence, it's crucial to:
-- Only import trustworthy packages.
-- Be cautious about unintended side effects in your own modules.
-- Clearly document any intentional side effects.
+Yes, importing a module in Python executes all the top-level code in that module, which can lead to side effects. These effects can be both intentional and unintentional. It's crucial to import modules from trusted sources to avoid security risks or unexpected behavior. Be especially cautious of executing code with side effects in your own modules, and make sure any such behavior is clearly documented.
 
-Example of risky behavior:
+Consider this cautionary example:
 
 ```python
-# In a module, a potentially harmful operation could be triggered
+# A module with a potentially harmful operation
 # lib.py
 import os
-os.system("rm -rf /") # Dangerous command!
+os.system("rm -rf /") # This command is extremely dangerous!
 
 # main.py
-import lib # Executing lib.py could lead to data loss
+import lib # Importing lib.py could lead to data loss
 ```
 
 ## What should I do if packages cannot be imported from my notebook?
 
-If a package isn't importing correctly, it's often due to the Python interpreter's inability to locate it. This is common when working with virtual environments. To troubleshoot, check the interpreter path and module search paths in your notebook:
+If you encounter issues importing packages, it may be because the Python interpreter can't find them. This problem is common when using virtual environments. To diagnose and fix such issues, check the interpreter path and module search paths as follows:
 
 ```python
 import sys
 print("Interpreter path:", sys.executable)
 print("Module search paths:", sys.path)
 ```
 
-Adjusting these paths or ensuring the correct virtual environment is active can resolve import issues.
+Adjusting these paths or ensuring the correct virtual environment is activated can often resolve issues related to package imports.
````
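
After running `poetry add`, one way to confirm which versions ended up in the active environment is `importlib.metadata.version`, which looks packages up by their distribution name (for example `scikit-learn`, not the import name `sklearn`). The sketch below is an illustrative helper, `installed_versions`, not part of poetry; `poetry show` gives similar information from the command line.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as passed to `poetry add` (scikit-learn, not sklearn)
PACKAGES = ["pandas", "numpy", "matplotlib", "scikit-learn", "plotly"]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: dict[str, Optional[str]] = {}
    for package in packages:
        try:
            versions[package] = version(package)
        except PackageNotFoundError:
            versions[package] = None  # not installed in the current environment
    return versions


for package, found in installed_versions(PACKAGES).items():
    print(f"{package}: {found or 'not installed'}")
```

A `not installed` result in a notebook usually means the kernel is running a different interpreter than the one poetry installed into.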
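
A common way to keep a module's side effects under control is to place script-like code behind an `if __name__ == "__main__":` guard, so it runs when the file is executed directly but not when it is imported. The sketch below writes a small hypothetical module, `safe_lib.py`, to a temporary directory and imports it to show that only the unguarded top-level statement runs on import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source: one unguarded statement and one guarded block.
MODULE_SOURCE = '''
print("top-level code: runs on import")

def greet() -> str:
    return "hello"

if __name__ == "__main__":
    print("guarded code: runs only with `python safe_lib.py`")
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "safe_lib.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)  # make the temporary directory importable
    importlib.invalidate_caches()  # recommended after creating modules at runtime
    try:
        safe_lib = importlib.import_module("safe_lib")  # prints only the top-level line
    finally:
        sys.path.remove(tmp)

# The module object stays usable after the temporary files are deleted.
print(safe_lib.greet())
```

On import, `__name__` is set to the module name (`"safe_lib"`), so the guarded `print` is skipped; running the file directly sets it to `"__main__"` instead.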
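
Before adjusting paths, it can help to check whether the interpreter reported by `sys.executable` can locate a package at all. The sketch below uses `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found on `sys.path`; the helper name `can_import` and the module names checked are illustrative choices.

```python
import importlib.util
import sys


def can_import(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module.

    For top-level names, find_spec searches sys.path without importing the module.
    """
    return importlib.util.find_spec(module_name) is not None


print("Interpreter:", sys.executable)
for name in ["json", "pandas"]:
    status = "found" if can_import(name) else "NOT found; check the active environment"
    print(f"{name}: {status}")
```

If a package is missing here but present in your terminal, the notebook kernel and the terminal are most likely using different virtual environments.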