3. and 4. review
fmind committed Apr 3, 2024
1 parent 1ceff3f commit a5c063d
Showing 13 changed files with 58 additions and 58 deletions.
14 changes: 7 additions & 7 deletions docs/3. Refactoring/3.0. Package.md
@@ -4,7 +4,7 @@

A Python package is a structured collection of Python modules, which allows for a convenient way to organize and share code. Among the various formats a package can take, the **wheel** format (.whl) stands out. Wheels are a built package format that can significantly speed up the installation process for Python software, compared to distributing source code and requiring the user to build it themselves.

## Why do I need to create a Python package?
## Why do you need to create a Python package?

Creating a Python package offers multiple benefits, particularly for developers looking to distribute their code effectively:

@@ -15,7 +15,7 @@ Creating a Python package offers multiple benefits, particularly for developers

Additionally, creating a package can enhance the maintainability of your code, enforce good coding practices by encouraging modular design, and facilitate version control and dependency management.

## Which tool should I use to create a Python package?
## Which tool should you use to create a Python package?

The Python ecosystem provides several tools for packaging, each with its unique features and advantages. While the choice can seem overwhelming, as humorously depicted in the [xkcd comic on Python environments](https://xkcd.com/1987/), **Poetry** emerges as a standout option. Poetry simplifies dependency management and packaging, offering an intuitive interface for developers.

@@ -41,15 +41,15 @@ poetry build --format wheel

For those seeking alternatives, tools like **PDM**, **Hatch**, and **Pipenv** offer different approaches to package management and development, each with its own set of features designed to cater to various needs within the Python community.

## Do you recommend Conda for my AI/ML project?
## Do you recommend Conda for your AI/ML project?

Although Conda is a popular choice among data scientists for its ability to manage complex dependencies, it's important to be aware of its limitations. Challenges such as slow performance, a complex dependency resolver, and confusing channel management can hinder productivity. Moreover, Conda's integration with the Python ecosystem, especially with new standards like `pyproject.toml`, is limited. For managing complex dependencies in AI/ML projects, **Docker containers** present a robust alternative, offering better isolation and compatibility across environments.

## How can I install new dependencies with Poetry?
## How can you install new dependencies with Poetry?

Please refer to [this section of the course](../1. Initializing/1.3. Poetry.md#how-can-i-install-dependencies-for-my-project-with-poetry).

## Which metadata should I provide to my Python package?
## Which metadata should you provide to your Python package?

Including detailed metadata in your `pyproject.toml` file is crucial for defining your package's identity and dependencies. This file should contain essential information such as the package name, version, authors, and dependencies. Here's an example that outlines the basic structure and content for your package's metadata:

@@ -78,7 +78,7 @@ build-backend = "poetry.core.masonry.api"

This information not only aids users in understanding what your package does but also facilitates its discovery and integration into other projects.

## Where should I add the source code of my Python package?
## Where should you add the source code of your Python package?

For a clean and efficient project structure, placing your package's source code in a `src` directory is recommended. This approach, known as the `src` layout, separates your package's code from other project files, such as tests and documentation, reducing the risk of import clashes and making it easier to package and distribute your code.

@@ -91,7 +91,7 @@ touch src/bikes/__init__.py

The presence of an `__init__.py` file within a directory indicates to Python that this directory should be treated as a package, making it possible for other parts of your project or external projects to import its modules.
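To make the `src` layout concrete, here is a minimal standard-library sketch that builds `src/bikes/__init__.py` in a temporary directory, puts `src` on `sys.path` (roughly what an editable install such as `poetry install` arranges for you), and imports the package. The `VERSION` attribute is a hypothetical placeholder, not part of the course code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_src_layout(root: Path) -> Path:
    """Create src/bikes/__init__.py under root and return the src directory."""
    package = root / "src" / "bikes"
    package.mkdir(parents=True)
    # Hypothetical content: module-level names in __init__.py become package attributes.
    (package / "__init__.py").write_text('VERSION = "0.1.0"\n')
    return root / "src"


with tempfile.TemporaryDirectory() as tmp:
    src = build_src_layout(Path(tmp))
    sys.path.insert(0, str(src))  # make src/ importable, as an installed package would be
    try:
        bikes = importlib.import_module("bikes")
        print(bikes.VERSION, Path(bikes.__file__).name)
    finally:
        # Undo the demo's side effects on the interpreter state.
        sys.path.remove(str(src))
        sys.modules.pop("bikes", None)
```

Without the `__init__.py` file, the directory would still import as an implicit namespace package, but it would not carry package-level code such as `VERSION`.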

## Should I publish my Python package? On which platform should I publish it?
## Should you publish your Python package? On which platform should you publish it?

Deciding whether to publish your Python package depends on your goals. If you aim to share your work with the broader community or need a convenient way to distribute your code across projects or teams, publishing is a great option. **The Python Package Index (PyPI)** is the primary repository for public Python packages, making it an ideal platform for reaching a wide audience.

8 changes: 4 additions & 4 deletions docs/3. Refactoring/3.1. Modules.md
@@ -20,13 +20,13 @@ import math
print(dir(math))
```

## Why do I need Python modules?
## Why do you need Python modules?

Python modules are essential for managing complexity in your projects. They provide a way to segment your code into distinct namespaces, making your projects more organized, readable, and maintainable. For example, in a machine learning project, you might have separate modules for models (`models.py`), data processing (`datasets.py`), and utility functions (`utils.py`). This separation helps in understanding, testing, and collaborating on large codebases.

Modules become indispensable as your project grows beyond a simple script. While a project with fewer than 100 lines of code might not need separate modules, larger projects benefit greatly from a modular structure.

## How should I create a Python module?
## How should you create a Python module?

Creating a Python module is as simple as creating a `.py` file within your project package. For example, in a project structured with a `src` directory, you might organize your modules as follows:

@@ -37,7 +37,7 @@ $ touch src/bikes/datasets.py

This creates two modules, `models.py` and `datasets.py`, under the `bikes` package. Each module can then contain specific functionalities related to your project, such as defining data models or handling dataset loading and preprocessing.

## How should I import my Python module?
## How should you import your Python module?

Importing modules in Python is governed by the directories listed in `sys.path`, similar to how a Unix shell resolves commands through the `PATH` variable. When importing a module, Python searches these directories in order and imports the first match.

@@ -50,7 +50,7 @@ print(sys.path)

After installing your package locally (e.g., using `poetry install`), your package's directory will be added to `sys.path`, allowing you to import its modules without specifying their full path.
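The "first match wins" rule can be observed directly. This minimal sketch writes a module with the same name into two temporary directories, places both on `sys.path`, and shows that the earlier entry is the one imported. The module name `demo_settings` is hypothetical and used only for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE = "demo_settings"  # hypothetical module name, used only for this demo

with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Two directories each provide a module with the same name.
    Path(first, f"{MODULE}.py").write_text('SOURCE = "first"\n')
    Path(second, f"{MODULE}.py").write_text('SOURCE = "second"\n')
    sys.path[:0] = [first, second]  # 'first' is searched before 'second'
    try:
        module = importlib.import_module(MODULE)
        print(module.SOURCE)  # the earlier directory on sys.path wins
    finally:
        # Restore sys.path and forget the cached module.
        for entry in (first, second):
            sys.path.remove(entry)
        sys.modules.pop(MODULE, None)
```

This is also why a stray file named like one of your modules (or like a standard library module) in an earlier `sys.path` entry can shadow the one you meant to import.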

## How should I organize my Python modules?
## How should you organize your Python modules?

Organizing your Python modules can significantly affect your project's clarity and maintainability. Here are a few strategies for structuring your modules:

8 changes: 4 additions & 4 deletions docs/3. Refactoring/3.2. Paradigms.md
@@ -37,15 +37,15 @@ class CustomModel(BaseEstimator, TransformerMixin):

This class demonstrates how to structure a machine learning model in an object-oriented way, making it compatible with `scikit-learn`'s pipeline and model selection tools.

## Why do I need to use functions and objects?
## Why do you need to use functions and objects?

Functions and objects play a crucial role in structuring code in a readable, maintainable, and reusable manner. They allow you to encapsulate functionality and state, making complex software systems more manageable.

Functions enable you to define a block of code that performs a single action, which can be executed whenever the function is called. This promotes code reuse and simplifies debugging and testing by isolating functionality.

Objects, fundamental to the object-oriented programming paradigm, bundle data and the methods that operate on that data. This encapsulation fosters modularity, as objects can be developed independently and used in different contexts.

## How should I write a new function or object?
## How should you write a new function or object?

When identifying opportunities to encapsulate code into functions or objects, look for repetitive code patterns, complex logic that needs isolation, or concepts that can be modeled as real-world objects.

@@ -61,7 +61,7 @@ def load_dataset(path: str, index_col: str = "Id") -> pd.DataFrame:

When it comes to objects, encapsulate data and behavior that logically belong together. For instance, a `DataPreprocessor` class could encapsulate methods for cleaning, normalizing, and transforming data, keeping these operations neatly packaged and reusable.

## How should I organize all my functions and objects?
## How should you organize all your functions and objects?

Structuring functions and objects into modules helps maintain a clean and navigable codebase. This structure should evolve naturally, starting from a simple layout and growing in complexity as the project expands. For example:

@@ -71,7 +71,7 @@ Structuring functions and objects into modules helps maintain a clean and naviga

This modular approach aids in separation of concerns, making your code more organized and manageable.

## What are the best practices for functions and objects?
## What are the best practices for creating functions and objects?

Following best practices ensures that your functions and objects are reliable, maintainable, and easy to understand:

10 changes: 5 additions & 5 deletions docs/3. Refactoring/3.3. Entrypoints.md
@@ -6,11 +6,11 @@ Package entrypoints are mechanisms in Python packaging that facilitate the expos

To elaborate, entrypoints are specified in a package's setup configuration, marking certain functions or classes to be directly accessible. This setup benefits both developers and users by simplifying access to a package's capabilities, improving interoperability among different software components, and enhancing the user experience by providing straightforward commands to execute tasks.

## Why do I need to set up entrypoints?
## Why do you need to set up entrypoints?

Entrypoints are essential for making specific functionalities of your package directly accessible from the command-line interface (CLI) or to other software. By setting up entrypoints, you allow users to execute components of your package directly from the CLI, streamlining operations like script execution, service initiation, or utility invocation. Additionally, entrypoints facilitate dynamic discovery and utilization of your package's functionalities by other software and frameworks, such as Apache Airflow, without the need for hard-coded paths or module names. This flexibility is particularly beneficial in complex, interconnected systems where adaptability and ease of use are paramount.

## How do I create entrypoints with poetry?
## How do you create entrypoints with Poetry?

Creating entrypoints with Poetry involves specifying them in the `pyproject.toml` file under the `[tool.poetry.scripts]` section. This section outlines the command-line scripts that your package will make available:

@@ -27,7 +27,7 @@ $ poetry run bikes one two three

This snippet runs the `bikes` entrypoint from the CLI and passes three positional arguments: `one`, `two`, and `three`.

## How can I use this entrypoint in other software?
## How can you use this entrypoint in other software?

Defining and installing a package with entrypoints enables other software to easily leverage these entrypoints. For example, within Apache Airflow, you can incorporate a task in a Directed Acyclic Graph (DAG) to execute one of your CLI tools as part of an automated workflow. By utilizing Airflow's `BashOperator` or `PythonOperator`, your package’s CLI tool can be invoked directly, facilitating seamless integration:

@@ -66,7 +66,7 @@ with DAG(

In this example, `submit_databricks_job` is a task that executes the `bikes` entrypoint.

## How can I use this entrypoint from the command-line (CLI)?
## How can you use this entrypoint from the command-line (CLI)?

Once your Python package has been packaged with Poetry and a wheel file is generated, you can install and use the package directly from the command-line interface (CLI). Here are the steps to accomplish this:

@@ -88,7 +88,7 @@ pip install dist/bikes*.whl
bikes one two three
```

## Which should be the input or output of my entrypoint?
## What should be the inputs and outputs of your entrypoint?

**Inputs** for your entrypoint can vary based on the requirements and functionalities of your package but typically include:

10 changes: 5 additions & 5 deletions docs/3. Refactoring/3.4. Configurations.md
@@ -17,15 +17,15 @@ job:
This structure allows for easy adjustment of parameters like file paths or job kinds, facilitating the program's operation across diverse environments and use cases.

## Why do I need to write configurations?
## Why do you need to write configurations?

Configurations enhance your code's flexibility, making it adaptable to different environments and scenarios without source code modifications. This separation of code from its execution environment boosts portability and simplifies updates or changes, much like adjusting settings in an application without altering its core functionality.

## Which file format should I use for configurations?
## Which file format should you use for configurations?

When choosing a format for configuration files, common options include JSON, TOML, and YAML. YAML is frequently preferred for its readability, ease of use, and ability to include comments, which can be particularly helpful for documentation and maintenance. However, YAML loaders that can construct arbitrary Python objects may execute malicious content, so always use a safe loader (such as `yaml.safe_load` in PyYAML).

## How should I pass configuration files to my program?
## How should you pass configuration files to your program?

Configuration files are typically passed to your program through the CLI, which makes it straightforward to combine them with additional command options or flags. For example, executing a command like:
@@ -35,7 +35,7 @@ $ bikes defaults.yaml training.yaml --verbose

This example enables the combination of configuration files with verbosity options for more detailed logging. This flexibility is also extendable to configurations stored on cloud services, provided your application supports such paths.
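A minimal sketch of the CLI side, assuming the standard library's `argparse` (the course may use a different parser). It only collects the configuration paths in order plus a verbosity flag; loading and merging the files is a separate step, and cloud URIs such as `s3://...` pass through as plain strings.

```python
import argparse


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Collect one or more configuration paths plus a verbosity flag."""
    parser = argparse.ArgumentParser(prog="bikes")
    parser.add_argument("files", nargs="+", help="configuration files, merged in order")
    parser.add_argument("--verbose", action="store_true", help="enable detailed logging")
    return parser.parse_args(argv)


args = parse_cli(["defaults.yaml", "training.yaml", "--verbose"])
print(args.files, args.verbose)  # ['defaults.yaml', 'training.yaml'] True
```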

## Which toolkit should I use to parse and load configurations?
## Which toolkit should you use to parse and load configurations?

For handling configurations in Python, [OmegaConf](https://omegaconf.readthedocs.io/) offers a powerful solution with features like YAML loading, deep merging, variable interpolation, and read-only configurations. It's particularly suited for complex settings and hierarchical structures. Additionally, for applications involving cloud storage, [cloudpathlib](https://cloudpathlib.drivendata.org/stable/) facilitates direct loading from services like AWS, GCP, and Azure.
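OmegaConf exposes deep merging through `OmegaConf.merge`. To show what a deep merge means without the dependency, here is a standard-library sketch on plain dictionaries; it is not OmegaConf's implementation (which also handles interpolation and read-only nodes), and the `job`/`KIND`/`inputs` keys are hypothetical.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where nested keys in `override` replace those in `base`."""
    merged = deepcopy(base)  # never mutate the inputs
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into sub-sections
        else:
            merged[key] = deepcopy(value)  # scalars and lists replace the base value
    return merged


defaults = {"job": {"KIND": "TrainingJob", "inputs": {"path": "data/inputs.parquet"}}}
training = {"job": {"inputs": {"path": "s3://bucket/inputs.parquet"}}}
print(deep_merge(defaults, training))
```

Later files win, which matches the `bikes defaults.yaml training.yaml` ordering: shared defaults first, job-specific overrides last.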

@@ -58,7 +58,7 @@ class TrainTestSplitter(pdt.BaseModel):
random_state: int = 42
```

## When should I use environment variables instead of configurations files?
## When should you use environment variables instead of configuration files?

Environment variables are more suitable for simple configurations or when dealing with sensitive information that shouldn't be stored in files, even though they lack the structure and type-safety of dedicated configuration files. They are universally supported and easily integrated but may become cumbersome for managing complex or numerous settings.

8 changes: 4 additions & 4 deletions docs/3. Refactoring/3.5. Documentations.md
@@ -4,7 +4,7 @@

Software documentation encompasses written text or illustrations that support a software project. It can range from comprehensive API documentation to high-level overviews, guides, and tutorials. Effective documentation plays a crucial role in assisting users and contributors by explaining how to utilize and contribute to a project, ensuring the software is accessible and maintainable.

## Why do I need to create documentations?
## Why do you need to create documentation?

Documentation is pivotal for several reasons:

@@ -15,7 +15,7 @@ Documentation is pivotal for several reasons:

High-quality documentation encourages the use of your software and is valued by your users, while poor documentation can hinder developer productivity and deter users from engaging with your solution.

## How should I associate documentations to my code base?
## How should you associate documentation with your code base?

Documentation within Python code can be incorporated in three key places:

@@ -55,7 +55,7 @@ class ParquetReader(Reader):

Beyond in-code documentation, complementing it with external documentation (e.g., project organization guides or how to report a bug) is beneficial.
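In-code documentation is kept at runtime, which is what `help()`, IDEs, and documentation generators read. A minimal sketch using Google-style sections (one common convention, assumed here); the `split_ratio` function is hypothetical.

```python
import inspect


def split_ratio(train: int, test: int) -> float:
    """Compute the fraction of samples assigned to the test set.

    Args:
        train: number of training samples.
        test: number of testing samples.

    Returns:
        The test fraction, between 0 and 1.
    """
    return test / (train + test)


# inspect.getdoc returns the docstring with its indentation cleaned up.
print(inspect.getdoc(split_ratio).splitlines()[0])
print(split_ratio(80, 20))  # 0.2
```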

## Which tool, format, and convention should I use to create documentations?
## Which tool, format, and convention should you use to create documentation?

For creating documentation, you have multiple tools, formats, and conventions at your disposal:

@@ -87,7 +87,7 @@ Diataxis is a framework that offers a systematic approach to crafting technical

![Diataxis quadrant](https://diataxis.fr/_images/diataxis.png)

## What are best practices for writing my project documentation?
## What are the best practices for writing your project documentation?

1. **Clarity and Conciseness**: Strive for clear, straightforward documentation, avoiding complex language or unnecessary technical jargon.
2. **Consistent Style**: Maintain a uniform style and format throughout all documentation to enhance readability.