4.0, 4.1, 4.2 done
fmind committed Mar 30, 2024
1 parent 4236337 commit ec525a8
Showing 5 changed files with 441 additions and 102 deletions.
195 changes: 171 additions & 24 deletions docs/4. Validating/4.0. Typing.md
@@ -1,35 +1,182 @@
# 4.0. Typing

## What is Typing in Python?
## What is programming typing?

Typing in Python refers to specifying the types of variables, function parameters, and return values in your code. This practice, known as type annotation, was introduced in Python 3.5 and has been increasingly adopted for its benefits in code clarity and error prevention.
Typing in programming involves designating specific data types for variables, functions, and classes within a programming language. This concept is critical for managing how data is stored, processed, and manipulated within software applications.

### Tool for Typing in Python:
Programming languages are categorized into three main types based on how they handle typing:

1. **mypy**:
- **Description**: mypy is a popular static type checker for Python. It uses the type hints you provide in your code to verify that your code adheres to these types, catching potential bugs and inconsistencies before runtime.
- **Usage**: After adding type hints to your Python code, run mypy to analyze the codebase. It checks for type errors and reports mismatches.
- **Benefits**: Early detection of type-related bugs, enhanced code readability, and improved maintainability. It can be particularly useful in large codebases or when working in teams, ensuring that everyone adheres to expected data types.
- **Static typing**: In statically typed languages, the data type of a variable is known at compile time, which means that type checking is done during the compilation of the program. Examples include Java, C++, and Haskell. This approach allows for early detection of type-related errors, contributing to more robust and error-resistant code.
- **Dynamic typing**: Dynamically typed languages determine the type of a variable at runtime. This flexibility allows for more rapid development but can introduce type-related errors that are harder to detect early in the development process. Examples of dynamically typed languages are Ruby, JavaScript, PHP, and Python.
- **Gradual typing**: Gradual typing offers a blend of static and dynamic typing, allowing developers to choose when to enforce type constraints. This approach provides the flexibility of dynamic typing while still enabling the benefits of static type checking where it's most useful. Languages that support gradual typing include TypeScript, Dart, and Python (from version 3.5 onwards with type annotations).

## Why is Typing Important in Python Projects?
Additionally, languages can have either a weak or strong type system:

The importance of typing in Python projects, particularly large-scale or complex ones, cannot be overstated. Here are several key reasons:
- **Weak typing**: In languages with weak typing, type coercion is common, allowing for more flexibility in operations between different types but at the risk of unexpected behavior or errors (e.g., 1 + "s" => "1s").
- **Strong typing**: Strongly typed languages enforce stricter rules about interactions between data types, reducing the chances of runtime errors due to unexpected type conversions but requiring more explicit declarations and conversions by the developer (e.g., 1 + "s" => error, str(1) + "s" => "1s").
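
As a rough illustration of the strong typing case, the sketch below shows how Python reacts to the `1 + "s"` example. The helper name `add` is hypothetical and exists only for this demonstration.

```python
def add(left, right):
    """Add two values, returning a readable message if Python rejects the operation."""
    try:
        return left + right
    except TypeError as error:
        # Strong typing: Python does not silently coerce an int into a str.
        return f"TypeError: {error}"


print(add(1, "s"))       # rejected: no implicit conversion between int and str
print(add(str(1), "s"))  # accepted: the conversion is explicit
```

A weakly typed language would typically coerce the integer and return a string here, which is convenient but can hide mistakes.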

- **Early Bug Detection**: Detects potential type-related issues at an early stage, preventing bugs that could be costly and time-consuming to debug later.
- **Enhanced Code Clarity**: Type annotations make the code easier to understand, providing clear expectations of what types of data functions will accept and return.
- **Improved Development Workflow**: Assists in developing a more disciplined approach to writing Python code, leading to fewer errors and higher code quality.
- **Facilitates Collaboration**: In team environments, typing ensures that all members have a clear understanding of the function interfaces and data structures used in the project.
- **Integration with IDEs**: Modern IDEs use type hints to provide better code completion, error highlighting, and refactoring tools.
## Why is typing useful in programs?

## Best Practices for Implementing Typing
The role of typing in programming, especially in complex or large-scale projects, is invaluable for several reasons:

1. **Gradual Implementation**: Start by adding type hints to the most critical parts of your codebase. Gradually expand to cover more areas as you become comfortable with the practice.
2. **Comprehensive Coverage**: Aim to cover function arguments, return types, and variable annotations. This comprehensive approach maximizes the benefits of static type checking.
3. **Stay Updated**: Keep abreast of developments in Python's typing system, as new features and improvements are regularly introduced.
4. **Use Specific Types**: Prefer specific type annotations (like `List[int]` instead of just `list`) for greater precision and clarity.
5. **Integrate with CI/CD Pipelines**: Incorporate mypy checks into your continuous integration/continuous deployment workflows to automatically catch type issues before they make it to production.
6. **Team Guidelines**: Establish team guidelines on how and when to use type annotations to maintain consistency across the codebase.
7. **Regular Reviews**: Regularly review the type annotations in your code, especially after major refactoring or updates to Python’s typing module, to ensure they remain accurate and useful.
8. **Leverage Advanced Features**: Explore advanced features of mypy, such as type inference, generic types, and custom type definitions, to handle more complex typing scenarios.
- **Early Bug Detection**: Typing helps in identifying potential type-related issues at the early stages of development, preventing bugs that could become costly and complex to resolve later.
- **Enhanced Code Clarity**: Type annotations clarify the expected data types for function inputs and outputs, making the code more readable and understandable.
- **Improved Development Workflow**: Adopting typing encourages a disciplined coding practice, resulting in fewer errors and enhanced code quality.
- **Facilitates Collaboration**: In team settings, clear type annotations ensure that all members understand the data structures and function interfaces, leading to more effective collaboration.
- **Integration with IDEs**: Advanced IDEs utilize type hints to offer superior code completion, error highlighting, and refactoring capabilities.

TODO: Pandera, Pydantic
Although specifying types requires additional effort, this investment significantly improves the codebase's quality.

## What is the relation between Python and typing?

Python is primarily recognized as a strong and dynamically typed language, allowing programmers to write code without specifying types explicitly. This approach is straightforward but may not be scalable for larger projects. Since Python 3.5, the language has supported gradual typing, enabling developers to annotate types. This feature enhances code clarity and aids in error prevention, especially during development.

For instance, a simple function without type annotations in Python might look like this:

```python
def print_n_times(message, n):
    for _ in range(n):
        print(message)
```

However, for better clarity and to take advantage of gradual typing, the same function with type annotations would be:

```python
def print_n_times(message: str, n: int) -> None:
    for _ in range(n):
        print(message)
```

Incorporating type annotations is highly recommended for the benefits they bring in terms of code clarity and early error detection, except in some cases where the effort might not justify the value.

It's important to note that Python type annotations are checked at development time by external tools, not enforced by the interpreter at runtime. They help verify the program's logic and flow and have no effect on runtime performance or optimization.
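
The point about development-time checking can be sketched as follows. The function `repeat` is a hypothetical example, not part of the project: the interpreter stores its annotations but does not enforce them, so a call with the wrong types still runs, while a static checker such as mypy would be expected to report it.

```python
def repeat(message: str, n: int) -> str:
    """Repeat a message n times."""
    return message * n


# Annotations are stored as metadata on the function object.
print(repeat.__annotations__)

# Correct usage according to the annotations.
print(repeat("ab", 2))

# Wrong argument type: the interpreter does not check annotations,
# so this call runs and returns an int instead of a str.
print(repeat(3, 2))
```

This is why running a type checker before committing code matters: the annotations alone do not stop a mistyped call.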

To dive deeper into Python typing, exploring resources such as the [Mypy cheatsheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html) and Python's built-in [typing module](https://docs.python.org/3/library/typing.html) is beneficial.

## Is it possible to provide types for a dataframe?

Yes, it's possible to provide types for dataframes using the [Pandera](https://pandera.readthedocs.io/en/stable/) library. Pandera offers a flexible and expressive API for validating data in dataframe-like objects, enhancing the readability and robustness of data processing pipelines.

Pandera allows for:

1. Defining a schema once and validating different dataframe types, including pandas, dask, modin, and pyspark.pandas.
2. Checking the types and properties of columns in a pandas DataFrame or values in a pandas Series.
3. Performing complex statistical validations, such as hypothesis testing.
4. Integrating seamlessly with data analysis and processing pipelines through function decorators.
5. Using a class-based API for dataframe models, similar to pydantic, and validating dataframes with typing syntax.
6. Synthesizing data from schema objects for property-based testing.
7. Validating dataframes lazily to execute all validation rules before raising an error.
8. Integrating with a rich ecosystem of Python tools like pydantic, fastapi, and mypy.

Here's an example schema for validating a dataframe in an MLOps codebase:

```python
import pandera as pa
import pandera.typing as papd
import pandera.typing.common as padt

class InputsSchema(pa.DataFrameModel):
    """Schema for the project inputs."""

    instant: papd.Index[padt.UInt32] = pa.Field(ge=0, check_name=True)
    dteday: papd.Series[padt.DateTime] = pa.Field()
    season: papd.Series[padt.UInt8] = pa.Field(isin=[1, 2, 3, 4])
    yr: papd.Series[padt.UInt8] = pa.Field(ge=0, le=1)
    mnth: papd.Series[padt.UInt8] = pa.Field(ge=1, le=12)
    hr: papd.Series[padt.UInt8] = pa.Field(ge=0, le=23)
    holiday: papd.Series[padt.Bool] = pa.Field()
    weekday: papd.Series[padt.UInt8] = pa.Field(ge=0, le=6)
    workingday: papd.Series[padt.Bool] = pa.Field()
    weathersit: papd.Series[padt.UInt8] = pa.Field(ge=1, le=4)
    temp: papd.Series[padt.Float16] = pa.Field(ge=0, le=1)
    atemp: papd.Series[padt.Float16] = pa.Field(ge=0, le=1)
    hum: papd.Series[padt.Float16] = pa.Field(ge=0, le=1)
    windspeed: papd.Series[padt.Float16] = pa.Field(ge=0, le=1)
    casual: papd.Series[padt.UInt32] = pa.Field(ge=0)
    registered: papd.Series[padt.UInt32] = pa.Field(ge=0)
```

## Is it possible to provide better types for classes?

Yes, [Pydantic](https://docs.pydantic.dev/latest/) enhances the native class syntax by validating class attributes and providing a cleaner, more efficient syntax.

Features of Pydantic include:

- Validation and serialization powered by type hints, integrating seamlessly with IDEs and static analysis tools.
- High performance due to core validation logic written in Rust.
- Capability to emit JSON Schema for easy integration with other tools.
- Support for both strict and lax modes for data validation.
- Validation for many standard library types, including dataclasses and TypedDicts.
- Extensive customization options for validators and serializers.
- A rich ecosystem of integrations with popular libraries like FastAPI and SQLModel.
- Reliability proven by widespread use across various industries and projects.

Example usage in an MLOps codebase:

```python
import pydantic as pdt

class GridCVSearcher(pdt.BaseModel):
    """Grid searcher with cross-fold validation for better model performance metrics."""

    n_jobs: int | None = None
    refit: bool = True
    verbose: int = 3
    error_score: str | float = "raise"
    return_train_score: bool = False
```

## How can I check my types with Python?

Mypy is the primary tool for type checking in Python, providing [command-line](https://pypi.org/project/mypy/) and [IDE integration](https://marketplace.visualstudio.com/items?itemName=ms-python.mypy-type-checker) options.

```bash
poetry add -G checkers mypy
mypy src/ tests/
```

Faster alternatives to mypy include:
- [pyright](https://github.com/microsoft/pyright): Static Type Checker for Python. MIT, Microsoft
- [pyre-check](https://github.com/facebook/pyre-check): Performant type-checking for python. MIT, Meta
- [pytype](https://github.com/google/pytype): A static type analyzer for Python code. Apache-2, Google

Compared to these alternatives, mypy supports additional plugins, as we are going to see below.

## How can I configure mypy to improve my validation workflow?

To enhance your validation workflow, you can configure mypy in your project's `pyproject.toml`. Before committing code, it's advisable to run mypy across your codebase to ensure type correctness. You can ignore the `.mypy_cache/` folders generated by mypy by adding them to your `.gitignore`.

Example mypy configuration in `pyproject.toml`:

```toml
[tool.mypy]
# improve error messages
pretty = true
# enhance strictness level
strict = true
# specify the python version
python_version = "3.12"
# check untyped definitions
check_untyped_defs = true
# ignore missing imports in code
ignore_missing_imports = true
# enable additional mypy plugins
plugins = ["pandera.mypy", "pydantic.mypy"]
```

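If you need mypy to ignore a single line, you can add a `# type: ignore` comment, ideally with the specific error code. Placing a bare `# type: ignore` comment at the top of a file ignores the entire file instead: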

```python
def func(a: int, b: int) -> bool:  # type: ignore[empty-body]
    pass
```

More configuration options are available in the [mypy documentation](https://mypy.readthedocs.io/en/stable/config_file.html).

## What are the best practices for providing types in Python?

- **Follow the 80-20 rule**: Focus on annotating types where it brings the most benefit.
- **Familiarize yourself with the [typing module](https://docs.python.org/3/library/typing.html)** to deepen your understanding of Python types.
- **Use implicit typing judiciously**, as not all variables require explicit annotations.
- **Employ `typing.Any` sparingly** when specific types are not necessary or known.
- **Leverage tools like mypy** for continuous type checking during development.
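
Two of the practices above can be sketched in a few lines. The function names `mean` and `describe` are hypothetical: the first annotates the function boundary and leaves the local variable to type inference, the second uses `typing.Any` only because the input type is deliberately unconstrained.

```python
from typing import Any


def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0.0  # no annotation needed: a type checker infers float
    for value in values:
        total += value
    return total / len(values)


def describe(payload: Any) -> str:
    """Describe any object; Any is acceptable here since every type is valid."""
    return f"{type(payload).__name__}: {payload!r}"


print(mean([1.0, 2.0, 3.0]))
print(describe(3))
```

Annotating signatures while relying on inference inside function bodies tends to give most of the benefit for little effort, which is the spirit of the 80-20 rule.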