Commit
Showing 5 changed files with 441 additions and 102 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,35 +1,182 @@ | ||
# 4.0. Typing | ||
|
||
## What is Typing in Python? | ||
## What is typing in programming? | ||
|
||
Typing in Python refers to specifying the types of variables, function parameters, and return values in your code. This practice, known as type annotation, was introduced in Python 3.5 and has been increasingly adopted for its benefits in code clarity and error prevention. | ||
Typing in programming involves designating specific data types for variables, functions, and classes within a programming language. This concept is critical for managing how data is stored, processed, and manipulated within software applications. | ||
|
||
### Tool for Typing in Python: | ||
Programming languages are categorized into three main types based on how they handle typing: | ||
|
||
1. **mypy**: | ||
- **Description**: mypy is a popular static type checker for Python. It uses the type hints you provide in your code to verify that your code adheres to these types, catching potential bugs and inconsistencies before runtime. | ||
- **Usage**: After adding type hints to your Python code, run mypy to analyze the codebase. It checks for type errors and reports mismatches. | ||
- **Benefits**: Early detection of type-related bugs, enhanced code readability, and improved maintainability. It can be particularly useful in large codebases or when working in teams, ensuring that everyone adheres to expected data types. | ||
- **Static typing**: In statically typed languages, the data type of a variable is known at compile time, which means that type checking is done during the compilation of the program. Examples include Java, C++, and Haskell. This approach allows for early detection of type-related errors, contributing to more robust and error-resistant code. | ||
- **Dynamic typing**: Dynamically typed languages determine the type of a variable at runtime. This flexibility allows for more rapid development but can introduce type-related errors that are harder to detect early in the development process. Examples of dynamically typed languages are Ruby, JavaScript, PHP, and Python. | ||
- **Gradual typing**: Gradual typing offers a blend of static and dynamic typing, allowing developers to choose when to enforce type constraints. This approach provides the flexibility of dynamic typing while still enabling the benefits of static type checking where it's most useful. Languages that support gradual typing include TypeScript, Dart, and Python (from version 3.5 onwards with type annotations). | ||
|
||
## Why is Typing Important in Python Projects? | ||
Additionally, languages can have either a weak or strong type system: | ||
|
||
The importance of typing in Python projects, particularly large-scale or complex ones, cannot be overstated. Here are several key reasons: | ||
- **Weak typing**: In languages with weak typing, type coercion is common, allowing for more flexibility in operations between different types but at the risk of unexpected behavior or errors (e.g., 1 + "s" => "1s"). | ||
- **Strong typing**: Strongly typed languages enforce stricter rules about interactions between data types, reducing the chances of runtime errors due to unexpected type conversions but requiring more explicit declarations and conversions by the developer (e.g., 1 + "s" => error, str(1) + "s" => "1s"). | ||
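Python itself sits on the strong side of this distinction. A minimal sketch of the two expressions from the list above, using only built-in operations (the variable names are illustrative):

```python
number = 1

# Strong typing: Python refuses to add an int and a str implicitly.
try:
    number + "s"
except TypeError as error:
    message = str(error)  # explains that int and str cannot be added

# An explicit conversion states the intent and succeeds.
combined = str(number) + "s"

print(message)
print(combined)
```

The failed addition raises a `TypeError` at runtime, while the explicit `str()` conversion produces the string a weakly typed language would have built silently.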
|
||
- **Early Bug Detection**: Detects potential type-related issues at an early stage, preventing bugs that could be costly and time-consuming to debug later. | ||
- **Enhanced Code Clarity**: Type annotations make the code easier to understand, providing clear expectations of what types of data functions will accept and return. | ||
- **Improved Development Workflow**: Assists in developing a more disciplined approach to writing Python code, leading to fewer errors and higher code quality. | ||
- **Facilitates Collaboration**: In team environments, typing ensures that all members have a clear understanding of the function interfaces and data structures used in the project. | ||
- **Integration with IDEs**: Modern IDEs use type hints to provide better code completion, error highlighting, and refactoring tools. | ||
## Why is typing useful in programs? | ||
|
||
## Best Practices for Implementing Typing | ||
The role of typing in programming, especially in complex or large-scale projects, is invaluable for several reasons: | ||
|
||
1. **Gradual Implementation**: Start by adding type hints to the most critical parts of your codebase. Gradually expand to cover more areas as you become comfortable with the practice. | ||
2. **Comprehensive Coverage**: Aim to cover function arguments, return types, and variable annotations. This comprehensive approach maximizes the benefits of static type checking. | ||
3. **Stay Updated**: Keep abreast of developments in Python's typing system, as new features and improvements are regularly introduced. | ||
4. **Use Specific Types**: Prefer specific type annotations (like `List[int]` instead of just `list`) for greater precision and clarity. | ||
5. **Integrate with CI/CD Pipelines**: Incorporate mypy checks into your continuous integration/continuous deployment workflows to automatically catch type issues before they make it to production. | ||
6. **Team Guidelines**: Establish team guidelines on how and when to use type annotations to maintain consistency across the codebase. | ||
7. **Regular Reviews**: Regularly review the type annotations in your code, especially after major refactoring or updates to Python’s typing module, to ensure they remain accurate and useful. | ||
8. **Leverage Advanced Features**: Explore advanced features of mypy, such as type inference, generic types, and custom type definitions, to handle more complex typing scenarios. | ||
- **Early Bug Detection**: Typing helps in identifying potential type-related issues at the early stages of development, preventing bugs that could become costly and complex to resolve later. | ||
- **Enhanced Code Clarity**: Type annotations clarify the expected data types for function inputs and outputs, making the code more readable and understandable. | ||
- **Improved Development Workflow**: Adopting typing encourages a disciplined coding practice, resulting in fewer errors and enhanced code quality. | ||
- **Facilitates Collaboration**: In team settings, clear type annotations ensure that all members understand the data structures and function interfaces, leading to more effective collaboration. | ||
- **Integration with IDEs**: Advanced IDEs utilize type hints to offer superior code completion, error highlighting, and refactoring capabilities. | ||
|
||
TODO: Pandera, Pydantic | ||
Although specifying types requires additional effort, this investment significantly improves the codebase's quality. | ||
|
||
## What is the relation between Python and typing? | ||
|
||
Python is primarily recognized as a strongly and dynamically typed language, allowing programmers to write code without specifying types explicitly. This approach is straightforward but may not scale well to larger projects. Since Python 3.5, the language has supported gradual typing, enabling developers to annotate types. This feature enhances code clarity and aids in error prevention, especially during development. | ||
|
||
For instance, a simple function without type annotations in Python might look like this: | ||
|
||
```python | ||
def print_n_times(message, n): | ||
    for _ in range(n): | ||
        print(message) | ||
``` | ||
|
||
However, for better clarity and to take advantage of gradual typing, the same function with type annotations would be: | ||
|
||
```python | ||
def print_n_times(message: str, n: int) -> None: | ||
    for _ in range(n): | ||
        print(message) | ||
``` | ||
|
||
Incorporating type annotations is highly recommended for the benefits they bring in terms of code clarity and early error detection, except in some cases where the effort might not justify the value. | ||
|
||
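It's important to note that Python type hints are checked by static analysis tools during development rather than by the interpreter at runtime: they help verify the program's logic and flow, but the interpreter itself does not enforce them, and they bring no runtime performance gain or optimization. | ||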
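To illustrate that annotations are not enforced when the program runs, here is a small sketch with a hypothetical `double` function (not part of the original text). The interpreter executes the mistyped call without complaint, whereas a static checker such as mypy would be expected to report the argument as incompatible with `int`.

```python
def double(value: int) -> int:
    """Return the value multiplied by two."""
    return value * 2


# The annotation says int, but the interpreter does not check it,
# so passing a str simply repeats the string.
result = double("ab")

print(result)
print(double.__annotations__)  # the hints are stored as plain metadata
```

This is why a type checker has to be run as a separate step: the hints alone change nothing about how the code executes.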
|
||
To dive deeper into Python typing, exploring resources such as the [Mypy cheatsheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html) and Python's built-in [typing module](https://docs.python.org/3/library/typing.html) is beneficial. | ||
|
||
## Is it possible to provide types for a dataframe? | ||
|
||
Yes, it's possible to provide types for dataframes using the [Pandera](https://pandera.readthedocs.io/en/stable/) library. Pandera offers a flexible and expressive API for validating data in dataframe-like objects, enhancing the readability and robustness of data processing pipelines. | ||
|
||
Pandera allows for: | ||
|
||
1. Defining a schema once and validating different dataframe types, including pandas, dask, modin, and pyspark.pandas. | ||
2. Checking the types and properties of columns in a pandas DataFrame or values in a pandas Series. | ||
3. Performing complex statistical validations, such as hypothesis testing. | ||
4. Integrating seamlessly with data analysis and processing pipelines through function decorators. | ||
5. Using a class-based API for dataframe models, similar to pydantic, and validating dataframes with typing syntax. | ||
6. Synthesizing data from schema objects for property-based testing. | ||
7. Validating dataframes lazily to execute all validation rules before raising an error. | ||
8. Integrating with a rich ecosystem of Python tools like pydantic, fastapi, and mypy. | ||
|
||
Here's an example schema for validating a dataframe in an MLOps codebase: | ||
|
||
```python | ||
import pandera as pa | ||
import pandera.typing as papd | ||
import pandera.typing.common as padt | ||
|
||
class InputsSchema(pa.DataFrameModel): | ||
    """Schema for the project inputs.""" | ||
|
||
    instant: papd.Index[padt.UInt32] = pa.Field(ge=0, check_name=True) | ||
    dteday: papd.Series[padt.DateTime] = pa.Field() | ||
    season: papd.Series[padt.UInt8] = pa.Field(isin=[1, 2, 3, 4]) | ||
    yr: papd.Series[padt.UInt8] = pa.Field(ge=0, le=1) | ||
    mnth: papd.Series[padt.UInt8] = pa.Field(ge=1, le=12) | ||
    hr: papd.Series[padt.UInt8] = pa.Field(ge=0, le=23) | ||
    holiday: papd.Series[padt.Bool] = pa.Field() | ||
    weekday: papd.Series[padt.UInt8] = pa.Field(ge=0, le=6) | ||
    workingday: papd.Series[padt.Bool] = pa.Field() | ||
    weathersit: papd.Series[padt.UInt8] = pa.Field(ge=1, le=4) | ||
    temp: papd.Series[padt.Float16] = pa.Field(ge=0, le=1) | ||
    atemp: papd.Series[padt.Float16] = pa.Field(ge=0, le=1) | ||
    hum: papd.Series[padt.Float16] = pa.Field(ge=0, le=1) | ||
    windspeed: papd.Series[padt.Float16] = pa.Field(ge=0, le=1) | ||
    casual: papd.Series[padt.UInt32] = pa.Field(ge=0) | ||
    registered: papd.Series[padt.UInt32] = pa.Field(ge=0) | ||
``` | ||
|
||
## Is it possible to provide better types for classes? | ||
|
||
Yes, [Pydantic](https://docs.pydantic.dev/latest/) enhances the native class syntax by validating class attributes and providing a cleaner, more efficient syntax. | ||
|
||
Features of Pydantic include: | ||
|
||
- Validation and serialization powered by type hints, integrating seamlessly with IDEs and static analysis tools. | ||
- High performance due to core validation logic written in Rust. | ||
- Capability to emit JSON Schema for easy integration with other tools. | ||
- Support for both strict and lax modes for data validation. | ||
- Validation for many standard library types, including dataclasses and TypedDicts. | ||
- Extensive customization options for validators and serializers. | ||
- A rich ecosystem of integrations with popular libraries like FastAPI and SQLModel. | ||
- Reliability proven by widespread use across various industries and projects. | ||
|
||
Example usage in an MLOps codebase: | ||
|
||
```python | ||
import pydantic as pdt | ||
|
||
class GridCVSearcher(pdt.BaseModel): | ||
    """Grid searcher with cross-fold validation for better model performance metrics.""" | ||
|
||
    n_jobs: int | None = None | ||
    refit: bool = True | ||
    verbose: int = 3 | ||
    error_score: str | float = "raise" | ||
    return_train_score: bool = False | ||
``` | ||
|
||
## How can I check my types with Python? | ||
|
||
Mypy is the primary tool for type checking in Python, providing [command-line](https://pypi.org/project/mypy/) and [IDE integration](https://marketplace.visualstudio.com/items?itemName=ms-python.mypy-type-checker) options. | ||
|
||
```bash | ||
poetry add -G checkers mypy | ||
mypy src/ tests/ | ||
``` | ||
|
||
Faster alternatives to mypy include: | ||
- [pyright](https://github.com/microsoft/pyright): Static Type Checker for Python. MIT, Microsoft | ||
- [pyre-check](https://github.com/facebook/pyre-check): Performant type-checking for python. MIT, Meta | ||
- [pytype](https://github.com/google/pytype): A static type analyzer for Python code. Apache-2, Google | ||
|
||
Compared to these alternatives, mypy supports additional plugins, as we are going to see below. | ||
|
||
## How can I configure mypy to improve my validation workflow? | ||
|
||
To enhance your validation workflow, you can configure mypy in your project's `pyproject.toml`. Before committing code, it's advisable to run mypy across your codebase to ensure type correctness. You can ignore the `.mypy_cache/` folders generated by mypy by adding them to your `.gitignore`. | ||
|
||
Example mypy configuration in `pyproject.toml`: | ||
|
||
```toml | ||
[tool.mypy] | ||
# improve error messages | ||
pretty = true | ||
# enhance strictness level | ||
strict = true | ||
# specify the python version | ||
python_version = "3.12" | ||
# check untyped definitions | ||
check_untyped_defs = true | ||
# ignore missing imports in code | ||
ignore_missing_imports = true | ||
# enable additional mypy plugins | ||
plugins = ["pandera.mypy", "pydantic.mypy"] | ||
``` | ||
|
||
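If you need mypy to ignore a single line, add a `# type: ignore` comment to it, optionally with an error code as shown here; placing a bare `# type: ignore` comment on the first line of a file makes mypy skip the entire file: | ||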
|
||
```python | ||
def func(a: int, b: int) -> bool: # type: ignore[empty-body] | ||
    pass | ||
``` | ||
|
||
More configuration options are available in the [mypy documentation](https://mypy.readthedocs.io/en/stable/config_file.html). | ||
|
||
## What are the best practices for providing types in Python? | ||
|
||
- **Follow the 80-20 rule**: Focus on annotating types where it brings the most benefit. | ||
- **Familiarize yourself with the [typing module](https://docs.python.org/3/library/typing.html)** to deepen your understanding of Python types. | ||
- **Use implicit typing judiciously**, as not all variables require explicit annotations. | ||
- **Employ `typing.Any` sparingly** when specific types are not necessary or known. | ||
- **Leverage tools like mypy** for continuous type checking during development. |
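Several of these practices can be seen together in a short sketch. The `mean_length` function and its data are hypothetical; the point is only where explicit annotations help a type checker and where inference is enough.

```python
from typing import Any


def mean_length(words: list[str]) -> float:
    """Return the average word length, or 0.0 for an empty list."""
    if not words:
        return 0.0
    total = sum(len(word) for word in words)  # inferred as int, no annotation needed
    return total / len(words)


# An empty container gives the checker nothing to infer from, so annotate it.
lengths: dict[str, int] = {}

# Any switches off checking for this value; reserve it for truly unknown data.
payload: Any = {"words": ["type", "hints"]}

for word in payload["words"]:
    lengths[word] = len(word)

average = mean_length(payload["words"])
print(lengths, average)
```

Annotating the function signature and the empty dictionary covers most of the checking value here, which is the 80-20 rule in miniature.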