Clean up docs #18

Merged: 2 commits, Apr 25, 2024
12 changes: 0 additions & 12 deletions .devcontainer/Dockerfile

This file was deleted.

33 changes: 0 additions & 33 deletions .devcontainer/devcontainer.json

This file was deleted.

10 changes: 0 additions & 10 deletions .github/workflows/CI.yml
@@ -9,11 +9,6 @@ on:
workflow_dispatch:

jobs:
# validation:
# uses: microsoft/action-python/.github/workflows/[email protected]
# with:
# workdir: '.'

build:
runs-on: ubuntu-latest
steps:
@@ -33,8 +28,3 @@ jobs:
run: ruff format --check .
- name: Run Mypy
run: mypy --ignore-missing-imports .
# publish:
# uses: microsoft/action-python/.github/workflows/[email protected]
# secrets:
# PYPI_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
# TEST_PYPI_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}
22 changes: 2 additions & 20 deletions .vscode/settings.json
@@ -3,27 +3,9 @@
"editor.formatOnPaste": true,
"files.trimTrailingWhitespace": true,
"files.autoSave": "onFocusChange",
"git.autofetch": true,
"[jsonc]": {
"editor.defaultFormatter": "vscode.json-language-features"
},
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter"
"editor.defaultFormatter": "charliermarsh.ruff"
},
"python.defaultInterpreterPath": "/usr/local/bin/python",
"python.formatting.provider": "black",
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"pylint.args": [
"--rcfile=pyproject.toml"
],
"black-formatter.args": [
"--config=pyproject.toml"
],
"flake8.args": [
"--toml-config=pyproject.toml"
],
"isort.args": [
"--settings-path=pyproject.toml"
]
}
}
9 changes: 0 additions & 9 deletions CODE_OF_CONDUCT.md

This file was deleted.

80 changes: 51 additions & 29 deletions README.md
@@ -1,42 +1,64 @@
# TF Tabular

### Feature Overview
* Create input layers based on lists of columns
* No model building or training: Build whatever you want on top
* Support custom embeddings
* Support attention for mixing sequence layers
* Support multi-hot categoricals
* Support computing vocab and normalization params?
TF Tabular simplifies handling tabular data in TensorFlow. It provides utilities for building models on top of numeric, categorical, multi-hot, and sequential data types.

## Features

### Competitor analysis
* DeepTables:
* This is for TensorFlow
* Broader scope: Includes model building and training
* Pytorch tabular:
* Only Pytorch
* Broader scope: Includes model building and training
* Not focused on recommenders (no support for multi-hot and sequence columns https://github.com/manujosephv/pytorch_tabular/issues/174)
- Create input layers based on lists of columns
- Support custom embeddings
- Support attention for mixing sequence layers
- Support multi-hot categoricals
- No model building or training: Build whatever you want on top


## Project Organization
## Installation

- `.github/workflows`: Contains GitHub Actions used for building, testing, and publishing.
- `.devcontainer/Dockerfile`: Contains Dockerfile to build a development container for VSCode with all the necessary extensions for Python development installed.
- `.devcontainer/devcontainer.json`: Contains the configuration for the development container for VSCode, including the Docker image to use, any additional VSCode extensions to install, and whether or not to mount the project directory into the container.
- `.vscode/settings.json`: Contains VSCode settings specific to the project, such as the Python interpreter to use and the maximum line length for auto-formatting.
- `src`: Place new source code here.
- `tests`: Contains Python-based test cases to validate source code.
- `pyproject.toml`: Contains metadata about the project and configurations for additional tools used to format, lint, type-check, and analyze Python code.
Install TF Tabular with pip:

### `pyproject.toml`
```sh
pip install tf-tabular
```

The pyproject.toml file is a centralized configuration file for modern Python projects. It streamlines the development process by managing project metadata, dependencies, and development tool configurations in a single, structured file. This approach ensures consistency and maintainability, simplifying project setup and enabling developers to focus on writing quality code. Key components include project metadata, required and optional dependencies, development tool configurations (e.g., linters, formatters, and test runners), and build system specifications.
## Usage

In this particular pyproject.toml file, the [build-system] section specifies that the Flit package should be used to build the project. The [project] section provides metadata about the project, such as the name, description, authors, and classifiers. The [project.optional-dependencies] section lists optional dependencies, like pyspark, while the [project.urls] section supplies URLs for project documentation, source code, and issue tracking.
Here is a basic example of how to use TF Tabular:

The file also contains various configuration sections for different tools, including bandit, black, coverage, flake8, pyright, pytest, tox, and pylint. These sections specify settings for each tool, such as the maximum line length for flake8 and the minimum code coverage percentage for coverage.
```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

from tf_tabular.builder import InputBuilder

# Define columns to use and specify additional parameters:
categoricals = ['Pclass', 'no_cabin']
numericals = ['Age', 'Fare']
# ....

## TODO:
* Parse dataset to separate numeric vs categoricals, multihots and sequencials
# Build model:
input_builder = InputBuilder()
input_builder.add_inputs_list(categoricals=categoricals,
                              numericals=numericals,
                              normalization_params=norm_params,
                              vocabs=vocabs,
                              embedding_dims=embedding_dims)
inputs, output = input_builder.build_input_layers()
output = Dense(1, activation='sigmoid')(output)

model = Model(inputs=inputs, outputs=output)
```

<!-- Which will produce a model like this: -->
<!-- TODO: <Insert NETRON view of model> -->

Look at the examples folder for more complete examples.
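
The example above leaves `norm_params`, `vocabs`, and `embedding_dims` undefined (the `# ....` line). The following is a minimal sketch of how such values might be derived from raw rows using only the standard library. The dictionary shapes, the helper names (`compute_vocabs`, `compute_norm_params`, `suggest_embedding_dims`), the toy data, and the square-root sizing rule are all assumptions made for illustration. They are not confirmed to match what `InputBuilder.add_inputs_list` expects, so check the examples folder for the actual format.

```python
import math
import statistics

# Toy rows standing in for a real dataset (hypothetical values).
rows = [
    {"Pclass": 3, "no_cabin": 1, "Age": 22.0, "Fare": 7.25},
    {"Pclass": 1, "no_cabin": 0, "Age": 38.0, "Fare": 71.28},
    {"Pclass": 3, "no_cabin": 1, "Age": 26.0, "Fare": 7.92},
    {"Pclass": 2, "no_cabin": 1, "Age": 35.0, "Fare": 13.0},
]

categoricals = ["Pclass", "no_cabin"]
numericals = ["Age", "Fare"]


def compute_vocabs(rows, columns):
    """Return the sorted distinct values of each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def compute_norm_params(rows, columns):
    """Return the mean and population standard deviation of each numeric column."""
    params = {}
    for col in columns:
        values = [row[col] for row in rows]
        params[col] = {
            "mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values),
        }
    return params


def suggest_embedding_dims(vocabs, max_dim=50):
    """Pick an embedding size per column: ceil(sqrt(vocab size)), capped at max_dim.

    This is an arbitrary rule of thumb chosen for the sketch, not a library default.
    """
    return {
        col: min(max_dim, math.ceil(math.sqrt(len(values))))
        for col, values in vocabs.items()
    }


# Assumed shapes only: the real parameter format may differ.
vocabs = compute_vocabs(rows, categoricals)
norm_params = compute_norm_params(rows, numericals)
embedding_dims = suggest_embedding_dims(vocabs)
```

Computing these statistics on the training split only, and reusing them at inference time, keeps the preprocessing consistent between training and serving.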

## Contributing
Contributions to TF Tabular are welcome. If you have a feature you'd like to add, or a bug you'd like to fix, please open a pull request.

## Roadmap
These are possible future features, depending on need and interest expressed by the community.

- [ ] Parse dataset to separate numeric vs categoricals, multihots and sequentials
- [ ] Implement other types of normalization
- [ ] Support computing vocab and normalization params?
- [ ] Improve documentation and provide more usage examples

## License
TF Tabular is licensed under the MIT License. See the LICENSE file for more details.
25 changes: 0 additions & 25 deletions SUPPORT.md

This file was deleted.