[ci] - fix(test): fix test workflow #72

Merged
merged 6 commits into from
Nov 10, 2024


JulesBelveze
Owner

This PR fixes the test CI workflow.

…flow

 - Change the `uv sync` command to install all extras during PR checks
…t code during loading

 - Allow the `datasets` library to execute remote code by setting `trust_remote_code=True`, improving compatibility with externally hosted datasets
 - Refactor the dataset-loading call to span multiple lines for readability
 - Keep `trust_remote_code=True` in the reformatted call so that behavior is unchanged
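
The loading change described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `load_remote_dataset` and its `loader` parameter are hypothetical, introduced only so the call shape can be checked without network access. It assumes a version of the `datasets` library whose `load_dataset` accepts the `trust_remote_code` keyword.

```python
from typing import Any, Callable, Optional


def load_remote_dataset(
    path: str,
    name: Optional[str] = None,
    split: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load a dataset while explicitly opting in to its remote loading script.

    `loader` is a hypothetical injection point; by default the real
    `datasets.load_dataset` function is used.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without the library.
        from datasets import load_dataset as loader

    # One argument per line keeps the opt-in flag easy to spot in review.
    return loader(
        path,
        name,
        split=split,
        trust_remote_code=True,
    )
```

Because `trust_remote_code=True` runs code shipped with the dataset repository, it is probably best reserved for dataset sources that are already trusted.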
…eferences

 - Deleted the ConferenceDataset class to streamline local_datasets
 - Removed ConferenceDataset import from the __init__.py to clean up package initialization

[docs] - docs: update data documentation to reflect removed ConferenceDataset

 - Removed reference to ConferenceDataset in the data.rst docs to keep documentation accurate
 - Deleted the `local_datasets` module, which managed unlabeled and parallel datasets
 - Removed the BERT squeeze training data entries from .gitignore, which may indicate deprecation or refactoring

[docs] - docs: update documentation to reflect codebase changes

 - Removed documentation entries for the now-deleted `local_datasets` module in `bert_squeeze`
…ranslation tasks

 - Removed hardcoded text column name in favor of dynamic translation column configuration
 - Added a filter to exclude entries without translations before tokenization
 - Fixed mismatched attention mask column name in tokenized_dataset
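
As a rough illustration of the three changes above, the sketch below filters out examples without a translation before tokenizing and reads the column name from a parameter instead of hardcoding it. Everything here is an assumption for illustration: the function names, the layout of the translation column (a dict keyed by language code), and the tokenizer interface (a callable returning `input_ids` and `attention_mask`) are not taken from the PR.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


def has_translation(
    example: Example, translation_col: str, source_lang: str, target_lang: str
) -> bool:
    """Return True only if both sides of the translation pair are non-empty."""
    pair = example.get(translation_col) or {}
    return bool(pair.get(source_lang)) and bool(pair.get(target_lang))


def tokenize_pairs(
    examples: List[Example],
    tokenizer: Callable[[str], Dict[str, List[int]]],
    translation_col: str,
    source_lang: str,
    target_lang: str,
) -> List[Dict[str, List[int]]]:
    """Drop examples lacking a translation, then tokenize the remaining pairs."""
    kept = [
        ex
        for ex in examples
        if has_translation(ex, translation_col, source_lang, target_lang)
    ]
    rows = []
    for ex in kept:
        source = tokenizer(ex[translation_col][source_lang])
        target = tokenizer(ex[translation_col][target_lang])
        rows.append(
            {
                "input_ids": source["input_ids"],
                # Use the same key the tokenizer emits to avoid a name mismatch.
                "attention_mask": source["attention_mask"],
                "labels": target["input_ids"],
            }
        )
    return rows
```

With the `datasets` library, a predicate like `has_translation` could instead be passed to `Dataset.filter` ahead of a `Dataset.map` tokenization step.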

[tests] - test: change DistilAssistant test to use `kmfoda/booksum` dataset parameters

 - Updated test cases to use the `booksum` dataset path and its configuration parameters (`percent`, `target_col`, and `source_col`)
 - Updated assertions to expect the train and validation data loader lengths produced by the `booksum` dataset
JulesBelveze merged commit aec9078 into main on Nov 10, 2024
2 checks passed
JulesBelveze deleted the fix/ci-test branch on November 10, 2024 14:38