Refactor of template files (#55)
 * Adds a base class for templates and moves duplicate code to the base class

 * passes wordcloud flag to docs renderer

 * adds allow overwrite flag

 * adds a catch for using a template with the wrong template class

 * cleans up name passing (name is now only in the template class, or the actual template, no more random strings)

 * cleans up valid template checker

 * cleans up the console output

 * format the documents using ruff

 * add an extra line to the filehandler writer so that the double empty lines can be removed from all templates

 * update the workflow

 * add a config DEFAULT object

 * add prohibited arguments to templates

 * refactor platform detection code

 * refactor fp_template code

 * n_runs only adds a _{{ run }} to the filename if n_runs is more than 1
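
The last point can be pictured with a small sketch. It is not the project's template code (the templates use `{{ run }}` placeholders); the function name `output_name`, the `sim_` prefix, and the `.asreview` extension are assumptions chosen for illustration only.

```python
def output_name(dataset_stem: str, n_runs: int, run: int) -> str:
    """Build an output file name (hypothetical naming scheme).

    The run index is appended only when more than one run is requested,
    so single-run projects keep a clean file name.
    """
    suffix = f"_{run}" if n_runs > 1 else ""
    return f"sim_{dataset_stem}{suffix}.asreview"


# Single run: no suffix. Multiple runs: one name per run index.
single = output_name("labels", n_runs=1, run=0)
multiple = [output_name("labels", n_runs=2, run=r) for r in range(2)]
```

With these assumptions, `single` carries no run index, while each entry in `multiple` ends in `_0` or `_1` before the extension.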
jteijema authored Apr 18, 2024
1 parent 726a734 commit 1fcda57
Showing 20 changed files with 542 additions and 628 deletions.
66 changes: 38 additions & 28 deletions .github/workflows/ci-workflow.yml
@@ -4,51 +4,61 @@ jobs:
  test-template-and-lint:
  strategy:
  matrix:
- os: [macos-latest, windows-latest, ubuntu-latest]
+ os: [windows-latest, ubuntu-latest]
+ python-version: ['3.8', '3.12']
  runs-on: ${{ matrix.os }}
  steps:
- - uses: actions/checkout@master
- - uses: actions/setup-python@v4
+ - uses: actions/checkout@v4
+ - uses: actions/setup-python@v5
  with:
- python-version: '3.8'
+ python-version: ${{ matrix.python-version }}
  architecture: 'x64'
- - name: Install makita
+ - name: Cache Python packages
+ uses: actions/cache@v4
+ with:
+ path: |
+ ${{ runner.os == 'Windows' && 'C:\users\runneradmin\appdata\local\pip\cache' || '~/.cache/pip' }}
+ key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
+ restore-keys: |
+ ${{ runner.os }}-pip-
+ - name: Install dependencies
  run: |
- pip install .
- - name: Install ruff
+ pip install . ruff scitree asreview-datatools asreview-insights synergy-dataset
+ - name: Lint python with ruff
  run: |
- pip install ruff
+ ruff check .
+ - name: Create directories using Python
+ run: python -c "import os; [os.makedirs(path, exist_ok=True) for path in ['./tmp/basic/data-test', './tmp/arfi/data', './tmp/multimodel/data', './tmp/scripts', './tmp/synergy/data']]"
  - name: set up environment
  run: |
- mkdir tmp
- cd tmp
- mkdir -p basic/data
- mkdir -p arfi/data
- mkdir -p multimodel/data
- cp ../.github/workflows/test_data/labels.csv basic/data/labels.csv
- cp ../.github/workflows/test_data/labels.csv arfi/data/labels.csv
- cp ../.github/workflows/test_data/labels.csv multimodel/data/labels.csv
- - name: Test makita templates
+ cp .github/workflows/test_data/labels.csv ./tmp/basic/data-test/labels.csv
+ cp .github/workflows/test_data/labels.csv ./tmp/arfi/data/labels.csv
+ cp .github/workflows/test_data/labels.csv ./tmp/multimodel/data/labels.csv
+ - name: Render makita templates
  run: |
  cd tmp/basic
- asreview makita template basic | tee output.txt
+ asreview makita template basic --classifier nb --feature_extractor tfidf --query_strategy max --n_runs 1 -s data-test -o output-test --init_seed 1 --model_seed 2 --skip_wordclouds --overwrite --instances_per_query 2 --stop_if min --balance_strategy double | tee output.txt
  grep -q "ERROR" output.txt && exit 1 || true
  cd ../arfi
  asreview makita template arfi | tee output.txt
  grep -q "ERROR" output.txt && exit 1 || true
  cd ../multimodel
  asreview makita template multimodel | tee output.txt
  grep -q "ERROR" output.txt && exit 1 || true
- - name: Run ShellCheck
+ - name: Render makita scripts
+ run: |
+ asreview makita add-script --all -o ./tmp/scripts | tee output.txt
+ grep -q "ERROR" output.txt && exit 1 || true
+ - name: Run SciTree
  if: ${{ matrix.os != 'windows-latest' }}
- uses: ludeeus/action-shellcheck@master
- with:
- scandir: './tmp'
- env:
- SHELLCHECK_OPTS: -e SC2148
- - name: Generate makita scripts
  run: |
- asreview makita add-script --all
- - name: Lint python with ruff
+ cd ./tmp/
+ scitree
+ - name: Execute basic template jobs file
+ if: ${{ matrix.os != 'windows-latest' }}
  run: |
- ruff .
+ cd tmp/synergy
+ synergy_dataset get -d van_de_Schoot_2018 -o ./data -l
+ asreview makita template basic --instances_per_query 100 --skip_wordclouds --overwrite --n_runs 2
+ sh jobs.sh
+ scitree
4 changes: 2 additions & 2 deletions .github/workflows/pythonpackage.yml
@@ -13,9 +13,9 @@ jobs:
  runs-on: ubuntu-latest

  steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v4
  - name: Set up Python
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
  with:
  python-version: '3.x'
  - name: Install dependencies
7 changes: 5 additions & 2 deletions README.md
@@ -121,6 +121,7 @@ optional arguments:
  --platform PLATFORM Platform to run jobs: Windows, Darwin, Linux. Default: the system of rendering templates.
  --n_runs N_RUNS Number of runs. Default: 1.
  --no_wordclouds Disables the generation of wordclouds.
+ --overwrite Automatically accepts all overwrite requests.
  --classifier CLASSIFIER Classifier to use. Default: nb.
  --feature_extractor FEATURE_EXTRACTOR Feature_extractor to use. Default: tfidf.
  --query_strategy QUERY_STRATEGY Query strategy to use. Default: max.
@@ -148,6 +149,7 @@ optional arguments:
  --platform PLATFORM Platform to run jobs: Windows, Darwin, Linux. Default: the system of rendering templates.
  --n_priors N_PRIORS Number of priors. Default: 10.
  --no_wordclouds Disables the generation of wordclouds.
+ --overwrite Automatically accepts all overwrite requests.
  --classifier CLASSIFIER Classifier to use. Default: nb.
  --feature_extractor FEATURE_EXTRACTOR Feature_extractor to use. Default: tfidf.
  --query_strategy QUERY_STRATEGY Query strategy to use. Default: max.
@@ -175,18 +177,19 @@ optional arguments:
  --platform PLATFORM Platform to run jobs: Windows, Darwin, Linux. Default: the system of rendering templates.
  --n_runs N_RUNS Number of runs. Default: 1.
  --no_wordclouds Disables the generation of wordclouds.
+ --overwrite Automatically accepts all overwrite requests.
  --instances_per_query INSTANCES_PER_QUERY Number of instances per query. Default: 1.
  --stop_if STOP_IF The number of label actions to simulate. Default 'min' will stop simulating when all relevant records are found.
  --classifiers CLASSIFIERS Classifiers to use Default: ['logistic', 'nb', 'rf', 'svm']
  --feature_extractors FEATURE_EXTRACTOR Feature extractors to use Default: ['doc2vec', 'sbert', 'tfidf']
  --query_strategies QUERY_STRATEGY Query strategies to use Default: ['max']
- --balancing_strategies BALANCE_STRATEGY Balance strategies to use Default: ['double']
+ --balance_strategies BALANCE_STRATEGY Balance strategies to use Default: ['double']
  --impossible_models IMPOSSIBLE_MODELS Model combinations to exclude Default: ['nb,doc2vec', 'nb,sbert']
  ```

  If you want to specify certain combinations of classifiers and feature
  extractors that should and should not be used, you can use the `--classifiers`,
- `--feature_extractors`, `--query_strategies`, `--balancing_strategies` and `--impossible_models` option. For instance, if you
+ `--feature_extractors`, `--query_strategies`, `--balance_strategies` and `--impossible_models` option. For instance, if you
want to exclude the combinations of `nb` with `doc2vec` and `logistic` with
`tfidf`, use the following command:
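
The command itself lies outside this hunk. Conceptually, excluding combinations amounts to filtering the Cartesian product of the option lists. The sketch below illustrates that idea only. It is not makita's implementation, and the helper name `allowed_models` is hypothetical. The `'classifier,feature_extractor'` string format follows the `--impossible_models` default shown in the help text above.

```python
from itertools import product


def allowed_models(classifiers, feature_extractors, impossible_models):
    """Return (classifier, feature_extractor) pairs that are not excluded.

    `impossible_models` holds strings such as 'nb,doc2vec', mirroring the
    format of the --impossible_models default in the help text.
    """
    excluded = {
        tuple(part.strip() for part in pair.split(","))
        for pair in impossible_models
    }
    return [
        (clf, fe)
        for clf, fe in product(classifiers, feature_extractors)
        if (clf, fe) not in excluded
    ]


# Exclude nb with doc2vec and logistic with tfidf, as in the README example.
combos = allowed_models(
    ["logistic", "nb"],
    ["doc2vec", "tfidf"],
    ["nb,doc2vec", "logistic,tfidf"],
)
```

Under these assumptions, `combos` keeps `logistic` with `doc2vec` and `nb` with `tfidf`, and drops the two excluded pairs.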
