Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_round_3 added to loading #129

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chrsteck
Copy link
Collaborator

No description provided.

@chrsteck
Copy link
Collaborator Author

closes #125

Copy link
Collaborator

@lima-tango lima-tango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just added 3 best practice comments.

I'm excited to see, how our models perform on the set! :)

df_r3 = pd.read_csv(file_cached)

except:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good practice to hande errrors explicitly:

except FileError:

df_r3.query(f'id_article=={i} and {label}==1').sample(1,
random_state=42)))
id_list.extend(list(df_ann.id_post))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment on efficiency: extending a list is not efficient. You could also use df_ann after both for-loops:

id_list = list(df_ann.id_post.unique())
return id_list # or directly return the above

def get_augmented_val_id():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are douplicating code from get_augmented_val_X_y. Best practise would be to use this function (get_augmented_val_id) in the X_y-function.

@dominikmn
Copy link
Owner

dominikmn commented May 16, 2021

In branch issue-131-gbert-advanced-training I had built an alternative solution to #125 (i.e. test the gbert model on the annotation round 3 data). It's found in the 2 commits 81c452d and 4a2a39a combined.
See also this comment in #131 (comment)_
We might want to revise at some point for which solution we want to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants