Dataset Review and Collaborative Corrections #2

Open
fzanart opened this issue Jul 31, 2023 · 0 comments
fzanart commented Jul 31, 2023

Hi @tmakesense,

Thanks for reviewing this dataset; I'm planning to work with it too. I'm not sure which tool you are using to review it, but I'm using Python/pandas, and I would like to point out a few things:

First, when I open it like this:

import pandas as pd
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/tmakesense/logical-fallacy/main/dataset-fixed/edu_all_fixed.csv')

I noticed that the last rows (from index 2208 onward) have their cells in the wrong columns. To fix this, I did:

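# Move each value one column over, filling the last column first so nothing is overwritten too early
df.loc[2208:, 'explanations'] = df.loc[2208:, 'source_article']
df.loc[2208:, 'source_article'] = df.loc[2208:, 'old_label']
# old_label has no valid value left for these rows
df.loc[2208:, 'old_label'] = np.nan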
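
The same shift can be tried on a small frame before touching the real data. The sketch below is only an illustration: the `toy` frame, its values, and the `start` label are made up, and the column order is an assumption rather than something taken from the CSV. It moves both columns in one assignment, using `.to_numpy()` so that pandas does not realign the values by column name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 2 and 3 have their text and explanation one column too early.
toy = pd.DataFrame({
    "updated_label": ["ad hominem", "ad populum", "false dilemma", "circular reasoning"],
    "old_label": ["ad hominem", "ad populum", "text C", "text D"],
    "source_article": ["text A", "text B", "why C", "why D"],
    "explanations": ["why A", "why B", np.nan, np.nan],
})

start = 2  # first misaligned row label (2208 in the issue above)

# The right-hand side is read in full before anything is written, and
# .to_numpy() drops the column names so values are assigned by position.
toy.loc[start:, ["source_article", "explanations"]] = (
    toy.loc[start:, ["old_label", "source_article"]].to_numpy()
)
toy.loc[start:, "old_label"] = np.nan

print(toy)
```

Doing the move in one step removes the dependency on the order of the assignments, at the cost of being slightly less explicit than the three-line version.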

Second, I still found duplicated fallacies in the source_article column. Please take a look at:

df[df.duplicated(subset='source_article', keep=False)].sort_values('source_article')
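
For anyone reading along, here is a small sketch of what `keep=False` changes in that call. The `toy` frame is made-up data, not rows from the dataset.

```python
import pandas as pd

# Hypothetical toy data with two duplicated texts ("a" and "b").
toy = pd.DataFrame({"source_article": ["b", "a", "b", "c", "a"]})

# keep=False flags every member of a duplicate group, including the first one.
all_members = toy.duplicated(subset="source_article", keep=False)

# The default (keep="first") leaves the first occurrence unflagged.
repeats_only = toy.duplicated(subset="source_article")

# Sorting puts the members of each group next to each other for review.
print(toy[all_members].sort_values("source_article"))
```

With `keep=False` the duplicates can be compared side by side, which is useful when their labels may differ.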

To fix it, I would propose deleting the following rows:

to_drop = [2218,2211,1052,381,2097,2122,1937,2213,2219,2133,2221,1054,2224,2220,2212,182,2217,2215,2214,2146,2117,2223,2222,2216,2210,2208,2209]
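
A minimal sketch of how such a list could be applied, again on made-up data (the `toy` frame and its index labels are hypothetical). `DataFrame.drop` matches index labels, not positions, so it should run before any `reset_index` call.

```python
import pandas as pd

# Hypothetical toy data with non-default index labels.
toy = pd.DataFrame(
    {"source_article": ["a", "b", "a", "c", "b"]},
    index=[10, 11, 12, 13, 14],
)

to_drop = [12, 14]  # hypothetical labels of the redundant copies

# drop() raises a KeyError if a label is missing, which guards against typos in the list.
cleaned = toy.drop(index=to_drop)

# Sanity check: count rows that still share their text with another row.
remaining = cleaned.duplicated(subset="source_article", keep=False).sum()
print(cleaned)
print("rows still duplicated:", remaining)
```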

However, I'm unsure about rows 1159 and 2082. The source_article "'Running the government is like running a business. You can’t keep running into debt and expect to be successful.'" has two labels, "fallacy of logic" and "faulty generalization". Which one is correct?
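
Other texts may carry conflicting labels as well. One possible way to list them is sketched below, assuming the labels to compare are in `updated_label`; the `toy` frame is made-up data.

```python
import pandas as pd

# Hypothetical toy data: "text A" appears twice with different labels,
# "text B" appears twice with the same label.
toy = pd.DataFrame({
    "updated_label": ["faulty generalization", "fallacy of logic", "ad hominem", "ad hominem"],
    "source_article": ["text A", "text A", "text B", "text B"],
})

# Count the distinct labels attached to each text.
labels_per_text = toy.groupby("source_article")["updated_label"].nunique()

# Keep only the texts that have more than one distinct label.
conflicting_texts = labels_per_text[labels_per_text > 1].index
conflicts = toy[toy["source_article"].isin(conflicting_texts)]

print(conflicts.sort_values("source_article"))
```

Plain duplicates with identical labels ("text B" here) are not reported, so the output contains only the rows that need a labeling decision.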

I hope you can review these changes and let me know your thoughts.
By the way, I'm assuming the correct labels are in the first column, updated_label.
