Add some form of plagiarism detection and relevance scoring #9

Open
retypepassword opened this issue Mar 5, 2016 · 0 comments

@retypepassword
Owner

Many answers to the NB assignments are simply copied and pasted from elsewhere (often the first Google search result), which defeats the purpose of the assignments.

Plagiarism detection could mitigate this somewhat. One approach is to compare comments against one another and flag any comment that meets a minimum length and whose similarity to another comment exceeds a set threshold. A randomly selected sample of comments (or a sample drawn from each group of highly similar comments) could also be checked for plagiarism against Google search results using the custom search engine API (search.cse.list).
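A rough sketch of the comment-to-comment comparison, to make the idea concrete. Everything here is an assumption for illustration: the function names, the choice of Jaccard similarity over word 3-grams as the similarity measure, and the default threshold and minimum length are all placeholders, not something already decided for this project.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(words, k=3):
    """Return the set of overlapping k-word sequences in a token list."""
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_similar_comments(comments, threshold=0.8, min_words=20, k=3):
    """Return (id_a, id_b, similarity) for each pair at or above the threshold.

    `comments` maps a comment id to its text. Comments shorter than
    `min_words` are skipped, since short answers overlap by chance.
    The threshold and minimum length are hypothetical defaults.
    """
    eligible = {}
    for comment_id, text in comments.items():
        words = tokenize(text)
        if len(words) >= min_words:
            eligible[comment_id] = shingles(words, k)

    flagged = []
    for (id_a, sh_a), (id_b, sh_b) in combinations(eligible.items(), 2):
        score = jaccard(sh_a, sh_b)
        if score >= threshold:
            flagged.append((id_a, id_b, score))
    return flagged
```

Comparing every pair is quadratic in the number of comments, which is probably fine for a single assignment but would need something smarter for large classes.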

A relevance score could complement plagiarism detection. Relevance could be calculated by checking the words in a comment against the corpus of all words used in responses to the assignment, excluding the 100 most commonly used words. It would also be important to require a sufficient number of distinct words, so that a comment that repeats just one or two words over and over does not receive a high relevance score.
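One possible shape for the relevance score, as a minimal sketch. The scoring rule is an assumption: a comment's score is the share of its distinct words that at least one other response also uses, and comments with too few distinct words score zero. The names `document_frequency` and `relevance_score`, the `min_distinct` default, and passing the common-word list in as a set are all hypothetical choices.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_frequency(comments, common_words):
    """Count, for each word, how many comments use it at least once.

    `common_words` is the set of very frequent words to ignore
    (for example, the 100 most commonly used words).
    """
    counts = Counter()
    for text in comments:
        counts.update(set(tokenize(text)) - common_words)
    return counts


def relevance_score(comment, doc_freq, common_words, min_distinct=5):
    """Score a comment between 0.0 and 1.0.

    Assumes `comment` is itself one of the comments counted in `doc_freq`,
    so a word is treated as shared only when its count is at least 2.
    Comments with fewer than `min_distinct` distinct words score 0.0,
    which keeps one or two repeated words from scoring highly.
    """
    distinct = set(tokenize(comment)) - common_words
    if len(distinct) < min_distinct:
        return 0.0
    shared = sum(1 for word in distinct if doc_freq[word] >= 2)
    return shared / len(distinct)
```

Working with distinct words rather than raw counts is what prevents repetition from inflating the score.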

These features would likely take significant time and effort to implement, so I'm not inclined to build them unless there's a consensus that they would be useful and necessary.

@retypepassword changed the title from "Add some form of plagiarism detection and relevance detection" to "Add some form of plagiarism detection and relevance scoring" on Mar 5, 2016