Many answers to the NB assignments are simply copied and pasted from elsewhere (often the first Google search result), which defeats the purpose of the assignments.
Plagiarism detection could help mitigate this issue somewhat. One approach is to compare comments against one another and set a maximum similarity threshold: any comment of at least some minimum length whose similarity to another comment exceeds the threshold is flagged for plagiarism. A randomly selected sample of comments (or samples drawn from groups of comments that are highly similar to one another) could also be checked for plagiarism against Google search results using the custom search engine API (search.cse.list).
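To make the comment-against-comment idea concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed implementation: the function names, the `MIN_LENGTH` and `MAX_SIMILARITY` values, and the choice of the standard library's `difflib.SequenceMatcher` as the similarity measure are all assumptions on my part.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical settings; real values would need tuning on actual comments.
MIN_LENGTH = 40        # ignore comments shorter than this many characters
MAX_SIMILARITY = 0.8   # flag pairs whose similarity ratio exceeds this


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits do not hide a copy."""
    return " ".join(text.lower().split())


def flag_similar_comments(comments, min_length=MIN_LENGTH,
                          max_similarity=MAX_SIMILARITY):
    """Return (id_a, id_b, ratio) for each pair of comments that is too similar.

    `comments` maps a comment id to its text. Comments shorter than
    `min_length` characters (after normalization) are never flagged.
    """
    eligible = {cid: normalize(text) for cid, text in comments.items()}
    eligible = {cid: t for cid, t in eligible.items() if len(t) >= min_length}

    flagged = []
    # Compare every eligible pair once; this is quadratic in the number of comments.
    for (id_a, text_a), (id_b, text_b) in combinations(eligible.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio > max_similarity:
            flagged.append((id_a, id_b, ratio))
    return flagged
```

The pairwise loop would get slow on a large class, so a real version would probably need some form of bucketing first. The flagged pairs are also a natural source for the samples that get checked against search results.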
A relevance score could be used in conjunction with plagiarism detection. Relevance could be calculated by checking the words in a comment against the corpus of all words used in responses to the assignment (excluding the 100 most commonly used words). It would also be important to require a sufficient number of different words, so that a comment with just one or two words repeated over and over is not given a high relevance score.
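Here is a rough sketch of how such a score might be computed. The scoring formula is my own assumption, since the description above leaves it open: each distinct, non-common word in a comment is weighted by the fraction of other comments that also use it, and the weights are averaged. `COMMON_WORDS` is a short stand-in for the real list of the 100 most common words, and `MIN_DISTINCT_WORDS` is a hypothetical cutoff.

```python
import re
from collections import Counter

# Stand-in for the "100 most commonly used words"; a real list would be longer.
COMMON_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "i", "for", "on", "with", "as", "are", "be", "by",
}
MIN_DISTINCT_WORDS = 3  # hypothetical: fewer distinct words than this scores 0


def content_words(text):
    """Return the set of distinct lowercase words that are not common words."""
    return {w for w in re.findall(r"[a-z']+", text.lower())
            if w not in COMMON_WORDS}


def relevance_scores(comments, min_distinct=MIN_DISTINCT_WORDS):
    """Map each comment id to a relevance score between 0 and 1.

    `comments` maps a comment id to its text. A word contributes the
    fraction of OTHER comments that also contain it.
    """
    word_sets = {cid: content_words(text) for cid, text in comments.items()}
    # Number of comments in which each word appears at least once.
    doc_freq = Counter(w for words in word_sets.values() for w in words)
    others = max(len(comments) - 1, 1)

    scores = {}
    for cid, words in word_sets.items():
        if len(words) < min_distinct:
            # One or two words repeated over and over get no credit.
            scores[cid] = 0.0
            continue
        # Subtract 1 so a comment does not count as support for itself.
        scores[cid] = sum((doc_freq[w] - 1) / others for w in words) / len(words)
    return scores
```

Because the score uses sets of distinct words, repeating a word adds nothing, and the minimum-distinct-words check handles the repeated-word case mentioned above. One caveat with this particular formula is that a group of copied answers would score each other as highly relevant, which is another reason to pair it with plagiarism detection.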
These features would likely require significant time and effort to implement, so I'm not terribly inclined to implement them unless there's a consensus that they would be beneficial, useful, and necessary.
retypepassword changed the title from "Add some form of plagiarism detection and relevance detection" to "Add some form of plagiarism detection and relevance scoring" on Mar 5, 2016