Course project for CS251 Software Systems Lab
This projects aims at making a software for detecting plagiarism between source code files. Here implemented is a website which will allow users to interact with the tool. Any user who opens the website land on the Home Page
where they are given two options - Login
for existing user and Registration
for new user. After doing either, the user is taken to their Dashboard
where there are instrcutions on running the checker. They can select a .tar/.tar.gz/.zip
file to upload for running through the plagiarism checker through the file selector available. There is also option to select boilerplate code to be excluded across all comparisons. Each user has a separate folder created at the backend where all their uploaded files are displayed. They can choose any uploaded file and then click at the button to run the plagiarism checker on it and generate results. The results generated will be a .csv
file with n x n table of similarity between every 2 pair of files and an image with the graphical representation of the results in the form of a colormap. The user has the option to download the results or run the checker again. There is also option to delete any of the existing uploaded files and a Logout
button to allow the user to logout and redirect to the Home Page
.
Here's our documentation of the Core Logic of this project
git clone https://github.com/jsum007/RedPlagCryptoFox
(This won't work now as the repo is private but is added for later)
- To get the Angular Frontend running -
npm install
ng serve --open
- To get the Django Backend running -
cd backend
pip3 install -r requirements.txt
python3 manage.py migrate
python3 manage.py makemigrations
python3 manage.py runserver
The frontend of the project was generated with Angular CLI version 10.1.6.
Run ng serve
for a dev server. Navigate to http://localhost:4200/
. The app will automatically reload if you change any of the source files.
Run ng generate component component-name
to generate a new component. You can also use ng generate directive|pipe|service|class|guard|interface|enum|module
.
Run ng build
to build the project. The build artifacts will be stored in the dist/
directory. Use the --prod
flag for a production build.
Run ng test
to execute the unit tests via Karma.
Run ng e2e
to execute the end-to-end tests via Protractor.
To get more help on the Angular CLI use ng help
or go check out the Angular CLI README.
The backend of the project was generated with Django and the APIs were generated using Django REST Framework.
JSON Web Tokens are an industry standard mechanism for generating a token which can be passed in the HTTP headers of each request, authenticating the user. Any new entry assks for an email address which must not be registered already and a password atleast 8 characters long. These after validation create a new user instance and redirect to the dashboard page.
To implement an authentication system, we used the email address and password model, where after validating this pair, you get a token that works as a credential to give you access to your resources on that service. In our case, the authentication method selected was the JWT. JWT stands for JSON Web Token, and it is an encoded JSON object defined by the RFC 7519 to perform access information exchange between two ends. djangorestframework-jwt
is an extension to Django REST Framework which provides an authentication layer using JSON Web Tokens and has been used in our project.
As we have used tokenization and winnowing as our technique, a major scope of improvement is in tokenization. We do have made separate language specific efficient tokenizers for C++, Python and Java which will handle changes such as addition of redundant code instead of functions, functions reordering and other major reordering of code. In C++, since we are using a parser so, the tokenization is giving optimal results. However, in Python and Java, we have done syntax based tokenization, which can further be improved, by adding special tokens corresponding to class/struct member variables and methods and functions (especially for Java). Further, this checker can be improved by adding more language specific support and implementing more language specific tokenizers. In winnowing, improvements can be made by tuning the value of ‘k’, i.e. the length of kgram, and making it unique for specific languages. Another possibility is by changing the hash function we are using. Currently, we are using SHA-1, we have not experimented with other hash functions, so we may obtain better results by testing other hash functions.
In the frontend and backend, a possible scope of improvement is by adding functionality to upload multiple code files and selecting them to compare at once. Further, tokenization and authentication can be made more robust. We can also use OAuth, instead of JWT Authentication.