Project author: dineshgopal29

Project summary:
Plagiarism Detector using AWS SageMaker
Primary language: Jupyter Notebook
Project URL: git://github.com/dineshgopal29/plagiarism_detector_sagemaker.git


Project Description:

Build a plagiarism detector that examines a text file and performs binary classification, labeling the file as either plagiarized or not depending on how similar it is to a provided source text. Detecting plagiarism is an active area of research; the task is non-trivial because the differences between paraphrased answers and original work are often subtle.

Techniques Used:

Similarity features measure how closely a given text file matches an original source text. Two such features are computed: containment and longest common subsequence.
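
As a rough illustration of these two features, here is a minimal sketch in plain Python. It assumes one common definition of each: containment as the share of the answer's word n-grams that also occur in the source, and longest common subsequence as the length of the longest shared word sequence (not necessarily contiguous) divided by the number of words in the answer. The function names `ngrams`, `containment`, and `lcs_norm` are hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption; the notebooks in the repository may define and compute these features differently.

```python
from collections import Counter


def ngrams(text, n):
    """Count the word n-grams in a lowercased, whitespace-split text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer, source, n=1):
    """Share of the answer's n-grams that also occur in the source (assumed definition)."""
    answer_grams = ngrams(answer, n)
    source_grams = ngrams(source, n)
    total = sum(answer_grams.values())
    if total == 0:
        return 0.0  # answer is shorter than n words
    # Clip each n-gram's count to the number of times the source contains it.
    shared = sum(min(count, source_grams[gram])
                 for gram, count in answer_grams.items())
    return shared / total


def lcs_norm(answer, source):
    """Longest common word subsequence length, divided by the answer's word count."""
    a = answer.lower().split()
    s = source.lower().split()
    if not a:
        return 0.0
    # Dynamic programming over words, keeping only the previous row.
    prev = [0] * (len(s) + 1)
    for word_a in a:
        curr = [0]
        for j, word_s in enumerate(s, start=1):
            if word_a == word_s:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(a)
```

Both functions return a value between 0 and 1, so they can be used directly as columns of a feature matrix. Computing containment for several values of n gives a classifier both word-level and phrase-level overlap signals.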

ML Workflow Used:

  • Data Processing
  • Data Cleanup
  • Feature Engineering
  • Model Development
  • Model Training
  • Model Deployment

Tools Used:

Python, scikit-learn API, AWS SageMaker, NumPy