Plagiarism Detector using AWS SageMaker
Build a plagiarism detector that examines a text file and performs binary classification, labeling the file as either plagiarized or not depending on how similar it is to a provided source text. Detecting plagiarism is an active area of research; the task is non-trivial, and the differences between paraphrased answers and original work are often subtle.
The detector computes two similarity features, containment and longest common subsequence, which measure how similar a given text file is to the original source text.
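As a rough illustration of these two features, here is a minimal sketch in plain Python. It is not taken from this project's code. The function names (ngram_counts, containment, lcs_norm_word), the lowercase whitespace tokenization, and the choice to normalize both scores by the length of the answer text are all assumptions based on one common formulation: containment is the share of the answer's word n-grams that also appear in the source, and the longest common subsequence score is the length of the longest in-order (not necessarily contiguous) run of shared words divided by the number of words in the answer.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams in a lowercased, whitespace-tokenized text (assumed tokenization)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer, source, n=1):
    """Fraction of the answer's n-grams that also occur in the source."""
    answer_counts = ngram_counts(answer, n)
    source_counts = ngram_counts(source, n)
    total = sum(answer_counts.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    shared = sum((answer_counts & source_counts).values())
    return shared / total


def lcs_norm_word(answer, source):
    """Longest common word subsequence length, divided by the answer's word count."""
    a = answer.lower().split()
    s = source.lower().split()
    if not a:
        return 0.0
    # table[i][j] holds the LCS length of a[:i] and s[:j].
    table = [[0] * (len(s) + 1) for _ in range(len(a) + 1)]
    for i, a_word in enumerate(a, start=1):
        for j, s_word in enumerate(s, start=1):
            if a_word == s_word:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(s)] / len(a)


if __name__ == "__main__":
    answer = "the quick brown fox"
    source = "the quick red fox jumps"
    print(containment(answer, source, n=1))
    print(containment(answer, source, n=2))
    print(lcs_norm_word(answer, source))
```

Both scores fall between 0 and 1, with higher values indicating more overlap with the source. Values like these, computed for several n-gram sizes, could serve as the input features for the binary classifier.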
Python, scikit-learn API, AWS SageMaker, NumPy