Project author: psanghal

Project description:
UMSI-Wikipedia Text Difficulty Prediction (NLP Project)
Language: Jupyter Notebook
Repository: git://github.com/psanghal/text-difficulty-prediction.git
Created: 2021-06-02T18:59:23Z
Project community: https://github.com/psanghal/text-difficulty-prediction

License: MIT License



text-difficulty-prediction

University of Michigan: Milestone Project 2

Project Description:
This project applies supervised and unsupervised learning techniques to Wikipedia text to predict which sentences need to be simplified so that they are easier to understand. Target readers include students, children, adults with learning or reading disabilities, and non-native English speakers.

Project Workflow:
This project contains five Jupyter notebooks. It begins by extracting features from the original text, then implements supervised and unsupervised learning models using the extracted features and text tokenizers such as TF-IDF, SentencePiece, and the Keras Tokenizer. The goal was to assess how effective each feature representation is at classifying text difficulty, and to understand which manual feature extraction steps worked well and which could be improved in the future.
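As a rough illustration of what manual feature extraction for text difficulty can look like, here is a minimal sketch using only the Python standard library. The function name `extract_features` and the three features shown (word count, average word length, ratio of long words) are assumptions chosen for illustration; they are not taken from the project's notebooks, which may use a different feature set.

```python
import re

def extract_features(sentence: str) -> dict:
    """Compute a few simple surface features for one sentence.

    Illustrative assumption: these are NOT the project's actual features.
    """
    words = re.findall(r"[A-Za-z']+", sentence)
    n_words = len(words)
    n_chars = sum(len(w) for w in words)
    # Words of 7+ letters serve as a crude proxy for "hard" vocabulary.
    n_long = sum(1 for w in words if len(w) >= 7)
    return {
        "n_words": n_words,
        "avg_word_len": n_chars / n_words if n_words else 0.0,
        "long_word_ratio": n_long / n_words if n_words else 0.0,
    }

simple = extract_features("The cat sat on the mat.")
hard = extract_features(
    "Photosynthesis converts electromagnetic radiation into chemical energy."
)
print(simple)
print(hard)
```

Dictionaries like these, one per sentence, could then be stacked into a table and passed to a classifier alongside or instead of tokenizer-based representations.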

Please refer to the following Jupyter notebooks for the code implementation.

  1. Text Difficulty-Feature Extraction-Final
  2. Text Difficulty-Supervised Models-Final
  3. Text Difficulty- Deep Learning-Final
  4. Text Difficulty-Unsupervised Models-Final
  5. Text Difficulty-Topic Modelling-Final

The features extracted in the first notebook, “Text Difficulty-Feature Extraction-Final”, are reused extensively in all the other notebooks to save computation time.
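One common way to reuse extracted features across notebooks is to cache them in a file. The sketch below assumes a CSV cache and uses hypothetical names (`save_features`, `load_features`, `features_cache.csv`) and made-up feature rows; the project's notebooks may store their features differently.

```python
import csv
import os
import tempfile

def save_features(rows, path):
    """Write a list of feature dicts to a CSV file, one row per sentence."""
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

def load_features(path):
    """Read cached features back, parsing every column as a float."""
    with open(path, newline="", encoding="utf-8") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

# Hypothetical rows standing in for the output of the feature extraction notebook.
rows = [
    {"n_words": 6, "avg_word_len": 2.83, "label": 0},
    {"n_words": 7, "avg_word_len": 9.14, "label": 1},
]

cache_path = os.path.join(tempfile.gettempdir(), "features_cache.csv")
save_features(rows, cache_path)   # done once, after feature extraction
cached = load_features(cache_path)  # done in each downstream notebook
print(cached)
```

Loading a cached file like this avoids recomputing the features every time a modelling notebook is run.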

Please click on the dataset to view the file.