Project descriptions
This page describes, in some detail, the data science projects I have recently completed.
Feel free to contact me:
Natural Language Processing
Machine Learning Classification Projects
Deep Learning
Machine Learning Regression Models
Unsupervised Learning
Natural Language Processing
Natural-language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data. Challenges in natural-language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.
Notebook | Brief Description |
---|---|
neural-language-model-and-spinoza | Spinoza’s Ethics is used to build a language model for text generation with recurrent neural nets. |
sentiment-analysis | I performed a “reverse sentiment analysis” on movie reviews that were already classified, using Bernoulli Naive Bayes to identify which words appear more frequently in reviews from each class. |
topic-identification | A tutorial on topic identification (in progress). |
alphabet-human-thought/meaning-of-sentences | In this notebook, I show that, by using logic formalisms, one can find more generic translation mechanisms (in progress). |
alphabet-human-thought/sentence-structure | I show how to develop formal models for patterns in sequences of words using grammars and parsers (in progress). |
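The “reverse sentiment analysis” idea can be sketched in plain Python. A Bernoulli Naive Bayes model estimates, for each class, the probability that a word is present in a document; comparing those per-class estimates shows which words are most characteristic of each class. This is a minimal sketch, not the notebook's code: the tiny labeled corpus and the helper names `bernoulli_word_probs` and `most_indicative` are hypothetical, and the add-one (Laplace) smoothing matches the per-class feature probabilities scikit-learn's `BernoulliNB` stores as logs in `feature_log_prob_` with its default `alpha=1.0`.

```python
import math
from collections import defaultdict

def bernoulli_word_probs(docs, labels, alpha=1.0):
    """Estimate P(word present | class) with add-alpha (Laplace) smoothing."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    n_docs = defaultdict(int)                            # documents per class
    n_with_word = defaultdict(lambda: defaultdict(int))  # class -> word -> docs containing it
    for doc, label in zip(docs, labels):
        n_docs[label] += 1
        for w in set(doc.lower().split()):  # Bernoulli model: presence/absence only
            n_with_word[label][w] += 1
    return {
        c: {w: (n_with_word[c][w] + alpha) / (n_docs[c] + 2 * alpha) for w in vocab}
        for c in n_docs
    }

def most_indicative(probs, target, other, top_n=3):
    """Rank words by log P(w | target) - log P(w | other)."""
    scores = {
        w: math.log(probs[target][w]) - math.log(probs[other][w])
        for w in probs[target]
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical, already-labeled reviews (not the notebook's dataset).
reviews = [
    "a wonderful moving film",
    "wonderful acting and a great story",
    "great fun",
    "a dull boring film",
    "boring plot and awful acting",
    "awful",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

probs = bernoulli_word_probs(reviews, labels)
print(most_indicative(probs, "pos", "neg"))
print(most_indicative(probs, "neg", "pos"))
```

Words that appear in both classes equally often (here "film" and "acting") get a log-ratio of zero. Only words skewed toward one class rise to the top of that class's list.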
Machine Learning Classification Projects
Notebook | Brief Description |
---|---|
predicting-comments-on-reddit | In this project, I determine which characteristics of a Reddit post contribute most to overall interaction, as measured by the number of comments. |
tennis-matches-prediction | The goal of this project is to predict the probability that the higher-ranked player will win a tennis match; I call that outcome a win (as opposed to an upset). |
churn-analysis | This project was done in collaboration with Corey Girard. A mobile device company has a major problem with customer retention: customers are switching to other companies, which is called churn. Our goal in this analysis is to understand the problem, to identify behaviors that are strongly correlated with churn, and to devise a solution. |
click-prediction | Many ads are actually sold on a “pay-per-click” (PPC) basis, meaning the company only pays for ad clicks, not ad views. Thus your optimal approach (as a search engine) is actually to choose an ad based on “expected value”, meaning the price of a click times the likelihood that the ad will be clicked […] In order for you to maximize expected value, you therefore need to accurately predict the likelihood that a given ad will be clicked, also known as “click-through rate” (CTR). In this project, I predict the likelihood that a given online ad will be clicked. |
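The expected-value rule in the click-prediction row reduces to simple arithmetic once a CTR model exists. In the minimal sketch below, the `Ad` class, the ad names, the prices, and the predicted CTRs are all hypothetical stand-ins for a model's output. The point is only that the ad with the highest price × CTR wins, and that ad need not be the most expensive one.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    price_per_click: float  # what the advertiser pays per click
    predicted_ctr: float    # estimated probability of a click (from some CTR model)

    @property
    def expected_value(self) -> float:
        # Pay-per-click: revenue is earned only when the ad is clicked,
        # so expected revenue per impression = price * P(click).
        return self.price_per_click * self.predicted_ctr

def choose_ad(ads):
    """Pick the ad with the highest expected value per impression."""
    return max(ads, key=lambda ad: ad.expected_value)

# Hypothetical candidates: illustrative numbers, not real data.
ads = [
    Ad("premium", price_per_click=2.00, predicted_ctr=0.010),
    Ad("midrange", price_per_click=0.80, predicted_ctr=0.040),
    Ad("budget", price_per_click=0.25, predicted_ctr=0.090),
]

for ad in ads:
    print(f"{ad.name}: {ad.expected_value:.4f}")
print("chosen:", choose_ad(ads).name)  # chosen: midrange
```

Here `premium` has the highest price and `budget` the highest CTR, yet `midrange` maximizes the product (0.032 versus 0.02 and 0.0225). This is why an accurate CTR estimate matters as much as the price.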
Deep Learning
Notebook | Brief Description |
---|---|
painter-identifier | I built a convolutional neural network (CNN) to identify the artist of a painting via transfer learning, instantiating the convolutional part of the Inception V3 model and training a fully connected network on top of it. |
bitcoin-price-analysis | I built predictive models for Bitcoin price data using recurrent neural networks (LSTMs). Correlations between altcoins are also considered. |
keras-tf-tutorial | Neural networks tutorial in which I build fully connected and convolutional neural networks using both Keras and TensorFlow (in progress). |
transfer-learning-mini-tutorial | I illustrate the use of transfer learning using the Inception V3 deep neural network model. |
Machine Learning Regression Models
Notebook | Brief Description |
---|---|
retail-store-expansion-analysis-with-lasso-and-ridge-regressions | Based on a dataset of spirits purchases by Iowa Class E liquor licensees, broken down by product and date of purchase, this project provides recommendations on where to open new stores in the state of Iowa. To devise an expansion strategy, I first needed to understand the data, so I conducted a thorough exploratory data analysis (EDA). I then built multivariate regression models of total sales by county, using both Lasso and Ridge regularization, and based my recommendations for new locations on these models. |
conjoint-analysis | Conjoint analysis is a technique that allows researchers to predict consumers’ choice share. The analysis can be programmed using standard question types, such as the MaxDiff variation of the Matrix Table question. Instead of directly asking survey respondents which attributes they find most relevant, conjoint analysis asks respondents to evaluate potential product profiles that combine multiple product features. There are several ways to present the product profiles to respondents. In Choice-Based Conjoint (CBC), respondents are shown multiple product concepts and asked which option they would choose. By varying the features shown to respondents and observing their responses to the product profiles, one can statistically deduce the most desired product features and which attributes have the most impact on choice. The end result is a set of preference scores, or part-worth utilities, for each level of each attribute. In this notebook, I show how to use Python to calculate the utilities. The notebook is heavily based on this course and this book. |
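The part-worth utilities described in the conjoint-analysis row can be illustrated with a simplified, rating-based variant. Each attribute level is dummy-coded against a reference level, and ordinary least squares is fitted, so each coefficient is that level's utility relative to the reference. This is a swapped-in simplification, not the notebook's method: choice-based (CBC) data are usually modeled with a multinomial logit rather than OLS. The attributes, levels, "true" utilities, and simulated ratings below are all hypothetical. Attribute importance is computed as each attribute's utility range divided by the sum of ranges, one common convention.

```python
import itertools
import numpy as np

# Hypothetical attributes; the first level of each is the reference (utility 0).
attributes = {
    "brand": ["A", "B", "C"],
    "price": ["$10", "$15"],
    "battery": ["8h", "12h"],
}
# One dummy column per non-reference level.
columns = [f"{a}:{lvl}" for a, levels in attributes.items() for lvl in levels[1:]]

def encode(profile):
    """Intercept plus a 0/1 indicator for each non-reference level in the profile."""
    present = {f"{a}:{lvl}" for a, lvl in zip(attributes, profile)}
    return [1.0] + [1.0 if col in present else 0.0 for col in columns]

# Full-factorial design: 3 * 2 * 2 = 12 product profiles.
profiles = list(itertools.product(*attributes.values()))
X = np.array([encode(p) for p in profiles])

# Simulated ratings from made-up "true" utilities plus small noise.
true_coef = np.array([5.0, 1.0, -0.5, -2.0, 1.5])  # intercept, B, C, $15, 12h
rng = np.random.default_rng(0)
ratings = X @ true_coef + rng.normal(0.0, 0.1, size=len(profiles))

# OLS fit: each coefficient is a part-worth relative to its reference level.
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
part_worths = dict(zip(columns, coef[1:]))

# Attribute importance: range of its level utilities over the sum of all ranges.
ranges = {}
for a, levels in attributes.items():
    utils = [0.0] + [part_worths[f"{a}:{lvl}"] for lvl in levels[1:]]
    ranges[a] = max(utils) - min(utils)
importance = {a: r / sum(ranges.values()) for a, r in ranges.items()}

for level, u in part_worths.items():
    print(f"{level:12s} {u:+.2f}")
print({a: round(float(w), 2) for a, w in importance.items()})
```

Because the ratings were simulated from known utilities, the recovered part-worths should land close to them, and price (the attribute with the widest utility range) should come out as the most important attribute.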
Unsupervised Learning
Notebook | Brief Description |
---|---|
topic-modeling | In this notebook, I use Python libraries for topic modeling, in which statistical models are used to identify topics or categories in a document or a set of documents. I use one specific method, Latent Dirichlet Allocation (LDA), and apply it to labels on research papers. |
clustering-for-customer-segmentation | In this project, I apply clustering algorithms to the Wholesale Customers Data Set from the UCI Machine Learning Repository. The dataset contains the amounts customers spent on several product categories. |
network-analysis | Neural networks tutorial in which I build fully connected and convolutional neural networks using both Keras and TensorFlow (in progress). |
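The customer-segmentation row can be illustrated with a minimal k-means sketch in NumPy. It is an assumption that k-means is among the clustering algorithms used. The six customers and their two spending columns (loosely named after the dataset's Fresh and Grocery categories) are made up. The log transform and the deterministic farthest-point initialization are illustrative choices, not necessarily the notebook's. The sketch also does not handle empty clusters.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Next initial centroid: the point farthest from all centroids chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each customer goes to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical annual spending (Fresh, Grocery) for six customers, not the UCI data.
spending = np.array([
    [12000, 2000], [15000, 1800], [11000, 2500],  # fresh-heavy buyers
    [1500, 16000], [2000, 14000], [1200, 18000],  # grocery-heavy buyers
], dtype=float)
X = np.log1p(spending)  # spending is right-skewed, so cluster on a log scale
labels, centroids = kmeans(X, k=2)
print(labels)  # [0 0 0 1 1 1]
```

On this toy data the fresh-heavy and grocery-heavy buyers separate cleanly. On real data one would typically use all spending columns, compare several values of k, and rely on a library implementation such as scikit-learn's `KMeans`.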