The applications data was analyzed to develop a supervised identity fraud detection model that flags applications likely to be fraudulent. To build the model, the fraud label was assessed in relation to linkages among five personally identifiable fields: SSN, address, phone number, date of birth and zip code. I created the time-window variables for these fields with the sqldf library in R because it is efficient and easy to understand.
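As an illustration of what a time-window variable can look like, here is a minimal sketch that counts, for each application, how many applications with the same SSN were filed in the preceding three days (including the application itself). The toy data frame, the column names (`record`, `day`, `ssn`) and the three-day window are assumptions made for this example, not the actual layout of applications.csv or the exact variables built in the script.

```r
library(sqldf)

# Toy data; the column names and values are hypothetical.
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-03",
                   "2016-01-10", "2016-01-11"))
apps <- data.frame(
  record = 1:5,
  day    = as.integer(dates),  # days since 1970-01-01, so SQL can subtract
  ssn    = c("111", "111", "111", "222", "111"),
  stringsAsFactors = FALSE
)

# Self-join: pair each application (a) with every application (b) that
# shares its SSN and was filed no more than 3 days earlier.
# The record condition keeps same-day applications that came later
# from being counted.
ssn_3d <- sqldf("
  SELECT a.record, COUNT(*) AS ssn_count_3d
  FROM apps AS a
  JOIN apps AS b
    ON a.ssn = b.ssn
   AND b.day BETWEEN a.day - 3 AND a.day
   AND b.record <= a.record
  GROUP BY a.record
  ORDER BY a.record
")

print(ssn_3d)
```

The same pattern can be repeated for the other fields (address, phone number, date of birth, zip code) and for other window lengths by changing the join key and the `BETWEEN` bounds.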
Creating time-window variables with sqldf.R in RStudio
applications.csv
Creating time-window variables with sqldf.R
Variable Creation.pdf