Data Science Graduate Co-op
January 2020 – March 2020
While working toward my Master of Science in Data Science at Northeastern University, I took a semester off to work on the Data Science team at The TJX Companies. During this time I worked on several projects that helped grow my data science skills. Two of them, which both highlight those skills and benefited the company, are detailed below.
Customer Churn Modeling
One of my key responsibilities at TJX was analyzing the performance of the company’s website. My first project was an analysis of the website’s users, including how many first-time customers were arriving and whether returning customers were coming back. This project led me to wonder how we could actually determine whether repeat customers were leaving, or in other words whether a customer was “churning”. There was no system in place to determine this, so I built one in order to complete my analysis.
To calculate churn, the first step was to pull the purchase history for TJX’s online customers and calculate the number of days between each customer’s purchases. I then used those gaps to build a separate cumulative distribution function (CDF) for each customer. Once the time since a customer’s last purchase became abnormally large relative to that customer’s own distribution, the customer was marked as churned. This method is more sophisticated than applying a single cutoff to all customers, such as marking anyone who has not purchased in the last 3 months as churned. Each customer has their own purchasing behavior, and this needed to be taken into account. The project not only solved my analysis problem but could also be extended to other areas of the business, such as sending retargeting emails to customers the algorithm marked as churned.
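The per-customer approach described above can be sketched in a few lines of Python. This is only an illustration, not the code used at TJX: the function names, the 0.95 threshold, and the minimum of three historical gaps are all assumptions chosen for the example, and an empirical CDF stands in for whatever distribution was actually fitted.

```python
from bisect import bisect_right
from datetime import date


def gaps_in_days(purchase_dates):
    """Days between consecutive purchases, in chronological order."""
    ordered = sorted(purchase_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def empirical_cdf(gaps, value):
    """Fraction of a customer's historical gaps that are <= value."""
    ordered = sorted(gaps)
    return bisect_right(ordered, value) / len(ordered)


def is_churned(purchase_dates, today, threshold=0.95, min_gaps=3):
    """Flag a customer whose current gap is unusually long for them.

    threshold and min_gaps are hypothetical defaults, not values from
    the original project.
    """
    gaps = gaps_in_days(purchase_dates)
    if len(gaps) < min_gaps:
        return False  # too little history to judge this customer
    current_gap = (today - max(purchase_dates)).days
    return empirical_cdf(gaps, current_gap) >= threshold


# Two made-up customers who both last purchased 30 days before "today".
today = date(2019, 7, 31)
weekly_shopper = [date(2019, 6, 10), date(2019, 6, 17), date(2019, 6, 24), date(2019, 7, 1)]
occasional_shopper = [date(2019, 1, 1), date(2019, 3, 1), date(2019, 5, 1), date(2019, 7, 1)]

print(is_churned(weekly_shopper, today))      # 30 days is far beyond their usual 7-day gap
print(is_churned(occasional_shopper, today))  # 30 days is well within their usual ~60-day gap
```

A fixed 3-month cutoff would treat these two customers identically, while the per-customer distribution flags only the shopper whose silence is out of character.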
Another project I worked on at TJX used machine learning to forecast daily website sales. To do this I used Prophet, Facebook’s open-source time series model. Prophet is an additive model that fits non-linear trends together with seasonal and daily effects. I recognized this project as a good use case for the model because I had highly seasonal time series data, aggregated by day. To optimize the model’s parameters I used cross-validation on the available historical data, shifting the window of days used to train and evaluate the model. Once fully built and tested, the model outperformed the in-house model used by TJX on a daily basis.
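The idea of shifting the training and evaluation windows can be illustrated with a small, dependency-free sketch. To keep it runnable without extra packages, it does not call Prophet: a simple "repeat the last season" forecaster stands in for the model, the sales series is synthetic, and every function name, window size, and candidate parameter is an assumption made for the example. Prophet provides its own cross-validation helper in `prophet.diagnostics`, which would take the place of the hand-written loop here.

```python
def rolling_origin_splits(n_days, initial, horizon, step):
    """Yield (train_end, test_end) index pairs, moving the cutoff forward."""
    train_end = initial
    while train_end + horizon <= n_days:
        yield train_end, train_end + horizon
        train_end += step


def seasonal_naive_forecast(train, horizon, season_length):
    """Stand-in model (not Prophet): repeat the last observed season."""
    last_season = train[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cross_validated_error(series, season_length, initial=28, horizon=7, step=7):
    """Average forecast error over all train/evaluate windows."""
    errors = []
    for train_end, test_end in rolling_origin_splits(len(series), initial, horizon, step):
        forecast = seasonal_naive_forecast(series[:train_end], horizon, season_length)
        errors.append(mean_absolute_error(series[train_end:test_end], forecast))
    return sum(errors) / len(errors)


def best_season_length(series, candidates):
    """Pick the candidate parameter with the lowest cross-validated error."""
    return min(candidates, key=lambda s: cross_validated_error(series, s))


# Synthetic daily sales: a weekly pattern plus a slow upward trend.
weekly_pattern = [100, 120, 130, 125, 160, 220, 200]
daily_sales = [weekly_pattern[day % 7] + 0.5 * day for day in range(70)]

for candidate in (1, 7, 14):
    print(candidate, cross_validated_error(daily_sales, candidate))
print("best:", best_season_length(daily_sales, [1, 7, 14]))
```

Because each evaluation window lies strictly after its training window, the error estimate reflects how the model would have performed on days it had not yet seen, which is what matters for a daily forecast.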