Web Applications
The following are interactive web applications I designed to illustrate important concepts from statistics, data science, and machine learning. These have been used in the data science courses I worked on at Harvard Business School.
Hypothesis Testing - In a blind taste test, could you tell the difference between Coke and Pepsi? This interactive application demonstrates how hypothesis testing could be used to answer this question.
Overfitting - This interactive application demonstrates the concept of overfitting. Users can adjust the degree of a polynomial regression model fit to training data and observe how the model’s performance on a test set changes.
Power Calculations - This application accompanies the case Experimentation at Yelp by Professors Iavor Bojinov and Karim R. Lakhani. It demonstrates how the power of an A/B test changes as a function of the sample size and the click rate of the treatment group.
Queueing Systems - This application simulates M/M/1 and M/M/2 queueing systems based on the arrival and service rates specified by the user.
Unsupervised Clustering - This application accompanies the case Chateau Winery (A): Unsupervised Learning by Professor Srikant M. Datar and Research Associate Caitlin N. Bowler. It demonstrates the application of several different clustering algorithms to customer purchase data from the fictional Chateau Winery.
Confusion Matrices & ROC Curves - Using the spam email data set, this application demonstrates the relationship between model thresholds, confusion matrices, and ROC curves.
Dummy Variables - This application accompanies the case Precision Paint Co. by Professors Iavor Bojinov, Chiara Farronato, and Janice Hammond, Senior Lecturer Michael Parzen, and Research Associate Paul J. Hamilton. It demonstrates the interpretation of dummy variables in a regression model.
The Multiple Testing Problem - This application illustrates alpha inflation and the multiple testing problem.
Simple Random Sampling - This application illustrates simple random sampling and the concept of using sample statistics to estimate population parameters.
Sampling Theory - This application is a simulation-based introduction to sampling theory. Based on the original work of William Sealy Gosset at Guinness, it allows the user to simulate the random selection of a large number of samples at different sample sizes.
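The kind of simulation described in the Sampling Theory entry can be sketched in a few lines of Python. This is only an illustration, not the code behind the application: the synthetic population (normal, mean 100, standard deviation 15), the sample sizes, the number of repetitions, and the function name `simulate_sample_means` are all assumptions made for the example.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples (without replacement)
    of the given size and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population: 10,000 values, roughly normal (assumed for illustration).
pop_rng = random.Random(42)
population = [pop_rng.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# For each sample size, record the center and spread of the sample means.
center = {}
spread = {}
for n in (4, 16, 64):
    means = simulate_sample_means(population, n, n_samples=2_000)
    center[n] = statistics.fmean(means)
    spread[n] = statistics.stdev(means)
    print(f"n={n:>2}  mean of sample means={center[n]:.2f}  "
          f"std. dev. of sample means={spread[n]:.2f}")
```

Under these assumptions, the sample means should cluster around the population mean at every sample size, while their spread shrinks as the sample size grows.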