Data Science for Managers II (Spring 2022)

MBA course, Harvard Business School, 2022

Teaching Fellow

DSM2 is designed for students who already have some exposure to basic data science. It allows students to build a deeper understanding of how data and analytics can complement judgment in managerial decision making. The course builds on concepts learned in DSM1 and is especially suited to students who want to continue their careers at companies, such as technology firms, where data collection, aggregation, and analysis permeate the entire organization.

In the course, students will learn to “data wrangle” (collect and clean data that suit their purposes) and expand their machine learning repertoire to include gradient boosting, neural networks, unsupervised algorithms, and natural language processing. As in DSM1/IDS, students will be guided through data analysis with “starter code.” In addition to the statistical software R, students will gain familiarity with SQL for managing relational databases.

Educational Objectives

DSM2 builds on the foundations developed in DSM1 to provide students with an understanding of how to create, design, and manage data science projects from inception through data collection, analysis, and reporting. Students will be introduced to new tools (such as SQL) and new techniques (such as K-Means clustering, natural language processing, and Monte Carlo simulation). The course will also explore essential notions of data privacy, algorithmic fairness, and agile project management to prepare students to manage and lead data science projects at any level of an organization.

Content and Organization

The DSM2 course comprises the following modules:

Module 1: Data Wrangling: Students will learn SQL (Structured Query Language), commonly used to store, manipulate, and retrieve data in a relational database.

Module 2: Inference: This module’s topics include probability distributions, Monte Carlo simulation, K-Means clustering, and variable transformations and interaction terms in linear and logistic regression.

Module 3: Machine Learning & Artificial Intelligence: This module will examine several concepts key to machine learning, including autocorrelation, time series, Lasso and Ridge regression, confusion and payoff matrices, ROC curves, neural networks, gradient boosting, and algorithmic bias. It also covers natural language processing (NLP), including sentiment analysis, naïve Bayes classification, and latent Dirichlet allocation (LDA).

Module 4: Data Science Strategy & Agile Development: This module will focus on essential strategy questions around data privacy, algorithmic fairness, and agile product development.

The course consists of two types of class sessions: laboratory sessions, in which students practice analyzing assigned problems, and classroom sessions built around case discussions. Teaching fellows will be available at each lab session to answer questions and provide both technical and conceptual support. Teaching fellows will also hold office hours and review sessions outside of class.

Course Pedagogy

The classroom sessions will use a problem-solving strategy that gives students a clear outline for approaching any data science business problem. The course emphasizes communication, rigor in analysis, ethical reporting of results, and the handling of messy issues such as missing data, participant nonresponse, and study design. Reproducibility of results will be stressed throughout.