Data Science for Managers I (Fall 2021)

MBA course, Harvard Business School, 2021

Teaching Fellow

As Michael Chui, a partner at the McKinsey Global Institute, once said: “We believe that big data and data analytics are going to be incredibly important…It is becoming a basis of competition in every industry. Companies that learn how to use data effectively are going to be more likely to win in the marketplace and those that don’t are going to fall behind.”

DSM1 is intended for students who want to understand how data and analytics can help managers make the best decisions. The course is designed for students with little or no background in data analytics, and is suitable for students who plan to work in any industry or sector that uses data for decision-making, from financial services and manufacturing to technology. DSM1 can be taken as a stand-alone course to learn the basic principles of data-driven decision-making, or as a building block for more advanced data science courses such as DSM2.

The course will expose students to a range of managerial decisions, from expanding into new customer segments and forecasting future sales, to predicting consumer behavior and evaluating the efficacy of product changes. Students will use the widely deployed computing software R to conduct analyses and will be guided by “starter code.” Although few HBS MBA students will be actively engaged in coding post-graduation, the course faculty believe that some exposure to basic, hands-on coding and techniques will allow our students to manage and interact with data scientists more effectively. DSM1 will not delve deeply into technical details, but will require students to engage with some code.

Educational Objectives

The educational objective of DSM1 is to give students the foundations they need to derive and evaluate data-driven insights that inform managerial decisions. Students will learn how to frame and solve business problems while complementing judgment with data-driven insights, and will gain hands-on experience with R. The overarching goal is to help students build a robust data analytics mindset, which requires combining the fundamentals of statistics and computer science with substantive domain knowledge. The course will focus on business applications, including managers’ roles in hypothesis generation and testing, model design, interpretation of results, and the formulation of actionable recommendations.

Content and Organization

The DSM1 course comprises five modules:

Module 1: Exploratory Data Analysis: This module will teach students a variety of approaches (e.g., descriptive statistics, graphs) to explore and gain insights from a data set. Descriptive statistics about key business metrics are aggregations of data that should form the information backbone of every enterprise. Sales, revenue, and customer churn are examples of important business metrics.
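As a rough illustration of what "aggregations of data" can look like in practice, the sketch below summarizes a few months of revenue and churn. The course itself uses R; this example uses Python's standard library purely for illustration, and the records, field layout, and metric names are hypothetical rather than taken from course materials.

```python
from statistics import mean, median

# Hypothetical monthly records (not course data):
# (month, revenue in dollars, customers at start of month, customers lost)
records = [
    ("Jan", 120_000, 1_000, 50),
    ("Feb", 135_000, 1_020, 41),
    ("Mar", 128_000, 1_050, 63),
    ("Apr", 150_000, 1_060, 53),
]

revenues = [revenue for _, revenue, _, _ in records]
# Churn rate for a month = customers lost / customers at the start of the month
churn_rates = [lost / start for _, _, start, lost in records]

# Descriptive statistics: simple aggregations over the raw records
summary = {
    "total_revenue": sum(revenues),
    "mean_revenue": mean(revenues),
    "median_revenue": median(revenues),
    "mean_churn_rate": mean(churn_rates),
}

for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```

Each entry in `summary` collapses many rows into a single number that a manager can track over time.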

Module 2: Statistical Inference: Statistical inference, one of the fundamental pillars of data science, is the practice of using a sample to learn something (i.e., draw inferences) about the full population. Using hypothesis tests, we will determine whether differences between groups in the sample (e.g., the two sets of customers in an A/B test) are due to random fluctuations or to systematic differences in the underlying populations. We will also use regressions to determine whether relationships seen in the sample hold more generally in the population.
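One way to make the "random fluctuation versus systematic difference" question concrete is a permutation test, sketched below. This is only one of several possible hypothesis tests and is not necessarily the one taught in the course, which uses R rather than Python. The function name `permutation_test` and the order values are hypothetical.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of B minus mean of A) and the
    share of random relabelings whose difference is at least as large
    in absolute value (an approximate p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_b) / len(group_b) - sum(group_a) / n_a
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them into two new groups.
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / (len(pooled) - n_a) - sum(pooled[:n_a]) / n_a
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical order values (in dollars) for two sets of customers in an A/B test
orders_a = [20, 22, 19, 24, 21, 23, 20, 22]
orders_b = [25, 27, 24, 28, 26, 29, 25, 27]

observed_diff, p_value = permutation_test(orders_a, orders_b)
print(f"observed difference: {observed_diff:.2f}, approximate p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely arise from random relabeling alone, which is evidence of a systematic difference between the groups.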

Module 3: Prediction: Prediction is a process that uses historical data to forecast future events. In this module, we will focus on three fundamental prediction methods: regressions, decision trees, and random forests. Mastering these methods will allow students to develop predictions that apply across various business problems and industries.

Module 4: Causal Inference: Causal inference studies how actions, interventions, or treatments (e.g., launching a new predictive algorithm) affect business metrics (e.g., engagement, click-through rate, or daily units sold). In this module, students will learn about the design of randomized experiments (e.g., A/B tests), the primary method for establishing causal relationships.
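The logic of a randomized experiment can be sketched with a small simulation. Users are assigned to treatment or control at random, so the difference in average outcomes estimates the effect of the treatment. Everything here is an assumption made for illustration: the function name `run_ab_test`, the simulated engagement metric, and the "true" lift of 0.5 are hypothetical, and the course itself works in R rather than Python.

```python
import random

def run_ab_test(n_users=10_000, true_lift=0.5, seed=42):
    """Simulate a randomized A/B test and estimate the treatment effect."""
    rng = random.Random(seed)

    # Random assignment: shuffle user ids and treat the first half.
    user_ids = list(range(n_users))
    rng.shuffle(user_ids)
    treated = set(user_ids[: n_users // 2])

    control_outcomes, treatment_outcomes = [], []
    for user in range(n_users):
        # Simulated baseline metric (e.g., daily engagement minutes)
        baseline = rng.gauss(10.0, 2.0)
        if user in treated:
            treatment_outcomes.append(baseline + true_lift)
        else:
            control_outcomes.append(baseline)

    # Difference in means between treatment and control
    estimate = (sum(treatment_outcomes) / len(treatment_outcomes)
                - sum(control_outcomes) / len(control_outcomes))
    return len(treatment_outcomes), len(control_outcomes), estimate

n_treatment, n_control, estimated_lift = run_ab_test()
print(f"treatment: {n_treatment}, control: {n_control}, estimated lift: {estimated_lift:.3f}")
```

Because assignment is random, the two groups are comparable on average, so the estimate should land close to the simulated lift without any adjustment for user characteristics.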

Module 5: Data Science Strategy: We will conclude the course with a discussion of how firms can develop their data science capabilities, including how different types of organization and culture might support or hinder a data-driven company.

Course Pedagogy

The classroom sessions will use a problem-solving framework that gives students a clear outline for approaching any data science business problem. The sessions will emphasize communication, rigor in analysis, ethical reporting of results, and handling messy issues such as missing data, participant nonresponse, and study design. Reproducibility of results will be emphasized throughout the course.