The gourmet club at Columbia Business School runs an event called "Michelin Week" every semester. The club secures a few coveted reservations at Michelin-starred restaurants throughout New York City, and subsidizes seats for a few lucky club members.
Given the scarcity of seats and the enormous number of applicants, the club struggled to find a way to allocate seats fairly. They asked me to help design a lottery system that would automatically make these assignments given student preferences.
The resulting system automatically orchestrates the collection of preferences through personalized Google Forms, confirms preferences by email, and then makes assignments and notifies students. To ensure fairness and transparency, the entire process happens in the open. This document summarizes the system.
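As a rough illustration of how a preference-based lottery of this kind can work, here is a minimal sketch of one common mechanism, random serial dictatorship: students are shuffled into a random order, and each in turn receives their highest-ranked restaurant that still has a seat. The mechanism, the function name `run_lottery`, and the sample data are assumptions for illustration only; the text above does not say which mechanism the actual system uses. Fixing the random seed makes the draw reproducible, which is one way a lottery can be audited in the open.

```python
import random


def run_lottery(preferences, capacity, seed):
    """Assign seats by random serial dictatorship (an assumed mechanism).

    preferences: dict mapping student -> list of restaurants, best first
    capacity:    dict mapping restaurant -> number of seats available
    seed:        integer seed, so that anyone can re-run and verify the draw
    """
    rng = random.Random(seed)
    order = sorted(preferences)   # deterministic starting order
    rng.shuffle(order)            # random priority order
    remaining = dict(capacity)
    assignment = {}
    for student in order:
        # Give the student their best choice that still has a free seat.
        for restaurant in preferences[student]:
            if remaining.get(restaurant, 0) > 0:
                assignment[student] = restaurant
                remaining[restaurant] -= 1
                break
    return assignment


# Hypothetical example: three students, two single-seat restaurants.
prefs = {
    "alice": ["Restaurant A", "Restaurant B"],
    "bob": ["Restaurant A", "Restaurant B"],
    "carol": ["Restaurant B", "Restaurant A"],
}
seats = {"Restaurant A": 1, "Restaurant B": 1}
result = run_lottery(prefs, seats, seed=2024)
```

With only two seats for three students, one student is left unassigned; which one depends on the seed.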
This case looks at a model describing the emergence of network effects in ridesharing platforms. I used it to teach The Analytics Advantage for a few years.
It is one of the great ironies of today’s retail environment that as almost every facet of the industry becomes more technologically advanced, some of its oldest and most low-tech problems are emerging as key differentiators. Supply chain management is a venerable field with a rich history. During most of my PhD, and since, I have been interested in expanding the field to handle more modern fulfillment problems. I was fortunate, during my PhD, to spend some time at Amazon, where I saw the need for this research first-hand.
Here is a selection of projects that I have worked on in this area, with my PhD advisors Awi Federgruen and Garud Iyengar, and (for the later projects) with Xujia Liu, a PhD student in the IEOR division of the engineering school at Columbia.
This case, used in CBS's "Python for MBAs" class, teaches students OAuth and APIs - and in particular, guides them through the process of accessing and managing emails through Gmail. It also briefly covers the use of Python to create Excel and PowerPoint documents.
Bail funds, organizations that collect money to post bail for community members who are eligible for cash bail so that they can stay out of jail until their trial, have grown throughout US history, buoyed by major events such as the Red Scares, the civil rights movements, and the Vietnam War protests. Given the number of community members who would benefit from bail fund assistance and the volume of defendant information required to process bail payments, funds such as the Springfield Bail Fund face a logistical and management challenge. In this case, students will learn about the US justice and bail systems and use Python to solve a bail fund’s caseload management difficulties.
This case, used in CBS's core "Operations Management" class, teaches students about process improvement, root cause analysis, and statistical process control.
The Estée Lauder Companies (ELC) is a leading manufacturer and marketer of prestige skincare, makeup, fragrance, and haircare products with annual sales of over $14 billion in 2020. ELC had a long-standing commitment to "High Touch" customer service, inviting and tracking all feedback from consumers. ELC had recently moved from a manual to an automated tracking platform, positioning the company to handle customer feedback more efficiently. This platform and the data it aggregated were put to the test in August 2020 when the Consumer Care Analytics & Insights Group noticed an uptick in contact volume. The team also noticed that the increase was driven almost exclusively by complaints. The spike was especially concerning given that ELC had a 100% return policy with no questions asked, so customer complaints often led to returns and lost revenue. In this case, students are asked to consider potential root causes of the increased online customer complaints and, based on their hypotheses, what type of analysis and potential interventions they would recommend.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.
After using Wealthfront for a few years, I decided I wanted more control over my portfolio. I designed a script that uses the eTrade API to automatically invest in a balanced portfolio of ETFs. Whenever any ETF loses value, the script automatically sells it and buys an alternate ETF to harvest tax losses. It also solves a simple quadratic optimization problem to figure out how to use the cash in the portfolio to buy ETFs in a way that best matches the target allocation.
In addition to the script itself, this code might be useful in producing a simple interface to the eTrade API.
The code is available on GitHub here. In addition, this page contains a sample run of the algorithm.
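To make the cash-allocation step more concrete, here is a minimal sketch of one way such a quadratic problem could be posed and solved. It is not taken from the script: the formulation, the function name `allocate_cash`, and the fund names are assumptions for illustration. The sketch works in dollars, ignores whole-share constraints and fees, assumes the target weights sum to 1, and only buys (it never sells). It minimizes the squared gap between each fund's post-purchase value and its target value, which reduces to a Euclidean projection onto a scaled simplex and can be solved exactly by sorting.

```python
def allocate_cash(holdings, targets, cash):
    """Split `cash` across funds to get as close as possible to target weights.

    Solves (assumed formulation, not the original script's):
        minimize   sum_i (h_i + x_i - t_i * V)^2
        subject to x_i >= 0,  sum_i x_i = cash
    where h_i is the current dollar holding, t_i the target weight,
    and V = sum(h) + cash is the portfolio value after investing.
    """
    if cash <= 0:
        return {name: 0.0 for name in holdings}

    total = sum(holdings.values()) + cash
    # Dollars each fund is short of its target (negative if overweight).
    deficits = {name: targets[name] * total - holdings[name] for name in holdings}

    # The optimum has the form x_i = max(deficit_i - tau, 0); find tau by
    # scanning deficits from largest to smallest (simplex projection).
    tau = 0.0
    running = 0.0
    for j, d in enumerate(sorted(deficits.values(), reverse=True), start=1):
        running += d
        candidate = (running - cash) / j
        if d - candidate > 0:
            tau = candidate

    return {name: max(d - tau, 0.0) for name, d in deficits.items()}


# Hypothetical portfolio: ETF_A is overweight, ETF_B and ETF_C are underweight.
holdings = {"ETF_A": 600.0, "ETF_B": 300.0, "ETF_C": 100.0}
targets = {"ETF_A": 0.5, "ETF_B": 0.3, "ETF_C": 0.2}
purchases = allocate_cash(holdings, targets, cash=100.0)
```

In this example the overweight fund receives nothing, and the cash is split between the two underweight funds, with most of it going to the one furthest below target. A general-purpose quadratic programming solver would handle richer constraints, such as whole shares or minimum trade sizes.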
I have found that one of the hardest parts of data science to teach is the first step - getting from the "real life" scenario to a predictive analytics problem that can then be addressed using data science techniques. This case helps me teach this concept by introducing a complex problem, and then guiding students through a discussion that helps us define a predictive analytics problem to solve it - we discuss target variables, units of analysis, evaluation metrics, and more.
This case describes the process and the salient factors traders must consider in pricing and selling products in the secondary market for previously issued bonds. Since there is typically no readily available, agreed-upon price for a bond, a bank will quote its own price of a specific bond to buyers or sellers, with the set price dependent on many factors, including the face value of the bond, the remaining time to maturity, the prevailing interest rate environment, the health of the company in question, and the supply/demand imbalance for the bond. Buyers typically send out inquiries (requests for quotes, or RFQs) to many banks in search of the best price—and sellers typically spend a great deal of time with RFQ-management to track these RFQs in order to determine which ones are worth pricing. This case outlines the myriad of challenges facing both sell-side and buy-side traders and asks students to suggest how predictive analytics might streamline the bond trading process—and what data would be required to implement their suggestions.
Every year, around 650 MBA students join Columbia Business School. They must be divided into 9 roughly equal clusters, each comprising several learning teams of four to six people. To ensure students have the best experience possible, it is important for these clusters to be balanced on many dimensions. For example, it wouldn't be very helpful, from a networking perspective, if every student who did their undergrad at Columbia were in one cluster, or every student who worked in banking before business school were in one cluster.
The resulting problem is very difficult to solve manually because the number of categories that must be balanced is quite large. To solve it, I designed an algorithm that automatically clusters students perfectly - every time. It has been used to create clusters at Columbia Business School every year since 2020.
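To give a feel for the balancing problem, here is a deliberately simple heuristic, not the algorithm described above (whose details are not given here). It sorts students by their attributes and deals them into clusters round-robin, like dealing cards. This keeps cluster sizes within one of each other and spreads the first attribute evenly, but it gives no guarantee for the remaining attributes, which is exactly why balancing many categories at once is hard. The function name `deal_into_clusters` and the sample data are hypothetical.

```python
def deal_into_clusters(students, n_clusters):
    """Round-robin heuristic for roughly balanced clusters (illustrative only).

    students:   dict mapping student name -> tuple of attributes,
                e.g. (prior industry, undergraduate school)
    n_clusters: number of clusters to create
    """
    # Sorting groups students with the same attributes next to each other,
    # so dealing them out in turn spreads each group across clusters.
    ordered = sorted(students, key=lambda name: (students[name], name))
    clusters = [[] for _ in range(n_clusters)]
    for i, name in enumerate(ordered):
        clusters[i % n_clusters].append(name)
    return clusters


# Hypothetical data: 12 students, half from banking and half from consulting.
students = {
    f"s{i:02d}": ("banking" if i < 6 else "consulting", "Columbia" if i % 2 else "other")
    for i in range(12)
}
clusters = deal_into_clusters(students, n_clusters=3)
```

Here each of the three clusters ends up with two former bankers and two former consultants. Balancing the second attribute as well, along with every other category at the same time, calls for an optimization approach rather than a one-pass heuristic like this.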
I use this case in my AI sequence to introduce (a) the design choices that go into designing an algorithm to run an autonomous vehicle and (b) the Classification and Regression Tree algorithm.
This case is used in Columbia's Core "Operations Management" class to teach modern fulfillment strategies using linear programming and shadow prices/opportunity costs.
Founded in 2018 with a mission to revolutionize the way America exercises, FlexiWeight had designed an innovative set of compact, Bluetooth-enabled adjustable weights suitable for at-home use. The concept was an immediate hit and, perhaps fueled by a rise in at-home exercising during the COVID-19 pandemic, the product quickly grew from a niche product to a must-have accessory for workout buffs across the United States. Demand for the product skyrocketed, and the founders realized the ad-hoc shipping and fulfillment systems they were using would not be able to keep up. This case presents students with a historical perspective on companies’ fulfillment challenges and the recent trend to move from a single-channel fulfillment system to a more complex microfulfillment strategy, and asks them to consider the best path forward for FlexiWeight.
Rohit Chauhan, Mastercard’s executive vice president of Artificial Intelligence since 2018, was tasked with leading the organization on a path to become an AI powerhouse. Chauhan believed that in order to accomplish this goal, AI would need to be deployed in every part of the company. Mastercard had identified four areas where AI could generate the greatest value: customer acquisition, portfolio optimization, risk and fraud management, and customer servicing. But it was unclear what organizational structure would help Mastercard achieve its goal. One option was to centralize its AI efforts by creating a new division staffed with AI experts; an alternative was to seed existing divisions with AI staff. In this case, students will consider which organizational structure would serve to maximize AI’s potential: a centralized approach, in which AI experts were assigned to collaborate with colleagues from other divisions on a project-by-project basis, or a decentralized approach, in which expertise and authority were dispersed across divisions.
This case is used in Columbia's Core "Operations Management" class to teach process analysis.
Aroma Consumer Products Co. has served as Chesapeake Bay Candle Co.’s Hangzhou, China-based manufacturing partner since the company’s inception and has helped Chesapeake sell candles to its initial US clients in the high-end tier of candles. With a growing reputation within the niche space, Chesapeake has recently captured a major customer in the mid-tier candle category and Aroma Consumer Products is suddenly facing a much greater production mandate. Working harder would not be enough: the company would also have to work smarter, and this would mean making significant changes. In this case, students will learn about the manufacturing and distribution process in the candle industry, discuss operational bottlenecks, and determine operational capacity as well as ways to increase it to meet near-term demand spikes.
This case, used in CBS's "Python for MBAs" class, teaches students pandas, and data analysis in Python, using the real story of the Dig restaurant chain.
Dig, a restaurant chain with a menu focused on seasonal produce, intentionally sourced, and offered at an accessible price point, was poised to expand. While the organization had collected and used data from its inception, management realized expansion would require a more systematic approach to driving value using data-driven insights at every stage of this value chain. This case chronicles Dig's journey from intuition to data-driven analytics, from the creation of a solid data foundation to the development of dynamic decision-support tools touching many parts of the company.
I developed this tool to teach my advanced AI sequence to MBAs using Excel. See here for more details.
The CBS DRO team developed this tool as part of our core Business Analytics class at CBS. Unlike xlkitlearn, it is written in VBA and does not rely on Python. It is therefore simpler, and the scale of data it can handle is smaller, but this works just fine for our core class. It augments Excel with functions that can carry out logistic regression, k-nearest-neighbors predictions, and Monte Carlo simulation.
The add-in can be installed here, and the source code for the add-in is available here.
The MNIST database is one of the older sources of training data for machine learning algorithms; it comprises handwritten digits, together with human-generated labels.
This case, which I use in my advanced AI sequence for MBAs, provides a way to work with this dataset using only Excel, to make it accessible to a broad audience. I teach it in the context of a case about the USPS which - even in these days of emails and electronic communications - handles almost 150 billion pieces of mail a year. Doing this quickly and efficiently is a truly mythical task. Using only Excel and XLKitLearn, we develop an algorithm which achieves 92% accuracy on the MNIST dataset. This is a far cry from the performance that can be achieved with modern, deep-learning techniques, but it is impressive for an Excel-based system, and acts as an introduction to the world of unstructured data.
I'd be happy to share the materials for this case with anyone interested - feel free to reach out.
I use this case in my advanced AI sequence for MBAs to teach text analytics.
AI-driven text mining, a relatively new business analytics tool, allows users to unlock troves of information contained in documents and make them searchable by content and metadata. In this two-part case, I first introduce Evisort, a start-up seeking to create AI-enhanced software providing contract management and processing solutions for attorneys and business professionals, and discuss the challenges and opportunities inherent in such a startup. I then provide an introduction to the science of text analytics.
Review copy (stamped 'do not copy') of part 1 and part 2. Published at Columbia Caseworks (part 1 and part 2), and available there with full teaching notes, data files, and solutions.
I use this case in my advanced AI sequence for MBAs to teach random forests and ensemble models to students. The case is also useful in helping students define an analytic problem - there are several features provided in the data that seem useful at first glance, but that actually should not be used - these lead to interesting discussions in class.
We develop a number of data-driven investment strategies that demonstrate how machine learning and data analytics can be used to guide investments in peer-to-peer loans. We detail the process starting with the acquisition of (real) data from a peer-to-peer lending platform all the way to the development and evaluation of investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses, at the undergraduate or graduate levels. Importantly, we go beyond just evaluating predictive performance of models, to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but is also modular—which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include the following: data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
Cargo’s mission is to help "rideshare drivers earn more money by providing complimentary and premium products to passengers." Cargo sources goods from suppliers to provide a platform for gig economy drivers to run small convenience stores out of their vehicles. Drivers earn additional income, and riders enjoy convenient and affordable access to products during their rides. As the company grew, Cargo faced a number of supply-chain-related challenges including determining the product mix in the Cargo box, replenishment of the product, and the cost of carrying inventory. In particular, would the replenishment decision be driven by the company or the driver and who would bear the responsibility for the inventory cost? The founders also considered how to most efficiently manage its suppliers: Would a centralized or decentralized model best serve Cargo and its drivers? And, how might supply chain contracts with its suppliers help support the company’s profitable growth?
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there with full teaching notes, data files, and solutions.
I sometimes teach students data visualization in Tableau as a final lecture in some of my classes. When I do, I use this case, which leverages data made available by NYC's Citi Bike system to create an impressive-looking interactive dashboard summarizing all trips in the system in a given month.
The final dashboard looks quite impressive (see a demo video here), and is surprisingly easy to build, so it tends to work very well.
The case comprises the following artefacts