Projects & Research

Older projects are posted on the archive page.

BinIt: Revolutionizing the Recycling Industry with Computer Vision, AI, and Deep Learning

I use this case in my advanced AI sequence for MBAs to introduce neural networks in the context of a real business case. This is accompanied by:

  • A set of slides describing the theory of neural networks at a high level, including gradient descent, architecture, and convolutional neural networks.
  • An Excel spreadsheet with a "mini neural network" set in the context of the BinIt case. It includes functionality that lets students train this mini neural network using gradient descent, and it demonstrates the feature-learning capabilities of neural networks.
BinIt is a company that is taking a sustainable approach to waste processing and management. Started in 2021 by Raghav Mecheri and James Bollas, the company uses artificial intelligence to create visibility in the recycling supply chain. This case gives students an overview of the recycling and waste management industry, the global and geopolitical forces that drive its evolution and historical and new technologies that are changing the landscape. In addition, students will gain an understanding of how artificial intelligence and computer modeling can be used to turn a tedious, human-intensive task like sorting recyclables from trash into something that can be automated and less prone to error.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing. Reach out if you'd like the broader set of slides.

An interactive job market simulator game

In the 2023-2024 academic year, Columbia Business School began teaching market design as part of its core MBA "Operations Management" class. I designed an interactive game to help illustrate the concepts in the class. The game allows students to play the role of candidates seeking jobs and of companies making offers. The interactive app allows companies to make offers live - they immediately appear on students' screens.

The game compares the process of clearing the market with and without the deferred acceptance algorithm, and illustrates the benefits of the latter in practice.
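
For readers unfamiliar with deferred acceptance, here is a minimal sketch of the one-to-one version in which companies make the offers. It is an illustration only, not the code behind the game; the function name, the company and candidate names, and the preference lists are all made up.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """One-to-one deferred acceptance (Gale-Shapley) with proposers making offers.

    Both arguments map a participant to a list of acceptable partners,
    most preferred first. Returns a dict: matched receiver -> proposer.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list index each proposer will try
    held = {}                                     # receiver -> offer tentatively held
    free = list(proposer_prefs)                   # proposers with no held offer

    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue                              # list exhausted: p stays unmatched
        r = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank.get(r, {}):
            free.append(p)                        # r finds p unacceptable; p tries again
        elif r not in held:
            held[r] = p                           # r tentatively holds its first offer
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])                  # r trades up; the old proposer is free again
            held[r] = p
        else:
            free.append(p)                        # r keeps its current offer; p tries again
    return held


# Hypothetical example: two companies, two candidates.
companies = {"Acme": ["Ana", "Ben"], "Bolt": ["Ana", "Ben"]}
candidates = {"Ana": ["Bolt", "Acme"], "Ben": ["Acme", "Bolt"]}
match = deferred_acceptance(companies, candidates)
```

Each candidate holds on to the best offer received so far and only commits once no company has offers left to make, which is what allows a better offer arriving late to displace an earlier one.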

This video shows some screenshots from the game, and explains the gameplay.

If you would like to use the game for your classes, I'd be happy to try and make that happen - reach out.

Chasing Michelin: a lottery for sparse restaurant spots

The gourmet club at Columbia Business School runs an event called "Michelin Week" every semester. The club secures a few coveted reservations at Michelin starred restaurants throughout New York City, and subsidizes seats for a few lucky club members.

Given the scarcity of seats and the enormous number of applicants, the club struggled to find a fair way to allocate seats. They reached out to me for help designing a lottery system that would automatically make these assignments given student preferences.

The resulting system automatically orchestrates the collection of preferences through personalized Google forms, confirms preferences by email, and then makes assignments and notifies students. To ensure fairness and transparency, the entire process happens in the open. This document summarizes the system.
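
The assignment rule itself is not spelled out on this page, so the sketch below should be read as one plausible approach rather than the club's actual mechanism. It assumes a random serial dictatorship: students are drawn in a random order, and each receives their highest-ranked restaurant that still has a seat. The function name, the student and restaurant names, and the seed are hypothetical.

```python
import random


def run_lottery(preferences, capacity, seed):
    """Random serial dictatorship over restaurant seats.

    preferences: student -> list of restaurants, most preferred first
    capacity:    restaurant -> number of seats
    seed:        publishing the seed lets anyone re-run and audit the draw
    """
    rng = random.Random(seed)
    order = sorted(preferences)   # sort first so the draw ignores input order
    rng.shuffle(order)
    seats_left = dict(capacity)
    assignment = {}
    for student in order:
        for restaurant in preferences[student]:
            if seats_left.get(restaurant, 0) > 0:
                seats_left[restaurant] -= 1
                assignment[student] = restaurant
                break             # students with no seat left are simply unassigned
    return order, assignment


# Hypothetical example: four students, three seats.
prefs = {
    "Ana": ["Le Foo", "Chez Bar"],
    "Ben": ["Le Foo", "Chez Bar"],
    "Cy": ["Chez Bar", "Le Foo"],
    "Dee": ["Le Foo", "Chez Bar"],
}
seats = {"Le Foo": 1, "Chez Bar": 2}
order, assignment = run_lottery(prefs, seats, seed=2024)
```

Fixing the seed in advance is one simple way to make the draw reproducible, in the spirit of running the process in the open.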

Enabling X-to-the-xth: the Operational Secret Behind Uber’s Explosive Growth

with Zhen Lian, Garrett van Ryzin

This case looks at a model describing the emergence of network effects in ridesharing platforms. I used it to teach The Analytics Advantage for a few years.

Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.

Dynamic Programming for Complex Multi-Item and Multi-Location inventory problems

with Awi Federgruen, Garud Iyengar, and Xujia Liu

It is one of the great ironies of today’s retail environment that as almost every facet of the industry becomes more technologically advanced, some of its oldest and most low-tech problems are emerging as key differentiators. Supply chain management is a venerable field with a rich history. During most of my PhD, and since, I have been interested in expanding the field to handle more modern fulfillment problems. I was fortunate, during my PhD, to spend some time at Amazon, where I saw the need for this research first-hand.

Here is a selection of projects that I have worked on in this area, with my PhD advisors Awi Federgruen and Garud Iyengar, and (for the later projects) with Xujia Liu, a PhD student in the IEOR division of the engineering school at Columbia:

  • "Scalable Approximately Optimal Policies for Multi-Item Stochastic Inventory Problems", in which we propose a different, more complex heuristic for managing inventories at a specific facility that handles multiple correlated products. The complexity of the algorithm requires a more involved proof of asymptotic optimality as the number of items grows. We also propose an alternative heuristic, suitable for a smaller number of items, and use it to derive important managerial insights about these kinds of systems. Available here.
  • "Multi-Item Inventory Systems with Joint Expected Value and Chance Constraints: Asymptotically Optimal Heuristics", in which we propose a simple algorithm for managing inventories at a specific facility that handles multiple correlated products. We are able to show that this policy is asymptotically optimal. Available here.
  • My thesis, "Applied Inventory Management: New Approaches to Age-Old Problems", which discusses a series of problems related to the management of products across a network of facilities, each of which has constrained capacity. The first chapter of the thesis was published in Naval Research Logistics (NRL) as "Two Echelon Distribution Systems with Random Demands and Storage Constraints", and is available here.
  • A brief report detailing my exploration of whether these methods could be applied at a luxury goods retailer from whom I was able to obtain supply chain data, available here.

Automating Bureaucracy with Python: The Case of the Springfield Bail Fund

This case, used in CBS's "Python for MBAs" class, teaches students OAuth and APIs - and in particular, guides them through the process of accessing and managing emails through Gmail. It also briefly covers the use of Python to create Excel and PowerPoint documents.

Bail funds are organizations that collect money to post bail for community members who are eligible for cash bail, so that they can stay out of jail until their trial. These funds have grown throughout US history, buoyed by major events such as the Red Scares, the Civil Rights movements, and the Vietnam War protests, among others. Given the volume of community members who could benefit from bail fund assistance and the amount of defendant information required to process bail payments, funds like the Springfield Bail Fund face a logistical and management challenge. In this case, students will learn about the US justice and bail systems and use Python to solve a bail fund’s caseload management difficulties.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.

Continuous Quality Monitoring via Data and Analytics at The Estée Lauder Companies

with Carri Chan

This case, used in CBS's core "Operations Management" class, teaches students about process improvement, root cause analysis, and statistical process control.

The Estée Lauder Companies (ELC) is a leading manufacturer and marketer of prestige skincare, makeup, fragrance, and haircare products with annual sales of over $14 billion in 2020. ELC had a long-standing commitment to "High Touch" customer service, inviting and tracking all feedback from consumers. ELC had recently moved from a manual to an automated tracking platform, positioning the company to more efficiently handle customer feedback. This platform and the data it aggregated was put to the test in August 2020 when the Consumer Care Analytics & Insights Group noticed an uptick in contact volume. The team also noticed that the increase was driven almost exclusively by complaints. The spike was especially cause for concern given that ELC had a 100% return policy with no questions asked, so customer complaints often led to returns and lost revenue. In this case, students are asked to consider potential root causes of the increased online customer complaints—and based on their hypotheses, what type of analysis and potential interventions they would recommend.
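
As a small illustration of the statistical process control idea behind this case, the sketch below builds a basic control chart: it estimates a center line and three-sigma limits from a baseline period and flags later observations that fall outside them. The weekly contact counts are invented for the example and are not ELC data; the function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower limit, center line, upper limit) from baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)   # sample standard deviation of the baseline
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(baseline, new_points, sigmas=3.0):
    """Return (index, value) pairs of new points outside the control limits."""
    lower, _, upper = control_limits(baseline, sigmas)
    return [(i, x) for i, x in enumerate(new_points) if x < lower or x > upper]


# Invented weekly contact volumes: a stable baseline, then an uptick.
baseline_weeks = [100, 104, 98, 101, 97, 103, 99, 102]
recent_weeks = [101, 105, 118, 131]
flagged = out_of_control(baseline_weeks, recent_weeks)
```

A flagged point only says that the process has probably shifted; finding out why is the root cause analysis that the case asks students to carry out.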

Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.

An automatic ETF rebalancer with tax loss harvesting using the e-trade API

After using Wealthfront for a few years, I decided I wanted more control over my portfolio. I designed a script that uses the eTrade API to automatically invest in a balanced portfolio of ETFs. Whenever any ETF loses value, the script automatically sells it and buys an alternate ETF to harvest tax losses. It also uses a simple quadratic optimization problem to figure out the best way to use the cash in the portfolio to buy ETFs in a way that best matches the target allocation.

In addition to the script itself, this code might be useful in producing a simple interface to the eTrade API.

The code is available on github here. In addition, this page contains a sample run of the algorithm.
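
To give a flavor of the cash-allocation step, here is a stripped-down sketch that is separate from the code in the repository. It assumes the simplest version of the problem: spend all available cash, buy only (no sales), ignore whole-share constraints, and minimize the sum of squared dollar gaps between holdings and targets. Under those assumptions the quadratic program reduces to a projection that can be solved by sorting, with no solver needed. The function name, tickers, and numbers are hypothetical.

```python
def allocate_cash(holdings, targets, cash):
    """Split `cash` across ETFs (in dollars) to get as close as possible
    to the target weights.

    Solves: minimize sum_i (holdings_i + x_i - targets_i * total)^2
            subject to sum_i x_i = cash and x_i >= 0.
    """
    if cash <= 0:
        return {k: 0.0 for k in holdings}
    total = sum(holdings.values()) + cash
    # Ideal purchase per ETF if selling were allowed (negative when overweight).
    gap = {k: targets[k] * total - holdings[k] for k in holdings}
    # The optimum has the form x_i = max(0, gap_i - shift); find the shift
    # that makes the purchases add up to exactly `cash`.
    ordered = sorted(gap.values(), reverse=True)
    running, shift = 0.0, 0.0
    for n, g in enumerate(ordered, start=1):
        running += g
        candidate = (running - cash) / n
        if g - candidate > 0:
            shift = candidate
    return {k: max(0.0, g - shift) for k, g in gap.items()}


# Hypothetical portfolio: overweight in STOCKS, with $200 of cash to invest.
holdings = {"STOCKS": 700.0, "BONDS": 200.0, "INTL": 100.0}
targets = {"STOCKS": 0.5, "BONDS": 0.3, "INTL": 0.2}
purchases = allocate_cash(holdings, targets, cash=200.0)
```

In this example nothing is spent on the overweight fund, and the cash is split so that the two underweight funds end up equally far (in dollars) from their targets.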

Analytics in Fixed Income Trading

I have found that one of the hardest parts of data science to teach is the first step - getting from the "real life" scenario to a predictive analytics problem that can then be addressed using data science techniques. This case helps me teach this concept by introducing a complex problem, and then guiding students through a discussion that helps us define a predictive analytics problem to solve it - we discuss target variables, units of analysis, evaluation metrics, and more.

This case describes the process and the salient factors traders must consider in pricing and selling products in the secondary market for previously issued bonds. Since there is typically no readily available, agreed-upon price for a bond, a bank will quote its own price of a specific bond to buyers or sellers, with the set price dependent on many factors, including the face value of the bond, the remaining time to maturity, the prevailing interest rate environment, the health of the company in question, and the supply/demand imbalance for the bond. Buyers typically send out inquiries (requests for quotes, or RFQs) to many banks in search of the best price—and sellers typically spend a great deal of time with RFQ-management to track these RFQs in order to determine which ones are worth pricing. This case outlines the myriad of challenges facing both sell-side and buy-side traders and asks students to suggest how predictive analytics might streamline the bond trading process—and what data would be required to implement their suggestions.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing. I have a comprehensive set of slides to help facilitate the discussion, which I'd be happy to share upon request.

A system for reverse-clustering MBA students at Columbia Business School

Every year, around 650 MBA students join Columbia Business School. They must be divided into 9 roughly equal clusters, each comprising several learning teams of four to six people. To ensure students have the best experience possible, it is important for these clusters to be balanced on many dimensions. For example, it wouldn't be very helpful, from a networking perspective, if every student who did their undergrad at Columbia were in one cluster, or every student who worked in banking before business school were in one cluster.

The resulting problem is very difficult to solve manually because the number of categories that must be balanced is quite large. To solve this problem, I designed an algorithm that automatically clusters students perfectly - every time. It has been used to create clusters at Columbia Business School every year since 2020.
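
The algorithm itself is not described on this page, so the following is only a sketch of one simple way to approach this kind of balancing problem: a greedy heuristic that places each student in the open cluster where they overlap least with the people already there. It is not the method used at the school, and unlike that system a greedy pass offers no guarantee of a perfectly balanced result. The function name, student names, and labels are hypothetical.

```python
from collections import Counter


def assign_clusters(students, n_clusters):
    """Greedy balancing heuristic.

    students: name -> set of category labels (e.g. {"banking", "Columbia"})
    Returns a list of clusters, each a list of names.
    """
    clusters = [[] for _ in range(n_clusters)]
    counts = [Counter() for _ in range(n_clusters)]   # label counts per cluster
    cap = -(-len(students) // n_clusters)             # ceiling division: max cluster size
    # Place students with the most labels first; they are the hardest to balance.
    for name in sorted(students, key=lambda s: (-len(students[s]), s)):
        labels = students[name]
        open_ids = [c for c in range(n_clusters) if len(clusters[c]) < cap]
        # Prefer the cluster with the least label overlap, then the smallest one.
        best = min(open_ids,
                   key=lambda c: (sum(counts[c][a] for a in labels), len(clusters[c]), c))
        clusters[best].append(name)
        counts[best].update(labels)
    return clusters


# Hypothetical example: six students, two clusters.
students = {
    "Ana": {"banking", "Columbia"}, "Ben": {"banking"}, "Cy": {"banking"},
    "Dee": {"banking"}, "Eve": {"consulting"}, "Fay": {"consulting"},
}
clusters = assign_clusters(students, n_clusters=2)
```

With hundreds of students and many categories, a heuristic like this would typically need a refinement step (for example, swapping pairs of students between clusters) or an exact optimization formulation.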

Autonomous Vehicles: The Technology Behind the Magic

I use this case in my AI sequence to introduce (a) the design choices that go into designing an algorithm to run an autonomous vehicle and (b) the Classification and Regression Tree algorithm.

Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.

At the Frontier of Retail in the 21st Century: Modern Fulfillment and The Case of FlexiWeight

with Omar Besbes and Carri Chan

This case is used in Columbia's Core "Operations Management" class to teach modern fulfillment strategies using linear programming and shadow prices/opportunity costs.

Founded in 2018 with a mission to revolutionize the way America exercises, FlexiWeight had designed an innovative set of compact, Bluetooth-enabled adjustable weights suitable for at-home use. The concept was an immediate hit and, perhaps fueled by a rise in at-home exercising during the COVID-19 pandemic, the product quickly grew from a niche product to a must-have accessory for workout buffs across the United States. Demand for the product skyrocketed, and the founders realized the ad-hoc shipping and fulfillment systems they were using would not be able to keep up. This case presents students with a historical perspective on companies’ fulfillment challenges and the recent trend to move from a single-channel fulfillment system to a more complex microfulfillment strategy, and asks them to consider the best path forward for FlexiWeight.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.

Mastercard’s Organizational Structure: The Making of a New AI Powerhouse

with Dan Wang
Rohit Chauhan, Mastercard’s executive vice president of Artificial Intelligence since 2018, was tasked to lead the organization on a path to become an AI powerhouse. Chauhan believed that in order to accomplish this goal, AI would need to be deployed in every part of the company. Mastercard had identified four areas where AI could generate the greatest value: customer acquisition, portfolio optimization, risk and fraud management, and customer servicing. But it was unclear what organizational structure would help Mastercard to achieve its goal. One option was to centralize its AI efforts by creating a new division staffed with AI experts; an alternative was to seed existing divisions with AI staff. In this case, students will consider which organizational structure would serve to maximize AI’s potential: a centralized approach, in which AI experts were assigned to collaborate with colleagues from other divisions on a project-by-project basis, or a decentralized approach, in which expertise and authority were dispersed across divisions.

Aroma Consumer Products Co.: Managing Capacity for a Growing Company

with Carri Chan

This case is used in Columbia's Core "Operations Management" class to teach process analysis.

Aroma Consumer Products Co. has served as Chesapeake Bay Candle Co.’s Hangzhou, China-based manufacturing partner since the company’s inception and has helped Chesapeake sell candles to its initial US clients in the high-end tier of candles. With a growing reputation within the niche space, Chesapeake has recently captured a major customer in the mid-tier candle category and Aroma Consumer Products is suddenly facing a much greater production mandate. Working harder wouldn’t be enough: the company would also have to work smarter, and this would mean making significant changes. In this case, students will learn about the manufacturing and distribution process in the candle industry, discuss operational bottlenecks, and determine operational capacity as well as ways to increase it to meet near term demand spikes.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.
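
The core calculation in this kind of process analysis is short enough to sketch: for steps arranged in series, each step's capacity is its number of parallel stations divided by its time per unit, and the slowest step (the bottleneck) sets the capacity of the whole line. The step names and numbers below are invented for illustration and do not come from the case.

```python
def process_capacity(steps):
    """Find the bottleneck of a serial process.

    steps: step name -> (minutes per unit at one station, number of stations)
    Returns (bottleneck step, process capacity in units per hour).
    """
    # Hourly capacity of each step = stations * 60 / minutes per unit.
    rates = {name: 60.0 * stations / minutes
             for name, (minutes, stations) in steps.items()}
    bottleneck = min(rates, key=rates.get)
    return bottleneck, rates[bottleneck]


# Invented candle line: pour (120/hr), cool (80/hr), pack (60/hr).
candle_line = {
    "pour": (2.0, 4),
    "cool": (30.0, 40),
    "pack": (1.0, 1),
}
bottleneck, capacity = process_capacity(candle_line)
```

Adding a second packing station in this example would lift packing to 120 units per hour and move the bottleneck to cooling, which is why capacity has to be re-evaluated after every change.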

From Intuition to Data-Driven Analytics: The Case of Dig

This case, used in CBS's "Python for MBAs" class, teaches students pandas, and data analysis in Python, using the real story of the Dig restaurant chain.

Dig, a restaurant chain with a menu focused on seasonal produce, intentionally sourced, and offered at an accessible price point, was poised to expand. While the organization had collected and used data from its inception, management realized expansion would require a more systematic approach to driving value using data-driven insights at every stage of this value chain. This case chronicles Dig's journey from intuition to data-driven analytics, from the creation of a solid data foundation to the development of dynamic decision-support tools touching many parts of the company.
Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there for licensing.

XLKitLearn: an Excel front-end for scikit-learn

I developed this tool to teach my advanced AI sequence to MBAs using Excel. See here for more details.

The CBS Business Analytics Excel Add-in

with Santiago Balseiro, Utkarsh Patange, and the CBS DRO Business Analytics team

The CBS DRO team developed this tool as part of our core Business Analytics class at CBS. Unlike XLKitLearn, it is written in VBA and does not rely on Python. It is therefore simpler, and the scale of data it can handle is smaller, but this works just fine for our core class. It augments Excel with functions that can carry out logistic regression, k-nearest-neighbors predictions, and Monte Carlo simulation.

The add-in can be installed here, and the source code for the add-in is available here.

Image recognition at the USPS: working with MNIST in Excel

The MNIST database is one of the older sources of training data for machine learning algorithms; it comprises handwritten digits, together with human-generated labels.

This case, which I use in my advanced AI sequence for MBAs, provides a way to work with this dataset using only Excel, to make it accessible to a broad audience. I teach it in the context of a case about the USPS which - even in these days of emails and electronic communications - handles almost 150 billion pieces of mail a year. Doing this quickly and efficiently is a truly mythical task. Using only Excel and XLKitLearn, we develop an algorithm which achieves 92% accuracy on the MNIST dataset. This is a far cry from the performance that can be achieved with modern, deep-learning techniques, but it is impressive for an Excel-based system, and acts as an introduction to the world of unstructured data.

I'd be happy to share the materials for this case with anyone interested - feel free to reach out.

Evisort: An AI-Powered Start-up Uses Text Mining to Become Google for Contracts

I use this case in my advanced AI sequence for MBAs to teach text analytics.

AI-driven text mining, a relatively new business analytics tool, allows users to unlock troves of information contained in documents and make them searchable by content and metadata. In this two-part case, I first introduce Evisort, a start-up seeking to create AI-enhanced software providing contract management and processing solutions for attorneys and business professionals, and discuss the challenges and opportunities inherent in such a startup. I then provide an introduction to the science of text analytics.
Review copy (stamped 'do not copy') of part 1 and part 2. Published at Columbia Caseworks (part 1 and part 2), and available there with full teaching notes, data files, and solutions.

Data-Driven Investment Strategies for Peer-to-Peer Lending: A Case Study for Teaching Data Science

with Maxime Cohen, Kevin Jiao, and Foster Provost, finalist in the 2019 INFORMS case competition

I use this case in my advanced AI sequence for MBAs to teach random forests and ensemble models to students. The case is also useful in helping students define an analytic problem - there are several features provided in the data that seem useful at first glance, but that actually should not be used - these lead to interesting discussions in class.

We develop a number of data-driven investment strategies that demonstrate how machine learning and data analytics can be used to guide investments in peer-to-peer loans. We detail the process starting with the acquisition of (real) data from a peer-to-peer lending platform all the way to the development and evaluation of investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses, at the undergraduate or graduate levels. Importantly, we go beyond just evaluating predictive performance of models, to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but is also modular—which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include the following: data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
  • The case itself was published in Big Data, and is available free of charge here.
  • The following Jupyter notebooks cover the analysis discussed in the case:
  • The publication above contains both questions and solutions. Some instructors have asked us for a student handout containing only questions. Here it is, in pdf format and raw tex format for editing.
  • The data used in the case can be downloaded directly from LendingClub here. [Edit: LendingClub ended its peer-to-peer lending product in 2019; the data is unfortunately no longer available at this link.]

Supply Chain Coordination and Contracts in the Sharing Economy - a Case Study at Cargo

with Maxime Cohen and Wenqiang Xiao, winner of the 2018 INFORMS Case Competition
Cargo’s mission is to help "rideshare drivers earn more money by providing complimentary and premium products to passengers." Cargo sources goods from suppliers to provide a platform for gig economy drivers to run small convenience stores out of their vehicles. Drivers earn additional income, and riders enjoy convenient and affordable access to products during their rides. As the company grew, Cargo faced a number of supply-chain-related challenges including determining the product mix in the Cargo box, replenishment of the product, and the cost of carrying inventory. In particular, would the replenishment decision be driven by the company or the driver and who would bear the responsibility for the inventory cost? The founders also considered how to most efficiently manage its suppliers: Would a centralized or decentralized model best serve Cargo and its drivers? And, how might supply chain contracts with its suppliers help support the company’s profitable growth?

Review copy (stamped 'do not copy') available here. Published at Columbia Caseworks, and available there with full teaching notes, data files, and solutions.

The roboTA - an automated email management system for class emails

Emails are an essential means of communication between students and their teaching team (Professor and TAs). When I first started teaching, I became frustrated by how difficult it was to manage this channel of communication. Distributed responsibilities among TAs meant that emails went unanswered for long periods - and the longer a response takes to arrive, the less useful it is.
To solve this problem, I created the roboTA, a system that automatically tracks all emails sent by students to me and all the TAs, and orchestrates reminders when emails sit unanswered for too long. I use the system in all of my classes, and have made it available to my colleagues, some of whom use it too. I'm happy to make it available to you too - drop me an email.
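
The core check behind a system like this can be sketched in a few lines. The snippet below is not the roboTA's code; it assumes emails have already been fetched and grouped into threads, and it simply finds threads where a student has been waiting on a reply for longer than a chosen limit. The data layout, function name, and 24-hour limit are all assumptions.

```python
from datetime import datetime, timedelta


def overdue_threads(threads, now, max_wait=timedelta(hours=24)):
    """Return ids of threads awaiting a staff reply for more than max_wait,
    longest-waiting first.

    Each thread is a dict with an 'id' and a list of 'messages'; each
    message is a (timestamp, role) pair with role 'student' or 'staff'.
    """
    overdue = []
    for thread in threads:
        waiting_since = None
        for sent_at, role in sorted(thread["messages"]):
            if role == "staff":
                waiting_since = None        # a staff reply clears the wait
            elif waiting_since is None:
                waiting_since = sent_at     # first unanswered student message
        if waiting_since is not None and now - waiting_since > max_wait:
            overdue.append((waiting_since, thread["id"]))
    return [thread_id for _, thread_id in sorted(overdue)]


# Hypothetical inbox snapshot.
now = datetime(2024, 1, 10, 12, 0)
inbox = [
    {"id": "t1", "messages": [(datetime(2024, 1, 8, 9, 0), "student")]},
    {"id": "t2", "messages": [(datetime(2024, 1, 8, 9, 0), "student"),
                              (datetime(2024, 1, 8, 10, 0), "staff")]},
    {"id": "t3", "messages": [(datetime(2024, 1, 10, 9, 0), "student")]},
    {"id": "t4", "messages": [(datetime(2024, 1, 7, 9, 0), "student"),
                              (datetime(2024, 1, 10, 11, 0), "student")]},
]
needs_reminder = overdue_threads(inbox, now)
```

Measuring the wait from the first unanswered student message, rather than the most recent one, keeps a follow-up email from resetting the clock.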

Data Visualization in Tableau - The Case of Citibike

I sometimes teach students data visualization in Tableau as a final lecture in some of my classes. When I do, I use this case, which leverages data made available by NYC's Citibike system to create an interactive dashboard summarizing all trips in the system in a given month.

The final dashboard looks quite impressive (see a demo video here), and is surprisingly easy to build, so it tends to work very well.

The case comprises the following artefacts:

  • A case, briefly outlining the history of the system, and listing some suggested questions students might answer with the dashboard they create.
  • Data for Manhattan and Jersey City, outlining every bike trip in the system in May 2018. I tell more advanced students that instead of relying on these old datasets, they might prefer to download more recent datasets from the Citibike website - note, however, that these new datasets have a different format.
  • A finished Tableau dashboard, created using these data.
  • A series of videos guiding the students through the steps required to construct the dashboard step by step.