Regression Analytics at the New York City Department of Education

With over 1,700 schools serving 1.1 million students and an annual budget of almost $25 billion, the New York City Department of Education is the largest education system in the world. It is also one of the most diverse. In this case, we first use Python to parse and prepare data in the public domain, and then use regression analysis to explore various factors that affect performance at these schools.

Modern Retail Analytics: Data Visualization Using Tableau

with Maxime Cohen and Matthieu Reed

This case includes three detailed tutorials that can be used by instructors in the classroom to introduce students to Tableau in the context of retail strategy and operations (using the Global Superstore dataset that is made available with Tableau). We also wrote a homework assignment with detailed solutions.

We're making this case available free of charge to all instructors. Speaking for ourselves, when we first began using Tableau and downloaded the dataset, we felt it offered a tremendous teaching opportunity, but it didn't come with any "polished" resources we could easily distribute to students for use in a classroom. We hope this case study will fill that gap.

Material:

An introduction and backstory for the case, which introduces the Global Superstore dataset and the data in question.
The version of the Global Superstore data used in the case (note that Tableau periodically updates the version of this dataset it distributes with the software, but the dataset provided here will match the storyline and tutorials below exactly).
Three tutorials (tutorial A, tutorial B, and tutorial C) introducing various Tableau features with increasing levels of complexity in the context of the storyline above.
An assignment that can be given to students in conjunction with the case.
Full solutions to the assignment (please email us together with proof of your instructional appointment for a password to this file).

We hope you find this case useful. Please feel free to browse the rest of this site for other case studies and material.

Analytics of TCUs in Californian Hospitals using Bayesian Networks

Intensive care units are invariably the most expensive units in any given hospital, and are often overloaded. Certain hospitals have introduced transitional care units (TCUs), which are cheaper to run, to try and reduce the load on intensive care units. However, it is unclear whether the introduction of these TCUs has had any positive effects. In this project, we will analyse data using data mining techniques to try and better understand the effect of the introduction of TCUs. Specifically, we will examine the effect TCUs have had on appropriate measures of treatment cost and quality of service.

This was a project I undertook as part of a service systems class with Prof Ward Whitt and with the help of Prof Carri Chan. Unfortunately, the data seemed to be of insufficient scale to get conclusive results. Final presentation here, minus any slides containing results, for confidentiality reasons.

Detecting Bubbles Using Option Prices

In the context of financial markets, bubbles refer to asset prices that exceed the asset's fundamental, intrinsic value. Bubbles are often associated with a large increase in the asset price followed by a collapse when the bubble "bursts". A series of recent papers has developed a number of mathematical models for bubbles in financial markets, together with a number of analytical tests that could, in theory, be used to detect bubbles before they burst. These tests, however, only use information available in the stock prices themselves. In this project, we investigated a variation of these detection methods that relies on prices of options on the stock, rather than on the price of the stock itself.

This was a summer project I undertook in the first year of my PhD with Prof Paul Glasserman. PowerPoint presentation available here.

The Odds Algorithm

The classical secretary problem concerns the following situation: an interviewer needs to hire a single secretary, and sets out to interview a fixed number of candidates. While interviewing a candidate, the interviewer ascertains how the candidate ranks compared to every previous candidate. After each candidate is seen, the interviewer can either accept the candidate, and end the interview process, or reject the candidate, without any chance of ever returning to that candidate. The classical secretary problem seeks the best strategy to adopt in this case. Clearly, choosing an early candidate is a bad idea - indeed, having seen very few candidates, it is difficult to know what is available. Similarly, waiting till the last candidate might also not be the best choice - the last candidate might be lousy!
The Odds Algorithm was developed as a very elegant way to solve the secretary problem and many of its more complicated variations. In this presentation, we state and prove the Odds Theorem and consider a number of its applications.
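As an illustration of the idea, here is a minimal sketch of the odds strategy in its standard form (it is not taken from the presentation or report). It assumes independent events, where probs[k-1] is the probability that the k-th event is a "success"; in the secretary problem, a success is a candidate who is the best seen so far, which happens with probability 1/k. The function name odds_algorithm is hypothetical.

```python
def odds_algorithm(probs):
    """Return (s, win_prob) for the odds strategy.

    probs[k-1] is the probability that the k-th independent event is a
    success. The strategy is: stop at the first success from index s onwards.
    win_prob is the probability that this stops on the very last success.
    """
    odds_sum = 0.0   # running sum of odds r_k = p_k / (1 - p_k), from the end
    fail_prod = 1.0  # running product of q_k = 1 - p_k, from the end
    for k in range(len(probs), 0, -1):
        p = probs[k - 1]
        if p >= 1.0:
            # A certain success has infinite odds, so the threshold is here.
            # We win exactly when every later event fails.
            return k, fail_prod
        odds_sum += p / (1.0 - p)
        fail_prod *= 1.0 - p
        if odds_sum >= 1.0:
            return k, fail_prod * odds_sum
    # The odds never summed to 1: start looking from the first event.
    return 1, fail_prod * odds_sum


# Secretary problem with 10 candidates: candidate k is the best so far
# with probability 1/k.
s, win = odds_algorithm([1.0 / k for k in range(1, 11)])
print(f"start accepting at candidate {s}, win probability {win:.4f}")
```

With 10 candidates, this says to reject the first three outright and then accept the first candidate who beats everyone seen so far.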

This was a final project in Prof Omar Besbes' and Vineet Goyal's course "Dynamic Learning and Optimization: Theory and Applications", which I took in the Spring 2011 semester at Columbia. I prepared a short presentation (which requires the MathType fonts) and report.

High Dimensional Model Selection

This review paper summarizes many of the results in the field of high dimensional model selection in statistics. Broadly, the field concerns model fitting in situations in which there are many, many variables that might affect an outcome, and we seek the subset of variables best able to model this outcome.

This was an essay submitted in partial fulfilment of the requirements for my Masters in Mathematics at the University of Cambridge under the supervision of Richard Samworth.

Available here.

Package Sizing Decisions

What's the ideal size for a ketchup bottle (from Heinz's point of view)? If the bottle is too small, the company loses out on extra profit from consumers who would have been willing to buy more. If the bottle is too big, the company loses out on consumers who need much less, and therefore don't buy at all. This question was addressed in a 2010 paper by Koenigsberg, Kohli and Montoya. This piece of work reviews some of their results, examines their assumptions, and reports preliminary numerical attempts at relaxing some of those assumptions to make the model more realistic.

This is work I carried out while at Columbia in the summer of 2009 - summary available here.

The OLYMPUS experiment

What's inside a proton? We should be able to answer that question using lattice QCD (quantum chromodynamics), and when computers catch up with the theory, we probably will. In the meantime, however, we're stuck with a more primitive method - shoot things at protons, see what happens and make deductions. The problem is that particle physicists have tried two ways to "shoot stuff at a proton", and the results have not been consistent. This could be because of second-order interactions polluting one of the methods. OLYMPUS is an experiment that should reveal whether this is the case. This poster summarises the background and aims of the experiment.

This was a final presentation for class 8.276 (Particle Physics) at MIT. PDF available here.

The Path Integral Approach to Quantum Mechanics

Quantum mechanics and classical mechanics are both called "mechanics" - but they apparently have little in common. One deals with waves, operators and probabilities, whereas the other deals with particles, forces and deterministic variables. This paper is an introduction to the path integral formulation of quantum mechanics, which brings quantum and classical mechanics under one common framework and reduces to the Lagrangian approach in the classical limit, when the action is large compared to Planck's constant (the correspondence principle).
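For reference, the central object of the formulation can be written as follows. This is the standard textbook statement rather than an excerpt from the paper, and the symbols (K for the propagator, S for the action, L for the Lagrangian) are the usual conventions, assumed here.

```latex
% Amplitude to go from (x_a, t_a) to (x_b, t_b): sum over all paths x(t),
% each weighted by a phase set by its classical action S[x].
K(x_b, t_b; x_a, t_a) = \int \mathcal{D}[x(t)] \, e^{i S[x] / \hbar},
\qquad
S[x] = \int_{t_a}^{t_b} L(x, \dot{x}, t) \, dt .

% When S \gg \hbar, the phases of neighbouring paths cancel except near
% the path of stationary action, which is the classical trajectory:
\delta S = 0
\quad \Longrightarrow \quad
\frac{d}{dt} \frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0 .
```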

This was a final project for class 8.06 (Quantum Mechanics) at MIT. PDF available here.

Enzyme-free constant-temperature DNA quantisation and amplification

DNA is everywhere, and being able to accurately and reliably detect and amplify tiny amounts of the molecule is crucial. The most common DNA amplification method, PCR (Polymerase Chain Reaction), is ubiquitous, but requires highly specialized and expensive enzymes, as well as tightly controlled reaction conditions most commonly obtained using thermal-cycling machines. In this project, we attempted to extend a method developed by Zhang et al. (2007) to create an "enzyme-free" version of PCR.

This was part of a SURF project at Caltech's DNA lab. Progress report (more informative) here and final report here.

Excel Tools

This set of tools extends Excel's functionality:
- Formula explorer allows easy auditing of large and complex formulas: clicking on any cell reference brings up the relevant cell, and brackets can be indented and highlighted for clarity. To use, hit Ctrl+Shift+F in any cell with a formula. Hit F1 from the formula explorer for a list of features.
- Functions to perform rudimentary linear algebra operations: finding eigenvalues, eigenvectors, Cholesky decompositions, and inverse matrices.

In theory, downloading this xla file and opening it should make these tools available in any workbook. Unfortunately, this was written for a previous version of Excel - it is unlikely to still work.

A Turing Machine Development Environment

Turing Machines are one of the simplest computing models equivalent to today's computers - that is to say, anything computers can do, Turing Machines can do and vice-versa. Turing Machines can therefore be used to find the limits of what computers can and can't do. However, Turing Machines are rather difficult and tedious to program, and very few packages exist to help with this process. The aim of this project is to build a program that helps with building Turing Machines.
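To give a sense of what programming a Turing Machine involves, here is a minimal sketch of a simulator with a machine that adds one to a binary number. It is an illustration only and is unrelated to the project's own code; the function name, the state names ("seek", "carry", "done") and the transition-table layout are all assumptions made for this example.

```python
def run_turing_machine(transitions, tape, start, accept, blank="_", max_steps=10_000):
    """Run a single-tape Turing Machine and return the final tape contents.

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is "L" or "R". The head starts on the leftmost input cell.
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    state, head, steps = start, 0, 0
    while state != accept:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may not halt")
        symbol = cells.get(head, blank)
        if (state, symbol) not in transitions:
            raise ValueError(f"no transition for state {state!r}, symbol {symbol!r}")
        state, cells[head], move = transitions[(state, symbol)]
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: binary increment.
# "seek" walks right to the end of the number, "carry" walks back left,
# turning trailing 1s into 0s until it can write a 1.
INCREMENT = {
    ("seek", "0"): ("seek", "0", "R"),
    ("seek", "1"): ("seek", "1", "R"),
    ("seek", "_"): ("carry", "_", "L"),
    ("carry", "1"): ("carry", "0", "L"),
    ("carry", "0"): ("done", "1", "L"),
    ("carry", "_"): ("done", "1", "L"),
}

print(run_turing_machine(INCREMENT, "1011", start="seek", accept="done"))
```

Even this tiny task needs six hand-written transitions, which is the tedium a development environment is meant to reduce.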

This was a research project I carried out at the Technion during the summer of 2003 (I was 16 when I wrote this, so don't judge!). Short presentation, project report, and executable files (file 1 and file 2; I'd be amazed if these still run on a modern machine.)

Daniel Guetta
