At first, it seemed like the algorithm wasn’t working right.

Michael Fleder, an MIT researcher and recent alumnus working with the Laboratory for Information and Decision Systems (LIDS), had been developing an algorithm that could break down anonymized bill totals into individual item costs, creating an overview of how many people are buying a specific item or service. He was testing it on a bulk set of data from Netflix, and although most of the data points matched the usual subscription services, one outlier kept popping up at a price point too high for anything Netflix was offering.

On closer examination, Fleder realized that the algorithm was working better than expected: not only had it found known services, but it had also discovered an unannounced-but-rumored Ultra HD subscription that Netflix was testing on a limited audience. It also discovered another, as-yet-unmentioned product at an even higher price point.

The algorithm is detailed in a paper published at the ACM SIGMETRICS conference in December 2020 under the playful title “I Know What You Bought At Chipotle for $9.81 by Solving A Linear Inverse Problem.” Fleder co-wrote it with Professor Devavrat Shah of the MIT Department of Electrical Engineering and Computer Science, and it will be featured in an upcoming book from Cambridge University Press.

Although “big data” is currently the more popular term for working with large amounts of information, Fleder says, “We live in this small-data problem. How can you rip these numbers apart and extract as much as you can?”

The novel inference algorithm Fleder and Shah have developed is robust, iterative, and computationally efficient. It deconstructs transaction totals into the underlying products purchased, using aggregates of what is generally called “exhaust data”: readily available anonymized data created during digital transactions.

“What is a little surprising is how the data has a signature,” says Shah. “Each individual purchase is just one number, but if many people purchase things, there is a power in collectiveness with a bit of variation, which is remarkable.”
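The “signature” Shah describes can be illustrated with a toy example. The Python sketch below is not the algorithm from the paper; it is a simplified illustration that assumes exact prices in cents, ignores tax, and uses made-up totals. The function names `frequent_totals` and `decompose` are hypothetical. It first looks for bill totals that recur often enough to suggest a product price, then checks whether a larger total can be explained as a sum of those prices.

```python
from collections import Counter


def frequent_totals(totals_cents, min_share=0.03):
    """Return totals that make up at least `min_share` of all transactions.

    Totals that recur this often are treated as candidate product prices.
    """
    counts = Counter(totals_cents)
    n = len(totals_cents)
    return sorted(t for t, c in counts.items() if c / n >= min_share)


def decompose(total_cents, prices_cents):
    """Return one fewest-item list of prices summing to the total, or None."""
    # best[t] holds the shortest list of prices found so far that sums to t.
    best = {0: []}
    for t in range(1, total_cents + 1):
        for p in prices_cents:
            prev = best.get(t - p)  # None when t - p is unreachable or negative
            if prev is not None and (t not in best or len(prev) + 1 < len(best[t])):
                best[t] = prev + [p]
    return best.get(total_cents)


# Made-up bill totals in cents (illustrative only, not real transaction data).
totals = [899] * 40 + [1299] * 44 + [1599] * 10 + [1999] * 5 + [1234]

candidate_prices = frequent_totals(totals)
combo = decompose(2198, [899, 1299])
```

In this toy data, the one-off total of 1234 cents is filtered out as noise, while the small cluster at 1999 cents survives as a candidate price, much as an unexpected price point stood out in Fleder's test. Real transaction data would also require handling tax, discounts, and noise, which this sketch leaves out.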

This algorithm could be used to track sales numbers on a weekly or even daily basis, automating elements of work currently performed by financial analysts. Companies such as Google already study anonymized credit data in connection with advertising, but as more detailed information becomes readily available and transparency increases, new market opportunities may arise.

Of particular practical interest to businesses could be the increased ability to understand demand at different points in the supply chain. Chipotle's suppliers, for example, might anticipate changes in demand for ingredients like avocados by monitoring the sales of items like guacamole.

Businesses would also have new methods by which to understand and anticipate their competitors’ strategies, and the algorithm could help businesses such as hedge funds that use transaction data to track public companies.

In its initial development, the algorithm was used on commercially available data provided by data vendor Second Measure. Using transaction data related to spending at Chipotle, Apple, Spotify, and Netflix, the method correctly identified the timing of the launch of a new product tier from Spotify and the release of the new iPhone XS Max.

Fleder intends to use this algorithm as part of a new startup, with potential applications for a wide variety of companies.