Leveraging Metadata to Identify Netflix Viewers’ Decisions
In December 2018, Netflix released Black Mirror: Bandersnatch, an interactive film in which viewer decisions determine the protagonist’s fate. A recent study showed that an observer could use a viewer’s video traffic to determine which path that viewer chose. The researchers did this by examining the metadata associated with the traffic, which contains a considerable number of artifacts relevant to managed attribution.
In 2016, Netflix began encrypting its video streams to protect user privacy, but researchers at IIT Madras used Bandersnatch metadata to show that the effort wasn’t as effective as hoped. Metadata is popularly defined as “data about data.” It serves a variety of purposes, including identification, description, and data organization, and it comes in a variety of types, ranging from file size and title to meta descriptions and the date and time a file was created and posted. In short, ‘metadata’ can encompass a massive amount of information.
Researchers used the metadata of Netflix’s encrypted traffic to predict users’ Bandersnatch decisions. They began by identifying the types of metadata Netflix associates with its content. One of the most important discoveries was that Netflix divides each decision branch into a ‘default’ option and a ‘backup’ option. When the user reached a decision point in the show, a JSON file was sent to Netflix’s servers, and Netflix’s system automatically prepared the default stream. If the user selected the ‘default’ option or made no selection, the stream continued normally; however, if the user selected the backup option, additional JSON files were sent to Netflix’s servers.
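The default/backup behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch: the message labels and the function name are hypothetical, and the assumption that the backup option triggers exactly one extra upload is a simplification (the study only says that additional JSON files were sent).

```python
from typing import List, Optional

# Hypothetical labels; the real JSON payloads are not described here.
PROMPT_SHOWN = "prompt_shown"        # sent whenever a decision point appears
BACKUP_SELECTED = "backup_selected"  # sent only when the viewer leaves the default


def uploads_for_choice(selection: Optional[str]) -> List[str]:
    """Return the JSON uploads a client would send at one decision point.

    selection is "default", "backup", or None (the viewer made no selection).
    """
    if selection not in ("default", "backup", None):
        raise ValueError(f"unknown selection: {selection!r}")

    # Every decision point produces the first upload.
    uploads = [PROMPT_SHOWN]

    # Only the backup option produces extra traffic (assumed: one extra file).
    if selection == "backup":
        uploads.append(BACKUP_SELECTED)
    return uploads


if __name__ == "__main__":
    for selection in ("default", None, "backup"):
        print(selection, "->", len(uploads_for_choice(selection)), "JSON upload(s)")
```

The point of the model is that the default option and no selection look identical on the wire, while the backup option leaves an extra, countable trace.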
Researchers began by counting the JSON files sent between users and the servers, but they still needed to distinguish JSON files containing decision data from other JSON files. They accomplished this by looking at the ‘SSL record length,’ a field in the SSL/TLS record header that remains visible even in encrypted traffic. Using the SSL record length, researchers could identify which packets were decision packets, and counting the JSON decision files sent enabled them to determine whether the default option had been selected. They then used this knowledge to predict 100 users’ decisions with a 96% success rate.
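A rough sketch of this inference in Python follows. It relies on one well-established fact: every SSL/TLS record starts with a five-byte cleartext header (content type, protocol version, and a two-byte length). Everything else is an assumption made for illustration: the two length ranges are hypothetical placeholders rather than the values the researchers measured, the function names are invented, and the input is assumed to be an already reassembled client-to-server byte stream.

```python
import struct
from typing import Iterator, List, Tuple

TLS_HEADER_LEN = 5     # content type (1 byte) + version (2 bytes) + length (2 bytes)
APPLICATION_DATA = 23  # TLS content type for encrypted application payloads

# Hypothetical record-length ranges, chosen only for this example.
PROMPT_RANGE = range(2200, 2220)  # upload sent at every decision point
BACKUP_RANGE = range(2990, 3020)  # extra upload sent when the backup is chosen


def record_lengths(stream: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (content_type, length) for each TLS record in a reassembled stream."""
    offset = 0
    while offset + TLS_HEADER_LEN <= len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        yield content_type, length
        # Skip the encrypted payload; only the header is readable.
        offset += TLS_HEADER_LEN + length


def infer_choices(stream: bytes) -> List[str]:
    """Guess 'default' or 'backup' for each decision point seen in the stream."""
    choices: List[str] = []
    for content_type, length in record_lengths(stream):
        if content_type != APPLICATION_DATA:
            continue
        if length in PROMPT_RANGE:
            # A new decision point; assume the default until shown otherwise.
            choices.append("default")
        elif length in BACKUP_RANGE and choices:
            # An extra upload means the latest decision was the backup option.
            choices[-1] = "backup"
    return choices


def fake_record(length: int, content_type: int = APPLICATION_DATA) -> bytes:
    """Build a synthetic record (header plus zero-filled payload) for testing."""
    return struct.pack("!BHH", content_type, 0x0303, length) + bytes(length)


if __name__ == "__main__":
    stream = (
        fake_record(2210)    # decision 1, no extra upload follows
        + fake_record(1400)  # unrelated traffic
        + fake_record(2210)  # decision 2
        + fake_record(3000)  # extra upload after decision 2
    )
    print(infer_choices(stream))
```

Note that the sketch never decrypts anything. It reads only the record headers, which is why encrypting the stream alone does not hide the viewer’s choices.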
The Bandersnatch case illustrates how an observer can leverage basic information about data traveling across the internet to make informed inferences about its contents. The case shows that the unencrypted elements of data introduce a new managed attribution consideration, as every file, post, and message published online can weaken or strengthen a user’s digital fingerprint and behavioral footprint. These artifacts can be used to infer the data’s contents, as well as the identities of users and the entities with whom they’re engaging. In the case of Bandersnatch, metadata enabled the viewer to become the viewed.