Many of us intuitively understand that our behavior can leak sensitive information. Take, for example, someone’s profile on LinkedIn. If you suddenly see that profile polished with an updated job history, a better photo, new endorsements, and many new connections, you might well assume that she is looking for a new job. You might even be able to guess the target companies, or at least the sector, from the distribution of the new connections.

“Passive Information Leakage”, although it was not called that at the time, drew a lot of buzz when a number of articles described how Target was able to discover that a teenage girl was pregnant before she had told her family.

Similar things happen on every website you visit. Which pages you view, how long you stay on them, and how that pattern changes over time can tell the website a great deal about you. That information becomes even more valuable when the behavior of an entire group is considered. For example, the aggregate behavior of a company or research team can collectively reveal more than any single individual’s activity.

We see this in the way Google determines the location of IP addresses. Most databases and services that map IP addresses to physical locations are significantly inaccurate. Google, on the other hand, has very good location information for IP addresses. How does it do that? It turns out that it relies on averaged user behavior. Specifically, Google appears to look at the activity of users within a given class C block of IP addresses (a /24 block: addresses whose first three numbers are the same). From how those users interact with the map tools, where they start and end trips, and which local businesses they search for, Google obtains a cloud of locations. The center of that cloud pins down the location of the IP block to a particular part of a city, with very high accuracy.
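As a rough illustration of the averaging idea, the Python sketch below groups observed (IP address, latitude, longitude) points by their /24 block and takes the mean of each block's points as its estimated location. This is not Google's actual method, which is not public. The function names `slash24` and `estimate_block_locations` are hypothetical, and the sample observations are made up; they stand in for whatever location signals (map interactions, trip endpoints, local searches) might be available.

```python
import ipaddress
from collections import defaultdict
from statistics import mean

def slash24(ip: str) -> str:
    """Return the /24 (class C sized) network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def estimate_block_locations(observations):
    """Estimate a location per /24 block from (ip, lat, lon) observations.

    Hypothetical helper: groups every observed point by its /24 block and
    returns the centroid (mean latitude, mean longitude) of each block's cloud.
    """
    clouds = defaultdict(list)
    for ip, lat, lon in observations:
        clouds[slash24(ip)].append((lat, lon))
    return {
        block: (mean(lat for lat, _ in points), mean(lon for _, lon in points))
        for block, points in clouds.items()
    }

# Hypothetical observations using documentation-reserved IP ranges.
observations = [
    ("203.0.113.5", 40.0, -75.0),
    ("203.0.113.77", 40.2, -75.2),
    ("198.51.100.9", 34.0, -118.0),
]
for block, (lat, lon) in estimate_block_locations(observations).items():
    print(block, round(lat, 3), round(lon, 3))
```

A plain mean of latitude and longitude is adequate for a cloud that spans part of a city, but it would misbehave near the antimeridian, and a median would be more resistant to outliers such as a trip that ends in another state.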

Until recently, IP-to-location lookups were terrible. They were originally based on the registration information for a block of IP addresses, which might point to an ISP’s headquarters a thousand miles from the datacenter actually using those addresses. More recently they have been improving by drawing on a different kind of behavior: shopping. IP geolocation services partner with online retailers to obtain customers’ billing and shipping addresses along with their IP addresses. This is less subtle than Google’s approach, but it appears to work quite well.
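The retail-based approach can be sketched the same way, assuming a hypothetical feed of (IP address, shipping city) pairs from partner retailers. Instead of averaging coordinates, this sketch takes a majority vote of cities within each /24 block and reports what share of orders agreed. The function names and sample data are illustrative only, not any geolocation vendor's real pipeline.

```python
import ipaddress
from collections import Counter, defaultdict

def slash24(ip: str) -> str:
    """Return the /24 network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def locate_blocks_from_orders(orders):
    """Map each /24 block to (most common shipping city, share of orders agreeing).

    `orders` is a hypothetical iterable of (ip, shipping_city) pairs.
    """
    votes = defaultdict(Counter)
    for ip, city in orders:
        votes[slash24(ip)][city] += 1  # one vote per order for its shipping city
    result = {}
    for block, counts in votes.items():
        city, n = counts.most_common(1)[0]
        result[block] = (city, n / sum(counts.values()))
    return result

# Hypothetical orders using documentation-reserved IP ranges and made-up cities.
orders = [
    ("192.0.2.10", "Springfield"),
    ("192.0.2.20", "Springfield"),
    ("192.0.2.30", "Shelbyville"),
    ("198.51.100.1", "Ogdenville"),
]
for block, (city, share) in locate_blocks_from_orders(orders).items():
    print(block, city, f"{share:.0%}")
```

The agreement share acts as a crude confidence signal: a block whose orders ship to many different cities (gifts, corporate proxies, VPN exits) would yield a low share.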
