Monday, August 21, 2006

Faster than you can say "Clear History"

This is old news, but a few weeks ago AOL released 2 gigabytes of raw search data from its users. Basically, this document contains every web search every AOL member made for a period of several months. The "reason" given for releasing this was academic research -- and I have to agree... seeing the search terms of my fellow Americans is very academic:



(crazy search histories from our friends at Something Awful. More here and here.)


In aggregate, this data is downright fascinating (when it's not totally disturbing). Aside from the trainwreck/cringe factor, I could (actually) understand the rationale for releasing it. It's another snapshot of the zeitgeist -- and much more personal than (the now bland-seeming) Google Zeitgeist's revelations that "ricky bobby" is a gaining query. Figuring out what's really on America's mind has been a hobby of sociologists for decades.

The problem is folks have been conditioned to see that little text field and "Search" button as a great anonymizer. And it is -- except that your searches, with a little detective work, can reveal who you are. In fact, last week the intrepid New York Times managed, using only the searches of one person, to figure out who that AOL user was in real life -- a 62-year-old grandmother.

(Sidebar: Bri spent a nice chunk of time yesterday on my computer trying to find the pictures mentioned in this article. Can't wait 'til Google decides to release my search history.)



What this little incident has taught me, however, is to now be much more worried about the NSA wiretapping boondoggle. Honestly, before, I wasn't particularly worried about it. Yes, Bush broke the law. Yes, creating huge call lists is generally not something I'd like my government to be engaged in. But overall, my feelings against it were more about due process of law being ignored -- not the actual program itself.

See, I actually believed the whole notion of "We're just data mining it! There's soooo many numbers that we're just shuffling them through the computer as fast as we can looking for patterns that suggest terrorism." But that was the same rationale for the AOL search leak -- use it for data mining. Run those 2 gigs of data through a computer and try to decide how to market to people. The problem is not the data mining -- it's the file itself. Because when the file gets broken down and you focus on one person (either AOL subscriber or phonejack), whole lives can be sussed out.

And, in a weird way, I'd expect AOL to do a better job of keeping information safe than the Feds, so I think this bodes poorly for all of us.

2 Comments:

  • At 10:48 PM, Blogger Brianna said…

    when the new york times publishes stuff like that it's like an internet search challenge! no self respecting geek can turn that down.

    sorry for the self centered comment but nothing in your post is as important as protecting my good name.

     
  • At 10:49 PM, Blogger Brianna said…

    my comments have to be approved? but the only reason i come here is to slander you! my fun is now ruined.

     
