Privacy Falls into YouTube's Data Tar Pit

As a big lawsuit grinds forward, its parties engage in discovery, a wide-ranging search for information "reasonably calculated to lead to the discovery of admissible evidence." (FRCP Rule 26(b)) And so Viacom has calculated that scouring YouTube's data dumps would help it build evidence for its copyright lawsuit against Google.

According to a discovery order released Wednesday, Viacom asked for discovery of YouTube source code and of logs of YouTube video viewership; Google refused both. The dispute came before Judge Stanton, in the Southern District of New York, who ordered the video viewing records -- but not the source code -- disclosed.

The order shows the difficulty we have protecting personally sensitive information. The court could easily see the economic value of Google's secret source code for search and video ID, and so it refused to compel disclosure of that "vital asset," the "product of over a thousand person-years of work."

But the user privacy concerns proved harder to evaluate. Viacom asked for "all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website," including the videos each user watched, along with their login IDs and IP addresses. Google argued that users' privacy interests should shield these records from disclosure; the court rejected that argument.

The court erred both in its assessment of the personally identifying nature of these records and in its assessment of the scope of the harm. It makes no sense to ask whether an IP address is or is not "personally identifying" without considering the context to which it is connected. An IP address may not be a name, but it is often only one search step away from one. Moreover, even "anonymized" records often contain profiles deep enough to be traced back to individuals, as researchers armed with the AOL and Netflix data releases have shown.
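To make the re-identification point concrete, here is a toy sketch (all field names and data invented, not YouTube's actual schema) of how a viewing log stripped of names can be joined against outside information to recover identities, in the spirit of the AOL and Netflix research:

```python
# Toy illustration with hypothetical data: a "de-identified" viewing log still
# carries quasi-identifiers (here, IP addresses) that an outsider can join
# against auxiliary information to put names back on the records.

# "Anonymized" log: no names, just pseudonymous IDs, IP addresses, and videos watched.
viewing_log = [
    {"user_id": "u1093", "ip": "203.0.113.7",   "video": "daily-show-clip-42"},
    {"user_id": "u1093", "ip": "203.0.113.7",   "video": "colbert-interview"},
    {"user_id": "u2217", "ip": "198.51.100.23", "video": "cat-piano"},
]

# Auxiliary data an adversary might already hold (forum posts, server logs,
# subscriber records) linking IP addresses to real people.
auxiliary = {
    "203.0.113.7": "Alice Example",
}

# Join the two sources: any log row whose IP appears in the auxiliary data is
# no longer anonymous, and the person's whole viewing history comes with it.
for row in viewing_log:
    person = auxiliary.get(row["ip"])
    if person:
        print(f'{person} (pseudonym {row["user_id"]}) watched {row["video"]}')
```

The pseudonymous user ID does nothing to protect the viewer here; the IP address alone serves as the join key, which is why calling it non-identifying misses the point.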

Viewers "gave" their IP address and username information to YouTube for the purpose of watching videos. They might have expected the information to be used within Google, but not anticipate that it would be shared with a corporation busily prosecuting copyright infringement. Viewers may not be able to quantify economic harm, but if communications are chilled by the disclosure of viewing habits, we're all harmed socially. The court failed to consider these third party interests in ordering the disclosure.

Trade secret wins, privacy loses. Google has said it will not appeal the order.

Is there hope here for end users concerned about the disclosure of their video viewing habits? First, we see the general privacy problem with "cloud" computing: by conducting our activities at third-party sites, we place a great deal of information about those activities in their hands. We may do so because Google is indispensable, or because it tells us its motto is "don't be evil." But discovery demands show that it's not enough for Google to follow good precepts.

Google, like most companies, says it will share data where it has "a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request." Its reputation as a good actor matters, but the company is not going to risk contempt charges to protect user privacy.

I worry that this discovery demand is just the first of a wave, as more litigants recognize the data gold mines that online service providers have been gathering: search terms, blog readership and posting habits, video viewing, and browsing histories might all "lead to the discovery of admissible evidence." If the privacy barriers are as low as Judge Stanton indicates, won't others follow Viacom's lead? A gold mine for litigants becomes a tar pit for online services' users.

Economic concerns (the cost of producing data in response to a wave of subpoenas) and reputational concerns (the fear that users will be driven away from a service that leaves their sensitive data vulnerable) may exert some restraint, but they're unlikely to be enough to match our privacy expectations.

We need the law to supply protection against unwanted data flows, to declare that personally sensitive information -- or the profiles from which identity may be extracted and correlated -- deserves consideration at least on par with "economically valuable secrets." We need better assurance that the data we provide in the course of communicative activities will be kept in context. There is room for that consideration in the "undue burden" discovery standard, but statutory clarification would help both users and their Internet service providers to negotiate privacy expectations better.

Is there a law? In this particular context, there might actually be law on the viewers' side. The Video Privacy Protection Act, passed after reporters looked into Judge Bork's video rental records, gives individuals a cause of action against "a video tape service provider who knowingly discloses, to any person, personally identifiable information concerning any consumer of such provider." ("Video tape" includes similar audio visual materials.) Will any third parties intervene to ask that the discovery order be quashed?

Further, Bloomberg notes the concerns of Europeans, whose privacy regime is far more user-protective than that of the United States. Is this one case where "harmonization" can work in favor of individual rights?
