This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

Address diversity of privacy definitions and add missing threats #41

Open
wants to merge 5 commits into
base: main
from

Conversation

TheMaskMaker

As advised by Sam, rather than updating the existing threats doc, I am making a new doc.

I have included many threats missing from the original, including many tracking methods and profiling strategies, and the 2 core definitions of privacy. We might need a third to represent Google's perspective if it is not covered by the prevent all tracking definition, as well as potentially others.

@jwrosewell

jwrosewell commented May 19, 2021

Thank you, @TheMaskMaker, for initiating this change. As someone who works in a small business new to the W3C, without the time or mandate to take on additional work or engage with all the W3C debates, I appreciate how much effort this draft amendment required.

It is extremely important this document receives considerably wider input from a diverse set of web stakeholders if it is to be useful in supporting debates and resolving important issues.

This pull request starts to broaden the considerations concerning privacy and takes this document in a direction that starts to align with the advice of Benjamin R. Dryden and Shanker (Sean) Iyer in their paper Privacy Fixing and Predatory Privacy: The intersection of big data, privacy policies and antitrust. The conclusion provides a summary for PR reviewers.

> Third, because it is relatively uncommon for companies to adopt privacy policies in direct collaboration with their competitors, the most likely target for a privacy fixing or predatory privacy claim might well be a standards-setting organization or trade association that tries to adopt a best privacy practice or a rule of ethics for an entire industry. Therefore, when such organizations wade into discussing privacy topics, they should recognize the competitive concerns and potential antitrust risks. Whenever possible, such standards-setting organizations and trade associations should make sure to apply procedures and safeguards to prevent their decisions from becoming hijacked by private interests. For instance, such organizations might consider requiring supermajority votes before any policies are adopted, basing decisions on outside expert judgments rather than industry interests, and describing any best practices as “recommendations” rather than as strict requirements.

Currently the W3C lacks a membership-agreed definition of “privacy”. The W3C needs one if debates are to be resolved. I’ve found The Promise and Shortcomings of Privacy Multistakeholder Policymaking: A Case Study, which analyses the DNT debate at W3C, helpful in understanding the groundwork needed to foster effective debate and how far we still need to come.

The priority of constituents is broadly agreed, in order, as 1) people, 2) authors (website operators), 3) browsers and 4) specification writers. Commentators often seek to speak on behalf of people to further their position and proposal as a lower-order constituent. But who can really speak on behalf of people? In democracies, it is elected governments, who set laws. Their laws must be referenced and considered in any document that seeks to effectively discuss privacy. Such a change might lead us to define "privacy" as "unlawful privacy practices". Such a change might focus proposers on identification and sanction remedies for harm rather than more restrictive remedies that have wider consequences for the open web.

Some web browsers have privacy policies. At the W3C these are only relevant if all stakeholders’ policies are equally considered. After all, web browser vendors are the third constituent. In addition to referencing laws, and the groups already considered, we also need to involve other groups such as the Interactive Advertising Bureau (IAB), Partnership for Responsible Addressable Media (PRAM), Prebid, European Publishers Council (EPC) and Association of Online Publishers (AOP), to name a few.

@samuelweiler is assigned to this group for 1.5 days per week. Could Sam take the action to communicate with these wider groups and assemble their privacy positions and objectives as informative references for either this pull request or a parallel pull request? Could Sam invite these groups to present their positions to the PING and TAG? Could Sam summarize each group's position for the document?

Until the W3C have a settled single position on privacy I fear we will diverge from our mission of leading the web to its full potential.

@TheMaskMaker
Author

@jwrosewell I would love to see groups more clearly define each one's take on what a private web should look like in clearly defined language. This would prevent a great deal of miscommunication, confusion, and vulnerability to unintentionally falling into the fallacy of equivocation.

In fact, I have a demo next week for the business group that may address the need to further communicate privacy definitions and take a bird's eye view of the situation! You might be interested in it.

I just want to be clear that this document is not meant to solve the question of what privacy is, but to examine the varying viewpoints that exist in the W3C (and, as you mention, those outside it might be good to document as well) and the threats and major considerations of each one.

@jwrosewell

@TheMaskMaker Understood about the scope of the document. I do believe an agreed W3C definition of privacy should exist in PING and TAG. This document should reference that.

@pes10k @samuelweiler, as PING chair and W3C representative respectively, could progressing the steps to such a definition be scheduled for a future PING meeting?

Member

@jyasskin jyasskin left a comment

It's useful to think through these extra threats. If the PING develops consensus that they're in the scope of the W3C's work, I think we should eventually try to merge them into the main threat model document, but doing a separate document initially should help drive the discussion.

Comment on lines 62 to 68
Sync Tracking (II): Any type of cross device syncing involving browser data. Examples include Chrome Sync. This allows for cross domain data to be tracked by the browser company. This also allows for user profile data and customization to be built from this data.

OS Snooping (II): Examples: Safari-ios or Google-Android integrations that enable the operating system to gain insight into user data in the browser when it appears as an app, or in some cases even in the classic PC browser. These browser-side integrations allow tracking prevention safeguards to be bypassed by the browser and/or OS companies. This allows a user’s web data to be tracked and users to be profiled on the web through the user agent through a back door.

Stack ID (II): The operating system or any program higher on the stack assigns a device id that can be correlated to the web user, and passes it through with some or all client calls.

Browser Login Tracking (II): The ability for a user to log in to their browser by some means that exists across domain, enabling the browser or affiliate to individually track them. Examples include gmail/chrome login.
Member

These 4 and the Browser Snooping threat below are a useful new category of threats to privacy, where users might worry about their browser learning too much about them. I think we should keep them separate from the other 3 items that cover the websites the user visits. Basically, group cases by the entity that's attacking the user.

I think the W3C hasn't historically worked much on the problem of a user choosing a browser whose behavior they didn't entirely want, and it's not clear to me that the W3C has any levers to use to change this sort of UA behavior, but it's worth discussing with the PING anyway.

Author

Ah, actually they aren't a new category. In fact, current uses of these methods feed into the exact same user tracking ad technologies as the others. I'm amazed they have not been brought up before!

To exclude them or move them to a separate category would raise the question of why browser-based or browser-partnered adtech firms get special treatment, and would make proposals benefit one group over the other.

The same data about users is extracted for the same use cases, and I checked that this is user agent behavior, so by the charter it should be in scope for us. Given that, we should probably leave them in the same list. Do you disagree?

Author

Or do you mean it's worth flagging that only browsers can do these things? That's a good point! I can tag them as such! Let me know if this update is in line with what you are thinking.

privacy_models.md (outdated)
Preventing fingerprinting requires entropy management that restricts abilities of the internet.


### The Control Theory of Privacy via ID
Member

I think this model is about the user using laws and regulation to control their online presence, while the other is about using technical means. The privacy threat model has, so far, been entirely about technical means. It's worth exploring those two approaches, although I'm not sure what we can say about the legal landscape in a W3C document that needs to include countries that haven't passed any useful laws about this.

I believe there's disagreement in the "regulation will help" camp between people who think "notice and consent" is enough, vs people who think it tends to just produce cookie banners. Are you the right person to explore that, or do we need to find more contributors?

Author

The privacy threat model has actually been different in different proposals so far. It's my hope we can update this document to reflect that reality. It's been very confusing! You'll find, for example, differences in what Google and Apple representatives consider tracking methodology that are most evident in a few of the FLoC tickets, but it is not at all clear when talking about the proposals, since the word 'privacy' is used instead of a definition of what that means in each case.

I 100% would like each group to pen their own definitions and be very clear about it. I hope to address guidelines for doing this in Feathered Serpent, which will demo in web adv next week, but the definitions should be written by their supporters, and the issues by those who disagree. This way every voice is heard.

Member

Different people do have different assumptions about what "privacy" means and about what "tracking" means (e.g. https://twitter.com/jyasskin/status/1387170000511799299). This comment was an attempt to refine the description of this model so it conveys its point to more people. If you're in the "notice and consent" camp, could you write that down, with a TODO or something to get a contribution from the other folks?

privacy_models.md (outdated)
Comment on lines 86 to 88
Hidden Profiles: Refers to a user being unable to access a web system’s profile on their user data. This could be because it is impossible or impractical. It is of particular concern because a user can be misled as to what data of theirs is being captured where this threat exists.

Untouchable Profiles: Refers to a user profile that a user is unable to request deletion of. Includes cases where it is impractical for a user to verify or have confidence their data has truly been deleted upon request.
Member

These two cases belong in a different category from the first two: the first two talk about the way a profile is built, while these two talk about how little control a user has over the profile.

Author

That's a good point, but it's not always true that these methods are not tracking methods. Some profiles MUST be collected, e.g. in ecommerce. A user is consenting to buy something and, technically, to have their information collected. They may not want the latter but cannot separate the two. Hiding the profile tricks the user into thinking they have not been tracked, and this behavior allows the profile to be collected. This is just as unfair to the user. My intention here was to prevent these threats from being forgotten.

We could put them in a separate list, but I think this method is important and I would prefer it here. What do you think?

Member

I was thinking of subsections under "Profiling". When I say they're a different category, I don't mean to make claims about whether or not they're tracking (and, in fact it'd be good to avoid the term "tracking" entirely, or define it).

removing legacy II marker for individual identification methods
bolding terms
Marking browser-only and OS-only tracking methods as per PR feedback
@TheMaskMaker
Author

@jyasskin Are you satisfied or have anything else on the remaining comments? Can this be merged?

Member

@jyasskin jyasskin left a comment

I'm not comfortable just merging this unilaterally: we should ask the PING chairs to add it to next week's agenda, so more people can weigh in.

Tracking methods only implementable by user agent cooperation are marked with a (B), for browser
Tracking methods only implementable by operating system cooperation are marked with an (OS), for operating system

**Sync Tracking**: Any type of cross device syncing involving browser data. Examples include Chrome Sync. This allows for cross domain data to be tracked by the browser company. This also allows for user profile data and customization to be built from this data.
Member

This is a (B), right?

Author

re: what privacy means
100% agree different folks have different opinions. I know these 2 definitions aren't enough. I'm working on Kitten Cluster to try to flesh these definitions out more in the words of the proposers. I plopped these 2 here mostly to start the conversation on differing definitions and I certainly hope folks contribute to fill them out. I noticed a few pages in PING make assumptions based on 1 definition and hopefully we can update them after discussion.

re: this is a B: YES! I missed that one, thank you! Fixing!


OS Snooping (II): Examples: Safari-ios or Google-Android integrations that enable the operating system to gain insight into user data in the browser when it appears as an app, or in some cases even in the classic PC browser. These browser-side integrations allow tracking prevention safeguards to be bypassed by the browser and/or OS companies. This allows a user’s web data to be tracked and users to be profiled on the web through the user agent through a back door.
**OS Snooping** (B) (OS): Examples: Safari-ios or Google-Android integrations that enable the operating system to gain insight into user data in the browser when it appears as an app, or in some cases even in the classic PC browser. These browser-side integrations allow tracking prevention safeguards to be bypassed by the browser and/or OS companies. This allows a user’s web data to be tracked and users to be profiled on the web through the user agent through a back door.
Member

What would OS integration allow that a browser couldn't do unilaterally? Or is this saying that an OS might provide APIs by which a browser passes data to it, and then the OS vendor might get data it shouldn't have? The current text is pretty handwavey, and it'd be nice to make it a little more concrete.

Author

You are not wrong that a browser could do this unilaterally, but OS-browser integrations were flagged by an FTC webinar as a specifically problematic pattern. Also, with OS integration, greater tracking becomes possible.

Also, the OS could technically snoop on the calls by itself without browser knowledge, or create an id the browser is not aware of.

To give a more concrete example, if you have a Windows 10 computer, there is web (and a great deal of other) tracking happening (by default, I believe) via Microsoft on the OS layer, which passes it through to other partners for use with web advertising and a bunch of other stuff. In fact, Microsoft's privacy statement, if you can find it, openly declares this process. For Apple into Safari it's a device ID as well, and I believe it now requires consent but still performs the user tracking through integration with the Safari app. Google sign-in does this on Android.

Let's say, for example, you sign into Google on your Android phone but not your browser. The browser-as-an-app integration can pass your ID through the web calls.


Browser Login Tracking (II): The ability for a user to log in to their browser by some means that exists across domain, enabling the browser or affiliate to individually track them. Examples include gmail/chrome login.
**Browser Login Tracking** (B): The ability for a user to log in to their browser by some means that exists across domain, enabling the browser or affiliate to individually track them. Examples include gmail/chrome login.
Member

This would be about the user logging into their browser, and then the browser sending that login information to more than one domain, right? I think the Google-account/Chrome integration isn't an example of this, since it only sends the login to the one first-party site. Or, if you mean to include the browser giving the user special help to sign into a particular first party, you might remove "some means that exists across domain".

Author

Browser login is a spy on every web page you visit that can report back all cross domain activity to its own server. If we are talking about preventing 'passive tracking' threats, then this login can easily be used for that purpose, and whether it is or is not now, it certainly can be. Therefore we should give it the same caution as any other threat of this magnitude. The Gmail/Chrome login is present on all domains (because it's embedded in every page, top right corner), and can track cross domain activity and tie it to a specific individual account. It is passive because the user is not even aware that they have logged into Google on EVERY site; they just think they logged into Google for their e-mail, etc.

Also, because legacy AMP pages are, I believe, still on one of Google's domains, there is even first-party tracking in this threat. A news site on AMP is clearly a third party to Google, and yet it would be first party there. Many big browsers own domains that have wide reach in unusual ways. But that's mostly a side note; the real issue is the third-party reach.

The browser can already do some of this, which is, I believe, a separate threat, but the last part, tying it directly to an individual account, makes it its own threat model, especially because the browser could use a third-party login system. Suppose Chrome partners with Logalog (fake name, I hope) and Logalog is tasked to run Gmail for some reason and then it executes this threat; Chrome wouldn't necessarily be aware.

Google is the easiest example to use here because of the very obvious cross domain login, but this is a threat model for any browser.

@TheMaskMaker
Author

> I'm not comfortable just merging this unilaterally: we should ask the PING chairs to add it to next week's agenda, so more people can weigh in.

Meeting sounds good

3 participants