Labels on records for opting out of data scraping for AI training. #3052
dunxen
started this conversation in
Bluesky Lexicons
Replies: 0 comments
I'm unsure whether this belongs in the lower-level atproto lexicons, so apologies if the Bluesky Lexicons category is the wrong place to discuss it.
I'm not a lawyer, first of all, but I believe there is a case for putting default labels on records indicating that the data in those records may not be scraped for the purpose of training AI models. Users should of course be able to opt out of these labels in some way. Perhaps each record could even carry some sort of license stipulating this (I'm not aware of an existing usage license that would fit here).
Obviously this may not actually prevent scraping, but it would give a user a plausible claim that this "license"/request was violated if they do find out it has happened.
This was sparked by some users' fears that the firehose and other services could be "open season" for companies collecting data to train generative AI, whether on art, text, or anything else.
Any standard for this would need to be clear enough that companies can understand it, so I'm not sure what the correct approach would be.
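To make the idea a bit more concrete, here is a rough sketch of what a default label could look like on a record and how a well-behaved consumer might check for it. The label value `no-ai-training` and both helper functions are purely hypothetical and not part of any existing lexicon. The `labels` field is modeled on the self-label shape (`com.atproto.label.defs#selfLabels`) as I understand it, so treat the exact structure as an assumption too.

```python
from datetime import datetime, timezone

# Hypothetical label value; no such value is defined in atproto or Bluesky today.
NO_AI_TRAINING = "no-ai-training"


def make_post(text, allow_ai_training=False):
    """Build a post-like record that carries the label by default."""
    record = {
        "$type": "app.bsky.feed.post",
        "text": text,
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }
    # The label is attached unless the user explicitly opts out of it.
    if not allow_ai_training:
        record["labels"] = {
            "$type": "com.atproto.label.defs#selfLabels",  # assumed shape
            "values": [{"val": NO_AI_TRAINING}],
        }
    return record


def may_use_for_training(record):
    """Return False if the record carries the hypothetical opt-out label."""
    labels = record.get("labels") or {}
    values = labels.get("values", [])
    return all(v.get("val") != NO_AI_TRAINING for v in values)
```

A consumer reading the firehose could call `may_use_for_training(record)` before adding a record to a training set. Nothing enforces this, of course; it only makes the request machine-readable.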