GDPR compliance best practices?

git@vger.kernel.org mailing list mirror (one of many)

* GDPR compliance best practices?
@ 2018-04-17 19:15 Peter Backes
  2018-04-17 21:38 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:42 ` Jonathan Nieder
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Backes @ 2018-04-17 19:15 UTC (permalink / raw)
  To: Git Mailing List

Hi,

I'd like to ask whether anyone has best practices for achieving GDPR 
compliance for git repos? The GDPR will come into effect in the EU next 
month.

In particular, how do you cope with the "Right to erasure" concerning 
entries in the history of your git repos?

Erasing author names from the history changes the commit hashes.  It is 
well known that this leads to a lot of problems.  So I don't consider 
this a workable solution.
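
To illustrate the point about hashes: a commit ID is the SHA-1 of a
short header plus the commit text, and the author line is part of that
text. The sketch below is a simplified model, not Git's own code; the
helper names, the sample identities and the timestamp are made up for
illustration.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(author: str) -> bytes:
    """Build the text of a parentless commit (hypothetical sample data)."""
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"  # the empty tree
        f"author {author} 1523992500 +0200\n"
        f"committer {author} 1523992500 +0200\n"
        "\n"
        "Initial commit\n"
    ).encode()


original = git_object_id("commit", commit_body("Jane Doe <jane@example.org>"))
erased = git_object_id("commit", commit_body("Erased <erased@example.org>"))

# Only the author/committer name changed, yet the commit ID is different.
print(original)
print(erased)
```

Because every child commit records the ID of its parent, the change
propagates to all descendants as well.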

And how do you justify publishing your employee's name/email as part of 
a git commit under GDPR rules in the first place?

GitLab has the following page mentioning the "Right to erasure" but 
AFAICS nothing about how it will be implemented
https://about.gitlab.com/gdpr/

Here are discussions I found but they do not really provide a solution:
https://law.stackexchange.com/questions/24623/gdpr-git-history
https://news.ycombinator.com/item?id=16509755

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: GDPR compliance best practices?
  2018-04-17 19:15 GDPR compliance best practices? Peter Backes
@ 2018-04-17 21:38 ` Ævar Arnfjörð Bjarmason
  2018-04-17 23:25   ` Peter Backes
  2018-06-03  9:27   ` Peter Backes
  2018-06-08 22:42 ` Jonathan Nieder
  1 sibling, 2 replies; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-17 21:38 UTC (permalink / raw)
  To: Peter Backes; +Cc: Git Mailing List

On Tue, Apr 17 2018, Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.
>
> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?
>
> Erasing author names from the history changes the commit hashes.  It is
> well known that this leads to a lot of problems.  So I don't consider
> this a workable solution.
>
> And how do you justify publishing your employee's name/email as part of
> a git commit under GDPR rules in the first place?
>
> github has the following page mentioning the "Right to erasure" but
> AFAICS nothing about how it will be implemented
> https://about.gitlab.com/gdpr/
>
> Here are discussions I found but they do not really provide a solution:
> https://law.stackexchange.com/questions/24623/gdpr-git-history
> https://news.ycombinator.com/item?id=16509755

[Not a lawyer and all that]

I've been loosely following a similar discussion around blockchains and
my understanding of the situation is that for a project such as say
Linux the GDPR gives you this potential out for that[1]:

    "the personal data are no longer necessary in relation to the
    purposes for which they were collected or otherwise processed"

I.e. you understand that when you submit a patch to linux.git how it's
going to get used, and that it's in a storage system that isn't going to
be pruned just because you ask for it.

In combination with the "Conditions for consent"[2] this becomes a bit
more tricky. I.e. "The data subject shall have the right to withdraw his
or her consent at any time".

You can make a compelling case that for say submitting your data to the
Bitcoin blockchain the above quote from article 17 overrides it, but can
you for other hash-based-on-hash systems like linux.git? Maybe, maybe
not. I think nobody really knows at this point.

What I do think is for sure is that there's not going to be any one size
fits all solution based on the underlying technology.

If I start storing my webserver access logs with IP information in a git
repo, I don't get to say "sorry git stores stuff this way, I don't want
to rebase it". No court's going to buy that, I've just gone out of my
way to use technology that circumvents the GDPR for no particularly good
reason.

This is very different from you say joining a company, committing to its
internal git repo, and your name being there in perpetuity, or choosing
to submit a patch to linux.git or git.git.

I'd think that would be handled the same way as a structural engineering
firm being able to record in perpetuity who it was that drew up the
design for some bridge. I don't think it's plausible that the GDPR,
which is probably mainly going to be about consumer protection, is going
to concern itself with that in practice.

There's a lot of middle ground in between those two
though. E.g. children are specially protected under the GDPR. Is Linus
going to say he doesn't want to rebase linux.git after some 14 year old
who regrets submitting code doesn't want his name there anymore? Who
knows.

Depending on such common cases maybe git itself should eventually
support some ways to work around the issues. E.g. we could have some
mode to always supply a fake name/e-mail, or make the notice
implicit_ident_advice() spews out somewhat scarier.

1. https://gdpr-info.eu/art-17-gdpr/

2. https://gdpr-info.eu/art-7-gdpr/


* Re: GDPR compliance best practices?
  2018-04-17 21:38 ` Ævar Arnfjörð Bjarmason
@ 2018-04-17 23:25   ` Peter Backes
  2018-06-03  9:27   ` Peter Backes
  1 sibling, 0 replies; 53+ messages in thread
From: Peter Backes @ 2018-04-17 23:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Tue, Apr 17, 2018 at 11:38:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I've been loosely following a similar discussion around blockchains and
> my understanding of the situation is that for a project such as say
> Linux the GDPR gives you this potential out for that[1]:
> 
>     "the personal data are no longer necessary in relation to the
>     purposes for which they were collected or otherwise processed"
> 
> I.e. you understand that when you submit a patch to linux.git how it's
> going to get used, and that it's in a storage system that isn't going to
> be pruned just because you ask for it.
> [...]
> You can make a compelling case that for say submitting your data to the
> Bitcoin blockchain the above quote from article 17 overrides it

Well, you're quoting from lit. a but there's also lit. b to f! It says 
"one of the following grounds applies", not "all of ...".

> This is very different from you say joining a company, committing to its
> internal git repo, and your name being there in perpetuity, or choosing
> to submit a patch to linux.git or git.git.
>
> I'd think that would be handled the same way as a structural engineering
> firm being able to record in perpetuity who it was that drew up the
> design for some bridge.

Internal repo is entirely unproblematic, since you don't need consent 
for doing that. It is covered by Art. 6 (1) lit. f.

The problem is public repos. Publishing employee information is 
generally considered not to be covered by Art. 6 (1) lit. f. After all, 
you can easily publish the software but not the repo.

> I don't think it's plausible that the GDPR,
> which is probably mainly going to be about consumer protection, is going
> to concern itself with that in practice.

Oh, no, GDPR is about privacy in general. It's not only about consumer 
protection. It applies in the same way to employees in relation to 
their employer and to citizens in relation to the authorities, and to 
open source contributors in relation to the projects, or to any other 
data processing outside family and friends (Art. 2 (2) lit. c).

I am inclined to assume that Art. 6 (1) lit. b might be the solution, 
since the licenses typically demand a history of changes to be 
distributed with the program (for example, GPLv3 section 5 a). After 
all, the author generally wants to be given credit for his changes and 
it can be assumed that this is one of the conditions for licensing the 
work in the first place.

On the other hand, of course, the author could waive the condition at 
any time, which means Art. 6 (1) lit. b wouldn't apply anymore and 
you'd have the same issue as with consent-based processing of the 
information (lit. a).

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: GDPR compliance best practices?
  2018-04-17 21:38 ` Ævar Arnfjörð Bjarmason
  2018-04-17 23:25   ` Peter Backes
@ 2018-06-03  9:27   ` Peter Backes
  2018-06-03 10:45     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03  9:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

Hi,

Unfortunately, this important topic of GDPR compliance has not seen much 
interest.

After asking github about how they would cope with the issue of erasing 
the author field, they changed their privacy policy, which now 
clarifies that this won't be done.

My guess is that this would ultimately rely on "overriding legitimate 
grounds for the processing" (Art. 17 (1) point (c) GDPR) which is one 
of the most fragile legitimizations available in the GDPR.

The GDPR emphasizes the importance of using state of the art 
technology, including anonymization, as far as possible to ensure 
privacy.

At 
https://public-inbox.org/git/CA+dhYEViN4-boZLN+5QJyE7RtX+q6a92p0C2O6TA53==BZfTrQ@mail.gmail.com/T/ 
there is already some discussion about transitioning to a different 
hashing algorithm to get more in line with state of the art in hashing. 
(My clear favourite would be SHA-3.)

In the course of this, anonymization could also be added. My idea would be 
as follows:

Do not hash anything directly to obtain the commit ID. Instead, hash a 
list of hashes of [$random_number, $information] pairs. $information 
could be an author id, a commit date, a comment, or anything else. Then 
store the commit id, the list of hashes, and the list of pairs to form 
the commit.

If someone requests erasure, simply empty the corresponding pair in the 
list. All that would be left would be the hash of the pair, which is 
completely anonymous (not more useful than a random number) and thus 
not covered by the GDPR. The history could still be completely 
verified, and when displaying the log, the erased entry could be 
displayed as "<<ERASED>>".
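
A minimal sketch of this scheme, to make the idea concrete. It is not
a proposal for Git's actual object format: the dictionary layout, the
function names, the 32-byte random salt and the choice of SHA3-256 are
all assumptions made for illustration.

```python
import hashlib
import secrets

ERASED = "<<ERASED>>"


def field_hash(salt: bytes, information: str) -> str:
    """Hash one [$random_number, $information] pair (salt has a fixed length)."""
    return hashlib.sha3_256(salt + information.encode()).hexdigest()


def commit_id(hashes: list) -> str:
    """The commit ID covers only the list of pair hashes, never the data itself."""
    return hashlib.sha3_256("\n".join(hashes).encode()).hexdigest()


def make_commit(fields: list) -> dict:
    pairs = [(secrets.token_bytes(32), info) for info in fields]
    hashes = [field_hash(salt, info) for salt, info in pairs]
    return {"id": commit_id(hashes), "hashes": hashes, "pairs": pairs}


def erase(commit: dict, index: int) -> None:
    """Drop one pair; its hash stays behind so the ID can still be checked."""
    commit["pairs"][index] = None


def verify(commit: dict) -> bool:
    if commit_id(commit["hashes"]) != commit["id"]:
        return False
    # Every pair that is still present must match its stored hash.
    return all(
        pair is None or field_hash(*pair) == stored
        for stored, pair in zip(commit["hashes"], commit["pairs"])
    )


def display(commit: dict) -> list:
    return [ERASED if pair is None else pair[1] for pair in commit["pairs"]]


commit = make_commit(["author Jane Doe <jane@example.org>", "date 2018-06-03", "Fix typo"])
erase(commit, 0)
print(verify(commit), display(commit))
```

The random number matters here: without it, anyone could hash a list of
candidate author strings and compare the results against the hash that
was left behind.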

What do you think about this?

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: GDPR compliance best practices?
  2018-06-03  9:27   ` Peter Backes
@ 2018-06-03 10:45     ` Ævar Arnfjörð Bjarmason
  2018-06-03 11:25       ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-03 10:45 UTC (permalink / raw)
  To: Peter Backes; +Cc: Git Mailing List

On Sun, Jun 03 2018, Peter Backes wrote:

> Unfortunately, this important topic of GDPR compliance has not seen much
> interest.

I don't think you can infer that there's not much interest, but maybe
people just don't have anything to say about it.

There's a lot of discussions about this that I've seen, but what they
all have in common is that nobody really knows. Just like nobody really
knew what the "cookie law" would be like.

So I think all of us are just waiting to see.

I took the bait and tried to paraphrase some stuff I've read about it,
but as you pointed out in 20180417232504.GA4626@helen.PLASMA.Xg8.DE I
incorrectly surmised some stuff, although I very much suspect that *in
practice* the GDPR is going to be more about "consumer
protection". I.e. regulators / prosecutors are much more likely to go after
some advertising company than some project using a Git repo.

Just like nobody's going after some local computer club's internal-only
website because it sets cookies without asking, but they might go after
Facebook for doing the same.

> [...]
> In the course of this, anonymization could also be added. My idea would be
> as follows:
>
> Do not hash anything directly to obtain the commit ID. Instead, hash a
> list of hashes of [$random_number, $information] pairs. $information
> could be an author id, a commit date, a comment, or anything else. Then
> store the commit id, the list of hashes, and the list of pairs to form
> the commit.
>
> If someone requests erasure, simply empty the corresponding pair in the
> list. All that would be left would be the hash of the pair, which is
> completely anonymous (not more useful than a random number) and thus
> not covered by the GDPR. The history could still be completely
> verified, and when displaying the log, the erased entry could be
> displayed as "<<ERASED>>".
>
> What do you think about this?

Since the Author is free-form this sort of thing doesn't need to be part
of the git data format. You can just generate a UUID like
"5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to
"refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to
"refval:5c679eda-b4e5-4f35-b691-8e13862d4f79".

Then you'd create a ref on the server like
refs/refval/5c679eda-b4e5-4f35-b691-8e13862d4f79 containing the real
"$user <$email>". If you then wanted to erase that field you'd just
delete the ref, and it would be much easier to teach stuff that renders
the likes of git-log to lookup these refs than changing the data format.
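
A rough sketch of that lookup, assuming the server-side refs can be
modelled as a plain dictionary from ref name to the stored
"$user <$email>" string. In a real repository the ref would have to
point at an object holding that string; the function names and the
"<<ERASED>>" placeholder are likewise made up for illustration.

```python
import uuid

REFVAL_PREFIX = "refval:"
REF_NAMESPACE = "refs/refval/"


def new_identity(refs: dict, real_ident: str) -> str:
    """Register a real identity under a fresh UUID and return the pseudonym."""
    token = str(uuid.uuid4())
    refs[REF_NAMESPACE + token] = real_ident
    return REFVAL_PREFIX + token


def render_author(refs: dict, author_field: str) -> str:
    """Resolve a 'refval:<uuid>' author for display, as a log viewer might."""
    if not author_field.startswith(REFVAL_PREFIX):
        return author_field  # an ordinary name, shown unchanged
    token = author_field[len(REFVAL_PREFIX):]
    return refs.get(REF_NAMESPACE + token, "<<ERASED>>")


refs = {}
pseudonym = new_identity(refs, "Jane Doe <jane@example.org>")
print(render_author(refs, pseudonym))   # resolves to the real identity

refs.clear()                            # "delete the ref"
print(render_author(refs, pseudonym))   # the mapping is gone
```

The commits themselves never change, so no hashes are rewritten when
the ref is deleted.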

Sites that are paranoid about the GDPR could have a pre-receive hook
rejecting any pushes from EU customers unless their commits were in this
format.
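
The check such a hook would perform could look roughly like this. It
is only the validation step: a real pre-receive hook would first have
to list the pushed commits and read their author fields, and deciding
who counts as an "EU customer" is not modelled at all. Requiring the
name and the e-mail to carry the same refval string is an assumption
based on the example above.

```python
import re

# "refval:" followed by a UUID in the usual 8-4-4-4-12 hex layout.
REFVAL_RE = re.compile(
    r"refval:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_refval_ident(name: str, email: str) -> bool:
    """True if both author fields carry the same refval pseudonym."""
    return REFVAL_RE.fullmatch(name) is not None and name == email


def rejected_commits(commits: list) -> list:
    """Given (commit id, author name, author email) tuples, list the offenders."""
    return [cid for cid, name, email in commits if not is_refval_ident(name, email)]


pushed = [
    ("c1", "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79",
           "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79"),
    ("c2", "Jane Doe", "jane@example.org"),
]
print(rejected_commits(pushed))
```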

Perhaps some variation of this is where the GDPR v2 will go. It'll be an
"obligation to be forgotten", and I won't be allowed to use my own name
anymore. Instead I'll have a daily UUID issued from a government API to
use on various forms, and the only way for anyone to resolve that will
be going through a webservice that'll reject UUID lookups older than N
months; caching those requests will be met with the death penalty. We'll
all be free at last.

Okay, that last paragraph is just trolling, but I think that refval: ->
ref convention is something worth considering if things *really* go in
this direction.


* Re: GDPR compliance best practices?
  2018-06-03 10:45     ` Ævar Arnfjörð Bjarmason
@ 2018-06-03 11:25       ` Peter Backes
  2018-06-03 12:59         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03 11:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Sun, Jun 03, 2018 at 12:45:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
> protection". I.e. regulators / prosecutors are much more likely to go after
> some advertising company than some project using a Git repo.

Well, it is indeed rather unlikely that one particular git repo project 
will be targeted, but I guess it is basically certain that at least 
some of them will be.

It is the same as a lottery, it's very unlikely you win the jackpot, 
yet someone wins it every few months. We should care about the entire 
community, not be too selfish.

> Since the Author is free-form this sort of thing doesn't need to be part
> of the git data format. You can just generate a UUID like
> "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to
> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to
> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79".

Well, this is merely pseudonymization, not anonymization. Note that the 
UUID, innocent as it may look, is not in any way less "personal data" 
than the author string itself. Your proposal would thus not actually 
solve the problem, only slightly transform it. Only when you truly 
anonymize (see my proposal about one way to do it), you can completely 
evade the GDPR.

> Sites that are paranoid about the GDPR could have a pre-receive hook
> rejecting any pushes from EU customers unless their commits were in this
> format.

This won't work either. The GDPR makes each data processor directly 
responsible in relation to the data subject. So it does not matter at 
all who is pushing, it matters who is in the author field of the 
commits that were pushed. And since you don't have any information 
about whether those authors are residing within the EU or not, you have 
to assume they are and you have to obey the GDPR. Even if you are 
outside the EU and do not have any subsidiaries within the EU, the GDPR 
still applies as long as you are processing personal data of EU citizens. 
Perhaps the authorities in your country will refuse to obey letters of 
request if the EU authorities try to enforce the GDPR on an 
international scope, but if you have a record of GDPR violation and you 
ever set foot on EU territory, you are fair game.

> Instead I'll have a daily UUID issued from a government API

Heaven forbid. ;) There is an old German proverb, warning that even 
humorous trolling might be dangerous: "Man soll den Teufel nicht an die 
Wand malen!" ;)

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: GDPR compliance best practices?
  2018-06-03 11:25       ` Peter Backes
@ 2018-06-03 12:59         ` Ævar Arnfjörð Bjarmason
  2018-06-03 14:18           ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-03 12:59 UTC (permalink / raw)
  To: Peter Backes; +Cc: Git Mailing List

On Sun, Jun 03 2018, Peter Backes wrote:

> On Sun, Jun 03, 2018 at 12:45:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> protection". I.e. regulators / prosecutors are much more likely to go after
>> some advertising company than some project using a Git repo.
>
> Well, it is indeed rather unlikely that one particular git repo project
> will be targeted, but I guess it is basically certain that at least
> some of them will be.
>
> It is the same as a lottery, it's very unlikely you win the jackpot,
> yet someone wins it every few months. We should care about the entire
> community, not be too selfish.

I'm not trying to be selfish, I'm just trying to counter your literal
reading of the law with a comment of "it'll depend".

Just like there's a law against public urination in many places, but
this is applied very differently to someone taking a piss in front of
parliament vs. someone taking a piss in the forest on a hike, even
though the law itself usually makes no distinction about the two.

>> Since the Author is free-form this sort of thing doesn't need to be part
>> of the git data format. You can just generate a UUID like
>> "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to
>> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to
>> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79".
>
> Well, this is merely pseudonymization, not anonymization. Note that the
> UUID, innocent as it may look, is not in any way less "personal data"
> than the author string itself. Your proposal would thus not actually
> solve the problem, only slightly transform it. Only when you truly
> anonymize (see my proposal about one way to do it), you can completely
> evade the GDPR.

In this example once you'd delete the UUID ref you don't have the UUID
-> author mapping anymore (and b.t.w. that could be a many to one
mapping).

This seems perfectly acceptable to me since the spirit of the GDPR is to
prevent easy Googling of who did what in the past, not to prevent
someone with tremendous resources from say doing a textual analysis of
all git.git commits to find out who authored what.

>> Sites that are paranoid about the GDPR could have a pre-receive hook
>> rejecting any pushes from EU customers unless their commits were in this
>> format.
>
> This won't work either. The GDPR makes each data processor directly
> responsible in relation to the data subject. So it does not matter at
> all who is pushing, it matters who is in the author field of the
> commits that were pushed. And since you don't have any information
> about whether those authors are residing within the EU or not, you have
> to assume they are and you have to obey the GDPR. Even if you are
> outside the EU and do not have any subsidiaries within the EU, the GDPR
> still applies as long as you are processing personal data of EU citizens.
> Perhaps the authorities in your country will refuse to obey letters of
> request if the EU authorities try to enforce the GDPR on an
> international scope, but if you have a record of GDPR violation and you
> ever set foot on EU territory, you are fair game.

I think again that this is taking too much of a literalist view. The
intent of that policy is to ensure that companies like Google can't just
close down their EU offices and weasel out of compliance by saying "we're
just doing business from the US, it doesn't apply to us".

It will not be used against anyone who's taking every reasonable
precaution from doing business with EU customers.

What do you imagine that this is going to be like? That some EU citizen
is going to walk into a small business in South America one day, which
somehow is violating the GDPR, and when that business owner goes on
holiday to the EU they're going to get detained? Not even the US policy
against Cuba is anywhere remotely close to that.

>> Instead I'll have a daily UUID issued from a government API
>
> Heaven forbid. ;) There is an old German proverb, warning that even
> humorous trolling might be dangerous: "Man soll den Teufel nicht an die
> Wand malen!" ;)


* Re: GDPR compliance best practices?
  2018-06-03 12:59         ` Ævar Arnfjörð Bjarmason
@ 2018-06-03 14:18           ` Peter Backes
  2018-06-03 15:28             ` Philip Oakley
  2018-06-03 19:48             ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-03 14:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I'm not trying to be selfish, I'm just trying to counter your literal
> reading of the law with a comment of "it'll depend".
> 
> Just like there's a law against public urination in many places, but
> this is applied very differently to someone taking a piss in front of
> parliament vs. someone taking a piss in the forest on a hike, even
> though the law itself usually makes no distinction about the two.

We have huge companies using git now. This is not the tool used by a 
few kernel hackers anymore.

> In this example once you'd delete the UUID ref you don't have the UUID
> -> author mapping anymore (and b.t.w. that could be a many to one
> mapping).

It is not relevant whether you have that mapping or not, it is enough 
that with additional information you could obtain it. For example, say, 
you have 5000 commits with the same UUID. Now you delete the mapping. 
But your friend still has it on his local copy. Now your friend 
merely needs to tell you who is behind that UUID and instantly you can 
associate all 5000 commits with that person again.
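
The scenario can be spelled out in a few lines. This is a toy model
with made-up data; the commit records and the reidentify() helper are
hypothetical and only mirror the example above.

```python
PSEUDONYM = "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79"

# 5000 commits that all carry the same pseudonym in the author field.
commits = [{"id": n, "author": PSEUDONYM} for n in range(5000)]


def reidentify(commits: list, leaked_mapping: dict) -> dict:
    """Return {commit id: real author} for every commit the mapping resolves."""
    return {
        c["id"]: leaked_mapping[c["author"]]
        for c in commits
        if c["author"] in leaked_mapping
    }


# The server deleted its mapping, but one entry survives in a friend's copy.
leaked = {PSEUDONYM: "Jane Doe <jane@example.org>"}
relinked = reidentify(commits, leaked)
print(len(relinked))  # a single leaked entry re-links every commit
```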

The GDPR is very explicit about this, see recital 26. It says that 
pseudonymization is not enough, you need anonymization if you want to 
be free from regulation.

In addition, and in contrast to my proposal, your solution doesn't 
allow verification of the author field.

> I think again that this is taking too much of a literalist view. The
> intent of that policy is to ensure that companies like Google can't just
> close down their EU offices and weasel out of compliance by saying "we're
> just doing business from the US, it doesn't apply to us".
> 
> It will not be used against anyone who's taking every reasonable
> precaution from doing business with EU customers.

I think you are underestimating the political intention behind the 
GDPR. It has kind of an imperialist goal, to set international 
standards, to enforce them against foreign companies and to pressure 
other nations to establish the same standards.

If I were to read the GDPR in a literal sense, I would in fact come to 
the same conclusion as you: It's about companies doing substantial 
business in the EU. But the GDPR is carefully constructed in such a way 
that it is hard not to be affected by the GDPR in one way or another, 
and the obvious way to cope with that risk is to more or less obey the 
GDPR rules even if one does not have substantial business interests in 
the EU. 

> What do you imagine that this is going to be like? That some EU citizen
> is going to walk into a small business in South America one day, which
> somehow is violating the GDPR, and when that business owner goes on
> holiday to the EU they're going to get detained? Not even the US policy
> against Cuba is anywhere remotely close to that.

Well not if he's locally interacting with that business, a situation 
which I am sure is not regulated by the GDPR.

However, if a large US website accepts users from the EU and uses the 
data gathered in conflict with the GDPR, perhaps selling it for use in 
political campaigns, and it gets several fines for this by EU 
authorities but ignores them and doesn't pay them, and the CEO one day 
takes a flight to Frankfurt to continue by train to Switzerland to get 
some cash from his bank account, then he will most likely not reach 
Swiss territory.

Best wishes
Peter
-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE


* Re: GDPR compliance best practices?
  2018-06-03 14:18           ` Peter Backes
@ 2018-06-03 15:28             ` Philip Oakley
  2018-06-03 17:46               ` Peter Backes
  2018-06-03 17:54               ` Philip Oakley
  2018-06-03 19:48             ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 53+ messages in thread
From: Philip Oakley @ 2018-06-03 15:28 UTC (permalink / raw)
  To: Peter Backes, Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

From: "Peter Backes" <rtc@helen.PLASMA.Xg8.DE>
> On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I'm not trying to be selfish, I'm just trying to counter your literal
>> reading of the law with a comment of "it'll depend".
>>
>> Just like there's a law against public urination in many places, but
>> this is applied very differently to someone taking a piss in front of
>> parliament vs. someone taking a piss in the forest on a hike, even
>> though the law itself usually makes no distinction about the two.
>
> We have huge companies using git now. This is not the tool used by a
> few kernel hackers anymore.
>
>> In this example once you'd delete the UUID ref you don't have the UUID
>> -> author mapping anymore (and b.t.w. that could be a many to one
>> mapping).
>
> It is not relevant whether you have that mapping or not, it is enough
> that with additional information you could obtain it. For example, say,
> you have 5000 commits with the same UUID. Now you delete the mapping.
> But your friend still has it on his local copy. Now your friend
> merely needs to tell you who is behind that UUID and instantly you can
> associate all 5000 commits with that person again.
>
> The GDPR is very explicit about this, see recital 26. It says that
> pseudonymization is not enough, you need anonymization if you want to
> be free from regulation.
>
> In addition, and in contrast to my proposal, your solution doesn't
> allow verification of the author field.
>
>> I think again that this is taking too much of a literalist view. The
>> intent of that policy is to ensure that companies like Google can't just
>> close down their EU offices and weasel out of compliance by saying "we're
>> just doing business from the US, it doesn't apply to us".
>>
>> It will not be used against anyone who's taking every reasonable
>> precaution from doing business with EU customers.
>
> I think you are underestimating the political intention behind the
> GDPR. It has kind of an imperialist goal, to set international
> standards, to enforce them against foreign companies and to pressure
> other nations to establish the same standards.
>
> If I were to read the GDPR in a literal sense, I would in fact come to
> the same conclusion as you: It's about companies doing substantial
> business in the EU. But the GDPR is carefully constructed in such a way
> that it is hard not to be affected by the GDPR in one way or another,
> and the obvious way to cope with that risk is to more or less obey the
> GDPR rules even if one does not have substantial business interests in
> the EU.
>
>> What do you imagine that this is going to be like? That some EU citizen
>> is going to walk into a small business in South America one day, which
>> somehow is violating the GDPR, and when that business owner goes on
>> holiday to the EU they're going to get detained? Not even the US policy
>> against Cuba is anywhere remotely close to that.
>
> Well not if he's locally interacting with that business, a situation
> which I am sure is not regulated by the GDPR.
>
> However, if a large US website accepts users from the EU and uses the
> data gathered in conflict with the GDPR, perhaps selling it for use in
> political campaigns, and it gets several fines for this by EU
> authorities but ignores them and doesn't pay them, and the CEO one day
> takes a flight to Frankfurt to continue by train to Switzerland to get
> some cash from his bank account, then he will most likely not reach
> Swiss territory.
>

--
Having been through corporate training and read up on a number of the
conflicting views in the press, one of the issues is that there are two
viewpoints, one from each side of the fence.

From a corporate/organisation viewpoint, it is best if every case of holding
user information is for a legitimate purpose, which then means the company
has 'protection' from requests for removal because the data *is* held
legally/legitimately (which includes acting as evidence).

In most Git cases that legal/legitimate purpose is the copyright licence,
and/or corporate employment. That is, Jane wrote it, hence X has a legal
right of use, and we need to have a record of that (Jane wrote it) as
evidence of that (I'm X, I can use it) right. That would mean that Jane
cannot just ask to have that record removed and expect it to be removed.

From a personal view, many folk want it to be that corporates (and open
source organisations) should hold no personal information without having
explicit permission that can then be withdrawn, with deletion to follow.
However that 'legal' clause does [generally] win.

In the git.git case (and linux.git) there is the DCO (to back up the GPLv2)
as an explicit requirement/certification that puts the information into the
legal evidence category. IIUC almost all copyright ends up with a similar
evidential trail for the meta data.


The more likely problem is if the content of the repo, rather than the meta
data, is subject to GDPR, and that could easily ruin any storage method.
Being able to mark an object as <Lost/Deleted> would help here(*).

Also remember that most EU legislation is 'intent' based, rather than
'letter of', for the style of legal arguments (which is where some of the UK
Brexit misunderstandings come from), so it is more than possible to get into
the situation where an action is both mandated and illegal at the same time,
so plenty of snake oil salesmen continue to sell magic fixes according to the
customers' local biases.

I do not believe Git has anything to worry about that wasn't already an
issue.
--
Philip
(*) the "mark an object as <Lost/Deleted>" is something I'd like for doing
Narrow clones, it's almost/exactly like the missing sub-module revision
"problem".



* Re: GDPR compliance best practices?
  2018-06-03 15:28             ` Philip Oakley
@ 2018-06-03 17:46               ` Peter Backes
  2018-06-03 18:18                 ` Theodore Y. Ts'o
                                   ` (2 more replies)
  2018-06-03 17:54               ` Philip Oakley
  1 sibling, 3 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-03 17:46 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote:
> In most Git cases that legal/legitimate purpose is the copyright licence,
> and/or corporate employment. That is, Jane wrote it, hence X has legal
> rights of use, and we need to have a record of that (Jane wrote it) as
> evidence of that (I'm X, I can use it) right. That would mean that Jane
> cannot just ask to have that record removed and expect it to be removed.

Re corporate employment:

For sure nobody would dare to question that a company has a right to 
keep an internal record that Jane wrote it.

The issue is publishing that information. This is an entirely different 
story.

I already stressed that from the very beginning.

Re copyright license:

No, a copyright license does not provide a legitimization.

- copyright is about distributing the program, not about distributing 
version control metadata.

- Being named is a right, not an obligation of the author. Hence, if 
the author doesn't want his name published, the company doesn't have 
legitimate grounds based in copyright for doing it anyway, against his 
or her will.

> From a personal view, many folk want it to be that corporates (and open
> source organisations) should hold no personal information with having
> explicit permission that can then be withdrawn, with deletion to follow.
> However that 'legal' clause does [generally] win.

Let's be honest: We do not know what legitimization exactly in each 
specific case the git metadata is being distributed under.

It may be copyright, it may be employment, but it may also be revocable 
consent. That is, we cannot safely assume that no git user will ever 
have to deal with a legitimate request based on the right to be 
forgotten.

> In the git.git case (and linux.git) there is the DCO (to back up the GPL2)
> as an explicit requirement/certification that puts the information into the
> legal evidence category. IIUC almost all copyright ends up with a similar
> evidential trail for the meta data.

This makes things more complicated, not less. You have yet more meta 
data to cope with, yet more opportunities to be bitten by the right to 
be forgotten. Since I proposed a list of metadata where each entry can 
be anonymized independently of each other, it would be able to deal 
with this perfectly.

> The more likely problem is if the content of the repo, rather than the meta
> data, is subject to GDPR, and that could easily ruin any storage method.
> Being able to mark an object as <Lost/Deleted> would help here(*).

My proposal supports any part of the commit, including the contents of 
individual files, as eraseable, yet verifiable data.

> Also remember that most EU legislation is 'intent' based, rather than
> 'letter of', for the style of legal arguments (which is where some of the UK
> Brexit misunderstandings come from), so it is more than possible to get into
> the situation where an action is both mandated and illegal at the same time,
> so plenty of snake oil salesmen continue to sell magic fixes according to the
> customer's local biases.

This may be true. I am not trying to sell snake oil, however. To have 
erasure and verifiability at the same time is a highly generic feature 
that may be desirable to have for a multitude of reasons, including but 
not limited to legal ones like GDPR and copyright violations.

> I do not believe Git has anything to worry about that wasn't already an
> issue.

Yes, but it definitely had and still does have something to worry about.

git should provide technical means to deal with this. I provided a 
proposal based on anonymization that does not in any way have any 
drawback compared to the status quo, except a slight increase in 
metadata size and various degrees of backwards incompatibility, 
depending on how it is implemented.

What do you think about my proposal as a solution for the problem?

You provide a lot of arguments about why it is not a necessity to have 
this, but let's assume it is; is there any actual problem you see with 
the proposal, except that someone would have to implement it?

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 15:28             ` Philip Oakley
  2018-06-03 17:46               ` Peter Backes
@ 2018-06-03 17:54               ` Philip Oakley
  1 sibling, 0 replies; 53+ messages in thread
From: Philip Oakley @ 2018-06-03 17:54 UTC (permalink / raw)
  To: Peter Backes, Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

correcting a negative /with/without/ and inserting a comma.
----- Original Message ----- 
From: "Philip Oakley" <philipoakley@iee.org>
[snip]
> 
> From a personal view, many folk want it to be that corporates (and open
> source organisations) should hold no personal information with having
s/with/without/

> explicit permission that can then be withdrawn, with deletion to follow.
s/permission/permission,/  

> However that 'legal' clause does [generally] win.
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 17:46               ` Peter Backes
@ 2018-06-03 18:18                 ` Theodore Y. Ts'o
  2018-06-03 19:11                   ` Peter Backes
  2018-06-03 22:28                 ` Philip Oakley
  2018-06-07  1:38                 ` David Lang
  2 siblings, 1 reply; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-03 18:18 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 03, 2018 at 07:46:17PM +0200, Peter Backes wrote:
> 
> Let's be honest: We do not know what legitimization exactly in each 
> specific case the git metadata is being distributed under.

It seems like you are engaging in something even more dangerous than a
hardware engineer pretending they know how to program, or a software
engineer pretending they know how to use a soldering iron --- and that's a
programmer pretending they know enough that they can speculate on the
law.

I would gently suggest that if you really want to engage in something
more practical than speculating about how GDPR compliance will work out
in actual practice, you contact a lawyer and get official legal
advice.

After getting that advice, if you or your company wants to implement it,
you can then send patches, and those can get debated using the usual
patch submission process.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 18:18                 ` Theodore Y. Ts'o
@ 2018-06-03 19:11                   ` Peter Backes
  2018-06-03 19:24                     ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03 19:11 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 03, 2018 at 02:18:07PM -0400, Theodore Y. Ts'o wrote:
> I would gently suggest that if you really want to engage in something
> more practical than speculating about how GDPR compliance will work out
> in actual practice, you contact a lawyer and get official legal
> advice.

I completely disagree.

Erasure is a technical issue to be solved by engineers, not by lawyers.

And that's completely in line with the GDPR. The GDPR is ultimately not 
a legal thing to be solved by lawyers writing lengthy legal 
argumentations and disclaimers and such. They are not even the ones to 
take lead in GDPR implementation. All that would be simply snake oil. 
Some legal documentation may be necessary, and having a competent 
lawyer in a GDPR compliance task force is certainly a must. But that 
gets only 20% of the job done; 80% is engineering. Every lawyer who 
claims to give you shady legal tricks to get the job 100% done in no 
time is a liar.

The GDPR's ultimate goal is to incline the world to improve how data 
protection is implemented on a technical level. The GDPR contains 
several blanket clauses that refer to the "state of the art" of 
technology, which the GDPR itself of course does not define and which 
is of course nothing a lawyer has any competence in.

My proposal is a technical, not a legal one: Provide a generic 
possibility of having eraseability and verifiability at the same time 
in git. Improve the state of the art in version control such that it is 
more in line with the GDPR's idea that people have a right to be 
forgotten, but to also be useful for a multitude of other applications. 
The lawyers can then build on this.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 19:11                   ` Peter Backes
@ 2018-06-03 19:24                     ` Peter Backes
  2018-06-03 20:07                       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03 19:24 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

Addendum:

I once discussed with a philosopher the question: What is your argument 
against libertarianism?

He said: It would be a tyranny of lawyers.

Let's not have a tyranny of lawyers. Let us, the engineers and hackers, 
exercise the necessary control over those pesky lawyers by defining and 
redefining the state of the art in technology, and prevent them from 
defining it by themselves. For a hammer, everything looks like a nail. 
Which is the better option: to suggest that people pay for legal advice 
by lawyers, who only offer lengthy disclaimers and such for bypassing 
the right to be forgotten, or simply to discuss technical changes for git 
which enable its easy implementation, without legal excuses for not 
supporting it?

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 14:18           ` Peter Backes
  2018-06-03 15:28             ` Philip Oakley
@ 2018-06-03 19:48             ` Ævar Arnfjörð Bjarmason
  2018-06-03 20:24               ` Peter Backes
  1 sibling, 1 reply; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-03 19:48 UTC (permalink / raw)
  To: Peter Backes; +Cc: Git Mailing List

On Sun, Jun 03 2018, Peter Backes wrote:

> On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I'm not trying to be selfish, I'm just trying to counter your literal
>> reading of the law with a comment of "it'll depend".
>>
>> Just like there's a law against public urination in many places, but
>> this is applied very differently to someone taking a piss in front of
>> parliament v.s. someone taking a piss in the forest on a hike, even
>> though the law itself usually makes no distinction about the two.
>
> We have huge companies using git now. This is not the tool used by a
> few kernel hackers anymore.

Sure, but what I'm pointing out is a) you can't focus on git as the
technology because it tells you nothing about what's being done with it
(e.g. the log file case I mentioned), and b) nobody who came up with the GDPR
was concerned with some free software projects or the SCM used by
companies, so this is very unlikely to be enforced.

>> In this example once you'd delete the UUID ref you don't have the UUID
>> -> author mapping anymore (and b.t.w. that could be a many to one
>> mapping).
>
> It is not relevant whether you have that mapping or not, it is enough
> that with additional information you could obtain it. For example, say,
> you have 5000 commits with the same UUID. Now you delete the mapping.
> But your friend still has it on his local copy. Now your friend
> merely needs to tell you who is behind that UUID and instantly you can
> associate all 5000 commits with that person again.

So nobody can be GDPR compliant in the face of archive.org and the like?
If the law says that you need to delete information you published in the
past, and you do so, how is it your problem that someone mirrored &
re-published it? That's their compliance problem at that point.

> The GDPR is very explicit about this, see recital 26. It says that
> pseudonymization is not enough, you need anonymization if you want to
> be free from regulation.
>
> In addition, and in contrast to my proposal, your solution doesn't
> allow verification of the author field.

It does if you've got the ref. Maybe I just don't get your proposal,
quote:

    Do not hash anything directly to obtain the commit ID. Instead, hash a
    list of hashes of [$random_number, $information] pairs. $information
    could be an author id, a commit date, a comment, or anything else. Then
    store the commit id, the list of hashes, and the list of pairs to form
    the commit.

You're just proposing (if I've read this correctly) that the commit
object should have some list of headers pointing to other SHA1s, and
that fsck and the like be OK with these going away. Right?

How is this intrinsically different from referring to something in the
ref namespace that may be deleted in the future?

In both cases you're just trying to solve the problem of trying to
somehow encode data into a git repository today, that may go away
tomorrow. Similar to how a reference to some LFS object today going away
doesn't fail "git fsck".

>> I think again that this is taking too much of a literalist view. The
>> intent of that policy is to ensure that companies like Google can't just
>> close down their EU offices and weasel out of compliance by saying "we're
>> just doing business from the US, it doesn't apply to us".
>>
>> It will not be used against anyone who's taking every reasonable
>> precaution from doing business with EU customers.
>
> I think you are underestimating the political intention behind the
> GDPR. It has kind of an imperialist goal, to set international
> standards, to enforce them against foreign companies and to pressure
> other nations to establish the same standards.
>
> If I would read the GDPR in a literal sense, I would in fact come to
> the same conclusion as you: It's about companies doing substantial
> business in the EU. But the GDPR is carefully constructed in such a way
> that it is hard not to be affected by the GDPR in one way or another,
> and the obvious way to cope with that risk is to more or less obey the
> GDPR rules even if one does not have substantial business interests in
> the EU.

Okay, so you're not reading the GDPR in some literal sense, but you're
coming to a conclusion that's supported by ... what? To echo Theodore
Y. Ts'o's e-mail: have you consulted with someone who's an actual lawyer on
this subject?

I haven't, but I'm not suggesting that the git data format needs to
change because of some new EU law. You are, what's your basis for that
opinion?

It seems to me that the git project doesn't need to do anything about
this. There's plenty of things that are illegal to publish, and some of
which may be made illegal after the fact (e.g. national security related
information). If those things are incidentally saved in git repositories
the parties involved may need to run git-filter-branch.

Of course if they need to do that on a weekly basis because of some
overzealous law we may need to have some "native" support for that, but
I see zero signs of that so far.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 19:24                     ` Peter Backes
@ 2018-06-03 20:07                       ` Theodore Y. Ts'o
  2018-06-03 20:52                         ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-03 20:07 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 03, 2018 at 09:24:17PM +0200, Peter Backes wrote:
> 
> He said: It would be a tyranny of lawyers.
> 
> Let's not have a tyranny of lawyers. Let us, the engineers and hackers, 
> exercise the necessary control over those pesky lawyers by defining and 
> redefining the state of the art in technology, and prevent them from 
> defining it by themselves. For a hammer, everything looks like a nail. 
> Which is the better option: to suggest that people pay for legal advice 
> by lawyers, who only offer lengthy disclaimers and such for bypassing 
> the right to be forgotten, or simply to discuss technical changes for git 
> which enable its easy implementation, without legal excuses for not 
> supporting it?

Why don't you try to implement your proposal then, and then benchmark
it.  After you find out how much of a performance disaster it's going
to be, especially for large git repos, we can discuss who is being
tyrannical.

It may very well be that different people and companies will get
different legal advice, and one of the interesting things about many
git repos for open source projects is that they are not owned by any one
company.  A change in the git repo format is one that has to be
adopted by the entire open source project, and if a portion of the
community isn't interested in paying the overhead cost, and sticks
with the existing git repo format, I wonder what the "imperialistic"
(your word, not mine) EU will do --- try to lock up or sue everyone
from outside the EU that refuses to pay the 2x-10x performance
overhead and sticks with the original repo format, such that anyone
who wants to interoperate has to send git pushes in the original
format?

But in any case, why don't you send a patch and we can discuss?  As
the old saying goes, "code talks, bullshit walks".   :-)

Regards,

    	       	     	   	  	   	     - Ted

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 19:48             ` Ævar Arnfjörð Bjarmason
@ 2018-06-03 20:24               ` Peter Backes
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-03 20:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Sun, Jun 03, 2018 at 09:48:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Sure, but what I'm pointing out is a) you can't focus on git as the
> technology because it tells you nothing about what's being done with it
> (e.g. the log file case I mentioned), and b) nobody who came up with the GDPR
> was concerned with some free software projects or the SCM used by
> companies, so this is very unlikely to be enforced.

As I already said, the GDPR refers to the state of the art in 
technology, without defining it.

The GDPR provides a generic framework. It covers everyone. From a 
single person running a small blog to a S&P500 enterprise. It also 
covers non-profits and state authorities. Everyone is covered. 
Including SCM used.

The GDPR will be enforced against SCMs. The question is just who will 
be the first to be affected. I suspect it will be a mega-corporation 
who fired one of their developers who wants to fight back and exercise 
his right to be forgotten against the company's public git repos.

> So nobody can be GDPR compliant in the face of archive.org and the like?

The GDPR has special exceptions for archives and the like.

> It does if you've got the ref. Maybe I just don't get your proposal,
> quote:
> 
>     Do not hash anything directly to obtain the commit ID. Instead, hash a
>     list of hashes of [$random_number, $information] pairs. $information
>     could be an author id, a commit date, a comment, or anything else. Then
>     store the commit id, the list of hashes, and the list of pairs to form
>     the commit.
> 
> You're just proposing (if I've read this correctly) that the commit
> object should have some list of headers pointing to other SHA1s, and
> that fsck and the like be OK with these going away. Right?

Certainly not SHA1. SHA1 is completely broken. I know Linus has a bit 
of a different opinion. But there's really no defense for SHA1. It's an 
utterly broken algorithm and should not be used at all anymore.

> How is this intrinsically different from referring to something in the
> ref namespace that may be deleted in the future?

I guess I am partly repeating myself, but:

1. Having fsck be OK with erasure is not enough. It tells you nothing 
about anonymization. If the hash is the same in 5000 instances that's 
pseudonymization, not anonymization. You need to ensure a different 
hash in each instance, and you need to ensure there's no easy way to 
reconstruct the data from its hash. Hence $random_number (or let's call 
it $huge_random_number, it should have x bits if the hash has x bits). 
If you have the SHA1 64ca93f83bb29b51d8cbd6f3e6a8daff2e08d3ec it's too 
easy to figure out the plaintext (it's "Peter" BTW).

2. If you use a random UUID you cannot reconstruct the data from its 
hash, but you have the same issue about UUID reuse. Plus, you lose the 
ability to verify the author's name as part of the commit.
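
The difference between the two points above can be illustrated with a 
minimal Python sketch. It is not git code; the function names 
(unsalted_id, salted_id, dictionary_attack) and the candidate list are 
made up for illustration, and SHA3-256 for the salted variant is just 
one possible choice of hash.

```python
import hashlib
import secrets
from typing import Iterable, Optional, Tuple


def unsalted_id(value: str) -> str:
    """Plain hash of the value: identical every time, so it acts as a pseudonym."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def salted_id(value: str) -> Tuple[str, str]:
    """Hash of '<random salt> <value>'; returns (salt, digest)."""
    salt = secrets.token_hex(32)  # 256 random bits, hex encoded
    digest = hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()
    return salt, digest


def dictionary_attack(target: str, candidates: Iterable[str]) -> Optional[str]:
    """Recover the plaintext of an unsalted hash by trying likely values."""
    for candidate in candidates:
        if unsalted_id(candidate) == target:
            return candidate
    return None


name = "Peter"

# Unsalted: every occurrence yields the same digest, and guessing works.
first = unsalted_id(name)
second = unsalted_id(name)
recovered = dictionary_attack(first, ["Alice", "Bob", "Peter"])

# Salted: two occurrences of the same name yield unrelated digests.
salt_a, digest_a = salted_id(name)
salt_b, digest_b = salted_id(name)
```

Once the salt and the value are both deleted, the remaining digest can 
no longer be linked to the name by guessing, because the guesser would 
also have to guess the 256-bit salt.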

> Okay, so you're not reading the GDPR in some literal sense, but you're
> coming to a conclusion that's supported by ... what? To echo Theodore
> Y. Ts'o's e-mail: have you consulted with someone who's an actual lawyer on
> this subject?

I'm replying in private conversation about this one. It's not relevant 
for this discussion.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 20:07                       ` Theodore Y. Ts'o
@ 2018-06-03 20:52                         ` Peter Backes
  2018-06-03 21:03                           ` Theodore Y. Ts'o
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03 20:52 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 03, 2018 at 04:07:39PM -0400, Theodore Y. Ts'o wrote:
> Why don't you try to implement your proposal then, and then benchmark
> it.  After you find out how much of a performance disaster it's going
> to be, especially for large git repos, we can discuss who is being
> tyrannical.

See, Ted, but I have this other hobby project with git stash preserving 
timestamps, which is 90% done but not yet finished. I am a very busy 
person. I might implement it but it's not the topmost priority. Thus, 
first I want to discuss it, so as not to waste too much time implementing 
something that's then rejected because of valid criticism that 
could have been raised beforehand. Perhaps I can convince my employer 
to work on it on their account. But there's so much to do at the moment.

I have a PhD, about very complex things like static program analysis by 
abstract interpretation. I love hacking very much but I can mostly only 
do it as a hobby because humanity is better served doing the complex 
things that not every hacker can do.

I know I am being whiny but that's how it is.

But I will take your message as saying you at least don't see any 
obvious criticism leading to complete rejection of the approach.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 20:52                         ` Peter Backes
@ 2018-06-03 21:03                           ` Theodore Y. Ts'o
  2018-06-03 22:16                             ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-03 21:03 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 03, 2018 at 10:52:33PM +0200, Peter Backes wrote:
> But I will take your message as saying you at least don't see any 
> obvious criticism leading to complete rejection of the approach.

If you think a potential 2x -- 10x performance hit isn't a
blocking factor --- sure, go ahead and try implementing it.  And good
luck to you.  And this is not a guarantee that it won't get rejected.
I certainly don't have the power to make that guarantee.

If you don't have time to implement, why do you think it's fair to
inflict on everyone else the request for time to do a design review
for something for which the need hasn't even been established?

Regards,

	      	    	     	    	 - Ted

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 21:03                           ` Theodore Y. Ts'o
@ 2018-06-03 22:16                             ` Peter Backes
  2018-06-04 13:47                               ` Theodore Y. Ts'o
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03 22:16 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 03, 2018 at 05:03:44PM -0400, Theodore Y. Ts'o wrote:
> If you think a potential 2x -- 10x performance hit isn't a
> blocking factor --- sure, go ahead and try implementing it.  And good
> luck to you.  And this is not a guarantee that it won't get rejected.
> I certainly don't have the power to make that guarantee.

I do not want or expect a guarantee, or even a probability, of course. 
Just trying to avoid "STRONG REJECT. We could have told you before you 
even started implementing. Why didn't you discuss this beforehand?"

One would simply change something like

author A U Thor <author@example.com> 1465982009 +0000

into something like

author 21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <author@example.com>
author-hash 469bb107e38f8e59dddb3bbd6f8646e052bf73d48427865563c7358a64467f2c
authordate c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000
authordate-hash 199875e5aedb6cb164a2b40c16209dc5bb37f34c059a56c6d96766440fb0fe68

and then compute the commit id without the "author" and the 
"authordate" lines.

The *-hash values were obtained as follows:

echo -n '21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <author@example.com>' | sha3sum -a 256
echo -n 'c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000' | sha3sum -a 256

The hex values here are simply the $huge_random_numbers
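
The scheme can be sketched in a few lines of Python. This is only an 
illustration under assumptions, not git's object format: the helper 
names (make_field, commit_id, verify_field, erase) are hypothetical, 
the commit ID is computed over the *-hash lines alone, and the other 
commit headers (tree, parent, message) are left out for brevity.

```python
import hashlib
import secrets


def sha3_hex(text):
    """SHA3-256 of a UTF-8 string, hex encoded."""
    return hashlib.sha3_256(text.encode("utf-8")).hexdigest()


def make_field(name, value):
    """Create one erasable field: a random salt, the value, and their hash."""
    salt = secrets.token_hex(32)  # the $huge_random_number, 256 bits
    return {"name": name, "salt": salt, "value": value,
            "hash": sha3_hex(f"{salt} {value}")}


def commit_id(fields):
    """Hash only the '<name>-hash <digest>' lines, never the plaintext lines."""
    lines = [f"{f['name']}-hash {f['hash']}" for f in fields]
    return sha3_hex("\n".join(lines) + "\n")


def verify_field(field):
    """True/False if the plaintext is present, None if it has been erased."""
    if field["salt"] is None:
        return None
    return sha3_hex(f"{field['salt']} {field['value']}") == field["hash"]


def erase(field):
    """Drop salt and value but keep the digest, so the commit ID is unchanged."""
    return {**field, "salt": None, "value": None}


fields = [
    make_field("author", "A U Thor <author@example.com>"),
    make_field("authordate", "1465982009 +0000"),
]
id_before = commit_id(fields)

fields[0] = erase(fields[0])  # the author exercises the right to erasure
id_after = commit_id(fields)
```

In this sketch id_before and id_after are equal, the erased author 
field reports None, and the untouched authordate field still verifies.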

Verifying the commit ID by itself wouldn't be any less efficient than 
before. Admittedly, it wouldn't verify the author and authordate 
integrity anymore without additional work. That would be some overhead, 
sure, and could be done on demand, and would mostly affect clones. I 
don't think it would be that much of a problem. It can be parallelized 
easily. The hashes for each field are independent of each other. They 
can all be verified in parallel in different threads running on 
different cores.
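
Because each field carries its own salt and hash, the checks are 
independent and can be handed to a pool of workers. The following 
Python sketch only shows that independence; the names (field_ok, 
verify_all, make_entry) are hypothetical, the fixed salts are 
placeholders so the example is reproducible (real salts would be 
random), and a thread pool in CPython will not necessarily give a 
speedup for inputs this small.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def field_ok(entry):
    """Check one (salt, value, expected_hash) triple on its own."""
    salt, value, expected = entry
    actual = hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()
    return actual == expected


def verify_all(entries, workers=4):
    """Verify all fields concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(field_ok, entries))


def make_entry(salt, value):
    """Build a triple whose hash matches its salt and value."""
    digest = hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()
    return (salt, value, digest)


entries = [
    make_entry("00" * 32, "A U Thor <author@example.com>"),
    make_entry("11" * 32, "1465982009 +0000"),
]
# A third entry whose value does not match the hash it claims.
entries.append(("22" * 32, "Mallory <mallory@example.com>", entries[0][2]))

results = verify_all(entries)
```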

On djb's typical 2015 skylake machine the supercop benchmark tells us 
that sha3-256 (~=keccakc512) has a speed of about 20 cycles/byte for 
blocks of 64 bytes of data, see 
https://bench.cr.yp.to/results-sha3.html#amd64-skylake

Let's say we have 128 bytes of data on average for the author field, so 
conservatively speaking it takes about 3000 cycles (> 128*20) to hash 
and compare the hash.

At 3000 MHz, we can thus do roughly 1,000,000 verifications per second 
per core (3,000,000,000 cycles per second / 3000 cycles each).

Let's assume we have 10 anonymizable fields of this kind per commit.

Then the overhead would be one second per 100,000 x ncores commits.
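
The estimate can be recomputed from the inputs quoted above (20 
cycles/byte, 128 bytes per field, a budget of 3000 cycles per 
verification, a 3000 MHz clock, 10 fields per commit). A small Python 
sketch; the constant and function names are illustrative only.

```python
CYCLES_PER_BYTE = 20           # sha3-256 figure quoted from the benchmark
FIELD_BYTES = 128              # assumed average size of one field
CYCLES_PER_VERIFICATION = 3000 # rounded-up budget per field
CLOCK_HZ = 3_000_000_000       # 3000 MHz
FIELDS_PER_COMMIT = 10         # assumed anonymizable fields per commit


def raw_cycles_per_field():
    """Hashing cost before rounding up to the 3000-cycle budget."""
    return CYCLES_PER_BYTE * FIELD_BYTES


def verifications_per_second():
    """Field verifications one core can do per second."""
    return CLOCK_HZ // CYCLES_PER_VERIFICATION


def commits_per_second(ncores=1):
    """Commits whose fields can all be verified per second on ncores cores."""
    return verifications_per_second() // FIELDS_PER_COMMIT * ncores


print(raw_cycles_per_field())
print(verifications_per_second())
print(commits_per_second(ncores=4))
```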

How many commits are we talking about in a huge repository? And how 
long does a clone of such a huge repository take at the moment? Do you 
have any numbers?

> If you don't have time to implement, why do you think it's fair to
> inflict on everyone else the request for time to do a design review
> for something for which the need hasn't even been established?

I do not request from anyone to even reply to my messages. I just see a 
lot of time being wasted by discussing things about my proposal that 
are technically irrelevant. If that time were put into reviewing the 
design, it would be spent better.

Please don't devalue a proposal. It is not true that the only value is 
in actual code and proposals are "bullshit".

I was not the first to raise the issue, as I clearly showed in my 
initial email.

The demand is in fact high; very high. At present, that demand is 
satisfied by lawyers. Who are writing snake oil disclaimers and such 
for enormous sums of money. In a lot of companies. To "solve" a 
technical issue by pseudo-legal means by finding excuses for why the 
"right to be forgotten" doesn't have to be implemented in specific 
cases such as git. What if all that lawyer money were put into actually 
solving the technical issues as technical issues? Engineers are 
apparently bad at marketing, the lawyers seem more successful in that 
respect.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 17:46               ` Peter Backes
  2018-06-03 18:18                 ` Theodore Y. Ts'o
@ 2018-06-03 22:28                 ` Philip Oakley
  2018-06-03 23:01                   ` Peter Backes
  2018-06-07  1:38                 ` David Lang
  2 siblings, 1 reply; 53+ messages in thread
From: Philip Oakley @ 2018-06-03 22:28 UTC (permalink / raw)
  To: Peter Backes; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

From: "Peter Backes" <rtc@helen.PLASMA.Xg8.DE>
> On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote:
>> In most Git cases that legal/legitimate purpose is the copyright licence,
>> and/or corporate employment. That is, Jane wrote it, hence X has
>> legal
>> rights of use, and we need to have a record of that (Jane wrote it) as
>> evidence of that (I'm X, I can use it) right. That would mean that Jane
>> cannot just ask to have that record removed and expect it to be removed.
>
> Re corporate employment:
>
> For sure nobody would dare to question that a company has a right to
> keep an internal record that Jane wrote it.
>
> The issue is publishing that information. This is an entirely different
> story.

It is here that Article 6 kicks in as to whether the 'organisation' can 
retain the data and continue to use it.
https://gdpr-info.eu/art-6-gdpr/
https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/
https://www.lawscot.org.uk/news-and-events/news/gdpr-legal-basis-and-why-it-matters/

For an open source project with an open source licence then an implicit DCO 
applies for the meta data. It is the legal basis for the release.

If a corporate project has a closed source project, then yes, open 
publishing of that personal data within a repo's meta data would be 
incorrect, even though the internal repo would be kept.

>
> I already stressed that from the very beginning.
>
> Re copyright license:
>
> No, a copyright license does not provide a legitimization.
>
> - copyright is about distributing the program, not about distributing
> version control metadata.

It is specifically about giving that right to copy by Jane Doe (but git gives 
no other information other than that supposedly globally unique 'author 
email').

>
> - Being named is a right, not an obligation of the author. Hence, if
> the author doesn't want his name published, the company doesn't have
> legitimate grounds based in copyright for doing it anyway, against his
> or her will.

Git for Open Source is about open licencing by name. I'd agree that a closed 
corporate licence stays closed, but not forgotten.

>
>> From a personal view, many folk want it to be that corporates (and open
>> source organisations) should hold no personal information with having
>> explicit permission that can then be withdrawn, with deletion to follow.
>> However that 'legal' clause does [generally] win.
>
> Let's be honest: We do not know what legitimization exactly in each
> specific case the git metadata is being distributed under.

We should know, already. A specific licence [or limit] should be in place. 
We don't really want to have to let a court decide ;-)

>
> It may be copyright, it may be employment, but it may also be revocable
> consent. This is, we cannot safely assume that no git user will ever
> have to deal with a legitimate request based on the right to be
> forgotten.
>

The law is never decided by technical means, unfortunately. Regular git 
users should have no issues - they just need to point their finger at the 
responsible authority. (Beware, though, of the one-way trap door: the 
user's mistakes can become the problem of the responsible authority!)

>> In the git.git case (and linux.git) there is the DCO (to back up the GPL2)
>> as an explicit requirement/certification that puts the information into the
>> legal evidence category. IIUC almost all copyright ends up with a similar
>> evidential trail for the meta data.
>
> This makes things more complicated, not less. You have yet more meta
> data to cope with, yet more opportunities to be bitten by the right to
> be forgotten. Since I proposed a list of metadata where each entry can
> be anonymized independently of each other, it would be able to deal
> with this perfectly.

The DCO/GPL2 are the legitimate data record that recipients should have for 
their copy. There is no right to be forgotten at that point.

>
>> The more likely problem is if the content of the repo, rather than the meta
>> data, is subject to GDPR, and that could easily ruin any storage method.
>> Being able to mark an object as <Lost/Deleted> would help here(*).
>
> My proposal supports any part of the commit, including the contents of
> individual files, as eraseable, yet verifiable data.
>
>> Also remember that most EU legislation is 'intent' based, rather than
>> 'letter of', for the style of legal arguments (which is where some of the UK
>> Brexit misunderstandings come from), so it is more than possible to get into
>> the situation where an action is both mandated and illegal at the same time,
>> so plenty of snake oil salesmen continue to sell magic fixes according to the
>> customers' local biases.
>
> This may be true. I am not trying to sell snake oil, however. To have
> erasure and verifiability at the same time is a highly generic feature
> that may be desirable to have for a multitude of reasons, including but
> not limited to legal ones like GDPR and copyright violations.
>
>> I do not believe Git has anything to worry about that wasn't already an
>> issue.
>
> Yes, but it definitely had and still does have something to worry about.
>
> git should provide technical means to deal with this. I provided a
> proposal based on anonymization that does not in any way have any
> drawback compared to the status quo, except a slight increase in
> metadata size and various degrees of backwards incompatibility,
> depending on how it is implemented.
>
> What do you think about my proposal as a solution for the problem?

I see the solution to be elsewhere, and that it is in some ways a strawman 
discussion: "if someone has the right to be forgotten, how do we delete the 
meta data", when that right (to delete the meta data in a properly licensed 
repo) does not exist.

That said, the problem of maintaining repo integrity when some objects must 
be deleted or re-written (because they had stored personal info that they 
should not have) will require a little bit extra on the side.

But this is open source, so ideas, and code, will come forward that allow 
things like 'replaced commits' to be formally part of a repo, and its leading 
oid (or maybe it's an oid pair) will handle that. I'd guess that the commit 
will have an extra line after the parents and tree lines that details (in 
some manner) the 'replaced' things, so that fsck still works, the oid is 
complete and thus the whole shebang can be verified.
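
To illustrate why any such 'replaced' marker would have to live inside the
commit object itself: git derives a SHA-1 object id by hashing a header
("type", a space, the body length, a NUL byte) followed by the body, so
editing the author line changes the oid. A minimal Python sketch follows. The
commit text, name, email and timestamps are made up for illustration; only
the hashing rule is git's own.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object with this type and body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit body: it points at git's well-known empty tree and
# uses placeholder author data.
commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Jane Doe <jane@example.com> 1528000000 +0000\n"
    b"committer Jane Doe <jane@example.com> 1528000000 +0000\n"
    b"\n"
    b"Initial commit\n"
)

original_id = git_object_id("commit", commit)
# Anonymising the name alters the hashed bytes, so the commit gets a new id
# and every descendant commit that names it as a parent changes as well.
rewritten_id = git_object_id("commit", commit.replace(b"Jane Doe", b"anonymous"))

print(original_id)
print(rewritten_id)
```

The two printed ids differ, which is the reason erasure after the fact is
awkward without some extra record that fsck can follow.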

>
> You provide a lot of arguments about why it is not a necessity to have
> this, but let's assume it is; is there any actual problem you see with
> the proposal, except that someone would have to implement it?

It's the strawman problem. If it was a real 'real issue' then it would have 
already shown up with companies clamouring to pay folk to fix our (git's) 
latest problem. But they haven't, so I think it's a much more balanced issue.
--
Philip

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 22:28                 ` Philip Oakley
@ 2018-06-03 23:01                   ` Peter Backes
  2018-06-04 12:24                     ` Philip Oakley
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-03 23:01 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Sun, Jun 03, 2018 at 11:28:43PM +0100, Philip Oakley wrote:
> It is here that Article 6 kicks in as to whether the 'organisation' can
> retain the data and continue to use it.

Article 6 is not about continuing to use data. Article 6 is about 
having and even obtaining it in the first place.

Article 17 and article 21 are about continuing to use data.

> For an open source project with an open source licence then an implicit DCO
> applies for the meta data. It is the legal basis for the release.

Neither article 6 nor 17 or 21 have anything remotely like an "implicit 
DCO" as a legitimization for publishing employee data.

The GDPR is very explicit about implicit stuff never being a basis for 
consent, if you want to imply that is your basis. And consent can be 
withdrawn at any time anyway.

An open source license has nothing whatsoever to do with the question 
of version control metadata. A public version control system is not 
necessary to publish open source software.

> > - copyright is about distributing the program, not about distributing
> > version control metadata.
> It is specifically about giving that right to copy by Jane Doe (but git gives
> no other information other than that supposedly globally unique 'author
> email').

I don't get what you are saying. As I said, a public version control 
system is not necessary to publish open source software. The two things 
may be intimately related in practice, but not in theory.

> > - Being named is a right, not an obligation of the author. Hence, if
> > the author doesn't want his name published, the company doesn't have
> > legitimate grounds based in copyright for doing it anyway, against his
> > or her will.
> Git for Open Source is about open licensing by name. I'd agree that a closed
> corporate licence stays closed, but not forgotten.

Again I don't get what you are saying. The author has a right to be 
named as the author, not an obligation. This has nothing whatsoever to 
do with the question of Open Source vs. closed corporate licenses.

> > Let's be honest: We do not know what legitimization exactly in each
> > specific case the git metadata is being distributed under.
> 
> We should know, already. A specific licence [or limit] should be in place.
> We don't really want to have to let a court decide ;-)

It is insufficient to have a license for distributing the program. The 
license is not a GDPR legitimization for git metadata. Distributing the 
program can be done without distributing the author's identity as part 
of the metadata of his commits.

> The law is never decided by technical means, unfortunately.

It is. The GDPR refers to the state of the art of technology without 
defining it. Thus, technical means are very important in the GDPR. This 
may be something new for lawyers. If technology changes tomorrow, even 
without anything else changing, you may be breaking the GDPR by this 
simple fact tomorrow, while not breaking it today.

Again: Technology is very important in the GDPR.

> Regular git users should have no issues - they just need to point 
> their finger at the responsible authority.

If git users are putting commits online for global download, they are 
the responsible authority.

> The DCO/GPL2 are the legitimate data record that recipients should have for
> their copy. There is no right to be forgotten at that point.

What do you mean by "should have for their copy"? Why shouldn't there 
be a right to be forgotten? Open Source Software has been distributed a 
lot without detailed version control history information. Having this 
information as a record is certainly in the interest of the recipient, 
but it is very very questionable that it is an overriding legitimate 
grounds as per Art. 17 for keeping that data.

> I see the solution to be elsewhere, and that it is in some ways a strawman
> discussion: "if someone has the right to be forgotten, how do we delete the
> meta data", when that right (to delete the meta data in a properly licensed
> repo) does not exist.

See, this kind of shady legal argument is what lawyers are selling you. 
Why not put the energy into designing a technical solution?

They tell you: "Ignore the GDPR. I will give you backup by giving you 
lots of disclaimers and excuses for doing so. Just give me a lot of 
money."

Having the ability to validate yet erase data from repositories is 
desirable from a technical point of view. It has a lot of uses, not 
necessarily only legal ones. The objection of efficiency raised by Ted 
is a valid one. The strawman argument is not.

Best wishes
Peter
-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 23:01                   ` Peter Backes
@ 2018-06-04 12:24                     ` Philip Oakley
  0 siblings, 0 replies; 53+ messages in thread
From: Philip Oakley @ 2018-06-04 12:24 UTC (permalink / raw)
  To: Peter Backes; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Hi Peter,
(lost the cc's)

From: "Peter Backes" <rtc@helen.PLASMA.Xg8.DE>
> On Sun, Jun 03, 2018 at 11:28:43PM +0100, Philip Oakley wrote:
>> It is here that Article 6 kicks in as to whether the 'organisation' can
>> retain the data and continue to use it.
>
> Article 6 is not about continuing to use data. Article 6 is about
> having and even obtaining it in the first place.

Correct, and that is the part I was referring to. Recipients of the
particular meta data require it for the licensing purpose. Thus they can
continue to have (and 'need') that data. It is that 'other side of the
fence' view I mentioned.

>
> Article 17 and article 21 are about continuing to use data.
>
>> For an open source project with an open source licence then an implicit DCO
>> applies for the meta data. It is the legal basis for the release.
>
> Neither article 6 nor 17 or 21 have anything remotely like an "implicit
> DCO" as a legitimization for publishing employee data.

I was referring to 'implicit' in a reverse direction; that is, the DCO
supports the legal basis to have and hold the data. The express licence
terms in the various open source licences give the permission, and this
becomes one of these legally conflicting aspects.

>
> The GDPR is very explicit about implicit stuff never being a basis for
> consent, if you want to imply that is your basis. And consent can be
> withdrawn at any time anyway.
>
> An open source license has nothing whatsoever to do with the question
> of version control metadata. A public version control system is not
> necessary to publish open source software.
>
>> > - copyright is about distributing the program, not about distributing
>> > version control metadata.
>> It is specifically about giving that right to copy by Jane Doe (but git gives
>> no other information other than that supposedly globally unique 'author
>> email').
>
> I don't get what you are saying. As I said, a public version control
> system is not necessary to publish open source software. The two things
> may be intimately related in practice, but not in theory.

Such is the law. It's the practice that is legal/illegal, decided in court
(if it gets there).

>
>> > - Being named is a right, not an obligation of the author. Hence, if
>> > the author doesn't want his name published, the company doesn't have
>> > legitimate grounds based in copyright for doing it anyway, against his
>> > or her will.
>> Git for Open Source is about open licensing by name. I'd agree that a closed
>> corporate licence stays closed, but not forgotten.
>
> Again I don't get what you are saying. The author has a right to be
> named as the author, not an obligation. This has nothing whatsoever to
> do with the question of Open Source vs. closed corporate licenses.
>

The question is which clause is being used to justify an action. Those
corporate organisations want a legal basis for holding data, not a voluntary
permission (because folk may try to rescind that permission...). Those in
open source want to ensure that their licence is a legal basis for other
folk to have copies, and that folk can show they have that permission.

Those with a personal data view will focus on the hope that they can remove
permission, especially for companies that are doing things they find
unacceptable, and maybe 'illegal' or unethical. The GDPR attempts to balance
the different sets of expectations, and the overlaps will need to be
negotiated. Different nations (and individuals) have different perceptions
as to what is normal and reasonable and thus focus on different aspects, not
appreciating the Competing Values that are present in the different
Frameworks of their Weltanschauung.

If a closed source corporate does publish their closed data, they have real
internal problems anyway regarding that contradiction!

>> > Let's be honest: We do not know what legitimization exactly in each
>> > specific case the git metadata is being distributed under.
>>
>> We should know, already. A specific licence [or limit] should be in place.
>> We don't really want to have to let a court decide ;-)
>
> It is insufficient to have a license for distributing the program. The
> license is not a GDPR legitimization for git metadata. Distributing the
> program can be done without distributing the author's identity as part
> of the metadata of his commits.
>
>> The law is never decided by technical means, unfortunately.
>
> It is. The GDPR refers to the state of the art of technology without
> defining it. Thus, technical means are very important in the GDPR. This
> may be something new for lawyers. If technology changes tomorrow, even
> without anything else changing, you may be breaking the GDPR by this
> simple fact tomorrow, while not breaking it today.
>

They will still argue about what is the state of the art, and that if the
art is hidden in some lab, then it's not available to meet the criteria.

> Again: Technology is very important in the GDPR.

We know quantum computing can crack the codes, but.... when does it become
the state of the art? SHA1 has been 'cracked' once in one special case, but
that doesn't make it state of the art for cracking a Git repo. It is a
problem about fooling some of the people some of the time which needs to
become [not fooling] most of the [appropriate] people most of the time. That
is what the owners should have known.

Some of this is, unfortunately, also about legal systems and their
approaches to law and evidence, so the UK may be responding differently from
Germany, or the USA, as to what even the words mean.

>
>> Regular git users should have no issues - they just need to point
>> their finger at the responsible authority.
>
> If git users are putting commits online for global download, they are
> the responsible authority.
>
>> The DCO/GPL2 are the legitimate data record that recipients should have for
>> their copy. There is no right to be forgotten at that point.
>
> What do you mean by "should have for their copy"? Why shouldn't there
> be a right to be forgotten?

It isn't an absolute GDPR right.

>           Open Source Software has been distributed a
> lot without detailed version control history information. Having this
> information as a record is certainly in the interest of the recipient,
> but it is very very questionable that it is an overriding legitimate
> grounds as per Art. 17 for keeping that data.

So your argument is that you/someone can make someone else guilty of an
offence by demanding they destroy evidence that proves their innocence?
>
>> I see the solution to be elsewhere, and that it is in some ways a strawman
>> discussion: "if someone has the right to be forgotten, how do we delete the
>> meta data", when that right (to delete the meta data in a properly licensed
>> repo) does not exist.
>
> See, this kind of shady legal argument is what lawyers are selling you.
> Why not put the energy into designing a technical solution.
>
> They tell you: "Ignore the GDPR. I will give you backup by giving you
> lots of disclaimers and excuses for doing so. Just give me a lot of
> money."

It's: make sure you understand all sides of the GDPR. There is a lot of FUD
from all sides.

>
> Having the ability to validate yet erase data from repositories is
> desirable from a technical point of view. It has a lot of uses, not
> necessarily only legal ones. The objection of efficiency raised by Ted
> is a valid one. The strawman argument is not.

Efficiency would not be a valid argument. A major annoyance, yes. Something
that likely would stop open source contributors working on it, yes. But just
as ALARP is used in safety to say spend what it takes, if slowing down the
processing is what it takes to meet the GDPR then do it, so companies (and
those that do the processing) would have to fund that.

Thanks

Philip

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 22:16                             ` Peter Backes
@ 2018-06-04 13:47                               ` Theodore Y. Ts'o
  2018-06-04 18:22                                 ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-04 13:47 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Mon, Jun 04, 2018 at 12:16:16AM +0200, Peter Backes wrote:
> 
> Verifying the commit ID by itself wouldn't be any less efficient than 
> before. Admittedly, it wouldn't verify the author and authordate 
> integrity anymore without additional work. That would be some overhead, 
> sure, and could be done on demand, and would mostly affect clones.

For people who are doing real work on git repos, other commands that
we very much care about include "git log --author=<authorname>", "git
tag --contains", "git blame", etc.

At least for any repo that *I* control, slow those down, and I
wouldn't downgrade my git binary/repo just to make some imperialistic
European bureaucrats happy.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-04 13:47                               ` Theodore Y. Ts'o
@ 2018-06-04 18:22                                 ` Peter Backes
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-04 18:22 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Mon, Jun 04, 2018 at 09:47:18AM -0400, Theodore Y. Ts'o wrote:
> For people who are doing real work on git repos, other commands that
> we very much care about include "git log --author=<authorname>", "git
> tag --contains", "git blame", etc.

I do not see how those, or anything but git clone (and even that only 
if author verification is requested) could possibly be affected in any 
significant way.

Best wishes
Peter


-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-03 17:46               ` Peter Backes
  2018-06-03 18:18                 ` Theodore Y. Ts'o
  2018-06-03 22:28                 ` Philip Oakley
@ 2018-06-07  1:38                 ` David Lang
  2018-06-07  6:32                   ` Peter Backes
  2 siblings, 1 reply; 53+ messages in thread
From: David Lang @ 2018-06-07  1:38 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

I'm going to take the risk of inserting actual real-world data into the mix 
rather than just speculation :-)

Here is an example of what the Rsyslog project is doing (main developers based 
in Germany). I'll say, as someone whose day job has been very involved with GDPR 
stuff recently, this looks like a very reasonable statement to me. But I am not 
a lawyer. I will also say that I think it would be very reasonable for projects 
to not accept code from someone who doesn't give them any way to contact them 
later in case there is a question about authorship or licensing.

David Lang

https://github.com/rsyslog/rsyslog/pull/2746/files

LEGAL GDPR NOTICE:
According to the European data protection laws (GDPR), we would like to make you
aware that contributing to rsyslog via git will permanently store the
name and email address you provide as well as the actual commit and the
time and date you made it inside git's version history. This is inevitable,
because it is a main feature git. If you are concerned about your
privacy, we strongly recommend to use

--author "anonymous <gdpr@example.com>"

together with your commit. Also please do NOT sign your commit in this case,
as that potentially could lead back to you. Please note that if you use your
real identity, the GDPR grants you the right to have this information removed
later. However, we have valid reasons why we cannot remove that information
later on. The reasons are:

* this would break git history and make future merges unworkable
* the rsyslog projects has legitimate interest to keep a permanent record of the
   contributor identity, once given, for
   - copyright verification
   - being able to provide proof should a malicious commit be made

Please also note that your commit is public and as such will potentially be
processed by many third-parties. Git's distributed nature makes it impossible
to track where exactly your commit, and thus your personal data, will be stored
and be processed. If you would not like to accept this risk, please do either
commit anonymously or refrain from contributing to the rsyslog project.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07  1:38                 ` David Lang
@ 2018-06-07  6:32                   ` Peter Backes
  2018-06-07 21:28                     ` Philip Oakley
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-07  6:32 UTC (permalink / raw)
  To: David Lang
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

Hi David,

thanks for your input on the issue.

> LEGAL GDPR NOTICE:
> According to the European data protection laws (GDPR), we would like to make you
> aware that contributing to rsyslog via git will permanently store the
> name and email address you provide as well as the actual commit and the
> time and date you made it inside git's version history. This is inevitable,
> because it is a main feature git.

As we can see, rsyslog tries to solve the issue by the already 
discussed legal "technology" of disclaimers (which is certainly not 
accepted as state of the art technology by the GDPR). In essence, they 
are giving excuses for why they are not honoring the right to be 
forgotten.

Disclaimers do not work. They have no legal effect, they are placebos.

The GDPR does not accept such excuses. If it did, companies could 
arbitrarily design their data storage such as to make it "the main 
feature" to not honor the right to be forgotten and/or other GDPR 
rights. It is obvious that this cannot work, as it would completely 
undermine those rights.

The GDPR honors technology as a means to protect the individual's 
rights, not as a means to subvert them.

> If you are concerned about your
> privacy, we strongly recommend to use
> 
> --author "anonymous <gdpr@example.com>"
> 
> together with your commit.

This can only be a solution if the project rejects any commits which 
are not anonymous.

> However, we have valid reasons why we cannot remove that information
> later on. The reasons are:
> 
> * this would break git history and make future merges unworkable

This is not a valid excuse (see above). The technology has to be 
designed or applied in such a way that the individual's rights are 
honored, not the other way around.

In the absence of other means, the project has to rewrite history if it 
gets a valid request from someone exercising his right to be forgotten, 
even if that causes a lot of hassle for everyone.

> * the rsyslog projects has legitimate interest to keep a permanent record of the
>   contributor identity, once given, for
>   - copyright verification
>   - being able to provide proof should a malicious commit be made

True, but that doesn't justify publishing that information and keeping 
it published even when someone exercises his right to be forgotten.

In that case, "legitimate interest" is not enough. There need to be 
"overriding legitimate grounds". I don't see them here.

> Please also note that your commit is public and as such will potentially be
> processed by many third-parties. Git's distributed nature makes it impossible
> to track where exactly your commit, and thus your personal data, will be stored
> and be processed. If you would not like to accept this risk, please do either
> commit anonymously or refrain from contributing to the rsyslog project.

This is one of those statements that ultimately say "we do not honor 
the GDPR; either accept that or don't submit". That's the old, arguably 
ignorant mentality, and won't stand.

The project has to have a legal basis for publishing the personal 
metadata contained in the repository. When in doubt, it needs to be consent 
based, as that is practically the only basis that allows putting the 
data on the internet for everyone to download. And consent can be 
withdrawn at any time.

The GDPR's transitional period started over two years ago. There was 
enough time to get everything GDPR compliant.

It might be possible to implement my solution without changing git, 
btw. Simply use the anonymous hash as author name, and store the random 
number and the author as a git-notes. git-notes can be rewritten or 
deleted at any time without changing the commit ID. I am currently 
looking into this solution. One just needs to add something that can 
verify and resolve those anonymous hashes.
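
A rough sketch of what such a verifier could look like, in Python. The
details are assumptions made only for illustration and are not a finished
design: the helper names make_pseudonym and verify_pseudonym are
hypothetical, and so are the choice of SHA-256 and the "random:author" input
format. The returned hash would be used as the author name in the commit,
while the random number and the real author would go into a note.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(author: str) -> tuple[str, str]:
    """Return (pseudonym, random_hex) for an author string.

    The pseudonym is meant for the commit's author field; random_hex and
    the author are meant for a separately stored, deletable note.
    """
    random_hex = secrets.token_hex(32)  # 256 bits of randomness
    digest = hashlib.sha256(f"{random_hex}:{author}".encode("utf-8"))
    return digest.hexdigest(), random_hex


def verify_pseudonym(pseudonym: str, random_hex: str, author: str) -> bool:
    """Check that the note contents really match the hash in the commit."""
    digest = hashlib.sha256(f"{random_hex}:{author}".encode("utf-8"))
    return hmac.compare_digest(pseudonym, digest.hexdigest())


author = "Jane Doe <jane@example.com>"  # placeholder identity
pseudonym, random_hex = make_pseudonym(author)
print(pseudonym)
print(verify_pseudonym(pseudonym, random_hex, author))
```

As long as the note exists, anyone can check that the author matches the
hash. Once the note is deleted, only the hash remains in the commit, and the
commit ID stays the same.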

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07  6:32                   ` Peter Backes
@ 2018-06-07 21:28                     ` Philip Oakley
  2018-06-07 22:34                       ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: Philip Oakley @ 2018-06-07 21:28 UTC (permalink / raw)
  To: Peter Backes, David Lang
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Hi Peter, David,

I thought that the legal notice (aka 'disclaimer') was pretty reasonable.

Some of Peter's fine distinctions may be technically valid, but that does 
not stop there being legal grounds. The proof of copyright is one such legal 
ground.

Unfortunately once one gets into legal nitpicking the wording becomes 
tortuous and helps no-one.

If one starts from an absolute "right to be forgotten" perspective one can 
demand that all evidence of wrongdoing, or of authority to do something, be 
forgotten. The GDPR allows such evidence to be retained.

I'll try and comment where I see the distinctions to be.

From: "Peter Backes" <rtc@helen.PLASMA.Xg8.DE>

> Hi David,
>
> thanks for your input on the issue.
>
>> LEGAL GDPR NOTICE:
>> According to the European data protection laws (GDPR), we would like to make you
>> aware that contributing to rsyslog via git will permanently store the
>> name and email address you provide as well as the actual commit and the
>> time and date you made it inside git's version history.

This is simply an information statement.

>       This is inevitable,
>> because it is a main feature git.

The "inevitable" word creates a point of argument within the GDPR. Removing 
the word (and 'because/main') brings the sentence back to being an informative 
statement without a GDPR claim.
>
> As we can see, rsyslog tries to solve the issue by the already
> discussed legal "technology" of disclaimers (which is certainly not
> accepted as state of the art technology by the GDPR). In essence, they
> are giving excuses for why they are not honoring the right to be
> forgotten.
>
> Disclaimers do not work. They have no legal effect, they are placebos.
>
> The GDPR does not accept such excuses. If it did, companies could
> arbitrarily design their data storage such as to make it "the main
> feature" to not honor the right to be forgotten and/or other GDPR
> rights. It is obvious that this cannot work, as it would completely
> undermine those rights.
>
> The GDPR honors technology as a means to protect the individual's
> rights, not as a means to subvert them.
>
>> If you are concerned about your
>> privacy, we strongly recommend to use
>>
>> --author "anonymous <gdpr@example.com>"
>>
>> together with your commit.

The [key] missing information here is whether rsyslog has a DCO (Developer 
Certificate of Origin) and what that contains.

The git.git DCO is here 
https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L304-L349

This will also help discriminate between the "name" part and the <unique> 
identifier, as both could be separately anonymised (given the right DCO). 
Thus it may be that the name is recorded as "anonymous", but with a 
<uid@known.place> that bridges the gap between legal evidence and the right 
to be forgotten.
>
> This can only be a solution if the project rejects any commits which
> are not anonymous.
>
>> However, we have valid reasons why we cannot remove that information
>> later on. The reasons are:
>>
>> * this would break git history and make future merges unworkable
>
> This is not a valid excuse (see above).

Within the GDPR, that is correct. It (breaking history validation), of 
itself, should not be the reason.

>      The technology has to be
> designed or applied in such a way that the individual's rights are
> honored, not the other way around.
>
> In absence of other means, the project has to rewrite history if it
> gets a valid request by someone exercising his right to be forgotten,
> even if that causes a lot of hassle for everyone.
>
>> * the rsyslog projects has legitimate interest to keep a permanent record of the
>>   contributor identity, once given, for
>>   - copyright verification
>>   - being able to provide proof should a malicious commit be made
>
> True, but that doesn't justify publishing that information and keeping
> it published even when someone exercises his right to be forgotten.

Publishing (the meta data) is *distinct* from having it.

However, publishing the content and its legal copyright is also associated 
with identifying the copyright holder (who has released it). This can be the 
uid if they hide behind a legal entity. This creates the catch-22 scenario. 
You either start off public and stay public, or you start off private and 
stay there.

Whether the rsyslog folk want to accept copyrighted work without appropriate 
legal release (who guards the guards, what's their badge number?) is part of 
the same information requirement.

Malicious intent makes the submission (commit) part of the legal evidence one 
needs to retain, so retaining it is supported by the GDPR.

>
> In that case, "legitimate interest" is not enough. There need to be
> "overriding legitimate grounds". I don't see them here.
>
>> Please also note that your commit is public and as such will potentially be
>> processed by many third-parties. Git's distributed nature makes it impossible
>> to track where exactly your commit, and thus your personal data, will be stored
>> and be processed. If you would not like to accept this risk, please do either
>> commit anonymously or refrain from contributing to the rsyslog project.

The onward publishing and release should be by reference to the DCO, and not 
that it's the Git way. As Peter notes, the 'Git way' (solely by itself) is 
no defence.
>
> This is one of those statements that ultimately say "we do not honor
> the GDPR; either accept that or don't submit". That's the old, arguably
> ignorant mentality, and won't stand.
>
> The project has to have a legal basis for publishing the personal
> metadata contained in the repository. In doubt, it needs to be consent
> based, as that is practically the only basis that allows putting the
> data on the internet for everyone to download. And consent can be
> withdrawn at any time.
>
> The GDPR's transitional period started over two years ago. There was
> enough time to get everything GDPR compliant.
>
> It might be possible to implement my solution without changing git,
> btw. Simply use the anonymous hash as author name, and store the random
> number and the author as a git-notes. git-notes can be rewritten or
> deleted at any time without changing the commit ID. I am currently
> looking into this solution. One just needs to add something that can
> verify and resolve those anonymous hashes.
>
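
The git-notes idea quoted above can be sketched roughly as follows. This is 
only an illustration under assumptions: SHA-256 as the hash, the "salt:author" 
layout, and the function names make_pseudonym and verify_pseudonym are all 
hypothetical and not part of git or of Peter's proposal as posted. The 
pseudonym would go into the commit's author field, while the random number 
and the real identity would live in a note that can be deleted later.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(author):
    """Return (pseudonym, salt) for an author identity string.

    Assumption: the pseudonym is SHA-256 over "salt:author". The salt is
    the "random number" that would be stored, with the author, in a note.
    """
    salt = secrets.token_hex(16)  # 128 bits of randomness, hex encoded
    digest = hashlib.sha256(f"{salt}:{author}".encode("utf-8")).hexdigest()
    return digest, salt


def verify_pseudonym(pseudonym, salt, author):
    """Check that (salt, author) really resolves to the given pseudonym."""
    expected = hashlib.sha256(f"{salt}:{author}".encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, pseudonym)


if __name__ == "__main__":
    ident = "A U Thor <author@example.com>"
    pseudonym, salt = make_pseudonym(ident)
    print("author field:", pseudonym)
    print("resolves while the note exists:",
          verify_pseudonym(pseudonym, salt, ident))
```

Once the note holding the salt and the identity is removed, the hash left in 
the commit can no longer be checked against a candidate name without the 
salt, and the commit ID itself does not change.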

To me, the key legal document is the DCO (and the law on which it stands). 
It is that which either conveys the rights, or does not. If the DCO is too 
loose then folk will be able to walk away from their malign code, and demand 
that someone else take responsibility for protecting them from it.

Philip 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07 21:28                     ` Philip Oakley
@ 2018-06-07 22:34                       ` Peter Backes
  2018-06-07 22:38                         ` David Lang
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-07 22:34 UTC (permalink / raw)
  To: Philip Oakley
  Cc: David Lang, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Thu, Jun 07, 2018 at 10:28:47PM +0100, Philip Oakley wrote:
> Some of Peter's fine distinctions may be technically valid, but that does
> not stop there being legal grounds. The proof of copyright is a legal
> grounds.

Again: The GDPR certainly allows you to keep a proof of copyright 
privately if you have it. However, it does not allow you to keep 
publishing it if someone exercises his right to be forgotten.

There is simply no justification for publishing against the explicit 
will of the subject, except for the rare circumstances where there are 
overriding legitimate grounds for doing so. I hardly see those for the 
average author entry in your everyday git repo. Such a justification is 
extremely fragile.

> Unfortunately once one gets into legal nitpicking the wording becomes
> tortuous and helps no-one.

That's not nitpicking. If what you say were true, the GDPR would be 
without any practical validity at all.

> If one starts from an absolute "right to be forgotten" perspective one can
> demand all evidence of wrongdoing, or authority to do something, be
> forgotten. The GDPR has the right to retain such evidence.

Yes, but not to keep it published.

> I'll try and comment where I see the distinctions to be.

You're essentially repeating what you already said there.

> Publishing (the meta data) is *distinct* from having it.

Absolutely right. That is my point.

> You either start off public and stay public, or you start off private and
> stay there.

Nope. The GDPR says you have to go from public to private if the 
subject wishes so and there are no overriding legitimate grounds.

That is the entire purpose of the GDPR's right to be forgotten.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07 22:34                       ` Peter Backes
@ 2018-06-07 22:38                         ` David Lang
  2018-06-07 23:21                           ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: David Lang @ 2018-06-07 22:38 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, 8 Jun 2018, Peter Backes wrote:

> On Thu, Jun 07, 2018 at 10:28:47PM +0100, Philip Oakley wrote:
>> Some of Peter's fine distinctions may be technically valid, but that does
>> not stop there being legal grounds. The proof of copyright is a legal
>> grounds.
>
> Again: The GDPR certainly allows you to keep a proof of copyright
> privately if you have it. However, it does not allow you to keep
> publishing it if someone exercises his right to be forgotten.

someone is granting the world the right to use the code and you are claiming 
that the evidence that they have granted this right is illegal to have?

the GDPR recognizes that there are legal reasons why records need to be kept and 
does not insist that they be deleted.

you can't sign a deal to buy something, then insist that the GDPR allows your 
name to be removed from the contract.

And you are incorrect to say that the GDPR lets you keep records privately and 
only applies to publishing them. The GDPR is specifically targeted at companies 
like Facebook and Google that want to keep lots of data privately. It does no 
good to ask Facebook to not publish your info, they don't want to publish it in 
the first place, they want to keep it internally and use it.

David Lang


> There is simply no justification for publishing against the explicit
> will of the subject, except for the rare circumstances where there are
> overriding legitimate grounds for doing so. I hardly see those for the
> average author entry in your everyday git repo. Such a justification is
> extremely fragile.
>
>> Unfortunately once one gets into legal nitpicking the wording becomes
>> tortuous and helps no-one.
>
> That's not nitpicking. If what you say were true, the GDPR would be
> without any practical validity at all.
>
>> If one starts from an absolute "right to be forgotten" perspective one can
>> demand all evidence of wrongdoing, or authority to do something, be
>> forgotten. The GDPR has the right to retain such evidence.
>
> Yes, but not to keep it published.
>
>> I'll try and comment where I see the distinctions to be.
>
> You're essentially repeating what you already said there.
>
>> Publishing (the meta data) is *distinct* from having it.
>
> Absolutely right. That is my point.
>
>> You either start off public and stay public, or you start off private and
>> stay there.
>
> Nope. The GDPR says you have to go from public to private if the
> subject wishes so and there are no overriding legitimate grounds.
>
> That is the entire purpose of the GDPR's right to be forgotten.
>
> Best wishes
> Peter
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07 22:38                         ` David Lang
@ 2018-06-07 23:21                           ` Peter Backes
  2018-06-07 23:53                             ` David Lang
  2018-06-08  2:53                             ` Theodore Y. Ts'o
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-07 23:21 UTC (permalink / raw)
  To: David Lang
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
> > Again: The GDPR certainly allows you to keep a proof of copyright
> > privately if you have it. However, it does not allow you to keep
> > publishing it if someone exercises his right to be forgotten.
> someone is granting the world the right to use the code and you are claiming
> that the evidence that they have granted this right is illegal to have?

Hell no! Please read what I wrote:

- "allows you to keep a proof ... privately"
- "However, it does not allow you to keep publishing it"

> And you are incorrect to say that the GDPR lets you keep records privately
> and only applies to publishing them. The GDPR is specifically targeted at
> companies like Facebook and Google that want to keep lots of data privately.
> It does no good to ask Facebook to not publish your info, they don't want to
> publish it in the first place, they want to keep it internally and use it.

How can you misunderstand so badly what I wrote?

Sure the GDPR does not let you keep records privately at will. You 
ultimately need to have overriding legitimate grounds for doing so. 

However, overriding legitimate grounds for keeping private records are 
rarely overriding legitimate grounds for publishing them.

In case of git history metadata, for publishing, you may have consent 
or even legitimate interests, but not overriding legitimate grounds. 
For keeping a private copy of the metadata, you probably have 
overriding legitimate grounds, however.

The GDPR is not an "all or nothing" thing.

Facebook and Google certainly do not have overriding legitimate grounds 
for most of the data they keep privately.

Is that so hard to understand?

Best wishes
Peter
-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07 23:21                           ` Peter Backes
@ 2018-06-07 23:53                             ` David Lang
  2018-06-08  6:16                               ` Peter Backes
  2018-06-08  2:53                             ` Theodore Y. Ts'o
  1 sibling, 1 reply; 53+ messages in thread
From: David Lang @ 2018-06-07 23:53 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, 8 Jun 2018, Peter Backes wrote:

> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
>>> Again: The GDPR certainly allows you to keep a proof of copyright
>>> privately if you have it. However, it does not allow you to keep
>>> publishing it if someone exercises his right to be forgotten.
>> someone is granting the world the right to use the code and you are claiming
>> that the evidence that they have granted this right is illegal to have?
>
> Hell no! Please read what I wrote:
>
> - "allows you to keep a proof ... privately"
> - "However, it does not allow you to keep publishing it"
>
>> And you are incorrect to say that the GDPR lets you keep records privately
>> and only applies to publishing them. The GDPR is specifically targeted at
>> companies like Facebook and Google that want to keep lots of data privately.
>> It does no good to ask Facebook to not publish your info, they don't want to
>> publish it in the first place, they want to keep it internally and use it.
>
> How can you misunderstand so badly what I wrote?
>
> Sure the GDPR does not let you keep records privately at will. You
> ultimately need to have overriding legitimate grounds for doing so.
>
> However, overriding legitimate grounds for keeping private records are
> rarely overriding legitimate grounds for publishing them.

the license is granted to the world, so the world has an interest in it.

Unless you are going to argue that the GDPR outlawed open source development.

> In case of git history metadata, for publishing, you may have consent
> or even legitimate interests, but not overriding legitimate grounds.
> For keeping a private copy of the metadata, you probably have
> overriding legitimate grounds, however.
>
> The GDPR is not an "all or nothing" thing.
>
> Facebook and Google certainly do not have overriding legitimate grounds
> for most of the data they keep privately.
>
> Is that so hard to understand?

you are the one arguing that the GDPR prohibits Git from storing and revealing 
this license granting data, not me.

David Lang

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07 23:21                           ` Peter Backes
  2018-06-07 23:53                             ` David Lang
@ 2018-06-08  2:53                             ` Theodore Y. Ts'o
  2018-06-08  6:26                               ` Peter Backes
                                                 ` (2 more replies)
  1 sibling, 3 replies; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-08  2:53 UTC (permalink / raw)
  To: Peter Backes
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, Jun 08, 2018 at 01:21:29AM +0200, Peter Backes wrote:
> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
> > > Again: The GDPR certainly allows you to keep a proof of copyright
> > > privately if you have it. However, it does not allow you to keep
> > > publishing it if someone exercises his right to be forgotten.
> > someone is granting the world the right to use the code and you are claiming
> > that the evidence that they have granted this right is illegal to have?
> 
> Hell no! Please read what I wrote:
> 
> - "allows you to keep a proof ... privately"
> - "However, it does not allow you to keep publishing it"

The problem is you've left undefined who is "you"?  With an open
source project, anyone who has contributed to an open source project has
a copyright interest.  That hobbyist in Germany who submitted a patch?
They have a copyright interest.  That US Company based in Redmond,
Washington?  They own a copyright interest.  Huawei in China?  They
have a copyright interest.

So there is no "privately".  And "you" numbers in the thousands and
thousands of copyright holders of portions of the open source code.

And of course, that's the other thing you seem to fundamentally not
understand about how git works.  Every developer in the world working
on that open source project has their own copy.  There is
fundamentally no way that you can expunge that information from every
single git repository in the world.  You can remove a git note from a
single repository.  But that doesn't affect my copy of the repository
on my laptop.  And if I push that repository to my server, the git note
will be out there for the whole world to see.

So someone could *try* sending a public request to the entire world,
saying, "I am a European and I demand that you disassociate commit
DEADBEF12345 from my name".  They could try serving legal papers on
everyone.  But at this point, it's going to trigger something called
the "Streisand Effect".  If you haven't heard of it, I suggest you
look it up:

http://mentalfloss.com/article/67299/how-barbra-streisand-inspired-streisand-effect

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-07 23:53                             ` David Lang
@ 2018-06-08  6:16                               ` Peter Backes
  2018-06-08  7:42                                 ` David Lang
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-08  6:16 UTC (permalink / raw)
  To: David Lang
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Thu, Jun 07, 2018 at 04:53:16PM -0700, David Lang wrote:
> the license is granted to the world, so the world has an interest in it.

Certainly, but you need to have overriding legitimate grounds. An 
interest is not enough for justification. You have to weigh your 
interests against those of the subject.

> Unless you are going to argue that the GDPR outlawed open source 
> development.

No it certainly did not and I don't see how it could.

All the GDPR arguably demands is that the author's identity is deleted 
from a public repository if he wishes so.

Just assume it was a CVS repo. Then removal would not be any issue at 
all. It is a technical speciality of git that makes the removal so 
intricate to implement, which is not at all an intrinsic property of 
open source development.
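
A minimal sketch of why this is so for git: a commit ID is the SHA-1 of the 
commit object, and the author line is part of that object, so changing the 
author changes the ID (and, through the parent links, the ID of every later 
commit). The helper name git_commit_id and the example identities and 
timestamps below are made up for illustration; the object layout follows 
git's documented SHA-1 commit format for an unsigned commit.

```python
import hashlib

# ID of git's well-known empty tree object (SHA-1 repositories).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_commit_id(tree, author, committer, message, parents=()):
    """Compute the SHA-1 ID git would assign to a simple, unsigned commit."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # Objects are hashed with a "<type> <size>\0" header in front.
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    stamp = "1528437600 +0200"
    named = f"A U Thor <author@example.com> {stamp}"
    erased = f"anonymous <anonymous@example.invalid> {stamp}"
    print(git_commit_id(EMPTY_TREE, named, named, "Initial commit\n"))
    print(git_commit_id(EMPTY_TREE, erased, erased, "Initial commit\n"))
```

The two printed IDs differ although tree, message and timestamps are 
identical, which is exactly the rewrite problem discussed in this thread.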

> you are the one arguing that the GDPR prohibits Git from storing and
> revealing this license granting data, not me.

It prohibits publishing, and only after a request to be forgotten. It 
does not prohibit storing your private copy.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08  2:53                             ` Theodore Y. Ts'o
@ 2018-06-08  6:26                               ` Peter Backes
  2018-06-08  8:13                                 ` Ævar Arnfjörð Bjarmason
  2018-06-08 14:45                                 ` Theodore Y. Ts'o
  2018-06-08 22:09                               ` Johannes Sixt
  2018-06-09 22:50                               ` Philip Oakley
  2 siblings, 2 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-08  6:26 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote:
> The problem is you've left undefined who is "you"?  With an open
> source project, anyone who has contributed to an open source project has
> a copyright interest.  That hobbyist in Germany who submitted a patch?
> They have a copyright interest.  That US Company based in Redmond,
> Washington?  They own a copyright interest.  Huawei in China?  They
> have a copyright interest.
> 
> So there is no "privately".  And "you" numbers in the thousands and
> thousands of copyright holders of portions of the open source code.

Of course there is "privately". Every single one of those who have the 
author information can keep it, privately, for themselves. But those 
that have received a request to be forgotten must not keep publishing 
it on the Internet for download or distribute it to others.

> And of course, that's the other thing you seem to fundamentally not
> understand about how git works.  Every developer in the world working
> on that open source project has their own copy.  There is
> fundamentally no way that you can expunge that information from every
> single git repository in the world.

The misunderstanding is on your side.

If you run a website where the world can access a repository, you are 
responsible for obeying the GDPR with respect to that repository. If 
you receive a request to be forgotten, you have to make sure you stop 
publishing that author's identity as part of the repository.

You do NOT need to

- delete it from a private copy you have
- care about others who publish that data
- or even make sure the data is deleted from private copies others may 
have, even if the number of copies is in the thousands.

In practical terms, if someone wishes to exercise his right to be 
forgotten, he will usually send the request to the maintainer and stop 
him from distributing the information, and perhaps to a third party he 
might use as a platform for publication, such as github.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08  6:16                               ` Peter Backes
@ 2018-06-08  7:42                                 ` David Lang
  2018-06-08 11:58                                   ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: David Lang @ 2018-06-08  7:42 UTC (permalink / raw)
  To: Peter Backes
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, 8 Jun 2018, Peter Backes wrote:

>> you are the one arguing that the GDPR prohibits Git from storing and
>> revealing this license granting data, not me.
>
> It prohibits publishing, and only after a request to be forgotten. It
> does not prohibit storing your private copy.

Wrong, if you have to delete info, you are not allowed to keep a private copy. 
There is _nothing_ in the GDPR about publishing information, everything in it is 
about what you are allowed to store privately, how you are required to protect 
it (or more precisely, what you are required to do if private data gets hacked), 
and how you are required to keep it available.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08  6:26                               ` Peter Backes
@ 2018-06-08  8:13                                 ` Ævar Arnfjörð Bjarmason
  2018-06-08 12:03                                   ` Peter Backes
  2018-06-08 14:45                                 ` Theodore Y. Ts'o
  1 sibling, 1 reply; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08  8:13 UTC (permalink / raw)
  To: Peter Backes
  Cc: Theodore Y. Ts'o, David Lang, Philip Oakley, Git Mailing List

On Fri, Jun 08 2018, Peter Backes wrote:

> On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote:
>> The problem is you've left undefined who is "you"?  With an open
>> source project, anyone who has contributed to an open source project has
>> a copyright interest.  That hobbyist in Germany who submitted a patch?
>> They have a copyright interest.  That US Company based in Redmond,
>> Washington?  They own a copyright interest.  Huawei in China?  They
>> have a copyright interest.
>>
>> So there is no "privately".  And "you" numbers in the thousands and
>> thousands of copyright holders of portions of the open source code.
>
> Of course there is "privately". Every single one of those who have the
> author information can keep it, privately, for themselves. But those
> that have received a request to be forgotten must not keep publishing
> it on the Internet for download or distribute it to others.

Can you walk us through how anyone would be expected to fork (as in
creating a new project, not the github-ism) existing projects under such
a regime?

E.g. in git.git we have SOB lines for the whole history, in lieu of
GNU-style copyright assignment (which is how things mainly worked back
in the CVS days) someone can just clone the repository and create a
hostile fork, which is one of the central ideas of free software.

In the world you're describing the history would have been expunged
publicly, and no hosting site would be willing to host it. It might be
gone in practical terms to anyone who just doesn't like how (in this
example) the Git project is run, and thinks they can do it better.

Maybe (again, in this example) the Software Freedom Conservancy's scope
would have to expand to retain this private history (right now they have
nothing to do with copyright).

But then how am I going to fork the Git project if the SFC decides they
don't want to cooperate with me?

As David Lang notes upthread, "the license is granted to the world, so
the world has an interest in it". I wouldn't be so sure that this line
of argument wouldn't work.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08  7:42                                 ` David Lang
@ 2018-06-08 11:58                                   ` Peter Backes
  2018-06-08 18:51                                     ` David Lang
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-08 11:58 UTC (permalink / raw)
  To: David Lang
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote:
> Wrong, if you have to delete info, you are not allowed to keep a private
> copy.

Yes you are allowed. See Art. 17 (3) lit e GDPR.

> There is _nothing_ in the GDPR about publishing information,
> everything in it is about what you are allowed to store privately, how you
> are required to protect it (or more precisely, what you are required to do
> if private data gets hacked), and how you are required to keep it available.

Nope, the GDPR is not at all restricted to private copies.

The GDPR has special jargon for publishing; the GDPR calls it 
"disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified 
recipients (Art. 4 (9) GDPR), including ones in third countries 
(Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion".

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08  8:13                                 ` Ævar Arnfjörð Bjarmason
@ 2018-06-08 12:03                                   ` Peter Backes
  2018-06-08 22:53                                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Backes @ 2018-06-08 12:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Theodore Y. Ts'o, David Lang, Philip Oakley, Git Mailing List

On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Can you walk us through how anyone would be expected to fork (as in
> creating a new project, not the github-ism) existing projects under such
> a regime?

I don't see your point. Copy the repository to fork. Nothing changes 
about that. Nothing prevents anyone from forking a repository which had 
some of its author names removed from the commits.

> As David Lang notes upthread, "the license is granted to the world, so
> the world has an interest in it". I wouldn't be so sure that this line
> of argument wouldn't work.

As I already stressed, having an interest is not enough. You need to 
have overriding legitimate grounds.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08  6:26                               ` Peter Backes
  2018-06-08  8:13                                 ` Ævar Arnfjörð Bjarmason
@ 2018-06-08 14:45                                 ` Theodore Y. Ts'o
  2018-06-08 16:02                                   ` Peter Backes
  1 sibling, 1 reply; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-08 14:45 UTC (permalink / raw)
  To: Peter Backes
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, Jun 08, 2018 at 08:26:57AM +0200, Peter Backes wrote:
> 
> If you run a website where the world can access a repository, you are 
> responsible for obeying the GDPR with respect to that repository. If 
> you receive a request to be forgotten, you have to make sure you stop 
> publishing that author's identity as part of the repository.
>

*Anyone* can run a repository.  It's not just github and gitlab.  The
hobbyist in New Zealand, who might never visit Europe (so she can't
be arrested when she visits the fair shores of Europe) and who has no
business interests in Europe, can host such a web site.

So the person trying to engage in censorship would need to contact
*everyone*.  And someone who has a git note in their private repo who
then pushes to github/gitlab would end up pushing that note back up to
the web server.

> You do NOT need to
> 
> - delete it from a private copy you have
> - care about others who publish that data
> - or even make sure the data is deleted from private copies others may 
> have, even if the number of copies is in the thousands.

Great, so you can get github and gitlab to get rid of the information.
But it's *pointless*.  And given that real developers really do care
about who authored a patch, and regularly will do operations that
reference the authorship information, the fact that it is stored
somewhere else (e.g., in a git note, per your proposal), *will* slow
down those operations.

> In practical terms, if someone wishes to exercise his right to be 
> forgotten, he will usually send the request to the maintainer and stop 
> him from distributing the information, and perhaps to a third party he 
> might use as a platform for publication, such as github.

Your problem is in the word: "a"

							- Ted

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08 14:45                                 ` Theodore Y. Ts'o
@ 2018-06-08 16:02                                   ` Peter Backes
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-08 16:02 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, Jun 08, 2018 at 10:45:51AM -0400, Theodore Y. Ts'o wrote:
> *Anyone* can run a repository.  It's not just github and gitlab.  The
> hobbyist in New Zealand, who might never visit Europe (so she can't
> be arrested when she visits the fair shores of Europe) and who has no
> business interests in Europe, can host such a web site.

Just because letters of request are hardly enforced doesn't make it 
legal to break the GDPR. For sure, a hobbyist would not have much to 
fear, even if he is violating the GDPR and coming to Europe. The GDPR 
is mostly about taming the megacorporations, not about arresting 
tourists.

> So the person trying to engage in censorship

Censorship? The GDPR is not about censorship.

If you want to write an opinion about someone by name, the GDPR gives 
you all legitimization to do so, against that person's will.

This is about removing the data under ordinary circumstances.

> would need to contact *everyone*.

This is the subject's problem, not the repository provider's.

> And someone who has a git note in their private repo who
> then pushes to github/gitlab would end up pushing that note back up to
> the web server.

If that note has been deleted based on the right to be forgotten, you 
as the repository provider have to make sure you don't publish it 
again. Since you are allowed to keep a private copy, ensuring that 
shouldn't be a problem for you. 

> Great, so you can get github and gitlab to get rid of the information.
> But it's *pointless*.

It's up to the subject to consider it pointless or not to exercise his 
rights...

> Your problem is in the word: "a"

...and against whom, whether one repository provider, the major ones, 
all of them he can find.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: GDPR compliance best practices?
  2018-06-08 11:58                                   ` Peter Backes
@ 2018-06-08 18:51                                     ` David Lang
  2018-06-12 18:56                                       ` David Lang
  0 siblings, 1 reply; 53+ messages in thread
From: David Lang @ 2018-06-08 18:51 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, 8 Jun 2018, Peter Backes wrote:

> On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote:
>> Wrong, if you have to delete info, you are not allowed to keep a private
>> copy.
>
> Yes you are allowed. See Art. 17 (3) lit e GDPR.
>
>> There is _nothing_ in the GDPR about publishing information,
>> everything in it is about what you are allowed to store privately, how you
>> are required to protect it (or more precisely, what you are required to do
>> if private data gets hacked), and how you are required to keep it available.
>
> Nope, the GDPR is not at all restricted to private copies.

If the GDPR doesn't restrict private copies, then Google and Facebook are free 
to keep all data about everyone. That is explicitly what the GDPR is trying to 
prevent.

> The GDPR has special jargon for publishing; the GDPR calls it
> "disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified
> recipients (Art. 4 (9) GDPR), including ones in third countries
> (Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion".

Disclosure is what the person who submits the patch is doing. Torturing the
language of the GDPR to say that hanging on to data that people want you to
delete is legal, while echoing public data that people have asked to be public
is not legal, is not going to be a successful line of argument. It's the exact
opposite of the stated goals of the GDPR.

David Lang

* Re: GDPR compliance best practices?
  2018-06-08  2:53                             ` Theodore Y. Ts'o
  2018-06-08  6:26                               ` Peter Backes
@ 2018-06-08 22:09                               ` Johannes Sixt
  2018-06-09 22:50                               ` Philip Oakley
  2 siblings, 0 replies; 53+ messages in thread
From: Johannes Sixt @ 2018-06-08 22:09 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Peter Backes, David Lang, Philip Oakley,
	Ævar Arnfjörð Bjarmason, Git Mailing List

Am 08.06.2018 um 04:53 schrieb Theodore Y. Ts'o:
> And of course, that's the other thing you seem to fundamentally not
> understand about how git works.  Every developer in the world working
> on that open source project has their own copy.

Everyone here understands how Git works, of course.

"*shrug* but that's how Git works" does *NOT* override the GDPR.

-- Hannes

* Re: GDPR compliance best practices?
  2018-04-17 19:15 GDPR compliance best practices? Peter Backes
  2018-04-17 21:38 ` Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:42 ` Jonathan Nieder
  2018-06-08 23:00   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 53+ messages in thread
From: Jonathan Nieder @ 2018-06-08 22:42 UTC (permalink / raw)
  To: Peter Backes; +Cc: Git Mailing List, Theodore Y. Ts'o

Hi,

Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.

This is a reasonable question to ask other Git users on this list to
share ideas, so thanks for asking it.

> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?

Later in the thread you discussed some changes you would like to make
to Git or in front of Git to ensure that people can erase their
authorship information from a repository after the fact in a
non-disruptive way.

I have no opinion about how that relates to GDPR requirements.  I tend
to expect any legal advice a person gets to be situation-specific;
it's much harder to get legal advice that is useful to share.

Separate from that legal context, though, I think it's an interesting
feature request.  I don't think it goes far enough: I would like a way
to erase arbitrary information from the history in a repository.  For
example, if I accidentally check in an encryption key in my repository
as content or a commit message, I would like a way to remove it,
assuming that others who fetch from the same repo are willing to
cooperate with me, of course (i.e. in place of the object, the server
would store a placeholder and an _advisory_ token allowing clients to
know (1) that this object was deleted, (2) what object to use instead,
and (3) an explanatory note about why the deletion occurred; clients
could make whatever use of this information they choose).
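
The advisory token described above could be sketched roughly as follows.
This is only an illustration in Python, not an existing Git feature: the
names DeletionNotice and resolve, and the idea of keeping notices in a
dictionary keyed by object id, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DeletionNotice:
    """Hypothetical advisory record stored in place of a deleted object."""
    deleted_oid: str                # (1) the object that was deleted
    replacement_oid: Optional[str]  # (2) what object to use instead, if any
    note: str                       # (3) why the deletion occurred


def resolve(oid: str, notices: Dict[str, DeletionNotice],
            honor_notices: bool = True) -> Optional[str]:
    """Follow advisory notices from oid to its final replacement.

    The notices are advisory only: a client that passes
    honor_notices=False keeps using the original object id.
    Returns None if the object was deleted without a substitute.
    """
    seen = set()
    while honor_notices and oid in notices and oid not in seen:
        seen.add(oid)  # guard against notices that point in a circle
        replacement = notices[oid].replacement_oid
        if replacement is None:
            return None
        oid = replacement
    return oid
```

A cooperating client would call resolve() before using an object, while a
client that chooses not to cooperate ignores the notices entirely, which is
the "advisory" part of the idea.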

I've seen some discussion on this subject at
https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html
long ago and have some ideas of my own, but nothing concrete yet.
Anyway, I thought it might be useful to get people's minds working on
it.

Thanks,
Jonathan

* Re: GDPR compliance best practices?
  2018-06-08 12:03                                   ` Peter Backes
@ 2018-06-08 22:53                                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:53 UTC (permalink / raw)
  To: Peter Backes
  Cc: Theodore Y. Ts'o, David Lang, Philip Oakley, Git Mailing List

On Fri, Jun 08 2018, Peter Backes wrote:

> On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Can you walk us through how anyone would be expected to fork (as create
>> a new project, not the github-ism) existing projects under such a
>> regime?
>
> I don't see your point. Copy the repository to fork. Nothing changes
> about that. Nothing prevents anyone from forking a repository which had
> some of its author names removed from the commits.

This is basically the same as saying the whole notion of Signed-off-by
should be abandoned entirely, since in this case the fork will only have
a partial set of these.

The point is that we're recording information so each line in the
repository can be traced back to a SOB.

These sorts of take-downs would destroy that information, and the
proposed solution of having some party retain these creates a special
class of free software users who are capable of following that line of
attributions.
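
For readers unfamiliar with the convention, the information in question is
the Signed-off-by trailer at the end of each commit message. A minimal
Python sketch of pulling those trailers out of a message follows; the
function name sign_offs and the example identities are made up for
illustration, and the pattern is deliberately simpler than Git's own
trailer parsing.

```python
import re

# One trailer per line: "Signed-off-by: Name <email>"
SOB = re.compile(
    r"^Signed-off-by:[ \t]*(.+?)[ \t]*<([^<>]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def sign_offs(message):
    """Return (name, email) pairs for each Signed-off-by line in message."""
    return [(name, email) for name, email in SOB.findall(message)]


example = (
    "Fix off-by-one in parser\n"
    "\n"
    "Signed-off-by: A U Thor <author@example.com>\n"
    "Signed-off-by: C O Mitter <committer@example.com>\n"
)
print(sign_offs(example))
```

If a take-down removes or blanks these lines in the published history, a
fork made from that history simply has nothing for such a function to find.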

* Re: GDPR compliance best practices?
  2018-06-08 22:42 ` Jonathan Nieder
@ 2018-06-08 23:00   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 23:00 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Peter Backes, Git Mailing List, Theodore Y. Ts'o, Joey Hess


On Fri, Jun 08 2018, Jonathan Nieder wrote:

> Separate from that legal context, though, I think it's an interesting
> feature request.  I don't think it goes far enough: I would like a way
> to erase arbitrary information from the history in a repository.  For
> example, if I accidentally check in an encryption key in my repository
> as content or a commit message, I would like a way to remove it,
> assuming that others who fetch from the same repo are willing to
> cooperate with me, of course (i.e. in place of the object, the server
> would store a placeholder and an _advisory_ token allowing clients to
> know (1) that this object was deleted, (2) what object to use instead,
> and (3) an explanatory note about why the deletion occurred; clients
> could make whatever use of this information they choose).
>
> I've seen some discussion on this subject at
> https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html
> long ago and have some ideas of my own, but nothing concrete yet.
> Anyway, I thought it might be useful to get people's minds working on
> it.

You may find it interesting to look at how git-annex-forget does this:
https://git-annex.branchable.com/git-annex-forget/ &
http://git-annex.branchable.com/devblog/day_-4__forgetting/

* Re: GDPR compliance best practices?
  2018-06-08  2:53                             ` Theodore Y. Ts'o
  2018-06-08  6:26                               ` Peter Backes
  2018-06-08 22:09                               ` Johannes Sixt
@ 2018-06-09 22:50                               ` Philip Oakley
  2018-06-10  1:41                                 ` Theodore Y. Ts'o
  2 siblings, 1 reply; 53+ messages in thread
From: Philip Oakley @ 2018-06-09 22:50 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Peter Backes
  Cc: David Lang, Ævar Arnfjörð Bjarmason,
	Git Mailing List

From: "Theodore Y. Ts'o" <tytso@mit.edu>
Sent: Friday, June 08, 2018 3:53 AM
> On Fri, Jun 08, 2018 at 01:21:29AM +0200, Peter Backes wrote:
>> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
>> > > Again: The GDPR certainly allows you to keep a proof of copyright
>> > > privately if you have it. However, it does not allow you to keep
>> > > publishing it if someone exercises his right to be forgotten.
>> > someone is granting the world the right to use the code and you are
>> > claiming
>> > that the evidence that they have granted this right is illegal to have?
>>
>> Hell no! Please read what I wrote:
>>
>> - "allows you to keep a proof ... privately"
>> - "However, it does not allow you to keep publishing it"
>
> The problem is you've left undefined who is "you"?  With an open
> source project, anyone who has contributed to an open source project has
> a copyright interest.  That hobbyist in Germany who submitted a patch?
> They have a copyright interest.  That US Company based in Redmond,
> Washington?  They own a copyright interest.  Huawei in China?  They
> have a copyright interest.
>
> So there is no "privately".  And "you" numbers in the thousands and
> thousands of copyright holders of portions of the open source code.
>
> And of course, that's the other thing you seem to fundamentally not
> understand about how git works.  Every developer in the world working
> on that open source project has their own copy.  There is
> fundamentally no way that you can expunge that information from every
> single git repository in the world.  You can remove a git note from a
> single repository.  But that doesn't affect my copy of the repository
> on my laptop.  And if I push that repository to my server, its git note
> will be out there for the whole world to see.
>
> So someone could *try* sending a public request to the entire world,
> saying, "I am a European and I demand that you disassociate commit
> DEADBEF12345 from my name".  They could try serving legal papers on
> everyone.  But at this point, it's going to trigger something called
> the "Streisand Effect".  If you haven't heard of it, I suggest you
> look it up:
>
> http://mentalfloss.com/article/67299/how-barbra-streisand-inspired-streisand-effect
>
> Regards,
>
> - Ted
>
Hi Ted,

I just want to remind folks that Gmane disappeared as a regular list because
of a legal challenge, and the SCO v IBM Unix court case keeps rumbling on, so
the legal case for:
a) holding the 'personal git meta data', and
b) disclosing (publishing) 'personal git meta data'
under various copyright and other legal issue scenarios relative to GDPR is
worth clarifying.

I'm of the opinion that the GPL should be able to allow both holding and
disclosing that data, though it may need a few more clarifications as to
verifying that the author is 'correct' (e.g. not a child) and if a DCO is
needed, etc.

We are already looking at a change to the hash, so the technical challenge
could be addressed, but may create too many logical conflicts if 'right to
be forgotten' is allowed (one hash change is enough;-)

Philip


* Re: GDPR compliance best practices?
  2018-06-09 22:50                               ` Philip Oakley
@ 2018-06-10  1:41                                 ` Theodore Y. Ts'o
  0 siblings, 0 replies; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-10  1:41 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Peter Backes, David Lang, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sat, Jun 09, 2018 at 11:50:32PM +0100, Philip Oakley wrote:
> I just want to remind folks that Gmane disappeared as a regular list because
> of a legal challenge, and the SCO v IBM Unix court case keeps rumbling on, so
> the legal case for:
> a) holding the 'personal git meta data', and
> b) disclosing (publishing) 'personal git meta data'
> under various copyright and other legal issue scenarios relative to GDPR is
> worth clarifying.

And I suspect the best way of clarifying things is for lawyers at the
major corporations (e.g., Red Hat, Microsoft now that it owns github,
Google since it publishes Android sources at sources.android.com,
Canonical, etc.) to figure it out.

Those situations may very well differ depending on whether they have a
CLA or Copyright Assignment Agreement which they require of
contributors.  But fortunately, those organizations are also best set
up to send patches.   :-)

If those organizations are not choosing to send patches, I suspect
that might be a strong hint as to what those lawyers have concluded.

						- Ted

* Re: GDPR compliance best practices?
  2018-06-08 18:51                                     ` David Lang
@ 2018-06-12 18:56                                       ` David Lang
  2018-06-12 19:12                                         ` Peter Backes
  0 siblings, 1 reply; 53+ messages in thread
From: David Lang @ 2018-06-12 18:56 UTC (permalink / raw)
  To: Peter Backes
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

Adding one more datapoint here, I reached out to Github to find out their 
stance.

Here is what I got back

Quote:

Thanks for reaching out to us about this.

It's important to remember that the Right to Erasure only applies to personal 
data, not all data. It only applies to data a controller (GitHub, for example) 
is processing _solely_ on the basis of consent. And it only applies when there's 
not another legal reason to keep the data — for instance, if the data is no 
longer necessary for the purpose for which it was collected.

We do not process Git commit history on the basis of consent. We have a 
legitimate business purpose for collecting Git commit history: to maintain the 
integrity of the Git commit record. It remains necessary for its purpose for as 
long as a commit needs to be attributable to its committer. At GitHub, as part 
of our Privacy By Design work, we offer ways for users to set their own Git 
commit email data, so if an individual wants to remain anonymous or 
pseudonymous, he or she can do so. We also explain, in our [Privacy 
Statement](https://help.github.com/articles/github-privacy-statement), that we 
are not able to delete personal data from the Git commit history once it has 
been recorded.

End Quote

I'll point out that not only did the Github lawyers need to sign off on this 
stance, but the Microsoft lawyers would have looked at it as well as part of 
their purchase of Github.

David Lang

* Re: GDPR compliance best practices?
  2018-06-12 18:56                                       ` David Lang
@ 2018-06-12 19:12                                         ` Peter Backes
  2018-06-12 19:16                                           ` Martin Fick
  2018-06-13 14:12                                           ` Theodore Y. Ts'o
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-12 19:12 UTC (permalink / raw)
  To: David Lang
  Cc: Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Tue, Jun 12, 2018 at 11:56:13AM -0700, David Lang wrote:
> [quoting github]
> 
> It's important to remember that the Right to Erasure only applies to
> personal data, not all data. It only applies to data a controller (GitHub,
> for example) is processing _solely_ on the basis of consent.

This is very obviously wrong. See Art. 17 GDPR. Consent is only one of 
the explicitly mentioned grounds for deletion (it is (1) lit b, but 
there's also a and c to f).

> And it only
> applies when there's not another legal reason to keep the data -- for
> instance, if the data is no longer necessary for the purpose for which it
> was collected.

This incorrect claim is completely inverting the logic of Art. 17.

The logic is clearly that if ANY of lit (a) to (f) is satisfied, the
data must be deleted.

It is not necessary for ALL of them to be satisfied.

In particular, if the data is no longer necessary for the purpose for 
which it was collected, then THAT ALONE is grounds for erasure ((1) 
lit. a). It does not matter at all whether processing was consent-based 
or whether such consent was withdrawn.

> We do not process Git commit history on the basis of consent. We have a
> legitimate business purpose for collecting Git commit history: to maintain
> the integrity of the Git commit record. It remains necessary for its purpose
> for as long as a commit needs to be attributable to its committer.

Right, but this merely justifies storing the data, not publishing it, 
or keeping it published, as I already explained at length.

> At GitHub, as part of our Privacy By Design work, we offer ways for users to
> set their own Git commit email data, so if an individual wants to remain
> anonymous or pseudonymous, he or she can do so.

Not only does this fundamentally contradict what they just said in the
previous sentence, it is not a justification for ignoring the right to 
erasure either. It is exactly the purpose of the right to erasure to 
get the data erased *after* the fact.

> We also explain, in our
> [Privacy
> Statement](https://help.github.com/articles/github-privacy-statement), that
> we are not able to delete personal data from the Git commit history once it
> has been recorded.

Privacy Statements are not a justification under GDPR for processing 
data or ignoring the right to erasure.

And oh yes they are able. Rewriting history is a possibility, though an 
inconvenient one.

I have pointed towards more convenient solutions.
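
To make the inconvenience of rewriting concrete: a commit id is a hash over
the commit's content, which includes the author line and the ids of its
parents. The following is a minimal Python sketch, assuming a SHA-1
repository and commits without extra headers such as gpgsig; the helper
names, the identities, and the timestamp are made up for illustration.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
WHEN = "1528830000 +0000"  # illustrative timestamp and timezone


def commit_id(tree, parents, author, committer, message):
    """Hash a commit the way Git does: sha1(b"commit <size>\\0" + body)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def history(first_author):
    """Ids of a two-commit history whose first commit is by first_author."""
    ident = f"{first_author} {WHEN}"
    root = commit_id(EMPTY_TREE, [], ident, ident, "initial\n")
    other = f"Bob <bob@example.org> {WHEN}"
    child = commit_id(EMPTY_TREE, [root], other, other, "follow-up\n")
    return root, child


before = history("Alice <alice@example.org>")
after = history("Redacted <redacted@example.invalid>")
print(before[0] != after[0])  # the rewritten commit gets a new id
print(before[1] != after[1])  # and so does its untouched descendant
```

Bob's commit is byte-for-byte the same apart from its parent line, yet its
id changes too. That cascade through every descendant is what makes
rewriting possible but disruptive for everyone who has already fetched the
old history.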

> I'll point out that not only did the Github lawyers need to sign off on this
> stance, but the Microsoft lawyers would have looked at it as well as part of
> their purchase of Github.

So? If a thousand lawyers claim 1+1=3, it becomes a mathematical truth?

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE

* Re: GDPR compliance best practices?
  2018-06-12 19:12                                         ` Peter Backes
@ 2018-06-12 19:16                                           ` Martin Fick
  2018-06-13 14:12                                           ` Theodore Y. Ts'o
  1 sibling, 0 replies; 53+ messages in thread
From: Martin Fick @ 2018-06-12 19:16 UTC (permalink / raw)
  To: Peter Backes
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Tuesday, June 12, 2018 09:12:19 PM Peter Backes wrote:
> So? If a thousand lawyers claim 1+1=3, it becomes a
> mathematical truth?

No, but probably a legal "truth". :)

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


* Re: GDPR compliance best practices?
  2018-06-12 19:12                                         ` Peter Backes
  2018-06-12 19:16                                           ` Martin Fick
@ 2018-06-13 14:12                                           ` Theodore Y. Ts'o
  2018-06-13 14:48                                             ` Peter Backes
  1 sibling, 1 reply; 53+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-13 14:12 UTC (permalink / raw)
  To: Peter Backes
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Tue, Jun 12, 2018 at 09:12:19PM +0200, Peter Backes wrote:
> This incorrect claim is completely inverting the logic of Art. 17.
> 
> The logic is clearly that if ANY of lit (a) to (f) is satisfied, the
> data must be deleted.
> 
> It is not necessary for ALL of them to be satisfied.
> 
> In particular, if the data is no longer necessary for the purpose for 
> which it was collected, then THAT ALONE is grounds for erasure ((1) 
> lit. a). It does not matter at all whether processing was consent-based 
> or whether such consent was withdrawn.

Sure, but given that you are the one trying to claim that people need
to do all sorts of extra development work (I don't see any patches
from you) and suffer performance degradation, the burden of proof is
on _you_ to show that this is a problem that github, et al., are
likely to run into.

In particular, keep in mind that distribution of open source code can
only be done under the terms of an open source license --- and a
license is a contract.  So in particular, your claim that the data is
no longer necessary (point a) is at the very least going to be subject
to dispute and is a legal question.  I can think of any number of ways
that this could be considered necessary in order to assure open source
license compliance, the public interest in terms of allowing forking,
etc.

The bottom line is I'm sure the lawyers at github and Microsoft have
very carefully done their due diligence, and if they are concerned,
I'm sure we'll see patches from them, since after all, they would not
be interested in seeing the imperial European bureaucrats trying to
assess 4% of Microsoft's world-wide revenues --- that's $3.6 billion,
by the way.  I'm sure if they think it's a concern, their
programmers will be right on it.

					- Ted

* Re: GDPR compliance best practices?
  2018-06-13 14:12                                           ` Theodore Y. Ts'o
@ 2018-06-13 14:48                                             ` Peter Backes
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Backes @ 2018-06-13 14:48 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: David Lang, Philip Oakley, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Wed, Jun 13, 2018 at 10:12:18AM -0400, Theodore Y. Ts'o wrote:
> Sure, but given that you are the one trying to claim that people need
> to do all sorts of extra development work (I don't see any patches

No. I am not. I said it is desirable to have a convenient solution for 
the problem. I did not demand development work or patches from anyone, 
just kindly asked for a comment on a possible solution.

> from you) and suffer performance degradation, the burden of proof is
> on _you_ to show that this is a problem that github, et al., are
> likely to run into.

*You* claimed there was performance degradation, not me.

That github et al. will sooner or later receive such erasure requests
is a practical certainty. Google receives them every day in large 
quantities. Just think about someone who committed smelly code on 
github and now wants to get a new job and wants to get rid of all 
associations with those smells.

> In particular, keep in mind that distribution of open source code can
> only be done under the terms of an open source license --- and a
> license is a contract.

Not that it would be relevant here, but, depending on jurisdiction, it
is highly controversial whether open source licenses really constitute 
contracts (or, for example, promissory estoppel).

For the right to erasure, it does not matter whether a contract exists 
or not.

The GDPR explicitly prohibits any use of contracts in a way that 
undermines the GDPR. Making it an irrevocable contractual obligation to 
publish the data is thus not going to be an excuse. And Free Software
licenses have nothing whatsoever to do with repository metadata. Such 
software has existed long before version control became so popular.

> So in particular, your claim that the data is
> no longer necessary (point a) is at the very least going to be subject

No, it is github's claim that it must no longer be necessary for being 
erased, not mine!

I clearly stated that if ANY point (not: ALL points) is given, the data 
must be deleted.

Thus, point b, c, d or any other are just as good as point a.

> to dispute and is a legal question.  I can think of any number of ways
> that this could be considered necessary in order to assure open source
> license compliance, the public interest in terms of allowing forking,
> etc.

To claim that the data is necessary (which is, as I said, irrelevant) 
and then say it's not because you can as well use a dummy user string, 
is self-contradictory.

> The bottom line is I'm sure the lawyers at github and Microsoft have
> very carefully done their due diligence, and if they are concerned,
> I'm sure we'll see patches from them, since after all, they would not

Why should they be concerned? They can rewrite history if necessary. 
They have a solution, though an inconvenient one. As far as the lawyers 
are concerned, that solution is perfectly fine.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE
