git@vger.kernel.org mailing list mirror (one of many)
From: Peter Backes <rtc@helen.PLASMA.Xg8.DE>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: "Philip Oakley" <philipoakley@iee.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: GDPR compliance best practices?
Date: Mon, 4 Jun 2018 00:16:16 +0200
Message-ID: <20180603221616.GA14636@helen.PLASMA.Xg8.DE>
In-Reply-To: <20180603210344.GF1750@thunk.org>

On Sun, Jun 03, 2018 at 05:03:44PM -0400, Theodore Y. Ts'o wrote:
> If you don't think a potential 2x -- 10x performance hit isn't a
> blocking factor --- sure, go ahead and try implementing it.  And good
> luck to you.  And this is not a guarantee that it won't get rejected.
> I certainly don't have the power to make that guarantee.

I do not want or expect a guarantee, or even a probability, of course.
I am just trying to avoid "STRONG REJECT. We could have told you this
before you even started implementing. Why didn't you discuss this
beforehand?"

One would simply change something like

author A U Thor <author@example.com> 1465982009 +0000

into something like

author 21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <author@example.com>
author-hash 469bb107e38f8e59dddb3bbd6f8646e052bf73d48427865563c7358a64467f2c
authordate c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000
authordate-hash 199875e5aedb6cb164a2b40c16209dc5bb37f34c059a56c6d96766440fb0fe68

and then compute the commit id without the "author" and the 
"authordate" lines.

The *-hash values were obtained as follows:

echo -n '21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <author@example.com>' | sha3sum -a 256
echo -n 'c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000' | sha3sum -a 256

The hex values at the start of each input are simply the
$huge_random_numbers.
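
A minimal sketch of the scheme described above, in Python rather than
shell. The function names, the use of secrets.token_hex for the random
values, and the git-style SHA-1 object header are assumptions made for
illustration; they are not part of the proposal. It shows that the
commit id stays the same after the cleartext "author" and "authordate"
lines are erased.

```python
import hashlib
import secrets

# Lines carrying cleartext personal data; excluded from the commit id.
# "author-hash" and "authordate-hash" do not match these prefixes.
CLEARTEXT = ("author ", "authordate ")

def new_salt():
    # 32 random bytes as 64 hex digits, standing in for $huge_random_number
    return secrets.token_hex(32)

def field_hash(salt, value):
    # Hashes "<salt> <value>" with SHA3-256, the same input the echo -n
    # pipeline above feeds to sha3sum
    return hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()

def commit_id(commit_text):
    # Drop the cleartext lines, then hash what remains.
    # Assumption: git-style "commit <len>\0" header with SHA-1.
    kept = [ln for ln in commit_text.split("\n") if not ln.startswith(CLEARTEXT)]
    body = "\n".join(kept).encode("utf-8")
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

salt_a, salt_d = new_salt(), new_salt()
name, date = "A U Thor <author@example.com>", "1465982009 +0000"
hash_lines = [
    f"author-hash {field_hash(salt_a, name)}",
    f"authordate-hash {field_hash(salt_d, date)}",
]
# Commit as originally written, with cleartext and hash lines
full = "\n".join([
    f"author {salt_a} {name}", hash_lines[0],
    f"authordate {salt_d} {date}", hash_lines[1],
    "", "commit message", ""])
# The same commit after the cleartext lines have been erased
erased = "\n".join(hash_lines + ["", "commit message", ""])

print(commit_id(full) == commit_id(erased))
```

A real commit would also carry tree, parent and committer lines; they
are left out here to keep the sketch short.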

Verifying the commit ID by itself wouldn't be any less efficient than 
before. Admittedly, it wouldn't verify the author and authordate 
integrity anymore without additional work. That would be some overhead, 
sure, and could be done on demand, and would mostly affect clones. I 
don't think it would be that much of a problem. It can be parallelized 
easily. The hashes for each field are independent of each other. They 
can all be verified in parallel in different threads running on 
different cores.
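
As a rough illustration of that independence, the per-field checks can
be handed to a thread pool. This is only a sketch: the helper names and
the (salt, value, expected hash) tuple layout are hypothetical, and in
CPython the global interpreter lock will likely limit any real speedup
for inputs this small, so a C implementation would be needed to see the
benefit.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def check(item):
    # item is (salt, value, expected_hash); each check is self-contained
    salt, value, expected = item
    digest = hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()
    return digest == expected

def verify_all(items, workers=4):
    # No field depends on another, so the checks can run in any order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(check, items))

def make_item(salt, value):
    # Helper for the example only: builds a tuple with a matching hash
    expected = hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()
    return (salt, value, expected)

good = [make_item(f"{i:064x}", f"field value {i}") for i in range(10)]
bad = good[:-1] + [(good[-1][0], "tampered", good[-1][2])]

print(verify_all(good), verify_all(bad))
```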

On djb's typical 2015 skylake machine the supercop benchmark tells us 
that sha3-256 (~=keccakc512) has a speed of about 20 cycles/byte for 
blocks of 64 bytes of data, see 
https://bench.cr.yp.to/results-sha3.html#amd64-skylake

Let's say we have 128 bytes of data on average for the author field, so 
conservatively speaking it takes about 3000 cycles (> 128*20) to hash 
and compare the hash.

At 3000 MHz (3 x 10^9 cycles per second), we can thus do roughly 
1,000,000 verifications per second per core.

Let's assume we have 10 anonymizable fields of this kind per commit.

Then the overhead would be one second per 100,000 x ncores commits.
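
The estimate can be recomputed from its inputs. A minimal sketch using
the figures quoted above (3000 cycles per hash, a 3000 MHz clock, 10
fields per commit); the function and parameter names are hypothetical:

```python
def commits_per_second(clock_hz, cycles_per_hash, fields_per_commit, ncores=1):
    # Hash verifications one core can perform per second
    hashes_per_second = clock_hz / cycles_per_hash
    # Each commit needs one verification per anonymizable field
    return hashes_per_second / fields_per_commit * ncores

rate = commits_per_second(3_000_000_000, 3000, 10)
print(f"{rate:,.0f} commits per second per core")
```

Passing a different ncores value scales the result linearly, which
matches the assumption that the fields are verified independently.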

How many commits are we talking about in a huge repository? And how 
long does a clone of such a huge repository take at the moment? Do you 
have any numbers?

> If you don't have time to implement, why do you think it's fair to
> inflict on everyone else the request for time to do a design review
> for something for which the need hasn't even been established?

I am not asking anyone to even reply to my messages. I just see a 
lot of time being spent discussing aspects of my proposal that 
are technically irrelevant. If that time were put into reviewing the 
design, it would be better spent.

Please don't devalue a proposal. It is not true that the only value is 
in actual code and proposals are "bullshit".

I was not the first to raise the issue, as I clearly showed in my 
initial email.

The demand is in fact high; very high. At present, that demand is 
satisfied by lawyers, who write snake oil disclaimers and the like 
for enormous sums of money, in a lot of companies. They "solve" a 
technical issue by pseudo-legal means, finding excuses for why the 
"right to be forgotten" doesn't have to be implemented in specific 
cases such as git. What if all that lawyer money were put into actually 
solving the technical issues as technical issues? Engineers are 
apparently bad at marketing; the lawyers seem more successful in that 
respect.

Best wishes
Peter

-- 
Peter Backes, rtc@helen.PLASMA.Xg8.DE
