From: Peter Backes <rtc@helen.PLASMA.Xg8.DE>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: "Philip Oakley" <philipoakley@iee.org>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>
Subject: Re: GDPR compliance best practices?
Date: Mon, 4 Jun 2018 00:16:16 +0200
Message-ID: <20180603221616.GA14636@helen.PLASMA.Xg8.DE>
In-Reply-To: <20180603210344.GF1750@thunk.org>
On Sun, Jun 03, 2018 at 05:03:44PM -0400, Theodore Y. Ts'o wrote:
> If you don't think a potential 2x -- 10x performance hit isn't a
> blocking factor --- sure, go ahead and try implementing it. And good
> luck to you. And this is not a guarantee that it won't get rejected.
> I certainly don't have the power to make that guarantee.
I do not want or expect a guarantee, or even a probability, of course.
I am just trying to avoid "STRONG REJECT. We could have told you before
you even started implementing. Why didn't you discuss this beforehand?"
One would simply change something like

    author A U Thor <author@example.com> 1465982009 +0000

into something like

    author 21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <author@example.com>
    author-hash 469bb107e38f8e59dddb3bbd6f8646e052bf73d48427865563c7358a64467f2c
    authordate c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000
    authordate-hash 199875e5aedb6cb164a2b40c16209dc5bb37f34c059a56c6d96766440fb0fe68

and then compute the commit ID without the "author" and "authordate"
lines, so that only their *-hash lines enter the commit ID.
The *-hash values were obtained as follows:

    echo -n '21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <author@example.com>' | sha3sum -a 256
    echo -n 'c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000' | sha3sum -a 256

The hex values at the start of the "author" and "authordate" lines are
simply the $huge_random_numbers.
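A minimal Python sketch of the layout above, assuming Python 3.9+ and the standard hashlib module, whose sha3_256 is FIPS 202 SHA3-256 and should therefore match `sha3sum -a 256`. The helper names (salted_field, commit_id, ANONYMIZABLE) are hypothetical, the salt is drawn with secrets.token_hex, and the object ID is computed the way git hashes objects today (SHA-1 over "<type> <size>\0<body>"). A real commit also carries committer lines, a blank line and a message, which the sketch leaves out.

```python
import hashlib
import secrets
from typing import Optional

# Hypothetical: the set of header fields that get a salt and a *-hash line.
ANONYMIZABLE = {"author", "authordate"}

def field_hash(salted_value: str) -> str:
    # Hex SHA3-256 of "<salt> <value>", intended to match
    # `echo -n '<salt> <value>' | sha3sum -a 256`.
    return hashlib.sha3_256(salted_value.encode("utf-8")).hexdigest()

def salted_field(name: str, value: str, salt: Optional[str] = None) -> tuple[str, str]:
    # Returns the "<name> <salt> <value>" line and its "<name>-hash <hex>" line.
    if salt is None:
        salt = secrets.token_hex(32)  # 256 random bits as 64 hex digits
    salted = f"{salt} {value}"
    return f"{name} {salted}", f"{name}-hash {field_hash(salted)}"

def git_object_id(obj_type: str, body: bytes) -> str:
    # Git's SHA-1 object ID: SHA-1 over "<type> <size>\0" followed by the body.
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

def commit_id(lines: list[str]) -> str:
    # Leave out the salted field lines; every other header, including
    # the *-hash lines, contributes to the ID.
    kept = [line for line in lines if line.split(" ", 1)[0] not in ANONYMIZABLE]
    return git_object_id("commit", ("\n".join(kept) + "\n").encode("utf-8"))

if __name__ == "__main__":
    author, author_hash = salted_field("author", "A U Thor <author@example.com>")
    date, date_hash = salted_field("authordate", "1465982009 +0000")
    tree = "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904"
    before = commit_id([tree, author, author_hash, date, date_hash])
    # "Forgetting" the author: drop the salted lines, keep the *-hash lines.
    after = commit_id([tree, author_hash, date_hash])
    print(before == after)  # True: erasing the personal data keeps the commit ID
```

Because the salt is deleted together with the personal data, the remaining *-hash line cannot simply be brute-forced from a guessed name or date.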
Verifying the commit ID by itself wouldn't be any less efficient than
before. Admittedly, it would no longer verify the integrity of the
author and authordate fields without additional work. That would be
some overhead, sure, but it could be done on demand and would mostly
affect clones. I don't think it would be much of a problem. It can
easily be parallelized: the hashes for the individual fields are
independent of each other, so they can all be verified in parallel in
different threads running on different cores.
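On-demand verification could look like the following sketch (hypothetical helper names, Python 3.9+): each remaining field line is re-hashed and compared with its *-hash line, and since no check depends on another, the pairs are simply mapped over a thread pool. Fields whose salted line has been erased are assumed to be skipped, as there is nothing left to verify. In CPython, hashlib only releases the GIL for fairly large inputs, so for short header fields this shows the independence of the checks rather than a real multi-core speed-up; that would need native threads (for example inside git's C code) or a process pool.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def verify_field(field_line: str, hash_line: str) -> bool:
    # field_line: "<name> <salt> <value>"; hash_line: "<name>-hash <hex>"
    name, salted = field_line.split(" ", 1)
    hash_name, expected = hash_line.split(" ", 1)
    if hash_name != f"{name}-hash":
        return False  # the hash line belongs to a different field
    return hashlib.sha3_256(salted.encode("utf-8")).hexdigest() == expected

def verify_all(pairs: list[tuple[str, str]], workers: int = 4) -> bool:
    # Each check depends only on its own (field, hash) pair, so the
    # checks can run concurrently and in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(lambda pair: verify_field(*pair), pairs))
```

An empty list of pairs (all fields erased) verifies trivially, which matches the idea that only the commit ID itself remains to be checked.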
On djb's typical 2015 Skylake machine, the SUPERCOP benchmark puts
SHA3-256 (~= keccakc512) at about 20 cycles/byte for 64-byte inputs,
see
https://bench.cr.yp.to/results-sha3.html#amd64-skylake
Let's say the author field holds 128 bytes of data on average. Then,
conservatively speaking, it takes about 3000 cycles (> 128*20 = 2560)
to hash the field and compare the result.
At 3000 MHz, i.e. 3 * 10^9 cycles per second, we can thus do roughly
1,000,000 verifications per second per core.
Let's assume we have 10 anonymizable fields of this kind per commit.
Then the overhead would be about one second per 100,000 x ncores commits.
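A back-of-the-envelope recomputation of this estimate, using only the figures stated above (128 bytes per field, 20 cycles/byte, a rounded-up budget of 3000 cycles per check, a 3000 MHz clock, 10 anonymizable fields per commit). The function and parameter names are hypothetical; it is plain integer arithmetic, not a benchmark.

```python
def overhead_estimate(field_bytes: int = 128,
                      cycles_per_byte: int = 20,
                      cycles_per_check: int = 3000,
                      clock_hz: int = 3_000_000_000,
                      fields_per_commit: int = 10) -> tuple[int, int, int]:
    # Raw hashing cost of one field at the benchmarked rate.
    raw_cycles = field_bytes * cycles_per_byte  # 128 * 20 = 2560
    if cycles_per_check < raw_cycles:
        raise ValueError("per-check budget must cover the raw hashing cost")
    # One core runs clock_hz cycles per second; each check costs cycles_per_check.
    checks_per_second = clock_hz // cycles_per_check  # 3e9 / 3000 = 1,000,000
    # Each commit needs one check per anonymizable field.
    commits_per_second = checks_per_second // fields_per_commit  # 100,000
    return raw_cycles, checks_per_second, commits_per_second

if __name__ == "__main__":
    raw, checks, commits = overhead_estimate()
    print(f"{raw} raw cycles/field, {checks} checks/s/core, {commits} commits/s/core")
```

With these inputs one core handles 10^6 field checks, i.e. 10^5 commits, per second; the total scales linearly with the number of cores.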
How many commits are we talking about in a huge repository? And how
long does a clone of such a huge repository take at the moment? Do you
have any numbers?
> If you don't have time to implement, why do you think it's fair to
> inflict on everyone else the request for time to do a design review
> for something for which the need hasn't even been established?
I do not ask anyone to even reply to my messages. I just see a lot of
time being wasted on discussing aspects of my proposal that are
technically irrelevant. If that time were put into reviewing the
design, it would be better spent.
Please don't devalue a proposal. It is not true that the only value
lies in actual code and that proposals are "bullshit".
I was not the first to raise the issue, as I clearly showed in my
initial email.
The demand is in fact high; very high. At present, that demand is
being met by lawyers, who write snake-oil disclaimers and the like for
enormous sums of money, in a lot of companies, in order to "solve" a
technical issue by pseudo-legal means: finding excuses for why the
"right to be forgotten" doesn't have to be implemented in specific
cases such as git. What if all that lawyer money were put into actually
solving the technical issues as technical issues? Engineers are
apparently bad at marketing; the lawyers seem more successful in that
respect.
Best wishes
Peter
--
Peter Backes, rtc@helen.PLASMA.Xg8.DE