git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Edward Thomson <ethomson@edwardthomson.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	demerphq <demerphq@gmail.com>,
	Brandon Williams <bmwill@google.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: Questions about the hash function transition
Date: Tue, 28 Aug 2018 17:02:18 +0200
Message-ID: <87ftyyedqd.fsf@evledraar.gmail.com>
In-Reply-To: <CA+WKDT1k1SpHQmUKunV+vC+VLBfTBjZBgw+n4NeTE=oKxWL-Sg@mail.gmail.com>


On Tue, Aug 28 2018, Edward Thomson wrote:

> On Tue, Aug 28, 2018 at 2:50 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> If we instead had something like clean/smudge filters:
>>
>>     [extensions]
>>         objectFilter = sha256-to-sha1
>>         compatObjectFormat = sha1
>>     [objectFilter "sha256-to-sha1"]
>>         clean  = ...
>>         smudge = ...
>>
>> We could apply arbitrary transformations on objects through filters
>> which would accept/return some simple format requesting them to
>> translate such-and-such objects, and would either return object
>> names/types under which to store them, or "nothing to do".
>
> If I'm understanding you correctly, then on the libgit2 side, I'm very much
> opposed to this proposal.  We never execute commands, nor do I want to start
> thinking that we can do so arbitrarily.  We run in environments where that's
> a non-starter

I'm being unclear. I'm suggesting that we slightly amend the syntax of
what we're proposing to put in the .git/config to leave the door open
for *optionally* doing arbitrary mappings.

It would still work exactly the same internally for the common
sha1<->sha256 case, i.e. none of git, libgit2, JGit or anyone else would
need to shell out to anything.

They'd just pick up that common case and handle it internally, similar
to how e.g. the crlf filter (vs. full clean/smudge support) works in
git & libgit2:
https://github.com/libgit2/libgit2/blob/master/tests/filter/crlf.c

So the sha256<->sha1 support would be an implicit built-in like crlf;
it would just leave the door open to having something like git-lfs.
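
To illustrate the "implicit built-in" idea, here is a minimal sketch in
Python of how an implementation might resolve the configured filter
name. Everything in it is an assumption made for illustration:
resolve_object_filter(), BUILTIN_OBJECT_FILTERS and the allow_external
switch are hypothetical names, not part of git, libgit2 or JGit.

```python
# Hypothetical sketch; none of these names exist in git, libgit2 or JGit.

# Filter names an implementation handles in-process, the way crlf is.
BUILTIN_OBJECT_FILTERS = {"sha256-to-sha1"}


def resolve_object_filter(name, filter_config, allow_external=False):
    """Decide how to handle the filter named by extensions.objectFilter.

    filter_config maps a filter name to its configured clean/smudge
    commands (a dict standing in for the [objectFilter "..."] sections).
    """
    if name in BUILTIN_OBJECT_FILTERS:
        # Common case: handled internally, no command is ever executed.
        return ("builtin", name)
    if allow_external and name in filter_config:
        # Optional case: only implementations that choose to run commands.
        return ("external", filter_config[name])
    # An implementation that never executes commands ends up here.
    raise ValueError(f"unsupported object filter: {name}")


print(resolve_object_filter("sha256-to-sha1", {}))
```

An implementation that never executes commands would leave
allow_external off and reject any name it does not know, while still
handling the common case.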

Now what does that really mean? And I admit I may be missing something
here.

Unlike smudge/clean filters we're going to be constrained to hashes of
20 or 32 bytes, locally & remotely, since we wouldn't want to support
arbitrary lengths. But with relatively small changes it'd allow for
changing just:

    # local  remote
    sha256<->sha1

To also support:

    # local  remote
    fn(sha1)<->fn(sha1)
    fn(sha1)<->fn(sha256)
    fn(sha256)<->fn(sha1)
    fn(sha256)<->fn(sha256)

Where fn() is some hook you'd provide to hook into the bits where we're
e.g. unpacking SHA-1 objects from the remote, and writing them locally
as SHA-256, except instead of (as we do by default) writing:

    SHA256_map(sha256(content)) = content

You'd write:

    SHA256_map(sha256(fn(content))) = fn(content)

Where fn() would need to be idempotent.
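
To make the notation above concrete, here is a minimal sketch in
Python. It rests on several assumptions: a dict stands in for
SHA256_map, store() is a hypothetical helper, the example fn() (which
drops carriage returns) is just one possible idempotent transformation,
and the real object header that git hashes along with the content is
omitted, as it is in the notation above.

```python
import hashlib


def fn(content: bytes) -> bytes:
    """Hypothetical hook: drop every carriage return byte.

    Idempotent, since the output never contains a carriage return.
    """
    return content.replace(b"\r", b"")


def store(object_map: dict, content: bytes, transform=None) -> str:
    """Write content into object_map (standing in for SHA256_map).

    Without a transform: object_map[sha256(content)] = content
    With one:            object_map[sha256(fn(content))] = fn(content)
    The object header git normally hashes is left out for brevity.
    """
    data = transform(content) if transform else content
    key = hashlib.sha256(data).hexdigest()
    object_map[key] = data
    return key


sha256_map = {}
raw = b"hello\r\nworld\r\n"

default_key = store(sha256_map, raw)                # default behaviour
filtered_key = store(sha256_map, raw, transform=fn)  # with the hook

# Storing the already-transformed content again yields the same name,
# which is what idempotence buys us in this sketch.
assert store(sha256_map, fn(raw), transform=fn) == filtered_key
```

In this sketch, idempotence means an object that has already passed
through fn() keeps its name if it passes through again.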

Now, why is this useful or worth considering? As noted in the e-mail I
linked to, it allows for some novel use cases for doing local to remote
object translation.

But really, I'm not suggesting that *that* is something we should
consider. *All* I'm saying is that, given the experience of how we
started out with stuff like the built-in "crlf" and then grew
smudge/clean filters, it's worth considering what sort of .git/config
key-value pairs we'd pick that would lend themselves to such future
extensions, should that be something we deem to be a good idea in the
future.

Because if we never add such extensions we've lost nothing, but if we
do add them later without having planned for it, we'd need to support
two sets of config syntaxes to do those two related things.

> At present, in libgit2, users can provide their own mechanism for running
> clean/smudge filters.  But hash transformation / compatibility is going to
> be a crucial compatibility component.  So this is not something that we
> could simply opt out of or require users to implement themselves.

Indeed.
