Questions about the hash function transition

git@vger.kernel.org mailing list mirror (one of many)

* Questions about the hash function transition
@ 2018-08-23 14:02 Ævar Arnfjörð Bjarmason
  2018-08-23 14:27 ` Junio C Hamano
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-23 14:02 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Linus Torvalds, Edward Thomson, brian m . carlson,
	Jonathan Nieder, Johannes Schindelin, demerphq, Brandon Williams,
	Derrick Stolee

I wanted to send another series to clarify things in
hash-function-transition.txt, but for some of the issues I don't know
the answer, and I had some questions after giving this another read.

So let's discuss that here first. Quoting from the document (available
at
https://github.com/git/git/blob/v2.19.0-rc0/Documentation/technical/hash-function-transition.txt)

> Git hash function transition
> ============================
>
> Objective
> ---------
> Migrate Git from SHA-1 to a stronger hash function.

Should we say "Migrate Git from SHA-1 to SHA-256" here instead?

Maybe it's overly specific, i.e. really we're also describing how /any/
hash function transition might happen, but having just read this now
from start to finish it takes us a really long time to mention (and at
first, only offhand) that SHA-256 is the new hash.

> [...]
> Goals
> -----
> 1. The transition to SHA-256 can be done one local repository at a time.
>    a. Requiring no action by any other party.
>    b. A SHA-256 repository can communicate with SHA-1 Git servers
>       (push/fetch).
>    c. Users can use SHA-1 and SHA-256 identifiers for objects
>       interchangeably (see "Object names on the command line", below).
>    d. New signed objects make use of a stronger hash function than
>       SHA-1 for their security guarantees.
> 2. Allow a complete transition away from SHA-1.
>    a. Local metadata for SHA-1 compatibility can be removed from a
>       repository if compatibility with SHA-1 is no longer needed.
> 3. Maintainability throughout the process.
>    a. The object format is kept simple and consistent.
>    b. Creation of a generalized repository conversion tool.
>
> Non-Goals
> ---------
> 1. Add SHA-256 support to Git protocol. This is valuable and the
>    logical next step but it is out of scope for this initial design.

This is a non-goal according to the docs, but now that we have protocol
v2 in git, perhaps we could start specifying or describing how this
protocol extension will work?

> [...]
> 3. Intermixing objects using multiple hash functions in a single
>    repository.

But isn't that the goal now per "Translation table" & writing both SHA-1
and SHA-256 versions of objects?

> [...]
> Pack index
> ~~~~~~~~~~
> Pack index (.idx) files use a new v3 format that supports multiple
> hash functions. They have the following format (all integers are in
> network byte order):
>
> - A header appears at the beginning and consists of the following:
>   - The 4-byte pack index signature: '\377t0c'
>   - 4-byte version number: 3
>   - 4-byte length of the header section, including the signature and
>     version number
>   - 4-byte number of objects contained in the pack
>   - 4-byte number of object formats in this pack index: 2
>   - For each object format:
>     - 4-byte format identifier (e.g., 'sha1' for SHA-1)
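A minimal Python sketch of the fixed-size part of the v3 header quoted above. It assumes only the fields shown: the per-format fields after the identifier are elided in the quote, so it reads just the first identifier. The signature bytes are copied verbatim from the quoted doc; `pack_prefix`, `parse_prefix` and `first_format_id` are hypothetical helper names.

```python
import struct

# Illustrative sketch; helper names are hypothetical.
# Signature exactly as quoted from the design doc. All integers are
# big-endian ("network byte order"), hence ">" in the format string.
IDX_V3_SIGNATURE = b"\377t0c"
PREFIX = struct.Struct(">4sIIII")  # signature, version, header len, nr objects, nr formats


def pack_prefix(header_len: int, nr_objects: int, nr_formats: int = 2) -> bytes:
    """Encode the 20-byte fixed prefix of a v3 pack index header."""
    return PREFIX.pack(IDX_V3_SIGNATURE, 3, header_len, nr_objects, nr_formats)


def parse_prefix(buf: bytes) -> dict:
    """Decode and sanity-check the fixed prefix."""
    sig, version, header_len, nr_objects, nr_formats = PREFIX.unpack_from(buf, 0)
    if sig != IDX_V3_SIGNATURE:
        raise ValueError("bad pack index signature")
    if version != 3:
        raise ValueError(f"unsupported pack index version {version}")
    return {"header_len": header_len, "nr_objects": nr_objects,
            "nr_formats": nr_formats}


def first_format_id(buf: bytes) -> bytes:
    """The first per-format entry starts right after the prefix with its
    4-byte identifier; later entries depend on fields not shown here."""
    return buf[PREFIX.size:PREFIX.size + 4]
```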

So, given that we have 4-byte limit and have decided on SHA-256 are we
just going to call this 'sha2'? That might be confusingly ambiguous
since SHA2 is a standard with more than just SHA-256, maybe 's256', or
maybe we should give this 8 bytes with trailing \0s so we can have
"SHA-1\0\0\0" and "SHA-256\0"?

> [...]
> - The trailer consists of the following:
>   - A copy of the 20-byte SHA-256 checksum at the end of the
>     corresponding packfile.
>
>   - 20-byte SHA-256 checksum of all of the above.

We need to update both of these to 32 byte, right? Or are we planning to
truncate the checksums?

This seems like just a mistake when we did s/NewHash/SHA-256/g, but then
again it was originally "20-byte NewHash checksum" ever since 752414ae43
("technical doc: add a design doc for hash function transition",
2017-09-27), so what do we mean here?

> Loose object index
> ~~~~~~~~~~~~~~~~~~
> A new file $GIT_OBJECT_DIR/loose-object-idx contains information about
> all loose objects. Its format is
>
>   # loose-object-idx
>   (sha256-name SP sha1-name LF)*
>
> where the object names are in hexadecimal format. The file is not
> sorted.
>
> The loose object index is protected against concurrent writes by a
> lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
> object:
>
> 1. Write the loose object to a temporary file, like today.
> 2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
> 3. Rename the loose object into place.
> 4. Open loose-object-idx with O_APPEND and write the new object
> 5. Unlink loose-object-idx.lock to release the lock.
>
> To remove entries (e.g. in "git pack-refs" or "git-prune"):
>
> 1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
>    lock.
> 2. Write the new content to loose-object-idx.lock.
> 3. Unlink any loose objects being removed.
> 4. Rename to replace loose-object-idx, releasing the lock.
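The add/remove procedures quoted above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names are made up, the caller is assumed to have done step 1 (writing the object to a temporary file) already, and a writer that finds the lock taken simply gets `FileExistsError` rather than retrying.

```python
import os

# Illustrative sketch of the quoted procedures; helper names are hypothetical.
IDX_NAME = "loose-object-idx"
LOCK_NAME = IDX_NAME + ".lock"
HEADER = b"# loose-object-idx\n"


def _write_all(fd: int, data: bytes) -> None:
    while data:
        data = data[os.write(fd, data):]


def acquire_lock(objdir: str) -> int:
    # O_CREAT | O_EXCL: fails with FileExistsError if the lock is already held.
    return os.open(os.path.join(objdir, LOCK_NAME),
                   os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)


def add_loose_object(objdir, tmp_path, final_path, sha256_hex, sha1_hex):
    lock_fd = acquire_lock(objdir)                          # step 2
    try:
        os.replace(tmp_path, final_path)                    # step 3
        idx_path = os.path.join(objdir, IDX_NAME)
        is_new = not os.path.exists(idx_path)  # safe: we hold the lock
        fd = os.open(idx_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o666)
        try:                                                # step 4
            _write_all(fd, (HEADER if is_new else b"")
                       + f"{sha256_hex} {sha1_hex}\n".encode("ascii"))
        finally:
            os.close(fd)
    finally:
        os.close(lock_fd)
        os.unlink(os.path.join(objdir, LOCK_NAME))          # step 5


def read_entries(objdir):
    """Return (sha256, sha1) pairs; the file is unsorted, so callers scan it."""
    try:
        with open(os.path.join(objdir, IDX_NAME), encoding="ascii") as f:
            return [tuple(line.split()) for line in f
                    if line.strip() and not line.startswith("#")]
    except FileNotFoundError:
        return []


def remove_loose_objects(objdir, doomed):
    """doomed maps sha256 hex -> path of the loose object to unlink."""
    lock_path = os.path.join(objdir, LOCK_NAME)
    lock_fd = acquire_lock(objdir)                          # step 1
    try:
        kept = [e for e in read_entries(objdir) if e[0] not in doomed]
        _write_all(lock_fd, HEADER + "".join(               # step 2
            f"{s256} {s1}\n" for s256, s1 in kept).encode("ascii"))
        for path in doomed.values():                        # step 3
            os.unlink(path)
    except BaseException:
        os.close(lock_fd)
        os.unlink(lock_path)
        raise
    os.close(lock_fd)
    os.replace(lock_path, os.path.join(objdir, IDX_NAME))   # step 4
```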

Do we expect multiple concurrent writers to poll the lock if they can't
acquire it right away? I.e. would a concurrent "git commit" block? Has this
overall approach been benchmarked somewhere?

I wonder if some lock-less variant of this would perform
better. E.g. we could consult not one, but any
loose-object-{1..Inf}.idx files written in sequence, and we'd specify
that each file would contain no more than N mappings.

Then writers could stat() the file to figure out if it has more than N
already by looking at the size (we have fixed-width records). Once we've
filled loose-object-1.idx we start writing to loose-object-2.idx and so
on.

The advantage of this is that writers wouldn't have to block one
another, and could just O_APPEND write to the file(s), although we could
end up with duplicate entries (which readers would need to tolerate).

Then some GC process could look at the set of loose-object-{1..Inf}.idx
files, find one that was at the max size (or slightly above, due to the
O_APPEND race condition) and whose mtime was deemed old enough to be
"safe" (to guard against an hours-long git-commit writing to it), then
compact it and rename a new one in place, or better yet get rid of it
in favor of a pack.

Maybe I've missed some subtlety where that won't work, I'm just
concerned that something that's writing a lot of objects in parallel
will be slowed down (e.g. the likes of BFG repo cleaner).
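To make the lock-less idea above concrete, here is a rough Python sketch. It assumes fixed-width records with the same `sha256-name SP sha1-name LF` layout as the quoted format (64 + 1 + 40 + 1 = 106 bytes), the `loose-object-<n>.idx` naming from the paragraph above, and a hypothetical per-shard limit `max_records`. None of this reflects actual Git code; readers just tolerate duplicates and ignore a torn trailing record.

```python
import os
import re

# Hypothetical sketch of the sharded, lock-less proposal; not Git code.
RECORD_LEN = 64 + 1 + 40 + 1          # sha256 hex, SP, sha1 hex, LF
SHARD_RE = re.compile(r"loose-object-(\d+)\.idx\Z")


def shard_path(objdir, n):
    return os.path.join(objdir, f"loose-object-{n}.idx")


def append_mapping(objdir, sha256_hex, sha1_hex, max_records=4096):
    """Append to the first shard that is not yet full; no lock is taken."""
    record = f"{sha256_hex} {sha1_hex}\n".encode("ascii")
    if len(record) != RECORD_LEN:
        raise ValueError("expected 64 + 40 hex digits")
    n = 1
    while True:
        try:
            size = os.stat(shard_path(objdir, n)).st_size
        except FileNotFoundError:
            size = 0
        if size < max_records * RECORD_LEN:
            break
        n += 1
    # Racing writers may both pass the size check, so a shard can end up
    # slightly over max_records; that is tolerated by design.
    fd = os.open(shard_path(objdir, n),
                 os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o666)
    try:
        os.write(fd, record)  # one small O_APPEND write per record
    finally:
        os.close(fd)
    return n


def lookup_sharded(objdir, sha1_hex):
    shards = sorted(int(m.group(1))
                    for m in map(SHARD_RE.match, os.listdir(objdir)) if m)
    want = sha1_hex.encode("ascii")
    for n in shards:
        with open(shard_path(objdir, n), "rb") as f:
            data = f.read()
        # Stop before a partial trailing record; duplicates are harmless
        # because every copy carries the same mapping.
        for off in range(0, len(data) - RECORD_LEN + 1, RECORD_LEN):
            rec = data[off:off + RECORD_LEN]
            if rec[65:105] == want:
                return rec[:64].decode("ascii")
    return None
```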

> Translation table
> ~~~~~~~~~~~~~~~~~
> The index files support a bidirectional mapping between sha1-names
> and sha256-names. The lookup proceeds similarly to ordinary object
> lookups. For example, to convert a sha1-name to a sha256-name:
>
>  1. Look for the object in idx files. If a match is present in the
>     idx's sorted list of truncated sha1-names, then:
>     a. Read the corresponding entry in the sha1-name order to pack
>        name order mapping.
>     b. Read the corresponding entry in the full sha1-name table to
>        verify we found the right object. If it is, then
>     c. Read the corresponding entry in the full sha256-name table.
>        That is the object's sha256-name.
>  2. Check for a loose object. Read lines from loose-object-idx until
>     we find a match.
>
> Step (1) takes the same amount of time as an ordinary object lookup:
> O(number of packs * log(objects per pack)). Step (2) takes O(number of
> loose objects) time. To maintain good performance it will be necessary
> to keep the number of loose objects low. See the "Loose objects and
> unreachable objects" section below for more details.
>
> Since all operations that make new objects (e.g., "git commit") add
> the new objects to the corresponding index, this mapping is possible
> for all objects in the object store.

Are we going to need a midx version of these mapping files? How does
midx fit into this picture? Perhaps it's too obscure to worry about...
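The sha1-name to sha256-name lookup quoted above can be modeled in Python as below. This is a simplified, hypothetical in-memory model rather than the on-disk .idx layout: `TRUNC_HEX` (the truncated-name length), `PackIdx` and `sha1_to_sha256` are made-up names, and names are kept as hex strings for readability. Since truncated names can collide, it checks every candidate sharing the prefix before moving on to the next pack.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

TRUNC_HEX = 8  # hypothetical truncated-name length, in hex digits


@dataclass
class PackIdx:
    """Hypothetical in-memory model of one pack's translation data."""
    sorted_trunc_sha1: List[str]   # truncated sha1-names, sorted
    sha1_order_to_pack: List[int]  # sorted position -> pack-order position
    full_sha1: List[str]           # full sha1-names, in pack order
    full_sha256: List[str]         # full sha256-names, in pack order

    @classmethod
    def build(cls, pairs: Sequence[Tuple[str, str]]) -> "PackIdx":
        """pairs: (sha1, sha256) for each object, in pack order."""
        order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
        return cls([pairs[i][0][:TRUNC_HEX] for i in order], order,
                   [p[0] for p in pairs], [p[1] for p in pairs])


def sha1_to_sha256(packs: Sequence[PackIdx],
                   loose: Sequence[Tuple[str, str]], sha1: str) -> Optional[str]:
    key = sha1[:TRUNC_HEX]
    for idx in packs:                                   # step 1
        i = bisect_left(idx.sorted_trunc_sha1, key)     # O(log(objects per pack))
        while i < len(idx.sorted_trunc_sha1) and idx.sorted_trunc_sha1[i] == key:
            pos = idx.sha1_order_to_pack[i]             # 1a
            if idx.full_sha1[pos] == sha1:              # 1b: verify
                return idx.full_sha256[pos]             # 1c
            i += 1
    for s256, s1 in loose:                              # step 2: O(loose objects)
        if s1 == sha1:
            return s256
    return None
```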

> Reading an object's sha1-content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> The sha1-content of an object can be read by converting all sha256-names
> its sha256-content references to sha1-names using the translation table.
>
> Fetch
> ~~~~~
> Fetching from a SHA-1 based server requires translating between SHA-1
> and SHA-256 based representations on the fly.
>
> SHA-1s named in the ref advertisement that are present on the client
> can be translated to SHA-256 and looked up as local objects using the
> translation table.
>
> Negotiation proceeds as today. Any "have"s generated locally are
> converted to SHA-1 before being sent to the server, and SHA-1s
> mentioned by the server are converted to SHA-256 when looking them up
> locally.
>
> After negotiation, the server sends a packfile containing the
> requested objects. We convert the packfile to SHA-256 format using
> the following steps:
>
> 1. index-pack: inflate each object in the packfile and compute its
>    SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
>    objects the client has locally. These objects can be looked up
>    using the translation table and their sha1-content read as
>    described above to resolve the deltas.
> 2. topological sort: starting at the "want"s from the negotiation
>    phase, walk through objects in the pack and emit a list of them,
>    excluding blobs, in reverse topologically sorted order, with each
>    object coming later in the list than all objects it references.
>    (This list only contains objects reachable from the "wants". If the
>    pack from the server contained additional extraneous objects, then
>    they will be discarded.)
> 3. convert to sha256: open a new (sha256) packfile. Read the topologically
>    sorted list just generated. For each object, inflate its
>    sha1-content, convert to sha256-content, and write it to the sha256
>    pack. Record the new sha1<->sha256 mapping entry for use in the idx.
> 4. sort: reorder entries in the new pack to match the order of objects
>    in the pack the server generated and include blobs. Write a sha256 idx
>    file
> 5. clean up: remove the SHA-1 based pack file, index, and
>    topologically sorted list obtained from the server in steps 1
>    and 2.

Doesn't this process require us to implement a "fetch quarantine"? Lest
something (e.g. another concurrent fetch) end up referencing the new
SHA-1 objects we've fetched in a pack that we'll remove in step #5?
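Step 2 of the fetch conversion quoted above (the topological sort) could look roughly like this in Python. The object graph is a hypothetical dict mapping each object name to its type and the names it references. Objects the server didn't send (ones the client already has) are skipped, unreachable extras are dropped, and blobs are walked but left out of the output, as the quoted step describes. An iterative DFS is used so deep histories don't hit Python's recursion limit.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model: name -> (object type, names of the objects it references)
Graph = Dict[str, Tuple[str, List[str]]]


def topo_order(objects: Graph, wants: Sequence[str]) -> List[str]:
    """Non-blob objects reachable from `wants`, each listed after every
    object it references (post-order DFS)."""
    out: List[str] = []
    visited = set()
    for want in wants:
        if want in visited or want not in objects:
            continue
        visited.add(want)
        stack = [(want, iter(objects[want][1]))]
        while stack:
            name, refs = stack[-1]
            child = next(refs, None)
            if child is None:                 # all references handled
                stack.pop()
                if objects[name][0] != "blob":
                    out.append(name)
            elif child in objects and child not in visited:
                visited.add(child)
                stack.append((child, iter(objects[child][1])))
    return out
```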

> [...]
> The user can also explicitly specify which format to use for a
> particular revision specifier and for output, overriding the mode. For
> example:
>
> git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}

How is this going to interact with other peel syntax? I.e. now we have
<object>^{commit} <sha>^{tag} etc. It seems to me we'll need not ^{sha1}
but ^{sha1:<current_type>}, e.g. ^{sha1:commit} or ^{sha1:tag}, with
current ^{} being a synonym for ^{sha1:}.

Or is this expected to be chained, as e.g. <object>^{tag}^{sha256} ?

> Transition plan
> ---------------

One thing that's not covered in this document at all, which I feel is
missing, is how we're going to handle references to old commit IDs in
commit messages, bug trackers etc. once we go through the whole
migration process.

I.e. are users who expect to be able to read old history and "git show
<sha1 I found>" expected to maintain a repository that has a live
sha1<->sha256 mapping forever, or could we be smarter about this and
support some sort of marker in the repository saying "maintain the
mapping up until this point".

Then, along with some v2 protocol extension to transfer such a
historical mapping (and perhaps a default user option to request it)
we'd be guaranteed to be able to read old log messages and "git show"
them, and servers could avoid breaking past URLs without maintaining the
mapping going forward.

One example of this on the server is that on GitLab (I don't know how
GitHub does this) when you reference a commit from e.g. a bug, a
refs/keep-around/<sha1> is created, to make sure it doesn't get GC'd.

Those sorts of hosting providers would like to not break *existing*
links, without needing to forever maintain a bidirectional mapping.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Questions about the hash function transition
  2018-08-23 14:02 Questions about the hash function transition Ævar Arnfjörð Bjarmason
@ 2018-08-23 14:27 ` Junio C Hamano
  2018-08-23 15:20   ` Ævar Arnfjörð Bjarmason
  2018-08-24  1:40 ` brian m. carlson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2018-08-23 14:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Linus Torvalds, Edward Thomson, brian m . carlson,
	Jonathan Nieder, Johannes Schindelin, demerphq, Brandon Williams,
	Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> - The trailer consists of the following:
>>   - A copy of the 20-byte SHA-256 checksum at the end of the
>>     corresponding packfile.
>>
>>   - 20-byte SHA-256 checksum of all of the above.
>
> We need to update both of these to 32 byte, right? Or are we planning to
> truncate the checksums?

https://public-inbox.org/git/CA+55aFwc7UQ61EbNJ36pFU_aBCXGya4JuT-TvpPJ21hKhRengQ@mail.gmail.com/


* Re: Questions about the hash function transition
  2018-08-23 14:27 ` Junio C Hamano
@ 2018-08-23 15:20   ` Ævar Arnfjörð Bjarmason
  2018-08-23 16:13     ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-23 15:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Linus Torvalds, Edward Thomson, brian m . carlson,
	Jonathan Nieder, Johannes Schindelin, demerphq, Brandon Williams,
	Derrick Stolee

On Thu, Aug 23 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> - The trailer consists of the following:
>>>   - A copy of the 20-byte SHA-256 checksum at the end of the
>>>     corresponding packfile.
>>>
>>>   - 20-byte SHA-256 checksum of all of the above.
>>
>> We need to update both of these to 32 byte, right? Or are we planning to
>> truncate the checksums?
>
> https://public-inbox.org/git/CA+55aFwc7UQ61EbNJ36pFU_aBCXGya4JuT-TvpPJ21hKhRengQ@mail.gmail.com/

Thanks.

Yeah for this checksum purpose even 10 or 5 characters would do, but
since we'll need a new pack format anyway for SHA-256 why not just use
the full length of the SHA-256 here? We're using the full length of the
SHA-1.

I don't see it mattering for security / corruption detection purposes,
but just to avoid confusion. We'll have this one place left where
something looks like a SHA-1, but is actually a truncated SHA-256.


* Re: Questions about the hash function transition
  2018-08-23 15:20   ` Ævar Arnfjörð Bjarmason
@ 2018-08-23 16:13     ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2018-08-23 16:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Linus Torvalds, Edward Thomson, brian m . carlson,
	Jonathan Nieder, Johannes Schindelin, demerphq, Brandon Williams,
	Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Aug 23 2018, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>>> - The trailer consists of the following:
>>>>   - A copy of the 20-byte SHA-256 checksum at the end of the
>>>>     corresponding packfile.
>>>>
>>>>   - 20-byte SHA-256 checksum of all of the above.
>>>
>>> We need to update both of these to 32 byte, right? Or are we planning to
>>> truncate the checksums?
>>
>> https://public-inbox.org/git/CA+55aFwc7UQ61EbNJ36pFU_aBCXGya4JuT-TvpPJ21hKhRengQ@mail.gmail.com/
>
> Thanks.
>
> Yeah for this checksum purpose even 10 or 5 characters would do, but
> since we'll need a new pack format anyway for SHA-256 why not just use
> the full length of the SHA-256 here? We're using the full length of the
> SHA-1.
>
> I don't see it mattering for security / corruption detection purposes,
> but just to avoid confusion. We'll have this one place left where
> something looks like a SHA-1, but is actually a truncated SHA-256.

I would prefer to see us at least explore if the gain in throughput
is sufficiently big if we switch to a weaker checksum, like crc32.  If
it does not give us sufficient gain, I'd agree with you that consistently
using the full hash everywhere would conceptually be cleaner.
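A rough way to start the exploration suggested above, as a Python sketch: time zlib's CRC-32 against hashlib's SHA-1 and SHA-256 over the same buffer. This only measures these particular library implementations (not Git's own hashing code), and the numbers vary by machine, so treat them as a first approximation. `throughput_mb_s` is a made-up helper.

```python
import hashlib
import time
import zlib

# Illustrative benchmark sketch; helper names are hypothetical.
# Each checksum returns its digest bytes, so the trailer sizes are visible too.
CHECKSUMS = {
    "crc32": lambda d: zlib.crc32(d).to_bytes(4, "big"),
    "sha1": lambda d: hashlib.sha1(d).digest(),
    "sha256": lambda d: hashlib.sha256(d).digest(),
}


def _best_seconds(fn, data: bytes, rounds: int) -> float:
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def throughput_mb_s(data: bytes, rounds: int = 5) -> dict:
    """Best-of-`rounds` throughput in MB/s for each checksum."""
    mb = len(data) / 1e6
    return {name: mb / max(_best_seconds(fn, data, rounds), 1e-9)
            for name, fn in CHECKSUMS.items()}
```

For example, calling `throughput_mb_s(os.urandom(64 << 20))` gives ballpark figures over a 64 MiB buffer.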




* Re: Questions about the hash function transition
  2018-08-23 14:02 Questions about the hash function transition Ævar Arnfjörð Bjarmason
  2018-08-23 14:27 ` Junio C Hamano
@ 2018-08-24  1:40 ` brian m. carlson
  2018-08-24  1:54   ` Jonathan Nieder
  2018-08-24  1:47 ` Jonathan Nieder
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: brian m. carlson @ 2018-08-24  1:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Linus Torvalds, Edward Thomson,
	Jonathan Nieder, Johannes Schindelin, demerphq, Brandon Williams,
	Derrick Stolee

On Thu, Aug 23, 2018 at 04:02:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > Goals
> > -----
> > 1. The transition to SHA-256 can be done one local repository at a time.
> >    a. Requiring no action by any other party.
> >    b. A SHA-256 repository can communicate with SHA-1 Git servers
> >       (push/fetch).
> >    c. Users can use SHA-1 and SHA-256 identifiers for objects
> >       interchangeably (see "Object names on the command line", below).
> >    d. New signed objects make use of a stronger hash function than
> >       SHA-1 for their security guarantees.
> > 2. Allow a complete transition away from SHA-1.
> >    a. Local metadata for SHA-1 compatibility can be removed from a
> >       repository if compatibility with SHA-1 is no longer needed.
> > 3. Maintainability throughout the process.
> >    a. The object format is kept simple and consistent.
> >    b. Creation of a generalized repository conversion tool.
> >
> > Non-Goals
> > ---------
> > 1. Add SHA-256 support to Git protocol. This is valuable and the
> >    logical next step but it is out of scope for this initial design.
> 
> This is a non-goal according to the docs, but now that we have protocol
> v2 in git, perhaps we could start specifying or describing how this
> protocol extension will work?

I have code that does this.  The reason is that the first stage of the
transition code is to implement stage 4 of the transition: that is, a
full SHA-256 implementation without any SHA-1 support.  Implementing it
that way means that we don't have to deal with any of the SHA-1 to
SHA-256 mapping in the first stage of the code.

In order to clone an SHA-256 repo (which the testsuite is completely
broken without), you need to be able to have basic SHA-256 support in
the protocol.  I know this was a non-goal, but the alternative is an
inability to run the testsuite using SHA-256 until all the code is
merged, which is unsuitable for development.  The transition plan also
anticipates stage 4 (full SHA-256) support before earlier stages, so
this will be required.

I hope to be able to spend some time documenting this in a little bit.
I have documentation for that code in my branch, but I haven't sent it
in yet.

I realize I have a lot of code that has not been sent in yet, but I also
tend to build on my own series a lot, and I probably need to be a bit
better about extracting reusable pieces that can go in independently
without waiting for the previous series to land.

> > [...]
> > 3. Intermixing objects using multiple hash functions in a single
> >    repository.
> 
> But isn't that the goal now per "Translation table" & writing both SHA-1
> and SHA-256 versions of objects?

No, I think this statement is basically that you have to have the entire
repository use all one algorithm under the hood in the .git directory,
translation tables excluded.  I don't think that's controversial.

> > [...]
> > Pack index
> > ~~~~~~~~~~
> > Pack index (.idx) files use a new v3 format that supports multiple
> > hash functions. They have the following format (all integers are in
> > network byte order):
> >
> > - A header appears at the beginning and consists of the following:
> >   - The 4-byte pack index signature: '\377t0c'
> >   - 4-byte version number: 3
> >   - 4-byte length of the header section, including the signature and
> >     version number
> >   - 4-byte number of objects contained in the pack
> >   - 4-byte number of object formats in this pack index: 2
> >   - For each object format:
> >     - 4-byte format identifier (e.g., 'sha1' for SHA-1)
> 
> So, given that we have 4-byte limit and have decided on SHA-256 are we
> just going to call this 'sha2'? That might be confusingly ambiguous
> since SHA2 is a standard with more than just SHA-256, maybe 's256', or
> maybe we should give this 8 bytes with trailing \0s so we can have
> "SHA-1\0\0\0" and "SHA-256\0"?

This is the format_version field in struct git_hash_algo.

For SHA-1, I have 0x73686131, which is "sha1", big-endian, and for
SHA-256, I have 0x73323536, which is "s256", big-endian.  The former is
in the codebase already; the latter, in my hash-impl branch.

If people have objections, we can change this up until we merge the pack
index v3 code (which is not yet finished).  It needs to be unique, and
that's it.  We could specify 0x00000001 and 0x00000002 if we wanted,
although I feel the values I mentioned above are self-documenting, which
is desirable.
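A quick Python check of the identifiers described above: each is just four ASCII bytes read as a big-endian 32-bit integer. The helper names are illustrative only, not the actual `struct git_hash_algo` code.

```python
import struct

# Illustrative helpers; names are hypothetical.


def format_id(tag: str) -> int:
    """Interpret a 4-character ASCII tag as a big-endian 32-bit integer."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("format identifiers are exactly 4 bytes")
    return struct.unpack(">I", raw)[0]


def format_tag(value: int) -> bytes:
    """Inverse of format_id: the 4 bytes as they appear on disk."""
    return struct.pack(">I", value)


SHA1_FORMAT_ID = format_id("sha1")    # 0x73686131
SHA256_FORMAT_ID = format_id("s256")  # 0x73323536
```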

> > [...]
> > - The trailer consists of the following:
> >   - A copy of the 20-byte SHA-256 checksum at the end of the
> >     corresponding packfile.
> >
> >   - 20-byte SHA-256 checksum of all of the above.
> 
> We need to update both of these to 32 byte, right? Or are we planning to
> truncate the checksums?
> 
> This seems like just a mistake when we did s/NewHash/SHA-256/g, but then
> again it was originally "20-byte NewHash checksum" ever since 752414ae43
> ("technical doc: add a design doc for hash function transition",
> 2017-09-27), so what do we mean here?

Yes, this will be 32 bytes.  The code I have uses 32 bytes, because
truncating it means that we have to write special code just for that
case, which seems silly.
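With full-length checksums, the trailer quoted above becomes two 32-byte values: a copy of the pack's SHA-256 checksum, then a SHA-256 over everything before it (including that copy). A minimal Python sketch under that reading; `idx_trailer` and `verify_idx` are hypothetical names.

```python
import hashlib

# Illustrative sketch; helper names are hypothetical.
SHA256_LEN = hashlib.sha256().digest_size   # 32 (SHA-1 would be 20)


def idx_trailer(idx_body: bytes, pack_checksum: bytes) -> bytes:
    """Build the trailer: the pack's checksum, then SHA-256 of all preceding bytes."""
    if len(pack_checksum) != SHA256_LEN:
        raise ValueError("expected a full 32-byte SHA-256 pack checksum")
    return pack_checksum + hashlib.sha256(idx_body + pack_checksum).digest()


def verify_idx(idx_bytes: bytes) -> bool:
    """Check the final checksum against everything that precedes it."""
    body, tail = idx_bytes[:-SHA256_LEN], idx_bytes[-SHA256_LEN:]
    return hashlib.sha256(body).digest() == tail
```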
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: Questions about the hash function transition
  2018-08-24  1:40 ` brian m. carlson
@ 2018-08-24  1:54   ` Jonathan Nieder
  2018-08-24  4:47     ` brian m. carlson
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-24  1:54 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

Hi,

brian m. carlson wrote:
> On Thu, Aug 23, 2018 at 04:02:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

>>> 1. Add SHA-256 support to Git protocol. This is valuable and the
>>>    logical next step but it is out of scope for this initial design.
>>
>> This is a non-goal according to the docs, but now that we have protocol
>> v2 in git, perhaps we could start specifying or describing how this
>> protocol extension will work?
>
> I have code that does this.  The reason is that the first stage of the
[nice explanation snipped]
> I hope to be able to spend some time documenting this in a little bit.
> I have documentation for that code in my branch, but I haven't sent it
> in yet.

Yay!

> I realize I have a lot of code that has not been sent in yet, but I also
> tend to build on my own series a lot, and I probably need to be a bit
> better about extracting reusable pieces that can go in independently
> without waiting for the previous series to land.

For what it's worth, even if it all is in one commit with message
"wip", I think I'd benefit from being able to see this code.  I can
promise not to critique it, and to only treat it as a rough
premonition of the future.

[...]
> For SHA-1, I have 0x73686131, which is "sha1", big-endian, and for
> SHA-256, I have 0x73323536, which is "s256", big-endian.  The former is
> in the codebase already; the latter, in my hash-impl branch.

I mentioned in another reply that "sha2" sounds fine.  "s256" of
course also sounds fine to me.  Thanks to Ævar for asking so that we
have the reminder to pin it down in the doc.

Thanks,
Jonathan


* Re: Questions about the hash function transition
  2018-08-24  1:54   ` Jonathan Nieder
@ 2018-08-24  4:47     ` brian m. carlson
  2018-08-24  4:52       ` Jonathan Nieder
  0 siblings, 1 reply; 33+ messages in thread
From: brian m. carlson @ 2018-08-24  4:47 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

On Thu, Aug 23, 2018 at 06:54:38PM -0700, Jonathan Nieder wrote:
> brian m. carlson wrote:
> > I realize I have a lot of code that has not been sent in yet, but I also
> > tend to build on my own series a lot, and I probably need to be a bit
> > better about extracting reusable pieces that can go in independently
> > without waiting for the previous series to land.
> 
> For what it's worth, even if it all is in one commit with message
> "wip", I think I'd benefit from being able to see this code.  I can
> promise not to critique it, and to only treat it as a rough
> premonition of the future.

It's in my object-id-partn branch at https://github.com/bk2204/git.

It doesn't do protocol v2 yet, but it does do protocol v1.  It is, of
course, subject to change (especially naming) depending on what the list
thinks is most appropriate.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: Questions about the hash function transition
  2018-08-24  4:47     ` brian m. carlson
@ 2018-08-24  4:52       ` Jonathan Nieder
  0 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-24  4:52 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

brian m. carlson wrote:
> On Thu, Aug 23, 2018 at 06:54:38PM -0700, Jonathan Nieder wrote:

>> For what it's worth, even if it all is in one commit with message
>> "wip", I think I'd benefit from being able to see this code.  I can
>> promise not to critique it, and to only treat it as a rough
>> premonition of the future.
>
> It's in my object-id-partn branch at https://github.com/bk2204/git.
>
> It doesn't do protocol v2 yet, but it does do protocol v1.  It is, of
> course, subject to change (especially naming) depending on what the list
> thinks is most appropriate.

 $ git diff --shortstat origin/master...bmc/object-id-partn
  185 files changed, 2263 insertions(+), 1535 deletions(-)

Beautiful.  Thanks much for this.

Sincerely,
Jonathan


* Re: Questions about the hash function transition
  2018-08-23 14:02 Questions about the hash function transition Ævar Arnfjörð Bjarmason
  2018-08-23 14:27 ` Junio C Hamano
  2018-08-24  1:40 ` brian m. carlson
@ 2018-08-24  1:47 ` Jonathan Nieder
  2018-08-28 12:04   ` Johannes Schindelin
  2018-08-29  9:13   ` How is the ^{sha256} peel syntax supposed to work? Ævar Arnfjörð Bjarmason
  2018-08-24  2:51 ` Questions about the hash function transition Jonathan Nieder
  2018-08-28 13:50 ` Ævar Arnfjörð Bjarmason
  4 siblings, 2 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-24  1:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Linus Torvalds, Edward Thomson,
	brian m . carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

Hi,

Ævar Arnfjörð Bjarmason wrote:

> I wanted to send another series to clarify things in
> hash-function-transition.txt, but for some of the issues I don't know
> the answer, and I had some questions after giving this another read.

Thanks for looking it over!  Let's go. :)

[...]
>> Objective
>> ---------
>> Migrate Git from SHA-1 to a stronger hash function.
>
> Should we say "Migrate Git from SHA-1 to SHA-256" here instead?
>
> Maybe it's overly specific, i.e. really we're also describing how /any/
> hash function transition might happen, but having just read this now
> from start to finish it takes us a really long time to mention (and at
> first, only offhand) that SHA-256 is the new hash.

Well, the objective really is to migrate to a stronger hash function,
and that we chose SHA-256 is part of the details of how we chose to do
that.  So I think this would be a misleading change.

You can tell that I'm not just trying to justify after the fact
because the initial version of the design doc at [*] already uses this
wording, and that version assumed that the hash function was going to
be SHA-256.

[*] https://public-inbox.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com/

[...]
>> Non-Goals
>> ---------
>> 1. Add SHA-256 support to Git protocol. This is valuable and the
>>    logical next step but it is out of scope for this initial design.
>
> This is a non-goal according to the docs, but now that we have protocol
> v2 in git, perhaps we could start specifying or describing how this
> protocol extension will work?

Yes, that would be great!  But I suspect it's cleanest to do so in a
separate doc.  That would allow clarifying this part, by pointing to
the protocol doc.

[...]
>> 3. Intermixing objects using multiple hash functions in a single
>>    repository.
>
> But isn't that the goal now per "Translation table" & writing both SHA-1
> and SHA-256 versions of objects?

No, we don't write both versions of objects.  The translation table
records both names of an object.

[...]
>>   - For each object format:
>>     - 4-byte format identifier (e.g., 'sha1' for SHA-1)
>
> So, given that we have 4-byte limit and have decided on SHA-256 are we
> just going to call this 'sha2'?

Good question.  'sha2' sounds fine to me.  If we want to do
SHA-512/256 later, say, we'd just have to come up with a name for that
at that point (and it doesn't have to be ASCII).

>                                 That might be confusingly ambiguous

This is a binary format.  Are you really worried that people are going
to misinterpret the magic numbers it contains?

> since SHA2 is a standard with more than just SHA-256, maybe 's256', or
> maybe we should give this 8 bytes with trailing \0s so we can have
> "SHA-1\0\0\0" and "SHA-256\0"?

For what it's worth, if that's the alternative, I'd rather have four
random bytes.

[...]
>> The loose object index is protected against concurrent writes by a
>> lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
>> object:
>>
>> 1. Write the loose object to a temporary file, like today.
>> 2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
>> 3. Rename the loose object into place.
>> 4. Open loose-object-idx with O_APPEND and write the new object
>> 5. Unlink loose-object-idx.lock to release the lock.
>>
>> To remove entries (e.g. in "git pack-refs" or "git-prune"):
>>
>> 1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
>>    lock.
>> 2. Write the new content to loose-object-idx.lock.
>> 3. Unlink any loose objects being removed.
>> 4. Rename to replace loose-object-idx, releasing the lock.
>
> Do we expect multiple concurrent writers to poll the lock if they can't
> acquire it right away? I.e. concurrent "git commit" would block? Has this
> overall approach been benchmarked somewhere?

Git doesn't support concurrent "git commit" today.

My feeling is that if loose object writing becomes a performance
problem, we should switch to writing packfiles instead (as "git
receive-pack" already does).  So when there's a choice between better
performance of writing loose objects and simplicity, I lean toward
simplicity (though that's not absolute, there are definitely tradeoffs
to be made).

Earlier discussion about this had sharded loose object indices for
each xy/ subdir.  It was more complicated, for not much gain.
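The quoted steps for adding a loose object can be sketched with plain POSIX calls. This is only an illustration, not Git's lockfile API. `add_loose_object` is a hypothetical name, and the sketch assumes the caller has already done step 1. A writer that finds the lock held simply fails with EEXIST instead of polling, so the blocking question raised above stays open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch of steps 2-5 for adding a loose object; the caller
 * has already written the object to tmp_path (step 1). Returns 0 on
 * success and -1 on failure; if the lock is already held, open() fails
 * and errno is EEXIST.
 */
static int add_loose_object(const char *objdir, const char *tmp_path,
			    const char *final_path, const char *entry)
{
	char lock_path[4096], idx_path[4096];
	size_t len = strlen(entry);
	int fd, ret = -1;

	assert(len > 0);
	snprintf(lock_path, sizeof(lock_path), "%s/loose-object-idx.lock", objdir);
	snprintf(idx_path, sizeof(idx_path), "%s/loose-object-idx", objdir);

	/* Step 2: O_CREAT | O_EXCL succeeds for only one writer at a time. */
	fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return -1;
	close(fd);

	/* Step 3: rename the finished object into place. */
	if (rename(tmp_path, final_path) < 0)
		goto unlock;

	/* Step 4: append the new entry to loose-object-idx. */
	fd = open(idx_path, O_WRONLY | O_CREAT | O_APPEND, 0666);
	if (fd < 0)
		goto unlock;
	if (write(fd, entry, len) == (ssize_t)len)
		ret = 0;
	if (close(fd) < 0)
		ret = -1;

unlock:
	/* Step 5: unlink the lock file to release the lock. */
	unlink(lock_path);
	return ret;
}
```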

[...]
> Maybe I've missed some subtlety where that won't work; I'm just
> concerned that something that's writing a lot of objects in parallel
> will be slowed down (e.g. the likes of BFG repo cleaner).

BFG repo cleaner is an application like fast-import that is a good fit
for writing packs, not loose objects.

[...]
>> Since all operations that make new objects (e.g., "git commit") add
>> the new objects to the corresponding index, this mapping is possible
>> for all objects in the object store.
>
> Are we going to need a midx version of these mapping files? How does
> midx fit into this picture? Perhaps it's too obscure to worry about...

That's a great question!  I think the simplest answer is to have a
midx only for the primary object format and fall back to using
ordinary idx files for the others.

The midx format already has a field for hash function (thanks,
Derrick!).

[...]
>> 5. clean up: remove the SHA-1 based pack file, index, and
>>    topologically sorted list obtained from the server in steps 1
>>    and 2.
>
> Doesn't this process require us to implement a "fetch quarantine"?
> Otherwise, couldn't something (e.g. another concurrent fetch) end up
> referencing the new SHA-1 objects we've fetched in a pack that we'll
> remove in step #5?

During a fetch today, objects aren't accessible until the
corresponding .idx file has been put in place.

[...]
>> The user can also explicitly specify which format to use for a
>> particular revision specifier and for output, overriding the mode. For
>> example:
>>
>> git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
>
> How is this going to interact with other peel syntax? I.e. now we have
> <object>^{commit} <sha>^{tag} etc. It seems to me we'll need not ^{sha1}
> but ^{sha1:<current_type>}, e.g. ^{sha1:commit} or ^{sha1:tag}, with
> current ^{} being a synonym for ^{sha1:}.
>
> Or is this expected to be chained, as e.g. <object>^{tag}^{sha256} ?

Great question.  The latter (well, <hexdigits>^{sha256}^{tag}, not the
other way around).
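Here is a minimal parser for that ordering: a hash selector may appear only directly after the hex digits, and any type peel comes after it. The names `parse_hash_selector` and `enum hash_algo` are hypothetical. Only ^{sha1} and ^{sha256} are recognised, and actually resolving the name is out of scope.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum hash_algo { ALGO_DEFAULT, ALGO_SHA1, ALGO_SHA256 };

/*
 * Hypothetical parser for "<hexdigits>[^{sha1}|^{sha256}][^{<type>}...]".
 * On success returns 0 and reports the length of the hex prefix, the
 * selected hash and the remaining peel suffix (e.g. "^{tag}"). Returns -1
 * if there are no hex digits or a hash selector appears anywhere other
 * than directly after them.
 */
static int parse_hash_selector(const char *spec, size_t *hex_len,
			       enum hash_algo *algo, const char **rest)
{
	const char *p = spec;

	assert(spec && hex_len && algo && rest);
	while (isxdigit((unsigned char)*p))
		p++;
	if (p == spec)
		return -1;
	*hex_len = (size_t)(p - spec);

	*algo = ALGO_DEFAULT;
	if (!strncmp(p, "^{sha1}", 7)) {
		*algo = ALGO_SHA1;
		p += 7;
	} else if (!strncmp(p, "^{sha256}", 9)) {
		*algo = ALGO_SHA256;
		p += 9;
	}

	/* e.g. "<hexdigits>^{tag}^{sha256}": selector after a type peel. */
	if (strstr(p, "^{sha1}") || strstr(p, "^{sha256}"))
		return -1;
	*rest = p;
	return 0;
}
```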

>> Transition plan
>> ---------------
>
> One thing that's not covered in this document at all, which I feel is
> missing, is how we're going to handle references to old commit IDs in
> commit messages, bug trackers etc. once we go through the whole
> migration process.
>
> I.e. are users who expect to be able to read old history and "git show
> <sha1 I found>" expected to maintain a repository that has a live
> sha1<->sha256 mapping forever, or could we be smarter about this and
> support some sort of marker in the repository saying "maintain the
> mapping up until this point".

That's a good question, too.  My feeling is that such a selective
mapping could be invented later and would want to work differently
than this design.  The important thing with this design is that the
information is not lost, so the door to implementing that is not
closed.

As a brief strawman of what I mean, I wouldn't be surprised if
projects want to distribute a simple signed flat sha1<->sha256 mapping
table for commits from "before the SHA-256 era", and Git could learn
to consume that.

Thanks,
Jonathan

* Re: Questions about the hash function transition
  2018-08-24  1:47 ` Jonathan Nieder
@ 2018-08-28 12:04   ` Johannes Schindelin
  2018-08-28 12:49     ` Derrick Stolee
  2018-08-28 17:11     ` Jonathan Nieder
  2018-08-29  9:13   ` How is the ^{sha256} peel syntax supposed to work? Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 33+ messages in thread
From: Johannes Schindelin @ 2018-08-28 12:04 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m . carlson, demerphq,
	Brandon Williams, Derrick Stolee

Hi,

On Thu, 23 Aug 2018, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
> 
> [...]
> >> Since all operations that make new objects (e.g., "git commit") add
> >> the new objects to the corresponding index, this mapping is possible
> >> for all objects in the object store.
> >
> > Are we going to need a midx version of these mapping files? How does
> > midx fit into this picture? Perhaps it's too obscure to worry about...
> 
> That's a great question!  I think the simplest answer is to have a
> midx only for the primary object format and fall back to using
> ordinary idx files for the others.
> 
> The midx format already has a field for hash function (thanks,
> Derrick!).

Related: I wondered whether we could simply leverage the midx code for the
bidirectional SHA-1 <-> SHA-256 mapping, as it strikes me as very similar
in concept and challenges.

Ciao,
Dscho

* Re: Questions about the hash function transition
  2018-08-28 12:04   ` Johannes Schindelin
@ 2018-08-28 12:49     ` Derrick Stolee
  2018-08-28 17:12       ` Jonathan Nieder
  2018-08-28 17:11     ` Jonathan Nieder
  1 sibling, 1 reply; 33+ messages in thread
From: Derrick Stolee @ 2018-08-28 12:49 UTC (permalink / raw)
  To: Johannes Schindelin, Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m . carlson, demerphq,
	Brandon Williams

On 8/28/2018 8:04 AM, Johannes Schindelin wrote:
> Hi,
>
> On Thu, 23 Aug 2018, Jonathan Nieder wrote:
>
>> Ævar Arnfjörð Bjarmason wrote:
>>
>> [...]
>>>> Since all operations that make new objects (e.g., "git commit") add
>>>> the new objects to the corresponding index, this mapping is possible
>>>> for all objects in the object store.
>>> Are we going to need a midx version of these mapping files? How does
>>> midx fit into this picture? Perhaps it's too obscure to worry about...
>> That's a great question!  I think the simplest answer is to have a
>> midx only for the primary object format and fall back to using
>> ordinary idx files for the others.
>>
>> The midx format already has a field for hash function (thanks,
>> Derrick!).
> Related: I wondered whether we could simply leverage the midx code for the
> bidirectional SHA-1 <-> SHA-256 mapping, as it strikes me as very similar
> in concept and challenges.

If we would like such a mapping, then I would propose the following:

1. The object store has everything in SHA-256, so the HASH_LEN parameter 
of the multi-pack-index is 32.

2. We create an optional chunk to add to the multi-pack-index that
stores the SHA-1 for each object. This list would be in lexicographic order.

3. We create two optional chunks that store the bijection between 
SHA-256 and SHA-1: the first is a list of integers i_0, i_1, ..., 
i_{N-1} such that i_k is the position in the SHA-1 list corresponding to 
the kth SHA-256. The second is a list of integers j_0, j_1, ..., j_{N-1} 
such that j_k is the position in the SHA-256 list of the kth SHA-1.
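Points 2 and 3 can be illustrated by computing both permutation chunks from N objects whose SHA-256 and SHA-1 names are both known. This is a sketch under stated assumptions. `struct obj_names` and `build_bijection` are hypothetical, and an on-disk chunk would store the integers in network byte order rather than as native uint32_t arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory record: both names of one object. */
struct obj_names {
	unsigned char sha256[32];
	unsigned char sha1[20];
};

static const struct obj_names *sort_base;	/* qsort has no context argument */

static int cmp_sha256(const void *a, const void *b)
{
	return memcmp(((const struct obj_names *)a)->sha256,
		      ((const struct obj_names *)b)->sha256, 32);
}

static int cmp_by_sha1(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return memcmp(sort_base[x].sha1, sort_base[y].sha1, 20);
}

/*
 * Sorts objs[] by SHA-256 (the primary list), then fills
 * i_of[k] = position in the SHA-1 list of the kth SHA-256, and
 * j_of[k] = position in the SHA-256 list of the kth SHA-1.
 */
static void build_bijection(struct obj_names *objs, uint32_t n,
			    uint32_t *i_of, uint32_t *j_of)
{
	uint32_t k;

	assert(n == 0 || (objs && i_of && j_of));
	qsort(objs, n, sizeof(*objs), cmp_sha256);

	/* Sort SHA-256 positions by their SHA-1 name to get j. */
	for (k = 0; k < n; k++)
		j_of[k] = k;
	sort_base = objs;
	qsort(j_of, n, sizeof(*j_of), cmp_by_sha1);

	/* i is the inverse permutation of j. */
	for (k = 0; k < n; k++)
		i_of[j_of[k]] = k;
}
```

The two arrays are inverse permutations (j[i[k]] == k), so a reader could in principle derive one from the other. Storing both trades space for a direct lookup in each direction.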

I'm not super-familiar with how the transition plan specifically needs 
this mapping, but it seems like a good place to put it.

Thanks,

-Stolee

* Re: Questions about the hash function transition
  2018-08-28 12:49     ` Derrick Stolee
@ 2018-08-28 17:12       ` Jonathan Nieder
  0 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-28 17:12 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Johannes Schindelin, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano, Linus Torvalds, Edward Thomson, brian m . carlson,
	demerphq, Brandon Williams

Derrick Stolee wrote:

> I'm not super-familiar with how the transition plan specifically needs this
> mapping, but it seems like a good place to put it.

Would you mind reading it through and letting me know your thoughts?
More eyes can't hurt.

Thanks,
Jonathan

* Re: Questions about the hash function transition
  2018-08-28 12:04   ` Johannes Schindelin
  2018-08-28 12:49     ` Derrick Stolee
@ 2018-08-28 17:11     ` Jonathan Nieder
  2018-08-29 13:09       ` Johannes Schindelin
  1 sibling, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-28 17:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m . carlson, demerphq,
	Brandon Williams, Derrick Stolee

Hi,

Johannes Schindelin wrote:
> On Thu, 23 Aug 2018, Jonathan Nieder wrote:
> > Ævar Arnfjörð Bjarmason wrote:

>>> Are we going to need a midx version of these mapping files? How does
>>> midx fit into this picture? Perhaps it's too obscure to worry about...
>>
>> That's a great question!  I think the simplest answer is to have a
>> midx only for the primary object format and fall back to using
>> ordinary idx files for the others.
>>
>> The midx format already has a field for hash function (thanks,
>> Derrick!).
>
> Related: I wondered whether we could simply leverage the midx code for the
> bidirectional SHA-1 <-> SHA-256 mapping, as it strikes me as very similar
> in concept and challenges.

Interesting: tell me more.

My first instinct is to prefer the idx-based design that is already
described in the design doc.  If we want to change that, we should
have a motivating reason.

Midx is designed to be optional and to not necessarily cover all
objects, so it doesn't seem like a good fit.

Thanks,
Jonathan

* Re: Questions about the hash function transition
  2018-08-28 17:11     ` Jonathan Nieder
@ 2018-08-29 13:09       ` Johannes Schindelin
  2018-08-29 13:27         ` Derrick Stolee
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2018-08-29 13:09 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m . carlson, demerphq,
	Brandon Williams, Derrick Stolee

Hi Jonathan,

On Tue, 28 Aug 2018, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> > On Thu, 23 Aug 2018, Jonathan Nieder wrote:
> > > Ævar Arnfjörð Bjarmason wrote:
> 
> >>> Are we going to need a midx version of these mapping files? How does
> >>> midx fit into this picture? Perhaps it's too obscure to worry
> >>> about...
> >>
> >> That's a great question!  I think the simplest answer is to have a
> >> midx only for the primary object format and fall back to using
> >> ordinary idx files for the others.
> >>
> >> The midx format already has a field for hash function (thanks,
> >> Derrick!).
> >
> > Related: I wondered whether we could simply leverage the midx code for
> > the bidirectional SHA-1 <-> SHA-256 mapping, as it strikes me as very
> > similar in concept and challenges.
> 
> Interesting: tell me more.
> 
> My first instinct is to prefer the idx-based design that is already
> described in the design doc.  If we want to change that, we should
> have a motivating reason.
> 
> Midx is designed to be optional and to not necessarily cover all
> objects, so it doesn't seem like a good fit.

Right.

What I meant was to leverage the midx code, not the .midx files.

My comment was motivated by my realizing that both the SHA-1 <-> SHA-256
mapping and the MIDX code have to look up (in a *fast* way) information
with hash values as keys. *And* this information is immutable. *And* the
amount of information should grow with new objects being added to the
database.

I know that Stolee performed a bit of performance testing regarding
different data structures to use in MIDX. We could benefit from that
testing by using not only the results from those tests, but also the code.

IIRC one of the insights was that packs are a natural structure that
can be used for the MIDX mapping, too (you could, for example, store the
SHA-1 <-> SHA-256 mapping *only* for objects inside packs, and re-generate
them on the fly for loose objects all the time).

Stolee can speak with much more competence and confidence about this,
though, whereas all of what I said above is me waving my hands quite
frantically.

Ciao,
Dscho

* Re: Questions about the hash function transition
  2018-08-29 13:09       ` Johannes Schindelin
@ 2018-08-29 13:27         ` Derrick Stolee
  2018-08-29 14:43           ` Derrick Stolee
  0 siblings, 1 reply; 33+ messages in thread
From: Derrick Stolee @ 2018-08-29 13:27 UTC (permalink / raw)
  To: Johannes Schindelin, Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m . carlson, demerphq,
	Brandon Williams

On 8/29/2018 9:09 AM, Johannes Schindelin wrote:
> Hi Jonathan,
>
> On Tue, 28 Aug 2018, Jonathan Nieder wrote:
>
>> Johannes Schindelin wrote:
>>> On Thu, 23 Aug 2018, Jonathan Nieder wrote:
>>>> Ævar Arnfjörð Bjarmason wrote:
>>>>> Are we going to need a midx version of these mapping files? How does
>>>>> midx fit into this picture? Perhaps it's too obscure to worry
>>>>> about...
>>>> That's a great question!  I think the simplest answer is to have a
>>>> midx only for the primary object format and fall back to using
>>>> ordinary idx files for the others.
>>>>
>>>> The midx format already has a field for hash function (thanks,
>>>> Derrick!).
>>> Related: I wondered whether we could simply leverage the midx code for
>>> the bidirectional SHA-1 <-> SHA-256 mapping, as it strikes me as very
>>> similar in concept and challenges.
>> Interesting: tell me more.
>>
>> My first instinct is to prefer the idx-based design that is already
>> described in the design doc.  If we want to change that, we should
>> have a motivating reason.
>>
>> Midx is designed to be optional and to not necessarily cover all
>> objects, so it doesn't seem like a good fit.

It is optional, but shouldn't this mode, where a Git repo needs to
know about two different versions of all files, be optional? Or at least
temporary?

The multi-pack-index is intended to cover all packed objects, so it covers
the same number of objects as an IDX-based strategy. If we are
rebuilding the repo from scratch by translating the hashes, then "being
too big to repack" is probably not a problem, so we would expect a
single IDX file anyway.

In my opinion, whatever we do for the IDX-based approach will need to be 
duplicated in the multi-pack-index. The multi-pack-index does have a 
natural mechanism (optional chunks) for inserting this data without 
incrementing the version number.

> Right.
>
> What I meant was to leverage the midx code, not the .midx files.
>
> My comment was motivated by my realizing that both the SHA-1 <-> SHA-256
> mapping and the MIDX code have to look up (in a *fast* way) information
> with hash values as keys. *And* this information is immutable. *And* the
> amount of information should grow with new objects being added to the
> database.

I'm unsure what this means, as the multi-pack-index simply uses 
bsearch_hash() to find hashes in the list. The same method is used for 
IDX lookups.

> I know that Stolee performed a bit of performance testing regarding
> different data structures to use in MIDX. We could benefit from that
> testing by using not only the results from those tests, but also the code.

I did test ways to use something other than bsearch_hash(), such as 
using a 65,536-entry fanout table for lookups using the first two bytes 
of a hash (tl;dr: it speeds things up a bit, but the super-small 
improvement is probably not worth the space and complexity). I've also 
toyed with the idea of using interpolation search inside bsearch_hash(), 
but I haven't had time to do that.
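For context on the lookups being compared: a pack .idx file has a 256-entry fanout table keyed on the first byte of the hash, which narrows the range a binary search must cover. The 65,536-entry variant keys on the first two bytes instead. Below is a simplified sketch of the one-byte scheme over a flat sorted table of 20-byte names. `build_fanout` and `fanout_lookup` are hypothetical, and this is not Git's bsearch_hash().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20	/* SHA-1 names; would be 32 for SHA-256 */

/*
 * Hypothetical helper: fanout[b] = number of names whose first byte is
 * <= b, for a table of n names sorted in lexicographic order.
 */
static void build_fanout(const unsigned char *table, uint32_t n,
			 uint32_t fanout[256])
{
	uint32_t k = 0;

	for (unsigned b = 0; b < 256; b++) {
		while (k < n && (unsigned)table[(size_t)k * HASH_LEN] <= b)
			k++;
		fanout[b] = k;
	}
	assert(fanout[255] == n);	/* every first byte is <= 0xff */
}

/* Returns the position of key in the sorted table, or -1 if absent. */
static int64_t fanout_lookup(const unsigned char *table,
			     const uint32_t fanout[256],
			     const unsigned char *key)
{
	/* Only names sharing key's first byte can match. */
	uint32_t lo = key[0] ? fanout[key[0] - 1] : 0;
	uint32_t hi = fanout[key[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(table + (size_t)mid * HASH_LEN, key, HASH_LEN);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```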

> IIRC one of the insights was that packs are a natural structure that
> can be used for the MIDX mapping, too (you could, for example, store the
> SHA-1 <-> SHA-256 mapping *only* for objects inside packs, and re-generate
> them on the fly for loose objects all the time).
>
> Stolee can speak with much more competence and confidence about this,
> though, whereas all of what I said above is me waving my hands quite
> frantically.

I understand the hesitation to tie such an important feature (hash
transition) to a feature that hasn't even shipped. We will need to see
how things progress on both fronts to see how mature the
multi-pack-index is when we need this transition table.

Thanks,

-Stolee

* Re: Questions about the hash function transition
  2018-08-29 13:27         ` Derrick Stolee
@ 2018-08-29 14:43           ` Derrick Stolee
  0 siblings, 0 replies; 33+ messages in thread
From: Derrick Stolee @ 2018-08-29 14:43 UTC (permalink / raw)
  To: Johannes Schindelin, Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m . carlson, demerphq,
	Brandon Williams

On 8/29/2018 9:27 AM, Derrick Stolee wrote:
> On 8/29/2018 9:09 AM, Johannes Schindelin wrote:
>>
>> What I meant was to leverage the midx code, not the .midx files.
>>
>> My comment was motivated by my realizing that both the SHA-1 <-> SHA-256
>> mapping and the MIDX code have to look up (in a *fast* way) information
>> with hash values as keys. *And* this information is immutable. *And* the
>> amount of information should grow with new objects being added to the
>> database.
>
> I'm unsure what this means, as the multi-pack-index simply uses 
> bsearch_hash() to find hashes in the list. The same method is used for 
> IDX lookups.
>
I talked with Johannes privately, and we found differences in our 
understanding of the current multi-pack-index feature. Johannes thought 
the feature was farther along than it is, specifically related to how 
much we value the data in the multi-pack-index when adding objects to 
pack-files or repacking. Some of this misunderstanding is due to how the
equivalent feature works in VSTS (where there is no IDX-file equivalent;
every object in the repo is tracked by a multi-pack-index).

I'd like to point out a few things about how the multi-pack-index works 
now, and how we hope to extend it in the future.

Currently:

1. Objects are added to the multi-pack-index by adding a new set of 
.idx/.pack file pairs. We scan the .idx file for the objects and offsets 
to add.

2. We re-use the information in the multi-pack-index only to write the 
new one without re-reading the .pack files that are already covered.

3. If a 'git repack' command deletes a pack-file, then we delete the 
multi-pack-index. It must be regenerated by 'git multi-pack-index write' 
later.

In the current world, the multi-pack-index is completely secondary to 
the .idx files.

In the future, I hope these features exist in the multi-pack-index:

1. A stable object order. As objects are added to the multi-pack-index, 
we assign a distinct integer value to each. As we add objects, those 
integer values do not change. We can then pair the reachability bitmap
to the multi-pack-index instead of a specific pack-file (allowing repack 
and bitmap computations to happen asynchronously). The data required to 
store this object order is very similar to storing the bijection between 
SHA-1 and SHA-256 hashes.

2. Incremental multi-pack-index: Currently, we have only one 
multi-pack-index file per object directory. We can use a mechanism 
similar to the split-index to keep a small number of multi-pack-index 
files (at most 3, probably) such that the 
'.git/objects/pack/multi-pack-index' file is small and easy to rewrite, 
while it refers to larger '.git/objects/pack/*.midx' files that change 
infrequently.

3. Multi-pack-index-aware repack: The repacker knows only enough about
the multi-pack-index to delete it. We could instead directly
manipulate the multi-pack-index during repack, and we could decide to do
more incremental repacks based on data stored in the multi-pack-index.

In conclusion: please keep the multi-pack-index in mind as we implement 
the transition plan. I'll continue building the feature as planned (the 
next thing to do after the current series of cleanups is 'git 
multi-pack-index verify') but am happy to look into other applications 
as we need it.

Thanks,

-Stolee

* How is the ^{sha256} peel syntax supposed to work?
  2018-08-24  1:47 ` Jonathan Nieder
  2018-08-28 12:04   ` Johannes Schindelin
@ 2018-08-29  9:13   ` Ævar Arnfjörð Bjarmason
  2018-08-29 17:51     ` Stefan Beller
  2018-08-29 17:56     ` Jonathan Nieder
  1 sibling, 2 replies; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-29  9:13 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Junio C Hamano, Linus Torvalds, Edward Thomson,
	brian m . carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

On Fri, Aug 24 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>>> git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
>>
>> How is this going to interact with other peel syntax? I.e. now we have
>> <object>^{commit} <sha>^{tag} etc. It seems to me we'll need not ^{sha1}
>> but ^{sha1:<current_type>}, e.g. ^{sha1:commit} or ^{sha1:tag}, with
>> current ^{} being a synonym for ^{sha1:}.
>>
>> Or is this expected to be chained, as e.g. <object>^{tag}^{sha256} ?
>
> Great question.  The latter (well, <hexdigits>^{sha256}^{tag}, not the
> other way around).

Since nobody's chimed in with an answer, and I suspect many have an
aversion to that big thread, I thought I'd spin out just this small
question into its own thread.

brian m. carlson did some prep work for this in his just-submitted
https://public-inbox.org/git/20180829005857.980820-2-sandals@crustytoothpaste.net/

I was going to work on some of the peel code soon (digging up the type
disambiguation patches I still need to re-submit), so could do this
while I'm at it, i.e. implement ^{sha1}.

But as noted above it's not clear how it should work. Jonathan's
chaining suggestion (<hexdigits>^{sha256}^{tag} not
<hexdigits>^{tag}^{sha256}) makes more sense than mine, but is that what
we're going for, or ^{sha256:tag}?

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29  9:13   ` How is the ^{sha256} peel syntax supposed to work? Ævar Arnfjörð Bjarmason
@ 2018-08-29 17:51     ` Stefan Beller
  2018-08-29 17:59       ` Jonathan Nieder
  2018-08-29 17:56     ` Jonathan Nieder
  1 sibling, 1 reply; 33+ messages in thread
From: Stefan Beller @ 2018-08-29 17:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jonathan Nieder, git, Junio C Hamano, Linus Torvalds,
	Edward Thomson, brian m. carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

On Wed, Aug 29, 2018 at 2:13 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Fri, Aug 24 2018, Jonathan Nieder wrote:
>
> > Hi,
> >
> > Ævar Arnfjörð Bjarmason wrote:
> >
> >>> git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
> >>
> >> How is this going to interact with other peel syntax? I.e. now we have
> >> <object>^{commit} <sha>^{tag} etc. It seems to me we'll need not ^{sha1}
> >> but ^{sha1:<current_type>}, e.g. ^{sha1:commit} or ^{sha1:tag}, with
> >> current ^{} being a synonym for ^{sha1:}.
> >>
> >> Or is this expected to be chained, as e.g. <object>^{tag}^{sha256} ?
> >
> > Great question.  The latter (well, <hexdigits>^{sha256}^{tag}, not the
> > other way around).
>
> Since nobody's chimed in with an answer, and I suspect many have an
> aversion to that big thread, I thought I'd spin out just this small
> question into its own thread.
>
> brian m. carlson did some prep work for this in his just-submitted
> https://public-inbox.org/git/20180829005857.980820-2-sandals@crustytoothpaste.net/
>
> I was going to work on some of the peel code soon (digging up the type
> disambiguation patches I still need to re-submit), so could do this
> while I'm at it, i.e. implement ^{sha1}.
>
> But as noted above it's not clear how it should work. Jonathan's
> chaining suggestion (<hexdigits>^{sha256}^{tag} not
> <hexdigits>^{tag}^{sha256}) makes more sense than mine, but is that what
> we're going for, or ^{sha256:tag}?

The choice of hash seems position-independent to me, so as a user
I would at first expect both to work. Though when looking at more
syntax of these expressions, e.g. b9dfa238d5c34~1^2^^, it is
read left to right, i.e. you arrive at the destination by evaluating
each part of the expression in turn and jumping to the object it names.
And with that model, <hexdigits>^{sha256}^{tree}
could mean to obtain the sha256 value of <hexvalue> and then derive
the tree from that object, so it is unclear whether the tree object would
also come in sha256 or whether we could just return the tree in sha1
notation (which would be a correct, though confusing, description, since
the sha256 conversion happened at an intermediate step).

So with that said, I would expect the hash specifier at the end of the chain.

Would the position of the hash specifier make any difference for
verifying signed tags/commits? (I.e., subtly asking whether we would
explicitly verify the sha1 signature or the sha256 signature, vs. verify
an object that is given as <hexval> in sha1 or in sha256.)

Thanks,
Stefan

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 17:51     ` Stefan Beller
@ 2018-08-29 17:59       ` Jonathan Nieder
  2018-08-29 18:34         ` Stefan Beller
  2018-08-29 18:41         ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-29 17:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m. carlson,
	Johannes Schindelin, demerphq, Brandon Williams, Derrick Stolee

Stefan Beller wrote:

>                  And with that model, <hexdigits>^{sha256}^{tree}
> could mean to obtain the sha256 value of <hexvalue> and then derive
> the tree from that object,

What does "the sha256 value of <hexvalue>" mean?

For example, in a repository with two objects:

 1. an object with sha1-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
    and sha256-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01

 2. an object with sha1-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
    and sha256-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...

what objects would you expect the following to refer to?

  abcdabcd^{sha1}
  abcdabcd^{sha256}
  ef01ef01^{sha1}
  ef01ef01^{sha256}

Thanks,
Jonathan

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 17:59       ` Jonathan Nieder
@ 2018-08-29 18:34         ` Stefan Beller
  2018-08-29 18:41         ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2018-08-29 18:34 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Linus Torvalds, Edward Thomson, brian m. carlson,
	Johannes Schindelin, demerphq, Brandon Williams, Derrick Stolee

On Wed, Aug 29, 2018 at 10:59 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Stefan Beller wrote:
>
> >                  And with that model, <hexdigits>^{sha256}^{tree}
> > could mean to obtain the sha256 value of <hexvalue> and then derive
> > the tree from that object,
>
> What does "the sha256 value of <hexvalue>" mean?

s/hexvalue/hexdigits/
..

And with that model, <hexdigits>^{sha256}^{tree}
could mean to obtain the object using sha256 descriptors
(for trees/blobs/commits/tags) of <hexdigits> (depending on the
step of the transition plan, <hexdigits> could be interpreted
as SHA1, as SHA256, or DWIM).

>
> For example, in a repository with two objects:
>
>  1. an object with sha1-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
>     and sha256-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>
>  2. an object with sha1-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>     and sha256-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...
>

It would be super cool to find hash values that match each other like
that, but for the sake of the example, let's go with it.

> what objects would you expect the following to refer to?

That generally depends on the step of the transition plan
in that specific Git repository.

I thought these format specifiers would only describe how to
output the object names correctly for now, so it could be
possible to have:

$ git show abcdabcd^{sha1}
commit abcdabcd...
....

$ git show abcdabcd^{sha256}
commit ef01ef01e...
....

in one step and

$ git show abcdabcd^{sha1}
commit ef01ef01e...
....

$ git show abcdabcd^{sha256}
commit abcdabcd...
....

in another step, and in yet another step it could mean

$ git show abcdabcd^{sha1}
commit abcdabcd[...]^{sha1}
...

But my question was more hinting to the point that we should not
overload the syntax to mean much more than either output formatting
or hash selection.

The third meaning could be used for verifying objects, as we could
use this syntax to mean

  "please verify the signature of the object (as given by ^{hash})"

or it could mean

  "please verify the signature of the object as given and ensure that
    it was signed in this ^{hash} and not in a weaker hash world".

And I would think the verification should not be folded into this
notation for now. We only want to ask for the output to be in one
hash or the other, or to ask for the object that is <hexdigits> in
the specified hash, and which of these two modes applies depends
on the step in the transition plan.
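The two readings can be made concrete with Jonathan's two-object example. In the sketch, `lookup` selects which hash the prefix is matched against, while `show_as` only changes how the resolved object is printed. The names and the `struct object` layout are hypothetical. Which of these `^{sha256}` should mean is exactly the question the thread leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum name_algo { NAME_SHA1, NAME_SHA256 };

/* Hypothetical record holding both hex names of one object. */
struct object {
	const char *name[2];	/* indexed by enum name_algo */
};

/* Hash selection: the unique object whose name in `in` starts with prefix. */
static const struct object *lookup(const struct object *objs, size_t n,
				   const char *prefix, enum name_algo in)
{
	const struct object *found = NULL;
	size_t len = strlen(prefix);

	assert(len > 0);
	for (size_t k = 0; k < n; k++) {
		if (strncmp(objs[k].name[in], prefix, len))
			continue;
		if (found)
			return NULL;	/* ambiguous prefix */
		found = &objs[k];
	}
	return found;
}

/* Output formatting: the same object, spelled in the requested hash. */
static const char *show_as(const struct object *obj, enum name_algo out)
{
	return obj ? obj->name[out] : NULL;
}
```

With these definitions, abcdabcd resolves to object 1 when matched against SHA-1 names and to object 2 when matched against SHA-256 names. That is why the interpretation has to be pinned down before ^{sha1} and ^{sha256} can be implemented.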

Stefan

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 17:59       ` Jonathan Nieder
  2018-08-29 18:34         ` Stefan Beller
@ 2018-08-29 18:41         ` Ævar Arnfjörð Bjarmason
  2018-08-29 19:12           ` Jonathan Nieder
  1 sibling, 1 reply; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-29 18:41 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, git, Junio C Hamano, Linus Torvalds,
	Edward Thomson, brian m. carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee, Jeff King


On Wed, Aug 29 2018, Jonathan Nieder wrote:

> Stefan Beller wrote:
>
>>                  And with that model, <hexdigits>^{sha256}^{tree}
>> could mean to obtain the sha256 value of <hexvalue> and then derive
>> the tree from that object,
>
> What does "the sha256 value of <hexvalue>" mean?
>
> For example, in a repository with two objects:
>
>  1. an object with sha1-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
>     and sha256-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>
>  2. an object with sha1-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>     and sha256-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...

I'm not saying this makes sense, or that it doesn't; honestly, my head's
still spinning a bit from this mail exchange (these are the patches I
need to re-submit):
https://public-inbox.org/git/878t8txfyf.fsf@evledraar.gmail.com/#t

But paraphrasing my understanding of what Junio & Jeff are saying in
that thread, basically what the peel syntax means is different in the
two completely different scenarios it's used:

 1. When it's being used as <object>^{<thing>}[...^{<thing2>}] AND
    <object> is an unambiguous SHA1 it's fairly straightforward, i.e. if
    <object> is a commit and you say ^{tree} it lists the tree SHA-1,
    but if <object> is e.g. a tree and you say ^{blob} it produces an
    error, since there's no one blob.

 2. When it's used in the same way, but <object> is an ambiguous SHA1 we
    fall back on a completely different sort of behavior.

    Now it's (or rather, it's supposed to be; I haven't yet worked
    through the feedback and rewritten the patches) this weird sort of
    filter syntax, where <ambiguous_object>^{<type>} will return the
    SHA-1s starting with the prefix <ambiguous_object> IF the types of
    those objects could be contained within that type of object.

    So e.g. abcabc^{tree} is supposed to list all tree and blob objects
    whose names start with abcabc, even though some of those blobs may
    not be reachable from those trees.

    It doesn't make sense to me, but there it is.
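
A minimal sketch of the filter reading in item 2, under loud
assumptions: the toy object store OBJECTS and the CONTAINS table
(which types may sit inside which) are hypothetical, chosen only to
illustrate the behaviour, and are not how Git stores or classifies
objects:

```python
# Hypothetical toy object store: (shortened) object name -> object type.
OBJECTS = {
    "abcabc01": "commit",
    "abcabc02": "tree",
    "abcabc03": "blob",
    "abcabc04": "tag",
    "ffff0000": "tree",
}

# Assumed model of "could be contained within": which object types may
# appear inside (or be pointed to by) an object of the given type.
CONTAINS = {
    "tag": {"tag", "commit", "tree", "blob"},
    "commit": {"commit", "tree", "blob"},
    "tree": {"tree", "blob"},
    "blob": {"blob"},
}


def filter_candidates(prefix, peel_type):
    """Return every object whose name starts with `prefix` and whose type
    could be contained within `peel_type`. Reachability is NOT checked,
    so a listed blob need not be reachable from any listed tree."""
    allowed = CONTAINS[peel_type]
    return sorted(name for name, kind in OBJECTS.items()
                  if name.startswith(prefix) and kind in allowed)


print(filter_candidates("abcabc", "tree"))  # ['abcabc02', 'abcabc03']
```

Like the behaviour described, the sketch looks only at name prefixes
and types, never at which objects actually point to which.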

Now, because of this SHA1 v.s. SHA256 thing we have a third case.

> what objects would you expect the following to refer to?
>
>   abcdabcd^{sha1}
>   abcdabcd^{sha256}
>   ef01ef01^{sha1}
>   ef01ef01^{sha256}

I still can't really make any sense of why anyone would even want #2 as
described above, but for this third case I think we should do this:

    abcdabcd^{sha1}   = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
    abcdabcd^{sha256} = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
    ef01ef01^{sha1}   = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
    ef01ef01^{sha256} = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...

I.e. a really useful thing about this peel syntax is that it's
forgiving, and will try to optimistically look up what you want.

So e.g. <hash>^{commit} is not an error if <hash> is already a commit.
It could have been (why are you trying to peel something already
peeled?!), but it isn't, because it's useful to be able to feed it a set
of things, some of which are commits and some of which are tags, and
have it always resolve them without error handling on the caller side.

Similarly, I think it would be very useful if we just make this work:

    git rev-parse $some_hash^{sha256}^{commit}

And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
always return a useful value.
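
One way to model this forgiving lookup, assuming the two objects from
Jonathan's example, a plain dict standing in for the SHA-1 <-> SHA-256
lookup table, and (an assumption of this sketch) that a prefix is tried
as a SHA-1 name before it is tried as a SHA-256 name, which is what
reproduces the table above:

```python
# The two objects from the example; the dicts stand in for the
# SHA-1 <-> SHA-256 lookup table.
SHA1_TO_SHA256 = {
    "abcd" * 10: "ef01" * 16,  # object 1
    "ef01" * 10: "abcd" * 16,  # object 2
}
SHA256_TO_SHA1 = {v: k for k, v in SHA1_TO_SHA256.items()}


def unique_match(prefix, names):
    """Return the single name starting with `prefix`, or None."""
    matches = [n for n in names if n.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


def forgiving_peel(prefix, want):
    """Resolve `prefix` as a SHA-1 name if possible, otherwise as a
    SHA-256 name, then return the object's name in the `want` format."""
    sha1 = unique_match(prefix, SHA1_TO_SHA256)
    if sha1 is None:
        sha256 = unique_match(prefix, SHA256_TO_SHA1)
        if sha256 is None:
            raise LookupError(f"{prefix}: no unique object")
        sha1 = SHA256_TO_SHA1[sha256]
    return sha1 if want == "sha1" else SHA1_TO_SHA256[sha1]


for p in ("abcdabcd", "ef01ef01"):
    for w in ("sha1", "sha256"):
        print(f"{p}^{{{w}}} = {forgiving_peel(p, w)}")
```

Because the SHA-1 reading is tried first, abcdabcd never reaches
object 2, even if the sender meant its SHA-256 name; only a prefix that
matches no SHA-1 name (such as object 2's full SHA-256 name) falls
through to the second lookup.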

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 18:41         ` Ævar Arnfjörð Bjarmason
@ 2018-08-29 19:12           ` Jonathan Nieder
  2018-08-29 19:37             ` Ævar Arnfjörð Bjarmason
  2018-08-29 20:53             ` Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-29 19:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Stefan Beller, git, Junio C Hamano, Linus Torvalds,
	Edward Thomson, brian m. carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee, Jeff King

Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Aug 29 2018, Jonathan Nieder wrote:

>> what objects would you expect the following to refer to?
>>
>>   abcdabcd^{sha1}
>>   abcdabcd^{sha256}
>>   ef01ef01^{sha1}
>>   ef01ef01^{sha256}
>
> I still can't really make any sense of why anyone would even want #2 as
> described above, but for this third case I think we should do this:
>
>     abcdabcd^{sha1}   = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
>     abcdabcd^{sha256} = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>     ef01ef01^{sha1}   = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>     ef01ef01^{sha256} = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...
>
> I.e. a really useful thing about this peel syntax is that it's
> forgiving, and will try to optimistically look up what you want.

Sorry, I'm still not understanding.

I am not attached to any particular syntax, but what I really want is
the following:

	Someone who only uses SHA-256 sent me the commit id
	abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd... out of band.
	Show me that commit.

	I don't care what object id you show me when you show that
	commit.  If I pass --output-format=sha1, then that means I
	care, and show me the SHA-1.

In other words, I want the input format and output format completely
decoupled.  If I pass ^{sha1}, I am indicating the input format.  To
specify the output format, I'd use --output-format instead.

That lets me mix both hash functions in my input:

	git --output-format=sha256 diff abcdabcd^{sha1} abcdabcd^{sha256}

I learned about these two commits out of band from different users,
one who only uses SHA-1 and the other who only uses SHA-256.
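
A small sketch of that decoupling, reusing the two objects from the
example above. The dict-based lookup table and the resolve()/show()
helpers are hypothetical, and --output-format is the option proposed
here, not an existing Git flag:

```python
# Same two objects as before; dicts stand in for the lookup table.
SHA1_TO_SHA256 = {
    "abcd" * 10: "ef01" * 16,  # object 1
    "ef01" * 10: "abcd" * 16,  # object 2
}
SHA256_TO_SHA1 = {v: k for k, v in SHA1_TO_SHA256.items()}
NAMESPACES = {"sha1": SHA1_TO_SHA256, "sha256": SHA256_TO_SHA1}


def resolve(prefix, input_format):
    """Return (sha1_name, sha256_name) of the one object whose name in
    `input_format` starts with `prefix`; never consult the other hash."""
    table = NAMESPACES[input_format]
    matches = [n for n in table if n.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix}^{{{input_format}}}: "
                          f"{len(matches)} matching objects")
    name = matches[0]
    if input_format == "sha1":
        return name, table[name]
    return table[name], name


def show(prefix, input_format, output_format):
    """The input format picks the lookup namespace; the output format
    only decides which of the object's two names is printed."""
    sha1, sha256 = resolve(prefix, input_format)
    return sha1 if output_format == "sha1" else sha256


# One id from a SHA-1-only user, one from a SHA-256-only user, both
# printed as SHA-256 names: they turn out to be different objects.
print(show("abcdabcd", "sha1", "sha256"))    # object 1's SHA-256 name
print(show("abcdabcd", "sha256", "sha256"))  # object 2's SHA-256 name
```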

In other words:

[...]
> Similarly, I think it would be very useful if we just make this work:
>
>     git rev-parse $some_hash^{sha256}^{commit}
>
> And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
> we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
> always return a useful value.

The opposite of this. :)

Thanks,
Jonathan

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 19:12           ` Jonathan Nieder
@ 2018-08-29 19:37             ` Ævar Arnfjörð Bjarmason
  2018-08-29 20:46               ` Jonathan Nieder
  2018-08-29 20:53             ` Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-29 19:37 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, git, Junio C Hamano, Linus Torvalds,
	Edward Thomson, brian m. carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee, Jeff King


On Wed, Aug 29 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Aug 29 2018, Jonathan Nieder wrote:
>
>>> what objects would you expect the following to refer to?
>>>
>>>   abcdabcd^{sha1}
>>>   abcdabcd^{sha256}
>>>   ef01ef01^{sha1}
>>>   ef01ef01^{sha256}
>>
>> I still can't really make any sense of why anyone would even want #2 as
>> described above, but for this third case I think we should do this:
>>
>>     abcdabcd^{sha1}   = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
>>     abcdabcd^{sha256} = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>>     ef01ef01^{sha1}   = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>>     ef01ef01^{sha256} = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...
>>
>> I.e. a really useful thing about this peel syntax is that it's
>> forgiving, and will try to optimistically look up what you want.
>
> Sorry, I'm still not understanding.
>
> I am not attached to any particular syntax, but what I really want is
> the following:
>
> 	Someone who only uses SHA-256 sent me the commit id
> 	abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd... out of band.
> 	Show me that commit.

This is reasonable.

> 	I don't care what object id you show me when you show that
> 	commit.  If I pass --output-format=sha1, then that means I
> 	care, and show me the SHA-1.
>
> In other words, I want the input format and output format completely
> decoupled.  If I pass ^{sha1}, I am indicating the input format.  To
> specify the output format, I'd use --output-format instead.

This is also a reasonable thing to want, but I don't see how it can be
sensibly squared with the existing peel syntax.

The peel syntax <thing>^{commit} doesn't mean <thing> is a commit, it
means that thing might be some thing (commit, tag), and it should be
(recursively if needed) *resolved* as the thing on the RHS.

So to be consistent <thing>^{sha1} shouldn't mean <thing> is SHA-1, but
that I want a SHA-1 out of <thing>.
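
To illustrate "resolved, not asserted", here is a minimal peel loop
over a hypothetical toy object graph (STORE and peel() are
illustrative names, not Git internals):

```python
# Toy object graph (hypothetical): name -> (type, single target or None).
# Tags point at one object and commits at their tree; a tree has no single
# child to follow, so peeling it further (e.g. to a blob) is an error.
STORE = {
    "tag2": ("tag", "tag1"),
    "tag1": ("tag", "commit1"),
    "commit1": ("commit", "tree1"),
    "tree1": ("tree", None),
}


def peel(name, want):
    """Resolve `name` to an object of type `want`, following links as
    often as needed; an object already of that type is returned as-is."""
    while True:
        kind, target = STORE[name]
        if kind == want:
            return name
        if target is None:
            raise ValueError(f"{name} ({kind}) cannot be peeled to a {want}")
        name = target


# Mixed input (tags and commits) resolves without caller-side special cases.
print([peel(n, "commit") for n in ("tag2", "tag1", "commit1")])
```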

> That lets me mix both hash functions in my input:
>
> 	git --output-format=sha256 diff abcdabcd^{sha1} abcdabcd^{sha256}

Presumably you mean something like:

     git diff-tree --raw -r -p abcdabcd^{sha1} abcdabcd^{sha256}

I.e. we don't show any sort of SHAs in diff output, so what would this
--output-format=sha256 mean?

> I learned about these two commits out of band from different users,
> one who only uses SHA-1 and the other who only uses SHA-256.

I think for those cases we would just support:

     git diff-tree --raw -r -p bcdabcd abcdabcd

I.e. there's no need to specify the hash type, unless the two happen to
be ambiguous, but yeah, if that's the case we'd need to peel them (or
supply more hexdigits).

> In other words:
>
> [...]
>> Similarly, I think it would be very useful if we just make this work:
>>
>>     git rev-parse $some_hash^{sha256}^{commit}
>>
>> And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
>> we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
>> always return a useful value.
>
> The opposite of this. :)

Can you elaborate on that? What do you think that should do? Return an
error if $some_hash is SHA-1, even though we have a $some_hash =
$some_hash_256 mapping?

I.e. if I'm using this in a script I'd need:

    if x = git rev-parse $some_hash^{sha256}^{commit}
        hash = x
    elsif x = git rev-parse $some_hash^{sha1}^{commit}
        hash = x
    endif

As opposed to the thing I'm saying is the redeeming quality of the peel
syntax:

    hash = git rev-parse $some_hash^{sha256}^{commit}

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 19:37             ` Ævar Arnfjörð Bjarmason
@ 2018-08-29 20:46               ` Jonathan Nieder
  2018-08-29 23:45                 ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-29 20:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Stefan Beller, git, Junio C Hamano, Linus Torvalds,
	Edward Thomson, brian m. carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee, Jeff King

Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Aug 29 2018, Jonathan Nieder wrote:

>> In other words, I want the input format and output format completely
>> decoupled.  If I pass ^{sha1}, I am indicating the input format.  To
>> specify the output format, I'd use --output-format instead.
>
> This is also a reasonable thing to want, but I don't see how it can be
> sensibly squared with the existing peel syntax.

All the weight here is on the word "sensibly".  Currently, ^{thing}
means "act on the object" and @{thing} means "act on the ref".  This
^{sha1} syntax is really a new kind of modifier, ~{thing}, meaning
"act on the string".

That said, we can make it do anything we want.  There is nothing
forcing us to make it more similar to ^{commit} than to
^{/searchstring}, say.

In that context:

[...]
>> Ævar Arnfjörð Bjarmason wrote:

>>> Similarly, I think it would be very useful if we just make this work:
>>>
>>>     git rev-parse $some_hash^{sha256}^{commit}
>>>
>>> And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
>>> we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
>>> always return a useful value.
>>
>> The opposite of this. :)
>
> Can you elaborate on that?

What I'm saying is, regardless of the syntax used, as a user I *need*
a way to look up $some_hash as a sha256-name, with zero risk of Git
trying to outsmart me and treating $some_hash as a sha1-name instead.

Any design without that capability is a non-starter.

[...]
> I.e. if I'm using this in a script I'd need:
>
>     if x = git rev-parse $some_hash^{sha256}^{commit}
>         hash = x
>     elsif x = git rev-parse $some_hash^{sha1}^{commit}
>         hash = x
>     endif

Why wouldn't you use "git rev-parse $some_hash^{commit}" instead?

Thanks,
Jonathan

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 20:46               ` Jonathan Nieder
@ 2018-08-29 23:45                 ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2018-08-29 23:45 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Junio C Hamano, Linus Torvalds, Edward Thomson, brian m. carlson,
	Johannes Schindelin, demerphq, Brandon Williams, Derrick Stolee

On Wed, Aug 29, 2018 at 01:46:23PM -0700, Jonathan Nieder wrote:

> > Can you elaborate on that?
> 
> What I'm saying is, regardless of the syntax used, as a user I *need*
> a way to look up $some_hash as a sha256-name, with zero risk of Git
> trying to outsmart me and treating $some_hash as a sha1-name instead.
> 
> Any design without that capability is a non-starter.

Right, this is IMHO the only thing that makes sense for ^{hash} to do:
it disambiguates the sha1 that you just gave it. Nothing more, nothing
less.

> > I.e. if I'm using this in a script I'd need:
> >
> >     if x = git rev-parse $some_hash^{sha256}^{commit}
> >         hash = x
> >     elsif x = git rev-parse $some_hash^{sha1}^{commit}
> >         hash = x
> >     endif
> 
> Why wouldn't you use "git rev-parse $some_hash^{commit}" instead?

Yes, the sane rules seem to me to be:

  # try any available hash for $some_hash
  git rev-parse $some_hash

  # look _only_ for $some_hash as a sha1
  git rev-parse $some_hash^{sha1}

  # ditto for sha256
  git rev-parse $some_hash^{sha256}

  # ditto, but then peel the result to a commit
  git rev-parse $some_hash^{sha256}^{commit}

  # this is nonsense, and should produce an error
  git rev-parse $some_hash^{commit}^{sha256}

For convenience of scripts, we may also want:

  git rev-parse --input-hash=sha256 $some_hash

to pretend as if "^{sha256}" was appended to each command-line hash we
try to resolve (e.g., consider a case where a script is feeding 0 or
more hashes).
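
These rules are concrete enough to model. What follows is only a
sketch under stated assumptions: three made-up objects, all commits
(so ^{commit} is the only peel modelled), a dict standing in for the
SHA-1 <-> SHA-256 table, and hypothetical helper names;
rev_parse_all(..., input_hash=...) stands in for the suggested
--input-hash, which is a proposal, not an existing option:

```python
import re

# Three made-up objects, all commits, each with a SHA-1 and a SHA-256 name.
SHA1_TO_SHA256 = {
    "abcd" * 10: "ef01" * 16,
    "ef01" * 10: "abcd" * 16,
    "1234" * 10: "5678" * 16,
}
SHA256_TO_SHA1 = {v: k for k, v in SHA1_TO_SHA256.items()}
HASHES = ("sha1", "sha256")
SPEC = re.compile(r"([0-9a-f]+)((?:\^\{[a-z0-9]+\})*)")
MOD = re.compile(r"\^\{([a-z0-9]+)\}")


def rev_parse(spec):
    """Return (sha1_name, sha256_name) of the single object `spec` names."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"cannot parse {spec!r}")
    prefix, mods = m.group(1), MOD.findall(m.group(2))
    spaces = HASHES                  # no ^{hash}: try every available hash
    if mods and mods[0] in HASHES:
        spaces = (mods.pop(0),)      # ^{hash}: look ONLY under that hash
    if any(mod in HASHES for mod in mods):
        raise ValueError(f"{spec}: ^{{hash}} must directly follow the hex digits")
    found = set()                    # candidates, identified by SHA-1 name
    if "sha1" in spaces:
        found |= {n for n in SHA1_TO_SHA256 if n.startswith(prefix)}
    if "sha256" in spaces:
        found |= {SHA256_TO_SHA1[n] for n in SHA256_TO_SHA1 if n.startswith(prefix)}
    if len(found) != 1:
        raise LookupError(f"{spec}: {len(found)} candidate objects")
    sha1 = found.pop()
    # Every toy object is a commit: ^{commit} is a no-op and any other
    # peel is rejected rather than modelled.
    for mod in mods:
        if mod != "commit":
            raise ValueError(f"{spec}: cannot peel to {mod} in this sketch")
    return sha1, SHA1_TO_SHA256[sha1]


def rev_parse_all(hashes, input_hash=None):
    """Stand-in for the suggested --input-hash: behave as if ^{input_hash}
    were appended to every hash; zero hashes give an empty list."""
    suffix = f"^{{{input_hash}}}" if input_hash else ""
    return [rev_parse(h + suffix) for h in hashes]
```

With this data, an unadorned abcdabcd is ambiguous (it prefixes one
object's SHA-1 name and another's SHA-256 name), while
abcdabcd^{sha1} and abcdabcd^{sha256} each resolve to exactly one
object, and abcdabcd^{commit}^{sha256} is rejected.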

-Peff

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 19:12           ` Jonathan Nieder
  2018-08-29 19:37             ` Ævar Arnfjörð Bjarmason
@ 2018-08-29 20:53             ` Junio C Hamano
  2018-08-29 21:01               ` Jonathan Nieder
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2018-08-29 20:53 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Linus Torvalds, Edward Thomson, brian m. carlson,
	Johannes Schindelin, demerphq, Brandon Williams, Derrick Stolee,
	Jeff King

Jonathan Nieder <jrnieder@gmail.com> writes:

> In other words, I want the input format and output format completely
> decoupled.

I thought that the original suggestion was to use "hashname:" as a
prefix to specify input format.  In other words

	sha1:abababab
	sha256:abababab

And an unadorned abababab is first looked up in sha256 space for
uniqueness, and if and only if there is only one object whose sha256
name begins with abababab and there is *no* object whose sha1 name
begins with that hexstring (or vice versa), that string will be
resolved to an object name.
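
A sketch of that rule, assuming a toy repository with made-up names
and a literal reading of the uniqueness condition: exactly one name in
total, across both hashes, must match an unadorned prefix (so the order
of the two lookups does not change the result). The resolve() helper
is hypothetical:

```python
# Toy repository (made-up names): each namespace maps a full name to an
# object label. Objects A, B and C each have one SHA-1 and one SHA-256 name.
SHA1_NAMES = {
    "abababab" + "0" * 32: "A",
    "12121212" + "0" * 32: "B",
    "56565656" + "0" * 32: "C",
}
SHA256_NAMES = {
    "cdcdcdcd" + "0" * 56: "A",
    "34343434" + "0" * 56: "B",
    "abababab" + "1" * 56: "C",
}
PREFIXED = {"sha1": [SHA1_NAMES], "sha256": [SHA256_NAMES]}


def resolve(arg):
    """Resolve 'sha1:<hex>', 'sha256:<hex>' or an unadorned '<hex>'.

    An unadorned prefix is looked up in the SHA-256 names and then the
    SHA-1 names, and resolves only if exactly one name in total matches."""
    algo, sep, hexpart = arg.partition(":")
    if sep:
        if algo not in PREFIXED:
            raise ValueError(f"unknown hash {algo!r}")
        tables = PREFIXED[algo]
    else:
        hexpart, tables = arg, [SHA256_NAMES, SHA1_NAMES]
    hits = [obj for table in tables
            for name, obj in table.items() if name.startswith(hexpart)]
    if len(hits) != 1:
        raise LookupError(f"{arg}: {len(hits)} matching names")
    return hits[0]
```

Here sha1:abababab and sha256:abababab name two different objects,
while the unadorned abababab is refused as ambiguous.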

I do not think ^{hashname} mixes well with ^{objecttype} syntax at
all as an output specifier, either.  It would make sense to be more
explicit, I would think, e.g.

	git rev-parse --output=sha1 sha256:abababab

(or would that be the job for name-rev?)

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29 20:53             ` Junio C Hamano
@ 2018-08-29 21:01               ` Jonathan Nieder
  0 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-29 21:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Linus Torvalds, Edward Thomson, brian m. carlson,
	Johannes Schindelin, demerphq, Brandon Williams, Derrick Stolee,
	Jeff King

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> In other words, I want the input format and output format completely
>> decoupled.
>
> I thought that the original suggestion was to use "hashname:" as a
> prefix to specify input format.  In other words
>
> 	sha1:abababab
> 	sha256:abababab

That's fine with me too, and it's probably easier to understand than
^{sha1}.  The disadvantage is that it clashes with existing meaning of
"path abababab in branch sha1".  If we're okay with that change, then
it's a good syntax.
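
To make the clash concrete, here is a hypothetical helper (not Git's
actual revision parser) that lists the readings a parser would have to
choose between for an X:Y argument, modelling revisions as nothing but
branch names:

```python
import re

HASH_ALGOS = {"sha1", "sha256"}


def readings(arg, branches):
    """List the interpretations of 'X:Y': the existing <rev>:<path>
    meaning when X names a branch, and the proposed
    <hashname>:<hexdigits> meaning when X is a hash name."""
    left, sep, right = arg.partition(":")
    if not sep:
        return []
    found = []
    if left in branches:
        found.append(("path in branch", left, right))
    if left in HASH_ALGOS and re.fullmatch(r"[0-9a-f]+", right):
        found.append(("object name prefix", left, right))
    return found


# With a branch literally called "sha1", both readings apply.
print(readings("sha1:abababab", branches={"main", "sha1"}))
```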

If we have a collection of proposed syntaxes, I can get some help from
a UI designer here, too, to help find any ramifications we've missed.

[...]
> I do not think ^{hashname} mixes well with ^{objecttype} syntax at
> all as an output specifier, either.  It would make sense to be more
> explicit, I would think, e.g.
>
> 	git rev-parse --output=sha1 sha256:abababab

Agreed.  I don't think it makes sense to put output specifiers in
revision names.  It would create a lot of unnecessary complexity and
ambiguity.

Thanks,
Jonathan

* Re: How is the ^{sha256} peel syntax supposed to work?
  2018-08-29  9:13   ` How is the ^{sha256} peel syntax supposed to work? Ævar Arnfjörð Bjarmason
  2018-08-29 17:51     ` Stefan Beller
@ 2018-08-29 17:56     ` Jonathan Nieder
  1 sibling, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-29 17:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Linus Torvalds, Edward Thomson,
	brian m . carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Fri, Aug 24 2018, Jonathan Nieder wrote:
>> Ævar Arnfjörð Bjarmason wrote:

>>> Or is this expected to be chained, as e.g. <object>^{tag}^{sha256} ?
>>
>> Great question.  The latter (well, <hexdigits>^{sha256}^{tag}, not the
>> other way around).
>
> Since nobody's chimed in with an answer, and I suspect many have an
> aversion to that big thread, I thought I'd spin out just this small
> question into its own thread.
>
> brian m. carlson did some prep work for this in his just-submitted
> https://public-inbox.org/git/20180829005857.980820-2-sandals@crustytoothpaste.net/
>
> I was going to work on some of the peel code soon (digging up the type
> disambiguation patches I still need to re-submit), so could do this
> while I'm at it, i.e. implement ^{sha1}.

Cool!

> But as noted above it's not clear how it should work. Jonathan's
> chaining suggestion (<hexdigits>^{sha256}^{tag} not
> <hexdigits>^{tag}^{sha256}) makes more sense than mine, but is that what
> we're going for, or ^{sha256:tag}?

I don't have a strong opinion about this, but since it affects the
interpretation of <hexdigits>, my assumption has been that, in the
spirit of referential transparency, you would put
'<hexdigits>^{format}' and could put any additional specifiers after
that.

In other words, ^{format} changes the interpretation of <hexdigits> so
my assumption is that people would want it to be close by.

But if something else is easier to implement, we can start with that
something else and figure out whether we like it in review.
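
Whichever spelling wins, both carry the same information. A sketch of
a parser that accepts <hexdigits>^{sha256}^{tag} as well as
<hexdigits>^{sha256:tag}, normalizes them to the same triple, and
rejects a hash modifier that does not come first; the function name
and return shape are hypothetical:

```python
import re

HASHES = {"sha1", "sha256"}
SPEC = re.compile(r"([0-9a-f]+)((?:\^\{[a-z0-9:]+\})*)")
MOD = re.compile(r"\^\{([a-z0-9:]+)\}")


def parse(spec):
    """Return (hexdigits, hash_or_None, [peel types]) for either spelling.
    The hash modifier is only accepted right after the hex digits, since
    it changes how those digits are interpreted."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"cannot parse {spec!r}")
    hexdigits = m.group(1)
    algo, peels = None, []
    for i, mod in enumerate(MOD.findall(m.group(2))):
        head, sep, tail = mod.partition(":")
        if head in HASHES:
            if i != 0:
                raise ValueError(f"{spec}: ^{{{head}}} must come first")
            algo = head
            if tail:
                peels.append(tail)   # ^{sha256:tag} form
        elif sep:
            raise ValueError(f"{spec}: unexpected ':' in ^{{{mod}}}")
        else:
            peels.append(mod)        # ordinary ^{tag}, ^{commit}, ...
    return hexdigits, algo, peels
```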

Thanks,
Jonathan

* Re: Questions about the hash function transition
  2018-08-23 14:02 Questions about the hash function transition Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2018-08-24  1:47 ` Jonathan Nieder
@ 2018-08-24  2:51 ` Jonathan Nieder
  2018-08-28 13:50 ` Ævar Arnfjörð Bjarmason
  4 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2018-08-24  2:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Linus Torvalds, Edward Thomson,
	brian m . carlson, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

Ævar Arnfjörð Bjarmason wrote:

>> Objective
>> ---------
>> Migrate Git from SHA-1 to a stronger hash function.
>
> Should we say "Migrate Git from SHA-1 to SHA-256" here instead?
>
> Maybe it's overly specific, i.e. really we're also describing how /any/
> hash function transition might happen, but having just read this now
> from start to finish it takes us a really long time to mention (and at
> first, only offhand) that SHA-256 is the new hash.

I answered this question in my other reply, but my answer missed the
point.

I think it would be fine for this to say "Migrate Git from SHA-1 to a
stronger hash function (SHA-256)".  More importantly, I think the
Background section should say something about SHA-256 --- e.g. how about
replacing the sentence

  SHA-1 still possesses the other properties such as fast object
  lookup and safe error checking, but other hash functions are equally
  suitable that are believed to be cryptographically secure.

with something about SHA-256?

Rereading the background section, I see some other bits that could be
clarified, too.  It has a run-on sentence:

  Thus Git has in effect already migrated to a new hash that isn't
  SHA-1 and doesn't share its vulnerabilities, its new hash function
  just happens to produce exactly the same output for all known
  inputs, except two PDFs published by the SHAttered researchers, and
  the new implementation (written by those researchers) claims to
  detect future cryptanalytic collision attacks.

The "," after vulnerabilities should be a period, ending the sentence.
My understanding is that sha1collisiondetection's safe-hash is meant
to protect against known attacks and that the code is meant to be
adaptable for future attacks of the same kind (by updating the list of
disturbance vectors), but it doesn't claim to guard against future
novel cryptanalysis methods that haven't been published yet.

Thanks,
Jonathan

* Re: Questions about the hash function transition
  2018-08-23 14:02 Questions about the hash function transition Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2018-08-24  2:51 ` Questions about the hash function transition Jonathan Nieder
@ 2018-08-28 13:50 ` Ævar Arnfjörð Bjarmason
  2018-08-28 14:15   ` Edward Thomson
  4 siblings, 1 reply; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28 13:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Linus Torvalds, Edward Thomson, brian m . carlson,
	Jonathan Nieder, Johannes Schindelin, demerphq, Brandon Williams,
	Derrick Stolee

On Thu, Aug 23 2018, Ævar Arnfjörð Bjarmason wrote:

>> Transition plan
>> ---------------
>
> One thing that's not covered in this document at all, which I feel is
> missing, is how we're going to handle references to old commit IDs in
> commit messages, bug trackers etc. once we go through the whole
> migration process.
>
> I.e. are users who expect to be able to read old history and "git show
> <sha1 I found>" expected to maintain a repository that has a live
> sha1<->sha256 mapping forever, or could we be smarter about this and
> support some sort of marker in the repository saying "maintain the
> mapping up until this point".
>
> Then, along with some v2 protocol extension to transfer such a
> historical mapping (and perhaps a default user option to request it)
> we'd be guaranteed to be able to read old log messages and "git show"
> them, and servers could avoid breaking past URLs without maintaining the
> mapping going forward.
>
> One example of this on the server is that on GitLab (I don't know how
> GitHub does this) when you reference a commit from e.g. a bug, a
> refs/keep-around/<sha1> is created, to make sure it doesn't get GC'd.
>
> Those sorts of hosting providers would like to not break *existing*
> links, without needing to forever maintain a bidirectional mapping.

Considering this a bit more, I think this would nicely fall under what I
suggested in
https://public-inbox.org/git/874ll3yd75.fsf@evledraar.gmail.com/

I.e. the interface that's now proposed / documented is fairly
inelastic. I.e.:

    [extensions]
        objectFormat = sha256
        compatObjectFormat = sha1

If we instead had something like clean/smudge filters:

    [extensions]
        objectFilter = sha256-to-sha1
        compatObjectFormat = sha1
    [objectFilter "sha256-to-sha1"]
        clean  = ...
        smudge = ...

We could apply arbitrary transformations on objects through filters
which would accept/return some simple format requesting them to
translate such-and-such objects, and would either return object
names/types under which to store them, or "nothing to do".

So we could also have filters that would munge the contents of objects
between local & remote (e.g. for the "use a public remote host that'll
fsck on their end for storing an encrypted repo" use case), but also
e.g. be able to pass arguments to the filters saying that only commits
older than so-and-so are to have a reverse mapping (for looking up old
commits), or just ones on some branch etc.

It wouldn't be any slower than the current proposal, since some subset
of it would be picked up and implemented in C directly via some fast
path, similar to the proposal that e.g. some encoding filters be
implemented as built-ins.

But by having it be more extensible, it'll be easy to e.g. pass options,
or implement custom transformations.

We're still far away from reviewing patches to implement this, but in
anticipation of that I'd like to see what people think about
future-proofing this objectFilter syntax.

* Re: Questions about the hash function transition
  2018-08-28 13:50 ` Ævar Arnfjörð Bjarmason
@ 2018-08-28 14:15   ` Edward Thomson
  2018-08-28 15:02     ` Ævar Arnfjörð Bjarmason
  2018-08-28 15:45     ` Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Edward Thomson @ 2018-08-28 14:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Linus Torvalds,
	brian m . carlson, Jonathan Nieder, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

On Tue, Aug 28, 2018 at 2:50 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> If we instead had something like clean/smudge filters:
>
>     [extensions]
>         objectFilter = sha256-to-sha1
>         compatObjectFormat = sha1
>     [objectFilter "sha256-to-sha1"]
>         clean  = ...
>         smudge = ...
>
> We could apply arbitrary transformations on objects through filters
> which would accept/return some simple format requesting them to
> translate such-and-such objects, and would either return object
> names/types under which to store them, or "nothing to do".

If I'm understanding you correctly, then on the libgit2 side, I'm very much
opposed to this proposal.  We never execute commands, nor do I want to start
thinking that we can do so arbitrarily.  We run in environments where that's
a non-starter.

At present, in libgit2, users can provide their own mechanism for running
clean/smudge filters.  But hash transformation / compatibility is going to
be a crucial compatibility component.  So this is not something that we
could simply opt out of or require users to implement themselves.

-ed

* Re: Questions about the hash function transition
  2018-08-28 14:15   ` Edward Thomson
@ 2018-08-28 15:02     ` Ævar Arnfjörð Bjarmason
  2018-08-28 15:45     ` Junio C Hamano
  1 sibling, 0 replies; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28 15:02 UTC (permalink / raw)
  To: Edward Thomson
  Cc: Git Mailing List, Junio C Hamano, Linus Torvalds,
	brian m . carlson, Jonathan Nieder, Johannes Schindelin, demerphq,
	Brandon Williams, Derrick Stolee

On Tue, Aug 28 2018, Edward Thomson wrote:

> On Tue, Aug 28, 2018 at 2:50 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> If we instead had something like clean/smudge filters:
>>
>>     [extensions]
>>         objectFilter = sha256-to-sha1
>>         compatObjectFormat = sha1
>>     [objectFilter "sha256-to-sha1"]
>>         clean  = ...
>>         smudge = ...
>>
>> We could apply arbitrary transformations on objects through filters
>> which would accept/return some simple format requesting them to
>> translate such-and-such objects, and would either return object
>> names/types under which to store them, or "nothing to do".
>
> If I'm understanding you correctly, then on the libgit2 side, I'm very much
> opposed to this proposal.  We never execute commands, nor do I want to start
> thinking that we can do so arbitrarily.  We run in environments where that's
> a non-starter.

I'm being unclear. I'm suggesting that we slightly amend the syntax of
what we're proposing to put in the .git/config to leave the door open
for *optionally* doing arbitrary mappings.

It would still work exactly the same internally for the common
sha1<->sha256 case, i.e. neither git, libgit2, jgit or anyone else would
need to shell out to anything.

They'd just pick up that common case and handle it internally, similar
to how e.g. the crlf filter (v.s. full clean/smudge support) works in
git & libgit2:
https://github.com/libgit2/libgit2/blob/master/tests/filter/crlf.c

So the sha256<->sha1 support would be an implicit built-in like crlf. It
would just leave the door open to having something like git-lfs.

Now what does that really mean? And I admit I may be missing something
here.

Unlike smudge/clean filters we're going to be constrained by having
hashes of length 20 or 32 bytes, locally & remotely, since we wouldn't want to
support arbitrary lengths, but with relatively small changes it'll allow
for changing just:

    # local  remote
    sha256<->sha1

To also support:

    # local  remote
    fn(sha1)<->fn(sha1)
    fn(sha1)<->fn(sha256)
    fn(sha256)<->fn(sha1)
    fn(sha256)<->fn(sha256)

Where fn() is some hook you'd provide to hook into the bits where we're
e.g. unpacking SHA-1 objects from the remote, and writing them locally
as SHA-256, except instead of (as we do by default) writing:

    SHA256_map(sha256(content)) = content

You'd write:

    SHA256_map(sha256(fn(content))) = fn(content)

Where fn() would need to be idempotent.
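
A sketch of the two formulas for blobs only, assuming Git's usual
object naming (the hash of a "<type> <size>\0" header followed by the
content) and a hypothetical store() helper. Trees and commits are left
out because their contents embed other object names and would
themselves need rewriting; crlf_to_lf() is just one example of an
idempotent fn():

```python
import hashlib


def object_name(content, algo="sha256", kind="blob"):
    """Git-style object name: hash of the header '<kind> <size>\\0'
    followed by the raw content."""
    header = f"{kind} {len(content)}\0".encode()
    return hashlib.new(algo, header + content).hexdigest()


def crlf_to_lf(content):
    """Example fn(): idempotent, because fn(fn(x)) == fn(x)."""
    return content.replace(b"\r\n", b"\n")


def store(db, content, fn=lambda c: c):
    """SHA256_map(sha256(fn(content))) = fn(content); with the default
    identity fn this is the plain SHA256_map(sha256(content)) = content."""
    data = fn(content)
    name = object_name(data)
    db[name] = data
    return name


db = {}
once = store(db, b"hello\r\nworld\r\n", crlf_to_lf)
twice = store(db, crlf_to_lf(b"hello\r\nworld\r\n"), crlf_to_lf)
print(once == twice, len(db))  # True 1
```

Idempotence is what makes the second store() a no-op: feeding already
filtered content back through fn() yields the same bytes, hence the
same name.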

Now, why is this useful or worth considering? As noted in the E-Mail I
linked to it allows for some novel use cases for doing local to remote
object translation.

But really, I'm not suggesting that *that* is something we should
consider. *All* I'm saying is that given the experience of how we
started out with stuff like built-in "crlf", and then grew smudge/clean
filters, that it's worth considering what sort of .git/config key-value
pairs we'd pick that would yield themselves to such future extensions,
should that be something we deem to be a good idea in the future.

Because if we don't we've lost nothing, but if we do we'd need to
support two sets of config syntaxes to do those two related things.

> At present, in libgit2, users can provide their own mechanism for running
> clean/smudge filters.  But hash transformation / compatibility is going to
> be a crucial compatibility component.  So this is not something that we
> could simply opt out of or require users to implement themselves.

Indeed.

* Re: Questions about the hash function transition
  2018-08-28 14:15   ` Edward Thomson
  2018-08-28 15:02     ` Ævar Arnfjörð Bjarmason
@ 2018-08-28 15:45     ` Junio C Hamano
  1 sibling, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2018-08-28 15:45 UTC (permalink / raw)
  To: Edward Thomson
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Linus Torvalds, brian m . carlson, Jonathan Nieder,
	Johannes Schindelin, demerphq, Brandon Williams, Derrick Stolee

Edward Thomson <ethomson@edwardthomson.com> writes:

> If I'm understanding you correctly, then on the libgit2 side, I'm very much
> opposed to this proposal.  We never execute commands, nor do I want to start
> thinking that we can do so arbitrarily.  We run in environments where that's
> a non-starter.
>
> At present, in libgit2, users can provide their own mechanism for running
> clean/smudge filters.  But hash transformation / compatibility is going to
> be a crucial compatibility component.  So this is not something that we
> could simply opt out of or require users to implement themselves.

While I suspect the "apparent flexibility" in the proposal does not
equal "we must be able to run arbitrary external programs", I do
agree that hash transformation MUST NOT be configurable like this.
We do not want to add a random source of incompatible mappings, and
the confusion that comes with it, when there is no need to.

If users need to easily look up old object names (under the old
hash) that they find in log messages and other places, in a
repository that has been converted, then:

 (1) the get_sha1() equivalent in the new world should learn to fall
     back to using the old hash when there is no object with that
     name under the new hash;

 (2) in addition to the above fallback, there should be a syntax to
     explicitly tell that function that the name uses the old hash;

 (3) get_commit_buffer() should learn to optionally convert old
     hashes in log messages to new ones, similar to how a textconv
     filter can be specified by end users to make binary blobs easier
     to grok with text-based tools (the important part is that such a
     filter does not have to be limited to "upgrade hash algorithm";
     it can be a more general "correct misspelt words automatically"
     filter).
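Points (1) and (2) boil down to a two-step name lookup. A minimal C sketch, assuming hypothetical in-memory tables (new_objects, compat) and a hypothetical resolve_name() in place of the get_sha1() equivalent. The short names are placeholders rather than real 40/64-digit object names, and the "sha1:" prefix is the spelling used in this message, not an implemented Git syntax:

```c
#include <assert.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct name_pair {
	const char *sha1;
	const char *sha256;
};

/* Hypothetical object store keyed by new-hash names. "2222" pretends to be
 * a new-hash name that happens to look like an old-hash name. */
static const char *new_objects[] = { "aaaa256", "bbbb256", "2222" };

/* Hypothetical SHA-1 -> SHA-256 translation table kept after conversion. */
static const struct name_pair compat[] = {
	{ "1111", "aaaa256" },
	{ "2222", "bbbb256" },
};

static const char *lookup_new(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(new_objects); i++)
		if (!strcmp(new_objects[i], name))
			return new_objects[i];
	return NULL;
}

static const char *lookup_old(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(compat); i++)
		if (!strcmp(compat[i].sha1, name))
			return compat[i].sha256;
	return NULL;
}

/* Returns the new-hash name the input refers to, or NULL if unknown. */
static const char *resolve_name(const char *name)
{
	static const char prefix[] = "sha1:";
	const char *hit;

	/* (2) explicit old-hash syntax: consult only the translation table */
	if (!strncmp(name, prefix, strlen(prefix)))
		return lookup_old(name + strlen(prefix));
	/* (1) try the new hash first, then fall back to the old hash */
	hit = lookup_new(name);
	return hit ? hit : lookup_old(name);
}
```

Here resolve_name("1111") falls back to the old-hash table, while "2222" resolves to the new-hash object and "sha1:2222" forces the old-hash interpretation, which is exactly the disambiguation case.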

With 1+2, you can say "git log $sha1" and also "git log sha1:$sha1"
to disambiguate.  3 would be icing on the cake.
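Point (3) could work like a textconv-style filter over the commit message. A minimal C sketch, assuming a hypothetical compat_map translation table with placeholder (not real) object names and a hypothetical rewrite_old_hashes() helper. It rewrites only runs of exactly 40 hex digits that it has a mapping for and copies everything else verbatim; unlike the more general filter described in (3), it handles only the hash-upgrade case:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define OLD_HEXSZ 40 /* length of a SHA-1 name in hex */

struct compat_entry {
	const char *old_hex;
	const char *new_hex;
};

/* Hypothetical translation table; placeholder values, not real objects. */
static const struct compat_entry compat_map[] = {
	{ "1111111111" "1111111111" "1111111111" "1111111111",
	  "aaaaaaaaaa" "aaaaaaaaaa" "aaaaaaaaaa" "aaaaaaaaaa"
	  "aaaaaaaaaa" "aaaaaaaaaa" "aaaa" },
};

static const char *map_old_hex(const char *hex)
{
	for (size_t i = 0; i < sizeof(compat_map) / sizeof(compat_map[0]); i++)
		if (!strncmp(compat_map[i].old_hex, hex, OLD_HEXSZ))
			return compat_map[i].new_hex;
	return NULL;
}

/*
 * Copy msg into out, replacing every run of exactly OLD_HEXSZ hex digits
 * (bounded by non-hex characters) that has an entry in compat_map.
 * Returns 0 on success, -1 if out is too small.
 */
static int rewrite_old_hashes(const char *msg, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return -1;
	while (*msg) {
		size_t run = 0, len;
		const char *repl;

		while (isxdigit((unsigned char)msg[run]))
			run++;
		if (run == 0) {
			/* not a hex digit: copy one byte as-is */
			repl = msg;
			len = run = 1;
		} else {
			repl = run == OLD_HEXSZ ? map_old_hex(msg) : NULL;
			if (repl) {
				len = strlen(repl);
			} else {
				/* unknown name or wrong length: keep it */
				repl = msg;
				len = run;
			}
		}
		if (n + len >= outsz)
			return -1;
		memcpy(out + n, repl, len);
		n += len;
		msg += run;
	}
	out[n] = '\0';
	return 0;
}
```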


Thread overview: 33+ messages
2018-08-23 14:02 Questions about the hash function transition Ævar Arnfjörð Bjarmason
2018-08-23 14:27 ` Junio C Hamano
2018-08-23 15:20   ` Ævar Arnfjörð Bjarmason
2018-08-23 16:13     ` Junio C Hamano
2018-08-24  1:40 ` brian m. carlson
2018-08-24  1:54   ` Jonathan Nieder
2018-08-24  4:47     ` brian m. carlson
2018-08-24  4:52       ` Jonathan Nieder
2018-08-24  1:47 ` Jonathan Nieder
2018-08-28 12:04   ` Johannes Schindelin
2018-08-28 12:49     ` Derrick Stolee
2018-08-28 17:12       ` Jonathan Nieder
2018-08-28 17:11     ` Jonathan Nieder
2018-08-29 13:09       ` Johannes Schindelin
2018-08-29 13:27         ` Derrick Stolee
2018-08-29 14:43           ` Derrick Stolee
2018-08-29  9:13   ` How is the ^{sha256} peel syntax supposed to work? Ævar Arnfjörð Bjarmason
2018-08-29 17:51     ` Stefan Beller
2018-08-29 17:59       ` Jonathan Nieder
2018-08-29 18:34         ` Stefan Beller
2018-08-29 18:41         ` Ævar Arnfjörð Bjarmason
2018-08-29 19:12           ` Jonathan Nieder
2018-08-29 19:37             ` Ævar Arnfjörð Bjarmason
2018-08-29 20:46               ` Jonathan Nieder
2018-08-29 23:45                 ` Jeff King
2018-08-29 20:53             ` Junio C Hamano
2018-08-29 21:01               ` Jonathan Nieder
2018-08-29 17:56     ` Jonathan Nieder
2018-08-24  2:51 ` Questions about the hash function transition Jonathan Nieder
2018-08-28 13:50 ` Ævar Arnfjörð Bjarmason
2018-08-28 14:15   ` Edward Thomson
2018-08-28 15:02     ` Ævar Arnfjörð Bjarmason
2018-08-28 15:45     ` Junio C Hamano
