git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: "Martin Ågren" <martin.agren@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Derrick Stolee" <dstolee@microsoft.com>,
	"Jeff Hostetler" <git@jeffhostetler.com>,
	"Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>
Subject: Re: [PATCH 20/20] abbrev: add a core.validateAbbrev setting
Date: Tue, 12 Jun 2018 11:58:37 -0700
Message-ID: <xmqqzhzz96hu.fsf@gitster-ct.c.googlers.com>
In-Reply-To: <CAN0heSqo1WSVkYNiFAv4A7x8hG0wEx-iSz7ssLmYYVuen7b-LQ@mail.gmail.com> ("Martin Ågren"'s message of "Sat, 9 Jun 2018 17:47:18 +0200")

Martin Ågren <martin.agren@gmail.com> writes:

>> +This is especially useful in combination with the
>> +`core.validateAbbrev` setting, or to get more future-proof hashes to
>> +reference in the future in a repository whose number of objects is
>> +expected to grow.
>
> Maybe s/validateAbbrev/validateAbbrev = false/?

Perhaps, but it would be equally useful even with =true, as the
point of this setting is future-proofing.

>> ++
>> +When printing abbreviated object names Git needs to look through the
>> +local object store. This is an `O(log N)` operation assuming all the
>> +objects are in a single pack file, but `X * O(log N)` given `X` pack
>> +files, which can get expensive on some larger repositories.
>
> This might be very close to too much information.

Not very close, but just too much information without crucial detail
(i.e. log N times what constant???).  I'd drop it.

>> ++
>> +This setting changes that to `O(1)`, but with the trade-off that
>> +depending on the value of `core.abbrev` we may be printing abbreviated
>> +hashes that collide.

It may not be technically wrong to say "This changes it to O(1)",
but I think to most people it is more understandable to say "This
changes it to zero" ;-)

    Setting this variable to false makes Git not validate that the
    abbreviation it produces uniquely identifies an object among the
    current set of objects in the repository.  Depending on the
    value of `core.abbrev`, we may be printing abbreviated hashes
    that collide.  Note that setting this variable to true (or
    leaving it unset) does not guarantee that an abbreviated hash
    will never collide with future objects in the repository (you
    need to set `core.abbrev` to a larger value for that).

would be sufficient to clarify, and would also let us nuke the
following overly-detailed paragraph.

>> ... Too see how likely this is, try running:
>> ++
>> +-----------------------------------------------------------------------------------------------------------
>> +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
>> +-----------------------------------------------------------------------------------------------------------
>> ++
>> +This shows how many commits were found at each abbreviation length. On
>> +linux.git in June 2018 this shows a bit more than 750,000 commits,
>> +with just 4 needing 11 characters to be fully abbreviated, and the
>> +default heuristic picks a length of 12.
>
> These last few paragraphs seem like too much to me.

Yeah, it goes into too low-level a detail, especially with the "try
running" part.  I'd remove everything but "On linux.git in June ..."
if I were writing it from the above.
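
For readers of this thread who want to reproduce the measurement
without Perl, the quoted pipeline can be approximated with awk.  This
is only a sketch and not part of the patch; the function name
abbrev_length_histogram is made up for illustration.  It reads one
abbreviated object name per line and prints "length count" pairs:

```shell
# Hypothetical helper (not from the patch): count how many abbreviated
# object names fall at each length.  Reads one name per line on stdin
# and prints "<length> <count>" pairs sorted by length.
abbrev_length_histogram () {
	awk '{ count[length($0)]++ }
	     END { for (len in count) print len, count[len] }' |
	sort -n
}

# Usage inside a repository, feeding it the same input as the quoted
# pipeline:
#   git log --all --pretty=format:%h --abbrev=4 | abbrev_length_histogram
```

Unlike the quoted pipeline, this sorts by abbreviation length rather
than by count.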

>> ++
>> +Even without `core.validateAbbrev=false` the results abbreviation
>> +already a bit of a probability game. They're guaranteed at the moment
>> +of generation, but as more objects are added, ambiguities may be
>> +introduced. Likewise, what's unambiguous for you may not be for
>> +somebody else you're communicating with, if they have their own clone.
>
> This seems more useful.

Yes, but still overly verbose; I think rolling it into the
single-paragraph description I showed above would be sufficient.
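
To put a rough number on the "probability game" in the quoted text,
the standard birthday approximation can be used.  The sketch below
assumes object names are uniformly distributed and estimates the
chance that at least one pair among N objects shares the same LEN-digit
hex prefix as 1 - exp(-N(N-1) / (2 * 16^LEN)).  The function name is
hypothetical, and this is not how Git itself picks an abbreviation
length:

```shell
# Hypothetical helper (not from Git): birthday-bound estimate of the
# probability that at least two of N uniformly distributed object
# names share the same LEN-character hex prefix.
#   $1 = number of objects, $2 = abbreviation length in hex digits
abbrev_collision_estimate () {
	awk -v n="$1" -v len="$2" 'BEGIN {
		space = 16 ^ len	# number of distinct prefixes
		printf "%.4f\n", 1 - exp(-n * (n - 1) / (2 * space))
	}'
}

# Usage, e.g. for 750000 names abbreviated to 12 hex digits:
#   abbrev_collision_estimate 750000 12
```

Note that a repository holds many more objects than commits, so
feeding in only the commit count understates the estimate.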

>> ++
>> +Therefore the default of `core.validateAbbrev=true` may not save you
>> +in practice if you're sharing the SHA-1 or noting it now to use after
>> +a `git fetch`. You may be better off setting `core.abbrev` to
>> +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
>> +that with `core.validateAbbrev=false` to get a reasonable trade-off
>> +between safety and performance.
>
> Makes sense. As before, I'd suggest s/SHA-1/object ID/.

Likewise.  If we were to keep it, then s/object ID/object name/.
