From: "Martin Ågren" <martin.agren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <dstolee@microsoft.com>,
Jeff Hostetler <git@jeffhostetler.com>, Jeff King <peff@peff.net>,
Johannes Schindelin <johannes.schindelin@gmx.de>,
Jonathan Nieder <jrnieder@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 20/20] abbrev: add a core.validateAbbrev setting
Date: Sat, 9 Jun 2018 17:47:18 +0200 [thread overview]
Message-ID: <CAN0heSqo1WSVkYNiFAv4A7x8hG0wEx-iSz7ssLmYYVuen7b-LQ@mail.gmail.com> (raw)
In-Reply-To: <20180608224136.20220-21-avarab@gmail.com>
On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> Instead of trying really hard to find an unambiguous SHA-1 we can with
> core.validateAbbrev=false, and preferably combined with the new
> support for relative core.abbrev values (such as +4) unconditionally
> print a short SHA-1 without doing any disambiguation check. I.e. it
This first paragraph read weirdly the first time. On the second attempt
I knew how to structure it and got it right. It might be easier to read
if the part about core.appreb=+4 were in a separate second sentence.
That last "it" is "the combination of these two configs", right?
> allows for picking a trade-off between performance, and the odds that
> future or remote (or current and local) short SHA-1 will be ambiguous.
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index abf07be7b6..df31d1351f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -925,6 +925,49 @@ means to add or subtract N characters from the SHA-1 that Git would
> otherwise print, this allows for producing more future-proof SHA-1s
> for use within a given project, while adjusting the value for the
> current approximate number of objects.
> ++
> +This is especially useful in combination with the
> +`core.validateAbbrev` setting, or to get more future-proof hashes to
> +reference in the future in a repository whose number of objects is
> +expected to grow.
Maybe s/validateAbbrev/validateAbbrev = false/?
> +
> +core.validateAbbrev::
> + If set to false (true by default) don't do any validation when
> + printing abbreviated object names to see if they're really
> + unique. This makes printing objects more performant at the
> + cost of potentially printing object names that aren't unique
> + within the current repository.
Good. I understand why I'd want to use it, and why not.
> ++
> +When printing abbreviated object names Git needs to look through the
> +local object store. This is an `O(log N)` operation assuming all the
> +objects are in a single pack file, but `X * O(log N)` given `X` pack
> +files, which can get expensive on some larger repositories.
This might be very close to too much information.
> ++
> +This setting changes that to `O(1)`, but with the trade-off that
> +depending on the value of `core.abbrev` we may be printing abbreviated
> +hashes that collide. Too see how likely this is, try running:
> ++
> +-----------------------------------------------------------------------------------------------------------
> +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
> +-----------------------------------------------------------------------------------------------------------
> ++
> +This shows how many commits were found at each abbreviation length. On
> +linux.git in June 2018 this shows a bit more than 750,000 commits,
> +with just 4 needing 11 characters to be fully abbreviated, and the
> +default heuristic picks a length of 12.
These last few paragraphs seem like too much to me.
> ++
> +Even without `core.validateAbbrev=false` the results abbreviation
> +already a bit of a probability game. They're guaranteed at the moment
> +of generation, but as more objects are added, ambiguities may be
> +introduced. Likewise, what's unambiguous for you may not be for
> +somebody else you're communicating with, if they have their own clone.
This seems more useful.
> ++
> +Therefore the default of `core.validateAbbrev=true` may not save you
> +in practice if you're sharing the SHA-1 or noting it now to use after
> +a `git fetch`. You may be better off setting `core.abbrev` to
> +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
> +that with `core.validateAbbrev=false` to get a reasonable trade-off
> +between safety and performance.
Makes sense. As before, I'd suggest s/SHA-1/object ID/.
I do wonder if this documentation could be a bit less verbose without
sacrificing too much correctness and clarity. No brilliant suggestions
on how to do that, sorry.
Martin
next prev parent reply other threads:[~2018-06-09 15:47 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 01/20] t/README: clarify the description of test_line_count Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 02/20] test library: add a test_byte_count Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars Ævar Arnfjörð Bjarmason
2018-06-12 18:10 ` Junio C Hamano
2018-06-08 22:41 ` [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev Ævar Arnfjörð Bjarmason
2018-06-12 18:31 ` Junio C Hamano
2018-06-08 22:41 ` [PATCH 05/20] abbrev tests: test "git-blame" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 07/20] abbrev tests: test "git branch" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 08/20] abbrev tests: test for "git-describe" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 09/20] abbrev tests: test for "git-log" behavior Ævar Arnfjörð Bjarmason
2018-06-09 8:43 ` Martin Ågren
2018-06-09 9:56 ` Ævar Arnfjörð Bjarmason
2018-06-09 13:56 ` Martin Ågren
2018-06-08 22:41 ` [PATCH 10/20] abbrev tests: test for "git-diff" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 11/20] abbrev tests: test for plumbing behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 14/20] config.c: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 15/20] parse-options-cb.c: " Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 16/20] abbrev: unify the handling of non-numeric values Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 17/20] abbrev: unify the handling of empty values Ævar Arnfjörð Bjarmason
2018-06-09 14:24 ` Martin Ågren
2018-06-09 14:31 ` Martin Ågren
2018-06-08 22:41 ` [PATCH 18/20] abbrev parsing: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 19/20] abbrev: support relative abbrev values Ævar Arnfjörð Bjarmason
2018-06-09 15:38 ` Martin Ågren
2018-06-12 19:16 ` Junio C Hamano
2018-06-13 22:22 ` Ævar Arnfjörð Bjarmason
2018-06-13 22:34 ` Junio C Hamano
2018-06-14 7:36 ` Ævar Arnfjörð Bjarmason
2018-06-14 15:50 ` Junio C Hamano
2018-06-14 19:07 ` Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 20/20] abbrev: add a core.validateAbbrev setting Ævar Arnfjörð Bjarmason
2018-06-09 15:47 ` Martin Ågren [this message]
2018-06-12 18:58 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAN0heSqo1WSVkYNiFAv4A7x8hG0wEx-iSz7ssLmYYVuen7b-LQ@mail.gmail.com \
--to=martin.agren@gmail.com \
--cc=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=jrnieder@gmail.com \
--cc=peff@peff.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).