git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 0/4] get_oid: cope with a possibly stale loose object cache
Date: Wed, 13 Mar 2019 11:33:17 -0400	[thread overview]
Message-ID: <20190313153317.GB24101@sigill.intra.peff.net> (raw)
In-Reply-To: <pull.161.git.gitgitgadget@gmail.com>

On Wed, Mar 13, 2019 at 03:16:30AM -0700, Johannes Schindelin via GitGitGadget wrote:

> Over at the companion PR for Git for Windows
> [https://github.com/git-for-windows/git/pull/2121], I discussed this with
> Peff (who introduced the loose object cache), and he pointed out that my
> original solution was a bit too specific for the interactive rebase. He
> suggested the current method of re-reading upon a missing object instead,
> and explained patiently that this does not affect the code path for which
> the loose object cache was introduced originally: to help with performance
> issues on NFS when Git tries to find the same missing objects over and over
> again.
> 
> The regression test still reflects what I need this fix for, and I would
> rather keep it as-is (even if the fix is not about the interactive rebase
> per se), as it really tests for the functionality that I really need to
> continue to work.

The case you found is the only way I can think of to trigger this
deterministically. So I think it is quite a good test case. :)

Just to rehash a bit from that PR: I believe that this bug has been
present in the way the test checks for it since we started caching in
the sha1-abbreviation code in cc817ca3ef (sha1_name: cache readdir(3)
results in find_short_object_filename(), 2017-06-22).

But the more fundamental issue, that we do not do the usual
reprepare_packed_git() thing upon failing to find an object, goes back
forever. It's just much less common to hit that race than the one with
normal object access, because we tend to resolve objects quickly at the
beginning of a program and then spend a long time walking the graph. But
it is still possible, especially in a long-running program that resolves
names not just at the beginning. Like rebase.

  Actually, I guess one could probably construct a similar case with
  a long-running "cat-file --batch", feeding it one name, then doing a
  complete repack, and then feeding it another name. I guess I could
  think of another case. ;)

> My only concern is that this might cause some performance regressions that
> neither Peff nor I could think of, where get_oid() may run repeatedly into
> missing objects by design, and where we should not blow away and recreate
> the loose object cache all the time.

This only affects get_oid_short(), which is already pretty expensive,
because it has to disambiguate them. The usual path we care about being
fast is the 40-hex sha1 parser, which should be able to do its job
without doing anything but parsing the string. We've already run into
speed issues there with the object/refname ambiguity checks, and callers
can disable that with a flag.

So I'd say:

  - if you want to think about callers which might be sensitive to this
    change, the ones setting warn_on_object_refname_ambiguity might be
    good candidates.

  - anything except 40-hex sha1 is already pretty expensive (ref
    lookups, disambiguation, etc), so as long as we're not touching that
    path, I'm not incredibly worried.

    I did wonder for a minute whether the 40-hex sha1 resolution would
    want this similar reprepare_packed_git() handling. But it doesn't
    verify the object at all, and just passes back the oid whether it
    exists or not (and it's up to the caller to then use
    OBJECT_INFO_QUICK to access the object data if it chooses).

That's my opinion, of course. I think your intent was to solicit other
thoughts, so I'd be curious to hear if anybody disagrees. :)

> Johannes Schindelin (4):
>   rebase -i: demonstrate obscure loose object cache bug
>   sequencer: improve error message when an OID could not be parsed
>   sequencer: move stale comment into correct location
>   get_oid(): when an object was not found, try harder

The patches themselves look good to me. Thanks for pushing our
discussion forward.

-Peff

  parent reply	other threads:[~2019-03-13 15:34 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:35     ` Jeff King
2019-03-13 16:53       ` Jeff King
2019-03-13 22:40         ` Johannes Schindelin
2019-03-14  0:07           ` Jeff King
2019-03-13 22:27       ` Johannes Schindelin
2019-03-14  0:05         ` Jeff King
2019-03-14  6:40       ` Johannes Sixt
2019-03-14 13:06         ` Johannes Schindelin
2019-03-14  1:12   ` Junio C Hamano
2019-03-14 12:44     ` Johannes Schindelin
2019-03-13 10:16 ` [PATCH 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
2019-03-14  1:14   ` Junio C Hamano
2019-03-13 10:16 ` [PATCH 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
2019-03-13 10:16 ` [PATCH 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
2019-03-14  1:29   ` Junio C Hamano
2019-03-14  2:22     ` Jeff King
2019-03-14  2:40       ` Jeff King
2019-03-14  4:05         ` Junio C Hamano
2019-03-14 18:55           ` Jeff King
2019-03-14  3:49       ` Junio C Hamano
2019-03-14 13:17     ` Johannes Schindelin
2019-03-14 18:57       ` Jeff King
2019-03-14 19:55         ` Johannes Schindelin
2019-03-13 15:33 ` Jeff King [this message]
2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190313153317.GB24101@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).