git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] get_oid: cope with a possibly stale loose object cache
@ 2019-03-13 10:16 Johannes Schindelin via GitGitGadget
  2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-13 10:16 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano

Being rather a power user of the interactive rebase, I discovered recently
that one of my scripts ran into an odd problem where an interactive rebase
appended new commands to the todo list, and the interactive rebase failed to
validate the (short) OIDs. But when I tried to fix things via git rebase
--edit-todo, the interactive rebase had no longer any problem to validate
them.

Turns out that I generated the objects during the interactive rebase (yes, I
implemented a 2-phase rebase
[https://github.com/git-for-windows/build-extra/blob/master/ever-green.sh]),
and that their hashes happened to share the first two hex digits with some
loose object that had already been read in the same interactive rebase run,
causing a now-stale loose object cache to be initialized for that 
.git/objects/?? subdirectory.

It was quite the, let's say, adventure, to track that one down.

Over at the companion PR for Git for Windows
[https://github.com/git-for-windows/git/pull/2121], I discussed this with
Peff (who introduced the loose object cache), and he pointed out that my
original solution was a bit too specific for the interactive rebase. He
suggested the current method of re-reading upon a missing object instead,
and explained patiently that this does not affect the code path for which
the loose object cache was introduced originally: to help with performance
issues on NFS when Git tries to find the same missing objects over and over
again.

The regression test still reflects what I need this fix for, and I would
rather keep it as-is (even if the fix is not about the interactive rebase
per se), as it really tests for the functionality that I really need to
continue to work.

My only concern is that this might cause some performance regressions that
neither Peff nor I could think of, where get_oid() may run repeatedly into
missing objects by design, and where we should not blow away and recreate
the loose object cache all the time.

Does anybody share this concern? Or even better: can anybody think of such a
scenario?

Johannes Schindelin (4):
  rebase -i: demonstrate obscure loose object cache bug
  sequencer: improve error message when an OID could not be parsed
  sequencer: move stale comment into correct location
  get_oid(): when an object was not found, try harder

 sequencer.c                 |  5 +++--
 sha1-name.c                 | 12 ++++++++++++
 t/t3429-rebase-edit-todo.sh | 22 ++++++++++++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)


base-commit: e902e9bcae2010bc42648c80ab6adc6c5a16a4a5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-161%2Fdscho%2Frereading-todo-list-and-loose-object-cache-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-161/dscho/rereading-todo-list-and-loose-object-cache-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/161
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
@ 2019-03-13 10:16 ` Johannes Schindelin via GitGitGadget
  2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
  2019-03-14  1:12   ` Junio C Hamano
  2019-03-13 10:16 ` [PATCH 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-13 10:16 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We specifically support `exec` commands in `git rebase -i`'s todo lists
to rewrite the very same todo list. Of course, we need to validate that
todo list when re-reading it.

It is also totally legitimate to extend the todo list by `pick` lines
using short names of commits that were created only after the rebase
started.

And this is where the loose object cache interferes with this feature:
if *some* loose object was read whose hash shares the same first two
digits with a commit that was not yet created when that loose object was
created, then we fail to find that new commit by its short name in
`get_oid()`, and the interactive rebase fails with an obscure error
message like:

	error: invalid line 1: pick 6568fef
	error: please fix this using 'git rebase --edit-todo'.

Let's first demonstrate that this is actually a bug in a new regression
test, in a separate commit so that other developers who do not believe
me can cherry-pick it to confirm the problem.

This new regression test generates two commits whose hashes share the
first two hex digits (so that their corresponding loose objects live in
the same subdirectory of .git/objects/, and are therefore supposed to be
in the same loose object cache bin).

It then picks the first, to make sure that the loose object cache is
initialized and cached that object directory, then generates the second
commit and picks it, too. Since the commit was generated in a different
process than the sequencer that wants to pick it, the loose object cache
had no chance of being updated in the meantime.

Technically, we would need only one `exec` command in this regression
test case, but for ease of implementation, it uses a pseudo-recursive
call to the same script.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t3429-rebase-edit-todo.sh | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index b9292dfc2a..862f229c87 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -11,4 +11,26 @@ test_expect_success 'rebase exec modifies rebase-todo' '
 	test -e F
 '
 
+test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
+	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
+	export GIT_REBASE_TODO &&
+	write_script append-todo.sh <<-\EOS &&
+	# For values 5 and 6, this yields SHA-1s with the same first two digits
+	echo "pick $(git rev-parse --short \
+		$(printf "%s\\n" \
+			"tree $EMPTY_TREE" \
+			"author A U Thor <author@example.org> $1 +0000" \
+			"committer A U Thor <author@example.org> $1 +0000" \
+			"" \
+			"$1" |
+		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO
+
+	shift
+	test -z "$*" ||
+	echo "exec $0 $*" >>$GIT_REBASE_TODO
+	EOS
+
+	git rebase HEAD -x "./append-todo.sh 5 6"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/4] sequencer: improve error message when an OID could not be parsed
  2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
  2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
@ 2019-03-13 10:16 ` Johannes Schindelin via GitGitGadget
  2019-03-14  1:14   ` Junio C Hamano
  2019-03-13 10:16 ` [PATCH 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-13 10:16 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The interactive rebase simply complains about an "invalid line" when the
object hash of, say, a `pick` line could not be parsed.

Let's tell the user what happened in a little more detail.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 95dda23eee..f91062718d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2137,7 +2137,8 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
 	item->arg_len = (int)(eol - item->arg);
 
 	if (status < 0)
-		return -1;
+		return error(_("could not parse '%.*s'"),
+			     (int)(end_of_object_name - bol), bol);
 
 	item->commit = lookup_commit_reference(r, &commit_oid);
 	return !item->commit;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/4] sequencer: move stale comment into correct location
  2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
  2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
  2019-03-13 10:16 ` [PATCH 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
@ 2019-03-13 10:16 ` Johannes Schindelin via GitGitGadget
  2019-03-13 10:16 ` [PATCH 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-13 10:16 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index f91062718d..79a046d748 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3641,7 +3641,6 @@ static int pick_commits(struct repository *r,
 			res = do_exec(r, item->arg);
 			*end_of_arg = saved;
 
-			/* Reread the todo file if it has changed. */
 			if (res) {
 				if (opts->reschedule_failed_exec)
 					reschedule = 1;
@@ -3649,6 +3648,7 @@ static int pick_commits(struct repository *r,
 				res = error_errno(_("could not stat '%s'"),
 						  get_todo_path(opts));
 			else if (match_stat_data(&todo_list->stat, &st)) {
+				/* Reread the todo file if it has changed. */
 				todo_list_release(todo_list);
 				if (read_populate_todo(r, todo_list, opts))
 					res = -1; /* message was printed */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2019-03-13 10:16 ` [PATCH 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
@ 2019-03-13 10:16 ` Johannes Schindelin via GitGitGadget
  2019-03-14  1:29   ` Junio C Hamano
  2019-03-13 15:33 ` [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Jeff King
  2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
  5 siblings, 1 reply; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-13 10:16 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It is quite possible that the loose object cache gets stale when new
objects are written. In that case, get_oid() would potentially say that
it cannot find a given object, even if it should find it.

Let's blow away the loose object cache as well as the read packs and try
again in that case.

Note: this does *not* affect the code path that was introduced to help
avoid looking for the same non-existing objects (which made some
operations really expensive via NFS): that code path is handled by the
`OBJECT_INFO_QUICK` flag (which does not even apply to `get_oid()`,
which has no equivalent flag, at least at the time this patch was
written).

This incidentally fixes the problem identified earlier where an
interactive rebase wanted to re-read (and validate) the todo list after
an `exec` command modified it.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sha1-name.c                 | 12 ++++++++++++
 t/t3429-rebase-edit-todo.sh |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/sha1-name.c b/sha1-name.c
index 6dda2c16df..cfe5c874b6 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -442,6 +442,18 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, oid);
 
+	/*
+	 * If we didn't find it, do the usual reprepare() slow-path,
+	 * since the object may have recently been added to the repository
+	 * or migrated from loose to packed.
+	 */
+	if (status == MISSING_OBJECT) {
+		reprepare_packed_git(the_repository);
+		find_short_object_filename(&ds);
+		find_short_packed_object(&ds);
+		status = finish_object_disambiguation(&ds, oid);
+	}
+
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {
 		struct oid_array collect = OID_ARRAY_INIT;
 
diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index 862f229c87..76f6d306ea 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -11,7 +11,7 @@ test_expect_success 'rebase exec modifies rebase-todo' '
 	test -e F
 '
 
-test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
+test_expect_success SHA1 'loose object cache vs re-reading todo list' '
 	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
 	export GIT_REBASE_TODO &&
 	write_script append-todo.sh <<-\EOS &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/4] get_oid: cope with a possibly stale loose object cache
  2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2019-03-13 10:16 ` [PATCH 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
@ 2019-03-13 15:33 ` Jeff King
  2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-03-13 15:33 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Johannes Schindelin, git, Junio C Hamano

On Wed, Mar 13, 2019 at 03:16:30AM -0700, Johannes Schindelin via GitGitGadget wrote:

> Over at the companion PR for Git for Windows
> [https://github.com/git-for-windows/git/pull/2121], I discussed this with
> Peff (who introduced the loose object cache), and he pointed out that my
> original solution was a bit too specific for the interactive rebase. He
> suggested the current method of re-reading upon a missing object instead,
> and explained patiently that this does not affect the code path for which
> the loose object cache was introduced originally: to help with performance
> issues on NFS when Git tries to find the same missing objects over and over
> again.
> 
> The regression test still reflects what I need this fix for, and I would
> rather keep it as-is (even if the fix is not about the interactive rebase
> per se), as it really tests for the functionality that I really need to
> continue to work.

The case you found is the only way I can think of to trigger this
deterministically. So I think it is quite a good test case. :)

Just to rehash a bit from that PR: I believe that this bug has been
present in the way the test checks for it since we started caching in
the sha1-abbreviation code in cc817ca3ef (sha1_name: cache readdir(3)
results in find_short_object_filename(), 2017-06-22).

But the more fundamental issue, that we do not do the usual
reprepare_packed_git() thing upon failing to find an object, goes back
forever. It's just much less common to hit that race than the one with
normal object access, because we tend to resolve objects quickly at the
beginning of a program and then spend a long time walking the graph. But
it is still possible, especially in a long-running program that resolves
names not just at the beginning. Like rebase.

  Actually, I guess one could probably construct a similar case with
  a long-running "cat-file --batch", feeding it one name, then doing a
  complete repack, and then feeding it another name. I guess I could
  think of another case. ;)

> My only concern is that this might cause some performance regressions that
> neither Peff nor I could think of, where get_oid() may run repeatedly into
> missing objects by design, and where we should not blow away and recreate
> the loose object cache all the time.

This only affects get_oid_short(), which is already pretty expensive,
because it has to disambiguate them. The usual path we care about being
fast is the 40-hex sha1 parser, which should be able to do its job
without doing anything but parsing the string. We've already run into
speed issues there with the object/refname ambiguity checks, and callers
can disable that with a flag.

So I'd say:

  - if you want to think about callers which might be sensitive to this
    change, the ones setting warn_on_object_refname_ambiguity might be
    good candidates.

  - anything except 40-hex sha1 is already pretty expensive (ref
    lookups, disambiguation, etc), so as long as we're not touching that
    path, I'm not incredibly worried.

    I did wonder for a minute whether the 40-hex sha1 resolution would
    want this similar reprepare_packed_git() handling. But it doesn't
    verify the object at all, and just passes back the oid whether it
    exists or not (and it's up to the caller to then use
    OBJECT_INFO_QUICK to access the object data if it chooses).

That's my opinion, of course. I think your intent was to solicit other
thoughts, so I'd be curious to hear if anybody disagrees. :)

> Johannes Schindelin (4):
>   rebase -i: demonstrate obscure loose object cache bug
>   sequencer: improve error message when an OID could not be parsed
>   sequencer: move stale comment into correct location
>   get_oid(): when an object was not found, try harder

The patches themselves look good to me. Thanks for pushing our
discussion forward.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
@ 2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
  2019-03-13 16:35     ` Jeff King
  2019-03-14  1:12   ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-13 16:11 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin


On Wed, Mar 13 2019, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> We specifically support `exec` commands in `git rebase -i`'s todo lists
> to rewrite the very same todo list. Of course, we need to validate that
> todo list when re-reading it.
>
> It is also totally legitimate to extend the todo list by `pick` lines
> using short names of commits that were created only after the rebase
> started.
>
> And this is where the loose object cache interferes with this feature:
> if *some* loose object was read whose hash shares the same first two
> digits with a commit that was not yet created when that loose object was
> created, then we fail to find that new commit by its short name in
> `get_oid()`, and the interactive rebase fails with an obscure error
> message like:
>
> 	error: invalid line 1: pick 6568fef
> 	error: please fix this using 'git rebase --edit-todo'.

As a further improvement, is there a good reason for why we wouldn't
pass something down to the oid machinery to say "we're only interested
in commits". I have a WIP series somewhere to generalize that more, but
e.g.  here locally:

    $ git rev-parse 80b06
    error: short SHA1 80b06 is ambiguous
    hint: The candidates are:
    hint:   80b06b942e commit 2019-03-13 - Revert "this patch fail whales"
    hint:   80b063bb9b blob
    hint:   80b06f0714 blob
    80b06
    $ git rev-parse 80b06^{commit}
    80b06b942ed33e597a0152b3e6ba45b7d8ead94b

I can't remember if there's some caveat with that particular peel syntax
I meant to fix, but I mean if we could pass something down to say "no
blobs or trees" shouldn't we?

Then stuff like this wouldn't die:
    
    $ git rebase -i
    hint: Waiting for your editor to close the file... Waiting for Emacs...
    error: short SHA1 80b06 is ambiguous
    hint: The candidates are:
    hint:   80b06b942e commit 2019-03-13 - Revert "this patch fail whales"
    hint:   80b063bb9b blob
    hint:   80b06f0714 blob
    error: invalid line 2: p 80b06 Revert "this patch fail whales"
    You can fix this with 'git rebase --edit-todo' and then run 'git rebase --continue'.

> [...]
> +test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
> +	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
> +	export GIT_REBASE_TODO &&
> +	write_script append-todo.sh <<-\EOS &&
> +	# For values 5 and 6, this yields SHA-1s with the same first two digits
> +	echo "pick $(git rev-parse --short \
> +		$(printf "%s\\n" \
> +			"tree $EMPTY_TREE" \
> +			"author A U Thor <author@example.org> $1 +0000" \
> +			"committer A U Thor <author@example.org> $1 +0000" \
> +			"" \
> +			"$1" |
> +		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO
> +
> +	shift
> +	test -z "$*" ||
> +	echo "exec $0 $*" >>$GIT_REBASE_TODO
> +	EOS
> +
> +	git rebase HEAD -x "./append-todo.sh 5 6"
> +'
> +
>  test_done

This is a test for rebase, but perhaps it would be best put in
t1512-rev-parse-disambiguation.sh. Then when we finally port that
somehow to SHA256 we'll have all this SHA-1 golfing in the same file &
can fix it at once. Just a thought...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
@ 2019-03-13 16:35     ` Jeff King
  2019-03-13 16:53       ` Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jeff King @ 2019-03-13 16:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano,
	Johannes Schindelin

On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > And this is where the loose object cache interferes with this feature:
> > if *some* loose object was read whose hash shares the same first two
> > digits with a commit that was not yet created when that loose object was
> > created, then we fail to find that new commit by its short name in
> > `get_oid()`, and the interactive rebase fails with an obscure error
> > message like:
> >
> > 	error: invalid line 1: pick 6568fef
> > 	error: please fix this using 'git rebase --edit-todo'.

Are we 100% sure this part is necessary? From my understanding of the
problem, even without any ambiguity get_oid() could fail due to just
plain not finding the object in question.

> As a further improvement, is there a good reason for why we wouldn't
> pass something down to the oid machinery to say "we're only interested
> in commits". I have a WIP series somewhere to generalize that more, but
> e.g.  here locally:

We have get_oid_commit() and get_oid_committish() already. Should rebase
just be using those? (I think we probably want "commit()", because we do
not expect a "pick" line to have a tag, for example.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 16:35     ` Jeff King
@ 2019-03-13 16:53       ` Jeff King
  2019-03-13 22:40         ` Johannes Schindelin
  2019-03-13 22:27       ` Johannes Schindelin
  2019-03-14  6:40       ` Johannes Sixt
  2 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2019-03-13 16:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano,
	Johannes Schindelin

On Wed, Mar 13, 2019 at 12:35:16PM -0400, Jeff King wrote:

> On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> > > And this is where the loose object cache interferes with this feature:
> > > if *some* loose object was read whose hash shares the same first two
> > > digits with a commit that was not yet created when that loose object was
> > > created, then we fail to find that new commit by its short name in
> > > `get_oid()`, and the interactive rebase fails with an obscure error
> > > message like:
> > >
> > > 	error: invalid line 1: pick 6568fef
> > > 	error: please fix this using 'git rebase --edit-todo'.
> 
> Are we 100% sure this part is necessary? From my understanding of the
> problem, even without any ambiguity get_oid() could fail due to just
> plain not finding the object in question.

Sorry, I was being dumb. We do need a two-digit collision, not because
we need an ambiguity, but because the loose-object cache is filled in
one directory at a time. So we must be unlucky enough to hit the same
directory twice, and using these objects ensures that unluckiness. And
the commit message does describe this.

If we didn't want to depend on a particular hash, we could simply do it
N times, where N is enough to get us any two entries which collide in
the first byte. By the birthday paradox, that's 50% at only 2^4. But 50%
is not very good odds for the test working. You'd need 257 iterations to
ensure a collision by the pigeon-hole principle. That's enough that I
kind of prefer the found collision here, even if we'll have to update it
for a new hash (it is correctly marked with SHA1, so I don't think
finding it later will be a problem).

By the way, while reading the test more carefully, I did notice two
funny things:

> +test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
> +	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
> +	export GIT_REBASE_TODO &&
> +	write_script append-todo.sh <<-\EOS &&
> +	# For values 5 and 6, this yields SHA-1s with the same first two digits
> +	echo "pick $(git rev-parse --short \
> +		$(printf "%s\\n" \
> +			"tree $EMPTY_TREE" \
> +			"author A U Thor <author@example.org> $1 +0000" \
> +			"committer A U Thor <author@example.org> $1 +0000" \
> +			"" \
> +			"$1" |
> +		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO

Here we redirect the output into $GIT_REBASE_TODO, not stdout.

> +	shift
> +	test -z "$*" ||
> +	echo "exec $0 $*" >>$GIT_REBASE_TODO

And here we do the same thing. That second redirection is unnecessary.

I also find it interesting that it iterates over its arguments by
recursive processes. Wouldn't:

  for i in "$@"; do
	echo "pick ..." >>$GIT_REBASE_TODO
  done

be a bit more efficient (as well as more obvious?).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 16:35     ` Jeff King
  2019-03-13 16:53       ` Jeff King
@ 2019-03-13 22:27       ` Johannes Schindelin
  2019-03-14  0:05         ` Jeff King
  2019-03-14  6:40       ` Johannes Sixt
  2 siblings, 1 reply; 32+ messages in thread
From: Johannes Schindelin @ 2019-03-13 22:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1871 bytes --]

Hi Ævar & Peff,

On Wed, 13 Mar 2019, Jeff King wrote:

> On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> > > And this is where the loose object cache interferes with this
> > > feature: if *some* loose object was read whose hash shares the same
> > > first two digits with a commit that was not yet created when that
> > > loose object was created, then we fail to find that new commit by
> > > its short name in `get_oid()`, and the interactive rebase fails with
> > > an obscure error message like:
> > >
> > > 	error: invalid line 1: pick 6568fef
> > > 	error: please fix this using 'git rebase --edit-todo'.
> 
> Are we 100% sure this part is necessary? From my understanding of the
> problem, even without any ambiguity get_oid() could fail due to just
> plain not finding the object in question.

Indeed. It could be a typo, for example. Which is why that error message
is so helpful.

> > As a further improvement, is there a good reason for why we wouldn't
> > pass something down to the oid machinery to say "we're only interested
> > in commits". I have a WIP series somewhere to generalize that more, but
> > e.g.  here locally:
> 
> We have get_oid_commit() and get_oid_committish() already. Should rebase
> just be using those? (I think we probably want "commit()", because we do
> not expect a "pick" line to have a tag, for example.

I did think about this while developing this patch series, and decided
against conflating concerns.

And I was totally right to do so! Because I do have an internal ticket
that talks about allowing `reset v2.20.1`, which is a tag, not a commit.

Granted, it is easy to work around: just use `reset v2.20.1^0`, but it is
quite annoying that we do not allow this at the moment: even if we do
allow `get_oid()` to resolve the tag, we don't peel it to the commit.

#leftoverbits

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 16:53       ` Jeff King
@ 2019-03-13 22:40         ` Johannes Schindelin
  2019-03-14  0:07           ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Johannes Schindelin @ 2019-03-13 22:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Peff,

On Wed, 13 Mar 2019, Jeff King wrote:

> 
> By the way, while reading the test more carefully, I did notice two
> funny things:
> 
> > +test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
> > +	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
> > +	export GIT_REBASE_TODO &&
> > +	write_script append-todo.sh <<-\EOS &&
> > +	# For values 5 and 6, this yields SHA-1s with the same first two digits
> > +	echo "pick $(git rev-parse --short \
> > +		$(printf "%s\\n" \
> > +			"tree $EMPTY_TREE" \
> > +			"author A U Thor <author@example.org> $1 +0000" \
> > +			"committer A U Thor <author@example.org> $1 +0000" \
> > +			"" \
> > +			"$1" |
> > +		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO
> 
> Here we redirect the output into $GIT_REBASE_TODO, not stdout.

Indeed, because we want to append a `pick` command to the todo list.

> > +	shift
> > +	test -z "$*" ||
> > +	echo "exec $0 $*" >>$GIT_REBASE_TODO
> 
> And here we do the same thing. That second redirection is unnecessary.

It is actually not unnecessary, but to the contrary quite necessary to
achieve the intended effect: with this command, we append an `exec` line
to the todo list that is guaranteed to be executed after the `pick`
command that we added earlier.

> I also find it interesting that it iterates over its arguments by
> recursive processes. Wouldn't:
> 
>   for i in "$@"; do
> 	echo "pick ..." >>$GIT_REBASE_TODO
>   done
> 
> be a bit more efficient (as well as more obvious?).

It would be more efficient, but it would also fail to test for the
regression.

Remember: it is absolutely crucial for the regression test that the parent
process' loose object cache already has been initialized *before* the new
commit is created and then picked. Otherwise the cache would contain that
commit object already. The whole point of the regression test is that the
cache does *not* contain that object.

The only way we can guarantee that order in this test is if the first
commit is created and picked *before* we `exec` to create the second
commit and then append the `pick` line for that one.

Now, I could have tried to play some fake editor games because it is not
strictly necessary to create the first commit via an `exec` line. Instead,
I could have generated it before the rebase, and then initialized the todo
list with the `pick` of the first commit and then an `exec` of the script
that creates the second commit and then appends a `pick` line for that.

But the reality is that this would have resulted in more code! And not
even easier-to-read code at that! (I know, because one of my unsent
iterations did exactly that.)

So instead, I opted for using the `-x` option to modify the initial todo
list to begin with (it consists of a single `noop`, obviously). This will
add that `exec` line that calls the script that creates the first commit
and appends the `pick` line.

It *also* adds an `exec` line to guarantee that the second commit is
created, and a `pick` line for it is appended to the todo list, *after*
the sequencer initalized the loose object cache by virtue of picking the
first commit.

So yes, it is crucial that the `append-todo.sh` script is `exec`ed
*twice*. Otherwise the first `pick` would initialize the loose object
cache *after* the second commit was created, which Just Works, even
without this here patch series.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 22:27       ` Johannes Schindelin
@ 2019-03-14  0:05         ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-03-14  0:05 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano

On Wed, Mar 13, 2019 at 11:27:02PM +0100, Johannes Schindelin wrote:

> > We have get_oid_commit() and get_oid_committish() already. Should rebase
> > just be using those? (I think we probably want "commit()", because we do
> > not expect a "pick" line to have a tag, for example.
> 
> I did think about this while developing this patch series, and decided
> against conflating concerns.

Yeah, I'd definitely agree it is a separate topic.

> And I was totally right to do so! Because I do have an internal ticket
> that talks about allowing `reset v2.20.1`, which is a tag, not a commit.

Thanks for that example. I agree it's plausible for somebody to be using
a tag there, so committish() is probably the better option.

(It also doesn't really hurt disambiguation much; in any given repo,
commits tend to be dwarfed by trees and blobs by an order of magnitude,
and tags are even an order of magnitude smaller than commits).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 22:40         ` Johannes Schindelin
@ 2019-03-14  0:07           ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-03-14  0:07 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano

On Wed, Mar 13, 2019 at 11:40:54PM +0100, Johannes Schindelin wrote:

> > > +	shift
> > > +	test -z "$*" ||
> > > +	echo "exec $0 $*" >>$GIT_REBASE_TODO
> > 
> > And here we do the same thing. That second redirection is unnecessary.
> 
> It is actually not unnecessary, but to the contrary quite necessary to
> achieve the intended effect: with this command, we append an `exec` line
> to the todo list that is guaranteed to be executed after the `pick`
> command that we added earlier.

Ah, right, I was totally misreading it. We are not execing _now_, but
rather appending an exec. And that explains my confusion about
recursion.

So your nice explanation below has gone to waste, because I have just
now understood what is going on. ;)

I agree that what you have here is probably the simplest way of
demonstrating the issue.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
  2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
@ 2019-03-14  1:12   ` Junio C Hamano
  2019-03-14 12:44     ` Johannes Schindelin
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2019-03-14  1:12 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
> +	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
> +	export GIT_REBASE_TODO &&
> +	write_script append-todo.sh <<-\EOS &&
> +	# For values 5 and 6, this yields SHA-1s with the same first two digits
> +	echo "pick $(git rev-parse --short \
> +		$(printf "%s\\n" \
> +			"tree $EMPTY_TREE" \
> +			"author A U Thor <author@example.org> $1 +0000" \
> +			"committer A U Thor <author@example.org> $1 +0000" \
> +			"" \
> +			"$1" |
> +		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO

In the generated append-todo.sh, as the <<EOS above is quoted, we
will see ">>$GIT_REBASE_TODO" literally (not the actual pathname
that begins with .git, but a reference to the variable).  test-lint
may or may not catch it, but redirecting into variable reference
would trigger a(n arguably misguided) warning when run with some
versions of bash.  Quoting

		echo ... >>"$GIT_REBASE_TODO"

would work it around, of course.

> +	shift
> +	test -z "$*" ||
> +	echo "exec $0 $*" >>$GIT_REBASE_TODO

Likewise.

> +	EOS
> +
> +	git rebase HEAD -x "./append-todo.sh 5 6"
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] sequencer: improve error message when an OID could not be parsed
  2019-03-13 10:16 ` [PATCH 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
@ 2019-03-14  1:14   ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2019-03-14  1:14 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The interactive rebase simply complains about an "invalid line" when the
> object hash of, say, a `pick` line could not be parsed.
>
> Let's tell the user what happened in a little more detail.

Makes sense.

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  sequencer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 95dda23eee..f91062718d 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2137,7 +2137,8 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>  	item->arg_len = (int)(eol - item->arg);
>  
>  	if (status < 0)
> -		return -1;
> +		return error(_("could not parse '%.*s'"),
> +			     (int)(end_of_object_name - bol), bol);
>  
>  	item->commit = lookup_commit_reference(r, &commit_oid);
>  	return !item->commit;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-13 10:16 ` [PATCH 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
@ 2019-03-14  1:29   ` Junio C Hamano
  2019-03-14  2:22     ` Jeff King
  2019-03-14 13:17     ` Johannes Schindelin
  0 siblings, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2019-03-14  1:29 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -442,6 +442,18 @@ static enum get_oid_result get_short_oid(const char *name, int len,
>  	find_short_packed_object(&ds);
>  	status = finish_object_disambiguation(&ds, oid);
>  
> +	/*
> +	 * If we didn't find it, do the usual reprepare() slow-path,
> +	 * since the object may have recently been added to the repository
> +	 * or migrated from loose to packed.
> +	 */
> +	if (status == MISSING_OBJECT) {
> +		reprepare_packed_git(the_repository);
> +		find_short_object_filename(&ds);
> +		find_short_packed_object(&ds);
> +		status = finish_object_disambiguation(&ds, oid);
> +	}
> +

This looks obviously correct, but two things that made me wonder
briefly were:

 1. is reprepare_packed_git() a bit too heavy-weight, if the only
    thing we are addressing is the loose-object cache going stale?

 2. is there a way to cleanly avoid the three-line duplicate?

My tentative answers are (1) even if it is, but get_short_oid() is
already heavy-weight enough; it won't be worth restructuring the
code to make it possible to clear only the loose-object cache, and
(2) a loop that runs twice when the first result is MISSING_OBJECT
and otherwise leaves after once would need an extra variable, its
iniialization, check and increment, which is more than what we might
save with such a restructuring, so it won't be worth pursuing.

But others may have better ideas, as always ;-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14  1:29   ` Junio C Hamano
@ 2019-03-14  2:22     ` Jeff King
  2019-03-14  2:40       ` Jeff King
  2019-03-14  3:49       ` Junio C Hamano
  2019-03-14 13:17     ` Johannes Schindelin
  1 sibling, 2 replies; 32+ messages in thread
From: Jeff King @ 2019-03-14  2:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Thu, Mar 14, 2019 at 10:29:49AM +0900, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > @@ -442,6 +442,18 @@ static enum get_oid_result get_short_oid(const char *name, int len,
> >  	find_short_packed_object(&ds);
> >  	status = finish_object_disambiguation(&ds, oid);
> >  
> > +	/*
> > +	 * If we didn't find it, do the usual reprepare() slow-path,
> > +	 * since the object may have recently been added to the repository
> > +	 * or migrated from loose to packed.
> > +	 */
> > +	if (status == MISSING_OBJECT) {
> > +		reprepare_packed_git(the_repository);
> > +		find_short_object_filename(&ds);
> > +		find_short_packed_object(&ds);
> > +		status = finish_object_disambiguation(&ds, oid);
> > +	}
> > +
> 
> This looks obviously correct, but two things that made me wonder
> briefly were:
> 
>  1. is reprepare_packed_git() a bit too heavy-weight, if the only
>     thing we are addressing is the loose-object cache going stale?

It's not the only thing we are addressing. :)

Try this:

-- >8 --
mkfifo in
(git cat-file --batch-check <in) &
exec 9>in

git commit --allow-empty -m one
git commit --allow-empty -m two

git rev-parse --short HEAD^ >&9
git repack -adk
git rev-parse --short HEAD >&9
-- >8 --

The second object will (usually) be reported as "missing", even though
by calling reprepare_packed_git(), we would then find it in the
packfile.

I say "usually" because if the two commits have the same first-byte to
their sha1, then we'd load both into the loose cache during the first
request, and use that during the second request.

After this patch, it works consistently (we flush both the loose cache
and the cached set of packs, and find the packed version).

>  2. is there a way to cleanly avoid the three-line duplicate?

Yeah, as you noted, I think the boilerplate is worse than the
duplication. The most readable alternative to me is a separate function,
like:

diff --git a/sha1-name.c b/sha1-name.c
index cfe5c874b6..441666993b 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -411,6 +411,14 @@ static int sort_ambiguous(const void *a, const void *b)
 	return a_type_sort > b_type_sort ? 1 : -1;
 }
 
+static enum get_oid_result(struct disambiguate_state *ds,
+			   struct object_id *oid)
+{
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
+	return finish_object_disambiguation(&ds, oid);
+}
+
 static enum get_oid_result get_short_oid(const char *name, int len,
 					 struct object_id *oid,
 					 unsigned flags)
@@ -438,20 +446,16 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 	else
 		ds.fn = default_disambiguate_hint;
 
-	find_short_object_filename(&ds);
-	find_short_packed_object(&ds);
-	status = finish_object_disambiguation(&ds, oid);
+	status = do_get_short_oid(&ds, oid);
 
 	/*
 	 * If we didn't find it, do the usual reprepare() slow-path,
 	 * since the object may have recently been added to the repository
 	 * or migrated from loose to packed.
 	 */
 	if (status == MISSING_OBJECT) {
 		reprepare_packed_git(the_repository);
-		find_short_object_filename(&ds);
-		find_short_packed_object(&ds);
-		status = finish_object_disambiguation(&ds, oid);
+		status = do_get_short_oid(&ds, oid);
 	}
 
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {

But what I find particularly ugly is not just that it's more lines, but
that the assumptions and outputs of do_get_short_oid() aren't
particularly clear.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14  2:22     ` Jeff King
@ 2019-03-14  2:40       ` Jeff King
  2019-03-14  4:05         ` Junio C Hamano
  2019-03-14  3:49       ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff King @ 2019-03-14  2:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Wed, Mar 13, 2019 at 10:22:46PM -0400, Jeff King wrote:

> Try this:
> 
> -- >8 --
> mkfifo in
> (git cat-file --batch-check <in) &
> exec 9>in
> 
> git commit --allow-empty -m one
> git commit --allow-empty -m two
> 
> git rev-parse --short HEAD^ >&9
> git repack -adk
> git rev-parse --short HEAD >&9
> -- >8 --
> 
> The second object will (usually) be reported as "missing", even though
> by calling reprepare_packed_git(), we would then find it in the
> packfile.
> 
> I say "usually" because if the two commits have the same first-byte to
> their sha1, then we'd load both into the loose cache during the first
> request, and use that during the second request.
> 
> After this patch, it works consistently (we flush both the loose cache
> and the cached set of packs, and find the packed version).

By the way, an interesting implication of this is that we're doing the
re-prepare only when we come up with no object. But we _could_ actually
come up with the wrong object when we should say "ambiguous". I.e., the
user asks for 1234abcd. We have 1234abcde in a pack, and 1234abcdf
loose. A simultaneous process repacks, and we see neither copy of
1234abcdf, reporting that 1234abcde is the right answer.

I'm not sure it's really worth addressing (just because I don't think
there's a good way to do it that isn't expensive).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14  2:22     ` Jeff King
  2019-03-14  2:40       ` Jeff King
@ 2019-03-14  3:49       ` Junio C Hamano
  1 sibling, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2019-03-14  3:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

>>  1. is reprepare_packed_git() a bit too heavy-weight, if the only
>>     thing we are addressing is the loose-object cache going stale?
>
> It's not the only thing we are addressing. :)
>
> Try this:

Yes, I knew about repacking.  I was alluding to the overly heavy
reference to loose-object-cache in the log message ;-).

>>  2. is there a way to cleanly avoid the three-line duplicate?
>
> Yeah, as you noted, I think the boilerplate is worse than the
> duplication. The most readable alternative to me is a separate function,
> like:
> ...
> But what I find particularly ugly is not just that it's more lines, but
> that the assumptions and outputs of do_get_short_oid() aren't
> particularly clear.

Yeah, exactly.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14  2:40       ` Jeff King
@ 2019-03-14  4:05         ` Junio C Hamano
  2019-03-14 18:55           ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2019-03-14  4:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> I'm not sure it's really worth addressing (just because I don't think
> there's a good way to do it that isn't expensive).

I do not think so, either.  Not at this layer, anyway.

If a "-x" command newly created an object whose prefix makes it
ambiguous against an existing object, we would not notice, unless we
refresh the loose object cache, that we now have two objects that
share the prefix.  And we will find an object (the older one) with
the prefix in the stale cache without hitting "oh we do not find it,
let's retry" codepath.  So to solve both sides of the coin, we'd
need invalidation either way, which obviously defeats the whole
caching concept, doesn't it?

But when we know we are running things like interactive rebase with
"-x" that (1) creates new objects while we are waiting for that
process to finish and (2) not performance critical, the loop that
drives these steps can flush any cache after seeing an external
process to return control to us.  Such an approach would still not
solve the issue a totally unrelated "repack" process creates by
mucking with the object database behind our back, but at least the
damage to the code is much better isolated ;-)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-13 16:35     ` Jeff King
  2019-03-13 16:53       ` Jeff King
  2019-03-13 22:27       ` Johannes Schindelin
@ 2019-03-14  6:40       ` Johannes Sixt
  2019-03-14 13:06         ` Johannes Schindelin
  2 siblings, 1 reply; 32+ messages in thread
From: Johannes Sixt @ 2019-03-14  6:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano,
	Johannes Schindelin

Am 13.03.19 um 17:35 schrieb Jeff King:
> On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> As a further improvement, is there a good reason for why we wouldn't
>> pass something down to the oid machinery to say "we're only interested
>> in commits". I have a WIP series somewhere to generalize that more, but
>> e.g.  here locally:
> 
> We have get_oid_commit() and get_oid_committish() already. Should rebase
> just be using those? (I think we probably want "commit()", because we do
> not expect a "pick" line to have a tag, for example.

'pick' needs all the power of revision expressions. Not all too
infrequently I do insert a pick line with a rev expression argument.
Assuming that the resolved object is a commit is too strict.

-- Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-14  1:12   ` Junio C Hamano
@ 2019-03-14 12:44     ` Johannes Schindelin
  0 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin @ 2019-03-14 12:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git, Jeff King

Hi Junio,

On Thu, 14 Mar 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > +test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
> > +	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
> > +	export GIT_REBASE_TODO &&
> > +	write_script append-todo.sh <<-\EOS &&
> > +	# For values 5 and 6, this yields SHA-1s with the same first two digits
> > +	echo "pick $(git rev-parse --short \
> > +		$(printf "%s\\n" \
> > +			"tree $EMPTY_TREE" \
> > +			"author A U Thor <author@example.org> $1 +0000" \
> > +			"committer A U Thor <author@example.org> $1 +0000" \
> > +			"" \
> > +			"$1" |
> > +		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO
> 
> In the generated append-todo.sh, as the <<EOS above is quoted, we
> will see ">>$GIT_REBASE_TODO" literally (not the actual pathname
> that begins with .git, but a reference to the variable).  test-lint
> may or may not catch it, but redirecting into variable reference
> would trigger a(n arguably misguided) warning when run with some
> versions of bash.  Quoting
> 
> 		echo ... >>"$GIT_REBASE_TODO"
> 
> would work it around, of course.
> 
> > +	shift
> > +	test -z "$*" ||
> > +	echo "exec $0 $*" >>$GIT_REBASE_TODO
> 
> Likewise.

Yep. As a rule of thumb, I should always quote variable expansions.

Thanks,
Dscho

> 
> > +	EOS
> > +
> > +	git rebase HEAD -x "./append-todo.sh 5 6"
> > +'
> > +
> >  test_done
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-14  6:40       ` Johannes Sixt
@ 2019-03-14 13:06         ` Johannes Schindelin
  0 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin @ 2019-03-14 13:06 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

Hi Hannes,

On Thu, 14 Mar 2019, Johannes Sixt wrote:

> Am 13.03.19 um 17:35 schrieb Jeff King:
> > On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> As a further improvement, is there a good reason for why we wouldn't
> >> pass something down to the oid machinery to say "we're only interested
> >> in commits". I have a WIP series somewhere to generalize that more, but
> >> e.g.  here locally:
> > 
> > We have get_oid_commit() and get_oid_committish() already. Should rebase
> > just be using those? (I think we probably want "commit()", because we do
> > not expect a "pick" line to have a tag, for example.
> 
> 'pick' needs all the power of revision expressions. Not all too
> infrequently I do insert a pick line with a rev expression argument.
> Assuming that the resolved object is a commit is too strict.

No need to worry. `get_oid_committish()` can resolve all kinds of crazy
expressions. In a way, it is surprising that it will even resolve
"HEAD:Makefile". The only thing (that I know of) that
`get_oid_committish()` does that `get_oid()` does not is when you pass a
short hash which is unambiguous *only* when you exclude blobs and trees
from the disambiguation.

So I think `get_oid_committish()` will serve us really well, as we really
want to parse the given hash name into the `struct todo_item`'s `commit`
field.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14  1:29   ` Junio C Hamano
  2019-03-14  2:22     ` Jeff King
@ 2019-03-14 13:17     ` Johannes Schindelin
  2019-03-14 18:57       ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: Johannes Schindelin @ 2019-03-14 13:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git, Jeff King

Hi Junio & Peff,

On Thu, 14 Mar 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > @@ -442,6 +442,18 @@ static enum get_oid_result get_short_oid(const char *name, int len,
> >  	find_short_packed_object(&ds);
> >  	status = finish_object_disambiguation(&ds, oid);
> >  
> > +	/*
> > +	 * If we didn't find it, do the usual reprepare() slow-path,
> > +	 * since the object may have recently been added to the repository
> > +	 * or migrated from loose to packed.
> > +	 */
> > +	if (status == MISSING_OBJECT) {
> > +		reprepare_packed_git(the_repository);
> > +		find_short_object_filename(&ds);
> > +		find_short_packed_object(&ds);
> > +		status = finish_object_disambiguation(&ds, oid);
> > +	}
> > +
> 
> This looks obviously correct, but two things that made me wonder
> briefly were:
> 
>  1. is reprepare_packed_git() a bit too heavy-weight, if the only
>     thing we are addressing is the loose-object cache going stale?
> 
>  2. is there a way to cleanly avoid the three-line duplicate?
> 
> My tentative answers are (1) even if it is, but get_short_oid() is
> already heavy-weight enough; it won't be worth restructuring the
> code to make it possible to clear only the loose-object cache, and
> (2) a loop that runs twice when the first result is MISSING_OBJECT
> and otherwise leaves after once would need an extra variable, its
> iniialization, check and increment, which is more than what we might
> save with such a restructuring, so it won't be worth pursuing.
> 
> But others may have better ideas, as always ;-)

Peff tried with a function, but I think that this would actually be a
really appropriate occasion for a well-placed `goto`:

-- snip --
diff --git a/sha1-name.c b/sha1-name.c
index 6dda2c16df10..36a66026964a 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -415,7 +415,7 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 					 struct object_id *oid,
 					 unsigned flags)
 {
-	int status;
+	int status, attempts = 0;
 	struct disambiguate_state ds;
 	int quietly = !!(flags & GET_OID_QUIETLY);
 
@@ -438,10 +438,21 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 	else
 		ds.fn = default_disambiguate_hint;
 
+try_again:
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, oid);
 
+	/*
+	 * If we did not find it, do the usual reprepare() slow-path, since the
+	 * object may have recently been added to the repository or migrated
+	 * from loose to packed.
+	 */
+	if (status == MISSING_OBJECT && !attempts++) {
+		reprepare_packed_git(the_repository);
+		goto try_again;
+	}
+
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {
 		struct oid_array collect = OID_ARRAY_INIT;
 
-- snap --

Granted, it's 11 lines inserted and one changed as opposed to 12 lines
inserted, but it does make the code DRYer (and therefore slightly safer to
modify in the future). I pushed this to the GitGitGadget PR.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug
  2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
@ 2019-03-14 15:33   ` Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We specifically support `exec` commands in `git rebase -i`'s todo lists
to rewrite the very same todo list. Of course, we need to validate that
todo list when re-reading it.

It is also totally legitimate to extend the todo list by `pick` lines
using short names of commits that were created only after the rebase
started.

And this is where the loose object cache interferes with this feature:
if *some* loose object was read whose hash shares the same first two
digits with a commit that was not yet created when that loose object was
created, then we fail to find that new commit by its short name in
`get_oid()`, and the interactive rebase fails with an obscure error
message like:

	error: invalid line 1: pick 6568fef
	error: please fix this using 'git rebase --edit-todo'.

Let's first demonstrate that this is actually a bug in a new regression
test, in a separate commit so that other developers who do not believe
me can cherry-pick it to confirm the problem.

This new regression test generates two commits whose hashes share the
first two hex digits (so that their corresponding loose objects live in
the same subdirectory of .git/objects/, and are therefore supposed to be
in the same loose object cache bin).

It then picks the first, to make sure that the loose object cache is
initialized and cached that object directory, then generates the second
commit and picks it, too. Since the commit was generated in a different
process than the sequencer that wants to pick it, the loose object cache
had no chance of being updated in the meantime.

Technically, we would need only one `exec` command in this regression
test case, but for ease of implementation, it uses a pseudo-recursive
call to the same script.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t3429-rebase-edit-todo.sh | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index b9292dfc2a..83e5bb5eba 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -11,4 +11,26 @@ test_expect_success 'rebase exec modifies rebase-todo' '
 	test -e F
 '
 
+test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
+	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
+	export GIT_REBASE_TODO &&
+	write_script append-todo.sh <<-\EOS &&
+	# For values 5 and 6, this yields SHA-1s with the same first two digits
+	echo "pick $(git rev-parse --short \
+		$(printf "%s\\n" \
+			"tree $EMPTY_TREE" \
+			"author A U Thor <author@example.org> $1 +0000" \
+			"committer A U Thor <author@example.org> $1 +0000" \
+			"" \
+			"$1" |
+		  git hash-object -t commit -w --stdin))" >>"$GIT_REBASE_TODO"
+
+	shift
+	test -z "$*" ||
+	echo "exec $0 $*" >>"$GIT_REBASE_TODO"
+	EOS
+
+	git rebase HEAD -x "./append-todo.sh 5 6"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/4] get_short_oid: cope with a possibly stale loose object cache
  2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2019-03-13 15:33 ` [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Jeff King
@ 2019-03-14 15:33 ` Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
                     ` (3 more replies)
  5 siblings, 4 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano

Being rather a power user of the interactive rebase, I discovered recently
that one of my scripts ran into an odd problem where an interactive rebase
appended new commands to the todo list, and the interactive rebase failed to
validate the (short) OIDs. But when I tried to fix things via git rebase
--edit-todo, the interactive rebase had no longer any problem to validate
them.

Turns out that I generated the objects during the interactive rebase (yes, I
implemented a 2-phase rebase
[https://github.com/git-for-windows/build-extra/blob/master/ever-green.sh]),
and that their hashes happened to share the first two hex digits with some
loose object that had already been read in the same interactive rebase run,
causing a now-stale loose object cache to be initialized for that 
.git/objects/?? subdirectory.

It was quite the, let's say, adventure, to track that one down.

Over at the companion PR for Git for Windows
[https://github.com/git-for-windows/git/pull/2121], I discussed this with
Peff (who introduced the loose object cache), and he pointed out that my
original solution was a bit too specific for the interactive rebase. He
suggested the current method of re-reading upon a missing object instead,
and explained patiently that this does not affect the code path for which
the loose object cache was introduced originally: to help with performance
issues on NFS when Git tries to find the same missing objects over and over
again.

The regression test still reflects what I need this fix for, and I would
rather keep it as-is (even if the fix is not about the interactive rebase
per se), as it really tests for the functionality that I really need to
continue to work. Since I (and other contributors touching the rebase code)
frequently run only the t34* subset of the test suite while developing git
rebase patches, I also want to keep the test case there, rather than putting
it in with other SHA1-specific disambiguation test cases in 
t1512-rev-parse-disambiguation.sh.

My only concern was that this might cause some performance regressions that
neither Peff nor I could think of, where get_oid() may run repeatedly into
missing objects by design, and where we should not blow away and recreate
the loose object cache all the time. But Peff dispelled my fears, pointing
out that get_oid() is already rather expensive and therefore avoided in hot
loops.

Changes since v1:

 * The added test case now properly quotes $GIT_REBASE_TODO (even if it does
   not contain spaces, it is simply safer to quote variable expansions in
   Bash always [https://mywiki.wooledge.org/Quotes].
 * The three lines to try again to disambiguate are no longer repeated.
   Instead, we use a Beautiful Goto.
 * The commit message of 2/2 does not stress lean on the loose object cache
   so much, but also talks about packs a bit more.

Johannes Schindelin (4):
  rebase -i: demonstrate obscure loose object cache bug
  sequencer: improve error message when an OID could not be parsed
  sequencer: move stale comment into correct location
  get_oid(): when an object was not found, try harder

 sequencer.c                 |  5 +++--
 sha1-name.c                 | 13 ++++++++++++-
 t/t3429-rebase-edit-todo.sh | 22 ++++++++++++++++++++++
 3 files changed, 37 insertions(+), 3 deletions(-)


base-commit: e902e9bcae2010bc42648c80ab6adc6c5a16a4a5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-161%2Fdscho%2Frereading-todo-list-and-loose-object-cache-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-161/dscho/rereading-todo-list-and-loose-object-cache-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/161

Range-diff vs v1:

 1:  b3fcd37765 ! 1:  63ddb1dd04 rebase -i: demonstrate obscure loose object cache bug
     @@ -60,11 +60,11 @@
      +			"committer A U Thor <author@example.org> $1 +0000" \
      +			"" \
      +			"$1" |
     -+		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO
     ++		  git hash-object -t commit -w --stdin))" >>"$GIT_REBASE_TODO"
      +
      +	shift
      +	test -z "$*" ||
     -+	echo "exec $0 $*" >>$GIT_REBASE_TODO
     ++	echo "exec $0 $*" >>"$GIT_REBASE_TODO"
      +	EOS
      +
      +	git rebase HEAD -x "./append-todo.sh 5 6"
 2:  d8c4a3dde5 = 2:  f0cbfacf9a sequencer: improve error message when an OID could not be parsed
 3:  d749c63170 = 3:  b55e14cd63 sequencer: move stale comment into correct location
 4:  994446236d ! 4:  41fc6eb9ba get_oid(): when an object was not found, try harder
     @@ -3,8 +3,9 @@
          get_oid(): when an object was not found, try harder
      
          It is quite possible that the loose object cache gets stale when new
     -    objects are written. In that case, get_oid() would potentially say that
     -    it cannot find a given object, even if it should find it.
     +    objects are written. Or that a new pack was installed. In those cases,
     +    get_oid() would potentially say that it cannot find a given object, even
     +    if it should find it.
      
          Let's blow away the loose object cache as well as the read packs and try
          again in that case.
     @@ -27,19 +28,31 @@
       --- a/sha1-name.c
       +++ b/sha1-name.c
      @@
     + 					 struct object_id *oid,
     + 					 unsigned flags)
     + {
     +-	int status;
     ++	int status, attempts = 0;
     + 	struct disambiguate_state ds;
     + 	int quietly = !!(flags & GET_OID_QUIETLY);
     + 
     +@@
     + 	else
     + 		ds.fn = default_disambiguate_hint;
     + 
     ++try_again:
     + 	find_short_object_filename(&ds);
       	find_short_packed_object(&ds);
       	status = finish_object_disambiguation(&ds, oid);
       
      +	/*
     -+	 * If we didn't find it, do the usual reprepare() slow-path,
     -+	 * since the object may have recently been added to the repository
     -+	 * or migrated from loose to packed.
     ++	 * If we did not find it, do the usual reprepare() slow-path, since the
     ++	 * object may have recently been added to the repository or migrated
     ++	 * from loose to packed.
      +	 */
     -+	if (status == MISSING_OBJECT) {
     ++	if (status == MISSING_OBJECT && !attempts++) {
      +		reprepare_packed_git(the_repository);
     -+		find_short_object_filename(&ds);
     -+		find_short_packed_object(&ds);
     -+		status = finish_object_disambiguation(&ds, oid);
     ++		goto try_again;
      +	}
      +
       	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 2/4] sequencer: improve error message when an OID could not be parsed
  2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
@ 2019-03-14 15:33   ` Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
  3 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The interactive rebase simply complains about an "invalid line" when the
object hash of, say, a `pick` line could not be parsed.

Let's tell the user what happened in a little more detail.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 95dda23eee..f91062718d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2137,7 +2137,8 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
 	item->arg_len = (int)(eol - item->arg);
 
 	if (status < 0)
-		return -1;
+		return error(_("could not parse '%.*s'"),
+			     (int)(end_of_object_name - bol), bol);
 
 	item->commit = lookup_commit_reference(r, &commit_oid);
 	return !item->commit;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 3/4] sequencer: move stale comment into correct location
  2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
@ 2019-03-14 15:33   ` Johannes Schindelin via GitGitGadget
  2019-03-14 15:33   ` [PATCH v2 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
  3 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index f91062718d..79a046d748 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3641,7 +3641,6 @@ static int pick_commits(struct repository *r,
 			res = do_exec(r, item->arg);
 			*end_of_arg = saved;
 
-			/* Reread the todo file if it has changed. */
 			if (res) {
 				if (opts->reschedule_failed_exec)
 					reschedule = 1;
@@ -3649,6 +3648,7 @@ static int pick_commits(struct repository *r,
 				res = error_errno(_("could not stat '%s'"),
 						  get_todo_path(opts));
 			else if (match_stat_data(&todo_list->stat, &st)) {
+				/* Reread the todo file if it has changed. */
 				todo_list_release(todo_list);
 				if (read_populate_todo(r, todo_list, opts))
 					res = -1; /* message was printed */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 4/4] get_oid(): when an object was not found, try harder
  2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2019-03-14 15:33   ` [PATCH v2 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
@ 2019-03-14 15:33   ` Johannes Schindelin via GitGitGadget
  3 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-03-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It is quite possible that the loose object cache gets stale when new
objects are written. Or that a new pack was installed. In those cases,
get_oid() would potentially say that it cannot find a given object, even
if it should find it.

Let's blow away the loose object cache as well as the read packs and try
again in that case.

Note: this does *not* affect the code path that was introduced to help
avoid looking for the same non-existing objects (which made some
operations really expensive via NFS): that code path is handled by the
`OBJECT_INFO_QUICK` flag (which does not even apply to `get_oid()`,
which has no equivalent flag, at least at the time this patch was
written).

This incidentally fixes the problem identified earlier where an
interactive rebase wanted to re-read (and validate) the todo list after
an `exec` command modified it.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sha1-name.c                 | 13 ++++++++++++-
 t/t3429-rebase-edit-todo.sh |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/sha1-name.c b/sha1-name.c
index 6dda2c16df..36a6602696 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -415,7 +415,7 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 					 struct object_id *oid,
 					 unsigned flags)
 {
-	int status;
+	int status, attempts = 0;
 	struct disambiguate_state ds;
 	int quietly = !!(flags & GET_OID_QUIETLY);
 
@@ -438,10 +438,21 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 	else
 		ds.fn = default_disambiguate_hint;
 
+try_again:
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, oid);
 
+	/*
+	 * If we did not find it, do the usual reprepare() slow-path, since the
+	 * object may have recently been added to the repository or migrated
+	 * from loose to packed.
+	 */
+	if (status == MISSING_OBJECT && !attempts++) {
+		reprepare_packed_git(the_repository);
+		goto try_again;
+	}
+
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {
 		struct oid_array collect = OID_ARRAY_INIT;
 
diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index 83e5bb5eba..d5d2925a44 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -11,7 +11,7 @@ test_expect_success 'rebase exec modifies rebase-todo' '
 	test -e F
 '
 
-test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
+test_expect_success SHA1 'loose object cache vs re-reading todo list' '
 	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
 	export GIT_REBASE_TODO &&
 	write_script append-todo.sh <<-\EOS &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14  4:05         ` Junio C Hamano
@ 2019-03-14 18:55           ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-03-14 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Thu, Mar 14, 2019 at 01:05:03PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm not sure it's really worth addressing (just because I don't think
> > there's a good way to do it that isn't expensive).
> 
> I do not think so, either.  Not at this layer, anyway.
> 
> If a "-x" command newly created an object whose prefix makes it
> ambiguous against an existing object, we would not notice, unless we
> refresh the loose object cache, that we now have two objects that
> share the prefix.  And we will find an object (the older one) with
> the prefix in the stale cache without hitting "oh we do not find it,
> let's retry" codepath.  So to solve both sides of the coin, we'd
> need invalidation either way, which obviously defeats the whole
> caching concept, doesn't it?

Right, that was the conclusion I came to. You cannot solve it in all
cases without checking whether new objects have appeared for every
lookup.  I think that could be as cheap as a stat() on the pack
directory and one on the appropriate loose directory, but I suspect even
that might be measurably slower.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14 13:17     ` Johannes Schindelin
@ 2019-03-14 18:57       ` Jeff King
  2019-03-14 19:55         ` Johannes Schindelin
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2019-03-14 18:57 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git

On Thu, Mar 14, 2019 at 02:17:18PM +0100, Johannes Schindelin wrote:

> >  2. is there a way to cleanly avoid the three-line duplicate?
> [...]
> Peff tried with a function, but I think that this would actually be a
> really appropriate occasion for a well-placed `goto`:

I worried that a goto would be too confusing/ugly. But I am OK with it
if you are.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] get_oid(): when an object was not found, try harder
  2019-03-14 18:57       ` Jeff King
@ 2019-03-14 19:55         ` Johannes Schindelin
  0 siblings, 0 replies; 32+ messages in thread
From: Johannes Schindelin @ 2019-03-14 19:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git

Hi Peff,

On Thu, 14 Mar 2019, Jeff King wrote:

> On Thu, Mar 14, 2019 at 02:17:18PM +0100, Johannes Schindelin wrote:
> 
> > >  2. is there a way to cleanly avoid the three-line duplicate?
> > [...]
> > Peff tried with a function, but I think that this would actually be a
> > really appropriate occasion for a well-placed `goto`:
> 
> I worried that a goto would be too confusing/ugly. But I am OK with it
> if you are.

I hope that after reading the patch, you will agree with me that it is not
too confusing/ugly... ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-03-14 19:55 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-13 10:16 [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
2019-03-13 10:16 ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-13 16:11   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:35     ` Jeff King
2019-03-13 16:53       ` Jeff King
2019-03-13 22:40         ` Johannes Schindelin
2019-03-14  0:07           ` Jeff King
2019-03-13 22:27       ` Johannes Schindelin
2019-03-14  0:05         ` Jeff King
2019-03-14  6:40       ` Johannes Sixt
2019-03-14 13:06         ` Johannes Schindelin
2019-03-14  1:12   ` Junio C Hamano
2019-03-14 12:44     ` Johannes Schindelin
2019-03-13 10:16 ` [PATCH 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
2019-03-14  1:14   ` Junio C Hamano
2019-03-13 10:16 ` [PATCH 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
2019-03-13 10:16 ` [PATCH 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget
2019-03-14  1:29   ` Junio C Hamano
2019-03-14  2:22     ` Jeff King
2019-03-14  2:40       ` Jeff King
2019-03-14  4:05         ` Junio C Hamano
2019-03-14 18:55           ` Jeff King
2019-03-14  3:49       ` Junio C Hamano
2019-03-14 13:17     ` Johannes Schindelin
2019-03-14 18:57       ` Jeff King
2019-03-14 19:55         ` Johannes Schindelin
2019-03-13 15:33 ` [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Jeff King
2019-03-14 15:33 ` [PATCH v2 0/4] get_short_oid: " Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 2/4] sequencer: improve error message when an OID could not be parsed Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 3/4] sequencer: move stale comment into correct location Johannes Schindelin via GitGitGadget
2019-03-14 15:33   ` [PATCH v2 4/4] get_oid(): when an object was not found, try harder Johannes Schindelin via GitGitGadget

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).