git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, jonathantanmy@google.com, me@ttaylorr.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH] connected.c: reprepare packs for corner cases
Date: Thu, 12 Mar 2020 17:16:38 -0400	[thread overview]
Message-ID: <20200312211638.GA872402@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqzhcle23x.fsf@gitster.c.googlers.com>

On Thu, Mar 12, 2020 at 01:42:58PM -0700, Junio C Hamano wrote:

> > Placing a reprepare_packed_git() call inside chck_connected() before
> > looping through the packed_git list seems like the safest way to
> > avoid this issue in the future.
> 
> Hmmm.  I am not sure if I am convinced that check_connected() is the
> best place to do this.  Do we know the place that adds a new pack to
> the repository, yet forgets to add it to the packed-git list, that
> caused the failure you were observing?  Doing this change, without
> describing the answer to the question in the log message, makes it
> smell rather like a random hack than a designed solution to me.

Thanks, I was just writing a very similar message. :)

> If lazy fetching of objects happen in multiple fetches before a
> single check_connected() sweeps them to check for connectivity, then
> perhaps the lazy fetching codepath needs to remember the fact that
> it added a new pack that is still not known to the packed-git list
> (or just add it immediately, without having to scan at all), and
> check_connected() would need to rescan only when there is at least
> one such new pack?  That way, you do not have to penalize normal
> callers of check_connected() that do not use lazy fetches at all,
> right?

I share your concern that it's not the best place for this. In practice,
I do doubt that callers of check_connected() would notice, as it's a
pretty heavy-weight operation by itself. But I'd be concerned about
going the other way: we know that calling it in check_connected() fixes
_this_ problem, but we don't know if there's other code that would need
a similar fix.

I was able to reproduce easily from Stolee's instructions, but I did
notice one interesting thing: the problem occurs only with the http
protocol, not with git:// or ssh. The obvious difference between them is
that http code is mostly running in the remote-helper program and in a
spawned git-fetch-pack. But the check_connected() we hit is in the
parent git-fetch process. So I _suspect_ that there's some low-level
code which calls reprepare() that happens in-process in most cases, but
not for the http case.

If I instrument Git like this:

diff --git a/common-main.c b/common-main.c
index 71e21dd20a..bf6ffe42b7 100644
--- a/common-main.c
+++ b/common-main.c
@@ -49,6 +49,11 @@ int main(int argc, const char **argv)
 	trace2_cmd_start(argv);
 	trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
 
+	trace_printf("pid=%d, cmd=%s",
+		     (int)getpid(),
+		     !ends_with(argv[0], "git") ? argv[0] :
+		     argv[1] ? argv[1] : argv[0]);
+
 	result = cmd_main(argc, argv);
 
 	trace2_cmd_exit(result);
diff --git a/packfile.c b/packfile.c
index f4e752996d..832d020e6e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1004,6 +1004,8 @@ void reprepare_packed_git(struct repository *r)
 {
 	struct object_directory *odb;
 
+	trace_printf("repreparing in pid %d", (int)getpid());
+
 	obj_read_lock();
 	for (odb = r->objects->odb; odb; odb = odb->next)
 		odb_clear_loose_cache(odb);

then running over the git protocol gives me:

  17:13:52.642337 common-main.c:52        pid=882782, cmd=fetch
  17:13:52.642871 git.c:439               trace: built-in: git fetch origin
  17:13:52.725971 packfile.c:1007         repreparing in pid 882782
  remote: Enumerating objects: 49, done.
  17:13:52.793355 run-command.c:663       trace: run_command: git index-pack --stdin -v --fix-thin --promisor --pack_header=2,49
  remote: Total 49 (delta 0), reused 0 (delta 0), pack-reused 49
  17:13:52.796106 common-main.c:52        pid=882784, cmd=index-pack
  17:13:52.796655 git.c:439               trace: built-in: git index-pack --stdin -v --fix-thin --promisor --pack_header=2,49
  Receiving objects: 100% (49/49), 6.87 KiB | 6.87 MiB/s, done.
  Resolving deltas: 100% (17/17), done.
  17:13:52.811302 packfile.c:1007         repreparing in pid 882782
  From git://github.com/derrickstolee/TreeSearch
   * [new branch]      master     -> origin/master
  17:13:52.812955 run-command.c:1616      run_processes_parallel: preparing to run up to 1 tasks
  17:13:52.812986 run-command.c:1648      run_processes_parallel: done
  17:13:52.813034 run-command.c:663       trace: run_command: git gc --auto
  17:13:52.815435 common-main.c:52        pid=882788, cmd=gc
  17:13:52.815941 git.c:439               trace: built-in: git gc --auto

There's an early reprepare, but the one after index-pack finishes
resolving deltas is probably the interesting one. And it happens inside
the fetch process. But if we go back to http, I get:

  17:15:28.234866 common-main.c:52        pid=882836, cmd=fetch
  17:15:28.235403 git.c:439               trace: built-in: git fetch origin
  17:15:28.236329 run-command.c:663       trace: run_command: GIT_DIR=.git git-remote-https origin https://github.com/derrickstolee/TreeSearch
  17:15:28.244049 common-main.c:52        pid=882837, cmd=/home/peff/compile/git/git-remote-https
  17:15:28.382586 packfile.c:1007         repreparing in pid 882836
  17:15:28.382996 run-command.c:663       trace: run_command: git fetch-pack --stateless-rpc --stdin --lock-pack --include-tag --thin --from-promisor --filter=blob:none https://github.com/derrickstolee/TreeSearch/
  17:15:28.386284 common-main.c:52        pid=882839, cmd=fetch-pack
  17:15:28.386807 git.c:439               trace: built-in: git fetch-pack --stateless-rpc --stdin --lock-pack --include-tag --thin --from-promisor --filter=blob:none https://github.com/derrickstolee/TreeSearch/
  remote: Enumerating objects: 49, done.
  remote: Total 49 (delta 0), reused 0 (delta 0), pack-reused 49
  17:15:28.444799 run-command.c:663       trace: run_command: git index-pack --stdin -v --fix-thin --promisor --pack_header=2,49
  17:15:28.447441 common-main.c:52        pid=882841, cmd=index-pack
  17:15:28.447944 git.c:439               trace: built-in: git index-pack --stdin -v --fix-thin --promisor --pack_header=2,49
  Receiving objects: 100% (49/49), 6.87 KiB | 3.44 MiB/s, done.
  Resolving deltas: 100% (17/17), done.
  17:15:28.461871 packfile.c:1007         repreparing in pid 882839
  error: https://github.com/derrickstolee/TreeSearch did not send all necessary objects
  
  17:15:28.463446 run-command.c:663       trace: run_command: git gc --auto
  17:15:28.463949 common-main.c:52        pid=882845, cmd=gc
  17:15:28.464056 git.c:439               trace: built-in: git gc --auto

There we see that same reprepare happen in 882839, which is the child
fetch-pack. The parent fetch probably needs to reprepare itself after
fetch-pack completes.

-Peff

  reply	other threads:[~2020-03-12 21:16 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-12 15:36 [PATCH] connected.c: reprepare packs for corner cases Derrick Stolee via GitGitGadget
2020-03-12 16:39 ` Jonathan Tan
2020-03-12 17:34   ` Derrick Stolee
2020-03-12 20:42 ` Junio C Hamano
2020-03-12 21:16   ` Jeff King [this message]
2020-03-12 21:26     ` Jeff King
2020-03-13  0:54       ` Derrick Stolee
2020-03-13  1:14         ` Junio C Hamano
2020-03-13  2:30         ` Jeff King
2020-03-13  2:34           ` Jeff King
2020-03-13 12:43             ` Derrick Stolee
2020-03-13 21:11 ` [PATCH v2] " Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200312211638.GA872402@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).