git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] fetch-pack: write fetched refs to .promisor
@ 2019-08-26 21:47 Jonathan Tan
  2019-08-27 20:27 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Jonathan Tan @ 2019-08-26 21:47 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The specification of promisor packfiles (in partial-clone.txt) states
that the .promisor files that accompany packfiles do not matter (just
like .keep files), so whenever a packfile is fetched from the promisor
remote, Git has been writing empty .promisor files. But these files
could contain more useful information.

So instead of writing empty files, write the refs fetched to these
files. This makes it easier to debug issues with partial clones, as we
can identify what refs (and their associated hashes) were fetched at the
time the packfile was downloaded, and if necessary, compare those hashes
against what the promisor remote reports now.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
As written in the NEEDSWORK comment, repack does not preserve the
contents of .promisor files, but I thought I'd send this out anyway as
this change is already useful for users who don't run repack much.
---
 builtin/repack.c         |  5 +++++
 fetch-pack.c             | 41 ++++++++++++++++++++++++++++++++++++----
 t/t5616-partial-clone.sh |  8 ++++++++
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 632c0c0a79..8c1621d414 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -232,6 +232,11 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 		/*
 		 * pack-objects creates the .pack and .idx files, but not the
 		 * .promisor file. Create the .promisor file, which is empty.
+		 *
+		 * NEEDSWORK: fetch-pack generates non-empty .promisor files,
+		 * but this would not preserve their contents. Maybe
+		 * concatenate the contents of all .promisor files instead of
+		 * just creating a new empty file.
 		 */
 		promisor_name = mkpathdup("%s-%s.promisor", packtmp,
 					  line.buf);
diff --git a/fetch-pack.c b/fetch-pack.c
index 65be043f2a..07029e1bbf 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -758,8 +758,33 @@ static int sideband_demux(int in, int out, void *data)
 	return ret;
 }
 
+static void write_promisor_file(const char *keep_name,
+				struct ref **sought, int nr_sought)
+{
+	struct strbuf promisor_name = STRBUF_INIT;
+	int suffix_stripped;
+	FILE *output;
+	int i;
+
+	strbuf_addstr(&promisor_name, keep_name);
+	suffix_stripped = strbuf_strip_suffix(&promisor_name, ".keep");
+	if (!suffix_stripped)
+		BUG("name of pack lockfile should end with .keep (was '%s')",
+		    keep_name);
+	strbuf_addstr(&promisor_name, ".promisor");
+
+	output = xfopen(promisor_name.buf, "w");
+	for (i = 0; i < nr_sought; i++)
+		fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
+			sought[i]->name);
+	fclose(output);
+
+	strbuf_release(&promisor_name);
+}
+
 static int get_pack(struct fetch_pack_args *args,
-		    int xd[2], char **pack_lockfile)
+		    int xd[2], char **pack_lockfile,
+		    struct ref **sought, int nr_sought)
 {
 	struct async demux;
 	int do_keep = args->keep_pack;
@@ -821,7 +846,13 @@ static int get_pack(struct fetch_pack_args *args,
 		}
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
-		if (args->from_promisor)
+		/*
+		 * If we're obtaining the filename of a lockfile, we'll use
+		 * that filename to write a .promisor file with more
+		 * information below. If not, we need index-pack to do it for
+		 * us.
+		 */
+		if (!(do_keep && pack_lockfile) && args->from_promisor)
 			argv_array_push(&cmd.args, "--promisor");
 	}
 	else {
@@ -859,6 +890,8 @@ static int get_pack(struct fetch_pack_args *args,
 		die(_("fetch-pack: unable to fork off %s"), cmd_name);
 	if (do_keep && pack_lockfile) {
 		*pack_lockfile = index_pack_lockfile(cmd.out);
+		if (args->from_promisor)
+			write_promisor_file(*pack_lockfile, sought, nr_sought);
 		close(cmd.out);
 	}
 
@@ -1009,7 +1042,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfile))
+	if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
 		die(_("git fetch-pack: fetch failed."));
 
  all_done:
@@ -1458,7 +1491,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 			/* get the pack */
 			process_section_header(&reader, "packfile", 0);
-			if (get_pack(args, fd, pack_lockfile))
+			if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
 
 			state = FETCH_DONE;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 565254558f..486db27ee0 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -46,6 +46,14 @@ test_expect_success 'do partial clone 1' '
 	test "$(git -C pc1 config --local core.partialclonefilter)" = "blob:none"
 '
 
+test_expect_success 'verify that .promisor file contains refs fetched' '
+	ls pc1/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	git -C srv.bare rev-list HEAD >headhash &&
+	grep "$(cat headhash) HEAD" $(cat promisorlist) &&
+	grep "$(cat headhash) refs/heads/master" $(cat promisorlist)
+'
+
 # checkout master to force dynamic object fetch of blobs at HEAD.
 test_expect_success 'verify checkout with dynamic object fetch' '
 	git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-08-26 21:47 [PATCH] fetch-pack: write fetched refs to .promisor Jonathan Tan
@ 2019-08-27 20:27 ` Junio C Hamano
  2019-08-27 21:50   ` Jonathan Tan
  2019-09-05  7:01 ` Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2019-08-27 20:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> As written in the NEEDSWORK comment, repack does not preserve the
> contents of .promisor files, but I thought I'd send this out anyway as
> this change is already useful for users who don't run repack much.

What do you exactly mean by "much" here?  The comment sounds like it
is saying "running this code once and you'd make the commits and
objects that were depending on the existing promisor invalid", in
which case it would be more like "it is already useful for users
until they run their first repack that destroyes their repository",
but certainly that is not what we want to do, so...

> +test_expect_success 'verify that .promisor file contains refs fetched' '
> +	ls pc1/.git/objects/pack/pack-*.promisor >promisorlist &&
> +	test_line_count = 1 promisorlist &&
> +	git -C srv.bare rev-list HEAD >headhash &&
> +	grep "$(cat headhash) HEAD" $(cat promisorlist) &&
> +	grep "$(cat headhash) refs/heads/master" $(cat promisorlist)
> +'
> +
>  # checkout master to force dynamic object fetch of blobs at HEAD.
>  test_expect_success 'verify checkout with dynamic object fetch' '
>  	git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-08-27 20:27 ` Junio C Hamano
@ 2019-08-27 21:50   ` Jonathan Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2019-08-27 21:50 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > As written in the NEEDSWORK comment, repack does not preserve the
> > contents of .promisor files, but I thought I'd send this out anyway as
> > this change is already useful for users who don't run repack much.
> 
> What do you exactly mean by "much" here?

For diagnostic information to be preserved, the user must not have run
repack between the fetch and the discovery of a problem. Admittedly,
this is probablistic, but if the user never GCs (for example), this
would work.

> The comment sounds like it
> is saying "running this code once and you'd make the commits and
> objects that were depending on the existing promisor invalid", in
> which case it would be more like "it is already useful for users
> until they run their first repack that destroyes their repository",
> but certainly that is not what we want to do, so...

To be clear, repacks will not destroy their repository, whether before
or after this change. Before and after this change, a repack will just
collect all promisor objects from all promisor packs (that is, the ones
with .promisor) into one single pack, and then generate an empty
.promisor file to indicate that the new single pack is a promisor pack.
The difference is that before this change, Git does not write anything
into the .promisor file (at least for fetches), so nothing is lost. With
this change, we now write something for fetches, so something is lost
(since we delete all the old packs, including the .promisor files).

But the only thing lost is diagnostic information (for humans - to
diagnose, the user will need to open the .promisor file in a text
editor) - commits/objects are still valid, and the repository is not
destroyed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-08-26 21:47 [PATCH] fetch-pack: write fetched refs to .promisor Jonathan Tan
  2019-08-27 20:27 ` Junio C Hamano
@ 2019-09-05  7:01 ` Jeff King
  2019-09-05 17:13   ` Junio C Hamano
  2019-09-05 18:39   ` Jonathan Tan
  2019-10-14 22:27 ` Josh Steadmon
  2019-10-15  0:12 ` [PATCH v2] " Jonathan Tan
  3 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2019-09-05  7:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Aug 26, 2019 at 02:47:37PM -0700, Jonathan Tan wrote:

> The specification of promisor packfiles (in partial-clone.txt) states
> that the .promisor files that accompany packfiles do not matter (just
> like .keep files), so whenever a packfile is fetched from the promisor
> remote, Git has been writing empty .promisor files. But these files
> could contain more useful information.
> 
> So instead of writing empty files, write the refs fetched to these
> files. This makes it easier to debug issues with partial clones, as we
> can identify what refs (and their associated hashes) were fetched at the
> time the packfile was downloaded, and if necessary, compare those hashes
> against what the promisor remote reports now.

I'm not really opposed to what you're doing here, but I did recently
think of another possible use for .promisor files. So it seems like a
good time to bring it up, since presumably we'd have to choose one or
the other.

I noticed when playing with partial clones that the client may sometimes
pause for a while, chewing CPU. The culprit is is_promisor_object(),
which generates the list of known promisor objects by opening every
object we _do_ have to find out which ones they mention.

I know one of the original design features of the promisor pack was that
the client would _not_ keep a list of all of the objects it didn't have.
But I wonder if it would make sense to keep a cache of these "cut
points" in the partial clone. That's potentially smaller than the
complete set of objects (especially for tree-based partial cloning), and
it seems clear we're willing to store it in memory anyway.

And if we do that, would the .promisor file for a pack be a good place
to store it?

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-09-05  7:01 ` Jeff King
@ 2019-09-05 17:13   ` Junio C Hamano
  2019-09-05 17:59     ` Jeff King
  2019-09-05 18:39   ` Jonathan Tan
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2019-09-05 17:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git

Jeff King <peff@peff.net> writes:

> But I wonder if it would make sense to keep a cache of these "cut
> points" in the partial clone. That's potentially smaller than the
> complete set of objects (especially for tree-based partial cloning), and
> it seems clear we're willing to store it in memory anyway.

That sounds in line with how "shallow" gives us cut points in the
history, but then would we end up listing a handful of cut-point
objects for each and every commit in the history?  That still may be
a lot cheaper than computing the same set of cut-point objects every
time, though.

> And if we do that, would the .promisor file for a pack be a good place
> to store it?
>
> -Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-09-05 17:13   ` Junio C Hamano
@ 2019-09-05 17:59     ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2019-09-05 17:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

On Thu, Sep 05, 2019 at 10:13:24AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But I wonder if it would make sense to keep a cache of these "cut
> > points" in the partial clone. That's potentially smaller than the
> > complete set of objects (especially for tree-based partial cloning), and
> > it seems clear we're willing to store it in memory anyway.
> 
> That sounds in line with how "shallow" gives us cut points in the
> history, but then would we end up listing a handful of cut-point
> objects for each and every commit in the history?  That still may be
> a lot cheaper than computing the same set of cut-point objects every
> time, though.

I think it would be one set of cut points for the whole pack. There
would potentially be a lot of overlap between commits (e.g., if you omit
blob X, then every commit after it was added, even if it doesn't touch
it, mentions X in its tree).

It would also make sense to omit objects from the list that are actually
in the pack (because we can trivially already know they're promisor
objects by finding them in the pack's .idx).

Which means that in the case of blob filters, the set of cut points is
identical to the set of omitted objects (because blobs can't reference
other objects). For a sparse-path filter that omits a whole tree like
"!/foo", we'd end up with a list of all of the oids in history that were
ever at the "foo" entry, but nothing below that.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-09-05  7:01 ` Jeff King
  2019-09-05 17:13   ` Junio C Hamano
@ 2019-09-05 18:39   ` Jonathan Tan
  2019-10-02 16:03     ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2019-09-05 18:39 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> I'm not really opposed to what you're doing here, but I did recently
> think of another possible use for .promisor files. So it seems like a
> good time to bring it up, since presumably we'd have to choose one or
> the other.

Thanks for bringing it up - yes, we should discuss this.

> I noticed when playing with partial clones that the client may sometimes
> pause for a while, chewing CPU. The culprit is is_promisor_object(),
> which generates the list of known promisor objects by opening every
> object we _do_ have to find out which ones they mention.
> 
> I know one of the original design features of the promisor pack was that
> the client would _not_ keep a list of all of the objects it didn't have.
> But I wonder if it would make sense to keep a cache of these "cut
> points" in the partial clone. That's potentially smaller than the
> complete set of objects (especially for tree-based partial cloning), and
> it seems clear we're willing to store it in memory anyway.

Well, before the current design was implemented, I had a design that had
such a list of missing objects. :-) I couldn't find a writeup, but here
is some preliminary code [1]. In that code, as far as I can tell, the
server gives us the list directly during fetch and the client merges it
with a repository-wide file called $GIT_DIR/objects/promisedblob, but we
don't have to follow the design (we could lazily generate the file, have
per-packfile promisedblob files, etc.).

[1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/

> And if we do that, would the .promisor file for a pack be a good place
> to store it?

After looking at [1], it might be better in another place. If we want to
preserve fast fetches, we still need another file to indicate that the
pack is a promisor, so ".promisor" seems good for that. The presence or
absence of the cutoff points is a separate issue and could go into a
separate file, and it might be worth putting all cutoff points into a
single per-repository file too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-09-05 18:39   ` Jonathan Tan
@ 2019-10-02 16:03     ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2019-10-02 16:03 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Sep 05, 2019 at 11:39:26AM -0700, Jonathan Tan wrote:

> > I'm not really opposed to what you're doing here, but I did recently
> > think of another possible use for .promisor files. So it seems like a
> > good time to bring it up, since presumably we'd have to choose one or
> > the other.
> 
> Thanks for bringing it up - yes, we should discuss this.

Sorry for starting a discussion and then abandoning it. :)

> > I know one of the original design features of the promisor pack was that
> > the client would _not_ keep a list of all of the objects it didn't have.
> > But I wonder if it would make sense to keep a cache of these "cut
> > points" in the partial clone. That's potentially smaller than the
> > complete set of objects (especially for tree-based partial cloning), and
> > it seems clear we're willing to store it in memory anyway.
> 
> Well, before the current design was implemented, I had a design that had
> such a list of missing objects. :-) I couldn't find a writeup, but here
> is some preliminary code [1]. In that code, as far as I can tell, the
> server gives us the list directly during fetch and the client merges it
> with a repository-wide file called $GIT_DIR/objects/promisedblob, but we
> don't have to follow the design (we could lazily generate the file, have
> per-packfile promisedblob files, etc.).
> 
> [1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/

This was also a feature of my very old "external odb" patches. In fact,
there the server told the client the type and size, which let us easily
know that many objects weren't worth fetching (e.g., a large blob would
be marked as binary and skipped for diffing, without the diff code even
having to care about "is it a promisor object that we don't have?").

For just omitting some blobs, I think that carrying extra information
(either just the set of blobs, or even some basic meta-information) is
valuable. But for "narrow" or "cone" clones, the current system of
implied promisors is pretty nice, because then the client effort
literally does (or could) scale with the size of their cone, independent
of the number of objects outside their cone.

> > And if we do that, would the .promisor file for a pack be a good place
> > to store it?
> 
> After looking at [1], it might be better in another place. If we want to
> preserve fast fetches, we still need another file to indicate that the
> pack is a promisor, so ".promisor" seems good for that. The presence or
> absence of the cutoff points is a separate issue and could go into a
> separate file, and it might be worth putting all cutoff points into a
> single per-repository file too.

OK, that makes sense. Thanks for giving it some thought.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-08-26 21:47 [PATCH] fetch-pack: write fetched refs to .promisor Jonathan Tan
  2019-08-27 20:27 ` Junio C Hamano
  2019-09-05  7:01 ` Jeff King
@ 2019-10-14 22:27 ` Josh Steadmon
  2019-10-14 23:56   ` Jonathan Tan
  2019-10-15  0:12 ` [PATCH v2] " Jonathan Tan
  3 siblings, 1 reply; 12+ messages in thread
From: Josh Steadmon @ 2019-10-14 22:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

I have a few questions below, but they're probably due to lack of a full
understanding on my part of how packfiles are managed.

On 2019.08.26 14:47, Jonathan Tan wrote:
> The specification of promisor packfiles (in partial-clone.txt) states
> that the .promisor files that accompany packfiles do not matter (just
> like .keep files), so whenever a packfile is fetched from the promisor
> remote, Git has been writing empty .promisor files. But these files
> could contain more useful information.
> 
> So instead of writing empty files, write the refs fetched to these
> files. This makes it easier to debug issues with partial clones, as we
> can identify what refs (and their associated hashes) were fetched at the
> time the packfile was downloaded, and if necessary, compare those hashes
> against what the promisor remote reports now.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> As written in the NEEDSWORK comment, repack does not preserve the
> contents of .promisor files, but I thought I'd send this out anyway as
> this change is already useful for users who don't run repack much.
> ---
>  builtin/repack.c         |  5 +++++
>  fetch-pack.c             | 41 ++++++++++++++++++++++++++++++++++++----
>  t/t5616-partial-clone.sh |  8 ++++++++
>  3 files changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 632c0c0a79..8c1621d414 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -232,6 +232,11 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
>  		/*
>  		 * pack-objects creates the .pack and .idx files, but not the
>  		 * .promisor file. Create the .promisor file, which is empty.
> +		 *
> +		 * NEEDSWORK: fetch-pack generates non-empty .promisor files,
> +		 * but this would not preserve their contents. Maybe
> +		 * concatenate the contents of all .promisor files instead of
> +		 * just creating a new empty file.
>  		 */
>  		promisor_name = mkpathdup("%s-%s.promisor", packtmp,
>  					  line.buf);

Since this is just diagnostic information, it seems fine. Maybe
explicitly note in the comment what information is being lost?

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 65be043f2a..07029e1bbf 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -758,8 +758,33 @@ static int sideband_demux(int in, int out, void *data)
>  	return ret;
>  }
>  
> +static void write_promisor_file(const char *keep_name,
> +				struct ref **sought, int nr_sought)
> +{
> +	struct strbuf promisor_name = STRBUF_INIT;
> +	int suffix_stripped;
> +	FILE *output;
> +	int i;
> +
> +	strbuf_addstr(&promisor_name, keep_name);
> +	suffix_stripped = strbuf_strip_suffix(&promisor_name, ".keep");
> +	if (!suffix_stripped)
> +		BUG("name of pack lockfile should end with .keep (was '%s')",
> +		    keep_name);
> +	strbuf_addstr(&promisor_name, ".promisor");
> +
> +	output = xfopen(promisor_name.buf, "w");
> +	for (i = 0; i < nr_sought; i++)
> +		fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
> +			sought[i]->name);
> +	fclose(output);
> +
> +	strbuf_release(&promisor_name);
> +}
> +

I am not sure why we want to tie creating the .promisor to creating the
lockfile. I'll keep reading and see if it becomes clear later. Other
than that, the logic here seems clear.

>  static int get_pack(struct fetch_pack_args *args,
> -		    int xd[2], char **pack_lockfile)
> +		    int xd[2], char **pack_lockfile,
> +		    struct ref **sought, int nr_sought)
>  {
>  	struct async demux;
>  	int do_keep = args->keep_pack;
> @@ -821,7 +846,13 @@ static int get_pack(struct fetch_pack_args *args,
>  		}
>  		if (args->check_self_contained_and_connected)
>  			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
> -		if (args->from_promisor)
> +		/*
> +		 * If we're obtaining the filename of a lockfile, we'll use
> +		 * that filename to write a .promisor file with more
> +		 * information below. If not, we need index-pack to do it for
> +		 * us.
> +		 */
> +		if (!(do_keep && pack_lockfile) && args->from_promisor)
>  			argv_array_push(&cmd.args, "--promisor");
>  	}
>  	else {

This makes me wonder why we don't also change index-pack to write a
similar message to the .promisor. I guess there's potentially too much
information to shove all the refs on the command-line?

> @@ -859,6 +890,8 @@ static int get_pack(struct fetch_pack_args *args,
>  		die(_("fetch-pack: unable to fork off %s"), cmd_name);
>  	if (do_keep && pack_lockfile) {
>  		*pack_lockfile = index_pack_lockfile(cmd.out);
> +		if (args->from_promisor)
> +			write_promisor_file(*pack_lockfile, sought, nr_sought);
>  		close(cmd.out);
>  	}
>  

Apart from using the lockfile name as the base for the .promisor
filename, I'm still not seeing why we need to tie this to the fact that
we're creating a lockfile. Could we instead just unconditionally create
the .promisor when args->from_promisor is set, and then remove the logic
in the previous chunk that adds the "--promisor" flag to the index-pack
call?

> @@ -1009,7 +1042,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  		alternate_shallow_file = setup_temporary_shallow(si->shallow);
>  	else
>  		alternate_shallow_file = NULL;
> -	if (get_pack(args, fd, pack_lockfile))
> +	if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
>  		die(_("git fetch-pack: fetch failed."));
>  
>   all_done:
> @@ -1458,7 +1491,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  
>  			/* get the pack */
>  			process_section_header(&reader, "packfile", 0);
> -			if (get_pack(args, fd, pack_lockfile))
> +			if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
>  				die(_("git fetch-pack: fetch failed."));
>  
>  			state = FETCH_DONE;
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 565254558f..486db27ee0 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -46,6 +46,14 @@ test_expect_success 'do partial clone 1' '
>  	test "$(git -C pc1 config --local core.partialclonefilter)" = "blob:none"
>  '
>  
> +test_expect_success 'verify that .promisor file contains refs fetched' '
> +	ls pc1/.git/objects/pack/pack-*.promisor >promisorlist &&
> +	test_line_count = 1 promisorlist &&
> +	git -C srv.bare rev-list HEAD >headhash &&
> +	grep "$(cat headhash) HEAD" $(cat promisorlist) &&
> +	grep "$(cat headhash) refs/heads/master" $(cat promisorlist)
> +'
> +
>  # checkout master to force dynamic object fetch of blobs at HEAD.
>  test_expect_success 'verify checkout with dynamic object fetch' '
>  	git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
> -- 
> 2.23.0.187.g17f5b7556c-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch-pack: write fetched refs to .promisor
  2019-10-14 22:27 ` Josh Steadmon
@ 2019-10-14 23:56   ` Jonathan Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2019-10-14 23:56 UTC (permalink / raw)
  To: steadmon; +Cc: jonathantanmy, git

Thanks for your comments. Rearranging them:

> This makes me wonder why we don't also change index-pack to write a
> similar message to the .promisor. I guess there's potentially too much
> information to shove all the refs on the command-line?

index-pack already is capable of writing messages to .promisor using the
"--promisor" argument. You're right that I'm not using that because I
don't want to run into argument length limits.

> I am not sure why we want to tie creating the .promisor to creating the
> lockfile. I'll keep reading and see if it becomes clear later. Other
> than that, the logic here seems clear.

[snip]

> Apart from using the lockfile name as the base for the .promisor
> filename, I'm still not seeing why we need to tie this to the fact that
> we're creating a lockfile. Could we instead just unconditionally create
> the .promisor when args->from_promisor is set, and then remove the logic
> in the previous chunk that adds the "--promisor" flag to the index-pack
> call?

I'm tying the promisor to the lockfile to avoid overcomplicating things:
fetch-pack currently reads filename information from index-pack only
when there is a lockfile. (It could do so even when there is no
lockfile, but it currently does not.) We need this filename to know what
to call the ".promisor" file.

And the situation that I'm interested in - when the user fetches with
"git fetch" from http or ssh - always uses a lockfile (see
fetch_refs_via_pack() in transport.c). So I'm writing additional data in
this case, and falling back on the "--promisor" flag otherwise. I'll
elaborate in the commit message.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] fetch-pack: write fetched refs to .promisor
  2019-08-26 21:47 [PATCH] fetch-pack: write fetched refs to .promisor Jonathan Tan
                   ` (2 preceding siblings ...)
  2019-10-14 22:27 ` Josh Steadmon
@ 2019-10-15  0:12 ` Jonathan Tan
  2019-10-15 19:40   ` Josh Steadmon
  3 siblings, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2019-10-15  0:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, steadmon

The specification of promisor packfiles (in partial-clone.txt) states
that the .promisor files that accompany packfiles do not matter (just
like .keep files), so whenever a packfile is fetched from the promisor
remote, Git has been writing empty .promisor files. But these files
could contain more useful information.

So instead of writing empty files, write the refs fetched to these
files. This makes it easier to debug issues with partial clones, as we
can identify what refs (and their associated hashes) were fetched at the
time the packfile was downloaded, and if necessary, compare those hashes
against what the promisor remote reports now.

This is implemented by teaching fetch-pack to write its own non-empty
.promisor file whenever it knows the name of the pack's lockfile. This
covers the case wherein the user runs "git fetch" with an internal
protocol or HTTP protocol v2 (fetch_refs_via_pack() in transport.c sets
lock_pack) and with HTTP protocol v0/v1 (fetch_git() in remote-curl.c
passes "--lock-pack" to "fetch-pack").

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1:
 - commit message explains scope of change ("it knows the name of the
   pack's lockfile)
 - explained what .promisor contains in comment in builtin/repack.c
 - moved .promisor writing until after we know that index-pack succeeded
---
 builtin/repack.c         |  7 ++++++
 fetch-pack.c             | 47 ++++++++++++++++++++++++++++++++++++----
 t/t5616-partial-clone.sh |  8 +++++++
 3 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 094c2f8ea4..78b23d7a9a 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -233,6 +233,13 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 		/*
 		 * pack-objects creates the .pack and .idx files, but not the
 		 * .promisor file. Create the .promisor file, which is empty.
+		 *
+		 * NEEDSWORK: fetch-pack sometimes generates non-empty
+		 * .promisor files containing the ref names and associated
+		 * hashes at the point of generation of the corresponding
+		 * packfile, but this would not preserve their contents. Maybe
+		 * concatenate the contents of all .promisor files instead of
+		 * just creating a new empty file.
 		 */
 		promisor_name = mkpathdup("%s-%s.promisor", packtmp,
 					  line.buf);
diff --git a/fetch-pack.c b/fetch-pack.c
index 947da545de..b9e63b52ff 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -754,8 +754,33 @@ static int sideband_demux(int in, int out, void *data)
 	return ret;
 }
 
+static void write_promisor_file(const char *keep_name,
+				struct ref **sought, int nr_sought)
+{
+	struct strbuf promisor_name = STRBUF_INIT;
+	int suffix_stripped;
+	FILE *output;
+	int i;
+
+	strbuf_addstr(&promisor_name, keep_name);
+	suffix_stripped = strbuf_strip_suffix(&promisor_name, ".keep");
+	if (!suffix_stripped)
+		BUG("name of pack lockfile should end with .keep (was '%s')",
+		    keep_name);
+	strbuf_addstr(&promisor_name, ".promisor");
+
+	output = xfopen(promisor_name.buf, "w");
+	for (i = 0; i < nr_sought; i++)
+		fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
+			sought[i]->name);
+	fclose(output);
+
+	strbuf_release(&promisor_name);
+}
+
 static int get_pack(struct fetch_pack_args *args,
-		    int xd[2], char **pack_lockfile)
+		    int xd[2], char **pack_lockfile,
+		    struct ref **sought, int nr_sought)
 {
 	struct async demux;
 	int do_keep = args->keep_pack;
@@ -817,7 +842,13 @@ static int get_pack(struct fetch_pack_args *args,
 		}
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
-		if (args->from_promisor)
+		/*
+		 * If we're obtaining the filename of a lockfile, we'll use
+		 * that filename to write a .promisor file with more
+		 * information below. If not, we need index-pack to do it for
+		 * us.
+		 */
+		if (!(do_keep && pack_lockfile) && args->from_promisor)
 			argv_array_push(&cmd.args, "--promisor");
 	}
 	else {
@@ -871,6 +902,14 @@ static int get_pack(struct fetch_pack_args *args,
 		die(_("%s failed"), cmd_name);
 	if (use_sideband && finish_async(&demux))
 		die(_("error in sideband demultiplexer"));
+
+	/*
+	 * Now that index-pack has succeeded, write the promisor file using the
+	 * obtained .keep filename if necessary
+	 */
+	if (do_keep && pack_lockfile && args->from_promisor)
+		write_promisor_file(*pack_lockfile, sought, nr_sought);
+
 	return 0;
 }
 
@@ -1006,7 +1045,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfile))
+	if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
 		die(_("git fetch-pack: fetch failed."));
 
  all_done:
@@ -1453,7 +1492,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 			/* get the pack */
 			process_section_header(&reader, "packfile", 0);
-			if (get_pack(args, fd, pack_lockfile))
+			if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
 
 			state = FETCH_DONE;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 79f7b65f8c..eaa33a852b 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -46,6 +46,14 @@ test_expect_success 'do partial clone 1' '
 	test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
 '
 
+test_expect_success 'verify that .promisor file contains refs fetched' '
+	ls pc1/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	git -C srv.bare rev-list HEAD >headhash &&
+	grep "$(cat headhash) HEAD" $(cat promisorlist) &&
+	grep "$(cat headhash) refs/heads/master" $(cat promisorlist)
+'
+
 # checkout master to force dynamic object fetch of blobs at HEAD.
 test_expect_success 'verify checkout with dynamic object fetch' '
 	git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] fetch-pack: write fetched refs to .promisor
  2019-10-15  0:12 ` [PATCH v2] " Jonathan Tan
@ 2019-10-15 19:40   ` Josh Steadmon
  0 siblings, 0 replies; 12+ messages in thread
From: Josh Steadmon @ 2019-10-15 19:40 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2019.10.14 17:12, Jonathan Tan wrote:
> The specification of promisor packfiles (in partial-clone.txt) states
> that the .promisor files that accompany packfiles do not matter (just
> like .keep files), so whenever a packfile is fetched from the promisor
> remote, Git has been writing empty .promisor files. But these files
> could contain more useful information.
> 
> So instead of writing empty files, write the refs fetched to these
> files. This makes it easier to debug issues with partial clones, as we
> can identify what refs (and their associated hashes) were fetched at the
> time the packfile was downloaded, and if necessary, compare those hashes
> against what the promisor remote reports now.
> 
> This is implemented by teaching fetch-pack to write its own non-empty
> .promisor file whenever it knows the name of the pack's lockfile. This
> covers the case wherein the user runs "git fetch" with an internal
> protocol or HTTP protocol v2 (fetch_refs_via_pack() in transport.c sets
> lock_pack) and with HTTP protocol v0/v1 (fetch_git() in remote-curl.c
> passes "--lock-pack" to "fetch-pack").
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Changes from v1:
>  - commit message explains scope of change ("it knows the name of the
>    pack's lockfile)
>  - explained what .promisor contains in comment in builtin/repack.c
>  - moved .promisor writing until after we know that index-pack succeeded
> ---

Thanks, this looks good to me.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-10-15 19:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26 21:47 [PATCH] fetch-pack: write fetched refs to .promisor Jonathan Tan
2019-08-27 20:27 ` Junio C Hamano
2019-08-27 21:50   ` Jonathan Tan
2019-09-05  7:01 ` Jeff King
2019-09-05 17:13   ` Junio C Hamano
2019-09-05 17:59     ` Jeff King
2019-09-05 18:39   ` Jonathan Tan
2019-10-02 16:03     ` Jeff King
2019-10-14 22:27 ` Josh Steadmon
2019-10-14 23:56   ` Jonathan Tan
2019-10-15  0:12 ` [PATCH v2] " Jonathan Tan
2019-10-15 19:40   ` Josh Steadmon

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).