git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Emily Shaffer <emilyshaffer@google.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, sandals@crustytoothpaste.net,
	steadmon@google.com, jrnieder@gmail.com, peff@peff.net,
	congdanhqx@gmail.com, phillip.wood123@gmail.com,
	sluongng@gmail.com, jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 2/9] maintenance: add prefetch task
Date: Thu, 13 Aug 2020 21:28:33 -0400	[thread overview]
Message-ID: <e96b2eec-0746-1b00-4b8b-ce737cdce967@gmail.com> (raw)
In-Reply-To: <20200812231036.GH2965447@google.com>

On 8/12/2020 7:10 PM, Emily Shaffer wrote:
> On Thu, Aug 06, 2020 at 04:30:17PM +0000, Derrick Stolee via GitGitGadget wrote:
>>
>> 3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
>>    we can ensure that we actually load the new values somewhere in
>>    our refspace while not updating refs/heads or refs/remotes. By
>>    storing these refs here, the commit-graph job will update the
>>    commit-graph with the commits from these hidden refs.
> 
> [emily] How does the content of refs/prefetch/* get used?
> [jrnieder] How does the content of refs/prefetch/* get cleaned up?
> [jonathantan] refs/prefetch/* will get used in negotiation so it is
> useful to keep these.
>>
>> 4. --prune will delete the refs/prefetch/<remote> refs that no
>>    longer appear on the remote.
> [jrnieder] this is what cleans up refs/prefetch/* later :)
> 
>> +static int fetch_remote(const char *remote, struct maintenance_opts *opts)
>> +{
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_pushl(&child.args, "fetch", remote, "--prune", "--no-tags",
>> +		     "--no-write-fetch-head", "--refmap=", NULL);
> [jonathantan] It would be good to pass --recurse-submodules=no.
> [jrnieder] Since we are specifying the refmap here, if we were to
> recurse into submodules, we'd have to be careful about making sure
> refmap gets propagated correctly.

Good point.

>> +static int fill_each_remote(struct remote *remote, void *cbdata)
> [jrnieder] Since this is a callback that happens for each remote, maybe
> this should be named to indicate it only fills one remote at a time
> instead. Nit :)

Sure. append_remote() is better.

>> +{
>> +	struct string_list *remotes = (struct string_list *)cbdata;
>> +
>> +	string_list_append(remotes, remote->name);
>> +	return 0;
>> +}
>> +
>> +static int maintenance_task_prefetch(struct maintenance_opts *opts)
>> +{
>> +	int result = 0;
>> +	struct string_list_item *item;
>> +	struct string_list remotes = STRING_LIST_INIT_DUP;
>> +
>> +	if (for_each_remote(fill_each_remote, &remotes)) {
>> +		error(_("failed to fill remotes"));
>> +		result = 1;
>> +		goto cleanup;
>> +	}
>> +
>> +	for (item = remotes.items;
>> +	     item && item < remotes.items + remotes.nr;
>> +	     item++)
> 
> Is there a reason not to use for_each_string_list_item() instead? This
> would be more brief and also easier to read (less thinking about what
> the loop is doing).

I just keep forgetting about that macro. Thanks.

>> +		result |= fetch_remote(item->string, opts);
>> +
>> +cleanup:
>> +	string_list_clear(&remotes, 0);
>> +	return result;
>> +}
>> +
> 
>>  enum maintenance_task_label {
>> +	TASK_PREFETCH,
> [jrnieder] Nit: Is there a sort order for these? Should we establish an order
> early on (e.g. alphabetical)?

There _is_ an order, and maybe that should be up for debate.
This one is really about fetching new refs before (possibly)
deleting unreachable data or doing a full repack in the 'gc'
task. The rest of the tasks are after the 'gc' task (in an
unimportant order) because they are likely to be no-ops if the
'gc' task does work.

Of course, users can customize this order using the
--task=<task> options.

>>  	TASK_GC,
>>  	TASK_COMMIT_GRAPH,
> 
>> +test_expect_success 'prefetch multiple remotes' '
>> +	git clone . clone1 &&
>> +	git clone . clone2 &&
>> +	git remote add remote1 "file://$(pwd)/clone1" &&
>> +	git remote add remote2 "file://$(pwd)/clone2" &&
>> +	git -C clone1 switch -c one &&
>> +	git -C clone2 switch -c two &&
>> +	test_commit -C clone1 one &&
>> +	test_commit -C clone2 two &&
>> +	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
>> +	fetchargs="--prune --no-tags --no-write-fetch-head --refmap= --quiet" &&
>> +	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&

> [jrnieder] In practice, why were all the \\ needed? Trying to figure out
> where Git is using a shell that would need the * escaped and finding it
> hard to reason about.

It's the 'grep' way down inside test_subcommand that will
fail if it interprets these '*' incorrectly.

>> +	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
>> +	test_path_is_missing .git/refs/remotes &&
>> +	test_cmp clone1/.git/refs/heads/one .git/refs/prefetch/remote1/one &&
>> +	test_cmp clone2/.git/refs/heads/two .git/refs/prefetch/remote2/two &&

> [jrnieder] Should we use test_cmp_rev instead to make compatible with
> packed-refs?

While that would be good, these are refs across two .git directories.

I suppose that I could fetch normally and check refs/remotes/remote1/one
versus refs/prefetch/remote1/one.

>> +	git log prefetch/remote1/one &&
>> +	git log prefetch/remote2/two

> [jonathantan] Why do we use 'git log' to check? I'm a little confused
> about what's going on; if you just want to check that the refs are
> present you could use 'git rev-parse' instead?

The main point is to demonstrate that we have the actual objects
in the current repository, not just refs storing object IDs that
don't actually exist in the object database. Does rev-parse check
that the object exists? (A quick check in my local repository seems
to say no.)

Reorganizing the test to work with test_cmp_rev results in the
following:

test_expect_success 'prefetch multiple remotes' '
	git clone . clone1 &&
	git clone . clone2 &&
	git remote add remote1 "file://$(pwd)/clone1" &&
	git remote add remote2 "file://$(pwd)/clone2" &&
	git -C clone1 switch -c one &&
	git -C clone2 switch -c two &&
	test_commit -C clone1 one &&
	test_commit -C clone2 two &&
	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
	test_path_is_missing .git/refs/remotes &&
	git log prefetch/remote1/one &&
	git log prefetch/remote2/two &&
	git fetch --all &&
	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two
'

Thanks,
-Stolee

  reply	other threads:[~2020-08-14  1:30 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-06 16:30 [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-13  0:03     ` Junio C Hamano
2020-08-13  1:45       ` Jonathan Nieder
2020-08-13  4:37       ` [PATCH v3] " Junio C Hamano
2020-08-14  1:13         ` Derrick Stolee
2020-08-14  1:32           ` Junio C Hamano
2020-08-06 16:30 ` [PATCH 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-14  1:28     ` Derrick Stolee [this message]
2020-08-06 16:30 ` [PATCH 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-14  1:46     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-06 17:02   ` Son Luong Ngoc
2020-08-06 18:13     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-25 18:36   ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-22 23:05       ` Jonathan Tan
2020-08-25 18:36     ` [PATCH v3 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-22 23:09       ` Jonathan Tan
2020-09-24 13:45         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-22 23:15       ` Jonathan Tan
2020-09-24 13:51         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-22 23:16       ` Jonathan Tan
2020-09-24 13:53         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-22 23:26       ` Jonathan Tan
2020-09-24 14:05         ` Derrick Stolee
2020-09-24 22:01           ` Jonathan Tan
2020-08-25 18:36     ` [PATCH v3 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-09-22 23:52       ` Jonathan Tan
2020-08-25 20:59     ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Junio C Hamano
2020-08-26 15:15     ` Son Luong Ngoc
2020-08-26 16:21       ` Derrick Stolee
2020-09-25 12:33     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-25 18:00         ` Junio C Hamano
2020-09-25 18:43           ` Derrick Stolee
2020-09-25 12:33       ` [PATCH v4 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e96b2eec-0746-1b00-4b8b-ce737cdce967@gmail.com \
    --to=stolee@gmail.com \
    --cc=congdanhqx@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sluongng@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).