git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, sandals@crustytoothpaste.net,
	steadmon@google.com, jrnieder@gmail.com, peff@peff.net,
	congdanhqx@gmail.com, phillip.wood123@gmail.com,
	sluongng@gmail.com, jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 3/9] maintenance: add loose-objects task
Date: Wed, 12 Aug 2020 16:10:53 -0700	[thread overview]
Message-ID: <20200812231053.GI2965447@google.com> (raw)
In-Reply-To: <621375a3c99014c48568660bde062b7330d5a662.1596731425.git.gitgitgadget@gmail.com>

On Thu, Aug 06, 2020 at 04:30:18PM +0000, Derrick Stolee via GitGitGadget wrote:

> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index bb0d5eded4..898aff4726 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -80,6 +80,17 @@ gc::
>  	It can also be disruptive in some situations, as it deletes stale
>  	data.
>  
> +loose-objects::
> +	The `loose-objects` job cleans up loose objects and places them into
> +	pack-files. In order to prevent race conditions with concurrent Git
> +	commands, it follows a two-step process. First, it deletes any loose
> +	objects that already exist in a pack-file; concurrent Git processes
> +	will examine the pack-file for the object data instead of the loose
> +	object. Second, it creates a new pack-file (starting with "loose-")
> +	containing a batch of loose objects. The batch size is limited to 50
> +	thousand objects to prevent the job from taking too long on a
> +	repository with many loose objects.

[emily] What's not said is what happens to loose-* packfile. Does this
get repacked and disappear, is it treated differently, etc?
[jonathantan] This is treated the same as any other pack. Is there a
reason to call it loose-*?
[jrnieder] It seems like unreachable objects will get stuck going in and
out of this pack, right? unreachable loose obj -> loose-*.pack ->
repack, unreachable becomes loose -> repeat

> +static int prune_packed(struct maintenance_opts *opts)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	child.git_cmd = 1;
> +	strvec_push(&child.args, "prune-packed");
> +
> +	if (opts->quiet)
> +		strvec_push(&child.args, "--quiet");
> +
> +	return !!run_command(&child);
[emily] Why not report the error code here to the caller? Is there a
path to notify user of errors that require user intervention?
[jrnieder] Some errors might be expected and some might not, so should
we handle them differently?
[emily] Probably makes more sense to revisit this in the far future when
we teach 'git maintenance' to send us emails and flashing lights when
our jobs fail, instead of worrying about it now :)
[jrnieder] Imagine if we got exit 127 because the usage we are using is
wrong - in that case, for example, we would want to BUG()

> +static int write_loose_object_to_stdin(const struct object_id *oid,
> +				       const char *path,
> +				       void *data)
> +{
> +	struct write_loose_object_data *d = (struct write_loose_object_data *)data;
[jrnieder] Since we are enlightened C developers, not C++ developers,
you can skip the explicit cast.

> +++ b/t/t7900-maintenance.sh
> @@ -83,4 +83,43 @@ test_expect_success 'prefetch multiple remotes' '
>  	git log prefetch/remote2/two
>  '
>  
> +test_expect_success 'loose-objects task' '
[jrnieder] Unrelated to this change, this test makes me sad that we
don't have better low-level test helpers to enable this kind of testing.
Makes me wish for something better :)
> +	# Repack everything so we know the state of the object dir
> +	git repack -adk &&
> +
> +	# Hack to stop maintenance from running during "git commit"
> +	echo in use >.git/objects/maintenance.lock &&
> +
> +	# Assuming that "git commit" creates at least one loose object
> +	test_commit create-loose-object &&
> +	rm .git/objects/maintenance.lock &&
> +
> +	ls .git/objects >obj-dir-before &&
> +	test_file_not_empty obj-dir-before &&
> +	ls .git/objects/pack/*.pack >packs-before &&
> +	test_line_count = 1 packs-before &&
> +
> +	# The first run creates a pack-file
> +	# but does not delete loose objects.
> +	git maintenance run --task=loose-objects &&
> +	ls .git/objects >obj-dir-between &&
> +	test_cmp obj-dir-before obj-dir-between &&
> +	ls .git/objects/pack/*.pack >packs-between &&
> +	test_line_count = 2 packs-between &&
> +	ls .git/objects/pack/loose-*.pack >loose-packs &&
> +	test_line_count = 1 loose-packs &&
> +
> +	# The second run deletes loose objects
> +	# but does not create a pack-file.
> +	git maintenance run --task=loose-objects &&
> +	ls .git/objects >obj-dir-after &&
> +	cat >expect <<-\EOF &&
> +	info
> +	pack
> +	EOF
> +	test_cmp expect obj-dir-after &&
> +	ls .git/objects/pack/*.pack >packs-after &&
> +	test_cmp packs-between packs-after
> +'
> +


  reply	other threads:[~2020-08-12 23:11 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-06 16:30 [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-13  0:03     ` Junio C Hamano
2020-08-13  1:45       ` Jonathan Nieder
2020-08-13  4:37       ` [PATCH v3] " Junio C Hamano
2020-08-14  1:13         ` Derrick Stolee
2020-08-14  1:32           ` Junio C Hamano
2020-08-06 16:30 ` [PATCH 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-14  1:28     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer [this message]
2020-08-14  1:46     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-06 17:02   ` Son Luong Ngoc
2020-08-06 18:13     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-25 18:36   ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-22 23:05       ` Jonathan Tan
2020-08-25 18:36     ` [PATCH v3 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-22 23:09       ` Jonathan Tan
2020-09-24 13:45         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-22 23:15       ` Jonathan Tan
2020-09-24 13:51         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-22 23:16       ` Jonathan Tan
2020-09-24 13:53         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-22 23:26       ` Jonathan Tan
2020-09-24 14:05         ` Derrick Stolee
2020-09-24 22:01           ` Jonathan Tan
2020-08-25 18:36     ` [PATCH v3 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-09-22 23:52       ` Jonathan Tan
2020-08-25 20:59     ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Junio C Hamano
2020-08-26 15:15     ` Son Luong Ngoc
2020-08-26 16:21       ` Derrick Stolee
2020-09-25 12:33     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-25 18:00         ` Junio C Hamano
2020-09-25 18:43           ` Derrick Stolee
2020-09-25 12:33       ` [PATCH v4 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200812231053.GI2965447@google.com \
    --to=emilyshaffer@google.com \
    --cc=congdanhqx@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sluongng@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).