git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Son Luong Ngoc <sluongng@gmail.com>, gitgitgadget@gmail.com
Cc: dstolee@microsoft.com, git@vger.kernel.org, jrnieder@google.com,
	peff@peff.net
Subject: Re: [PATCH 05/15] run-job: implement pack-files job
Date: Mon, 4 May 2020 10:34:17 -0400	[thread overview]
Message-ID: <db35867f-bb33-1155-170f-30a895948733@gmail.com> (raw)
In-Reply-To: <CAL3xRKfcKh=XxGso4DTRfJMAVMtwqyQkO0VUhqVuZUr_aQ5f+A@mail.gmail.com>

On 5/2/2020 3:56 AM, Son Luong Ngoc wrote:
> Hi Derrick,
> 
> Sorry for another late reply on this RFC.
> This time is a question on the multi-pack-index repack process.
> 
> Why expire *before* repack and not after?
> 
> I thought `core.multiPackIndex=true` would prevent old pack files from
> being used thus expiring immediately after repack is safe? (on that
> note, are users
> required to set this config prior to running the job?)
> 
> If expiring immediately after repack()+verify() is not safe, then should
> we have a minimum allowed interval set? (although I would preferred to
> make expire() safe)

Suppose we run "repack" and then "expire".

Suppose that there is a concurrent Git process that has a read handle
on the multi-pack-index from before the repack. This multi-pack-index
has a pack-file that was repacked by the "repack" command, and hence
was expired and deleted by the "expire" command (because all of its
objects are in the _new_ pack). However, when the Git process with the
old multi-pack-index reads an object pointing to that pack-file, it
finds that the pack does not exist when it tries to load it.

Git is usually pretty resilient to these kinds of things, but it seemed
prudent to be extra careful here. By spacing out these jobs across a
time where we don't expect concurrent Git processes to hold a read handle
on that old multi-pack-index (say, 24 hours) then this method is safe.

In fact, Scalar ensures that no concurrent Git process is running at the
start of the job [1], which means that no Git processes are _still_ running
since the last job (but this does not stop concurrent processes from starting
after that point and before the 'expire' command).

[1] https://github.com/microsoft/scalar/blob/489764b815dfea32bec5cd4228f2157766012784/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs#L106-L111

>> +
>> + if (multi_pack_index_verify()) {
>> + warning(_("multi-pack-index verify failed after expire"));
>> + return rewrite_multi_pack_index();
>> + }
>> +
>> + if (multi_pack_index_repack()) {
>> + error(_("multi-pack-index repack failed"));
>> + return 1;
>> + }
> 
> Again, I just think the decision to include verify() inside repack()
> made this part a bit inconsistent.

You're right. It is a bit inconsistent. That should be fixed in the
next version.

Thanks,
-Stolee

  reply	other threads:[~2020-05-04 14:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-02  7:56 [PATCH 05/15] run-job: implement pack-files job Son Luong Ngoc
2020-05-04 14:34 ` Derrick Stolee [this message]
  -- strict thread matches above, loose matches on Subject: below --
2020-04-03 20:47 [PATCH 00/15] [RFC] Maintenance jobs and job runner Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 05/15] run-job: implement pack-files job Derrick Stolee via GitGitGadget
2020-05-27 22:17   ` Josh Steadmon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=db35867f-bb33-1155-170f-30a895948733@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jrnieder@google.com \
    --cc=peff@peff.net \
    --cc=sluongng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).