git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: Tao Klerks <tao@klerks.biz>
Cc: git <git@vger.kernel.org>
Subject: Re: Auto packing the repository - foreground or background in Windows?
Date: Fri, 9 Dec 2022 10:11:56 -0500
Message-ID: <1e8386ed-6acb-0deb-2e46-2e9dbd6e4ad5@github.com>
In-Reply-To: <CAPMMpoii52KrR2MBpPdSEH8-jc7uMPzi4DH5g2bchwd-RPNTJw@mail.gmail.com>

On 12/8/2022 9:52 AM, Tao Klerks wrote:
> On Tue, Dec 6, 2022 at 7:03 PM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> Instead, the modern recommendation for repositories where "git gc --auto"
>> would be slow is to run "git maintenance start" which will schedule
>> background maintenance jobs with the Windows scheduler. Those processes
>> are built to do updates that are non-invasive to concurrent foreground
>> processes. It also sets config to avoid "git gc --auto" commands at the
>> end of foreground Git processes.
>>
>> See [1] for more details.
>>
>> [1] https://git-scm.com/docs/git-maintenance
>>
> 
> Thanks Stolee, I've known about the existence of this system for a
> while, but I can't quite figure out what's recommended for who, when,
> given the doc at https://git-scm.com/docs/git-maintenance

Thanks for the feedback that this document could use a clearer
high-level description for recommended ways to use the command, and
_when_.

One goal when creating the documentation was to _not_ recommend a
specific use pattern, instead focusing on the many ways a user could
customize their maintenance patterns. Perhaps the feature has
stabilized enough (and shown its benefits) that we could add a
recommended use section.
 
> Clearly on Windows, one reason to do "git maintenance start" is to
> avoid foregrounded "git gc --auto" runs later. That's a clear enough
> benefit to say "frequent users of large repos on windows *should* run
> 'git maintenance start' (or have some setup process or GUI do it for
> them) on those large repos".
> 
> Is there a corresponding tangible benefit on MacOS and/or Linux, over
> simply letting "git gc --auto" do its backgrounded thing when it feels
> like it? Or is there an eventual plan to *switch* from the current
> "git gc --auto" spawning to a "git maintenance start" execution when
> trigger conditions are met? Are there any *dis*advantages to running
> "git maintenance start" in general or on any given platform?

For large repositories, the default 'git gc --auto' takes a lot of
resources to rewrite all object data into a single pack-file. The
background maintenance does smaller, incremental repacks. Here,
"large" means "more than 2GB of packed object data", since that's
the default limit for the incremental repacks starting a new pack.
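
To compare a repository against that threshold, one option is to read
the "size-pack" field of 'git count-objects -v', which reports packed
object data in KiB. A minimal sketch, assuming "2GB" is taken to mean
2 GiB; is_large_repo is a hypothetical helper name, not a Git command:

```shell
# Hypothetical helper: reads `git count-objects -v` output on stdin and
# succeeds when size-pack (reported in KiB) exceeds 2 GiB.
# Assumption: the "2GB" threshold is treated as 2 * 1024 * 1024 KiB.
is_large_repo() {
    awk -F': ' '
        $1 == "size-pack" { kib = $2 + 0 }
        END { exit !(kib > 2 * 1024 * 1024) }
    '
}

# Usage, guarded so it only runs inside a repository:
if git rev-parse --git-dir >/dev/null 2>&1; then
    if git count-objects -v | is_large_repo; then
        echo "more than 2GB packed; background maintenance may help"
    fi
fi
```

Reading the count from stdin keeps the helper independent of the
current directory, so it can also be fed saved output from another
machine.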

There are other benefits, such as hourly prefetches that download
object data from remotes before the user requests a ref update
through 'git fetch' or 'git pull'. Those foreground operations
speed up as a result.
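
As a rough illustration of the above, the following sketch enables
background maintenance for one repository, checks the config that
disables foreground 'git gc --auto', and runs the prefetch and
incremental-repack tasks once by hand. The git subcommands are real;
the run helper, the DRY_RUN variable, and the function name are
assumptions made for this example only:

```shell
# run: small helper for this sketch (not part of Git). With DRY_RUN=1 it
# prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around real git subcommands.
start_background_maintenance() {
    repo=$1
    # Register the repository and schedule the background jobs.
    run git -C "$repo" maintenance start
    # Expected to print "false" once foreground maintenance is disabled.
    run git -C "$repo" config --get maintenance.auto
    # Run the prefetch and incremental-repack tasks once, on demand.
    run git -C "$repo" maintenance run --task=prefetch --task=incremental-repack
}
```

Calling "DRY_RUN=1 start_background_maintenance /path/to/repo" previews
the commands without touching the scheduler; 'git maintenance stop'
removes the scheduled jobs again.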

> For "my users", I have something like Scalar that can start the
> maintenance on the repo where it's needed - but it seems like there
> will be lots of users out there in the world who clone things like the
> linux repo, which looks like it is big enough to warrant these kinds
> of concerns, but it doesn't seem obvious that anyone will ever find
> "https://git-scm.com/docs/git-maintenance" and decide to run "git
> maintenance start" on their own...

We do what we can to advertise these kinds of features, but at some
point users need to self-discover things. But that's also a motivation
for the Scalar command: the user can relax some control to allow the
Scalar command to choose those recommended settings on behalf of the
user.

> As I noted in another email, I propose to replace "Auto packing the
> repository for optimum performance" with something like "Auto packing
> the repository for optimum performance; to run this kind of
> maintenance in the background, see 'git maintenance' at
> https://git-scm.com/docs/git-maintenance." - but I imagine I'm missing
> a bigger picture / a long-term plan for how these two mechanisms
> should interact.

A message that points out 'git maintenance' like this might work best
as part of the "advice" API, so those who don't want to see the
message every time could disable it.

> My apologies if I've missed one or many conversations about this on
> the list, but maybe a pointer here can also help me add directional
> hints at https://git-scm.com/docs/git-maintenance for "outside users"?

I'm trying to think of a builtin whose documentation has such strong
"recommended use" language.

The best I could think of are commands with substantial "examples"
sections, such as 'git bundle'.

A more radical approach would be to create a new doc type that
provides recommendations for how to manage large repositories. I
imagine it would be sorted in order of increasing complexity,
something like:

 1. Use 'scalar' and see if it works for your needs.

 2. Self-serve with 'git maintenance start', 'git sparse-checkout',
    partial clone, and feature.manyFiles=true as needed.

 3. Go deep on individual plumbing commands and config options
    that provide knobs to tweak how Git manages information.
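
The first two tiers of that list could look roughly like the sketch
below. The git and scalar subcommands are real, but the function
names, the run helper, the DRY_RUN variable, and the choice of a
blob:none filter are assumptions for illustration rather than
recommended settings:

```shell
# run: small helper for this sketch (not part of Git). With DRY_RUN=1 it
# prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Tier 1: let Scalar choose the settings on the user's behalf.
tier1_scalar_clone() {
    run scalar clone "$1"
}

# Tier 2: self-serve. Arguments: <url> <directory> <subdirectory to check out>
tier2_self_serve() {
    url=$1
    dir=$2
    subdir=$3
    # Partial clone: skip blobs until needed; start with a sparse checkout.
    run git clone --filter=blob:none --sparse "$url" "$dir"
    # Limit the working tree to the directory of interest.
    run git -C "$dir" sparse-checkout set "$subdir"
    # Opt in to the settings grouped under feature.manyFiles.
    run git -C "$dir" config feature.manyFiles true
    # Schedule background maintenance for this repository.
    run git -C "$dir" maintenance start
}
```

With DRY_RUN=1 set, either function only prints the commands it would
run, which makes it easy to review the plan before applying it.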

I think starting with some examples or a "recommended use" section
for 'git maintenance' would be a better first step.

Thanks,
-Stolee
