git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Christian Couder <christian.couder@gmail.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: [PATCH 3/5] parallel-checkout: add configuration options
Date: Fri, 2 Apr 2021 11:45:47 -0300	[thread overview]
Message-ID: <CAHd-oW4jhoUz=4XZYC0HZPFTziyC1fw3DKQuR5rzSmmBgxiCCw@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD3s3NUpi2eyPWFa5bL4rez1wNtj5-dUpv8dJLo2CKYu9A@mail.gmail.com>

On Wed, Mar 31, 2021 at 1:33 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > The above benchmarks show that parallel checkout is most effective on
> > repositories located on an SSD or over a distributed file system. For
> > local file systems on spinning disks, and/or older machines, the
> > parallelism does not always bring a good performance. For this reason,
> > the default value for checkout.workers is one, a.k.a. sequential
> > checkout.
>
> I wonder how many people are still using HDD, and how many will still
> use them in a few years.
>
> I think having 1 as the default value for checkout.workers might be
> good for a while for backward compatibility and stability, while
> people who are interested can test parallel checkout on different
> environments. But we might want, in a few releases, after some bugs,
> if any, have been fixed, to use a default, maybe 10, that will provide
> significant speedup for most people, and will justify the added
> complexity, especially as your numbers still show a small speedup for
> HDD when using 10.

Yeah, I agree that it would be nice to have a better default in the
near future. Unfortunately, on some other HDD machines that I tested,
parallel checkout was slower than the sequential version. So I think
it may not be possible to enable parallelism by default now.

Nevertheless, I was also experimenting with some ideas to auto-detect
if parallelism is efficient in a given environment/repo and
auto-enable it if so. One interesting possibility suggested by Ævar
[1] was to implement this as a git maintenance task, which could
periodically probe the system and tune the checkout settings
appropriately.

[1]: https://lore.kernel.org/git/87y2ixpvos.fsf@evledraar.gmail.com/

> > @@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
> >         return parallel_checkout.status;
> >  }
> >
> > +#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
>
> Using a "static const int" might be a bit better.

Ok, I will change that.

> > +void get_parallel_checkout_configs(int *num_workers, int *threshold)
> > +{
> > +       if (git_config_get_int("checkout.workers", num_workers))
> > +               *num_workers = 1;
>
> I think it's better when an important default value like this "1" is
> made more visible using a "static const int" or a "#define".

Will do.

> > @@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
> >         if (parallel_checkout.nr < num_workers)
> >                 num_workers = parallel_checkout.nr;
> >
> > -       if (num_workers <= 1) {
> > +       if (num_workers <= 1 || parallel_checkout.nr < threshold) {
>
> Here we check the number of workers...
>
> >                 write_items_sequentially(state);
> >         } else {
> >                 struct pc_worker *workers = setup_workers(state, num_workers);
>
> [...]
>
> > @@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
> >                 }
> >         }
> >         stop_progress(&progress);
> > -       errs |= run_parallel_checkout(&state);
> > +       if (pc_workers > 1)
> > +               errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);

> ...but the number of workers was already checked here.

The re-checking at run_parallel_checkout() is important because
`num_workers` might actually become <= 1 after the above check at
check_updates(). This happens when there aren't enough enqueued
entries for 2+ workers, so we fall back to sequential checkout. It
also kind of works as a safe-mechanism for the case where the
run_parallel_checkout() caller forgot to check if the user actually
wants parallelism before calling the function.

  reply	other threads:[~2021-04-02 14:47 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31  4:22   ` Christian Couder
2021-04-02 14:39     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31  4:32   ` Christian Couder
2021-04-02 14:42     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31  4:33   ` Christian Couder
2021-04-02 14:45     ` Matheus Tavares Bernardino [this message]
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31  5:36   ` Christian Couder
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19  3:24   ` Matheus Tavares
2021-03-19 22:58     ` Junio C Hamano
2021-03-31  5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43   ` Junio C Hamano
2021-04-17 19:57     ` Matheus Tavares Bernardino
2021-04-19  9:41     ` Christian Couder
2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19  9:36       ` Christian Couder
2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHd-oW4jhoUz=4XZYC0HZPFTziyC1fw3DKQuR5rzSmmBgxiCCw@mail.gmail.com' \
    --to=matheus.bernardino@usp.br \
    --cc=christian.couder@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).