From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Christian Couder <christian.couder@gmail.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: [PATCH 3/5] parallel-checkout: add configuration options
Date: Fri, 2 Apr 2021 11:45:47 -0300
Message-ID: <CAHd-oW4jhoUz=4XZYC0HZPFTziyC1fw3DKQuR5rzSmmBgxiCCw@mail.gmail.com>
In-Reply-To: <CAP8UFD3s3NUpi2eyPWFa5bL4rez1wNtj5-dUpv8dJLo2CKYu9A@mail.gmail.com>
On Wed, Mar 31, 2021 at 1:33 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > The above benchmarks show that parallel checkout is most effective on
> > repositories located on an SSD or over a distributed file system. For
> > local file systems on spinning disks, and/or older machines, the
> > parallelism does not always bring a good performance. For this reason,
> > the default value for checkout.workers is one, a.k.a. sequential
> > checkout.
>
> I wonder how many people are still using HDD, and how many will still
> use them in a few years.
>
> I think having 1 as the default value for checkout.workers might be
> good for a while for backward compatibility and stability, while
> people who are interested can test parallel checkout on different
> environments. But we might want, in a few releases, after some bugs,
> if any, have been fixed, to use a default, maybe 10, that will provide
> significant speedup for most people, and will justify the added
> complexity, especially as your numbers still show a small speedup for
> HDD when using 10.

Yeah, I agree that it would be nice to have a better default in the
near future. Unfortunately, on some other HDD machines that I tested,
parallel checkout was slower than the sequential version. So I don't
think we can enable parallelism by default yet.

Nevertheless, I was also experimenting with some ideas to auto-detect
whether parallelism is efficient in a given environment/repo and
auto-enable it if so. One interesting possibility suggested by Ævar
[1] was to implement this as a git maintenance task, which could
periodically probe the system and tune the checkout settings
appropriately.

[1]: https://lore.kernel.org/git/87y2ixpvos.fsf@evledraar.gmail.com/

> > @@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
> > return parallel_checkout.status;
> > }
> >
> > +#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
>
> Using a "static const int" might be a bit better.
Ok, I will change that.
> > +void get_parallel_checkout_configs(int *num_workers, int *threshold)
> > +{
> > + if (git_config_get_int("checkout.workers", num_workers))
> > + *num_workers = 1;
>
> I think it's better when an important default value like this "1" is
> made more visible using a "static const int" or a "#define".
Will do.
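
For illustration only, here is a minimal standalone sketch of what named
defaults could look like. It is not the actual follow-up patch: the
`config_entry` table and `lookup_int()` are hypothetical stand-ins for
git's config machinery (they only mimic the "non-zero means not found"
convention visible in the hunk above), the constant names are made up,
and the threshold key name does not appear in this message and is
assumed here.

```c
#include <stddef.h>
#include <string.h>

/* Named defaults instead of bare literals (names are illustrative). */
static const int default_num_workers = 1;
static const int default_threshold_for_parallelism = 100;

/* Hypothetical config store: an array terminated by a NULL key. */
struct config_entry {
	const char *key;
	int value;
};

/*
 * Hypothetical stand-in for git_config_get_int(): returns 0 and fills
 * *dest when the key is found, non-zero otherwise.
 */
static int lookup_int(const struct config_entry *cfg, const char *key,
		      int *dest)
{
	for (; cfg && cfg->key; cfg++) {
		if (!strcmp(cfg->key, key)) {
			*dest = cfg->value;
			return 0;
		}
	}
	return 1;
}

/* Fall back to the named defaults when a key is not configured. */
static void get_configs(const struct config_entry *cfg,
			int *num_workers, int *threshold)
{
	if (lookup_int(cfg, "checkout.workers", num_workers))
		*num_workers = default_num_workers;
	/* Key name assumed for this sketch. */
	if (lookup_int(cfg, "checkout.thresholdForParallelism", threshold))
		*threshold = default_threshold_for_parallelism;
}
```

With an empty table both values fall back to the defaults (1 and 100);
with only checkout.workers set, the threshold still gets its default.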
> > @@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
> > if (parallel_checkout.nr < num_workers)
> > num_workers = parallel_checkout.nr;
> >
> > - if (num_workers <= 1) {
> > + if (num_workers <= 1 || parallel_checkout.nr < threshold) {
>
> Here we check the number of workers...
>
> > write_items_sequentially(state);
> > } else {
> > struct pc_worker *workers = setup_workers(state, num_workers);
>
> [...]
>
> > @@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
> > }
> > }
> > stop_progress(&progress);
> > - errs |= run_parallel_checkout(&state);
> > + if (pc_workers > 1)
> > + errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
> ...but the number of workers was already checked here.

The re-check in run_parallel_checkout() is important because
`num_workers` might actually become <= 1 after the above check in
check_updates(). This happens when there aren't enough enqueued
entries for 2+ workers, so we fall back to sequential checkout. It
also works as a safety mechanism for the case where the caller of
run_parallel_checkout() forgot to check whether the user actually
wants parallelism before calling the function.
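
To make the point concrete, here is a small standalone sketch of the
decision as I described it above. It only models the clamp-then-check
logic from the quoted hunk; `pick_mode()` and the enum are hypothetical
names for this illustration, and `nr_entries` stands in for
`parallel_checkout.nr`.

```c
enum checkout_mode { CHECKOUT_SEQUENTIAL, CHECKOUT_PARALLEL };

/*
 * Decide how to write the enqueued entries. Even if the caller already
 * saw num_workers > 1, the clamp below can bring it down to <= 1, so
 * the check has to be repeated here.
 */
static enum checkout_mode pick_mode(int nr_entries, int num_workers,
				    int threshold)
{
	/* Never use more workers than there are entries to write. */
	if (nr_entries < num_workers)
		num_workers = nr_entries;

	/* Too few workers or too few entries: fall back to sequential. */
	if (num_workers <= 1 || nr_entries < threshold)
		return CHECKOUT_SEQUENTIAL;

	return CHECKOUT_PARALLEL;
}
```

For example, a caller that configured 8 workers but enqueued a single
entry gets the sequential path, as does a caller that passes
num_workers == 1 without checking first.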