git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares <matheus.bernardino@usp.br>
To: gitster@pobox.com
Cc: git@vger.kernel.org, christian.couder@gmail.com, git@jeffhostetler.com
Subject: [PATCH v4 0/5] Parallel Checkout (part 2)
Date: Mon, 19 Apr 2021 16:53:30 -0300	[thread overview]
Message-ID: <cover.1618861380.git.matheus.bernardino@usp.br> (raw)
In-Reply-To: <cover.1618790794.git.matheus.bernardino@usp.br>

This version is almost identical to v3, but the last patch incorporates
the typo fixes and other rewording suggestions Christian made about the
design doc on the last round.

I decided to remove the sentence about step 3 dominating the execution
time as that's not always the case on e.g. a non-local clone or
sparse-checkout.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 270 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1240 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v3:
1:  7096822c14 = 1:  7096822c14 unpack-trees: add basic support for parallel checkout
2:  4526516ea0 = 2:  4526516ea0 parallel-checkout: make it truly parallel
3:  ad165c0637 = 3:  ad165c0637 parallel-checkout: add configuration options
4:  cf9e28dc0e = 4:  cf9e28dc0e parallel-checkout: support progress displaying
5:  415d4114aa ! 5:  fd929f072c parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +* Step 4: Write the new index to disk.
     +
     +Step 3 is the focus of the "parallel checkout" effort described here.
    -+It dominates the execution time for most of the above command types.
     +
     +Sequential Implementation
     +-------------------------
    @@ Documentation/technical/parallel-checkout.txt (new)
     +It wouldn't be safe to perform Step 3b in parallel, as there could be
     +race conditions between file creations and removals. Instead, the
     +parallel checkout framework lets the sequential code handle Step 3b,
    -+and use parallel workers to replace the sequential
    ++and uses parallel workers to replace the sequential
     +`entry.c:write_entry()` calls from Step 3c.
     +
     +Rejected Multi-Threaded Solution
    @@ Documentation/technical/parallel-checkout.txt (new)
     +warning for the user, like the classic sequential checkout does.
     +
     +The workers are able to detect both collisions among the entries being
    -+concurrently written and collisions among parallel-eligible and
    -+ineligible entries. The general idea for collision detection is quite
    -+straightforward: for each parallel-eligible entry, the main process must
    -+remove all files that prevent this entry from being written (before
    -+enqueueing it). This includes any non-directory file in the leading path
    -+of the entry. Later, when a worker gets assigned the entry, it looks
    -+again for the non-directories files and for an already existing file at
    -+the entry's path. If any of these checks finds something, the worker
    -+knows that there was a path collision.
    ++concurrently written and collisions between a parallel-eligible entry
    ++and an ineligible entry. The general idea for collision detection is
    ++quite straightforward: for each parallel-eligible entry, the main
    ++process must remove all files that prevent this entry from being written
    ++(before enqueueing it). This includes any non-directory file in the
    ++leading path of the entry. Later, when a worker gets assigned the entry,
    ++it looks again for the non-directories files and for an already existing
    ++file at the entry's path. If any of these checks finds something, the
    ++worker knows that there was a path collision.
     +
     +Because parallel checkout can distinguish path collisions from the case
     +where the file was already present in the working tree before checkout,
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Besides, long-running filters may use the delayed checkout feature to
     +postpone the return of some filtered blobs. The delayed checkout queue
     +and the parallel checkout queue are not compatible and should remain
    -+separated.
    ++separate.
     ++
     +Note: regular files that only require internal filters, like end-of-line
     +conversion and re-encoding, are eligible for parallel checkout.
    @@ Documentation/technical/parallel-checkout.txt (new)
     +The API
     +-------
     +
    -+The parallel checkout API was designed with the goal to minimize changes
    -+to the current users of the checkout machinery. This means that they
    -+don't have to call a different function for sequential or parallel
    ++The parallel checkout API was designed with the goal of minimizing
    ++changes to the current users of the checkout machinery. This means that
    ++they don't have to call a different function for sequential or parallel
     +checkout. As already mentioned, `checkout_entry()` will automatically
     +insert the given entry in the parallel checkout queue when this feature
     +is enabled and the entry is eligible; otherwise, it will just write the
-- 
2.30.1


  parent reply	other threads:[~2021-04-19 19:53 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31  4:22   ` Christian Couder
2021-04-02 14:39     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31  4:32   ` Christian Couder
2021-04-02 14:42     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31  4:33   ` Christian Couder
2021-04-02 14:45     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31  5:36   ` Christian Couder
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19  3:24   ` Matheus Tavares
2021-03-19 22:58     ` Junio C Hamano
2021-03-31  5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43   ` Junio C Hamano
2021-04-17 19:57     ` Matheus Tavares Bernardino
2021-04-19  9:41     ` Christian Couder
2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19  9:36       ` Christian Couder
2021-04-19 19:53     ` Matheus Tavares [this message]
2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1618861380.git.matheus.bernardino@usp.br \
    --to=matheus.bernardino@usp.br \
    --cc=christian.couder@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).