git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares <matheus.bernardino@usp.br>
To: gitster@pobox.com
Cc: git@vger.kernel.org, christian.couder@gmail.com, git@jeffhostetler.com
Subject: [PATCH v3 0/5] Parallel Checkout (part 2)
Date: Sun, 18 Apr 2021 21:14:52 -0300	[thread overview]
Message-ID: <cover.1618790794.git.matheus.bernardino@usp.br> (raw)
In-Reply-To: <cover.1617893234.git.matheus.bernardino@usp.br>

Three small changes since the last version: 

- In `write_pc_item_to_fd()`, renamed variable 'new_blob' to just 'blob'.

- In `write_pc_item_to_fd()`, used `ce->name` instead of `path` when
  reporting failure to read the object.

- Made `write_pc_item()` unlink() the file on a `write_pc_item_to_fd()`
  error. The reasoning for this is:

  Unlike sequential checkout, the parallel workers create each file
  *before* reading and converting the associated blob, so that they can
  detect path collisions as soon as possible in the process. This is
  important both to avoid unnecessary work reading and filtering
  objects that are not going to be written in parallel, and also to
  avoid reporting errors for colliding entries at the wrong time. These
  entries will be handled in a sequential phase later, and that will be
  the correct time to print any errors regarding their checkout.

  However, these logistics make us leave an empty file behind when we
  find an error after creating the file, e.g. a missing object. The
  added unlink() fix this case while maintaining the other important
  mechanics. The design doc was also adjusted to mention this.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 271 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1241 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v2:
1:  0506ac7159 ! 1:  7096822c14 unpack-trees: add basic support for parallel checkout
    @@ parallel-checkout.c (new)
     +	int ret;
     +	struct stream_filter *filter;
     +	struct strbuf buf = STRBUF_INIT;
    -+	char *new_blob;
    ++	char *blob;
     +	unsigned long size;
     +	ssize_t wrote;
     +
    @@ parallel-checkout.c (new)
     +		}
     +	}
     +
    -+	new_blob = read_blob_entry(pc_item->ce, &size);
    -+	if (!new_blob)
    ++	blob = read_blob_entry(pc_item->ce, &size);
    ++	if (!blob)
     +		return error("cannot read object %s '%s'",
    -+			     oid_to_hex(&pc_item->ce->oid), path);
    ++			     oid_to_hex(&pc_item->ce->oid), pc_item->ce->name);
     +
     +	/*
     +	 * checkout metadata is used to give context for external process
    @@ parallel-checkout.c (new)
     +	 * checkout, so pass NULL.
     +	 */
     +	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
    -+					 new_blob, size, &buf, NULL);
    ++					 blob, size, &buf, NULL);
     +
     +	if (ret) {
     +		size_t newsize;
    -+		free(new_blob);
    -+		new_blob = strbuf_detach(&buf, &newsize);
    ++		free(blob);
    ++		blob = strbuf_detach(&buf, &newsize);
     +		size = newsize;
     +	}
     +
    -+	wrote = write_in_full(fd, new_blob, size);
    -+	free(new_blob);
    ++	wrote = write_in_full(fd, blob, size);
    ++	free(blob);
     +	if (wrote < 0)
     +		return error("unable to write file '%s'", path);
     +
    @@ parallel-checkout.c (new)
     +	if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
     +		/* Error was already reported. */
     +		pc_item->status = PC_ITEM_FAILED;
    ++		close_and_clear(&fd);
    ++		unlink(path.buf);
     +		goto out;
     +	}
     +
    @@ parallel-checkout.c (new)
     +	pc_item->status = PC_ITEM_WRITTEN;
     +
     +out:
    -+	/*
    -+	 * No need to check close() return at this point. Either fd is already
    -+	 * closed, or we are on an error path.
    -+	 */
    -+	close_and_clear(&fd);
     +	strbuf_release(&path);
     +}
     +
2:  0d65b517bd ! 2:  4526516ea0 parallel-checkout: make it truly parallel
    @@ parallel-checkout.c: static int write_pc_item_to_fd(struct parallel_checkout_ite
     +	 * be passed from the main process to the workers.
      	 */
      	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
    - 					 new_blob, size, &buf, NULL);
    + 					 blob, size, &buf, NULL);
     @@ parallel-checkout.c: static int close_and_clear(int *fd)
      	return ret;
      }
3:  6ea057f9c5 = 3:  ad165c0637 parallel-checkout: add configuration options
4:  0ac4bee67e = 4:  cf9e28dc0e parallel-checkout: support progress displaying
5:  087f8bdf35 ! 5:  415d4114aa parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Note that, when possible, steps W3 to W5 are delegated to the streaming
     +machinery, removing the need to keep the entire blob in memory.
     +
    -+Also note that the workers *never* remove any file. As mentioned
    -+earlier, it is the responsibility of the main process to remove any file
    -+that blocks the checkout operation (or abort if removing the file(s)
    -+would cause data loss and the user didn't ask to `--force`). This is
    -+crucial to avoid race conditions and also to properly detect path
    -+collisions at Step W1.
    ++If the worker fails to read the blob or to write it to the working tree,
    ++it removes the created file to avoid leaving empty files behind. This is
    ++the *only* time a worker is allowed to remove a file.
    ++
    ++As mentioned earlier, it is the responsibility of the main process to
    ++remove any file that blocks the checkout operation (or abort if the
    ++removal(s) would cause data loss and the user didn't ask to `--force`).
    ++This is crucial to avoid race conditions and also to properly detect
    ++path collisions at Step W1.
     +
     +After the workers finish writing the items and sending back the required
     +information, the main process handles the results in two steps:
-- 
2.30.1


  parent reply	other threads:[~2021-04-19  0:15 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31  4:22   ` Christian Couder
2021-04-02 14:39     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31  4:32   ` Christian Couder
2021-04-02 14:42     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31  4:33   ` Christian Couder
2021-04-02 14:45     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31  5:36   ` Christian Couder
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19  3:24   ` Matheus Tavares
2021-03-19 22:58     ` Junio C Hamano
2021-03-31  5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43   ` Junio C Hamano
2021-04-17 19:57     ` Matheus Tavares Bernardino
2021-04-19  9:41     ` Christian Couder
2021-04-19  0:14   ` Matheus Tavares [this message]
2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19  9:36       ` Christian Couder
2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1618790794.git.matheus.bernardino@usp.br \
    --to=matheus.bernardino@usp.br \
    --cc=christian.couder@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).