git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	stolee@gmail.com, gitster@pobox.com
Subject: [PATCH v3 0/7] Lazy fetch with subprocess
Date: Mon, 17 Aug 2020 21:01:30 -0700	[thread overview]
Message-ID: <cover.1597722941.git.jonathantanmy@google.com> (raw)
In-Reply-To: <20200724223844.2723397-1-jonathantanmy@google.com>

These patches are based on jc/no-update-fetch-head, once again.

I've addressed most reviewer comments by expanding commit messages and
fixing the cascading "if" issue in patch 3.

> Would "git fetch" be the only user potentially benefit from being
> able to dereference a tag that already exists locally and ignore
> missing ones?  I wonder if teaching deref_tag() a new flag would be
> a better alternative to keep the separation of concerns at different
> layers.

Right now, I think so - "fetch" is a special case of having to avoid
infinite loops when lazy fetching, and of being able to tolerate missing
ref targets. I don't think most other commands are like that.

Jonathan Tan (7):
  negotiator/noop: add noop fetch negotiator
  fetch: allow refspecs specified through stdin
  fetch: avoid reading submodule config until needed
  fetch: only populate existing_refs if needed
  fetch-pack: do not lazy-fetch during ref iteration
  promisor-remote: lazy-fetch objects in subprocess
  fetch-pack: remove no_dependents code

 Documentation/config/fetch.txt            |   5 +-
 Documentation/git-fetch.txt               |   4 +
 Documentation/technical/partial-clone.txt |  13 +-
 Makefile                                  |   1 +
 builtin/fetch-pack.c                      |   4 -
 builtin/fetch.c                           |  42 ++++-
 fetch-negotiator.c                        |   5 +
 fetch-pack.c                              | 189 +++++++++-------------
 fetch-pack.h                              |  14 --
 negotiator/noop.c                         |  44 +++++
 negotiator/noop.h                         |   8 +
 promisor-remote.c                         |  46 +++---
 remote-curl.c                             |   6 -
 repo-settings.c                           |   2 +
 repository.h                              |   1 +
 submodule-config.c                        |   8 +-
 t/t0410-partial-clone.sh                  |   2 +-
 t/t4067-diff-partial-clone.sh             |   8 +-
 t/t5554-noop-fetch-negotiator.sh          |  22 +++
 t/t5601-clone.sh                          |   2 +-
 t/t5616-partial-clone.sh                  |  20 +++
 transport.c                               |   4 -
 transport.h                               |   7 -
 23 files changed, 255 insertions(+), 202 deletions(-)
 create mode 100644 negotiator/noop.c
 create mode 100644 negotiator/noop.h
 create mode 100755 t/t5554-noop-fetch-negotiator.sh

Range-diff against v2:
1:  35bdd372ae ! 1:  5b3b49d123 negotiator/null: add null fetch negotiator
    @@ Metadata
     Author: Jonathan Tan <jonathantanmy@google.com>
     
      ## Commit message ##
    -    negotiator/null: add null fetch negotiator
    +    negotiator/noop: add noop fetch negotiator
     
    -    Add a null fetch negotiator. This is introduced to allow partial clones
    +    Add a noop fetch negotiator. This is introduced to allow partial clones
         to skip the unneeded negotiation step when fetching missing objects
         using a "git fetch" subprocess. (The implementation of spawning a "git
         fetch" subprocess will be done in a subsequent patch.) But this can also
    @@ Documentation/config/fetch.txt: fetch.negotiationAlgorithm::
      	server. Set to "skipping" to use an algorithm that skips commits in an
      	effort to converge faster, but may result in a larger-than-necessary
     -	packfile; The default is "default" which instructs Git to use the default algorithm
    -+	packfile; or set to "null" to not send any information at all, which
    ++	packfile; or set to "noop" to not send any information at all, which
     +	will almost certainly result in a larger-than-necessary packfile, but
     +	will skip the negotiation step.
     +	The default is "default" which instructs Git to use the default algorithm
    @@ Makefile: LIB_OBJS += mergesort.o
      LIB_OBJS += midx.o
      LIB_OBJS += name-hash.o
      LIB_OBJS += negotiator/default.o
    -+LIB_OBJS += negotiator/null.o
    ++LIB_OBJS += negotiator/noop.o
      LIB_OBJS += negotiator/skipping.o
      LIB_OBJS += notes-cache.o
      LIB_OBJS += notes-merge.o
    @@ fetch-negotiator.c
      #include "fetch-negotiator.h"
      #include "negotiator/default.h"
      #include "negotiator/skipping.h"
    -+#include "negotiator/null.h"
    ++#include "negotiator/noop.h"
      #include "repository.h"
      
      void fetch_negotiator_init(struct repository *r,
    @@ fetch-negotiator.c: void fetch_negotiator_init(struct repository *r,
      		skipping_negotiator_init(negotiator);
      		return;
      
    -+	case FETCH_NEGOTIATION_NULL:
    -+		null_negotiator_init(negotiator);
    ++	case FETCH_NEGOTIATION_NOOP:
    ++		noop_negotiator_init(negotiator);
     +		return;
     +
      	case FETCH_NEGOTIATION_DEFAULT:
      	default:
      		default_negotiator_init(negotiator);
     
    - ## negotiator/null.c (new) ##
    + ## negotiator/noop.c (new) ##
     @@
     +#include "cache.h"
    -+#include "null.h"
    ++#include "noop.h"
     +#include "../commit.h"
     +#include "../fetch-negotiator.h"
     +
    @@ negotiator/null.c (new)
     +	 * This negotiator does not emit any commits, so there is no commit to
     +	 * be acknowledged. If there is any ack, there is a bug.
     +	 */
    -+	BUG("ack with null negotiator, which does not emit any commits");
    ++	BUG("ack with noop negotiator, which does not emit any commits");
     +	return 0;
     +}
     +
    @@ negotiator/null.c (new)
     +	/* nothing to release */
     +}
     +
    -+void null_negotiator_init(struct fetch_negotiator *negotiator)
    ++void noop_negotiator_init(struct fetch_negotiator *negotiator)
     +{
     +	negotiator->known_common = known_common;
     +	negotiator->add_tip = add_tip;
    @@ negotiator/null.c (new)
     +	negotiator->data = NULL;
     +}
     
    - ## negotiator/null.h (new) ##
    + ## negotiator/noop.h (new) ##
     @@
    -+#ifndef NEGOTIATOR_NULL_H
    -+#define NEGOTIATOR_NULL_H
    ++#ifndef NEGOTIATOR_NOOP_H
    ++#define NEGOTIATOR_NOOP_H
     +
     +struct fetch_negotiator;
     +
    -+void null_negotiator_init(struct fetch_negotiator *negotiator);
    ++void noop_negotiator_init(struct fetch_negotiator *negotiator);
     +
     +#endif
     
    @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
      	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
      		if (!strcasecmp(strval, "skipping"))
      			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
    -+		else if (!strcasecmp(strval, "null"))
    -+			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NULL;
    ++		else if (!strcasecmp(strval, "noop"))
    ++			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
      		else
      			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_DEFAULT;
      	}
    @@ repository.h: enum fetch_negotiation_setting {
      	FETCH_NEGOTIATION_NONE = 0,
      	FETCH_NEGOTIATION_DEFAULT = 1,
      	FETCH_NEGOTIATION_SKIPPING = 2,
    -+	FETCH_NEGOTIATION_NULL = 3,
    ++	FETCH_NEGOTIATION_NOOP = 3,
      };
      
      struct repo_settings {
     
    - ## t/t5554-null-fetch-negotiator.sh (new) ##
    + ## t/t5554-noop-fetch-negotiator.sh (new) ##
     @@
     +#!/bin/sh
     +
    -+test_description='test null fetch negotiator'
    ++test_description='test noop fetch negotiator'
     +. ./test-lib.sh
     +
    -+test_expect_success 'null negotiator does not emit any "have"' '
    ++test_expect_success 'noop negotiator does not emit any "have"' '
     +	rm -f trace &&
     +
     +	test_create_repo server &&
    @@ t/t5554-null-fetch-negotiator.sh (new)
     +	test_create_repo client &&
     +	test_commit -C client we_have &&
     +
    -+	test_config -C client fetch.negotiationalgorithm null &&
    ++	test_config -C client fetch.negotiationalgorithm noop &&
     +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
     +
     +	! grep "fetch> have" trace &&
2:  00ad7dd875 = 2:  9f277f1631 fetch: allow refspecs specified through stdin
3:  8b4a522a13 ! 3:  fda9f834f6 fetch: avoid reading submodule config until needed
    @@ Metadata
      ## Commit message ##
         fetch: avoid reading submodule config until needed
     
    -    Teach "git fetch" to avoid reading the submodule config until necessary.
    -    This allows users to avoid the lazy-fetching of this potentially missing
    -    config file by specifying the --recurse-submodules=no command line
    -    option.
    +    In "fetch", there are two parameters submodule_fetch_jobs_config and
    +    recurse_submodules that can be set in a variety of ways: through
    +    .gitmodules, through .git/config, and through the command line.
    +    Currently "fetch" handles this by first reading .gitmodules, then
    +    reading .git/config (allowing it to overwrite existing values), then
    +    reading the command line (allowing it to overwrite existing values).
    +
    +    Notice that we can avoid reading .gitmodules if .git/config and/or the
    +    command line already provides us with what we need. In addition, if
    +    recurse_submodules is found to be "no", we do not need the value of
    +    submodule_fetch_jobs_config.
    +
    +    Avoiding reading .gitmodules is especially important when we use "git
    +    fetch" to perform lazy fetches in a partial clone because the
    +    .gitmodules file itself might need to be lazy fetched (and otherwise
    +    causing an infinite loop).
    +
    +    In light of all this, avoid reading .gitmodules until necessary. When
    +    reading it, we may only need one of the two parameters it provides, so
    +    teach fetch_config_from_gitmodules() to support NULL arguments. With
    +    this patch, users (including Git itself when invoking "git fetch" to
    +    lazy-fetch) will be able to guarantee avoiding reading .gitmodules by
    +    passing --recurse-submodules=no.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
      		if (deepen_relative < 0)
     
      ## submodule-config.c ##
    -@@ submodule-config.c: struct fetch_config {
    - static int gitmodules_fetch_config(const char *var, const char *value, void *cb)
    +@@ submodule-config.c: static int gitmodules_fetch_config(const char *var, const char *value, void *cb)
      {
      	struct fetch_config *config = cb;
    --	if (!strcmp(var, "submodule.fetchjobs")) {
    -+	if (!strcmp(var, "submodule.fetchjobs") && config->max_children) {
    - 		*(config->max_children) = parse_submodule_fetchjobs(var, value);
    + 	if (!strcmp(var, "submodule.fetchjobs")) {
    +-		*(config->max_children) = parse_submodule_fetchjobs(var, value);
    ++		if (config->max_children)
    ++			*(config->max_children) =
    ++				parse_submodule_fetchjobs(var, value);
      		return 0;
    --	} else if (!strcmp(var, "fetch.recursesubmodules")) {
    -+	} else if (!strcmp(var, "fetch.recursesubmodules") &&
    -+		   config->recurse_submodules) {
    - 		*(config->recurse_submodules) = parse_fetch_recurse_submodules_arg(var, value);
    + 	} else if (!strcmp(var, "fetch.recursesubmodules")) {
    +-		*(config->recurse_submodules) = parse_fetch_recurse_submodules_arg(var, value);
    ++		if (config->recurse_submodules)
    ++			*(config->recurse_submodules) =
    ++				parse_fetch_recurse_submodules_arg(var, value);
      		return 0;
      	}
    + 
4:  77bc83e7f2 ! 4:  a5554cd27f fetch: only populate existing_refs if needed
    @@ Metadata
      ## Commit message ##
         fetch: only populate existing_refs if needed
     
    -    When fetching tags, Git only writes tags that do not already exist in
    -    the client repository. This necessitates an iteration over all the refs,
    -    but fetch performs this iteration even if no tags are fetched.
    +    In "fetch", get_ref_map() iterates over all refs to populate
    +    "existing_refs" in order to populate peer_ref->old_oid in the returned
    +    refmap, even if the refmap has no peer_ref set - which is the case when
    +    only literal hashes (i.e. no refs by name) are fetched.
     
    -    This issue is more severe in a partial clone because the iteration over
    -    refs also checks that the targets of those refs are present,
    -    necessitating a lazy fetch if the target is missing.
    +    Iterating over refs causes the targets of those refs to be checked for
    +    existence. Avoiding this is especially important when we use "git fetch"
    +    to perform lazy fetches in a partial clone because a target of such a
    +    ref may need to be itself lazy-fetched (and otherwise causing an
    +    infinite loop).
     
    -    Therefore, iterate over the refs only when necessary. The user can avoid
    -    this iteration by refraining from fetching tags, for example, by passing
    -    --no-tags as an argument. A subsequent patch will also teach Git to use
    -    "git fetch" to lazy-fetch missing objects in a partial clone, thus also
    -    making use of this change.
    +    Therefore, avoid populating "existing_refs" until necessary. With this
    +    patch, because Git lazy-fetches objects by literal hashes (to be done in
    +    a subsequent commit), it will then be able to guarantee avoiding reading
    +    targets of refs.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
5:  d42e98ff07 ! 5:  f56c24dd1b fetch-pack: do not lazy-fetch during ref iteration
    @@ Metadata
      ## Commit message ##
         fetch-pack: do not lazy-fetch during ref iteration
     
    -    Teach "fetch-pack" not to lazy fetch whenever iterating over refs. This
    -    is done by using the raw form of ref iteration and by dereferencing tags
    -    ourselves.
    +    In order to determine negotiation tips, "fetch-pack" iterates over all
    +    refs and dereferences all annotated tags found. This causes the
    +    existence of targets of refs and annotated tags to be checked. Avoiding
    +    this is especially important when we use "git fetch" (which invokes
    +    "fetch-pack") to perform lazy fetches in a partial clone because a
    +    target of such a ref or annotated tag may need to be itself lazy-fetched
    +    (and otherwise causing an infinite loop).
    +
    +    Therefore, teach "fetch-pack" not to lazy fetch whenever iterating over
    +    refs. This is done by using the raw form of ref iteration and by
    +    dereferencing tags ourselves.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
6:  55d2e5a4cc = 6:  4e72ca6258 promisor-remote: lazy-fetch objects in subprocess
7:  e8f16d6908 = 7:  3ff9d034e9 fetch-pack: remove no_dependents code
-- 
2.28.0.220.ged08abb693-goog


  parent reply	other threads:[~2020-08-18  4:02 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-24 22:38 [RFC PATCH] Modify fetch-pack to no longer die on error? Jonathan Tan
2020-07-24 23:07 ` Junio C Hamano
2020-07-24 23:11 ` Junio C Hamano
2020-07-25 21:41   ` Jeff King
2020-07-25 23:01     ` Junio C Hamano
2020-07-27 17:11       ` Jeff King
2020-07-28 19:23       ` Jonathan Tan
2020-07-28 20:08         ` Jeff King
2020-07-29 18:53           ` Jonathan Tan
2020-07-29 19:29             ` Jeff King
2020-07-29 19:02           ` Junio C Hamano
2020-07-29 22:55             ` Jonathan Tan
2020-08-05  1:20 ` [RFC PATCH 0/7] Lazy fetch with subprocess Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 1/7] fetch-pack: allow NULL negotiator->add_tip Jonathan Tan
2020-08-05 19:53     ` Junio C Hamano
2020-08-07 20:53       ` Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 2/7] fetch-pack: allow NULL negotiator->known_common Jonathan Tan
2020-08-05 20:08     ` Junio C Hamano
2020-08-05 22:11       ` Junio C Hamano
2020-08-07 20:59         ` Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 3/7] negotiator/null: add null fetch negotiator Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 4/7] fetch: --stdin Jonathan Tan
2020-08-05 20:33     ` Junio C Hamano
2020-08-07 21:10       ` Jonathan Tan
2020-08-07 21:58         ` Junio C Hamano
2020-08-07 21:10       ` Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 5/7] fetch: submodule config Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 6/7] fetch: only populate existing_refs if needed Jonathan Tan
2020-08-05  1:20   ` [RFC PATCH 7/7] promisor-remote: use subprocess to fetch Jonathan Tan
2020-08-11 22:52 ` [PATCH v2 0/7] Lazy fetch with subprocess Jonathan Tan
2020-08-11 22:52   ` [PATCH v2 1/7] negotiator/null: add null fetch negotiator Jonathan Tan
2020-08-12 12:55     ` Derrick Stolee
2020-08-12 16:44       ` Junio C Hamano
2020-08-12 17:29         ` Jonathan Tan
2020-08-11 22:52   ` [PATCH v2 2/7] fetch: allow refspecs specified through stdin Jonathan Tan
2020-08-11 22:52   ` [PATCH v2 3/7] fetch: avoid reading submodule config until needed Jonathan Tan
2020-08-12 17:34     ` Junio C Hamano
2020-08-11 22:52   ` [PATCH v2 4/7] fetch: only populate existing_refs if needed Jonathan Tan
2020-08-12 18:06     ` Junio C Hamano
2020-08-11 22:52   ` [PATCH v2 5/7] fetch-pack: do not lazy-fetch during ref iteration Jonathan Tan
2020-08-12 18:25     ` Junio C Hamano
2020-08-11 22:52   ` [PATCH v2 6/7] promisor-remote: lazy-fetch objects in subprocess Jonathan Tan
2020-08-12 18:28     ` Junio C Hamano
2020-08-11 22:52   ` [PATCH v2 7/7] fetch-pack: remove no_dependents code Jonathan Tan
2020-08-12 12:51   ` [PATCH v2 0/7] Lazy fetch with subprocess Derrick Stolee
2020-08-18  4:01 ` Jonathan Tan [this message]
2020-08-18  4:01   ` [PATCH v3 1/7] negotiator/noop: add noop fetch negotiator Jonathan Tan
2020-08-18  4:01   ` [PATCH v3 2/7] fetch: allow refspecs specified through stdin Jonathan Tan
2020-08-18  4:01   ` [PATCH v3 3/7] fetch: avoid reading submodule config until needed Jonathan Tan
2020-08-18  4:01   ` [PATCH v3 4/7] fetch: only populate existing_refs if needed Jonathan Tan
2020-08-18  4:01   ` [PATCH v3 5/7] fetch-pack: do not lazy-fetch during ref iteration Jonathan Tan
2020-08-18  4:01   ` [PATCH v3 6/7] promisor-remote: lazy-fetch objects in subprocess Jonathan Tan
2020-08-18  4:01   ` [PATCH v3 7/7] fetch-pack: remove no_dependents code Jonathan Tan
2020-08-18 19:56   ` [PATCH v3 0/7] Lazy fetch with subprocess Junio C Hamano
2020-08-18 22:32     ` Junio C Hamano
2020-08-18 23:36       ` [PATCH] fixup! promisor-remote: lazy-fetch objects in subprocess Jonathan Tan
2020-08-18 23:57         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1597722941.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).