git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH] fetch: delay fetch_if_missing=0 until after config
@ 2019-10-07 18:18 Jonathan Tan
  2019-10-23 21:03 ` Emily Shaffer
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jonathan Tan @ 2019-10-07 18:18 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When running "git fetch" in a partial clone with no blobs, for example,
by:

  git clone --filter=blob:none --no-checkout \
    https://kernel.googlesource.com/pub/scm/git/git
  git -C git fetch

"git fetch" will fail to load the config blob object, printing "unable
to load config blob object".

This is because fetch_if_missing is set to 0 before the config is
processed. Git must set fetch_if_missing to 0 before the fetch because
as part of the fetch, packfile negotiation happens (and we do not want
to fetch any missing objects when checking existence of objects), but we
do not need to set it so early. Move the setting of fetch_if_missing to
the earliest possible point in cmd_fetch(), right before any fetching
happens.
---
This is not a full solution, but this helps in the use case described in
the commit message. The full solution probably will involve teaching the
fetch mechanism to support arbitrary struct repository objects, and by
moving fetch_if_missing into the repository object. (Alternatively, we
could add the equivalent of OBJECT_INFO_SKIP_FETCH_OBJECT to functions
like parse_commit() that are used by files like negotiator/default.c, or
split up commit parsing into object reading - which already has that
flag - and commit parsing.)
---
 builtin/fetch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 24d382b2fb..865ae6677d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1666,8 +1666,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
@@ -1734,6 +1732,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	fetch_if_missing = 0;
+
 	if (remote) {
 		if (filter_options.choice || has_promisor_remote())
 			fetch_one_setup_partial(remote);
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] fetch: delay fetch_if_missing=0 until after config
  2019-10-07 18:18 [PATCH] fetch: delay fetch_if_missing=0 until after config Jonathan Tan
@ 2019-10-23 21:03 ` Emily Shaffer
  2019-10-23 21:44 ` [PATCH v2] " Jonathan Tan
  2019-10-23 23:34 ` [PATCH v3] " Jonathan Tan
  2 siblings, 0 replies; 11+ messages in thread
From: Emily Shaffer @ 2019-10-23 21:03 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Oct 07, 2019 at 11:18:25AM -0700, Jonathan Tan wrote:
> When running "git fetch" in a partial clone with no blobs, for example,
> by:
> 
>   git clone --filter=blob:none --no-checkout \
>     https://kernel.googlesource.com/pub/scm/git/git
>   git -C git fetch
> 
> "git fetch" will fail to load the config blob object, printing "unable
> to load config blob object".

I'm having some trouble figuring out which object is actually missing.
Is this the .git/config object? (That doesn't make much sense to me...)
Is it .gitmodules?

> 
> This is because fetch_if_missing is set to 0 before the config is
> processed. Git must set fetch_if_missing to 0 before the fetch because
> as part of the fetch, packfile negotiation happens (and we do not want
> to fetch any missing objects when checking existence of objects), but we
> do not need to set it so early. Move the setting of fetch_if_missing to
> the earliest possible point in cmd_fetch(), right before any fetching
> happens.

Doubts aside about what's actually failing, I definitely agree with the
premise of not setting this until the last moment we need it. Plus, I
may be alone here, but it'd make it easier for me to understand the code
if I saw a note explaining *why* we don't want to fetch_if_missing in
this case.

By the way, I think I understand that this is OK to go in
unconditionally because:
 - In the full clone case, it's a no-op; we haven't got anything that's
   missing, so who cares.
 - In the filter case, it's as you said - we don't want to
   fetch_if_missing because that will turn someone's partial clone into
   a a full clone.
   - This probably applies to bare checkout, too.

Of course if I'm wrong I'd like to know, but that's how I understand it
at the moment.

> ---
> This is not a full solution, but this helps in the use case described in
> the commit message. The full solution probably will involve teaching the
> fetch mechanism to support arbitrary struct repository objects, and by
> moving fetch_if_missing into the repository object. (Alternatively, we
> could add the equivalent of OBJECT_INFO_SKIP_FETCH_OBJECT to functions
> like parse_commit() that are used by files like negotiator/default.c, or
> split up commit parsing into object reading - which already has that
> flag - and commit parsing.)

Ah, I remember this was listed as one of the potential intern projects -
I think we dismissed it as being too tech-debt-y for an intern to feel
good about. :(

> ---
>  builtin/fetch.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 24d382b2fb..865ae6677d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1666,8 +1666,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	packet_trace_identity("fetch");
>  
> -	fetch_if_missing = 0;
> -
>  	/* Record the command line for the reflog */
>  	strbuf_addstr(&default_rla, "fetch");
>  	for (i = 1; i < argc; i++)
> @@ -1734,6 +1732,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  
> +	fetch_if_missing = 0;
> +
>  	if (remote) {
>  		if (filter_options.choice || has_promisor_remote())
>  			fetch_one_setup_partial(remote);
> -- 
> 2.23.0.581.g78d2f28ef7-goog
> 

The diff, though, looks fine for me.

 - Emily

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] fetch: delay fetch_if_missing=0 until after config
  2019-10-07 18:18 [PATCH] fetch: delay fetch_if_missing=0 until after config Jonathan Tan
  2019-10-23 21:03 ` Emily Shaffer
@ 2019-10-23 21:44 ` Jonathan Tan
  2019-10-23 23:30   ` Emily Shaffer
  2019-10-23 23:34 ` [PATCH v3] " Jonathan Tan
  2 siblings, 1 reply; 11+ messages in thread
From: Jonathan Tan @ 2019-10-23 21:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, emilyshaffer

Suppose, from a repository that has ".gitmodules", we clone with
--filter=blob:none:

  git clone --filter=blob:none --no-checkout \
    https://kernel.googlesource.com/pub/scm/git/git

Then we fetch:

  git -C git fetch

This will cause a "unable to load config blob object", because the
fetch_config_from_gitmodules() invocation in cmd_fetch() will attempt to
load ".gitmodules" (which Git knows to exist because the client has the
tree of HEAD) while fetch_if_missing is set to 0.

fetch_if_missing is set to 0 too early - ".gitmodules" here should be
lazily fetched.  Git must set fetch_if_missing to 0 before the fetch
because as part of the fetch, packfile negotiation happens (and we do
not want to fetch any missing objects when checking existence of
objects), but we do not need to set it so early. Move the setting of
fetch_if_missing to the earliest possible point in cmd_fetch(), right
before any fetching happens.
---
No changes from v1 except that I improved the commit message.

Thanks, Emily, for taking a look.

> I'm having some trouble figuring out which object is actually missing.
> Is this the .git/config object? (That doesn't make much sense to me...)
> Is it .gitmodules?

Yes, it is indeed .gitmodules. I improved the commit message to further
explain what is going on.

> By the way, I think I understand that this is OK to go in
> unconditionally because:
>  - In the full clone case, it's a no-op; we haven't got anything that's
>    missing, so who cares.
>  - In the filter case, it's as you said - we don't want to
>    fetch_if_missing because that will turn someone's partial clone into
>    a a full clone.
>    - This probably applies to bare checkout, too.

Yes, that is correct. What do you mean by bare checkout? If you mean the
checkout that happens after clone (that we can suppress with
--no-checkout), that indeed happens after fetch_if_missing=0, so we
shouldn't have a problem there.

 builtin/fetch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0c345b5dfe..863c858fde 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1755,8 +1755,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
@@ -1824,6 +1822,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	fetch_if_missing = 0;
+
 	if (remote) {
 		if (filter_options.choice || has_promisor_remote())
 			fetch_one_setup_partial(remote);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] fetch: delay fetch_if_missing=0 until after config
  2019-10-23 21:44 ` [PATCH v2] " Jonathan Tan
@ 2019-10-23 23:30   ` Emily Shaffer
  0 siblings, 0 replies; 11+ messages in thread
From: Emily Shaffer @ 2019-10-23 23:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Wed, Oct 23, 2019 at 02:44:28PM -0700, Jonathan Tan wrote:
> Suppose, from a repository that has ".gitmodules", we clone with
> --filter=blob:none:
> 
>   git clone --filter=blob:none --no-checkout \
>     https://kernel.googlesource.com/pub/scm/git/git
> 
> Then we fetch:
> 
>   git -C git fetch
> 
> This will cause a "unable to load config blob object", because the
> fetch_config_from_gitmodules() invocation in cmd_fetch() will attempt to
> load ".gitmodules" (which Git knows to exist because the client has the
> tree of HEAD) while fetch_if_missing is set to 0.
> 
> fetch_if_missing is set to 0 too early - ".gitmodules" here should be
> lazily fetched.  Git must set fetch_if_missing to 0 before the fetch
> because as part of the fetch, packfile negotiation happens (and we do
> not want to fetch any missing objects when checking existence of
> objects), but we do not need to set it so early. Move the setting of
> fetch_if_missing to the earliest possible point in cmd_fetch(), right
> before any fetching happens.

I think your sign-off is missing from the new commit message, right?

Otherwise it looks fine to me.

> ---
> No changes from v1 except that I improved the commit message.
> 
> Thanks, Emily, for taking a look.
> 
> > I'm having some trouble figuring out which object is actually missing.
> > Is this the .git/config object? (That doesn't make much sense to me...)
> > Is it .gitmodules?
> 
> Yes, it is indeed .gitmodules. I improved the commit message to further
> explain what is going on.
> 
> > By the way, I think I understand that this is OK to go in
> > unconditionally because:
> >  - In the full clone case, it's a no-op; we haven't got anything that's
> >    missing, so who cares.
> >  - In the filter case, it's as you said - we don't want to
> >    fetch_if_missing because that will turn someone's partial clone into
> >    a a full clone.
> >    - This probably applies to bare checkout, too.
> 
> Yes, that is correct. What do you mean by bare checkout? If you mean the
> checkout that happens after clone (that we can suppress with
> --no-checkout), that indeed happens after fetch_if_missing=0, so we
> shouldn't have a problem there.

I meant bare clone, not checkout, my apologies, but as I understand it
better, they're completely separate concepts - that is, you can
certainly have a bare clone which is also a full clone. So, please
disregard this comment.

 - Emily

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-07 18:18 [PATCH] fetch: delay fetch_if_missing=0 until after config Jonathan Tan
  2019-10-23 21:03 ` Emily Shaffer
  2019-10-23 21:44 ` [PATCH v2] " Jonathan Tan
@ 2019-10-23 23:34 ` Jonathan Tan
  2019-10-24  4:30   ` Junio C Hamano
  2 siblings, 1 reply; 11+ messages in thread
From: Jonathan Tan @ 2019-10-23 23:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, emilyshaffer

Suppose, from a repository that has ".gitmodules", we clone with
--filter=blob:none:

  git clone --filter=blob:none --no-checkout \
    https://kernel.googlesource.com/pub/scm/git/git

Then we fetch:

  git -C git fetch

This will cause a "unable to load config blob object", because the
fetch_config_from_gitmodules() invocation in cmd_fetch() will attempt to
load ".gitmodules" (which Git knows to exist because the client has the
tree of HEAD) while fetch_if_missing is set to 0.

fetch_if_missing is set to 0 too early - ".gitmodules" here should be
lazily fetched.  Git must set fetch_if_missing to 0 before the fetch
because as part of the fetch, packfile negotiation happens (and we do
not want to fetch any missing objects when checking existence of
objects), but we do not need to set it so early. Move the setting of
fetch_if_missing to the earliest possible point in cmd_fetch(), right
before any fetching happens.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
No change from v2 except that I've added Signed-off-by.
---
 builtin/fetch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0c345b5dfe..863c858fde 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1755,8 +1755,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
@@ -1824,6 +1822,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	fetch_if_missing = 0;
+
 	if (remote) {
 		if (filter_options.choice || has_promisor_remote())
 			fetch_one_setup_partial(remote);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-23 23:34 ` [PATCH v3] " Jonathan Tan
@ 2019-10-24  4:30   ` Junio C Hamano
  2019-10-24 19:18     ` Jonathan Tan
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2019-10-24  4:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

> Suppose, from a repository that has ".gitmodules", we clone with
> --filter=blob:none:
>
>   git clone --filter=blob:none --no-checkout \
>     https://kernel.googlesource.com/pub/scm/git/git
>
> Then we fetch:
>
>   git -C git fetch
>
> This will cause a "unable to load config blob object", because the
> fetch_config_from_gitmodules() invocation in cmd_fetch() will attempt to
> load ".gitmodules" (which Git knows to exist because the client has the
> tree of HEAD) while fetch_if_missing is set to 0.
>
> fetch_if_missing is set to 0 too early - ".gitmodules" here should be
> lazily fetched.  Git must set fetch_if_missing to 0 before the fetch
> because as part of the fetch, packfile negotiation happens (and we do
> not want to fetch any missing objects when checking existence of
> objects)...

Is it only me to feel that this is piling band-aids on top of
band-aids?

Perhaps the addition (and enabling) of lazy-fetch should have been
done after "checking existence" calls are vetted and sifted into two
categories?  Some accesses to objects are "because we need it
now---so let's lazily fetch if that is available as a fallback
option to us", as opposed to "if we do not have it locally right
now, we want to know the fact".  And each callsite should be able to
declare for what reason between the two it is making the access.

The single fetch-if-missing boolean may have been a quick-and-dirty
way to get the ball rolling, but perhaps the codebase grew up enough
so that it is time to wean off of it?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-24  4:30   ` Junio C Hamano
@ 2019-10-24 19:18     ` Jonathan Tan
  2019-10-25  2:38       ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Tan @ 2019-10-24 19:18 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, emilyshaffer

> > fetch_if_missing is set to 0 too early - ".gitmodules" here should be
> > lazily fetched.  Git must set fetch_if_missing to 0 before the fetch
> > because as part of the fetch, packfile negotiation happens (and we do
> > not want to fetch any missing objects when checking existence of
> > objects)...
> 
> Is it only me to feel that this is piling band-aids on top of
> band-aids?

To me, this is moving a band-aid, not adding another one. But to the
bigger point...

> Perhaps the addition (and enabling) of lazy-fetch should have been
> done after "checking existence" calls are vetted and sifted into two
> categories?  Some accesses to objects are "because we need it
> now---so let's lazily fetch if that is available as a fallback
> option to us", as opposed to "if we do not have it locally right
> now, we want to know the fact".  And each callsite should be able to
> declare for what reason between the two it is making the access.
> 
> The single fetch-if-missing boolean may have been a quick-and-dirty
> way to get the ball rolling, but perhaps the codebase grew up enough
> so that it is time to wean off of it?

This is one of the alternatives I mentioned in the original email [1]
after "---". But to elaborate on that, I prefer sifting regions over
sifting calls.

Sifting calls into two categories might work, but it's error-prone in
that we would have to do the same line-by-line analysis we did when we
added the repository argument to many functions, and we would have to
modify functions like parse_commit() to take a flag similar to
OBJECT_INFO_SKIP_FETCH_OBJECT. Also, we would have to do the same
careful inspection for future patches.

Instead, we can control whether a region of code lazy-fetches by moving
fetch_if_missing to the struct repository object and using a struct
repository that has fetch_if_missing==0 when we don't want lazy
fetching. Besides being less error-prone, the infrastructure for this
has already been built (if I remember correctly, for submodules, at
first).

The microproject to put fetch_if_missing into struct repository and my
Outreachy project 'Refactor "git index-pack" logic into library code'
[2] are steps towards this goal. (I don't think that the latter is
strictly necessary, but it will make things simpler. In particular,
passing "-C" then the_repository->gitdir to index-pack makes a few tests
fail - not many, but not zero; and even after we resolve that, we would
need CLI parameters ensuring that we marshal everything we need from
the_repository, including fetch_if_missing. It seems better to libify
index-pack first.)

[1] https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/
[2] https://www.outreachy.org/apply/project-selection/#git

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-24 19:18     ` Jonathan Tan
@ 2019-10-25  2:38       ` Junio C Hamano
  2019-10-25 17:41         ` Jonathan Tan
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2019-10-25  2:38 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

> To me, this is moving a band-aid, not adding another one. But to the

Fair enough.

> Sifting calls into two categories might work, but it's error-prone in
> that we would have to do the same line-by-line analysis we did when we
> added the repository argument to many functions, and we would have to
> modify functions like parse_commit() to take a flag similar to
> OBJECT_INFO_SKIP_FETCH_OBJECT. Also, we would have to do the same
> careful inspection for future patches.

Absolutely.  That is the price to pay for the lazy-fetch feature.

> Instead, we can control whether a region of code lazy-fetches...

The approach "from here to there, we can set global to forbid
lazy-fetch" may prolong the life support of the quick-and-dirty
mechanism, but it has to assume you can say "from here to there"; it
would mean that we cannot go multi-threaded until we get off of it.

That is why I said it may be time for us to wean us off of the
quick-and-dirty hack that helped us bootstrapping the initial
implementation of lazy clone.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-25  2:38       ` Junio C Hamano
@ 2019-10-25 17:41         ` Jonathan Tan
  2019-10-29  1:39           ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Tan @ 2019-10-25 17:41 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, emilyshaffer

> > Instead, we can control whether a region of code lazy-fetches...
> 
> The approach "from here to there, we can set global to forbid
> lazy-fetch" may prolong the life support of the quick-and-dirty
> mechanism, but it has to assume you can say "from here to there"; it
> would mean that we cannot go multi-threaded until we get off of it.

By "from here to there", I meant, for example, creating a struct
repository in cmd_fetch() that has fetch_if_missing=0, then passing that
repository to fetch_pack() (once fetch_pack() and all functions it calls
support a repository object). In that way, from here (start of
fetch_pack()) to there (end of fetch_pack()) there will be no lazy
fetching. As a bonus, if we ever want to support fetching in
repositories other than the_repository (e.g. submodules), this change
will allow us to support that too. I don't think this is
quick-and-dirty, and I don't see how this prevents multithreading.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-25 17:41         ` Jonathan Tan
@ 2019-10-29  1:39           ` Junio C Hamano
  2019-11-01 20:43             ` Jonathan Tan
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2019-10-29  1:39 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

>> > Instead, we can control whether a region of code lazy-fetches...
>> 
>> The approach "from here to there, we can set global to forbid
>> lazy-fetch" may prolong the life support of the quick-and-dirty
>> mechanism, but it has to assume you can say "from here to there"; it
>> would mean that we cannot go multi-threaded until we get off of it.
>
> By "from here to there", I meant, for example, creating a struct
> repository in cmd_fetch() that has fetch_if_missing=0, then passing that
> repository to fetch_pack() (once fetch_pack() and all functions it calls
> support a repository object).

I know---but that means the struct cannot be shared among threads
that are calling object layer, some of which want to lazily fetch
missing objects while others only want to check the existence, at
the same time.

Compared to that, judicious use of OBJECT_INFO_SKIP_FETCH_OBJECT and
other flags by callers can tell the underlying machinery why we are
interested in the object, which I think is the right direction to go
in the longer term.  What I am not certain about is if we are ready
to move to the right direction for the longer term, or we should still
be relying on the big-hammer global bit for expediency a bit longer.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] fetch: delay fetch_if_missing=0 until after config
  2019-10-29  1:39           ` Junio C Hamano
@ 2019-11-01 20:43             ` Jonathan Tan
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Tan @ 2019-11-01 20:43 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, emilyshaffer

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> > Instead, we can control whether a region of code lazy-fetches...
> >> 
> >> The approach "from here to there, we can set global to forbid
> >> lazy-fetch" may prolong the life support of the quick-and-dirty
> >> mechanism, but it has to assume you can say "from here to there"; it
> >> would mean that we cannot go multi-threaded until we get off of it.
> >
> > By "from here to there", I meant, for example, creating a struct
> > repository in cmd_fetch() that has fetch_if_missing=0, then passing that
> > repository to fetch_pack() (once fetch_pack() and all functions it calls
> > support a repository object).
> 
> I know---but that means the struct cannot be shared among threads
> that are calling object layer, some of which want to lazily fetch
> missing objects while others only want to check the existence, at
> the same time.

That's true. Mixing lazy-fetch and non-lazy-fetch in a function is rare,
but admittedly it does happen - in particular, when fetching, the server
may send us objects delta-ed against a missing object, and we need to
lazy-fetch those missing objects.

> Compared to that, judicious use of OBJECT_INFO_SKIP_FETCH_OBJECT and
> other flags by callers can tell the underlying machinery why we are
> interested in the object, which I think is the right direction to go
> in the longer term.  What I am not certain about is if we are ready
> to move to the right direction for the longer term, or we should still
> be relying on the big-hammer global bit for expediency a bit longer.

I've taken a look at this and figured out tests that can demonstrate
what exactly is being lazy-fetched (and hopefully will be good enough to
prevent future code changes from adding extra lazy-fetches). In this
way, maybe we can at least reduce the scope of the big-hammer global.

I just sent out a patch [1].

[1] https://public-inbox.org/git/20191101203830.231676-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-07 18:18 [PATCH] fetch: delay fetch_if_missing=0 until after config Jonathan Tan
2019-10-23 21:03 ` Emily Shaffer
2019-10-23 21:44 ` [PATCH v2] " Jonathan Tan
2019-10-23 23:30   ` Emily Shaffer
2019-10-23 23:34 ` [PATCH v3] " Jonathan Tan
2019-10-24  4:30   ` Junio C Hamano
2019-10-24 19:18     ` Jonathan Tan
2019-10-25  2:38       ` Junio C Hamano
2019-10-25 17:41         ` Jonathan Tan
2019-10-29  1:39           ` Junio C Hamano
2019-11-01 20:43             ` Jonathan Tan

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git