git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH] send-pack: never fetch when checking exclusions
@ 2019-10-07 21:38 Jonathan Tan
  2019-10-08  4:10 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2019-10-07 21:38 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When building the packfile to be sent, send_pack() is given a list of
remote refs to be used as exclusions. For each ref, it first checks if
the ref exists locally, and if it does, passes it with a "^" prefix to
pack-objects. However, in a partial clone, the check may trigger a lazy
fetch. Ensure that this lazy fetch does not occur.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 send-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/send-pack.c b/send-pack.c
index 6dc16c3211..34c77cbb1a 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -40,7 +40,8 @@ int option_parse_push_signed(const struct option *opt,
 
 static void feed_object(const struct object_id *oid, FILE *fh, int negative)
 {
-	if (negative && !has_object_file(oid))
+	if (negative &&
+	    !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
 		return;
 
 	if (negative)
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] send-pack: never fetch when checking exclusions
  2019-10-07 21:38 [PATCH] send-pack: never fetch when checking exclusions Jonathan Tan
@ 2019-10-08  4:10 ` Junio C Hamano
  2019-10-08 18:37   ` [PATCH v2] " Jonathan Tan
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2019-10-08  4:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> When building the packfile to be sent, send_pack() is given a list of
> remote refs to be used as exclusions. For each ref, it first checks if
> the ref exists locally, and if it does, passes it with a "^" prefix to
> pack-objects. However, in a partial clone, the check may trigger a lazy
> fetch. Ensure that this lazy fetch does not occur.

Is there any effect worth describing here, other than the obvious
"we do not lazily fetch from within the has_object_file() function"?

For example, would this change mean that the resulting pack may
include stuff that are reachable from the (missing) negative objects
that would not otherwise have to be sent if these objects were
available (or made available by the lazy fetching), and we are
making a trade-off to send possibly more in order for not fetching?
Have we laid enough on the table to help readers if such a trade-off
(if we are making one, that is) strikes the right balance?

Thanks.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  send-pack.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/send-pack.c b/send-pack.c
> index 6dc16c3211..34c77cbb1a 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -40,7 +40,8 @@ int option_parse_push_signed(const struct option *opt,
>  
>  static void feed_object(const struct object_id *oid, FILE *fh, int negative)
>  {
> -	if (negative && !has_object_file(oid))
> +	if (negative &&
> +	    !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
>  		return;
>  
>  	if (negative)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-08  4:10 ` Junio C Hamano
@ 2019-10-08 18:37   ` " Jonathan Tan
  2019-10-11  6:12     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2019-10-08 18:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

When building the packfile to be sent, send_pack() is given a list of
remote refs to be used as exclusions. For each ref, it first checks if
the ref exists locally, and if it does, passes it with a "^" prefix to
pack-objects. However, in a partial clone, the check may trigger a lazy
fetch.

The additional commit ancestry information obtained during such fetches
may show that certain objects that would have been sent are already
known to the server, resulting in a smaller pack being sent. But this is
at the cost of fetching from many possibly unrelated refs, and the lazy
fetches do not help at all in the typical case where the client is
up-to-date with the upstream of the branch being pushed.

Ensure that these lazy fetches do not occur.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
> For example, would this change mean that the resulting pack may
> include stuff that are reachable from the (missing) negative objects
> that would not otherwise have to be sent if these objects were
> available (or made available by the lazy fetching), and we are
> making a trade-off to send possibly more in order for not fetching?
> Have we laid enough on the table to help readers if such a trade-off
> (if we are making one, that is) strikes the right balance?

Thanks for your comments. I've expanded the commit message.
---
 send-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/send-pack.c b/send-pack.c
index 6dc16c3211..34c77cbb1a 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -40,7 +40,8 @@ int option_parse_push_signed(const struct option *opt,
 
 static void feed_object(const struct object_id *oid, FILE *fh, int negative)
 {
-	if (negative && !has_object_file(oid))
+	if (negative &&
+	    !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
 		return;
 
 	if (negative)
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-08 18:37   ` [PATCH v2] " Jonathan Tan
@ 2019-10-11  6:12     ` Jeff King
  2019-10-11 12:31       ` Derrick Stolee
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2019-10-11  6:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster

On Tue, Oct 08, 2019 at 11:37:39AM -0700, Jonathan Tan wrote:

> When building the packfile to be sent, send_pack() is given a list of
> remote refs to be used as exclusions. For each ref, it first checks if
> the ref exists locally, and if it does, passes it with a "^" prefix to
> pack-objects. However, in a partial clone, the check may trigger a lazy
> fetch.
> 
> The additional commit ancestry information obtained during such fetches
> may show that certain objects that would have been sent are already
> known to the server, resulting in a smaller pack being sent. But this is
> at the cost of fetching from many possibly unrelated refs, and the lazy
> fetches do not help at all in the typical case where the client is
> up-to-date with the upstream of the branch being pushed.
> 
> Ensure that these lazy fetches do not occur.

That makes sense. For similar reasons, should we be using
OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that
we don't have, we'll end up re-scanning the pack directory over and over
(which is _usually_ pretty quick, but can be slow if you have a lot of
packs locally). And it's OK if we racily miss out on an exclusion due to
somebody else repacking simultaneously.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-11  6:12     ` Jeff King
@ 2019-10-11 12:31       ` Derrick Stolee
  2019-10-11 16:15         ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2019-10-11 12:31 UTC (permalink / raw)
  To: Jeff King, Jonathan Tan; +Cc: git, gitster

On 10/11/2019 2:12 AM, Jeff King wrote:
> On Tue, Oct 08, 2019 at 11:37:39AM -0700, Jonathan Tan wrote:
> 
>> When building the packfile to be sent, send_pack() is given a list of
>> remote refs to be used as exclusions. For each ref, it first checks if
>> the ref exists locally, and if it does, passes it with a "^" prefix to
>> pack-objects. However, in a partial clone, the check may trigger a lazy
>> fetch.
>>
>> The additional commit ancestry information obtained during such fetches
>> may show that certain objects that would have been sent are already
>> known to the server, resulting in a smaller pack being sent. But this is
>> at the cost of fetching from many possibly unrelated refs, and the lazy
>> fetches do not help at all in the typical case where the client is
>> up-to-date with the upstream of the branch being pushed.
>>
>> Ensure that these lazy fetches do not occur.
> 
> That makes sense. For similar reasons, should we be using
> OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that
> we don't have, we'll end up re-scanning the pack directory over and over
> (which is _usually_ pretty quick, but can be slow if you have a lot of
> packs locally). And it's OK if we racily miss out on an exclusion due to
> somebody else repacking simultaneously.

That's a good idea. We can hint to the object store that we don't expect
misses to be due to a concurrent repack, so we don't want to reprepare
pack-files.

-Stolee


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-11 12:31       ` Derrick Stolee
@ 2019-10-11 16:15         ` Jeff King
  2019-10-11 22:08           ` Jonathan Tan
  2019-10-12  0:47           ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2019-10-11 16:15 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jonathan Tan, git, gitster

On Fri, Oct 11, 2019 at 08:31:30AM -0400, Derrick Stolee wrote:

> >> Ensure that these lazy fetches do not occur.
> > 
> > That makes sense. For similar reasons, should we be using
> > OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that
> > we don't have, we'll end up re-scanning the pack directory over and over
> > (which is _usually_ pretty quick, but can be slow if you have a lot of
> > packs locally). And it's OK if we racily miss out on an exclusion due to
> > somebody else repacking simultaneously.
> 
> That's a good idea. We can hint to the object store that we don't expect
> misses to be due to a concurrent repack, so we don't want to reprepare
> pack-files.

As a general rule (and why I'm raising this issue in reply to Jonathan's
patch), I think most or all sites that want OBJECT_INFO_QUICK will want
SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
the same:

  - it's OK to racily have a false negative (we'll still be correct, but
    possibly a little less optimal)

  - it's expected and normal to be missing the object, so spending time
    double-checking the pack store wastes measurable time in real-world
    cases

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-11 16:15         ` Jeff King
@ 2019-10-11 22:08           ` Jonathan Tan
  2019-10-17  6:10             ` Jeff King
  2019-10-12  0:47           ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2019-10-11 22:08 UTC (permalink / raw)
  To: peff; +Cc: stolee, jonathantanmy, git, gitster

> As a general rule (and why I'm raising this issue in reply to Jonathan's
> patch), I think most or all sites that want OBJECT_INFO_QUICK will want
> SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
> the same:
> 
>   - it's OK to racily have a false negative (we'll still be correct, but
>     possibly a little less optimal)
> 
>   - it's expected and normal to be missing the object, so spending time
>     double-checking the pack store wastes measurable time in real-world
>     cases

I took a look on "next" and it's true for these reasons in most cases
but not all.

QUICK implies SKIP_FETCH_OBJECT:

  fetch-pack.c: Run with fetch_if_missing=0 (from builtin/fetch.c,
  builtin/fetch-pack.c, or through a lazy fetch) so OK.
  
  builtin/index-pack.c: Run with fetch_if_missing=0, so OK.
  
  builtin/fetch.c: Run with fetch_if_missing=0, so OK.
  
  object-store.h, sha1-file.c: Definition and implementation of this
  flag.

Everything is OK here. Now, SKIP_FETCH_OBJECT implies QUICK:

  cache-tree.c: I added this recently in f981ec18cf ("cache-tree: do not
  lazy-fetch tentative tree", 2019-09-09). No problem with a false
  negative, since we know how to reconstruct the tree. OK.
  
  object-store.h, sha1-file.c: Definition and implementation of this
  flag.
  
  send-pack.c: This patch (which is already in "next"). If we have a
  false negative, we might accidentally send more than we need. But that
  is not too bad.
  
  promisor-remote.c: This is the slightly tricky one. We use this
  information to determine if we got our lazily-fetched object from the
  most recent lazy fetch, or if we should continue attempting to fetch the
  given object from other promisor remotes; so this information is
  important. However, adding QUICK doesn't lose us anything because the
  lack of QUICK only helps us when there is another process packing
  loose objects: if we got our object, our object will be in a pack
  (because of the way the fetch is implemented - in particular, we need
  a pack because we need the ".promisor" file).

So everything is OK except for promisor-remote.c, but even that is OK
for another reason.

Having said that, perhaps we should consider promisor-remote.c as
low-level code and expect it to know that objects are fetched into a
packfile (as opposed to loose objects), so it can safely use QUICK
(which is documented as checking packed after packed and loose). If no
one disagrees, I can make such a patch after jt/push-avoid-lazy-fetch is
merged to master (as is the plan, according to What's Cooking [1]).

[1] https://public-inbox.org/git/xmqq8sprhgzc.fsf@gitster-ct.c.googlers.com/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-11 16:15         ` Jeff King
  2019-10-11 22:08           ` Jonathan Tan
@ 2019-10-12  0:47           ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2019-10-12  0:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Jonathan Tan, git

Jeff King <peff@peff.net> writes:

> As a general rule (and why I'm raising this issue in reply to Jonathan's
> patch), I think most or all sites that want OBJECT_INFO_QUICK will want
> SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
> the same:
>
>   - it's OK to racily have a false negative (we'll still be correct, but
>     possibly a little less optimal)
>
>   - it's expected and normal to be missing the object, so spending time
>     double-checking the pack store wastes measurable time in real-world
>     cases

31f5256c ("sha1-file: split OBJECT_INFO_FOR_PREFETCH", 2019-05-28)
separated SKIP_FETCH_OBJECT out of FOR_PREFETCH, the latter of which
was and is SKIP_FETCH and QUICK combined.  Use SKIP_FETCH_OBJECT
alone may need to be re-examined and discouraged?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] send-pack: never fetch when checking exclusions
  2019-10-11 22:08           ` Jonathan Tan
@ 2019-10-17  6:10             ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2019-10-17  6:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: stolee, git, gitster

On Fri, Oct 11, 2019 at 03:08:22PM -0700, Jonathan Tan wrote:

> > As a general rule (and why I'm raising this issue in reply to Jonathan's
> > patch), I think most or all sites that want OBJECT_INFO_QUICK will want
> > SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
> > the same:
> > 
> >   - it's OK to racily have a false negative (we'll still be correct, but
> >     possibly a little less optimal)
> > 
> >   - it's expected and normal to be missing the object, so spending time
> >     double-checking the pack store wastes measurable time in real-world
> >     cases
> 
> I took a look on "next" and it's true for these reasons in most cases
> but not all.

Thanks for digging into this.

> QUICK implies SKIP_FETCH_OBJECT:
> 
>   fetch-pack.c: Run with fetch_if_missing=0 (from builtin/fetch.c,
>   builtin/fetch-pack.c, or through a lazy fetch) so OK.
>   
>   builtin/index-pack.c: Run with fetch_if_missing=0, so OK.
>   
>   builtin/fetch.c: Run with fetch_if_missing=0, so OK.
>   
>   object-store.h, sha1-file.c: Definition and implementation of this
>   flag.

Right, I think going in this direction is pretty simple. Having been
marked with QUICK, they hit both of my points from above. And if we want
to avoid re-scanning the pack directory because of cost, we _definitely_
want to avoid making an expensive network call.

> Everything is OK here. Now, SKIP_FETCH_OBJECT implies QUICK:
> 
>   cache-tree.c: I added this recently in f981ec18cf ("cache-tree: do not
>   lazy-fetch tentative tree", 2019-09-09). No problem with a false
>   negative, since we know how to reconstruct the tree. OK.
> [...]
>   send-pack.c: This patch (which is already in "next"). If we have a
>   false negative, we might accidentally send more than we need. But that
>   is not too bad.

Yeah, I think both of these could be QUICK.

>   promisor-remote.c: This is the slightly tricky one. We use this
>   information to determine if we got our lazily-fetched object from the
>   most recent lazy fetch, or if we should continue attempting to fetch the
>   given object from other promisor remotes; so this information is
>   important. However, adding QUICK doesn't lose us anything because the
>   lack of QUICK only helps us when there is another process packing
>   loose objects: if we got our object, our object will be in a pack
>   (because of the way the fetch is implemented - in particular, we need
>   a pack because we need the ".promisor" file).
> 
> So everything is OK except for promisor-remote.c, but even that is OK
> for another reason.

Yeah, though I wouldn't be sad to see that use a separate flag, since it
really is about promisor logic.

That implies to me maybe we should be using QUICK more aggressively, and
QUICK should auto-imply SKIP_FETCH_OBJECT.

> Having said that, perhaps we should consider promisor-remote.c as
> low-level code and expect it to know that objects are fetched into a
> packfile (as opposed to loose objects), so it can safely use QUICK
> (which is documented as checking packed after packed and loose). If no
> one disagrees, I can make such a patch after jt/push-avoid-lazy-fetch is
> merged to master (as is the plan, according to What's Cooking [1]).

I think it's OK to continue leaving out QUICK there if it's not causing
problems. It really is a bit different than the other cases.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-07 21:38 [PATCH] send-pack: never fetch when checking exclusions Jonathan Tan
2019-10-08  4:10 ` Junio C Hamano
2019-10-08 18:37   ` [PATCH v2] " Jonathan Tan
2019-10-11  6:12     ` Jeff King
2019-10-11 12:31       ` Derrick Stolee
2019-10-11 16:15         ` Jeff King
2019-10-11 22:08           ` Jonathan Tan
2019-10-17  6:10             ` Jeff King
2019-10-12  0:47           ` Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox