git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] sha1-file: make pretend_object_file() not prefetch
@ 2020-07-21 22:50 Jonathan Tan
  2020-07-21 23:25 ` Junio C Hamano
  2020-07-21 23:27 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Jonathan Tan @ 2020-07-21 22:50 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When pretend_object_file() is invoked with an object that does not exist
(as is the typical case), there is no need to fetch anything from the
promisor remote, because the caller already knows what the object is
supposed to contain. Therefore, suppress the fetch. (The
OBJECT_INFO_QUICK flag is added for the same reason.)

This was noticed at $DAYJOB when "blame" was run on a file that had
uncommitted modifications.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This is another instance of the issue Junio raised here [1]:

> Fixing issues hit by real users reactively is a necessary and good
> thing, but this is not the first time we patch callers of
> has_object_file() for this kind of "we are merely trying to
> determine the boundary of what we have, so that we know what we need
> to add to this repository" queries, I am afraid.
>
> Perhaps it is a good idea to sweep all the hits from "git grep -e
> has_object_file \*.c" and audit the codebase to see if there are
> other problematic ones?

To make progress towards solving this, I'm thinking of making a new
function has_object() that takes a repository, an object ID, and flags,
and only supports one flag: HAS_OBJECT_RECHECK_PACKED (which has the
opposite meaning of OBJECT_INFO_QUICK). Checks that should not fetch
should use has_object(), and checks that should fetch (if they exist - I
think that most would want additional information about the object, so
they would use oid_object_info() or similar already) should use
oid_object_info() or a similar function. That way we can see how many
uses of has_object_file() we have checked, and at the same time make
behavior we usually want into the default behavior. Any opinions?

[1] https://lore.kernel.org/git/xmqqpn8wie21.fsf@gitster.c.googlers.com/
---
 sha1-file.c      |  3 ++-
 t/t8002-blame.sh | 11 +++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index ccd34dd9e8..45fdb8415c 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1600,7 +1600,8 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type,
 	struct cached_object *co;
 
 	hash_object_file(the_hash_algo, buf, len, type_name(type), oid);
-	if (has_object_file(oid) || find_cached_object(oid))
+	if (has_object_file_with_flags(oid, OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT) ||
+	    find_cached_object(oid))
 		return 0;
 	ALLOC_GROW(cached_objects, cached_object_nr + 1, cached_object_alloc);
 	co = &cached_objects[cached_object_nr++];
diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
index eea048e52c..2ed6aaae35 100755
--- a/t/t8002-blame.sh
+++ b/t/t8002-blame.sh
@@ -122,4 +122,15 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git blame --exclude-promisor-objects one
 '
 
+test_expect_success 'blame with uncommitted edits in partial clone does not crash' '
+	git init server &&
+	echo foo >server/file.txt &&
+	git -C server add file.txt &&
+	git -C server commit -m file &&
+
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	echo bar >>client/file.txt &&
+	git -C client blame file.txt
+'
+
 test_done
-- 
2.28.0.rc0.105.gf9edc3c819-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] sha1-file: make pretend_object_file() not prefetch
  2020-07-21 22:50 [PATCH] sha1-file: make pretend_object_file() not prefetch Jonathan Tan
@ 2020-07-21 23:25 ` Junio C Hamano
  2020-07-21 23:27 ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2020-07-21 23:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> To make progress towards solving this, I'm thinking of making a new
> function has_object() that takes a repository, an object ID, and flags,
> and only supports one flag: HAS_OBJECT_RECHECK_PACKED (which has the
> opposite meaning of OBJECT_INFO_QUICK). Checks that should not fetch
> should use has_object(), and checks that should fetch (if they exist - I
> think that most would want additional information about the object, so
> they would use oid_object_info() or similar already) should use
> oid_object_info() or a similar function. That way we can see how many
> uses of has_object_file() we have checked, and at the same time make
> behavior we usually want into the default behavior. Any opinions?

Yes, a new function (or a pair of functions) makes it easy to see
how much progress we made and also gives us easier greppability.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sha1-file: make pretend_object_file() not prefetch
  2020-07-21 22:50 [PATCH] sha1-file: make pretend_object_file() not prefetch Jonathan Tan
  2020-07-21 23:25 ` Junio C Hamano
@ 2020-07-21 23:27 ` Junio C Hamano
  2020-07-23 17:47   ` Jeff King
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2020-07-21 23:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> When pretend_object_file() is invoked with an object that does not exist
> (as is the typical case), there is no need to fetch anything from the
> promisor remote, because the caller already knows what the object is
> supposed to contain. Therefore, suppress the fetch. (The
> OBJECT_INFO_QUICK flag is added for the same reason.)

Yes, "pretend" is also a way to lie about the contents IIRC, so even
if the object is available elsewhere, we should *not* fetch from the
promisor.  Makes sense to me.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sha1-file: make pretend_object_file() not prefetch
  2020-07-21 23:27 ` Junio C Hamano
@ 2020-07-23 17:47   ` Jeff King
  2020-07-23 18:06     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2020-07-23 17:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

On Tue, Jul 21, 2020 at 04:27:15PM -0700, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > When pretend_object_file() is invoked with an object that does not exist
> > (as is the typical case), there is no need to fetch anything from the
> > promisor remote, because the caller already knows what the object is
> > supposed to contain. Therefore, suppress the fetch. (The
> > OBJECT_INFO_QUICK flag is added for the same reason.)
> 
> Yes, "pretend" is also a way to lie about the contents IIRC, so even
> if the object is available elsewhere, we should *not* fetch from the
> promisor.  Makes sense to me.

I agree this patch is fine, but I wonder if it could go even further. If
we are pretending some particular contents, shouldn't we override
anything that might be in the object database? I.e., could we eliminate
this has_object_file() entirely?

That should be OK for the same reason that it's OK to use QUICK.

There's only one caller of this function (git-blame), which I think
would be happy with such a change.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sha1-file: make pretend_object_file() not prefetch
  2020-07-23 17:47   ` Jeff King
@ 2020-07-23 18:06     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2020-07-23 18:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git

Jeff King <peff@peff.net> writes:

> I agree this patch is fine, but I wonder if it could go even further. If
> we are pretending some particular contents, shouldn't we override
> anything that might be in the object database? I.e., could we eliminate
> this has_object_file() entirely?
>
> That should be OK for the same reason that it's OK to use QUICK.
>
> There's only one caller of this function (git-blame), which I think
> would be happy with such a change.

I actually have to take that "we could even lie to say that content
that does not hash to X is object X" back---that was never the
intention of this mechanism.

It was to ensure that operations that are supposedly read-only can
avoid writing into the repository---"blame" uses it to pretend as if
the working tree file already has a corresponding object in the
object store, IIRC.

The only reason why hash_object_file() is used here is to allow us
discarding the memory held for that working tree copy if it happens
to match what is stored in the object database.  The saving would be
within a few hundred kilobytes to a single digit megabyte range at
most, hopefully, so we could drop it (oh, saying "a single digit
megabyte" reminds me that my first Linux computer was 486dx with
4MB ram---on that box, it may have mattered).


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-23 18:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21 22:50 [PATCH] sha1-file: make pretend_object_file() not prefetch Jonathan Tan
2020-07-21 23:25 ` Junio C Hamano
2020-07-21 23:27 ` Junio C Hamano
2020-07-23 17:47   ` Jeff King
2020-07-23 18:06     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).