git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] apply: do not fetch when checking object existence
@ 2020-07-28  1:04 Jonathan Tan
  2020-07-28  1:19 ` Junio C Hamano
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
  0 siblings, 2 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-07-28  1:04 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

There have been a few bugs wherein Git fetches missing objects whenever
the existence of an object is checked, even though it does not need to
perform such a fetch. To resolve these bugs, we could look at all the
places that has_object_file() (or a similar function) is used. As a
first step, introduce a new function has_object() that checks for the
existence of an object, with a default behavior of not fetching if the
object is missing and the repository is a partial clone. As we verify
each has_object_file() (or similar) usage, we can replace it with
has_object(), and we will know that we are done when we can delete
has_object_file() (and the other similar functions).

Also, the new function has_object() has more appropriate defaults:
besides not fetching, it also does not recheck packed storage.

Then, use this new function to fix a bug in apply.c.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
I mentioned the idea for this change here:
https://lore.kernel.org/git/20200721225020.1352772-1-jonathantanmy@google.com/

I included the same have_repository check introduced in 3e8b7d3c77
("has_sha1_file: don't bother if we are not in a repository",
2017-04-13), but I couldn't verify if it is still needed. In particular,
even at that commit, all the tests pass (after I make a small change
to an irrelevant test about the order of entries in a cookie file).
---
 apply.c        |  2 +-
 object-store.h | 25 +++++++++++++++++++++++--
 sha1-file.c    | 12 ++++++++++++
 t/t4150-am.sh  | 16 ++++++++++++++++
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/apply.c b/apply.c
index 8bff604dbe..402d80602a 100644
--- a/apply.c
+++ b/apply.c
@@ -3178,7 +3178,7 @@ static int apply_binary(struct apply_state *state,
 		return 0; /* deletion patch */
 	}
 
-	if (has_object_file(&oid)) {
+	if (has_object(the_repository, &oid, 0)) {
 		/* We already have the postimage */
 		enum object_type type;
 		unsigned long size;
diff --git a/object-store.h b/object-store.h
index f439d47af8..c4fc9dd74e 100644
--- a/object-store.h
+++ b/object-store.h
@@ -239,12 +239,33 @@ int read_loose_object(const char *path,
 		      unsigned long *size,
 		      void **contents);
 
+/* Retry packed storage after checking packed and loose storage */
+#define HAS_OBJECT_RECHECK_PACKED 1
+
+/*
+ * Returns 1 if the object exists. This function will not lazily fetch objects
+ * in a partial clone.
+ */
+int has_object(struct repository *r, const struct object_id *oid,
+	       unsigned flags);
+
+/*
+ * These macros and functions are deprecated. If checking existence for an
+ * object that is likely to be missing and/or whose absence is relatively
+ * inconsequential (or is consequential but the caller is prepared to handle
+ * it), use has_object(), which has better defaults (no lazy fetch in a partial
+ * clone and no rechecking of packed storage). In the unlikely event that a
+ * caller needs to assert existence of an object that it fully expects to
+ * exist, and wants to trigger a lazy fetch in a partial clone, use
+ * oid_object_info_extended() with a NULL struct object_info.
+ *
+ * These functions can be removed once all callers have migrated to
+ * has_object() and/or oid_object_info_extended().
+ */
 #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
 #define has_sha1_file_with_flags(sha1, flags) repo_has_sha1_file_with_flags(the_repository, sha1, flags)
 #define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
 #endif
-
-/* Same as the above, except for struct object_id. */
 int repo_has_object_file(struct repository *r, const struct object_id *oid);
 int repo_has_object_file_with_flags(struct repository *r,
 				    const struct object_id *oid, int flags);
diff --git a/sha1-file.c b/sha1-file.c
index ccd34dd9e8..ff444d7abb 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1988,6 +1988,18 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	return ret;
 }
 
+int has_object(struct repository *r, const struct object_id *oid,
+	       unsigned flags)
+{
+	int quick = !(flags & HAS_OBJECT_RECHECK_PACKED);
+	unsigned object_info_flags = OBJECT_INFO_SKIP_FETCH_OBJECT |
+		(quick ? OBJECT_INFO_QUICK : 0);
+
+	if (!startup_info->have_repository)
+		return 0;
+	return oid_object_info_extended(r, oid, NULL, object_info_flags) >= 0;
+}
+
 int repo_has_object_file_with_flags(struct repository *r,
 				    const struct object_id *oid, int flags)
 {
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index bda4586a79..94a2c76522 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -1133,4 +1133,20 @@ test_expect_success 'am and .gitattibutes' '
 	)
 '
 
+test_expect_success 'apply binary blob in partial clone' '
+	printf "\\000" >binary &&
+	git add binary &&
+	git commit -m "binary blob" &&
+	git format-patch --stdout -m HEAD^ >patch &&
+
+	test_create_repo server &&
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	test_when_finished "rm -rf client" &&
+
+	# Exercise to make sure that it works
+	git -C client am ../patch
+'
+
 test_done
-- 
2.28.0.rc0.142.g3c755180ce-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] apply: do not fetch when checking object existence
  2020-07-28  1:04 [PATCH] apply: do not fetch when checking object existence Jonathan Tan
@ 2020-07-28  1:19 ` Junio C Hamano
  2020-07-28 18:23   ` Jonathan Tan
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2020-07-28  1:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> There have been a few bugs wherein Git fetches missing objects whenever
> the existence of an object is checked, even though it does not need to
> perform such a fetch. To resolve these bugs, we could look at all the
> places that has_object_file() (or a similar function) is used. As a
> first step, introduce a new function has_object() that checks for the
> existence of an object, with a default behavior of not fetching if the
> object is missing and the repository is a partial clone. As we verify
> each has_object_file() (or similar) usage, we can replace it with
> has_object(), and we will know that we are done when we can delete
> has_object_file() (and the other similar functions).

I wonder if we want to name the two (i.e. one variant that refuses
to go to network because it is trying to see if a lazy fetch is
needed, and the other that goes to network behind caller's back for
ease of use in a lazy clone) a bit more distinctly so that which one
could potentially go outside.

Depending on one's view which one is _normal_ access pattern, giving
an explicit adverb to one variant while leaving the other one bland
might be sufficient.  For example, I _think_ most of the places do
not want to handle the details of lazily fetching themselves, and I
suspect that the traditional has_object_file() semantics without "do
not trigger lazy fetch" option would be the normal access pattern.

In which case, renaming your new "has_object" to something like
"has_object_locally()" would be a good name for a special case
codepath that wants to care---if the object does not exist locally
and needs to be obtained lazily from elsewhere, the function would
say "no".

And all the other names like has_object_file() that by default gives
callers a transparent access to lazily fetched objects can stay the
same.

> I mentioned the idea for this change here:
> https://lore.kernel.org/git/20200721225020.1352772-1-jonathantanmy@google.com/

Yup, I think that is going in a good direction.  I suspect that
apply will not be the only remaining case we need to "fix", and
using the new helper function, codepaths that have already been
"fixed" by passing "do not lazily fetch" option to the traditional
API functions would become easier to read.  And if that is the case,
let's have the introduction of the helper function as a separate
patch, with each of [PATCH 2-N/N] be a fix for separate codepaths.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] apply: do not fetch when checking object existence
  2020-07-28  1:19 ` Junio C Hamano
@ 2020-07-28 18:23   ` Jonathan Tan
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-07-28 18:23 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > There have been a few bugs wherein Git fetches missing objects whenever
> > the existence of an object is checked, even though it does not need to
> > perform such a fetch. To resolve these bugs, we could look at all the
> > places that has_object_file() (or a similar function) is used. As a
> > first step, introduce a new function has_object() that checks for the
> > existence of an object, with a default behavior of not fetching if the
> > object is missing and the repository is a partial clone. As we verify
> > each has_object_file() (or similar) usage, we can replace it with
> > has_object(), and we will know that we are done when we can delete
> > has_object_file() (and the other similar functions).
> 
> I wonder if we want to name the two (i.e. one variant that refuses
> to go to network because it is trying to see if a lazy fetch is
> needed, and the other that goes to network behind caller's back for
> ease of use in a lazy clone) a bit more distinctly so that which one
> could potentially go outside.
> 
> Depending on one's view which one is _normal_ access pattern, giving
> an explicit adverb to one variant while leaving the other one bland
> might be sufficient.  For example, I _think_ most of the places do
> not want to handle the details of lazily fetching themselves, and I
> suspect that the traditional has_object_file() semantics without "do
> not trigger lazy fetch" option would be the normal access pattern.

Right now, I think that most (if not all) places would not want to fetch
at all - so *with* "do not trigger lazy fetch" would be the normal
access pattern. This is because (in my opinion) if a caller checks the
existence of an object, it most likely can tolerate the object's
absence; if the caller couldn't tolerate it, it would just directly
query for its type or contents or something like that.

I tried to communicate this in my documentation of the deprecated
functions/macros, but perhaps it could be written better.

(One other option to consider is to just change has_object_file() to
never fetch, although I think this is more risky.)

> In which case, renaming your new "has_object" to something like
> "has_object_locally()" would be a good name for a special case
> codepath that wants to care---if the object does not exist locally
> and needs to be obtained lazily from elsewhere, the function would
> say "no".
> 
> And all the other names like has_object_file() that by default gives
> callers a transparent access to lazily fetched objects can stay the
> same.

If my analysis above is wrong, then yes I agree that we should do this.
But we might need to find another way to indicate which has_object_file()
has been checked and which hasn't - changing away from has_object_file()
completely gives us a way to indicate this, but if we're sticking with
has_object_file(), we have to find another way of indicating that we've
looked at this call and it is OK.

> > I mentioned the idea for this change here:
> > https://lore.kernel.org/git/20200721225020.1352772-1-jonathantanmy@google.com/
> 
> Yup, I think that is going in a good direction.  I suspect that
> apply will not be the only remaining case we need to "fix", and
> using the new helper function, codepaths that have already been
> "fixed" by passing "do not lazily fetch" option to the traditional
> API functions would become easier to read.  And if that is the case,
> let's have the introduction of the helper function as a separate
> patch, with each of [PATCH 2-N/N] be a fix for separate codepaths.
> 
> Thanks.

OK - I'll separate out the helper function into its own patch in version
2.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes
  2020-07-28  1:04 [PATCH] apply: do not fetch when checking object existence Jonathan Tan
  2020-07-28  1:19 ` Junio C Hamano
@ 2020-08-05 23:06 ` Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 1/4] sha1-file: introduce no-lazy-fetch has_object() Jonathan Tan
                     ` (4 more replies)
  1 sibling, 5 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-08-05 23:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

Since v1, I've took a look at 3 other cases that use has_object_file():
2 don't need lazy-fetching (patches 3 and 4 in this set) and 1 does -
"cat-file -e" - although I think this is a special case. So I still
think that not lazy-fetching when checking object existence is more
likely, and should be privileged with the shorter function name
(has_object() instead of has_object_locally()).

Changes from v1:
 - Patch split into 2 (patch 1 and patch 2)
 - 2 additional patches that fix bugs by making use of the new function

Jonathan Tan (4):
  sha1-file: introduce no-lazy-fetch has_object()
  apply: do not lazy fetch when applying binary
  pack-objects: no fetch when allow-{any,promisor}
  fsck: do not lazy fetch known non-promisor object

 Documentation/git-pack-objects.txt | 11 +++++++----
 apply.c                            |  2 +-
 builtin/fsck.c                     |  2 +-
 builtin/pack-objects.c             |  4 ++--
 object-store.h                     | 25 +++++++++++++++++++++++--
 sha1-file.c                        | 12 ++++++++++++
 t/t4150-am.sh                      | 16 ++++++++++++++++
 7 files changed, 62 insertions(+), 10 deletions(-)

-- 
2.28.0.236.gb10cc79966-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/4] sha1-file: introduce no-lazy-fetch has_object()
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
@ 2020-08-05 23:06   ` Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 2/4] apply: do not lazy fetch when applying binary Jonathan Tan
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-08-05 23:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

There have been a few bugs wherein Git fetches missing objects whenever
the existence of an object is checked, even though it does not need to
perform such a fetch. To resolve these bugs, we could look at all the
places that has_object_file() (or a similar function) is used. As a
first step, introduce a new function has_object() that checks for the
existence of an object, with a default behavior of not fetching if the
object is missing and the repository is a partial clone. As we verify
each has_object_file() (or similar) usage, we can replace it with
has_object(), and we will know that we are done when we can delete
has_object_file() (and the other similar functions).

Also, the new function has_object() has more appropriate defaults:
besides not fetching, it also does not recheck packed storage.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-store.h | 25 +++++++++++++++++++++++--
 sha1-file.c    | 12 ++++++++++++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/object-store.h b/object-store.h
index f439d47af8..c4fc9dd74e 100644
--- a/object-store.h
+++ b/object-store.h
@@ -239,12 +239,33 @@ int read_loose_object(const char *path,
 		      unsigned long *size,
 		      void **contents);
 
+/* Retry packed storage after checking packed and loose storage */
+#define HAS_OBJECT_RECHECK_PACKED 1
+
+/*
+ * Returns 1 if the object exists. This function will not lazily fetch objects
+ * in a partial clone.
+ */
+int has_object(struct repository *r, const struct object_id *oid,
+	       unsigned flags);
+
+/*
+ * These macros and functions are deprecated. If checking existence for an
+ * object that is likely to be missing and/or whose absence is relatively
+ * inconsequential (or is consequential but the caller is prepared to handle
+ * it), use has_object(), which has better defaults (no lazy fetch in a partial
+ * clone and no rechecking of packed storage). In the unlikely event that a
+ * caller needs to assert existence of an object that it fully expects to
+ * exist, and wants to trigger a lazy fetch in a partial clone, use
+ * oid_object_info_extended() with a NULL struct object_info.
+ *
+ * These functions can be removed once all callers have migrated to
+ * has_object() and/or oid_object_info_extended().
+ */
 #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
 #define has_sha1_file_with_flags(sha1, flags) repo_has_sha1_file_with_flags(the_repository, sha1, flags)
 #define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
 #endif
-
-/* Same as the above, except for struct object_id. */
 int repo_has_object_file(struct repository *r, const struct object_id *oid);
 int repo_has_object_file_with_flags(struct repository *r,
 				    const struct object_id *oid, int flags);
diff --git a/sha1-file.c b/sha1-file.c
index ccd34dd9e8..ff444d7abb 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1988,6 +1988,18 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	return ret;
 }
 
+int has_object(struct repository *r, const struct object_id *oid,
+	       unsigned flags)
+{
+	int quick = !(flags & HAS_OBJECT_RECHECK_PACKED);
+	unsigned object_info_flags = OBJECT_INFO_SKIP_FETCH_OBJECT |
+		(quick ? OBJECT_INFO_QUICK : 0);
+
+	if (!startup_info->have_repository)
+		return 0;
+	return oid_object_info_extended(r, oid, NULL, object_info_flags) >= 0;
+}
+
 int repo_has_object_file_with_flags(struct repository *r,
 				    const struct object_id *oid, int flags)
 {
-- 
2.28.0.236.gb10cc79966-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 2/4] apply: do not lazy fetch when applying binary
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 1/4] sha1-file: introduce no-lazy-fetch has_object() Jonathan Tan
@ 2020-08-05 23:06   ` Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 3/4] pack-objects: no fetch when allow-{any,promisor} Jonathan Tan
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-08-05 23:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

When applying a binary patch, as an optimization, "apply" checks if the
postimage is already present. During this fetch, it is perfectly
expected for the postimage not to be present, so there is no need to
lazy-fetch missing objects. Teach "apply" not to lazy-fetch in this
case.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 apply.c       |  2 +-
 t/t4150-am.sh | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index 8bff604dbe..402d80602a 100644
--- a/apply.c
+++ b/apply.c
@@ -3178,7 +3178,7 @@ static int apply_binary(struct apply_state *state,
 		return 0; /* deletion patch */
 	}
 
-	if (has_object_file(&oid)) {
+	if (has_object(the_repository, &oid, 0)) {
 		/* We already have the postimage */
 		enum object_type type;
 		unsigned long size;
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index bda4586a79..94a2c76522 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -1133,4 +1133,20 @@ test_expect_success 'am and .gitattibutes' '
 	)
 '
 
+test_expect_success 'apply binary blob in partial clone' '
+	printf "\\000" >binary &&
+	git add binary &&
+	git commit -m "binary blob" &&
+	git format-patch --stdout -m HEAD^ >patch &&
+
+	test_create_repo server &&
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	test_when_finished "rm -rf client" &&
+
+	# Exercise to make sure that it works
+	git -C client am ../patch
+'
+
 test_done
-- 
2.28.0.236.gb10cc79966-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 3/4] pack-objects: no fetch when allow-{any,promisor}
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 1/4] sha1-file: introduce no-lazy-fetch has_object() Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 2/4] apply: do not lazy fetch when applying binary Jonathan Tan
@ 2020-08-05 23:06   ` Jonathan Tan
  2020-08-05 23:06   ` [PATCH v2 4/4] fsck: do not lazy fetch known non-promisor object Jonathan Tan
  2020-08-06 20:00   ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Junio C Hamano
  4 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-08-05 23:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

The options --missing=allow-{any,promisor} were introduced in caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:

    This patch introduces handling of missing objects to help
    debugging and development of the "partial clone" mechanism,
    and once the mechanism is implemented, for a power user to
    perform operations that are missing-object aware without
    incurring the cost of checking if a missing link is expected.

The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) are assumed in later commits such as 07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).

However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-pack-objects.txt | 11 +++++++----
 builtin/pack-objects.c             |  4 ++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index eaa2f2a404..54d715ead1 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -270,15 +270,18 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
 	This option specifies how missing objects are handled.
 +
 The form '--missing=error' requests that pack-objects stop with an error if
-a missing object is encountered.  This is the default action.
+a missing object is encountered.  If the repository is a partial clone, an
+attempt to fetch missing objects will be made before declaring them missing.
+This is the default action.
 +
 The form '--missing=allow-any' will allow object traversal to continue
-if a missing object is encountered.  Missing objects will silently be
-omitted from the results.
+if a missing object is encountered.  No fetch of a missing object will occur.
+Missing objects will silently be omitted from the results.
 +
 The form '--missing=allow-promisor' is like 'allow-any', but will only
 allow object traversal to continue for EXPECTED promisor missing objects.
-Unexpected missing object will raise an error.
+No fetch of a missing object will occur.  An unexpected missing object will
+raise an error.
 
 --exclude-promisor-objects::
 	Omit objects that are known to be in the promisor remote.  (This
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7016b28485..2a2afb7cb1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3016,7 +3016,7 @@ static void show_object__ma_allow_any(struct object *obj, const char *name, void
 	 * Quietly ignore ALL missing objects.  This avoids problems with
 	 * staging them now and getting an odd error later.
 	 */
-	if (!has_object_file(&obj->oid))
+	if (!has_object(the_repository, &obj->oid, 0))
 		return;
 
 	show_object(obj, name, data);
@@ -3030,7 +3030,7 @@ static void show_object__ma_allow_promisor(struct object *obj, const char *name,
 	 * Quietly ignore EXPECTED missing objects.  This avoids problems with
 	 * staging them now and getting an odd error later.
 	 */
-	if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid))
+	if (!has_object(the_repository, &obj->oid, 0) && is_promisor_object(&obj->oid))
 		return;
 
 	show_object(obj, name, data);
-- 
2.28.0.236.gb10cc79966-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 4/4] fsck: do not lazy fetch known non-promisor object
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
                     ` (2 preceding siblings ...)
  2020-08-05 23:06   ` [PATCH v2 3/4] pack-objects: no fetch when allow-{any,promisor} Jonathan Tan
@ 2020-08-05 23:06   ` Jonathan Tan
  2020-08-06 20:00   ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Junio C Hamano
  4 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2020-08-05 23:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

There is a call to has_object_file(), which lazily fetches missing
objects in a partial clone, when the object is known to not be
a promisor object. Change that call to has_object(), which does not do
any lazy fetching.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fsck.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 37aa07da78..fbf26cafcf 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -168,7 +168,7 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 		return 0;
 
 	if (!(obj->flags & HAS_OBJ)) {
-		if (parent && !has_object_file(&obj->oid)) {
+		if (parent && !has_object(the_repository, &obj->oid, 1)) {
 			printf_ln(_("broken link from %7s %s\n"
 				    "              to %7s %s"),
 				  printable_type(&parent->oid, parent->type),
-- 
2.28.0.236.gb10cc79966-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes
  2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
                     ` (3 preceding siblings ...)
  2020-08-05 23:06   ` [PATCH v2 4/4] fsck: do not lazy fetch known non-promisor object Jonathan Tan
@ 2020-08-06 20:00   ` Junio C Hamano
  4 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2020-08-06 20:00 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Since v1, I've took a look at 3 other cases that use has_object_file():
> 2 don't need lazy-fetching (patches 3 and 4 in this set) and 1 does -
> "cat-file -e" - although I think this is a special case. So I still
> think that not lazy-fetching when checking object existence is more
> likely, and should be privileged with the shorter function name
> (has_object() instead of has_object_locally()).

Thanks.  has_object then.

>
> Changes from v1:
>  - Patch split into 2 (patch 1 and patch 2)
>  - 2 additional patches that fix bugs by making use of the new function
>
> Jonathan Tan (4):
>   sha1-file: introduce no-lazy-fetch has_object()
>   apply: do not lazy fetch when applying binary
>   pack-objects: no fetch when allow-{any,promisor}
>   fsck: do not lazy fetch known non-promisor object
>
>  Documentation/git-pack-objects.txt | 11 +++++++----
>  apply.c                            |  2 +-
>  builtin/fsck.c                     |  2 +-
>  builtin/pack-objects.c             |  4 ++--
>  object-store.h                     | 25 +++++++++++++++++++++++--
>  sha1-file.c                        | 12 ++++++++++++
>  t/t4150-am.sh                      | 16 ++++++++++++++++
>  7 files changed, 62 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-08-06 20:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28  1:04 [PATCH] apply: do not fetch when checking object existence Jonathan Tan
2020-07-28  1:19 ` Junio C Hamano
2020-07-28 18:23   ` Jonathan Tan
2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 1/4] sha1-file: introduce no-lazy-fetch has_object() Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 2/4] apply: do not lazy fetch when applying binary Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 3/4] pack-objects: no fetch when allow-{any,promisor} Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 4/4] fsck: do not lazy fetch known non-promisor object Jonathan Tan
2020-08-06 20:00   ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git