git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, peff@peff.net, gitster@pobox.com
Subject: Re: [PATCH v3 3/4] object-file: emit corruption errors when detected
Date: Fri, 09 Dec 2022 15:19:44 +0100	[thread overview]
Message-ID: <221209.86y1rg63tt.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <7c9ed861e7431352df864c8d2c3bec7dee6e3905.1670532905.git.jonathantanmy@google.com>


On Thu, Dec 08 2022, Jonathan Tan wrote:

> Instead of relying on errno being preserved across function calls, teach
> do_oid_object_info_extended() to itself report object corruption when
> it first detects it. There are 3 types of corruption being detected:
>  - when a replacement object is missing
>  - when a loose object is corrupt
>  - when a packed object is corrupt and the object cannot be read
>    in another way
>
> Note that in the RHS of this patch's diff, a check for ENOENT that was
> introduced in 3ba7a06552 (A loose object is not corrupt if it cannot
> be read due to EMFILE, 2010-10-28) is also removed. The purpose of this
> check is to avoid a false report of corruption if the errno contains
> something like EMFILE (or anything that is not ENOENT), in which case
> a more generic report is presented. Because, as of this patch, we no
> longer rely on such a heuristic to determine corruption, but surface
> the error message at the point when we read something that we did not
> expect, this check is no longer necessary.
>
> Besides being more resilient, this also prepares for a future patch in
> which an indirect caller of do_oid_object_info_extended() will need
> such functionality.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  object-file.c  | 48 ++++++++++++++++++++++--------------------------
>  object-store.h |  3 +++
>  2 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index d99d05839f..f166065f32 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1433,6 +1433,7 @@ static int loose_object_info(struct repository *r,
>  {
>  	int status = 0;
>  	unsigned long mapsize;
> +	const char *path = NULL;

I think the NULL assignment here should either go, or it's incomplete.

Below you chechk the return value of "map", so let's either trust it to
populate "path" if it's returning success (which it does), *or* not
trust it, init this to NULL, and below add....

>  	void *map;
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
> @@ -1464,7 +1465,7 @@ static int loose_object_info(struct repository *r,
>  		return 0;
>  	}
>  
> -	map = map_loose_object(r, oid, &mapsize);
> +	map = map_loose_object_1(r, oid, &mapsize, &path);
>  	if (!map)
>  		return -1;

....

	if (!path)
		BUG("map_loose_object_1 should have given us a path");

But I don't think it's good to just hide the potential difference
between the two, especially as....

> @@ -1502,6 +1503,10 @@ static int loose_object_info(struct repository *r,
>  		break;
>  	}
>  
> +	if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
> +		die(_("loose object %s (stored in %s) is corrupt"),
> +		    oid_to_hex(oid), path);

...I think this is leftover from a previous round (or maybe not, I
didn't check) where we did a free(path), but here we'd end up
segfaulting (not on glibc, but some platforms) if we have a NULL path.

So init-ing it didn't help us, but just helps to hide that potential
(and much worse) bug.

I think this change should also remove the existing "const char *path"
in this function from the "if"'d scope omitted in this context.

The C compiler won't care, but to the human reader it's easier to reason
about not shadowing the variable now, for as it turns out no reason, as
they're effectively independent.

>  	git_inflate_end(&stream);
>  cleanup:
>  	munmap(map, mapsize);
> @@ -1611,6 +1616,15 @@ static int do_oid_object_info_extended(struct repository *r,
>  			continue;
>  		}
>  
> +		if (flags & OBJECT_INFO_DIE_IF_CORRUPT) {
> +			const struct packed_git *p;

Nit: add an extra \n here, between decls and code.

> -	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
> +	if (oid_object_info_extended(r, oid, &oi,
> +				     die_if_corrupt ? OBJECT_INFO_DIE_IF_CORRUPT : 0)
> +	    < 0)
>  		return NULL;
>  	return content;
>  }

This is a very odd coding style/wrapping, to not even end up with a line
shorter than 79 characters. You can instead do:

	if (oid_object_info_extended(r, oid, &oi, die_if_corrupt
				     ? OBJECT_INFO_DIE_IF_CORRUPT : 0) < 0)

Which is in line with our usual style, and does wrap before 79 characters...

> diff --git a/object-store.h b/object-store.h
> index b1ec0bde82..98c1d67946 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -445,6 +445,9 @@ struct object_info {
>   */
>  #define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
>  
> +/* Die if object corruption (not just an object being missing) was detected. */
> +#define OBJECT_INFO_DIE_IF_CORRUPT 32

Personally I wouldn't mind a short cleanup step in this series to change
these to 1<<0, 1<<1 etc., as we do for almost everything els.

I.e. in an earlier step you removed the "16", and changed that "32" to
"16", now we're adding a "32" again.

I also notice that you didn't just add a "4" here, which is an existing
gap, which turns out to be a leftover bit from your 9c8a294a1ae
(sha1-file: remove OBJECT_INFO_SKIP_CACHED, 2020-01-02) ~2 years ago :)

Maybe it's too much, and we could do it later, but something like this
as a first step:

diff --git a/object-file.c b/object-file.c
index 26290554bb4..48eff3850f5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1534,7 +1534,8 @@ int fetch_if_missing = 1;
 
 static int do_oid_object_info_extended(struct repository *r,
 				       const struct object_id *oid,
-				       struct object_info *oi, unsigned flags)
+				       struct object_info *oi,
+				       enum object_info_flags flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
 	struct cached_object *co;
@@ -1633,7 +1634,7 @@ static int do_oid_object_info_extended(struct repository *r,
 }
 
 int oid_object_info_extended(struct repository *r, const struct object_id *oid,
-			     struct object_info *oi, unsigned flags)
+			     struct object_info *oi, enum object_info_flags flags)
 {
 	int ret;
 	obj_read_lock();
diff --git a/object-store.h b/object-store.h
index 1be57abaf10..a20e00395b9 100644
--- a/object-store.h
+++ b/object-store.h
@@ -428,28 +428,31 @@ struct object_info {
  */
 #define OBJECT_INFO_INIT { 0 }
 
-/* Invoke lookup_replace_object() on the given hash */
-#define OBJECT_INFO_LOOKUP_REPLACE 1
-/* Allow reading from a loose object file of unknown/bogus type */
-#define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
-/* Do not retry packed storage after checking packed and loose storage */
-#define OBJECT_INFO_QUICK 8
-/* Do not check loose object */
-#define OBJECT_INFO_IGNORE_LOOSE 16
-/*
- * Do not attempt to fetch the object if missing (even if fetch_is_missing is
- * nonzero).
- */
-#define OBJECT_INFO_SKIP_FETCH_OBJECT 32
-/*
- * This is meant for bulk prefetching of missing blobs in a partial
- * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
- */
-#define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
+enum object_info_flags {
+	/* Invoke lookup_replace_object() on the given hash */
+	OBJECT_INFO_LOOKUP_REPLACE = 1<<0,
+	/* Allow reading from a loose object file of unknown/bogus type */
+	OBJECT_INFO_ALLOW_UNKNOWN_TYPE = 1<<1,
+	/* Do not retry packed storage after checking packed and loose storage */
+	OBJECT_INFO_QUICK = 1<<2,
+	/* Do not check loose object */
+	OBJECT_INFO_IGNORE_LOOSE = 1<<3,
+	/*
+	 * Do not attempt to fetch the object if missing (even if fetch_is_missing is
+	 * nonzero).
+	 */
+	OBJECT_INFO_SKIP_FETCH_OBJECT = 1<<4,
+	/*
+	 * This is meant for bulk prefetching of missing blobs in a partial
+	 * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
+	 */
+	OBJECT_INFO_FOR_PREFETCH = (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK),
+};
 
 int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
-			     struct object_info *, unsigned flags);
+			     struct object_info *,
+			     enum object_info_flags flags);
 
 /*
  * Iterate over the files in the loose-object parts of the object


  parent reply	other threads:[~2022-12-09 14:37 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-30 20:30 [PATCH 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-11-30 20:30 ` [PATCH 1/4] object-file: reread object with exact same args Jonathan Tan
2022-11-30 20:30 ` [PATCH 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-11-30 20:47   ` Jeff King
2022-11-30 23:42     ` Junio C Hamano
2022-12-01 19:06       ` Jonathan Tan
2022-11-30 20:30 ` [PATCH 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-11-30 20:54   ` Jeff King
2022-11-30 20:30 ` [PATCH 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-11-30 21:04   ` Jeff King
2022-12-01 19:11     ` Jonathan Tan
2022-12-01 19:33       ` Jeff King
2022-11-30 23:56   ` Junio C Hamano
2022-11-30 21:06 ` [PATCH 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 19:27 ` [PATCH v2 " Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 1/4] object-file: reread object with exact same args Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-01 19:54   ` [PATCH v2 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 21:26     ` Jonathan Tan
2022-12-02  0:23       ` Jeff King
2022-12-06  0:49         ` Jonathan Tan
2022-12-06  2:03           ` Jeff King
2022-12-01 23:09     ` Junio C Hamano
2022-12-07  0:40 ` [PATCH v2 0/3] " Jonathan Tan
2022-12-07  0:40   ` [PATCH v2 1/3] object-file: don't exit early if skipping loose Jonathan Tan
2022-12-07  1:12     ` Junio C Hamano
2022-12-07  6:14       ` Jeff King
2022-12-07  6:43         ` Junio C Hamano
2022-12-07 23:20           ` Jonathan Tan
2022-12-07  0:40   ` [PATCH v2 2/3] object-file: emit corruption errors when detected Jonathan Tan
2022-12-07  1:16     ` Junio C Hamano
2022-12-07  4:05     ` Ævar Arnfjörð Bjarmason
2022-12-07  7:07       ` Jeff King
2022-12-07 10:33         ` Ævar Arnfjörð Bjarmason
2022-12-07 23:26           ` Jonathan Tan
2022-12-07 23:50             ` Ævar Arnfjörð Bjarmason
2022-12-08  6:33               ` Jeff King
2022-12-07  6:42     ` Jeff King
2022-12-07  0:40   ` [PATCH v2 3/3] commit: don't lazy-fetch commits Jonathan Tan
2022-12-07  1:17     ` Junio C Hamano
2022-12-07  6:47     ` Jeff King
2022-12-08 20:57 ` [PATCH v3 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09  2:00     ` Jeff King
2022-12-09 18:17       ` Jonathan Tan
2022-12-09 20:27         ` Jeff King
2022-12-09 20:27           ` Jeff King
2022-12-08 20:57   ` [PATCH v3 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-09  1:56     ` Jeff King
2022-12-09 18:26       ` Jonathan Tan
2022-12-09 14:19     ` Ævar Arnfjörð Bjarmason [this message]
2022-12-09 18:33       ` Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-09 14:14     ` Ævar Arnfjörð Bjarmason
2022-12-09 21:44 ` [PATCH v4 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-10  0:16     ` Junio C Hamano
2022-12-12 20:38       ` Jonathan Tan
2022-12-12 20:49       ` Jeff King
2022-12-12 20:59         ` Jonathan Tan
2022-12-12 21:20           ` Jeff King
2022-12-12 21:29             ` Jonathan Tan
2022-12-12 22:17               ` Jeff King
2022-12-12 22:52             ` Jonathan Tan
2022-12-13 10:37               ` Jeff King
2022-12-09 21:44   ` [PATCH v4 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-13  1:51     ` Junio C Hamano
2022-12-13 10:38       ` Jeff King
2022-12-12 22:48   ` [PATCH v5 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 20:43   ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-15  0:07     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221209.86y1rg63tt.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).