From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH v2 2/3] object-file: emit corruption errors when detected
Date: Wed, 07 Dec 2022 11:33:47 +0100
Message-ID: <221207.86pmcva2s8.gmgdl@evledraar.gmail.com>
In-Reply-To: <Y5A7qOaxyWxHJiex@coredump.intra.peff.net>


On Wed, Dec 07 2022, Jeff King wrote:

> On Wed, Dec 07, 2022 at 05:05:47AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Isn't the below squashed in better? I.e. just always pass the "path",
>> but maybe pass a "fd=0", in which case the function might need to
>> git_open() it.
>> 
>> Then have map_loose_object() and loose_object_info() call
>> open_loose_object(), and pass in the "path" and "fd".
>
> I like this direction, though I'd give a few small suggestions. One is
> to make it unconditional to pass in a valid "fd". These kinds of magic
> sentinel values sometimes lead to confusion or bugs, and it's easy
> enough for the caller to use git_open() itself.
>
> In fact, in the one caller who cares, it lets us produce a nicer
> error message:
>
> diff --git a/object-file.c b/object-file.c
> index 24793e1b47..7c2a85132b 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1219,9 +1219,6 @@ static void *map_loose_object_1(struct repository *r, const char *const path,
>  {
>  	void *map;
>  
> -	if (!fd)
> -		fd = git_open(path);
> -
>  	map = NULL;
>  	if (fd >= 0) {
>  		struct stat st;
> @@ -2790,13 +2787,18 @@ int read_loose_object(const char *path,
>  		      struct object_info *oi)
>  {
>  	int ret = -1;
> +	int fd;
>  	void *map = NULL;
>  	unsigned long mapsize;
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
>  	unsigned long *size = oi->sizep;
>  
> -	map = map_loose_object_1(the_repository, path, 0, &mapsize);
> +	fd = git_open(path);
> +	if (fd < 0)
> +		error_errno(_("unable to open %s"), path);
> +
> +	map = map_loose_object_1(the_repository, path, fd, &mapsize);
>  	if (!map) {
>  		error_errno(_("unable to mmap %s"), path);
>  		goto out;

Yeah, I think that's even better, although...

>> +static void *map_loose_object_1(struct repository *r, const char *const path,
>> +				int fd, unsigned long *size)
>>  {
>>  	void *map;
>> -	int fd;
>>  
>> -	if (path)
>> +	if (!fd)
>>  		fd = git_open(path);
>> -	else
>> -		fd = open_loose_object(r, oid, &path);
>> -	if (mapped_path)
>> -		*mapped_path = xstrdup(path);
>
> The other weird thing here is ownership of "fd". Now some callers pass
> it in, but map_loose_object_1() always closes it. I think that's OK
> (since we want it closed even on success), but definitely surprising
> enough that we'd want to document that in a comment.
>
>> @@ -1251,7 +1245,10 @@ void *map_loose_object(struct repository *r,
>>  		       const struct object_id *oid,
>>  		       unsigned long *size)
>>  {
>> -	return map_loose_object_1(r, NULL, oid, size, NULL);
>> +	const char *path;
>> +	int fd = open_loose_object(r, oid, &path);
>> +
>> +	return map_loose_object_1(r, path, fd, size);
>>  }
>
> It's also kind of weird that map_loose_object_1() is a noop on a
> negative descriptor. That technically makes this correct, but I think it
> would be much less surprising to always take a valid descriptor, and
> this code should do:
>
>   if (fd < 0)
> 	return NULL;
>   return map_loose_object_1(r, path, fd, size);
>
> If we are going to make map_loose_object_1() less confusing (and I think
> that is worth doing), let's go all the way.

...maybe we should go further in the other direction. I.e. with my
earlier suggestion we're left with the mess that the "fd" ownership
isn't clear.

But what I was trying to do was fix up the ownership around the
"mapped_path", and we don't need to xstrdup() it in the first place. The
caller of open_loose_object() already gets by without a copy, so we can
just say that you're not going to open two loose objects at a time.
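
The "one object at a time" rule follows from where the path lives. As a
sketch, assuming the path points into a buffer that the next lookup
reuses (loose_path() below is a hypothetical stand-in, not git's
odb_loose_path()), a borrowed pointer stays usable only until the
following call:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a path helper backed by static storage.
 * "hex" must be at least two characters long. The returned pointer is
 * only valid until the next call, which overwrites the same buffer.
 */
static const char *loose_path(const char *objdir, const char *hex)
{
	static char buf[4096];

	/* Fan out on the first two hex digits: <objdir>/ab/1234... */
	snprintf(buf, sizeof(buf), "%s/%.2s/%s", objdir, hex, hex + 2);
	return buf;
}
```

A second call to loose_path() silently changes what the first returned
pointer refers to. A caller that needs two paths at once would have to
copy; a caller that handles one object at a time doesn't.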

Then this becomes easier, and we can just pass the maybe-NULL "const
char **oid_path" all the way to open_loose_object():


diff --git a/object-file.c b/object-file.c
index c7a513d123e..6e900737b76 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1176,7 +1176,7 @@ static int stat_loose_object(struct repository *r, const struct object_id *oid,
  * descriptor. See the caveats on the "path" parameter above.
  */
 static int open_loose_object(struct repository *r,
-			     const struct object_id *oid, const char **path)
+			     const struct object_id *oid, const char **oid_path)
 {
 	int fd;
 	struct object_directory *odb;
@@ -1185,8 +1185,12 @@ static int open_loose_object(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		*path = odb_loose_path(odb, &buf, oid);
-		fd = git_open(*path);
+		const char *path;
+
+		path = odb_loose_path(odb, &buf, oid);
+		if (oid_path)
+			*oid_path = path;
+		fd = git_open(path);
 		if (fd >= 0)
 			return fd;
 
@@ -1214,19 +1218,22 @@ static int quick_has_loose(struct repository *r,
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "oid".
  */
-static void *map_loose_object_1(struct repository *r, const char *path,
+static void *map_loose_object_1(struct repository *r, const char *const path,
 				const struct object_id *oid, unsigned long *size,
-				char **mapped_path)
+				const char **oid_path)
 {
 	void *map;
 	int fd;
 
+	if (path && oid_path)
+		BUG("don't tell me about the path, and ask me what it is!");
+	else if (!(path || oid))
+		BUG("must get an OID or a path!");
+
 	if (path)
 		fd = git_open(path);
 	else
-		fd = open_loose_object(r, oid, &path);
-	if (mapped_path)
-		*mapped_path = xstrdup(path);
+		fd = open_loose_object(r, oid, oid_path);
 
 	map = NULL;
 	if (fd >= 0) {
@@ -1236,7 +1243,8 @@ static void *map_loose_object_1(struct repository *r, const char *path,
 			*size = xsize_t(st.st_size);
 			if (!*size) {
 				/* mmap() is forbidden on empty files */
-				error(_("object file %s is empty"), path);
+				error(_("object file %s is empty"),
+				      path ? path : *oid_path);
 				close(fd);
 				return NULL;
 			}
@@ -1432,7 +1440,7 @@ static int loose_object_info(struct repository *r,
 {
 	int status = 0;
 	unsigned long mapsize;
-	char *mapped_path = NULL;
+	const char *oid_path;
 	void *map;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
@@ -1464,11 +1472,9 @@ static int loose_object_info(struct repository *r,
 		return 0;
 	}
 
-	map = map_loose_object_1(r, NULL, oid, &mapsize, &mapped_path);
-	if (!map) {
-		free(mapped_path);
+	map = map_loose_object_1(r, NULL, oid, &mapsize, &oid_path);
+	if (!map)
 		return -1;
-	}
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
@@ -1506,11 +1512,10 @@ static int loose_object_info(struct repository *r,
 
 	if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
 		die(_("loose object %s (stored in %s) is corrupt"),
-		    oid_to_hex(oid), mapped_path);
+		    oid_to_hex(oid), oid_path);
 
 	git_inflate_end(&stream);
 cleanup:
-	free(mapped_path);
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
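
One thing to watch with the 'path ? path : *oid_path' expression above:
it assumes "oid_path" is non-NULL whenever "path" is NULL, which may not
hold for a caller that passes an OID but doesn't ask for the path. A
defensive variant could look like this sketch (path_for_error() and its
fallback string are hypothetical, not something in object-file.c):

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose the path to print in a diagnostic.
 * "path" is the caller-supplied path (may be NULL). "oid_path" is the
 * optional out-parameter (may itself be NULL, or point to NULL).
 */
static const char *path_for_error(const char *path, const char **oid_path)
{
	if (path)
		return path;
	if (oid_path && *oid_path)
		return *oid_path;
	return "(unknown path)";
}
```

Keeping the found path in a local variable and copying it to "*oid_path"
only when asked would be another way to get the same safety.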





