git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
@ 2019-09-05  8:24 Stephan Beyer
  2019-09-05 17:08 ` René Scharfe
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Stephan Beyer @ 2019-09-05  8:24 UTC (permalink / raw)
  To: Paul Tan, Jeff King, brian m. carlson, Shawn O. Pearce,
	Johannes Schindelin
  Cc: Stephan Beyer, git

Compiler heuristics for detection of potentially uninitialized variables
may change between compiler versions and enabling link-time optimization
may find new warnings.  Indeed, compiling with gcc 9.2.1 and enabled
link-time optimization feature resulted in a few hits that are fixed by
this patch in the most naïve way.  This allows to compile git using the
DEVELOPER=1 switch (which sets -Werror) and using the -flto flag.

Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
---
 builtin/am.c               | 2 +-
 builtin/pack-objects.c     | 2 +-
 bulk-checkin.c             | 2 ++
 fast-import.c              | 3 ++-
 t/helper/test-read-cache.c | 2 +-
 5 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 1aea657a7f..ab914fd46e 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1266,7 +1266,7 @@ static int get_mail_commit_oid(struct object_id *commit_id, const char *mail)
 static void get_commit_info(struct am_state *state, struct commit *commit)
 {
 	const char *buffer, *ident_line, *msg;
-	size_t ident_len;
+	size_t ident_len = 0;
 	struct ident_split id;

 	buffer = logmsg_reencode(commit, NULL, get_commit_output_encoding());
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 76ce906946..d0c03b0e9b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1171,7 +1171,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 {
 	struct packed_git *found_pack = NULL;
 	off_t found_offset = 0;
-	uint32_t index_pos;
+	uint32_t index_pos = 0;

 	display_progress(progress_state, ++nr_seen);

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 39ee7d6107..87fa28c227 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -200,6 +200,8 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	struct hashfile_checkpoint checkpoint;
 	struct pack_idx_entry *idx = NULL;

+	checkpoint.offset = 0;
+
 	seekback = lseek(fd, 0, SEEK_CUR);
 	if (seekback == (off_t) -1)
 		return error("cannot find the current offset");
diff --git a/fast-import.c b/fast-import.c
index b44d6a467e..58f73f9105 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -903,7 +903,8 @@ static int store_object(
 	struct object_entry *e;
 	unsigned char hdr[96];
 	struct object_id oid;
-	unsigned long hdrlen, deltalen;
+	unsigned long hdrlen;
+	unsigned long deltalen = 0;
 	git_hash_ctx c;
 	git_zstream s;

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index 7e79b555de..ef0963e2f4 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -4,7 +4,7 @@

 int cmd__read_cache(int argc, const char **argv)
 {
-	int i, cnt = 1, namelen;
+	int i, cnt = 1, namelen = 0;
 	const char *name = NULL;

 	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
--
2.23.0.38.g7ab3f3815a


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05  8:24 [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
@ 2019-09-05 17:08 ` René Scharfe
  2019-09-05 17:53   ` Jeff King
  2019-09-05 17:41 ` Junio C Hamano
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: René Scharfe @ 2019-09-05 17:08 UTC (permalink / raw)
  To: Stephan Beyer, Paul Tan, Jeff King, brian m. carlson,
	Johannes Schindelin
  Cc: git

Am 05.09.19 um 10:24 schrieb Stephan Beyer:
> Compiler heuristics for detection of potentially uninitialized variables
> may change between compiler versions and enabling link-time optimization
> may find new warnings.  Indeed, compiling with gcc 9.2.1 and enabled
> link-time optimization feature resulted in a few hits that are fixed by
> this patch in the most naïve way.  This allows to compile git using the
> DEVELOPER=1 switch (which sets -Werror) and using the -flto flag.
>
> Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
> ---
>  builtin/am.c               | 2 +-
>  builtin/pack-objects.c     | 2 +-
>  bulk-checkin.c             | 2 ++
>  fast-import.c              | 3 ++-
>  t/helper/test-read-cache.c | 2 +-
>  5 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/am.c b/builtin/am.c
> index 1aea657a7f..ab914fd46e 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1266,7 +1266,7 @@ static int get_mail_commit_oid(struct object_id *commit_id, const char *mail)
>  static void get_commit_info(struct am_state *state, struct commit *commit)
>  {
>  	const char *buffer, *ident_line, *msg;
> -	size_t ident_len;
> +	size_t ident_len = 0;
>  	struct ident_split id;
>
>  	buffer = logmsg_reencode(commit, NULL, get_commit_output_encoding());

Further context:

	ident_line = find_commit_header(buffer, "author", &ident_len);

	if (split_ident_line(&id, ident_line, ident_len) < 0)
		die(_("invalid ident line: %.*s"), (int)ident_len, ident_line);

find_commit_header() can return NULL.  split_ident_line() won't handle
that well.  So I think what's missing here is a NULL check.  If the
compiler is smart enough then that should silence the initialization
warning.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 76ce906946..d0c03b0e9b 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1171,7 +1171,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
>  {
>  	struct packed_git *found_pack = NULL;
>  	off_t found_offset = 0;
> -	uint32_t index_pos;
> +	uint32_t index_pos = 0;
>
>  	display_progress(progress_state, ++nr_seen);
>

Further context:

	if (have_duplicate_entry(oid, exclude, &index_pos))
		return 0;

	if (!want_object_in_pack(oid, exclude, &found_pack, &found_offset)) {
		/* The pack is missing an object, so it will not have closure */
		if (write_bitmap_index) {
			if (write_bitmap_index != WRITE_BITMAP_QUIET)
				warning(_(no_closure_warning));
			write_bitmap_index = 0;
		}
		return 0;
	}

	create_object_entry(oid, type, pack_name_hash(name),
			    exclude, name && no_try_delta(name),
			    index_pos, found_pack, found_offset);

So we call have_duplicate_entry() and if it returns 0 then we might
end up using index_pos.  So when does it return 0?

static int have_duplicate_entry(const struct object_id *oid,
				int exclude,
				uint32_t *index_pos)
{
	struct object_entry *entry;

	entry = packlist_find(&to_pack, oid, index_pos);
	if (!entry)
		return 0;

OK, it does that if packlist_find() returns NULL.  When does it do
that?
struct object_entry *packlist_find(struct packing_data *pdata,
				   const struct object_id *oid,
				   uint32_t *index_pos)
{
	uint32_t i;
	int found;

	if (!pdata->index_size)
		return NULL;

	i = locate_object_entry_hash(pdata, oid, &found);

	if (index_pos)
		*index_pos = i;

	if (!found)
		return NULL;

So if the packing list is empty then it returns NULL without setting
index_pos.  Hmm.  It does set it in all other cases, no matter if oid is
found or not.  Is it really a good idea to make that exception?  I
suspect always setting index_pos here would silence the compiler as well
and fix the issue closer to its root.

But I may be missing something, this code looks complicated.

> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 39ee7d6107..87fa28c227 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -200,6 +200,8 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
>  	struct hashfile_checkpoint checkpoint;
>  	struct pack_idx_entry *idx = NULL;
>
> +	checkpoint.offset = 0;
> +
>  	seekback = lseek(fd, 0, SEEK_CUR);
>  	if (seekback == (off_t) -1)
>  		return error("cannot find the current offset");

Omitting further context, even though it would help, but this reply is
long enough already.  It seems the compiler got confused -- I can't see
an execution path that would use an uninitialized offset.  If idx is
NULL then the function is exited early, and if it's not then offset is
initialized.  But perhaps I'm missing something.

Anyway, my points are that simply initializing might not always be the
best fix, and that more context would help reviewers of such a patch,
but only if functions are reasonably short and it's not necessary to
follow the rabbit into a call chain hole.

Didn't check the other cases.

René

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05  8:24 [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
  2019-09-05 17:08 ` René Scharfe
@ 2019-09-05 17:41 ` Junio C Hamano
  2019-09-05 17:56 ` Junio C Hamano
  2019-09-05 22:48 ` Jeff King
  3 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2019-09-05 17:41 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Paul Tan, Jeff King, brian m. carlson, Shawn O. Pearce,
	Johannes Schindelin, git

Stephan Beyer <s-beyer@gmx.net> writes:

> diff --git a/fast-import.c b/fast-import.c
> index b44d6a467e..58f73f9105 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -903,7 +903,8 @@ static int store_object(
>  	struct object_entry *e;
>  	unsigned char hdr[96];
>  	struct object_id oid;
> -	unsigned long hdrlen, deltalen;
> +	unsigned long hdrlen;
> +	unsigned long deltalen = 0;
>  	git_hash_ctx c;
>  	git_zstream s;

[in my attempt to imitate Réne...]

In this function, deltalen is used only when delta != NULL, i.e.

	if (delta) {
		s.next_in = delta;
		s.avail_in = deltalen;
	} else {
		s.next_in = (void *)dat->buf;
		s.avail_in = dat->len;
	}
	...
	if (delta) {
		...
		hdrlen = encode_in_pack_object_header(hdr, sizeof(hdr),
						      OBJ_OFS_DELTA, deltalen);
	...

Could delta become non-NULL without deltalen getting set?  We see
these before all uses of delta/deltalen in this function.

	if (last && last->data.len && last->data.buf && last->depth < max_depth
		&& dat->len > the_hash_algo->rawsz) {

		delta_count_attempts_by_type[type]++;
		delta = diff_delta(last->data.buf, last->data.len,
			dat->buf, dat->len,
			&deltalen, dat->len - the_hash_algo->rawsz);
	} else
		delta = NULL;

If diff_delta() returns non-NULL without touching deltalen, we'd be
in trouble.  We see this in delta.h

static inline void *
diff_delta(const void *src_buf, unsigned long src_bufsize,
	   const void *trg_buf, unsigned long trg_bufsize,
	   unsigned long *delta_size, unsigned long max_delta_size)
{
	struct delta_index *index = create_delta_index(src_buf, src_bufsize);
	if (index) {
		void *delta = create_delta(index, trg_buf, trg_bufsize,
					   delta_size, max_delta_size);
		free_delta_index(index);
		return delta;
	}
	return NULL;
}

so the question is if create_delta() can return non-NULL without
touching delta_size.  In diff-delta.c::create_delta(), *delta_size
is assigned once at the very end, when the function returns a
pointer to an allocated memory 'out'.  All the "return" statement
other than that last one literally returns "NULL".

So it seems that this is a case the compiler getting confused.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 17:08 ` René Scharfe
@ 2019-09-05 17:53   ` Jeff King
  2019-09-05 19:09     ` René Scharfe
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-09-05 17:53 UTC (permalink / raw)
  To: René Scharfe
  Cc: Stephan Beyer, Paul Tan, brian m. carlson, Johannes Schindelin,
	git

On Thu, Sep 05, 2019 at 07:08:49PM +0200, René Scharfe wrote:

> Anyway, my points are that simply initializing might not always be the
> best fix, and that more context would help reviewers of such a patch,
> but only if functions are reasonably short and it's not necessary to
> follow the rabbit into a call chain hole.

I started looking at these, too, and came to the same conclusion.  Some
of these are pointing to real bugs, and just silencing the warnings
misses the opportunity to find them.

I'll comment on the ones I did look at.

> Further context:
> 
> 	ident_line = find_commit_header(buffer, "author", &ident_len);
> 
> 	if (split_ident_line(&id, ident_line, ident_len) < 0)
> 		die(_("invalid ident line: %.*s"), (int)ident_len, ident_line);
> 
> find_commit_header() can return NULL.  split_ident_line() won't handle
> that well.  So I think what's missing here is a NULL check.  If the
> compiler is smart enough then that should silence the initialization
> warning.

Yeah, this one is a segfault waiting to happen, I think, if the author
line is missing.

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 76ce906946..d0c03b0e9b 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -1171,7 +1171,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
> >  {
> >  	struct packed_git *found_pack = NULL;
> >  	off_t found_offset = 0;
> > -	uint32_t index_pos;
> > +	uint32_t index_pos = 0;
> [...]
> So if the packing list is empty then it returns NULL without setting
> index_pos.  Hmm.  It does set it in all other cases, no matter if oid is
> found or not.  Is it really a good idea to make that exception?  I
> suspect always setting index_pos here would silence the compiler as well
> and fix the issue closer to its root.

Yeah, this is a weird one. That index_pos is actually a position in the
current hash table. And in the first object we see in pack-objects'
input, we definitely _do_ end up with a nonsense index_pos, which then
gets passed to packlist_alloc().

But since we also need to grow the hash table during that allocation, we
don't use index_pos at all. So setting index_pos to something known like
"0" is kind of weird, in that we don't have a hash position at all (the
table doesn't exist!). But it's probably the least bad thing. If we do
that, it should happen in packlist_find().

I have to agree this whole "passing around index_pos" thing looks
complicated. It seems like we're just saving ourselves one hash lookup
on insertion (and it's not even an expensive hash, since we're reusing
the oid).

I suspect the whole thing would also be a lot simpler using one of our
existing hash implementations.

> Didn't check the other cases.

The only other one I looked at was:

>> int cmd__read_cache(int argc, const char **argv)
>> {
>>-       int i, cnt = 1, namelen;
>>+       int i, cnt = 1, namelen = 0;

I actually saw this one the other day, because it triggered for me when
compiling with SANITIZE=address. AFAICT it's a false positive. "name" is
always NULL unless skip_prefix() returns true, in which case we always
set "namelen". And we only look at "namelen" if "name" is non-NULL.

This one doesn't even require LTO, because skip_prefix() is an inline
function. I'm not sure why the compiler gets confused here. I don't mind
initializing namelen to 0 to silence it, though (we already set name to
NULL, so this would just match).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05  8:24 [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
  2019-09-05 17:08 ` René Scharfe
  2019-09-05 17:41 ` Junio C Hamano
@ 2019-09-05 17:56 ` Junio C Hamano
  2019-09-05 18:03   ` Jeff King
  2019-09-05 22:48 ` Jeff King
  3 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2019-09-05 17:56 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Paul Tan, Jeff King, brian m. carlson, Shawn O. Pearce,
	Johannes Schindelin, git

Stephan Beyer <s-beyer@gmx.net> writes:

> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> index 7e79b555de..ef0963e2f4 100644
> --- a/t/helper/test-read-cache.c
> +++ b/t/helper/test-read-cache.c
> @@ -4,7 +4,7 @@
>
>  int cmd__read_cache(int argc, const char **argv)
>  {
> -	int i, cnt = 1, namelen;
> +	int i, cnt = 1, namelen = 0;
>  	const char *name = NULL;
>
>  	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
		namelen = strlen(name);

The above is the only assignment to namelen in this function, and
namelen is used like so:

		if (name) {
			...
			pos = index_name_pos(&the_index, name, namelen);

So somebody does not realize that skip_prefix() returns true only
when it touches name.  But skip_prefix() is inline and visible to
the compiler, and it is quite clear that name is only touched when 
the function returns non-zero.

static inline int skip_prefix(const char *str, const char *prefix,
			      const char **out)
{
	do {
		if (!*prefix) {
			*out = str;
			return 1;
		}
	} while (*str++ == *prefix++);
	return 0;
}

So it looks like it is another case of compiler getting confused.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 17:56 ` Junio C Hamano
@ 2019-09-05 18:03   ` Jeff King
  2019-09-05 18:22     ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-09-05 18:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stephan Beyer, Paul Tan, brian m. carlson, Shawn O. Pearce,
	Johannes Schindelin, git

On Thu, Sep 05, 2019 at 10:56:12AM -0700, Junio C Hamano wrote:

> Stephan Beyer <s-beyer@gmx.net> writes:
> 
> > diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> > index 7e79b555de..ef0963e2f4 100644
> > --- a/t/helper/test-read-cache.c
> > +++ b/t/helper/test-read-cache.c
> > @@ -4,7 +4,7 @@
> >
> >  int cmd__read_cache(int argc, const char **argv)
> >  {
> > -	int i, cnt = 1, namelen;
> > +	int i, cnt = 1, namelen = 0;
> >  	const char *name = NULL;
> >
> >  	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
> 		namelen = strlen(name);
> 
> The above is the only assignment to namelen in this function, and
> namelen is used like so:
> 
> 		if (name) {
> 			...
> 			pos = index_name_pos(&the_index, name, namelen);
> 
> So somebody does not realize that skip_prefix() returns true only
> when it touches name.  But skip_prefix() is inline and visible to
> the compiler, and it is quite clear that name is only touched when 
> the function returns non-zero.

I said earlier that I wouldn't mind seeing "namelen = 0" here. But I
think there is a much more direct solution: keeping the assignment and
point of use closer together. That makes it more clear both to the
compiler and to a human when we expect the variable to be valid. In
fact, since it's only used once, we can drop the variable altogther. :)

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index 7e79b555de..244977a29b 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -4,11 +4,10 @@
 
 int cmd__read_cache(int argc, const char **argv)
 {
-	int i, cnt = 1, namelen;
+	int i, cnt = 1;
 	const char *name = NULL;
 
 	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
-		namelen = strlen(name);
 		argc--;
 		argv++;
 	}
@@ -24,7 +23,7 @@ int cmd__read_cache(int argc, const char **argv)
 
 			refresh_index(&the_index, REFRESH_QUIET,
 				      NULL, NULL, NULL);
-			pos = index_name_pos(&the_index, name, namelen);
+			pos = index_name_pos(&the_index, name, strlen(name));
 			if (pos < 0)
 				die("%s not in index", name);
 			printf("%s is%s up to date\n", name,

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 18:03   ` Jeff King
@ 2019-09-05 18:22     ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2019-09-05 18:22 UTC (permalink / raw)
  To: Jeff King
  Cc: Stephan Beyer, Paul Tan, brian m. carlson, Shawn O. Pearce,
	Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> I said earlier that I wouldn't mind seeing "namelen = 0" here. But I
> think there is a much more direct solution: keeping the assignment and
> point of use closer together. That makes it more clear both to the
> compiler and to a human when we expect the variable to be valid. In
> fact, since it's only used once, we can drop the variable altogther. :)

Yeah, that sounds like a nice solution.

> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> index 7e79b555de..244977a29b 100644
> --- a/t/helper/test-read-cache.c
> +++ b/t/helper/test-read-cache.c
> @@ -4,11 +4,10 @@
>  
>  int cmd__read_cache(int argc, const char **argv)
>  {
> -	int i, cnt = 1, namelen;
> +	int i, cnt = 1;
>  	const char *name = NULL;
>  
>  	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
> -		namelen = strlen(name);
>  		argc--;
>  		argv++;
>  	}
> @@ -24,7 +23,7 @@ int cmd__read_cache(int argc, const char **argv)
>  
>  			refresh_index(&the_index, REFRESH_QUIET,
>  				      NULL, NULL, NULL);
> -			pos = index_name_pos(&the_index, name, namelen);
> +			pos = index_name_pos(&the_index, name, strlen(name));
>  			if (pos < 0)
>  				die("%s not in index", name);
>  			printf("%s is%s up to date\n", name,

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 17:53   ` Jeff King
@ 2019-09-05 19:09     ` René Scharfe
  2019-09-05 19:25       ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: René Scharfe @ 2019-09-05 19:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Stephan Beyer, Paul Tan, brian m. carlson, Johannes Schindelin,
	git

Am 05.09.19 um 19:53 schrieb Jeff King:
>>> int cmd__read_cache(int argc, const char **argv)
>>> {
>>> -       int i, cnt = 1, namelen;
>>> +       int i, cnt = 1, namelen = 0;
>
> I actually saw this one the other day, because it triggered for me when
> compiling with SANITIZE=address. AFAICT it's a false positive. "name" is
> always NULL unless skip_prefix() returns true, in which case we always
> set "namelen". And we only look at "namelen" if "name" is non-NULL.
>
> This one doesn't even require LTO, because skip_prefix() is an inline
> function. I'm not sure why the compiler gets confused here.

Yes, that's curious.

> I don't mind
> initializing namelen to 0 to silence it, though (we already set name to
> NULL, so this would just match).

Pushing the strlen() call into the loop and getting rid of namelen should
work as well -- and I'd be surprised if this had a measurable performance
impact.

René

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 19:09     ` René Scharfe
@ 2019-09-05 19:25       ` Junio C Hamano
  2019-09-05 19:39         ` René Scharfe
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2019-09-05 19:25 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Stephan Beyer, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

René Scharfe <l.s.r@web.de> writes:

> Am 05.09.19 um 19:53 schrieb Jeff King:
>>>> int cmd__read_cache(int argc, const char **argv)
>>>> {
>>>> -       int i, cnt = 1, namelen;
>>>> +       int i, cnt = 1, namelen = 0;
>>
>> I actually saw this one the other day, because it triggered for me when
>> compiling with SANITIZE=address. AFAICT it's a false positive. "name" is
>> always NULL unless skip_prefix() returns true, in which case we always
>> set "namelen". And we only look at "namelen" if "name" is non-NULL.
>>
>> This one doesn't even require LTO, because skip_prefix() is an inline
>> function. I'm not sure why the compiler gets confused here.
>
> Yes, that's curious.
>
>> I don't mind
>> initializing namelen to 0 to silence it, though (we already set name to
>> NULL, so this would just match).
>
> Pushing the strlen() call into the loop and getting rid of namelen should
> work as well -- and I'd be surprised if this had a measurable performance
> impact.

Yeah, we are making strlen() call on a constant "name" in a loop
over argv[].  I do not think it matters in this case, either.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 19:25       ` Junio C Hamano
@ 2019-09-05 19:39         ` René Scharfe
  2019-09-05 19:50           ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: René Scharfe @ 2019-09-05 19:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Stephan Beyer, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

Am 05.09.19 um 21:25 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Am 05.09.19 um 19:53 schrieb Jeff King:
>>>>> int cmd__read_cache(int argc, const char **argv)
>>>>> {
>>>>> -       int i, cnt = 1, namelen;
>>>>> +       int i, cnt = 1, namelen = 0;
>>>
>>> I actually saw this one the other day, because it triggered for me when
>>> compiling with SANITIZE=address. AFAICT it's a false positive. "name" is
>>> always NULL unless skip_prefix() returns true, in which case we always
>>> set "namelen". And we only look at "namelen" if "name" is non-NULL.
>>>
>>> This one doesn't even require LTO, because skip_prefix() is an inline
>>> function. I'm not sure why the compiler gets confused here.
>>
>> Yes, that's curious.
>>
>>> I don't mind
>>> initializing namelen to 0 to silence it, though (we already set name to
>>> NULL, so this would just match).
>>
>> Pushing the strlen() call into the loop and getting rid of namelen should
>> work as well -- and I'd be surprised if this had a measurable performance
>> impact.
>
> Yeah, we are making strlen() call on a constant "name" in a loop
> over argv[].  I do not think it matters in this case, either.

The loop count is either 1 or argv[1] interpreted as a number, i.e. it could
be very high.  Its body consists of an index load and writing a number to a
file, though -- a strlen() call on the name of that file should go unnoticed
amid that activity.  (I didn't measure it, though.)

René


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 19:39         ` René Scharfe
@ 2019-09-05 19:50           ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2019-09-05 19:50 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Stephan Beyer, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

René Scharfe <l.s.r@web.de> writes:

> The loop count is either 1 or argv[1] interpreted as a number, i.e. it could
> be very high.  Its body consists of an index load and writing a number to a
> file, though -- a strlen() call on the name of that file should go unnoticed
> amid that activity.  (I didn't measure it, though.)

Yup, I didn't either but we reasoned along the same line.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05  8:24 [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
                   ` (2 preceding siblings ...)
  2019-09-05 17:56 ` Junio C Hamano
@ 2019-09-05 22:48 ` Jeff King
  2019-09-05 22:50   ` [PATCH 1/6] git-am: handle missing "author" when parsing commit Jeff King
                     ` (6 more replies)
  3 siblings, 7 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:48 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

On Thu, Sep 05, 2019 at 10:24:59AM +0200, Stephan Beyer wrote:

> Compiler heuristics for detection of potentially uninitialized variables
> may change between compiler versions and enabling link-time optimization
> may find new warnings.  Indeed, compiling with gcc 9.2.1 and enabled
> link-time optimization feature resulted in a few hits that are fixed by
> this patch in the most naïve way.  This allows to compile git using the
> DEVELOPER=1 switch (which sets -Werror) and using the -flto flag.

Lots of discussion in this thread. Let's try to turn it into some
patches. :)

After the patches below, I can compile cleanly with gcc 9.2.1 using
-flto with both -O2 and -O3 (some of the cases only seemed to trigger
for me with -O3).

I've ordered them in decreasing value. The first one is a real bugfix,
the second is a related cleanup. The next 3 are appeasing the compiler,
but I think are a good idea (but note I went more for root causes than
your originals). The last one is perhaps more controversial, but IMHO
worth doing.

  [1/6]: git-am: handle missing "author" when parsing commit
  [2/6]: pack-objects: use object_id in packlist_alloc()
  [3/6]: bulk-checkin: zero-initialize hashfile_checkpoint
  [4/6]: diff-delta: set size out-parameter to 0 for NULL delta
  [5/6]: test-read-cache: drop namelen variable
  [6/6]: pack-objects: drop packlist index_pos optimization

 builtin/am.c               |  4 +++-
 builtin/pack-objects.c     | 33 ++++++++++++++-------------------
 bulk-checkin.c             |  2 +-
 diff-delta.c               |  2 ++
 pack-bitmap-write.c        |  2 +-
 pack-bitmap.c              |  2 +-
 pack-objects.c             | 20 ++++++++++----------
 pack-objects.h             |  6 ++----
 t/helper/test-read-cache.c |  5 ++---
 9 files changed, 36 insertions(+), 40 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] git-am: handle missing "author" when parsing commit
  2019-09-05 22:48 ` Jeff King
@ 2019-09-05 22:50   ` Jeff King
  2019-09-05 22:52   ` [PATCH 2/6] pack-objects: use object_id in packlist_alloc() Jeff King
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:50 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

We try to parse the "author" line out of a commit buffer. We handle the
case that split_ident_line() doesn't work, but we don't do any error
checking that we found an "author" line in the first place! This would
cause us to segfault on such a corrupt object.

Let's put in an explicit NULL check (we can just die(), which is what a
bogus split would do, too). As a bonus, this silences a warning when
compiling with gcc 9.2.1 using "-flto -O3", which claims that ident_len
may be uninitialized (it would only be if we had a NULL here).

Reported-by: Stephan Beyer <s-beyer@gmx.net>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/am.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 1aea657a7f..ee7305eaa6 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1272,7 +1272,9 @@ static void get_commit_info(struct am_state *state, struct commit *commit)
 	buffer = logmsg_reencode(commit, NULL, get_commit_output_encoding());
 
 	ident_line = find_commit_header(buffer, "author", &ident_len);
-
+	if (!ident_line)
+		die(_("missing author line in commit %s"),
+		      oid_to_hex(&commit->object.oid));
 	if (split_ident_line(&id, ident_line, ident_len) < 0)
 		die(_("invalid ident line: %.*s"), (int)ident_len, ident_line);
 
-- 
2.23.0.463.g883b23b1c5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/6] pack-objects: use object_id in packlist_alloc()
  2019-09-05 22:48 ` Jeff King
  2019-09-05 22:50   ` [PATCH 1/6] git-am: handle missing "author" when parsing commit Jeff King
@ 2019-09-05 22:52   ` Jeff King
  2019-09-05 22:52   ` [PATCH 3/6] bulk-checkin: zero-initialize hashfile_checkpoint Jeff King
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:52 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

The only caller of packlist_alloc() already has a "struct object_id",
and we immediately copy the hash they pass us into our own object_id.
Let's avoid the unnecessary round-trip to a raw sha1 pointer.

Signed-off-by: Jeff King <peff@peff.net>
---
Just noticed since I was touching that function.

This is the second-to-last raw sha1 in pack-objects.c. The final one is
slightly tricky to get rid of, because it comes from the raw base_ref
pointer we parse out of the packfile's mmap. I left it out of this
series, but I wouldn't mind if somebody wants to take a stab at it.

 builtin/pack-objects.c | 2 +-
 pack-objects.c         | 4 ++--
 pack-objects.h         | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 76ce906946..dc2a7e9ac0 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1147,7 +1147,7 @@ static void create_object_entry(const struct object_id *oid,
 {
 	struct object_entry *entry;
 
-	entry = packlist_alloc(&to_pack, oid->hash, index_pos);
+	entry = packlist_alloc(&to_pack, oid, index_pos);
 	entry->hash = hash;
 	oe_set_type(entry, type);
 	if (exclude)
diff --git a/pack-objects.c b/pack-objects.c
index 52560293b6..c1df08df1a 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -153,7 +153,7 @@ void prepare_packing_data(struct repository *r, struct packing_data *pdata)
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
-				    const unsigned char *sha1,
+				    const struct object_id *oid,
 				    uint32_t index_pos)
 {
 	struct object_entry *new_entry;
@@ -177,7 +177,7 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
 	new_entry = pdata->objects + pdata->nr_objects++;
 
 	memset(new_entry, 0, sizeof(*new_entry));
-	hashcpy(new_entry->idx.oid.hash, sha1);
+	oidcpy(&new_entry->idx.oid, oid);
 
 	if (pdata->index_size * 3 <= pdata->nr_objects * 4)
 		rehash_objects(pdata);
diff --git a/pack-objects.h b/pack-objects.h
index 857d43850b..47bf7ebf86 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -183,7 +183,7 @@ static inline void packing_data_unlock(struct packing_data *pdata)
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
-				    const unsigned char *sha1,
+				    const struct object_id *oid,
 				    uint32_t index_pos);
 
 struct object_entry *packlist_find(struct packing_data *pdata,
-- 
2.23.0.463.g883b23b1c5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/6] bulk-checkin: zero-initialize hashfile_checkpoint
  2019-09-05 22:48 ` Jeff King
  2019-09-05 22:50   ` [PATCH 1/6] git-am: handle missing "author" when parsing commit Jeff King
  2019-09-05 22:52   ` [PATCH 2/6] pack-objects: use object_id in packlist_alloc() Jeff King
@ 2019-09-05 22:52   ` Jeff King
  2019-09-05 22:53   ` [PATCH 4/6] diff-delta: set size out-parameter to 0 for NULL delta Jeff King
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:52 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

We declare a "struct hashfile_checkpoint" but only sometimes actually
call hashfile_checkpoint() on it. That makes it not immediately obvious
that it's valid when we later access its members.

In fact, the code is fine: we fill it in unconditionally in the while(1)
loop as long as "idx" is non-NULL. And then if "idx" is NULL, we exit
early from the function (because we're just computing the hash, not
actually writing), before we look at the struct.

However, this does seem to confuse gcc 9.2.1's -Wmaybe-uninitialized
when compiled with "-flto -O2" (probably because with LTO it can now
realize that our call to hashfile_truncate() does not set the members
either). Let's zero-initialize the struct to tell the compiler, as well
as any readers of the code, that all is well.

Reported-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Jeff King <peff@peff.net>
---
We could also have a HASHFILE_CHECKPOINT_INIT, but it seemed like
overkill. I dunno.

 bulk-checkin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 39ee7d6107..583aacb9e3 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -197,7 +197,7 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	git_hash_ctx ctx;
 	unsigned char obuf[16384];
 	unsigned header_len;
-	struct hashfile_checkpoint checkpoint;
+	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 
 	seekback = lseek(fd, 0, SEEK_CUR);
-- 
2.23.0.463.g883b23b1c5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/6] diff-delta: set size out-parameter to 0 for NULL delta
  2019-09-05 22:48 ` Jeff King
                     ` (2 preceding siblings ...)
  2019-09-05 22:52   ` [PATCH 3/6] bulk-checkin: zero-initialize hashfile_checkpoint Jeff King
@ 2019-09-05 22:53   ` Jeff King
  2019-09-05 22:53   ` [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:53 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

When we cannot generate a delta, we return NULL but leave delta_size
untouched. This is generally OK, as callers rely on NULL to decide if
the output is usable or not. But it can confuse compilers; in
particular, gcc 9.2.1 with "-flto -O3" complains in fast-import's
store_object() that delta_len may be used uninitialized.

Let's change the diff-delta code to set the size explicitly to 0 for a
NULL return. That silences the compiler and makes it easier to reason
about the result.

Reported-by: Stephan Beyer <s-beyer@gmx.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
---
I suspect this same pattern of "if we return error, out-parameters are
undefined" is used in a lot of other functions, too. And I wouldn't
necessarily want to go around changing all of them. But the fact that
this tickles the compiler makes me think it's worthwhile.

 diff-delta.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/diff-delta.c b/diff-delta.c
index e49643353b..77fea08dfb 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -326,6 +326,8 @@ create_delta(const struct delta_index *index,
 	const unsigned char *ref_data, *ref_top, *data, *top;
 	unsigned char *out;
 
+	*delta_size = 0;
+
 	if (!trg_buf || !trg_size)
 		return NULL;
 
-- 
2.23.0.463.g883b23b1c5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 22:48 ` Jeff King
                     ` (3 preceding siblings ...)
  2019-09-05 22:53   ` [PATCH 4/6] diff-delta: set size out-parameter to 0 for NULL delta Jeff King
@ 2019-09-05 22:53   ` Stephan Beyer
  2019-09-05 22:58     ` Jeff King
  2019-09-05 22:54   ` [PATCH 5/6] test-read-cache: drop namelen variable Jeff King
  2019-09-05 22:56   ` [PATCH 6/6] pack-objects: drop packlist index_pos optimization Jeff King
  6 siblings, 1 reply; 23+ messages in thread
From: Stephan Beyer @ 2019-09-05 22:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

On 9/6/19 12:48 AM, Jeff King wrote:
> On Thu, Sep 05, 2019 at 10:24:59AM +0200, Stephan Beyer wrote:
>
>> Compiler heuristics for detection of potentially uninitialized variables
>> may change between compiler versions and enabling link-time optimization
>> may find new warnings.  Indeed, compiling with gcc 9.2.1 and enabled
>> link-time optimization feature resulted in a few hits that are fixed by
>> this patch in the most naïve way.  This allows to compile git using the
>> DEVELOPER=1 switch (which sets -Werror) and using the -flto flag.
>
> Lots of discussion in this thread. Let's try to turn it into some
> patches. :)

I thought the same and just sent my version of it. :D

Stephan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 5/6] test-read-cache: drop namelen variable
  2019-09-05 22:48 ` Jeff King
                     ` (4 preceding siblings ...)
  2019-09-05 22:53   ` [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
@ 2019-09-05 22:54   ` Jeff King
  2019-09-05 22:56   ` [PATCH 6/6] pack-objects: drop packlist index_pos optimization Jeff King
  6 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:54 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

Early in the function we set "namelen = strlen(name)" if "name" is
non-NULL. Later, we use "namelen" only if "name" is non-NULL. However,
it's hard to immediately see this, and it seems to confuse gcc 9.2.1
(with "-flto" interestingly, though all of the involved logic is in
inline functions; it also triggers when building with ASan).

Let's simplify the code and remove the variable entirely. There's only
one use of namelen anyway, so we can just call strlen() then. It's true
this is in a loop, so we might execute strlen() more often. But:

  - this is test code that only ever loops twice in our test suite (we
    do loop 1000 times in a t/perf test, but without using this option).

  - a decent compiler ought to be able to hoist that out of the loop
    anyway (though I wouldn't count on gcc 9.2.1 doing so!)

Reported-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/helper/test-read-cache.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index 7e79b555de..244977a29b 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -4,11 +4,10 @@
 
 int cmd__read_cache(int argc, const char **argv)
 {
-	int i, cnt = 1, namelen;
+	int i, cnt = 1;
 	const char *name = NULL;
 
 	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
-		namelen = strlen(name);
 		argc--;
 		argv++;
 	}
@@ -24,7 +23,7 @@ int cmd__read_cache(int argc, const char **argv)
 
 			refresh_index(&the_index, REFRESH_QUIET,
 				      NULL, NULL, NULL);
-			pos = index_name_pos(&the_index, name, namelen);
+			pos = index_name_pos(&the_index, name, strlen(name));
 			if (pos < 0)
 				die("%s not in index", name);
 			printf("%s is%s up to date\n", name,
-- 
2.23.0.463.g883b23b1c5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/6] pack-objects: drop packlist index_pos optimization
  2019-09-05 22:48 ` Jeff King
                     ` (5 preceding siblings ...)
  2019-09-05 22:54   ` [PATCH 5/6] test-read-cache: drop namelen variable Jeff King
@ 2019-09-05 22:56   ` Jeff King
  6 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:56 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

Once upon a time, the code to add an object to our packing list in
pack-objects all lived in a single function. It computed the position
within the hash table once, then used it to check if the object was
already present, and if not, to add it.

Later, in 2834bc27c1 (pack-objects: refactor the packing list,
2013-10-24), this was split into two functions: packlist_find() and
packlist_alloc(). We ended up with an "index_pos" variable that gets
passed through several functions to make it from one to the other.

The resulting code is rather confusing to follow. The "index_pos"
variable is sometimes undefined, if we don't yet have a hash table. This
works out in practice because in that case packlist_alloc() won't use
our index_pos at all, since it will have to create/grow the hash table
fresh (and reassign the index for all entries). But it's hard to verify
that, and it does cause gcc 9.2.1's -Wmaybe-uninitialized to complain
when compiled with "-flto -O3" (rightfully, since we do pass the
uninitialized value as a function parameter, even if nobody ends up
using it).

All of this is to save computing the hash index again when we're
inserting into the hash table, which I found doesn't make a measurable
difference in the program runtime (which is not surprising, since we're
doing all kinds of other heavyweight things for each object).

Let's just drop this index_pos variable entirely, simplifying the code
(and pleasing the compiler).

We might be better still refactoring this custom hash table to use one
of our existing implementations (an oidmap, or a kh_oid_map). I stopped
short of that here, but this would be the likely first step towards that
anyway.

Reported-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Jeff King <peff@peff.net>
---
I suspect one advantage of this custom hash is that it stores a uint32_t
instead of the hash, probably saving 16 bytes per object. But that's
something we can think about when looking at converting the hash.

 builtin/pack-objects.c | 33 ++++++++++++++-------------------
 pack-bitmap-write.c    |  2 +-
 pack-bitmap.c          |  2 +-
 pack-objects.c         | 18 +++++++++---------
 pack-objects.h         |  6 ++----
 5 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index dc2a7e9ac0..9a8d935700 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -610,12 +610,12 @@ static int mark_tagged(const char *path, const struct object_id *oid, int flag,
 		       void *cb_data)
 {
 	struct object_id peeled;
-	struct object_entry *entry = packlist_find(&to_pack, oid, NULL);
+	struct object_entry *entry = packlist_find(&to_pack, oid);
 
 	if (entry)
 		entry->tagged = 1;
 	if (!peel_ref(path, &peeled)) {
-		entry = packlist_find(&to_pack, &peeled, NULL);
+		entry = packlist_find(&to_pack, &peeled);
 		if (entry)
 			entry->tagged = 1;
 	}
@@ -996,12 +996,11 @@ static int no_try_delta(const char *path)
  * few lines later when we want to add the new entry.
  */
 static int have_duplicate_entry(const struct object_id *oid,
-				int exclude,
-				uint32_t *index_pos)
+				int exclude)
 {
 	struct object_entry *entry;
 
-	entry = packlist_find(&to_pack, oid, index_pos);
+	entry = packlist_find(&to_pack, oid);
 	if (!entry)
 		return 0;
 
@@ -1141,13 +1140,12 @@ static void create_object_entry(const struct object_id *oid,
 				uint32_t hash,
 				int exclude,
 				int no_try_delta,
-				uint32_t index_pos,
 				struct packed_git *found_pack,
 				off_t found_offset)
 {
 	struct object_entry *entry;
 
-	entry = packlist_alloc(&to_pack, oid, index_pos);
+	entry = packlist_alloc(&to_pack, oid);
 	entry->hash = hash;
 	oe_set_type(entry, type);
 	if (exclude)
@@ -1171,11 +1169,10 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 {
 	struct packed_git *found_pack = NULL;
 	off_t found_offset = 0;
-	uint32_t index_pos;
 
 	display_progress(progress_state, ++nr_seen);
 
-	if (have_duplicate_entry(oid, exclude, &index_pos))
+	if (have_duplicate_entry(oid, exclude))
 		return 0;
 
 	if (!want_object_in_pack(oid, exclude, &found_pack, &found_offset)) {
@@ -1190,7 +1187,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 
 	create_object_entry(oid, type, pack_name_hash(name),
 			    exclude, name && no_try_delta(name),
-			    index_pos, found_pack, found_offset);
+			    found_pack, found_offset);
 	return 1;
 }
 
@@ -1199,17 +1196,15 @@ static int add_object_entry_from_bitmap(const struct object_id *oid,
 					int flags, uint32_t name_hash,
 					struct packed_git *pack, off_t offset)
 {
-	uint32_t index_pos;
-
 	display_progress(progress_state, ++nr_seen);
 
-	if (have_duplicate_entry(oid, 0, &index_pos))
+	if (have_duplicate_entry(oid, 0))
 		return 0;
 
 	if (!want_object_in_pack(oid, 0, &pack, &offset))
 		return 0;
 
-	create_object_entry(oid, type, name_hash, 0, 0, index_pos, pack, offset);
+	create_object_entry(oid, type, name_hash, 0, 0, pack, offset);
 	return 1;
 }
 
@@ -1507,7 +1502,7 @@ static int can_reuse_delta(const unsigned char *base_sha1,
 	 * First see if we're already sending the base (or it's explicitly in
 	 * our "excluded" list).
 	 */
-	base = packlist_find(&to_pack, &base_oid, NULL);
+	base = packlist_find(&to_pack, &base_oid);
 	if (base) {
 		if (!in_same_island(&delta->idx.oid, &base->idx.oid))
 			return 0;
@@ -2579,7 +2574,7 @@ static void add_tag_chain(const struct object_id *oid)
 	 * it was included via bitmaps, we would not have parsed it
 	 * previously).
 	 */
-	if (packlist_find(&to_pack, oid, NULL))
+	if (packlist_find(&to_pack, oid))
 		return;
 
 	tag = lookup_tag(the_repository, oid);
@@ -2603,7 +2598,7 @@ static int add_ref_tag(const char *path, const struct object_id *oid, int flag,
 
 	if (starts_with(path, "refs/tags/") && /* is a tag? */
 	    !peel_ref(path, &peeled)    && /* peelable? */
-	    packlist_find(&to_pack, &peeled, NULL))      /* object packed? */
+	    packlist_find(&to_pack, &peeled))      /* object packed? */
 		add_tag_chain(oid);
 	return 0;
 }
@@ -2803,7 +2798,7 @@ static void show_object(struct object *obj, const char *name, void *data)
 		for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
 			depth++;
 
-		ent = packlist_find(&to_pack, &obj->oid, NULL);
+		ent = packlist_find(&to_pack, &obj->oid);
 		if (ent && depth > oe_tree_depth(&to_pack, ent))
 			oe_set_tree_depth(&to_pack, ent, depth);
 	}
@@ -3034,7 +3029,7 @@ static void loosen_unused_packed_objects(void)
 
 		for (i = 0; i < p->num_objects; i++) {
 			nth_packed_object_oid(&oid, p, i);
-			if (!packlist_find(&to_pack, &oid, NULL) &&
+			if (!packlist_find(&to_pack, &oid) &&
 			    !has_sha1_pack_kept_or_nonlocal(&oid) &&
 			    !loosened_object_can_be_discarded(&oid, p->mtime))
 				if (force_object_loose(&oid, p->mtime))
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fa78a460c9..a7a4964b50 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -144,7 +144,7 @@ static inline void reset_all_seen(void)
 
 static uint32_t find_object_pos(const struct object_id *oid)
 {
-	struct object_entry *entry = packlist_find(writer.to_pack, oid, NULL);
+	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
 		die("Failed to write bitmap index. Packfile doesn't have full closure "
diff --git a/pack-bitmap.c b/pack-bitmap.c
index ed2befaac6..84cd1bed4a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1063,7 +1063,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git,
 
 		entry = &bitmap_git->pack->revindex[i];
 		nth_packed_object_oid(&oid, bitmap_git->pack, entry->nr);
-		oe = packlist_find(mapping, &oid, NULL);
+		oe = packlist_find(mapping, &oid);
 
 		if (oe)
 			reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
diff --git a/pack-objects.c b/pack-objects.c
index c1df08df1a..3624090593 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -68,8 +68,7 @@ static void rehash_objects(struct packing_data *pdata)
 }
 
 struct object_entry *packlist_find(struct packing_data *pdata,
-				   const struct object_id *oid,
-				   uint32_t *index_pos)
+				   const struct object_id *oid)
 {
 	uint32_t i;
 	int found;
@@ -79,9 +78,6 @@ struct object_entry *packlist_find(struct packing_data *pdata,
 
 	i = locate_object_entry_hash(pdata, oid, &found);
 
-	if (index_pos)
-		*index_pos = i;
-
 	if (!found)
 		return NULL;
 
@@ -153,8 +149,7 @@ void prepare_packing_data(struct repository *r, struct packing_data *pdata)
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
-				    const struct object_id *oid,
-				    uint32_t index_pos)
+				    const struct object_id *oid)
 {
 	struct object_entry *new_entry;
 
@@ -181,8 +176,13 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
 
 	if (pdata->index_size * 3 <= pdata->nr_objects * 4)
 		rehash_objects(pdata);
-	else
-		pdata->index[index_pos] = pdata->nr_objects;
+	else {
+		int found;
+		uint32_t pos = locate_object_entry_hash(pdata,
+							&new_entry->idx.oid,
+							&found);
+		pdata->index[pos] = pdata->nr_objects;
+	}
 
 	if (pdata->in_pack)
 		pdata->in_pack[pdata->nr_objects - 1] = NULL;
diff --git a/pack-objects.h b/pack-objects.h
index 47bf7ebf86..6fe6ae5ee8 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -183,12 +183,10 @@ static inline void packing_data_unlock(struct packing_data *pdata)
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
-				    const struct object_id *oid,
-				    uint32_t index_pos);
+				    const struct object_id *oid);
 
 struct object_entry *packlist_find(struct packing_data *pdata,
-				   const struct object_id *oid,
-				   uint32_t *index_pos);
+				   const struct object_id *oid);
 
 static inline uint32_t pack_name_hash(const char *name)
 {
-- 
2.23.0.463.g883b23b1c5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 22:53   ` [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
@ 2019-09-05 22:58     ` Jeff King
  2019-09-05 23:10       ` Stephan Beyer
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-09-05 22:58 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

On Fri, Sep 06, 2019 at 12:53:49AM +0200, Stephan Beyer wrote:

> On 9/6/19 12:48 AM, Jeff King wrote:
> > On Thu, Sep 05, 2019 at 10:24:59AM +0200, Stephan Beyer wrote:
> >
> >> Compiler heuristics for detection of potentially uninitialized variables
> >> may change between compiler versions and enabling link-time optimization
> >> may find new warnings.  Indeed, compiling with gcc 9.2.1 and enabled
> >> link-time optimization feature resulted in a few hits that are fixed by
> >> this patch in the most naïve way.  This allows to compile git using the
> >> DEVELOPER=1 switch (which sets -Werror) and using the -flto flag.
> >
> > Lots of discussion in this thread. Let's try to turn it into some
> > patches. :)
> 
> I thought the same and just sent my version of it. :D

Yeah, I see that our mails crossed. :) I like some of my versions
better, but I'm OK with either (or a mix-and-match).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 22:58     ` Jeff King
@ 2019-09-05 23:10       ` Stephan Beyer
  2019-09-06  1:24         ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Stephan Beyer @ 2019-09-05 23:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

On 9/6/19 12:58 AM, Jeff King wrote:
> On Fri, Sep 06, 2019 at 12:53:49AM +0200, Stephan Beyer wrote:
>
>> On 9/6/19 12:48 AM, Jeff King wrote:
>>> On Thu, Sep 05, 2019 at 10:24:59AM +0200, Stephan Beyer wrote:
>>>
>>>> Compiler heuristics for detection of potentially uninitialized variables
>>>> may change between compiler versions and enabling link-time optimization
>>>> may find new warnings.  Indeed, compiling with gcc 9.2.1 and enabled
>>>> link-time optimization feature resulted in a few hits that are fixed by
>>>> this patch in the most naïve way.  This allows to compile git using the
>>>> DEVELOPER=1 switch (which sets -Werror) and using the -flto flag.
>>>
>>> Lots of discussion in this thread. Let's try to turn it into some
>>> patches. :)
>>
>> I thought the same and just sent my version of it. :D
>
> Yeah, I see that our mails crossed. :) I like some of my versions
> better, but I'm OK with either (or a mix-and-match).

I took a quick glance at yours. I also noticed the issue you address in
[PATCH 2/6], but I was unsure if this is the way to go (I'm only
occasionally reading on this list). I would prefer your patch series,
with maybe one exception...

The thing is: I had *exactly* the same commit like your [PATCH 6/6]
(except for the commit message and for the number), but I dropped it.
Why? Because I had the feeling (no particular instance though) that the
second locate_object_entry_hash() for each insertion *can* indeed take
"too much" time. Also, I was wondering, if the "found = 1" case should
be catched as a BUG("should not happen") or something.

I don't care much, though. The performance impact should probably be
checked carefully.

Stephan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto
  2019-09-05 23:10       ` Stephan Beyer
@ 2019-09-06  1:24         ` Jeff King
  2019-09-06  1:36           ` [PATCH v2 6/6] pack-objects: drop packlist index_pos optimization Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-09-06  1:24 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

On Fri, Sep 06, 2019 at 01:10:46AM +0200, Stephan Beyer wrote:

> I took a quick glance at yours. I also noticed the issue you address in
> [PATCH 2/6], but I was unsure if this is the way to go (I'm only
> occasionally reading on this list). I would prefer your patch series,
> with maybe one exception...

Yeah, we've been slowly removing these old "unsigned char *" references
(see a7db4c193d98f for the latest round touching this same code -- I'm
actually surprised I missed this one back then, as that's when
packlist_find() got converted).

> The thing is: I had *exactly* the same commit like your [PATCH 6/6]
> (except for the commit message and for the number), but I dropped it.
> Why? Because I had the feeling (no particular instance though) that the
> second locate_object_entry_hash() for each insertion *can* indeed take
> "too much" time. Also, I was wondering, if the "found = 1" case should
> be catched as a BUG("should not happen") or something.
> 
> I don't care much, though. The performance impact should probably be
> checked carefully.

I did measure it, like this:

  # Do the traversal separately. This would make any difference in the
  # pack-objects hash code stand out _more_, plus it makes it cheaper to
  # run multiple trials.
  git rev-list --objects --all >input

  # Make sure stderr is redirected to avoid progress, which again would
  # amplify any differences.
  git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1

Running on linux.git, I got:

  [before]
  Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1
    Time (mean ± σ):     26.225 s ±  0.233 s    [User: 24.089 s, System: 4.867 s]
    Range (min … max):   25.915 s … 26.555 s    10 runs

  [after]
  Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1
    Time (mean ± σ):     26.129 s ±  0.170 s    [User: 24.003 s, System: 4.958 s]
    Range (min … max):   25.974 s … 26.570 s    10 runs

So actually faster after, though not statistically significant. ;)

The BUG() on "found==1" might be worth doing.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 6/6] pack-objects: drop packlist index_pos optimization
  2019-09-06  1:24         ` Jeff King
@ 2019-09-06  1:36           ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-09-06  1:36 UTC (permalink / raw)
  To: Stephan Beyer
  Cc: Junio C Hamano, René Scharfe, Paul Tan, brian m. carlson,
	Johannes Schindelin, git

On Thu, Sep 05, 2019 at 09:24:30PM -0400, Jeff King wrote:

> Running on linux.git, I got:
> 
>   [before]
>   Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1
>     Time (mean ± σ):     26.225 s ±  0.233 s    [User: 24.089 s, System: 4.867 s]
>     Range (min … max):   25.915 s … 26.555 s    10 runs
> 
>   [after]
>   Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1
>     Time (mean ± σ):     26.129 s ±  0.170 s    [User: 24.003 s, System: 4.958 s]
>     Range (min … max):   25.974 s … 26.570 s    10 runs
> 
> So actually faster after, though not statistically significant. ;)

By the way, I wondered if LTO actually helped here. Those are both
compiled with -O2. Here's the same measurement with "-flto -O2":

  Benchmark #1: git pack-objects --stdout --delta-base-offset <input >/dev/null 2>&1
  Time (mean ± σ):     26.276 s ±  0.131 s    [User: 24.153 s, System: 4.900 s]
  Range (min … max):   26.111 s … 26.469 s    10 runs

Slightly slower, but still within the noise.

> The BUG() on "found==1" might be worth doing.

And here's a revised version of patch 6 with that (it matches the
duplicate check that rehash_objects does, too, which is a nice
symmetry).

-- >8 --
Subject: [PATCH v2] pack-objects: drop packlist index_pos optimization

Once upon a time, the code to add an object to our packing list in
pack-objects all lived in a single function. It computed the position
within the hash table once, then used it to check if the object was
already present, and if not, to add it.

Later, in 2834bc27c1 (pack-objects: refactor the packing list,
2013-10-24), this was split into two functions: packlist_find() and
packlist_alloc(). We ended up with an "index_pos" variable that gets
passed through several functions to make it from one to the other.

The resulting code is rather confusing to follow. The "index_pos"
variable is sometimes undefined, if we don't yet have a hash table. This
works out in practice because in that case packlist_alloc() won't use it
at all, since it will have to create/grow the hash table. But it's hard
to verify that, and it does cause gcc 9.2.1's -Wmaybe-uninitialized to
complain when compiled with "-flto -O3" (rightfully, since we do pass
the uninitialized value as a function parameter, even if nobody ends up
using it).

All of this is to save computing the hash index again when we're
inserting into the hash table, which I found doesn't make a measurable
difference in the program runtime (which is not surprising, since we're
doing all kinds of other heavyweight things for each object).

Let's just drop this index_pos variable entirely, simplifying the code
(and pleasing the compiler).

We might be better still refactoring this custom hash table to use one
of our existing implementations (an oidmap, or a kh_oid_map). I stopped
short of that here, but this would be the likely first step towards that
anyway.

Reported-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c | 33 ++++++++++++++-------------------
 pack-bitmap-write.c    |  2 +-
 pack-bitmap.c          |  2 +-
 pack-objects.c         | 20 +++++++++++---------
 pack-objects.h         |  6 ++----
 5 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index dc2a7e9ac0..9a8d935700 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -610,12 +610,12 @@ static int mark_tagged(const char *path, const struct object_id *oid, int flag,
 		       void *cb_data)
 {
 	struct object_id peeled;
-	struct object_entry *entry = packlist_find(&to_pack, oid, NULL);
+	struct object_entry *entry = packlist_find(&to_pack, oid);
 
 	if (entry)
 		entry->tagged = 1;
 	if (!peel_ref(path, &peeled)) {
-		entry = packlist_find(&to_pack, &peeled, NULL);
+		entry = packlist_find(&to_pack, &peeled);
 		if (entry)
 			entry->tagged = 1;
 	}
@@ -996,12 +996,11 @@ static int no_try_delta(const char *path)
  * few lines later when we want to add the new entry.
  */
 static int have_duplicate_entry(const struct object_id *oid,
-				int exclude,
-				uint32_t *index_pos)
+				int exclude)
 {
 	struct object_entry *entry;
 
-	entry = packlist_find(&to_pack, oid, index_pos);
+	entry = packlist_find(&to_pack, oid);
 	if (!entry)
 		return 0;
 
@@ -1141,13 +1140,12 @@ static void create_object_entry(const struct object_id *oid,
 				uint32_t hash,
 				int exclude,
 				int no_try_delta,
-				uint32_t index_pos,
 				struct packed_git *found_pack,
 				off_t found_offset)
 {
 	struct object_entry *entry;
 
-	entry = packlist_alloc(&to_pack, oid, index_pos);
+	entry = packlist_alloc(&to_pack, oid);
 	entry->hash = hash;
 	oe_set_type(entry, type);
 	if (exclude)
@@ -1171,11 +1169,10 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 {
 	struct packed_git *found_pack = NULL;
 	off_t found_offset = 0;
-	uint32_t index_pos;
 
 	display_progress(progress_state, ++nr_seen);
 
-	if (have_duplicate_entry(oid, exclude, &index_pos))
+	if (have_duplicate_entry(oid, exclude))
 		return 0;
 
 	if (!want_object_in_pack(oid, exclude, &found_pack, &found_offset)) {
@@ -1190,7 +1187,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 
 	create_object_entry(oid, type, pack_name_hash(name),
 			    exclude, name && no_try_delta(name),
-			    index_pos, found_pack, found_offset);
+			    found_pack, found_offset);
 	return 1;
 }
 
@@ -1199,17 +1196,15 @@ static int add_object_entry_from_bitmap(const struct object_id *oid,
 					int flags, uint32_t name_hash,
 					struct packed_git *pack, off_t offset)
 {
-	uint32_t index_pos;
-
 	display_progress(progress_state, ++nr_seen);
 
-	if (have_duplicate_entry(oid, 0, &index_pos))
+	if (have_duplicate_entry(oid, 0))
 		return 0;
 
 	if (!want_object_in_pack(oid, 0, &pack, &offset))
 		return 0;
 
-	create_object_entry(oid, type, name_hash, 0, 0, index_pos, pack, offset);
+	create_object_entry(oid, type, name_hash, 0, 0, pack, offset);
 	return 1;
 }
 
@@ -1507,7 +1502,7 @@ static int can_reuse_delta(const unsigned char *base_sha1,
 	 * First see if we're already sending the base (or it's explicitly in
 	 * our "excluded" list).
 	 */
-	base = packlist_find(&to_pack, &base_oid, NULL);
+	base = packlist_find(&to_pack, &base_oid);
 	if (base) {
 		if (!in_same_island(&delta->idx.oid, &base->idx.oid))
 			return 0;
@@ -2579,7 +2574,7 @@ static void add_tag_chain(const struct object_id *oid)
 	 * it was included via bitmaps, we would not have parsed it
 	 * previously).
 	 */
-	if (packlist_find(&to_pack, oid, NULL))
+	if (packlist_find(&to_pack, oid))
 		return;
 
 	tag = lookup_tag(the_repository, oid);
@@ -2603,7 +2598,7 @@ static int add_ref_tag(const char *path, const struct object_id *oid, int flag,
 
 	if (starts_with(path, "refs/tags/") && /* is a tag? */
 	    !peel_ref(path, &peeled)    && /* peelable? */
-	    packlist_find(&to_pack, &peeled, NULL))      /* object packed? */
+	    packlist_find(&to_pack, &peeled))      /* object packed? */
 		add_tag_chain(oid);
 	return 0;
 }
@@ -2803,7 +2798,7 @@ static void show_object(struct object *obj, const char *name, void *data)
 		for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
 			depth++;
 
-		ent = packlist_find(&to_pack, &obj->oid, NULL);
+		ent = packlist_find(&to_pack, &obj->oid);
 		if (ent && depth > oe_tree_depth(&to_pack, ent))
 			oe_set_tree_depth(&to_pack, ent, depth);
 	}
@@ -3034,7 +3029,7 @@ static void loosen_unused_packed_objects(void)
 
 		for (i = 0; i < p->num_objects; i++) {
 			nth_packed_object_oid(&oid, p, i);
-			if (!packlist_find(&to_pack, &oid, NULL) &&
+			if (!packlist_find(&to_pack, &oid) &&
 			    !has_sha1_pack_kept_or_nonlocal(&oid) &&
 			    !loosened_object_can_be_discarded(&oid, p->mtime))
 				if (force_object_loose(&oid, p->mtime))
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fa78a460c9..a7a4964b50 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -144,7 +144,7 @@ static inline void reset_all_seen(void)
 
 static uint32_t find_object_pos(const struct object_id *oid)
 {
-	struct object_entry *entry = packlist_find(writer.to_pack, oid, NULL);
+	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
 		die("Failed to write bitmap index. Packfile doesn't have full closure "
diff --git a/pack-bitmap.c b/pack-bitmap.c
index ed2befaac6..84cd1bed4a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1063,7 +1063,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git,
 
 		entry = &bitmap_git->pack->revindex[i];
 		nth_packed_object_oid(&oid, bitmap_git->pack, entry->nr);
-		oe = packlist_find(mapping, &oid, NULL);
+		oe = packlist_find(mapping, &oid);
 
 		if (oe)
 			reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
diff --git a/pack-objects.c b/pack-objects.c
index c1df08df1a..c6250d77f4 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -68,8 +68,7 @@ static void rehash_objects(struct packing_data *pdata)
 }
 
 struct object_entry *packlist_find(struct packing_data *pdata,
-				   const struct object_id *oid,
-				   uint32_t *index_pos)
+				   const struct object_id *oid)
 {
 	uint32_t i;
 	int found;
@@ -79,9 +78,6 @@ struct object_entry *packlist_find(struct packing_data *pdata,
 
 	i = locate_object_entry_hash(pdata, oid, &found);
 
-	if (index_pos)
-		*index_pos = i;
-
 	if (!found)
 		return NULL;
 
@@ -153,8 +149,7 @@ void prepare_packing_data(struct repository *r, struct packing_data *pdata)
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
-				    const struct object_id *oid,
-				    uint32_t index_pos)
+				    const struct object_id *oid)
 {
 	struct object_entry *new_entry;
 
@@ -181,8 +176,15 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
 
 	if (pdata->index_size * 3 <= pdata->nr_objects * 4)
 		rehash_objects(pdata);
-	else
-		pdata->index[index_pos] = pdata->nr_objects;
+	else {
+		int found;
+		uint32_t pos = locate_object_entry_hash(pdata,
+							&new_entry->idx.oid,
+							&found);
+		if (found)
+			BUG("duplicate object inserted into hash");
+		pdata->index[pos] = pdata->nr_objects;
+	}
 
 	if (pdata->in_pack)
 		pdata->in_pack[pdata->nr_objects - 1] = NULL;
diff --git a/pack-objects.h b/pack-objects.h
index 47bf7ebf86..6fe6ae5ee8 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -183,12 +183,10 @@ static inline void packing_data_unlock(struct packing_data *pdata)
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
-				    const struct object_id *oid,
-				    uint32_t index_pos);
+				    const struct object_id *oid);
 
 struct object_entry *packlist_find(struct packing_data *pdata,
-				   const struct object_id *oid,
-				   uint32_t *index_pos);
+				   const struct object_id *oid);
 
 static inline uint32_t pack_name_hash(const char *name)
 {
-- 
2.23.0.463.g883b23b1c5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-09-06  1:36 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-05  8:24 [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
2019-09-05 17:08 ` René Scharfe
2019-09-05 17:53   ` Jeff King
2019-09-05 19:09     ` René Scharfe
2019-09-05 19:25       ` Junio C Hamano
2019-09-05 19:39         ` René Scharfe
2019-09-05 19:50           ` Junio C Hamano
2019-09-05 17:41 ` Junio C Hamano
2019-09-05 17:56 ` Junio C Hamano
2019-09-05 18:03   ` Jeff King
2019-09-05 18:22     ` Junio C Hamano
2019-09-05 22:48 ` Jeff King
2019-09-05 22:50   ` [PATCH 1/6] git-am: handle missing "author" when parsing commit Jeff King
2019-09-05 22:52   ` [PATCH 2/6] pack-objects: use object_id in packlist_alloc() Jeff King
2019-09-05 22:52   ` [PATCH 3/6] bulk-checkin: zero-initialize hashfile_checkpoint Jeff King
2019-09-05 22:53   ` [PATCH 4/6] diff-delta: set size out-parameter to 0 for NULL delta Jeff King
2019-09-05 22:53   ` [PATCH] Fix maybe-uninitialized warnings found by gcc 9 -flto Stephan Beyer
2019-09-05 22:58     ` Jeff King
2019-09-05 23:10       ` Stephan Beyer
2019-09-06  1:24         ` Jeff King
2019-09-06  1:36           ` [PATCH v2 6/6] pack-objects: drop packlist index_pos optimization Jeff King
2019-09-05 22:54   ` [PATCH 5/6] test-read-cache: drop namelen variable Jeff King
2019-09-05 22:56   ` [PATCH 6/6] pack-objects: drop packlist index_pos optimization Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).