git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>
Subject: Re: [PATCH v2 20/24] fast-import: permit reading multiple marks files
Date: Fri, 05 Jun 2020 09:14:49 -0700	[thread overview]
Message-ID: <xmqqimg5o5fq.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <20200222201749.937983-21-sandals@crustytoothpaste.net> (brian m. carlson's message of "Sat, 22 Feb 2020 20:17:45 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> In the future, we'll want to read marks files for submodules as well.
> Refactor the existing code to make it possible to read multiple marks
> files, each into their own marks set.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  fast-import.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)

This conversion looks a bit strange.  We are working off of 'marks'
that is file scope global, and the topmost caller passes the pointer
to that single instance down the callchain to this function (among
others) in "s", and it all looked no-op to my eye, but this actually
updates the global 'marks'.

> diff --git a/fast-import.c b/fast-import.c
> index b8b65a801c..b9ecd89699 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -493,25 +493,24 @@ static char *pool_strdup(const char *s)
>  	return r;
>  }
>  
> -static void insert_mark(uintmax_t idnum, struct object_entry *oe)
> +static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
>  {
> -	struct mark_set *s = marks;

This is no longer needed as the caller gave the 'marks' to us in
's'.  Up to this line, this step is a no-op patch. 

>  	while ((idnum >> s->shift) >= 1024) {
>  		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));

Here we overwrite 's', and we lost the pointer the caller gave us!!!

>  		s->shift = marks->shift + 10;

And we still work off of global marks here and

>  		s->data.sets[0] = marks;
>  		marks = s;

try to swap it here.  But if our caller is also using 's' that was
passed from its caller, this assignment would not be seen by our
caller.  We'd need to pass a pointer to caller's 's' to this
function and update it around here.

>  	}
>  	while (s->shift) {
>  		uintmax_t i = idnum >> s->shift;
>  		idnum -= i << s->shift;
>  		if (!s->data.sets[i]) {
>  			s->data.sets[i] = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
>  			s->data.sets[i]->shift = s->shift - 10;
>  		}
>  		s = s->data.sets[i];
>  	}
>  	if (!s->data.marked[idnum])
>  		marks_set_count++;

I am not sure if this global also needs a separate counter and I
didn't bother to check.

>  	s->data.marked[idnum] = oe;
>  }

With this patch t9300 still seems to pass.

I do not know if this is the fix for Billes Tibor's "fast-import
oom" issue <c53bb69b-682d-3b47-4ed0-5f4559e69e37@gmx.com> but the
change to this function stood out.

Thanks.


 fast-import.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 0dfa14dc8c..f39b7890ac 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -534,13 +534,15 @@ static char *pool_strdup(const char *s)
 	return r;
 }
 
-static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
+static void insert_mark(struct mark_set **sp, uintmax_t idnum, struct object_entry *oe)
 {
+	struct mark_set *s = *sp;
+
 	while ((idnum >> s->shift) >= 1024) {
 		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
-		s->shift = marks->shift + 10;
-		s->data.sets[0] = marks;
-		marks = s;
+		s->shift = (*sp)->shift + 10;
+		s->data.sets[0] = (*sp);
+		(*sp) = s;
 	}
 	while (s->shift) {
 		uintmax_t i = idnum >> s->shift;
@@ -958,7 +960,7 @@ static int store_object(
 
 	e = insert_object(&oid);
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
@@ -1156,7 +1158,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	e = insert_object(&oid);
 
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
@@ -1745,12 +1747,12 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
 		e->pack_id = MAX_PACK_ID;
 		e->idx.offset = 1; /* just not zero! */
 	}
-	insert_mark(s, mark, e);
+	insert_mark(&s, mark, e);
 }
 
 static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
 {
-	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
+	insert_mark(&s, mark, xmemdupz(oid, sizeof(*oid)));
 }
 
 static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
@@ -3242,7 +3244,7 @@ static void parse_alias(void)
 		die(_("Expected 'to' command, got %s"), command_buf.buf);
 	e = find_object(&b.oid);
 	assert(e);
-	insert_mark(marks, next_mark, e);
+	insert_mark(&marks, next_mark, e);
 }
 
 static char* make_fast_import_path(const char *path)

  reply	other threads:[~2020-06-05 16:15 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
2020-02-22 20:17 ` [PATCH v2 01/24] builtin/pack-objects: make hash agnostic brian m. carlson
2020-02-23 21:57   ` Jeff King
2020-02-24  3:01     ` Jeff King
2020-02-24  3:42     ` brian m. carlson
2020-02-24  4:20       ` Jeff King
2020-02-22 20:17 ` [PATCH v2 02/24] hash: implement and use a context cloning function brian m. carlson
2020-02-22 20:17 ` [PATCH v2 03/24] hex: introduce parsing variants taking hash algorithms brian m. carlson
2020-02-22 20:17 ` [PATCH v2 04/24] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
2020-02-22 20:17 ` [PATCH v2 05/24] repository: require a build flag to use SHA-256 brian m. carlson
2020-02-22 20:17 ` [PATCH v2 06/24] t: use hash-specific lookup tables to define test constants brian m. carlson
2020-02-22 20:17 ` [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants brian m. carlson
2020-02-24 18:01   ` Junio C Hamano
2020-02-24 18:12     ` Jeff King
2020-02-24 20:41       ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 08/24] t6300: make hash algorithm independent brian m. carlson
2020-02-22 20:17 ` [PATCH v2 09/24] t/helper/test-dump-split-index: initialize git repository brian m. carlson
2020-02-22 20:17 ` [PATCH v2 10/24] t/helper: initialize repository if necessary brian m. carlson
2020-02-24 18:05   ` Junio C Hamano
2020-02-25  0:05     ` brian m. carlson
2020-02-22 20:17 ` [PATCH v2 11/24] t/helper: make repository tests hash independent brian m. carlson
2020-02-22 20:17 ` [PATCH v2 12/24] setup: allow check_repository_format to read repository format brian m. carlson
2020-02-22 20:17 ` [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
2020-02-24 18:14   ` Junio C Hamano
2020-02-25  0:11     ` brian m. carlson
2020-02-22 20:17 ` [PATCH v2 14/24] builtin/init-db: add environment variable for new repo hash brian m. carlson
2020-02-22 20:17 ` [PATCH v2 15/24] init-db: move writing repo version into a function brian m. carlson
2020-02-22 20:17 ` [PATCH v2 16/24] worktree: allow repository version 1 brian m. carlson
2020-02-22 20:17 ` [PATCH v2 17/24] commit: use expected signature header for SHA-256 brian m. carlson
2020-02-24 18:24   ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 18/24] gpg-interface: improve interface for parsing tags brian m. carlson
2020-02-24 18:26   ` Junio C Hamano
2020-02-25 10:29   ` Johannes Schindelin
2020-02-25 19:25     ` Junio C Hamano
2020-02-26  3:05       ` brian m. carlson
2020-02-26  3:11         ` Junio C Hamano
2020-02-26  2:23     ` brian m. carlson
2020-02-27 13:24       ` Johannes Schindelin
2020-02-27 15:06         ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 19/24] tag: store SHA-256 signatures in a header brian m. carlson
2020-02-24 18:30   ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 20/24] fast-import: permit reading multiple marks files brian m. carlson
2020-06-05 16:14   ` Junio C Hamano [this message]
2020-06-05 16:26     ` Junio C Hamano
2020-06-05 22:39       ` brian m. carlson
2020-06-05 22:53         ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 21/24] fast-import: add helper function for inserting mark object entries brian m. carlson
2020-02-22 20:17 ` [PATCH v2 22/24] fast-import: make find_marks work on any mark set brian m. carlson
2020-02-22 20:17 ` [PATCH v2 23/24] fast-import: add a generic function to iterate over marks brian m. carlson
2020-02-22 20:17 ` [PATCH v2 24/24] fast-import: add options for rewriting submodules brian m. carlson
2020-02-24 18:34 ` [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqimg5o5fq.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).