git-fast-import out of memory

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* git-fast-import out of memory
@ 2020-06-05  5:15 Billes Tibor
  2020-06-05 22:43 ` brian m. carlson
  2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
  0 siblings, 2 replies; 9+ messages in thread
From: Billes Tibor @ 2020-06-05  5:15 UTC (permalink / raw)
  To: git, brian m. carlson, Junio C Hamano

Hi,

I recently upgraded my git to version 2.27.0-1~ppa0~ubuntu18.04.1 and
noticed
that git-fast-import uses so much memory it gets killed. I'm fetching from a
Mercurial repo using an importer from
https://github.com/mnauw/git-remote-hg.git which uses git-fast-import to
fetch
commits from Mercurial.

Here is an output of a git fetch showing is used 14Gb of RAM (on a 16Gb
machine)
# time git fetch
error: git-fast-import died of signal 9
fatal: error while running fast-import
Command exited with non-zero status 128
2.02user 3.82system 0:08.00elapsed 73%CPU (0avgtext+0avgdata
14744800maxresident)k
104920inputs+0outputs (414major+3688606minor)pagefaults 0swaps

strace shows that git-fast-import is reading the marks from a file, then
allocate some memory, reads more marks, allocates more memory, and so on:

11191 06:19:08.180572 read(7<.../.git/hg/origin/marks-git>, "79798
8ea080f15ab22807608aae4696dd23edefd8febe\n:220396
919079de10d43caf3fcde56bb1a17994b47a6214\n:75683
928813193a1535dc1274ed9da2f54f5de2caf2f4\n:155297
9108211d7ba318076fb53b2bd3d291102b376dbf\n:162042
9458fe329e9be30ad2b61e75197595889d80144b\n:305834
93485ce7991b4330a1114136b5d8e08d8bd1505b\n:223654
9750bdef7d22a885d2522bdd9e0a0e882979098e\n...", 4096) = 4096 <0.000027>
11191 06:19:08.182162 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be38ef000 <0.000024>
11191 06:19:08.183403 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be36ee000 <0.000127>
11191 06:19:08.184775 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be34ed000 <0.000059>
11191 06:19:08.186036 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be32ec000 <0.000121>
11191 06:19:08.187412 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be30eb000 <0.000110>
11191 06:19:08.188743 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be2eea000 <0.000022>
11191 06:19:08.189929 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be2ce9000 <0.000039>
11191 06:19:08.191150 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be2ae8000 <0.000019>
11191 06:19:08.192329 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be28e7000 <0.000023>
11191 06:19:08.193536 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be26e6000 <0.000038>
11191 06:19:08.194523 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be24e5000 <0.000019>
11191 06:19:08.195474 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be22e4000 <0.000212>
11191 06:19:08.196677 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be20e3000 <0.000027>
11191 06:19:08.197729 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be1ee2000 <0.000128>
11191 06:19:08.198883 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be1ce1000 <0.000043>
11191 06:19:08.199881 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be1ae0000 <0.000124>
11191 06:19:08.200959 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be18df000 <0.000020>
11191 06:19:08.201943 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be16de000 <0.000021>

The following shows that memory allocation seems to be linear with
respect to
the number of marks, but with a very high constant factor:
# cut -d' ' -f 3 /tmp/gitfetch.strace | cut -d '(' -f 1 | uniq -c

         [ ... cut (this is not the start of the allocations) ... ]
       1 read
      47 mmap
       1 read
      79 mmap
       1 read
      36 mmap
         [ ... removed some other syscalls ... ]
      73 mmap
       1 read
     141 mmap
       1 read
     173 mmap
       1 read
     204 mmap
       1 read
     235 mmap
       1 read
     267 mmap
       1 read
     297 mmap
       1 read
     329 mmap
       1 read
     361 mmap
       1 read
     392 mmap
       1 read
     424 mmap
       1 read
     454 mmap
       1 read
     493 mmap

My marks file contains 91k entries, git fetch reads only 1400 before killed.

I bisected the problem, below is my bisect log:
git bisect start
# good: [af6b65d45ef179ed52087e80cb089f6b2349f4ec] Git 2.26.2
git bisect good af6b65d45ef179ed52087e80cb089f6b2349f4ec
# bad: [b3d7a52fac39193503a0b6728771d1bf6a161464] Git 2.27
git bisect bad b3d7a52fac39193503a0b6728771d1bf6a161464
# bad: [af986863c1ae2e306d5627f4e42cc6d2cf2a057f] Merge branch
'dd/ci-musl-libc'
git bisect bad af986863c1ae2e306d5627f4e42cc6d2cf2a057f
# bad: [7a8bb6db7cc04add05484c4fc907e34f76b12fb9] Merge branch
'jm/gitweb-fastcgi-utf8'
git bisect bad 7a8bb6db7cc04add05484c4fc907e34f76b12fb9
# bad: [4e4baee3f44da26a5eaab27c76d597b04fef5259] Merge branch
'bc/filter-process'
git bisect bad 4e4baee3f44da26a5eaab27c76d597b04fef5259
# good: [883e23820ed21b4ae65463f2a87152285bf77937] Merge branch
'en/oidset-uninclude-hashmap'
git bisect good 883e23820ed21b4ae65463f2a87152285bf77937
# bad: [1bdca816412910e1206c15ef47f2a8a6b369b831] fast-import: add
options for rewriting submodules
git bisect bad 1bdca816412910e1206c15ef47f2a8a6b369b831
# good: [bf154a878281b6a971ece0fb6d917938298be60d] t/helper: make
repository tests hash independent
git bisect good bf154a878281b6a971ece0fb6d917938298be60d
# good: [e02a7141f83326f7098800fed764061ecf1f0eff] worktree: allow
repository version 1
git bisect good e02a7141f83326f7098800fed764061ecf1f0eff
# bad: [abe0cc536414f2b9cfa37f208b36df5126e6356a] fast-import: add
helper function for inserting mark object entries
git bisect bad abe0cc536414f2b9cfa37f208b36df5126e6356a
# bad: [ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64] fast-import: permit
reading multiple marks files
git bisect bad ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64
# good: [42d4e1d1128fa1cb56032ac58f65ea3dd1296a9a] commit: use expected
signature header for SHA-256
git bisect good 42d4e1d1128fa1cb56032ac58f65ea3dd1296a9a
# first bad commit: [ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64]
fast-import: permit reading multiple marks files

According to the bisect the first bad commit is:

commit ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64 (refs/bisect/bad)
Author: brian m. carlson <sandals@crustytoothpaste.net>
Date:   Sat Feb 22 20:17:45 2020 +0000

     fast-import: permit reading multiple marks files

     In the future, we'll want to read marks files for submodules as well.
     Refactor the existing code to make it possible to read multiple marks
     files, each into their own marks set.

     Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     Signed-off-by: Junio C Hamano <gitster@pobox.com>

When doing the bisect it was easier for me to use git from the Ubuntu
package and only replace the git-fast-import binary with the one I was
testing.
I hope it doesn't falsify the bisect results. The behavior seemed to be
consistent: it either produced the issue above, or it worked perfectly fine.

Can you help me fix this issue? I hope the information I gathered is
enough to
help you find the cause of this behavior. I'd be happy to provide more
information if needed or test patches.  Unfortunately the source code I was
fetching is proprietary, I cannot post it.

Best Regards,
Tibor Billes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git-fast-import out of memory
  2020-06-05  5:15 git-fast-import out of memory Billes Tibor
@ 2020-06-05 22:43 ` brian m. carlson
  2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
  1 sibling, 0 replies; 9+ messages in thread
From: brian m. carlson @ 2020-06-05 22:43 UTC (permalink / raw)
  To: Billes Tibor; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

On 2020-06-05 at 05:15:04, Billes Tibor wrote:
> Can you help me fix this issue? I hope the information I gathered is
> enough to
> help you find the cause of this behavior. I'd be happy to provide more
> information if needed or test patches.  Unfortunately the source code I was
> fetching is proprietary, I cannot post it.

Hey, thanks for this report.  I can confirm this with git.git and the
following commands:

  git fast-export --all --reencode=yes --signed-tags=verbatim --tag-of-filtered-object=drop >$TMP/git.fe
  cd $TMP && rm -fr test-repo && git init test-repo &&
    (cd test-repo && /usr/bin/time git fast-import --export-marks=$TMP/git.marks < $TMP/git.fe &&
    /usr/bin/time git fast-import --import-marks=$TMP/git.marks < $TMP/git.fe)

The latter fast-import command gets killed with SIGKILL due to using 16
GiB of memory, which I would agree is unreasonable.

It looks like with Junio's patch at
<xmqqeeqto4x1.fsf@gitster.c.googlers.com>, the memory usage is down to
1.4 GiB, which considering that a pack must be written, is much more
reasonable.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] fast-import: fix incomplete conversion with multiple mark files
  2020-06-05  5:15 git-fast-import out of memory Billes Tibor
  2020-06-05 22:43 ` brian m. carlson
@ 2020-06-06  0:22 ` brian m. carlson
  2020-06-06  0:22   ` [PATCH v2 0/1] Run pipeline command in subshell in sh mode brian m. carlson
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: brian m. carlson @ 2020-06-06  0:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Billes Tibor

When ddddf8d7e2 ("fast-import: permit reading multiple marks files",
2020-02-22) converted fast-import to handle multiple marks files in
preparation for submodule support, the conversion was incomplete.  With
a large number of marks, we would actually modify the marks variable
even though we had passed in a different variable to operate on.  In
addition, we didn't consider the fact that the code can replace the mark
set passed in, so when we did so we happened to leak quite a bit of
memory, since we never reused the structure we created, instead
reallocating a new one each time.

It doesn't appear from some testing that we actually produce incorrect
results in this case, only that we leak a substantial amount of memory.
To make things work properly and avoid leaking, pass a pointer to
pointer to struct mark_set, which allows us to modify the set of marks
when the number of marks is large.

With this patch, importing a dump of git.git with a set of exported
marks goes from taking in excess of 15 GiB of memory (and being killed
by the Linux OOM killer) to using a maximum of 1.4 GiB of memory.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 0dfa14dc8c..ed87d6e380 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -150,7 +150,7 @@ struct recent_command {
 	char *buf;
 };
 
-typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
+typedef void (*mark_set_inserter_t)(struct mark_set **s, struct object_id *oid, uintmax_t mark);
 typedef void (*each_mark_fn_t)(uintmax_t mark, void *obj, void *cbp);
 
 /* Configured limits on output */
@@ -534,13 +534,15 @@ static char *pool_strdup(const char *s)
 	return r;
 }
 
-static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
+static void insert_mark(struct mark_set **sp, uintmax_t idnum, struct object_entry *oe)
 {
+	struct mark_set *s = *sp;
+
 	while ((idnum >> s->shift) >= 1024) {
 		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
-		s->shift = marks->shift + 10;
-		s->data.sets[0] = marks;
-		marks = s;
+		s->shift = (*sp)->shift + 10;
+		s->data.sets[0] = (*sp);
+		(*sp) = s;
 	}
 	while (s->shift) {
 		uintmax_t i = idnum >> s->shift;
@@ -958,7 +960,7 @@ static int store_object(
 
 	e = insert_object(&oid);
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
@@ -1156,7 +1158,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	e = insert_object(&oid);
 
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
@@ -1731,7 +1733,7 @@ static void dump_marks(void)
 	}
 }
 
-static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+static void insert_object_entry(struct mark_set **s, struct object_id *oid, uintmax_t mark)
 {
 	struct object_entry *e;
 	e = find_object(oid);
@@ -1748,12 +1750,12 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
 	insert_mark(s, mark, e);
 }
 
-static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+static void insert_oid_entry(struct mark_set **s, struct object_id *oid, uintmax_t mark)
 {
 	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
 }
 
-static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
+static void read_mark_file(struct mark_set **s, FILE *f, mark_set_inserter_t inserter)
 {
 	char line[512];
 	while (fgets(line, sizeof(line), f)) {
@@ -1786,7 +1788,7 @@ static void read_marks(void)
 		goto done; /* Marks file does not exist */
 	else
 		die_errno("cannot read '%s'", import_marks_file);
-	read_mark_file(marks, f, insert_object_entry);
+	read_mark_file(&marks, f, insert_object_entry);
 	fclose(f);
 done:
 	import_marks_file_done = 1;
@@ -3242,7 +3244,7 @@ static void parse_alias(void)
 		die(_("Expected 'to' command, got %s"), command_buf.buf);
 	e = find_object(&b.oid);
 	assert(e);
-	insert_mark(marks, next_mark, e);
+	insert_mark(&marks, next_mark, e);
 }
 
 static char* make_fast_import_path(const char *path)
@@ -3340,7 +3342,7 @@ static void option_rewrite_submodules(const char *arg, struct string_list *list)
 	fp = fopen(f, "r");
 	if (!fp)
 		die_errno("cannot read '%s'", f);
-	read_mark_file(ms, fp, insert_oid_entry);
+	read_mark_file(&ms, fp, insert_oid_entry);
 	fclose(fp);
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 0/1] Run pipeline command in subshell in sh mode
  2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
@ 2020-06-06  0:22   ` brian m. carlson
  2020-06-06  0:31     ` brian m. carlson
  2020-06-06  0:22   ` [PATCH v2] exec: run final pipeline command in a " brian m. carlson
  2020-06-08 15:52   ` [PATCH] fast-import: fix incomplete conversion with multiple mark files Tibor Billes
  2 siblings, 1 reply; 9+ messages in thread
From: brian m. carlson @ 2020-06-06  0:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Billes Tibor

POSIX sh implementations run each command in a pipeline in a subshell,
although zsh (and AT&T ksh) do not: instead, they run the final command
in the main shell.  This leads to very different behavior when the final
command is a shell function which modifies variables.

zsh is starting to be used in some cases as /bin/sh, such as on macOS
Catalina.  Consequently, it makes sense to emulate the POSIX behavior as
much as possible when emulating sh, since that's the least surprising
behavior.  This patch does exactly that.

With this patch, using "zsh --emulate sh" passes the Git testsuite.  I
expect that it will also be fully functional as /bin/sh on Debian,
although I have not tested.

This patch was sent before, but didn't get picked up.  In hopes of
aiding reviewers, I've resent it with a significantly expanded commit
message so that it is easier to reason about.

I'm not subscribed to the list, so please CC me if you have questions or
comments.

brian m. carlson (1):
  exec: run final pipeline command in a subshell in sh mode

 Src/exec.c           | 10 ++++++----
 Test/B07emulate.ztst | 22 ++++++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] exec: run final pipeline command in a subshell in sh mode
  2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
  2020-06-06  0:22   ` [PATCH v2 0/1] Run pipeline command in subshell in sh mode brian m. carlson
@ 2020-06-06  0:22   ` brian m. carlson
  2020-06-08 15:52   ` [PATCH] fast-import: fix incomplete conversion with multiple mark files Tibor Billes
  2 siblings, 0 replies; 9+ messages in thread
From: brian m. carlson @ 2020-06-06  0:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Billes Tibor

zsh typically runs the final command in a pipeline in the main shell
instead of a subshell.  However, POSIX requires that all commands in a
pipeline run in a subshell, but permits zsh's behavior as an extension.

Since zsh may be used as /bin/sh in some cases (such as macOS Catalina),
it makes sense to have the POSIX behavior when emulating sh, so do that
by checking for being the final item of a multi-item pipeline and
creating a subshell in that case.

From the comment above execpline(), we know the following:

  last1 is a flag that this command is the last command in a shell that
  is about to exit, so we can exec instead of forking.  It gets passed
  all the way down to execcmd() which actually makes the decision.  A 0
  is always passed if the command is not the last in the pipeline. […]
  If last1 is zero but the command is at the end of a pipeline, we pass
  2 down to execcmd().

So there are three cases to consider in this code:

• last1 is 0, which means we are not at the end of a pipeline, in which
  case we should not change behavior.
• last1 is 1, which means we are effectively running in a subshell,
  because nothing that happens due to the exec is going to affect the
  actual shell, since it will have been replaced.  So there is nothing
  to do here.
• last1 is 2, which means our command is at the end of the pipeline, so
  in sh mode we should create a subshell by forking.

input is nonzero if the input to this process is a pipe that we've
opened.  At the end of a multi-stage pipeline, it will necessarily be
nonzero.

Note that several of the tests may appear bizarre, since most developers
do not place useless variable assignments directly at the end of a
pipeline.  However, as the function tests demonstrate, there are cases
where assignments may occur when a shell function is used at the end of
a command.  The remaining assignment tests simply test additional cases,
such as the use of local, that would otherwise be untested.
---
 Src/exec.c           | 10 ++++++----
 Test/B07emulate.ztst | 22 ++++++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Src/exec.c b/Src/exec.c
index 29f4fc5ca..f2650f311 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -2866,11 +2866,13 @@ execcmd_exec(Estate state, Execcmd_params eparams,
 	    pushnode(args, dupstring("fg"));
     }

-    if ((how & Z_ASYNC) || output) {
+    if ((how & Z_ASYNC) || output ||
+	(last1 == 2 && input && EMULATION(EMULATE_SH))) {
 	/*
-	 * If running in the background, or not the last command in a
-	 * pipeline, we don't need any of the rest of this function to
-	 * affect the state in the main shell, so fork immediately.
+	 * If running in the background, not the last command in a
+	 * pipeline, or the last command in a multi-stage pipeline
+	 * in sh mode, we don't need any of the rest of this function
+	 * to affect the state in the main shell, so fork immediately.
 	 *
 	 * In other cases we may need to process the command line
 	 * a bit further before we make the decision.
diff --git a/Test/B07emulate.ztst b/Test/B07emulate.ztst
index 7b1592fa9..45c39b51d 100644
--- a/Test/B07emulate.ztst
+++ b/Test/B07emulate.ztst
@@ -276,3 +276,25 @@ F:Some reserved tokens are handled in alias expansion
 0:--emulate followed by other options
 >yes
 >no
+
+  emulate sh -c '
+  foo () {
+    VAR=foo &&
+    echo $VAR | bar &&
+    echo "$VAR"
+  }
+  bar () {
+    tr f b &&
+    VAR="$(echo bar | tr r z)" &&
+    echo "$VAR"
+  }
+  foo
+  '
+  emulate sh -c 'func() { echo | local def="abc"; echo $def;}; func'
+  emulate sh -c 'abc="def"; echo | abc="ghi"; echo $abc'
+0:emulate sh uses subshell for last pipe entry
+>boo
+>baz
+>foo
+>
+>def

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/1] Run pipeline command in subshell in sh mode
  2020-06-06  0:22   ` [PATCH v2 0/1] Run pipeline command in subshell in sh mode brian m. carlson
@ 2020-06-06  0:31     ` brian m. carlson
  0 siblings, 0 replies; 9+ messages in thread
From: brian m. carlson @ 2020-06-06  0:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Billes Tibor

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

On 2020-06-06 at 00:22:40, brian m. carlson wrote:
> POSIX sh implementations run each command in a pipeline in a subshell,
> although zsh (and AT&T ksh) do not: instead, they run the final command
> in the main shell.  This leads to very different behavior when the final
> command is a shell function which modifies variables.

Sorry about this and the following patch.  I forgot to clean my patches
directory before sending, and accidentally sent some patches for zsh.
My apologies.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import: fix incomplete conversion with multiple mark files
  2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
  2020-06-06  0:22   ` [PATCH v2 0/1] Run pipeline command in subshell in sh mode brian m. carlson
  2020-06-06  0:22   ` [PATCH v2] exec: run final pipeline command in a " brian m. carlson
@ 2020-06-08 15:52   ` Tibor Billes
  2020-06-08 16:47     ` Junio C Hamano
  2 siblings, 1 reply; 9+ messages in thread
From: Tibor Billes @ 2020-06-08 15:52 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano

Hi,

On Sat, 6 Jun 2020, brian m. carlson wrote:

> When ddddf8d7e2 ("fast-import: permit reading multiple marks files",
> 2020-02-22) converted fast-import to handle multiple marks files in
> preparation for submodule support, the conversion was incomplete.  With
> a large number of marks, we would actually modify the marks variable
> even though we had passed in a different variable to operate on.  In
> addition, we didn't consider the fact that the code can replace the mark
> set passed in, so when we did so we happened to leak quite a bit of
> memory, since we never reused the structure we created, instead
> reallocating a new one each time.
>
> It doesn't appear from some testing that we actually produce incorrect
> results in this case, only that we leak a substantial amount of memory.
> To make things work properly and avoid leaking, pass a pointer to
> pointer to struct mark_set, which allows us to modify the set of marks
> when the number of marks is large.
>
> With this patch, importing a dump of git.git with a set of exported
> marks goes from taking in excess of 15 GiB of memory (and being killed
> by the Linux OOM killer) to using a maximum of 1.4 GiB of memory.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>

Thanks for the quickly patching it! I tested the patch and I can confirm this
solves the memory leak for me.

Thanks,
Tibor Billes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import: fix incomplete conversion with multiple mark files
  2020-06-08 15:52   ` [PATCH] fast-import: fix incomplete conversion with multiple mark files Tibor Billes
@ 2020-06-08 16:47     ` Junio C Hamano
  2020-06-08 22:58       ` brian m. carlson
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2020-06-08 16:47 UTC (permalink / raw)
  To: Tibor Billes; +Cc: brian m. carlson, git

Tibor Billes <tbilles@gmx.com> writes:

> On Sat, 6 Jun 2020, brian m. carlson wrote:
>
>> When ddddf8d7e2 ("fast-import: permit reading multiple marks files",
>> 2020-02-22) converted fast-import to handle multiple marks files in
>> preparation for submodule support, the conversion was incomplete.
>...
>
> Thanks for the quickly patching it! I tested the patch and I can confirm this
> solves the memory leak for me.

Tibor, thanks for testing.

Brian, I notice that the singleton global "marks_set_count", even
though the number could be counted per instance of the mark_set
structure, is still singleton.  I didn't bother thinking about it
when I wrote my "perhaps along this line" patch, but now I read it
again, I think it is OK to leave it (and other stats counters) as
is, primarily because we don't have a need (yet) to show stats per
mark_set.  Did you leave it as a global for the same reason?  Just
sanity-checking.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import: fix incomplete conversion with multiple mark files
  2020-06-08 16:47     ` Junio C Hamano
@ 2020-06-08 22:58       ` brian m. carlson
  0 siblings, 0 replies; 9+ messages in thread
From: brian m. carlson @ 2020-06-08 22:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tibor Billes, git

[-- Attachment #1: Type: text/plain, Size: 959 bytes --]

On 2020-06-08 at 16:47:58, Junio C Hamano wrote:
> Brian, I notice that the singleton global "marks_set_count", even
> though the number could be counted per instance of the mark_set
> structure, is still singleton.  I didn't bother thinking about it
> when I wrote my "perhaps along this line" patch, but now I read it
> again, I think it is OK to leave it (and other stats counters) as
> is, primarily because we don't have a need (yet) to show stats per
> mark_set.  Did you leave it as a global for the same reason?  Just
> sanity-checking.

Yes, I did; sorry for not mentioning that in the commit message.  I
think it's fine to count the total number of marks processed as a
statistic, and that would be independent of how we processed them.  It's
an interesting metric, but not so interesting that folks will have a
need to see a detailed breakdown, I think.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-06-08 22:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-05  5:15 git-fast-import out of memory Billes Tibor
2020-06-05 22:43 ` brian m. carlson
2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
2020-06-06  0:22   ` [PATCH v2 0/1] Run pipeline command in subshell in sh mode brian m. carlson
2020-06-06  0:31     ` brian m. carlson
2020-06-06  0:22   ` [PATCH v2] exec: run final pipeline command in a " brian m. carlson
2020-06-08 15:52   ` [PATCH] fast-import: fix incomplete conversion with multiple mark files Tibor Billes
2020-06-08 16:47     ` Junio C Hamano
2020-06-08 22:58       ` brian m. carlson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).