git@vger.kernel.org mailing list mirror (one of many)
* git-fast-import out of memory
@ 2020-06-05  5:15 Billes Tibor
  2020-06-05 22:43 ` brian m. carlson
  2020-06-06  0:22 ` [PATCH] fast-import: fix incomplete conversion with multiple mark files brian m. carlson
  0 siblings, 2 replies; 9+ messages in thread
From: Billes Tibor @ 2020-06-05  5:15 UTC
  To: git, brian m. carlson, Junio C Hamano

Hi,

I recently upgraded my git to version 2.27.0-1~ppa0~ubuntu18.04.1 and noticed
that git-fast-import uses so much memory that it gets killed. I'm fetching from
a Mercurial repo using an importer from
https://github.com/mnauw/git-remote-hg.git, which uses git-fast-import to fetch
commits from Mercurial.

Here is the output of a git fetch, showing that it used 14 GB of RAM (on a
16 GB machine):
# time git fetch
error: git-fast-import died of signal 9
fatal: error while running fast-import
Command exited with non-zero status 128
2.02user 3.82system 0:08.00elapsed 73%CPU (0avgtext+0avgdata 14744800maxresident)k
104920inputs+0outputs (414major+3688606minor)pagefaults 0swaps

strace shows that git-fast-import reads marks from a file, then allocates
some memory, reads more marks, allocates more memory, and so on:

11191 06:19:08.180572 read(7<.../.git/hg/origin/marks-git>, "79798
8ea080f15ab22807608aae4696dd23edefd8febe\n:220396
919079de10d43caf3fcde56bb1a17994b47a6214\n:75683
928813193a1535dc1274ed9da2f54f5de2caf2f4\n:155297
9108211d7ba318076fb53b2bd3d291102b376dbf\n:162042
9458fe329e9be30ad2b61e75197595889d80144b\n:305834
93485ce7991b4330a1114136b5d8e08d8bd1505b\n:223654
9750bdef7d22a885d2522bdd9e0a0e882979098e\n...", 4096) = 4096 <0.000027>
11191 06:19:08.182162 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be38ef000 <0.000024>
11191 06:19:08.183403 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be36ee000 <0.000127>
11191 06:19:08.184775 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be34ed000 <0.000059>
11191 06:19:08.186036 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be32ec000 <0.000121>
11191 06:19:08.187412 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be30eb000 <0.000110>
11191 06:19:08.188743 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be2eea000 <0.000022>
11191 06:19:08.189929 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be2ce9000 <0.000039>
11191 06:19:08.191150 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be2ae8000 <0.000019>
11191 06:19:08.192329 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be28e7000 <0.000023>
11191 06:19:08.193536 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be26e6000 <0.000038>
11191 06:19:08.194523 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be24e5000 <0.000019>
11191 06:19:08.195474 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be22e4000 <0.000212>
11191 06:19:08.196677 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be20e3000 <0.000027>
11191 06:19:08.197729 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be1ee2000 <0.000128>
11191 06:19:08.198883 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be1ce1000 <0.000043>
11191 06:19:08.199881 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be1ae0000 <0.000124>
11191 06:19:08.200959 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be18df000 <0.000020>
11191 06:19:08.201943 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5be16de000 <0.000021>

The following shows that memory allocation seems to be linear with respect to
the number of marks, but with a very high constant factor:
# cut -d' ' -f 3 /tmp/gitfetch.strace | cut -d '(' -f 1 | uniq -c

         [ ... cut (this is not the start of the allocations) ... ]
       1 read
      47 mmap
       1 read
      79 mmap
       1 read
      36 mmap
         [ ... removed some other syscalls ... ]
      73 mmap
       1 read
     141 mmap
       1 read
     173 mmap
       1 read
     204 mmap
       1 read
     235 mmap
       1 read
     267 mmap
       1 read
     297 mmap
       1 read
     329 mmap
       1 read
     361 mmap
       1 read
     392 mmap
       1 read
     424 mmap
       1 read
     454 mmap
       1 read
     493 mmap
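
The same tally can also be produced with a short script that additionally sums
the bytes requested by each run of mmap calls. This is only a sketch: it
assumes the trace lines look like the excerpt above (PID, timestamp, then the
call), and the names parse_calls and summarize_runs are made up for this
illustration.

```python
import re
from itertools import groupby

# Matches the start of a trace line such as:
#   11191 06:19:08.182162 mmap(NULL, 2101248, PROT_READ|PROT_WRITE,
# Assumption: every call starts with "<pid> <HH:MM:SS.usec> <name>(".
SYSCALL_RE = re.compile(r"^\d+\s+\d{2}:\d{2}:\d{2}\.\d+\s+(\w+)\((.*)")


def parse_calls(lines):
    """Yield (syscall_name, mmap_length_or_0) for each line that starts a call."""
    for line in lines:
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # wrapped continuation line or unrelated output
        name, args = m.group(1), m.group(2)
        length = 0
        if name == "mmap":
            parts = args.split(",")
            # The second mmap argument is the requested length in bytes.
            if len(parts) >= 2 and parts[1].strip().isdigit():
                length = int(parts[1].strip())
        yield name, length


def summarize_runs(lines):
    """Group consecutive identical syscalls (like `uniq -c`) and sum mmap bytes."""
    runs = []
    for name, group in groupby(parse_calls(lines), key=lambda call: call[0]):
        calls = list(group)
        runs.append((name, len(calls), sum(length for _, length in calls)))
    return runs
```

Passing the lines of /tmp/gitfetch.strace to summarize_runs() yields one
(name, count, bytes) tuple per run. Every mmap in the excerpt above requests
2101248 bytes (about 2 MiB), so the byte column makes the growth between
successive reads easy to see.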

My marks file contains 91k entries, but git fetch reads only 1400 of them
before it is killed.
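
For anyone who wants to check the same figures on their own marks file, here
is a minimal sketch that counts the entries and reports the range of mark ids.
It assumes each line has the form ":<id> <hash>", as seen in the read() buffer
above; read_marks and summarize_marks are hypothetical helper names, not part
of git or git-remote-hg.

```python
def read_marks(lines):
    """Parse ':<id> <hash>' lines into a dict of mark id -> object hash."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not line.startswith(":"):
            raise ValueError(f"not a mark line: {line!r}")
        mark_id, _, obj_hash = line[1:].partition(" ")
        marks[int(mark_id)] = obj_hash
    return marks


def summarize_marks(marks):
    """Return the entry count and the lowest/highest mark id (None if empty)."""
    if not marks:
        return {"entries": 0, "lowest_id": None, "highest_id": None}
    return {
        "entries": len(marks),
        "lowest_id": min(marks),
        "highest_id": max(marks),
    }
```

Opening the marks-git file from the trace and passing it to read_marks() gives
the mapping; summarize_marks() then reports how many entries it holds and how
widely the ids are spread.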

I bisected the problem; below is my bisect log:
git bisect start
# good: [af6b65d45ef179ed52087e80cb089f6b2349f4ec] Git 2.26.2
git bisect good af6b65d45ef179ed52087e80cb089f6b2349f4ec
# bad: [b3d7a52fac39193503a0b6728771d1bf6a161464] Git 2.27
git bisect bad b3d7a52fac39193503a0b6728771d1bf6a161464
# bad: [af986863c1ae2e306d5627f4e42cc6d2cf2a057f] Merge branch 'dd/ci-musl-libc'
git bisect bad af986863c1ae2e306d5627f4e42cc6d2cf2a057f
# bad: [7a8bb6db7cc04add05484c4fc907e34f76b12fb9] Merge branch 'jm/gitweb-fastcgi-utf8'
git bisect bad 7a8bb6db7cc04add05484c4fc907e34f76b12fb9
# bad: [4e4baee3f44da26a5eaab27c76d597b04fef5259] Merge branch 'bc/filter-process'
git bisect bad 4e4baee3f44da26a5eaab27c76d597b04fef5259
# good: [883e23820ed21b4ae65463f2a87152285bf77937] Merge branch 'en/oidset-uninclude-hashmap'
git bisect good 883e23820ed21b4ae65463f2a87152285bf77937
# bad: [1bdca816412910e1206c15ef47f2a8a6b369b831] fast-import: add options for rewriting submodules
git bisect bad 1bdca816412910e1206c15ef47f2a8a6b369b831
# good: [bf154a878281b6a971ece0fb6d917938298be60d] t/helper: make repository tests hash independent
git bisect good bf154a878281b6a971ece0fb6d917938298be60d
# good: [e02a7141f83326f7098800fed764061ecf1f0eff] worktree: allow repository version 1
git bisect good e02a7141f83326f7098800fed764061ecf1f0eff
# bad: [abe0cc536414f2b9cfa37f208b36df5126e6356a] fast-import: add helper function for inserting mark object entries
git bisect bad abe0cc536414f2b9cfa37f208b36df5126e6356a
# bad: [ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64] fast-import: permit reading multiple marks files
git bisect bad ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64
# good: [42d4e1d1128fa1cb56032ac58f65ea3dd1296a9a] commit: use expected signature header for SHA-256
git bisect good 42d4e1d1128fa1cb56032ac58f65ea3dd1296a9a
# first bad commit: [ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64] fast-import: permit reading multiple marks files

According to the bisect the first bad commit is:

commit ddddf8d7e254f4af6297d0ed62ea6a5d7eabdb64 (refs/bisect/bad)
Author: brian m. carlson <sandals@crustytoothpaste.net>
Date:   Sat Feb 22 20:17:45 2020 +0000

     fast-import: permit reading multiple marks files

     In the future, we'll want to read marks files for submodules as well.
     Refactor the existing code to make it possible to read multiple marks
     files, each into their own marks set.

     Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     Signed-off-by: Junio C Hamano <gitster@pobox.com>

When doing the bisect, it was easier for me to use git from the Ubuntu
package and replace only the git-fast-import binary with the one I was
testing. I hope this doesn't invalidate the bisect results. The behavior
seemed to be consistent: it either produced the issue above, or it worked
perfectly fine.

Can you help me fix this issue? I hope the information I gathered is enough
to help you find the cause of this behavior. I'd be happy to provide more
information or test patches if needed. Unfortunately, the source code I was
fetching is proprietary, so I cannot post it.

Best Regards,
Tibor Billes

