git@vger.kernel.org mailing list mirror (one of many)
* git repack on shallow clone of large repo (linux kernel) hangs at "Enumerating objects"
From: Bagas Sanjaya @ 2021-05-16 13:09 UTC
  To: Git Users

Hi,

I have a shallow clone of linux-stable repo [1] on my computer. Now
I'm trying to repack with `git repack -A -d`.

Before repacking, here are the object counts on my clone
(`git count-objects -v`):

> count: 0
> size: 0
> in-pack: 3162206
> packs: 17
> size-pack: 3120393
> prune-packable: 0
> garbage: 0
> size-garbage: 0
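`git count-objects -v` reports `size`, `size-pack` and `size-garbage` in KiB (unless `-H` is given), so the figures above amount to roughly 3 GiB of packs. A minimal Python sketch for reading that output; `parse_count_objects` is a hypothetical helper, not part of git:

```python
from __future__ import annotations


def parse_count_objects(text: str) -> dict[str, int]:
    """Parse `git count-objects -v` output ("key: value" per line) into ints."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank or unexpected lines
            stats[key.strip()] = int(value.strip())
    return stats


# Output quoted in the report above.
SAMPLE = """\
count: 0
size: 0
in-pack: 3162206
packs: 17
size-pack: 3120393
prune-packable: 0
garbage: 0
size-garbage: 0
"""

stats = parse_count_objects(SAMPLE)
size_pack_gib = stats["size-pack"] / (1024 * 1024)  # size-pack is in KiB
print(f'{stats["in-pack"]} packed objects in {stats["packs"]} packs, '
      f"{size_pack_gib:.2f} GiB on disk")
# prints: 3162206 packed objects in 17 packs, 2.98 GiB on disk
```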

And I have 41496 commits (only on master).

And here is the relevant config:

> pack.deltacachesize=120M
> pack.windowmemory=400M
> pack.packsizelimit=650M
> pack.autopacklimit=0

When I triggered the repack, I expected all objects in the 17 packs to be
consolidated into several 650M-sized packs. However, repacking hung at the
"Enumerating objects" stage; it was stuck at:

"Enumerating objects: 902036"
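As a rough sanity check of that expectation: git config sizes with a `k`, `m` or `g` suffix are multiples of 1024, so `pack.packsizelimit=650M` is 681574400 bytes, and about 3 GiB of existing packs would need at least five packs. The sketch assumes the rewritten packs come out about as large as the old ones, which is only an approximation (repack recomputes deltas and does not pack unreachable objects); `parse_git_size` is a hypothetical helper.

```python
import math

# Figures from the report above.
SIZE_PACK_KIB = 3120393   # "size-pack" from `git count-objects -v` (KiB)
PACK_SIZE_LIMIT = "650M"  # pack.packsizelimit


def parse_git_size(value: str) -> int:
    """Convert a git-style size such as "650M" to bytes (k/m/g = powers of 1024)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


limit_bytes = parse_git_size(PACK_SIZE_LIMIT)
total_bytes = SIZE_PACK_KIB * 1024
# Assumption: the repacked data is about as large as the existing packs.
estimate = math.ceil(total_bytes / limit_bytes)
print(f"~{estimate} packs of at most {PACK_SIZE_LIMIT}")  # ~5 packs of at most 650M
```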

I also tried to reproduce the issue with another repo, a local copy of the
GCC repo [2], using a similar config, and the repack succeeded without any
errors.

Am I missing something?

[1]: https://github.com/gregkh/linux
[2]: https://github.com/gcc-mirror/gcc

-- 
An old man doll... just what I always wanted! - Clara


* Re: git repack on shallow clone of large repo (linux kernel) hangs at "Enumerating objects"
From: Jeff King @ 2021-05-17  8:51 UTC
  To: Bagas Sanjaya; +Cc: Git Users

On Sun, May 16, 2021 at 08:09:56PM +0700, Bagas Sanjaya wrote:

> I have a shallow clone of linux-stable repo [1] on my computer. Now
> I'm trying to repack with `git repack -A -d`.
> 
> Before repacking, here are the object counts on my clone
> (`git count-objects -v`):
> 
> > count: 0
> > size: 0
> > in-pack: 3162206
> > packs: 17
> > size-pack: 3120393
> > prune-packable: 0
> > garbage: 0
> > size-garbage: 0
> 
> And I have 41496 commits (only on master).
> 
> And here is the relevant config:
> 
> > pack.deltacachesize=120M
> > pack.windowmemory=400M
> > pack.packsizelimit=650M
> > pack.autopacklimit=0
> 
> When I triggered the repack, I expected all objects in the 17 packs to be
> consolidated into several 650M-sized packs. However, repacking hung at the
> "Enumerating objects" stage; it was stuck at:
> 
> "Enumerating objects: 902036"

You could try using strace or gdb to see what it's doing.

But as a guess, one thing that sometimes causes a stall near the end of
"enumerating objects" is loosening unreachable objects that are
currently packed. You told repack to use "-A", which asks to loosen
those objects so they aren't lost when the old packs are deleted (as
opposed to "-a").

You'd probably want to at least say "--unpack-unreachable=some.time" to
avoid writing out ones which are not even recent (which is what "git gc"
does under the hood).

But if you don't care about keeping them at all (e.g., because this is
not an active repository where other simultaneous operations may be
taking place, so you know it is safe to delete even recent ones), then
just "git repack -a -d" is probably your best bet.
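A simplified Python model of what happens to a packed object when the old packs are deleted, under each of the options discussed above. It is a sketch, not git's code: it ignores `.keep` packs and other details, and it assumes an object's age is a single timestamp (the hypothetical `mtime` field) compared against the `--unpack-unreachable` cutoff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fate(Enum):
    REPACKED = "written to the new pack"
    LOOSENED = "written out as a loose object"
    DROPPED = "deleted along with the old pack"


@dataclass
class PackedObject:
    oid: str
    reachable: bool
    mtime: int  # hypothetical age stamp, seconds since the epoch


def fate(obj: PackedObject, loosen: bool, cutoff: Optional[int] = None) -> Fate:
    """loosen=False models `repack -a -d`; loosen=True models `repack -A -d`.

    `cutoff` models --unpack-unreachable=<when>: unreachable objects older
    than it are not loosened, so they disappear with the old pack.
    """
    if obj.reachable:
        return Fate.REPACKED
    if not loosen:
        return Fate.DROPPED
    if cutoff is not None and obj.mtime < cutoff:
        return Fate.DROPPED
    return Fate.LOOSENED


NOW = 1_621_600_000
DAY = 24 * 3600
old = PackedObject("old", reachable=False, mtime=NOW - 30 * DAY)
recent = PackedObject("recent", reachable=False, mtime=NOW - DAY)
cases = [
    ("-a -d", False, None),
    ("-A -d", True, None),
    ("-A -d --unpack-unreachable=2.weeks.ago", True, NOW - 14 * DAY),
]
for flags, loosen, cutoff in cases:
    print(f"{flags}: old={fate(old, loosen, cutoff).name}, "
          f"recent={fate(recent, loosen, cutoff).name}")
```

In this model only the second case writes every unreachable object back out as a loose file, which is the expensive step suspected here. The third case is roughly what `git gc` does with its default `gc.pruneExpire` of `2.weeks.ago`.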

-Peff


* Re: git repack on shallow clone of large repo (linux kernel) hangs at "Enumerating objects"
From: Bagas Sanjaya @ 2021-05-18 11:23 UTC
  To: Jeff King; +Cc: Git Users

[-- Attachment #1: Type: text/plain, Size: 2215 bytes --]

Hi Jeff,

> You could try using strace or gdb to see what it's doing.
> 
> But as a guess, one thing that sometimes causes a stall near the end of
> "enumerating objects" is loosening unreachable objects that are
> currently packed. You told repack to use "-A", which asks to loosen
> those objects so they aren't lost when the old packs are deleted (as
> opposed to "-a").
> 

I attached two strace logs, one for "git repack -A -d" and one for "git
repack -a -d".

For the former, I got the following excerpt before I had to SIGINT the process:

> stat("/opt/git/libexec/git-core/git", {st_mode=S_IFREG|0755, st_size=22096480, ...}) = 0
> pipe([5, 7])                            = 0
> openat(AT_FDCWD, "/dev/null", O_RDWR|O_CLOEXEC) = 8
> fcntl(8, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
> fcntl(8, F_SETFD, FD_CLOEXEC)           = 0
> rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7feb5ecbfa10) = 13691
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> close(7)                                = 0
> read(5, "", 8)                          = 0
> close(5)                                = 0
> close(8)                                = 0
> close(4)                                = 0
> fcntl(3, F_GETFL)                       = 0 (flags O_RDONLY)
> fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
> read(3, 0x55a540de5250, 4096)           = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

I think that in the case of "git repack -A -d", it was stuck at the
last read() before I pressed Ctrl-C to send SIGINT.

> You'd probably want to at least say "--unpack-unreachable=some.time" to
> avoid writing out ones which are not even recent (and which is what "git
> gc" will do under the hood).
>
> But if you don't care about keeping them at all (e.g., because this is
> not an active repository where other simultaneous operations may be
> taking place, so you know it is safe to delete even recent ones), then
> just "git repack -a -d" is probably your best bet.
> 
> -Peff
> 
Using "-a" instead of "-A" with git repack works. Thanks.


[-- Attachment #2: git-repack-ad.strace.xz --]
[-- Type: application/x-xz, Size: 6408 bytes --]

[-- Attachment #3: git-repack-Ad.strace.xz --]
[-- Type: application/x-xz, Size: 3228 bytes --]


* Re: git repack on shallow clone of large repo (linux kernel) hangs at "Enumerating objects"
From: Jeff King @ 2021-05-18 12:07 UTC
  To: Bagas Sanjaya; +Cc: Git Users

On Tue, May 18, 2021 at 06:23:40PM +0700, Bagas Sanjaya wrote:

> > You could try using strace or gdb to see what it's doing.
> > 
> > But as a guess, one thing that sometimes causes a stall near the end of
> > "enumerating objects" is loosening unreachable objects that are
> > currently packed. You told repack to use "-A", which asks to loosen
> > those objects so they aren't lost when the old packs are deleted (as
> > opposed to "-a").
> > 
> 
> I attached two strace logs, one for "git repack -A -d" and one for "git
> repack -a -d".
> 
> For the former, I got the following excerpt before I had to SIGINT the process:
> 
> > stat("/opt/git/libexec/git-core/git", {st_mode=S_IFREG|0755, st_size=22096480, ...}) = 0
> > pipe([5, 7])                            = 0
> > openat(AT_FDCWD, "/dev/null", O_RDWR|O_CLOEXEC) = 8
> > fcntl(8, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
> > fcntl(8, F_SETFD, FD_CLOEXEC)           = 0
> > rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
> > clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7feb5ecbfa10) = 13691
> > rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> > close(7)                                = 0
> > read(5, "", 8)                          = 0
> > close(5)                                = 0
> > close(8)                                = 0
> > close(4)                                = 0
> > fcntl(3, F_GETFL)                       = 0 (flags O_RDONLY)
> > fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
> > read(3, 0x55a540de5250, 4096)           = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> 
> I think that in the case of "git repack -A -d", it was stuck at the
> last read() before I pressed Ctrl-C to send SIGINT.

You need "strace -f". The parent repack process spawns pack-objects (via
the clone() call above), and then waits for it to print the name of the
generated pack at the end. So it will stall on that read() for quite a
while, even under normal circumstances.
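The parent/child pattern described above can be mimicked in a few lines of Python: a parent starts a child with its stdout on a pipe and then sits in `read()` until the child prints a result. Tracing only the parent shows nothing but that pending `read()`; `strace -f` also follows the child. The child script and the pack name it prints are made up for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for pack-objects: it works for a moment, then prints a
# (made-up) pack name on stdout and exits.
CHILD_SCRIPT = "import time; time.sleep(0.2); print('0123abcd')"


def run_like_repack() -> str:
    # Parent: spawn the child with its stdout connected to a pipe...
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        # ...then block reading that pipe until the child reports its result.
        # strace on the parent alone would show only this pending read().
        name = child.stdout.readline().strip()
    return name  # leaving the `with` block also waited for the child


print(run_like_repack())  # prints 0123abcd
```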

-Peff


* Re: git repack on shallow clone of large repo (linux kernel) hangs at "Enumerating objects"
From: Bagas Sanjaya @ 2021-05-22 11:16 UTC
  To: Jeff King; +Cc: Git Users

Hi Jeff, sorry for the late reply.

On 18/05/21 19.07, Jeff King wrote:
> You need "strace -f". The parent repack process spawns pack-objects (via
> the clone() call above), and then waits for it to print the name of the
> generated pack at the end. So it will stall on that read() for quite a
> while, even under normal circumstances.
> 
> -Peff
> 

Now on my clone, git count-objects -v says:

> count: 0
> size: 0
> in-pack: 3469192
> packs: 7
> size-pack: 2181233
> prune-packable: 0
> garbage: 0
> size-garbage: 0

I now have 60211 commits, and the oldest available tag is v3.15-rc3.

Here is an excerpt from strace -f just before I sent SIGINT because of the hang:

> 22903 openat(AT_FDCWD, ".git/objects/3b/tmp_obj_24pz93", O_RDWR|O_CREAT|O_EXCL, 0444) = 3
> 22903 write(3, "x\1+)JMU0421d040031Q\320K\316I\314K\327M\313/\312M,"..., 1113) = 1113
> 22903 close(3)                          = 0
> 22903 utime(".git/objects/3b/tmp_obj_24pz93", {actime=1621599665 /* 2021-05-21T19:21:05+0700 */, modtime=1621599665 /* 2021-05-21T19:21:05+0700 */}) = 0
> 22903 link(".git/objects/3b/tmp_obj_24pz93", ".git/objects/3b/816f00a02062692e95a9a756247fca34abb911") = 0
> 22903 unlink(".git/objects/3b/tmp_obj_24pz93") = 0
> 22903 access(".git/objects/3b/819396230eda4ce9be9fbb2c91c13ebb28e8d3", F_OK) = -1 ENOENT (No such file or directory)
> 22903 getpid()                          = 22903
> 22903 openat(AT_FDCWD, ".git/objects/3b/tmp_obj_yrHAe4", O_RDWR|O_CREAT|O_EXCL, 0444) = 3
> 22903 write(3, "x\1\224}y<\324\337\367\377(\205\262E\226\262D\205T\0233\214=\n!\25\205R\22\306\314`"..., 4096) = 4096
> 22903 write(3, "\365\16\236\215!&\37 \v\300\240\334\33r\266/_\317Hg,\333\356w\366\235\307\245<E\347\t"..., 4096) = 4096
> 22903 write(3, "\362\367\267\224Pc\207\252!\240$\"\26\212\204\22\2315\245\255\357j\373f\316G9\352[\r!\326"..., 4096) = 4096
> 22903 write(3, "\305\vC3\253\217\343\17kl>v\200_\226Y\206\247\21@\327mi\310\177{7{i\214w\243"..., 4096) = 4096
> 22903 write(3, "@X\17\213\304\307z\232v\355\313y\316,\342\202p\"\343|x\306\306<\335DM\350\315!m\207"..., 4096) = 4096
> 22903 write(3, "'\253R\33\267\310p\3\332\201\333\316\254'\3114\347\277\0321\35\223\314i\255v\252X{[r\33"..., 4096) = 4096
> 22903 write(3, "\261\341a\252\355\360\356\340\375A$\214\377\336O\311A\215\327\5g2\\\324\332\276\363\323\306S\215\342"..., 4096) = 4096
> 22903 write(3, "s_n83S\326\224]H\271\246-\267\35\32\234\6s!\273\177\30\256k\\eW\355E\264^"..., 4096) = 4096
> 22903 write(3, "E\0266\37\30244Mtk9\310\305\227\370\245\355qwUi\363\237\302\347S\4Xp\342\330\370"..., 4096) = 4096
> 22903 write(3, "v\267\303B\177\232v{\342\353\1\340A\205[\336\266\250_e\346\347e)\212^\241]\263\233\270\323"..., 4096) = 4096
> 22903 write(3, "\325dA\266\333\221\274\32\367\277\271nh\327\276\272C\323-E\242\351\232\352\303]|\335\227\227\200\301"..., 4096 <unfinished ...>
> 22902 <... read resumed>0x5594d878b7b0, 4096) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> 22903 <... write resumed>)              = 4096

The last write() sequence seems to run repeatedly. What does that mean? An infinite loop?



* Re: git repack on shallow clone of large repo (linux kernel) hangs at "Enumerating objects"
From: Jeff King @ 2021-05-22 12:11 UTC
  To: Bagas Sanjaya; +Cc: Git Users

On Sat, May 22, 2021 at 06:16:15PM +0700, Bagas Sanjaya wrote:

> Here is an excerpt from strace -f just before I sent SIGINT because of the hang:
> 
> > 22903 openat(AT_FDCWD, ".git/objects/3b/tmp_obj_24pz93", O_RDWR|O_CREAT|O_EXCL, 0444) = 3
> > 22903 write(3, "x\1+)JMU0421d040031Q\320K\316I\314K\327M\313/\312M,"..., 1113) = 1113
> > 22903 close(3)                          = 0
> > 22903 utime(".git/objects/3b/tmp_obj_24pz93", {actime=1621599665 /* 2021-05-21T19:21:05+0700 */, modtime=1621599665 /* 2021-05-21T19:21:05+0700 */}) = 0
> > 22903 link(".git/objects/3b/tmp_obj_24pz93", ".git/objects/3b/816f00a02062692e95a9a756247fca34abb911") = 0
> > 22903 unlink(".git/objects/3b/tmp_obj_24pz93") = 0

OK, so this is writing out loose objects. It's exactly the case I
suspected.
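For illustration, a minimal Python sketch of the loose-object write sequence visible in the trace: create a temporary `tmp_obj_*` file with `O_CREAT|O_EXCL`, write the zlib-compressed object, set its times, hard-link it to its final name, and unlink the temporary name. It assumes SHA-1 object names and the standard loose-object encoding (`"<type> <size>\0"` followed by the content, then deflated). Level 1 is used because `core.looseCompression` defaults to 1 (best speed) when neither it nor `core.compression` is set, which matches the `x\1` zlib header at the start of the written data. The function name and the random temp suffix are not git's.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, obj_type: str, data: bytes) -> str:
    """Write one loose object the way the trace suggests; return its SHA-1 name."""
    raw = f"{obj_type} {len(data)}".encode() + b"\0" + data
    oid = hashlib.sha1(raw).hexdigest()
    fan_out = os.path.join(objects_dir, oid[:2])
    final_path = os.path.join(fan_out, oid[2:])
    if os.path.exists(final_path):  # cf. access(..., F_OK): already present
        return oid
    os.makedirs(fan_out, exist_ok=True)
    # Unique temporary name in the same directory (cf. openat(... tmp_obj_...)).
    tmp_path = os.path.join(fan_out, "tmp_obj_" + os.urandom(4).hex())
    fd = os.open(tmp_path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o444)
    try:
        payload = zlib.compress(raw, 1)  # level 1 -> b"x\x01" zlib header
        while payload:  # cf. the run of write() calls
            written = os.write(fd, payload)
            payload = payload[written:]
    finally:
        os.close(fd)  # cf. close()
    os.utime(tmp_path)  # cf. utime(): set times to "now"
    os.link(tmp_path, final_path)  # cf. link(tmp, final)
    os.unlink(tmp_path)  # cf. unlink(tmp)
    return oid


with tempfile.TemporaryDirectory() as objects_dir:
    print(write_loose_object(objects_dir, "blob", b"hello\n"))
    # prints ce013625030ba8dba906f756967f9e9ca394464a
```

Each object costs several syscalls plus a new file, so loosening hundreds of thousands of unreachable objects is slow even though nothing is looping.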

> The last write() sequence seems to run repeatedly. What does that mean? An infinite loop?

If it's just a lot of write() calls, it's probably one large object. If
it's a lot of writes interspersed with opens and links, as above, then
it's just a lot of objects. It would probably finish eventually; it's
just slow.
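That rule of thumb can be turned into a rough check over a saved `strace -f` log. This is a hypothetical Python helper, not an existing tool: it counts syscalls by name, treats each `openat()` of a `tmp_obj_*` file as the start of a new loose object, and uses an arbitrary threshold to tell "many objects" from "one large object". The sample is an abridged copy of the trace above.

```python
import re
from collections import Counter

# Start of a syscall line in `strace -f` output: optional PID, then "name(".
# Continuation lines such as "<... write resumed>" deliberately do not match.
CALL = re.compile(r"^\s*(?:\d+\s+)?(\w+)\((.*)")


def summarize(trace: str) -> Counter:
    """Count syscalls by name, plus temporary loose-object files opened."""
    counts: Counter = Counter()
    for line in trace.splitlines():
        m = CALL.match(line)
        if not m:
            continue
        name, args = m.groups()
        counts[name] += 1
        if name == "openat" and "/tmp_obj_" in args:
            counts["loose_objects_started"] += 1
    return counts


def diagnose(trace: str) -> str:
    """Rule of thumb from the thread; the threshold of 2 is arbitrary."""
    counts = summarize(trace)
    if counts["loose_objects_started"] >= 2:
        return "many objects"
    if counts["write"] > 0:
        return "one large object"
    return "no object writes seen"


SAMPLE = """\
22903 openat(AT_FDCWD, ".git/objects/3b/tmp_obj_24pz93", O_RDWR|O_CREAT|O_EXCL, 0444) = 3
22903 write(3, "x", 1113) = 1113
22903 link(".git/objects/3b/tmp_obj_24pz93", ".git/objects/3b/816f00a02062692e95a9a756247fca34abb911") = 0
22903 openat(AT_FDCWD, ".git/objects/3b/tmp_obj_yrHAe4", O_RDWR|O_CREAT|O_EXCL, 0444) = 3
22903 write(3, "x", 4096) = 4096
22902 <... read resumed>0x5594d878b7b0, 4096) = ? ERESTARTSYS
"""
print(diagnose(SAMPLE))  # many objects
```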

-Peff
