* git fast-import crashing on big imports @ 2017-01-12 8:21 Ulrich Spörlein 2017-01-18 14:01 ` Ulrich Spoerlein 0 siblings, 1 reply; 9+ messages in thread From: Ulrich Spörlein @ 2017-01-12 8:21 UTC (permalink / raw) To: git; +Cc: Ed Maste Hey, the FreeBSD svn2git conversion is crashing somewhat non-deterministically during its long conversion process. From memory, this was not as bad is it is with more recent versions of git (but I can't be sure, really). I have a dump file that you can grab at http://scan.spoerlein.net/pub/freebsd-base.dump.xz (19G, 55G uncompressed) that shows this problem after a couple of minutes of runtime. The caveat is that for another member of the team on a different machine the crashes are on different revisions. Googling around I found two previous threads that were discussing problems just like this (memory corruption, bad caching, etc) https://www.spinics.net/lists/git/msg93598.html from 2009 and http://git.661346.n2.nabble.com/long-fast-import-errors-out-quot-failed-to-apply-delta-quot-td6557884.html from 2011 % git fast-import --stats < ../freebsd-base.dump ... progress SVN r49318 branch master = :49869 progress SVN r49319 branch stable/3 = :49870 progress SVN r49320 branch master = :49871 error: failed to apply delta error: bad offset for revindex error: bad offset for revindex error: bad offset for revindex error: bad offset for revindex error: bad offset for revindex fatal: Can't load tree b35ae4e9c2a41677e84a3f14bed09f584c3ff25e fast-import: dumping crash report to fast_import_crash_29613 fast-import crash report: fast-import process: 29613 parent process : 29612 at 2017-01-11 19:33:37 +0000 fatal: Can't load tree b35ae4e9c2a41677e84a3f14bed09f584c3ff25e git fsck shows a somewhat incomplete pack file (I guess that's expected if the process dies mid-stream?) % git fsck Checking object directories: 100% (256/256), done. error: failed to apply delta6/614500) error: cannot unpack d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack at offset 122654805 error: failed to apply delta error: failed to read delta base object d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f at offset 122654805 from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack error: cannot unpack 8523bde63ef34bef725347994fdaec996d756510 from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack at offset 122671596 error: failed to apply delta0/614500) error: failed to read delta base object d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f at offset 122654805 from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack ... Any comments on whether the original problems from 2009 and 2011 were ever fixed and committed? Some more facts: - git version 2.11.0 - I don't recall these sorts of crashes with a git from 2-3 years ago - adding more checkpoints does help, but not fix the problem, it merely shifts the crashes around to different revisions - incremental runs of the conversion *will* complete most of the time, but depending on how often checkpoints are used, I've seen it croak on specific commits and not being able to progress further :( Thanks for any pointers or things to try! Cheers Uli ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git fast-import crashing on big imports 2017-01-12 8:21 git fast-import crashing on big imports Ulrich Spörlein @ 2017-01-18 14:01 ` Ulrich Spoerlein 2017-01-18 14:38 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Ulrich Spoerlein @ 2017-01-18 14:01 UTC (permalink / raw) To: git, Jeff King; +Cc: Ed Maste, Junio C Hamano Yo Jeff, your commit 8261e1f139db3f8aa6f9fd7d98c876cbeb0f927c from Aug 22nd, that changes delta_base_cache to use hashmap.h is the culprit for git fast-import crashing on large imports. Please read below, you can find a 55G SVN dump that should show the problem after a couple of minutes to less than an hour. Please also see similar issues from 2009 and 2011. This seems to be a rather fragile part of the code, could you add unit tests that make sure this regression is not re-introduce again once you fix it? Thanks! I'm happy to test any patches that you can provide. Cheers, Uli On Do., 2017-01-12 at 09:21:38 +0100, Ulrich Spörlein wrote: > Hey, > > the FreeBSD svn2git conversion is crashing somewhat > non-deterministically during its long conversion process. From memory, > this was not as bad is it is with more recent versions of git (but I > can't be sure, really). > > I have a dump file that you can grab at > http://scan.spoerlein.net/pub/freebsd-base.dump.xz (19G, 55G uncompressed) > that shows this problem after a couple of minutes of runtime. The caveat is > that for another member of the team on a different machine the crashes are on > different revisions. > > Googling around I found two previous threads that were discussing > problems just like this (memory corruption, bad caching, etc) > > https://www.spinics.net/lists/git/msg93598.html from 2009 > and > http://git.661346.n2.nabble.com/long-fast-import-errors-out-quot-failed-to-apply-delta-quot-td6557884.html > from 2011 > > % git fast-import --stats < ../freebsd-base.dump > ... > progress SVN r49318 branch master = :49869 > progress SVN r49319 branch stable/3 = :49870 > progress SVN r49320 branch master = :49871 > error: failed to apply delta > error: bad offset for revindex > error: bad offset for revindex > error: bad offset for revindex > error: bad offset for revindex > error: bad offset for revindex > fatal: Can't load tree b35ae4e9c2a41677e84a3f14bed09f584c3ff25e > fast-import: dumping crash report to fast_import_crash_29613 > > > fast-import crash report: > fast-import process: 29613 > parent process : 29612 > at 2017-01-11 19:33:37 +0000 > > fatal: Can't load tree b35ae4e9c2a41677e84a3f14bed09f584c3ff25e > > > git fsck shows a somewhat incomplete pack file (I guess that's expected if the > process dies mid-stream?) > > % git fsck > Checking object directories: 100% (256/256), done. > error: failed to apply delta6/614500) > error: cannot unpack d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack at offset 122654805 > error: failed to apply delta > error: failed to read delta base object d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f at offset 122654805 from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack > error: cannot unpack 8523bde63ef34bef725347994fdaec996d756510 from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack at offset 122671596 > error: failed to apply delta0/614500) > error: failed to read delta base object d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f at offset 122654805 from ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack > ... > > > Any comments on whether the original problems from 2009 and 2011 were ever > fixed and committed? > > Some more facts: > - git version 2.11.0 > - I don't recall these sorts of crashes with a git from 2-3 years ago > - adding more checkpoints does help, but not fix the problem, it merely shifts > the crashes around to different revisions > - incremental runs of the conversion *will* complete most of the time, but > depending on how often checkpoints are used, I've seen it croak on specific > commits and not being able to progress further :( > > Thanks for any pointers or things to try! > Cheers > Uli ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git fast-import crashing on big imports 2017-01-18 14:01 ` Ulrich Spoerlein @ 2017-01-18 14:38 ` Jeff King 2017-01-18 20:06 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2017-01-18 14:38 UTC (permalink / raw) To: Ulrich Spoerlein; +Cc: git, Ed Maste, Junio C Hamano On Wed, Jan 18, 2017 at 03:01:17PM +0100, Ulrich Spoerlein wrote: > Yo Jeff, your commit 8261e1f139db3f8aa6f9fd7d98c876cbeb0f927c from Aug > 22nd, that changes delta_base_cache to use hashmap.h is the culprit for > git fast-import crashing on large imports. I actually saw your bug report the other day and tried to download the dump file, but got a 404. Can you double check that it is available? -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git fast-import crashing on big imports 2017-01-18 14:38 ` Jeff King @ 2017-01-18 20:06 ` Jeff King [not found] ` <CAJ9axoSzZJXD4RKvVx+D60dw4sakMJWgNmOP-cREWA53Ae3C3w@mail.gmail.com> 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2017-01-18 20:06 UTC (permalink / raw) To: Ulrich Spoerlein; +Cc: git, Ed Maste, Junio C Hamano On Wed, Jan 18, 2017 at 09:38:14AM -0500, Jeff King wrote: > On Wed, Jan 18, 2017 at 03:01:17PM +0100, Ulrich Spoerlein wrote: > > > Yo Jeff, your commit 8261e1f139db3f8aa6f9fd7d98c876cbeb0f927c from Aug > > 22nd, that changes delta_base_cache to use hashmap.h is the culprit for > > git fast-import crashing on large imports. > > I actually saw your bug report the other day and tried to download the > dump file, but got a 404. Can you double check that it is available? The URL in your original mail was bogus: > > I have a dump file that you can grab at > > http://scan.spoerlein.net/pub/freebsd-base.dump.xz (19G, 55G uncompressed) Via guess-and-check, I found: http://www.spoerlein.net/pub/freebsd-base.git.fi.xz I'll see if I can reproduce the problem. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CAJ9axoSzZJXD4RKvVx+D60dw4sakMJWgNmOP-cREWA53Ae3C3w@mail.gmail.com>]
* Re: git fast-import crashing on big imports [not found] ` <CAJ9axoSzZJXD4RKvVx+D60dw4sakMJWgNmOP-cREWA53Ae3C3w@mail.gmail.com> @ 2017-01-18 20:27 ` Jeff King 2017-01-18 21:51 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2017-01-18 20:27 UTC (permalink / raw) To: Ulrich Spörlein; +Cc: Ed Maste, git, Junio C Hamano On Wed, Jan 18, 2017 at 09:20:07PM +0100, Ulrich Spörlein wrote: > I found your commit via bisect in case you were wondering. Thanks for > picking this up. Still downloading. However, just looking at the code, the obvious culprit would be clear_delta_base_cache(), which is called from literally nowhere except fast-import, and then only when checkpointing. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git fast-import crashing on big imports 2017-01-18 20:27 ` Jeff King @ 2017-01-18 21:51 ` Jeff King 2017-01-19 14:03 ` Ulrich Spörlein 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2017-01-18 21:51 UTC (permalink / raw) To: Ulrich Spörlein; +Cc: Ed Maste, git, Junio C Hamano On Wed, Jan 18, 2017 at 03:27:04PM -0500, Jeff King wrote: > On Wed, Jan 18, 2017 at 09:20:07PM +0100, Ulrich Spörlein wrote: > > > I found your commit via bisect in case you were wondering. Thanks for > > picking this up. > > Still downloading. However, just looking at the code, the obvious > culprit would be clear_delta_base_cache(), which is called from > literally nowhere except fast-import, and then only when checkpointing. Hmm. I haven't reproduced your exact issue, but I was able to produce some hijinks in that function. The problem is that the hashmap_iter interface is unreliable if entries are added or removed from the map during the iteration. I suspect the patch below may fix things for you. It works around it by walking over the lru list (either is fine, as they both contain all entries, and since we're clearing everything, we don't care about the order). --- sha1_file.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 1eb47f611..d20714d6b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) void clear_delta_base_cache(void) { - struct hashmap_iter iter; - struct delta_base_cache_entry *entry; - for (entry = hashmap_iter_first(&delta_base_cache, &iter); - entry; - entry = hashmap_iter_next(&iter)) { + struct list_head *lru, *tmp; + list_for_each_safe(lru, tmp, &delta_base_cache_lru) { + struct delta_base_cache_entry *entry = + list_entry(lru, struct delta_base_cache_entry, lru); release_delta_base_cache(entry); } } -- 2.11.0.698.gd6b48ab4c > > -Peff ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: git fast-import crashing on big imports 2017-01-18 21:51 ` Jeff King @ 2017-01-19 14:03 ` Ulrich Spörlein 2017-01-19 16:33 ` [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Ulrich Spörlein @ 2017-01-19 14:03 UTC (permalink / raw) To: Jeff King; +Cc: Ed Maste, git, Junio C Hamano 2017-01-18 22:51 GMT+01:00 Jeff King <peff@peff.net>: > > On Wed, Jan 18, 2017 at 03:27:04PM -0500, Jeff King wrote: > > > On Wed, Jan 18, 2017 at 09:20:07PM +0100, Ulrich Spörlein wrote: > > > > > I found your commit via bisect in case you were wondering. Thanks for > > > picking this up. > > > > Still downloading. However, just looking at the code, the obvious > > culprit would be clear_delta_base_cache(), which is called from > > literally nowhere except fast-import, and then only when checkpointing. Sorry for the bad URL, pesky last minute changes ... > Hmm. I haven't reproduced your exact issue, but I was able to produce > some hijinks in that function. > > The problem is that the hashmap_iter interface is unreliable if entries > are added or removed from the map during the iteration. > > I suspect the patch below may fix things for you. It works around it by > walking over the lru list (either is fine, as they both contain all > entries, and since we're clearing everything, we don't care about the > order). Confirmed. With the patch applied, I can import the whole 55G in one go without any crashes or aborts. Thanks much! > > --- > sha1_file.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/sha1_file.c b/sha1_file.c > index 1eb47f611..d20714d6b 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) > > void clear_delta_base_cache(void) > { > - struct hashmap_iter iter; > - struct delta_base_cache_entry *entry; > - for (entry = hashmap_iter_first(&delta_base_cache, &iter); > - entry; > - entry = hashmap_iter_next(&iter)) { > + struct list_head *lru, *tmp; > + list_for_each_safe(lru, tmp, &delta_base_cache_lru) { > + struct delta_base_cache_entry *entry = > + list_entry(lru, struct delta_base_cache_entry, lru); > release_delta_base_cache(entry); > } > } > -- > 2.11.0.698.gd6b48ab4c > > > > > > > > -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating 2017-01-19 14:03 ` Ulrich Spörlein @ 2017-01-19 16:33 ` Jeff King 2017-01-19 19:16 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2017-01-19 16:33 UTC (permalink / raw) To: Ulrich Spörlein; +Cc: Ed Maste, git, Junio C Hamano On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote: > > I suspect the patch below may fix things for you. It works around it by > > walking over the lru list (either is fine, as they both contain all > > entries, and since we're clearing everything, we don't care about the > > order). > > Confirmed. With the patch applied, I can import the whole 55G in one go > without any crashes or aborts. Thanks much! Thanks. Here it is rolled up with a commit message. -- >8 -- Subject: clear_delta_base_cache(): don't modify hashmap while iterating Removing entries while iterating causes fast-import to access an already-freed `struct packed_git`, leading to various confusing errors. What happens is that clear_delta_base_cache() drops the whole contents of the cache by iterating over the hashmap, calling release_delta_base_cache() on each entry. That function removes the item from the hashmap. The hashmap code may then shrink the table, but the hashmap_iter struct retains an offset from the old table. As a result, the next call to hashmap_iter_next() may claim that the iteration is done, even though some items haven't been visited. The only caller of clear_delta_base_cache() is fast-import, which wants to clear the cache because it is discarding the packed_git struct for its temporary pack. So by failing to remove all of the entries, we still have references to the freed packed_git. To make things even more confusing, this doesn't seem to trigger with the test suite, because it depends on complexities like the size of the hash table, which entries got cleared, whether we try to access them before they're evicted from the cache, etc. So I've been able to identify the problem with large imports like freebsd's svn import, or a fast-export of linux.git. But nothing that would be reasonable to run as part of the normal test suite. We can fix this easily by iterating over the lru linked list instead of the hashmap. They both contain the same entries, and we can use the "safe" variant of the list iterator, which exists for exactly this case. Let's also add a warning to the hashmap API documentation to reduce the chances of getting bit by this again. Reported-by: Ulrich Spörlein <uqs@freebsd.org> Signed-off-by: Jeff King <peff@peff.net> --- Documentation/technical/api-hashmap.txt | 4 +++- sha1_file.c | 9 ++++----- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt index 28f5a8b71..a3f020cd9 100644 --- a/Documentation/technical/api-hashmap.txt +++ b/Documentation/technical/api-hashmap.txt @@ -188,7 +188,9 @@ Returns the removed entry, or NULL if not found. `void *hashmap_iter_next(struct hashmap_iter *iter)`:: `void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`:: - Used to iterate over all entries of a hashmap. + Used to iterate over all entries of a hashmap. Note that it is + not safe to add or remove entries to the hashmap while + iterating. + `hashmap_iter_init` initializes a `hashmap_iter` structure. + diff --git a/sha1_file.c b/sha1_file.c index 1eb47f611..d20714d6b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) void clear_delta_base_cache(void) { - struct hashmap_iter iter; - struct delta_base_cache_entry *entry; - for (entry = hashmap_iter_first(&delta_base_cache, &iter); - entry; - entry = hashmap_iter_next(&iter)) { + struct list_head *lru, *tmp; + list_for_each_safe(lru, tmp, &delta_base_cache_lru) { + struct delta_base_cache_entry *entry = + list_entry(lru, struct delta_base_cache_entry, lru); release_delta_base_cache(entry); } } -- 2.11.0.747.g2c5918130 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating 2017-01-19 16:33 ` [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating Jeff King @ 2017-01-19 19:16 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2017-01-19 19:16 UTC (permalink / raw) To: Jeff King; +Cc: Ulrich Spörlein, Ed Maste, git Jeff King <peff@peff.net> writes: > Thanks. Here it is rolled up with a commit message. > > -- >8 -- > Subject: clear_delta_base_cache(): don't modify hashmap while iterating > > Removing entries while iterating causes fast-import to > access an already-freed `struct packed_git`, leading to > various confusing errors. > > What happens is that clear_delta_base_cache() drops the > whole contents of the cache by iterating over the hashmap, > calling release_delta_base_cache() on each entry. That > function removes the item from the hashmap. The hashmap code > may then shrink the table, but the hashmap_iter struct > retains an offset from the old table. > > As a result, the next call to hashmap_iter_next() may claim > that the iteration is done, even though some items haven't > been visited. > > The only caller of clear_delta_base_cache() is fast-import, > which wants to clear the cache because it is discarding the > packed_git struct for its temporary pack. So by failing to > remove all of the entries, we still have references to the > freed packed_git. > > To make things even more confusing, this doesn't seem to > trigger with the test suite, because it depends on > complexities like the size of the hash table, which entries > got cleared, whether we try to access them before they're > evicted from the cache, etc. > > So I've been able to identify the problem with large > imports like freebsd's svn import, or a fast-export of > linux.git. But nothing that would be reasonable to run as > part of the normal test suite. > > We can fix this easily by iterating over the lru linked list > instead of the hashmap. They both contain the same entries, > and we can use the "safe" variant of the list iterator, > which exists for exactly this case. > > Let's also add a warning to the hashmap API documentation to > reduce the chances of getting bit by this again. > > Reported-by: Ulrich Spörlein <uqs@freebsd.org> > Signed-off-by: Jeff King <peff@peff.net> > --- Makes sense. Thanks, both, for reporting, finding and fixing. Will apply. > Documentation/technical/api-hashmap.txt | 4 +++- > sha1_file.c | 9 ++++----- > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt > index 28f5a8b71..a3f020cd9 100644 > --- a/Documentation/technical/api-hashmap.txt > +++ b/Documentation/technical/api-hashmap.txt > @@ -188,7 +188,9 @@ Returns the removed entry, or NULL if not found. > `void *hashmap_iter_next(struct hashmap_iter *iter)`:: > `void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`:: > > - Used to iterate over all entries of a hashmap. > + Used to iterate over all entries of a hashmap. Note that it is > + not safe to add or remove entries to the hashmap while > + iterating. > + > `hashmap_iter_init` initializes a `hashmap_iter` structure. > + > diff --git a/sha1_file.c b/sha1_file.c > index 1eb47f611..d20714d6b 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent) > > void clear_delta_base_cache(void) > { > - struct hashmap_iter iter; > - struct delta_base_cache_entry *entry; > - for (entry = hashmap_iter_first(&delta_base_cache, &iter); > - entry; > - entry = hashmap_iter_next(&iter)) { > + struct list_head *lru, *tmp; > + list_for_each_safe(lru, tmp, &delta_base_cache_lru) { > + struct delta_base_cache_entry *entry = > + list_entry(lru, struct delta_base_cache_entry, lru); > release_delta_base_cache(entry); > } > } ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-01-19 19:16 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-01-12 8:21 git fast-import crashing on big imports Ulrich Spörlein 2017-01-18 14:01 ` Ulrich Spoerlein 2017-01-18 14:38 ` Jeff King 2017-01-18 20:06 ` Jeff King [not found] ` <CAJ9axoSzZJXD4RKvVx+D60dw4sakMJWgNmOP-cREWA53Ae3C3w@mail.gmail.com> 2017-01-18 20:27 ` Jeff King 2017-01-18 21:51 ` Jeff King 2017-01-19 14:03 ` Ulrich Spörlein 2017-01-19 16:33 ` [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating Jeff King 2017-01-19 19:16 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).