git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ulrich Spörlein" <uqs@freebsd.org>
Cc: Ed Maste <emaste@freebsd.org>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating
Date: Thu, 19 Jan 2017 11:33:50 -0500	[thread overview]
Message-ID: <20170119163350.zsfb33lmigkyljjh@sigill.intra.peff.net> (raw)
In-Reply-To: <CAJ9axoT1pUQc_jTKxO+RMw7emhA4ss1NCAU+hpnyG5LMwGD89w@mail.gmail.com>

On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote:

> > I suspect the patch below may fix things for you. It works around it by
> > walking over the lru list (either is fine, as they both contain all
> > entries, and since we're clearing everything, we don't care about the
> > order).
> 
> Confirmed. With the patch applied, I can import the whole 55G in one go
> without any crashes or aborts. Thanks much!

Thanks. Here it is rolled up with a commit message.

-- >8 --
Subject: clear_delta_base_cache(): don't modify hashmap while iterating

Removing entries while iterating causes fast-import to
access an already-freed `struct packed_git`, leading to
various confusing errors.

What happens is that clear_delta_base_cache() drops the
whole contents of the cache by iterating over the hashmap,
calling release_delta_base_cache() on each entry. That
function removes the item from the hashmap. The hashmap code
may then shrink the table, but the hashmap_iter struct
retains an offset from the old table.

As a result, the next call to hashmap_iter_next() may claim
that the iteration is done, even though some items haven't
been visited.

The only caller of clear_delta_base_cache() is fast-import,
which wants to clear the cache because it is discarding the
packed_git struct for its temporary pack. So by failing to
remove all of the entries, we still have references to the
freed packed_git.

To make things even more confusing, this doesn't seem to
trigger with the test suite, because it depends on
complexities like the size of the hash table, which entries
got cleared, whether we try to access them before they're
evicted from the cache, etc.

So I've been able to identify the problem with large
imports like freebsd's svn import, or a fast-export of
linux.git. But nothing that would be reasonable to run as
part of the normal test suite.

We can fix this easily by iterating over the lru linked list
instead of the hashmap. They both contain the same entries,
and we can use the "safe" variant of the list iterator,
which exists for exactly this case.

Let's also add a warning to the hashmap API documentation to
reduce the chances of getting bit by this again.

Reported-by: Ulrich Spörlein <uqs@freebsd.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-hashmap.txt | 4 +++-
 sha1_file.c                             | 9 ++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
index 28f5a8b71..a3f020cd9 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -188,7 +188,9 @@ Returns the removed entry, or NULL if not found.
 `void *hashmap_iter_next(struct hashmap_iter *iter)`::
 `void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
 
-	Used to iterate over all entries of a hashmap.
+	Used to iterate over all entries of a hashmap. Note that it is
+	not safe to add or remove entries to the hashmap while
+	iterating.
 +
 `hashmap_iter_init` initializes a `hashmap_iter` structure.
 +
diff --git a/sha1_file.c b/sha1_file.c
index 1eb47f611..d20714d6b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 
 void clear_delta_base_cache(void)
 {
-	struct hashmap_iter iter;
-	struct delta_base_cache_entry *entry;
-	for (entry = hashmap_iter_first(&delta_base_cache, &iter);
-	     entry;
-	     entry = hashmap_iter_next(&iter)) {
+	struct list_head *lru, *tmp;
+	list_for_each_safe(lru, tmp, &delta_base_cache_lru) {
+		struct delta_base_cache_entry *entry =
+			list_entry(lru, struct delta_base_cache_entry, lru);
 		release_delta_base_cache(entry);
 	}
 }
-- 
2.11.0.747.g2c5918130


  reply	other threads:[~2017-01-19 16:34 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-12  8:21 git fast-import crashing on big imports Ulrich Spörlein
2017-01-18 14:01 ` Ulrich Spoerlein
2017-01-18 14:38   ` Jeff King
2017-01-18 20:06     ` Jeff King
     [not found]       ` <CAJ9axoSzZJXD4RKvVx+D60dw4sakMJWgNmOP-cREWA53Ae3C3w@mail.gmail.com>
2017-01-18 20:27         ` Jeff King
2017-01-18 21:51           ` Jeff King
2017-01-19 14:03             ` Ulrich Spörlein
2017-01-19 16:33               ` Jeff King [this message]
2017-01-19 19:16                 ` [PATCH] clear_delta_base_cache(): don't modify hashmap while iterating Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170119163350.zsfb33lmigkyljjh@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=emaste@freebsd.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=uqs@freebsd.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).