git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git mailing list <git@vger.kernel.org>, Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH 0/1] merge: fix cache_entry use-after-free
Date: Fri, 09 Oct 2015 18:16:03 -0400	[thread overview]
Message-ID: <1444428963.8836.36.camel@twopensource.com> (raw)
In-Reply-To: <xmqqmvvtqgei.fsf@gitster.mtv.corp.google.com>

+Duy Nguyen, who knows the split index better.

On Thu, 2015-10-08 at 13:00 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > From: Keith McGuigan <kmcguigan@twitter.com>
> >
> > During merges, we would previously free entries that we no longer need
> > in the destination index.  But those entries might also be stored in
> > the dir_entry cache, and when a later call to add_to_index found them,
> > they would be used after being freed.
> >
> > To prevent this, add a ref count for struct cache_entry.  Whenever
> > a cache entry is added to a data structure, the ref count is incremented;
> > when it is removed from the data structure, it is decremented.  When
> > it hits zero, the cache_entry is freed.
> >
> > Signed-off-by: David Turner <dturner@twopensource.com>
> > Signed-off-by: Keith McGuigan <kmcguigan@twitter.com>
> > ---
> 
> Thanks.
> 
> > @@ -213,6 +214,32 @@ struct cache_entry {
> >  struct pathspec;
> >  
> >  /*
> > + * Increment the cache_entry reference count.  Should be called
> > + * whenever a pointer to a cache_entry is retained in a data structure,
> > + * thus marking it as alive.
> > + */
> > +static inline void add_ce_ref(struct cache_entry *ce)
> > +{
> > +	assert(ce != NULL && ce->ref_count >= 0);
> > +	++ce->ref_count;
> > +}
> 
> We tend to prefer post-increment when the distinction between pre-
> or post- does not make any difference, which is the case here.

Will fix.

> > diff --git a/name-hash.c b/name-hash.c
> > index 702cd05..f12c919 100644
> > --- a/name-hash.c
> > +++ b/name-hash.c
> > @@ -92,7 +93,9 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
> >  	struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));
> >  	while (dir && !(--dir->nr)) {
> >  		struct dir_entry *parent = dir->parent;
> > -		hashmap_remove(&istate->dir_hash, dir, NULL);
> > +		struct dir_entry *removed = hashmap_remove(&istate->dir_hash, dir, NULL);
> > +		assert(removed == dir);
> > +		drop_ce_ref(dir->ce);
> 
> This is curious.  In remove_name_hash() you do not have the
> corresponding assert.  Why is it necessary here (or is it
> unnecessary over there)?

It is unnecessary in any case: it's an assert rather than a check for an
expected (or even unexpected) case.  That just happens to be where Keith
first managed to track down the use-after free, so that's where he
happened to put the assert while trying to debug it.  I'm in general in
favor of more asserts rather than fewer, so I would prefer to keep it
in.  But I can remove it if you like.

> > @@ -147,6 +151,7 @@ void remove_name_hash(struct index_state *istate, struct cache_entry *ce)
> >  		return;
> >  	ce->ce_flags &= ~CE_HASHED;
> >  	hashmap_remove(&istate->name_hash, ce, ce);
> > +	drop_ce_ref(ce);
> >  
> >  	if (ignore_case)
> >  		remove_dir_entry(istate, ce);
> > diff --git a/read-cache.c b/read-cache.c
> > index 87204a5..442de34 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -53,6 +53,7 @@ static const char *alternate_index_output;
> >  static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
> >  {
> >  	istate->cache[nr] = ce;
> > +	add_ce_ref(ce);
> >  	add_name_hash(istate, ce);
> >  }
> 
> What happens to the existing entry that used to be istate->cache[nr],
> which may or may not be 'ce' that is replacing it?
> 
> It turns out that all three calling sites are safe, but it is not
> immediately obvious why.  Perhaps some comment in front of the
> function is in order, to warn those who may have to add a new caller
> or restructure the existing calling chain, that istate->cache[nr] is
> expected not to hold a live reference when the function is called,
> or something?

Happy to add it if you want, but to me it was clear without because if
there were a value in istate->cache[nr], that old value would be leaked.
Let me know.

> > @@ -314,8 +314,10 @@ void replace_index_entry_in_base(struct index_state *istate,
> >  	    istate->split_index->base &&
> >  	    old->index <= istate->split_index->base->cache_nr) {
> >  		new->index = old->index;
> > -		if (old != istate->split_index->base->cache[new->index - 1])
> > -			free(istate->split_index->base->cache[new->index - 1]);
> > +		if (old != istate->split_index->base->cache[new->index - 1]) {
> > +			struct cache_entry *ce = istate->split_index->base->cache[new->index - 1];
> > +			drop_ce_ref(ce);
> > +		}
> >  		istate->split_index->base->cache[new->index - 1] = new;
> 
> Does 'new' already have the right refcount at this point?

I spoke to Keith, and he thinks we do need an add_ce_ref there. I'll fix
that on the reroll.  Duy, do you agree?

  reply	other threads:[~2015-10-09 22:16 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-08 18:47 [PATCH 0/1] merge: fix cache_entry use-after-free David Turner
2015-10-08 18:47 ` David Turner
2015-10-08 20:00   ` Junio C Hamano
2015-10-09 22:16     ` David Turner [this message]
2015-10-09 22:51       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1444428963.8836.36.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).