git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares <matheus.bernardino@usp.br>
To: git@vger.kernel.org
Cc: phil.hord@gmail.com, dstolee@microsoft.com,
	jonathantanmy@google.com, stefanbeller@gmail.com,
	gitster@pobox.com
Subject: [PATCH v2 0/2] Fix race condition and memory leak in delta base cache
Date: Mon, 28 Sep 2020 21:01:51 -0300	[thread overview]
Message-ID: <cover.1601337543.git.matheus.bernardino@usp.br> (raw)
In-Reply-To: <cover.1601311803.git.matheus.bernardino@usp.br>

Changes since v1:

- Reworded patch one to remove misleading sentence about the locking
  during step 2.
- In patch two, let add_delta_base_cache() take full ownership of the
  base, properly releasing it when the insertion is skipped.

Matheus Tavares (2):
  packfile: fix race condition on unpack_entry()
  packfile: fix memory leak in add_delta_base_cache()

 packfile.c | 48 +++++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

Range-diff against v1:
1:  42a7948f94 ! 1:  948d07673f packfile: fix race condition on unpack_entry()
    @@ Commit message
         objects.
     
         To fix the race condition (and later segmentation fault), let's reorder
    -    the aforementioned steps so that the lock is not released between 1.
    -    and 3. An alternative solution which would not require the reordering
    -    would be to duplicate `base` before inserting it in the cache.
    -    However, as Phil Hord mentioned, memcpy()'ing large bases can negatively
    -    affect performance: in his experiments, this alternative approach slowed
    -    git-grep down by 10% to 20%.
    +    the aforementioned steps so that `base` is only added to the cache at
    +    the end. This will prevent the buffer from being released by another
    +    thread while it is still in use. An alternative solution which would not
    +    require the reordering would be to duplicate `base` before inserting it
    +    in the cache. However, as Phil Hord mentioned, memcpy()'ing large bases
    +    can negatively affect performance: in his experiments, this alternative
    +    approach slowed git-grep down by 10% to 20%.
     
         Reported-by: Phil Hord <phil.hord@gmail.com>
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
2:  5b6e3019e0 ! 2:  f15f0c82fd packfile: fix memory leak in add_delta_base_cache()
    @@ Commit message
         When add_delta_base_cache() is called with a base that is already in the
         cache, no operation is performed. But the check is done after allocating
         space for a new entry, so we end up leaking memory on the early return.
    -    Also, the caller always expect that the base will be inserted, so it
    -    never free()'s it. To fix both of these memory leaks, let's move the
    +    In addition, the caller never free()'s the base as it expects the
    +    function to take ownership of it. But the base is not released when we
    +    skip insertion, so it also gets leaked. To fix these problems, move the
         allocation of a new entry further down in add_delta_base_cache(), and
    -    make the function return an integer to indicate whether the insertion
    -    was performed or not. Then, make the caller free() the base when needed.
    +    free() the base on early return.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## packfile.c ##
     @@ packfile.c: void clear_delta_base_cache(void)
    - 	}
    - }
    - 
    --static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
    -+static int add_delta_base_cache(struct packed_git *p, off_t base_offset,
    + static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
      	void *base, unsigned long base_size, enum object_type type)
      {
     -	struct delta_base_cache_entry *ent = xmalloc(sizeof(*ent));
    @@ packfile.c: void clear_delta_base_cache(void)
      
      	/*
     @@ packfile.c: static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
    + 	 * is unpacking the same object, in unpack_entry() (since its phases I
      	 * and III might run concurrently across multiple threads).
      	 */
    - 	if (in_delta_base_cache(p, base_offset))
    --		return;
    -+		return 0;
    +-	if (in_delta_base_cache(p, base_offset))
    ++	if (in_delta_base_cache(p, base_offset)) {
    ++		free(base);
    + 		return;
    ++	}
      
      	delta_base_cached += base_size;
      
    @@ packfile.c: static void add_delta_base_cache(struct packed_git *p, off_t base_of
      	ent->key.p = p;
      	ent->key.base_offset = base_offset;
      	ent->type = type;
    -@@ packfile.c: static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
    - 		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, NULL, 0);
    - 	hashmap_entry_init(&ent->ent, pack_entry_hash(p, base_offset));
    - 	hashmap_add(&delta_base_cache, &ent->ent);
    -+	return 1;
    - }
    - 
    - int packed_object_info(struct repository *r, struct packed_git *p,
    -@@ packfile.c: void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
    - 		 * thread could free() it (e.g. to make space for another entry)
    - 		 * before we are done using it.
    - 		 */
    --		if (!external_base)
    --			add_delta_base_cache(p, base_obj_offset, base, base_size, type);
    -+		if (!external_base && !add_delta_base_cache(p, base_obj_offset,
    -+						base, base_size, type)) {
    -+			free(base);
    -+		}
    - 
    - 		free(delta_data);
    - 		free(external_base);
-- 
2.28.0


  parent reply	other threads:[~2020-09-29  0:02 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-25  2:36 RFC - concurrency causes segfault in git grep since 2.26.0 Phil Hord
2020-09-25  5:52 ` Matheus Tavares
2020-09-25 19:53   ` Phil Hord
2020-09-28 16:50     ` [PATCH 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares
2020-09-28 16:50       ` [PATCH 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares
2020-09-28 18:05         ` Junio C Hamano
2020-09-28 16:50       ` [PATCH 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares
2020-09-28 18:22         ` Junio C Hamano
2020-09-29  0:01       ` Matheus Tavares [this message]
2020-09-29  0:01         ` [PATCH v2 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares
2020-10-02 20:06           ` Phil Hord
2020-09-29  0:01         ` [PATCH v2 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1601337543.git.matheus.bernardino@usp.br \
    --to=matheus.bernardino@usp.br \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=phil.hord@gmail.com \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).