git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH] packed_ref_cache: don't use mmap() for small files
@ 2018-01-13 16:11 Kim Gybels
  2018-01-13 18:56 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kim Gybels @ 2018-01-13 16:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels

Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use
read() instead of mmap() for small packed-refs files.

This also fixes the problem[1] where xmmap() returns NULL for zero
length[2], for which munmap() later fails.

Alternatively, we could simply check for NULL before munmap(), or
introduce an xmunmap() that could be used together with xmmap().

[1] https://github.com/git-for-windows/git/issues/1410
[2] Logic introduced in commit 9130ac1e1966adb9922e64f645730d0d45383495

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
---
 refs/packed-backend.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dab8a85d9a..7177e5bc2f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (!size) {
+		snapshot->buf = NULL;
+		snapshot->eof = NULL;
+		snapshot->mmapped = 0;
+	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.15.1.windows.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
@ 2018-01-13 18:56 ` Johannes Schindelin
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
  2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Johannes Schindelin @ 2018-01-13 18:56 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Michael Haggerty

Hi,

On Sat, 13 Jan 2018, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use

Maybe use

	ea68b0ce9f8 (hash-object: don't use mmap() for small files,
	2010-02-21)

instead of the full commit name?

> read() instead of mmap() for small packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce an xmunmap() that could be used together with xmmap().
> 
> [1] https://github.com/git-for-windows/git/issues/1410
> [2] Logic introduced in commit 9130ac1e1966adb9922e64f645730d0d45383495
> 
> Signed-off-by: Kim Gybels <kgybels@infogroep.be>
> ---
>  refs/packed-backend.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index dab8a85d9a..7177e5bc2f 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
>  				 last_line, eof - last_line);
>  }
>  
> +#define SMALL_FILE_SIZE (32*1024)
> +
>  /*
>   * Depending on `mmap_strategy`, either mmap or read the contents of
>   * the `packed-refs` file into the snapshot. Return 1 if the file
> @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
>  		die_errno("couldn't stat %s", snapshot->refs->path);
>  	size = xsize_t(st.st_size);
>  
> -	switch (mmap_strategy) {
> -	case MMAP_NONE:
> +	if (!size) {
> +		snapshot->buf = NULL;
> +		snapshot->eof = NULL;
> +		snapshot->mmapped = 0;
> +	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
>  		snapshot->buf = xmalloc(size);
>  		bytes_read = read_in_full(fd, snapshot->buf, size);
>  		if (bytes_read < 0 || bytes_read != size)
>  			die_errno("couldn't read %s", snapshot->refs->path);
>  		snapshot->eof = snapshot->buf + size;
>  		snapshot->mmapped = 0;
> -		break;
> -	case MMAP_TEMPORARY:
> -	case MMAP_OK:
> +	} else {
>  		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
>  		snapshot->eof = snapshot->buf + size;
>  		snapshot->mmapped = 1;
> -		break;
>  	}
>  	close(fd);

Nicely explained, and nicely solved, for a potential extra performance
benefit ;-)

Thank you!
Dscho

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] packed_ref_cache: don't use mmap() for small files
  2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
  2018-01-13 18:56 ` Johannes Schindelin
@ 2018-01-14 19:14 ` " Kim Gybels
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
                     ` (3 more replies)
  2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
  2 siblings, 4 replies; 15+ messages in thread
From: Kim Gybels @ 2018-01-14 19:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

This also fixes the problem[1] where xmmap() returns NULL for zero
length[2], for which munmap() later fails.

Alternatively, we could simply check for NULL before munmap(), or
introduce xmunmap() that could be used together with xmmap().

[1] https://github.com/git-for-windows/git/issues/1410
[2] Logic introduced in commit 9130ac1e196 (Better error messages for
    corrupt databases, 2007-01-11)

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
---
Change since v1: reworded commit message based on Johannes Schindelin's
feedback: shorter commit hashes, and included commit titles.

 refs/packed-backend.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dab8a85d9a..7177e5bc2f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (!size) {
+		snapshot->buf = NULL;
+		snapshot->eof = NULL;
+		snapshot->mmapped = 0;
+	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.16.0.rc2.windows.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
@ 2018-01-15 12:17   ` Michael Haggerty
  2018-01-17 20:23     ` Johannes Schindelin
  2018-01-17 21:52     ` Junio C Hamano
  2018-01-15 12:17   ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

Thanks for your patch. I haven't measured the performance difference
of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
surprising that `read()` would be faster.

I especially like the fix for zero-length `packed-refs` files. (Even
though AFAIK Git never writes such files, they are totally legitimate
and shouldn't cause Git to fail.) With or without the additions
mentioned below,

Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>

While reviewing your patch, I realized that some areas of the existing
code use constructs that are undefined according to the C standard,
such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
(and would come up more frequently after your change). Even though
these are unlikely to be problems in the real world, it would be good
to avoid them.

So I will follow up this email with three patches:

1. Mention that `snapshot::buf` can be NULL for empty files

   I suggest squashing this into your patch, to make it clear that
   `snapshot::buf` and `snapshot::eof` can also be NULL if the
   `packed-refs` file is empty.

2. create_snapshot(): exit early if the file was empty

   Avoid undefined behavior by returning early if `snapshot->buf` is
   NULL.

3. find_reference_location(): don't invoke if `snapshot->buf` is NULL

   Avoid undefined behavior and confusing semantics by not calling
   `find_reference_location()` when `snapshot->buf` is NULL.

Michael

Michael Haggerty (3):
  SQUASH? Mention that `snapshot::buf` can be NULL for empty files
  create_snapshot(): exit early if the file was empty
  find_reference_location(): don't invoke if `snapshot->buf` is NULL

 refs/packed-backend.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

-- 
2.14.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
@ 2018-01-15 12:17   ` Michael Haggerty
  2018-01-15 12:17   ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
  2018-01-15 12:17   ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
  3 siblings, 0 replies; 15+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 01a13cb817..f20f05b4df 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -69,11 +69,11 @@ struct snapshot {
 
 	/*
 	 * The contents of the `packed-refs` file. If the file was
-	 * already sorted, this points at the mmapped contents of the
-	 * file. If not, this points at heap-allocated memory
-	 * containing the contents, sorted. If there were no contents
-	 * (e.g., because the file didn't exist), `buf` and `eof` are
-	 * both NULL.
+	 * already sorted and if mmapping is allowed, this points at
+	 * the mmapped contents of the file. If not, this points at
+	 * heap-allocated memory containing the contents, sorted. If
+	 * there were no contents (e.g., because the file didn't exist
+	 * or was empty), `buf` and `eof` are both NULL.
 	 */
 	char *buf, *eof;
 
-- 
2.14.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] create_snapshot(): exit early if the file was empty
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
  2018-01-15 12:17   ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
@ 2018-01-15 12:17   ` Michael Haggerty
  2018-01-15 12:17   ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
  3 siblings, 0 replies; 15+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

If the `packed_refs` files is entirely empty (i.e., not even a header
line), then `load_contents()` returns 1 even though `snapshot->buf`
and `snapshot->eof` both end up set to NULL. In that case, the
subsequent processing that `create_snapshot()` does is unnecessary,
and also involves computing `NULL - NULL` and `NULL + 0`, which are
probably safe in real life but are technically undefined in C.

Sidestep both issues by exiting early if `snapshot->buf` is NULL.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index f20f05b4df..36796d65f0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -613,7 +613,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 	acquire_snapshot(snapshot);
 	snapshot->peeled = PEELED_NONE;
 
-	if (!load_contents(snapshot))
+	if (!load_contents(snapshot) || !snapshot->buf)
 		return snapshot;
 
 	/* If the file has a header line, process it: */
-- 
2.14.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
                     ` (2 preceding siblings ...)
  2018-01-15 12:17   ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
@ 2018-01-15 12:17   ` Michael Haggerty
  3 siblings, 0 replies; 15+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

If `snapshot->buf` is NULL, then `find_reference_location()` has two
problems:

1. It relies on behavior that is technically undefined in C, such as
   computing `NULL + 0`.

2. It returns NULL if the reference doesn't exist, even if `mustexist`
   is not set. This problem doesn't come up in the current code,
   because we never call this function with `snapshot->buf == NULL`
   and `mustexist` set. But it is something that future callers need
   to be aware of.

We could fix the first problem by adding some extra logic to the
function. But considering both problems together, it is more
straightforward to document that the function should only be called if
`snapshot->buf` is non-NULL.

Adjust `packed_read_raw_ref()` to return early if `snapshot->buf` is
NULL rather than calling `find_reference_location()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 36796d65f0..ed2b396bef 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -521,8 +521,9 @@ static int load_contents(struct snapshot *snapshot)
  * reference name; for example, one could search for "refs/replace/"
  * to find the start of any replace references.
  *
+ * This function must only be called if `snapshot->buf` is non-NULL.
  * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
+ * also be sorted.
  */
 static const char *find_reference_location(struct snapshot *snapshot,
 					   const char *refname, int mustexist)
@@ -728,6 +729,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 
 	*type = 0;
 
+	if (!snapshot->buf) {
+		/* There are no packed references */
+		errno = ENOENT;
+		return -1;
+	}
+
 	rec = find_reference_location(snapshot, refname, 1);
 
 	if (!rec) {
-- 
2.14.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
  2018-01-13 18:56 ` Johannes Schindelin
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
@ 2018-01-15 21:15 ` Jeff King
  2018-01-15 23:37   ` Kim Gybels
  2 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2018-01-15 21:15 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On Sat, Jan 13, 2018 at 05:11:49PM +0100, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use
> read() instead of mmap() for small packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce an xmunmap() that could be used together with xmmap().

This looks good to me, and since it's a recent-ish regression, I think
we should take the minimal fix here.

But it does make me wonder whether xmmap() ought to be doing this "small
mmap" optimization for us. Obviously that only works when we do
MAP_PRIVATE and never write to the result. But that's how we always use
it anyway, and we're restricted to that to work with the NO_MMAP wrapper
in compat/mmap.c.

> @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
>  		die_errno("couldn't stat %s", snapshot->refs->path);
>  	size = xsize_t(st.st_size);
>  
> -	switch (mmap_strategy) {
> -	case MMAP_NONE:
> +	if (!size) {
> +		snapshot->buf = NULL;
> +		snapshot->eof = NULL;
> +		snapshot->mmapped = 0;
> +	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
>  		snapshot->buf = xmalloc(size);
>  		bytes_read = read_in_full(fd, snapshot->buf, size);
>  		if (bytes_read < 0 || bytes_read != size)
>  			die_errno("couldn't read %s", snapshot->refs->path);
>  		snapshot->eof = snapshot->buf + size;
>  		snapshot->mmapped = 0;

If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE",
then we'd try to xmalloc(0), which is guaranteed to work (we fallback to
a 1-byte allocation if necessary). Would that make things simpler and
more consistent for the rest of the code to always have snapshot->buf be
a valid pointer (just based on seeing Michael's follow-up patches)?

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
@ 2018-01-15 23:37   ` Kim Gybels
  2018-01-15 23:52     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Kim Gybels @ 2018-01-15 23:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On (15/01/18 16:15), Jeff King wrote:

> On Sat, Jan 13, 2018 at 05:11:49PM +0100, Kim Gybels wrote:
> 
> > Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use
> > read() instead of mmap() for small packed-refs files.
> > 
> > This also fixes the problem[1] where xmmap() returns NULL for zero
> > length[2], for which munmap() later fails.
> > 
> > Alternatively, we could simply check for NULL before munmap(), or
> > introduce an xmunmap() that could be used together with xmmap().
> 
> This looks good to me, and since it's a recent-ish regression, I think
> we should take the minimal fix here.

The minimal fix being a simple NULL check before munmap()?

> But it does make me wonder whether xmmap() ought to be doing this "small
> mmap" optimization for us. Obviously that only works when we do
> MAP_PRIVATE and never write to the result. But that's how we always use
> it anyway, and we're restricted to that to work with the NO_MMAP wrapper
> in compat/mmap.c.

Maybe I should have left the optimization for small files out of the patch for
the zero length regression. After all, read() vs mmap() performance might
depend on other factors than just size.

> > @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
> >  		die_errno("couldn't stat %s", snapshot->refs->path);
> >  	size = xsize_t(st.st_size);
> >  
> > -	switch (mmap_strategy) {
> > -	case MMAP_NONE:
> > +	if (!size) {
> > +		snapshot->buf = NULL;
> > +		snapshot->eof = NULL;
> > +		snapshot->mmapped = 0;
> > +	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
> >  		snapshot->buf = xmalloc(size);
> >  		bytes_read = read_in_full(fd, snapshot->buf, size);
> >  		if (bytes_read < 0 || bytes_read != size)
> >  			die_errno("couldn't read %s", snapshot->refs->path);
> >  		snapshot->eof = snapshot->buf + size;
> >  		snapshot->mmapped = 0;
> 
> If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE",
> then we'd try to xmalloc(0), which is guaranteed to work (we fallback to
> a 1-byte allocation if necessary). Would that make things simpler and
> more consistent for the rest of the code to always have snapshot->buf be
> a valid pointer (just based on seeing Michael's follow-up patches)?

Indeed, all those patches are to avoid using the NULL pointers in ways that are
undefined. We could also copy index_core's way of handling the zero length
case:
ret = index_mem(sha1, "", size, type, path, flags);

Point to some static memory instead of NULL, then all the pointer arithmetic is defined.

-Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-15 23:37   ` Kim Gybels
@ 2018-01-15 23:52     ` Jeff King
  2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2018-01-15 23:52 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On Tue, Jan 16, 2018 at 12:37:51AM +0100, Kim Gybels wrote:

> > This looks good to me, and since it's a recent-ish regression, I think
> > we should take the minimal fix here.
> 
> The minimal fix being a simple NULL check before munmap()?

Sorry to be unclear. I just meant that your patch is probably fine
as-is. I didn't want to hold up a regression fix with a bunch of
nit-picking or possible future work, when we could build that on top
later.

> > But it does make me wonder whether xmmap() ought to be doing this "small
> > mmap" optimization for us. Obviously that only works when we do
> > MAP_PRIVATE and never write to the result. But that's how we always use
> > it anyway, and we're restricted to that to work with the NO_MMAP wrapper
> > in compat/mmap.c.
> 
> Maybe I should have left the optimization for small files out of the patch for
> the zero length regression. After all, read() vs mmap() performance might
> depend on other factors than just size.

I'd be OK including it here, since there's prior art in the commit you
referenced. Though of course actual numbers are always good when
claiming an optimization. :)

> > If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE",
> > then we'd try to xmalloc(0), which is guaranteed to work (we fallback to
> > a 1-byte allocation if necessary). Would that make things simpler and
> > more consistent for the rest of the code to always have snapshot->buf be
> > a valid pointer (just based on seeing Michael's follow-up patches)?
> 
> Indeed, all those patches are to avoid using the NULL pointers in ways that are
> undefined. We could also copy index_core's way of handling the zero length
> case:
> ret = index_mem(sha1, "", size, type, path, flags);
> 
> Point to some static memory instead of NULL, then all the pointer arithmetic is defined.

Yep, that would work, too. I don't think the overhead of a
once-per-process xmalloc(0) is a big deal, though, if it keeps the code
simpler (though I admit it is not that complex either way).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-15 23:52     ` Jeff King
@ 2018-01-16 19:38       ` " Kim Gybels
  2018-01-17 22:09         ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Kim Gybels @ 2018-01-16 19:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

This also fixes the problem[1] where xmmap() returns NULL for zero
length[2], for which munmap() later fails.

Alternatively, we could simply check for NULL before munmap(), or
introduce xmunmap() that could be used together with xmmap(). However,
always setting snapshot->buf to a valid pointer, by relying on
xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
easier.

[1] https://github.com/git-for-windows/git/issues/1410
[2] Logic introduced in commit 9130ac1e196 (Better error messages for
    corrupt databases, 2007-01-11)

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
---

Change since v2: removed separate case for zero length as suggested by Peff,
ensuring that snapshot->buf is always a valid pointer.

 refs/packed-backend.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dab8a85d9a..b6e2bc3c1d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -489,21 +491,17 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.15.1.windows.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
@ 2018-01-17 20:23     ` Johannes Schindelin
  2018-01-17 21:52     ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Schindelin @ 2018-01-17 20:23 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Kim Gybels, Junio C Hamano, git

Hi Michael,

On Mon, 15 Jan 2018, Michael Haggerty wrote:

> Thanks for your patch. I haven't measured the performance difference
> of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
> surprising that `read()` would be faster.
> 
> I especially like the fix for zero-length `packed-refs` files. (Even
> though AFAIK Git never writes such files, they are totally legitimate
> and shouldn't cause Git to fail.) With or without the additions
> mentioned below,
> 
> Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
> 
> While reviewing your patch, I realized that some areas of the existing
> code use constructs that are undefined according to the C standard,
> such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
> (and would come up more frequently after your change). Even though
> these are unlikely to be problems in the real world, it would be good
> to avoid them.
> 
> So I will follow up this email with three patches:
> 
> 1. Mention that `snapshot::buf` can be NULL for empty files
> 
>    I suggest squashing this into your patch, to make it clear that
>    `snapshot::buf` and `snapshot::eof` can also be NULL if the
>    `packed-refs` file is empty.
> 
> 2. create_snapshot(): exit early if the file was empty
> 
>    Avoid undefined behavior by returning early if `snapshot->buf` is
>    NULL.
> 
> 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL
> 
>    Avoid undefined behavior and confusing semantics by not calling
>    `find_reference_location()` when `snapshot->buf` is NULL.
> 
> Michael
> 
> Michael Haggerty (3):
>   SQUASH? Mention that `snapshot::buf` can be NULL for empty files
>   create_snapshot(): exit early if the file was empty
>   find_reference_location(): don't invoke if `snapshot->buf` is NULL

I reviewed those patches and find the straight-forward (and obviously
good).

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
  2018-01-17 20:23     ` Johannes Schindelin
@ 2018-01-17 21:52     ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2018-01-17 21:52 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Kim Gybels, Johannes Schindelin, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> So I will follow up this email with three patches:
>
> 1. Mention that `snapshot::buf` can be NULL for empty files
>
>    I suggest squashing this into your patch, to make it clear that
>    `snapshot::buf` and `snapshot::eof` can also be NULL if the
>    `packed-refs` file is empty.
>
> 2. create_snapshot(): exit early if the file was empty
>
>    Avoid undefined behavior by returning early if `snapshot->buf` is
>    NULL.
>
> 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL
>
>    Avoid undefined behavior and confusing semantics by not calling
>    `find_reference_location()` when `snapshot->buf` is NULL.

These look all sensible with today's code and with v2 from this
thread.

With the v3, i.e. "do the xmalloc() even for size==0", however,
snapshot->buf would never be NULL, so I'd shelve them for now,
though.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
@ 2018-01-17 22:09         ` Jeff King
  2018-01-21  4:41           ` Michael Haggerty
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2018-01-17 22:09 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
> small files, 2010-02-21) and use read() instead of mmap() for small
> packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce xmunmap() that could be used together with xmmap(). However,
> always setting snapshot->buf to a valid pointer, by relying on
> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
> easier.
> 
> [1] https://github.com/git-for-windows/git/issues/1410
> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
>     corrupt databases, 2007-01-11)
> 
> Signed-off-by: Kim Gybels <kgybels@infogroep.be>
> ---
> 
> Change since v2: removed separate case for zero length as suggested by Peff,
> ensuring that snapshot->buf is always a valid pointer.

Thanks, this looks fine to me (I'd be curious to hear from Michael if
this eliminates the need for the other patches).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-17 22:09         ` Jeff King
@ 2018-01-21  4:41           ` Michael Haggerty
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Haggerty @ 2018-01-21  4:41 UTC (permalink / raw)
  To: Jeff King, Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin

On 01/17/2018 11:09 PM, Jeff King wrote:
> On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:
> 
>> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
>> small files, 2010-02-21) and use read() instead of mmap() for small
>> packed-refs files.
>>
>> This also fixes the problem[1] where xmmap() returns NULL for zero
>> length[2], for which munmap() later fails.
>>
>> Alternatively, we could simply check for NULL before munmap(), or
>> introduce xmunmap() that could be used together with xmmap(). However,
>> always setting snapshot->buf to a valid pointer, by relying on
>> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
>> easier.
>>
>> [1] https://github.com/git-for-windows/git/issues/1410
>> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
>>     corrupt databases, 2007-01-11)
>>
>> Signed-off-by: Kim Gybels <kgybels@infogroep.be>
>> ---
>>
>> Change since v2: removed separate case for zero length as suggested by Peff,
>> ensuring that snapshot->buf is always a valid pointer.
> 
> Thanks, this looks fine to me (I'd be curious to hear from Michael if
> this eliminates the need for the other patches).

`snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
(see the earlier code path in `load_contents()`). So either that code
path *also* has to get the `xmalloc()` treatment, or my third patch is
still necessary. (My second patch wouldn't be necessary because the
ENOENT case makes `load_contents()` return 0, triggering the early exit
from `create_snapshot()`.)

I don't have a strong preference either way.

Michael

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
2018-01-13 18:56 ` Johannes Schindelin
2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
2018-01-17 20:23     ` Johannes Schindelin
2018-01-17 21:52     ` Junio C Hamano
2018-01-15 12:17   ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
2018-01-15 12:17   ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
2018-01-15 12:17   ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
2018-01-15 23:37   ` Kim Gybels
2018-01-15 23:52     ` Jeff King
2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
2018-01-17 22:09         ` Jeff King
2018-01-21  4:41           ` Michael Haggerty

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox