* [PATCH] packed_ref_cache: don't use mmap() for small files @ 2018-01-13 16:11 Kim Gybels 2018-01-13 18:56 ` Johannes Schindelin ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Kim Gybels @ 2018-01-13 16:11 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use read() instead of mmap() for small packed-refs files. This also fixes the problem[1] where xmmap() returns NULL for zero length[2], for which munmap() later fails. Alternatively, we could simply check for NULL before munmap(), or introduce an xmunmap() that could be used together with xmmap(). [1] https://github.com/git-for-windows/git/issues/1410 [2] Logic introduced in commit 9130ac1e1966adb9922e64f645730d0d45383495 Signed-off-by: Kim Gybels <kgybels@infogroep.be> --- refs/packed-backend.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index dab8a85d9a..7177e5bc2f 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot) last_line, eof - last_line); } +#define SMALL_FILE_SIZE (32*1024) + /* * Depending on `mmap_strategy`, either mmap or read the contents of * the `packed-refs` file into the snapshot. Return 1 if the file @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot) die_errno("couldn't stat %s", snapshot->refs->path); size = xsize_t(st.st_size); - switch (mmap_strategy) { - case MMAP_NONE: + if (!size) { + snapshot->buf = NULL; + snapshot->eof = NULL; + snapshot->mmapped = 0; + } else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) { snapshot->buf = xmalloc(size); bytes_read = read_in_full(fd, snapshot->buf, size); if (bytes_read < 0 || bytes_read != size) die_errno("couldn't read %s", snapshot->refs->path); snapshot->eof = snapshot->buf + size; snapshot->mmapped = 0; - break; - case MMAP_TEMPORARY: - case MMAP_OK: + } else { snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); snapshot->eof = snapshot->buf + size; snapshot->mmapped = 1; - break; } close(fd); -- 2.15.1.windows.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH] packed_ref_cache: don't use mmap() for small files 2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels @ 2018-01-13 18:56 ` Johannes Schindelin 2018-01-14 19:14 ` [PATCH v2] " Kim Gybels 2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King 2 siblings, 0 replies; 33+ messages in thread From: Johannes Schindelin @ 2018-01-13 18:56 UTC (permalink / raw) To: Kim Gybels; +Cc: git, Junio C Hamano, Michael Haggerty Hi, On Sat, 13 Jan 2018, Kim Gybels wrote: > Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use Maybe use ea68b0ce9f8 (hash-object: don't use mmap() for small files, 2010-02-21) instead of the full commit name? > read() instead of mmap() for small packed-refs files. > > This also fixes the problem[1] where xmmap() returns NULL for zero > length[2], for which munmap() later fails. > > Alternatively, we could simply check for NULL before munmap(), or > introduce an xmunmap() that could be used together with xmmap(). > > [1] https://github.com/git-for-windows/git/issues/1410 > [2] Logic introduced in commit 9130ac1e1966adb9922e64f645730d0d45383495 > > Signed-off-by: Kim Gybels <kgybels@infogroep.be> > --- > refs/packed-backend.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index dab8a85d9a..7177e5bc2f 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot) > last_line, eof - last_line); > } > > +#define SMALL_FILE_SIZE (32*1024) > + > /* > * Depending on `mmap_strategy`, either mmap or read the contents of > * the `packed-refs` file into the snapshot. Return 1 if the file > @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot) > die_errno("couldn't stat %s", snapshot->refs->path); > size = xsize_t(st.st_size); > > - switch (mmap_strategy) { > - case MMAP_NONE: > + if (!size) { > + snapshot->buf = NULL; > + snapshot->eof = NULL; > + snapshot->mmapped = 0; > + } else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) { > snapshot->buf = xmalloc(size); > bytes_read = read_in_full(fd, snapshot->buf, size); > if (bytes_read < 0 || bytes_read != size) > die_errno("couldn't read %s", snapshot->refs->path); > snapshot->eof = snapshot->buf + size; > snapshot->mmapped = 0; > - break; > - case MMAP_TEMPORARY: > - case MMAP_OK: > + } else { > snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); > snapshot->eof = snapshot->buf + size; > snapshot->mmapped = 1; > - break; > } > close(fd); Nicely explained, and nicely solved, for a potential extra performance benefit ;-) Thank you! Dscho ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2] packed_ref_cache: don't use mmap() for small files 2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels 2018-01-13 18:56 ` Johannes Schindelin @ 2018-01-14 19:14 ` Kim Gybels 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty 2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King 2 siblings, 1 reply; 33+ messages in thread From: Kim Gybels @ 2018-01-14 19:14 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for small files, 2010-02-21) and use read() instead of mmap() for small packed-refs files. This also fixes the problem[1] where xmmap() returns NULL for zero length[2], for which munmap() later fails. Alternatively, we could simply check for NULL before munmap(), or introduce xmunmap() that could be used together with xmmap(). [1] https://github.com/git-for-windows/git/issues/1410 [2] Logic introduced in commit 9130ac1e196 (Better error messages for corrupt databases, 2007-01-11) Signed-off-by: Kim Gybels <kgybels@infogroep.be> --- Change since v1: reworded commit message based on Johannes Schindelin's feedback: shorter commit hashes, and included commit titles. refs/packed-backend.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index dab8a85d9a..7177e5bc2f 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot) last_line, eof - last_line); } +#define SMALL_FILE_SIZE (32*1024) + /* * Depending on `mmap_strategy`, either mmap or read the contents of * the `packed-refs` file into the snapshot. Return 1 if the file @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot) die_errno("couldn't stat %s", snapshot->refs->path); size = xsize_t(st.st_size); - switch (mmap_strategy) { - case MMAP_NONE: + if (!size) { + snapshot->buf = NULL; + snapshot->eof = NULL; + snapshot->mmapped = 0; + } else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) { snapshot->buf = xmalloc(size); bytes_read = read_in_full(fd, snapshot->buf, size); if (bytes_read < 0 || bytes_read != size) die_errno("couldn't read %s", snapshot->refs->path); snapshot->eof = snapshot->buf + size; snapshot->mmapped = 0; - break; - case MMAP_TEMPORARY: - case MMAP_OK: + } else { snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); snapshot->eof = snapshot->buf + size; snapshot->mmapped = 1; - break; } close(fd); -- 2.16.0.rc2.windows.1 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" 2018-01-14 19:14 ` [PATCH v2] " Kim Gybels @ 2018-01-15 12:17 ` Michael Haggerty 2018-01-15 12:17 ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty ` (4 more replies) 0 siblings, 5 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw) To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty Thanks for your patch. I haven't measured the performance difference of `mmap()` vs. `read()` for small `packed-refs` files, but it's not surprising that `read()` would be faster. I especially like the fix for zero-length `packed-refs` files. (Even though AFAIK Git never writes such files, they are totally legitimate and shouldn't cause Git to fail.) With or without the additions mentioned below, Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu> While reviewing your patch, I realized that some areas of the existing code use constructs that are undefined according to the C standard, such as computing `NULL + 0` and `NULL - NULL`. This was already wrong (and would come up more frequently after your change). Even though these are unlikely to be problems in the real world, it would be good to avoid them. So I will follow up this email with three patches: 1. Mention that `snapshot::buf` can be NULL for empty files I suggest squashing this into your patch, to make it clear that `snapshot::buf` and `snapshot::eof` can also be NULL if the `packed-refs` file is empty. 2. create_snapshot(): exit early if the file was empty Avoid undefined behavior by returning early if `snapshot->buf` is NULL. 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL Avoid undefined behavior and confusing semantics by not calling `find_reference_location()` when `snapshot->buf` is NULL. Michael Michael Haggerty (3): SQUASH? Mention that `snapshot::buf` can be NULL for empty files create_snapshot(): exit early if the file was empty find_reference_location(): don't invoke if `snapshot->buf` is NULL refs/packed-backend.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) -- 2.14.2 ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty @ 2018-01-15 12:17 ` Michael Haggerty 2018-01-15 12:17 ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty ` (3 subsequent siblings) 4 siblings, 0 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw) To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 01a13cb817..f20f05b4df 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -69,11 +69,11 @@ struct snapshot { /* * The contents of the `packed-refs` file. If the file was - * already sorted, this points at the mmapped contents of the - * file. If not, this points at heap-allocated memory - * containing the contents, sorted. If there were no contents - * (e.g., because the file didn't exist), `buf` and `eof` are - * both NULL. + * already sorted and if mmapping is allowed, this points at + * the mmapped contents of the file. If not, this points at + * heap-allocated memory containing the contents, sorted. If + * there were no contents (e.g., because the file didn't exist + * or was empty), `buf` and `eof` are both NULL. */ char *buf, *eof; -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 2/3] create_snapshot(): exit early if the file was empty 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty 2018-01-15 12:17 ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty @ 2018-01-15 12:17 ` Michael Haggerty 2018-01-15 12:17 ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty ` (2 subsequent siblings) 4 siblings, 0 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw) To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty If the `packed_refs` files is entirely empty (i.e., not even a header line), then `load_contents()` returns 1 even though `snapshot->buf` and `snapshot->eof` both end up set to NULL. In that case, the subsequent processing that `create_snapshot()` does is unnecessary, and also involves computing `NULL - NULL` and `NULL + 0`, which are probably safe in real life but are technically undefined in C. Sidestep both issues by exiting early if `snapshot->buf` is NULL. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index f20f05b4df..36796d65f0 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -613,7 +613,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) acquire_snapshot(snapshot); snapshot->peeled = PEELED_NONE; - if (!load_contents(snapshot)) + if (!load_contents(snapshot) || !snapshot->buf) return snapshot; /* If the file has a header line, process it: */ -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty 2018-01-15 12:17 ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty 2018-01-15 12:17 ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty @ 2018-01-15 12:17 ` Michael Haggerty 2018-01-17 20:23 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin 2018-01-17 21:52 ` Junio C Hamano 4 siblings, 0 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw) To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty If `snapshot->buf` is NULL, then `find_reference_location()` has two problems: 1. It relies on behavior that is technically undefined in C, such as computing `NULL + 0`. 2. It returns NULL if the reference doesn't exist, even if `mustexist` is not set. This problem doesn't come up in the current code, because we never call this function with `snapshot->buf == NULL` and `mustexist` set. But it is something that future callers need to be aware of. We could fix the first problem by adding some extra logic to the function. But considering both problems together, it is more straightforward to document that the function should only be called if `snapshot->buf` is non-NULL. Adjust `packed_read_raw_ref()` to return early if `snapshot->buf` is NULL rather than calling `find_reference_location()`. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 36796d65f0..ed2b396bef 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -521,8 +521,9 @@ static int load_contents(struct snapshot *snapshot) * reference name; for example, one could search for "refs/replace/" * to find the start of any replace references. * + * This function must only be called if `snapshot->buf` is non-NULL. * The record is sought using a binary search, so `snapshot->buf` must - * be sorted. + * also be sorted. */ static const char *find_reference_location(struct snapshot *snapshot, const char *refname, int mustexist) @@ -728,6 +729,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store, *type = 0; + if (!snapshot->buf) { + /* There are no packed references */ + errno = ENOENT; + return -1; + } + rec = find_reference_location(snapshot, refname, 1); if (!rec) { -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty ` (2 preceding siblings ...) 2018-01-15 12:17 ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty @ 2018-01-17 20:23 ` Johannes Schindelin 2018-01-17 21:52 ` Junio C Hamano 4 siblings, 0 replies; 33+ messages in thread From: Johannes Schindelin @ 2018-01-17 20:23 UTC (permalink / raw) To: Michael Haggerty; +Cc: Kim Gybels, Junio C Hamano, git Hi Michael, On Mon, 15 Jan 2018, Michael Haggerty wrote: > Thanks for your patch. I haven't measured the performance difference > of `mmap()` vs. `read()` for small `packed-refs` files, but it's not > surprising that `read()` would be faster. > > I especially like the fix for zero-length `packed-refs` files. (Even > though AFAIK Git never writes such files, they are totally legitimate > and shouldn't cause Git to fail.) With or without the additions > mentioned below, > > Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu> > > While reviewing your patch, I realized that some areas of the existing > code use constructs that are undefined according to the C standard, > such as computing `NULL + 0` and `NULL - NULL`. This was already wrong > (and would come up more frequently after your change). Even though > these are unlikely to be problems in the real world, it would be good > to avoid them. > > So I will follow up this email with three patches: > > 1. Mention that `snapshot::buf` can be NULL for empty files > > I suggest squashing this into your patch, to make it clear that > `snapshot::buf` and `snapshot::eof` can also be NULL if the > `packed-refs` file is empty. > > 2. create_snapshot(): exit early if the file was empty > > Avoid undefined behavior by returning early if `snapshot->buf` is > NULL. > > 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL > > Avoid undefined behavior and confusing semantics by not calling > `find_reference_location()` when `snapshot->buf` is NULL. > > Michael > > Michael Haggerty (3): > SQUASH? Mention that `snapshot::buf` can be NULL for empty files > create_snapshot(): exit early if the file was empty > find_reference_location(): don't invoke if `snapshot->buf` is NULL I reviewed those patches and find the straight-forward (and obviously good). Thanks, Dscho ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty ` (3 preceding siblings ...) 2018-01-17 20:23 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin @ 2018-01-17 21:52 ` Junio C Hamano 4 siblings, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2018-01-17 21:52 UTC (permalink / raw) To: Michael Haggerty; +Cc: Kim Gybels, Johannes Schindelin, git Michael Haggerty <mhagger@alum.mit.edu> writes: > So I will follow up this email with three patches: > > 1. Mention that `snapshot::buf` can be NULL for empty files > > I suggest squashing this into your patch, to make it clear that > `snapshot::buf` and `snapshot::eof` can also be NULL if the > `packed-refs` file is empty. > > 2. create_snapshot(): exit early if the file was empty > > Avoid undefined behavior by returning early if `snapshot->buf` is > NULL. > > 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL > > Avoid undefined behavior and confusing semantics by not calling > `find_reference_location()` when `snapshot->buf` is NULL. These look all sensible with today's code and with v2 from this thread. With the v3, i.e. "do the xmalloc() even for size==0", however, snapshot->buf would never be NULL, so I'd shelve them for now, though. Thanks. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] packed_ref_cache: don't use mmap() for small files 2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels 2018-01-13 18:56 ` Johannes Schindelin 2018-01-14 19:14 ` [PATCH v2] " Kim Gybels @ 2018-01-15 21:15 ` Jeff King 2018-01-15 23:37 ` Kim Gybels 2 siblings, 1 reply; 33+ messages in thread From: Jeff King @ 2018-01-15 21:15 UTC (permalink / raw) To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty On Sat, Jan 13, 2018 at 05:11:49PM +0100, Kim Gybels wrote: > Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use > read() instead of mmap() for small packed-refs files. > > This also fixes the problem[1] where xmmap() returns NULL for zero > length[2], for which munmap() later fails. > > Alternatively, we could simply check for NULL before munmap(), or > introduce an xmunmap() that could be used together with xmmap(). This looks good to me, and since it's a recent-ish regression, I think we should take the minimal fix here. But it does make me wonder whether xmmap() ought to be doing this "small mmap" optimization for us. Obviously that only works when we do MAP_PRIVATE and never write to the result. But that's how we always use it anyway, and we're restricted to that to work with the NO_MMAP wrapper in compat/mmap.c. > @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot) > die_errno("couldn't stat %s", snapshot->refs->path); > size = xsize_t(st.st_size); > > - switch (mmap_strategy) { > - case MMAP_NONE: > + if (!size) { > + snapshot->buf = NULL; > + snapshot->eof = NULL; > + snapshot->mmapped = 0; > + } else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) { > snapshot->buf = xmalloc(size); > bytes_read = read_in_full(fd, snapshot->buf, size); > if (bytes_read < 0 || bytes_read != size) > die_errno("couldn't read %s", snapshot->refs->path); > snapshot->eof = snapshot->buf + size; > snapshot->mmapped = 0; If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE", then we'd try to xmalloc(0), which is guaranteed to work (we fallback to a 1-byte allocation if necessary). Would that make things simpler and more consistent for the rest of the code to always have snapshot->buf be a valid pointer (just based on seeing Michael's follow-up patches)? -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] packed_ref_cache: don't use mmap() for small files 2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King @ 2018-01-15 23:37 ` Kim Gybels 2018-01-15 23:52 ` Jeff King 0 siblings, 1 reply; 33+ messages in thread From: Kim Gybels @ 2018-01-15 23:37 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty On (15/01/18 16:15), Jeff King wrote: > On Sat, Jan 13, 2018 at 05:11:49PM +0100, Kim Gybels wrote: > > > Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use > > read() instead of mmap() for small packed-refs files. > > > > This also fixes the problem[1] where xmmap() returns NULL for zero > > length[2], for which munmap() later fails. > > > > Alternatively, we could simply check for NULL before munmap(), or > > introduce an xmunmap() that could be used together with xmmap(). > > This looks good to me, and since it's a recent-ish regression, I think > we should take the minimal fix here. The minimal fix being a simple NULL check before munmap()? > But it does make me wonder whether xmmap() ought to be doing this "small > mmap" optimization for us. Obviously that only works when we do > MAP_PRIVATE and never write to the result. But that's how we always use > it anyway, and we're restricted to that to work with the NO_MMAP wrapper > in compat/mmap.c. Maybe I should have left the optimization for small files out of the patch for the zero length regression. After all, read() vs mmap() performance might depend on other factors than just size. > > @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot) > > die_errno("couldn't stat %s", snapshot->refs->path); > > size = xsize_t(st.st_size); > > > > - switch (mmap_strategy) { > > - case MMAP_NONE: > > + if (!size) { > > + snapshot->buf = NULL; > > + snapshot->eof = NULL; > > + snapshot->mmapped = 0; > > + } else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) { > > snapshot->buf = xmalloc(size); > > bytes_read = read_in_full(fd, snapshot->buf, size); > > if (bytes_read < 0 || bytes_read != size) > > die_errno("couldn't read %s", snapshot->refs->path); > > snapshot->eof = snapshot->buf + size; > > snapshot->mmapped = 0; > > If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE", > then we'd try to xmalloc(0), which is guaranteed to work (we fallback to > a 1-byte allocation if necessary). Would that make things simpler and > more consistent for the rest of the code to always have snapshot->buf be > a valid pointer (just based on seeing Michael's follow-up patches)? Indeed, all those patches are to avoid using the NULL pointers in ways that are undefined. We could also copy index_core's way of handling the zero length case: ret = index_mem(sha1, "", size, type, path, flags); Point to some static memory instead of NULL, then all the pointer arithmetic is defined. -Kim ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] packed_ref_cache: don't use mmap() for small files 2018-01-15 23:37 ` Kim Gybels @ 2018-01-15 23:52 ` Jeff King 2018-01-16 19:38 ` [PATCH v3] " Kim Gybels 0 siblings, 1 reply; 33+ messages in thread From: Jeff King @ 2018-01-15 23:52 UTC (permalink / raw) To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty On Tue, Jan 16, 2018 at 12:37:51AM +0100, Kim Gybels wrote: > > This looks good to me, and since it's a recent-ish regression, I think > > we should take the minimal fix here. > > The minimal fix being a simple NULL check before munmap()? Sorry to be unclear. I just meant that your patch is probably fine as-is. I didn't want to hold up a regression fix with a bunch of nit-picking or possible future work, when we could build that on top later. > > But it does make me wonder whether xmmap() ought to be doing this "small > > mmap" optimization for us. Obviously that only works when we do > > MAP_PRIVATE and never write to the result. But that's how we always use > > it anyway, and we're restricted to that to work with the NO_MMAP wrapper > > in compat/mmap.c. > > Maybe I should have left the optimization for small files out of the patch for > the zero length regression. After all, read() vs mmap() performance might > depend on other factors than just size. I'd be OK including it here, since there's prior art in the commit you referenced. Though of course actual numbers are always good when claiming an optimization. :) > > If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE", > > then we'd try to xmalloc(0), which is guaranteed to work (we fallback to > > a 1-byte allocation if necessary). Would that make things simpler and > > more consistent for the rest of the code to always have snapshot->buf be > > a valid pointer (just based on seeing Michael's follow-up patches)? > > Indeed, all those patches are to avoid using the NULL pointers in ways that are > undefined. We could also copy index_core's way of handling the zero length > case: > ret = index_mem(sha1, "", size, type, path, flags); > > Point to some static memory instead of NULL, then all the pointer arithmetic is defined. Yep, that would work, too. I don't think the overhead of a once-per-process xmalloc(0) is a big deal, though, if it keeps the code simpler (though I admit it is not that complex either way). -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v3] packed_ref_cache: don't use mmap() for small files 2018-01-15 23:52 ` Jeff King @ 2018-01-16 19:38 ` Kim Gybels 2018-01-17 22:09 ` Jeff King 0 siblings, 1 reply; 33+ messages in thread From: Kim Gybels @ 2018-01-16 19:38 UTC (permalink / raw) To: Jeff King Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for small files, 2010-02-21) and use read() instead of mmap() for small packed-refs files. This also fixes the problem[1] where xmmap() returns NULL for zero length[2], for which munmap() later fails. Alternatively, we could simply check for NULL before munmap(), or introduce xmunmap() that could be used together with xmmap(). However, always setting snapshot->buf to a valid pointer, by relying on xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots easier. [1] https://github.com/git-for-windows/git/issues/1410 [2] Logic introduced in commit 9130ac1e196 (Better error messages for corrupt databases, 2007-01-11) Signed-off-by: Kim Gybels <kgybels@infogroep.be> --- Change since v2: removed separate case for zero length as suggested by Peff, ensuring that snapshot->buf is always a valid pointer. refs/packed-backend.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index dab8a85d9a..b6e2bc3c1d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot) last_line, eof - last_line); } +#define SMALL_FILE_SIZE (32*1024) + /* * Depending on `mmap_strategy`, either mmap or read the contents of * the `packed-refs` file into the snapshot. Return 1 if the file @@ -489,21 +491,17 @@ static int load_contents(struct snapshot *snapshot) die_errno("couldn't stat %s", snapshot->refs->path); size = xsize_t(st.st_size); - switch (mmap_strategy) { - case MMAP_NONE: + if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) { snapshot->buf = xmalloc(size); bytes_read = read_in_full(fd, snapshot->buf, size); if (bytes_read < 0 || bytes_read != size) die_errno("couldn't read %s", snapshot->refs->path); snapshot->eof = snapshot->buf + size; snapshot->mmapped = 0; - break; - case MMAP_TEMPORARY: - case MMAP_OK: + } else { snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); snapshot->eof = snapshot->buf + size; snapshot->mmapped = 1; - break; } close(fd); -- 2.15.1.windows.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files 2018-01-16 19:38 ` [PATCH v3] " Kim Gybels @ 2018-01-17 22:09 ` Jeff King 2018-01-21 4:41 ` Michael Haggerty 0 siblings, 1 reply; 33+ messages in thread From: Jeff King @ 2018-01-17 22:09 UTC (permalink / raw) To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote: > Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for > small files, 2010-02-21) and use read() instead of mmap() for small > packed-refs files. > > This also fixes the problem[1] where xmmap() returns NULL for zero > length[2], for which munmap() later fails. > > Alternatively, we could simply check for NULL before munmap(), or > introduce xmunmap() that could be used together with xmmap(). However, > always setting snapshot->buf to a valid pointer, by relying on > xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots > easier. > > [1] https://github.com/git-for-windows/git/issues/1410 > [2] Logic introduced in commit 9130ac1e196 (Better error messages for > corrupt databases, 2007-01-11) > > Signed-off-by: Kim Gybels <kgybels@infogroep.be> > --- > > Change since v2: removed separate case for zero length as suggested by Peff, > ensuring that snapshot->buf is always a valid pointer. Thanks, this looks fine to me (I'd be curious to hear from Michael if this eliminates the need for the other patches). -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files 2018-01-17 22:09 ` Jeff King @ 2018-01-21 4:41 ` Michael Haggerty 2018-01-22 19:31 ` Junio C Hamano 0 siblings, 1 reply; 33+ messages in thread From: Michael Haggerty @ 2018-01-21 4:41 UTC (permalink / raw) To: Jeff King, Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin On 01/17/2018 11:09 PM, Jeff King wrote: > On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote: > >> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for >> small files, 2010-02-21) and use read() instead of mmap() for small >> packed-refs files. >> >> This also fixes the problem[1] where xmmap() returns NULL for zero >> length[2], for which munmap() later fails. >> >> Alternatively, we could simply check for NULL before munmap(), or >> introduce xmunmap() that could be used together with xmmap(). However, >> always setting snapshot->buf to a valid pointer, by relying on >> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots >> easier. >> >> [1] https://github.com/git-for-windows/git/issues/1410 >> [2] Logic introduced in commit 9130ac1e196 (Better error messages for >> corrupt databases, 2007-01-11) >> >> Signed-off-by: Kim Gybels <kgybels@infogroep.be> >> --- >> >> Change since v2: removed separate case for zero length as suggested by Peff, >> ensuring that snapshot->buf is always a valid pointer. > > Thanks, this looks fine to me (I'd be curious to hear from Michael if > this eliminates the need for the other patches). `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist (see the earlier code path in `load_contents()`). So either that code path *also* has to get the `xmalloc()` treatment, or my third patch is still necessary. (My second patch wouldn't be necessary because the ENOENT case makes `load_contents()` return 0, triggering the early exit from `create_snapshot()`.) I don't have a strong preference either way. Michael ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files 2018-01-21 4:41 ` Michael Haggerty @ 2018-01-22 19:31 ` Junio C Hamano 2018-01-24 11:05 ` Michael Haggerty 0 siblings, 1 reply; 33+ messages in thread From: Junio C Hamano @ 2018-01-22 19:31 UTC (permalink / raw) To: Michael Haggerty; +Cc: Jeff King, Kim Gybels, git, Johannes Schindelin Michael Haggerty <mhagger@alum.mit.edu> writes: > `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist > (see the earlier code path in `load_contents()`). So either that code > path *also* has to get the `xmalloc()` treatment, or my third patch is > still necessary. (My second patch wouldn't be necessary because the > ENOENT case makes `load_contents()` return 0, triggering the early exit > from `create_snapshot()`.) > > I don't have a strong preference either way. Which would be a two-liner, like the attached, which does not look too bad by itself. The direction, if we take this approach, means that we are declaring that .buf being NULL is an invalid state for a snapshot to be in, instead of saying "an empty snapshot looks exactly like one that was freshly initialized", which seems to be the intention of the original design. After Kim's fix and with 3/3 in your follow-up series, various helpers are still unsafe against .buf being NULL, like sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only when mmapped bit is set), find_reference_location(). packed_ref_iterator_begin() checks if snapshot->buf is NULL and returns early. At the first glance, this appears a useful short cut to optimize the empty case away, but the check also is acting as a guard to prevent a snapshot with NULL .buf from being fed to an unsafe find_reference_location(). An implicit guard like this feels a bit more brittle than my liking. If we ensure .buf is never NULL, that check can become a pure short-cut optimization and stop being a correctness thing. So... refs/packed-backend.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b6e2bc3c1d..1eeb5c7f80 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot) if (fd < 0) { if (errno == ENOENT) { /* - * This is OK; it just means that no - * "packed-refs" file has been written yet, - * which is equivalent to it being empty, - * which is its state when initialized with - * zeros. + * Treat missing "packed-refs" as equivalent to + * it being empty. */ + snapshot->eof = snapshot->buf = xmalloc(0); + snapshot->mmapped = 0; return 0; } else { die_errno("couldn't read %s", snapshot->refs->path); ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files 2018-01-22 19:31 ` Junio C Hamano @ 2018-01-24 11:05 ` Michael Haggerty 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty 2018-01-24 18:05 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano 0 siblings, 2 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Kim Gybels, git, Johannes Schindelin On 01/22/2018 08:31 PM, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist >> (see the earlier code path in `load_contents()`). So either that code >> path *also* has to get the `xmalloc()` treatment, or my third patch is >> still necessary. (My second patch wouldn't be necessary because the >> ENOENT case makes `load_contents()` return 0, triggering the early exit >> from `create_snapshot()`.) >> >> I don't have a strong preference either way. > > Which would be a two-liner, like the attached, which does not look > too bad by itself. > > The direction, if we take this approach, means that we are declaring > that .buf being NULL is an invalid state for a snapshot to be in, > instead of saying "an empty snapshot looks exactly like one that was > freshly initialized", which seems to be the intention of the original > design. > > After Kim's fix and with 3/3 in your follow-up series, various > helpers are still unsafe against .buf being NULL, like > sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only > when mmapped bit is set), find_reference_location(). > > packed_ref_iterator_begin() checks if snapshot->buf is NULL and > returns early. At the first glance, this appears a useful short cut > to optimize the empty case away, but the check also is acting as a > guard to prevent a snapshot with NULL .buf from being fed to an > unsafe find_reference_location(). An implicit guard like this feels > a bit more brittle than my liking. If we ensure .buf is never NULL, > that check can become a pure short-cut optimization and stop being > a correctness thing. > > So... > > > refs/packed-backend.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index b6e2bc3c1d..1eeb5c7f80 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot) > if (fd < 0) { > if (errno == ENOENT) { > /* > - * This is OK; it just means that no > - * "packed-refs" file has been written yet, > - * which is equivalent to it being empty, > - * which is its state when initialized with > - * zeros. > + * Treat missing "packed-refs" as equivalent to > + * it being empty. > */ > + snapshot->eof = snapshot->buf = xmalloc(0); > + snapshot->mmapped = 0; > return 0; > } else { > die_errno("couldn't read %s", snapshot->refs->path); > That would work, though if you go this way, please also change the docstring for `snapshot::buf`, which still says that `buf` and `eof` can be `NULL`. The other alternative, making `snapshot` safe for NULLs, becomes easier if `snapshot` stores a pointer to the start of the reference section of the `packed-refs` contents (i.e., after the header line), rather than repeatedly computing that address from `snapshot->buf + snapshot->header_len`. With this change, code that is technically undefined when the fields are NULL can more easily be replaced with code that is safe for NULL. For example, pos = snapshot->buf + snapshot->header_len becomes pos = snapshot->start , and len = snapshot->eof - pos; if (!len) [...] becomes if (pos == snapshot->eof) [...] len = snapshot->eof - pos; . In this way, most of the special-casing for NULL goes away (and some code becomes simpler, as well). In a moment I'll send a patch series illustrating this approach. I think patches 01, 02, and 04 are improvements regardless of whether we decide to make NULL safe. The change to using `read()` rather than `mmap()` for small `packed-refs` feels like it should be an improvement, but it occurred to me that the performance numbers quoted in ea68b0ce9f8 (hash-object: don't use mmap() for small files, 2010-02-21) are not directly applicable to the `packed-refs` file. As far as I understand, the file mmapped in `index_fd()` is always read in full, whereas the main point of mmapping the packed-refs file is to avoid having to read the whole file at all in some situations. That being said, a 32 KiB file would only be 8 pages (assuming a page size of 4 KiB), and by the time you've read the header and binary-searched to find the desired record, you've probably paged in most of the file anyway. Reading the whole file at once, in order, is almost certainly cheaper. Michael ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 0/6] Yet another approach to handling empty snapshots 2018-01-24 11:05 ` Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 11:14 ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty ` (8 more replies) 2018-01-24 18:05 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano 1 sibling, 9 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty This patch series fixes the handling of empty packed-refs snapshots (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly by changing `snapshot` to store a pointer to the start of the post-header `packed-refs` content instead of `header_len`. It makes a couple of other improvements as well. I'm not sure whether I like this approach better than the alternative of always setting `snapshot->buf` to a non-NULL value, by allocating a length-1 bit of RAM if necessary. The latter is less intrusive, though even if that approach is taken, I think patches 01, 02, and 04 from this patch series would be worthwhile improvements. Michael Kim Gybels (1): packed_ref_cache: don't use mmap() for small files Michael Haggerty (5): struct snapshot: store `start` rather than `header_len` create_snapshot(): use `xmemdupz()` rather than a strbuf find_reference_location(): make function safe for empty snapshots packed_ref_iterator_begin(): make optimization more general load_contents(): don't try to mmap an empty file refs/packed-backend.c | 106 ++++++++++++++++++++++++++------------------------ 1 file changed, 55 insertions(+), 51 deletions(-) -- 2.14.2 ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 1/6] struct snapshot: store `start` rather than `header_len` 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 20:36 ` Jeff King 2018-01-24 11:14 ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty ` (7 subsequent siblings) 8 siblings, 1 reply; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty Store a pointer to the start of the actual references within the `packed-refs` contents rather than storing the length of the header. This is more convenient for most users of this field. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 64 ++++++++++++++++++++++++++------------------------- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 023243fd5f..b872267f02 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -68,17 +68,21 @@ struct snapshot { int mmapped; /* - * The contents of the `packed-refs` file. If the file was - * already sorted, this points at the mmapped contents of the - * file. If not, this points at heap-allocated memory - * containing the contents, sorted. If there were no contents - * (e.g., because the file didn't exist), `buf` and `eof` are - * both NULL. + * The contents of the `packed-refs` file: + * + * - buf -- a pointer to the start of the memory + * - start -- a pointer to the first byte of actual references + * (i.e., after the header line, if one is present) + * - eof -- a pointer just past the end of the reference + * contents + * + * If the `packed-refs` file was already sorted, `buf` points + * at the mmapped contents of the file. If not, it points at + * heap-allocated memory containing the contents, sorted. If + * there were no contents (e.g., because the file didn't + * exist), `buf`, `start`, and `eof` are all NULL. */ - char *buf, *eof; - - /* The size of the header line, if any; otherwise, 0: */ - size_t header_len; + char *buf, *start, *eof; /* * What is the peeled state of the `packed-refs` file that @@ -169,8 +173,7 @@ static void clear_snapshot_buffer(struct snapshot *snapshot) } else { free(snapshot->buf); } - snapshot->buf = snapshot->eof = NULL; - snapshot->header_len = 0; + snapshot->buf = snapshot->start = snapshot->eof = NULL; } /* @@ -319,13 +322,14 @@ static void sort_snapshot(struct snapshot *snapshot) size_t len, i; char *new_buffer, *dst; - pos = snapshot->buf + snapshot->header_len; + pos = snapshot->start; eof = snapshot->eof; - len = eof - pos; - if (!len) + if (pos == eof) return; + len = eof - pos; + /* * Initialize records based on a crude estimate of the number * of references in the file (we'll grow it below if needed): @@ -391,9 +395,8 @@ static void sort_snapshot(struct snapshot *snapshot) * place: */ clear_snapshot_buffer(snapshot); - snapshot->buf = new_buffer; + snapshot->buf = snapshot->start = new_buffer; snapshot->eof = new_buffer + len; - snapshot->header_len = 0; cleanup: free(records); @@ -442,14 +445,14 @@ static const char *find_end_of_record(const char *p, const char *end) */ static void verify_buffer_safe(struct snapshot *snapshot) { - const char *buf = snapshot->buf + snapshot->header_len; + const char *start = snapshot->start; const char *eof = snapshot->eof; const char *last_line; - if (buf == eof) + if (start == eof) return; - last_line = find_start_of_record(buf, eof - 1); + last_line = find_start_of_record(start, eof - 1); if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2) die_invalid_line(snapshot->refs->path, last_line, eof - last_line); @@ -495,18 +498,19 @@ static int load_contents(struct snapshot *snapshot) bytes_read = read_in_full(fd, snapshot->buf, size); if (bytes_read < 0 || bytes_read != size) die_errno("couldn't read %s", snapshot->refs->path); - snapshot->eof = snapshot->buf + size; snapshot->mmapped = 0; break; case MMAP_TEMPORARY: case MMAP_OK: snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); - snapshot->eof = snapshot->buf + size; snapshot->mmapped = 1; break; } close(fd); + snapshot->start = snapshot->buf; + snapshot->eof = snapshot->buf + size; + return 1; } @@ -539,7 +543,7 @@ static const char *find_reference_location(struct snapshot *snapshot, * preceding records all have reference names that come * *before* `refname`. */ - const char *lo = snapshot->buf + snapshot->header_len; + const char *lo = snapshot->start; /* * A pointer to a the first character of a record whose @@ -617,8 +621,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) /* If the file has a header line, process it: */ if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') { struct strbuf tmp = STRBUF_INIT; - char *p; - const char *eol; + char *p, *eol; struct string_list traits = STRING_LIST_INIT_NODUP; eol = memchr(snapshot->buf, '\n', @@ -647,7 +650,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) /* perhaps other traits later as well */ /* The "+ 1" is for the LF character. */ - snapshot->header_len = eol + 1 - snapshot->buf; + snapshot->start = eol + 1; string_list_clear(&traits, 0); strbuf_release(&tmp); @@ -671,13 +674,12 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) * We don't want to leave the file mmapped, so we are * forced to make a copy now: */ - size_t size = snapshot->eof - - (snapshot->buf + snapshot->header_len); + size_t size = snapshot->eof - snapshot->start; char *buf_copy = xmalloc(size); - memcpy(buf_copy, snapshot->buf + snapshot->header_len, size); + memcpy(buf_copy, snapshot->start, size); clear_snapshot_buffer(snapshot); - snapshot->buf = buf_copy; + snapshot->buf = snapshot->start = buf_copy; snapshot->eof = buf_copy + size; } @@ -937,7 +939,7 @@ static struct ref_iterator *packed_ref_iterator_begin( if (prefix && *prefix) start = find_reference_location(snapshot, prefix, 0); else - start = snapshot->buf + snapshot->header_len; + start = snapshot->start; iter->pos = start; iter->eof = snapshot->eof; -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 1/6] struct snapshot: store `start` rather than `header_len` 2018-01-24 11:14 ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty @ 2018-01-24 20:36 ` Jeff King 0 siblings, 0 replies; 33+ messages in thread From: Jeff King @ 2018-01-24 20:36 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git On Wed, Jan 24, 2018 at 12:14:11PM +0100, Michael Haggerty wrote: > Store a pointer to the start of the actual references within the > `packed-refs` contents rather than storing the length of the header. > This is more convenient for most users of this field. This makes sense. It means that the "start" pointer needs to be invalidated if "buf" ever changes. But that was pretty much the case already with "header_len" (because "buf" is not a heap buffer that we might realloc; if it ever changes it is because we re-read the file, and we would have to re-parse the header length anyway). -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty 2018-01-24 11:14 ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 11:14 ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty ` (6 subsequent siblings) 8 siblings, 0 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty It's lighter weight. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b872267f02..08698de6ea 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -620,8 +620,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) /* If the file has a header line, process it: */ if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') { - struct strbuf tmp = STRBUF_INIT; - char *p, *eol; + char *tmp, *p, *eol; struct string_list traits = STRING_LIST_INIT_NODUP; eol = memchr(snapshot->buf, '\n', @@ -631,9 +630,9 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) snapshot->buf, snapshot->eof - snapshot->buf); - strbuf_add(&tmp, snapshot->buf, eol - snapshot->buf); + tmp = xmemdupz(snapshot->buf, eol - snapshot->buf); - if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p)) + if (!skip_prefix(tmp, "# pack-refs with:", (const char **)&p)) die_invalid_line(refs->path, snapshot->buf, snapshot->eof - snapshot->buf); @@ -653,7 +652,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) snapshot->start = eol + 1; string_list_clear(&traits, 0); - strbuf_release(&tmp); + free(tmp); } verify_buffer_safe(snapshot); -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 3/6] find_reference_location(): make function safe for empty snapshots 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty 2018-01-24 11:14 ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty 2018-01-24 11:14 ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 20:27 ` Jeff King 2018-01-24 11:14 ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty ` (5 subsequent siblings) 8 siblings, 1 reply; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty This function had two problems if called for an empty snapshot (i.e., `snapshot->start == snapshot->eof == NULL`): * It checked `NULL < NULL`, which is undefined by C (albeit highly unlikely to fail in the real world). * (Assuming the above comparison behaved as expected), it returned NULL when `mustexist` was false, contrary to its docstring. Change the check and fix the docstring. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 08698de6ea..361affd7ad 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -519,9 +519,11 @@ static int load_contents(struct snapshot *snapshot) * `refname` starts. If `mustexist` is true and the reference doesn't * exist, then return NULL. If `mustexist` is false and the reference * doesn't exist, then return the point where that reference would be - * inserted. In the latter mode, `refname` doesn't have to be a proper - * reference name; for example, one could search for "refs/replace/" - * to find the start of any replace references. + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. * * The record is sought using a binary search, so `snapshot->buf` must * be sorted. @@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot *snapshot, */ const char *hi = snapshot->eof; - while (lo < hi) { + while (lo != hi) { const char *mid, *rec; int cmp; -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 3/6] find_reference_location(): make function safe for empty snapshots 2018-01-24 11:14 ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty @ 2018-01-24 20:27 ` Jeff King 2018-01-24 21:11 ` Junio C Hamano 0 siblings, 1 reply; 33+ messages in thread From: Jeff King @ 2018-01-24 20:27 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git On Wed, Jan 24, 2018 at 12:14:13PM +0100, Michael Haggerty wrote: > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index 08698de6ea..361affd7ad 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > [...] > @@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot *snapshot, > */ > const char *hi = snapshot->eof; > > - while (lo < hi) { > + while (lo != hi) { > const char *mid, *rec; > int cmp; This tightens the binary search termination condition. If we ever did see "hi > lo", we'd want to terminate the loop. Is that ever possible? I think the answer is "no". Our "hi" here is an exclusive bound, so we should never go past it via find_end_of_record() when assigning "lo". And "hi" is always assigned from the start of the current record. That can never cross "lo", because find_start_of_record() ensures it. So I think it's fine, but I wanted to double check. -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 3/6] find_reference_location(): make function safe for empty snapshots 2018-01-24 20:27 ` Jeff King @ 2018-01-24 21:11 ` Junio C Hamano 2018-01-24 21:34 ` Jeff King 0 siblings, 1 reply; 33+ messages in thread From: Junio C Hamano @ 2018-01-24 21:11 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, Kim Gybels, Johannes Schindelin, git Jeff King <peff@peff.net> writes: > On Wed, Jan 24, 2018 at 12:14:13PM +0100, Michael Haggerty wrote: > >> diff --git a/refs/packed-backend.c b/refs/packed-backend.c >> index 08698de6ea..361affd7ad 100644 >> --- a/refs/packed-backend.c >> +++ b/refs/packed-backend.c >> [...] >> @@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot *snapshot, >> */ >> const char *hi = snapshot->eof; >> >> - while (lo < hi) { >> + while (lo != hi) { >> const char *mid, *rec; >> int cmp; > > This tightens the binary search termination condition. If we ever did > see "hi > lo", we'd want to terminate the loop. Is that ever possible? I think you meant "lo > hi", but I shared the same "Huh?" moment. Because "While lo is strictly lower than hi" is a so well established binary search pattern, even though we know that it is equivalent to "While lo and hi is different" due to your analysis below, the new code looks somewhat strange at the first glance. > I think the answer is "no". Our "hi" here is an exclusive bound, so we > should never go past it via find_end_of_record() when assigning "lo". > And "hi" is always assigned from the start of the current record. That > can never cross "lo", because find_start_of_record() ensures it. > > So I think it's fine, but I wanted to double check. It would be much simpler to reason about if we instead do #define is_empty_snapshot(s) ((s)->start == NULL) if (is_empty_snapshot(snapshot)) return NULL; or something like that upfront. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 3/6] find_reference_location(): make function safe for empty snapshots 2018-01-24 21:11 ` Junio C Hamano @ 2018-01-24 21:34 ` Jeff King 0 siblings, 0 replies; 33+ messages in thread From: Jeff King @ 2018-01-24 21:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael Haggerty, Kim Gybels, Johannes Schindelin, git On Wed, Jan 24, 2018 at 01:11:00PM -0800, Junio C Hamano wrote: > > This tightens the binary search termination condition. If we ever did > > see "hi > lo", we'd want to terminate the loop. Is that ever possible? > > I think you meant "lo > hi", but I shared the same "Huh?" moment. Er, yeah. Sorry about that. > Because "While lo is strictly lower than hi" is a so well > established binary search pattern, even though we know that it is > equivalent to "While lo and hi is different" due to your analysis > below, the new code looks somewhat strange at the first glance. I thought at first that this was due to the way the record-finding happens, but I think even in our normal binary searches, it is an invariant that "lo <= hi". > > I think the answer is "no". Our "hi" here is an exclusive bound, so we > > should never go past it via find_end_of_record() when assigning "lo". > > And "hi" is always assigned from the start of the current record. That > > can never cross "lo", because find_start_of_record() ensures it. > > > > So I think it's fine, but I wanted to double check. > > It would be much simpler to reason about if we instead do > > #define is_empty_snapshot(s) ((s)->start == NULL) > > if (is_empty_snapshot(snapshot)) > return NULL; > > or something like that upfront. Yes, I agree that would also work. -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty ` (2 preceding siblings ...) 2018-01-24 11:14 ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 20:32 ` Jeff King 2018-01-24 11:14 ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty ` (4 subsequent siblings) 8 siblings, 1 reply; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty We can return an empty iterator not only if the `packed-refs` file is missing, but also if it is empty or if there are no references whose names succeed `prefix`. Optimize away those cases as well by moving the call to `find_reference_location()` higher in the function and checking whether the determined start position is the same as `snapshot->eof`. (This is possible now because the previous commit made `find_reference_location()` robust against empty snapshots.) Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 361affd7ad..988c45402b 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -927,7 +927,12 @@ static struct ref_iterator *packed_ref_iterator_begin( */ snapshot = get_snapshot(refs); - if (!snapshot->buf) + if (prefix && *prefix) + start = find_reference_location(snapshot, prefix, 0); + else + start = snapshot->start; + + if (start == snapshot->eof) return empty_ref_iterator_begin(); iter = xcalloc(1, sizeof(*iter)); @@ -937,11 +942,6 @@ static struct ref_iterator *packed_ref_iterator_begin( iter->snapshot = snapshot; acquire_snapshot(snapshot); - if (prefix && *prefix) - start = find_reference_location(snapshot, prefix, 0); - else - start = snapshot->start; - iter->pos = start; iter->eof = snapshot->eof; strbuf_init(&iter->refname_buf, 0); -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general 2018-01-24 11:14 ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty @ 2018-01-24 20:32 ` Jeff King 0 siblings, 0 replies; 33+ messages in thread From: Jeff King @ 2018-01-24 20:32 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git On Wed, Jan 24, 2018 at 12:14:14PM +0100, Michael Haggerty wrote: > We can return an empty iterator not only if the `packed-refs` file is > missing, but also if it is empty or if there are no references whose > names succeed `prefix`. Optimize away those cases as well by moving > the call to `find_reference_location()` higher in the function and > checking whether the determined start position is the same as > `snapshot->eof`. (This is possible now because the previous commit > made `find_reference_location()` robust against empty snapshots.) Makes sense. > @@ -937,11 +942,6 @@ static struct ref_iterator *packed_ref_iterator_begin( > iter->snapshot = snapshot; > acquire_snapshot(snapshot); > > - if (prefix && *prefix) > - start = find_reference_location(snapshot, prefix, 0); > - else > - start = snapshot->start; > - I did a double-take here that we are now looking at the snapshot without calling acquire_snapshot(). But that function is just about taking a refcount on it. The actual acquisition of data happens in get_snapshot(). -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 5/6] load_contents(): don't try to mmap an empty file 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty ` (3 preceding siblings ...) 2018-01-24 11:14 ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 11:14 ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty ` (3 subsequent siblings) 8 siblings, 0 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty We don't actually create zero-length `packed-refs` files, but they are valid and we should handle them correctly. The old code `xmmap()`ed such files, which led to an error when `munmap()` was called. So, if the `packed-refs` file is empty, leave the snapshot at its zero values and return 0 without trying to read or mmap the file. Returning 0 also makes `create_snapshot()` exit early, which avoids the technically undefined comparison `NULL < NULL`. Reported-by: Kim Gybels <kgybels@infogroep.be> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 988c45402b..e829cf206d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -461,7 +461,8 @@ static void verify_buffer_safe(struct snapshot *snapshot) /* * Depending on `mmap_strategy`, either mmap or read the contents of * the `packed-refs` file into the snapshot. Return 1 if the file - * existed and was read, or 0 if the file was absent. Die on errors. + * existed and was read, or 0 if the file was absent or empty. Die on + * errors. */ static int load_contents(struct snapshot *snapshot) { @@ -492,19 +493,17 @@ static int load_contents(struct snapshot *snapshot) die_errno("couldn't stat %s", snapshot->refs->path); size = xsize_t(st.st_size); - switch (mmap_strategy) { - case MMAP_NONE: + if (!size) { + return 0; + } else if (mmap_strategy == MMAP_NONE) { snapshot->buf = xmalloc(size); bytes_read = read_in_full(fd, snapshot->buf, size); if (bytes_read < 0 || bytes_read != size) die_errno("couldn't read %s", snapshot->refs->path); snapshot->mmapped = 0; - break; - case MMAP_TEMPORARY: - case MMAP_OK: + } else { snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); snapshot->mmapped = 1; - break; } close(fd); -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 6/6] packed_ref_cache: don't use mmap() for small files 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty ` (4 preceding siblings ...) 2018-01-24 11:14 ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty @ 2018-01-24 11:14 ` Michael Haggerty 2018-01-24 20:38 ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King ` (2 subsequent siblings) 8 siblings, 0 replies; 33+ messages in thread From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw) To: Junio C Hamano Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty From: Kim Gybels <kgybels@infogroep.be> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for small files, 2010-02-21) and use read() instead of mmap() for small packed-refs files. Signed-off-by: Kim Gybels <kgybels@infogroep.be> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- refs/packed-backend.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index e829cf206d..8b4b45da67 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -458,6 +458,8 @@ static void verify_buffer_safe(struct snapshot *snapshot) last_line, eof - last_line); } +#define SMALL_FILE_SIZE (32*1024) + /* * Depending on `mmap_strategy`, either mmap or read the contents of * the `packed-refs` file into the snapshot. Return 1 if the file @@ -495,7 +497,7 @@ static int load_contents(struct snapshot *snapshot) if (!size) { return 0; - } else if (mmap_strategy == MMAP_NONE) { + } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) { snapshot->buf = xmalloc(size); bytes_read = read_in_full(fd, snapshot->buf, size); if (bytes_read < 0 || bytes_read != size) -- 2.14.2 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] Yet another approach to handling empty snapshots 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty ` (5 preceding siblings ...) 2018-01-24 11:14 ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty @ 2018-01-24 20:38 ` Jeff King 2018-01-24 20:54 ` Junio C Hamano 2018-02-15 16:54 ` Johannes Schindelin 8 siblings, 0 replies; 33+ messages in thread From: Jeff King @ 2018-01-24 20:38 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git On Wed, Jan 24, 2018 at 12:14:10PM +0100, Michael Haggerty wrote: > This patch series fixes the handling of empty packed-refs snapshots > (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly > by changing `snapshot` to store a pointer to the start of the > post-header `packed-refs` content instead of `header_len`. It makes a > couple of other improvements as well. > > I'm not sure whether I like this approach better than the alternative > of always setting `snapshot->buf` to a non-NULL value, by allocating a > length-1 bit of RAM if necessary. The latter is less intrusive, though > even if that approach is taken, I think patches 01, 02, and 04 from > this patch series would be worthwhile improvements. This looks good to me. I agree that 1, 2, and 4 are improvements regardless (but 4 as it is now depends on 3, right?). I don't have a strong opinion between this series and the other options presented. It's probably not worth agonizing over, so we should pick one and move on. -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] Yet another approach to handling empty snapshots 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty ` (6 preceding siblings ...) 2018-01-24 20:38 ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King @ 2018-01-24 20:54 ` Junio C Hamano 2018-02-15 16:54 ` Johannes Schindelin 8 siblings, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2018-01-24 20:54 UTC (permalink / raw) To: Michael Haggerty; +Cc: Kim Gybels, Johannes Schindelin, Jeff King, git Michael Haggerty <mhagger@alum.mit.edu> writes: > This patch series fixes the handling of empty packed-refs snapshots > (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly > by changing `snapshot` to store a pointer to the start of the > post-header `packed-refs` content instead of `header_len`. It makes a > couple of other improvements as well. > > I'm not sure whether I like this approach better than the alternative > of always setting `snapshot->buf` to a non-NULL value, by allocating a > length-1 bit of RAM if necessary. The latter is less intrusive, though > even if that approach is taken, I think patches 01, 02, and 04 from > this patch series would be worthwhile improvements. I do not have a strong preference either way, but somehow feel that this is more "coherent" ;-) That is certainly subjective, though. Thanks. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] Yet another approach to handling empty snapshots 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty ` (7 preceding siblings ...) 2018-01-24 20:54 ` Junio C Hamano @ 2018-02-15 16:54 ` Johannes Schindelin 8 siblings, 0 replies; 33+ messages in thread From: Johannes Schindelin @ 2018-02-15 16:54 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Jeff King, git Hi Michael, On Wed, 24 Jan 2018, Michael Haggerty wrote: > This patch series fixes the handling of empty packed-refs snapshots > (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly > by changing `snapshot` to store a pointer to the start of the > post-header `packed-refs` content instead of `header_len`. It makes a > couple of other improvements as well. > > I'm not sure whether I like this approach better than the alternative > of always setting `snapshot->buf` to a non-NULL value, by allocating a > length-1 bit of RAM if necessary. The latter is less intrusive, though > even if that approach is taken, I think patches 01, 02, and 04 from > this patch series would be worthwhile improvements. Thank you for Cc:ing me on this patch series. I tried to find some time to review it, I really did, but failed. As I saw that others already had a good look at it, I will just archive the mail thread. I hope you do not mind! Ciao, Dscho ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files 2018-01-24 11:05 ` Michael Haggerty 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty @ 2018-01-24 18:05 ` Junio C Hamano 1 sibling, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2018-01-24 18:05 UTC (permalink / raw) To: Michael Haggerty; +Cc: Jeff King, Kim Gybels, git, Johannes Schindelin Michael Haggerty <mhagger@alum.mit.edu> writes: > The change to using `read()` rather than `mmap()` for small > `packed-refs` feels like it should be an improvement, but it occurred to > me that the performance numbers quoted in ea68b0ce9f8 (hash-object: > don't use mmap() for small files, 2010-02-21) are not directly > applicable to the `packed-refs` file. As far as I understand, the file > mmapped in `index_fd()` is always read in full, whereas the main point > of mmapping the packed-refs file is to avoid having to read the whole > file at all in some situations. That being said, a 32 KiB file would > only be 8 pages (assuming a page size of 4 KiB), and by the time you've > read the header and binary-searched to find the desired record, you've > probably paged in most of the file anyway. Reading the whole file at > once, in order, is almost certainly cheaper. Yup. So unless your "small" is meaningfully large, we are likely to be better off with read(2), but I suspect that this might not be even measuable since we are only talking about "small" files. ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2018-02-15 16:54 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels 2018-01-13 18:56 ` Johannes Schindelin 2018-01-14 19:14 ` [PATCH v2] " Kim Gybels 2018-01-15 12:17 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty 2018-01-15 12:17 ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty 2018-01-15 12:17 ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty 2018-01-15 12:17 ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty 2018-01-17 20:23 ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin 2018-01-17 21:52 ` Junio C Hamano 2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King 2018-01-15 23:37 ` Kim Gybels 2018-01-15 23:52 ` Jeff King 2018-01-16 19:38 ` [PATCH v3] " Kim Gybels 2018-01-17 22:09 ` Jeff King 2018-01-21 4:41 ` Michael Haggerty 2018-01-22 19:31 ` Junio C Hamano 2018-01-24 11:05 ` Michael Haggerty 2018-01-24 11:14 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty 2018-01-24 11:14 ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty 2018-01-24 20:36 ` Jeff King 2018-01-24 11:14 ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty 2018-01-24 11:14 ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty 2018-01-24 20:27 ` Jeff King 2018-01-24 21:11 ` Junio C Hamano 2018-01-24 21:34 ` Jeff King 2018-01-24 11:14 ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty 2018-01-24 20:32 ` Jeff King 2018-01-24 11:14 ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty 2018-01-24 11:14 ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty 2018-01-24 20:38 ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King 2018-01-24 20:54 ` Junio C Hamano 2018-02-15 16:54 ` Johannes Schindelin 2018-01-24 18:05 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).