git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Kim Gybels <kgybels@infogroep.be>,
	git@vger.kernel.org,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
Date: Wed, 24 Jan 2018 12:05:01 +0100	[thread overview]
Message-ID: <cea5e366-dc95-6f41-6373-f8bbef103561@alum.mit.edu> (raw)
In-Reply-To: <xmqqh8rdn113.fsf@gitster.mtv.corp.google.com>

On 01/22/2018 08:31 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
>> (see the earlier code path in `load_contents()`). So either that code
>> path *also* has to get the `xmalloc()` treatment, or my third patch is
>> still necessary. (My second patch wouldn't be necessary because the
>> ENOENT case makes `load_contents()` return 0, triggering the early exit
>> from `create_snapshot()`.)
>>
>> I don't have a strong preference either way.
> 
> Which would be a two-liner, like the attached, which does not look
> too bad by itself.
> 
> The direction, if we take this approach, means that we are declaring
> that .buf being NULL is an invalid state for a snapshot to be in,
> instead of saying "an empty snapshot looks exactly like one that was
> freshly initialized", which seems to be the intention of the original
> design.
> 
> After Kim's fix and with 3/3 in your follow-up series, various
> helpers are still unsafe against .buf being NULL, like
> sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only
> when mmapped bit is set), find_reference_location().
> 
> packed_ref_iterator_begin() checks if snapshot->buf is NULL and
> returns early.  At the first glance, this appears a useful short cut
> to optimize the empty case away, but the check also is acting as a
> guard to prevent a snapshot with NULL .buf from being fed to an
> unsafe find_reference_location().  An implicit guard like this feels
> a bit more brittle than my liking.  If we ensure .buf is never NULL,
> that check can become a pure short-cut optimization and stop being
> a correctness thing.
> 
> So...
> 
> 
>  refs/packed-backend.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b6e2bc3c1d..1eeb5c7f80 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot)
>  	if (fd < 0) {
>  		if (errno == ENOENT) {
>  			/*
> -			 * This is OK; it just means that no
> -			 * "packed-refs" file has been written yet,
> -			 * which is equivalent to it being empty,
> -			 * which is its state when initialized with
> -			 * zeros.
> +			 * Treat missing "packed-refs" as equivalent to
> +			 * it being empty.
>  			 */
> +			snapshot->eof = snapshot->buf = xmalloc(0);
> +			snapshot->mmapped = 0;
>  			return 0;
>  		} else {
>  			die_errno("couldn't read %s", snapshot->refs->path);
> 

That would work, though if you go this way, please also change the
docstring for `snapshot::buf`, which still says that `buf` and `eof` can
be `NULL`.

The other alternative, making `snapshot` safe for NULLs, becomes easier
if `snapshot` stores a pointer to the start of the reference section of
the `packed-refs` contents (i.e., after the header line), rather than
repeatedly computing that address from `snapshot->buf +
snapshot->header_len`. With this change, code that is technically
undefined when the fields are NULL can more easily be replaced with code
that is safe for NULL. For example,

    pos = snapshot->buf + snapshot->header_len

becomes

    pos = snapshot->start

, and

    len = snapshot->eof - pos;
    if (!len) [...]

becomes

    if (pos == snapshot->eof) [...]
    len = snapshot->eof - pos;

. In this way, most of the special-casing for NULL goes away (and some
code becomes simpler, as well).

In a moment I'll send a patch series illustrating this approach. I think
patches 01, 02, and 04 are improvements regardless of whether we decide
to make NULL safe.

The change to using `read()` rather than `mmap()` for small
`packed-refs` feels like it should be an improvement, but it occurred to
me that the performance numbers quoted in ea68b0ce9f8 (hash-object:
don't use mmap() for small files, 2010-02-21) are not directly
applicable to the `packed-refs` file. As far as I understand, the file
mmapped in `index_fd()` is always read in full, whereas the main point
of mmapping the packed-refs file is to avoid having to read the whole
file at all in some situations. That being said, a 32 KiB file would
only be 8 pages (assuming a page size of 4 KiB), and by the time you've
read the header and binary-searched to find the desired record, you've
probably paged in most of the file anyway. Reading the whole file at
once, in order, is almost certainly cheaper.

Michael

  reply	other threads:[~2018-01-24 11:05 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
2018-01-13 18:56 ` Johannes Schindelin
2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
2018-01-15 12:17     ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
2018-01-15 12:17     ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
2018-01-15 12:17     ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
2018-01-17 20:23     ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin
2018-01-17 21:52     ` Junio C Hamano
2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
2018-01-15 23:37   ` Kim Gybels
2018-01-15 23:52     ` Jeff King
2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
2018-01-17 22:09         ` Jeff King
2018-01-21  4:41           ` Michael Haggerty
2018-01-22 19:31             ` Junio C Hamano
2018-01-24 11:05               ` Michael Haggerty [this message]
2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
2018-01-24 11:14                   ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty
2018-01-24 20:36                     ` Jeff King
2018-01-24 11:14                   ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty
2018-01-24 11:14                   ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty
2018-01-24 20:27                     ` Jeff King
2018-01-24 21:11                       ` Junio C Hamano
2018-01-24 21:34                         ` Jeff King
2018-01-24 11:14                   ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty
2018-01-24 20:32                     ` Jeff King
2018-01-24 11:14                   ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty
2018-01-24 11:14                   ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty
2018-01-24 20:38                   ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King
2018-01-24 20:54                   ` Junio C Hamano
2018-02-15 16:54                   ` Johannes Schindelin
2018-01-24 18:05                 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cea5e366-dc95-6f41-6373-f8bbef103561@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=kgybels@infogroep.be \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).