git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: "Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Stefan Beller" <sbeller@google.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read
Date: Mon, 25 Sep 2017 17:44:55 +0200 (CEST)
Message-ID: <alpine.DEB.2.21.1.1709251743230.40514@virtualbox>
In-Reply-To: <d16e3003-e10e-6767-4d00-65e264a4a548@alum.mit.edu>

Hi Michael,

On Thu, 21 Sep 2017, Michael Haggerty wrote:

> On 09/20/2017 08:50 PM, Jeff King wrote:
> > On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote:
> >> If `packed-refs` was already sorted, then (if the system allows it)
> >> we can use the mmapped file contents directly. But if the system
> >> doesn't allow a file that is currently mmapped to be replaced using
> >> `rename()`, then it would be bad for us to keep the file mmapped for
> >> any longer than necessary. So, on such systems, always make a copy of
> >> the file contents, either as part of the sorting process, or
> >> afterwards.
> > 
> > So this sort-of answers my question from the previous commit (why we
> > care about the distinction between NONE and TEMPORARY), since we now
> > start treating them differently.
> > 
> > But I'm a bit confused why there's any advantage in the TEMPORARY case
> > to doing the mmap-and-copy versus just treating it like NONE and reading
> > it directly.
> > 
> > Hmm, I guess it comes down to the double-allocation thing again? Let me
> > see if I have this right:
> > 
> >   1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If
> >      not, then we temporarily hold it in memory twice while we copy it
> >      into the new sorted buffer.
> > 
> >   2. For TEMPORARY, we mmap once. If it's sorted, then we make a single
> >      copy. If it's not sorted, then we do the copy+sort as a single
> >      step.
> > 
> >   3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the
> >      single copy.
> > 
> > So this is really there to help the TEMPORARY case reading an old
> > unsorted file avoid needing to use double-the-ram during the copy?
> > 
> > That seems like a good reason (and it does not seem to add too much
> > complexity).
> 
> Yes, that's correct. A deeper question is whether it's worth this extra
> effort to optimize the conversion from "unsorted" to "known-sorted",
> which it seems should only happen once per repository. But
> 
> * Many more Git commands read the `packed-refs` file than write it.
>   So an "unsorted" file might have to be read multiple times before
>   the first time it is rewritten with the "sorted" trait.
> 
> * Users might have multiple Git clients writing their `packed-refs`
>   file (e.g., the git CLI plus JGit in Eclipse), and they might not
>   upgrade both clients at the same time to versions that write the
>   "sorted" trait. So a single repository might go back and forth
>   between "unsorted" and "known-sorted" multiple times in its
>   life.
> 
> * Theoretically, somebody might have a `packed-refs` file that is so
>   big that it takes up more than half of memory. I think there are
>   scenarios where such a file could be converted to "sorted" form
>   in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode.
> 
> On the downside, in my benchmarking on Linux, there were hints that the
> `MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK`
> branch when operating on a known-sorted `packed-refs` file. But the
> speed difference was barely detectable (and might be illusory). And
> anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK`
> mode will usually be used. It *would* be interesting to know if there is
> a significant speed difference on Windows. Dscho?

Sadly, I lack the time to test this thoroughly, but my experience is that
those things are dominated by I/O on Windows, i.e. that the malloc() and
copy won't matter all that much.

In any case, the same argument you make for Linux holds, in reverse, for
Windows: we cannot use MMAP_OK on Windows, so the point is moot ;-)

Ciao,
Dscho
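
The three cases Jeff enumerates in the quoted text can be restated as a small model of how many file-sized heap buffers are alive at the same time. This is only a sketch in Python, not code from the patch series: the `MmapStrategy` enum and the function `peak_heap_buffers` are hypothetical names introduced for illustration, and the counts follow the description given in the thread. Pages that are merely mmapped are assumed not to count as heap buffers.

```python
from enum import Enum


class MmapStrategy(Enum):
    """The three modes discussed in the thread (names as used there)."""
    NONE = "MMAP_NONE"            # no mmap: the file is read into a heap buffer
    TEMPORARY = "MMAP_TEMPORARY"  # mmap allowed, but must not be kept open
    OK = "MMAP_OK"                # the mapping may be kept and used directly


def peak_heap_buffers(strategy: MmapStrategy, already_sorted: bool) -> int:
    """Peak number of file-sized heap buffers held at once (hypothetical model)."""
    if strategy is MmapStrategy.NONE:
        # One buffer for the read; sorting an unsorted file needs a second
        # buffer while the records are copied into sorted order.
        return 1 if already_sorted else 2
    if strategy is MmapStrategy.TEMPORARY:
        # Sorted: one plain copy out of the mapping.
        # Unsorted: copy and sort in a single step, still one buffer.
        return 1
    # MMAP_OK: a sorted file is used straight from the mapping;
    # an unsorted file needs one sorted copy.
    return 0 if already_sorted else 1
```

Under this model, Michael's third bullet corresponds to the unsorted case: `MMAP_NONE` peaks at two buffers while `MMAP_TEMPORARY` peaks at one, which is why a very large unsorted `packed-refs` file might convert in one mode but not the other.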
