git@vger.kernel.org mailing list mirror (one of many)
From: Neeraj Singh <nksingh85@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff Hostetler <git@jeffhostetler.com>,
	git@vger.kernel.org, "Neeraj K. Singh" <neerajsi@microsoft.com>
Subject: Re: [PATCH] read-cache: make the index write buffer size 128K
Date: Wed, 24 Feb 2021 12:56:46 -0800
Message-ID: <CANQDOdfJApBOEm2gPMwtz9T0ETPoDk107mF7LYRGCmjFLi3Jxg@mail.gmail.com>
In-Reply-To: <xmqqo8gd8tyr.fsf@gitster.g>

On Sun, Feb 21, 2021 at 4:51 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> >> >> -#define WRITE_BUFFER_SIZE 8192
> >> >> +#define WRITE_BUFFER_SIZE (128 * 1024)
> >> >>  static unsigned char write_buffer[WRITE_BUFFER_SIZE];
> >> >>  static unsigned long write_buffer_len;
> >> >
> >> > [...]
> >> >
> >> > Very nice.
> >>
> >> I wonder if we gain more by going say 4M buffer size or even larger?
> >>
> >> Is this something we can make the system auto-tune itself?  This is
> >> not about reading but writing, so we already have enough information
> >> to estimate how much we would need to write out.
> >>
> >> Thanks.
> >>
> >
> > Hi Junio,
> > At some point the cost of the memcpy into the filesystem cache begins to
> > dominate the cost of the system call, so increasing the buffer size
> > has diminishing returns.
>
> Yes, I know that kind of "general principle".
>
> If I recall correctly, we used to pass too large a buffer to a
> single write(2) system call (I do not know if it was for the
> index---I suspect it was for some other data), and found out that it
> made response to ^C take too long, and tuned the buffer size down.
>
> I was asking where the sweet spot for this codepath would be, and if
> we can take a measurement to make a better decision than "8k feels
> too small and 128k turns out to be better than 8k".  It does not
> tell us if 128k would always do better than 64k or 256k, for
> example.
>
> I suspect that the sweet spot would be dependent on many parameters
> (not just the operating system, but also relative speed among
> memory, "disk", and cpu, and also the size of the index) and if we
> can devise a way to auto-tune it so that we do not have to worry
> about it.
>
> Thanks.

I think the main concern on a reasonably-configured machine is the speed
of memcpy and the cost of the code to get to that memcpy (syscall, file
system free space allocator, page allocator, mapping from file offset to
cache page).  Disk shouldn't matter, since we write the file with OS
buffering and buffer flushing will happen asynchronously some time after
the git command completes.
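
The write path described above can be sketched roughly as follows. This
is a simplified illustration and not the code in read-cache.c: the buffer
names come from the quoted patch, while buffer_flush, buffer_write,
count_flushes and the flush_count counter are hypothetical helpers added
for the example. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define WRITE_BUFFER_SIZE (128 * 1024)

static unsigned char write_buffer[WRITE_BUFFER_SIZE];
static size_t write_buffer_len;
static size_t flush_count;	/* illustration only: counts buffer flushes */

/* Hand the buffered bytes to the OS; loop in case write(2) is short. */
static int buffer_flush(int fd)
{
	size_t off = 0;

	while (off < write_buffer_len) {
		ssize_t n = write(fd, write_buffer + off, write_buffer_len - off);
		if (n <= 0)
			return -1;
		off += (size_t)n;
	}
	write_buffer_len = 0;
	flush_count++;
	return 0;
}

/* memcpy into the static buffer, flushing each time it fills. */
static int buffer_write(int fd, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len > 0) {
		size_t room = WRITE_BUFFER_SIZE - write_buffer_len;
		size_t chunk = len < room ? len : room;

		memcpy(write_buffer + write_buffer_len, p, chunk);
		write_buffer_len += chunk;
		p += chunk;
		len -= chunk;
		if (write_buffer_len == WRITE_BUFFER_SIZE && buffer_flush(fd) < 0)
			return -1;
	}
	return 0;
}

/*
 * Write n_entries records of entry_size bytes to a scratch file and
 * return how many flushes it took, or 0 on error.  No fsync(2) is
 * issued: the OS writes the dirty pages back on its own schedule.
 */
static size_t count_flushes(size_t n_entries, size_t entry_size)
{
	unsigned char entry[256] = {0};
	FILE *tmp = tmpfile();
	size_t i, result = 0;
	int fd;

	assert(entry_size <= sizeof(entry));
	if (!tmp)
		return 0;
	fd = fileno(tmp);
	write_buffer_len = 0;
	flush_count = 0;
	for (i = 0; i < n_entries; i++)
		if (buffer_write(fd, entry, entry_size) < 0)
			goto out;
	if (write_buffer_len && buffer_flush(fd) < 0)
		goto out;
	result = flush_count;
out:
	fclose(tmp);
	return result;
}
```

For example, count_flushes(3000, 100) pushes 300000 bytes through the
128K buffer in 3 flushes; with the old 8192-byte buffer the same data
would need 37.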

If we think about doing the fastest possible memcpy, I think we want to
aim for maximizing the use of the CPU cache.  A write buffer that's too
big would result in most of the data being flushed to DRAM between when
git writes it and the OS reads it.  L1 caches are typically ~32K and L2
caches are on the order of 256K.  We probably don't want to exceed the
size of the L2 cache, and we should actually leave some room for OS code
and data, so 128K is a good number from that perspective.
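
The sizing argument above amounts to a small piece of arithmetic, shown
here as a sketch. The 256K figure is the typical L2 size quoted in the
message; using exactly half of L2 is an assumption made for the example
(the message only says to leave some room), and pick_write_buffer_size
is a hypothetical helper, not something in git.

```c
#include <assert.h>
#include <stddef.h>

/* Typical L2 size quoted in the message; real CPUs vary. */
#define ASSUMED_L2_BYTES (256 * 1024)

/*
 * Hypothetical heuristic: give the write buffer half of L2 and
 * leave the other half for OS code and data.
 */
static size_t pick_write_buffer_size(size_t l2_bytes)
{
	assert(l2_bytes >= 2);
	return l2_bytes / 2;
}
```

With the assumed 256K L2 this yields 128K, which is above the ~32K L1
size and below the full L2 size.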

I collected data from an experiment with different buffer sizes on
Windows on my 3.6GHz Xeon W-2133 machine:
https://docs.google.com/spreadsheets/d/1Bu6pjp53NPDK6AKQI_cry-hgxEqlicv27dptoXZYnwc/edit?usp=sharing

The timing is pretty much in the noise after we pass 32K.  So I think 8K
is too small, but given the flatness of the curve we can feel good about
any value above 32K from a performance perspective.  I still think 128K
is a decent number that won't likely need to be changed for some time.
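
A measurement of this kind could be reproduced with something like the
sketch below. It is not the harness behind the spreadsheet (those numbers
were collected on Windows); it is a POSIX approximation, and
time_buffered_writes is a hypothetical helper. It times only the buffered
write(2) calls to a scratch file and deliberately issues no fsync.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Write `total` bytes to a scratch file using write(2) calls of at
 * most `bufsize` bytes each.  Returns elapsed wall-clock seconds, or
 * -1.0 on error.
 */
static double time_buffered_writes(size_t total, size_t bufsize)
{
	struct timespec t0, t1;
	unsigned char *buf;
	FILE *tmp;
	int fd;
	size_t left = total;
	double secs = -1.0;

	assert(bufsize > 0);
	buf = malloc(bufsize);
	tmp = tmpfile();
	if (!buf || !tmp)
		goto out;
	memset(buf, 0x5a, bufsize);
	fd = fileno(tmp);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (left > 0) {
		size_t want = left < bufsize ? left : bufsize;
		ssize_t n = write(fd, buf, want);

		if (n <= 0)
			goto out;	/* treat errors and zero-length writes as failure */
		left -= (size_t)n;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out:
	if (tmp)
		fclose(tmp);
	free(buf);
	return secs;
}
```

Calling it in a loop over sizes such as 8K, 32K, 128K and 256K, with
several repetitions per size, would give a curve to compare; the
absolute numbers will depend on the OS, file system and hardware.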

Thanks,
-Neeraj

Thread overview: 11+ messages
2021-02-18  2:48 [PATCH] read-cache: make the index write buffer size 128K Neeraj K. Singh via GitGitGadget
2021-02-19 19:12 ` Jeff Hostetler
2021-02-20  3:28   ` Junio C Hamano
2021-02-20  7:56     ` Neeraj Singh
2021-02-21 12:51       ` Junio C Hamano
2021-02-24 20:56         ` Neeraj Singh [this message]
2021-02-25  5:41           ` Junio C Hamano
2021-02-25  6:58             ` Chris Torek
2021-02-25  7:16               ` Junio C Hamano
2021-02-25  7:36                 ` Neeraj Singh
2021-02-25  7:57                   ` Chris Torek
