git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Neeraj K. Singh" <neerajsi@microsoft.com>,
	Neeraj Singh <neerajsi@ntdev.microsoft.com>
Subject: [PATCH] read-cache: make the index write buffer size 128K
Date: Thu, 18 Feb 2021 02:48:26 +0000	[thread overview]
Message-ID: <pull.877.git.1613616506949.gitgitgadget@gmail.com> (raw)

From: Neeraj Singh <neerajsi@ntdev.microsoft.com>

Writing an index 8K at a time invokes the OS filesystem and caching code
very frequently, introducing noticeable overhead while writing large
indexes. When experimenting with different write buffer sizes on Windows
writing the Windows OS repo index (260MB), most of the benefit came by
bumping the index write buffer size to 64K. I picked 128K to ensure that
we're past the knee of the curve.

With this change, the time under do_write_index for an index with 3M
files goes from ~1.02s to ~0.72s.

Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com>
---
    read-cache: make the index write buffer size 128K
    
    Writing an index 8K at a time invokes the OS filesystem and caching code
    very frequently, introducing noticeable overhead while writing large
    indexes. When experimenting with different write buffer sizes on Windows
    writing the Windows OS repo index (260MB), most of the benefit came by
    bumping the index write buffer size to 64K. I picked 128K to ensure that
    we're past the knee of the curve.
    
    With this change, the time under do_write_index for an index with 3M
    files goes from ~1.02s to ~0.72s.
    
    Signed-off-by: Neeraj Singh neerajsi@ntdev.microsoft.com
    
    Note: This was previously discussed on the mailing list in 2016 at:
    https://lore.kernel.org/git/1458350341-12276-1-git-send-email-dturner@twopensource.com/.
    
    Since then, I believe we have a couple changes:
    
     * 'small' development platforms like raspberry pi have gotten larger
       (4GB RAM).
     * spectre and meltdown make individual system calls more expensive when
       mitigations are enabled
     * there have been many investments to make very large repos scale well
       in git, so huge repos are more common now.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-877%2Fneerajsi-msft%2Fneerajsi%2Findex-buffer-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-877/neerajsi-msft/neerajsi/index-buffer-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/877

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 29144cf879e7..a5b2779b9586 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2447,7 +2447,7 @@ int repo_index_has_changes(struct repository *repo,
 	}
 }
 
-#define WRITE_BUFFER_SIZE 8192
+#define WRITE_BUFFER_SIZE (128 * 1024)
 static unsigned char write_buffer[WRITE_BUFFER_SIZE];
 static unsigned long write_buffer_len;
 

base-commit: 45526154a57d15947cad7262230d0b935cedb9d3
-- 
gitgitgadget

             reply	other threads:[~2021-02-18  2:50 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-18  2:48 Neeraj K. Singh via GitGitGadget [this message]
2021-02-19 19:12 ` [PATCH] read-cache: make the index write buffer size 128K Jeff Hostetler
2021-02-20  3:28   ` Junio C Hamano
2021-02-20  7:56     ` Neeraj Singh
2021-02-21 12:51       ` Junio C Hamano
2021-02-24 20:56         ` Neeraj Singh
2021-02-25  5:41           ` Junio C Hamano
2021-02-25  6:58             ` Chris Torek
2021-02-25  7:16               ` Junio C Hamano
2021-02-25  7:36                 ` Neeraj Singh
2021-02-25  7:57                   ` Chris Torek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.877.git.1613616506949.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=neerajsi@microsoft.com \
    --cc=neerajsi@ntdev.microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).