git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Benjamin Flesch <benjaminflesch@icloud.com>
Subject: [PATCH 7/9] upload-pack: always turn off save_commit_buffer
Date: Wed, 28 Feb 2024 17:39:00 -0500	[thread overview]
Message-ID: <20240228223900.GG1158131@coredump.intra.peff.net> (raw)
In-Reply-To: <20240228223700.GA1157826@coredump.intra.peff.net>

When the client sends us "want $oid" lines, we call parse_object($oid)
to get an object struct. It's important to parse the commits because we
need to traverse them in the negotiation phase. But of course we don't
need to hold on to the commit messages for each one.

We've turned off the save_commit_buffer flag in get_common_commits() for
a long time, since f0243f26f6 (git-upload-pack: More efficient usage of
the has_sha1 array, 2005-10-28). That helps with the commits we see
while actually traversing. But:

  1. That function is only used by the v0 protocol. I think the v2
     protocol's code path leaves the flag on (and thus pays the extra
     memory penalty), though I didn't measure it specifically.

  2. If the client sends us a bunch of "want" lines, that happens before
     the negotiation phase. So we'll hold on to all of those commit
     messages. Generally the number of "want" lines scales with the
     refs, not with the number of objects in the repo. But a malicious
     client could send a lot in order to waste memory.

As an example of (2), if I generate a request to fetch all commits in
git.git like this:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_commits() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "commit" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
  }

  want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null

before this patch upload-pack peaks at ~125MB, and after at ~35MB. The
difference is not coincidentally about the same as the sum of all commit
object sizes as computed by:

  git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
  perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'

In a larger repository like linux.git, that number is ~1GB.

In a repository with a full commit-graph file this will have no impact
(and the commit graph would save us from parsing at all, so is a much
better solution!). But it's easy to do, might help a little in
real-world cases (where even if you have a commit graph it might not be
fully up to date), and helps a lot for a worst-case malicious request.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/upload-pack.c | 2 ++
 upload-pack.c         | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 9b021ef026..15afb97260 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -8,6 +8,7 @@
 #include "replace-object.h"
 #include "upload-pack.h"
 #include "serve.h"
+#include "commit.h"
 
 static const char * const upload_pack_usage[] = {
 	N_("git-upload-pack [--[no-]strict] [--timeout=<n>] [--stateless-rpc]\n"
@@ -37,6 +38,7 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("upload-pack");
 	disable_replace_refs();
+	save_commit_buffer = 0;
 
 	argc = parse_options(argc, argv, prefix, options, upload_pack_usage, 0);
 
diff --git a/upload-pack.c b/upload-pack.c
index 2a5c52666e..3970bb9b30 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -526,8 +526,6 @@ static int get_common_commits(struct upload_pack_data *data,
 	int got_other = 0;
 	int sent_ready = 0;
 
-	save_commit_buffer = 0;
-
 	for (;;) {
 		const char *arg;
 
-- 
2.44.0.rc2.424.gbdbf4d014b



  parent reply	other threads:[~2024-02-28 22:39 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-28 22:37 [PATCH 0/9] bound upload-pack memory allocations Jeff King
2024-02-28 22:37 ` [PATCH 1/9] upload-pack: drop separate v2 "haves" array Jeff King
2024-02-28 22:37 ` [PATCH 2/9] upload-pack: switch deepen-not list to an oid_array Jeff King
2024-02-28 22:37 ` [PATCH 3/9] upload-pack: use oidset for deepen_not list Jeff King
2024-02-28 22:38 ` [PATCH 4/9] upload-pack: use a strmap for want-ref lines Jeff King
2024-02-28 22:38 ` [PATCH 5/9] upload-pack: accept only a single packfile-uri line Jeff King
2024-02-28 22:38 ` [PATCH 6/9] upload-pack: disallow object-info capability by default Jeff King
2024-03-04  8:34   ` Patrick Steinhardt
2024-03-04  9:59     ` Jeff King
2024-04-29 20:49       ` Taylor Blau
2024-02-28 22:39 ` Jeff King [this message]
2024-02-28 22:39 ` [PATCH 8/9] upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places Jeff King
2024-02-28 22:39 ` [PATCH 9/9] upload-pack: free tree buffers after parsing Jeff King
2024-03-04  8:33   ` Patrick Steinhardt
2024-03-04  9:57     ` Jeff King
2024-03-04 10:00       ` Patrick Steinhardt
2024-02-28 22:47 ` [PATCH 0/9] bound upload-pack memory allocations Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240228223900.GG1158131@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=benjaminflesch@icloud.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).