git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Karthik Nayak <karthik.188@gmail.com>,
	 Taylor Blau <me@ttaylorr.com>, Jeff King <peff@peff.net>,
	 John Cai <johncai86@gmail.com>,
	Dhruva Krishnamurthy <dhruvakm@gmail.com>
Subject: Re* using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42
Date: Fri, 03 May 2024 08:34:27 -0700	[thread overview]
Message-ID: <xmqqzft6aozg.fsf_-_@gitster.g> (raw)
In-Reply-To: <CAKOHPA==xgRBLXmyURkdZ9X4LqQoBHYy=XD0Q_KTQHbK54DOFg@mail.gmail.com> (Dhruva Krishnamurthy's message of "Thu, 2 May 2024 22:37:59 -0700")

Dhruva Krishnamurthy <dhruvakm@gmail.com> writes:

> On Thu, May 2, 2024 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>> We could drop [1/2] from the series in the meantime to make it a
>> GitLab installation specific issue where they explicitly use
>> attr.tree to point at HEAD ;-) That is not solving anything for
>> those who set attr.tree (in a sense, they are buying the feature
>> with overhead of reading attributes from the named tree), but at
>> least for most people who are used to seeing the bare repository
>> ignoring the attributes, it would be an improvement to drop the
>> "bare repositories the tree of the HEAD commit is used to look up
>> attributes files by default" half from the series.
>>
>
> A hack (without knowing side effects if any) is to use an empty tree
> for attr source:
> $ git config --add attr.tree $(git hash-object -t tree /dev/null)
>
> This gives me performance comparable to git 2.42

That is clever.  Instead of crawling a potentially large tree that
is at the HEAD of the main project payload to find ".gitattributes"
files that may be relevant (and often not), folks can set an empty
tree to attr.tree to the configuration until this gets corrected.

And for folks who had been happy with the pre 2.42 behaviour,
we could do something like the attached as the first step to a real fix.

----- >8 --------- >8 --------- >8 --------- >8 -----
Subject: [PATCH] stop using HEAD for attributes in bare repository by default

With 23865355 (attr: read attributes from HEAD when bare repo,
2023-10-13), we started to use the HEAD tree as the default
attribute source in a bare repository.  One argument for such a
behaviour is that it would make things like "git archive" run in
bare and non-bare repositories for the same commit consistent.
This changes was merged to Git 2.43 but without an explicit mention
in its release notes.

It turns out that this change destroys performance of shallowly
cloning from a bare repository.  As the "server" installations are
expected to be mostly bare, and "git pack-objects", which is the
core of driving the other side of "git clone" and "git fetch" wants
to see if a path is set not to delta with blobs from other paths via
the attribute system, the change forces the server side to traverse
the tree of the HEAD commit needlessly to find if each and every
paths the objects it sends out has the attribute that controls the
deltification.  Given that (1) most projects do not configure such
an attribute, and (2) it is dubious for the server side to honor
such an end-user supplied attribute anyway, this was a poor choice
of the default.

To mitigate the current situation, let's revert the change that uses
the tree of HEAD in a bare repository by default as the attribute
source.  This will help most people who have been happy with the
behaviour of Git 2.42 and before.

Two things to note:

 * If you are stuck with versions of Git 2.43 or newer, that is
   older than the release this fix appears in, you can explicitly
   set the attr.tree configuration variable to point at an empty
   tree object, i.e.

	$ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

 * If you like the behaviour we are reverting, you can explicitly
   set the attr.tree configuration variable to HEAD, i.e.

	$ git config attr.tree HEAD

The right fix for this is to optimize the code paths that allow
accesses to attributes in tree objects, but that is a much more
involved change and is left as a longer-term project, outside the
scope of this "first step" fix.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 attr.c                  |  7 -------
 t/t0003-attributes.sh   | 10 ++++++++--
 t/t5001-archive-attr.sh |  3 ++-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git c/attr.c w/attr.c
index 679e42258c..6af7151088 100644
--- c/attr.c
+++ w/attr.c
@@ -1223,13 +1223,6 @@ static void compute_default_attr_source(struct object_id *attr_source)
 		ignore_bad_attr_tree = 1;
 	}
 
-	if (!default_attr_source_tree_object_name &&
-	    startup_info->have_repository &&
-	    is_bare_repository()) {
-		default_attr_source_tree_object_name = "HEAD";
-		ignore_bad_attr_tree = 1;
-	}
-
 	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
 		return;
 
diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh
index 774b52c298..d755cc3c29 100755
--- c/t/t0003-attributes.sh
+++ w/t/t0003-attributes.sh
@@ -398,13 +398,19 @@ test_expect_success 'bad attr source defaults to reading .gitattributes file' '
 	)
 '
 
-test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' '
+test_expect_success 'bare repo no longer defaults to reading .gitattributes from HEAD' '
 	test_when_finished rm -rf test bare_with_gitattribute &&
 	git init test &&
 	test_commit -C test gitattributes .gitattributes "f/path test=val" &&
 	git clone --bare test bare_with_gitattribute &&
-	echo "f/path: test: val" >expect &&
+
+	echo "f/path: test: unspecified" >expect &&
 	git -C bare_with_gitattribute check-attr test -- f/path >actual &&
+	test_cmp expect actual &&
+
+	echo "f/path: test: val" >expect &&
+	git -C bare_with_gitattribute -c attr.tree=HEAD \
+		check-attr test -- f/path >actual &&
 	test_cmp expect actual
 '
 
diff --git c/t/t5001-archive-attr.sh w/t/t5001-archive-attr.sh
index eaf959d8f6..7310774af5 100755
--- c/t/t5001-archive-attr.sh
+++ w/t/t5001-archive-attr.sh
@@ -133,7 +133,8 @@ test_expect_success 'git archive vs. bare' '
 '
 
 test_expect_success 'git archive with worktree attributes, bare' '
-	(cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar &&
+	(cd bare &&
+	git -c attr.tree=HEAD archive --worktree-attributes HEAD) >bare-worktree.tar &&
 	(mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar
 '
 


  reply	other threads:[~2024-05-03 15:34 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01  5:26 Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42 Dhruva Krishnamurthy
2024-05-01 22:00 ` using tree as attribute source is slow, was " Jeff King
2024-05-01 22:37   ` rsbecker
2024-05-01 22:40   ` Junio C Hamano
2024-05-02  0:33   ` Taylor Blau
2024-05-02 17:33     ` Taylor Blau
2024-05-02 17:44       ` Junio C Hamano
2024-05-02 17:55         ` Taylor Blau
2024-05-02 19:01           ` Karthik Nayak
2024-05-02 21:08             ` Junio C Hamano
2024-05-03  5:37               ` Dhruva Krishnamurthy
2024-05-03 15:34                 ` Junio C Hamano [this message]
2024-05-03 17:46                   ` Re* " Jeff King
2024-05-06 20:28                     ` Taylor Blau
2024-05-13 20:16                   ` John Cai
2024-05-02 18:34         ` Dhruva Krishnamurthy
2024-05-02  0:45   ` Dhruva Krishnamurthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqzft6aozg.fsf_-_@gitster.g \
    --to=gitster@pobox.com \
    --cc=dhruvakm@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johncai86@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).