git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Dhruva Krishnamurthy <dhruvakm@gmail.com>,
	John Cai <johncai86@gmail.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	git@vger.kernel.org
Subject: Re: using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42
Date: Thu, 2 May 2024 13:33:43 -0400	[thread overview]
Message-ID: <ZjPOd83r+tkmsv3o@nand.local> (raw)
In-Reply-To: <ZjLfcCxjLq4o7hpw@nand.local>

On Wed, May 01, 2024 at 08:33:52PM -0400, Taylor Blau wrote:
> On Wed, May 01, 2024 at 06:00:30PM -0400, Jeff King wrote:
> > Bisecting show the culprit is 2386535511 (attr: read attributes from
> > HEAD when bare repo, 2023-10-13), which is in v2.43.0. Before that, a
> > bare repository would only look for attributes in the info/attributes
> > file. But after, we look at the HEAD tree-ish, too. And pack-objects
> > will check the "delta" attribute for every path of every object we are
> > packing. And remember that in-tree lookups for foo/bar/baz require
> > looking not just for .gitattributes, but also foo/.gitattributes,
> > foo/bar/.gitattributes, and foo/bar/baz/.gitattributes.
>
> Thanks for the explanation and bisection. I agree that 2386535511 makes
> sense as a likely culprit given what you wrote here.

Here is one possible approach, which is a partial revert of 2386535511.
I thought about suggesting that we revert 2386535511 entirely, but I
think that may be too strong of an approach especially if there are
plans to otherwise improve the performance of attr lookups with some
caching layer.

Instead, this patch changes the behavior to only fallback to "HEAD" in
bare repositories from check-attr, but leaves pack-objects, archive, and
all other builtins alone.

I should note, this is a pretty hacky approach to use the extern'd
git_attr_tree variable from within the check-attr builtin, but I think
that this does do the trick.

Alternatively, if this is too hacky or magical that check-attr does one
thing but every other command does something else, I would personally be
fine with a full revert of 2386535511.

Anyway, here is the patch:

--- 8< ---

Subject: [PATCH] attr.c: only read attributes from HEAD via check-attr

This patch is a partial revert of commit 2386535511d (attr: read
attributes from HEAD when bare repo, 2023-10-13), which caused Git to
start reading from .gitattributes files from HEAD^{tree} when invoked in
a bare repository.

This patch has an unfortunate side-effect of significantly slowing down
pack-objects, for example, when invoked in a bare repository without
using reachability bitmaps.

Prior to 2386535511d, pack-objects would only look at the
info/attributes file when working in a bare repository. But after,
pack-objects ends up looking at every "delta" attribute not just in the
info/attributes file, but for every .gitattributes file in each tree
recursively from the root down to the dirname of whatever path we're
inspecting. In other words, gathering attributes for path foo/bar/baz
requires reading .gitattributes, foo/.gitattributes,
foo/bar/.gitattributes, and foo/bar/baz/.gitattributes.

Restore the pre-2386535511d behavior for commands other than check-attr
(which was the intended target of the change described in 2386535511d).

If we want to cause pack-objects to use HEAD^{tree} as an attributes
source in bare repositories by default again, it would likely come after
some caching layer to avoid the performance penalty.

Reported-by: Dhruva Krishnamurthy <dhruvakm@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 attr.c                  | 7 -------
 builtin/check-attr.c    | 2 ++
 t/t5001-archive-attr.sh | 2 +-
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/attr.c b/attr.c
index 7c380c17317..33473bdce01 100644
--- a/attr.c
+++ b/attr.c
@@ -1220,17 +1220,10 @@ static void compute_default_attr_source(struct object_id *attr_source)
 	if (!default_attr_source_tree_object_name && git_attr_tree) {
 		default_attr_source_tree_object_name = git_attr_tree;
 		ignore_bad_attr_tree = 1;
 	}

-	if (!default_attr_source_tree_object_name &&
-	    startup_info->have_repository &&
-	    is_bare_repository()) {
-		default_attr_source_tree_object_name = "HEAD";
-		ignore_bad_attr_tree = 1;
-	}
-
 	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
 		return;

 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index c1da1d184e9..9b445fe33c6 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -188,10 +188,12 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)

 	if (source) {
 		if (repo_get_oid_tree(the_repository, source, &initialized_oid))
 			die("%s: not a valid tree-ish source", source);
 		set_git_attr_source(source);
+	} else if (startup_info->have_repository && is_bare_repository()) {
+		git_attr_tree = "HEAD";
 	}

 	if (stdin_paths)
 		check_attr_stdin_paths(prefix, check, all_attrs);
 	else {
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index eaf959d8f63..0ff47a239db 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -136,11 +136,11 @@ test_expect_success 'git archive with worktree attributes, bare' '
 	(cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar &&
 	(mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar
 '

 test_expect_missing	bare-worktree/ignored
-test_expect_missing	bare-worktree/ignored-by-tree
+test_expect_exists	bare-worktree/ignored-by-tree
 test_expect_exists	bare-worktree/ignored-by-worktree

 test_expect_success 'export-subst' '
 	git log "--pretty=format:A${SUBSTFORMAT}O" HEAD >substfile1.expected &&
 	test_cmp nosubstfile archive/nosubstfile &&
--
2.45.0.1.g3e84e921a0a.dirty

--- >8 ---

Thanks,
Taylor


  reply	other threads:[~2024-05-02 17:34 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01  5:26 Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42 Dhruva Krishnamurthy
2024-05-01 22:00 ` using tree as attribute source is slow, was " Jeff King
2024-05-01 22:37   ` rsbecker
2024-05-01 22:40   ` Junio C Hamano
2024-05-02  0:33   ` Taylor Blau
2024-05-02 17:33     ` Taylor Blau [this message]
2024-05-02 17:44       ` Junio C Hamano
2024-05-02 17:55         ` Taylor Blau
2024-05-02 19:01           ` Karthik Nayak
2024-05-02 21:08             ` Junio C Hamano
2024-05-03  5:37               ` Dhruva Krishnamurthy
2024-05-03 15:34                 ` Re* " Junio C Hamano
2024-05-03 17:46                   ` Jeff King
2024-05-06 20:28                     ` Taylor Blau
2024-05-13 20:16                   ` John Cai
2024-05-02 18:34         ` Dhruva Krishnamurthy
2024-05-02  0:45   ` Dhruva Krishnamurthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjPOd83r+tkmsv3o@nand.local \
    --to=me@ttaylorr.com \
    --cc=dhruvakm@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johncai86@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).