git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Clean-up around get_x_ish()
@ 2020-09-25  5:59 Junio C Hamano
  2020-09-25  5:59 ` [PATCH 1/4] t8013: minimum preparatory clean-up Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-09-25  5:59 UTC (permalink / raw)
  To: git

Here is to clean some code I noticed while auditing the use of
get_committish() and get_treeish() API functions.

The main topic is to tighten error checking of "blame --ignore-rev"
and "blame --ignore-revs-list" arguments, which were not checked all
that much.  We make sure the ignore revs are committish objects, and
also peel tags pointing at commits down to commits before using them.
The breakage in the original is demonstrated by the tests added and/or
tweaked in the second patch.

The last two patches are icing on the cake.

Junio C Hamano (4):
  t8013: minimum preparatory clean-up
  blame: validate and peel the object names on the ignore list
  t1506: rev-parse A..B and A...B
  sequencer: stop abbreviating stopped-sha file

 builtin/blame.c                | 27 +++++++++++++--
 oidset.c                       |  9 ++++-
 oidset.h                       |  9 +++++
 sequencer.c                    | 11 +++---
 t/t1506-rev-parse-diagnosis.sh | 18 ++++++++++
 t/t8013-blame-ignore-revs.sh   | 61 ++++++++++++++++++++++------------
 6 files changed, 105 insertions(+), 30 deletions(-)

-- 
2.28.0-718-gd8d5e3da39


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] t8013: minimum preparatory clean-up
  2020-09-25  5:59 [PATCH 0/4] Clean-up around get_x_ish() Junio C Hamano
@ 2020-09-25  5:59 ` Junio C Hamano
  2020-09-25  5:59 ` [PATCH 2/4] blame: validate and peel the object names on the ignore list Junio C Hamano
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-09-25  5:59 UTC (permalink / raw)
  To: git

The closing sq for each test piece should be placed at the beginning
of line.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t8013-blame-ignore-revs.sh | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/t/t8013-blame-ignore-revs.sh b/t/t8013-blame-ignore-revs.sh
index 36dc31eb39..67de83ae2b 100755
--- a/t/t8013-blame-ignore-revs.sh
+++ b/t/t8013-blame-ignore-revs.sh
@@ -31,7 +31,7 @@ test_expect_success setup '
 	grep -E "^[0-9a-f]+ [0-9]+ 2" blame_raw | sed -e "s/ .*//" >actual &&
 	git rev-parse X >expect &&
 	test_cmp expect actual
-	'
+'
 
 # Ignore X, make sure A is blamed for line 1 and B for line 2.
 test_expect_success ignore_rev_changing_lines '
@@ -44,7 +44,7 @@ test_expect_success ignore_rev_changing_lines '
 	grep -E "^[0-9a-f]+ [0-9]+ 2" blame_raw | sed -e "s/ .*//" >actual &&
 	git rev-parse B >expect &&
 	test_cmp expect actual
-	'
+'
 
 # For ignored revs that have added 'unblamable' lines, attribute those to the
 # ignored commit.
@@ -67,7 +67,7 @@ test_expect_success ignore_rev_adding_unblamable_lines '
 
 	grep -E "^[0-9a-f]+ [0-9]+ 4" blame_raw | sed -e "s/ .*//" >actual &&
 	test_cmp expect actual
-	'
+'
 
 # Ignore X and Y, both in separate files.  Lines 1 == A, 2 == B.
 test_expect_success ignore_revs_from_files '
@@ -82,7 +82,7 @@ test_expect_success ignore_revs_from_files '
 	grep -E "^[0-9a-f]+ [0-9]+ 2" blame_raw | sed -e "s/ .*//" >actual &&
 	git rev-parse B >expect &&
 	test_cmp expect actual
-	'
+'
 
 # Ignore X from the config option, Y from a file.
 test_expect_success ignore_revs_from_configs_and_files '
@@ -96,7 +96,7 @@ test_expect_success ignore_revs_from_configs_and_files '
 	grep -E "^[0-9a-f]+ [0-9]+ 2" blame_raw | sed -e "s/ .*//" >actual &&
 	git rev-parse B >expect &&
 	test_cmp expect actual
-	'
+'
 
 # Override blame.ignoreRevsFile (ignore_x) with an empty string.  X should be
 # blamed now for lines 1 and 2, since we are no longer ignoring X.
@@ -120,7 +120,7 @@ test_expect_success bad_files_and_revs '
 	echo NOREV >ignore_norev &&
 	test_must_fail git blame file --ignore-revs-file ignore_norev 2>err &&
 	test_i18ngrep "invalid object name: NOREV" err
-	'
+'
 
 # For ignored revs that have added 'unblamable' lines, mark those lines with a
 # '*'
@@ -138,7 +138,7 @@ test_expect_success mark_unblamable_lines '
 
 	sed -n "4p" blame_raw | cut -c1 >actual &&
 	test_cmp expect actual
-	'
+'
 
 # Commit Z will touch the first two lines.  Y touched all four.
 # 	A--B--X--Y--Z
@@ -171,7 +171,7 @@ test_expect_success mark_ignored_lines '
 
 	sed -n "4p" blame_raw | cut -c1 >actual &&
 	! test_cmp expect actual
-	'
+'
 
 # For ignored revs that added 'unblamable' lines and more recent commits changed
 # the blamable lines, mark the unblamable lines with a
@@ -190,7 +190,7 @@ test_expect_success mark_unblamable_lines_intermediate '
 
 	sed -n "4p" blame_raw | cut -c1 >actual &&
 	test_cmp expect actual
-	'
+'
 
 # The heuristic called by guess_line_blames() tries to find the size of a
 # blame_entry 'e' in the parent's address space.  Those calculations need to
@@ -227,7 +227,7 @@ test_expect_success ignored_chunk_negative_parent_size '
 	git tag C &&
 
 	git blame file --ignore-rev B >blame_raw
-	'
+'
 
 # Resetting the repo and creating:
 #
@@ -269,6 +269,6 @@ test_expect_success ignore_merge '
 	grep -E "^[0-9a-f]+ [0-9]+ 9" blame_raw | sed -e "s/ .*//" >actual &&
 	git rev-parse C >expect &&
 	test_cmp expect actual
-	'
+'
 
 test_done
-- 
2.28.0-718-gd8d5e3da39


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-09-25  5:59 [PATCH 0/4] Clean-up around get_x_ish() Junio C Hamano
  2020-09-25  5:59 ` [PATCH 1/4] t8013: minimum preparatory clean-up Junio C Hamano
@ 2020-09-25  5:59 ` Junio C Hamano
  2020-09-26 16:23   ` René Scharfe
  2020-09-25  5:59 ` [PATCH 3/4] t1506: rev-parse A..B and A...B Junio C Hamano
  2020-09-25  5:59 ` [PATCH 4/4] sequencer: stop abbreviating stopped-sha file Junio C Hamano
  3 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-09-25  5:59 UTC (permalink / raw)
  To: git

The command reads list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked with their object type (those read from the file are not
even checked for object existence).

Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/blame.c              | 27 ++++++++++++++++++++++++--
 oidset.c                     |  9 ++++++++-
 oidset.h                     |  9 +++++++++
 t/t8013-blame-ignore-revs.sh | 37 ++++++++++++++++++++++++++----------
 4 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 94ef57c1cc..baa5d979cc 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -27,6 +27,7 @@
 #include "object-store.h"
 #include "blame.h"
 #include "refs.h"
+#include "tag.h"
 
 static char blame_usage[] = N_("git blame [<options>] [<rev-opts>] [<rev>] [--] <file>");
 
@@ -803,6 +804,26 @@ static int is_a_rev(const char *name)
 	return OBJ_NONE < oid_object_info(the_repository, &oid, NULL);
 }
 
+static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
+{
+	struct repository *r = ((struct blame_scoreboard *)cbdata)->repo;
+	struct object_id oid;
+
+	oidcpy(&oid, oid_ret);
+	while (1) {
+		struct object *obj;
+		int kind = oid_object_info(r, &oid, NULL);
+		if (kind == OBJ_COMMIT) {
+			oidcpy(oid_ret, &oid);
+			return 0;
+		}
+		if (kind != OBJ_TAG)
+			return -1;
+		obj = deref_tag(r, parse_object(r, &oid), NULL, 0);
+		oidcpy(&oid, &obj->oid);
+	}
+}
+
 static void build_ignorelist(struct blame_scoreboard *sb,
 			     struct string_list *ignore_revs_file_list,
 			     struct string_list *ignore_rev_list)
@@ -815,10 +836,12 @@ static void build_ignorelist(struct blame_scoreboard *sb,
 		if (!strcmp(i->string, ""))
 			oidset_clear(&sb->ignore_list);
 		else
-			oidset_parse_file(&sb->ignore_list, i->string);
+			oidset_parse_file_carefully(&sb->ignore_list, i->string,
+						    peel_to_commit_oid, sb);
 	}
 	for_each_string_list_item(i, ignore_rev_list) {
-		if (get_oid_committish(i->string, &oid))
+		if (get_oid_committish(i->string, &oid) ||
+		    peel_to_commit_oid(&oid, sb))
 			die(_("cannot find revision %s to ignore"), i->string);
 		oidset_insert(&sb->ignore_list, &oid);
 	}
diff --git a/oidset.c b/oidset.c
index 15d4e18c37..2d0ab76fb5 100644
--- a/oidset.c
+++ b/oidset.c
@@ -42,6 +42,12 @@ int oidset_size(struct oidset *set)
 }
 
 void oidset_parse_file(struct oidset *set, const char *path)
+{
+	oidset_parse_file_carefully(set, path, NULL, NULL);
+}
+
+void oidset_parse_file_carefully(struct oidset *set, const char *path,
+				 oidset_parse_tweak_fn fn, void *cbdata)
 {
 	FILE *fp;
 	struct strbuf sb = STRBUF_INIT;
@@ -66,7 +72,8 @@ void oidset_parse_file(struct oidset *set, const char *path)
 		if (!sb.len)
 			continue;
 
-		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
+		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0' ||
+		    (fn && fn(&oid, cbdata)))
 			die("invalid object name: %s", sb.buf);
 		oidset_insert(set, &oid);
 	}
diff --git a/oidset.h b/oidset.h
index 209ae7a173..01f6560283 100644
--- a/oidset.h
+++ b/oidset.h
@@ -73,6 +73,15 @@ void oidset_clear(struct oidset *set);
  */
 void oidset_parse_file(struct oidset *set, const char *path);
 
+/*
+ * Similar to the above, but with a callback which can (1) return non-zero to
+ * signal displeasure with the object and (2) replace object ID with something
+ * else (meant to be used to "peel").
+ */
+typedef int (*oidset_parse_tweak_fn)(struct object_id *, void *);
+void oidset_parse_file_carefully(struct oidset *set, const char *path,
+				 oidset_parse_tweak_fn fn, void *cbdata);
+
 struct oidset_iter {
 	kh_oid_set_t *set;
 	khiter_t iter;
diff --git a/t/t8013-blame-ignore-revs.sh b/t/t8013-blame-ignore-revs.sh
index 67de83ae2b..24ae5018e8 100755
--- a/t/t8013-blame-ignore-revs.sh
+++ b/t/t8013-blame-ignore-revs.sh
@@ -21,6 +21,7 @@ test_expect_success setup '
 	test_tick &&
 	git commit -m X &&
 	git tag X &&
+	git tag -a -m "X (annotated)" XT &&
 
 	git blame --line-porcelain file >blame_raw &&
 
@@ -33,19 +34,35 @@ test_expect_success setup '
 	test_cmp expect actual
 '
 
-# Ignore X, make sure A is blamed for line 1 and B for line 2.
-test_expect_success ignore_rev_changing_lines '
-	git blame --line-porcelain --ignore-rev X file >blame_raw &&
-
-	grep -E "^[0-9a-f]+ [0-9]+ 1" blame_raw | sed -e "s/ .*//" >actual &&
-	git rev-parse A >expect &&
-	test_cmp expect actual &&
+# Ensure bogus --ignore-rev requests are caught
+test_expect_success 'validate --ignore-rev' '
+	test_must_fail git blame --ignore-rev X^{tree} file
+'
 
-	grep -E "^[0-9a-f]+ [0-9]+ 2" blame_raw | sed -e "s/ .*//" >actual &&
-	git rev-parse B >expect &&
-	test_cmp expect actual
+# Ensure bogus --ignore-revs-file requests are caught
+test_expect_success 'validate --ignore-revs-file' '
+	git rev-parse X^{tree} >ignore_x &&
+	test_must_fail git blame --ignore-revs-file ignore_x file
 '
 
+for I in X XT
+do
+	# Ignore X (or XT), make sure A is blamed for line 1 and B for line 2.
+	# Giving X (i.e. commit) and XT (i.e. annotated tag to commit) should
+	# produce the same result.
+	test_expect_success "ignore_rev_changing_lines ($I)" '
+		git blame --line-porcelain --ignore-rev $I file >blame_raw &&
+
+		grep -E "^[0-9a-f]+ [0-9]+ 1" blame_raw | sed -e "s/ .*//" >actual &&
+		git rev-parse A >expect &&
+		test_cmp expect actual &&
+
+		grep -E "^[0-9a-f]+ [0-9]+ 2" blame_raw | sed -e "s/ .*//" >actual &&
+		git rev-parse B >expect &&
+		test_cmp expect actual
+	'
+done
+
 # For ignored revs that have added 'unblamable' lines, attribute those to the
 # ignored commit.
 # 	A--B--X--Y
-- 
2.28.0-718-gd8d5e3da39


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] t1506: rev-parse A..B and A...B
  2020-09-25  5:59 [PATCH 0/4] Clean-up around get_x_ish() Junio C Hamano
  2020-09-25  5:59 ` [PATCH 1/4] t8013: minimum preparatory clean-up Junio C Hamano
  2020-09-25  5:59 ` [PATCH 2/4] blame: validate and peel the object names on the ignore list Junio C Hamano
@ 2020-09-25  5:59 ` Junio C Hamano
  2020-09-25  5:59 ` [PATCH 4/4] sequencer: stop abbreviating stopped-sha file Junio C Hamano
  3 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-09-25  5:59 UTC (permalink / raw)
  To: git

Because these constructs can be used to parse user input to be
passed to rev-list --objects, e.g.

	range=$(git rev-parse v1.0..v2.0) &&
	git rev-list --objects $range | git pack-objects --stdin

the endpoints (v1.0 and v2.0 in the example) are shown without
peeling them to underlying commits, even when they are annotated
tags.  Make sure it stays that way.

While at it, ensure "rev-parse A...B" also keeps the endpoints A and
B unpeeled, even though the negative side (i.e. the merge-base
between A and B) has to become a commit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1506-rev-parse-diagnosis.sh | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index dbf690b9c1..3e657e693b 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -190,6 +190,24 @@ test_expect_success 'dotdot is not an empty set' '
 	test_cmp expect actual
 '
 
+test_expect_success 'dotdot does not peel endpoints' '
+	git tag -a -m "annote" annotated HEAD &&
+	A=$(git rev-parse annotated) &&
+	H=$(git rev-parse annotated^0) &&
+	{
+		echo $A && echo ^$A
+	} >expect-with-two-dots &&
+	{
+		echo $A && echo $A && echo ^$H
+	} >expect-with-merge-base &&
+
+	git rev-parse annotated..annotated >actual-with-two-dots &&
+	test_cmp expect-with-two-dots actual-with-two-dots &&
+
+	git rev-parse annotated...annotated >actual-with-merge-base &&
+	test_cmp expect-with-merge-base actual-with-merge-base
+'
+
 test_expect_success 'arg before dashdash must be a revision (missing)' '
 	test_must_fail git rev-parse foobar -- 2>stderr &&
 	test_i18ngrep "bad revision" stderr
-- 
2.28.0-718-gd8d5e3da39


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] sequencer: stop abbreviating stopped-sha file
  2020-09-25  5:59 [PATCH 0/4] Clean-up around get_x_ish() Junio C Hamano
                   ` (2 preceding siblings ...)
  2020-09-25  5:59 ` [PATCH 3/4] t1506: rev-parse A..B and A...B Junio C Hamano
@ 2020-09-25  5:59 ` Junio C Hamano
  3 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-09-25  5:59 UTC (permalink / raw)
  To: git

The object name written to this file is not exposed to end-users and
the only reader of this file immediately expands it back to a full
object name.  Stop abbreviating while writing, and expect a full
object name while reading, which simplifies the code a bit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sequencer.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index fd7701c88a..7dc9088d09 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -120,7 +120,7 @@ static GIT_PATH_FUNC(rebase_path_author_script, "rebase-merge/author-script")
 static GIT_PATH_FUNC(rebase_path_amend, "rebase-merge/amend")
 /*
  * When we stop at a given patch via the "edit" command, this file contains
- * the abbreviated commit name of the corresponding patch.
+ * the commit object name of the corresponding patch.
  */
 static GIT_PATH_FUNC(rebase_path_stopped_sha, "rebase-merge/stopped-sha")
 /*
@@ -3012,11 +3012,12 @@ static int make_patch(struct repository *r,
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct rev_info log_tree_opt;
-	const char *subject, *p;
+	const char *subject;
+	char hex[GIT_MAX_HEXSZ + 1];
 	int res = 0;
 
-	p = short_commit_name(commit);
-	if (write_message(p, strlen(p), rebase_path_stopped_sha(), 1) < 0)
+	oid_to_hex_r(hex, &commit->object.oid);
+	if (write_message(hex, strlen(hex), rebase_path_stopped_sha(), 1) < 0)
 		return -1;
 	res |= write_rebase_head(&commit->object.oid);
 
@@ -4396,7 +4397,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 		if (read_oneliner(&buf, rebase_path_stopped_sha(),
 				  READ_ONELINER_SKIP_IF_EMPTY) &&
-		    !get_oid_committish(buf.buf, &oid))
+		    !get_oid_hex(buf.buf, &oid))
 			record_in_rewritten(&oid, peek_command(&todo_list, 0));
 		strbuf_release(&buf);
 	}
-- 
2.28.0-718-gd8d5e3da39


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-09-25  5:59 ` [PATCH 2/4] blame: validate and peel the object names on the ignore list Junio C Hamano
@ 2020-09-26 16:23   ` René Scharfe
  2020-09-26 17:06     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: René Scharfe @ 2020-09-26 16:23 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Barret Rhoden

Am 25.09.20 um 07:59 schrieb Junio C Hamano:
> The command reads list of object names to place on the ignore list
> either from the command line or from a file, but they are not
> checked with their object type (those read from the file are not
> even checked for object existence).
>
> Extend the oidset_parse_file() API and allow it to take a callback
> that can be used to die (e.g. when an inappropriate input is read)
> or modify the object name read (e.g. when a tag pointing at a commit
> is read, and the caller wants a commit object name), and use it in
> the code that handles ignore list.

What's the benefit of such a check?  Ignoring a non-existing or
type-mismatched object is really easy -- no actual effort is required to
fulfill that request.

When I request "Don't eat any glue!", perfectly human responses could be
"But I don't have any glue!" or "It doesn't even taste that good.", but
I'd expect a computer program to act I bit more logical and just don't
do it, without talking back.  Maybe that's just me.

(I had been bitten by a totally different software adding such a check,
which made it complain about my long catch-all ignore list, and I had to
craft and maintain a specific "clean" list for each deployment --
perhaps I'm still bitter about that.)

>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/blame.c              | 27 ++++++++++++++++++++++++--
>  oidset.c                     |  9 ++++++++-
>  oidset.h                     |  9 +++++++++
>  t/t8013-blame-ignore-revs.sh | 37 ++++++++++++++++++++++++++----------
>  4 files changed, 69 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 94ef57c1cc..baa5d979cc 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -27,6 +27,7 @@
>  #include "object-store.h"
>  #include "blame.h"
>  #include "refs.h"
> +#include "tag.h"
>
>  static char blame_usage[] = N_("git blame [<options>] [<rev-opts>] [<rev>] [--] <file>");
>
> @@ -803,6 +804,26 @@ static int is_a_rev(const char *name)
>  	return OBJ_NONE < oid_object_info(the_repository, &oid, NULL);
>  }
>
> +static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
> +{
> +	struct repository *r = ((struct blame_scoreboard *)cbdata)->repo;
> +	struct object_id oid;
> +
> +	oidcpy(&oid, oid_ret);
> +	while (1) {
> +		struct object *obj;
> +		int kind = oid_object_info(r, &oid, NULL);
> +		if (kind == OBJ_COMMIT) {
> +			oidcpy(oid_ret, &oid);

At that point we know it's an object, but cast it up to the most generic
class we have -- an object ID.  We could have set an object flag to mark
it ignored instead, which would be trivial to check later.  On the other
hand it probably wouldn't make much of a difference -- hashmaps are
pretty fast, and blame has lots of things to do beyond ignoring commits.

> +			return 0;
> +		}
> +		if (kind != OBJ_TAG)
> +			return -1;
> +		obj = deref_tag(r, parse_object(r, &oid), NULL, 0);
> +		oidcpy(&oid, &obj->oid);
> +	}
> +}
> +
>  static void build_ignorelist(struct blame_scoreboard *sb,
>  			     struct string_list *ignore_revs_file_list,
>  			     struct string_list *ignore_rev_list)
> @@ -815,10 +836,12 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>  		if (!strcmp(i->string, ""))
>  			oidset_clear(&sb->ignore_list);

This preexisting feature is curious.  It's even documented ('An empty
file name, "", will clear the list of revs from previously processed
files.') and covered by t8013.6.  Why would we need such magic in
addition to the standard negation (--no-ignore-revs-file) for clearing
the list?  The latter counters blame.ignoreRevsFile as well. *puzzled*

>  		else
> -			oidset_parse_file(&sb->ignore_list, i->string);
> +			oidset_parse_file_carefully(&sb->ignore_list, i->string,
> +						    peel_to_commit_oid, sb);
>  	}
>  	for_each_string_list_item(i, ignore_rev_list) {
> -		if (get_oid_committish(i->string, &oid))
> +		if (get_oid_committish(i->string, &oid) ||
> +		    peel_to_commit_oid(&oid, sb))
>  			die(_("cannot find revision %s to ignore"), i->string);
>  		oidset_insert(&sb->ignore_list, &oid);
>  	}
> diff --git a/oidset.c b/oidset.c
> index 15d4e18c37..2d0ab76fb5 100644
> --- a/oidset.c
> +++ b/oidset.c
> @@ -42,6 +42,12 @@ int oidset_size(struct oidset *set)
>  }
>
>  void oidset_parse_file(struct oidset *set, const char *path)
> +{
> +	oidset_parse_file_carefully(set, path, NULL, NULL);
> +}
> +
> +void oidset_parse_file_carefully(struct oidset *set, const char *path,
> +				 oidset_parse_tweak_fn fn, void *cbdata)
>  {
>  	FILE *fp;
>  	struct strbuf sb = STRBUF_INIT;
> @@ -66,7 +72,8 @@ void oidset_parse_file(struct oidset *set, const char *path)
>  		if (!sb.len)
>  			continue;
>
> -		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
> +		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0' ||
> +		    (fn && fn(&oid, cbdata)))

OK, so this turns the basic all-I-know-is-hashes oidset loader into a
flexible higher-order map function.  Fun, but wise?  Can't make up my
mind.

René

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-09-26 16:23   ` René Scharfe
@ 2020-09-26 17:06     ` Junio C Hamano
  2020-09-26 23:58       ` Junio C Hamano
  2020-09-28 13:26       ` Barret Rhoden
  0 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-09-26 17:06 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Barret Rhoden

René Scharfe <l.s.r@web.de> writes:

> When I request "Don't eat any glue!", perfectly human responses could be
> "But I don't have any glue!" or "It doesn't even taste that good.", but
> I'd expect a computer program to act I bit more logical and just don't
> do it, without talking back.  Maybe that's just me.
>
> (I had been bitten by a totally different software adding such a check,
> which made it complain about my long catch-all ignore list, and I had to
> craft and maintain a specific "clean" list for each deployment --
> perhaps I'm still bitter about that.)

A user who says "ignore v2.3", sees that the commit pointed at by
that release tag is not ignored, comes here to complain, and is told
to write v2.3^0 instead, would not be happy.  It is a mistake easy
to catch to help users, so I am more for than against that part of
the change.  I am completely neutral about "you told me to ignore
this, but as far as I can tell it does not even exist---did you 
screw up when you prepared the list of stuff to ignore?" part.  I do
not mind seeing it removed.

>> +		if (kind == OBJ_COMMIT) {
>> +			oidcpy(oid_ret, &oid);
>
> At that point we know it's an object, but cast it up to the most generic
> class we have -- an object ID.  We could have set an object flag to mark
> it ignored instead, which would be trivial to check later.  On the other
> hand it probably wouldn't make much of a difference -- hashmaps are
> pretty fast, and blame has lots of things to do beyond ignoring commits.

Quite honestly, I am not interested in the "blame --ignore" feature
itself.  It is good that you CC'ed Barret so that such an
improvement suggestion would be heard by the right party ;-).

>> @@ -815,10 +836,12 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>>  		if (!strcmp(i->string, ""))
>>  			oidset_clear(&sb->ignore_list);
>
> This preexisting feature is curious.  It's even documented ('An empty
> file name, "", will clear the list of revs from previously processed
> files.') and covered by t8013.6.  Why would we need such magic in
> addition to the standard negation (--no-ignore-revs-file) for clearing
> the list?  The latter counters blame.ignoreRevsFile as well. *puzzled*

I shared the puzzlement when I saw it, but ditto.

>> +void oidset_parse_file_carefully(struct oidset *set, const char *path,
>> +				 oidset_parse_tweak_fn fn, void *cbdata)
>>  {
>>  	FILE *fp;
>>  	struct strbuf sb = STRBUF_INIT;
>> @@ -66,7 +72,8 @@ void oidset_parse_file(struct oidset *set, const char *path)
>>  		if (!sb.len)
>>  			continue;
>>
>> -		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>> +		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0' ||
>> +		    (fn && fn(&oid, cbdata)))
>
> OK, so this turns the basic all-I-know-is-hashes oidset loader into a
> flexible higher-order map function.  Fun, but wise?  Can't make up my
> mind.

Fun and probably useful.  It is a different matter if it is wise to
use it to (1) peel tags to commits and (2) fail on an nonexistent
object.  My take on them is (1) is probably true, and (2) is Meh ;-)

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-09-26 17:06     ` Junio C Hamano
@ 2020-09-26 23:58       ` Junio C Hamano
  2020-09-28 13:26       ` Barret Rhoden
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-09-26 23:58 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Barret Rhoden

Junio C Hamano <gitster@pobox.com> writes:

>>> @@ -66,7 +72,8 @@ void oidset_parse_file(struct oidset *set, const char *path)
>>>  		if (!sb.len)
>>>  			continue;
>>>
>>> -		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>>> +		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0' ||
>>> +		    (fn && fn(&oid, cbdata)))
>>
>> OK, so this turns the basic all-I-know-is-hashes oidset loader into a
>> flexible higher-order map function.  Fun, but wise?  Can't make up my
>> mind.
>
> Fun and probably useful.  It is a different matter if it is wise to
> use it to (1) peel tags to commits and (2) fail on an nonexistent
> object.  My take on them is (1) is probably true, and (2) is Meh ;-)

If we choose to do (2) differently, we only need the following
one-liner patch.  I might suggest tweaking the semantics of the
callback function a bit to allow it to tell the caller (i.e.
oidset_parse_file_carefully()) that it wants to go on as usual, it
wants to omit the object from the hashtable, it replaced the given
object to something else, or it detected an error and wants to
abort, and if we were doing that, we'd be returning "do not add this
object to the table" signal, instead of 0 that signals "we are good
doing business as usual", from here.


 builtin/blame.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index baa5d979cc..8d7b66e970 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -817,8 +817,19 @@ static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
 			oidcpy(oid_ret, &oid);
 			return 0;
 		}
+
+		/*
+		 * We can ignore a request to ignore any nonexistent
+		 * objects, trees and blobs by not doing anything
+		 * special, as the blame machinery works with commits,
+		 * so entries in the hashtable from these objects will
+		 * never be looked up.  But we do allow dereferencing
+		 * an annotated tag, as silently ignoring a request to
+		 * ignore v1.0.0 because it is an annotated tag is a
+		 * bit too unfriendly to end-users.
+		 */
 		if (kind != OBJ_TAG)
-			return -1;
+			return 0;
 		obj = deref_tag(r, parse_object(r, &oid), NULL, 0);
 		oidcpy(&oid, &obj->oid);
 	}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-09-26 17:06     ` Junio C Hamano
  2020-09-26 23:58       ` Junio C Hamano
@ 2020-09-28 13:26       ` Barret Rhoden
  2020-10-11 16:03         ` René Scharfe
  1 sibling, 1 reply; 13+ messages in thread
From: Barret Rhoden @ 2020-09-28 13:26 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git

Hi -

On 9/26/20 1:06 PM, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
> 
>> When I request "Don't eat any glue!", perfectly human responses could be
>> "But I don't have any glue!" or "It doesn't even taste that good.", but
>> I'd expect a computer program to act I bit more logical and just don't
>> do it, without talking back.  Maybe that's just me.
>>
>> (I had been bitten by a totally different software adding such a check,
>> which made it complain about my long catch-all ignore list, and I had to
>> craft and maintain a specific "clean" list for each deployment --
>> perhaps I'm still bitter about that.)
> 
> A user who says "ignore v2.3", sees that the commit pointed at by
> that release tag is not ignored, comes here to complain, and is told
> to write v2.3^0 instead, would not be happy.  It is a mistake easy
> to catch to help users, so I am more for than against that part of
> the change.

That sounds like a nice change.

> I am completely neutral about "you told me to ignore
> this, but as far as I can tell it does not even exist---did you
> screw up when you prepared the list of stuff to ignore?" part.  I do
> not mind seeing it removed.

Part of my reasoning for "fail if you can't find it" was that it was 
highly likely to be a user error.  Especially because it will fail for a 
short hash from a file.  If you do have a list of commits to ignore 
(.git-blame-ignore-revs), that list is probably under version control in 
the same git repo, so it should change as you change branches.

But all in all, I'm fine with skipping unknown objects.  Or for warning 
or having a git-config option, like we do for a couple other aspects of 
blame-ignore, since one size doesn't fit all.

>>> +		if (kind == OBJ_COMMIT) {
>>> +			oidcpy(oid_ret, &oid);
>>
>> At that point we know it's an object, but cast it up to the most generic
>> class we have -- an object ID.  We could have set an object flag to mark
>> it ignored instead, which would be trivial to check later.  On the other
>> hand it probably wouldn't make much of a difference -- hashmaps are
>> pretty fast, and blame has lots of things to do beyond ignoring commits.
> 
> Quite honestly, I am not interested in the "blame --ignore" feature
> itself.  It is good that you CC'ed Barret so that such an
> improvement suggestion would be heard by the right party ;-).

Any performance improvement would be welcome.  I haven't looked at the 
code in a while, but I don't recall any reasons why this wouldn't work.

> 
>>> @@ -815,10 +836,12 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>>>   		if (!strcmp(i->string, ""))
>>>   			oidset_clear(&sb->ignore_list);
>>
>> This preexisting feature is curious.  It's even documented ('An empty
>> file name, "", will clear the list of revs from previously processed
>> files.') and covered by t8013.6.  Why would we need such magic in
>> addition to the standard negation (--no-ignore-revs-file) for clearing
>> the list?  The latter counters blame.ignoreRevsFile as well. *puzzled*
> 
> I shared the puzzlement when I saw it, but ditto.

I don't recall exactly.  Someone on the list might have wanted to both 
counter the blame.ignoreRevsFile and specify another file.  Or maybe 
they just wanted to counter the ignoreRevsFile, and I didn't know that 
--no- would already do that.  I'm certainly not wed to it.

Thanks,

Barret


>>> +void oidset_parse_file_carefully(struct oidset *set, const char *path,
>>> +				 oidset_parse_tweak_fn fn, void *cbdata)
>>>   {
>>>   	FILE *fp;
>>>   	struct strbuf sb = STRBUF_INIT;
>>> @@ -66,7 +72,8 @@ void oidset_parse_file(struct oidset *set, const char *path)
>>>   		if (!sb.len)
>>>   			continue;
>>>
>>> -		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>>> +		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0' ||
>>> +		    (fn && fn(&oid, cbdata)))
>>
>> OK, so this turns the basic all-I-know-is-hashes oidset loader into a
>> flexible higher-order map function.  Fun, but wise?  Can't make up my
>> mind.
> 
> Fun and probably useful.  It is a different matter if it is wise to
> use it to (1) peel tags to commits and (2) fail on an nonexistent
> object.  My take on them is (1) is probably true, and (2) is Meh ;-)
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-09-28 13:26       ` Barret Rhoden
@ 2020-10-11 16:03         ` René Scharfe
  2020-10-12 16:54           ` Junio C Hamano
  2020-10-12 20:39           ` Barret Rhoden
  0 siblings, 2 replies; 13+ messages in thread
From: René Scharfe @ 2020-10-11 16:03 UTC (permalink / raw)
  To: Barret Rhoden, Junio C Hamano; +Cc: git

Am 28.09.20 um 15:26 schrieb Barret Rhoden:
> On 9/26/20 1:06 PM, Junio C Hamano wrote:
>> A user who says "ignore v2.3", sees that the commit pointed at by
>> that release tag is not ignored, comes here to complain, and is
>> told to write v2.3^0 instead, would not be happy.  It is a mistake
>> easy to catch to help users, so I am more for than against that
>> part of the change.
>
> That sounds like a nice change.

I agree -- peeling down to the commit is good.

>> I am completely neutral about "you told me to ignore this, but as
>> far as I can tell it does not even exist---did you screw up when
>> you prepared the list of stuff to ignore?" part.  I do not mind
>> seeing it removed.
>
> Part of my reasoning for "fail if you can't find it" was that it was
> highly likely to be a user error.  Especially because it will fail
> for a short hash from a file.  If you do have a list of commits to
> ignore (.git-blame-ignore-revs), that list is probably under version
> control in the same git repo, so it should change as you change
> branches.
>
> But all in all, I'm fine with skipping unknown objects.  Or for
> warning or having a git-config option, like we do for a couple other
> aspects of blame-ignore, since one size doesn't fit all.

I would expect user errors to be more likely with --ignore-rev and
interactive use (command line typos).  I see how aborting if the
referenced commit doesn't exists can help, and it's consistent with
other options that require a revision name.

I don't know how most people use --ignore-revs-file, but can imagine
a big bin of boring commits that is just appended to.  Having to
keep that file clean by removing commits from abandoned and garbage-
collected branches sounds like unnecessary busy-work to me.  (What
can I say -- I'm lazy.)

> Any performance improvement would be welcome.  I haven't looked at
> the code in a while, but I don't recall any reasons why this wouldn't
> work.

Using a commit flag instead of an oidset would only improve
performance noticeably if the product of the number of suspects and
ignored commits was huge, I guess.

I get weird timings for an ignore file containing basically all commits
(created with "git log --format=%H").  With Git's own repo and rc1:

Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
  Time (mean ± σ):      8.470 s ±  0.049 s    [User: 7.923 s, System: 0.547 s]
  Range (min … max):    8.434 s …  8.605 s    10 runs

And with the patch at the bottom:

Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
  Time (mean ± σ):      8.048 s ±  0.061 s    [User: 7.899 s, System: 0.146 s]
  Range (min … max):    7.987 s …  8.175 s    10 runs

That looks like a nice speedup, but why for system time alone?  Malloc
overhead perhaps?  And when I get rid of the intermediate oidset by
partially duplicating oidset_parse_file_carefully() it takes longer than
nine seconds.  Weird.  Perhaps a silly bug.

>>> This preexisting feature is curious.  It's even documented ('An
>>> empty file name, "", will clear the list of revs from previously
>>> processed files.') and covered by t8013.6.  Why would we need
>>> such magic in addition to the standard negation
>>> (--no-ignore-revs-file) for clearing the list?  The latter
>>> counters blame.ignoreRevsFile as well. *puzzled*
>>
>> I shared the puzzlement when I saw it, but ditto.
>
> I don't recall exactly.  Someone on the list might have wanted to
> both counter the blame.ignoreRevsFile and specify another file.  Or
> maybe they just wanted to counter the ignoreRevsFile, and I didn't
> know that --no- would already do that.  I'm certainly not wed to it.

The first step would be to show a deprecation warning, wait a few
releases and then remove that feature.  Not sure the effort and
potential user irritation is worth the saved conditional, doc lines
and test.  (We already established that I'm lazy.)

Anyway, here's the patch:
---
 blame.c         |  2 +-
 blame.h         |  5 +++--
 builtin/blame.c | 16 ++++++++++++----
 object.h        |  3 ++-
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/blame.c b/blame.c
index 686845b2b4..6e8c8fec9b 100644
--- a/blame.c
+++ b/blame.c
@@ -2487,7 +2487,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,
 	/*
 	 * Pass remaining suspects for ignored commits to their parents.
 	 */
-	if (oidset_contains(&sb->ignore_list, &commit->object.oid)) {
+	if (commit->object.flags & BLAME_IGNORE) {
 		for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 		     i < num_sg && sg;
 		     sg = sg->next, i++) {
diff --git a/blame.h b/blame.h
index b6bbee4147..d35167e8bd 100644
--- a/blame.h
+++ b/blame.h
@@ -16,6 +16,9 @@
 #define BLAME_DEFAULT_MOVE_SCORE	20
 #define BLAME_DEFAULT_COPY_SCORE	40

+/* Remember to update object flag allocation in object.h */
+#define BLAME_IGNORE	(1u<<14)
+
 struct fingerprint;

 /*
@@ -125,8 +128,6 @@ struct blame_scoreboard {
 	/* linked list of blames */
 	struct blame_entry *ent;

-	struct oidset ignore_list;
-
 	/* look-up a line in the final buffer */
 	int num_lines;
 	int *lineno;
diff --git a/builtin/blame.c b/builtin/blame.c
index bb0f29300e..1c6721b5d5 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -830,21 +830,29 @@ static void build_ignorelist(struct blame_scoreboard *sb,
 {
 	struct string_list_item *i;
 	struct object_id oid;
+	const struct object_id *o;
+	struct oidset_iter iter;
+	struct oidset ignore_list = OIDSET_INIT;

-	oidset_init(&sb->ignore_list, 0);
 	for_each_string_list_item(i, ignore_revs_file_list) {
 		if (!strcmp(i->string, ""))
-			oidset_clear(&sb->ignore_list);
+			oidset_clear(&ignore_list);
 		else
-			oidset_parse_file_carefully(&sb->ignore_list, i->string,
+			oidset_parse_file_carefully(&ignore_list, i->string,
 						    peel_to_commit_oid, sb);
 	}
 	for_each_string_list_item(i, ignore_rev_list) {
 		if (get_oid_committish(i->string, &oid) ||
 		    peel_to_commit_oid(&oid, sb))
 			die(_("cannot find revision %s to ignore"), i->string);
-		oidset_insert(&sb->ignore_list, &oid);
+		oidset_insert(&ignore_list, &oid);
 	}
+	oidset_iter_init(&ignore_list, &iter);
+	while ((o = oidset_iter_next(&iter))) {
+		struct commit *commit = lookup_commit(sb->repo, o);
+		commit->object.flags |= BLAME_IGNORE;
+	}
+	oidset_clear(&ignore_list);
 }

 int cmd_blame(int argc, const char **argv, const char *prefix)
diff --git a/object.h b/object.h
index 20b18805f0..6818c9296b 100644
--- a/object.h
+++ b/object.h
@@ -64,7 +64,8 @@ struct object_array {
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
  * upload-pack.c:                4       11-----14  16-----19
- * builtin/blame.c:                        12-13
+ * blame.c:                                     14
+ * builtin/blame.c:                        12---14
  * bisect.c:                                        16
  * bundle.c:                                        16
  * http-push.c:                          11-----14
--
2.28.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-10-11 16:03         ` René Scharfe
@ 2020-10-12 16:54           ` Junio C Hamano
  2020-10-12 20:39           ` Barret Rhoden
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-10-12 16:54 UTC (permalink / raw)
  To: René Scharfe; +Cc: Barret Rhoden, git

René Scharfe <l.s.r@web.de> writes:

>>>> This preexisting feature is curious.  It's even documented ('An
>>>> empty file name, "", will clear the list of revs from previously
>>>> processed files.') and covered by t8013.6.  Why would we need
>>>> such magic in addition to the standard negation
>>>> (--no-ignore-revs-file) for clearing the list?  The latter
>>>> counters blame.ignoreRevsFile as well. *puzzled*
>>>
>>> I shared the puzzlement when I saw it, but ditto.
>>
>> I don't recall exactly.  Someone on the list might have wanted to
>> both counter the blame.ignoreRevsFile and specify another file.  Or
>> maybe they just wanted to counter the ignoreRevsFile, and I didn't
>> know that --no- would already do that.  I'm certainly not wed to it.
>
> The first step would be to show a deprecation warning, wait a few
> releases and then remove that feature.  Not sure the effort and
> potential user irritation is worth the saved conditional, doc lines
> and test.  (We already established that I'm lazy.)

I do not particularly see the need to.  Perhaps when somebody
complains the next time?

> Anyway, here's the patch:
> ---
>  blame.c         |  2 +-
>  blame.h         |  5 +++--
>  builtin/blame.c | 16 ++++++++++++----
>  object.h        |  3 ++-
>  4 files changed, 18 insertions(+), 8 deletions(-)

Looks OK to me from a quick scan.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-10-11 16:03         ` René Scharfe
  2020-10-12 16:54           ` Junio C Hamano
@ 2020-10-12 20:39           ` Barret Rhoden
  2020-10-13 20:12             ` René Scharfe
  1 sibling, 1 reply; 13+ messages in thread
From: Barret Rhoden @ 2020-10-12 20:39 UTC (permalink / raw)
  To: René Scharfe, Junio C Hamano; +Cc: git

Hi -

On 10/11/20 12:03 PM, René Scharfe wrote:
[snip]
>> Any performance improvement would be welcome.  I haven't looked at
>> the code in a while, but I don't recall any reasons why this wouldn't
>> work.
> 
> Using a commit flag instead of an oidset would only improve
> performance noticeably if the product of the number of suspects and
> ignored commits was huge, I guess.
> 
> I get weird timings for an ignore file containing basically all commits
> (created with "git log --format=%H").  With Git's own repo and rc1:
> 
> Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
>    Time (mean ± σ):      8.470 s ±  0.049 s    [User: 7.923 s, System: 0.547 s]
>    Range (min … max):    8.434 s …  8.605 s    10 runs
> 
> And with the patch at the bottom:
> 
> Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
>    Time (mean ± σ):      8.048 s ±  0.061 s    [User: 7.899 s, System: 0.146 s]
>    Range (min … max):    7.987 s …  8.175 s    10 runs
> 
> That looks like a nice speedup, but why for system time alone?  Malloc
> overhead perhaps?

Hard to say.  Maybe page faults when walking the old ignore_list?

> Anyway, here's the patch:

Looks good to me.

Barret


> ---
>   blame.c         |  2 +-
>   blame.h         |  5 +++--
>   builtin/blame.c | 16 ++++++++++++----
>   object.h        |  3 ++-
>   4 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/blame.c b/blame.c
> index 686845b2b4..6e8c8fec9b 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -2487,7 +2487,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,
>   	/*
>   	 * Pass remaining suspects for ignored commits to their parents.
>   	 */
> -	if (oidset_contains(&sb->ignore_list, &commit->object.oid)) {
> +	if (commit->object.flags & BLAME_IGNORE) {
>   		for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
>   		     i < num_sg && sg;
>   		     sg = sg->next, i++) {
> diff --git a/blame.h b/blame.h
> index b6bbee4147..d35167e8bd 100644
> --- a/blame.h
> +++ b/blame.h
> @@ -16,6 +16,9 @@
>   #define BLAME_DEFAULT_MOVE_SCORE	20
>   #define BLAME_DEFAULT_COPY_SCORE	40
> 
> +/* Remember to update object flag allocation in object.h */
> +#define BLAME_IGNORE	(1u<<14)
> +
>   struct fingerprint;
> 
>   /*
> @@ -125,8 +128,6 @@ struct blame_scoreboard {
>   	/* linked list of blames */
>   	struct blame_entry *ent;
> 
> -	struct oidset ignore_list;
> -
>   	/* look-up a line in the final buffer */
>   	int num_lines;
>   	int *lineno;
> diff --git a/builtin/blame.c b/builtin/blame.c
> index bb0f29300e..1c6721b5d5 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -830,21 +830,29 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>   {
>   	struct string_list_item *i;
>   	struct object_id oid;
> +	const struct object_id *o;
> +	struct oidset_iter iter;
> +	struct oidset ignore_list = OIDSET_INIT;
> 
> -	oidset_init(&sb->ignore_list, 0);
>   	for_each_string_list_item(i, ignore_revs_file_list) {
>   		if (!strcmp(i->string, ""))
> -			oidset_clear(&sb->ignore_list);
> +			oidset_clear(&ignore_list);
>   		else
> -			oidset_parse_file_carefully(&sb->ignore_list, i->string,
> +			oidset_parse_file_carefully(&ignore_list, i->string,
>   						    peel_to_commit_oid, sb);
>   	}
>   	for_each_string_list_item(i, ignore_rev_list) {
>   		if (get_oid_committish(i->string, &oid) ||
>   		    peel_to_commit_oid(&oid, sb))
>   			die(_("cannot find revision %s to ignore"), i->string);
> -		oidset_insert(&sb->ignore_list, &oid);
> +		oidset_insert(&ignore_list, &oid);
>   	}
> +	oidset_iter_init(&ignore_list, &iter);
> +	while ((o = oidset_iter_next(&iter))) {
> +		struct commit *commit = lookup_commit(sb->repo, o);
> +		commit->object.flags |= BLAME_IGNORE;
> +	}
> +	oidset_clear(&ignore_list);
>   }
> 
>   int cmd_blame(int argc, const char **argv, const char *prefix)
> diff --git a/object.h b/object.h
> index 20b18805f0..6818c9296b 100644
> --- a/object.h
> +++ b/object.h
> @@ -64,7 +64,8 @@ struct object_array {
>    * negotiator/default.c:       2--5
>    * walker.c:                 0-2
>    * upload-pack.c:                4       11-----14  16-----19
> - * builtin/blame.c:                        12-13
> + * blame.c:                                     14
> + * builtin/blame.c:                        12---14
>    * bisect.c:                                        16
>    * bundle.c:                                        16
>    * http-push.c:                          11-----14
> --
> 2.28.0
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
  2020-10-12 20:39           ` Barret Rhoden
@ 2020-10-13 20:12             ` René Scharfe
  0 siblings, 0 replies; 13+ messages in thread
From: René Scharfe @ 2020-10-13 20:12 UTC (permalink / raw)
  To: Barret Rhoden, Junio C Hamano; +Cc: git

Am 12.10.20 um 22:39 schrieb Barret Rhoden:
> Hi -
>
> On 10/11/20 12:03 PM, René Scharfe wrote:
> [snip]
>>> Any performance improvement would be welcome.  I haven't looked at
>>> the code in a while, but I don't recall any reasons why this wouldn't
>>> work.
>>
>> Using a commit flag instead of an oidset would only improve
>> performance noticeably if the product of the number of suspects and
>> ignored commits was huge, I guess.
>>
>> I get weird timings for an ignore file containing basically all commits
>> (created with "git log --format=%H").  With Git's own repo and rc1:
>>
>> Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
>>    Time (mean ± σ):      8.470 s ±  0.049 s    [User: 7.923 s, System: 0.547 s]
>>    Range (min … max):    8.434 s …  8.605 s    10 runs
>>
>> And with the patch at the bottom:
>>
>> Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
>>    Time (mean ± σ):      8.048 s ±  0.061 s    [User: 7.899 s, System: 0.146 s]
>>    Range (min … max):    7.987 s …  8.175 s    10 runs
>>
>> That looks like a nice speedup, but why for system time alone?  Malloc
>> overhead perhaps?
>
> Hard to say.  Maybe page faults when walking the old ignore_list?

brk(2) calls.  strace -c says that rc1 has 21657 of them and the patch
gets that down to 8132.  They dominate system time in both cases.

>
>> Anyway, here's the patch:
>
> Looks good to me.
>
> Barret
>
>
>> ---
>>   blame.c         |  2 +-
>>   blame.h         |  5 +++--
>>   builtin/blame.c | 16 ++++++++++++----
>>   object.h        |  3 ++-
>>   4 files changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/blame.c b/blame.c
>> index 686845b2b4..6e8c8fec9b 100644
>> --- a/blame.c
>> +++ b/blame.c
>> @@ -2487,7 +2487,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,
>>       /*
>>        * Pass remaining suspects for ignored commits to their parents.
>>        */
>> -    if (oidset_contains(&sb->ignore_list, &commit->object.oid)) {
>> +    if (commit->object.flags & BLAME_IGNORE) {
>>           for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
>>                i < num_sg && sg;
>>                sg = sg->next, i++) {
>> diff --git a/blame.h b/blame.h
>> index b6bbee4147..d35167e8bd 100644
>> --- a/blame.h
>> +++ b/blame.h
>> @@ -16,6 +16,9 @@
>>   #define BLAME_DEFAULT_MOVE_SCORE    20
>>   #define BLAME_DEFAULT_COPY_SCORE    40
>>
>> +/* Remember to update object flag allocation in object.h */
>> +#define BLAME_IGNORE    (1u<<14)
>> +
>>   struct fingerprint;
>>
>>   /*
>> @@ -125,8 +128,6 @@ struct blame_scoreboard {
>>       /* linked list of blames */
>>       struct blame_entry *ent;
>>
>> -    struct oidset ignore_list;
>> -
>>       /* look-up a line in the final buffer */
>>       int num_lines;
>>       int *lineno;
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index bb0f29300e..1c6721b5d5 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -830,21 +830,29 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>>   {
>>       struct string_list_item *i;
>>       struct object_id oid;
>> +    const struct object_id *o;
>> +    struct oidset_iter iter;
>> +    struct oidset ignore_list = OIDSET_INIT;
>>
>> -    oidset_init(&sb->ignore_list, 0);
>>       for_each_string_list_item(i, ignore_revs_file_list) {
>>           if (!strcmp(i->string, ""))
>> -            oidset_clear(&sb->ignore_list);
>> +            oidset_clear(&ignore_list);
>>           else
>> -            oidset_parse_file_carefully(&sb->ignore_list, i->string,
>> +            oidset_parse_file_carefully(&ignore_list, i->string,
>>                               peel_to_commit_oid, sb);
>>       }
>>       for_each_string_list_item(i, ignore_rev_list) {
>>           if (get_oid_committish(i->string, &oid) ||
>>               peel_to_commit_oid(&oid, sb))
>>               die(_("cannot find revision %s to ignore"), i->string);
>> -        oidset_insert(&sb->ignore_list, &oid);
>> +        oidset_insert(&ignore_list, &oid);
>>       }
>> +    oidset_iter_init(&ignore_list, &iter);
>> +    while ((o = oidset_iter_next(&iter))) {
>> +        struct commit *commit = lookup_commit(sb->repo, o);
>> +        commit->object.flags |= BLAME_IGNORE;
>> +    }
>> +    oidset_clear(&ignore_list);

Without this cleanup the number of brk(2) calls goes up to 24071 for
me, increasing system time beyond the one for rc1.

So it seems the improvement comes from allocating a few MB (the oidset)
and releasing it again to pre-size the heap and avoid thousands of
system calls that would otherwise extend it lazily.

The patch below on top of rc1 simulates this.  New day, new numbers,
this time with less background programs; here are the times for rc1:

Benchmark #1: ./git-blame --ignore-revs-file=hashes Makefile
  Time (mean ± σ):      8.210 s ±  0.020 s    [User: 7.647 s, System: 0.558 s]
  Range (min … max):    8.182 s …  8.258 s    10 runs

And here with the patch at the bottom:

Benchmark #1: ./git-blame --ignore-revs-file=hashes Makefile
  Time (mean ± σ):      7.879 s ±  0.023 s    [User: 7.827 s, System: 0.052 s]
  Range (min … max):    7.859 s …  7.936 s    10 runs

My conclusion: object flags won last time by cheating -- lookup speed
isn't really all that different, what matters is allocation overhead.
Extending the heap by just a few MB helps a lot.  Which is very likely
to be a platform-specific (system-specific even?) win.

So let's drop this.  But it shows that a better direction for
improving performance might be to reduce the number of allocations,
e.g. by using a mem_pool.

>>   }
>>
>>   int cmd_blame(int argc, const char **argv, const char *prefix)
>> diff --git a/object.h b/object.h
>> index 20b18805f0..6818c9296b 100644
>> --- a/object.h
>> +++ b/object.h
>> @@ -64,7 +64,8 @@ struct object_array {
>>    * negotiator/default.c:       2--5
>>    * walker.c:                 0-2
>>    * upload-pack.c:                4       11-----14  16-----19
>> - * builtin/blame.c:                        12-13
>> + * blame.c:                                     14
>> + * builtin/blame.c:                        12---14
>>    * bisect.c:                                        16
>>    * bundle.c:                                        16
>>    * http-push.c:                          11-----14
>> --
>> 2.28.0
>>
>

---
 builtin/blame.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/blame.c b/builtin/blame.c
index bb0f29300e..aa6970f452 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -845,6 +845,7 @@ static void build_ignorelist(struct blame_scoreboard *sb,
 			die(_("cannot find revision %s to ignore"), i->string);
 		oidset_insert(&sb->ignore_list, &oid);
 	}
+	free(xmalloc(10*1000*1000));
 }

 int cmd_blame(int argc, const char **argv, const char *prefix)
--
2.28.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-10-13 20:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25  5:59 [PATCH 0/4] Clean-up around get_x_ish() Junio C Hamano
2020-09-25  5:59 ` [PATCH 1/4] t8013: minimum preparatory clean-up Junio C Hamano
2020-09-25  5:59 ` [PATCH 2/4] blame: validate and peel the object names on the ignore list Junio C Hamano
2020-09-26 16:23   ` René Scharfe
2020-09-26 17:06     ` Junio C Hamano
2020-09-26 23:58       ` Junio C Hamano
2020-09-28 13:26       ` Barret Rhoden
2020-10-11 16:03         ` René Scharfe
2020-10-12 16:54           ` Junio C Hamano
2020-10-12 20:39           ` Barret Rhoden
2020-10-13 20:12             ` René Scharfe
2020-09-25  5:59 ` [PATCH 3/4] t1506: rev-parse A..B and A...B Junio C Hamano
2020-09-25  5:59 ` [PATCH 4/4] sequencer: stop abbreviating stopped-sha file Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git