git@vger.kernel.org mailing list mirror (one of many)
From: "René Scharfe" <l.s.r@web.de>
To: Barret Rhoden <brho@google.com>, Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list
Date: Tue, 13 Oct 2020 22:12:37 +0200
Message-ID: <ea1f2a1e-c525-c735-bdf7-65d44771cb3f@web.de>
In-Reply-To: <cd2c51da-55c6-cc5e-2da1-69db90aaf438@google.com>

On 12.10.20 at 22:39, Barret Rhoden wrote:
> Hi -
>
> On 10/11/20 12:03 PM, René Scharfe wrote:
> [snip]
>>> Any performance improvement would be welcome.  I haven't looked at
>>> the code in a while, but I don't recall any reasons why this wouldn't
>>> work.
>>
>> Using a commit flag instead of an oidset would only improve
>> performance noticeably if the product of the number of suspects and
>> ignored commits was huge, I guess.
>>
>> I get weird timings for an ignore file containing basically all commits
>> (created with "git log --format=%H").  With Git's own repo and rc1:
>>
>> Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
>>    Time (mean ± σ):      8.470 s ±  0.049 s    [User: 7.923 s, System: 0.547 s]
>>    Range (min … max):    8.434 s …  8.605 s    10 runs
>>
>> And with the patch at the bottom:
>>
>> Benchmark #1: ./git-blame --ignore-revs-file hashes Makefile
>>    Time (mean ± σ):      8.048 s ±  0.061 s    [User: 7.899 s, System: 0.146 s]
>>    Range (min … max):    7.987 s …  8.175 s    10 runs
>>
>> That looks like a nice speedup, but why for system time alone?  Malloc
>> overhead perhaps?
>
> Hard to say.  Maybe page faults when walking the old ignore_list?

brk(2) calls.  strace -c says that rc1 has 21657 of them and the patch
gets that down to 8132.  They dominate system time in both cases.

>
>> Anyway, here's the patch:
>
> Looks good to me.
>
> Barret
>
>
>> ---
>>   blame.c         |  2 +-
>>   blame.h         |  5 +++--
>>   builtin/blame.c | 16 ++++++++++++----
>>   object.h        |  3 ++-
>>   4 files changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/blame.c b/blame.c
>> index 686845b2b4..6e8c8fec9b 100644
>> --- a/blame.c
>> +++ b/blame.c
>> @@ -2487,7 +2487,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,
>>       /*
>>        * Pass remaining suspects for ignored commits to their parents.
>>        */
>> -    if (oidset_contains(&sb->ignore_list, &commit->object.oid)) {
>> +    if (commit->object.flags & BLAME_IGNORE) {
>>           for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
>>                i < num_sg && sg;
>>                sg = sg->next, i++) {
>> diff --git a/blame.h b/blame.h
>> index b6bbee4147..d35167e8bd 100644
>> --- a/blame.h
>> +++ b/blame.h
>> @@ -16,6 +16,9 @@
>>   #define BLAME_DEFAULT_MOVE_SCORE    20
>>   #define BLAME_DEFAULT_COPY_SCORE    40
>>
>> +/* Remember to update object flag allocation in object.h */
>> +#define BLAME_IGNORE    (1u<<14)
>> +
>>   struct fingerprint;
>>
>>   /*
>> @@ -125,8 +128,6 @@ struct blame_scoreboard {
>>       /* linked list of blames */
>>       struct blame_entry *ent;
>>
>> -    struct oidset ignore_list;
>> -
>>       /* look-up a line in the final buffer */
>>       int num_lines;
>>       int *lineno;
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index bb0f29300e..1c6721b5d5 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -830,21 +830,29 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>>   {
>>       struct string_list_item *i;
>>       struct object_id oid;
>> +    const struct object_id *o;
>> +    struct oidset_iter iter;
>> +    struct oidset ignore_list = OIDSET_INIT;
>>
>> -    oidset_init(&sb->ignore_list, 0);
>>       for_each_string_list_item(i, ignore_revs_file_list) {
>>           if (!strcmp(i->string, ""))
>> -            oidset_clear(&sb->ignore_list);
>> +            oidset_clear(&ignore_list);
>>           else
>> -            oidset_parse_file_carefully(&sb->ignore_list, i->string,
>> +            oidset_parse_file_carefully(&ignore_list, i->string,
>>                               peel_to_commit_oid, sb);
>>       }
>>       for_each_string_list_item(i, ignore_rev_list) {
>>           if (get_oid_committish(i->string, &oid) ||
>>               peel_to_commit_oid(&oid, sb))
>>               die(_("cannot find revision %s to ignore"), i->string);
>> -        oidset_insert(&sb->ignore_list, &oid);
>> +        oidset_insert(&ignore_list, &oid);
>>       }
>> +    oidset_iter_init(&ignore_list, &iter);
>> +    while ((o = oidset_iter_next(&iter))) {
>> +        struct commit *commit = lookup_commit(sb->repo, o);
>> +        commit->object.flags |= BLAME_IGNORE;
>> +    }
>> +    oidset_clear(&ignore_list);

Without this cleanup the number of brk(2) calls goes up to 24071 for
me, pushing system time above that of rc1.

So it seems the improvement comes from allocating a few MB (the oidset)
and releasing it again to pre-size the heap and avoid thousands of
system calls that would otherwise extend it lazily.

The patch below on top of rc1 simulates this.  New day, new numbers,
this time with fewer background programs running; here are the times
for rc1:

Benchmark #1: ./git-blame --ignore-revs-file=hashes Makefile
  Time (mean ± σ):      8.210 s ±  0.020 s    [User: 7.647 s, System: 0.558 s]
  Range (min … max):    8.182 s …  8.258 s    10 runs

And here with the patch at the bottom:

Benchmark #1: ./git-blame --ignore-revs-file=hashes Makefile
  Time (mean ± σ):      7.879 s ±  0.023 s    [User: 7.827 s, System: 0.052 s]
  Range (min … max):    7.859 s …  7.936 s    10 runs

My conclusion: object flags won last time by cheating.  Lookup speed
isn't really all that different; what matters is allocation overhead.
Extending the heap by just a few MB helps a lot, but that is very
likely a platform-specific (system-specific even?) win.

So let's drop the object flag patch.  But it shows that a better
direction for improving performance might be to reduce the number of
allocations, e.g. by using a mem_pool.
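
To illustrate the idea of reducing the number of allocations, here is
a minimal sketch of a bump allocator that carves many small objects
out of a few large chunks.  All names (struct pool, pool_alloc(),
pool_free_all(), POOL_CHUNK_SIZE) are made up for this example; it is
not Git's mem_pool API from mem-pool.h and not something I measured
with blame.  It only shows why a pool needs far fewer calls into
malloc(3), and thus probably fewer heap extensions, than one malloc(3)
per object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chunk size; a real pool would tune or grow this. */
#define POOL_CHUNK_SIZE (1024 * 1024)

struct pool_chunk {
	struct pool_chunk *next;
	size_t used;		/* bytes consumed, including the header */
};

struct pool {
	struct pool_chunk *head;
	size_t nr_chunks;	/* number of malloc() calls made so far */
};

/* Round up so that every returned pointer is suitably aligned. */
static size_t pool_align(size_t n)
{
	size_t a = _Alignof(max_align_t);
	return (n + a - 1) / a * a;
}

/* Hand out "size" bytes; call malloc() only when the chunk is full. */
static void *pool_alloc(struct pool *p, size_t size)
{
	size_t hdr = pool_align(sizeof(struct pool_chunk));
	struct pool_chunk *c = p->head;
	void *ret;

	size = pool_align(size);
	/* Sketch only: oversized requests are not supported. */
	assert(size <= POOL_CHUNK_SIZE - hdr);

	if (!c || c->used + size > POOL_CHUNK_SIZE) {
		c = malloc(POOL_CHUNK_SIZE);
		if (!c)
			return NULL;
		c->next = p->head;
		c->used = hdr;
		p->head = c;
		p->nr_chunks++;
	}
	ret = (char *)c + c->used;
	c->used += size;
	return ret;
}

/* Release everything at once; individual objects are never freed. */
static void pool_free_all(struct pool *p)
{
	while (p->head) {
		struct pool_chunk *next = p->head->next;
		free(p->head);
		p->head = next;
	}
	p->nr_chunks = 0;
}
```

With a zero-initialized struct pool, a hundred thousand 40-byte
requests end up in a handful of 1 MB chunks instead of a hundred
thousand separate allocations.  The price is that objects can only be
released all together, which fits data that lives until the end of
the command anyway.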

>>   }
>>
>>   int cmd_blame(int argc, const char **argv, const char *prefix)
>> diff --git a/object.h b/object.h
>> index 20b18805f0..6818c9296b 100644
>> --- a/object.h
>> +++ b/object.h
>> @@ -64,7 +64,8 @@ struct object_array {
>>    * negotiator/default.c:       2--5
>>    * walker.c:                 0-2
>>    * upload-pack.c:                4       11-----14  16-----19
>> - * builtin/blame.c:                        12-13
>> + * blame.c:                                     14
>> + * builtin/blame.c:                        12---14
>>    * bisect.c:                                        16
>>    * bundle.c:                                        16
>>    * http-push.c:                          11-----14
>> --
>> 2.28.0
>>
>

---
 builtin/blame.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/blame.c b/builtin/blame.c
index bb0f29300e..aa6970f452 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -845,6 +845,7 @@ static void build_ignorelist(struct blame_scoreboard *sb,
 			die(_("cannot find revision %s to ignore"), i->string);
 		oidset_insert(&sb->ignore_list, &oid);
 	}
+	free(xmalloc(10*1000*1000));
 }

 int cmd_blame(int argc, const char **argv, const char *prefix)
--
2.28.0
