git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Elijah Newren <newren@gmail.com>,
	Philippe Blain <levraiphilippeblain@gmail.com>,
	Martin Englund <martin@englund.nu>,
	git@vger.kernel.org
Subject: Re: gigantic commit messages, was Re: Git Bug Report: out of memory using git tag
Date: Wed, 02 Nov 2022 15:26:20 +0100
Message-ID: <221102.86pme52z8d.gmgdl@evledraar.gmail.com>
In-Reply-To: <Y2I0siBlVOngNUtK@coredump.intra.peff.net>


On Wed, Nov 02 2022, Jeff King wrote:

> On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
>
>> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@peff.net> wrote:
>> >
>> > Here are patches which fix them both. I may be setting a new record for
>> > the ratio of commit message lines to changed code
>> 
>> It looks like the first patch is 72 lines of commit message for a
>> one-line fix, and the second patch is 61 lines of commit message for a
>> two line fix.
>> 
>> I don't know what the record ratio is, but it's at least 96[1], so
>> clearly you'll need to figure out how to pad your first commit message
>> with at least another 25 lines before this series can be accepted.
>> ;-)
>
> Well, if we want to start digging things up... ;)
>
> Try this:
>
>   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
>   perl -0ne '
>     chomp;
>     if (s/^([0-9a-f]{40}) //) {
>       if (defined $commit && $diff) {
>         my $ratio = $body / $diff;
>         print "$ratio $body $diff $commit\n";
>       }
>       $commit = $1;
>       $body = () = /\n/g;
>       $diff = 0;
>     } elsif (/^\s*(\d+)\t/) {
>       # this counts only added lines, under the assumption that
>       # small commits generally remove/add in proportion. Of course
>       # ones that _only_ remove lines have infinite ratios.
>       $diff += $1;
>     } else {
>       die "confusing record: $_\n";
>     }
>   ' |
>   sort -rn |
>   head
>
> which shows there are a few in the 100's. Pipe through:
>
>   awk '{print $4}' |
>   git log --stdin --no-walk=unsorted --stat
>
> for a nicer view. I'm rejecting the top one on the grounds that it's
> mostly cut-and-paste output, and also that #2 is mine. ;)

I think that '*.c' is cheating. If anything, I should get more points
when you remove it, as I've been over-explaining the addition or removal
of a compiler flag or something. At least your #2 is tricky C code :)

I haven't bothered to do this, but I think if you use --word-diff
--word-diff-regex=. and parse the resulting diff you'd get "better"
results.

Or, for a better and similar (but not identical) measure: compute the
Levenshtein distance between the pre- and post-image, and take the ratio
of commit message length to that edit distance.

I haven't done that, but just from eyeballing it I think [1] beats your
[2] by that criterion. Per:
	
	$ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
	6
	$ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
	3

With half the edit distance (3 vs. 6), [1] gets 2x your score per unit
of commit message, and your message has <2x the words/characters of
[1]'s, so [1] still comes out ahead.
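
For anyone who wants to play with that scoring without installing
Text::Levenshtein, here is a rough sketch in Python. The function names
(levenshtein, score) and the choice of "message length divided by edit
distance" as the score are my own illustration of the idea above, not
something that was run against git.git:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def score(message_len: int, pre: str, post: str) -> float:
    """Hypothetical score: commit message length per unit of edit distance."""
    dist = levenshtein(pre, post)
    return message_len / dist if dist else float("inf")


print(levenshtein("int", "unsigned"))  # 6
print(levenshtein("", "_lf"))          # 3
```

With equal message lengths, score(n, "", "_lf") is twice
score(n, "int", "unsigned"), which is the 2x mentioned above.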

(Edit: But see [4] below)

There's also e.g. my [3], which is fairly high in the running per your
"only added lines" rule. But I think it shows the perils of doing that.
In general I don't see why you'd omit deletions: that commit message
certainly spends most of its time explaining why deleting the code at
hand is OK.

Once you count deletions it'll drop *way* down the list, as it's 11
deleted lines, 1 added.
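
To make the "count deletions too" variant concrete, here is a small
Python sketch that sums both numstat columns. It assumes plain
newline-separated "git log --numstat" output (not the -z form used in
the script above); the function name changed_lines and the path
some/file.c are made up for illustration:

```python
def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from 'added<TAB>deleted<TAB>path' records."""
    total = 0
    for line in numstat.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a numstat record
        added, deleted = fields[0], fields[1]
        # Binary files are reported as "-<TAB>-<TAB>path"; skip those.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


# A commit with 1 added and 11 deleted lines counts as 12, not 1.
print(changed_lines("1\t11\tsome/file.c\n"))  # 12
```

Dividing the commit message length by 12 instead of 1 is what pushes
such a commit down the list.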

Hrm, I take some of the above back; I think [4] might be the winner.
That's an edit distance of just 1, so its commit message is effectively
around 2x the length of yours once we adjust for your edit distance of 6
(~2.5x by characters)[5].

1. 356c4732950 (credential: treat CR/LF as line endings in the
   credential protocol, 2020-10-03)
2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
   2020-08-04)
3. f97fe358576 (pickaxe -G: don't special-case create/delete,
   2021-04-12)
4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
5. All measured with "git show --no-notes --no-patch <commit> | wc",
   because I was lazy.


Thread overview: 15+ messages
2022-10-28 22:29 Git Bug Report: out of memory using git tag Martin Englund
2022-11-01 12:22 ` Jeff King
2022-11-02  0:41   ` Philippe Blain
2022-11-02  7:39     ` Jeff King
2022-11-02  7:42       ` [PATCH 1/2] ref-filter: fix parsing of signatures without blank lines Jeff King
2022-11-02  7:44       ` [PATCH 2/2] ref-filter: fix parsing of signatures with CRLF and no body Jeff King
2022-11-02  8:14       ` Git Bug Report: out of memory using git tag Elijah Newren
2022-11-02  9:13         ` gigantic commit messages, was " Jeff King
2022-11-02 14:26           ` Ævar Arnfjörð Bjarmason [this message]
2022-11-02 15:43             ` Elijah Newren
2022-11-02  8:24       ` Eric Sunshine
2022-11-02 12:13       ` Philippe Blain
2022-11-03  4:32         ` Jeff King
2022-11-03  0:42       ` Taylor Blau
2022-11-02  0:42   ` Philippe Blain
