From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Jeff King <peff@peff.net>,
Philippe Blain <levraiphilippeblain@gmail.com>,
Martin Englund <martin@englund.nu>,
git@vger.kernel.org
Subject: Re: gigantic commit messages, was Re: Git Bug Report: out of memory using git tag
Date: Wed, 2 Nov 2022 08:43:52 -0700
Message-ID: <CABPp-BHwWYij2zqTSnsuu1ib97M4kJhfjMeEFqV13nttdqT1yw@mail.gmail.com>
In-Reply-To: <221102.86pme52z8d.gmgdl@evledraar.gmail.com>

On Wed, Nov 2, 2022 at 7:43 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Wed, Nov 02 2022, Jeff King wrote:
>
> > On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
> >
> >> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@peff.net> wrote:
> >> >
> >> > Here are patches which fix them both. I may be setting a new record for
> >> > the ratio of commit message lines to changed code
> >>
> >> It looks like the first patch is 72 lines of commit message for a
> >> one-line fix, and the second patch is 61 lines of commit message for a
> >> two line fix.
> >>
> >> I don't know what the record ratio is, but it's at least 96[1], so
> >> clearly you'll need to figure out how to pad your first commit message
> >> with at least another 25 lines before this series can be accepted.
> >> ;-)
> >
> > Well, if we want to start digging things up... ;)
> >
> > Try this:
> >
> >   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
> >   perl -0ne '
> >     chomp;
> >     if (s/^([0-9a-f]{40}) //) {
> >       if (defined $commit && $diff) {
> >         my $ratio = $body / $diff;
> >         print "$ratio $body $diff $commit\n";
> >       }
> >       $commit = $1;
> >       $body = () = /\n/g;
> >       $diff = 0;
> >     } elsif (/^\s*(\d+)\t/) {
> >       # this counts only added lines, under the assumption that
> >       # small commits generally remove/add in proportion. Of course
> >       # ones that _only_ remove lines have infinite ratios.
> >       $diff += $1;
> >     } else {
> >       die "confusing record: $_\n";
> >     }
> >   ' |
> >   sort -rn |
> >   head
> >
> > which shows there are a few in the 100's. Pipe through:
> >
> >   awk '{print $4}' |
> >   git log --stdin --no-walk=unsorted --stat
> >
> > for a nicer view. I'm rejecting the top one on the grounds that it's
> > mostly cut-and-paste output, and also that #2 is mine. ;)
>
> I think that '*.c' is cheating; if anything, I should be getting more
> points when you remove that, as I've been over-explaining
> adding/removing a compiler flag or something. At least your #2 is tricky
> C code :)
>
> I haven't bothered to do this, but I think if you --word-diff
> --word-diff-regex=. and parse the resulting diff you'd get "better"
> results.
>
> Or, for better & similar (but not the same): compute the Levenshtein
> distance between the pre- and post-image, and compare that edit distance
> to the commit message length.
>
> I haven't done that, but just from eyeballing it I think [1] beats your
> [2] by that criterion. Per:
>
>   $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
>   6
>   $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
>   3
>
> It should get 2x the score vs. yours, but yours is <2x the
> words/characters.
>
> (Edit: But see [4] below)
>
> There's also e.g. my [3] that's fairly high in the running per your
> "only added lines". But I think it shows the perils of doing that,
> i.e. in general I don't see why you'd omit deletions, that commit
> message is certainly spending most of its time talking about why the
> deletion of the code at hand is OK.
>
> Once you count deletions it'll get *way* down the list, as it's 11
> deleted lines, 1 added.
>
> Hrm, I take some of the above back, I think [4] might be the winner.
> That's just an edit distance of 1, so it's around 2x the commit message
> length of yours if we adjust for your score of 6. (~2.5 by
> characters)[5].
>
> 1. 356c4732950 (credential: treat CR/LF as line endings in the
> credential protocol, 2020-10-03)
> 2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
> 2020-08-04)
> 3. f97fe358576 (pickaxe -G: don't special-case create/delete,
> 2021-04-12)
> 4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
> 5. All measured with "git show --no-notes --no-patch <commit> | wc",
> because I was lazy.
Hehe, my offhand joke started a contest over the whimsical question of
who's the most long-winded. I think my work here is done. :-)
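
For reference, the edit-distance scoring suggested in the quoted text could
be sketched roughly as follows. This is a minimal Python sketch under stated
assumptions, not something posted in the thread: levenshtein() is a plain
dynamic-programming implementation rather than the Text::Levenshtein module,
and verbosity_score() (with its parameter names) is a hypothetical helper
meaning "commit message characters per unit of edit distance".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,           # delete ca
                           cur[j - 1] + 1,        # insert cb
                           prev[j - 1] + cost))   # match or substitute
        prev = cur
    return prev[-1]


def verbosity_score(message: str, pre_image: str, post_image: str) -> float:
    """Hypothetical metric: message length divided by the size of the change."""
    distance = levenshtein(pre_image, post_image)
    if distance == 0:
        return float("inf")  # no change at all, so the ratio is unbounded
    return len(message) / distance


if __name__ == "__main__":
    print(levenshtein("int", "unsigned"))  # 6, same as the quoted perl output
    print(levenshtein("", "_lf"))          # 3, same as the quoted perl output
```

Because this version divides by the full edit distance, it counts deleted
text as well as added text, which is the objection raised above to scoring
by added lines only.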