From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Jeff King <peff@peff.net>,
	Philippe Blain <levraiphilippeblain@gmail.com>,
	Martin Englund <martin@englund.nu>,
	git@vger.kernel.org
Subject: Re: gigantic commit messages, was Re: Git Bug Report: out of memory using git tag
Date: Wed, 2 Nov 2022 08:43:52 -0700
Message-ID: <CABPp-BHwWYij2zqTSnsuu1ib97M4kJhfjMeEFqV13nttdqT1yw@mail.gmail.com>
In-Reply-To: <221102.86pme52z8d.gmgdl@evledraar.gmail.com>

On Wed, Nov 2, 2022 at 7:43 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Wed, Nov 02 2022, Jeff King wrote:
>
> > On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
> >
> >> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@peff.net> wrote:
> >> >
> >> > Here are patches which fix them both. I may be setting a new record for
> >> > the ratio of commit message lines to changed code
> >>
> >> It looks like the first patch is 72 lines of commit message for a
> >> one-line fix, and the second patch is 61 lines of commit message for a
> >> two line fix.
> >>
> >> I don't know what the record ratio is, but it's at least 96[1], so
> >> clearly you'll need to figure out how to pad your first commit message
> >> with at least another 25 lines before this series can be accepted.
> >> ;-)
> >
> > Well, if we want to start digging things up... ;)
> >
> > Try this:
> >
> >   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
> >   perl -0ne '
> >     chomp;
> >     if (s/^([0-9a-f]{40}) //) {
> >       if (defined $commit && $diff) {
> >         my $ratio = $body / $diff;
> >         print "$ratio $body $diff $commit\n";
> >       }
> >       $commit = $1;
> >       $body = () = /\n/g;
> >       $diff = 0;
> >     } elsif (/^\s*(\d+)\t/) {
> >       # this counts only added lines, under the assumption that
> >       # small commits generally remove/add in proportion. Of course
> >       # ones that _only_ remove lines have infinite ratios.
> >       $diff += $1;
> >     } else {
> >       die "confusing record: $_\n";
> >     }
> >   ' |
> >   sort -rn |
> >   head
> >
> > which shows there are a few in the 100's. Pipe through:
> >
> >   awk '{print $4}' |
> >   git log --stdin --no-walk=unsorted --stat
> >
> > for a nicer view. I'm rejecting the top one on the grounds that it's
> > mostly cut-and-paste output, and also that #2 is mine. ;)
>
> I think that '*.c' is cheating; if anything, I should be getting more
> points when you remove that, as I've been over-explaining
> adding/removing a compiler flag or something. At least your #2 is tricky
> C code :)
>
> I haven't bothered to do this, but I think if you --word-diff
> --word-diff-regex=. and parse the resulting diff you'd get "better"
> results.
>
> Or, for something better and similar (but not the same): compute the
> Levenshtein distance between the pre- and post-image, and compare that
> edit distance to the commit message length.
>
> I haven't done that, but just from eyeballing it I think [1] beats your
> [2] by that criterion. Per:
>
>         $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
>         6
>         $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
>         3
>
> With half the edit distance, [1] should get 2x the score vs. yours,
> but your message is <2x the words/characters.
>
> (Edit: But see [4] below)
>
> There's also e.g. my [3] that's fairly high in the running per your
> "only added lines". But I think it shows the perils of doing that,
> i.e. in general I don't see why you'd omit deletions, that commit
> message is certainly spending most of its time talking about why the
> deletion of the code at hand is OK.
>
> Once you count deletions it'll get *way* down the list, as it's 11
> deleted lines, 1 added.
>
> Hrm, I take some of the above back; I think [4] might be the winner.
> That's just an edit distance of 1, so it's around 2x the commit message
> length of yours if we adjust for your score of 6. (~2.5 by
> characters)[5].
>
> 1. 356c4732950 (credential: treat CR/LF as line endings in the
>    credential protocol, 2020-10-03)
> 2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
>    2020-08-04)
> 3. f97fe358576 (pickaxe -G: don't special-case create/delete,
>    2021-04-12)
> 4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
> 5. All measured with "git show --no-notes --no-patch <commit> | wc",
>    because I was lazy.

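For anyone who wants to try the edit-distance idea quoted above without
installing Text::Levenshtein, here is a minimal sketch in Python. The
function names (levenshtein, message_score) and the 30-line message
length are hypothetical, made up for illustration; nobody in this thread
proposed this exact scoring, it is only one way to read "message length
per unit of edit distance".

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`.
    """
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def message_score(message_lines: int, pre: str, post: str) -> float:
    """Hypothetical score: commit message lines per unit of edit distance."""
    dist = levenshtein(pre, post)
    if dist == 0:
        raise ValueError("pre- and post-image are identical")
    return message_lines / dist


if __name__ == "__main__":
    # Same inputs as the two perl one-liners quoted above.
    print(levenshtein("int", "unsigned"))
    print(levenshtein("", "_lf"))
    # 30 is an invented message length, not taken from any real commit.
    print(message_score(30, "int", "unsigned"))
```

The two distances agree with the quoted Text::Levenshtein output (6 and
3), so for equally long messages the "_lf" change would score twice as
high under this made-up metric.
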
Hehe, my offhand joke started a contest over the whimsical question of
who's the most long-winded.  I think my work here is done.  :-)
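
On the "why omit deletions" point quoted above, a rough sketch of how
much the ranking input moves once deleted lines are counted. It parses
`git log --numstat` style lines (added<TAB>deleted<TAB>path). The helper
names, the path example.c, and the 48-line message length are all
hypothetical; only the "1 added, 11 deleted" shape comes from the quoted
mail.

```python
from typing import Optional


def changed_lines(numstat: str, count_deletions: bool = True) -> int:
    """Sum changed lines from numstat text (added<TAB>deleted<TAB>path)."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            # git prints "-" for binary files; skip them here.
            continue
        total += int(added)
        if count_deletions:
            total += int(deleted)
    return total


def message_ratio(message_lines: int, numstat: str,
                  count_deletions: bool = True) -> Optional[float]:
    """Message lines per changed line, or None if nothing was counted."""
    changed = changed_lines(numstat, count_deletions)
    return message_lines / changed if changed else None


if __name__ == "__main__":
    # Hypothetical commit: 1 line added, 11 deleted, 48-line message.
    sample = "1\t11\texample.c\n"
    print(message_ratio(48, sample, count_deletions=False))  # added only
    print(message_ratio(48, sample, count_deletions=True))   # added + deleted
```

With these invented numbers the ratio drops from 48.0 to 4.0 once
deletions count, which is the "way down the list" effect described in
the quoted text.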