From: Julien Rouhaud <rjuju123@gmail.com>
To: git@vger.kernel.org
Subject: [RFC PATCH] gitweb: improve title shortening heuristics
Date: Sun, 24 Jul 2022 14:12:31 +0800
Message-ID: <20220724061231.jddhqns7bqx5c2xm@jrouhaud>

[-- Attachment #1: Type: text/plain, Size: 1625 bytes --]

Hi,

First of all, this is my first time on this ML so apologies in advance if I
missed anything in the patch submission guidelines.

We recently got a report that the commit short title on the postgres gitweb
instance was sometimes being mangled [1].  After a bit of digging, it appears
to be due to a long-standing heuristic that removes some uninteresting parts
of a commit message (see 198066916a8 from August 2005).  In our case, it
removed "master." from the commit message even though it was part of
"postmaster.c" rather than of a cname (or something that looks like one),
leading to the commit message:

Remove postmaster.c's reset_shared() wrapper function.

being displayed as:

Remove postc's reset_shared() wrapper function.
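
For illustration, the mangling can be reproduced outside gitweb.  The sketch
below is a Python transliteration of the current substitution (the real code
is Perl, in parse_commit_text); the function name shorten_old is made up for
this example and does not exist in gitweb.

```python
import re


def shorten_old(title: str) -> str:
    """Mimic the current gitweb step (hypothetical helper, not gitweb code):
    in a title longer than 50 characters, drop the first "master.", "www."
    or "rsync." found anywhere in the string."""
    if len(title) > 50:
        # count=1 mirrors Perl's s/// without the /g flag.
        title = re.sub(r"(master|www|rsync)\.", "", title, count=1)
    return title


print(shorten_old("Remove postmaster.c's reset_shared() wrapper function."))
# prints: Remove postc's reset_shared() wrapper function.
```

The title is 54 characters long, so it passes the length check, and the
unanchored pattern then matches the "master." inside "postmaster.c".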

It's probably a corner case that draws barely any complaints, so it doesn't
look worthwhile to spend too much effort on it.  It also seems impossible to
make the current approach entirely bulletproof, but if we simply make sure that
the prefix is preceded by at least one whitespace character and isn't followed
by another one, we could avoid almost all of the incorrect matches (and all of
them as far as postgres is concerned).  Would that be an acceptable
compromise?  If yes, I'm attaching a patch that does that (and also adds git://
and https:// to the list of trimmed protocols while at it).
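
As a rough sketch of the proposed rule, here is a Python transliteration of
the stricter substitution (the attached patch itself is Perl).  The function
name shorten_new and the merge title used below are invented for this example.

```python
import re


def shorten_new(title: str) -> str:
    """Mimic the proposed gitweb step (hypothetical helper, not gitweb code):
    only strip the prefix when it follows whitespace and is itself followed
    by a non-whitespace character."""
    if len(title) > 50:
        # The leading whitespace run collapses to one space; the character
        # captured after the dot is put back unchanged.
        title = re.sub(r"\s+(master|www|rsync)\.(\S)", r" \2", title, count=1)
    return title


# The reported title is left alone: "master." is preceded by "post".
print(shorten_new("Remove postmaster.c's reset_shared() wrapper function."))

# An invented merge title still gets its host prefix trimmed.
print(shorten_new(
    "Merge ... branch 'for-linus' of master.kernel.org:/pub/scm/linux/foo"))
# prints: Merge ... branch 'for-linus' of kernel.org:/pub/scm/linux/foo
```

This is not bulletproof: a long title that mentions a file literally named
"master.c" after a space would still lose its "master." part.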

Otherwise, would it be acceptable to disable the whole block (the "remove
leading stuff of merges to make the interesting part visible") with some new
configuration option?

Cheers,
Julien.

[1] https://www.postgresql.org/message-id/flat/4025723.1658013974%40sss.pgh.pa.us

[-- Attachment #2: v1-0001-gitweb-improve-title_short-shortening-heuristics.patch --]
[-- Type: text/plain, Size: 1423 bytes --]

From ed46dcd2796b9af6ba3f73d46a3141a88964ed11 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 24 Jul 2022 13:17:19 +0800
Subject: [PATCH v1] gitweb: improve title_short shortening heuristics

In order to shorten the title, some common domain prefixes can be detected and
removed.  However, the current regex matches those prefixes anywhere in the
title, which makes it likely to remove one where it's not intended.

To make that case less likely, make sure that the prefix is preceded by at
least one whitespace and isn't followed by another whitespace.

While at it, also add git:// and https:// to the list of detected and trimmed
protocols.

Signed-off-by: Julien Rouhaud <julien.rouhaud@free.fr>
---
 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1835487ab2..18dd0b93fb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3565,10 +3565,10 @@ sub parse_commit_text {
 				$title =~ s/^Automatic //;
 				$title =~ s/^merge (of|with) /Merge ... /i;
 				if (length($title) > 50) {
-					$title =~ s/(http|rsync):\/\///;
+					$title =~ s/(git|http|https|rsync):\/\///;
 				}
 				if (length($title) > 50) {
-					$title =~ s/(master|www|rsync)\.//;
+					$title =~ s/\s+(master|www|rsync)\.([^\s])/ $2/;
 				}
 				if (length($title) > 50) {
 					$title =~ s/kernel.org:?//;
-- 
2.37.0


