git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Josh Triplett <josh@joshtriplett.org>
Cc: git@vger.kernel.org
Subject: Cross-referencing the Git mailing list archive with their corresponding commits in `pu`
Date: Mon, 6 Feb 2017 16:34:28 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.20.1702041206130.3496@virtualbox> (raw)

Hi Josh,

as discussed at the GitMerge, I am trying to come up with tooling that
will allow for substantially less tedious navigation between the local
repository, the mailing list, and what ends up in the `pu` branch.

That tooling would *still* not help lowering the barrier of entry for
contributing to Git by a lot, as it would *still* not address the problem
that mails sent from the most prevalent desktop mail client, as well as
mails sent from the most prevalent web mail client, are simply and
unceremoniously dropped. (This problem was acknowledged by quite a few
nods even at the Contributors' Summit...) But still, we decided to start
*somewhere* and this tooling is what we agreed on.

It is quite a bit harder going than I would like: as we have figured out,
the Subject: line is not a good way to link the commits with the original
mails containing the patches, as commit messages are modified before being
pushed often enough to make this a fragile matching.

So I thought maybe the From: line (from the body, if available, otherwise
from the header) in conjunction with the "Date:" header would work. But a
preliminary study shows that there are 336 From: + Date: combinations in
the Git mailing list archive that are not unique. 71 of these are shared
by three or more mails, even, and 9 are shared by more than 10 mails,
respectively. This is bad!

Unsurprisingly, the top 10 of these cases were obviously caused by the
builtin `git am` bug where it would not reset the author date properly.
Surprisingly, though, there were a few cases from 2005, too.

I had a quick look to find out what was the culprit (looking at the
17-strong patch series "Documentation fixes in response to my previous
listing" by Nikolai Weibull, but I am at a loss there: the mail claims to
be sent by git-send-email and the patches appear to be generated by
git-format-patch as of v0.99.9l, neither of which had a Date:-related bug
back in that time frame. My best guess is that the patches were mishandled
by a tool similar to rebase -i (which entered Git only at v1.5.3).

For details, see:
http://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/
(this is also an example where public-inbox' thread detection went utterly
wrong, including way too many mails in the "thread")

There was even a case of duplicated Date: headers in 2012. Now, this case
is very curious, as there have been 7 mails with identical Date: header,
but it was not a 6-strong patch series. Instead, it was a 4-strong patch
series that needed three iterations before it was accepted, and the
identical Date: header appears only in v2's patches (*not* in its cover
letter) and it *disappeared* in v3's 4/4, where it was set *back* by a
week (to the Date: it had in v1).

For details, see
http://public-inbox.org/git/cover.1354693001.git.Sebastian.Leske@sleske.name/
and
http://public-inbox.org/git/cover.1354324110.git.Sebastian.Leske@sleske.name/
and
http://public-inbox.org/git/b115a546fa783b4121d118bb8fdb9270443f90fa.1353691892.git.Sebastian.Leske@sleske.name/

This last example also demonstrates a very curious test case for a
different difficulty in trying to reconstruct lost correspondences: the
patch series was applied *twice*, independently of each other. First, on
the day v3 was submitted, it was applied on top of v1.8.1-rc0 (as commits
ee26a6e2b8..dd465ce66f), although it was not merged until v1.8.1-rc3. 22
days later, it was reapplied on top of maint so it could enter v1.8.0.3
(back then, Git still had "patchlevel" versions): c2999adcd5..008c208c2c.

As you can see, there is a many-to-many relationship here, even if you do
leave the *original* branch out of the picture entirely.

Will keep you posted,
Dscho

P.S.: I used public-inbox.org links instead of commit references to the
Git repository containing the mailing list archive, because the format of
said Git repository is so unfavorable that it was determined very quickly
in a discussion between Patrick Reynolds (GitHub) and myself that it would
put totally undue burden on GitHub to mirror it there (compare also Carlos
Nieto's talk at GitMerge titled "Top Ten Worst Repositories to host on
GitHub").

             reply	other threads:[~2017-02-06 15:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-06 15:34 Johannes Schindelin [this message]
2017-02-06 19:10 ` Cross-referencing the Git mailing list archive with their corresponding commits in `pu` Junio C Hamano
2017-02-09 14:11   ` Lars Schneider
2017-02-09 21:53     ` Johannes Schindelin
2017-02-09 22:18       ` Junio C Hamano
2017-02-06 20:48 ` Eric Wong
2017-02-06 22:07   ` Jeff King
2017-02-07  0:14     ` Eric Wong
2017-02-17 17:50 ` Johannes Schindelin
2017-02-20 19:33   ` Junio C Hamano
2017-02-20 20:06     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1702041206130.3496@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=josh@joshtriplett.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).