git@vger.kernel.org mailing list mirror (one of many)
* commit -> public-inbox link helper
@ 2018-04-04 16:35 Johannes Schindelin
  2018-04-04 18:36 ` Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-04-04 16:35 UTC
  To: git

Hi team,

I found myself in dire need to quickly look up mails in the public-inbox
mail archive corresponding to any given commit in git.git. Some time ago,
I wrote a shell script to help me with that, and I found myself using it a
couple of times, so I think it might be useful for others, too.

This script (I call it lookup-commit.sh) needs to be dropped into a *bare*
clone of http://public-inbox.org/git, and called with its absolute or
relative path from a git.git worktree, e.g.

	~/public-inbox-git.git/lookup-commit.sh \
		fea16b47b603e7e4fa7fca198bd49229c0e5da3d

This will take a while initially, or when the `master` branch of the
public-inbox mirror was updated, as it will generate two files with
plain-text mappings.

In its default mode, it will print the Message-ID of the mail (if found).

With --open, it opens the mail in a browser (macOS support is missing,
mainly because I cannot test: just add an `open` alternative to
`xdg-open`).

With --reply, it puts the mail into the `from-public-inbox` folder of a
maildir-formatted ~/Mail/, so it is pretty specific to my setup here.

Note: the way mails are matched is by timestamp. In practice, this works
amazingly often (although not always; I reported my findings shortly after
GitMerge 2017). My plan was to work on this when/as needed.
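
A minimal sketch of that timestamp match, assuming a map file in the
format the script writes (`<timestamp> <hash> <author> <email>`). The name
`lookup_timestamp` and its return codes are illustrative, not part of the
script:

```shell
#!/bin/sh
# lookup_timestamp MAPFILE TIMESTAMP
# Prints the abbreviated hash when exactly one record matches.
# Returns 1 when no record matches and 2 when the match is ambiguous.
lookup_timestamp () {
	# The trailing space keeps "100" from matching "1000 ..."
	matches=$(grep "^$2 " "$1") || return 1
	count=$(printf '%s\n' "$matches" | wc -l)
	# $count is left unquoted because some wc implementations pad it
	test $count -eq 1 || return 2
	printf '%s\n' "$matches" | cut -d ' ' -f 2
}
```

Keeping "no match" and "several matches" as separate return codes lets a
caller decide whether to give up or to fall back to comparing author and
subject.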

Note also: the script is very much in a 'works-for-me' state, and I could
imagine that others might want to modify it to their needs. I would be
delighted if somebody would take custody of it (as in: start a repository
on GitHub, adding a README.md, adding a config setting to point to the
location of the public-inbox mirror without having to copy the script,
adding an option to install an alias to run the script, etc).
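
One way the config setting could look, as a rough sketch only. The
environment variable `PUBLIC_INBOX_DIR`, the config key
`lookupcommit.inbox` and the function name `inbox_dir` are all invented
for illustration; none of them exists in the script below:

```shell
#!/bin/sh
# inbox_dir SCRIPT_PATH
# Resolution order (every name here is an illustrative assumption):
#   1. the PUBLIC_INBOX_DIR environment variable,
#   2. the hypothetical config key lookupcommit.inbox,
#   3. the directory containing the script (the current behavior).
inbox_dir () {
	if test -n "${PUBLIC_INBOX_DIR:-}"
	then
		printf '%s\n' "$PUBLIC_INBOX_DIR"
	elif dir=$(git config --get lookupcommit.inbox 2>/dev/null) &&
		test -n "$dir"
	then
		printf '%s\n' "$dir"
	else
		dirname "$1"
	fi
}
```

The script would then run `cd "$(inbox_dir "$0")"` instead of
`cd "$(dirname "$0")"`, and a user could point it at the mirror with
`git config --global lookupcommit.inbox ~/public-inbox-git.git`.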

And now, without further ado, here it is, the script, in its current glory:

-- snipsnap --
#!/bin/sh

# This is a very simple helper to assist with finding the mail (if any)
# corresponding to a given commit in git.git.

die () {
	echo "$*" >&2
	exit 1
}

mode=print
while case "$1" in
--open) mode=open;;
--reply) mode=reply;;
-*) die "Unknown option: $1";;
*) break;;
esac; do shift; done

test $# = 1 ||
die "Usage: $0 ( [--open] | [--reply] ) <commit>"

test reply != $mode ||
test -d "$HOME/Mail" ||
die "Need $HOME/Mail to reply"

commit="$1"
tae="$(git show -s --format='%at %an <%ae>' "$1")" ||
die "Could not get Timestamp/Author/Email triplet from $1"

# We try to match the timestamp first; the author name and author email are
# not as reliable: they might have been overridden via a "From:" line in the
# mail's body
timestamp=${tae%% *}

cd "$(dirname "$0")" ||
die "Could not cd to the public-inbox directory"

git rev-parse --quiet --verify \
	b60d038730d2c2bb8ab2b48c117db917ad529cf7 >/dev/null 2>&1 ||
die "Not a public-inbox directory: $(pwd)"

head="$(git rev-parse --verify master)" ||
die "Could not determine tip of master"

prevhead=
test ! -f map.latest-rev ||
prevhead="$(cat map.latest-rev)"

if test $head != "$prevhead"
then
	range=${prevhead:+$prevhead..}$head
	echo "Inserting records for $range" >&2
	git log --format="%at %h %an <%ae>" $range >map.txt.add ||
	die "Could not enumerate $range"

	cat map.txt map.txt.add 2>/dev/null | sort -n >map.txt.new &&
	mv -f map.txt.new map.txt ||
	die "Could not insert new records"

	echo $head >map.latest-rev
fi

lines="$(grep "^$timestamp " <map.txt)"
test -n "$lines" ||
die "No records found for timestamp $timestamp"

# `echo "" | wc -l` reports 1, so the empty case has to be ruled out first
if test 1 != $(echo "$lines" | wc -l)
then

	echo "Multiple records found:"

	for h in $(echo "$lines" | cut -d ' ' -f 2)
	do
		git show -s --format="%nOn %ad, %an <%ae> sent" $h
		git show $h |
		sed -n -e 's/^+Message-Id: <\(.*\)>/\1/ip' \
			-e 's/^+Subject: //ip'
	done

	exit 1
fi

# We found exactly one record: print the message ID
h=${lines#$timestamp }
h=${h%% *}
messageid="$(git show $h | sed -n 's/^+Message-Id: <\(.*\)>/\1/ip')"
# The pipeline's exit status is sed's, so check the result instead
test -n "$messageid" ||
die "Could not determine Message-Id from $h"

case $mode in
print) echo $messageid;;
open)
	url=https://public-inbox.org/git/$messageid
	case "$(uname -s)" in
	Linux) xdg-open "$url";;
	MINGW*|MSYS*) start "$url";;
	*) die "Need to learn how to open URLs on $(uname -s)";;
	esac
	;;
reply)
	mkdir -p "$HOME/Mail/from-public-inbox/new" &&
	mkdir -p "$HOME/Mail/from-public-inbox/cur" &&
	mkdir -p "$HOME/Mail/from-public-inbox/tmp" ||
	die "Could not set up mail folder 'from-public-inbox'"

	path=$(git diff --name-only $h^!) &&
	mail="$(printf "%s_%09d.%s:2," $(date +%s.%N) $$ $(hostname -f))" &&
	git show $h:$path >"$HOME/Mail/from-public-inbox/new/$mail" ||
	die "Could not write mail"
	;;
*)
	die "Unhandled mode: $mode"
	;;
esac

* Re: commit -> public-inbox link helper
  2018-04-04 16:35 commit -> public-inbox link helper Johannes Schindelin
@ 2018-04-04 18:36 ` Jeff King
  2018-04-04 20:59   ` Johannes Schindelin
  2018-04-04 19:58 ` Mike Rappazzo
  2018-04-20  8:39 ` Eric Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2018-04-04 18:36 UTC
  To: Johannes Schindelin; +Cc: git

On Wed, Apr 04, 2018 at 06:35:59PM +0200, Johannes Schindelin wrote:

> Hi team,
> 
> I found myself in dire need to quickly look up mails in the public-inbox
> mail archive corresponding to any given commit in git.git. Some time ago,
> I wrote a shell script to help me with that, and I found myself using it a
> couple of times, so I think it might be useful for others, too.
> 
> This script (I call it lookup-commit.sh) needs to be dropped into a *bare*
> clone of http://public-inbox.org/git, and called with its absolute or
> relative path from a git.git worktree, e.g.
> 
> 	~/public-inbox-git.git/lookup-commit.sh \
> 		fea16b47b603e7e4fa7fca198bd49229c0e5da3d
> 
> This will take a while initially, or when the `master` branch of the
> public-inbox mirror was updated, as it will generate two files with
> plain-text mappings.

Junio publishes a git-notes ref with the mapping you want. So you can
do:

  git fetch git://github.com/gitster/git.git refs/notes/amlog:refs/notes/amlog
  mid=$(git notes --ref amlog show $commit | perl -lne '/<(.*)>/ and print $1')
  echo "https://public-inbox.org/git/$mid"

without having to download the gigantic list archive repo at all (though
I do keep my own copy of the archive and index it with mairix, so I can
use "mairix -t m:$mid" and then view the whole thing locally in mutt).
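
For setups without perl, the same extraction can be sketched with sed.
This assumes, as the perl expression above does, that the note contains a
line with the Message-ID in angle brackets. `mid_from_note` and
`inbox_url` are hypothetical helper names:

```shell
#!/bin/sh
# mid_from_note: read note text on stdin and print whatever sits between
# the first "<" and the last ">" of each line that has both.
mid_from_note () {
	sed -n 's/^[^<]*<\(.*\)>.*$/\1/p'
}

# inbox_url MESSAGE_ID: build the archive URL used in this thread.
inbox_url () {
	printf 'https://public-inbox.org/git/%s\n' "$1"
}
```

Usage would be along the lines of
`inbox_url "$(git notes --ref amlog show "$commit" | mid_from_note)"`.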

-Peff

* Re: commit -> public-inbox link helper
  2018-04-04 16:35 commit -> public-inbox link helper Johannes Schindelin
  2018-04-04 18:36 ` Jeff King
@ 2018-04-04 19:58 ` Mike Rappazzo
  2018-04-04 21:02   ` Johannes Schindelin
  2018-04-20  8:39 ` Eric Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Mike Rappazzo @ 2018-04-04 19:58 UTC
  To: Johannes Schindelin; +Cc: Git List

On Wed, Apr 4, 2018 at 12:35 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> [...]
> open)
>         url=https://public-inbox.org/git/$messageid
>         case "$(uname -s)" in
>         Linux) xdg-open "$url";;
>         MINGW*|MSYS*) start "$url";;

         Darwin*) open "$url";;

>         *) die "Need to learn how to open URLs on $(uname -s)";;
>         esac
>         ;;
> [...]

* Re: commit -> public-inbox link helper
  2018-04-04 18:36 ` Jeff King
@ 2018-04-04 20:59   ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-04-04 20:59 UTC
  To: Jeff King; +Cc: git

Hi Peff,

On Wed, 4 Apr 2018, Jeff King wrote:

> On Wed, Apr 04, 2018 at 06:35:59PM +0200, Johannes Schindelin wrote:
> 
> > I found myself in dire need to quickly look up mails in the public-inbox
> > mail archive corresponding to any given commit in git.git. Some time ago,
> > I wrote a shell script to help me with that, and I found myself using it a
> > couple of times, so I think it might be useful for others, too.
> > 
> > This script (I call it lookup-commit.sh) needs to be dropped into a *bare*
> > clone of http://public-inbox.org/git, and called with its absolute or
> > relative path from a git.git worktree, e.g.
> > 
> > 	~/public-inbox-git.git/lookup-commit.sh \
> > 		fea16b47b603e7e4fa7fca198bd49229c0e5da3d
> > 
> > This will take a while initially, or when the `master` branch of the
> > public-inbox mirror was updated, as it will generate two files with
> > plain-text mappings.
> 
> Junio publishes a git-notes ref with the mapping you want. So you can
> do:
> 
>   git fetch git://github.com/gitster/git.git refs/notes/amlog:refs/notes/amlog
>   mid=$(git notes --ref amlog show $commit | perl -lne '/<(.*)>/ and print $1')
>   echo "https://public-inbox.org/git/$mid"
> 
> without having to download the gigantic list archive repo at all (though
> I do keep my own copy of the archive and index it with mairix, so I can
> use "mairix -t m:$mid" and then view the whole thing locally in mutt).

Good to know! Thanks for the script.

And thanks also for the `--ref` trick: I had a look at the man page of
git-notes, and it was not immediately obvious that it supports options
before the sub-subcommand. The `--ref` description is buried pretty deep
in there.

Thanks,
Dscho

* Re: commit -> public-inbox link helper
  2018-04-04 19:58 ` Mike Rappazzo
@ 2018-04-04 21:02   ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-04-04 21:02 UTC
  To: Mike Rappazzo; +Cc: Git List

Hi Mike,

as I said here:

On Wed, 4 Apr 2018, Mike Rappazzo wrote:

> On Wed, Apr 4, 2018 at 12:35 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > [...]
> >
> > With --open, it opens the mail in a browser (macOS support is missing,
> > mainly because I cannot test: just add an `open` alternative to
> > `xdg-open`).
> >
> > [...]
> > open)
> >         url=https://public-inbox.org/git/$messageid
> >         case "$(uname -s)" in
> >         Linux) xdg-open "$url";;
> >         MINGW*|MSYS*) start "$url";;
> 
>          Darwin*) open "$url";;

I am aware of this alternative, but as I do not currently develop on macOS
apart from headless build agents, I did not add support for that.
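
A sketch of how the quoted `Darwin*` case could sit beside the existing
ones, with the platform lookup factored into a helper. `url_opener` is a
hypothetical name, and the macOS branch remains untested here:

```shell
#!/bin/sh
# url_opener KERNEL_NAME
# Print the command that opens a URL on the platform reported by
# `uname -s`, or return 1 for a platform the helper does not know.
url_opener () {
	case "$1" in
	Linux) echo xdg-open;;
	MINGW*|MSYS*) echo start;;
	Darwin*) echo open;;
	*) return 1;;
	esac
}
```

The `open` mode could then call
`opener=$(url_opener "$(uname -s)") && "$opener" "$url"` and die when the
helper returns non-zero.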

Feel free to adopt the script, publish it in a GitHub repository, adapt it
to use refs/notes/amlog instead of a public-inbox mirror, and then tell me
where I can clone it ;-)

Ciao,
Johannes

* Re: commit -> public-inbox link helper
  2018-04-04 16:35 commit -> public-inbox link helper Johannes Schindelin
  2018-04-04 18:36 ` Jeff King
  2018-04-04 19:58 ` Mike Rappazzo
@ 2018-04-20  8:39 ` Eric Wong
  2018-04-20 19:40   ` Johannes Schindelin
  2 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2018-04-20  8:39 UTC
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi team,
> 
> I found myself in dire need to quickly look up mails in the public-inbox
> mail archive corresponding to any given commit in git.git. Some time ago,
> I wrote a shell script to help me with that, and I found myself using it a
> couple of times, so I think it might be useful for others, too.

Hello, I think you can dump all the info you need more quickly
without cloning 1G of data by dumping NNTP OVER(view)
information instead.

I've attached a short Perl script which dumps the tab-delimited
file to stdout so you can process it with whatever.  Columns
relevant to you would probably be 2-5:

	NUM	Subject	From	Date	Message-ID

On public-inbox-nntpd, Dates are normalized to UTC in the OVER
response right now, so you'd need to use TZ=UTC with --date=rfc-local.

It works with both nntp://news.gmane.org/gmane.comp.version-control.git and
nntp://news.public-inbox.org/inbox.comp.version-control.git
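
A sketch of how the dump could be filtered, assuming the column order
given above (Date in column 4, Message-ID in column 5) and an exact string
match on the date. Whether git and the NNTP server format the date
identically (for example, padding of the day of month) is an assumption
that would need checking. `mids_for_date` is a hypothetical name:

```shell
#!/bin/sh
# mids_for_date DATE
# Read tab-delimited overview lines on stdin and print the Message-ID
# (column 5, angle brackets removed) of every line whose Date (column 4)
# equals DATE exactly.
mids_for_date () {
	awk -F '\t' -v want="$1" '
		$4 == want {
			mid = $5
			gsub(/^<|>$/, "", mid)
			print mid
		}
	'
}
```

With the attached script this might be used as
`./over.perl news://example.com/group.name | mids_for_date "$date"`,
where `$date` comes from
`TZ=UTC git show -s --date=rfc-local --format=%ad "$commit"`.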

> Note: the way mails are matched is by timestamp. In practice, this works
> amazingly often (although not always; I reported my findings shortly after
> GitMerge 2017). My plan was to work on this when/as needed.

Thanks for that.  I've added dt: (date-time) searching to public-inbox
(d: (date-only) has been there forever):

        d:       date range as YYYYMMDD  e.g. d:19931002..20101002
                 Open-ended ranges such as d:19931002.. and d:..20101002
                 are also supported
        dt:      date-time range as YYYYMMDDhhmmss (e.g. dt:19931002011000..19931002011200)

To match an exact timestamp, the beginning and the end of the range
should be the same.
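
A sketch of building such a query from a commit's author timestamp
(`git show -s --format=%at`), assuming the dt: prefix expects UTC.
`dt_query` is a hypothetical name. The two `date` invocations cover the
GNU and BSD spellings for "format this epoch value":

```shell
#!/bin/sh
# dt_query EPOCH
# Print a dt: search term whose start and end are the same UTC second.
# GNU date understands "-d @EPOCH"; BSD/macOS date uses "-r EPOCH".
dt_query () {
	stamp=$(date -u -d "@$1" +%Y%m%d%H%M%S 2>/dev/null) ||
	stamp=$(date -u -r "$1" +%Y%m%d%H%M%S 2>/dev/null) ||
	return 1
	printf 'dt:%s..%s\n' "$stamp" "$stamp"
}
```

For example, `dt_query 0` prints `dt:19700101000000..19700101000000`.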

(It'd be nice if Xapian + Perl bindings could get date parsing as
 good as git's.  Too bad C++ / XS overwhelms my tiny brain and
 waiting for builds overwhelms my patience)

> git rev-parse --quiet --verify \
> 	b60d038730d2c2bb8ab2b48c117db917ad529cf7 >/dev/null 2>&1 ||
> die "Not a public-inbox directory: $(pwd)"

Eep.  I don't think it's good to put such a hard dependency on
a particular mirror I started.  Somebody could start another one
which wasn't sourced from gmane and the Received: headers would
be different.

...And I'm pondering a conversion of what's running on
https://public-inbox.org/git/ to the new v2 repository format:

	https://public-inbox.org/meta/20180419015813.GA20051@dcvr/
	https://public-inbox.org/meta/20180209205140.GA11047@dcvr/
	https://public-inbox.org/meta/20180215105509.GA22409@dcvr/

[-- Attachment #2: over.perl --]
[-- Type: text/plain, Size: 930 bytes --]

#!/usr/bin/perl -w
use strict;
use warnings;
use IO::Socket::INET;
my $usage = "$0 news://example.com/group.name [MIN] [MAX]\n";
my $url = shift or die $usage;
my $umin = shift;
my $umax = shift;
# Exclude ":" from the host so that an explicit port is captured separately
my ($host, $port, $group) = ($url =~ m!://([^/:]+)(?::(\d+))?/(.+)!);
$port ||= 119;
defined $group or die "missing group in $url\n";
my %opts = ( Proto => 'tcp', PeerHost => $host, PeerPort => $port );
my $s = IO::Socket::INET->new(%opts) or die "connect to $host:$port: $!\n";
my $l = $s->getline;
$l =~ /\A2\d\d / or die "bad greeting: $l\n";

$s->print("GROUP $group\r\n") or die "print $!";
$l = $s->getline;
$l =~ /\A211 \d+ (\d+) (\d+) / or die "bad GROUP response: $l\n";
my ($min, $max) = ($1, $2);
$min = $umin if $umin;
$max = $umax if $umax;

$s->print("OVER $min-$max\r\n") or die "print $!";
$l = $s->getline;
$l =~ /\A224 / or die "bad OVER response: $l\n";

while ($l = $s->getline) {
	last if $l eq ".\r\n";
	print $l;
}

* Re: commit -> public-inbox link helper
  2018-04-20  8:39 ` Eric Wong
@ 2018-04-20 19:40   ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-04-20 19:40 UTC
  To: Eric Wong; +Cc: git

Hi Eric,

On Fri, 20 Apr 2018, Eric Wong wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > I found myself in dire need to quickly look up mails in the
> > public-inbox mail archive corresponding to any given commit in
> > git.git. Some time ago, I wrote a shell script to help me with that,
> > and I found myself using it a couple of times, so I think it might be
> > useful for others, too.
> 
> Hello, I think you can dump all the info you need more quickly
> without cloning 1G of data by dumping NNTP OVER(view)
> information instead.

That might be true for the current state of affairs.

However, there *are* cases (I think I even linked to my original mail with
my post-GitMerge 2017 analysis) where the triplet Date/Author/Email is not
enough, where even some patch series have the identical triplet for every
single patch.

Even if I did not hit those cases yet, and therefore did not implement
that part, I needed to keep the door open for that. So I need a clone.

Ciao,
Dscho

Thread overview: 7+ messages
2018-04-04 16:35 commit -> public-inbox link helper Johannes Schindelin
2018-04-04 18:36 ` Jeff King
2018-04-04 20:59   ` Johannes Schindelin
2018-04-04 19:58 ` Mike Rappazzo
2018-04-04 21:02   ` Johannes Schindelin
2018-04-20  8:39 ` Eric Wong
2018-04-20 19:40   ` Johannes Schindelin
