git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC] git-send-email: Cache generated message-ids, use them when prompting
@ 2013-08-17  0:58 Rasmus Villemoes
  2013-08-18 21:08 ` Junio C Hamano
  2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes
  0 siblings, 2 replies; 7+ messages in thread
From: Rasmus Villemoes @ 2013-08-17  0:58 UTC (permalink / raw)
  To: gitster; +Cc: git, Rasmus Villemoes

This is mostly a proof of concept/RFC patch. The idea is for
git-send-email to store the message-ids it generates, along with the
Subject and Date headers of the message. When prompting for which
Message-ID should be used in In-Reply-To, display a list of recent
emails (identifed using the Date/Subject pairs; the message-ids
themselves are not for human consumption). Choosing from that list
will then use the corresponding message-id; otherwise, the behaviour
is as usual.

When composing v2 or v3 of a patch or patch series, this avoids the
need to get one's MUA to display the Message-ID of the earlier email
(which is cumbersome in some MUAs) and then copy-paste that.

If this idea is accepted, I'm certain I'll get to use the feature
immediately, since the patch is not quite ready for inclusion :-)

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 git-send-email.perl | 101 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 96 insertions(+), 5 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index f608d9b..2e3685c 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -26,6 +26,7 @@ use Data::Dumper;
 use Term::ANSIColor;
 use File::Temp qw/ tempdir tempfile /;
 use File::Spec::Functions qw(catfile);
+use Date::Parse;
 use Error qw(:try);
 use Git;
 
@@ -203,6 +204,7 @@ my ($validate, $confirm);
 my (@suppress_cc);
 my ($auto_8bit_encoding);
 my ($compose_encoding);
+my ($msgid_cache_file, $msgid_maxprompt);
 
 my ($debug_net_smtp) = 0;		# Net::SMTP, see send_message()
 
@@ -214,7 +216,7 @@ my %config_bool_settings = (
     "signedoffcc" => [\$signed_off_by_cc, undef],      # Deprecated
     "validate" => [\$validate, 1],
     "multiedit" => [\$multiedit, undef],
-    "annotate" => [\$annotate, undef]
+    "annotate" => [\$annotate, undef],
 );
 
 my %config_settings = (
@@ -237,6 +239,7 @@ my %config_settings = (
     "from" => \$sender,
     "assume8bitencoding" => \$auto_8bit_encoding,
     "composeencoding" => \$compose_encoding,
+    "msgidcachefile" => \$msgid_cache_file,
 );
 
 my %config_path_settings = (
@@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help,
 		    "8bit-encoding=s" => \$auto_8bit_encoding,
 		    "compose-encoding=s" => \$compose_encoding,
 		    "force" => \$force,
+		    "msgid-cache-file=s" => \$msgid_cache_file,
 	 );
 
 usage() if $help;
@@ -784,10 +788,31 @@ sub expand_one_alias {
 @bcclist = validate_address_list(sanitize_address_list(@bcclist));
 
 if ($thread && !defined $initial_reply_to) {
-	$initial_reply_to = ask(
-		"Message-ID to be used as In-Reply-To for the first email (if any)? ",
-		default => "",
-		valid_re => qr/\@.*\./, confirm_only => 1);
+	my @choices;
+	if ($msgid_cache_file) {
+		@choices = msgid_cache_getmatches();
+	}
+	if (@choices) {
+		my $prompt = '';
+		my $i = 0;
+		$prompt .= sprintf "(%d) [%s] %s\n", $i++, $_->{date}, $_->{subject}
+		    for (@choices);
+		$prompt .= sprintf "Answer 0-%d to use the Message-ID of one of the above\n", $#choices;
+		$prompt .= "Message-ID to be used as In-Reply-To for the first email (if any)? ";
+		$initial_reply_to = 
+		    ask($prompt,
+			default => "",
+			valid_re => qr/\@.*\.|^[0-9]+$/, confirm_only => 1);
+		if ($initial_reply_to =~ /^[0-9]+$/ && $initial_reply_to < @choices) {
+			$initial_reply_to = $choices[$initial_reply_to]{id};
+		}
+	}
+	else {
+		$initial_reply_to = 
+		    ask("Message-ID to be used as In-Reply-To for the first email (if any)? ",
+			default => "",
+			valid_re => qr/\@.*\./, confirm_only => 1);
+	}
 }
 if (defined $initial_reply_to) {
 	$initial_reply_to =~ s/^\s*<?//;
@@ -1282,6 +1307,8 @@ X-Mailer: git-send-email $gitversion
 		}
 	}
 
+	msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run);
+
 	return 1;
 }
 
@@ -1508,6 +1535,8 @@ sub cleanup_compose_files {
 
 $smtp->quit if $smtp;
 
+msgid_cache_write() if $msgid_cache_file;
+
 sub unique_email_list {
 	my %seen;
 	my @emails;
@@ -1556,3 +1585,65 @@ sub body_or_subject_has_nonascii {
 	}
 	return 0;
 }
+
+my @msgid_new_entries;
+
+# For now, use a simple tab-separated format:
+#
+#    $id\t$date\t$subject\n
+sub msgid_cache_read {
+	my $fh;
+	my $line;
+	my @entries;
+	if (not open ($fh, '<', $msgid_cache_file)) {
+		# A non-existing cache file is ok, but should we warn if errno != ENOENT?
+		return ();
+	}
+	while ($line = <$fh>) {
+		chomp($line);
+		my ($id, $date, $subject) = split /\t/, $line;
+		my $epoch = str2time($date);
+		push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject};
+	}
+	close($fh);
+	return @entries;
+}
+sub msgid_cache_write {
+	my $fh;
+	if (not open($fh, '>>', $msgid_cache_file)) {
+	    warn "cannot open $msgid_cache_file for appending: $!";
+	    return;
+	}
+	printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries);
+	close($fh);
+}
+# Return an array of cached message-ids, ordered by "relevance". It
+# might make sense to take the Subject of the new mail as an extra
+# argument and do some kind of fuzzy matching against the old
+# subjects, but for now "more relevant" simply means "newer".
+sub msgid_cache_getmatches {
+	my ($maxentries) = @_;
+	$maxentries //= 10;
+	my @list = msgid_cache_read();
+	@list = sort {$b->{epoch} <=> $a->{epoch}} @list;
+	@list = @list[0 .. $maxentries-1] if (@list > $maxentries);
+	return @list;
+}
+
+sub msgid_cache_this {
+	my $msgid = shift;
+	my $subject = shift;
+	my $date = shift;
+	# Make sure there are no tabs which will confuse us, and save
+	# some valuable horizontal real-estate by removing redundant
+	# whitespace.
+	if ($subject) {
+		$subject =~ s/^\s+|\s+$//g;
+		$subject =~ s/\s+/ /g;
+	}
+	# Replace undef or the empty string by an actual string. Nobody uses "0" as the subject...
+	$subject ||= '(none)';
+	$date //= format_2822_time(time());
+	$date =~ s/\s+/ /g;
+	push @msgid_new_entries, {id => $msgid, subject => $subject, date => $date};
+}
-- 
1.8.4.rc3.2.gb900fc8

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC] git-send-email: Cache generated message-ids, use them when prompting
  2013-08-17  0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
@ 2013-08-18 21:08 ` Junio C Hamano
  2013-08-18 22:24   ` brian m. carlson
  2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-08-18 21:08 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: git

Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:

> This is mostly a proof of concept/RFC patch. The idea is for
> git-send-email to store the message-ids it generates, along with the
> Subject and Date headers of the message. When prompting for which
> Message-ID should be used in In-Reply-To, display a list of recent
> emails (identifed using the Date/Subject pairs; the message-ids
> themselves are not for human consumption). Choosing from that list
> will then use the corresponding message-id; otherwise, the behaviour
> is as usual.
>
> When composing v2 or v3 of a patch or patch series, this avoids the
> need to get one's MUA to display the Message-ID of the earlier email
> (which is cumbersome in some MUAs) and then copy-paste that.
>
> If this idea is accepted, I'm certain I'll get to use the feature
> immediately, since the patch is not quite ready for inclusion :-)
>
> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>  git-send-email.perl | 101 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 96 insertions(+), 5 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index f608d9b..2e3685c 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -26,6 +26,7 @@ use Data::Dumper;
>  use Term::ANSIColor;
>  use File::Temp qw/ tempdir tempfile /;
>  use File::Spec::Functions qw(catfile);
> +use Date::Parse;

Hmm, is this part of core that we do not have to worry about some
people not having it?  It appears that Git/SVN/Log.pm explicitly
avoids using it.

> @@ -203,6 +204,7 @@ my ($validate, $confirm);
>  my (@suppress_cc);
>  my ($auto_8bit_encoding);
>  my ($compose_encoding);
> +my ($msgid_cache_file, $msgid_maxprompt);

I do not see $msgid_maxprompt used anywhere in the new code.

>  
>  my ($debug_net_smtp) = 0;		# Net::SMTP, see send_message()
>  
> @@ -214,7 +216,7 @@ my %config_bool_settings = (
>      "signedoffcc" => [\$signed_off_by_cc, undef],      # Deprecated
>      "validate" => [\$validate, 1],
>      "multiedit" => [\$multiedit, undef],
> -    "annotate" => [\$annotate, undef]
> +    "annotate" => [\$annotate, undef],

Is this related to this patch in any way?

>  );
>  
>  my %config_settings = (
> @@ -237,6 +239,7 @@ my %config_settings = (
>      "from" => \$sender,
>      "assume8bitencoding" => \$auto_8bit_encoding,
>      "composeencoding" => \$compose_encoding,
> +    "msgidcachefile" => \$msgid_cache_file,
>  );
>  
>  my %config_path_settings = (
> @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help,
>  		    "8bit-encoding=s" => \$auto_8bit_encoding,
>  		    "compose-encoding=s" => \$compose_encoding,
>  		    "force" => \$force,
> +		    "msgid-cache-file=s" => \$msgid_cache_file,
>  	 );

Is there a standard, recommended location we suggest users to store
this?  

>  usage() if $help;
> @@ -784,10 +788,31 @@ sub expand_one_alias {
>  @bcclist = validate_address_list(sanitize_address_list(@bcclist));
>  
>  if ($thread && !defined $initial_reply_to) {
> -	$initial_reply_to = ask(
> -		"Message-ID to be used as In-Reply-To for the first email (if any)? ",
> -		default => "",
> -		valid_re => qr/\@.*\./, confirm_only => 1);
> +	my @choices;
> +	if ($msgid_cache_file) {
> +		@choices = msgid_cache_getmatches();

It is a bit strange that "filename is specified => we will find the
latest 10" before seeing if we even check to see if that file
exists.  I would have expected that a two-step "filename is given =>
try to read it" and "if we did read something => give choices"
process would be used.

Also "getmatches" that does not take any clue from what the caller
knows (the title of the series, for example) seems much less useful
than ideal.  The callee is not getting anything to work with to get
sensible "matches".

> @@ -1282,6 +1307,8 @@ X-Mailer: git-send-email $gitversion
>  		}
>  	}
>  
> +	msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run);

Is this caching each and every one, even for things like "[PATCH 23/47]"?

> @@ -1508,6 +1535,8 @@ sub cleanup_compose_files {
>  
>  $smtp->quit if $smtp;
>  
> +msgid_cache_write() if $msgid_cache_file;

Is this done under --dry-run?

> @@ -1556,3 +1585,65 @@ sub body_or_subject_has_nonascii {
>  	}
>  	return 0;
>  }
> +
> +my @msgid_new_entries;
> +
> +# For now, use a simple tab-separated format:
> +#
> +#    $id\t$date\t$subject\n
> +sub msgid_cache_read {
> +	my $fh;
> +	my $line;
> +	my @entries;
> +	if (not open ($fh, '<', $msgid_cache_file)) {
> +		# A non-existing cache file is ok, but should we warn if errno != ENOENT?

It should not be a warning but an informational message, "creating a
new cachefile", when bootstrapping, no?

> +	while ($line = <$fh>) {
> +		chomp($line);
> +		my ($id, $date, $subject) = split /\t/, $line;
> +		my $epoch = str2time($date);
> +		push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject};
> +	}
> +	close($fh);
> +	return @entries;

So all the old ones are read, without dropping ancient and possibly
useless ones here...

> +sub msgid_cache_write {
> +	my $fh;
> +	if (not open($fh, '>>', $msgid_cache_file)) {
> +	    warn "cannot open $msgid_cache_file for appending: $!";
> +	    return;
> +	}
> +	printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries);
> +	close($fh);

And the new ones are appended to the end without expiring anything.

When does the cache shrink, and who is responsible for doing so?

> +# Return an array of cached message-ids, ordered by "relevance". It
> +# might make sense to take the Subject of the new mail as an extra
> +# argument and do some kind of fuzzy matching against the old
> +# subjects, but for now "more relevant" simply means "newer".
> +sub msgid_cache_getmatches {
> +	my ($maxentries) = @_;
> +	$maxentries //= 10;

The //= operator is mentioned in perl581delta.pod, it seems, and
none of our Perl scripted Porcelains seems to use it yet.  Is it
safe to assume that everybody has it?

> +	my @list = msgid_cache_read();
> +	@list = sort {$b->{epoch} <=> $a->{epoch}} @list;
> +	@list = @list[0 .. $maxentries-1] if (@list > $maxentries);
> +	return @list;
> +}
> +
> +sub msgid_cache_this {
> +	my $msgid = shift;
> +	my $subject = shift;
> +	my $date = shift;
> +	# Make sure there are no tabs which will confuse us, and save
> +	# some valuable horizontal real-estate by removing redundant
> +	# whitespace.
> +	if ($subject) {
> +		$subject =~ s/^\s+|\s+$//g;
> +		$subject =~ s/\s+/ /g;
> +	}
> +	# Replace undef or the empty string by an actual string. Nobody uses "0" as the subject...

That's a strange sloppiness for somebody who uses //=, no?

> +	$subject ||= '(none)';
> +	$date //= format_2822_time(time());
> +	$date =~ s/\s+/ /g;
> +	push @msgid_new_entries, {id => $msgid, subject => $subject, date => $date};
> +}

I am personally not very interested, but that only means I will not
be the one who will be enthusiastically rooting for inclusion; it
does not mean I reject the general notion outright.

But the behaviour the posted patch seems to implement seems to fall
far short of how useful the general notion "save message ID of what
we sent, so that we can reply to them" could be.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] git-send-email: Cache generated message-ids, use them when prompting
  2013-08-18 21:08 ` Junio C Hamano
@ 2013-08-18 22:24   ` brian m. carlson
  0 siblings, 0 replies; 7+ messages in thread
From: brian m. carlson @ 2013-08-18 22:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rasmus Villemoes, git

[-- Attachment #1: Type: text/plain, Size: 1032 bytes --]

On Sun, Aug 18, 2013 at 02:08:00PM -0700, Junio C Hamano wrote:
> Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:
> > +# Return an array of cached message-ids, ordered by "relevance". It
> > +# might make sense to take the Subject of the new mail as an extra
> > +# argument and do some kind of fuzzy matching against the old
> > +# subjects, but for now "more relevant" simply means "newer".
> > +sub msgid_cache_getmatches {
> > +	my ($maxentries) = @_;
> > +	$maxentries //= 10;
> 
> The //= operator is mentioned in perl581delta.pod, it seems, and
> none of our Perl scripted Porcelains seems to use it yet.  Is it
> safe to assume that everybody has it?

This operator is new in Perl 5.10.  If you want the code to work on
RHEL/CentOS 5, you need to avoid it, since they only have 5.8 available.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 0/2] git-send-email: Message-ID caching
  2013-08-17  0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
  2013-08-18 21:08 ` Junio C Hamano
@ 2013-08-21 19:04 ` Rasmus Villemoes
  2013-08-21 19:04   ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Rasmus Villemoes @ 2013-08-21 19:04 UTC (permalink / raw)
  To: gitster, sandals, git; +Cc: Rasmus Villemoes

This is a more polished attempt at implementing the message-id
caching. I apologize for the sloppiness (unrelated changes, unused
variables etc.) in the first version.

I have split the patch in two. The first half moves the choice-logic
inside ask(): If ask() decides to return the default immediately
(because the $term->{IN,OUT} checks fail), there's no reason to print
all the choices. In my first attempt, I handled this by passing a huge
prompt-string, but that made everything underlined (the prompt may be
emphasized in some other way on other terminals). Passing a different
valid_re and handling a numeric return is also slightly more
cumbersome for the caller than making ask() take care of it. There
might be other future uses of this 'choices' ability, but if the
message-id-caching is ultimately rejected, [1/2] can of course also be
dropped.

[2/2] is the new implementation. The two main changes are that old
entries are expunged when the file grows larger than, by default, 100
kB, and the old emails are scored according to a simple scheme (which
might need improvement). The input to the scoring is {first subject in
new batch, old subject, was the old email first in a batch?}. [*]

I also now simply store the unix time stamp instead of storing the
contents of the date header. This reduces processing time (no need to
parse the date header when reading the file), and eliminates the
Date::Parse dependency. The minor downside is that if the user has
changed time zone since the old email was sent, the date in the prompt
will not entirely match the Date:-header in the email that was sent.

I had to rearrange the existing code a little to make certain global
variables ($time, @files) contain the right thing at the right time,
but hopefully I haven't broken anything in the process.

[*] Suggestions for improving that heuristic are welcome. One thing
that might be worthwhile is to give words ending in a colon higher
weight.

Rasmus Villemoes (2):
  git-send-email: add optional 'choices' parameter to the ask sub
  git-send-email: Cache generated message-ids, use them when prompting

 git-send-email.perl |  144 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 138 insertions(+), 6 deletions(-)


Thanks, Junio, for the comments. A few replies:

Junio C Hamano <gitster@pobox.com> writes:

> Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:
>>  my %config_path_settings = (
>> @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help,
>>  		    "8bit-encoding=s" => \$auto_8bit_encoding,
>>  		    "compose-encoding=s" => \$compose_encoding,
>>  		    "force" => \$force,
>> +		    "msgid-cache-file=s" => \$msgid_cache_file,
>>  	 );
>
> Is there a standard, recommended location we suggest users to store
> this?  

I don't know. It is obviously a local, per-repository, thing. I don't
know enough about git's internals to know if something breaks if one
puts it in .git (say, ".git/msgid.cache"). If that's a bad idea, then
I think it should be in the root of the repository, and one would then
probably want to add it to .git/info/exclude.

If storing it under .git is possible, one could consider making the
option a boolean ('msgid-use-cache' ?) and always use
".git/msgid.cache".

>> +	my @choices;
>> +	if ($msgid_cache_file) {
>> +		@choices = msgid_cache_getmatches();
>
> It is a bit strange that "filename is specified => we will find the
> latest 10" before seeing if we even check to see if that file
> exists.  I would have expected that a two-step "filename is given =>
> try to read it" and "if we did read something => give choices"
> process would be used.

I'm not sure I understand how this is different from what I was or am
doing. Sure, I don't do the test for existence before calling
msgid_cache_getmatches(), but that will just return an empty list if
the file does not exist. That will end up doing the same thing as if
no cache file was given.

> Also "getmatches" that does not take any clue from what the caller
> knows (the title of the series, for example) seems much less useful
> than ideal.

Fixed in the new version.

>> +	msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run);
>
> Is this caching each and every one, even for things like "[PATCH 23/47]"?

Yes, but now I remember whether it was the first or not. 

>> +msgid_cache_write() if $msgid_cache_file;
>
> Is this done under --dry-run?

Well, it was, but the msgid_cache_this() was not, so there was an
empty list of new entries. Of course, better to be explicit and safe,
so fixed.

>> +	if (not open ($fh, '<', $msgid_cache_file)) {
>> +		# A non-existing cache file is ok, but should we warn if errno != ENOENT?
>
> It should not be a warning but an informational message, "creating a
> new cachefile", when bootstrapping, no?

If so, it should probably be when writing the new file. What I'm
trying to say is that errno == ENOENT is ok (and expected), but errno
== EPERM or anything else might be a condition we should warn or even
die on.

>> +	while ($line = <$fh>) {
>> +		chomp($line);
>> +		my ($id, $date, $subject) = split /\t/, $line;
>> +		my $epoch = str2time($date);
>> +		push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject};
>> +	}
>> +	close($fh);
>> +	return @entries;
>
> So all the old ones are read, without dropping ancient and possibly
> useless ones here...
>
>> +sub msgid_cache_write {
>> +	my $fh;
>> +	if (not open($fh, '>>', $msgid_cache_file)) {
>> +	    warn "cannot open $msgid_cache_file for appending: $!";
>> +	    return;
>> +	}
>> +	printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries);
>> +	close($fh);
>
> And the new ones are appended to the end without expiring anything.
>
> When does the cache shrink, and who is responsible for doing so?

Fixed. Whether it should be "cap when larger than $size" or "remove
entries older than time()-$maxage" I don't know.

> The //= operator is mentioned in perl581delta.pod, it seems, and
> none of our Perl scripted Porcelains seems to use it yet.  Is it
> safe to assume that everybody has it?

No, fixed.

>> +	if ($subject) {
>> +		$subject =~ s/^\s+|\s+$//g;
>> +		$subject =~ s/\s+/ /g;
>> +	}
>> +	# Replace undef or the empty string by an actual string. Nobody uses "0" as the subject...
>
> That's a strange sloppiness for somebody who uses //=, no?

Fixed.


-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub
  2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes
@ 2013-08-21 19:04   ` Rasmus Villemoes
  2013-08-21 19:04   ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
  2013-08-22  0:20   ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Rasmus Villemoes @ 2013-08-21 19:04 UTC (permalink / raw)
  To: gitster, sandals, git; +Cc: Rasmus Villemoes

Make it possible for callers of ask() to provide a list of
choices. Entering an appropriate integer chooses from that list,
otherwise the input is treated as usual.

Each choice can either be a single string, which is used both for the
prompt and for the return value, or a two-element array ref, where the
zeroth element is the choice and the first is used for the prompt.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 git-send-email.perl |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/git-send-email.perl b/git-send-email.perl
index 2162478..ac3b02d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -680,11 +680,18 @@ sub ask {
 	my $valid_re = $arg{valid_re};
 	my $default = $arg{default};
 	my $confirm_only = $arg{confirm_only};
+	my $choices = $arg{choices} || [];
 	my $resp;
 	my $i = 0;
 	return defined $default ? $default : undef
 		unless defined $term->IN and defined fileno($term->IN) and
 		       defined $term->OUT and defined fileno($term->OUT);
+	for (@$choices) {
+		printf "(%d) %s\n", $i++, ref($_) eq 'ARRAY' ? $_->[1] : $_;
+	}
+	printf "Enter 0-%d to choose from the above list\n", $i-1
+		if (@$choices);
+	$i = 0;
 	while ($i++ < 10) {
 		$resp = $term->readline($prompt);
 		if (!defined $resp) { # EOF
@@ -694,6 +701,10 @@ sub ask {
 		if ($resp eq '' and defined $default) {
 			return $default;
 		}
+		if (@$choices && $resp =~ m/^[0-9]+$/ && $resp < @$choices) {
+			my $c = $choices->[$resp];
+			return ref($c) eq 'ARRAY' ? $c->[0] : $c;
+		}
 		if (!defined $valid_re or $resp =~ /$valid_re/) {
 			return $resp;
 		}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting
  2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes
  2013-08-21 19:04   ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes
@ 2013-08-21 19:04   ` Rasmus Villemoes
  2013-08-22  0:20   ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Rasmus Villemoes @ 2013-08-21 19:04 UTC (permalink / raw)
  To: gitster, sandals, git; +Cc: Rasmus Villemoes

Allow the user to specify a file (sendemail.msgidcachefile) in which
to store the message-ids generated by git-send-email, along with time
and subject information. When prompting for a Message-ID to be used in
In-Reply-To, that file can be used to generate a list of options.

When composing v2 or v3 of a patch or patch series, this avoids the
need to get one's MUA to display the Message-ID of the earlier email
(which is cumbersome in some MUAs) and then copy-paste that.

Listing all previously sent emails is useless, so currently only the
10 most "relevant" emails. "Relevant" is based on a simple scoring,
which might need to be revised: Count the words in the old subject
which also appear in the subject of the first email to be sent; add a
bonus if the old email was first in a batch (that is, [00/74] is more
likely to be relevant than [43/74]). Resort to comparing timestamps
(newer is more relevant) when the scores tie.

To limit disk usage, the oldest half of the cached entries are
expunged when the cache file exceeds sendemail.msgidcachemaxsize
(default 100kB). This also ensures that we will never have to read,
score, and sort 1000s of entries on each invocation of git-send-email.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 git-send-email.perl |  133 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 127 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index ac3b02d..5094267 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -203,6 +203,7 @@ my ($validate, $confirm);
 my (@suppress_cc);
 my ($auto_8bit_encoding);
 my ($compose_encoding);
+my ($msgid_cache_file, $msgid_cache_maxsize);
 
 my ($debug_net_smtp) = 0;		# Net::SMTP, see send_message()
 
@@ -237,6 +238,8 @@ my %config_settings = (
     "from" => \$sender,
     "assume8bitencoding" => \$auto_8bit_encoding,
     "composeencoding" => \$compose_encoding,
+    "msgidcachefile" => \$msgid_cache_file,
+    "msgidcachemaxsize" => \$msgid_cache_maxsize,
 );
 
 my %config_path_settings = (
@@ -796,11 +799,23 @@ sub expand_one_alias {
 @bcclist = expand_aliases(@bcclist);
 @bcclist = validate_address_list(sanitize_address_list(@bcclist));
 
+if ($compose && $compose > 0) {
+	@files = ($compose_filename . ".final", @files);
+}
+
 if ($thread && !defined $initial_reply_to && $prompting) {
+	my @choices = ();
+	if ($msgid_cache_file) {
+		my $first_subject = get_patch_subject($files[0]);
+		$first_subject =~ s/^GIT: //;
+		@choices = msgid_cache_getmatches($first_subject, 10);
+		@choices = map {[$_->{id}, sprintf "[%s] %s", format_2822_time($_->{epoch}), $_->{subject}]} @choices;
+	}
 	$initial_reply_to = ask(
 		"Message-ID to be used as In-Reply-To for the first email (if any)? ",
 		default => "",
-		valid_re => qr/\@.*\./, confirm_only => 1);
+		valid_re => qr/\@.*\./, confirm_only => 1,
+		choices => \@choices);
 }
 if (defined $initial_reply_to) {
 	$initial_reply_to =~ s/^\s*<?//;
@@ -818,10 +833,6 @@ if (!defined $smtp_server) {
 	$smtp_server ||= 'localhost'; # could be 127.0.0.1, too... *shrug*
 }
 
-if ($compose && $compose > 0) {
-	@files = ($compose_filename . ".final", @files);
-}
-
 # Variables we set as part of the loop over files
 our ($message_id, %mail, $subject, $reply_to, $references, $message,
 	$needs_confirm, $message_num, $ask_default);
@@ -1136,7 +1147,7 @@ sub send_message {
 	my $to = join (",\n\t", @recipients);
 	@recipients = unique_email_list(@recipients,@cc,@bcclist);
 	@recipients = (map { extract_valid_address_or_die($_) } @recipients);
-	my $date = format_2822_time($time++);
+	my $date = format_2822_time($time);
 	my $gitversion = '@@GIT_VERSION@@';
 	if ($gitversion =~ m/..GIT_VERSION../) {
 	    $gitversion = Git::version();
@@ -1477,6 +1488,11 @@ foreach my $t (@files) {
 
 	my $message_was_sent = send_message();
 
+	if ($message_was_sent && $msgid_cache_file && !$dry_run) {
+		msgid_cache_this($message_id, $message_num == 1 ? 1 : 0, , $time, $subject);
+	}
+	$time++;
+
 	# set up for the next message
 	if ($thread && $message_was_sent &&
 		($chain_reply_to || !defined $reply_to || length($reply_to) == 0 ||
@@ -1521,6 +1537,8 @@ sub cleanup_compose_files {
 
 $smtp->quit if $smtp;
 
+msgid_cache_write() if $msgid_cache_file && !$dry_run;
+
 sub unique_email_list {
 	my %seen;
 	my @emails;
@@ -1569,3 +1587,106 @@ sub body_or_subject_has_nonascii {
 	}
 	return 0;
 }
+
+my @msgid_new_entries;
+sub msgid_cache_this {
+	my $msgid = shift;
+	my $first = shift;
+	my $epoch = shift;
+	my $subject = shift;
+	# Make sure there are no tabs which will confuse us, and save
+	# some valuable horizontal real-estate by removing redundant
+	# whitespace.
+	if ($subject) {
+		$subject =~ s/^\s+|\s+$//g;
+		$subject =~ s/\s+/ /g;
+	}
+	# Replace undef or the empty string by an actual string.
+	$subject = '(none)' if (!defined $subject || $subject eq '');
+
+	push @msgid_new_entries, {id => $msgid, first => $first, subject => $subject, epoch => $epoch};
+}
+
+
+# For now, use a simple tab-separated format:
+#
+#    $id\t$wasfirst\t$unixtime\t$subject\n
+sub msgid_cache_read {
+	my $fh;
+	my $line;
+	my @entries;
+	if (not open ($fh, '<', $msgid_cache_file)) {
+		# A non-existing cache file is ok, but should we warn if errno != ENOENT?
+		return ();
+	}
+	while ($line = <$fh>) {
+		chomp($line);
+		my ($id, $first, $epoch, $subject) = split /\t/, $line;
+		push @entries, {id=>$id, first=>$first, epoch=>$epoch, subject=>$subject};
+	}
+	close($fh);
+	return @entries;
+}
+
+sub msgid_cache_getmatches {
+	my ($first_subject, $maxentries) = @_;
+	my @list = msgid_cache_read();
+
+	# We need to find the message-ids which are most likely to be
+	# useful. There are probably better ways to do this, but for
+	# now we simply count how many words in the old subject also
+	# appear in $first_subject.
+	my %words = map {$_ => 1} msgid_subject_words($first_subject);
+	for my $item (@list) {
+		# Emails which were first in a batch are more likely
+		# to be used for followups (cf. the example in "man
+		# git-send-email"), so give those a head start.
+		my $score = $item->{first} ? 3 : 0;
+		for (msgid_subject_words($item->{subject})) {
+			$score++ if exists $words{$_};
+		}
+		$item->{score} = $score;
+	}
+	@list = sort {$b->{score} <=> $a->{score} ||
+		      $b->{epoch} <=> $a->{epoch}} @list;
+	@list = @list[0 .. $maxentries-1] if (@list > $maxentries);
+	return @list;
+}
+
+sub msgid_subject_words {
+	my $subject = shift;
+	# Ignore initial "[PATCH 02/47]"
+	$subject =~ s/^\s*\[.*?\]//;
+	my @words = split /\s+/, $subject;
+	# Ignore short words. 
+	@words = grep { length > 3 } @words;
+	return @words;
+}
+
+sub msgid_cache_write {
+	msgid_cache_do_write(1, \@msgid_new_entries);
+
+	if (defined $msgid_cache_maxsize && $msgid_cache_maxsize =~ m/^\s*([0-9]+)\s*([kKmMgG]?)$/) {
+		my %SI = ('' => 1, 'k' => 1e3, 'm' => 1e6, 'g' => 1e9);
+		$msgid_cache_maxsize = $1 * $SI{lc($2)};
+	}
+	else {
+		$msgid_cache_maxsize = 100000;
+	}
+	if (-s $msgid_cache_file > $msgid_cache_maxsize) {
+		my @entries = msgid_cache_read();
+		splice @entries, 0, int(@entries/2);
+		msgid_cache_do_write(0, \@entries);
+	}
+}
+
+sub msgid_cache_do_write {
+	my $append = shift;
+	my $entries = shift;
+	my $fh;
+	if (not open($fh, $append ? '>>' : '>', $msgid_cache_file)) {
+		die "cannot open $msgid_cache_file for writing: $!";
+	}
+	printf $fh "%s\t%d\t%s\t%s\n", $_->{id}, $_->{first}, $_->{epoch}, $_->{subject} for (@$entries);
+	close($fh);
+}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] git-send-email: Message-ID caching
  2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes
  2013-08-21 19:04   ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes
  2013-08-21 19:04   ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
@ 2013-08-22  0:20   ` Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2013-08-22  0:20 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: sandals, git

Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:

>> Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:
>>>  my %config_path_settings = (
>>> @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help,
>>>  		    "8bit-encoding=s" => \$auto_8bit_encoding,
>>>  		    "compose-encoding=s" => \$compose_encoding,
>>>  		    "force" => \$force,
>>> +		    "msgid-cache-file=s" => \$msgid_cache_file,
>>>  	 );
>>
>> Is there a standard, recommended location we suggest users to store
>> this?  
>
> I don't know. It is obviously a local, per-repository, thing. I don't
> know enough about git's internals to know if something breaks if one
> puts it in .git (say, ".git/msgid.cache").

I think $GIT_DIR is OK, when we _know_ we are in a Git controlled
directory.  "git send-email" can however be invoked in a random
directory that is _not_ a Git controlled directory, though.

In any case, if we were to store it inside $GIT_DIR, I'd prefer to
have "send-email" somewhere in the name of the file, as there are
other Git programs that deal with things that have "msgid" (notably,
"am") that will not have anything to do with this file.

> If storing it under .git is possible, one could consider making the
> option a boolean ('msgid-use-cache' ?) and always use
> ".git/msgid.cache".

Another possibility is to have it in the output directory specified
via the "format-patch -o $dir" option.  When you are rerolling a
series multiple times, you will only look at the message ID from the
previous round; you do not even need to look at old messages in an
unrelated topic.

I could imagine that

	git send-email $dir/0*.txt

can notice that these input files are all in the same $dir
directory, check to see if $dir/message-id file exists, read it to
offer it as the suggested initial-reply-to.  Similarly, when sending
the _first_ message in such an invocation, it can just write the
generated message-id to that file.  Then we need no choices.  It is
sufficient to just keep a single message-id of the first message in
the previous round and offer it as a possible initial-reply-to in a
Yes/No question.

Just a random thought.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-08-22  0:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-17  0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
2013-08-18 21:08 ` Junio C Hamano
2013-08-18 22:24   ` brian m. carlson
2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes
2013-08-21 19:04   ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes
2013-08-21 19:04   ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
2013-08-22  0:20   ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).