git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Rasmus Villemoes <rv@rasmusvillemoes.dk>
To: gitster@pobox.com, sandals@crustytoothpaste.net, git@vger.kernel.org
Cc: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Subject: [PATCH v2 0/2] git-send-email: Message-ID caching
Date: Wed, 21 Aug 2013 19:04:20 +0000	[thread overview]
Message-ID: <1377111862-13199-1-git-send-email-rv@rasmusvillemoes.dk> (raw)
In-Reply-To: <1376701126-5759-1-git-send-email-rv@rasmusvillemoes.dk>

This is a more polished attempt at implementing the message-id
caching. I apologize for the sloppiness (unrelated changes, unused
variables etc.) in the first version.

I have split the patch in two. The first half moves the choice-logic
inside ask(): If ask() decides to return the default immediately
(because the $term->{IN,OUT} checks fail), there's no reason to print
all the choices. In my first attempt, I handled this by passing a huge
prompt-string, but that made everything underlined (the prompt may be
emphasized in some other way on other terminals). Passing a different
valid_re and handling a numeric return is also slightly more
cumbersome for the caller than making ask() take care of it. There
might be other future uses of this 'choices' ability, but if the
message-id-caching is ultimately rejected, [1/2] can of course also be
dropped.

[2/2] is the new implementation. The two main changes are that old
entries are expunged when the file grows larger than, by default, 100
kB, and the old emails are scored according to a simple scheme (which
might need improvement). The input to the scoring is {first subject in
new batch, old subject, was the old email first in a batch?}. [*]

I also now simply store the unix time stamp instead of storing the
contents of the date header. This reduces processing time (no need to
parse the date header when reading the file), and eliminates the
Date::Parse dependency. The minor downside is that if the user has
changed time zone since the old email was sent, the date in the prompt
will not entirely match the Date:-header in the email that was sent.

I had to rearrange the existing code a little to make certain global
variables ($time, @files) contain the right thing at the right time,
but hopefully I haven't broken anything in the process.

[*] Suggestions for improving that heuristic are welcome. One thing
that might be worthwhile is to give words ending in a colon higher
weight.

Rasmus Villemoes (2):
  git-send-email: add optional 'choices' parameter to the ask sub
  git-send-email: Cache generated message-ids, use them when prompting

 git-send-email.perl |  144 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 138 insertions(+), 6 deletions(-)


Thanks, Junio, for the comments. A few replies:

Junio C Hamano <gitster@pobox.com> writes:

> Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:
>>  my %config_path_settings = (
>> @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help,
>>  		    "8bit-encoding=s" => \$auto_8bit_encoding,
>>  		    "compose-encoding=s" => \$compose_encoding,
>>  		    "force" => \$force,
>> +		    "msgid-cache-file=s" => \$msgid_cache_file,
>>  	 );
>
> Is there a standard, recommended location we suggest users to store
> this?  

I don't know. It is obviously a local, per-repository, thing. I don't
know enough about git's internals to know if something breaks if one
puts it in .git (say, ".git/msgid.cache"). If that's a bad idea, then
I think it should be in the root of the repository, and one would then
probably want to add it to .git/info/exclude.

If storing it under .git is possible, one could consider making the
option a boolean ('msgid-use-cache' ?) and always use
".git/msgid.cache".

>> +	my @choices;
>> +	if ($msgid_cache_file) {
>> +		@choices = msgid_cache_getmatches();
>
> It is a bit strange that "filename is specified => we will find the
> latest 10" before seeing if we even check to see if that file
> exists.  I would have expected that a two-step "filename is given =>
> try to read it" and "if we did read something => give choices"
> process would be used.

I'm not sure I understand how this is different from what I was or am
doing. Sure, I don't do the test for existence before calling
msgid_cache_getmatches(), but that will just return an empty list if
the file does not exist. That will end up doing the same thing as if
no cache file was given.

> Also "getmatches" that does not take any clue from what the caller
> knows (the title of the series, for example) seems much less useful
> than ideal.

Fixed in the new version.

>> +	msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run);
>
> Is this caching each and every one, even for things like "[PATCH 23/47]"?

Yes, but now I remember whether it was the first or not. 

>> +msgid_cache_write() if $msgid_cache_file;
>
> Is this done under --dry-run?

Well, it was, but the msgid_cache_this() was not, so there was an
empty list of new entries. Of course, better to be explicit and safe,
so fixed.

>> +	if (not open ($fh, '<', $msgid_cache_file)) {
>> +		# A non-existing cache file is ok, but should we warn if errno != ENOENT?
>
> It should not be a warning but an informational message, "creating a
> new cachefile", when bootstrapping, no?

If so, it should probably be when writing the new file. What I'm
trying to say is that errno == ENOENT is ok (and expected), but errno
== EPERM or anything else might be a condition we should warn or even
die on.

>> +	while ($line = <$fh>) {
>> +		chomp($line);
>> +		my ($id, $date, $subject) = split /\t/, $line;
>> +		my $epoch = str2time($date);
>> +		push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject};
>> +	}
>> +	close($fh);
>> +	return @entries;
>
> So all the old ones are read, without dropping ancient and possibly
> useless ones here...
>
>> +sub msgid_cache_write {
>> +	my $fh;
>> +	if (not open($fh, '>>', $msgid_cache_file)) {
>> +	    warn "cannot open $msgid_cache_file for appending: $!";
>> +	    return;
>> +	}
>> +	printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries);
>> +	close($fh);
>
> And the new ones are appended to the end without expiring anything.
>
> When does the cache shrink, and who is responsible for doing so?

Fixed. Whether it should be "cap when larger than $size" or "remove
entries older than time()-$maxage" I don't know.

> The //= operator is mentioned in perl581delta.pod, it seems, and
> none of our Perl scripted Porcelains seems to use it yet.  Is it
> safe to assume that everybody has it?

No, fixed.

>> +	if ($subject) {
>> +		$subject =~ s/^\s+|\s+$//g;
>> +		$subject =~ s/\s+/ /g;
>> +	}
>> +	# Replace undef or the empty string by an actual string. Nobody uses "0" as the subject...
>
> That's a strange sloppiness for somebody who uses //=, no?

Fixed.


-- 
1.7.9.5

  parent reply	other threads:[~2013-08-21 19:04 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-17  0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
2013-08-18 21:08 ` Junio C Hamano
2013-08-18 22:24   ` brian m. carlson
2013-08-21 19:04 ` Rasmus Villemoes [this message]
2013-08-21 19:04   ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes
2013-08-21 19:04   ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes
2013-08-22  0:20   ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1377111862-13199-1-git-send-email-rv@rasmusvillemoes.dk \
    --to=rv@rasmusvillemoes.dk \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).