git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Paul Tan <pyokagan@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] reset cached ident date before creating objects
Date: Mon, 1 Aug 2016 15:37:00 -0400	[thread overview]
Message-ID: <20160801193659.hag2yax2rniczkbo@sigill.intra.peff.net> (raw)
In-Reply-To: <20160801180047.sgzqgh7pvp2qzwug@sigill.intra.peff.net>

On Mon, Aug 01, 2016 at 02:00:47PM -0400, Jeff King wrote:

> So at this point I think I'd lean towards programs which have multiple
> logical commit operations explicitly saying "I am starting a new
> operation". It may be that we end up attaching more to that in the long
> run than just timestamp resets, too (i.e., what other cached data might
> there be that used to happen in separate sub-processes?).

Here's that patch.

I still just put in a bare "reset_ident_date()" in git-am. This _could_
be wrapped in: begin_logical_commit() or something, but it would just be
a wrapper around resetting the ident_date. So I'm inclined not to worry
about such semantic obfuscation until we actually have a second thing to
reset. :)

I also didn't go looking for other spots which might want similar
treatment. Junio mentioned that the sequencer code still uses an
external commit process, so cherry-pick/revert are OK. git-fast-import
creates a stream of commits, but already deals with this issue in a
separate way. And I couldn't think of other programs that want to make a
string of commits.

-- >8 --
Subject: [PATCH] am: reset cached ident date for each patch

When we compute the date to go in author/committer lines of
commits, or tagger lines of tags, we get the current date
once and then cache it for the rest of the program.  This is
a good thing in some cases, like "git commit", because it
means we do not racily assign different times to the
author/committer fields of a single commit object.

But as more programs start to make many commits in a single
process (e.g., the recently builtin "git am"), it means that
you'll get long strings of commits with identical committer
timestamps (whereas before, we invoked "git commit" many
times and got true timestamps).

This patch addresses it by letting callers reset the cached
time, which means they'll get a fresh time on their next
call to git_committer_info() or git_author_info(). The first
caller to do so is "git am", which resets the time for each
patch it applies.

It would be nice if we could just do this automatically
before filling in the ident fields of commit and tag
objects. Unfortunately, it's hard to know where a particular
logical operation begins and ends.

For instance, if commit_tree_extended() were to call
reset_ident_date() before getting the committer/author
ident, that doesn't quite work; sometimes the author info is
passed in to us as a parameter, and it may or may not have
come from a previous call to ident_default_date(). So in
those cases, we lose the property that the committer and the
author timestamp always match.

You could similarly put a date-reset at the end of
commit_tree_extended(). That actually works in the current
code base, but it's fragile. It makes the assumption that
after commit_tree_extended() finishes, the caller has no
other operations that would logically want to fall into the
same timestamp.

So instead we provide the tool to easily do the reset, and
let the high-level callers use it to annotate their own
logical operations.

There's no automated test, because it would be inherently
racy (it depends on whether the program takes multiple
seconds to run). But you can see the effect with something
like:

  # make a fake 100-patch series
  top=$(git rev-parse HEAD)
  bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
  git log --format=email --reverse --first-parent \
          --binary -m -p $bottom..$top >patch

  # now apply it; this presumably takes multiple seconds
  git checkout --detach $bottom
  git am <patch

  # now count the number of distinct committer times;
  # prior to this patch, there would only be one, but
  # now we'd typically see several.
  git log --format=%ct $bottom.. | sort -u

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/am.c | 2 ++
 cache.h      | 1 +
 ident.c      | 5 +++++
 3 files changed, 8 insertions(+)

diff --git a/builtin/am.c b/builtin/am.c
index b77bf11..8aa9b5b 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1840,6 +1840,8 @@ static void am_run(struct am_state *state, int resume)
 		const char *mail = am_path(state, msgnum(state));
 		int apply_status;
 
+		reset_ident_date();
+
 		if (!file_exists(mail))
 			goto next;
 
diff --git a/cache.h b/cache.h
index b5f76a4..31e65f9 100644
--- a/cache.h
+++ b/cache.h
@@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void);
 extern const char *git_editor(void);
 extern const char *git_pager(int stdout_is_tty);
 extern int git_ident_config(const char *, const char *, void *);
+extern void reset_ident_date(void);
 
 struct ident_split {
 	const char *name_begin;
diff --git a/ident.c b/ident.c
index 139c528..e20a772 100644
--- a/ident.c
+++ b/ident.c
@@ -184,6 +184,11 @@ static const char *ident_default_date(void)
 	return git_default_date.buf;
 }
 
+void reset_ident_date(void)
+{
+	strbuf_reset(&git_default_date);
+}
+
 static int crud(unsigned char c)
 {
 	return  c <= 32  ||
-- 
2.9.2.670.gf6ce898


  reply	other threads:[~2016-08-01 20:37 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-28 23:35 Small trivial annoyance with the nice new builtin "git am" Linus Torvalds
2016-07-28 23:47 ` Junio C Hamano
2016-07-29  0:29   ` Jeff King
2016-07-29  0:37     ` Linus Torvalds
2016-07-29 15:50       ` Jeff King
2016-07-29 17:15         ` Junio C Hamano
2016-07-29 18:05           ` [PATCH] reset cached ident date before creating objects Jeff King
2016-07-29 18:12             ` Linus Torvalds
2016-07-29 18:15             ` Junio C Hamano
2016-07-29 18:25               ` Jeff King
2016-07-30  2:11             ` Paul Tan
2016-07-30  2:41               ` Jeff King
2016-08-01 17:49                 ` Junio C Hamano
2016-08-01 17:58                   ` Junio C Hamano
2016-08-01 18:12                     ` Jeff King
2016-08-01 18:17                       ` Jeff King
2016-08-01 18:00                   ` Jeff King
2016-08-01 19:37                     ` Jeff King [this message]
2016-08-01 21:54                       ` Junio C Hamano
2016-08-01 22:00                         ` Jeff King
2016-07-29 18:10       ` Small trivial annoyance with the nice new builtin "git am" Junio C Hamano
2016-07-29 17:06     ` Junio C Hamano
2016-07-29  0:21 ` Jeff King
2016-07-29  0:32   ` Linus Torvalds
2016-07-29  8:19     ` Christian Couder
2016-07-29 17:18     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160801193659.hag2yax2rniczkbo@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pyokagan@gmail.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).