git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Phillip Wood <phillip.wood@dunelm.org.uk>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1 8/8] sequencer: try to commit without forking 'git commit'
Date: Tue, 7 Nov 2017 02:36:30 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.21.1.1711070203230.6482@virtualbox> (raw)
In-Reply-To: <20171106112709.2121-9-phillip.wood@talktalk.net>

Hi Phillip,

On Mon, 6 Nov 2017, Phillip Wood wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> If the commit message does not need to be edited then create the
> commit without forking 'git commit'. Taking the best time of ten runs
> with a warm cache this reduces the time taken to cherry-pick 10
> commits by 27% (from 282ms to 204ms), and the time taken by 'git
> rebase --continue' to pick 10 commits by 45% (from 386ms to 212ms) on
> my computer running linux. Some of greater saving for rebase is
> because it longer wastes time creating the commit summary just to

I usually leave the grammar reviews to people who prefer to review grammar
over code, but in this case I think the "no" in "no longer" is rather
crucial.

> throw it away.

Those are impressive improvements, and I am certain that they will be even
more noticable on Windows, where creating processes is a lot more
expensive than on Linux (which is the reason why you will find a lot more
multi-threaded processes on Windows...).

> The code to create the commit is based on builtin/commit.c. It is
> slightly simplified as it doesn't have to deal with merges and
> modified so try and return an error rather than dying so that the
> sequencer exits cleanly, as it would when forking 'git commit'.
> 
> Even when not forking 'git commit' the commit message is written to a
> file and CHERRY_PICK_HEAD is created unnecessarily. This could be
> eliminated in future. I hacked up a version that does not write these
> files and just passed an strbuf (with the wrong message for fixup and
> squash commands) to do_commit() but I couldn't measure any significant
> time difference when running cherry-pick or rebase. I think
> eliminating the writes properly for rebase would require a bit of
> effort as the code would need to be restructured.

True. And it totally makes sense to go for the big bucks.

> diff --git a/sequencer.c b/sequencer.c
> index b8cf679751449591d6f97102904e060ebee9d7a1..0636d027e9e1cdebaab4802e5becd89e8398a425 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -592,6 +592,18 @@ static int read_env_script(struct argv_array *env)
>  	return 0;
>  }
>  
> +static char *get_author(const char* message)
> +{
> +	size_t len;
> +	const char *a;
> +
> +	a = find_commit_header(message, "author", &len);
> +	if (a)
> +		return xmemdupz(a, len);
> +
> +	return NULL;
> +}

I was surprised that there is no helper for that yet, so I looked, but it
seems that the three existing callers of `find_commit_header(...,
"author", ...)` want to split the ident line right away, and do not need
the duplicated buffer.

In short: the code added here is necessary.

> @@ -984,6 +996,151 @@ int print_commit_summary(const char *prefix, const struct object_id *oid,
>  	return ret;
>  }
>  
> +static int parse_head(struct commit **head)
> +{
> +	struct commit *current_head;
> +	struct object_id oid;
> +
> +	if (get_oid("HEAD", &oid)) {
> +		current_head = NULL;
> +	} else {
> +		current_head = lookup_commit_reference(&oid);
> +		if (!current_head)
> +			return error(_("could not parse HEAD"));
> +		if (oidcmp(&oid, &current_head->object.oid)) {
> +			warning(_("HEAD %s is not a commit!"),
> +				oid_to_hex(&oid));
> +		}
> +		if (parse_commit(current_head))
> +			return error(_("could not parse HEAD commit"));
> +	}
> +	*head = current_head;
> +
> +	return 0;
> +}
> +
> +static int try_to_commit(struct strbuf *msg, const char *author,
> +			 struct replay_opts *opts, unsigned int flags,
> +			 struct object_id *oid)

Since this is a file-local function, i.e. not in any way tied to a
process exit status, it should probably return -1 in the case of errors,
as Git does elsewhere, too.

> +{
> +	struct object_id tree;
> +	struct commit *current_head;
> +	struct commit_list *parents = NULL;
> +	struct commit_extra_header *extra = NULL;
> +	struct strbuf err = STRBUF_INIT;
> +	struct strbuf amend_msg = STRBUF_INIT;
> +	char *amend_author = NULL;
> +	const char *gpg_sign;
> +	enum cleanup_mode cleanup;
> +	int res = 0;
> +
> +	if (parse_head(&current_head))
> +		return -1;
> +
> +	if (flags & AMEND_MSG) {
> +		const char *exclude_gpgsig[2] = { "gpgsig", NULL };

Git's current source code seems to prefer to infer the array length; The
`2` is unnecessary here.

> +		const char *out_enc = get_commit_output_encoding();
> +		const char *message = logmsg_reencode(current_head, NULL,
> +						      out_enc);
> +
> +		if (!msg) {
> +			const char *body = NULL;
> +
> +			find_commit_subject(message, &body);

Maybe `orig_message` would be better here; I expected `body` to refer to
the part of the commit message *after* the subject, but reading the code
of `find_commit_subject()`, I find that it stores the beginning of the
commit message.

Dunno.

> +			msg = &amend_msg;
> +			strbuf_addstr(msg, body);
> +		}
> +		author = amend_author = get_author (message);

Please lose the space after the function name.

> +		unuse_commit_buffer(current_head, message);
> +		if (!author) {
> +			res = error(_("unable to parse commit author"));
> +			goto out;
> +		}
> +		parents = copy_commit_list(current_head->parents);
> +		extra = read_commit_extra_headers(current_head, exclude_gpgsig);
> +	} else if (current_head) {
> +		commit_list_insert(current_head, &parents);
> +	}
> +
> +	cleanup = (flags & CLEANUP_MSG) ? CLEANUP_ALL : default_msg_cleanup;
> +	if (cleanup != CLEANUP_NONE)
> +		strbuf_stripspace(msg, cleanup == CLEANUP_ALL);
> +	if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) {
> +		res = 1;
> +		goto out;
> +	}
> +
> +	gpg_sign = (opts->gpg_sign) ? opts->gpg_sign : default_gpg_sign;

Others will probably complain about those extra parentheses. I am not
offended by them, though.

> +	if (write_cache_as_tree(tree.hash, 0, NULL)) {
> +		res = error(_("git write-tree failed to write a tree"));
> +		goto out;
> +	}
> +
> +	if (!(flags & ALLOW_EMPTY) && !oidcmp(current_head ?
> +					      &current_head->tree->object.oid :
> +					      &empty_tree_oid, &tree)) {

I'll leave it to Junio to comment on the formatting here.

> +		res = 1;
> +		goto out;
> +	}
> +
> +	if (commit_tree_extended(msg->buf, msg->len, tree.hash, parents,
> +				 oid->hash, author, gpg_sign, extra)) {
> +		res = error(_("failed to write commit object"));
> +		goto out;
> +	}
> +
> +	if (update_head(current_head, oid, getenv("GIT_REFLOG_ACTION"), msg,
> +			&err)){
> +		res = error("%s", err.buf);
> +		goto out;
> +	}
> +
> +	if (flags & AMEND_MSG)
> +		commit_post_rewrite(current_head, oid);
> +
> +out:
> +	free_commit_extra_headers(extra);
> +	strbuf_release(&err);
> +	strbuf_release(&amend_msg);
> +	if (amend_author)
> +		free(amend_author);

Git's source code uses the fact that `free(NULL);` is essentially a no-op
(and certainly allowed) to avoid conditionals in such cases.

That would make the `if (amend_author)` unnecessary.

> +
> +	return res;
> +}
> +
> +static int do_commit(const char *msg_file, const char* author,
> +		     struct replay_opts *opts, unsigned int flags)
> +{
> +	int res = 1;

Same as above, the error code should most likely be -1 instead.

> +	if (~flags & EDIT_MSG && ~flags & VERIFY_MSG) {

I *think* it is more common to write `!(flags & EDIT_MSG)` in Git's source
code.

> +		struct object_id oid;
> +		struct strbuf sb = STRBUF_INIT;
> +
> +		if (msg_file && strbuf_read_file(&sb, msg_file, 2048) < 0)
> +			return error_errno(_("unable to read commit message "
> +					     "from '%s'"),
> +					   msg_file);
> +
> +		res = try_to_commit(msg_file ? &sb : NULL, author, opts, flags,
> +				    &oid);
> +		strbuf_release(&sb);
> +		if (res == 0) {

Usually, Git's source code uses `if (!res)` in such cases.

> +			unlink(git_path_cherry_pick_head());
> +			unlink(git_path_merge_msg());
> +			if (!is_rebase_i(opts))
> +				res = print_commit_summary(NULL, &oid,
> +						SUMMARY_SHOW_AUTHOR_DATE);
> +			return res;
> +		}

I wonder whether we should move the `return res;` one line lower, to avoid
falling through to call `run_git_commit()` if `try_to_commit()` failed...

> +	}
> +	if (res == 1)
> +		return run_git_commit(msg_file, opts, flags);

Maybe this code could be simplified even further by moving this
conditional to the beginning of the function, as:

	if ((flags & (EDIT_MSG | VERIFY_MSG)))
		return run_git_commit(msg_file, opts, flags);

But maybe I misunderstood and you really wanted to fall back on
`run_git_commit()` if `try_to_commit()` failed?

> +
> +	return res;
> +}
> +
>  static int is_original_commit_empty(struct commit *commit)
>  {
>  	const struct object_id *ptree_oid;
> @@ -1235,6 +1392,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  	struct object_id head;
>  	struct commit *base, *next, *parent;
>  	const char *base_label, *next_label;
> +	char *author = NULL;
>  	struct commit_message msg = { NULL, NULL, NULL, NULL };
>  	struct strbuf msgbuf = STRBUF_INIT;
>  	int res, unborn = 0, allow;
> @@ -1350,6 +1508,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  			strbuf_addstr(&msgbuf, oid_to_hex(&commit->object.oid));
>  			strbuf_addstr(&msgbuf, ")\n");
>  		}
> +		if (!is_fixup (command))
> +			author = get_author(msg.message);
>  	}
>  
>  	if (command == TODO_REWORD)
> @@ -1435,9 +1595,13 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  		goto leave;
>  	} else if (allow)
>  		flags |= ALLOW_EMPTY;
> -	if (!opts->no_commit)
> +	if (!opts->no_commit) {
>  fast_forward_edit:
> -		res = run_git_commit(msg_file, opts, flags);
> +		if (author || command == TODO_REVERT || (flags & AMEND_MSG))
> +			res = do_commit(msg_file, author, opts, flags);
> +		else
> +			res = error(_("unable to parse commit author"));

Would this be a bug here? Or do we expect `get_author()` to possibly fail?

> +	}
>  
>  	if (!res && final_fixup) {
>  		unlink(rebase_path_fixup_msg());
> @@ -1446,6 +1610,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  
>  leave:
>  	free_message(commit, &msg);
> +	if (author)
> +		free(author);

As above, please write an unconditional `free(author);` here.

All in all, this patch series was a nice an pleasant read. I am impressed
by the performance wins.

Thank you very much,
Dscho

  reply	other threads:[~2017-11-07  1:36 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-25 10:10 [RFC PATCH 0/8] sequencer: dont't fork git commit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 1/8] commit: move empty message checks to libgit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 2/8] commit: move code to update HEAD " Phillip Wood
2017-10-07  9:54   ` Junio C Hamano
2017-10-24 10:01     ` Phillip Wood
2017-10-24 12:41       ` Junio C Hamano
2017-09-25 10:10 ` [RFC PATCH 3/8] sequencer: refactor update_head() Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 4/8] commit: move post-rewrite code to libgit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 5/8] commit: move print_commit_summary() " Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 7/8] sequencer: load commit related config Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-06 11:27 ` [PATCH v1 0/8] sequencer: dont't fork git commit Phillip Wood
2017-11-06 11:27   ` [PATCH v1 1/8] commit: move empty message checks to libgit Phillip Wood
2017-11-07  0:43     ` Johannes Schindelin
2017-11-07 14:24       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 2/8] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-07  2:56     ` Junio C Hamano
2017-11-07  3:02       ` Johannes Schindelin
2017-11-07 14:26         ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 3/8] commit: move post-rewrite code to libgit Phillip Wood
2017-11-07  3:03     ` Junio C Hamano
2017-11-07 14:28       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 4/8] commit: move print_commit_summary() " Phillip Wood
2017-11-07  3:38     ` Junio C Hamano
2017-11-07 14:32       ` Phillip Wood
2017-11-08  1:04         ` Junio C Hamano
2017-11-06 11:27   ` [PATCH v1 5/8] sequencer: don't die in print_commit_summary() Phillip Wood
2017-11-07  4:18     ` Junio C Hamano
2017-11-07 10:24       ` Johannes Schindelin
2017-11-07 15:13       ` Junio C Hamano
2017-11-10 14:53         ` Phillip Wood
2017-11-10 18:05           ` Junio C Hamano
2017-11-13 11:11             ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-07  0:52     ` Johannes Schindelin
2017-11-07  4:52     ` Junio C Hamano
2017-11-07 14:46       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 7/8] sequencer: load commit related config Phillip Wood
2017-11-07  1:02     ` Johannes Schindelin
2017-11-07 10:50       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-07  1:36     ` Johannes Schindelin [this message]
2017-11-07 11:16       ` Phillip Wood
2017-11-07 14:09         ` Johannes Schindelin
2017-11-10 11:09 ` [PATCH v2 0/9] sequencer: dont't fork git commit Phillip Wood
2017-11-10 11:09   ` [PATCH v2 1/9] t3404: check intermediate squash messages Phillip Wood
2017-11-10 11:09   ` [PATCH v2 2/9] commit: move empty message checks to libgit Phillip Wood
2017-11-10 18:51     ` Ramsay Jones
2017-11-13 11:08       ` Phillip Wood
2017-11-10 11:09   ` [PATCH v2 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-10 18:36     ` Junio C Hamano
2017-11-13 11:25       ` Phillip Wood
2017-11-10 11:09   ` [PATCH v2 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-11-10 11:09   ` [PATCH v2 5/9] commit: move print_commit_summary() " Phillip Wood
2017-11-10 11:09   ` [PATCH v2 6/9] sequencer: don't die in print_commit_summary() Phillip Wood
2017-11-10 11:09   ` [PATCH v2 7/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-10 11:09   ` [PATCH v2 8/9] sequencer: load commit related config Phillip Wood
2017-11-10 11:09   ` [PATCH v2 9/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-10 19:21   ` [PATCH v2 0/9] sequencer: dont't fork git commit Junio C Hamano
2017-11-13 11:24     ` Phillip Wood
2017-11-14  1:15       ` Junio C Hamano
2017-11-17 11:34 ` [PATCH v3 0/8] sequencer: don't " Phillip Wood
2017-11-17 11:34   ` [PATCH v3 1/8] t3404: check intermediate squash messages Phillip Wood
2017-11-17 11:34   ` [PATCH v3 2/8] commit: move empty message checks to libgit Phillip Wood
2017-11-17 11:34   ` [PATCH v3 3/8] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-17 11:34   ` [PATCH v3 4/8] commit: move post-rewrite code to libgit Phillip Wood
2017-11-17 11:34   ` [PATCH v3 5/8] commit: move print_commit_summary() " Phillip Wood
2017-11-17 11:34   ` [PATCH v3 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-17 11:34   ` [PATCH v3 7/8] sequencer: load commit related config Phillip Wood
2017-11-17 11:34   ` [PATCH v3 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-18  3:41   ` [PATCH v3 0/8] sequencer: don't fork git commit Junio C Hamano
2017-11-18  3:57     ` Junio C Hamano
2017-11-18 11:32       ` Phillip Wood
2017-11-18 14:33       ` Phillip Wood
2017-11-24 11:07 ` [PATCH v4 0/9] " Phillip Wood
2017-11-24 11:07   ` [PATCH v4 1/9] t3404: check intermediate squash messages Phillip Wood
2017-11-24 11:07   ` [PATCH v4 2/9] commit: move empty message checks to libgit Phillip Wood
2017-11-24 11:07   ` [PATCH v4 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-24 11:07   ` [PATCH v4 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-11-24 11:07   ` [PATCH v4 5/9] commit: move print_commit_summary() " Phillip Wood
2017-11-24 11:07   ` [PATCH v4 6/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-24 11:07   ` [PATCH v4 7/9] sequencer: load commit related config Phillip Wood
2017-11-24 13:48     ` Junio C Hamano
2017-11-24 14:38       ` Phillip Wood
2017-12-04 18:30     ` Junio C Hamano
2017-12-05 11:21       ` Phillip Wood
2017-12-05 12:10         ` Phillip Wood
2017-12-09 19:05         ` Phillip Wood
2017-11-24 11:07   ` [PATCH v4 8/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-24 11:07   ` [PATCH v4 9/9] t3512/t3513: remove KNOWN_FAILURE_CHERRY_PICK_SEES_EMPTY_COMMIT=1 Phillip Wood
2017-12-04 19:24     ` Stefan Beller
2017-12-05 12:13       ` Phillip Wood
2017-12-11 14:13 ` [PATCH v5 0/9] sequencer: don't fork git commit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 1/9] t3404: check intermediate squash messages Phillip Wood
2017-12-11 14:13   ` [PATCH v5 2/9] commit: move empty message checks to libgit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 5/9] commit: move print_commit_summary() " Phillip Wood
2017-12-11 14:13   ` [PATCH v5 6/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-12-11 14:13   ` [PATCH v5 7/9] sequencer: load commit related config Phillip Wood
2017-12-11 18:53     ` Phillip Wood
2017-12-11 14:13   ` [PATCH v5 8/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2018-01-10 20:53     ` Jonathan Nieder
2018-01-10 22:40       ` Johannes Schindelin
2018-01-11 10:41         ` Phillip Wood
2018-01-11 20:21           ` Johannes Schindelin
2017-12-11 14:13   ` [PATCH v5 9/9] t3512/t3513: remove KNOWN_FAILURE_CHERRY_PICK_SEES_EMPTY_COMMIT=1 Phillip Wood
2017-12-11 23:44   ` [PATCH v5 0/9] sequencer: don't fork git commit Junio C Hamano
2017-12-12 10:32     ` Phillip Wood
2017-12-13 11:46     ` [PATCH] sequencer: improve config handling Phillip Wood
2017-12-20 18:33       ` Error in `git': free(): invalid pointer (was Re: [PATCH] sequencer: improve config handling) Kaartic Sivaraam
2017-12-21 14:06         ` Johannes Schindelin
2017-12-21 16:42           ` Kaartic Sivaraam
2017-12-22 11:49             ` Johannes Schindelin
2017-12-25  8:51               ` Kaartic Sivaraam
     [not found]         ` <18737953.1042351513802399608.JavaMail.defaultUser@defaultHost>
2017-12-21 15:02           ` Kaartic Sivaraam
2017-12-21 16:53         ` phillip.wood
2017-12-21 17:14           ` Kaartic Sivaraam
2017-12-22 10:47             ` phillip.wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1.1711070203230.6482@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).