From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Phillip Wood <phillip.wood@dunelm.org.uk>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1 8/8] sequencer: try to commit without forking 'git commit'
Date: Tue, 7 Nov 2017 02:36:30 +0100 (CET) [thread overview]
Message-ID: <alpine.DEB.2.21.1.1711070203230.6482@virtualbox> (raw)
In-Reply-To: <20171106112709.2121-9-phillip.wood@talktalk.net>
Hi Phillip,
On Mon, 6 Nov 2017, Phillip Wood wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> If the commit message does not need to be edited then create the
> commit without forking 'git commit'. Taking the best time of ten runs
> with a warm cache this reduces the time taken to cherry-pick 10
> commits by 27% (from 282ms to 204ms), and the time taken by 'git
> rebase --continue' to pick 10 commits by 45% (from 386ms to 212ms) on
> my computer running linux. Some of greater saving for rebase is
> because it longer wastes time creating the commit summary just to
I usually leave the grammar reviews to people who prefer to review grammar
over code, but in this case I think the "no" in "no longer" is rather
crucial.
> throw it away.
Those are impressive improvements, and I am certain that they will be even
more noticable on Windows, where creating processes is a lot more
expensive than on Linux (which is the reason why you will find a lot more
multi-threaded processes on Windows...).
> The code to create the commit is based on builtin/commit.c. It is
> slightly simplified as it doesn't have to deal with merges and
> modified so try and return an error rather than dying so that the
> sequencer exits cleanly, as it would when forking 'git commit'.
>
> Even when not forking 'git commit' the commit message is written to a
> file and CHERRY_PICK_HEAD is created unnecessarily. This could be
> eliminated in future. I hacked up a version that does not write these
> files and just passed an strbuf (with the wrong message for fixup and
> squash commands) to do_commit() but I couldn't measure any significant
> time difference when running cherry-pick or rebase. I think
> eliminating the writes properly for rebase would require a bit of
> effort as the code would need to be restructured.
True. And it totally makes sense to go for the big bucks.
> diff --git a/sequencer.c b/sequencer.c
> index b8cf679751449591d6f97102904e060ebee9d7a1..0636d027e9e1cdebaab4802e5becd89e8398a425 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -592,6 +592,18 @@ static int read_env_script(struct argv_array *env)
> return 0;
> }
>
> +static char *get_author(const char* message)
> +{
> + size_t len;
> + const char *a;
> +
> + a = find_commit_header(message, "author", &len);
> + if (a)
> + return xmemdupz(a, len);
> +
> + return NULL;
> +}
I was surprised that there is no helper for that yet, so I looked, but it
seems that the three existing callers of `find_commit_header(...,
"author", ...)` want to split the ident line right away, and do not need
the duplicated buffer.
In short: the code added here is necessary.
> @@ -984,6 +996,151 @@ int print_commit_summary(const char *prefix, const struct object_id *oid,
> return ret;
> }
>
> +static int parse_head(struct commit **head)
> +{
> + struct commit *current_head;
> + struct object_id oid;
> +
> + if (get_oid("HEAD", &oid)) {
> + current_head = NULL;
> + } else {
> + current_head = lookup_commit_reference(&oid);
> + if (!current_head)
> + return error(_("could not parse HEAD"));
> + if (oidcmp(&oid, ¤t_head->object.oid)) {
> + warning(_("HEAD %s is not a commit!"),
> + oid_to_hex(&oid));
> + }
> + if (parse_commit(current_head))
> + return error(_("could not parse HEAD commit"));
> + }
> + *head = current_head;
> +
> + return 0;
> +}
> +
> +static int try_to_commit(struct strbuf *msg, const char *author,
> + struct replay_opts *opts, unsigned int flags,
> + struct object_id *oid)
Since this is a file-local function, i.e. not in any way tied to a
process exit status, it should probably return -1 in the case of errors,
as Git does elsewhere, too.
> +{
> + struct object_id tree;
> + struct commit *current_head;
> + struct commit_list *parents = NULL;
> + struct commit_extra_header *extra = NULL;
> + struct strbuf err = STRBUF_INIT;
> + struct strbuf amend_msg = STRBUF_INIT;
> + char *amend_author = NULL;
> + const char *gpg_sign;
> + enum cleanup_mode cleanup;
> + int res = 0;
> +
> + if (parse_head(¤t_head))
> + return -1;
> +
> + if (flags & AMEND_MSG) {
> + const char *exclude_gpgsig[2] = { "gpgsig", NULL };
Git's current source code seems to prefer to infer the array length; The
`2` is unnecessary here.
> + const char *out_enc = get_commit_output_encoding();
> + const char *message = logmsg_reencode(current_head, NULL,
> + out_enc);
> +
> + if (!msg) {
> + const char *body = NULL;
> +
> + find_commit_subject(message, &body);
Maybe `orig_message` would be better here; I expected `body` to refer to
the part of the commit message *after* the subject, but reading the code
of `find_commit_subject()`, I find that it stores the beginning of the
commit message.
Dunno.
> + msg = &amend_msg;
> + strbuf_addstr(msg, body);
> + }
> + author = amend_author = get_author (message);
Please lose the space after the function name.
> + unuse_commit_buffer(current_head, message);
> + if (!author) {
> + res = error(_("unable to parse commit author"));
> + goto out;
> + }
> + parents = copy_commit_list(current_head->parents);
> + extra = read_commit_extra_headers(current_head, exclude_gpgsig);
> + } else if (current_head) {
> + commit_list_insert(current_head, &parents);
> + }
> +
> + cleanup = (flags & CLEANUP_MSG) ? CLEANUP_ALL : default_msg_cleanup;
> + if (cleanup != CLEANUP_NONE)
> + strbuf_stripspace(msg, cleanup == CLEANUP_ALL);
> + if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) {
> + res = 1;
> + goto out;
> + }
> +
> + gpg_sign = (opts->gpg_sign) ? opts->gpg_sign : default_gpg_sign;
Others will probably complain about those extra parentheses. I am not
offended by them, though.
> + if (write_cache_as_tree(tree.hash, 0, NULL)) {
> + res = error(_("git write-tree failed to write a tree"));
> + goto out;
> + }
> +
> + if (!(flags & ALLOW_EMPTY) && !oidcmp(current_head ?
> + ¤t_head->tree->object.oid :
> + &empty_tree_oid, &tree)) {
I'll leave it to Junio to comment on the formatting here.
> + res = 1;
> + goto out;
> + }
> +
> + if (commit_tree_extended(msg->buf, msg->len, tree.hash, parents,
> + oid->hash, author, gpg_sign, extra)) {
> + res = error(_("failed to write commit object"));
> + goto out;
> + }
> +
> + if (update_head(current_head, oid, getenv("GIT_REFLOG_ACTION"), msg,
> + &err)){
> + res = error("%s", err.buf);
> + goto out;
> + }
> +
> + if (flags & AMEND_MSG)
> + commit_post_rewrite(current_head, oid);
> +
> +out:
> + free_commit_extra_headers(extra);
> + strbuf_release(&err);
> + strbuf_release(&amend_msg);
> + if (amend_author)
> + free(amend_author);
Git's source code uses the fact that `free(NULL);` is essentially a no-op
(and certainly allowed) to avoid conditionals in such cases.
That would make the `if (amend_author)` unnecessary.
> +
> + return res;
> +}
> +
> +static int do_commit(const char *msg_file, const char* author,
> + struct replay_opts *opts, unsigned int flags)
> +{
> + int res = 1;
Same as above, the error code should most likely be -1 instead.
> + if (~flags & EDIT_MSG && ~flags & VERIFY_MSG) {
I *think* it is more common to write `!(flags & EDIT_MSG)` in Git's source
code.
> + struct object_id oid;
> + struct strbuf sb = STRBUF_INIT;
> +
> + if (msg_file && strbuf_read_file(&sb, msg_file, 2048) < 0)
> + return error_errno(_("unable to read commit message "
> + "from '%s'"),
> + msg_file);
> +
> + res = try_to_commit(msg_file ? &sb : NULL, author, opts, flags,
> + &oid);
> + strbuf_release(&sb);
> + if (res == 0) {
Usually, Git's source code uses `if (!res)` in such cases.
> + unlink(git_path_cherry_pick_head());
> + unlink(git_path_merge_msg());
> + if (!is_rebase_i(opts))
> + res = print_commit_summary(NULL, &oid,
> + SUMMARY_SHOW_AUTHOR_DATE);
> + return res;
> + }
I wonder whether we should move the `return res;` one line lower, to avoid
falling through to call `run_git_commit()` if `try_to_commit()` failed...
> + }
> + if (res == 1)
> + return run_git_commit(msg_file, opts, flags);
Maybe this code could be simplified even further by moving this
conditional to the beginning of the function, as:
if ((flags & (EDIT_MSG | VERIFY_MSG)))
return run_git_commit(msg_file, opts, flags);
But maybe I misunderstood and you really wanted to fall back on
`run_git_commit()` if `try_to_commit()` failed?
> +
> + return res;
> +}
> +
> static int is_original_commit_empty(struct commit *commit)
> {
> const struct object_id *ptree_oid;
> @@ -1235,6 +1392,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
> struct object_id head;
> struct commit *base, *next, *parent;
> const char *base_label, *next_label;
> + char *author = NULL;
> struct commit_message msg = { NULL, NULL, NULL, NULL };
> struct strbuf msgbuf = STRBUF_INIT;
> int res, unborn = 0, allow;
> @@ -1350,6 +1508,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
> strbuf_addstr(&msgbuf, oid_to_hex(&commit->object.oid));
> strbuf_addstr(&msgbuf, ")\n");
> }
> + if (!is_fixup (command))
> + author = get_author(msg.message);
> }
>
> if (command == TODO_REWORD)
> @@ -1435,9 +1595,13 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
> goto leave;
> } else if (allow)
> flags |= ALLOW_EMPTY;
> - if (!opts->no_commit)
> + if (!opts->no_commit) {
> fast_forward_edit:
> - res = run_git_commit(msg_file, opts, flags);
> + if (author || command == TODO_REVERT || (flags & AMEND_MSG))
> + res = do_commit(msg_file, author, opts, flags);
> + else
> + res = error(_("unable to parse commit author"));
Would this be a bug here? Or do we expect `get_author()` to possibly fail?
> + }
>
> if (!res && final_fixup) {
> unlink(rebase_path_fixup_msg());
> @@ -1446,6 +1610,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>
> leave:
> free_message(commit, &msg);
> + if (author)
> + free(author);
As above, please write an unconditional `free(author);` here.
All in all, this patch series was a nice an pleasant read. I am impressed
by the performance wins.
Thank you very much,
Dscho
next prev parent reply other threads:[~2017-11-07 1:36 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-25 10:10 [RFC PATCH 0/8] sequencer: dont't fork git commit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 1/8] commit: move empty message checks to libgit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 2/8] commit: move code to update HEAD " Phillip Wood
2017-10-07 9:54 ` Junio C Hamano
2017-10-24 10:01 ` Phillip Wood
2017-10-24 12:41 ` Junio C Hamano
2017-09-25 10:10 ` [RFC PATCH 3/8] sequencer: refactor update_head() Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 4/8] commit: move post-rewrite code to libgit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 5/8] commit: move print_commit_summary() " Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 7/8] sequencer: load commit related config Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-06 11:27 ` [PATCH v1 0/8] sequencer: dont't fork git commit Phillip Wood
2017-11-06 11:27 ` [PATCH v1 1/8] commit: move empty message checks to libgit Phillip Wood
2017-11-07 0:43 ` Johannes Schindelin
2017-11-07 14:24 ` Phillip Wood
2017-11-06 11:27 ` [PATCH v1 2/8] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-07 2:56 ` Junio C Hamano
2017-11-07 3:02 ` Johannes Schindelin
2017-11-07 14:26 ` Phillip Wood
2017-11-06 11:27 ` [PATCH v1 3/8] commit: move post-rewrite code to libgit Phillip Wood
2017-11-07 3:03 ` Junio C Hamano
2017-11-07 14:28 ` Phillip Wood
2017-11-06 11:27 ` [PATCH v1 4/8] commit: move print_commit_summary() " Phillip Wood
2017-11-07 3:38 ` Junio C Hamano
2017-11-07 14:32 ` Phillip Wood
2017-11-08 1:04 ` Junio C Hamano
2017-11-06 11:27 ` [PATCH v1 5/8] sequencer: don't die in print_commit_summary() Phillip Wood
2017-11-07 4:18 ` Junio C Hamano
2017-11-07 10:24 ` Johannes Schindelin
2017-11-07 15:13 ` Junio C Hamano
2017-11-10 14:53 ` Phillip Wood
2017-11-10 18:05 ` Junio C Hamano
2017-11-13 11:11 ` Phillip Wood
2017-11-06 11:27 ` [PATCH v1 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-07 0:52 ` Johannes Schindelin
2017-11-07 4:52 ` Junio C Hamano
2017-11-07 14:46 ` Phillip Wood
2017-11-06 11:27 ` [PATCH v1 7/8] sequencer: load commit related config Phillip Wood
2017-11-07 1:02 ` Johannes Schindelin
2017-11-07 10:50 ` Phillip Wood
2017-11-06 11:27 ` [PATCH v1 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-07 1:36 ` Johannes Schindelin [this message]
2017-11-07 11:16 ` Phillip Wood
2017-11-07 14:09 ` Johannes Schindelin
2017-11-10 11:09 ` [PATCH v2 0/9] sequencer: dont't fork git commit Phillip Wood
2017-11-10 11:09 ` [PATCH v2 1/9] t3404: check intermediate squash messages Phillip Wood
2017-11-10 11:09 ` [PATCH v2 2/9] commit: move empty message checks to libgit Phillip Wood
2017-11-10 18:51 ` Ramsay Jones
2017-11-13 11:08 ` Phillip Wood
2017-11-10 11:09 ` [PATCH v2 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-10 18:36 ` Junio C Hamano
2017-11-13 11:25 ` Phillip Wood
2017-11-10 11:09 ` [PATCH v2 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-11-10 11:09 ` [PATCH v2 5/9] commit: move print_commit_summary() " Phillip Wood
2017-11-10 11:09 ` [PATCH v2 6/9] sequencer: don't die in print_commit_summary() Phillip Wood
2017-11-10 11:09 ` [PATCH v2 7/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-10 11:09 ` [PATCH v2 8/9] sequencer: load commit related config Phillip Wood
2017-11-10 11:09 ` [PATCH v2 9/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-10 19:21 ` [PATCH v2 0/9] sequencer: dont't fork git commit Junio C Hamano
2017-11-13 11:24 ` Phillip Wood
2017-11-14 1:15 ` Junio C Hamano
2017-11-17 11:34 ` [PATCH v3 0/8] sequencer: don't " Phillip Wood
2017-11-17 11:34 ` [PATCH v3 1/8] t3404: check intermediate squash messages Phillip Wood
2017-11-17 11:34 ` [PATCH v3 2/8] commit: move empty message checks to libgit Phillip Wood
2017-11-17 11:34 ` [PATCH v3 3/8] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-17 11:34 ` [PATCH v3 4/8] commit: move post-rewrite code to libgit Phillip Wood
2017-11-17 11:34 ` [PATCH v3 5/8] commit: move print_commit_summary() " Phillip Wood
2017-11-17 11:34 ` [PATCH v3 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-17 11:34 ` [PATCH v3 7/8] sequencer: load commit related config Phillip Wood
2017-11-17 11:34 ` [PATCH v3 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-18 3:41 ` [PATCH v3 0/8] sequencer: don't fork git commit Junio C Hamano
2017-11-18 3:57 ` Junio C Hamano
2017-11-18 11:32 ` Phillip Wood
2017-11-18 14:33 ` Phillip Wood
2017-11-24 11:07 ` [PATCH v4 0/9] " Phillip Wood
2017-11-24 11:07 ` [PATCH v4 1/9] t3404: check intermediate squash messages Phillip Wood
2017-11-24 11:07 ` [PATCH v4 2/9] commit: move empty message checks to libgit Phillip Wood
2017-11-24 11:07 ` [PATCH v4 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-24 11:07 ` [PATCH v4 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-11-24 11:07 ` [PATCH v4 5/9] commit: move print_commit_summary() " Phillip Wood
2017-11-24 11:07 ` [PATCH v4 6/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-24 11:07 ` [PATCH v4 7/9] sequencer: load commit related config Phillip Wood
2017-11-24 13:48 ` Junio C Hamano
2017-11-24 14:38 ` Phillip Wood
2017-12-04 18:30 ` Junio C Hamano
2017-12-05 11:21 ` Phillip Wood
2017-12-05 12:10 ` Phillip Wood
2017-12-09 19:05 ` Phillip Wood
2017-11-24 11:07 ` [PATCH v4 8/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-24 11:07 ` [PATCH v4 9/9] t3512/t3513: remove KNOWN_FAILURE_CHERRY_PICK_SEES_EMPTY_COMMIT=1 Phillip Wood
2017-12-04 19:24 ` Stefan Beller
2017-12-05 12:13 ` Phillip Wood
2017-12-11 14:13 ` [PATCH v5 0/9] sequencer: don't fork git commit Phillip Wood
2017-12-11 14:13 ` [PATCH v5 1/9] t3404: check intermediate squash messages Phillip Wood
2017-12-11 14:13 ` [PATCH v5 2/9] commit: move empty message checks to libgit Phillip Wood
2017-12-11 14:13 ` [PATCH v5 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-12-11 14:13 ` [PATCH v5 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-12-11 14:13 ` [PATCH v5 5/9] commit: move print_commit_summary() " Phillip Wood
2017-12-11 14:13 ` [PATCH v5 6/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-12-11 14:13 ` [PATCH v5 7/9] sequencer: load commit related config Phillip Wood
2017-12-11 18:53 ` Phillip Wood
2017-12-11 14:13 ` [PATCH v5 8/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2018-01-10 20:53 ` Jonathan Nieder
2018-01-10 22:40 ` Johannes Schindelin
2018-01-11 10:41 ` Phillip Wood
2018-01-11 20:21 ` Johannes Schindelin
2017-12-11 14:13 ` [PATCH v5 9/9] t3512/t3513: remove KNOWN_FAILURE_CHERRY_PICK_SEES_EMPTY_COMMIT=1 Phillip Wood
2017-12-11 23:44 ` [PATCH v5 0/9] sequencer: don't fork git commit Junio C Hamano
2017-12-12 10:32 ` Phillip Wood
2017-12-13 11:46 ` [PATCH] sequencer: improve config handling Phillip Wood
2017-12-20 18:33 ` Error in `git': free(): invalid pointer (was Re: [PATCH] sequencer: improve config handling) Kaartic Sivaraam
2017-12-21 14:06 ` Johannes Schindelin
2017-12-21 16:42 ` Kaartic Sivaraam
2017-12-22 11:49 ` Johannes Schindelin
2017-12-25 8:51 ` Kaartic Sivaraam
[not found] ` <18737953.1042351513802399608.JavaMail.defaultUser@defaultHost>
2017-12-21 15:02 ` Kaartic Sivaraam
2017-12-21 16:53 ` phillip.wood
2017-12-21 17:14 ` Kaartic Sivaraam
2017-12-22 10:47 ` phillip.wood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.21.1.1711070203230.6482@virtualbox \
--to=johannes.schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).