Hi Kuba,

On Wed, 31 Aug 2016, Jakub Narębski wrote:

> W dniu 29.08.2016 o 10:05, Johannes Schindelin pisze:
> 
> > When we came up with the "sequencer" idea, we really wanted to have
> > kind of a plumbing equivalent of the interactive rebase. Hence the
> > choice of words: the "todo" script, a "pick", etc.
> > 
> > However, when it came time to implement the entire shebang, somehow this
> > idea got lost and the sequencer was used as working horse for
> > cherry-pick and revert instead. So as not to interfere with the
> > interactive rebase, it even uses a separate directory to store its
> > state.
> > 
> > Furthermore, it also is stupidly strict about the "todo" script it
> > accepts: while it parses commands in a way that was *designed* to be
> > similar to the interactive rebase, it then goes on to *error out* if the
> > commands disagree with the overall action (cherry-pick or revert).
> 
> Does this mean that after the change you would be able to continue
> "git revert" with "git cherry-pick --continue", and vice versa?  Or that
> it would be possible for git-cherry-pick to do reverts (e.g. with ^<rev>)?

I guess that I allow that now. Is it harmful? I dunno.

> > Let's just bite the bullet and rewrite the entire parser; the code now
> > becomes not only more elegant: it allows us to go on and teach the
> > sequencer how to parse *true* "todo" scripts as used by the interactive
> > rebase itself. In a way, the sequencer is about to grow up to do its
> > older brother's job. Better.
> 
> Sidenote: this is not your fault, but Git doesn't do a good job on
> changes which are mostly rewrites, trying to match stray '}' and the
> like in generated diff.  I wonder if existing diff heuristic options
> could help here.

I guess --patience would have helped. Or Michael's upcoming
diff-heuristics.

> > While at it, do not stop at the first problem, but list *all* of the
> > problems. This helps the user by allowing to address all issues in
> > one go rather than going back and forth until the todo list is valid.
> 
> That is also a good change, though I wonder how often users need
> to worry about this outside interactive rebase case.  If it is
> preparation for rebase -i, where instruction list is written by
> prone to errors human, it would be nice to have this information
> in the commit message.

Okay.

> > diff --git a/sequencer.c b/sequencer.c
> > index 982b6e9..cbdce6d 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -473,7 +473,26 @@ static int allow_empty(struct replay_opts *opts, struct commit *commit)
> >  		return 1;
> >  }
> >  
> > -static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
> > +enum todo_command {
> > +	TODO_PICK,
> > +	TODO_REVERT
> > +};
> 
> Do we have a naming convention for enums elements?  Or are we explicitly
> making enums and #defines interchangeable?  I wonder...
> 
> ...uh, I see we don't have naming convention, but all caps snake-case
> names dominate:
> 
>   $ git grep -A2 'enum .* {'
>   [...]
>   diff.h:enum color_diff {
>   diff.h- DIFF_RESET = 0,
>   diff.h- DIFF_CONTEXT = 1,
>   --
>   dir.c:enum path_treatment {
>   dir.c-  path_none = 0,
>   dir.c-  path_recurse,
>   --
> 
> Shouldn't we say 'TODO_PICK = 0' explicitly, though?

Sure.

> > +static const char *todo_command_strings[] = {
> > +	"pick",
> > +	"revert"
> > +};
> 
> It's a bit pity that we cannot use designated inits, and hanging comma,
> (from ISO C99 standard).  That is:
> 
>   +static const char *todo_command_strings[] = {
>   +	[TODO_PICK]   = "pick",
>   +	[TODO_REVERT] = "revert",
>   +};

I agree, it is a pity. I could do something like I did in fsck.c:

	#define FOREACH_TODO_COMMAND(FUNC) \
		FUNC(PICK, "pick") \
		FUNC(REVERT, "revert")

	#define COMMAND_ID(id, string) TODO_##id,
	enum todo_command {
		FOREACH_TODO_COMMAND(COMMAND_ID)
		TODO_END
	};
	#undef COMMAND_ID

	#define COMMAND_ID(id, string) string,
	static const char *todo_command_string[] = {
		FOREACH_TODO_COMMAND(COMMAND_ID)
		NULL
	};
	#undef COMMAND_ID

However, this is not even readable, let alone any other type of an
improvement. So I won't.

> > @@ -548,7 +568,7 @@ static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
> 
> From here on changes are about
> 
>   s/opts->action == REPLAY_\(PICK\|REVERT\)/command == TODO_\1/
> 
> Do we still need opts->action, or it is just needed less,
> and it is 'todo' instruction that decides about command
> (as it should)?

We need opts->action. For example, the state directory changes depending
on it: REPLAY_INTERACTIVE_REBASE stores its stuff in
git_path("rebase-merge").

There is lots more behavior that also changes depending on opts->action.

> > [...]
> >  	if (res) {
> > -		error(opts->action == REPLAY_REVERT
> > +		error(command == TODO_REVERT
> >  		      ? _("could not revert %s... %s")
> >  		      : _("could not apply %s... %s"),
> >  		      find_unique_abbrev(commit->object.oid.hash, DEFAULT_ABBREV),
> 
> And here those changes end.
> 
>   s/opts->action == REPLAY_\(PICK\|REVERT\)/command == TODO_\1/
> 
> I wonder if Coccinelle / Undebt would help here; or would simple
> sed or query-and-replace-regexp be enough...

I did this by hand, to verify that I did nothing idiotic.

> > @@ -683,116 +703,107 @@ static int read_and_refresh_cache(struct replay_opts *opts)
> >  	return 0;
> >  }
> >  
> > -static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
> > -		struct replay_opts *opts)
> > +struct todo_item {
> > +	enum todo_command command;
> > +	struct commit *commit;
> > +	size_t offset_in_buf;
> > +};
> > +
> > +struct todo_list {
> > +	struct strbuf buf;
> > +	struct todo_item *items;
> > +	int nr, alloc, current;
> > +};
> 
> So there should be s/commit_list [*]{1,2}todo_list/todo_list *todo_list/
> from here on?

Almost, but not quite.

> Hmmm... commit_list is, as defined in commit.h, a linked list.

That is the most prominent reason why the rest is not a mindless
conversion from commit_list to todo_list.

And we need todo_list as an array, because we need to be able to peek, or
even move, backwards from the current command.

> Here todo_list uses growable array implementation of list.  Which
> is I guess better on current CPU architecture, with slow memory,
> limited-size caches, and adjacency prefetching.

That is not the reason that an array is used here. The array allows us
much more flexibility.

One of the major performance improvements will come at the very end, for
example: the reordering of the fixup!/squash! lines. And that would be a
*major* pain to do if the todo_list were still a linked list.

> > +#define TODO_LIST_INIT { STRBUF_INIT, NULL, 0, 0, 0 }
> 
> Same as with other patches in this series, it would be enough to
> 
>   +#define TODO_LIST_INIT { STRBUF_INIT }

As it happens, after Hannes' comment about REPLAY_OPTIONS_INIT, I already
had changed TODO_LIST_INIT as indicated. I just had no time to send out
another iteration (besides, I wanted to give the sequencer-i patch series
more visibility).

> You are consistent.

Thank you!

> > -	struct commit_list *cur = NULL;
> > -	const char *sha1_abbrev = NULL;
> > -	const char *action_str = opts->action == REPLAY_REVERT ? "revert" : "pick";
> > -	const char *subject;
> > -	int subject_len;
> > +	strbuf_release(&todo_list->buf);
> > +	free(todo_list->items);
> > +	todo_list->items = NULL;
> > +	todo_list->nr = todo_list->alloc = 0;
> > +}
> >  
> > -	for (cur = todo_list; cur; cur = cur->next) {
> > -		const char *commit_buffer = get_commit_buffer(cur->item, NULL);
> > -		sha1_abbrev = find_unique_abbrev(cur->item->object.oid.hash, DEFAULT_ABBREV);
> > -		subject_len = find_commit_subject(commit_buffer, &subject);
> > -		strbuf_addf(buf, "%s %s %.*s\n", action_str, sha1_abbrev,
> > -			subject_len, subject);
> > -		unuse_commit_buffer(cur->item, commit_buffer);
> > -	}
> > -	return 0;
> > +struct todo_item *append_todo(struct todo_list *todo_list)
> 
> Errr... I don't quite understand the name of this function.
> What are you appending here to the todo_list?

A new item.

> Compare string_list_append() and string_list_append_nodup(),
> where the second parameter is item to append.

Yes, that is correct. In the case of a todo_item, things are a lot more
complicated, though. Some of the values have to be determined tediously
(such as the offset and length of the oneline after the "pick <oid>"
command). I just put those values directly into the newly allocated item,
is all.

> > +	ALLOC_GROW(todo_list->items, todo_list->nr + 1, todo_list->alloc);
> > +	return todo_list->items + todo_list->nr++;
> >  }
> >  
> > -static struct commit *parse_insn_line(char *bol, char *eol, struct replay_opts *opts)
> > +static int parse_insn_line(struct todo_item *item, const char *bol, char *eol)
> 
> Why the change of return type?  

Because it makes no sense to return a commit here because not all commands
are about commits (think rebase -i's `exec`). It makes tons of sense to
return an error condition, though.

> Why now struct todo_item is first when struct replay_opts was last?

Those play very, very different roles.

The opts parameter used to provide parse_insn_line() with enough
information to complain loudly when the overall command was not identical
to the parsed command.

The item parameter is a receptacle for the parsed data. It will contain
the pointer to the commit that was previously returned, if any. But it
will also contain much more information, such as the command, the oneline,
the offset in the buffer, etc etc

So "opts" was an "in" parameter while "item" is an "out" one. Apples and
oranges.

> > +	for (i = 0; i < ARRAY_SIZE(todo_command_strings); i++)
> > +		if (skip_prefix(bol, todo_command_strings[i], &bol)) {
> 
> skip_prefix() is such a nice abstraction...
> 
> > +			item->command = i;
> > +			break;
> > +		}
> 
> Nice.  Replacing if-elsif chain with loop.  
> 
> I guess any hashmap would be serious overkill, as there are and would be
> only a few actions possible.

If at all, we should use a trie here. But as you said: overkill to the
max.

> > +	if (i >= ARRAY_SIZE(todo_command_strings))
> > +		return -1;
> >  
> >  	/* Eat up extra spaces/ tabs before object name */
> >  	padding = strspn(bol, " \t");
> >  	if (!padding)
> > -		return NULL;
> > +		return -1;
> >  	bol += padding;
> >  
> > -	end_of_object_name = bol + strcspn(bol, " \t\n");
> > +	end_of_object_name = (char *) bol + strcspn(bol, " \t\n");
> 
> Why is this cast needed?

Because bol is a "const char *" and we need to put "NUL" temporarily to
*end_of_object_name:

> >  	saved = *end_of_object_name;
> >  	*end_of_object_name = '\0';
> >  	status = get_sha1(bol, commit_sha1);
> >  	*end_of_object_name = saved;

Technically, this would have made a fine excuse to teach get_sha1() a mode
where it expects a length parameter instead of relying on a NUL-terminated
string.

Practically, such fine excuses cost me months in this rebase--helper
project already, and I need to protect my time better.

> > -	/*
> > -	 * Verify that the action matches up with the one in
> > -	 * opts; we don't support arbitrary instructions
> > -	 */
> > -	if (action != opts->action) {
> > -		if (action == REPLAY_REVERT)
> > -		      error((opts->action == REPLAY_REVERT)
> > -			    ? _("Cannot revert during another revert.")
> 
> Errr... could the above ever happen?  Namely
> 
>   action != opts->action && action == REPLAY_REVERT && opts->action == REPLAY_REVERT
> 
> Surely not.

Your reply pointed to the very circumstance when this may happen: `git
cherry-pick --continue` after an interrupted `git revert`.

But then, I remove that code here, so I should not try to defend it.

> > -			    : _("Cannot revert during a cherry-pick."));
> > -		else
> > -		      error((opts->action == REPLAY_REVERT)
> > -			    ? _("Cannot cherry-pick during a revert.")
> > -			    : _("Cannot cherry-pick during another cherry-pick."));
> > -		return NULL;
> > -	}
> 
> Anyway, while it is / would be a good idea to prevent starting any
> sequencer-based command (cherry-pick, revert, soon rebase -i) when
> other command is in progress (cherry-pick, revert, soon rebase -i).
> That is, if cherry-pick / revert waits for user action, you cannot
> run another cherry-pick or revert.
> 
> Which I guess the above code was not about...

It was about that, though.

It went about it in a pretty round-about way: opts->action comes from the
name of the command ("was I called as `git revert` or `git cherry-pick`?")
and action comes from the todo script, which was assumed to be written by
a previous run of the sequencer, using the then-current value of
opts->action.

So it wrote that command into *every single line* of the todo script, *for
the sole purpose* of verifying that it was the same action when running
via --continue.

As I said earlier, I would not complain at all if an interrupted `git
revert` could be continued via `git cherry-pick --continue`.

If that is not desirable, I can reintroduce that overzealous check, but
that will have to wait until after v2.10.0. And it would require an
argument that convinces me.

> > +		item = append_todo(todo_list);
> 
> A better name, in my personal option, would be
> 
>   +		item = todo_list_next(todo_list);
> 
> Or todo_next(todo_list).

That sounds more like a function that performs the next command in the
todo_list.

While I agree that naming is hard, I still think that `append_todo()` with
the todo_list as single parameter and returning a todo_item is pretty much
self-explanatory: it appends a new item to the todo_list and returns a
pointer to it.

> > +		item->offset_in_buf = p - todo_list->buf.buf;
> > +		if (parse_insn_line(item, p, eol)) {
> > +			error("Invalid line: %.*s", (int)(eol - p), p);
> 
> This error message should, I think, be also translatable:
> 
>   +			error(_("Invalid line: %.*s"), (int)(eol - p), p);
> 
> > +			res |= error(_("Could not parse line %d."), i);

Sure. In the meantime, I consolidated those two error()s into one, and now
I also marked it translatable.

> BTW. would be we able to show where exactly there was problem parsing,
> that is at which character in line?  Or is it something for the future?

Maybe for the future.

> > -static int read_populate_todo(struct commit_list **todo_list,
> > +static int read_populate_todo(struct todo_list *todo_list,
> >  			struct replay_opts *opts)
> >  {
> >  	const char *todo_file = get_todo_path(opts);
> 
> If I understand it correctly, replay_opts is used only to find out
> correct todo_file, isn't it?

Probably. Maybe also to make certain code paths conditional on rebase -i
mode. Maybe also to figure out whether we run in verbose mode in the
future. Or something.

Think of this `read_populate_todo()` function more as if it were a method
of the "replay class", and the "opts" parameter is kind of "self" or
"this" or whatever it is called in your favorite object-oriented language.


> > -	if (strbuf_read(&buf, fd, 0) < 0) {
> > +	if (strbuf_read(&todo_list->buf, fd, 0) < 0) {
> >  		close(fd);
> > -		strbuf_release(&buf);
> 
> A question: when is todo_list->buf released?

Why, I am glad you asked! It is released in todo_list_release(), called at
the end e.g. of sequencer_continue().

> > -static int walk_revs_populate_todo(struct commit_list **todo_list,
> > +static int walk_revs_populate_todo(struct todo_list *todo_list,
> >  				struct replay_opts *opts)
> >  {
> > +	enum todo_command command = opts->action == REPLAY_PICK ?
> > +		TODO_PICK : TODO_REVERT;
> >  	struct commit *commit;
> > -	struct commit_list **next;
> >  
> >  	if (prepare_revs(opts))
> >  		return -1;
> >  
> > -	next = todo_list;
> > -	while ((commit = get_revision(opts->revs)))
> > -		next = commit_list_append(commit, next);
> > +	while ((commit = get_revision(opts->revs))) {
> > +		struct todo_item *item = append_todo(todo_list);
> > +		const char *commit_buffer = get_commit_buffer(commit, NULL);
> 
> I see that you are creating todo file contents while walking revision list,
> something that was left for later in current / previous implementation
> of the sequencer...

Not really. This function was always about generating a todo_list. It just
did not format it yet.

With the change of keeping the original formatting of the todo script
instead of re-formatting it in save_todo(), this function now has to
format the todo_list itself.

> > +		const char *subject;
> > +		int subject_len;
> > +
> > +		item->command = command;
> > +		item->commit = commit;
> > +		item->offset_in_buf = todo_list->buf.len;
> > +		subject_len = find_commit_subject(commit_buffer, &subject);
> > +		strbuf_addf(&todo_list->buf, "%s %s %.*s\n",
> > +			opts->action == REPLAY_PICK ?  "pick" : "revert",
> 
> Wouldn't it be simpler to use
> 
>   +			todo_command_strings[command],
> 
> Also, this string does not change during the loop, though I guess
> compiler should be able to optimize it.

Sure!

> > +			find_unique_abbrev(commit->object.oid.hash,
> > +				DEFAULT_ABBREV),
> > +			subject_len, subject);
> 
> ...Did format of the 'todo' file changed?  And if yes, was it in backward
> compatible way, so that "git revert" or "git cherry-pick" started with
> old version of Git can be continued with new version, and what is also
> important (for somebody who sometimes uses system-installed Git, and
> sometimes user-compiled one) the reverse: started with new, continued
> with old?

The old format and the new format are compatible. In fact, sequencer's
format was based on rebase -i's format (which makes it all the more
surprising how much the processing deviated).

> > @@ -964,30 +990,24 @@ static int sequencer_rollback(struct replay_opts *opts)
> >  	return -1;
> >  }
> >  
> > -static int save_todo(struct commit_list *todo_list, struct replay_opts *opts)
> > +static int save_todo(struct todo_list *todo_list, struct replay_opts *opts)
> >  {
> >  	static struct lock_file todo_lock;
> > -	struct strbuf buf = STRBUF_INIT;
> > -	int fd;
> > +	const char *todo_path = get_todo_path(opts);
> > +	int next = todo_list->current, offset, fd;
> 
> The "next = todo_list->current" looks a bit strange.

Depending whether we need rebase -i processing or revert/cherry-pick's
slightly different one, the "current" position points to the next one
already...

> Also, we do not change todo_list->current, we use it in one place, so it
> can be used directly without help of temporary / helper variable.  But
> that is just my personal opinion.

No, it has nothing to do with opinion. It prepares the code to keep it
readable even when REPLAY_INTERACTIVE_REBASE is introduced.

> Also, from 'next', 'offset' and 'fd', all those are different uses of
> int: the index (int, rarely size_t), the offset in string (formally
> ptrdiff_t, or size_t, but usually int), and the file descriptor.  I
> think from those the file descriptor could be kept in separate line; it
> would help diff to be more readable.  But this is fairly marginal
> nitpicking, and a matter of personal opinion.

Right. At this point, I am really much more concerned about correctness of
code than discussing personal preferences.

> > -	fd = hold_lock_file_for_update(&todo_lock, git_path_todo_file(), 0);
> > +	fd = hold_lock_file_for_update(&todo_lock, todo_path, 0);
> >  	if (fd < 0)
> >  		return error_errno(_("Could not lock '%s'"),
> >  				   git_path_todo_file());
> 
> We should use 'todo_path' here...

True.

> and this should be done in one of earlier patches, isn't it?

No. I deliberately skipped save_todo() from "future-proofing" as I planned
to rewrite it anyway. There is no point in future-proofing something you
are going to toss in a minute.

> > -	if (format_todo(&buf, todo_list, opts) < 0) {
> > -		strbuf_release(&buf);
> > -		return error(_("Could not format %s."), git_path_todo_file());
> 
> Can we still get this error?  Could we get this error anyway,
> and under what conditions?

No. We keep the original formatting. Keeping it cannot possibly result in
a formatting error.

> > -	}
> > -	if (write_in_full(fd, buf.buf, buf.len) < 0) {
> > -		strbuf_release(&buf);
> > -		return error_errno(_("Could not write to %s"),
> > -				   git_path_todo_file());
> > -	}
> > +	offset = next < todo_list->nr ?
> > +		todo_list->items[next].offset_in_buf : todo_list->buf.len;
> > +	if (write_in_full(fd, todo_list->buf.buf + offset,
> > +			todo_list->buf.len - offset) < 0)
> > +		return error(_("Could not write to %s (%s)"),
> > +			todo_path, strerror(errno));
> 
> Ah, so it saves the remaining todo_items on todo_list, not the
> whole todo_list... the name does not fully show it.

The name also does not fully show that it will write a "done" file after
the sequencer-i patch series.

> 
> > -	if (commit_lock_file(&todo_lock) < 0) {
> > -		strbuf_release(&buf);
> > -		return error(_("Error wrapping up %s."), git_path_todo_file());
> > -	}
> > -	strbuf_release(&buf);
> > +	if (commit_lock_file(&todo_lock) < 0)
> > +		return error(_("Error wrapping up %s."), todo_path);
> 
> Note: this is unrelated change, but we usually put paths in quotes, like this
> 
>   +		return error(_("Error wrapping up '%s'."), todo_path);
> 
> (in this and earlier error message), so that paths containing spaces show
> correctly and readably to the user.  Though this possibly is not a problem
> for this path.

Right.

> Also, how user is to understand "wrapping up"?

The same as before: the removed lines already had the error message,
missing the quotes, too.

Don't get me wrong: I am a big fan of consistency, and I wish that Git's
source code had more of it. So I would love to see a patch series that
makes all error messages consistently reporting paths enclosed in single
quotes.

I am also a big fan of the separation of concerns, though. And this patch
series' concern is consistency *with the existing code*.

So I won't change the error message that I inherited at this point.

> >  static int single_pick(struct commit *cmit, struct replay_opts *opts)
> >  {
> >  	setenv(GIT_REFLOG_ACTION, action_name(opts), 0);
> > -	return do_pick_commit(cmit, opts);
> > +	return do_pick_commit(opts->action == REPLAY_PICK ?
> > +		TODO_PICK : TODO_REVERT, cmit, opts);
> 
> The ternary conditional operator here translates one enum to other enum,
> isn't it?

Well, almost. Please note that the enum will receive a new value in the
sequencer-i patch series. And there is no equivalent todo_command for
REPLAY_INTERACTIVE_REBASE.

Thanks for the review!
Johannes