On Wed, 06 Dec 2006 10:31:21 -0800, Junio C Hamano wrote:
> I am not sure what needs to be commented on at this point, since
> it is not yet clear to me where you want your proposal to lead
> us.

Thanks for the comments you made here---that's the kind of thing I was
looking for.

As for where I'm trying to lead us, what I really want to do is to
help improve the learnability of git. A big part of that is about
improving the set of "use-oriented" documentation, (which describes
how to achieve tasks, as opposed to what might be termed "technically
oriented" documentation which describes how individual tools work). I
think too much of the existing documentation falls into the second
class.

A parallel thread is already talking about some of the important
organizational aspects of use-oriented documentation. And I agree with
that thread is that the short "attention span" is a primary
consideration for this kind of documentation. The user has a task to
be accomplish, and any text or concepts that aren't contributing to
the solution of that task should be eliminated.

Note that when I talk about eliminating unnecessary concepts, I do not
mean lying to the user about the underlying model or any concepts. We
can't have a sugar-coated tutorial that says one thing, and then
expect users to "unlearn" that if they go deeper into the reference
manual. That's a recipe for disaster.

Also, when I say "use-oriented" I'm not suggesting that the
documentation be shallow. It can go as deep as any workflow we care to
document and introduce whatever concepts of git are necessary to
support that workflow. (There is, though a level at which "technically
oriented" documentation is all that's needed, or even desired, and
that's when the documentation is targeting authors of interfaces that
build on top of git---not users trying to use git to get work done at
the command line).

OK, so if my concern is all about documentation, then what am I doing
proposing new commands or new ways of thinking about existing commands
rather than just sending documentation patches? The problem is that
the current semantics of the following variations of "git commit":

	git commit
	git commit -a
	git commit paths...

defeat the goal of writing good, clean use-oriented documentation. So
there's some adjustment that should be made first. And I don't even
care what the adjustment is, (for example, it doesn't have to be
"commit -a by default"), but please recognize the problem and help me
come up with an acceptable way to fix it.

To demonstrate, let's take the simplest of use cases and try to
document it in as clear a way as possible. Let's imagine we're in a
tutorial where we've just guided the user to making modifications to
several existing, tracked files, (starting from an initial clone, not
an init-db), and the next task to teach the user commit for the first
time. We would like to document both "commit a single modified file"
and "commit all modified files". Here are two approaches that I can
come up with:

1. Any commit involves first "add"ing together new content, and then
   committing the result. For example to commit a single file:

	git add file		# add new content from file
	git commit		# commit the result

   As a shortcut, "commit -a", (or --all) can be used to automatically
   "add" the content of all tracked files before the commit. So the
   common case of committing all tracked files is as easy as:

	git commit -a		# commit content of all tracked files

2. The new content of modified files can be committed by naming the
   files on the "git commit" command line. For example:

	git commit file		# commit new content of file

   As a shortcut, "commit -a", (or --all) can be used to commit the
   content of all tracked files:

	git commit -a		# commit content of all tracked files

Neither of the above is totally satisfactory.

In (1) the user is not presented with a framework that will make sense
of "git commit files...". The expansion of "-a" as "--all" could
easily give the user the impression that "git commit files..." is a
shortcut for "git add files...; git commit", but that's wrong and
could lead to unexpected results and confusion.

In (2) the user is not presented with a framework that will make sense
of "git commit" with no arguments. The user is left to wonder about
why the --all is needed and what it means exactly, (particularly since
"git commit" also commits the content of all tracked files.

Various fixes have been proposed for these potential confusions. For
example, making "git commit files..." default to the behavior of
--include instead of --only would eliminate the confusion I described
for (1). And making -a the default for "git commit" would eliminate
the confusion I described for (2).

However, actually implementing either of those fixes would then break
the initial "commit one file" example from the other approach. Because
of that, the conversation has often fallen into debate over whether
(1) or (2) is the "one true way" to describe git, and which one leads
the user to have an incorrect mental model.

But I think that debate is misguided since both descriptions are
worthwhile and valid. (1) is based around an explanation of what "git
commit" does, and (2) is based around an explanation of what "git
commit files..." does. And both of these commands are very useful
exactly how they are.

It's almost coincidental that "commit -a" fits in logically with
either description.

So what I was trying to get across in this latest thread is that git's
command-line interface already has two slightly different models for
what's going on in a commit. You don't agree with me on that point
yet, (more on that below in my reply).

I really don't care what the final fix is, but I would love to see
documentation with no more complexity than the above that accurately
captures the useful functionality.

And I don't actually have a concrete proposal for a fix yet---I was
just offering the commit-index-content and commit-working-tree-content
ideas as ways to think about the issue. Maybe the two documentation
blurbs above capture it in a better way.

Do you feel like you have a better understanding of what I'm trying to
do now?

> I do not agree with your "three commands" or "two semantics"
> characterization of the current way "git commit" works.  "git
> commit" without any optional argument already acts as if a
> sensible default arguments are given, that is "no funny business
> with additional paths, commit just what the user has staged
> already."

I agree that "git commit" does nothing funny by default. What I was
pointing out is that "git commit" and "git commit paths..." do not
have the same semantics. There's really nothing to debate about
there. There is no argument you can substitute for <paths...> to give
you identical behavior as "git commit". That's a fact.

> "git commit" is primarily about committing what has been staged
> in the index, and "--all" is just a type-saver short-hand (just
> like "--include" is) to perform update-index the last minute and
> nothing more.  In other words, "--all" is a variant of the
> pathname-less form "git commit".  It is not a variant of "git
> commit --only paths..." form, as you characterized.

I hope the documentation blurbs (1) and (2) above show how "commit -a"
can be seen as a variant of either "commit" or "commit files...",
(which themselves are both useful semantics, but demonstrably
distinct).

> The pathname form (the "--only" variant) on the surface seem to
> work differently, but when you think about it, it is not all
> that different from the normal commit.  We explain that it
> ignores index, but in the bigger picture, it does not really.

No, it really is different.

> the first commit does "jump" the changes already made to the
> index, but after it makes the commit, the index has the same
> contents as if you did "git update-index a b" where you ran that
> "git commit".  In other words, it is just a handy short-hand to
> pretend as if you did the above sequence in this order instead:

How could you document "git commit files..." as a shorthand? A
shorthand for what exactly? A shorthand for pretending you didn't just
type the commands you did type that got the index into its current
state, but had instead typed different commands before the commit and
other commands afterwards?

That's crazy. That's not a shorthand. That's just plain different
semantics. The current "git commit files..." command never does commit
the contents of "the" index as a concept presented by "git
commit". (This is independent of the fact that the implementation of
"git commit files..." certainly does use an index file somewhere and
uses it to create a commit object in the same way that "git commit"
uses "the" index).

> So I actually think it is a mistake to stress the fact that "git
> commit --only paths..." seems to act differently from the normal
> "git commit" too much.

I think that would be lying to the users and setting them up to get
confused later. I discussed this above as the confusion that can
result with the explanation of (1). If you teach "git commit" as
commiting "the" index, and de-emphasize that "git commit files..."
is semantically distinct, then how is a user ever supposed to learn
what it is that "git commit files..." is actually doing?

> In short, while I understand that your "proposal" shows your own
> way to summarize the semantics of "git commit", I am not seeing
> what it buys us, and I do not see the need to come up with a
> pair of new two commands for making commits (if that is what the
> proposal is about, that is, but it is not clear to me if that is
> what you are driving at).  I think it would only confuse users.

Forgive me again for being obtuse. I don't think we should necessarily
add two new commands. I was trying to illustrate a problem in the
existing command set, and propose a new way of thinking about the
tasks that the current commands help a user to perform, (committing
content from the working tree or committing content from the index). I
don't actually have a concrete proposal for how to take that way of
thinking and map it to a command set, (and one that would disrupt
current git users as little as possible). I'd love to have some help
with that part.

> Is it just me who finds the above a very much made-up example?

Fine. We can ignore that example.

> In any case, I should clarify my aversion to partial commits a
> bit.  What is more important is to notice that, while you cannot
> compile-and-run test what is in the index in isolation (without
> a fuse that exports the index contents as a virtual filesystem
> -- anybody interested?), you _can_ preview and verify the text
> that is going to be committed by comparing the index and the
> HEAD.  And for that, your "staging" action (i.e. Nico's "git
> add") needs to be a separate step from your "committing" action.

Yes, I often use the index as a place to preview things. And it is
true that I find myself using update-index when I could have used
"commit paths..." precisely because I can preview it once more. But I
do use the "commit paths..." form at times as well. If I have just
reviewed things in "git diff" and there are _really_ obviously
separable pieces I will commit them alone with staging into the index
and reviewing again.

It's probably the case that I skip the explicit staging and extra
preview when I can use a single pathname as the argument to "git
commit".

> In other words, I would even love Johannes's "per hunk commit"
> idea, at least if it had an option to preview the whole thing
> just one more time before committing, and I would love it better
> if it had an option for not committing but just updating.

Yes! I've wanted tools to help with per-hunk separation before, but
since I'm so likely to make mistakes while doing that I would only
want that to go into the index so that I could review it before
committing. I guess I might need a per-hunk way to fix up my mistakes
too if I put a hunk into the index that I didn't want to be there.

-Carl