Re: [PATCH] Add very verbose porcelain output to status

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Jeff King <peff@peff.net>
To: Jeff Hostetler <jeffhost@microsoft.com>
Cc: git@vger.kernel.org, git@jeffhostetler.com, gitster@pobox.com
Subject: Re: [PATCH] Add very verbose porcelain output to status
Date: Tue, 12 Jul 2016 11:07:49 -0400	[thread overview]
Message-ID: <20160712150749.GD613@sigill.intra.peff.net> (raw)
In-Reply-To: <1467919588-11930-1-git-send-email-jeffhost@microsoft.com>

On Thu, Jul 07, 2016 at 03:26:28PM -0400, Jeff Hostetler wrote:

> Tools interacting with Git repositories may need to know the complete
> state of the working directory. For efficiency, it would be good to have
> a single command to obtain this information.
> 
> We already have a `--porcelain` mode intended for tools' consumption,
> and it only makes sense to enhance this mode to offer more information.
> 
> Just like we do elsewhere in Git's source code, we now interpret
> multiple `--verbose` flags accumulatively, and show substantially more
> information in porcelain mode at verbosity level 2.

Using "--verbose" to change the stable output feels a little funny to
me. Usually --verbose is about increasing the human-readable output to
stderr. Of course "--verbose" for git-status is already a little weird.
A single "-v" seems to do nothing, but two will turn on a diff for the
long-format output.

But it seems to me this just confuses things more. Why does "git status
--porcelain -v" do nothing, for example? What happens when we want to
add more information to the porcelain output later? Do we have to
require "-vvv" to trigger it, and if so, is there a way to get that
information without the "-vv" information?

I think the existing example for adding to the --porcelain format is
the --branch option. Could we have "--state" or something to trigger
the extra lines?

> +If --branch is given, the first line shows a summary of the current
> +operation in progress.  This line begins with "### state: ", the name
> +of the operation in progress, and then operation-specific information.
> +Fields are separated by a single space.

This seems like a good bit of information to have, and AFAIK we don't
expose the state information in a machine-readable way. A few nits:

> +    Operation   Fields                     Explanation
> +    ------------------------------------------------------------------
> +    clean

Using "clean" here seems weird, as that term is usually meant to mean
the opposite of dirty (i.e., there is something in your working tree
that can be committed). But there is no "dirty" here. Would "normal" or
"none" be a good description?

> +    ------------------------------------------------------------------
> +    merge       <nr>                       Number unmerged

This <nr> (and others below) is redundant with the rest of the output,
right (i.e., you could count up the "U" entries yourself)? I don't mind
it as a convenience, but I'm kind of curious of why it's useful.

> +    ------------------------------------------------------------------
> +    am          [E]                        Present if the current patch
> +                                           is empty
> +    ------------------------------------------------------------------
> +    rebase      <nr>                       Number unmerged
> +                [S]                        Present if split commit in
> +                                           progress during rebase
> +                [E]                        Present if editing a commit
> +                                           during rebase

Can a rebased commit be empty? If so, should that state get "E" to match
"am"?

Does editing differentiate between "stopped because of a conflict that
needs addressing" versus "stopped because the interactive insn sheet
told us to"?

> +                [I(<done>/<total>)]        Present if in an interactive
> +                                           rebase. Step counts are given.
> +                [<current>:<onto>]         Rebase branches

I wonder if we ought to just use "<current> <onto>" here, as separate
arguments. A ":" between refs usually signifies a refspec, but that is
not what this is.

I note this is optional, though. Is there a case where we might have one
but not the other? That would make things more difficult syntactically.

> +If --branch is given, the second line shows branch tracking information.
> +This line begins with "### track: ".  Fields are separated by a single
> +space, unless otherwise indicated.
> +
> +    Field                    Meaning
> +    --------------------------------------------------------
> +    <sha> | "(initial)"      Current commit
> +    <branch> | "(detached)"  Current branch
> +    ":"<upstream>            Upstream branch, if set
> +    "+"<ahead>               Ahead count, if upstream present
> +    "-"<behind>              Behind count, if upstream present
> +    --------------------------------------------------------

This really _ought_ to have been how --branch worked in the first place
with --porcelain. But unfortunately the existing format has been in the
wild for a long time. To top it off, the existing format is translated!
So you cannot even match "Initial commit on (.*)". Yech.

So this looks like a big improvement. And I see now why you wanted
something like "-vv" to implement this, instead of "--state". In
addition to adding new lines, this is fixing mistakes made in the prior
format.

I'm still not convinced that it makes sense to trigger this more-sane
format with a "verbose" option, though. It's not more verbose. It's just
less crappy. It would make more sense to me if this was "porcelain v2",
and triggered with "--porcelain=2" or similar.

Ditto with the new per-file format you introduce (which at least _is_
more verbose).

> +A series of lines are then displayed for the tracked entries.
> +Lines have one of the following formats:
> +
> +    XYS mH mI mW shaH shaI PATH
> +    XYS mH mI mW shaH shaI score OLD_PATH\tPATH
> +    XYS m1 m2 m3 mW sha1 sha2 sha3 PATH

So this is sort of like a --raw diff, but covering the 3 states. That
makes some sense to me.

I notice in the middle one that there is room for only one rename
(between the index and HEAD). I guess we cannot have a rename between
the working tree and the index, because the dst file would have to be a
new file, which by definition is untracked, and not eligible.

One of the nasty points of the current format is that it is
context-sensitive. You don't know if you have "PATH" with a space, or
"score OLD_PATH" until you see whether "X" tells you if you have a
rename or not. So simple parsing with things like "cut" doesn't work.

Can we collapse these into a single line format where some entries just
end up with some nil state (e.g., "0" for the score of something that
isn't a rename)?

While I'm thinking about it, have you looked at how this all works with
"-z"?

> +* X and Y were described in the short format section.
> +* S is a one character summary of the submodule status with values '0'..'7'
> +  representing the sum of: 4 when submodule has a new commit, 2 when the
> +  submodule is modified, and 1 when the submodule contains untracked changes.

Is there any reason not to give these non-numeric constants? I.e., to
treat it as a sequence of flags, which can be present or not?

> +  The value is ' ' if the entry is not a submodule.

So would it be:

  XY  100644 ...

with two spaces in that case?

I wonder if it would be less mysterious if the submodule data got its
own bitfield, like:

  N - not a submodule
  S[C][M][U] - is a submodule, with one or more of:
                - commits (C)
		- modifications (M)
		- untracked files (U)

> @@ -1380,7 +1382,20 @@ int cmd_status(int argc, const char **argv, const char *prefix)
>  	fd = hold_locked_index(&index_lock, 0);
>  
>  	s.is_initial = get_sha1(s.reference, sha1) ? 1 : 0;
> +	if (!s.is_initial)
> +		hashcpy(s.sha_commit, sha1);
> +
>  	s.ignore_submodule_arg = ignore_submodule_arg;
> +	if (verbose > 1 && status_format == STATUS_FORMAT_PORCELAIN) {
> +		/* Capture extra data for very verbose porcelain output. */
> +		s.verbose = verbose;

The existing s.verbose setup happens in the big format-specific switch
statement below. Should this go there, too? (Though if you have read the
rest of my email, you know I'd prefer this actually end up as a new
format).

> +		/* Force safe_crlf off to prevent normal LF/CRLF warning
> +		 * message from being printed on stderr for each new file.
> +		 */
> +		safe_crlf = SAFE_CRLF_FALSE;
> +	}

Hmm, this seems like something that would be useful for all formats. It
comes from doing a diff at all, right?

> [...]

I gave only a brief skim over the rest of the code, as I think the
design issues are more interesting at this stage.

Please don't let the length of my comments discourage you; I think this
is a good direction to be going. I did find I had to piece together the
rationale a bit from looking at the patch itself. I think it would be
easier to understand as a few patches:

  - add infrastructure and rules for a new v2 porcelain format; the
    format is still in flux for the next few patches, but will be
    cemented at release time (in your world-view this is the "more
    verbose format", but I'm taking the liberty of pretending you agree
    with everything I wrote above ;) ).

  - add v2 per-file format; the rationale is that the existing format
    leaves out some details. And also that it's hard to parse (if you
    agree that we should clean up those cases).

  - convert "--branch" output in v2 to what you have here; the rationale
    is that the existing format is horribly unstable

  - add "--state"; the rationale is that there isn't a machine-readable
    way to get this data right now. This can be independent of the v1/v2
    flag, since we output it only when asked for, and v1-era git didn't
    have it at all.

  - the SAFE_CRLF_FALSE tweak above; this seems orthogonal to the rest
    of it, but offhand seems like a reasonable idea to me.

-Peff

next prev parent reply	other threads:[~2016-07-12 15:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-07 19:26 [PATCH] Add very verbose porcelain output to status Jeff Hostetler
2016-07-08  6:29 ` Johannes Schindelin
2016-07-12 15:07 ` Jeff King [this message]
2016-07-12 16:13   ` Jeff Hostetler
2016-07-13  8:00     ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160712150749.GD613@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).