git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git mailing list <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Brandon Williams <bmwill@google.com>
Subject: Re: [PATCH 18/19] diff: buffer all output if asked to
Date: Tue, 16 May 2017 09:42:23 -0700	[thread overview]
Message-ID: <CAGZ79kZQBG0AmADWCQ5yj+sRMtwiy8HOnD3SLFS_-yPPoRbfTA@mail.gmail.com> (raw)
In-Reply-To: <CAGf8dgJP+i5RL3FaGSYZyVKkt1ttSnnPd924ebs=4xJb4Fhc6w@mail.gmail.com>

On Mon, May 15, 2017 at 9:14 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Overall, this patch seems larger than it should to me, although there
> might be good reasons for that that I don't know. I'll remark on what
> I find unexpected.
>


>>
>> +static void emit_buffered_patch_line_ws(struct diff_options *o,
>> +                                       struct buffered_patch_line *e,
>> +                                       const char *ws, unsigned ws_rule)
>
> This introduces a new _ws emission function - how is this used and how
> is this different from the non-ws one? I see BPL_EMIT_LINE_WS, but I
> don't see the caller that introduces that constant in this patch.

See emit_line_ws, which makes use of BPL_EMIT_LINE_WS.
The difference between BPL_EMIT_LINE_WS and BPL_EMIT_LINE_AS_IS
is that _WS is emitted marking up whitespace differently (e.g. 8 continuous
spaces instead of a tab or such), see the core.whitespace option.
Relevant hunk:

@@ -557,9 +616,12 @@ static void emit_line_ws(struct diff_options *o,
                         const char *line, int len,
                         const char *ws, unsigned ws_rule)
 {
-       emit_line_0(o, set, reset, sign, "", 0);
-       ws_check_emit(line, len, ws_rule,
-                     o->file, set, reset, ws);
+       struct buffered_patch_line e = {set, reset, line, len, sign,
BPL_EMIT_LINE_WS};
+
+       if (o->use_buffer)
+               append_buffered_patch_line(o, &e);
+       else
+               emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
 }

>> +       switch (e->state) {
>> +               case BPL_EMIT_LINE_ASIS:
>> +                       emit_buffered_patch_line(o, e);
>> +                       break;
>> +               case BPL_EMIT_LINE_WS:
>> +                       emit_buffered_patch_line_ws(o, e, ws, ws_rule);
>> +                       break;
>> +               case BPL_HANDOVER:
>> +                       o->current_filepair++;
>
> If we're just buffering the diff output, do we need to store
> per-file-pair metadata? (I assume that's why you need a special
> handover constant.) Clients can already read what they need from the
> diff output.

Currently we keep only whitespace settings per-file separate as they are
defined per path (via attributes).

So I read this comment as if you consider the per-file buffer unneeded and
we could just detect the next file via the line

    <line-prefix>diff --git a/<file> b/<file>

and then re-read the attributes and remember only the current files settings in
the output pass. I'll look into that.

>> +        *
>> +        * TODO (later in this series):
>> +        * We'll unset this flag in a later patch.
>> +        */
>> +       o->use_buffer = 1;
>
> What I would do is to add a demonstration patch at the end of the
> patch series (which is not supposed to be queued) to avoid such churn
> in history, but I'm not sure how the Git project prefers to do this.

ok, I can omit this part in a reroll.

>> + *
>> + * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
>> + * into the pre/post image file. This pointer could be a union with the
>> + * line pointer. By storing an offset into the file instead of the literal line,
>> + * we can decrease the memory footprint for the buffered output. At first we
>> + * may want to only have indirection for the content lines, but we could
>> + * also have an enum (based on sign?) that stores prefabricated lines, e.g.
>> + * the similarity score line or hunk/file headers.
>
> This would be nice, but come to think of it, might not be possible.
> When requesting --word-diff, control characters (or others) might
> appear in the output, right?

That is why we'd have even more states. ;)
Or duplicate word diff still, but lines left intact are referenced via offsets.
It's a hard problem to get right, so I defer it via this comment.

>
>> + */
>> +struct buffered_patch_line {
>> +       const char *set;
>> +       const char *reset;
>> +       const char *line;
>> +       int len;
>> +       int sign;
>> +       enum {
>> +               BPL_EMIT_LINE_WS,
>> +               BPL_EMIT_LINE_ASIS,
>> +               BPL_HANDOVER
>> +       } state;
>
> It might be better, for simplicity, just to have one big buffer
> including everything (if we decide that we really can't add pointers
> to input later).

What do you mean here? (Drop the other structs such as the file pair?)

Thanks for the review!
Stefan

  reply	other threads:[~2017-05-16 16:42 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
2017-05-14  4:00 ` [PATCH 01/19] diff: readability fix Stefan Beller
2017-05-14  4:01 ` [PATCH 02/19] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-15  6:48   ` Junio C Hamano
2017-05-15 16:13     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0 Stefan Beller
2017-05-15 18:26   ` Jonathan Tan
2017-05-15 18:33     ` Stefan Beller
2017-05-16 16:05       ` Jonathan Tan
2017-05-15 19:22   ` Brandon Williams
2017-05-15 19:35     ` Stefan Beller
2017-05-15 19:45       ` Brandon Williams
2017-05-14  4:01 ` [PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-14  4:01 ` [PATCH 05/19] diff.c: emit_line_0 can handle no color setting Stefan Beller
2017-05-15 18:31   ` Jonathan Tan
2017-05-15 22:11     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 06/19] diff: add emit_line_fmt Stefan Beller
2017-05-15 19:31   ` Brandon Williams
2017-05-14  4:01 ` [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_* Stefan Beller
2017-05-16  1:00   ` Junio C Hamano
2017-05-16  1:05     ` Junio C Hamano
2017-05-16 16:23       ` Stefan Beller
2017-05-14  4:01 ` [PATCH 08/19] diff.c: convert builtin_diff " Stefan Beller
2017-05-15 18:42   ` Jonathan Tan
2017-05-14  4:01 ` [PATCH 09/19] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-14  4:01 ` [PATCH 10/19] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-15 19:09   ` Jonathan Tan
2017-05-15 19:31     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-14  4:01 ` [PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-14  4:01 ` [PATCH 13/19] diff.c: convert show_stats " Stefan Beller
2017-05-14  4:01 ` [PATCH 14/19] diff.c: convert word diffing " Stefan Beller
2017-05-15 22:40   ` Jonathan Tan
2017-05-15 23:12     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 15/19] diff.c: convert diff_flush " Stefan Beller
2017-05-15 20:21   ` Jonathan Tan
2017-05-15 22:08     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 16/19] diff.c: convert diff_summary " Stefan Beller
2017-05-14  4:01 ` [PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces Stefan Beller
2017-05-14  4:01 ` [PATCH 18/19] diff: buffer all output if asked to Stefan Beller
2017-05-14  4:06   ` Jeff King
2017-05-14  4:25     ` Stefan Beller
2017-05-16  4:14   ` Jonathan Tan
2017-05-16 16:42     ` Stefan Beller [this message]
2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
2017-05-15 22:42   ` Brandon Williams
2017-05-16  4:34   ` Jonathan Tan
2017-05-16 12:31   ` Jeff King
2017-05-15 12:43 ` [RFC PATCH 00/19] Diff machine: highlight moved lines Junio C Hamano
2017-05-15 16:33   ` Stefan Beller
2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 01/20] diff: readability fix Stefan Beller
2017-05-17  2:58   ` [PATCHv2 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-17  2:58   ` [PATCHv2 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-17  2:58   ` [PATCHv2 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
2017-05-17  2:58   ` [PATCHv2 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
2017-05-17  2:58   ` [PATCHv2 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
2017-05-17  2:58   ` [PATCHv2 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
2017-05-17  2:58   ` [PATCHv2 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
2017-05-17  2:58   ` [PATCHv2 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
2017-05-17  2:58   ` [PATCHv2 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-17  5:03     ` Junio C Hamano
2017-05-17 21:16       ` Stefan Beller
2017-05-18  3:35     ` Junio C Hamano
2017-05-17  2:58   ` [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-17  5:19     ` Junio C Hamano
2017-05-17 21:05       ` Stefan Beller
2017-05-18  3:25         ` Junio C Hamano
2017-05-18 17:12           ` Stefan Beller
2017-05-20  4:50             ` Junio C Hamano
2017-05-20 22:00               ` Stefan Beller
2017-05-17  2:58   ` [PATCHv2 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-17  2:58   ` [PATCHv2 14/20] diff.c: convert show_stats " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 15/20] diff.c: convert word diffing " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 16/20] diff.c: convert diff_flush " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 17/20] diff.c: convert diff_summary " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
2017-05-17  2:58   ` [PATCHv2 19/20] diff: buffer all output if asked to Stefan Beller
2017-05-17  2:58   ` [PATCHv2 20/20] diff.c: color moved lines differently Stefan Beller
2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
2017-05-18 19:37     ` [PATCHv3 01/20] diff: readability fix Stefan Beller
2017-05-18 19:37     ` [PATCHv3 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-18 19:37     ` [PATCHv3 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-18 19:37     ` [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
2017-05-18 23:33       ` Jonathan Tan
2017-05-22 23:36         ` Stefan Beller
2017-05-18 19:37     ` [PATCHv3 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
2017-05-18 19:37     ` [PATCHv3 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
2017-05-18 19:37     ` [PATCHv3 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
2017-05-18 19:37     ` [PATCHv3 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
2017-05-18 19:37     ` [PATCHv3 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
2017-05-18 19:37     ` [PATCHv3 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-18 19:37     ` [PATCHv3 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-18 19:37     ` [PATCHv3 14/20] diff.c: convert show_stats " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 15/20] diff.c: convert word diffing " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 16/20] diff.c: convert diff_flush " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 17/20] diff.c: convert diff_summary " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
2017-05-18 19:37     ` [PATCHv3 19/20] diff: buffer all output if asked to Stefan Beller
2017-05-18 19:37     ` [PATCHv3 20/20] diff.c: color moved lines differently Stefan Beller
2017-05-19 18:23       ` Jonathan Tan
2017-05-19 18:40         ` Stefan Beller
2017-05-19 19:34           ` Jonathan Tan
2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
2017-05-23  2:40       ` [PATCHv4 01/17] diff: readability fix Stefan Beller
2017-05-23  2:40       ` [PATCHv4 02/17] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-23  2:40       ` [PATCHv4 03/17] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-23  2:40       ` [PATCHv4 04/17] diff: introduce more flexible emit function Stefan Beller
2017-05-23  2:40       ` [PATCHv4 05/17] diff.c: convert fn_out_consume to use emit_line Stefan Beller
2017-05-23  2:40       ` [PATCHv4 06/17] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
2017-05-23  2:40       ` [PATCHv4 07/17] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 08/17] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-23  5:59         ` Junio C Hamano
2017-05-23 18:14           ` Stefan Beller
2017-05-23  2:40       ` [PATCHv4 10/17] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-23  2:40       ` [PATCHv4 11/17] diff.c: convert show_stats " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 12/17] diff.c: convert word diffing " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 13/17] diff.c: convert diff_flush " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 14/17] diff.c: convert diff_summary " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 15/17] diff.c: emit_line includes whitespace highlighting Stefan Beller
2017-05-23  2:40       ` [PATCHv4 16/17] diff: buffer all output if asked to Stefan Beller
2017-05-23  2:40       ` [PATCHv4 17/17] diff.c: color moved lines differently Stefan Beller
2017-05-27  1:04       ` [PATCHv4 00/17] Diff machine: highlight moved lines Jacob Keller
2017-05-30 21:38         ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZQBG0AmADWCQ5yj+sRMtwiy8HOnD3SLFS_-yPPoRbfTA@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).