git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Orgad Shaneh <orgads@gmail.com>
Subject: Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
Date: Thu, 19 Oct 2017 14:31:20 -0700
Message-ID: <CAGZ79kY3jVgBRCKkswHV7xbjiiNgvsbVsgq_9PpVwa-zTt4q0Q@mail.gmail.com>
In-Reply-To: <20171019203119.d2xbryov2y5t2aig@sigill.intra.peff.net>

On Thu, Oct 19, 2017 at 1:31 PM, Jeff King <peff@peff.net> wrote:
> For computing moved lines, we feed the characters of each
> line into a hash. When we've been asked to ignore
> whitespace, then we pick each character using next_byte(),
> which returns -1 on end-of-string, which it determines using
> the start/end pointers we feed it.
>
> However our check of its return value treats "0" the same as
> "-1", meaning we'd quit if the string has an embedded NUL.

I agree. The code looks correct.
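
To make the difference concrete, here is a minimal sketch of the two loop conditions. It is not the code from diff.c: sketch_next_byte() is a hypothetical stand-in that skips all whitespace and returns -1 at the end of the range, and the two counting helpers exist only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for next_byte(): returns the next
 * non-whitespace byte in [*cp, end), or -1 once the range is used up.
 */
int sketch_next_byte(const char **cp, const char *end)
{
	while (*cp < end) {
		unsigned char c = (unsigned char)*(*cp)++;
		if (!isspace(c))
			return c;
	}
	return -1;
}

/* Treats a 0 byte like end-of-input, so it stops at an embedded NUL. */
size_t consumed_with_gt0(const char *line, size_t len)
{
	const char *p = line, *end = line + len;
	size_t n = 0;

	while (sketch_next_byte(&p, end) > 0)
		n++;
	return n;
}

/* Stops only on -1, so an embedded NUL is consumed like any other byte. */
size_t consumed_with_ge0(const char *line, size_t len)
{
	const char *p = line, *end = line + len;
	size_t n = 0;

	while (sketch_next_byte(&p, end) >= 0)
		n++;
	return n;
}
```

For the 7-byte input "a b\0c d", the "> 0" loop sees only 'a' and 'b', while the ">= 0" loop sees all five non-whitespace bytes including the NUL.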

> This is unlikely to ever come up in practice since our line
> boundaries generally come from calling strlen() in the first
> place.

get_string_hash() is called from
 prepare_entry(), which in turn is called from
  add_lines_to_move_detection() or mark_color_as_moved(), both called from
   diff_flush_patch_all_file_pairs()

The line/len arguments it ends up hashing are constructed in this call chain:
diff_flush_patch
 run_diff
  run_diff_cmd
   builtin_diff (part "/* Crazy xdl interfaces.. */")
    xdi_diff_outf( fn_out_consume as arg!)
     xdi_diff
      xdl_diff
       xdl_call_hunk_func
        -> fn_out_consume(cb, line, len)

xdl_call_hunk_func, however, uses pointer arithmetic instead of
strlen() to determine the line lengths. So I think this sentence is
not a good idea to put in the commit message.

It may not occur in practice, because binary-file detection uses
NUL as a signal, but conceptually our move-colored(!) diffs
should be compatible with NULs once this patch is applied.
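
As an illustration of why the two approaches differ, here is a small sketch. It is not xdiff's code: first_line_len() is a hypothetical helper that finds a line boundary by pointer arithmetic over a ptr/size pair, which is the property that matters here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the first line in buf[0..size),
 * including its trailing newline if there is one. The boundary comes
 * from pointer arithmetic, so NUL bytes inside the line are counted.
 */
size_t first_line_len(const char *buf, size_t size)
{
	const char *nl = memchr(buf, '\n', size);

	return nl ? (size_t)(nl - buf) + 1 : size;
}
```

For the 9-byte buffer "ab\0cd\nef\n", this reports a first line of 6 bytes, whereas strlen() on the same buffer stops at the NUL and reports 2. A callback that receives (line, len) from such a splitter can therefore see a NUL in the middle of a line.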

> But it was a bit surprising to me as a reader of the
> next_byte() code. And it's possible that we may one day feed
> this function with more exotic input, which otherwise works
> with arbitrary ptr/len pairs.

Good point.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I noticed that we make an extra copy of each line here, just to feed it
> to memihash! I guess "-w" is not a critical-performance code path, but
> this could be fixed if we could do memhash() incrementally (e.g., by
> putting the FNV state into a struct and letting callers "add" to it
> incrementally). Maybe an interesting #leftoverbits, though I'd want to
> see timing tests that show it's worth doing.
>

I agree.
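
For the record, a rough sketch of what such an incremental interface could look like. All names here are hypothetical and none of this is git's hashmap API; it assumes the standard 32-bit FNV-1 constants and hashes bytes exactly as given, so it is not a drop-in replacement for memihash().

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1 offset basis and prime. */
#define SKETCH_FNV32_BASE  ((uint32_t)0x811c9dc5)
#define SKETCH_FNV32_PRIME ((uint32_t)0x01000193)

/* Hypothetical incremental state: just the running hash value. */
struct fnv_state {
	uint32_t hash;
};

void fnv_init(struct fnv_state *st)
{
	st->hash = SKETCH_FNV32_BASE;
}

/* Feed len more bytes into the running hash. */
void fnv_add(struct fnv_state *st, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--)
		st->hash = (st->hash * SKETCH_FNV32_PRIME) ^ *p++;
}

/* One-shot variant, for comparison with the incremental calls. */
uint32_t fnv_oneshot(const void *buf, size_t len)
{
	struct fnv_state st;

	fnv_init(&st);
	fnv_add(&st, buf, len);
	return st.hash;
}
```

A caller could then fnv_add() each byte that survives whitespace skipping instead of first copying the line into a temporary buffer. Feeding "foo" and then "bar" yields the same value as hashing "foobar" in one go. Whether that is measurably faster would still need the timing tests mentioned above.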

Thanks,
Stefan
