git@vger.kernel.org mailing list mirror (one of many)
From: "Torsten Bögershausen" <tboegi@web.de>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>, ashishnegi33@gmail.com
Subject: Re: [PATCH 1/1] convert: tighten the safe autocrlf handling
Date: Fri, 24 Nov 2017 19:59:05 +0100
Message-ID: <20171124185905.GA9736@tor.lan>
In-Reply-To: <CAPig+cT7=yLUVpmtutmTep5NBbSRNOL17dsOuVvn_Scu7_+p_w@mail.gmail.com>

On Fri, Nov 24, 2017 at 12:24:48PM -0500, Eric Sunshine wrote:
> On Fri, Nov 24, 2017 at 11:14 AM,  <tboegi@web.de> wrote:
> > When a text file had been committed with CRLF and the file is committed
> > again, the CRLFs are kept if .gitattributes has "text=auto".
> > This is done by analyzing the content of the blob stored in the index:
> > If a '\r' is found, Git assumes that the blob was committed with CRLF.
> >
> > The simple search for a '\r' does not always work as expected:
> > A file is encoded in UTF-16 with CRLF and committed. Git treats it as binary.
> > Now the content is converted into UTF-8. At the next commit Git treats the
> > file as text; the CRLFs should be converted into LFs, but they are not.
> >
> > Solution:
> > Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found,
> > 0 is returned immediately; this is the most common case.
> > If a '\r' is found, the content is analyzed more deeply.
> >
> > Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> > ---
> > diff --git a/convert.c b/convert.c
> > @@ -220,18 +220,27 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
> > -static int has_cr_in_index(const struct index_state *istate, const char *path)
> > +static int has_crlf_in_index(const struct index_state *istate, const char *path)
> >  {
> >         unsigned long sz;
> >         void *data;
> > -       int has_cr;
> > +       const char *crp;
> > +       int has_crlf = 0;
> >
> >         data = read_blob_data_from_index(istate, path, &sz);
> >         if (!data)
> >                 return 0;
> > -       has_cr = memchr(data, '\r', sz) != NULL;
> > +
> > +       crp = memchr(data, '\r', sz);
> > +       if (crp && (crp[1] == '\n')) {
> 
> If I understand correctly, this isn't a NUL-terminated string and it
> might be a binary blob, so if the lone CR in a file resides at the end
> of the file, won't this try looking for LF out-of-bounds? I would have
> expected the conditional to be:
> 
>     if (crp && crp - data + 1 < sz && crp[1] == '\n') {
> 
> or any equivalent variation.
> 

The read_blob_data_from_index() function should always append a '\0',
regardless of whether the blob is binary or not, so reading crp[1] stays
within the allocated buffer even when the '\r' is the last byte.


Thread overview: 23+ messages
2017-11-14 12:31 Changing encoding of a file : What should happen to CRLF in file ? Ashish Negi
2017-11-14 15:20 ` Torsten Bögershausen
2017-11-14 16:13   ` Ashish Negi
2017-11-14 16:15     ` Ashish Negi
2017-11-14 17:09       ` Torsten Bögershausen
2017-11-15  8:11         ` Ashish Negi
2017-11-15 17:12           ` Torsten Bögershausen
2017-11-15 19:05             ` Ashish Negi
2017-11-16 16:15               ` Torsten Bögershausen
2017-11-23 16:31                 ` Ashish Negi
2017-11-23 20:25                   ` Torsten Bögershausen
2017-11-24  6:37                     ` Ashish Negi
2017-11-14 16:45     ` Torsten Bögershausen
2017-11-24 16:14 ` [PATCH 1/1] convert: tighten the safe autocrlf handling tboegi
2017-11-24 17:24   ` Eric Sunshine
2017-11-24 18:59     ` Torsten Bögershausen [this message]
2017-11-25  3:16   ` Junio C Hamano
2017-11-26 12:20 ` [PATCH v2 " tboegi
2017-12-08 17:46 ` [PATCH v1 1/2] t0027: Don't use git commit <empty-pathspec> tboegi
2017-12-08 18:13   ` Junio C Hamano
2017-12-08 18:21     ` Junio C Hamano
2017-12-08 18:50       ` Torsten Bögershausen
2017-12-08 17:46 ` [PATCH v1 2/2] t0027: Adapt the new MIX tests to Windows tboegi
