git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: John Whitney <jjw@emsoftware.com>
Cc: Phil Hord <phil.hord@gmail.com>, git@vger.kernel.org
Subject: Re: Bug report
Date: Sun, 7 Oct 2012 19:52:44 -0400	[thread overview]
Message-ID: <20121007235244.GA5536@sigill.intra.peff.net> (raw)
In-Reply-To: <5070E7BF.8040102@emsoftware.com>

On Sat, Oct 06, 2012 at 09:23:59PM -0500, John Whitney wrote:

> >You said in your test script:
> >
> >   # Committing test.txt or clearing .gitattributes does clear
> >   # the "modified" status, but those options are undesirable
> >
> >Why is the commit undesirable? You have decided that CRs will no longer
> >be tolerated in your repository (by setting .gitattributes). Now you
> >need to record that change in history with a commit that strips out the
> >CRs.
> In some cases it's undesirable. In my example, all I want to do is
> merge in the change that deletes the file, so I don't want to have to
> add that extra commit when I'm just going to delete the file anyway.

Yes, but that is conflating two operations. You only don't want to do
the commit because you are anticipating what is coming next (the
cherry-picked deletion). But if you want to conflate, then you could
also realize that you can simply delete the file, CRs or no, and you do
not need to care about its modified state.

I think a much stronger argument for your position is that the cherry
pick would not happen without a conflict after such a commit, because it
would be deleting files with two different sets of content (the
cherry-pick would want to delete the CR version, but you would not have
that version).

In other words, you want the cherry-pick to happen and ignore the
modification that could be committed, because you know the modification
is not relevant (but git does not). But there is not a way to do that
(even once you overcome the confusion), because the usual way to do so
is to drop the local modification with "git reset" (which would not work
in this case).

> >It is not about having CRs in the working tree file. Those are now
> >considered uninteresting and stripped by git when comparing to the HEAD.
> >The problem is that what's in your _repository_ has CRs.
> Yes, but does that really have to be an issue? Is there any technical
> or practical reason you can think of that the repository shouldn't
> ignore those CRs?

It's significantly less efficient. Right now git only has to do the
conversion when updating the index cache of what's on the filesystem
(i.e., when it would be doing a sha1 over the file contents _anyway_).
And then it can compare sha1s internally, because it knows that all of
the sha1s it has computed are for the canonical in-repo versions of the
file.

If we assume that the in-repo file might need to have CRs stripped, then
we need to actually follow up every sha1 mismatch with an actual content
diff in order to discover if it really is different or not. We could
cache the "true" sha1 of the canonical stripped version to avoid this,
but now we are getting much more complex. In most cases it is sufficient
to just commit the cleaned up contents and then never worry about it
again.

> You're right, we can't magically avoid all the line ending issues
> that people will run into. In this case, though, I think git can
> sidestep a fairly obnoxious problem. My example was simple, but when
> you've got multiple branches that need to be rebased/merged, it can
> get pretty hairy. The repository will never be truly "clean" unless
> you rewrite the whole thing (using filter-branch, for instance).

Right. Git's current approach is very hairy when you are looking at
history that crosses a CRLF flag-day boundary. It's definitely a
weakness of the canonicalization approach. But other approaches also
have downsides; I don't want to catalogue them all here, but you can
certainly search the archive for various discussions and flamewars about
how line endings are handled.

> Maybe my above suggestion is more of a feature request than a bug,

Fair enough. I think your complaint is real, but I think nobody has been
clever enough yet to devise a solution that does not have too many other
downsides. And of course you are free to propose such an approach if you
have thought of one. :)

> but there is the obvious bug that after changing .gitattributes, git
> still doesn't notice that files are "modified" until you modify them
> again in some way (touch works). I only noticed the CRs in our own
> repository after I tried to rebase a branch and got strange errors.
> To make git notice all the files, I had to "find . -type f -exec
> touch {} \;".

I think the idea has been floated before of unconditionally refreshing
the index when you update the crlf config via "git config". But of
course that can only fix a fraction of the cases. You might edit it with
an editor. Or they may be new lines in .gitattributes. Or a change of
wildcard lines in .gitattributes.

Really, the issue is that the index contains a cache of what's in the
files that is considered valid unless the stat information of the file
changes. But that is obviously not the full story, as the
canonicalization rules (CRLF handling or smudge/clean filters) can
change, too, and that is not considered as part of the cache's validity.
Doing it "right" would mean that anytime the attributes or config files
changed, we would consider the cache entry dirty and re-read (and
re-canonicalize) the file.

But that has either:

  1. Bad complexity. It means our cache validity needs to know about
     exactly which rules were applied to yield the cached sha1. And
     those rules can be complex, consisting of wildcard matching,
     cross-referencing custom filters from config, etc.

  2. Bad performance. If you instead just invalidate cached sha1s when
     the gitattributes or .git/config file changes, you catch way too
     many cases. E.g., if you checkout a branch that changes
     .gitattributes, we have to re-read every file in the repository,
     even though most of them will not be affected.

So I think it's possible to handle this case correctly, but doing it
right is quite complex. So we have the "just manually poke the files
when you make such a change". Which is a horrible user experience, but
works OK in practice (and many people do not run into it at all, because
on new projects they set the filter attributes very early on, before
they have an existing history).

IOW, no, it is not pretty, but these are all known issues that nobody
has felt it worth tackling yet.

-Peff

  reply	other threads:[~2012-10-07 23:52 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-04  4:35 Bug report John Whitney
2012-10-04 14:19 ` Phil Hord
2012-10-04 16:10   ` John Whitney
2012-10-06 13:31     ` Jeff King
2012-10-07  2:23       ` John Whitney
2012-10-07 23:52         ` Jeff King [this message]
2012-10-09 17:17           ` John Whitney
2012-10-09 19:00             ` John Whitney
2012-10-04 15:21 ` Andrew Wong
2012-10-04 16:16   ` John Whitney
2012-10-04 16:28     ` John Whitney
2012-10-04 17:01     ` Andrew Wong
  -- strict thread matches above, loose matches on Subject: below --
2012-10-05 10:13 Муковников Михаил
2012-10-05 10:32 ` Konstantin Khomoutov
2012-10-05 10:47   ` Carlos Martín Nieto
2012-10-05 11:03     ` Муковников Михаил
2012-10-05 10:52   ` Муковников Михаил
     [not found] <CAC34_pT9zwZDnUjo1bTUZabD02M48=_+77-mNCA5adWTgxuYgg@mail.gmail.com>
2013-04-08  5:20 ` Bug Report Kirk Fraser
2015-01-27 14:43 bug report Albert Akhriev
2015-01-27 14:50 ` Jeff King
2016-04-03  0:25 Bug Report Benjamin Sandeen
2016-04-03  2:20 ` Eric N. Vander Weele
2016-04-03  2:22 ` Jacob Keller
2016-05-13  5:04 bug report 李本超
2016-05-13  5:23 ` Pranit Bauva
2016-05-13  5:58   ` 李本超
2016-05-13  6:37     ` Pranit Bauva
2016-05-13  6:57       ` 李本超
2016-05-13  7:10         ` Pranit Bauva
2016-05-13  7:41           ` 李本超
2016-05-13  8:10             ` Jeff King
2016-05-13 12:05               ` 李本超
2017-08-30 21:25 Bug report Aleksandar Pavic
2017-08-31  6:36 ` Kevin Daudt
2017-08-31 14:19   ` Dov Grobgeld
2017-08-31 14:55     ` Aleksandar Pavic
2017-08-31 16:23   ` Stephan Beyer
2017-09-02  8:49 ` Jeff King
     [not found] <CA+2sEepTyrK-iH+VBHVF1i9DuYVzDkTNxuM0-yoWbkC9N4f8HA@mail.gmail.com>
2019-04-15 15:18 ` bug report Nick Steinhauser
2020-03-27 11:53 Bug Report James Yeoman
2020-03-27 12:59 ` Pratyush Yadav
2021-11-12  4:22 bug report Theodore Li
2021-11-12  4:29 ` Junio C Hamano
2021-11-12  6:59   ` Theodore Li
2021-11-12 14:05     ` Paul Smith
2021-12-01 22:31 Bug Report Josh Rampersad
2022-04-20 19:45 Daniel Habenicht
2022-04-20 21:30 ` brian m. carlson
2022-04-20 22:34   ` rsbecker
2022-04-21 13:20     ` Daniel Habenicht
2022-04-21 14:39       ` Torsten Bögershausen
     [not found]         ` <AS1P190MB175022A7F1264807ECA464A8ECF49@AS1P190MB1750.EURP190.PROD.OUTLOOK.COM>
2022-04-21 17:52           ` Torsten Bögershausen
2022-10-03 15:28 Bug report Alastair Douglas
2022-10-03 16:53 ` Junio C Hamano
2022-10-04 10:15   ` Alastair Douglas
2022-10-05  5:46     ` Junio C Hamano
2022-11-19 20:20 Bug Report Jensen Bean
2022-12-08  5:29 Jensen Bean
2022-12-08  8:31 ` Bagas Sanjaya
     [not found]   ` <CANqKdC-gHgQHn5DMoOREY52y7PpRLMpNAjX3qeA5iy9z_GXdzw@mail.gmail.com>
2022-12-26  2:15     ` Bagas Sanjaya
2022-12-25 17:26 bug report Eyal Post
2022-12-25 18:12 ` Eric Sunshine
2022-12-28  2:43 Bug report Jensen Bean
2022-12-28  5:02 ` Eric Sunshine
2023-06-27 16:02 Bug Report Tiago d'Almeida
2023-08-28 12:51 Bug report Dexter Pontañeles

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121007235244.GA5536@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jjw@emsoftware.com \
    --cc=phil.hord@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).