git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Philip Oakley <philipoakley@iee.org>
Cc: Pavel Kretov <firegurafiku@gmail.com>, git@vger.kernel.org
Subject: Re: [idea] File history tracking hints
Date: Wed, 13 Sep 2017 13:38:07 +0200 (CEST)
Message-ID: <alpine.DEB.2.21.1.1709131322470.4132@virtualbox>
In-Reply-To: <E8C827ED458648F78F263F2F2712493B@PhilipOakley>

Hi Philip,

On Mon, 11 Sep 2017, Philip Oakley wrote:

> From: "Pavel Kretov" <firegurafiku@gmail.com>
> > Hi all,
> >
> > Excuse me if the topic I'm going to raise here has been already discussed
> > on the mailing list, forums, or IRC, but I couldn't find anything related.
> >
> >
> > The problem:
> >
> > Git, being "a stupid content tracker", doesn't try to keep an eye on
> > operations which happen to individual files; things like file renames
> > aren't recorded during commit, but heuristically detected later.
> >
> > Unfortunately, the heuristic can only deal with simple file renames with
> > no substantial content changes; it's helpless when you:
> >
> > - rename a file and change its content significantly;
> > - split a single file into several files;
> > - merge several files into another;
> > - copy an entire file from another commit, and do other things like these.
> >
> > However, if we're able to preserve this information, it's possible
> > not only to do more accurate 'git blame', but also merge revisions with
> > fewer conflicts.
> >
> >
> > The proposal:
> >
> > The idea is to let the user give hints about what was changed during
> > the commit. For example, if the user did a rename which wasn't
> > automatically detected, he would append something like the following to
> > his commit message:
> >
> >    Tracking-hints: rename dev-vcs/git/git-1.0.ebuild -> dev-vcs/git/git-2.0.ebuild
> >
> > or (if full paths of affected files can be unambiguously omitted):
> >
> >    Tracking-hints: rename git-1.0.ebuild -> git-2.0.ebuild
> >
> > There may be other hint types:
> >
> >    Tracking-hint: recreate LICENSE.txt
> >    Tracking-hint: split main.c -> main.c cmdline.c
> >    Tracking-hint: merge linalg.py <- vector.py matrix.py
> >
> > or even something like this:
> >
> >    Tracking-hint: copy json.py <- libs/json.py@4db88291251151d8c5c8e4f20430fa4def2cb2ed
> >
> > If file transformation cannot be described by a single tracking hint, it
> > shall be possible to specify a sequence of hints at once:
> >
> >    Tracking-hint:
> >        split Utils.java -> AppHelpers.java StringHelpers.java
> >        recreate Utils.java
> >
> > Note that in the above example the order of operations really matters, so
> > both lines have to reside in one 'Tracking-hint' block.
> >
> > * * *
> >
> > What do you think, is this idea worth implementing?
> > Any other thoughts on this?
> >
> > -- Pavel Kretov.
> 
> Maybe use the "interpret-trailers" methods for standardising your hints
> locally (in your team / workplace) to see how it goes and flesh out what works
> and what doesn't. Trying to decide, a-priori, what are the right hints is
> likely to be the hard part.

I think this adds a very valuable insight to this discussion: the current
state of Git's rename handling is based on the idea that you either record
the renames, or you detect them. Like, there is either "on" or "off". No
middle ground.

However, if you understand that there is also the possibility of hints
that can help any erroneous rename detection (and *everybody* who
seriously worked on a massive code base has seen that rename detection
fail in the most inopportune ways [*1*]), then you are on to something.

So I totally like the idea of introducing hints, possibly as trailers in
the commit message (or as refs/notes/rename/* or whatever) that can be
picked up by Git versions that know about them, and can be ignored by Git
versions that insist on the rename detection du jour. With a config option
to control the behavior, maybe, too.
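
To make the trailer variant a bit more concrete, here is a minimal sketch in
Python of how a tool outside of Git could pick such hints up from the last
paragraph of a commit message. The trailer key, the operation names, and the
arrow syntax are taken from Pavel's examples above; the names Hint,
parse_hint and extract_hints are hypothetical, paths are assumed to contain
no whitespace, and nothing here is an existing Git feature.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical trailer keys; the proposal uses both spellings.
TRAILER_KEYS = ("tracking-hint", "tracking-hints")


@dataclass
class Hint:
    op: str              # e.g. "rename", "split", "merge", "copy", "recreate"
    sources: List[str]   # paths the content came from
    targets: List[str]   # paths the content ended up in


def parse_hint(text: str) -> Hint:
    """Parse one hint such as 'split main.c -> main.c cmdline.c'."""
    op, _, rest = text.strip().partition(" ")
    if "->" in rest:
        left, right = rest.split("->", 1)
        return Hint(op, left.split(), right.split())
    if "<-" in rest:
        left, right = rest.split("<-", 1)
        return Hint(op, right.split(), left.split())
    # No arrow (e.g. 'recreate LICENSE.txt'): only targets are named.
    return Hint(op, [], rest.split())


def extract_hints(message: str) -> List[Hint]:
    """Collect hints, in order, from the last paragraph of a commit message.

    Assumption: an indented line following a hint trailer is a further hint
    of the same block, as in the multi-line example of the proposal.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if not paragraphs:
        return []
    hints: List[Hint] = []
    in_block = False
    for line in paragraphs[-1].splitlines():
        if line[:1].isspace():
            if in_block and line.strip():
                hints.append(parse_hint(line))
            continue
        key, sep, value = line.partition(":")
        in_block = bool(sep) and key.strip().lower() in TRAILER_KEYS
        if in_block and value.strip():
            hints.append(parse_hint(value))
    return hints
```

A Git version (or wrapper script) that knows about the hints could feed the
resulting list into its rename handling, while everything else simply sees
ordinary text at the end of the commit message.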

Ciao,
Dscho

Footnote *1*: Just to name a couple of examples from my personal
experience, off the top of my head:

- license boilerplates often let Git detect renames/copies where there
  are none,

- even something as trivial as moving Java classes (and their dependent
  classes) between packages changes every line referring to said packages,
  causing Git's rename detection to go for a drink instead of doing its
  job,

- indentation changes overwhelm Git's rename detection,

- when rename detection would matter most, like, really a lot, to lift the
  burden of the human beings in front of the computer poring over
  hundreds of thousands of files moved from one directory tree to another,
  that's exactly when Git's rename detection says that there are too many
  files, here are my union rights, I am going home, good luck to you.

In light of such experiences, I have to admit that the notion that the
rename detection can always be improved in hindsight adds quite a bit of
insult to injury for those developers who are bitten by it.

Thread overview: 18+ messages
2017-09-11  7:11 [idea] File history tracking hints Pavel Kretov
2017-09-11 18:11 ` Stefan Beller
2017-09-11 18:47   ` Jacob Keller
2017-09-11 18:41 ` Jeff King
2017-09-11 20:09 ` Igor Djordjevic
2017-09-11 21:48 ` Philip Oakley
2017-09-13 11:38   ` Johannes Schindelin [this message]
2017-09-14 23:22     ` Philip Oakley
2017-09-29 23:12       ` Johannes Schindelin
2017-09-30  8:02         ` Jeff Hostetler
2017-09-30 15:11           ` Johannes Schindelin
2017-10-01  3:27           ` Junio C Hamano
2017-10-02 17:41             ` Stefan Beller
2017-10-02 18:51               ` Jeff Hostetler
2017-10-02 19:18                 ` Stefan Beller
2017-10-02 20:02                   ` Jeff Hostetler
2017-10-03  0:52                     ` Junio C Hamano
2017-10-03  0:45               ` Junio C Hamano
