git@vger.kernel.org mailing list mirror (one of many)
From: Jeff Hostetler <git@jeffhostetler.com>
To: Stefan Beller <sbeller@google.com>, Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Philip Oakley <philipoakley@iee.org>,
	Pavel Kretov <firegurafiku@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [idea] File history tracking hints
Date: Mon, 2 Oct 2017 14:51:26 -0400
Message-ID: <ea1538e3-2b2e-f7eb-9c0e-e29c15bf2ea9@jeffhostetler.com>
In-Reply-To: <CAGZ79kbghnWmvQweup=Z79HnVQQCMM65CKgEO3oqDoRp-Bj=2Q@mail.gmail.com>



On 10/2/2017 1:41 PM, Stefan Beller wrote:
>>> It would be nice if every file (and tree) had a permanent GUID
>>> associated with it.  Then the filename/pathname becomes a property
>>> of the GUIDs.  Then you can exactly know about moves/renames with
>>> minimal effort (and no guessing).
>>
> ...
> 
>> https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
>>
>> I'd encourage people to read and re-read that message until they can
>> recite it by heart.
> 
> I have reconsidered the idea of GUIDs as proposed by Jeff and wanted
> to give a reply. After rereading this message, I think my thoughts are
> already included via:
> 
>    - you're doing the work at the wrong point for _another_ reason. You're
>       freezing your (crappy) algorithm at tree creation time, and basically
>       making it pointless to ever create something better later, because even
>       if hardware and software improves, you've codified that "we have to
>       have crappy information".
> 
> --
> My design proposal for these "rename hints" would be a special trailer,
> roughly:
> 
>      Rename: LICENSE -> legal.txt
>      Rename: t/* -> tests/*
> 
> or more generally:
> 
>      Rename: <pathspec> <delim> <pathspec>
> 
> This however has multiple issues due to potential
> human inaccuracies:
> (A) typos in the trailer key or in the pathspec
>     (resulting in different error modes)
> (B) partial hints (We currently have a world of
>     completely missing hints, so I would not expect it to
>     be worse?)
> (C) wrong hints. This ought to be no problem as Git would
>     take some CPU time to conclude the hint was bogus.
> 
> For (A), I would imagine we want a mechanism (e.g. notes)
> to "correct" the hints. This is similar to the issue of a typo in a
> commit message, which we currently just ignore if the
> commit has been merged to e.g. master.
> 
> So maybe we'd just design around that, giving the option
> to give the correct hints via command line.
> 
> So if the commit has the typo'd hint
> 
>      Remame:  t/* -> tests/*
> 
> the human would see that (and also conclude that by
> the commit message), and then invoke
> 
> git log -C -C-hint="t/* -> tests/*" ...
> 
> which would have the corrected hint and hence deliver
> the best output.
> 
> Maybe the "-C-hint" flag is the best starting point when
> going in that direction?
> 
> Thanks,
> Stefan
> 
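A minimal sketch of how the proposed "Rename:" trailers might be collected, with command-line hints taking precedence. The `Rename:` key and the `-C-hint` option are only Stefan's proposal, not existing Git features; the function names are hypothetical. The parser scans every line rather than only the final trailer block, and a real implementation would more likely reuse Git's own trailer machinery (for example `git interpret-trailers --parse`).

```python
import re

# Hypothetical trailer syntax from the proposal: "Rename: <pathspec> -> <pathspec>".
TRAILER_RE = re.compile(r"^Rename:\s*(\S+)\s*->\s*(\S+)\s*$")


def parse_rename_hints(message):
    """Collect (src, dst) pairs from lines that look exactly like Rename: trailers.

    Simplification: every line is scanned, not just the final trailer block.
    A misspelled key such as "Remame:" is silently ignored (error mode A).
    """
    hints = []
    for line in message.splitlines():
        m = TRAILER_RE.match(line.strip())
        if m:
            hints.append((m.group(1), m.group(2)))
    return hints


def parse_cli_hint(value):
    """Parse a hint supplied on the command line, e.g. "t/* -> tests/*"."""
    src, sep, dst = value.partition("->")
    if not sep or not src.strip() or not dst.strip():
        raise ValueError(f"malformed hint: {value!r}")
    return (src.strip(), dst.strip())


def effective_hints(message, cli_values=()):
    """Command-line hints are listed first, so a first-match lookup prefers them."""
    return [parse_cli_hint(v) for v in cli_values] + parse_rename_hints(message)
```

With the typo'd trailer from the example above, `parse_rename_hints` finds nothing, and `effective_hints(msg, ["t/* -> tests/*"])` supplies the corrected hint from the command line instead.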

Sorry to re-re-...-re-stir up such an old topic.

I wasn't really thinking about commit-to-commit hints.
I think these have lots of problems.  (If commit A->B does
"t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
then you need a way to compute a transitive closure to see
the net-net hints for A->C.  I think that quickly spirals
out of control.)
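To illustrate the chaining problem, here is a small sketch that pushes one concrete path through per-commit hint lists (A->B, then B->C). It assumes hints are (src, dst) glob pairs with at most one `*`, where the text matched by `*` is substituted into the target; all names are hypothetical. Note that this sidesteps the hard part Jeff describes: composing the rules themselves into a single net A->C hint list. Even so, it shows how fragile chaining is. As written, the second example hint's `test/` prefix does not match the `tests/` names produced by the first hint, so the path passes through unchanged.

```python
import re


def _glob_to_regex(pattern):
    """Compile a pattern with at most one '*' (captured) into an anchored regex."""
    head, star, tail = pattern.partition("*")
    if not star:
        return re.compile(re.escape(pattern) + r"\Z")
    return re.compile(re.escape(head) + r"(.*)" + re.escape(tail) + r"\Z")


def map_path(path, hints):
    """Apply the first matching (src, dst) hint; unmatched paths keep their name."""
    for src, dst in hints:
        m = _glob_to_regex(src).match(path)
        if m:
            # Substitute the captured '*' text into the target, if both have one.
            return dst.replace("*", m.group(1), 1) if m.groups() else dst
    return path


def net_path(path, chain):
    """Follow a path through per-commit hint lists, oldest commit first."""
    for hints in chain:
        path = map_path(path, hints)
    return path
```

For example, `net_path("t/foo.c", [[("t/*", "tests/*")], [("test/*.c", "xyx/*")]])` stays at `tests/foo.c`, while spelling the second hint as `tests/*.c` yields `xyx/foo`.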

No, I was going in another direction.  For example, if a
tree-entry contains { file-guid, file-name, file-sha, ... }
then when diffing any 2 commits, you can match up files
(and folders) by their guids.  Renames pop out trivially when
their file-names don't match.  File moves pop out when the
file-guids appear in different trees.  Adds and deletes pop
out when file-guids don't have a peer. (I'm glossing over some
of the details, but you get the idea.)  To address Junio's
question, independently added files with the same name will
have 2 different file-guids.  We amend the merge rules to
handle this case and pick one of them (say, the one that
sorts lower than the other) as the winner and go on.
All-in-all the solution is not trivial (as there are a few
edge cases to deal with), but it better matches the (casual)
user's perception of what happened to their tree over time.
It also doesn't require expensive code to sniff for renames
on every command (which doesn't scale on really large repos).
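A minimal sketch of the GUID-matching idea, assuming each tree entry carries a hypothetical permanent `guid`, its `name`, its blob `sha`, and the guid of its containing tree (`parent`). Trees are flattened into lists here for simplicity, and the names `Entry`, `diff_by_guid` and `resolve_same_name_adds` are illustrative only. Changes fall out of a dictionary join on guid, with no content-similarity search. The merge rule shown (lower-sorting guid wins) is just the example rule from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    guid: str    # hypothetical permanent file id
    name: str    # file-name within its containing tree
    sha: str     # blob id of the content
    parent: str  # guid of the containing tree (folder)


def diff_by_guid(old, new):
    """Classify changes between two flat entry lists by matching guids.

    Returns (kind, guid) tuples ordered by guid.
    """
    a = {e.guid: e for e in old}
    b = {e.guid: e for e in new}
    changes = []
    for guid in sorted(a.keys() | b.keys()):
        x, y = a.get(guid), b.get(guid)
        if y is None:
            changes.append(("delete", guid))   # no peer on the new side
        elif x is None:
            changes.append(("add", guid))      # no peer on the old side
        else:
            if x.parent != y.parent:
                changes.append(("move", guid))    # appears in a different tree
            if x.name != y.name:
                changes.append(("rename", guid))  # same file, new name
            if x.sha != y.sha:
                changes.append(("modify", guid))  # same file, new content
    return changes


def resolve_same_name_adds(ours, theirs):
    """Both sides independently added a file at the same path: keep the lower guid."""
    return ours if ours.guid <= theirs.guid else theirs
```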

But as I said before, that ship has sailed...
Jeff


Thread overview: 18+ messages
2017-09-11  7:11 [idea] File history tracking hints Pavel Kretov
2017-09-11 18:11 ` Stefan Beller
2017-09-11 18:47   ` Jacob Keller
2017-09-11 18:41 ` Jeff King
2017-09-11 20:09 ` Igor Djordjevic
2017-09-11 21:48 ` Philip Oakley
2017-09-13 11:38   ` Johannes Schindelin
2017-09-14 23:22     ` Philip Oakley
2017-09-29 23:12       ` Johannes Schindelin
2017-09-30  8:02         ` Jeff Hostetler
2017-09-30 15:11           ` Johannes Schindelin
2017-10-01  3:27           ` Junio C Hamano
2017-10-02 17:41             ` Stefan Beller
2017-10-02 18:51               ` Jeff Hostetler [this message]
2017-10-02 19:18                 ` Stefan Beller
2017-10-02 20:02                   ` Jeff Hostetler
2017-10-03  0:52                     ` Junio C Hamano
2017-10-03  0:45               ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).