From: Jeff Hostetler <git@jeffhostetler.com>
To: Stefan Beller <sbeller@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Philip Oakley <philipoakley@iee.org>,
Pavel Kretov <firegurafiku@gmail.com>,
"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [idea] File history tracking hints
Date: Mon, 2 Oct 2017 16:02:09 -0400
Message-ID: <f9b722d9-cd37-40f3-7ae4-6f7f3d90de83@jeffhostetler.com>
In-Reply-To: <CAGZ79kbjfXC3CxMDouUrCUVt-OJXckDtg9U_7=R=FM-eon4ikA@mail.gmail.com>
On 10/2/2017 3:18 PM, Stefan Beller wrote:
> On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>> Sorry to re-re-...-re-stir up such an old topic.
>>
>> I wasn't really thinking about commit-to-commit hints.
>> I think these have lots of problems. (If commit A->B does
>> "t/* -> tests/*" and commit B->C does "tests/*.c -> xyz/*",
>> then you need a way to compute a transitive closure to see
>> the net-net hints for A->C. I think that quickly spirals
>> out of control.)
>
> I agree. Though as a human I can still look at
> A..C giving the hint that t/*.c and xyz/*.c ought to
> be taken into account for rename detection.
> (which is currently done with -M -C --find-copies-harder
> as a generic "there are renamed things", and not the very
> specific rule, that may be cheaper to examine compared to
> these generic rules)
>
>> No, I was going in another direction. For example, if a
>> tree-entry contains { file-guid, file-name, file-sha, ... }
>> then when diffing any 2 commits, you can match up files
>> (and folders) by their guids. Renames pop out trivially when
>> their file-names don't match. File moves pop out when the
>> file-guids appear in different trees. Adds and deletes pop
>> out when file-guids don't have a peer. (I'm glossing over some
>> of the details, but you get the idea.)
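
A minimal sketch of the guid matching described in the quoted
paragraph, for illustration only. It assumes each tree entry can
be flattened to a (guid, path, sha) triple; the Entry type, the
diff_by_guid name and the flat-list representation are all
hypothetical and are not part of Git.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    guid: str  # hypothetical stable per-file id, assigned when the file is added
    path: str  # full path within the commit (assumed flattened from the trees)
    sha: str   # blob id of the content


def diff_by_guid(old, new):
    """Classify changes between two commits by pairing entries on guid.

    Returns a list of (kind, old_path, new_path) tuples.
    """
    old_by_guid = {e.guid: e for e in old}
    new_by_guid = {e.guid: e for e in new}
    changes = []

    for guid, o in old_by_guid.items():
        n = new_by_guid.get(guid)
        if n is None:
            # The guid has no peer on the new side: a delete.
            changes.append(("delete", o.path, None))
            continue
        o_dir, _, o_name = o.path.rpartition("/")
        n_dir, _, n_name = n.path.rpartition("/")
        if o_name != n_name:
            # Same guid, different file name: a rename.
            changes.append(("rename", o.path, n.path))
        if o_dir != n_dir:
            # Same guid, different containing tree: a move.
            changes.append(("move", o.path, n.path))
        if o.sha != n.sha:
            # Content changed, independent of where the file lives.
            changes.append(("modify", o.path, n.path))

    for guid, n in new_by_guid.items():
        if guid not in old_by_guid:
            # The guid has no peer on the old side: an add.
            changes.append(("add", None, n.path))

    return changes
```

No content comparison is needed to pair the files up; the cost
is one dictionary lookup per entry.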
>
> How do you know when a guid needs adaption?
I'm not sure I know what you mean by "adaption".
>
> (c.f. origin/jt/packmigrate)
> If a commit moves a function out of a file into a new file,
> the ideal version control could notice that the function
> was moved into a new file and still attribute the original
> authors by ignoring the move commit.
I think that's an orthogonal problem. I could move a function
from one file to an existing file or to a new file; it doesn't
matter. Attributing those lines back to the original author
(rather than the mover) is a bit of a pipe dream IMHO. And I
have to wonder whether it is always the correct thing to do. I
can see scenarios where you'd want the mover.
I guess there's nothing stopping the "ideal VC system" from
doing all this line-based analysis, but that shouldn't make
file renames expensive to detect (since that is the granularity
that people and most tools expect the system to work with).
>
> Another series in flight could have modified that
> function slightly (fixed a bug), such that it's hard to
> reason about these things.
>
> For guids I imagine the new file gets a new guid, such that
> tracking the function becomes harder?
>
Yeah, I'm not thinking about tracking individual functions.
>
>> To address Junio's
>> question, independently added files with the same name will
>> have 2 different file-guids. We amend the merge rules to
>> handle this case and pick one of them (say, the one that
>> sorts less than the other) as the winner and go on.
>> All-in-all the solution is not trivial (as there are a few
>> edge cases to deal with), but it better matches the (casual)
>> user's perception of what happened to their tree over time.
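
The tie-break in the quoted paragraph could look roughly like
the sketch below. It assumes each side of the merge is reduced
to a path -> guid mapping for the files it added; the function
name and that representation are hypothetical. The point is only
that picking the lesser guid gives the same answer whichever
side is "ours".

```python
def merge_added_guids(ours, theirs):
    """Merge two path -> guid mappings of independently added files.

    When both sides added the same path under different guids, keep
    the guid that sorts less, so the result does not depend on the
    direction of the merge.
    """
    merged = dict(ours)
    for path, guid in theirs.items():
        if path in merged and merged[path] != guid:
            # Same name, two different guids: pick the lesser one.
            merged[path] = min(merged[path], guid)
        else:
            merged[path] = guid
    return merged
```

Swapping the two arguments yields the same mapping, which is the
property a merge rule like this would need.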
>
> The GUID would be made up at creation time, I assume?
> Is there any input other than the file itself? (I assumed so
> initially, such that:
> By having a GUID in the tree, we would divorce from the notion
> of a "content addressable file system" quickly, as we both could
> create the same tree locally (containing the same blobs) and
> yet the trees would have different names due to having different
> GUIDs in them
> ), which I'd find undesirable.
Right. A real solution would store the guid data slightly
differently so we could preserve the existing SHA properties.
My example was more conceptual.
>
>> It also doesn't require expensive code to sniff for renames
>> on every command (which doesn't scale on really large repos).
>
> I wonder if the rename detection could be offloaded to a server
> (which scales) that provides a "hint file" to clients, such that the
> clients can then cheaply make use of these specific hints.
>
I don't know. Might be easier to add that computation to the
occasional client-side housekeeping (somewhat like the commit
generation number computation we keep talking about).
Thanks
Jeff