git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Philip Oakley <philipoakley@iee.org>,
	Pavel Kretov <firegurafiku@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [idea] File history tracking hints
Date: Mon, 2 Oct 2017 12:18:59 -0700	[thread overview]
Message-ID: <CAGZ79kbjfXC3CxMDouUrCUVt-OJXckDtg9U_7=R=FM-eon4ikA@mail.gmail.com> (raw)
In-Reply-To: <ea1538e3-2b2e-f7eb-9c0e-e29c15bf2ea9@jeffhostetler.com>

On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:

> Sorry to re-re-...-re-stir up such an old topic.
>
> I wasn't really thinking about commit-to-commit hints.
> I think these have lots of problems.  (If commit A->B does
> "t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
> then you need a way to compute a transitive closure to see
> the net-net hints for A->C.  I think that quickly spirals
> out of control.)

I agree. Though as a human I can still look at
A..C giving the hint that t/*.c and xyz/*.c ought to
be taken into account for rename detection.
(which is currently done with -M -C --find-copies-harder
as a generic "there are renamed things", and not the very
specific rule, that may be cheaper to examine compared to
these generic rules)

> No, I was going in another direction.  For example, if a
> tree-entry contains { file-guid, file-name, file-sha, ... }
> then when diffing any 2 commits, you can match up files
> (and folders) by their guids.  Renames pop out trivially when
> their file-names don't match.  File moves pop out when the
> file-guids appear in different trees.  Adds and deletes pop
> out when file-guids don't have a peer. (I'm glossing over some
> of the details, but you get the idea.)

How do you know when a guid needs adaption?

(c.f. origin/jt/packmigrate)
If a commit moves a function out of a file into a new file,
the ideal version control could notice that the function
was moved into a new file and still attribute the original
authors by ignoring the move commit.

Another series in flight could have modified that
function slightly (fixed a bug), such that it's hard to
reason about these things.

For guids I imagine the new file gets a new guid, such that
tracking the function becomes harder?


> To address Junio's
> question, independently added files with the same name will
> have 2 different file-guids.  We amend the merge rules to
> handle this case and pick one of them (say, the one that
> is sorts less than the other) as the winner and go on.
> All-in-all the solution is not trivial (as there are a few
> edge cases to deal with), but it better matches the (casual)
> user's perception of what happened to their tree over time.

The GUID would be made up at creation time, I assume?
Is there any input other than the file itself? (I assumed so
initially, such that:
  By having a GUID in the tree, we would divorce from the notion
  of a "content addressable file system" quickly, as we both could
  create the same tree locally (containing the same blobs) and
  yet the trees would have different names due to having different
  GUIDs in them
), which I'd find undesirable.

> It also doesn't require expensive code to sniff for renames
> on every command (which doesn't scale on really large repos).

I wonder if the rename detection could be offloaded to a server
(which scales) that provides a "hint file" to clients, such that the
clients can then cheaply make use of these specific hints.

  reply	other threads:[~2017-10-02 19:19 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-11  7:11 [idea] File history tracking hints Pavel Kretov
2017-09-11 18:11 ` Stefan Beller
2017-09-11 18:47   ` Jacob Keller
2017-09-11 18:41 ` Jeff King
2017-09-11 20:09 ` Igor Djordjevic
2017-09-11 21:48 ` Philip Oakley
2017-09-13 11:38   ` Johannes Schindelin
2017-09-14 23:22     ` Philip Oakley
2017-09-29 23:12       ` Johannes Schindelin
2017-09-30  8:02         ` Jeff Hostetler
2017-09-30 15:11           ` Johannes Schindelin
2017-10-01  3:27           ` Junio C Hamano
2017-10-02 17:41             ` Stefan Beller
2017-10-02 18:51               ` Jeff Hostetler
2017-10-02 19:18                 ` Stefan Beller [this message]
2017-10-02 20:02                   ` Jeff Hostetler
2017-10-03  0:52                     ` Junio C Hamano
2017-10-03  0:45               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kbjfXC3CxMDouUrCUVt-OJXckDtg9U_7=R=FM-eon4ikA@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=firegurafiku@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=philipoakley@iee.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).