* [idea] File history tracking hints
From: Pavel Kretov @ 2017-09-11 7:11 UTC
To: git

Hi all,

Excuse me if the topic I'm going to raise here has been already discussed
on the mailing list, forums, or IRC, but I couldn't find anything related.


The problem:

Git, being "a stupid content tracker", doesn't try to keep an eye on
operations which happen to individual files; things like file renames
aren't recorded during commit, but heuristically detected later.

Unfortunately, the heuristic can only deal with simple file renames with
no substantial content changes; it's helpless when you:

  - rename file and change its content significantly;
  - split single file into several files;
  - merge several files into another;
  - copy entire file from another commit, and do other things like these.

However, if we're able to preserve this information, it's possible
not only to do more accurate 'git blame', but also merge revisions with
fewer conflicts.


The proposal:

The idea is to let user give hints about what was changed during
the commit. For example, if user did a rename which wasn't automatically
detected, he would append something like the following to his commit
message:

    Tracking-hints: rename dev-vcs/git/git-1.0.ebuild -> dev-vcs/git/git-2.0.ebuild

or (if full paths of affected files can be unambiguously omitted):

    Tracking-hints: rename git-1.0.ebuild -> git-2.0.ebuild

There may be other hint types:

    Tracking-hint: recreate LICENSE.txt
    Tracking-hint: split main.c -> main.c cmdline.c
    Tracking-hint: merge linalg.py <- vector.py matrix.py

or even something like this:

    Tracking-hint: copy json.py <- libs/json.py@4db88291251151d8c5c8e4f20430fa4def2cb2ed

If file transformation cannot be described by a single tracking hint, it
shall be possible to specify a sequence of hints at once:

    Tracking-hint:
        split Utils.java -> AppHelpers.java StringHelpers.java
        recreate Utils.java

Note that in the above example the order of operations really matters, so
both lines have to reside in one 'Tracking-hint' block.

                                * * *

How do you think, is this idea worth implementing?
Any other thoughts on this?

-- Pavel Kretov.
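The hint syntax proposed above could be parsed along the following lines. This is only a minimal sketch, assuming one hint per line, paths without spaces, and indented continuation lines inside a multi-line block; the names `Hint`, `parse_hint` and `hints_from_message` are hypothetical, and nothing like this exists in Git itself.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Hint:
    op: str              # e.g. rename, split, merge, copy, recreate
    sources: List[str]   # paths the content came from
    targets: List[str]   # paths the content ended up in


def parse_hint(text: str) -> Hint:
    """Parse one hint such as 'split main.c -> main.c cmdline.c'."""
    op, _, rest = text.strip().partition(" ")
    if "->" in rest:                      # source -> targets
        left, right = rest.split("->", 1)
        return Hint(op, left.split(), right.split())
    if "<-" in rest:                      # target <- sources
        left, right = rest.split("<-", 1)
        return Hint(op, right.split(), left.split())
    return Hint(op, [], rest.split())     # no arrow, e.g. 'recreate LICENSE.txt'


def hints_from_message(message: str) -> List[Hint]:
    """Collect hints, in order, from 'Tracking-hint(s):' lines and blocks."""
    hints: List[Hint] = []
    in_block = False
    for line in message.splitlines():
        match = re.match(r"Tracking-hints?:\s*(.*)$", line)
        if match:
            in_block = True
            if match.group(1):
                hints.append(parse_hint(match.group(1)))
        elif in_block and line[:1] in (" ", "\t") and line.strip():
            hints.append(parse_hint(line))  # indented continuation line
        else:
            in_block = False
    return hints
```

Keeping the hints in a list preserves the order within a block, which the proposal says matters for sequences such as split followed by recreate.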
* Re: [idea] File history tracking hints
From: Stefan Beller @ 2017-09-11 18:11 UTC
To: Pavel Kretov; +Cc: git@vger.kernel.org

On Mon, Sep 11, 2017 at 12:11 AM, Pavel Kretov <firegurafiku@gmail.com> wrote:
> How do you think, is this idea worth implementing?
> Any other thoughts on this?

This was discussed a couple of times on the mailing list
(though not recently).

I searched for "rename tracking files site:public-inbox.org/git"
and came up with
https://public-inbox.org/git/Pine.LNX.4.58.0504141102430.7211@ppc970.osdl.org/
(the nearby emails seem to also be relevant to this discussion)

tl;dr: When encoding these hints, you do it at commit time,
but the heuristic can be improved upon later.
So you can assume the heuristic is better for the
common case, as someone will fix the heuristic for the
common case. Also, Git's model is to track objects.
* Re: [idea] File history tracking hints
From: Jacob Keller @ 2017-09-11 18:47 UTC
To: Stefan Beller; +Cc: Pavel Kretov, git@vger.kernel.org

On Mon, Sep 11, 2017 at 11:11 AM, Stefan Beller <sbeller@google.com> wrote:
> This was discussed a couple of times on the mailing list
> (though not recently).
>
> I searched for "rename tracking files site:public-inbox.org/git"
> and came up with
> https://public-inbox.org/git/Pine.LNX.4.58.0504141102430.7211@ppc970.osdl.org/
> (the nearby emails seem to also be relevant to this discussion)
>
> tl:dr: When encoding these hints, you do it at commit time,
> but the heuristic can be improved upon later.
> So you can assume the heuristic is better for the
> common case, as someone will fix the heuristic for the
> common case. Also Gits model is to track objects.

Linus has a pretty long post about this; it's somewhere in that
discussion. Essentially, if you bake in rename detection (or other
hints) at commit time, then you're stuck with it forever.

Additionally, there are similar but not *quite* the same operations
which you probably wouldn't bake in at the start, and the types of
questions a user wants to ask aren't known at commit time, but rather
at *debug* time in the future, when you're digging up history. In this
time frame, the user does know what to care about and what kind of
questions to ask, so it's already natural to ask these questions at
that time.

Additionally, if you have to run the heuristic at every commit, you're
increasing the time "wasted" on every commit, whereas doing the lookup
later, when a user starts asking questions (during blame or diff, for
example), would only add time to an operation the user already expects
to take some time.

Thanks,
Jake
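To make the "detect later, at query time" idea concrete, here is a rough sketch of a similarity-based rename detector. It is not Git's actual algorithm: it uses Python's `difflib` as a stand-in for Git's similarity scoring, and the function name `detect_renames` is hypothetical. The 0.5 default mirrors the 50% threshold that Git documents as the default for its `-M` option.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def detect_renames(
    deleted: Dict[str, str],
    added: Dict[str, str],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Pair each deleted path with its most similar added path.

    Both arguments map a path to the file's text. A pair is reported
    only if the line-based similarity reaches the threshold.
    """
    pairs = []
    remaining = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in remaining.items():
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            pairs.append((old_path, best_path, best_score))
            del remaining[best_path]  # each added file is matched at most once
    return pairs
```

Because nothing is stored in the commit, the threshold or the scoring function can be changed at any later time and applied to old history, which is the property the replies above argue for. The same sketch also shows the weakness the original post describes: a file that is renamed and heavily rewritten falls below the threshold and is not paired.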
* Re: [idea] File history tracking hints
From: Jeff King @ 2017-09-11 18:41 UTC
To: Pavel Kretov; +Cc: git

On Mon, Sep 11, 2017 at 10:11:31AM +0300, Pavel Kretov wrote:

> Unfortunately, the heuristic can only deal with simple file renames with
> no substantial content changes; it's helpless when you:
>
>   - rename file and change its content significantly;
>   - split single file into several files;
>   - merge several files into another;
>   - copy entire file from another commit, and do other things like these.
>
> However, if we're able to preserve this information, it's possible
> not only to do more accurate 'git blame', but also merge revisions with
> fewer conflicts.

This is definitely something that's been discussed before on the list
(though I'm not sure of the best keywords to dig for; Stefan found one
thread but I know there have been others).

And I don't think it's a totally unreasonable idea, but there are some
complications.

The biggest one is that renames are really part of a _diff_ between two
endpoints. We think of them as attached to a commit because we tend to
talk about commits as a diff from state A to state B. So obviously in
the diff HEAD^ versus HEAD, we can look at the hints for HEAD. But what
about "git diff v1.0 v1.1", which may cover multiple commits?

Right now Git doesn't look at the intermediate commits at all. And in
fact we may not even know what they are, if the command is fed two
trees. Or the two endpoints may not have a sensible history (e.g.,
consider diffing between two branches, one of which has been rebased).

But even if we had a sensible set of commits to pull hints from (e.g.,
if v1.0 and v1.1 were in a linear relationship), it's not clear to me
how you would want to apply them to an end-to-end diff.

So I don't think that these kinds of tracking hints make sense for a lot
of diffs (including merges, which use diffs between the endpoints and
the merge base).

Which isn't to say that they're useless. I agree that something like
"--follow" could benefit from an annotation that tells us when and how
to pick up the next step in the traversal.

But of course somebody has to make those annotations. If we had a tool
to do it automatically, then we could apply the same tool at run-time
later. But maybe if it were an optional annotation, people would want to
use it when the normal rename logic doesn't kick in.

So perhaps a baby step in this direction would be to teach something
like "--follow" to "jump" across a non-rename when it sees a special
marking in the commit message.

-Peff
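The "baby step" described above, a follow-style traversal that jumps to another path when a commit carries a marking, might look roughly like this. It is a toy model over an in-memory, linear, newest-first history, not Git's implementation of "--follow"; the function `follow` and the dictionary keys `changed`, `detected` and `hints` are all hypothetical.

```python
from typing import List


def follow(history: List[dict], path: str, use_hints: bool = True) -> List[str]:
    """Return ids of commits that touched `path`, following it backwards.

    Each commit is a dict with:
      'id'       - commit identifier
      'changed'  - set of paths changed, as named after the commit
      'detected' - {new_path: old_path} found by heuristic rename detection
      'hints'    - {new_path: old_path} taken from the commit message
    """
    touched = []
    for commit in history:  # newest first
        if path in commit["changed"]:
            touched.append(commit["id"])
        old_path = commit["detected"].get(path)
        if old_path is None and use_hints:
            # The heuristic saw no rename; fall back to the annotation.
            old_path = commit["hints"].get(path)
        if old_path is not None:
            path = old_path  # keep walking under the earlier name
    return touched
```

In this model the hint is consulted only when following a single path through one commit at a time, so it sidesteps the end-to-end diff problem raised in the message above rather than solving it.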
* Re: [idea] File history tracking hints
From: Igor Djordjevic @ 2017-09-11 20:09 UTC
To: Pavel Kretov, git

Hi Pavel,

On 11/09/2017 09:11, Pavel Kretov wrote:
> How do you think, is this idea worth implementing?
> Any other thoughts on this?

Here[1] you can find Linus' reply (from 2005-04-15) to the "rename
tracking" discussion, usually quoted to explain the Git philosophy on
this point, even referred to as "one of the most important messages in
the list archive"[2] by Junio himself.

[1] https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
[2] https://public-inbox.org/git/xmqqr30qflk9.fsf@gitster.mtv.corp.google.com/

Regards,
Buga
* Re: [idea] File history tracking hints
From: Philip Oakley @ 2017-09-11 21:48 UTC
To: Pavel Kretov, git

From: "Pavel Kretov" <firegurafiku@gmail.com>
> How do you think, is this idea worth implementing?
> Any other thoughts on this?
>
> -- Pavel Kretov.

Maybe use the "interpret-trailers" methods for standardising your hints
locally (in your team / workplace) to see how it goes and flesh out what
works and what doesn't. Trying to decide, a priori, what the right hints
are is likely to be the hard part.
--
Philip
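One way to try the "interpret-trailers" suggestion locally is to let Git append the hints as trailers. `git interpret-trailers` accepts `--trailer <token>:<value>` and reads the message from standard input when no file is given; everything else below is an assumption for illustration: the `Tracking-hint` token comes from the proposal, and the helper names `trailer_command` and `add_hints` are hypothetical.

```python
import subprocess
from typing import List


def trailer_command(hints: List[str]) -> List[str]:
    """Build the argument list for appending Tracking-hint trailers."""
    cmd = ["git", "interpret-trailers"]
    for hint in hints:
        cmd += ["--trailer", f"Tracking-hint: {hint}"]
    return cmd


def add_hints(message: str, hints: List[str]) -> str:
    """Pipe a commit message through git and return it with trailers added.

    Requires a `git` executable on PATH.
    """
    result = subprocess.run(
        trailer_command(hints),
        input=message,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A team could call something like `add_hints` from a commit-msg hook or a wrapper script. Git gives such trailers no special meaning today, so they would only be useful to tools that the team writes to read them back.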
* Re: [idea] File history tracking hints
From: Johannes Schindelin @ 2017-09-13 11:38 UTC
To: Philip Oakley; +Cc: Pavel Kretov, git

Hi Philip,

On Mon, 11 Sep 2017, Philip Oakley wrote:

> Maybe use the "interpret-trailers" methods for standardising your hints
> locally (in your team / workplace) to see how it goes and flesh out what works
> and what doesn't. Trying to decide, a-priori, what are the right hints is
> likely to be the hard part.

I think this adds a very valuable insight to this discussion: the current
state of Git's rename handling is based on the idea that you either record
the renames, or you detect them. Like, there is either "on" or "off". No
middle ground.

However, if you understand that there is also the possibility of hints
that can help any erroneous rename detection (and *everybody* who has
seriously worked on a massive code base has seen that rename detection
fail in the most inopportune ways [*1*]), then you are on to something.

So I totally like the idea of introducing hints, possibly as trailers in
the commit message (or as refs/notes/rename/* or whatever) that can be
picked up by Git versions that know about them, and can be ignored by Git
versions that insist on the rename detection du jour. With a config option
to control the behavior, maybe, too.

Ciao,
Dscho

Footnote *1*: Just to name a couple of examples from my personal
experience, off the top of my head:

- license boilerplates often let Git detect renames/copies where there
  are none,

- even something as trivial as moving Java classes (and their dependent
  classes) between packages changes every line referring to said packages,
  causing Git's rename detection to go for a drink instead of doing its
  job,

- indentation changes overwhelm Git's rename detection,

- when rename detection would matter most, like, really a lot, to lift the
  burden of the human beings in front of the computer poring over
  hundreds of thousands of files moved from one directory tree to another,
  that's exactly when Git's rename detection says that there are too many
  files, here are my union rights, I am going home, good luck to you.

In light of such experiences, I have to admit that the notion that the
rename detection can always be improved in hindsight adds quite a bit of
insult to injury for those developers who are bitten by it.
* Re: [idea] File history tracking hints
From: Philip Oakley @ 2017-09-14 23:22 UTC
To: Johannes Schindelin; +Cc: Pavel Kretov, git

From: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>
> In light of such experiences, I have to admit that the notion that the
> rename detection can always be improved in hindsight puts quite a bit of
> insult to injury for those developers who are bitten by it.

Your list made me think that the hints should be directed toward what may
be considered existing solutions for those specific awkward cases.

So the hints could be (by type):
- template;licence;boiler-plate;standard;reference :: copy
- word-rename
- regex for word substitution changes (e.g. which chars are within 'Word-_0')
- regex for white-space changes (i.e. which chars are considered whitespace)
- move-dir path/glob spec
- move-file path/glob spec
  (maybe list each 'group' of moves, so that once found the rest of the
  rename detection follows the group.)

Once the particular hint is detected (path qualified), the clue/hint is
used to assist in parsing the files to simplify the comparison task and
locate common lines, or common word patterns.

The first example is just a set of alternate terms folk use for the new
duplicate file case.

The second is a hint that there have been a number of fairly global name
changes in the files, so not only do a word diff but also detect and
summarise those global changes (your class move example).

The third is the simpler case of global word changes, based on a limited
char set for a 'word' token list.

The fourth is where we are focussed on the white space part
(complementing the word token viewpoint).

The move hints are lists of path specs that have each distinctly moved.

It may be possible to order the hints as well, so that the detections work
in the right order, giving the heuristics a better chance!
--
Philip
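As an illustration of the white-space hint in the list above, the following sketch compares two versions of a file with and without stripping the characters that a hint declares to be insignificant. It uses `difflib` as a stand-in similarity measure, not Git's own scoring, and the function `similarity` with its `ignore` parameter is hypothetical.

```python
import re
from difflib import SequenceMatcher


def similarity(old: str, new: str, ignore: str = "") -> float:
    """Line-based similarity of two texts, in the range 0.0 to 1.0.

    `ignore` is an optional regular expression (for example one supplied
    by a white-space hint); every match is deleted from each line before
    the lines are compared.
    """
    def normalised(text: str):
        lines = text.splitlines()
        if ignore:
            lines = [re.sub(ignore, "", line) for line in lines]
        return lines

    return SequenceMatcher(None, normalised(old), normalised(new)).ratio()
```

For a file whose indentation changed from spaces to tabs, the plain comparison sees almost every line as different, while the comparison guided by a pattern such as `[ \t]+` sees the files as identical. That is the sense in which a hint could simplify the comparison task rather than replace it.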
* Re: [idea] File history tracking hints
From: Johannes Schindelin @ 2017-09-29 23:12 UTC
To: Philip Oakley; +Cc: Pavel Kretov, git

Hi Philip,

On Fri, 15 Sep 2017, Philip Oakley wrote:

> From: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>
>
> > In light of such experiences, I have to admit that the notion that the
> > rename detection can always be improved in hindsight puts quite a bit of
> > insult to injury for those developers who are bitten by it.
>
> Your list made me think that the hints should be directed toward what may be
> considered existing solutions for those specific awkward cases.
>
> So the hints could be (by type):
> - template;licence;boiler-plate;standard;reference :: copy
> - word-rename
> - regex for word substitution changes (e.g. which chars are within 'Word-_0')
> - regex for white-space changes (i.e. which chars are considered whitespace.)
> - move-dir path/glob spec
> - move-file path/glob spec
>   (maybe list each 'group' of moves, so that once found the rest of the rename
>   detection follows the group.)
>
> Once the particular hint is detected (path qualified) then the clue/hint is
> used to assist in parsing the files to simplify the comparison task and locate
> common lines, or common word patterns.
>
> The first example is just a set of alternate terms folk use for the new
> duplicate file case.
>
> The second is a hint that there has been a number of fairly global name
> changes in the files. so not only do a word diff but detect & summarise those
> global changes. (your class move example)
>
> The third is the more simple global word changes, based on a limited char set
> for a 'word' token list.
>
> The fourth is where we are focussed on the white space part (complementing the
> word token viewpoint)
>
> The move hints are lists of path specs that each have distinctly moved.
>
> It may be possible to order the hints as well, so that the detections work in
> the right order, giving the heuristics a better chance!

I think my point was: no matter how likely we thought it was that
heuristic rename detection could be perfected over time, history proved
that expectation incorrect.

Therefore, it would be good to have a way to tell Git about renames
explicitly so that it does not even need to use its heuristics.

Ciao,
Dscho
* Re: [idea] File history tracking hints
  2017-09-29 23:12 ` Johannes Schindelin
@ 2017-09-30 8:02 ` Jeff Hostetler
  2017-09-30 15:11 ` Johannes Schindelin
  2017-10-01 3:27 ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff Hostetler @ 2017-09-30 8:02 UTC (permalink / raw)
  To: Johannes Schindelin, Philip Oakley; +Cc: Pavel Kretov, git

On 9/29/2017 7:12 PM, Johannes Schindelin wrote:
> Hi Philip,
>
> On Fri, 15 Sep 2017, Philip Oakley wrote:
>
>> From: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>
>>
>>> In light of such experiences, I have to admit that the notion that the
>>> rename detection can always be improved in hindsight puts quite a bit of
>>> insult to injury for those developers who are bitten by it.
>>
>> Your list made me think that the hints should be directed toward what may be
>> considered existing solutions for those specific awkward cases.
>>
>> So the hints could be (by type):
>> - template;licence;boiler-plate;standard;reference :: copy
>> - word-rename
>> - regex for word substitution changes (e.g. which chars are within 'Word-_0')
>> - regex for white-space changes (i.e. which chars are considered whitespace.)
>> - move-dir path/glob spec
>> - move-file path/glob spec
>> (maybe list each 'group' of moves, so that once found the rest of the rename
>> detection follows the group.)
>>
>> Once the particular hint is detected (path qualified) then the clue/hint is
>> used to assist in parsing the files to simplify the comparison task and locate
>> common lines, or common word patterns.
>>
>> The first example is just a set of alternate terms folk use for the new
>> duplicate file case.
>>
>> The second is a hint that there has been a number of fairly global name
>> changes in the files, so not only do a word diff but detect & summarise those
>> global changes. (your class move example)
>>
>> The third is the more simple global word changes, based on a limited char set
>> for a 'word' token list.
>>
>> The fourth is where we are focussed on the white space part (complementing the
>> word token viewpoint)
>>
>> The move hints are lists of path specs that each have distinctly moved.
>>
>> It may be possible to order the hints as well, so that the detections work in
>> the right order, giving the heuristics a better chance!
>
> I think my point was: no matter how likely we thought any heuristic rename
> detection can be perfected over time, history proved that suspicion
> incorrect.
>
> Therefore, it would be good to have a way to tell Git about renames
> explicitly so that it does not even need to use its heuristics.

Agreed.

It would be nice if every file (and tree) had a permanent GUID
associated with it. Then the filename/pathname becomes a property
of the GUIDs. Then you can exactly know about moves/renames with
minimal effort (and no guessing). But I suppose that ship has sailed...

Jeff

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [idea] File history tracking hints 2017-09-30 8:02 ` Jeff Hostetler @ 2017-09-30 15:11 ` Johannes Schindelin 2017-10-01 3:27 ` Junio C Hamano 1 sibling, 0 replies; 18+ messages in thread From: Johannes Schindelin @ 2017-09-30 15:11 UTC (permalink / raw) To: Jeff Hostetler; +Cc: Philip Oakley, Pavel Kretov, git Hi Jeff, On Sat, 30 Sep 2017, Jeff Hostetler wrote: > On 9/29/2017 7:12 PM, Johannes Schindelin wrote: > > > Therefore, it would be good to have a way to tell Git about renames > > explicitly so that it does not even need to use its heuristics. > > Agreed. > > It would be nice if every file (and tree) had a permanent GUID > associated with it. Then the filename/pathname becomes a property > of the GUIDs. Then you can exactly know about moves/renames with > minimal effort (and no guessing). But I suppose that ship has sailed... Yes, that ship has sailed. But we still could teach Git to understand certain "hints" (that would be really more like "cluebats"). So while we cannot have any GUIDs that are persistent across renames/moves (and which users would probably get wrong all the time by using third-party tools that are not Git-rename aware), we have unique identifiers: the object names. And we could easily have a lookup table of pairs of object names, telling Git that they were source and target of a rename. When Git would try to figure out whether anything was renamed, it would first look at that lookup table and save itself a lot of work (and opportunity to fail) and short-cut the rename detection. Ciao, Johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [idea] File history tracking hints
  2017-09-30 8:02 ` Jeff Hostetler
  2017-09-30 15:11 ` Johannes Schindelin
@ 2017-10-01 3:27 ` Junio C Hamano
  2017-10-02 17:41 ` Stefan Beller
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2017-10-01 3:27 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Johannes Schindelin, Philip Oakley, Pavel Kretov, git

Jeff Hostetler <git@jeffhostetler.com> writes:

> On 9/29/2017 7:12 PM, Johannes Schindelin wrote:
>
>> Therefore, it would be good to have a way to tell Git about renames
>> explicitly so that it does not even need to use its heuristics.
>
> Agreed.
>
> It would be nice if every file (and tree) had a permanent GUID
> associated with it. Then the filename/pathname becomes a property
> of the GUIDs. Then you can exactly know about moves/renames with
> minimal effort (and no guessing).

I actually like the idea to have a mechanism where the user can give
hint to influence, or instruction to dictate, how Git determines
"this old path moved to this new path" when comparing two trees.

A human would not consider a new file (e.g. header file) that begins
with a few dozen commonly-seen boilerplate lines (e.g. copyright
statement) followed by several lines unique to the new contents to
be a rename of a disappearing old file that begins with the same
boilerplate followed by several lines that are different from what
is in the new file, but Git's algorithm would give equal weight to
all of these lines when deciding how similar the new file is to the
old file, and can misidentify a new file to be a rename of an old
file that is unrelated.

Even when Git can and does determine the pairing correctly, it would
be a win if we do not have to recompute the same pairing every time.

So both as hint and as cache, such a mechanism would make sense [*1*].

But "file ID" does not have any place to contribute to such a
mechanism.

Each of two developers working on the same project in a distributed
environment can grab the same gist and create a new file in his or
her tree, perhaps at the same path or at a different path. At the
time of such an addition, there is no way for each of them to give
these two files the same "file ID" (that is how the world works in
the distributed environment after all)---which "file ID" should
survive when their two histories finally meet and result in a
single file after a merge?

A file with "file ID" may not be renamed but may be copied and
evolve separately and differently. Which one should inherit its
original "file ID" and how does having "file ID" help us identify
the other one is equally related to the original file?

These two are merely examples of problems that "file ID"s would
cause while solving "only" what can be expressed in "git diff -M"
output (the latter illustrates that it does not even help showing
"git diff -C").

And when we stop limiting ourselves to the whole-file renames and
copies (which can be expressed in "git diff" output) but also want
to help finer-grained operation like "git blame", we'd want to have
something that helps in situations like a single file's contents
split into multiple files and multiple files' contents concatenated
into a single new file, both of which happens during code
refactoring. "file ID" would not contribute an iota in helping
these situations.

I've said this a number of times, and I'll say this again, but one
of the most important messages in our list archive is gmane:217 aka

    https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/

I'd encourage people to read and re-read that message until they can
recite it by heart. Linus mentions "CVS annotate"; the message was
written long before we had "git blame", and it served as a guide
when designing how we dig contents movement in various parts of the
system.

[Footnote]

*1* There are many possible implementations; the most obvious would
    be to record a pair of blob object names and instruct Git, when
    it sees one side of a pair disappearing and the other side of
    the pair appearing, to take the pair as a rename. And that would
    be sufficient for "git log -M". Such a cache/hint alone however
    would not help much in "git merge" without further work, as we
    merge using only the tree state of the three points in the
    history (i.e. the common ancestor and two tips). merge-recursive
    needs to be taught to find the renames at each commit it finds
    throughout the history from the ancestor and each tip and carry
    its finding through if it wants to take advantage of such
    hint/cache.

^ permalink raw reply [flat|nested] 18+ messages in thread
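
A rough sketch of the blob-pair table described in the footnote, for illustration only. It assumes a tree can be modelled as a flat {path: blob name} mapping; the names `RenameHints` and `pair_renames` are made up here and nothing like this table exists in Git.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hint store: a set of (old_blob, new_blob) object-name pairs.
# This is an illustration only; Git has no such table today.
RenameHints = Set[Tuple[str, str]]


def pair_renames(
    old_tree: Dict[str, str],
    new_tree: Dict[str, str],
    hints: RenameHints,
) -> List[Tuple[str, str]]:
    """Return (old_path, new_path) pairs that the hint table declares renames.

    Each tree is modelled as a flat {path: blob_name} mapping.
    """
    # Paths that disappeared and paths that appeared between the two trees.
    deleted = {p: b for p, b in old_tree.items() if p not in new_tree}
    added = {p: b for p, b in new_tree.items() if p not in old_tree}

    renames: List[Tuple[str, str]] = []
    taken: Set[str] = set()
    for old_path in sorted(deleted):
        for new_path in sorted(added):
            if new_path in taken:
                continue
            # Only the blob names are consulted, never the paths or commits.
            if (deleted[old_path], added[new_path]) in hints:
                renames.append((old_path, new_path))
                taken.add(new_path)
                break
    return renames


old = {"Utils.java": "blob-a", "README": "blob-r"}
new = {"StringHelpers.java": "blob-b", "README": "blob-r"}
print(pair_renames(old, new, {("blob-a", "blob-b")}))
```

Because the lookup is keyed on blob names alone, the same table would serve any two trees, whether or not they are related by history. As the footnote notes, this alone would not be enough for "git merge".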
* Re: [idea] File history tracking hints
  2017-10-01 3:27 ` Junio C Hamano
@ 2017-10-02 17:41 ` Stefan Beller
  2017-10-02 18:51 ` Jeff Hostetler
  2017-10-03 0:45 ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Stefan Beller @ 2017-10-02 17:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff Hostetler, Johannes Schindelin, Philip Oakley, Pavel Kretov,
      git@vger.kernel.org

>> It would be nice if every file (and tree) had a permanent GUID
>> associated with it. Then the filename/pathname becomes a property
>> of the GUIDs. Then you can exactly know about moves/renames with
>> minimal effort (and no guessing).
> ...
> https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
>
> I'd encourage people to read and re-read that message until they can
> recite it by heart.

I have rethought the idea of GUIDs as proposed by Jeff and wanted
to give a reply. After rereading this message, I think my thoughts are
already included via:

  - you're doing the work at the wrong point for _another_ reason. You're
    freezing your (crappy) algorithm at tree creation time, and basically
    making it pointless to ever create something better later, because even
    if hardware and software improves, you've codified that "we have to
    have crappy information".

--
My design proposal for these "rename hints" would be a special trailer,
roughly:

  Rename: LICENSE -> legal.txt
  Rename: t/* -> tests/*

or more generally:

  Rename: <pathspec> <delim> <pathspec>

This however has multiple issues due to potential
human inaccuracies:
(A) typos in the trailer key or in the pathspec
  (resulting in different error modes)
(B) partial hints (We currently have a world of
  completely missing hints, so I would not expect it to
  be worse?)
(C) wrong hints. This ought to be no problem as Git would
  take some CPU time to conclude the hint was bogus.

For (A), I would imagine we want a mechanism (e.g. notes)
to "correct" the hints.
This is a similar issue to a typo in a commit message, which we
currently just ignore if the commit has been merged to e.g. master.

So maybe we'd just design around that, giving the option
to give the correct hints via command line.

So if the commit has the typo'd hint

    Remame: t/* -> tests/*

the human would see that (and also conclude that by
the commit message), and then invoke

    git log -C -C-hint="t/* -> tests/*" ...

which would have the corrected hint and hence deliver
the best output.

Maybe the "-C-hint" flag is the best starting point when
going in that direction?

Thanks,
Stefan

^ permalink raw reply [flat|nested] 18+ messages in thread
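
To make issue (A) concrete, here is a minimal sketch of reading such trailers out of a commit message and flagging keys that look like a misspelled "Rename". The trailer syntax is only the one proposed in this message, with "->" assumed as the delimiter; `parse_rename_trailers` is a made-up name, not part of Git or of "git interpret-trailers".

```python
import difflib
import re
from typing import List, Tuple

TRAILER_KEY = "Rename"  # key proposed in the thread, not an existing Git trailer
TRAILER_RE = re.compile(
    r"^(?P<key>[A-Za-z-]+):\s*(?P<src>\S+)\s*->\s*(?P<dst>\S+)\s*$"
)


def parse_rename_trailers(message: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a commit message into usable hints and suspicious near-misses."""
    hints: List[Tuple[str, str]] = []
    suspicious: List[str] = []
    for raw in message.splitlines():
        line = raw.strip()
        m = TRAILER_RE.match(line)
        if not m:
            continue
        key = m.group("key")
        if key == TRAILER_KEY:
            hints.append((m.group("src"), m.group("dst")))
        elif difflib.get_close_matches(key, [TRAILER_KEY], n=1, cutoff=0.8):
            # Looks like a typo of the trailer key: report it, do not guess.
            suspicious.append(line)
    return hints, suspicious


msg = """Move tests out of t/

Rename: LICENSE -> legal.txt
Remame: t/* -> tests/*
"""
hints, suspicious = parse_rename_trailers(msg)
print(hints)
print(suspicious)
```

The typo'd line is reported separately rather than silently accepted, which leaves room for a human to supply the corrected hint on the command line as suggested above.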
* Re: [idea] File history tracking hints
  2017-10-02 17:41 ` Stefan Beller
@ 2017-10-02 18:51 ` Jeff Hostetler
  2017-10-02 19:18 ` Stefan Beller
  2017-10-03 0:45 ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Hostetler @ 2017-10-02 18:51 UTC (permalink / raw)
  To: Stefan Beller, Junio C Hamano
  Cc: Johannes Schindelin, Philip Oakley, Pavel Kretov,
      git@vger.kernel.org

On 10/2/2017 1:41 PM, Stefan Beller wrote:
>>> It would be nice if every file (and tree) had a permanent GUID
>>> associated with it. Then the filename/pathname becomes a property
>>> of the GUIDs. Then you can exactly know about moves/renames with
>>> minimal effort (and no guessing).
>>
> ...
>
>> https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
>>
>> I'd encourage people to read and re-read that message until they can
>> recite it by heart.
>
> I have rethought about the idea of GUIDs as proposed by Jeff and wanted
> to give a reply. After rereading this message, I think my thoughts are
> already included via:
>
>   - you're doing the work at the wrong point for _another_ reason. You're
>     freezing your (crappy) algorithm at tree creation time, and basically
>     making it pointless to ever create something better later, because even
>     if hardware and software improves, you've codified that "we have to
>     have crappy information".
>
> --
> My design proposal for these "rename hints" would be a special trailer,
> roughly:
>
>   Rename: LICENSE -> legal.txt
>   Rename: t/* -> tests/*
>
> or more generally:
>
>   Rename: <pathspec> <delim> <pathspec>
>
> This however has multiple issues due to potential
> human inaccuracies:
> (A) typos in the trailer key or in the pathspec
>   (resulting in different error modes)
> (B) partial hints (We currently have a world of
>   completely missing hints, so I would not expect it to
>   be worse?)
> (C) wrong hints. This ought to be no problem as Git would
>   take some CPU time to conclude the hint was bogus.
>
> For (A), I would imagine we want a mechanism (e.g. notes)
> to "correct" the hints. This is the similar issue as a typo in a
> commit message, which we currently just ignore if the
> commit has been merged to e.g. master.
>
> So maybe we'd just design around that, giving the option
> to give the correct hints via command line.
>
> So if the commit has the typo'd hint
>
>     Remame: t/* -> tests/*
>
> the human would see that (and also conclude that by
> the commit message), and then invoke
>
>     git log -C -C-hint="t/* -> tests/*" ...
>
> which would have the corrected hint and hence deliver
> the best output.
>
> Maybe the "-C-hint" flag is the best starting point when
> going in that direction?
>
> Thanks,
> Stefan
>

Sorry to re-re-...-re-stir up such an old topic.

I wasn't really thinking about commit-to-commit hints.
I think these have lots of problems. (If commit A->B does
"t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
then you need a way to compute a transitive closure to see
the net-net hints for A->C. I think that quickly spirals
out of control.)

No, I was going in another direction. For example, if a
tree-entry contains { file-guid, file-name, file-sha, ... }
then when diffing any 2 commits, you can match up files
(and folders) by their guids. Renames pop out trivially when
their file-names don't match. File moves pop out when the
file-guids appear in different trees. Adds and deletes pop
out when file-guids don't have a peer. (I'm glossing over some
of the details, but you get the idea.)

To address Junio's question, independently added files with
the same name will have 2 different file-guids. We amend the
merge rules to handle this case and pick one of them (say, the
one that sorts less than the other) as the winner and go on.

All-in-all the solution is not trivial (as there are a few
edge cases to deal with), but it better matches the (casual)
user's perception of what happened to their tree over time.
It also doesn't require expensive code to sniff for renames
on every command (which doesn't scale on really large repos).

But as I said before, that ship has sailed...

Jeff

^ permalink raw reply [flat|nested] 18+ messages in thread
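
The guid-keyed diff described in this message can be sketched as follows. This is only a toy model: the `Entry` record and its `guid` field are hypothetical, real Git tree entries carry no such identifier, and the merge tie-break rule is left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    guid: str  # hypothetical permanent file identifier
    path: str  # current path of the file
    blob: str  # content hash (stands in for file-sha)


def diff_by_guid(old: List[Entry], new: List[Entry]) -> Dict[str, list]:
    """Match entries of two trees by guid and classify the differences."""
    old_by = {e.guid: e for e in old}
    new_by = {e.guid: e for e in new}

    # Same guid on both sides but a different path: a rename or move.
    renamed: List[Tuple[str, str]] = sorted(
        (old_by[g].path, new_by[g].path)
        for g in old_by.keys() & new_by.keys()
        if old_by[g].path != new_by[g].path
    )
    # Guids without a peer are adds or deletes.
    added = sorted(new_by[g].path for g in new_by.keys() - old_by.keys())
    deleted = sorted(old_by[g].path for g in old_by.keys() - new_by.keys())
    return {"renamed": renamed, "added": added, "deleted": deleted}


before = [Entry("g1", "t/a.c", "b1"), Entry("g2", "LICENSE", "b2")]
after = [Entry("g1", "tests/a.c", "b1"), Entry("g3", "NEW", "b3")]
print(diff_by_guid(before, after))
```

Note what the sketch does not cover: two developers who independently add the same content get two different guids, and a copied file has no obvious guid to inherit. Those are the cases raised earlier in the thread.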
* Re: [idea] File history tracking hints 2017-10-02 18:51 ` Jeff Hostetler @ 2017-10-02 19:18 ` Stefan Beller 2017-10-02 20:02 ` Jeff Hostetler 0 siblings, 1 reply; 18+ messages in thread From: Stefan Beller @ 2017-10-02 19:18 UTC (permalink / raw) To: Jeff Hostetler Cc: Junio C Hamano, Johannes Schindelin, Philip Oakley, Pavel Kretov, git@vger.kernel.org On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler <git@jeffhostetler.com> wrote: > Sorry to re-re-...-re-stir up such an old topic. > > I wasn't really thinking about commit-to-commit hints. > I think these have lots of problems. (If commit A->B does > "t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*", > then you need a way to compute a transitive closure to see > the net-net hints for A->C. I think that quickly spirals > out of control.) I agree. Though as a human I can still look at A..C giving the hint that t/*.c and xyz/*.c ought to be taken into account for rename detection. (which is currently done with -M -C --find-copies-harder as a generic "there are renamed things", and not the very specific rule, that may be cheaper to examine compared to these generic rules) > No, I was going in another direction. For example, if a > tree-entry contains { file-guid, file-name, file-sha, ... } > then when diffing any 2 commits, you can match up files > (and folders) by their guids. Renames pop out trivially when > their file-names don't match. File moves pop out when the > file-guids appear in different trees. Adds and deletes pop > out when file-guids don't have a peer. (I'm glossing over some > of the details, but you get the idea.) How do you know when a guid needs adaption? (c.f. origin/jt/packmigrate) If a commit moves a function out of a file into a new file, the ideal version control could notice that the function was moved into a new file and still attribute the original authors by ignoring the move commit. 
Another series in flight could have modified that function slightly (fixed a bug), such that it's hard to reason about these things. For guids I imagine the new file gets a new guid, such that tracking the function becomes harder? > To address Junio's > question, independently added files with the same name will > have 2 different file-guids. We amend the merge rules to > handle this case and pick one of them (say, the one that > is sorts less than the other) as the winner and go on. > All-in-all the solution is not trivial (as there are a few > edge cases to deal with), but it better matches the (casual) > user's perception of what happened to their tree over time. The GUID would be made up at creation time, I assume? Is there any input other than the file itself? (I assumed so initially, such that: By having a GUID in the tree, we would divorce from the notion of a "content addressable file system" quickly, as we both could create the same tree locally (containing the same blobs) and yet the trees would have different names due to having different GUIDs in them ), which I'd find undesirable. > It also doesn't require expensive code to sniff for renames > on every command (which doesn't scale on really large repos). I wonder if the rename detection could be offloaded to a server (which scales) that provides a "hint file" to clients, such that the clients can then cheaply make use of these specific hints. ^ permalink raw reply [flat|nested] 18+ messages in thread
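
For the A->B and B->C composition quoted at the top of this message, a minimal sketch restricted to exact paths may help show where the difficulty starts. `compose_renames` is a made-up helper; glob-style hints such as "t/*" are deliberately out of scope, and they are precisely what makes the general case hard.

```python
from typing import Dict


def compose_renames(ab: Dict[str, str], bc: Dict[str, str]) -> Dict[str, str]:
    """Net exact-path renames A->C, given renames A->B and B->C."""
    net: Dict[str, str] = {}
    # Follow every A->B rename one step further through B->C.
    for src, mid in ab.items():
        dst = bc.get(mid, mid)
        if dst != src:  # renamed away and back again: no net rename
            net[src] = dst
    # Paths that A->B left alone but B->C renamed.
    ab_targets = set(ab.values())
    for mid, dst in bc.items():
        if mid not in ab_targets and mid not in ab:
            net[mid] = dst
    return net


a_to_b = {"t/a.c": "tests/a.c", "README": "README.md"}
b_to_c = {"tests/a.c": "xyz/a.c", "Makefile": "GNUmakefile"}
print(compose_renames(a_to_b, b_to_c))
```

Even this toy version has to special-case round trips and paths that were vacated in A->B and reused in B; with pathspec patterns and longer commit chains the bookkeeping grows quickly.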
* Re: [idea] File history tracking hints 2017-10-02 19:18 ` Stefan Beller @ 2017-10-02 20:02 ` Jeff Hostetler 2017-10-03 0:52 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Jeff Hostetler @ 2017-10-02 20:02 UTC (permalink / raw) To: Stefan Beller Cc: Junio C Hamano, Johannes Schindelin, Philip Oakley, Pavel Kretov, git@vger.kernel.org On 10/2/2017 3:18 PM, Stefan Beller wrote: > On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler <git@jeffhostetler.com> wrote: > >> Sorry to re-re-...-re-stir up such an old topic. >> >> I wasn't really thinking about commit-to-commit hints. >> I think these have lots of problems. (If commit A->B does >> "t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*", >> then you need a way to compute a transitive closure to see >> the net-net hints for A->C. I think that quickly spirals >> out of control.) > > I agree. Though as a human I can still look at > A..C giving the hint that t/*.c and xyz/*.c ought to > be taken into account for rename detection. > (which is currently done with -M -C --find-copies-harder > as a generic "there are renamed things", and not the very > specific rule, that may be cheaper to examine compared to > these generic rules) > >> No, I was going in another direction. For example, if a >> tree-entry contains { file-guid, file-name, file-sha, ... } >> then when diffing any 2 commits, you can match up files >> (and folders) by their guids. Renames pop out trivially when >> their file-names don't match. File moves pop out when the >> file-guids appear in different trees. Adds and deletes pop >> out when file-guids don't have a peer. (I'm glossing over some >> of the details, but you get the idea.) > > How do you know when a guid needs adaption? I'm not sure I know what you mean by "adaption". > > (c.f. 
origin/jt/packmigrate) > If a commit moves a function out of a file into a new file, > the ideal version control could notice that the function > was moved into a new file and still attribute the original > authors by ignoring the move commit. I think that's an orthogonal problem. I could move a function from one file to an existing file or to a new file it doesn't matter. Attributing those lines back to the original author (rather than the mover) is a bit of a pipe dream IMHO. And I have to wonder if it is always the correct thing to do? I can see scenarios where you'd want the mover. I guess there's nothing from stopping the "ideal VC system" doing all this line-based analysis, but that shouldn't make file renames expensive to detect (since that is the granularity that people and most tools expect the system to work with). > > Another series in flight could have modified that > function slightly (fixed a bug), such that it's hard to > reason about these things. > > For guids I imagine the new file gets a new guid, such that > tracking the function becomes harder? > Yeah, I'm not thinking about tracking individual functions. > >> To address Junio's >> question, independently added files with the same name will >> have 2 different file-guids. We amend the merge rules to >> handle this case and pick one of them (say, the one that >> is sorts less than the other) as the winner and go on. >> All-in-all the solution is not trivial (as there are a few >> edge cases to deal with), but it better matches the (casual) >> user's perception of what happened to their tree over time. > > The GUID would be made up at creation time, I assume? > Is there any input other than the file itself? 
(I assumed so > initially, such that: > By having a GUID in the tree, we would divorce from the notion > of a "content addressable file system" quickly, as we both could > create the same tree locally (containing the same blobs) and > yet the trees would have different names due to having different > GUIDs in them > ), which I'd find undesirable. Right. A real solution would store the guid data slightly differently so we could preserve the existing SHA properties. My example was more conceptual. > >> It also doesn't require expensive code to sniff for renames >> on every command (which doesn't scale on really large repos). > > I wonder if the rename detection could be offloaded to a server > (which scales) that provides a "hint file" to clients, such that the > clients can then cheaply make use of these specific hints. > I don't know. Might be easier to add that computation to the occasional client-side housekeeping (somewhat like the commit generation number computation we keep talking about). Thanks Jeff ^ permalink raw reply [flat|nested] 18+ messages in thread
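
The content-addressing concern quoted above can be shown with a toy hash. The sketch assumes a simplified tree encoding, not Git's real tree object format, and the `guid` column is hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def tree_id(entries: Iterable[Tuple[str, ...]]) -> str:
    """Hash a tree modelled as tuples of strings (toy encoding, not Git's)."""
    h = hashlib.sha1()
    for entry in sorted(entries):
        h.update("\0".join(entry).encode() + b"\n")
    return h.hexdigest()


# Two developers create the same file with the same content.
alice_plain = [("main.c", "blob-1")]
bob_plain = [("main.c", "blob-1")]

# Same trees, but each entry also carries a locally generated guid.
alice_guid = [("main.c", "blob-1", "guid-alice")]
bob_guid = [("main.c", "blob-1", "guid-bob")]

print(tree_id(alice_plain) == tree_id(bob_plain))
print(tree_id(alice_guid) == tree_id(bob_guid))
```

With plain (path, blob) entries both developers arrive at the same tree name; once a locally generated guid is part of the hashed data, identical content no longer yields identical names. That is why the guid data would have to be stored outside the hashed tree, as suggested in the reply.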
* Re: [idea] File history tracking hints 2017-10-02 20:02 ` Jeff Hostetler @ 2017-10-03 0:52 ` Junio C Hamano 0 siblings, 0 replies; 18+ messages in thread From: Junio C Hamano @ 2017-10-03 0:52 UTC (permalink / raw) To: Jeff Hostetler Cc: Stefan Beller, Johannes Schindelin, Philip Oakley, Pavel Kretov, git@vger.kernel.org Jeff Hostetler <git@jeffhostetler.com> writes: >> How do you know when a guid needs adaption? > > I'm not sure I know what you mean by "adaption". I think he meant adapting, and I think he is referring to what I wrote in the message upthread to explain why "file ID" would not help. It seems to me, from reading the remainder of your message, that it is also becoming clear to you that "file ID" would not help and your conceptual thing was merely a hand-waving that was dubious how it could be made into a concrete working design? Hopefully we can converge on a workable design that does not involve "file ID", and that would be a good outcome. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [idea] File history tracking hints 2017-10-02 17:41 ` Stefan Beller 2017-10-02 18:51 ` Jeff Hostetler @ 2017-10-03 0:45 ` Junio C Hamano 1 sibling, 0 replies; 18+ messages in thread From: Junio C Hamano @ 2017-10-03 0:45 UTC (permalink / raw) To: Stefan Beller Cc: Jeff Hostetler, Johannes Schindelin, Philip Oakley, Pavel Kretov, git@vger.kernel.org Stefan Beller <sbeller@google.com> writes: > I have rethought about the idea of GUIDs as proposed by Jeff and wanted > to give a reply. After rereading this message, I think my thoughts are > already included via: > > - you're doing the work at the wrong point for _another_ reason. You're > freezing your (crappy) algorithm at tree creation time, and basically > making it pointless to ever create something better later, because even > if hardware and software improves, you've codified that "we have to > have crappy information". > > -- > My design proposal for these "rename hints" would be a special trailer, > roughly: > > Rename: LICENSE -> legal.txt > Rename: t/* -> tests/* > > or more generally: > > Rename: <pathspec> <delim> <pathspec> Yes, it is a non starter to have that baked in the log message of a commit object. The principle Linus lays out in the message does not reject such hints stored outside baked-in data structure, which allows mistakes to be corrected without affecting the real history, though. Another thing that makes what you wrote above of dubious value is that it attaches such hints to "a commit" (whether baked inside the log message, or as some form of "notes" that can be associated with a specific commit); it adds hints at a wrong place. Given identical pair of trees <X,Y> that are wrapped in two pairs of commits <A> and <B> where A^{tree}=B^{tree} and A^^{tree}=B^^{tree}, we do not want to have to give duplicated hints for A and B, to help "git show A" and "git show B" to behave the same. 
Rather, if we said "these two blobs A and B are similar and we want diffcore-rename to pair them, no matter where they appear in any two trees", then "git diff -M X Y", where X and Y may not have any ancestry relationship (they may not even be commits) can be told that the blob A that is in tree X and the blob B that is in tree Y are renames or copies, no matter where in these trees the pair of blobs appear, and no matter how X and Y are related (or unrelated) in the history. That is a bigger reason why annotating a commit may be a bad way to go. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2017-10-03 0:52 UTC | newest]

Thread overview: 18+ messages
2017-09-11 7:11 [idea] File history tracking hints Pavel Kretov
2017-09-11 18:11 ` Stefan Beller
2017-09-11 18:47 ` Jacob Keller
2017-09-11 18:41 ` Jeff King
2017-09-11 20:09 ` Igor Djordjevic
2017-09-11 21:48 ` Philip Oakley
2017-09-13 11:38 ` Johannes Schindelin
2017-09-14 23:22 ` Philip Oakley
2017-09-29 23:12 ` Johannes Schindelin
2017-09-30 8:02 ` Jeff Hostetler
2017-09-30 15:11 ` Johannes Schindelin
2017-10-01 3:27 ` Junio C Hamano
2017-10-02 17:41 ` Stefan Beller
2017-10-02 18:51 ` Jeff Hostetler
2017-10-02 19:18 ` Stefan Beller
2017-10-02 20:02 ` Jeff Hostetler
2017-10-03 0:52 ` Junio C Hamano
2017-10-03 0:45 ` Junio C Hamano