From: Junio C Hamano <junkio@cox.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Fix -B "very-different" logic.
Date: Thu, 02 Jun 2005 18:33:18 -0700
Message-ID: <7vis0wusv5.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <Pine.LNX.4.58.0506021716140.1876@ppc970.osdl.org> (Linus Torvalds's message of "Thu, 2 Jun 2005 17:21:43 -0700 (PDT)")
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> Careful.
LT> I think the amount of new code _should_ matter. Otherwise, an old empty
LT> file would always be considered the source of a new file, since the diff
LT> doesn't remove anything. Similarly, just because we have a boilerplate
LT> file shouldn't make that always be considered a "wonderful source", when
LT> people add the real meat to it.
Yes, I agree that rename/copy logic should use different
heuristics from the one I proposed for breaking.
It is my assumption that people in practice tend to make only
small edits after a rename/copy just to adjust things like:
- filenames mentioned in the comment of the file itself,
- include paths that refer to other files if the file was
moved/copied from a different directory,
- names of functions and variables.
So making sure there is not too much new stuff is quite useful
for detecting the rename/copy source correctly, which is what
the current similarity estimator in diffcore-rename does. I do
not intend to touch that.
The boilerplate example you mention is a very good reason not
to dismiss the amount of new material when doing rename/copy
detection.
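To make the empty-file and boilerplate cases concrete, here is a
minimal sketch of the two ways of scoring a candidate source. The
function names and the byte-count inputs are hypothetical and chosen
for illustration only; this is not the estimator diffcore-rename
actually uses.

```python
def deletion_only_score(src_size, copied):
    """Fraction of the source that survives; new material is ignored."""
    if src_size == 0:
        # Nothing could be removed from an empty file, so it looks perfect.
        return 1.0
    return copied / src_size


def rename_score(src_size, dst_size, copied):
    """Surviving material relative to the larger side, so additions count."""
    bigger = max(src_size, dst_size)
    if bigger == 0:
        return 1.0
    return copied / bigger


# An old empty file as candidate source of a new 5000-byte file.
print(deletion_only_score(0, 0))              # 1.0
print(rename_score(0, 5000, 0))               # 0.0

# A 200-byte boilerplate file that gained 4800 bytes of real meat.
print(deletion_only_score(200, 200))          # 1.0
print(rename_score(200, 5000, 200))           # 0.04

# A genuine rename followed by small edits (1000 -> 1020 bytes, 980 kept).
print(round(rename_score(1000, 1020, 980), 2))  # 0.96
```

With the deletion-only score, the empty file and the boilerplate file
both look like ideal sources; once the amount of new material is taken
into account, only the genuine rename scores high.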
LT> In particular, let's say that I used to have two files:
LT> a.c - small helper functions
LT> b.c - the "meat" of the thing
LT> and I end up deciding that I might as well collapse it all into one file,
LT> a.c. What happens? There's almost no deletes from a.c, but there's a lot
LT> of new code in it.
LT> See what I'm saying?
Yes. I think I do.
When git-diff-tree -B -C runs your example, it feeds diffcore
with these:
:100644 100644 sha1-a-helper-only sha1-a-and-meat M a.c
:100644 000000 sha1-b-stale-meat 0{40} D b.c
The ideal diffcore-break breaks a.c because it looks at
insertions as well:
:100644 000000 sha1-a-helper-only 0{40} D a.c
:000000 100644 0{40} sha1-a-and-meat N a.c
:100644 000000 sha1-b-stale-meat 0{40} D b.c
Then diffcore-rename notices that sha1-b-stale-meat is a better
match than sha1-a-helper-only to produce sha1-a-and-meat, and
resolves the above to:
:100644 100644 sha1-b-stale-meat sha1-a-and-meat R b.c a.c
Everything up to this point is just a demonstration that I see
your point.
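The break-then-rename sequence above can be sketched roughly as
follows. This is a toy model under stated assumptions, not the
diffcore code: file contents are modelled as sets of lines, the
`Pair` type, the `similarity()` measure and both 0.5 thresholds are
invented for the illustration, and a leftover delete is dropped when
its path still exists in the result, which is assumed here only so
that the output matches the single R line shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pair:
    status: str                     # "M", "D", "N" or "R"
    src_path: Optional[str]
    dst_path: Optional[str]
    src: frozenset = frozenset()    # old content, as a set of lines
    dst: frozenset = frozenset()    # new content, as a set of lines


def similarity(src, dst):
    """Shared lines relative to the larger side (hypothetical measure)."""
    bigger = max(len(src), len(dst))
    return len(src & dst) / bigger if bigger else 1.0


def break_pairs(pairs, threshold=0.5):
    """Split a modified pair into delete + create when it changed a lot."""
    out = []
    for p in pairs:
        if p.status == "M" and similarity(p.src, p.dst) < threshold:
            out.append(Pair("D", p.src_path, None, src=p.src))
            out.append(Pair("N", None, p.dst_path, dst=p.dst))
        else:
            out.append(p)
    return out


def detect_renames(pairs, threshold=0.5):
    """Match each created file with the most similar deleted file."""
    deleted = [p for p in pairs if p.status == "D"]
    out = [p for p in pairs if p.status not in ("D", "N")]
    used = set()
    for created in (p for p in pairs if p.status == "N"):
        best = max(deleted, key=lambda d: similarity(d.src, created.dst),
                   default=None)
        if best is not None and similarity(best.src, created.dst) >= threshold:
            out.append(Pair("R", best.src_path, created.dst_path,
                            best.src, created.dst))
            used.add(best.src_path)
        else:
            out.append(created)
    # Assumption: a leftover delete whose path still exists is dropped.
    live = {p.dst_path for p in out}
    out.extend(d for d in deleted
               if d.src_path not in used and d.src_path not in live)
    return out


helper = frozenset(f"h{i}" for i in range(10))   # small helper functions
meat = frozenset(f"m{i}" for i in range(90))     # the "meat" of the thing

pairs = [
    Pair("M", "a.c", "a.c", helper, helper | meat),
    Pair("D", "b.c", None, src=meat),
]
result = detect_renames(break_pairs(pairs))
for p in result:
    print(p.status, p.src_path, p.dst_path)      # R b.c a.c
```

In this model the old a.c shares only 10 of 100 lines with the new
a.c, so the pair is broken, and b.c (90 of 100 lines) then wins as
the source.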
But I still want to keep the example I gave in the original
commit message. Suppose you did not have b.c file under version
control, and did the same operation. I.e. a.c acquired a lot of
good stuff. git-diff-tree -B -C feeds:
:100644 100644 sha1-a-helper-only sha1-a-and-meat M a.c
which is broken into:
:100644 000000 sha1-a-helper-only 0{40} D a.c
:000000 100644 0{40} sha1-a-and-meat N a.c
Unfortunately, in this case nobody absorbs these pairs. I want
to allow you to add 1000 lines of new stuff to a file (which was
originally 100 lines long) without triggering the "this is a
rewrite" logic, as long as you do not remove too many of the
original 100 lines. So after rename/copy runs, we need to match
these up and merge them back into the original:
:100644 100644 sha1-a-helper-only sha1-a-and-meat M a.c
We should carry a bit more information about broken entries than
we currently do. We would break a pair based on both deletion
and insertion, just like the current code (i.e. without the
patch you are responding to) does. But when we do break a pair,
we need to mark the two halves if the "new" side has enough
original source material remaining. If we have such a mark to
tell us that "these were broken but there is a good chunk of
source material remaining", the clean-up phase, which runs after
diffcore-rename finishes, should be able to notice surviving
broken pairs and merge them back accordingly.