From: Junio C Hamano <junkio@cox.net>
To: git@vger.kernel.org
Subject: Re: updated design for the diff-raw format.
Date: Sat, 21 May 2005 16:16:00 -0700
Message-ID: <7vr7g0dvbj.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <7vwtpsdvgm.fsf@assigned-by-dhcp.cox.net> (Junio C. Hamano's message of "Sat, 21 May 2005 16:12:57 -0700")
(first of the replayed exchange)
To: Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH 3/3] Diff overhaul, adding the other half of copy
detection.
From: Junio C Hamano <junkio@cox.net>
Date: Sat, 21 May 2005 10:56:06 -0700
Message-ID: <7v4qcwihu1.fsf@assigned-by-dhcp.cox.net>
GIT_DIFF_OPTS=--unified=0 works for me as well (GNU diffutils
2.8.1).
Now I think I am done with diff, except for one thing. This is
quite an incompatible change, so I do not know how well it would
work in practice. I am not even advocating it; it is more like
me thinking aloud.
The diff-raw format we have been dealing with (sorry about the
'\t' vs ' ' gotcha again) is internally enhanced by diff-core. It
first introduces entries for unmodified paths; '*' entries that
have the same mode/sha1 on both sides of the from->to pair are
such entries, and that is what the change in [PATCH 3/3] is about.

    *100644->100644 blob 233a250...->66818b4... file0
    *100755->100755 blob fc77389...->7b72d3d... file1
    +100644 blob 233a250... file2
    -100755 blob fc77389... file3
    *100644->100644 blob 233a250...->233a250... file4

Then diff-core internally extends the format to make things all
look like this ('*' and '-' are gone and each record acquires
the second path).

    100644->100644 233a250...->66818b4... file0 file0
    100755->100755 fc77389...->7b72d3d... file1 file1
    ______->100644 _______...->233a250... file2 file2
    100755->______ fc77389...->_______... file3 file3
    100644->100644 233a250...->233a250... file4 file4

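The step from the '+'/'-'/'*' records to the two-sided records can
be sketched roughly as below. This is only an illustration in
Python, not the diff-core code: the names FileSpec, FilePair,
parse_old_raw and show are made up for the sketch, fields are
assumed to be separated by whitespace, and None stands in for the
"______" placeholder.

```python
from typing import NamedTuple, Optional


class FileSpec(NamedTuple):
    """One side of a record; mode/sha1 of None means 'side is absent'."""
    mode: Optional[str]
    sha1: Optional[str]
    path: str


class FilePair(NamedTuple):
    src: FileSpec
    dst: FileSpec


def parse_old_raw(line: str) -> FilePair:
    """Turn one '+', '-' or '*' record into a two-sided pair (hypothetical helper)."""
    op, rest = line[0], line[1:]
    # Fields: mode(s), object type (ignored here), sha1(s), path.
    modes, _kind, sha1s, path = rest.split(None, 3)
    if op == "*":
        src_mode, dst_mode = modes.split("->")
        src_sha1, dst_sha1 = sha1s.split("->")
        return FilePair(FileSpec(src_mode, src_sha1, path),
                        FileSpec(dst_mode, dst_sha1, path))
    if op == "+":  # created: no source side
        return FilePair(FileSpec(None, None, path),
                        FileSpec(modes, sha1s, path))
    if op == "-":  # deleted: no destination side
        return FilePair(FileSpec(modes, sha1s, path),
                        FileSpec(None, None, path))
    raise ValueError(f"unknown operation letter: {op!r}")


def show(pair: FilePair) -> str:
    """Render a pair in the extended notation used in this message."""
    def mode(spec: FileSpec) -> str:
        return spec.mode or "______"

    def sha1(spec: FileSpec) -> str:
        return spec.sha1 or "_______..."

    return (f"{mode(pair.src)}->{mode(pair.dst)} "
            f"{sha1(pair.src)}->{sha1(pair.dst)} "
            f"{pair.src.path} {pair.dst.path}")
```

Feeding the "+100644 blob 233a250... file2" record through
parse_old_raw and show gives back the file2 line of the extended
list.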
Internally, the "______" placeholders above are represented with a
separate flag (file_valid) and denote the absence of either src or
dst.
The diff-core is all about manipulating this type of list and
changing one such list into a different list.
For example, a rename-edit of file3 into file2 is detected by the
diffcore-rename module, and these entries:

    ______->100644 _______...->233a250... file2 file2
    100755->______ fc77389...->_______... file3 file3

become:

    100755->100644 fc77389...->233a250... file3 file2

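As a rough illustration of that list-to-list transformation, here
is a small Python sketch that folds a deletion record and a
creation record into one rename record. It assumes the decision
that file3 and file2 are similar has already been made elsewhere;
the similarity estimation itself is not shown. FilePair and
merge_rename are hypothetical names, and None again plays the role
of "______".

```python
from typing import List, NamedTuple, Optional


class FilePair(NamedTuple):
    src_mode: Optional[str]   # None: no source side (creation)
    dst_mode: Optional[str]   # None: no destination side (deletion)
    src_sha1: Optional[str]
    dst_sha1: Optional[str]
    src_path: str
    dst_path: str


def merge_rename(pairs: List[FilePair],
                 deleted_path: str,
                 created_path: str) -> List[FilePair]:
    """Replace a matching deletion/creation with a single rename record."""
    deleted = next(p for p in pairs
                   if p.dst_mode is None and p.src_path == deleted_path)
    created = next(p for p in pairs
                   if p.src_mode is None and p.dst_path == created_path)
    # Source side comes from the deletion, destination side from the creation.
    merged = FilePair(deleted.src_mode, created.dst_mode,
                      deleted.src_sha1, created.dst_sha1,
                      deleted.src_path, created.dst_path)
    out = [p for p in pairs if p is not deleted and p is not created]
    out.append(merged)
    return out
```

The input list is left untouched and a new list is returned, which
matches the idea of changing one such list into a different list.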
What diffcore-pickaxe does can also be explained clearly with
this model. It takes such a list and works as a "grep" on it.
Once we start to think of it this way, it becomes quite tempting
to change the diff-raw format to actually match the above
concept. Meaning: (1) drop the operation letter +/-/*
(inferable by looking at both sides of ->); (2) drop
blob/tree (inferable from the mode); (3) give two paths (usually
they are the same path); and (4) perhaps replace '->' with the
same column separator. Like this:

    100644 100644 233a250... 66818b4... file0 file0
    100755 100755 fc77389... 7b72d3d... file1 file1
    ______ 100644 _______... 233a250... file2 file2
    100755 ______ fc77389... _______... file3 file3
    100644 100644 233a250... 233a250... file4 file4

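To check that nothing is lost by dropping the operation letter and
the blob/tree column, here is a small Python sketch that reads one
record in the proposed layout and infers both. It is an
illustration under assumptions, not a parser for any real git
output: fields are assumed to be whitespace-separated, paths are
assumed to contain no whitespace, a tree is assumed to be
recognizable by the POSIX directory bits (040000) in its mode, and
parse_proposed, infer_op and infer_kind are made-up names.

```python
def is_missing(field: str) -> bool:
    """True for the all-underscore placeholder such as '______'."""
    return set(field) == {"_"}


def infer_op(src_mode: str, dst_mode: str) -> str:
    """Recover the old operation letter from the two mode columns."""
    if is_missing(src_mode) and is_missing(dst_mode):
        raise ValueError("record has neither a source nor a destination")
    if is_missing(src_mode):
        return "+"   # created
    if is_missing(dst_mode):
        return "-"   # deleted
    return "*"       # present on both sides


def infer_kind(mode: str) -> str:
    """Recover blob/tree from an octal mode string (S_IFMT / S_IFDIR bits)."""
    return "tree" if (int(mode, 8) & 0o170000) == 0o040000 else "blob"


def parse_proposed(line: str) -> dict:
    """Split one six-column record and attach the inferred fields."""
    src_mode, dst_mode, src_sha1, dst_sha1, src_path, dst_path = line.split()
    # Take the object type from whichever side actually exists.
    known_mode = dst_mode if is_missing(src_mode) else src_mode
    return {
        "op": infer_op(src_mode, dst_mode),
        "kind": infer_kind(known_mode),
        "src": (src_mode, src_sha1, src_path),
        "dst": (dst_mode, dst_sha1, dst_path),
    }
```

For the file2 record above this yields op '+' and kind 'blob', and
for the file3 record op '-' and kind 'blob', the same information
the old '+100644 blob' and '-100755 blob' prefixes carried.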
Again, I am not even advocating this. It is more like me
still thinking aloud.