git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <junkio@cox.net>
To: git@vger.kernel.org
Subject: Re: updated design for the diff-raw format.
Date: Sat, 21 May 2005 16:16:00 -0700
Message-ID: <7vr7g0dvbj.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <7vwtpsdvgm.fsf@assigned-by-dhcp.cox.net> (Junio C. Hamano's message of "Sat, 21 May 2005 16:12:57 -0700")

(first of the replayed exchange)

To: Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH 3/3] Diff overhaul, adding the other half of copy
 detection.
From: Junio C Hamano <junkio@cox.net>
Date: Sat, 21 May 2005 10:56:06 -0700
Message-ID: <7v4qcwihu1.fsf@assigned-by-dhcp.cox.net>

GIT_DIFF_OPTS=--unified=0 is good to me as well (GNU diffutils
2.8.1).

Now I think I am done with diff, except one thing.  And this is
quite an incompatible change so I do not know how well it would
work in practice.  I am not even advocating this.  It is more
like me thinking aloud.

The diff-raw format we have been dealing with (sorry about '\t'
vs ' ' gotcha again) is internally enhanced by diff-core.  It
first introduces entries for unmodified paths; '*' entries that
have the same mode/sha1 in the from->to pair are such entries,
and that is what the change in [PATCH 3/3] is about.

    *100644->100644 blob 233a250...->66818b4... file0
    *100755->100755 blob fc77389...->7b72d3d... file1
    +100644 blob 233a250... file2
    -100755 blob fc77389... file3
    *100644->100644 blob 233a250...->233a250... file4

Then diff-core internally extends the format to make things all
look like this ('*' and '-' are gone and each record acquires
the second path).

    100644->100644 233a250...->66818b4... file0 file0
    100755->100755 fc77389...->7b72d3d... file1 file1
    ______->100644 _______...->233a250... file2 file2
    100755->______ fc77389...->_______... file3 file3
    100644->100644 233a250...->233a250... file4 file4

Internally, the "______" fields above are represented with a
separate flag (file_valid) and denote the absence of either src
or dst.
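
To make the mapping concrete, here is a minimal Python sketch of how
the old diff-raw lines could be turned into the paired records shown
above.  It is not the diff-core code; the names FileSpec, FilePair and
parse_old_raw are made up for illustration, and an absent side is
modelled with None in place of the file_valid flag.

```python
from typing import NamedTuple, Optional


class FileSpec(NamedTuple):
    mode: Optional[str]  # None stands in for "file_valid is false"
    sha1: Optional[str]
    path: str


class FilePair(NamedTuple):
    src: FileSpec
    dst: FileSpec


def parse_old_raw(line: str) -> FilePair:
    """Parse one old-style diff-raw line ('*', '+' or '-') into a pair."""
    line = line.rstrip("\n")
    op, rest = line[0], line[1:]
    # Splitting on any whitespace sidesteps the '\t' vs ' ' gotcha;
    # the path is whatever follows the third field.
    mode, _kind, sha1, path = rest.split(None, 3)
    if op == "*":
        src_mode, dst_mode = mode.split("->")
        src_sha1, dst_sha1 = sha1.split("->")
        return FilePair(FileSpec(src_mode, src_sha1, path),
                        FileSpec(dst_mode, dst_sha1, path))
    if op == "+":
        # Creation: the source side is absent.
        return FilePair(FileSpec(None, None, path),
                        FileSpec(mode, sha1, path))
    if op == "-":
        # Deletion: the destination side is absent.
        return FilePair(FileSpec(mode, sha1, path),
                        FileSpec(None, None, path))
    raise ValueError(f"unknown operation letter: {op!r}")
```

Every record ends up with two sides and two paths, so later stages
never need to look at the operation letter again.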

The diff-core is all about manipulating this type of list and
changing one such list into a different list.

For example, rename-edit of file3 into file2 is detected by
diffcore-rename module and these entries:

    ______->100644 _______...->233a250... file2 file2
    100755->______ fc77389...->_______... file3 file3

become:

    100755->100644 fc77389...->233a250... file3 file2
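
A rough Python sketch of that list-to-list step follows.  It only
shows the merging of a deletion with a creation; the similarity test
itself is left to a caller-supplied predicate (is_similar is a
hypothetical name, not the estimator diffcore-rename uses), and each
side is a plain (mode, sha1, path) tuple or None when absent.

```python
from typing import Callable, List, Optional, Tuple

Side = Optional[Tuple[str, str, str]]  # (mode, sha1, path); None if absent
Pair = Tuple[Side, Side]


def merge_renames(pairs: List[Pair],
                  is_similar: Callable[[tuple, tuple], bool]) -> List[Pair]:
    """Fold matching (deletion, creation) entries into single records."""
    creations = [i for i, (src, dst) in enumerate(pairs)
                 if src is None and dst is not None]
    deletions = [i for i, (src, dst) in enumerate(pairs)
                 if src is not None and dst is None]
    consumed = set()  # indexes of creations absorbed into a rename
    merged = {}       # deletion index -> merged record
    for d in deletions:
        for c in creations:
            if c in consumed:
                continue
            if is_similar(pairs[d][0], pairs[c][1]):
                merged[d] = (pairs[d][0], pairs[c][1])
                consumed.add(c)
                break
    # Emit the new list: merged records replace the deletion entry,
    # absorbed creation entries disappear, everything else is kept.
    return [merged.get(i, pair)
            for i, pair in enumerate(pairs) if i not in consumed]
```

Fed the file2/file3 entries above with a predicate that accepts the
pair, this yields the single file3 -> file2 record.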

What the diffcore-pickaxe does can also be explained clearly
with this model.  It takes such a list and works as a "grep".
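
As an illustration of the "grep" view, here is a small Python sketch
that filters such a list.  The selection rule used (keep a record when
the number of occurrences of a string differs between the two sides)
is an assumption made for the example, and blobs is a hypothetical
mapping from sha1 to file contents, not anything diff-core provides.

```python
from typing import Dict, List, Optional, Tuple

Side = Optional[Tuple[str, str, str]]  # (mode, sha1, path); None if absent
Pair = Tuple[Side, Side]


def pickaxe(pairs: List[Pair], needle: bytes,
            blobs: Dict[str, bytes]) -> List[Pair]:
    """Keep only the records whose two sides differ in needle count."""
    def count(side: Side) -> int:
        if side is None:  # an absent side contains nothing
            return 0
        return blobs[side[1]].count(needle)

    return [pair for pair in pairs if count(pair[0]) != count(pair[1])]
```

The input and output are the same kind of list, which is what lets
this stage be chained with the others.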

Once we start to think of it this way, it becomes quite tempting
to change the diff-raw format to actually match the above
concept.  Meaning, (1) drop the operation letter +/-/*
(inferrable by looking at both sides of ->); (2) drop blob/tree
(inferrable from the mode); (3) give two paths (usually they
are the same path); (4) and perhaps replace '->' with the same
column separator.  Like this:

    100644 100644 233a250... 66818b4... file0 file0
    100755 100755 fc77389... 7b72d3d... file1 file1
    ______ 100644 _______... 233a250... file2 file2
    100755 ______ fc77389... _______... file3 file3
    100644 100644 233a250... 233a250... file4 file4
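
To check that nothing is lost by dropping the letter and blob/tree,
here is a Python sketch that reads one line of the format above and
infers both.  The name describe is made up, and splitting on
whitespace is an assumption of the sketch: it would not cope with
paths that contain spaces.

```python
ABSENT = "______"
S_IFMT = 0o170000   # mask for the file-type bits of a mode
S_IFDIR = 0o040000  # file-type bits of a directory (tree)


def describe(line: str):
    """Return (op, kind, src_path, dst_path) for one six-column line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    src_mode, dst_mode, _src_sha1, _dst_sha1, src_path, dst_path = fields
    if src_mode == ABSENT and dst_mode == ABSENT:
        raise ValueError("both sides are absent")
    if src_mode == ABSENT:
        op = "+"   # only the destination exists: creation
    elif dst_mode == ABSENT:
        op = "-"   # only the source exists: deletion
    else:
        op = "*"   # both sides exist
    # blob vs tree comes from whichever mode is present.
    mode = dst_mode if src_mode == ABSENT else src_mode
    kind = "tree" if (int(mode, 8) & S_IFMT) == S_IFDIR else "blob"
    return op, kind, src_path, dst_path
```

A record with two different paths still comes out as '*' here; telling
a rename from an in-place change is just a comparison of the two path
columns.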

Again, I am not even advocating this.  It is more like me
still thinking aloud.








