git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] The diff-raw format updates.
Date: Sun, 22 May 2005 11:35:34 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.58.0505221108000.2307@ppc970.osdl.org> (raw)
In-Reply-To: <7vvf5cc779.fsf_-_@assigned-by-dhcp.cox.net>



On Sat, 21 May 2005, Junio C Hamano wrote:
>
> Update the diff-raw format as Linus and I discussed, except that
> it does not use sequence of underscore '_' letters to express
> nonexistence.  All '0' mode is used for that purpose instead.
> 
> The new diff-raw format can express rename/copy

Having looked at this, I have to disagree.

It can _almost_ express rename/copy, but you can't tell the two apart. In 
both cases you have two different modes, two different SHA1's, and two 
different filenames.

Also, while you can trivially tell whether a file is deleted or new (look 
at the 000000... SHA1), it is pretty illogical to give a "filename" for 
the non-existent side, as in the line

:000000 100644 0000000000000000000000000000000000000000 25ab9eda939ad92bb746c2419d083b1e52117a56	diffcore-pathspec.c	diffcore-pathspec.c

Finally, having now looked at it some more, I've come to realize that it's 
actually pretty hard to tell the different cases apart visually (is it a 
rename or just a change), because the full pathnames can be so long that 
it's not at all immediately obvious.

Anyway, I think we can trivially tweak the filename output to handle all 
these problems.

I'd suggest:

 - we'd continue to have two "filename" fields, with the existing
   termination, but they aren't pure filenames any more, they are just
   tab-separated (or zero-separated) "source" and "destination" fields.

 - if no filename exists (ie the source side of a new file, or the 
   destination side of a deleted file), output "/dev/null". In other
   words, a nonexistent file is _always_ associated with mode 000000, SHA1
   00000..  and a "name field" of "/dev/null".

 - ONLY IF HUMAN-READABLE: if the destination filename is the same as the
   source, drop it (and the tab) completely. This just makes things so
   much more readable, and it's still parseable, because the 
   line-termination is different from the inter-file termination.

   NOTE! In the zero-terminated format, you can't do this, since you 
   wouldn't know where the line ended. You might drop the name completely, 
   but you'd have to have two NUL bytes. I'd argue that since this format 
   isn't human-readable anyway, you might just want to keep the filename 
   the same.

 - in all other cases: if the file is new, prepend a "+", if the file is 
   old, prepend a "*", and if the file goes away, prepend a "-". In other 
   words, the actual pathname (if it exists) always starts at the second
   character and is always prepended by _something_ (ie there is no 
   ambiguoity with pathnames that start in -/+/*).

The above hass a few nice properties, notably you can parse the first
character of the name field, and you always know what's up:

 - '/' is always "/dev/null" aka "no file"
 - '+' is always "added file"
 - '-' is always "removed file"
 - '*' is always "existing file"
 - '\0' (ie empty) is always "same filename as source"

So for the above "create" event, it would look like

	:000000 100644 0000.. 25ab..	/dev/null	+diffcore-pathspec.c

which is visually quite obviously a create. Similarly, deletes are also 
visually pretty obvious:

	:100644 000000 25ab.. 0000..	-diffcore-pathspec.c	/dev/null

while a "copy" would be (git-pull-script stays around, so it gets a "*"):

	:100755 100755 bd89.. 17f2..	*git-pull-script	+git-fetch-script

and a "rename" would be:

	:100644 100644 51bb.. 51bb..	-diff-tree-helper.c	+diff-helper.c

(ie the difference is in the source file having a "-" in front of it 
instead of a "*").

A regular modification would be

	:100644 100644 bcd3.. c05b..	*Documentation/git-fsck-cache.txt

which is also very visually distinct from the other cases.

What do you think?

		Linus

  parent reply	other threads:[~2005-05-22 18:32 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-21 23:12 updated design for the diff-raw format Junio C Hamano
2005-05-21 23:16 ` Junio C Hamano
2005-05-21 23:17 ` Junio C Hamano
2005-05-21 23:18 ` Junio C Hamano
2005-05-21 23:19 ` Junio C Hamano
2005-05-22  2:40 ` [PATCH] Prepare diffcore interface for diff-tree header supression Junio C Hamano
2005-05-22  2:42   ` [PATCH] The diff-raw format updates Junio C Hamano
2005-05-22  6:01     ` Linus Torvalds
2005-05-22  6:33       ` Junio C Hamano
2005-05-22  6:57       ` Junio C Hamano
2005-05-22  8:31         ` [PATCH] Fix tweak in similarity estimator Junio C Hamano
2005-05-22 18:35     ` Linus Torvalds [this message]
2005-05-22 18:36       ` [PATCH] The diff-raw format updates Niklas Hoglund
2005-05-22 19:15         ` Junio C Hamano
2005-05-22 18:42       ` Thomas Glanzmann
2005-05-22 19:05         ` Linus Torvalds
2005-05-22 19:05           ` Thomas Glanzmann
2005-05-22 19:20           ` Junio C Hamano
2005-05-22 19:35             ` Junio C Hamano
2005-05-22 20:24               ` Linus Torvalds
2005-05-22 23:01                 ` Junio C Hamano
2005-05-22 23:14                   ` Linus Torvalds
2005-05-23  0:35                     ` Junio C Hamano
2005-05-23  1:07                       ` Linus Torvalds
2005-05-23  1:33                         ` Junio C Hamano
2005-05-23  4:26               ` [PATCH] Rename/copy detection fix Junio C Hamano
2005-05-23  4:38                 ` Comments on "Rename/copy detection fix." Junio C Hamano
2005-05-22 19:13       ` [PATCH] The diff-raw format updates Junio C Hamano
2005-05-22  9:41   ` [PATCH] Diffcore updates Junio C Hamano
2005-05-22 16:40     ` Linus Torvalds
2005-05-22 16:47       ` Junio C Hamano
2005-05-22 17:04     ` Junio C Hamano
2005-05-23  4:24       ` [PATCH] Be careful with symlinks when detecting renames and copies Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0505221108000.2307@ppc970.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).