git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC/PATCH] add diff-pairs tool
@ 2016-12-01 20:40 Jeff King
  2016-12-01 20:52 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2016-12-01 20:40 UTC (permalink / raw)
  To: git

This is a small tool I cooked up for splitting the tree-diff out from
the generation of the patch text, and doing them in separate processes.
It's a little more expensive than doing it all in one process (besides
the process/pipe overhead, we may load blob content in the tree diff for
rename detection, and then again to do the actual patch). But it gives a
lot of flexibility when handling large diffs. You can save the tree-diff
and then feed it back in chunks to generate the actual patches, and do
so across many invocations (if you had, say, a website that showed diffs
and wanted to show small bits of them and let the user progressively
click through to see more).

Other things I tried but didn't work:

  - feeding two blobs to git-diff doesn't use plumbing, and loses
    information about filenames, etc. It also requires one process per
    pair.

  - feeding pathspecs back to git-diff-tree, like:

       git diff-tree --name-only $a $b |
       while read filename; do
         git diff-tree $a $b -- $filename
       done

    mostly works, but loses rename information (and is also slightly
    less efficient, as it has to walk the trees again).

So instead, I made a tool that takes the pairs generated by "diff-tree
--raw" and formats them as diff-tree would have if it had been done
in-process.

It's rather elegant, I think. But the main reason this is RFC is that
I'm not sure it's actually _useful_ to other people. It's a pretty
narrow use case.

The other reason it's an RFC is that it has a few rough edges. It works
well for my narrow use case, but there are cases where it probably
doesn't:

  - it expects "diff-tree -z" input, which keeps parsing simple. It
    probably wouldn't be too hard to have it handle the normal output,
    too.

  - it should handle submodules OK, but I don't know what it would do if
    you fed it things like dirty submodules from diff-files output.

  - some option combinations between the initial tree-diff and the
    followup diff do not make sense. For instance, if you do a
    non-recursive tree-diff (or one with "-t"), you'll have tree entries
    in your --raw output. Feeding that to "diff-pairs -p" will silently
    ignore the tree entries, because there's no way to represent them as
    a patch.

    Mostly these fall under "well, don't do that", but possibly the tool
    could be more aggressive about complaining, or coming up with some
    sensible behavior.

So anyway. Consider this a rough cut. What I'm really looking for at
this stage is whether anybody is interested enough in this for me to
bother polishing it.

-- >8 --
This takes the output of `diff-tree -z --raw` and feeds it
back to the later stages of the diff machinery to produce
diffs in other formats. Because the interim format contains
any whole-tree copy/rename information, you can safely feed
segments of the tree diff to get progressive patch-format
diffs. So something like:

  git diff-tree -r -z $a $b |
  git diff-pairs -p

should give you the same output that `git diff-tree -p`
would have.  Likewise, feeding each pair individually works,
too:

  git diff-tree -r -z -M $a $b |
  perl -0ne '
	my $meta = $_;
	my $path = <>;
	# only renames have an extra path
	my $path2 = <> if $meta =~ /[RC]\d+/;

	print STDERR "feeding one diff\n";
	open(my $fh, "|git diff-pairs -p");
	print $fh $meta, $path, $path2;
  '

The renames will still be shown just as if the diff had been
done in one process.

Signed-off-by: Jeff King <peff@peff.net>
---
 .gitignore                       |   1 +
 Documentation/git-diff-pairs.txt |  60 +++++++++++++++
 Makefile                         |   1 +
 builtin.h                        |   1 +
 builtin/diff-pairs.c             | 162 +++++++++++++++++++++++++++++++++++++++
 git.c                            |   1 +
 t/t4063-diff-pairs.sh            |  82 ++++++++++++++++++++
 7 files changed, 308 insertions(+)
 create mode 100644 Documentation/git-diff-pairs.txt
 create mode 100644 builtin/diff-pairs.c
 create mode 100755 t/t4063-diff-pairs.sh

diff --git a/.gitignore b/.gitignore
index 05cb58a3d..805751f2e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -48,6 +48,7 @@
 /git-diff
 /git-diff-files
 /git-diff-index
+/git-diff-pairs
 /git-diff-tree
 /git-difftool
 /git-difftool--helper
diff --git a/Documentation/git-diff-pairs.txt b/Documentation/git-diff-pairs.txt
new file mode 100644
index 000000000..4e3b6c909
--- /dev/null
+++ b/Documentation/git-diff-pairs.txt
@@ -0,0 +1,60 @@
+git-diff-pairs(1)
+=================
+
+NAME
+----
+git-diff-pairs - Compare blob pairs generated by `diff-tree --raw`
+
+SYNOPSIS
+--------
+[verse]
+'git diff-pairs' [diff_format_options]
+
+DESCRIPTION
+-----------
+
+Given the output of `diff-tree -z` on its stdin, `diff-pairs` will
+reformat that output into whatever format is requested on its command
+line.  For example:
+
+-----------------------------
+git diff-tree -z -M $a $b |
+git diff-pairs -p
+-----------------------------
+
+will compute the tree diff in one step (including renames), and then
+`diff-pairs` will compute and format the blob-level diffs for each pair.
+This can be used to modify the raw diff in the middle (without having to
+parse or re-create more complicated formats like `--patch`), or to
+compute diffs progressively over the course of multiple invocations of
+`diff-pairs`.
+
+Each blob pair is fed to the diff machinery individually and the output
+flushed immediately, meaning it is safe to interactively read and write
+from `diff-pairs`.
+
+OPTIONS
+-------
+
+All diff options below are accepted, but note that tree-wide options
+like `-M` are effectively noops, as we consider only one pair at a time.
+
+include::diff-options.txt[]
+
+BUGS
+----
+
+`diff-pairs` should handle any input generated by `diff-tree --raw -z`.
+It may choke or otherwise misbehave on output from `diff-files`, etc.
+
+Here's an incomplete list of things that `diff-pairs` could do, but
+doesn't (mostly in the name of simplicity):
+
+ - Only `-z` input is accepted, not normal `--raw` input.
+
+ - Abbreviated sha1s are rejected in the input from `diff-tree`; if you
+   want to abbreviate the output, you can pass `--abbrev` to
+   `diff-pairs`.
+
+ - Pathspecs are not handled by `diff-pairs`; you can limit the diff via
+   the initial `diff-tree` invocation.
diff --git a/Makefile b/Makefile
index f53fcc90d..c7ee3cd61 100644
--- a/Makefile
+++ b/Makefile
@@ -886,6 +886,7 @@ BUILTIN_OBJS += builtin/credential.o
 BUILTIN_OBJS += builtin/describe.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
+BUILTIN_OBJS += builtin/diff-pairs.o
 BUILTIN_OBJS += builtin/diff-tree.o
 BUILTIN_OBJS += builtin/diff.o
 BUILTIN_OBJS += builtin/fast-export.o
diff --git a/builtin.h b/builtin.h
index b9122bc5f..f62134ec1 100644
--- a/builtin.h
+++ b/builtin.h
@@ -59,6 +59,7 @@ extern int cmd_describe(int argc, const char **argv, const char *prefix);
 extern int cmd_diff_files(int argc, const char **argv, const char *prefix);
 extern int cmd_diff_index(int argc, const char **argv, const char *prefix);
 extern int cmd_diff(int argc, const char **argv, const char *prefix);
+extern int cmd_diff_pairs(int argc, const char **argv, const char *prefix);
 extern int cmd_diff_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_fast_export(int argc, const char **argv, const char *prefix);
 extern int cmd_fetch(int argc, const char **argv, const char *prefix);
diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
new file mode 100644
index 000000000..f80c49fef
--- /dev/null
+++ b/builtin/diff-pairs.c
@@ -0,0 +1,162 @@
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "revision.h"
+
+static const char diff_pairs_usage[] =
+"git diff-pairs [diff-options]\n"
+"\n"
+"Reads pairs of blobs from stdin in 'diff-tree -z' syntax:\n"
+"\n"
+"  :<mode_a> <mode_b> <sha1_a> <sha1_b> <type>\\0<path>\0[path2\0]\n"
+"\n"
+"and outputs the diff for each a/b pair to stdout.";
+
+static unsigned parse_mode(const char *mode, char **endp)
+{
+	unsigned long ret;
+
+	errno = 0;
+	ret = strtoul(mode, endp, 8);
+	if (errno || *endp == mode || *(*endp)++ != ' ' || (unsigned)ret != ret)
+		die("unable to parse mode: %s", mode);
+	return ret;
+}
+
+static void parse_oid(char *p, char **endp, struct object_id *oid)
+{
+	*endp = p + GIT_SHA1_HEXSZ;
+	if (get_oid_hex(p, oid) || *(*endp)++ != ' ')
+		die("unable to parse object id: %s", p);
+}
+
+static unsigned short parse_score(char *score)
+{
+	unsigned long ret;
+	char *endp;
+
+	errno = 0;
+	ret = strtoul(score, &endp, 10);
+	ret *= MAX_SCORE / 100;
+	if (errno || endp == score || *endp || (unsigned short)ret != ret)
+		die("unable to parse rename/copy score: %s", score);
+	return ret;
+}
+
+/*
+ * The pair-creation is mostly done by diff_change and diff_addremove,
+ * which queue the filepair without returning it. So we have to resort
+ * to pulling it out of the global diff queue.
+ */
+static void set_pair_status(char status)
+{
+	/*
+	 * If we have no items in the queue, for some reason the pair wasn't
+	 * worth queueing. This generally shouldn't happen (since it means
+	 * dropping some parts of the diff), but the user can trigger it with
+	 * things like --ignore-submodules. If they do, the only sensible thing
+	 * is for us to play along and skip it.
+	 */
+	if (!diff_queued_diff.nr)
+		return;
+
+	diff_queued_diff.queue[0]->status = status;
+}
+
+int cmd_diff_pairs(int argc, const char **argv, const char *prefix)
+{
+	struct rev_info revs;
+	struct strbuf meta = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	struct strbuf path_dst = STRBUF_INIT;
+
+	init_revisions(&revs, prefix);
+	git_config(git_diff_basic_config, NULL);
+	revs.disable_stdin = 1;
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	/* Don't allow pathspecs at all. */
+	if (argc > 1)
+		usage(diff_pairs_usage);
+
+	if (!revs.diffopt.output_format)
+		revs.diffopt.output_format = DIFF_FORMAT_RAW;
+
+	while (1) {
+		unsigned mode_a, mode_b;
+		struct object_id oid_a, oid_b;
+		char status;
+		char *p;
+
+		if (strbuf_getline_nul(&meta, stdin) == EOF)
+			break;
+
+		p = meta.buf;
+		if (*p == ':')
+			p++;
+
+		mode_a = parse_mode(p, &p);
+		mode_b = parse_mode(p, &p);
+
+		parse_oid(p, &p, &oid_a);
+		parse_oid(p, &p, &oid_b);
+
+		status = *p++;
+
+		if (strbuf_getline_nul(&path, stdin) == EOF)
+			die("got EOF while reading path");
+
+		switch (status) {
+		case DIFF_STATUS_ADDED:
+			diff_addremove(&revs.diffopt, '+',
+				       mode_b, oid_b.hash,
+				       1, path.buf, 0);
+			set_pair_status(status);
+			break;
+
+		case DIFF_STATUS_DELETED:
+			diff_addremove(&revs.diffopt, '-',
+				       mode_a, oid_a.hash,
+				       1, path.buf, 0);
+			set_pair_status(status);
+			break;
+
+		case DIFF_STATUS_TYPE_CHANGED:
+		case DIFF_STATUS_MODIFIED:
+			diff_change(&revs.diffopt,
+				    mode_a, mode_b,
+				    oid_a.hash, oid_b.hash,
+				    1, 1, path.buf, 0, 0);
+			set_pair_status(status);
+			break;
+
+		case DIFF_STATUS_RENAMED:
+		case DIFF_STATUS_COPIED:
+			{
+				struct diff_filespec *a, *b;
+				struct diff_filepair *pair;
+
+				if (strbuf_getline_nul(&path_dst, stdin) == EOF)
+					die("got EOF while reading secondary path");
+
+				a = alloc_filespec(path.buf);
+				b = alloc_filespec(path_dst.buf);
+				fill_filespec(a, oid_a.hash, 1, mode_a);
+				fill_filespec(b, oid_b.hash, 1, mode_b);
+
+				pair = diff_queue(&diff_queued_diff, a, b);
+				pair->status = status;
+				pair->score = parse_score(p);
+				pair->renamed_pair = 1;
+			}
+			break;
+
+		default:
+			die("unknown diff status: %c", status);
+		}
+
+		diff_flush(&revs.diffopt);
+	}
+
+	return 0;
+}
diff --git a/git.c b/git.c
index dce529fcb..4063c5366 100644
--- a/git.c
+++ b/git.c
@@ -423,6 +423,7 @@ static struct cmd_struct commands[] = {
 	{ "diff", cmd_diff },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE },
 	{ "diff-index", cmd_diff_index, RUN_SETUP },
+	{ "diff-pairs", cmd_diff_pairs, RUN_SETUP },
 	{ "diff-tree", cmd_diff_tree, RUN_SETUP },
 	{ "fast-export", cmd_fast_export, RUN_SETUP },
 	{ "fetch", cmd_fetch, RUN_SETUP },
diff --git a/t/t4063-diff-pairs.sh b/t/t4063-diff-pairs.sh
new file mode 100755
index 000000000..a99ec6655
--- /dev/null
+++ b/t/t4063-diff-pairs.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description='basic diff-pairs tests'
+. ./test-lib.sh
+
+# This creates a diff with added, modified, deleted, renamed, copied, and
+# typechange entries. That includes one in a subdirectory for non-recursive
+# tests, and both exact and inexact similarity scores.
+test_expect_success 'create commit with various diffs' '
+	echo to-be-gone >deleted &&
+	echo original >modified &&
+	echo now-a-file >symlink &&
+	test_seq 200 >two-hundred &&
+	test_seq 201 500 >five-hundred &&
+	git add . &&
+	test_tick &&
+	git commit -m base &&
+	git tag base &&
+
+	echo now-here >added &&
+	echo new >modified &&
+	rm deleted &&
+	mkdir subdir &&
+	echo content >subdir/file &&
+	mv two-hundred renamed &&
+	test_seq 201 500 | sed s/300/modified/ >copied &&
+	rm symlink &&
+	git add -A . &&
+	test_ln_s_add dest symlink &&
+	test_tick &&
+	git commit -m new &&
+	git tag new
+'
+
+test_expect_success 'diff-pairs recreates --raw' '
+	git diff-tree -r -M -C -C base new >expect &&
+	# note that diff-pairs uses the default abbrev,
+	# so we must tweak that for identical output
+	git diff-tree -r -M -C -C -z base new |
+	git diff-pairs --no-abbrev >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff-pairs can create -p output' '
+	git diff-tree -p -M -C -C base new >expect &&
+	git diff-tree -r -M -C -C -z base new |
+	git diff-pairs -p >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'non-recursive --raw retains tree entry' '
+	git diff-tree base new >expect &&
+	git diff-tree -z base new |
+	git diff-pairs --no-abbrev >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'split input across multiple diff-pairs' '
+	write_script split-raw-diff "$PERL_PATH" <<-\EOF &&
+	$/ = "\0";
+	while (<>) {
+	  my $meta = $_;
+	  my $path = <>;
+	  # renames have an extra path
+	  my $path2 = <> if $meta =~ /[RC]\d+/;
+
+	  open(my $fh, ">", sprintf "diff%03d", $.);
+	  print $fh $meta, $path, $path2;
+	}
+	EOF
+
+	git diff-tree -p -M -C -C base new >expect &&
+
+	git diff-tree -r -z -M -C -C base new |
+	./split-raw-diff &&
+	for i in diff*; do
+		git diff-pairs -p <$i
+	done >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.11.0.319.g1f4e1e0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 20:40 [RFC/PATCH] add diff-pairs tool Jeff King
@ 2016-12-01 20:52 ` Junio C Hamano
  2016-12-01 20:55   ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2016-12-01 20:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> This takes the output of `diff-tree -z --raw` and feeds it
> back to the later stages of the diff machinery to produce
> diffs in other formats.

A full circle.  This reminds me of the experiment done more than 10
years ago at the beginning of the "diffcore transformations" design.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 20:52 ` Junio C Hamano
@ 2016-12-01 20:55   ` Jeff King
  2016-12-01 20:58     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2016-12-01 20:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Dec 01, 2016 at 12:52:05PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This takes the output of `diff-tree -z --raw` and feeds it
> > back to the later stages of the diff machinery to produce
> > diffs in other formats.
> 
> A full circle.  This reminds me of the experiment done more than 10
> years ago at the beginning of the "diffcore transformations" design.

Heh, I didn't even think to dig for prior art on the list.

In a sense this is just bringing the full power of diffcore out to the
scripting interface. The one missing component is that you can't
actually call diffcore_std() in the middle. The full pipeline would I
guess be something more like:

  git diff-tree --raw -z $a $b |
  git detect-renames |
  git diff-pairs -p

or something. In my model it's sufficient for the rename detection to
happen as part of the first diff-tree (since it's a whole-tree operation
anyway, there's no benefit to breaking it up into chunks).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 20:55   ` Jeff King
@ 2016-12-01 20:58     ` Junio C Hamano
  2016-12-01 21:04       ` Junio C Hamano
  2016-12-01 21:30       ` Jeff King
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2016-12-01 20:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Thu, Dec 01, 2016 at 12:52:05PM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > This takes the output of `diff-tree -z --raw` and feeds it
>> > back to the later stages of the diff machinery to produce
>> > diffs in other formats.
>> 
>> A full circle.  This reminds me of the experiment done more than 10
>> years ago at the beginning of the "diffcore transformations" design.
>
> Heh, I didn't even think to dig for prior art on the list.

It took me a while to dig it up because the topic is so old, but

https://public-inbox.org/git/Pine.LNX.4.58.0504251832480.18901@ppc970.osdl.org/

is the thread I had in mind.  The idea of rename detection followed
soon afterwards.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 20:58     ` Junio C Hamano
@ 2016-12-01 21:04       ` Junio C Hamano
  2016-12-01 21:30       ` Jeff King
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2016-12-01 21:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> It took me a while to dig it up because the topic is so old, but
>
> https://public-inbox.org/git/Pine.LNX.4.58.0504251832480.18901@ppc970.osdl.org/
>
> is the thread I had in mind.  The idea of rename detection followed
> soon afterwards.

... which was this one:

https://public-inbox.org/git/7vr7g4m0lz.fsf_-_@assigned-by-dhcp.cox.net/#t


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 20:58     ` Junio C Hamano
  2016-12-01 21:04       ` Junio C Hamano
@ 2016-12-01 21:30       ` Jeff King
  2016-12-01 22:22         ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2016-12-01 21:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Dec 01, 2016 at 12:58:21PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Thu, Dec 01, 2016 at 12:52:05PM -0800, Junio C Hamano wrote:
> >
> >> Jeff King <peff@peff.net> writes:
> >> 
> >> > This takes the output of `diff-tree -z --raw` and feeds it
> >> > back to the later stages of the diff machinery to produce
> >> > diffs in other formats.
> >> 
> >> A full circle.  This reminds me of the experiment done more than 10
> >> years ago at the beginning of the "diffcore transformations" design.
> >
> > Heh, I didn't even think to dig for prior art on the list.
> 
> It took me a while to dig it up because the topic is so old, but
> 
> https://public-inbox.org/git/Pine.LNX.4.58.0504251832480.18901@ppc970.osdl.org/
> 
> is the thread I had in mind.  The idea of rename detection followed
> soon afterwards.

Thanks for an interesting read. Your diff-tree-helper patch is very
similar to what I wrote.

I do think the right decision was made back then. The single-process
model is much more efficient, and it was over 10 years until somebody
actually wanted to expose the functionality to a script (and even now,
I'm not convinced enough people want it to even merit inclusion).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 21:30       ` Jeff King
@ 2016-12-01 22:22         ` Junio C Hamano
  2016-12-01 22:30           ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2016-12-01 22:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> It took me a while to dig it up because the topic is so old, but
>> 
>> https://public-inbox.org/git/Pine.LNX.4.58.0504251832480.18901@ppc970.osdl.org/
>> 
>> is the thread I had in mind.  The idea of rename detection followed
>> soon afterwards.
>
> Thanks for an interesting read. Your diff-tree-helper patch is very
> similar to what I wrote.
>
> I do think the right decision was made back then. The single-process
> model is much more efficient, and it was over 10 years until somebody
> actually wanted to expose the functionality to a script (and even now,
> I'm not convinced enough people want it to even merit inclusion).

Well, 10 years ago the person in the thread who argued "who cares
about producing patches?  each step in plumbing should do one thing
and one thing only and do so well" was Linus, so your coming up with
the diff-tree-helper again may indicate that we may want to step
back and retry the experiment again, perhaps?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] add diff-pairs tool
  2016-12-01 22:22         ` Junio C Hamano
@ 2016-12-01 22:30           ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2016-12-01 22:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Dec 01, 2016 at 02:22:47PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> It took me a while to dig it up because the topic is so old, but
> >> 
> >> https://public-inbox.org/git/Pine.LNX.4.58.0504251832480.18901@ppc970.osdl.org/
> >> 
> >> is the thread I had in mind.  The idea of rename detection followed
> >> soon afterwards.
> >
> > Thanks for an interesting read. Your diff-tree-helper patch is very
> > similar to what I wrote.
> >
> > I do think the right decision was made back then. The single-process
> > model is much more efficient, and it was over 10 years until somebody
> > actually wanted to expose the functionality to a script (and even now,
> > I'm not convinced enough people want it to even merit inclusion).
> 
> Well, 10 years ago the person in the thread who argued "who cares
> about producing patches?  each step in plumbing should do one thing
> and one thing only and do so well" was Linus, so your coming up with
> the diff-tree-helper again may indicate that we may want to step
> back and retry the experiment again, perhaps?

I think there are two questions, looking historically.

One is whether the functionality should be exposed to scripts at all.

The second is, assuming it should be exposed, in which order to do it.
You can write a series of small scripts, and then tie them together. Or
you can write tie it all together in C, and then make specific helpers
to expose the various bits.

The advantage of the first technique is that the tools are used
consistently by all parts of the system, so you know they don't grow
weird bugs or fail to handle corner cases. The advantage of the second
is that most people want the "tied-together" functionality, and it can
run a lot faster in-process.

So mostly I was suggesting that the right decision 10 years ago was to
optimize for speed in the common case, and let people worry later about
whether they wanted to expose the functionality in more flexible ways.
And that brings us to today.

It sounds like you are in favor of adding diff-pairs (and certainly it
shouldn't _hurt_ anybody if they are not interested in using it; you'll
notice the patch didn't need to touch the diff code at all).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-12-01 22:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-01 20:40 [RFC/PATCH] add diff-pairs tool Jeff King
2016-12-01 20:52 ` Junio C Hamano
2016-12-01 20:55   ` Jeff King
2016-12-01 20:58     ` Junio C Hamano
2016-12-01 21:04       ` Junio C Hamano
2016-12-01 21:30       ` Jeff King
2016-12-01 22:22         ` Junio C Hamano
2016-12-01 22:30           ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).