[RFC/PATCH 0/3] teach --histogram to diff

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [RFC/PATCH 0/3] teach --histogram to diff
@ 2011-07-12  6:10 Tay Ray Chuan
  2011-07-12  6:10 ` [RFC/PATCH 1/3] " Tay Ray Chuan
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-12  6:10 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Shawn O. Pearce

(Shawn, I was held up with the patch messages, sorry for the delay.)

Port JGit's HistogramDiff(Index) over to C. This algorithm extends the
patience algorithm to "support low-occurrence common elements" [1].

Rough numbers show that it is a faster alternative to its --patience
cousin, as well as to the default Meyers algorithm:

  $ time ./git log --histogram -p v1.0.0 >/dev/null
  
  real    0m12.998s
  user    0m11.506s
  sys     0m1.487s
  $ time ./git log -p v1.0.0 >/dev/null
  
  real    0m13.575s
  user    0m12.101s
  sys     0m1.468s
  $ time ./git log --patience -p v1.0.0 >/dev/null
  
  real    0m14.978s
  user    0m13.508s
  sys     0m1.464s

The first patch implements JGit's HistogramDiff(Index) proper. The
second and third patches aren't essential but yield performance gains.

Note: these patches must be applied on top of rc/histogram-diff in pu.

Contents:
[RFC/PATCH 1/3] teach --histogram to diff
[RFC/PATCH 2/3] xdiff/xprepare: skip classification
[RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram

 Makefile                  |    2 +-
 diff.c                    |    2 +
 merge-recursive.c         |    2 +
 t/t4048-diff-histogram.sh |   12 ++
 xdiff/xdiff.h             |    1 +
 xdiff/xdiffi.c            |    3 +
 xdiff/xdiffi.h            |    2 +
 xdiff/xhistogram.c        |  384 +++++++++++++++++++++++++++++++++++++++++++++
 xdiff/xprepare.c          |   41 ++++--
 xdiff/xutils.c            |    8 +-
 xdiff/xutils.h            |    2 +-
 11 files changed, 440 insertions(+), 19 deletions(-)
 create mode 100755 t/t4048-diff-histogram.sh
 create mode 100644 xdiff/xhistogram.c

--
Footnotes:
[1] http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/HistogramDiff.java;hb=HEAD

-- 
1.7.3.4.681.gb718e

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC/PATCH 1/3] teach --histogram to diff
  2011-07-12  6:10 [RFC/PATCH 0/3] teach --histogram to diff Tay Ray Chuan
@ 2011-07-12  6:10 ` Tay Ray Chuan
  2011-07-12  6:10   ` [RFC/PATCH 2/3] xdiff/xprepare: skip classification Tay Ray Chuan
  2011-07-12 19:56   ` [RFC/PATCH 1/3] teach --histogram to diff Junio C Hamano
  2011-07-12 14:19 ` [RFC/PATCH 0/3] " Shawn Pearce
  2011-08-01  3:16 ` [PATCH v2 0/8] " Tay Ray Chuan
  2 siblings, 2 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-12  6:10 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Shawn O. Pearce

Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.

The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.

We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Makefile                  |    2 +-
 diff.c                    |    2 +
 merge-recursive.c         |    2 +
 t/t4048-diff-histogram.sh |   12 ++
 xdiff/xdiff.h             |    1 +
 xdiff/xdiffi.c            |    3 +
 xdiff/xdiffi.h            |    2 +
 xdiff/xhistogram.c        |  384 +++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 407 insertions(+), 1 deletions(-)
 create mode 100755 t/t4048-diff-histogram.sh
 create mode 100644 xdiff/xhistogram.c

diff --git a/Makefile b/Makefile
index 775ee83..9930e6b 100644
--- a/Makefile
+++ b/Makefile
@@ -1836,7 +1836,7 @@ ifndef NO_CURL
 	GIT_OBJS += http.o http-walker.o remote-curl.o
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
-	xdiff/xmerge.o xdiff/xpatience.o
+	xdiff/xmerge.o xdiff/xpatience.o xdiff/xhistogram.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o
 VCSSVN_TEST_OBJS = test-obj-pool.o test-string-pool.o \
diff --git a/diff.c b/diff.c
index 5422c43..a113294 100644
--- a/diff.c
+++ b/diff.c
@@ -3186,6 +3186,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--patience"))
 		DIFF_XDL_SET(options, PATIENCE_DIFF);
+	else if (!strcmp(arg, "--histogram"))
+		DIFF_XDL_SET(options, HISTOGRAM_DIFF);
 
 	/* flags options */
 	else if (!strcmp(arg, "--binary")) {
diff --git a/merge-recursive.c b/merge-recursive.c
index 16c2dbe..3e8267b 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1773,6 +1773,8 @@ int parse_merge_opt(struct merge_options *o, const char *s)
 		o->subtree_shift = s + strlen("subtree=");
 	else if (!strcmp(s, "patience"))
 		o->xdl_opts |= XDF_PATIENCE_DIFF;
+	else if (!strcmp(s, "histogram"))
+		o->xdl_opts |= XDF_HISTOGRAM_DIFF;
 	else if (!strcmp(s, "ignore-space-change"))
 		o->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(s, "ignore-all-space"))
diff --git a/t/t4048-diff-histogram.sh b/t/t4048-diff-histogram.sh
new file mode 100755
index 0000000..fd3e86a
--- /dev/null
+++ b/t/t4048-diff-histogram.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+test_description='histogram diff algorithm'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-diff-alternative.sh
+
+test_diff_frobnitz "histogram"
+
+test_diff_unique "histogram"
+
+test_done
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 711048e..c26170c 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -33,6 +33,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
 #define XDF_PATIENCE_DIFF (1 << 5)
+#define XDF_HISTOGRAM_DIFF (1 << 6)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index da67c04..75a3922 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -331,6 +331,9 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	if (xpp->flags & XDF_PATIENCE_DIFF)
 		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
 
+	if (xpp->flags & XDF_HISTOGRAM_DIFF)
+		return xdl_do_histogram_diff(mf1, mf2, xpp, xe);
+
 	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
 		return -1;
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index ad033a8..7a92ea9 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -57,5 +57,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
 int xdl_do_patience_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		xdfenv_t *env);
+int xdl_do_histogram_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+		xdfenv_t *env);
 
 #endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
new file mode 100644
index 0000000..391333a
--- /dev/null
+++ b/xdiff/xhistogram.c
@@ -0,0 +1,384 @@
+/*
+ * Copyright (C) 2010, Google Inc.
+ * and other copyright owners as documented in JGit's IP log.
+ *
+ * This program and the accompanying materials are made available
+ * under the terms of the Eclipse Distribution License v1.0 which
+ * accompanies this distribution, is reproduced below, and is
+ * available at http://www.eclipse.org/org/documents/edl-v10.php
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Eclipse Foundation, Inc. nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "xinclude.h"
+#include "xtypes.h"
+#include "xdiff.h"
+
+#define MAX_PTR	UINT_MAX
+#define MAX_CNT	UINT_MAX
+
+#define LINE_END(n) (line##n + count##n - 1)
+#define LINE_END_PTR(n) (*line##n + *count##n - 1)
+
+struct histindex {
+	struct record {
+		unsigned int ptr, cnt;
+		struct record *next;
+	} **records, /* an ocurrence */
+	  **line_map; /* map of line to record chain */
+	chastore_t rcha;
+	unsigned int *next_ptrs;
+	unsigned int table_bits,
+		     records_size,
+		     line_map_size;
+
+	unsigned int max_chain_length,
+		     key_shift,
+		     ptr_shift;
+
+	unsigned int cnt,
+		     has_common;
+
+	xdfenv_t *env;
+	xpparam_t const *xpp;
+};
+
+struct region {
+	unsigned int begin1, end1;
+	unsigned int begin2, end2;
+};
+
+#define LINE_MAP(i, a) (i->line_map[(a) - i->ptr_shift])
+
+#define NEXT_PTR(index, ptr) \
+	(index->next_ptrs[(ptr) - index->ptr_shift])
+
+#define CNT(index, ptr) \
+	((LINE_MAP(index, ptr))->cnt)
+
+#define REC(env, s, l) \
+	(env->xdf##s.recs[l - 1])
+
+static int cmp_recs(xpparam_t const *xpp,
+	xrecord_t *r1, xrecord_t *r2)
+{
+	return r1->ha == r2->ha &&
+		xdl_recmatch(r1->ptr, r1->size, r2->ptr, r2->size,
+			    xpp->flags);
+}
+
+#define CMP_ENV(xpp, env, s1, l1, s2, l2) \
+	(cmp_recs(xpp, REC(env, s1, l1), REC(env, s2, l2)))
+
+#define CMP(i, s1, l1, s2, l2) \
+	(CMP_ENV(i->xpp, i->env, s1, l1, s2, l2))
+
+#define TABLE_HASH(index, side, line) \
+	XDL_HASHLONG((REC(index->env, side, line))->ha, index->table_bits)
+
+static int scanA(struct histindex *index, int line1, int count1)
+{
+	unsigned int ptr, tbl_idx;
+	unsigned int chain_len;
+	struct record **rec_chain, *rec;
+
+	for (ptr = LINE_END(1); line1 <= ptr; ptr--) {
+		tbl_idx = TABLE_HASH(index, 1, ptr);
+		rec_chain = index->records + tbl_idx;
+		rec = *rec_chain;
+
+		chain_len = 0;
+		while (rec) {
+			if (CMP(index, 1, rec->ptr, 1, ptr)) {
+				/*
+				 * ptr is identical to another element. Insert
+				 * it onto the front of the existing element
+				 * chain.
+				 */
+				NEXT_PTR(index, ptr) = rec->ptr;
+				rec->ptr = ptr;
+				/* cap rec->cnt at MAX_CNT */
+				rec->cnt = XDL_MIN(MAX_CNT, rec->cnt + 1);
+				LINE_MAP(index, ptr) = rec;
+				goto continue_scan;
+			}
+
+			rec = rec->next;
+			chain_len++;
+		}
+
+		if (chain_len == index->max_chain_length)
+			return -1;
+
+		/*
+		 * This is the first time we have ever seen this particular
+		 * element in the sequence. Construct a new chain for it.
+		 */
+		if (!(rec = xdl_cha_alloc(&index->rcha)))
+			return -1;
+		rec->ptr = ptr;
+		rec->cnt = 1;
+		rec->next = *rec_chain;
+		*rec_chain = rec;
+		LINE_MAP(index, ptr) = rec;
+
+continue_scan:
+		; /* no op */
+	}
+
+	return 0;
+}
+
+static int try_lcs(struct histindex *index, struct region *lcs, int b_ptr,
+	int line1, int count1, int line2, int count2)
+{
+	unsigned int b_next = b_ptr + 1;
+	struct record *rec = index->records[TABLE_HASH(index, 2, b_ptr)];
+	unsigned int as, ae, bs, be, np, rc;
+	int should_break;
+
+	for (; rec; rec = rec->next) {
+		if (rec->cnt > index->cnt) {
+			if (!index->has_common)
+				index->has_common = CMP(index, 1, rec->ptr, 2, b_ptr);
+			continue;
+		}
+
+		as = rec->ptr;
+		if (!CMP(index, 1, as, 2, b_ptr))
+			continue;
+
+		index->has_common = 1;
+		for (;;) {
+			should_break = 0;
+			np = NEXT_PTR(index, as);
+			bs = b_ptr;
+			ae = as;
+			be = bs;
+			rc = rec->cnt;
+
+			while (line1 < as && line2 < bs
+				&& CMP(index, 1, as - 1, 2, bs - 1)) {
+				as--;
+				bs--;
+				if (1 < rc)
+					rc = XDL_MIN(rc, CNT(index, as));
+			}
+			while (ae < LINE_END(1) && be < LINE_END(2)
+				&& CMP(index, 1, ae + 1, 2, be + 1)) {
+				ae++;
+				be++;
+				if (1 < rc)
+					rc = XDL_MIN(rc, CNT(index, ae));
+			}
+
+			if (b_next <= be)
+				b_next = be + 1;
+			if (lcs->end1 - lcs->begin1 < ae - as || rc < index->cnt) {
+				lcs->begin1 = as;
+				lcs->begin2 = bs;
+				lcs->end1 = ae;
+				lcs->end2 = be;
+				index->cnt = rc;
+			}
+
+			if (np == 0)
+				break;
+
+			while (np <= ae) {
+				np = NEXT_PTR(index, np);
+				if (np == 0) {
+					should_break = 1;
+					break;
+				}
+			}
+
+			if (should_break)
+				break;
+
+			as = np;
+		}
+	}
+	return b_next;
+}
+
+static int find_lcs(struct histindex *index, struct region *lcs,
+	int line1, int count1, int line2, int count2) {
+	int b_ptr;
+
+	if (scanA(index, line1, count1))
+		return -1;
+
+	index->cnt = index->max_chain_length + 1;
+
+	for (b_ptr = line2; b_ptr <= LINE_END(2); )
+		b_ptr = try_lcs(index, lcs, b_ptr, line1, count1, line2, count2);
+
+	return index->has_common && index->max_chain_length < index->cnt;
+}
+
+static void reduce_common_start_end(xpparam_t const *xpp, xdfenv_t *env,
+	int *line1, int *count1, int *line2, int *count2)
+{
+	if (*count1 <= 1 || *count2 <= 1)
+		return;
+	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, *line1, 2, *line2)) {
+		(*line1)++;
+		(*count1)--;
+		(*line2)++;
+		(*count2)--;
+	}
+	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, LINE_END_PTR(1), 2, LINE_END_PTR(2))) {
+		(*count1)--;
+		(*count2)--;
+	}
+}
+
+static int fall_back_to_classic_diff(struct histindex *index,
+		int line1, int count1, int line2, int count2)
+{
+	xpparam_t xpp;
+	xpp.flags = index->xpp->flags & ~XDF_HISTOGRAM_DIFF;
+
+	return xdl_fall_back_diff(index->env, &xpp,
+				  line1, count1, line2, count2);
+}
+
+static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
+	int line1, int count1, int line2, int count2)
+{
+	struct histindex index;
+	struct region lcs;
+	int sz;
+	int result = -1;
+
+	if (count1 <= 0 && count2 <= 0)
+		return 0;
+
+	if (LINE_END(1) >= MAX_PTR)
+		return -1;
+
+	if (!count1) {
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	} else if (!count2) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		return 0;
+	}
+
+	memset(&index, 0, sizeof(index));
+
+	index.env = env;
+	index.xpp = xpp;
+
+	index.records = NULL;
+	index.line_map = NULL;
+	/* in case of early xdl_cha_free() */
+	index.rcha.head = NULL;
+
+	index.table_bits = xdl_hashbits(count1);
+	sz = index.records_size = 1 << index.table_bits;
+	sz *= sizeof(struct record *);
+	if (!(index.records = (struct record **) xdl_malloc(sz)))
+		goto cleanup;
+	memset(index.records, 0, sz);
+
+	sz = index.line_map_size = count1;
+	sz *= sizeof(struct record *);
+	if (!(index.line_map = (struct record **) xdl_malloc(sz)))
+		goto cleanup;
+	memset(index.line_map, 0, sz);
+
+	sz = index.line_map_size;
+	sz *= sizeof(unsigned int);
+	if (!(index.next_ptrs = (unsigned int *) xdl_malloc(sz)))
+		goto cleanup;
+	memset(index.next_ptrs, 0, sz);
+
+	/* lines / 4 + 1 comes from xprepare.c:xdl_prepare_ctx() */
+	if (xdl_cha_init(&index.rcha, sizeof(struct record), count1 / 4 + 1) < 0)
+		goto cleanup;
+
+	index.ptr_shift = line1;
+	index.max_chain_length = 64;
+
+	memset(&lcs, 0, sizeof(lcs));
+	if (find_lcs(&index, &lcs, line1, count1, line2, count2))
+		result = fall_back_to_classic_diff(&index, line1, count1, line2, count2);
+	else {
+		result = 0;
+		if (lcs.begin1 == 0 && lcs.begin2 == 0) {
+			int ptr;
+			for (ptr = 0; ptr < count1; ptr++)
+				env->xdf1.rchg[line1 + ptr - 1] = 1;
+			for (ptr = 0; ptr < count2; ptr++)
+				env->xdf2.rchg[line2 + ptr - 1] = 1;
+		} else {
+			result = histogram_diff(xpp, env,
+				line1, lcs.begin1 - line1,
+				line2, lcs.begin2 - line2);
+			result = histogram_diff(xpp, env,
+				lcs.end1 + 1, LINE_END(1) - lcs.end1,
+				lcs.end2 + 1, LINE_END(2) - lcs.end2);
+			result *= -1;
+		}
+	}
+
+cleanup:
+	xdl_free(index.records);
+	xdl_free(index.line_map);
+	xdl_free(index.next_ptrs);
+	xdl_cha_free(&index.rcha);
+
+	return result;
+}
+
+int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
+	xpparam_t const *xpp, xdfenv_t *env)
+{
+	int line1, line2, count1, count2;
+
+	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
+		return -1;
+
+	line1 = line2 = 1;
+	count1 = env->xdf1.nrec;
+	count2 = env->xdf2.nrec;
+
+	reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
+
+	return histogram_diff(xpp, env, line1, count1, line2, count2);
+}
-- 
1.7.3.4.681.gb718e

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC/PATCH 2/3] xdiff/xprepare: skip classification
  2011-07-12  6:10 ` [RFC/PATCH 1/3] " Tay Ray Chuan
@ 2011-07-12  6:10   ` Tay Ray Chuan
  2011-07-12  6:10     ` [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram diff Tay Ray Chuan
  2011-07-12 19:56   ` [RFC/PATCH 1/3] teach --histogram to diff Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-12  6:10 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Shawn O. Pearce

xdiff performs "classification" of records (xdl_classify_record()),
replacing hashes (xrecord_t.ha) with a unique identifier of the
record/line and building a hash table (xrecord_t.rhash) of records. This
is then used to "cleanup" records (xdl_cleanup_records()).

We don't need any of that in histogram diff, so we omit calls to these
functions. We also skip allocating memory to the hash table, rhash, as
it is no longer used.

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xprepare.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 0f571db..7556538 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -154,11 +154,15 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 	if (!(recs = (xrecord_t **) xdl_malloc(narec * sizeof(xrecord_t *))))
 		goto abort;
 
-	hbits = xdl_hashbits((unsigned int) narec);
-	hsize = 1 << hbits;
-	if (!(rhash = (xrecord_t **) xdl_malloc(hsize * sizeof(xrecord_t *))))
-		goto abort;
-	memset(rhash, 0, hsize * sizeof(xrecord_t *));
+	if (xpp->flags & XDF_HISTOGRAM_DIFF)
+		hbits = hsize = 0;
+	else {
+		hbits = xdl_hashbits((unsigned int) narec);
+		hsize = 1 << hbits;
+		if (!(rhash = (xrecord_t **) xdl_malloc(hsize * sizeof(xrecord_t *))))
+			goto abort;
+		memset(rhash, 0, hsize * sizeof(xrecord_t *));
+	}
 
 	nrec = 0;
 	if ((cur = blk = xdl_mmfile_first(mf, &bsize)) != NULL) {
@@ -183,7 +187,8 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 			crec->ha = hav;
 			recs[nrec++] = crec;
 
-			if (xdl_classify_record(cf, rhash, hbits, crec) < 0)
+			if (!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
+				xdl_classify_record(cf, rhash, hbits, crec) < 0)
 				goto abort;
 		}
 	}
@@ -240,7 +245,8 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	enl1 = xdl_guess_lines(mf1) + 1;
 	enl2 = xdl_guess_lines(mf2) + 1;
 
-	if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
+	if (!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
+		xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
 
 		return -1;
 	}
@@ -257,9 +263,11 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		return -1;
 	}
 
-	xdl_free_classifier(&cf);
+	if (!(xpp->flags & XDF_HISTOGRAM_DIFF))
+		xdl_free_classifier(&cf);
 
 	if (!(xpp->flags & XDF_PATIENCE_DIFF) &&
+			!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
 			xdl_optimize_ctxs(&xe->xdf1, &xe->xdf2) < 0) {
 
 		xdl_free_ctx(&xe->xdf2);
-- 
1.7.3.4.681.gb718e

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram diff
  2011-07-12  6:10   ` [RFC/PATCH 2/3] xdiff/xprepare: skip classification Tay Ray Chuan
@ 2011-07-12  6:10     ` Tay Ray Chuan
  0 siblings, 0 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-12  6:10 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Shawn O. Pearce

For histogram diff, we can afford a smaller sample size and thus a
poorer estimate of the number of lines, as the hash table (rhash) won't
be filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xprepare.c |   17 ++++++++++++++---
 xdiff/xutils.c   |    8 ++------
 xdiff/xutils.h   |    2 +-
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 7556538..dfbb0de 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -26,6 +26,8 @@
 #define XDL_KPDIS_RUN 4
 #define XDL_MAX_EQLIMIT 1024
 #define XDL_SIMSCAN_WINDOW 100
+#define XDL_GUESS_NLINES1 256
+#define XDL_GUESS_NLINES2 20
 
 
 typedef struct s_xdlclass {
@@ -239,11 +241,20 @@ static void xdl_free_ctx(xdfile_t *xdf) {
 
 int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		    xdfenv_t *xe) {
-	long enl1, enl2;
+	long enl1, enl2, sample;
 	xdlclassifier_t cf;
 
-	enl1 = xdl_guess_lines(mf1) + 1;
-	enl2 = xdl_guess_lines(mf2) + 1;
+	/*
+	 * For histogram diff, we can afford a smaller sample size and
+	 * thus a poorer estimate of the number of lines, as the hash
+	 * table (rhash) won't be filled up/grown. The number of lines
+	 * (nrecs) will be updated correctly anyway by
+	 * xdl_prepare_ctx().
+	 */
+	sample = xpp->flags & XDF_HISTOGRAM_DIFF ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1;
+
+	enl1 = xdl_guess_lines(mf1, sample) + 1;
+	enl2 = xdl_guess_lines(mf2, sample) + 1;
 
 	if (!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
 		xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index ded7c32..a45e89b 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -24,10 +24,6 @@
 
 
 
-#define XDL_GUESS_NLINES 256
-
-
-
 
 long xdl_bogosqrt(long n) {
 	long i;
@@ -159,12 +155,12 @@ void *xdl_cha_next(chastore_t *cha) {
 }
 
 
-long xdl_guess_lines(mmfile_t *mf) {
+long xdl_guess_lines(mmfile_t *mf, long sample) {
 	long nl = 0, size, tsize = 0;
 	char const *data, *cur, *top;
 
 	if ((cur = data = xdl_mmfile_first(mf, &size)) != NULL) {
-		for (top = data + size; nl < XDL_GUESS_NLINES;) {
+		for (top = data + size; nl < sample;) {
 			if (cur >= top) {
 				tsize += (long) (cur - data);
 				if (!(cur = data = xdl_mmfile_next(mf, &size)))
diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 674a657..714719a 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -33,7 +33,7 @@ void xdl_cha_free(chastore_t *cha);
 void *xdl_cha_alloc(chastore_t *cha);
 void *xdl_cha_first(chastore_t *cha);
 void *xdl_cha_next(chastore_t *cha);
-long xdl_guess_lines(mmfile_t *mf);
+long xdl_guess_lines(mmfile_t *mf, long sample);
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags);
 unsigned long xdl_hash_record(char const **data, char const *top, long flags);
 unsigned int xdl_hashbits(unsigned int size);
-- 
1.7.3.4.681.gb718e

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC/PATCH 0/3] teach --histogram to diff
  2011-07-12  6:10 [RFC/PATCH 0/3] teach --histogram to diff Tay Ray Chuan
  2011-07-12  6:10 ` [RFC/PATCH 1/3] " Tay Ray Chuan
@ 2011-07-12 14:19 ` Shawn Pearce
  2011-07-12 17:43   ` Junio C Hamano
  2011-07-13 16:34   ` Tay Ray Chuan
  2011-08-01  3:16 ` [PATCH v2 0/8] " Tay Ray Chuan
  2 siblings, 2 replies; 24+ messages in thread
From: Shawn Pearce @ 2011-07-12 14:19 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: Git Mailing List

On Mon, Jul 11, 2011 at 23:10, Tay Ray Chuan <rctay89@gmail.com> wrote:
> (Shawn, I was held up with the patch messages, sorry for the delay.)
>
> Port JGit's HistogramDiff(Index) over to C. This algorithm extends the
> patience algorithm to "support low-occurrence common elements" [1].
>
> Rough numbers show that it is a faster alternative to its --patience
> cousin, as well as to the default Meyers algorithm:
>
>  $ time ./git log --histogram -p v1.0.0 >/dev/null
>
>  real    0m12.998s
>  user    0m11.506s
>  sys     0m1.487s
>  $ time ./git log -p v1.0.0 >/dev/null
>
>  real    0m13.575s
>  user    0m12.101s
>  sys     0m1.468s
>  $ time ./git log --patience -p v1.0.0 >/dev/null
>
>  real    0m14.978s
>  user    0m13.508s
>  sys     0m1.464s

Nice!

Not the big difference that it is for us in JGit (between histogram
and Myers), but its nice to see an improvement here, even if it is
only 0.5s for the entire 1.0.0 history. How do the diffs come out? One
of the arguments for patience diff is the formatting can sometimes be
more readable for certain changes, but its slower. Histogram tries to
apply a similar algorithm as patience in order to get the formatting
benefits, but also some performance improvements.

Have you looked at a patch that differs in output between Myers and
patience, and then compared those to the histogram version?

> The first patch implements JGit's HistogramDiff(Index) proper. The
> second and third patches aren't essential but yield performance gains.
...
> [RFC/PATCH 1/3] teach --histogram to diff
> [RFC/PATCH 2/3] xdiff/xprepare: skip classification
> [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram

Do we need sampling at all for histogram? Can you skip it?

-- 
Shawn.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC/PATCH 0/3] teach --histogram to diff
  2011-07-12 14:19 ` [RFC/PATCH 0/3] " Shawn Pearce
@ 2011-07-12 17:43   ` Junio C Hamano
  2011-07-13 16:35     ` Tay Ray Chuan
  2011-07-13 16:34   ` Tay Ray Chuan
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2011-07-12 17:43 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Tay Ray Chuan, Git Mailing List

Shawn Pearce <spearce@spearce.org> writes:

> Have you looked at a patch that differs in output between Myers and
> patience, and then compared those to the histogram version?

$ git log -p v1.6.6..v1.7.6 | git patch-id >/var/tmp/md
$ git log --histogram -p v1.6.6..v1.7.6 | git patch-id >/var/tmp/hd
$ diff -u0 /var/tmp/md /var/tmp/hd |
  sed -ne '/^+/s/^+[0-9a-f][0-9a-f][0-9a-f]* //p' |
  while read commit
  do
  	git show "$commit" >/var/tmp/1
        git show --histogram "$commit" >/var/tmp/2
        interdiff /var/tmp/1 /var/tmp/2
  done

shows there is one that gives vastly different appearance, but it all
boils down to which lines to take as common, and for this particular
example neither is more readable over the other (9560808f2ef5a34d2a).

Running the above "show" with larger -U$n value shows there don't seem to
be any discrepancies between the two.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC/PATCH 1/3] teach --histogram to diff
  2011-07-12  6:10 ` [RFC/PATCH 1/3] " Tay Ray Chuan
  2011-07-12  6:10   ` [RFC/PATCH 2/3] xdiff/xprepare: skip classification Tay Ray Chuan
@ 2011-07-12 19:56   ` Junio C Hamano
  2011-07-13 16:36     ` Tay Ray Chuan
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2011-07-12 19:56 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: Git Mailing List, Shawn O. Pearce

This is just half-a-review (bottom half of the file).

> +static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
> +	int line1, int count1, int line2, int count2)
> +{

What does this function return?

> +	struct histindex index;
> +	struct region lcs;
> +	int sz;
> +	int result = -1;
> +
> +	if (count1 <= 0 && count2 <= 0)
> +		return 0;
> +
> +	if (LINE_END(1) >= MAX_PTR)
> +		return -1;
> +
> +	if (!count1) {
> +		while(count2--)
> +			env->xdf2.rchg[line2++ - 1] = 1;
> +		return 0;
> +	} else if (!count2) {
> +		while(count1--)
> +			env->xdf1.rchg[line1++ - 1] = 1;
> +		return 0;
> +	}
> +
> +	memset(&index, 0, sizeof(index));
> +
> +	index.env = env;
> +	index.xpp = xpp;
> +
> +	index.records = NULL;
> +	index.line_map = NULL;
> +	/* in case of early xdl_cha_free() */
> +	index.rcha.head = NULL;
> +
> +	index.table_bits = xdl_hashbits(count1);
> +	sz = index.records_size = 1 << index.table_bits;
> +	sz *= sizeof(struct record *);
> +	if (!(index.records = (struct record **) xdl_malloc(sz)))
> +		goto cleanup;
> +	memset(index.records, 0, sz);
> +
> +	sz = index.line_map_size = count1;
> +	sz *= sizeof(struct record *);
> +	if (!(index.line_map = (struct record **) xdl_malloc(sz)))
> +		goto cleanup;
> +	memset(index.line_map, 0, sz);
> +
> +	sz = index.line_map_size;
> +	sz *= sizeof(unsigned int);
> +	if (!(index.next_ptrs = (unsigned int *) xdl_malloc(sz)))
> +		goto cleanup;
> +	memset(index.next_ptrs, 0, sz);
> +
> +	/* lines / 4 + 1 comes from xprepare.c:xdl_prepare_ctx() */
> +	if (xdl_cha_init(&index.rcha, sizeof(struct record), count1 / 4 + 1) < 0)
> +		goto cleanup;
> +
> +	index.ptr_shift = line1;
> +	index.max_chain_length = 64;
> +
> +	memset(&lcs, 0, sizeof(lcs));
> +	if (find_lcs(&index, &lcs, line1, count1, line2, count2))
> +		result = fall_back_to_classic_diff(&index, line1, count1, line2, count2);
> +	else {
> +		result = 0;
> +		if (lcs.begin1 == 0 && lcs.begin2 == 0) {
> +			int ptr;
> +			for (ptr = 0; ptr < count1; ptr++)
> +				env->xdf1.rchg[line1 + ptr - 1] = 1;
> +			for (ptr = 0; ptr < count2; ptr++)
> +				env->xdf2.rchg[line2 + ptr - 1] = 1;
> +		} else {
> +			result = histogram_diff(xpp, env,
> +				line1, lcs.begin1 - line1,
> +				line2, lcs.begin2 - line2);
> +			result = histogram_diff(xpp, env,
> +				lcs.end1 + 1, LINE_END(1) - lcs.end1,
> +				lcs.end2 + 1, LINE_END(2) - lcs.end2);

The result from the first half before lcs is discarded?

> +			result *= -1;

Again, what does this function (called recursively) return, and what does
flipping the sign of it do?

> +		}
> +	}
> +
> +cleanup:
> +	xdl_free(index.records);
> +	xdl_free(index.line_map);
> +	xdl_free(index.next_ptrs);
> +	xdl_cha_free(&index.rcha);
> +
> +	return result;
> +}
> +
> +int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
> +	xpparam_t const *xpp, xdfenv_t *env)
> +{
> +	int line1, line2, count1, count2;
> +
> +	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
> +		return -1;
> +
> +	line1 = line2 = 1;
> +	count1 = env->xdf1.nrec;
> +	count2 = env->xdf2.nrec;
> +
> +	reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);

What this does is logically not specific to histogram algorithm but can be
applied to other backends, no? And I vaguely recall that Linus did try
something like this once, found some issues with it when context is set to
non zero, and stopped doing it (sorry, I do not have any more details).

I am not suggesting you to remove this call or hoist the call to one level
up to xdl_do_diff(), but I do have to wonder how much of the performance
improvement you reported is due to this common head/tail reduction.

> +	return histogram_diff(xpp, env, line1, count1, line2, count2);
> +}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC/PATCH 0/3] teach --histogram to diff
  2011-07-12 14:19 ` [RFC/PATCH 0/3] " Shawn Pearce
  2011-07-12 17:43   ` Junio C Hamano
@ 2011-07-13 16:34   ` Tay Ray Chuan
  1 sibling, 0 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-13 16:34 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Git Mailing List

On Tue, Jul 12, 2011 at 10:19 PM, Shawn Pearce <spearce@spearce.org> wrote:
> On Mon, Jul 11, 2011 at 23:10, Tay Ray Chuan <rctay89@gmail.com> wrote:
>> [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram
>
> Do we need sampling at all for histogram? Can you skip it?

Sampling is done to get a guess of lines in the file. This guess is
then used to preallocated memory for the list of records. (This is
just a guess; if we find more records we allocate more memory.) By
doing this preallocation, we can save on malloc()'s, giving a
performance boost.

But then sampling has its costs - previously, we ran up to 256
memchr('\n')s within a mmfile "block". For histogram diff, we cut the
cap down to 20. (But not for the other diff algorithms - see the
relevant patch text for more.) I think this gives us a good balance -
time spent in guessing lines, and time gained from preallocating
memory.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC/PATCH 0/3] teach --histogram to diff
  2011-07-12 17:43   ` Junio C Hamano
@ 2011-07-13 16:35     ` Tay Ray Chuan
  0 siblings, 0 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-13 16:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Git Mailing List

On Wed, Jul 13, 2011 at 1:43 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
>
>> Have you looked at a patch that differs in output between Myers and
>> patience, and then compared those to the histogram version?
>
> $ git log -p v1.6.6..v1.7.6 | git patch-id >/var/tmp/md
> $ git log --histogram -p v1.6.6..v1.7.6 | git patch-id >/var/tmp/hd
> $ diff -u0 /var/tmp/md /var/tmp/hd |
>  sed -ne '/^+/s/^+[0-9a-f][0-9a-f][0-9a-f]* //p' |
>  while read commit
>  do
>        git show "$commit" >/var/tmp/1
>        git show --histogram "$commit" >/var/tmp/2
>        interdiff /var/tmp/1 /var/tmp/2
>  done

Thanks for taking the time to run this.

> shows there is one that gives vastly different appearance, but it all
> boils down to which lines to take as common, and for this particular
> example neither is more readable over the other (9560808f2ef5a34d2a).
>
> Running the above "show" with larger -U$n value shows there don't seem to
> be any discrepancies between the two.

I suspect it's due to that change being mostly a lot of additions on
the right side, without much deletions on the left.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC/PATCH 1/3] teach --histogram to diff
  2011-07-12 19:56   ` [RFC/PATCH 1/3] teach --histogram to diff Junio C Hamano
@ 2011-07-13 16:36     ` Tay Ray Chuan
  0 siblings, 0 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-07-13 16:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Shawn O. Pearce

On Wed, Jul 13, 2011 at 3:56 AM, Junio C Hamano <gitster@pobox.com> wrote:
> This is just half-a-review (bottom half of the file).
> [snip]
>> +                     result = histogram_diff(xpp, env,
>> +                             line1, lcs.begin1 - line1,
>> +                             line2, lcs.begin2 - line2);
>> +                     result = histogram_diff(xpp, env,
>> +                             lcs.end1 + 1, LINE_END(1) - lcs.end1,
>> +                             lcs.end2 + 1, LINE_END(2) - lcs.end2);
>
> The result from the first half before lcs is discarded?
>
>> +                     result *= -1;
>
> Again, what does this function (called recursively) return, and what does
> flipping the sign of it do?

Oops, my bad.

>[snip]
>> +     reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
>
> What this does is logically not specific to histogram algorithm but can be
> applied to other backends, no? And I vaguely recall that Linus did try
> something like this once, found some issues with it when context is set to
> non zero, and stopped doing it (sorry, I do not have any more details).
>
> I am not suggesting you to remove this call or hoist the call to one level
> up to xdl_do_diff(), but I do have to wonder how much of the performance
> improvement you reported is due to this common head/tail reduction.

I believe xdiff already performs this for the Meyers algorithm (in
xprepare.c:xdl_trim_ends()), but the Meyers code doesn't look like it
uses this information at all. If this is the case, then I do think
that a large part of the performance improvement is due to this common
reduction - for git log -p, one would expect a pretty large number of
common head/tail lines in files within a commit.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/8] teach --histogram to diff
  2011-07-12  6:10 [RFC/PATCH 0/3] teach --histogram to diff Tay Ray Chuan
  2011-07-12  6:10 ` [RFC/PATCH 1/3] " Tay Ray Chuan
  2011-07-12 14:19 ` [RFC/PATCH 0/3] " Shawn Pearce
@ 2011-08-01  3:16 ` Tay Ray Chuan
  2011-08-01  3:16   ` [PATCH v2 1/8] xdiff/xprepare: use memset() Tay Ray Chuan
  2011-08-01  4:20   ` [PATCH 0/4] changes for rc/histogram-diff in 'next' Tay Ray Chuan
  2 siblings, 2 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Port JGit's HistogramDiff(Index) over to C. This algorithm extends the
patience algorithm to "support low-occurrence common elements" [1].
Rough numbers show that it is a faster alternative to its --patience
cousin, as well as to the default Meyers algorithm.

Changed from v1/RFC:
 - patch #2 is new
 - patch #6 been changed
 - all other patches are unchanged

Contents:
[PATCH v2 1/8] xdiff/xprepare: use memset()
[PATCH v2 2/8] do away with xdl_mmfile_next()
[PATCH v2 3/8] xdiff/xprepare: refactor abort cleanups
[PATCH v2 4/8] xdiff/xpatience: factor out fall-back-diff function
[PATCH v2 5/8] t4033-diff-patience: factor out tests
[PATCH v2 6/8] teach --histogram to diff
[PATCH v2 7/8] xdiff/xprepare: skip classification
[PATCH v2 8/8] xdiff/xprepare: use a smaller sample size for histogram diff

--
Footnotes:
[1]
http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/HistogramDiff.java;hb=HEAD

-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/8] xdiff/xprepare: use memset()
  2011-08-01  3:16 ` [PATCH v2 0/8] " Tay Ray Chuan
@ 2011-08-01  3:16   ` Tay Ray Chuan
  2011-08-01  3:16     ` [PATCH v2 2/8] do away with xdl_mmfile_next() Tay Ray Chuan
  2011-08-01  4:20   ` [PATCH 0/4] changes for rc/histogram-diff in 'next' Tay Ray Chuan
  1 sibling, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Use memset() instead of a for loop to initialize. This could give a
performance advantage.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xprepare.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 1689085..783631a 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -64,8 +64,6 @@ static int xdl_optimize_ctxs(xdfile_t *xdf1, xdfile_t *xdf2);
 
 
 static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) {
-	long i;
-
 	cf->flags = flags;
 
 	cf->hbits = xdl_hashbits((unsigned int) size);
@@ -80,8 +78,7 @@ static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) {
 		xdl_cha_free(&cf->ncha);
 		return -1;
 	}
-	for (i = 0; i < cf->hsize; i++)
-		cf->rchash[i] = NULL;
+	memset(cf->rchash, 0, cf->hsize * sizeof(xdlclass_t *));
 
 	cf->count = 0;
 
@@ -136,7 +133,7 @@ static int xdl_classify_record(xdlclassifier_t *cf, xrecord_t **rhash, unsigned
 static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 			   xdlclassifier_t *cf, xdfile_t *xdf) {
 	unsigned int hbits;
-	long i, nrec, hsize, bsize;
+	long nrec, hsize, bsize;
 	unsigned long hav;
 	char const *blk, *cur, *top, *prev;
 	xrecord_t *crec;
@@ -164,8 +161,7 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 		xdl_cha_free(&xdf->rcha);
 		return -1;
 	}
-	for (i = 0; i < hsize; i++)
-		rhash[i] = NULL;
+	memset(rhash, 0, hsize * sizeof(xrecord_t *));
 
 	nrec = 0;
 	if ((cur = blk = xdl_mmfile_first(mf, &bsize)) != NULL) {
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/8] do away with xdl_mmfile_next()
  2011-08-01  3:16   ` [PATCH v2 1/8] xdiff/xprepare: use memset() Tay Ray Chuan
@ 2011-08-01  3:16     ` Tay Ray Chuan
  2011-08-01  3:16       ` [PATCH v2 3/8] xdiff/xprepare: refactor abort cleanups Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Given our simple mmfile structure, xdl_mmfile_next() calls are
redundant. Do away with calls to them.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---

New in v2.

 xdiff/xdiff.h    |    1 -
 xdiff/xprepare.c |    7 +------
 xdiff/xutils.c   |   14 +-------------
 3 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 711048e..6ea1d0e 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -105,7 +105,6 @@ typedef struct s_bdiffparam {
 #define xdl_realloc(ptr,x) realloc(ptr,x)
 
 void *xdl_mmfile_first(mmfile_t *mmf, long *size);
-void *xdl_mmfile_next(mmfile_t *mmf, long *size);
 long xdl_mmfile_size(mmfile_t *mmf);
 
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 783631a..49c7e8a 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -165,12 +165,7 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 
 	nrec = 0;
 	if ((cur = blk = xdl_mmfile_first(mf, &bsize)) != NULL) {
-		for (top = blk + bsize;;) {
-			if (cur >= top) {
-				if (!(cur = blk = xdl_mmfile_next(mf, &bsize)))
-					break;
-				top = blk + bsize;
-			}
+		for (top = blk + bsize; cur < top; ) {
 			prev = cur;
 			hav = xdl_hash_record(&cur, top, xpp->flags);
 			if (nrec >= narec) {
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index ab65034..ea1357d 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -71,12 +71,6 @@ void *xdl_mmfile_first(mmfile_t *mmf, long *size)
 }
 
 
-void *xdl_mmfile_next(mmfile_t *mmf, long *size)
-{
-	return NULL;
-}
-
-
 long xdl_mmfile_size(mmfile_t *mmf)
 {
 	return mmf->size;
@@ -164,13 +158,7 @@ long xdl_guess_lines(mmfile_t *mf) {
 	char const *data, *cur, *top;
 
 	if ((cur = data = xdl_mmfile_first(mf, &size)) != NULL) {
-		for (top = data + size; nl < XDL_GUESS_NLINES;) {
-			if (cur >= top) {
-				tsize += (long) (cur - data);
-				if (!(cur = data = xdl_mmfile_next(mf, &size)))
-					break;
-				top = data + size;
-			}
+		for (top = data + size; nl < XDL_GUESS_NLINES && cur < top; ) {
 			nl++;
 			if (!(cur = memchr(cur, '\n', top - cur)))
 				cur = top;
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/8] xdiff/xprepare: refactor abort cleanups
  2011-08-01  3:16     ` [PATCH v2 2/8] do away with xdl_mmfile_next() Tay Ray Chuan
@ 2011-08-01  3:16       ` Tay Ray Chuan
  2011-08-01  3:16         ` [PATCH v2 4/8] xdiff/xpatience: factor out fall-back-diff function Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Group free()'s that are called when a malloc() fails in
xdl_prepare_ctx(), making for more readable code.

Also add a free() on ha, in case future git hackers add allocs after the
ha malloc.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xprepare.c |   91 +++++++++++++++++++-----------------------------------
 1 files changed, 32 insertions(+), 59 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 49c7e8a..3ebad0f 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -143,24 +143,21 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 	char *rchg;
 	long *rindex;
 
-	if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0) {
+	ha = NULL;
+	rindex = NULL;
+	rchg = NULL;
+	rhash = NULL;
+	recs = NULL;
 
-		return -1;
-	}
-	if (!(recs = (xrecord_t **) xdl_malloc(narec * sizeof(xrecord_t *)))) {
-
-		xdl_cha_free(&xdf->rcha);
-		return -1;
-	}
+	if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
+		goto abort;
+	if (!(recs = (xrecord_t **) xdl_malloc(narec * sizeof(xrecord_t *))))
+		goto abort;
 
 	hbits = xdl_hashbits((unsigned int) narec);
 	hsize = 1 << hbits;
-	if (!(rhash = (xrecord_t **) xdl_malloc(hsize * sizeof(xrecord_t *)))) {
-
-		xdl_free(recs);
-		xdl_cha_free(&xdf->rcha);
-		return -1;
-	}
+	if (!(rhash = (xrecord_t **) xdl_malloc(hsize * sizeof(xrecord_t *))))
+		goto abort;
 	memset(rhash, 0, hsize * sizeof(xrecord_t *));
 
 	nrec = 0;
@@ -170,63 +167,30 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 			hav = xdl_hash_record(&cur, top, xpp->flags);
 			if (nrec >= narec) {
 				narec *= 2;
-				if (!(rrecs = (xrecord_t **) xdl_realloc(recs, narec * sizeof(xrecord_t *)))) {
-
-					xdl_free(rhash);
-					xdl_free(recs);
-					xdl_cha_free(&xdf->rcha);
-					return -1;
-				}
+				if (!(rrecs = (xrecord_t **) xdl_realloc(recs, narec * sizeof(xrecord_t *))))
+					goto abort;
 				recs = rrecs;
 			}
-			if (!(crec = xdl_cha_alloc(&xdf->rcha))) {
-
-				xdl_free(rhash);
-				xdl_free(recs);
-				xdl_cha_free(&xdf->rcha);
-				return -1;
-			}
+			if (!(crec = xdl_cha_alloc(&xdf->rcha)))
+				goto abort;
 			crec->ptr = prev;
 			crec->size = (long) (cur - prev);
 			crec->ha = hav;
 			recs[nrec++] = crec;
 
-			if (xdl_classify_record(cf, rhash, hbits, crec) < 0) {
-
-				xdl_free(rhash);
-				xdl_free(recs);
-				xdl_cha_free(&xdf->rcha);
-				return -1;
-			}
+			if (xdl_classify_record(cf, rhash, hbits, crec) < 0)
+				goto abort;
 		}
 	}
 
-	if (!(rchg = (char *) xdl_malloc((nrec + 2) * sizeof(char)))) {
-
-		xdl_free(rhash);
-		xdl_free(recs);
-		xdl_cha_free(&xdf->rcha);
-		return -1;
-	}
+	if (!(rchg = (char *) xdl_malloc((nrec + 2) * sizeof(char))))
+		goto abort;
 	memset(rchg, 0, (nrec + 2) * sizeof(char));
 
-	if (!(rindex = (long *) xdl_malloc((nrec + 1) * sizeof(long)))) {
-
-		xdl_free(rchg);
-		xdl_free(rhash);
-		xdl_free(recs);
-		xdl_cha_free(&xdf->rcha);
-		return -1;
-	}
-	if (!(ha = (unsigned long *) xdl_malloc((nrec + 1) * sizeof(unsigned long)))) {
-
-		xdl_free(rindex);
-		xdl_free(rchg);
-		xdl_free(rhash);
-		xdl_free(recs);
-		xdl_cha_free(&xdf->rcha);
-		return -1;
-	}
+	if (!(rindex = (long *) xdl_malloc((nrec + 1) * sizeof(long))))
+		goto abort;
+	if (!(ha = (unsigned long *) xdl_malloc((nrec + 1) * sizeof(unsigned long))))
+		goto abort;
 
 	xdf->nrec = nrec;
 	xdf->recs = recs;
@@ -240,6 +204,15 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 	xdf->dend = nrec - 1;
 
 	return 0;
+
+abort:
+	xdl_free(ha);
+	xdl_free(rindex);
+	xdl_free(rchg);
+	xdl_free(rhash);
+	xdl_free(recs);
+	xdl_cha_free(&xdf->rcha);
+	return -1;
 }
 
 
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/8] xdiff/xpatience: factor out fall-back-diff function
  2011-08-01  3:16       ` [PATCH v2 3/8] xdiff/xprepare: refactor abort cleanups Tay Ray Chuan
@ 2011-08-01  3:16         ` Tay Ray Chuan
  2011-08-01  3:16           ` [PATCH v2 5/8] t4033-diff-patience: factor out tests Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

This is in preparation for the histogram diff algorithm, which will also
re-use much of the code to call the default Meyers diff algorithm.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xpatience.c |   27 ++-------------------------
 xdiff/xutils.c    |   31 +++++++++++++++++++++++++++++++
 xdiff/xutils.h    |    2 ++
 3 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index e42c16a..fdd7d02 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -287,34 +287,11 @@ static int walk_common_sequence(struct hashmap *map, struct entry *first,
 static int fall_back_to_classic_diff(struct hashmap *map,
 		int line1, int count1, int line2, int count2)
 {
-	/*
-	 * This probably does not work outside Git, since
-	 * we have a very simple mmfile structure.
-	 *
-	 * Note: ideally, we would reuse the prepared environment, but
-	 * the libxdiff interface does not (yet) allow for diffing only
-	 * ranges of lines instead of the whole files.
-	 */
-	mmfile_t subfile1, subfile2;
 	xpparam_t xpp;
-	xdfenv_t env;
-
-	subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
-	subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
-		map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
-	subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
-	subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
-		map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
 	xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
-	if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
-		return -1;
-
-	memcpy(map->env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
-	memcpy(map->env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
-
-	xdl_free_env(&env);
 
-	return 0;
+	return xdl_fall_back_diff(map->env, &xpp,
+				  line1, count1, line2, count2);
 }
 
 /*
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index ea1357d..890cc4f 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -390,3 +390,34 @@ int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
 
 	return 0;
 }
+
+int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
+		int line1, int count1, int line2, int count2)
+{
+	/*
+	 * This probably does not work outside Git, since
+	 * we have a very simple mmfile structure.
+	 *
+	 * Note: ideally, we would reuse the prepared environment, but
+	 * the libxdiff interface does not (yet) allow for diffing only
+	 * ranges of lines instead of the whole files.
+	 */
+	mmfile_t subfile1, subfile2;
+	xdfenv_t env;
+
+	subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
+	subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
+		diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
+	subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
+	subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
+		diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
+	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
+		return -1;
+
+	memcpy(diff_env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
+	memcpy(diff_env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
+
+	xdl_free_env(&env);
+
+	return 0;
+}
diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index d5de829..674a657 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -41,6 +41,8 @@ int xdl_num_out(char *out, long val);
 long xdl_atol(char const *str, char const **next);
 int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
 		      const char *func, long funclen, xdemitcb_t *ecb);
+int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
+		       int line1, int count1, int line2, int count2);
 
 
 
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/8] t4033-diff-patience: factor out tests
  2011-08-01  3:16         ` [PATCH v2 4/8] xdiff/xpatience: factor out fall-back-diff function Tay Ray Chuan
@ 2011-08-01  3:16           ` Tay Ray Chuan
  2011-08-01  3:16             ` [PATCH v2 6/8] teach --histogram to diff Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Group the test cases into two functions, test_diff_(frobnitz|unique).
This in preparation for the histogram diff algorithm, which would also
re-use these test cases.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 t/lib-diff-alternative.sh |  165 +++++++++++++++++++++++++++++++++++++++++++++
 t/t4033-diff-patience.sh  |  162 +-------------------------------------------
 2 files changed, 168 insertions(+), 159 deletions(-)
 create mode 100644 t/lib-diff-alternative.sh

diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh
new file mode 100644
index 0000000..75ffd91
--- /dev/null
+++ b/t/lib-diff-alternative.sh
@@ -0,0 +1,165 @@
+#!/bin/sh
+
+test_diff_frobnitz() {
+	cat >file1 <<\EOF
+#include <stdio.h>
+
+// Frobs foo heartily
+int frobnitz(int foo)
+{
+    int i;
+    for(i = 0; i < 10; i++)
+    {
+        printf("Your answer is: ");
+        printf("%d\n", foo);
+    }
+}
+
+int fact(int n)
+{
+    if(n > 1)
+    {
+        return fact(n-1) * n;
+    }
+    return 1;
+}
+
+int main(int argc, char **argv)
+{
+    frobnitz(fact(10));
+}
+EOF
+
+	cat >file2 <<\EOF
+#include <stdio.h>
+
+int fib(int n)
+{
+    if(n > 2)
+    {
+        return fib(n-1) + fib(n-2);
+    }
+    return 1;
+}
+
+// Frobs foo heartily
+int frobnitz(int foo)
+{
+    int i;
+    for(i = 0; i < 10; i++)
+    {
+        printf("%d\n", foo);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    frobnitz(fib(10));
+}
+EOF
+
+	cat >expect <<\EOF
+diff --git a/file1 b/file2
+index 6faa5a3..e3af329 100644
+--- a/file1
++++ b/file2
+@@ -1,26 +1,25 @@
+ #include <stdio.h>
+ 
++int fib(int n)
++{
++    if(n > 2)
++    {
++        return fib(n-1) + fib(n-2);
++    }
++    return 1;
++}
++
+ // Frobs foo heartily
+ int frobnitz(int foo)
+ {
+     int i;
+     for(i = 0; i < 10; i++)
+     {
+-        printf("Your answer is: ");
+         printf("%d\n", foo);
+     }
+ }
+ 
+-int fact(int n)
+-{
+-    if(n > 1)
+-    {
+-        return fact(n-1) * n;
+-    }
+-    return 1;
+-}
+-
+ int main(int argc, char **argv)
+ {
+-    frobnitz(fact(10));
++    frobnitz(fib(10));
+ }
+EOF
+
+	STRATEGY=$1
+
+	test_expect_success "$STRATEGY diff" '
+		test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output &&
+		test_cmp expect output
+	'
+
+	test_expect_success "$STRATEGY diff output is valid" '
+		mv file2 expect &&
+		git apply < output &&
+		test_cmp expect file2
+	'
+}
+
+test_diff_unique() {
+	cat >uniq1 <<\EOF
+1
+2
+3
+4
+5
+6
+EOF
+
+	cat >uniq2 <<\EOF
+a
+b
+c
+d
+e
+f
+EOF
+
+	cat >expect <<\EOF
+diff --git a/uniq1 b/uniq2
+index b414108..0fdf397 100644
+--- a/uniq1
++++ b/uniq2
+@@ -1,6 +1,6 @@
+-1
+-2
+-3
+-4
+-5
+-6
++a
++b
++c
++d
++e
++f
+EOF
+
+	STRATEGY=$1
+
+	test_expect_success 'completely different files' '
+		test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output &&
+		test_cmp expect output
+	'
+}
+
diff --git a/t/t4033-diff-patience.sh b/t/t4033-diff-patience.sh
index 1eb1498..3c9932e 100755
--- a/t/t4033-diff-patience.sh
+++ b/t/t4033-diff-patience.sh
@@ -3,166 +3,10 @@
 test_description='patience diff algorithm'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-diff-alternative.sh
 
-cat >file1 <<\EOF
-#include <stdio.h>
+test_diff_frobnitz "patience"
 
-// Frobs foo heartily
-int frobnitz(int foo)
-{
-    int i;
-    for(i = 0; i < 10; i++)
-    {
-        printf("Your answer is: ");
-        printf("%d\n", foo);
-    }
-}
-
-int fact(int n)
-{
-    if(n > 1)
-    {
-        return fact(n-1) * n;
-    }
-    return 1;
-}
-
-int main(int argc, char **argv)
-{
-    frobnitz(fact(10));
-}
-EOF
-
-cat >file2 <<\EOF
-#include <stdio.h>
-
-int fib(int n)
-{
-    if(n > 2)
-    {
-        return fib(n-1) + fib(n-2);
-    }
-    return 1;
-}
-
-// Frobs foo heartily
-int frobnitz(int foo)
-{
-    int i;
-    for(i = 0; i < 10; i++)
-    {
-        printf("%d\n", foo);
-    }
-}
-
-int main(int argc, char **argv)
-{
-    frobnitz(fib(10));
-}
-EOF
-
-cat >expect <<\EOF
-diff --git a/file1 b/file2
-index 6faa5a3..e3af329 100644
---- a/file1
-+++ b/file2
-@@ -1,26 +1,25 @@
- #include <stdio.h>
- 
-+int fib(int n)
-+{
-+    if(n > 2)
-+    {
-+        return fib(n-1) + fib(n-2);
-+    }
-+    return 1;
-+}
-+
- // Frobs foo heartily
- int frobnitz(int foo)
- {
-     int i;
-     for(i = 0; i < 10; i++)
-     {
--        printf("Your answer is: ");
-         printf("%d\n", foo);
-     }
- }
- 
--int fact(int n)
--{
--    if(n > 1)
--    {
--        return fact(n-1) * n;
--    }
--    return 1;
--}
--
- int main(int argc, char **argv)
- {
--    frobnitz(fact(10));
-+    frobnitz(fib(10));
- }
-EOF
-
-test_expect_success 'patience diff' '
-
-	test_must_fail git diff --no-index --patience file1 file2 > output &&
-	test_cmp expect output
-
-'
-
-test_expect_success 'patience diff output is valid' '
-
-	mv file2 expect &&
-	git apply < output &&
-	test_cmp expect file2
-
-'
-
-cat >uniq1 <<\EOF
-1
-2
-3
-4
-5
-6
-EOF
-
-cat >uniq2 <<\EOF
-a
-b
-c
-d
-e
-f
-EOF
-
-cat >expect <<\EOF
-diff --git a/uniq1 b/uniq2
-index b414108..0fdf397 100644
---- a/uniq1
-+++ b/uniq2
-@@ -1,6 +1,6 @@
--1
--2
--3
--4
--5
--6
-+a
-+b
-+c
-+d
-+e
-+f
-EOF
-
-test_expect_success 'completely different files' '
-
-	test_must_fail git diff --no-index --patience uniq1 uniq2 > output &&
-	test_cmp expect output
-
-'
+test_diff_unique "patience"
 
 test_done
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 6/8] teach --histogram to diff
  2011-08-01  3:16           ` [PATCH v2 5/8] t4033-diff-patience: factor out tests Tay Ray Chuan
@ 2011-08-01  3:16             ` Tay Ray Chuan
  2011-08-01  3:16               ` [PATCH v2 7/8] xdiff/xprepare: skip classification Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.

The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.

We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---

Changes in v2:
 - changed handling of recursed results
 - do away with reduce_common_start_end(), as use xdiff's
   xdl_trim_ends()

On Wed, Jul 13, 2011 at 3:56 AM, Junio C Hamano <gitster@pobox.com> wrote:
>[snip]
>> +     reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
>
> What this does is logically not specific to histogram algorithm but can be
> applied to other backends, no? And I vaguely recall that Linus did try
> something like this once, found some issues with it when context is set to
> non zero, and stopped doing it (sorry, I do not have any more details).
>
> I am not suggesting you to remove this call or hoist the call to one level
> up to xdl_do_diff(), but I do have to wonder how much of the performance
> improvement you reported is due to this common head/tail reduction.

I believe this was the patch by Linus you were referring to:

  https://lkml.org/lkml/2007/12/20/692

That is an optimization on a more aggressive level - cutting out
content so that it doesn't get hashed in the first place. The
optimization used here (reduce_common_start_end()/xdl_trim_ends())
depends on the hashed result and simply reduces the "area" on which the
algorithm is applied to.

(Actually, I do have a working patch that does content trimming that is
context-length safe. But it's not specific to histogram so I'll keep it
with me till this series gets merged.)

 Makefile                  |    2 +-
 diff.c                    |    2 +
 merge-recursive.c         |    2 +
 t/t4048-diff-histogram.sh |   12 ++
 xdiff/xdiff.h             |    1 +
 xdiff/xdiffi.c            |    3 +
 xdiff/xdiffi.h            |    2 +
 xdiff/xhistogram.c        |  360 +++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 383 insertions(+), 1 deletions(-)
 create mode 100755 t/t4048-diff-histogram.sh
 create mode 100644 xdiff/xhistogram.c

diff --git a/Makefile b/Makefile
index 775ee83..9930e6b 100644
--- a/Makefile
+++ b/Makefile
@@ -1836,7 +1836,7 @@ ifndef NO_CURL
 	GIT_OBJS += http.o http-walker.o remote-curl.o
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
-	xdiff/xmerge.o xdiff/xpatience.o
+	xdiff/xmerge.o xdiff/xpatience.o xdiff/xhistogram.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o
 VCSSVN_TEST_OBJS = test-obj-pool.o test-string-pool.o \
diff --git a/diff.c b/diff.c
index 5422c43..a113294 100644
--- a/diff.c
+++ b/diff.c
@@ -3186,6 +3186,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--patience"))
 		DIFF_XDL_SET(options, PATIENCE_DIFF);
+	else if (!strcmp(arg, "--histogram"))
+		DIFF_XDL_SET(options, HISTOGRAM_DIFF);
 
 	/* flags options */
 	else if (!strcmp(arg, "--binary")) {
diff --git a/merge-recursive.c b/merge-recursive.c
index 16c2dbe..3e8267b 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1773,6 +1773,8 @@ int parse_merge_opt(struct merge_options *o, const char *s)
 		o->subtree_shift = s + strlen("subtree=");
 	else if (!strcmp(s, "patience"))
 		o->xdl_opts |= XDF_PATIENCE_DIFF;
+	else if (!strcmp(s, "histogram"))
+		o->xdl_opts |= XDF_HISTOGRAM_DIFF;
 	else if (!strcmp(s, "ignore-space-change"))
 		o->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(s, "ignore-all-space"))
diff --git a/t/t4048-diff-histogram.sh b/t/t4048-diff-histogram.sh
new file mode 100755
index 0000000..fd3e86a
--- /dev/null
+++ b/t/t4048-diff-histogram.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+test_description='histogram diff algorithm'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-diff-alternative.sh
+
+test_diff_frobnitz "histogram"
+
+test_diff_unique "histogram"
+
+test_done
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 6ea1d0e..4beb10c 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -33,6 +33,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
 #define XDF_PATIENCE_DIFF (1 << 5)
+#define XDF_HISTOGRAM_DIFF (1 << 6)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index da67c04..75a3922 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -331,6 +331,9 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	if (xpp->flags & XDF_PATIENCE_DIFF)
 		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
 
+	if (xpp->flags & XDF_HISTOGRAM_DIFF)
+		return xdl_do_histogram_diff(mf1, mf2, xpp, xe);
+
 	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
 		return -1;
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index ad033a8..7a92ea9 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -57,5 +57,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
 int xdl_do_patience_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		xdfenv_t *env);
+int xdl_do_histogram_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+		xdfenv_t *env);
 
 #endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
new file mode 100644
index 0000000..4af99f0
--- /dev/null
+++ b/xdiff/xhistogram.c
@@ -0,0 +1,360 @@
+/*
+ * Copyright (C) 2010, Google Inc.
+ * and other copyright owners as documented in JGit's IP log.
+ *
+ * This program and the accompanying materials are made available
+ * under the terms of the Eclipse Distribution License v1.0 which
+ * accompanies this distribution, is reproduced below, and is
+ * available at http://www.eclipse.org/org/documents/edl-v10.php
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Eclipse Foundation, Inc. nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "xinclude.h"
+#include "xtypes.h"
+#include "xdiff.h"
+
+#define MAX_PTR	UINT_MAX
+#define MAX_CNT	UINT_MAX
+
+#define LINE_END(n) (line##n + count##n - 1)
+#define LINE_END_PTR(n) (*line##n + *count##n - 1)
+
+struct histindex {
+	struct record {
+		unsigned int ptr, cnt;
+		struct record *next;
+	} **records, /* an ocurrence */
+	  **line_map; /* map of line to record chain */
+	chastore_t rcha;
+	unsigned int *next_ptrs;
+	unsigned int table_bits,
+		     records_size,
+		     line_map_size;
+
+	unsigned int max_chain_length,
+		     key_shift,
+		     ptr_shift;
+
+	unsigned int cnt,
+		     has_common;
+
+	xdfenv_t *env;
+	xpparam_t const *xpp;
+};
+
+struct region {
+	unsigned int begin1, end1;
+	unsigned int begin2, end2;
+};
+
+#define LINE_MAP(i, a) (i->line_map[(a) - i->ptr_shift])
+
+#define NEXT_PTR(index, ptr) \
+	(index->next_ptrs[(ptr) - index->ptr_shift])
+
+#define CNT(index, ptr) \
+	((LINE_MAP(index, ptr))->cnt)
+
+#define REC(env, s, l) \
+	(env->xdf##s.recs[l - 1])
+
+static int cmp_recs(xpparam_t const *xpp,
+	xrecord_t *r1, xrecord_t *r2)
+{
+	return r1->ha == r2->ha &&
+		xdl_recmatch(r1->ptr, r1->size, r2->ptr, r2->size,
+			    xpp->flags);
+}
+
+#define CMP_ENV(xpp, env, s1, l1, s2, l2) \
+	(cmp_recs(xpp, REC(env, s1, l1), REC(env, s2, l2)))
+
+#define CMP(i, s1, l1, s2, l2) \
+	(cmp_recs(i->xpp, REC(i->env, s1, l1), REC(i->env, s2, l2)))
+
+#define TABLE_HASH(index, side, line) \
+	XDL_HASHLONG((REC(index->env, side, line))->ha, index->table_bits)
+
+static int scanA(struct histindex *index, int line1, int count1)
+{
+	unsigned int ptr, tbl_idx;
+	unsigned int chain_len;
+	struct record **rec_chain, *rec;
+
+	for (ptr = LINE_END(1); line1 <= ptr; ptr--) {
+		tbl_idx = TABLE_HASH(index, 1, ptr);
+		rec_chain = index->records + tbl_idx;
+		rec = *rec_chain;
+
+		chain_len = 0;
+		while (rec) {
+			if (CMP(index, 1, rec->ptr, 1, ptr)) {
+				/*
+				 * ptr is identical to another element. Insert
+				 * it onto the front of the existing element
+				 * chain.
+				 */
+				NEXT_PTR(index, ptr) = rec->ptr;
+				rec->ptr = ptr;
+				/* cap rec->cnt at MAX_CNT */
+				rec->cnt = XDL_MIN(MAX_CNT, rec->cnt + 1);
+				LINE_MAP(index, ptr) = rec;
+				goto continue_scan;
+			}
+
+			rec = rec->next;
+			chain_len++;
+		}
+
+		if (chain_len == index->max_chain_length)
+			return -1;
+
+		/*
+		 * This is the first time we have ever seen this particular
+		 * element in the sequence. Construct a new chain for it.
+		 */
+		if (!(rec = xdl_cha_alloc(&index->rcha)))
+			return -1;
+		rec->ptr = ptr;
+		rec->cnt = 1;
+		rec->next = *rec_chain;
+		*rec_chain = rec;
+		LINE_MAP(index, ptr) = rec;
+
+continue_scan:
+		; /* no op */
+	}
+
+	return 0;
+}
+
+static int try_lcs(struct histindex *index, struct region *lcs, int b_ptr,
+	int line1, int count1, int line2, int count2)
+{
+	unsigned int b_next = b_ptr + 1;
+	struct record *rec = index->records[TABLE_HASH(index, 2, b_ptr)];
+	unsigned int as, ae, bs, be, np, rc;
+	int should_break;
+
+	for (; rec; rec = rec->next) {
+		if (rec->cnt > index->cnt) {
+			if (!index->has_common)
+				index->has_common = CMP(index, 1, rec->ptr, 2, b_ptr);
+			continue;
+		}
+
+		as = rec->ptr;
+		if (!CMP(index, 1, as, 2, b_ptr))
+			continue;
+
+		index->has_common = 1;
+		for (;;) {
+			should_break = 0;
+			np = NEXT_PTR(index, as);
+			bs = b_ptr;
+			ae = as;
+			be = bs;
+			rc = rec->cnt;
+
+			while (line1 < as && line2 < bs
+				&& CMP(index, 1, as - 1, 2, bs - 1)) {
+				as--;
+				bs--;
+				if (1 < rc)
+					rc = XDL_MIN(rc, CNT(index, as));
+			}
+			while (ae < LINE_END(1) && be < LINE_END(2)
+				&& CMP(index, 1, ae + 1, 2, be + 1)) {
+				ae++;
+				be++;
+				if (1 < rc)
+					rc = XDL_MIN(rc, CNT(index, ae));
+			}
+
+			if (b_next <= be)
+				b_next = be + 1;
+			if (lcs->end1 - lcs->begin1 < ae - as || rc < index->cnt) {
+				lcs->begin1 = as;
+				lcs->begin2 = bs;
+				lcs->end1 = ae;
+				lcs->end2 = be;
+				index->cnt = rc;
+			}
+
+			if (np == 0)
+				break;
+
+			while (np <= ae) {
+				np = NEXT_PTR(index, np);
+				if (np == 0) {
+					should_break = 1;
+					break;
+				}
+			}
+
+			if (should_break)
+				break;
+
+			as = np;
+		}
+	}
+	return b_next;
+}
+
+static int find_lcs(struct histindex *index, struct region *lcs,
+	int line1, int count1, int line2, int count2) {
+	int b_ptr;
+
+	if (scanA(index, line1, count1))
+		return -1;
+
+	index->cnt = index->max_chain_length + 1;
+
+	for (b_ptr = line2; b_ptr <= LINE_END(2); )
+		b_ptr = try_lcs(index, lcs, b_ptr, line1, count1, line2, count2);
+
+	return index->has_common && index->max_chain_length < index->cnt;
+}
+
+static int fall_back_to_classic_diff(struct histindex *index,
+		int line1, int count1, int line2, int count2)
+{
+	xpparam_t xpp;
+	xpp.flags = index->xpp->flags & ~XDF_HISTOGRAM_DIFF;
+
+	return xdl_fall_back_diff(index->env, &xpp,
+				  line1, count1, line2, count2);
+}
+
+static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
+	int line1, int count1, int line2, int count2)
+{
+	struct histindex index;
+	struct region lcs;
+	int sz;
+	int result = -1;
+
+	if (count1 <= 0 && count2 <= 0)
+		return 0;
+
+	if (LINE_END(1) >= MAX_PTR)
+		return -1;
+
+	if (!count1) {
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	} else if (!count2) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		return 0;
+	}
+
+	memset(&index, 0, sizeof(index));
+
+	index.env = env;
+	index.xpp = xpp;
+
+	index.records = NULL;
+	index.line_map = NULL;
+	/* in case of early xdl_cha_free() */
+	index.rcha.head = NULL;
+
+	index.table_bits = xdl_hashbits(count1);
+	sz = index.records_size = 1 << index.table_bits;
+	sz *= sizeof(struct record *);
+	if (!(index.records = (struct record **) xdl_malloc(sz)))
+		goto cleanup;
+	memset(index.records, 0, sz);
+
+	sz = index.line_map_size = count1;
+	sz *= sizeof(struct record *);
+	if (!(index.line_map = (struct record **) xdl_malloc(sz)))
+		goto cleanup;
+	memset(index.line_map, 0, sz);
+
+	sz = index.line_map_size;
+	sz *= sizeof(unsigned int);
+	if (!(index.next_ptrs = (unsigned int *) xdl_malloc(sz)))
+		goto cleanup;
+	memset(index.next_ptrs, 0, sz);
+
+	/* lines / 4 + 1 comes from xprepare.c:xdl_prepare_ctx() */
+	if (xdl_cha_init(&index.rcha, sizeof(struct record), count1 / 4 + 1) < 0)
+		goto cleanup;
+
+	index.ptr_shift = line1;
+	index.max_chain_length = 64;
+
+	memset(&lcs, 0, sizeof(lcs));
+	if (find_lcs(&index, &lcs, line1, count1, line2, count2))
+		result = fall_back_to_classic_diff(&index, line1, count1, line2, count2);
+	else {
+		result = 0;
+		if (lcs.begin1 == 0 && lcs.begin2 == 0) {
+			int ptr;
+			for (ptr = 0; ptr < count1; ptr++)
+				env->xdf1.rchg[line1 + ptr - 1] = 1;
+			for (ptr = 0; ptr < count2; ptr++)
+				env->xdf2.rchg[line2 + ptr - 1] = 1;
+		} else {
+			result += histogram_diff(xpp, env,
+				line1, lcs.begin1 - line1,
+				line2, lcs.begin2 - line2);
+			result += histogram_diff(xpp, env,
+				lcs.end1 + 1, LINE_END(1) - lcs.end1,
+				lcs.end2 + 1, LINE_END(2) - lcs.end2);
+		}
+	}
+
+cleanup:
+	xdl_free(index.records);
+	xdl_free(index.line_map);
+	xdl_free(index.next_ptrs);
+	xdl_cha_free(&index.rcha);
+
+	return result;
+}
+
+int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
+	xpparam_t const *xpp, xdfenv_t *env)
+{
+	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
+		return -1;
+
+	return histogram_diff(xpp, env,
+		env->xdf1.dstart + 1, env->xdf1.dend - env->xdf1.dstart + 1,
+		env->xdf2.dstart + 1, env->xdf2.dend - env->xdf2.dstart + 1);
+}
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 7/8] xdiff/xprepare: skip classification
  2011-08-01  3:16             ` [PATCH v2 6/8] teach --histogram to diff Tay Ray Chuan
@ 2011-08-01  3:16               ` Tay Ray Chuan
  2011-08-01  3:16                 ` [PATCH v2 8/8] xdiff/xprepare: use a smaller sample size for histogram diff Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

xdiff performs "classification" of records (xdl_classify_record()),
replacing hashes (xrecord_t.ha) with a unique identifier of the
record/line and building a hash table (xrecord_t.rhash) of records. This
is then used to "cleanup" records (xdl_cleanup_records()).

We don't need any of that in histogram diff, so we omit calls to these
functions. We also skip allocating memory to the hash table, rhash, as
it is no longer used.

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xprepare.c |   28 ++++++++++++++++++----------
 1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 3ebad0f..c139ba8 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -154,11 +154,15 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 	if (!(recs = (xrecord_t **) xdl_malloc(narec * sizeof(xrecord_t *))))
 		goto abort;
 
-	hbits = xdl_hashbits((unsigned int) narec);
-	hsize = 1 << hbits;
-	if (!(rhash = (xrecord_t **) xdl_malloc(hsize * sizeof(xrecord_t *))))
-		goto abort;
-	memset(rhash, 0, hsize * sizeof(xrecord_t *));
+	if (xpp->flags & XDF_HISTOGRAM_DIFF)
+		hbits = hsize = 0;
+	else {
+		hbits = xdl_hashbits((unsigned int) narec);
+		hsize = 1 << hbits;
+		if (!(rhash = (xrecord_t **) xdl_malloc(hsize * sizeof(xrecord_t *))))
+			goto abort;
+		memset(rhash, 0, hsize * sizeof(xrecord_t *));
+	}
 
 	nrec = 0;
 	if ((cur = blk = xdl_mmfile_first(mf, &bsize)) != NULL) {
@@ -178,7 +182,8 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 			crec->ha = hav;
 			recs[nrec++] = crec;
 
-			if (xdl_classify_record(cf, rhash, hbits, crec) < 0)
+			if (!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
+				xdl_classify_record(cf, rhash, hbits, crec) < 0)
 				goto abort;
 		}
 	}
@@ -235,7 +240,8 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	enl1 = xdl_guess_lines(mf1) + 1;
 	enl2 = xdl_guess_lines(mf2) + 1;
 
-	if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
+	if (!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
+		xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
 
 		return -1;
 	}
@@ -252,10 +258,12 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		return -1;
 	}
 
-	xdl_free_classifier(&cf);
+	if (!(xpp->flags & XDF_HISTOGRAM_DIFF))
+		xdl_free_classifier(&cf);
 
-	if (!(xpp->flags & XDF_PATIENCE_DIFF) &&
-			xdl_optimize_ctxs(&xe->xdf1, &xe->xdf2) < 0) {
+	if ((xpp->flags & XDF_HISTOGRAM_DIFF) ?
+		xdl_trim_ends(&xe->xdf1, &xe->xdf2) < 0
+		: (!(xpp->flags & XDF_PATIENCE_DIFF) && xdl_optimize_ctxs(&xe->xdf1, &xe->xdf2) < 0)) {
 
 		xdl_free_ctx(&xe->xdf2);
 		xdl_free_ctx(&xe->xdf1);
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 8/8] xdiff/xprepare: use a smaller sample size for histogram diff
  2011-08-01  3:16               ` [PATCH v2 7/8] xdiff/xprepare: skip classification Tay Ray Chuan
@ 2011-08-01  3:16                 ` Tay Ray Chuan
  0 siblings, 0 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  3:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

For histogram diff, we can afford a smaller sample size and thus a
poorer estimate of the number of lines, as the hash table (rhash) won't
be filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xprepare.c |   17 ++++++++++++++---
 xdiff/xutils.c   |    8 ++------
 xdiff/xutils.h   |    2 +-
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index c139ba8..b4d8402 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -26,6 +26,8 @@
 #define XDL_KPDIS_RUN 4
 #define XDL_MAX_EQLIMIT 1024
 #define XDL_SIMSCAN_WINDOW 100
+#define XDL_GUESS_NLINES1 256
+#define XDL_GUESS_NLINES2 20
 
 
 typedef struct s_xdlclass {
@@ -234,11 +236,20 @@ static void xdl_free_ctx(xdfile_t *xdf) {
 
 int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		    xdfenv_t *xe) {
-	long enl1, enl2;
+	long enl1, enl2, sample;
 	xdlclassifier_t cf;
 
-	enl1 = xdl_guess_lines(mf1) + 1;
-	enl2 = xdl_guess_lines(mf2) + 1;
+	/*
+	 * For histogram diff, we can afford a smaller sample size and
+	 * thus a poorer estimate of the number of lines, as the hash
+	 * table (rhash) won't be filled up/grown. The number of lines
+	 * (nrecs) will be updated correctly anyway by
+	 * xdl_prepare_ctx().
+	 */
+	sample = xpp->flags & XDF_HISTOGRAM_DIFF ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1;
+
+	enl1 = xdl_guess_lines(mf1, sample) + 1;
+	enl2 = xdl_guess_lines(mf2, sample) + 1;
 
 	if (!(xpp->flags & XDF_HISTOGRAM_DIFF) &&
 		xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 890cc4f..0de084e 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -24,10 +24,6 @@
 
 
 
-#define XDL_GUESS_NLINES 256
-
-
-
 
 long xdl_bogosqrt(long n) {
 	long i;
@@ -153,12 +149,12 @@ void *xdl_cha_next(chastore_t *cha) {
 }
 
 
-long xdl_guess_lines(mmfile_t *mf) {
+long xdl_guess_lines(mmfile_t *mf, long sample) {
 	long nl = 0, size, tsize = 0;
 	char const *data, *cur, *top;
 
 	if ((cur = data = xdl_mmfile_first(mf, &size)) != NULL) {
-		for (top = data + size; nl < XDL_GUESS_NLINES && cur < top; ) {
+		for (top = data + size; nl < sample && cur < top; ) {
 			nl++;
 			if (!(cur = memchr(cur, '\n', top - cur)))
 				cur = top;
diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 674a657..714719a 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -33,7 +33,7 @@ void xdl_cha_free(chastore_t *cha);
 void *xdl_cha_alloc(chastore_t *cha);
 void *xdl_cha_first(chastore_t *cha);
 void *xdl_cha_next(chastore_t *cha);
-long xdl_guess_lines(mmfile_t *mf);
+long xdl_guess_lines(mmfile_t *mf, long sample);
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags);
 unsigned long xdl_hash_record(char const **data, char const *top, long flags);
 unsigned int xdl_hashbits(unsigned int size);
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 0/4] changes for rc/histogram-diff in 'next'
  2011-08-01  3:16 ` [PATCH v2 0/8] " Tay Ray Chuan
  2011-08-01  3:16   ` [PATCH v2 1/8] xdiff/xprepare: use memset() Tay Ray Chuan
@ 2011-08-01  4:20   ` Tay Ray Chuan
  2011-08-01  4:20     ` [PATCH 1/4] xdiff: do away with xdl_mmfile_next() Tay Ray Chuan
  1 sibling, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  4:20 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

This is a rebased version of this series:

  [PATCH v2 0/8] teach --histogram to diff
  <1312168608-10828-1-git-send-email-rctay89@gmail.com>

I had overlooked that the previous iteration of the series has been
added to 'next'.

Contents:
[PATCH 1/4] xdiff: do away with xdl_mmfile_next()
[PATCH 2/4] xdiff/xhistogram: rework handling of recursed results
[PATCH 3/4] xdiff/xhistogram: rely on xdl_trim_ends()
[PATCH 4/4] xdiff/xhistogram: drop need for additional variable

-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] xdiff: do away with xdl_mmfile_next()
  2011-08-01  4:20   ` [PATCH 0/4] changes for rc/histogram-diff in 'next' Tay Ray Chuan
@ 2011-08-01  4:20     ` Tay Ray Chuan
  2011-08-01  4:20       ` [PATCH 2/4] xdiff/xhistogram: rework handling of recursed results Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  4:20 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Given our simple mmfile structure, xdl_mmfile_next() calls are
redundant. Do away with calls to them.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xdiff.h    |    1 -
 xdiff/xprepare.c |    7 +------
 xdiff/xutils.c   |   14 +-------------
 3 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c26170c..4beb10c 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -106,7 +106,6 @@ typedef struct s_bdiffparam {
 #define xdl_realloc(ptr,x) realloc(ptr,x)
 
 void *xdl_mmfile_first(mmfile_t *mmf, long *size);
-void *xdl_mmfile_next(mmfile_t *mmf, long *size);
 long xdl_mmfile_size(mmfile_t *mmf);
 
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index dfbb0de..620fc9a 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -168,12 +168,7 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 
 	nrec = 0;
 	if ((cur = blk = xdl_mmfile_first(mf, &bsize)) != NULL) {
-		for (top = blk + bsize;;) {
-			if (cur >= top) {
-				if (!(cur = blk = xdl_mmfile_next(mf, &bsize)))
-					break;
-				top = blk + bsize;
-			}
+		for (top = blk + bsize; cur < top; ) {
 			prev = cur;
 			hav = xdl_hash_record(&cur, top, xpp->flags);
 			if (nrec >= narec) {
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index a45e89b..0de084e 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -67,12 +67,6 @@ void *xdl_mmfile_first(mmfile_t *mmf, long *size)
 }
 
 
-void *xdl_mmfile_next(mmfile_t *mmf, long *size)
-{
-	return NULL;
-}
-
-
 long xdl_mmfile_size(mmfile_t *mmf)
 {
 	return mmf->size;
@@ -160,13 +154,7 @@ long xdl_guess_lines(mmfile_t *mf, long sample) {
 	char const *data, *cur, *top;
 
 	if ((cur = data = xdl_mmfile_first(mf, &size)) != NULL) {
-		for (top = data + size; nl < sample;) {
-			if (cur >= top) {
-				tsize += (long) (cur - data);
-				if (!(cur = data = xdl_mmfile_next(mf, &size)))
-					break;
-				top = data + size;
-			}
+		for (top = data + size; nl < sample && cur < top; ) {
 			nl++;
 			if (!(cur = memchr(cur, '\n', top - cur)))
 				cur = top;
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] xdiff/xhistogram: rework handling of recursed results
  2011-08-01  4:20     ` [PATCH 1/4] xdiff: do away with xdl_mmfile_next() Tay Ray Chuan
@ 2011-08-01  4:20       ` Tay Ray Chuan
  2011-08-01  4:20         ` [PATCH 3/4] xdiff/xhistogram: rely on xdl_trim_ends() Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  4:20 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Previously we were over-complicating matters by trying to combine the
recursed results. Now, terminate immediately if a recursive call failed
and return its result.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xhistogram.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 391333a..9cb69ea 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -339,21 +339,22 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
 	if (find_lcs(&index, &lcs, line1, count1, line2, count2))
 		result = fall_back_to_classic_diff(&index, line1, count1, line2, count2);
 	else {
-		result = 0;
 		if (lcs.begin1 == 0 && lcs.begin2 == 0) {
 			int ptr;
 			for (ptr = 0; ptr < count1; ptr++)
 				env->xdf1.rchg[line1 + ptr - 1] = 1;
 			for (ptr = 0; ptr < count2; ptr++)
 				env->xdf2.rchg[line2 + ptr - 1] = 1;
+			result = 0;
 		} else {
-			result = histogram_diff(xpp, env,
+			if (result = histogram_diff(xpp, env,
 				line1, lcs.begin1 - line1,
-				line2, lcs.begin2 - line2);
-			result = histogram_diff(xpp, env,
+				line2, lcs.begin2 - line2))
+				goto cleanup;
+			if (result = histogram_diff(xpp, env,
 				lcs.end1 + 1, LINE_END(1) - lcs.end1,
-				lcs.end2 + 1, LINE_END(2) - lcs.end2);
-			result *= -1;
+				lcs.end2 + 1, LINE_END(2) - lcs.end2))
+				goto cleanup;
 		}
 	}
 
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] xdiff/xhistogram: rely on xdl_trim_ends()
  2011-08-01  4:20       ` [PATCH 2/4] xdiff/xhistogram: rework handling of recursed results Tay Ray Chuan
@ 2011-08-01  4:20         ` Tay Ray Chuan
  2011-08-01  4:20           ` [PATCH 4/4] xdiff/xhistogram: drop need for additional variable Tay Ray Chuan
  0 siblings, 1 reply; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  4:20 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend
set by xdl_trim_ends() that similarly tells us where the first unmatched
line from the start and end occurs.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---

On Wed, Jul 13, 2011 at 3:56 AM, Junio C Hamano <gitster@pobox.com> wrote:
>[snip]
>> +     reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
>
> What this does is logically not specific to histogram algorithm but can be
> applied to other backends, no? And I vaguely recall that Linus did try
> something like this once, found some issues with it when context is set to
> non zero, and stopped doing it (sorry, I do not have any more details).
>
> I am not suggesting you to remove this call or hoist the call to one level
> up to xdl_do_diff(), but I do have to wonder how much of the performance
> improvement you reported is due to this common head/tail reduction.

In a way, this patch is a response to Junio's email. It just made sense
to use existing functionality in xdiff (xdl_trim_ends() and xdf->dstart
and dend) over writing a new one (reduce_common_start_end()).

I believe Junio was referring to this patch by Linus:

  https://lkml.org/lkml/2007/12/20/692

That is an optimization on a more aggressive level - cutting out
content so that it doesn't get hashed in the first place. The
optimization used here (reduce_common_start_end()/xdl_trim_ends())
depends on the hashed result and simply reduces the "area" on which the
algorithm is applied to.

(Actually, I do have a working patch that does content trimming that is
context-length safe. But it's not specific to histogram so I'll keep it
with me till this series gets merged, lest it holds up the series.)

 xdiff/xhistogram.c |   31 ++++---------------------------
 1 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 9cb69ea..804e19b 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -102,7 +102,7 @@ static int cmp_recs(xpparam_t const *xpp,
 	(cmp_recs(xpp, REC(env, s1, l1), REC(env, s2, l2)))
 
 #define CMP(i, s1, l1, s2, l2) \
-	(CMP_ENV(i->xpp, i->env, s1, l1, s2, l2))
+	(cmp_recs(i->xpp, REC(i->env, s1, l1), REC(i->env, s2, l2)))
 
 #define TABLE_HASH(index, side, line) \
 	XDL_HASHLONG((REC(index->env, side, line))->ha, index->table_bits)
@@ -248,23 +248,6 @@ static int find_lcs(struct histindex *index, struct region *lcs,
 	return index->has_common && index->max_chain_length < index->cnt;
 }
 
-static void reduce_common_start_end(xpparam_t const *xpp, xdfenv_t *env,
-	int *line1, int *count1, int *line2, int *count2)
-{
-	if (*count1 <= 1 || *count2 <= 1)
-		return;
-	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, *line1, 2, *line2)) {
-		(*line1)++;
-		(*count1)--;
-		(*line2)++;
-		(*count2)--;
-	}
-	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, LINE_END_PTR(1), 2, LINE_END_PTR(2))) {
-		(*count1)--;
-		(*count2)--;
-	}
-}
-
 static int fall_back_to_classic_diff(struct histindex *index,
 		int line1, int count1, int line2, int count2)
 {
@@ -370,16 +353,10 @@ cleanup:
 int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
 	xpparam_t const *xpp, xdfenv_t *env)
 {
-	int line1, line2, count1, count2;
-
 	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
 		return -1;
 
-	line1 = line2 = 1;
-	count1 = env->xdf1.nrec;
-	count2 = env->xdf2.nrec;
-
-	reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
-
-	return histogram_diff(xpp, env, line1, count1, line2, count2);
+	return histogram_diff(xpp, env,
+		env->xdf1.dstart + 1, env->xdf1.dend - env->xdf1.dstart + 1,
+		env->xdf2.dstart + 1, env->xdf2.dend - env->xdf2.dstart + 1);
 }
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] xdiff/xhistogram: drop need for additional variable
  2011-08-01  4:20         ` [PATCH 3/4] xdiff/xhistogram: rely on xdl_trim_ends() Tay Ray Chuan
@ 2011-08-01  4:20           ` Tay Ray Chuan
  0 siblings, 0 replies; 24+ messages in thread
From: Tay Ray Chuan @ 2011-08-01  4:20 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

Having an additional variable (ptr) instead of changing line(1|2) and
count(1|2) was for debugging purposes.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 xdiff/xhistogram.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 804e19b..3201fc5 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -323,11 +323,10 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
 		result = fall_back_to_classic_diff(&index, line1, count1, line2, count2);
 	else {
 		if (lcs.begin1 == 0 && lcs.begin2 == 0) {
-			int ptr;
-			for (ptr = 0; ptr < count1; ptr++)
-				env->xdf1.rchg[line1 + ptr - 1] = 1;
-			for (ptr = 0; ptr < count2; ptr++)
-				env->xdf2.rchg[line2 + ptr - 1] = 1;
+			while (count1--)
+				env->xdf1.rchg[line1++ - 1] = 1;
+			while (count2--)
+				env->xdf2.rchg[line2++ - 1] = 1;
 			result = 0;
 		} else {
 			if (result = histogram_diff(xpp, env,
-- 
1.7.3.4.730.g67af1.dirty

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2011-08-01  4:20 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-12  6:10 [RFC/PATCH 0/3] teach --histogram to diff Tay Ray Chuan
2011-07-12  6:10 ` [RFC/PATCH 1/3] " Tay Ray Chuan
2011-07-12  6:10   ` [RFC/PATCH 2/3] xdiff/xprepare: skip classification Tay Ray Chuan
2011-07-12  6:10     ` [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram diff Tay Ray Chuan
2011-07-12 19:56   ` [RFC/PATCH 1/3] teach --histogram to diff Junio C Hamano
2011-07-13 16:36     ` Tay Ray Chuan
2011-07-12 14:19 ` [RFC/PATCH 0/3] " Shawn Pearce
2011-07-12 17:43   ` Junio C Hamano
2011-07-13 16:35     ` Tay Ray Chuan
2011-07-13 16:34   ` Tay Ray Chuan
2011-08-01  3:16 ` [PATCH v2 0/8] " Tay Ray Chuan
2011-08-01  3:16   ` [PATCH v2 1/8] xdiff/xprepare: use memset() Tay Ray Chuan
2011-08-01  3:16     ` [PATCH v2 2/8] do away with xdl_mmfile_next() Tay Ray Chuan
2011-08-01  3:16       ` [PATCH v2 3/8] xdiff/xprepare: refactor abort cleanups Tay Ray Chuan
2011-08-01  3:16         ` [PATCH v2 4/8] xdiff/xpatience: factor out fall-back-diff function Tay Ray Chuan
2011-08-01  3:16           ` [PATCH v2 5/8] t4033-diff-patience: factor out tests Tay Ray Chuan
2011-08-01  3:16             ` [PATCH v2 6/8] teach --histogram to diff Tay Ray Chuan
2011-08-01  3:16               ` [PATCH v2 7/8] xdiff/xprepare: skip classification Tay Ray Chuan
2011-08-01  3:16                 ` [PATCH v2 8/8] xdiff/xprepare: use a smaller sample size for histogram diff Tay Ray Chuan
2011-08-01  4:20   ` [PATCH 0/4] changes for rc/histogram-diff in 'next' Tay Ray Chuan
2011-08-01  4:20     ` [PATCH 1/4] xdiff: do away with xdl_mmfile_next() Tay Ray Chuan
2011-08-01  4:20       ` [PATCH 2/4] xdiff/xhistogram: rework handling of recursed results Tay Ray Chuan
2011-08-01  4:20         ` [PATCH 3/4] xdiff/xhistogram: rely on xdl_trim_ends() Tay Ray Chuan
2011-08-01  4:20           ` [PATCH 4/4] xdiff/xhistogram: drop need for additional variable Tay Ray Chuan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).