git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] A couple of small patience diff cleanups
@ 2021-05-04  9:25 Phillip Wood via GitGitGadget
  2021-05-04  9:25 ` [PATCH 1/2] patience diff: remove unnecessary string comparisons Phillip Wood via GitGitGadget
  2021-05-04  9:25 ` [PATCH 2/2] patience diff: remove unused variable Phillip Wood via GitGitGadget
  0 siblings, 2 replies; 8+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-05-04  9:25 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Phillip Wood

Remove some unnecessary string comparisons and an unused variable

Phillip Wood (2):
  patience diff: remove unnecessary string comparisons
  patience diff: remove unused variable

 xdiff/xpatience.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)


base-commit: 7e391989789db82983665667013a46eabc6fc570
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-948%2Fphillipwood%2Fwip%2Fpatience-diff-cleanup-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-948/phillipwood/wip/patience-diff-cleanup-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/948
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] patience diff: remove unnecessary string comparisons
  2021-05-04  9:25 [PATCH 0/2] A couple of small patience diff cleanups Phillip Wood via GitGitGadget
@ 2021-05-04  9:25 ` Phillip Wood via GitGitGadget
  2021-05-05  0:31   ` Junio C Hamano
  2021-05-04  9:25 ` [PATCH 2/2] patience diff: remove unused variable Phillip Wood via GitGitGadget
  1 sibling, 1 reply; 8+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-05-04  9:25 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

xdl_prepare_env() calls xdl_classify_record() which arranges for the
hashes of non-matching lines to be different so lines can be tested
for equality by comparing just their hashes.

This reduces the time taken to calculate the diff of v2.28.0 to
v2.29.0 by ~3-4%.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xpatience.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index 20699a6f6054..db2d53e89cb0 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -90,7 +90,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
 {
 	xrecord_t **records = pass == 1 ?
 		map->env->xdf1.recs : map->env->xdf2.recs;
-	xrecord_t *record = records[line - 1], *other;
+	xrecord_t *record = records[line - 1];
 	/*
 	 * After xdl_prepare_env() (or more precisely, due to
 	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
@@ -104,11 +104,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
 	int index = (int)((record->ha << 1) % map->alloc);
 
 	while (map->entries[index].line1) {
-		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
-		if (map->entries[index].hash != record->ha ||
-				!xdl_recmatch(record->ptr, record->size,
-					other->ptr, other->size,
-					map->xpp->flags)) {
+		if (map->entries[index].hash != record->ha) {
 			if (++index >= map->alloc)
 				index = 0;
 			continue;
@@ -253,8 +249,7 @@ static int match(struct hashmap *map, int line1, int line2)
 {
 	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
 	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
-	return xdl_recmatch(record1->ptr, record1->size,
-		record2->ptr, record2->size, map->xpp->flags);
+	return record1->ha == record2->ha;
 }
 
 static int patience_diff(mmfile_t *file1, mmfile_t *file2,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] patience diff: remove unused variable
  2021-05-04  9:25 [PATCH 0/2] A couple of small patience diff cleanups Phillip Wood via GitGitGadget
  2021-05-04  9:25 ` [PATCH 1/2] patience diff: remove unnecessary string comparisons Phillip Wood via GitGitGadget
@ 2021-05-04  9:25 ` Phillip Wood via GitGitGadget
  1 sibling, 0 replies; 8+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-05-04  9:25 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xpatience.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index db2d53e89cb0..c5d48e80aefb 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -284,9 +284,6 @@ static int walk_common_sequence(struct hashmap *map, struct entry *first,
 
 		/* Recurse */
 		if (next1 > line1 || next2 > line2) {
-			struct hashmap submap;
-
-			memset(&submap, 0, sizeof(submap));
 			if (patience_diff(map->file1, map->file2,
 					map->xpp, map->env,
 					line1, next1 - line1,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] patience diff: remove unnecessary string comparisons
  2021-05-04  9:25 ` [PATCH 1/2] patience diff: remove unnecessary string comparisons Phillip Wood via GitGitGadget
@ 2021-05-05  0:31   ` Junio C Hamano
  2021-05-05  9:34     ` Phillip Wood
  2021-05-05 14:58     ` Johannes Schindelin
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2021-05-05  0:31 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Johannes Schindelin, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> xdl_prepare_env() calls xdl_classify_record() which arranges for the
> hashes of non-matching lines to be different so lines can be tested
> for equality by comparing just their hashes.

Hmph, that is a bit different from what I read from the comment in
the post context of the first hunk, though.

	/*
	 * After xdl_prepare_env() (or more precisely, due to
	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
	 * is _not_ the hash anymore, but a linearized version of it.  In
	 * other words, the "ha" member is guaranteed to start with 0 and
	 * the second record's ha can only be 0 or 1, etc.
	 *
	 * So we multiply ha by 2 in the hope that the hashing was
	 * "unique enough".
	 */

The words "home" and "enough" hints to me that the "ha" member is
not hash, but "lineralized version of it" (whatever it means) does
not guarantee that two records with the same "ha" are identical, or
does it?

Well, I should just go read xdl_classify_record() to see what it
really does, but if it eliminates collisions, then the patch is a
clear and obvious improvement.

Thanks.

> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index 20699a6f6054..db2d53e89cb0 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -90,7 +90,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>  {
>  	xrecord_t **records = pass == 1 ?
>  		map->env->xdf1.recs : map->env->xdf2.recs;
> -	xrecord_t *record = records[line - 1], *other;
> +	xrecord_t *record = records[line - 1];
>  	/*
>  	 * After xdl_prepare_env() (or more precisely, due to
>  	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
> @@ -104,11 +104,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>  	int index = (int)((record->ha << 1) % map->alloc);
>  
>  	while (map->entries[index].line1) {
> -		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
> -		if (map->entries[index].hash != record->ha ||
> -				!xdl_recmatch(record->ptr, record->size,
> -					other->ptr, other->size,
> -					map->xpp->flags)) {
> +		if (map->entries[index].hash != record->ha) {
>  			if (++index >= map->alloc)
>  				index = 0;
>  			continue;
> @@ -253,8 +249,7 @@ static int match(struct hashmap *map, int line1, int line2)
>  {
>  	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
>  	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
> -	return xdl_recmatch(record1->ptr, record1->size,
> -		record2->ptr, record2->size, map->xpp->flags);
> +	return record1->ha == record2->ha;
>  }
>  
>  static int patience_diff(mmfile_t *file1, mmfile_t *file2,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] patience diff: remove unnecessary string comparisons
  2021-05-05  0:31   ` Junio C Hamano
@ 2021-05-05  9:34     ` Phillip Wood
  2021-05-05 14:58     ` Johannes Schindelin
  1 sibling, 0 replies; 8+ messages in thread
From: Phillip Wood @ 2021-05-05  9:34 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, Phillip Wood

On 05/05/2021 01:31, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> xdl_prepare_env() calls xdl_classify_record() which arranges for the
>> hashes of non-matching lines to be different so lines can be tested
>> for equality by comparing just their hashes.
> 
> Hmph, that is a bit different from what I read from the comment in
> the post context of the first hunk, though.
> 
> 	/*
> 	 * After xdl_prepare_env() (or more precisely, due to
> 	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
> 	 * is _not_ the hash anymore, but a linearized version of it.  In
> 	 * other words, the "ha" member is guaranteed to start with 0 and
> 	 * the second record's ha can only be 0 or 1, etc.
> 	 *
> 	 * So we multiply ha by 2 in the hope that the hashing was
> 	 * "unique enough".
> 	 */
> 
> The words "home" and "enough" hints to me that the "ha" member is
> not hash, but "lineralized version of it" (whatever it means) does
> not guarantee that two records with the same "ha" are identical, or
> does it?

By "hashes" I meant "the value of record->ha". That comment is a bit 
confusing. I think "linearized version of it" is referring to 
xdl_classify_record() assigning a unique integer to each unique input 
line starting from zero and increasing by one for each unique input line 
(the function is fairly easy to follow). I assume "unique enough" is 
referring to the line below the comment which takes the modulus of 
record->ha and record->ha is not evenly distributed over the whole 
integer range but bunched at the lower end.

The Myers implementation calls xdl_classify_record() and then only ever 
compares record->ha, it does not call xdl_recmatch() while computing the 
diff.

> Well, I should just go read xdl_classify_record() to see what it
> really does, but if it eliminates collisions, then the patch is a
> clear and obvious improvement.

Thanks

Phillip


> Thanks.
> 
>> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
>> index 20699a6f6054..db2d53e89cb0 100644
>> --- a/xdiff/xpatience.c
>> +++ b/xdiff/xpatience.c
>> @@ -90,7 +90,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>   {
>>   	xrecord_t **records = pass == 1 ?
>>   		map->env->xdf1.recs : map->env->xdf2.recs;
>> -	xrecord_t *record = records[line - 1], *other;
>> +	xrecord_t *record = records[line - 1];
>>   	/*
>>   	 * After xdl_prepare_env() (or more precisely, due to
>>   	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
>> @@ -104,11 +104,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>   	int index = (int)((record->ha << 1) % map->alloc);
>>   
>>   	while (map->entries[index].line1) {
>> -		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
>> -		if (map->entries[index].hash != record->ha ||
>> -				!xdl_recmatch(record->ptr, record->size,
>> -					other->ptr, other->size,
>> -					map->xpp->flags)) {
>> +		if (map->entries[index].hash != record->ha) {
>>   			if (++index >= map->alloc)
>>   				index = 0;
>>   			continue;
>> @@ -253,8 +249,7 @@ static int match(struct hashmap *map, int line1, int line2)
>>   {
>>   	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
>>   	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
>> -	return xdl_recmatch(record1->ptr, record1->size,
>> -		record2->ptr, record2->size, map->xpp->flags);
>> +	return record1->ha == record2->ha;
>>   }
>>   
>>   static int patience_diff(mmfile_t *file1, mmfile_t *file2,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] patience diff: remove unnecessary string comparisons
  2021-05-05  0:31   ` Junio C Hamano
  2021-05-05  9:34     ` Phillip Wood
@ 2021-05-05 14:58     ` Johannes Schindelin
  2021-05-05 18:00       ` Phillip Wood
  2021-05-06  1:32       ` Junio C Hamano
  1 sibling, 2 replies; 8+ messages in thread
From: Johannes Schindelin @ 2021-05-05 14:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood via GitGitGadget, git, Phillip Wood

Hi Junio,

On Wed, 5 May 2021, Junio C Hamano wrote:

> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Phillip Wood <phillip.wood@dunelm.org.uk>
> >
> > xdl_prepare_env() calls xdl_classify_record() which arranges for the
> > hashes of non-matching lines to be different so lines can be tested
> > for equality by comparing just their hashes.
>
> Hmph, that is a bit different from what I read from the comment in
> the post context of the first hunk, though.
>
> 	/*
> 	 * After xdl_prepare_env() (or more precisely, due to
> 	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
> 	 * is _not_ the hash anymore, but a linearized version of it.  In
> 	 * other words, the "ha" member is guaranteed to start with 0 and
> 	 * the second record's ha can only be 0 or 1, etc.
> 	 *
> 	 * So we multiply ha by 2 in the hope that the hashing was
> 	 * "unique enough".
> 	 */
>
> The words "home" and "enough" hints to me that the "ha" member is
> not hash, but "lineralized version of it" (whatever it means) does
> not guarantee that two records with the same "ha" are identical, or
> does it?
>
> Well, I should just go read xdl_classify_record() to see what it
> really does, but if it eliminates collisions, then the patch is a
> clear and obvious improvement.

Right. I had the same concern. But it does look as if
`xdl_classify_record()` replaced the possibly non-unique hash values to
unique sequential identifiers.

I have to admit that the code is unnecessarily hard to read for me:
https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L110-L157

But I do gather that the loop at
https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L119-L123
is called for every line, that it does compare it to every seen line with
the same hash, and that it exits the loop early if the contents disagree:

	for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
		if (rcrec->ha == rec->ha &&
				xdl_recmatch(rcrec->line, rcrec->size,
					rec->ptr, rec->size, cf->flags))
			break;

Since naming is hard (and you can easily err on saving space at the
expense of costing readers' time, as libxdiff proves), and since I am
running out of review time, I'll have to assume that
https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L150-L154 means
that indeed, the `ha` field is set to a counter that uniquely identifies
the line contents:

	rec->ha = (unsigned long) rcrec->idx;


	hi = (long) XDL_HASHLONG(rec->ha, hbits);
	rec->next = rhash[hi];
	rhash[hi] = rec;

So I am fairly confident that the patch is good, and the performance win
is nice.

Thanks!
Dscho

>
> Thanks.
>
> > diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> > index 20699a6f6054..db2d53e89cb0 100644
> > --- a/xdiff/xpatience.c
> > +++ b/xdiff/xpatience.c
> > @@ -90,7 +90,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
> >  {
> >  	xrecord_t **records = pass == 1 ?
> >  		map->env->xdf1.recs : map->env->xdf2.recs;
> > -	xrecord_t *record = records[line - 1], *other;
> > +	xrecord_t *record = records[line - 1];
> >  	/*
> >  	 * After xdl_prepare_env() (or more precisely, due to
> >  	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
> > @@ -104,11 +104,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
> >  	int index = (int)((record->ha << 1) % map->alloc);
> >
> >  	while (map->entries[index].line1) {
> > -		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
> > -		if (map->entries[index].hash != record->ha ||
> > -				!xdl_recmatch(record->ptr, record->size,
> > -					other->ptr, other->size,
> > -					map->xpp->flags)) {
> > +		if (map->entries[index].hash != record->ha) {
> >  			if (++index >= map->alloc)
> >  				index = 0;
> >  			continue;
> > @@ -253,8 +249,7 @@ static int match(struct hashmap *map, int line1, int line2)
> >  {
> >  	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
> >  	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
> > -	return xdl_recmatch(record1->ptr, record1->size,
> > -		record2->ptr, record2->size, map->xpp->flags);
> > +	return record1->ha == record2->ha;
> >  }
> >
> >  static int patience_diff(mmfile_t *file1, mmfile_t *file2,
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] patience diff: remove unnecessary string comparisons
  2021-05-05 14:58     ` Johannes Schindelin
@ 2021-05-05 18:00       ` Phillip Wood
  2021-05-06  1:32       ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Phillip Wood @ 2021-05-05 18:00 UTC (permalink / raw)
  To: Johannes Schindelin, Junio C Hamano
  Cc: Phillip Wood via GitGitGadget, git, Phillip Wood

Hi Dscho

On 05/05/2021 15:58, Johannes Schindelin wrote:
> Hi Junio,
> 
> On Wed, 5 May 2021, Junio C Hamano wrote:
> 
>> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> xdl_prepare_env() calls xdl_classify_record() which arranges for the
>>> hashes of non-matching lines to be different so lines can be tested
>>> for equality by comparing just their hashes.
>>
>> Hmph, that is a bit different from what I read from the comment in
>> the post context of the first hunk, though.
>>
>> 	/*
>> 	 * After xdl_prepare_env() (or more precisely, due to
>> 	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
>> 	 * is _not_ the hash anymore, but a linearized version of it.  In
>> 	 * other words, the "ha" member is guaranteed to start with 0 and
>> 	 * the second record's ha can only be 0 or 1, etc.
>> 	 *
>> 	 * So we multiply ha by 2 in the hope that the hashing was
>> 	 * "unique enough".
>> 	 */
>>
>> The words "home" and "enough" hints to me that the "ha" member is
>> not hash, but "lineralized version of it" (whatever it means) does
>> not guarantee that two records with the same "ha" are identical, or
>> does it?
>>
>> Well, I should just go read xdl_classify_record() to see what it
>> really does, but if it eliminates collisions, then the patch is a
>> clear and obvious improvement.
> 
> Right. I had the same concern. But it does look as if
> `xdl_classify_record()` replaced the possibly non-unique hash values to
> unique sequential identifiers.
> 
> I have to admit that the code is unnecessarily hard to read for me:
> https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L110-L157
> 
> But I do gather that the loop at
> https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L119-L123
> is called for every line, that it does compare it to every seen line with
> the same hash, and that it exits the loop early if the contents disagree:
> 
> 	for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
> 		if (rcrec->ha == rec->ha &&
> 				xdl_recmatch(rcrec->line, rcrec->size,
> 					rec->ptr, rec->size, cf->flags))
> 			break;
> 
> Since naming is hard (and you can easily err on saving space at the
> expense of costing readers' time, as libxdiff proves), and since I am
> running out of review time, I'll have to assume that
> https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L150-L154 means
> that indeed, the `ha` field is set to a counter that uniquely identifies
> the line contents:
> 
> 	rec->ha = (unsigned long) rcrec->idx;
> 
> 
> 	hi = (long) XDL_HASHLONG(rec->ha, hbits);
> 	rec->next = rhash[hi];
> 	rhash[hi] = rec;
> 
> So I am fairly confident that the patch is good, and the performance win
> is nice.

Thanks for taking the time to review this patch, I agree with your 
analysis. The output of `git log -p --diff-algorithm=patience 
origin/master` for the whole history of git.git is unchanged by this patch.

Best Wishes

Phillip

> 
> Thanks!
> Dscho
> 
>>
>> Thanks.
>>
>>> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
>>> index 20699a6f6054..db2d53e89cb0 100644
>>> --- a/xdiff/xpatience.c
>>> +++ b/xdiff/xpatience.c
>>> @@ -90,7 +90,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>>   {
>>>   	xrecord_t **records = pass == 1 ?
>>>   		map->env->xdf1.recs : map->env->xdf2.recs;
>>> -	xrecord_t *record = records[line - 1], *other;
>>> +	xrecord_t *record = records[line - 1];
>>>   	/*
>>>   	 * After xdl_prepare_env() (or more precisely, due to
>>>   	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
>>> @@ -104,11 +104,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>>   	int index = (int)((record->ha << 1) % map->alloc);
>>>
>>>   	while (map->entries[index].line1) {
>>> -		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
>>> -		if (map->entries[index].hash != record->ha ||
>>> -				!xdl_recmatch(record->ptr, record->size,
>>> -					other->ptr, other->size,
>>> -					map->xpp->flags)) {
>>> +		if (map->entries[index].hash != record->ha) {
>>>   			if (++index >= map->alloc)
>>>   				index = 0;
>>>   			continue;
>>> @@ -253,8 +249,7 @@ static int match(struct hashmap *map, int line1, int line2)
>>>   {
>>>   	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
>>>   	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
>>> -	return xdl_recmatch(record1->ptr, record1->size,
>>> -		record2->ptr, record2->size, map->xpp->flags);
>>> +	return record1->ha == record2->ha;
>>>   }
>>>
>>>   static int patience_diff(mmfile_t *file1, mmfile_t *file2,
>>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] patience diff: remove unnecessary string comparisons
  2021-05-05 14:58     ` Johannes Schindelin
  2021-05-05 18:00       ` Phillip Wood
@ 2021-05-06  1:32       ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2021-05-06  1:32 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Phillip Wood via GitGitGadget, git, Phillip Wood

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Right. I had the same concern. But it does look as if
> `xdl_classify_record()` replaced the possibly non-unique hash values to
> unique sequential identifiers.
>
> I have to admit that the code is unnecessarily hard to read for me:
> https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L110-L157
>
> But I do gather that the loop at
> https://github.com/git/git/blob/v2.31.1/xdiff/xprepare.c#L119-L123
> is called for every line, that it does compare it to every seen line with
> the same hash, and that it exits the loop early if the contents disagree:
>
> 	for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
> 		if (rcrec->ha == rec->ha &&
> 				xdl_recmatch(rcrec->line, rcrec->size,
> 					rec->ptr, rec->size, cf->flags))
> 			break;

Yeah, I arrived at the same conclusion.  Also as Phillip said in a
separate message, Myers side already takes advantage of this same
fact, so I am fine with this change.

Thanks, both.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-05-06  1:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04  9:25 [PATCH 0/2] A couple of small patience diff cleanups Phillip Wood via GitGitGadget
2021-05-04  9:25 ` [PATCH 1/2] patience diff: remove unnecessary string comparisons Phillip Wood via GitGitGadget
2021-05-05  0:31   ` Junio C Hamano
2021-05-05  9:34     ` Phillip Wood
2021-05-05 14:58     ` Johannes Schindelin
2021-05-05 18:00       ` Phillip Wood
2021-05-06  1:32       ` Junio C Hamano
2021-05-04  9:25 ` [PATCH 2/2] patience diff: remove unused variable Phillip Wood via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).