git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] update internal patch-id to use "stable" algorithm
@ 2022-09-20  5:58 Jerry Zhang via GitGitGadget
  2022-09-20  5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-09-20  5:58 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang

Internal usage of patch-id in rebase / cherry-pick doesn't persist
patch-ids, so there's no need to specifically invoke the unstable variant.

This allows the unstable logic to be cleaned up.

In the process, fixed a bug in the combination of --stable with binary files
and header-only, and expanded the test to cover both binary and non-binary
files.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (2):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases

 builtin/log.c              |  2 +-
 diff.c                     | 44 ++++++++++++++++----------------------
 diff.h                     |  2 +-
 patch-ids.c                | 10 ++++-----
 patch-ids.h                |  2 +-
 t/t3419-rebase-patch-id.sh | 19 ++++++++++------
 6 files changed, 39 insertions(+), 40 deletions(-)


base-commit: e188ec3a735ae52a0d0d3c22f9df6b29fa613b1e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1359
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/2] patch-id: fix stable patch id for binary / header-only
  2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
@ 2022-09-20  5:58 ` Jerry Zhang via GitGitGadget
  2022-09-20  5:58 ` [PATCH 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-09-20  5:58 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Previous logic here skipped flushing the hunks for binary
and header-only patch ids, which would always result in a
patch-id of 0000.

Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.

Update the test to run on both binary and normal files.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 diff.c                     | 32 ++++++++++++++------------------
 t/t3419-rebase-patch-id.sh | 19 +++++++++++++------
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/diff.c b/diff.c
index dd68281ba44..2f8f0c2e4f4 100644
--- a/diff.c
+++ b/diff.c
@@ -6248,30 +6248,26 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		}
 
-		if (diff_header_only)
-			continue;
-
-		if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
-		    fill_mmfile(options->repo, &mf2, p->two) < 0)
-			return error("unable to read files to diff");
-
-		if (diff_filespec_is_binary(options->repo, p->one) ||
+		if (diff_header_only) {
+			// Don't do anything since we're only populating header info
+		} else if (diff_filespec_is_binary(options->repo, p->one) ||
 		    diff_filespec_is_binary(options->repo, p->two)) {
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid),
 					the_hash_algo->hexsz);
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid),
 					the_hash_algo->hexsz);
-			continue;
+		} else {
+			if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
+			    fill_mmfile(options->repo, &mf2, p->two) < 0)
+				return error("unable to read files to diff");
+			xpp.flags = 0;
+			xecfg.ctxlen = 3;
+			xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+			if (xdi_diff_outf(&mf1, &mf2, NULL,
+					  patch_id_consume, &data, &xpp, &xecfg))
+				return error("unable to generate patch-id diff for %s",
+					     p->one->path);
 		}
-
-		xpp.flags = 0;
-		xecfg.ctxlen = 3;
-		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
-		if (xdi_diff_outf(&mf1, &mf2, NULL,
-				  patch_id_consume, &data, &xpp, &xecfg))
-			return error("unable to generate patch-id diff for %s",
-				     p->one->path);
-
 		if (stable)
 			flush_one_hunk(oid, &ctx);
 	}
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index 295040f2fe3..f7b7e9e5b7c 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -46,10 +46,6 @@ test_expect_success 'setup: 500 lines' '
 	git cherry-pick main >/dev/null 2>&1
 '
 
-test_expect_success 'setup attributes' '
-	echo "file binary" >.gitattributes
-'
-
 test_expect_success 'detect upstream patch' '
 	git checkout -q main &&
 	scramble file &&
@@ -58,7 +54,13 @@ test_expect_success 'detect upstream patch' '
 	git checkout -q other^{} &&
 	git rebase main &&
 	git rev-list main...HEAD~ >revs &&
-	test_must_be_empty revs
+	test_must_be_empty revs &&
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	git rebase main &&
+	git rev-list main...HEAD~ >revs &&
+	test_must_be_empty revs &&
+	rm .gitattributes
 '
 
 test_expect_success 'do not drop patch' '
@@ -68,7 +70,12 @@ test_expect_success 'do not drop patch' '
 	git commit -q -m squashed &&
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
-	git rebase --quit
+	git rebase --abort &&
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	test_must_fail git rebase squashed &&
+	git rebase --abort &&
+	rm .gitattributes
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2] patch-id: use stable patch-id for rebases
  2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
  2022-09-20  5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
@ 2022-09-20  5:58 ` Jerry Zhang via GitGitGadget
  2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
  2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
  3 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-09-20  5:58 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant.

This allows the legacy unstable id logic to be cleaned up.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 builtin/log.c |  2 +-
 diff.c        | 12 ++++--------
 diff.h        |  2 +-
 patch-ids.c   | 10 +++++-----
 patch-ids.h   |  2 +-
 5 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 047f9e5278d..3bb49fd7406 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1762,7 +1762,7 @@ static void prepare_bases(struct base_tree_info *bases,
 		struct object_id *patch_id;
 		if (*commit_base_at(&commit_base, commit))
 			continue;
-		if (commit_patch_id(commit, &diffopt, &oid, 0, 1))
+		if (commit_patch_id(commit, &diffopt, &oid, 0))
 			die(_("cannot get patch id"));
 		ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, bases->alloc_patch_id);
 		patch_id = bases->patch_id + bases->nr_patch_id;
diff --git a/diff.c b/diff.c
index 2f8f0c2e4f4..8c46531cd44 100644
--- a/diff.c
+++ b/diff.c
@@ -6185,7 +6185,7 @@ static void patch_id_add_mode(git_hash_ctx *ctx, unsigned mode)
 }
 
 /* returns 0 upon success, and writes result into oid */
-static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
@@ -6268,21 +6268,17 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 				return error("unable to generate patch-id diff for %s",
 					     p->one->path);
 		}
-		if (stable)
-			flush_one_hunk(oid, &ctx);
+		flush_one_hunk(oid, &ctx);
 	}
 
-	if (!stable)
-		the_hash_algo->final_oid_fn(oid, &ctx);
-
 	return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
-	int result = diff_get_patch_id(options, oid, diff_header_only, stable);
+	int result = diff_get_patch_id(options, oid, diff_header_only);
 
 	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..fd33caeb25d 100644
--- a/diff.h
+++ b/diff.h
@@ -634,7 +634,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option);
 int run_diff_index(struct rev_info *revs, unsigned int option);
 
 int do_diff_cache(const struct object_id *, struct diff_options *);
-int diff_flush_patch_id(struct diff_options *, struct object_id *, int, int);
+int diff_flush_patch_id(struct diff_options *, struct object_id *, int);
 void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx);
 
 int diff_result_code(struct diff_options *, int);
diff --git a/patch-ids.c b/patch-ids.c
index 8bf425555de..fefddc487e9 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -11,7 +11,7 @@ static int patch_id_defined(struct commit *commit)
 }
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int diff_header_only, int stable)
+		    struct object_id *oid, int diff_header_only)
 {
 	if (!patch_id_defined(commit))
 		return -1;
@@ -22,7 +22,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
 	else
 		diff_root_tree_oid(&commit->object.oid, "", options);
 	diffcore_std(options);
-	return diff_flush_patch_id(options, oid, diff_header_only, stable);
+	return diff_flush_patch_id(options, oid, diff_header_only);
 }
 
 /*
@@ -48,11 +48,11 @@ static int patch_id_neq(const void *cmpfn_data,
 	b = container_of(entry_or_key, struct patch_id, ent);
 
 	if (is_null_oid(&a->patch_id) &&
-	    commit_patch_id(a->commit, opt, &a->patch_id, 0, 0))
+	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&a->commit->object.oid));
 	if (is_null_oid(&b->patch_id) &&
-	    commit_patch_id(b->commit, opt, &b->patch_id, 0, 0))
+	    commit_patch_id(b->commit, opt, &b->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&b->commit->object.oid));
 	return !oideq(&a->patch_id, &b->patch_id);
@@ -82,7 +82,7 @@ static int init_patch_id_entry(struct patch_id *patch,
 	struct object_id header_only_patch_id;
 
 	patch->commit = commit;
-	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1, 0))
+	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1))
 		return -1;
 
 	hashmap_entry_init(&patch->ent, oidhash(&header_only_patch_id));
diff --git a/patch-ids.h b/patch-ids.h
index ab6c6a68047..490d7393716 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -20,7 +20,7 @@ struct patch_ids {
 };
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int, int);
+		    struct object_id *oid, int);
 int init_patch_ids(struct repository *, struct patch_ids *);
 int free_patch_ids(struct patch_ids *);
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 0/2] update internal patch-id to use "stable" algorithm
  2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
  2022-09-20  5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
  2022-09-20  5:58 ` [PATCH 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
@ 2022-09-20  6:20 ` Jerry Zhang via GitGitGadget
  2022-09-20  6:20   ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
                     ` (2 more replies)
  2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
  3 siblings, 3 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-09-20  6:20 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang

Internal usage of patch-id in rebase / cherry-pick doesn't persist
patch-ids, so there's no need to specifically invoke the unstable variant.

This allows the unstable logic to be cleaned up.

In the process, fixed a bug in the combination of --stable with binary files
and header-only, and expanded the test to cover both binary and non-binary
files.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (2):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases

 builtin/log.c              |  2 +-
 diff.c                     | 44 ++++++++++++++++----------------------
 diff.h                     |  2 +-
 patch-ids.c                | 10 ++++-----
 patch-ids.h                |  2 +-
 t/t3419-rebase-patch-id.sh | 19 ++++++++++------
 6 files changed, 39 insertions(+), 40 deletions(-)


base-commit: e188ec3a735ae52a0d0d3c22f9df6b29fa613b1e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1359

Range-diff vs v1:

 1:  82fe77c1ce0 ! 1:  945508df7b6 patch-id: fix stable patch id for binary / header-only
     @@ diff.c: static int diff_get_patch_id(struct diff_options *options, struct object
      -
      -		if (diff_filespec_is_binary(options->repo, p->one) ||
      +		if (diff_header_only) {
     -+			// Don't do anything since we're only populating header info
     ++			/* don't do anything since we're only populating header info */
      +		} else if (diff_filespec_is_binary(options->repo, p->one) ||
       		    diff_filespec_is_binary(options->repo, p->two)) {
       			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid),
 2:  6abb1ced1bd = 2:  30ec43cd129 patch-id: use stable patch-id for rebases

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only
  2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
@ 2022-09-20  6:20   ` Jerry Zhang via GitGitGadget
  2022-09-20  6:20   ` [PATCH v2 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-09-20  6:20 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Previous logic here skipped flushing the hunks for binary
and header-only patch ids, which would always result in a
patch-id of 0000.

Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.

Update the test to run on both binary and normal files.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 diff.c                     | 32 ++++++++++++++------------------
 t/t3419-rebase-patch-id.sh | 19 +++++++++++++------
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/diff.c b/diff.c
index dd68281ba44..70bc1902e11 100644
--- a/diff.c
+++ b/diff.c
@@ -6248,30 +6248,26 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		}
 
-		if (diff_header_only)
-			continue;
-
-		if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
-		    fill_mmfile(options->repo, &mf2, p->two) < 0)
-			return error("unable to read files to diff");
-
-		if (diff_filespec_is_binary(options->repo, p->one) ||
+		if (diff_header_only) {
+			/* don't do anything since we're only populating header info */
+		} else if (diff_filespec_is_binary(options->repo, p->one) ||
 		    diff_filespec_is_binary(options->repo, p->two)) {
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid),
 					the_hash_algo->hexsz);
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid),
 					the_hash_algo->hexsz);
-			continue;
+		} else {
+			if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
+			    fill_mmfile(options->repo, &mf2, p->two) < 0)
+				return error("unable to read files to diff");
+			xpp.flags = 0;
+			xecfg.ctxlen = 3;
+			xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+			if (xdi_diff_outf(&mf1, &mf2, NULL,
+					  patch_id_consume, &data, &xpp, &xecfg))
+				return error("unable to generate patch-id diff for %s",
+					     p->one->path);
 		}
-
-		xpp.flags = 0;
-		xecfg.ctxlen = 3;
-		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
-		if (xdi_diff_outf(&mf1, &mf2, NULL,
-				  patch_id_consume, &data, &xpp, &xecfg))
-			return error("unable to generate patch-id diff for %s",
-				     p->one->path);
-
 		if (stable)
 			flush_one_hunk(oid, &ctx);
 	}
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index 295040f2fe3..f7b7e9e5b7c 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -46,10 +46,6 @@ test_expect_success 'setup: 500 lines' '
 	git cherry-pick main >/dev/null 2>&1
 '
 
-test_expect_success 'setup attributes' '
-	echo "file binary" >.gitattributes
-'
-
 test_expect_success 'detect upstream patch' '
 	git checkout -q main &&
 	scramble file &&
@@ -58,7 +54,13 @@ test_expect_success 'detect upstream patch' '
 	git checkout -q other^{} &&
 	git rebase main &&
 	git rev-list main...HEAD~ >revs &&
-	test_must_be_empty revs
+	test_must_be_empty revs &&
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	git rebase main &&
+	git rev-list main...HEAD~ >revs &&
+	test_must_be_empty revs &&
+	rm .gitattributes
 '
 
 test_expect_success 'do not drop patch' '
@@ -68,7 +70,12 @@ test_expect_success 'do not drop patch' '
 	git commit -q -m squashed &&
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
-	git rebase --quit
+	git rebase --abort &&
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	test_must_fail git rebase squashed &&
+	git rebase --abort &&
+	rm .gitattributes
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 2/2] patch-id: use stable patch-id for rebases
  2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
  2022-09-20  6:20   ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
@ 2022-09-20  6:20   ` Jerry Zhang via GitGitGadget
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-09-20  6:20 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant.

This allows the legacy unstable id logic to be cleaned up.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 builtin/log.c |  2 +-
 diff.c        | 12 ++++--------
 diff.h        |  2 +-
 patch-ids.c   | 10 +++++-----
 patch-ids.h   |  2 +-
 5 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 047f9e5278d..3bb49fd7406 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1762,7 +1762,7 @@ static void prepare_bases(struct base_tree_info *bases,
 		struct object_id *patch_id;
 		if (*commit_base_at(&commit_base, commit))
 			continue;
-		if (commit_patch_id(commit, &diffopt, &oid, 0, 1))
+		if (commit_patch_id(commit, &diffopt, &oid, 0))
 			die(_("cannot get patch id"));
 		ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, bases->alloc_patch_id);
 		patch_id = bases->patch_id + bases->nr_patch_id;
diff --git a/diff.c b/diff.c
index 70bc1902e11..f00522d9354 100644
--- a/diff.c
+++ b/diff.c
@@ -6185,7 +6185,7 @@ static void patch_id_add_mode(git_hash_ctx *ctx, unsigned mode)
 }
 
 /* returns 0 upon success, and writes result into oid */
-static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
@@ -6268,21 +6268,17 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 				return error("unable to generate patch-id diff for %s",
 					     p->one->path);
 		}
-		if (stable)
-			flush_one_hunk(oid, &ctx);
+		flush_one_hunk(oid, &ctx);
 	}
 
-	if (!stable)
-		the_hash_algo->final_oid_fn(oid, &ctx);
-
 	return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
-	int result = diff_get_patch_id(options, oid, diff_header_only, stable);
+	int result = diff_get_patch_id(options, oid, diff_header_only);
 
 	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..fd33caeb25d 100644
--- a/diff.h
+++ b/diff.h
@@ -634,7 +634,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option);
 int run_diff_index(struct rev_info *revs, unsigned int option);
 
 int do_diff_cache(const struct object_id *, struct diff_options *);
-int diff_flush_patch_id(struct diff_options *, struct object_id *, int, int);
+int diff_flush_patch_id(struct diff_options *, struct object_id *, int);
 void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx);
 
 int diff_result_code(struct diff_options *, int);
diff --git a/patch-ids.c b/patch-ids.c
index 8bf425555de..fefddc487e9 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -11,7 +11,7 @@ static int patch_id_defined(struct commit *commit)
 }
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int diff_header_only, int stable)
+		    struct object_id *oid, int diff_header_only)
 {
 	if (!patch_id_defined(commit))
 		return -1;
@@ -22,7 +22,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
 	else
 		diff_root_tree_oid(&commit->object.oid, "", options);
 	diffcore_std(options);
-	return diff_flush_patch_id(options, oid, diff_header_only, stable);
+	return diff_flush_patch_id(options, oid, diff_header_only);
 }
 
 /*
@@ -48,11 +48,11 @@ static int patch_id_neq(const void *cmpfn_data,
 	b = container_of(entry_or_key, struct patch_id, ent);
 
 	if (is_null_oid(&a->patch_id) &&
-	    commit_patch_id(a->commit, opt, &a->patch_id, 0, 0))
+	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&a->commit->object.oid));
 	if (is_null_oid(&b->patch_id) &&
-	    commit_patch_id(b->commit, opt, &b->patch_id, 0, 0))
+	    commit_patch_id(b->commit, opt, &b->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&b->commit->object.oid));
 	return !oideq(&a->patch_id, &b->patch_id);
@@ -82,7 +82,7 @@ static int init_patch_id_entry(struct patch_id *patch,
 	struct object_id header_only_patch_id;
 
 	patch->commit = commit;
-	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1, 0))
+	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1))
 		return -1;
 
 	hashmap_entry_init(&patch->ent, oidhash(&header_only_patch_id));
diff --git a/patch-ids.h b/patch-ids.h
index ab6c6a68047..490d7393716 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -20,7 +20,7 @@ struct patch_ids {
 };
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int, int);
+		    struct object_id *oid, int);
 int init_patch_ids(struct repository *, struct patch_ids *);
 int free_patch_ids(struct patch_ids *);
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/2] update internal patch-id to use "stable" algorithm
  2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
@ 2022-09-21 19:16 ` Junio C Hamano
  2022-09-21 20:59   ` Jerry Zhang
  3 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2022-09-21 19:16 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Internal usage of patch-id in rebase / cherry-pick doesn't persist
> patch-ids, so there's no need to specifically invoke the unstable variant.
>
> This allows the unstable logic to be cleaned up.

While all of that may be true, two things are not explained.  

 * Why does "unstable" need to be "cleaned up"?  Is that too dirty
   in what way?

 * If internal usage does not persist patch-ids generated by the
   machinery, why is it bad to be using the unstable variant?  A
   naïve expectation would be to make sure you use stable one if you
   want a future recomputation to give you the same result, but the
   opposite does not have to be always true.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/2] update internal patch-id to use "stable" algorithm
  2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
@ 2022-09-21 20:59   ` Jerry Zhang
  0 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang @ 2022-09-21 20:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jerry Zhang via GitGitGadget, git

On Wed, Sep 21, 2022 at 12:16 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Internal usage of patch-id in rebase / cherry-pick doesn't persist
> > patch-ids, so there's no need to specifically invoke the unstable variant.
> >
> > This allows the unstable logic to be cleaned up.
>
> While all of that may be true, two things are not explained.
>
>  * Why does "unstable" need to be "cleaned up"?  Is that too dirty
>    in what way?
>
>  * If internal usage does not persist patch-ids generated by the
>    machinery, why is it bad to be using the unstable variant?  A
>    naïve expectation would be to make sure you use stable one if you
>    want a future recomputation to give you the same result, but the
>    opposite does not have to be always true.
>
Fair questions. My broad view is that less code and fewer code paths
is better for readability and testing. This isn't a massive impact but
it's not theoretical either -- as seen in patch 1 in this series I
caught a bug in stable + binary files because of this change.
Previously stable patch ids were only used in "git format-patch" and
so this corner case was missed, but this becomes less likely if rebase
+ cherry-pick + format-patch were all on the same scheme.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 0/7] patch-id fixes and improvements
  2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
  2022-09-20  6:20   ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
  2022-09-20  6:20   ` [PATCH v2 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56   ` Jerry Zhang via GitGitGadget
  2022-10-14  8:56     ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
                       ` (7 more replies)
  2 siblings, 8 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang

These patches add fixes and features to the "git patch-id" command, mostly
discovered through our usage of patch-id in the revup project
(https://github.com/Skydio/revup). On top of that I've tried to make general
cleanup changes where I can.

Summary:

1: Fixed a bug in the combination of --stable with binary files and
header-only, and expanded the test to cover both binary and non-binary
files.

2: Switch internal usage of patch-id in rebase / cherry-pick to use the
stable variant to reduce the number of code paths and improve testing for
bugs like above.

3: Fixed bugs with patch-id and binary diffs. Previously patch-id did not
behave correctly for binary diffs regardless of whether "--binary" was given
to "diff".

4: Fixed bugs with patch-id and mode changes. Previously mode changes were
incorrectly excluded from the patch-id.

5: Add a new "--include-whitespace" mode to patch-id that prevents
whitespace from being stripped during id calculation. Also add a config
option for the same behavior.

6: Remove unused prefix from patch-id logic.

7: Update format-patch doc to specify when patch-ids are going to be equal
to those generated by "git patch-id".

V1->V2: Fixed comment style V2->V3: The ---/+++ lines no longer get added to
the patch-id of binary diffs. Also added patches 3-7 in the series.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (7):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases
  builtin: patch-id: fix patch-id with binary diffs
  patch-id: fix patch-id for mode changes
  builtin: patch-id: add --include-whitespace as a command mode
  builtin: patch-id: remove unused diff-tree prefix
  documentation: format-patch: clarify requirements for patch-ids to
    match

 Documentation/git-format-patch.txt |   4 +-
 Documentation/git-patch-id.txt     |  25 +++++--
 builtin/log.c                      |   2 +-
 builtin/patch-id.c                 | 114 +++++++++++++++++++++--------
 diff.c                             |  75 +++++++++----------
 diff.h                             |   2 +-
 patch-ids.c                        |  10 +--
 patch-ids.h                        |   2 +-
 t/t3419-rebase-patch-id.sh         |  63 +++++++++++++---
 t/t4204-patch-id.sh                |  95 ++++++++++++++++++++++--
 10 files changed, 291 insertions(+), 101 deletions(-)


base-commit: d420dda0576340909c3faff364cfbd1485f70376
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1359

Range-diff vs v2:

 1:  945508df7b6 ! 1:  7d4c2e91ce0 patch-id: fix stable patch id for binary / header-only
     @@ Commit message
          populates the object ids, and normal which populates the text
          diff. All branches will end up flushing the hunk.
      
     +    Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond
     +    to those lines not being present in the "git diff" text output.
     +    This is necessary because we advertise that the patch-id calculated
     +    internally and used in format-patch is the same that what the
     +    builtin "git patch-id" would produce when piped from a diff.
     +
          Update the test to run on both binary and normal files.
      
          Signed-off-by: Jerry Zhang <jerry@skydio.com>
      
       ## diff.c ##
      @@ diff.c: static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
     - 			the_hash_algo->update_fn(&ctx, p->two->path, len2);
     + 		if (p->one->mode == 0) {
     + 			patch_id_add_string(&ctx, "newfilemode");
     + 			patch_id_add_mode(&ctx, p->two->mode);
     +-			patch_id_add_string(&ctx, "---/dev/null");
     +-			patch_id_add_string(&ctx, "+++b/");
     +-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
     + 		} else if (p->two->mode == 0) {
     + 			patch_id_add_string(&ctx, "deletedfilemode");
     + 			patch_id_add_mode(&ctx, p->one->mode);
     +-			patch_id_add_string(&ctx, "---a/");
     +-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
     +-			patch_id_add_string(&ctx, "+++/dev/null");
     +-		} else {
     +-			patch_id_add_string(&ctx, "---a/");
     +-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
     +-			patch_id_add_string(&ctx, "+++b/");
     +-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
       		}
       
      -		if (diff_header_only)
     @@ diff.c: static int diff_get_patch_id(struct diff_options *options, struct object
       			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid),
       					the_hash_algo->hexsz);
      -			continue;
     +-		}
     +-
     +-		xpp.flags = 0;
     +-		xecfg.ctxlen = 3;
     +-		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
     +-		if (xdi_diff_outf(&mf1, &mf2, NULL,
     +-				  patch_id_consume, &data, &xpp, &xecfg))
     +-			return error("unable to generate patch-id diff for %s",
     +-				     p->one->path);
      +		} else {
     ++			if (p->one->mode == 0) {
     ++				patch_id_add_string(&ctx, "---/dev/null");
     ++				patch_id_add_string(&ctx, "+++b/");
     ++				the_hash_algo->update_fn(&ctx, p->two->path, len2);
     ++			} else if (p->two->mode == 0) {
     ++				patch_id_add_string(&ctx, "---a/");
     ++				the_hash_algo->update_fn(&ctx, p->one->path, len1);
     ++				patch_id_add_string(&ctx, "+++/dev/null");
     ++			} else {
     ++				patch_id_add_string(&ctx, "---a/");
     ++				the_hash_algo->update_fn(&ctx, p->one->path, len1);
     ++				patch_id_add_string(&ctx, "+++b/");
     ++				the_hash_algo->update_fn(&ctx, p->two->path, len2);
     ++			}
     + 
      +			if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
      +			    fill_mmfile(options->repo, &mf2, p->two) < 0)
      +				return error("unable to read files to diff");
     @@ diff.c: static int diff_get_patch_id(struct diff_options *options, struct object
      +					  patch_id_consume, &data, &xpp, &xecfg))
      +				return error("unable to generate patch-id diff for %s",
      +					     p->one->path);
     - 		}
     --
     --		xpp.flags = 0;
     --		xecfg.ctxlen = 3;
     --		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
     --		if (xdi_diff_outf(&mf1, &mf2, NULL,
     --				  patch_id_consume, &data, &xpp, &xecfg))
     --			return error("unable to generate patch-id diff for %s",
     --				     p->one->path);
     --
     ++		}
       		if (stable)
       			flush_one_hunk(oid, &ctx);
       	}
      
       ## t/t3419-rebase-patch-id.sh ##
      @@ t/t3419-rebase-patch-id.sh: test_expect_success 'setup: 500 lines' '
     - 	git cherry-pick main >/dev/null 2>&1
     - '
     + 	git add newfile &&
     + 	git commit -q -m "add small file" &&
     + 
     +-	git cherry-pick main >/dev/null 2>&1
     +-'
     ++	git cherry-pick main >/dev/null 2>&1 &&
       
      -test_expect_success 'setup attributes' '
      -	echo "file binary" >.gitattributes
     --'
     --
     ++	git branch -f squashed main &&
     ++	git checkout -q -f squashed &&
     ++	git reset -q --soft HEAD~2 &&
     ++	git commit -q -m squashed
     + '
     + 
       test_expect_success 'detect upstream patch' '
     - 	git checkout -q main &&
     +-	git checkout -q main &&
     ++	git checkout -q main^{} &&
       	scramble file &&
     + 	git add file &&
     + 	git commit -q -m "change big file again" &&
      @@ t/t3419-rebase-patch-id.sh: test_expect_success 'detect upstream patch' '
     - 	git checkout -q other^{} &&
     - 	git rebase main &&
     - 	git rev-list main...HEAD~ >revs &&
     --	test_must_be_empty revs
     -+	test_must_be_empty revs &&
     + 	test_must_be_empty revs
     + '
     + 
     ++test_expect_success 'detect upstream patch binary' '
      +	echo "file binary" >.gitattributes &&
      +	git checkout -q other^{} &&
      +	git rebase main &&
      +	git rev-list main...HEAD~ >revs &&
      +	test_must_be_empty revs &&
     -+	rm .gitattributes
     - '
     - 
     ++	test_when_finished "rm .gitattributes"
     ++'
     ++
       test_expect_success 'do not drop patch' '
     -@@ t/t3419-rebase-patch-id.sh: test_expect_success 'do not drop patch' '
     - 	git commit -q -m squashed &&
     +-	git branch -f squashed main &&
     +-	git checkout -q -f squashed &&
     +-	git reset -q --soft HEAD~2 &&
     +-	git commit -q -m squashed &&
       	git checkout -q other^{} &&
       	test_must_fail git rebase squashed &&
      -	git rebase --quit
     -+	git rebase --abort &&
     ++	test_when_finished "git rebase --abort"
     ++'
     ++
     ++test_expect_success 'do not drop patch binary' '
      +	echo "file binary" >.gitattributes &&
      +	git checkout -q other^{} &&
      +	test_must_fail git rebase squashed &&
     -+	git rebase --abort &&
     -+	rm .gitattributes
     ++	test_when_finished "git rebase --abort" &&
     ++	test_when_finished "rm .gitattributes"
       '
       
       test_done
 2:  30ec43cd129 ! 2:  25e28b7dab3 patch-id: use stable patch-id for rebases
     @@ Commit message
          patch-id: use stable patch-id for rebases
      
          Git doesn't persist patch-ids during the rebase process, so there is
     -    no need to specifically invoke the unstable variant.
     -
     -    This allows the legacy unstable id logic to be cleaned up.
     +    no need to specifically invoke the unstable variant. Use the stable
     +    logic for all internal patch-id calculations to minimize the number of
     +    code paths and improve test coverage.
      
          Signed-off-by: Jerry Zhang <jerry@skydio.com>
      
 -:  ----------- > 3:  21642128927 builtin: patch-id: fix patch-id with binary diffs
 -:  ----------- > 4:  6e07cfd5691 patch-id: fix patch-id for mode changes
 -:  ----------- > 5:  bbaa2425ad0 builtin: patch-id: add --include-whitespace as a command mode
 -:  ----------- > 6:  a1f6f36d487 builtin: patch-id: remove unused diff-tree prefix
 -:  ----------- > 7:  69440797f30 documentation: format-patch: clarify requirements for patch-ids to match

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-14  8:56     ` [PATCH v3 2/7] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Previous logic here skipped flushing the hunks for binary
and header-only patch ids, which would always result in a
patch-id of 0000.

Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.

Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond
to those lines not being present in the "git diff" text output.
This is necessary because we advertise that the patch-id calculated
internally and used in format-patch is the same that what the
builtin "git patch-id" would produce when piped from a diff.

Update the test to run on both binary and normal files.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 diff.c                     | 58 +++++++++++++++++++-------------------
 t/t3419-rebase-patch-id.sh | 34 +++++++++++++++-------
 2 files changed, 53 insertions(+), 39 deletions(-)

diff --git a/diff.c b/diff.c
index 648f6717a55..c15169e4b06 100644
--- a/diff.c
+++ b/diff.c
@@ -6253,46 +6253,46 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 		if (p->one->mode == 0) {
 			patch_id_add_string(&ctx, "newfilemode");
 			patch_id_add_mode(&ctx, p->two->mode);
-			patch_id_add_string(&ctx, "---/dev/null");
-			patch_id_add_string(&ctx, "+++b/");
-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		} else if (p->two->mode == 0) {
 			patch_id_add_string(&ctx, "deletedfilemode");
 			patch_id_add_mode(&ctx, p->one->mode);
-			patch_id_add_string(&ctx, "---a/");
-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
-			patch_id_add_string(&ctx, "+++/dev/null");
-		} else {
-			patch_id_add_string(&ctx, "---a/");
-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
-			patch_id_add_string(&ctx, "+++b/");
-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		}
 
-		if (diff_header_only)
-			continue;
-
-		if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
-		    fill_mmfile(options->repo, &mf2, p->two) < 0)
-			return error("unable to read files to diff");
-
-		if (diff_filespec_is_binary(options->repo, p->one) ||
+		if (diff_header_only) {
+			/* don't do anything since we're only populating header info */
+		} else if (diff_filespec_is_binary(options->repo, p->one) ||
 		    diff_filespec_is_binary(options->repo, p->two)) {
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid),
 					the_hash_algo->hexsz);
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid),
 					the_hash_algo->hexsz);
-			continue;
-		}
-
-		xpp.flags = 0;
-		xecfg.ctxlen = 3;
-		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
-		if (xdi_diff_outf(&mf1, &mf2, NULL,
-				  patch_id_consume, &data, &xpp, &xecfg))
-			return error("unable to generate patch-id diff for %s",
-				     p->one->path);
+		} else {
+			if (p->one->mode == 0) {
+				patch_id_add_string(&ctx, "---/dev/null");
+				patch_id_add_string(&ctx, "+++b/");
+				the_hash_algo->update_fn(&ctx, p->two->path, len2);
+			} else if (p->two->mode == 0) {
+				patch_id_add_string(&ctx, "---a/");
+				the_hash_algo->update_fn(&ctx, p->one->path, len1);
+				patch_id_add_string(&ctx, "+++/dev/null");
+			} else {
+				patch_id_add_string(&ctx, "---a/");
+				the_hash_algo->update_fn(&ctx, p->one->path, len1);
+				patch_id_add_string(&ctx, "+++b/");
+				the_hash_algo->update_fn(&ctx, p->two->path, len2);
+			}
 
+			if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
+			    fill_mmfile(options->repo, &mf2, p->two) < 0)
+				return error("unable to read files to diff");
+			xpp.flags = 0;
+			xecfg.ctxlen = 3;
+			xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+			if (xdi_diff_outf(&mf1, &mf2, NULL,
+					  patch_id_consume, &data, &xpp, &xecfg))
+				return error("unable to generate patch-id diff for %s",
+					     p->one->path);
+		}
 		if (stable)
 			flush_one_hunk(oid, &ctx);
 	}
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index 295040f2fe3..d24e55aac8d 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -43,15 +43,16 @@ test_expect_success 'setup: 500 lines' '
 	git add newfile &&
 	git commit -q -m "add small file" &&
 
-	git cherry-pick main >/dev/null 2>&1
-'
+	git cherry-pick main >/dev/null 2>&1 &&
 
-test_expect_success 'setup attributes' '
-	echo "file binary" >.gitattributes
+	git branch -f squashed main &&
+	git checkout -q -f squashed &&
+	git reset -q --soft HEAD~2 &&
+	git commit -q -m squashed
 '
 
 test_expect_success 'detect upstream patch' '
-	git checkout -q main &&
+	git checkout -q main^{} &&
 	scramble file &&
 	git add file &&
 	git commit -q -m "change big file again" &&
@@ -61,14 +62,27 @@ test_expect_success 'detect upstream patch' '
 	test_must_be_empty revs
 '
 
+test_expect_success 'detect upstream patch binary' '
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	git rebase main &&
+	git rev-list main...HEAD~ >revs &&
+	test_must_be_empty revs &&
+	test_when_finished "rm .gitattributes"
+'
+
 test_expect_success 'do not drop patch' '
-	git branch -f squashed main &&
-	git checkout -q -f squashed &&
-	git reset -q --soft HEAD~2 &&
-	git commit -q -m squashed &&
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
-	git rebase --quit
+	test_when_finished "git rebase --abort"
+'
+
+test_expect_success 'do not drop patch binary' '
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	test_must_fail git rebase squashed &&
+	test_when_finished "git rebase --abort" &&
+	test_when_finished "rm .gitattributes"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 2/7] patch-id: use stable patch-id for rebases
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2022-10-14  8:56     ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant. Use the stable
logic for all internal patch-id calculations to minimize the number of
code paths and improve test coverage.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 builtin/log.c |  2 +-
 diff.c        | 12 ++++--------
 diff.h        |  2 +-
 patch-ids.c   | 10 +++++-----
 patch-ids.h   |  2 +-
 5 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d450..e72869afb36 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1763,7 +1763,7 @@ static void prepare_bases(struct base_tree_info *bases,
 		struct object_id *patch_id;
 		if (*commit_base_at(&commit_base, commit))
 			continue;
-		if (commit_patch_id(commit, &diffopt, &oid, 0, 1))
+		if (commit_patch_id(commit, &diffopt, &oid, 0))
 			die(_("cannot get patch id"));
 		ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, bases->alloc_patch_id);
 		patch_id = bases->patch_id + bases->nr_patch_id;
diff --git a/diff.c b/diff.c
index c15169e4b06..199b63dbcc3 100644
--- a/diff.c
+++ b/diff.c
@@ -6206,7 +6206,7 @@ static void patch_id_add_mode(git_hash_ctx *ctx, unsigned mode)
 }
 
 /* returns 0 upon success, and writes result into oid */
-static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
@@ -6293,21 +6293,17 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 				return error("unable to generate patch-id diff for %s",
 					     p->one->path);
 		}
-		if (stable)
-			flush_one_hunk(oid, &ctx);
+		flush_one_hunk(oid, &ctx);
 	}
 
-	if (!stable)
-		the_hash_algo->final_oid_fn(oid, &ctx);
-
 	return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
-	int result = diff_get_patch_id(options, oid, diff_header_only, stable);
+	int result = diff_get_patch_id(options, oid, diff_header_only);
 
 	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..fd33caeb25d 100644
--- a/diff.h
+++ b/diff.h
@@ -634,7 +634,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option);
 int run_diff_index(struct rev_info *revs, unsigned int option);
 
 int do_diff_cache(const struct object_id *, struct diff_options *);
-int diff_flush_patch_id(struct diff_options *, struct object_id *, int, int);
+int diff_flush_patch_id(struct diff_options *, struct object_id *, int);
 void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx);
 
 int diff_result_code(struct diff_options *, int);
diff --git a/patch-ids.c b/patch-ids.c
index 46c6a8f3eab..31534466266 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -11,7 +11,7 @@ static int patch_id_defined(struct commit *commit)
 }
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int diff_header_only, int stable)
+		    struct object_id *oid, int diff_header_only)
 {
 	if (!patch_id_defined(commit))
 		return -1;
@@ -22,7 +22,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
 	else
 		diff_root_tree_oid(&commit->object.oid, "", options);
 	diffcore_std(options);
-	return diff_flush_patch_id(options, oid, diff_header_only, stable);
+	return diff_flush_patch_id(options, oid, diff_header_only);
 }
 
 /*
@@ -48,11 +48,11 @@ static int patch_id_neq(const void *cmpfn_data,
 	b = container_of(entry_or_key, struct patch_id, ent);
 
 	if (is_null_oid(&a->patch_id) &&
-	    commit_patch_id(a->commit, opt, &a->patch_id, 0, 0))
+	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&a->commit->object.oid));
 	if (is_null_oid(&b->patch_id) &&
-	    commit_patch_id(b->commit, opt, &b->patch_id, 0, 0))
+	    commit_patch_id(b->commit, opt, &b->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&b->commit->object.oid));
 	return !oideq(&a->patch_id, &b->patch_id);
@@ -82,7 +82,7 @@ static int init_patch_id_entry(struct patch_id *patch,
 	struct object_id header_only_patch_id;
 
 	patch->commit = commit;
-	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1, 0))
+	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1))
 		return -1;
 
 	hashmap_entry_init(&patch->ent, oidhash(&header_only_patch_id));
diff --git a/patch-ids.h b/patch-ids.h
index ab6c6a68047..490d7393716 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -20,7 +20,7 @@ struct patch_ids {
 };
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int, int);
+		    struct object_id *oid, int);
 int init_patch_ids(struct repository *, struct patch_ids *);
 int free_patch_ids(struct patch_ids *);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2022-10-14  8:56     ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
  2022-10-14  8:56     ` [PATCH v3 2/7] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-14 17:13       ` Junio C Hamano
  2022-10-14 21:12       ` Junio C Hamano
  2022-10-14  8:56     ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
                       ` (4 subsequent siblings)
  7 siblings, 2 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to
get_one_patchid to handle the different possible styles of binary
diff. This attempts to keep resulting patch-ids identical to what
would be produced by the counterpart logic in diff.c, that is it
produces the id by hashing the a and b oids in succession.

In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.

The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.

When the diff is generated with --full-index there is no patch content
to skip over.

When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 builtin/patch-id.c  | 36 ++++++++++++++++++++++++++++++++++--
 t/t4204-patch-id.sh | 29 ++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 881fcf32732..e7a31123142 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -61,6 +61,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
+	int diff_is_binary = 0;
+	char pre_oid_str[GIT_MAX_HEXSZ + 1], post_oid_str[GIT_MAX_HEXSZ + 1];
 	git_hash_ctx ctx;
 
 	the_hash_algo->init_fn(&ctx);
@@ -88,14 +90,44 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 
 		/* Parsing diff header?  */
 		if (before == -1) {
-			if (starts_with(line, "index "))
+			if (starts_with(line, "GIT binary patch") ||
+			    starts_with(line, "Binary files")) {
+				diff_is_binary = 1;
+				before = 0;
+				the_hash_algo->update_fn(&ctx, pre_oid_str,
+							 strlen(pre_oid_str));
+				the_hash_algo->update_fn(&ctx, post_oid_str,
+							 strlen(post_oid_str));
+				if (stable)
+					flush_one_hunk(result, &ctx);
 				continue;
-			else if (starts_with(line, "--- "))
+			} else if (skip_prefix(line, "index ", &p)) {
+				char *oid1_end = strstr(line, "..");
+				char *oid2_end = NULL;
+				if (oid1_end)
+					oid2_end = strstr(oid1_end, " ");
+				if (!oid2_end)
+					oid2_end = line + strlen(line) - 1;
+				if (oid1_end != NULL && oid2_end != NULL) {
+					*oid1_end = *oid2_end = '\0';
+					strlcpy(pre_oid_str, p, GIT_MAX_HEXSZ + 1);
+					strlcpy(post_oid_str, oid1_end + 2, GIT_MAX_HEXSZ + 1);
+				}
+				continue;
+			} else if (starts_with(line, "--- "))
 				before = after = 1;
 			else if (!isalpha(line[0]))
 				break;
 		}
 
+		if (diff_is_binary) {
+			if (starts_with(line, "diff ")) {
+				diff_is_binary = 0;
+				before = -1;
+			}
+			continue;
+		}
+
 		/* Looking for a valid hunk header?  */
 		if (before == 0 && after == 0) {
 			if (starts_with(line, "@@ -")) {
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a730c0db985..cdc5191aa8d 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -42,7 +42,7 @@ calc_patch_id () {
 }
 
 get_top_diff () {
-	git log -p -1 "$@" -O bar-then-foo --
+	git log -p -1 "$@" -O bar-then-foo --full-index --
 }
 
 get_patch_id () {
@@ -61,6 +61,33 @@ test_expect_success 'patch-id detects inequality' '
 	get_patch_id notsame &&
 	! test_cmp patch-id_main patch-id_notsame
 '
+test_expect_success 'patch-id detects equality binary' '
+	cat >.gitattributes <<-\EOF &&
+	foo binary
+	bar binary
+	EOF
+	get_patch_id main &&
+	get_patch_id same &&
+	git log -p -1 --binary main >top-diff.output &&
+	calc_patch_id <top-diff.output main_binpatch &&
+	git log -p -1 --binary same >top-diff.output &&
+	calc_patch_id <top-diff.output same_binpatch &&
+	test_cmp patch-id_main patch-id_main_binpatch &&
+	test_cmp patch-id_same patch-id_same_binpatch &&
+	test_cmp patch-id_main patch-id_same &&
+	test_when_finished "rm .gitattributes"
+'
+
+test_expect_success 'patch-id detects inequality binary' '
+	cat >.gitattributes <<-\EOF &&
+	foo binary
+	bar binary
+	EOF
+	get_patch_id main &&
+	get_patch_id notsame &&
+	! test_cmp patch-id_main patch-id_notsame &&
+	test_when_finished "rm .gitattributes"
+'
 
 test_expect_success 'patch-id supports git-format-patch output' '
 	get_patch_id main &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 4/7] patch-id: fix patch-id for mode changes
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-14 21:17       ` Junio C Hamano
  2022-10-14  8:56     ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Jerry Zhang via GitGitGadget
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

Currently patch-id as used in rebase and cherry-pick does not account
for file modes if the file is modified. One consequence of this is
that if you have a local patch that changes modes, but upstream
has applied an outdated version of the patch that doesn't include
that mode change, "git rebase" will drop your local version of the
patch along with your mode changes. It also means that internal
patch-id doesn't produce the same output as the builtin, which does
account for mode changes due to them being part of diff output.

Fix by adding mode to the patch-id if it has changed, in the same
format that would be produced by diff, so that it is compatible
with builtin patch-id.

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 diff.c                     |  5 +++++
 t/t3419-rebase-patch-id.sh | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 199b63dbcc3..0e336c48560 100644
--- a/diff.c
+++ b/diff.c
@@ -6256,6 +6256,11 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 		} else if (p->two->mode == 0) {
 			patch_id_add_string(&ctx, "deletedfilemode");
 			patch_id_add_mode(&ctx, p->one->mode);
+		} else if (p->one->mode != p->two->mode) {
+			patch_id_add_string(&ctx, "oldmode");
+			patch_id_add_mode(&ctx, p->one->mode);
+			patch_id_add_string(&ctx, "newmode");
+			patch_id_add_mode(&ctx, p->two->mode);
 		}
 
 		if (diff_header_only) {
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index d24e55aac8d..7181f176b81 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -48,7 +48,17 @@ test_expect_success 'setup: 500 lines' '
 	git branch -f squashed main &&
 	git checkout -q -f squashed &&
 	git reset -q --soft HEAD~2 &&
-	git commit -q -m squashed
+	git commit -q -m squashed &&
+
+	git branch -f mode main &&
+	git checkout -q -f mode &&
+	test_chmod +x file &&
+	git commit -q -a --amend &&
+
+	git branch -f modeother other &&
+	git checkout -q -f modeother &&
+	test_chmod +x file &&
+	git commit -q -a --amend
 '
 
 test_expect_success 'detect upstream patch' '
@@ -71,6 +81,13 @@ test_expect_success 'detect upstream patch binary' '
 	test_when_finished "rm .gitattributes"
 '
 
+test_expect_success 'detect upstream patch modechange' '
+	git checkout -q modeother^{} &&
+	git rebase mode &&
+	git rev-list mode...HEAD~ >revs &&
+	test_must_be_empty revs
+'
+
 test_expect_success 'do not drop patch' '
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
@@ -85,4 +102,16 @@ test_expect_success 'do not drop patch binary' '
 	test_when_finished "rm .gitattributes"
 '
 
+test_expect_success 'do not drop patch modechange' '
+	git checkout -q modeother^{} &&
+	git rebase other &&
+	cat >expected <<-\EOF &&
+	diff --git a/file b/file
+	old mode 100644
+	new mode 100755
+	EOF
+	git diff HEAD~ >modediff &&
+	test_cmp expected modediff
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-10-14  8:56     ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-14 21:24       ` Junio C Hamano
  2022-10-14  8:56     ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

There are situations where the user might not want the default setting
where patch-id strips all whitespace. They might be working in a
language where white space is syntactically important, or they might
have CI testing that enforces strict whitespace linting. In these cases,
a whitespace change would result in the patch fundamentally changing,
and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--include-whitespace. It also corresponds to the config
patchid.include_whitespace = true. In this mode, the stable algorithm
is used and whitespace is not stripped from the patch text.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
---
 Documentation/git-patch-id.txt | 25 ++++++++----
 builtin/patch-id.c             | 74 ++++++++++++++++++++++------------
 t/t4204-patch-id.sh            | 66 +++++++++++++++++++++++++++---
 3 files changed, 126 insertions(+), 39 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 442caff8a9c..8eab4cdfe1d 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,18 +8,18 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 --------
 [verse]
-'git patch-id' [--stable | --unstable]
+'git patch-id' [--stable | --unstable | --include-whitespace]
 
 DESCRIPTION
 -----------
 Read a patch from the standard input and compute the patch ID for it.
 
 A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
-patch, with whitespace and line numbers ignored.  As such, it's "reasonably
-stable", but at the same time also reasonably unique, i.e., two patches that
-have the same "patch ID" are almost guaranteed to be the same thing.
+patch, with line numbers ignored.  As such, it's "reasonably stable", but at
+the same time also reasonably unique, i.e., two patches that have the same
+"patch ID" are almost guaranteed to be the same thing.
 
-IOW, you can use this thing to look for likely duplicate commits.
+The main usecase for this command is to look for likely duplicate commits.
 
 When dealing with 'git diff-tree' output, it takes advantage of
 the fact that the patch is prefixed with the object name of the
@@ -30,6 +30,13 @@ This can be used to make a mapping from patch ID to commit ID.
 OPTIONS
 -------
 
+--include-whitespace::
+	Use the "stable" algorithm described below and also don't strip whitespace
+	from lines when calculating the patch-id.
+
+	This is the default if patchid.includeWhitespace is true and implies
+	patchid.stable.
+
 --stable::
 	Use a "stable" sum of hashes as the patch ID. With this option:
 	 - Reordering file diffs that make up a patch does not affect the ID.
@@ -45,14 +52,16 @@ OPTIONS
 	   of "-O<orderfile>", thereby making existing databases storing such
 	   "unstable" or historical patch-ids unusable.
 
+	 - All whitespace within the patch is ignored and does not affect the id.
+
 	This is the default if patchid.stable is set to true.
 
 --unstable::
 	Use an "unstable" hash as the patch ID. With this option,
 	the result produced is compatible with the patch-id value produced
-	by git 1.9 and older.  Users with pre-existing databases storing
-	patch-ids produced by git 1.9 and older (who do not deal with reordered
-	patches) may want to use this option.
+	by git 1.9 and older and whitespace is ignored.  Users with pre-existing
+	databases storing patch-ids produced by git 1.9 and older (who do not deal
+	with reordered patches) may want to use this option.
 
 	This is the default.
 
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index e7a31123142..745fe193a71 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -2,6 +2,7 @@
 #include "builtin.h"
 #include "config.h"
 #include "diff.h"
+#include "parse-options.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -57,7 +58,7 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 }
 
 static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
-			   struct strbuf *line_buf, int stable)
+			   struct strbuf *line_buf, int stable, int include_whitespace)
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
@@ -76,8 +77,11 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (!skip_prefix(line, "diff-tree ", &p) &&
 		    !skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
-		    starts_with(line, "\\ ") && 12 < strlen(line))
+		    starts_with(line, "\\ ") && 12 < strlen(line)) {
+			if (include_whitespace)
+				the_hash_algo->update_fn(&ctx, line, strlen(line));
 			continue;
+		}
 
 		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
@@ -152,8 +156,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (line[0] == '+' || line[0] == ' ')
 			after--;
 
-		/* Compute the sha without whitespace */
-		len = remove_space(line);
+		/* Add line to hash algo (possibly removing whitespace) */
+		len = include_whitespace ? strlen(line) : remove_space(line);
 		patchlen += len;
 		the_hash_algo->update_fn(&ctx, line, len);
 	}
@@ -166,7 +170,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 	return patchlen;
 }
 
-static void generate_id_list(int stable)
+static void generate_id_list(int stable, int include_whitespace)
 {
 	struct object_id oid, n, result;
 	int patchlen;
@@ -174,21 +178,33 @@ static void generate_id_list(int stable)
 
 	oidclr(&oid);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, include_whitespace);
 		flush_current_id(patchlen, &oid, &result);
 		oidcpy(&oid, &n);
 	}
 	strbuf_release(&line_buf);
 }
 
-static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
+static const char * const patch_id_usage[] = {
+	N_("git patch-id [--stable | --unstable | --include-whitespace]"),
+	NULL
+};
+
+struct patch_id_opts {
+	int stable;
+	int include_whitespace;
+};
 
 static int git_patch_id_config(const char *var, const char *value, void *cb)
 {
-	int *stable = cb;
+	struct patch_id_opts *opts = cb;
 
 	if (!strcmp(var, "patchid.stable")) {
-		*stable = git_config_bool(var, value);
+		opts->stable = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "patchid.includewhitespace")) {
+		opts->include_whitespace = git_config_bool(var, value);
 		return 0;
 	}
 
@@ -197,21 +213,29 @@ static int git_patch_id_config(const char *var, const char *value, void *cb)
 
 int cmd_patch_id(int argc, const char **argv, const char *prefix)
 {
-	int stable = -1;
-
-	git_config(git_patch_id_config, &stable);
-
-	/* If nothing is set, default to unstable. */
-	if (stable < 0)
-		stable = 0;
-
-	if (argc == 2 && !strcmp(argv[1], "--stable"))
-		stable = 1;
-	else if (argc == 2 && !strcmp(argv[1], "--unstable"))
-		stable = 0;
-	else if (argc != 1)
-		usage(patch_id_usage);
-
-	generate_id_list(stable);
+	/* if nothing is set, default to unstable */
+	struct patch_id_opts config = {0, 0};
+	int opts = 0;
+	struct option builtin_patch_id_options[] = {
+		OPT_CMDMODE(0, "unstable", &opts,
+			N_("use the unstable patch-id algorithm"), 1),
+		OPT_CMDMODE(0, "stable", &opts,
+			N_("use the stable patch-id algorithm"), 2),
+		OPT_CMDMODE(0, "include-whitespace", &opts,
+			N_("use the stable algorithm and don't strip whitespace"), 3),
+		OPT_END()
+	};
+
+	git_config(git_patch_id_config, &config);
+
+	/* includeWhitespace implies stable */
+	if (config.include_whitespace)
+		config.stable = 1;
+
+	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
+			     patch_id_usage, 0);
+
+	generate_id_list(opts ? opts > 1 : config.stable,
+			 opts ? opts == 3 : config.include_whitespace);
 	return 0;
 }
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index cdc5191aa8d..107e5a59fee 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -8,13 +8,13 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-	as="a a a a a a a a" && # eight a
-	test_write_lines $as >foo &&
-	test_write_lines $as >bar &&
+	str="ab cd ef gh ij kl mn op" &&
+	test_write_lines $str >foo &&
+	test_write_lines $str >bar &&
 	git add foo bar &&
 	git commit -a -m initial &&
-	test_write_lines $as b >foo &&
-	test_write_lines $as b >bar &&
+	test_write_lines $str b >foo &&
+	test_write_lines $str b >bar &&
 	git commit -a -m first &&
 	git checkout -b same main &&
 	git commit --amend -m same-msg &&
@@ -22,8 +22,23 @@ test_expect_success 'setup' '
 	echo c >foo &&
 	echo c >bar &&
 	git commit --amend -a -m notsame-msg &&
+	git checkout -b with_space main~ &&
+	cat >foo <<-\EOF &&
+	a  b
+	c d
+	e    f
+	  g   h
+	    i   j
+	k l
+	m   n
+	op
+	EOF
+	cp foo bar &&
+	git add foo bar &&
+	git commit --amend -m "with spaces" &&
 	test_write_lines bar foo >bar-then-foo &&
 	test_write_lines foo bar >foo-then-bar
+
 '
 
 test_expect_success 'patch-id output is well-formed' '
@@ -128,9 +143,21 @@ test_patch_id_file_order () {
 	git format-patch -1 --stdout -O foo-then-bar >format-patch.output &&
 	calc_patch_id <format-patch.output "ordered-$name" "$@" &&
 	cmp_patch_id $relevant "$name" "ordered-$name"
+}
 
+test_patch_id_whitespace () {
+	relevant="$1"
+	shift
+	name="ws-${1}-$relevant"
+	shift
+	get_top_diff "main~" >top-diff.output &&
+	calc_patch_id <top-diff.output "$name" "$@" &&
+	get_top_diff "with_space" >top-diff.output &&
+	calc_patch_id <top-diff.output "ws-$name" "$@" &&
+	cmp_patch_id $relevant "$name" "ws-$name"
 }
 
+
 # combined test for options: add more tests here to make them
 # run with all options
 test_patch_id () {
@@ -146,6 +173,14 @@ test_expect_success 'file order is relevant with --unstable' '
 	test_patch_id_file_order relevant --unstable --unstable
 '
 
+test_expect_success 'whitespace is relevant with --include-whitespace' '
+	test_patch_id_whitespace relevant --include-whitespace --include-whitespace
+'
+
+test_expect_success 'whitespace is irrelevant without --include-whitespace' '
+	test_patch_id_whitespace irrelevant --stable --stable
+'
+
 #Now test various option combinations.
 test_expect_success 'default is unstable' '
 	test_patch_id relevant default
@@ -161,6 +196,17 @@ test_expect_success 'patchid.stable = false is unstable' '
 	test_patch_id relevant patchid.stable=false
 '
 
+test_expect_success 'patchid.includeWhitespace = true is correct and stable' '
+	test_config patchid.includeWhitespace true &&
+	test_patch_id_whitespace relevant patchid.includeWhitespace=true &&
+	test_patch_id irrelevant patchid.includeWhitespace=true
+'
+
+test_expect_success 'patchid.includeWhitespace = false is unstable' '
+	test_config patchid.includeWhitespace false &&
+	test_patch_id relevant patchid.includeWhitespace=false
+'
+
 test_expect_success '--unstable overrides patchid.stable = true' '
 	test_config patchid.stable true &&
 	test_patch_id relevant patchid.stable=true--unstable --unstable
@@ -171,6 +217,11 @@ test_expect_success '--stable overrides patchid.stable = false' '
 	test_patch_id irrelevant patchid.stable=false--stable --stable
 '
 
+test_expect_success '--include-whitespace overrides patchid.stable = false' '
+	test_config patchid.stable false &&
+	test_patch_id_whitespace relevant stable=false--include-whitespace --include-whitespace
+'
+
 test_expect_success 'patch-id supports git-format-patch MIME output' '
 	get_patch_id main &&
 	git checkout same &&
@@ -225,7 +276,10 @@ test_expect_success 'patch-id handles no-nl-at-eof markers' '
 	EOF
 	calc_patch_id nonl <nonl &&
 	calc_patch_id withnl <withnl &&
-	test_cmp patch-id_nonl patch-id_withnl
+	test_cmp patch-id_nonl patch-id_withnl &&
+	calc_patch_id nonl-inc-ws --include-whitespace <nonl &&
+	calc_patch_id withnl-inc-ws --include-whitespace <withnl &&
+	! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
 '
 
 test_expect_success 'patch-id handles diffs with one line of before/after' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-10-14  8:56     ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-14 22:03       ` Junio C Hamano
  2022-10-14  8:56     ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  7 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

From a "git grep" of the repo, no command, including diff-tree itself,
produces diff output with "diff-tree " prefixed in the header.

Thus remove its handling in "patch-id".

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 builtin/patch-id.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 745fe193a71..c37b8f573b7 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -74,8 +74,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		const char *p = line;
 		int len;
 
-		if (!skip_prefix(line, "diff-tree ", &p) &&
-		    !skip_prefix(line, "commit ", &p) &&
+		/* Possibly skip over the prefix added by "log" or "format-patch" */
+		if (!skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
 		    starts_with(line, "\\ ") && 12 < strlen(line)) {
 			if (include_whitespace)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-10-14  8:56     ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
@ 2022-10-14  8:56     ` Jerry Zhang via GitGitGadget
  2022-10-17 15:18       ` Junio C Hamano
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  7 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-14  8:56 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

The documentation for format-patch advertises that the ids
it generates for prerequisite patches are the same as piping
the patch into "git patch-id". Clarify here that this is only
true if the patch was generated with -U3, and for binary patches,
with --full-index.

Note that the actual equivalence isn't currently tested. Aside
from a few cases fixed in this patch series, I've seen some
uncommon situations where "git diff" and the internal diff api
actually generate equally valid diffs of the same length, but
with lines reordered, which results in different patch-ids.

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 Documentation/git-format-patch.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
index dfcc7da4c21..566d4b486dd 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -668,8 +668,8 @@ of 'base commit' in topological order before the patches can be applied.
 The 'base commit' is shown as "base-commit: " followed by the 40-hex of
 the commit object name.  A 'prerequisite patch' is shown as
 "prerequisite-patch-id: " followed by the 40-hex 'patch id', which can
-be obtained by passing the patch through the `git patch-id --stable`
-command.
+be obtained by passing the patch (generated with -U3 --full-index) through
+the `git patch-id --stable` command.
 
 Imagine that on top of the public commit P, you applied well-known
 patches X, Y and Z from somebody else, and then built your three-patch
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs
  2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
@ 2022-10-14 17:13       ` Junio C Hamano
  2022-10-14 22:33         ` Jerry Zhang
  2022-10-14 21:12       ` Junio C Hamano
  1 sibling, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2022-10-14 17:13 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jerry Zhang <Jerry@skydio.com>
>
> "git patch-id" currently doesn't produce correct output if the
> incoming diff has any binary files. Add logic to
> get_one_patchid to handle the different possible styles of binary
> diff. This attempts to keep resulting patch-ids identical to what
> would be produced by the counterpart logic in diff.c, that is it
> produces the id by hashing the a and b oids in succession.

It is sad that we have two separate implementations in the first
place.  Do you see if it is feasible to unify the implementation
by reusing one from the other (answering this is not a requirement
for this patch to be looked at)?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs
  2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
  2022-10-14 17:13       ` Junio C Hamano
@ 2022-10-14 21:12       ` Junio C Hamano
  2022-10-14 22:34         ` Jerry Zhang
  1 sibling, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2022-10-14 21:12 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jerry Zhang <Jerry@skydio.com>
>
> "git patch-id" currently doesn't produce correct output if the
> incoming diff has any binary files. Add logic to
> get_one_patchid to handle the different possible styles of binary
> diff. This attempts to keep resulting patch-ids identical to what
> would be produced by the counterpart logic in diff.c, that is it
> produces the id by hashing the a and b oids in succession.

I thought I saw that a previous step touched diff.c to change how
patch ID for a binary diff is computed to match what patch-id
command computes?  Now we also have to change patch-id?  In the end
output from both may match, but which one between diff and patch-id
have we standardised on?

Puzzled...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 4/7] patch-id: fix patch-id for mode changes
  2022-10-14  8:56     ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
@ 2022-10-14 21:17       ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-14 21:17 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Currently patch-id as used in rebase and cherry-pick does not account
> for file modes if the file is modified. One consequence of this is
> that if you have a local patch that changes modes, but upstream
> has applied an outdated version of the patch that doesn't include
> that mode change, "git rebase" will drop your local version of the
> patch along with your mode changes.

Hmph, it may be a feature and a curse depending on the phase of the
moon and what you are using the patch-id computation to see if you
already have an identical change.  But attempting to apply a patch
after applying the same patch with different mode bits will likely
do either the right thing (i.e. taking the mode changes only) or
result in conflict to draw human attention, so this change is a
definite improvement over possibly dropping a change silently.

Good.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode
  2022-10-14  8:56     ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Jerry Zhang via GitGitGadget
@ 2022-10-14 21:24       ` Junio C Hamano
  2022-10-14 22:55         ` Jerry Zhang
  2022-10-17 15:38         ` Junio C Hamano
  0 siblings, 2 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-14 21:24 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +--include-whitespace::
> +	Use the "stable" algorithm described below and also don't strip whitespace
> +	from lines when calculating the patch-id.
> +
> +	This is the default if patchid.includeWhitespace is true and implies
> +	patchid.stable.

This seems very much orthogonal to "--stable/--unstable.  

Because the "--stable" variant is more expensive than "--unstable",
I am not sure why such an implication is a good thing to have.  Why
can we not have

    --include-whitespace --stable
    --include-whitespace --unstable

both combinations valid?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix
  2022-10-14  8:56     ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
@ 2022-10-14 22:03       ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-14 22:03 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jerry Zhang <Jerry@skydio.com>
>
> From a "git grep" of the repo, no command, including diff-tree itself,
> produces diff output with "diff-tree " prefixed in the header.

I think you are lucky and it is OK in this case, but the "grep" only
tells about the current source, and a bit more due dilligence is in
general needed.

 * f9767222 (Add "git-patch-id" program to generate patch ID's.,
   2005-06-23) introduced the line that prepares us to see
   "diff-tree" prefix.  In that version, "git diff-tree --pretty
   --stdin" did use the prefix (it comes from 'diff-tree.c").

 * 5f1c3f07 (log-tree: separate major part of diff-tree.,
   2006-04-09) moved things around from "diff-tree.c" to
   "log-tree.c", without changing the behaviour.

 * cd2bdc53 (Common option parsing for "git log --diff" and friends,
   2006-04-14) further moved revs->header_prefix set-up to
   revision.c::setup_revisions(), without changing the behaviour.

 * 91539833 (Log message printout cleanups, 2006-04-17) did change
   the things drastically.  Its log message says:

       This does change "git whatchanged" from using "diff-tree" as
       the commit descriptor to "commit", and I changed one of the
       tests to reflect that new reality. Otherwise everything still
       passes, and my other tests look fine too.

As long as nobody keeps output from version of Git before v1.3.0
this change is safe to do ;-)

There may be third-party tools that was written in 2005-2006 that
still emit diff-tree prefix, but I somehow do not think it is likely
anobody would feed such an output to us.  Given how widely Git is
used, I might be overly optimistic, though.

> Thus remove its handling in "patch-id".
>
> Signed-off-by: Jerry Zhang <Jerry@skydio.com>
> ---
>  builtin/patch-id.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/patch-id.c b/builtin/patch-id.c
> index 745fe193a71..c37b8f573b7 100644
> --- a/builtin/patch-id.c
> +++ b/builtin/patch-id.c
> @@ -74,8 +74,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
>  		const char *p = line;
>  		int len;
>  
> -		if (!skip_prefix(line, "diff-tree ", &p) &&
> -		    !skip_prefix(line, "commit ", &p) &&
> +		/* Possibly skip over the prefix added by "log" or "format-patch" */
> +		if (!skip_prefix(line, "commit ", &p) &&
>  		    !skip_prefix(line, "From ", &p) &&
>  		    starts_with(line, "\\ ") && 12 < strlen(line)) {
>  			if (include_whitespace)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs
  2022-10-14 17:13       ` Junio C Hamano
@ 2022-10-14 22:33         ` Jerry Zhang
  0 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang @ 2022-10-14 22:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jerry Zhang via GitGitGadget, git

On Fri, Oct 14, 2022 at 10:13 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Jerry Zhang <Jerry@skydio.com>
> >
> > "git patch-id" currently doesn't produce correct output if the
> > incoming diff has any binary files. Add logic to
> > get_one_patchid to handle the different possible styles of binary
> > diff. This attempts to keep resulting patch-ids identical to what
> > would be produced by the counterpart logic in diff.c, that is it
> > produces the id by hashing the a and b oids in succession.
>
> It is sad that we have two separate implementations in the first
> place.  Do you see if it is feasible to unify the implementation
> by reusing one from the other (answering this is not a requirement
> for this patch to be looked at)?
Yeah I wondered this myself, it's tricky because they are actually doing
opposite things: the diff.c logic is adding diff metadata before doing the
patch-id, while the patch-id logic is parsing out the diff metadata. We could
refactor it, but would have to be careful not to accidentally change the output
semantics.

Another possible path to "unifying" the logic would be to add a
"--patch-id" mode
to "git diff' that produces the patch-id of what would be the diff,
rather than the diff
itself. For the usecases that involve piping "git diff" into "git
patch-id", this would
require not needing the separate patch-id tool at all. Of course
people also like
to run "patch-id" on the output of "format-patch" after the fact so
this isn't a perfect
solution either.

Speaking of which, do you have some context as to why we promise that
"git patch-id"
output will remain the same across git versions? Were there cases in
the past where
people actually made persistent databases of patch-ids, or complained
about the output
changing? I ask because this requirement makes it difficult to make
big changes, and
there aren't any tests to verify consistent output between git
versions. Also git itself
is already a persistent database of patches, so I'm not sure why
someone would choose
to implement a new system for this.
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs
  2022-10-14 21:12       ` Junio C Hamano
@ 2022-10-14 22:34         ` Jerry Zhang
  2022-10-17 15:23           ` Junio C Hamano
  0 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang @ 2022-10-14 22:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jerry Zhang via GitGitGadget, git

On Fri, Oct 14, 2022 at 2:12 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Jerry Zhang <Jerry@skydio.com>
> >
> > "git patch-id" currently doesn't produce correct output if the
> > incoming diff has any binary files. Add logic to
> > get_one_patchid to handle the different possible styles of binary
> > diff. This attempts to keep resulting patch-ids identical to what
> > would be produced by the counterpart logic in diff.c, that is it
> > produces the id by hashing the a and b oids in succession.
>
> I thought I saw that a previous step touched diff.c to change how
> patch ID for a binary diff is computed to match what patch-id
> command computes?  Now we also have to change patch-id?  In the end
> output from both may match, but which one between diff and patch-id
> have we standardised on?
Er yeah let me see if I can simplify.

Before:
Internal patch-id w/ unstable + binary was correct
Internal patch-id w/ stable + binary was broken
builtin patch-id w/ binary was broken

After:
Internal patch-id w/ unstable + binary is correct
Internal patch-id w/ stable + binary is now correct
builtin patch-id w/ binary is now correct

So the "standard" actually came from the one working case from
"before", which was the diff.c logic + unstable. I based all new logic
on that because it seemed reasonable and correct. Since "internal
w/unstable" is never exposed externally, it's perhaps true that i
could have invented a totally new format and standardized on that.
Hashing the oids in succession is pretty much representative of a
binary patch though, so I don't think there's much to be improved on.
>
> Puzzled...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode
  2022-10-14 21:24       ` Junio C Hamano
@ 2022-10-14 22:55         ` Jerry Zhang
  2022-10-17 15:38         ` Junio C Hamano
  1 sibling, 0 replies; 46+ messages in thread
From: Jerry Zhang @ 2022-10-14 22:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jerry Zhang via GitGitGadget, git

On Fri, Oct 14, 2022 at 2:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +--include-whitespace::
> > +     Use the "stable" algorithm described below and also don't strip whitespace
> > +     from lines when calculating the patch-id.
> > +
> > +     This is the default if patchid.includeWhitespace is true and implies
> > +     patchid.stable.
>
> This seems very much orthogonal to "--stable/--unstable.
>
> Because the "--stable" variant is more expensive than "--unstable",
I didn't realize it was more expensive, I'm assuming you mean in terms
of time, maybe it does
slightly more hashing operations under the hood?  I tried timing some
runs locally
and they were a wash:

time /bin/sh -c "git show | git patch-id --stable"
dddea79ee68d62a32cf8c0d7bb6691bcd0445628
4677fe858366a51ff3c5a0c0893418e32e934262

real 0m0.011s
user 0m0.003s
sys 0m0.012s
time /bin/sh -c "git show | git patch-id"
6602a3b2fe8b17d5bc295c2703901ad3e18eee18
4677fe858366a51ff3c5a0c0893418e32e934262

real 0m0.012s
user 0m0.009s
sys 0m0.007s

The operation is probably bound by process / disk overhead quite a bit
and a small
amount of cpu use wouldn't really be user-visible. Based on these
results I don't think
a user would choose --unstable just for the speed gain (if any).

> I am not sure why such an implication is a good thing to have.  Why
> can we not have
>
>     --include-whitespace --stable
>     --include-whitespace --unstable
>
> both combinations valid?
If you accept my point above, then a user would only choose
"--unstable" if they actually
had a need for backwards compatibility, such as for a persistent
database. Trying to include
whitespace on top of that would break the compatibility they're
relying on. So my conclusion was
that there isn't any usecase for the combination "--include-whitespace
--unstable", and it's better for
usability and not needing to always maintain compatibility if we don't
expose it to users at all.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match
  2022-10-14  8:56     ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
@ 2022-10-17 15:18       ` Junio C Hamano
  2022-10-18 21:57         ` Jerry Zhang
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2022-10-17 15:18 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  The 'base commit' is shown as "base-commit: " followed by the 40-hex of
>  the commit object name.  A 'prerequisite patch' is shown as
>  "prerequisite-patch-id: " followed by the 40-hex 'patch id', which can
> -be obtained by passing the patch through the `git patch-id --stable`
> -command.
> +be obtained by passing the patch (generated with -U3 --full-index) through
> +the `git patch-id --stable` command.

This is not incorrect per-se, but I wonder how much it would help or
mislead people in practice.

I understand that the update means well to help those who complain
"'patch-id' produces wrong result when I feed the output of 'git
diff -U1'" by making them suspect that their -U1 may be the culprit.
But the new description does not cover everything that can affect
the resulting patch ID (the choice of --diff-algorithm affects how
common lines are matched up between the preimage and the postimage,
for example).

So, I dunno.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs
  2022-10-14 22:34         ` Jerry Zhang
@ 2022-10-17 15:23           ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-17 15:23 UTC (permalink / raw)
  To: Jerry Zhang; +Cc: Jerry Zhang via GitGitGadget, git

Jerry Zhang <jerry@skydio.com> writes:

>> I thought I saw that a previous step touched diff.c to change how
>> patch ID for a binary diff is computed to match what patch-id
>> command computes?  Now we also have to change patch-id?  In the end
>> output from both may match, but which one between diff and patch-id
>> have we standardised on?
> Er yeah let me see if I can simplify.
>
> Before:
> Internal patch-id w/ unstable + binary was correct
> Internal patch-id w/ stable + binary was broken
> builtin patch-id w/ binary was broken
>
> After:
> Internal patch-id w/ unstable + binary is correct
> Internal patch-id w/ stable + binary is now correct
> builtin patch-id w/ binary is now correct
>
> So the "standard" actually came from the one working case from
> "before", which was the diff.c logic + unstable.

OK.

The question was meant to help you improve the log message, as it is
something a future reader of "git log" would wonder after reading
them.  I think including something that makes it easy for readers to
arrive at the summary above themselves by reading the log message
would be a very much welcome change.

Thanks.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode
  2022-10-14 21:24       ` Junio C Hamano
  2022-10-14 22:55         ` Jerry Zhang
@ 2022-10-17 15:38         ` Junio C Hamano
  2022-10-18 22:12           ` Jerry Zhang
  1 sibling, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2022-10-17 15:38 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

Junio C Hamano <gitster@pobox.com> writes:

> "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> +--include-whitespace::
>> +	Use the "stable" algorithm described below and also don't strip whitespace
>> +	from lines when calculating the patch-id.
>> +
>> +	This is the default if patchid.includeWhitespace is true and implies
>> +	patchid.stable.
>
> This seems very much orthogonal to "--stable/--unstable.  
>
> Because the "--stable" variant is more expensive than "--unstable",

Sorry, I misspoke.  The way we make the result stable is *not* by
enforcing a fixed order to the input of the hash (which would have
been more expensive), but by hashing each file separately and
summing up the hashes, and it shouldn't be noticeably more expensive
than the unstable variant.

So, I do not think I mind if we introduced --include-whitespace as a
third option in addition to --stable and --unstable, instead of allowing
it to be combined with both --stable and --unstable.

But I wonder:

 * (minor) Would "--verbatim" work as a better option name?  The
   name "--include-whitespace" can apply even to an implementation
   that squashes multiple consecutive spaces and tabs into a single
   space, i.e. we keep words on a single line as separate words,
   instead of squishing them together, when hashing.

 * Do users even care about the internal reliance on the "stable"
   algorithm?  Wouldn't it be better to leave such an implementation
   detail unsaid?  After all, "--verbatim --unstable" would not work
   as they expect.

So I would suggest dropping "and implies patchid.stable" from the
above description.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match
  2022-10-17 15:18       ` Junio C Hamano
@ 2022-10-18 21:57         ` Jerry Zhang
  2022-10-19 16:19           ` Junio C Hamano
  0 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang @ 2022-10-18 21:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jerry Zhang via GitGitGadget, git

On Mon, Oct 17, 2022 at 8:19 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> >  The 'base commit' is shown as "base-commit: " followed by the 40-hex of
> >  the commit object name.  A 'prerequisite patch' is shown as
> >  "prerequisite-patch-id: " followed by the 40-hex 'patch id', which can
> > -be obtained by passing the patch through the `git patch-id --stable`
> > -command.
> > +be obtained by passing the patch (generated with -U3 --full-index) through
> > +the `git patch-id --stable` command.
>
> This is not incorrect per-se, but I wonder how much it would help or
> mislead people in practice.
>
> I understand that the update means well to help those who complain
> "'patch-id' produces wrong result when I feed the output of 'git
> diff -U1'" by making them suspect that their -U1 may be the culprit.
> But the new description does not cover everything that can affect
> the resulting patch ID (the choice of --diff-algorithm affects how
> common lines are matched up between the preimage and the postimage,
> for example).
I can add a note about diff algorithm as well. Its slightly different
from the other two options
in that there should be fewer cases where diff algorithm is a problem.
U3 and full-index aren't
the default options to "git diff" so people are more likely to have
the wrong options,
whereas they would explicitly have to run the two diffs with different
algorithms to
run into a problem. But there's no harm in being more specific.
>
> So, I dunno.
Keeping patch-ids the same is difficult especially between different
git versions, as
changes could be made to "diff" that seem like improvements but would
slightly change
patch-id even with the same args. I expect that people generally aren't keeping
the compatibility of patch-id in mind when changing diff. Nevertheless
since we've
already advertised that they would match, we ought to give users the best advice
possible to minimize if not eliminate confusion.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode
  2022-10-17 15:38         ` Junio C Hamano
@ 2022-10-18 22:12           ` Jerry Zhang
  0 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang @ 2022-10-18 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jerry Zhang via GitGitGadget, git

On Mon, Oct 17, 2022 at 8:38 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> +--include-whitespace::
> >> +    Use the "stable" algorithm described below and also don't strip whitespace
> >> +    from lines when calculating the patch-id.
> >> +
> >> +    This is the default if patchid.includeWhitespace is true and implies
> >> +    patchid.stable.
> >
> > This seems very much orthogonal to "--stable/--unstable.
> >
> > Because the "--stable" variant is more expensive than "--unstable",
>
> Sorry, I misspoke.  The way we make the result stable is *not* by
> enforcing a fixed order to the input of the hash (which would have
> been more expensive), but by hashing each file separately and
> summing up the hashes, and it shouldn't be noticeably more expensive
> than the unstable variant.
>
> So, I do not think I mind if we introduced --include-whitespace as a
> third option in addition to --stable and --unstable, instead of allowing
> it to be combined with both --stable and --unstable.
>
> But I wonder:
>
>  * (minor) Would "--verbatim" work as a better option name?  The
>    name "--include-whitespace" can apply even to an implementation
>    that squashes multiple consecutive spaces and tabs into a single
>    space, i.e. we keep words on a single line as separate words,
>    instead of squishing them together, when hashing.
>
>  * Do users even care about the internal reliance on the "stable"
>    algorithm?  Wouldn't it be better to leave such an implementation
>    detail unsaid?  After all, "--verbatim --unstable" would not work
>    as they expect.
>
> So I would suggest dropping "and implies patchid.stable" from the
> above description.
Sure I can spin a new series with these changes
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match
  2022-10-18 21:57         ` Jerry Zhang
@ 2022-10-19 16:19           ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-19 16:19 UTC (permalink / raw)
  To: Jerry Zhang; +Cc: Jerry Zhang via GitGitGadget, git

Jerry Zhang <jerry@skydio.com> writes:

> I can add a note about diff algorithm as well.

That's totally different from what I had in mind.  We _could_ be
more specific, but I do not think that helps users, as we cannot
promise we will keep using the same diff algorithm and parameters,
and the implementation would change to give "better" output for
human consumption that is not byte-for-byte identical to older one.

>> So, I dunno.

So, I do not know if more description is a good idea to begin with.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 0/6] patch-id fixes and improvements
  2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-10-14  8:56     ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16     ` Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
                         ` (6 more replies)
  7 siblings, 7 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang

These patches add fixes and features to the "git patch-id" command, mostly
discovered through our usage of patch-id in the revup project
(https://github.com/Skydio/revup). On top of that I've tried to make general
cleanup changes where I can.

Summary:

1: Fixed a bug in the combination of --stable with binary files and
header-only, and expanded the test to cover both binary and non-binary
files.

2: Switch internal usage of patch-id in rebase / cherry-pick to use the
stable variant to reduce the number of code paths and improve testing for
bugs like above.

3: Fixed bugs with patch-id and binary diffs. Previously patch-id did not
behave correctly for binary diffs regardless of whether "--binary" was given
to "diff".

4: Fixed bugs with patch-id and mode changes. Previously mode changes were
incorrectly excluded from the patch-id.

5: Add a new "--include-whitespace" mode to patch-id that prevents
whitespace from being stripped during id calculation. Also add a config
option for the same behavior.

6: Remove unused prefix from patch-id logic.

V1->V2: Fixed comment style V2->V3: The ---/+++ lines no longer get added to
the patch-id of binary diffs. Also added patches 3-7 in the series. V3->V3:
Dropped patch7. Updated flag name to --verbatim. Updated commit message
descriptions.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (6):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases
  builtin: patch-id: fix patch-id with binary diffs
  patch-id: fix patch-id for mode changes
  builtin: patch-id: add --verbatim as a command mode
  builtin: patch-id: remove unused diff-tree prefix

 Documentation/git-patch-id.txt |  24 ++++---
 builtin/log.c                  |   2 +-
 builtin/patch-id.c             | 113 ++++++++++++++++++++++++---------
 diff.c                         |  75 +++++++++++-----------
 diff.h                         |   2 +-
 patch-ids.c                    |  10 +--
 patch-ids.h                    |   2 +-
 t/t3419-rebase-patch-id.sh     |  63 +++++++++++++++---
 t/t4204-patch-id.sh            |  95 +++++++++++++++++++++++++--
 9 files changed, 287 insertions(+), 99 deletions(-)


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1359

Range-diff vs v3:

 1:  7d4c2e91ce0 ! 1:  321757ef919 patch-id: fix stable patch id for binary / header-only
     @@ Metadata
       ## Commit message ##
          patch-id: fix stable patch id for binary / header-only
      
     -    Previous logic here skipped flushing the hunks for binary
     -    and header-only patch ids, which would always result in a
     -    patch-id of 0000.
     +    Patch-ids for binary patches are found by hashing the object
     +    ids of the before and after objects in succession. However in
     +    the --stable case, there is a bug where hunks are not flushed
     +    for binary and header-only patch ids, which would always result
     +    in a patch-id of 0000. The --unstable case is currently correct.
      
          Reorder the logic to branch into 3 cases for populating the
          patch body: header-only which populates nothing, binary which
 2:  25e28b7dab3 = 2:  ec4a2422d5b patch-id: use stable patch-id for rebases
 3:  21642128927 ! 3:  81501355313 builtin: patch-id: fix patch-id with binary diffs
     @@ Commit message
          builtin: patch-id: fix patch-id with binary diffs
      
          "git patch-id" currently doesn't produce correct output if the
     -    incoming diff has any binary files. Add logic to
     -    get_one_patchid to handle the different possible styles of binary
     -    diff. This attempts to keep resulting patch-ids identical to what
     -    would be produced by the counterpart logic in diff.c, that is it
     -    produces the id by hashing the a and b oids in succession.
     +    incoming diff has any binary files. Add logic to get_one_patchid
     +    to handle the different possible styles of binary diff. This
     +    attempts to keep resulting patch-ids identical to what would be
     +    produced by the counterpart logic in diff.c, that is it produces
     +    the id by hashing the a and b oids in succession.
      
          In general we handle binary diffs by first caching the object ids from
          the "index" line and using those if we then find an indication
 4:  6e07cfd5691 = 4:  bb0b4add03c patch-id: fix patch-id for mode changes
 5:  bbaa2425ad0 ! 5:  b160f2ae49f builtin: patch-id: add --include-whitespace as a command mode
     @@ Metadata
      Author: Jerry Zhang <jerry@skydio.com>
      
       ## Commit message ##
     -    builtin: patch-id: add --include-whitespace as a command mode
     +    builtin: patch-id: add --verbatim as a command mode
      
     -    There are situations where the user might not want the default setting
     -    where patch-id strips all whitespace. They might be working in a
     -    language where white space is syntactically important, or they might
     -    have CI testing that enforces strict whitespace linting. In these cases,
     -    a whitespace change would result in the patch fundamentally changing,
     -    and thus deserving of a different id.
     +    There are situations where the user might not want the default
     +    setting where patch-id strips all whitespace. They might be working
     +    in a language where white space is syntactically important, or they
     +    might have CI testing that enforces strict whitespace linting. In
     +    these cases, a whitespace change would result in the patch
     +    fundamentally changing, and thus deserving of a different id.
      
          Add a new mode that is exclusive of --stable and --unstable called
     -    --include-whitespace. It also corresponds to the config
     -    patchid.include_whitespace = true. In this mode, the stable algorithm
     -    is used and whitespace is not stripped from the patch text.
     +    --verbatim. It also corresponds to the config
     +    patchid.verbatim = true. In this mode, the stable algorithm is
     +    used and whitespace is not stripped from the patch text.
     +
     +    Users of --unstable mainly care about compatibility with old git
     +    versions, which unstripping the whitespace would break. Thus there
     +    isn't a usecase for the combination of --verbatim and --unstable,
     +    and we don't expose this so as to not add maintainence burden.
      
          Signed-off-by: Jerry Zhang <jerry@skydio.com>
          fixes https://github.com/Skydio/revup/issues/2
     @@ Documentation/git-patch-id.txt: git-patch-id - Compute unique ID for a patch
       --------
       [verse]
      -'git patch-id' [--stable | --unstable]
     -+'git patch-id' [--stable | --unstable | --include-whitespace]
     ++'git patch-id' [--stable | --unstable | --verbatim]
       
       DESCRIPTION
       -----------
     @@ Documentation/git-patch-id.txt: This can be used to make a mapping from patch ID
       OPTIONS
       -------
       
     -+--include-whitespace::
     -+	Use the "stable" algorithm described below and also don't strip whitespace
     -+	from lines when calculating the patch-id.
     ++--verbatim::
     ++	Calculate the patch-id of the input as it is given, do not strip
     ++	any whitespace.
      +
     -+	This is the default if patchid.includeWhitespace is true and implies
     -+	patchid.stable.
     ++	This is the default if patchid.verbatim is true.
      +
       --stable::
       	Use a "stable" sum of hashes as the patch ID. With this option:
     @@ builtin/patch-id.c: static int scan_hunk_header(const char *p, int *p_before, in
       
       static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
      -			   struct strbuf *line_buf, int stable)
     -+			   struct strbuf *line_buf, int stable, int include_whitespace)
     ++			   struct strbuf *line_buf, int stable, int verbatim)
       {
       	int patchlen = 0, found_next = 0;
       	int before = -1, after = -1;
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
       		    !skip_prefix(line, "From ", &p) &&
      -		    starts_with(line, "\\ ") && 12 < strlen(line))
      +		    starts_with(line, "\\ ") && 12 < strlen(line)) {
     -+			if (include_whitespace)
     ++			if (verbatim)
      +				the_hash_algo->update_fn(&ctx, line, strlen(line));
       			continue;
      +		}
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
      -		/* Compute the sha without whitespace */
      -		len = remove_space(line);
      +		/* Add line to hash algo (possibly removing whitespace) */
     -+		len = include_whitespace ? strlen(line) : remove_space(line);
     ++		len = verbatim ? strlen(line) : remove_space(line);
       		patchlen += len;
       		the_hash_algo->update_fn(&ctx, line, len);
       	}
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
       }
       
      -static void generate_id_list(int stable)
     -+static void generate_id_list(int stable, int include_whitespace)
     ++static void generate_id_list(int stable, int verbatim)
       {
       	struct object_id oid, n, result;
       	int patchlen;
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
       	oidclr(&oid);
       	while (!feof(stdin)) {
      -		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
     -+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, include_whitespace);
     ++		patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
       		flush_current_id(patchlen, &oid, &result);
       		oidcpy(&oid, &n);
       	}
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
       }
       
      -static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
     -+static const char * const patch_id_usage[] = {
     -+	N_("git patch-id [--stable | --unstable | --include-whitespace]"),
     -+	NULL
     ++static const char *const patch_id_usage[] = {
     ++	N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
      +};
      +
      +struct patch_id_opts {
      +	int stable;
     -+	int include_whitespace;
     ++	int verbatim;
      +};
       
       static int git_patch_id_config(const char *var, const char *value, void *cb)
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
      +		opts->stable = git_config_bool(var, value);
      +		return 0;
      +	}
     -+	if (!strcmp(var, "patchid.includewhitespace")) {
     -+		opts->include_whitespace = git_config_bool(var, value);
     ++	if (!strcmp(var, "patchid.verbatim")) {
     ++		opts->verbatim = git_config_bool(var, value);
       		return 0;
       	}
       
     @@ builtin/patch-id.c: static int git_patch_id_config(const char *var, const char *
      +	int opts = 0;
      +	struct option builtin_patch_id_options[] = {
      +		OPT_CMDMODE(0, "unstable", &opts,
     -+			N_("use the unstable patch-id algorithm"), 1),
     ++		    N_("use the unstable patch-id algorithm"), 1),
      +		OPT_CMDMODE(0, "stable", &opts,
     -+			N_("use the stable patch-id algorithm"), 2),
     -+		OPT_CMDMODE(0, "include-whitespace", &opts,
     -+			N_("use the stable algorithm and don't strip whitespace"), 3),
     ++		    N_("use the stable patch-id algorithm"), 2),
     ++		OPT_CMDMODE(0, "verbatim", &opts,
     ++			N_("don't strip whitespace from the patch"), 3),
      +		OPT_END()
      +	};
      +
      +	git_config(git_patch_id_config, &config);
      +
     -+	/* includeWhitespace implies stable */
     -+	if (config.include_whitespace)
     ++	/* verbatim implies stable */
     ++	if (config.verbatim)
      +		config.stable = 1;
      +
      +	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
      +			     patch_id_usage, 0);
      +
      +	generate_id_list(opts ? opts > 1 : config.stable,
     -+			 opts ? opts == 3 : config.include_whitespace);
     ++			 opts ? opts == 3 : config.verbatim);
       	return 0;
       }
      
     @@ t/t4204-patch-id.sh: test_expect_success 'file order is relevant with --unstable
       	test_patch_id_file_order relevant --unstable --unstable
       '
       
     -+test_expect_success 'whitespace is relevant with --include-whitespace' '
     -+	test_patch_id_whitespace relevant --include-whitespace --include-whitespace
     ++test_expect_success 'whitespace is relevant with --verbatim' '
     ++	test_patch_id_whitespace relevant --verbatim --verbatim
      +'
      +
     -+test_expect_success 'whitespace is irrelevant without --include-whitespace' '
     ++test_expect_success 'whitespace is irrelevant without --verbatim' '
      +	test_patch_id_whitespace irrelevant --stable --stable
      +'
      +
     @@ t/t4204-patch-id.sh: test_expect_success 'patchid.stable = false is unstable' '
       	test_patch_id relevant patchid.stable=false
       '
       
     -+test_expect_success 'patchid.includeWhitespace = true is correct and stable' '
     -+	test_config patchid.includeWhitespace true &&
     -+	test_patch_id_whitespace relevant patchid.includeWhitespace=true &&
     -+	test_patch_id irrelevant patchid.includeWhitespace=true
     ++test_expect_success 'patchid.verbatim = true is correct and stable' '
     ++	test_config patchid.verbatim true &&
     ++	test_patch_id_whitespace relevant patchid.verbatim=true &&
     ++	test_patch_id irrelevant patchid.verbatim=true
      +'
      +
     -+test_expect_success 'patchid.includeWhitespace = false is unstable' '
     -+	test_config patchid.includeWhitespace false &&
     -+	test_patch_id relevant patchid.includeWhitespace=false
     ++test_expect_success 'patchid.verbatim = false is unstable' '
     ++	test_config patchid.verbatim false &&
     ++	test_patch_id relevant patchid.verbatim=false
      +'
      +
       test_expect_success '--unstable overrides patchid.stable = true' '
     @@ t/t4204-patch-id.sh: test_expect_success '--stable overrides patchid.stable = fa
       	test_patch_id irrelevant patchid.stable=false--stable --stable
       '
       
     -+test_expect_success '--include-whitespace overrides patchid.stable = false' '
     ++test_expect_success '--verbatim overrides patchid.stable = false' '
      +	test_config patchid.stable false &&
     -+	test_patch_id_whitespace relevant stable=false--include-whitespace --include-whitespace
     ++	test_patch_id_whitespace relevant stable=false--verbatim --verbatim
      +'
      +
       test_expect_success 'patch-id supports git-format-patch MIME output' '
     @@ t/t4204-patch-id.sh: test_expect_success 'patch-id handles no-nl-at-eof markers'
       	calc_patch_id withnl <withnl &&
      -	test_cmp patch-id_nonl patch-id_withnl
      +	test_cmp patch-id_nonl patch-id_withnl &&
     -+	calc_patch_id nonl-inc-ws --include-whitespace <nonl &&
     -+	calc_patch_id withnl-inc-ws --include-whitespace <withnl &&
     ++	calc_patch_id nonl-inc-ws --verbatim <nonl &&
     ++	calc_patch_id withnl-inc-ws --verbatim <withnl &&
      +	! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
       '
       
 6:  a1f6f36d487 ! 6:  dcdfac7a153 builtin: patch-id: remove unused diff-tree prefix
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
      +		if (!skip_prefix(line, "commit ", &p) &&
       		    !skip_prefix(line, "From ", &p) &&
       		    starts_with(line, "\\ ") && 12 < strlen(line)) {
     - 			if (include_whitespace)
     + 			if (verbatim)
 7:  69440797f30 < -:  ----------- documentation: format-patch: clarify requirements for patch-ids to match

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16       ` Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Patch-ids for binary patches are found by hashing the object
ids of the before and after objects in succession. However in
the --stable case, there is a bug where hunks are not flushed
for binary and header-only patch ids, which would always result
in a patch-id of 0000. The --unstable case is currently correct.

Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.

Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond
to those lines not being present in the "git diff" text output.
This is necessary because we advertise that the patch-id calculated
internally and used in format-patch is the same that what the
builtin "git patch-id" would produce when piped from a diff.

Update the test to run on both binary and normal files.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 diff.c                     | 58 +++++++++++++++++++-------------------
 t/t3419-rebase-patch-id.sh | 34 +++++++++++++++-------
 2 files changed, 53 insertions(+), 39 deletions(-)

diff --git a/diff.c b/diff.c
index 648f6717a55..c15169e4b06 100644
--- a/diff.c
+++ b/diff.c
@@ -6253,46 +6253,46 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 		if (p->one->mode == 0) {
 			patch_id_add_string(&ctx, "newfilemode");
 			patch_id_add_mode(&ctx, p->two->mode);
-			patch_id_add_string(&ctx, "---/dev/null");
-			patch_id_add_string(&ctx, "+++b/");
-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		} else if (p->two->mode == 0) {
 			patch_id_add_string(&ctx, "deletedfilemode");
 			patch_id_add_mode(&ctx, p->one->mode);
-			patch_id_add_string(&ctx, "---a/");
-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
-			patch_id_add_string(&ctx, "+++/dev/null");
-		} else {
-			patch_id_add_string(&ctx, "---a/");
-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
-			patch_id_add_string(&ctx, "+++b/");
-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		}
 
-		if (diff_header_only)
-			continue;
-
-		if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
-		    fill_mmfile(options->repo, &mf2, p->two) < 0)
-			return error("unable to read files to diff");
-
-		if (diff_filespec_is_binary(options->repo, p->one) ||
+		if (diff_header_only) {
+			/* don't do anything since we're only populating header info */
+		} else if (diff_filespec_is_binary(options->repo, p->one) ||
 		    diff_filespec_is_binary(options->repo, p->two)) {
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid),
 					the_hash_algo->hexsz);
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid),
 					the_hash_algo->hexsz);
-			continue;
-		}
-
-		xpp.flags = 0;
-		xecfg.ctxlen = 3;
-		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
-		if (xdi_diff_outf(&mf1, &mf2, NULL,
-				  patch_id_consume, &data, &xpp, &xecfg))
-			return error("unable to generate patch-id diff for %s",
-				     p->one->path);
+		} else {
+			if (p->one->mode == 0) {
+				patch_id_add_string(&ctx, "---/dev/null");
+				patch_id_add_string(&ctx, "+++b/");
+				the_hash_algo->update_fn(&ctx, p->two->path, len2);
+			} else if (p->two->mode == 0) {
+				patch_id_add_string(&ctx, "---a/");
+				the_hash_algo->update_fn(&ctx, p->one->path, len1);
+				patch_id_add_string(&ctx, "+++/dev/null");
+			} else {
+				patch_id_add_string(&ctx, "---a/");
+				the_hash_algo->update_fn(&ctx, p->one->path, len1);
+				patch_id_add_string(&ctx, "+++b/");
+				the_hash_algo->update_fn(&ctx, p->two->path, len2);
+			}
 
+			if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
+			    fill_mmfile(options->repo, &mf2, p->two) < 0)
+				return error("unable to read files to diff");
+			xpp.flags = 0;
+			xecfg.ctxlen = 3;
+			xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+			if (xdi_diff_outf(&mf1, &mf2, NULL,
+					  patch_id_consume, &data, &xpp, &xecfg))
+				return error("unable to generate patch-id diff for %s",
+					     p->one->path);
+		}
 		if (stable)
 			flush_one_hunk(oid, &ctx);
 	}
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index 295040f2fe3..d24e55aac8d 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -43,15 +43,16 @@ test_expect_success 'setup: 500 lines' '
 	git add newfile &&
 	git commit -q -m "add small file" &&
 
-	git cherry-pick main >/dev/null 2>&1
-'
+	git cherry-pick main >/dev/null 2>&1 &&
 
-test_expect_success 'setup attributes' '
-	echo "file binary" >.gitattributes
+	git branch -f squashed main &&
+	git checkout -q -f squashed &&
+	git reset -q --soft HEAD~2 &&
+	git commit -q -m squashed
 '
 
 test_expect_success 'detect upstream patch' '
-	git checkout -q main &&
+	git checkout -q main^{} &&
 	scramble file &&
 	git add file &&
 	git commit -q -m "change big file again" &&
@@ -61,14 +62,27 @@ test_expect_success 'detect upstream patch' '
 	test_must_be_empty revs
 '
 
+test_expect_success 'detect upstream patch binary' '
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	git rebase main &&
+	git rev-list main...HEAD~ >revs &&
+	test_must_be_empty revs &&
+	test_when_finished "rm .gitattributes"
+'
+
 test_expect_success 'do not drop patch' '
-	git branch -f squashed main &&
-	git checkout -q -f squashed &&
-	git reset -q --soft HEAD~2 &&
-	git commit -q -m squashed &&
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
-	git rebase --quit
+	test_when_finished "git rebase --abort"
+'
+
+test_expect_success 'do not drop patch binary' '
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	test_must_fail git rebase squashed &&
+	test_when_finished "git rebase --abort" &&
+	test_when_finished "rm .gitattributes"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 2/6] patch-id: use stable patch-id for rebases
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16       ` Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant. Use the stable
logic for all internal patch-id calculations to minimize the number of
code paths and improve test coverage.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 builtin/log.c |  2 +-
 diff.c        | 12 ++++--------
 diff.h        |  2 +-
 patch-ids.c   | 10 +++++-----
 patch-ids.h   |  2 +-
 5 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d450..e72869afb36 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1763,7 +1763,7 @@ static void prepare_bases(struct base_tree_info *bases,
 		struct object_id *patch_id;
 		if (*commit_base_at(&commit_base, commit))
 			continue;
-		if (commit_patch_id(commit, &diffopt, &oid, 0, 1))
+		if (commit_patch_id(commit, &diffopt, &oid, 0))
 			die(_("cannot get patch id"));
 		ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, bases->alloc_patch_id);
 		patch_id = bases->patch_id + bases->nr_patch_id;
diff --git a/diff.c b/diff.c
index c15169e4b06..199b63dbcc3 100644
--- a/diff.c
+++ b/diff.c
@@ -6206,7 +6206,7 @@ static void patch_id_add_mode(git_hash_ctx *ctx, unsigned mode)
 }
 
 /* returns 0 upon success, and writes result into oid */
-static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
@@ -6293,21 +6293,17 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 				return error("unable to generate patch-id diff for %s",
 					     p->one->path);
 		}
-		if (stable)
-			flush_one_hunk(oid, &ctx);
+		flush_one_hunk(oid, &ctx);
 	}
 
-	if (!stable)
-		the_hash_algo->final_oid_fn(oid, &ctx);
-
 	return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
-	int result = diff_get_patch_id(options, oid, diff_header_only, stable);
+	int result = diff_get_patch_id(options, oid, diff_header_only);
 
 	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..fd33caeb25d 100644
--- a/diff.h
+++ b/diff.h
@@ -634,7 +634,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option);
 int run_diff_index(struct rev_info *revs, unsigned int option);
 
 int do_diff_cache(const struct object_id *, struct diff_options *);
-int diff_flush_patch_id(struct diff_options *, struct object_id *, int, int);
+int diff_flush_patch_id(struct diff_options *, struct object_id *, int);
 void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx);
 
 int diff_result_code(struct diff_options *, int);
diff --git a/patch-ids.c b/patch-ids.c
index 46c6a8f3eab..31534466266 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -11,7 +11,7 @@ static int patch_id_defined(struct commit *commit)
 }
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int diff_header_only, int stable)
+		    struct object_id *oid, int diff_header_only)
 {
 	if (!patch_id_defined(commit))
 		return -1;
@@ -22,7 +22,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
 	else
 		diff_root_tree_oid(&commit->object.oid, "", options);
 	diffcore_std(options);
-	return diff_flush_patch_id(options, oid, diff_header_only, stable);
+	return diff_flush_patch_id(options, oid, diff_header_only);
 }
 
 /*
@@ -48,11 +48,11 @@ static int patch_id_neq(const void *cmpfn_data,
 	b = container_of(entry_or_key, struct patch_id, ent);
 
 	if (is_null_oid(&a->patch_id) &&
-	    commit_patch_id(a->commit, opt, &a->patch_id, 0, 0))
+	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&a->commit->object.oid));
 	if (is_null_oid(&b->patch_id) &&
-	    commit_patch_id(b->commit, opt, &b->patch_id, 0, 0))
+	    commit_patch_id(b->commit, opt, &b->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&b->commit->object.oid));
 	return !oideq(&a->patch_id, &b->patch_id);
@@ -82,7 +82,7 @@ static int init_patch_id_entry(struct patch_id *patch,
 	struct object_id header_only_patch_id;
 
 	patch->commit = commit;
-	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1, 0))
+	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1))
 		return -1;
 
 	hashmap_entry_init(&patch->ent, oidhash(&header_only_patch_id));
diff --git a/patch-ids.h b/patch-ids.h
index ab6c6a68047..490d7393716 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -20,7 +20,7 @@ struct patch_ids {
 };
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int, int);
+		    struct object_id *oid, int);
 int init_patch_ids(struct repository *, struct patch_ids *);
 int free_patch_ids(struct patch_ids *);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16       ` Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to get_one_patchid
to handle the different possible styles of binary diff. This
attempts to keep resulting patch-ids identical to what would be
produced by the counterpart logic in diff.c, that is it produces
the id by hashing the a and b oids in succession.

In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.

The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.

When the diff is generated with --full-index there is no patch content
to skip over.

When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 builtin/patch-id.c  | 36 ++++++++++++++++++++++++++++++++++--
 t/t4204-patch-id.sh | 29 ++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 881fcf32732..e7a31123142 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -61,6 +61,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
+	int diff_is_binary = 0;
+	char pre_oid_str[GIT_MAX_HEXSZ + 1], post_oid_str[GIT_MAX_HEXSZ + 1];
 	git_hash_ctx ctx;
 
 	the_hash_algo->init_fn(&ctx);
@@ -88,14 +90,44 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 
 		/* Parsing diff header?  */
 		if (before == -1) {
-			if (starts_with(line, "index "))
+			if (starts_with(line, "GIT binary patch") ||
+			    starts_with(line, "Binary files")) {
+				diff_is_binary = 1;
+				before = 0;
+				the_hash_algo->update_fn(&ctx, pre_oid_str,
+							 strlen(pre_oid_str));
+				the_hash_algo->update_fn(&ctx, post_oid_str,
+							 strlen(post_oid_str));
+				if (stable)
+					flush_one_hunk(result, &ctx);
 				continue;
-			else if (starts_with(line, "--- "))
+			} else if (skip_prefix(line, "index ", &p)) {
+				char *oid1_end = strstr(line, "..");
+				char *oid2_end = NULL;
+				if (oid1_end)
+					oid2_end = strstr(oid1_end, " ");
+				if (!oid2_end)
+					oid2_end = line + strlen(line) - 1;
+				if (oid1_end != NULL && oid2_end != NULL) {
+					*oid1_end = *oid2_end = '\0';
+					strlcpy(pre_oid_str, p, GIT_MAX_HEXSZ + 1);
+					strlcpy(post_oid_str, oid1_end + 2, GIT_MAX_HEXSZ + 1);
+				}
+				continue;
+			} else if (starts_with(line, "--- "))
 				before = after = 1;
 			else if (!isalpha(line[0]))
 				break;
 		}
 
+		if (diff_is_binary) {
+			if (starts_with(line, "diff ")) {
+				diff_is_binary = 0;
+				before = -1;
+			}
+			continue;
+		}
+
 		/* Looking for a valid hunk header?  */
 		if (before == 0 && after == 0) {
 			if (starts_with(line, "@@ -")) {
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a730c0db985..cdc5191aa8d 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -42,7 +42,7 @@ calc_patch_id () {
 }
 
 get_top_diff () {
-	git log -p -1 "$@" -O bar-then-foo --
+	git log -p -1 "$@" -O bar-then-foo --full-index --
 }
 
 get_patch_id () {
@@ -61,6 +61,33 @@ test_expect_success 'patch-id detects inequality' '
 	get_patch_id notsame &&
 	! test_cmp patch-id_main patch-id_notsame
 '
+test_expect_success 'patch-id detects equality binary' '
+	cat >.gitattributes <<-\EOF &&
+	foo binary
+	bar binary
+	EOF
+	get_patch_id main &&
+	get_patch_id same &&
+	git log -p -1 --binary main >top-diff.output &&
+	calc_patch_id <top-diff.output main_binpatch &&
+	git log -p -1 --binary same >top-diff.output &&
+	calc_patch_id <top-diff.output same_binpatch &&
+	test_cmp patch-id_main patch-id_main_binpatch &&
+	test_cmp patch-id_same patch-id_same_binpatch &&
+	test_cmp patch-id_main patch-id_same &&
+	test_when_finished "rm .gitattributes"
+'
+
+test_expect_success 'patch-id detects inequality binary' '
+	cat >.gitattributes <<-\EOF &&
+	foo binary
+	bar binary
+	EOF
+	get_patch_id main &&
+	get_patch_id notsame &&
+	! test_cmp patch-id_main patch-id_notsame &&
+	test_when_finished "rm .gitattributes"
+'
 
 test_expect_success 'patch-id supports git-format-patch output' '
 	get_patch_id main &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 4/6] patch-id: fix patch-id for mode changes
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-10-20 23:16       ` [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16       ` Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

Currently patch-id as used in rebase and cherry-pick does not account
for file modes if the file is modified. One consequence of this is
that if you have a local patch that changes modes, but upstream
has applied an outdated version of the patch that doesn't include
that mode change, "git rebase" will drop your local version of the
patch along with your mode changes. It also means that internal
patch-id doesn't produce the same output as the builtin, which does
account for mode changes due to them being part of diff output.

Fix by adding mode to the patch-id if it has changed, in the same
format that would be produced by diff, so that it is compatible
with builtin patch-id.

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 diff.c                     |  5 +++++
 t/t3419-rebase-patch-id.sh | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 199b63dbcc3..0e336c48560 100644
--- a/diff.c
+++ b/diff.c
@@ -6256,6 +6256,11 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 		} else if (p->two->mode == 0) {
 			patch_id_add_string(&ctx, "deletedfilemode");
 			patch_id_add_mode(&ctx, p->one->mode);
+		} else if (p->one->mode != p->two->mode) {
+			patch_id_add_string(&ctx, "oldmode");
+			patch_id_add_mode(&ctx, p->one->mode);
+			patch_id_add_string(&ctx, "newmode");
+			patch_id_add_mode(&ctx, p->two->mode);
 		}
 
 		if (diff_header_only) {
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index d24e55aac8d..7181f176b81 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -48,7 +48,17 @@ test_expect_success 'setup: 500 lines' '
 	git branch -f squashed main &&
 	git checkout -q -f squashed &&
 	git reset -q --soft HEAD~2 &&
-	git commit -q -m squashed
+	git commit -q -m squashed &&
+
+	git branch -f mode main &&
+	git checkout -q -f mode &&
+	test_chmod +x file &&
+	git commit -q -a --amend &&
+
+	git branch -f modeother other &&
+	git checkout -q -f modeother &&
+	test_chmod +x file &&
+	git commit -q -a --amend
 '
 
 test_expect_success 'detect upstream patch' '
@@ -71,6 +81,13 @@ test_expect_success 'detect upstream patch binary' '
 	test_when_finished "rm .gitattributes"
 '
 
+test_expect_success 'detect upstream patch modechange' '
+	git checkout -q modeother^{} &&
+	git rebase mode &&
+	git rev-list mode...HEAD~ >revs &&
+	test_must_be_empty revs
+'
+
 test_expect_success 'do not drop patch' '
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
@@ -85,4 +102,16 @@ test_expect_success 'do not drop patch binary' '
 	test_when_finished "rm .gitattributes"
 '
 
+test_expect_success 'do not drop patch modechange' '
+	git checkout -q modeother^{} &&
+	git rebase other &&
+	cat >expected <<-\EOF &&
+	diff --git a/file b/file
+	old mode 100644
+	new mode 100755
+	EOF
+	git diff HEAD~ >modediff &&
+	test_cmp expected modediff
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-10-20 23:16       ` [PATCH v4 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16       ` Jerry Zhang via GitGitGadget
  2022-10-20 23:16       ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
---
 Documentation/git-patch-id.txt | 24 +++++++----
 builtin/patch-id.c             | 73 ++++++++++++++++++++++------------
 t/t4204-patch-id.sh            | 66 +++++++++++++++++++++++++++---
 3 files changed, 124 insertions(+), 39 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 442caff8a9c..1d15fa45d51 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,18 +8,18 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 --------
 [verse]
-'git patch-id' [--stable | --unstable]
+'git patch-id' [--stable | --unstable | --verbatim]
 
 DESCRIPTION
 -----------
 Read a patch from the standard input and compute the patch ID for it.
 
 A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
-patch, with whitespace and line numbers ignored.  As such, it's "reasonably
-stable", but at the same time also reasonably unique, i.e., two patches that
-have the same "patch ID" are almost guaranteed to be the same thing.
+patch, with line numbers ignored.  As such, it's "reasonably stable", but at
+the same time also reasonably unique, i.e., two patches that have the same
+"patch ID" are almost guaranteed to be the same thing.
 
-IOW, you can use this thing to look for likely duplicate commits.
+The main usecase for this command is to look for likely duplicate commits.
 
 When dealing with 'git diff-tree' output, it takes advantage of
 the fact that the patch is prefixed with the object name of the
@@ -30,6 +30,12 @@ This can be used to make a mapping from patch ID to commit ID.
 OPTIONS
 -------
 
+--verbatim::
+	Calculate the patch-id of the input as it is given, do not strip
+	any whitespace.
+
+	This is the default if patchid.verbatim is true.
+
 --stable::
 	Use a "stable" sum of hashes as the patch ID. With this option:
 	 - Reordering file diffs that make up a patch does not affect the ID.
@@ -45,14 +51,16 @@ OPTIONS
 	   of "-O<orderfile>", thereby making existing databases storing such
 	   "unstable" or historical patch-ids unusable.
 
+	 - All whitespace within the patch is ignored and does not affect the id.
+
 	This is the default if patchid.stable is set to true.
 
 --unstable::
 	Use an "unstable" hash as the patch ID. With this option,
 	the result produced is compatible with the patch-id value produced
-	by git 1.9 and older.  Users with pre-existing databases storing
-	patch-ids produced by git 1.9 and older (who do not deal with reordered
-	patches) may want to use this option.
+	by git 1.9 and older and whitespace is ignored.  Users with pre-existing
+	databases storing patch-ids produced by git 1.9 and older (who do not deal
+	with reordered patches) may want to use this option.
 
 	This is the default.
 
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index e7a31123142..afdd472369f 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -2,6 +2,7 @@
 #include "builtin.h"
 #include "config.h"
 #include "diff.h"
+#include "parse-options.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -57,7 +58,7 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 }
 
 static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
-			   struct strbuf *line_buf, int stable)
+			   struct strbuf *line_buf, int stable, int verbatim)
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
@@ -76,8 +77,11 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (!skip_prefix(line, "diff-tree ", &p) &&
 		    !skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
-		    starts_with(line, "\\ ") && 12 < strlen(line))
+		    starts_with(line, "\\ ") && 12 < strlen(line)) {
+			if (verbatim)
+				the_hash_algo->update_fn(&ctx, line, strlen(line));
 			continue;
+		}
 
 		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
@@ -152,8 +156,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (line[0] == '+' || line[0] == ' ')
 			after--;
 
-		/* Compute the sha without whitespace */
-		len = remove_space(line);
+		/* Add line to hash algo (possibly removing whitespace) */
+		len = verbatim ? strlen(line) : remove_space(line);
 		patchlen += len;
 		the_hash_algo->update_fn(&ctx, line, len);
 	}
@@ -166,7 +170,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 	return patchlen;
 }
 
-static void generate_id_list(int stable)
+static void generate_id_list(int stable, int verbatim)
 {
 	struct object_id oid, n, result;
 	int patchlen;
@@ -174,21 +178,32 @@ static void generate_id_list(int stable)
 
 	oidclr(&oid);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
 		flush_current_id(patchlen, &oid, &result);
 		oidcpy(&oid, &n);
 	}
 	strbuf_release(&line_buf);
 }
 
-static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
+static const char *const patch_id_usage[] = {
+	N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
+};
+
+struct patch_id_opts {
+	int stable;
+	int verbatim;
+};
 
 static int git_patch_id_config(const char *var, const char *value, void *cb)
 {
-	int *stable = cb;
+	struct patch_id_opts *opts = cb;
 
 	if (!strcmp(var, "patchid.stable")) {
-		*stable = git_config_bool(var, value);
+		opts->stable = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "patchid.verbatim")) {
+		opts->verbatim = git_config_bool(var, value);
 		return 0;
 	}
 
@@ -197,21 +212,29 @@ static int git_patch_id_config(const char *var, const char *value, void *cb)
 
 int cmd_patch_id(int argc, const char **argv, const char *prefix)
 {
-	int stable = -1;
-
-	git_config(git_patch_id_config, &stable);
-
-	/* If nothing is set, default to unstable. */
-	if (stable < 0)
-		stable = 0;
-
-	if (argc == 2 && !strcmp(argv[1], "--stable"))
-		stable = 1;
-	else if (argc == 2 && !strcmp(argv[1], "--unstable"))
-		stable = 0;
-	else if (argc != 1)
-		usage(patch_id_usage);
-
-	generate_id_list(stable);
+	/* if nothing is set, default to unstable */
+	struct patch_id_opts config = {0, 0};
+	int opts = 0;
+	struct option builtin_patch_id_options[] = {
+		OPT_CMDMODE(0, "unstable", &opts,
+		    N_("use the unstable patch-id algorithm"), 1),
+		OPT_CMDMODE(0, "stable", &opts,
+		    N_("use the stable patch-id algorithm"), 2),
+		OPT_CMDMODE(0, "verbatim", &opts,
+			N_("don't strip whitespace from the patch"), 3),
+		OPT_END()
+	};
+
+	git_config(git_patch_id_config, &config);
+
+	/* verbatim implies stable */
+	if (config.verbatim)
+		config.stable = 1;
+
+	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
+			     patch_id_usage, 0);
+
+	generate_id_list(opts ? opts > 1 : config.stable,
+			 opts ? opts == 3 : config.verbatim);
 	return 0;
 }
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index cdc5191aa8d..a7fa94ce0a2 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -8,13 +8,13 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-	as="a a a a a a a a" && # eight a
-	test_write_lines $as >foo &&
-	test_write_lines $as >bar &&
+	str="ab cd ef gh ij kl mn op" &&
+	test_write_lines $str >foo &&
+	test_write_lines $str >bar &&
 	git add foo bar &&
 	git commit -a -m initial &&
-	test_write_lines $as b >foo &&
-	test_write_lines $as b >bar &&
+	test_write_lines $str b >foo &&
+	test_write_lines $str b >bar &&
 	git commit -a -m first &&
 	git checkout -b same main &&
 	git commit --amend -m same-msg &&
@@ -22,8 +22,23 @@ test_expect_success 'setup' '
 	echo c >foo &&
 	echo c >bar &&
 	git commit --amend -a -m notsame-msg &&
+	git checkout -b with_space main~ &&
+	cat >foo <<-\EOF &&
+	a  b
+	c d
+	e    f
+	  g   h
+	    i   j
+	k l
+	m   n
+	op
+	EOF
+	cp foo bar &&
+	git add foo bar &&
+	git commit --amend -m "with spaces" &&
 	test_write_lines bar foo >bar-then-foo &&
 	test_write_lines foo bar >foo-then-bar
+
 '
 
 test_expect_success 'patch-id output is well-formed' '
@@ -128,9 +143,21 @@ test_patch_id_file_order () {
 	git format-patch -1 --stdout -O foo-then-bar >format-patch.output &&
 	calc_patch_id <format-patch.output "ordered-$name" "$@" &&
 	cmp_patch_id $relevant "$name" "ordered-$name"
+}
 
+test_patch_id_whitespace () {
+	relevant="$1"
+	shift
+	name="ws-${1}-$relevant"
+	shift
+	get_top_diff "main~" >top-diff.output &&
+	calc_patch_id <top-diff.output "$name" "$@" &&
+	get_top_diff "with_space" >top-diff.output &&
+	calc_patch_id <top-diff.output "ws-$name" "$@" &&
+	cmp_patch_id $relevant "$name" "ws-$name"
 }
 
+
 # combined test for options: add more tests here to make them
 # run with all options
 test_patch_id () {
@@ -146,6 +173,14 @@ test_expect_success 'file order is relevant with --unstable' '
 	test_patch_id_file_order relevant --unstable --unstable
 '
 
+test_expect_success 'whitespace is relevant with --verbatim' '
+	test_patch_id_whitespace relevant --verbatim --verbatim
+'
+
+test_expect_success 'whitespace is irrelevant without --verbatim' '
+	test_patch_id_whitespace irrelevant --stable --stable
+'
+
 #Now test various option combinations.
 test_expect_success 'default is unstable' '
 	test_patch_id relevant default
@@ -161,6 +196,17 @@ test_expect_success 'patchid.stable = false is unstable' '
 	test_patch_id relevant patchid.stable=false
 '
 
+test_expect_success 'patchid.verbatim = true is correct and stable' '
+	test_config patchid.verbatim true &&
+	test_patch_id_whitespace relevant patchid.verbatim=true &&
+	test_patch_id irrelevant patchid.verbatim=true
+'
+
+test_expect_success 'patchid.verbatim = false is unstable' '
+	test_config patchid.verbatim false &&
+	test_patch_id relevant patchid.verbatim=false
+'
+
 test_expect_success '--unstable overrides patchid.stable = true' '
 	test_config patchid.stable true &&
 	test_patch_id relevant patchid.stable=true--unstable --unstable
@@ -171,6 +217,11 @@ test_expect_success '--stable overrides patchid.stable = false' '
 	test_patch_id irrelevant patchid.stable=false--stable --stable
 '
 
+test_expect_success '--verbatim overrides patchid.stable = false' '
+	test_config patchid.stable false &&
+	test_patch_id_whitespace relevant stable=false--verbatim --verbatim
+'
+
 test_expect_success 'patch-id supports git-format-patch MIME output' '
 	get_patch_id main &&
 	git checkout same &&
@@ -225,7 +276,10 @@ test_expect_success 'patch-id handles no-nl-at-eof markers' '
 	EOF
 	calc_patch_id nonl <nonl &&
 	calc_patch_id withnl <withnl &&
-	test_cmp patch-id_nonl patch-id_withnl
+	test_cmp patch-id_nonl patch-id_withnl &&
+	calc_patch_id nonl-inc-ws --verbatim <nonl &&
+	calc_patch_id withnl-inc-ws --verbatim <withnl &&
+	! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
 '
 
 test_expect_success 'patch-id handles diffs with one line of before/after' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-10-20 23:16       ` [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
@ 2022-10-20 23:16       ` Jerry Zhang via GitGitGadget
  2022-10-21  9:33         ` Junio C Hamano
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  6 siblings, 1 reply; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-20 23:16 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

From a "git grep" of the repo, no command, including diff-tree itself,
produces diff output with "diff-tree " prefixed in the header.

Thus remove its handling in "patch-id".

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 builtin/patch-id.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index afdd472369f..f840fbf1c7e 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -74,8 +74,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		const char *p = line;
 		int len;
 
-		if (!skip_prefix(line, "diff-tree ", &p) &&
-		    !skip_prefix(line, "commit ", &p) &&
+		/* Possibly skip over the prefix added by "log" or "format-patch" */
+		if (!skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
 		    starts_with(line, "\\ ") && 12 < strlen(line)) {
 			if (verbatim)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix
  2022-10-20 23:16       ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
@ 2022-10-21  9:33         ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-21  9:33 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jerry Zhang <Jerry@skydio.com>
>
> From a "git grep" of the repo, no command, including diff-tree itself,
> produces diff output with "diff-tree " prefixed in the header.
>
> Thus remove its handling in "patch-id".

There is a bit of leap in the logic flow here, in that the current
state alone does not justify such a removal of the code that is not
hurting anybody.  I thought I did the necessary homework the last
time to help you update the proposed log message for this step with
necessary due diligence, like when we stopped producing it
ourselves.  The lack of third-party tools still relying on the code
we are removing here is not something we can prove easily, so
documenting that we go by faith there would not hurt, either.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v5 0/6] patch-id fixes and improvements
  2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-10-20 23:16       ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07       ` Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
                           ` (6 more replies)
  6 siblings, 7 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang

These patches add fixes and features to the "git patch-id" command, mostly
discovered through our usage of patch-id in the revup project
(https://github.com/Skydio/revup). On top of that I've tried to make general
cleanup changes where I can.

Summary:

1: Fixed a bug in the combination of --stable with binary files and
header-only, and expanded the test to cover both binary and non-binary
files.

2: Switch internal usage of patch-id in rebase / cherry-pick to use the
stable variant to reduce the number of code paths and improve testing for
bugs like above.

3: Fixed bugs with patch-id and binary diffs. Previously patch-id did not
behave correctly for binary diffs regardless of whether "--binary" was given
to "diff".

4: Fixed bugs with patch-id and mode changes. Previously mode changes were
incorrectly excluded from the patch-id.

5: Add a new "--include-whitespace" mode to patch-id that prevents
whitespace from being stripped during id calculation. Also add a config
option for the same behavior.

6: Remove unused prefix from patch-id logic.

V1->V2: Fixed comment style V2->V3: The ---/+++ lines no longer get added to
the patch-id of binary diffs. Also added patches 3-7 in the series. V3->V4:
Dropped patch7. Updated flag name to --verbatim. Updated commit message
descriptions. V4->V5: Updated commit message for patch 6.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (6):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases
  builtin: patch-id: fix patch-id with binary diffs
  patch-id: fix patch-id for mode changes
  builtin: patch-id: add --verbatim as a command mode
  builtin: patch-id: remove unused diff-tree prefix

 Documentation/git-patch-id.txt |  24 ++++---
 builtin/log.c                  |   2 +-
 builtin/patch-id.c             | 113 ++++++++++++++++++++++++---------
 diff.c                         |  75 +++++++++++-----------
 diff.h                         |   2 +-
 patch-ids.c                    |  10 +--
 patch-ids.h                    |   2 +-
 t/t3419-rebase-patch-id.sh     |  63 +++++++++++++++---
 t/t4204-patch-id.sh            |  95 +++++++++++++++++++++++++--
 9 files changed, 287 insertions(+), 99 deletions(-)


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1359

Range-diff vs v4:

 1:  321757ef919 = 1:  321757ef919 patch-id: fix stable patch id for binary / header-only
 2:  ec4a2422d5b = 2:  ec4a2422d5b patch-id: use stable patch-id for rebases
 3:  81501355313 = 3:  81501355313 builtin: patch-id: fix patch-id with binary diffs
 4:  bb0b4add03c = 4:  bb0b4add03c patch-id: fix patch-id for mode changes
 5:  b160f2ae49f = 5:  b160f2ae49f builtin: patch-id: add --verbatim as a command mode
 6:  dcdfac7a153 ! 6:  eef2a32f008 builtin: patch-id: remove unused diff-tree prefix
     @@ Metadata
       ## Commit message ##
          builtin: patch-id: remove unused diff-tree prefix
      
     -    From a "git grep" of the repo, no command, including diff-tree itself,
     -    produces diff output with "diff-tree " prefixed in the header.
     +    The last git version that had "diff-tree" in the header text
     +    of "git diff-tree" output was v1.3.0 from 2006. The header text
     +    was changed from "diff-tree" to "commit" in 91539833
     +    ("Log message printout cleanups").
      
     -    Thus remove its handling in "patch-id".
     +    Given how long ago this change was made, it is highly unlikely that
     +    anyone is still feeding in outputs from that git version.
     +
     +    Remove the handling of the "diff-tree" prefix and document the
     +    source of the other prefixes so that the overall functionality
     +    is more clear.
      
          Signed-off-by: Jerry Zhang <Jerry@skydio.com>
      

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07         ` Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Patch-ids for binary patches are found by hashing the object
ids of the before and after objects in succession. However in
the --stable case, there is a bug where hunks are not flushed
for binary and header-only patch ids, which would always result
in a patch-id of 0000. The --unstable case is currently correct.

Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.

Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond
to those lines not being present in the "git diff" text output.
This is necessary because we advertise that the patch-id calculated
internally and used in format-patch is the same that what the
builtin "git patch-id" would produce when piped from a diff.

Update the test to run on both binary and normal files.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 diff.c                     | 58 +++++++++++++++++++-------------------
 t/t3419-rebase-patch-id.sh | 34 +++++++++++++++-------
 2 files changed, 53 insertions(+), 39 deletions(-)

diff --git a/diff.c b/diff.c
index 648f6717a55..c15169e4b06 100644
--- a/diff.c
+++ b/diff.c
@@ -6253,46 +6253,46 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 		if (p->one->mode == 0) {
 			patch_id_add_string(&ctx, "newfilemode");
 			patch_id_add_mode(&ctx, p->two->mode);
-			patch_id_add_string(&ctx, "---/dev/null");
-			patch_id_add_string(&ctx, "+++b/");
-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		} else if (p->two->mode == 0) {
 			patch_id_add_string(&ctx, "deletedfilemode");
 			patch_id_add_mode(&ctx, p->one->mode);
-			patch_id_add_string(&ctx, "---a/");
-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
-			patch_id_add_string(&ctx, "+++/dev/null");
-		} else {
-			patch_id_add_string(&ctx, "---a/");
-			the_hash_algo->update_fn(&ctx, p->one->path, len1);
-			patch_id_add_string(&ctx, "+++b/");
-			the_hash_algo->update_fn(&ctx, p->two->path, len2);
 		}
 
-		if (diff_header_only)
-			continue;
-
-		if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
-		    fill_mmfile(options->repo, &mf2, p->two) < 0)
-			return error("unable to read files to diff");
-
-		if (diff_filespec_is_binary(options->repo, p->one) ||
+		if (diff_header_only) {
+			/* don't do anything since we're only populating header info */
+		} else if (diff_filespec_is_binary(options->repo, p->one) ||
 		    diff_filespec_is_binary(options->repo, p->two)) {
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid),
 					the_hash_algo->hexsz);
 			the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid),
 					the_hash_algo->hexsz);
-			continue;
-		}
-
-		xpp.flags = 0;
-		xecfg.ctxlen = 3;
-		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
-		if (xdi_diff_outf(&mf1, &mf2, NULL,
-				  patch_id_consume, &data, &xpp, &xecfg))
-			return error("unable to generate patch-id diff for %s",
-				     p->one->path);
+		} else {
+			if (p->one->mode == 0) {
+				patch_id_add_string(&ctx, "---/dev/null");
+				patch_id_add_string(&ctx, "+++b/");
+				the_hash_algo->update_fn(&ctx, p->two->path, len2);
+			} else if (p->two->mode == 0) {
+				patch_id_add_string(&ctx, "---a/");
+				the_hash_algo->update_fn(&ctx, p->one->path, len1);
+				patch_id_add_string(&ctx, "+++/dev/null");
+			} else {
+				patch_id_add_string(&ctx, "---a/");
+				the_hash_algo->update_fn(&ctx, p->one->path, len1);
+				patch_id_add_string(&ctx, "+++b/");
+				the_hash_algo->update_fn(&ctx, p->two->path, len2);
+			}
 
+			if (fill_mmfile(options->repo, &mf1, p->one) < 0 ||
+			    fill_mmfile(options->repo, &mf2, p->two) < 0)
+				return error("unable to read files to diff");
+			xpp.flags = 0;
+			xecfg.ctxlen = 3;
+			xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+			if (xdi_diff_outf(&mf1, &mf2, NULL,
+					  patch_id_consume, &data, &xpp, &xecfg))
+				return error("unable to generate patch-id diff for %s",
+					     p->one->path);
+		}
 		if (stable)
 			flush_one_hunk(oid, &ctx);
 	}
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index 295040f2fe3..d24e55aac8d 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -43,15 +43,16 @@ test_expect_success 'setup: 500 lines' '
 	git add newfile &&
 	git commit -q -m "add small file" &&
 
-	git cherry-pick main >/dev/null 2>&1
-'
+	git cherry-pick main >/dev/null 2>&1 &&
 
-test_expect_success 'setup attributes' '
-	echo "file binary" >.gitattributes
+	git branch -f squashed main &&
+	git checkout -q -f squashed &&
+	git reset -q --soft HEAD~2 &&
+	git commit -q -m squashed
 '
 
 test_expect_success 'detect upstream patch' '
-	git checkout -q main &&
+	git checkout -q main^{} &&
 	scramble file &&
 	git add file &&
 	git commit -q -m "change big file again" &&
@@ -61,14 +62,27 @@ test_expect_success 'detect upstream patch' '
 	test_must_be_empty revs
 '
 
+test_expect_success 'detect upstream patch binary' '
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	git rebase main &&
+	git rev-list main...HEAD~ >revs &&
+	test_must_be_empty revs &&
+	test_when_finished "rm .gitattributes"
+'
+
 test_expect_success 'do not drop patch' '
-	git branch -f squashed main &&
-	git checkout -q -f squashed &&
-	git reset -q --soft HEAD~2 &&
-	git commit -q -m squashed &&
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
-	git rebase --quit
+	test_when_finished "git rebase --abort"
+'
+
+test_expect_success 'do not drop patch binary' '
+	echo "file binary" >.gitattributes &&
+	git checkout -q other^{} &&
+	test_must_fail git rebase squashed &&
+	test_when_finished "git rebase --abort" &&
+	test_when_finished "rm .gitattributes"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v5 2/6] patch-id: use stable patch-id for rebases
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07         ` Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant. Use the stable
logic for all internal patch-id calculations to minimize the number of
code paths and improve test coverage.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 builtin/log.c |  2 +-
 diff.c        | 12 ++++--------
 diff.h        |  2 +-
 patch-ids.c   | 10 +++++-----
 patch-ids.h   |  2 +-
 5 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d450..e72869afb36 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1763,7 +1763,7 @@ static void prepare_bases(struct base_tree_info *bases,
 		struct object_id *patch_id;
 		if (*commit_base_at(&commit_base, commit))
 			continue;
-		if (commit_patch_id(commit, &diffopt, &oid, 0, 1))
+		if (commit_patch_id(commit, &diffopt, &oid, 0))
 			die(_("cannot get patch id"));
 		ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, bases->alloc_patch_id);
 		patch_id = bases->patch_id + bases->nr_patch_id;
diff --git a/diff.c b/diff.c
index c15169e4b06..199b63dbcc3 100644
--- a/diff.c
+++ b/diff.c
@@ -6206,7 +6206,7 @@ static void patch_id_add_mode(git_hash_ctx *ctx, unsigned mode)
 }
 
 /* returns 0 upon success, and writes result into oid */
-static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+static int diff_get_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
@@ -6293,21 +6293,17 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 				return error("unable to generate patch-id diff for %s",
 					     p->one->path);
 		}
-		if (stable)
-			flush_one_hunk(oid, &ctx);
+		flush_one_hunk(oid, &ctx);
 	}
 
-	if (!stable)
-		the_hash_algo->final_oid_fn(oid, &ctx);
-
 	return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only, int stable)
+int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
-	int result = diff_get_patch_id(options, oid, diff_header_only, stable);
+	int result = diff_get_patch_id(options, oid, diff_header_only);
 
 	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..fd33caeb25d 100644
--- a/diff.h
+++ b/diff.h
@@ -634,7 +634,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option);
 int run_diff_index(struct rev_info *revs, unsigned int option);
 
 int do_diff_cache(const struct object_id *, struct diff_options *);
-int diff_flush_patch_id(struct diff_options *, struct object_id *, int, int);
+int diff_flush_patch_id(struct diff_options *, struct object_id *, int);
 void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx);
 
 int diff_result_code(struct diff_options *, int);
diff --git a/patch-ids.c b/patch-ids.c
index 46c6a8f3eab..31534466266 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -11,7 +11,7 @@ static int patch_id_defined(struct commit *commit)
 }
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int diff_header_only, int stable)
+		    struct object_id *oid, int diff_header_only)
 {
 	if (!patch_id_defined(commit))
 		return -1;
@@ -22,7 +22,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
 	else
 		diff_root_tree_oid(&commit->object.oid, "", options);
 	diffcore_std(options);
-	return diff_flush_patch_id(options, oid, diff_header_only, stable);
+	return diff_flush_patch_id(options, oid, diff_header_only);
 }
 
 /*
@@ -48,11 +48,11 @@ static int patch_id_neq(const void *cmpfn_data,
 	b = container_of(entry_or_key, struct patch_id, ent);
 
 	if (is_null_oid(&a->patch_id) &&
-	    commit_patch_id(a->commit, opt, &a->patch_id, 0, 0))
+	    commit_patch_id(a->commit, opt, &a->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&a->commit->object.oid));
 	if (is_null_oid(&b->patch_id) &&
-	    commit_patch_id(b->commit, opt, &b->patch_id, 0, 0))
+	    commit_patch_id(b->commit, opt, &b->patch_id, 0))
 		return error("Could not get patch ID for %s",
 			oid_to_hex(&b->commit->object.oid));
 	return !oideq(&a->patch_id, &b->patch_id);
@@ -82,7 +82,7 @@ static int init_patch_id_entry(struct patch_id *patch,
 	struct object_id header_only_patch_id;
 
 	patch->commit = commit;
-	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1, 0))
+	if (commit_patch_id(commit, &ids->diffopts, &header_only_patch_id, 1))
 		return -1;
 
 	hashmap_entry_init(&patch->ent, oidhash(&header_only_patch_id));
diff --git a/patch-ids.h b/patch-ids.h
index ab6c6a68047..490d7393716 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -20,7 +20,7 @@ struct patch_ids {
 };
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-		    struct object_id *oid, int, int);
+		    struct object_id *oid, int);
 int init_patch_ids(struct repository *, struct patch_ids *);
 int free_patch_ids(struct patch_ids *);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07         ` Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to get_one_patchid
to handle the different possible styles of binary diff. This
attempts to keep resulting patch-ids identical to what would be
produced by the counterpart logic in diff.c, that is it produces
the id by hashing the a and b oids in succession.

In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.

The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.

When the diff is generated with --full-index there is no patch content
to skip over.

When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 builtin/patch-id.c  | 36 ++++++++++++++++++++++++++++++++++--
 t/t4204-patch-id.sh | 29 ++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 881fcf32732..e7a31123142 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -61,6 +61,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
+	int diff_is_binary = 0;
+	char pre_oid_str[GIT_MAX_HEXSZ + 1], post_oid_str[GIT_MAX_HEXSZ + 1];
 	git_hash_ctx ctx;
 
 	the_hash_algo->init_fn(&ctx);
@@ -88,14 +90,44 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 
 		/* Parsing diff header?  */
 		if (before == -1) {
-			if (starts_with(line, "index "))
+			if (starts_with(line, "GIT binary patch") ||
+			    starts_with(line, "Binary files")) {
+				diff_is_binary = 1;
+				before = 0;
+				the_hash_algo->update_fn(&ctx, pre_oid_str,
+							 strlen(pre_oid_str));
+				the_hash_algo->update_fn(&ctx, post_oid_str,
+							 strlen(post_oid_str));
+				if (stable)
+					flush_one_hunk(result, &ctx);
 				continue;
-			else if (starts_with(line, "--- "))
+			} else if (skip_prefix(line, "index ", &p)) {
+				char *oid1_end = strstr(line, "..");
+				char *oid2_end = NULL;
+				if (oid1_end)
+					oid2_end = strstr(oid1_end, " ");
+				if (!oid2_end)
+					oid2_end = line + strlen(line) - 1;
+				if (oid1_end != NULL && oid2_end != NULL) {
+					*oid1_end = *oid2_end = '\0';
+					strlcpy(pre_oid_str, p, GIT_MAX_HEXSZ + 1);
+					strlcpy(post_oid_str, oid1_end + 2, GIT_MAX_HEXSZ + 1);
+				}
+				continue;
+			} else if (starts_with(line, "--- "))
 				before = after = 1;
 			else if (!isalpha(line[0]))
 				break;
 		}
 
+		if (diff_is_binary) {
+			if (starts_with(line, "diff ")) {
+				diff_is_binary = 0;
+				before = -1;
+			}
+			continue;
+		}
+
 		/* Looking for a valid hunk header?  */
 		if (before == 0 && after == 0) {
 			if (starts_with(line, "@@ -")) {
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a730c0db985..cdc5191aa8d 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -42,7 +42,7 @@ calc_patch_id () {
 }
 
 get_top_diff () {
-	git log -p -1 "$@" -O bar-then-foo --
+	git log -p -1 "$@" -O bar-then-foo --full-index --
 }
 
 get_patch_id () {
@@ -61,6 +61,33 @@ test_expect_success 'patch-id detects inequality' '
 	get_patch_id notsame &&
 	! test_cmp patch-id_main patch-id_notsame
 '
+test_expect_success 'patch-id detects equality binary' '
+	cat >.gitattributes <<-\EOF &&
+	foo binary
+	bar binary
+	EOF
+	get_patch_id main &&
+	get_patch_id same &&
+	git log -p -1 --binary main >top-diff.output &&
+	calc_patch_id <top-diff.output main_binpatch &&
+	git log -p -1 --binary same >top-diff.output &&
+	calc_patch_id <top-diff.output same_binpatch &&
+	test_cmp patch-id_main patch-id_main_binpatch &&
+	test_cmp patch-id_same patch-id_same_binpatch &&
+	test_cmp patch-id_main patch-id_same &&
+	test_when_finished "rm .gitattributes"
+'
+
+test_expect_success 'patch-id detects inequality binary' '
+	cat >.gitattributes <<-\EOF &&
+	foo binary
+	bar binary
+	EOF
+	get_patch_id main &&
+	get_patch_id notsame &&
+	! test_cmp patch-id_main patch-id_notsame &&
+	test_when_finished "rm .gitattributes"
+'
 
 test_expect_success 'patch-id supports git-format-patch output' '
 	get_patch_id main &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v5 4/6] patch-id: fix patch-id for mode changes
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                           ` (2 preceding siblings ...)
  2022-10-24 20:07         ` [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07         ` Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

Currently patch-id as used in rebase and cherry-pick does not account
for file modes if the file is modified. One consequence of this is
that if you have a local patch that changes modes, but upstream
has applied an outdated version of the patch that doesn't include
that mode change, "git rebase" will drop your local version of the
patch along with your mode changes. It also means that internal
patch-id doesn't produce the same output as the builtin, which does
account for mode changes due to them being part of diff output.

Fix by adding mode to the patch-id if it has changed, in the same
format that would be produced by diff, so that it is compatible
with builtin patch-id.

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 diff.c                     |  5 +++++
 t/t3419-rebase-patch-id.sh | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 199b63dbcc3..0e336c48560 100644
--- a/diff.c
+++ b/diff.c
@@ -6256,6 +6256,11 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 		} else if (p->two->mode == 0) {
 			patch_id_add_string(&ctx, "deletedfilemode");
 			patch_id_add_mode(&ctx, p->one->mode);
+		} else if (p->one->mode != p->two->mode) {
+			patch_id_add_string(&ctx, "oldmode");
+			patch_id_add_mode(&ctx, p->one->mode);
+			patch_id_add_string(&ctx, "newmode");
+			patch_id_add_mode(&ctx, p->two->mode);
 		}
 
 		if (diff_header_only) {
diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
index d24e55aac8d..7181f176b81 100755
--- a/t/t3419-rebase-patch-id.sh
+++ b/t/t3419-rebase-patch-id.sh
@@ -48,7 +48,17 @@ test_expect_success 'setup: 500 lines' '
 	git branch -f squashed main &&
 	git checkout -q -f squashed &&
 	git reset -q --soft HEAD~2 &&
-	git commit -q -m squashed
+	git commit -q -m squashed &&
+
+	git branch -f mode main &&
+	git checkout -q -f mode &&
+	test_chmod +x file &&
+	git commit -q -a --amend &&
+
+	git branch -f modeother other &&
+	git checkout -q -f modeother &&
+	test_chmod +x file &&
+	git commit -q -a --amend
 '
 
 test_expect_success 'detect upstream patch' '
@@ -71,6 +81,13 @@ test_expect_success 'detect upstream patch binary' '
 	test_when_finished "rm .gitattributes"
 '
 
+test_expect_success 'detect upstream patch modechange' '
+	git checkout -q modeother^{} &&
+	git rebase mode &&
+	git rev-list mode...HEAD~ >revs &&
+	test_must_be_empty revs
+'
+
 test_expect_success 'do not drop patch' '
 	git checkout -q other^{} &&
 	test_must_fail git rebase squashed &&
@@ -85,4 +102,16 @@ test_expect_success 'do not drop patch binary' '
 	test_when_finished "rm .gitattributes"
 '
 
+test_expect_success 'do not drop patch modechange' '
+	git checkout -q modeother^{} &&
+	git rebase other &&
+	cat >expected <<-\EOF &&
+	diff --git a/file b/file
+	old mode 100644
+	new mode 100755
+	EOF
+	git diff HEAD~ >modediff &&
+	test_cmp expected modediff
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                           ` (3 preceding siblings ...)
  2022-10-24 20:07         ` [PATCH v5 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07         ` Jerry Zhang via GitGitGadget
  2022-10-24 20:07         ` [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
  2022-10-24 22:55         ` [PATCH v5 0/6] patch-id fixes and improvements Junio C Hamano
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <jerry@skydio.com>

There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
---
 Documentation/git-patch-id.txt | 24 +++++++----
 builtin/patch-id.c             | 73 ++++++++++++++++++++++------------
 t/t4204-patch-id.sh            | 66 +++++++++++++++++++++++++++---
 3 files changed, 124 insertions(+), 39 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 442caff8a9c..1d15fa45d51 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,18 +8,18 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 --------
 [verse]
-'git patch-id' [--stable | --unstable]
+'git patch-id' [--stable | --unstable | --verbatim]
 
 DESCRIPTION
 -----------
 Read a patch from the standard input and compute the patch ID for it.
 
 A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
-patch, with whitespace and line numbers ignored.  As such, it's "reasonably
-stable", but at the same time also reasonably unique, i.e., two patches that
-have the same "patch ID" are almost guaranteed to be the same thing.
+patch, with line numbers ignored.  As such, it's "reasonably stable", but at
+the same time also reasonably unique, i.e., two patches that have the same
+"patch ID" are almost guaranteed to be the same thing.
 
-IOW, you can use this thing to look for likely duplicate commits.
+The main usecase for this command is to look for likely duplicate commits.
 
 When dealing with 'git diff-tree' output, it takes advantage of
 the fact that the patch is prefixed with the object name of the
@@ -30,6 +30,12 @@ This can be used to make a mapping from patch ID to commit ID.
 OPTIONS
 -------
 
+--verbatim::
+	Calculate the patch-id of the input as it is given, do not strip
+	any whitespace.
+
+	This is the default if patchid.verbatim is true.
+
 --stable::
 	Use a "stable" sum of hashes as the patch ID. With this option:
 	 - Reordering file diffs that make up a patch does not affect the ID.
@@ -45,14 +51,16 @@ OPTIONS
 	   of "-O<orderfile>", thereby making existing databases storing such
 	   "unstable" or historical patch-ids unusable.
 
+	 - All whitespace within the patch is ignored and does not affect the id.
+
 	This is the default if patchid.stable is set to true.
 
 --unstable::
 	Use an "unstable" hash as the patch ID. With this option,
 	the result produced is compatible with the patch-id value produced
-	by git 1.9 and older.  Users with pre-existing databases storing
-	patch-ids produced by git 1.9 and older (who do not deal with reordered
-	patches) may want to use this option.
+	by git 1.9 and older and whitespace is ignored.  Users with pre-existing
+	databases storing patch-ids produced by git 1.9 and older (who do not deal
+	with reordered patches) may want to use this option.
 
 	This is the default.
 
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index e7a31123142..afdd472369f 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -2,6 +2,7 @@
 #include "builtin.h"
 #include "config.h"
 #include "diff.h"
+#include "parse-options.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -57,7 +58,7 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 }
 
 static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
-			   struct strbuf *line_buf, int stable)
+			   struct strbuf *line_buf, int stable, int verbatim)
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
@@ -76,8 +77,11 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (!skip_prefix(line, "diff-tree ", &p) &&
 		    !skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
-		    starts_with(line, "\\ ") && 12 < strlen(line))
+		    starts_with(line, "\\ ") && 12 < strlen(line)) {
+			if (verbatim)
+				the_hash_algo->update_fn(&ctx, line, strlen(line));
 			continue;
+		}
 
 		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
@@ -152,8 +156,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (line[0] == '+' || line[0] == ' ')
 			after--;
 
-		/* Compute the sha without whitespace */
-		len = remove_space(line);
+		/* Add line to hash algo (possibly removing whitespace) */
+		len = verbatim ? strlen(line) : remove_space(line);
 		patchlen += len;
 		the_hash_algo->update_fn(&ctx, line, len);
 	}
@@ -166,7 +170,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 	return patchlen;
 }
 
-static void generate_id_list(int stable)
+static void generate_id_list(int stable, int verbatim)
 {
 	struct object_id oid, n, result;
 	int patchlen;
@@ -174,21 +178,32 @@ static void generate_id_list(int stable)
 
 	oidclr(&oid);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
 		flush_current_id(patchlen, &oid, &result);
 		oidcpy(&oid, &n);
 	}
 	strbuf_release(&line_buf);
 }
 
-static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
+static const char *const patch_id_usage[] = {
+	N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
+};
+
+struct patch_id_opts {
+	int stable;
+	int verbatim;
+};
 
 static int git_patch_id_config(const char *var, const char *value, void *cb)
 {
-	int *stable = cb;
+	struct patch_id_opts *opts = cb;
 
 	if (!strcmp(var, "patchid.stable")) {
-		*stable = git_config_bool(var, value);
+		opts->stable = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "patchid.verbatim")) {
+		opts->verbatim = git_config_bool(var, value);
 		return 0;
 	}
 
@@ -197,21 +212,29 @@ static int git_patch_id_config(const char *var, const char *value, void *cb)
 
 int cmd_patch_id(int argc, const char **argv, const char *prefix)
 {
-	int stable = -1;
-
-	git_config(git_patch_id_config, &stable);
-
-	/* If nothing is set, default to unstable. */
-	if (stable < 0)
-		stable = 0;
-
-	if (argc == 2 && !strcmp(argv[1], "--stable"))
-		stable = 1;
-	else if (argc == 2 && !strcmp(argv[1], "--unstable"))
-		stable = 0;
-	else if (argc != 1)
-		usage(patch_id_usage);
-
-	generate_id_list(stable);
+	/* if nothing is set, default to unstable */
+	struct patch_id_opts config = {0, 0};
+	int opts = 0;
+	struct option builtin_patch_id_options[] = {
+		OPT_CMDMODE(0, "unstable", &opts,
+		    N_("use the unstable patch-id algorithm"), 1),
+		OPT_CMDMODE(0, "stable", &opts,
+		    N_("use the stable patch-id algorithm"), 2),
+		OPT_CMDMODE(0, "verbatim", &opts,
+			N_("don't strip whitespace from the patch"), 3),
+		OPT_END()
+	};
+
+	git_config(git_patch_id_config, &config);
+
+	/* verbatim implies stable */
+	if (config.verbatim)
+		config.stable = 1;
+
+	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
+			     patch_id_usage, 0);
+
+	generate_id_list(opts ? opts > 1 : config.stable,
+			 opts ? opts == 3 : config.verbatim);
 	return 0;
 }
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index cdc5191aa8d..a7fa94ce0a2 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -8,13 +8,13 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-	as="a a a a a a a a" && # eight a
-	test_write_lines $as >foo &&
-	test_write_lines $as >bar &&
+	str="ab cd ef gh ij kl mn op" &&
+	test_write_lines $str >foo &&
+	test_write_lines $str >bar &&
 	git add foo bar &&
 	git commit -a -m initial &&
-	test_write_lines $as b >foo &&
-	test_write_lines $as b >bar &&
+	test_write_lines $str b >foo &&
+	test_write_lines $str b >bar &&
 	git commit -a -m first &&
 	git checkout -b same main &&
 	git commit --amend -m same-msg &&
@@ -22,8 +22,23 @@ test_expect_success 'setup' '
 	echo c >foo &&
 	echo c >bar &&
 	git commit --amend -a -m notsame-msg &&
+	git checkout -b with_space main~ &&
+	cat >foo <<-\EOF &&
+	a  b
+	c d
+	e    f
+	  g   h
+	    i   j
+	k l
+	m   n
+	op
+	EOF
+	cp foo bar &&
+	git add foo bar &&
+	git commit --amend -m "with spaces" &&
 	test_write_lines bar foo >bar-then-foo &&
 	test_write_lines foo bar >foo-then-bar
+
 '
 
 test_expect_success 'patch-id output is well-formed' '
@@ -128,9 +143,21 @@ test_patch_id_file_order () {
 	git format-patch -1 --stdout -O foo-then-bar >format-patch.output &&
 	calc_patch_id <format-patch.output "ordered-$name" "$@" &&
 	cmp_patch_id $relevant "$name" "ordered-$name"
+}
 
+test_patch_id_whitespace () {
+	relevant="$1"
+	shift
+	name="ws-${1}-$relevant"
+	shift
+	get_top_diff "main~" >top-diff.output &&
+	calc_patch_id <top-diff.output "$name" "$@" &&
+	get_top_diff "with_space" >top-diff.output &&
+	calc_patch_id <top-diff.output "ws-$name" "$@" &&
+	cmp_patch_id $relevant "$name" "ws-$name"
 }
 
+
 # combined test for options: add more tests here to make them
 # run with all options
 test_patch_id () {
@@ -146,6 +173,14 @@ test_expect_success 'file order is relevant with --unstable' '
 	test_patch_id_file_order relevant --unstable --unstable
 '
 
+test_expect_success 'whitespace is relevant with --verbatim' '
+	test_patch_id_whitespace relevant --verbatim --verbatim
+'
+
+test_expect_success 'whitespace is irrelevant without --verbatim' '
+	test_patch_id_whitespace irrelevant --stable --stable
+'
+
 #Now test various option combinations.
 test_expect_success 'default is unstable' '
 	test_patch_id relevant default
@@ -161,6 +196,17 @@ test_expect_success 'patchid.stable = false is unstable' '
 	test_patch_id relevant patchid.stable=false
 '
 
+test_expect_success 'patchid.verbatim = true is correct and stable' '
+	test_config patchid.verbatim true &&
+	test_patch_id_whitespace relevant patchid.verbatim=true &&
+	test_patch_id irrelevant patchid.verbatim=true
+'
+
+test_expect_success 'patchid.verbatim = false is unstable' '
+	test_config patchid.verbatim false &&
+	test_patch_id relevant patchid.verbatim=false
+'
+
 test_expect_success '--unstable overrides patchid.stable = true' '
 	test_config patchid.stable true &&
 	test_patch_id relevant patchid.stable=true--unstable --unstable
@@ -171,6 +217,11 @@ test_expect_success '--stable overrides patchid.stable = false' '
 	test_patch_id irrelevant patchid.stable=false--stable --stable
 '
 
+test_expect_success '--verbatim overrides patchid.stable = false' '
+	test_config patchid.stable false &&
+	test_patch_id_whitespace relevant stable=false--verbatim --verbatim
+'
+
 test_expect_success 'patch-id supports git-format-patch MIME output' '
 	get_patch_id main &&
 	git checkout same &&
@@ -225,7 +276,10 @@ test_expect_success 'patch-id handles no-nl-at-eof markers' '
 	EOF
 	calc_patch_id nonl <nonl &&
 	calc_patch_id withnl <withnl &&
-	test_cmp patch-id_nonl patch-id_withnl
+	test_cmp patch-id_nonl patch-id_withnl &&
+	calc_patch_id nonl-inc-ws --verbatim <nonl &&
+	calc_patch_id withnl-inc-ws --verbatim <withnl &&
+	! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
 '
 
 test_expect_success 'patch-id handles diffs with one line of before/after' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                           ` (4 preceding siblings ...)
  2022-10-24 20:07         ` [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
@ 2022-10-24 20:07         ` Jerry Zhang via GitGitGadget
  2022-10-24 22:55         ` [PATCH v5 0/6] patch-id fixes and improvements Junio C Hamano
  6 siblings, 0 replies; 46+ messages in thread
From: Jerry Zhang via GitGitGadget @ 2022-10-24 20:07 UTC (permalink / raw)
  To: git; +Cc: Jerry Zhang, Jerry Zhang

From: Jerry Zhang <Jerry@skydio.com>

The last git version that had "diff-tree" in the header text
of "git diff-tree" output was v1.3.0 from 2006. The header text
was changed from "diff-tree" to "commit" in 91539833
("Log message printout cleanups").

Given how long ago this change was made, it is highly unlikely that
anyone is still feeding in outputs from that git version.

Remove the handling of the "diff-tree" prefix and document the
source of the other prefixes so that the overall functionality
is more clear.

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
---
 builtin/patch-id.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index afdd472369f..f840fbf1c7e 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -74,8 +74,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		const char *p = line;
 		int len;
 
-		if (!skip_prefix(line, "diff-tree ", &p) &&
-		    !skip_prefix(line, "commit ", &p) &&
+		/* Possibly skip over the prefix added by "log" or "format-patch" */
+		if (!skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
 		    starts_with(line, "\\ ") && 12 < strlen(line)) {
 			if (verbatim)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v5 0/6] patch-id fixes and improvements
  2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
                           ` (5 preceding siblings ...)
  2022-10-24 20:07         ` [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
@ 2022-10-24 22:55         ` Junio C Hamano
  6 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2022-10-24 22:55 UTC (permalink / raw)
  To: Jerry Zhang via GitGitGadget; +Cc: git, Jerry Zhang

"Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> These patches add fixes and features to the "git patch-id" command, mostly
> discovered through our usage of patch-id in the revup project
> (https://github.com/Skydio/revup). On top of that I've tried to make general
> cleanup changes where I can.

Thanks, let's move it forward.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2022-10-25  0:32 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20  5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20  5:58 ` [PATCH 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20  6:20   ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20  6:20   ` [PATCH v2 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 2/7] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-14 17:13       ` Junio C Hamano
2022-10-14 22:33         ` Jerry Zhang
2022-10-14 21:12       ` Junio C Hamano
2022-10-14 22:34         ` Jerry Zhang
2022-10-17 15:23           ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-14 21:17       ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Jerry Zhang via GitGitGadget
2022-10-14 21:24       ` Junio C Hamano
2022-10-14 22:55         ` Jerry Zhang
2022-10-17 15:38         ` Junio C Hamano
2022-10-18 22:12           ` Jerry Zhang
2022-10-14  8:56     ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-14 22:03       ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
2022-10-17 15:18       ` Junio C Hamano
2022-10-18 21:57         ` Jerry Zhang
2022-10-19 16:19           ` Junio C Hamano
2022-10-20 23:16     ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-21  9:33         ` Junio C Hamano
2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-24 22:55         ` [PATCH v5 0/6] patch-id fixes and improvements Junio C Hamano
2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
2022-09-21 20:59   ` Jerry Zhang

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).