From: "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jerry Zhang <jerry@skydio.com>, Jerry Zhang <jerry@skydio.com>
Subject: [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode
Date: Fri, 14 Oct 2022 08:56:42 +0000 [thread overview]
Message-ID: <bbaa2425ad0cbb4b945cdce3402c6ed5fab381ec.1665737804.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1359.v3.git.1665737804.gitgitgadget@gmail.com>
From: Jerry Zhang <jerry@skydio.com>
There are situations where the user might not want the default setting
where patch-id strips all whitespace. They might be working in a
language where white space is syntactically important, or they might
have CI testing that enforces strict whitespace linting. In these cases,
a whitespace change would result in the patch fundamentally changing,
and thus deserving of a different id.
Add a new mode that is exclusive of --stable and --unstable called
--include-whitespace. It also corresponds to the config
patchid.include_whitespace = true. In this mode, the stable algorithm
is used and whitespace is not stripped from the patch text.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
---
Documentation/git-patch-id.txt | 25 ++++++++----
builtin/patch-id.c | 74 ++++++++++++++++++++++------------
t/t4204-patch-id.sh | 66 +++++++++++++++++++++++++++---
3 files changed, 126 insertions(+), 39 deletions(-)
diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 442caff8a9c..8eab4cdfe1d 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,18 +8,18 @@ git-patch-id - Compute unique ID for a patch
SYNOPSIS
--------
[verse]
-'git patch-id' [--stable | --unstable]
+'git patch-id' [--stable | --unstable | --include-whitespace]
DESCRIPTION
-----------
Read a patch from the standard input and compute the patch ID for it.
A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
-patch, with whitespace and line numbers ignored. As such, it's "reasonably
-stable", but at the same time also reasonably unique, i.e., two patches that
-have the same "patch ID" are almost guaranteed to be the same thing.
+patch, with line numbers ignored. As such, it's "reasonably stable", but at
+the same time also reasonably unique, i.e., two patches that have the same
+"patch ID" are almost guaranteed to be the same thing.
-IOW, you can use this thing to look for likely duplicate commits.
+The main usecase for this command is to look for likely duplicate commits.
When dealing with 'git diff-tree' output, it takes advantage of
the fact that the patch is prefixed with the object name of the
@@ -30,6 +30,13 @@ This can be used to make a mapping from patch ID to commit ID.
OPTIONS
-------
+--include-whitespace::
+ Use the "stable" algorithm described below and also don't strip whitespace
+ from lines when calculating the patch-id.
+
+ This is the default if patchid.includeWhitespace is true and implies
+ patchid.stable.
+
--stable::
Use a "stable" sum of hashes as the patch ID. With this option:
- Reordering file diffs that make up a patch does not affect the ID.
@@ -45,14 +52,16 @@ OPTIONS
of "-O<orderfile>", thereby making existing databases storing such
"unstable" or historical patch-ids unusable.
+ - All whitespace within the patch is ignored and does not affect the id.
+
This is the default if patchid.stable is set to true.
--unstable::
Use an "unstable" hash as the patch ID. With this option,
the result produced is compatible with the patch-id value produced
- by git 1.9 and older. Users with pre-existing databases storing
- patch-ids produced by git 1.9 and older (who do not deal with reordered
- patches) may want to use this option.
+ by git 1.9 and older and whitespace is ignored. Users with pre-existing
+ databases storing patch-ids produced by git 1.9 and older (who do not deal
+ with reordered patches) may want to use this option.
This is the default.
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index e7a31123142..745fe193a71 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -2,6 +2,7 @@
#include "builtin.h"
#include "config.h"
#include "diff.h"
+#include "parse-options.h"
static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
{
@@ -57,7 +58,7 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
}
static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
- struct strbuf *line_buf, int stable)
+ struct strbuf *line_buf, int stable, int include_whitespace)
{
int patchlen = 0, found_next = 0;
int before = -1, after = -1;
@@ -76,8 +77,11 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
if (!skip_prefix(line, "diff-tree ", &p) &&
!skip_prefix(line, "commit ", &p) &&
!skip_prefix(line, "From ", &p) &&
- starts_with(line, "\\ ") && 12 < strlen(line))
+ starts_with(line, "\\ ") && 12 < strlen(line)) {
+ if (include_whitespace)
+ the_hash_algo->update_fn(&ctx, line, strlen(line));
continue;
+ }
if (!get_oid_hex(p, next_oid)) {
found_next = 1;
@@ -152,8 +156,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
if (line[0] == '+' || line[0] == ' ')
after--;
- /* Compute the sha without whitespace */
- len = remove_space(line);
+ /* Add line to hash algo (possibly removing whitespace) */
+ len = include_whitespace ? strlen(line) : remove_space(line);
patchlen += len;
the_hash_algo->update_fn(&ctx, line, len);
}
@@ -166,7 +170,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
return patchlen;
}
-static void generate_id_list(int stable)
+static void generate_id_list(int stable, int include_whitespace)
{
struct object_id oid, n, result;
int patchlen;
@@ -174,21 +178,33 @@ static void generate_id_list(int stable)
oidclr(&oid);
while (!feof(stdin)) {
- patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+ patchlen = get_one_patchid(&n, &result, &line_buf, stable, include_whitespace);
flush_current_id(patchlen, &oid, &result);
oidcpy(&oid, &n);
}
strbuf_release(&line_buf);
}
-static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
+static const char * const patch_id_usage[] = {
+ N_("git patch-id [--stable | --unstable | --include-whitespace]"),
+ NULL
+};
+
+struct patch_id_opts {
+ int stable;
+ int include_whitespace;
+};
static int git_patch_id_config(const char *var, const char *value, void *cb)
{
- int *stable = cb;
+ struct patch_id_opts *opts = cb;
if (!strcmp(var, "patchid.stable")) {
- *stable = git_config_bool(var, value);
+ opts->stable = git_config_bool(var, value);
+ return 0;
+ }
+ if (!strcmp(var, "patchid.includewhitespace")) {
+ opts->include_whitespace = git_config_bool(var, value);
return 0;
}
@@ -197,21 +213,29 @@ static int git_patch_id_config(const char *var, const char *value, void *cb)
int cmd_patch_id(int argc, const char **argv, const char *prefix)
{
- int stable = -1;
-
- git_config(git_patch_id_config, &stable);
-
- /* If nothing is set, default to unstable. */
- if (stable < 0)
- stable = 0;
-
- if (argc == 2 && !strcmp(argv[1], "--stable"))
- stable = 1;
- else if (argc == 2 && !strcmp(argv[1], "--unstable"))
- stable = 0;
- else if (argc != 1)
- usage(patch_id_usage);
-
- generate_id_list(stable);
+ /* if nothing is set, default to unstable */
+ struct patch_id_opts config = {0, 0};
+ int opts = 0;
+ struct option builtin_patch_id_options[] = {
+ OPT_CMDMODE(0, "unstable", &opts,
+ N_("use the unstable patch-id algorithm"), 1),
+ OPT_CMDMODE(0, "stable", &opts,
+ N_("use the stable patch-id algorithm"), 2),
+ OPT_CMDMODE(0, "include-whitespace", &opts,
+ N_("use the stable algorithm and don't strip whitespace"), 3),
+ OPT_END()
+ };
+
+ git_config(git_patch_id_config, &config);
+
+ /* includeWhitespace implies stable */
+ if (config.include_whitespace)
+ config.stable = 1;
+
+ argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
+ patch_id_usage, 0);
+
+ generate_id_list(opts ? opts > 1 : config.stable,
+ opts ? opts == 3 : config.include_whitespace);
return 0;
}
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index cdc5191aa8d..107e5a59fee 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -8,13 +8,13 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh
test_expect_success 'setup' '
- as="a a a a a a a a" && # eight a
- test_write_lines $as >foo &&
- test_write_lines $as >bar &&
+ str="ab cd ef gh ij kl mn op" &&
+ test_write_lines $str >foo &&
+ test_write_lines $str >bar &&
git add foo bar &&
git commit -a -m initial &&
- test_write_lines $as b >foo &&
- test_write_lines $as b >bar &&
+ test_write_lines $str b >foo &&
+ test_write_lines $str b >bar &&
git commit -a -m first &&
git checkout -b same main &&
git commit --amend -m same-msg &&
@@ -22,8 +22,23 @@ test_expect_success 'setup' '
echo c >foo &&
echo c >bar &&
git commit --amend -a -m notsame-msg &&
+ git checkout -b with_space main~ &&
+ cat >foo <<-\EOF &&
+ a b
+ c d
+ e f
+ g h
+ i j
+ k l
+ m n
+ op
+ EOF
+ cp foo bar &&
+ git add foo bar &&
+ git commit --amend -m "with spaces" &&
test_write_lines bar foo >bar-then-foo &&
test_write_lines foo bar >foo-then-bar
+
'
test_expect_success 'patch-id output is well-formed' '
@@ -128,9 +143,21 @@ test_patch_id_file_order () {
git format-patch -1 --stdout -O foo-then-bar >format-patch.output &&
calc_patch_id <format-patch.output "ordered-$name" "$@" &&
cmp_patch_id $relevant "$name" "ordered-$name"
+}
+test_patch_id_whitespace () {
+ relevant="$1"
+ shift
+ name="ws-${1}-$relevant"
+ shift
+ get_top_diff "main~" >top-diff.output &&
+ calc_patch_id <top-diff.output "$name" "$@" &&
+ get_top_diff "with_space" >top-diff.output &&
+ calc_patch_id <top-diff.output "ws-$name" "$@" &&
+ cmp_patch_id $relevant "$name" "ws-$name"
}
+
# combined test for options: add more tests here to make them
# run with all options
test_patch_id () {
@@ -146,6 +173,14 @@ test_expect_success 'file order is relevant with --unstable' '
test_patch_id_file_order relevant --unstable --unstable
'
+test_expect_success 'whitespace is relevant with --include-whitespace' '
+ test_patch_id_whitespace relevant --include-whitespace --include-whitespace
+'
+
+test_expect_success 'whitespace is irrelevant without --include-whitespace' '
+ test_patch_id_whitespace irrelevant --stable --stable
+'
+
#Now test various option combinations.
test_expect_success 'default is unstable' '
test_patch_id relevant default
@@ -161,6 +196,17 @@ test_expect_success 'patchid.stable = false is unstable' '
test_patch_id relevant patchid.stable=false
'
+test_expect_success 'patchid.includeWhitespace = true is correct and stable' '
+ test_config patchid.includeWhitespace true &&
+ test_patch_id_whitespace relevant patchid.includeWhitespace=true &&
+ test_patch_id irrelevant patchid.includeWhitespace=true
+'
+
+test_expect_success 'patchid.includeWhitespace = false is unstable' '
+ test_config patchid.includeWhitespace false &&
+ test_patch_id relevant patchid.includeWhitespace=false
+'
+
test_expect_success '--unstable overrides patchid.stable = true' '
test_config patchid.stable true &&
test_patch_id relevant patchid.stable=true--unstable --unstable
@@ -171,6 +217,11 @@ test_expect_success '--stable overrides patchid.stable = false' '
test_patch_id irrelevant patchid.stable=false--stable --stable
'
+test_expect_success '--include-whitespace overrides patchid.stable = false' '
+ test_config patchid.stable false &&
+ test_patch_id_whitespace relevant stable=false--include-whitespace --include-whitespace
+'
+
test_expect_success 'patch-id supports git-format-patch MIME output' '
get_patch_id main &&
git checkout same &&
@@ -225,7 +276,10 @@ test_expect_success 'patch-id handles no-nl-at-eof markers' '
EOF
calc_patch_id nonl <nonl &&
calc_patch_id withnl <withnl &&
- test_cmp patch-id_nonl patch-id_withnl
+ test_cmp patch-id_nonl patch-id_withnl &&
+ calc_patch_id nonl-inc-ws --include-whitespace <nonl &&
+ calc_patch_id withnl-inc-ws --include-whitespace <withnl &&
+ ! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
'
test_expect_success 'patch-id handles diffs with one line of before/after' '
--
gitgitgadget
next prev parent reply other threads:[~2022-10-14 8:57 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-20 5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20 5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20 5:58 ` [PATCH 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-09-20 6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20 6:20 ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20 6:20 ` [PATCH v2 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14 8:56 ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-14 8:56 ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-14 8:56 ` [PATCH v3 2/7] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14 8:56 ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-14 17:13 ` Junio C Hamano
2022-10-14 22:33 ` Jerry Zhang
2022-10-14 21:12 ` Junio C Hamano
2022-10-14 22:34 ` Jerry Zhang
2022-10-17 15:23 ` Junio C Hamano
2022-10-14 8:56 ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-14 21:17 ` Junio C Hamano
2022-10-14 8:56 ` Jerry Zhang via GitGitGadget [this message]
2022-10-14 21:24 ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Junio C Hamano
2022-10-14 22:55 ` Jerry Zhang
2022-10-17 15:38 ` Junio C Hamano
2022-10-18 22:12 ` Jerry Zhang
2022-10-14 8:56 ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-14 22:03 ` Junio C Hamano
2022-10-14 8:56 ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
2022-10-17 15:18 ` Junio C Hamano
2022-10-18 21:57 ` Jerry Zhang
2022-10-19 16:19 ` Junio C Hamano
2022-10-20 23:16 ` [PATCH v4 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-20 23:16 ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-20 23:16 ` [PATCH v4 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-20 23:16 ` [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-20 23:16 ` [PATCH v4 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-20 23:16 ` [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-20 23:16 ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-21 9:33 ` Junio C Hamano
2022-10-24 20:07 ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-24 20:07 ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-24 20:07 ` [PATCH v5 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-24 20:07 ` [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-24 20:07 ` [PATCH v5 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-24 20:07 ` [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-24 20:07 ` [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-24 22:55 ` [PATCH v5 0/6] patch-id fixes and improvements Junio C Hamano
2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
2022-09-21 20:59 ` Jerry Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bbaa2425ad0cbb4b945cdce3402c6ed5fab381ec.1665737804.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=jerry@skydio.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).