git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jerry Zhang <jerry@skydio.com>
Subject: [PATCH v4 0/6] patch-id fixes and improvements
Date: Thu, 20 Oct 2022 23:16:49 +0000	[thread overview]
Message-ID: <pull.1359.v4.git.1666307815.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1359.v3.git.1665737804.gitgitgadget@gmail.com>

These patches add fixes and features to the "git patch-id" command, mostly
discovered through our usage of patch-id in the revup project
(https://github.com/Skydio/revup). On top of that I've tried to make general
cleanup changes where I can.

Summary:

1: Fixed a bug in the combination of --stable with binary files and
header-only, and expanded the test to cover both binary and non-binary
files.

2: Switch internal usage of patch-id in rebase / cherry-pick to use the
stable variant to reduce the number of code paths and improve testing for
bugs like above.

3: Fixed bugs with patch-id and binary diffs. Previously patch-id did not
behave correctly for binary diffs regardless of whether "--binary" was given
to "diff".

4: Fixed bugs with patch-id and mode changes. Previously mode changes were
incorrectly excluded from the patch-id.

5: Add a new "--include-whitespace" mode to patch-id that prevents
whitespace from being stripped during id calculation. Also add a config
option for the same behavior.

6: Remove unused prefix from patch-id logic.

V1->V2: Fixed comment style V2->V3: The ---/+++ lines no longer get added to
the patch-id of binary diffs. Also added patches 3-7 in the series. V3->V3:
Dropped patch7. Updated flag name to --verbatim. Updated commit message
descriptions.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (6):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases
  builtin: patch-id: fix patch-id with binary diffs
  patch-id: fix patch-id for mode changes
  builtin: patch-id: add --verbatim as a command mode
  builtin: patch-id: remove unused diff-tree prefix

 Documentation/git-patch-id.txt |  24 ++++---
 builtin/log.c                  |   2 +-
 builtin/patch-id.c             | 113 ++++++++++++++++++++++++---------
 diff.c                         |  75 +++++++++++-----------
 diff.h                         |   2 +-
 patch-ids.c                    |  10 +--
 patch-ids.h                    |   2 +-
 t/t3419-rebase-patch-id.sh     |  63 +++++++++++++++---
 t/t4204-patch-id.sh            |  95 +++++++++++++++++++++++++--
 9 files changed, 287 insertions(+), 99 deletions(-)


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1359

Range-diff vs v3:

 1:  7d4c2e91ce0 ! 1:  321757ef919 patch-id: fix stable patch id for binary / header-only
     @@ Metadata
       ## Commit message ##
          patch-id: fix stable patch id for binary / header-only
      
     -    Previous logic here skipped flushing the hunks for binary
     -    and header-only patch ids, which would always result in a
     -    patch-id of 0000.
     +    Patch-ids for binary patches are found by hashing the object
     +    ids of the before and after objects in succession. However in
     +    the --stable case, there is a bug where hunks are not flushed
     +    for binary and header-only patch ids, which would always result
     +    in a patch-id of 0000. The --unstable case is currently correct.
      
          Reorder the logic to branch into 3 cases for populating the
          patch body: header-only which populates nothing, binary which
 2:  25e28b7dab3 = 2:  ec4a2422d5b patch-id: use stable patch-id for rebases
 3:  21642128927 ! 3:  81501355313 builtin: patch-id: fix patch-id with binary diffs
     @@ Commit message
          builtin: patch-id: fix patch-id with binary diffs
      
          "git patch-id" currently doesn't produce correct output if the
     -    incoming diff has any binary files. Add logic to
     -    get_one_patchid to handle the different possible styles of binary
     -    diff. This attempts to keep resulting patch-ids identical to what
     -    would be produced by the counterpart logic in diff.c, that is it
     -    produces the id by hashing the a and b oids in succession.
     +    incoming diff has any binary files. Add logic to get_one_patchid
     +    to handle the different possible styles of binary diff. This
     +    attempts to keep resulting patch-ids identical to what would be
     +    produced by the counterpart logic in diff.c, that is it produces
     +    the id by hashing the a and b oids in succession.
      
          In general we handle binary diffs by first caching the object ids from
          the "index" line and using those if we then find an indication
 4:  6e07cfd5691 = 4:  bb0b4add03c patch-id: fix patch-id for mode changes
 5:  bbaa2425ad0 ! 5:  b160f2ae49f builtin: patch-id: add --include-whitespace as a command mode
     @@ Metadata
      Author: Jerry Zhang <jerry@skydio.com>
      
       ## Commit message ##
     -    builtin: patch-id: add --include-whitespace as a command mode
     +    builtin: patch-id: add --verbatim as a command mode
      
     -    There are situations where the user might not want the default setting
     -    where patch-id strips all whitespace. They might be working in a
     -    language where white space is syntactically important, or they might
     -    have CI testing that enforces strict whitespace linting. In these cases,
     -    a whitespace change would result in the patch fundamentally changing,
     -    and thus deserving of a different id.
     +    There are situations where the user might not want the default
     +    setting where patch-id strips all whitespace. They might be working
     +    in a language where white space is syntactically important, or they
     +    might have CI testing that enforces strict whitespace linting. In
     +    these cases, a whitespace change would result in the patch
     +    fundamentally changing, and thus deserving of a different id.
      
          Add a new mode that is exclusive of --stable and --unstable called
     -    --include-whitespace. It also corresponds to the config
     -    patchid.include_whitespace = true. In this mode, the stable algorithm
     -    is used and whitespace is not stripped from the patch text.
     +    --verbatim. It also corresponds to the config
     +    patchid.verbatim = true. In this mode, the stable algorithm is
     +    used and whitespace is not stripped from the patch text.
     +
     +    Users of --unstable mainly care about compatibility with old git
     +    versions, which unstripping the whitespace would break. Thus there
     +    isn't a usecase for the combination of --verbatim and --unstable,
     +    and we don't expose this so as to not add maintainence burden.
      
          Signed-off-by: Jerry Zhang <jerry@skydio.com>
          fixes https://github.com/Skydio/revup/issues/2
     @@ Documentation/git-patch-id.txt: git-patch-id - Compute unique ID for a patch
       --------
       [verse]
      -'git patch-id' [--stable | --unstable]
     -+'git patch-id' [--stable | --unstable | --include-whitespace]
     ++'git patch-id' [--stable | --unstable | --verbatim]
       
       DESCRIPTION
       -----------
     @@ Documentation/git-patch-id.txt: This can be used to make a mapping from patch ID
       OPTIONS
       -------
       
     -+--include-whitespace::
     -+	Use the "stable" algorithm described below and also don't strip whitespace
     -+	from lines when calculating the patch-id.
     ++--verbatim::
     ++	Calculate the patch-id of the input as it is given, do not strip
     ++	any whitespace.
      +
     -+	This is the default if patchid.includeWhitespace is true and implies
     -+	patchid.stable.
     ++	This is the default if patchid.verbatim is true.
      +
       --stable::
       	Use a "stable" sum of hashes as the patch ID. With this option:
     @@ builtin/patch-id.c: static int scan_hunk_header(const char *p, int *p_before, in
       
       static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
      -			   struct strbuf *line_buf, int stable)
     -+			   struct strbuf *line_buf, int stable, int include_whitespace)
     ++			   struct strbuf *line_buf, int stable, int verbatim)
       {
       	int patchlen = 0, found_next = 0;
       	int before = -1, after = -1;
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
       		    !skip_prefix(line, "From ", &p) &&
      -		    starts_with(line, "\\ ") && 12 < strlen(line))
      +		    starts_with(line, "\\ ") && 12 < strlen(line)) {
     -+			if (include_whitespace)
     ++			if (verbatim)
      +				the_hash_algo->update_fn(&ctx, line, strlen(line));
       			continue;
      +		}
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
      -		/* Compute the sha without whitespace */
      -		len = remove_space(line);
      +		/* Add line to hash algo (possibly removing whitespace) */
     -+		len = include_whitespace ? strlen(line) : remove_space(line);
     ++		len = verbatim ? strlen(line) : remove_space(line);
       		patchlen += len;
       		the_hash_algo->update_fn(&ctx, line, len);
       	}
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
       }
       
      -static void generate_id_list(int stable)
     -+static void generate_id_list(int stable, int include_whitespace)
     ++static void generate_id_list(int stable, int verbatim)
       {
       	struct object_id oid, n, result;
       	int patchlen;
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
       	oidclr(&oid);
       	while (!feof(stdin)) {
      -		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
     -+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, include_whitespace);
     ++		patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
       		flush_current_id(patchlen, &oid, &result);
       		oidcpy(&oid, &n);
       	}
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
       }
       
      -static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
     -+static const char * const patch_id_usage[] = {
     -+	N_("git patch-id [--stable | --unstable | --include-whitespace]"),
     -+	NULL
     ++static const char *const patch_id_usage[] = {
     ++	N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
      +};
      +
      +struct patch_id_opts {
      +	int stable;
     -+	int include_whitespace;
     ++	int verbatim;
      +};
       
       static int git_patch_id_config(const char *var, const char *value, void *cb)
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
      +		opts->stable = git_config_bool(var, value);
      +		return 0;
      +	}
     -+	if (!strcmp(var, "patchid.includewhitespace")) {
     -+		opts->include_whitespace = git_config_bool(var, value);
     ++	if (!strcmp(var, "patchid.verbatim")) {
     ++		opts->verbatim = git_config_bool(var, value);
       		return 0;
       	}
       
     @@ builtin/patch-id.c: static int git_patch_id_config(const char *var, const char *
      +	int opts = 0;
      +	struct option builtin_patch_id_options[] = {
      +		OPT_CMDMODE(0, "unstable", &opts,
     -+			N_("use the unstable patch-id algorithm"), 1),
     ++		    N_("use the unstable patch-id algorithm"), 1),
      +		OPT_CMDMODE(0, "stable", &opts,
     -+			N_("use the stable patch-id algorithm"), 2),
     -+		OPT_CMDMODE(0, "include-whitespace", &opts,
     -+			N_("use the stable algorithm and don't strip whitespace"), 3),
     ++		    N_("use the stable patch-id algorithm"), 2),
     ++		OPT_CMDMODE(0, "verbatim", &opts,
     ++			N_("don't strip whitespace from the patch"), 3),
      +		OPT_END()
      +	};
      +
      +	git_config(git_patch_id_config, &config);
      +
     -+	/* includeWhitespace implies stable */
     -+	if (config.include_whitespace)
     ++	/* verbatim implies stable */
     ++	if (config.verbatim)
      +		config.stable = 1;
      +
      +	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
      +			     patch_id_usage, 0);
      +
      +	generate_id_list(opts ? opts > 1 : config.stable,
     -+			 opts ? opts == 3 : config.include_whitespace);
     ++			 opts ? opts == 3 : config.verbatim);
       	return 0;
       }
      
     @@ t/t4204-patch-id.sh: test_expect_success 'file order is relevant with --unstable
       	test_patch_id_file_order relevant --unstable --unstable
       '
       
     -+test_expect_success 'whitespace is relevant with --include-whitespace' '
     -+	test_patch_id_whitespace relevant --include-whitespace --include-whitespace
     ++test_expect_success 'whitespace is relevant with --verbatim' '
     ++	test_patch_id_whitespace relevant --verbatim --verbatim
      +'
      +
     -+test_expect_success 'whitespace is irrelevant without --include-whitespace' '
     ++test_expect_success 'whitespace is irrelevant without --verbatim' '
      +	test_patch_id_whitespace irrelevant --stable --stable
      +'
      +
     @@ t/t4204-patch-id.sh: test_expect_success 'patchid.stable = false is unstable' '
       	test_patch_id relevant patchid.stable=false
       '
       
     -+test_expect_success 'patchid.includeWhitespace = true is correct and stable' '
     -+	test_config patchid.includeWhitespace true &&
     -+	test_patch_id_whitespace relevant patchid.includeWhitespace=true &&
     -+	test_patch_id irrelevant patchid.includeWhitespace=true
     ++test_expect_success 'patchid.verbatim = true is correct and stable' '
     ++	test_config patchid.verbatim true &&
     ++	test_patch_id_whitespace relevant patchid.verbatim=true &&
     ++	test_patch_id irrelevant patchid.verbatim=true
      +'
      +
     -+test_expect_success 'patchid.includeWhitespace = false is unstable' '
     -+	test_config patchid.includeWhitespace false &&
     -+	test_patch_id relevant patchid.includeWhitespace=false
     ++test_expect_success 'patchid.verbatim = false is unstable' '
     ++	test_config patchid.verbatim false &&
     ++	test_patch_id relevant patchid.verbatim=false
      +'
      +
       test_expect_success '--unstable overrides patchid.stable = true' '
     @@ t/t4204-patch-id.sh: test_expect_success '--stable overrides patchid.stable = fa
       	test_patch_id irrelevant patchid.stable=false--stable --stable
       '
       
     -+test_expect_success '--include-whitespace overrides patchid.stable = false' '
     ++test_expect_success '--verbatim overrides patchid.stable = false' '
      +	test_config patchid.stable false &&
     -+	test_patch_id_whitespace relevant stable=false--include-whitespace --include-whitespace
     ++	test_patch_id_whitespace relevant stable=false--verbatim --verbatim
      +'
      +
       test_expect_success 'patch-id supports git-format-patch MIME output' '
     @@ t/t4204-patch-id.sh: test_expect_success 'patch-id handles no-nl-at-eof markers'
       	calc_patch_id withnl <withnl &&
      -	test_cmp patch-id_nonl patch-id_withnl
      +	test_cmp patch-id_nonl patch-id_withnl &&
     -+	calc_patch_id nonl-inc-ws --include-whitespace <nonl &&
     -+	calc_patch_id withnl-inc-ws --include-whitespace <withnl &&
     ++	calc_patch_id nonl-inc-ws --verbatim <nonl &&
     ++	calc_patch_id withnl-inc-ws --verbatim <withnl &&
      +	! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
       '
       
 6:  a1f6f36d487 ! 6:  dcdfac7a153 builtin: patch-id: remove unused diff-tree prefix
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
      +		if (!skip_prefix(line, "commit ", &p) &&
       		    !skip_prefix(line, "From ", &p) &&
       		    starts_with(line, "\\ ") && 12 < strlen(line)) {
     - 			if (include_whitespace)
     + 			if (verbatim)
 7:  69440797f30 < -:  ----------- documentation: format-patch: clarify requirements for patch-ids to match

-- 
gitgitgadget

  parent reply	other threads:[~2022-10-20 23:17 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20  5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20  5:58 ` [PATCH 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20  6:20   ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20  6:20   ` [PATCH v2 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 2/7] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-14 17:13       ` Junio C Hamano
2022-10-14 22:33         ` Jerry Zhang
2022-10-14 21:12       ` Junio C Hamano
2022-10-14 22:34         ` Jerry Zhang
2022-10-17 15:23           ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-14 21:17       ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Jerry Zhang via GitGitGadget
2022-10-14 21:24       ` Junio C Hamano
2022-10-14 22:55         ` Jerry Zhang
2022-10-17 15:38         ` Junio C Hamano
2022-10-18 22:12           ` Jerry Zhang
2022-10-14  8:56     ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-14 22:03       ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
2022-10-17 15:18       ` Junio C Hamano
2022-10-18 21:57         ` Jerry Zhang
2022-10-19 16:19           ` Junio C Hamano
2022-10-20 23:16     ` Jerry Zhang via GitGitGadget [this message]
2022-10-20 23:16       ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-21  9:33         ` Junio C Hamano
2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-24 22:55         ` [PATCH v5 0/6] patch-id fixes and improvements Junio C Hamano
2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
2022-09-21 20:59   ` Jerry Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1359.v4.git.1666307815.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jerry@skydio.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).