From: "Jerry Zhang via GitGitGadget"
Date: Thu, 20 Oct 2022 23:16:54 +0000
Subject: [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode
To: git@vger.kernel.org

From: Jerry Zhang

There are situations where the user might not want the default setting
where patch-id strips all whitespace.
They might be working in a language where whitespace is syntactically
important, or they might have CI testing that enforces strict whitespace
linting. In these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode, --verbatim, that is mutually exclusive with --stable and
--unstable. It also corresponds to the config patchid.verbatim = true.
In this mode, the stable algorithm is used and whitespace is not
stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which keeping the whitespace would break. Thus there isn't a
use case for the combination of --verbatim and --unstable, and we don't
expose it so as not to add maintenance burden.

Signed-off-by: Jerry Zhang
fixes https://github.com/Skydio/revup/issues/2
---
 Documentation/git-patch-id.txt | 24 +++++++----
 builtin/patch-id.c             | 73 ++++++++++++++++++++++------------
 t/t4204-patch-id.sh            | 66 +++++++++++++++++++++++++++---
 3 files changed, 124 insertions(+), 39 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 442caff8a9c..1d15fa45d51 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,18 +8,18 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 --------
 [verse]
-'git patch-id' [--stable | --unstable]
+'git patch-id' [--stable | --unstable | --verbatim]
 
 DESCRIPTION
 -----------
 Read a patch from the standard input and compute the patch ID for it.
 
 A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
-patch, with whitespace and line numbers ignored.  As such, it's "reasonably
-stable", but at the same time also reasonably unique, i.e., two patches that
-have the same "patch ID" are almost guaranteed to be the same thing.
+patch, with line numbers ignored.  As such, it's "reasonably stable", but at
+the same time also reasonably unique, i.e., two patches that have the same
+"patch ID" are almost guaranteed to be the same thing.
 
-IOW, you can use this thing to look for likely duplicate commits.
+The main use case for this command is to look for likely duplicate commits.
 
 When dealing with 'git diff-tree' output, it takes advantage of
 the fact that the patch is prefixed with the object name of the
@@ -30,6 +30,12 @@ This can be used to make a mapping from patch ID to commit ID.
 OPTIONS
 -------
 
+--verbatim::
+	Calculate the patch-id of the input as it is given, without stripping
+	any whitespace.
+
+	This is the default if patchid.verbatim is true.
+
 --stable::
 	Use a "stable" sum of hashes as the patch ID. With this option:
 	 - Reordering file diffs that make up a patch does not affect the ID.
@@ -45,14 +51,16 @@ OPTIONS
 	   of "-O", thereby making existing databases storing such
 	   "unstable" or historical patch-ids unusable.
 
+	 - All whitespace within the patch is ignored and does not affect the id.
+
 	This is the default if patchid.stable is set to true.
 
 --unstable::
 	Use an "unstable" hash as the patch ID. With this option,
 	the result produced is compatible with the patch-id value produced
-	by git 1.9 and older.  Users with pre-existing databases storing
-	patch-ids produced by git 1.9 and older (who do not deal with reordered
-	patches) may want to use this option.
+	by git 1.9 and older and whitespace is ignored.  Users with pre-existing
+	databases storing patch-ids produced by git 1.9 and older (who do not deal
+	with reordered patches) may want to use this option.
 
 	This is the default.
 
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index e7a31123142..afdd472369f 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -2,6 +2,7 @@
 #include "builtin.h"
 #include "config.h"
 #include "diff.h"
+#include "parse-options.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -57,7 +58,7 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 }
 
 static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
-			   struct strbuf *line_buf, int stable)
+			   struct strbuf *line_buf, int stable, int verbatim)
 {
 	int patchlen = 0, found_next = 0;
 	int before = -1, after = -1;
@@ -76,8 +77,11 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (!skip_prefix(line, "diff-tree ", &p) &&
 		    !skip_prefix(line, "commit ", &p) &&
 		    !skip_prefix(line, "From ", &p) &&
-		    starts_with(line, "\\ ") && 12 < strlen(line))
+		    starts_with(line, "\\ ") && 12 < strlen(line)) {
+			if (verbatim)
+				the_hash_algo->update_fn(&ctx, line, strlen(line));
 			continue;
+		}
 
 		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
@@ -152,8 +156,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		if (line[0] == '+' || line[0] == ' ')
 			after--;
 
-		/* Compute the sha without whitespace */
-		len = remove_space(line);
+		/* Add line to hash algo (possibly removing whitespace) */
+		len = verbatim ? strlen(line) : remove_space(line);
 		patchlen += len;
 		the_hash_algo->update_fn(&ctx, line, len);
 	}
@@ -166,7 +170,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 	return patchlen;
 }
 
-static void generate_id_list(int stable)
+static void generate_id_list(int stable, int verbatim)
 {
 	struct object_id oid, n, result;
 	int patchlen;
@@ -174,21 +178,32 @@ static void generate_id_list(int stable)
 
 	oidclr(&oid);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
 		flush_current_id(patchlen, &oid, &result);
 		oidcpy(&oid, &n);
 	}
 	strbuf_release(&line_buf);
 }
 
-static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
+static const char *const patch_id_usage[] = {
+	N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
+};
+
+struct patch_id_opts {
+	int stable;
+	int verbatim;
+};
 
 static int git_patch_id_config(const char *var, const char *value, void *cb)
 {
-	int *stable = cb;
+	struct patch_id_opts *opts = cb;
 
 	if (!strcmp(var, "patchid.stable")) {
-		*stable = git_config_bool(var, value);
+		opts->stable = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "patchid.verbatim")) {
+		opts->verbatim = git_config_bool(var, value);
 		return 0;
 	}
 
@@ -197,21 +212,29 @@ static int git_patch_id_config(const char *var, const char *value, void *cb)
 
 int cmd_patch_id(int argc, const char **argv, const char *prefix)
 {
-	int stable = -1;
-
-	git_config(git_patch_id_config, &stable);
-
-	/* If nothing is set, default to unstable. */
-	if (stable < 0)
-		stable = 0;
-
-	if (argc == 2 && !strcmp(argv[1], "--stable"))
-		stable = 1;
-	else if (argc == 2 && !strcmp(argv[1], "--unstable"))
-		stable = 0;
-	else if (argc != 1)
-		usage(patch_id_usage);
-
-	generate_id_list(stable);
+	/* if nothing is set, default to unstable */
+	struct patch_id_opts config = {0, 0};
+	int opts = 0;
+	struct option builtin_patch_id_options[] = {
+		OPT_CMDMODE(0, "unstable", &opts,
+		    N_("use the unstable patch-id algorithm"), 1),
+		OPT_CMDMODE(0, "stable", &opts,
+		    N_("use the stable patch-id algorithm"), 2),
+		OPT_CMDMODE(0, "verbatim", &opts,
+			N_("don't strip whitespace from the patch"), 3),
+		OPT_END()
+	};
+
+	git_config(git_patch_id_config, &config);
+
+	/* verbatim implies stable */
+	if (config.verbatim)
+		config.stable = 1;
+
+	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
+			     patch_id_usage, 0);
+
+	generate_id_list(opts ? opts > 1 : config.stable,
+			 opts ? opts == 3 : config.verbatim);
 	return 0;
 }
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index cdc5191aa8d..a7fa94ce0a2 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -8,13 +8,13 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-	as="a a a a a a a a" && # eight a
-	test_write_lines $as >foo &&
-	test_write_lines $as >bar &&
+	str="ab cd ef gh ij kl mn op" &&
+	test_write_lines $str >foo &&
+	test_write_lines $str >bar &&
 	git add foo bar &&
 	git commit -a -m initial &&
-	test_write_lines $as b >foo &&
-	test_write_lines $as b >bar &&
+	test_write_lines $str b >foo &&
+	test_write_lines $str b >bar &&
 	git commit -a -m first &&
 	git checkout -b same main &&
 	git commit --amend -m same-msg &&
@@ -22,8 +22,23 @@ test_expect_success 'setup' '
 	echo c >foo &&
 	echo c >bar &&
 	git commit --amend -a -m notsame-msg &&
+	git checkout -b with_space main~ &&
+	cat >foo <<-\EOF &&
+	a b
+	c d
+	e f
+	g h
+	i j
+	k l
+	m n
+	op
+	EOF
+	cp foo bar &&
+	git add foo bar &&
+	git commit --amend -m "with spaces" &&
 	test_write_lines bar foo >bar-then-foo &&
 	test_write_lines foo bar >foo-then-bar
+
 '
 
 test_expect_success 'patch-id output is well-formed' '
@@ -128,9 +143,21 @@ test_patch_id_file_order () {
 	git format-patch -1 --stdout -O foo-then-bar >format-patch.output &&
 	calc_patch_id top-diff.output &&
+	calc_patch_id top-diff.output &&
+	calc_patch_id