From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: martin.agren@gmail.com, sandals@crustytoothpaste.net,
	me@ttaylorr.com, abhishekkumar8222@gmail.com,
	sunshine@sunshineco.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 0/3] SHA-256: Update commit-graph and multi-pack-index formats
Date: Mon, 17 Aug 2020 14:04:45 +0000
Message-ID: <pull.703.v2.git.1597673089.gitgitgadget@gmail.com>
In-Reply-To: <pull.703.git.1597428440.gitgitgadget@gmail.com>

As discussed [1], there is some concern around binary file formats requiring
the context of the repository config in order to infer hash lengths. Two
formats that were designed with the hash transition in mind (commit-graph
and multi-pack-index) have bytes available to indicate the hash algorithm
used. Let's actually record the hash algorithm in those bytes so that both
formats are self-contained now that two hash algorithms are available.

[1] 
https://lore.kernel.org/git/CAN0heSp024=Kyy7gdQ2VSetk_5iVhj_qdT8CMVPcry_AwWrhHQ@mail.gmail.com/
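To make the "bytes available" concrete, here is a minimal sketch of reading the
hash version byte from a commit-graph header. The field order is taken from the
"header:" lines that test-tool read-graph prints in the tests (signature, file
version, hash version, chunk count, base graph count). The struct and function
names (cg_header, cg_parse_header, cg_hash_rawsz) are made up for illustration
and are not Git's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CG_SIGNATURE   0x43475048u /* "CGPH" */
#define CG_HEADER_SIZE 8

/* Hypothetical in-memory view of the 8-byte commit-graph header. */
struct cg_header {
	uint32_t signature;
	uint8_t file_version;
	uint8_t hash_version; /* 1 = SHA-1, 2 = SHA-256 */
	uint8_t num_chunks;
	uint8_t num_base;
};

/* Returns 0 on success, -1 if the buffer is too short or not a commit-graph. */
int cg_parse_header(const unsigned char *buf, size_t len, struct cg_header *out)
{
	assert(out != NULL);
	if (!buf || len < CG_HEADER_SIZE)
		return -1;

	/* The signature is stored in network (big-endian) byte order. */
	out->signature = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
			 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	if (out->signature != CG_SIGNATURE)
		return -1;

	out->file_version = buf[4];
	out->hash_version = buf[5];
	out->num_chunks = buf[6];
	out->num_base = buf[7];
	return 0;
}

/* Raw hash length in bytes implied by the hash version byte, 0 if unknown. */
size_t cg_hash_rawsz(uint8_t hash_version)
{
	switch (hash_version) {
	case 1:
		return 20; /* SHA-1 */
	case 2:
		return 32; /* SHA-256 */
	default:
		return 0;
	}
}
```

With the byte populated, a reader can learn the hash length from the file
itself instead of consulting the repository config.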

This merges cleanly with tb/bloom-improvements, but both that branch and
this patch series have merge conflicts with the corrected commit date patch
series [2].

[2] 
https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/

In particular, the following conflict can be resolved in the "obvious" way:

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HEAD
    header: 43475048 1 $(test_oid oid_version) 3 $NUM_BASE
================================
    header: 43475048 1 1 4 $NUM_BASE
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abhishek/corrected_commit_date

Resolve it by taking both changes:

    header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE

That resolution also needs the following fix to actually work with this series:

diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 211ec625d2..09f133792c 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -464,7 +464,7 @@ test_expect_success 'setup repo for mixed generation commit-graph-chain' '
        GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge &&
        test-tool read-graph >output &&
        cat >expect <<-EOF &&
-       header: 43475048 1 1 4 1
+       header: 43475048 1 $(test_oid oid_version) 4 1
        num_commits: 2
        chunks: oid_fanout oid_lookup commit_metadata
        EOF
@@ -482,7 +482,7 @@ test_expect_success 'does not write generation data chunk if not present on exis
        git commit-graph write --reachable --split=no-merge &&
        test-tool read-graph >output &&
        cat >expect <<-EOF &&
-       header: 43475048 1 1 4 2
+       header: 43475048 1 $(test_oid oid_version) 4 2
        num_commits: 3
        chunks: oid_fanout oid_lookup commit_metadata
        EOF

If this is the way we want to go with the formats, then I'll help
coordinate these textual and semantic merge conflicts.

UPDATES IN V2
=============

 * Documentation is updated, thanks to Eric's suggestion.
 * The implementation of oid_version() and the way we access it in the test
   scripts is improved, thanks to Brian's suggestion.
 * I use "mv" instead of "cp" in the cross-version tests because of a
   subtlety on macOS when overwriting these files.
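As a standalone illustration of the oid_version() change, the sketch below
switches on an algorithm identifier rather than comparing raw hash sizes, and
adds a reader-side check against the byte found in a file. The enum and the
sketch_* names are hypothetical stand-ins for the_hash_algo and GIT_HASH_*.
The assert() stands in for die(). One plausible benefit of keying on the
identifier (my assumption, not stated above) is that two algorithms with the
same digest length would still get distinct version bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Git's GIT_HASH_* identifiers. */
enum sketch_hash_algo {
	SKETCH_HASH_UNKNOWN = 0,
	SKETCH_HASH_SHA1,
	SKETCH_HASH_SHA256
};

/* On-disk "hash version" byte for an algorithm: 1 = SHA-1, 2 = SHA-256. */
uint8_t sketch_oid_version(enum sketch_hash_algo algo)
{
	switch (algo) {
	case SKETCH_HASH_SHA1:
		return 1;
	case SKETCH_HASH_SHA256:
		return 2;
	default:
		/* Git calls die() here; the sketch asserts instead. */
		assert(!"invalid hash version");
		return 0;
	}
}

/* Returns 1 if the byte read from a file matches the repository's algorithm. */
int sketch_hash_version_matches(uint8_t file_byte,
				enum sketch_hash_algo repo_algo)
{
	return file_byte == sketch_oid_version(repo_algo);
}
```

A reader would call sketch_hash_version_matches() with the header byte and
ignore (or reject) the file when it returns 0.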

Thanks, -Stolee

Derrick Stolee (3):
  t/README: document GIT_TEST_DEFAULT_HASH
  commit-graph: use the "hash version" byte
  multi-pack-index: use hash version byte

 .../technical/commit-graph-format.txt         |  9 +++-
 Documentation/technical/pack-format.txt       |  7 ++-
 commit-graph.c                                |  9 +++-
 midx.c                                        | 35 ++++++++++++---
 t/README                                      |  4 ++
 t/helper/test-read-midx.c                     |  8 +++-
 t/t4216-log-bloom.sh                          |  9 +++-
 t/t5318-commit-graph.sh                       | 38 +++++++++++++++-
 t/t5319-multi-pack-index.sh                   | 43 +++++++++++++++++--
 t/t5324-split-commit-graph.sh                 |  5 ++-
 10 files changed, 146 insertions(+), 21 deletions(-)


base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-703%2Fderrickstolee%2Fcommit-graph-256-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-703/derrickstolee/commit-graph-256-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/703

Range-diff vs v1:

 1:  242a44b63c ! 1:  62e7247bad t/README: document GIT_TEST_DEFAULT_HASH
     @@ Metadata
       ## Commit message ##
          t/README: document GIT_TEST_DEFAULT_HASH
      
     +    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/README ##
     @@ t/README: GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS=<boolean>, when true (which is
       the default when running tests), errors out when an abbreviated option
       is used.
       
     -+GIT_TEST_DEFAULT_HASH=<sha1|sha256> specifies which hash algorithm to use
     -+in the test scripts.
     ++GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
     ++use in the test scripts. Recognized values for <hash-algo> are "sha1"
     ++and "sha256".
      +
       Naming Tests
       ------------
 2:  4bbfd345d1 ! 2:  8d481f3b22 commit-graph: use the hash version byte
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    commit-graph: use the hash version byte
     +    commit-graph: use the "hash version" byte
      
          The commit-graph format reserved a byte among the header of the file to
          store a "hash version". During the SHA-256 work, this was not modified
     @@ Commit message
          each type then swaps the commit-graph files. The important value here is
          that the "git log" command succeeds while writing a message to stderr.
      
     +    Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/technical/commit-graph-format.txt ##
     @@ commit-graph.c: static char *get_chain_filename(struct object_directory *odb)
       static uint8_t oid_version(void)
       {
      -	return 1;
     -+	if (the_hash_algo->rawsz == GIT_SHA1_RAWSZ)
     ++	switch (hash_algo_by_ptr(the_hash_algo)) {
     ++	case GIT_HASH_SHA1:
      +		return 1;
     -+	if (the_hash_algo->rawsz == GIT_SHA256_RAWSZ)
     ++	case GIT_HASH_SHA256:
      +		return 2;
     -+	die(_("invalid hash version"));
     ++	default:
     ++		die(_("invalid hash version"));
     ++	}
       }
       
       static struct commit_graph *alloc_commit_graph(void)
      
       ## t/t4216-log-bloom.sh ##
     -@@ t/t4216-log-bloom.sh: test_description='git log for a path with Bloom filters'
     - GIT_TEST_COMMIT_GRAPH=0
     - GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
     - 
     -+OID_VERSION=1
     -+if [ "$GIT_DEFAULT_HASH" = "sha256" ]
     -+then
     -+	OID_VERSION=2
     -+fi
     -+
     - test_expect_success 'setup test - repo, commits, commit graph, log outputs' '
     - 	git init &&
     - 	mkdir A A/B A/B/C &&
      @@ t/t4216-log-bloom.sh: test_expect_success 'setup test - repo, commits, commit graph, log outputs' '
     + 	rm file_to_be_deleted &&
     + 	git add . &&
     + 	git commit -m "file removed" &&
     +-	git commit-graph write --reachable --changed-paths
     ++	git commit-graph write --reachable --changed-paths &&
     ++
     ++	test_oid_cache <<-EOF
     ++	oid_version sha1:1
     ++	oid_version sha256:2
     ++	EOF
     + '
       graph_read_expect () {
       	NUM_CHUNKS=5
       	cat >expect <<- EOF
      -	header: 43475048 1 1 $NUM_CHUNKS 0
     -+	header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0
     ++	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
       	num_commits: $1
       	chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data
       	EOF
      
       ## t/t5318-commit-graph.sh ##
     -@@ t/t5318-commit-graph.sh: test_description='commit graph'
     - 
     - GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
     - 
     -+OID_VERSION=1
     -+if [ "$GIT_DEFAULT_HASH" = "sha256" ]
     -+then
     -+	OID_VERSION=2
     -+fi
     -+
     - test_expect_success 'setup full repo' '
     - 	mkdir full &&
     +@@ t/t5318-commit-graph.sh: test_expect_success 'setup full repo' '
       	cd "$TRASH_DIRECTORY/full" &&
     + 	git init &&
     + 	git config core.commitGraph true &&
     +-	objdir=".git/objects"
     ++	objdir=".git/objects" &&
     ++
     ++	test_oid_cache <<-EOF
     ++	oid_version sha1:1
     ++	oid_version sha256:2
     ++	EOF
     + '
     + 
     + test_expect_success POSIXPERM 'tweak umask for modebit tests' '
      @@ t/t5318-commit-graph.sh: graph_read_expect() {
       		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
       	fi
       	cat >expect <<- EOF
      -	header: 43475048 1 1 $NUM_CHUNKS 0
     -+	header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0
     ++	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
       	num_commits: $1
       	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
       	EOF
     @@ t/t5318-commit-graph.sh: test_expect_success 'replace-objects invalidates commit
       # If the file changes the set of commits in the list, then the
      
       ## t/t5324-split-commit-graph.sh ##
     -@@ t/t5324-split-commit-graph.sh: test_description='split commit graph'
     - GIT_TEST_COMMIT_GRAPH=0
     - GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
     +@@ t/t5324-split-commit-graph.sh: test_expect_success 'setup repo' '
       
     -+OID_VERSION=1
     -+if [ "$GIT_DEFAULT_HASH" = "sha256" ]
     -+then
     -+	OID_VERSION=2
     -+fi
     + 	base sha1:1376
     + 	base sha256:1496
      +
     - test_expect_success 'setup repo' '
     - 	git init &&
     - 	git config core.commitGraph true &&
     ++	oid_version sha1:1
     ++	oid_version sha256:2
     + 	EOM
     + '
     + 
      @@ t/t5324-split-commit-graph.sh: graph_read_expect() {
       		NUM_BASE=$2
       	fi
       	cat >expect <<- EOF
      -	header: 43475048 1 1 3 $NUM_BASE
     -+	header: 43475048 1 $OID_VERSION 3 $NUM_BASE
     ++	header: 43475048 1 $(test_oid oid_version) 3 $NUM_BASE
       	num_commits: $1
       	chunks: oid_fanout oid_lookup commit_metadata
       	EOF
 3:  b4645789ad ! 3:  822e46868f multi-pack-index: use hash version byte
     @@ Commit message
          matches, we change the corrupted byte from "2" to "3" to ensure the test
          fails for both hash algorithms.
      
     +    Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/technical/pack-format.txt ##
     @@ midx.c
       
      +static uint8_t oid_version(void)
      +{
     -+	if (the_hash_algo->rawsz == GIT_SHA1_RAWSZ)
     ++	switch (hash_algo_by_ptr(the_hash_algo)) {
     ++	case GIT_HASH_SHA1:
      +		return 1;
     -+	if (the_hash_algo->rawsz == GIT_SHA256_RAWSZ)
     ++	case GIT_HASH_SHA256:
      +		return 2;
     -+	die(_("invalid hash version"));
     ++	default:
     ++		die(_("invalid hash version"));
     ++	}
      +}
      +
       static char *get_midx_filename(const char *object_dir)

-- 
gitgitgadget

Thread overview: 22+ messages
2020-08-14 18:07 [PATCH 0/3] SHA-256: Update commit-graph and multi-pack-index formats Derrick Stolee via GitGitGadget
2020-08-14 18:07 ` [PATCH 1/3] t/README: document GIT_TEST_DEFAULT_HASH Derrick Stolee via GitGitGadget
2020-08-14 19:02   ` Junio C Hamano
2020-08-14 20:39   ` Eric Sunshine
2020-08-14 20:41     ` Derrick Stolee
2020-08-14 18:07 ` [PATCH 2/3] commit-graph: use the hash version byte Derrick Stolee via GitGitGadget
2020-08-14 19:05   ` Junio C Hamano
2020-08-14 20:05     ` Taylor Blau
2020-08-14 20:11   ` brian m. carlson
2020-08-14 20:22     ` Junio C Hamano
2020-08-14 20:36     ` Derrick Stolee
2020-08-15 13:46       ` Martin Ågren
2020-08-14 18:07 ` [PATCH 3/3] multi-pack-index: use " Derrick Stolee via GitGitGadget
2020-08-14 20:14   ` brian m. carlson
2020-08-14 19:25 ` [PATCH 0/3] SHA-256: Update commit-graph and multi-pack-index formats Junio C Hamano
2020-08-14 20:34   ` Derrick Stolee
2020-08-14 21:41     ` Junio C Hamano
2020-08-17 14:04 ` Derrick Stolee via GitGitGadget [this message]
2020-08-17 14:04   ` [PATCH v2 1/3] t/README: document GIT_TEST_DEFAULT_HASH Derrick Stolee via GitGitGadget
2020-08-17 14:04   ` [PATCH v2 2/3] commit-graph: use the "hash version" byte Derrick Stolee via GitGitGadget
2020-08-17 14:04   ` [PATCH v2 3/3] multi-pack-index: use hash version byte Derrick Stolee via GitGitGadget
2020-08-17 23:12   ` [PATCH v2 0/3] SHA-256: Update commit-graph and multi-pack-index formats brian m. carlson
