git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH 03/30] extensions: add refFormat extension
Date: Mon, 07 Nov 2022 18:35:37 +0000	[thread overview]
Message-ID: <4013f992d15aab69346bf6f8eafe38511b923595.1667846164.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1408.git.1667846164.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

Git's reference storage is critical to its function. Creating new
storage formats for references requires adding an extension. This
prevents third-party tools that do not understand that format from
operating incorrectly on the repository. This makes updating ref formats
more difficult than other optional indexes, such as the commit-graph or
multi-pack-index.

However, there are a number of potential ref storage enhancements that
are underway or could be created. Git needs an established mechanism for
coordinating between these different options.

The first obvious format update is the reftable format as documented in
Documentation/technical/reftable.txt. This format has much of its
implementation already in Git, but its connection as a ref backend is
not complete. This change is similar to some changes within one of the
patches intended for the reftable effort [1].

[1] https://lore.kernel.org/git/pull.1215.git.git.1644351400761.gitgitgadget@gmail.com/

However, this change makes a distinct strategy change from the one
recommended by reftable. Here, the extensions.refFormat extension is
provided as a multi-valued list. In the reftable RFC, the extension has
a single value, "files" or "reftable" and explicitly states that this
should not change after 'git init' or 'git clone'.

The single-valued approach has some major drawbacks, including the idea
that the "files" backend cannot coexist with the "reftable" backend at
the same time. In this way, it would not be possible to create a
repository that can write loose references and combine them into a
reftable in the background. With the multi-valued approach, we could
integrate reftable as a drop-in replacement for the packed-refs file and
allow that to be a faster way to do the integration since the test suite
would only need updates when the test is explicitly testing packed-refs.

When upgrading a repository from the "files" backend to the "reftable"
backend, it can help to have a transition period where both are present,
then finally removing the "files" backend after all loose refs are
collected into the reftable.

But the reftable is not the only approach available.

One obvious improvement could be a new file format version for the
packed-refs file. Its current plaintext-based format is inefficient due
to storing object IDs as hexadecimal representations instead of in
their raw format. This extra cost will get worse with SHA-256. In
addition, binary searches need to guess a position and scan to find
newlines for a refname entry. A structured binary format could allow for
more compact representation and faster access. Adding such a format
could be seen as "files-v2", but it is really "packed-v2".

The reftable approach has a concept of a "stack" of reftable files. This
idea would also work for a stack of packed-refs files (in v1 or v2
format). It would be helpful to describe that the refs could be stored
in a stack of packed-ref files independently of whether that is in file
format v1 or v2.

Even in these two options, it might be helpful to indicate whether or
not loose ref files are present. That is one reason to not make them
appear as "files-v2" or "files-v3" options in a single-valued extension.
Even as "packed-v2" or "packed-v3" options, this approach would require
third-party tools to understand the "v2" version if they want to support
the "v3" options. Instead, by splitting the format from the layout, we
can allow third-party tools to integrate only with the most-desired
format options.

For these reasons, this change is defining the extensions.refFormat
extension as well as how the two existing values interact. By default,
Git will assume "files" and "packed" in the list. If any other value
is provided, then the extension is marked as unrecognized.

Add tests that check the behavior of extensions.refFormat, both in that
it requires core.repositoryFormatVersion=1, and Git will refuse to work
with an unknown value of the extension.

There is a gap in the current implementation, though. What happens if
exactly one of "files" or "packed" is provided? The presence of only one
would imply that the other is not available. A later change can
communicate the list contents to the repository struct and then the
reference backend could ignore one of these two layers.

Specifically, having only "files" would mean that Git should not read or
write the packed-refs file and instead only read and write loose ref
files. By contrast, having only "packed" would mean that Git should not
read or write loose ref files and instead always update the packed-refs
file on every ref update.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/extensions.txt | 41 +++++++++++++++++++++++++++++
 setup.c                             |  5 ++++
 t/t3212-ref-formats.sh              | 27 +++++++++++++++++++
 3 files changed, 73 insertions(+)
 create mode 100755 t/t3212-ref-formats.sh

diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt
index bccaec7a963..ce8185adf53 100644
--- a/Documentation/config/extensions.txt
+++ b/Documentation/config/extensions.txt
@@ -7,6 +7,47 @@ Note that this setting should only be set by linkgit:git-init[1] or
 linkgit:git-clone[1].  Trying to change it after initialization will not
 work and will produce hard-to-diagnose issues.
 
+extensions.refFormat::
+	Specify the reference storage mechanisms used by the repoitory as a
+	multi-valued list. The acceptable values are `files` and `packed`.
+	If not specified, the list of `files` and `packed` is assumed. It
+	is an error to specify this key unless `core.repositoryFormatVersion`
+	is 1.
++
+As new ref formats are added, Git commands may modify this list before and
+after upgrading the on-disk reference storage files. The specific values
+indicate the existence of different layers:
++
+--
+`files`;;
+	When present, references may be stored as "loose" reference files
+	in the `$GIT_DIR/refs/` directory. The name of the reference
+	corresponds to the filename after `$GIT_DIR` and the file contains
+	an object ID as a hexadecimal string. If a loose reference file
+	exists, then its value takes precedence over all other formats.
+
+`packed`;;
+	When present, references may be stored as a group in a
+	`packed-refs` file in its version 1 format. When grouped with
+	`"files"` or provided on its own, this file is located at
+	`$GIT_DIR/packed-refs`. This file contains a list of distinct
+	reference names, paired with their object IDs. When combined with
+	`files`, the `packed` format will only be used to group multiple
+	loose object files upon request via the `git pack-refs` command or
+	via the `pack-refs` maintenance task.
+--
++
+The following combinations are supported by this version of Git:
++
+--
+`files` and `packed`;;
+	This set of values indicates that references are stored both as
+	loose reference files and in the `packed-refs` file in its v1
+	format. Loose references are preferred, and the `packed-refs` file
+	is updated only when deleting a reference that is stored in the
+	`packed-refs` file or during a `git pack-refs` command.
+--
+
 extensions.worktreeConfig::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the
diff --git a/setup.c b/setup.c
index cefd5f63c46..f5eb50c969a 100644
--- a/setup.c
+++ b/setup.c
@@ -577,6 +577,11 @@ static enum extension_result handle_extension(const char *var,
 				     "extensions.objectformat", value);
 		data->hash_algo = format;
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "refformat")) {
+		if (strcmp(value, "files") && strcmp(value, "packed"))
+			return error(_("invalid value for '%s': '%s'"),
+				     "extensions.refFormat", value);
+		return EXTENSION_OK;
 	}
 	return EXTENSION_UNKNOWN;
 }
diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh
new file mode 100755
index 00000000000..bc554e7c701
--- /dev/null
+++ b/t/t3212-ref-formats.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description='test across ref formats'
+
+. ./test-lib.sh
+
+test_expect_success 'extensions.refFormat requires core.repositoryFormatVersion=1' '
+	test_when_finished rm -rf broken &&
+
+	# Force sha1 to ensure GIT_TEST_DEFAULT_HASH does
+	# not imply a value of core.repositoryFormatVersion.
+	git init --object-format=sha1 broken &&
+	git -C broken config extensions.refFormat files &&
+	test_must_fail git -C broken status 2>err &&
+	grep "repo version is 0, but v1-only extension found" err
+'
+
+test_expect_success 'invalid extensions.refFormat' '
+	test_when_finished rm -rf broken &&
+	git init broken &&
+	git -C broken config core.repositoryFormatVersion 1 &&
+	git -C broken config extensions.refFormat bogus &&
+	test_must_fail git -C broken status 2>err &&
+	grep "invalid value for '\''extensions.refFormat'\'': '\''bogus'\''" err
+'
+
+test_done
-- 
gitgitgadget


  parent reply	other threads:[~2022-11-07 18:37 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-07 18:35 [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 01/30] hashfile: allow skipping the hash function Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 02/30] read-cache: add index.computeHash config option Derrick Stolee via GitGitGadget
2022-11-11 23:31   ` Elijah Newren
2022-11-14 16:30     ` Derrick Stolee
2022-11-17 16:13   ` Ævar Arnfjörð Bjarmason
2022-11-07 18:35 ` Derrick Stolee via GitGitGadget [this message]
2022-11-11 23:39   ` [PATCH 03/30] extensions: add refFormat extension Elijah Newren
2022-11-16 14:37     ` Derrick Stolee
2022-11-07 18:35 ` [PATCH 04/30] config: fix multi-level bulleted list Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 05/30] repository: wire ref extensions to ref backends Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 06/30] refs: allow loose files without packed-refs Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 07/30] chunk-format: number of chunks is optional Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 08/30] chunk-format: document trailing table of contents Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 09/30] chunk-format: store chunk offset during write Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 10/30] chunk-format: allow trailing table of contents Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 11/30] chunk-format: parse " Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 12/30] refs: extract packfile format to new file Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 13/30] packed-backend: extract add_write_error() Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 14/30] packed-backend: extract iterator/updates merge Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 15/30] packed-backend: create abstraction for writing refs Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 16/30] config: add config values for packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 17/30] packed-backend: create shell of v2 writes Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 18/30] packed-refs: write file format version 2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 19/30] packed-refs: read file format v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 20/30] packed-refs: read optional prefix chunks Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 21/30] packed-refs: write " Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 22/30] packed-backend: create GIT_TEST_PACKED_REFS_VERSION Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 23/30] t1409: test with packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 24/30] t5312: allow packed-refs v2 format Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 25/30] t5502: add PACKED_REFS_V1 prerequisite Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 26/30] t3210: require packed-refs v1 for some tests Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 27/30] t*: skip packed-refs v2 over http tests Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 28/30] ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 29/30] p1401: create performance test for ref operations Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 30/30] refs: skip hashing when writing packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-09 15:15 ` [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format Derrick Stolee
2022-11-11 23:28 ` Elijah Newren
2022-11-14  0:07   ` Derrick Stolee
2022-11-15  2:47     ` Elijah Newren
2022-11-16 14:45       ` Derrick Stolee
2022-11-17  4:28         ` Elijah Newren
2022-11-18 23:31     ` Junio C Hamano
2022-11-19  0:41       ` Elijah Newren
2022-11-19  3:00         ` Taylor Blau
2022-11-30 15:31       ` Derrick Stolee
2022-11-28 18:56 ` Han-Wen Nienhuys
2022-11-30 15:16   ` Derrick Stolee
2022-11-30 15:38     ` Phillip Wood
2022-11-30 16:37     ` Taylor Blau
2022-11-30 18:30     ` Han-Wen Nienhuys
2022-11-30 18:37       ` Sean Allred
2022-12-01 20:18       ` Derrick Stolee
2022-12-02 16:46         ` Han-Wen Nienhuys
2022-12-02 18:24           ` Ævar Arnfjörð Bjarmason
2022-11-30 22:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4013f992d15aab69346bf6f8eafe38511b923595.1667846164.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).