git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Albert Cui <albertqcui@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	"Robin H . Johnson" <robbat2@gentoo.org>
Subject: Re: [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri"
Date: Tue, 26 Oct 2021 17:00:08 +0200	[thread overview]
Message-ID: <211026.86cznrzu8g.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <f2269fc7-1688-d62e-02bb-01c5b5e33143@gmail.com>


On Tue, Oct 26 2021, Derrick Stolee wrote:

> On 10/25/2021 5:25 PM, Ævar Arnfjörð Bjarmason wrote:
>> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
>> +bundle-uri CLIENT ERROR RECOVERY
>> +++++++++++++++++++++++++++++++++
>> +
>> +A client MUST above all gracefully degrade on errors, whether that
>> +error is because of bad missing/data in the bundle URI(s), because
>> +that client is too dumb to e.g. understand and fully parse out bundle
>> +headers and their prerequisite relationships, or something else.
>
> "too dumb" seems a bit informal to me, especially because you
> immediately elaborate on its meaning. You could rewrite as follows:
>
>   ...because
>   that client can't understand or fully parse out bundle
>   headers and their prerequisite relationships, or something else.

Thanks, I've snipped all your subsequent comments on
phrasing/clarifications etc, except insofar as they have questions I
need to address (as opposed to just my bad grammar/phrasing etc).

Thanks a lot for them, will go through them closely for any subsequent
re-roll & address them.

> [...]
>> +While no per-URI key-value are currently supported currently they're
>> +intended to support future features such as:
>> +
>> + * Add a "hash=<val>" or "size=<bytes>" advertise the expected hash or
>> +   size of the bundle file.
>
> I suppose if one wanted to add this server-to-bundle coupling, then some
> clients might appreciate it.

For packfile-uri there's a hard dependency on the server transferring
the hash of the PACK file.

I've intentionally omitted it, the reasons are covered in [1], which I
realize now should really be part of this early series.

Basically having it as a hard requirement isn't necessary for security
or payload validation. Any server who's worried about their transport
integrity would point to a https URI under their control, any
checksumming and validation we'll need we'll get from the transport
layer and the client's reachability check.

Having it would mean that you need closer cooperation by default between
server and CDN than I'm aiming for, i.e. a server should be able to
point to some URI somewhere updated by a dumb hourly cronjob, without
needing to pass information back & forth about what the "current" URL
is. The client will discover all that.

But I left that "hash=*" in because it could be optionally added, in
case someone really wants it for some reason...

1. https://lore.kernel.org/git/RFC-patch-13.13-1e657ed27a-20210805T150534Z-avarab@gmail.com/

>> + * Advertise that one or more bundle files are the same (to e.g. have
>> +   clients round-robin or otherwise choose one of N possible files).
>
>   * Advertise that one or more bundle files are the same, to allow for
>     redundancy without causing duplicated effort.

*nod*

>> +static void send_bundle_uris(struct packet_writer *writer,
>> +			     struct string_list *uris)
>> +{
>> +	struct string_list_item *item;
>> +
>> +	for_each_string_list_item(item, uris)
>> +		packet_writer_write(writer, "%s", item->string);
>> +}
>> +
>> +static int advertise_bundle_uri = -1;
>> +static struct string_list bundle_uris = STRING_LIST_INIT_DUP;
>
> I see you put send_bundle_uris() before the global bundle_uris so
> it can be independent, but do you expect anyone to call send_bundle_uris()
> via a different list?

No, I'll move that around or rather fold it into bundle_uri_command()
directly.

I think I'd originally copied the structure of send_ref() and ls_refs()
from ls-refs.c, but it doesn't make much sense anymore here for this
2-line function. Thanks.

> Should we find a different place to store this data?
>
>> +static int bundle_uri_config(const char *var, const char *value, void *data)
>> +{
>> +	if (!strcmp(var, "uploadpack.bundleuri")) {
>> +		advertise_bundle_uri = 1;
>> +		string_list_append(&bundle_uris, value);
>> +	}
>> +
>> +	return 0;
>> +}
>
> Here, we are dictating that the URI list is available as a multi-valued
> config "uploadpack.bundleuri".
>
> 1. Should this be updated in Documentation/config/uploadpack.txt?

Definitely. I'll either incorporate that or re-structure this leading
series so that it's more design-doc/protocol focused, in any case all of
this ends up documented in the right places eventually...

> 2. This seems difficult to extend to your possible future features as
>    listed in the protocol docs, mainly because this can only store the
>    flat URI string. To add things like hash values, sizes, and prereqs,
>    you would need more data included and grouped on a per-URI basis.
>    What plans do you have to make extensions here while remaining
>    somewhat compatible with downgrading Git versions?

[...addressed below...]

>> @@ -136,6 +137,11 @@ static struct protocol_capability capabilities[] = {
>>  		.advertise = always_advertise,
>>  		.command = cap_object_info,
>>  	},
>> +	{
>> +		.name = "bundle-uri",
>> +		.advertise = bundle_uri_advertise,
>> +		.command = bundle_uri_command,
>> +	},
>>  };
>
> I really appreciate that it is this simple to extend protocol v2.

Yeah! FWIW I've got some WIP patches to make it easier still, i.e. some
further simplification & validation in the serve.[ch] API.

>> +test_expect_success 'basics of bundle-uri: dies if not enabled' '
>> +	test-tool pkt-line pack >in <<-EOF &&
>> +	command=bundle-uri
>> +	0000
>> +	EOF
>> +
>> +	cat >err.expect <<-\EOF &&
>> +	fatal: invalid command '"'"'bundle-uri'"'"'
>> +	EOF
>> +
>> +	cat >expect <<-\EOF &&
>> +	ERR serve: invalid command '"'"'bundle-uri'"'"'
>> +	EOF
>> +
>> +	test_must_fail test-tool serve-v2 --stateless-rpc <in >out 2>err.actual &&
>> +	test_cmp err.expect err.actual &&
>> +	test_must_be_empty out
>> +'
>> +
>> +
>
> hyper-nit: double newline.
>
> The implementation seems simple enough, which I like. I'm a bit

I mentally inserted the missing "skeptical/uncertain" etc. here :)

> your current use of Git config as the back-end, only because it is
> difficult to be future-proof. As the functionality stands today, the
> current config design works just fine. Perhaps we don't need to
> worry about the future, because we can design a new, complementary
> storage for that extra data. It seems worth exploring for a little
> while, though. Perhaps we should take a page out of 'git-remote'
> and how it stores named remotes with sub-items for metadata. The
> names probably don't need to ever be exposed to users, but it could
> be beneficial to anyone implementing this scheme.
>
> [bundle "main"]
> 	uri = https://example.com/my-bundle
> 	uri = https://redundant-cdn.com/my-bundle
> 	size = 120523
> 	sha256 = {64hexchars}
>
> [bundle "fork"]
> 	uri = https://cdn.org/my-fork
> 	size = 334
> 	sha256 = {...}
> 	prereq = {oid}
>
> This kind of layout has an immediate grouping of data that should
> help any future plan. Notice how I included multiple "uri" lines
> in the "main", which helps with your plan for duplicate URIs.

At first sight I like that config schema much better than my current
one, in particular how it makes the future-proofed "these N urls are one
logical URL" case simpler.

But overall I'm a bit on the fence, and leaning towards keeping what I
have, not out of any lazynes or wanting to just keep what I have mind
you.

But rather that the main benefit of the current one is that it's a 1=1
mapping to the line-based protocol, and you can say update your URLs as:

    git config --replace-all uploadpack.bundleUri "$first_url" &&
    git config --add uploadpack.bundleUri "$second_url"

Having usually you'd know the URL you'd like to replace, so you can use
the [value-pattern] of --replace-all, if it's a named section or other
split-out structure that become a two-step lookup.

Also for testing I've got a (trivial) plumbing tool I'll submit called
"git ls-remote-bundle-uri" (could be folded into something else I guess)
to dump the server-side config, it's nice that you can pretty much
directly copy/paste it into config without needing to adjust it.

Having said all that I'm not sure I feel strongly about it either way,
what do you think given the above?

I think most "real" server operators will use this as
GIT_CONFIG_COUNT=<n> GIT_CONFIG_{KEY,VALUE}_<1..n>, which can of course
work with any config schema, but if you've got code generating it on the
other side naming the targets is probably a slight hassle / confusing.

There's also the small matter of it being consistent with the
packfile-uri config in its current form, but that shouldn't be a reason
not to come up with something better. If anything any better suggestion
(if we go for that) could be supported by it too...

  reply	other threads:[~2021-10-26 15:22 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-25 21:25 [PATCH 0/3] bundle-uri: "dumb" static CDN offloading, spec & server implementation Ævar Arnfjörð Bjarmason
2021-10-25 21:25 ` [PATCH 1/3] leak tests: mark t5701-git-serve.sh as passing SANITIZE=leak Ævar Arnfjörð Bjarmason
2021-10-25 21:25 ` [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri" Ævar Arnfjörð Bjarmason
2021-10-26 14:00   ` Derrick Stolee
2021-10-26 15:00     ` Ævar Arnfjörð Bjarmason [this message]
2021-10-27  1:55       ` Derrick Stolee
2021-10-27 17:49         ` Ævar Arnfjörð Bjarmason
2021-10-27  2:01   ` Derrick Stolee
2021-10-27  8:29     ` Ævar Arnfjörð Bjarmason
2021-10-27 16:31       ` Derrick Stolee
2021-10-27 18:01         ` Ævar Arnfjörð Bjarmason
2021-10-27 19:23           ` Derrick Stolee
2021-10-27 20:22             ` Ævar Arnfjörð Bjarmason
2021-10-29 18:30               ` Derrick Stolee
2021-10-30 14:51           ` Philip Oakley
2021-10-25 21:25 ` [PATCH 3/3] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-10-26 14:05   ` Derrick Stolee
2021-10-29 18:46 ` [PATCH 0/3] bundle-uri: "dumb" static CDN offloading, spec & server implementation Derrick Stolee
2021-10-30  7:21   ` Ævar Arnfjörð Bjarmason
2021-11-01 21:00     ` Derrick Stolee
2021-11-01 23:18       ` Ævar Arnfjörð Bjarmason
2022-03-11 16:24 ` [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 01/13] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 02/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 03/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 04/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 05/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 06/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 07/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 08/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 11/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 13/13] bundle-uri: make the download program configurable Ævar Arnfjörð Bjarmason
2022-03-11 21:28   ` [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git Derrick Stolee
2022-04-18 17:23   ` [RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 01/36] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 02/36] dir API: add a generalized path_match_flags() function Ævar Arnfjörð Bjarmason
2022-04-21 17:26       ` Derrick Stolee
2022-04-18 17:23     ` [RFC PATCH v2 03/36] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2022-04-21 17:28       ` Derrick Stolee
2022-04-18 17:23     ` [RFC PATCH v2 04/36] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 05/36] http: make http_get_file() external Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 06/36] remote: move relative_url() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 07/36] remote: allow relative_url() to return an absolute url Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 08/36] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 09/36] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 10/36] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 11/36] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 12/36] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 13/36] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 14/36] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 15/36] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 16/36] bundle-uri: make the download program configurable Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 17/36] remote-curl: add 'get' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 18/36] bundle: implement 'fetch' command for direct bundles Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 19/36] bundle: parse table of contents during 'fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 20/36] bundle: add --filter option to 'fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 21/36] bundle: allow relative URLs in table of contents Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 22/36] bundle: make it easy to call 'git bundle fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 23/36] clone: add --bundle-uri option Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 24/36] clone: --bundle-uri cannot be combined with --depth Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 25/36] bundle: only fetch bundles if timestamp is new Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 26/36] fetch: fetch bundles before fetching original data Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 27/36] protocol-caps: implement cap_features() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 28/36] serve: understand but do not advertise 'features' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 29/36] serve: advertise 'features' when config exists Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 30/36] connect: implement get_recommended_features() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 31/36] transport: add connections for 'features' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 32/36] clone: use server-recommended bundle URI Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 33/36] t5601: basic bundle URI test Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 34/36] protocol v2: add server-side "bundle-uri" skeleton (docs) Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 35/36] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 36/36] docs: document bundle URI standard Ævar Arnfjörð Bjarmason
2022-04-21 19:54     ` [RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format Derrick Stolee
2022-04-22  9:37       ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211026.86cznrzu8g.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=albertqcui@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=robbat2@gentoo.org \
    --cc=sandals@crustytoothpaste.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).