git@vger.kernel.org mailing list mirror (one of many)
From: Nikolay Edigaryev <edigaryev@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Nikolay Edigaryev via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org,  Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH] clone: support cloning of filtered bundles
Date: Wed, 17 Jan 2024 00:06:39 +0400
Message-ID: <CAFX5hXR1S_pLr+HTyR5ucazniO+4qZC_DyivDELitwS-Emx4GQ@mail.gmail.com>
In-Reply-To: <xmqq8r4r5ovf.fsf@gitster.g>

Hello Junio and Phillip,

Thanks a lot for the explanations of how this is supposed to work. It
seems that to make this work properly, we'd need to:

(1) add an argument (or an option) to 'git bundle create', so that
    the user will be able to explicitly request the inclusion of a
    desired remote's URL

Without such a mechanism, a data leak is possible, e.g. from a remote
URL with credentials hardcoded in it.

(2) extend the 'gitformat-bundle' to include 'url'

However, a remote can have multiple URLs and other remote-specific
options might be necessary to properly work with it.

(3) add an argument (or an option) to 'git clone', so that the user
    will be able to explicitly request the write of the URL contained
    in the bundle to the repository's config

Otherwise, it's insecure, e.g. someone might craft a bundle with a URL
that collects data from the user.
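
To illustrate the credential concern under (1): a minimal sketch,
assuming a URL-style remote, of how userinfo could be stripped before a
URL is ever written into a bundle. The function name is hypothetical and
nothing like it exists in Git today; scp-like remotes such as
"host:path" are passed through untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without any 'user:password@' part (hypothetical helper)."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        # No userinfo present (or not a URL-style remote): leave as-is.
        return url
    # Keep only what follows the last '@' in the network location.
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

Even with something like this, an explicit opt-in still seems necessary,
because a URL can leak information in other ways (private hostnames,
tokens in the path or query).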

I don't want to waste anyone's time on this anymore, because I've toyed
with 'git bundle' a bit more and realized that what I'm trying to
accomplish can be done another way:

1. git init

2. git bundle unbundle <PATH> | <script that swaps hashes and refs in
   'git bundle unbundle output' and feeds them to 'git update-ref'>
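
A minimal sketch of the script in step 2, assuming that
'git bundle unbundle' prints one "<oid> <refname>" pair per line and
that 'git update-ref --stdin' is fed "update <ref> <new-oid>" commands.
The function names are my own, and skipping HEAD is a choice made for
this sketch, not something the workflow requires.

```python
import subprocess


def to_update_ref_commands(unbundle_output):
    """Turn '<oid> <ref>' lines into 'update <ref> <oid>' commands."""
    commands = []
    for line in unbundle_output.splitlines():
        line = line.strip()
        if not line:
            continue
        oid, ref = line.split(" ", 1)
        if ref == "HEAD":
            # Assumption: leave HEAD alone and only create the named refs.
            continue
        commands.append(f"update {ref} {oid}")
    return "\n".join(commands) + "\n" if commands else ""


def main(bundle_path):
    # Run inside the repository created by 'git init' in step 1.
    unbundled = subprocess.run(
        ["git", "bundle", "unbundle", bundle_path],
        check=True, capture_output=True, text=True,
    ).stdout
    subprocess.run(
        ["git", "update-ref", "--stdin"],
        input=to_update_ref_commands(unbundled), check=True, text=True,
    )
```

Calling main("repo.bundle") from inside the freshly initialized
repository would then create the refs recorded in the bundle.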

Hopefully this discussion will be useful for people looking to
accomplish something similar to what I've described in the initial
message.

On Mon, Jan 15, 2024 at 6:09 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> >
> >> diff --git a/builtin/clone.c b/builtin/clone.c
> >> index c6357af9498..4b3fedf78ed 100644
> >> --- a/builtin/clone.c
> >> +++ b/builtin/clone.c
> >> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> >>
> >>              if (fd > 0)
> >>                      close(fd);
> >> +
> >> +            if (has_filter) {
> >> +                    strbuf_addf(&key, "remote.%s.promisor", remote_name);
> >> +                    git_config_set(key.buf, "true");
> >> +                    strbuf_reset(&key);
> >> +
> >> +                    strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
> >> +                    git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
> >> +                    strbuf_reset(&key);
> >> +            }
> >> +
> >
> >> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
> >> -# bundle, but that requires a change to promisor/filter config options.
> > ...
> > But a bundle that was created with objects _omitted_ already?
> > ... the source of this clone operation, i.e. the bundle file that is
> > pointed at by "remote.$remote_name.url", cannot be that promisor.
>
> Extending the above a bit, one important way a bundle is used is as
> a medium for sneaker-net.  Instead of making a full clone over the
> network, if you can create a bundle that records all objects and all
> refs out of the source repository and then unbundle it in a
> different place to create a repository, you can tweak the resulting
> repository by either adding a separate remote or changing the
> remote.origin.url so that your subsequent fetch goes over the
> network to the repository you took the initial bundle from.
>
> The "tweak the resulting repository" part however MUST be done
> manually with the current system.  If we can optionally record the
> publicly reachable URL of the source repository when we create a
> bundle file, and "git clone" on the receiving side can read the URL
> out of the bundle and act on it (e.g., show it to the user and offer
> to record it as remote.origin.url in the resulting repository---I do
> not think it is wise to do this silently without letting the user
> know from security's point of view), then the use of bundle files as
> a medium for sneaker-netting will become even easier.
>
> And once that is done, perhaps allowing a filtered bundle to act as
> a sneaker-net medium to simulate an initial filtered clone would
> make sense.  The promisor as well as the origin will be the network
> reachable URL and subsequent fetches (both deliberate ones via "git
> fetch" as well as lazy on-demand ones that backfill missing objects
> via the "promisor" access) would become possible.
>
> But without such a change to the bundle file format, allowing
> "clone" to finish and pretend the resulting repository is usable is
> somewhat irresponsible to the users.  The on-demand lazy fetch would
> fail after this code cloned from such a filtered bundle, no?
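
For reference, the manual sneaker-net flow described above can be
sketched as follows. This is only an illustration with existing Git
commands ('git bundle create', 'git clone', 'git remote set-url'); the
function names and all four parameters are placeholders of mine, and
bundle_path should be absolute because the first command runs with -C.

```python
import subprocess


def sneakernet_commands(source_repo, bundle_path, dest_dir, public_url):
    """Build the git invocations for a full-bundle sneaker-net clone."""
    return [
        # On the source machine: record all refs and their objects.
        ["git", "-C", source_repo, "bundle", "create", bundle_path, "--all"],
        # On the receiving machine: clone from the bundle file.
        ["git", "clone", bundle_path, dest_dir],
        # The manual "tweak": point origin at the network-reachable URL.
        ["git", "-C", dest_dir, "remote", "set-url", "origin", public_url],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

This covers only unfiltered bundles; it does nothing about the promisor
problem raised for filtered ones.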


