git@vger.kernel.org mailing list mirror (one of many)
From: John Cai <johncai86@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Robert Coup <robert.coup@koordinates.com>,
	John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 0/4] [RFC] repack: add --filter=
Date: Sat, 26 Feb 2022 16:05:37 -0500
Message-ID: <5106811D-2937-49CB-AC93-875D3B3BC241@gmail.com>
In-Reply-To: <YhqNy+t5SARNivQ5@nand.local>

Hi Taylor,

On 26 Feb 2022, at 15:30, Taylor Blau wrote:

> On Sat, Feb 26, 2022 at 03:19:11PM -0500, John Cai wrote:
>> Thanks for bringing this up again. I meant to write back regarding what you raised
>> in the other part of this thread. I think this is a valid concern. To attain the
>> goal of offloading certain blobs onto another server (B) and saving space on a Git
>> server (A), there will essentially be two steps: one to upload objects to (B),
>> and one to remove objects from (A). As you said, these two need to be the inverse of each
>> other or else you might end up with missing objects.
>
> Do you mean that you want to offload objects both from a local clone of
> some repository, _and_ the original remote it was cloned from?

Yes, exactly. The "another server" would be something like an HTTP server, or another remote
that hosts a subset of the objects (let's say the large blobs).
>
> I don't understand what the role of "another server" is here. If this
> proposal was about making it easy to remove objects from a local copy of
> a repository based on a filter provided that there was a Git server
> elsewhere that could act as a promisor remote, then that makes sense to
> me.
>
> But I think I'm not quite understanding the rest of what you're
> suggesting.

Sorry for the lack of clarity here. The goal is to make it easy for a remote to offload a subset
of its objects to __another__ remote (either a Git server or an http server through a remote helper).
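
For illustration only, here is a minimal sketch of the invariant the two steps would need to satisfy: every object removed from (A) should also be among those uploaded to (B). The function name and the two input files (one object ID per line) are hypothetical, and how those lists would be produced is left open; nothing here is part of the patch series.

```shell
#!/bin/sh
# Hypothetical check, not part of git: print object IDs that were removed
# from server (A) but never uploaded to server (B).
#   $1: file listing IDs uploaded to (B), one per line
#   $2: file listing IDs removed from (A), one per line
missing_after_offload() {
    uploaded_sorted=$(mktemp)
    removed_sorted=$(mktemp)
    # comm requires sorted input; -u also drops duplicate IDs.
    sort -u "$1" > "$uploaded_sorted"
    sort -u "$2" > "$removed_sorted"
    # comm -13 keeps only lines unique to the second file,
    # i.e. objects that were removed but not uploaded.
    comm -13 "$uploaded_sorted" "$removed_sorted"
    rm -f "$uploaded_sorted" "$removed_sorted"
}
```

An empty result would mean the removal step did not drop anything that the upload step missed.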
>
>>> My other concern was around what guarantees we currently provide for a
>>> promisor remote. My understanding is that we expect an object which was
>>> received from the promisor remote to always be fetch-able later on. If
>>> that's the case, then I don't mind the idea of refiltering a repository,
>>> provided that you only need to specify a filter once.
>>
>> Could you clarify what you mean by re-filtering a repository? I assumed
>> it meant specifying a filter, e.g. 100mb, and then narrowing it by specifying a
>> 50mb filter.
>
> I meant: applying a filter to a local clone (either where there wasn't a
> filter before, or a filter which matched more objects) and then removing
> objects that don't match the filter.
>
> But your response makes me think of another potential issue. What
> happens if I do the following:
>
>     $ git repack -ad --filter=blob:limit=100k
>     $ git repack -ad --filter=blob:limit=200k
>
> What should the second invocation do? I would expect that it needs to do
> a fetch from the promisor remote to recover any blobs between (100, 200]
> KB in size, since they would be gone after the first repack.
>
> This is a problem not just with two consecutive `git repack --filter`s,
> I think, since you could cook up the same situation with:
>
>     $ git clone --filter=blob:limit=100k git@github.com:git
>     $ git -C git repack -ad --filter=blob:limit=200k
>
> I don't think the existing patches handle this situation, so I'm curious
> whether it's something you have considered or not before.

I have not. I will have to think through this case, but this sounds similar to
what [1] is about.
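
To make the widening case concrete, here is a small sketch (not from the patches) of which blob sizes would have to be recovered from the promisor remote when a later filter has a larger limit than an earlier one. It assumes that `blob:limit=<n>` omits blobs of at least n bytes and that `k` means 1024 bytes; `blobs_to_refetch` is a hypothetical helper, and the sizes are made-up sample values.

```shell
#!/bin/sh
# Hypothetical helper, not part of git: given the old and new blob:limit
# values in bytes followed by a list of blob sizes, print the sizes that
# the old filter dropped but the new filter wants to keep.
# Assumption: blob:limit=<n> omits blobs of at least n bytes.
blobs_to_refetch() {
    old_limit=$1
    new_limit=$2
    shift 2
    for size in "$@"; do
        # Dropped by the old filter (size >= old_limit) yet
        # kept by the new one (size < new_limit).
        if [ "$size" -ge "$old_limit" ] && [ "$size" -lt "$new_limit" ]; then
            echo "$size"
        fi
    done
}

# 100k and 200k expressed in bytes, followed by sample blob sizes.
blobs_to_refetch 102400 204800 4096 102400 150000 204800 500000
```

When the second limit is smaller than the first (narrowing), the loop prints nothing, which matches the intuition that only widening a filter requires a fetch.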

>
> (Unrelated to the above, but please feel free to trim any quoted parts
> of emails when responding if they get overly long.)
>
> Thanks,
> Taylor

Thanks
John

1. https://lore.kernel.org/git/pull.1138.v2.git.1645719218.gitgitgadget@gmail.com/
