git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: novalis@novalis.org, Kaushik Srenevasan <kaushik@twitter.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [RFC] Extending git-replace
Date: Tue, 14 Jan 2020 12:39:49 -0800	[thread overview]
Message-ID: <CABPp-BF8OHoHo73doekKzf0CmO09_PyAfe4q__DvoftQ+BeY2w@mail.gmail.com> (raw)
In-Reply-To: <20200114190339.120452-1-jonathantanmy@google.com>

On Tue, Jan 14, 2020 at 11:05 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > That is, would it be sufficient if every replaced file were replaced
> > with the exact text "me caga en la leche" instead of a custom hand-
> > crafted replacement?  I guess it's a bit complicated because while
> > that's a reasonable blob, it's not a valid commit.  So maybe this
> > mechanism would be limited to blobs.  I thought about whether we could
> > a different flavor of replacement for commits, but those generally have
> > to be custom because they each have different parents.
>
> Since the original email just discussed blobs, I'll confine myself to
> discussing blobs. (Commits are trickier, as you said.)
>
> > And if that would be sufficient, could promisors be used for this?  I
> > don't know how those interact with fsck and the other commands that
> > you're worried about.  Basically, the idea would be to use most of the
> > existing promisor code, and then have a mode where instead of visiting
> > the promisor, we just always return "me caga en la leche" (and this
> > does not have its SHA checked, of course).

Maybe; it doesn't necessarily need to be the same object returned, and
these replacements could be user-specified via replace refs...

> Missing promisor objects do not prevent fsck from passing - this is part
> of the original design (any packfiles we download from the specifically
> designated promisor remote are marked as such, and any objects that the
> objects in the packfile refer to are considered OK to be missing).

Is there ever a risk that objects in the downloaded packfile come
across as deltas against other objects that are missing/excluded, or
does the partial clone machinery ensure that doesn't happen?  (Because
this was certainly the biggest pain-point with my "fake cheap clone"
hacks.)

> Currently, when a missing object is read, it is first fetched (there are
> some more details that I can go over if you have any specific
> questions). What you're suggesting here is to return a fake blob with
> wrong hash - I haven't looked at all the callers of read-object
> functions in detail, but I don't think all of them are ready for such a
> behavioral change.

git-replace already took care of that for you and provides that
guarantee, modulo the --no-replace-objects & fsck & prune & fetch &
whatnot cases that ignore replace objects as Kaushik mentioned.  I
took advantage of this to great effect with my "fake cheap clone"
hacks.  Based in part on your other email where you made a suggestion
about promisors, I'm starting to think a pretty good first cut
solution might look like the following:

  * user manually adds a bunch of replace refs to map the unwanted big
blobs to something else (e.g. a README about how the files were
stripped, or something similar to this)
  * a partial clone specification that says "exclude objects that are
referenced by replace refs"
  * add a fake promisor to the downloaded promisor pack so that if
anyone runs with --no-replace-objects or similar then they get an
error saying the specified objects don't exist and can't be
downloaded.

Anyone see any obvious problems with this?

>  Maybe it would be sufficient to just make this work
> in a more limited scope (e.g. checkout only - and if we need different
> replacement blobs for different object IDs, maybe we could have
> something similar to the clean/smudge filters).

>
> > This could work together with some sort refs/blacklist mechanism to
> > enable the server to choose which objects the client replaces.
>
> In the original email, Kaushik mentioned objects larger than a certain
> size - we already have support for that (--filter=blob:limit=1000000,
> for example). Having said that, Git is already able to tolerate any
> exclusion (of tree or blob) from the server - we already need this in
> order to support changing of filters, for example.

  reply	other threads:[~2020-01-14 20:40 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14  5:33 [RFC] Extending git-replace Kaushik Srenevasan
2020-01-14  6:55 ` Elijah Newren
2020-01-14 19:11   ` Jonathan Tan
2020-01-16  3:30   ` Kaushik Srenevasan
2020-01-14 18:19 ` David Turner
2020-01-14 19:03   ` Jonathan Tan
2020-01-14 20:39     ` Elijah Newren [this message]
2020-01-14 21:57       ` Jonathan Tan
2020-01-14 22:46         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BF8OHoHo73doekKzf0CmO09_PyAfe4q__DvoftQ+BeY2w@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=kaushik@twitter.com \
    --cc=novalis@novalis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).