user/dev discussion of public-inbox itself
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: public-inbox + mlmmj best practices?
Date: Mon, 4 Jan 2021 15:12:45 -0500
Message-ID: <20210104201245.cbtqno6cyxw5iycu@chatter.i7.local>
In-Reply-To: <20201228213139.GA17600@dcvr>

On Mon, Dec 28, 2020 at 09:31:39PM +0000, Eric Wong wrote:
> AFAIK, V2Writable always does the right thing on -purge/-edit;
> at least for WWW users(*).
> 
> V2W does more work in rare cases when history gets rewritten,
> but doesn't track anything beyond the latest indexed commit
> hash.
> 
> In the V2Writable::log_range sub, it uses "git merge-base --is-ancestor"
> (via is_ancestor wrapper) to cover the common case of contiguous history.
> 
> Otherwise, it attempts "git merge-base" to find a common ancestor:
> 
> 	if (common_ancestor_found)
> 		unindex some history starting at common ancestor
> 		reindex from common ancestor
> 	else
> 		unindex all history in epoch
> 		reindex epoch from scratch

I think I understand, but in the case of grok-pi-piper, unindexing is not an
option, since we can't control what the receiving-end app has already done
with the messages we have previously piped to it. We can't assume that it will
do the right thing when it receives duplicate messages, so we need to somehow
make sure that we don't pipe the same message twice.
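
As a rough illustration of the "never pipe the same message twice"
requirement, here is a minimal sketch that keys each message on a SHA-256 of
its raw bytes. This is an assumption made for illustration, not how
grok-pi-piper actually tracks its progress; pipe_unseen, seen, and deliver are
hypothetical names.

```python
import hashlib
from typing import Callable, Iterable, Set


def pipe_unseen(messages: Iterable[bytes],
                seen: Set[str],
                deliver: Callable[[bytes], None]) -> int:
    """Deliver each message at most once, keyed by SHA-256 of its raw bytes.

    `seen` is mutated in place; persisting it between runs is left to the
    caller. Returns the number of messages actually delivered.
    """
    delivered = 0
    for raw in messages:
        digest = hashlib.sha256(raw).hexdigest()
        if digest in seen:
            # Already piped earlier (in a previous run or in this one): skip.
            continue
        deliver(raw)
        # Record only after delivery so a failed delivery can be retried.
        seen.add(digest)
        delivered += 1
    return delivered
```

Note that a message altered by -edit hashes differently and would be delivered
again under this scheme, so a content hash alone only guards against exact
duplicates.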

> AFAIK, the common_ancestor_found case is always true unless
> somebody was wacky enough to run a full gc+prune immediately
> after fetching.  IOW, I don't think the else case happens
> in practice.

:) It kinda does in the grok-pi-piper case, since one of the config options is
to continuously "reshallow" the repository so that it contains basically no
objects.

https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git/tree/grokmirror/pi_piper.py#n58

I know that this is "wacky" as you say, but it helps save dramatic amounts of
space when cloning most lore.kernel.org repositories. We can still use "git
fetch --deepen" when necessary, but this does make it impossible to use the
common ancestor strategy when dealing with history rewrites.
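
To make the limitation concrete, here is a minimal sketch of classifying the
fetched history with "git merge-base --is-ancestor", which exits 0 when the
first commit is an ancestor of the second, 1 when it is not, and with some
other status on error. The function name, its return labels, and the
injectable run parameter are hypothetical and are not part of grok-pi-piper or
public-inbox.

```python
import subprocess
from typing import Callable, Sequence


def history_state(git_dir: str, last_piped: str, new_tip: str,
                  run: Callable[[Sequence[str]], int] = subprocess.call) -> str:
    """Classify how new_tip relates to the last commit we piped.

    Returns "contiguous" if last_piped is an ancestor of new_tip,
    "rewritten" if it is not, and "unknown" if git reports an error
    (for example when last_piped is no longer present in the repository).
    """
    cmd = ["git", "--git-dir", git_dir,
           "merge-base", "--is-ancestor", last_piped, new_tip]
    rc = run(cmd)
    if rc == 0:
        return "contiguous"
    if rc == 1:
        return "rewritten"
    # Any other exit status is an error; in a reshallowed repository the
    # old commit may simply be gone, so no ancestry answer is available.
    return "unknown"
```

In the "unknown" case a caller could try "git fetch --deepen" and ask again,
but nothing guarantees that the old commit ever becomes reachable after a
rewrite.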

-K


Thread overview: 9+ messages
2020-12-21 21:20 public-inbox + mlmmj best practices? Konstantin Ryabitsev
2020-12-21 21:39 ` Eric Wong
2020-12-22  6:28   ` Eric Wong
2020-12-28 16:22     ` Konstantin Ryabitsev
2020-12-28 21:31       ` Eric Wong
2021-01-04 20:12         ` Konstantin Ryabitsev [this message]
2021-01-05  1:06           ` Eric Wong
2021-01-05  1:29             ` [PATCH] v2writable: exact discontiguous history handling Eric Wong
2021-01-09 22:21               ` Eric Wong
