user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: Re: public-inbox + mlmmj best practices?
Date: Tue, 5 Jan 2021 01:06:43 +0000
Message-ID: <20210105010643.GA20926@dcvr>
In-Reply-To: <20210104201245.cbtqno6cyxw5iycu@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Dec 28, 2020 at 09:31:39PM +0000, Eric Wong wrote:
> > AFAIK, V2Writable always does the right thing on -purge/-edit;
> > at least for WWW users(*).
> > 
> > V2W does more work in rare cases when history gets rewritten,
> > but doesn't track anything beyond the latest indexed commit
> > hash.
> > 
> > In the V2Writable::log_range sub, it uses "git merge-base --is-ancestor"
> > (via is_ancestor wrapper) to cover the common case of contiguous history.
> > 
> > Otherwise, it attempts "git merge-base" to find a common ancestor:
> > 
> > 	if (common_ancestor_found)
> > 		unindex some history starting at common ancestor
> > 		reindex from common ancestor
> > 	else
> > 		unindex all history in epoch
> > 		reindex epoch from scratch
> 
> I think I understand, but in the case of grok-pi-piper, unindexing is not an
> option, since we can't control what the receiving-end app has already done
> with the messages we have previously piped to it. We can't assume that it will
> do the right thing when it receives duplicate messages, so we need to somehow
> make sure that we don't pipe the same message twice.
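For readers following along, here is a rough Python sketch of the decision quoted above. It is not V2Writable::log_range (which is Perl); the function names and the returned action labels are hypothetical. Only the two git invocations ("git merge-base --is-ancestor" and plain "git merge-base") are real.

```python
import subprocess
from typing import Callable, Optional, Tuple


def git_is_ancestor(git_dir: str, old: str, new: str) -> bool:
    # "git merge-base --is-ancestor" exits 0 when old is an ancestor of new;
    # any other status (including a missing object) is treated as "no" here.
    r = subprocess.run(
        ["git", "--git-dir", git_dir, "merge-base", "--is-ancestor", old, new],
        capture_output=True,
    )
    return r.returncode == 0


def git_merge_base(git_dir: str, a: str, b: str) -> Optional[str]:
    # Plain "git merge-base" prints the best common ancestor, or exits
    # non-zero with no output when none is reachable.
    r = subprocess.run(
        ["git", "--git-dir", git_dir, "merge-base", a, b],
        capture_output=True,
        text=True,
    )
    out = r.stdout.strip()
    return out if r.returncode == 0 and out else None


def plan_log_range(
    last_indexed: str,
    new_tip: str,
    is_ancestor: Callable[[str, str], bool],
    merge_base: Callable[[str, str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    # Common case: contiguous history, index only last_indexed..new_tip.
    if is_ancestor(last_indexed, new_tip):
        return ("index_new", last_indexed)
    # History was rewritten: unindex/reindex starting at the common ancestor.
    base = merge_base(last_indexed, new_tip)
    if base is not None:
        return ("reindex_from", base)
    # No common ancestor at all: unindex and reindex the whole epoch.
    return ("reindex_epoch", None)
```

The git helpers are passed in as callables (e.g. via functools.partial(git_is_ancestor, git_dir)) so the decision logic can be exercised without a repository.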

Never mind; I just reread my code more carefully :x

Actually, the unindexing code currently stores an {unindexed}
hash, which is a { Message-ID => (NNTP) num } mapping.

This allows most unedited messages to keep the same NNTP article
number, so clients don't see them twice.  "Most" meaning non-broken
messages which don't reuse Message-IDs.
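A toy model of the current Message-ID-keyed behaviour, as a hedged sketch only: the row shape (num, oid, message_id), the next_num counter and both function names are hypothetical, not V2Writable's data structures.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes: an indexed row is (num, oid, message_id);
# an incoming message is (oid, message_id).
Row = Tuple[int, str, str]


def unindex_by_mid(indexed: List[Row]) -> Dict[str, int]:
    unindexed: Dict[str, int] = {}
    for num, _oid, mid in indexed:
        # A reused Message-ID silently overwrites the earlier number.
        unindexed[mid] = num
    return unindexed


def reindex_by_mid(
    msgs: List[Tuple[str, str]], unindexed: Dict[str, int], next_num: int
) -> Tuple[List[Row], int]:
    out: List[Row] = []
    for oid, mid in msgs:
        # Reuse the old article number if this Message-ID was unindexed,
        # even when the content (oid) changed, e.g. after -edit.
        num = unindexed.pop(mid, None)
        if num is None:
            num = next_num
            next_num += 1
        out.append((num, oid, mid))
    return out, next_num
```

In this model an edited message keeps its old number, so a reader who already fetched that article never sees the new version; and when two messages share a Message-ID, both end up with shifted numbers.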

I'm thinking {unindexed} should instead be a
{ OID => [ num, Message-ID ] } mapping.

That would allow the new version of the edited message to be
piped and seen by NNTP/IMAP readers.
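The same toy model keyed by blob OID instead, again a hypothetical sketch rather than the actual patch: only byte-identical messages get their old number back, so an edited message is treated as new.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes: an indexed row is (num, oid, message_id);
# an incoming message is (oid, message_id).
Row = Tuple[int, str, str]


def unindex_by_oid(indexed: List[Row]) -> Dict[str, Tuple[int, str]]:
    # Distinct blobs never collide, even when they share a Message-ID.
    return {oid: (num, mid) for num, oid, mid in indexed}


def reindex_by_oid(
    msgs: List[Tuple[str, str]],
    unindexed: Dict[str, Tuple[int, str]],
    next_num: int,
) -> Tuple[List[Row], int]:
    out: List[Row] = []
    for oid, mid in msgs:
        prev = unindexed.pop(oid, None)
        if prev is not None:
            num = prev[0]  # unchanged blob: keep its article number
        else:
            num = next_num  # new or edited blob: readers will see it
            next_num += 1
        out.append((num, oid, mid))
    return out, next_num
```

Entries left in the mapping afterwards are unindexed blobs that did not come back (for example, the pre-edit copy of an edited message).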

You *do* want to pipe the new version of the message you've
edited, right?

> > AFAIK, the common_ancestor_found case is always true unless
> > somebody was wacky enough to run a full gc+prune immediately
> > after fetching.  IOW, I don't think the else case happens
> > in practice.
> 
> :) It kinda does in the grok-pi-piper case, since one of the config options is to
> continuously "reshallow" the repository to basically contain no objects.
> 
> https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git/tree/grokmirror/pi_piper.py#n58
> 
> I know that this is "wacky" as you say, but it helps save dramatic amounts of
> space when cloning most of lore.kernel.org repositories. We can still use "git
> fetch --deepen" when necessary, but this does make it impossible to use the
> common ancestor strategy when dealing with history rewrites.
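Since a reshallowed clone may have no common ancestor to work from, a piper could instead remember which blob OIDs it has already delivered. This is a hypothetical sketch, not grok-pi-piper's code: pipe_new, the JSON state file, and the (oid, raw) input shape are all assumptions.

```python
import json
import os
from typing import Callable, Iterable, Set, Tuple


def load_seen(path: str) -> Set[str]:
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_seen(path: str, seen: Set[str]) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(seen), f)
    os.replace(tmp, path)  # atomic rename so a crash can't truncate the state


def pipe_new(
    blobs: Iterable[Tuple[str, bytes]],
    pipe: Callable[[bytes], None],
    state_path: str,
) -> int:
    seen = load_seen(state_path)
    piped = 0
    for oid, raw in blobs:
        if oid in seen:
            continue  # already delivered, e.g. before a history rewrite
        pipe(raw)  # an edited message has a new OID, so it is piped
        seen.add(oid)
        piped += 1
    # A crash before this point re-pipes the whole batch next run.
    save_seen(state_path, seen)
    return piped
```

The seen-set grows without bound in this sketch; a real tool would need some pruning policy.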

Understood.  So yeah, actually the current {unindexed} hash in
V2Writable mostly does what we want, but I'm preparing a patch
which does the aforementioned { OID => [ num, Message-ID ] }
mapping.


Thread overview: 9+ messages
2020-12-21 21:20 public-inbox + mlmmj best practices? Konstantin Ryabitsev
2020-12-21 21:39 ` Eric Wong
2020-12-22  6:28   ` Eric Wong
2020-12-28 16:22     ` Konstantin Ryabitsev
2020-12-28 21:31       ` Eric Wong
2021-01-04 20:12         ` Konstantin Ryabitsev
2021-01-05  1:06           ` Eric Wong [this message]
2021-01-05  1:29             ` [PATCH] v2writable: exact discontiguous history handling Eric Wong
2021-01-09 22:21               ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
