From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-3.9 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 9CAD91F404;
	Fri, 10 Aug 2018 17:47:08 +0000 (UTC)
Date: Fri, 10 Aug 2018 17:47:08 +0000
From: Eric Wong
To: "Eric W. Biederman"
Cc: meta@public-inbox.org
Subject: Re: [PATCH] Import.pm: When purging replace a purged file with a zero length file
Message-ID: <20180810174708.i5gnteidb6atyrzr@dcvr>
References: <87d0urt761.fsf@xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <87d0urt761.fsf@xmission.com>
List-Id:

"Eric W. Biederman" wrote:
>
> This ensures that the number of added files remains the same and thus
> the article numbers derived from a repository will remain the same.
>
> I think this is the last place in public-inbox that has to be tweaked to
> guarantee the generated article number will remain the same in an public
> inbox archive.

OK, definitely desirable.

> Signed-off-by: "Eric W. Biederman"
> ---
>  lib/PublicInbox/Import.pm | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
> index bfa7a8053297..3df7d98f298b 100644
> --- a/lib/PublicInbox/Import.pm
> +++ b/lib/PublicInbox/Import.pm
> @@ -519,11 +519,12 @@ sub purge_oids {
>  				push @buf, $buf;
>  			} elsif (/^M 100644 ([a-f0-9]+) (\w+)/) {
>  				my ($oid, $path) = ($1, $2);
> +				$tree->{$path} = 1;
>  				if ($purge->{$oid}) {
>  					push @oids, $oid;
> -					delete $tree->{$path};
> +					my $cmd = "M 100644 inline $path\ndata 0\n\n";
> +					push @buf, $cmd;
>  				} else {
> -					$tree->{$path} = 1;
>  					push @buf, $_;
>  				}
>  			} elsif (/^D (\w+)/) {
> --

OK.
I haven't checked, but is the indexing/re-indexing code able to
deal with zero-byte messages?

Thanks.