From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS6315 166.70.0.0/16
X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,
	RCVD_IN_DNSWL_LOW,SPF_PASS shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.1
Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id C57F91F404;
	Sat, 11 Aug 2018 01:12:45 +0000 (UTC)
Received: from in02.mta.xmission.com ([166.70.13.52])
	by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87)
	(envelope-from )
	id 1foISS-0002x1-6m; Fri, 10 Aug 2018 19:12:44 -0600
Received: from [97.119.167.31] (helo=x220.xmission.com)
	by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87)
	(envelope-from )
	id 1foISQ-0005rS-Kq; Fri, 10 Aug 2018 19:12:44 -0600
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong
Cc: meta@public-inbox.org
References: <87d0urt761.fsf@xmission.com>
	<20180810174708.i5gnteidb6atyrzr@dcvr>
Date: Fri, 10 Aug 2018 20:12:28 -0500
In-Reply-To: <20180810174708.i5gnteidb6atyrzr@dcvr> (Eric Wong's message of
	"Fri, 10 Aug 2018 17:47:08 +0000")
Message-ID: <877ekxiu4j.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1foISQ-0005rS-Kq;;;mid=<877ekxiu4j.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX1+oDSA/FbOwDn9YnfUk+tG+xNEbnr/Gweg=
X-SA-Exim-Connect-IP: 97.119.167.31
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: Re: [PATCH] Import.pm: When purging replace a purged file with a
	zero length file
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com)
List-Id:

Eric Wong writes:

> "Eric W. Biederman" wrote:
>>
>> This ensures that the number of added files remains the same and thus
>> the article numbers derived from a repository will remain the same.
>>
>> I think this is the last place in public-inbox that has to be tweaked to
>> guarantee the generated article number will remain the same in an public
>> inbox archive.
>
> OK, definitely desirable.
>
>> Signed-off-by: "Eric W. Biederman"
>> ---
>>  lib/PublicInbox/Import.pm | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
>> index bfa7a8053297..3df7d98f298b 100644
>> --- a/lib/PublicInbox/Import.pm
>> +++ b/lib/PublicInbox/Import.pm
>> @@ -519,11 +519,12 @@ sub purge_oids {
>>  				push @buf, $buf;
>>  			} elsif (/^M 100644 ([a-f0-9]+) (\w+)/) {
>>  				my ($oid, $path) = ($1, $2);
>> +				$tree->{$path} = 1;
>>  				if ($purge->{$oid}) {
>>  					push @oids, $oid;
>> -					delete $tree->{$path};
>> +					my $cmd = "M 100644 inline $path\ndata 0\n\n";
>> +					push @buf, $cmd;
>>  				} else {
>> -					$tree->{$path} = 1;
>>  					push @buf, $_;
>>  				}
>>  			} elsif (/^D (\w+)/) {
>> --
>
> OK.  I haven't checked, but is the indexing/re-indexing code
> able to deal with zero-byte messages?  Thanks.

The v2mirror test covers this case and it doesn't seem to have any
problems.  The v2mirror test performs an index_sync after the purge,
looks for warnings, and doesn't get any.  So I think we are OK.

Skimming through the code I don't see any obvious issues either.

Eric
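
For anyone following along without the Perl in front of them, here is a rough standalone sketch of what the patched branch does to the fast-import stream. It is written in Python purely for illustration and is not the Import.pm code; the function name rewrite_purged and the list-of-lines interface are assumptions made for the example. The point it shows is that a purged blob is replaced by an inline zero-length file instead of being dropped, so the path stays in the tree and the count of added files is unchanged.

```python
import re

# Same shape as the pattern in the patch: "M 100644 <oid> <path>".
MODIFY = re.compile(r"^M 100644 ([a-f0-9]+) (\w+)$")


def rewrite_purged(lines, purge):
    """Rewrite filemodify lines whose blob is in `purge` (hypothetical helper).

    `lines` are stream lines without trailing newlines; `purge` is a set
    of object ids to remove.  Returns (out, tree, oids).
    """
    out, tree, oids = [], {}, []
    for line in lines:
        m = MODIFY.match(line)
        if m is None:
            out.append(line)  # pass every other command through untouched
            continue
        oid, path = m.group(1), m.group(2)
        tree[path] = 1  # the path is always kept, purged or not
        if oid in purge:
            oids.append(oid)
            # Replace the purged blob with an inline zero-length file,
            # i.e. "M 100644 inline <path>\ndata 0\n\n" in the stream.
            out.extend([f"M 100644 inline {path}", "data 0", ""])
        else:
            out.append(line)
    return out, tree, oids
```

Before the patch the purged path was deleted from the tree, which is what changed the number of added files and therefore the derived article numbers.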