From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 6269F1F62D;
	Thu, 5 Jul 2018 23:13:46 +0000 (UTC)
Date: Thu, 5 Jul 2018 23:13:46 +0000
From: Eric Wong
To: "Eric W. Biederman"
Cc: meta@public-inbox.org
Subject: Re: Warnings from git fsck after lkml import
Message-ID: <20180705231346.GA6524@dcvr>
References: <87a7r6z1cy.fsf@xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87a7r6z1cy.fsf@xmission.com>
List-Id:

"Eric W. Biederman" wrote:
> It looks like public-inbox has some challenges when importing some
> questionable emails. The import of lkml has resulted in several commits
> with bad dates that git fsck complains about. I have previously
> reported this to Konstantin Ryabitsev who maintains kernel.org but since
> I have not seen any discussion of this I thought I should report it
> directly here as well.

Thanks for bringing this up publicly. Yes, early during v2
development I noticed old mails had some -1400 timezone values
(but the furthest real offset is -1200). I opted to attempt to
preserve the wonky timezones since fast-import happily accepts
-1400 and I didn't anticipate problems...

> At a practical level these errors initially preventing me from cloning
> the repos as in .gitconfig I had:
> > [transfer]
> > 	fsckobjects = true
> > [fetch]
> > 	fsckobjects = true
> > [receive]
> > 	fsckobjects = true

...But I didn't know people cared to set those :x

Now I wonder if git should only warn for bad-but-still-usable
objects on clone, as I wouldn't consider a malformed date to be
on the same level as actual FS corruption. Or at least complete
the clone and fail with a special exit code.

> Beyond the cloning issue while I don't expect public-inbox to fix the
> emails themselves it should be able to detect and prevent creating
> buggy commits.

Right, the emails themselves have wonky dates. I got
public-inbox to massage the dates into the bare minimum of what
fast-import finds acceptable(*). fast-import is rather liberal.

> Importing a large repo like linux-kernel seems like a good test case for
> finding these kinds of issues.

Fwiw, linux.git and git.git both warn about missingTaggerEntry
on fsck, yet clone fine with fsckObjects=true. Maybe clone
should not abort on badTimeZone, either. *shrug*

(*) In retrospect, especially with v2 which requires SQLite/Xapian,
    I'm thinking it's not even worth the trouble to parse out
    authorship information for git commit headers. Not sure if
    people would still use things like "git log --author=" for v2...
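
For illustration only, the kind of timezone massaging discussed above
could look roughly like the sketch below. It is Python rather than the
Perl public-inbox is written in; sanitize_tz is a hypothetical name,
and both the 14-hour cutoff and the +0000 fallback are assumptions, not
what public-inbox or git actually do.

```python
import re

# Assumed cutoff for illustration: offsets beyond +/-14:00 are treated as bogus.
MAX_OFFSET_MINUTES = 14 * 60

# Only a sign followed by exactly four digits ([+-]HHMM) is considered.
_TZ_RE = re.compile(r'^([+-])(\d{2})(\d{2})$')


def sanitize_tz(raw):
    """Return raw if it is a plausible [+-]HHMM offset, else '+0000'."""
    m = _TZ_RE.match(raw.strip())
    if not m:
        return '+0000'  # not in [+-]HHMM form at all
    sign, hh, mm = m.group(1), int(m.group(2)), int(m.group(3))
    if mm > 59 or hh * 60 + mm > MAX_OFFSET_MINUTES:
        return '+0000'  # syntactically fine but out of the assumed range
    return f'{sign}{hh:02d}{mm:02d}'
```

A stricter cutoff (say, 12 hours on the negative side) would rewrite
the -1400 values instead of preserving them; which behavior is
preferable is exactly the open question here.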