user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: meta@public-inbox.org
Subject: Re: Warnings from git fsck after lkml import
Date: Thu, 5 Jul 2018 23:13:46 +0000
Message-ID: <20180705231346.GA6524@dcvr>
In-Reply-To: <87a7r6z1cy.fsf@xmission.com>

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> It looks like public-inbox has some challenges when importing some
> questionable emails.  The import of lkml has resulted in several commits
> with bad dates that git fsck complains about.  I have previously
> reported this to Konstantin Ryabitsev who maintains kernel.org but since
> I have not seen any discussion of this I thought I should report it
> directly here as well.

Thanks for bringing this up publicly.

Yes, early during v2 development I noticed old mails had some
-1400 timezone values (though the furthest real offset in that
direction is -1200).  I opted to attempt to preserve the wonky
timezones since fast-import happily accepts -1400 and I didn't
anticipate problems...
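
To make the -1400 versus -1200 distinction concrete, here is a minimal
sketch of a plausibility check for a timezone offset.  It is not what
public-inbox does (public-inbox is written in Perl and, as described
above, chose to preserve the odd values).  The function name
normalize_tz, the +0000 fallback, and the -1200..+1400 bounds are all
assumptions made for illustration, and the thread does not state which
exact check git fsck applies.

```python
import re

# Assumed bounds: real-world UTC offsets run from -1200 to +1400.
MIN_OFFSET_MINUTES = -12 * 60
MAX_OFFSET_MINUTES = 14 * 60

# A sign followed by exactly four digits, e.g. "+0530" or "-0700".
_TZ_RE = re.compile(r"([+-])(\d{2})(\d{2})")


def normalize_tz(raw: str, fallback: str = "+0000") -> str:
    """Return raw if it is a plausible [+-]HHMM offset, else fallback.

    Hypothetical helper for illustration only.
    """
    m = _TZ_RE.fullmatch(raw.strip())
    if m is None:
        return fallback          # wrong shape, e.g. five digits
    sign, hh, mm = m.group(1), int(m.group(2)), int(m.group(3))
    if mm >= 60:
        return fallback          # minutes field out of range
    minutes = hh * 60 + mm
    if sign == "-":
        minutes = -minutes
    if not MIN_OFFSET_MINUTES <= minutes <= MAX_OFFSET_MINUTES:
        return fallback          # e.g. "-1400" is past -1200
    return f"{sign}{hh:02d}{mm:02d}"
```

Falling back to +0000 only changes how the local time is displayed.
The epoch timestamp stored next to the offset is unaffected.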

> At a practical level these errors initially preventing me from cloning
> the repos as in .gitconfig I had:
> > [transfer]
> >         fsckobjects = true
> > [fetch]
> >         fsckobjects = true
> > [receive]
> >         fsckobjects = true

...But I didn't know people cared to set those :x

Now I wonder if git should only warn for bad-but-still-usable
objects on clone, as I wouldn't consider a malformed date to be
on the same level as actual FS corruption.  Or at least complete
the clone and fail with a special exit code.

> Beyond the cloning issue while I don't expect public-inbox to fix the
> emails themselves it should be able to detect and prevent creating
> buggy commits.

Right, the emails themselves have wonky dates.  I got public-inbox
to massage the dates into the bare minimum of what fast-import
finds acceptable(*).  fast-import is rather liberal.
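
As a rough illustration of what "massaging" a Date: header for
fast-import can look like, the sketch below turns a header into
fast-import's raw date form (seconds since the epoch, a space, then a
[+-]HHMM offset) and builds an author/committer line from it.  This is
not public-inbox's code.  The helper names raw_git_date and ident_line,
the fallback_epoch parameter, and the choice to fall back to +0000 are
assumptions for the example.

```python
import calendar
from email.utils import parsedate_tz

# fast-import forbids "<", ">" and LF in names; NUL is dropped as well.
_FORBIDDEN = str.maketrans("", "", "<>\n\0")


def raw_git_date(date_header: str, fallback_epoch: int) -> str:
    """Return '<epoch> <+HHMM>' for a Date: header (hypothetical helper)."""
    parsed = parsedate_tz(date_header)
    if parsed is None:
        return f"{fallback_epoch} +0000"      # unparsable header
    offset = parsed[9] or 0                   # offset east of UTC, seconds
    try:
        epoch = calendar.timegm(parsed[:6]) - offset
    except (ValueError, OverflowError):
        return f"{fallback_epoch} +0000"
    if epoch < 0:
        return f"{fallback_epoch} +0000"      # avoid pre-1970 timestamps
    sign = "-" if offset < 0 else "+"
    hh, mm = divmod(abs(offset) // 60, 60)
    if hh > 99:
        return f"{epoch} +0000"               # cannot fit in four digits
    return f"{epoch} {sign}{hh:02d}{mm:02d}"


def ident_line(role: str, name: str, email: str, when: str) -> str:
    """Build e.g. 'author Name <addr> 1530832426 +0000' plus LF."""
    name = name.translate(_FORBIDDEN).strip()
    email = email.translate(_FORBIDDEN).strip()
    prefix = f"{role} {name} " if name else f"{role} "
    return f"{prefix}<{email}> {when}\n"
```

Note that this only enforces the shape fast-import needs.  An offset
such as -1400 still passes through unchanged, which matches the
"rather liberal" behavior described above.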

> Importing a large repo like linux-kernel seems like a good test case for
> finding these kinds of issues.

Fwiw, linux.git and git.git both warn about missingTaggerEntry
on fsck, yet clone fine with fsckObjects=true.  Maybe clone
should not abort on badTimeZone, either.  *shrug*
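
For anyone wanting to see which fsck messages a repository produces
and at which severity, a small sketch like the following can tally
them.  It assumes fsck lines of the form
"<severity> in <type> <object id>: <message id>: <text>", which is an
assumption about the output format rather than something stated in
this thread.  The function name summarize_fsck is hypothetical, and
the object ids in the usage example are placeholders.

```python
import re
from collections import Counter

# Assumed line shape: "error in commit <hex id>: badTimezone: ..."
_FSCK_RE = re.compile(r"^(error|warning) in (\w+) ([0-9a-f]+): (\w+):")


def summarize_fsck(lines):
    """Count fsck messages per (severity, message id) pair."""
    counts = Counter()
    for line in lines:
        m = _FSCK_RE.match(line)
        if m:                      # skip lines that do not fit the shape
            counts[(m.group(1), m.group(4))] += 1
    return counts


if __name__ == "__main__":
    # Placeholder object ids, not real objects from any repository.
    sample = [
        "error in commit " + "1" * 40 + ": badTimezone: bad time zone",
        "warning in tag " + "2" * 40 + ": missingTaggerEntry: no tagger",
    ]
    for (severity, msg_id), n in sorted(summarize_fsck(sample).items()):
        print(f"{severity:8} {msg_id:20} {n}")
```

Feeding it the captured stderr of a "git fsck" run (one line per list
element) gives a quick view of which message ids are errors and which
are only warnings.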



(*) In retrospect, especially with v2 which requires SQLite/Xapian,
    I'm thinking it's not even worth the trouble to parse out
    authorship information for git commit headers.  Not sure if
    people would still use things like "git log --author=" for
    v2...


Thread overview: 13+ messages
2018-07-05  5:40 Warnings from git fsck after lkml import Eric W. Biederman
2018-07-05 23:13 ` Eric Wong [this message]
2018-07-06  0:36   ` Eric W. Biederman
2018-07-06  3:47     ` Eric W. Biederman
2018-07-06 21:32       ` [PATCH] MsgTime.pm: Use strptime to compute the time zone Eric W. Biederman
2018-07-06 22:22         ` Eric Wong
2018-07-07 18:18           ` Eric W. Biederman
2018-07-07 18:22           ` [PATCH] Import: Don't copy nulls from emails into git Eric W. Biederman
2018-07-08  0:07             ` Eric Wong
2018-07-08  1:52               ` Eric W. Biederman
2018-07-12 18:31   ` Warnings from git fsck after lkml import Konstantin Ryabitsev
2018-07-12 22:19     ` Eric W. Biederman
2018-07-12 22:29     ` Eric Wong
