* Anyone know what is creating commits with bogus dates? @ 2020-05-21 17:49 Elijah Newren 2020-05-21 18:12 ` Eric Sunshine 2020-05-21 18:57 ` Jeff King 0 siblings, 2 replies; 5+ messages in thread From: Elijah Newren @ 2020-05-21 17:49 UTC (permalink / raw) To: Git Mailing List Hi, I wanted to report that we seem to have a number of repos in the wild with bogus (as in "won't even parse") dates. I first discovered such a repository in the wild a while ago with rails.git. It has a commit with a recorded timezone of "+051800" for both author and committer. Everything else about the commit looks fine. See https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734. Some google searches at the time turned up a few other examples, all with the same "+051800" issue. I put a special workaround for it into filter-repo because I figured it was slightly prominent but probably limited to that special timezone. The fact that it was six digits but the last two were zeros made it seem not quite as bad as it could be. Recently, I've gotten a few reports of commits existing in live repos with other forms of breakage. Dates that correspond to roughly a hundred years in the future, and with timezone specifiers that are either 7 or 8 digits long and no leading or trailing zeros. See https://github.com/newren/git-filter-repo/issues/88. Is there rogue software somewhere creating some bad commit objects? Does anyone have any idea what might be causing these? Any suggestions on the right course(s) of action? Right now, I'm leaning towards saying fast-import is right to bail on these, and maybe I'll investigate having filter-repo waste some cycles to try to detect these and report more helpful errors (though I really want to avoid full sanity checking as that's expensive). If I go that route, I'll probably make it link to the example command I provided in that bug report showing users how they can use filter-repo to fix those problems. Does that sound sane? Anything I'm missing? Thanks, Elijah ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Anyone know what is creating commits with bogus dates? 2020-05-21 17:49 Anyone know what is creating commits with bogus dates? Elijah Newren @ 2020-05-21 18:12 ` Eric Sunshine 2020-05-21 18:57 ` Jeff King 1 sibling, 0 replies; 5+ messages in thread From: Eric Sunshine @ 2020-05-21 18:12 UTC (permalink / raw) To: Elijah Newren; +Cc: Git Mailing List On Thu, May 21, 2020 at 1:49 PM Elijah Newren <newren@gmail.com> wrote: > I first discovered such a repository in the wild a while ago with > rails.git. It has a commit with a recorded timezone of "+051800" for > both author and committer. Everything else about the commit looks > fine. See https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734. > > Some google searches at the time turned up a few other examples, all > with the same "+051800" issue. I put a special workaround for it into > filter-repo because I figured it was slightly prominent but probably > limited to that special timezone. The fact that it was six digits but > the last two were zeros made it seem not quite as bad as it could be. In git.git itself, there were a couple bugs fixed [1,2] in which the author header lacked a NUL-terminator after the timezone and in which an error condition wasn't checked which led to the last digit of the timezone being duplicated, however, if I recall correctly, those only cropped up with "git rebase -i --root" (and only the author date-stamp was affected), which doesn't seem to correspond to the cases you're fixing. Perhaps those commits were created by tools other than git.git? [1]: ca3e1826a0 (sequencer: fix "rebase -i --root" corrupting author header, 2018-07-31) [2]: 0f16c09aae (sequencer: fix "rebase -i --root" corrupting author header timezone, 2018-07-31) ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Anyone know what is creating commits with bogus dates? 2020-05-21 17:49 Anyone know what is creating commits with bogus dates? Elijah Newren 2020-05-21 18:12 ` Eric Sunshine @ 2020-05-21 18:57 ` Jeff King 2020-05-21 19:31 ` Elijah Newren 1 sibling, 1 reply; 5+ messages in thread From: Jeff King @ 2020-05-21 18:57 UTC (permalink / raw) To: Elijah Newren; +Cc: Git Mailing List On Thu, May 21, 2020 at 10:49:17AM -0700, Elijah Newren wrote: > I wanted to report that we seem to have a number of repos in the wild > with bogus (as in "won't even parse") dates. > > I first discovered such a repository in the wild a while ago with > rails.git. It has a commit with a recorded timezone of "+051800" for > both author and committer. Everything else about the commit looks > fine. See https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734. > > Some google searches at the time turned up a few other examples, all > with the same "+051800" issue. I put a special workaround for it into > filter-repo because I figured it was slightly prominent but probably > limited to that special timezone. The fact that it was six digits but > the last two were zeros made it seem not quite as bad as it could be. I can't remember the source of the bug, but we've had a workaround in GitHub's incoming fsck checks to allow 6-digit zones like this since August 2011. I'm almost certain that it came up because of that rails/rails commit, but I don't remember the culprit implementation. I'm sure we would have dug it up and fixed it at the time. Sadly our commit message for the fsck tweak gives no further details, nor can I dig up anything out of issues/etc. I _think_ it wasn't GitHub/grit which did this (the 0-prefixed tree modes you might come across are, though). I couldn't find any mention of the fix there, at least. I'd suspect perhaps libgit2, but I also couldn't find any fix. But I think it would be safe to assume the bug is long-since fixed, and it's nice if you can be a bit more lenient on the parsing for historical issues like this. Arguably fast-export ought to be normalizing it to something syntactically correct (just like we probably do with other unparsable dates), though I guess you could argue that a filter might want to see the broken form in order to fix it in a custom way. -Peff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Anyone know what is creating commits with bogus dates? 2020-05-21 18:57 ` Jeff King @ 2020-05-21 19:31 ` Elijah Newren 2020-05-21 19:55 ` Jeff King 0 siblings, 1 reply; 5+ messages in thread From: Elijah Newren @ 2020-05-21 19:31 UTC (permalink / raw) To: Jeff King; +Cc: Git Mailing List On Thu, May 21, 2020 at 11:57 AM Jeff King <peff@peff.net> wrote: > > On Thu, May 21, 2020 at 10:49:17AM -0700, Elijah Newren wrote: > > > I wanted to report that we seem to have a number of repos in the wild > > with bogus (as in "won't even parse") dates. > > > > I first discovered such a repository in the wild a while ago with > > rails.git. It has a commit with a recorded timezone of "+051800" for > > both author and committer. Everything else about the commit looks > > fine. See https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734. > > > > Some google searches at the time turned up a few other examples, all > > with the same "+051800" issue. I put a special workaround for it into > > filter-repo because I figured it was slightly prominent but probably > > limited to that special timezone. The fact that it was six digits but > > the last two were zeros made it seem not quite as bad as it could be. > > I can't remember the source of the bug, but we've had a workaround in > GitHub's incoming fsck checks to allow 6-digit zones like this since > August 2011. I'm almost certain that it came up because of that > rails/rails commit, but I don't remember the culprit implementation. I'm > sure we would have dug it up and fixed it at the time. What about 7- and 8- digit timezones (like the ones in the linked filter-repo issue report)? Do you currently prevent users from pushing those to GitHub, or do you allow those too? I'm curious about whether there is anything else out there that might help flag these commits or if it's just filter-repo. > Sadly our commit message for the fsck tweak gives no further details, > nor can I dig up anything out of issues/etc. > > I _think_ it wasn't GitHub/grit which did this (the 0-prefixed tree > modes you might come across are, though). I couldn't find any mention of > the fix there, at least. I'd suspect perhaps libgit2, but I also > couldn't find any fix. > > But I think it would be safe to assume the bug is long-since fixed, and > it's nice if you can be a bit more lenient on the parsing for historical > issues like this. Arguably fast-export ought to be normalizing it to > something syntactically correct (just like we probably do with other > unparsable dates), though I guess you could argue that a filter might > want to see the broken form in order to fix it in a custom way. If we're going to be more lenient on the parsing, does that suggest fast-import shouldn't die on these? Currently, fast-import is the thing dying, not fast-export or filter-repo (though filter-repo of course halts when it notices that fast-import has died under it). I put in special-case code in filter-repo to munge the +051800 timezone case to keep fast-import from dying, but these new cases seem to suggest it's not just one bad timezone that I can check for and correct, but rather that they are completely random 7- or 8- (or who knows how many) digit timezones coupled with bogus (century-into-the-future) unix epochs. I'm a little less comfortable working around all of these than the very specific +051800 issue. On the filter-repo side, I think the most I would want to do here is provide cleaner warning or error messages than "fast-import died, here's a traceback." But I'm unusre if there are other steps we should take as well, such as making the fast-import parser more lenient. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Anyone know what is creating commits with bogus dates? 2020-05-21 19:31 ` Elijah Newren @ 2020-05-21 19:55 ` Jeff King 0 siblings, 0 replies; 5+ messages in thread From: Jeff King @ 2020-05-21 19:55 UTC (permalink / raw) To: Elijah Newren; +Cc: Git Mailing List On Thu, May 21, 2020 at 12:31:36PM -0700, Elijah Newren wrote: > > I can't remember the source of the bug, but we've had a workaround in > > GitHub's incoming fsck checks to allow 6-digit zones like this since > > August 2011. I'm almost certain that it came up because of that > > rails/rails commit, but I don't remember the culprit implementation. I'm > > sure we would have dug it up and fixed it at the time. > > What about 7- and 8- digit timezones (like the ones in the linked > filter-repo issue report)? Do you currently prevent users from > pushing those to GitHub, or do you allow those too? > I'm curious about whether there is anything else out there that might > help flag these commits or if it's just filter-repo. Our loosening allows any size: --- a/fsck.c +++ b/fsck.c [...] @@ -772,14 +778,16 @@ static int fsck_ident(const char **ident, if ((end == p || *end != ' ')) return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date"); p = end + 1; - if ((*p != '+' && *p != '-') || - !isdigit(p[1]) || - !isdigit(p[2]) || - !isdigit(p[3]) || - !isdigit(p[4]) || - (p[5] != '\n')) + if (*p != '+' && *p != '-') return report(options, oid, type, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone"); - p += 6; + p++; + + do { + if (!isdigit(*p)) + return report(options, oid, type, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone"); + p++; + } while (*p != '\n'); + I don't remember the nature of the bug well enough to know if the longer ones are likely to have the same cause. > > But I think it would be safe to assume the bug is long-since fixed, and > > it's nice if you can be a bit more lenient on the parsing for historical > > issues like this. Arguably fast-export ought to be normalizing it to > > something syntactically correct (just like we probably do with other > > unparsable dates), though I guess you could argue that a filter might > > want to see the broken form in order to fix it in a custom way. > > If we're going to be more lenient on the parsing, does that suggest > fast-import shouldn't die on these? Currently, fast-import is the > thing dying, not fast-export or filter-repo (though filter-repo of > course halts when it notices that fast-import has died under it). Ah, I thought filter-repo was noticing. I think it would be nice for either fast-export or fast-import to normalize syntactically invalid values to something sane (like just resetting a bogus timezone to +0000). I could see arguments for putting it in either spot (putting it in fast-import has the downside that we wouldn't catch invalid output generated by a script; putting it in fast-export has the downside that you can't notice and fix it up yourself if you choose to). Probably it should be in fast-export, but with an option to turn it off for people who want more control (or leave it off by default, and let people who run into problems turn it on). We already have similar options for handling un-exportable cases like signed tags. > I put in special-case code in filter-repo to munge the +051800 > timezone case to keep fast-import from dying, but these new cases seem > to suggest it's not just one bad timezone that I can check for and > correct, but rather that they are completely random 7- or 8- (or who > knows how many) digit timezones coupled with bogus > (century-into-the-future) unix epochs. I'm a little less comfortable > working around all of these than the very specific +051800 issue. On > the filter-repo side, I think the most I would want to do here is > provide cleaner warning or error messages than "fast-import died, > here's a traceback." But I'm unusre if there are other steps we > should take as well, such as making the fast-import parser more > lenient. I think if filter-repo does anything, it would probably be to read any syntactically invalid timezone and normalize it. But again, I think it would be fine to push that into fast-export. -Peff ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-05-21 19:55 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-21 17:49 Anyone know what is creating commits with bogus dates? Elijah Newren 2020-05-21 18:12 ` Eric Sunshine 2020-05-21 18:57 ` Jeff King 2020-05-21 19:31 ` Elijah Newren 2020-05-21 19:55 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).