git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ben Lynn" <benlynn@gmail.com>
To: git@vger.kernel.org
Subject: git bugs
Date: Tue, 10 Jun 2008 01:41:34 -0700	[thread overview]
Message-ID: <832adb090806100141n69c086a2v2f59fe94b2f4ead3@mail.gmail.com> (raw)

I've come across a couple of bugs. Most users will probably never
encounter them, but I think they ought to be fixed. Apologies if
they're well-known issues. I haven't read much of this list.

1. The import/export language poorly handles distinct initial commits
on the same branch, because given two commits with same branch name,
it assumes the latter is the descendant of the former (if there are no
"from" commands).

Normally this is what you want. But if your project, like git, ever
merges distinct initial commits, then all but the first will
unexpectedly gain parents, corrupting all their descendants' hashes.
For example:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ git checkout -b test 60ea0fdd7de001405fcc7591beb18a66a1f0dd09
$ git fast-export test > /tmp/x
$ cd /some/empty/dir
$ git init
$ git fast-import < /tmp/x
$ git checkout test

Importing a pristine export, we discover Linus did not in fact make
the first commit to the git project:

$ git log d154ebcc23bfcec2ed44e365af9e5c14c6e22015

As a workaround, I have a custom importer that knows that
git-fast-export omits the "from" command in initial commits. But there
should be a command to specify that the current commit is an initial
commit, allowing reliable export of projects such as git.

2. Kudos to everyone who figured out the nasty race condition and its
complex solution as described in Documentation/technical/racy-git.txt
and the comments of read-cache.c. It took me a while to get my head
around it.

Unfortunately, the solution isn't perfect. Try this:

$ echo xyzzy > file
$ git update-index --add file   # don't zero size since contents match
$ echo frotz > file             # all stats still match, contents don't
$ echo nitfol > other  # can be done much earlier
$ git update-index --add other  # now the cached size is zeroed
$ : > file                      # zero the file size muahahaha
$ # Type fast! The above must take place within the same second! ;-)
$ sleep 2
$ echo filfre > other
$ git update-index --add other  # stats of "file" match, hash is wrong

Essentially, we trigger two index writes that avoid operations on
"file": one immediately after "file" is first committed and identified
as racily clean, and the second some time later, after we have
sneakily zeroed the file behind git's back (after previously editing
its contents in place). We defeat the safeguards and end up with a bad
hash in the index that appears valid.

The"other" file merely causes index writes without reading the "file"
entry. It is also racily clean in the above, but this is irrelevant.

It's unlikely this sequence of operations would occur in real usage,
but I'd sleep better if this index corruption bug were eliminated. One
practical but unsatisfactory easy "fix" is to mark racily clean
entries with SIZE_MAX instead of 0. Who uses git to track to files of
this size?

A better solution would be to introduce a new per-entry flag. Let's
call it "dodgy". Then during a cache entry update, we set "dodgy" if
the mtime is bigger or equal to the index timestamp. And during cache
entry reads, we check "dodgy": if clear, then we trust the hatch,
otherwise we don't trust the hash and recompute it, again setting
"dodgy" if necessary (i.e. if the mtime matches the index timestamp
again).

Although this solution does require adding a flag per index entry, we
no longer have to scan through the index on every index write to
perform the size zero hack.

-Ben

             reply	other threads:[~2008-06-10  8:42 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-10  8:41 Ben Lynn [this message]
2008-06-10 16:58 ` git bugs Daniel Barkalow
2008-06-10 17:44 ` Linus Torvalds
2008-06-10 18:45   ` Ben Lynn
2008-06-10 20:06     ` Linus Torvalds
2008-06-10 23:09       ` Ben Lynn
2008-06-10 23:38         ` Junio C Hamano
2008-06-11  0:02           ` Ben Lynn
2008-06-11  0:20             ` Junio C Hamano
2008-06-11  0:24               ` Ben Lynn
2008-06-11  0:53                 ` Ben Lynn
2008-06-11 12:46                 ` Stephen R. van den Berg
2008-06-12  6:51                   ` Ben Lynn
2008-06-11  1:36             ` Linus Torvalds
2008-06-11  2:04               ` Ben Lynn
2008-06-11  2:12                 ` Linus Torvalds
2008-06-11  2:31                   ` Ben Lynn
2008-06-11  2:39                     ` Linus Torvalds
2008-06-11  5:58                       ` Ben Lynn
2008-06-11  6:18                         ` Ben Lynn
2008-06-11 14:54                           ` Linus Torvalds
2008-06-11 17:52                             ` Ben Lynn
2008-06-11 18:10                               ` Linus Torvalds
2008-06-11 18:48                                 ` Ben Lynn
2008-06-11 18:53                                   ` Linus Torvalds
2008-06-11 20:57                                     ` Ben Lynn
2008-06-11 21:50                                     ` Junio C Hamano
2008-06-11 14:52                         ` Linus Torvalds
2008-06-12 20:06   ` Junio C Hamano
2008-06-13 10:10   ` Jeff King
2008-06-13 23:09     ` Junio C Hamano
2008-06-14  6:25       ` Jeff King
2008-06-12  3:17 ` Shawn O. Pearce
2008-06-12  6:46   ` Ben Lynn
2008-06-12  7:12   ` Johannes Schindelin
  -- strict thread matches above, loose matches on Subject: below --
2017-02-23 20:27 Sean Hunt
2017-02-24 16:52 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=832adb090806100141n69c086a2v2f59fe94b2f4ead3@mail.gmail.com \
    --to=benlynn@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).