git@vger.kernel.org mailing list mirror (one of many)
* git behaviour question regarding SHA-1 and commits
@ 2011-11-13 17:04 vinassa vinassa
  2011-11-13 17:41 ` Ævar Arnfjörð Bjarmason
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: vinassa vinassa @ 2011-11-13 17:04 UTC
  To: git

Hello,

I am relatively new to git; I have only used it to track other git
projects, and sometimes to format and send patches to them, but never
to handle my own projects.

Now I am considering using git for my next task at work.

I am wondering about how git behaves currently, if I kinda win the
lottery of the universe, and happen to create a commit with a SHA-1
that is already the SHA-1 of another commit in the previous history.
However improbable.

Would that be detected, so that I could just add a newline, and then
commit with a different resulting SHA-1,
would I just lose one of those commits (hopefully the new one), would
I end up with a corrupted repository?

I found some mention of this in the archive, more about SHA-1 security
implications, that were dismissed, but here I am looking at just a
random, very unfortunate case, and just wondering if in this case I
would end up in a FUBAR situation.

Thank you,

Vinassa

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 17:04 git behaviour question regarding SHA-1 and commits vinassa vinassa
@ 2011-11-13 17:41 ` Ævar Arnfjörð Bjarmason
  2011-11-14  3:29   ` Junio C Hamano
  2011-11-13 18:27 ` Jonathan Nieder
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-11-13 17:41 UTC
  To: vinassa vinassa; +Cc: git

This is not something you have to worry about, just get on with using
Git and stop worrying about phenomenally unlikely edge cases that are
never going to happen.

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 17:04 git behaviour question regarding SHA-1 and commits vinassa vinassa
  2011-11-13 17:41 ` Ævar Arnfjörð Bjarmason
@ 2011-11-13 18:27 ` Jonathan Nieder
  2011-11-13 22:14   ` vinassa vinassa
  2011-11-14 11:32   ` Jeff King
  2011-11-13 22:14 ` Dmitry Potapov
  2011-11-14  7:39 ` Johannes Sixt
  3 siblings, 2 replies; 11+ messages in thread
From: Jonathan Nieder @ 2011-11-13 18:27 UTC
  To: vinassa vinassa; +Cc: git, Ævar Arnfjörð Bjarmason

Hi Vinassa,

vinassa vinassa wrote:

> I am wondering about how git behaves currently, if I kinda win the
> lottery of the universe, and happen to create a commit with a SHA-1
> that is already the SHA-1 of another commit in the previous history.
> However improbable.

That would be great!  You could definitely get an academic paper out
of it.

> Would that be detected, so that I could just add a newline, and then
> commit with a different resulting SHA-1,
> would I just lose one of those commits (hopefully the new one), would
> I end up with a corrupted repository?

I suspect that one of the two commits would "win" the right to be
shown by commands like "git log".  A commit made after one of the
commits participating in the hash collision might be stored as a delta
against the wrong one in the pack, producing errors when you try to
access it (which is good, since it helps you find the hash collision
and you can get a paper and prizes).

Though I haven't tested.  It would be nice to have an md5git (or even
truncated-sha1-git) program to test this kind of thing with.

Thanks and hope that helps,
Jonathan

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 18:27 ` Jonathan Nieder
@ 2011-11-13 22:14   ` vinassa vinassa
  2011-11-14 11:32   ` Jeff King
  1 sibling, 0 replies; 11+ messages in thread
From: vinassa vinassa @ 2011-11-13 22:14 UTC
  To: git

Hi, thanks for the responses, I get the picture. Some comments below still.

On Sun, Nov 13, 2011 at 7:27 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi Vinassa,
>
> vinassa vinassa wrote:
>
> > I am wondering about how git behaves currently, if I kinda win the
> > lottery of the universe, and happen to create a commit with a SHA-1
> > that is already the SHA-1 of another commit in the previous history.
> > However improbable.
>
> That would be great!  You could definitely get an academic paper out
> of it.
>
> > Would that be detected, so that I could just add a newline, and then
> > commit with a different resulting SHA-1,
> > would I just lose one of those commits (hopefully the new one), would
> > I end up with a corrupted repository?
>
> I suspect that one of the two commits would "win" the right to be
> shown by commands like "git log".  A commit made after one of the
> commits participating in the hash collision might be stored as a delta
> against the wrong one in the pack, producing errors when you try to
> access it (which is good, since it helps you find the hash collision
> and you can get a paper and prizes).

After cashing in the prizes, I would be able then to git reset --soft,
add a newline, make another commit and go on with my work, right? No
screw up big enough to demand restoring from backups.

> Though I haven't tested.  It would be nice to have an md5git (or even
> truncated-sha1-git) program to test this kind of thing with.

Yes, would be nice. I'll try to see if I can wrap my mind around the
test infrastructure.

> Thanks and hope that helps,
> Jonathan

Thank you for your patience. I understand I should not worry about
this, but it has made me even more curious about what would happen.

Vinassa

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 17:04 git behaviour question regarding SHA-1 and commits vinassa vinassa
  2011-11-13 17:41 ` Ævar Arnfjörð Bjarmason
  2011-11-13 18:27 ` Jonathan Nieder
@ 2011-11-13 22:14 ` Dmitry Potapov
  2011-11-14  7:39 ` Johannes Sixt
  3 siblings, 0 replies; 11+ messages in thread
From: Dmitry Potapov @ 2011-11-13 22:14 UTC
  To: vinassa vinassa; +Cc: git

On Sun, Nov 13, 2011 at 9:04 PM, vinassa vinassa
<vinassa.vinassa@gmail.com> wrote:
>
> I found some mention of this in the archive, more about SHA-1 security
> implications, that were dismissed, but here I am looking at just a
> random, very unfortunate case, and just wondering if in this case I
> would end up in a FUBAR situation.

I do not see how such an event would be very unfortunate, considering
that it would make you instantly famous, so you could write a
lot of articles about what happened and make a fortune out of it... but
if we consider a _far_ more likely event, like some object from
the sky falling directly on your head at the moment when you are doing
a commit, that would be really very unfortunate... So, maybe, you
should rent space in a bunker first just to work safely...

Seriously, it is ridiculous to worry so much about so improbable an
event, while in practice a lot of repository corruption comes from
unreliable DRAM, disk storage, or other causes. The mean time
between failures for high quality components is only a few hundred
years, while doing a commit every second would take millions of
times longer than the age of our universe to generate a collision.
Those probabilities are so different that nothing in our everyday
experience has the same difference in scale. It is like comparing
the width of a hair with the distance to the closest star.
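
The scale argument above can be checked with rough arithmetic. This is
only a sketch: it assumes the birthday bound of about 2^80 objects for
a 160-bit hash, one new object per second, and an age of the universe
of about 13.8 billion years. None of these constants come from the
thread, and the function name is made up for illustration.

```python
# Rough scale check; every constant below is an assumption for illustration.
HASH_BITS = 160                         # SHA-1 output size in bits
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9          # commonly cited estimate


def years_until_likely_collision(hash_bits: int,
                                 objects_per_second: float = 1.0) -> float:
    """Years needed to create about 2**(bits/2) objects (the birthday bound)."""
    objects_needed = 2 ** (hash_bits / 2)
    return objects_needed / objects_per_second / SECONDS_PER_YEAR


years = years_until_likely_collision(HASH_BITS)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.2e} years, about {ratio:.1e} times the age of the universe")
```

Under these assumptions the ratio comes out in the millions. A real
commit creates several objects at once, so the exact figure depends on
what is counted, but the order of magnitude is the point.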


Dmitry

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 17:41 ` Ævar Arnfjörð Bjarmason
@ 2011-11-14  3:29   ` Junio C Hamano
  2011-11-14 11:48     ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2011-11-14  3:29 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: vinassa vinassa, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> This is not something you have to worry about, just get on with using
> Git and stop worrying about phenomenally unlikely edge cases that are
> never going to happen.

People who repeated answers along this line, you can stop. The message has
been heard, but without answering the original question.

When we create a new object (i.e. "git add" to register a new blob
contents, "git commit" that internally generates new tree objects to
record updated "whole contents" and then records the commit object), we
first compute what the object name of the new object would be, and then
check if we already have an object with the same object name in the object
store. If we do, we do not write the new copy of the object out (see the
function write_sha1_file() in sha1_file.c and the call to has_sha1_file()
that bypasses write_loose_object()).

So the old contents will be kept without getting overwritten.
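
The write path described above can be modelled in a few lines. This is
a toy sketch, not the code in sha1_file.c: ToyObjectStore,
object_name() and find_colliding_blob() are hypothetical names, and
the store truncates names to 8 bits purely so that a colliding pair
can be found instantly. Only the naming scheme (SHA-1 over
"<type> <size>\0" followed by the content) matches real git.

```python
import hashlib


def object_name(obj_type: str, content: bytes, hex_digits: int = 40) -> str:
    """SHA-1 over '<type> <size>NUL' + content, as git names objects.

    hex_digits < 40 truncates the name. That is NOT git behaviour; it only
    makes collisions cheap enough to demonstrate.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()[:hex_digits]


class ToyObjectStore:
    """Hypothetical in-memory stand-in for the object store."""

    def __init__(self, hex_digits: int = 40):
        self.hex_digits = hex_digits
        self.objects = {}  # object name -> content

    def write(self, obj_type: str, content: bytes) -> str:
        name = object_name(obj_type, content, self.hex_digits)
        if name not in self.objects:      # plays the role of has_sha1_file()
            self.objects[name] = content  # plays the role of write_loose_object()
        # No byte-for-byte comparison: an existing object always wins.
        return name


def find_colliding_blob(target: bytes, hex_digits: int) -> bytes:
    """Brute-force a different blob whose truncated name equals target's."""
    wanted = object_name("blob", target, hex_digits)
    counter = 0
    while True:
        candidate = f"candidate {counter}\n".encode()
        if candidate != target and object_name("blob", candidate, hex_digits) == wanted:
            return candidate
        counter += 1


store = ToyObjectStore(hex_digits=2)        # 8-bit names, for the demo only
old = b"old contents\n"
new = find_colliding_blob(old, hex_digits=2)
old_name = store.write("blob", old)
new_name = store.write("blob", new)         # collides, so nothing is written
print(old_name == new_name, store.objects[new_name] == old)  # True True
```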

Which sounds nice, but it has interesting consequences: when we find
that what we tried to write already exists in the object store, we do
not bother running a byte-for-byte comparison that would let us error
out on the minuscule chance that we have hit a SHA-1 collision.

If the collision is between commit objects, for example, we would write
the (old) commit object name to the tip of the current branch. Most
likely, the tree object recorded in the (old) commit would not match the
tree object your "git commit" wanted to record (otherwise you have hit
SHA-1 collision twice in a row ;-), which would mean "git status" would
show that a whole bunch of paths have changed between the HEAD and the
index. Also "git log" would show the history leading to the (old) commit
that is likely to be very different from what you would expect immediately
after committing the collided commit. Of course, you could recover from it
with "git reset --soft" after finding out what the previous HEAD was from
the reflog, but it won't be a pleasant experience.

There can be other kinds of collisions (e.g. your latest commit might have
collided with an existing blob or tree, in which case it is likely that
almost nothing would work after finding a blob or tree in HEAD).

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 17:04 git behaviour question regarding SHA-1 and commits vinassa vinassa
                   ` (2 preceding siblings ...)
  2011-11-13 22:14 ` Dmitry Potapov
@ 2011-11-14  7:39 ` Johannes Sixt
  3 siblings, 0 replies; 11+ messages in thread
From: Johannes Sixt @ 2011-11-14  7:39 UTC
  To: vinassa vinassa; +Cc: git

On 11/13/2011 18:04, vinassa vinassa wrote:
> I am wondering about how git behaves currently, if I kinda win the
> lottery of the universe, and happen to create a commit with a SHA-1
> that is already the SHA-1 of another commit in the previous history.
> However improbable.
> 
> Would that be detected, so that I could just add a newline, and then
> commit with a different resulting SHA-1,
> would I just lose one of those commits (hopefully the new one), would
> I end up with a corrupted repository?

I *think* the following would happen:

1. Git detects that the (commit) object that it is about to generate
already exists, and does not write a new one.

2. Then the branch's ref is updated to the SHA-1. Since the original
commit is somewhere back in history, this is effectively like 'git reset
--soft that-commit'.

3. At your next 'git diff --cached', you notice unexpected differences
between the index and the branch head. You will wonder what happened.
("Who typed 'git reset --soft that-commit' while I was looking the other
way??")

4. To recover, you just 'git reset --soft @{1}' to revert to the state
before the commit attempt, and commit again. Your commit message from the
first attempt will be lost unless you have used -C or -F for your commit.
At any rate, you can reuse the exact same commit message for this second
commit attempt, because by now time will have advanced by at least one
second, which gives you a different commit timestamp and, hence, a
different commit object.
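
The last point in step 4 can be illustrated by hashing two commit
objects that differ only in their timestamp. This is a minimal sketch:
commit_name() is a hypothetical helper, the identity string is a
placeholder, and it builds only the simplest parentless commit layout
(tree, author, committer, blank line, message).

```python
import hashlib

# Name of git's empty tree object.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_name(tree: str, message: str, timestamp: int,
                ident: str = "A U Thor <author@example.com>") -> str:
    """SHA-1 name of a parentless commit; ident is a made-up placeholder."""
    body = (
        f"tree {tree}\n"
        f"author {ident} {timestamp} +0000\n"
        f"committer {ident} {timestamp} +0000\n"
        f"\n"
        f"{message}\n"
    ).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


first = commit_name(EMPTY_TREE, "same message", 1321256340)
retry = commit_name(EMPTY_TREE, "same message", 1321256341)  # one second later
print(first)
print(retry)
```

The two names share the tree, the identity and the message, yet differ
because the timestamp is part of the hashed bytes.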

-- Hannes

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-13 18:27 ` Jonathan Nieder
  2011-11-13 22:14   ` vinassa vinassa
@ 2011-11-14 11:32   ` Jeff King
  2011-11-14 12:48     ` Victor Engmark
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff King @ 2011-11-14 11:32 UTC
  To: Jonathan Nieder
  Cc: vinassa vinassa, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 13, 2011 at 12:27:57PM -0600, Jonathan Nieder wrote:

> Though I haven't tested.  It would be nice to have an md5git (or even
> truncated-sha1-git) program to test this kind of thing with.

Fortunately we have such a thing:

  http://article.gmane.org/gmane.comp.version-control.git/184243

That one actually has 40 bits of hash entropy, so you'd expect to
generate 2^20 (about a million) commits before accidentally colliding.
If you want an easier experiment, you could truncate it even further.
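
The 2^20 figure is the usual birthday bound: with b bits of hash, a
collision among random objects becomes likely after roughly 2^(b/2) of
them. A small sketch of the arithmetic, using the standard
approximation 1 - exp(-n(n-1)/(2N)); the function names are made up
for illustration.

```python
import math


def birthday_threshold(hash_bits: int) -> int:
    """Rough object count at which a collision becomes likely: 2**(bits/2)."""
    return 2 ** (hash_bits // 2)


def collision_probability(num_objects: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among num_objects names."""
    space = 2 ** hash_bits
    return -math.expm1(-num_objects * (num_objects - 1) / (2 * space))


print(birthday_threshold(40))                        # 1048576, i.e. 2**20
print(birthday_threshold(160) == 2 ** 80)            # True for full SHA-1
print(round(collision_probability(2 ** 20, 40), 2))  # 0.39
```

So at 2^20 objects the chance of a collision in a 40-bit space is
already around 39%, and it climbs quickly from there.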

-Peff

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-14  3:29   ` Junio C Hamano
@ 2011-11-14 11:48     ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2011-11-14 11:48 UTC
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, vinassa vinassa, git

On Sun, Nov 13, 2011 at 07:29:05PM -0800, Junio C Hamano wrote:

> If the collision is between commit objects, for example, we would write
> the (old) commit object name to the tip of the current branch. Most
> likely, the tree object recorded in the (old) commit would not match the
> tree object your "git commit" wanted to record (otherwise you have hit
> SHA-1 collision twice in a row ;-), which would mean "git status" would
> show that a whole bunch of paths have changed between the HEAD and the
> index. Also "git log" would show the history leading to the (old) commit
> that is likely to be very different from what you would expect immediately
> after committing the collided commit. Of course, you could recover from it
> with "git reset --soft" after finding out what the previous HEAD was from
> the reflog, but it won't be a pleasant experience.
> 
> There can be other kinds of collisions (e.g. your latest commit might have
> collided with an existing blob or tree, in which case it is likely that
> almost nothing would work after finding a blob or tree in HEAD).

You are more likely to just have blobs collide, since we generate many
more blobs than commits (each commit should have at least one changed
blob, but typically has more).

And in that case, I expect git would silently lose that state. We would
fail to write the new blob to the object db, but "git diff" would report
nothing, as it would see that the index entry's sha1 is the same as what
is in HEAD, and that the file is up to date with respect to the stat
information in the index. So if you were to "git checkout", your content
would be lost forever. However, if you instead modify the file further,
the new content will be kept (and you will get a very confusing diff).
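
The silent loss follows from comparing object names rather than
contents. A toy model of that, not git's actual code: diff_names() and
the three dictionaries are hypothetical stand-ins for the diff, the
object db, HEAD and the index, and blob names are truncated to 8 bits
so that an unlucky edit can be found instantly.

```python
import hashlib


def blob_name(content: bytes, hex_digits: int = 2) -> str:
    """Truncated blob name (2 hex digits = 8 bits, demo only; git uses 40)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()[:hex_digits]


def colliding_edit(original: bytes) -> bytes:
    """Find longer, different content that gets the same truncated name."""
    counter = 0
    while True:
        edited = original + f"edit {counter}\n".encode()
        if blob_name(edited) == blob_name(original):
            return edited
        counter += 1


def diff_names(head: dict, index: dict) -> list:
    """Report paths whose object names differ; contents are never compared."""
    paths = head.keys() | index.keys()
    return sorted(p for p in paths if head.get(p) != index.get(p))


committed = b"version in HEAD\n"
edited = colliding_edit(committed)          # the unlucky new content

object_db = {blob_name(committed): committed}
head = {"file.txt": blob_name(committed)}

# Staging the edit: the name already exists, so the new blob is not stored.
object_db.setdefault(blob_name(edited), edited)
index = {"file.txt": blob_name(edited)}

print(diff_names(head, index))                    # [] : nothing to report
print(object_db[index["file.txt"]] == committed)  # True: only old content kept
```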

-Peff

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-14 11:32   ` Jeff King
@ 2011-11-14 12:48     ` Victor Engmark
  2011-11-14 13:04       ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Victor Engmark @ 2011-11-14 12:48 UTC
  To: Jeff King
  Cc: Jonathan Nieder, vinassa vinassa, git,
	Ævar Arnfjörð Bjarmason

On Mon, Nov 14, 2011 at 06:32:35AM -0500, Jeff King wrote:
> On Sun, Nov 13, 2011 at 12:27:57PM -0600, Jonathan Nieder wrote:
> 
> > Though I haven't tested.  It would be nice to have an md5git (or even
> > truncated-sha1-git) program to test this kind of thing with.
> 
> Fortunately we have such a thing:
> 
>   http://article.gmane.org/gmane.comp.version-control.git/184243
> 
> That one actually has 40 bits of hash entropy, so you'd expect to
> generate 2^20 (about a million) commits before accidentally colliding.
> If you want an easier experiment, you could truncate it even further.

Would it be helpful to truncate this to something ludicrous like a
single byte of entropy, to be able to write tests for the various tools
and options?

Cheers,
V

-- 
terreActive AG
Kasinostrasse 30
CH-5001 Aarau
Tel: +41 62 834 00 55
Fax: +41 62 823 93 56
www.terreactive.ch

We secure your success - for 15 years

* Re: git behaviour question regarding SHA-1 and commits
  2011-11-14 12:48     ` Victor Engmark
@ 2011-11-14 13:04       ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2011-11-14 13:04 UTC
  To: Jonathan Nieder, vinassa vinassa, git,
	Ævar Arnfjörð Bjarmason

On Mon, Nov 14, 2011 at 01:48:51PM +0100, Victor Engmark wrote:

> > Fortunately we have such a thing:
> > 
> >   http://article.gmane.org/gmane.comp.version-control.git/184243
> > 
> > That one actually has 40 bits of hash entropy, so you'd expect to
> > generate 2^20 (about a million) commits before accidentally colliding.
> > If you want an easier experiment, you could truncate it even further.
> 
> Would it be helpful to truncate this to something ludicrous like a
> single byte of entropy, to be able to write tests for the various tools
> and options?

That's probably too small. Obviously any implementation like this is not
going to be usable for interacting with existing repositories, but if
you have too many collisions, then you won't even be able to create a
few new commits for your test.

Something like 20 bits means you can brute-force a collision for a
particular blob, commit, tree, or whatever in a few seconds, but you
won't be having accidental ones all the time.
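
A rough way to see the 20-bit estimate in practice is to brute-force a
second object whose truncated name matches a given one. This is a
sketch in plain Python, not the truncated-hash git patch linked above:
truncated_name() and brute_force_collision() are hypothetical helpers,
and keeping the top bits of the SHA-1 is an assumption about how the
truncation is done.

```python
import hashlib
import itertools


def truncated_name(obj_type: str, content: bytes, bits: int) -> int:
    """Top `bits` bits of the git-style SHA-1 object name, as an integer."""
    header = f"{obj_type} {len(content)}\0".encode()
    digest = hashlib.sha1(header + content).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def brute_force_collision(target: bytes, bits: int = 20, obj_type: str = "blob"):
    """Return (content, attempts) for a different object with the same name."""
    wanted = truncated_name(obj_type, target, bits)
    for attempts in itertools.count(1):
        candidate = f"attempt {attempts}\n".encode()
        if candidate != target and truncated_name(obj_type, candidate, bits) == wanted:
            return candidate, attempts


if __name__ == "__main__":
    twin, attempts = brute_force_collision(b"hello\n", bits=20)
    print(f"found {twin!r} after {attempts} attempts (about 2**20 expected)")
```

Matching one particular object takes about 2^bits attempts on average,
which is why 20 bits stays within seconds, while accidental collisions
among unrelated objects only become likely after about 2^10 of them.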

-Peff
