git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: linux@horizon.com
To: jgarzik@pobox.com
Cc: git@vger.kernel.org
Subject: Re: Hash collision count
Date: 24 Apr 2005 23:16:05 -0000	[thread overview]
Message-ID: <20050424231605.30173.qmail@science.horizon.com> (raw)

*Sigh*.

> A collision -will- occur eventually, and it is trivial to avoid this 
> problem:

Yes, it will occur *eventually*.  Let's put some numbers on this
"eventually" business.

The earth's sun will run out of hydrogen and become a red giant in about
6 billion years, 1.3 * 2^57 seconds.  (Many people say 5 billion. but
I'll round up for safety.)  Suppose we add 1 file per second to our
repository until we are interrupted by the sun's photosphere engulfing
the planet and melting all our computers.

That's a total of n*(n-1)/2 = 1.7 * 2^113 pairs of files which can
possibly collide.  I'll round up to 2^114 for simplicity.

Assuming we have a good uniform hash function, the chance that any given
pair of different file versions produces an identical hash is 1/2^160.

The chance that there are no collsions is (1-1/2^160)^(2^114).  What is
this numerically?  Well, (1-a)*(1-b) = 1 - a - b + a*b.  When multiplying
probabilities very close to 1 (a and b small), the probability is
a little bit larger than (1-(a+b)).

Likewise, when exponentiating proabilities very close to 1, (1-a)^n is
a bit larger than 1-n*a.

Thus, the probability of no collisions is a bit larger than (1 - 2^114/2^160),
or (1 - 2^-46).  The probability of one or more collisions is a bit *less*
than 2^-46, 1 in 70 trillion.


With odds like that, I submit that it's not worth fixing; the odds
of introducing a bug are far higher than that.


If you can't find a hard drive large enough to store 183 quadrillion
file versions, the probability of a collision decreases as the square of
the fraction of that number you do store.  For example, if you only have
400 billion versions, the chance of a collision is around 2^-84.

We have a fair bit of safety margin in our "good uniform hash" assumption.

Running out of 32-bit inode number space is a *far* more urgent problem.

Not to mention that over the next 1000 years of kernel maintainers, one
of them may find something they like better than git.

             reply	other threads:[~2005-04-24 23:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-24 23:16 linux [this message]
  -- strict thread matches above, loose matches on Subject: below --
2005-04-23 20:27 Hash collision count Jeff Garzik
2005-04-23 20:33 ` Jeff Garzik
2005-04-23 23:00 ` Ray Heasman
2005-04-23 23:20   ` Jeff Garzik
2005-04-23 23:46     ` Petr Baudis
2005-04-24  0:35       ` Jeff Garzik
2005-04-24  0:40         ` Petr Baudis
2005-04-24  0:43           ` Jeff Garzik
2005-04-24 21:24             ` Imre Simon
2005-04-25 23:50       ` Tom Lord
2005-04-26  0:00         ` Petr Baudis
2005-04-24  1:01     ` Ray Heasman
2005-04-24  7:56 ` David Lang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050424231605.30173.qmail@science.horizon.com \
    --to=linux@horizon.com \
    --cc=git@vger.kernel.org \
    --cc=jgarzik@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).