From: linux@horizon.com
To: jgarzik@pobox.com
Cc: git@vger.kernel.org
Subject: Re: Hash collision count
Date: 24 Apr 2005 23:16:05 -0000 [thread overview]
Message-ID: <20050424231605.30173.qmail@science.horizon.com> (raw)
*Sigh*.
> A collision -will- occur eventually, and it is trivial to avoid this
> problem:
Yes, it will occur *eventually*. Let's put some numbers on this
"eventually" business.
The earth's sun will run out of hydrogen and become a red giant in about
6 billion years, 1.3 * 2^57 seconds. (Many people say 5 billion. but
I'll round up for safety.) Suppose we add 1 file per second to our
repository until we are interrupted by the sun's photosphere engulfing
the planet and melting all our computers.
That's a total of n*(n-1)/2 = 1.7 * 2^113 pairs of files which can
possibly collide. I'll round up to 2^114 for simplicity.
Assuming we have a good uniform hash function, the chance that any given
pair of different file versions produces an identical hash is 1/2^160.
The chance that there are no collsions is (1-1/2^160)^(2^114). What is
this numerically? Well, (1-a)*(1-b) = 1 - a - b + a*b. When multiplying
probabilities very close to 1 (a and b small), the probability is
a little bit larger than (1-(a+b)).
Likewise, when exponentiating proabilities very close to 1, (1-a)^n is
a bit larger than 1-n*a.
Thus, the probability of no collisions is a bit larger than (1 - 2^114/2^160),
or (1 - 2^-46). The probability of one or more collisions is a bit *less*
than 2^-46, 1 in 70 trillion.
With odds like that, I submit that it's not worth fixing; the odds
of introducing a bug are far higher than that.
If you can't find a hard drive large enough to store 183 quadrillion
file versions, the probability of a collision decreases as the square of
the fraction of that number you do store. For example, if you only have
400 billion versions, the chance of a collision is around 2^-84.
We have a fair bit of safety margin in our "good uniform hash" assumption.
Running out of 32-bit inode number space is a *far* more urgent problem.
Not to mention that over the next 1000 years of kernel maintainers, one
of them may find something they like better than git.
next reply other threads:[~2005-04-24 23:11 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-04-24 23:16 linux [this message]
-- strict thread matches above, loose matches on Subject: below --
2005-04-23 20:27 Hash collision count Jeff Garzik
2005-04-23 20:33 ` Jeff Garzik
2005-04-23 23:00 ` Ray Heasman
2005-04-23 23:20 ` Jeff Garzik
2005-04-23 23:46 ` Petr Baudis
2005-04-24 0:35 ` Jeff Garzik
2005-04-24 0:40 ` Petr Baudis
2005-04-24 0:43 ` Jeff Garzik
2005-04-24 21:24 ` Imre Simon
2005-04-25 23:50 ` Tom Lord
2005-04-26 0:00 ` Petr Baudis
2005-04-24 1:01 ` Ray Heasman
2005-04-24 7:56 ` David Lang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050424231605.30173.qmail@science.horizon.com \
--to=linux@horizon.com \
--cc=git@vger.kernel.org \
--cc=jgarzik@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).