* easy way to demonstrate length of colliding SHA-1 prefixes?
@ 2018-12-02 11:50 Robert P. J. Day
2018-12-02 13:23 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 5+ messages in thread
From: Robert P. J. Day @ 2018-12-02 11:50 UTC (permalink / raw)
To: Git Mailing list
as part of an upcoming git class i'm delivering, i thought it would
be amusing to demonstrate the maximum length of colliding SHA-1
prefixes in a repository (in my case, i use the linux kernel git repo
for most of my examples).
is there a way to display the objects in the object database that
clash in the longest object name SHA-1 prefix; i mean, short of
manually listing all object names, running that through cut and sort
and uniq and ... you get the idea.
is there a cute way to do that? thanks.
rday
* Re: easy way to demonstrate length of colliding SHA-1 prefixes?
From: Ævar Arnfjörð Bjarmason @ 2018-12-02 13:23 UTC (permalink / raw)
To: Robert P. J. Day; +Cc: Git Mailing list
On Sun, Dec 02 2018, Robert P. J. Day wrote:
> as part of an upcoming git class i'm delivering, i thought it would
> be amusing to demonstrate the maximum length of colliding SHA-1
> prefixes in a repository (in my case, i use the linux kernel git repo
> for most of my examples).
>
> is there a way to display the objects in the object database that
> clash in the longest object name SHA-1 prefix; i mean, short of
> manually listing all object names, running that through cut and sort
> and uniq and ... you get the idea.
>
> is there a cute way to do that? thanks.
You'll always need to list them all. It's inherently an operation where
for each SHA-1 you need to search for other ones with that prefix up to
a given length.
Perhaps you've missed that you can use --abbrev=N for this, and just
grep for things that are longer than that N, e.g. for linux.git:
git log --oneline --abbrev=10 --pretty=format:%h |
grep -E -v '^.{10}$' |
perl -pe 's/^(.{10}).*/$1/'
This will list the 4 objects that need more than 10 characters to be
shown unambiguously. If you then "git cat-file -t" them you'll get the
disambiguation help.
* Re: easy way to demonstrate length of colliding SHA-1 prefixes?
From: Robert P. J. Day @ 2018-12-02 16:24 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing list
On Sun, 2 Dec 2018, Ævar Arnfjörð Bjarmason wrote:
> On Sun, Dec 02 2018, Robert P. J. Day wrote:
>
> > as part of an upcoming git class i'm delivering, i thought it
> > would be amusing to demonstrate the maximum length of colliding
> > SHA-1 prefixes in a repository (in my case, i use the linux kernel
> > git repo for most of my examples).
> >
> > is there a way to display the objects in the object database
> > that clash in the longest object name SHA-1 prefix; i mean, short
> > of manually listing all object names, running that through cut and
> > sort and uniq and ... you get the idea.
> >
> > is there a cute way to do that? thanks.
>
> You'll always need to list them all. It's inherently an operation
> where for each SHA-1 you need to search for other ones with that
> prefix up to a given length.
i assumed as much, just wasn't sure about the esoteric dark corners
of git i've never gotten to yet.
> Perhaps you've missed that you can use --abbrev=N for this, and just
> grep for things that are longer than that N, e.g. for linux.git:
>
> git log --oneline --abbrev=10 --pretty=format:%h |
> grep -E -v '^.{10}$' |
> perl -pe 's/^(.{10}).*/$1/'
>
> This will list the 4 objects that need more than 10 characters to be
> shown unambiguously. If you then "git cat-file -t" them you'll get
> the disambiguation help.
that's pretty close to what i came up with, thanks.
rday
--
========================================================================
Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca/dokuwiki
Twitter: http://twitter.com/rpjday
LinkedIn: http://ca.linkedin.com/in/rpjday
========================================================================
* Re: easy way to demonstrate length of colliding SHA-1 prefixes?
From: Matthew DeVore @ 2018-12-03 22:30 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, Robert P. J. Day; +Cc: Git Mailing list
On 12/02/2018 05:23 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Sun, Dec 02 2018, Robert P. J. Day wrote:
>
>> as part of an upcoming git class i'm delivering, i thought it would
>> be amusing to demonstrate the maximum length of colliding SHA-1
>> prefixes in a repository (in my case, i use the linux kernel git repo
>> for most of my examples).
>>
>> is there a way to display the objects in the object database that
>> clash in the longest object name SHA-1 prefix; i mean, short of
>> manually listing all object names, running that through cut and sort
>> and uniq and ... you get the idea.
>>
>> is there a cute way to do that? thanks.
>
Here is a one-liner to do it. It is Perl line noise, so it's not very
cute, though that is subjective. The output shown below is for the Git
project (not Linux) repository as I've currently synced it:
$ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = "";
$long = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i <
length($long); if (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long =
substr($prev, 0, $i); last} } $prev = $n; END {say $long}'
c68038ef
$ git cat-file -t c68038ef
error: short SHA1 c68038ef is ambiguous
hint: The candidates are:
hint: c68038effe commit 2012-06-01 - vcs-svn: suppress a signed/unsigned comparison warning
hint: c68038ef00 blob
fatal: Not a valid object name c68038ef
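For anyone who finds the Perl hard to read, the same idea can be sketched
with sort and awk. This is only a sketch, not the one-liner above: it
relies on the fact that in a sorted list the two names sharing the longest
prefix end up next to each other, so a single pass comparing each name with
its predecessor is enough. The function name longest_shared_prefix is made
up for illustration, and on ties it reports the first prefix found rather
than the last one.

```shell
# Hypothetical helper: reads object names on stdin (one per line, optionally
# followed by a path as in "git rev-list --objects" output) and prints the
# longest prefix shared by two distinct names.
longest_shared_prefix() {
    cut -d' ' -f1 | LC_ALL=C sort -u | awk '
        NR > 1 {
            # Count the leading characters shared with the previous name.
            n = 0
            while (n < length($1) &&
                   substr($1, n + 1, 1) == substr(prev, n + 1, 1))
                n++
            if (n > best) { best = n; prefix = substr($1, 1, n) }
        }
        { prev = $1 }
        END { if (best > 0) print prefix }
    '
}
```

In a repository it could be fed from the same command as above, e.g.
"git rev-list --objects HEAD | longest_shared_prefix".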
> You'll always need to list them all. It's inherently an operation where
> for each SHA-1 you need to search for other ones with that prefix up to
> a given length.
>
> Perhaps you've missed that you can use --abbrev=N for this, and just
> grep for things that are longer than that N, e.g. for linux.git:
>
> git log --oneline --abbrev=10 --pretty=format:%h |
> grep -E -v '^.{10}$' |
> perl -pe 's/^(.{10}).*/$1/'
I think the goal was to search all object hashes, not just commits. And
git rev-list --objects will do that.
* Re: easy way to demonstrate length of colliding SHA-1 prefixes?
From: Jeff King @ 2018-12-03 22:57 UTC (permalink / raw)
To: Matthew DeVore
Cc: Ævar Arnfjörð Bjarmason, Robert P. J. Day,
Git Mailing list
On Mon, Dec 03, 2018 at 02:30:44PM -0800, Matthew DeVore wrote:
> Here is a one-liner to do it. It is Perl line noise, so it's not very cute,
> though that is subjective. The output shown below is for the Git project
> (not Linux) repository as I've currently synced it:
>
> $ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; $long
> = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < length($long); if
> (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = substr($prev, 0, $i);
> last} } $prev = $n; END {say $long}'
Ooh, object-collision golf.
Try:
git cat-file --batch-all-objects --batch-check='%(objectname)'
instead of "rev-list | sort". It's _much_ faster, because it doesn't
have to actually open the objects and walk the graph.
Some versions of uniq have "-w" (including GNU, but it's definitely not
in POSIX), which lets you do:
git cat-file --batch-all-objects --batch-check='%(objectname)' |
uniq -cdw 7
to list all collisions of length 7 (it will show just the first item
from each group, but you can use -D to see them all).
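On a system whose uniq has no "-w", something similar can be sketched in
awk. This assumes the names arrive sorted, which is the default order for
--batch-all-objects (pipe through sort first if in doubt). The function
name collisions_at is made up for illustration, and unlike "uniq -cd" it
prints the shared prefix rather than the first full name of each group.

```shell
# Hypothetical helper: collisions_at N
# Reads sorted object names on stdin and prints "count prefix" for every
# N-character prefix that is shared by more than one name.
collisions_at() {
    awk -v n="$1" '
        { p = substr($1, 1, n) }
        # Prefix changed: report the previous group if it had a collision.
        NR > 1 && p != prev { if (c > 1) print c, prev; c = 0 }
        { prev = p; c++ }
        END { if (c > 1) print c, prev }
    '
}
```

Usage would look like:
git cat-file --batch-all-objects --batch-check='%(objectname)' | collisions_at 7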
> > You'll always need to list them all. It's inherently an operation where
> > for each SHA-1 you need to search for other ones with that prefix up to
> > a given length.
> >
> > Perhaps you've missed that you can use --abbrev=N for this, and just
> > grep for things that are longer than that N, e.g. for linux.git:
> >
> > git log --oneline --abbrev=10 --pretty=format:%h |
> > grep -E -v '^.{10}$' |
> > perl -pe 's/^(.{10}).*/$1/'
>
> I think the goal was to search all object hashes, not just commits. And git
> rev-list --objects will do that.
You can add "-t --raw" to see the abbreviated tree and blob names,
though it gets tricky around handling merges.
-Peff