* [PATCH] Add git-relink-script to fix up missing hardlinks
@ 2005-06-26 18:15 Ryan Anderson
2005-06-26 18:36 ` Jeff Garzik
2005-06-26 19:07 ` Junio C Hamano
0 siblings, 2 replies; 6+ messages in thread
From: Ryan Anderson @ 2005-06-26 18:15 UTC
To: Linus Torvalds; +Cc: git, Junio C Hamano
Add git-relink-script
This will scan 2 or more object repositories and look for common objects, check
if they are hardlinked, and replace one with a hardlink to the other if not.
This version warns when skipping files because of size differences, and
handles more than 2 repositories automatically.
Signed-off-by: Ryan Anderson <ryan@michonline.com>
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -25,7 +25,7 @@ SCRIPTS=git git-apply-patch-script git-m
git-deltafy-script git-fetch-script git-status-script git-commit-script \
git-log-script git-shortlog git-cvsimport-script git-diff-script \
git-reset-script git-add-script git-checkout-script git-clone-script \
- gitk git-cherry git-rebase-script
+ gitk git-cherry git-rebase-script git-relink-script
PROG= git-update-cache git-diff-files git-init-db git-write-tree \
git-read-tree git-commit-tree git-cat-file git-fsck-cache \
diff --git a/git-relink-script b/git-relink-script
new file mode 100644
--- /dev/null
+++ b/git-relink-script
@@ -0,0 +1,173 @@
+#!/usr/bin/env perl
+# Copyright 2005, Ryan Anderson <ryan@michonline.com>
+# Distribution permitted under the GPL v2, as distributed
+# by the Free Software Foundation.
+# Later versions of the GPL at the discretion of Linus Torvalds
+#
+# Scan two or more git object-trees, and hardlink any common objects between them.
+
+use 5.006;
+use strict;
+use warnings;
+use Getopt::Long;
+
+sub get_canonical_form($);
+sub do_scan_directory($$$);
+sub compare_two_files($$);
+sub usage();
+sub link_two_files($$);
+
+# stats
+my $total_linked = 0;
+my $total_already = 0;
+my ($linked,$already);
+
+my $fail_on_different_sizes = 0;
+my $help = 0;
+GetOptions("safe" => \$fail_on_different_sizes,
+ "help" => \$help);
+
+usage() if $help;
+
+my (@dirs) = @ARGV;
+
+usage() if (!defined $dirs[0] || !defined $dirs[1]);
+
+$_ = get_canonical_form($_) foreach (@dirs);
+
+my $master_dir = pop @dirs;
+
+opendir(D,$master_dir . "objects/")
+ or die "Failed to open ${master_dir}objects/ : $!";
+
+my @hashdirs = grep !/^\.{1,2}$/, readdir(D);
+
+foreach my $repo (@dirs) {
+ $linked = 0;
+ $already = 0;
+ printf("Searching '%s' and '%s' for common objects and hardlinking them...\n",
+ $master_dir,$repo);
+
+ foreach my $hashdir (@hashdirs) {
+ do_scan_directory($master_dir, $hashdir, $repo);
+ }
+
+ printf("Linked %d files, %d were already linked.\n",$linked, $already);
+
+ $total_linked += $linked;
+ $total_already += $already;
+}
+
+printf("Totals: Linked %d files, %d were already linked.\n",
+ $total_linked, $total_already);
+
+
+sub do_scan_directory($$$) {
+ my ($srcdir, $subdir, $dstdir) = @_;
+
+ my $sfulldir = sprintf("%sobjects/%s/",$srcdir,$subdir);
+ my $dfulldir = sprintf("%sobjects/%s/",$dstdir,$subdir);
+
+ opendir(S,$sfulldir)
+ or die "Failed to opendir $sfulldir: $!";
+
+ foreach my $file (grep(!/^\.{1,2}$/, readdir(S))) {
+ my $sfilename = $sfulldir . $file;
+ my $dfilename = $dfulldir . $file;
+
+ compare_two_files($sfilename,$dfilename);
+
+ }
+ closedir(S);
+}
+
+sub compare_two_files($$) {
+ my ($sfilename, $dfilename) = @_;
+
+ # Perl's stat returns relevant information as follows:
+ # 0 = dev number
+ # 1 = inode number
+ # 7 = size
+ my @sstatinfo = stat($sfilename);
+ my @dstatinfo = stat($dfilename);
+
+ if (@sstatinfo == 0 && @dstatinfo == 0) {
+ die sprintf("Stat of both %s and %s failed: %s\n",$sfilename, $dfilename, $!);
+
+ } elsif (@dstatinfo == 0) {
+ return;
+ }
+
+ if ( ($sstatinfo[0] == $dstatinfo[0]) &&
+ ($sstatinfo[1] != $dstatinfo[1])) {
+ if ($sstatinfo[7] == $dstatinfo[7]) {
+ link_two_files($sfilename, $dfilename);
+
+ } else {
+ my $err = sprintf("ERROR: File sizes are not the same, cannot relink %s to %s.\n",
+ $sfilename, $dfilename);
+ if ($fail_on_different_sizes) {
+ die $err;
+ } else {
+ warn $err;
+ }
+ }
+
+ } elsif ( ($sstatinfo[0] == $dstatinfo[0]) &&
+ ($sstatinfo[1] == $dstatinfo[1])) {
+ $already++;
+ }
+}
+
+sub get_canonical_form($) {
+ my $dir = shift;
+ my $original = $dir;
+
+ die "$dir is not a directory." unless -d $dir;
+
+ $dir .= "/" unless $dir =~ m#/$#;
+ $dir .= ".git/" unless $dir =~ m#\.git/$#;
+
+ die "$original does not have a .git/ subdirectory.\n" unless -d $dir;
+
+ return $dir;
+}
+
+sub link_two_files($$) {
+ my ($sfilename, $dfilename) = @_;
+ my $tmpdname = sprintf("%s.old",$dfilename);
+ rename($dfilename,$tmpdname)
+ or die sprintf("Failure renaming %s to %s: %s",
+ $dfilename, $tmpdname, $!);
+
+ if (! link($sfilename,$dfilename)) {
+ my $linkerr = "$!";
+ my $failtxt = "";
+ unless (rename($tmpdname,$dfilename)) {
+ $failtxt = sprintf(
+ "Git Repository containing %s is probably corrupted, " .
+ "please copy '%s' to '%s' to fix.\n",
+ $dfilename, $tmpdname, $dfilename);
+ }
+
+ die sprintf("Failed to link %s to %s: %s\n%s",
+ $sfilename, $dfilename, $linkerr, $failtxt);
+ }
+
+ unlink($tmpdname)
+ or die sprintf("Unlink of %s failed: %s\n",
+ $tmpdname, $!);
+
+ $linked++;
+}
+
+
+sub usage() {
+ print("Usage: $0 [--safe] <dir> [<dir> ...] <master_dir> \n");
+ print("All directories should contain a .git/objects/ subdirectory.\n");
+ print("Options\n");
+ print("\t--safe\t" .
+ "Stops if two objects with the same hash exist but " .
+ "have different sizes. Default is to warn and continue.\n");
+ exit(1);
+}
--
Ryan Anderson
sometimes Pug Majere
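
[Editorial sketch, not part of the patch.] The decision made in
compare_two_files and the rename/link/unlink sequence in link_two_files
can be illustrated roughly as follows. This is a minimal Python sketch
rather than the script's Perl; the names classify and relink are
hypothetical, and, like the script, it looks only at device, inode and
size, never at file contents.

```python
import os


def classify(src, dst):
    """Decide what to do with dst, given the same object at src.

    Returns one of: 'missing', 'other-device', 'already',
    'size-mismatch', 'linkable'.
    """
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return "missing"            # object only exists in the source repository
    s = os.stat(src)
    if s.st_dev != d.st_dev:
        return "other-device"       # hardlinks cannot cross filesystems
    if s.st_ino == d.st_ino:
        return "already"            # same inode: already hardlinked
    if s.st_size != d.st_size:
        return "size-mismatch"      # the script warns (or dies with --safe)
    return "linkable"


def relink(src, dst):
    """Replace dst with a hardlink to src, restoring dst if link() fails."""
    backup = dst + ".old"
    os.rename(dst, backup)          # move the old copy aside first
    try:
        os.link(src, dst)
    except OSError:
        os.rename(backup, dst)      # roll back so dst is not lost
        raise
    os.unlink(backup)               # success: drop the old copy
```

Moving the destination aside before linking means a failed link() can be
undone; only if the rollback rename also fails is manual repair needed,
which is the case the script's "probably corrupted" message covers.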
* Re: [PATCH] Add git-relink-script to fix up missing hardlinks
2005-06-26 18:15 [PATCH] Add git-relink-script to fix up missing hardlinks Ryan Anderson
@ 2005-06-26 18:36 ` Jeff Garzik
2005-06-26 19:07 ` Junio C Hamano
1 sibling, 0 replies; 6+ messages in thread
From: Jeff Garzik @ 2005-06-26 18:36 UTC
To: Ryan Anderson; +Cc: Linus Torvalds, git, Junio C Hamano
Ryan Anderson wrote:
> Add git-relink-script
>
> This will scan 2 or more object repositories and look for common objects, check
> if they are hardlinked, and replace one with a hardlink to the other if not.
>
> This version warns when skipping files because of size differences, and
> handles more than 2 repositories automatically.
>
> Signed-off-by: Ryan Anderson <ryan@michonline.com>
Thanks for posting this.
* Re: [PATCH] Add git-relink-script to fix up missing hardlinks
2005-06-26 18:15 [PATCH] Add git-relink-script to fix up missing hardlinks Ryan Anderson
2005-06-26 18:36 ` Jeff Garzik
@ 2005-06-26 19:07 ` Junio C Hamano
2005-06-26 19:31 ` Junio C Hamano
1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2005-06-26 19:07 UTC
To: Ryan Anderson; +Cc: git, Junio C Hamano
Not that I think it matters that much anymore since I am
proposing removal of "delta" support and Linus seems to be
inclined in the same direction, but I said "most of the time" in
the earlier message on this same topic for a reason:
Message-ID: <7vy89h36da.fsf@assigned-by-dhcp.cox.net>
Subject: Re: RFE: git relink
Date: Fri, 10 Jun 2005 20:44:01 -0700
References: <42A88C07.5050907@pobox.com>
Whoever is doing this script needs to be a bit careful.
...
Ryan Anderson's code will notice delta vs full object case most of
the time because it checks and makes sure the sizes of
corresponding files from two repositories match. The problem
with the code is that it dies, instead of just ignoring, when
size differs....
Your latest version has an option not to die which is very good
[*1*], but in a very narrow corner case, without comparing the
file contents, I think the code would still do a wrong thing.
Two trees can store the same object both in deltified form but
based on different base objects, and the deltified
representations can still have the same length, no? And I suspect
you would end up linking them together, corrupting one of the
trees.
Of course, even when you do not have "delta", if an object in
one tree is corrupted (but has the correct size), you would end
up relinking the corrupt one into another tree, nuking a good
copy, if you do not compare the file contents.
If/when/after the proposed removal of "delta" support happens, I
think the correct way to do git-relink-script would be to keep
most of your latest version intact, except:
(1) make it always die when you see differences in size.
Without "delta" in the repository, SHA1 files that
represent the same object must have the same size.
(2) make --safe also check on file contents. You do not need
the flag for the "delta" reason anymore, so I am suggesting
reusing the flag to detect file corruption, to be extra
safe, when the user permits you to spend cycles to be more
careful.
[Footnote]
*1* and other parts of the script all look nicely done.
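
[Editorial sketch, not part of the message.] The two suggestions above
could look roughly like this. It is a minimal Python sketch rather than
the script's Perl; the function name safe_to_link and the check_contents
parameter are hypothetical stand-ins for the proposed --safe behaviour,
and it assumes both files exist.

```python
import filecmp
import os


def safe_to_link(src, dst, check_contents=False):
    """Raise ValueError if dst must not be replaced by a link to src."""
    s, d = os.stat(src), os.stat(dst)
    # (1) Without deltas, two files for the same object must be the same
    #     size, so a difference is always treated as fatal.
    if s.st_size != d.st_size:
        raise ValueError(f"size mismatch: {src} vs {dst}")
    # (2) Optional slower check: compare the files byte for byte so that
    #     a corrupt copy of the right size is caught before linking.
    if check_contents and not filecmp.cmp(src, dst, shallow=False):
        raise ValueError(f"content mismatch: {src} vs {dst}")
    return True
```

A byte-for-byte comparison reports that the two copies differ; it cannot
say which of them is the corrupt one.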
* Re: [PATCH] Add git-relink-script to fix up missing hardlinks
2005-06-26 19:07 ` Junio C Hamano
@ 2005-06-26 19:31 ` Junio C Hamano
2005-06-26 19:44 ` Jeff Garzik
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2005-06-26 19:31 UTC
To: Ryan Anderson; +Cc: git, Linus Torvalds, Jeff Garzik
>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:
JCH> Your latest version has an option not to die which is very
JCH> good, but in a very narrow corner case, without comparing
JCH> the file contents, I think the code would still do a wrong
JCH> thing.
Having said that, the corner case is narrow enough (and
hopefully to be gone soon) that I think the current version is
perfectly acceptable for inclusion.
Linus, please apply. What it does is useful to encourage the
"one topic, one tree" use pattern, the officially recommended
way IIUC.
I am a bit puzzled, though, why Jeff was the original requestor
for this feature --- I thought he handles 50 heads in one
repository, which means there are no multiple repositories to
relink across.
* Re: [PATCH] Add git-relink-script to fix up missing hardlinks
2005-06-26 19:31 ` Junio C Hamano
@ 2005-06-26 19:44 ` Jeff Garzik
2005-06-27 1:11 ` Jan Harkes
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Garzik @ 2005-06-26 19:44 UTC
To: Junio C Hamano; +Cc: Ryan Anderson, git, Linus Torvalds
Junio C Hamano wrote:
> I am a bit puzzled, though, why Jeff was the original requestor
> for this feature --- I thought he handles 50 heads in one
> repository, which means there are no multiple repositories to
> relink across.
Sure there are. Just watching my submissions on the mailing list, you
can see ones mentioned such as "misc-2.6", "libata-dev", "netdev-2.6", etc.:
[jgarzik@pretzel repo]$ ls -FC
config-2.4      ethtool/    libata-dev/  netdev-2.6/     sparse/
config-2.6      git/        linux-2.6/   old-SCM/
config-2.6-uml  git-tools/  misc-2.6/    scsi-misc-2.6/
And I always keep an unmodified 'vanilla' tree from which everything is
sourced (and hardlinked to).
It's a categorization system, a namespace. Top-level repositories are
broad categories, and branches sub-divide those categories.
Jeff
* Re: [PATCH] Add git-relink-script to fix up missing hardlinks
2005-06-26 19:44 ` Jeff Garzik
@ 2005-06-27 1:11 ` Jan Harkes
0 siblings, 0 replies; 6+ messages in thread
From: Jan Harkes @ 2005-06-27 1:11 UTC
To: git
On Sun, Jun 26, 2005 at 03:44:15PM -0400, Jeff Garzik wrote:
> Junio C Hamano wrote:
> >I am a bit puzzled, though, why Jeff was the original requestor
> >for this feature --- I thought he handles 50 heads in one
> >repository, which means there are no multiple repositories to
> >relink across.
>
> Sure there are. Just watching my submissions on the mailing list, you
> can see ones mentioned such as "misc-2.6", "libata-dev", "netdev-2.6", etc.:
>
> [jgarzik@pretzel repo]$ ls -FC
> config-2.4      ethtool/    libata-dev/  netdev-2.6/     sparse/
> config-2.6      git/        linux-2.6/   old-SCM/
> config-2.6-uml  git-tools/  misc-2.6/    scsi-misc-2.6/
I actually have been using subdirectories in refs/heads with quite a bit
of success. All of the core tools have no problem with them and only
gitweb and gitk need some small changes to show them correctly.
The subdirectories in refs/heads are user-specific in my case. This ends
up being pretty useful when combined with Coda's directory ACLs: each
user can maintain their own branches, but cannot modify anyone else's.
All developers have insert, read and lookup rights on the main objects
repository, so they can add new objects, but not remove or overwrite any
existing ones.
My branch names end up looking something like 'jaharkes/wdonly',
'awolbach/expand', etc. and it works like a charm.
Jan
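
[Editorial sketch, not part of the message.] With per-user
subdirectories under refs/heads, a branch name is simply the path of the
ref file relative to refs/heads. A minimal Python sketch of how a tool
might enumerate such names follows; the function name list_heads is
hypothetical, and it assumes every head is stored as a plain file under
refs/heads.

```python
import os


def list_heads(git_dir):
    """Return branch names under refs/heads, using '/' for nested directories."""
    heads_root = os.path.join(git_dir, "refs", "heads")
    names = []
    # Walk recursively so that refs/heads/<user>/<branch> is found too.
    for dirpath, _dirnames, filenames in os.walk(heads_root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, heads_root)
            names.append(rel.replace(os.sep, "/"))
    return sorted(names)
```

A tool that lists only the top level of refs/heads would miss the nested
names, which may be the sort of small change gitweb and gitk needed.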