git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] archimport improvements
@ 2005-11-12  9:23 Eric Wong
  2005-11-12  9:25 ` [PATCH 1/5] remove shellquote usage for tags Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-12  9:23 UTC
  To: Martin Langhoff; +Cc: git list

Hello,

I'm another Arch-user trying out git.  Unfortunately, I encountered
several problems with git-archimport that I needed fixed before my
development trees could be imported into git.

Here's a summary of the changes:

Bug Fixes:

* Support for '--branch'-less Arch version names.
  Encoding '/' to '--' (as was previously done) is not 100% reversible
  because the "--branch" portion of a fully-qualified Arch version name
  is optional (though not many people or Arch-related tools know this).

* I'm encoding the '/' in the fully-qualified name as ',' to not confuse
  other porcelains, but leaving '/' in branch names may be alright
  provided porcelains can support them.

* Identify git branches as an Arch "archive,category<--branch>--version".
  Anything less than that is ambiguous as far as history and patch
  relationships go.

* Renamed directories containing renamed/moved files inside didn't get
  tracked properly.  The original code was inadequate for this, and
  making it support all rename cases that Arch supports is too much
  work.  Instead, I maintain full-blown Arch trees in the temp dir and
  replay patches + rsync based on that.  Performance is slightly slower
  than before, but accuracy is more important to me.

* Permission (execute bit only because of git) tracking as a side effect
  of the above.

* Tracking changes from branches that are only cherry-picked now works.

* Pika-escaped filenames were not handled.  This seems fixed in the latest
  git, but I fixed it more generally and removed the ShellQuote module
  dependency along the way.

* Don't die() when a merge-base can't be found.  Arch supports
  merging between unrelated trees.
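
To make the first point concrete, here is a minimal sketch of why the
old '/' to '--' encoding cannot always be reversed while the ','
encoding can.  The revision names are made up, the helper names are
only for illustration, and decode_old is adapted from the
reverse-and-substitute rule that patch 1/5 removes.  It also assumes
that Arch names never contain a ',' themselves.

```perl
use strict;
use warnings;

# Old scheme: every '/' becomes '--'.
sub encode_old { my ($n) = @_; $n =~ s{/}{--}g; return $n }

# Old reverse mapping, adapted from the reverse-and-substitute rule
# that 1/5 removes: it assumes category--branch--version--patchlevel.
sub decode_old {
    my ($n) = @_;
    $n = reverse $n;
    $n =~ s{^(.+?--.+?--.+?--.+?)--(.+)\z}{$1/$2};
    $n = reverse $n;
    return $n;
}

# New scheme: the single archive/name separator becomes ','.
sub encode_new { my ($n) = @_; $n =~ s{/}{,}; return $n }
sub decode_new { my ($n) = @_; $n =~ s{,}{/}; return $n }

# Hypothetical revisions: one with a "--branch" part, one without.
for my $rev ('user@example.com--2005/foo--bar--0.0--patch-1',
             'user@example.com--2005/foo--0.0--patch-1') {
    my $old = decode_old(encode_old($rev));
    my $new = decode_new(encode_new($rev));
    printf "%s\n  old round trip: %s\n  new round trip: %s\n",
        $rev, ($old eq $rev ? 'ok' : "MISMATCH ($old)"),
              ($new eq $rev ? 'ok' : "MISMATCH ($new)");
}
```

With the old scheme the "--branch"-less revision comes back with the
slash in the wrong place, because the archive name itself contains
'--'.  The ',' scheme only ever touches the one separator.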


Usability enhancements:

* Optionally detect merged branches and attempt to import their history,
  too.  Use the -D <depth> option for this.  Specifying a <depth>
  greater than 1 is usually not needed unless the tree you're tracking
  has had history pruned.
  
* Optionally attempt to auto-register unknown Arch archives from
  mirrors.sourcecontrol.net to pull their history with the -a (boolean)
  switch.  Not sure how useful users will find this.

* Removed -A <archive> usage (unnecessary in all cases) and made all
  Arch calls and output parsing compatible with both tla (tested with
  1.3.3) and baz (1.4.2).  The default is still tla, but the ARCH_CLIENT
  environment variable may be set to baz.


Current weaknesses:

* (Present in the original code as well).
  The code still assumes that dates in commit logs can be trusted, which is
  fine in most cases, but a wayward branch can screw up git-archimport and
  cause parents to be missed.

-- 
Eric Wong


* [PATCH 1/5] remove shellquote usage for tags
  2005-11-12  9:23 [PATCH] archimport improvements Eric Wong
@ 2005-11-12  9:25 ` Eric Wong
  2005-11-12  9:27   ` [PATCH 2/5] archimport: don't die on merge-base failure Eric Wong
  2005-11-12 11:54 ` [PATCH] archimport improvements Martin Langhoff
  2005-11-17  9:26 ` [PATCH] archimport improvements Martin Langhoff
  2 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-12  9:25 UTC
  To: Martin Langhoff; +Cc: git list

use ',' to encode '/' in "archivename/foo--bar--0.0" so we can allow
"--branch"-less trees which are valid in Arch ("archivename/foo--0.0")

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |   55 ++++++++++++++++++++++++++-------------------------
 1 files changed, 28 insertions(+), 27 deletions(-)

applies-to: 76d3d1c302c20b82fd976e958aabd19f7f01e7b5
28d4f9ee8ba83b35eea66d4dd19b8ec26a0218c7
diff --git a/git-archimport.perl b/git-archimport.perl
index e22c816..7c15184 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -52,6 +52,7 @@ $ENV{'TZ'}="UTC";
 
 my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
+my $ptag_dir = "$git_dir/archimport/tags";
 
 our($opt_h,$opt_v, $opt_T,
     $opt_C,$opt_t);
@@ -195,16 +196,19 @@ unless (-d $git_dir) { # initial import
     opendir(DIR, "$git_dir/archimport/tags")
 	|| die "can't opendir: $!";
     while (my $file = readdir(DIR)) {
-	# skip non-interesting-files
-	next unless -f "$git_dir/archimport/tags/$file";
-	next if     $file =~ m/--base-0$/; # don't care for base-0
+        # skip non-interesting-files
+        next unless -f "$ptag_dir/$file";
+   
+        # convert first '--' to '/' from old git-archimport to use
+        # as an archivename/c--b--v private tag
+        if ($file !~ m!,!) {
+            my $oldfile = $file;
+            $file =~ s!--!,!;
+            print STDERR "converting old tag $oldfile to $file\n";
+            rename("$ptag_dir/$oldfile", "$ptag_dir/$file") or die $!;
+        }
 	my $sha = ptag($file);
 	chomp $sha;
-	# reconvert the 3rd '--' sequence from the end
-	# into a slash
-	# $file = reverse $file;
-	# $file =~ s!^(.+?--.+?--.+?--.+?)--(.+)$!$1/$2!;
-	# $file = reverse $file;
 	$rptags{$sha} = $file;
     }
     closedir DIR;
@@ -582,19 +586,20 @@ sub parselog {
 # write/read a tag
 sub tag {
     my ($tag, $commit) = @_;
-    $tag =~ s|/|--|g; 
-    $tag = shell_quote($tag);
+ 
+    # don't use subdirs for tags yet, it could screw up other porcelains
+    $tag =~ s|/|,|;
     
     if ($commit) {
-        open(C,">$git_dir/refs/tags/$tag")
+        open(C,">","$git_dir/refs/tags/$tag")
             or die "Cannot create tag $tag: $!\n";
         print C "$commit\n"
             or die "Cannot write tag $tag: $!\n";
         close(C)
             or die "Cannot write tag $tag: $!\n";
-        print " * Created tag ' $tag' on '$commit'\n" if $opt_v;
+        print " * Created tag '$tag' on '$commit'\n" if $opt_v;
     } else {                    # read
-        open(C,"<$git_dir/refs/tags/$tag")
+        open(C,"<","$git_dir/refs/tags/$tag")
             or die "Cannot read tag $tag: $!\n";
         $commit = <C>;
         chomp $commit;
@@ -609,15 +614,16 @@ sub tag {
 # reads fail softly if the tag isn't there
 sub ptag {
     my ($tag, $commit) = @_;
-    $tag =~ s|/|--|g; 
-    $tag = shell_quote($tag);
+
+    # don't use subdirs for tags yet, it could screw up other porcelains
+    $tag =~ s|/|,|g; 
     
-    unless (-d "$git_dir/archimport/tags") {
-        mkpath("$git_dir/archimport/tags");
-    }
+    my $tag_file = "$ptag_dir/$tag";
+    my $tag_branch_dir = dirname($tag_file);
+    mkpath($tag_branch_dir) unless (-d $tag_branch_dir);
 
     if ($commit) {              # write
-        open(C,">$git_dir/archimport/tags/$tag")
+        open(C,">",$tag_file)
             or die "Cannot create tag $tag: $!\n";
         print C "$commit\n"
             or die "Cannot write tag $tag: $!\n";
@@ -627,10 +633,10 @@ sub ptag {
 	    unless $tag =~ m/--base-0$/;
     } else {                    # read
         # if the tag isn't there, return 0
-        unless ( -s "$git_dir/archimport/tags/$tag") {
+        unless ( -s $tag_file) {
             return 0;
         }
-        open(C,"<$git_dir/archimport/tags/$tag")
+        open(C,"<",$tag_file)
             or die "Cannot read tag $tag: $!\n";
         $commit = <C>;
         chomp $commit;
@@ -780,12 +786,7 @@ sub commitid2pset {
     chomp $commitid;
     my $name = $rptags{$commitid} 
 	|| die "Cannot find reverse tag mapping for $commitid";
-    # the keys in %rptag  are slightly munged; unmunge
-    # reconvert the 3rd '--' sequence from the end
-    # into a slash
-    $name = reverse $name;
-    $name =~ s!^(.+?--.+?--.+?--.+?)--(.+)$!$1/$2!;
-    $name = reverse $name;
+    $name =~ s|,|/|;
     my $ps   = $psets{$name} 
 	|| (print Dumper(sort keys %psets)) && die "Cannot find patchset for $name";
     return $ps;
---
0.99.9.GIT


* [PATCH 2/5] archimport: don't die on merge-base failure
  2005-11-12  9:25 ` [PATCH 1/5] remove shellquote usage for tags Eric Wong
@ 2005-11-12  9:27   ` Eric Wong
  2005-11-12  9:29     ` [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-12  9:27 UTC
  To: Martin Langhoff; +Cc: git list

Don't die if we can't find a merge base.  Arch allows arbitrary
cherry-picks between unrelated branches, and we should not
die when that happens.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

applies-to: 07dfd96ba53890d6a20fa0b028cf96e0e49bc027
7d099adadc041d74a0defc107656f273b35f57cb
diff --git a/git-archimport.perl b/git-archimport.perl
index 7c15184..699d5f6 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -693,7 +693,13 @@ sub find_parents {
 	next unless -e "$git_dir/refs/heads/$branch";
 
 	my $mergebase = `git-merge-base $branch $ps->{branch}`;
-	die "Cannot find merge base for $branch and $ps->{branch}" if $?;
+ 	if ($?) { 
+ 	    # Don't die here, Arch supports one-way cherry-picking
+ 	    # between branches with no common base (or any relationship
+ 	    # at all beforehand)
+ 	    warn "Cannot find merge base for $branch and $ps->{branch}";
+ 	    next;
+ 	}
 	chomp $mergebase;
 
 	# now walk up to the mergepoint collecting what patches we have
---
0.99.9.GIT


* [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git
  2005-11-12  9:27   ` [PATCH 2/5] archimport: don't die on merge-base failure Eric Wong
@ 2005-11-12  9:29     ` Eric Wong
  2005-11-12  9:30       ` [PATCH 4/5] Overhaul of changeset application Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-12  9:29 UTC
  To: Martin Langhoff; +Cc: git list


Disambiguate the term 'branch' in Arch vs git,
and start using fully-qualified names.
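
As a rough illustration of the mapping (a sketch adapted from the
helpers this patch adds, with \z used as the end anchor instead of $),
a fully-qualified Arch revision is reduced to its version name and the
'/' is swapped for ',' to form the git branch name.  The second input
is a made-up "--branch"-less name.

```perl
use strict;
use warnings;

# Strip the trailing patch level (patch-N, version-N, versionfix-N,
# base-N) to get the fully-qualified version name.
sub extract_versionname {
    my ($name) = @_;
    $name =~ s{--(?:patch|version(?:fix)?|base)-\d+\z}{};
    return $name;
}

# The git branch name is the version name with '/' replaced by ','.
sub git_branchname {
    my ($revision) = @_;
    my $name = extract_versionname($revision);
    $name =~ s{/}{,};
    return $name;
}

print git_branchname('normalperson@yhbt.net-05/mpd--uclinux--1--patch-2'), "\n";
# prints: normalperson@yhbt.net-05,mpd--uclinux--1

print git_branchname('user@example.com--2005/foo--0.0--base-0'), "\n";
# prints: user@example.com--2005,foo--0.0
```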

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |   65 ++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 54 insertions(+), 11 deletions(-)

applies-to: bbfe032e4900efc45bb94fb687af0140ccb0a858
ede672b4cd544b5e5418cc5088e92f2e0d2f7394
diff --git a/git-archimport.perl b/git-archimport.perl
index 699d5f6..f2bcbb4 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -30,6 +30,24 @@ See man (1) git-archimport for more deta
 
 Add print in front of the shell commands invoked via backticks. 
 
+=head1 Devel Notes
+
+There are several places where Arch and git terminology are intermixed
+and potentially confused.
+
+The notion of a "branch" in git is approximately equivalent to
+a "archive/category--branch--version" in Arch.  Also, it should be noted
+that the "--branch" portion of "archive/category--branch--version" is really
+optional in Arch although not many people (nor tools!) seem to know this.
+This means that "archive/category--version" is also a valid "branch"
+in git terms.
+
+We always refer to Arch names by their fully qualified variant (which
+means the "archive" name is prefixed).
+
+For people unfamiliar with Arch, an "archive" is the term for "repository",
+and can contain multiple, unrelated branches.
+
 =cut
 
 use strict;
@@ -215,9 +233,41 @@ unless (-d $git_dir) { # initial import
 }
 
 # process patchsets
-foreach my $ps (@psets) {
+# extract the Arch repository name (Arch "archive" in Arch-speak)
+sub extract_reponame {
+    my $fq_cvbr = shift; # archivename/[[[[category]branch]version]revision]
+    return (split(/\//, $fq_cvbr))[0];
+}
+ 
+sub extract_versionname {
+    my $name = shift;
+    $name =~ s/--(?:patch|version(?:fix)?|base)-\d+$//;
+    return $name;
+}
 
-    $ps->{branch} =  branchname($ps->{id});
+# convert a fully-qualified revision or version to a unique dirname:
+#   normalperson@yhbt.net-05/mpd--uclinux--1--patch-2 
+# becomes: normalperson@yhbt.net-05,mpd--uclinux--1
+#
+# the git notion of a branch is closer to
+# archive/category--branch--version than archive/category--branch, so we
+# use this to convert to git branch names.
+# Also, keep archive names but replace '/' with ',' since it won't require
+# subdirectories, and is safer than swapping '--' which could confuse
+# reverse-mapping when dealing with bastard branches that
+# are just archive/category--version  (no --branch)
+sub tree_dirname {
+    my $revision = shift;
+    my $name = extract_versionname($revision);
+    $name =~ s#/#,#;
+    return $name;
+}
+
+*git_branchname = *tree_dirname;
+
+# process patchsets
+foreach my $ps (@psets) {
+    $ps->{branch} = git_branchname($ps->{id});
 
     #
     # ensure we have a clean state 
@@ -429,16 +479,9 @@ foreach my $ps (@psets) {
     $opt_v && print "   + parents:  $par \n";
 }
 
-sub branchname {
-    my $id = shift;
-    $id =~ s#^.+?/##;
-    my @parts = split(m/--/, $id);
-    return join('--', @parts[0..1]);
-}
-
 sub apply_import {
     my $ps = shift;
-    my $bname = branchname($ps->{id});
+    my $bname = git_branchname($ps->{id});
 
     `mkdir -p $tmp`;
 
@@ -669,7 +712,7 @@ sub find_parents {
     # simple loop to split the merges
     # per branch
     foreach my $merge (@{$ps->{merges}}) {
-	my $branch = branchname($merge);
+	my $branch = git_branchname($merge);
 	unless (defined $branches{$branch} ){
 	    $branches{$branch} = [];
 	}
---
0.99.9.GIT
-- 
Eric Wong



* [PATCH 4/5] Overhaul of changeset application
  2005-11-12  9:29     ` [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git Eric Wong
@ 2005-11-12  9:30       ` Eric Wong
  2005-11-12  9:32         ` [PATCH 5/5] -D <depth> option to recurse into merged branches Eric Wong
  2005-11-12 12:07         ` [PATCH 4/5] Overhaul of changeset application Martin Langhoff
  0 siblings, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-12  9:30 UTC
  To: Martin Langhoff; +Cc: git list

Overhaul of changeset application to use native Arch tree operations.
This results in:
 - reliable rename handling (esp. when dealing with renamed directories
   containing files that have already been renamed)
 - permissions tracking (execute only for git).
 - no need to shell-escape or pika-unescape anything.  All arguments to
   external programs are always passed as an array.  File modifications
   are automatically tracked using git (no need to parse Arch patch-log
   to look for modified files).
 - Correctly parse multi-line summary text in patch-logs
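
The "passed as an array" point is the core of why the escaping goes
away.  A minimal sketch of the idea, assuming a forking open and a
list-form exec; capture_list is a name made up for this example and is
not the helper the patch adds:

```perl
use strict;
use warnings;

# Run a command without involving a shell and return its output lines.
sub capture_list {
    my @cmd = @_;
    my $pid = open(my $child, '-|');    # forks; the child's STDOUT is piped to us
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: list-form exec, so no shell ever parses the arguments
        exec { $cmd[0] } @cmd or die "cannot exec $cmd[0]: $!";
    }
    my @output = <$child>;
    close $child or die "@cmd failed: $! $?";
    return @output;
}

# A hypothetical filename full of shell metacharacters.
my $weird = q{file name; echo 'oops' $(id)};

# $^X is the running perl; it simply echoes its first argument back.
my @out = capture_list($^X, '-e', 'print $ARGV[0]', $weird);
print "got: $out[0]\n";
```

The argument reaches the child process byte for byte, so nothing has
to be quoted on the way in or unescaped on the way out.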

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |  381 ++++++++++++++++++++-------------------------------
 1 files changed, 146 insertions(+), 235 deletions(-)

applies-to: 12cd9f2d764e50ae4fe2c6cd8b64fc72c668e0dd
d3cbba7b8e8e3db61dac685ab55055d360e6138d
diff --git a/git-archimport.perl b/git-archimport.perl
index f2bcbb4..5616d42 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -55,7 +55,7 @@ use warnings;
 use Getopt::Std;
 use File::Spec;
 use File::Temp qw(tempfile tempdir);
-use File::Path qw(mkpath);
+use File::Path qw(mkpath rmtree);
 use File::Basename qw(basename dirname);
 use String::ShellQuote;
 use Time::Local;
@@ -90,16 +90,17 @@ usage if $opt_h;
 @ARGV >= 1 or usage();
 my @arch_roots = @ARGV;
 
-my ($tmpdir, $tmpdirname) = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
-my $tmp = $opt_t || 1;
-$tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
-$opt_v && print "+ Using $tmp as temporary directory\n";
+my $tmptree;
+$ENV{'TMPDIR'} = $opt_t if $opt_t;
+$tmptree = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+$opt_v && print "+ Using $tmptree to store temporary trees\n";
 
 my @psets  = ();                # the collection
 my %psets  = ();                # the collection, by name
 
 my %rptags = ();                # my reverse private tags
                                 # to map a SHA1 to a commitid
+my $TLA = $ENV{'ARCH_CLIENT'} || 'tla';
 
 foreach my $root (@arch_roots) {
     my ($arepo, $abranch) = split(m!/!, $root);
@@ -211,7 +212,7 @@ unless (-d $git_dir) { # initial import
     }
 } else {    # progressing an import
     # load the rptags
-    opendir(DIR, "$git_dir/archimport/tags")
+    opendir(DIR, $ptag_dir)
 	|| die "can't opendir: $!";
     while (my $file = readdir(DIR)) {
         # skip non-interesting-files
@@ -288,26 +289,37 @@ foreach my $ps (@psets) {
 
     print " * Starting to work on $ps->{id}\n";
 
-    # 
-    # create the branch if needed
-    #
-    if ($ps->{type} eq 'i' && !$import) {
-        die "Should not have more than one 'Initial import' per GIT import: $ps->{id}";
+    # switch to that branch if we're not already in that branch:
+    if (-e "$git_dir/refs/heads/$ps->{branch}") {
+       system('git-checkout','-f',$ps->{branch}) == 0 or die "$! $?\n";
+
+       # remove any old stuff that got leftover:
+       chomp(my @rm = safe_pipe_capture('git-ls-files','--others'));
+       rmtree(\@rm) if @rm;
     }
-
-    unless ($import) { # skip for import
-        if ( -e "$git_dir/refs/heads/$ps->{branch}") {
-            # we know about this branch
-            `git checkout    $ps->{branch}`;
-        } else {
-            # new branch! we need to verify a few things
-            die "Branch on a non-tag!" unless $ps->{type} eq 't';
-            my $branchpoint = ptag($ps->{tag});
-            die "Tagging from unknown id unsupported: $ps->{tag}" 
-                unless $branchpoint;
+   
+    # Apply the import/changeset/merge into the working tree
+    my $dir = sync_to_ps($ps);
+    # read the new log entry:
+    my @commitlog = safe_pipe_capture($TLA,'cat-log','-d',$dir,$ps->{id});
+    die "Error in cat-log: $!" if $?;
+    chomp @commitlog;
+
+    # grab variables we want from the log, new fields get added to $ps:
+    # (author, date, email, summary, message body ...)
+    parselog($ps, \@commitlog);
+
+    if ($ps->{id} =~ /--base-0$/ && $ps->{id} ne $psets[0]{id}) {
+        # this should work when importing continuations 
+        if ($ps->{tag} && (my $branchpoint = eval { ptag($ps->{tag}) })) {
             
             # find where we are supposed to branch from
-            `git checkout -b $ps->{branch} $branchpoint`;
+            system('git-checkout','-f','-b',$ps->{branch},
+                            $branchpoint) == 0 or die "$! $?\n";
+            
+            # remove any old stuff that got leftover:
+            chomp(my @rm = safe_pipe_capture('git-ls-files','--others'));
+            rmtree(\@rm) if @rm;
 
             # If we trust Arch with the fact that this is just 
             # a tag, and it does not affect the state of the tree
@@ -316,95 +328,26 @@ foreach my $ps (@psets) {
             ptag($ps->{id}, $branchpoint);
             print " * Tagged $ps->{id} at $branchpoint\n";
             next;
-        } 
-        die $! if $?;
+        } else {
+            warn "Tagging from unknown id unsupported\n" if $ps->{tag};
+        }
+        # allow multiple bases/imports here since Arch supports cherry-picks
+        # from unrelated trees
     } 
-
-    #
-    # Apply the import/changeset/merge into the working tree
-    # 
-    if ($ps->{type} eq 'i' || $ps->{type} eq 't') {
-        apply_import($ps) or die $!;
-        $import=0;
-    } elsif ($ps->{type} eq 's') {
-        apply_cset($ps);
-    }
-
-    #
-    # prepare update git's index, based on what arch knows
-    # about the pset, resolve parents, etc
-    #
-    my $tree;
     
-    my $commitlog = `tla cat-archive-log -A $ps->{repo} $ps->{id}`; 
-    die "Error in cat-archive-log: $!" if $?;
-        
-    # parselog will git-add/rm files
-    # and generally prepare things for the commit
-    # NOTE: parselog will shell-quote filenames! 
-    my ($sum, $msg, $add, $del, $mod, $ren) = parselog($commitlog);
-    my $logmessage = "$sum\n$msg";
-
-
-    # imports don't give us good info
-    # on added files. Shame on them
-    if ($ps->{type} eq 'i' || $ps->{type} eq 't') { 
-        `find . -type f -print0 | grep -zv '^./$git_dir' | xargs -0 -l100 git-update-index --add`;
-        `git-ls-files --deleted -z | xargs --no-run-if-empty -0 -l100 git-update-index --remove`;
-    }
-
-    if (@$add) {
-        while (@$add) {
-            my @slice = splice(@$add, 0, 100);
-            my $slice = join(' ', @slice);          
-            `git-update-index --add $slice`;
-            die "Error in git-update-index --add: $!" if $?;
-        }
-    }
-    if (@$del) {
-        foreach my $file (@$del) {
-            unlink $file or die "Problems deleting $file : $!";
-        }
-        while (@$del) {
-            my @slice = splice(@$del, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index --remove $slice`;
-            die "Error in git-update-index --remove: $!" if $?;
-        }
-    }
-    if (@$ren) {                # renamed
-        if (@$ren % 2) {
-            die "Odd number of entries in rename!?";
-        }
-        ;
-        while (@$ren) {
-            my $from = pop @$ren;
-            my $to   = pop @$ren;           
-
-            unless (-d dirname($to)) {
-                mkpath(dirname($to)); # will die on err
-            }
-            #print "moving $from $to";
-            `mv $from $to`;
-            die "Error renaming $from $to : $!" if $?;
-            `git-update-index --remove $from`;
-            die "Error in git-update-index --remove: $!" if $?;
-            `git-update-index --add $to`;
-            die "Error in git-update-index --add: $!" if $?;
-        }
-
-    }
-    if (@$mod) {                # must be _after_ renames
-        while (@$mod) {
-            my @slice = splice(@$mod, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index $slice`;
-            die "Error in git-update-index: $!" if $?;
-        }
-    }
-
-    # warn "errors when running git-update-index! $!";
-    $tree = `git-write-tree`;
+    # update the index with all the changes we got
+    system('git-ls-files --others -z | '.
+            'git-update-index --add -z --stdin') == 0 or die "$! $?\n";
+    system('git-ls-files --deleted -z | '.
+            'git-update-index --remove -z --stdin') == 0 or die "$! $?\n";
+
+    # just brute force this and update everything, it's faster than
+    # parsing the Modified-files header and then having to pika-unescape
+    # each one in case it has weird characters
+    system('git-ls-files -z | '.
+             'git-update-index -z --stdin') == 0 or die "$! $?\n";
+    
+    my $tree = `git-write-tree`;
     die "cannot write tree $!" if $?;
     chomp $tree;
         
@@ -414,7 +357,7 @@ foreach my $ps (@psets) {
     #
     my @par;
     if ( -e "$git_dir/refs/heads/$ps->{branch}") {
-        if (open HEAD, "<$git_dir/refs/heads/$ps->{branch}") {
+        if (open HEAD, "<","$git_dir/refs/heads/$ps->{branch}") {
             my $p = <HEAD>;
             close HEAD;
             chomp $p;
@@ -429,7 +372,6 @@ foreach my $ps (@psets) {
     if ($ps->{merges}) {
         push @par, find_parents($ps);
     }
-    my $par = join (' ', @par);
 
     #    
     # Commit, tag and clean state
@@ -442,13 +384,14 @@ foreach my $ps (@psets) {
     $ENV{GIT_COMMITTER_EMAIL} = $ps->{email};
     $ENV{GIT_COMMITTER_DATE}  = $ps->{date};
 
-    my ($pid, $commit_rh, $commit_wh);
-    $commit_rh = 'commit_rh';
-    $commit_wh = 'commit_wh';
-    
-    $pid = open2(*READER, *WRITER, "git-commit-tree $tree $par") 
+    my $pid = open2(*READER, *WRITER, 'git-commit-tree',$tree,@par) 
         or die $!;
-    print WRITER $logmessage;   # write
+    print WRITER $ps->{summary},"\n";
+    print WRITER $ps->{message},"\n";
+
+    # make it easy to backtrack and figure out which Arch revision this was:
+    print WRITER 'git-archimport-id: ',$ps->{id},"\n";
+    
     close WRITER;
     my $commitid = <READER>;    # read
     chomp $commitid;
@@ -461,7 +404,7 @@ foreach my $ps (@psets) {
     #
     # Update the branch
     # 
-    open  HEAD, ">$git_dir/refs/heads/$ps->{branch}";
+    open  HEAD, ">","$git_dir/refs/heads/$ps->{branch}";
     print HEAD $commitid;
     close HEAD;
     unlink ("$git_dir/HEAD");
@@ -476,71 +419,41 @@ foreach my $ps (@psets) {
     print "   + tree   $tree\n";
     print "   + commit $commitid\n";
     $opt_v && print "   + commit date is  $ps->{date} \n";
-    $opt_v && print "   + parents:  $par \n";
+    $opt_v && print "   + parents: ".join(' ',@par)."\n";
 }
 
-sub apply_import {
+sub sync_to_ps {
     my $ps = shift;
-    my $bname = git_branchname($ps->{id});
+    my $tree_dir = $tmptree.'/'.tree_dirname($ps->{id});
 
-    `mkdir -p $tmp`;
-
-    `tla get -s --no-pristine -A $ps->{repo} $ps->{id} $tmp/import`;
-    die "Cannot get import: $!" if $?;    
-    `rsync -v --archive --delete --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/import/* ./`;
-    die "Cannot rsync import:$!" if $?;
-    
-    `rm -fr $tmp/import`;
-    die "Cannot remove tempdir: $!" if $?;
-    
-
-    return 1;
-}
-
-sub apply_cset {
-    my $ps = shift;
-
-    `mkdir -p $tmp`;
-
-    # get the changeset
-    `tla get-changeset  -A $ps->{repo} $ps->{id} $tmp/changeset`;
-    die "Cannot get changeset: $!" if $?;
-    
-    # apply patches
-    if (`find $tmp/changeset/patches -type f -name '*.patch'`) {
-        # this can be sped up considerably by doing
-        #    (find | xargs cat) | patch
-        # but that cna get mucked up by patches
-        # with missing trailing newlines or the standard 
-        # 'missing newline' flag in the patch - possibly
-        # produced with an old/buggy diff.
-        # slow and safe, we invoke patch once per patchfile
-        `find $tmp/changeset/patches -type f -name '*.patch' -print0 | grep -zv '{arch}' | xargs -iFILE -0 --no-run-if-empty patch -p1 --forward -iFILE`;
-        die "Problem applying patches! $!" if $?;
-    }
-
-    # apply changed binary files
-    if (my @modified = `find $tmp/changeset/patches -type f -name '*.modified'`) {
-        foreach my $mod (@modified) {
-            chomp $mod;
-            my $orig = $mod;
-            $orig =~ s/\.modified$//; # lazy
-            $orig =~ s!^\Q$tmp\E/changeset/patches/!!;
-            #print "rsync -p '$mod' '$orig'";
-            `rsync -p $mod ./$orig`;
-            die "Problem applying binary changes! $!" if $?;
+    if (-d $tree_dir) {
+        if ($ps->{type} eq 't' && defined $ps->{tag}) {
+            # looks like a tag-only or (worse,) a mixed tags/changeset branch,
+            # can't rely on replay to work correctly on these
+            rmtree($tree_dir);
+            safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
+        } else {
+                my $tree_id = arch_tree_id($tree_dir);
+                if ($ps->{parent_id} eq $tree_id) {
+                    safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id});
+                } else {
+                    safe_pipe_capture($TLA,'apply-delta','-d',$tree_dir,
+                                                        $tree_id, $ps->{id});
+                }
         }
+    } else {
+        safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
     }
-
-    # bring in new files
-    `rsync --archive --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/changeset/new-files-archive/* ./`;
-
-    # deleted files are hinted from the commitlog processing
-
-    `rm -fr $tmp/changeset`;
+   
+    # added -I flag to rsync since we're going too fast! AIEEEEE!!!!
+    system('rsync','-aI','--delete','--exclude',$git_dir,
+#               '--exclude','.arch-inventory',
+                '--exclude','.arch-ids','--exclude','{arch}',
+                '--exclude','+*','--exclude',',*',
+                "$tree_dir/",'./') == 0 or die "Cannot rsync $tree_dir: $! $?";
+    return $tree_dir;
 }
 
-
 # =for reference
 # A log entry looks like 
 # Revision: moodle-org--moodle--1.3.3--patch-15
@@ -560,70 +473,42 @@ sub apply_cset {
 #     admin/editor.html backup/lib.php backup/restore.php
 # New-patches: arch-eduforge@catalyst.net.nz--2004/moodle-org--moodle--1.3.3--patch-15
 # Summary: Updating to latest from MOODLE_14_STABLE (1.4.5+)
+#   summary can be multiline with a leading space just like the above fields
 # Keywords:
 #
 # Updating yadda tadda tadda madda
 sub parselog {
-    my $log = shift;
-    #print $log;
-
-    my (@add, @del, @mod, @ren, @kw, $sum, $msg );
-
-    if ($log =~ m/(?:\n|^)New-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @add = split(m/\s+/s, $files);
-    }
-       
-    if ($log =~ m/(?:\n|^)Removed-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @del = split(m/\s+/s, $files);
-    }
-    
-    if ($log =~ m/(?:\n|^)Modified-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @mod = split(m/\s+/s, $files);
-    }
-    
-    if ($log =~ m/(?:\n|^)Renamed-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @ren = split(m/\s+/s, $files);
-    }
-
-    $sum ='';
-    if ($log =~ m/^Summary:(.+?)$/m ) {
-        $sum = $1;
-        $sum =~ s/^\s+//;
-        $sum =~ s/\s+$//;
-    }
-
-    $msg = '';
-    if ($log =~ m/\n\n(.+)$/s) {
-        $msg = $1;
-        $msg =~ s/^\s+//;
-        $msg =~ s/\s+$//;
-    }
-
-
-    # cleanup the arrays
-    foreach my $ref ( (\@add, \@del, \@mod, \@ren) ) {
-        my @tmp = ();
-        while (my $t = pop @$ref) {
-            next unless length ($t);
-            next if $t =~ m!\{arch\}/!;
-            next if $t =~ m!\.arch-ids/!;
-            next if $t =~ m!\.arch-inventory$!;
-           # tla cat-archive-log will give us filenames with spaces as file\(sp)name - why?
-           # we can assume that any filename with \ indicates some pika escaping that we want to get rid of.
-           if  ($t =~ /\\/ ){
-               $t = `tla escape --unescaped '$t'`;
-           }
-            push (@tmp, shell_quote($t));
+    my ($ps, $log) = @_;
+    my $key = undef;
+    while ($_ = shift @$log) {
+        if (/^Continuation-of:\s*(.*)/) {
+            $ps->{tag} = $1;
+            $key = undef;
+        } elsif (/^Summary:\s*(.*)$/ ) {
+            # summary can be multiline as long as it has a leading space
+            $ps->{summary} = [ $1 ];
+            $key = 'summary';
+        } elsif (/^Creator: (.*)\s*<([^\>]+)>/) {
+            $ps->{author} = $1;
+            $ps->{email} = $2;
+            $key = undef;
+        } elsif (/^$/) {
+            last; # remainder of @$log that didn't get shifted off is message
+        } elsif ($key) {
+            if (/^\s+(.*)$/) {
+                if ($key eq 'summary') {
+                    push @{$ps->{$key}}, $1;
+                } else {
+                    push @{$ps->{$key}}, split(/\s+/, $1);
+                }
+            } else {
+                $key = undef;
+            }
         }
-        @$ref = @tmp;
     }
     
-    #print Dumper [$sum, $msg, \@add, \@del, \@mod, \@ren]; 
-    return       ($sum, $msg, \@add, \@del, \@mod, \@ren); 
+    $ps->{summary} = join("\n",@{$ps->{summary}})."\n";
+    $ps->{message} = join("\n",@$log);
 }
 
 # write/read a tag
@@ -816,8 +701,11 @@ sub find_parents {
 	    }
 	}
     }
-    @parents = keys %parents;
-    @parents = map { " -p " . ptag($_) } @parents;
+    
+    @parents = ();
+    foreach (keys %parents) {
+        push @parents, '-p', ptag($_);
+    }
     return @parents;
 }
 
@@ -840,3 +728,26 @@ sub commitid2pset {
 	|| (print Dumper(sort keys %psets)) && die "Cannot find patchset for $name";
     return $ps;
 }
+
+
+# an alternative to `command` that allows input to be passed as an array
+# to work around shell problems with weird characters in arguments
+sub safe_pipe_capture {
+    my @output;
+    if (my $pid = open my $child, '-|') {
+        @output = (<$child>);
+        close $child or die join(' ',@_).": $! $?";
+    } else {
+	exec(@_) or die $?; # exec() can fail if the executable can't be found
+    }
+    return wantarray ? @output : join('',@output);
+}
+
+# `tla logs -rf -d <dir> | head -n1` or `baz tree-id <dir>`
+sub arch_tree_id {
+    my $dir = shift;
+    chomp( my $ret = (safe_pipe_capture($TLA,'logs','-rf','-d',$dir))[0] );
+    return $ret;
+}
+
+
---
0.99.9.GIT


* [PATCH 5/5] -D <depth> option to recurse into merged branches
  2005-11-12  9:30       ` [PATCH 4/5] Overhaul of changeset application Eric Wong
@ 2005-11-12  9:32         ` Eric Wong
  2005-11-14  2:01           ` Eric Wong
  2005-11-12 12:07         ` [PATCH 4/5] Overhaul of changeset application Martin Langhoff
  1 sibling, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-12  9:32 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

-D <depth> option to recurse into merged branches
-a auto-register Arch archive if it's on mirrors.sourcecontrol.net

fix for dealing with tag revisions
remove unused module loading (no more String::ShellQuote dep)

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |  257 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 141 insertions(+), 116 deletions(-)

applies-to: d6d3e5272bc39ea086e5c1b0b39ceb5b51ade1ff
2fe160b44c5e5da1a139668767ba184b6b63f605
diff --git a/git-archimport.perl b/git-archimport.perl
index 5616d42..a0ea016 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -22,9 +22,10 @@ See man (1) git-archimport for more deta
 =head1 TODO
 
  - create tag objects instead of ref tags
- - audit shell-escaping of filenames
  - hide our private tags somewhere smarter
- - find a way to make "cat *patches | patch" safe even when patchfiles are missing newlines  
+ - sort and apply patches by graphing ancestry relations instead of just
+   relying on dates supplied in the changeset itself.
+   tla ancestry-graph -m could be helpful here...
 
 =head1 Devel tricks
 
@@ -53,15 +54,9 @@ and can contain multiple, unrelated bran
 use strict;
 use warnings;
 use Getopt::Std;
-use File::Spec;
-use File::Temp qw(tempfile tempdir);
+use File::Temp qw(tempdir);
 use File::Path qw(mkpath rmtree);
 use File::Basename qw(basename dirname);
-use String::ShellQuote;
-use Time::Local;
-use IO::Socket;
-use IO::Pipe;
-use POSIX qw(strftime dup2);
 use Data::Dumper qw/ Dumper /;
 use IPC::Open2;
 
@@ -72,29 +67,35 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $ptag_dir = "$git_dir/archimport/tags";
 
-our($opt_h,$opt_v, $opt_T,
-    $opt_C,$opt_t);
+our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a);
 
 sub usage() {
     print STDERR <<END;
 Usage: ${\basename $0}     # fetch/update GIT from Arch
-       [ -h ] [ -v ] [ -T ] [ -t tempdir ] 
+       [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth  ] [ -t tempdir ]
        repository/arch-branch [ repository/arch-branch] ...
 END
     exit(1);
 }
 
-getopts("Thvt:") or usage();
+getopts("Thvat:D:") or usage();
 usage if $opt_h;
 
 @ARGV >= 1 or usage();
-my @arch_roots = @ARGV;
 
+# $arch_branches:
+# values associated with keys:
+#   =1 - Arch version / git 'branch' detected via abrowse on a limit
+#   >1 - Arch version / git 'branch' of an auxiliary branch we've merged
+my %arch_branches = map { $_ => 1 } @ARGV;
+ 
 my $tmptree;
 $ENV{'TMPDIR'} = $opt_t if $opt_t;
 $tmptree = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 $opt_v && print "+ Using $tmptree to store temporary trees\n";
 
+my %reachable = ();             # Arch repositories we can access
+my %unreachable = ();           # Arch repositories we can't access :<
 my @psets  = ();                # the collection
 my %psets  = ();                # the collection, by name
 
@@ -102,114 +103,117 @@ my %rptags = ();                # my rev
                                 # to map a SHA1 to a commitid
 my $TLA = $ENV{'ARCH_CLIENT'} || 'tla';
 
-foreach my $root (@arch_roots) {
-    my ($arepo, $abranch) = split(m!/!, $root);
-    open ABROWSE, "tla abrowse -f -A $arepo --desc --merges $abranch |" 
-        or die "Problems with tla abrowse: $!";
-    
-    my %ps        = ();         # the current one
-    my $mode      = '';
-    my $lastseen  = '';
-    
-    while (<ABROWSE>) {
-        chomp;
-        
-        # first record padded w 8 spaces
-        if (s/^\s{8}\b//) {
-            
-            # store the record we just captured
-            if (%ps) {
-                my %temp = %ps; # break references
-                push (@psets, \%temp);
-		$psets{$temp{id}} = \%temp;
-                %ps = ();
-            }
-            
-            my ($id, $type) = split(m/\s{3}/, $_);
-            $ps{id}   = $id;
-            $ps{repo} = $arepo;
-
-            # deal with types
-            if ($type =~ m/^\(simple changeset\)/) {
-                $ps{type} = 's';
-            } elsif ($type eq '(initial import)') {
-                $ps{type} = 'i';
-            } elsif ($type =~ m/^\(tag revision of (.+)\)/) {
-                $ps{type} = 't';
-                $ps{tag}  = $1;
-            } else { 
-                warn "Unknown type $type";
-            }
-            $lastseen = 'id';
-        }
-        
-        if (s/^\s{10}//) { 
-            # 10 leading spaces or more 
-            # indicate commit metadata
-            
-            # date & author 
-            if ($lastseen eq 'id' && m/^\d{4}-\d{2}-\d{2}/) {
+sub do_abrowse {
+    my $stage = shift;
+    while (my ($limit, $level) = each %arch_branches) {
+        next unless $level == $stage;
+    
+        open ABROWSE, "$TLA abrowse -fkD --merges $limit |" 
+                                or die "Problems with tla abrowse: $!";
+    
+        my %ps        = ();         # the current one
+        my $lastseen  = '';
+    
+        while (<ABROWSE>) {
+            chomp;
+            
+            # first record padded w 8 spaces
+            if (s/^\s{8}\b//) {
+                my ($id, $type) = split(m/\s+/, $_, 2);
+
+                my %last_ps;
+                # store the record we just captured
+                if (%ps && !exists $psets{ $ps{id} }) {
+                    %last_ps = %ps; # break references
+                    push (@psets, \%last_ps);
+                    $psets{ $last_ps{id} } = \%last_ps;
+                }
                 
-                my ($date, $authoremail) = split(m/\s{2,}/, $_);
-                $ps{date}   = $date;
-                $ps{date}   =~ s/\bGMT$//; # strip off trailign GMT
-                if ($ps{date} =~ m/\b\w+$/) {
-                    warn 'Arch dates not in GMT?! - imported dates will be wrong';
+                my $branch = extract_versionname($id);
+                %ps = ( id => $id, branch => $branch );
+                if (%last_ps && ($last_ps{branch} eq $branch)) {
+                    $ps{parent_id} = $last_ps{id};
+                }
+                
+                $arch_branches{$branch} = 1;
+                $lastseen = 'id';
+
+                # deal with types (should work with baz or tla):
+                if ($type =~ m/\(.*changeset\)/) {
+                    $ps{type} = 's';
+                } elsif ($type =~ /\(.*import\)/) {
+                    $ps{type} = 'i';
+                } elsif ($type =~ m/\(tag.*\)/) {
+                    $ps{type} = 't';
+                    # read which revision we've tagged when we parse the log
+                    #$ps{tag}  = $1;
+                } else { 
+                    warn "Unknown type $type";
+                }
+
+                $arch_branches{$branch} = 1;
+                $lastseen = 'id';
+            } elsif (s/^\s{10}//) { 
+                # 10 leading spaces or more 
+                # indicate commit metadata
+                
+                # date
+                if ($lastseen eq 'id' && m/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/){
+                    $ps{date}   = $1;
+                    $lastseen = 'date';
+                } elsif ($_ eq 'merges in:') {
+                    $ps{merges} = [];
+                    $lastseen = 'merges';
+                } elsif ($lastseen eq 'merges' && s/^\s{2}//) {
+                    my $id = $_;
+                    push (@{$ps{merges}}, $id);
+                   
+                    # aggressive branch finding:
+                    if ($opt_D) {
+                        my $branch = extract_versionname($id);
+                        my $repo = extract_reponame($branch);
+                        
+                        if (archive_reachable($repo) &&
+                                !defined $arch_branches{$branch}) {
+                            $arch_branches{$branch} = $stage + 1;
+                        }
+                    }
+                } else {
+                    warn "more metadata after merges!?: $_\n" unless /^\s*$/;
                 }
-            
-                $authoremail =~ m/^(.+)\s(\S+)$/;
-                $ps{author} = $1;
-                $ps{email}  = $2;
-            
-                $lastseen = 'date';
-            
-            } elsif ($lastseen eq 'date') {
-                # the only hint is position
-                # subject is after date
-                $ps{subj} = $_;
-                $lastseen = 'subj';
-            
-            } elsif ($lastseen eq 'subj' && $_ eq 'merges in:') {
-                $ps{merges} = [];
-                $lastseen = 'merges';
-            
-            } elsif ($lastseen eq 'merges' && s/^\s{2}//) {
-                push (@{$ps{merges}}, $_);
-            } else {
-                warn 'more metadata after merges!?';
             }
-            
         }
-    }
 
-    if (%ps) {
-        my %temp = %ps;         # break references
-        push (@psets, \%temp);  
-	$psets{ $temp{id} } = \%temp;
-        %ps = ();
-    }    
-    close ABROWSE;
+        if (%ps && !exists $psets{ $ps{id} }) {
+            my %temp = %ps;         # break references
+            if ($psets[$#psets]{branch} eq $ps{branch}) {
+                $temp{parent_id} = $psets[$#psets]{id};
+            }
+            push (@psets, \%temp);  
+            $psets{ $temp{id} } = \%temp;
+        }    
+        
+        close ABROWSE or die "$TLA abrowse failed on $limit\n";
+    }
 }                               # end foreach $root
 
+do_abrowse(1);
+my $depth = 2;
+$opt_D ||= 0;
+while ($depth <= $opt_D) {
+    do_abrowse($depth);
+    $depth++;
+}
+ 
 ## Order patches by time
+# FIXME see if we can find a more optimal way to do this by graphing
+# the ancestry data and walking it, that way we won't have to rely on
+# client-supplied dates
 @psets = sort {$a->{date}.$b->{id} cmp $b->{date}.$b->{id}} @psets;
 
-#print Dumper \@psets;
-
-##
-## TODO cleanup irrelevant patches
-##      and put an initial import
-##      or a full tag
-my $import = 0;
 unless (-d $git_dir) { # initial import
-    if ($psets[0]{type} eq 'i' || $psets[0]{type} eq 't') {
-        print "Starting import from $psets[0]{id}\n";
-	`git-init-db`;
-	die $! if $?;
-	$import = 1;
-    } else {
-        die "Need to start from an import or a tag -- cannot use $psets[0]{id}";
-    }
+    print "Starting import from $psets[0]{id}\n";
+    system('git-init-db') == 0 or die "$! $?\n";
 } else {    # progressing an import
     # load the rptags
     opendir(DIR, $ptag_dir)
@@ -233,7 +237,6 @@ unless (-d $git_dir) { # initial import
     closedir DIR;
 }
 
-# process patchsets
 # extract the Arch repository name (Arch "archive" in Arch-speak)
 sub extract_reponame {
     my $fq_cvbr = shift; # archivename/[[[[category]branch]version]revision]
@@ -266,21 +269,21 @@ sub tree_dirname {
 
 *git_branchname = *tree_dirname;
 
-# process patchsets
+# process patchsets in ancestry order
 foreach my $ps (@psets) {
     $ps->{branch} = git_branchname($ps->{id});
 
     #
     # ensure we have a clean state 
     # 
-    if (`git diff-files`) {
+    if (`git-diff-files`) {
         die "Unclean tree when about to process $ps->{id} " .
             " - did we fail to commit cleanly before?";
     }
     die $! if $?;
 
     #
-    # skip commits already in repo
+    # skip commits already in git repo
     #
     if (ptag($ps->{id})) {
       $opt_v && print " * Skipping already imported: $ps->{id}\n";
@@ -427,7 +430,7 @@ sub sync_to_ps {
     my $tree_dir = $tmptree.'/'.tree_dirname($ps->{id});
 
     if (-d $tree_dir) {
-        if ($ps->{type} eq 't' && defined $ps->{tag}) {
+        if ($ps->{type} eq 't') {
             # looks like a tag-only or (worse,) a mixed tags/changeset branch,
             # can't rely on replay to work correctly on these
             rmtree($tree_dir);
@@ -435,13 +438,16 @@ sub sync_to_ps {
         } else {
                 my $tree_id = arch_tree_id($tree_dir);
                 if ($ps->{parent_id} eq $tree_id) {
+                    # the common case (hopefully)
                     safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id});
                 } else {
+                    # this can happen if branches cherry-pick
                     safe_pipe_capture($TLA,'apply-delta','-d',$tree_dir,
                                                         $tree_id, $ps->{id});
                 }
         }
     } else {
+        # new branch work
         safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
     }
    
@@ -750,4 +756,23 @@ sub arch_tree_id {
     return $ret;
 }
 
+sub archive_reachable {
+    my $archive = shift;
+    return 1 if $reachable{$archive};
+    return 0 if $unreachable{$archive};
+    
+    if (system "$TLA whereis-archive $archive >/dev/null") {
+        if ($opt_a && (system($TLA,'register-archive',
+                      "http://mirrors.sourcecontrol.net/$archive") == 0)) {
+            $reachable{$archive} = 1;
+            return 1;
+        }
+        print STDERR "Archive is unreachable: $archive\n";
+        $unreachable{$archive} = 1;
+        return 0;
+    } else {
+        $reachable{$archive} = 1;
+        return 1;
+    }
+}
 
---
0.99.9.GIT


* Re: [PATCH] archimport improvements
  2005-11-12  9:23 [PATCH] archimport improvements Eric Wong
  2005-11-12  9:25 ` [PATCH 1/5] remove shellquote usage for tags Eric Wong
@ 2005-11-12 11:54 ` Martin Langhoff
  2005-11-12 20:21   ` Eric Wong
  2005-11-17  9:26 ` [PATCH] archimport improvements Martin Langhoff
  2 siblings, 1 reply; 39+ messages in thread
From: Martin Langhoff @ 2005-11-12 11:54 UTC (permalink / raw
  To: Eric Wong; +Cc: git list

Eric,


On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote:
> I'm another Arch-user trying out git.  Unfortunately, I encountered
> several problems with git-archimport that I needed fixed before my
> development trees could be imported into git.

Welcome and good stuff! I'll give your patches a try when I sober up.
In the meantime, some notes after having read the patches a bit...

> Bug Fixes:
>
> * Support for '--branch'-less Arch version names.
>   Encoding '/' to '--' (as was previously done) is not 100% reversable
>   because the "--branch" portion of an fully-qualified Arch version name
>   is optional (though not many people or Arch-related tools know this).
>
> * I'm encoding the '/' in the fully-qualified name as ',' to not confuse
>   other porcelains, but leaving '/' in branch names may be alright
>   provided porcelains can support them.
>
> * Identify git branches as an Arch "archive,category<--branch>--version"
>   Anything less than that is ambiguous as far as history and patch
>   relationships go.

These bug/sanity fixes are _good_. As you mention, I wasn't aware that
patchnames could show up not having a --branch part. Tricky...

> * Renamed directories containing renamed/moved files inside didn't get
>   tracked properly.  The original code was inadequate for this, and
>   making it support all rename cases that Arch supports is too much
>   work.  Instead, I maintain full-blown Arch trees in the temp dir and
>   replay patches + rsync based on that.  Performance is slightly slower
>   than before, but accuracy is more important to me.
>
> * Permission (execute bit only because of git) tracking as a side effect
>   of the above.

Hmmm. I understand what you are doing, but I'm not sure we'd want to
replace the current code with this strategy.  Importing large trees
with hundreds (thousands) of commits is so slow it is just a no go.
Renames are described quite well in the 'commit log', and the current
code does handle file renames...

> * Tracking changes from branches that are only cherry-picked now works

Can you elaborate a bit more on this?

> * Pika-escaped filenames unhandled.  This seems fixed in the latest
>   git, but I fixed it more generally and removed the ShellQuote module
>   dependency along the way.

Yes, this got fixed recently. Your change here goes together with the
'tla get' + rsync strategy which I'm not sure about.

> * Don't die() when a merge-base can't be found.  Arch supports
>   merging between unrelated trees.

Fair enough. Does it result in a good graft in git?

> Usability enhancements:
>
> * Optionally detect merged branches and attempt to import their history,
>   too.  Use the -D <depth> option for this.  Specifying a <depth>
>   greater than 1 is usually not needed unless the tree you're tracking
>   has had history pruned.
>
> * Optionally attempt to auto-register unknown Arch archives from
>   mirrors.sourcecontrol.net to pull their history with the -a (boolean)
>   switch.  Not sure how useful users will find this.

Those two are interesting!
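
For readers following along, the staged discovery behind -D can be
sketched roughly as below. This is a simplified model, not the code
from the patch: the merge data is a made-up hash instead of parsed
abrowse output, and discover_branches() is a hypothetical name. Each
stage scans the versions recorded at that level and records any
not-yet-known merge source one level deeper.

```perl
use strict;
use warnings;

# Made-up merge data: version name => versions it merges from.
# In the real script this would come from parsing abrowse output.
my %merges_from = (
    'a@example/proj--main--1.0'       => ['b@example/proj--feature--1.0'],
    'b@example/proj--feature--1.0'    => ['c@example/proj--experiment--1.0'],
    'c@example/proj--experiment--1.0' => [],
);

# Returns a hash of version name => level at which it was recorded.
# Level 1 is a version given by the caller; a version first seen as a
# merge source while scanning level N is recorded at level N + 1.
sub discover_branches {
    my ($max_stage, $merges, @start) = @_;
    my %level = map { $_ => 1 } @start;
    for my $stage (1 .. $max_stage) {
        # Take the key list up front so we never insert into the hash
        # while iterating over it.
        my @todo = sort grep { $level{$_} == $stage } keys %level;
        for my $branch (@todo) {
            for my $src (@{ $merges->{$branch} || [] }) {
                $level{$src} = $stage + 1 unless defined $level{$src};
            }
        }
    }
    return %level;
}

my %found = discover_branches(2, \%merges_from, 'a@example/proj--main--1.0');
print "$found{$_} $_\n" for sort keys %found;
```

With a maximum stage of 1 the feature version is recorded at level 2
but never scanned itself, so the experiment version stays unknown.
Snapshotting the keys first is a deliberate choice here: perlfunc
documents that adding elements to a hash while walking it with each()
leaves the iterator in an unspecified state.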

> * Removed -A <archive> usage (unnecessary in all cases) and made all
>   Arch calls and output parsing to be compatible with both tla (tested
>   1.3.3) and baz (1.4.2).  Default is still tla, but the ARCH_CLIENT
>   environment variable may be changed to baz.

That's excellent -- thanks!

> Current weaknesses:
>
> * (Present in the original code as well).
>   The code still assumes that dates in commit logs can be trusted, which is
>   fine in most cases, but a wayward branch can screw up git-archimport and
>   cause parents to be missed.

Fair enough. You mention an alternative strategy (tla ancestry) --
have you tried it at all?

cheers,


martin


* Re: [PATCH 4/5] Overhaul of changeset application
  2005-11-12  9:30       ` [PATCH 4/5] Overhaul of changeset application Eric Wong
  2005-11-12  9:32         ` [PATCH 5/5] -D <depth> option to recurse into merged branches Eric Wong
@ 2005-11-12 12:07         ` Martin Langhoff
  2005-11-12 20:49           ` Eric Wong
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Langhoff @ 2005-11-12 12:07 UTC (permalink / raw
  To: Eric Wong; +Cc: git list

Eric,

 I'd actually like to improve the script to handle directory renames
and file modes correctly so we don't need to ever call the glacially
slow `tla get` -- I don't think it's that much work, all I need is a
sample repo. OTOH, if you think (or can convince me) that there are
more serious problems ahead, perhaps we can have this as an
alternative import mechanism?

On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote:
>  - Correctly parse multi-line summary text in patch-logs

Was this broken!? I'm sure I've imported multiline summaries!

cheers,


martin


* Re: [PATCH] archimport improvements
  2005-11-12 11:54 ` [PATCH] archimport improvements Martin Langhoff
@ 2005-11-12 20:21   ` Eric Wong
  2005-11-14 22:38     ` Martin Langhoff
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-12 20:21 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
> 
> 
> On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote:
> > I'm another Arch-user trying out git.  Unfortunately, I encountered
> > several problems with git-archimport that I needed fixed before my
> > development trees could be imported into git.
> 
> Welcome and good stuff! I'll give your patches a try when I sober up.
> In the meantime, some notes after having read the patches a bit...
> 
> > Bug Fixes:
> >
> > * Support for '--branch'-less Arch version names.
> >   Encoding '/' to '--' (as was previously done) is not 100% reversable
> >   because the "--branch" portion of an fully-qualified Arch version name
> >   is optional (though not many people or Arch-related tools know this).
> >
> > * I'm encoding the '/' in the fully-qualified name as ',' to not confuse
> >   other porcelains, but leaving '/' in branch names may be alright
> >   provided porcelains can support them.
> >
> > * Identify git branches as an Arch "archive,category<--branch>--version"
> >   Anything less than that is ambiguous as far as history and patch
> >   relationships go.
> 
> These bug/sanity fixes are _good_. As you mention, I wasn't aware that
> patchnames could show up not having a --branch part. Tricky...

Thanks.  I got lazy one day and started ignoring --branch on some of my
personal projects to save my fingers :)
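
To make the ambiguity concrete, here is a minimal sketch. The two
version names are made up for illustration (whether tla would accept
each of them is not checked), and the three helper functions are
hypothetical stand-ins, not the functions in git-archimport.

```perl
use strict;
use warnings;

# Old-style encoding as described in this thread: '/' becomes '--'.
sub encode_dashes { my $n = shift; $n =~ s!/!--!; return $n }

# New-style encoding as described in this thread: '/' becomes ','.
sub encode_comma  { my $n = shift; $n =~ s!/!,!;  return $n }

# Undo the comma encoding. Assumes ',' never occurs in the archive name.
sub decode_comma  { my $n = shift; $n =~ s!,!/!;  return $n }

# Made-up names: the first has category "tools" and branch "proj",
# the second has an archive ending in "--tools", category "proj"
# and no branch at all.
my $with_branch    = 'jdoe@example.com/tools--proj--1.0';
my $without_branch = 'jdoe@example.com--tools/proj--1.0';

for my $name ($with_branch, $without_branch) {
    printf "%-36s dashes: %-36s comma: %s\n",
        $name, encode_dashes($name), encode_comma($name);
}
```

Both names collapse to the same string under the '--' encoding, so
there is no way to tell afterwards where the archive name ended. The
',' encoding keeps them distinct and can be reversed.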

> > * Renamed directories containing renamed/moved files inside didn't get
> >   tracked properly.  The original code was inadequate for this, and
> >   making it support all rename cases that Arch supports is too much
> >   work.  Instead, I maintain full-blown Arch trees in the temp dir and
> >   replay patches + rsync based on that.  Performance is slightly slower
> >   than before, but accuracy is more important to me.
> >
> > * Permission (execute bit only because of git) tracking as a side effect
> >   of the above.
> 
> Hmmm. I understand what you are doing, but I'm not sure we'd want to
> replace the current code with this strategy.  Importing large trees
> with hundreds (thousands) of commits is so slow it is just a no go.
> Renames are described quite well in the 'commit log', and the current
> code does handle file renames...

Untouched files inside renamed directories aren't explicitly tracked.
Renamed directories are especially a pain when a renamed one contains
sub-directories that are also renamed.

> > * Tracking changes from branches that are only cherry-picked now works
> 
> Can you elaborate a bit more on this?

Basically, don't die when merge-base fails, look a few lines down.

> > * Pika-escaped filenames unhandled.  This seems fixed in the latest
> >   git, but I fixed it more generally and removed the ShellQuote module
> >   dependency along the way.
> 
> Yes, this got fixed recently. Your change here goes together with the
> 'tla get' + rsync strategy which I'm not sure about.
> 
> > * Don't die() when a merge-base can't be found.  Arch supports
> >   merging between unrelated trees.
> 
> Fair enough. Does it result on a good graft in git?

Right now I end up with separate branches that are imported (according
to git-branch) but git-log and gitk don't seem to show
relationships between the unrelated trees.  I think find_parents()
may need to use an alternate strategy instead of warning and skipping
if a merge-base can't be found.

> > Usability enhancements:
> >
> > * Optionally detect merged branches and attempt to import their history,
> >   too.  Use the -D <depth> option for this.  Specifying a <depth>
> >   greater than 1 is usually not needed unless the tree you're tracking
> >   has had history pruned.
> >
> > * Optionally attempt to auto-register unknown Arch archives from
> >   mirrors.sourcecontrol.net to pull their history with the -a (boolean)
> >   switch.  Not sure how useful users will find this.
> 
> Those two are interesting!
> 
> > * Removed -A <archive> usage (unnecessary in all cases) and made all
> >   Arch calls and output parsing to be compatible with both tla (tested
> >   1.3.3) and baz (1.4.2).  Default is still tla, but the ARCH_CLIENT
> >   environment variable may be changed to baz.
> 
> That's excellent -- thanks!
> 
> > Current weaknesses:
> >
> > * (Present in the original code as well).
> >   The code still assumes that dates in commit logs can be trusted, which is
> >   fine in most cases, but a wayward branch can screw up git-archimport and
> >   cause parents to be missed.
> 
> Fair enough. You mention an alternative strategy (tla ancestry) --
> have you tried it at all?

No, not yet.
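
As an illustration of the weakness, date-based ordering amounts to a
sort on the date string with the patch id as a tie-breaker. The
sketch below uses made-up patchsets; the field names follow the ones
used in the script, everything else is an assumption.

```perl
use strict;
use warnings;

# Made-up patchsets. Dates are "YYYY-MM-DD HH:MM:SS" strings, which
# compare correctly as plain strings.
my @psets = (
    { id => 'a@example/proj--main--1.0--base-0',  date => '2005-01-01 10:00:00' },
    { id => 'a@example/proj--main--1.0--patch-1', date => '2005-01-02 10:00:00' },
    # committed on a machine whose clock was a year behind
    { id => 'a@example/proj--main--1.0--patch-2', date => '2004-01-03 10:00:00' },
);

# Sort by date; fall back to the id when two dates are equal.
my @ordered = sort {
    $a->{date} cmp $b->{date} or $a->{id} cmp $b->{id}
} @psets;

print "$_->{date}  $_->{id}\n" for @ordered;
```

Here patch-2 sorts ahead of base-0, its own ancestor, which is the
kind of misordering that makes an importer miss parents. Walking the
ancestry graph instead would not depend on client clocks at all.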

-- 
Eric Wong


* Re: [PATCH 4/5] Overhaul of changeset application
  2005-11-12 12:07         ` [PATCH 4/5] Overhaul of changeset application Martin Langhoff
@ 2005-11-12 20:49           ` Eric Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-12 20:49 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
> 
>  I'd actually like to improve the script to handle directory renames
> and file modes correctly so we don't need to ever call the glacially
> slow `tla get` -- I don't think it's that much work, all I need is a
> sample repo. OTOH, if you think (or can convince me) that there are
> more serious problems ahead, perhaps we can have this as an
> alternative import mechanism?

tla get is hardly ever called; I don't think it's called any more than
before, even.  tla replay is by far the most common case and is still
reasonably fast.  I had to add the -I flag to rsync because it was going
at > 1 patch per second, plenty fast enough for me.

I also had an alternate implementation for using the revision library,
but that was slower than the current strategy because it had to do
two full Arch tree integrity checks for each patch applied.

Even with a hot (fully filled) revlib, where all I had to do was tla
library-find + rsync, it ran more slowly, probably because rsync
couldn't take advantage of kernel/fs-level caching when it had to
work on a different directory each time.

Tracking renamed directories (especially when nested subdirectories are
also renamed) is very, very far from pleasant.

> On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote:
> >  - Correctly parse multi-line summary text in patch-logs
> 
> Was this broken!? I'm sure I've imported multiline summaries!

It only got the first summary line when I tried it.  Also, it's possible
for hand-made message bodies to fool archimport if they have "headers"
after the first \n\n.  IIRC, some old tools copied entire logs of merged
changesets into the message body.
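
A stripped-down sketch of the idea, for illustration only: parse_log()
below is a hypothetical, simplified version that handles just the
Summary header, and the sample log is made up. Continuation lines
start with whitespace, and parsing stops at the first empty line so
that header-like text in the body is left alone.

```perl
use strict;
use warnings;

# Parse the header block of a patch-log given as a list of lines.
# Returns (summary, message). Only lines before the first empty line
# are treated as headers.
sub parse_log {
    my @log = @_;
    my @summary;
    my $in_summary = 0;
    while (defined(my $line = shift @log)) {
        if ($line =~ /^Summary:\s*(.*)$/) {
            @summary    = ($1);
            $in_summary = 1;
        } elsif ($line eq '') {
            last;                  # what is left in @log is the body
        } elsif ($in_summary && $line =~ /^\s+(.*)$/) {
            push @summary, $1;     # continuation of the summary
        } else {
            $in_summary = 0;       # some other header
        }
    }
    return (join("\n", @summary), join("\n", @log));
}

# Made-up log with a two-line summary and a body that quotes a header.
my @sample = (
    'Revision: proj--main--1.0--patch-3',
    'Summary: fix the frobnicator',
    '  and update the docs',
    'Creator: J. Doe <jdoe@example.com>',
    '',
    'Merged from elsewhere:',
    'Summary: this is body text, not a header',
);

my ($summary, $message) = parse_log(@sample);
print "SUMMARY:\n$summary\n\nMESSAGE:\n$message\n";
```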

-- 
Eric Wong


* Re: [PATCH 5/5] -D <depth> option to recurse into merged branches
  2005-11-12  9:32         ` [PATCH 5/5] -D <depth> option to recurse into merged branches Eric Wong
@ 2005-11-14  2:01           ` Eric Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-14  2:01 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

One small fix on top of this one:

Don't check for parents if the only revision we have is a base-0
and @psets is empty.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

applies-to: 8a7e18ff0884cae74a1127d5c96577a85acca3f4
5f2896558284724bcc87eb64daa0933b544ec20d
diff --git a/git-archimport.perl b/git-archimport.perl
index a0ea016..b624ba6 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -186,7 +186,7 @@ sub do_abrowse {
 
         if (%ps && !exists $psets{ $ps{id} }) {
             my %temp = %ps;         # break references
-            if ($psets[$#psets]{branch} eq $ps{branch}) {
+            if (@psets && $psets[$#psets]{branch} eq $ps{branch}) {
                 $temp{parent_id} = $psets[$#psets]{id};
             }
             push (@psets, \%temp);  
---
0.99.9.GIT


* Re: [PATCH] archimport improvements
  2005-11-12 20:21   ` Eric Wong
@ 2005-11-14 22:38     ` Martin Langhoff
  2005-11-15  8:03       ` Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Langhoff @ 2005-11-14 22:38 UTC (permalink / raw
  To: Eric Wong; +Cc: git list

Eric,

thanks for resending those so quickly. I think I'm going to sit on the
'overhaul of changeset application' patch a bit -- I'll test & ack
your other patches for merge soonish but I want to review and test
this one carefully.

My main concern is that it seems to be calling tla get for each
revision that it imports. For large trees, this is slow. I would be
much happier with a fast Perl-based approach. Have you got a public
repo with directory renames?

Additional comments follow...

On 11/13/05, Eric Wong <normalperson@yhbt.net> wrote:
> > > * Identify git branches as an Arch "archive,category<--branch>--version"
> > >   Anything less than that is ambiguous as far as history and patch
> > >   relationships go.
> >
> > These bug/sanity fixes are _good_. As you mention, I wasn't aware that
> > patchnames could show up not having a --branch part. Tricky...
>
> Thanks.  I got lazy one day and started ignoring --branch on some of my
> personal projects to save my fingers :)

Yup, makes sense. My concern now is that existing imports will change
the name of branches and tags going forward. Can I ask you to resend
that patch with the new branchname mangling as default, and the old
one as optional?

I know it'll force us to go back to using shellquote, but I am not too
worried by that dependency at the moment.

> > > Current weaknesses:
> > >
> > > * (Present in the original code as well).
> > >   The code still assumes that dates in commit logs can be trusted, which is
> > >   fine in most cases, but a wayward branch can screw up git-archimport and
> > >   cause parents to be missed.
> >
> > Fair enough. You mention an alternative strategy (tla ancestry) --
> > have you tried it at all?
>
> No, not yet.

Also interested in this if you get around to it.

cheers,


martin


* Re: [PATCH] archimport improvements
  2005-11-14 22:38     ` Martin Langhoff
@ 2005-11-15  8:03       ` Eric Wong
  2005-11-15  8:05         ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-15  8:03 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
> 
> thanks for resending those so quickly. I think I'm going to sit on the
> 'overhaul of changeset application' patch a bit -- I'll test & ack
> your other patches for merge soonish but I want to review and test
> this one carefully.
> 
> My main concern is that it seems to be calling tla get for each
> revision that it imports. For large trees, this is slow. I would be
> much happier with a fast Perl-based approach. Have you got a public
> repo with directory renames?

Please read my sync_to_ps() function very carefully.  Next is a patch
that helps you track which Arch command (get/replay/apply-delta) is used
for each changeset.

tla replay is the most common for any halfway normal (changeset-based)
tree by far.

tla get is not called any more often than before.

apply-delta is hardly, if ever, called.  It may not even be reachable
unless somebody commits revisions to the same tree with clocks out of
order from patchlevel order.  Heck, if it's ever called, it's most
likely faster just to rmtree and tla get again.
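
The choice between the three commands can be summarised as a small
decision function. This is only a sketch of the logic described
above: choose_arch_command() and its argument names are hypothetical,
and the tag case is assumed to re-fetch with get after removing the
cached tree, since that line is not part of the hunk posted in this
thread.

```perl
use strict;
use warnings;

# Decide which Arch operation brings a cached tree up to a patchset.
# Arguments (hypothetical names for this sketch):
#   tree_exists - true if a tree is already cached for the branch
#   type        - 's' changeset, 'i' import, 't' tag
#   tree_id     - revision the cached tree is currently at
#   parent_id   - revision just before the patchset on its branch
sub choose_arch_command {
    my %args = @_;
    return 'get' unless $args{tree_exists};    # new branch work
    return 'get' if $args{type} eq 't';        # tags: start afresh (assumed)
    return 'replay'
        if defined $args{parent_id} && $args{parent_id} eq $args{tree_id};
    return 'apply-delta';                      # tree is not at the parent
}

print choose_arch_command(
    tree_exists => 1,
    type        => 's',
    tree_id     => 'proj--main--1.0--patch-4',
    parent_id   => 'proj--main--1.0--patch-4',
), "\n";
```

For an ordinary changeset-based branch imported in patchlevel order,
the cached tree is always at the parent revision, so every step after
the first falls into the replay case.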

Unfortunately, my heavily used and abused trees are private.

> Additional comments follow...
> 
> On 11/13/05, Eric Wong <normalperson@yhbt.net> wrote:
> > > > * Identify git branches as an Arch "archive,category<--branch>--version"
> > > >   Anything less than that is ambiguous as far as history and patch
> > > >   relationships go.
> > >
> > > These bug/sanity fixes are _good_. As you mention, I wasn't aware that
> > > patchnames could show up not having a --branch part. Tricky...
> >
> > Thanks.  I got lazy one day and started ignoring --branch on some of my
> > personal projects to save my fingers :)
> 
> Yup, makes sense. My concern now is that existing imports will change
> the name of branches and tags going forward. Can I ask you to resend
> that patch with the new branchname mangling as default, and the old
> one as optional?

Ok, good idea.  My previous patch already automatically converted the
private tags, which we actually need to parse, and I see no reason to
change that, but branch names and public tags, which affect
non-git-archimport users, can be preserved with the -o flag.
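
To illustrate the difference for public tag names, here is a small
sketch of the two substitutions.  mangle_tag_name() is a hypothetical
helper and the archive name is made up; only the two s|/|...|g
substitutions correspond to what tag() does:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the substitution done in tag():
# old style turns '/' into '--', new style turns it into ','.
sub mangle_tag_name {
    my ($name, $old_style) = @_;
    my $sep = $old_style ? '--' : ',';
    (my $out = $name) =~ s|/|$sep|g;
    return $out;
}

# made-up fully-qualified revision name
my $rev = 'user@example.com--2005/proj--main--1.0--patch-3';

print mangle_tag_name($rev, 0), "\n"; # user@example.com--2005,proj--main--1.0--patch-3
print mangle_tag_name($rev, 1), "\n"; # user@example.com--2005--proj--main--1.0--patch-3
```

The new-style name can be split back on ',' unambiguously; the old-style
one cannot, since '--' also separates the Arch name components.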

> I know it'll force us to go back to using shellquote, but I am not too
> worried by that dependency at the moment.
 
Actually, usage of shell_quote() in git-archimport was always
unnecessary.  Passing arguments to external programs as an array,
using the 3-argument version of open() for files, and using -z in
git-commands with pipes are better ways to go.
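
A rough sketch of the first two points, assuming a Unix-like system.
The file name is made up, and $^X (the running perl) stands in for
tla or git so the example runs anywhere; -z handling is not shown:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
# a file name full of shell metacharacters, on purpose
my $name = "$dir/odd name; 'quoted' \$x.txt";

# 3-argument open: the mode is a separate argument, so nothing in the
# file name is ever interpreted
open(my $out, '>', $name) or die "open $name: $!";
print $out "hello\n";
close($out) or die "close $name: $!";

# list-form pipe open: the arguments are handed to exec() as-is and no
# shell is involved, so no quoting is needed
open(my $in, '-|', $^X, '-e', 'print $ARGV[0]', $name)
    or die "cannot run child: $!";
my $echoed = do { local $/; <$in> };
close($in) or die "child failed: $?";

print "argument survived intact\n" if $echoed eq $name;
```

The same applies to system() and exec() when they are given a list
instead of a single string.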

> > > > Current weaknesses:
> > > >
> > > > * (Present in the original code as well).
> > > >   The code still assumes that dates in commit logs can be trusted, which is
> > > >   fine in most cases, but a wayward branch can screw up git-archimport and
> > > >   cause parents to be missed.
> > >
> > > Fair enough. You mention an alternative strategy (tla ancestry) --
> > > have you tried it at all?
> >
> > No, not yet.
> 
> Also interested in this if you get around to it.

It's not a high priority for me and I probably don't have time to do
this.

-- 
Eric Wong


* [PATCH 1/2] archimport: allow for old style branch and public tag names
  2005-11-15  8:03       ` Eric Wong
@ 2005-11-15  8:05         ` Eric Wong
  2005-11-15  8:06           ` [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods Eric Wong
  2005-11-15  8:07           ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong
  0 siblings, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-15  8:05 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

This patch adds the -o switch, which lets old trees tracked by
git-archimport continue working with their old branch and tag names
to make life easier for people tracking your tree.

Private tags that are only used internally by git-archimport continue to be
new-style, and automatically converted upon first run.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |   22 +++++++++++++++++-----
 1 files changed, 17 insertions(+), 5 deletions(-)

applies-to: 44d831812786f4dfbf54a67b51e5f48c7d5afd66
4b341dd903883db0a89fe2f04e93dab053beb045
diff --git a/git-archimport.perl b/git-archimport.perl
index 1f721f6..304d462 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -67,12 +67,12 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $ptag_dir = "$git_dir/archimport/tags";
 
-our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a);
+our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o);
 
 sub usage() {
     print STDERR <<END;
 Usage: ${\basename $0}     # fetch/update GIT from Arch
-       [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth  ] [ -t tempdir ]
+       [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth  ] [ -t tempdir ]
        repository/arch-branch [ repository/arch-branch] ...
 END
     exit(1);
@@ -267,7 +267,15 @@ sub tree_dirname {
     return $name;
 }
 
-*git_branchname = *tree_dirname;
+# old versions of git-archimport just use the <category--branch> part:
+sub old_style_branchname {
+    my $id = shift;
+    my $ret = safe_pipe_capture($TLA,'parse-package-name','-p',$id);
+    chomp $ret;
+    return $ret;
+}
+
+*git_branchname = $opt_o ? *old_style_branchname : *tree_dirname;
 
 # process patchsets in ancestry order
 foreach my $ps (@psets) {
@@ -527,8 +535,12 @@ sub parselog {
 sub tag {
     my ($tag, $commit) = @_;
  
-    # don't use subdirs for tags yet, it could screw up other porcelains
-    $tag =~ s|/|,|;
+    if ($opt_o) {
+        $tag =~ s|/|--|g;
+    } else {
+        # don't use subdirs for tags yet, it could screw up other porcelains
+        $tag =~ s|/|,|g;
+    }
     
     if ($commit) {
         open(C,">","$git_dir/refs/tags/$tag")
---
0.99.9.GIT


* [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods
  2005-11-15  8:05         ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong
@ 2005-11-15  8:06           ` Eric Wong
  2005-11-15  8:07           ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong
  1 sibling, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-15  8:06 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

This patch adds debug messages (enabled with the usual -v switch) for
tracking how often each tla command is called.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

applies-to: 8a7cc429d0fd935805851ac5ac10941d0bd86e94
b4de7920e0116afb35016435131a404658818ced
diff --git a/git-archimport.perl b/git-archimport.perl
index b624ba6..1f721f6 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -429,18 +429,23 @@ sub sync_to_ps {
     my $ps = shift;
     my $tree_dir = $tmptree.'/'.tree_dirname($ps->{id});
 
+    $opt_v && print "sync_to_ps($ps->{id}) method: ";
+
     if (-d $tree_dir) {
         if ($ps->{type} eq 't') {
             # looks like a tag-only or (worse,) a mixed tags/changeset branch,
             # can't rely on replay to work correctly on these
             rmtree($tree_dir);
+	    $opt_v && print "get (tag)\n";
             safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
         } else {
                 my $tree_id = arch_tree_id($tree_dir);
                 if ($ps->{parent_id} eq $tree_id) {
                     # the common case (hopefully)
+		    $opt_v && print "replay\n";
                     safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id});
                 } else {
+		    $opt_v && print "apply-delta\n";
                     # this can happen if branches cherry-pick
                     safe_pipe_capture($TLA,'apply-delta','-d',$tree_dir,
                                                         $tree_id, $ps->{id});
@@ -448,6 +453,7 @@ sub sync_to_ps {
         }
     } else {
         # new branch work
+        $opt_v && print "get (new tree)\n";
         safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
     }
    
---
0.99.9.GIT


* Re: [PATCH 1/2] archimport: allow for old style branch and public tag names
  2005-11-15  8:05         ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong
  2005-11-15  8:06           ` [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods Eric Wong
@ 2005-11-15  8:07           ` Eric Wong
  1 sibling, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-15  8:07 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

Oops, I sent these two patches out of order.  They should apply
fine without conflicts either way.

-- 
Eric Wong


* Re: [PATCH] archimport improvements
  2005-11-12  9:23 [PATCH] archimport improvements Eric Wong
  2005-11-12  9:25 ` [PATCH 1/5] remove shellquote usage for tags Eric Wong
  2005-11-12 11:54 ` [PATCH] archimport improvements Martin Langhoff
@ 2005-11-17  9:26 ` Martin Langhoff
  2005-11-24  7:46   ` Eric Wong
  2 siblings, 1 reply; 39+ messages in thread
From: Martin Langhoff @ 2005-11-17  9:26 UTC (permalink / raw
  To: Eric Wong; +Cc: git list

Eric,

I've merged and pushed out to
http://locke.catalyst.net.nz/git/git-martinlanghoff.git/#tojunio

  [PATCH 1/5] remove shellquote usage for tags
  [PATCH 2/5] archimport: don't die on merge-base failure
  [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git
  [PATCH 1/2] archimport: allow for old style branch and public tag names

That last one had a small edit to rebase it to the top of the head --
will probably have a small conflict for you on the usage line and
getopts() line.

What is pending is...

*  [PATCH 4/5] Overhaul of changeset application

I am testing it right now. Finding it rather slow on an idle linux
workstation with fast IDE disks, no X.org loaded and 1GB of RAM.
iowait is pegged at 90%. Wonder what will happen on a system with slow
disk access. tla/baz are unusable under any OS where the fs stack is
not _that_ polished (OSX and friends).

The early versions of the import also used $TLA for all ops, and I was
forced to change it to get my repos transformed in a reasonable time.

Can you send me a patch that makes it optional, so users can choose
fast or correct? I don't want to force glacial imports on anyone,
especially me. Testing an import of a reasonably sized repo must be a
quick operation or I won't do it ;-) And I do work on OSX too.

On the other hand, I might just implement renamed directories tracking
separately, especially if someone can point me to a public repo with
some interesting cases of renamed directories.

These patches seem to hang from 4/5 so will need rebasing after a
rework. The first one seems to be 3 or 4 patches in one. It'd be
good to break it up.

* [ PATCH 5/5] -D <depth> option to recurse into merged branches
* Re: [PATCH 5/5] -D <depth> option to recurse into merged branches
* [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods

If you want to see the repos I'm testing with, register
arch-eduforge@catalyst.net.nz--2004
http://nzvle.eduforge.org/arch-mirror/ and try:

~/local/git/git-archimport.perl -v \
    arch-eduforge@catalyst.net.nz--2004/moodle-org--moodle  \
    arch-eduforge@catalyst.net.nz--2004/moodle--local \
    arch-eduforge@catalyst.net.nz--2004/moodle--local-forum-types \
    arch-eduforge@catalyst.net.nz--2004/moodle--local-lock-content \
    arch-eduforge@catalyst.net.nz--2004/moodle--nmit \
    arch-eduforge@catalyst.net.nz--2004/moodle--topnz

cheers,


martin


* Re: [PATCH] archimport improvements
  2005-11-17  9:26 ` [PATCH] archimport improvements Martin Langhoff
@ 2005-11-24  7:46   ` Eric Wong
  2005-11-24  7:47     ` [PATCH 1/9] archimport: first, make sure it still compiles Eric Wong
  2005-11-24  9:25     ` [PATCH] archimport improvements Martin Langhoff
  0 siblings, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:46 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
> 
> I've merged and pushed out to
> http://locke.catalyst.net.nz/git/git-martinlanghoff.git/#tojunio
> 
>   [PATCH 1/5] remove shellquote usage for tags
>   [PATCH 2/5] archimport: don't die on merge-base failure
>   [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git
>   [PATCH 1/2] archimport: allow for old style branch and public tag names
> 
> That last one had a small edit to rebase it to the top of the head --
> will probably have a small conflict for you on the usage line and
> getopts() line.
> 
> What is pending is...
> 
> *  [PATCH 4/5] Overhaul of changeset application
> 
> I am testing it right now. Finding it rather slow on an idle linux
> workstation with fast IDE disks, no X.org loaded and 1GB of RAM.
> iowait is pegged at 90%. Wonder what will happen on a system with slow
> disk access. tla/baz are unusable under any OS where the fs stack is
> not _that_ polished (OSX and friends).

Ok, I didn't expect you guys to have 12k files in your trees.  None
of my source trees are remotely close to that size (but I have many
more changesets).  I'm surprised you guys were able to put up
with Arch in the first place!

125m58.431s with my method.
  8m24.504s with yours :)

All of my usual source trees imported 1k changesets in 10-15 minutes.

> The early versions of the import also used $TLA for all ops, and I was
> forced to change it to get my repos transformed in a reasonable time.
> 
> Can you send me a patch that makes it optional, so users can choose
> fast or correct? I don't want to force glacial imports on anyone,
> especially me. Testing an import of a reasonably sized repo must be a
> quick operation or I won't do it ;-) And I do work on OSX too.

Patches on the way.

OTOH, the time spent importing the bulk of the history is a one-time
operation for most people and I'd much rather it get things as right as
possible and move on.

> On the other hand, I might just implement renamed directories tracking
> separately, especially if someone can point me to a public repo with
> some interesting cases of renamed directories.

IIRC, there are several nasty cases all of which are ordering-related,
especially with regard to nested directories or file renames inside
directories that are also renamed.  It should be noted that not even tla
gets all the possible directory rename cases right (baz seems better
from my observations). 

> These patches seem to hang from 4/5 so will need rebasing after a
> rework. The first one seems to be 3 or 4 patches in one. It'd be
> good to break it up.

Sorry, I rushed through the initial overhaul and didn't generate neat
patches because I wanted to get some of my work moved to git ASAP.

> * [ PATCH 5/5] -D <depth> option to recurse into merged branches
> * Re: [PATCH 5/5] -D <depth> option to recurse into merged branches
> * [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods

-- 
Eric Wong


* [PATCH 1/9] archimport: first, make sure it still compiles
  2005-11-24  7:46   ` Eric Wong
@ 2005-11-24  7:47     ` Eric Wong
  2005-11-24  7:48       ` [PATCH 2/9] remove String::ShellQuote dependency Eric Wong
  2005-11-24 18:54       ` [PATCH 1/9] archimport: first, make sure it still compiles Linus Torvalds
  2005-11-24  9:25     ` [PATCH] archimport improvements Martin Langhoff
  1 sibling, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:47 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

applies-to: a17c1f442587b9c7d68b4f7e08c5f6786599c61e
119b07aa2bdb23d5f4977c4d696dd5e7eea56ca6
diff --git a/git-archimport.perl b/git-archimport.perl
index c3bed08..b5f8a2c 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -99,6 +99,7 @@ my %psets  = ();                # the co
 
 my %rptags = ();                # my reverse private tags
                                 # to map a SHA1 to a commitid
+my $TLA = $ENV{'ARCH_CLIENT'} || 'tla';
 
 foreach my $root (@arch_roots) {
     my ($arepo, $abranch) = split(m!/!, $root);
@@ -850,3 +851,18 @@ sub commitid2pset {
 	|| (print Dumper(sort keys %psets)) && die "Cannot find patchset for $name";
     return $ps;
 }
+
+# an alterative to `command` that allows input to be passed as an array
+# to work around shell problems with weird characters in arguments
+sub safe_pipe_capture {
+    my @output;
+    if (my $pid = open my $child, '-|') {
+        @output = (<$child>);
+        close $child or die join(' ',@_).": $! $?";
+    } else {
+	exec(@_) or die $?; # exec() can fail the executable can't be found
+    }
+    return wantarray ? @output : join('',@output);
+}
+
+
---
0.99.9.GIT

-- 
Eric Wong


* [PATCH 2/9] remove String::ShellQuote dependency.
  2005-11-24  7:47     ` [PATCH 1/9] archimport: first, make sure it still compiles Eric Wong
@ 2005-11-24  7:48       ` Eric Wong
  2005-11-24  7:50         ` [PATCH 3/9] fix -t tmpdir switch Eric Wong
  2005-11-24 18:54       ` [PATCH 1/9] archimport: first, make sure it still compiles Linus Torvalds
  1 sibling, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:48 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

use safe_pipe_capture() or system() over backticks where
shellquoting may have been necessary.
More changes planned, so I'm not touching the parts I'm
planning on replacing entirely.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |   51 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 28 insertions(+), 23 deletions(-)

applies-to: 83307766d30e928179b9aa85a3d7bb906cc08846
80494a7d496ab9f6e0a76a60b1f0b4215fdff442
diff --git a/git-archimport.perl b/git-archimport.perl
index b5f8a2c..b7e2480 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -55,9 +55,8 @@ use warnings;
 use Getopt::Std;
 use File::Spec;
 use File::Temp qw(tempfile tempdir);
-use File::Path qw(mkpath);
+use File::Path qw(mkpath rmtree);
 use File::Basename qw(basename dirname);
-use String::ShellQuote;
 use Time::Local;
 use IO::Socket;
 use IO::Pipe;
@@ -306,7 +305,7 @@ foreach my $ps (@psets) {
     unless ($import) { # skip for import
         if ( -e "$git_dir/refs/heads/$ps->{branch}") {
             # we know about this branch
-            `git checkout    $ps->{branch}`;
+            system('git-checkout',$ps->{branch});
         } else {
             # new branch! we need to verify a few things
             die "Branch on a non-tag!" unless $ps->{type} eq 't';
@@ -315,7 +314,7 @@ foreach my $ps (@psets) {
                 unless $branchpoint;
             
             # find where we are supposed to branch from
-            `git checkout -b $ps->{branch} $branchpoint`;
+            system('git-checkout','-b',$ps->{branch},$branchpoint);
 
             # If we trust Arch with the fact that this is just 
             # a tag, and it does not affect the state of the tree
@@ -344,7 +343,7 @@ foreach my $ps (@psets) {
     #
     my $tree;
     
-    my $commitlog = `tla cat-archive-log -A $ps->{repo} $ps->{id}`; 
+    my $commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id}); 
     die "Error in cat-archive-log: $!" if $?;
         
     # parselog will git-add/rm files
@@ -422,7 +421,7 @@ foreach my $ps (@psets) {
     #
     my @par;
     if ( -e "$git_dir/refs/heads/$ps->{branch}") {
-        if (open HEAD, "<$git_dir/refs/heads/$ps->{branch}") {
+        if (open HEAD, "<","$git_dir/refs/heads/$ps->{branch}") {
             my $p = <HEAD>;
             close HEAD;
             chomp $p;
@@ -437,7 +436,6 @@ foreach my $ps (@psets) {
     if ($ps->{merges}) {
         push @par, find_parents($ps);
     }
-    my $par = join (' ', @par);
 
     #    
     # Commit, tag and clean state
@@ -454,7 +452,7 @@ foreach my $ps (@psets) {
     $commit_rh = 'commit_rh';
     $commit_wh = 'commit_wh';
     
-    $pid = open2(*READER, *WRITER, "git-commit-tree $tree $par") 
+    $pid = open2(*READER, *WRITER,'git-commit-tree',$tree,@par) 
         or die $!;
     print WRITER $logmessage;   # write
     close WRITER;
@@ -469,7 +467,7 @@ foreach my $ps (@psets) {
     #
     # Update the branch
     # 
-    open  HEAD, ">$git_dir/refs/heads/$ps->{branch}";
+    open  HEAD, ">","$git_dir/refs/heads/$ps->{branch}";
     print HEAD $commitid;
     close HEAD;
     system('git-update-ref', 'HEAD', "$ps->{branch}");
@@ -483,21 +481,23 @@ foreach my $ps (@psets) {
     print "   + tree   $tree\n";
     print "   + commit $commitid\n";
     $opt_v && print "   + commit date is  $ps->{date} \n";
-    $opt_v && print "   + parents:  $par \n";
+    $opt_v && print "   + parents:  ",join(' ',@par),"\n";
 }
 
 sub apply_import {
     my $ps = shift;
     my $bname = git_branchname($ps->{id});
 
-    `mkdir -p $tmp`;
+    mkpath($tmp);
 
-    `tla get -s --no-pristine -A $ps->{repo} $ps->{id} $tmp/import`;
+    safe_pipe_capture($TLA,'get','-s','--no-pristine',$ps->{id},"$tmp/import");
     die "Cannot get import: $!" if $?;    
-    `rsync -v --archive --delete --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/import/* ./`;
+    system('rsync','-aI','--delete', '--exclude',$git_dir,
+		'--exclude','.arch-ids','--exclude','{arch}',
+		"$tmp/import/", './');
     die "Cannot rsync import:$!" if $?;
     
-    `rm -fr $tmp/import`;
+    rmtree("$tmp/import");
     die "Cannot remove tempdir: $!" if $?;
     
 
@@ -507,10 +507,10 @@ sub apply_import {
 sub apply_cset {
     my $ps = shift;
 
-    `mkdir -p $tmp`;
+    mkpath($tmp);
 
     # get the changeset
-    `tla get-changeset  -A $ps->{repo} $ps->{id} $tmp/changeset`;
+    safe_pipe_capture($TLA,'get-changeset',$ps->{id},"$tmp/changeset");
     die "Cannot get changeset: $!" if $?;
     
     # apply patches
@@ -534,17 +534,20 @@ sub apply_cset {
             $orig =~ s/\.modified$//; # lazy
             $orig =~ s!^\Q$tmp\E/changeset/patches/!!;
             #print "rsync -p '$mod' '$orig'";
-            `rsync -p $mod ./$orig`;
+            system('rsync','-p',$mod,"./$orig");
             die "Problem applying binary changes! $!" if $?;
         }
     }
 
     # bring in new files
-    `rsync --archive --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/changeset/new-files-archive/* ./`;
+    system('rsync','-aI','--exclude',$git_dir,
+    		'--exclude','.arch-ids',
+		'--exclude', '{arch}',
+		"$tmp/changeset/new-files-archive/",'./');
 
     # deleted files are hinted from the commitlog processing
 
-    `rm -fr $tmp/changeset`;
+    rmtree("$tmp/changeset");
 }
 
 
@@ -622,9 +625,9 @@ sub parselog {
            # tla cat-archive-log will give us filenames with spaces as file\(sp)name - why?
            # we can assume that any filename with \ indicates some pika escaping that we want to get rid of.
            if  ($t =~ /\\/ ){
-               $t = `tla escape --unescaped '$t'`;
+               $t = (safe_pipe_capture($TLA,'escape','--unescaped',$t))[0];
            }
-            push (@tmp, shell_quote($t));
+            push (@tmp, $t);
         }
         @$ref = @tmp;
     }
@@ -827,8 +830,10 @@ sub find_parents {
 	    }
 	}
     }
-    @parents = keys %parents;
-    @parents = map { " -p " . ptag($_) } @parents;
+    @parents = ();
+    foreach (keys %parents) {
+        push @parents, '-p', ptag($_);
+    }
     return @parents;
 }
 
---
0.99.9.GIT


* [PATCH 3/9] fix -t tmpdir switch
  2005-11-24  7:48       ` [PATCH 2/9] remove String::ShellQuote dependency Eric Wong
@ 2005-11-24  7:50         ` Eric Wong
  2005-11-24  7:51           ` [PATCH 4/9] remove git wrapper dependency Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:50 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

set TMPDIR env correctly if -t <tmpdir> is passed from the command-line.
setting TMPDIR => 1 as an argument to tempdir() has no effect otherwise

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

applies-to: 015fcfae8cdd564f0993940c5bac303c41913b1a
25aadaf3ebc18fcc3c7948dc831d3f93447b03b6
diff --git a/git-archimport.perl b/git-archimport.perl
index b7e2480..2ed2e3c 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -88,9 +88,8 @@ usage if $opt_h;
 @ARGV >= 1 or usage();
 my @arch_roots = @ARGV;
 
-my ($tmpdir, $tmpdirname) = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
-my $tmp = $opt_t || 1;
-$tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+$ENV{'TMPDIR'} = $opt_t if $opt_t; # $ENV{TMPDIR} will affect tempdir() calls:
+my $tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 $opt_v && print "+ Using $tmp as temporary directory\n";
 
 my @psets  = ();                # the collection
---
0.99.9.GIT


* [PATCH 4/9] remove git wrapper dependency
  2005-11-24  7:50         ` [PATCH 3/9] fix -t tmpdir switch Eric Wong
@ 2005-11-24  7:51           ` Eric Wong
  2005-11-24  7:52             ` [PATCH 5/9] add -D <depth> and -a switch Eric Wong
  2005-11-24  8:20             ` [PATCH 4/9] remove git wrapper dependency Andreas Ericsson
  0 siblings, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:51 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

use git-diff-files instead of git diff-files so we don't rely on the
wrapper being installed (some people may have git as GNU interactive
tools :)

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

applies-to: 36c4ad6c2442e7700e5f1435d834f86d2680dd7f
40b5378433aa6b09dd358d482d7713c2db6a5d92
diff --git a/git-archimport.perl b/git-archimport.perl
index 2ed2e3c..938fa2b 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -278,7 +278,7 @@ foreach my $ps (@psets) {
     #
     # ensure we have a clean state 
     # 
-    if (`git diff-files`) {
+    if (`git-diff-files`) {
         die "Unclean tree when about to process $ps->{id} " .
             " - did we fail to commit cleanly before?";
     }
---
0.99.9.GIT


* [PATCH 5/9] add -D <depth> and -a switch
  2005-11-24  7:51           ` [PATCH 4/9] remove git wrapper dependency Eric Wong
@ 2005-11-24  7:52             ` Eric Wong
  2005-11-24  7:53               ` [PATCH 6/9] safer log file parsing Eric Wong
  2005-11-24  8:20             ` [PATCH 4/9] remove git wrapper dependency Andreas Ericsson
  1 sibling, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:52 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff


add -D <depth> option to abrowse
add -a switch to attempt to auto-register archives at
mirrors.sourcecontrol.net

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl |  227 +++++++++++++++++++++++++++++++--------------------
 1 files changed, 136 insertions(+), 91 deletions(-)

applies-to: 4fd19922030196431179453910cb2b850b54c17d
c52baf62ec29772f7a4e5b0aa2b605bf0f6aa8f7
diff --git a/git-archimport.perl b/git-archimport.perl
index 938fa2b..3968740 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -53,14 +53,9 @@ and can contain multiple, unrelated bran
 use strict;
 use warnings;
 use Getopt::Std;
-use File::Spec;
-use File::Temp qw(tempfile tempdir);
+use File::Temp qw(tempdir);
 use File::Path qw(mkpath rmtree);
 use File::Basename qw(basename dirname);
-use Time::Local;
-use IO::Socket;
-use IO::Pipe;
-use POSIX qw(strftime dup2);
 use Data::Dumper qw/ Dumper /;
 use IPC::Open2;
 
@@ -71,27 +66,33 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $ptag_dir = "$git_dir/archimport/tags";
 
-our($opt_h,$opt_v, $opt_T,$opt_t,$opt_o);
+our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o);
 
 sub usage() {
     print STDERR <<END;
 Usage: ${\basename $0}     # fetch/update GIT from Arch
-       [ -o ] [ -h ] [ -v ] [ -T ] [ -t tempdir ] 
+       [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth  ] [ -t tempdir ]
        repository/arch-branch [ repository/arch-branch] ...
 END
     exit(1);
 }
 
-getopts("Thvt:") or usage();
+getopts("Thvat:D:") or usage();
 usage if $opt_h;
 
 @ARGV >= 1 or usage();
-my @arch_roots = @ARGV;
+# $arch_branches:
+# values associated with keys:
+#   =1 - Arch version / git 'branch' detected via abrowse on a limit
+#   >1 - Arch version / git 'branch' of an auxilliary branch we've merged
+my %arch_branches = map { $_ => 1 } @ARGV;
 
 $ENV{'TMPDIR'} = $opt_t if $opt_t; # $ENV{TMPDIR} will affect tempdir() calls:
 my $tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 $opt_v && print "+ Using $tmp as temporary directory\n";
 
+my %reachable = ();             # Arch repositories we can access
+my %unreachable = ();           # Arch repositories we can't access :<
 my @psets  = ();                # the collection
 my %psets  = ();                # the collection, by name
 
@@ -99,96 +100,112 @@ my %rptags = ();                # my rev
                                 # to map a SHA1 to a commitid
 my $TLA = $ENV{'ARCH_CLIENT'} || 'tla';
 
-foreach my $root (@arch_roots) {
-    my ($arepo, $abranch) = split(m!/!, $root);
-    open ABROWSE, "tla abrowse -f -A $arepo --desc --merges $abranch |" 
-        or die "Problems with tla abrowse: $!";
-    
-    my %ps        = ();         # the current one
-    my $mode      = '';
-    my $lastseen  = '';
-    
-    while (<ABROWSE>) {
-        chomp;
+sub do_abrowse {
+    my $stage = shift;
+    while (my ($limit, $level) = each %arch_branches) {
+        next unless $level == $stage;
         
-        # first record padded w 8 spaces
-        if (s/^\s{8}\b//) {
-            
-            # store the record we just captured
-            if (%ps) {
-                my %temp = %ps; # break references
-                push (@psets, \%temp);
-		$psets{$temp{id}} = \%temp;
-                %ps = ();
-            }
-            
-            my ($id, $type) = split(m/\s{3}/, $_);
-            $ps{id}   = $id;
-            $ps{repo} = $arepo;
-
-            # deal with types
-            if ($type =~ m/^\(simple changeset\)/) {
-                $ps{type} = 's';
-            } elsif ($type eq '(initial import)') {
-                $ps{type} = 'i';
-            } elsif ($type =~ m/^\(tag revision of (.+)\)/) {
-                $ps{type} = 't';
-                $ps{tag}  = $1;
-            } else { 
-                warn "Unknown type $type";
-            }
-            $lastseen = 'id';
-        }
-        
-        if (s/^\s{10}//) { 
-            # 10 leading spaces or more 
-            # indicate commit metadata
+	open ABROWSE, "$TLA abrowse -fkD --merges $limit |" 
+                                or die "Problems with tla abrowse: $!";
+    
+        my %ps        = ();         # the current one
+        my $lastseen  = '';
+    
+        while (<ABROWSE>) {
+            chomp;
             
-            # date & author 
-            if ($lastseen eq 'id' && m/^\d{4}-\d{2}-\d{2}/) {
+            # first record padded w 8 spaces
+            if (s/^\s{8}\b//) {
+                my ($id, $type) = split(m/\s+/, $_, 2);
+
+                my %last_ps;
+                # store the record we just captured
+                if (%ps && !exists $psets{ $ps{id} }) {
+                    %last_ps = %ps; # break references
+                    push (@psets, \%last_ps);
+                    $psets{ $last_ps{id} } = \%last_ps;
+                }
                 
-                my ($date, $authoremail) = split(m/\s{2,}/, $_);
-                $ps{date}   = $date;
-                $ps{date}   =~ s/\bGMT$//; # strip off trailign GMT
-                if ($ps{date} =~ m/\b\w+$/) {
-                    warn 'Arch dates not in GMT?! - imported dates will be wrong';
+                my $branch = extract_versionname($id);
+                %ps = ( id => $id, branch => $branch );
+                if (%last_ps && ($last_ps{branch} eq $branch)) {
+                    $ps{parent_id} = $last_ps{id};
+                }
+                
+                $arch_branches{$branch} = 1;
+                $lastseen = 'id';
+
+                # deal with types (should work with baz or tla):
+                if ($type =~ m/\(.*changeset\)/) {
+                    $ps{type} = 's';
+                } elsif ($type =~ /\(.*import\)/) {
+                    $ps{type} = 'i';
+                } elsif ($type =~ m/\(tag.*\)/) {
+                    $ps{type} = 't';
+                    # read which revision we've tagged when we parse the log
+                    #$ps{tag}  = $1;
+                } else { 
+                    warn "Unknown type $type";
+                }
+
+                $arch_branches{$branch} = 1;
+                $lastseen = 'id';
+            } elsif (s/^\s{10}//) { 
+                # 10 leading spaces or more 
+                # indicate commit metadata
+                
+                # date
+                if ($lastseen eq 'id' && m/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/){
+                    $ps{date}   = $1;
+                    $lastseen = 'date';
+                } elsif ($_ eq 'merges in:') {
+                    $ps{merges} = [];
+                    $lastseen = 'merges';
+                } elsif ($lastseen eq 'merges' && s/^\s{2}//) {
+                    my $id = $_;
+                    push (@{$ps{merges}}, $id);
+                   
+                    # aggressive branch finding:
+                    if ($opt_D) {
+                        my $branch = extract_versionname($id);
+                        my $repo = extract_reponame($branch);
+                        
+                        if (archive_reachable($repo) &&
+                                !defined $arch_branches{$branch}) {
+                            $arch_branches{$branch} = $stage + 1;
+                        }
+                    }
+                } else {
+                    warn "more metadata after merges!?: $_\n" unless /^\s*$/;
                 }
-            
-                $authoremail =~ m/^(.+)\s(\S+)$/;
-                $ps{author} = $1;
-                $ps{email}  = $2;
-            
-                $lastseen = 'date';
-            
-            } elsif ($lastseen eq 'date') {
-                # the only hint is position
-                # subject is after date
-                $ps{subj} = $_;
-                $lastseen = 'subj';
-            
-            } elsif ($lastseen eq 'subj' && $_ eq 'merges in:') {
-                $ps{merges} = [];
-                $lastseen = 'merges';
-            
-            } elsif ($lastseen eq 'merges' && s/^\s{2}//) {
-                push (@{$ps{merges}}, $_);
-            } else {
-                warn 'more metadata after merges!?';
             }
-            
         }
-    }
 
-    if (%ps) {
-        my %temp = %ps;         # break references
-        push (@psets, \%temp);  
-	$psets{ $temp{id} } = \%temp;
-        %ps = ();
-    }    
-    close ABROWSE;
+        if (%ps && !exists $psets{ $ps{id} }) {
+            my %temp = %ps;         # break references
+            if (@psets && $psets[$#psets]{branch} eq $ps{branch}) {
+                $temp{parent_id} = $psets[$#psets]{id};
+            }
+            push (@psets, \%temp);  
+            $psets{ $temp{id} } = \%temp;
+        }    
+        
+        close ABROWSE or die "$TLA abrowse failed on $limit\n";
+    }
 }                               # end foreach $root
 
+do_abrowse(1);
+my $depth = 2;
+$opt_D ||= 0;
+while ($depth <= $opt_D) {
+    do_abrowse($depth);
+    $depth++;
+}
+
 ## Order patches by time
+# FIXME see if we can find a more optimal way to do this by graphing
+# the ancestry data and walking it, that way we won't have to rely on
+# client-supplied dates
 @psets = sort {$a->{date}.$b->{id} cmp $b->{date}.$b->{id}} @psets;
 
 #print Dumper \@psets;
@@ -209,7 +226,7 @@ unless (-d $git_dir) { # initial import
     }
 } else {    # progressing an import
     # load the rptags
-    opendir(DIR, "$git_dir/archimport/tags")
+    opendir(DIR, $ptag_dir)
 	|| die "can't opendir: $!";
     while (my $file = readdir(DIR)) {
         # skip non-interesting-files
@@ -829,6 +846,7 @@ sub find_parents {
 	    }
 	}
     }
+
     @parents = ();
     foreach (keys %parents) {
         push @parents, '-p', ptag($_);
@@ -856,6 +874,7 @@ sub commitid2pset {
     return $ps;
 }
 
+
 # an alterative to `command` that allows input to be passed as an array
 # to work around shell problems with weird characters in arguments
 sub safe_pipe_capture {
@@ -869,4 +888,30 @@ sub safe_pipe_capture {
     return wantarray ? @output : join('',@output);
 }
 
+# `tla logs -rf -d <dir> | head -n1` or `baz tree-id <dir>`
+sub arch_tree_id {
+    my $dir = shift;
+    chomp( my $ret = (safe_pipe_capture($TLA,'logs','-rf','-d',$dir))[0] );
+    return $ret;
+}
+
+sub archive_reachable {
+    my $archive = shift;
+    return 1 if $reachable{$archive};
+    return 0 if $unreachable{$archive};
+    
+    if (system "$TLA whereis-archive $archive >/dev/null") {
+        if ($opt_a && (system($TLA,'register-archive',
+                      "http://mirrors.sourcecontrol.net/$archive") == 0)) {
+            $reachable{$archive} = 1;
+            return 1;
+        }
+        print STDERR "Archive is unreachable: $archive\n";
+        $unreachable{$archive} = 1;
+        return 0;
+    } else {
+        $reachable{$archive} = 1;
+        return 1;
+    }
+}
 
---
0.99.9.GIT
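
The FIXME above about ordering by ancestry instead of by client-supplied dates could be approached with a Kahn-style topological sort. What follows is only a sketch under assumptions: order_by_ancestry is a hypothetical name, not something in git-archimport, and it assumes each patchset hash carries the id, date, parent_id and merges fields that do_abrowse() fills in, with unique ids. Dates are used only to break ties between patchsets that are ready at the same time.

```perl
use strict;
use warnings;

# Hypothetical sketch: order patchsets so that each one comes after its
# parent and after every patchset it merges, as long as those are part of
# the import.  The date is only a tie-breaker.
sub order_by_ancestry {
    my @psets = @_;
    my %by_id = map { $_->{id} => $_ } @psets;
    my (%pending, %children);

    foreach my $ps (@psets) {
        my %seen;
        # dependencies we know about: the parent plus all merged ids
        my @deps = grep {
            defined($_) && exists $by_id{$_}
                && $_ ne $ps->{id} && !$seen{$_}++
        } ($ps->{parent_id}, @{ $ps->{merges} || [] });
        $pending{ $ps->{id} } = scalar @deps;
        push @{ $children{$_} }, $ps->{id} foreach @deps;
    }

    my @ready = grep { $pending{ $_->{id} } == 0 } @psets;
    my @ordered;
    while (@ready) {
        # oldest ready patchset first; the id keeps the order deterministic
        @ready = sort {
            $a->{date} cmp $b->{date} || $a->{id} cmp $b->{id}
        } @ready;
        my $ps = shift @ready;
        push @ordered, $ps;
        foreach my $child (@{ $children{ $ps->{id} } || [] }) {
            push @ready, $by_id{$child} if --$pending{$child} == 0;
        }
    }
    die "cycle in ancestry data\n" if @ordered != @psets;
    return @ordered;
}

# patch-2 carries a bogus early date, yet still sorts after its parent
my @in = (
    { id => 'a--patch-2', date => '2005-01-01 00:00:00',
      parent_id => 'a--patch-1' },
    { id => 'a--patch-1', date => '2005-06-01 00:00:00' },
    { id => 'b--base-0',  date => '2005-03-01 00:00:00',
      merges => [ 'a--patch-2' ] },
);
my @out = map { $_->{id} } order_by_ancestry(@in);
print "@out\n";
```

A plain date sort would put a--patch-2 first here; walking the graph keeps it behind its parent regardless of what the client clock said.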


* [PATCH 6/9] safer log file parsing
  2005-11-24  7:52             ` [PATCH 5/9] add -D <depth> and -a switch Eric Wong
@ 2005-11-24  7:53               ` Eric Wong
  2005-11-24  7:55                 ` [PATCH 7/9] Add the accurate changeset applyer Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:53 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

Better logfile parsing, no longer confused by 'headers' after the first
blank line.

Re-enabled tag-reading with abrowse (baz and tla compatible)

Remove need to quote args to external processes

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---
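
For reviewers, a minimal standalone sketch of the parsing approach described above: walk the log line by line, remember which header is currently open, treat indented lines as continuations of it, and stop at the first blank line so that header-like text in the body is left alone. The name parse_arch_log and the sample log are illustrative assumptions; this is not the code in the diff, which also handles Creator, Continuation-of and pika-unescaping.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of git-archimport: parse the header block
# of an Arch log passed in as a list of already-chomped lines.
sub parse_arch_log {
    my @log = @_;
    my %want = map { $_ => 1 }
        qw(new_files modified_files removed_files renamed_files);
    my (%ps, $key);
    $ps{summary} = [];

    while (defined(my $line = shift @log)) {
        last if $line eq '';              # first blank line ends the headers
        if ($line =~ /^Summary:\s*(.*)$/) {
            $ps{summary} = [ $1 ];
            $key = 'summary';
        } elsif ($line =~ /^([A-Z][A-Za-z-]+):\s*(.*)$/) {
            my ($name, $val) = ($1, $2);
            ($key = lc $name) =~ tr/-/_/; # New-files -> new_files
            if ($want{$key}) {
                push @{ $ps{$key} }, split(' ', $val);
            } else {
                undef $key;               # a header we do not track
            }
        } elsif (defined $key && $line =~ /^\s+(.*)$/) {
            # indented line: continuation of the header that is still open
            if ($key eq 'summary') {
                push @{ $ps{summary} }, $1;
            } else {
                push @{ $ps{$key} }, split(' ', $1);
            }
        } else {
            undef $key;
        }
    }
    $ps{summary} = join("\n", @{ $ps{summary} });
    $ps{message} = join("\n", @log);      # whatever is left is the body
    return \%ps;
}

my $ps = parse_arch_log(
    'Revision: demo--main--1.0--patch-1',
    'New-files: a.txt',
    '    b.txt',
    'Summary: first line',
    '  second line',
    'Keywords:',
    '',
    'Body text',
    'New-files: not-a-header',
);
print join(',', @{ $ps->{new_files} }), "\n";
```

The last input line looks like a header but sits after the blank line, so it ends up in the message body rather than in new_files.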

 git-archimport.perl |  211 +++++++++++++++++++++++++++------------------------
 1 files changed, 112 insertions(+), 99 deletions(-)

applies-to: 1633bcf09400e93aca2eb335181db298a5f49350
3e12af1d958e2d631e27a2f696ca71f83094c7c3
diff --git a/git-archimport.perl b/git-archimport.perl
index 3968740..8676f35 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -140,10 +140,10 @@ sub do_abrowse {
                     $ps{type} = 's';
                 } elsif ($type =~ /\(.*import\)/) {
                     $ps{type} = 'i';
-                } elsif ($type =~ m/\(tag.*\)/) {
+                } elsif ($type =~ m/\(tag.*?(\S+\@\S+).*?\)/) {
                     $ps{type} = 't';
                     # read which revision we've tagged when we parse the log
-                    #$ps{tag}  = $1;
+                    $ps{tag}  = $1;
                 } else { 
                     warn "Unknown type $type";
                 }
@@ -359,78 +359,73 @@ foreach my $ps (@psets) {
     #
     my $tree;
     
-    my $commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id}); 
+    my @commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id}); 
     die "Error in cat-archive-log: $!" if $?;
         
-    # parselog will git-add/rm files
-    # and generally prepare things for the commit
-    # NOTE: parselog will shell-quote filenames! 
-    my ($sum, $msg, $add, $del, $mod, $ren) = parselog($commitlog);
-    my $logmessage = "$sum\n$msg";
-
+    parselog($ps,\@commitlog);
 
     # imports don't give us good info
     # on added files. Shame on them
-    if ($ps->{type} eq 'i' || $ps->{type} eq 't') { 
-        `find . -type f -print0 | grep -zv '^./$git_dir' | xargs -0 -l100 git-update-index --add`;
-        `git-ls-files --deleted -z | xargs --no-run-if-empty -0 -l100 git-update-index --remove`;
+    if ($ps->{type} eq 'i' || $ps->{type} eq 't') {
+        system('git-ls-files --others -z | '.
+                'git-update-index --add -z --stdin') == 0 or die "$! $?\n";
+        system('git-ls-files --deleted -z | '.
+                'git-update-index --remove -z --stdin') == 0 or die "$! $?\n";
     }
 
-    if (@$add) {
+    # TODO: handle removed_directories and renamed_directories:
+   
+    if (my $add = $ps->{new_files}) {
         while (@$add) {
             my @slice = splice(@$add, 0, 100);
-            my $slice = join(' ', @slice);          
-            `git-update-index --add $slice`;
-            die "Error in git-update-index --add: $!" if $?;
+            system('git-update-index','--add','--',@slice) == 0 or
+                            die "Error in git-update-index --add: $! $?\n";
         }
     }
-    if (@$del) {
-        foreach my $file (@$del) {
-            unlink $file or die "Problems deleting $file : $!";
-        }
+   
+    if (my $del = $ps->{removed_files}) {
+        unlink @$del;
         while (@$del) {
             my @slice = splice(@$del, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index --remove $slice`;
-            die "Error in git-update-index --remove: $!" if $?;
+            system('git-update-index','--remove','--',@slice) == 0 or
+                            die "Error in git-update-index --remove: $! $?\n";
         }
     }
-    if (@$ren) {                # renamed
+
+    if (my $ren = $ps->{renamed_files}) {                # renamed
         if (@$ren % 2) {
             die "Odd number of entries in rename!?";
         }
-        ;
+        
         while (@$ren) {
-            my $from = pop @$ren;
-            my $to   = pop @$ren;           
+            my $from = shift @$ren;
+            my $to   = shift @$ren;           
 
             unless (-d dirname($to)) {
                 mkpath(dirname($to)); # will die on err
             }
-            #print "moving $from $to";
-            `mv $from $to`;
-            die "Error renaming $from $to : $!" if $?;
-            `git-update-index --remove $from`;
-            die "Error in git-update-index --remove: $!" if $?;
-            `git-update-index --add $to`;
-            die "Error in git-update-index --add: $!" if $?;
+            print "moving $from $to";
+            rename($from, $to) or die "Error renaming '$from' '$to': $!\n";
+            system('git-update-index','--remove','--',$from) == 0 or
+                            die "Error in git-update-index --remove: $! $?\n";
+            system('git-update-index','--add','--',$to) == 0 or
+                            die "Error in git-update-index --add: $! $?\n";
         }
 
     }
-    if (@$mod) {                # must be _after_ renames
+
+    if (my $mod = $ps->{modified_files}) {
         while (@$mod) {
             my @slice = splice(@$mod, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index $slice`;
-            die "Error in git-update-index: $!" if $?;
+            system('git-update-index','--',@slice) == 0 or
+                            die "Error in git-update-index: $! $?\n";
         }
     }
-
+    
     # warn "errors when running git-update-index! $!";
     $tree = `git-write-tree`;
     die "cannot write tree $!" if $?;
     chomp $tree;
-        
     
     #
     # Who's your daddy?
@@ -464,13 +459,14 @@ foreach my $ps (@psets) {
     $ENV{GIT_COMMITTER_EMAIL} = $ps->{email};
     $ENV{GIT_COMMITTER_DATE}  = $ps->{date};
 
-    my ($pid, $commit_rh, $commit_wh);
-    $commit_rh = 'commit_rh';
-    $commit_wh = 'commit_wh';
-    
-    $pid = open2(*READER, *WRITER,'git-commit-tree',$tree,@par) 
+    my $pid = open2(*READER, *WRITER,'git-commit-tree',$tree,@par) 
         or die $!;
-    print WRITER $logmessage;   # write
+    print WRITER $ps->{summary},"\n";
+    print WRITER $ps->{message},"\n";
+    
+    # make it easy to backtrack and figure out which Arch revision this was:
+    print WRITER 'git-archimport-id: ',$ps->{id},"\n";
+    
     close WRITER;
     my $commitid = <READER>;    # read
     chomp $commitid;
@@ -568,7 +564,9 @@ sub apply_cset {
 
 
 # =for reference
-# A log entry looks like 
+# notes: *-files/-directories keys cannot have spaces, they're always
+# pika-escaped.  Everything after the first blank line is the log message.
+# A log entry looks like:
 # Revision: moodle-org--moodle--1.3.3--patch-15
 # Archive: arch-eduforge@catalyst.net.nz--2004
 # Creator: Penny Leach <penny@catalyst.net.nz>
@@ -586,70 +584,85 @@ sub apply_cset {
 #     admin/editor.html backup/lib.php backup/restore.php
 # New-patches: arch-eduforge@catalyst.net.nz--2004/moodle-org--moodle--1.3.3--patch-15
 # Summary: Updating to latest from MOODLE_14_STABLE (1.4.5+)
+#   summary can be multiline with a leading space just like the above fields
 # Keywords:
 #
 # Updating yadda tadda tadda madda
 sub parselog {
-    my $log = shift;
-    #print $log;
-
-    my (@add, @del, @mod, @ren, @kw, $sum, $msg );
-
-    if ($log =~ m/(?:\n|^)New-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @add = split(m/\s+/s, $files);
-    }
-       
-    if ($log =~ m/(?:\n|^)Removed-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @del = split(m/\s+/s, $files);
-    }
-    
-    if ($log =~ m/(?:\n|^)Modified-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @mod = split(m/\s+/s, $files);
-    }
-    
-    if ($log =~ m/(?:\n|^)Renamed-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @ren = split(m/\s+/s, $files);
-    }
-
-    $sum ='';
-    if ($log =~ m/^Summary:(.+?)$/m ) {
-        $sum = $1;
-        $sum =~ s/^\s+//;
-        $sum =~ s/\s+$//;
-    }
+    my ($ps, $log) = @_;
+    my $key = undef;
 
-    $msg = '';
-    if ($log =~ m/\n\n(.+)$/s) {
-        $msg = $1;
-        $msg =~ s/^\s+//;
-        $msg =~ s/\s+$//;
+    # headers we want that contain filenames:
+    my %want_headers = (
+        new_files => 1,
+        modified_files => 1,
+        renamed_files => 1,
+        renamed_directories => 1,
+        removed_files => 1,
+        removed_directories => 1,
+    );
+    
+    chomp (@$log);
+    while ($_ = shift @$log) {
+        if (/^Continuation-of:\s*(.*)/) {
+            $ps->{tag} = $1;
+            $key = undef;
+        } elsif (/^Summary:\s*(.*)$/ ) {
+            # summary can be multiline as long as it has a leading space
+            $ps->{summary} = [ $1 ];
+            $key = 'summary';
+        } elsif (/^Creator: (.*)\s*<([^\>]+)>/) {
+            $ps->{author} = $1;
+            $ps->{email} = $2;
+            $key = undef;
+        # any *-files or *-directories can be read here:
+        } elsif (/^([A-Z][a-z\-]+):\s*(.*)$/) {
+            my $val = $2;
+            $key = lc $1;
+            $key =~ tr/-/_/; # too lazy to quote :P
+            if ($want_headers{$key}) {
+                push @{$ps->{$key}}, split(/\s+/, $val);
+            } else {
+                $key = undef;
+            }
+        } elsif (/^$/) {
+            last; # remainder of @$log that didn't get shifted off is message
+        } elsif ($key) {
+            if (/^\s+(.*)$/) {
+                if ($key eq 'summary') {
+                    push @{$ps->{$key}}, $1;
+                } else { # files/directories:
+                    push @{$ps->{$key}}, split(/\s+/, $1);
+                }
+            } else {
+                $key = undef;
+            }
+        }
     }
-
-
-    # cleanup the arrays
-    foreach my $ref ( (\@add, \@del, \@mod, \@ren) ) {
-        my @tmp = ();
-        while (my $t = pop @$ref) {
-            next unless length ($t);
-            next if $t =~ m!\{arch\}/!;
-            next if $t =~ m!\.arch-ids/!;
-            next if $t =~ m!\.arch-inventory$!;
+   
+    # post-processing:
+    $ps->{summary} = join("\n",@{$ps->{summary}})."\n";
+    $ps->{message} = join("\n",@$log);
+    
+    # skip Arch control files, unescape pika-escaped files
+    foreach my $k (keys %want_headers) {
+        next unless (defined $ps->{$k});
+        my @tmp;
+        foreach my $t (@{$ps->{$k}}) {
+           next unless length ($t);
+           next if $t =~ m!\{arch\}/!;
+           next if $t =~ m!\.arch-ids/!;
+           # should we skip this?
+           next if $t =~ m!\.arch-inventory$!;
            # tla cat-archive-log will give us filenames with spaces as file\(sp)name - why?
            # we can assume that any filename with \ indicates some pika escaping that we want to get rid of.
-           if  ($t =~ /\\/ ){
+           if ($t =~ /\\/ ){
                $t = (safe_pipe_capture($TLA,'escape','--unescaped',$t))[0];
            }
-            push (@tmp, $t);
+           push @tmp, $t;
         }
-        @$ref = @tmp;
+        $ps->{$k} = \@tmp if scalar @tmp;
     }
-    
-    #print Dumper [$sum, $msg, \@add, \@del, \@mod, \@ren]; 
-    return       ($sum, $msg, \@add, \@del, \@mod, \@ren); 
 }
 
 # write/read a tag
---
0.99.9.GIT


* [PATCH 7/9] Add the accurate changeset applyer
  2005-11-24  7:53               ` [PATCH 6/9] safer log file parsing Eric Wong
@ 2005-11-24  7:55                 ` Eric Wong
  2005-11-24  7:56                   ` [PATCH 8/9] Fix a bug I introduced in the new log parser Eric Wong
                                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:55 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

And make it the default.
This also adds per-strategy stats tracking, printed in verbose mode.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---
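
To make the branching in sync_to_ps() easier to review, here is the decision on its own as a small pure function. choose_sync_method is a hypothetical name used only for this sketch; the strings it returns match the %stats keys added by the patch. Note that in the patch the 'get_delta' case removes the tree and does a fresh get, it does not apply a delta.

```perl
use strict;
use warnings;

# Hypothetical pure function mirroring the branching in sync_to_ps():
# decide how to bring the temporary Arch tree to patchset $ps.
#   $tree_exists - true if a tree for this version is already checked out
#   $tree_id     - revision that tree is currently at (undef if unknown)
sub choose_sync_method {
    my ($ps, $tree_exists, $tree_id) = @_;

    # no tree yet: the only option is a fresh get
    return 'get_new' unless $tree_exists;

    # tags cannot be replayed reliably, so fetch the tree again
    return 'get_tag' if $ps->{type} eq 't';

    # common case: the tree sits exactly at our parent, so replay one patch
    return 'replay'
        if defined $ps->{parent_id} && defined $tree_id
            && $ps->{parent_id} eq $tree_id;

    # the tree is somewhere else: throw it away and get the wanted revision
    return 'get_delta';
}

my $ps = { id => 'demo--main--1.0--patch-2', type => 's',
           parent_id => 'demo--main--1.0--patch-1' };
print choose_sync_method($ps, 1, 'demo--main--1.0--patch-1'), "\n";
```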

 git-archimport.perl |  201 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 172 insertions(+), 29 deletions(-)

applies-to: aa9140057c95e59f65de0794f9054796fbfc96e5
32e5887eedb01ac4c398a06b0a1433ff6f4599fe
diff --git a/git-archimport.perl b/git-archimport.perl
index 8676f35..1cf1261 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -25,6 +25,9 @@ See man (1) git-archimport for more deta
  - audit shell-escaping of filenames
  - hide our private tags somewhere smarter
  - find a way to make "cat *patches | patch" safe even when patchfiles are missing newlines  
+ - sort and apply patches by graphing ancestry relations instead of just
+   relying on dates supplied in the changeset itself.
+   tla ancestry-graph -m could be helpful here...
 
 =head1 Devel tricks
 
@@ -66,18 +69,18 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $ptag_dir = "$git_dir/archimport/tags";
 
-our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o);
+our($opt_h,$opt_f,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o);
 
 sub usage() {
     print STDERR <<END;
 Usage: ${\basename $0}     # fetch/update GIT from Arch
-       [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth  ] [ -t tempdir ]
+       [ -f ] [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth  ] [ -t tempdir ]
        repository/arch-branch [ repository/arch-branch] ...
 END
     exit(1);
 }
 
-getopts("Thvat:D:") or usage();
+getopts("fThvat:D:") or usage();
 usage if $opt_h;
 
 @ARGV >= 1 or usage();
@@ -95,6 +98,10 @@ my %reachable = ();             # Arch r
 my %unreachable = ();           # Arch repositories we can't access :<
 my @psets  = ();                # the collection
 my %psets  = ();                # the collection, by name
+my %stats  = (			# Track which strategy we used to import:
+	get_tag => 0, replay => 0, get_new => 0, get_delta => 0,
+        simple_changeset => 0, import_or_tag => 0
+);
 
 my %rptags = ();                # my reverse private tags
                                 # to map a SHA1 to a commitid
@@ -288,29 +295,69 @@ sub old_style_branchname {
 
 *git_branchname = $opt_o ? *old_style_branchname : *tree_dirname;
 
-# process patchsets
-foreach my $ps (@psets) {
-    $ps->{branch} = git_branchname($ps->{id});
-
-    #
-    # ensure we have a clean state 
-    # 
-    if (`git-diff-files`) {
-        die "Unclean tree when about to process $ps->{id} " .
-            " - did we fail to commit cleanly before?";
-    }
-    die $! if $?;
-
-    #
-    # skip commits already in repo
-    #
-    if (ptag($ps->{id})) {
-      $opt_v && print " * Skipping already imported: $ps->{id}\n";
-      next;
+sub process_patchset_accurate {
+    my $ps = shift;
+    
+    # switch to that branch if we're not already in that branch:
+    if (-e "$git_dir/refs/heads/$ps->{branch}") {
+       system('git-checkout','-f',$ps->{branch}) == 0 or die "$! $?\n";
+
+       # remove any old stuff that got leftover:
+       my $rm = safe_pipe_capture('git-ls-files','--others','-z');
+       rmtree(split(/\0/,$rm)) if $rm;
     }
+    
+    # Apply the import/changeset/merge into the working tree
+    my $dir = sync_to_ps($ps);
+    # read the new log entry:
+    my @commitlog = safe_pipe_capture($TLA,'cat-log','-d',$dir,$ps->{id});
+    die "Error in cat-log: $!" if $?;
+    chomp @commitlog;
+
+    # grab variables we want from the log, new fields get added to $ps:
+    # (author, date, email, summary, message body ...)
+    parselog($ps, \@commitlog);
+
+    if ($ps->{id} =~ /--base-0$/ && $ps->{id} ne $psets[0]{id}) {
+        # this should work when importing continuations 
+        if ($ps->{tag} && (my $branchpoint = eval { ptag($ps->{tag}) })) {
+            
+            # find where we are supposed to branch from
+            system('git-checkout','-f','-b',$ps->{branch},
+                            $branchpoint) == 0 or die "$! $?\n";
+            
+            # remove any old stuff that got leftover:
+            my $rm = safe_pipe_capture('git-ls-files','--others','-z');
+            rmtree(split(/\0/,$rm)) if $rm;
 
-    print " * Starting to work on $ps->{id}\n";
+            # If we trust Arch with the fact that this is just 
+            # a tag, and it does not affect the state of the tree
+            # then we just tag and move on
+            tag($ps->{id}, $branchpoint);
+            ptag($ps->{id}, $branchpoint);
+            print " * Tagged $ps->{id} at $branchpoint\n";
+            return 0;
+        } else {
+            warn "Tagging from unknown id unsupported\n" if $ps->{tag};
+        }
+        # allow multiple bases/imports here since Arch supports cherry-picks
+        # from unrelated trees
+    } 
+    
+    # update the index with all the changes we got
+    system('git-ls-files --others -z | '.
+            'git-update-index --add -z --stdin') == 0 or die "$! $?\n";
+    system('git-ls-files --deleted -z | '.
+            'git-update-index --remove -z --stdin') == 0 or die "$! $?\n";
+    system('git-ls-files -z | '.
+             'git-update-index -z --stdin') == 0 or die "$! $?\n";
+    return 1;
+}
 
+# the native changeset processing strategy.  This is very fast, but
+# does not handle permissions or any renames involving directories
+sub process_patchset_fast {
+    my $ps = shift;
     # 
     # create the branch if needed
     #
@@ -338,7 +385,7 @@ foreach my $ps (@psets) {
             tag($ps->{id}, $branchpoint);
             ptag($ps->{id}, $branchpoint);
             print " * Tagged $ps->{id} at $branchpoint\n";
-            next;
+            return 0;
         } 
         die $! if $?;
     } 
@@ -348,16 +395,17 @@ foreach my $ps (@psets) {
     # 
     if ($ps->{type} eq 'i' || $ps->{type} eq 't') {
         apply_import($ps) or die $!;
+        $stats{import_or_tag}++;
         $import=0;
     } elsif ($ps->{type} eq 's') {
         apply_cset($ps);
+        $stats{simple_changeset}++;
     }
 
     #
     # prepare update git's index, based on what arch knows
     # about the pset, resolve parents, etc
     #
-    my $tree;
     
     my @commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id}); 
     die "Error in cat-archive-log: $!" if $?;
@@ -404,14 +452,13 @@ foreach my $ps (@psets) {
             unless (-d dirname($to)) {
                 mkpath(dirname($to)); # will die on err
             }
-            print "moving $from $to";
+            # print "moving $from $to";
             rename($from, $to) or die "Error renaming '$from' '$to': $!\n";
             system('git-update-index','--remove','--',$from) == 0 or
                             die "Error in git-update-index --remove: $! $?\n";
             system('git-update-index','--add','--',$to) == 0 or
                             die "Error in git-update-index --add: $! $?\n";
         }
-
     }
 
     if (my $mod = $ps->{modified_files}) {
@@ -421,9 +468,46 @@ foreach my $ps (@psets) {
                             die "Error in git-update-index: $! $?\n";
         }
     }
+    return 1; # we successfully applied the changeset
+}
+
+if ($opt_f) {
+    print "Will import patchsets using the fast strategy\n",
+            "Renamed directories and permission changes will be missed\n";
+    *process_patchset = *process_patchset_fast;
+} else {
+    print "Using the default (accurate) import strategy.\n",
+            "Things may be a bit slow\n";
+    *process_patchset = *process_patchset_accurate;
+}
+    
+foreach my $ps (@psets) {
+    # process patchsets
+    $ps->{branch} = git_branchname($ps->{id});
+
+    #
+    # ensure we have a clean state 
+    # 
+    if (my $dirty = `git-diff-files`) {
+        die "Unclean tree when about to process $ps->{id} " .
+            " - did we fail to commit cleanly before?\n$dirty";
+    }
+    die $! if $?;
     
+    #
+    # skip commits already in repo
+    #
+    if (ptag($ps->{id})) {
+      $opt_v && print " * Skipping already imported: $ps->{id}\n";
+      return 0;
+    }
+
+    print " * Starting to work on $ps->{id}\n";
+
+    process_patchset($ps) or next;
+
     # warn "errors when running git-update-index! $!";
-    $tree = `git-write-tree`;
+    my $tree = `git-write-tree`;
     die "cannot write tree $!" if $?;
     chomp $tree;
     
@@ -494,6 +578,65 @@ foreach my $ps (@psets) {
     print "   + commit $commitid\n";
     $opt_v && print "   + commit date is  $ps->{date} \n";
     $opt_v && print "   + parents:  ",join(' ',@par),"\n";
+    if (my $dirty = `git-diff-files`) {
+        die "22 Unclean tree when about to process $ps->{id} " .
+            " - did we fail to commit cleanly before?\n$dirty";
+    }
+}
+
+if ($opt_v) {
+    foreach (sort keys %stats) {
+        print" $_: $stats{$_}\n";
+    }
+}
+exit 0;
+
+# used by the accurate strategy:
+sub sync_to_ps {
+    my $ps = shift;
+    my $tree_dir = $tmp.'/'.tree_dirname($ps->{id});
+    
+    $opt_v && print "sync_to_ps($ps->{id}) method: ";
+
+    if (-d $tree_dir) {
+        if ($ps->{type} eq 't') {
+	    $opt_v && print "get (tag)\n";
+            # looks like a tag-only or (worse,) a mixed tags/changeset branch,
+            # can't rely on replay to work correctly on these
+            rmtree($tree_dir);
+            safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
+            $stats{get_tag}++;
+        } else {
+                my $tree_id = arch_tree_id($tree_dir);
+                if ($ps->{parent_id} && ($ps->{parent_id} eq $tree_id)) {
+                    # the common case (hopefully)
+		    $opt_v && print "replay\n";
+                    safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id});
+                    $stats{replay}++;
+                } else {
+                    # getting one tree is usually faster than getting two trees
+                    # and applying the delta ...
+                    rmtree($tree_dir);
+		    $opt_v && print "apply-delta\n";
+                    safe_pipe_capture($TLA,'get','--no-pristine',
+                                        $ps->{id},$tree_dir);
+                    $stats{get_delta}++;
+                }
+        }
+    } else {
+        # new branch work
+        $opt_v && print "get (new tree)\n";
+        safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir);
+        $stats{get_new}++;
+    }
+   
+    # added -I flag to rsync since we're going too fast! AIEEEEE!!!!
+    system('rsync','-aI','--delete','--exclude',$git_dir,
+#               '--exclude','.arch-inventory',
+                '--exclude','.arch-ids','--exclude','{arch}',
+                '--exclude','+*','--exclude',',*',
+                "$tree_dir/",'./') == 0 or die "Cannot rsync $tree_dir: $! $?";
+    return $tree_dir;
 }
 
 sub apply_import {
@@ -896,7 +1039,7 @@ sub safe_pipe_capture {
         @output = (<$child>);
         close $child or die join(' ',@_).": $! $?";
     } else {
-	exec(@_) or die $?; # exec() can fail the executable can't be found
+	exec(@_) or die "$! $?"; # exec() can fail the executable can't be found
     }
     return wantarray ? @output : join('',@output);
 }
---
0.99.9.GIT


* [PATCH 8/9] Fix a bug I introduced in the new log parser
  2005-11-24  7:55                 ` [PATCH 7/9] Add the accurate changeset applyer Eric Wong
@ 2005-11-24  7:56                   ` Eric Wong
  2005-11-24  7:58                     ` [PATCH 9/9] fix a bug in new changeset applyer addition Eric Wong
  2005-11-27  4:24                   ` [PATCH 7/9] Add the accurate changeset applyer Martin Langhoff
  2005-12-01 17:02                   ` Martin Langhoff
  2 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:56 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

This fixes the case (that worked originally in Martin's version)
where the only new/modified files are Arch control files.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---
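
A small reproduction of the failure mode, as I understand it from the diff: when every entry in a header is an Arch control file, the filtered list is empty, and the old conditional assignment left the unfiltered list in place. strip_control_files is a hypothetical stand-in for the loop at the end of parselog(), and the file names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filtering loop at the end of parselog():
# drop Arch control files from a list of paths.
sub strip_control_files {
    my @paths = @_;
    return [ grep {
        length($_)
            && $_ !~ m!\{arch\}/!
            && $_ !~ m!\.arch-ids/!
            && $_ !~ m!\.arch-inventory$!
    } @paths ];
}

my %ps = (new_files => [ '{arch}/=tagging-method', 'src/.arch-ids/a.c.id' ]);
my $kept = strip_control_files(@{ $ps{new_files} });

# Before the fix: the assignment is skipped when nothing survives, so the
# unfiltered control files stay in the patchset.
my %before = %ps;
$before{new_files} = $kept if @$kept;

# After the fix: always assign, even when the filtered list is empty.
my %after = %ps;
$after{new_files} = $kept;

print scalar @{ $before{new_files} }, " vs ",
      scalar @{ $after{new_files} }, "\n";   # prints "2 vs 0"
```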

 git-archimport.perl |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

applies-to: db1362fe6567e349ff9dd9d70ce23c88a42a0ff2
ebe0689722f6c1440e680ec9a235b3dd571c7de0
diff --git a/git-archimport.perl b/git-archimport.perl
index 1cf1261..0080850 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -790,7 +790,7 @@ sub parselog {
     # skip Arch control files, unescape pika-escaped files
     foreach my $k (keys %want_headers) {
         next unless (defined $ps->{$k});
-        my @tmp;
+        my @tmp = ();
         foreach my $t (@{$ps->{$k}}) {
            next unless length ($t);
            next if $t =~ m!\{arch\}/!;
@@ -804,7 +804,7 @@ sub parselog {
            }
            push @tmp, $t;
         }
-        $ps->{$k} = \@tmp if scalar @tmp;
+        $ps->{$k} = \@tmp;
     }
 }
 
---
0.99.9.GIT


* [PATCH 9/9] fix a bug in new changeset applyer addition
  2005-11-24  7:56                   ` [PATCH 8/9] Fix a bug I introduced in the new log parser Eric Wong
@ 2005-11-24  7:58                     ` Eric Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24  7:58 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

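Fix a stupid bug I introduced when splitting the accurate and fast
changeset appliers: the main loop used 'return 0' to skip an
already-imported patchset, where it needs 'next'.

Also, remove an old debugging check (a second git-diff-files test) I added.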

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---
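
For anyone skimming, the loop-control point reduced to a toy example. The %imported hash stands in for the ptag() lookup and the ids are made up; nothing here is taken from git-archimport itself.

```perl
use strict;
use warnings;

# Hypothetical reduction of the main import loop: %imported stands in for
# the ptag() lookup, and @done collects what would actually be processed.
my %imported = ('demo--main--1.0--patch-1' => 1);
my @ids = map { "demo--main--1.0--patch-$_" } 1 .. 3;
my @done;

foreach my $id (@ids) {
    # "next" moves on to the following patchset.  "return" is only valid
    # inside a subroutine; at file level Perl dies with
    # "Can't return outside a subroutine".
    next if $imported{$id};
    push @done, $id;
}

print "processed: @done\n";
```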

 git-archimport.perl |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

applies-to: 6dfed0cb7c209cf47902d6dfcd02a974d252041b
b081cb1e0f79f1a290bcf1f2161d63415ec5e2a9
diff --git a/git-archimport.perl b/git-archimport.perl
index 0080850..aab4e38 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -499,7 +499,7 @@ foreach my $ps (@psets) {
     #
     if (ptag($ps->{id})) {
       $opt_v && print " * Skipping already imported: $ps->{id}\n";
-      return 0;
+      next;
     }
 
     print " * Starting to work on $ps->{id}\n";
@@ -578,10 +578,6 @@ foreach my $ps (@psets) {
     print "   + commit $commitid\n";
     $opt_v && print "   + commit date is  $ps->{date} \n";
     $opt_v && print "   + parents:  ",join(' ',@par),"\n";
-    if (my $dirty = `git-diff-files`) {
-        die "22 Unclean tree when about to process $ps->{id} " .
-            " - did we fail to commit cleanly before?\n$dirty";
-    }
 }
 
 if ($opt_v) {
---
0.99.9.GIT


* Re: [PATCH 4/9] remove git wrapper dependency
  2005-11-24  7:51           ` [PATCH 4/9] remove git wrapper dependency Eric Wong
  2005-11-24  7:52             ` [PATCH 5/9] add -D <depth> and -a switch Eric Wong
@ 2005-11-24  8:20             ` Andreas Ericsson
  2005-11-24  8:35               ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Andreas Ericsson @ 2005-11-24  8:20 UTC (permalink / raw
  To: git list

Eric Wong wrote:
> use git-diff-files instead of git diff-files so we don't rely on the
> wrapper being installed (some people may have git as GNU interactive
> tools :)
> 

This one should do
	git --exec-path

first to get the proper path to git-diff-files. Fall back to it being in 
the path if finding out fails.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [PATCH 4/9] remove git wrapper dependency
  2005-11-24  8:20             ` [PATCH 4/9] remove git wrapper dependency Andreas Ericsson
@ 2005-11-24  8:35               ` Junio C Hamano
  2005-11-24  8:50                 ` Eric Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2005-11-24  8:35 UTC (permalink / raw
  To: git

Andreas Ericsson <ae@op5.se> writes:

> Eric Wong wrote:
>> use git-diff-files instead of git diff-files so we don't rely on the
>> wrapper being installed (some people may have git as GNU interactive
>> tools :)
>>
>
> This one should do
> 	git --exec-path
>
> first to get the proper path to git-diff-files. Fall back to it being in 
> the path if finding out fails.

Eric is worried about the case where git on your PATH is GNU
interactive tools, so "git --exec-path" would not give you what
you want ;-).
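
For reference, Andreas's lookup-with-fallback could be sketched roughly
as below. This is only an illustration under stated assumptions:
`helper_command` and `probe_git` are hypothetical names, and the check
that the reported directory really contains the helper is an added guard
for the case described above, not something from the patch series.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the command to run for a git-<name> helper.
# $probe is a code ref returning the output of "git --exec-path", or
# undef on failure; it is a parameter so the logic can be tried without git.
sub helper_command {
    my ($name, $probe) = @_;
    my $dir = $probe->();
    if (defined $dir) {
        chomp $dir;
        my $path = File::Spec->catfile($dir, "git-$name");
        # Only trust the answer if the helper is really there; a "git"
        # that is some other program should not pass this check.
        return $path if length($dir) && -x $path;
    }
    return "git-$name";    # fall back to a plain PATH lookup
}

# Default probe: ask whatever "git" is first on PATH.
sub probe_git {
    my $out = `git --exec-path 2>/dev/null`;
    return $? == 0 ? $out : undef;
}

print helper_command('diff-files', \&probe_git), "\n";
```

If the probe fails or points somewhere without the helper, the result is
the bare `git-diff-files` name, which is what the patch under discussion
uses unconditionally.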


* Re: [PATCH 4/9] remove git wrapper dependency
  2005-11-24  8:35               ` Junio C Hamano
@ 2005-11-24  8:50                 ` Eric Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24  8:50 UTC (permalink / raw
  To: git

Junio C Hamano <junkio@cox.net> wrote:
> Andreas Ericsson <ae@op5.se> writes:
> 
> > Eric Wong wrote:
> >> use git-diff-files instead of git diff-files so we don't rely on the
> >> wrapper being installed (some people may have git as GNU interactive
> >> tools :)
> >>
> >
> > This one should do
> > 	git --exec-path
> >
> > first to get the proper path to git-diff-files. Fall back to it being in 
> > the path if finding out fails.
> 
> Eric is worried about the case where git on your PATH is GNU
> interactive tools, so "git --exec-path" would not give you what
> you want ;-).

Right on.  I'm actually not a GNU interactive tools user, but I do have
empathy for them, having been a cg (cgvg) user myself for many, many years.

-- 
Eric Wong


* Re: [PATCH] archimport improvements
  2005-11-24  7:46   ` Eric Wong
  2005-11-24  7:47     ` [PATCH 1/9] archimport: first, make sure it still compiles Eric Wong
@ 2005-11-24  9:25     ` Martin Langhoff
  1 sibling, 0 replies; 39+ messages in thread
From: Martin Langhoff @ 2005-11-24  9:25 UTC (permalink / raw
  To: Eric Wong; +Cc: git list

On 11/24/05, Eric Wong <normalperson@yhbt.net> wrote:
> Ok, I didn't expect you guys to have 12k of files in your trees.  None
> of my source trees are remotely close to that size (but I have many
> more changesets).  I'm surprised you guys were able to put up
> with Arch in the first place!
>
> 125m58.431s with my method.
>   8m24.504s with yours :)
>
> All of my usual source trees imported 1k changesets in 10-15 minutes

:-) I'm happy that you managed to wait patiently for it to complete --
all my attempts to run your import code were ended by a sleepy ctrl-c.

> Patches on the way.

Cool -- will review, but may take a couple days, as I'm away from home
this week.

> OTOH, the time spent importing the bulk of the history is a one-time
> operation for most people and I'd much rather it get things as right as
> possible and move on.

Hmmm. Some teams -- such as mine -- just run it every couple hours to
maintain an Arch2cvs gateway.

More later,


martin


* Re: [PATCH 1/9] archimport: first, make sure it still compiles
  2005-11-24  7:47     ` [PATCH 1/9] archimport: first, make sure it still compiles Eric Wong
  2005-11-24  7:48       ` [PATCH 2/9] remove String::ShellQuote dependency Eric Wong
@ 2005-11-24 18:54       ` Linus Torvalds
  2005-11-26 10:51         ` Martin Langhoff
  2005-11-26 20:43         ` Eric Wong
  1 sibling, 2 replies; 39+ messages in thread
From: Linus Torvalds @ 2005-11-24 18:54 UTC (permalink / raw
  To: Eric Wong; +Cc: Martin Langhoff, git list, Martin Langhoff



Eric,
 I don't know about Junio, but if I were him, I'd have preferred that all 
your patches had a

	archimport: ..

prefix in the subject line, not just the first one.

For example, if you just merge the patches as-is now, and then look at the 
end result with gitk (or any of the tools that show the shortlog format: 
just the first line of the commit), you get explanations like

	fix -t tmpdir switch

which is clearly _correct_, but it's much nicer if they show which area 
was implied, ie

	archimport: fix -t tmpdir switch

so that you can tell from the shortlog whether it was a "global" change, 
or something that affected a specific program.

Just a suggestion,

		Linus
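
A small sketch of the convention, for anyone fixing up subjects by hand
before merging. `prefix_subject` is a hypothetical helper, not part of
git; it keeps a leading "[PATCH n/m]" tag in place and leaves subjects
that already carry the area prefix untouched:

```perl
use strict;
use warnings;

# Hypothetical helper: put "<area>: " in front of a subject line unless
# it is already there, keeping any leading "[...]" tag where it was.
sub prefix_subject {
    my ($subject, $area) = @_;
    # Split off an optional leading "[PATCH ...] " tag.
    my ($tag, $rest) = $subject =~ /^(\[[^\]]*\]\s*)?(.*)$/s;
    $tag = '' unless defined $tag;
    return $subject if $rest =~ /^\Q$area\E:/;
    return "$tag$area: $rest";
}

print prefix_subject('[PATCH 3/9] fix -t tmpdir switch', 'archimport'), "\n";
```

This prints "[PATCH 3/9] archimport: fix -t tmpdir switch", which is the
shortlog-friendly form suggested above.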


* Re: [PATCH 1/9] archimport: first, make sure it still compiles
  2005-11-24 18:54       ` [PATCH 1/9] archimport: first, make sure it still compiles Linus Torvalds
@ 2005-11-26 10:51         ` Martin Langhoff
  2005-11-26 20:43         ` Eric Wong
  1 sibling, 0 replies; 39+ messages in thread
From: Martin Langhoff @ 2005-11-26 10:51 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Eric Wong, git list, Martin Langhoff

On 11/24/05, Linus Torvalds <torvalds@osdl.org> wrote:
>  I don't know about Junio, but if I were him, I'd have preferred that all
> your patches had a
>
>         archimport: ..
>
> prefix in the subject line, not just the first one.

Good catch -- I'll prefix them all as I merge them. If Junio pulls
from my tree, he'll get them prefixed.

cheers,



martin


* Re: [PATCH 1/9] archimport: first, make sure it still compiles
  2005-11-24 18:54       ` [PATCH 1/9] archimport: first, make sure it still compiles Linus Torvalds
  2005-11-26 10:51         ` Martin Langhoff
@ 2005-11-26 20:43         ` Eric Wong
  1 sibling, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-26 20:43 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Martin Langhoff, git list, Martin Langhoff

Linus Torvalds <torvalds@osdl.org> wrote:
> 
> 
> Eric,
>  I don't know about Junio, but if I were him, I'd have preferred that all 
> your patches had a
> 
> 	archimport: ..
> 
> prefix in the subject line, not just the first one.

Good idea, will do for future patches.

-- 
Eric Wong


* Re: [PATCH 7/9] Add the accurate changeset applyer
  2005-11-24  7:55                 ` [PATCH 7/9] Add the accurate changeset applyer Eric Wong
  2005-11-24  7:56                   ` [PATCH 8/9] Fix a bug I introduced in the new log parser Eric Wong
@ 2005-11-27  4:24                   ` Martin Langhoff
  2005-11-27  5:43                     ` Eric Wong
  2005-12-01 17:02                   ` Martin Langhoff
  2 siblings, 1 reply; 39+ messages in thread
From: Martin Langhoff @ 2005-11-27  4:24 UTC (permalink / raw
  To: Eric Wong; +Cc: git list, Martin Langhoff

On 11/24/05, Eric Wong <normalperson@yhbt.net> wrote:
> And make it the default.

Cheeky, but right ;-)

Would it be a good idea to read the log entry and decide what kind of
smarts we need to apply the changeset? If the log entry looks
plain, use process_patchset_fast(), else invoke $TLA?

cheers,


martin


* Re: [PATCH 7/9] Add the accurate changeset applyer
  2005-11-27  4:24                   ` [PATCH 7/9] Add the accurate changeset applyer Martin Langhoff
@ 2005-11-27  5:43                     ` Eric Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-27  5:43 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 11/24/05, Eric Wong <normalperson@yhbt.net> wrote:
> > And make it the default.
> 
> Cheeky, but right ;-)
> 
> Would it be a good idea to read the log entry and decide what kind of
> smarts we need to apply the changeset? If the log entry looks
> plain, use process_patchset_fast(), else invoke $TLA?

This could work.  For it to work efficiently, process_patchset_fast()
should probably be modified to work on real Arch trees and rsync with
the git one.  Basically, we can replace the bulk of the tla replay calls
with your fast changeset applier.   Once the fast mode hits a changeset
it can't handle, it can do a tla replay on a single changeset instead of
having to do a slow get/apply-delta on an out-of-date tree.

process_patchset_fast() must understand how to handle permissions
changes, though, as Arch log entries are completely useless for that.

Unfortunately, doing this right and fast probably still requires more
time than it's worth.  Let's face it, trees with 12k files are extremely
rare in the Arch world (as are trees constantly reorganized by
obsessive-compulsives :), but many trees do get a small handful of
directory renames in their lifetime.

-- 
Eric Wong
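
The per-changeset choice discussed above might look roughly like the
sketch below. Everything here is an assumption made for illustration:
`pick_applier` is a hypothetical name, and the hash keys are invented
stand-ins for whatever the log parser actually stores in `$ps`. Only
directory renames are treated as a reason to take the slow path, since
that is the case this thread reports the fast applier getting wrong.

```perl
use strict;
use warnings;

# Hypothetical log-entry fields that the fast applier is assumed not to
# handle; the names are illustrative, not taken from git-archimport.
my @NEEDS_ACCURATE = qw(renamed_directories);

# Hypothetical dispatcher: given a parsed log entry (a hash ref whose
# values are array refs), say which applier should handle it.
sub pick_applier {
    my ($ps) = @_;
    foreach my $k (@NEEDS_ACCURATE) {
        # A missing key and an empty list both count as "nothing to do".
        return 'accurate' if @{ $ps->{$k} || [] };
    }
    return 'fast';
}

my %plain  = (modified_files => ['README']);
my %rename = (renamed_directories => [['src', 'lib']]);
print pick_applier(\%plain), ' ', pick_applier(\%rename), "\n";
```

As noted in the message above, permission changes do not show up in the
log entry at all, so a dispatcher like this cannot route them to the
accurate path.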


* Re: [PATCH 7/9] Add the accurate changeset applyer
  2005-11-24  7:55                 ` [PATCH 7/9] Add the accurate changeset applyer Eric Wong
  2005-11-24  7:56                   ` [PATCH 8/9] Fix a bug I introduced in the new log parser Eric Wong
  2005-11-27  4:24                   ` [PATCH 7/9] Add the accurate changeset applyer Martin Langhoff
@ 2005-12-01 17:02                   ` Martin Langhoff
  2005-12-03  2:51                     ` Eric Wong
  2 siblings, 1 reply; 39+ messages in thread
From: Martin Langhoff @ 2005-12-01 17:02 UTC (permalink / raw
  To: Eric Wong; +Cc: git list, Martin Langhoff

Eric,

My test results are a bit of a mixed bag. On one hand, I'm satisfied
that both fast and correct imports reach the same tree (minus file
modes) for the same commit with the arch repos I imported.

On the other hand, with my "moodle" repo, the 'correct' import seems
to have stopped importing a lot earlier than it should have. I am
re-running it now to try to continue from where it left off, but it's
unclear why it abandoned -- I didn't see any error. How widely have
you tested this method?

cheers,


martin


* Re: [PATCH 7/9] Add the accurate changeset applyer
  2005-12-01 17:02                   ` Martin Langhoff
@ 2005-12-03  2:51                     ` Eric Wong
  2005-12-05 18:53                       ` Martin Langhoff
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-12-03  2:51 UTC (permalink / raw
  To: Martin Langhoff; +Cc: git list, Martin Langhoff

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
> 
> My test results are a bit of a mixed bag. On one hand, I'm satisfied
> that both fast and correct imports reach the same tree (minus file
> modes) for the same commit with the arch repos I imported.
> 
> On the other hand, with my "moodle" repo, the 'correct' import seems
> to have stopped importing a lot earlier than it should have. I am
> re-running it now to try to continue from where it left off, but it's
> unclear why it abandoned -- I didn't see any error. How widely have
> you tested this method?

This was from the moodle repo I archive-mirrored locally a few weeks ago
for testing:

get_new: 6
get_tag: 0
import_or_tag: 0
replay: 356

Rerunning it doesn't seem to pull any more.  IIRC, my previous runs
only imported ~150 patchsets.  The time it took to run this
was certainly longer than the last run (~4 hours here, vs the ~2 hours
I mentioned in <20051124074605.GA4789@mail.yhbt.net>), so there may
be a bug somewhere...  Unfortunately, I no longer have those old
trees around.

I've imported several trees with >1000 revisions without problems,
mpd-uclinux is among them:

http://mpd.bogomips.org/mpd-uclinux.git/

-- 
Eric Wong


* Re: [PATCH 7/9] Add the accurate changeset applyer
  2005-12-03  2:51                     ` Eric Wong
@ 2005-12-05 18:53                       ` Martin Langhoff
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Langhoff @ 2005-12-05 18:53 UTC (permalink / raw
  To: Eric Wong; +Cc: git list, Martin Langhoff

On 12/3/05, Eric Wong <normalperson@yhbt.net> wrote:
> Rerunning it doesn't seem to pull any more.  IIRC, my previous runs
> only imported ~150 patchsets.  The time it took to run this
> was certainly longer than the last run (~4 hours here, vs the ~2 hours
> I mentioned in <20051124074605.GA4789@mail.yhbt.net>), so there may
> be a bug somewhere...  Unfortunately, I no longer have those old
> trees around.
>
> I've imported several trees with >1000 revisions without problems,
> mpd-uclinux is among them:
>
> http://mpd.bogomips.org/mpd-uclinux.git/

I haven't been able to rerun an import and have it finish without my
ssh session dropping (should have used GNU screen). I'll be able to
test it more thoroughly in a couple of days. Very sorry about the
delay.

cheers,



martin
