git@vger.kernel.org mailing list mirror (one of many)
From: Eric Wong <normalperson@yhbt.net>
To: Kevin Ballard <kevin@sb.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git-svn pulling down duplicate revisions
Date: Mon, 2 Jun 2008 03:42:25 -0700
Message-ID: <20080602104225.GA8401@untitled>
In-Reply-To: <0E759330-1A0A-489D-ADA3-B71A49951227@sb.org>

Kevin Ballard <kevin@sb.org> wrote:
> On Jun 1, 2008, at 10:40 PM, Eric Wong wrote:
> 
> >Kevin Ballard <kevin@sb.org> wrote:
> >>On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:
> >>
> >>>Kevin Ballard <kevin@sb.org> wrote:
> >>>>I started a git-svn clone on a large svn repository, and I noticed
> >>>>that for various branches, it kept pulling down the exact same
> >>>>revisions (starting at r1). In other words, if I had 4 branches that
> >>>>shared common history, their common history all got pulled down 4
> >>>>times. I double-checked, and the created commit objects were
> >>>>identical.
> >>>>
> >>>>Why was git-svn pulling down the same revisions over and over, when
> >>>>it already knows it has a commit object for those revisions?
> >>>
> >>>Can you give me an example of a repository and command-line you used
> >>>that does this?   Did you use 'git svn clone -s' or did you manually
> >>>specify the branch locations in the repo?
> >>>
> >>>It could even be a lack of read permissions to the repository root
> >>>that would cause things like this.
> >>
> >>The repository is, unfortunately, a private repo so I can't share it.
> >>I used `git svn clone -s` to clone it. I have the SVN perl bindings
> >>v1.4.4 (according to git svn --version).
> >>
> >>I definitely have read permissions to the repo root. If I specify to
> >>only fetch -r 12000:HEAD (there are 14000-odd revisions), it doesn't
> >>pull down any duplicates, but when I let it start from the root, it
> >>pulls down hundreds of duplicates for multiple branches.
> >
> >Can you at least send me the 'svn log -v' output for that repo?
> >Feel free to leave out the actual log messages and munge the path
> >names if you can't expose that information.
> 
> I'll have to do it tomorrow when I'm at the office. How much log info
> do you need? I can let it run until I see duplicate revisions (it's
> pretty obvious, it starts over again from r1).

I'll need the revisions where branches were created from
the common ancestor (presumably trunk) and some revisions
before it.

For debugging problems with restricted repositories, it may be worth it
to create a repository skeleton cloning tool that just reads the output
of 'svn log --xml -v' and recreates a new SVN repository with:

  * all log messages stripped

  * all new files are created containing just a random string (to
    throw off rename detection on the git side), except symlinks

  * all path components tokenized and each token replaced with
    a dictionary value.  Something like:

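    # %tok and $i persist across all paths; empty components (from a
    # leading '/') are kept as-is so absolute paths stay absolute
    @tmp = map { length($_) ? ($tok{$_} ||= ++$i) : $_ } split(/\//, $old_path);
    $new_path = join('/', @tmp);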

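    This way, all copy history is preserved: a given path component
    always maps to the same token, so copy sources still match their
    destinations.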

  * all modified files will just get a random byte appended to them

  * all committer names replaced with a dictionary value (similar to
    what is done to path components).
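
A minimal Perl sketch of the metadata side of such a tool, assuming the
log was saved oldest-first with 'svn log --xml -v -r 1:HEAD' and that
regex parsing is acceptable for a sketch (a real tool should use an XML
parser and decode entities such as &amp;). The names anon_path,
anon_author, skeleton_plan and the user<N> author tokens are
hypothetical. It strips log messages and tokenizes paths and committers;
creating files with random contents, appending random bytes and
committing to a fresh repository are left out.

```perl
#!/usr/bin/perl
# Hypothetical sketch (not part of git-svn): turn `svn log --xml -v`
# output into an anonymized replay plan for a skeleton SVN repository.
# Assumption: regex parsing is good enough here; a real tool should use
# an XML parser and decode entities such as &amp; in paths.
use strict;
use warnings;

# Dictionaries persist across all revisions, so a given path component or
# author always maps to the same token and copy history still lines up.
my (%path_tok, %author_tok);
my ($next_path, $next_author) = (0, 0);

sub anon_path {
    my ($path) = @_;
    # Empty components (from the leading '/') are kept: "/trunk/x" -> "/1/2"
    return join '/',
        map { length($_) ? ($path_tok{$_} ||= ++$next_path) : $_ }
        split m{/}, $path, -1;
}

sub anon_author {
    my ($name) = @_;
    return 'user' . ($author_tok{$name} ||= ++$next_author);
}

sub skeleton_plan {
    my ($xml) = @_;
    my @plan;
    while ($xml =~ m{<logentry\s+revision="(\d+)">(.*?)</logentry>}gs) {
        my ($rev, $body) = ($1, $2);
        my ($author) = $body =~ m{<author>([^<]*)</author>};
        $author = '' unless defined $author;
        # <msg> is never read, which strips every log message
        while ($body =~ m{<path\s([^>]*)>([^<]*)</path>}g) {
            my ($attr_text, $path) = ($1, $2);
            my %attr = $attr_text =~ /([\w-]+)="([^"]*)"/g;
            my $line = sprintf '%s %s %s %s', "r$rev", anon_author($author),
                $attr{action}, anon_path($path);
            # Keep the copy source (tokenized) so branch points survive
            $line .= sprintf ' (from %s@%s)',
                anon_path($attr{'copyfrom-path'}), $attr{'copyfrom-rev'}
                if defined $attr{'copyfrom-path'};
            push @plan, $line;
        }
    }
    return @plan;
}

# Hypothetical sample input; replace with the real saved log
my $sample = <<'XML';
<?xml version="1.0"?>
<log>
<logentry
   revision="1">
<author>alice</author>
<paths>
<path
   action="A">/trunk</path>
<path
   action="A">/trunk/secret.c</path>
</paths>
<msg>initial import</msg>
</logentry>
<logentry
   revision="2">
<author>bob</author>
<paths>
<path
   copyfrom-path="/trunk"
   copyfrom-rev="1"
   action="A">/branches/feature</path>
</paths>
<msg>create branch</msg>
</logentry>
</log>
XML

print "$_\n" for skeleton_plan($sample);
```

On the sample this prints "r1 user1 A /1", "r1 user1 A /1/2" and
"r2 user2 A /3/4 (from /1@1)": the branch point copied from trunk at r1,
which is what matters for debugging the duplicate fetches, survives with
every name replaced.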


-- 
Eric Wong

Thread overview: 8+ messages
2008-05-20  0:26 git-svn pulling down duplicate revisions Kevin Ballard
2008-06-02  5:00 ` Eric Wong
2008-06-02  5:06   ` Kevin Ballard
2008-06-02  5:40     ` Eric Wong
2008-06-02  5:55       ` Kevin Ballard
2008-06-02 10:42         ` Eric Wong [this message]
2008-06-02 17:45           ` Kevin Ballard
2008-06-02 17:59             ` Björn Steinbrink
