git@vger.kernel.org mailing list mirror (one of many)
* git-svn pulling down duplicate revisions
@ 2008-05-20  0:26 Kevin Ballard
  2008-06-02  5:00 ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Ballard @ 2008-05-20  0:26 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Eric Wong

I started a git-svn clone on a large svn repository, and I noticed  
that for various branches, it kept pulling down the exact same  
revisions (starting at r1). In other words, if I had 4 branches that  
shared common history, their common history all got pulled down 4  
times. I double-checked, and the created commit objects were identical.

Why was git-svn pulling down the same revisions over and over, when it  
already knows it has a commit object for those revisions?

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com


* Re: git-svn pulling down duplicate revisions
  2008-05-20  0:26 git-svn pulling down duplicate revisions Kevin Ballard
@ 2008-06-02  5:00 ` Eric Wong
  2008-06-02  5:06   ` Kevin Ballard
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2008-06-02  5:00 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Git Mailing List

Kevin Ballard <kevin@sb.org> wrote:
> I started a git-svn clone on a large svn repository, and I noticed  
> that for various branches, it kept pulling down the exact same  
> revisions (starting at r1). In other words, if I had 4 branches that  
> shared common history, their common history all got pulled down 4  
> times. I double-checked, and the created commit objects were identical.
> 
> Why was git-svn pulling down the same revisions over and over, when it  
> already knows it has a commit object for those revisions?

Can you give me an example of a repository and command-line you used
that does this?   Did you use 'git svn clone -s' or did you manually
specify the branch locations in the repo?

It could even be a lack of read permissions to the repository root
that would cause things like this.

Thanks,

-- 
Eric Wong


* Re: git-svn pulling down duplicate revisions
  2008-06-02  5:00 ` Eric Wong
@ 2008-06-02  5:06   ` Kevin Ballard
  2008-06-02  5:40     ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Ballard @ 2008-06-02  5:06 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git Mailing List

On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:

> Kevin Ballard <kevin@sb.org> wrote:
>> I started a git-svn clone on a large svn repository, and I noticed
>> that for various branches, it kept pulling down the exact same
>> revisions (starting at r1). In other words, if I had 4 branches that
>> shared common history, their common history all got pulled down 4
>> times. I double-checked, and the created commit objects were  
>> identical.
>>
>> Why was git-svn pulling down the same revisions over and over, when  
>> it
>> already knows it has a commit object for those revisions?
>
> Can you give me an example of a repository and command-line you used
> that does this?   Did you use 'git svn clone -s' or did you manually
> specify the branch locations in the repo?
>
> It could even be a lack of read permissions to the repository root
> that would cause things like this.

The repository is, unfortunately, a private repo so I can't share it.  
I used `git svn clone -s` to clone it. I have the SVN perl bindings  
v1.4.4 (according to git svn --version).

I definitely have read permissions to the repo root. If I specify to  
only fetch -r 12000:HEAD (there's 14000-odd revisions), it doesn't  
pull down any duplicates, but when I let it start from the root, it  
pulls down hundreds of duplicates for multiple branches.

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com


* Re: git-svn pulling down duplicate revisions
  2008-06-02  5:06   ` Kevin Ballard
@ 2008-06-02  5:40     ` Eric Wong
  2008-06-02  5:55       ` Kevin Ballard
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2008-06-02  5:40 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Git Mailing List

Kevin Ballard <kevin@sb.org> wrote:
> On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:
> 
> >Kevin Ballard <kevin@sb.org> wrote:
> >>I started a git-svn clone on a large svn repository, and I noticed
> >>that for various branches, it kept pulling down the exact same
> >>revisions (starting at r1). In other words, if I had 4 branches that
> >>shared common history, their common history all got pulled down 4
> >>times. I double-checked, and the created commit objects were  
> >>identical.
> >>
> >>Why was git-svn pulling down the same revisions over and over, when  
> >>it
> >>already knows it has a commit object for those revisions?
> >
> >Can you give me an example of a repository and command-line you used
> >that does this?   Did you use 'git svn clone -s' or did you manually
> >specify the branch locations in the repo?
> >
> >It could even be a lack of read permissions to the repository root
> >that would cause things like this.
> 
> The repository is, unfortunately, a private repo so I can't share it.  
> I used `git svn clone -s` to clone it. I have the SVN perl bindings  
> v1.4.4 (according to git svn --version).
> 
> I definitely have read permissions to the repo root. If I specify to  
> only fetch -r 12000:HEAD (there's 14000-odd revisions), it doesn't  
> pull down any duplicates, but when I let it start from the root, it  
> pulls down hundreds of duplicates for multiple branches.

Can you at least send me the 'svn log -v' output for that repo?
Feel free to leave out the actual log messages and munge the path
names if you can't expose that information.

-- 
Eric Wong


* Re: git-svn pulling down duplicate revisions
  2008-06-02  5:40     ` Eric Wong
@ 2008-06-02  5:55       ` Kevin Ballard
  2008-06-02 10:42         ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Ballard @ 2008-06-02  5:55 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git Mailing List

On Jun 1, 2008, at 10:40 PM, Eric Wong wrote:

> Kevin Ballard <kevin@sb.org> wrote:
>> On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:
>>
>>> Kevin Ballard <kevin@sb.org> wrote:
>>>> I started a git-svn clone on a large svn repository, and I noticed
>>>> that for various branches, it kept pulling down the exact same
>>>> revisions (starting at r1). In other words, if I had 4 branches  
>>>> that
>>>> shared common history, their common history all got pulled down 4
>>>> times. I double-checked, and the created commit objects were
>>>> identical.
>>>>
>>>> Why was git-svn pulling down the same revisions over and over, when
>>>> it
>>>> already knows it has a commit object for those revisions?
>>>
>>> Can you give me an example of a repository and command-line you used
>>> that does this?   Did you use 'git svn clone -s' or did you manually
>>> specify the branch locations in the repo?
>>>
>>> It could even be a lack of read permissions to the repository root
>>> that would cause things like this.
>>
>> The repository is, unfortunately, a private repo so I can't share it.
>> I used `git svn clone -s` to clone it. I have the SVN perl bindings
>> v1.4.4 (according to git svn --version).
>>
>> I definitely have read permissions to the repo root. If I specify to
>> only fetch -r 12000:HEAD (there's 14000-odd revisions), it doesn't
>> pull down any duplicates, but when I let it start from the root, it
>> pulls down hundreds of duplicates for multiple branches.
>
> Can you at least send me the 'svn log -v' output for that repo?
> Feel free to leave out the actual log messages and munge the path
> names if you can't expose that information.

I'll have to do it tomorrow when I'm at the office. How much log info  
do you need? I can let it run until I see duplicate revisions (it's  
pretty obvious, it starts over again from r1).

-Kevin

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com


* Re: git-svn pulling down duplicate revisions
  2008-06-02  5:55       ` Kevin Ballard
@ 2008-06-02 10:42         ` Eric Wong
  2008-06-02 17:45           ` Kevin Ballard
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2008-06-02 10:42 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Git Mailing List

Kevin Ballard <kevin@sb.org> wrote:
> On Jun 1, 2008, at 10:40 PM, Eric Wong wrote:
> 
> >Kevin Ballard <kevin@sb.org> wrote:
> >>On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:
> >>
> >>>Kevin Ballard <kevin@sb.org> wrote:
> >>>>I started a git-svn clone on a large svn repository, and I noticed
> >>>>that for various branches, it kept pulling down the exact same
> >>>>revisions (starting at r1). In other words, if I had 4 branches  
> >>>>that
> >>>>shared common history, their common history all got pulled down 4
> >>>>times. I double-checked, and the created commit objects were
> >>>>identical.
> >>>>
> >>>>Why was git-svn pulling down the same revisions over and over, when
> >>>>it
> >>>>already knows it has a commit object for those revisions?
> >>>
> >>>Can you give me an example of a repository and command-line you used
> >>>that does this?   Did you use 'git svn clone -s' or did you manually
> >>>specify the branch locations in the repo?
> >>>
> >>>It could even be a lack of read permissions to the repository root
> >>>that would cause things like this.
> >>
> >>The repository is, unfortunately, a private repo so I can't share it.
> >>I used `git svn clone -s` to clone it. I have the SVN perl bindings
> >>v1.4.4 (according to git svn --version).
> >>
> >>I definitely have read permissions to the repo root. If I specify to
> >>only fetch -r 12000:HEAD (there's 14000-odd revisions), it doesn't
> >>pull down any duplicates, but when I let it start from the root, it
> >>pulls down hundreds of duplicates for multiple branches.
> >
> >Can you at least send me the 'svn log -v' output for that repo?
> >Feel free to leave out the actual log messages and munge the path
> >names if you can't expose that information.
> 
> I'll have to do it tomorrow when I'm at the office. How much log info  
> do you need? I can let it run until I see duplicate revisions (it's  
> pretty obvious, it starts over again from r1).

I'll need the revisions where branches were created from
the common ancestor (presumably trunk) and some revisions
before it.

For debugging problems with restricted repositories, it may be worth it
to create a repository skeleton cloning tool that just reads the output
of 'svn log --xml -v' and recreates a new SVN repository with:

  * all log messages stripped

  * all new files are created with just a random string in them (to
    throw off rename detection on the git side)
    (except symlinks, see below)

  * all path components tokenized and each token replaced with
    a dictionary value.  Something like:

    @tmp = map { $tok{$_} ||= ++$i; $tok{$_} } split(/\//, $old_path);
    $new_path = join('/', @tmp);

    This way all copy history can be preserved

  * all modified files will just get a random byte appended to them

  * all committer names replaced with a dictionary value (similar to
    what is done to path components).
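
[Editorial sketch, not part of the original message.] The path-tokenizing idea above could be fleshed out roughly as follows. This is a minimal sketch under stated assumptions: the helper name anonymize_path and the sample paths are hypothetical, and it differs from the two-line snippet only in leaving empty components (such as the one produced by a leading slash) untouched. Because the same component always maps to the same token, copy sources and destinations should still line up after anonymization.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %tok;        # original path component -> replacement token
my $next = 0;   # counter used to hand out new tokens

# Hypothetical helper: replace every path component with a stable
# numeric token. Empty components (e.g. from a leading slash) are kept
# as-is so that absolute paths stay absolute.
sub anonymize_path {
    my ($old_path) = @_;
    my @parts;
    for my $part (split m{/}, $old_path, -1) {
        push @parts, length($part) ? ($tok{$part} ||= ++$next) : '';
    }
    return join('/', @parts);
}

# Sample paths are made up for illustration.
print anonymize_path('/trunk/src/main.c'), "\n";                 # /1/2/3
print anonymize_path('/branches/css_refactor/src/main.c'), "\n"; # /4/5/2/3
```

The same dictionary approach would presumably work for committer names, using a second hash so that name tokens and path tokens do not collide.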


-- 
Eric Wong


* Re: git-svn pulling down duplicate revisions
  2008-06-02 10:42         ` Eric Wong
@ 2008-06-02 17:45           ` Kevin Ballard
  2008-06-02 17:59             ` Björn Steinbrink
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Ballard @ 2008-06-02 17:45 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git Mailing List

On Jun 2, 2008, at 3:42 AM, Eric Wong wrote:

> Kevin Ballard <kevin@sb.org> wrote:
>> On Jun 1, 2008, at 10:40 PM, Eric Wong wrote:
>>
>>> Kevin Ballard <kevin@sb.org> wrote:
>>>> On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:
>>>>
>>>>> Kevin Ballard <kevin@sb.org> wrote:
>>>>>> I started a git-svn clone on a large svn repository, and I  
>>>>>> noticed
>>>>>> that for various branches, it kept pulling down the exact same
>>>>>> revisions (starting at r1). In other words, if I had 4 branches
>>>>>> that
>>>>>> shared common history, their common history all got pulled down 4
>>>>>> times. I double-checked, and the created commit objects were
>>>>>> identical.
>>>>>>
>>>>>> Why was git-svn pulling down the same revisions over and over,  
>>>>>> when
>>>>>> it
>>>>>> already knows it has a commit object for those revisions?
>>>>>
>>>>> Can you give me an example of a repository and command-line you  
>>>>> used
>>>>> that does this?   Did you use 'git svn clone -s' or did you  
>>>>> manually
>>>>> specify the branch locations in the repo?
>>>>>
>>>>> It could even be a lack of read permissions to the repository root
>>>>> that would cause things like this.
>>>>
>>>> The repository is, unfortunately, a private repo so I can't share  
>>>> it.
>>>> I used `git svn clone -s` to clone it. I have the SVN perl bindings
>>>> v1.4.4 (according to git svn --version).
>>>>
>>>> I definitely have read permissions to the repo root. If I specify  
>>>> to
>>>> only fetch -r 12000:HEAD (there's 14000-odd revisions), it doesn't
>>>> pull down any duplicates, but when I let it start from the root, it
>>>> pulls down hundreds of duplicates for multiple branches.
>>>
>>> Can you at least send me the 'svn log -v' output for that repo?
>>> Feel free to leave out the actual log messages and munge the path
>>> names if you can't expose that information.
>>
>> I'll have to do it tomorrow when I'm at the office. How much log info
>> do you need? I can let it run until I see duplicate revisions (it's
>> pretty obvious, it starts over again from r1).
>
> I'll need the revisions where branches were created from
> the common ancestor (presumably trunk) and some revisions
> before it.
>
> For debugging problems with restricted repositories, it may be worth  
> it
> to create a repository skeleton cloning tool that just reads the  
> output
> of 'svn log --xml -v' and recreates a new SVN repository with:
>
>  * all log messages stripped
>
>  * all new files are created with just a random string in them (to
>    throw off rename detection on the git side)
>    (except symlinks, see below)
>
>  * all path components tokenized and each token replaced with
>    a dictionary value.  Something like:
>
>    @tmp = map { $tok{$_} ||= ++$i; $tok{$_} } split(/\//, $old_path);
>    $new_path = join('/', @tmp);
>
>    This way all copy history can be preserved
>
>  * all modified files will just get a random byte appended to them
>
>  * all committer names replaced with a dictionary value (similar to
>    what is done to path components).

Isn't there a script somewhere that's supposed to do this? Do you know  
where it is?

Incidentally, I just checked and when I start the git-svn clone, it  
starts pulling down revisions for the branch 'css_refactor@1559' (odd  
branch name, but it claimed to find multiple branch points for this  
'css_refactor' branch). My guess is when it starts working on the next  
branch, it doesn't view it as related to css_refactor and starts  
pulling down the revisions again even though those revisions actually  
belonged to trunk.

-Kevin

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com


* Re: git-svn pulling down duplicate revisions
  2008-06-02 17:45           ` Kevin Ballard
@ 2008-06-02 17:59             ` Björn Steinbrink
  0 siblings, 0 replies; 8+ messages in thread
From: Björn Steinbrink @ 2008-06-02 17:59 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Eric Wong, Git Mailing List

On 2008.06.02 10:45:04 -0700, Kevin Ballard wrote:
> Incidentally, I just checked and when I start the git-svn clone, it
> starts pulling down revisions for the branch 'css_refactor@1559' (odd
> branch name, but it claimed to find multiple branch points for this
> 'css_refactor' branch). My guess is when it starts working on the next
> branch, it doesn't view it as related to css_refactor and starts
> pulling down the revisions again even though those revisions actually
> belonged to trunk.

Hm, you could probably test that theory at least for branches that
started from trunk I guess. First, clone trunk only, and then add the
branches/tags config entries and fetch the rest. If it holds, then the
duplication should be gone I guess.
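
[Editorial sketch, not part of the original message.] The test described above might look something like the following. It is an untested outline that assumes a standard trunk/branches/tags layout and the default svn-remote name "svn"; the function name, the --prefix=origin/ choice, and the URL and directory arguments are placeholders, not details from the thread.

```shell
#!/bin/sh
# Sketch: import trunk first, then add branch/tag globs and fetch the rest.
# Usage (placeholders): clone_trunk_then_branches <svn-url> <target-dir>
clone_trunk_then_branches() (
    set -e
    svn_url=$1
    target_dir=$2

    # Step 1: track only trunk and import its full history.
    git svn init --prefix=origin/ -T trunk "$svn_url" "$target_dir"
    cd "$target_dir"
    git svn fetch

    # Step 2: add the branch and tag globs that 'git svn clone -s'
    # would normally have configured up front.
    git config --add svn-remote.svn.branches 'branches/*:refs/remotes/origin/*'
    git config --add svn-remote.svn.tags 'tags/*:refs/remotes/origin/tags/*'

    # Step 3: fetch again; branches copied from trunk can now be
    # attached to the trunk commits that already exist.
    git svn fetch
)
```

If the theory holds, the second fetch should not start over from r1 for branches that were copied from trunk.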

Björn

