git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] allow git-svn fetching to work using serf
@ 2013-07-06  3:41 Kyle McKay
  2013-07-06  7:17 ` David Rothenberger
  0 siblings, 1 reply; 9+ messages in thread
From: Kyle McKay @ 2013-07-06  3:41 UTC (permalink / raw)
  To: git; +Cc: David Rothenberger, Petr Baudis, Jonathan Nieder, Daniel Shahaf

This patch allows git-svn to fetch successfully using the
serf library when given an https?: url to fetch from.

Unfortunately some svn servers do not seem to be configured
well for use with the serf library.  This can cause fetching
to take longer compared to the neon library or actually
cause timeouts during the fetch.  When timeouts occur
git-svn can be safely restarted to fetch more revisions.

A new temp_is_locked function has been added to Git.pm
to facilitate using the minimal number of temp files
possible when using serf.

The problem that occurs when running git-svn fetch using
the serf library is that the previously used temp file
is not always unlocked before the next temp file needs
to be used.

To work around this problem, a new temp name is used
if the temp name that would otherwise be chosen is
currently locked.

This patch may not be all that is required, but at least
it is a starting point.

Daniel Shahaf has suggested also setting "servers:global:http-bulk- 
updates=on".

Kyle J. McKay (2):
  Git.pm: add new temp_is_locked function
  git-svn: allow git-svn fetching to work using serf

perl/Git.pm             | 17 ++++++++++++++++-
perl/Git/SVN/Fetcher.pm |  6 ++++--
2 files changed, 20 insertions(+), 3 deletions(-)

-- 
1.8.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-06  3:41 [PATCH 0/2] allow git-svn fetching to work using serf Kyle McKay
@ 2013-07-06  7:17 ` David Rothenberger
  2013-07-07  0:28   ` Jonathan Nieder
  0 siblings, 1 reply; 9+ messages in thread
From: David Rothenberger @ 2013-07-06  7:17 UTC (permalink / raw)
  To: git

On 7/5/2013 8:41 PM, Kyle McKay wrote:
> This patch allows git-svn to fetch successfully using the
> serf library when given an https?: url to fetch from.

Thanks, Kyle. I confirm this is working for my problem cases as
well.

> Daniel Shahaf has suggested also setting
> "servers:global:http-bulk-updates=on".

I have a patch that does this, but since turning on bulk updates has
a possible performance penalty, I prefer your approach. 

Please let me know if anyone wants to see the patch.

-- 
David Rothenberger  ----  daveroth@acm.org

What to do in case of an alien attack:

    1)   Hide beneath the seat of your plane and look away.
    2)   Avoid eye contact.
    3) If there are no eyes, avoid all contact.

                -- The Firesign Theatre, _Everything you know is Wrong_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-06  7:17 ` David Rothenberger
@ 2013-07-07  0:28   ` Jonathan Nieder
  2013-07-07  1:24     ` Kyle McKay
  2013-07-07  3:44     ` David Rothenberger
  0 siblings, 2 replies; 9+ messages in thread
From: Jonathan Nieder @ 2013-07-07  0:28 UTC (permalink / raw)
  To: David Rothenberger; +Cc: git

David Rothenberger wrote:
> On 7/5/2013 8:41 PM, Kyle McKay wrote:

>> Daniel Shahaf has suggested also setting
>> "servers:global:http-bulk-updates=on".
>
> I have a patch that does this, but since turning on bulk updates has
> a possible performance penalty, I prefer your approach. 

I assume that's because http-bulk-updates defeats caching.  If so,
makes sense.

Please forgive my ignorance: is there a bug filed about ra_serf's
misbehavior here?  Is it eventually going to be fixed and this is
just a workaround, or is the growth in temp file use something we'd
live with permanently?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-07  0:28   ` Jonathan Nieder
@ 2013-07-07  1:24     ` Kyle McKay
  2013-07-07  1:37       ` Jonathan Nieder
  2013-07-07  3:44     ` David Rothenberger
  1 sibling, 1 reply; 9+ messages in thread
From: Kyle McKay @ 2013-07-07  1:24 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: David Rothenberger, git

On Jul 6, 2013, at 17:28, Jonathan Nieder wrote:
> David Rothenberger wrote:
>> On 7/5/2013 8:41 PM, Kyle McKay wrote:
>
>>> Daniel Shahaf has suggested also setting
>>> "servers:global:http-bulk-updates=on".
>>
>> I have a patch that does this, but since turning on bulk updates has
>> a possible performance penalty, I prefer your approach.
>
> I assume that's because http-bulk-updates defeats caching.  If so,
> makes sense.
>
> Please forgive my ignorance: is there a bug filed about ra_serf's
> misbehavior here?  Is it eventually going to be fixed and this is
> just a workaround, or is the growth in temp file use something we'd
> live with permanently?

Apparently it will not be fixed:

Begin forwarded message:
> From: David Rothenberger <daveroth@acm.org>
> Date: July 5, 2013 16:14:12 PDT
> To: git@vger.kernel.org
> Subject: Re: git-svn "Temp file with moniker 'svn_delta' already in  
> use" and skelta mode
>
> I traced git-svn and discovered that the error is due to a known
> problem in the SVN APIs. ra_serf does not drive the delta editor in
> a depth-first manner as required by the API [1]. Instead, the calls
> come in this order:
>
> 1. open_root
> 2. open_directory
> 3. add_file
> 4. apply_textdelta
> 5. add_file
> 6. apply_textdelta
>
> This is a known issue [2] and one that the Subversion folks have
> elected not to fix [3].
>
> [1]
> http://subversion.apache.org/docs/api/latest/structsvn__delta__editor__t.html#details
> [2] http://subversion.tigris.org/issues/show_bug.cgi?id=2932
> [3] http://subversion.tigris.org/issues/show_bug.cgi?id=3831

The summary of [3] which is marked RESOLVED,FIXED is "Add errata / 
release note noise around ra_serf's editor drive violations".

Kyle

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-07  1:24     ` Kyle McKay
@ 2013-07-07  1:37       ` Jonathan Nieder
  2013-07-07  2:46         ` Kyle McKay
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2013-07-07  1:37 UTC (permalink / raw)
  To: Kyle McKay; +Cc: David Rothenberger, git

Kyle McKay wrote:
> On Jul 6, 2013, at 17:28, Jonathan Nieder wrote:
>> David Rothenberger wrote:
>>> On 7/5/2013 8:41 PM, Kyle McKay wrote:

>>>> Daniel Shahaf has suggested also setting
>>>> "servers:global:http-bulk-updates=on".
>>>
>>> I have a patch that does this, but since turning on bulk updates has
>>> a possible performance penalty, I prefer your approach.
>>
>> I assume that's because http-bulk-updates defeats caching.  If so,
>> makes sense.
>>
>> Please forgive my ignorance: is there a bug filed about ra_serf's
>> misbehavior here?  Is it eventually going to be fixed and this is
>> just a workaround, or is the growth in temp file use something we'd
>> live with permanently?
[...]
>
> Begin forwarded message:
[...]
>> [2] http://subversion.tigris.org/issues/show_bug.cgi?id=2932

Ah, thanks for the context.

It's still not clear to me how we know that ra_serf driving the editor
in a non depth-first manner is the problem here.  Has that explanation
been confirmed somehow?

For example, does the workaround mentioned by danielsh work?  Does
using ra_neon instead of ra_serf avoid trouble as well?  Is there a
simple explanation of why violating the depth-first constraint would
lead to multiple blob (i.e., file, not directory) deltas being opened
in a row without an intervening close?

Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-07  1:37       ` Jonathan Nieder
@ 2013-07-07  2:46         ` Kyle McKay
  2013-07-07 17:40           ` Jonathan Nieder
  0 siblings, 1 reply; 9+ messages in thread
From: Kyle McKay @ 2013-07-07  2:46 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: David Rothenberger, git

On Jul 6, 2013, at 18:37, Jonathan Nieder wrote:
> Kyle McKay wrote:
>> On Jul 6, 2013, at 17:28, Jonathan Nieder wrote:
>>> David Rothenberger wrote:
>>>> On 7/5/2013 8:41 PM, Kyle McKay wrote:
>
>>>>> Daniel Shahaf has suggested also setting
>>>>> "servers:global:http-bulk-updates=on".
>>>>
>>>> I have a patch that does this, but since turning on bulk updates  
>>>> has
>>>> a possible performance penalty, I prefer your approach.
>>>
>>> I assume that's because http-bulk-updates defeats caching.  If so,
>>> makes sense.
>>>
>>> Please forgive my ignorance: is there a bug filed about ra_serf's
>>> misbehavior here?  Is it eventually going to be fixed and this is
>>> just a workaround, or is the growth in temp file use something we'd
>>> live with permanently?
> [...]
>>
>> Begin forwarded message:
> [...]
>>> [2] http://subversion.tigris.org/issues/show_bug.cgi?id=2932
>
> Ah, thanks for the context.
>
> It's still not clear to me how we know that ra_serf driving the editor
> in a non depth-first manner is the problem here.  Has that explanation
> been confirmed somehow?
>
> For example, does the workaround mentioned by danielsh work?  Does
> using ra_neon instead of ra_serf avoid trouble as well?  Is there a
> simple explanation of why violating the depth-first constraint would
> lead to multiple blob (i.e., file, not directory) deltas being opened
> in a row without an intervening close?

Using ra_neon seams to eliminate the problem. Using ra_neon has always  
been the default until svn 1.8 which drops ra_neon support entirely  
and always uses ra_serf for https?: urls.

The workaround mentioned by danielsh won't work if the server has  
configured SVNAllowBulkUpdates Off because that will force use of  
skelta mode no matter what the client does.  However, since ra_neon  
only ever has a single connection to the server it probably doesn't  
matter.

Since ra_serf makes multiple connections to the server (hard-coded to  
4 prior to svn 1.8, defaults to 4 in svn 1.8 but can be set to between  
1 and 8) it makes sense there would be multiple active calls to  
apply_textdelta if processing is done as results are received on the  
multiple connections.

Kyle

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-07  0:28   ` Jonathan Nieder
  2013-07-07  1:24     ` Kyle McKay
@ 2013-07-07  3:44     ` David Rothenberger
  2013-07-07 17:53       ` Jonathan Nieder
  1 sibling, 1 reply; 9+ messages in thread
From: David Rothenberger @ 2013-07-07  3:44 UTC (permalink / raw)
  To: git; +Cc: git

On 7/6/2013 5:28 PM, Jonathan Nieder wrote:
> David Rothenberger wrote:
>> On 7/5/2013 8:41 PM, Kyle McKay wrote:
> 
>>> Daniel Shahaf has suggested also setting
>>> "servers:global:http-bulk-updates=on".
>>
>> I have a patch that does this, but since turning on bulk updates has
>> a possible performance penalty, I prefer your approach. 
> 
> I assume that's because http-bulk-updates defeats caching.  If so,
> makes sense.

I believe that "bulk updates" means that serf makes one request for a
lot of information and receives it all in one big response. In "skelta"
mode, serf makes a single request for a single piece of information. The
serf authors feel this can lead to improved overall throughput because
they can pipeline these requests and have multiple connections open at
the same time.

The downside, though, is that serf will do multiple open_file calls in
parallel as it descends down sibling directories.

> It's still not clear to me how we know that ra_serf driving the editor
> in a non depth-first manner is the problem here.  Has that explanation
> been confirmed somehow?

I did do a trace of "git svn fetch" and observed this non-depth-first
traversal. It certainly causes the failure we've observed.

> Is there a simple explanation of why violating the depth-first
> constraint would lead to multiple blob (i.e., file, not directory)
> deltas being opened in a row without an intervening close?

I believe serf is doing the following for a number of files in parallel:
 1. open_file
 2. apply_textdelta
 3. change_file_prop, change_file_prop, ...
 4. close_file


-- 
David Rothenberger  ----  daveroth@acm.org

Nusbaum's Rule:
        The more pretentious the corporate name, the smaller the
        organization.  (For instance, the Murphy Center for the
        Codification of Human and Organizational Law, contrasted
        to IBM, GM, and AT&T.)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-07  2:46         ` Kyle McKay
@ 2013-07-07 17:40           ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2013-07-07 17:40 UTC (permalink / raw)
  To: Kyle McKay; +Cc: David Rothenberger, git, users

(cc-ing subversion's users@ list for advice)
Kyle McKay wrote:
> On Jul 6, 2013, at 18:37, Jonathan Nieder wrote:
>> Kyle McKay wrote:
>>> Begin forwarded message:

>>>> [2] http://subversion.tigris.org/issues/show_bug.cgi?id=2932
>>
>> Ah, thanks for the context.
>>
>> It's still not clear to me how we know that ra_serf driving the editor
>> in a non depth-first manner is the problem here.  Has that explanation
>> been confirmed somehow?
[...]
> Since ra_serf makes multiple connections to the server (hard-coded
> to 4 prior to svn 1.8, defaults to 4 in svn 1.8 but can be set to
> between 1 and 8) it makes sense there would be multiple active calls
> to apply_textdelta if processing is done as results are received on
> the multiple connections.

Ah, that's worrisome.  Do I understand you correctly that to work with
ra_serf in skelta mode, callers need to make their apply_textdelta
callback thread-safe?

Or do you just mean that the traversal order is based on the order in
which results are received?  That would be fine, as long as after each
apply_textdelta call, close_file is called before the next
apply_textdelta.

Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] allow git-svn fetching to work using serf
  2013-07-07  3:44     ` David Rothenberger
@ 2013-07-07 17:53       ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2013-07-07 17:53 UTC (permalink / raw)
  To: David Rothenberger; +Cc: git, users, Kyle J. McKay, Eric Wong

(cc-ing users@ as requested by danielsh)
David Rothenberger wrote:
> On 7/6/2013 5:28 PM, Jonathan Nieder wrote:

>> Is there a simple explanation of why violating the depth-first
>> constraint would lead to multiple blob (i.e., file, not directory)
>> deltas being opened in a row without an intervening close?
>
> I believe serf is doing the following for a number of files in parallel:
>  1. open_file
>  2. apply_textdelta
>  3. change_file_prop, change_file_prop, ...
>  4. close_file

Ah, that makes more sense.  It is not about traversal order but about
processing multiple non-directory files in parallel, and step (3)
potentially involving a large number of property changes means that it
can make sense not to take a lock.

Perhaps the reference documentation could warn about this?

On the git-svn side, it looks like we have enough information to make
a more complete commit message or in-code comment so the reason for
multiple git_blob tempfiles is not forgotten.  Thanks for your patient
explanations.

Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-07-07 17:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-06  3:41 [PATCH 0/2] allow git-svn fetching to work using serf Kyle McKay
2013-07-06  7:17 ` David Rothenberger
2013-07-07  0:28   ` Jonathan Nieder
2013-07-07  1:24     ` Kyle McKay
2013-07-07  1:37       ` Jonathan Nieder
2013-07-07  2:46         ` Kyle McKay
2013-07-07 17:40           ` Jonathan Nieder
2013-07-07  3:44     ` David Rothenberger
2013-07-07 17:53       ` Jonathan Nieder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).