git@vger.kernel.org mailing list mirror (one of many)
* Git ~unusable on slow lines :,'C
@ 2012-10-08 18:27 Marcel Partap
  2012-10-09  1:49 ` Carlos Martín Nieto
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Partap @ 2012-10-08 18:27 UTC (permalink / raw)
  To: git

Dear Git Devs,
I love Git, but for a couple of months now I've been on 3G, and once my
traffic limit is exceeded, things slow down to a feeble 8 KiB/s. Just like
back in the old days - things moved somewhat slower then. And I'm fine with
that - as long as things just keep moving.
Unfortunately, git does not scale down very well, so for ten more days I
will be unable to get the newest commits onto my machine. Which is very,
very sad :/
> git fetch --verbose --all 
> Fetching origin
> POST git-upload-pack (1023 bytes)
> POST git-upload-pack (gzip 1123 to 614 bytes)
> POST git-upload-pack (gzip 1973 to 1030 bytes)
> POST git-upload-pack (gzip 5173 to 2639 bytes)
> POST git-upload-pack (gzip 7978 to 4042 bytes)
> remote: Counting objects: 24504, done.
> remote: Compressing objects: 100% (10705/10705), done.
> error: RPC failed; result=56, HTTP code = 200iB | 10 KiB/s       
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed
> error: Could not fetch origin
Bam, the server kicked me off after taking too long to sync my copy.
Multiple potential points of action:
- git fetch should show the total amount of data it is about to transfer!
- when ab^H^Horting, the cursor should be moved down (tput cud1) to not
overwrite previous output
- would be nice to be able to tell git fetch to get the next chunk of
say 500 commits instead of trying to receive ALL commits, then b0rking
after umpteen percent on server timeout. Not?

#Regards!Marcel c:


* Re: Git ~unusable on slow lines :,'C
  2012-10-08 18:27 Git ~unusable on slow lines :,'C Marcel Partap
@ 2012-10-09  1:49 ` Carlos Martín Nieto
  2012-10-09 14:06   ` Marcel Partap
  2012-10-09 16:46   ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Carlos Martín Nieto @ 2012-10-09  1:49 UTC (permalink / raw)
  To: Marcel Partap; +Cc: git

Marcel Partap <mpartap@gmx.net> writes:

> Dear Git Devs,
> I love Git, but for a couple of months now I've been on 3G, and once my
> traffic limit is exceeded, things slow down to a feeble 8 KiB/s. Just like
> back in the old days - things moved somewhat slower then. And I'm fine with
> that - as long as things just keep moving.
> Unfortunately, git does not scale down very well, so for ten more days I
> will be unable to get the newest commits onto my machine. Which is very,
> very sad :/
>> git fetch --verbose --all 
>> Fetching origin
>> POST git-upload-pack (1023 bytes)
>> POST git-upload-pack (gzip 1123 to 614 bytes)
>> POST git-upload-pack (gzip 1973 to 1030 bytes)
>> POST git-upload-pack (gzip 5173 to 2639 bytes)
>> POST git-upload-pack (gzip 7978 to 4042 bytes)
>> remote: Counting objects: 24504, done.
>> remote: Compressing objects: 100% (10705/10705), done.
>> error: RPC failed; result=56, HTTP code = 200iB | 10 KiB/s       
>> fatal: The remote end hung up unexpectedly
>> fatal: early EOF
>> fatal: index-pack failed
>> error: Could not fetch origin
> Bam, the server kicked me off after taking too long to sync my copy.

This is unrelated to git. The HTTP server's configuration is too
impatient.

> Multiple potential points of action:
> - git fetch should show the total amount of data it is about to
> transfer!

It can't, because it doesn't know.

> - when ab^H^Horting, the cursor should be moved down (tput cud1) to not
> overwrite previous output

The error message doesn't really know whether it is going to overwrite
it (the CR comes from the server), though I suppose an extra LF wouldn't
hurt there.

> - would be nice to be able to tell git fetch to get the next chunk of
> say 500 commits instead of trying to receive ALL commits, then b0rking
> after umpteen percent on server timeout. Not?

You asked for the current state of the repository, and that's what it's
giving you. The timeout has nothing to do with git; if you can't
convince the admins to increase it, you can try using another transport
that doesn't go over HTTP, as the timeout is most likely an anti-DoS measure.

If you want to download it bit by bit, you can tell fetch to download
particular tags. Doing this automatically would be working around a
configuration issue on a particular server, which is generally better
fixed in other ways.


   cmn


* Re: Git ~unusable on slow lines :,'C
  2012-10-09  1:49 ` Carlos Martín Nieto
@ 2012-10-09 14:06   ` Marcel Partap
  2012-10-09 15:58     ` Shawn Pearce
  2012-10-09 17:39     ` Carlos Martín Nieto
  2012-10-09 16:46   ` Junio C Hamano
  1 sibling, 2 replies; 7+ messages in thread
From: Marcel Partap @ 2012-10-09 14:06 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: git

>> Bam, the server kicked me off after taking too long to sync my copy.
> This is unrelated to git. The HTTP server's configuration is too
> impatient.
Yes. How does that mean it is unrelated to git?

>> - git fetch should show the total amount of data it is about to
>> transfer!
> It can't, because it doesn't know.
The server side doesn't know how much the objects *it just repacked
for transfer* weigh?
If that truly is the case, wouldn't it make sense to make git a little
more introspective? E.g.:
> # git info git://foo.org/bar.git
> .. [server generating figures] ..
> URL: git://foo.org/bar.git
> Created/Earliest commit: ...
> Last modified/Latest commit: ...
> Total object count: .... (..commits, ..files, .. directories)
> Total repository size (compressed): ... MiB
> Branches:
> [git branch -va] + branch size

> The error message doesn't really know whether it is going to overwrite
> it (the CR comes from the server), though I suppose an extra LF wouldn't
> hurt there.
Definitely wouldn't hurt.

>> - would be nice to be able to tell git fetch to get the next chunk of
>> say 500 commits instead of trying to receive ALL commits, then b0rking
>> after umpteen percent on server timeout. Not?
>> You asked for the current state of the repository, and that's what it's
>> giving you.
Instead, I would rather ask for just the next 500 commits. There is no
way to do that.

> The timeout has nothing to do with git; if you can't
> convince the admins to increase it, you can try using another transport
> that doesn't go over HTTP, as the timeout is most likely an anti-DoS measure.
See, I probably can't convince the admins to drop their anti-DoS measures.
And they (drupal.org admins) probably will not change their allowed
protocol policies.
Besides that, I've had timeouts or simply stale connections die on me
before with other repositories and various transport modes.
The easiest fix would be an option to tell git not to fetch everything...

> If you want to download it bit by bit, you can tell fetch to download
> particular tags.
..without specifying specific commit tags.
Browsing gitweb sites to find a tag for which the fetch doesn't time out
is hugely inconvenient, especially on a slow line.

> Doing this automatically would be working around a
> configuration issue on a particular server, which is generally better
> fixed in other ways.
It is not only a configuration issue for one particular server. Git in
general is hardly usable on slow lines because
- it doesn't show the volume of data that is to be downloaded!
- it doesn't allow the user to sync up in steps that the circumstances will
allow to succeed.

#Regards!Marcel.


* Re: Git ~unusable on slow lines :,'C
  2012-10-09 14:06   ` Marcel Partap
@ 2012-10-09 15:58     ` Shawn Pearce
  2012-10-09 17:19       ` Marcel Partap
  2012-10-09 17:39     ` Carlos Martín Nieto
  1 sibling, 1 reply; 7+ messages in thread
From: Shawn Pearce @ 2012-10-09 15:58 UTC (permalink / raw)
  To: Marcel Partap; +Cc: Carlos Martín Nieto, git

On Tue, Oct 9, 2012 at 7:06 AM, Marcel Partap <mpartap@gmx.net> wrote:
>>> Bam, the server kicked me off after taking too long to sync my copy.
>> This is unrelated to git. The HTTP server's configuration is too
>> impatient.
> Yes. How does that mean it is unrelated to git?

It means it's out of our control; we cannot modify the HTTP server's
configuration to have a longer timeout. We can recommend that the
timeout be increased, but as you point out, the admins may not do that.

>>> - git fetch should show the total amount of data it is about to
>>> transfer!
>> It can't, because it doesn't know.
> The server side doesn't know at how much the objects *it just repacked
> for transfer* weigh in?

Actually it does. It's just not used here. What value is that to you?
You asked for the repository. If you know its size is going to be ~105
MiB you have two choices... continue to get the repository you asked
for, or disconnect and give up. Either way the size doesn't help you.
It would require a protocol modification to send a size estimate down
to the client before the data in order to give the client a better
progress meter than the object count (allowing it instead to track by
bytes received). But this has been seen as not very useful or
worthwhile since it doesn't really help anyone do anything better. So
why change the protocol?

>> You asked for the current state of the repository, and that's what it's
>> giving you.
> Instead, I would rather ask for just the next 500 commits. There is no
> way to do that.

No, there isn't. Git assumes that once it has commit X, all versions
that predate X are already on the local workstation. This is a
fundamental assumption that the entire protocol relies on. It is not
trivial to change. We have been through this many times on the mailing
list, please search the archives for "resumable clone".

>> The timeout has nothing to do with git; if you can't
>> convince the admins to increase it, you can try using another transport
>> that doesn't go over HTTP, as the timeout is most likely an anti-DoS measure.
> See, I probably can't convince the admins to drop their anti-DoS measures.
> And they (drupal.org admins) probably will not change their allowed
> protocol policies.

Then, if they are hosting really big repositories that are hard for
their contributors to obtain, they should take the time to write a
script that periodically creates a bundle file for each repository
using `git bundle create repo.bundle --all`. They can host these
bundle files on any file transfer service like HTTP or BitTorrent,
and users can download and resume them using normal HTTP download
tools. Once you have a bundle file locally, you can clone from it in
modern Git with `git clone $(pwd)/repo.bundle` to initialize the
repository.

This is currently the best way to support resumable clone. The repo
will be stale by whatever time has elapsed since the bundle file was
created. But then Git can do an incremental fetch to catch up, and
this transfer size should be limited to the progress made since the
bundle was made. If bundles are made once per month or after each
major release, it's usually a manageable delta.
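
Roughly, the whole round trip might look something like this (just a
sketch; the paths and URLs here are made up):

  # on the server, e.g. from a cron job:
  git --git-dir=/srv/git/project.git bundle create /var/www/project.bundle --all

  # on the slow client, with a resumable downloader:
  wget -c https://example.org/project.bundle
  git clone project.bundle project
  cd project

  # point the clone at the real server and catch up incrementally:
  git remote set-url origin https://example.org/project.git
  git fetch origin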

> It is not only a configuration issue for one particular server. Git in
> general is hardly usable on slow lines because
> - it doesn't show the volume of data that is to be downloaded!

If it did show you, what would you do? Declare defeat before it even
starts to download and give up and start a thread about how Git
requires too much bandwidth?

Have you tried to shallow clone the repository in question?
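
Something along these lines might already get you unstuck (untested
sketch; the URL is only an example):

  # grab just the most recent history first...
  git clone --depth 1 git://git.example.org/project/drupal.git
  cd drupal
  # ...then deepen it in affordable steps, as the line allows
  git fetch --depth 100
  git fetch --depth 1000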

> - it doesn't allow the user to sync up in steps that the circumstances will
> allow to succeed.

Sadly, this is quite true. :-(


* Re: Git ~unusable on slow lines :,'C
  2012-10-09  1:49 ` Carlos Martín Nieto
  2012-10-09 14:06   ` Marcel Partap
@ 2012-10-09 16:46   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2012-10-09 16:46 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: Marcel Partap, git

cmn@elego.de (Carlos Martín Nieto) writes:

> If you want to download it bit by bit, you can tell fetch to download
> particular tags. Doing this automatically would be working around a
> configuration issue on a particular server, which is generally better
> fixed in other ways.

As part of an upcoming "protocol update" discussion, we may want to
include allowing "upload-pack" to accept a request for a commit that
is not at the tip of any ref.

E.g. "want refs/heads/master~*0.1" might ask "I know your entire
history is very big; please give me only the one tenth of the oldest
history during this round." (this is not a suggestion on how to do
this at the UI level).


* Re: Git ~unusable on slow lines :,'C
  2012-10-09 15:58     ` Shawn Pearce
@ 2012-10-09 17:19       ` Marcel Partap
  0 siblings, 0 replies; 7+ messages in thread
From: Marcel Partap @ 2012-10-09 17:19 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Carlos Martín Nieto, git

>>>> - git fetch should show the total amount of data it is about to
>>>> transfer!
>>> It can't, because it doesn't know.
>> The server side doesn't know how much the objects *it just repacked
>> for transfer* weigh?
> Actually it does.
Then, please, make it display it.

> What value is that to you?
The size that is to be transferred, and the total repository size.

> You asked for the repository. If you know its size is going to be ~105
> MiB you have two choices... continue to get the repository you asked
> for, or disconnect and give up.
> Either way the size doesn't help you.
Yes it does - when displayed, one could make an informed choice.
But it doesn't show this, just the object count... and that is not
very expressive.
It so happened last week that I tried cloning a repository with a
seemingly moderate number of objects and a small code base. However, full
Java RE zips had been checked in and updated multiple times - suddenly
my monthly 3G traffic limit was exhausted. Needless to say, without a
clue how much more data would follow, I aborted the transfer - and was
left with a net result of *zilch* bytes of code, and a line cut down to
a ridiculous speed. Now I can't even sync up my Drupal copy.

> It would require a protocol modification to send a size estimate down
> to the client before the data in order to give the client a better
> progress meter than the object count (allowing it instead to track by
> bytes received).
Well, if it requires that, so be it. I fail to understand why this
wasn't considered before.

> But this has been seen as not very useful or worthwhile
> since it doesn't really help anyone do anything better.
Huh?

> So why change the protocol?
Sanity? Usability of git with slow lines?


> Git assumes that once it has commit X, all versions
> that predate X are already on the local workstation.
And that's true for all my repositories, since none of them was cloned
--shallow.

> This is a fundamental assumption that the entire protocol relies on.
What about --shallow, --depth?

> It is not trivial to change.
Many changes for the better are not trivial. And still worth it.

> We have been through this many times on the mailing
> list, please search the archives for "resumable clone".
Ok - yet that probably doesn't invalidate all arguments in favor of it.

> they should [...] host these bundle files [...]
> and users can download and resume these
Thanks for the tip, I will forward it to the server administrators.
However, this does not help with the huge number of commits to
fetch that pile up within a couple of months.

> This is currently the best way to support resumable clone.
I wasn't even mentioning that, but that'd be nice to have as well^^...

> If bundles are made once per month or after each
> major release, it's usually a manageable delta.
While downloading bundle delta files definitely is a plausible solution
- isn't that quite far from user-friendly?

> If it did show you, what would you do?
Not try to check out a repository full of JRE zips blindfolded?

> Declare defeat before it even
> starts to download and give up and start a thread about how Git
> requires too much bandwidth?
Kindly ask the author to locally rewrite his history and recreate the
repository with *LINKS* to JRE zips instead?
Not for a second did I doubt the efficiency of git's packing and
compression algorithms! That's why I'm quite amazed at the sheer
existence of these issues: not showing the repository size before
downloading (or, IIUC, *anywhere*), and a protocol that is incapable of
resuming or partially fetching a repository, even though it obviously
provides means of negotiation between server and client... It just
boggles me that within 7+ years of development this hasn't been
addressed (disclaimer: I do not claim to grok the protocol - not
wanting to put blame on anyone here :).

> Have you tried to shallow clone the repository in question?
No - would it allow me to fuse the two repositories afterwards? That'd
actually be quite cool and a good idea to instantly solve my current
problem... gonna try that, thx :)

#Regards!Marcel


* Re: Git ~unusable on slow lines :,'C
  2012-10-09 14:06   ` Marcel Partap
  2012-10-09 15:58     ` Shawn Pearce
@ 2012-10-09 17:39     ` Carlos Martín Nieto
  1 sibling, 0 replies; 7+ messages in thread
From: Carlos Martín Nieto @ 2012-10-09 17:39 UTC (permalink / raw)
  To: Marcel Partap; +Cc: Carlos Martín Nieto, git

Marcel Partap <mpartap@gmx.net> writes:

>>> Bam, the server kicked me off after taking too long to sync my copy.
>> This is unrelated to git. The HTTP server's configuration is too
>> impatient.
> Yes. How does that mean it is unrelated to git?
>
>>> - git fetch should show the total amount of data it is about to
>>> transfer!
>> It can't, because it doesn't know.
> The server side doesn't know how much the objects *it just repacked
> for transfer* weigh?
> If that truly is the case, wouldn't it make sense to make git a little
> more introspective? E.g.:

In the normal case it sends you more objects than the ones it just
repacked. It could tell you, but it would have to keep track of more
information (which would make it take longer for the first bytes to get
to you) for little gain. The only thing you'd be able to do is to
abort the transfer immediately, but you can do that anyway, and waiting
will only add more history to download.

>> # git info git://foo.org/bar.git
>> .. [server generating figures] ..
>> URL: git://foo.org/bar.git
>> Created/Earliest commit: ...
>> Last modified/Latest commit: ...
>> Total object count: .... (..commits, ..files, .. directories)
>> Total repository size (compressed): ... MiB
>> Branches:
>> [git branch -va] + branch size
>
>> The error message doesn't really know whether it is going to overwrite
>> it (the CR comes from the server), though I suppose an extra LF wouldn't
>> hurt there.
> Definitely wouldn't hurt.
>
>>> - would be nice to be able to tell git fetch to get the next chunk of
>>> say 500 commits instead of trying to receive ALL commits, then b0rking
>>> after umpteen percent on server timeout. Not?
>> You asked for the current state of the repository, and that's what it's
>> giving you.
> Instead, I would rather ask for just the next 500 commits. There is no
> way to do that.

Do you mean that there are no tags in between your current state and the
one you want to be at?

>
>> The timeout has nothing to do with git; if you can't
>> convince the admins to increase it, you can try using another transport
>> that doesn't go over HTTP, as the timeout is most likely an anti-DoS measure.
> See, I probably can't convince the admins to drop their anti-DoS measures.
> And they (drupal.org admins) probably will not change their allowed
> protocol policies.

Switch to using the raw git protocol, which is much less likely to have
this sort of measure.
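
If the server offers it, that is just a matter of switching the remote
URL, e.g. (URL made up):

  git remote set-url origin git://git.example.org/project.git
  git fetch origin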

> Besides that, I've had timeouts or simply stale connections die on me
> before with other repositories and various transport modes.
> The easiest fix would be an option to tell git not to fetch everything...
>
>> If you want to download it bit by bit, you can tell fetch to download
>> particular tags.
> ..without specifying specific commit tags.
> Browsing gitweb sites to find a tag for which the fetch doesn't time out
> is hugely inconvenient, especially on a slow line.

Don't use the web then. Use ls-remote to see what's at the other end.
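
For example (just a sketch; the tag name is hypothetical):

  git ls-remote --tags origin
  git fetch origin tag v7.15   # fetch just one intermediate tag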

>
>> Doing this automatically would be working around a
>> configuration issue on a particular server, which is generally better
>> fixed in other ways.
> It is not only a configuration issue for one particular server. Git in
> general is hardly usable on slow lines because
> - it doesn't show the volume of data that is to be downloaded!

How would showing the amount of data help your connection?

> - it doesn't allow the user to sync up in steps that the circumstances will
> allow to succeed.

This is unfortunate in some circumstances, but you haven't shown that
yours is one of them.


   cmn


