git@vger.kernel.org mailing list mirror (one of many)
* git-daemon on NSLU2
@ 2007-08-24  5:54 Jon Smirl
  2007-08-24  6:21 ` Shawn O. Pearce
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-24  5:54 UTC (permalink / raw)
  To: Git Mailing List

Any ideas on why git protocol clone is failing?

2007-08-24_20:51:33.85649 [9758] Connection from 72.74.92.181:19367
2007-08-24_20:51:33.85828 [9758] Extended attributes (33 bytes) exist <host=git.jonsmirl.is-a-geek.net>
2007-08-24_20:51:33.96990 [9758] Request upload-pack for '/home/git/mpc5200b.git'
2007-08-24_20:51:45.00789 fatal: Out of memory? mmap failed: Cannot allocate memory
2007-08-24_20:51:45.08746 error: git-upload-pack: git-rev-list died with error.
2007-08-24_20:51:45.08771 fatal: git-upload-pack: aborting due to possible repository corruption on the remote side.

NSLU2 ($70) is 266Mhz ARM with 32MB memory.
It's running Debian on a 250GB disk with 180MB swap.

Watching top the process runs up to about 60MB in virtual size and exits.
Setting the window down made no difference  packedGitWindowSize = 4194304
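
For context on the setting above: a pack "window" is a bounded slice of a pack file that is mapped into memory at one time. The following is a minimal Python sketch of windowed mapping in general, not git's actual implementation. The function name `read_via_window` is hypothetical; only the 4194304-byte figure comes from the message.

```python
import mmap

WINDOW_SIZE = 4194304  # 4 MiB, the packedGitWindowSize value quoted above


def read_via_window(path, offset, length, window_size=WINDOW_SIZE):
    """Read `length` bytes at `offset`, mapping at most one window at a time."""
    if window_size % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("window size must be a multiple of the mmap granularity")
    out = bytearray()
    with open(path, "rb") as f:
        file_size = f.seek(0, 2)  # seek to the end to learn the file size
        end = min(offset + length, file_size)
        pos = offset
        while pos < end:
            # Round down to the window that contains the current position.
            start = (pos // window_size) * window_size
            map_len = min(window_size, file_size - start)
            with mmap.mmap(f.fileno(), map_len,
                           access=mmap.ACCESS_READ, offset=start) as window:
                take = min(end, start + map_len) - pos
                out += window[pos - start:pos - start + take]
                pos += take
    return bytes(out)
```

Smaller windows cap how much address space a single mapping needs, but every access still has to be backed by RAM or by disk reads.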

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: git-daemon on NSLU2
  2007-08-24  5:54 git-daemon on NSLU2 Jon Smirl
@ 2007-08-24  6:21 ` Shawn O. Pearce
  2007-08-24 19:38   ` Jon Smirl
  0 siblings, 1 reply; 30+ messages in thread
From: Shawn O. Pearce @ 2007-08-24  6:21 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> Any ideas on why git protocol clone is failing?
> 
> 2007-08-24_20:51:33.85649 [9758] Connection from 72.74.92.181:19367
> 2007-08-24_20:51:33.85828 [9758] Extended attributes (33 bytes) exist
> <host=git.jonsmirl.is-a-geek.net>
> 2007-08-24_20:51:33.96990 [9758] Request upload-pack for
> '/home/git/mpc5200b.git'
> 2007-08-24_20:51:45.00789 fatal: Out of memory? mmap failed: Cannot
> allocate memory
> 2007-08-24_20:51:45.08746 error: git-upload-pack: git-rev-list died with error.
> 2007-08-24_20:51:45.08771 fatal: git-upload-pack: aborting due to
> possible repository corruption on the remote side.
> 
> NSLU2 ($70) is 266Mhz ARM with 32MB memory.
> It's running Debian on a 250GB disk with 180MB swap.
> 
> Watching top the process runs up to about 60MB in virtual size and exits.
> Setting the window down made no difference  packedGitWindowSize = 4194304

ulimits?  packedGitLimit may also need to be decreased?  Though we
always try to free unused windows before we declare we are out
of memory...
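
The ulimit question can be checked from inside the environment that starts the daemon. A small sketch, assuming a Unix system and Python's standard `resource` module; the helper name `describe_limit` is illustrative only:

```python
import resource


def describe_limit(name):
    """Return the soft and hard values of one resource limit as text."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return f"{name}: soft={fmt(soft)}, hard={fmt(hard)}"


# An address-space or data-segment limit can make mmap fail with
# "Cannot allocate memory" even when swap is still available.
for limit_name in ("RLIMIT_AS", "RLIMIT_DATA"):
    print(describe_limit(limit_name))
```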

-- 
Shawn.


* Re: git-daemon on NSLU2
  2007-08-24  6:21 ` Shawn O. Pearce
@ 2007-08-24 19:38   ` Jon Smirl
  2007-08-24 20:23     ` Nicolas Pitre
  2007-08-24 20:27     ` Jon Smirl
  0 siblings, 2 replies; 30+ messages in thread
From: Jon Smirl @ 2007-08-24 19:38 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Git Mailing List

I'm still trying to debug git-daemon

I do find it surprising that git-index-pack can't be happy within
20MB of RAM and it has to continuously swap its 30MB of virtual. My
disk is chattering itself to death. It stayed that way for 40 minutes.

I'm practicing on the kernel tree.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-24 19:38   ` Jon Smirl
@ 2007-08-24 20:23     ` Nicolas Pitre
  2007-08-24 21:17       ` Jon Smirl
  2007-08-24 20:27     ` Jon Smirl
  1 sibling, 1 reply; 30+ messages in thread
From: Nicolas Pitre @ 2007-08-24 20:23 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:

> I'm still trying to debug git-daemon
> 
> I do find it surprising that git-index-pack can't be happy within
> 20MB of RAM and it has to continuously swap its 30MB of virtual. My
> disk is chattering itself to death. It stayed that way for 40 minutes.
> 
> I'm practicing on the kernel tree.

You hope for miracles, do you?  ;-)

Please stop hammering that poor little NSLU2 with such a workset, or 
hack some additional 224MB of RAM into it.  There is no magical 
solution.


Nicolas


* Re: git-daemon on NSLU2
  2007-08-24 19:38   ` Jon Smirl
  2007-08-24 20:23     ` Nicolas Pitre
@ 2007-08-24 20:27     ` Jon Smirl
  1 sibling, 0 replies; 30+ messages in thread
From: Jon Smirl @ 2007-08-24 20:27 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Git Mailing List

Not sure what I did but I have git-daemon working on the NSLU2 now.

It is unusable with 32MB physical memory.  I am 2hrs into the clone of
the kernel repository and it has only counted 9,500 objects and used
100min CPU time. There are 540,000 objects in the repository.

Disk is chattering insanely, I'm way IO bound.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 6  2  37960    972    168  11952  160   64  1748    64 2224 2233  5 28  0 67
 4  2  37960   1012    176  11756  168    0  2424     0 2517 2780 10 29  0 61
 2  2  37960    944    200  11792  152    0  1456    88 2102 2067  6 21  0 73
 2  2  37960   1120    180  11620  120    0  1180     0 2106 2122  4 21  0 75
 2  2  37960   1044    180  11788   76   28  1800    28 2255 2275  7 27  0 66
 4  3  37960   1144    176  11436   68    0  1896    12 2384 2553  7 23  0 70
 4  1  37972    992    188  11932   44  188  1148   188 1910 1731  3 18  0 79
 3  2  37976    804    196  12008  336   16  2104   112 2353 2490 13 22  0 65
 2  2  37976   1068    164  11720   96    8  2008     8 2502 2731  5 36  0 59
 2  2  37976   1280    184  11528  140    8  1332    36 2054 1956  7 26  0 67
 4  2  37976   1028    200  11552  264   16   956    16 1855 1710  4 20  0 76
 2  2  37976    844    192  11680  144    8  1576     8 2206 2307  5 31  0 64
 3  1  37984   1304    172  11264   92   28  1444    52 1998 1887  5 23  0 72
 2  2  38000   1012    168  11680  124   84  1896   192 2385 2486  3 30  0 67
 5  2  38008    928    164  11916  136   20  1776    20 2256 2308 11 22  0 67
 2  3  38008   1168    184  11704  144   20  1820    32 2163 2186  5 24  0 71
 4  4  38016    816    156  11784  248   32  1828    44 2328 2422  2 24  0 74
 4  1  38020   1476    160  11448  152  104  2080   116 1925 1728  3 24  0 73
 2  5  38028    828    192  12140  240  140  1768   232 2319 2226  4 29  0 68
 2  2  38020   1136    172  11880  156   16  1764    72 2081 2020  3 20  0 77
 2  3  38060   1040    172  12016  188  140  2056   140 2180 2182  6 26  0 68

root     11241  0.3  0.0    104    24 ?        Ss   06:54   0:07 runsv git-daemon
gitlog   11242  0.0  0.1    124    40 ?        S    06:54   0:01 svlogd -tt /var/log/git-daemon
root     11335  0.0  0.4   1620   140 pts/0    S+   06:56   0:00 strace git-daemon --verbose --export-all /home/git
root     11336  0.0  0.4   1808   144 pts/0    S+   06:56   0:00 git-daemon --verbose --export-all /home/git
root     11344  0.1  1.0  60240   328 pts/0    S+   06:56   0:02 /usr/local/bin/git-upload-pack --strict --timeout=0 .
root     11349  6.5 50.8 171868 15240 pts/0    D+   06:56   2:09 /usr/local/bin/git-upload-pack --strict --timeout=0 .
root     11350  0.6 14.6  16392  4380 pts/0    S+   06:56   0:12 /usr/local/bin git-pack-objects --stdout --progress --delta-base-offset


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-24 20:23     ` Nicolas Pitre
@ 2007-08-24 21:17       ` Jon Smirl
  2007-08-24 21:54         ` Nicolas Pitre
                           ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jon Smirl @ 2007-08-24 21:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Git Mailing List

On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 24 Aug 2007, Jon Smirl wrote:
>
> > I'm still trying to debug git-daemon
> >
> > I do find it surprising that git-index-pack can't be happy within
> > 20MB of RAM and it has to continuously swap its 30MB of virtual. My
> > disk is chattering itself to death. It stayed that way for 40 minutes.
> >
> > I'm practicing on the kernel tree.
>
> You hope for miracles, do you?  ;-)

We're doing something wrong in git-daemon. I can clone the tree in
five minutes using the http protocol. Using the git protocol would
take 24hrs if I let it finish.


> Please stop hammering that poor little NSLU2 with such a workset, or
> hack some additional 224MB of RAM into it.  There is no magical
> solution.
>
>
> Nicolas
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-24 21:17       ` Jon Smirl
@ 2007-08-24 21:54         ` Nicolas Pitre
  2007-08-24 22:06         ` Jon Smirl
  2007-08-24 23:28         ` Linus Torvalds
  2 siblings, 0 replies; 30+ messages in thread
From: Nicolas Pitre @ 2007-08-24 21:54 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:

> On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> > On Fri, 24 Aug 2007, Jon Smirl wrote:
> >
> > > I'm still trying to debug git-daemon
> > >
> > > I do find it surprising that git-index-pack can't be happy within
> > > 20MB of RAM and it has to continuously swap its 30MB of virtual. My
> > > disk is chattering itself to death. It stayed that way for 40 minutes.
> > >
> > > I'm practicing on the kernel tree.
> >
> > You hope for miracles, do you?  ;-)
> 
> We're doing something wrong in git-daemon. I can clone the tree in
> five minutes using the http protocol. Using the git protocol would
> take 24hrs if I let it finish.

The http protocol is merely a dumb file copy with no packing
optimization whatsoever.

The native protocol performs a whole lot more to provide clients with only
the minimum data needed.

Try running "git repack -a" directly on the NSLU2.  You should have the 
same performance problems as with a clone.


Nicolas


* Re: git-daemon on NSLU2
  2007-08-24 21:17       ` Jon Smirl
  2007-08-24 21:54         ` Nicolas Pitre
@ 2007-08-24 22:06         ` Jon Smirl
  2007-08-24 22:39           ` Jakub Narebski
  2007-08-25  0:10           ` Nicolas Pitre
  2007-08-24 23:28         ` Linus Torvalds
  2 siblings, 2 replies; 30+ messages in thread
From: Jon Smirl @ 2007-08-24 22:06 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Git Mailing List

On 8/24/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> We're doing something wrong in git-daemon. I can clone the tree in
> five minutes using the http protocol. Using the git protocol would
> take 24hrs if I let it finish.

20Mb/s to kernel.org
time git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
real    2m34.629s

20Mb/s to kernel.org
time git clone http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
real    3m52.203s

Same kernel from my NSLU2 over http (100Mb/s)
time git clone http://jonsmirl.is-a-geek.net/apache2-default/mpc.git
real    2m36.227s

Using git protocol to nslu2 takes 24hrs

On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> Try running "git repack -a" directly on the NSLU2.  You should have the
> same performance problems as with a clone.

This is true, it would take over 24hrs to finish.

Is there a reason why initial clone hasn't been special cased? Why
can't initial clone just blast over the pack file already sitting on
the disk?

I also wonder if a little application of some sorting to in-memory
data structures could help with the random IO patterns. I'm getting
the same data out of a stupid HTTP server and it doesn't go all IO
bound on me so a solution has to be possible.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-24 22:06         ` Jon Smirl
@ 2007-08-24 22:39           ` Jakub Narebski
  2007-08-24 22:59             ` Junio C Hamano
  2007-08-24 23:46             ` Jon Smirl
  2007-08-25  0:10           ` Nicolas Pitre
  1 sibling, 2 replies; 30+ messages in thread
From: Jakub Narebski @ 2007-08-24 22:39 UTC (permalink / raw)
  To: git

Jon Smirl wrote:
> On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:

>> Try running "git repack -a" directly on the NSLU2.  You should have the
>> same performance problems as with a clone.
> 
> This is true, it would take over 24hrs to finish.
> 
> Is there a reason why initial clone hasn't been special cased? Why
> can't initial clone just blast over the pack file already sitting on
> the disk?

There was idea to special case clone (just concatenate the packs, the
receiving side as someone told there can detect pack boundaries; do not
forget to pack loose objects, first), instead of using generic fetch --all
for clone, but no code. Code speaks louder than words (although if someone
would provide details of pack boundary detection...)

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: git-daemon on NSLU2
  2007-08-24 22:39           ` Jakub Narebski
@ 2007-08-24 22:59             ` Junio C Hamano
  2007-08-24 23:21               ` Jakub Narebski
  2007-08-24 23:46             ` Jon Smirl
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2007-08-24 22:59 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> There was idea to special case clone (just concatenate the packs, the
> receiving side as someone told there can detect pack boundaries; do not
> forget to pack loose objects, first), instead of using generic fetch --all
> for clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

I have to say that "although ..." part of that statement
disqualifies this to be called an "idea".

Really, I find that you (yes, in this case I am not generalizing
but talking specifically about you) tend to overuse the word
"idea" when you talk things that are not yet even at that stage
yet.


* Re: git-daemon on NSLU2
  2007-08-24 22:59             ` Junio C Hamano
@ 2007-08-24 23:21               ` Jakub Narebski
  0 siblings, 0 replies; 30+ messages in thread
From: Jakub Narebski @ 2007-08-24 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> There was idea to special case clone (just concatenate the packs, the
>> receiving side as someone told there can detect pack boundaries; do not
>> forget to pack loose objects, first), instead of using generic fetch --all
>> for clone, but no code. Code speaks louder than words (although if someone
>> would provide details of pack boundary detection...)
> 
> I have to say that "although ..." part of that statement
> disqualifies this to be called an "idea".

Ermm... if I remember correctly, during that discussion (a single subthread)
details, or at least an idea, were provided of how to separate
concatenated packs into individual packs. Unfortunately I haven't
saved the message, and do not remember enough of it to search the archives...

I should have written "remind" instead of "provide" there...

> Really, I find that you (yes, in this case I am not generalizing
> but talking specifically about you) tend to overuse the word
> "idea" when you talk things that are not yet even at that stage
> yet.

I'm not native English speaker... ;-)

Seriously, it's a fault of mine...

-- 
Jakub Narebski
Poland


* Re: git-daemon on NSLU2
  2007-08-24 21:17       ` Jon Smirl
  2007-08-24 21:54         ` Nicolas Pitre
  2007-08-24 22:06         ` Jon Smirl
@ 2007-08-24 23:28         ` Linus Torvalds
  2007-08-25 15:44           ` Jon Smirl
  2 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2007-08-24 23:28 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List



On Fri, 24 Aug 2007, Jon Smirl wrote:
> 
> We're doing something wrong in git-daemon.

Nope.

Or rather, it's mostly by design.

> I can clone the tree in five minutes using the http protocol. Using the 
> git protocol would take 24hrs if I let it finish.

The http side doesn't actually do any global verification, the way 
git-daemon does. So to it, everything is just temporary buffers, and you 
don't need any memory at all, really.

git-daemon will create a packfile. That means that it has to generate the 
*global* object reachability, and will then optimize the object packing 
etc etc. That's a minimum of something like 48 bytes per object for just 
the object chains, and the kernel has a *lot* of objects (over half a 
million).
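
As a worked version of the estimate above, here is a small Python sketch. The 48-byte figure is the rough minimum given in this message; the object count and the 32MB of RAM are the numbers reported earlier in the thread; the function name is illustrative only.

```python
BYTES_PER_OBJECT = 48    # rough minimum per object for the object chains
OBJECT_COUNT = 540_000   # objects in the kernel repository, per the thread
PHYSICAL_RAM_MIB = 32    # NSLU2 memory, per the thread


def chain_memory_mib(objects, bytes_per_object=BYTES_PER_OBJECT):
    """Lower bound on memory for the object chains alone, in MiB."""
    return objects * bytes_per_object / (1024 * 1024)


needed = chain_memory_mib(OBJECT_COUNT)
share = needed / PHYSICAL_RAM_MIB
print(f"{needed:.1f} MiB for object chains, {share:.0%} of physical RAM")
```

That comes to roughly 24.7 MiB, about three quarters of the machine's RAM, before any tree or commit data is cached.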

In addition to the object chains yourself, the native protocol will also 
obviously have to actually *look* at and parse all the tree and commit 
objects while it does all this, so while it doesn't necessarily keep all 
of those in memory all the time, it will need to access them, and if you 
don't have enough memory to cache them, that will add its own set of IO.

So I haven't checked exactly how much memory you really want to have to 
serve big projects, but with some handwavy guesstimate, if you actually 
want to do a good job I'd guess that you really want to have at least as 
much memory as the size of largest project you are serving, and probably 
add at least 10-20% on top of that.

So for the kernel, at a guess, you'd probably want to have at least 256MB 
of RAM to do a half-way good job. 512MB is likely nicer and allows you to 
actually cache the stuff over multiple accesses.
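
The rule of thumb above can be written out as a tiny calculation. This is only a sketch of the guesstimate, assuming the roughly 200MB pack size mentioned elsewhere in this thread as the project size; the function name is hypothetical.

```python
def recommended_ram_mb(project_mb, headroom=(0.10, 0.20)):
    """Project size plus 10-20% headroom, per the handwavy estimate."""
    low, high = headroom
    return project_mb * (1 + low), project_mb * (1 + high)


low, high = recommended_ram_mb(200)  # assumed ~200MB kernel pack
print(f"{low:.0f}-{high:.0f} MB")
```

Under that assumption the range is 220-240 MB, which is in line with the 256MB suggested for a half-way good job.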

But I haven't actually tested. Maybe it might be bearable at 128M.

			Linus


* Re: git-daemon on NSLU2
  2007-08-24 22:39           ` Jakub Narebski
  2007-08-24 22:59             ` Junio C Hamano
@ 2007-08-24 23:46             ` Jon Smirl
  2007-08-25  0:04               ` Junio C Hamano
  1 sibling, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-24 23:46 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
> There was idea to special case clone (just concatenate the packs, the
> receiving side as someone told there can detect pack boundaries; do not
> forget to pack loose objects, first), instead of using generic fetch --all
> for clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

A related concept, initial clone of a repository does the equivalent
of repack -a on the repo before transmitting it. Why aren't we saving
those results by switching the repo onto the new pack file? Then the
next clone that comes along won't have to do anything but send the
file.

But this logic can be flipped around, if the remote needs any object
from the pack file, just send them the whole pack file and let the
remote sort it out. Using this logic you can still minimize the IO
statistically.

When a remote does a fetch you have to pack all of the loose objects.
When the loose object pile reaches 20MB or so, the fetch can trigger a
repack of the oldest half into a pack that is kept by the tree and
replaces those older loose objects. For future fetches simply apply
the rule of sending the whole pack if any object is needed.
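
The sweep rule described above could be sketched as follows. This is a hypothetical illustration of the proposal, not anything git implements; `LooseObject`, `select_for_sweep`, and the use of modification time as the age are all assumptions.

```python
from typing import List, NamedTuple


class LooseObject(NamedTuple):
    object_id: str
    size: int      # bytes on disk
    mtime: float   # modification time; smaller means older


THRESHOLD_BYTES = 20 * 1024 * 1024  # "20MB or so" from the proposal


def select_for_sweep(loose: List[LooseObject],
                     threshold: int = THRESHOLD_BYTES) -> List[LooseObject]:
    """Pick the oldest loose objects, up to half the total bytes, once
    the loose pile has reached the threshold; otherwise pick nothing."""
    total = sum(obj.size for obj in loose)
    if total < threshold:
        return []
    selected, swept = [], 0
    for obj in sorted(loose, key=lambda o: o.mtime):
        if swept >= total / 2:
            break
        selected.append(obj)
        swept += obj.size
    return selected
```

The selected objects would go into a new kept pack; the newer half stays loose until the next sweep.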

The repack of the 10MB of older objects can be kicked out to another
process and copied into the tree when it is finished. At that point
the loose objects can be deleted. The git db can tolerate a process
copying in a new packfile and deleting the old objects while other
processes may be using the database, right?

This model shouldn't statistically change the amount of data very
much. If you haven't synced your tree in a month a few too many
objects may get sent to you. However, it should dramatically reduce
the IO load on the server caused by git protocol initial clones.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-24 23:46             ` Jon Smirl
@ 2007-08-25  0:04               ` Junio C Hamano
  2007-08-25  7:12                 ` David Kastrup
  2007-08-25 17:02                 ` Salikh Zakirov
  0 siblings, 2 replies; 30+ messages in thread
From: Junio C Hamano @ 2007-08-25  0:04 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Jakub Narebski, git

"Jon Smirl" <jonsmirl@gmail.com> writes:

> On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
>> There was idea to special case clone (just concatenate the packs, the
>> receiving side as someone told there can detect pack boundaries; do not
>> forget to pack loose objects, first), instead of using generic fetch --all
>> for clone, but no code. Code speaks louder than words (although if someone
>> would provide details of pack boundary detection...)
>
> A related concept, initial clone of a repository does the equivalent
> of repack -a on the repo before transmitting it. Why aren't we saving
> those results by switching the repo onto the new pack file? Then the
> next clone that comes along won't have to do anything but send the
> file.

If the majority of the access to your repository is the initial
clone request, then it might be a worthwhile thing to do.  In
fact didn't we use to have such a "pre-prepared pack" support?

But I do not think "majority is initial clone" is the norm.
Even among the people who do an "initial clone" (from the
end-user perspective), what they do may not be the initial full
clone your special hack helps (and that was one of the reasons
we dropped the pre-prepared pack support --- "been there, done
that" to some extent).

 - If your client "clone"s only a single branch by doing:

	$ git init
	$ git remote add origin $remote_url
        $ git pull origin master

   the set of objects you need to send would be different
   (slightly smaller) than the normal clone.

 - Another example would be a client that uses --reference:

	$ git clone --reference neigh.git git://yourbox/repo.git

   which would give you a request that is different from the
   usual initial full clone request.


* Re: git-daemon on NSLU2
  2007-08-24 22:06         ` Jon Smirl
  2007-08-24 22:39           ` Jakub Narebski
@ 2007-08-25  0:10           ` Nicolas Pitre
  1 sibling, 0 replies; 30+ messages in thread
From: Nicolas Pitre @ 2007-08-25  0:10 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:

> On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> > Try running "git repack -a" directly on the NSLU2.  You should have the
> > same performance problems as with a clone.
> 
> This is true, it would take over 24hrs to finish.
> 
> Is there a reason why initial clone hasn't been special cased? Why
> can't initial clone just blast over the pack file already sitting on
> the disk?

What is the gain?  You'll get back to the same performance problem
eventually with some fetch operation, unless you intend to serve clients
with the whole pack every time just like the http protocol does.

Also you don't want people cloning from you getting stuff that sits in 
your reflog.  The native protocol makes sure that only the needed 
objects are sent over and no more.
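
The reachability idea can be illustrated with a toy graph walk. This is a simplified sketch over commits only (the real walk also covers trees and blobs), and the function and object names are made up for the example.

```python
from collections import deque


def reachable_objects(parents, tips):
    """Walk parent links from the advertised tips and collect every
    object reached. `parents` maps an object id to its parent ids."""
    seen = set()
    queue = deque(tips)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        queue.extend(parents.get(obj, ()))
    return seen


history = {
    "c3": ["c2"],
    "c2": ["c1"],
    "c1": [],
    "rewound": ["c1"],  # only referenced from a reflog, not from any ref
}
print(sorted(reachable_objects(history, tips=["c3"])))
```

Anything not reached from the tips, such as the `rewound` commit here, is never sent, whereas copying the pack file wholesale would include it.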

> I also wonder if a little application of some sorting to in-memory
> data structures could help with the random IO patterns. I'm getting
> the same data out of a stupid HTTP server and it doesn't go all IO
> bound on me so a solution has to be possible.

The http application is, indeed, stupid.  It performs no reachability 
analysis, no repacking, no nothing except copying the bits over.

And yes I did add some sorting optimizations in this round, so if you
try 1.5.3-* you should have them.  But there is a limit to what can be
done.

Point is, if you want serious Git serving, and not only _dumb_ protocols 
(http is one of them) then you need more RAM.  The NSLU2 is cool, but 
maybe not appropriate for serving the Linux kernel natively with Git.


Nicolas


* Re: git-daemon on NSLU2
  2007-08-25  0:04               ` Junio C Hamano
@ 2007-08-25  7:12                 ` David Kastrup
  2007-08-25 17:02                 ` Salikh Zakirov
  1 sibling, 0 replies; 30+ messages in thread
From: David Kastrup @ 2007-08-25  7:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jon Smirl, Jakub Narebski, git

Junio C Hamano <gitster@pobox.com> writes:

> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
>> On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
>>> There was idea to special case clone (just concatenate the packs, the
>>> receiving side as someone told there can detect pack boundaries; do not
>>> forget to pack loose objects, first), instead of using generic fetch --all
>>> for clone, but no code. Code speaks louder than words (although if someone
>>> would provide details of pack boundary detection...)
>>
>> A related concept, initial clone of a repository does the equivalent
>> of repack -a on the repo before transmitting it. Why aren't we saving
>> those results by switching the repo onto the new pack file? Then the
>> next clone that comes along won't have to do anything but send the
>> file.
>
> If the majority of the access to your repository is the initial
> clone request, then it might be a worthwhile thing to do.  In fact
> didn't we use to have such a "pre-prepared pack" support?
>
> But I do not think "majority is initial clone" is the norm.

Well, as long as the majority is not affected negatively, catering for
a minority better is a strict improvement.  Most repositories will
never get cloned and won't be affected.  But there are some
repositories with a non-trivial amount of cloning.

> Even among the people who do an "initial clone" (from the
> end-user perspective), what they do may not be the initial full
> clone your special hack helps (and that was one of the reasons
> we dropped the pre-prepared pack support --- "been there, done
> that" to some extent).

If it doesn't get used, its presence does no harm, except of course
for having to be maintained and tested.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: git-daemon on NSLU2
  2007-08-24 23:28         ` Linus Torvalds
@ 2007-08-25 15:44           ` Jon Smirl
  2007-08-26  9:33             ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-25 15:44 UTC (permalink / raw)
  To: Linus Torvalds, jnareb; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 8/24/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > I can clone the tree in five minutes using the http protocol. Using the
> > git protocol would take 24hrs if I let it finish.
>
> The http side doesn't actually do any global verification, the way
> git-daemon does. So to it, everything is just temporary buffers, and you
> don't need any memory at all, really.
>
> git-daemon will create a packfile. That means that it has to generate the
> *global* object reachability, and will then optimize the object packing
> etc etc. That's a minimum of something like 48 bytes per object for just
> the object chains, and the kernel has a *lot* of objects (over half a
> million).

A large, repeating work load is created in this process when you take
a 200MB pack, repack it to add a few loose objects and then don't save
the results. This model makes the NSLU2 unusable, but I also see it at
my shared hosting provider. Initial clones of a repo that take 3min
from kernel.org take 25min on a shared host since the RAM is not
dedicated.

There are three categories of fetches:
1) initial clone, fetch all
2) fetch recent
3) I haven't fetched in three months

99% of fetches fall in the first two categories.

A very simple solution is to sendfile() existing packs if they contain
any objects that the client wants and let the client deal with the
unwanted objects. Yes this does send extra traffic over the net, but
the only group significantly impacted is #3 which is the most
infrequent group.
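
The proposed rule, send any existing pack whole if it holds at least one wanted object, could be sketched like this. The function and pack names are hypothetical, and packs are modelled simply as sets of object ids.

```python
def packs_to_send(packs, wanted):
    """packs: dict mapping pack name -> set of object ids it contains.
    wanted: set of object ids the client is missing.
    Returns (packs sent whole, objects still needed, unwanted extras)."""
    chosen = [name for name, objects in packs.items() if objects & wanted]
    covered = set().union(*(packs[name] for name in chosen))
    still_needed = wanted - covered  # e.g. loose objects, handled as today
    extra = covered - wanted         # overhead the client has to discard
    return chosen, still_needed, extra


packs = {"old.pack": {"a", "b", "c"}, "recent.pack": {"d", "e"}}
chosen, still_needed, extra = packs_to_send(packs, wanted={"e", "f"})
print(chosen, still_needed, extra)
```

The `extra` set is the cost of the scheme: objects transmitted even though the client did not ask for them.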

Loose objects are handled as they are currently. To optimize this
scheme you need to let the loose objects build up at the server and
then periodically sweep only the older ones into a pack. Packing the
entire repo into a single pack would cause recent fetches to retrieve
the entire pack.

Initial clone can be optimized further by recognizing that the
receiving repository is empty and sending them everything; no need to
compute which objects are missing at the server. This method will
speed up initial clone since the existing pack can be immediately sent
instead of waiting on a pack file to be built. Build the loose object
pack in parallel with sending the existing packs.

I recognize that in the case of cloning a single branch or --reference
too many objects will also be transmitted but I believe the benefits
of reducing the server load outweigh the overhead of transmitting
extra objects in this case. You can always remove the extra objects on
the client side.

On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
> There was idea to special case clone (just concatenate the packs, the
> receiving side as someone told there can detect pack boundaries; do not
> forget to pack loose objects, first), instead of using generic fetch --all
> for clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

Write the file name and length into the socket before sending the
pack. Use sendfile() or its current incarnation to actually send the
pack. Insert these header lines between packs.
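
A minimal sketch of that framing, assuming a made-up header of the form "name length" on one line before each pack. This is not part of the git protocol; `send_packs` and `receive_packs` are hypothetical names. Python's `socket.sendfile()` uses the kernel's sendfile where available and falls back to plain sends otherwise.

```python
import os
import socket


def send_packs(sock, paths):
    """Send each file preceded by a 'name length' header line."""
    for path in paths:
        size = os.path.getsize(path)
        header = f"{os.path.basename(path)} {size}\n".encode()
        sock.sendall(header)
        with open(path, "rb") as pack:
            sock.sendfile(pack)  # no repacking, just copy the bytes out
    sock.shutdown(socket.SHUT_WR)  # signal that no more packs follow


def receive_packs(sock):
    """Read header lines and use the length to find pack boundaries."""
    received = {}
    with sock.makefile("rb") as stream:
        while True:
            header = stream.readline()
            if not header:
                break  # sender closed its side
            name, size = header.decode().rstrip("\n").rsplit(" ", 1)
            received[name] = stream.read(int(size))
    return received
```

Because the length is sent up front, the receiver never has to inspect pack contents to know where one pack ends and the next begins.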

> In addition to the object chains yourself, the native protocol will also
> obviously have to actually *look* at and parse all the tree and commit
> objects while it does all this, so while it doesn't necessarily keep all
> of those in memory all the time, it will need to access them, and if you
> don't have enough memory to cache them, that will add its own set of IO.
>
> So I haven't checked exactly how much memory you really want to have to
> serve big projects, but with some handwavy guesstimate, if you actually
> want to do a good job I'd guess that you really want to have at least as
> much memory as the size of largest project you are serving, and probably
> add at least 10-20% on top of that.
>
> So for the kernel, at a guess, you'd probably want to have at least 256MB
> of RAM to do a half-way good job. 512MB is likely nicer and allows you to
> actually cache the stuff over multiple accesses.
>
> But I haven't actually tested. Maybe it might be bearable at 128M.
>
>                         Linus
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-25  0:04               ` Junio C Hamano
  2007-08-25  7:12                 ` David Kastrup
@ 2007-08-25 17:02                 ` Salikh Zakirov
  1 sibling, 0 replies; 30+ messages in thread
From: Salikh Zakirov @ 2007-08-25 17:02 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:
> But I do not think "majority is initial clone" is the norm.
> Even among the people who do an "initial clone" (from the
> end-user perspective), what they do may not be the initial full
> clone your special hack helps (and that was one of the reasons
> we dropped the pre-prepared pack support --- "been there, done
> that" to some extent).

FWIW, at my previous job the release engineering team used git
in a special way involving lots of initial clones.

The project itself was kept under SVN, and several machines
were doing continuous builds, starting from scratch.
Unfortunately, doing from scratch checkouts from SVN was not
an option because of high SVN checkout overhead, and machines
did a git-clone of imported repository instead.

Obviously using --reference would have saved even more on initial clone,
but the release team consisting of a pregnant woman and an
intern student had neither time nor inclination to learn
git any deeper than were strictly necessary to get the job done.
Apparently, pure git-clone performance was good enough.


* Re: git-daemon on NSLU2
  2007-08-25 15:44           ` Jon Smirl
@ 2007-08-26  9:33             ` Jeff King
  2007-08-26 16:34               ` Jon Smirl
  2007-08-27  0:14               ` Jakub Narebski
  0 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2007-08-26  9:33 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Linus Torvalds, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List

On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote:

> A very simple solution is to sendfile() existing packs if they contain
> any objects that the client wants and let the client deal with the
> unwanted objects. Yes this does send extra traffic over the net, but
> the only group significantly impacted is #2 which is the most
> infrequent group.
>
> Loose objects are handled as they are currently. To optimize this
> scheme you need to let the loose objects build up at the server and
> then periodically sweep only the older ones into a pack. Packing the
> entire repo into a single pack would cause recent fetches to retrieve
> the entire pack.

I was about to write "but then 'fetch recent' clients will have to get
the entire repo after the upstream does a 'git-repack -a -d'" but you
seem to have figured that out already.

I'm unclear: are you proposing new behavior for git-daemon in general,
or a special mode for resource-constrained servers? If general behavior,
are you suggesting that we never use 'git-repack -a' on repos which
might be cloned?

-Peff


* Re: git-daemon on NSLU2
  2007-08-26  9:33             ` Jeff King
@ 2007-08-26 16:34               ` Jon Smirl
  2007-08-26 17:15                 ` Linus Torvalds
  2007-08-27  0:14               ` Jakub Narebski
  1 sibling, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-26 16:34 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List

On 8/26/07, Jeff King <peff@peff.net> wrote:
> On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote:
>
> > A very simple solution is to sendfile() existing packs if they contain
> > any objects that the client wants and let the client deal with the
> > unwanted objects. Yes this does send extra traffic over the net, but
> > the only group significantly impacted is #2 which is the most
> > infrequent group.
> >
> > Loose objects are handled as they are currently. To optimize this
> > scheme you need to let the loose objects build up at the server and
> > then periodically sweep only the older ones into a pack. Packing the
> > entire repo into a single pack would cause recent fetches to retrieve
> > the entire pack.
>
> I was about to write "but then 'fetch recent' clients will have to get
> the entire repo after the upstream does a 'git-repack -a -d'" but you
> seem to have figured that out already.
>
> I'm unclear: are you proposing new behavior for git-daemon in general,
> or a special mode for resource-constrained servers? If general behavior,
> are you suggesting that we never use 'git-repack -a' on repos which
> might be cloned?

This would be a new general behavior. There are cases where git-daemon
is very resource hungry; rearranging things a little can remove this
need for everyone.

There are several ways to address the repack -a problem. But the
simplest solution may be the best: send existing packs only on an
initial clone. In all other cases continue with the current algorithm.
We could work on methods for making the middle case better, but it is
so infrequent it is probably not worth bothering with.

Changing git-daemon only for the initial clone case also means that
people don't need to change the way they manage packs.

Posters have been asking why we should worry about the initial clone,
since it isn't done that often. I agree that it isn't done that often,
but if it is done at all on my NSLU2 it will take about 40hrs to
complete. We can easily see the impact of changing the initial clone
algorithm: the http clone takes 3min.

BTW, if the NSLU2 needs a repack -a I can do it on another machine and
copy it over. Or maybe someone will write a repack that is happy in
20MB. The NSLU2 is a great home server; it is usually fast enough.
Power consumption is a tiny 8W, fine to leave on 24/7. My NSLU2 is as
powerful as the average desktop machine in the early 90's; how quickly
we forget.


>
> -Peff
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-26 16:34               ` Jon Smirl
@ 2007-08-26 17:15                 ` Linus Torvalds
  2007-08-26 18:06                   ` Jon Smirl
  2007-08-26 22:24                   ` Daniel Hulme
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2007-08-26 17:15 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List



On Sun, 26 Aug 2007, Jon Smirl wrote:
> 
> Changing git-daemon only for the initial clone case also means that
> people don't need to change the way they manage packs.

I do agree that we might want to do some special-case handling for the 
initial clone (because it *is* kind of special), but it's not necessarily 
as easy as just re-using an existing pack.

At a minimum, we'd need to have something that knows how to make a single 
pack out of several packs and some loose objects. That shouldn't be 
*hard*, but it's certainly nontrivial, especially in the presence of the 
same objects possibly being available more than once in different packs.

[ The "duplicate object" thing does actually happen: even if you use only 
  "git native" protocols, you can get duplicate objects because a file was 
  changed back to an earlier version. The incremental packs you get from 
  push/pull'ing between two repositories try to send the minimal 
  incremental changes, but the keyword here is _try_: they will 
  potentially send objects that the receiver already has, if it's not 
  obvious that the receiver has them from the "commit boundary" cases ]

Maybe the client side will handle a pack with duplicate objects perfectly 
fine, and it's not an issue. Maybe. It might even be likely (I can't think 
of anything that would obviously break). But at a minimum, it would be 
something that needs some code on the sending side, and a lot of 
verification that the end result works ok on the receiving side.

And there's actually a deeper problem: the current native protocol 
guarantees that the objects sent over are only those that are reachable. 
That matters. It matters for subtle security issues (maybe you are 
exporting some repository that was rebased, and has objects that you 
didn't *intend* to make public!), but it also matters for issues like git 
"alternates" files.

If you only ever look at a single repo, you'll never see the alternates 
issue, but if you're seriously looking at serving git repositories, I 
don't really see the "single repo" case as being at all the most common or 
interesting case. 

And if you look at something like kernel.org, the "alternates" thing is 
*much* more important than how much memory git-daemon uses! Yes, 
kernel.org would probably be much happier if git-daemon wasn't such a 
memory pig occasionally, but on the other hand, the win from using 
alternates and being able to share 99% of all objects in all the various 
related kernel repositories is actually likely to be a *bigger* memory win 
than any git-daemon memory usage, because now the disk caching works a 
hell of a lot better!

So it's not actually clear how the initial clone thing can be optimized on 
the server side.

It's easier to optimize on the *client* side: just do the initial clone 
with rsync/http (and "git gc" it on the client afterwards), and then 
change it to the git native protocol after the clone.
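
The client-side workaround above can be sketched as a short command
sequence. This is only an illustration: the URLs and directory name are
placeholders, the helper name bootstrap_commands is hypothetical, and it
assumes a git new enough to accept "git -C <dir>".

```python
def bootstrap_commands(http_url, git_url, directory):
    """Return the git invocations for: clone over a dumb transport,
    repack locally, then point the remote at the native protocol.

    The function only builds the argument lists; pass each one to
    subprocess.run(cmd, check=True) to actually execute it.
    """
    return [
        # 1. Initial clone over http: the server just serves files.
        ["git", "clone", http_url, directory],
        # 2. Repack on the client, where memory is plentiful.
        ["git", "-C", directory, "gc"],
        # 3. Later fetches use the native protocol (incremental packs).
        ["git", "-C", directory, "config", "remote.origin.url", git_url],
    ]


if __name__ == "__main__":
    # Placeholder URLs, not real servers.
    for cmd in bootstrap_commands("http://example.org/repo.git",
                                  "git://example.org/repo.git",
                                  "repo"):
        print(" ".join(cmd))
```

The point of the ordering is that the expensive step (building one
well-packed repository) happens on the client, so the server only has to
compute small incremental packs afterwards.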

That may not sound very user-friendly, but let's face it, I think there is 
exactly one person in the whole universe that tries to use an NSLU2 as a 
git server. So the "client-side workaround" is likely to affect a very 
limited number of clients ;)

		Linus


* Re: git-daemon on NSLU2
  2007-08-26 17:15                 ` Linus Torvalds
@ 2007-08-26 18:06                   ` Jon Smirl
  2007-08-26 18:26                     ` Linus Torvalds
  2007-08-26 22:24                   ` Daniel Hulme
  1 sibling, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-26 18:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List

On 8/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> And there's actually a deeper problem: the current native protocol
> guarantees that the objects sent over are only those that are reachable.
> That matters. It matters for subtle security issues (maybe you are
> exporting some repository that was rebased, and has objects that you
> didn't *intend* to make public!), but it also matters for issues like git
> "alternates" files.

Are these objects visible through the other protocols? It seems
dangerous to leave something on an open server that you want to keep
hidden.

> If you only ever look at a single repo, you'll never see the alternates
> issue, but if you're seriously looking at serving git repositories, I
> don't really see the "single repo" case as being at all the most common or
> interesting case.
>
> And if you look at something like kernel.org, the "alternates" thing is
> *much* more important than how much memory git-daemon uses! Yes,
> kernel.org would probably be much happier if git-daemon wasn't such a
> memory pig occasionally, but on the other hand, the win from using
> alternates and being able to share 99% of all objects in all the various
> related kernel repositories is actually likely to be a *bigger* memory win
> than any git-daemon memory usage, because now the disk caching works a
> hell of a lot better!

Doesn't kernel.org use alternates or something equivalent for serving
up all those nearly identical kernel trees?

I've been handling the problem locally by using remotes and fetching
all the repos I'm interested in into a single git db.

>
> So it's not actually clear how the initial clone thing can be optimized on
> the server side.
>
> It's easier to optimize on the *client* side: just do the initial clone
> with rsync/http (and "git gc" it on the client afterwards), and then
> change it to the git native protocol after the clone.

Even better, get them to clone from kernel.org and then just fetch in
the differences from my server. It's an educational problem.

How about changing initial clone to refuse to use the git protocol?

>
> That may not sound very user-friendly, but let's face it, I think there is
> exactly one person in the whole universe that tries to use an NSLU2 as a
> git server. So the "client-side workaround" is likely to affect a very
> limited number of clients ;)

I'll send you one and double the size of the user base. I have this
fancy new 20Mb FIOS connection and I can't come up with anything to
use the bandwidth on.

Anyway, I already gave up and moved on to a hosting provider. Repo is
here: http://git.digispeaker.com/ There's nothing there yet but a
clone of the 2.6 tree.
I don't think there is a solution for running a git daemon on a shared host.

Petr pointed out to me that an NSLU2 is the equivalent of a late 90's
machine, not an early 90's one, so my memory is faulty too.


>
>                 Linus
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-26 18:06                   ` Jon Smirl
@ 2007-08-26 18:26                     ` Linus Torvalds
  2007-08-26 19:00                       ` Jon Smirl
  2007-08-27 11:03                       ` Theodore Tso
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2007-08-26 18:26 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List



On Sun, 26 Aug 2007, Jon Smirl wrote:
>
> On 8/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > And there's actually a deeper problem: the current native protocol
> > guarantees that the objects sent over are only those that are reachable.
> > That matters. It matters for subtle security issues (maybe you are
> > exporting some repository that was rebased, and has objects that you
> > didn't *intend* to make public!), but it also matters for issues like git
> > "alternates" files.
> 
> Are these objects visible through the other protocols? It seems
> dangerous to leave something on an open server that you want to keep
> hidden.

They'd be visible to any stupid walker, yes. But if you're 
security-conscious, you'd simply not *allow* any stupid walkers.

One of the goals of "git-daemon" was to have a simple service that was 
"obviously secure". Now, it's debatable just how obvious the daemon is, 
but it really is pretty simple, and I do think it should be possible to 
almost statically validate that it only ever reads files, and that it will 
only ever read files that act like valid *git* data. 

Some people may care about that kind of thing. I don't know how many,  but 
it really was one of the design criteria (which is why, for example, git 
daemon will just silently close the connection if it finds something 
fishy: no fishing expeditions with bad clients trying to figure out what 
files exist on a server allowed!).

So the fact that a web server or rsync will expose everything is kind of 
irrelevant - those are *designed* to expose everything. git-daemon was 
designed *not* to do that.

> Doesn't kernel.org use alternates or something equivalent for serving
> up all those nearly identical kernel trees?

Absolutely. And that's the point. "git-daemon" will serve a nice 
individualized pack, even though any particular repository doesn't have 
one, but is really a combination of "the base Linus pack + extensions".

> > So it's not actually clear how the initial clone thing can be optimized on
> > the server side.
> >
> > It's easier to optimize on the *client* side: just do the initial clone
> > with rsync/http (and "git gc" it on the client afterwards), and then
> > change it to the git native protocol after the clone.
> 
> Even better, get them to clone from kernel.org and then just fetch in
> the differences from my server. It's an educational problem.

Yes. 

> How about changing initial clone to refuse to use the git protocol?

Absolutely not. It's quite often the best one to use (the ssh protocol 
has the exact same issues, and is the only secure protocol).

But on an NSLU2, maybe *you* want to make your server side refuse it? It 
would be easy enough: if the client doesn't report any existing SHA1's, 
you just say "I'm not going to work with you".
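
A rough sketch of that check, assuming the usual pkt-line framing of the
native protocol (four hex digits of length that include the length field
itself, with "0000" as a flush) and a request made of "want" lines
followed by "have" lines. The function names here are made up for the
example; this is not how git-daemon itself is structured.

```python
def pkt(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line (length counts its own 4 bytes)."""
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_lines(data: bytes):
    """Yield pkt-line payloads; a flush packet is yielded as None."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:          # "0000" flush packet
            yield None
            pos += 4
            continue
        if length < 4:
            raise ValueError("invalid pkt-line length")
        yield data[pos + 4:pos + length]
        pos += length


def looks_like_initial_clone(request: bytes) -> bool:
    """True if the client wants objects but reports no existing SHA1's."""
    lines = [p for p in read_pkt_lines(request) if p is not None]
    has_want = any(p.startswith(b"want ") for p in lines)
    has_have = any(p.startswith(b"have ") for p in lines)
    return has_want and not has_have


if __name__ == "__main__":
    sha = b"a" * 40  # placeholder object name
    clone = pkt(b"want " + sha + b"\n") + b"0000" + pkt(b"done\n")
    print(looks_like_initial_clone(clone))
```

A small server could drop the connection when this returns True and
serve everything else as usual. Note that "no have lines" is only a
heuristic for "initial clone".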

			Linus


* Re: git-daemon on NSLU2
  2007-08-26 18:26                     ` Linus Torvalds
@ 2007-08-26 19:00                       ` Jon Smirl
  2007-08-26 20:19                         ` Linus Torvalds
  2007-08-27 11:03                       ` Theodore Tso
  1 sibling, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-26 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List

On 8/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Doesn't kernel.org use alternates or something equivalent for serving
> > up all those nearly identical kernel trees?
>
> Absolutely. And that's the point. "git-daemon" will serve a nice
> individualized pack, even though any particular repository doesn't have
> one, but is really a combination of "the base Linus pack + extensions".

A really simple change to the git protocol would be to make the client
loop on the request. On the first request the server would see that
the client has no objects and send the "base Linus pack". The client
would then loop around and repeat the process which will trigger the
current pack building process.

Do pack files contain enough information about the heads of the object
chains for this to work? The client needs to be able to determine its
state after receiving the pack and send the info back in the next
round.

I'm not buying the security argument. If you want something kept
hidden get it out of the public db. If I know the sha of the hidden
object can't I just add a head for it and git-daemon will happily send
it and the chain up to it to me?

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-daemon on NSLU2
  2007-08-26 19:00                       ` Jon Smirl
@ 2007-08-26 20:19                         ` Linus Torvalds
  2007-08-26 21:22                           ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2007-08-26 20:19 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List



On Sun, 26 Aug 2007, Jon Smirl wrote:
> 
> A really simple change to the git protocol would be to make the client
> loop on the request. On the first request the server would see that
> the client has no objects and send the "base Linus pack". The client
> would then loop around and repeat the process which will trigger the
> current pack building process.

Jon, just give it up. The fact is, the git protocol works the right way 
already.

> I'm not buying the security argument. If you want something kept hidden 
> get it out of the public db. If I know the sha of the hidden object 
> can't I just add a head for it and git-daemon will happily send it and 
> the chain up to it to me?

That's a particularly idiotic statement.

If you know the SHA1, there can *by*definition* not be any hidden objects. 
The SHA1 depends on the object chain.
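
To make the "depends on the object chain" point concrete: an object's
name is the SHA1 of a short header plus its contents, and a commit's
contents include the names of its tree and parents. The sketch below
recomputes such names by hand; the author/committer line and the parent
values are invented for the example.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA1 over "<type> <size>\\0" followed by the object body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list, message: str) -> bytes:
    """Build a minimal commit object body (identity lines are made up)."""
    ident = "A U Thor <author@example.com> 0 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


if __name__ == "__main__":
    root = git_object_id("commit", commit_body(EMPTY_TREE, [], "root"))
    # Same tree and message, different (made-up) parent: different name.
    child_a = git_object_id("commit", commit_body(EMPTY_TREE, [root], "x"))
    child_b = git_object_id("commit", commit_body(EMPTY_TREE, ["0" * 40], "x"))
    print(root, child_a, child_b, sep="\n")
```

Because each commit name covers its parents' names, knowing a commit's
SHA1 in practice means you already have (or could only have computed it
from) everything it points at.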

		Linus


* Re: git-daemon on NSLU2
  2007-08-26 20:19                         ` Linus Torvalds
@ 2007-08-26 21:22                           ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2007-08-26 21:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

>> I'm not buying the security argument. If you want something kept hidden 
>> get it out of the public db. If I know the sha of the hidden object 
>> can't I just add a head for it and git-daemon will happily send it and 
>> the chain up to it to me?
>
> That's a particularly idiotic statement.
>
> If you know the SHA1, there can *by*definition* not be any hidden objects. 
> The SHA1 depends on the object chain.

I think what we have is even stronger --- upload-pack does not
allow asking for an arbitrary commit.  The requesting fetch-pack
side needs to pick from what is offered, and upload-pack makes
sure of that.
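
In other words, the check amounts to something like the following. This
is a toy model with invented names (validate_wants, the ref table), not
the actual upload-pack code.

```python
def validate_wants(advertised_refs: dict, wants: list) -> None:
    """Reject any requested object name that was not advertised.

    advertised_refs maps ref names to the object names the server
    offered; wants is the list of object names the client asked for.
    """
    offered = set(advertised_refs.values())
    for want in wants:
        if want not in offered:
            raise PermissionError(f"requested object was not offered: {want}")


if __name__ == "__main__":
    refs = {"refs/heads/master": "a" * 40}   # placeholder object name
    validate_wants(refs, ["a" * 40])         # fine: it was advertised
    try:
        validate_wants(refs, ["b" * 40])     # a guessed SHA1
    except PermissionError as err:
        print(err)
```

So even a client that somehow learned the name of an unreferenced object
could not simply ask for it; it would have to be reachable from one of
the advertised refs.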


* Re: git-daemon on NSLU2
  2007-08-26 17:15                 ` Linus Torvalds
  2007-08-26 18:06                   ` Jon Smirl
@ 2007-08-26 22:24                   ` Daniel Hulme
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Hulme @ 2007-08-26 22:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List


On Sun, Aug 26, 2007 at 10:15:24AM -0700, Linus Torvalds wrote:
> It's easier to optimize on the *client* side: just do the initial clone 
> with rsync/http (and "git gc" it on the client afterwards), and then 
> change it to the git native protocol after the clone.

When I was working on Xen two years ago, they did the same thing with
their Mercurial repository. They had a proper repo that handled all the
push and fetch traffic, and a cron job would periodically pull from that
into a second repo. This second one was served by http. People were
encouraged to download the seed repo and then do a fetch (from the main
one) immediately.

I don't know whether they still do that, but in any case it shows your
idea is not unprecedented.

-- 
Kanga  said to Roo,  "Drink up  your milk  first, dear, and  talk after-
wards." So Roo, who was drinking his milk, tried to say that he could do
both at once... and had to be  patted on the back  and dried for quite a
long time afterwards.                     A. A. Milne, 'Winnie-the-Pooh'



* Re: git-daemon on NSLU2
  2007-08-26  9:33             ` Jeff King
  2007-08-26 16:34               ` Jon Smirl
@ 2007-08-27  0:14               ` Jakub Narebski
  1 sibling, 0 replies; 30+ messages in thread
From: Jakub Narebski @ 2007-08-27  0:14 UTC (permalink / raw)
  To: Jeff King, Git Mailing List
  Cc: Jon Smirl, Linus Torvalds, Nicolas Pitre, Shawn O. Pearce

On Sun, Aug 26, 2007, Jeff King wrote:
> On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote:
> 
>> A very simple solution is to sendfile() existing packs if they contain
>> any objects that the client wants and let the client deal with the
>> unwanted objects. Yes this does send extra traffic over the net, but
>> the only group significantly impacted is #2 which is the most
>> infrequent group.
>>
>> Loose objects are handled as they are currently. To optimize this
>> scheme you need to let the loose objects build up at the server and
>> then periodically sweep only the older ones into a pack. Packing the
>> entire repo into a single pack would cause recent fetches to retrieve
>> the entire pack.
> 
> I was about to write "but then 'fetch recent' clients will have to get
> the entire repo after the upstream does a 'git-repack -a -d'" but you
> seem to have figured that out already.
> 
> I'm unclear: are you proposing new behavior for git-daemon in general,
> or a special mode for resource-constrained servers? If general behavior,
> are you suggesting that we never use 'git-repack -a' on repos which
> might be cloned?

I think that the "reuse existing packs if sensible" idea (instead of always
generating a new pack) is a good one, even if at first limited to the clone case.

There are nevertheless a few complications.

1. When this idea was discussed on the git mailing list some time ago, somebody
said that we don't need to implement the "multi pack" extension (which was
in the design from the beginning, to be added later, if I understand correctly);
it is enough to concatenate packs. The receiving side can then detect the
boundaries between packs and split them appropriately. But is a
concatenation of packs a proper pack? If not, then we can send a concatenation of
packs only if the client (receiving side) understands it and can split it,
which means checking for a protocol extension...

2. How do we detect that a request is for a clone? git-clone gets all remote
heads and then fetches from the heads it just received. But because fetching
refs and fetching objects are separate, I think we cannot use this sequence to
detect that we want a clone. We can use "no haves" as a heuristic to
detect a clone request, but "no haves" also occurs for the initial fetch of
a single branch (i.e. using a git-remote; git-fetch sequence instead of
git-clone).

3. The problem with alternates mentioned by Linus is not much of a problem,
as we can simply consider packs from the alternate repository/repositories.
For example, if we use a single alternate, we would send a concatenation of
the packs from this repository and from the alternate (plus a pack of the loose
objects from this repository).
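
On the question in point 1, a small experiment with the pack layout may
help. The sketch assumes the usual layout: a 12-byte header ("PACK",
version, object count, all big-endian) and a trailing 20-byte SHA1 over
everything before it. The helper names are made up, and the "pack" used
here is the degenerate empty one, just to keep the example short.

```python
import hashlib
import struct


def make_empty_pack() -> bytes:
    """Build a version-2 pack that contains zero objects."""
    header = struct.pack(">4sII", b"PACK", 2, 0)
    return header + hashlib.sha1(header).digest()


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the 12-byte pack header."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack")
    return version, count


def trailer_matches(data: bytes) -> bool:
    """Check the trailing SHA1 against the bytes that precede it."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]


if __name__ == "__main__":
    single = make_empty_pack()
    print(parse_pack_header(single), trailer_matches(single))
    print(trailer_matches(single + single))
```

Read as one pack, the concatenation fails the trailer check, and its
header only counts the objects of the first pack. So a receiver would
have to know to split the stream at pack boundaries, which is why a
protocol extension check comes up.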


We would probably want to have some heuristic (besides configuring
git-daemon) to choose between reusing existing packs (and sending them
concatenated) and generating a pack for sending. Note that for dumb
transports we have the opposite problem and the opposite idea: we always
send full packs for dumb transports; the idea was to use range downloading
(available at least for the http and ftp protocols) to download only the needed
fragments of packs. Perhaps if some % of a pack (by number of objects in the
pack or by size of the pack) is to be sent, then we reuse the pack and remove
the objects in that pack from consideration. No idea how to implement that,
though. Or if the number of objects in a pack to be sent crosses some threshold,
or generating the pack/doing reachability analysis takes too long, then reuse
existing packs.
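
The percentage idea could look roughly like this. It is only a toy model:
packs are represented as plain sets of object names, the 0.9 threshold is
an arbitrary placeholder, and the function names are invented.

```python
def should_reuse_pack(pack_objects: set, wanted: set,
                      threshold: float = 0.9) -> bool:
    """Reuse a pack if at least `threshold` of its objects are wanted."""
    if not pack_objects:
        return False
    return len(pack_objects & wanted) / len(pack_objects) >= threshold


def plan_transfer(packs: dict, wanted: set, threshold: float = 0.9):
    """Split `wanted` into packs sent as-is and objects still to be packed.

    Returns (names of reused packs, objects that need a generated pack).
    """
    reused, remaining = [], set(wanted)
    for name, objects in packs.items():
        if should_reuse_pack(objects, remaining, threshold):
            reused.append(name)
            remaining -= objects   # these no longer need consideration
    return reused, remaining


if __name__ == "__main__":
    packs = {"base": {"a", "b", "c", "d"}, "recent": {"e", "f"}}
    print(plan_transfer(packs, {"a", "b", "c", "d", "e"}))
```

In this model a reused pack may carry a few objects the client did not
ask for (up to 1 - threshold of the pack), which is the trade-off between
extra traffic and server-side packing work.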

Or you can wait for the GitTorrent protocol to be implemented, or implement
it yourself... ;-)

-- 
Jakub Narebski
Poland


* Re: git-daemon on NSLU2
  2007-08-26 18:26                     ` Linus Torvalds
  2007-08-26 19:00                       ` Jon Smirl
@ 2007-08-27 11:03                       ` Theodore Tso
  2007-08-27 16:26                         ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Theodore Tso @ 2007-08-27 11:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List

On Sun, Aug 26, 2007 at 11:26:07AM -0700, Linus Torvalds wrote:
> > How about changing initial clone to refuse to use the git protocol?
> 
> Absolutely not. It's quite often the best one to use (the ssh protocol 
> has the exact same issues, and is the only secure protocol).
> 
> But on an NSLU2, maybe *you* want to make your server side refuse it? It 
> would be easy enough: if the client doesn't report any existing SHA1's, 
> you just say "I'm not going to work with you".

What if the server sends a message which current clients interpret as
an error, and which newer clients could interpret as, "do a clone from
<this> URL, and then come back and talk to me".  Basically an
automated redirect to get the "Linus base pack" somewhere else, and
then to go back to the original server.  It certainly doesn't make
sense to change anything about the low-level protocol, but maybe a
higher level redirect would make sense, just as a user convenience thing.

       	     	      	    	 	- Ted


* Re: git-daemon on NSLU2
  2007-08-27 11:03                       ` Theodore Tso
@ 2007-08-27 16:26                         ` Linus Torvalds
  0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2007-08-27 16:26 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce,
	Git Mailing List



On Mon, 27 Aug 2007, Theodore Tso wrote:
> 
> What if the server sends a message which current clients interpret as
> an error, and which newer clients could interpret as, "do a clone from
> <this> URL, and then come back and talk to me".  Basically an
> automated redirect to get the "Linus base pack" somewhere else, and
> then to go back to the original server.  It certainly doesn't make
> sense to change anything about the low-level protocol, but maybe a
> higher level redirect would make sense, just as a user convenience thing.

I agree, a redirect might be a good idea regardless of whether it's 
something like "I'm a poor little NSLU2, please don't do anything but 
incremental updates", or whether it's something like "this repository has 
moved, use address xyz instead".

And it should be pretty easy from a high-level protocol, although it does 
obviously need both server and client support.

			Linus


Thread overview: 30+ messages
2007-08-24  5:54 git-daemon on NSLU2 Jon Smirl
2007-08-24  6:21 ` Shawn O. Pearce
2007-08-24 19:38   ` Jon Smirl
2007-08-24 20:23     ` Nicolas Pitre
2007-08-24 21:17       ` Jon Smirl
2007-08-24 21:54         ` Nicolas Pitre
2007-08-24 22:06         ` Jon Smirl
2007-08-24 22:39           ` Jakub Narebski
2007-08-24 22:59             ` Junio C Hamano
2007-08-24 23:21               ` Jakub Narebski
2007-08-24 23:46             ` Jon Smirl
2007-08-25  0:04               ` Junio C Hamano
2007-08-25  7:12                 ` David Kastrup
2007-08-25 17:02                 ` Salikh Zakirov
2007-08-25  0:10           ` Nicolas Pitre
2007-08-24 23:28         ` Linus Torvalds
2007-08-25 15:44           ` Jon Smirl
2007-08-26  9:33             ` Jeff King
2007-08-26 16:34               ` Jon Smirl
2007-08-26 17:15                 ` Linus Torvalds
2007-08-26 18:06                   ` Jon Smirl
2007-08-26 18:26                     ` Linus Torvalds
2007-08-26 19:00                       ` Jon Smirl
2007-08-26 20:19                         ` Linus Torvalds
2007-08-26 21:22                           ` Junio C Hamano
2007-08-27 11:03                       ` Theodore Tso
2007-08-27 16:26                         ` Linus Torvalds
2007-08-26 22:24                   ` Daniel Hulme
2007-08-27  0:14               ` Jakub Narebski
2007-08-24 20:27     ` Jon Smirl
