git@vger.kernel.org mailing list mirror (one of many)
* fetching packs and storing them as packs
@ 2006-10-26  3:44 Nicolas Pitre
  2006-10-26 14:45 ` Eran Tromer
  2006-10-29  7:47 ` [PATCH] send-pack --keep: do not explode into loose objects on the receiving end Junio C Hamano
  0 siblings, 2 replies; 51+ messages in thread
From: Nicolas Pitre @ 2006-10-26  3:44 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

With the last few patches I just posted it is now possible to receive 
(fetch) packs, validate them on the fly, complete them if they are thin 
packs, and store them directly without exploding them into loose 
objects.

There are advantages and disadvantages to both methods, so I think this 
should become a configuration option and/or even a command-line argument 
to git-fetch. I think there are many more advantages to keeping packs 
packed, hence I think using index-pack should become the default.

But I'm a bit tired of playing with it, and the final integration is for 
someone else to do.  I've tested it lightly using the extremely crude 
patch below to hook it into the fetch process.

Have fun!

diff --git a/fetch-clone.c b/fetch-clone.c
index 76b99af..28796c3 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -142,7 +142,8 @@ int receive_unpack_pack(int xd[2], const
 		dup2(fd[0], 0);
 		close(fd[0]);
 		close(fd[1]);
-		execl_git_cmd("unpack-objects", quiet ? "-q" : NULL, NULL);
+		execl_git_cmd("index-pack", "--stdin", "--fix-thin",
+			      quiet ? NULL : "-v", NULL);
 		die("git-unpack-objects exec failed");
 	}
 	close(fd[0]);
diff --git a/receive-pack.c b/receive-pack.c
index 1fcf3a9..7f6dc49 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -7,7 +7,7 @@
 
 static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
 
-static const char *unpacker[] = { "unpack-objects", NULL };
+static const char *unpacker[] = { "index-pack", "-v", "--stdin", "--fix-thin", NULL };
 
 static int report_status;

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-26  3:44 fetching packs and storing them as packs Nicolas Pitre
@ 2006-10-26 14:45 ` Eran Tromer
       [not found]   ` <Pine.LNX.4.64.0610261105200.12418@xanadu.home>
  2006-10-27 20:22   ` Linus Torvalds
  2006-10-29  7:47 ` [PATCH] send-pack --keep: do not explode into loose objects on the receiving end Junio C Hamano
  1 sibling, 2 replies; 51+ messages in thread
From: Eran Tromer @ 2006-10-26 14:45 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

On 2006-10-26 05:44, Nicolas Pitre wrote:
> diff --git a/receive-pack.c b/receive-pack.c
> index 1fcf3a9..7f6dc49 100644
> --- a/receive-pack.c
> +++ b/receive-pack.c
> @@ -7,7 +7,7 @@
>  
>  static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
>  
> -static const char *unpacker[] = { "unpack-objects", NULL };
> +static const char *unpacker[] = { "index-pack", "-v", "--stdin", "--fix-thin", NULL };
>  
>  static int report_status;

This creates a race condition w.r.t. "git repack -a -d", similar to the
existing race condition between "git fetch --keep" and
"git repack -a -d". There's a point in time where the new pack is stored
but not yet referenced, and if "git repack -a -d" runs at that point it
will eradicate the pack. When the heads are finally updated, you get a
corrupted repository.
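
Spelled out as an interleaving (just a sketch of the worst case, not
an exact trace; "pack-N" is a made-up name):

  receiver:  index-pack writes pack-N.{pack,idx} into
             $GIT_DIR/objects/pack/
  repacker:  "git repack -a -d" starts and packs everything reachable
             from the current refs (which do not cover pack-N yet)
  repacker:  deletes every pre-existing pack, including pack-N
  receiver:  updates the refs -- they now point at objects that only
             existed in the deleted pack-N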

(That's for the shell implementation of git-repack, at least. I assume
the new builtin preserves the old semantics.)

Since people run the supposedly safe "git repack -a -d" on a regular
basis, this is going to bite.

  Eran

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
       [not found]   ` <Pine.LNX.4.64.0610261105200.12418@xanadu.home>
@ 2006-10-26 22:09     ` Eran Tromer
  2006-10-27  0:50       ` Nicolas Pitre
  0 siblings, 1 reply; 51+ messages in thread
From: Eran Tromer @ 2006-10-26 22:09 UTC (permalink / raw
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Hi,

On 2006-10-26 17:08, Nicolas Pitre wrote:
> On Thu, 26 Oct 2006, Eran Tromer wrote:
>> This creates a race condition w.r.t. "git repack -a -d", similar to the
>> existing race condition between "git fetch --keep" and
>> "git repack -a -d". There's a point in time where the new pack is stored
>> but not yet referenced, and if "git repack -a -d" runs at that point it
>> will eradicate the pack. When the heads are finally updated, you get a
>> corrupted repository.
> 
> And how is it different from receiving a pack through git-unpack-objects 
> where lots of loose objects are created, and git-repack -a -d removing 
> those unconnected loose objects before the heads are updated?

git-repack -a -d does not touch unconnected loose objects.
It removes only unconnected packed objects.

Only git-prune removes unconnected loose objects, and that's documented
as unsafe.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-26 22:09     ` Eran Tromer
@ 2006-10-27  0:50       ` Nicolas Pitre
  2006-10-27  1:42         ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2006-10-27  0:50 UTC (permalink / raw
  To: Eran Tromer; +Cc: Junio C Hamano, git

On Fri, 27 Oct 2006, Eran Tromer wrote:

> Hi,
> 
> On 2006-10-26 17:08, Nicolas Pitre wrote:
> > On Thu, 26 Oct 2006, Eran Tromer wrote:
> >> This creates a race condition w.r.t. "git repack -a -d", similar to the
> >> existing race condition between "git fetch --keep" and
> >> "git repack -a -d". There's a point in time where the new pack is stored
> >> but not yet referenced, and if "git repack -a -d" runs at that point it
> >> will eradicate the pack. When the heads are finally updated, you get a
> >> corrupted repository.
> > 
> > And how is it different from receiving a pack through git-unpack-objects 
> > where lots of loose objects are created, and git-repack -a -d removing 
> > those unconnected loose objects before the heads are updated?
> 
> git-repack -a -d does not touch unconnected loose objects.
> It removes only unconnected packed objects.

Right.

> Only git-prune removes unconnected loose objects, and that's documented
> as unsafe.

Well, the race does exist.  Don't do repack -a -d at the same time then.

This race should be addressed somehow if it is really a problem.  Now 
that I've used index-pack in place of unpack-objects for a while, I 
don't think I'll want to go back.  It is simply faster to fetch, faster 
to check out, faster to repack, wastes less disk space, etc.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  0:50       ` Nicolas Pitre
@ 2006-10-27  1:42         ` Shawn Pearce
  2006-10-27  2:38           ` Sean
                             ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-27  1:42 UTC (permalink / raw
  To: Nicolas Pitre; +Cc: Eran Tromer, Junio C Hamano, git

Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 27 Oct 2006, Eran Tromer wrote:
> 
> > Hi,
> > 
> > On 2006-10-26 17:08, Nicolas Pitre wrote:
> > > On Thu, 26 Oct 2006, Eran Tromer wrote:
> > >> This creates a race condition w.r.t. "git repack -a -d", similar to the
> > >> existing race condition between "git fetch --keep" and
> > >> "git repack -a -d". There's a point in time where the new pack is stored
> > >> but not yet referenced, and if "git repack -a -d" runs at that point it
> > >> will eradicate the pack. When the heads are finally updated, you get a
> > >> corrupted repository.
> > > 
> > > And how is it different from receiving a pack through git-unpack-objects 
> > > where lots of loose objects are created, and git-repack -a -d removing 
> > > those unconnected loose objects before the heads are updated?
> > 
> > git-repack -a -d does not touch unconnected loose objects.
> > It removes only unconnected packed objects.
> 
> Right.
> 
> > Only git-prune removes unconnected loose objects, and that's documented
> > as unsafe.
> 
> Well, the race does exist.  Don't do repack -a -d at the same time then.

This is an issue for "central" repositories that people push into
and which might be getting repacked according to a cronjob.

Unfortunately I don't have a solution.  I tried to come up with
one but didn't.  :-)

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  1:42         ` Shawn Pearce
@ 2006-10-27  2:38           ` Sean
  2006-10-27  6:57             ` Junio C Hamano
  2006-10-27  2:41           ` Nicolas Pitre
  2006-10-27  2:42           ` Eran Tromer
  2 siblings, 1 reply; 51+ messages in thread
From: Sean @ 2006-10-27  2:38 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Nicolas Pitre, Eran Tromer, Junio C Hamano, git

On Thu, 26 Oct 2006 21:42:29 -0400
Shawn Pearce <spearce@spearce.org> wrote:

> This is an issue for "central" repositories that people push into
> and which might be getting repacked according to a cronjob.
> 
> Unfortunately I don't have a solution.  I tried to come up with
> one but didn't.  :-)

What about creating a temporary ref before pushing, and then removing
it only after the HEAD has been updated?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  1:42         ` Shawn Pearce
  2006-10-27  2:38           ` Sean
@ 2006-10-27  2:41           ` Nicolas Pitre
  2006-10-27  2:42           ` Eran Tromer
  2 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2006-10-27  2:41 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Eran Tromer, Junio C Hamano, git

On Thu, 26 Oct 2006, Shawn Pearce wrote:

> Nicolas Pitre <nico@cam.org> wrote:
> > On Fri, 27 Oct 2006, Eran Tromer wrote:
> > 
> > > Hi,
> > > 
> > > On 2006-10-26 17:08, Nicolas Pitre wrote:
> > > > On Thu, 26 Oct 2006, Eran Tromer wrote:
> > > >> This creates a race condition w.r.t. "git repack -a -d", similar to the
> > > >> existing race condition between "git fetch --keep" and
> > > >> "git repack -a -d". There's a point in time where the new pack is stored
> > > >> but not yet referenced, and if "git repack -a -d" runs at that point it
> > > >> will eradicate the pack. When the heads are finally updated, you get a
> > > >> corrupted repository.
> > > > 
> > > > And how is it different from receiving a pack through git-unpack-objects 
> > > > where lots of loose objects are created, and git-repack -a -d removing 
> > > > those unconnected loose objects before the heads are updated?
> > > 
> > > git-repack -a -d does not touch unconnected loose objects.
> > > It removes only unconnected packed objects.
> > 
> > Right.
> > 
> > > Only git-prune removes unconnected loose objects, and that's documented
> > > as unsafe.
> > 
> > Well, the race does exist.  Don't do repack -a -d at the same time then.
> 
> This is an issue for "central" repositories that people push into
> and which might be getting repacked according to a cronjob.
> 
> Unfortunately I don't have a solution.  I tried to come up with
> one but didn't.  :-)

Just continue to explode received packs into loose objects then.  It is 
that simple.  I said there were advantages and disadvantages to both 
methods. This one is a nice example.

I won't repack from a cron job, so I don't expect to run a repack and a 
fetch at the same time on my private repositories.  I therefore don't 
care about that race, and the same is true for the vast majority of users.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  1:42         ` Shawn Pearce
  2006-10-27  2:38           ` Sean
  2006-10-27  2:41           ` Nicolas Pitre
@ 2006-10-27  2:42           ` Eran Tromer
  2006-10-27  3:00             ` Shawn Pearce
  2 siblings, 1 reply; 51+ messages in thread
From: Eran Tromer @ 2006-10-27  2:42 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Nicolas Pitre, Junio C Hamano, git

On 2006-10-27 03:42, Shawn Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
>> On Fri, 27 Oct 2006, Eran Tromer wrote:
>> Well, the race does exist.  Don't do repack -a -d at the same time then.
> 
> This is an issue for "central" repositories that people push into
> and which might be getting repacked according to a cronjob.

AFAICT, the bottom line of the "Re: auto-packing on kernel.org? please?"
thread last October was "sure, go ahead".


> Unfortunately I don't have a solution.  I tried to come up with
> one but didn't.  :-)

Here's one way to do it.
Change git-repack to follow references under $GIT_DIR/tmp/refs/ too.
To receive or fetch a pack:
1. Add references to the new heads in
   `mktemp $GIT_DIR/tmp/refs/XXXXXX`.
2. Put the new .pack under $GIT_DIR/objects/pack/.
3. Put the new .idx under $GIT_DIR/objects/pack/.
4. Update the relevant heads under $GIT_DIR/refs/.
5. Delete the references from step 1.

This is repack-safe and never corrupts the repo. The worst-case failure
mode is if you die before cleaning the refs from $GIT_DIR/tmp/refs. That
may mean some packed objects will never be removed by "repack -a -d"
even if they lose all references from $GIT_DIR/refs, so do "tmpwatch -m
240 $GIT_DIR/tmp/refs" to take care of that.
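
In shell terms, the receiving side would be roughly (just a sketch;
$tmppack, $tmpidx, $name, $branch and $new_sha1 stand for whatever
the fetch machinery already has in hand):

	mkdir -p "$GIT_DIR/tmp/refs"
	tmpref=$(mktemp "$GIT_DIR/tmp/refs/XXXXXX")             # 1
	echo "$new_sha1" >"$tmpref"
	mv "$tmppack" "$GIT_DIR/objects/pack/pack-$name.pack"   # 2
	mv "$tmpidx"  "$GIT_DIR/objects/pack/pack-$name.idx"    # 3
	git-update-ref "refs/heads/$branch" "$new_sha1"         # 4
	rm -f "$tmpref"                                         # 5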

  Eran

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  2:42           ` Eran Tromer
@ 2006-10-27  3:00             ` Shawn Pearce
  2006-10-27  3:13               ` Sean
                                 ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-27  3:00 UTC (permalink / raw
  To: Eran Tromer; +Cc: Nicolas Pitre, Junio C Hamano, git

Eran Tromer <git2eran@tromer.org> wrote:
> > Unfortunately I don't have a solution.  I tried to come up with
> > one but didn't.  :-)
> 
> Here's one way to do it.
> Change git-repack to follow references under $GIT_DIR/tmp/refs/ too.
> To receive or fetch a pack:
> 1. Add references to the new heads in
>    `mktemp $GIT_DIR/tmp/refs/XXXXXX`.
> 2. Put the new .pack under $GIT_DIR/objects/pack/.
> 3. Put the new .idx under $GIT_DIR/objects/pack/.
> 4. Update the relevant heads under $GIT_DIR/refs/.
> 5. Delete the references from step 1.
> 
> This is repack-safe and never corrupts the repo. The worst-case failure
> mode is if you die before cleaning the refs from $GIT_DIR/tmp/refs. That
> may mean some packed objects will never be removed by "repack -a -d"
> even if they lose all references from $GIT_DIR/refs, so do "tmpwatch -m
> 240 $GIT_DIR/tmp/refs" to take care of that.

That was actually my (and also Sean's) solution.  Except I would
put the temporary refs as "$GIT_DIR/refs/ref_XXXXXX" as this is
less code to change and it's consistent with how temporary loose
objects are created.

Unfortunately it does not completely work.

What happens when the incoming pack (steps #2 and #3) takes 15
minutes to upload (slow ADSL modem, lots of objects) and the
background repack process sees those temporary refs and starts
trying to include those objects?  It can't walk the DAG that those
refs point at because the objects aren't in the current repository.

From what I know of that code the pack-objects process will fail to
find the object pointed at by the ref, rescan the packs directory,
find no new packs, look for the object again, and abort over the
"corruption".

OK so the repository won't get corrupted but the repack would be
forced to abort.


Another issue I just thought about tonight is that we may need a
count-packs utility that, like count-objects, lists the number
of active packs and their total size.  If we start hanging onto
every pack we receive over the wire, the pack directory is going to
grow pretty fast and we'll need a way to tell us when it's time to
`repack -a -d`.
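
Until something like that exists, a dumb approximation from the shell
would do (sketch only):

	cd "$GIT_DIR/objects/pack" &&
	echo "packs: $(ls pack-*.pack 2>/dev/null | wc -l)" &&
	du -c pack-*.pack 2>/dev/null | tail -n 1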

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  3:00             ` Shawn Pearce
@ 2006-10-27  3:13               ` Sean
  2006-10-27  3:20                 ` Jakub Narebski
  2006-10-27  4:03               ` Eran Tromer
  2006-10-27 14:27               ` Nicolas Pitre
  2 siblings, 1 reply; 51+ messages in thread
From: Sean @ 2006-10-27  3:13 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Eran Tromer, Nicolas Pitre, Junio C Hamano, git

On Thu, 26 Oct 2006 23:00:54 -0400
Shawn Pearce <spearce@spearce.org> wrote:

> What happens when the incoming pack (steps #2 and #3) takes 15
> minutes to upload (slow ADSL modem, lots of objects) and the
> background repack process sees those temporary refs and starts
> trying to include those objects?  It can't walk the DAG that those
> refs point at because the objects aren't in the current repository.

As long as there was standard naming for such temporary refs,
they could be completely ignored by the repack process, no?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  3:13               ` Sean
@ 2006-10-27  3:20                 ` Jakub Narebski
  2006-10-27  3:27                   ` Sean
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Narebski @ 2006-10-27  3:20 UTC (permalink / raw
  To: git

Sean wrote:

> On Thu, 26 Oct 2006 23:00:54 -0400
> Shawn Pearce <spearce@spearce.org> wrote:
> 
>> What happens when the incoming pack (steps #2 and #3) takes 15
>> minutes to upload (slow ADSL modem, lots of objects) and the
>> background repack process sees those temporary refs and starts
>> trying to include those objects?  It can't walk the DAG that those
>> refs point at because the objects aren't in the current repository.
> 
> As long as there was standard naming for such temporary refs,
> they could be completely ignored by the repack process, no?

You meant, I think: half ignored.  Taken into account when finding
what is still referenced for the delete (-d) part, but without
complaining if they don't point to anything (validation).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  3:20                 ` Jakub Narebski
@ 2006-10-27  3:27                   ` Sean
  0 siblings, 0 replies; 51+ messages in thread
From: Sean @ 2006-10-27  3:27 UTC (permalink / raw
  To: Jakub Narebski; +Cc: git

On Fri, 27 Oct 2006 05:20:53 +0200
Jakub Narebski <jnareb@gmail.com> wrote:

> You meant I think: half ignored. Taken into account when finding
> which parts are referenced to delete (-d part), but not complain
> if they don't point to anything (validation).

Yes, ignored by repack, not ignored by prune.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  3:00             ` Shawn Pearce
  2006-10-27  3:13               ` Sean
@ 2006-10-27  4:03               ` Eran Tromer
  2006-10-27  4:42                 ` Shawn Pearce
  2006-10-27 14:27               ` Nicolas Pitre
  2 siblings, 1 reply; 51+ messages in thread
From: Eran Tromer @ 2006-10-27  4:03 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Nicolas Pitre, Junio C Hamano, git

Hi,

On 2006-10-27 05:00, Shawn Pearce wrote:
>> Change git-repack to follow references under $GIT_DIR/tmp/refs/ too.
>> To receive or fetch a pack:
>> 1. Add references to the new heads in
>>    `mktemp $GIT_DIR/tmp/refs/XXXXXX`.
>> 2. Put the new .pack under $GIT_DIR/objects/pack/.
>> 3. Put the new .idx under $GIT_DIR/objects/pack/.
>> 4. Update the relevant heads under $GIT_DIR/refs/.
>> 5. Delete the references from step 1.

> That was actually my (and also Sean's) solution.  Except I would
> put the temporary refs as "$GIT_DIR/refs/ref_XXXXXX" as this is
> less code to change and its consistent with how temporary loose
> objects are created.

If you do that, other programs (e.g., anyone who uses rev-list --all)
may try to walk those heads or consider them available before the pack
is really there. The point about $GIT_DIR/tmp/refs is that only programs
meddling with physical packs (git-fetch, git-receive-pack, git-repack)
will know about it.


> What happens when the incoming pack (steps #2 and #3) takes 15
> minutes to upload (slow ADSL modem, lots of objects) and the
> background repack process sees those temporary refs and starts
> trying to include those objects?  It can't walk the DAG that those
> refs point at because the objects aren't in the current repository.
> 
> From what I know of that code the pack-objects process will fail to
> find the object pointed at by the ref, rescan the packs directory,
> find no new packs, look for the object again, and abort over the
> "corruption".

Good point. Then I guess we'll need to change git-repack to ignore
missing objects if they're referenced from $GIT_DIR/tmp/refs but not
from $GIT_DIR/refs. Ugly, but shouldn't be too hard.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  4:03               ` Eran Tromer
@ 2006-10-27  4:42                 ` Shawn Pearce
  2006-10-27  7:42                   ` Alex Riesen
  0 siblings, 1 reply; 51+ messages in thread
From: Shawn Pearce @ 2006-10-27  4:42 UTC (permalink / raw
  To: Eran Tromer; +Cc: Nicolas Pitre, Junio C Hamano, git

Eran Tromer <git2eran@tromer.org> wrote:
> On 2006-10-27 05:00, Shawn Pearce wrote:
> >> Change git-repack to follow references under $GIT_DIR/tmp/refs/ too.
> >> To receive or fetch a pack:
> >> 1. Add references to the new heads in
> >>    `mktemp $GIT_DIR/tmp/refs/XXXXXX`.
> >> 2. Put the new .pack under $GIT_DIR/objects/pack/.
> >> 3. Put the new .idx under $GIT_DIR/objects/pack/.
> >> 4. Update the relevant heads under $GIT_DIR/refs/.
> >> 5. Delete the references from step 1.
> 
> > That was actually my (and also Sean's) solution.  Except I would
> > put the temporary refs as "$GIT_DIR/refs/ref_XXXXXX" as this is
> > less code to change and its consistent with how temporary loose
> > objects are created.
> 
> If you do that, other programs (e.g., anyone who uses rev-list --all)
> may try to walk those heads or consider them available before the pack
> is really there. The point about $GIT_DIR/tmp/refs is that only programs
> meddling with physical packs (git-fetch, git-receive-pack, git-repack)
> will know about it.
 
Doh.  Yes, of course, that makes much sense.

Hmm... Looking at git-repack we have two things currently pending
to rework in there:

  - Historical vs. active packs.
  - Don't delete a possibly still incoming pack during -d.

These have a lot of the same implementation issues.  We need to
be able to identify a set of packs which should be allowed for
repack with -a, and allowed for removal with -d if -a was also used.
A newly uploaded pack cannot be in that list unless its contents are
referenced by one or more refs (which implies that the receive-pack
process has completed).

I'm thinking that the ref thing might be unnecessary.  We just
need to fix repack so it builds a list of "active packs" whose
objects should be copied into the new pack, and then packs only
loose objects and those objects contained in an active pack.

So the receive-pack process becomes:

  a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
  b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
  c. Write pack and index.
  d. Move pack to $GIT_DIR/objects/pack/...
  e. Move index to $GIT_DIR/objects/pack...
  f. Update refs.
  g. Arrange for new pack and index to be considered active.

And the repack -a -d process becomes:

  1. List all active packs and store in memory.
  2. Repack only loose objects and objects contained in active packs.
  3. Move new pack and idx into $GIT_DIR/objects/pack/...
  4. Arrange for new pack and idx to be considered active.
  5. Delete active packs found by step #1.

Junio was originally considering making historical packs
historical by placing their names into an information file (such as
`$GIT_DIR/objects/info/historical-packs`) and then considering all other
packs as active.  Thus step #1 lists all packs and removes those
whose names appear in historical-packs, while step #4 is unnecessary.

I was thinking about just changing the "pack-" prefix to "hist-" for
the historical packs and assuming all "pack-*.pack" to be active.
Thus step #1 is a simple glob on the pack directory and step #4
is unnecessary.

In the latter case it's easy to mark an existing pack as historical
(just hardlink hist- names for the pack, then the idx, then unlink the
previous names) and it's also easy to mark new incoming packs as
non-active by using a different prefix (e.g. "incm-") during steps
#d/#e and then relinking them as "pack-" during step #g.  It's also
very safe on systems that support hardlinks.
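
In shell that renaming is just hardlinks and unlinks, e.g. (pack name
made up for illustration; note the old idx goes away before the old
pack, so no idx ever points at a missing pack):

	cd "$GIT_DIR/objects/pack" &&
	ln pack-$name.pack hist-$name.pack &&
	ln pack-$name.idx  hist-$name.idx  &&
	rm pack-$name.idx  pack-$name.pack

and the same trick in reverse (incm- to pack-) flips an incoming pack
to active during step #g.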

We shouldn't ever need to worry about race conditions with repacking
historical packs.  For starters, historical packs will tend to be
several years' worth of object accumulation and will be so large
that repacking them might take 45 minutes or more.  Thus they
probably will never get repacked.  An active pack will simply move
into historical status after it gets so large that it's no longer
worthwhile to keep repacking it.  They also will tend to have objects
that are so old that at least one ref in the repository will point
at their entire DAG and thus everything would carry over on a repack.

So this would be cleaner than messing around with temporary refs and
gets us the historical pack feature we've been looking to implement.

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  2:38           ` Sean
@ 2006-10-27  6:57             ` Junio C Hamano
  2006-10-27 17:23               ` Nicolas Pitre
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-27  6:57 UTC (permalink / raw
  To: Sean; +Cc: Shawn Pearce, Nicolas Pitre, Eran Tromer, git

Sean <seanlkml@sympatico.ca> writes:

> On Thu, 26 Oct 2006 21:42:29 -0400
> Shawn Pearce <spearce@spearce.org> wrote:
>
>> This is an issue for "central" repositories that people push into
>> and which might be getting repacked according to a cronjob.
>> 
>> Unfortunately I don't have a solution.  I tried to come up with
>> one but didn't.  :-)
>
> What about creating a temporary ref before pushing, and then removing
> it only after the HEAD has been updated?

That won't work.  If repack is faster than index-pack, repack
would fail to find the necessary objects, barf, and would not remove
the existing or new pack; index-pack would then eventually
succeed, and when it does at least your repository is complete,
even though it may still have redundant objects in packs.

So in that sense, it is not a disaster, so it might be a good
enough solution.

I'd almost say "heavy repository-wide operations like 'repack -a
-d' and 'prune' should operate under a single repository lock",
but historically we've avoided locks and instead tried to do
things optimistically and used compare-and-swap to detect
conflicts, so maybe that avenue might be worth pursuing.

How about (I'm thinking aloud and I'm sure there will be
holes -- I won't think about prune for now)...

* "repack -a -d":

 (1) initially run show-ref (or "ls-remote .") and store the
     result in .git/$ref_pack_lock_file;

 (2) enumerate existing packs;

 (3) do the usual "rev-list --all | pack-objects" thing; this
     may end up including more objects than what are reachable
     from the result of (1) if somebody else updates refs in the
     meantime;

 (4) enumerate existing packs; if there is difference from (2)
     other than what (3) created, that means somebody else added
     a pack in the meantime; stop and do not do the "-d" part;

 (5) run "ls-remote ." again and compare it with what it got in
     (1); if different, somebody else updated a ref in the
     meantime; stop and do not do the "-d" part;

 (6) do the "-d" part as usual by removing packs we saw in (2)
     but do not remove the pack we created in (3);

 (7) remove .git/$ref_pack_lock_file.

* "fetch --thin" and "index-pack --stdin":

 (1) check the .git/$ref_pack_lock_file, and refuse to operate
     if there is such a file (this is not strictly needed for
     correctness but only to give an early exit);

 (2) create a new pack under a temporary name, and when
     complete, make the pack/index pair .pack and .idx;

 (3) update the refs.
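
Very roughly, the repack half of that could look like this in shell
(completely untested, and the lock file name is only a placeholder):

	lock="$GIT_DIR/ref-pack-lock"
	git-show-ref >"$lock"                                    # (1)
	before=$(cd "$GIT_DIR/objects/pack" && ls pack-*.pack)   # (2)
	new=$(git-rev-list --objects --all |
	      git-pack-objects "$GIT_DIR/objects/pack/pack")     # (3)
	after=$(cd "$GIT_DIR/objects/pack" &&
	        ls pack-*.pack | grep -v "$new")                 # (4)
	if test "$before" = "$after" &&
	   git-show-ref | cmp -s - "$lock"                       # (5)
	then
		for p in $before                                 # (6)
		do
			rm -f "$GIT_DIR/objects/pack/$p" \
			      "$GIT_DIR/objects/pack/${p%.pack}.idx"
		done
	fi
	rm -f "$lock"                                            # (7)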


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  4:42                 ` Shawn Pearce
@ 2006-10-27  7:42                   ` Alex Riesen
  2006-10-27  7:52                     ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Riesen @ 2006-10-27  7:42 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Eran Tromer, Nicolas Pitre, Junio C Hamano, git

> So the receive-pack process becomes:
>
>  a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
>  b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.

Why not $GIT_DIR/objects/tmp/pack... and ignore it everywhere?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  7:42                   ` Alex Riesen
@ 2006-10-27  7:52                     ` Shawn Pearce
  2006-10-27  8:08                       ` Alex Riesen
  0 siblings, 1 reply; 51+ messages in thread
From: Shawn Pearce @ 2006-10-27  7:52 UTC (permalink / raw
  To: Alex Riesen; +Cc: Eran Tromer, Nicolas Pitre, Junio C Hamano, git

Alex Riesen <raa.lkml@gmail.com> wrote:
> >So the receive-pack process becomes:
> >
> > a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
> > b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
> 
> Why not $GIT_DIR/objects/tmp/pack... and ignore it everywhere?

Because there is a race condition.

The contents of the new pack must be accessible as a normal pack
before we update and unlock the refs that are being changed.  This
means it must be a normal pack in $GIT_DIR/objects/pack.

Currently all packs under $GIT_DIR/objects/pack are deleted during
`repack -a -d`.  Those packs may have been added to that directory
after the repack started, resulting in them getting deleted when
the repack completes, but with none of their contained objects in
the newly created pack.  Thus the repository is suddenly missing
everything that was just pushed (or fetched).

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  7:52                     ` Shawn Pearce
@ 2006-10-27  8:08                       ` Alex Riesen
  2006-10-27  8:13                         ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Riesen @ 2006-10-27  8:08 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Eran Tromer, Nicolas Pitre, Junio C Hamano, git

>> >So the receive-pack process becomes:
>> >
>> > a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
>> > b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
>>
>> Why not $GIT_DIR/objects/tmp/pack... and ignore it everywhere?
>
> Because there is a race condition.

Oh, right. Incidentally, is there a lockfile for packs?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  8:08                       ` Alex Riesen
@ 2006-10-27  8:13                         ` Shawn Pearce
  0 siblings, 0 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-27  8:13 UTC (permalink / raw
  To: Alex Riesen; +Cc: Eran Tromer, Nicolas Pitre, Junio C Hamano, git

Alex Riesen <raa.lkml@gmail.com> wrote:
> >>>So the receive-pack process becomes:
> >>>
> >>> a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
> >>> b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
> >>
> >>Why not $GIT_DIR/objects/tmp/pack... and ignore it everywhere?
> >
> >Because there is a race condition.
> 
> Oh, right. Incidentally, is there a lockfile for packs?

No, we have never needed them before.  And I think we'd like to
avoid adding them now.

We do have the rule that a pack can only be accessed if its .idx
file exists; therefore we generate a pack and its index, then place
the pack into the object directory, then the index.  That way the
pack is there before the index but another process won't attempt
to read the pack until the index exists.  It also means we should
never put an index into the pack directory before its associated
pack.  :-)
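
In other words the install step always goes (names illustrative):

	mv "$tmp_pack" "$GIT_DIR/objects/pack/pack-$sha1.pack"
	# readers ignore the pack until this second rename:
	mv "$tmp_idx"  "$GIT_DIR/objects/pack/pack-$sha1.idx"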

If we are pruning packs or loose objects after the repack we delete
only after the pack and index are in place.  Running processes
rescan the existing packs once if they look for an object and
cannot find it in the loose objects directory or in the packs they
already know about; this lets most running processes automatically
recover should a parallel repack delete the loose objects they need
(as they were just packed).

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  3:00             ` Shawn Pearce
  2006-10-27  3:13               ` Sean
  2006-10-27  4:03               ` Eran Tromer
@ 2006-10-27 14:27               ` Nicolas Pitre
  2006-10-27 14:38                 ` Petr Baudis
  2 siblings, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2006-10-27 14:27 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Eran Tromer, Junio C Hamano, git

On Thu, 26 Oct 2006, Shawn Pearce wrote:

> Unfortunately it does not completely work.
> 
> What happens when the incoming pack (steps #2 and #3) takes 15
> minutes to upload (slow ADSL modem, lots of objects) and the
> background repack process sees those temporary refs and starts
> trying to include those objects?  It can't walk the DAG that those
> refs point at because the objects aren't in the current repository.
> 
> From what I know of that code the pack-objects process will fail to
> find the object pointed at by the ref, rescan the packs directory,
> find no new packs, look for the object again, and abort over the
> "corruption".
> 
> OK so the repository won't get corrupted but the repack would be
> forced to abort.

Maybe this is the best way out?  Abort git-repack with "a fetch is in 
progress -- retry later".  No one will really suffer if the repack has 
to wait for the next scheduled cron job, especially if the fetch doesn't 
explode packs into loose objects anymore.

> Another issue I just thought about tonight is we may need a
> count-packs utility that like count-objects lists the number
> of active packs and their total size.  If we start hanging onto
> every pack we receive over the wire the pack directory is going to
> grow pretty fast and we'll need a way to tell us when its time to
> `repack -a -d`.

Sure.  Although the pack count is going to grow much less rapidly.  
Think of one pack per fetch instead of many many objects per fetch.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 14:27               ` Nicolas Pitre
@ 2006-10-27 14:38                 ` Petr Baudis
  2006-10-27 14:48                   ` J. Bruce Fields
  2006-10-27 18:56                   ` Junio C Hamano
  0 siblings, 2 replies; 51+ messages in thread
From: Petr Baudis @ 2006-10-27 14:38 UTC (permalink / raw
  To: Nicolas Pitre; +Cc: Shawn Pearce, Eran Tromer, Junio C Hamano, git

Dear diary, on Fri, Oct 27, 2006 at 04:27:05PM CEST, I got a letter
where Nicolas Pitre <nico@cam.org> said that...
> On Thu, 26 Oct 2006, Shawn Pearce wrote:
> > OK so the repository won't get corrupted but the repack would be
> > forced to abort.
> 
> Maybe this is the best way out?  Abort git-repack with "a fetch is in 
> progress -- retry later".  No one will really suffer if the repack has 
> to wait for the next scheduled cron job, especially if the fetch doesn't 
> explode packs into loose objects anymore.

I don't really like this that much. Big projects can have 10 commits per
hour on average, and they also take a potentially long time to repack, so
you might never really get to repack them.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 14:38                 ` Petr Baudis
@ 2006-10-27 14:48                   ` J. Bruce Fields
  2006-10-27 15:03                     ` Petr Baudis
  2006-10-27 18:56                   ` Junio C Hamano
  1 sibling, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2006-10-27 14:48 UTC (permalink / raw
  To: Petr Baudis; +Cc: Nicolas Pitre, Shawn Pearce, Eran Tromer, Junio C Hamano, git

On Fri, Oct 27, 2006 at 04:38:54PM +0200, Petr Baudis wrote:
> I don't really like this that much. Big projects can have 10 commits per
> hour on average, and they also take potentially long time to repack, so
> you might get to never really repack them.

An average of 10 per minute doesn't mean there aren't frequent long idle
times.  That commit traffic is probably extremely bursty, right?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 14:48                   ` J. Bruce Fields
@ 2006-10-27 15:03                     ` Petr Baudis
  2006-10-27 16:04                       ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Petr Baudis @ 2006-10-27 15:03 UTC (permalink / raw
  To: J. Bruce Fields
  Cc: Nicolas Pitre, Shawn Pearce, Eran Tromer, Junio C Hamano, git

Dear diary, on Fri, Oct 27, 2006 at 04:48:39PM CEST, I got a letter
where "J. Bruce Fields" <bfields@fieldses.org> said that...
> On Fri, Oct 27, 2006 at 04:38:54PM +0200, Petr Baudis wrote:
> > I don't really like this that much. Big projects can have 10 commits per
> > hour on average, and they also take potentially long time to repack, so
> > you might get to never really repack them.
> 
> An average of 10 per minute doesn't mean there aren't frequent long idle
> times.  That commit traffic is probably extremely bursty, right?

10 per _hour_. :-)

E.g. GNOME is 7 commits per hour average, and it does tend to be pretty
spread out:

	http://cia.navi.cx/stats/project/gnome

(Unfortunately I can't figure out how to squeeze more commits from the
web interface. KDE gets even more commits than GNOME and Gentoo tops
all the CIA-tracked projects.)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 15:03                     ` Petr Baudis
@ 2006-10-27 16:04                       ` J. Bruce Fields
  2006-10-27 16:05                         ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2006-10-27 16:04 UTC (permalink / raw
  To: Petr Baudis; +Cc: Nicolas Pitre, Shawn Pearce, Eran Tromer, Junio C Hamano, git

On Fri, Oct 27, 2006 at 05:03:34PM +0200, Petr Baudis wrote:
> Dear diary, on Fri, Oct 27, 2006 at 04:48:39PM CEST, I got a letter
> where "J. Bruce Fields" <bfields@fieldses.org> said that...

Ya know, it'd be cool if that fit on one line....

> > On Fri, Oct 27, 2006 at 04:38:54PM +0200, Petr Baudis wrote:
> > > I don't really like this that much. Big projects can have 10 commits per
> > > hour on average, and they also take potentially long time to repack, so
> > > you might get to never really repack them.
> > 
> > An average of 10 per minute doesn't mean there aren't frequent long idle
> > times.  That commit traffic is probably extremely bursty, right?
> 
> 10 per _hour_. :-)

Whoops, right.

> E.g. GNOME is 7 commits per hour average, and it does tend to be pretty
> spread out:
> 
> 	http://cia.navi.cx/stats/project/gnome
> 
> (Unfortunately I can't figure out how to squeeze more commits from the
> web interface. KDE gets even more commits than GNOME and Gentoo tops
> all the CIA-tracked projects.)

That's not enough to tell how long on average you'd have to wait for a
gap of a certain length.

I think if you expect x commits per hour, and need y hours to prune,
then you should be able to get a worst-case estimate of hours between
y-hour gaps from

	octave -q --eval "1/poisscdf(0,x/y)"

but my statistics isn't great, so maybe that's not quite right.

And in any case the commit arrival times are probably very far from
independent, which probably makes gaps more likely.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 16:04                       ` J. Bruce Fields
@ 2006-10-27 16:05                         ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2006-10-27 16:05 UTC (permalink / raw
  To: Petr Baudis; +Cc: Nicolas Pitre, Shawn Pearce, Eran Tromer, Junio C Hamano, git

On Fri, Oct 27, 2006 at 12:04:50PM -0400, bfields wrote:
> I think if you expect x commits per hour, and need y hours to prune,
> then you should be able to get a worst-case estimate of hours between
> y-hour gaps from
> 
> 	octave -q --eval "1/poisscdf(0,x/y)"

Uh, sorry, that should be x*y, not x/y....
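
Taking that at face value with Pasky's GNOME number (x = 7 commits
per hour) and guessing, say, half an hour to repack (y = 0.5):

	octave -q --eval "x=7; y=0.5; 1/poisscdf(0, x*y)"

prints about 33, so by this estimate you would wait something on the
order of a day between gaps long enough to repack in -- give or take
the shaky formula and independence assumption above.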


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27  6:57             ` Junio C Hamano
@ 2006-10-27 17:23               ` Nicolas Pitre
  0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2006-10-27 17:23 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Sean, Shawn Pearce, Eran Tromer, git

On Thu, 26 Oct 2006, Junio C Hamano wrote:

> I'd almost say "heavy repository-wide operations like 'repack -a
> -d' and 'prune' should operate under a single repository lock",
> but historically we've avoided locks and instead tried to do
> things optimistically and used compare-and-swap to detect
> conflicts, so maybe that avenue might be worth pursuing.
> 
> How about (I'm thinking aloud and I'm sure there will be
> holes -- I won't think about prune for now)...
> 
> * "repack -a -d":
> 
>  (1) initially run show-ref (or "ls-remote .") and store the
>      result in .git/$ref_pack_lock_file;
> 
>  (2) enumerate existing packs;
> 
>  (3) do the usual "rev-list --all | pack-objects" thing; this
>      may end up including more objects than what are reachable
>      from the result of (1) if somebody else updates refs in the
>      meantime;
> 
>  (4) enumerate existing packs; if there is difference from (2)
>      other than what (3) created, that means somebody else added
>      a pack in the meantime; stop and do not do the "-d" part;
> 
>  (5) run "ls-remote ." again and compare it with what it got in
>      (1); if different, somebody else updated a ref in the
>      meantime; stop and do not do the "-d" part;
> 
>  (6) do the "-d" part as usual by removing packs we saw in (2)
>      but do not remove the pack we created in (3);
> 
>  (7) remove .git/$ref_pack_lock_file.
> 
> * "fetch --thin" and "index-pack --stdin":
> 
>  (1) check the .git/$ref_pack_lock_file, and refuse to operate
>     if there is such (this is not strictly needed for
>     correctness but only to give an early exit);

I don't think this is a good idea.  A fetch should always work 
irrespective of any repack taking place.  The fetch really should have 
priority over a repack since it is directly related to the user 
experience.  The repack can fail or produce suboptimal results if a race 
occurs, but the fetch must not fail for such a reason.

>  (2) create a new pack under a temporary name, and when
>      complete, make the pack/index pair .pack and .idx;

Actually this is what already happens if you don't specify a name to 
git-index-pack --stdin.

>  (3) update the refs.

So the actual race is the really small interval between the time the new 
pack+index are moved to .git/objects/pack/ and the moment the refs are 
updated.  In practice this is probably less than a second.  All that is 
needed here is to somehow go back to (2) if that interval occurs between 
(2) and (3).



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 14:38                 ` Petr Baudis
  2006-10-27 14:48                   ` J. Bruce Fields
@ 2006-10-27 18:56                   ` Junio C Hamano
  1 sibling, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2006-10-27 18:56 UTC (permalink / raw
  To: Petr Baudis; +Cc: Shawn Pearce, Eran Tromer, git

Petr Baudis <pasky@suse.cz> writes:

> Dear diary, on Fri, Oct 27, 2006 at 04:27:05PM CEST, I got a letter
> where Nicolas Pitre <nico@cam.org> said that...
>> On Thu, 26 Oct 2006, Shawn Pearce wrote:
>> > OK so the repository won't get corrupted but the repack would be
>> > forced to abort.
>> 
>> Maybe this is the best way out?  Abort git-repack with "a fetch is in 
>> progress -- retry later".  No one will really suffer if the repack has 
>> to wait for the next scheduled cron job, especially if the fetch doesn't 
>> explode packs into loose objects anymore.
>
> I don't really like this that much. Big projects can have 10 commits per
> hour on average, and they also take potentially long time to repack, so
> you might get to never really repack them.

One question about that statistic is whether the frequency of 10
commits per hour means 10 pushes into the central repository per
hour or 10 commits distributed all over the world in dozens of
developers' repositories.

Even if the number is 10 pushes into the central repository per
hour, I do not see it as a big problem in practice from the
workflow point of view.  Even people sticking to their CVS-style
central-repository workflow are gaining big time from being able
to keep working disconnected by switching to git in shared
repository mode, and it should not be a big
deal if the central repository master shuts down pushes into the
repository for N minutes a day for scheduled repacking.  So it
could be that a more practical way out is to say "'repack -a -d'
and 'prune' are to be run when things are quiescent".

A cron job for the scheduled repack/prune can set a flag
(a repository-wide lockfile or something) to ask new push/fetch to
wait and come back later, and we could set up pre-* hooks for
push/fetch to notice it.  While push/fetch processes that have
already been started can still interfere, as long as they cause
repack/prune to fail the "deletion" part, eventually outstanding
push/fetch will die out and the cron job will have that
quiescent window.
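
Something as simple as this on the cron side would probably do (the
flag file name is made up, and the pre-* hook support is hypothetical
until somebody writes it):

	flag="$GIT_DIR/repack-in-progress"
	touch "$flag"
	trap 'rm -f "$flag"' 0
	sleep 600	# let outstanding push/fetch die out
	git repack -a -d && git prune

	# and the hook on the push/fetch side would just refuse:
	#   test -e "$GIT_DIR/repack-in-progress" &&
	#	{ echo "repack in progress, try later" >&2; exit 1; }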


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-26 14:45 ` Eran Tromer
       [not found]   ` <Pine.LNX.4.64.0610261105200.12418@xanadu.home>
@ 2006-10-27 20:22   ` Linus Torvalds
  2006-10-27 21:53     ` Junio C Hamano
  1 sibling, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2006-10-27 20:22 UTC (permalink / raw
  To: Eran Tromer; +Cc: git, Junio C Hamano



On Thu, 26 Oct 2006, Eran Tromer wrote:
>
> This creates a race condition w.r.t. "git repack -a -d", similar to the
> existing race condition between "git fetch --keep" and
> "git repack -a -d". There's a point in time where the new pack is stored
> but not yet referenced, and if "git repack -a -d" runs at that point it
> will eradicate the pack. When the heads are finally updated, you get a
> corrupted repository.

(I note that there's a whole thread on this, but I was off doing other 
things, so I probably missed part of it)

We really should _never_ create a pack in-place with the final name.

The way to fix the race is to simply not create the pack as

	.git/objects/pack/pack-xyz.{pack|idx}

in the first place, but simply "mv" them into place later. If you do that 
after you've written the temporary pointer to it, there is no race (the 
temporary pointer may not be usable, of course, but that's a separate 
issue).

That said, I think some of the "git repack -d" logic is also unnecessarily 
fragile. In particular, it shouldn't just do

	existing=$(find . -type f \( -name '*.pack' -o -name '*.idx' \))

like it does to generate the "existing" list, it should probably only ever 
remove a pack-file and index file _pair_, ie it should do something like

	existing=$(find . -type f -name '*.pack')

and then do

	for pack in $existing
	do
		index="$(basename $pack).idx"
		if [ -f $index ] && [ "$pack"!= "$newpack" ]
		then
			rm -f "$pack" "$index"
		fi
	done

etc, exactly so that it would never remove anything that is getting 
indexed or is otherwise half-way done, regardless of any other issues.

Hmm?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 20:22   ` Linus Torvalds
@ 2006-10-27 21:53     ` Junio C Hamano
  2006-10-28  3:42       ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-27 21:53 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> We really should _never_ create a pack in-place with the final name.

The "fattening" index-pack Nico did does not have this problem
as far as I can see.  Under --stdin, it creates a temporary pack
file "$GIT_OBJECT_DIRECTORY/pack_XXXXXX"; after the received
pack is fattened by adding missing delta-base objects and fixing
up the pack header, final() moves the {pack,idx} pair to the
final location.

The race is about this sequence:

	- git-receive-pack is spawned from remote git-send-pack;
          it lets "index-pack --stdin --fatten" keep the pack.

	- index-pack does its magic and moves the pack and idx
          to their final location;

	- "repack -a -d" is started by somebody else; it first
          remembers all the existing packs; it does the usual
          repacking-into-one.

	- git-receive-pack that invoked the index-pack waits for
          index-pack to finish, and then updates the refs;

	- "repack -a -d" is done repacking; removes the packs
          that existed when it checked earlier.

Two instances of receive-pack running simultaneously are safe (in
the sense that it does not corrupt the repository; one instance
can fail after noticing the other updated the ref it wanted to
update) and there is no reason for them to exclude each other.  But
"repack -a -d" and receive-pack are not.

Can we perhaps have a reader-writer lock on the filesystem to
protect the repository?  "prune" can also be made into a writer
for that lock and "fetch-pack --keep" would be a reader for the
lock.  That reader-writer lock would solve the issue rather
nicely.

> That said, I think some of the "git repack -d" logic is also unnecessarily 
> fragile.

Noted; will fix.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-27 21:53     ` Junio C Hamano
@ 2006-10-28  3:42       ` Shawn Pearce
  2006-10-28  4:09         ` Junio C Hamano
  2006-10-28  4:18         ` Linus Torvalds
  0 siblings, 2 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-28  3:42 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
[snip]
> Can we perhaps have reader-writer lock on the filesystem to
> pretect the repository?  "prune" can also be made into a writer
> for that lock and "fetch-pack --keep" would be a reader for the
> lock.  That reader-writer lock would solve the issue rather
> nicely.
> 
> > That said, I think some of the "git repack -d" logic is also unnecessarily 
> > fragile.
> 
> Noted; will fix.

So a reader-writer lock is preferred over
a non-locking solution such as I posted in
http://article.gmane.org/gmane.comp.version-control.git/30288 ?

Not to mention that such a solution would also fix the -d issue
Linus points out above.

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  3:42       ` Shawn Pearce
@ 2006-10-28  4:09         ` Junio C Hamano
  2006-10-28  4:18         ` Linus Torvalds
  1 sibling, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2006-10-28  4:09 UTC (permalink / raw
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> So a reader-writer lock is preferred over
> a non-locking solution such as I posted in
> http://article.gmane.org/gmane.comp.version-control.git/30288 ?

If by "solution" you mean these two sequences in your message:

   So the receive-pack process becomes:

     a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
     b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
     c. Write pack and index.
     d. Move pack to $GIT_DIR/objects/pack/...
     e. Move index to $GIT_DIR/objects/pack...
     f. Update refs.
     g. Arrange for new pack and index to be considered active.

   And the repack -a -d process becomes:

     1. List all active packs and store in memory.
     2. Repack only loose objects and objects contained in active packs.
     3. Move new pack and idx into $GIT_DIR/objects/pack/...
     4. Arrange for new pack and idx to be considered active.
     5. Delete active packs found by step #1.

I am not so sure how it solves anything at all.

The race is about this sequence:

      - git-receive-pack is spawned from the remote git-send-pack;
        it lets "index-pack --stdin --fix-thin" keep the pack.

      - index-pack does its magic and moves the pack and idx
        to their final location;

      - "repack -a -d" is started by somebody else; it first
        remembers all the existing packs; it does the usual
        repacking-into-one.

      - git-receive-pack that invoked the index-pack waits for
        index-pack to finish, and then updates the refs;

      - "repack -a -d" is done repacking; removes the packs
        that existed when it checked earlier.

Now, I am not sure what your plan to "arrange for new pack and
idx to be considered active" is.  Care to explain?

There is a tricky constraint imposed on us by the (arguably
broken) commit walkers in that they rely on the (arguably broken)
sha1_file.c:sha1_pack_name() interface, so naming historical
ones $GIT_OBJECT_DIR/pack/hist-X{40}.pack would not work; we
would need to fix the commit walkers for that first.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  3:42       ` Shawn Pearce
  2006-10-28  4:09         ` Junio C Hamano
@ 2006-10-28  4:18         ` Linus Torvalds
  2006-10-28  5:42           ` Junio C Hamano
  1 sibling, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2006-10-28  4:18 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Junio C Hamano, git



On Fri, 27 Oct 2006, Shawn Pearce wrote:
> 
> So a reader-writer lock is preferred over
> a non-locking solution such as I posted in
> http://article.gmane.org/gmane.comp.version-control.git/30288 ?
> 
> Not to mention that such a solution would also fix the -d issue
> Linus points out above.

Be very careful.

There's a good reason why git doesn't use locking, and tends to use the 
"create file exclusively and move over the old version after having tested 
that the old version is still relevant" approach.

Two _major_ issues:

 - just about any other locking algorithm simply doesn't work on some 
   filesystems. And then you're just royally screwed.

 - I want to be able to push out, regardless of whether there is somebody 
   (or millions of somebodies) reading the repository at the same time. So 
   locking is not acceptable for "normal operations" at all - at most this 
   would be a "keep a repack from interfering with another repack" kind of 
   thing.

I would MUCH rather we just rename the index/pack file to something that 
git can _use_, but that "git repack -a -d" won't remove. In other words, 
rather than locking, it would be much better to just use a naming rule: 
when we download a new pack, the new pack will be called

	new-pack-<SHA1ofobjectlist>.pack
	new-pack-<SHA1ofobjectlist>.idx

and we just make the rule that "git repack -a -d" will only ever touch 
packs that are called just "pack-*.{pack|idx}", and never anything else.

It really is that simple. Allow normal git object opens to open the 
"temporary file" naming version too (so that you can install the refs 
before the rename, and all the objects will be visible), but don't allow 
"git repack" to remove packs that are in the process of being installed.

Race removed, and no locking really needed. At most, we might need to be 
able to match up a "new-pack-*.idx" file with a "pack-*.pack" file when we 
open pack-files, simply because we can't rename two files atomically, so 
the pack-file and index file would potentially exist with "different" 
names for a short window. 

That kind of small semantic change is _way_ better than introducing 
locking, which will inevitably have much worse error cases (not working, 
stale locks, inability to push because something is really slow, or any 
number of other problems).


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  4:18         ` Linus Torvalds
@ 2006-10-28  5:42           ` Junio C Hamano
  2006-10-28  7:21             ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-28  5:42 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git, Shawn Pearce

Linus Torvalds <torvalds@osdl.org> writes:

> On Fri, 27 Oct 2006, Shawn Pearce wrote:
>> 
>> So a reader-writer lock is preferred over
>> a non-locking solution such as I posted in
>> http://article.gmane.org/gmane.comp.version-control.git/30288 ?
>> 
>> Not to mention that such a solution would also fix the -d issue
>> Linus points out above.
>
> Be very careful.
>
> There's a good reason why git doesn't use locking, and tends to use the 
> "create file exclusively and move over the old version after having tested 
> that the old version is still relevant" approach.
>
> Two _major_ issues:
>...
>
> I would MUCH rather we just rename the index/pack file to something that 
> git can _use_, but that "git repack -a -d" won't remove....

Two points.

The "locking" I mentioned was between receive-pack and repack -a
-d; upload-pack (what millions people are using to read from the
repository you are pushing into) is not affected.  So in that
sense, we can afford to use lock without much contention.

I just thought of a cute hack that does not involve renaming
packs at all (so no need to match new-pack-X.pack with
pack-X.idx), and Shawn's sequence actually would work, which is:

The receive-pack side:

  a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
  b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
  c. Write pack and index, in "inactive" state.
  d. Move pack to $GIT_DIR/objects/pack/...
  e. Move idx to $GIT_DIR/objects/pack...
  f. Update refs.
  g. Mark new pack and idx as "active".

The "repack -a -d" side:

  1. List all active packs and store in memory.
  2. Repack only loose objects and objects contained in active packs.
  3. Move new pack and idx into $GIT_DIR/objects/pack/...
  4. Mark new pack and idx as "active".
  5. Delete active packs found by step #1.

A pack-idx pair is marked "active" by doing "chmod u+s" on the
.pack file.  During normal operation, all .pack/.idx pairs in the
objects/pack/ directories are usable regardless of the setuid
bit; we would never make .pack files executable, so u+s would not
otherwise hurt us either.  "active" is probably better read as
"eligible for repacking".



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  5:42           ` Junio C Hamano
@ 2006-10-28  7:21             ` Shawn Pearce
  2006-10-28  8:40               ` Shawn Pearce
                                 ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-28  7:21 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> > I would MUCH rather we just rename the index/pack file to something that 
> > git can _use_, but that "git repack -a -d" won't remove....
> 
> Two points.
> 
> The "locking" I mentioned was between receive-pack and repack -a
> -d; upload-pack (what millions people are using to read from the
> repository you are pushing into) is not affected.  So in that
> sense, we can afford to use lock without much contention.

And given how difficult locking is to get right on most filesystems,
I'd rather avoid any sort of locking whenever possible.  That's one
reason why reflog is 1 file per ref and not 1 file per repository...
 
> I just thought of a cute hack that does not involve renaming
> packs at all (so no need to match new-pack-X.pack with
> pack-X.idx), and Shawn's sequence actually would work, which is:

I take this above statement to mean that you answered your own
question about how my sequence is able to resolve the race condition?
 
> The receive-pack side:
> 
>   a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
>   b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
>   c. Write pack and index, in "inactive" state.
>   d. Move pack to $GIT_DIR/objects/pack/...
>   e. Move idx to $GIT_DIR/objects/pack...
>   f. Update refs.
>   g. Mark new pack and idx as "active".
> 
> The "repack -a -d" side:
> 
>   1. List all active packs and store in memory.
>   2. Repack only loose objects and objects contained in active packs.
>   3. Move new pack and idx into $GIT_DIR/objects/pack/...
>   4. Mark new pack and idx as "active".
>   5. Delete active packs found by step #1.
> 
> A pack-idx pair is marked "active" by doing "chmod u+s" on the
> .pack file.  During normal operation, all .pack/.idx pairs in the
> objects/pack/ directories are usable regardless of the setuid
> bit; we would never make .pack files executable, so u+s would not
> otherwise hurt us either.  "active" is probably better read as
> "eligible for repacking".

As cool as that trick is, I'm against using the file mode as a
way to indicate the status of a pack file.  For one thing, not
every filesystem that Git is used on handles file modes properly.
We already have core.filemode thanks to some of those and I use
Git on at least one of those "not so friendly" filesystems...

Why not just create a new flag file?

Let's say that a pack X is NOT eligible to be repacked if
"$GIT_DIR/objects/pack/pack-X.keep" exists.

Thus we want to have the new ".keep" file for historical packs and
incoming receive-pack between steps c and g.  In the former case
the historical pack is already "very large" and thus one additional
empty file to indicate we want to retain that pack as-is is trivial
overhead (relatively speaking); in the latter case the lifespan of
the file is relatively short and thus any overhead associated with it
on the local filesystem is free (it may never even hit the platter).

In the sequence above we create pack-X.keep between steps b and c
during receive-pack, ensuring that even before the pack is usable by
a Git reader process it cannot be swept up by a `repack -a -d`,
and we delete the pack-X.keep file in step g to mark it active.

Further, only the repack and receive-pack side code changes: all
existing packs are automatically taken to be active, while only
packs coming in from receive-pack or those marked by a human as
"historical" will be kept.

Two birds, one stone.  Thoughts?
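Condensed into shell, the proposal looks roughly like this (error handling omitted; update_refs is a placeholder, not a real command):

	# receive-pack side
	keep="$PACKDIR/pack-$name.keep"
	: >"$keep"                               # created before the pack becomes visible
	mv "$tmp.pack" "$PACKDIR/pack-$name.pack"
	mv "$tmp.idx"  "$PACKDIR/pack-$name.idx"
	update_refs                              # step f
	rm -f "$keep"                            # step g: now an ordinary, repackable pack

	# repack -a -d side: skip anything with a matching .keep
	for p in "$PACKDIR"/pack-*.pack
	do
		test -e "${p%.pack}.keep" && continue
		# ... include $p in the repack and delete it afterwards ...
	done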

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  7:21             ` Shawn Pearce
@ 2006-10-28  8:40               ` Shawn Pearce
  2006-10-28 19:15                 ` Junio C Hamano
  2006-10-28 17:59               ` Linus Torvalds
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread
From: Shawn Pearce @ 2006-10-28  8:40 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Shawn Pearce <spearce@spearce.org> wrote:
> Why not just use create a new flag file?
> 
> Lets say that a pack X is NOT eligible to be repacked if
> "$GIT_DIR/objects/pack/pack-X.keep" exists.

Here's the `git repack -a -d` portion of that.
Thoughts?

-- >8 --
[PATCH] Only repack active packs by skipping over kept packs.

During `git repack -a -d` only repack objects which are loose or
which reside in an active (a non-kept) pack.  This allows the user
to keep large packs as-is without continuous repacking and can be
very helpful on large repositories.  It should also help us resolve
a race condition between `git repack -a -d` and the new pack store
functionality in `git-receive-pack`.

Kept packs are those which have a corresponding .keep file in
$GIT_OBJECT_DIRECTORY/pack.  That is, pack-X.pack will be kept
(not repacked and not deleted) if pack-X.keep exists in the same
directory when `git repack -a -d` starts.

Currently this feature is not documented and there is no user
interface to keep an existing pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 git-repack.sh |   48 +++++++++++++++++++++++++++++++-----------------
 1 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/git-repack.sh b/git-repack.sh
index 17e2452..fe1e2ef 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -43,13 +43,30 @@ trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
 case ",$all_into_one," in
 ,,)
 	args='--unpacked --incremental'
+	active=
 	;;
 ,t,)
-	args=
-
-	# Redundancy check in all-into-one case is trivial.
-	existing=`test -d "$PACKDIR" && cd "$PACKDIR" && \
-	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
+	args=--unpacked
+	active=
+	if test -d "$PACKDIR"
+	then
+		for p in `find "$PACKDIR" -type f -name '*.pack' -print`
+		do
+			n=`basename "$p" .pack`
+			d=`dirname "$p"`
+			if test -e "$d/$n.keep"
+			then
+				: keep
+			else
+				args="$args --unpacked=$p"
+				active="$active $n"
+			fi
+		done
+	fi
+	if test "X$args" = X--unpacked
+	then
+		args='--unpacked --incremental'
+	fi
 	;;
 esac
 
@@ -86,20 +103,17 @@ fi
 
 if test "$remove_redundant" = t
 then
-	# We know $existing are all redundant only when
-	# all-into-one is used.
-	if test "$all_into_one" != '' && test "$existing" != ''
+	# We know $active are all redundant.
+	if test "$active" != ''
 	then
 		sync
-		( cd "$PACKDIR" &&
-		  for e in $existing
-		  do
-			case "$e" in
-			./pack-$name.pack | ./pack-$name.idx) ;;
-			*)	rm -f $e ;;
-			esac
-		  done
-		)
+		for n in $active
+		do
+			if test "$n" != "pack-$name"
+			then
+				rm -f "$PACKDIR/$n.pack" "$PACKDIR/$n.idx"
+			fi
+		done
 	fi
 	git-prune-packed
 fi
-- 
1.4.3.3.g7d63
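With this patch applied, and since there is no user interface yet, keeping an existing pack is just a matter of creating the marker file by hand, e.g.:

	touch .git/objects/pack/pack-<sha1>.keep
	git repack -a -d          # leaves that pack (and its .idx) alone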

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  7:21             ` Shawn Pearce
  2006-10-28  8:40               ` Shawn Pearce
@ 2006-10-28 17:59               ` Linus Torvalds
  2006-10-28 18:34               ` Junio C Hamano
  2006-10-28 22:31               ` Eran Tromer
  3 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2006-10-28 17:59 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Junio C Hamano, git



On Sat, 28 Oct 2006, Shawn Pearce wrote:
> 
> Why not just use create a new flag file?
> 
> Lets say that a pack X is NOT eligible to be repacked if
> "$GIT_DIR/objects/pack/pack-X.keep" exists.

Yeah, me likee. Simple and straightforward, and mixes well with the 
"--keep" flag that has been discussed for git repack anyway.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  7:21             ` Shawn Pearce
  2006-10-28  8:40               ` Shawn Pearce
  2006-10-28 17:59               ` Linus Torvalds
@ 2006-10-28 18:34               ` Junio C Hamano
  2006-10-28 22:31               ` Eran Tromer
  3 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2006-10-28 18:34 UTC (permalink / raw
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> I take this above statement to mean that you answered your own
> question about how my sequence is able to resolve the race condition?

Yes.  I needed more thought after I asked that question.

>...
> Why not just use create a new flag file?
>
> Lets say that a pack X is NOT eligible to be repacked if
> "$GIT_DIR/objects/pack/pack-X.keep" exists.

I like it.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  8:40               ` Shawn Pearce
@ 2006-10-28 19:15                 ` Junio C Hamano
  2006-10-29  3:50                   ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-28 19:15 UTC (permalink / raw
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> Shawn Pearce <spearce@spearce.org> wrote:
>> Why not just use create a new flag file?
>> 
>> Lets say that a pack X is NOT eligible to be repacked if
>> "$GIT_DIR/objects/pack/pack-X.keep" exists.
>
> Here's the `git repack -a -d` portion of that.
> Thoughts?

> +	args=--unpacked
> +	active=
> +	if test -d "$PACKDIR"
> +	then
> +		for p in `find "$PACKDIR" -type f -name '*.pack' -print`

This change to run 'find "$PACKDIR"' is fragile when your
$GIT_OBJECT_DIRECTORY has $IFS in it; running "find ." after
"cd" in a subprocess was done very much on purpose to avoid that
issue.  Please don't break it.
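The $IFS-safe idiom being referred to keeps the object directory out of the generated words entirely, along these lines:

	existing=$(test -d "$PACKDIR" && cd "$PACKDIR" &&
		   find . -type f -name '*.pack' -print)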

> +		do
> +			n=`basename "$p" .pack`
> +			d=`dirname "$p"`
> +			if test -e "$d/$n.keep"
> +			then
> +				: keep
> +			else
> +				args="$args --unpacked=$p"
> +				active="$active $n"
> +			fi
> +		done
> +	fi
> +	if test "X$args" = X--unpacked
> +	then
> +		args='--unpacked --incremental'
> +	fi
>  	;;
>  esac

I do not remember offhand what --incremental meant, but
presumably this is for the very initial "repack" (PACKDIR did
not exist or find loop did not find anything to repack) and the
flag would not make a difference?  Care to explain?

Other than that, the overall structure seems quite sane.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28  7:21             ` Shawn Pearce
                                 ` (2 preceding siblings ...)
  2006-10-28 18:34               ` Junio C Hamano
@ 2006-10-28 22:31               ` Eran Tromer
  2006-10-29  3:38                 ` Shawn Pearce
  3 siblings, 1 reply; 51+ messages in thread
From: Eran Tromer @ 2006-10-28 22:31 UTC (permalink / raw
  To: git; +Cc: Shawn Pearce

Hi Shawn,

On 2006-10-28 09:21, Shawn Pearce wrote:
> Lets say that a pack X is NOT eligible to be repacked if
> "$GIT_DIR/objects/pack/pack-X.keep" exists.
> 
> Thus we want to have the new ".keep" file for historical packs and
> incoming receive-pack between steps c and g.  In the former case
> the historical pack is already "very large" and thus one additional
> empty file to indicate we want to retain that pack as-is is trivial
> overhead (relatively speaking); in the latter case the lifespan of
> the file is relatively short and thus any overhead associated with it
> on the local filesystem is free (it may never even hit the platter).

Sounds perfect.

It would be nice to have whoever creates a pack-*.keep file put
something useful as the content of the file, so we'll know what to clean
up after abnormal termination:

$ grep -l ^git-receive-pack $GIT_DIR/objects/pack/pack-*.keep
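For instance, receive-pack could write something along these lines into the marker (the exact format is up to whoever creates the file), so that the grep above finds markers left behind by a crashed receive-pack:

	{
		echo "git-receive-pack $$"
		date
	} >"$GIT_DIR/objects/pack/pack-$name.keep"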



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28 22:31               ` Eran Tromer
@ 2006-10-29  3:38                 ` Shawn Pearce
  2006-10-29  3:48                   ` Jakub Narebski
  0 siblings, 1 reply; 51+ messages in thread
From: Shawn Pearce @ 2006-10-29  3:38 UTC (permalink / raw
  To: Eran Tromer; +Cc: git

Eran Tromer <git2eran@tromer.org> wrote:
> Hi Shawn,
> 
> On 2006-10-28 09:21, Shawn Pearce wrote:
> > Lets say that a pack X is NOT eligible to be repacked if
> > "$GIT_DIR/objects/pack/pack-X.keep" exists.
> > 
> > Thus we want to have the new ".keep" file for historical packs and
> > incoming receive-pack between steps c and g.  In the former case
> > the historical pack is already "very large" and thus one additional
> > empty file to indicate we want to retain that pack as-is is trivial
> > overhead (relatively speaking); in the latter case the lifespan of
> > the file is relatively short and thus any overhead associated with it
> > on the local filesystem is free (it may never even hit the platter).
> 
> Sounds perfect.
> 
> It would be nice to have whoever creates a pack-*.keep file put
> something useful as the content of the file, so we'll know what to clean
> up after abnormal termination:
> 
> $ grep -l ^git-receive-pack $GIT_DIR/objects/pack/pack-*.keep

Yes, that's a very good idea.  When I do the git-receive-pack
implementation tonight I'll try to dump useful information to the
.keep file such that you can easily grep for the stale .keeps
and decide which ones should go.

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-29  3:38                 ` Shawn Pearce
@ 2006-10-29  3:48                   ` Jakub Narebski
  2006-10-29  3:52                     ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Narebski @ 2006-10-29  3:48 UTC (permalink / raw
  To: git

Shawn Pearce wrote:

> Eran Tromer <git2eran@tromer.org> wrote:
>>
>> It would be nice to have whoever creates a pack-*.keep file put
>> something useful as the content of the file, so we'll know what to clean
>> up after abnormal termination:
>> 
>> $ grep -l ^git-receive-pack $GIT_DIR/objects/pack/pack-*.keep
> 
> Yes, that's a very good idea.  When I do the git-receive-pack
> implementation tonight I'll try to dump useful information to the
> .keep file such that you can easily grep for the stale .keeps
> and decide which ones should go.

Perhaps git-count-packs (or enhanced git-count-objects, or git-count-stuff;
whatever it would be named) could also list (with some option) the reasons
for packs to be kept...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-28 19:15                 ` Junio C Hamano
@ 2006-10-29  3:50                   ` Shawn Pearce
  2006-10-29  4:29                     ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: Shawn Pearce @ 2006-10-29  3:50 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > Shawn Pearce <spearce@spearce.org> wrote:
> >> Why not just use create a new flag file?
> >> 
> >> Lets say that a pack X is NOT eligible to be repacked if
> >> "$GIT_DIR/objects/pack/pack-X.keep" exists.
> >
> > Here's the `git repack -a -d` portion of that.
> > Thoughts?
> 
> > +	args=--unpacked
> > +	active=
> > +	if test -d "$PACKDIR"
> > +	then
> > +		for p in `find "$PACKDIR" -type f -name '*.pack' -print`
> 
> This change to run 'find "$PACKDIR"' is fragile when your
> $GIT_OBJECT_DIRECTORY has $IFS in it; running "find ." after
> "cd" in a subprocess was done very much on purpose to avoid that
> issue.  Please don't break it.

I only broke it because you backed me into a corner with --unpacked=
:-)

The issue is that --unpacked= takes the path of the pack, which
includes $GIT_OBJECT_DIRECTORY, whatever that may be.  This makes it
impossible for the shell script to pass through a proper --unpacked=
option for the active packs without including $GIT_OBJECT_DIRECTORY
as part of the option.

I agree with you about the $IFS issue.  I'll redraft this patch
tonight such that $IFS doesn't get broken here but that's going
to take a small code patch over in revisions.c, which I'll also
do tonight.

> > +	if test "X$args" = X--unpacked
> > +	then
> > +		args='--unpacked --incremental'
> > +	fi
 
> I do not remember offhand what --incremental meant, but
> presumably this is for the very initial "repack" (PACKDIR did
> not exist or find loop did not find anything to repack) and the
> flag would not make a difference?  Care to explain?
 
I think there is a bug in pack-objects but I couldn't find it last
night.  Using --incremental worked around it.  :-)

According to the documentation:

  --unpacked tells pack-objects to only pack loose objects.

  --incremental tells pack-objects to skip any object that is
  already contained in a pack even if it appears in the input list.

What I really wanted here was to just use '--unpacked'; if there
are no active packs and the user asked for '-a' we actually just
want to pack the loose objects into a new active pack as we aren't
allowed to touch any of the existing packs (they are all kept or
there simply aren't any packs yet).

However, on the git.git repository, if I ran `git repack -a -d`
with every single object in a kept pack and no loose objects, I kept
repacking the same 102 objects into a new active pack, even though
there were no loose objects to repack and no active packs.  Uh, yeah.

Adding --incremental in this case kept it from repacking those
same 102 objects on every invocation.  Yeah, it's a bug.  I meant to
mention it in my email.

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-29  3:48                   ` Jakub Narebski
@ 2006-10-29  3:52                     ` Shawn Pearce
  0 siblings, 0 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-29  3:52 UTC (permalink / raw
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:
> Shawn Pearce wrote:
> 
> > Eran Tromer <git2eran@tromer.org> wrote:
> >>
> >> It would be nice to have whoever creates a pack-*.keep file put
> >> something useful as the content of the file, so we'll know what to clean
> >> up after abnormal termination:
> >> 
> >> $ grep -l ^git-receive-pack $GIT_DIR/objects/pack/pack-*.keep
> > 
> > Yes, that's a very good idea.  When I do the git-receive-pack
> > implementation tonight I'll try to dump useful information to the
> > .keep file such that you can easily grep for the stale .keeps
> > and decide which ones should go.
> 
> Perhaps git-count-packs (or enhanced git-count-objects, or git-count-stuff;
> whatever it would be named) could also list (with some option) the reasons
> for packs to be kept...

That would be more like 'git ls-packs' to me, but as Junio pointed
out why add a command just to count packs, and as others have said
recently "Git has too many commands!".

<funny-and-not-to-be-taken-seriously>
I like git-count-stuff.  So generic.  Maybe we can shove
git-pickaxe in as the --count-lines-and-authors-andthings option of
git-count-stuff, seeing as how some people didn't like its name.  :-)
</funny-and-not-to-be-taken-seriously>

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-29  3:50                   ` Shawn Pearce
@ 2006-10-29  4:29                     ` Junio C Hamano
  2006-10-29  4:38                       ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-29  4:29 UTC (permalink / raw
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> The issue is --unpacked= uses the path of the pack name, which
> includes $GIT_OBJECT_DIRECTORY, whatever that may be.  This makes it
> impossible for the shell script to hand through a proper --unpacked=
> line for the active packs without including $GIT_OBJECT_DIRECTORY
> as part of the option.

Yeah, I realize that; you need to know how to build a shell command
that is properly quoted to be eval'ed, which is not hard
but is not usually done and is cumbersome.

I would suspect it is probably easier to just say --unpacked
(without packname) means "unpacked objects, and objects in packs
that do not have corresponding .keep".  However, that would be a
change in semantics for --unpacked (without packname), which is
not nice.

So how about pack-X{40}.volatile that marks an eligible one for
repacking?

Then we can make "pack-objects --unpacked" to pretend the ones
with corresponding .volatile as if the objects in them are
loose, without breaking backward compatibility.

> However on the git.git repository if I ran `git repack -a -d`
> with every single object in a kept pack and no loose objects I
> kept repacking the same 102 objects into a new active pack,
> even though there were no loose objects to repack and no
> active packs.  Uh, yea.

Will take a look myself if you are otherwise busy.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-29  4:29                     ` Junio C Hamano
@ 2006-10-29  4:38                       ` Shawn Pearce
  2006-10-29  5:16                         ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: Shawn Pearce @ 2006-10-29  4:38 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > The issue is --unpacked= uses the path of the pack name, which
> > includes $GIT_OBJECT_DIRECTORY, whatever that may be.  This makes it
> > impossible for the shell script to hand through a proper --unpacked=
> > line for the active packs without including $GIT_OBJECT_DIRECTORY
> > as part of the option.
> 
> Yeah, I realize that; you need to know how to build shell script
> that is properly shell quoted to be eval'ed, which is not hard
> but is not usually done and is cumbersome.

Too much work.  :-)

> I would suspect it is probably easier to just say --unpacked
> (without packname) means "unpacked objects, and objects in packs
> that do not have corresponding .keep".  However, that would be a
> change in semantics for --unpacked (without packname), which is
> not nice.
> 
> So how about pack-X{40}.volatile that marks an eligible one for
> repacking?

Then anyone who has an existing pack would need to create that
file first as soon as they got this newer version of Git... not
very upgrade friendly if you ask me.
 
> Then we can make "pack-objects --unpacked" to pretend the ones
> with corresponding .volatile as if the objects in them are
> loose, without breaking backward compatibility.

Currently I'm changing --unpacked= to match without needing quoting.
I'm allowing it to match either an exact pack name, or, if the
argument starts with "pack-", the last 50 characters
("pack-X{40}.pack") of the pack name.

I figure this should work fine, as anyone who has a pack name that
starts with "pack-" and matches those 50 characters is almost
certainly using a pack file name containing the SHA1 of the object
list, which is thus probably unique.
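Expressed as a small shell predicate, that matching rule amounts to something like this (illustrative only; the real change lives in the C revision-walking code):

	matches_unpacked_arg () {
		arg=$1 packpath=$2
		test "$arg" = "$packpath" && return 0          # exact path match
		case "$arg" in
		pack-*)
			# a bare "pack-X{40}.pack" matches the trailing component
			test "$arg" = "$(basename "$packpath")" && return 0 ;;
		esac
		return 1
	}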
 
-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-29  4:38                       ` Shawn Pearce
@ 2006-10-29  5:16                         ` Junio C Hamano
  2006-10-29  5:21                           ` Shawn Pearce
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-29  5:16 UTC (permalink / raw
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> Shawn Pearce <spearce@spearce.org> writes:
>> 
>> So how about pack-X{40}.volatile that marks an eligible one for
>> repacking?
>
> Then anyone who has an existing pack would need to create that
> file first as soon as they got this newer version of Git... not
> very upgrade friendly if you ask me.

Ah, I mixed things up completely.  You're right.  Having .keep
leaves that pack as is (and lack of matching .keep causes it to
be repacked -- people do not have .keep so everything should be
repacked as before).

>> Then we can make "pack-objects --unpacked" to pretend the ones
>> with corresponding .volatile as if the objects in them are
>> loose, without breaking backward compatibility.
>
> Currently I'm changing --unpacked= to match without needing quoting.
> I'm allowing it to match an exact pack name or if it starts with
> "pack-" and matches the last 50 ("pack-X{40}.pack") of the pack name.

I think that is a very sane thing to do (I should have done that from
the beginning).  I do not like "the last 50", but I have no
objection to making it take either the full path or just the filename
under objects/pack/ (so not "the last 50" but "filename w/o
slash").

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: fetching packs and storing them as packs
  2006-10-29  5:16                         ` Junio C Hamano
@ 2006-10-29  5:21                           ` Shawn Pearce
  0 siblings, 0 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-29  5:21 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> >> Then we can make "pack-objects --unpacked" to pretend the ones
> >> with corresponding .volatile as if the objects in them are
> >> loose, without breaking backward compatibility.
> >
> > Currently I'm changing --unpacked= to match without needing quoting.
> > I'm allowing it to match an exact pack name or if it starts with
> > "pack-" and matches the last 50 ("pack-X{40}.pack") of the pack name.
> 
> I think that is a very sane thing to do (I should have done that from
> the beginning).  I do not like "the last 50", but I have no
> objection to making it take either the full path or just the filename
> under objects/pack/ (so not "the last 50" but "filename w/o
> slash").

OK.  I coded it with the last 50 but will rewrite that commit
without it, as the code is slightly shorter that way.  :-)

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] send-pack --keep: do not explode into loose objects on the receiving end.
  2006-10-26  3:44 fetching packs and storing them as packs Nicolas Pitre
  2006-10-26 14:45 ` Eran Tromer
@ 2006-10-29  7:47 ` Junio C Hamano
  2006-10-29  7:56   ` Shawn Pearce
  1 sibling, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2006-10-29  7:47 UTC (permalink / raw
  To: Nicolas Pitre; +Cc: git

This adds "keep-pack" extension to send-pack vs receive pack protocol,
and makes the receiver invoke "index-pack --stdin --fix-thin".

With this, you can ask send-pack not to explode the result into
loose objects on the receiving end.

I've patched has_sha1_file() to re-check for added packs just
as is done in read_sha1_file() for now, but I think the static
"re-prepare" interface for packs was a mistake.  Creation of a
new pack inside a process that needs to read objects from it
back ought to be a rare event, so we are better off making the
callers (such as receive-pack that calls "index-pack --stdin
--fix-thin") explicitly call re-prepare.  That way we do not
have to penalize ordinary users of read_sha1_file() and
has_sha1_file().

We would need to fix this someday.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 receive-pack.c       |   19 ++++++++++++++++---
 send-pack.c          |   23 ++++++++++++++++++-----
 sha1_file.c          |    7 +++++--
 t/t5400-send-pack.sh |    9 +++++++++
 4 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/receive-pack.c b/receive-pack.c
index ea2dbd4..ef50226 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -8,10 +8,14 @@
 static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
 
 static const char *unpacker[] = { "unpack-objects", NULL };
+static const char *keep_packer[] = {
+	"index-pack", "--stdin", "--fix-thin", NULL
+};
 
 static int report_status;
+static int keep_pack;
 
-static char capabilities[] = "report-status";
+static char capabilities[] = "report-status keep-pack";
 static int capabilities_sent;
 
 static int show_ref(const char *path, const unsigned char *sha1)
@@ -261,6 +265,8 @@ static void read_head_info(void)
 		if (reflen + 82 < len) {
 			if (strstr(refname + reflen + 1, "report-status"))
 				report_status = 1;
+			if (strstr(refname + reflen + 1, "keep-pack"))
+				keep_pack = 1;
 		}
 		cmd = xmalloc(sizeof(struct command) + len - 80);
 		hashcpy(cmd->old_sha1, old_sha1);
@@ -275,7 +281,14 @@ static void read_head_info(void)
 
 static const char *unpack(int *error_code)
 {
-	int code = run_command_v_opt(1, unpacker, RUN_GIT_CMD);
+	int code;
+
+	if (keep_pack)
+		code = run_command_v_opt(ARRAY_SIZE(keep_packer) - 1,
+					 keep_packer, RUN_GIT_CMD);
+	else
+		code = run_command_v_opt(ARRAY_SIZE(unpacker) - 1,
+					 unpacker, RUN_GIT_CMD);
 
 	*error_code = 0;
 	switch (code) {
@@ -335,7 +348,7 @@ int main(int argc, char **argv)
 	if (!dir)
 		usage(receive_pack_usage);
 
-	if(!enter_repo(dir, 0))
+	if (!enter_repo(dir, 0))
 		die("'%s': unable to chdir or not a git archive", dir);
 
 	write_head_info();
diff --git a/send-pack.c b/send-pack.c
index 5bb123a..54d218c 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -6,13 +6,14 @@
 #include "exec_cmd.h"
 
 static const char send_pack_usage[] =
-"git-send-pack [--all] [--exec=git-receive-pack] <remote> [<head>...]\n"
+"git-send-pack [--all] [--keep] [--exec=git-receive-pack] <remote> [<head>...]\n"
 "  --all and explicit <head> specification are mutually exclusive.";
 static const char *exec = "git-receive-pack";
 static int verbose;
 static int send_all;
 static int force_update;
 static int use_thin_pack;
+static int keep_pack;
 
 static int is_zero_sha1(const unsigned char *sha1)
 {
@@ -270,6 +271,7 @@ static int send_pack(int in, int out, in
 	int new_refs;
 	int ret = 0;
 	int ask_for_status_report = 0;
+	int ask_to_keep_pack = 0;
 	int expect_status_report = 0;
 
 	/* No funny business with the matcher */
@@ -279,6 +281,8 @@ static int send_pack(int in, int out, in
 	/* Does the other end support the reporting? */
 	if (server_supports("report-status"))
 		ask_for_status_report = 1;
+	if (server_supports("keep-pack") && keep_pack)
+		ask_to_keep_pack = 1;
 
 	/* match them up */
 	if (!remote_tail)
@@ -355,12 +359,17 @@ static int send_pack(int in, int out, in
 		strcpy(old_hex, sha1_to_hex(ref->old_sha1));
 		new_hex = sha1_to_hex(ref->new_sha1);
 
-		if (ask_for_status_report) {
-			packet_write(out, "%s %s %s%c%s",
+		if (ask_for_status_report || ask_to_keep_pack) {
+			packet_write(out, "%s %s %s%c%s%s",
 				     old_hex, new_hex, ref->name, 0,
-				     "report-status");
+				     ask_for_status_report
+				     ? " report-status" : "",
+				     ask_to_keep_pack
+				     ? " keep-pack" : "");
+			if (ask_for_status_report)
+				expect_status_report = 1;
 			ask_for_status_report = 0;
-			expect_status_report = 1;
+			ask_to_keep_pack = 0;
 		}
 		else
 			packet_write(out, "%s %s %s",
@@ -419,6 +428,10 @@ int main(int argc, char **argv)
 				verbose = 1;
 				continue;
 			}
+			if (!strcmp(arg, "--keep")) {
+				keep_pack = 1;
+				continue;
+			}
 			if (!strcmp(arg, "--thin")) {
 				use_thin_pack = 1;
 				continue;
diff --git a/sha1_file.c b/sha1_file.c
index e89d24c..278ba2f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1292,7 +1292,7 @@ static void *read_packed_sha1(const unsi
 	return unpack_entry(&e, type, size);
 }
 
-void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
+void *read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
 {
 	unsigned long mapsize;
 	void *map, *buf;
@@ -1757,7 +1757,10 @@ int has_sha1_file(const unsigned char *s
 
 	if (find_pack_entry(sha1, &e, NULL))
 		return 1;
-	return find_sha1_file(sha1, &st) ? 1 : 0;
+	if (find_sha1_file(sha1, &st))
+		return 1;
+	reprepare_packed_git();
+	return find_pack_entry(sha1, &e, NULL);
 }
 
 /*
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 8afb899..d831f8d 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -78,4 +78,13 @@ test_expect_success \
 	! diff -u .git/refs/heads/master victim/.git/refs/heads/master
 '
 
+test_expect_success 'push with --keep' '
+	t=`cd victim && git-rev-parse --verify refs/heads/master` &&
+	git-update-ref refs/heads/master $t &&
+	: > foo &&
+	git add foo &&
+	git commit -m "one more" &&
+	git-send-pack --keep ./victim/.git/ master
+'
+
 test_done
-- 
1.4.3.3.g7d63


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] send-pack --keep: do not explode into loose objects on the receiving end.
  2006-10-29  7:47 ` [PATCH] send-pack --keep: do not explode into loose objects on the receiving end Junio C Hamano
@ 2006-10-29  7:56   ` Shawn Pearce
  2006-10-29  8:05     ` Junio C Hamano
  2006-10-30  1:44     ` Nicolas Pitre
  0 siblings, 2 replies; 51+ messages in thread
From: Shawn Pearce @ 2006-10-29  7:56 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Nicolas Pitre, git

Junio C Hamano <junkio@cox.net> wrote:
> This adds "keep-pack" extension to send-pack vs receive pack protocol,
> and makes the receiver invoke "index-pack --stdin --fix-thin".

I'm torn on this.  I see that keeping a pack vs. exploding to loose
objects is a local repository decision and thus should be determined
by the receiving repository, not the sending one.

I was thinking of just reading the pack header in receive-pack,
checking the object count, and if it's over a configured threshold
calling index-pack rather than unpack-objects.  Unfortunately I just
realized that if we read the pack header to make that decision then
it's gone and the child process won't have it.  :-(

-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] send-pack --keep: do not explode into loose objects on the receiving end.
  2006-10-29  7:56   ` Shawn Pearce
@ 2006-10-29  8:05     ` Junio C Hamano
  2006-10-30  1:44     ` Nicolas Pitre
  1 sibling, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2006-10-29  8:05 UTC (permalink / raw
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> I was thinking of just reading the pack header in receive-pack,
> checking the object count, and if it's over a configured threshold
> calling index-pack rather than unpack-objects.  Unfortunately I just
> realized that if we read the pack header to make that decision then
> it's gone and the child process won't have it.  :-(

If you want to do that, that is certainly possible.

You can read the first block in the parent (without discarding),
make the decision and then fork()+exec() either unpack-objects
or index-pack and feed it from the parent.  The parent first
feeds the initial block it read to make that decision, and then
becomes a cat that reads from send-pack and writes to the child
process that is either unpack-objects or index-pack.
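In shell the same scheme can be sketched like this (unpack_limit is a made-up threshold name; the real receive-pack would do this in C):

	hdr=$(mktemp)
	head -c 12 >"$hdr"                           # peek at the 12-byte pack header
	nr_objects=$(od -An -tu1 -j8 -N4 "$hdr" |
		     awk '{ print $1*16777216 + $2*65536 + $3*256 + $4 }')
	if test "$nr_objects" -gt "$unpack_limit"    # hypothetical threshold
	then
		cmd='git-index-pack --stdin --fix-thin'
	else
		cmd='git-unpack-objects'
	fi
	cat "$hdr" - | $cmd                          # replay the header, then pass the rest through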

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] send-pack --keep: do not explode into loose objects on the receiving end.
  2006-10-29  7:56   ` Shawn Pearce
  2006-10-29  8:05     ` Junio C Hamano
@ 2006-10-30  1:44     ` Nicolas Pitre
  1 sibling, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2006-10-30  1:44 UTC (permalink / raw
  To: Shawn Pearce; +Cc: Junio C Hamano, git

On Sun, 29 Oct 2006, Shawn Pearce wrote:

> Junio C Hamano <junkio@cox.net> wrote:
> > This adds "keep-pack" extension to send-pack vs receive pack protocol,
> > and makes the receiver invoke "index-pack --stdin --fix-thin".
> 
> I'm torn on this.  I see that keeping a pack vs. exploding to loose
> objects is a local repository decision and thus should be determined
> by the receiving repository, not the sending one.

I second this.  I think it is really not the remote end's business to 
decide what the local storage policy is, and vice versa.



^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2006-10-30  1:44 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-10-26  3:44 fetching packs and storing them as packs Nicolas Pitre
2006-10-26 14:45 ` Eran Tromer
     [not found]   ` <Pine.LNX.4.64.0610261105200.12418@xanadu.home>
2006-10-26 22:09     ` Eran Tromer
2006-10-27  0:50       ` Nicolas Pitre
2006-10-27  1:42         ` Shawn Pearce
2006-10-27  2:38           ` Sean
2006-10-27  6:57             ` Junio C Hamano
2006-10-27 17:23               ` Nicolas Pitre
2006-10-27  2:41           ` Nicolas Pitre
2006-10-27  2:42           ` Eran Tromer
2006-10-27  3:00             ` Shawn Pearce
2006-10-27  3:13               ` Sean
2006-10-27  3:20                 ` Jakub Narebski
2006-10-27  3:27                   ` Sean
2006-10-27  4:03               ` Eran Tromer
2006-10-27  4:42                 ` Shawn Pearce
2006-10-27  7:42                   ` Alex Riesen
2006-10-27  7:52                     ` Shawn Pearce
2006-10-27  8:08                       ` Alex Riesen
2006-10-27  8:13                         ` Shawn Pearce
2006-10-27 14:27               ` Nicolas Pitre
2006-10-27 14:38                 ` Petr Baudis
2006-10-27 14:48                   ` J. Bruce Fields
2006-10-27 15:03                     ` Petr Baudis
2006-10-27 16:04                       ` J. Bruce Fields
2006-10-27 16:05                         ` J. Bruce Fields
2006-10-27 18:56                   ` Junio C Hamano
2006-10-27 20:22   ` Linus Torvalds
2006-10-27 21:53     ` Junio C Hamano
2006-10-28  3:42       ` Shawn Pearce
2006-10-28  4:09         ` Junio C Hamano
2006-10-28  4:18         ` Linus Torvalds
2006-10-28  5:42           ` Junio C Hamano
2006-10-28  7:21             ` Shawn Pearce
2006-10-28  8:40               ` Shawn Pearce
2006-10-28 19:15                 ` Junio C Hamano
2006-10-29  3:50                   ` Shawn Pearce
2006-10-29  4:29                     ` Junio C Hamano
2006-10-29  4:38                       ` Shawn Pearce
2006-10-29  5:16                         ` Junio C Hamano
2006-10-29  5:21                           ` Shawn Pearce
2006-10-28 17:59               ` Linus Torvalds
2006-10-28 18:34               ` Junio C Hamano
2006-10-28 22:31               ` Eran Tromer
2006-10-29  3:38                 ` Shawn Pearce
2006-10-29  3:48                   ` Jakub Narebski
2006-10-29  3:52                     ` Shawn Pearce
2006-10-29  7:47 ` [PATCH] send-pack --keep: do not explode into loose objects on the receiving end Junio C Hamano
2006-10-29  7:56   ` Shawn Pearce
2006-10-29  8:05     ` Junio C Hamano
2006-10-30  1:44     ` Nicolas Pitre

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
