git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC][PATCH] Allow transfer of any valid sha1
@ 2006-05-24  7:51 Eric W. Biederman
  2006-05-24  9:07 ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-24  7:51 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git


While working on git-quiltimport I decided to see if
I could transform Andrews patches where he imports git tress into
git-pull commands, which should result in better history and better
attribution.

To be accurate of his source Andrew records the sha1 of the commit
and the git tree he pulled from.  Which looks like:

GIT b307e8548921c686d2eb948ca418ab2941876daa \
 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

So I figured I would transform the above line into the obvious
git-pull command:

 git-pull \
  git+ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git \
  b307e8548921c686d2eb948ca418ab2941876daa

To my surprise that didn't work.  There were a couple of little places
in the scripts where git-fetch and git-fetch-pack never expected to be
given a sha1 but that was easy to fix up, and had no real repercussions.  

More problematic was the little bit in git-upload pack that only
allows you to a sha1 if it was on the list of sha1 generated from
looking at the heads.  I'm not at all certain of the sense of
that check as you can get everything by just cloning the repository.

Can we fix the check in upload-pack.c something like my
patch below does?  Are there any security implications for
doing that?

Could we just make the final check before dying if (!o) ?





		/* We have sent all our refs already, and the other end
		 * should have chosen out of them; otherwise they are
		 * asking for nonsense.
		 *
		 * Hmph.  We may later want to allow "want" line that
		 * asks for something like "master~10" (symbolic)...
		 * would it make sense?  I don't know.
		 */

diff --git a/upload-pack.c b/upload-pack.c
index 47560c9..0f2e544 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -207,7 +207,9 @@ static int receive_needs(void)
 		 * would it make sense?  I don't know.
 		 */
 		o = lookup_object(sha1_buf);
-		if (!o || !(o->flags & OUR_REF))
+		if (!o)
+			o = parse_object(sha1_buf);
+		if (!o || ((o->type != commit_type) && (o->type != tag_type)))
 			die("git-upload-pack: not our ref %s", line+5);
 		if (!(o->flags & WANTED)) {
 			o->flags |= WANTED;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-24  7:51 [RFC][PATCH] Allow transfer of any valid sha1 Eric W. Biederman
@ 2006-05-24  9:07 ` Junio C Hamano
  2006-05-25  5:09   ` Eric W. Biederman
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-05-24  9:07 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git

ebiederm@xmission.com (Eric W. Biederman) writes:

> Can we fix the check in upload-pack.c something like my
> patch below does?  Are there any security implications for
> doing that?

> Could we just make the final check before dying if (!o) ?

The primary implication is about correctness, so I am reluctant
to break it without a careful alternative check in place.

The issue is that having a single object in the repository does
not guarantee that you have everything reachable from it, and we
need that guarantee.  Reachability from the refs is what
guarantees that.

We are careful to update the ref at the very end of the transfer
(fetch/clone or push); so if an object is reachable from a ref,
then all the objects reachable from that object are available in
the repository.

Imagine http commit walker started fetching tip of upstream into
your repository and you interrupted the transfer.  Objects near
the tip of the upstream history are available after such an
interrupted transfer.  But a bit older history (but still later
than what we had before we started the transfer) are not.

We do not update the ref with the downloaded tip object, so that
we would not break the guarantee.  This guarantee is needed for
feeding clients from the repository later.  If you tell your
clients, after such an interrupted transfer, that you are
willing to serve the objects near the (new) tip, the clients may
rightfully request objects that are reachable from these
objects, some of them you do _not_ have!

So this "on demand SHA1" stuff needs to be solved by checking if
the given object is reachable from our refs in upload-pack,
instead of the current check to see if the given object is
pointed by our refs.  When upload-pack can prove that the object
is reachable from one of the refs, it is OK to use it; otherwise
you should not.

Now, proving that a given SHA1 is the name of an object that
exists in the repository is cheap (has_sha1_file()), but proving
that the object is reachable from some of our refs can become
quite expensive.  That gives this issue a security implication
as well -- you can easily DoS the git-daemon that way, for
example.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-24  9:07 ` Junio C Hamano
@ 2006-05-25  5:09   ` Eric W. Biederman
  2006-05-25  6:36     ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-25  5:09 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Can we fix the check in upload-pack.c something like my
>> patch below does?  Are there any security implications for
>> doing that?
>
>> Could we just make the final check before dying if (!o) ?
>
> The primary implication is about correctness, so I am reluctant
> to break it without a careful alternative check in place.
>
> The issue is that having a single object in the repository does
> not guarantee that you have everything reachable from it, and we
> need that guarantee.  Reachability from the refs is what
> guarantees that.

I don't see why having something reachable from a ref guarantees
that everything is reachable.  Given the recent patch that added
a check to make certain a ref actually existed I believe there
is some evidence that trees may become corrupted, and have the
problems you describe.

> We are careful to update the ref at the very end of the transfer
> (fetch/clone or push); so if an object is reachable from a ref,
> then all the objects reachable from that object are available in
> the repository.

In the normal case I agree.

> Imagine http commit walker started fetching tip of upstream into
> your repository and you interrupted the transfer.  Objects near
> the tip of the upstream history are available after such an
> interrupted transfer.  But a bit older history (but still later
> than what we had before we started the transfer) are not.
>
> We do not update the ref with the downloaded tip object, so that
> we would not break the guarantee.  This guarantee is needed for
> feeding clients from the repository later.  If you tell your
> clients, after such an interrupted transfer, that you are
> willing to serve the objects near the (new) tip, the clients may
> rightfully request objects that are reachable from these
> objects, some of them you do _not_ have!

I clearly would not advertise it.  My problem is that I have
evidence that someone pulled a given sha1 at some point from 
some branch on a given repository.  But I don't have that branch.

Actually trees mirrored with rsync have similar problems all of
the time when the catch a tree in the middle of an update.

> So this "on demand SHA1" stuff needs to be solved by checking if
> the given object is reachable from our refs in upload-pack,
> instead of the current check to see if the given object is
> pointed by our refs.  When upload-pack can prove that the object
> is reachable from one of the refs, it is OK to use it; otherwise
> you should not.

I have a problem with that approach.  Suppose the branch I have
evidence something came from is like your pu branch.   If I want
a copy of your pu branch at some point in the past, but you have
rebased it since that sha1 was published then there will clearly not
be a path from any current head to that branch.  But if I still have a
copy of the sha1 I should actually be able to recover the old copy of
the pu branch from your tree.

> Now, proving that a given SHA1 is the name of an object that
> exists in the repository is cheap (has_sha1_file()), but proving
> that the object is reachable from some of our refs can become
> quite expensive.  That gives this issue a security implication
> as well -- you can easily DoS the git-daemon that way, for
> example.

Exactly, which is why I aimed for the cheap test.

There is a reasonable argument that can be made that the branches
represent the policy that you are willing to serve.  If you have a
tree and share a common object store with a much lager tree, like
David Woodhouse has set up, I can see such a policy being desirable.

That is an argument I have a much harder time shooting down.
At the same time if it is just a policy question the policy it should
be modifiable with an appropriate configuration directive, or
command line option.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25  5:09   ` Eric W. Biederman
@ 2006-05-25  6:36     ` Junio C Hamano
  2006-05-25 17:00       ` Eric W. Biederman
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-05-25  6:36 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git

ebiederm@xmission.com (Eric W. Biederman) writes:

> I clearly would not advertise it.  My problem is that I have
> evidence that someone pulled a given sha1 at some point from 
> some branch on a given repository.  But I don't have that branch.

If that was over rsync (as you mention later), then I would
consider that is an unfortunate unfixable issue.  rsync mirrors
are fundamentally unsafe for git -- Linus and I do not keep
saying rsync should be deprecated without good reasons.

There still might be bugs that breaks this guarantee outside
rsync, but if that is the case we should fix it.

I do not want to rehash the thread around Sep 29th 2005 here.
The entry point of that thread is this message:

	http://marc.theaimsgroup.com/?l=git&m=112795140820665

and the punch line are these two messages:

	http://marc.theaimsgroup.com/?l=git&m=112801874021223
	http://marc.theaimsgroup.com/?l=git&m=112802808030710

I did not realize what I was breaking initially.  I am not
ashamed of having been wrong, but it was embarrassing ;-).

> If I want
> a copy of your pu branch at some point in the past, but you have
> rebased it since that sha1 was published then there will clearly not
> be a path from any current head to that branch.  But if I still have a
> copy of the sha1 I should actually be able to recover the old copy of
> the pu branch from your tree.

Not necessarily.  I occasionally prune after rewinding.  When my
"pu" branch head does not point at the lost commit, the
repository may or may not have that object you happen to know I
used to have anymore.

>> Now, proving that a given SHA1 is the name of an object that
>> exists in the repository is cheap (has_sha1_file()), but proving
>> that the object is reachable from some of our refs can become
>> quite expensive.  That gives this issue a security implication
>> as well -- you can easily DoS the git-daemon that way, for
>> example.
>
> Exactly, which is why I aimed for the cheap test.

But the thing is the cheap test is broken, eh, rather,
propagates brokenness downstream (which is perhaps worse).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25  6:36     ` Junio C Hamano
@ 2006-05-25 17:00       ` Eric W. Biederman
  2006-05-25 17:28         ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-25 17:00 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> I clearly would not advertise it.  My problem is that I have
>> evidence that someone pulled a given sha1 at some point from 
>> some branch on a given repository.  But I don't have that branch.
>
> If that was over rsync (as you mention later), then I would
> consider that is an unfortunate unfixable issue.  rsync mirrors
> are fundamentally unsafe for git -- Linus and I do not keep
> saying rsync should be deprecated without good reasons.
>
> There still might be bugs that breaks this guarantee outside
> rsync, but if that is the case we should fix it.

Sounds reasonable.  So far I don't believe anything I have
proposed would result in a reference getting written
if we don't transfer all of the dependencies.

I do need to examine the algorithm by which we compute what
to transmit and make certain I have not broke that.
I believe I am still only using the existing references
for finding a common point in the history.

> I do not want to rehash the thread around Sep 29th 2005 here.
> The entry point of that thread is this message:
>
> 	http://marc.theaimsgroup.com/?l=git&m=112795140820665
>
> and the punch line are these two messages:
>
> 	http://marc.theaimsgroup.com/?l=git&m=112801874021223
> 	http://marc.theaimsgroup.com/?l=git&m=112802808030710
>
> I did not realize what I was breaking initially.  I am not
> ashamed of having been wrong, but it was embarrassing ;-).
>
>> If I want
>> a copy of your pu branch at some point in the past, but you have
>> rebased it since that sha1 was published then there will clearly not
>> be a path from any current head to that branch.  But if I still have a
>> copy of the sha1 I should actually be able to recover the old copy of
>> the pu branch from your tree.
>
> Not necessarily.  I occasionally prune after rewinding.  When my
> "pu" branch head does not point at the lost commit, the
> repository may or may not have that object you happen to know I
> used to have anymore.

Agreed.  Of course the simple object existence test works in that
instance.  Not that it does in general.  The could be a git-prune
versus upload-pack race for instance.

>>> Now, proving that a given SHA1 is the name of an object that
>>> exists in the repository is cheap (has_sha1_file()), but proving
>>> that the object is reachable from some of our refs can become
>>> quite expensive.  That gives this issue a security implication
>>> as well -- you can easily DoS the git-daemon that way, for
>>> example.
>>
>> Exactly, which is why I aimed for the cheap test.
>
> But the thing is the cheap test is broken, eh, rather,
> propagates brokenness downstream (which is perhaps worse).

As I understand it brokenness is writing a ref when you don't
have the complete tree it points to.  I have no desire
to do that.

My basic argument is that starting a pull with a commit that is not a
reference is no worse than staring a pull from a broken repository.  The
same checks that protects us should work in either case.

So as long as what I have done does not compromise the computation
of a common ancestor I think we should be fine.

I can see the argument that in a non-broken repository that finding
a path from an existing ref is proof that everything will work.
However if the code can be made to work without requiring that
proof it should be an even stronger guarantee of correctness,
and the expensive step of walking down from an existing ref would
be unnecessary.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 17:00       ` Eric W. Biederman
@ 2006-05-25 17:28         ` Linus Torvalds
  2006-05-25 17:59           ` Eric W. Biederman
  2006-05-25 18:28           ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2006-05-25 17:28 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: Junio C Hamano, git



On Thu, 25 May 2006, Eric W. Biederman wrote:
> 
> My basic argument is that starting a pull with a commit that is not a
> reference is no worse than staring a pull from a broken repository.  The
> same checks that protects us should work in either case.

I think Junio reacted to the subject line, which was somewhat badly 
phrased. You're not looking to transfer random objects, you're looking to 
_start_ a branch at any arbitrary known point.

However, Junio's point is probably that the "any valid SHA1" might 
actually point to a broken tree, even if it exists on the server.

Of course, in that case hopefully git-rev-list exits with an error, and 
the server doesn't generate any pack at all rather than generating a 
broken one.

However, there's a (questionable) security issue: what if the server 
doesn't _want_ to expose certain branches? Arguably, if you know the top 
SHA1, you likely know all that it contains, but it may be a valid argument 
to say that if the SHA1 isn't an exported branch, you shouldn't 
necessarily be able to follow it.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 17:28         ` Linus Torvalds
@ 2006-05-25 17:59           ` Eric W. Biederman
  2006-05-25 18:28           ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-25 17:59 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 25 May 2006, Eric W. Biederman wrote:
>> 
>> My basic argument is that starting a pull with a commit that is not a
>> reference is no worse than staring a pull from a broken repository.  The
>> same checks that protects us should work in either case.
>
> I think Junio reacted to the subject line, which was somewhat badly 
> phrased. You're not looking to transfer random objects, you're looking to 
> _start_ a branch at any arbitrary known point.

Probably, but if I understood enough to get the subject line right the
first time I probably would have understood enough to just send
a patch :)

> However, Junio's point is probably that the "any valid SHA1" might 
> actually point to a broken tree, even if it exists on the server.
>
> Of course, in that case hopefully git-rev-list exits with an error, and 
> the server doesn't generate any pack at all rather than generating a 
> broken one.
>
> However, there's a (questionable) security issue: what if the server 
> doesn't _want_ to expose certain branches? Arguably, if you know the top 
> SHA1, you likely know all that it contains, but it may be a valid argument 
> to say that if the SHA1 isn't an exported branch, you shouldn't 
> necessarily be able to follow it.

Agreed and I mentioned this one earlier.

However the only way the above scenario can even happen in a useful
manner is with a shared object store for several repositories.  Otherwise
you couldn't access the data you don't want to share.

I can't think of a valid argument against not sharing an entire
repository except David Woodhouse's bandwidth concern.
Of course what was wanted there was a test a limit to how far
back in the history you could look for a common commit, which
is something different.

In general it is much easier to guarantee that either a repository is
shared or it is not.  Making a guarantee that objects that
"git-fsck-objects --unreachable --full" identifies will never be
downloaded is difficult, and probably not worth encouraging
people to do.

That said it is easy to keep the current behavior as an option,
so the security policy issue shouldn't limit the technical discussion.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 17:28         ` Linus Torvalds
  2006-05-25 17:59           ` Eric W. Biederman
@ 2006-05-25 18:28           ` Junio C Hamano
  2006-05-25 18:36             ` Linus Torvalds
  2006-05-25 20:50             ` Eric W. Biederman
  1 sibling, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-05-25 18:28 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Eric W. Biederman, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 25 May 2006, Eric W. Biederman wrote:
>> 
>> My basic argument is that starting a pull with a commit that is not a
>> reference is no worse than staring a pull from a broken repository.  The
>> same checks that protects us should work in either case.
>
> I think Junio reacted to the subject line, which was somewhat badly 
> phrased. You're not looking to transfer random objects, you're looking to 
> _start_ a branch at any arbitrary known point.

I realize that now.  From Eric's original message:

  To be accurate of his source Andrew records the sha1 of the commit
  and the git tree he pulled from.  Which looks like:

  GIT b307e8548921c686d2eb948ca418ab2941876daa \
   git+ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

  So I figured I would transform the above line into the obvious
  git-pull command:

   git-pull \
    git+ssh://master.kernel.org/pub/scm/.../torvalds/linux-2.6.git \
    b307e8548921c686d2eb948ca418ab2941876daa

With the limitation of the current tool, we could do:

  git-fetch master.kernel.org:/pub/scm/.../torvalds/linux-2.6.git \
	refs/heads/master:refs/remotes/linus/master
  git merge 'whatever merge message' HEAD b307e854

assuming that b307e854 is reachable from your tip.  So it might
be just a matter of giving a convenient shorthand to do the
above two commands, instead of mucking with upload-pack.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 18:28           ` Junio C Hamano
@ 2006-05-25 18:36             ` Linus Torvalds
  2006-05-25 20:30               ` Eric W. Biederman
  2006-05-25 20:50             ` Eric W. Biederman
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2006-05-25 18:36 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Eric W. Biederman, git



On Thu, 25 May 2006, Junio C Hamano wrote:
> 
> With the limitation of the current tool, we could do:
> 
>   git-fetch master.kernel.org:/pub/scm/.../torvalds/linux-2.6.git \
> 	refs/heads/master:refs/remotes/linus/master
>   git merge 'whatever merge message' HEAD b307e854
> 
> assuming that b307e854 is reachable from your tip.  So it might
> be just a matter of giving a convenient shorthand to do the
> above two commands, instead of mucking with upload-pack.

It's not upload-pack that needs mucking with. It's simply "fetch-pack" 
that currently will refuse to say "want b307e854..", because the only 
thing it can do is say "want <headref>".

So the patch would literally be to have a way to tell fetch-pack directly 
what you want, and not have the "only select from remote branches" logic.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 18:36             ` Linus Torvalds
@ 2006-05-25 20:30               ` Eric W. Biederman
  2006-05-25 20:53                 ` Junio C Hamano
  2006-05-26 10:04                 ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-25 20:30 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Eric W. Biederman, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 25 May 2006, Junio C Hamano wrote:
>> 
>> With the limitation of the current tool, we could do:
>> 
>>   git-fetch master.kernel.org:/pub/scm/.../torvalds/linux-2.6.git \
>> 	refs/heads/master:refs/remotes/linus/master
>>   git merge 'whatever merge message' HEAD b307e854
>> 
>> assuming that b307e854 is reachable from your tip.  So it might
>> be just a matter of giving a convenient shorthand to do the
>> above two commands, instead of mucking with upload-pack.
>
> It's not upload-pack that needs mucking with. It's simply "fetch-pack" 
> that currently will refuse to say "want b307e854..", because the only 
> thing it can do is say "want <headref>".
>
> So the patch would literally be to have a way to tell fetch-pack directly 
> what you want, and not have the "only select from remote branches" logic.

So fixing fetch-pack is easy and pretty non-controversial.
The patch below handles that.

The problem is that I then run into the limitations in upload-pack.

(The movement of filter_refs may actually be overkill)

Eric



diff --git a/fetch-pack.c b/fetch-pack.c
index a3bcad0..c767d84 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -260,6 +260,27 @@ static void mark_recent_complete_commits
 	}
 }
 
+static struct ref **get_sha1_heads(struct ref **refs, int nr_heads, char **head)
+{
+	int i;
+	for (i  = 0; i < nr_heads; i++) {
+		struct ref *ref;
+		unsigned char sha1[20];
+		char *s = head[i];
+		int len = strlen(s);
+
+		if (len != 40 || get_sha1_hex(s, sha1))
+			continue;
+
+		ref = xcalloc(1, sizeof(*ref) + len + 1);
+		memcpy(ref->old_sha1, sha1, 20);
+		memcpy(ref->name, s, len + 1);
+		*refs = ref;
+		refs = &ref->next;
+	}
+	return refs;
+}
+
 static void filter_refs(struct ref **refs, int nr_match, char **match)
 {
 	struct ref *prev, *current, *next;
@@ -311,6 +332,8 @@ static int everything_local(struct ref *
 	if (cutoff)
 		mark_recent_complete_commits(cutoff);
 
+	filter_refs(refs, nr_match, match);
+
 	/*
 	 * Mark all complete remote refs as common refs.
 	 * Don't mark them common yet; the server has to be told so first.
@@ -329,8 +352,6 @@ static int everything_local(struct ref *
 		}
 	}
 
-	filter_refs(refs, nr_match, match);
-
 	for (retval = 1, ref = *refs; ref ; ref = ref->next) {
 		const unsigned char *remote = ref->old_sha1;
 		unsigned char local[20];
@@ -373,6 +394,7 @@ static int fetch_pack(int fd[2], int nr_
 		packet_flush(fd[1]);
 		die("no matching remote head");
 	}
+	get_sha1_heads(&ref, nr_match, match);
 	if (everything_local(&ref, nr_match, match)) {
 		packet_flush(fd[1]);
 		goto all_done;
diff --git a/git-parse-remote.sh b/git-parse-remote.sh
index 187f088..2372df8 100755
--- a/git-parse-remote.sh
+++ b/git-parse-remote.sh
@@ -105,6 +105,7 @@ canon_refs_list_for_fetch () {
 		'') remote=HEAD ;;
 		refs/heads/* | refs/tags/* | refs/remotes/*) ;;
 		heads/* | tags/* | remotes/* ) remote="refs/$remote" ;;
+		[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;
 		*) remote="refs/heads/$remote" ;;
 		esac
 		case "$local" in

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 18:28           ` Junio C Hamano
  2006-05-25 18:36             ` Linus Torvalds
@ 2006-05-25 20:50             ` Eric W. Biederman
  2006-05-25 21:04               ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-25 20:50 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@osdl.org> writes:
>
>> On Thu, 25 May 2006, Eric W. Biederman wrote:
>>> 
>>> My basic argument is that starting a pull with a commit that is not a
>>> reference is no worse than staring a pull from a broken repository.  The
>>> same checks that protects us should work in either case.
>>
>> I think Junio reacted to the subject line, which was somewhat badly 
>> phrased. You're not looking to transfer random objects, you're looking to 
>> _start_ a branch at any arbitrary known point.
>
> I realize that now.  From Eric's original message:
>
>   To be accurate of his source Andrew records the sha1 of the commit
>   and the git tree he pulled from.  Which looks like:
>
>   GIT b307e8548921c686d2eb948ca418ab2941876daa \
>    git+ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>
>   So I figured I would transform the above line into the obvious
>   git-pull command:
>
>    git-pull \
>     git+ssh://master.kernel.org/pub/scm/.../torvalds/linux-2.6.git \
>     b307e8548921c686d2eb948ca418ab2941876daa
>
> With the limitation of the current tool, we could do:
>
>   git-fetch master.kernel.org:/pub/scm/.../torvalds/linux-2.6.git \
> 	refs/heads/master:refs/remotes/linus/master
>   git merge 'whatever merge message' HEAD b307e854
>
> assuming that b307e854 is reachable from your tip.  So it might
> be just a matter of giving a convenient shorthand to do the
> above two commands, instead of mucking with upload-pack.

If we conclude the fetch by sha1 path is not practical certainly.

There are a couple of problems with the just use the tool as
is approach.
- I don't know which branch I need to fetch.
  Although it looks like Andrew has kept that information when it was not the
  default branch so I can probably use that.
- Fetching a branch that I just want a subset of is wasteful.
- It feels really weird when everything else allows me to use sha1s
  for git-fetch to deny them.

Then there is the big hole in my plan to get better changelog information
that it appears that after Andrew pulls a branch he resolves some
merge conflicts.  If that is right I need to figure out how to address
that before I can improve git-quiltimport.sh.

To get a slightly better feel of the problem below is the complete list of git
trees that Andrew pulled in for 2.6.17-rc4-mm3.

Eric


GIT d684de2c4a498ec4edf4e6c3420b008c62be394c git+ssh://master.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git#test
GIT 34ec52e3356245e9a13dfcbc8460635e675f13cf git+ssh://master.kernel.org/pub/scm/linux/kernel/git/davej/agpgart.git
GIT 08e66777d094d93091a914a8746a9b93599e14a9 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/perex/alsa-current.git
GIT 0ef744735f0d82d90809935586a0d1043f7f09b5 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/viro/audit-current.git#master.b13
GIT cca5d8ad1f58f188500b9fb12ba6d98643a4cf49 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git#for-linus
GIT b3b6a155c2b85d436b192d74e459f837eab0944e git+ssh://master.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git#cfq
GIT 8ba86486650c59f969c589c7d6c3dd4da734c75a git+ssh://master.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6.git
GIT 2a1db55336a9e99f5dd7ee64e42fa4cbb509ea6a git+ssh://master.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
GIT 639b4408a9e8e014878c7538859f33f852c23882 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb.git#devel
GIT d2f222e6310b073ae3d91b8d3d676621fae1314e git+ssh://master.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git
GIT 3ac6c7b44560fdf2ea8865536bd52d4ff038107e git://git.infradead.org/hdrcleanup-2.6.git
GIT 12415e45ab0429a88412f4af365515adbe0bdd68 git://git.infradead.org/hdrinstall-2.6.git
GIT 155f23d603727fcb2af6c69ff77b74d1d4eb5bde git+ssh://master.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git#test
GIT ba9cfd16a13a932f0603a7f65b3881738a698ae1 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git#for-mm
GIT 51d797474f87b375819d084f7583a2864c5656c4 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/intelfb-2.6#i915fb
GIT c32217fdc98292dbafd5f51d3f43337081b01c29 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-klibc.git
GIT 9fe74aaa1dc55100d20d9b7be2ccbf84ad26ce84 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git#ALL
GIT 864fdc881dd9e0077f9ed11191055e3eabf3b2a5 git://www.linux-mips.org/pub/scm/upstream.git#for-akpm
GIT 0d25971d7c969debf76f9fab6d6b37cb62408f55 git://git.infradead.org/mtd-2.6.git
GIT b748b7167cbeb11e729c6f9c3472165903dd115e git+ssh://master.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git#ALL
GIT c4bdea3ce8b1d9b9d8dc44223542b3ebbe3a3020 git://git.linux-nfs.org/pub/linux/nfs-2.6.git
GIT f8b4c6027275d9b2d5004726a6d1bb818a13ddef git://oss.oracle.com/home/sourcebo/git/ocfs2.git/#ALL
GIT 35b86edf75270176310cb9507745d8c02b9e6592 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git/
GIT 3c06da5ae5358e9d325d541a053e1059e9654bcc git+ssh://master.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git
GIT aa783a8f31c79f493bd49ba926b171b79b9839fb git://git.infradead.org/users/dwmw2/rbtree-2.6.git
GIT ee69d3f20b23250eae98a4cb20236196694f0b81 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/jejb/aic94xx-sas-2.6.git
GIT 9f434d4f84a235f6b61aec6e691d6b07bc46fc24 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
GIT 0df298d180556450cbe5edf12c1e890f6ac6ea97 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-target-2.6.git
GIT f1d282724317895f73c4c182041ab4385126c026 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git#stex
GIT 19932e4d7d2002bc956d1636e1c3a1d4455049fa git+ssh://master.kernel.org/pub/scm/linux/kernel/git/viro/bird.git#frv.b14
GIT 6ea79eadeba9b0d0ab08dcf7ee16df13e1fdadae git+ssh://master.kernel.org/pub/scm/linux/kernel/git/viro/bird.git#m32r.b14
GIT c3d6ecc77e8e5d4ac82c3bf27ed02b8c0d83d41d git+ssh://master.kernel.org/pub/scm/linux/kernel/git/viro/bird.git#m68k.b14
GIT e7dd49b206624021c23a27512d2a31503f5207f2 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/viro/bird.git#upf.b14
GIT ea0d175136582dff6e3591368b1da472ef3870b8 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/viro/bird.git#volatile.b14
GIT b29527edccbbc53a908bc34a910df3601061c950 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog-mm.git
GIT b307e8548921c686d2eb948ca418ab2941876daa git+ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 20:30               ` Eric W. Biederman
@ 2006-05-25 20:53                 ` Junio C Hamano
  2006-05-26  8:27                   ` Eric W. Biederman
  2006-05-26 10:04                 ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-05-25 20:53 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git

ebiederm@xmission.com (Eric W. Biederman) writes:

> So fixing fetch-pack is easy and pretty non-controversial.
> The patch below handles that.

I am at work so I cannot really spend time on this right now,
but I am OK with letting it send arbitrary SHA1 the caller
obtained out of band.  I do not know about your implementation,
since I haven't really looked at it.

> (The movement of filter_refs may actually be overkill)

It may not just overkill but may actively be wrong, but again I
haven't looked at it yet.

Will take a look tonight.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 20:50             ` Eric W. Biederman
@ 2006-05-25 21:04               ` Junio C Hamano
  2006-05-26  8:32                 ` Eric W. Biederman
  2006-06-08  9:33                 ` Eric W. Biederman
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-05-25 21:04 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git

ebiederm@xmission.com (Eric W. Biederman) writes:

> - I don't know which branch I need to fetch.

As you say yourself Andrew marks which one he fetched from, so
this is a non-issue.

> - Fetching a branch that I just want a subset of is wasteful.

Generally this is true, but in practice and especially for this
particular application I do not think so.  After all Andrew
pulled from the tip and got that tip, and IIUYC you are trying
to follow what Andrew did, so you'd be better doing this soon
after Andrew annouces the series, so your subset would be a
close to 100% subset.  Otherwise you would have different
problem anyway -- the tree owner after seeing -mm tree has his
series may rewind and rebuild the branch in preparation of
feeding him with the next time around.

> - It feels really weird when everything else allows me to use sha1s
>   for git-fetch to deny them.

That is a real argument and I am not opposed to change
fetch-pack to ask for an arbitrary SHA1 the caller obtained out
of band.

> Then there is the big hole in my plan to get better changelog information
> that it appears that after Andrew pulls a branch he resolves some
> merge conflicts.  If that is right I need to figure out how to address
> that before I can improve git-quiltimport.sh.

The last time I talked with Andrew, he is not doing a merge nor
resolving merge conflicts.  He treats git primarily as a
patchbomb distribution mechanism, and works on (a rough
equivalent of) the output of format-patch from merge base
between his base tree and individual subsystem tree.  After that
things are normal quilt workflow outside git, whatever it is.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 20:53                 ` Junio C Hamano
@ 2006-05-26  8:27                   ` Eric W. Biederman
  0 siblings, 0 replies; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-26  8:27 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> So fixing fetch-pack is easy and pretty non-controversial.
>> The patch below handles that.
>
> I am at work so I cannot really spend time on this right now,
> but I am OK with letting it send arbitrary SHA1 the caller
> obtained out of band.  I do not know about your implementation,
> since I haven't really looked at it.

Agreed.  I'm not certain about my implementation yet either I
just know I was in the ball park.

I needed the conversation to understand what the limits were.

>> (The movement of filter_refs may actually be overkill)
>
> It may not just overkill but may actively be wrong, but again I
> haven't looked at it yet.
>
> Will take a look tonight.

Sure.  The code was all a work in progress so I don't expect to
have all of the details ironed out.  In particular I didn't
even look at the non fetch-pack case, and I didn't update the
documentation.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 21:04               ` Junio C Hamano
@ 2006-05-26  8:32                 ` Eric W. Biederman
  2006-06-08  9:33                 ` Eric W. Biederman
  1 sibling, 0 replies; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-26  8:32 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:

>> - It feels really weird when everything else allows me to use sha1s
>>   for git-fetch to deny them.
>
> That is a real argument and I am not opposed to change
> fetch-pack to ask for an arbitrary SHA1 the caller obtained out
> of band.

Good this was the primary reason I kept pursuing the issue after
I figured out what it was.

>> Then there is the big hole in my plan to get better changelog information
>> that it appears that after Andrew pulls a branch he resolves some
>> merge conflicts.  If that is right I need to figure out how to address
>> that before I can improve git-quiltimport.sh.
>
> The last time I talked with Andrew, he is not doing a merge nor
> resolving merge conflicts.  He treats git primarily as a
> patchbomb distribution mechanism, and works on (a rough
> equivalent of) the output of format-patch from merge base
> between his base tree and individual subsystem tree.  After that
> things are normal quilt workflow outside git, whatever it is.

That sounds right.  I just know that there I had some strange
merge conflicts on the second git tree I pulled from.  Something
about a file being added twice.  It was one thing too many to
investigate this round.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 20:30               ` Eric W. Biederman
  2006-05-25 20:53                 ` Junio C Hamano
@ 2006-05-26 10:04                 ` Junio C Hamano
  2006-05-26 17:32                   ` Eric W. Biederman
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-05-26 10:04 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: Linus Torvalds, git

ebiederm@xmission.com (Eric W. Biederman) writes:

> diff --git a/fetch-pack.c b/fetch-pack.c
> index a3bcad0..c767d84 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -260,6 +260,27 @@ static void mark_recent_complete_commits
>  	}
>  }
>  
> +static struct ref **get_sha1_heads(struct ref **refs, int nr_heads, char **head)
> +{
> +	int i;
> +	for (i  = 0; i < nr_heads; i++) {
> +		struct ref *ref;
> +		unsigned char sha1[20];
> +		char *s = head[i];
> +		int len = strlen(s);
> +
> +		if (len != 40 || get_sha1_hex(s, sha1))
> +			continue;

So the new convention is fetch-pack can take ref name (as
before), or a bare 40-byte hexadecimal.  I think sane people
would not use ambiguous refname that says "deadbeef" five times,
and even if the do so they could disambiguate by explicitly
saying "refs/heads/" followed by "deadbeef" five times, so it
should be OK.

> +
> +		ref = xcalloc(1, sizeof(*ref) + len + 1);
> +		memcpy(ref->old_sha1, sha1, 20);
> +		memcpy(ref->name, s, len + 1);
> +		*refs = ref;
> +		refs = &ref->next;
> +	}
> +	return refs;
> +}
> +

This function takes the pointer to a location that holds a
pointer to a "struct ref" -- it is the location to store the
newly allocated ref structure, i.e. the next pointer of the last
element in the list.  When it returns, the location pointed at
by the pointer given to you points at the first element you
allocated, and it returns the next pointer of the last element
allocated by it.  That is the same calling convention as
connect.c::get_remote_heads().  So when calling this function to
append to a list you already have, you would give the next
pointer to the last element of the existing list.  But you do
not seem to do that.

I think the body of fetch_pack() should become something like:

	struct ref *ref, **tail;

        tail = get_remote_heads(fd[0], &ref, 0, NULL, 0);
	if (server_supports("multi_ack")) {
		...
	}
	tail = get_sha1_heads(tail, nr_match, match);
	if (everything_local(&ref, nr_match, match)) {
		...

> @@ -311,6 +332,8 @@ static int everything_local(struct ref *
>  	if (cutoff)
>  		mark_recent_complete_commits(cutoff);
>  
> +	filter_refs(refs, nr_match, match);
> +

I am not sure about this change.

In the original code we do not let get_remote_heads() to filter
the refs but call filter_refs() after the "mark all complete
remote refs as common" step for a reason.  Even though we may
not be fetching from some remote refs, we would want to take
advantage of the knowledge of what objects they have so that we
can mark as many objects as common as possible in the early
stage.  I suspect this change defeats that optimization.

So instead I would teach "mark all complete remote refs" loop
that not everything in refs list is a valid remote ref, and skip
what get_sha1_heads() injected, because these arbitrary ones we
got from the command line are not something we know exist on the
remote side.  Maybe something like this.

	/*
	 * Mark all complete remote refs as common refs.
	 * Don't mark them common yet; the server has to be told so first.
	 */
	for (ref = *refs; ref; ref = ref->next) {
		struct object *o;
                if (ref is SHA1 from the command line)
                	continue;
		o = deref_tag(lookup_object(ref->old_sha1), NULL, 0);
		if (!o || o->type != commit_type || !(o->flags & COMPLETE))
			continue;
		...

To implement "ref is SHA1 from the command line", I would add
another 1-bit field to "struct ref" and mark the new ones you
create in get_sha1_heads() as such (existing "force" field
could also become an 1-bit field -- we do not neeed a char).

> @@ -373,6 +394,7 @@ static int fetch_pack(int fd[2], int nr_
>  		packet_flush(fd[1]);
>  		die("no matching remote head");
>  	}
> +	get_sha1_heads(&ref, nr_match, match);

I talked about this one already...

> diff --git a/git-parse-remote.sh b/git-parse-remote.sh
> index 187f088..2372df8 100755
> --- a/git-parse-remote.sh
> +++ b/git-parse-remote.sh
> @@ -105,6 +105,7 @@ canon_refs_list_for_fetch () {
>  		'') remote=HEAD ;;
>  		refs/heads/* | refs/tags/* | refs/remotes/*) ;;
>  		heads/* | tags/* | remotes/* ) remote="refs/$remote" ;;
> +		[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;

Yuck.  Don't we have $_x40 somewhere?

We never use uppercase so at least we could save 24 columns from
here ;-).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-26 10:04                 ` Junio C Hamano
@ 2006-05-26 17:32                   ` Eric W. Biederman
  0 siblings, 0 replies; 18+ messages in thread
From: Eric W. Biederman @ 2006-05-26 17:32 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index a3bcad0..c767d84 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -260,6 +260,27 @@ static void mark_recent_complete_commits
>>  	}
>>  }
>>  
>> +static struct ref **get_sha1_heads(struct ref **refs, int nr_heads, char
> **head)
>> +{
>> +	int i;
>> +	for (i  = 0; i < nr_heads; i++) {
>> +		struct ref *ref;
>> +		unsigned char sha1[20];
>> +		char *s = head[i];
>> +		int len = strlen(s);
>> +
>> +		if (len != 40 || get_sha1_hex(s, sha1))
>> +			continue;
>
> So the new convention is fetch-pack can take ref name (as
> before), or a bare 40-byte hexadecimal.  I think sane people
> would not use ambiguous refname that says "deadbeef" five times,
> and even if the do so they could disambiguate by explicitly
> saying "refs/heads/" followed by "deadbeef" five times, so it
> should be OK.

Yes.

>> +
>> +		ref = xcalloc(1, sizeof(*ref) + len + 1);
>> +		memcpy(ref->old_sha1, sha1, 20);
>> +		memcpy(ref->name, s, len + 1);
>> +		*refs = ref;
>> +		refs = &ref->next;
>> +	}
>> +	return refs;
>> +}
>> +
>
> This function takes the pointer to a location that holds a
> pointer to a "struct ref" -- it is the location to store the
> newly allocated ref structure, i.e. the next pointer of the last
> element in the list.  When it returns, the location pointed at
> by the pointer given to you points at the first element you
> allocated, and it returns the next pointer of the last element
> allocated by it.  That is the same calling convention as
> connect.c::get_remote_heads().  So when calling this function to
> append to a list you already have, you would give the next
> pointer to the last element of the existing list.  But you do
> not seem to do that.

Ack. That does look like a bug.  I knew there as something
fishy about that code.  But it worked for my basic testing so I didn't
worry about it.

> I think the body of fetch_pack() should become something like:
>
> 	struct ref *ref, **tail;
>
>         tail = get_remote_heads(fd[0], &ref, 0, NULL, 0);
> 	if (server_supports("multi_ack")) {
> 		...
> 	}
> 	tail = get_sha1_heads(tail, nr_match, match);
> 	if (everything_local(&ref, nr_match, match)) {
> 		...

Actually because we want the filter to resolve sha1s by
default in terms of what was passed on the command line.  I'm pretty
certain that should be:

	tail = get_sha1_heads(&ref, nr_match, match);
	tail = get_remote_heads(fd[0], tail, 0, NULL, 0);
        ...


>> @@ -311,6 +332,8 @@ static int everything_local(struct ref *
>>  	if (cutoff)
>>  		mark_recent_complete_commits(cutoff);
>>  
>> +	filter_refs(refs, nr_match, match);
>> +
>
> I am not sure about this change.

Agreed.  It was a hold over from an earlier way of injecting
the sha1 into the logic.  

As for what happens I think I need to audit everything that
takes a ref from fetch_pack.  To make certain I have not
messed up the logic.

> In the original code we do not let get_remote_heads() to filter
> the refs but call filter_refs() after the "mark all complete
> remote refs as common" step for a reason.  Even though we may
> not be fetching from some remote refs, we would want to take
> advantage of the knowledge of what objects they have so that we
> can mark as many objects as common as possible in the early
> stage.  I suspect this change defeats that optimization.

It feels like it.

> So instead I would teach "mark all complete remote refs" loop
> that not everything in refs list is a valid remote ref, and skip
> what get_sha1_heads() injected, because these arbitrary ones we
> got from the command line are not something we know exist on the
> remote side.  Maybe something like this.

Sounds sane.  We also introduce a new possibility of having a
ref that is complete but not remote.

> 	/*
> 	 * Mark all complete remote refs as common refs.
> 	 * Don't mark them common yet; the server has to be told so first.
> 	 */
> 	for (ref = *refs; ref; ref = ref->next) {
> 		struct object *o;
>                 if (ref is SHA1 from the command line)
>                 	continue;
> 		o = deref_tag(lookup_object(ref->old_sha1), NULL, 0);
> 		if (!o || o->type != commit_type || !(o->flags & COMPLETE))
> 			continue;
> 		...
>
> To implement "ref is SHA1 from the command line", I would add
> another 1-bit field to "struct ref" and mark the new ones you
> create in get_sha1_heads() as such (existing "force" field
> could also become an 1-bit field -- we do not neeed a char).

Sounds sane.
So that gives me:
	unsigned int force : 1;
	unsigned int injected : 1;

Which aligns them to an int boundary but since we are followed
immediately by a pointer should result in no additional storage being
consumed.

>> @@ -373,6 +394,7 @@ static int fetch_pack(int fd[2], int nr_
>>  		packet_flush(fd[1]);
>>  		die("no matching remote head");
>>  	}
>> +	get_sha1_heads(&ref, nr_match, match);
>
> I talked about this one already...
>
>> diff --git a/git-parse-remote.sh b/git-parse-remote.sh
>> index 187f088..2372df8 100755
>> --- a/git-parse-remote.sh
>> +++ b/git-parse-remote.sh
>> @@ -105,6 +105,7 @@ canon_refs_list_for_fetch () {
>>  		'') remote=HEAD ;;
>>  		refs/heads/* | refs/tags/* | refs/remotes/*) ;;
>>  		heads/* | tags/* | remotes/* ) remote="refs/$remote" ;;
>> +
> [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])
> ;;
>
> Yuck.  Don't we have $_x40 somewhere?

I couldn't find one in shell.  

> We never use uppercase so at least we could save 24 columns from
> here ;-).

I'm not certain why we always add make $remote="refs/heads/$remote" by
default in that switch statement. git-fetch-pack at least doesn't need
it.

If that is true of the other consumers we could easily make the test:
[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]*) ;;
Or even simply make the default case *) ;;

But for the moment I will stick to the long form because it is
obviously correct.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] Allow transfer of any valid sha1
  2006-05-25 21:04               ` Junio C Hamano
  2006-05-26  8:32                 ` Eric W. Biederman
@ 2006-06-08  9:33                 ` Eric W. Biederman
  1 sibling, 0 replies; 18+ messages in thread
From: Eric W. Biederman @ 2006-06-08  9:33 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git


A quick status update.

I think I have clean working version of the sha1 transfer code,
I left on vacation before I could send it out so I need to dig
it out and make certain everything still applies.

I finally figured out what my problem pulling Andrew's changes
were.  git-quiltimport remembers what the previous commit was and when
I added merging I forgot to update that the variable that stores
the previous commit.  So since I had the history wrong git-merge
was finding the wrong common ancestor, which is an easy way
to mess up an automatic merge :)

> The last time I talked with Andrew, he is not doing a merge nor
> resolving merge conflicts.  He treats git primarily as a
> patchbomb distribution mechanism, and works on (a rough
> equivalent of) the output of format-patch from merge base
> between his base tree and individual subsystem tree.  After that
> things are normal quilt workflow outside git, whatever it is.

Andrews git import does appear to be a git-pull from an appropriate
tree and then a diff of the automatic merge result, so while
there doesn't appear to be manual merging there is a little
bit of automatic merging going on.

Anyway when I wake up in the morning I should see if I have
successfully imported Andres 2.6.17-rc5-mm3 tree.   All of that
pulling of git trees on demand noticeably slows down the import 
on my dinky test machine.  I'm not certain how much of that
a machine that had plenty of memory would see though.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-06-08  9:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-24  7:51 [RFC][PATCH] Allow transfer of any valid sha1 Eric W. Biederman
2006-05-24  9:07 ` Junio C Hamano
2006-05-25  5:09   ` Eric W. Biederman
2006-05-25  6:36     ` Junio C Hamano
2006-05-25 17:00       ` Eric W. Biederman
2006-05-25 17:28         ` Linus Torvalds
2006-05-25 17:59           ` Eric W. Biederman
2006-05-25 18:28           ` Junio C Hamano
2006-05-25 18:36             ` Linus Torvalds
2006-05-25 20:30               ` Eric W. Biederman
2006-05-25 20:53                 ` Junio C Hamano
2006-05-26  8:27                   ` Eric W. Biederman
2006-05-26 10:04                 ` Junio C Hamano
2006-05-26 17:32                   ` Eric W. Biederman
2006-05-25 20:50             ` Eric W. Biederman
2006-05-25 21:04               ` Junio C Hamano
2006-05-26  8:32                 ` Eric W. Biederman
2006-06-08  9:33                 ` Eric W. Biederman

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).