git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] fix unparsed object access in upload-pack
@ 2013-03-16 10:24 Jeff King
  2013-03-16 10:25 ` [PATCH 1/3] upload-pack: drop lookup-before-parse optimization Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jeff King @ 2013-03-16 10:24 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

This series fixes the issue I mentioned recently with upload-pack, where
we might feed unparsed objects to the revision parser. The bug is in
435c833 (the tip of the jk/peel-ref topic), which is in v1.8.1 and up.

The fix should go to maint.  The bug breaks shallow clones from
repositories with packed refs, though the effects range from working OK
to sending too many objects to calling die() in the revision traversal
machinery. Given the size of the breakage (I had several bug reports
within a few hours of deploying v1.8.1.5 on github.com), and the fact
that the bug was introduced only in v1.8.1, it may make sense to roll a
v1.8.1.6 in addition to putting it on the 1.8.2 maint track.

  [1/3]: upload-pack: drop lookup-before-parse optimization

    This is just a cleanup, and is not necessary for the fix.

  [2/3]: upload-pack: make sure "want" objects are parsed

    This is the fix itself, which can stand alone from the other two
    patches, both semantically and textually.

  [3/3]: upload-pack: load non-tip "want" objects from disk

    While investigating the bug, I found some weirdness around the
    stateless-rpc check_non_tip code. As far as I can tell, that code
    never actually gets triggered. It's not too surprising that we
    wouldn't have noticed, because it is about falling back due to a
    race condition. But please sanity check my explanation and patch.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] upload-pack: drop lookup-before-parse optimization
  2013-03-16 10:24 [PATCH 0/3] fix unparsed object access in upload-pack Jeff King
@ 2013-03-16 10:25 ` Jeff King
  2013-03-16 10:27 ` [PATCH 2/3] upload-pack: make sure "want" objects are parsed Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2013-03-16 10:25 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

When we receive a "have" line from the client, we want to
load the object pointed to by the sha1. However, we are
careful to do:

  o = lookup_object(sha1);
  if (!o || !o->parsed)
	  o = parse_object(sha1);

to avoid loading the object from disk if we have already
seen it.  However, since ccdc603 (parse_object: try internal
cache before reading object db), parse_object already does
this optimization internally. We can just call parse_object
directly.

Signed-off-by: Jeff King <peff@peff.net>
---
 upload-pack.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 30146a0..6bf81aa 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -325,9 +325,7 @@ static int got_sha1(char *hex, unsigned char *sha1)
 	if (!has_sha1_file(sha1))
 		return -1;
 
-	o = lookup_object(sha1);
-	if (!(o && o->parsed))
-		o = parse_object(sha1);
+	o = parse_object(sha1);
 	if (!o)
 		die("oops (%s)", sha1_to_hex(sha1));
 	if (o->type == OBJ_COMMIT) {
-- 
1.8.2.rc2.7.gef06216

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] upload-pack: make sure "want" objects are parsed
  2013-03-16 10:24 [PATCH 0/3] fix unparsed object access in upload-pack Jeff King
  2013-03-16 10:25 ` [PATCH 1/3] upload-pack: drop lookup-before-parse optimization Jeff King
@ 2013-03-16 10:27 ` Jeff King
  2013-03-16 10:28 ` [PATCH 3/3] upload-pack: load non-tip "want" objects from disk Jeff King
  2013-03-17  5:16 ` [PATCH 0/3] fix unparsed object access in upload-pack Junio C Hamano
  3 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2013-03-16 10:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

When upload-pack receives a "want" line from the client, it
adds it to an object array. We call lookup_object to find
the actual object, which will only check for objects already
in memory. This works because we are expecting to find
objects that we already loaded during the ref advertisement.

We use the resulting object structs for a variety of
purposes. Some of them care only about the object flags, but
others care about the type of the object (e.g.,
ok_to_give_up), or even feed them to the revision parser
(when --depth is used), which assumes that objects it
receives are fully parsed.

Once upon a time, this was OK; any object we loaded into
memory would also have been parsed. But since 435c833
(upload-pack: use peel_ref for ref advertisements,
2012-10-04), we try to avoid parsing objects during the ref
advertisement. This means that lookup_object may return an
object with a type of OBJ_NONE. The resulting mess depends
on the exact set of objects, but can include the revision
parser barfing, or the shallow code sending the wrong set of
objects.

This patch teaches upload-pack to parse each "want" object
as we receive it. We do not replace the lookup_object call
with parse_object, as the current code is careful not to let
just any object appear on a "want" line, but rather only one
we have previously advertised (whereas parse_object would
actually load any arbitrary object from disk).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5500-fetch-pack.sh | 9 +++++++++
 upload-pack.c         | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 354d32c..d574085 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -364,6 +364,15 @@ EOF
 	test_cmp count7.expected count7.actual
 '
 
+test_expect_success 'clone shallow with packed refs' '
+	git pack-refs --all &&
+	git clone --depth 1 --branch A "file://$(pwd)/." shallow8 &&
+	echo "in-pack: 4" > count8.expected &&
+	GIT_DIR=shallow8/.git git count-objects -v |
+		grep "^in-pack" > count8.actual &&
+	test_cmp count8.expected count8.actual
+'
+
 test_expect_success 'setup tests for the --stdin parameter' '
 	for head in C D E F
 	do
diff --git a/upload-pack.c b/upload-pack.c
index 6bf81aa..41736ec 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -639,7 +639,7 @@ static void receive_needs(void)
 			use_include_tag = 1;
 
 		o = lookup_object(sha1_buf);
-		if (!o)
+		if (!o || !parse_object(o->sha1))
 			die("git upload-pack: not our ref %s",
 			    sha1_to_hex(sha1_buf));
 		if (!(o->flags & WANTED)) {
-- 
1.8.2.rc2.7.gef06216

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] upload-pack: load non-tip "want" objects from disk
  2013-03-16 10:24 [PATCH 0/3] fix unparsed object access in upload-pack Jeff King
  2013-03-16 10:25 ` [PATCH 1/3] upload-pack: drop lookup-before-parse optimization Jeff King
  2013-03-16 10:27 ` [PATCH 2/3] upload-pack: make sure "want" objects are parsed Jeff King
@ 2013-03-16 10:28 ` Jeff King
  2013-03-17  5:16 ` [PATCH 0/3] fix unparsed object access in upload-pack Junio C Hamano
  3 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2013-03-16 10:28 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

It is a long-time security feature that upload-pack will not
serve any "want" lines that do not correspond to the tip of
one of our refs. Traditionally, this was enforced by
checking the objects in the in-memory hash; they should have
been loaded and received the OUR_REF flag during the
advertisement.

The stateless-rpc mode, however, has a race condition here:
one process advertises, and another receives the want lines,
so the refs may have changed in the interim.  To address
this, commit 051e400 added a new verification mode; if the
object is not OUR_REF, we set a "has_non_tip" flag, and then
later verify that the requested objects are reachable from
our current tips.

However, we still die immediately when the object is not in
our in-memory hash, and at this point we should only have
loaded our tip objects. So the check_non_tip code path does
not ever actually trigger, as any non-tip objects would
have already caused us to die.

We can fix that by using parse_object instead of
lookup_object, which will load the object from disk if it
has not already been loaded.

We still need to check that parse_object does not return
NULL, though, as it is possible we do not have the object
at all. A more appropriate error message would be "no such
object" rather than "not our ref"; however, we do not want
to leak information about what objects are or are not in
the object database, so we continue to use the same "not
our ref" message that would be produced by an unreachable
object.

Signed-off-by: Jeff King <peff@peff.net>
---
 upload-pack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 41736ec..948cfff 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -638,8 +638,8 @@ static void receive_needs(void)
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
 
-		o = lookup_object(sha1_buf);
-		if (!o || !parse_object(o->sha1))
+		o = parse_object(sha1_buf);
+		if (!o)
 			die("git upload-pack: not our ref %s",
 			    sha1_to_hex(sha1_buf));
 		if (!(o->flags & WANTED)) {
-- 
1.8.2.rc2.7.gef06216

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] fix unparsed object access in upload-pack
  2013-03-16 10:24 [PATCH 0/3] fix unparsed object access in upload-pack Jeff King
                   ` (2 preceding siblings ...)
  2013-03-16 10:28 ` [PATCH 3/3] upload-pack: load non-tip "want" objects from disk Jeff King
@ 2013-03-17  5:16 ` Junio C Hamano
  2013-03-17  5:40   ` Jeff King
  3 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-03-17  5:16 UTC (permalink / raw
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> This series fixes the issue I mentioned recently with upload-pack, where
> we might feed unparsed objects to the revision parser. The bug is in
> 435c833 (the tip of the jk/peel-ref topic), which is in v1.8.1 and up.

Good to see follow-up from a responsible contributor ;-)

> ... (I had several bug reports
> within a few hours of deploying v1.8.1.5 on github.com)

Nice to have a pro at the widely used site ;-)  I often wish it had
a mechanism to deploy the tip of 'master' or 'maint', or even 'next'
for 0.2% of its users' repositories to give us an early feedback.

>   [3/3]: upload-pack: load non-tip "want" objects from disk
>
>     While investigating the bug, I found some weirdness around the
>     stateless-rpc check_non_tip code. As far as I can tell, that code
>     never actually gets triggered. It's not too surprising that we
>     wouldn't have noticed, because it is about falling back due to a
>     race condition. But please sanity check my explanation and patch.

Thanks. That fall-back is Shawn's doing and I suspect that nobody is
exercising the codepath (he isn't).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] fix unparsed object access in upload-pack
  2013-03-17  5:16 ` [PATCH 0/3] fix unparsed object access in upload-pack Junio C Hamano
@ 2013-03-17  5:40   ` Jeff King
  2013-03-17  6:17     ` Junio C Hamano
  2013-03-17 16:38     ` René Scharfe
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2013-03-17  5:40 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Sat, Mar 16, 2013 at 10:16:40PM -0700, Junio C Hamano wrote:

> > ... (I had several bug reports
> > within a few hours of deploying v1.8.1.5 on github.com)
> 
> Nice to have a pro at the widely used site ;-)  I often wish it had
> a mechanism to deploy the tip of 'master' or 'maint', or even 'next'
> for 0.2% of its users' repositories to give us an early feedback.

We are quite slow and conservative at deploying new git, preferring
instead to let the rest of the world act as our testbed. :)

As seen with this bug, though, we really do get a lot more coverage of
weird cases due to our size. In the cases I looked at, the trigger
seemed to be clients doing the equivalent of of "git clone --depth=X
--no-single-branch". Almost nobody would do that intentionally, but
prior to v1.7.10, we did not have --single-branch; older clients using
--depth on a repository with multiple branches started failing clones
almost immediately.

We do have the capability to roll out to one or a few of our servers
(the granularity is not 0.2%, but it is still small). I'm going to try
to keep us more in sync with upstream git, but I don't know if I will
get to the point of ever deploying "master" or "next", even for a small
portion of the population. We are accumulating more hacks[1] on top of
git, so it is not just "run master for an hour on this server"; I have
to actually merge our fork.

I had been handling our hacks as patch series to be rebased on top of
upstream git releases, but that was getting increasingly unwieldy
(especially as people besides me work on it). Going forward, I'm going
to treat upstream git as a vendor branch and merge in occasionally to
get fixes.

One thing I might try is to keep a local "next-upstream" branch, that is
continually merging what is on upstream master with our local production
version. Then I could graduate it to production just like any other
topic branch when it comes time (instead of doing a gigantic painful
merge when we decide to upgrade upstream git).

That would mean I could test-deploy the "next-upstream" branch to a few
servers on any given day without doing a lot of work. So it might make
sense to do it at key times, like when we are in -rc here, to help shake
out any existing bugs before you make a release.

> >   [3/3]: upload-pack: load non-tip "want" objects from disk
> >
> >     While investigating the bug, I found some weirdness around the
> >     stateless-rpc check_non_tip code. As far as I can tell, that code
> >     never actually gets triggered. It's not too surprising that we
> >     wouldn't have noticed, because it is about falling back due to a
> >     race condition. But please sanity check my explanation and patch.
> 
> Thanks. That fall-back is Shawn's doing and I suspect that nobody is
> exercising the codepath (he isn't).

I almost wonder if we should cut it out entirely. It is definitely a
possible race condition, but I wonder if anybody actually hits it in
practice (and if they do, the consequence is that the fetch fails and
needs to be retried). As far as I can tell, the code path has never
actually been followed, and I do not recall ever seeing a bug report or
complaint about it (though perhaps it happened once, which spurred the
initial development?).

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] fix unparsed object access in upload-pack
  2013-03-17  5:40   ` Jeff King
@ 2013-03-17  6:17     ` Junio C Hamano
  2013-03-17  8:47       ` Jeff King
  2013-03-17 16:38     ` René Scharfe
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-03-17  6:17 UTC (permalink / raw
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> >   [3/3]: upload-pack: load non-tip "want" objects from disk
>> >
>> >     While investigating the bug, I found some weirdness around the
>> >     stateless-rpc check_non_tip code. As far as I can tell, that code
>> >     never actually gets triggered. It's not too surprising that we
>> >     wouldn't have noticed, because it is about falling back due to a
>> >     race condition. But please sanity check my explanation and patch.
>> 
>> Thanks. That fall-back is Shawn's doing and I suspect that nobody is
>> exercising the codepath (he isn't).
>
> I almost wonder if we should cut it out entirely. It is definitely a
> possible race condition, but I wonder if anybody actually hits it in
> practice (and if they do, the consequence is that the fetch fails and
> needs to be retried). As far as I can tell, the code path has never
> actually been followed, and I do not recall ever seeing a bug report or
> complaint about it (though perhaps it happened once, which spurred the
> initial development?).

If you run multiple servers serving the same repository at the same
URL with a small mirroring lag, one may observe a set of refs from
one server, that are a tad older than the other server you actually
fetch from.  k.org may have such an arrangement, but does GitHub
serve the same repository on multiple machines without tying the
same client to the same backend?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] fix unparsed object access in upload-pack
  2013-03-17  6:17     ` Junio C Hamano
@ 2013-03-17  8:47       ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2013-03-17  8:47 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Sat, Mar 16, 2013 at 11:17:18PM -0700, Junio C Hamano wrote:

> > I almost wonder if we should cut it out entirely. It is definitely a
> > possible race condition, but I wonder if anybody actually hits it in
> > practice (and if they do, the consequence is that the fetch fails and
> > needs to be retried). As far as I can tell, the code path has never
> > actually been followed, and I do not recall ever seeing a bug report or
> > complaint about it (though perhaps it happened once, which spurred the
> > initial development?).
> 
> If you run multiple servers serving the same repository at the same
> URL with a small mirroring lag, one may observe a set of refs from
> one server, that are a tad older than the other server you actually
> fetch from.  k.org may have such an arrangement, but does GitHub
> serve the same repository on multiple machines without tying the
> same client to the same backend?

Each repository is a on a single backend host. They're redundant
internally (each host is actually multiple hosts), but pure-git requests
go to a single master for each host (though for some read-only
operations I think we spread the load across the redundant spares). You
might get a separate machine during a failover event, but they share
block devices via DRBD, so in theory an fsync() should hit both
machines, and there is no lag (and you are likely to get an intermittent
failure in such a case, anyway, since the machine serving your git
request probably died mid-packet).

I thought this change was to prevent against the common race:

  1. Client request stateless ref advertisement.

  2. Somebody updates ref.

  3. Client requests "want" objects based on old advertisement.

and I think it does solve that (assuming step 2 is not a rewind). The
important thing is that time always moves forward.

But if you are talking about mirror lag, time can move in either
direction. Imagine you have two machines, A and B, and A is missing an
update to B. If you hit A first, then B, it is the same as the update
sequence above. The patch helps. But if you hit B first, then A, you
will ask it for objects it has not yet received, and we must fail.

So I think any such mirroring setup would want to try very hard to make
sure you hit the same machine both times anyway, regardless of this
patch.

I'm fine to leave it. I was just questioning its utility since AFAICT,
it has never worked and nobody has cared. It's not too much code,
though, and it is only run when we hit the race, so I don't think it is
hurting anything.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] fix unparsed object access in upload-pack
  2013-03-17  5:40   ` Jeff King
  2013-03-17  6:17     ` Junio C Hamano
@ 2013-03-17 16:38     ` René Scharfe
  1 sibling, 0 replies; 9+ messages in thread
From: René Scharfe @ 2013-03-17 16:38 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git

Am 17.03.2013 06:40, schrieb Jeff King:
> We do have the capability to roll out to one or a few of our servers
> (the granularity is not 0.2%, but it is still small). I'm going to try
> to keep us more in sync with upstream git, but I don't know if I will
> get to the point of ever deploying "master" or "next", even for a small
> portion of the population. We are accumulating more hacks[1] on top of
> git, so it is not just "run master for an hour on this server"; I have
> to actually merge our fork.

Did you perhaps intend to list these hacks in a footnote or link to a 
repository containing them?  (I can't find the counterpart of that [1].)

René

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-03-17 16:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-16 10:24 [PATCH 0/3] fix unparsed object access in upload-pack Jeff King
2013-03-16 10:25 ` [PATCH 1/3] upload-pack: drop lookup-before-parse optimization Jeff King
2013-03-16 10:27 ` [PATCH 2/3] upload-pack: make sure "want" objects are parsed Jeff King
2013-03-16 10:28 ` [PATCH 3/3] upload-pack: load non-tip "want" objects from disk Jeff King
2013-03-17  5:16 ` [PATCH 0/3] fix unparsed object access in upload-pack Junio C Hamano
2013-03-17  5:40   ` Jeff King
2013-03-17  6:17     ` Junio C Hamano
2013-03-17  8:47       ` Jeff King
2013-03-17 16:38     ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).