From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 01E41201A0 for ; Thu, 11 May 2017 20:52:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755817AbdEKUwS (ORCPT ); Thu, 11 May 2017 16:52:18 -0400 Received: from cloud.peff.net ([104.130.231.41]:49890 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755197AbdEKUwR (ORCPT ); Thu, 11 May 2017 16:52:17 -0400 Received: (qmail 17199 invoked by uid 109); 11 May 2017 20:52:16 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Thu, 11 May 2017 20:52:16 +0000 Received: (qmail 18762 invoked by uid 111); 11 May 2017 20:52:47 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Thu, 11 May 2017 16:52:47 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Thu, 11 May 2017 16:52:14 -0400 Date: Thu, 11 May 2017 16:52:14 -0400 From: Jeff King To: Jonathan Tan Cc: git@vger.kernel.org, jrnieder@gmail.com, spearce@spearce.org Subject: Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s Message-ID: <20170511205214.ofwkqogksohs6s4b@sigill.intra.peff.net> References: <20170509182042.28389-1-jonathantanmy@google.com> <20170510221157.8971-1-jonathantanmy@google.com> <20170511094635.ljpwg3bxkqj2wmgc@sigill.intra.peff.net> <695d833e-9ef1-4d74-265e-f5e3af8a54cd@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <695d833e-9ef1-4d74-265e-f5e3af8a54cd@google.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, May 11, 2017 at 10:51:37AM -0700, Jonathan Tan wrote: > > This is inside the nr_sought loop. So if I were to do: > > > > git fetch origin $(git ls-remote origin | awk '{print $1}') > > > > we're back to being quadratic. I realize that's probably a silly thing > > to do, but in the general case, you're O(m*n), where "n" is number of > > unmatched remote refs and "m" is the number of SHA-1s you're looking > > for. > > The original patch was quadratic regardless of whether we're fetching names > or SHA-1s, which can be said to be bad since it is a regression in an > existing and common use case (and I agree). This one is O(m*n), as you said > - the "quadratic-ness" only kicks in if you fetch SHA-1s, which before this > patch is impossible. Yeah, no question this is an improvement over the original. I think this still "regresses" a certain request performance-wise, but it's a request that could never have succeeded in the original. Mostly I'd just rather not leave a hidden quadratic loop that will blow up in somebody's face a year or two down the road. > Having a sha1_array or oidset would require allocation (O(n log n) time, I > think, in addition to O(n) space) and this cost would be incurred regardless > of how many SHA-1s were actually fetched (if m is an order of magnitude less > than log n, for example, having a sha1_array might be actually worse). Also, > presumably we don't want to incur this cost if we are fetching zero SHA-1s, > so we would need to do some sort of pre-check. I'm inclined to leave it the > way it is and consider this only if the use case of fetching many SHA1-s > comes up. An oidset should be O(n) in both time and space. Which we're already spending in the earlier loop (and in general, when we allocate O(n) ref structs in the first place). I don't think we need to care too much about micro-optimizing bumps to the constant-factor here; we just need to make sure we don't end up in an algorithmic explosion. And if we initialize the oidset lazily, then we incur that cost only when we would be doing the linear walk in your patch anyway. See the patch below. > > AIUI, you could also avoid creating the unmatched list entirely when the > > server advertises tip/reachable sha1s. That's a small optimization, but > > I think it may actually make the logic clearer. > > If you mean adding an "if" block at the point where we add the unmatched ref > to the unmatched list (that either adds it to the list or immediately frees > it), I think it makes the logic slightly more complicated. Or maybe you had > something else in mind? I was just thinking that it does not leave a case where we create a data structure (the unmatched list) even though we know right then that it will not ever be used. So it's an extra line of logic there, but the overall function may be clearer. I dunno, it's probably not a big deal either way. -Peff --- diff --git a/fetch-pack.c b/fetch-pack.c index 5cace7458..a660331e6 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -15,6 +15,7 @@ #include "version.h" #include "prio-queue.h" #include "sha1-array.h" +#include "oidset.h" static int transfer_unpack_limit = -1; static int fetch_unpack_limit = -1; @@ -592,6 +593,12 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args, } } +static void add_refs_to_oidset(struct oidset *oids, const struct ref *refs) +{ + for (; refs; refs = refs->next) + oidset_insert(oids, &refs->old_oid); +} + static void filter_refs(struct fetch_pack_args *args, struct ref **refs, struct ref **sought, int nr_sought) @@ -600,6 +607,8 @@ static void filter_refs(struct fetch_pack_args *args, struct ref **newtail = &newlist; struct ref *unmatched = NULL; struct ref *ref, *next; + struct oidset tip_oids = OIDSET_INIT; + int tip_oids_initialized = 0; int i; i = 0; @@ -654,22 +663,15 @@ static void filter_refs(struct fetch_pack_args *args, (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { can_append = 1; } else { - struct ref *u; - /* Check all refs, including those already matched */ - for (u = unmatched; u; u = u->next) { - if (!oidcmp(&ref->old_oid, &u->old_oid)) { - can_append = 1; - goto can_append; - } - } - for (u = newlist; u; u = u->next) { - if (!oidcmp(&ref->old_oid, &u->old_oid)) { - can_append = 1; - break; - } + if (!tip_oids_initialized) { + /* Check all refs, including those already matched */ + add_refs_to_oidset(&tip_oids, unmatched); + add_refs_to_oidset(&tip_oids, newlist); + tip_oids_initialized = 1; } + can_append = oidset_contains(&tip_oids, &ref->old_oid); } -can_append: + if (can_append) { ref->match_status = REF_MATCHED; *newtail = copy_ref(ref); @@ -680,6 +682,7 @@ static void filter_refs(struct fetch_pack_args *args, } *refs = newlist; + oidset_clear(&tip_oids); for (ref = unmatched; ref; ref = next) { next = ref->next; free(ref);