From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=unavailable autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id B50EC1F453; Thu, 2 May 2019 09:42:55 +0000 (UTC) Date: Thu, 2 May 2019 09:42:55 +0000 From: Eric Wong To: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason Cc: Stefan Beller , git@vger.kernel.org, meta@public-inbox.org Subject: Re: "IMAP IDLE"-like long-polling "git fetch" Message-ID: <20190502094255.kbpzffokvdch63qg@dcvr> References: <20181229034342.11543-1-e@80x24.org> <20181229035621.cwjpknctq3rjnlhs@dcvr> <20181229043858.GA28509@pure.paranoia.local> <20190502085055.34kkll2deowat6il@dcvr> <87ftpxqkji.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87ftpxqkji.fsf@evledraar.gmail.com> List-Id: Ævar Arnfjörð Bjarmason wrote: > > On Thu, May 02 2019, Eric Wong wrote: > > > Stefan Beller wrote: > >> IIRC, More than half the bandwidth of Googles git servers are used > >> for ls-remote calls (i.e. polling a lot of repos, most of them did *not* > >> change, by build bots which are really eager to try again after a minute). > > > > Thinking back at that statement; I think polling can be > > optimized in git, at least. > > > > IIRC, your repos have lots of refs; right? > > (which is why it's a bandwidth problem) > > > > Since info/refs is a static file (hopefully updated by a > > post-update hook), the smart client can make an HTTP request > > to check If-Modified-Since: to avoid the big response. > > > > The client would need to cache the mtime of the last requested > > refs file; somewhere. > > > > IOW, do refs negotiation the "dumb" way; since it's no better > > than the smart way, really. Keep doing object transfers the > > smart way. > > > > During the initial clone, smart servers could probably > > have a header informing clients that their info/refs > > is up-to-date and clients can do dumb refs negotiation. > > Doing this with If-Modified-Since sounds like an easier drop-in > replacement (just needs a client change), but I wonder if ETag isn't a > better fit for this. ETags overall could work. > I.e. we'd document some convention where the ETag is a hash of the refs > the client expects to be advertised in some format, it then sends that > to the server. But I was hoping to avoid the overhead of spawning git-http-backend entirely. And there's no consistent way to configure ETags on different static servers. > That allows the same thing without anyone keeping more state than they > keep now in their local ref store I think caching the remote info/refs is useful anyways in case the user changes their fetch refspec, and it could speed up invocations of "git ls-remote". > On the fancier side I think bloom filters are something that's been > discussed (and I believe someone (Twitter?) had such an internal patch), > i.e. the client sends a bloom filter of refs they have, and the server > advertises things they don't know about yet (and due to how bloom > filters work, some things they *do* know about already but tripped up > the bloom filter...). I'm not smart enough to understand such fancy things :)