user/dev discussion of public-inbox itself
* [PATCH] TODO: add note for "IMAP IDLE"-like long-polling "git fetch"
@ 2018-12-29  3:43 Eric Wong
  2018-12-29  3:56 ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2018-12-29  3:43 UTC (permalink / raw)
  To: meta

---
 TODO | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/TODO b/TODO
index 87cadc9..c9ee756 100644
--- a/TODO
+++ b/TODO
@@ -90,3 +90,7 @@ all need to be considered for everything we introduce)
   davfs2 needs Range: request support for this to be feasible:
     https://savannah.nongnu.org/bugs/?33259
     https://savannah.nongnu.org/support/?107649
+
+* Contribute something like IMAP IDLE for "git fetch".
+  Inboxes (and any git repos) can be kept up-to-date without
+  relying on polling.
-- 
EW



* "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  3:43 [PATCH] TODO: add note for "IMAP IDLE"-like long-polling "git fetch" Eric Wong
@ 2018-12-29  3:56 ` Eric Wong
  2018-12-29  4:38   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2018-12-29  3:56 UTC (permalink / raw)
  To: git; +Cc: meta

Hey all, I just added this to the TODO file for public-inbox[1] but
obviously it's intended for git.git (meta@public-inbox cc-ed):

> +* Contribute something like IMAP IDLE for "git fetch".
> +  Inboxes (and any git repos) can be kept up-to-date without
> +  relying on polling.

I would've thought somebody had done this by now, but I guess
it's dependent on a bunch of things (TLS layer nowadays, maybe
HTTP/2), so git-daemon support alone wouldn't cut it...

Anyways, until this is implemented, feel free to continue
hammering away on https://public-inbox.org/git/ with frequent
"git fetch".  I write C10K servers in my sleep -_-


[1] https://public-inbox.org/meta/20181229034342.11543-1-e@80x24.org/


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  3:56 ` Eric Wong
@ 2018-12-29  4:38   ` Konstantin Ryabitsev
  2018-12-29  6:13     ` Eric Wong
  2019-01-09 22:27     ` Stefan Beller
  0 siblings, 2 replies; 9+ messages in thread
From: Konstantin Ryabitsev @ 2018-12-29  4:38 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, meta

On Sat, Dec 29, 2018 at 03:56:21AM +0000, Eric Wong wrote:
> Hey all, I just added this to the TODO file for public-inbox[1] but
> obviously it's intended for git.git (meta@public-inbox cc-ed):
> 
> > +* Contribute something like IMAP IDLE for "git fetch".
> > +  Inboxes (and any git repos) can be kept up-to-date without
> > +  relying on polling.
> 
> I would've thought somebody had done this by now, but I guess
> it's dependent on a bunch of things (TLS layer nowadays, maybe
> HTTP/2), so git-daemon support alone wouldn't cut it...

Polling is not all bad, especially for large repository collections. I'm
not sure you want to "idle" individual repositories when there's
thousands of them. We ended up writing grokmirror for replicating
repo collections using manifest files.

> Anyways, until this is implemented, feel free to continue
> hammering away on https://public-inbox.org/git/ with frequent
> "git fetch".  I write C10K servers in my sleep -_-

The archive is also mirrored at
https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git, and
also on kernel.googlesource.com.

-K


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  4:38   ` Konstantin Ryabitsev
@ 2018-12-29  6:13     ` Eric Wong
  2019-01-09 22:27     ` Stefan Beller
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Wong @ 2018-12-29  6:13 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: git, meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Sat, Dec 29, 2018 at 03:56:21AM +0000, Eric Wong wrote:
> > Hey all, I just added this to the TODO file for public-inbox[1] but
> > obviously it's intended for git.git (meta@public-inbox cc-ed):
> > 
> > > +* Contribute something like IMAP IDLE for "git fetch".
> > > +  Inboxes (and any git repos) can be kept up-to-date without
> > > +  relying on polling.
> > 
> > I would've thought somebody had done this by now, but I guess
> > it's dependent on a bunch of things (TLS layer nowadays, maybe
> > HTTP/2), so git-daemon support alone wouldn't cut it...
> 
> Polling is not all bad, especially for large repository collections. I'm
> not sure you want to "idle" individual repositories when there's
> thousands of them. We ended up writing grokmirror for replicating
> repo collections using manifest files.

I wasn't intending it for giant sites like korg, but for
individual hackers on their workstations tracking a handful of
projects they follow.

The cost for a hacker's machine would be the same as the current
situation where developers idle on IRC channels for the projects
they're involved in.

> > Anyways, until this is implemented, feel free to continue
> > hammering away on https://public-inbox.org/git/ with frequent
> > "git fetch".  I write C10K servers in my sleep -_-
> 
> The archive is also mirrored at
> https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git, and
> also on kernel.googlesource.com.

Now, I'm wondering if you can make a v2 public-inbox mirror of
git@vger and run it on lore.  Converting public-inbox.org/git to
v2 would break things for everybody fetching, right now :<


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  4:38   ` Konstantin Ryabitsev
  2018-12-29  6:13     ` Eric Wong
@ 2019-01-09 22:27     ` Stefan Beller
  2019-01-09 22:49       ` Konstantin Ryabitsev
  2019-05-02  8:50       ` Eric Wong
  1 sibling, 2 replies; 9+ messages in thread
From: Stefan Beller @ 2019-01-09 22:27 UTC (permalink / raw)
  To: Eric Wong, git, meta

On Fri, Dec 28, 2018 at 8:39 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Sat, Dec 29, 2018 at 03:56:21AM +0000, Eric Wong wrote:
> > Hey all, I just added this to the TODO file for public-inbox[1] but
> > obviously it's intended for git.git (meta@public-inbox cc-ed):
> >
> > > +* Contribute something like IMAP IDLE for "git fetch".
> > > +  Inboxes (and any git repos) can be kept up-to-date without
> > > +  relying on polling.
> >
> > I would've thought somebody had done this by now, but I guess
> > it's dependent on a bunch of things (TLS layer nowadays, maybe
> > HTTP/2), so git-daemon support alone wouldn't cut it...
>
> Polling is not all bad, especially for large repository collections.

I disagree with that statement.

IIRC, more than half the bandwidth of Google's git servers is used
for ls-remote calls (i.e. polling a lot of repos, most of which did *not*
change, by build bots that are really eager to try again after a minute).

That is why we use a superproject, with all other repositories as
submodules, for polling, as that cuts the ls-remote traffic
by a factor of roughly the number of repositories.

There was an attempt in JGit to support this kind of long-polling
communication at
https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/2adc572628f9382ace5fbd791325dc64f7c968d3
but not much of it is left in JGit, as the code has been refactored
at least once since.

IIRC, the issue was the lack of a protocol definition that would have
made it usable for a wider audience.


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-01-09 22:27     ` Stefan Beller
@ 2019-01-09 22:49       ` Konstantin Ryabitsev
  2019-05-02  8:50       ` Eric Wong
  1 sibling, 0 replies; 9+ messages in thread
From: Konstantin Ryabitsev @ 2019-01-09 22:49 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Eric Wong, git, meta

On Wed, Jan 09, 2019 at 02:27:25PM -0800, Stefan Beller wrote:
> > > I would've thought somebody had done this by now, but I guess
> > > it's dependent on a bunch of things (TLS layer nowadays, maybe
> > > HTTP/2), so git-daemon support alone wouldn't cut it...
> >
> > Polling is not all bad, especially for large repository collections.
> 
> I disagree with that statement.
> 
> IIRC, more than half the bandwidth of Google's git servers is used
> for ls-remote calls (i.e. polling a lot of repos, most of which did *not*
> change, by build bots that are really eager to try again after a minute).

Oh, that's not the kind of polling I meant -- we monitor a single
manifest file containing the state of all repositories. It's a static
file served directly by any HTTP server, and the only traffic is
usually a "304 Not Modified" response.
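
As a rough illustration of the manifest approach, the sketch below
compares a previously saved manifest with a freshly downloaded one and
lists the repositories whose state changed. The JSON layout (repository
path mapped to an object with a "fingerprint" field), the sample paths,
and the function name repos_to_update are assumptions made for this
example; they are not a description of grokmirror's actual format.

```python
import json


def repos_to_update(old_manifest, new_manifest):
    """Return sorted repo paths that are new or whose fingerprint changed."""
    changed = []
    for path, info in new_manifest.items():
        old_info = old_manifest.get(path)
        if old_info is None or old_info.get("fingerprint") != info.get("fingerprint"):
            changed.append(path)
    return sorted(changed)


# Hypothetical manifests: a mirror would load the old one from disk and
# the new one from the HTTP response body.
old = json.loads('{"/a.git": {"fingerprint": "111"},'
                 ' "/b.git": {"fingerprint": "222"}}')
new = json.loads('{"/a.git": {"fingerprint": "111"},'
                 ' "/b.git": {"fingerprint": "333"},'
                 ' "/c.git": {"fingerprint": "444"}}')
print(repos_to_update(old, new))  # ['/b.git', '/c.git']
```

Only the repositories in the returned list would then need a "git fetch";
when the manifest itself is unchanged, no per-repository request is made.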

-K


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-01-09 22:27     ` Stefan Beller
  2019-01-09 22:49       ` Konstantin Ryabitsev
@ 2019-05-02  8:50       ` Eric Wong
  2019-05-02  9:21         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Wong @ 2019-05-02  8:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, meta

Stefan Beller <sbeller@google.com> wrote:
> IIRC, more than half the bandwidth of Google's git servers is used
> for ls-remote calls (i.e. polling a lot of repos, most of which did *not*
> change, by build bots that are really eager to try again after a minute).

Thinking back on that statement, I think polling can be
optimized in git, at least.

IIRC, your repos have lots of refs, right?
(which is why it's a bandwidth problem)

Since info/refs is a static file (hopefully updated by a
post-update hook), the smart client can make a conditional
HTTP request with If-Modified-Since: to avoid the big response.

The client would need to cache the mtime of the last requested
refs file somewhere.

IOW, do refs negotiation the "dumb" way, since the smart way
is no better for that, really.  Keep doing object transfers the
smart way.

During the initial clone, smart servers could probably
have a header informing clients that their info/refs
is up-to-date and clients can do dumb refs negotiation.
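
A minimal sketch of the conditional request described above, using only
the Python standard library. The function names, the injectable "opener"
parameter, and the example URL are assumptions made for illustration;
git itself does not currently work this way. It also assumes the server
serves info/refs as a static file and honors If-Modified-Since.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, cached_mtime=None):
    """Build a GET that asks for the body only if it changed since cached_mtime."""
    req = urllib.request.Request(url)
    if cached_mtime is not None:
        # HTTP dates are expressed in GMT.
        req.add_header("If-Modified-Since", formatdate(cached_mtime, usegmt=True))
    return req


def poll_refs(url, cached_mtime=None, opener=urllib.request.urlopen):
    """Return (changed, body, last_modified_header) for a static info/refs."""
    try:
        with opener(conditional_request(url, cached_mtime)) as resp:
            return True, resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: skip the large refs listing
            return False, None, None
        raise


# Hypothetical URL; the epoch timestamp stands in for a cached mtime.
req = conditional_request("https://example.org/repo.git/info/refs", 0)
print(req.get_header("If-modified-since"))  # Thu, 01 Jan 1970 00:00:00 GMT
```

A client would store the Last-Modified value (or the file's mtime) next
to its cached copy of the refs and pass it back on the next poll.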


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-05-02  8:50       ` Eric Wong
@ 2019-05-02  9:21         ` Ævar Arnfjörð Bjarmason
  2019-05-02  9:42           ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-02  9:21 UTC (permalink / raw)
  To: Eric Wong; +Cc: Stefan Beller, git, meta


On Thu, May 02 2019, Eric Wong wrote:

> Stefan Beller <sbeller@google.com> wrote:
>> IIRC, more than half the bandwidth of Google's git servers is used
>> for ls-remote calls (i.e. polling a lot of repos, most of which did *not*
>> change, by build bots that are really eager to try again after a minute).
>
> Thinking back on that statement, I think polling can be
> optimized in git, at least.
>
> IIRC, your repos have lots of refs, right?
> (which is why it's a bandwidth problem)
>
> Since info/refs is a static file (hopefully updated by a
> post-update hook), the smart client can make a conditional
> HTTP request with If-Modified-Since: to avoid the big response.
>
> The client would need to cache the mtime of the last requested
> refs file somewhere.
>
> IOW, do refs negotiation the "dumb" way, since the smart way
> is no better for that, really.  Keep doing object transfers the
> smart way.
>
> During the initial clone, smart servers could probably
> have a header informing clients that their info/refs
> is up-to-date and clients can do dumb refs negotiation.

Doing this with If-Modified-Since sounds like an easier drop-in
replacement (just needs a client change), but I wonder if ETag isn't a
better fit for this.

I.e. we'd document some convention where the ETag is a hash of the refs
the client expects to be advertised, in some format; the client then
sends that to the server.

That allows the same thing without anyone keeping more state than they
keep now in their local ref store.
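
One way such a convention could look, purely as a sketch: hash the
sorted "<oid> <refname>" lines and use the digest as a strong ETag. The
choice of SHA-256, the line format, and the names refs_etag and
advertisement_needed are assumptions made for this example; the thread
does not specify any of them.

```python
import hashlib


def refs_etag(refs):
    """Derive a quoted ETag value from a {refname: object id} mapping."""
    h = hashlib.sha256()
    for name in sorted(refs):  # sort so the result is order-independent
        h.update(f"{refs[name]} {name}\n".encode("utf-8"))
    return f'"{h.hexdigest()}"'


def advertisement_needed(client_etag, server_refs):
    """Server side: advertise refs only if the client's view is stale."""
    return client_etag != refs_etag(server_refs)


# Hypothetical ref stores with placeholder object ids.
client = {"refs/heads/master": "a" * 40, "refs/tags/v1": "b" * 40}
server = {"refs/tags/v1": "b" * 40, "refs/heads/master": "c" * 40}
print(advertisement_needed(refs_etag(client), server))  # True
```

The client computes the value from refs it already stores and sends it
in If-None-Match, so no extra cached state such as an mtime is needed.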

On the fancier side I think bloom filters are something that's been
discussed (and I believe someone (Twitter?) had such an internal patch),
i.e. the client sends a bloom filter of refs they have, and the server
advertises things they don't know about yet (and due to how bloom
filters work, some things they *do* know about already but tripped up
the bloom filter...).
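
For readers unfamiliar with the idea, here is a small self-contained
sketch of a Bloom filter over "<oid> <refname>" entries. The class,
its parameters, and refs_to_advertise are hypothetical and are not
taken from any existing git patch. Note that a standard Bloom filter
never reports a false negative, so in this sketch the error case is the
server skipping a ref the client actually lacks; a real protocol would
need some fallback for that.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a Python int used as a bitset."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def refs_to_advertise(server_refs, client_filter):
    """Server side: keep only refs the filter says the client lacks."""
    return {name: oid for name, oid in server_refs.items()
            if f"{oid} {name}" not in client_filter}


# Hypothetical client and server ref stores with placeholder object ids.
client_refs = {"refs/heads/master": "a" * 40, "refs/heads/next": "b" * 40}
server_refs = dict(client_refs, **{"refs/heads/topic": "c" * 40})

client_filter = BloomFilter()
for name, oid in client_refs.items():
    client_filter.add(f"{oid} {name}")

advertised = refs_to_advertise(server_refs, client_filter)
```

The filter is a fixed number of bits regardless of how many refs the
client has, which is the appeal for repositories with very many refs.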


* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-05-02  9:21         ` Ævar Arnfjörð Bjarmason
@ 2019-05-02  9:42           ` Eric Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2019-05-02  9:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Stefan Beller, git, meta

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> On Thu, May 02 2019, Eric Wong wrote:
> 
> > Stefan Beller <sbeller@google.com> wrote:
> >> IIRC, more than half the bandwidth of Google's git servers is used
> >> for ls-remote calls (i.e. polling a lot of repos, most of which did *not*
> >> change, by build bots that are really eager to try again after a minute).
> >
> > Thinking back on that statement, I think polling can be
> > optimized in git, at least.
> >
> > IIRC, your repos have lots of refs, right?
> > (which is why it's a bandwidth problem)
> >
> > Since info/refs is a static file (hopefully updated by a
> > post-update hook), the smart client can make a conditional
> > HTTP request with If-Modified-Since: to avoid the big response.
> >
> > The client would need to cache the mtime of the last requested
> > refs file somewhere.
> >
> > IOW, do refs negotiation the "dumb" way, since the smart way
> > is no better for that, really.  Keep doing object transfers the
> > smart way.
> >
> > During the initial clone, smart servers could probably
> > have a header informing clients that their info/refs
> > is up-to-date and clients can do dumb refs negotiation.
> 
> Doing this with If-Modified-Since sounds like an easier drop-in
> replacement (just needs a client change), but I wonder if ETag isn't a
> better fit for this.

ETags overall could work.

> I.e. we'd document some convention where the ETag is a hash of the refs
> the client expects to be advertised, in some format; the client then
> sends that to the server.

But I was hoping to avoid the overhead of spawning git-http-backend
entirely.  And there's no consistent way to configure ETags on
different static servers.

> That allows the same thing without anyone keeping more state than they
> keep now in their local ref store.

I think caching the remote info/refs is useful anyways in case
the user changes their fetch refspec, and it could speed up
invocations of "git ls-remote".

> On the fancier side I think bloom filters are something that's been
> discussed (and I believe someone (Twitter?) had such an internal patch),
> i.e. the client sends a bloom filter of refs they have, and the server
> advertises things they don't know about yet (and due to how bloom
> filters work, some things they *do* know about already but tripped up
> the bloom filter...).

I'm not smart enough to understand such fancy things :)


Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
