From: Jonathan Nieder <jrnieder@gmail.com>
To: Josh Steadmon <steadmon@google.com>
Cc: git@vger.kernel.org, "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
	"Jeff King" <peff@peff.net>, "Franck Bui-Huu" <fbui@suse.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: Re: Proposed approaches to supporting HTTP remotes in "git archive"
Date: Fri, 27 Jul 2018 14:56:44 -0700
Message-ID: <20180727215644.GA223387@aiede.svl.corp.google.com>
In-Reply-To: <CANq=j3tK7QeBJOC7VNWkh4+WBNibMJJp5YUkd9te5NaYwukAow@mail.gmail.com>

(just cc-ing René Scharfe, archive expert; Peff; Dscho; Franck Bui-Huu
to see how his creation is evolving)

Josh Steadmon wrote:

> # Supporting HTTP remotes in "git archive"
>
> We would like to allow remote archiving from HTTP servers. There are a
> few possible implementations to be discussed:
>
> ## Shallow clone to temporary repo
>
> This approach builds on existing endpoints. Clients will connect to the
> remote server's git-upload-pack service and fetch a shallow clone of the
> requested commit into a temporary local repo. The write_archive()
> function is then called on the local clone to write out the requested
> archive.
>
> ### Benefits
>
> * This can be implemented entirely in builtin/archive.c. No new service
>   endpoints or server code are required.
>
> * The archive is generated and compressed on the client side. This
>   reduces CPU load on the server (for compressed archives) which would
>   otherwise be a potential DoS vector.
>
> * This provides a git-native way to archive from any HTTP server that
>   supports the git-upload-pack service; some providers (including GitHub)
>   do not currently allow the git-upload-archive service.
>
> ### Drawbacks
>
> * Archives generated remotely may not be bit-for-bit identical compared
>   to those generated locally, if the versions of git used on the client
>   and on the server differ.
>
> * This requires higher bandwidth compared to transferring a compressed
>   archive generated on the server.
>
>
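
For concreteness, the shallow-clone flow described above could be sketched
roughly as follows. This is only an illustration driving the existing
command-line tools from Python, not the proposed builtin/archive.c change.
The function names and the temporary-directory layout are made up for the
example, and it assumes the requested ref is a branch or tag name (which is
what "git clone --branch" accepts).

```python
import os
import subprocess
import tempfile


def shallow_archive_commands(url, ref, out_path, repo_dir, fmt="tar"):
    """Return the two git invocations for the shallow-clone approach.

    Hypothetical helper: the name and argument order are illustrative only.
    """
    # Step 1: fetch only the tip of `ref` into a bare temporary repo.
    clone = ["git", "clone", "--quiet", "--bare", "--depth=1",
             "--branch", ref, url, repo_dir]
    # Step 2: write the archive locally. After the clone, HEAD in the
    # temporary repo points at the requested ref. The output path is made
    # absolute because "git -C" changes the working directory.
    archive = ["git", "-C", repo_dir, "archive", "--format=" + fmt,
               "--output=" + os.path.abspath(out_path), "HEAD"]
    return [clone, archive]


def shallow_archive(url, ref, out_path, fmt="tar"):
    """Run both steps in a temporary directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        repo_dir = os.path.join(tmp, "repo.git")
        for argv in shallow_archive_commands(url, ref, out_path, repo_dir, fmt):
            subprocess.run(argv, check=True)
```

Note that the whole shallow pack is transferred before any archive bytes are
written, which is the bandwidth drawback listed above.
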
> ## Use git-upload-archive
>
> This approach requires adding support for the git-upload-archive
> endpoint to the HTTP backend. Clients will connect to the remote
> server's git-upload-archive service and the server will generate the
> archive which is then delivered to the client.
>
> ### Benefits
>
> * Matches existing "git archive" behavior for other remotes.
>
> * Requires less bandwidth to send a compressed archive than a shallow
>   clone.
>
> * Resulting archive does not depend in any way on the client
>   implementation.
>
> ### Drawbacks
>
> * Implementation is more complicated; it will require changes to (at
>   least) builtin/archive.c, http-backend.c, and
>   builtin/upload-archive.c.
>
> * Generates more CPU load on the server when compressing archives. This
>   is potentially a DoS vector.
>
> * Does not allow archiving from servers that don't support the
>   git-upload-archive service.
>
>
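
For comparison, this is the kind of invocation that already works today
against remotes that expose git-upload-archive (for example over ssh:// or
git://), and which this approach would extend to HTTP. A small sketch: the
helper name is hypothetical, and it only builds the argument list.

```python
def remote_archive_command(remote, treeish, out_path, fmt="tar.gz"):
    """Build a "git archive --remote" invocation (hypothetical helper)."""
    # The server runs git-upload-archive and streams the finished archive,
    # so no objects are fetched into a local repository.
    return ["git", "archive", "--remote=" + remote,
            "--format=" + fmt, "--output=" + out_path, treeish]
```
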
> ## Add a new protocol v2 "archive" command
>
> I am still a bit hazy on the exact details of this approach; please
> forgive any inaccuracies (I'm a new contributor and haven't examined
> custom v2 commands in much detail yet).
>
> This approach builds off the existing v2 upload-pack endpoint. The
> client will issue an archive command (with options to select particular
> paths or a tree-ish). The server will generate the archive and deliver
> it to the client.
>
> ### Benefits
>
> * Requires less bandwidth to send a compressed archive than a shallow
>   clone.
>
> * Resulting archive does not depend in any way on the client
>   implementation.
>
> ### Drawbacks
>
> * Generates more CPU load on the server when compressing archives. This
>   is potentially a DoS vector.
>
> * Servers must support the v2 protocol (although the client could
>   potentially fall back to some other supported remote archive
>   functionality).
>
> ### Unknowns
>
> * I am not clear on the relative complexity of this approach compared to
>   the others, and would appreciate any guidance offered.
>
>
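
To give a feel for what such a request might look like on the wire, here is
a rough sketch of pkt-line framing for a v2 command. The framing itself (a
four-hex-digit length that includes its own four bytes, "0000" as flush-pkt,
"0001" as delim-pkt) is the existing format. The "archive" command and its
"format", "tree-ish" and "path" arguments are purely hypothetical, since no
such command exists yet.

```python
FLUSH_PKT = b"0000"   # ends the request
DELIM_PKT = b"0001"   # separates the command/capability list from arguments
MAX_PKT_LEN = 65520   # maximum total pkt-line length, including the prefix


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line."""
    length = len(payload) + 4  # the length prefix counts itself
    if length > MAX_PKT_LEN:
        raise ValueError("payload too long for a single pkt-line")
    return b"%04x" % length + payload


def hypothetical_archive_request(treeish, fmt, paths=()):
    """Build a request for an imagined v2 "archive" command.

    The command name and argument names are assumptions for illustration.
    """
    parts = [pkt_line(b"command=archive\n"), DELIM_PKT,
             pkt_line(b"format " + fmt.encode() + b"\n"),
             pkt_line(b"tree-ish " + treeish.encode() + b"\n")]
    for path in paths:
        parts.append(pkt_line(b"path " + path.encode() + b"\n"))
    parts.append(FLUSH_PKT)
    return b"".join(parts)
```

A real request would probably also carry capability lines (such as the agent
string) before the delim-pkt; they are left out here to keep the sketch short.
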
> ## Summary
>
> Personally, I lean towards the first approach. It could give us an
> opportunity to remove server-side complexity; there is no reason that
> the shallow-clone approach must be restricted to the HTTP transport, and
> we could re-implement other transports using this method.  Additionally,
> it would allow clients to pull archives from remotes that would not
> otherwise support it.
>
> That said, I am happy to work on whichever approach the community deems
> most worthwhile.
