From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/8] http-fetch: support fetching packfiles by URL
Date: Fri, 29 May 2020 16:25:52 -0700
Message-ID: <xmqqeer2xr0f.fsf@gitster.c.googlers.com>
In-Reply-To: <6b3a628719e0593893e537de0220a5e0d5460232.1590789428.git.jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 29 May 2020 15:30:15 -0700")
Jonathan Tan <jonathantanmy@google.com> writes:
> +'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile | <commit>] <url>
> ...
> +--packfile::
> + Instead of a commit id on the command line (which is not expected in
> + this case), 'git http-fetch' fetches the packfile directly at the given
> + URL and uses index-pack to generate corresponding .idx and .keep files.
> + The output of index-pack is printed to stdout.
This makes sense as an external interface, I guess.

How should this interact with the --stdin option, which is more like
an "instead of getting a single <dest filename, object name> pair from
the command line, handle many pairs read from the standard input"
batch-mode operation? Would it be beneficial to allow an unbounded
number of packfiles, not just a single one, to be fetched and
indexed by a single invocation of the command? I suspect that, given
the relatively large size of a single request for fetching a
packfile, one invocation of the command per packfile won't be too
heavy an overhead, so the lack of such orthogonality may hurt only
conceptual cleanliness, not performance. OK.
> - if (argc != arg + 2 - commits_on_stdin)
> + if (argc != arg + 2 - (commits_on_stdin || packfile))
> usage(http_fetch_usage);
> if (commits_on_stdin) {
> commits = walker_targets_stdin(&commit_id, &write_ref);
> + } else if (packfile) {
> + /* URL will be set later */
Prefer to see an empty statement spelled more explicitly, like this:

	; /* URL will be set later */

Otherwise the reader would be left wondering whether a line (or
lines) had been accidentally lost after this comment.
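Here is a small self-contained sketch of the suggested style. It mirrors the if/else chain from the patch. count_commits() and its stdin_commits parameter are hypothetical, and stdin_commits stands in for the return value of walker_targets_stdin().

```c
/*
 * Hypothetical mirror of the patch's if/else chain. stdin_commits stands
 * in for what walker_targets_stdin() would return.
 */
static int count_commits(int commits_on_stdin, int packfile, int stdin_commits)
{
	int commits = 0;

	if (commits_on_stdin) {
		commits = stdin_commits;
	} else if (packfile) {
		; /* URL will be set later; nothing to walk */
	} else {
		commits = 1;	/* a single <commit> from the command line */
	}
	return commits;
}
```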
> } else {
> commit_id = (char **) &argv[arg++];
> commits = 1;
> }
>
> + if (packfile) {
> + url = xstrdup(argv[arg]);
> + } else {
> + if (argv[arg])
> + str_end_url_with_slash(argv[arg], &url);
> + }
>
> setup_git_directory();
>
> git_config(git_default_config, NULL);
>
> http_init(NULL, url, 0);
> + if (packfile) {
> + struct http_pack_request *preq;
> + struct slot_results results;
> + int ret;
> +
> + preq = new_http_pack_request(NULL, url);
> + if (preq == NULL)
> + die("couldn't create http pack request");
> + preq->slot->results = &results;
> + preq->generate_keep = 1;
> +
> + if (start_active_slot(preq->slot)) {
> + run_active_slot(preq->slot);
> + if (results.curl_result != CURLE_OK) {
> + die("Unable to get pack file %s\n%s", preq->url,
> + curl_errorstr);
> + }
> + } else {
> + die("Unable to start request");
> + }
> +
> + if ((ret = finish_http_pack_request(preq)))
> + die("finish_http_pack_request gave result %d", ret);
> + release_http_pack_request(preq);
> + rc = 0;
The above probably wants to be a single helper function. The other
side of the if/else may also become another helper function. That
way, the flow of control would become clearer. After all, these two
branches do not share all that much: only http_init() and
http_cleanup(), and nothing else.
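Here is a minimal sketch of the suggested structure, assuming each mode gets its own helper. fetch_single_packfile() and fetch_using_walker() are hypothetical names. http_init() and http_cleanup() are replaced by stubs, and the helpers return placeholder values, so that the sketch compiles on its own. Only the shared setup and teardown remain in the caller.

```c
/* Stubs standing in for git's http_init()/http_cleanup(). */
static int http_active;
static void http_init_stub(const char *url) { (void)url; http_active = 1; }
static void http_cleanup_stub(void) { http_active = 0; }

/* Hypothetical helper: would hold the body of the "if (packfile)" branch. */
static int fetch_single_packfile(const char *url)
{
	/* new_http_pack_request(), run_active_slot(), ... would go here. */
	return (url && *url) ? 0 : -1;	/* placeholder result */
}

/* Hypothetical helper: would hold the original commit-walking logic. */
static int fetch_using_walker(const char *url, int commits)
{
	(void)url;
	return commits > 0 ? 0 : -1;	/* placeholder result */
}

static int http_fetch_main(const char *url, int packfile, int commits)
{
	int rc;

	http_init_stub(url);		/* shared by both modes */
	if (packfile)
		rc = fetch_single_packfile(url);
	else
		rc = fetch_using_walker(url, commits);
	http_cleanup_stub();		/* shared by both modes */
	return rc;
}
```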
For that matter, even before introducing this new mode of operation,
a preparatory patch that moves the original logic of this function
into a helper function, to be called from the "else" side, may make
it easier to see what is going on.
> diff --git a/http.c b/http.c
> index 130e9d6259..ac66215ee6 100644
> --- a/http.c
> +++ b/http.c
> @@ -2280,15 +2280,18 @@ int finish_http_pack_request(struct http_pack_request *preq)
> int tmpfile_fd;
> int ret = 0;
>
> - close_pack_index(p);
> + if (p)
> + close_pack_index(p);
>
> fclose(preq->packfile);
> preq->packfile = NULL;
>
> - lst = preq->lst;
> - while (*lst != p)
> - lst = &((*lst)->next);
> - *lst = (*lst)->next;
> + if (p) {
> + lst = preq->lst;
> + while (*lst != p)
> + lst = &((*lst)->next);
> + *lst = (*lst)->next;
> + }
This is quite ugly. What is the original meaning of the target
field of the http_pack_request structure again? A packed_git
structure that will be filled in when we are done fetching the
packfile from the other side, and then installed in our repository?
When we are (ab)using the http_fetch code to fetch a single
packfile, we do not install the packfile into the running process,
because we are only (re)using the existing machinery as a poor man's
"curl | git index-pack --stdin"?

I do not think it is a bad idea to roll our own
"curl | git index-pack --stdin", but I do find this an ugly way to
do so. Perhaps a set of lower-level helper functions could be
isolated out of the existing code before this new feature is added,
and then a new "just fetch and pipe it to index-pack" feature could
be written using these helpers, but with a separate set of entry
points? Would that make the resulting code cleaner than this patch
does? I dunno.
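Here is one way to read the suggestion, as a sketch. The "install into this process" bookkeeping, including the pointer-to-pointer walk that unlinks the pack from the list, moves into a helper that only the walker path calls. The single-packfile path then needs no "if (p)" checks. struct pack below is a hypothetical two-field stand-in for struct packed_git.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct packed_git: only the list link matters. */
struct pack {
	const char *name;
	struct pack *next;
};

/*
 * Remove `p` from the singly linked list whose head is *lst, using the
 * same pointer-to-pointer walk as the patch. Unlike the loop in the
 * patch, this stops at the end of the list, so it returns 0 instead of
 * running off the end when `p` is absent; returns 1 if `p` was unlinked.
 */
static int unlink_pack(struct pack **lst, struct pack *p)
{
	while (*lst && *lst != p)
		lst = &(*lst)->next;
	if (!*lst)
		return 0;
	*lst = p->next;
	return 1;
}
```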
> diff --git a/http.h b/http.h
> index a5b082f3ae..709dfa4c19 100644
> --- a/http.h
> +++ b/http.h
> @@ -223,12 +223,21 @@ struct http_pack_request {
> struct active_request_slot *slot;
>
> /*
> - * After calling new_http_pack_request(), point lst to the head of the
> + * After calling new_http_pack_request(), if fetching a pack that
> + * http_get_info_packs() told us about, point lst to the head of the
> * pack list that target is in. finish_http_pack_request() will remove
> * target from lst and call install_packed_git() on target.
> */
> struct packed_git **lst;
>
> + /*
> + * If this is true, finish_http_pack_request() will pass "--keep" to
> + * index-pack, resulting in the creation of a keep file, and will not
> + * suppress its stdout (that is, the "keep\t<hash>\n" line will be
> + * printed to stdout).
> + */
> + unsigned generate_keep : 1;
> +
I suspect that this is a sign that this single patch is trying to
do too many things at the same time. These three:

 - whether we are fetching a single packfile from a URL, or walking
   the repository at a given URL to fetch all of its packfiles,
 - whether packfiles taken from outer space are marked with the
   "keep" bit, and
 - whether the obtained packfile(s) are internally "installed"
   into the running process

are conceptually independent choices, but they seem to be mixed up
here.
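As an illustration of keeping the three choices independent, here is a sketch with one bit per choice. struct pack_fetch_opts, its fields, and both functions are hypothetical. The only grounded details are that the patch passes "--keep" to index-pack when generate_keep is set, and that index-pack is fed the pack on --stdin, as in the "curl | git index-pack --stdin" analogy above.

```c
#include <stddef.h>

/* Hypothetical options: one bit per independent choice. */
struct pack_fetch_opts {
	unsigned single_url : 1;	/* one pack by URL vs. walking the repo */
	unsigned mark_keep : 1;		/* ask index-pack to write a .keep file */
	unsigned install : 1;		/* install the pack into this process */
};

/*
 * Fill `argv` with an index-pack command line; only mark_keep affects it.
 * Returns the number of arguments written; argv is NULL-terminated.
 */
static size_t index_pack_argv(const struct pack_fetch_opts *o, const char *argv[4])
{
	size_t n = 0;

	argv[n++] = "index-pack";
	argv[n++] = "--stdin";
	if (o->mark_keep)
		argv[n++] = "--keep";
	argv[n] = NULL;
	return n;
}

/* Only installing the pack requires a caller-supplied pack list (lst). */
static int needs_pack_list(const struct pack_fetch_opts *o)
{
	return o->install;
}
```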
Thanks.