git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "lilinchao@oschina.cn" <lilinchao@oschina.cn>
To: "Junio C Hamano" <gitster@pobox.com>,
	"Li Linchao via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git <git@vger.kernel.org>, "Derrick Stolee" <stolee@gmail.com>,
	dscho <johannes.schindelin@gmx.de>
Subject: Re: Re: [PATCH v3] builtin/clone.c: add --reject-shallow option
Date: Wed, 10 Feb 2021 17:07:08 +0800	[thread overview]
Message-ID: <5c4295e46b7f11eb8acc0024e87935e7@oschina.cn> (raw)
In-Reply-To: 026bd8966b1611eb975aa4badb2c2b1190694@pobox.com

--------------
lilinchao@oschina.cn
>"Li Linchao via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: lilinchao <lilinchao@oschina.cn>
>>
>> In some scenarios, users may want more history than the repository
>> offered for cloning, which mostly to be a shallow repository, can
>> give them.
>
>Sorry, but I find this hard to understand.  Are you saying that most
>of the repositories that users try to clone from are shallow? 
> 
Oh, sorry, it shoud be "which happens to be a shallow repository".

>> But users don't know it is a shallow repository until
>> they download it to local, users should have the option to refuse
>> to clone this kind of repository, and may want to exit the process
>> immediately without creating any unnecessary files.
>
>This one on the other hand is easy to understand, but we would
>probably need something like s/But/But because/.
> 
Ok, I will substitute it.

>> Althought there is an option '--depth=x' for users to decide how
>> deep history they can fetch, but as the unshallow cloning's depth
>> is INFINITY, we can't know exactly the minimun 'x' value that can
>> satisfy the minimum integrity, so we can't pass 'x' value to --depth,
>> and expect this can obtain a complete history of a repository.
>
>Hmph, that is an interesting point.  This makes me wonder if we can
>achieve the same without adding a new option at the UI level (e.g.
>by allowing "--depth" to take "infinity" and reject cloning if we
>find out that the origin repository is a shallow one).  But we can
>worry about it later once after we get the machinery driven by the
>UI right.
> 
Please let me explain the purpose of this patch in another way:
In practice, I may need a filter in clone command to filter out remote
shallow repository that may appear. I think a new option like '--reject-shallow',
or '--filter-shallow', or something like that, has a clearer purpose for users, they
don't have to come up with a specific depth number to achieve the same purpose.
A literal filter, or option is just ok, I think.

>> In other scenarios, given that we have an API that allow us to import
>> external repository, and then perform various operations on the repo.
>
>Sorry, but I do not understand what you want to say with these two
>lines ("Given that X and Y" needs to be followed by a concluding
>statement, e.g. "Given that we have API to import and operate, we
>can do Z"---you are missing that "we can do Z" part).
> 
Forgive my crappy English :-(
What I want to express is "if we have an API to do sth, then do sth".

>> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
>> index 02d9c19cec75..af5a97903a05 100644
>> --- a/Documentation/git-clone.txt
>> +++ b/Documentation/git-clone.txt
>> @@ -15,7 +15,7 @@ SYNOPSIS
>>    [--dissociate] [--separate-git-dir <git dir>]
>>    [--depth <depth>] [--[no-]single-branch] [--no-tags]
>>    [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
>> -	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
>> +	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--reject-shallow]
>
>Isn't it "--[no-]reject-shallow"?  Offering the negation from the
>command line is essential if "[clone] rejectshallow" configuration
>is allowed to set the default to true.
> 
Ok, that makes sense.

>> +--reject-shallow::
>> +	Don't clone a shallow source repository. In some scenarios, clients
>> +	want the cloned repository information to be complete. Otherwise,
>> +	the cloning process should end immediately without creating any files,
>> +	which can save some disk space. This can override `clone.rejectshallow`
>> +	from the configuration:
>> +
>> +	--------------------------------------------------------------------
>> +	$ git -c clone.rejectshallow=false clone --reject-shallow source out
>> +	--------------------------------------------------------------------
>> +
>> +	While there is a way to countermand a configured "I always refuse to
>> +	clone from a shallow repository" with "but let's allow it only this time":
>> +
>> +	----------------------------------------------------------------------
>> +	$ git -c clone.rejectshallow=true clone --no-reject-shallow source out
>> +	----------------------------------------------------------------------
>
>
>This is way too verbose and gives unnecessary details that readers
>already know or do not need to know (e.g. setting configuration from
>the command line and immediately override it from the command line
>is not something end-users would EVER need to do---only test writers
>who develop Git would need it).  Something like
>
>    Fail if the source repository is a shallow repository.  The
>    `clone.rejectShallow` configuration variable can be used to
>    give the default.  
>
>would be sufficient.  All readers ought to know when a configuration
>and command line option exist, the latter can be used to override
>the default former gives, and it is *not* a job for the description
>of an individual option to teach them to such a detail like the
>above does.
> 
>> +static int option_no_shallow = -1;  /* unspecified */
>> +static int config_shallow = -1;    /* unspecified */
>
>Hmph.  I would have expected the usual "prepare a single variable
>and initialize it to the default, read the config to set it, and
>then parse the command line to overwrite it" sequence would suffice
>so it is puzzling why we want two separate variables here.
>
>Let's read on to find out.
>
>> @@ -90,6 +92,8 @@ static struct option builtin_clone_options[] = {
>>  OPT__VERBOSITY(&option_verbosity),
>>  OPT_BOOL(0, "progress", &option_progress,
>>  N_("force progress reporting")),
>> +	OPT_BOOL(0, "reject-shallow", &option_no_shallow,
>> +	N_("don't clone shallow repository")),
>>  OPT_BOOL('n', "no-checkout", &option_no_checkout,
>>  N_("don't create a checkout")),
>>  OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")),
>> @@ -858,6 +862,9 @@ static int git_clone_config(const char *k, const char *v, void *cb)
>>  free(remote_name);
>>  remote_name = xstrdup(v);
>>  }
>> +	if (!strcmp(k, "clone.rejectshallow")) {
>> +	config_shallow = git_config_bool(k, v);
>> +	}
>>  return git_default_config(k, v, cb);
>>  }
>
>You are adding to git_clone_config(), so instead of setting the
>value to config_shallow, setting the value to the same variable that
>will be used in builtin_clone_options[] array should be sufficient.
>
>cmd_clone() begins like so:
>
>
>	git_config(git_clone_config, NULL);
>	argc = parse_options(...);
>
>which means that single variable (let's call it reject_shallow)
>can (1) stay to be its initial value if no config or option is
>given, (2) if there is config, the git_config() call will cause
>that variable assigned, (3) if there is option, parse_options()
>call will cause that variable assigned, possibly overwriting the
>value taken from the config.
> 
Sorry, you may forget there is a re-read git-config under these lines
(around at line 1160):
    	/*
	 * additional config can be injected with -c, make sure it's included
	 * after init_db, which clears the entire config environment.
	 */
	write_config(&option_config);

	/*
	 * re-read config after init_db and write_config to pick up any config
	 * injected by --template and --config, respectively.
	 */
	git_config(git_clone_config, NULL);

so, what I can think of is introducing a new variable for git_clone_config,
and I find that in the other place, "define a new config_xxx variable for
git-config" is usual.

>Which is exactly what we want.  So in short, declare just a single
>
>    static int reject_shallow; /* default to false */
>   
>instead of "option_no_shallow" and "config_shallow", and use it in
>both builtin_clone_options[] given to parse_options, and
>git_clone_config() that is given to git_config(), and you'd be fine,
>I think.
>
>> @@ -963,6 +970,7 @@ static int path_exists(const char *path)
>>  int cmd_clone(int argc, const char **argv, const char *prefix)
>>  {
>>  int is_bundle = 0, is_local;
>> +	int is_shallow = 0;
>>  const char *repo_name, *repo, *work_tree, *git_dir;
>>  char *path, *dir, *display_repo = NULL;
>>  int dest_exists, real_dest_exists = 0;
>> @@ -1205,6 +1213,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>> 
>>  path = get_repo_path(remote->url[0], &is_bundle);
>>  is_local = option_local != 0 && path && !is_bundle;
>> +
>> +	/* Detect if the remote repository is shallow */
>> +	if (!access(mkpath("%s/shallow", path), F_OK)) {
>> +	is_shallow = 1;
>> +	}
>
>This is only for cloning from a local repository, no?  IOW, path at
>this point may even be "git://github.com/git/git/" and checking with
>access() does not make sense.
>
>Ah, it is even worse.  get_repo_path() can return NULL, so mkpath()
>will crash in such a case.  This must be at least
>
>	if (path && !access(mkpath("%s/shallow", path), F_OK))
>	is_shallow = 1;
>
>but I think the logic fits better in the body of "if (is_Local)"
>thing that immediately follows.  It is specific to the case where
>cloning from a local repository and access(mkpath()) that is about
>the local filesystem (as opposed to going through the transport
>layer) belongs there.
>
>>  if (is_local) {
>>  if (option_depth)
>>  warning(_("--depth is ignored in local clones; use file:// instead."));
>> @@ -1214,7 +1228,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>  warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
>>  if (filter_options.choice)
>>  warning(_("--filter is ignored in local clones; use file:// instead."));
>> -	if (!access(mkpath("%s/shallow", path), F_OK)) {
>> +	if (is_shallow) {
>>  if (option_local > 0)
>>  warning(_("source repository is shallow, ignoring --local"));
>>  is_local = 0;
>
>So, I think the above two hunks are making the code worse.  If we
>are to detect and reject cloning from the shallow repository when
>going through the transport layer (i.e. "--no-local" or cloning from
>"git://github.com/git/git", or "https://github.com/git/git", if it
>were a shallow repository), that must be handled separately.
> 
Sorry, I made the question simple. Reject cloning a shallow repository
should apply to all four type transport protocols. There still a bunch of
work to be done.

>> @@ -1222,6 +1236,25 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>  }
>>  if (option_local > 0 && !is_local)
>>  warning(_("--local is ignored"));
>> +
>> +	if (is_shallow) {
>> +	int reject = 0;
>> +
>> +	/* If option_no_shallow is specified from CLI option,
>> +	* ignore config_shallow from git_clone_config.
>> +	*/
>> +
>> +	if (config_shallow != -1) {
>> +	reject = config_shallow;
>> +	}
>> +	if (option_no_shallow != -1) {
>> +	reject = option_no_shallow;
>> +	}
>
>I do not think any of the above is necessary with just a single
>reject_shallow variable that is initialized to 0, can be set by
>git_config() callback, and can further be set by parse_options().
>
>> +	if (reject) {
>> +	die(_("source repository is shallow, reject to clone."));
>> +	}
>
>> +	}
>> +
>>  transport->cloning = 1;
>> 
>>  transport_set_option(transport, TRANS_OPT_KEEP, "yes");
>
>I do not see how this change would allow users to reject cloning
>http://github.com/git/git, if that repository were shallow, though. 
>I think that would need changes to the code that interacts with
>these transport_* functions we see later part of this functrion.
>
>Thanks. 

On the whole, thank you for all your patience and suggestions!
I will dig into the code, and base on your suggestions to figure it out.

Finally, I wish you a Happy Lunar New Year! ^-^

-Lilinchao

  parent reply	other threads:[~2021-02-10  9:17 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-04  3:31 [PATCH] builtin/clone.c: add --no-shallow option Li Linchao via GitGitGadget
2021-02-04  5:50 ` Junio C Hamano
2021-02-04 10:32   ` lilinchao
2021-02-04 18:36     ` Junio C Hamano
2021-02-04 14:00 ` Johannes Schindelin
2021-02-04 18:24   ` Junio C Hamano
2021-02-08  8:31 ` [PATCH v2 0/2] " Li Linchao via GitGitGadget
2021-02-08  8:31   ` [PATCH v2 1/2] " lilinchao via GitGitGadget
2021-02-08  8:31   ` [PATCH v2 2/2] builtin/clone.c: add --reject-shallow option lilinchao via GitGitGadget
2021-02-08 13:33   ` [PATCH v2 0/2] builtin/clone.c: add --no-shallow option Derrick Stolee
     [not found]   ` <32bb0d006a1211ebb94254a05087d89a835@gmail.com>
2021-02-08 13:48     ` lilinchao
2021-02-08 14:12   ` [PATCH v3] builtin/clone.c: add --reject-shallow option Li Linchao via GitGitGadget
2021-02-09 20:32     ` Junio C Hamano
     [not found]     ` <026bd8966b1611eb975aa4badb2c2b1190694@pobox.com>
2021-02-10  9:07       ` lilinchao [this message]
2021-02-10 16:27         ` Junio C Hamano
     [not found]         ` <eaa219a86bbc11ebb6c7a4badb2c2b1165032@pobox.com>
2021-02-20 10:40           ` lilinchao
2021-02-21  7:05     ` [PATCH v4] " Li Linchao via GitGitGadget
2021-02-22 18:12       ` Junio C Hamano
2021-03-01 22:03         ` Jonathan Tan
2021-03-01 22:34           ` Junio C Hamano
2021-03-02  8:44           ` lilinchao
2021-03-03 23:59           ` Junio C Hamano
2021-03-04  1:53             ` Jonathan Tan
     [not found]       ` <8f3c00de753911eb93d3d4ae5278bc1270191@pobox.com>
2021-02-28 17:58         ` lilinchao
2021-02-28 18:06       ` [PATCH v5] " Li Linchao via GitGitGadget
2021-03-01  7:11         ` lilinchao
2021-03-01 22:40           ` Johannes Schindelin
2021-03-04  6:26             ` lilinchao
2021-03-03 23:21         ` Junio C Hamano
2021-03-04  5:50           ` lilinchao
2021-03-04 17:19         ` [PATCH v6] " Li Linchao via GitGitGadget
2021-03-12 18:25           ` lilinchao
2021-03-25 11:09           ` [PATCH v7] " Li Linchao via GitGitGadget
2021-03-25 20:31             ` Junio C Hamano
2021-03-25 22:57             ` Junio C Hamano
     [not found]             ` <19c9dc128da911ebacc7d4ae5278bc1233465@pobox.com>
2021-03-26  3:34               ` lilinchao
     [not found]             ` <7a71c96c8dbd11eb8bb0d4ae5278bc1296681@pobox.com>
2021-03-26  3:49               ` lilinchao
2021-03-29 10:19             ` [PATCH v8] " Li Linchao via GitGitGadget
2021-03-29 21:36               ` Junio C Hamano
2021-03-30  9:54               ` Johannes Schindelin
2021-03-30 17:46                 ` Junio C Hamano
2021-03-31 13:30                   ` Johannes Schindelin
     [not found]               ` <f8b2582c913d11ebaddbd4ae5278bc1214940@gmx.de>
2021-03-31 11:03                 ` lilinchao
2021-03-31 15:51               ` [PATCH v9] " lilinchao via GitGitGadget
2021-03-31 19:14                 ` Junio C Hamano
2021-03-31 22:24                   ` Johannes Schindelin
2021-03-31 22:37                     ` Junio C Hamano
2021-04-01 10:46                 ` [PATCH v10] " Li Linchao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5c4295e46b7f11eb8acc0024e87935e7@oschina.cn \
    --to=lilinchao@oschina.cn \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).