git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>,
	Victoria Dye <vdye@github.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [WIP v1 2/4] mv: add check_dir_in_index() and solve general dir check issue
Date: Fri, 1 Apr 2022 10:49:24 -0400	[thread overview]
Message-ID: <22aadea2-9330-aa9e-7b6a-834585189144@github.com> (raw)
In-Reply-To: <CAJyCBOQmUYe53ahpEXQZAWMoers0o7b1xuCYu_k-LrfvKTkV-g@mail.gmail.com>

On 4/1/2022 8:49 AM, Shaoxuan Yuan wrote:> On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote:
>>
>> Shaoxuan Yuan wrote:
>>> Originally, moving a <source> directory which is not on-disk due
>>> to its existence outside of sparse-checkout cone, "giv mv" command
>>> errors out with "bad source".
>>>
>>> Add a helper check_dir_in_index() function to see if a directory
>>> name exists in the index. Also add a SPARSE_DIRECTORY bit to mark
>>> such directories.
>>>
>>
>> Hmm, I think this patch would fit better in your eventual "sparse index
>> integration" series than this "prerequisite fixes to sparse-checkout"
>> series. Sparse directories *only* appear when you're using a sparse index
>> so, theoretically, this shouldn't ever come up (and thus isn't testable)
>> until you're using a sparse index.
> 
> After reading your feedback, I realized that I totally misused
> the phrase "sparse directory". Clearly, this patch series does not
> deal with sparse-
> index yet, as "sparse directory" is a cache entry that points to a
> tree, if sparse-index
> is enabled. Silly me ;)
> 
> What I was *actually* trying to say is: I want to change the checking
> logic of moving
> a "directory that exists outside of sparse-checkout cone", and I
> apparently misused
> "sparse directory" to reference such a thing.

In the case of a full index (or an expanded sparse index, which is
currently always the case for `git mv`), you detect a sparse directory
by looking for the directory in the index, _not_ finding it, and then
seeing if the cache entry at the position where the directory _would_
exist is marked with the SKIP_WORKTREE bit.

This works in cone mode and the old mode because I assume you've already
checked for the existence of the directory, so if there _was_ any
non-SKIP_WORKTREE cache entry within the directory, then the directory
would exist in the worktree.

(These are good details to include in the commit message.)

>>> +static int check_dir_in_index(const char *dir)
>>> +{
>>
>> This function can be made a lot simpler - you can use `cache_name_pos()` to
>> find the index entry - if it's found, the directory exists as a sparse
>> directory. And, because 'add_slash()' enforces the trailing slash later on,
>> you don't need to worry about adjusting the name before you look for the
>> entry.
> 
> Yes, if I correctly used the phrase "sparse directory", but I did not...
> I think I can use 'cache_name_pos()' to
> check a directory *iff* it is a legit sparse directory when using sparse-index?
> 
> In my case, I just want to check a regular directory that is not in
> the worktree,
> since the cone pattern excludes it. And in a non-sparse index, cache
> entry points only
> to blobs, not trees, and that's the reason I wrote this weird function
> to look into the
> index. I understand that sounds not compatible with how git manages
> index, but all
> I want to know is "does this directory exist in the index?" (this
> question is also quasi-correct).
> 
> I tried to find an existing API for this job, but I failed to find
> any. Though I have a hunch
> that there must be something to do it...

You can still use cache_name_pos() and if the resulting value is negative,
then you can "flip it" (pos = -1 - pos) to get the array index where the
directory _would_ be inserted.

For example, here is a case in unpack-trees.c (that uses the synonym
index_name_pos()):

static int locate_in_src_index(const struct cache_entry *ce,
			       struct unpack_trees_options *o)
{
	struct index_state *index = o->src_index;
	int len = ce_namelen(ce);
	int pos = index_name_pos(index, ce->name, len);
	if (pos < 0)
		pos = -1 - pos;
	return pos;
}

This uses a binary search inside the method, so it will be much faster
than the loop you wrote here.

If you have this helper, then you can integrate with the sparse index later
by checking for a sparse directory entry when pos is non-negative. But that
can wait for the next series.

>>>                       /* only error if existence is expected. */
>>>                       else if (modes[i] != SPARSE)
>>>                               bad = _("bad source");
>>> @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                               && lstat(dst, &st) == 0)
>>>                       bad = _("cannot move directory over file");
>>>               else if (src_is_dir) {
>>> -                     int first = cache_name_pos(src, length), last;
>>> +                     int first, last;
>>> +dir_check:
>>> +                     first = cache_name_pos(src, length);
>>>
>>>                       if (first >= 0)
>>>                               prepare_move_submodule(src, first,
>>> @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                       else { /* last - first >= 1 */
>>>                               int j, dst_len, n;
>>>
>>> -                             modes[i] = WORKING_DIRECTORY;
>>> +                             if (!modes[i])
>>> +                                     modes[i] = WORKING_DIRECTORY;

This is curious that we could get here with an existing mode. I wonder if
it would be worthwhile to make the enum using a "flags" mode (each state
is a different bit in the word) so instead of

	modes[i] = WORKING_DIRECTORY;

we would write

	modes[i] |= WORKING_DIRECTORY;

>>>                               n = argc + last - first;
>>>                               REALLOC_ARRAY(source, n);
>>>                               REALLOC_ARRAY(destination, n);
>>> @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                       printf(_("Renaming %s to %s\n"), src, dst);
>>>               if (show_only)
>>>                       continue;
>>> -             if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
>>> +             if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY &&

And here we would write something like

	if (!(mode & (INDEX | SPARSE | SPARSE_DIRECTORY)) &&

>>> +              rename(src, dst) < 0) {
>>>                       if (ignore_errors)
>>>                               continue;
>>>                       die_errno(_("renaming '%s' failed"), src);
>>> @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                                                             1);
>>>               }
>>>
>>> -             if (mode == WORKING_DIRECTORY)
>>> +             if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY)

and here:

	if (mode & (WORKING_DIRECTORY | SPARSE_DIRECTORY))

This requires changing your enum definition. It got lost in the previous
quoting, it seems, but here it is again:

>>> @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>  		OPT_END(),
>>>  	};
>>>  	const char **source, **destination, **dest_path, **submodule_gitfile;
>>> -	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
>>> +	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE,
>>> +	SPARSE_DIRECTORY } *modes;
>>>  	struct stat st;
>>>  	struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
>>>  	struct lock_file lock_file = LOCK_INIT;

I think it is time to split out "enum update_mode" to the top of the file
instead of defining it inline here. Here is what it could look like:

enum update_mode {
	BOTH = 0,
	WORKING_DIRECTORY = (1 << 1),
	INDEX = (1 << 2),
	SPARSE = (1 << 3),
};

(This is how it would look before adding your new value.)

I can imagine making a new commit that does the following:

* Move update_mode to the top and set the values to be independent bits.
* Change "mode[i] =" to "mode[i] |="
* Change "mode ==" checks to "mode &" checks.

Think about it.

>> I'm a bit confused - doesn't this mean the sparse dir move will be skipped?
>> In your commit description, you mention that this 'mv' succeeds with the
>> '--sparse' option, but I don't see any place where the sparse directory
>> would be moved.
> 
> Well, you know the drill, I did not use "sparse directory" correctly, let alone
> 'SPARSE_DIRECTORY' enum bit in this hunk. I think it makes some sense
> if you apply my actual meaning of 'SPARSE_DIRECTORY' here (it should be
> something like OUT_OF_CONE_WORKING_DIRECTORY)? Because such
> directory is not on disk, it cannot be "rename()"d, and should also skip the
> "rename_cache_entry_at()" function. If all the files under the directory are
> moved/renamed, then (in my opinion) the directory is both moved to the
> destination,
> both in the worktree and in the index.

Perhaps a better name would be SKIP_WORKTREE_DIR.

But yes, we need to make sure that all cache entries under the directory
have their SKIP_WORKTREE bits re-checked and any that lose the bit need to
be written to the worktree.

I wonder if it is as simple as marking a boolean that says "I moved at
least one sparse entry" and then calling update_sparsity() at the end of
cmd_mv() if that boolean is true.

Thanks,
-Stolee

  reply	other threads:[~2022-04-01 16:09 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-31  9:17 [WIP v1 0/4] mv: fix out-of-cone file/directory move logic Shaoxuan Yuan
2022-03-31  9:17 ` [WIP v1 1/4] mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit Shaoxuan Yuan
2022-03-31 16:39   ` Victoria Dye
2022-04-01 14:30     ` Derrick Stolee
2022-03-31  9:17 ` [WIP v1 2/4] mv: add check_dir_in_index() and solve general dir check issue Shaoxuan Yuan
2022-03-31 10:25   ` Ævar Arnfjörð Bjarmason
2022-04-01  3:51     ` Shaoxuan Yuan
2022-03-31 21:28   ` Victoria Dye
2022-04-01 12:49     ` Shaoxuan Yuan
2022-04-01 14:49       ` Derrick Stolee [this message]
2022-04-04  7:25         ` Shaoxuan Yuan
2022-04-04  7:49           ` Shaoxuan Yuan
2022-04-04 12:43             ` Derrick Stolee
2022-03-31  9:17 ` [WIP v1 3/4] mv: add advise_to_reapply hint for moving file into cone Shaoxuan Yuan
2022-03-31 10:30   ` Ævar Arnfjörð Bjarmason
2022-04-01  4:00     ` Shaoxuan Yuan
2022-04-01  8:02       ` Ævar Arnfjörð Bjarmason
2022-04-03  2:01         ` Eric Sunshine
2022-03-31 21:56   ` Victoria Dye
2022-04-01 14:55   ` Derrick Stolee
2022-03-31  9:17 ` [WIP v1 4/4] t7002: add tests for moving out-of-cone file/directory Shaoxuan Yuan
2022-03-31 10:33   ` Ævar Arnfjörð Bjarmason
2022-03-31 22:11   ` Victoria Dye
2022-03-31  9:28 ` [WIP v1 0/4] mv: fix out-of-cone file/directory move logic Shaoxuan Yuan
2022-03-31 22:21 ` Victoria Dye
2022-04-01 12:18   ` Shaoxuan Yuan
2022-04-08 12:22 ` Shaoxuan Yuan
2022-05-27 10:07 ` [WIP v2 0/5] " Shaoxuan Yuan
2022-05-27 10:08   ` [WIP v2 1/5] t7002: add tests for moving out-of-cone file/directory Shaoxuan Yuan
2022-05-27 12:07     ` Ævar Arnfjörð Bjarmason
2022-05-27 14:48     ` Derrick Stolee
2022-05-27 15:51     ` Victoria Dye
2022-05-27 10:08   ` [WIP v2 2/5] mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit Shaoxuan Yuan
2022-05-27 15:13     ` Derrick Stolee
2022-05-27 22:38       ` Victoria Dye
2022-05-31  8:06       ` Shaoxuan Yuan
2022-05-27 10:08   ` [WIP v2 3/5] mv: check if <destination> exists in index to handle overwriting Shaoxuan Yuan
2022-05-27 22:04     ` Victoria Dye
2022-05-27 10:08   ` [WIP v2 4/5] mv: add check_dir_in_index() and solve general dir check issue Shaoxuan Yuan
2022-05-27 15:27     ` Derrick Stolee
2022-05-31  9:56       ` Shaoxuan Yuan
2022-05-31 15:49         ` Derrick Stolee
2022-05-27 10:08   ` [WIP v2 5/5] mv: use update_sparsity() after touching sparse contents Shaoxuan Yuan
2022-05-27 12:10     ` Ævar Arnfjörð Bjarmason
2022-05-27 19:36     ` Victoria Dye
2022-05-27 19:59       ` Junio C Hamano
2022-05-27 21:24         ` Victoria Dye
2022-06-16 13:51           ` Shaoxuan Yuan
2022-06-16 16:42             ` Victoria Dye
2022-06-17  2:15               ` Shaoxuan Yuan
2022-06-19  3:25 ` [WIP v3 0/7] mv: fix out-of-cone file/directory move logic Shaoxuan Yuan
2022-06-19  3:25   ` [WIP v3 1/7] t7002: add tests for moving out-of-cone file/directory Shaoxuan Yuan
2022-06-21 21:23     ` Victoria Dye
2022-06-19  3:25   ` [WIP v3 2/7] mv: decouple if/else-if checks using goto Shaoxuan Yuan
2022-06-19  3:25   ` [WIP v3 3/7] mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit Shaoxuan Yuan
2022-06-19  3:25   ` [WIP v3 4/7] mv: check if <destination> exists in index to handle overwriting Shaoxuan Yuan
2022-06-19  3:25   ` [WIP v3 5/7] mv: use flags mode for update_mode Shaoxuan Yuan
2022-06-21 22:32     ` Victoria Dye
2022-06-22  9:37       ` Shaoxuan Yuan
2022-06-19  3:25   ` [WIP v3 6/7] mv: add check_dir_in_index() and solve general dir check issue Shaoxuan Yuan
2022-06-21 22:55     ` Victoria Dye
2022-06-19  3:25   ` [WIP v3 7/7] mv: update sparsity after moving from out-of-cone to in-cone Shaoxuan Yuan
2022-06-21 23:11     ` Victoria Dye
2022-06-21 23:30   ` [WIP v3 0/7] mv: fix out-of-cone file/directory move logic Victoria Dye
2022-06-23 15:06     ` Derrick Stolee
2022-06-23 16:19       ` Junio C Hamano
2022-06-24  8:26         ` Shaoxuan Yuan
2022-06-23 11:41 ` [PATCH v4 " Shaoxuan Yuan
2022-06-23 11:41   ` [PATCH v4 1/7] t7002: add tests for moving out-of-cone file/directory Shaoxuan Yuan
2022-06-23 11:41   ` [PATCH v4 2/7] mv: update sparsity after moving from out-of-cone to in-cone Shaoxuan Yuan
2022-06-23 15:08     ` Derrick Stolee
2022-06-24  8:04       ` Shaoxuan Yuan
2022-06-27 13:55         ` Derrick Stolee
2022-06-23 11:41   ` [PATCH v4 3/7] mv: decouple if/else-if checks using goto Shaoxuan Yuan
2022-06-23 11:41   ` [PATCH v4 4/7] mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit Shaoxuan Yuan
2022-06-23 11:41   ` [PATCH v4 5/7] mv: check if <destination> exists in index to handle overwriting Shaoxuan Yuan
2022-06-23 11:41   ` [PATCH v4 6/7] mv: use flags mode for update_mode Shaoxuan Yuan
2022-06-23 15:10     ` Derrick Stolee
2022-06-23 11:41   ` [PATCH v4 7/7] mv: add check_dir_in_index() and solve general dir check issue Shaoxuan Yuan
2022-06-23 15:14     ` Derrick Stolee
2022-06-24  7:57       ` Shaoxuan Yuan
2022-06-27 13:59         ` Derrick Stolee
2022-06-23 15:16   ` [PATCH v4 0/7] mv: fix out-of-cone file/directory move logic Derrick Stolee
2022-06-23 18:05     ` Junio C Hamano
2022-06-30  2:37 ` [PATCH v5 0/8] " Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 1/8] t7002: add tests for moving out-of-cone file/directory Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 2/8] t1092: mv directory from out-of-cone to in-cone Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 3/8] mv: update sparsity after moving " Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 4/8] mv: decouple if/else-if checks using goto Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 5/8] mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 6/8] mv: check if <destination> exists in index to handle overwriting Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 7/8] mv: use flags mode for update_mode Shaoxuan Yuan
2022-06-30  2:37   ` [PATCH v5 8/8] mv: add check_dir_in_index() and solve general dir check issue Shaoxuan Yuan
2022-07-01 19:43   ` [PATCH v5 0/8] mv: fix out-of-cone file/directory move logic Derrick Stolee
2022-07-01 21:50     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=22aadea2-9330-aa9e-7b6a-834585189144@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).