git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode
@ 2019-12-09 19:42 Derrick Stolee via GitGitGadget
  2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-12-09 19:42 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, newren, Derrick Stolee, Junio C Hamano

This is the first of several cleanups to the sparse-checkout feature that
had a large overhaul in ds/sparse-cone.

We have an internal customer that plans to use sparse-checkout as a core
feature of their version control experience. They use a case-insensitive
filesystem, so as they created their directory dependency graph they did not
keep consistent casing. This data is what they plan to feed to "git
sparse-checkout set".

While I would certainly prefer that the data is cleaned up (and that process
is ongoing), I could not argue with the logic that git add does the "right
thing" when core.ignoreCase is enabled.

Thanks, -Stolee

Derrick Stolee (1):
  sparse-checkout: respect core.ignoreCase in cone mode

 Documentation/git-sparse-checkout.txt |  4 ++++
 builtin/sparse-checkout.c             | 19 +++++++++++++++++--
 cache.h                               |  1 +
 name-hash.c                           | 10 ++++++++++
 t/t1091-sparse-checkout-builtin.sh    | 13 +++++++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)


base-commit: cff4e9138d8df45e3b6199171092ee781cdadaeb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-488%2Fderrickstolee%2Fsparse-case-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-488/derrickstolee/sparse-case-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/488
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
@ 2019-12-09 19:43 ` Derrick Stolee via GitGitGadget
  2019-12-11 18:44   ` Junio C Hamano
  2019-12-12 20:59   ` Derrick Stolee
  2019-12-13 18:09 ` [PATCH v2 0/1] " Derrick Stolee via GitGitGadget
  2019-12-18 11:41 ` [PATCH " Ed Maste
  2 siblings, 2 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-12-09 19:43 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, newren, Derrick Stolee, Junio C Hamano,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.

The sanitize_cone_input() method is named to imply that other checks
may be added. In fact, such checks are planned including looking for
wildcards that make the paths invalid cone patterns or must be
escaped.

Specifically, if the path has a match in the index, then use that
path instead. If there is no match, still add that path to the
patterns, as the user may expect the directory to appear after a
checkout to another ref. However, we have no matching path to
correct for a case conflict, and must assume that the user provided
the correct case.

Another option would be to do case-insensitive checks while
updating the skip-worktree bits during unpack_trees(). Outside of
the potential performance loss on a more expensive code path, that
also breaks compatibility with older versions of Git as using the
same sparse-checkout file would change the paths that are included.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |  4 ++++
 builtin/sparse-checkout.c             | 19 +++++++++++++++++--
 cache.h                               |  1 +
 name-hash.c                           | 10 ++++++++++
 t/t1091-sparse-checkout-builtin.sh    | 13 +++++++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index b975285673..849efa0f0b 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -150,6 +150,10 @@ expecting patterns of these types. Git will warn if the patterns do not match.
 If the patterns do match the expected format, then Git will use faster hash-
 based algorithms to compute inclusion in the sparse-checkout.
 
+If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
+correct for incorrect case when assigning patterns to the sparse-checkout
+file.
+
 SEE ALSO
 --------
 
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index a542d617a5..0de426384e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -336,6 +336,22 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 	}
 }
 
+static void sanitize_cone_input(struct strbuf *line)
+{
+	if (ignore_case) {
+		struct index_state *istate = the_repository->index;
+		const char *name = index_dir_matching_name(istate, line->buf, line->len);
+
+		if (name) {
+			strbuf_setlen(line, 0);
+			strbuf_addstr(line, name);
+		}
+	}
+
+	if (line->buf[0] != '/')
+		strbuf_insert(line, 0, "/", 1);
+}
+
 static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 {
 	strbuf_trim(line);
@@ -345,8 +361,7 @@ static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 	if (!line->len)
 		return;
 
-	if (line->buf[0] != '/')
-		strbuf_insert(line, 0, "/", 1);
+	sanitize_cone_input(line);
 
 	insert_recursive_pattern(pl, line);
 }
diff --git a/cache.h b/cache.h
index d3c89e7a53..a2d9d437f0 100644
--- a/cache.h
+++ b/cache.h
@@ -728,6 +728,7 @@ int repo_index_has_changes(struct repository *repo,
 int verify_path(const char *path, unsigned mode);
 int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
 int index_dir_exists(struct index_state *istate, const char *name, int namelen);
+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
 void adjust_dirname_case(struct index_state *istate, char *name);
 struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 
diff --git a/name-hash.c b/name-hash.c
index ceb1d7bd6f..46898b6571 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -681,6 +681,16 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	return dir && dir->nr;
 }
 
+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
+{
+	struct dir_entry *dir;
+
+	lazy_init_name_hash(istate);
+	dir = find_dir_entry(istate, name, namelen);
+
+	return dir ? dir->name : NULL;
+}
+
 void adjust_dirname_case(struct index_state *istate, char *name)
 {
 	const char *startPtr = name;
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index d5e2892526..d0ce48869f 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -304,4 +304,17 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
 	git -C dirty sparse-checkout disable
 '
 
+test_expect_success 'cone mode: set with core.ignoreCase=true' '
+	test_when_finished git -C repo config --unset core.ignoreCase &&
+	git -C repo sparse-checkout init --cone &&
+	git -C repo config core.ignoreCase true &&
+	git -C repo sparse-checkout set Folder1 &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/
+		/folder1/
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
@ 2019-12-11 18:44   ` Junio C Hamano
  2019-12-11 19:11     ` Derrick Stolee
  2019-12-12 20:59   ` Derrick Stolee
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-12-11 18:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, szeder.dev, newren, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Another option would be to do case-insensitive checks while
> updating the skip-worktree bits during unpack_trees(). Outside of
> the potential performance loss on a more expensive code path, that
> also breaks compatibility with older versions of Git as using the
> same sparse-checkout file would change the paths that are included.

I haven't thought things through, but that sounds as if saying that
we cannot fix old bugs.  We use the entries in the index as a hint
for "correcting" a path to the right case on a case-insensitive
filesystem to avoid corrupting the case in the index, which is case
sensitive and is a way to convey the intended case (by writing them
out to a tree object) to those who use systems that can reliably
reproduce cases in pathnames.  But that still does not mean on a
case-insensitive filesystem, "hello.c" in the "right" case recorded
in the index will always be spelled like so---somebody can make
readdir() or equivalent to yield "Hello.c" for it, and if the user
tells us to say they want to do X (e.g. ignore, skip checkout, etc.)
to "hello.c", we should do the same to "Hello.c" on such a system, I
would think.

If the runtime needs to deal with the case insensitive filesystems
anyway (and I suspect it does), there isn't much point matching and
forcing the case to the paths in the index like this patch does, I
think.  So...


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |  4 ++++
>  builtin/sparse-checkout.c             | 19 +++++++++++++++++--
>  cache.h                               |  1 +
>  name-hash.c                           | 10 ++++++++++
>  t/t1091-sparse-checkout-builtin.sh    | 13 +++++++++++++
>  5 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index b975285673..849efa0f0b 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -150,6 +150,10 @@ expecting patterns of these types. Git will warn if the patterns do not match.
>  If the patterns do match the expected format, then Git will use faster hash-
>  based algorithms to compute inclusion in the sparse-checkout.
>  
> +If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
> +correct for incorrect case when assigning patterns to the sparse-checkout
> +file.
> +
>  SEE ALSO
>  --------
>  
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index a542d617a5..0de426384e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -336,6 +336,22 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
>  	}
>  }
>  
> +static void sanitize_cone_input(struct strbuf *line)
> +{
> +	if (ignore_case) {
> +		struct index_state *istate = the_repository->index;
> +		const char *name = index_dir_matching_name(istate, line->buf, line->len);
> +
> +		if (name) {
> +			strbuf_setlen(line, 0);
> +			strbuf_addstr(line, name);
> +		}
> +	}
> +
> +	if (line->buf[0] != '/')
> +		strbuf_insert(line, 0, "/", 1);
> +}
> +
>  static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
>  {
>  	strbuf_trim(line);
> @@ -345,8 +361,7 @@ static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
>  	if (!line->len)
>  		return;
>  
> -	if (line->buf[0] != '/')
> -		strbuf_insert(line, 0, "/", 1);
> +	sanitize_cone_input(line);
>  
>  	insert_recursive_pattern(pl, line);
>  }
> diff --git a/cache.h b/cache.h
> index d3c89e7a53..a2d9d437f0 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -728,6 +728,7 @@ int repo_index_has_changes(struct repository *repo,
>  int verify_path(const char *path, unsigned mode);
>  int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
>  int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
>  
> diff --git a/name-hash.c b/name-hash.c
> index ceb1d7bd6f..46898b6571 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -681,6 +681,16 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>  	return dir && dir->nr;
>  }
>  
> +const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
> +{
> +	struct dir_entry *dir;
> +
> +	lazy_init_name_hash(istate);
> +	dir = find_dir_entry(istate, name, namelen);
> +
> +	return dir ? dir->name : NULL;
> +}
> +
>  void adjust_dirname_case(struct index_state *istate, char *name)
>  {
>  	const char *startPtr = name;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index d5e2892526..d0ce48869f 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -304,4 +304,17 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
>  	git -C dirty sparse-checkout disable
>  '
>  
> +test_expect_success 'cone mode: set with core.ignoreCase=true' '
> +	test_when_finished git -C repo config --unset core.ignoreCase &&
> +	git -C repo sparse-checkout init --cone &&
> +	git -C repo config core.ignoreCase true &&
> +	git -C repo sparse-checkout set Folder1 &&
> +	cat >expect <<-EOF &&
> +		/*
> +		!/*/
> +		/folder1/
> +	EOF
> +	test_cmp expect repo/.git/info/sparse-checkout
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-11 18:44   ` Junio C Hamano
@ 2019-12-11 19:11     ` Derrick Stolee
  2019-12-11 20:00       ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Derrick Stolee @ 2019-12-11 19:11 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, szeder.dev, newren, Derrick Stolee

On 12/11/2019 1:44 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Another option would be to do case-insensitive checks while
>> updating the skip-worktree bits during unpack_trees(). Outside of
>> the potential performance loss on a more expensive code path, that
>> also breaks compatibility with older versions of Git as using the
>> same sparse-checkout file would change the paths that are included.
> 
> I haven't thought things through, but that sounds as if saying that
> we cannot fix old bugs.  We use the entries in the index as a hint
> for "correcting" a path to the right case on a case-insensitive
> filesystem to avoid corrupting the case in the index, which is case
> sensitive and is a way to convey the intended case (by writing them
> out to a tree object) to those who use systems that can reliably
> reproduce cases in pathnames.  But that still does not mean on a
> case-insensitive filesystem, "hello.c" in the "right" case recorded
> in the index will always be spelled like so---somebody can make
> readdir() or equivalent to yield "Hello.c" for it, and if the user
> tells us to say they want to do X (e.g. ignore, skip checkout, etc.)
> to "hello.c", we should do the same to "Hello.c" on such a system, I
> would think.
> 
> If the runtime needs to deal with the case insensitive filesystems
> anyway (and I suspect it does), there isn't much point matching and
> forcing the case to the paths in the index like this patch does, I
> think.  So...

I'm trying to find a way around these two ideas:

1. The index is case-sensitive, and the sparse-checkout patterns are
   case-sensitive.

2. If a filesystem is case-insensitive, most index-change operations
   already succeed with incorrect case, especially with core.ignoreCase
   enabled.

The approach I have is to allow a user to provide a case that does not
match the index, and then we store the pattern in the sparse-checkout
that matches the case in the index.

Another approach would be to make the sparse-checkout patterns be
case-insensitive when core.ignoreCase is enabled. I chose against
this due to the compatibility and the performance cost. We would
likely pay that performance penalty even if all the patterns have
the correct case. It violates the existing "contract" with the
sparse-checkout file, and that is probably what you are talking about
with "we cannot fix old bugs".

It sounds like you are preferring this second option, despite the
performance costs. It is possible to use a case-insensitive hashing
algorithm when in the cone mode, but I'm not sure about how to do
a similar concept for the normal mode.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-11 19:11     ` Derrick Stolee
@ 2019-12-11 20:00       ` Junio C Hamano
  2019-12-11 20:29         ` Derrick Stolee
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-12-11 20:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> I'm trying to find a way around these two ideas:
>
> 1. The index is case-sensitive, and the sparse-checkout patterns are
>    case-sensitive.

OK.  The latter is local to the repository and not shared to the
world where people with case sensitive systems would live, right?

> 2. If a filesystem is case-insensitive, most index-change operations
>    already succeed with incorrect case, especially with core.ignoreCase
>    enabled.

I am not sure about "incorrect", though.  

My understanding is that modern case-insensitive systems are not
information-destroying [*1*].  A new file you create as "ReadMe.txt"
on your filesystem would be seen in that mixed case spelling via
readdir() or equivalent, so adding it to the index as-is would
likely be in the "correct" case, no?  If, after adding that path to
the index, you did "rm ReadMe.txt" and created "README.TXT", surely
we won't have both ReadMe.txt and README.TXT in the index with
ignoreCase set, and keep using ReadMe.txt that matches what you
added to the index.  I am not sure which one is the "incorrect" case
in such a scenario.

> The approach I have is to allow a user to provide a case that does not
> match the index, and then we store the pattern in the sparse-checkout
> that matches the case in the index.

Yes, I understood that from your proposed log message and cover
letter.  They were very clearly written.

But imagine that your user writes ABC in the sparse pattern file,
and there is no abc anything in the index in any case permutations.

When you check out a branch that has Abc, shouldn't the pattern ABC
affect the operation just like a pattern Abc would on a case
insensitive system?

Or are end users perfectly aware that the thing in that branch is
spelled "Abc" and not "ABC" (the data in Git does---it comes from a
tree object that is case sensitive)?  If so, then the pattern ABC
should not affect the subtree whose name is "Abc" even on a case
insensitive system.

I am not sure what the design of this part of the system expects out
of the end user.  Perhaps keeping the patterns case insensitive and
tell the end users to spell them correctly is the right way to go, I
guess, if it is just the filesystem that cannot represente the cases
correctly at times and the users are perfectly capable of telling
the right and wrong cases apart.

But then I am not sure why correcting misspelled names by comparing
them with the index entries is a good idea, either.

> It sounds like you are preferring this second option, despite the
> performance costs. It is possible to use a case-insensitive hashing
> algorithm when in the cone mode, but I'm not sure about how to do
> a similar concept for the normal mode.

I have no strong preference, other than that I prefer to see things
being done consistently.  Don't we already use case folding hash
function to ensure that we won't add duplicate to the index on
case-insensitive system?  I suspect that we would need to do the
same, whether in cone mode or just a normal sparse-checkout mode
consistently.

Thanks.


[Footnote]

*1* ... unlike HFS+ where everything is forced to NKD and a bit of
information---whether the original was in NKC or NKD---is discarded
forever.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-11 20:00       ` Junio C Hamano
@ 2019-12-11 20:29         ` Derrick Stolee
  2019-12-11 21:37           ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Derrick Stolee @ 2019-12-11 20:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

On 12/11/2019 3:00 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> I'm trying to find a way around these two ideas:
>>
>> 1. The index is case-sensitive, and the sparse-checkout patterns are
>>    case-sensitive.
> 
> OK.  The latter is local to the repository and not shared to the
> world where people with case sensitive systems would live, right?

Yes, this information is local to a specific local copy. Even worktrees
have distinct sparse-checkout files.

>> 2. If a filesystem is case-insensitive, most index-change operations
>>    already succeed with incorrect case, especially with core.ignoreCase
>>    enabled.
> 
> I am not sure about "incorrect", though.  
> 
> My understanding is that modern case-insensitive systems are not
> information-destroying [*1*].  A new file you create as "ReadMe.txt"
> on your filesystem would be seen in that mixed case spelling via
> readdir() or equivalent, so adding it to the index as-is would
> likely be in the "correct" case, no?  If, after adding that path to
> the index, you did "rm ReadMe.txt" and created "README.TXT", surely
> we won't have both ReadMe.txt and README.TXT in the index with
> ignoreCase set, and keep using ReadMe.txt that matches what you
> added to the index.  I am not sure which one is the "incorrect" case
> in such a scenario.

The specific example I have in my mind is that the filesystem can have
"README.TXT" as what it would report by readdir(), but a user can type

	git add readme.txt

Without core.ignoreCase, the index will store the path as "readme.txt",
possibly adding a new entry in the index next to "README.TXT". If
core.ignoreCase is enabled, then the case reported by readdir() is used.
(This works both for a tracked and untracked path.)

>> The approach I have is to allow a user to provide a case that does not
>> match the index, and then we store the pattern in the sparse-checkout
>> that matches the case in the index.
> 
> Yes, I understood that from your proposed log message and cover
> letter.  They were very clearly written.
> 
> But imagine that your user writes ABC in the sparse pattern file,
> and there is no abc anything in the index in any case permutations.
> 
> When you check out a branch that has Abc, shouldn't the pattern ABC
> affect the operation just like a pattern Abc would on a case
> insensitive system?
> 
> Or are end users perfectly aware that the thing in that branch is
> spelled "Abc" and not "ABC" (the data in Git does---it comes from a
> tree object that is case sensitive)?  If so, then the pattern ABC
> should not affect the subtree whose name is "Abc" even on a case
> insensitive system.

This is a case that is difficult to control for. We have no source
of truth (filesystem or index) at the time of the 'git sparse-checkout
set' command. I cannot think of a solution to this specific issue
without doing the more-costly approach on every unpack_trees() call.

I believe this case may be narrow enough to "document and don't-fix",
but please speak up if anyone thinks this _must_ be fixed.

The other thing I can say is that my current approach _could_ be
replaced in the future by the more invasive check in unpack_trees().
The behavior change would be invisible to users who don't inspect
their sparse-checkout files other than this narrow case.

Specifically, our internal customer is planning to update the
sparse-checkout cone based on files in the workdir, which means that
the paths are expected to be in the index at the time of the 'set'
call.

> I am not sure what the design of this part of the system expects out
> of the end user.  Perhaps keeping the patterns case insensitive and
> tell the end users to spell them correctly is the right way to go, I
> guess, if it is just the filesystem that cannot represente the cases
> correctly at times and the users are perfectly capable of telling
> the right and wrong cases apart.

My first reaction to this internal request was "just clean up your
data." The issue is keeping it clean as users are changing the data
often and the only current checks are whether the build passes (and
the build "sees" the files because the filesystem accepts the wrong
case).

The "git add" behavior made me think there was sufficient precedent
in Git to try this change.

I'm happy to follow the community's opinion in this matter, including
a hard "we will not correct for case in this feature" or "if you want
this then you'll pay for it in performance." In the latter case I'd
prefer a new config option to toggle the sparse-checkout case
insensitivity. That way users could have core.ignoreCase behavior without
the perf hit if they use clean input to the sparse-checkout feature.

> But then I am not sure why correcting misspelled names by comparing
> them with the index entries is a good idea, either.

Right, that seems like a line too far.

>> It sounds like you are preferring this second option, despite the
>> performance costs. It is possible to use a case-insensitive hashing
>> algorithm when in the cone mode, but I'm not sure about how to do
>> a similar concept for the normal mode.
> 
> I have no strong preference, other than that I prefer to see things
> being done consistently.  Don't we already use case folding hash
> function to ensure that we won't add duplicate to the index on
> case-insensitive system?  I suspect that we would need to do the
> same, whether in cone mode or just a normal sparse-checkout mode
> consistently.

Since the cone mode uses a hashset to match the paths to the patterns,
that conversion is possible. I'm not sure how to start making arbitrary
regexes be case-insensitive in the non-cone-mode case. Suggestions?

Thank you for the discussion. I was hoping to get feedback on this
approach, which is why this patch is isolated to only this issue.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-11 20:29         ` Derrick Stolee
@ 2019-12-11 21:37           ` Junio C Hamano
  2019-12-12  2:51             ` Derrick Stolee
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-12-11 21:37 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> The specific example I have in my mind is that the filesystem can have
> "README.TXT" as what it would report by readdir(), but a user can type
>
> 	git add readme.txt
>
> Without core.ignoreCase, the index will store the path as "readme.txt",
> possibly adding a new entry in the index next to "README.TXT". If
> core.ignoreCase is enabled, then the case reported by readdir() is used.
> (This works both for a tracked and untracked path.)

Yes, but which one is "correct"?  readme.txt read from the index or
README.TXT read from the filesystem?  Presumably, when the paths got
first checked out of the index, it would have been in "readme.txt"
on the filesystem, so the user must have done on purpose something
to cause the file to be named in all uppercase since then.

> This is a case that is difficult to control for. We have no source
> of truth (filesystem or index) at the time of the 'git sparse-checkout
> set' command. I cannot think of a solution to this specific issue
> without doing the more-costly approach on every unpack_trees() call.
>
> I believe this case may be narrow enough to "document and don't-fix",
> but please speak up if anyone thinks this _must_ be fixed.

I do not think it "must be" fixed in the sense that "if it hurts,
don't do so" is a sufficient remedy.  But then for exactly the same
reason, I do not think it is worth doing what you do in this patch.

On the other hand, I think runtime case folding, just like what we
do when "git add" is told to handle a path in different case, would
be the only "right" way to match end-user expectations on a case
insensitive system, even though that is a "nice to do so" and
certainly not a "must do so" thing.

> The "git add" behavior made me think there was sufficient precedent
> in Git to try this change.

Since I view that the "git add" behaviour is a "nice to do so"
runtime conversion, I would actually think the approach you
discarded would be more in line with the precedent.

> I'm happy to follow the community's opinion in this matter, including
> a hard "we will not correct for case in this feature" or "if you want
> this then you'll pay for it in performance." In the latter case I'd
> prefer a new config option to toggle the sparse-checkout case
> insensitivity. That way users could have core.ignoreCase behavior without
> the perf hit if they use clean input to the sparse-checkout feature.

I value correctness more---a system that does incorrect thing
quickly is not all that interesting.

Assuming that your users are sensible and feed good data, how much
"performance hit" are we talking about?  Wouldn't this be something
we can make into a "if you have the paths in the correct case, we'll
see the match just as quickly as in the case sensitive system, so
most of the time there is no penalty, but for correctness we would
also fall back to try different cases"?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-11 21:37           ` Junio C Hamano
@ 2019-12-12  2:51             ` Derrick Stolee
  0 siblings, 0 replies; 16+ messages in thread
From: Derrick Stolee @ 2019-12-12  2:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

On 12/11/2019 4:37 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> The specific example I have in my mind is that the filesystem can have
>> "README.TXT" as what it would report by readdir(), but a user can type
>>
>> 	git add readme.txt
>>
>> Without core.ignoreCase, the index will store the path as "readme.txt",
>> possibly adding a new entry in the index next to "README.TXT". If
>> core.ignoreCase is enabled, then the case reported by readdir() is used.
>> (This works both for a tracked and untracked path.)
> 
> Yes, but which one is "correct"?  readme.txt read from the index or
> README.TXT read from the filesystem?  Presumably, when the paths got
> first checked out of the index, it would have been in "readme.txt"
> on the filesystem, so the user must have done on purpose something
> to cause the file to be named in all uppercase since then.

Ah. The issue here is that with core.ignoreCase=true, the index and
filesystem agree. It's just the user input that differs.

>> This is a case that is difficult to control for. We have no source
>> of truth (filesystem or index) at the time of the 'git sparse-checkout
>> set' command. I cannot think of a solution to this specific issue
>> without doing the more-costly approach on every unpack_trees() call.
>>
>> I believe this case may be narrow enough to "document and don't-fix",
>> but please speak up if anyone thinks this _must_ be fixed.
> 
> I do not think it "must be" fixed in the sense that "if it hurts,
> don't do so" is a sufficient remedy.  But then for exactly the same
> reason, I do not think it is worth doing what you do in this patch.
> 
> On the other hand, I think runtime case folding, just like what we
> do when "git add" is told to handle a path in different case, would
> be the only "right" way to match end-user expectations on a case
> insensitive system, even though that is a "nice to do so" and
> certainly not a "must do so" thing.
> 
>> The "git add" behavior made me think there was sufficient precedent
>> in Git to try this change.
> 
> Since I view that the "git add" behaviour is a "nice to do so"
> runtime conversion, I would actually think the approach you
> discarded would be more in line with the precedent.
> 
>> I'm happy to follow the community's opinion in this matter, including
>> a hard "we will not correct for case in this feature" or "if you want
>> this then you'll pay for it in performance." In the latter case I'd
>> prefer a new config option to toggle the sparse-checkout case
>> insensitivity. That way users could have core.ignoreCase behavior without
>> the perf hit if they use clean input to the sparse-checkout feature.
> 
> I value correctness more---a system that does incorrect thing
> quickly is not all that interesting.
> 
> Assuming that your users are sensible and feed good data, how much
> "performance hit" are we talking about?

I'll need to build a prototype to test the performance hit. Maybe
I'm overestimating the effect of using a case-insensitive hash.

> Wouldn't this be something
> we can make into a "if you have the paths in the correct case, we'll
> see the match just as quickly as in the case sensitive system, so
> most of the time there is no penalty, but for correctness we would
> also fall back to try different cases"?

I think we would want the config option present to say "my data may
not be in the proper case, please do extra work for me". I can't
think of a way to do the fallback in real-time.

I'll try again with the case-insensitive hash algorithm and see
how that shakes out.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
  2019-12-11 18:44   ` Junio C Hamano
@ 2019-12-12 20:59   ` Derrick Stolee
  2019-12-13 19:05     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Derrick Stolee @ 2019-12-12 20:59 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: szeder.dev, newren, Derrick Stolee, Junio C Hamano

On 12/9/2019 2:43 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> When a user uses the sparse-checkout feature in cone mode, they
> add patterns using "git sparse-checkout set <dir1> <dir2> ..."
> or by using "--stdin" to provide the directories line-by-line over
> stdin. This behaviour naturally looks a lot like the way a user
> would type "git add <dir1> <dir2> ..."

Junio:

While I was updating my GGG PR [1] with new logic, I see that you
added this commit to ds/sparse-cone. I didn't intend to these
to be in the same topic, and I was hoping this discussion doesn't
delay the other commits in ds/sparse-cone from merging.

If you have a plan for that branch and the merge status of those
commits, then I'm happy to re-target my PR against 'next' or
an equivalent branch.

Thanks,
-Stolee

[1] https://github.com/gitgitgadget/git/pull/488


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 0/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
  2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
@ 2019-12-13 18:09 ` Derrick Stolee via GitGitGadget
  2019-12-13 18:09   ` [PATCH v2 1/1] " Derrick Stolee via GitGitGadget
  2019-12-13 19:58   ` [PATCH v2 0/1] " Junio C Hamano
  2019-12-18 11:41 ` [PATCH " Ed Maste
  2 siblings, 2 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-12-13 18:09 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, newren, Derrick Stolee, Junio C Hamano

This is the first of several cleanups to the sparse-checkout feature that
had a large overhaul in ds/sparse-cone.

We have an internal customer that plans to use sparse-checkout as a core
feature of their version control experience. They use a case-insensitive
filesystem, so as they created their directory dependency graph they did not
keep consistent casing. This data is what they plan to feed to "git
sparse-checkout set".

While I would certainly prefer that the data is cleaned up (and that process
is ongoing), I could not argue with the logic that git add does the "right
thing" when core.ignoreCase is enabled.

Update in V2:

Based on excellent feedback from Junio, we decided to shore up the case
issues from the previous implementation. Now, the sparse-checkout file may
be stored with the "wrong" case but the hash algorithm is replaced with a
case-insensitive check when core.ignoreCase is set. Because the performance
impact is much lower than I anticipated, I no longer think it is important
to add a new config option to enable/disable this logic.

Thanks, -Stolee

Derrick Stolee (1):
  sparse-checkout: respect core.ignoreCase in cone mode

 Documentation/git-sparse-checkout.txt |  5 +++++
 builtin/sparse-checkout.c             | 10 ++++++++--
 dir.c                                 | 15 ++++++++++++---
 t/t1091-sparse-checkout-builtin.sh    | 17 +++++++++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)


base-commit: cff4e9138d8df45e3b6199171092ee781cdadaeb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-488%2Fderrickstolee%2Fsparse-case-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-488/derrickstolee/sparse-case-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/488

Range-diff vs v1:

 1:  23705845ce ! 1:  dc97d5a0d2 sparse-checkout: respect core.ignoreCase in cone mode
     @@ -12,23 +12,42 @@
          using a case-insensitive match. Do the same for the sparse-checkout
          feature.
      
     -    The sanitize_cone_input() method is named to imply that other checks
     -    may be added. In fact, such checks are planned including looking for
     -    wildcards that make the paths invalid cone patterns or must be
     -    escaped.
     -
     -    Specifically, if the path has a match in the index, then use that
     -    path instead. If there is no match, still add that path to the
     -    patterns, as the user may expect the directory to appear after a
     -    checkout to another ref. However, we have no matching path to
     -    correct for a case conflict, and must assume that the user provided
     -    the correct case.
     -
     -    Another option would be to do case-insensitive checks while
     -    updating the skip-worktree bits during unpack_trees(). Outside of
     -    the potential performance loss on a more expensive code path, that
     -    also breaks compatibility with older versions of Git as using the
     -    same sparse-checkout file would change the paths that are included.
     +    Perform case-insensitive checks while updating the skip-worktree
     +    bits during unpack_trees(). This is done by changing the hash
     +    algorithm and hashmap comparison methods to optionally use case-
     +    insensitive methods.
     +
     +    When this is enabled, there is a small performance cost in the
     +    hashing algorithm. To tease out the worst possible case, the
     +    following was run on a repo with a deep directory structure:
     +
     +            git ls-tree -d -r --name-only HEAD |
     +                    git sparse-checkout set --stdin
     +
     +    The 'set' command was timed with core.ignoreCase disabled or
     +    enabled. For the repo with a deep history, the numbers were
     +
     +            core.ignoreCase=false: 62s
     +            core.ignoreCase=true:  74s (+19.3%)
     +
     +    For reproducibility, the equivalent test on the Linux kernel
     +    repository had these numbers:
     +
     +            core.ignoreCase=false: 3.1s
     +            core.ignoreCase=true:  3.6s (+16%)
     +
     +    Now, this is not an entirely fair comparison, as most users
     +    will define their sparse cone using more shallow directories,
     +    and the performance improvement from eb42feca97 ("unpack-trees:
     +    hash less in cone mode" 2019-11-21) can remove most of the
     +    hash cost. For a more realistic test, drop the "-r" from the
     +    ls-tree command to store only the first-level directories.
     +    In that case, the Linux kernel repository takes 0.2-0.25s in
     +    each case, and the deep repository takes one second, plus or
     +    minus 0.05s, in each case.
     +
     +    Thus, we _can_ demonstrate a cost to this change, but it is
     +    unlikely to matter to any reasonable sparse-checkout cone.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -39,9 +58,10 @@
       If the patterns do match the expected format, then Git will use faster hash-
       based algorithms to compute inclusion in the sparse-checkout.
       
     -+If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
     -+correct for incorrect case when assigning patterns to the sparse-checkout
     -+file.
     ++If `core.ignoreCase=true`, then the pattern-matching algorithm will use a
     ++case-insensitive check. This corrects for case mismatched filenames in the
     ++'git sparse-checkout set' command to reflect the expected cone in the working
     ++directory.
      +
       SEE ALSO
       --------
     @@ -51,71 +71,76 @@
       --- a/builtin/sparse-checkout.c
       +++ b/builtin/sparse-checkout.c
      @@
     - 	}
     - }
     + 	struct pattern_entry *e = xmalloc(sizeof(*e));
     + 	e->patternlen = path->len;
     + 	e->pattern = strbuf_detach(path, NULL);
     +-	hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
     ++	hashmap_entry_init(&e->ent,
     ++			   ignore_case ?
     ++			   strihash(e->pattern) :
     ++			   strhash(e->pattern));
     + 
     + 	hashmap_add(&pl->recursive_hashmap, &e->ent);
       
     -+static void sanitize_cone_input(struct strbuf *line)
     -+{
     -+	if (ignore_case) {
     -+		struct index_state *istate = the_repository->index;
     -+		const char *name = index_dir_matching_name(istate, line->buf, line->len);
     -+
     -+		if (name) {
     -+			strbuf_setlen(line, 0);
     -+			strbuf_addstr(line, name);
     -+		}
     -+	}
     -+
     -+	if (line->buf[0] != '/')
     -+		strbuf_insert(line, 0, "/", 1);
     -+}
     -+
     - static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
     - {
     - 	strbuf_trim(line);
      @@
     - 	if (!line->len)
     - 		return;
     + 		e = xmalloc(sizeof(struct pattern_entry));
     + 		e->patternlen = newlen;
     + 		e->pattern = xstrndup(oldpattern, newlen);
     +-		hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
     ++		hashmap_entry_init(&e->ent,
     ++				   ignore_case ?
     ++				   strihash(e->pattern) :
     ++				   strhash(e->pattern));
       
     --	if (line->buf[0] != '/')
     --		strbuf_insert(line, 0, "/", 1);
     -+	sanitize_cone_input(line);
     + 		if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
     + 			hashmap_add(&pl->parent_hashmap, &e->ent);
     +
     + diff --git a/dir.c b/dir.c
     + --- a/dir.c
     + +++ b/dir.c
     +@@
     + 			 ? ee1->patternlen
     + 			 : ee2->patternlen;
       
     - 	insert_recursive_pattern(pl, line);
     ++	if (ignore_case)
     ++		return strncasecmp(ee1->pattern, ee2->pattern, min_len);
     + 	return strncmp(ee1->pattern, ee2->pattern, min_len);
       }
     -
     - diff --git a/cache.h b/cache.h
     - --- a/cache.h
     - +++ b/cache.h
     + 
      @@
     - int verify_path(const char *path, unsigned mode);
     - int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
     - int index_dir_exists(struct index_state *istate, const char *name, int namelen);
     -+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
     - void adjust_dirname_case(struct index_state *istate, char *name);
     - struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
     + 		translated->pattern = truncated;
     + 		translated->patternlen = given->patternlen - 2;
     + 		hashmap_entry_init(&translated->ent,
     +-				   memhash(translated->pattern, translated->patternlen));
     ++				   ignore_case ?
     ++				   strihash(translated->pattern) :
     ++				   strhash(translated->pattern));
       
     -
     - diff --git a/name-hash.c b/name-hash.c
     - --- a/name-hash.c
     - +++ b/name-hash.c
     + 		if (!hashmap_get_entry(&pl->recursive_hashmap,
     + 				       translated, ent, NULL)) {
      @@
     - 	return dir && dir->nr;
     + 	translated->pattern = xstrdup(given->pattern);
     + 	translated->patternlen = given->patternlen;
     + 	hashmap_entry_init(&translated->ent,
     +-			   memhash(translated->pattern, translated->patternlen));
     ++			   ignore_case ?
     ++			   strihash(translated->pattern) :
     ++			   strhash(translated->pattern));
     + 
     + 	hashmap_add(&pl->recursive_hashmap, &translated->ent);
     + 
     +@@
     + 	/* Check straight mapping */
     + 	p.pattern = pattern->buf;
     + 	p.patternlen = pattern->len;
     +-	hashmap_entry_init(&p.ent, memhash(p.pattern, p.patternlen));
     ++	hashmap_entry_init(&p.ent,
     ++			   ignore_case ?
     ++			   strihash(p.pattern) :
     ++			   strhash(p.pattern));
     + 	return !!hashmap_get_entry(map, &p, ent, NULL);
       }
       
     -+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
     -+{
     -+	struct dir_entry *dir;
     -+
     -+	lazy_init_name_hash(istate);
     -+	dir = find_dir_entry(istate, name, namelen);
     -+
     -+	return dir ? dir->name : NULL;
     -+}
     -+
     - void adjust_dirname_case(struct index_state *istate, char *name)
     - {
     - 	const char *startPtr = name;
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
     @@ -125,16 +150,20 @@
       '
       
      +test_expect_success 'cone mode: set with core.ignoreCase=true' '
     -+	test_when_finished git -C repo config --unset core.ignoreCase &&
      +	git -C repo sparse-checkout init --cone &&
     -+	git -C repo config core.ignoreCase true &&
     -+	git -C repo sparse-checkout set Folder1 &&
     ++	git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
      +	cat >expect <<-EOF &&
      +		/*
      +		!/*/
      +		/folder1/
      +	EOF
     -+	test_cmp expect repo/.git/info/sparse-checkout
     ++	test_cmp expect repo/.git/info/sparse-checkout &&
     ++	ls repo >dir &&
     ++	cat >expect <<-EOF &&
     ++		a
     ++		folder1
     ++	EOF
     ++	test_cmp expect dir
      +'
      +
       test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-13 18:09 ` [PATCH v2 0/1] " Derrick Stolee via GitGitGadget
@ 2019-12-13 18:09   ` Derrick Stolee via GitGitGadget
  2019-12-13 19:58   ` [PATCH v2 0/1] " Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-12-13 18:09 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, newren, Derrick Stolee, Junio C Hamano,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.

Perform case-insensitive checks while updating the skip-worktree
bits during unpack_trees(). This is done by changing the hash
algorithm and hashmap comparison methods to optionally use case-
insensitive methods.

When this is enabled, there is a small performance cost in the
hashing algorithm. To tease out the worst possible case, the
following was run on a repo with a deep directory structure:

	git ls-tree -d -r --name-only HEAD |
		git sparse-checkout set --stdin

The 'set' command was timed with core.ignoreCase disabled or
enabled. For the repo with a deep history, the numbers were

	core.ignoreCase=false: 62s
	core.ignoreCase=true:  74s (+19.3%)

For reproducibility, the equivalent test on the Linux kernel
repository had these numbers:

	core.ignoreCase=false: 3.1s
	core.ignoreCase=true:  3.6s (+16%)

Now, this is not an entirely fair comparison, as most users
will define their sparse cone using more shallow directories,
and the performance improvement from eb42feca97 ("unpack-trees:
hash less in cone mode" 2019-11-21) can remove most of the
hash cost. For a more realistic test, drop the "-r" from the
ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in
each case, and the deep repository takes one second, plus or
minus 0.05s, in each case.

Thus, we _can_ demonstrate a cost to this change, but it is
unlikely to matter to any reasonable sparse-checkout cone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |  5 +++++
 builtin/sparse-checkout.c             | 10 ++++++++--
 dir.c                                 | 15 ++++++++++++---
 t/t1091-sparse-checkout-builtin.sh    | 17 +++++++++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index b975285673..9c3c66cc37 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -150,6 +150,11 @@ expecting patterns of these types. Git will warn if the patterns do not match.
 If the patterns do match the expected format, then Git will use faster hash-
 based algorithms to compute inclusion in the sparse-checkout.
 
+If `core.ignoreCase=true`, then the pattern-matching algorithm will use a
+case-insensitive check. This corrects for case mismatched filenames in the
+'git sparse-checkout set' command to reflect the expected cone in the working
+directory.
+
 SEE ALSO
 --------
 
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index a542d617a5..5d62f7a66d 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -313,7 +313,10 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 	struct pattern_entry *e = xmalloc(sizeof(*e));
 	e->patternlen = path->len;
 	e->pattern = strbuf_detach(path, NULL);
-	hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
+	hashmap_entry_init(&e->ent,
+			   ignore_case ?
+			   strihash(e->pattern) :
+			   strhash(e->pattern));
 
 	hashmap_add(&pl->recursive_hashmap, &e->ent);
 
@@ -329,7 +332,10 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 		e = xmalloc(sizeof(struct pattern_entry));
 		e->patternlen = newlen;
 		e->pattern = xstrndup(oldpattern, newlen);
-		hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
+		hashmap_entry_init(&e->ent,
+				   ignore_case ?
+				   strihash(e->pattern) :
+				   strhash(e->pattern));
 
 		if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
 			hashmap_add(&pl->parent_hashmap, &e->ent);
diff --git a/dir.c b/dir.c
index 2ef92a50a0..22d08e61c2 100644
--- a/dir.c
+++ b/dir.c
@@ -625,6 +625,8 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 			 ? ee1->patternlen
 			 : ee2->patternlen;
 
+	if (ignore_case)
+		return strncasecmp(ee1->pattern, ee2->pattern, min_len);
 	return strncmp(ee1->pattern, ee2->pattern, min_len);
 }
 
@@ -665,7 +667,9 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 		translated->pattern = truncated;
 		translated->patternlen = given->patternlen - 2;
 		hashmap_entry_init(&translated->ent,
-				   memhash(translated->pattern, translated->patternlen));
+				   ignore_case ?
+				   strihash(translated->pattern) :
+				   strhash(translated->pattern));
 
 		if (!hashmap_get_entry(&pl->recursive_hashmap,
 				       translated, ent, NULL)) {
@@ -694,7 +698,9 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 	translated->pattern = xstrdup(given->pattern);
 	translated->patternlen = given->patternlen;
 	hashmap_entry_init(&translated->ent,
-			   memhash(translated->pattern, translated->patternlen));
+			   ignore_case ?
+			   strihash(translated->pattern) :
+			   strhash(translated->pattern));
 
 	hashmap_add(&pl->recursive_hashmap, &translated->ent);
 
@@ -724,7 +730,10 @@ static int hashmap_contains_path(struct hashmap *map,
 	/* Check straight mapping */
 	p.pattern = pattern->buf;
 	p.patternlen = pattern->len;
-	hashmap_entry_init(&p.ent, memhash(p.pattern, p.patternlen));
+	hashmap_entry_init(&p.ent,
+			   ignore_case ?
+			   strihash(p.pattern) :
+			   strhash(p.pattern));
 	return !!hashmap_get_entry(map, &p, ent, NULL);
 }
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index d5e2892526..cee98a1c8a 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -304,4 +304,21 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
 	git -C dirty sparse-checkout disable
 '
 
+test_expect_success 'cone mode: set with core.ignoreCase=true' '
+	git -C repo sparse-checkout init --cone &&
+	git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/
+		/folder1/
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout &&
+	ls repo >dir &&
+	cat >expect <<-EOF &&
+		a
+		folder1
+	EOF
+	test_cmp expect dir
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-12 20:59   ` Derrick Stolee
@ 2019-12-13 19:05     ` Junio C Hamano
  2019-12-13 19:40       ` Derrick Stolee
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-12-13 19:05 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> If you have a plan for that branch and the merge status of those
> commits, then I'm happy to re-target my PR against 'next' or
> an equivalent branch.

This change does not make much sense without the cone-mode topic,
no?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-13 19:05     ` Junio C Hamano
@ 2019-12-13 19:40       ` Derrick Stolee
  2019-12-13 22:02         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Derrick Stolee @ 2019-12-13 19:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

On 12/13/2019 2:05 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> If you have a plan for that branch and the merge status of those
>> commits, then I'm happy to re-target my PR against 'next' or
>> an equivalent branch.
> 
> This change does not make much sense without the cone-mode topic,
> no?

I'm saying it should go on top, as a new branch that depends on
ds/sparse-cone. Sorry for not making that clear.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-13 18:09 ` [PATCH v2 0/1] " Derrick Stolee via GitGitGadget
  2019-12-13 18:09   ` [PATCH v2 1/1] " Derrick Stolee via GitGitGadget
@ 2019-12-13 19:58   ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2019-12-13 19:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, szeder.dev, newren, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> case-insensitive check when core.ignoreCase is set. Because the performance
> impact is much lower than I anticipated, I no longer think it is important
> to add a new config option to enable/disable this logic.

Nice.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-13 19:40       ` Derrick Stolee
@ 2019-12-13 22:02         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2019-12-13 22:02 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, szeder.dev, newren,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 12/13/2019 2:05 PM, Junio C Hamano wrote:
>> Derrick Stolee <stolee@gmail.com> writes:
>> 
>>> If you have a plan for that branch and the merge status of those
>>> commits, then I'm happy to re-target my PR against 'next' or
>>> an equivalent branch.
>> 
>> This change does not make much sense without the cone-mode topic,
>> no?
>
> I'm saying it should go on top, as a new branch that depends on
> ds/sparse-cone. Sorry for not making that clear.

I thought that it was a *bug* that ds/sparse-cone topic does not
match the pattern case insensitively on case insensitive systems,
and from that point of view, letting the earlier part of the topic
graduate without the fix would not make sense, I thought.

Thanks.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode
  2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
  2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
  2019-12-13 18:09 ` [PATCH v2 0/1] " Derrick Stolee via GitGitGadget
@ 2019-12-18 11:41 ` Ed Maste
  2 siblings, 0 replies; 16+ messages in thread
From: Ed Maste @ 2019-12-18 11:41 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git

On Mon, 9 Dec 2019 at 14:43, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first of several cleanups to the sparse-checkout feature that
> had a large overhaul in ds/sparse-cone.

After this landed in next t1091-sparse-checkout-builtin.sh failed in
my Cirrus-CI run on FreeBSD -
https://cirrus-ci.com/task/5085664047005696

The first failure is:

not ok 4 - git sparse-checkout init
#
# git -C repo sparse-checkout init &&
# cat >expect <<-EOF &&
# /*
# !/*/
# EOF
# test_cmp expect repo/.git/info/sparse-checkout &&
# test_cmp_config -C repo true core.sparsecheckout &&
# ls repo >dir &&
# echo a >expect &&
# test_cmp expect dir
#

I'm trying to reproduce locally and investigate.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-12-18 19:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2019-12-11 18:44   ` Junio C Hamano
2019-12-11 19:11     ` Derrick Stolee
2019-12-11 20:00       ` Junio C Hamano
2019-12-11 20:29         ` Derrick Stolee
2019-12-11 21:37           ` Junio C Hamano
2019-12-12  2:51             ` Derrick Stolee
2019-12-12 20:59   ` Derrick Stolee
2019-12-13 19:05     ` Junio C Hamano
2019-12-13 19:40       ` Derrick Stolee
2019-12-13 22:02         ` Junio C Hamano
2019-12-13 18:09 ` [PATCH v2 0/1] " Derrick Stolee via GitGitGadget
2019-12-13 18:09   ` [PATCH v2 1/1] " Derrick Stolee via GitGitGadget
2019-12-13 19:58   ` [PATCH v2 0/1] " Junio C Hamano
2019-12-18 11:41 ` [PATCH " Ed Maste

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).