git@vger.kernel.org mailing list mirror (one of many)
* Re: BUG FOLLOWUP: Case insensitivity in worktrees
       [not found] <EEA65ED1-2BE0-41AD-84CC-780A9F4D9215@strongestfamilies.com>
@ 2020-07-23 15:20 ` Casey Meijer
  2020-07-24  1:19   ` brian m. carlson
  0 siblings, 1 reply; 8+ messages in thread
From: Casey Meijer @ 2020-07-23 15:20 UTC
  To: git@vger.kernel.org

This just bit me; it seems quite old, and I wanted to propose an alternative solution (maybe it doesn’t work for some reason I’m unaware of):
https://marc.info/?l=git&m=154473525401677&w=2
 
Why not just preserve the existing semantics of the main worktree by checking the worktree refs first unconditionally and only fall back to the main refs when the ref doesn’t exist locally in the worktree?
 
This would have the added benefit of allowing power users to override refs in their worktrees and would, if I’m not mistaken, preserve the semantics of the main worktree in case-insensitive and case-sensitive filesystems. 
 
Anywho, just a thought.  I could work on a patch if this approach makes sense, at least as an intermediary until there’s a pluggable storage backend for non-FS stores 😉 (I'd also be somewhat interested in implementing a Postgres/SQL storage backend if this project is moving forward).
 
 
Best,
 
Casey Meijer



* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-23 15:20 ` BUG FOLLOWUP: Case insensitivity in worktrees Casey Meijer
@ 2020-07-24  1:19   ` brian m. carlson
  2020-07-24  1:25     ` Junio C Hamano
  2020-07-24 18:14     ` Casey Meijer
  0 siblings, 2 replies; 8+ messages in thread
From: brian m. carlson @ 2020-07-24  1:19 UTC
  To: Casey Meijer; +Cc: git@vger.kernel.org


On 2020-07-23 at 15:20:50, Casey Meijer wrote:
> This just bit me; it seems quite old, and I wanted to propose an alternative solution (maybe it doesn’t work for some reason I’m unaware of):
> https://marc.info/?l=git&m=154473525401677&w=2
>  
> Why not just preserve the existing semantics of the main worktree by checking the worktree refs first unconditionally and only fall back to the main refs when the ref doesn’t exist locally in the worktree?
>  
> This would have the added benefit of allowing power users to override refs in their worktrees and would, if I’m not mistaken, preserve the semantics of the main worktree in case-insensitive and case-sensitive filesystems.

It isn't clear to me exactly what you're suggesting.  Are you suggesting
that we allow "head" instead of "HEAD" in worktrees, or that we allow
refs in general to be case insensitive, or something else?

> Anywho, just a thought.  I could work on a patch if this approach makes sense at least as an intermediary until there’s a pluggable storage backend for non-FS stores 😉 (I'd also be somewhat interested in implementing a postgres/sql storage backend if this project is moving forwards).

There is a proposal for a ref storage backend called "reftable" which
will not store the ref names in the file system, and work is being done
on it.  There has been a suggestion for an SQLite store in the past, but
that causes problems for certain implementations, such as JGit, which do
not want to have C bindings.
-- 
brian m. carlson: Houston, Texas, US


* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-24  1:19   ` brian m. carlson
@ 2020-07-24  1:25     ` Junio C Hamano
  2020-07-24 18:07       ` Casey Meijer
  2020-07-24 18:17       ` Casey Meijer
  2020-07-24 18:14     ` Casey Meijer
  1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2020-07-24  1:25 UTC
  To: brian m. carlson; +Cc: Casey Meijer, git@vger.kernel.org

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> It isn't clear to me exactly what you're suggesting.  Are you suggesting
> that we allow "head" instead of "HEAD" in worktrees, or that we allow
> refs in general to be case insensitive, or something else?

> There is a proposal for a ref storage backend called "reftable" which
> will not store the ref names in the file system, and work is being done
> on it.  There has been a suggestion for an SQLite store in the past, but
> that causes problems for certain implementations, such as JGit, which do
> not want to have C bindings.

Yes, another important thing to point out is that one shared goal of
these efforts is so that users, even those on case insensitive
filesystems, can name their refs foo and FOO and have the system
treat these as two distinct refs.  IOW, wanting to enhance "support"
for case insensitive treatment of refs will not fly---asking for
"head" and getting contents of "HEAD" on certain platforms is a bug,
induced by the limited filesystems these platforms use, and it is being
fixed.
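
The failure mode described above can be illustrated with a toy model. This is only a sketch: `CaseFoldingRefStore` and `CaseSensitiveRefStore` are hypothetical names, and lowercasing the key is a simplification of how a case-insensitive filesystem matches file names; it is not Git's ref code.

```python
class CaseFoldingRefStore:
    """Toy stand-in for loose refs kept as files on a case-insensitive
    filesystem: names that differ only in case share one slot."""

    def __init__(self):
        self._files = {}

    def write(self, name, oid):
        # Simplification: model case-insensitive matching by lowercasing.
        self._files[name.lower()] = oid

    def read(self, name):
        return self._files.get(name.lower())


class CaseSensitiveRefStore:
    """Toy stand-in for a backend that compares ref names exactly."""

    def __init__(self):
        self._refs = {}

    def write(self, name, oid):
        self._refs[name] = oid

    def read(self, name):
        return self._refs.get(name)


def store_two_refs(store):
    """Write 'foo' and 'FOO' with different values, then read 'foo' back."""
    store.write("refs/heads/foo", "X")
    store.write("refs/heads/FOO", "Y")
    return store.read("refs/heads/foo")
```

With the folding store, the second write silently replaces the first, so reading `refs/heads/foo` yields the value written for `refs/heads/FOO`; the exact-match store keeps the two refs distinct.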

Thanks.



* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-24  1:25     ` Junio C Hamano
@ 2020-07-24 18:07       ` Casey Meijer
  2020-07-24 18:17       ` Casey Meijer
  1 sibling, 0 replies; 8+ messages in thread
From: Casey Meijer @ 2020-07-24 18:07 UTC
  To: Junio C Hamano, brian m. carlson; +Cc: git@vger.kernel.org

It's definitely a bug and it's kind of amazing that it's been floating about for 2 years.  
I'm not suggesting anything really change except the way you determine whether a ref is "work-tree local" or not. 
This way on case sensitive filesystems only HEAD will be accepted, and on case insensitive filesystems both head and HEAD
will be valid (and will refer to the same file/ref), replicating the semantics of the primary worktree. 

Namely, instead of checking explicitly for "HEAD" (or going through some hoops to determine if the filesystem  
*is* case sensitive), just look in the worktree refs.  If it's in there, then it's worktree local.  If not, then not. 
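
The lookup order proposed here could be sketched roughly as follows. This is a minimal illustration, not Git's implementation: `resolve_ref_path`, `worktree_gitdir` and `common_gitdir` are hypothetical names, and the sketch assumes refs are plain files and ignores packed refs entirely.

```python
import os


def resolve_ref_path(worktree_gitdir, common_gitdir, refname):
    """Return the path of the file backing `refname`, or None.

    The worktree-private directory is consulted first, and the shared
    directory is only used as a fallback. Whether "head" matches a file
    called "HEAD" is left entirely to the filesystem.
    """
    local = os.path.join(worktree_gitdir, refname)
    if os.path.exists(local):
        return local  # the ref is worktree-local
    shared = os.path.join(common_gitdir, refname)
    if os.path.exists(shared):
        return shared  # fall back to the shared refs
    return None
```

Under this scheme no code compares the name against the literal string "HEAD"; a ref is worktree-local exactly when a matching file exists in the worktree's own directory.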

Like I said, maybe there are some problems with this approach that I'm not aware of, but if so, I think it's worth thinking 
about whether those problems are resolvable 😊 

As for alternate storage engines, I'd be more interested in seeing core git built out to support a pluggable storage engine than in any specific implementation.
Take a look at PostgreSQL's work on Table Access Methods for an example in this vein.  I think this idea plays well with my proposal above as well because it
delegates the responsibility of case sensitivity to the storage backend (in this case, the filesystem).


Best,

Casey


* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-24  1:19   ` brian m. carlson
  2020-07-24  1:25     ` Junio C Hamano
@ 2020-07-24 18:14     ` Casey Meijer
  2020-07-24 21:09       ` brian m. carlson
  1 sibling, 1 reply; 8+ messages in thread
From: Casey Meijer @ 2020-07-24 18:14 UTC
  To: brian m. carlson; +Cc: git@vger.kernel.org

I think I misunderstood your claim, actually, Brian.  What is a bug is asking for worktree A's head and getting the main worktree's head: a super dangerous bug.

I certainly disagree with your assertion that asking for head and not getting HEAD (or HeaD or hEAd) on a case-insensitive storage engine isn't a bug and it certainly 
shouldn't be a bug once extensible storage engines are in place: the storage engine should have final say on how objects are stored and retrieved, not git-core. 

Best,

Casey


* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-24  1:25     ` Junio C Hamano
  2020-07-24 18:07       ` Casey Meijer
@ 2020-07-24 18:17       ` Casey Meijer
  2020-07-24 19:36         ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Casey Meijer @ 2020-07-24 18:17 UTC
  To: Junio C Hamano, brian m. carlson; +Cc: git@vger.kernel.org

Sorry, I got mixed up; that last message should have been addressed to Junio.

My apologies. 

To put it very simply, I'm asking that git respect the separation of concerns between itself 
and its storage engine (regardless of whether that's pluggable, or just the current filesystem, which I guess is technically pluggable, lol).


Best,

Casey


* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-24 18:17       ` Casey Meijer
@ 2020-07-24 19:36         ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2020-07-24 19:36 UTC
  To: Casey Meijer; +Cc: brian m. carlson, git@vger.kernel.org

Casey Meijer <cmeijer@strongestfamilies.com> writes:

> Sorry I got mixed up,, that last message should have been
> addressed to Junio.
>
> My apologies. 
>
> To put it very simply, I'm asking that git respect the separation
> of concerns between itself and its storage engine (regardless of
> whether that's pluggable, or just the current filesystem, which I
> guess is technically pluggable, lol).

If "git" is told to store ref 'foo' pointing at object X and then
ref 'Foo' pointing at object Y by the end user, after claiming to
have done these two operations, if it is then asked about the value
of 'foo', it must say that 'foo' points at object X and not Y.  If a
ref backend is based on a case insensitive filesystem, there are only
two options available: (1) ignore case and violate the expectation
of the end user, or (2) come up with a way to "defeat" the limitation
of case insensitivity imposed by the filesystem (e.g. your ref backend
implementation _could_ URLencode/decode the ref before using it as a
filename on such a filesystem).

Doing (2) would be transparent to the rest of Git (i.e. the rest of
Git does not have to care that each ref is stored in a file, whose
filename is an encoded version of the refname) and gives us a good
separation of concerns between it and the storage backend.  Those
who ported Git to case insensitive filesystems didn't and chose (1).

As (1) violates end-user expectation, I would think it is fair to
declare it a bug.
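
Option (2) might look something like the sketch below. Note one assumption that goes beyond the message: ordinary URL encoding leaves ASCII letters untouched, so it would not by itself separate 'foo' from 'Foo'. This variant therefore also percent-escapes uppercase ASCII letters (plus '%' and any non-ASCII byte), using lowercase hex digits so the resulting filename contains no uppercase letters at all. `encode_refname` and `decode_refname` are hypothetical helpers, not functions from Git.

```python
def encode_refname(ref: bytes) -> str:
    """Map a ref name to a filename that stays unique even when the
    filesystem compares names case-insensitively."""
    out = []
    for b in ref:
        ch = chr(b)
        if b < 0x80 and not ch.isupper() and ch != "%":
            out.append(ch)  # lowercase letters, digits, '/', etc. pass through
        else:
            out.append("%%%02x" % b)  # escape uppercase, '%', and non-ASCII bytes
    return "".join(out)


def decode_refname(name: str) -> bytes:
    """Invert encode_refname()."""
    out = bytearray()
    i = 0
    while i < len(name):
        if name[i] == "%":
            out.append(int(name[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(name[i]))
            i += 1
    return bytes(out)
```

Because the mapping is reversible and never emits an uppercase letter, two refs that differ only in case end up in two different files, and the rest of the system never needs to know about the encoding.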


* Re: BUG FOLLOWUP: Case insensitivity in worktrees
  2020-07-24 18:14     ` Casey Meijer
@ 2020-07-24 21:09       ` brian m. carlson
  0 siblings, 0 replies; 8+ messages in thread
From: brian m. carlson @ 2020-07-24 21:09 UTC
  To: Casey Meijer; +Cc: git@vger.kernel.org


On 2020-07-24 at 18:14:03, Casey Meijer wrote:
> I think I misunderstood your claim actually Brian.   What is a bug is
> asking for worktree A's head and getting the main worktree's head. A
> super dangerous bug.
> 
> I certainly disagree with your assertion that asking for head and not
> getting HEAD (or HeaD or hEAd) on a case-insensitive storage engine
> isn't a bug and it certainly shouldn't be a bug once extensible
> storage engines are in place: the storage engine should have final say
> on how objects are stored and retrieved, not git-core.

If you want to refer to HEAD, writing it "head" is always wrong.  "head"
is not a special ref to Git, and on a case-sensitive system, I am fully
entitled to create a branch, tag, or other ref with that name that is
independent from HEAD.

It's wrong because regardless of operating system, you don't
intrinsically know whether the repository is case sensitive.  Windows 10
permits case-sensitive directories and macOS has case-sensitive file
systems, so you cannot assume that "head" and "HEAD" are the same
without knowing the setting of "core.ignorecase" and the properties of
the file system.
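
One way to find out how a particular directory behaves, rather than guessing from the operating system, is to probe it. The following is a rough sketch; `directory_ignores_case` is a hypothetical helper, and it only reports how the filesystem treats the probed directory, not the value of "core.ignorecase".

```python
import os
import tempfile


def directory_ignores_case(directory):
    """Report whether names inside `directory` are matched case-insensitively.

    A file is created under a mixed-case name, and the same name is then
    looked up in lowercase inside a scratch subdirectory.
    """
    with tempfile.TemporaryDirectory(dir=directory) as probe_dir:
        probe = os.path.join(probe_dir, "CaseProbe")
        with open(probe, "w"):
            pass
        return os.path.exists(os.path.join(probe_dir, "caseprobe"))
```

The answer can differ between two directories on the same machine, which is why the operating system alone does not tell you whether "head" and "HEAD" name the same file.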

So when you write "head", you are not asking for HEAD in any worktree or
repository at all.

We are fully aware that Git cannot consistently store refs differing in
case on case-insensitive file systems, and we agree that's a bug.
Reftable will fix that, and as I mentioned, it is being worked on.  It
is not, however, a deficiency that refs are intrinsically case
sensitive, and let me explain why.

First, Git does not require that refs are in any particular encoding.
Specifically, they need not be in Unicode or UTF-8.  It is valid to have
many characters in a ref name, including 0xff.  That means any type of
case folding is not possible, since a ref need not correspond to actual
text.
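
As a small illustration of the point about encodings, the check below treats a ref name as raw bytes and asks whether it happens to be valid UTF-8. `is_valid_utf8` is a hypothetical helper written for this example, not part of Git.

```python
def is_valid_utf8(refname: bytes) -> bool:
    """Return True if the ref name can be decoded as UTF-8 text."""
    try:
        refname.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A name such as `b"refs/heads/\xff"` fails this check because the byte 0xff never occurs in well-formed UTF-8, so there is no text to apply case-folding rules to.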

Second, even if we did require them to be UTF-8, it is impossible to
consistently fold case in a way that works for all locales.  Turkish and
other Turkic languages have a dotted I and a dotless I[0].  The ASCII
uppercase I would fold to a dotless lowercase I for Turkish and to the
ASCII (dotted) lowercase I for English.  Similarly, the ASCII lowercase
I is dotted, and folds to a dotted uppercase I in Turkish and an ASCII
(dotless) uppercase I in English.
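
The dotted and dotless I problem can be seen with Python's built-in case mappings, which are locale-independent. This sketch only demonstrates the default Unicode mappings; the constant and function names are made up for the example.

```python
DOTLESS_LOWER_I = "\u0131"  # ı, Turkish lowercase dotless i
DOTTED_UPPER_I = "\u0130"   # İ, Turkish uppercase dotted I


def survives_case_roundtrip(text):
    """Return True if uppercasing and then lowercasing gives `text` back."""
    return text.upper().lower() == text


# Locale-independent mappings follow the English convention for ASCII I.
ascii_lower = "I".lower()                    # "i", where Turkish expects "ı"
dotless_upper = DOTLESS_LOWER_I.upper()      # "I", the same as "i".upper()
dotted_lower = DOTTED_UPPER_I.lower()        # "i" plus U+0307 combining dot
```

Since both "i" and "ı" uppercase to "I", a locale-independent fold cannot tell them apart afterwards, and `survives_case_roundtrip(DOTLESS_LOWER_I)` is false.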

It is literally not possible to correctly perform case-folding in a
locale-independent way.  Every attempt to do so will get at least this
case wrong (not to mention other cases that occur), and Turkic languages
are spoken by 200 million people, so ignoring their needs is not only
harmful, but also impacts a massive number of people.  That major OS
designers have made this mistake doesn't mean that we should as well.

We wouldn't perform ASCII-only case folding for all of the reasons
mentioned above and because it's Anglocentric.  As someone who speaks
both Spanish and French, I would find that unsuitable and the results
bizarre.

So I understand that you may expect that, on Windows or macOS, you can
write "head" and get HEAD, and be surprised when that doesn't work in
all cases.  But that is not, and never has been, expected to work, nor
is it a bug that it doesn't.

[0] https://en.wikipedia.org/wiki/Dotted_and_dotless_I
-- 
brian m. carlson: Houston, Texas, US
