git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Git Mailing List <git@vger.kernel.org>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Konstantin Khomoutov" <kostix+git@007spb.ru>,
	"Jeff King" <peff@peff.net>
Subject: Fwd: Should "head" also work for "HEAD" on case-insensitive FS?
Date: Wed, 26 Jul 2017 17:23:23 -0700	[thread overview]
Message-ID: <CAMy9T_HA1LcAo2pi27gPaRd4KZzTdT3tSYzV_CC8UBkbtk78RA@mail.gmail.com> (raw)
In-Reply-To: <CAMy9T_FmE=8xzjRJJRxLwQjoMStJx5sYO_xtODv2OEZm54DurA@mail.gmail.com>

Dang, I just noticed that I hit "reply" rather than "reply-to-all" on
the below email (stupid GMail default). Junio, your response to this
email accordingly went only to me.

Michael

---------- Forwarded message ----------
From: Michael Haggerty <mhagger@alum.mit.edu>
Date: Mon, Jul 10, 2017 at 7:52 AM
Subject: Re: Should "head" also work for "HEAD" on case-insensitive FS?
To: Junio C Hamano <gitster@pobox.com>


On Fri, Jul 7, 2017 at 12:34 AM, Junio C Hamano <gitster@pobox.com> wrote:
> [...]
> The exact detail of the encoding used here is immaterial, but I just
> used "encode uppercase letters and % as % followed by two hex",
> which was simple enough.  Usual refs/heads/master and friends will
> not have to be touched when encoded this way.  Perhaps the decoding
> side should be tweaked so that uppercase letters it sees needs to be
> downcased to avoid "refs/heads/Foo" getting returned as "Foo" branch,
> as a "Foo" branch should have been encoded as "refs/heads/%46oo".
>
> Having said that, this patch alone does not quite work yet.
>
>  * In the repository discovery code, we have some logic that
>    hard-codes the path in the directory (which is a candidate for
>    being a repository) to check, like "refs/" and "HEAD".  In the
>    attached illustration patch, files_path_encode() special cases
>    "HEAD" so that it is not munged, which is a bit of ugly
>    workaround for this.
>
>  * I haven't figured out why, but what refs.c calls "pseudo refs"
>    bypasses the files backend layer for some but not all operations,
>    which causes t1405-main-ref-store to fail.  The test creates a
>    "pseudo ref" FOO and then tries to remove it.  Creation seems to
>    follow the files-backend.c and thusly goes through the escaping;
>    refs.::refs_delete_ref() however does not consult files-backend.c
>    and fails to find and delete .git/FOO, because the creation side
>    encoded it as ".git/%46%4F%4F".

I think the most natural thing would be to use different encoding
rules for pseudo-refs (references like "HEAD" and "FETCH_HEAD") and
for other references (those starting with "refs/").

Pseudo-refs (with the partial exception of "HEAD") are quite peculiar
beasts. They sometimes include other information besides the reference
value and IIRC the refs code doesn't have any idea how to write or
read those extra contents. I believe that "HEAD" is the only pseudo
ref for which reflogs are ever written. Pseudo-refs have to match
/[A-Z_]+/ (see https://github.com/git/git/blob/8b2efe2a0fd93b8721879f796d848a9ce785647f/refs.c#L169-L173),
so ISTM that there is no need to encode such references' filenames at
all. (It's possible that the pattern could be made even stricter, like
/[A-Z_]+HEAD/.) Moreover, IIRC, such references are never scanned for
(as in for-each-refs) but rather are always asked for by name. So
their names might never have to be *de*coded, either. On the other
hand, when trying to look them up, it would be a good idea to verify
that the requested name satisfies the above naming rule. Other than
that, I believe it would be preferable to leave pseudo-refs untouched
by your new encoding/decoding code.

Whereas other references are typically lower-case, so it makes sense
for lower-case letters to be the ones that are passed through
transparently in such references' filenames (as in your scheme).

But...since we are talking about introducing a new loose reference
filename encoding, I think it would be a good idea to address a couple
of related issues at the same time:

* Some filesystems natively use Unicode, and insist on a particular
Unicode normalization (NFC vs NFD), which might differ from the
"upstream" normalization. So such reference names get munged when
written as loose references. I'm not enough of an expert in Unicode to
know what the best solution is, except for the strong feeling that it
would require some cooperation from the rest of Git to ensure a good
user experience.

* Another bad effect of our current loose reference encoding is that
it prohibits references that D/F conflict with each other (like
"refs/heads/foo" and "refs/heads/foo/bar") because
"$GIT_DIR/refs/heads/foo" can't be a file and a directory at the same
time. Even if we don't want to support that, this problem also
prevents us from storing reflogs for deleted references, which is a
serious flaw. We could solve this problem by encoding "directory"
components of reference names differently than "leaf" components; for
example, the above references could be encoded as
"refs/heads.d/foo.ref" and "refs/heads.d/foo.d/bar.ref".

It'd be nice to solve all of these related problems at the same time,
because whatever encoding we choose now will have to be supported
forever.

Michael

  parent reply	other threads:[~2017-07-27  0:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-03 22:00 Should "head" also work for "HEAD" on case-insensitive FS? Ævar Arnfjörð Bjarmason
2017-07-04  7:19 ` Konstantin Khomoutov
2017-07-04  8:24   ` Ævar Arnfjörð Bjarmason
2017-07-05  8:36     ` Jeff King
2017-07-05 17:07       ` Junio C Hamano
2017-07-06 22:34         ` Junio C Hamano
     [not found]           ` <CAMy9T_FmE=8xzjRJJRxLwQjoMStJx5sYO_xtODv2OEZm54DurA@mail.gmail.com>
2017-07-27  0:23             ` Michael Haggerty [this message]
     [not found]             ` <xmqqa84c6v41.fsf@gitster.mtv.corp.google.com>
2017-07-27  0:49               ` Fwd: " Junio C Hamano
2017-07-27 14:35                 ` Jeff King
2017-07-27 15:26                   ` Junio C Hamano
2017-07-06  3:35   ` Kenneth Hsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMy9T_HA1LcAo2pi27gPaRd4KZzTdT3tSYzV_CC8UBkbtk78RA@mail.gmail.com \
    --to=mhagger@alum.mit.edu \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=kostix+git@007spb.ru \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).