On Wed, Nov 29, 2023 at 04:59:35PM -0500, Taylor Blau wrote:
> On Wed, Nov 29, 2023 at 09:14:20AM +0100, Patrick Steinhardt wrote:
> > We have some references that are more special than others. The reason
> > for them being special is that they either do not follow the usual
> > format of references, or that they are written to the filesystem
> > directly by the respective owning subsystem and thus circumvent the
> > reference backend.
> >
> > This works perfectly fine right now because the reffiles backend will
> > know how to read those refs just fine. But with the prospect of gaining
> > a new reference backend implementation, we need to be a lot more careful
> > here:
> >
> >   - We need to make sure that we are consistent about how those refs are
> >     written. They must either always be written via the filesystem, or
> >     they must always be written via the reference backend. Any mixture
> >     will lead to inconsistent state.
> >
> >   - We need to make sure that such special refs are always handled
> >     specially when reading them.
> >
> > We're already mostly good with regard to the first item, except for
> > `BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
> > But the current list of special refs is missing a lot of refs that
> > really should be treated specially. Right now, we only treat
> > `FETCH_HEAD` and `MERGE_HEAD` specially here.
> >
> > Introduce a new function `is_special_ref()` that contains all current
> > instances of special refs to fix the reading path.
> >
> > Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com>
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  refs.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/refs.c b/refs.c
> > index 7d4a057f36..2d39d3fe80 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -1822,15 +1822,69 @@ static int refs_read_special_head(struct ref_store *ref_store,
> >  	return result;
> >  }
> >
> > +static int is_special_ref(const char *refname)
> > +{
> > +	/*
> > +	 * Special references get written and read directly via the filesystem
> > +	 * by the subsystems that create them. Thus, they must not go through
> > +	 * the reference backend but must instead be read directly. It is
> > +	 * arguable whether this behaviour is sensible, or whether it's simply
> > +	 * a leaky abstraction enabled by us only having a single reference
> > +	 * backend implementation. But at least for a subset of references it
> > +	 * indeed does make sense to treat them specially:
> > +	 *
> > +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
> > +	 *   carries additional metadata like where it came from.
> > +	 *
> > +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
> > +	 *   heads.
> > +	 *
> > +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
> > +	 *   rebases, where keeping it closely together feels sensible.
> > +	 *
> > +	 * There are some exceptions that you might expect to see on this list
> > +	 * but which are handled exclusively via the reference backend:
> > +	 *
> > +	 * - CHERRY_PICK_HEAD
> > +	 * - HEAD
> > +	 * - ORIG_HEAD
> > +	 *
> > +	 * Writing or deleting references must consistently go either through
> > +	 * the filesystem (special refs) or through the reference backend
> > +	 * (normal ones).
> > +	 */
> > +	const char * const special_refs[] = {
> > +		"AUTO_MERGE",
> > +		"BISECT_EXPECTED_REV",
> > +		"FETCH_HEAD",
> > +		"MERGE_AUTOSTASH",
> > +		"MERGE_HEAD",
> > +	};
> 
> Is there a reason that we don't want to declare this statically? If we
> did, I think we could drop one const, since the strings would instead
> reside in the .rodata section.

Not really, no.

> > +	int i;
> 
> Not that it matters for this case, but it may be worth declaring i to be
> an unsigned type, since it's used as an index into an array. size_t
> seems like an appropriate choice there.

Hm. We do use `int` almost everywhere when iterating through an array
via `ARRAY_SIZE`, but ultimately I don't mind whether it's `int`,
`unsigned` or `size_t`.
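
For concreteness, here is roughly what the function would look like with
both suggestions applied. The table is declared `static`, and the index is a
`size_t`, which also avoids a signed/unsigned comparison against
`ARRAY_SIZE()` under `-Wsign-compare`. I kept both `const` qualifiers so the
table itself stays read-only. This is only a standalone sketch:
`ARRAY_SIZE()` and `starts_with()` are simplified local stand-ins for the
helpers in git-compat-util.h, and the trailing `return 1;`/`return 0;` are
assumed because the quoted hunk stops at the `rebase-merge/` check.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for Git's ARRAY_SIZE(); yields a size_t. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Simplified stand-in for Git's starts_with(): case-sensitive prefix test. */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

static int is_special_ref(const char *refname)
{
	/*
	 * "static" gives the table static storage duration, so it is set
	 * up once instead of on every call. Both "const" qualifiers keep
	 * the pointers and the strings they point to read-only.
	 */
	static const char * const special_refs[] = {
		"AUTO_MERGE",
		"BISECT_EXPECTED_REV",
		"FETCH_HEAD",
		"MERGE_AUTOSTASH",
		"MERGE_HEAD",
	};
	size_t i; /* same type as ARRAY_SIZE(), so no sign mismatch */

	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
		if (!strcmp(refname, special_refs[i]))
			return 1;

	/* git-rebase(1) keeps its state below these directories. */
	if (starts_with(refname, "rebase-apply/") ||
	    starts_with(refname, "rebase-merge/"))
		return 1;

	return 0; /* assumed tail: everything else goes through the backend */
}
```

With this, `FETCH_HEAD` and `rebase-merge/head-name` are classified as
special, while `HEAD` and `CHERRY_PICK_HEAD` still go through the reference
backend.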

> > +	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
> > +		if (!strcmp(refname, special_refs[i]))
> > +			return 1;
> > +
> > +	/*
> > +	 * git-rebase(1) stores its state in `rebase-apply/` or
> > +	 * `rebase-merge/`, including various reference-like bits.
> > +	 */
> > +	if (starts_with(refname, "rebase-apply/") ||
> > +	    starts_with(refname, "rebase-merge/"))
> 
> Do we care about case sensitivity here? Definitely not on case-sensitive
> filesystems, but I'm not sure about case-insensitive ones. For instance,
> on macOS, I can do:
> 
>     $ git rev-parse hEAd
> 
> and get the same value as "git rev-parse HEAD" (on my Linux workstation,
> this fails as expected).
> 
> I doubt that there are many users in the wild asking to resolve
> reBASe-APPLY/xyz, but I think that after this patch that would no longer
> work as-is, so we may want to replace this with istarts_with() instead.

In practice I'd argue that nobody is ever going to ask for something in
`rebase-apply/` outside of Git internals or scripts, and I'd expect
these to always use proper casing. So I lean towards "no, we don't care
about case sensitivity".
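
To illustrate the difference in question, here is a sketch contrasting a
case-sensitive prefix check with a case-insensitive one. Git does have
`starts_with()` and `istarts_with()` in git-compat-util.h, but the versions
below are simplified stand-ins so that the example compiles on its own.

```c
#include <ctype.h>
#include <string.h>

/* Simplified stand-in for Git's starts_with(): exact-case prefix match. */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Simplified stand-in for Git's istarts_with(): ASCII case-folded match. */
static int istarts_with(const char *str, const char *prefix)
{
	for (; *prefix; str++, prefix++)
		/* Also stops at the end of a shorter str, since '\0' differs. */
		if (tolower((unsigned char)*str) !=
		    tolower((unsigned char)*prefix))
			return 0;
	return 1;
}
```

With the case-sensitive variant, `reBASe-APPLY/xyz` is not treated as
rebase state, whereas `istarts_with()` would still match it. The properly
cased `rebase-apply/xyz` matches either way, which is the only form I'd
expect Git internals and scripts to use.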

Patrick