git@vger.kernel.org mailing list mirror (one of many)
* Standardized escaping to store a .git in git?
From: Josh Triplett @ 2021-05-19 21:00 UTC
  To: git

On rare occasions, a project may need to store and version a .git
directory in a git repository. For instance, a project that interacts
with git repositories may need test cases. Or, a project using git to
store backups may also want to back up git repositories. `.git` is the
only filename that git can't transparently store and version.

I've seen projects take different approaches to work around this. For
instance, the libgit2 project renames the `.git` directory to `.gitted`,
and then their test framework copies that to a temporary directory as
`.git`.

Would it make sense to have a standardized escaping mechanism for this,
which git could then handle in a safe way (taking both project
configuration and local configuration into account)? Such a
mechanism would not, by default, result in git checking out a `.git`
directory verbatim, as that wouldn't be safe (due to hook scripts and
due to searches for .git directories), but a user could configure their
own system to do so for a specific project, tools like `git archive`
could have a way to un-escape the directory in a generated archive, and
references to objects within a treeish could use such paths.
Standardizing this would allow tools to interoperate rather than each
inventing their own convention.

(Note that today, git *can* successfully check in, version, update, and
check out a bare repo.git directory, just not a non-bare .git
directory.)

As one possible escaping (absolutely subject to bikeshedding):

- Reserve names starting with a specified character (e.g. \x01); call
  that escape character E.
- Encode filenames that actually start with E to start with EE
- Encode .git as E.git
- Require an opt-in to interpret this escaping; tools that don't
  interpret this escaping will still be able to operate on the files, in
  much the same way that it's possible to operate on a symlink as if it
  were a file containing the target path.
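
The scheme above can be sketched as follows. This is only an illustration
of the proposal, not anything git implements: the choice of \x01 as E, the
function names, and the strict error on malformed names are all
assumptions made for the example.

```python
# Hypothetical escape character "E" from the proposal; \x01 is only the
# example value suggested above, not a settled choice.
ESCAPE = "\x01"


def escape_component(name: str) -> str:
    """Escape one path component for storage in a tree."""
    if name == ".git":
        return ESCAPE + ".git"   # .git  -> E.git
    if name.startswith(ESCAPE):
        return ESCAPE + name     # E...  -> EE...
    return name                  # everything else is stored literally


def unescape_component(name: str) -> str:
    """Invert escape_component; reject names the scheme never produces."""
    if not name.startswith(ESCAPE):
        return name
    rest = name[len(ESCAPE):]
    if rest == ".git" or rest.startswith(ESCAPE):
        return rest
    raise ValueError(f"not a valid escaped name: {name!r}")


def escape_path(path: str) -> str:
    """Apply the escaping to every component of a '/'-separated path."""
    return "/".join(escape_component(part) for part in path.split("/"))


def unescape_path(path: str) -> str:
    """Invert escape_path."""
    return "/".join(unescape_component(part) for part in path.split("/"))
```

A tool that has not opted in would simply see a file or directory whose
name starts with E. This sketch only treats the exact name `.git`
specially; case variants and other names that git may reject are not
handled.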

There are tradeoffs here: using a more type-able escape character would
be convenient if a user ever had to deal with the raw name, but on the
other hand, using a more type-able escape character would make the need
to escape the escape character come up more often.

Regardless of the specific approach to escaping `.git`, does the general
idea of standardizing such escaping across tools seem like something git
could potentially do, to allow transparently storing *any* file or
directory in a git repository?

- Josh Triplett

* Re: Standardized escaping to store a .git in git?
From: Jonathan Nieder @ 2021-05-19 21:31 UTC
  To: Josh Triplett; +Cc: git

Hi Josh,

Josh Triplett wrote:

> On rare occasions, a project may need to store and version a .git
> directory in a git repository. For instance, a project that interacts
> with git repositories may need test cases. Or, a project using git to
> store backups may also want to back up git repositories. `.git` is the
> only filename that git can't transparently store and version.

My take on this might be a bit surprising, but it's probably worth
spelling out anyway: Git is first and foremost a source code
management tool, and ".git" directories are not a good interchange
format, so while I have sympathy for this use case, I do _not_ think
that Git should make changes that hurt other use cases in order to
support it.

Instead, I recommend doing one of the following, in order from most to
least preferred:

 1. Make the test case run git commands to create a Git repository.
    This makes it obvious what the test is trying to do, without
    having to deal with unrelated details also recorded in ".git".
    This is what Git's test suite does, for example.

 2. Check in a fast-import file and use "git fast-import" to make a
    Git repository out of it.

 3. Check in a "git bundle" file and use "git clone" to make a Git
    repository out of it.

 4. Check in an archive file (e.g., tar) containing a .git directory.
    (I consider this preferable over checking in a .git directory
    directly because it prevents a user from accidentally "cd"-ing
    into it and running git commands within the checked-in repository
    that they intended to run in the top-level repository.  That seems
    especially worth preventing because the checked-in repository can
    contain git aliases and other settings such as core.pager that
    cause automatic code execution, as you mentioned.)
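
As an illustration of option 2, here is a minimal sketch of generating a
fast-import stream for a single commit, so that a test can check in the
stream instead of a .git directory. The function name, the default
identity, the timestamp, and the ref are assumptions made for the
example. It assumes simple paths that need no quoting and regular,
non-executable files only.

```python
def fast_import_stream(files, message,
                       ref="refs/heads/main",
                       ident="Test Author <test@example.com>",
                       when="1621458000 +0000"):
    """Build a one-commit stream in the format read by "git fast-import".

    files maps a path (str) to its contents (bytes). All default
    argument values here are hypothetical placeholders.
    """
    out = []
    marks = {}
    # One blob per file, each given a mark so the commit can refer to it.
    for mark, (path, content) in enumerate(sorted(files.items()), start=1):
        out.append(b"blob\nmark :%d\ndata %d\n" % (mark, len(content)))
        out.append(content + b"\n")
        marks[path] = mark
    msg = message.encode("utf-8")
    out.append(b"commit %s\n" % ref.encode("utf-8"))
    out.append(b"committer %s %s\n"
               % (ident.encode("utf-8"), when.encode("utf-8")))
    out.append(b"data %d\n" % len(msg))
    out.append(msg + b"\n")
    # Attach every blob to the commit as a regular file (mode 100644).
    for path, mark in marks.items():
        out.append(b"M 100644 :%d %s\n" % (mark, path.encode("utf-8")))
    return b"".join(out)
```

The resulting bytes can be written to a file, checked in, and later fed
to "git fast-import" on standard input inside a freshly initialized
repository.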

Thanks and hope that helps,
Jonathan

* Re: Standardized escaping to store a .git in git?
From: Josh Triplett @ 2021-05-19 22:08 UTC
  To: Jonathan Nieder; +Cc: git

On Wed, May 19, 2021 at 02:31:00PM -0700, Jonathan Nieder wrote:
> Josh Triplett wrote:
> > On rare occasions, a project may need to store and version a .git
> > directory in a git repository. For instance, a project that interacts
> > with git repositories may need test cases. Or, a project using git to
> > store backups may also want to back up git repositories. `.git` is the
> > only filename that git can't transparently store and version.
>
> My take on this might be a bit surprising, but it's probably worth
> spelling out anyway: Git is first and foremost a source code
> management tool, and ".git" directories are not a good interchange
> format, so while I have sympathy for this use case, I do _not_ think
> that Git should make changes that hurt other use cases in order to
> support it.

I absolutely agree that such changes would be entirely inappropriate if
they hurt other use cases. That's part of why I'm suggesting that no
*defaults* in git should change. My hope is more to have
some kind of guidance along the lines of "if you need to do escaping, do
it this way", to lead towards having one canonical way to do such
escaping rather than multiple incompatible ways.

Part of my motivation, here, is that I'm looking to implement one such
escaping mechanism (in a tool built atop libgit2 that needs to handle
and version arbitrary files), and rather than inventing something
bespoke I'd love to interoperate. And since I've seen various approaches
used in the wild, I didn't want to add Yet Another distinct approach
before starting a design conversation about it.

> Instead, I recommend doing one of the following, in order from most to
> least preferred:
> 
>  1. Make the test case run git commands to create a Git repository.
>     This makes it obvious what the test is trying to do, without
>     having to deal with unrelated details also recorded in ".git".
>     This is what Git's test suite does, for example.
> 
>  2. Check in a fast-import file and use "git fast-import" to make a
>     Git repository out of it.
> 
>  3. Check in a "git bundle" file and use "git clone" to make a Git
>     repository out of it.

For the test-case approach, these are potentially workable, though they
only work if you just need a git repo with a given set of semantics,
rather than a binary-identical test case.

For the storing-arbitrary-files case, these wouldn't apply.

>  4. Check in an archive file (e.g., tar) containing a .git directory.
>     (I consider this preferable over checking in a .git directory
>     directly because it prevents a user from accidentally "cd"-ing
>     into it and running git commands within the checked-in repository
>     that they intended to run in the top-level repository.  That seems
>     especially worth preventing because the checked-in repository can
>     contain git aliases and other settings such as core.pager that
>     cause automatic code execution, as you mentioned.)

Storing as an archive is an option, but that would then require tools
that want to track arbitrary files to distinguish between "tar file that
should be unpacked" and "tar file that was originally a tar file". It's
also a harder format to interoperate with.

To clarify, I don't think the default behavior of git should be to
un-escape this escaping mechanism. Rather, I think the default behavior
should be to treat the filenames as literal, and the user could opt in
to un-escaping on checkout and escaping on check-in.

* Re: Standardized escaping to store a .git in git?
From: Jonathan Nieder @ 2021-05-19 22:37 UTC
  To: Josh Triplett; +Cc: git

Hi,

Josh Triplett wrote:

> Part of my motivation, here, is that I'm looking to implement one such
> escaping mechanism (in a tool built atop libgit2 that needs to handle
> and version arbitrary files), and rather than inventing something
> bespoke I'd love to interoperate. And since I've seen various approaches
> used in the wild, I didn't want to add Yet Another distinct approach
> before starting a design conversation about it.

*nod* To be clear, I'm glad you brought it up, among other reasons
because it means this discussion becomes available in the list archive
for when people are wondering about the same thing in the future.

> On Wed, May 19, 2021 at 02:31:00PM -0700, Jonathan Nieder wrote:

>> Instead, I recommend doing one of the following, in order from most to
>> least preferred:
[...]
> For the test-case approach, these are potentially workable, though they
> only work if you just need a git repo with a given set of semantics,
> rather than a binary-identical test case.

For cases wanting something binary-identical, it still seems
preferable to check in the individual relevant binary file (e.g., an
index file or a packfile) instead of a full repository.  In addition
to the safety improvement involved, this makes the test case easier to
understand.

> For the storing-arbitrary-files case, these wouldn't apply.

Can you say a little more about the storing-arbitrary-files case?

For example, 'bup' is a tool built on top of Git formats that stores
arbitrary files without using Git tree objects for it.  'etckeeper' is
another tool that stores additional information that Git does not (such
as detailed filesystem permissions).

If you have a use case in common with other tools, then finding a way
to interoperate sounds great. :)  The best way to do that is likely to
depend on the details of what the family of tools want to do.

There are some other filenames that "git fsck" also forbids, so this
comes down to more than figuring out how to handle ".git".

Thanks,
Jonathan

* Re: Standardized escaping to store a .git in git?
From: Josh Triplett @ 2021-05-20  3:26 UTC
  To: Jonathan Nieder; +Cc: git

On Wed, May 19, 2021 at 03:37:12PM -0700, Jonathan Nieder wrote:
> Josh Triplett wrote:
> > Part of my motivation, here, is that I'm looking to implement one such
> > escaping mechanism (in a tool built atop libgit2 that needs to handle
> > and version arbitrary files), and rather than inventing something
> > bespoke I'd love to interoperate. And since I've seen various approaches
> > used in the wild, I didn't want to add Yet Another distinct approach
> > before starting a design conversation about it.
>
> *nod* To be clear, I'm glad you brought it up, among other reasons
> because it means this discussion becomes available in the list archive
> for when people are wondering about the same thing in the future.
>
> > For the storing-arbitrary-files case, these wouldn't apply.
>
> Can you say a little more about the storing-arbitrary-files case?

Sure. I'm using git to record before-and-after states of running
commands in an isolated environment, to see the differences caused by
those commands. The "before" state includes everything the command
needs, and the delta from "before" to "after" is exactly what the
command changed. Some commands create git repositories; for instance,
some software build scripts `git clone` their dependencies or other
data. So when I go to record the "after" state, it might include a .git
directory. And I need to record that as transparently as possible.

I'd like to use git repositories so that people *can* push and pull
data using git, inspect the repository with things like "git show", use
"git diff", and similar.

> There are some other filenames that "git fsck" also forbids, so this
> comes down to more than figuring out how to handle ".git".

Are you talking about the case-insensitive check for paths that can be
confused with .git on some platforms, or something more than that?
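
To make the question concrete, here is a rough sketch of the kind of
"could be confused with .git" check I mean. It is a simplified
approximation for illustration, not git's actual implementation: the
function name is made up, and it only covers case folding, trailing dots
and spaces, and the "git~1" short name. It does not cover other cases,
such as Unicode characters that some filesystems ignore.

```python
def looks_like_dotgit(name: str) -> bool:
    """Return True if a path component might resolve to .git on a
    case-insensitive or Windows-style filesystem.

    Simplified approximation; the rules here are assumptions chosen
    for the example.
    """
    # Ignore trailing dots and spaces, then compare case-insensitively.
    folded = name.rstrip(". ").lower()
    return folded in (".git", "git~1")
```

An escaping scheme that only special-cases the exact name `.git` would
still leave names like these to be handled some other way.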
