git@vger.kernel.org mailing list mirror (one of many)
* clone, hardlinks, and file modes (and CAP_FOWNER)
From: Andreas Krey @ 2018-08-24 12:14 UTC
  To: git; +Cc: Junio C Hamano

Hi everybody,

I'm currently looking into more aggressively sharing space between multiple repositories,
and into getting them to share again after one did a repack (which costs us 15G space).

One thing I stumbled on is the /proc/sys/fs/protected_hardlinks setting, which disallows
hardlinking files that the user neither owns nor can both read and write, such as pack files
belonging to someone else. This consequently inhibits sharing when first cloning from a
common shared cache repo.

Installing git with CAP_FOWNER (which bypasses that check) is probably too dangerous;
at the very least, the capability should only be enabled while the directory is being copied.

*

The next thing is that copied object/pack files are created with mode rw-rw-r-- (the 0666
passed by clone, filtered through the umask), unlike the read-only files that come out of the
regular transports.

A straightforward patch:

diff --git a/builtin/clone.c b/builtin/clone.c
index fd2c3ef090..6ffb4db4da 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -448,7 +448,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
                                die_errno(_("failed to create link '%s'"), dest->buf);
                        option_no_hardlinks = 1;
                }
-               if (copy_file_with_time(dest->buf, src->buf, 0666))
+               if (copy_file_with_time(dest->buf, src->buf, 0444))
                        die_errno(_("failed to copy file to '%s'"), dest->buf);
        }
        closedir(dir);

Alas, copy_file() treats the mode only as a crude hint for executability, so this is needed as well:

diff --git a/copy.c b/copy.c
index 4de6a110f0..883060009c 100644
--- a/copy.c
+++ b/copy.c
@@ -32,7 +32,7 @@ int copy_file(const char *dst, const char *src, int mode)
 {
        int fdi, fdo, status;
 
-       mode = (mode & 0111) ? 0777 : 0666;
+       mode = (mode & 0111) ? 0777 : (mode & 0222) ? 0666 : 0444;
        if ((fdi = open(src, O_RDONLY)) < 0)
                return fdi;
        if ((fdo = open(dst, O_WRONLY | O_CREAT | O_EXCL, mode)) < 0) {

(copy_file is also used with 0644 instead of the usual 0666 in refs/files-backend.c)
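The proposed mapping can be tried in isolation. This is a minimal sketch, assuming standalone helpers: copy_file_create_mode() and effective_mode() are hypothetical names that only mirror the patched one-liner and the umask step that open(2) performs; they are not part of git.

```c
#include <sys/types.h>

/*
 * Hypothetical stand-alone copy of the patched one-liner in copy_file():
 * the caller's mode is only a hint, collapsed to one of three create modes.
 */
static int copy_file_create_mode(int mode)
{
    if (mode & 0111)
        return 0777;   /* any execute bit: create an executable file */
    if (mode & 0222)
        return 0666;   /* any write bit: create a writable file */
    return 0444;       /* neither: keep the copy read-only */
}

/* open(2) with O_CREAT clears the umask bits from the requested mode. */
static mode_t effective_mode(int requested, mode_t umask_bits)
{
    return (mode_t)requested & ~umask_bits;
}
```

For example, with a umask of 002 a 0666 request ends up as 0664 (rw-rw-r--), while a 0444 request stays 0444. A 0644 request, as in refs/files-backend.c, still maps to 0666, so only callers passing a mode without any write bit would see different behavior.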

I will submit this as a proper patch if it is acceptable; I'm not sure what the extra
mode case will do to other callers of copy_file().

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800


* Re: clone, hardlinks, and file modes (and CAP_FOWNER)
From: Ævar Arnfjörð Bjarmason @ 2018-08-24 14:48 UTC
  To: Andreas Krey; +Cc: git, Junio C Hamano


On Fri, Aug 24 2018, Andreas Krey wrote:

> I'm currently looking into more aggressively sharing space between multiple repositories,
> and into getting them to share again after one did a repack (which costs us 15G space).
> ...

This is mostly unrelated to your suggestion, but you might be interested
in a thread I started a while ago about doing this with an approach that
doesn't involve hardlinks. It requires a filesystem that does block
de-duplication (and it doesn't work at all yet; git needs some
patching first):
https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/

I don't understand how this hardlink approach would work (doesn't mean
it won't, just that I don't get it).

Do you mean cloning without --reference, instead via file://, and
relying on FS-local hardlinks? But then how will that work once one of the
repos does a full repack? Are you going to inhibit that in some way,
e.g. with gc.bigPackThreshold (but then why doesn't that already work)?

If you have such a tightly coupled approach, isn't --reference closer to
what you want in that case?


* Re: clone, hardlinks, and file modes (and CAP_FOWNER)
From: Andreas Krey @ 2018-08-24 19:59 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, 24 Aug 2018 16:48:37 +0000, Ævar Arnfjörð Bjarmason wrote:
...
> I don't understand how this hardlink approach would work (doesn't mean
> it won't, just that I don't get it).

I just detect whether there is insufficient sharing (du is quite handy
here: 'du -s this/.git that/.git' counts hardlinked files only once, so the
figure for that/.git is just its unshared part).

When I detect 'unsharedness', I just hardlink the biggest .pack and the
corresponding .idx from the cache into the target repo, create a .keep file
for that pack, run 'git gc', and remove the .keep file. Effect: the repack
leaves the kept pack alone and only creates a small additional pack file for
the remaining objects, so most of the objects are now shared between
the cache and the target repo.
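The re-sharing step described above could look roughly like this in C. This is a sketch under several assumptions: sibling_path(), run_git_gc() and reshare_pack() are hypothetical helpers, the caller supplies full paths to the cache's biggest pack and to its destination in the target's objects/pack directory, and error handling is minimal. The link() calls are still subject to the protected_hardlinks check, so this has to run as a user allowed to link the cache's packs.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Derive e.g. "pack-X.idx" from "pack-X.pack"; returns 0 on success, -1 otherwise. */
static int sibling_path(char *out, size_t outlen, const char *pack, const char *ext)
{
    size_t n = strlen(pack);
    if (n < 5 || strcmp(pack + n - 5, ".pack") != 0)
        return -1;
    int w = snprintf(out, outlen, "%.*s%s", (int)(n - 5), pack, ext);
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}

/* Run "git -C <repo> gc" without a shell; returns git's exit code, or -1. */
static int run_git_gc(const char *repo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("git", "git", "-C", repo, "gc", (char *)NULL);
        _exit(127);    /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Hardlink the cache's pack (src_pack, a full path ending in ".pack") and its
 * .idx to dst_pack inside the target's objects/pack directory, keep it during
 * gc, then remove the .keep again. Minimal error handling: a failed .idx link
 * leaves the linked .pack behind.
 */
static int reshare_pack(const char *src_pack, const char *dst_pack, const char *repo)
{
    char src_idx[4096], dst_idx[4096], keep[4096];

    if (sibling_path(src_idx, sizeof src_idx, src_pack, ".idx") ||
        sibling_path(dst_idx, sizeof dst_idx, dst_pack, ".idx") ||
        sibling_path(keep, sizeof keep, dst_pack, ".keep"))
        return -1;

    /* Git discovers packs via their .idx files, so link the .pack first. */
    if (link(src_pack, dst_pack) != 0 || link(src_idx, dst_idx) != 0)
        return -1;

    /* The .keep file tells repack to leave this pack alone. */
    int fd = open(keep, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    close(fd);

    int rc = run_git_gc(repo);
    unlink(keep);    /* let future repacks treat the pack normally again */
    return rc == 0 ? 0 : -1;
}
```

Exec'ing git directly rather than going through system() avoids having to shell-quote the repository path.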

This is going to be run once a week over all the repos on a machine
(those created by our tooling, which therefore have known locations),
so that repacks can't gradually and completely undo the sharing of
the objects/packs.

> If you have such a tightly coupled approach, isn't --reference closer to
> what you want in that case?

Close, but no. --reference and friends all rely on the promise that the
referenced repo isn't going away, and I don't want to depend on that
(if someone decides to drop the cache, that must not break the
work repos).

- Andreas


