Question: What's the best way to implement directory permission control in git?

git@vger.kernel.org mailing list mirror (one of many)

* Question: What's the best way to implement directory permission control in git?
@ 2022-07-27  8:56 ZheNing Hu
  2022-07-27  9:17 ` Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: ZheNing Hu @ 2022-07-27  8:56 UTC (permalink / raw)
  To: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King, Ævar Arnfjörð Bjarmason

Suppose we have a monorepo such as
git@github.com:derrickstolee/sparse-checkout-example.git

It contains many files and directories:

client/
    android/
    electron/
    iOS/
service/
    common/
    identity/
    list/
    photos/
web/
    browser/
    editor/
    friends/
boostrap.sh
LICENSE.md
README.md

We can already use partial clone + sparse-checkout to reduce
network overhead and disk usage, which is good.
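
For readers unfamiliar with that setup, here is a minimal sketch of the
two git invocations involved, wrapped in Python so they can be scripted.
The helper names, the URL, and the destination directory are assumptions
made for illustration; only the git options themselves
(`--filter=blob:none`, `--sparse`, `sparse-checkout set`) are real.

```python
import subprocess
from typing import List


def partial_sparse_clone_commands(url: str, dest: str, dirs: List[str]) -> List[List[str]]:
    """Build the git invocations for a blobless, sparse clone limited to `dirs`."""
    return [
        # --filter=blob:none defers blob downloads until they are needed;
        # --sparse starts with only the top-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Restrict the working tree to the requested directories.
        ["git", "-C", dest, "sparse-checkout", "set", *dirs],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, raising on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(partial_sparse_clone_commands(url, "repo", ["client/android"]))`
would perform the clone. Note that this only limits what is transferred by
default; it is not access control, which is the problem described next.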

But I also need an ACL to control which directories or files people can
fetch/push, e.g. I don't want a given client to fetch the code in
"service" or "web".

Currently, if the user runs "git log -p", "git sparse-checkout add service",
or some other git command, git will automatically download the missing
objects with "git fetch --filter=blob:none --stdin <oid>".

This means that the git client and server exchange git objects
(and don't care about paths), so we cannot simply forbid someone from
downloading a "path" on the server side.

What should I do? You may recommend that I use submodules,
but due to their complexity, I don't really want to use them :-(

ZheNing Hu


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-27  8:56 Question: What's the best way to implement directory permission control in git? ZheNing Hu
@ 2022-07-27  9:17 ` Ævar Arnfjörð Bjarmason
  2022-07-28 14:54   ` ZheNing Hu
  2022-07-27  9:24 ` Thomas Guyot
  2022-07-29 23:50 ` Emily Shaffer
  2 siblings, 1 reply; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-27  9:17 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King


On Wed, Jul 27 2022, ZheNing Hu wrote:

> if there is a monorepo such as
> git@github.com:derrickstolee/sparse-checkout-example.git
>
> There are many files and directories:
>
> client/
>     android/
>     electron/
>     iOS/
> service/
>     common/
>     identity/
>     list/
>     photos/
> web/
>     browser/
>     editor/
>     friends/
> boostrap.sh
> LICENSE.md
> README.md
>
> Now we can use partial-clone + sparse-checkout to reduce
> the network overhead, and reduce disk storage space size, that's good.
>
> But I also need a ACL to control what directory or file people can fetch/push.
> e.g. I don't want a client fetch the code in "service" or "web".
>
> Now if the user client use "git log -p" or "git sparse-checkout add service"...
> or other git command, git which will  download them by
> "git fetch --filter=blob:none --stdin <oid>" automatically.
>
> This means that the git client and server interact with git objects
> (and don't care about path) we cannot simply ban someone download
> a "path" on the server side.
>
> What should I do? You may recommend me to use submodule,
> but due to its complexity, I don't really want to use it :-(

There isn't a way to do this in git.

It's theoretically possible, i.e. a client could be told that the SHA-1
of a directory is XYZ, and construct a commit object with a reference to
it.

But currently a *lot* of things in the client code assume that these
things will be available in one way or another.

The state-of-the-art in the "sparse" code may differ from the above, I
don't know.

Also note that there's a well-known edge case in the git protocol where
it's really incompatible with the notion of "secret" data, i.e. even if
you hide a ref you'll be able to "guess" it by seeing what delta(s) the
server will produce or accept etc.


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-27  8:56 Question: What's the best way to implement directory permission control in git? ZheNing Hu
  2022-07-27  9:17 ` Ævar Arnfjörð Bjarmason
@ 2022-07-27  9:24 ` Thomas Guyot
  2022-07-29 12:49   ` ZheNing Hu
  2022-07-29 23:50 ` Emily Shaffer
  2 siblings, 1 reply; 13+ messages in thread
From: Thomas Guyot @ 2022-07-27  9:24 UTC (permalink / raw)
  To: ZheNing Hu, Git List, Derrick Stolee, Junio C Hamano,
	Christian Couder, Jeff King,
	Ævar Arnfjörð Bjarmason

On 2022-07-27 04:56, ZheNing Hu wrote:
> if there is a monorepo such as
> git@github.com:derrickstolee/sparse-checkout-example.git
>
> There are many files and directories:
>
> client/
>      android/
>      electron/
>      iOS/
> service/
>      common/
>      identity/
>      list/
>      photos/
> web/
>      browser/
>      editor/
>      friends/
> boostrap.sh
> LICENSE.md
> README.md
>
> Now we can use partial-clone + sparse-checkout to reduce
> the network overhead, and reduce disk storage space size, that's good.
>
> But I also need a ACL to control what directory or file people can fetch/push.
> e.g. I don't want a client fetch the code in "service" or "web".

Pushes can easily be blocked with a pre-receive or update hook on the
server side. That covers the case where you want to prevent users from
updating certain paths in the repo.
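
A minimal sketch of such a pre-receive hook in Python follows. The
protected prefixes and the function names are assumptions for
illustration, and the policy is deliberately simple: it compares only the
old and new tip of each updated ref and conservatively rejects newly
created refs. A real deployment would also look up who is pushing.

```python
import subprocess
import sys
from typing import Iterable, List

# Hypothetical policy: path prefixes that the pushing user may not modify.
PROTECTED_PREFIXES = ("service/", "web/")

# Value git passes as <old> or <new> when a ref is created or deleted
# (40 zeros in a SHA-1 repository).
ZERO_OID = "0" * 40


def protected_paths(changed: Iterable[str], prefixes=PROTECTED_PREFIXES) -> List[str]:
    """Return the changed paths that fall under a protected prefix."""
    return [p for p in changed if p.startswith(tuple(prefixes))]


def changed_paths(old: str, new: str) -> List[str]:
    """List paths whose content differs between the trees of two commits."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "-z", old, new],
        check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def main(stdin=sys.stdin) -> int:
    """pre-receive entry point: stdin has one '<old> <new> <ref>' line per ref."""
    for line in stdin:
        old, new, ref = line.split()
        if new == ZERO_OID:
            continue  # ref deletion: no content to inspect in this sketch
        if old == ZERO_OID:
            print(f"rejected {ref}: new refs are not handled by this sketch",
                  file=sys.stderr)
            return 1
        bad = protected_paths(changed_paths(old, new))
        if bad:
            print(f"rejected {ref}: protected paths changed: {bad}", file=sys.stderr)
            return 1
    return 0
```

Installed as `hooks/pre-receive` on the server, the script would end with
`sys.exit(main())`; a non-zero exit status makes git refuse the whole push.
This only guards writes. It does nothing to restrict what can be fetched.
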
> Now if the user client use "git log -p" or "git sparse-checkout add service"...
> or other git command, git which will  download them by
> "git fetch --filter=blob:none --stdin <oid>" automatically.
>
> This means that the git client and server interact with git objects
> (and don't care about path) we cannot simply ban someone download
> a "path" on the server side.

Indeed - core devs can correct me if I'm wrong, but afaik even in the
case of sparse checkouts and partial clones the packs may include other
objects. I have no idea how git selects objects and packs to send, or
when it decides to repack objects... What I do know is that it can pack
entire repos into just a few files using delta compression, and it would
probably make sense to send those packs as-is if there is no real benefit
in repacking just the requested objects.
> What should I do? You may recommend me to use submodule,
> but due to its complexity, I don't really want to use it :-(

Submodules are definitely an option for read ACLs, and considering that git
was not originally designed to hide information within a single store, they
are probably your only option. Moreover, if the git client is able to fetch
blobs and trees directly (the latter includes partial trees, as a tree
object is a single "directory" that can contain other blobs and trees),
then even the server has no knowledge of where a tree hooks into the
hierarchy, or even what it's named. All that information would have to be
mapped elsewhere.

To take your example above, the "common" subtree of "service/" could
appear in multiple top-level directories (i.e. the same tree with the same
contents), and each top-level dir could also have a different "common"
subtree. So git would have to find where each tree object (one per
directory) is reachable from in *each revision* before deciding whether a
client should be authorized to fetch an object, and the same would be
required for blobs (and tree objects don't even know their own name;
that comes from the entry in the parent tree, or from the commit object for
the top-level tree).
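
To make that concrete, here is a small sketch that computes object IDs
the way a SHA-1 git repository does and shows that a tree's ID depends
only on its contents, not on where it is mounted. The file name and
contents are made up for illustration.

```python
import hashlib
from typing import List, Tuple


def object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <size>\\0<body>' with SHA-1, as git does for loose objects."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return object_id("blob", content)


def tree_id(entries: List[Tuple[str, str, str]]) -> str:
    """Hash a tree given (mode, name, hex object id) entries.

    Raw tree objects store directory modes as "40000" (ls-tree prints 040000)
    and sort directories as if their names ended in "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return object_id("tree", body)


# The same directory contents, imagined under service/ and under web/.
service_common = tree_id([("100644", "util.txt", blob_id(b"shared\n"))])
web_common = tree_id([("100644", "util.txt", blob_id(b"shared\n"))])

# One object, two possible locations: the ID carries no path information.
print(service_common == web_common)  # True
```

A server that is asked for that object ID alone cannot tell which of the
two directories the client is "really" asking about.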

So even before solving the client/server protocol issue you mentioned,
you can't just hide part of a repo in git right now, and changing that is
definitely not trivial.

--
Thomas


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-27  9:17 ` Ævar Arnfjörð Bjarmason
@ 2022-07-28 14:54   ` ZheNing Hu
  2022-07-28 15:50     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 13+ messages in thread
From: ZheNing Hu @ 2022-07-28 14:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Wed, Jul 27, 2022 at 17:20:
>
>
> On Wed, Jul 27 2022, ZheNing Hu wrote:
>
> > if there is a monorepo such as
> > git@github.com:derrickstolee/sparse-checkout-example.git
> >
> > There are many files and directories:
> >
> > client/
> >     android/
> >     electron/
> >     iOS/
> > service/
> >     common/
> >     identity/
> >     list/
> >     photos/
> > web/
> >     browser/
> >     editor/
> >     friends/
> > boostrap.sh
> > LICENSE.md
> > README.md
> >
> > Now we can use partial-clone + sparse-checkout to reduce
> > the network overhead, and reduce disk storage space size, that's good.
> >
> > But I also need a ACL to control what directory or file people can fetch/push.
> > e.g. I don't want a client fetch the code in "service" or "web".
> >
> > Now if the user client use "git log -p" or "git sparse-checkout add service"...
> > or other git command, git which will  download them by
> > "git fetch --filter=blob:none --stdin <oid>" automatically.
> >
> > This means that the git client and server interact with git objects
> > (and don't care about path) we cannot simply ban someone download
> > a "path" on the server side.
> >
> > What should I do? You may recommend me to use submodule,
> > but due to its complexity, I don't really want to use it :-(
>
> There isn't a way to do this in git.
>
> It's theoretically possible, i.e. a client could be told that the SHA-1
> of a directory is XYZ, and construct a commit object with a reference to
> it.
>

I guess you mean using a special reference to hold the restricted paths that
the client can access, and having a pre-receive hook ban the client from
downloading other references. But this method is a little weird... How would
this reference stay in sync with the main branches? If we change a client's
permission to access a server directory, how does it get the "history" of
that directory?

I believe this approach is neither appropriate nor maintainable.

> But currently a *lot* of things in the client code assume that these
> things will be available in one way or another.
>
> The state-of-the-art in the "sparse" code may differ from the above, I
> don't know.
>
> Also note that there's a well-known edge case in the git protocol where
> it's really incompatible with the notion of "secret" data, i.e. even if
> you hide a ref you'll be able to "guess" it by seeing what delta(s) the
> server will produce or accept etc.

Yeah, there are data security issues... unless we isolate objects
between directories, or disable delta objects in this case...
Okay, this seems a little strange.

Anyway, thanks for the answer!

ZheNing Hu


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-28 14:54   ` ZheNing Hu
@ 2022-07-28 15:50     ` Ævar Arnfjörð Bjarmason
  2022-07-29  1:48       ` Elijah Newren
  2022-07-29 13:15       ` ZheNing Hu
  0 siblings, 2 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-28 15:50 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King


On Thu, Jul 28 2022, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Wed, Jul 27, 2022 at 17:20:
>>
>>
>> On Wed, Jul 27 2022, ZheNing Hu wrote:
>>
>> > if there is a monorepo such as
>> > git@github.com:derrickstolee/sparse-checkout-example.git
>> >
>> > There are many files and directories:
>> >
>> > client/
>> >     android/
>> >     electron/
>> >     iOS/
>> > service/
>> >     common/
>> >     identity/
>> >     list/
>> >     photos/
>> > web/
>> >     browser/
>> >     editor/
>> >     friends/
>> > boostrap.sh
>> > LICENSE.md
>> > README.md
>> >
>> > Now we can use partial-clone + sparse-checkout to reduce
>> > the network overhead, and reduce disk storage space size, that's good.
>> >
>> > But I also need a ACL to control what directory or file people can fetch/push.
>> > e.g. I don't want a client fetch the code in "service" or "web".
>> >
>> > Now if the user client use "git log -p" or "git sparse-checkout add service"...
>> > or other git command, git which will  download them by
>> > "git fetch --filter=blob:none --stdin <oid>" automatically.
>> >
>> > This means that the git client and server interact with git objects
>> > (and don't care about path) we cannot simply ban someone download
>> > a "path" on the server side.
>> >
>> > What should I do? You may recommend me to use submodule,
>> > but due to its complexity, I don't really want to use it :-(
>>
>> There isn't a way to do this in git.
>>
>> It's theoretically possible, i.e. a client could be told that the SHA-1
>> of a directory is XYZ, and construct a commit object with a reference to
>> it.
>>
>
> I guess you mean use a special reference to hold the restricted path which
> the client can access, and pre-receive-hook can ban the client from downloading
> other references. But this method is a little weird... How can this reference
> sync with main branches? If we have changed client permission to access
> server directory, how to get the "history" of the server directory?
>
> I believe this approach is not very appropriate and is not maintainable.

It's not maintainable at all, and I don't believe any current git client
supports this.

But due to git's commits referring to a Merkle tree I can tell you that
a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving
you any of that content.

You *could* then manually construct a commit like:

	tree <NEW_TREE>
	...

Where the "<NEW_TREE>" would be a tree like:

	100644 blob <NEW-BLOB-SHA1>	UPDATED.md
	040000 tree <XYZ>	secret-stuff

And send you a PACK with my three new objects (commit, blob &
new top-level NEW_TREE). To the remote end & protocol it wouldn't be
distinguishable from a "normal" push.
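
As a rough illustration of why this works, the sketch below hashes such a
tree and commit while knowing only the ID of the "secret-stuff" subtree.
The secret tree ID, blob contents, and author details are placeholders
invented for the example; the hashing scheme is the one used by SHA-1
repositories.

```python
import hashlib
from typing import List, Tuple


def object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <size>\\0<body>' with SHA-1, as git does."""
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()


def tree_id(entries: List[Tuple[str, str, str]]) -> str:
    """Hash a tree from (mode, name, hex oid) entries; child contents are not needed."""
    def sort_key(entry):
        mode, name, _ = entry
        # Directories (raw mode "40000") sort as if their name ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return object_id("tree", body)


# Placeholder standing in for "XYZ": we were told this ID but hold none of
# the objects underneath it.
SECRET_TREE_OID = "0123456789abcdef0123456789abcdef01234567"

new_blob = object_id("blob", b"updated notes\n")
new_tree = tree_id([
    ("100644", "UPDATED.md", new_blob),
    ("40000", "secret-stuff", SECRET_TREE_OID),
])
commit_body = (
    f"tree {new_tree}\n"
    "author A U Thor <author@example.com> 1658937600 +0000\n"
    "committer A U Thor <author@example.com> 1658937600 +0000\n"
    "\n"
    "Update UPDATED.md\n"
).encode()
new_commit = object_id("commit", commit_body)

# These are the three new objects that would go into the pack.
print(new_commit, new_tree, new_blob)
```

Computing the IDs is the easy part. The difficulty described in the rest
of this message is that most git commands expect the referenced objects
to be present locally.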

But nothing supports this yet; as a practical matter most of git
either hard dies if content is missing, or has other odd edge-case
semantics (and I'm not up-to-date on the state of the art).

Anyway, just saying that for the longer term I'm not aware of an
*intrinsic* reason for why we couldn't support this sort of thing, in
case anyone's interested in putting in a *lot* of leg work to make it
happen.

>> But currently a *lot* of things in the client code assume that these
>> things will be available in one way or another.
>>
>> The state-of-the-art in the "sparse" code may differ from the above, I
>> don't know.
>>
>> Also note that there's a well-known edge case in the git protocol where
>> it's really incompatible with the notion of "secret" data, i.e. even if
>> you hide a ref you'll be able to "guess" it by seeing what delta(s) the
>> server will produce or accept etc.
>
> Yeah, there are data security issues... Unless we need to isolate objects
> between directories. Or in this case we disable the delta object.....
> Okay, this seems a little strange.

You can't really just "disable the delta(s)". Well, you can in
principle, but like what I outlined above it's one of those things
that's a long way off. And it's one thing to e.g. have a client that's
able to craft a commit referring to data it doesn't have.

It's quite another to secure a server in such a way that it can serve up
secret data from the repo to some clients, but not to others.

I can imagine some hacks to make that happen, but I won't go into that
here...


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-28 15:50     ` Ævar Arnfjörð Bjarmason
@ 2022-07-29  1:48       ` Elijah Newren
  2022-07-29 14:22         ` ZheNing Hu
  2022-07-29 13:15       ` ZheNing Hu
  1 sibling, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2022-07-29  1:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: ZheNing Hu, Git List, Derrick Stolee, Junio C Hamano,
	Christian Couder, Jeff King

On Thu, Jul 28, 2022 at 9:28 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Jul 28 2022, ZheNing Hu wrote:
>
> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Wed, Jul 27, 2022 at 17:20:
> >>
> >>
> >> On Wed, Jul 27 2022, ZheNing Hu wrote:
> >>
> >> > if there is a monorepo such as
> >> > git@github.com:derrickstolee/sparse-checkout-example.git
> >> >
> >> > There are many files and directories:
> >> >
> >> > client/
> >> >     android/
> >> >     electron/
> >> >     iOS/
> >> > service/
> >> >     common/
> >> >     identity/
> >> >     list/
> >> >     photos/
> >> > web/
> >> >     browser/
> >> >     editor/
> >> >     friends/
> >> > boostrap.sh
> >> > LICENSE.md
> >> > README.md
> >> >
> >> > Now we can use partial-clone + sparse-checkout to reduce
> >> > the network overhead, and reduce disk storage space size, that's good.
> >> >
> >> > But I also need a ACL to control what directory or file people can fetch/push.
> >> > e.g. I don't want a client fetch the code in "service" or "web".
> >> >
> >> > Now if the user client use "git log -p" or "git sparse-checkout add service"...
> >> > or other git command, git which will  download them by
> >> > "git fetch --filter=blob:none --stdin <oid>" automatically.
> >> >
> >> > This means that the git client and server interact with git objects
> >> > (and don't care about path) we cannot simply ban someone download
> >> > a "path" on the server side.
> >> >
> >> > What should I do? You may recommend me to use submodule,
> >> > but due to its complexity, I don't really want to use it :-(
> >>
> >> There isn't a way to do this in git.
> >>
> >> It's theoretically possible, i.e. a client could be told that the SHA-1
> >> of a directory is XYZ, and construct a commit object with a reference to
> >> it.
> >>
> >
> > I guess you mean use a special reference to hold the restricted path which
> > the client can access, and pre-receive-hook can ban the client from downloading
> > other references. But this method is a little weird... How can this reference
> > sync with main branches? If we have changed client permission to access
> > server directory, how to get the "history" of the server directory?
> >
> > I believe this approach is not very appropriate and is not maintainable.
>
> It's not maintainable at all, and I don't believe any current git client
> supports this.

I agree it's not maintainable and a bad idea.  But I did want to
correct one small thing, and I do have an alternative suggestion at
the end...

> But due to git's commits referring to a Merkle tree I can tell you that
> a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving
> you any of that content.
>
> You *could* then manually construct a commit like:
>
>         tree <NEW_TREE>
>         ...
>
> Where the "<NEW_TREE>" would be a tree like:
>
>         100644 blob <NEW-BLOB-SHA1>     UPDATED.md
>         040000 tree <XYZ>       secret-stuff
>
> And send you a PACK with my new two three new objects (commit, blob &
> new top-level NEW_TREE). To the remote end & protocol it wouldn't be
> distinguishable from a "normal" push.
>
> But nothing supports this already, as a practical matter most of git
> either hard dies if content is missing, or has other odd edge-case
> semantics (and I'm not up-to-date on the state of the art).

Actually, this is what sparse-index (as a sub-option in
sparse-checkout) already basically does.  See
Documentation/technical/sparse-index.txt for details, and note that
we're basically in Phase IV of that document.  In short, the
sparse-index makes it so that common operations based on the index do
not need and do not use information about some subtrees, so if someone
has a partial clone starting with no blobs, they will only have to
download a small subset of the repository blobs in order to handle
most Git operations, and many operations become much faster since the
index is so much smaller.

However:

* Users can run `git sparse-checkout reapply --no-sparse-index` at any
time to force the index to be full again.  This is documented, and
users are even advised to remember it in case they attempt to use
external tools (jgit? libgit2? others?) that don't understand sparse
directory entries.  So, removing this ability would be problematic.

* It makes no guarantee whatsoever that the sparse directory entries
are not expanded by less frequently used Git commands.  Notice the
"ensure_full_index()" calls sprinkled throughout the code.  Some have
been removed, one by one, as commands have been modified to better
operate with a sparse index.  The odds they'll all be removed in the
future may well be close to 0%.

* The `ort` merge strategy ignores the index altogether during
operation.  If it needs to walk into a tree to complete a
merge/rebase/revert/cherry-pick/etc., it will.  Further, it doesn't
just look into those paths, it intentionally de-sparsifies paths
involved in conflicts, so it can display it to the user.

* Just because the index is sparse does not mean other commands can't
walk into those directories.  So `git grep` (when given a revision),
`git diff`, `git log`, etc. will look in (old versions of) those
paths.

> Anyway, just saying that for the longer term I'm not aware of an
> *intrinsic* reason for why we couldn't support this sort of thing, in
> case anyone's interested in putting in a *lot* of leg work to make it
> happen.

And on top of the technical leg work required, they would also need to
somehow convince everyone else that it's worth accepting the increased
maintenance effort.  Right now, even if someone had already done the
work to implement it, I'd say it's not worth the maintenance costs.

However, there are two alternative choices I can think of here: You
can use submodules if you want a fixed part of the repository to only
be available to a subset of folks, or use josh
(https://github.com/josh-project/josh) if you need it to be more
dynamic.


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-27  9:24 ` Thomas Guyot
@ 2022-07-29 12:49   ` ZheNing Hu
  0 siblings, 0 replies; 13+ messages in thread
From: ZheNing Hu @ 2022-07-29 12:49 UTC (permalink / raw)
  To: Thomas Guyot
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King, Ævar Arnfjörð Bjarmason

Thomas Guyot <tguyot@gmail.com> wrote on Wed, Jul 27, 2022 at 17:27:
>
> On 2022-07-27 04:56, ZheNing Hu wrote:
> > if there is a monorepo such as
> > git@github.com:derrickstolee/sparse-checkout-example.git
> >
> > There are many files and directories:
> >
> > client/
> >      android/
> >      electron/
> >      iOS/
> > service/
> >      common/
> >      identity/
> >      list/
> >      photos/
> > web/
> >      browser/
> >      editor/
> >      friends/
> > boostrap.sh
> > LICENSE.md
> > README.md
> >
> > Now we can use partial-clone + sparse-checkout to reduce
> > the network overhead, and reduce disk storage space size, that's good.
> >
> > But I also need a ACL to control what directory or file people can fetch/push.
> > e.g. I don't want a client fetch the code in "service" or "web".
>
> Pushes can easily be blocked with a pre-receive or update hook on the
> server side. That covers the case where you want to prevenr users to
> update certain paths in the repo.

Agreed. A pre-receive or update hook may be the only way git can enforce an
ACL when a client pushes.

> > Now if the user client use "git log -p" or "git sparse-checkout add service"...
> > or other git command, git which will  download them by
> > "git fetch --filter=blob:none --stdin <oid>" automatically.
> >
> > This means that the git client and server interact with git objects
> > (and don't care about path) we cannot simply ban someone download
> > a "path" on the server side.
>
> Indeed - core devs can correct me if I'm wrong but afaik even in the
> case of sparse checkouts and partial clones the packs may include other
> objects. I have no ideas how git selects objects and packs on sent and
> when it decides to repack objects... What I know is it can pack entire
> repos in just a few files using delta compression and it would probably
> make sense to sent these pack if there is no real benefit in repacking
> just the requested objects.

Yeah, here we need to consider the tradeoff between repacking all the
objects under a path and sending objects one by one.

> > What should I do? You may recommend me to use submodule,
> > but due to its complexity, I don't really want to use it :-(
>
> Submodules is definitively an option for read ACLs, and considering git
> was not originally designed to hide information from a single store it's
> probably your only option. Moreover, if the git client is able to fetch
> directly blobs and trees (the later includes partial trees as a tree
> object is a single "directory" that can contain other blobs and trees),
> then even the server has no knowledge of where a tree hook into, or even
> how it's named. All that information would have to be mapped elsewhere.
>

An analogy: how does the Linux file system do ACLs?

1. The uid and gid of a file or subdirectory are recorded in its
directory entry.
2. When a user wants to access the file or subdirectory, the filesystem
checks whether the user has permission based on that uid/gid.

This is just a casual thought:

To let git imitate the Linux file system, we might need to record user
signatures in the entries of tree objects.

Then the server would only let a user download the objects that match
their signature.
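
A toy model of that idea, purely hypothetical since git tree entries have
no such field: each entry carries an optional set of readers, and the
server walks the tree, skipping anything the user may not read. The
`Entry` type, the `readers` attribute, and the user names are all
invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Entry:
    """A tree entry extended with a hypothetical per-entry reader list."""
    name: str
    readers: Optional[FrozenSet[str]] = None  # None means readable by everyone
    children: List["Entry"] = field(default_factory=list)  # empty for files


def readable_paths(entries: List[Entry], user: str, prefix: str = "") -> List[str]:
    """Return the file paths `user` may read, pruning hidden subtrees."""
    paths: List[str] = []
    for entry in entries:
        if entry.readers is not None and user not in entry.readers:
            continue  # the whole subtree below this entry stays hidden
        path = prefix + entry.name
        if entry.children:
            paths.extend(readable_paths(entry.children, user, path + "/"))
        else:
            paths.append(path)
    return paths


repo = [
    Entry("client", children=[Entry("README.md")]),
    Entry("service", readers=frozenset({"alice"}), children=[Entry("api.py")]),
    Entry("README.md"),
]

print(readable_paths(repo, "bob"))    # ['client/README.md', 'README.md']
print(readable_paths(repo, "alice"))  # ['client/README.md', 'service/api.py', 'README.md']
```

The model only works because the walk starts from a path. As discussed
elsewhere in this thread, the git protocol asks for objects by ID, which
is where the analogy with a file system breaks down.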

> To take your example above, the "common" subtree of "service/" could be
> in multiple top level directories (i,e, the same tree with same
> contents), and each top level dirs could have a different "common"
> subtree. So git would have to find where each tree object (one per
> directory) is accessible from for *each revision* before deciding if a
> client should be authorized to fetch an object, and the same would be
> required for blobs (and tree objects don't even know their own name,
> that comes from the reference in the parent tree or commit object for
> the top-level tree).
>

If the client tells the server which objects it wants, then yes, the server
needs to check all the paths that may "hold" these objects... But if the user
tells the server which path it wants, that will be easier for the server to
check...
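
The difference between the two kinds of check can be sketched as follows.
The flattened `{revision: {path: object id}}` snapshots, the short fake
object IDs, and the "allow if any holding path is readable" policy are
all assumptions chosen for illustration; they are not how any git server
works today.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def index_paths_by_oid(revisions: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each object id to every path that holds it in any revision."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for paths in revisions.values():
        for path, oid in paths.items():
            index[oid].add(path)
    return index


def path_allowed(path: str, allowed_prefixes: Tuple[str, ...]) -> bool:
    """Path-based check: a single prefix comparison."""
    return path.startswith(allowed_prefixes)


def oid_allowed(oid: str, index: Dict[str, Set[str]],
                allowed_prefixes: Tuple[str, ...]) -> bool:
    """Object-based check: needs the full reverse index over all history."""
    return any(path_allowed(p, allowed_prefixes) for p in index.get(oid, ()))


revisions = {
    "rev1": {"client/a.txt": "aaa", "service/common/util.txt": "bbb"},
    "rev2": {"client/a.txt": "aaa", "client/util.txt": "bbb",
             "service/secret.txt": "ccc"},
}
index = index_paths_by_oid(revisions)

print(path_allowed("service/secret.txt", ("client/",)))  # False
print(oid_allowed("bbb", index, ("client/",)))           # True
print(oid_allowed("ccc", index, ("client/",)))           # False
```

The path check is cheap, whereas the object check requires an index built
from every revision, and that index has to be kept up to date on each push.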

> So even before solving the client/server protocol issue you mentioned,
> you can't just hide part of a repo in git right now and changing that is
> definitively not trivial.
>

Agreed. It's hard to change...

> --
> Thomas

ZheNing Hu


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-28 15:50     ` Ævar Arnfjörð Bjarmason
  2022-07-29  1:48       ` Elijah Newren
@ 2022-07-29 13:15       ` ZheNing Hu
  1 sibling, 0 replies; 13+ messages in thread
From: ZheNing Hu @ 2022-07-29 13:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Thu, Jul 28, 2022 at 23:59:
>
>
> On Thu, Jul 28 2022, ZheNing Hu wrote:
>
> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Wed, Jul 27, 2022 at 17:20:
> >>
> >>
> >> On Wed, Jul 27 2022, ZheNing Hu wrote:
> >>
> >> > if there is a monorepo such as
> >> > git@github.com:derrickstolee/sparse-checkout-example.git
> >> >
> >> > There are many files and directories:
> >> >
> >> > client/
> >> >     android/
> >> >     electron/
> >> >     iOS/
> >> > service/
> >> >     common/
> >> >     identity/
> >> >     list/
> >> >     photos/
> >> > web/
> >> >     browser/
> >> >     editor/
> >> >     friends/
> >> > boostrap.sh
> >> > LICENSE.md
> >> > README.md
> >> >
> >> > Now we can use partial-clone + sparse-checkout to reduce
> >> > the network overhead, and reduce disk storage space size, that's good.
> >> >
> >> > But I also need a ACL to control what directory or file people can fetch/push.
> >> > e.g. I don't want a client fetch the code in "service" or "web".
> >> >
> >> > Now if the user client use "git log -p" or "git sparse-checkout add service"...
> >> > or other git command, git which will  download them by
> >> > "git fetch --filter=blob:none --stdin <oid>" automatically.
> >> >
> >> > This means that the git client and server interact with git objects
> >> > (and don't care about path) we cannot simply ban someone download
> >> > a "path" on the server side.
> >> >
> >> > What should I do? You may recommend me to use submodule,
> >> > but due to its complexity, I don't really want to use it :-(
> >>
> >> There isn't a way to do this in git.
> >>
> >> It's theoretically possible, i.e. a client could be told that the SHA-1
> >> of a directory is XYZ, and construct a commit object with a reference to
> >> it.
> >>
> >
> > I guess you mean use a special reference to hold the restricted path which
> > the client can access, and pre-receive-hook can ban the client from downloading
> > other references. But this method is a little weird... How can this reference
> > sync with main branches? If we have changed client permission to access
> > server directory, how to get the "history" of the server directory?
> >
> > I believe this approach is not very appropriate and is not maintainable.
>
> It's not maintainable at all, and I don't believe any current git client
> supports this.
>
> But due to git's commits referring to a Merkle tree I can tell you that
> a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving
> you any of that content.
>
> You *could* then manually construct a commit like:
>
>         tree <NEW_TREE>
>         ...
>
> Where the "<NEW_TREE>" would be a tree like:
>
>         100644 blob <NEW-BLOB-SHA1>     UPDATED.md
>         040000 tree <XYZ>       secret-stuff
>
> And send you a PACK with my new two three new objects (commit, blob &
> new top-level NEW_TREE). To the remote end & protocol it wouldn't be
> distinguishable from a "normal" push.
>
> But nothing supports this already, as a practical matter most of git
> either hard dies if content is missing, or has other odd edge-case
> semantics (and I'm not up-to-date on the state of the art).
>
> Anyway, just saying that for the longer term I'm not aware of an
> *intrinsic* reason for why we couldn't support this sort of thing, in
> case anyone's interested in putting in a *lot* of leg work to make it
> happen.
>

As Newren said, this is just like what sparse-index does. When I use
partial clone + sparse-checkout + sparse-index for git add/git commit,
git adds and commits correctly without fetching any extra objects.
But we can't prevent users from downloading other directories or files.

> >> But currently a *lot* of things in the client code assume that these
> >> things will be available in one way or another.
> >>
> >> The state-of-the-art in the "sparse" code may differ from the above, I
> >> don't know.
> >>
> >> Also note that there's a well-known edge case in the git protocol where
> >> it's really incompatible with the notion of "secret" data, i.e. even if
> >> you hide a ref you'll be able to "guess" it by seeing what delta(s) the
> >> server will produce or accept etc.
> >
> > Yeah, there are data security issues... Unless we need to isolate objects
> > between directories. Or in this case we disable the delta object.....
> > Okay, this seems a little strange.
>
> You can't really just "disable the delta(s)". Well, you can in
> principle, but like what I outlined above it's one of those things
> that's a far way off, and it's one thing to e.g. have a client that's
> able to craft a commit referring to data it doesn't have.
>
> It's quite another to secure a server in such a way that it can serve up
> secret data from the repo to some clients, but not to others.
>

All right... I might have to think of something else.

> I can imagine some hacks to make that happen, but I won't go into that
> here...

ZheNing Hu

* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-29  1:48       ` Elijah Newren
@ 2022-07-29 14:22         ` ZheNing Hu
  2022-07-29 14:57           ` rsbecker
  0 siblings, 1 reply; 13+ messages in thread
From: ZheNing Hu @ 2022-07-29 14:22 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason, Git List, Derrick Stolee,
	Junio C Hamano, Christian Couder, Jeff King

Elijah Newren <newren@gmail.com> wrote on Fri, Jul 29, 2022 at 09:48:

> > But due to git's commits referring to a Merkle tree I can tell you that
> > a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving
> > you any of that content.
> >
> > You *could* then manually construct a commit like:
> >
> >         tree <NEW_TREE>
> >         ...
> >
> > Where the "<NEW_TREE>" would be a tree like:
> >
> >         100644 blob <NEW-BLOB-SHA1>     UPDATED.md
> >         040000 tree <XYZ>       secret-stuff
> >
> > And send you a PACK with my three new objects (commit, blob &
> > new top-level NEW_TREE). To the remote end & protocol it wouldn't be
> > distinguishable from a "normal" push.
> >
> > But nothing supports this already, as a practical matter most of git
> > either hard dies if content is missing, or has other odd edge-case
> > semantics (and I'm not up-to-date on the state of the art).
>
> Actually, this is what sparse-index (as a sub-option in
> sparse-checkout) already basically does.  See
> Documentation/technical/sparse-index.txt for details, and note that
> we're basically in Phase IV of that document.  In short, the
> sparse-index makes it so that common operations based on the index do
> not need and do not use information about some subtrees, so if someone
> has a partial clone starting with no blobs, they will only have to
> download a small subset of the repository blobs in order to handle
> most Git operations, and many operations become much faster since the
> index is so much smaller.
>

I think this is mainly due to sparse-checkout rather than sparse-index.
Even without the sparse-index, we can run git add and git commit without
fetching other blob objects.

But the sparse-index does help reduce the size of the index.
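
A minimal Python sketch of the setup being discussed, assuming a git new
enough that `git sparse-checkout set` accepts `--cone` and
`--[no-]sparse-index`. The function names and the destination directory are
hypothetical. The sketch only builds and prints the command lines; call
run_all() to execute them.

```python
import subprocess


def sparse_partial_clone_commands(url, dest, dirs, sparse_index=True):
    """Build git invocations for a blobless partial clone limited to `dirs`."""
    index_flag = "--sparse-index" if sparse_index else "--no-sparse-index"
    return [
        # Partial clone: commits and trees now, blobs fetched on demand.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Limit the working tree (and optionally the index) to `dirs`.
        ["git", "-C", dest, "sparse-checkout", "set", "--cone",
         index_flag, *dirs],
    ]


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = sparse_partial_clone_commands(
    "git@github.com:derrickstolee/sparse-checkout-example.git",
    "sparse-checkout-example",
    ["client/android"],
)
for argv in commands:
    print(" ".join(argv))
```

Passing sparse_index=False changes only the index representation. Neither
variant restricts what the client is able to fetch later.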

> However:
>
> * Users can run `git sparse-checkout reapply --no-sparse-index` at any
> time to force the index to be full again.  This is documented, and
> even suggested that users remember in case they attempt to use
> external tools (jgit? libgit2? others?) that don't understand sparse
> directory entries.  So, removing this ability would be problematic.
>

Or `git sparse-checkout disable`? Either way, when git finds that objects are
missing, it fetches them from the remote, and we could do the ACL check there.
jgit/libgit2/others would then simply fail to fetch those objects (in this
special case?)

> * It makes no guarantee whatsoever that the sparse directory entries
> are not expanded by less frequently used Git commands.  Notice the
> "ensure_full_index()" calls sprinkled throughout the code.  Some have
> been removed, one by one, as commands have been modified to better
> operate with a sparse index.  The odds they'll all be removed in the
> future may well be close to 0%.
>

That's good...

> * The `ort` merge strategy ignores the index altogether during
> operation.  If it needs to walk into a tree to complete a
> merge/rebase/revert/cherry-pick/etc., it will.  Further, it doesn't
> just look into those paths, it intentionally de-sparsifies paths
> involved in conflicts, so it can display it to the user.
>

So the user has to care about and deal with a merge conflict in a directory
that he "doesn't have access to"...

It would be nice if the user only had to care about conflicts in the
directories/files he has permission for. I don't know whether that would be
very difficult to design.

> * Just because the index is sparse does not mean other commands can't
> walk into those directories.  So `git grep` (when given a revision),
> `git diff`, `git log`, etc. will look in (old versions of) those
> paths.
>

Agree.

> > Anyway, just saying that for the longer term I'm not aware of an
> > *intrinsic* reason for why we couldn't support this sort of thing, in
> > case anyone's interested in putting in a *lot* of leg work to make it
> > happen.
>
> And on top of the technical leg work required, they would also need to
> somehow convince everyone else that it's worth accepting the increased
> maintenance effort.  Right now, even if someone had already done the
> work to implement it, I'd say it's not worth the maintenance costs.
>
> However, there are two alternative choices I can think of here: You
> can use submodules if you want a fixed part of the repository to only
> be available to a subset of folks, or use josh
> (https://github.com/josh-project/josh) if you need it to be more
> dynamic.

Thanks, I will take a look.

ZheNing Hu

* RE: Question: What's the best way to implement directory permission control in git?
  2022-07-29 14:22         ` ZheNing Hu
@ 2022-07-29 14:57           ` rsbecker
  0 siblings, 0 replies; 13+ messages in thread
From: rsbecker @ 2022-07-29 14:57 UTC (permalink / raw)
  To: 'ZheNing Hu', 'Elijah Newren'
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Git List', 'Derrick Stolee',
	'Junio C Hamano', 'Christian Couder',
	'Jeff King'

On July 29, 2022 10:22 AM, ZheNing Hu wrote:
>Elijah Newren <newren@gmail.com> wrote on Fri, Jul 29, 2022 at 09:48:
>
>> > But due to git's commits referring to a Merkle tree I can tell you
>> > that a subdirectory "secret" has a current tree SHA-1 of XYZ,
>> > without giving you any of that content.
>> >
>> > You *could* then manually construct a commit like:
>> >
>> >         tree <NEW_TREE>
>> >         ...
>> >
>> > Where the "<NEW_TREE>" would be a tree like:
>> >
>> >         100644 blob <NEW-BLOB-SHA1>     UPDATED.md
>> >         040000 tree <XYZ>       secret-stuff
>> >
>> > And send you a PACK with my three new objects (commit, blob
>> > & new top-level NEW_TREE). To the remote end & protocol it wouldn't
>> > be distinguishable from a "normal" push.
>> >
>> > But nothing supports this already, as a practical matter most of git
>> > either hard dies if content is missing, or has other odd edge-case
>> > semantics (and I'm not up-to-date on the state of the art).
>>
>> Actually, this is what sparse-index (as a sub-option in
>> sparse-checkout) already basically does.  See
>> Documentation/technical/sparse-index.txt for details, and note that
>> we're basically in Phase IV of that document.  In short, the
>> sparse-index makes it so that common operations based on the index do
>> not need and do not use information about some subtrees, so if someone
>> has a partial clone starting with no blobs, they will only have to
>> download a small subset of the repository blobs in order to handle
>> most Git operations, and many operations become much faster since the
>> index is so much smaller.
>>
>
>I think this is mainly due to sparse-checkout instead of sparse-index.
>Without the sparse-index, we also can do git add, git commit without fetching
>other blob objects.
>
>But sparse-index can help reduce the size of indexes.
>
>> However:
>>
>> * Users can run `git sparse-checkout reapply --no-sparse-index` at any
>> time to force the index to be full again.  This is documented, and
>> even suggested that users remember in case they attempt to use
>> external tools (jgit? libgit2? others?) that don't understand sparse
>> directory entries.  So, removing this ability would be problematic.
>>
>
>Or `git sparse-checkout disable`? Whatever, when git finds other objects missing,
>it will fetch the objects from remote, and we may do ACL check here.
>Just let jgit/libgit2/others fail to fetch objects (in this special case?)
>
>> * It makes no guarantee whatsoever that the sparse directory entries
>> are not expanded by less frequently used Git commands.  Notice the
>> "ensure_full_index()" calls sprinkled throughout the code.  Some have
>> been removed, one by one, as commands have been modified to better
>> operate with a sparse index.  The odds they'll all be removed in the
>> future may well be close to 0%.
>>
>
>That's good...
>
>> * The `ort` merge strategy ignores the index altogether during
>> operation.  If it needs to walk into a tree to complete a
>> merge/rebase/revert/cherry-pick/etc., it will.  Further, it doesn't
>> just look into those paths, it intentionally de-sparsifies paths
>> involved in conflicts, so it can display it to the user.
>>
>
>So the user has to care and deal with a merge conflict in a directory that he
>"doesn't have access to"...
>
>It would be nice to have the user only care about conflicts in directories/files to
>which he has permissions. I don't know if it would be very difficult to design.
>
>> * Just because the index is sparse does not mean other commands can't
>> walk into those directories.  So `git grep` (when given a revision),
>> `git diff`, `git log`, etc. will look in (old versions of) those
>> paths.
>>
>
>Agree.
>
>> > Anyway, just saying that for the longer term I'm not aware of an
>> > *intrinsic* reason for why we couldn't support this sort of thing,
>> > in case anyone's interested in putting in a *lot* of leg work to
>> > make it happen.
>>
>> And on top of the technical leg work required, they would also need to
>> somehow convince everyone else that it's worth accepting the increased
>> maintenance effort.  Right now, even if someone had already done the
>> work to implement it, I'd say it's not worth the maintenance costs.
>>
>> However, there are two alternative choices I can think of here: You
>> can use submodules if you want a fixed part of the repository to only
>> be available to a subset of folks, or use josh
>> (https://github.com/josh-project/josh) if you need it to be more
>> dynamic.
>
>Thanks, I will take a look.

As a completely side perspective on this, I had to integrate security
management with five separate security subsystems/mechanisms (not joking)
on the NonStop platform that included Unix-style Access Control Lists
(ACLs), non-inode ACLs on the NonStop side of the platform, and some
recent new thing called XOS - I don't know it yet but provisioned for it.

The solution I ended up with was writing a full Workflow wrapper around
git that does things similar to GitHub Actions, so after an operation like
checkout/switch, merge, pull, etc., specific rules specified in YAML in
the repo (if enabled by the user) are run that apply the ACLs. It is a
very heavy-weight solution to the problem but works pretty well on this
"exotic" platform - Workflows were needed for other reasons as well, so I
just piggybacked the security handling into my Workflow structure. Again,
not built into git but wrapped around it. I could have used hooks for some
of it but needed support for more operations than hooks had.
--Randall


* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-27  8:56 Question: What's the best way to implement directory permission control in git? ZheNing Hu
  2022-07-27  9:17 ` Ævar Arnfjörð Bjarmason
  2022-07-27  9:24 ` Thomas Guyot
@ 2022-07-29 23:50 ` Emily Shaffer
  2022-07-31 16:15   ` ZheNing Hu
  2022-08-01 10:14   ` Han-Wen Nienhuys
  2 siblings, 2 replies; 13+ messages in thread
From: Emily Shaffer @ 2022-07-29 23:50 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King, Ævar Arnfjörð Bjarmason

On Wed, Jul 27, 2022 at 1:56 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> if there is a monorepo such as
> git@github.com:derrickstolee/sparse-checkout-example.git
>
> There are many files and directories:
>
> client/
>     android/
>     electron/
>     iOS/
> service/
>     common/
>     identity/
>     list/
>     photos/
> web/
>     browser/
>     editor/
>     friends/
> boostrap.sh
> LICENSE.md
> README.md
>
> Now we can use partial-clone + sparse-checkout to reduce
> the network overhead, and reduce disk storage space size, that's good.
>
> But I also need a ACL to control what directory or file people can fetch/push.
> e.g. I don't want a client fetch the code in "service" or "web".
>
> Now if the user client use "git log -p" or "git sparse-checkout add service"...
> or other git command, git which will  download them by
> "git fetch --filter=blob:none --stdin <oid>" automatically.
>
> This means that the git client and server interact with git objects
> (and don't care about path) we cannot simply ban someone download
> a "path" on the server side.
>
> What should I do? You may recommend me to use submodule,
> but due to its complexity, I don't really want to use it :-(

As a quick note, there is some effort on making submodules less
complex, at least from the user perspective. My team and I have been
actively working on improvements in that area for the past year or so.
Please feel free to read and examine the design doc[1] to see if the
future looks brighter in that direction than you thought - or, even
better, if there's something missing from that design that would be
compelling in allowing you to use submodules to solve your use case.

As for differing ACLs within a single repository... Google has had
some attempts at it and has only found pain, at least where Git is
involved. As others have mentioned elsewhere downthread, it doesn't
really match Git's data model.

Gerrit has tried to support something sort of similar to this -
per-branch read permissions. They were really painful! So much so that
our Gerrit team is actively discouraging their use, and in the process
of deprecating them. It turns out that on the server side, calculating
permissions for which commit should be visible is very expensive,
because you are not just saying "is commit abcdef on
forbidden-branch?" but rather are saying "is commit abcdef on
forbidden-branch *and not on any branches $user is allowed to see*?"
The same calculation woes would be true of per-object or per-tree
permissions, because Git will treat 'everyone/can/see/.linter.config'
and 'very/secret/dir/.linter.config' as a single object with a single
ID if the contents of each '.linter.config' are identical. It is still
very expensive for the server to decide whether or not it's okay to
send a certain object. Part of the reason the branch ACL calculation
is so painful is that we have some repositories with many many
branches (100,000+); if you're using a very large monorepo you will
probably find similarly expensive and complex calculations even in a
single repository.
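
A minimal Python sketch of the '.linter.config' case, assuming SHA-1 blob
IDs. The file contents, the extra keys.txt path, and the function names are
hypothetical. It shows why a path-based rule cannot be answered from an
object ID alone: the server first needs a reverse map from each object to
every path at which it appears.

```python
import hashlib
from collections import defaultdict


def blob_id(content: bytes) -> str:
    """Blob IDs depend only on content, never on the path."""
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def paths_by_object(files):
    """Reverse map: object ID -> every path holding that content."""
    index = defaultdict(set)
    for path, content in files.items():
        index[blob_id(content)].add(path)
    return index


def may_send(oid, index, allowed_prefixes):
    """Allow an object if at least one of its paths is readable."""
    return any(path.startswith(allowed_prefixes) for path in index[oid])


files = {
    "everyone/can/see/.linter.config": b"indent = 4\n",
    "very/secret/dir/.linter.config": b"indent = 4\n",
    "very/secret/dir/keys.txt": b"hunter2\n",
}
index = paths_by_object(files)

shared = blob_id(b"indent = 4\n")
print(sorted(index[shared]))                 # both .linter.config paths
print(may_send(shared, index, ("everyone/",)))                  # True
print(may_send(blob_id(b"hunter2\n"), index, ("everyone/",)))   # False
```

This covers a single snapshot only. A real server would need the same map
across every commit a user may reach, which is where the cost comes from.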

Generally, this isn't something I'd like to see Git support - I think
it would by necessity be kludgey and has some very pointy edge cases
for the user (what if I'm trying to merge from another branch and
there is a conflict in very/secret/dir/, but I'm not allowed to see
it?). But of course Git is open source, and my opinion is only one of
many; I just wanted to share some past pain that we've had in this
area.

 - Emily

1: https://lore.kernel.org/git/YHofmWcIAidkvJiD@google.com/

* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-29 23:50 ` Emily Shaffer
@ 2022-07-31 16:15   ` ZheNing Hu
  2022-08-01 10:14   ` Han-Wen Nienhuys
  1 sibling, 0 replies; 13+ messages in thread
From: ZheNing Hu @ 2022-07-31 16:15 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Git List, Derrick Stolee, Junio C Hamano, Christian Couder,
	Jeff King, Ævar Arnfjörð Bjarmason

Emily Shaffer <emilyshaffer@google.com> wrote on Sat, Jul 30, 2022 at 07:50:
>
> On Wed, Jul 27, 2022 at 1:56 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > if there is a monorepo such as
> > git@github.com:derrickstolee/sparse-checkout-example.git
> >
> > There are many files and directories:
> >
> > client/
> >     android/
> >     electron/
> >     iOS/
> > service/
> >     common/
> >     identity/
> >     list/
> >     photos/
> > web/
> >     browser/
> >     editor/
> >     friends/
> > boostrap.sh
> > LICENSE.md
> > README.md
> >
> > Now we can use partial-clone + sparse-checkout to reduce
> > the network overhead, and reduce disk storage space size, that's good.
> >
> > But I also need a ACL to control what directory or file people can fetch/push.
> > e.g. I don't want a client fetch the code in "service" or "web".
> >
> > Now if the user client use "git log -p" or "git sparse-checkout add service"...
> > or other git command, git which will  download them by
> > "git fetch --filter=blob:none --stdin <oid>" automatically.
> >
> > This means that the git client and server interact with git objects
> > (and don't care about path) we cannot simply ban someone download
> > a "path" on the server side.
> >
> > What should I do? You may recommend me to use submodule,
> > but due to its complexity, I don't really want to use it :-(
>
> As a quick note, there is some effort on making submodules less
> complex, at least from the user perspective. My team and I have been
> actively working on improvements in that area for the past year or so.
> Please feel free to read and examine the design doc[1] to see if the
> future looks brighter in that direction than you thought - or, even
> better, if there's something missing from that design that would be
> compelling in allowing you to use submodules to solve your use case.
>

Thanks, I think the submodule improvements may change my perception.
But the question I'm facing is whether I need permission control on
all "subdirectories" (if I find out that this is not necessary,
then submodules might be an option).

> As for differing ACLs within a single repository... Google has had
> some attempts at it and has only found pain, at least where Git is
> involved. As others have mentioned elsewhere downthread, it doesn't
> really match Git's data model.
>

That's so sad :(

> Gerrit has tried to support something sort of similar to this -
> per-branch read permissions. They were really painful! So much so that
> our Gerrit team is actively discouraging their use, and in the process
> of deprecating them. It turns out that on the server side, calculating
> permissions for which commit should be visible is very expensive,
> because you are not just saying "is commit abcdef on
> forbidden-branch?" but rather are saying "is commit abcdef on
> forbidden-branch *and not on any branches $user is allowed to see*?"
> The same calculation woes would be true of per-object or per-tree
> permissions, because Git will treat 'everyone/can/see/.linter.config'
> and 'very/secret/dir/.linter.config' as a single object with a single
> ID if the contents of each '.linter.config' are identical. It is still
> very expensive for the server to decide whether or not it's okay to
> send a certain object. Part of the reason the branch ACL calculation
> is so painful is that we have some repositories with many many
> branches (100,000+); if you're using a very large monorepo you will
> probably find similarly expensive and complex calculations even in a
> single repository.
>

Agreed. As Ævar said, there is delta data too (so data cannot easily be
hidden).

> Generally, this isn't something I'd like to see Git support - I think
> it would by necessity be kludgey and has some very pointy edge cases
> for the user (what if I'm trying to merge from another branch and
> there is a conflict in very/secret/dir/, but I'm not allowed to see
> it?). But of course Git is open source, and my opinion is only one of
> many; I just wanted to share some past pain that we've had in this
> area.
>

To summarize (your ideas and those from the other answers), I have reason
to believe that git itself cannot easily solve this directory permissions
problem:
1. Files with the same object ID can be in different directories
(data cannot be isolated).
2. Deltas can share data between multiple objects
(data cannot be isolated).
3. Permission management is very cumbersome and time-consuming,
especially on large repositories.
4. Whether merge conflicts in inaccessible directories should be shown
to the user is a big problem.

>  - Emily
>
> 1: https://lore.kernel.org/git/YHofmWcIAidkvJiD@google.com/

Thanks.

ZheNing Hu

* Re: Question: What's the best way to implement directory permission control in git?
  2022-07-29 23:50 ` Emily Shaffer
  2022-07-31 16:15   ` ZheNing Hu
@ 2022-08-01 10:14   ` Han-Wen Nienhuys
  1 sibling, 0 replies; 13+ messages in thread
From: Han-Wen Nienhuys @ 2022-08-01 10:14 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: ZheNing Hu, Git List, Derrick Stolee, Junio C Hamano,
	Christian Couder, Jeff King,
	Ævar Arnfjörð Bjarmason

On Sat, Jul 30, 2022 at 1:50 AM Emily Shaffer <emilyshaffer@google.com> wrote:
> Gerrit has tried to support something sort of similar to this -
> per-branch read permissions. They were really painful! So much so that
> our Gerrit team is actively discouraging their use, and in the process
> of deprecating them. It turns out that on the server side, calculating
> permissions for which commit should be visible is very expensive,
> because you are not just saying "is commit abcdef on
> forbidden-branch?" but rather are saying "is commit abcdef on
> forbidden-branch *and not on any branches $user is allowed to see*?"
> The same calculation woes would be true of per-object or per-tree
> permissions, because Git will treat 'everyone/can/see/.linter.config'
> and 'very/secret/dir/.linter.config' as a single object with a single
> ID if the contents of each '.linter.config' are identical. It is still
> very expensive for the server to decide whether or not it's okay to
> send a certain object. Part of the reason the branch ACL calculation
> is so painful is that we have some repositories with many many
> branches (100,000+); if you're using a very large monorepo you will
> probably find similarly expensive and complex calculations even in a
> single repository.

Thanks Emily,

I agree with your points, but as the manager of Google's Gerrit team,
I just wanted to add a few clarifications:

* The max number of branches we have on repositories is O(1000s). IIRC
our Android repositories are the worst offenders, because there is a
combinatorial explosion of {major release, minor release, target
device}. Pending reviews number in the millions, but we usually don't
have to evaluate ACLs fully, as the review refs aren't downloaded
commonly.

* The read ACLs are assigned to {branch-regexp, group} tuples. This
means that you can't precompute visibility either, because each
individual user may be in a different set of groups.

* Even disregarding that, you can still do optimizations if updates
are FF (because each update only increases the visibility of each
commit). However, non-FF branch updates preclude such precomputations.
(Gerrit has non-FF updates in a number of places).

* The Gerrit team isn't actively deprecating read ACLs: the problem is
hard, because removing read ACLs on branches means that the read ACLs
move to repository level, which implies setting up complex ACL
configuration and replication infrastructure for repositories to
address existing use cases. It's currently just one of these features
that we wish hadn't been added, but now that it's there, we suffer
through it.
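
A minimal Python sketch of the visibility question, assuming read ACLs are
(branch-regexp, group) pairs as described above. The commit graph, branch
names, group names, and function names are hypothetical; this is not
Gerrit's implementation. A commit is visible if it is reachable from any
branch the user's groups may read, so the answer depends on the user and
has to be recomputed per group set.

```python
import re


def reachable(parents, tips):
    """All commits reachable from `tips` by following parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def visible_commits(parents, branches, read_acls, user_groups):
    """Commits reachable from any branch the user's groups may read."""
    allowed_tips = [
        tip
        for name, tip in branches.items()
        if any(group in user_groups and re.fullmatch(pattern, name)
               for pattern, group in read_acls)
    ]
    return reachable(parents, allowed_tips)


# a <- b <- c (main), and b <- s (secret/x)
parents = {"a": [], "b": ["a"], "c": ["b"], "s": ["b"]}
branches = {"main": "c", "secret/x": "s"}
read_acls = [("main", "everyone"), (r"secret/.*", "security-team")]

print(sorted(visible_commits(parents, branches, read_acls, {"everyone"})))
# ['a', 'b', 'c']
print(sorted(visible_commits(parents, branches, read_acls,
                             {"everyone", "security-team"})))
# ['a', 'b', 'c', 's']
```

Commit b sits on secret/x but is still visible to everyone because it is
also on main, which is the "and not on any allowed branch" condition from
the quoted text.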

More generally, read permissions are hard to get right in a monorepo:
even if you stop developers from accessing the code through Git fetch,
the permissions must also be enforced throughout the entire dev stack,
including code browsing, code search, viewing CI artifacts etc.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.

Thread overview: 13+ messages
2022-07-27  8:56 Question: What's the best way to implement directory permission control in git? ZheNing Hu
2022-07-27  9:17 ` Ævar Arnfjörð Bjarmason
2022-07-28 14:54   ` ZheNing Hu
2022-07-28 15:50     ` Ævar Arnfjörð Bjarmason
2022-07-29  1:48       ` Elijah Newren
2022-07-29 14:22         ` ZheNing Hu
2022-07-29 14:57           ` rsbecker
2022-07-29 13:15       ` ZheNing Hu
2022-07-27  9:24 ` Thomas Guyot
2022-07-29 12:49   ` ZheNing Hu
2022-07-29 23:50 ` Emily Shaffer
2022-07-31 16:15   ` ZheNing Hu
2022-08-01 10:14   ` Han-Wen Nienhuys
