git@vger.kernel.org mailing list mirror (one of many)
* Question regarding quarantine environments
@ 2018-08-02 17:58 Liam Decker
  2018-08-02 18:39 ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Liam Decker @ 2018-08-02 17:58 UTC
  To: git

Hi all,

I've been working on a git hook in golang recently. However, the library I
was using did not support a possible quarantine directory, which I would
use for my hook.

I have been trying to find out how git finds this incoming directory in the
objects folder, as their code simply assumed it resided in
.git/objects/<1st byte>/<last 19 bytes>
I read the documentation describing the git repository layout [1] as well as
the objects documentation [2], and the git hooks documentation [3] directed me
to the receive-pack documentation [4]. I have also tried googling, but this is
a pretty specific question.

The solution that I implemented was to check the objects directory for the
object, and if it was not there, to look for a quarantine directory and try
there. However, that feels fairly inefficient.
For the curious, the library and solution I attempted are both here [5]

If anyone could point me to how, specifically, git looks up objects in the
repository, I would be very grateful.

[1] https://git-scm.com/docs/gitrepository-layout
[2] https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
[3] https://git-scm.com/docs/githooks
[4] https://git-scm.com/docs/git-receive-pack
[5] https://github.com/src-d/go-git/pull/887

* Re: Question regarding quarantine environments
  2018-08-02 17:58 Question regarding quarantine environments Liam Decker
@ 2018-08-02 18:39 ` Jeff King
  2018-08-03  2:49   ` Jonathan Nieder
  2018-08-03 12:56   ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2018-08-02 18:39 UTC
  To: Liam Decker; +Cc: git

On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote:

> I've been working on a git hook in golang recently. However, the library I
> was using did not support a possible quarantine directory, which I would
> use for my hook.
> 
> I have been trying to find out how git finds this incoming directory in the
> objects folder, as their code simply assumed it resided in
> .git/objects/<1st byte>/<last 19 bytes>

When you're running a hook inside the quarantine environment, then
$GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine
directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main
repository object directory (possibly alongside other alternates, if
there were any already set).

Any Git commands you run should therefore find objects from either
location, but any writes would go to the quarantine (most notably, Git's
own index-pack/unpack-objects processes, which is the point of the
quarantine in the first place).

> The solution that I implemented was to check the objects directory for the
> object, and if it was not there, to look for a quarantine directory and try
> there. However, that feels fairly inefficient.

That's more or less what Git will do under the hood (though in the
opposite order).

> For the curious, the library and solution I attempted are both here [5]

Just skimming, but it sounds like go-git does not support the
GIT_OBJECT_DIRECTORY environment variable.

-Peff

* Re: Question regarding quarantine environments
  2018-08-02 18:39 ` Jeff King
@ 2018-08-03  2:49   ` Jonathan Nieder
  2018-08-03 12:56   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2018-08-03  2:49 UTC
  To: Jeff King; +Cc: Liam Decker, git

Hi,

Jeff King wrote:
> On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote:

>> The solution that I implemented was to check the objects directory for the
>> object, and if it was not there, to look for a quarantine directory and try
>> there. However, that feels fairly inefficient.
>
> That's more or less what Git will do under the hood (though in the
> opposite order).
>
>> For the curious, the library and solution I attempted are both here [5]
>
> Just skimming, but it sounds like go-git does not support the
> GIT_OBJECT_DIRECTORY environment variable.

To be clear: we don't guarantee that the quarantine directory in the
future will be where it is today.

So as Peff hinted, supporting GIT_OBJECT_DIRECTORY in go-git is likely
to be the best way forward for your tool.

Thanks and hope that helps,
Jonathan

* Re: Question regarding quarantine environments
  2018-08-02 18:39 ` Jeff King
  2018-08-03  2:49   ` Jonathan Nieder
@ 2018-08-03 12:56   ` Ævar Arnfjörð Bjarmason
  2018-08-03 13:00     ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-03 12:56 UTC
  To: Jeff King; +Cc: Liam Decker, git


On Thu, Aug 02 2018, Jeff King wrote:

> On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote:
>
>> I've been working on a git hook in golang recently. However, the library I
>> was using did not support a possible quarantine directory, which I would
>> use for my hook.
>>
>> I have been trying to find out how git finds this incoming directory in the
>> objects folder, as their code simply assumed it resided in
>> .git/objects/<1st byte>/<last 19 bytes>
>
> When you're running a hook inside the quarantine environment, then
> $GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine
> directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main
> repository object directory (possibly alongside other alternates, if
> there were any already set).
>
> Any Git commands you run should therefore find objects from either
> location, but any writes would go to the quarantine (most notably, Git's
> own index-pack/unpack-objects processes, which is the point of the
> quarantine in the first place).

To add to this, one interesting thing that you can do with hooks because
of this quarantine is to answer certain questions about the push that
were prohibitively expensive before it existed, but there's no explicit
documentation for this.

E.g. for a hook that wants to ban big blobs in the repo, but wants to
allow all existing blobs (you don't want to block e.g. a revert of a
commit that removed it from the checkout), you can juggle these two env
variables and hide the "main" object dir from the hook for some
operations, so e.g. if a blob lookup succeeds in the alternate
quarantine dir, but not the main object dir, you know it's new.
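One way a Go hook might "juggle these two env variables" is to run `git cat-file -e` (which exits zero if the object exists) with `GIT_ALTERNATE_OBJECT_DIRECTORIES` removed, so that only `$GIT_OBJECT_DIRECTORY` is searched. This is a hedged sketch: `envWithout` and `inQuarantineOnly` are hypothetical names, and it assumes the hook runs inside the quarantine (so `GIT_OBJECT_DIRECTORY` is set) and that the quarantine directory has no `info/alternates` of its own.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// envWithout returns a copy of env (KEY=VALUE strings) with the named
// variables removed.
func envWithout(env []string, names ...string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		drop := false
		for _, n := range names {
			if strings.HasPrefix(kv, n+"=") {
				drop = true
				break
			}
		}
		if !drop {
			out = append(out, kv)
		}
	}
	return out
}

// inQuarantineOnly reports whether `git cat-file -e` finds oid when the
// main object store is hidden. Assumes GIT_OBJECT_DIRECTORY points at the
// quarantine; if it is unset, Git falls back to the main objects dir.
func inQuarantineOnly(oid string) (bool, error) {
	cmd := exec.Command("git", "cat-file", "-e", oid)
	cmd.Env = envWithout(os.Environ(), "GIT_ALTERNATE_OBJECT_DIRECTORIES")
	err := cmd.Run()
	if err == nil {
		return true, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return false, nil // non-zero exit: object missing or invalid
	}
	return false, err // git could not be started at all
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: quarantine-check <object-id>")
		os.Exit(2)
	}
	ok, err := inQuarantineOnly(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ok)
}
```

Spawning one process per object is slow for large pushes; feeding many IDs to `git cat-file --batch-check` under the same trimmed environment would likely scale better.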

>> The solution that I implemented was to check the objects directory for the
>> object, and if it was not there, to look for a quarantine directory and try
>> there. However, that feels fairly inefficient.
>
> That's more or less what Git will do under the hood (though in the
> opposite order).
>
>> For the curious, the library and solution I attempted are both here [5]
>
> Just skimming, but it sounds like go-git does not support the
> GIT_OBJECT_DIRECTORY environment variable.
>
> -Peff

* Re: Question regarding quarantine environments
  2018-08-03 12:56   ` Ævar Arnfjörð Bjarmason
@ 2018-08-03 13:00     ` Jeff King
  2018-08-03 13:25       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2018-08-03 13:00 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Liam Decker, git

On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Any Git commands you run should therefore find objects from either
> > location, but any writes would go to the quarantine (most notably, Git's
> > own index-pack/unpack-objects processes, which is the point of the
> > quarantine in the first place).
> 
> To add to this, one interesting thing that you can do with hooks because
> of this quarantine is to answer certain questions about the push that
> were prohibitively expensive before it existed, but there's no explicit
> documentation for this.
> 
> E.g. for a hook that wants to ban big blobs in the repo, but wants to
> allow all existing blobs (you don't want to block e.g. a revert of a
> commit that removed it from the checkout), you can juggle these two env
> variables and hide the "main" object dir from the hook for some
> operations, so e.g. if a blob lookup succeeds in the alternate
> quarantine dir, but not the main object dir, you know it's new.

I'd be a bit careful with that, though, as the definition of "new" is
vague there.

For example, completing a thin pack may mean that the receiver creates a
copy of a base object found in the main repo. That object isn't new as
part of the push, nor was it even sent on the wire, but it will appear
in the quarantine directory. But only sometimes, depending on whether we
kept the sender's pack or exploded it to loose objects.

-Peff

* Re: Question regarding quarantine environments
  2018-08-03 13:00     ` Jeff King
@ 2018-08-03 13:25       ` Ævar Arnfjörð Bjarmason
  2018-08-03 13:29         ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-03 13:25 UTC
  To: Jeff King; +Cc: Liam Decker, git


On Fri, Aug 03 2018, Jeff King wrote:

> On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Any Git commands you run should therefore find objects from either
>> > location, but any writes would go to the quarantine (most notably, Git's
>> > own index-pack/unpack-objects processes, which is the point of the
>> > quarantine in the first place).
>>
>> To add to this, one interesting thing that you can do with hooks because
>> of this quarantine is to answer certain questions about the push that
>> were prohibitively expensive before it existed, but there's no explicit
>> documentation for this.
>>
>> E.g. for a hook that wants to ban big blobs in the repo, but wants to
>> allow all existing blobs (you don't want to block e.g. a revert of a
>> commit that removed it from the checkout), you can juggle these two env
>> variables and hide the "main" object dir from the hook for some
>> operations, so e.g. if a blob lookup succeeds in the alternate
>> quarantine dir, but not the main object dir, you know it's new.
>
> I'd be a bit careful with that, though, as the definition of "new" is
> vague there.
>
> For example, completing a thin pack may mean that the receiver creates a
> copy of a base object found in the main repo. That object isn't new as
> part of the push, nor was it even sent on the wire, but it will appear
> in the quarantine directory. But only sometimes, depending on whether we
> kept the sender's pack or exploded it to loose objects.

Right, I mean:

    is_new = in_quarantine() && !in_main()

Or:

    is_new = !in_main()

Either should work; in the latter case, if the object really is missing from
the quarantine too, other fsck bits will stop the push.

But as you point out:

    is_new = in_quarantine()

Cannot be relied upon, although it'll be true most of the time.
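The three predicates can be rendered as a small Go sketch, with the first one in its intended sense (found in the quarantine, absent from the main repository). The `objectStore` interface and the in-memory `setStore` are hypothetical stand-ins for real lookups; the sample data models the thin-pack case from this thread, where an existing base object is copied into the quarantine next to a genuinely new one.

```go
package main

import "fmt"

// objectStore is a hypothetical lookup interface; a real hook might back it
// with `git cat-file -e` run under different environments.
type objectStore interface {
	Has(oid string) bool
}

// setStore is an in-memory objectStore used only for illustration.
type setStore map[string]bool

func (s setStore) Has(oid string) bool { return s[oid] }

// isNewStrict: present in the quarantine but absent from the main repo.
func isNewStrict(quarantine, mainRepo objectStore, oid string) bool {
	return quarantine.Has(oid) && !mainRepo.Has(oid)
}

// isNewLoose: absent from the main repo. If the object is missing from the
// quarantine as well, Git's other checks are expected to reject the push.
func isNewLoose(mainRepo objectStore, oid string) bool {
	return !mainRepo.Has(oid)
}

// isNewUnreliable: present in the quarantine. Misfires when a thin pack was
// completed with a copy of a base object that already exists in main.
func isNewUnreliable(quarantine objectStore, oid string) bool {
	return quarantine.Has(oid)
}

func main() {
	mainRepo := setStore{"base": true}
	// "base" was copied into the quarantine to complete a thin pack;
	// "blob" is genuinely new.
	quarantine := setStore{"base": true, "blob": true}
	for _, oid := range []string{"base", "blob"} {
		fmt.Printf("%s strict=%v loose=%v unreliable=%v\n", oid,
			isNewStrict(quarantine, mainRepo, oid),
			isNewLoose(mainRepo, oid),
			isNewUnreliable(quarantine, oid))
	}
}
```

Running it prints `base strict=false loose=false unreliable=true` and then `blob strict=true loose=true unreliable=true`: only the quarantine-only check mislabels the copied base object as new.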

Perhaps I'm missing some edge case above, but I wanted to reword it to
make sure I understood it correctly (and perhaps you have a correction).

* Re: Question regarding quarantine environments
  2018-08-03 13:25       ` Ævar Arnfjörð Bjarmason
@ 2018-08-03 13:29         ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2018-08-03 13:29 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Liam Decker, git

On Fri, Aug 03, 2018 at 03:25:08PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I'd be a bit careful with that, though, as the definition of "new" is
> > vague there.
> >
> > For example, completing a thin pack may mean that the receiver creates a
> > copy of a base object found in the main repo. That object isn't new as
> > part of the push, nor was it even sent on the wire, but it will appear
> > in the quarantine directory. But only sometimes, depending on whether we
> > kept the sender's pack or exploded it to loose objects.
> 
> Right, I mean:
> 
>     is_new = in_quarantine() && !in_main()
> 
> Or:
> 
>     is_new = !in_main()
> 
> Either should work; in the latter case, if the object really is missing from
> the quarantine too, other fsck bits will stop the push.

Ah, OK. Yes, I agree that should work to cover new objects (including
ones that the other side sent but aren't actually needed to update the
refs, though hopefully that is rare).

There may also be other object stores, if the main repository used
alternates (or if somebody set GIT_ALTERNATE_OBJECT_DIRECTORIES). You
can probably disregard that, though, as:

  1. If you ignore the main repo, presumably you ignore its
     recursive info/alternates, too.

  2. The easy mechanism for ignoring the main repo is to ignore
     GIT_ALTERNATE_OBJECT_DIRECTORIES, so you'd already be handling
     that.

> But as you point out:
> 
>     is_new = in_quarantine()
> 
> Cannot be relied upon, although it'll be true most of the time.
> 
> Perhaps I'm missing some edge case above, but I wanted to reword it to
> make sure I understood it correctly (and perhaps you have a correction).

Nope, I just didn't think through what you were saying carefully enough. ;)

-Peff

Thread overview: 7+ messages
2018-08-02 17:58 Question regarding quarantine environments Liam Decker
2018-08-02 18:39 ` Jeff King
2018-08-03  2:49   ` Jonathan Nieder
2018-08-03 12:56   ` Ævar Arnfjörð Bjarmason
2018-08-03 13:00     ` Jeff King
2018-08-03 13:25       ` Ævar Arnfjörð Bjarmason
2018-08-03 13:29         ` Jeff King