From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Git mailing list <git@vger.kernel.org>,
Mark Thomas <markbt@efaref.net>,
Jeff Hostetler <git@jeffhostetler.com>,
Kevin David <kevin.david@microsoft.com>
Subject: Re: Proposal for missing blob support in Git repos
Date: Tue, 2 May 2017 20:32:05 +0200 [thread overview]
Message-ID: <CACBZZX6jQtO_3zYjnvq0dhtWvUxb7vYLtQUWpFHLw1v-SteHcQ@mail.gmail.com> (raw)
In-Reply-To: <CAGf8dgK05+f4uX-8+iMFvQd0n2JP6YxJ18ag8uDaEH6qc6SgVQ@mail.gmail.com>
On Tue, May 2, 2017 at 7:21 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon, May 1, 2017 at 6:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>>> On 05/01/2017 04:29 PM, Junio C Hamano wrote:
>>>> Jonathan Tan <jonathantanmy@google.com> writes:
>>>>
>>>>> Thanks for your comments. If you're referring to the codepath
>>>>> involving write_sha1_file() (for example, builtin/hash-object ->
>>>>> index_fd or builtin/unpack-objects), that is fine because
>>>>> write_sha1_file() invokes freshen_packed_object() and
>>>>> freshen_loose_object() directly to check if the object already exists
>>>>> (and thus does not invoke the new mechanism in this patch).
>>>>
>>>> Is that a good thing, though? It means that you an attacker can
>>>> feed one version to the remote object store your "grab blob" hook
>>>> gets the blobs from, and have you add a colliding object locally,
>>>> and the usual "are we recording the same object as existing one?"
>>>> check is bypassed.
>>>
>>> If I understand this correctly, what you mean is the situation where
>>> the hook adds an object to the local repo, overriding another object
>>> of the same name?
>>
>> No.
>>
>> write_sha1_file() pays attention to objects already in the local
>> object store to avoid hash collisions that can be used to replace a
>> known-to-be-good object and that is done as a security measure.
>> What I am reading in your response was that this new mechanism
>> bypasses that, and I was wondering if that is a good thing.
>
> Oh, what I meant was that write_sha1_file() bypasses the new
> mechanism, not that the new mechanism bypasses the checks in
> write_sha1_file().
>
> To be clear, here's what happens when write_sha1_file() is invoked
> (before and after this patch - this patch does not affect
> write_sha1_file at all):
> 1. (some details omitted)
> 2. call freshen_packed_object
> 3, call freshen_loose_object if necessary
> 4. write object (if freshen_packed_object and freshen_loose_object do
> not both return 0)
>
> Nothing changes in this patch (whether a hook is defined or not).
But don't the semantics change in the sense that before
core.missingBlobCommand you couldn't write a new blob SHA1 that was
already part of your history, whereas with this change
write_sha1_file() might write what it considers to be a new blob, but
it's actually colliding with an existing blob, but write_sha1_file()
doesn't know that because knowing would involve asking the hook to
fetch the blob?
> And here's what happens when has_sha1_file (or another function listed
> in the commit message) is invoked:
> 1. check for existence of packed object of the requested name
> 2. check for existence of loose object of the requested name
> 3. check again for existence of packed object of the requested name
> 4. if a hook is defined, invoke the hook and repeat 1-3
>
> Here, in step 4, the hook could do whatever it wants to the repository.
This might be a bit of early bikeshedding, but then again the lack of
early bikeshedding tends to turn into standards.
Wouldn't it be better to name this core.missingObjectCommand & have
the hook take a list on stdin like:
<id> <TAB> <object_type> <TAB> <object_id> <TAB> <request_type> <TAB> [....]
And have the hook respond:
<id> <TAB> <status> <TAB> [....]
I.e. what you'd do now is send this to the hook:
1 <TAB> blob <TAB> <sha1> <TAB> missing
And the hook would respond:
1 <TAB> ok
But this leaves open the door addressing this potential edge case with
writing new blobs in the future, i.e. write_sha1_file() could call it
as:
1 <TAB> blob <TAB> <sha1> <TAB> new
And the hook could either respond immediately as:
1 <TAB> ok
If it's in some #YOLO mode where it's not going to check for colliding
blobs over the network, or alternatively & ask the parent repo if it
has those blobs, and if so print:
1 <TAB> collision
Or something like that.
This also enables future lazy loading of trees/commits from the same
API, and for the hook to respond out-of-order to the input it gets as
it can, since each request is prefixed with an incrementing request
id.
next prev parent reply other threads:[~2017-05-02 18:32 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-26 22:13 Proposal for missing blob support in Git repos Jonathan Tan
2017-05-01 3:57 ` Junio C Hamano
2017-05-01 19:12 ` Jonathan Tan
2017-05-01 23:29 ` Junio C Hamano
2017-05-02 0:33 ` Jonathan Tan
2017-05-02 0:38 ` Brandon Williams
2017-05-02 1:41 ` Junio C Hamano
2017-05-02 17:21 ` Jonathan Tan
2017-05-02 18:32 ` Ævar Arnfjörð Bjarmason [this message]
2017-05-02 21:45 ` Jonathan Tan
2017-05-04 4:29 ` Junio C Hamano
2017-05-04 17:09 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CACBZZX6jQtO_3zYjnvq0dhtWvUxb7vYLtQUWpFHLw1v-SteHcQ@mail.gmail.com \
--to=avarab@gmail.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=kevin.david@microsoft.com \
--cc=markbt@efaref.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).