git@vger.kernel.org mailing list mirror (one of many)
From: Masaya Suzuki <masayasuzuki@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] doc: describe Git bundle format
Date: Fri, 31 Jan 2020 15:57:55 -0800
Message-ID: <CAJB1erXqK-a2uDPPQDLpdLYnPC8Mcxjo2ER0qSAsD9DOVHSmGQ@mail.gmail.com>
In-Reply-To: <xmqqy2tn8c3w.fsf@gitster-ct.c.googlers.com>

On Fri, Jan 31, 2020 at 3:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Masaya Suzuki <masayasuzuki@google.com> writes:
>
> >> > +prerequisite = "-" obj-id SP comment LF
> >> > +comment      = *CHAR
> >>
> >> Do readers know what CHAR consists of?  Anything other than NUL and
> >> LF?
> >
> > RFC 5234 defines core rules
> > (https://tools.ietf.org/html/rfc5234#appendix-B.1), and these CHAR etc
> > are defined there. It should be OK to use these rules.
>
> That's not what I asked.  Do readers know that?  Did you tell them
> that we expect they are familiar with the RFC convention?

The patch says "We will use ABNF notation to define the Git bundle
format. See protocol-common.txt for the details.", and
protocol-common.txt says "ABNF notation as described by RFC 5234 is
used within the protocol documents, except the following replacement
core rules are used:". To interpret this ABNF definition, reading RFC
5234 alone is not enough; the reader also has to read
protocol-common.txt. Otherwise, they cannot tell what `obj-id` and
`refname` are. Those are not defined in RFC 5234. They're defined in
protocol-common.txt.

Given that (1) this document tells the reader at the beginning to see
protocol-common.txt, and (2) protocol-common.txt is needed to
interpret this definition and says that ABNF is described by RFC 5234,
readers who have read the first sentence and the documents it
references should know that ABNF is defined in RFC 5234 and that LF,
CHAR, and SP are part of that definition.

>
> It might be easier to make the above simple ABNF understandable to
> those without knowledge of RFC 5234 by spelling out what CHAR in the
> context of the above description means.  Or to tell them "go over
> there and learn CHAR then come back".  We need to do one of them.

As I said above, the first sentence says "See protocol-common.txt",
which includes the reference to the RFC and defines the other
non-terminals. Note that it is not only CHAR: obj-id and refname are
not defined here either. Readers need to consult protocol-common.txt
for their definitions.
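
To make the reading of the grammar concrete, here is a minimal sketch
of a parser for one prerequisite line. It is only an illustration, not
what Git does: it assumes obj-id is exactly 40 lowercase hex digits
(SHA-1) and treats comment as any run of RFC 5234 CHAR (%x01-7F) other
than LF. The name parse_prerequisite is hypothetical.

```python
import re

# Assumptions (not taken from the patch): obj-id is 40 lowercase hex digits,
# and comment is zero or more RFC 5234 CHAR octets (%x01-7F) excluding LF.
PREREQ_RE = re.compile(rb"-([0-9a-f]{40}) ([\x01-\x09\x0b-\x7f]*)\n")


def parse_prerequisite(line: bytes):
    """Return (obj_id, comment) for a prerequisite line, or None if it does not match."""
    m = PREREQ_RE.fullmatch(line)
    if m is None:
        return None
    return m.group(1).decode("ascii"), m.group(2).decode("ascii")
```

Under these assumptions, a comment containing NUL or any non-ASCII
byte is rejected, which is the kind of detail a reader can only work
out once they know what CHAR means.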

>
> > I want to make sure I understand the meaning of prerequisites.
> >
> > 1. Are they meant for a delta base? Or are they meant to represent a
> > partial/shallow state?
>
> They are meant as the "bottom boundary" of the range of the pack
> data stored in the bundle.
>
> Think of "git rev-list --objects $heads --not $prerequisites".  If
> we limit ourselves to commits, in the simplest case, "git log
> maint..master".  Imagine your repository has everything up to
> 'maint' (and nothing else) and then you are "git fetch"-ing from
> another repository that advanced the tip that now points at
> 'master'.  Imagine the data transferred over the network.  Imagine
> that data is frozen on disk somehow.  That is what a bundle is.
>
> So, 'maint' is the prerequisite---for the person who builds the
> bundle, it can safely be assumed that the bundle will be used only
> by those who already have 'maint'.
>
> There is nothing about 'partial' or 'shallow'.  And even though a
> bundle typically has deltified objects in the packfile, it does not
> have to.  Some objects are deltified against prerequisites, and the
> logic to generate thin packs may even prefer to use the
> prerequisites as the delta base, but it is merely a side effect that
> the prerequisites are at the "bottom boundary" of the range.

OK. Then it's better to make this clear. If you follow the analogy of
a saved git-fetch response, these prerequisites could be interpreted
the same way as the "shallow" lines of a shallow clone response.
They are more like the "have" lines of a git-fetch request.
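
To illustrate the "bottom boundary" reading, here is a small sketch on
a toy commit graph. It is a simplification: it only tracks commits
(no trees or blobs), and the names reachable, bundle_range, and the
commit labels are made up for this example. The bundle carries what is
reachable from the heads minus what is reachable from the
prerequisites, in the spirit of "git log maint..master".

```python
def reachable(parents, tips):
    """Return every commit reachable from tips by following parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def bundle_range(parents, heads, prerequisites):
    """Commits the bundle must carry: reachable from heads but not from prerequisites."""
    return reachable(parents, heads) - reachable(parents, prerequisites)


# Toy linear history A <- B <- C <- D, with 'maint' at B and 'master' at D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(sorted(bundle_range(parents, heads=["D"], prerequisites=["B"])))  # ['C', 'D']
```

Here B plays the role of a "have" line: the receiver is assumed to
already have B and everything below it, so none of that is included.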

> > 2. Do they need to be commits? Or can they be any object type?
> >
> > From what I can see, it seems that they should always be commits.
> >
> > 3. Does the receiver have to have all reachable objects from prerequisites?
>
> I would say that the receiver needs to have everything that is
> needed to "complete" prereqs.
>
> Bundle transfer predates shallow or incomplete repositories, but I
> think that we can (and we should if needed) update it to adjust to
> these situations by using the appropriate definition of what it
> means to "complete".  In a lazy clone, it may be sufficient to have
> promisor remote that has everything reachable from them.  In a
> shallow clone, the repository may have to be deep enough to have
> them and objects immediately reachable from them (e.g. trees and
> blobs for a commit at the "bottom boundary").

I think there are two kinds of completeness for a packfile:

* Delta complete: If an object in a packfile is deltified, the delta
base exists in the same packfile.
* Object complete: If an object in a packfile contains a reference to
another object, that object exists in the same packfile.

For example, an initial shallow clone response should contain a
delta-complete, object-incomplete packfile. An incremental fetch
response and a bundle with prereqs would have a delta-incomplete,
object-incomplete packfile. Creating a delta-incomplete,
object-complete packfile is possible (e.g. create a parallel history
with all blobs slightly modified and deltify it against the original
branch; this gives a packfile that contains every object of one
history, all deltified against the other history), but it's a rare
case.
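
The two definitions can be sketched as checks over a toy model of a
packfile. This is not Git's pack format: the dict layout, the field
names "delta_base" and "refs", and the object labels are assumptions
made up for this example.

```python
def is_delta_complete(pack):
    """True if every deltified object has its delta base in the same pack."""
    return all(
        obj["delta_base"] is None or obj["delta_base"] in pack
        for obj in pack.values()
    )


def is_object_complete(pack):
    """True if every object referenced from inside the pack is also in the pack."""
    return all(ref in pack for obj in pack.values() for ref in obj["refs"])


# Initial shallow clone: no external delta bases, but the parent commit is absent.
shallow_pack = {
    "commit2": {"delta_base": None, "refs": ["commit1", "tree2"]},
    "tree2": {"delta_base": None, "refs": ["blob2"]},
    "blob2": {"delta_base": None, "refs": []},
}

# Bundle with a prerequisite: blob2 is stored as a delta against blob1, which
# the receiver is expected to have already, and commit1 is absent as well.
bundle_pack = {
    "commit2": {"delta_base": None, "refs": ["commit1", "tree2"]},
    "tree2": {"delta_base": None, "refs": ["blob2"]},
    "blob2": {"delta_base": "blob1", "refs": []},
}

print(is_delta_complete(shallow_pack), is_object_complete(shallow_pack))  # True False
print(is_delta_complete(bundle_pack), is_object_complete(bundle_pack))    # False False
```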

The reader of a bundle SHOULD have all objects reachable from prereqs.
