git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Hariom Verma <hariom18599@gmail.com>
Subject: Re: [PATCH] [GSOC] ref-filter: add contents:raw atom
Date: Fri, 21 May 2021 21:12:35 +0800	[thread overview]
Message-ID: <CAOLTT8Q9meKiHG=TvwsW49dwWtzBNnkfT03+dQrvKDLGuLiOFg@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD1yLS9UBs0r6_GjB-K6prW7GNxR+z445HJe8cR4HYLewA@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> 于2021年5月21日周五 下午5:10写道:
>
> > Okay, some explanations are indeed missing here:
>
> By the way I forgot to say that your patch might want to also properly
> document %(contents:raw). If the doc was updated as part of the patch,
> maybe the whole patch would be easier to understand.
>

Indeed so.

> > %(contents) will discard the metadata part of the object file,
>
> It's not clear what you mean when you talk about metadata in objects.
> Are you taking about only the headers, like the "tree XXX", "parent
> YYY", etc lines in commits, and the "object OOO", "type TTT", etc
> lines in tags? Or does it also includes the lines in tree objects that
> reference other trees or blobs?
>

Well, the metadata here only refers to the header of the commit and
tag objects, "tree XXX", "parent YYY"...  The lines in tree objects that
reference other trees or blobs will not output. What we have to do here
is to simulate the behavior of `git cat-file --batch`. But we encounter a
very serious problem here:

Now we want to use for-each-ref to print a ref point to a tree:

git for-each-ref --format="%(contents:raw)" refs/mytrees/first

will output:

100644 one

but

git cat-file tree refs/mytrees/first

will output:

100644 onem�cֈ��q�D�֧hJ)E-100644 two.t���0�+��VjC��eV�ӈq

which is the compressed data, it may contains '\0'.

Whne we use `append_atom()` to add the contents of the tree object
to the buffer, notice that it uses `strbuf_addstr()`, the underlying call
is `strlen()`, which truncates the data we added. Can we have any good
remedies? For example, record the length of "v->s" by "v->s_size" and
use `strbuf_addstr(&state->stack->output, v->s, v->s_size)`?

> > and only print the data contents part of it. %(contents:raw)
> > can will not discard the metadata part of the object file, this
>
> s/can//
>
> >  means that it can print the "raw" content of an object.
>
> The above seems to be changing the definition of %(contents) (as
> previously it was only the commit message of a commit or the tag
> message a tag) to explain %(contents:raw)...
>
> > In addition, %(contents:raw) can support print commit, blob,
> > tag, tree objects contents which %(contents) can only support
> > commit,tag objects.
>
> ...but this is saying that the definition of %(contents) actually
> doesn't change. This doesn't seem logical to me.
>
> If %(contents:raw) can support any kind of object (commit, blob, tag
> or tree) then it would be logical that %(contents) also support any
> kind of object.
>
> So if you really want to define %(contents:raw) as the raw object
> contents, you might want to consider a preparatory patch that would
> first change the definition of %(contents) to be the object contents
> without the headers. This would keep the same output for any commit or
> tag. But for blobs and trees it would print the whole content of the
> object in the same way as `git cat-file -p` does, instead of nothing.
>

Yes, I agree with you. Another thing worth mentioning is:
%(contents:raw) or %(contents) is not similar to "git cat-file -p foo"
they are more like "git cat-file <type> foo" or
"git rev-parse foo | git cat-file --batch", "git cat-file -p foo" It will
process the original data:
For example, a tree object, with "git cat-file -p",
will use "ls-tree" to output all the sub-trees and blobs it connects.

$ git cat-file -p refs/mytrees/first
100644 blob 6dde63d6888cec71ea82449bd6a7684a29452d7f    one
100644 blob f719efd430d52bcfc8566a43b2eb655688d38871    two.t

but with "git cat-file <type>" will output uncompressed data:

git cat-file tree refs/mytrees/first
100644 onem�cֈ��q�D�֧hJ)E-100644 two.t���0�+��VjC��eV�ӈq%

> I think this acceptable as refs rarely point to something other than
> commits, so people are not likely to rely on the fact that %(contents)
> currently prints nothing for blobs and trees.
>
> > E.g:
> >
> > git for-each-ref --format=%(contents:raw) refs/heads/foo
> >
> > will have the same output as:
> >
> > git rev-parse refs/heads/foo | git cat-file --batch
>
> Ok, I think that would indeed be useful.
>
> > > Also is %(contents:raw) supposed to print something for a blob or a
> > > tree, while I guess %(contents) is printing nothing for them?
> >
> > Now my thoughts are:
> > Let %(contents) learn to print four kinds of objects.
> > and then let %(contents:raw) learn to print metadata.
> > I will split it into two patches.
>
> Yeah, great!
>

Thanks!
--
ZheNing Hu

  reply	other threads:[~2021-05-21 13:13 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-20  8:49 [PATCH] [GSOC] ref-filter: add contents:raw atom ZheNing Hu via GitGitGadget
2021-05-20 11:29 ` Hariom verma
2021-05-20 14:34   ` ZheNing Hu
2021-05-20 16:20 ` Christian Couder
2021-05-21  4:43   ` ZheNing Hu
2021-05-21  9:10     ` Christian Couder
2021-05-21 13:12       ` ZheNing Hu [this message]
2021-05-21 13:39         ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOLTT8Q9meKiHG=TvwsW49dwWtzBNnkfT03+dQrvKDLGuLiOFg@mail.gmail.com' \
    --to=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).