git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Stefan Beller <sbeller@google.com>,
	Hariom verma <hariom18599@gmail.com>
Subject: Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
Date: Tue, 24 Aug 2021 09:11:14 +0200	[thread overview]
Message-ID: <CAP8UFD0pZVCHG_i+u_6QAP5yKUpmdYm+fkwr9aJMCDXukKF_7g@mail.gmail.com> (raw)
In-Reply-To: <CAOLTT8St0PGPsw6zki2fwO3iVG_4W=s6GEqZfp3cmH=xngcO2A@mail.gmail.com>

On Sat, Aug 21, 2021 at 4:16 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> 于2021年8月20日周五 下午11:58写道:
> >
> > On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@gmail.com> wrote:

> > > -static void grab_sub_body_contents(struct atom_value *val, int deref,
> > > struct expand_data *data)
> > > +static void grab_sub_body_contents(struct atom_value *val, int deref,
> > > struct expand_data *data, int eaten)
> > >  {
> > >         int i;
> > >         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> > > @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> > > atom_value *val, int deref, struct exp
> > >                         unsigned long buf_size = data->size;
> > >
> > >                         if (atom->u.raw_data.option == RAW_BARE) {
> > > -                               v->s = buf;
> > > +                               if (eaten)
> > > +                                       v->s = xmemdupz(buf, buf_size);
> >
> > Can the 'buf' already be eaten at this point? I thought that it could
> > be eaten only through v->s so after this code.
>
> I think where the 'buf' is eaten is parse_object_buffer().

Yeah, right, and parse_object_buffer() seems to be called by
get_object() which calls grab_values() after that. So the buf can
indeed be eaten at that point.

> set_commit_buffer()
> or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
> manually.

Yeah, but it doesn't mean that we cannot use the buf. So perhaps we
could still use it, even if it's eaten.

> In grab_sub_body_contents() we can indeed write the address of
> 'buf' to v->s, but what I worry about is that before we completely write v->s to
> the output buffer through append_atom(), will this buf be freed in
> somewhere else?

Yeah, that's a good question. But actually it seems that the buf is
freed (at least in get_object()) only when it's not eaten.

> > > +                               else
> > > +                                       v->s = buf;
> > >                                 v->s_size = buf_size;
> > >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> > >                                 v->s = xstrfmt_len(&v->s_size,
> > > "%"PRIuMAX, (uintmax_t)buf_size);
> > >
> > > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > > needs to be copied, but blob/tag not. You said that the number of commits
> > > is generally significantly greater than the others. It seems that we cannot
> > > make full use of this idea. But remember the "skip_parse_buffer" before?
> > > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> > >
> > > In other words, those default paths of git cat-file --batch are included in it!
> > > So from the perspective of performance regression, this patch will be
> > > beneficial.
> >
> > Yeah, it seems that we can benefit from something like this, but it'd
>
> Only when the data allocated to us is dynamically allocated and we do have
> the ownership of it, we can benefit by reducing copies and allocate
> memory again.
>
> > be nice if things were clarified a bit more.
>
> OK. In get_object(), eaten means that the oi->content we obtained is cached
> by git in parse_object_buffer(). This means that we don't need to free this buf
> manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
> parse_buffer() when using some atoms, which can avoid some meaningless parsing.
> So under this path, our'buf' is not cached. This means we have
> ownership of it, so we
> can free it freely, and benefit from this.

In the patch we are not freeing it, we only duplicate it when it's
eaten. So I am not sure that the above explanation are clear enough.
That's the issue I have with this patch, but maybe the commit message
can now be discussed and improved after sending the patch series to
the mailing list instead of in this thread.

Thanks!

  reply	other threads:[~2021-08-24  7:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16 14:00 [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content? ZheNing Hu
2021-08-17 14:34 ` Fwd: " ZheNing Hu
2021-08-17 16:09 ` Christian Couder
2021-08-18  4:51   ` ZheNing Hu
2021-08-18  8:53     ` Christian Couder
2021-08-18  9:07       ` ZheNing Hu
2021-08-18 11:11         ` ZheNing Hu
2021-08-19  1:39           ` ZheNing Hu
2021-08-20 16:13             ` Christian Couder
2021-08-21  2:36               ` ZheNing Hu
2021-08-20 15:58           ` Christian Couder
2021-08-21  2:16             ` ZheNing Hu
2021-08-24  7:11               ` Christian Couder [this message]
2021-08-25  8:11                 ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP8UFD0pZVCHG_i+u_6QAP5yKUpmdYm+fkwr9aJMCDXukKF_7g@mail.gmail.com \
    --to=christian.couder@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).