git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Stefan Beller <sbeller@google.com>,
	Hariom verma <hariom18599@gmail.com>
Subject: Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
Date: Sat, 21 Aug 2021 10:16:02 +0800	[thread overview]
Message-ID: <CAOLTT8St0PGPsw6zki2fwO3iVG_4W=s6GEqZfp3cmH=xngcO2A@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD0aDrqHxNp2ztecmZR49gs=SVJTN1NUzJvz6+ewpY-_wA@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> 于2021年8月20日周五 下午11:58写道:
>
> On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> 于2021年8月18日周三 下午5:07写道:
> > > I think the 'buf' is not freed in ref-filter logic.
> > >
> > > But one thing worth noting is:
> > >
> > > parse_object_buffer() may take this buffer away, and store it in
> > > tree->buffer or use set_commit_buffer() to store it in commit_slab.
> > >
> > > So in theory, as long as we don’t use parse_object_buffer(), this
> > > dynamically allocated memory can be used freely for us. If we use
> > > parse_object_buffer() and free the buffer, there seems to be a risk,
> > > but it does not appear so far.
> > >
> >
> > When parse_object_buffer() hints that we don’t free the buffer when
> > eaten == 1, I have a better idea, If the buffer is "eaten", we copy it,
> > otherwise we use the originally allocated space.
> >
> > -static void grab_sub_body_contents(struct atom_value *val, int deref,
> > struct expand_data *data)
> > +static void grab_sub_body_contents(struct atom_value *val, int deref,
> > struct expand_data *data, int eaten)
> >  {
> >         int i;
> >         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> > @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> > atom_value *val, int deref, struct exp
> >                         unsigned long buf_size = data->size;
> >
> >                         if (atom->u.raw_data.option == RAW_BARE) {
> > -                               v->s = buf;
> > +                               if (eaten)
> > +                                       v->s = xmemdupz(buf, buf_size);
>
> Can the 'buf' already be eaten at this point? I thought that it could
> be eaten only through v->s so after this code.
>

I think where the 'buf' is eaten is parse_object_buffer(). set_commit_buffer()
or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
manually. In grab_sub_body_contents() we can indeed write the address of
'buf' to v->s, but what I worry about is that before we completely write v->s to
the output buffer through append_atom(), will this buf be freed in
somewhere else?

> > +                               else
> > +                                       v->s = buf;
> >                                 v->s_size = buf_size;
> >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> >                                 v->s = xstrfmt_len(&v->s_size,
> > "%"PRIuMAX, (uintmax_t)buf_size);
> >
> > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > needs to be copied, but blob/tag not. You said that the number of commits
> > is generally significantly greater than the others. It seems that we cannot
> > make full use of this idea. But remember the "skip_parse_buffer" before?
> > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> >
> > In other words, those default paths of git cat-file --batch are included in it!
> > So from the perspective of performance regression, this patch will be
> > beneficial.
>
> Yeah, it seems that we can benefit from something like this, but it'd

Only when the data allocated to us is dynamically allocated and we do have
the ownership of it, we can benefit by reducing copies and allocate
memory again.

> be nice if things were clarified a bit more.

OK. In get_object(), eaten means that the oi->content we obtained is cached
by git in parse_object_buffer(). This means that we don't need to free this buf
manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
parse_buffer() when using some atoms, which can avoid some meaningless parsing.
So under this path, our'buf' is not cached. This means we have
ownership of it, so we
can free it freely, and benefit from this.

Thanks.
--
ZheNing Hu

  reply	other threads:[~2021-08-21  2:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16 14:00 [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content? ZheNing Hu
2021-08-17 14:34 ` Fwd: " ZheNing Hu
2021-08-17 16:09 ` Christian Couder
2021-08-18  4:51   ` ZheNing Hu
2021-08-18  8:53     ` Christian Couder
2021-08-18  9:07       ` ZheNing Hu
2021-08-18 11:11         ` ZheNing Hu
2021-08-19  1:39           ` ZheNing Hu
2021-08-20 16:13             ` Christian Couder
2021-08-21  2:36               ` ZheNing Hu
2021-08-20 15:58           ` Christian Couder
2021-08-21  2:16             ` ZheNing Hu [this message]
2021-08-24  7:11               ` Christian Couder
2021-08-25  8:11                 ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOLTT8St0PGPsw6zki2fwO3iVG_4W=s6GEqZfp3cmH=xngcO2A@mail.gmail.com' \
    --to=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).