From: ZheNing Hu <adlternative@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Hariom Verma <hariom18599@gmail.com>
Subject: Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
Date: Tue, 6 Apr 2021 16:53:15 +0800
Message-ID: <CAOLTT8SoRPbcqCM33RkqQ0_rWmax7aAati0Q7L7x9JGBcVjPzA@mail.gmail.com>
In-Reply-To: <CAPig+cRzv3sPHpOY_ZGBu8mGp=gy6E+c9ige3-AHh2DM+YcKjw@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> wrote on Tue, Apr 6, 2021 at 1:05 AM:
>
> On Mon, Apr 5, 2021 at 10:01 AM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> > [...]
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
>
> Was there a discussion leading up to this change? If so, it may be a
> good idea to provide a link to it in the mailing list here under the
> "---" line.
>

Okay, I will add the link to the cover letter next time.

> Some comments below...
>
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >         struct ref_format format = REF_FORMAT_INIT;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> >
> >         for (i = 0; i < maxcount; i++)
> > -               show_ref_array_item(array.items[i], &format);
> > +               show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
> >         ref_array_clear(&array);
> >         return 0;
> >  }
>
> The call to strbuf_reset() within show_ref_array_item() does not free
> the memory from the strbufs, so the memory is being leaked. Therefore,
> at the end of this function, you should have:
>
>     strbuf_release(&final_buf);
>     strbuf_release(&error_buf);
>

Thanks, I overlooked this point.
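
To illustrate the leak discussed above, here is a minimal sketch of the
difference between resetting and releasing a growable buffer. It does not
use git's strbuf: `struct sbuf`, `sbuf_addstr()`, `sbuf_reset()`,
`sbuf_release()` and `reuse_in_loop()` are hypothetical, simplified
stand-ins written only for this example, under the assumption that a reset
keeps the allocation and a release frees it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for git's struct strbuf; NOT the real implementation. */
struct sbuf {
	size_t alloc;
	size_t len;
	char *buf;
};
#define SBUF_INIT { 0, 0, NULL }

/* Append a NUL-terminated string, growing the heap buffer as needed. */
static void sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n = strlen(s);

	if (sb->len + n + 1 > sb->alloc) {
		size_t new_alloc = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, new_alloc);

		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = new_alloc;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/* Analogous to strbuf_reset(): forget the contents, keep the allocation. */
static void sbuf_reset(struct sbuf *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

/* Analogous to strbuf_release(): free the allocation. */
static void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->alloc = 0;
	sb->len = 0;
}

/*
 * Reuse one buffer for `n` outputs. Returns the capacity that was still
 * held after the final reset, i.e. what would leak without the release.
 */
size_t reuse_in_loop(int n)
{
	struct sbuf out = SBUF_INIT;
	size_t held;
	int i;

	for (i = 0; i < n; i++) {
		sbuf_reset(&out);
		sbuf_addstr(&out, "refs/heads/example");
	}
	sbuf_reset(&out);
	held = out.alloc;	/* a reset alone leaves this memory owned */
	sbuf_release(&out);	/* note the '&': `out` is a local struct */
	assert(out.buf == NULL && out.alloc == 0);
	return held;
}
```

In this sketch the buffer is released exactly once, after the loop, which
is the pattern the callers in builtin/for-each-ref.c and builtin/tag.c
would need to follow.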

> > diff --git a/builtin/tag.c b/builtin/tag.c
> > @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >         struct ref_array array;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> >
> >         for (i = 0; i < array.nr; i++)
> > -               show_ref_array_item(array.items[i], format);
> > +               show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
> >         ref_array_clear(&array);
> >         free(to_free);
>
> Leaking `final_buf` and `error_buf`.
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
> >  void show_ref_array_item(struct ref_array_item *info,
> > -                        const struct ref_format *format)
> > +                        const struct ref_format *format,
> > +                        struct strbuf *final_buf,
> > +                        struct strbuf *error_buf)
> >  {
> > -       struct strbuf final_buf = STRBUF_INIT;
> > -       struct strbuf error_buf = STRBUF_INIT;
> >
> > -       if (format_ref_array_item(info, format, &final_buf, &error_buf))
> > -               die("%s", error_buf.buf);
> > -       fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > -       strbuf_release(&error_buf);
> > -       strbuf_release(&final_buf);
> > +       if (format_ref_array_item(info, format, final_buf, error_buf))
> > +               die("%s", error_buf->buf);
> > +       fwrite(final_buf->buf, 1, final_buf->len, stdout);
> > +       strbuf_reset(error_buf);
> > +       strbuf_reset(final_buf);
> >         putchar('\n');
> >  }
>
> A couple comments:
>
> It is especially ugly that `error_buf` needs to be passed in by the
> caller since it is only ever used in case of an error, at which point
> the program will die() anyhow, so it's not on a critical,
> speed-sensitive path. The initialization with STRBUF_INIT is
> practically cost-free, so this variable could easily stay local to
> this function without cost-penalty rather than forcing the caller to
> pass it in. (This assumes that none of the consumers of `error_buf`
> down the line insert into the buffer unnecessarily, which is probably
> a reasonable assumption.)
>

What you said makes sense. The `error_buf` may not need to be passed
as a parameter, because errors are rare.

> It is an unsafe assumption to only call strbuf_reset() at the end of
> the function. For this to be robust, you can't assume that the caller
> has given you an empty strbuf. Instead, you must ensure it by calling
> strbuf_reset() at the start. (It doesn't hurt to also call
> strbuf_reset() at the end, but that is not critical to correct
> operation, so could be omitted.)
>

Well, indeed, it would be better to call `strbuf_reset()` first, as is done in
`cat-file.c`.
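
A small sketch of why the position of the reset matters. The fixed-size
`struct scratch` and the functions `emit_reset_at_end()` and
`emit_reset_at_start()` are hypothetical names invented for this example
(they are not git APIs), and the returned length merely stands in for the
bytes that would be passed to fwrite().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size scratch buffer standing in for a strbuf. */
struct scratch {
	size_t len;
	char buf[256];
};

static void scratch_reset(struct scratch *s)
{
	s->len = 0;
	s->buf[0] = '\0';
}

static void scratch_addstr(struct scratch *s, const char *str)
{
	size_t n = strlen(str);

	/* This sketch does not grow the buffer; it just checks the bound. */
	assert(s->len + n < sizeof(s->buf));
	memcpy(s->buf + s->len, str, n + 1);
	s->len += n;
}

/* Fragile: only correct if the caller passes in an empty buffer. */
size_t emit_reset_at_end(struct scratch *out, const char *refname)
{
	size_t written;

	scratch_addstr(out, refname);
	written = out->len;	/* stands in for fwrite(out->buf, ...) */
	scratch_reset(out);
	return written;
}

/* Robust: clears whatever the caller left behind before formatting. */
size_t emit_reset_at_start(struct scratch *out, const char *refname)
{
	scratch_reset(out);
	scratch_addstr(out, refname);
	return out->len;	/* stands in for fwrite(out->buf, ...) */
}
```

If the caller hands in a buffer that already holds "stale", the first
variant would emit the stale bytes followed by the refname, while the
second emits only the refname.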

> > @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
> >         struct ref_array_item *ref_item;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> > +
> >         ref_item = new_ref_array_item(name, oid);
> >         ref_item->kind = ref_kind_from_refname(name);
> > -       show_ref_array_item(ref_item, format);
> > +       show_ref_array_item(ref_item, format, &final_buf, &error_buf);
> >         free_array_item(ref_item);
> >  }
>
> Leaking `final_buf` and `error_buf`.
>
> > diff --git a/ref-filter.h b/ref-filter.h
> > @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
> >  /*  Print the ref using the given format and quote_style */
> > -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +void show_ref_array_item(struct ref_array_item *info,
> > +                        const struct ref_format *format,
> > +                        struct strbuf *final_buf,
> > +                        struct strbuf *error_buf);
>
> It is not clear to the person reading this what these two new
> arguments are for or what should be provided, so the comment above the
> function deserves an update explaining what these arguments are for
> and how to provide them. This is especially important since this is a
> public function.
>
> If this function was merely internal to some builtin command, this
> sort of change for the sake of optimization might not be so bad, but
> as a public function, it comes across as rather ugly. In general, we
> don't necessarily want to pollute an otherwise clean API with changes
> which make the API uglier merely for the sake of tiny optimizations
> like this (IMHO). The extra burden placed on callers by this change,
> coupled with the ugliness this introduces into the API, makes the
> change seem less than desirable.
>

Well, for the time being, there are relatively few places where
`show_ref_array_item()` is used in the entire git repository. I may need to
pay attention to this later: the ease of use of public interfaces is also
important.

> One way you might be able to mitigate the undesirableness would be to
> have two versions of this function. For instance:
>
>     /* Print the ref using the given format and quote_style */
>     show_ref_array_item(...);
>     /* This is like show_ref_array_item() but ... */
>     show_ref_array_item_optim(...);
>
> The comment of the second function would, of course, need to explain
> that it is similar to show_ref_array_item() but more optimal because
> <some reasons> and that the caller is responsible for allocating and
> releasing the strbufs, and that the strbufs are used only for
> temporary storage, thus the caller should not assume anything about
> them.
>

Yes, this will ensure that this new public interface will not be misused.

> This way, callers which don't invoke show_ref_array_item() in a tight
> loop don't need to be burdened by the new arguments (and won't have to
> remember to release them), and callers with a tight loop can take
> advantage of the optimization with the bit of extra work of having to
> declare and release the strbufs.
>

For external callers, releasing the strbufs correctly does require attention,
which is indeed a disadvantage at times.
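
The two-function idea suggested above could look roughly like the
following sketch. None of this is the actual ref-filter code: `struct
sbuf`, `format_item()`, `show_item()` and `show_item_reuse()` are
hypothetical stand-ins, and the exit code and error text are assumptions
for illustration. The error buffer stays local, the scratch buffer is
reset on entry, and only the tight-loop variant asks the caller to own a
buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for git's struct strbuf; NOT the real implementation. */
struct sbuf {
	size_t alloc;
	size_t len;
	char *buf;
};
#define SBUF_INIT { 0, 0, NULL }

static void sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n = strlen(s);

	if (sb->len + n + 1 > sb->alloc) {
		size_t new_alloc = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, new_alloc);

		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = new_alloc;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

static void sbuf_reset(struct sbuf *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

static void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->alloc = 0;
	sb->len = 0;
}

/* Hypothetical formatter: 0 on success, -1 with a message in `err`. */
static int format_item(const char *refname, struct sbuf *out,
		       struct sbuf *err)
{
	if (!*refname) {
		sbuf_addstr(err, "empty refname");
		return -1;
	}
	sbuf_addstr(out, refname);
	return 0;
}

/*
 * Like show_item(), but formats into the caller-supplied `scratch`.
 * The caller initializes and releases `scratch` and must not rely on
 * its contents before or after the call.
 */
void show_item_reuse(const char *refname, struct sbuf *scratch)
{
	struct sbuf err = SBUF_INIT;	/* only touched on the error path */

	assert(scratch != NULL);
	sbuf_reset(scratch);		/* do not trust the caller's state */
	if (format_item(refname, scratch, &err)) {
		fprintf(stderr, "fatal: %s\n", err.buf);
		exit(128);
	}
	fwrite(scratch->buf, 1, scratch->len, stdout);
	putchar('\n');
	sbuf_release(&err);
}

/* Convenience wrapper for callers that are not in a tight loop. */
void show_item(const char *refname)
{
	struct sbuf scratch = SBUF_INIT;

	show_item_reuse(refname, &scratch);
	sbuf_release(&scratch);
}
```

A loop would declare one `struct sbuf`, call show_item_reuse() for each
item, and release the buffer once afterwards; a one-off caller would just
use show_item().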

> So, having said all that, it's not clear that the ugliness and extra
> work are worth the gain. However, if it is decided that the change is
> worthwhile, then the commit message probably should explain cases in
> which such an optimization will be really beneficial.

I now expect that the optimization here may show up in a repository with
more refs, such as `linux.git`. I have to experiment first.

Thanks, Eric.

--
ZheNing Hu

