git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Hariom verma" <hariom18599@gmail.com>,
	"Shourya Shukla" <periperidip@gmail.com>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Jiang Xin" <worldhello.net@gmail.com>,
	"Git List" <git@vger.kernel.org>
Subject: Re: GSoC Git Proposal Draft - ZheNing Hu
Date: Tue, 13 Apr 2021 02:40:41 -0400	[thread overview]
Message-ID: <YHU86dPKODAPDXY0@coredump.intra.peff.net> (raw)
In-Reply-To: <CAOLTT8SW0n=x3HBL=php0aC1nhP7eU-MHFLustC3H0opxGRV1w@mail.gmail.com>

On Sun, Apr 11, 2021 at 11:34:35PM +0800, ZheNing Hu wrote:

> > Why is Olga’s solution rejected?
> > 1. Olga's solution is to let `git cat-file` use the `ref-filter` interface,
> > the performance of `cat-file` appears to be degraded due "very eager to
> > allocate lots of separate strings" in `ref-filter` and other reasons.
> 
> I am thinking today whether we can directly append some object information
> directly to `&state->stack->output`, Instead of assigning to `v->s` firstly.

Yes, that's the direction I think we'd want to go.

> But in `cmp_ref_sorting()` we will use `get_ref_atom_value()`, It is possible
> to compare `v->s` of two different refs, I must goto fill object info in `v->s`.
> 
> So I think this is one of the reasons why `ref-filter` desires to
> allocate a large
> number of strings, right?

Yeah, I think sorting in general is a bit tricky, because it inherently
requires collecting the value for each item. Just thinking about what
properties an ideal solution would have (which we might not be able to
get all of):

  - if we're sorting by something numeric (e.g., an committer
    timestamp), we should avoid forming it into a string at all

  - if the sort item requires work to extract that overlaps with the
    output format (e.g., sorting by authordate and showing author name
    in the format, both of which require parsing the author ident line
    of a commit), ideally we'd just do that work once per ref/object.

  - if we are sorting, obviously we have to hold some amount of data for
    each item in memory all at once (since we have to get data on the
    sort properties for each, and then sort the result). So we'd
    probably need at least some allocation per ref anyway, and an extra
    string isn't too bad. But if we're not sorting, then it would be
    nice to consider one ref/object at a time, which lets us keep our
    peak memory usage lower, reuse output buffers, etc.

I think some of those are in competition with each other. Minimizing
work shared between the sorting and format steps means keeping more data
in memory. So it might be sensible to just treat them totally
independently, and not worry about sharing work (I haven't looked at how
ref-filter does this now).  TBH, I care a lot less about making the
"sorting" case fast than I do about making sure that if we _aren't_
sorting, we go as fast as possible.

-Peff

  reply	other threads:[~2021-04-13  6:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02  9:03 GSoC Git Proposal Draft - ZheNing Hu ZheNing Hu
2021-04-02 14:57 ` Christian Couder
2021-04-03 13:23   ` ZheNing Hu
2021-04-02 15:39 ` Jeff King
2021-04-03 14:27   ` ZheNing Hu
2021-04-07 19:28     ` Jeff King
2021-04-08 13:29       ` ZheNing Hu
2021-04-11  6:11 ` ZheNing Hu
2021-04-11 15:34   ` ZheNing Hu
2021-04-13  6:40     ` Jeff King [this message]
2021-04-13 14:51       ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHU86dPKODAPDXY0@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=periperidip@gmail.com \
    --cc=sunshine@sunshineco.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).