git@vger.kernel.org mailing list mirror (one of many)
From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Christian Couder <christian.couder@gmail.com>
Cc: git <git@vger.kernel.org>, "Jeff King" <peff@peff.net>,
	"Duy Nguyen" <pclouds@gmail.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Tanushree Tumane" <tanushreetumane@gmail.com>,
	"David Kastrup" <dak@gnu.org>
Subject: Re: Questions on GSoC 2019 Ideas
Date: Fri, 5 Apr 2019 13:28:48 -0300
Message-ID: <CAHd-oW6iBQJ_SCTbRtDdWrg=NftqcMhyZ=SFkj7Am==OpG3bTA@mail.gmail.com>
In-Reply-To: <CAP8UFD33xf8FMuVNakzaUhYXo3A2fnvBAoFgoDQUOKgqnWYQBw@mail.gmail.com>

On Thu, Apr 4, 2019 at 4:56 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> Hi,
>
> On Thu, Apr 4, 2019 at 3:15 AM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > I've been studying the codebase and looking for older emails in the ML
> > that discussed what I want to propose as my GSoC project. In
> > particular, I found a thread about slow git commands on chromium, so I
> > reached out to them on chromium's ML to ask if it's still an issue. I got
> > the following answer:
> >
> > On Wed, Apr 3, 2019 at 1:41 PM Erik Chen <erikchen@chromium.org> wrote:
> > > Yes, this is absolutely still a problem for Chrome. I filed some bugs for common operations that are slow for Chrome: git blame [1], git stash [2], git status [3]
> > > On Linux, blame is the only operation that is really problematic. On macOS and Windows ... it's hard to find a git operation that isn't slow. :(
>
> Nice investigation. About git status I wonder though if they have
> tried the possible optimizations, like untracked cache or
> core.fsmonitor.

I don't know if they did, but I suggested that they check
core.commitGraph, pack.useBitmaps and core.untrackedCache (which Duy
suggested to me in another thread).

> > I don't really know if threading would help stash and status, but I
> > think it could help blame. From the little I've read of blame's code so
> > far, my guess is that the priority queue used for the commits could be
> > an interface for a producer-consumer mechanism and that way,
> > assign_blame's main loop could be done in parallel. And as we can see
> > at [4], that is 90% of the command's time. Does this make sense?
>
> I can't really tell as I haven't studied this, but from the links in
> your email I think it kind of makes sense.
>
> Instead of doing assign_blame()'s main loop in parallel though, if my
> focus was only making git blame faster, I think I would first try to
> cache xdl_hash_record() results and then if possible to compute
> xdl_hash_record() in parallel as it seems to be a big bottleneck and a
> quite low hanging fruit.
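
To make the caching idea concrete, here is a minimal sketch in Python
(not git's C code), assuming the same file contents end up being hashed
more than once during a blame run. hash_record() is only a stand-in
built on CRC32, not the real xdl_hash_record(), and LineHashCache and
the blob-id key are hypothetical names used for illustration:

```python
import zlib
from typing import Dict, List


def hash_record(line: bytes) -> int:
    """Stand-in per-line hash (CRC32); NOT the real xdl_hash_record()."""
    return zlib.crc32(line)


class LineHashCache:
    """Hypothetical cache of per-line hashes, keyed by a blob identifier."""

    def __init__(self) -> None:
        self._hashes: Dict[str, List[int]] = {}
        self.misses = 0  # how many times the lines actually had to be hashed

    def hashes_for(self, blob_id: str, content: bytes) -> List[int]:
        cached = self._hashes.get(blob_id)
        if cached is None:
            # First time this blob is seen: hash every line once and keep it.
            self.misses += 1
            cached = [hash_record(line)
                      for line in content.splitlines(keepends=True)]
            self._hashes[blob_id] = cached
        return cached


cache = LineHashCache()
first = cache.hashes_for("blob-1", b"a\nb\n")   # computes the hashes
second = cache.hashes_for("blob-1", b"a\nb\n")  # served from the cache
```

If the hashing were later moved to several threads, a cache like this
would need a lock around it (or one cache per thread).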

Hm, I see. But although it would take more effort to add threading to
assign_blame(), wouldn't it be better because more work could be done
in parallel? I think it could be implemented in the same way git
grep does it.
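
As a rough illustration of the producer-consumer shape I have in mind,
here is a small Python sketch (not blame's or git grep's actual code).
Worker threads take the highest-priority entry from a shared priority
queue, process it, and push any follow-up work back. run_parallel(),
process() and the toy commit names are all hypothetical:

```python
import itertools
import queue
import threading


def run_parallel(initial, process, num_workers=4):
    """Process (priority, item) entries with a pool of worker threads.

    process(item) returns an iterable of follow-up (priority, item) pairs.
    Returns the list of processed items, in completion order.
    """
    work = queue.PriorityQueue()
    seq = itertools.count()        # tie-breaker so items are never compared
    push_lock = threading.Lock()
    done = []
    done_lock = threading.Lock()   # shared state must be guarded

    def push(priority, item):
        with push_lock:
            work.put((priority, next(seq), item))

    def worker():
        while True:
            _, _, item = work.get()
            if item is None:       # sentinel: no more work
                work.task_done()
                return
            try:
                follow_ups = process(item)
                with done_lock:
                    done.append(item)
                # Queue follow-ups before task_done() so join() cannot
                # return while work is still being generated.
                for priority, nxt in follow_ups:
                    push(priority, nxt)
            finally:
                work.task_done()

    for priority, item in initial:
        push(priority, item)
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.join()                    # every queued entry has been processed
    for _ in threads:
        push(float("inf"), None)   # wake each worker up so it can exit
    for t in threads:
        t.join()
    return done


# Toy history: each "commit" hands its parent back to the queue.
parents = {"c3": ["c2"], "c2": ["c1"], "c1": []}
visited = run_parallel([(0, "c3")],
                       lambda c: [(0, p) for p in parents[c]],
                       num_workers=3)
```

This only shows the structure. In Python the threads would not give a
real speedup for CPU-bound work, and in git everything process() touches
would have to be thread-safe first.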

> > But as Duy pointed out, if I recall correctly, for git blame to be
> > parallel, pack access and diff code would have to be thread-safe
> > first. And also, it seems, from what we've discussed earlier, that this
> > much wouldn't all fit in a single GSoC. So, would it be a
> > nice GSoC proposal to try "making code used by blame thread-safe",
> > targeting future parallelization of blame to be done after GSoC?
>
> Yeah, I think it would be a nice proposal, even though it doesn't seem
> to be the most straightforward way to make git blame faster.
>
> Back in 2008 when we proposed a GSoC about creating a sequencer, it
> wasn't something that would easily fit in a GSoC, and in fact it
> didn't, but over the long run it has been very fruitful as the
> sequencer is now used by cherry-pick and rebase -i, and there are
> plans to use it even more. So unless people think it's not a good idea
> for some reason, which hasn't been the case yet, I am ok with a GSoC
> project like this.
>
> > And if
> > so, could you please point out which files I should be studying to
> > write the plan for this proposal? (Unfortunately, I wasn't able to
> > study pack access and diff code yet. I got carried away looking for
> > performance hotspots and now I'm a bit behind schedule :( )
>
> I don't think you need to study everything yet, and I think you
> already did a lot of studying, so I would suggest you first try to
> send a proposal soon with the information you have right now, and then
> depending on the feedback you get and the time left (likely not
> much!!!), you might study some parts of the code a bit more later.

Thanks a lot, Christian. I'm writing my proposal and will try to send it today.

> > Also, an implementation for fuzzy blame is being developed right
> > now [5] and Jeff (CC-ed) suggested recently another performance
> > improvement that could be done in blame [6]. So I would like to know
> > whether you think it is worth putting effort into parallelizing
> > it.
>
> What you would do seems compatible to me with the fuzzy blame effort
> and an effort to cache xdl_hash_record() results.
>
> Thanks,
> Christian.
