git@vger.kernel.org mailing list mirror (one of many)
From: Christian Couder <christian.couder@gmail.com>
To: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Cc: git <git@vger.kernel.org>, "Jeff King" <peff@peff.net>,
	"Duy Nguyen" <pclouds@gmail.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Tanushree Tumane" <tanushreetumane@gmail.com>,
	"David Kastrup" <dak@gnu.org>
Subject: Re: Questions on GSoC 2019 Ideas
Date: Thu, 4 Apr 2019 09:56:35 +0200
Message-ID: <CAP8UFD33xf8FMuVNakzaUhYXo3A2fnvBAoFgoDQUOKgqnWYQBw@mail.gmail.com>
In-Reply-To: <CAHd-oW7fXbJyxesgCoiTOWGLH9Tpk5FUN7VsaBrqU842BJpT3Q@mail.gmail.com>

Hi,

On Thu, Apr 4, 2019 at 3:15 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> I've been studying the codebase and looking for older emails in the ML
> that discussed what I want to propose as my GSoC project. In
> particular, I found a thread about slow git commands on chromium, so I
> reached out to them on chromium's ML to ask if it's still an issue. I got
> the following answer:
>
> On Wed, Apr 3, 2019 at 1:41 PM Erik Chen <erikchen@chromium.org> wrote:
> > Yes, this is absolutely still a problem for Chrome. I filed some bugs for common operations that are slow for Chrome: git blame [1], git stash [2], git status [3]
> > On Linux, blame is the only operation that is really problematic. On macOS and Windows ... it's hard to find a git operation that isn't slow. :(

Nice investigation. About git status, though, I wonder if they have
tried the possible optimizations, like the untracked cache or
core.fsmonitor.

> I don't really know if threading would help stash and status, but I
> think it could help blame. From the little I've read of blame's code so
> far, my guess is that the priority queue used for the commits could
> serve as the interface for a producer-consumer mechanism, so that
> assign_blame's main loop could run in parallel. And as we can see
> at [4], that loop accounts for 90% of the command's time. Does this make sense?

I can't really tell as I haven't studied this, but from the links in
your email I think it kind of makes sense.
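To make the producer-consumer idea a bit more concrete, here is a minimal C sketch of a priority queue shared by worker threads, where processing one item may push new items (much as processing a commit in blame pushes its parents). Everything in it is hypothetical: `struct shared_queue`, `queue_get()`, `queue_put()`, `queue_done()` and `run_parallel()` are illustrative names, not git's prio_queue API; a simple sorted array stands in for a heap; and the "work" is a toy that only enqueues one older date.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Priority queue of "commit dates" shared by all worker threads. In real
 * blame an item would be a commit plus the blame entries it still owns. */
struct shared_queue {
	unsigned long *items;    /* kept sorted ascending: newest at the end */
	size_t nr, alloc;
	int busy;                /* items popped but not yet finished */
	unsigned long processed; /* items finished so far */
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Caller holds q->lock (or no other thread is running yet). */
static void sorted_push(struct shared_queue *q, unsigned long date)
{
	size_t i;
	if (q->nr == q->alloc) {
		q->alloc = q->alloc ? 2 * q->alloc : 16;
		q->items = realloc(q->items, q->alloc * sizeof(*q->items));
		assert(q->items);
	}
	for (i = q->nr++; i > 0 && q->items[i - 1] > date; i--)
		q->items[i] = q->items[i - 1]; /* shift larger dates right */
	q->items[i] = date;
}

static void queue_put(struct shared_queue *q, unsigned long date)
{
	pthread_mutex_lock(&q->lock);
	sorted_push(q, date);
	pthread_cond_signal(&q->cond); /* wake one idle consumer */
	pthread_mutex_unlock(&q->lock);
}

/* Pops the newest date; returns 0 once the queue is empty and no busy
 * worker is left that could still push more items. */
static int queue_get(struct shared_queue *q, unsigned long *out)
{
	int got;
	pthread_mutex_lock(&q->lock);
	while (q->nr == 0 && q->busy > 0)
		pthread_cond_wait(&q->cond, &q->lock);
	got = q->nr > 0;
	if (got) {
		*out = q->items[--q->nr];
		q->busy++;
	}
	pthread_mutex_unlock(&q->lock);
	return got;
}

static void queue_done(struct shared_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->busy--;
	q->processed++;
	if (q->busy == 0 && q->nr == 0)
		pthread_cond_broadcast(&q->cond); /* nothing left: wake all to exit */
	pthread_mutex_unlock(&q->lock);
}

static void *worker(void *arg)
{
	struct shared_queue *q = arg;
	unsigned long date;
	while (queue_get(q, &date)) {
		/* Real blame would diff against the parents here, outside the
		 * lock. This toy just "finds" one older parent until date 0. */
		if (date > 0)
			queue_put(q, date - 1);
		queue_done(q);
	}
	return NULL;
}

/* Seeds the queue, runs nthreads workers and returns how many items ran. */
static unsigned long run_parallel(const unsigned long *dates, size_t n, int nthreads)
{
	struct shared_queue q = { NULL, 0, 0, 0, 0 };
	pthread_t tids[16];
	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	for (size_t i = 0; i < n; i++)
		sorted_push(&q, dates[i]); /* single-threaded here, no lock needed */
	for (int t = 0; t < nthreads; t++)
		pthread_create(&tids[t], NULL, worker, &q);
	for (int t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);
	pthread_cond_destroy(&q.cond);
	pthread_mutex_destroy(&q.lock);
	free(q.items);
	return q.processed;
}
```

For example, seeding with dates 3 and 5 processes 4 + 6 = 10 items regardless of the thread count. Because several workers pop concurrently, items only come out in roughly newest-first order, not strictly; whether assign_blame() depends on strict ordering, and how blame entries could be handed between commits safely, is exactly what would need studying.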

Instead of running assign_blame()'s main loop in parallel, though, if my
only focus were making git blame faster, I think I would first try to
cache xdl_hash_record() results and then, if possible, compute
xdl_hash_record() in parallel, as it seems to be a big bottleneck and
fairly low-hanging fruit.
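As a rough illustration of both steps (caching per-line hashes, then computing them in parallel), here is a minimal C sketch. It assumes the same blob may get hashed more than once during a blame run (for instance once as the newer side of a diff and again as the older side), so it keys the cache by a blob identifier string. `hash_line()` is a djb2-style stand-in, not claimed to match xdl_hash_record() exactly; `hash_all_lines()`, `struct line_hash_cache` and `cached_line_hashes()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* djb2-style line hash, a stand-in that is NOT claimed to match
 * xdl_hash_record() exactly (no whitespace-ignoring flags, etc.). */
static unsigned long hash_line(const char *p, const char *end)
{
	unsigned long ha = 5381;
	for (; p < end && *p != '\n'; p++) {
		ha += (ha << 5);
		ha ^= (unsigned long)(unsigned char)*p;
	}
	return ha;
}

struct hash_job {
	const char *buf, *end;
	const size_t *starts; /* offset of each line within buf */
	unsigned long *out;
	size_t lo, hi;        /* half-open range of lines for this thread */
};

static void *hash_range(void *arg)
{
	struct hash_job *job = arg;
	for (size_t i = job->lo; i < job->hi; i++)
		job->out[i] = hash_line(job->buf + job->starts[i], job->end);
	return NULL;
}

/* Hashes every line of buf[0, len) using nthreads threads (1..8).
 * Each thread writes a disjoint slice of the result, so no lock. */
static unsigned long *hash_all_lines(const char *buf, size_t len, int nthreads, size_t *nr_out)
{
	size_t nr = 0, *starts = malloc((len + 1) * sizeof(*starts));
	unsigned long *out;
	pthread_t tids[8];
	struct hash_job jobs[8];
	assert(starts && nthreads >= 1 && nthreads <= 8);
	for (size_t i = 0; i < len; i++)
		if (i == 0 || buf[i - 1] == '\n')
			starts[nr++] = i;
	out = calloc(nr + 1, sizeof(*out));
	assert(out);
	for (int t = 0; t < nthreads; t++) {
		jobs[t] = (struct hash_job){ buf, buf + len, starts, out,
					     nr * t / nthreads, nr * (t + 1) / nthreads };
		pthread_create(&tids[t], NULL, hash_range, &jobs[t]);
	}
	for (int t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);
	free(starts);
	*nr_out = nr;
	return out;
}

/* Hypothetical per-blob cache: key is e.g. a blob's object id in hex. */
#define CACHE_SLOTS 64
struct line_hash_cache {
	struct { char *key; unsigned long *hashes; size_t nr; } slot[CACHE_SLOTS];
	size_t used;
	unsigned long misses; /* how many times hashing actually ran */
};

static const unsigned long *cached_line_hashes(struct line_hash_cache *c,
		const char *key, const char *buf, size_t len, size_t *nr_out)
{
	for (size_t i = 0; i < c->used; i++)
		if (!strcmp(c->slot[i].key, key)) {
			*nr_out = c->slot[i].nr;
			return c->slot[i].hashes; /* hit: no rehashing */
		}
	assert(c->used < CACHE_SLOTS); /* a real cache would evict */
	c->misses++;
	c->slot[c->used].key = malloc(strlen(key) + 1);
	assert(c->slot[c->used].key);
	strcpy(c->slot[c->used].key, key);
	c->slot[c->used].hashes = hash_all_lines(buf, len, 4, &c->slot[c->used].nr);
	*nr_out = c->slot[c->used].nr;
	return c->slot[c->used++].hashes;
}
```

The cache itself is only touched from the calling thread; only the hashing of individual lines is spread across threads, which is safe because each thread writes to its own slice of the output array.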

> But as Duy pointed out, if I recall correctly, for git blame to be
> parallel, pack access and diff code would have to be thread-safe
> first. And it also seems, from what we've discussed earlier, that all
> of this wouldn't fit in a single GSoC. So, would it be a
> nice GSoC proposal to try "making code used by blame thread-safe",
> targeting future parallelism in blame to be done after GSoC?

Yeah, I think it would be a nice proposal, even though it doesn't seem
to be the most straightforward way to make git blame faster.
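One common first step toward making code such as pack access thread-safe is a coarse-grained lock: route every call into the non-thread-safe code through a single mutex, then narrow the critical sections later. The minimal C sketch below assumes a hypothetical `read_object_unlocked()` that mutates shared global state; it and `read_object_locked()` are illustrative names, not git's object-reading API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for non-thread-safe code: it updates global
 * state, as a shared cache of recently used pack data might. */
static unsigned long cache_lookups;

static unsigned long read_object_unlocked(unsigned long id)
{
	cache_lookups++;  /* unsynchronized read-modify-write: racy alone */
	return id * 2;    /* pretend these are the object's contents */
}

/* Coarse-grained fix: every caller goes through one global mutex. */
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long read_object_locked(unsigned long id)
{
	unsigned long ret;
	pthread_mutex_lock(&obj_lock);
	ret = read_object_unlocked(id);
	pthread_mutex_unlock(&obj_lock);
	return ret;
}

#define CALLS_PER_THREAD 100000UL

static void *reader(void *arg)
{
	unsigned long base = *(const unsigned long *)arg;
	for (unsigned long i = 0; i < CALLS_PER_THREAD; i++) {
		unsigned long v = read_object_locked(base + i);
		assert(v == 2 * (base + i));
		(void)v; /* keep -DNDEBUG builds warning-free */
	}
	return NULL;
}

/* Runs nthreads readers concurrently; returns the total lookup count. */
static unsigned long run_readers(int nthreads)
{
	pthread_t tids[8];
	unsigned long bases[8];
	assert(nthreads > 0 && nthreads <= 8);
	cache_lookups = 0;
	for (int t = 0; t < nthreads; t++) {
		bases[t] = (unsigned long)t * CALLS_PER_THREAD;
		pthread_create(&tids[t], NULL, reader, &bases[t]);
	}
	for (int t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);
	return cache_lookups;
}
```

With the lock in place, the lookup count always equals the number of calls; without it, concurrent increments could be lost. The obvious drawback is that the lock serializes all object reads, so such a scheme only pays off if enough of the work (diffing, for example) happens outside the critical section.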

Back in 2008 when we proposed a GSoC about creating a sequencer, it
wasn't something that would easily fit in a GSoC, and in fact it
didn't, but over the long run it has been very fruitful as the
sequencer is now used by cherry-pick and rebase -i, and there are
plans to use it even more. So unless people think it's not a good idea
for some reason, which hasn't been the case yet, I am ok with a GSoC
project like this.

> And if
> so, could you please point out which files I should be studying to
> write the plan for this proposal? (Unfortunately I haven't been able to
> study pack access and diff code yet. I got carried away looking for
> performance hotspots and now I'm a bit behind schedule :( )

I don't think you need to study everything yet, and I think you have
already done a lot of studying, so I would suggest you first try to
send a proposal soon with the information you have right now, and then,
depending on the feedback you get and the time left (likely not
much!!!), you might study some parts of the code a bit more later.

> Also, an implementation of fuzzy blame is being developed right
> now[5], and Jeff (CC-ed) recently suggested another performance
> improvement that could be made in blame[6]. So I would like to know
> whether you think it is worth putting effort into trying to parallelize
> it.

What you would do seems to me compatible with the fuzzy blame effort
and with an effort to cache xdl_hash_record() results.

Thanks,
Christian.
