git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: "Duy Nguyen" <pclouds@gmail.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	git <git@vger.kernel.org>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Tanushree Tumane" <tanushreetumane@gmail.com>
Subject: Re: Questions on GSoC 2019 Ideas
Date: Sun, 3 Mar 2019 08:18:45 +0100	[thread overview]
Message-ID: <CAP8UFD31YKt7fm+shWdBxsL4fCSO4dU=97YwFsZ9gZBpEWmRPQ@mail.gmail.com> (raw)
In-Reply-To: <20190302150900.GU6085@hank.intra.tgummerer.com>

On Sat, Mar 2, 2019 at 4:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/01, Duy Nguyen wrote:
> > On Fri, Mar 1, 2019 at 5:20 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
> > >
> > > Hi Matheus,
> > >
> > > On Thu, Feb 28, 2019 at 10:46 PM Matheus Tavares Bernardino
> > > <matheus.bernardino@usp.br> wrote:
> > > >
> > > > I've been in the mailing list for a couple weeks now, mainly working
> > > > on my gsoc micro-project[1] and in other patches that derived from it.
> > > > I also have been contributing to the Linux Kernel for half an year,
> > > > but am now mainly just supporting other students here at USP.
> > > >
> > > > I have read the ideas page for the GSoC 2019 and many of them interest
> > > > me. Also, looking around git-dev materials on the web, I got to the
> > > > GSoC 2012 ideas page. And this one got my attention:
> > > > https://github.com/peff/git/wiki/SoC-2012-Ideas#improving-parallelism-in-various-commands
> > > >
> > > > I'm interested in parallel computing and that has been my research
> > > > topic for about an year now. So I would like to ask what's the status
> > > > of this GSoC idea. I've read git-grep and saw that it is already
> > > > parallel, but I was wondering if there is any other section in git in
> > > > which it was already considered to bring parallelism, seeking to
> > > > achieve greater performance. And also, if this could, perhaps, be a
> > > > GSoC project.
> > >
> > > I vaguely remember that we thought at one point that all the low
> > > hanging fruits had already been taken in this area but I might be
> > > wrong.
> >
> > We still have to remove some global variables, which is quite easy to
> > do, before one could actually add mutexes and stuff to allow multiple
> > pack access. I don't know though if the removing global variables is
> > that exciting for GSoC, or if both tasks could fit in one GSoC. The
> > adding parallel access is not that hard, I think, once you know
> > packfile.c and sha1-file.c relatively well. It's mostly dealing with
> > caches and all the sliding access windows safely.
>
> I'm not very familiar with what's required here, but reading the above
> makes me think it's likely too much for a GSoC project.  I think I'd
> be happy with a project that declares removing the global variables as
> the main goal, and adding parallelism as a potential bonus.

Yeah, I think that the main issue, now that Duy found something that
could be a GSoC project, is that the potential mentors are not
familiar with the pack access code. It means that Matheus would
probably not get a lot of help from his mentors when he would work on
adding parallelism.

That may not be too big a problem though if Matheus is ok to ask many
technical questions on the mailing list. It seems to me that he could
succeed.

> I'm a bit wary of a too large proposal here, as we've historically
> overestimated what kind of project is achievable over a summer (I've
> been there myself, as my GSoC project was also more than I was able to
> do in a summer :)).  I'd rather have a project whose goal is rather
> small and can be expanded later, than having something that could
> potentially take more than 3 months, where the student (or their
> mentors) have to finish it after GSoC.

Yeah, I agree with your suggestion about a project that declares
removing the global variables as the main goal, and adding parallelism
as a potential bonus.

One thing I am still worried about is if we are sure that adding
parallelism is likely to get us a significant performance improvement
or not. If the performance of this code is bounded by disk or memory
access, then adding parallelism might not bring any benefit. (It could
perhaps decrease performance if memory locality gets worse.) So I'd
like some confirmation either by running some tests or by experienced
Git developers that it is likely to be a win.

  reply	other threads:[~2019-03-03  7:19 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-28 21:46 Questions on GSoC 2019 Ideas Matheus Tavares Bernardino
2019-02-28 22:07 ` Christian Couder
2019-03-01  9:30   ` Duy Nguyen
2019-03-02 15:09     ` Thomas Gummerer
2019-03-03  7:18       ` Christian Couder [this message]
2019-03-03 10:12         ` Duy Nguyen
2019-03-03 10:17           ` Duy Nguyen
2019-03-05  4:51           ` Jeff King
2019-03-05 12:57             ` Duy Nguyen
2019-03-05 23:46               ` Matheus Tavares Bernardino
2019-03-06 10:17                 ` Duy Nguyen
2019-03-12  0:18                   ` Matheus Tavares Bernardino
2019-03-12 10:02                     ` Duy Nguyen
2019-03-12 10:11                       ` Duy Nguyen
2019-04-04  1:15                         ` Matheus Tavares Bernardino
2019-04-04  7:56                           ` Christian Couder
2019-04-04  8:20                             ` Mike Hommey
2019-04-05 16:28                             ` Matheus Tavares Bernardino
2019-04-07 23:40                               ` Christian Couder
2019-03-05 23:03         ` Matheus Tavares Bernardino
2019-03-06 23:17           ` Thomas Gummerer
2019-03-03 10:03       ` Duy Nguyen
2019-03-03 16:12         ` Thomas Gummerer
2019-03-01 15:20   ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP8UFD31YKt7fm+shWdBxsL4fCSO4dU=97YwFsZ9gZBpEWmRPQ@mail.gmail.com' \
    --to=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=t.gummerer@gmail.com \
    --cc=tanushreetumane@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).