git@vger.kernel.org mailing list mirror (one of many)
* Re: Speeding up history traversals with caches
From: Jeff King @ 2017-09-27  8:09 UTC
  To: Sabelo Mhlambi; +Cc: git

On Mon, Sep 25, 2017 at 05:28:43PM -0700, Sabelo Mhlambi wrote:

> Hi Jeff (and the Git community),
> 
> As my intro to open source contributions I'd like to attempt the "Speeding
> up history traversals with caches" as outlined here
> https://git.github.io/Outreachy-15/.
> 
> It seems like a challenging and worthwhile problem. May I have more
> information on the project and on how to get started on the application?
> 
> Thanks!

Hi Sabelo, welcome to Git!

Unfortunately your message didn't make it to the mailing list, because
the list software is strict about messages not including any HTML parts.
It looks like you're using Gmail; you'll need to ask it to send
plain-text emails.

The general idea of the project is: a lot of git commands need to access
commit objects to walk the history graph, but they're expensive to
access because we have to inflate the whole commit object from disk.
What I'd like to have instead is a compact representation that we can
quickly use to get the main interesting data out of a commit object
without having to inflate all of the bytes.
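
To make the contrast concrete, here is a minimal sketch of the two access
paths. The fixed-width record layout (tree id, two parent positions,
commit timestamp) and the names RECORD, NO_PARENT, pack_record and
read_record are assumptions made up for illustration; they are not the
format used in the prototype or in Git itself, and merges with more than
two parents are simply ignored here.

```python
import struct
import zlib

# Hypothetical record: 20-byte tree id, two 4-byte parent positions,
# 8-byte commit timestamp. Big-endian, no padding, 36 bytes per commit.
RECORD = struct.Struct(">20sIIQ")
NO_PARENT = 0xFFFFFFFF


def parse_loose_commit(compressed):
    """Slow path: inflate the whole object, then parse its headers."""
    raw = zlib.decompress(compressed)
    _header, _, body = raw.partition(b"\x00")
    headers, _, _message = body.partition(b"\n\n")
    tree, parents, when = None, [], None
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = bytes.fromhex(value.decode())
        elif key == b"parent":
            parents.append(bytes.fromhex(value.decode()))
        elif key == b"committer":
            # "Name <email> <timestamp> <timezone>"
            when = int(value.split()[-2])
    return tree, parents, when


def pack_record(tree, parent_positions, when):
    """Encode one commit as a fixed-width cache record."""
    p1, p2 = (list(parent_positions) + [NO_PARENT, NO_PARENT])[:2]
    return RECORD.pack(tree, p1, p2, when)


def read_record(cache, position):
    """Fast path: one fixed-offset read, no inflation, no text parsing."""
    tree, p1, p2, when = RECORD.unpack_from(cache, position * RECORD.size)
    parents = [p for p in (p1, p2) if p != NO_PARENT]
    return tree, parents, when
```

The point of the fixed width is that commit number N lives at byte offset
N * RECORD.size, so a traversal can hop from a commit to its parents
without touching the object database at all.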

I did a prototype of this a few years ago:

  https://public-inbox.org/git/20130129091434.GA6975@sigill.intra.peff.net/

Compared to those patches, there are a lot of possible things to work
on:

  - the code needs to be cleaned up and ported to a more modern git

  - the implementation is a bit complex; it was anticipating having
    several types of auxiliary files, but probably we really just need
    one

  - we've also discussed storing computed data about the graph, such as
    generation numbers, which can help speed up some traversals

  - we may be able to cache some interesting tree data (e.g., bitmaps of
    which paths are touched by a particular commit).

I wouldn't expect us to cover all of that during the internship period,
but it gives a sense of the possible directions.
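
As a rough illustration of the path-bitmap idea from the list above, the
sketch below gives every path a bit position and stores one integer
bitmask per commit. The function names and the dict-based inputs are
hypothetical; a real cache would need an on-disk format and some way to
bound the number of distinct paths.

```python
def build_path_bitmaps(touched):
    """touched maps commit -> iterable of paths changed by that commit."""
    bit_of = {}    # path -> bit position
    bitmaps = {}   # commit -> integer bitmask of touched paths
    for commit, paths in touched.items():
        mask = 0
        for path in paths:
            bit = bit_of.setdefault(path, len(bit_of))
            mask |= 1 << bit
        bitmaps[commit] = mask
    return bit_of, bitmaps


def commits_touching(history, bit_of, bitmaps, path):
    """Filter a list of commits down to those whose bitmap has the path set."""
    if path not in bit_of:
        return []
    want = 1 << bit_of[path]
    return [c for c in history if bitmaps[c] & want]
```

With something like this, a path-limited log could skip most commits with
a single bit test instead of diffing trees.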

That thread may work as a starting point for understanding the problem
space. You can also probably find some interesting discussions if you
search for "generation number" in the mailing list archive at
https://public-inbox.org/git.
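
For readers new to the term, here is a small sketch of one common
definition: a root commit has generation 1, and any other commit has one
more than the largest generation among its parents. An ancestor therefore
always has a strictly smaller number than its descendants, which lets a
walk stop early. The dict-of-parents input and the function names are
assumptions for illustration only, not code from Git.

```python
def generation_numbers(parents):
    """parents maps commit -> list of parent commits (an acyclic graph)."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            missing = [p for p in parents[c] if p not in gen]
            if missing:
                # Compute the parents first, then come back to c.
                stack.extend(missing)
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def is_ancestor(parents, gen, ancestor, descendant):
    """Walk back from descendant, pruning commits that are too old."""
    seen = set()
    stack = [descendant]
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        # Anything at or below the target's generation cannot reach it.
        if c in seen or gen[c] <= gen[ancestor]:
            continue
        seen.add(c)
        stack.extend(parents[c])
    return False
```

Without the generation check, a negative answer requires walking all the
way to the root commits; with it, the walk ends as soon as every branch
drops to the target's generation.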

The first step is probably to get comfortable with building Git and
submitting a small patch. Christian posted some advice on finding a
topic to work on:

  https://public-inbox.org/git/CAP8UFD3vPQHJZNt1+egKkshiyqrGKiJp7eWU-Es6bTLgvXe1Kg@mail.gmail.com/

Let us know if you get stuck or if you have any questions!

-Peff
