git@vger.kernel.org mailing list mirror (one of many)
From: Duy Nguyen <pclouds@gmail.com>
To: Matthieu Moy <git@matthieu-moy.fr>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Students projects: looking for small and medium project ideas
Date: Tue, 22 Jan 2019 17:09:34 +0700
Message-ID: <CACsJy8CiPPp0C6o_ADv_mvi2gv=PsR+W=E3OHYw8hWbsPhrpOQ@mail.gmail.com>
In-Reply-To: <86fttvcehs.fsf@matthieu-moy.fr>

On Tue, Jan 15, 2019 at 12:55 AM Matthieu Moy <git@matthieu-moy.fr> wrote:
>
> Hi,
>
> ...
>
> You may suggest ideas by editing the wiki page, or just by replying to
> this email (I'll point my students to the thread). Don't hesitate to
> remove entries (or ask me to do so) on the wiki page if you think they
> are not relevant anymore.

I just mentioned this elsewhere [1] but let me summarize it here
because I think this could be an interesting project, and once you
understand the attr.c code it's not that hard to do. The student would
need to understand git attributes and how they are implemented in
attr.c, but that's about it. More background below, but the summary is:
"optimize attribute lookup so that its cost is proportional to the
number of attributes queried, not the number of attributes present in
.gitattributes files".

So, we normally look up the same set of attributes over a long list of
paths. We do this by building up an "attribute stack" containing all
attribute info collected from all related .gitattributes files.
Whenever we move from one path to the next, we update the stack
slightly (e.g. if the previous path is a/b/c and the current one is
a/d/e, we need to delete the attributes from a/b/.gitattributes from
the stack, then add the ones from a/d/.gitattributes). Looking up is
just a matter of going through this stack, finding the attribute lines
that match the given path, and then reading the attribute value.
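A minimal C sketch of the stack model described above, to make the lookup cost visible. Everything here is a simplifying assumption rather than attr.c's real code: `struct rule`, `stack_push`, `stack_pop_unrelated` and `lookup` are hypothetical names, each rule carries a single attribute, and patterns are limited to an exact basename or a `*.ext` suffix. Rules are pushed outermost directory first, so scanning from the top finds the most specific match first, and `lookup` reports how many rules it had to examine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RULES 128

/* Hypothetical, simplified attribute line: one attribute per line. */
struct rule {
    const char *dir;     /* "" for the top-level file, else e.g. "a/b" */
    const char *pattern; /* exact basename, or "*.ext" suffix match */
    const char *attr;
    const char *value;
};

/* Rules ordered outermost directory first, deepest directory on top. */
struct attr_stack {
    struct rule rules[MAX_RULES];
    size_t nr;
};

static void stack_push(struct attr_stack *s, struct rule r)
{
    assert(s->nr < MAX_RULES);
    s->rules[s->nr++] = r;
}

/* Does directory `dir` contain `path`? The top level contains everything. */
static int dir_contains(const char *dir, const char *path)
{
    size_t len = strlen(dir);
    if (len == 0)
        return 1;
    return strncmp(dir, path, len) == 0 && path[len] == '/';
}

/* When moving to a new path, pop rules from directories that no longer
 * enclose it (e.g. a/b when moving from a/b/c to a/d/e). The caller then
 * pushes the rules of the newly entered directories. */
static void stack_pop_unrelated(struct attr_stack *s, const char *path)
{
    while (s->nr > 0 && !dir_contains(s->rules[s->nr - 1].dir, path))
        s->nr--;
}

/* Simplified matcher: exact basename, or "*.ext" as a suffix test. */
static int match_basename(const char *pattern, const char *path)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    if (pattern[0] == '*') {
        size_t plen = strlen(pattern + 1), blen = strlen(base);
        return blen >= plen && strcmp(base + blen - plen, pattern + 1) == 0;
    }
    return strcmp(pattern, base) == 0;
}

/* Walk the stack from the top; the first matching line wins.
 * *scanned counts every rule examined, wanted attribute or not. */
static const char *lookup(const struct attr_stack *s, const char *path,
                          const char *attr, size_t *scanned)
{
    *scanned = 0;
    for (size_t i = s->nr; i-- > 0;) {
        const struct rule *r = &s->rules[i];
        (*scanned)++;
        if (strcmp(r->attr, attr) == 0 && match_basename(r->pattern, path))
            return r->value;
    }
    return NULL;
}
```

Moving from a/b/c to a/d/e.c, `stack_pop_unrelated(&s, "a/d/e.c")` drops the a/b rules and keeps the top-level and a/ ones; the caller then pushes the a/d rules. A lookup for an attribute defined only near the bottom still scans every unrelated rule above it, so the cost grows with the total number of attribute lines.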

This approach does not scale well. Assume that you have a giant
.gitattributes file (or one spread over many files) with a zillion
random attributes and two lines about the "love" attribute. When you
look up "love", you may end up going through all of those attribute
lines. [2] hints at a better approach in the comment near
cannot_trust_maybe_real. If you know you are looking for "love" when
you build up the attribute stack, just keep the "love" lines and ignore
everything else [3]. This way, the attribute stack we need to search
contains only the two lines about "love", and lookup is of course much
faster. In the best case, when you look for an attribute that is not
defined in any .gitattributes file in your repo, you get an instant
"not found" response because the attribute stack is empty. This edge
case was implemented in [4].
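The filtering idea can be sketched in C as well. Again the names (`is_wanted`, `filter_rules`, `lookup`) are hypothetical, each rule holds one attribute, and patterns are matched by plain string equality instead of git's wildcard rules. The sketch also ignores macro attributes entirely; as footnote [3] notes, those complicate things, since a macro such as the built-in `binary` expands to other attributes and would have to be kept whenever its expansion includes a wanted one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified rule: one attribute per line, pattern compared
 * to the path by plain string equality. Not git's real data structures. */
struct rule {
    const char *pattern;
    const char *attr;
    const char *value;
};

/* Is `attr` one of the attributes the caller asked about? */
static int is_wanted(const char *attr, const char *const *wanted,
                     size_t nwanted)
{
    for (size_t i = 0; i < nwanted; i++)
        if (strcmp(attr, wanted[i]) == 0)
            return 1;
    return 0;
}

/* Copy into `out` only the rules for wanted attributes, keeping order.
 * This runs once, while the stack is built, not once per lookup.
 * Returns the number of rules kept. */
static size_t filter_rules(const struct rule *all, size_t nall,
                           const char *const *wanted, size_t nwanted,
                           struct rule *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < nall; i++)
        if (is_wanted(all[i].attr, wanted, nwanted))
            out[kept++] = all[i];
    return kept;
}

/* Cost is now proportional to the kept rules only; with nr == 0 the
 * answer is "not found" without examining anything. Later rules win. */
static const char *lookup(const struct rule *rules, size_t nr,
                          const char *path, const char *attr)
{
    assert(rules != NULL || nr == 0);
    for (size_t i = nr; i-- > 0;)
        if (strcmp(rules[i].attr, attr) == 0 &&
            strcmp(rules[i].pattern, path) == 0)
            return rules[i].value;
    return NULL;
}
```

With a wanted set of just "love", a file holding many unrelated lines filters down to the "love" lines alone, and a wanted set naming an attribute that is never defined yields an empty array, so `lookup` returns NULL without comparing anything.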

[1] https://public-inbox.org/git/20190118165800.GA9956@sigill.intra.peff.net/T/#m32fef6a9e8f65dffae41e44a62dd76b4a84fa0fe
[2] 7d42ec547c (attr.c: outline the future plans by heavily commenting - 2017-01-27)
[3] Well, macros make it a bit more complex, but I'll leave that as an exercise.
[4] 06a604e670 (attr: avoid heavy work when we know the specified attr is not defined - 2014-12-28)
-- 
Duy

Thread overview: 7+ messages
2019-01-14 17:53 Students projects: looking for small and medium project ideas Matthieu Moy
2019-01-14 23:04 ` Ævar Arnfjörð Bjarmason
2019-01-15 21:32 ` Alban Gruin
2019-01-22 10:09 ` Duy Nguyen [this message]
2019-02-23 13:28 ` Fabio Aiuto
2019-02-26 17:51   ` Matthieu Moy
2019-02-26 20:14     ` Johannes Schindelin
