From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Geoffrey Irving <irving@naml.us>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] Adding a cache of commit to patch-id pairs to speed up git-cherry
Date: Mon, 2 Jun 2008 16:52:32 +0100 (BST)
Message-ID: <alpine.DEB.1.00.0806021639150.13507@racer.site.net>
In-Reply-To: <7f9d599f0806020750g78e6816dl884d36bb903c707b@mail.gmail.com>

Hi,

On Mon, 2 Jun 2008, Geoffrey Irving wrote:

> On Sun, Jun 1, 2008 at 11:13 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > On Sun, 1 Jun 2008, Geoffrey Irving wrote:
> >
> >> See commit d56dbe8cb56ce9b4697d1f1c2425e2e133a932a5 for the original
> >> code.
> >
> > This is not in any official git.git repository. And my branches are
> > prone to be rebased. So this is not a good reference. The mailing
> > list, however, would be one.
>
> That was added only because I knew it wouldn't be the final patch, and
> that it'd be useful to you.

Ah, then you might want to say "[PATCH/RFC]" in the subject. I thought
you meant this for inclusion.

> >> diff --git a/Makefile b/Makefile
> >> index cce5a6e..3a5396d 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -435,6 +435,7 @@ LIB_OBJS += pager.o
> >> LIB_OBJS += parse-options.o
> >> LIB_OBJS += patch-delta.o
> >> LIB_OBJS += patch-ids.o
> >> +LIB_OBJS += patch-id-cache.o
> >
> > If all you do is a hashmap from SHA-1 to SHA-1, then I think
> > "patch-id-cache" is a misnomer for that file and structure.
> >
> > Let's not repeat the same naming mistakes as we have for path_list and
> > decoration.
>
> Is persistent_sha1_map a good name?

I'd prefer cached_sha1_map, but I do not really care.

> >> +void write_patch_id_cache()
> >> +{
> >> + if (!cache || cache->count == written_count)
> >> + return;
> >
> > Does that mean that the patch_id_cache is not updated when the number
> > of commits stays the same after committing, branch -d and gc?
>
> The patch_id_cache is only updated when it changes. Since entries are
> immutable and are never deleted, all changes increase the count.

Right, I only realized that with my previous post to this thread.

However, in that case I would prefer a flag "int dirty:1" to the cache
structure, which is set whenever it needs writing. Certainly not a global
variable (which the cache already is).

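To illustrate the dirty-flag idea, here is a minimal sketch. All names (struct cached_sha1_map, map_add, map_write) and the fixed capacity are hypothetical and are not taken from the patch. The flag is declared unsigned here, since whether a plain "int" one-bit field is signed is implementation-defined.

```c
#include <stdio.h>
#include <string.h>

#define MAP_CAPACITY 64	/* hypothetical fixed capacity, for brevity */

struct sha1_map_entry {
	unsigned char key[20];
	unsigned char value[20];
};

struct cached_sha1_map {
	struct sha1_map_entry entries[MAP_CAPACITY];
	size_t count;
	unsigned int dirty:1;	/* set whenever the on-disk copy is stale */
};

/* Append an entry and mark the map as needing a write. */
static int map_add(struct cached_sha1_map *map,
		   const unsigned char *key, const unsigned char *value)
{
	struct sha1_map_entry *e;

	if (map->count >= MAP_CAPACITY)
		return -1;
	e = &map->entries[map->count++];
	memcpy(e->key, key, 20);
	memcpy(e->value, value, 20);
	map->dirty = 1;
	return 0;
}

/* Returns 1 if the map was written, 0 if it was clean, -1 on error. */
static int map_write(struct cached_sha1_map *map, FILE *out)
{
	if (!map->dirty)
		return 0;
	if (fwrite(map->entries, sizeof(map->entries[0]),
		   map->count, out) != map->count)
		return -1;
	map->dirty = 0;
	return 1;
}
```

The write decision then depends only on the structure itself, so no separate "written_count" global has to be kept in sync with it.
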
> >> + hashcpy(entry.commit_sha1, commit_sha1);
> >> + hashcpy(entry.patch_id_sha1, patch_id_sha1);
> >
> > It would be more elegant to copy the SHA-1s _after_ finding where to
> > write them.
>
> Alas, that would break the elegance of my loop, since the loop swaps in
> new entries to keep the table sorted. I can remove the sorting if you
> want: I only left it in there to be more similar to your code.

I think that my code tried to do too much here; it was before I benched
hashmap vs binary-search.

So I think that can just go and simplify the elegance of the loop even
more ;-)

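As a rough sketch of what the loop could look like without the sorting: find the slot first, then copy the two SHA-1s into it. Everything here is hypothetical (the names, the fixed power-of-two table size, linear probing, no resizing), and memcpy()/memcmp() stand in for hashcpy()/hashcmp() so that the example compiles on its own. The table is assumed to be zero-initialized.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 64	/* hypothetical; must be a power of two */

struct sha1_pair {
	unsigned char key[20];
	unsigned char value[20];
	unsigned int used:1;
};

struct sha1_table {
	struct sha1_pair slots[MAP_SIZE];
	size_t count;
};

/* SHA-1 output is already well mixed, so its first bytes can be the hash. */
static uint32_t slot_of(const unsigned char *sha1)
{
	uint32_t h = ((uint32_t)sha1[0] << 24) | ((uint32_t)sha1[1] << 16) |
		     ((uint32_t)sha1[2] << 8) | (uint32_t)sha1[3];
	return h & (MAP_SIZE - 1);
}

/* Returns the entry for key, or the first free slot, or NULL if full. */
static struct sha1_pair *find_slot(struct sha1_table *t,
				   const unsigned char *key)
{
	uint32_t i = slot_of(key);
	size_t probes;

	for (probes = 0; probes < MAP_SIZE; probes++) {
		struct sha1_pair *p = &t->slots[i];
		if (!p->used || !memcmp(p->key, key, 20))
			return p;
		i = (i + 1) & (MAP_SIZE - 1);
	}
	return NULL;
}

/* Returns 0 on insert, 1 if key was already present, -1 if full. */
static int table_add(struct sha1_table *t, const unsigned char *key,
		     const unsigned char *value)
{
	struct sha1_pair *p = find_slot(t, key);

	if (!p)
		return -1;
	if (p->used)
		return 1;
	/* Copy only after the destination is known. */
	memcpy(p->key, key, 20);
	memcpy(p->value, value, 20);
	p->used = 1;
	t->count++;
	return 0;
}

static const unsigned char *table_get(struct sha1_table *t,
				      const unsigned char *key)
{
	struct sha1_pair *p = find_slot(t, key);
	return (p && p->used) ? p->value : NULL;
}
```

With no swapping to keep entries ordered, no temporary entry is needed, and lookup and insert share the same probe loop.
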
> > Declar... ah, well, suffice to say that you should read the
> > CodingGuidelines, and try to fix up all the offending sites in your
> > code.
>
> I'd be happy to do that, but I don't see mention of either C++
> comments or declarations after statements in the CodingGuidelines.

From the Coding Guidelines:

	As for more concrete guidelines, just imitate the existing code
	(this is a good guideline, no matter which project you are
	contributing to).

>
> >> + if (!cmp) {
> >> + if (hashcmp(entry.patch_id_sha1, cache->entries[i].patch_id_sha1))
> >> + die("found mismatched entry in patch-id-cache");
> >
> > I wonder if that potentially expensive operation should not rather be
> > wrapped in an assert() call (since I recently learnt that Git's source
> > code has more than one instance of assert()).
>
> A 20 byte comparison doesn't seem potentially expensive to me.

It's all a question of repetition, isn't it?

In any case, I am not a fan of wasting cycles unnecessarily. The check
would indicate a programming error, not a user error, and should therefore
punish the programmer, not the user.

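A sketch of the assert() variant, under the assumption that a mismatch can only come from a bug in the caching code. The struct and function names are hypothetical, and memcmp()/memcpy() again stand in for hashcmp()/hashcpy().

```c
#include <assert.h>
#include <string.h>

struct cache_entry {
	unsigned char commit_sha1[20];
	unsigned char patch_id_sha1[20];
};

/*
 * Returns 1 if the patch id was stored, 0 if the slot already held one.
 * "occupied" tells whether the slot already belongs to this commit.
 */
static int store_patch_id(struct cache_entry *slot, int occupied,
			  const unsigned char *patch_id_sha1)
{
	if (occupied) {
		/*
		 * Invariant: one commit maps to exactly one patch id.
		 * With -DNDEBUG the expression is not evaluated at all,
		 * so release builds pay nothing for the check.
		 */
		assert(!memcmp(slot->patch_id_sha1, patch_id_sha1, 20));
		return 0;
	}
	memcpy(slot->patch_id_sha1, patch_id_sha1, 20);
	return 1;
}
```

The trade-off is that a build with NDEBUG defined would silently keep the old entry where die() would have stopped, which is acceptable only if the mismatch really cannot be caused by user data.
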
> >> static uint32_t take2(const unsigned char *id)
> >> @@ -136,6 +150,8 @@ int free_patch_ids(struct patch_ids *ids)
> >> next = patches->next;
> >> free(patches);
> >> }
> >> +
> >> + write_patch_id_cache();
> >> return 0;
> >
> > That's cute.
>
> Thanks (assuming cute means good).

Yes, it does!

> I'll wait to actually make any of these changes until you and Jeff
> decide whether I should stick to hashes or use binary search. It seems
> sad not to use hashes for a map when we get the best hash keys in the
> world for free, so I'd prefer not switching.

Me, too.

BTW I think that you did some awesome work here, and I am glad that you
could use my code as starting point.

Thanks,
Dscho