git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Geoffrey Irving <irving@naml.us>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] Adding a cache of commit to patch-id pairs to speed up git-cherry
Date: Mon, 2 Jun 2008 16:52:32 +0100 (BST)	[thread overview]
Message-ID: <alpine.DEB.1.00.0806021639150.13507@racer.site.net> (raw)
In-Reply-To: <7f9d599f0806020750g78e6816dl884d36bb903c707b@mail.gmail.com>

Hi,

On Mon, 2 Jun 2008, Geoffrey Irving wrote:

> On Sun, Jun 1, 2008 at 11:13 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > On Sun, 1 Jun 2008, Geoffrey Irving wrote:
> >
> >> See commit d56dbe8cb56ce9b4697d1f1c2425e2e133a932a5 for the original 
> >> code.
> >
> > This is not in any official git.git repository.  And my branches are 
> > prone to be rebased.  So this is not a good reference.  The mailing 
> > list, however, would be one.
>
> That was added only because I knew it wouldn't be the final patch, and 
> that it'd be useful to you.

Ah, then you might want to say "[PATCH/RFC]" in the subject.  I thought 
you meant this for inclusion.

> >> diff --git a/Makefile b/Makefile
> >> index cce5a6e..3a5396d 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -435,6 +435,7 @@ LIB_OBJS += pager.o
> >>  LIB_OBJS += parse-options.o
> >>  LIB_OBJS += patch-delta.o
> >>  LIB_OBJS += patch-ids.o
> >> +LIB_OBJS += patch-id-cache.o
> >
> > If all you do is a hashmap from SHA-1 to SHA-1, then I think
> > "patch-id-cache" is a misnomer for that file and structure.
> >
> > Let's not repeat the same naming mistakes as we have for path_list and
> > decoration.
> 
> Is persistent_sha1_map a good name?

I'd prefer cached_sha1_map, but I do not really care.

> >> +void write_patch_id_cache()
> >> +{
> >> +     if (!cache || cache->count == written_count)
> >> +             return;
> >
> > Does that mean that the patch_id_cache is not updated when the number 
> > of commits stays the same after committing, branch -d and gc?
> 
> The patch_id_cache is only updated when it changes.  Since entries are 
> immutable and are never deleted, all changes increase the count.

Right, I only realized that with my previous post to this thread.  
However, in that case I would prefer a flag "int dirty:1" to the cache 
structure, which is set whenever it needs writing.  Certainly not a global 
variable (which the cache already is).
 
> >> +     hashcpy(entry.commit_sha1, commit_sha1);
> >> +     hashcpy(entry.patch_id_sha1, patch_id_sha1);
> >
> > It would be more elegant to copy the SHA-1s _after_ finding where to 
> > write them.
> 
> Alas, that would break the elegance of my loop, since the loop swaps in 
> new entries to keep the table sorted.  I can remove the sorting if you 
> want: I only left it in there to be more similar to your code.

I think that my code tried to do too much here; it was before I benched 
hashmap vs binary-search.

So I think that can just go and simplify the elegance of the loop even 
more ;-)

> > Declar... ah, well, suffice to say that you should read the 
> > CodingGuidelines, and try to fix up all the offending sites in your 
> > code.
> 
> I'd be happy to do that, but I don't see mention of either C++
> comments or declarations after statements in the CodingGuidelines.

>From the Coding Guidelines:

	As for more concrete guidelines, just imitate the existing code
	(this is a good guideline, no matter which project you are
	contributing to).

> 
> >> +             if (!cmp) {
> >> +                     if (hashcmp(entry.patch_id_sha1, cache->entries[i].patch_id_sha1))
> >> +                             die("found mismatched entry in patch-id-cache");
> >
> > I wonder if that potentially expensive operation should not rather be 
> > wrapped in an assert() call (since I recently learnt that Git's source 
> > code has more than one instance of assert()).
> 
> A 20 byte comparison doesn't seem potentially expensive to me.

It's all a question of repetition, isn't it?

In any case, I am not a fan of wasting cycles unnecessarily.  The check 
would indicate a programming error, not a user error, and should therefore 
punish the programmer, not the user.

> >>  static uint32_t take2(const unsigned char *id)
> >> @@ -136,6 +150,8 @@ int free_patch_ids(struct patch_ids *ids)
> >>               next = patches->next;
> >>               free(patches);
> >>       }
> >> +
> >> +     write_patch_id_cache();
> >>       return 0;
> >
> > That's cute.
> 
> Thanks (assuming cute means good).

Yes, it does!

> I'll wait to actually make any of these changes until you and Jeff 
> decide whether I should stick to hashes or use binary search.  It seems 
> sad not to use hashes for a map when we get the best hash keys in the 
> world for free, so I'd prefer not switching.

Me, too.

BTW I think that you did some awesome work here, and I am glad that you 
could use my code as starting point.

Thanks,
Dscho

  reply	other threads:[~2008-06-02 15:54 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-02  3:54 [PATCH] Adding a cache of commit to patch-id pairs to speed up git-cherry Geoffrey Irving
2008-06-02  6:13 ` Johannes Schindelin
2008-06-02  6:42   ` Jeff King
2008-06-02 14:35     ` Geoffrey Irving
2008-06-02 15:37       ` Johannes Schindelin
2008-06-02 15:49         ` Geoffrey Irving
2008-06-02 15:56           ` Shawn O. Pearce
2008-06-02 16:18           ` Johannes Schindelin
2008-06-02 16:26             ` Geoffrey Irving
2008-06-02 18:15               ` Johannes Schindelin
2008-06-07 23:50                 ` Geoffrey Irving
2008-06-08 16:10                   ` Johannes Schindelin
2008-06-02 14:50   ` Geoffrey Irving
2008-06-02 15:52     ` Johannes Schindelin [this message]
2008-06-02 17:23     ` Geoffrey Irving
2008-06-02 18:22       ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.1.00.0806021639150.13507@racer.site.net \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=irving@naml.us \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).