git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Dmitry Potapov <dpotapov@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marko Kreen <markokr@gmail.com>, Andreas Ericsson <ae@op5.se>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: I'm a total push-over..
Date: Sun, 27 Jan 2008 11:21:28 +0300	[thread overview]
Message-ID: <20080127082128.GH26664@dpotapov.dyndns.org> (raw)
In-Reply-To: <alpine.LFD.1.00.0801262247140.3222@www.l.google.com>

On Sat, Jan 26, 2008 at 10:51:18PM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 26 Jan 2008, Marko Kreen wrote:
> > 
> > Here you misunderstood me, I was proposing following:
> > 
> > int hash_folded(const char *str, int len)
> > {
> >    char buf[512];
> >    do_folding(buf, str, len);
> >    return do_hash(buf, len);
> > }
> > 
> > That is - the folded string should stay internal to hash function.
> 
> If it's internal, it's much better, but you still missed the performance 
> angle.
> 
> The fact is, hashing can take shortcuts that folding cannot do!
> 
> Case folding, by definition, has to be "exact" (since the whole point is 
> what you're going to use the same folding function to do the compare, so 
> if you play games with folding, the compares will be wrong).

Let's rename do_folding as something else, because it is not a real
folding, but a preparation step for hash calculation. Keeping these
steps separately simplifies the code, and allows further optimization,
for instance, you do not need this do_folding step on a case-sensitive
filesystem. Though it is certainly possible to mix both steps together,
it bloats the code and makes it less readable. Of course, the idea to
avoid a temporary buffer and do everything at once is very appealing,
so I gave it a try -- and here is a 32-bit version of name_hash(), but
I am not very happy with the result:


#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))

#define mix(a,b,c) \
{ \
	a -= c;  a ^= rot(c, 4);  c += b; \
	b -= a;  b ^= rot(a, 6);  a += c; \
	c -= b;  c ^= rot(b, 8);  b += a; \
	a -= c;  a ^= rot(c,16);  c += b; \
	b -= a;  b ^= rot(a,19);  a += c; \
	c -= b;  c ^= rot(b, 4);  b += a; \
}
#define final(a,b,c) \
{ \
	c ^= b; c -= rot(b,14); \
	a ^= c; a -= rot(c,11); \
	b ^= a; b -= rot(a,25); \
	c ^= b; c -= rot(b,16); \
	a ^= c; a -= rot(c,4);  \
	b ^= a; b -= rot(a,14); \
	c ^= b; c -= rot(b,24); \
}

#define hash_value(x) \
	hs[hp] += (x); \
	if (++hp == 3) { \
		mix (hs[0], hs[1], hs[2]); \
		hp = 0; \
	}
unsigned int name_hash(const char *name, unsigned size)
{
	unsigned hp = 0;
	unsigned hs[3];
	hs[0] = hs[1] = hs[2] = 0xdeadbeef + size;

	do {
		unsigned char c;
		if (size >= sizeof(unsigned)) {
			unsigned val = get_unaligned_uint(name);
			if (!(val & 0x80808080)) {
				val &= ~0x20202020;
				hash_value(val);
				name += sizeof(val);
				size -= sizeof(val);
				continue;
			}
		}

		while (!((c = *name) & 0x80)) {
			hash_value(c & ~0x20);
			name++;
			if (!--size)
				goto done:
		}

		do {
			// TODO: add denormalization for Mac
			unsigned val = towupper (utf8_to_wchar(&name, &size));
			hash_value(val);
		} while (size && (*name & 0x80));

	} while (size);
done:
	if (hp)
		final(a,b,c);
	return hs[2];
}


Dmitry

  reply	other threads:[~2008-01-27  8:22 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-22 23:37 I'm a total push-over Linus Torvalds
2008-01-23  1:35 ` Kevin Ballard
2008-01-23  2:23 ` Junio C Hamano
2008-01-23  2:36   ` Junio C Hamano
2008-01-23 12:24     ` Johannes Schindelin
2008-01-23 12:28       ` David Kastrup
2008-01-23 12:56         ` Theodore Tso
2008-01-23  2:58   ` Linus Torvalds
2008-01-23  3:19     ` Linus Torvalds
2008-01-25  6:50       ` Junio C Hamano
2008-01-25 16:24         ` Linus Torvalds
2008-01-23  7:23     ` Junio C Hamano
2008-01-23 12:25       ` Johannes Schindelin
2008-01-23 16:25         ` Linus Torvalds
2008-01-23 16:34           ` Johannes Schindelin
2008-01-23 17:09             ` Linus Torvalds
2008-01-23 17:29               ` Linus Torvalds
2008-01-25  5:21                 ` Jeremy Maitin-Shepard
2008-01-25 12:51                   ` Johannes Schindelin
2008-01-25 18:19                     ` Jeremy Maitin-Shepard
2008-01-25 18:24                       ` Johannes Schindelin
2008-01-25 19:07                       ` Junio C Hamano
2008-01-23  8:32 ` Andreas Ericsson
2008-01-23  9:15   ` Dmitry Potapov
2008-01-23  9:31     ` Andreas Ericsson
2008-01-23 14:01       ` Marko Kreen
2008-01-23 14:39         ` Andreas Ericsson
2008-01-24  6:51           ` Luke Lu
2008-01-24 10:24             ` Andreas Ericsson
2008-01-24 13:19           ` Marko Kreen
2008-01-24 16:00             ` Andreas Ericsson
2008-01-24 16:13               ` Marko Kreen
2008-01-24 16:28               ` Dmitry Potapov
2008-01-24 17:15                 ` Linus Torvalds
2008-01-24 18:45                   ` Dmitry Potapov
2008-01-24 19:08                     ` Linus Torvalds
2008-01-25 20:52                   ` Marko Kreen
2008-01-25 22:16                     ` Linus Torvalds
2008-01-25 22:35                       ` Linus Torvalds
2008-01-26 12:16                       ` Marko Kreen
2008-01-27  6:51                         ` Linus Torvalds
2008-01-27  8:21                           ` Dmitry Potapov [this message]
2008-01-27 14:07                             ` Johannes Schindelin
2008-01-27 14:48                               ` Dmitry Potapov
2008-01-27  9:45                           ` Marko Kreen
2008-01-27 15:06                             ` Dmitry Potapov
2008-01-26 12:37                       ` Marko Kreen
2008-01-25 20:08               ` Marko Kreen
2008-01-23 17:10       ` Dmitry Potapov
2008-01-24 10:39         ` Andreas Ericsson
2008-01-23 16:06   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080127082128.GH26664@dpotapov.dyndns.org \
    --to=dpotapov@gmail.com \
    --cc=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=markokr@gmail.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).