git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ondřej Bílka" <neleai@seznam.cz>
To: David Turner <dturner@twopensource.com>
Cc: git@vger.kernel.org, gitster@pobox.com, mhagger@alum.mit.edu,
	David Turner <dturner@twitter.com>
Subject: Re: [PATCH v7 1/1] refs.c: SSE4.2 optimizations for check_refname_component
Date: Sat, 14 Jun 2014 17:22:09 +0200	[thread overview]
Message-ID: <20140614152209.GA14125@domone.podge> (raw)
In-Reply-To: <1402012575-16546-2-git-send-email-dturner@twitter.com>

On Thu, Jun 05, 2014 at 07:56:15PM -0400, David Turner wrote:
> Optimize check_refname_component using SSE4.2, where available.
> 
> git rev-parse HEAD is a good test-case for this, since it does almost
> nothing except parse refs.  For one particular repo with about 60k
> refs, almost all packed, the timings are:
> 
> Look up table: 29 ms
> SSE4.2:        25 ms
> 
> This is about a 15% improvement.
> 
> The configure.ac changes include code from the GNU C Library written
> by Joseph S. Myers <joseph at codesourcery dot com>.
> 
> Only supports GCC and Clang at present, because C interfaces to the
> cpuid instruction are not well-standardized.
>
Still a SSE4.2 is not that useful, in most cases SSE2 is faster. Here I
think that difference will not be that big when correctly implemented.
That will avoid a runtime checks.

For parallelisation you need to take extra step and paralelize whole
check than going component-by-component.

For detecting sequences a faster way is construct bitmasks with SSE2 so
you could combine these. It avoids needing special casing on 16-byte
boundaries. 

Below is untested implementation where you could add a bad character
check with SSE4.2 which would speed it up. Are refs mostly
alphanumerical? If so we could speed this up by paralelized alnum check
and handling other characters in slower path.


#include <stdint.h>
#include <emmintrin.h>

char bad_table[256]; // TODO
int bad_characters(unsigned char *x)
{
	while (*x)
	if (bad_table[*x++])
		return -1;

	return 0;
}

int check_refname_skeleton(char *x)
{
	if (bad_characters(x))
		return -1;

	__m128i slash  = _mm_set1_epi8 ('/');
	__m128i dot    = _mm_set1_epi8 ('.');
	__m128i char_l = _mm_set1_epi8 ('l');
	__m128i at     = _mm_set1_epi8 ('@');
	__m128i brace  = _mm_set1_epi8 ('{');

	while (1) {
		if (((uint64_t) x) & 4095 < 4096 - 17)
			{
				if (bytewise_check(x) != -2);
					return bytewise_check(x);
	
				x += 16;
			}

		__m128i v0 = _mm_loadu_si128 ((__m128i *)(x));
		__m128i v1 = _mm_loadu_si128 ((__m128i *)(x + 1));

		__m128i result;

		// terminating 0
		result = _mm_cmpeq_epi8(v0, _mm_set1_epi8('\000'));

		// sequence ..
		result = _mm_or_si128(result, _mm_and_si128 (_mm_cmpeq_epi8(v0, dot),
                		              		     _mm_cmpeq_epi8(v1, dot)));

		// sequence /.
		result = _mm_or_si128(result, _mm_and_si128 (_mm_cmpeq_epi8(v0, slash),
                		              		     _mm_cmpeq_epi8(v1, dot)));

		// sequence //
		result = _mm_or_si128(result, _mm_and_si128 (_mm_cmpeq_epi8(v0, slash),
                		              		     _mm_cmpeq_epi8(v1, slash)));

                                                                 
		// sequence .l                                                   
		result = _mm_or_si128(result, _mm_and_si128 (_mm_cmpeq_epi8(v0, dot),
                		              		     _mm_cmpeq_epi8(v1, char_l)));
                                                                 
		// sequence @{                                                   
                                                                 
		result = _mm_or_si128(result, _mm_and_si128 (_mm_cmpeq_epi8(v0, at),
                		                             _mm_cmpeq_epi8(v1, brace)));

		uint64_t mask = _mm_movemask_epi8(result);
		if (mask) {
		        char *p = x + __builtin_ctzl(mask);

		        if (!*p)
                		return 0;
		        else if (p[0] == '.' && p[1] == 'l')
                		if (bytewise_check(x) != -2)
					return bytewise_check(x);
		        else
                		return -1;
		}
		x += 16;
	}
}

  reply	other threads:[~2014-06-14 15:22 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-05 23:56 [PATCH v7 0/1] refs.c: SSE4.2 optimizations for check_refname_component David Turner
2014-06-05 23:56 ` [PATCH v7 1/1] " David Turner
2014-06-14 15:22   ` Ondřej Bílka [this message]
2014-06-15  5:53     ` David Turner
2014-06-09 22:16 ` [PATCH v7 0/1] " Junio C Hamano
2014-06-09 22:39   ` David Turner
2014-06-09 23:05   ` Junio C Hamano
2014-06-10  6:04     ` Johannes Sixt
2014-06-10  6:55       ` Junio C Hamano
2014-06-13  1:18       ` David Turner
2014-06-13  4:11         ` Torsten Bögershausen
2014-06-14 10:24           ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140614152209.GA14125@domone.podge \
    --to=neleai@seznam.cz \
    --cc=dturner@twitter.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).