[RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)

git@vger.kernel.org mailing list mirror (one of many)

* [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
@ 2007-04-06 23:48 Arjen Laarhoven
  2007-04-07  0:47 ` Junio C Hamano
  2007-04-07  1:40 ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Arjen Laarhoven @ 2007-04-06 23:48 UTC
  To: Git Mailing List

The compiler toolchain supplied by Apple's Xcode environment has an old
version (1.38) of the GNU assembler.  It cannot assemble the optimized
ppc/sha1ppc.S file.  ppc/sha1ppc.S was rewritten into a Perl script
which outputs the same code, but valid for the Xcode assembler.

Signed-off-by: Arjen Laarhoven <arjen@yaph.org>
---
 Makefile                     |   15 +++-
 ppc/darwin/darwin_ppc_gen.pl |  211 ++++++++++++++++++++++++++++++++++++++++++
 ppc/{ => linux}/sha1ppc.S    |    0 
 3 files changed, 223 insertions(+), 3 deletions(-)
 create mode 100755 ppc/darwin/darwin_ppc_gen.pl
 rename ppc/{ => linux}/sha1ppc.S (100%)

diff --git a/Makefile b/Makefile
index b159ffd..a91fa2a 100644
--- a/Makefile
+++ b/Makefile
@@ -587,9 +587,13 @@ ifdef OLD_ICONV
 	BASIC_CFLAGS += -DOLD_ICONV
 endif
 
-ifdef PPC_SHA1
+ifdef PPC_SHA1_LINUX
 	SHA1_HEADER = "ppc/sha1.h"
-	LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o
+	LIB_OBJS += ppc/sha1.o ppc/linux/sha1ppc.o
+else
+ifdef PPC_SHA1_DARWIN
+	SHA1_HEADER = "ppc/sha1.h"
+	LIB_OBJS += ppc/sha1.o ppc/darwin/sha1ppc.o
 else
 ifdef ARM_SHA1
 	SHA1_HEADER = "arm/sha1.h"
@@ -604,6 +608,7 @@ else
 endif
 endif
 endif
+endif
 ifdef NO_PERL_MAKEMAKER
 	export NO_PERL_MAKEMAKER
 endif
@@ -620,6 +625,7 @@ endif
 ifneq ($(findstring $(MAKEFLAGS),s),s)
 ifndef V
 	QUIET_CC       = @echo '   ' CC $@;
+	QUIET_AS       = @echo '   ' AS $@; 
 	QUIET_AR       = @echo '   ' AR $@;
 	QUIET_LINK     = @echo '   ' LINK $@;
 	QUIET_BUILT_IN = @echo '   ' BUILTIN $@;
@@ -780,6 +786,9 @@ exec_cmd.o: exec_cmd.c GIT-CFLAGS
 builtin-init-db.o: builtin-init-db.c GIT-CFLAGS
 	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) -DDEFAULT_GIT_TEMPLATE_DIR='"$(template_dir_SQ)"' $<
 
+ppc/darwin/sha1ppc.S:
+	$(QUIET_GEN)$(PERL_PATH) ppc/darwin/darwin_ppc_gen.pl > $@
+
 http.o: http.c GIT-CFLAGS
 	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) -DGIT_USER_AGENT='"git/$(GIT_VERSION)"' $<
 
@@ -962,7 +971,7 @@ dist-doc:
 ### Cleaning rules
 
 clean:
-	rm -f *.o mozilla-sha1/*.o arm/*.o ppc/*.o compat/*.o xdiff/*.o \
+	rm -f *.o mozilla-sha1/*.o arm/*.o ppc/*.o ppc/darwin/*.[os] ppc/linux/*.o compat/*.o xdiff/*.o \
 		test-chmtime$X $(LIB_FILE) $(XDIFF_LIB)
 	rm -f $(ALL_PROGRAMS) $(BUILT_INS) git$X
 	rm -f *.spec *.pyc *.pyo */*.pyc */*.pyo common-cmds.h TAGS tags
diff --git a/ppc/darwin/darwin_ppc_gen.pl b/ppc/darwin/darwin_ppc_gen.pl
new file mode 100755
index 0000000..346cd71
--- /dev/null
+++ b/ppc/darwin/darwin_ppc_gen.pl
@@ -0,0 +1,211 @@
+#!/usr/bin/perl
+
+# This script generates the PowerPC assembly code for optimized SHA-1
+# hash generation on Darwin (Mac OS X).  It is a rewrite of the original
+# ppc/sha1ppc.S file.
+#
+# The original ppc/sha1ppc.S cannot be assembled with the toolchain
+# supplied with Xcode, as the assembler is (based on) GNU as version
+# 1.38.  The problem is basically that the 1.38 assembler doesn't
+# understand the computed register numbers used in the macros and
+# register numbers without the 'r'.  This script acts as preprocessor
+# and evaluates the expressions for the register numbers and outputs
+# the final correct assembly for the 1.38 assembler.
+
+use strict;
+use warnings;
+
+
+sub RA { my ($t) = @_; 'r'.((($t)+4)%5+6) }
+sub RB { my ($t) = @_; 'r'.((($t)+3)%5+6) }
+sub RC { my ($t) = @_; 'r'.((($t)+2)%5+6) }
+sub RD { my ($t) = @_; 'r'.((($t)+1)%5+6) }
+sub RE { my ($t) = @_; 'r'.((($t)+0)%5+6) }
+sub W  { my ($t) = @_; 'r'.(($t)%16+11)   }
+
+sub LOADW { my $s = shift; return "\tlwz ".W($s).','.($s)*4 .'(r4)'; }
+
+sub STEPD0_LOAD {
+    my ($t, $s) = @_;
+
+    return join "\n",
+        "\tadd ".RE($t).','.RE($t).','.W($t),
+        "\tandc r0,".RD($t).','.RB($t),
+        "\tand ".W($s).','.RC($t).','.RB($t),
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\trotlwi r0,".RA($t).',5',
+        "\trotlwi ".RB($t).','.RB($t).',30',
+        "\tadd ".RE($t).','.RE($t).','.W($s),
+        "\tadd r0,r0,r5",
+        "\tlwz ".W($s).','.($s)*4 .'(r4)',
+        "\tadd ".RE($t).','.RE($t).',r0';
+}
+
+sub STEPD0_UPDATE {
+    my ($t, $s, $loadk) = @_;
+
+    return join "\n",
+    "\tadd ".RE($t).','.RE($t).','.W($t),
+    "\tandc r0,".RD($t).','.RB($t),
+    "\txor ".W($s).','.W(($s)-16).','.W(($s)-3),
+    "\tadd ".RE($t).','.RE($t).',r0',
+    "\tand r0,".RC($t).','.RB($t),
+    "\txor ".W($s).','.W($s).','.W(($s)-8),
+    "\tadd ".RE($t).','.RE($t).',r0',
+    "\trotlwi r0,".RA($t).',5',
+    "\txor ".W($s).','.W($s).','.W(($s)-14),
+    "\tadd ".RE($t).','.RE($t).',r5',
+    $loadk || (),
+    "\trotlwi ".RB($t).','.RB($t).',30',
+    "\trotlwi ".W($s).','.W($s).',1',
+    "\tadd ".RE($t).','.RE($t).',r0';
+}
+
+sub STEPD1_UPDATE {
+    my ($t, $s, $loadk) = @_;
+
+    return join "\n",
+        "\tadd ".RE($t).','.RE($t).','.W($t),
+        "\txor r0,".RD($t).','.RB($t),
+        "\txor ".W($s).','.W(($s)-16).','.W(($s)-3),
+        "\tadd ".RE($t).','.RE($t).',r5',
+        $loadk || (),
+        "\txor r0,r0,".RC($t),
+        "\txor ".W($s).','.W($s).','.W(($s)-8),
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\trotlwi r0,".RA($t).',5',
+        "\txor ".W($s).','.W($s).','.W(($s)-14),
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\trotlwi ".RB($t).','.RB($t).',30',
+        "\trotlwi ".W($s).','.W($s).',1';
+}
+
+sub STEPD1 {
+    my ($t) = @_;
+
+    return join "\n",
+        "\tadd ".RE($t).','.RE($t).','.W($t),
+        "\txor r0,".RD($t).','.RB($t),
+        "\trotlwi ".RB($t).','.RB($t).',30',
+        "\tadd ".RE($t).','.RE($t).',r5',
+        "\txor r0,r0,".RC($t),
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\trotlwi r0,".RA($t).',5',
+        "\tadd ".RE($t).','.RE($t).',r0';
+}
+
+sub STEPD2_UPDATE {
+    my ($t, $s, $loadk) = @_;
+
+    return join "\n",
+        "\tadd ".RE($t).','.RE($t).','.W($t),
+        "\tand r0,".RD($t).','.RB($t),
+        "\txor ".W($s).','.W(($s)-16).','.W(($s)-3),
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\txor r0,".RD($t).','.RB($t),
+        "\txor ".W($s).','.W($s).','.W(($s)-8),
+        "\tadd ".RE($t).','.RE($t).',r5',
+        $loadk || (),
+        "\tand r0,r0,".RC($t),
+        "\txor ".W($s).','.W($s).','.W(($s)-14),
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\trotlwi r0,".RA($t).',5',
+        "\trotlwi ".W($s).','.W($s).',1',
+        "\tadd ".RE($t).','.RE($t).',r0',
+        "\trotlwi ".RB($t).','.RB($t).',30',
+}
+
+sub STEP0_LOAD4 {
+    my ($t, $s) = @_;
+
+    return join "\n",
+        STEPD0_LOAD($t, $s),
+        STEPD0_LOAD($t+1, $s+1),
+        STEPD0_LOAD($t+2, $s+2),
+        STEPD0_LOAD($t+3, $s+3);
+}
+
+sub STEPUP4 {
+    my ($fn, $t, $s, $loadk) = @_;
+
+    no strict 'refs';
+    return join "\n",
+        &{'STEP' . $fn . '_UPDATE'}($t, $s),
+        &{'STEP' . $fn . '_UPDATE'}($t+1, $s+1),
+        &{'STEP' . $fn . '_UPDATE'}($t+2, $s+2),
+        &{'STEP' . $fn . '_UPDATE'}($t+3, $s+3, $loadk),
+}
+
+sub STEPUP20 {
+    my ($fn, $t, $s, $loadk) = @_;
+
+    return join "\n",
+        STEPUP4($fn, $t, $s),
+        STEPUP4($fn, $t+4, $s+4),
+        STEPUP4($fn, $t+8, $s+8),
+        STEPUP4($fn, $t+12, $s+12),
+        STEPUP4($fn, $t+16, $s+16, $loadk),
+}
+
+print <<'EOA';
+        .globl  _sha1_core
+_sha1_core:
+        stwu    r1,-80(r1)
+        stmw    r13,4(r1)
+
+        /* Load up A - E */
+        lmw     r27,0(r3)
+
+        mtctr   r5
+
+1:
+EOA
+
+print LOADW(0)."\n";
+print "\tlis r5,0x5a82\n";
+print "\tmr ".RE(0).",r31\n";
+print LOADW(1)."\n";
+print "\tmr ".RD(0).",r30\n";
+print "\tmr ".RC(0).",r29\n";
+print LOADW(2)."\n";
+print "\tori r5,r5,0x7999\n";
+print "\tmr ".RB(0).",r28\n";
+print LOADW(3)."\n";
+print "\tmr ".RA(0).",r27\n";
+
+print STEP0_LOAD4(0, 4)."\n";
+print STEP0_LOAD4(4, 8)."\n";
+print STEP0_LOAD4(8, 12)."\n";
+print STEPUP4("D0", 12, 16,)."\n";
+print STEPUP4("D0", 16, 20, "lis r5,0x6ed9")."\n";
+
+print "\tori r5,r5,0xeba1\n";
+print STEPUP20("D1", 20, 24, "lis r5,0x8f1b")."\n";
+
+print "\tori r5,r5,0xbcdc\n";
+print STEPUP20("D2", 40, 44, "lis r5,0xca62")."\n";
+
+print "\tori r5,r5,0xc1d6\n";
+print STEPUP4("D1", 60, 64,)."\n";
+print STEPUP4("D1", 64, 68,)."\n";
+print STEPUP4("D1", 68, 72,)."\n";
+print STEPUP4("D1", 72, 76,)."\n";
+print "\taddi r4,r4,64\n";
+print STEPD1(76)."\n";
+print STEPD1(77)."\n";
+print STEPD1(78)."\n";
+print STEPD1(79)."\n";
+
+print "\tadd r31,r31,".RE(0)."\n";
+print "\tadd r30,r30,".RD(0)."\n";
+print "\tadd r29,r29,".RC(0)."\n";
+print "\tadd r28,r28,".RB(0)."\n";
+print "\tadd r27,r27,".RA(0)."\n";
+
+print "\tbdnz 1b\n";
+
+print "\tstmw r27,0(r3)\n";
+print "\tlmw  r13,4(r1)\n";
+print "\taddi r1,r1,80\n";
+print "\tblr\n";
+
diff --git a/ppc/sha1ppc.S b/ppc/linux/sha1ppc.S
similarity index 100%
rename from ppc/sha1ppc.S
rename to ppc/linux/sha1ppc.S
-- 
1.5.1.rc3.29.gd8b6


* Re: [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
  2007-04-06 23:48 [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X) Arjen Laarhoven
@ 2007-04-07  0:47 ` Junio C Hamano
  2007-04-07  1:40 ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2007-04-07  0:47 UTC
  To: Arjen Laarhoven; +Cc: Git Mailing List

arjen@yaph.org (Arjen Laarhoven) writes:

> The compiler toolchain supplied by Apple's Xcode environment has an old
> version (1.38) of the GNU assembler.  It cannot assemble the optimized
> ppc/sha1ppc.S file.  ppc/sha1ppc.S was rewritten into a Perl script
> which outputs the same code, but valid for the Xcode assembler.
>
> Signed-off-by: Arjen Laarhoven <arjen@yaph.org>

Gaah.

When there are improvements/fixes to the sha1ppc.S side, how are
you going to keep that in sync with darwin_ppc_gen.pl?  If that
script *_gen.pl were a postprocessor that munges CPP output from
sha1ppc.S to make it assemblable with an old assembler, it would
be one thing.  But this looks horrible.


* Re: [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
  2007-04-06 23:48 [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X) Arjen Laarhoven
  2007-04-07  0:47 ` Junio C Hamano
@ 2007-04-07  1:40 ` Linus Torvalds
  2007-04-08 20:09   ` Arjen Laarhoven
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2007-04-07  1:40 UTC
  To: Arjen Laarhoven; +Cc: Git Mailing List

On Sat, 7 Apr 2007, Arjen Laarhoven wrote:
>
> The compiler toolchain supplied by Apple's Xcode environment has an old
> version (1.38) of the GNU assembler.  It cannot assemble the optimized
> ppc/sha1ppc.S file.  ppc/sha1ppc.S was rewritten into a Perl script
> which outputs the same code, but valid for the Xcode assembler.

Ugh. That's just too ugly.

The Linux version of the GNU assembler can certainly take the same limited 
input as the old Apple one. 

So how about instead of having two totally different versions of this
file, just having *one*, and having a pre-processor that turns it into 
something that is acceptable to both?

And yes, it could be your perl script, except your perl script is ugly as 
*hell*. The old C preprocessor code is much nicer than your perl script 
that does "print" statements.

How about something like the following instead?

 (a) make the register macros expand to something easily 
     greppable/parseable
 (b) have a *separate* preprocessor phase that actually then takes that 
     pattern, and evaluates it to a numeric value.
 (c) assemble the end result

The (a) part is trivial. Just a patch like the appended will make sure 
that all the registers are now written as "REG[int-expression]", and then 
all you need is a perl-script or something that can trigger on the regexp

	"REG\[\([^]]*\)\]"

and replace that regex with

	"%eval(\1)"

which is something that perl should be designed for.

That way you just have *one* source file (the "sha1ppc.S" one), which is 
readable, and a simple script to then evaluate the register numbers 
statically instead of expecting that the assembler can do it (since the 
Apple one apparently cannot).

So it would just require somebody who knows perl. What's a one-liner perl 
script to turn a line like

	add REG[((0)+0)%5+6],REG[((0)+0)%5+6],REG[(0)%16+11];

into

	add %6,%6,%11

(ie it just evaluated the expression inside the [] things, and replaced it 
with the "%<num>" string)?

<Taunting mode>Or maybe perl can't do that in a single line!</Taunting mode>

		Linus
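
A minimal sketch of the substitution asked about above, not a tested build step. It matches each REG[...] and replaces it with "%" followed by the evaluated expression. The names eval_reg_expr and expand_regs are made up for illustration, and the character whitelist in front of eval is an added precaution that is not part of the proposal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: evaluate one register expression such as
# "((0)+4)%5+6".  Only digits, whitespace, parentheses and + - * / %
# are accepted, so eval never sees anything but plain arithmetic.
sub eval_reg_expr {
    my ($expr) = @_;
    $expr =~ m{\A[\d\s()+\-*/%]+\z}
        or die "unexpected register expression: $expr\n";
    my $value = eval $expr;
    die "cannot evaluate '$expr': $@" if $@;
    return $value;
}

# Hypothetical helper: replace every REG[expr] in a line with "%<value>".
sub expand_regs {
    my ($line) = @_;
    $line =~ s{REG\[([^\]]*)\]}{'%' . eval_reg_expr($1)}ge;
    return $line;
}

print expand_regs('add REG[((0)+0)%5+6],REG[((0)+0)%5+6],REG[(0)%16+11]'), "\n";
```

Run on the sample line from this mail it prints "add %6,%6,%11". Squeezed onto a command line, the same idea is a single s///ge with a bare eval and no input check.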

---
diff --git a/ppc/sha1ppc.S b/ppc/sha1ppc.S
index f132696..cc554a4 100644
--- a/ppc/sha1ppc.S
+++ b/ppc/sha1ppc.S
@@ -32,14 +32,14 @@
  * We use registers 6 - 10 for this.  (Registers 27 - 31 hold
  * the previous values.)
  */
-#define RA(t)	(((t)+4)%5+6)
-#define RB(t)	(((t)+3)%5+6)
-#define RC(t)	(((t)+2)%5+6)
-#define RD(t)	(((t)+1)%5+6)
-#define RE(t)	(((t)+0)%5+6)
+#define RA(t)	REG[((t)+4)%5+6]
+#define RB(t)	REG[((t)+3)%5+6]
+#define RC(t)	REG[((t)+2)%5+6]
+#define RD(t)	REG[((t)+1)%5+6]
+#define RE(t)	REG[((t)+0)%5+6]

 /* We use registers 11 - 26 for the W values */
-#define W(t)	((t)%16+11)
+#define W(t)	REG[(t)%16+11]

 /* Register 5 is used for the constant k */


* Re: [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
  2007-04-07  1:40 ` Linus Torvalds
@ 2007-04-08 20:09   ` Arjen Laarhoven
  2007-04-10  9:48     ` Karl Hasselström
  0 siblings, 1 reply; 7+ messages in thread
From: Arjen Laarhoven @ 2007-04-08 20:09 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

Hi,

On Fri, Apr 06, 2007 at 06:40:53PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 7 Apr 2007, Arjen Laarhoven wrote:
> >
> > The compiler toolchain supplied by Apple's Xcode environment has an old
> > version (1.38) of the GNU assembler.  It cannot assemble the optimized
> > ppc/sha1ppc.S file.  ppc/sha1ppc.S was rewritten into a Perl script
> > which outputs the same code, but valid for the Xcode assembler.
> 
> Ugh. That's just too ugly.

Yes.  Very.  I should've reworked it before sending it to the list.  Ah
well.

> The Linux version of the GNU assembler can certainly take the same limited 
> input as the old Apple one. 
> 
> So how about instead of having two totally different versions of this
> file, just having *one*, and having a pre-processor that turns it into 
> something that is acceptable to both?

That is of course the best way to handle it.  See the patch below for
the reworked solution.

[snip excellent pointers]

> So it would just require somebody who knows perl. What's a one-liner perl 
> script to turn a line like
> 
> 	add REG[((0)+0)%5+6],REG[((0)+0)%5+6],REG[(0)%16+11];
> 
> into
> 
> 	add %6,%6,%11
> 
> (ie it just evaluated the expression inside the [] things, and replaced it 
> with the "%<num>" string)?
> 
> <Taunting mode>Or maybe perl can't do that in a single line!</Taunting mode>

Of course it can! :-P

But there are some other issues like the underscore prefix of the symbol
in the assembly and the inability of Apple's assembler to handle
multiple statements per line.  So for the sake of maintainability I've
put it in its own file, and even turned on warnings and strict ;-)
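
The two Darwin issues named above can be shown in isolation. This is only a sketch: darwin_fixups is a hypothetical name, the sample input is made up, and it assumes that ';' never appears inside a comment or string on the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the two Darwin-specific rewrites to one line
# of preprocessed assembly and return the resulting statements.
sub darwin_fixups {
    my ($line) = @_;
    chomp $line;

    # Darwin expects an underscore in front of the symbol; the lookbehind
    # keeps an already-prefixed name from being prefixed twice.
    $line =~ s{(?<![A-Za-z0-9_])sha1_core\b}{_sha1_core}g;

    # One statement per line: split on ';', trim, drop empty pieces.
    my @statements;
    for my $piece (split /;/, $line) {
        $piece =~ s/^\s+//;
        $piece =~ s/\s+$//;
        push @statements, $piece if length $piece;
    }
    return @statements;
}

print "\t$_\n" for darwin_fixups('add r6,r6,r11; andc r0,r7,r9; and r15,r8,r9');
print "$_\n"   for darwin_fixups('.globl sha1_core');
```

The first call yields three separate statements and the second yields ".globl _sha1_core".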

I don't have access to a Linux/PPC machine, so it could very well need
some tweaking.  Someone with a Linux/PPC box want to give it a try?

---snip---
Optimized PowerPC SHA-1 calculation for Darwin

The compiler toolchain from Apple's Xcode environment uses an old
version (1.38) of the GNU assembler which cannot assemble the
optimized SHA-1 calculation in ppc/sha1ppc.S.  The main problem is the
use of calculated register numbers which gas 1.38 doesn't understand.
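
To make "calculated register numbers" concrete: the RA..RE macros compute ((t)+n)%5+6 and W computes (t)%16+11, so the register behind each name depends on the round number t. The sketch below prints that mapping for the first few rounds. The names %offset, reg_for and w_reg are illustrative only and do not appear in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Offsets taken from the macros: RA(t) = ((t)+4)%5+6 ... RE(t) = ((t)+0)%5+6.
my %offset = (RA => 4, RB => 3, RC => 2, RD => 1, RE => 0);

# Hypothetical helper: register number that a role maps to in round $t.
sub reg_for {
    my ($role, $t) = @_;
    return ($t + $offset{$role}) % 5 + 6;
}

# W(t) = (t)%16+11 maps the message words onto registers 11..26.
sub w_reg {
    my ($t) = @_;
    return $t % 16 + 11;
}

for my $t (0 .. 5) {
    printf "t=%d  %s  W=r%d\n", $t,
        join(' ', map { "$_=r" . reg_for($_, $t) } qw(RA RB RC RD RE)),
        w_reg($t);
}
```

The five working names cycle through r6..r10 with a period of five rounds, so round 5 uses the same assignment as round 0. These are the values a preprocessing step has to compute when the assembler cannot.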

To create valid assembly code the registers in ppc/sha1ppc.in.S are
represented by R[<register number>].  sha1ppc.in.S is postprocessed by
gen_sha1ppc.pl to generate valid assembly code for gas 1.38.

Signed-off-by: Arjen Laarhoven <arjen@yaph.org>
---
 Makefile                        |    7 ++-
 ppc/gen_sha1ppc.pl              |   19 +++++++
 ppc/{sha1ppc.S => sha1ppc.in.S} |  110 +++++++++++++++++++-------------------
 3 files changed, 79 insertions(+), 57 deletions(-)
 create mode 100644 ppc/gen_sha1ppc.pl
 rename ppc/{sha1ppc.S => sha1ppc.in.S} (70%)

diff --git a/Makefile b/Makefile
index ac29c62..01b69e7 100644
--- a/Makefile
+++ b/Makefile
@@ -825,7 +825,7 @@ git$X git.spec \
 
 %.o: %.c GIT-CFLAGS
 	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) $<
-%.o: %.S
+%.o: %.s
 	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) $<
 
 exec_cmd.o: exec_cmd.c GIT-CFLAGS
@@ -836,6 +836,9 @@ builtin-init-db.o: builtin-init-db.c GIT-CFLAGS
 http.o: http.c GIT-CFLAGS
 	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) -DGIT_USER_AGENT='"git/$(GIT_VERSION)"' $<
 
+ppc/sha1ppc.s: ppc/sha1ppc.in.S
+	$(QUIET_CC)$(CC) -c -E $< | $(PERL_PATH) ppc/gen_sha1ppc.pl > $@
+
 ifdef NO_EXPAT
 http-fetch.o: http-fetch.c http.h GIT-CFLAGS
 	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) -DNO_EXPAT $<
@@ -1032,7 +1035,7 @@ dist-doc:
 ### Cleaning rules
 
 clean:
-	rm -f *.o mozilla-sha1/*.o arm/*.o ppc/*.o compat/*.o xdiff/*.o \
+	rm -f *.o mozilla-sha1/*.o arm/*.o ppc/*.[so] compat/*.o xdiff/*.o \
 		test-chmtime$X $(LIB_FILE) $(XDIFF_LIB)
 	rm -f $(ALL_PROGRAMS) $(BUILT_INS) git$X
 	rm -f *.spec *.pyc *.pyo */*.pyc */*.pyo common-cmds.h TAGS tags
diff --git a/ppc/gen_sha1ppc.pl b/ppc/gen_sha1ppc.pl
new file mode 100644
index 0000000..79ba1a1
--- /dev/null
+++ b/ppc/gen_sha1ppc.pl
@@ -0,0 +1,19 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+my %platform = (
+    # Special extra substitutions that have to be done on this platform
+    darwin => sub {
+        s{sha1_core}{_sha1_core};
+        s{;}{\n}g;
+    },
+);
+
+my $extra = exists $platform{$^O} ? $platform{$^O} : sub {};
+
+while (<>) {
+    $extra->();
+    s{R\[([^]]+)\]}{'r'.eval"$1"}ge;
+    print;
+}
diff --git a/ppc/sha1ppc.S b/ppc/sha1ppc.in.S
similarity index 70%
rename from ppc/sha1ppc.S
rename to ppc/sha1ppc.in.S
index f132696..11bc2e0 100644
--- a/ppc/sha1ppc.S
+++ b/ppc/sha1ppc.in.S
@@ -32,14 +32,14 @@
  * We use registers 6 - 10 for this.  (Registers 27 - 31 hold
  * the previous values.)
  */
-#define RA(t)	(((t)+4)%5+6)
-#define RB(t)	(((t)+3)%5+6)
-#define RC(t)	(((t)+2)%5+6)
-#define RD(t)	(((t)+1)%5+6)
-#define RE(t)	(((t)+0)%5+6)
+#define RA(t)	R[((t)+4)%5+6]
+#define RB(t)	R[((t)+3)%5+6]
+#define RC(t)	R[((t)+2)%5+6]
+#define RD(t)	R[((t)+1)%5+6]
+#define RE(t)	R[((t)+0)%5+6]
 
 /* We use registers 11 - 26 for the W values */
-#define W(t)	((t)%16+11)
+#define W(t)	R[(t)%16+11]
 
 /* Register 5 is used for the constant k */
 
@@ -86,7 +86,7 @@
 
 /* the initial loads. */
 #define LOADW(s) \
-	lwz	W(s),(s)*4(%r4)
+	lwz	W(s),(s)*4(R[4])
 
 /*
  * Perform a step with F0, and load W(s).  Uses W(s) as a temporary
@@ -97,10 +97,10 @@
  * second line.)  Thus, two iterations take 7 cycles, 3.5 cycles per round.
  */
 #define STEPD0_LOAD(t,s) \
-add RE(t),RE(t),W(t); andc   %r0,RD(t),RB(t);  and    W(s),RC(t),RB(t); \
-add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;      rotlwi RB(t),RB(t),30;   \
-add RE(t),RE(t),W(s); add    %r0,%r0,%r5;      lwz    W(s),(s)*4(%r4);  \
-add RE(t),RE(t),%r0
+add RE(t),RE(t),W(t); andc   R[0],RD(t),RB(t); and    W(s),RC(t),RB(t); \
+add RE(t),RE(t),R[0]; rotlwi R[0],RA(t),5;     rotlwi RB(t),RB(t),30;   \
+add RE(t),RE(t),W(s); add    R[0],R[0],R[5];   lwz    W(s),(s)*4(R[4]); \
+add RE(t),RE(t),R[0]
 
 /*
  * This is likewise awkward, 13 instructions.  However, it can also
@@ -108,28 +108,28 @@ add RE(t),RE(t),%r0
  * in 9 cycles, 4.5 cycles/round.
  */
 #define STEPD0_UPDATE(t,s,loadk...) \
-add RE(t),RE(t),W(t); andc   %r0,RD(t),RB(t); xor    W(s),W((s)-16),W((s)-3); \
-add RE(t),RE(t),%r0;  and    %r0,RC(t),RB(t); xor    W(s),W(s),W((s)-8);      \
-add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;     xor    W(s),W(s),W((s)-14);     \
-add RE(t),RE(t),%r5;  loadk; rotlwi RB(t),RB(t),30;  rotlwi W(s),W(s),1;     \
-add RE(t),RE(t),%r0
+add RE(t),RE(t),W(t); andc   R[0],RD(t),RB(t); xor   W(s),W((s)-16),W((s)-3); \
+add RE(t),RE(t),R[0]; and    R[0],RC(t),RB(t); xor   W(s),W(s),W((s)-8);      \
+add RE(t),RE(t),R[0]; rotlwi R[0],RA(t),5;     xor   W(s),W(s),W((s)-14);     \
+add RE(t),RE(t),R[5]; loadk; rotlwi RB(t),RB(t),30;  rotlwi W(s),W(s),1;      \
+add RE(t),RE(t),R[0]
 
 /* Nicely optimal.  Conveniently, also the most common. */
 #define STEPD1_UPDATE(t,s,loadk...) \
-add RE(t),RE(t),W(t); xor    %r0,RD(t),RB(t); xor    W(s),W((s)-16),W((s)-3); \
-add RE(t),RE(t),%r5;  loadk; xor %r0,%r0,RC(t);  xor W(s),W(s),W((s)-8);      \
-add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;     xor    W(s),W(s),W((s)-14);     \
-add RE(t),RE(t),%r0;  rotlwi RB(t),RB(t),30;  rotlwi W(s),W(s),1
+add RE(t),RE(t),W(t); xor    R[0],RD(t),RB(t);    xor W(s),W((s)-16),W((s)-3); \
+add RE(t),RE(t),R[5]; loadk; xor R[0],R[0],RC(t); xor W(s),W(s),W((s)-8);    \
+add RE(t),RE(t),R[0]; rotlwi R[0],RA(t),5;   xor    W(s),W(s),W((s)-14);  \
+add RE(t),RE(t),R[0]; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1
 
 /*
  * The naked version, no UPDATE, for the last 4 rounds.  3 cycles per.
  * We could use W(s) as a temp register, but we don't need it.
  */
 #define STEPD1(t) \
-                        add   RE(t),RE(t),W(t); xor    %r0,RD(t),RB(t); \
-rotlwi RB(t),RB(t),30;  add   RE(t),RE(t),%r5;  xor    %r0,%r0,RC(t);   \
-add    RE(t),RE(t),%r0; rotlwi %r0,RA(t),5;     /* spare slot */        \
-add    RE(t),RE(t),%r0
+                        add   RE(t),RE(t),W(t); xor    R[0],RD(t),RB(t); \
+rotlwi RB(t),RB(t),30;  add   RE(t),RE(t),R[5]; xor    R[0],R[0],RC(t);   \
+add    RE(t),RE(t),R[0]; rotlwi R[0],RA(t),5;     /* spare slot */        \
+add    RE(t),RE(t),R[0]
 
 /*
  * 14 instructions, 5 cycles per.  The majority function is a bit
@@ -137,11 +137,11 @@ add    RE(t),RE(t),%r0
  * but it causes a 2-instruction delay, which triggers a stall.
  */
 #define STEPD2_UPDATE(t,s,loadk...) \
-add RE(t),RE(t),W(t); and    %r0,RD(t),RB(t); xor    W(s),W((s)-16),W((s)-3); \
-add RE(t),RE(t),%r0;  xor    %r0,RD(t),RB(t); xor    W(s),W(s),W((s)-8);      \
-add RE(t),RE(t),%r5;  loadk; and %r0,%r0,RC(t);  xor W(s),W(s),W((s)-14);     \
-add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;     rotlwi W(s),W(s),1;             \
-add RE(t),RE(t),%r0;  rotlwi RB(t),RB(t),30
+add RE(t),RE(t),W(t); and    R[0],RD(t),RB(t); xor  W(s),W((s)-16),W((s)-3); \
+add RE(t),RE(t),R[0]; xor    R[0],RD(t),RB(t); xor  W(s),W(s),W((s)-8);      \
+add RE(t),RE(t),R[5]; loadk; and R[0],R[0],RC(t);  xor W(s),W(s),W((s)-14);  \
+add RE(t),RE(t),R[0]; rotlwi R[0],RA(t),5;     rotlwi W(s),W(s),1;           \
+add RE(t),RE(t),R[0]; rotlwi RB(t),RB(t),30
 
 #define STEP0_LOAD4(t,s)		\
 	STEPD0_LOAD(t,s);		\
@@ -164,61 +164,61 @@ add RE(t),RE(t),%r0;  rotlwi RB(t),RB(t),30
 
 	.globl	sha1_core
 sha1_core:
-	stwu	%r1,-80(%r1)
-	stmw	%r13,4(%r1)
+	stwu	R[1],-80(R[1])
+	stmw	R[13],4(R[1])
 
 	/* Load up A - E */
-	lmw	%r27,0(%r3)
+	lmw	R[27],0(R[3])
 
-	mtctr	%r5
+	mtctr	R[5]
 
 1:
 	LOADW(0)
-	lis	%r5,0x5a82
-	mr	RE(0),%r31
+	lis	R[5],0x5a82
+	mr	RE(0),R[31]
 	LOADW(1)
-	mr	RD(0),%r30
-	mr	RC(0),%r29
+	mr	RD(0),R[30]
+	mr	RC(0),R[29]
 	LOADW(2)
-	ori	%r5,%r5,0x7999	/* K0-19 */
-	mr	RB(0),%r28
+	ori	R[5],R[5],0x7999	/* K0-19 */
+	mr	RB(0),R[28]
 	LOADW(3)
-	mr	RA(0),%r27
+	mr	RA(0),R[27]
 
 	STEP0_LOAD4(0, 4)
 	STEP0_LOAD4(4, 8)
 	STEP0_LOAD4(8, 12)
 	STEPUP4(D0, 12, 16,)
-	STEPUP4(D0, 16, 20, lis %r5,0x6ed9)
+	STEPUP4(D0, 16, 20, lis R[5],0x6ed9)
 
-	ori	%r5,%r5,0xeba1	/* K20-39 */
-	STEPUP20(D1, 20, 24, lis %r5,0x8f1b)
+	ori	R[5],R[5],0xeba1	/* K20-39 */
+	STEPUP20(D1, 20, 24, lis R[5],0x8f1b)
 
-	ori	%r5,%r5,0xbcdc	/* K40-59 */
-	STEPUP20(D2, 40, 44, lis %r5,0xca62)
+	ori	R[5],R[5],0xbcdc	/* K40-59 */
+	STEPUP20(D2, 40, 44, lis R[5],0xca62)
 
-	ori	%r5,%r5,0xc1d6	/* K60-79 */
+	ori	R[5],R[5],0xc1d6	/* K60-79 */
 	STEPUP4(D1, 60, 64,)
 	STEPUP4(D1, 64, 68,)
 	STEPUP4(D1, 68, 72,)
 	STEPUP4(D1, 72, 76,)
-	addi	%r4,%r4,64
+	addi	R[4],R[4],64
 	STEPD1(76)
 	STEPD1(77)
 	STEPD1(78)
 	STEPD1(79)
 
 	/* Add results to original values */
-	add	%r31,%r31,RE(0)
-	add	%r30,%r30,RD(0)
-	add	%r29,%r29,RC(0)
-	add	%r28,%r28,RB(0)
-	add	%r27,%r27,RA(0)
+	add	R[31],R[31],RE(0)
+	add	R[30],R[30],RD(0)
+	add	R[29],R[29],RC(0)
+	add	R[28],R[28],RB(0)
+	add	R[27],R[27],RA(0)
 
 	bdnz	1b
 
 	/* Save final hash, restore registers, and return */
-	stmw	%r27,0(%r3)
-	lmw	%r13,4(%r1)
-	addi	%r1,%r1,80
+	stmw	R[27],0(R[3])
+	lmw	R[13],4(R[1])
+	addi	R[1],R[1],80
 	blr
-- 
1.5.1.rc3.29.gd8b6


* Re: [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
  2007-04-08 20:09   ` Arjen Laarhoven
@ 2007-04-10  9:48     ` Karl Hasselström
  2007-04-10 11:45       ` Arjen Laarhoven
  0 siblings, 1 reply; 7+ messages in thread
From: Karl Hasselström @ 2007-04-10  9:48 UTC
  To: Arjen Laarhoven; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On 2007-04-08 22:09:39 +0200, Arjen Laarhoven wrote:

>  ppc/{sha1ppc.S => sha1ppc.in.S} |  110 +++++++++++++++++++-------------------

Wouldn't it be prettier if this filename was .S.in instead of .in.S?
Additional file suffixes are usually added at the end (e.g. .tar.gz),
and it makes more sense too.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle


* Re: [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
  2007-04-10  9:48     ` Karl Hasselström
@ 2007-04-10 11:45       ` Arjen Laarhoven
  2007-04-10 13:00         ` Karl Hasselström
  0 siblings, 1 reply; 7+ messages in thread
From: Arjen Laarhoven @ 2007-04-10 11:45 UTC
  To: Karl Hasselström; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

Hi,

On Tue, Apr 10, 2007 at 11:48:01AM +0200, Karl Hasselström wrote:
> On 2007-04-08 22:09:39 +0200, Arjen Laarhoven wrote:
> 
> >  ppc/{sha1ppc.S => sha1ppc.in.S} |  110 +++++++++++++++++++-------------------
> 
> Wouldn't it be prettier if this filename was .S.in instead of .in.S?
> Additional file suffixes are usually added at the end (e.g. .tar.gz),
> and it makes more sense too.

Using the .S suffix makes gcc automatically do the right thing. .S.in
requires an extra '-x assembler-with-cpp' option to gcc.  Of course,
it's a trivial fix.

Arjen


* Re: [RFC/PATCH] Optimized PowerPC SHA1 generation for Darwin (OS X)
  2007-04-10 11:45       ` Arjen Laarhoven
@ 2007-04-10 13:00         ` Karl Hasselström
  0 siblings, 0 replies; 7+ messages in thread
From: Karl Hasselström @ 2007-04-10 13:00 UTC
  To: Arjen Laarhoven; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On 2007-04-10 13:45:07 +0200, Arjen Laarhoven wrote:

> On Tue, Apr 10, 2007 at 11:48:01AM +0200, Karl Hasselström wrote:
>
> > On 2007-04-08 22:09:39 +0200, Arjen Laarhoven wrote:
> >
> > >  ppc/{sha1ppc.S => sha1ppc.in.S} |  110 +++++++++++++++++++-------------------
> >
> > Wouldn't it be prettier if this filename was .S.in instead of
> > .in.S? Additional file suffixes are usually added at the end (e.g.
> > .tar.gz), and it makes more sense too.
>
> Using the .S suffix makes gcc automatically do the right thing.
> .S.in requires an extra '-x assembler-with-cpp' option to gcc. Of
> course, it's a trivial fix.

I just read the Makefile changes again, a bit slower this time, and
noticed that you _first_ feed the .in.S file to gcc, and _then_ to the
perl script, instead of the other way around like I was expecting.
With that arrangement, your naming makes sense, since it reflects
which file format is contained in which. Sorry for the noise.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
