git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
@ 2009-04-04 20:59 Christian Couder
  2009-04-05 10:17 ` Sverre Rabbelier
  2009-04-05 13:11 ` Johannes Schindelin
  0 siblings, 2 replies; 20+ messages in thread
From: Christian Couder @ 2009-04-04 20:59 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

This function has been copied from the "patch_pos" function in
"patch-ids.c" but an additional parameter has been added.

The new parameter is a function pointer, that is used to access the
sha1 of an element in the table.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 sha1-lookup.c |  101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 sha1-lookup.h |    7 ++++
 2 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/sha1-lookup.c b/sha1-lookup.c
index da35747..055dd87 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -1,6 +1,107 @@
 #include "cache.h"
 #include "sha1-lookup.h"
 
+static uint32_t take2(const unsigned char *sha1)
+{
+	return ((sha1[0] << 8) | sha1[1]);
+}
+
+/*
+ * Conventional binary search loop looks like this:
+ *
+ *      do {
+ *              int mi = (lo + hi) / 2;
+ *              int cmp = "entry pointed at by mi" minus "target";
+ *              if (!cmp)
+ *                      return (mi is the wanted one)
+ *              if (cmp > 0)
+ *                      hi = mi; "mi is larger than target"
+ *              else
+ *                      lo = mi+1; "mi is smaller than target"
+ *      } while (lo < hi);
+ *
+ * The invariants are:
+ *
+ * - When entering the loop, lo points at a slot that is never
+ *   above the target (it could be at the target), hi points at a
+ *   slot that is guaranteed to be above the target (it can never
+ *   be at the target).
+ *
+ * - We find a point 'mi' between lo and hi (mi could be the same
+ *   as lo, but never can be the same as hi), and check if it hits
+ *   the target.  There are three cases:
+ *
+ *    - if it is a hit, we are happy.
+ *
+ *    - if it is strictly higher than the target, we update hi with
+ *      it.
+ *
+ *    - if it is strictly lower than the target, we update lo to be
+ *      one slot after it, because we allow lo to be at the target.
+ *
+ * When choosing 'mi', we do not have to take the "middle" but
+ * anywhere in between lo and hi, as long as lo <= mi < hi is
+ * satisfied.  When we somehow know that the distance between the
+ * target and lo is much shorter than the target and hi, we could
+ * pick mi that is much closer to lo than the midway.
+ */
+/*
+ * The table should contain "nr" elements.
+ * The sha1 of element i (between 0 and nr - 1) should be returned
+ * by "fn(i, table)".
+ */
+int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
+	     sha1_access_fn fn)
+{
+	size_t hi = nr;
+	size_t lo = 0;
+	size_t mi = 0;
+
+	if (!nr)
+		return -1;
+
+	if (nr != 1) {
+		size_t lov, hiv, miv, ofs;
+
+		for (ofs = 0; ofs < 18; ofs += 2) {
+			lov = take2(fn(0, table) + ofs);
+			hiv = take2(fn(nr - 1, table) + ofs);
+			miv = take2(sha1 + ofs);
+			if (miv < lov)
+				return -1;
+			if (hiv < miv)
+				return -1 - nr;
+			if (lov != hiv) {
+				/*
+				 * At this point miv could be equal
+				 * to hiv (but sha1 could still be higher);
+				 * the invariant of (mi < hi) should be
+				 * kept.
+				 */
+				mi = (nr - 1) * (miv - lov) / (hiv - lov);
+				if (lo <= mi && mi < hi)
+					break;
+				die("oops");
+			}
+		}
+		if (18 <= ofs)
+			die("cannot happen -- lo and hi are identical");
+	}
+
+	do {
+		int cmp;
+		cmp = hashcmp(fn(mi, table), sha1);
+		if (!cmp)
+			return mi;
+		if (cmp > 0)
+			hi = mi;
+		else
+			lo = mi + 1;
+		mi = (hi + lo) / 2;
+	} while (lo < hi);
+	return -lo-1;
+}
+
 /*
  * Conventional binary search loop looks like this:
  *
diff --git a/sha1-lookup.h b/sha1-lookup.h
index 3249a81..20af285 100644
--- a/sha1-lookup.h
+++ b/sha1-lookup.h
@@ -1,6 +1,13 @@
 #ifndef SHA1_LOOKUP_H
 #define SHA1_LOOKUP_H
 
+typedef const unsigned char *sha1_access_fn(size_t index, void *table);
+
+extern int sha1_pos(const unsigned char *sha1,
+		    void *table,
+		    size_t nr,
+		    sha1_access_fn fn);
+
 extern int sha1_entry_pos(const void *table,
 			  size_t elem_size,
 			  size_t key_offset,
-- 
1.6.2.2.404.ge96f3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-04 20:59 [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 Christian Couder
@ 2009-04-05 10:17 ` Sverre Rabbelier
  2009-04-05 14:41   ` Jeff King
  2009-04-05 18:59   ` Junio C Hamano
  2009-04-05 13:11 ` Johannes Schindelin
  1 sibling, 2 replies; 20+ messages in thread
From: Sverre Rabbelier @ 2009-04-05 10:17 UTC (permalink / raw
  To: Christian Couder; +Cc: Junio C Hamano, git, Johannes Schindelin

Heya,

On Sat, Apr 4, 2009 at 22:59, Christian Couder <chriscool@tuxfamily.org> wrote:
> +                               if (lo <= mi && mi < hi)
> +                                       break;
> +                               die("oops");

That's going to be an official git error message? Why not make it "The
fatal error oops has occured, press ctrl-c to lose all your work, or
press any other key to do the same"?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
  2009-04-04 20:59 [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 Christian Couder
  2009-04-05 10:17 ` Sverre Rabbelier
@ 2009-04-05 13:11 ` Johannes Schindelin
  1 sibling, 0 replies; 20+ messages in thread
From: Johannes Schindelin @ 2009-04-05 13:11 UTC (permalink / raw
  To: Christian Couder; +Cc: Junio C Hamano, git

Hi,

On Sat, 4 Apr 2009, Christian Couder wrote:

> This function has been copied from the "patch_pos" function in
> "patch-ids.c" but an additional parameter has been added.
> 
> The new parameter is a function pointer, that is used to access the
> sha1 of an element in the table.

Frankly, this is hard to follow.

It would have been easier if the first patch added that parameter, and the 
second patch just _moved_ the function.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
  2009-04-05 10:17 ` Sverre Rabbelier
@ 2009-04-05 14:41   ` Jeff King
  2009-04-05 14:42     ` Sverre Rabbelier
  2009-04-05 18:59   ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff King @ 2009-04-05 14:41 UTC (permalink / raw
  To: Sverre Rabbelier
  Cc: Christian Couder, Junio C Hamano, git, Johannes Schindelin

On Sun, Apr 05, 2009 at 12:17:32PM +0200, Sverre Rabbelier wrote:

> On Sat, Apr 4, 2009 at 22:59, Christian Couder <chriscool@tuxfamily.org> wrote:
> > +                               if (lo <= mi && mi < hi)
> > +                                       break;
> > +                               die("oops");
> 
> That's going to be an official git error message? Why not make it "The
> fatal error oops has occured, press ctrl-c to lose all your work, or
> press any other key to do the same"?

From my cursory reading of the code, this is a "cannot happen"
assertion (and there is one a few lines below, too):

+                               die("oops");
+                       }
+               }
+               if (18 <= ofs)
+                       die("cannot happen -- lo and hi are identical");
+       }

I don't think we have an established style for such assertions. In
theory, users never see it, but the whole point of it being there is
that they _might_. :) One could use the "assert" macro, though I think
its output is just as cryptic to end users. I usually do

  die("BUG: <something that makes a little bit of sense to the user>");

some examples of which you can see via "git grep BUG:".

Of course, "binary search on fire?" would probably work, too.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 14:41   ` Jeff King
@ 2009-04-05 14:42     ` Sverre Rabbelier
  0 siblings, 0 replies; 20+ messages in thread
From: Sverre Rabbelier @ 2009-04-05 14:42 UTC (permalink / raw
  To: Jeff King; +Cc: Christian Couder, Junio C Hamano, git, Johannes Schindelin

Heya,

On Sun, Apr 5, 2009 at 16:41, Jeff King <peff@peff.net> wrote:
> some examples of which you can see via "git grep BUG:".
>
> Of course, "binary search on fire?" would probably work, too.

Right, I'm thinking anything that makes it easier to find out what
went wrong later on would be better than just "oops".

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 10:17 ` Sverre Rabbelier
  2009-04-05 14:41   ` Jeff King
@ 2009-04-05 18:59   ` Junio C Hamano
  2009-04-05 19:06     ` Sverre Rabbelier
  2009-04-05 19:19     ` Felipe Contreras
  1 sibling, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2009-04-05 18:59 UTC (permalink / raw
  To: Sverre Rabbelier; +Cc: Christian Couder, git, Johannes Schindelin

Sverre Rabbelier <srabbelier@gmail.com> writes:

> On Sat, Apr 4, 2009 at 22:59, Christian Couder <chriscool@tuxfamily.org> wrote:
>> +                               if (lo <= mi && mi < hi)
>> +                                       break;
>> +                               die("oops");
>
> That's going to be an official git error message? Why not make it "The

It's not "going to be", but "has been so for the last two years since
5d23e13".

It is an assert, and I think Peff's die("BUG: ...") would be a good idea.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 18:59   ` Junio C Hamano
@ 2009-04-05 19:06     ` Sverre Rabbelier
  2009-04-05 19:59       ` Jeff King
  2009-04-05 19:19     ` Felipe Contreras
  1 sibling, 1 reply; 20+ messages in thread
From: Sverre Rabbelier @ 2009-04-05 19:06 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Christian Couder, git, Johannes Schindelin

Heya,

On Sun, Apr 5, 2009 at 20:59, Junio C Hamano <gitster@pobox.com> wrote:
> It's not "going to be", but "has been so for the last two years since
> 5d23e13".

Ah, I did not see that earlier, as I read and commented-on this patch
before reading 3/4.

> It is an assert, and I think Peff's die("BUG: ...") would be a good idea.

As long as the <something that makes sense to the user> does indeed
make sense, right :).

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 18:59   ` Junio C Hamano
  2009-04-05 19:06     ` Sverre Rabbelier
@ 2009-04-05 19:19     ` Felipe Contreras
       [not found]       ` <87vdpi29a4.fsf@iki.fi>
  2009-04-05 19:31       ` Reece Dunn
  1 sibling, 2 replies; 20+ messages in thread
From: Felipe Contreras @ 2009-04-05 19:19 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Sverre Rabbelier, Christian Couder, git, Johannes Schindelin

On Sun, Apr 5, 2009 at 9:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
> U3ZlcnJlIFJhYmJlbGllciA8c3JhYmJlbGllckBnbWFpbC5jb20+IHdyaXRlczoNCg0KPiBPbiBT
> YXQsIEFwciA0LCAyMDA5IGF0IDIyOjU5LCBDaHJpc3RpYW4gQ291ZGVyIDxjaHJpc2Nvb2xAdHV4
> ZmFtaWx5Lm9yZz4gd3JvdGU6DQo+PiArIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKg
> IMKgIMKgIMKgIMKgIGlmIChsbyA8PSBtaSAmJiBtaSA8IGhpKQ0KPj4gKyDCoCDCoCDCoCDCoCDC
> oCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCBicmVhazsNCj4+ICsg
> wqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgZGllKCJvb3BzIik7
> DQo+DQo+IFRoYXQncyBnb2luZyB0byBiZSBhbiBvZmZpY2lhbCBnaXQgZXJyb3IgbWVzc2FnZT8g
> V2h5IG5vdCBtYWtlIGl0ICJUaGUNCg0KSXQncyBub3QgImdvaW5nIHRvIGJlIiwgYnV0ICJoYXMg
> YmVlbiBzbyBmb3IgdGhlIGxhc3QgdHdvIHllYXJzIHNpbmNlDQo1ZDIzZTEzIi4NCg0KSXQgaXMg
> YW4gYXNzZXJ0LCBhbmQgSSB0aGluayBQZWZmJ3MgZGllKCJCVUc6IC4uLiIpIHdvdWxkIGJlIGEg
> Z29vZCBpZGVhLg0K

Huh?

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
       [not found]       ` <87vdpi29a4.fsf@iki.fi>
@ 2009-04-05 19:30         ` Felipe Contreras
  0 siblings, 0 replies; 20+ messages in thread
From: Felipe Contreras @ 2009-04-05 19:30 UTC (permalink / raw
  To: Teemu Likonen; +Cc: Junio C Hamano, git

On Sun, Apr 5, 2009 at 10:24 PM, Teemu Likonen <tlikonen@iki.fi> wrote:
> On 2009-04-05 22:19 (+0300), Felipe Contreras wrote:
>
>> On Sun, Apr 5, 2009 at 9:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> U3ZlcnJlIFJhYmJlbGllciA8c3JhYmJlbGllckBnbWFpbC5jb20+IHdyaXRlczoNCg0KPiBPbiBT
> [...]
>> Huh?
>
> Base64 transfer encoding:
>
>    Content-Type: text/plain; charset=utf-8
>    Content-Transfer-Encoding: base64

Yeah, I know, but apparently Gmail doesn't support it.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 19:19     ` Felipe Contreras
       [not found]       ` <87vdpi29a4.fsf@iki.fi>
@ 2009-04-05 19:31       ` Reece Dunn
  2009-04-05 20:02         ` Junio C Hamano
                           ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Reece Dunn @ 2009-04-05 19:31 UTC (permalink / raw
  To: Felipe Contreras
  Cc: Junio C Hamano, Sverre Rabbelier, Christian Couder, git,
	Johannes Schindelin

2009/4/5 Felipe Contreras <felipe.contreras@gmail.com>:
> On Sun, Apr 5, 2009 at 9:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> U3ZlcnJlIFJhYmJlbGllciA8c3JhYmJlbGllckBnbWFpbC5jb20+IHdyaXRlczoNCg0KPiBPbiBT
>> YXQsIEFwciA0LCAyMDA5IGF0IDIyOjU5LCBDaHJpc3RpYW4gQ291ZGVyIDxjaHJpc2Nvb2xAdHV4
>> ZmFtaWx5Lm9yZz4gd3JvdGU6DQo+PiArIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKg
>> IMKgIMKgIMKgIMKgIGlmIChsbyA8PSBtaSAmJiBtaSA8IGhpKQ0KPj4gKyDCoCDCoCDCoCDCoCDC
>> oCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCBicmVhazsNCj4+ICsg
>> wqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgZGllKCJvb3BzIik7
>> DQo+DQo+IFRoYXQncyBnb2luZyB0byBiZSBhbiBvZmZpY2lhbCBnaXQgZXJyb3IgbWVzc2FnZT8g
>> V2h5IG5vdCBtYWtlIGl0ICJUaGUNCg0KSXQncyBub3QgImdvaW5nIHRvIGJlIiwgYnV0ICJoYXMg
>> YmVlbiBzbyBmb3IgdGhlIGxhc3QgdHdvIHllYXJzIHNpbmNlDQo1ZDIzZTEzIi4NCg0KSXQgaXMg
>> YW4gYXNzZXJ0LCBhbmQgSSB0aGluayBQZWZmJ3MgZGllKCJCVUc6IC4uLiIpIHdvdWxkIGJlIGEg
>> Z29vZCBpZGVhLg0K
>
> Huh?

I think Junio is trying to learn base64 :)!

This is what `base64 -d` gives:

Sverre Rabbelier <srabbelier@gmail.com> writes:

> On Sat, Apr 4, 2009 at 22:59, Christian Couder <chriscool@tuxfamily.org> wrote:
>> +                               if (lo <= mi && mi < hi)
>> +                                       break;
>> +                               die("oops");
>
> That's going to be an official git error message? Why not make it "The

It's not "going to be", but "has been so for the last two years since
5d23e13".

It is an assert, and I think Peff's die("BUG: ...") would be a good idea.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
  2009-04-05 19:06     ` Sverre Rabbelier
@ 2009-04-05 19:59       ` Jeff King
  2009-04-05 20:05         ` Sverre Rabbelier
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2009-04-05 19:59 UTC (permalink / raw
  To: Sverre Rabbelier
  Cc: Junio C Hamano, Christian Couder, git, Johannes Schindelin

On Sun, Apr 05, 2009 at 09:06:56PM +0200, Sverre Rabbelier wrote:

> > It is an assert, and I think Peff's die("BUG: ...") would be a good idea.
> 
> As long as the <something that makes sense to the user> does indeed
> make sense, right :).

I think:

  die("BUG: assertion failed in binary search")

would be sufficient to tell the user what is going on, and let them
inform the list what happened.

However, if this "oops" has been there for 2 years and nobody has seen
it, it's entirely possible that somebody actually got the binary search
code right in the first place. ;)

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 19:31       ` Reece Dunn
@ 2009-04-05 20:02         ` Junio C Hamano
  2009-04-05 20:25           ` Jeff King
                             ` (2 more replies)
  2009-04-05 20:17         ` Jeff King
  2009-04-05 20:34         ` Jay Soffian
  2 siblings, 3 replies; 20+ messages in thread
From: Junio C Hamano @ 2009-04-05 20:02 UTC (permalink / raw
  To: Reece Dunn
  Cc: Felipe Contreras, Sverre Rabbelier, Christian Couder, git,
	Johannes Schindelin

Reece Dunn <msclrhd@googlemail.com> writes:

> 2009/4/5 Felipe Contreras <felipe.contreras@gmail.com>:
>> On Sun, Apr 5, 2009 at 9:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> U3ZlcnJlIFJhYmJlbGllciA8c3JhYmJlbGllckBnbWFpbC5jb20+IHdyaXRlczoNCg0KPiBPbiBT
>>> YXQsIEFwciA0LCAyMDA5IGF0IDIyOjU5LCBDaHJpc3RpYW4gQ291ZGVyIDxjaHJpc2Nvb2xAdHV4
>>> ZmFtaWx5Lm9yZz4gd3JvdGU6DQo+PiArIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKg
>>> IMKgIMKgIMKgIMKgIGlmIChsbyA8PSBtaSAmJiBtaSA8IGhpKQ0KPj4gKyDCoCDCoCDCoCDCoCDC
>>> oCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCBicmVhazsNCj4+ICsg
>>> wqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgZGllKCJvb3BzIik7
>>> DQo+DQo+IFRoYXQncyBnb2luZyB0byBiZSBhbiBvZmZpY2lhbCBnaXQgZXJyb3IgbWVzc2FnZT8g
>>> V2h5IG5vdCBtYWtlIGl0ICJUaGUNCg0KSXQncyBub3QgImdvaW5nIHRvIGJlIiwgYnV0ICJoYXMg
>>> YmVlbiBzbyBmb3IgdGhlIGxhc3QgdHdvIHllYXJzIHNpbmNlDQo1ZDIzZTEzIi4NCg0KSXQgaXMg
>>> YW4gYXNzZXJ0LCBhbmQgSSB0aGluayBQZWZmJ3MgZGllKCJCVUc6IC4uLiIpIHdvdWxkIGJlIGEg
>>> Z29vZCBpZGVhLg0K
>>
>> Huh?
>
> I think Junio is trying to learn base64 :)!

I think that is what my Gnus/message-mode did.  I do not know which letter
triggered it to decide it is UTF-8 to begin with, though.  As far as I am
aware, I didn't type anything non-ascii in my message.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 19:59       ` Jeff King
@ 2009-04-05 20:05         ` Sverre Rabbelier
  0 siblings, 0 replies; 20+ messages in thread
From: Sverre Rabbelier @ 2009-04-05 20:05 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, Christian Couder, git, Johannes Schindelin

Heya,

On Sun, Apr 5, 2009 at 21:59, Jeff King <peff@peff.net> wrote:
>  die("BUG: assertion failed in binary search")

Given that we now only have one binary search (which should be re-used
everywhere), I think it's fair enough to describe it like that.

> However, if this "oops" has been there for 2 years and nobody has seen
> it, it's entirely possible that somebody actually got the binary search
> code right in the first place. ;)

Hehe, never underestimate the difficulty of writing a proper binary
search! :P But I do agree two years of 'testing' is more than more
binary searches get before being 'released' ;).

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
  2009-04-05 19:31       ` Reece Dunn
  2009-04-05 20:02         ` Junio C Hamano
@ 2009-04-05 20:17         ` Jeff King
  2009-04-05 20:34         ` Jay Soffian
  2 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2009-04-05 20:17 UTC (permalink / raw
  To: Reece Dunn
  Cc: Felipe Contreras, Junio C Hamano, Sverre Rabbelier,
	Christian Couder, git, Johannes Schindelin

On Sun, Apr 05, 2009 at 08:31:06PM +0100, Reece Dunn wrote:

> This is what `base64 -d` gives:
> [...]
> It's not "going to be", but "has been so for the last two years since
> 5d23e13".
> 
> It is an assert, and I think Peff's die("BUG: ...") would be a good idea.

Interestingly, I get a bunch of unprintable crap at the end. The culprit
seems to be that vger stupidly adds:

    --
    To unsubscribe from this list: send the line "unsubscribe git" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html

to the bottom, regardless of transfer-encoding. At best, this is
pointless and invisible, as the reader will just show the base64
content. But some decoders (like mutt) actually treat non-base64
characters not as "end of base64" but as "ignore and keep looking for
more base64". So this decodes into a bunch of random characters. And to
make it even more fun, it only happens if the message is a certain
length; otherwise, it needs "=" fill characters at the end, which
unambiguously signal the end.

"openssl base64 -d" stops decoding at the cruft. But I think what mutt
is doing is right. According to RFC 2045:

     The encoded output stream must be represented in lines of no more
     than 76 characters each.  All line breaks or other characters not
     found in Table 1 must be ignored by decoding software.  In base64
     data, characters other than those in Table 1, line breaks, and
     other white space probably indicate a transmission error, about
     which a warning message or even a message rejection might be
     appropriate under some circumstances.

I don't know if it is worth trying to get vger to be smarter. According
to this, they consider base64 text parts not worth handling:

  http://lkml.indiana.edu/hypermail/linux/kernel/0304.0/0901.html

So maybe it is worth trying to get Junio not to send base64 mail. ;)

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
  2009-04-05 20:02         ` Junio C Hamano
@ 2009-04-05 20:25           ` Jeff King
  2009-04-05 20:35             ` Sverre Rabbelier
  2009-04-05 20:28           ` Gnus content transfer encoding (was: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1) Teemu Likonen
  2009-04-05 20:39           ` [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 Reece Dunn
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2009-04-05 20:25 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Reece Dunn, Sverre Rabbelier, Christian Couder, git

On Sun, Apr 05, 2009 at 01:02:30PM -0700, Junio C Hamano wrote:

> > I think Junio is trying to learn base64 :)!
> 
> I think that is what my Gnus/message-mode did.  I do not know which letter
> triggered it to decide it is UTF-8 to begin with, though.  As far as I am
> aware, I didn't type anything non-ascii in my message.

Actually, it is Sverre's fault. :)

You quoted his message, quoting Christian's message. Christian's message
was 7bit. But for some reason, Sverre's quoting of Christian's message
contains weird iso8859 space characters (0xa0).

But it is probably worth configuring Gnus to use QP instead of base64.
It's more efficient (for mostly ascii text), more readable to humans
looking at the encoded form, and is less likely to make you look like a
spammer. :)

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Gnus content transfer encoding (was: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1)
  2009-04-05 20:02         ` Junio C Hamano
  2009-04-05 20:25           ` Jeff King
@ 2009-04-05 20:28           ` Teemu Likonen
  2009-04-06  0:30             ` Gnus content transfer encoding Junio C Hamano
  2009-04-05 20:39           ` [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 Reece Dunn
  2 siblings, 1 reply; 20+ messages in thread
From: Teemu Likonen @ 2009-04-05 20:28 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Reece Dunn, Felipe Contreras, Sverre Rabbelier, Christian Couder,
	git, Johannes Schindelin

On 2009-04-05 13:02 (-0700), Junio C Hamano wrote:

> Reece Dunn <msclrhd@googlemail.com> writes:
>> I think Junio is trying to learn base64 :)!
>
> I think that is what my Gnus/message-mode did. I do not know which
> letter triggered it to decide it is UTF-8 to begin with, though. As
> far as I am aware, I didn't type anything non-ascii in my message.

You can customize the encoding decision mechanism, for example:

    (setq mm-body-charset-encoding-alist
          '((iso-8859-1 . 8bit)
            (utf-8 . 8bit)))

For more info, see:

    C-h v mm-body-charset-encoding-alist RET

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 19:31       ` Reece Dunn
  2009-04-05 20:02         ` Junio C Hamano
  2009-04-05 20:17         ` Jeff King
@ 2009-04-05 20:34         ` Jay Soffian
  2 siblings, 0 replies; 20+ messages in thread
From: Jay Soffian @ 2009-04-05 20:34 UTC (permalink / raw
  To: Reece Dunn
  Cc: Felipe Contreras, Junio C Hamano, Sverre Rabbelier,
	Christian Couder, git, Johannes Schindelin

On Sun, Apr 5, 2009 at 3:31 PM, Reece Dunn <msclrhd@googlemail.com> wrote:
> 2009/4/5 Felipe Contreras <felipe.contreras@gmail.com>:
>> On Sun, Apr 5, 2009 at 9:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> U3ZlcnJlIFJhYmJlbGllciA8c3JhYmJlbGllckBnbWFpbC5jb20+IHdyaXRlczoNCg0KPiBPbiBT
>>> YXQsIEFwciA0LCAyMDA5IGF0IDIyOjU5LCBDaHJpc3RpYW4gQ291ZGVyIDxjaHJpc2Nvb2xAdHV4
>>> ZmFtaWx5Lm9yZz4gd3JvdGU6DQo+PiArIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKgIMKg
>>> IMKgIMKgIMKgIMKgIGlmIChsbyA8PSBtaSAmJiBtaSA8IGhpKQ0KPj4gKyDCoCDCoCDCoCDCoCDC
>>> oCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCDCoCBicmVhazsNCj4+ICsg
>>> wqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgwqAgZGllKCJvb3BzIik7
>>> DQo+DQo+IFRoYXQncyBnb2luZyB0byBiZSBhbiBvZmZpY2lhbCBnaXQgZXJyb3IgbWVzc2FnZT8g
>>> V2h5IG5vdCBtYWtlIGl0ICJUaGUNCg0KSXQncyBub3QgImdvaW5nIHRvIGJlIiwgYnV0ICJoYXMg
>>> YmVlbiBzbyBmb3IgdGhlIGxhc3QgdHdvIHllYXJzIHNpbmNlDQo1ZDIzZTEzIi4NCg0KSXQgaXMg
>>> YW4gYXNzZXJ0LCBhbmQgSSB0aGluayBQZWZmJ3MgZGllKCJCVUc6IC4uLiIpIHdvdWxkIGJlIGEg
>>> Z29vZCBpZGVhLg0K
>>
>> Huh?
>
> I think Junio is trying to learn base64 :)!

Junio's _original_ message was fine. The problem is that vger
(majordomo) appends the mailing list footer which technically corrupts
the message. Respectable MUA's can deal with the corruption, but
gmail's web-interface just shows the raw base64 (previously it used to
just show an empty message). I've filed a bug against gmail, but who
knows.

The other options are:

- fix majordomo on vger
- replace majordomo on vger with a decent MLM
- disable the mailing list footer
- deal with it if you're a gmail user

j.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 20:25           ` Jeff King
@ 2009-04-05 20:35             ` Sverre Rabbelier
  0 siblings, 0 replies; 20+ messages in thread
From: Sverre Rabbelier @ 2009-04-05 20:35 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, Reece Dunn, Christian Couder, git

Heya,

On Sun, Apr 5, 2009 at 22:25, Jeff King <peff@peff.net> wrote:
> Actually, it is Sverre's fault. :)

Fine, blame the Dutch, why not! :P

> You quoted his message, quoting Christian's message. Christian's message
> was 7bit. But for some reason, Sverre's quoting of Christian's message
> contains weird iso8859 space characters (0xa0).

*points at GMail*, I ain't dun nothing funny! It's always been funny
WRT to vger and encoding issues though, so I'm not surprised.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to  efficiently lookup sha1
  2009-04-05 20:02         ` Junio C Hamano
  2009-04-05 20:25           ` Jeff King
  2009-04-05 20:28           ` Gnus content transfer encoding (was: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1) Teemu Likonen
@ 2009-04-05 20:39           ` Reece Dunn
  2 siblings, 0 replies; 20+ messages in thread
From: Reece Dunn @ 2009-04-05 20:39 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Felipe Contreras, Sverre Rabbelier, Christian Couder, git,
	Johannes Schindelin

2009/4/5 Junio C Hamano <gitster@pobox.com>:
> Reece Dunn <msclrhd@googlemail.com> writes:
>
>> 2009/4/5 Felipe Contreras <felipe.contreras@gmail.com>:
>>>
>>> Huh?
>>
>> I think Junio is trying to learn base64 :)!
>
> I think that is what my Gnus/message-mode did.  I do not know which letter
> triggered it to decide it is UTF-8 to begin with, though.  As far as I am
> aware, I didn't type anything non-ascii in my message.

Ok, so digging a little deeper, `base64 -d | od -t x1` gives the
following (partial) output:

0000200 6f 74 65 3a 0d 0a 3e 3e 20 2b 20 c2 a0 20 c2 a0

The key entry here is the a0 character (NO-BREAK SPACE) or NBSP.

This is at:
$ base64 -d test | od -t x1 | grep a0 | head -n 1 | xxd -r
ote:
>> +

So looks like it is in the patch you are quoting. NOTE: I have removed
the spaces after the +. Also, according to www.unicode.org, c2 is A^
(A with a circumflex) -- not sure what that is doing there, though.

- Reece

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Gnus content transfer encoding
  2009-04-05 20:28           ` Gnus content transfer encoding (was: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1) Teemu Likonen
@ 2009-04-06  0:30             ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2009-04-06  0:30 UTC (permalink / raw
  To: Teemu Likonen
  Cc: Reece Dunn, Felipe Contreras, Sverre Rabbelier, Christian Couder,
	git, Johannes Schindelin

Teemu Likonen <tlikonen@iki.fi> writes:

> On 2009-04-05 13:02 (-0700), Junio C Hamano wrote:
>
>> Reece Dunn <msclrhd@googlemail.com> writes:
>>> I think Junio is trying to learn base64 :)!
>>
>> I think that is what my Gnus/message-mode did. I do not know which
>> letter triggered it to decide it is UTF-8 to begin with, though. As
>> far as I am aware, I didn't type anything non-ascii in my message.
>
> You can customize the encoding decision mechanism, for example:
>
>     (setq mm-body-charset-encoding-alist
>           '((iso-8859-1 . 8bit)
>             (utf-8 . 8bit)))
>
> For more info, see:
>
>     C-h v mm-body-charset-encoding-alist RET

Interesting.

I have had these for a long time:

	(setq mm-coding-system-priorities '(us-ascii iso-2022-jp utf-8 iso-8859-1))
        (setq mm-content-transfer-encoding-defaults
              '(("text/.*" 8bit)
                ("message/rfc822" 8bit)
                ("application/emacs-lisp" qp-or-base64)
                ("application/x-emacs-lisp" qp-or-base64)
                ("application/x-patch" qp-or-base64)
                (".*" base64)))

I did not have any customization on my own to body-charset-encoding-alist 
and C-h v gave me:

    mm-body-charset-encoding-alist's value is 
    ((iso-2022-jp . 7bit)
     (iso-2022-jp-2 . 7bit)
     (utf-16 . base64)
     (utf-16be . base64)
     (utf-16le . base64))

I'll have the following in my .emacs in addition to the coding-system-prio
and c-t-e-defaults I already have:

        (setq mm-body-charset-encoding-alist
              '((iso-2022-jp . 7bit)
                (iso-2022-jp-2 . 7bit)
                (iso-8859-1 . 8bit)
                (utf-8 . 8bit)))

and will see what happens, but I wonder how this new one interacts with
the c-t-e-defaults.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-04-06  0:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-04 20:59 [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 Christian Couder
2009-04-05 10:17 ` Sverre Rabbelier
2009-04-05 14:41   ` Jeff King
2009-04-05 14:42     ` Sverre Rabbelier
2009-04-05 18:59   ` Junio C Hamano
2009-04-05 19:06     ` Sverre Rabbelier
2009-04-05 19:59       ` Jeff King
2009-04-05 20:05         ` Sverre Rabbelier
2009-04-05 19:19     ` Felipe Contreras
     [not found]       ` <87vdpi29a4.fsf@iki.fi>
2009-04-05 19:30         ` Felipe Contreras
2009-04-05 19:31       ` Reece Dunn
2009-04-05 20:02         ` Junio C Hamano
2009-04-05 20:25           ` Jeff King
2009-04-05 20:35             ` Sverre Rabbelier
2009-04-05 20:28           ` Gnus content transfer encoding (was: [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1) Teemu Likonen
2009-04-06  0:30             ` Gnus content transfer encoding Junio C Hamano
2009-04-05 20:39           ` [PATCH 1/4] sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 Reece Dunn
2009-04-05 20:17         ` Jeff King
2009-04-05 20:34         ` Jay Soffian
2009-04-05 13:11 ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).