git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* security: potential out-of-bound read at ewah_io.c |ewah_read_mmap|
@ 2018-06-14 22:59 Luat Nguyen
  2018-06-15  3:28 ` Jeff King
  2018-06-19 19:00 ` Dyer, Edwin
  0 siblings, 2 replies; 58+ messages in thread
From: Luat Nguyen @ 2018-06-14 22:59 UTC (permalink / raw)
  To: git

Hi folks,

Recently, I’ve found a security issue related to out-of-bound read at function named `ewah_read_mmap`

Assume that, an attacker can put malicious `./git/index` into a repo by somehow.

Since there is lack of check whether the remaining size of `ptr`is equal to `buffer_size` or not.

So the code reads exceed the buffer of `ptr` and reach to higher page. In this case, it is `/lib/x86_64-linux-gnu/ld-2.23.so`.

Leads to infoleak. You can find more details and asan crash below.



# xxd .git/index
00000000: 4449 5243 0000 0002 0000 0000 4653 4d4e  DIRC........FSMN
00000010: 0000 0024 0000 0001 1538 2489 c8fc 3616  ...$.....8$...6.
00000020: 0000 0014 0000 0000 0000 2000 4141       .......... .AA
                                    ^ evil size here = 0x2000


***** SNIP CODE *****

int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
{
… 
	self->buffer_size = self->alloc_size = get_be32(ptr);
	ptr += sizeof(uint32_t);
… 
	memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));


[memory map]

    0x7f990eca3000     0x7f990eca4000 r--p     1000 0      /media/sf_Fuzz/vuln_repo/.git/index <— where `ptr` is placed
    0x7f990eca4000     0x7f990eca5000 r--p     1000 25000  /lib/x86_64-linux-gnu/ld-2.23.so <— memcpy will reach here
    0x7f990eca5000     0x7f990eca6000 rw-p     1000 26000  /lib/x86_64-linux-gnu/ld-2.23.so <— and here 


[ ASAN log ]

root@guest:/media/sf_SHARE/vuln_repo# /media/sf_SHARE/git-master-asan/git status
=================================================================
==4324==ERROR: AddressSanitizer: unknown-crash on address 0x7f6f235b0000 at pc 0x0000004bba79 bp 0x7ffc75e68850 sp 0x7ffc75e68000
READ of size 65536 at 0x7f6f235b0000 thread T0
    #0 0x4bba78 in __asan_memcpy /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23:3
    #1 0x8c910e in ewah_read_mmap /media/sf_SHARE/git-master-asan/ewah/ewah_io.c:144:2
    #2 0x8e2534 in read_fsmonitor_extension /media/sf_SHARE/git-master-asan/fsmonitor.c:46:8
    #3 0xa05862 in read_index_extension /media/sf_SHARE/git-master-asan/read-cache.c:1615:3
    #4 0xa046f3 in do_read_index /media/sf_SHARE/git-master-asan/read-cache.c:1872:7
    #5 0xa03325 in read_index_from /media/sf_SHARE/git-master-asan/read-cache.c:1913:8
    #6 0xa03231 in read_index /media/sf_SHARE/git-master-asan/read-cache.c:1634:9
    #7 0x9de5e8 in read_index_preload /media/sf_SHARE/git-master-asan/preload-index.c:119:15
    #8 0x566cc6 in cmd_status /media/sf_SHARE/git-master-asan/builtin/commit.c:1358:2
    #9 0x4ede8c in run_builtin /media/sf_SHARE/git-master-asan/git.c:417:11
    #10 0x4ea939 in handle_builtin /media/sf_SHARE/git-master-asan/git.c:632:8
    #11 0x4ed655 in run_argv /media/sf_SHARE/git-master-asan/git.c:684:4
    #12 0x4ea037 in cmd_main /media/sf_SHARE/git-master-asan/git.c:761:19
    #13 0x759c8b in main /media/sf_SHARE/git-master-asan/common-main.c:45:9
    #14 0x7f6f2243382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x41c268 in _start (/media/sf_SHARE/git-master-asan/git+0x41c268)

Address 0x7f6f235b0000 is a wild pointer.
SUMMARY: AddressSanitizer: unknown-crash /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23:3 in __asan_memcpy
Shadow bytes around the buggy address:
  0x0fee646adfb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adfc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adfd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adfe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0fee646ae000:[fe]fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae010: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae020: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae030: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae040: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae050: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4324==ABORTING
root@guest:/media/sf_SHARE/vuln_repo#


Regards,
Luat Nguyen.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: security: potential out-of-bound read at ewah_io.c |ewah_read_mmap|
  2018-06-14 22:59 security: potential out-of-bound read at ewah_io.c |ewah_read_mmap| Luat Nguyen
@ 2018-06-15  3:28 ` Jeff King
  2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
                     ` (4 more replies)
  2018-06-19 19:00 ` Dyer, Edwin
  1 sibling, 5 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15  3:28 UTC (permalink / raw)
  To: Luat Nguyen; +Cc: git

On Fri, Jun 15, 2018 at 06:59:43AM +0800, Luat Nguyen wrote:

> Recently, I’ve found a security issue related to out-of-bound read at function named `ewah_read_mmap`

Thanks, this is definitely a bug worth addressing. But note...

> Assume that, an attacker can put malicious `./git/index` into a repo by somehow.

We generally don't consider .git/index (or pack .bitmap files, which
also use this implementation) to be a major part of Git's attack
surface, since they are generated locally. And if you can write to
somebody's .git directory, there are already much easier ways to execute
arbitrary code.

> Since there is lack of check whether the remaining size of `ptr`is
> equal to `buffer_size` or not.

Yep. We also fail to check if we even have enough bytes to read the
buffer_size in the first place.

Here are some patches. The first one fixes the problem you found. The
second one drops some dead code that has a related problem. And the
third just drops some dead code that I noticed in the same file. :)

  [1/3]: ewah_read_mmap: bounds-check mmap reads
  [2/3]: ewah: drop ewah_deserialize function
  [3/3]: ewah: drop ewah_serialize_native function

 ewah/ewah_io.c          | 106 ++++++++--------------------------------
 ewah/ewok.h             |   4 +-
 t/t5310-pack-bitmaps.sh |  13 +++++
 3 files changed, 35 insertions(+), 88 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15  3:28 ` Jeff King
@ 2018-06-15  3:31   ` Jeff King
  2018-06-15  9:14     ` SZEDER Gábor
                       ` (2 more replies)
  2018-06-15  3:31   ` [PATCH 2/3] ewah: drop ewah_deserialize function Jeff King
                     ` (3 subsequent siblings)
  4 siblings, 3 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15  3:31 UTC (permalink / raw)
  To: Luat Nguyen; +Cc: git

The on-disk ewah format tells us how big the ewah data is,
and we blindly read that much from the buffer without
considering whether the mmap'd data is long enough, which
can lead to out-of-bound reads.

Let's make sure we have data available before reading it,
both for the ewah header/footer as well as for the bit data
itself. In particular:

  - keep our ptr/len pair in sync as we move through the
    buffer, and check it before each read

  - check the size for integer overflow (this should be
    impossible on 64-bit, as the size is given as a 32-bit
    count of 8-byte words, but is possible on a 32-bit
    system)

  - return the number of bytes read as an ssize_t instead of
    an int, again to prevent integer overflow

  - compute the return value using a pointer difference;
    this should yield the same result as the existing code,
    but makes it more obvious that we got our computations
    right

The included test is far from comprehensive, as it just
picks a static point at which to truncate the generated
bitmap. But in practice this will hit in the middle of an
ewah and make sure we're at least exercising this code.

Reported-by: Luat Nguyen <root@l4w.io>
Signed-off-by: Jeff King <peff@peff.net>
---
 ewah/ewah_io.c          | 25 +++++++++++++++++++++----
 ewah/ewok.h             |  2 +-
 t/t5310-pack-bitmaps.sh | 13 +++++++++++++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index bed1994551..33c08c40f8 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -122,16 +122,23 @@ int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *sb)
 	return ewah_serialize_to(self, write_strbuf, sb);
 }
 
-int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
+ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
 {
 	const uint8_t *ptr = map;
+	size_t data_len;
 	size_t i;
 
+	if (len < sizeof(uint32_t))
+		return error("corrupt ewah bitmap: eof before bit size");
 	self->bit_size = get_be32(ptr);
 	ptr += sizeof(uint32_t);
+	len -= sizeof(uint32_t);
 
+	if (len < sizeof(uint32_t))
+		return error("corrupt ewah bitmap: eof before length");
 	self->buffer_size = self->alloc_size = get_be32(ptr);
 	ptr += sizeof(uint32_t);
+	len -= sizeof(uint32_t);
 
 	REALLOC_ARRAY(self->buffer, self->alloc_size);
 
@@ -141,15 +148,25 @@ int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
 	 * the endianness conversion in a separate pass to ensure
 	 * we're loading 8-byte aligned words.
 	 */
-	memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));
-	ptr += self->buffer_size * sizeof(eword_t);
+	data_len = st_mult(self->buffer_size, sizeof(eword_t));
+	if (len < data_len)
+		return error("corrupt ewah bitmap: eof in data "
+			     "(%"PRIuMAX" bytes short)",
+			     (uintmax_t)(data_len - len));
+	memcpy(self->buffer, ptr, data_len);
+	ptr += data_len;
+	len -= data_len;
 
 	for (i = 0; i < self->buffer_size; ++i)
 		self->buffer[i] = ntohll(self->buffer[i]);
 
+	if (len < sizeof(uint32_t))
+		return error("corrupt ewah bitmap: eof before rlw");
 	self->rlw = self->buffer + get_be32(ptr);
+	ptr += sizeof(uint32_t);
+	len -= sizeof(uint32_t);
 
-	return (3 * 4) + (self->buffer_size * 8);
+	return ptr - (const uint8_t *)map;
 }
 
 int ewah_deserialize(struct ewah_bitmap *self, int fd)
diff --git a/ewah/ewok.h b/ewah/ewok.h
index dc43d05b64..357fd93c84 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -91,7 +91,7 @@ int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *);
 
 int ewah_deserialize(struct ewah_bitmap *self, int fd);
-int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
+ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
 
 uint32_t ewah_checksum(struct ewah_bitmap *self);
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 423c0a475f..237ee6e5fc 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -331,4 +331,17 @@ test_expect_success 'pack reuse respects --incremental' '
 	git show-index <empty.idx >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'truncated bitmap fails gracefully' '
+	git repack -ad &&
+	git rev-list --use-bitmap-index --count --all >expect &&
+	bitmap=$(ls .git/objects/pack/*.bitmap) &&
+	test_when_finished "rm -f $bitmap" &&
+	head -c 512 <$bitmap >$bitmap.tmp &&
+	mv $bitmap.tmp $bitmap &&
+	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+	test_cmp expect actual &&
+	test_i18ngrep corrupt stderr
+'
+
 test_done
-- 
2.18.0.rc2.534.g53d976aeb8


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/3] ewah: drop ewah_deserialize function
  2018-06-15  3:28 ` Jeff King
  2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
@ 2018-06-15  3:31   ` Jeff King
  2018-06-15  3:32   ` [PATCH 3/3] ewah: drop ewah_serialize_native function Jeff King
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15  3:31 UTC (permalink / raw)
  To: Luat Nguyen; +Cc: git

We don't call this function, and in fact never have since it
was added (at least not in iterations of the ewah patches
that got merged). Instead we use ewah_read_mmap().

Let's drop the unused code.

Note to anybody who later wants to resurrect this: it does
not check for integer overflow in the ewah data size,
meaning it may be possible to convince the code to allocate
a too-small buffer and read() into it.

Signed-off-by: Jeff King <peff@peff.net>
---
 ewah/ewah_io.c | 55 --------------------------------------------------
 ewah/ewok.h    |  1 -
 2 files changed, 56 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index 33c08c40f8..97c74266da 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -168,58 +168,3 @@ ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
 
 	return ptr - (const uint8_t *)map;
 }
-
-int ewah_deserialize(struct ewah_bitmap *self, int fd)
-{
-	size_t i;
-	eword_t dump[2048];
-	const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
-	uint32_t bitsize, word_count, rlw_pos;
-
-	eword_t *buffer = NULL;
-	size_t words_left;
-
-	ewah_clear(self);
-
-	/* 32 bit -- bit size for the map */
-	if (read(fd, &bitsize, 4) != 4)
-		return -1;
-
-	self->bit_size = (size_t)ntohl(bitsize);
-
-	/** 32 bit -- number of compressed 64-bit words */
-	if (read(fd, &word_count, 4) != 4)
-		return -1;
-
-	self->buffer_size = self->alloc_size = (size_t)ntohl(word_count);
-	REALLOC_ARRAY(self->buffer, self->alloc_size);
-
-	/** 64 bit x N -- compressed words */
-	buffer = self->buffer;
-	words_left = self->buffer_size;
-
-	while (words_left >= words_per_dump) {
-		if (read(fd, dump, sizeof(dump)) != sizeof(dump))
-			return -1;
-
-		for (i = 0; i < words_per_dump; ++i, ++buffer)
-			*buffer = ntohll(dump[i]);
-
-		words_left -= words_per_dump;
-	}
-
-	if (words_left) {
-		if (read(fd, dump, words_left * 8) != words_left * 8)
-			return -1;
-
-		for (i = 0; i < words_left; ++i, ++buffer)
-			*buffer = ntohll(dump[i]);
-	}
-
-	/** 32 bit -- position for the RLW */
-	if (read(fd, &rlw_pos, 4) != 4)
-		return -1;
-
-	self->rlw = self->buffer + ntohl(rlw_pos);
-	return 0;
-}
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 357fd93c84..7e25ca2e61 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -90,7 +90,6 @@ int ewah_serialize(struct ewah_bitmap *self, int fd);
 int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *);
 
-int ewah_deserialize(struct ewah_bitmap *self, int fd);
 ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
 
 uint32_t ewah_checksum(struct ewah_bitmap *self);
-- 
2.18.0.rc2.534.g53d976aeb8


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15  3:28 ` Jeff King
  2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
  2018-06-15  3:31   ` [PATCH 2/3] ewah: drop ewah_deserialize function Jeff King
@ 2018-06-15  3:32   ` Jeff King
  2018-06-15 13:56     ` Ramsay Jones
  2018-06-15  3:44   ` [PATCH 4/3] ewah: adjust callers of ewah_read_mmap() Jeff King
  2018-06-15 16:11   ` security: potential out-of-bound read at ewah_io.c |ewah_read_mmap| Junio C Hamano
  4 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-06-15  3:32 UTC (permalink / raw)
  To: Luat Nguyen; +Cc: git

We don't call this function, and never have. The on-disk
bitmap format uses network-byte-order integers, meaning that
we cannot use the native-byte-order format written here.

Let's drop it in the name of simplicity.

Signed-off-by: Jeff King <peff@peff.net>
---
 ewah/ewah_io.c | 26 --------------------------
 ewah/ewok.h    |  1 -
 2 files changed, 27 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index 97c74266da..586396122f 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -20,32 +20,6 @@
 #include "ewok.h"
 #include "strbuf.h"
 
-int ewah_serialize_native(struct ewah_bitmap *self, int fd)
-{
-	uint32_t write32;
-	size_t to_write = self->buffer_size * 8;
-
-	/* 32 bit -- bit size for the map */
-	write32 = (uint32_t)self->bit_size;
-	if (write(fd, &write32, 4) != 4)
-		return -1;
-
-	/** 32 bit -- number of compressed 64-bit words */
-	write32 = (uint32_t)self->buffer_size;
-	if (write(fd, &write32, 4) != 4)
-		return -1;
-
-	if (write(fd, self->buffer, to_write) != to_write)
-		return -1;
-
-	/** 32 bit -- position for the RLW */
-	write32 = self->rlw - self->buffer;
-	if (write(fd, &write32, 4) != 4)
-		return -1;
-
-	return (3 * 4) + to_write;
-}
-
 int ewah_serialize_to(struct ewah_bitmap *self,
 		      int (*write_fun)(void *, const void *, size_t),
 		      void *data)
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7e25ca2e61..e6102e6c75 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -87,7 +87,6 @@ int ewah_serialize_to(struct ewah_bitmap *self,
 		      int (*write_fun)(void *out, const void *buf, size_t len),
 		      void *out);
 int ewah_serialize(struct ewah_bitmap *self, int fd);
-int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *);
 
 ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
-- 
2.18.0.rc2.534.g53d976aeb8

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15  3:28 ` Jeff King
                     ` (2 preceding siblings ...)
  2018-06-15  3:32   ` [PATCH 3/3] ewah: drop ewah_serialize_native function Jeff King
@ 2018-06-15  3:44   ` Jeff King
  2018-06-15 11:23     ` Derrick Stolee
  2018-06-15 17:12     ` Junio C Hamano
  2018-06-15 16:11   ` security: potential out-of-bound read at ewah_io.c |ewah_read_mmap| Junio C Hamano
  4 siblings, 2 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15  3:44 UTC (permalink / raw)
  To: Luat Nguyen; +Cc: git

On Thu, Jun 14, 2018 at 11:28:50PM -0400, Jeff King wrote:

> Yep. We also fail to check if we even have enough bytes to read the
> buffer_size in the first place.
> 
> Here are some patches. The first one fixes the problem you found. The
> second one drops some dead code that has a related problem. And the
> third just drops some dead code that I noticed in the same file. :)
> 
>   [1/3]: ewah_read_mmap: bounds-check mmap reads
>   [2/3]: ewah: drop ewah_deserialize function
>   [3/3]: ewah: drop ewah_serialize_native function

Actually, we'd want this one on top. Arguably it could be squashed into
patch 1.

-- >8 --
Subject: ewah: adjust callers of ewah_read_mmap()

The return value of ewah_read_mmap() is now an ssize_t,
since we could (in theory) process up to 32GB of data. This
would never happen in practice, but a corrupt or malicious
.bitmap or index file could convince us to do so.

Let's make sure that we don't stuff the value into an int,
which would cause us to incorrectly move our pointer
forward.  We'd always move too little, since negative values
are used for reporting errors. So the worst case is just
that we end up reporting a corrupt file, not an
out-of-bounds read.

Signed-off-by: Jeff King <peff@peff.net>
---
 dir.c         | 3 ++-
 pack-bitmap.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 61b513a078..d5185660f1 100644
--- a/dir.c
+++ b/dir.c
@@ -2725,7 +2725,8 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	struct read_data rd;
 	const unsigned char *next = data, *end = (const unsigned char *)data + sz;
 	const char *ident;
-	int ident_len, len;
+	int ident_len;
+	ssize_t len;
 	const char *exclude_per_dir;
 
 	if (sz <= 1 || end[-1] != '\0')
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 369bf69d75..2f27b10e35 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -125,7 +125,7 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 {
 	struct ewah_bitmap *b = ewah_pool_new();
 
-	int bitmap_size = ewah_read_mmap(b,
+	ssize_t bitmap_size = ewah_read_mmap(b,
 		index->map + index->map_pos,
 		index->map_size - index->map_pos);
 
-- 
2.18.0.rc2.534.g53d976aeb8


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
@ 2018-06-15  9:14     ` SZEDER Gábor
  2018-06-15 16:20       ` Junio C Hamano
  2018-06-15 17:05     ` Junio C Hamano
  2018-06-16 14:35     ` SZEDER Gábor
  2 siblings, 1 reply; 58+ messages in thread
From: SZEDER Gábor @ 2018-06-15  9:14 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Luat Nguyen, git

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 423c0a475f..237ee6e5fc 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -331,4 +331,17 @@ test_expect_success 'pack reuse respects --incremental' '
>  	git show-index <empty.idx >actual &&
>  	test_cmp expect actual
>  '
> +
> +test_expect_success 'truncated bitmap fails gracefully' '
> +	git repack -ad &&
> +	git rev-list --use-bitmap-index --count --all >expect &&
> +	bitmap=$(ls .git/objects/pack/*.bitmap) &&

I think the 'ls' is unnecessary and this would do:

  bitmap=.git/objects/pack/*.bitmap

> +	test_when_finished "rm -f $bitmap" &&
> +	head -c 512 <$bitmap >$bitmap.tmp &&
> +	mv $bitmap.tmp $bitmap &&
> +	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> +	test_cmp expect actual &&
> +	test_i18ngrep corrupt stderr
> +'
> +
>  test_done
> -- 
> 2.18.0.rc2.534.g53d976aeb8
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15  3:44   ` [PATCH 4/3] ewah: adjust callers of ewah_read_mmap() Jeff King
@ 2018-06-15 11:23     ` Derrick Stolee
  2018-06-15 16:41       ` Junio C Hamano
  2018-06-15 17:12     ` Junio C Hamano
  1 sibling, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 11:23 UTC (permalink / raw)
  To: Jeff King, Luat Nguyen; +Cc: git

On 6/14/2018 11:44 PM, Jeff King wrote:
> The return value of ewah_read_mmap() is now an ssize_t,
> since we could (in theory) process up to 32GB of data. This
> would never happen in practice, but a corrupt or malicious
> .bitmap or index file could convince us to do so.

Aside: Why a 32GB limit? Usually I see 32-bit issues limiting by 2 
(signed) or 4 GB (unsigned). Is there something specifically in the 
bitmap format that stops at this size?

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15  3:32   ` [PATCH 3/3] ewah: drop ewah_serialize_native function Jeff King
@ 2018-06-15 13:56     ` Ramsay Jones
  2018-06-15 14:07       ` Ramsay Jones
  2018-06-15 14:15       ` [PATCH 3/3] ewah: drop ewah_serialize_native function Derrick Stolee
  0 siblings, 2 replies; 58+ messages in thread
From: Ramsay Jones @ 2018-06-15 13:56 UTC (permalink / raw)
  To: Jeff King, Luat Nguyen; +Cc: git



On 15/06/18 04:32, Jeff King wrote:
> We don't call this function, and never have. The on-disk
> bitmap format uses network-byte-order integers, meaning that
> we cannot use the native-byte-order format written here.
> 
> Let's drop it in the name of simplicity.

Hmm, if you are in the mood to drop ewah dead code, how about:

  ewah/bitmap.o   - bitmap_clear
  ewah/bitmap.o   - bitmap_each_bit
  ewah/ewah_bitmap.o      - ewah_and
  ewah/ewah_bitmap.o      - ewah_and_not
  ewah/ewah_bitmap.o      - ewah_not
  ewah/ewah_bitmap.o      - ewah_or

... in addition to these *(de)serialize* functions. ;-)

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15 13:56     ` Ramsay Jones
@ 2018-06-15 14:07       ` Ramsay Jones
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
  2018-06-15 14:15       ` [PATCH 3/3] ewah: drop ewah_serialize_native function Derrick Stolee
  1 sibling, 1 reply; 58+ messages in thread
From: Ramsay Jones @ 2018-06-15 14:07 UTC (permalink / raw)
  To: Jeff King, Luat Nguyen; +Cc: git



On 15/06/18 14:56, Ramsay Jones wrote:
> 
> 
> On 15/06/18 04:32, Jeff King wrote:
>> We don't call this function, and never have. The on-disk
>> bitmap format uses network-byte-order integers, meaning that
>> we cannot use the native-byte-order format written here.
>>
>> Let's drop it in the name of simplicity.
> 
> Hmm, if you are in the mood to drop ewah dead code, how about:
> 
>   ewah/bitmap.o   - bitmap_clear
>   ewah/bitmap.o   - bitmap_each_bit
>   ewah/ewah_bitmap.o      - ewah_and
>   ewah/ewah_bitmap.o      - ewah_and_not
>   ewah/ewah_bitmap.o      - ewah_not
>   ewah/ewah_bitmap.o      - ewah_or
> 
> ... in addition to these *(de)serialize* functions. ;-)

Hmm, I didn't spot it at first, but ewah_serialize() also
seems to be unused!

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15 13:56     ` Ramsay Jones
  2018-06-15 14:07       ` Ramsay Jones
@ 2018-06-15 14:15       ` Derrick Stolee
  2018-06-15 17:51         ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:15 UTC (permalink / raw)
  To: Ramsay Jones, Jeff King, Luat Nguyen; +Cc: git

On 6/15/2018 9:56 AM, Ramsay Jones wrote:
>
> On 15/06/18 04:32, Jeff King wrote:
>> We don't call this function, and never have. The on-disk
>> bitmap format uses network-byte-order integers, meaning that
>> we cannot use the native-byte-order format written here.
>>
>> Let's drop it in the name of simplicity.
> Hmm, if you are in the mood to drop ewah dead code, how about:
>
>    ewah/bitmap.o   - bitmap_clear
>    ewah/bitmap.o   - bitmap_each_bit
>    ewah/ewah_bitmap.o      - ewah_and
>    ewah/ewah_bitmap.o      - ewah_and_not
>    ewah/ewah_bitmap.o      - ewah_not
>    ewah/ewah_bitmap.o      - ewah_or
>
> ... in addition to these *(de)serialize* functions. ;-)
>

Normally, I would say we should keep this folder as close to the 
"original" [1] as possible, so we could more easily take improvements to 
that library. However, it appears that code is not being updated. 
Perhaps this is in favor of other EWAH libraries [2] or other compressed 
bitmap formats [3].

For that reason, I agree we should clean this up. We shouldn't block 
Peff's patch for that reason. I'll send a patch that deletes these methods.

Thanks,
-Stolee:

[1] https://github.com/vmg/libewok

[2] https://github.com/lemire/EWAHBoolArray

[3] https://github.com/RoaringBitmap/CRoaring

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 0/8] Delete unused methods in EWAH bitmap
  2018-06-15 14:07       ` Ramsay Jones
@ 2018-06-15 14:30         ` Derrick Stolee
  2018-06-15 14:30           ` [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
                             ` (9 more replies)
  0 siblings, 10 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

The EWAH bitmap code includes several logical operations that are
important for a general-purpose bitmap library. However, Git only
uses the XOR operation for storing deltas between reachability
bitmaps. This means that we can delete the following unused methods:

* ewah_and()
* ewah_and_not()
* ewah_not()
* ewah_or()
* ewah_serialize()
* ewah_serialize_native()

We can also delete the unused methods bitmap_clear() and
bitmap_each_bit().

Derrick Stolee (8):
  ewah/bitmap.c: delete unused 'bitmap_clear()'
  ewah/bitmap.c: delete unused 'bitmap_each_bit()'
  ewah_bitmap: delete unused 'ewah_and()'
  ewah_bitmap: delete unused 'ewah_and_not()'
  ewah_bitmap: delete unused 'ewah_not()'
  ewah_bitmap: delete unused 'ewah_or()'
  ewah_io: delete unused 'ewah_serialize()'
  ewah_io: delete unused 'ewah_serialize_native()'

 ewah/bitmap.c      |  32 -------
 ewah/ewah_bitmap.c | 229 ---------------------------------------------
 ewah/ewah_io.c     |  36 -------
 ewah/ewok.h        |  24 -----
 4 files changed, 321 deletions(-)


base-commit: fc54c1af3ec09bab8b8ea09768c2da4069b7f53e
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 14:46             ` Ramsay Jones
  2018-06-15 14:30           ` [PATCH 2/8] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
                             ` (8 subsequent siblings)
  9 siblings, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/bitmap.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 756bdd050e..d61dc6114a 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -45,14 +45,6 @@ void bitmap_set(struct bitmap *self, size_t pos)
 	self->words[block] |= EWAH_MASK(pos);
 }
 
-void bitmap_clear(struct bitmap *self, size_t pos)
-{
-	size_t block = EWAH_BLOCK(pos);
-
-	if (block < self->word_alloc)
-		self->words[block] &= ~EWAH_MASK(pos);
-}
-
 int bitmap_get(struct bitmap *self, size_t pos)
 {
 	size_t block = EWAH_BLOCK(pos);
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/8] ewah/bitmap.c: delete unused 'bitmap_each_bit()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
  2018-06-15 14:30           ` [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 15:03             ` Ramsay Jones
  2018-06-15 14:30           ` [PATCH 3/8] ewah_bitmap: delete unused 'ewah_and()' Derrick Stolee
                             ` (7 subsequent siblings)
  9 siblings, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/bitmap.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index d61dc6114a..52f1178db4 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -129,30 +129,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 		self->words[i++] |= word;
 }
 
-void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data)
-{
-	size_t pos = 0, i;
-
-	for (i = 0; i < self->word_alloc; ++i) {
-		eword_t word = self->words[i];
-		uint32_t offset;
-
-		if (word == (eword_t)~0) {
-			for (offset = 0; offset < BITS_IN_EWORD; ++offset)
-				callback(pos++, data);
-		} else {
-			for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
-				if ((word >> offset) == 0)
-					break;
-
-				offset += ewah_bit_ctz64(word >> offset);
-				callback(pos + offset, data);
-			}
-			pos += BITS_IN_EWORD;
-		}
-	}
-}
-
 size_t bitmap_popcount(struct bitmap *self)
 {
 	size_t i, count = 0;
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 3/8] ewah_bitmap: delete unused 'ewah_and()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
  2018-06-15 14:30           ` [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
  2018-06-15 14:30           ` [PATCH 2/8] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 14:30           ` [PATCH 4/8] ewah_bitmap: delete unused 'ewah_and_not()' Derrick Stolee
                             ` (6 subsequent siblings)
  9 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 68 ----------------------------------------------
 ewah/ewok.h        |  5 ----
 2 files changed, 73 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index b9fdda1d3d..5a4d3a6eb6 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -459,74 +459,6 @@ void ewah_xor(
 	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
 }
 
-void ewah_and(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out)
-{
-	struct rlw_iterator rlw_i;
-	struct rlw_iterator rlw_j;
-	size_t literals;
-
-	rlwit_init(&rlw_i, ewah_i);
-	rlwit_init(&rlw_j, ewah_j);
-
-	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
-		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
-			struct rlw_iterator *prey, *predator;
-
-			if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
-				prey = &rlw_i;
-				predator = &rlw_j;
-			} else {
-				prey = &rlw_j;
-				predator = &rlw_i;
-			}
-
-			if (predator->rlw.running_bit == 0) {
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(prey,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			} else {
-				size_t index = rlwit_discharge(prey, out,
-					predator->rlw.running_len, 0);
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len - index);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			}
-		}
-
-		literals = min_size(
-			rlw_i.rlw.literal_words,
-			rlw_j.rlw.literal_words);
-
-		if (literals) {
-			size_t k;
-
-			for (k = 0; k < literals; ++k) {
-				ewah_add(out,
-					rlw_i.buffer[rlw_i.literal_word_start + k] &
-					rlw_j.buffer[rlw_j.literal_word_start + k]
-				);
-			}
-
-			rlwit_discard_first_words(&rlw_i, literals);
-			rlwit_discard_first_words(&rlw_j, literals);
-		}
-	}
-
-	if (rlwit_word_size(&rlw_i) > 0)
-		rlwit_discharge_empty(&rlw_i, out);
-	else
-		rlwit_discharge_empty(&rlw_j, out);
-
-	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
-}
-
 void ewah_and_not(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index dc43d05b64..5841c83507 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,11 +179,6 @@ void ewah_xor(
 	struct ewah_bitmap *ewah_j,
 	struct ewah_bitmap *out);
 
-void ewah_and(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out);
-
 /**
  * Direct word access
  */
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 4/8] ewah_bitmap: delete unused 'ewah_and_not()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (2 preceding siblings ...)
  2018-06-15 14:30           ` [PATCH 3/8] ewah_bitmap: delete unused 'ewah_and()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 14:30           ` [PATCH 5/8] ewah_bitmap: delete unused 'ewah_not()' Derrick Stolee
                             ` (5 subsequent siblings)
  9 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 73 ----------------------------------------------
 ewah/ewok.h        |  5 ----
 2 files changed, 78 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 5a4d3a6eb6..559adde37c 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -459,79 +459,6 @@ void ewah_xor(
 	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
 }
 
-void ewah_and_not(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out)
-{
-	struct rlw_iterator rlw_i;
-	struct rlw_iterator rlw_j;
-	size_t literals;
-
-	rlwit_init(&rlw_i, ewah_i);
-	rlwit_init(&rlw_j, ewah_j);
-
-	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
-		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
-			struct rlw_iterator *prey, *predator;
-
-			if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
-				prey = &rlw_i;
-				predator = &rlw_j;
-			} else {
-				prey = &rlw_j;
-				predator = &rlw_i;
-			}
-
-			if ((predator->rlw.running_bit && prey == &rlw_i) ||
-				(!predator->rlw.running_bit && prey != &rlw_i)) {
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(prey,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			} else {
-				size_t index;
-				int negate_words;
-
-				negate_words = (&rlw_i != prey);
-				index = rlwit_discharge(prey, out,
-					predator->rlw.running_len, negate_words);
-				ewah_add_empty_words(out, negate_words,
-					predator->rlw.running_len - index);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			}
-		}
-
-		literals = min_size(
-			rlw_i.rlw.literal_words,
-			rlw_j.rlw.literal_words);
-
-		if (literals) {
-			size_t k;
-
-			for (k = 0; k < literals; ++k) {
-				ewah_add(out,
-					rlw_i.buffer[rlw_i.literal_word_start + k] &
-					~(rlw_j.buffer[rlw_j.literal_word_start + k])
-				);
-			}
-
-			rlwit_discard_first_words(&rlw_i, literals);
-			rlwit_discard_first_words(&rlw_j, literals);
-		}
-	}
-
-	if (rlwit_word_size(&rlw_i) > 0)
-		rlwit_discharge(&rlw_i, out, ~0, 0);
-	else
-		rlwit_discharge_empty(&rlw_j, out);
-
-	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
-}
-
 void ewah_or(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 5841c83507..21c420770e 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -169,11 +169,6 @@ void ewah_or(
 	struct ewah_bitmap *ewah_j,
 	struct ewah_bitmap *out);
 
-void ewah_and_not(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out);
-
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 5/8] ewah_bitmap: delete unused 'ewah_not()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (3 preceding siblings ...)
  2018-06-15 14:30           ` [PATCH 4/8] ewah_bitmap: delete unused 'ewah_and_not()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 14:30           ` [PATCH 6/8] ewah_bitmap: delete unused 'ewah_or()' Derrick Stolee
                             ` (4 subsequent siblings)
  9 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 19 -------------------
 ewah/ewok.h        |  7 -------
 2 files changed, 26 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 559adde37c..3fa3edf2a3 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -376,25 +376,6 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
 		read_new_rlw(it);
 }
 
-void ewah_not(struct ewah_bitmap *self)
-{
-	size_t pointer = 0;
-
-	while (pointer < self->buffer_size) {
-		eword_t *word = &self->buffer[pointer];
-		size_t literals, k;
-
-		rlw_xor_run_bit(word);
-		++pointer;
-
-		literals = rlw_get_literal_words(word);
-		for (k = 0; k < literals; ++k) {
-			self->buffer[pointer] = ~self->buffer[pointer];
-			++pointer;
-		}
-	}
-}
-
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 21c420770e..4b665d09d9 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -95,13 +95,6 @@ int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
 
 uint32_t ewah_checksum(struct ewah_bitmap *self);
 
-/**
- * Logical not (bitwise negation) in-place on the bitmap
- *
- * This operation is linear time based on the size of the bitmap.
- */
-void ewah_not(struct ewah_bitmap *self);
-
 /**
  * Call the given callback with the position of every single bit
  * that has been set on the bitmap.
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 6/8] ewah_bitmap: delete unused 'ewah_or()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (4 preceding siblings ...)
  2018-06-15 14:30           ` [PATCH 5/8] ewah_bitmap: delete unused 'ewah_not()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 14:30           ` [PATCH 7/8] ewah_io: delete unused 'ewah_serialize()' Derrick Stolee
                             ` (3 subsequent siblings)
  9 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 69 ----------------------------------------------
 ewah/ewok.h        |  5 ----
 2 files changed, 74 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 3fa3edf2a3..017c677f98 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -440,75 +440,6 @@ void ewah_xor(
 	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
 }
 
-void ewah_or(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out)
-{
-	struct rlw_iterator rlw_i;
-	struct rlw_iterator rlw_j;
-	size_t literals;
-
-	rlwit_init(&rlw_i, ewah_i);
-	rlwit_init(&rlw_j, ewah_j);
-
-	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
-		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
-			struct rlw_iterator *prey, *predator;
-
-			if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
-				prey = &rlw_i;
-				predator = &rlw_j;
-			} else {
-				prey = &rlw_j;
-				predator = &rlw_i;
-			}
-
-			if (predator->rlw.running_bit) {
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(prey,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			} else {
-				size_t index = rlwit_discharge(prey, out,
-					predator->rlw.running_len, 0);
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len - index);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			}
-		}
-
-		literals = min_size(
-			rlw_i.rlw.literal_words,
-			rlw_j.rlw.literal_words);
-
-		if (literals) {
-			size_t k;
-
-			for (k = 0; k < literals; ++k) {
-				ewah_add(out,
-					rlw_i.buffer[rlw_i.literal_word_start + k] |
-					rlw_j.buffer[rlw_j.literal_word_start + k]
-				);
-			}
-
-			rlwit_discard_first_words(&rlw_i, literals);
-			rlwit_discard_first_words(&rlw_j, literals);
-		}
-	}
-
-	if (rlwit_word_size(&rlw_i) > 0)
-		rlwit_discharge(&rlw_i, out, ~0, 0);
-	else
-		rlwit_discharge(&rlw_j, out, ~0, 0);
-
-	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
-}
-
-
 #define BITMAP_POOL_MAX 16
 static struct ewah_bitmap *bitmap_pool[BITMAP_POOL_MAX];
 static size_t bitmap_pool_size;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 4b665d09d9..cd826a0136 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -157,11 +157,6 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
  */
 int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
 
-void ewah_or(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out);
-
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 7/8] ewah_io: delete unused 'ewah_serialize()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (5 preceding siblings ...)
  2018-06-15 14:30           ` [PATCH 6/8] ewah_bitmap: delete unused 'ewah_or()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 14:30           ` [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()' Derrick Stolee
                             ` (2 subsequent siblings)
  9 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_io.c | 10 ----------
 ewah/ewok.h    |  1 -
 2 files changed, 11 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index bed1994551..86d3f1d02e 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -100,16 +100,6 @@ int ewah_serialize_to(struct ewah_bitmap *self,
 	return (3 * 4) + (self->buffer_size * 8);
 }
 
-static int write_helper(void *fd, const void *buf, size_t len)
-{
-	return write((intptr_t)fd, buf, len);
-}
-
-int ewah_serialize(struct ewah_bitmap *self, int fd)
-{
-	return ewah_serialize_to(self, write_helper, (void *)(intptr_t)fd);
-}
-
 static int write_strbuf(void *user_data, const void *data, size_t len)
 {
 	struct strbuf *sb = user_data;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index cd826a0136..1055706e44 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -86,7 +86,6 @@ void ewah_free(struct ewah_bitmap *self);
 int ewah_serialize_to(struct ewah_bitmap *self,
 		      int (*write_fun)(void *out, const void *buf, size_t len),
 		      void *out);
-int ewah_serialize(struct ewah_bitmap *self, int fd);
 int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *);
 
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()'
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (6 preceding siblings ...)
  2018-06-15 14:30           ` [PATCH 7/8] ewah_io: delete unused 'ewah_serialize()' Derrick Stolee
@ 2018-06-15 14:30           ` Derrick Stolee
  2018-06-15 15:01             ` Ramsay Jones
  2018-06-15 14:35           ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
  9 siblings, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:30 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_io.c | 26 --------------------------
 ewah/ewok.h    |  1 -
 2 files changed, 27 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index 86d3f1d02e..dde736bcd9 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -20,32 +20,6 @@
 #include "ewok.h"
 #include "strbuf.h"
 
-int ewah_serialize_native(struct ewah_bitmap *self, int fd)
-{
-	uint32_t write32;
-	size_t to_write = self->buffer_size * 8;
-
-	/* 32 bit -- bit size for the map */
-	write32 = (uint32_t)self->bit_size;
-	if (write(fd, &write32, 4) != 4)
-		return -1;
-
-	/** 32 bit -- number of compressed 64-bit words */
-	write32 = (uint32_t)self->buffer_size;
-	if (write(fd, &write32, 4) != 4)
-		return -1;
-
-	if (write(fd, self->buffer, to_write) != to_write)
-		return -1;
-
-	/** 32 bit -- position for the RLW */
-	write32 = self->rlw - self->buffer;
-	if (write(fd, &write32, 4) != 4)
-		return -1;
-
-	return (3 * 4) + to_write;
-}
-
 int ewah_serialize_to(struct ewah_bitmap *self,
 		      int (*write_fun)(void *, const void *, size_t),
 		      void *data)
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 1055706e44..8a2b334278 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -86,7 +86,6 @@ void ewah_free(struct ewah_bitmap *self);
 int ewah_serialize_to(struct ewah_bitmap *self,
 		      int (*write_fun)(void *out, const void *buf, size_t len),
 		      void *out);
-int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *);
 
 int ewah_deserialize(struct ewah_bitmap *self, int fd);
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/8] Delete unused methods in EWAH bitmap
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (7 preceding siblings ...)
  2018-06-15 14:30           ` [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()' Derrick Stolee
@ 2018-06-15 14:35           ` Derrick Stolee
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
  9 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 14:35 UTC (permalink / raw)
  To: Derrick Stolee, git@vger.kernel.org
  Cc: ramsay@ramsayjones.plus.com, peff@peff.next

On 6/15/2018 10:30 AM, Derrick Stolee wrote:
> The EWAH bitmap code includes several logical operations that are
> important for a general-purpose bitmap library. However, Git only
> uses the XOR operation for storing deltas between reachability
> bitmaps. This means that we can delete the following unused methods:
>
> * ewah_and()
> * ewah_and_not()
> * ewah_not()
> * ewah_or()
> * ewah_serialize()
> * ewah_serialize_native()
>
> We can also delete the unused methods bitmap_clear() and
> bitmap_each_bit().
>
> Derrick Stolee (8):
>    ewah/bitmap.c: delete unused 'bitmap_clear()'
>    ewah/bitmap.c: delete unused 'bitmap_each_bit()'
>    ewah_bitmap: delete unused 'ewah_and()'
>    ewah_bitmap: delete unused 'ewah_and_not()'
>    ewah_bitmap: delete unused 'ewah_not()'
>    ewah_bitmap: delete unused 'ewah_or()'
>    ewah_io: delete unused 'ewah_serialize()'
>    ewah_io: delete unused 'ewah_serialize_native()'
>
>   ewah/bitmap.c      |  32 -------
>   ewah/ewah_bitmap.c | 229 ---------------------------------------------
>   ewah/ewah_io.c     |  36 -------
>   ewah/ewok.h        |  24 -----
>   4 files changed, 321 deletions(-)
>
>
> base-commit: fc54c1af3ec09bab8b8ea09768c2da4069b7f53e

Responders to this thread beware: I accidentally added an extra letter 
in Peff's email address, so be careful with reply-all.

Sorry!
-Stolee

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()'
  2018-06-15 14:30           ` [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
@ 2018-06-15 14:46             ` Ramsay Jones
  2018-06-15 15:11               ` Derrick Stolee
  0 siblings, 1 reply; 58+ messages in thread
From: Ramsay Jones @ 2018-06-15 14:46 UTC (permalink / raw)
  To: Derrick Stolee, git@vger.kernel.org; +Cc: Jeff King



On 15/06/18 15:30, Derrick Stolee wrote:
> Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  ewah/bitmap.c | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/ewah/bitmap.c b/ewah/bitmap.c
> index 756bdd050e..d61dc6114a 100644
> --- a/ewah/bitmap.c
> +++ b/ewah/bitmap.c
> @@ -45,14 +45,6 @@ void bitmap_set(struct bitmap *self, size_t pos)
>  	self->words[block] |= EWAH_MASK(pos);
>  }
>  
> -void bitmap_clear(struct bitmap *self, size_t pos)
> -{
> -	size_t block = EWAH_BLOCK(pos);
> -
> -	if (block < self->word_alloc)
> -		self->words[block] &= ~EWAH_MASK(pos);
> -}
> -
>  int bitmap_get(struct bitmap *self, size_t pos)
>  {
>  	size_t block = EWAH_BLOCK(pos);
> 

I haven't read all the patches yet, but what about the extern
declaration in ewah/ewok.h?

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()'
  2018-06-15 14:30           ` [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()' Derrick Stolee
@ 2018-06-15 15:01             ` Ramsay Jones
  2018-06-15 15:10               ` Derrick Stolee
  0 siblings, 1 reply; 58+ messages in thread
From: Ramsay Jones @ 2018-06-15 15:01 UTC (permalink / raw)
  To: Derrick Stolee, git@vger.kernel.org; +Cc: Jeff King



On 15/06/18 15:30, Derrick Stolee wrote:
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  ewah/ewah_io.c | 26 --------------------------
>  ewah/ewok.h    |  1 -
>  2 files changed, 27 deletions(-)

This duplicates Jeff's patch #3.

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/8] ewah/bitmap.c: delete unused 'bitmap_each_bit()'
  2018-06-15 14:30           ` [PATCH 2/8] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
@ 2018-06-15 15:03             ` Ramsay Jones
  0 siblings, 0 replies; 58+ messages in thread
From: Ramsay Jones @ 2018-06-15 15:03 UTC (permalink / raw)
  To: Derrick Stolee, git@vger.kernel.org; +Cc: Jeff King



On 15/06/18 15:30, Derrick Stolee wrote:
> Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  ewah/bitmap.c | 24 ------------------------
>  1 file changed, 24 deletions(-)
> 
> diff --git a/ewah/bitmap.c b/ewah/bitmap.c
> index d61dc6114a..52f1178db4 100644
> --- a/ewah/bitmap.c
> +++ b/ewah/bitmap.c
> @@ -129,30 +129,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
>  		self->words[i++] |= word;
>  }
>  
> -void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data)
> -{
> -	size_t pos = 0, i;
> -
> -	for (i = 0; i < self->word_alloc; ++i) {
> -		eword_t word = self->words[i];
> -		uint32_t offset;
> -
> -		if (word == (eword_t)~0) {
> -			for (offset = 0; offset < BITS_IN_EWORD; ++offset)
> -				callback(pos++, data);
> -		} else {
> -			for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
> -				if ((word >> offset) == 0)
> -					break;
> -
> -				offset += ewah_bit_ctz64(word >> offset);
> -				callback(pos + offset, data);
> -			}
> -			pos += BITS_IN_EWORD;
> -		}
> -	}
> -}
> -
>  size_t bitmap_popcount(struct bitmap *self)
>  {
>  	size_t i, count = 0;
> 

Again, remove extern declaration from header file.

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()'
  2018-06-15 15:01             ` Ramsay Jones
@ 2018-06-15 15:10               ` Derrick Stolee
  0 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 15:10 UTC (permalink / raw)
  To: Ramsay Jones, Derrick Stolee, git@vger.kernel.org; +Cc: Jeff King

On 6/15/2018 11:01 AM, Ramsay Jones wrote:
>
> On 15/06/18 15:30, Derrick Stolee wrote:
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   ewah/ewah_io.c | 26 --------------------------
>>   ewah/ewok.h    |  1 -
>>   2 files changed, 27 deletions(-)
> This duplicates Jeff's patch #3.
Thanks. I got a bit hasty as I was working from 'maint' and didn't 
double-check.

-Stolee

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()'
  2018-06-15 14:46             ` Ramsay Jones
@ 2018-06-15 15:11               ` Derrick Stolee
  0 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 15:11 UTC (permalink / raw)
  To: Ramsay Jones, Derrick Stolee, git@vger.kernel.org; +Cc: Jeff King

On 6/15/2018 10:46 AM, Ramsay Jones wrote:
>
> On 15/06/18 15:30, Derrick Stolee wrote:
>> Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   ewah/bitmap.c | 8 --------
>>   1 file changed, 8 deletions(-)
>>
>> diff --git a/ewah/bitmap.c b/ewah/bitmap.c
>> index 756bdd050e..d61dc6114a 100644
>> --- a/ewah/bitmap.c
>> +++ b/ewah/bitmap.c
>> @@ -45,14 +45,6 @@ void bitmap_set(struct bitmap *self, size_t pos)
>>   	self->words[block] |= EWAH_MASK(pos);
>>   }
>>   
>> -void bitmap_clear(struct bitmap *self, size_t pos)
>> -{
>> -	size_t block = EWAH_BLOCK(pos);
>> -
>> -	if (block < self->word_alloc)
>> -		self->words[block] &= ~EWAH_MASK(pos);
>> -}
>> -
>>   int bitmap_get(struct bitmap *self, size_t pos)
>>   {
>>   	size_t block = EWAH_BLOCK(pos);
>>
> I haven't read all the patches yet, but what about the extern
> declaration in ewah/ewok.h?

Thanks! I'll get that, too.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: security: potential out-of-bound read at ewah_io.c |ewah_read_mmap|
  2018-06-15  3:28 ` Jeff King
                     ` (3 preceding siblings ...)
  2018-06-15  3:44   ` [PATCH 4/3] ewah: adjust callers of ewah_read_mmap() Jeff King
@ 2018-06-15 16:11   ` Junio C Hamano
  4 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 16:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Luat Nguyen, git

Jeff King <peff@peff.net> writes:

> On Fri, Jun 15, 2018 at 06:59:43AM +0800, Luat Nguyen wrote:
>
>> Recently, I’ve found a security issue related to out-of-bound read at function named `ewah_read_mmap`
>
> Thanks, this is definitely a bug worth addressing. But note...
>
>> Assume that, an attacker can put malicious `./git/index` into a repo by somehow.
>
> We generally don't consider .git/index (or pack .bitmap files, which
> also use this implementation) to be a major part of Git's attack
> surface, since they are generated locally. And if you can write to
> somebody's .git directory, there are already much easier ways to execute
> arbitrary code.

Thanks for giving a fair assessment on the gravity of the issue, to
which I agree fully, and also fixes and clean-ups.



>
>> Since there is lack of check whether the remaining size of `ptr`is
>> equal to `buffer_size` or not.
>
> Yep. We also fail to check if we even have enough bytes to read the
> buffer_size in the first place.
>
> Here are some patches. The first one fixes the problem you found. The
> second one drops some dead code that has a related problem. And the
> third just drops some dead code that I noticed in the same file. :)
>
>   [1/3]: ewah_read_mmap: bounds-check mmap reads
>   [2/3]: ewah: drop ewah_deserialize function
>   [3/3]: ewah: drop ewah_serialize_native function
>
>  ewah/ewah_io.c          | 106 ++++++++--------------------------------
>  ewah/ewok.h             |   4 +-
>  t/t5310-pack-bitmaps.sh |  13 +++++
>  3 files changed, 35 insertions(+), 88 deletions(-)
>
> -Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15  9:14     ` SZEDER Gábor
@ 2018-06-15 16:20       ` Junio C Hamano
  2018-06-15 17:10         ` SZEDER Gábor
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 16:20 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jeff King, Luat Nguyen, git

SZEDER Gábor <szeder.dev@gmail.com> writes:

>> +	bitmap=$(ls .git/objects/pack/*.bitmap) &&
>
> I think the 'ls' is unnecessary and this would do:
>
>   bitmap=.git/objects/pack/*.bitmap

Yuck.

>> +	test_when_finished "rm -f $bitmap" &&

OK, this will become "rm -f .git/objects/pack/*.bitmap" and then
eval that implements when-finished would make it work OK.

>> +	head -c 512 <$bitmap >$bitmap.tmp &&

The reading side would be OK but would the writing side also be
alright?

	makefile=Mak?file
	head -n 4 <$makefile >$makefile.tmp
	/bin/ls -t -1 | head -n 1
        Mak?file.tmp

Hmm...

>> +	mv $bitmap.tmp $bitmap &&

Likewise...

I said "yuck" because the original does not work if there happen to
be more than (or for that matter, less than) one '.bitmap' file
there.  But at least as long as there is one, it should work ;-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15 11:23     ` Derrick Stolee
@ 2018-06-15 16:41       ` Junio C Hamano
  2018-06-15 17:31         ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 16:41 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, Luat Nguyen, git

Derrick Stolee <stolee@gmail.com> writes:

> On 6/14/2018 11:44 PM, Jeff King wrote:
>> The return value of ewah_read_mmap() is now an ssize_t,
>> since we could (in theory) process up to 32GB of data. This
>> would never happen in practice, but a corrupt or malicious
>> .bitmap or index file could convince us to do so.
>
> Aside: Why a 32GB limit? Usually I see 32-bit issues limiting by 2
> (signed) or 4 GB (unsigned). Is there something specifically in the
> bitmap format that stops at this size?

The proposed log message for 1/3 has this bit

  - check the size for integer overflow (this should be
    impossible on 64-bit, as the size is given as a 32-bit
    count of 8-byte words, but is possible on a 32-bit
    system)

4 Giga 8-byte words makes 32 Giga bytes, I'd guess.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
  2018-06-15  9:14     ` SZEDER Gábor
@ 2018-06-15 17:05     ` Junio C Hamano
  2018-06-15 17:26       ` Jeff King
  2018-06-16 14:35     ` SZEDER Gábor
  2 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 17:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Luat Nguyen, git

Jeff King <peff@peff.net> writes:

> diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
> index bed1994551..33c08c40f8 100644
> --- a/ewah/ewah_io.c
> +++ b/ewah/ewah_io.c
> @@ -122,16 +122,23 @@ int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *sb)
>  	return ewah_serialize_to(self, write_strbuf, sb);
>  }
>  
> -int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
> +ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
>  {
>  	const uint8_t *ptr = map;
> +	size_t data_len;
>  	size_t i;
>  
> +	if (len < sizeof(uint32_t))
> +		return error("corrupt ewah bitmap: eof before bit size");
>  	self->bit_size = get_be32(ptr);
>  	ptr += sizeof(uint32_t);
> +	len -= sizeof(uint32_t);
>  
> +	if (len < sizeof(uint32_t))
> +		return error("corrupt ewah bitmap: eof before length");
>  	self->buffer_size = self->alloc_size = get_be32(ptr);
>  	ptr += sizeof(uint32_t);
> +	len -= sizeof(uint32_t);
>  
>  	REALLOC_ARRAY(self->buffer, self->alloc_size);
>  
> @@ -141,15 +148,25 @@ int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
>  	 * the endianness conversion in a separate pass to ensure
>  	 * we're loading 8-byte aligned words.
>  	 */
> -	memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));
> -	ptr += self->buffer_size * sizeof(eword_t);


> +	data_len = st_mult(self->buffer_size, sizeof(eword_t));

This is a faithful conversion from the original, but I somehow would
have appreciated if the latter were not sizeof(eword_t) but rather
sizeof(self->buffer_size[0]), especially as I wondered ...

> +	if (len < data_len)
> +		return error("corrupt ewah bitmap: eof in data "
> +			     "(%"PRIuMAX" bytes short)",
> +			     (uintmax_t)(data_len - len));
> +	memcpy(self->buffer, ptr, data_len);
> +	ptr += data_len;
> +	len -= data_len;
>  
>  	for (i = 0; i < self->buffer_size; ++i)
>  		self->buffer[i] = ntohll(self->buffer[i]);

... what individual datum one iteration of this loop is copying, and
then realized "buffer_size" is a misleading field name (anything that 
claims to be size and not measuring in bytes is misleading to me ;-).

> +	if (len < sizeof(uint32_t))
> +		return error("corrupt ewah bitmap: eof before rlw");
>  	self->rlw = self->buffer + get_be32(ptr);
> +	ptr += sizeof(uint32_t);
> +	len -= sizeof(uint32_t);
>  
> -	return (3 * 4) + (self->buffer_size * 8);
> +	return ptr - (const uint8_t *)map;

Much nicer; I needed to wonder what these 12 and 8 in the original are.

>  
>  int ewah_deserialize(struct ewah_bitmap *self, int fd);
> -int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
> +ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);

I double checked all the callers and made sure that they are already
prepared to react sensibly to error returns, which is good.

>  
>  uint32_t ewah_checksum(struct ewah_bitmap *self);
>  
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 423c0a475f..237ee6e5fc 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -331,4 +331,17 @@ test_expect_success 'pack reuse respects --incremental' '
>  	git show-index <empty.idx >actual &&
>  	test_cmp expect actual
>  '
> +
> +test_expect_success 'truncated bitmap fails gracefully' '
> +	git repack -ad &&
> +	git rev-list --use-bitmap-index --count --all >expect &&
> +	bitmap=$(ls .git/objects/pack/*.bitmap) &&
> +	test_when_finished "rm -f $bitmap" &&
> +	head -c 512 <$bitmap >$bitmap.tmp &&
> +	mv $bitmap.tmp $bitmap &&
> +	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> +	test_cmp expect actual &&
> +	test_i18ngrep corrupt stderr
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15 16:20       ` Junio C Hamano
@ 2018-06-15 17:10         ` SZEDER Gábor
  2018-06-15 17:21           ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: SZEDER Gábor @ 2018-06-15 17:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Luat Nguyen, Git mailing list

On Fri, Jun 15, 2018 at 6:20 PM, Junio C Hamano <gitster@pobox.com> wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>>> +    bitmap=$(ls .git/objects/pack/*.bitmap) &&
>>
>> I think the 'ls' is unnecessary and this would do:
>>
>>   bitmap=.git/objects/pack/*.bitmap
>
> Yuck.

Oh, wow, now this is embarrassing...


> I said "yuck" because the original does not work if there happen to
> be more than (or for that matter, less than) one '.bitmap' file
> there.  But at least as long as there is one, it should work ;-)

Well, the test starts with 'git repack -ad', so there can be only one
bitmap file.  (Unless something is broken, of course, but the second
test would catch that much earlier.)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15  3:44   ` [PATCH 4/3] ewah: adjust callers of ewah_read_mmap() Jeff King
  2018-06-15 11:23     ` Derrick Stolee
@ 2018-06-15 17:12     ` Junio C Hamano
  1 sibling, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 17:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Luat Nguyen, git

Jeff King <peff@peff.net> writes:

> On Thu, Jun 14, 2018 at 11:28:50PM -0400, Jeff King wrote:
>
>> Yep. We also fail to check if we even have enough bytes to read the
>> buffer_size in the first place.
>> 
>> Here are some patches. The first one fixes the problem you found. The
>> second one drops some dead code that has a related problem. And the
>> third just drops some dead code that I noticed in the same file. :)
>> 
>>   [1/3]: ewah_read_mmap: bounds-check mmap reads
>>   [2/3]: ewah: drop ewah_deserialize function
>>   [3/3]: ewah: drop ewah_serialize_native function
>
> Actually, we'd want this one on top. Arguably it could be squashed into
> patch 1.
>
> -- >8 --
> Subject: ewah: adjust callers of ewah_read_mmap()
>
> The return value of ewah_read_mmap() is now an ssize_t,
> since we could (in theory) process up to 32GB of data. This
> would never happen in practice, but a corrupt or malicious
> .bitmap or index file could convince us to do so.
>
> Let's make sure that we don't stuff the value into an int,
> which would cause us to incorrectly move our pointer
> forward.  We'd always move too little, since negative values
> are used for reporting errors. So the worst case is just
> that we end up reporting a corrupt file, not an
> out-of-bounds read.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Makes sense.

>  dir.c         | 3 ++-
>  pack-bitmap.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 61b513a078..d5185660f1 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2725,7 +2725,8 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
>  	struct read_data rd;
>  	const unsigned char *next = data, *end = (const unsigned char *)data + sz;
>  	const char *ident;
> -	int ident_len, len;
> +	int ident_len;
> +	ssize_t len;
>  	const char *exclude_per_dir;
>  
>  	if (sz <= 1 || end[-1] != '\0')
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 369bf69d75..2f27b10e35 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -125,7 +125,7 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
>  {
>  	struct ewah_bitmap *b = ewah_pool_new();
>  
> -	int bitmap_size = ewah_read_mmap(b,
> +	ssize_t bitmap_size = ewah_read_mmap(b,
>  		index->map + index->map_pos,
>  		index->map_size - index->map_pos);

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15 17:10         ` SZEDER Gábor
@ 2018-06-15 17:21           ` Jeff King
  2018-06-15 19:42             ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-06-15 17:21 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, Luat Nguyen, Git mailing list

On Fri, Jun 15, 2018 at 07:10:28PM +0200, SZEDER Gábor wrote:

> > I said "yuck" because the original does not work if there happen to
> > be more than (or for that matter, less than) one '.bitmap' file
> > there.  But at least as long as there is one, it should work ;-)
> 
> Well, the test starts with 'git repack -ad', so there can be only one
> bitmap file.  (Unless something is broken, of course, but the second
> test would catch that much earlier.)

Right. I almost put "head -1" in there, but we know that we just created
a single bitmap. And my thought was that if we don't, the test would
blow up horribly, which is exactly what you want.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15 17:05     ` Junio C Hamano
@ 2018-06-15 17:26       ` Jeff King
  2018-06-15 19:44         ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-06-15 17:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Luat Nguyen, git

On Fri, Jun 15, 2018 at 10:05:42AM -0700, Junio C Hamano wrote:

> > -	memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));
> > -	ptr += self->buffer_size * sizeof(eword_t);
> 
> 
> > +	data_len = st_mult(self->buffer_size, sizeof(eword_t));
> 
> This is a faithful conversion from the original, but I somehow would
> have appreciated if the latter were not sizeof(eword_t) but rather
> sizeof(self->buffer_size[0]), especially as I wondered ...

I actually thought about going the _other_ way. The sizeof(eword_t) is
not something we can change, but is actually decided by the on-disk
format.  So I wondered if this should be much more clearly "hey, this is
8 bytes". Possibly with an assert(sizeof(*self->buffer_size) == 8).

And yes, I think having the on-disk format specify the size in 8-byte
double words is vaguely crazy. Blame JGit. ;) Or maybe even blame the
original EWAH authors, this may have originated even earlier (I didn't
dig).

> > +	if (len < data_len)
> > +		return error("corrupt ewah bitmap: eof in data "
> > +			     "(%"PRIuMAX" bytes short)",
> > +			     (uintmax_t)(data_len - len));
> > +	memcpy(self->buffer, ptr, data_len);
> > +	ptr += data_len;
> > +	len -= data_len;
> >  
> >  	for (i = 0; i < self->buffer_size; ++i)
> >  		self->buffer[i] = ntohll(self->buffer[i]);
> 
> ... what individual datum one iteration of this loop is copying, and
> then realized "buffer_size" is a misleading field name (anything that 
> claims to be size and not measuring in bytes is misleading to me ;-).

Yeah, it confused me at first, too. I don't mind changing these kinds of
cosmetics, but I'd like to do it in a separate patch from this fix.

> > -	return (3 * 4) + (self->buffer_size * 8);
> > +	return ptr - (const uint8_t *)map;
> 
> Much nicer; I needed to wonder what these 12 and 8 in the original are.

Me too. ;)

> >  int ewah_deserialize(struct ewah_bitmap *self, int fd);
> > -int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
> > +ssize_t ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
> 
> I double checked all the callers and made sure that they are already
> prepared to react sensibly to error returns, which is good.

Yep, modulo the int/ssize_t thing from the fourth patch.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15 16:41       ` Junio C Hamano
@ 2018-06-15 17:31         ` Jeff King
  2018-06-15 18:23           ` Derrick Stolee
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-06-15 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, Luat Nguyen, git

On Fri, Jun 15, 2018 at 09:41:46AM -0700, Junio C Hamano wrote:

> Derrick Stolee <stolee@gmail.com> writes:
> 
> > On 6/14/2018 11:44 PM, Jeff King wrote:
> >> The return value of ewah_read_mmap() is now an ssize_t,
> >> since we could (in theory) process up to 32GB of data. This
> >> would never happen in practice, but a corrupt or malicious
> >> .bitmap or index file could convince us to do so.
> >
> > Aside: Why a 32GB limit? Usually I see 32-bit issues limiting by 2
> > (signed) or 4 GB (unsigned). Is there something specifically in the
> > bitmap format that stops at this size?
> 
> The proposed log message for 1/3 has this bit
> 
>   - check the size for integer overflow (this should be
>     impossible on 64-bit, as the size is given as a 32-bit
>     count of 8-byte words, but is possible on a 32-bit
>     system)
> 
> 4 Giga 8-byte words makes 32 Giga bytes, I'd guess.

Yes, exactly. It's definitely an odd size. I think using the same
varints we use in the packfile would be more efficient (and they're
already variable-length records), though we tend to have few enough of
these that it probably doesn't matter.

I think a more useful change for the bitmap format would be some kind of
index. IIRC, right now readers have to linearly scan the whole file in
order to use a single bitmap.

With Stolee's commit-metadata files, we could potentially move to
storing reachability bitmaps as a data element there. But there are two
complications:

 - the bitmaps themselves are dependent on having an ordered list of
   objects (one per bit) to talk about. And that's why they're
   integrated with packfiles, since that provides a constrained universe
   with a set ordering.

 - the existing storage isn't entirely independent between bitmaps. Some
   of them are basically "deltas" off of nearby bitmaps.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15 14:15       ` [PATCH 3/3] ewah: drop ewah_serialize_native function Derrick Stolee
@ 2018-06-15 17:51         ` Jeff King
  2018-06-15 18:33           ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-06-15 17:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Ramsay Jones, Luat Nguyen, git

On Fri, Jun 15, 2018 at 10:15:50AM -0400, Derrick Stolee wrote:

> > Hmm, if you are in the mood to drop ewah dead code, how about:
> > 
> >    ewah/bitmap.o   - bitmap_clear
> >    ewah/bitmap.o   - bitmap_each_bit
> >    ewah/ewah_bitmap.o      - ewah_and
> >    ewah/ewah_bitmap.o      - ewah_and_not
> >    ewah/ewah_bitmap.o      - ewah_not
> >    ewah/ewah_bitmap.o      - ewah_or
> > 
> > ... in addition to these *(de)serialize* functions. ;-)
> > 
> 
> Normally, I would say we should keep this folder as close to the "original"
> [1] as possible, so we could more easily take improvements to that library.
> However, it appears that code is not being updated. Perhaps this is in favor
> of other EWAH libraries [2] or other compressed bitmap formats [3].

The ewok library was written specifically for Git, as a port of Daniel
Lemire's original EWAH work, since at the time there was no pure-C
implementation of EWAH (that might even still be the case -- I don't
know).

We pulled it out into a separate library with the thought that maybe
other people would find it useful. But I don't specifically know of any
other users of the library, nor do I think Vicent has spent any time on
it since writing it for Git (I know I haven't).

So yeah, I think it is fine to just consider this "our" code and chop it
up as we see fit (and you can see we've already tweaked it for things
like allocating consistently with our other code).

One thing that gave me pause on ripping out more code is that I have
some bitmap-related patches on my send-to-upstream list, and I wasn't
sure if they used any of this code. But I checked against your patches,
and no, this can all go (which makes sense -- my patches are about using
.bitmap files in more places, so they build at a higher level).

So your patches look good to me, modulo the declarations that Ramsay
noticed should be removed, too.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15 17:31         ` Jeff King
@ 2018-06-15 18:23           ` Derrick Stolee
  2018-06-15 20:38             ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:23 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Luat Nguyen, git

On 6/15/2018 1:31 PM, Jeff King wrote:
> On Fri, Jun 15, 2018 at 09:41:46AM -0700, Junio C Hamano wrote:
>
>> Derrick Stolee <stolee@gmail.com> writes:
>>
>>> On 6/14/2018 11:44 PM, Jeff King wrote:
>>>> The return value of ewah_read_mmap() is now an ssize_t,
>>>> since we could (in theory) process up to 32GB of data. This
>>>> would never happen in practice, but a corrupt or malicious
>>>> .bitmap or index file could convince us to do so.
>>> Aside: Why a 32GB limit? Usually I see 32-bit issues limiting by 2
>>> (signed) or 4 GB (unsigned). Is there something specifically in the
>>> bitmap format that stops at this size?
>> The proposed log message for 1/3 has this bit
>>
>>    - check the size for integer overflow (this should be
>>      impossible on 64-bit, as the size is given as a 32-bit
>>      count of 8-byte words, but is possible on a 32-bit
>>      system)
>>
>> 4 Giga 8-byte words makes 32 Giga bytes, I'd guess.
> Yes, exactly. It's definitely an odd size. I think using the same
> varints we use in the packfile would be more efficient (and they're
> already variable-length records), though we tend to have few enough of
> these that it probably doesn't matter.
>
> I think a more useful change for the bitmap format would be some kind of
> index. IIRC, right now readers have to linearly scan the whole file in
> order to use a single bitmap.
>
> With Stolee's commit-metadata files, we could potentially move to
> storing reachability bitmaps as a data element there. But there are two
> complications:
>
>   - the bitmaps themselves are dependent on having an ordered list of
>     objects (one per bit) to talk about. And that's why they're
>     integrated with packfiles, since that provides a constrained universe
>     with a set ordering.
>
>   - the existing storage isn't entirely independent between bitmaps. Some
>     of them are basically "deltas" off of nearby bitmaps.

The VSTS reachability bitmap is different from the Git bitmap in a few ways.

1. It uses Roaring+Run bitmaps [1] instead of EWAH bitmap. This format 
is simpler (in my opinion) in several ways, especially in how it splits 
the bitmap into 16-bit address ranges and compresses each on its own. 
This makes set containment queries much faster, as we can navigate 
directly to the region that is important (and then binary search if that 
chunk is an "array" or "run" chunk). I say this is simpler because I can 
explain the entire compression format to you in five minutes and a 
whiteboard. The paper [2] is a simple enough read (start at the "Roaring 
Bitmap" section at the end of page 4).

2. Our file uses a chunk-based approach similar to the commit-graph 
file: one chunk is simply a list of the commits that have bitmaps. 
Another chunk is the raw binary data of all bitmaps concatenated 
together. The last chunk connects the other two chunks by a) providing 
an offset into the binary data for the bitmap at that commit, and b) the 
commit that provides a delta base for the bitmap.

If we are considering changing the reachability bitmap, then I'm very 
intrigued. I think the number one thing to consider is to use the 
multi-pack-index as a reference point (with a stable object order) so 
the objects can be repacked independently from the reachability bitmap 
computation. If we are changing the model at that level, then it is 
worth thinking about other questions, like how we index the file or how 
we compress the bitmaps.

Thanks,
-Stolee

[1] https://roaringbitmap.org/about/

[2] https://arxiv.org/abs/1603.06549
     Consistently faster and smaller compressed bitmaps with Roaring

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 0/7] Delete unused methods in EWAH bitmap
  2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
                             ` (8 preceding siblings ...)
  2018-06-15 14:35           ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
@ 2018-06-15 18:27           ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 1/7] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
                               ` (7 more replies)
  9 siblings, 8 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

The EWAH bitmap code includes several logical operations that are
important for a general-purpose bitmap library. However, Git only
uses the XOR operation for storing deltas between reachability
bitmaps. This means that we can delete the following unused methods:

* ewah_and()
* ewah_and_not()
* ewah_not()
* ewah_or()
* ewah_serialize()

We can also delete the unused methods bitmap_clear() and
bitmap_each_bit().

Derrick Stolee (7):
  ewah/bitmap.c: delete unused 'bitmap_clear()'
  ewah/bitmap.c: delete unused 'bitmap_each_bit()'
  ewah_bitmap: delete unused 'ewah_and()'
  ewah_bitmap: delete unused 'ewah_and_not()'
  ewah_bitmap: delete unused 'ewah_not()'
  ewah_bitmap: delete unused 'ewah_or()'
  ewah_io: delete unused 'ewah_serialize()'

 ewah/bitmap.c      |  32 -------
 ewah/ewah_bitmap.c | 229 ---------------------------------------------
 ewah/ewah_io.c     |  10 --
 ewah/ewok.h        |  25 -----
 4 files changed, 296 deletions(-)


base-commit: fc54c1af3ec09bab8b8ea09768c2da4069b7f53e
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 1/7] ewah/bitmap.c: delete unused 'bitmap_clear()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 2/7] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
                               ` (6 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/bitmap.c | 8 --------
 ewah/ewok.h   | 1 -
 2 files changed, 9 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 756bdd050e..d61dc6114a 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -45,14 +45,6 @@ void bitmap_set(struct bitmap *self, size_t pos)
 	self->words[block] |= EWAH_MASK(pos);
 }
 
-void bitmap_clear(struct bitmap *self, size_t pos)
-{
-	size_t block = EWAH_BLOCK(pos);
-
-	if (block < self->word_alloc)
-		self->words[block] &= ~EWAH_MASK(pos);
-}
-
 int bitmap_get(struct bitmap *self, size_t pos)
 {
 	size_t block = EWAH_BLOCK(pos);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index dc43d05b64..2d25e93b9e 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -204,7 +204,6 @@ struct bitmap {
 
 struct bitmap *bitmap_new(void);
 void bitmap_set(struct bitmap *self, size_t pos);
-void bitmap_clear(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_reset(struct bitmap *self);
 void bitmap_free(struct bitmap *self);
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 2/7] ewah/bitmap.c: delete unused 'bitmap_each_bit()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 1/7] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 3/7] ewah_bitmap: delete unused 'ewah_and()' Derrick Stolee
                               ` (5 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/bitmap.c | 24 ------------------------
 ewah/ewok.h   |  1 -
 2 files changed, 25 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index d61dc6114a..52f1178db4 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -129,30 +129,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 		self->words[i++] |= word;
 }
 
-void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data)
-{
-	size_t pos = 0, i;
-
-	for (i = 0; i < self->word_alloc; ++i) {
-		eword_t word = self->words[i];
-		uint32_t offset;
-
-		if (word == (eword_t)~0) {
-			for (offset = 0; offset < BITS_IN_EWORD; ++offset)
-				callback(pos++, data);
-		} else {
-			for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
-				if ((word >> offset) == 0)
-					break;
-
-				offset += ewah_bit_ctz64(word >> offset);
-				callback(pos + offset, data);
-			}
-			pos += BITS_IN_EWORD;
-		}
-	}
-}
-
 size_t bitmap_popcount(struct bitmap *self)
 {
 	size_t i, count = 0;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 2d25e93b9e..71704ed37f 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -217,7 +217,6 @@ void bitmap_and_not(struct bitmap *self, struct bitmap *other);
 void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 
-void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data);
 size_t bitmap_popcount(struct bitmap *self);
 
 #endif
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 3/7] ewah_bitmap: delete unused 'ewah_and()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 1/7] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 2/7] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 4/7] ewah_bitmap: delete unused 'ewah_and_not()' Derrick Stolee
                               ` (4 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 68 ----------------------------------------------
 ewah/ewok.h        |  5 ----
 2 files changed, 73 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index b9fdda1d3d..5a4d3a6eb6 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -459,74 +459,6 @@ void ewah_xor(
 	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
 }
 
-void ewah_and(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out)
-{
-	struct rlw_iterator rlw_i;
-	struct rlw_iterator rlw_j;
-	size_t literals;
-
-	rlwit_init(&rlw_i, ewah_i);
-	rlwit_init(&rlw_j, ewah_j);
-
-	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
-		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
-			struct rlw_iterator *prey, *predator;
-
-			if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
-				prey = &rlw_i;
-				predator = &rlw_j;
-			} else {
-				prey = &rlw_j;
-				predator = &rlw_i;
-			}
-
-			if (predator->rlw.running_bit == 0) {
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(prey,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			} else {
-				size_t index = rlwit_discharge(prey, out,
-					predator->rlw.running_len, 0);
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len - index);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			}
-		}
-
-		literals = min_size(
-			rlw_i.rlw.literal_words,
-			rlw_j.rlw.literal_words);
-
-		if (literals) {
-			size_t k;
-
-			for (k = 0; k < literals; ++k) {
-				ewah_add(out,
-					rlw_i.buffer[rlw_i.literal_word_start + k] &
-					rlw_j.buffer[rlw_j.literal_word_start + k]
-				);
-			}
-
-			rlwit_discard_first_words(&rlw_i, literals);
-			rlwit_discard_first_words(&rlw_j, literals);
-		}
-	}
-
-	if (rlwit_word_size(&rlw_i) > 0)
-		rlwit_discharge_empty(&rlw_i, out);
-	else
-		rlwit_discharge_empty(&rlw_j, out);
-
-	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
-}
-
 void ewah_and_not(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 71704ed37f..f4ada6979e 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,11 +179,6 @@ void ewah_xor(
 	struct ewah_bitmap *ewah_j,
 	struct ewah_bitmap *out);
 
-void ewah_and(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out);
-
 /**
  * Direct word access
  */
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 4/7] ewah_bitmap: delete unused 'ewah_and_not()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
                               ` (2 preceding siblings ...)
  2018-06-15 18:27             ` [PATCH v2 3/7] ewah_bitmap: delete unused 'ewah_and()' Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 5/7] ewah_bitmap: delete unused 'ewah_not()' Derrick Stolee
                               ` (3 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 73 ----------------------------------------------
 ewah/ewok.h        |  5 ----
 2 files changed, 78 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 5a4d3a6eb6..559adde37c 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -459,79 +459,6 @@ void ewah_xor(
 	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
 }
 
-void ewah_and_not(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out)
-{
-	struct rlw_iterator rlw_i;
-	struct rlw_iterator rlw_j;
-	size_t literals;
-
-	rlwit_init(&rlw_i, ewah_i);
-	rlwit_init(&rlw_j, ewah_j);
-
-	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
-		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
-			struct rlw_iterator *prey, *predator;
-
-			if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
-				prey = &rlw_i;
-				predator = &rlw_j;
-			} else {
-				prey = &rlw_j;
-				predator = &rlw_i;
-			}
-
-			if ((predator->rlw.running_bit && prey == &rlw_i) ||
-				(!predator->rlw.running_bit && prey != &rlw_i)) {
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(prey,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			} else {
-				size_t index;
-				int negate_words;
-
-				negate_words = (&rlw_i != prey);
-				index = rlwit_discharge(prey, out,
-					predator->rlw.running_len, negate_words);
-				ewah_add_empty_words(out, negate_words,
-					predator->rlw.running_len - index);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			}
-		}
-
-		literals = min_size(
-			rlw_i.rlw.literal_words,
-			rlw_j.rlw.literal_words);
-
-		if (literals) {
-			size_t k;
-
-			for (k = 0; k < literals; ++k) {
-				ewah_add(out,
-					rlw_i.buffer[rlw_i.literal_word_start + k] &
-					~(rlw_j.buffer[rlw_j.literal_word_start + k])
-				);
-			}
-
-			rlwit_discard_first_words(&rlw_i, literals);
-			rlwit_discard_first_words(&rlw_j, literals);
-		}
-	}
-
-	if (rlwit_word_size(&rlw_i) > 0)
-		rlwit_discharge(&rlw_i, out, ~0, 0);
-	else
-		rlwit_discharge_empty(&rlw_j, out);
-
-	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
-}
-
 void ewah_or(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index f4ada6979e..8dae8a608c 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -169,11 +169,6 @@ void ewah_or(
 	struct ewah_bitmap *ewah_j,
 	struct ewah_bitmap *out);
 
-void ewah_and_not(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out);
-
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 5/7] ewah_bitmap: delete unused 'ewah_not()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
                               ` (3 preceding siblings ...)
  2018-06-15 18:27             ` [PATCH v2 4/7] ewah_bitmap: delete unused 'ewah_and_not()' Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 6/7] ewah_bitmap: delete unused 'ewah_or()' Derrick Stolee
                               ` (2 subsequent siblings)
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 19 -------------------
 ewah/ewok.h        |  7 -------
 2 files changed, 26 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 559adde37c..3fa3edf2a3 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -376,25 +376,6 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
 		read_new_rlw(it);
 }
 
-void ewah_not(struct ewah_bitmap *self)
-{
-	size_t pointer = 0;
-
-	while (pointer < self->buffer_size) {
-		eword_t *word = &self->buffer[pointer];
-		size_t literals, k;
-
-		rlw_xor_run_bit(word);
-		++pointer;
-
-		literals = rlw_get_literal_words(word);
-		for (k = 0; k < literals; ++k) {
-			self->buffer[pointer] = ~self->buffer[pointer];
-			++pointer;
-		}
-	}
-}
-
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 8dae8a608c..a467645eec 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -95,13 +95,6 @@ int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
 
 uint32_t ewah_checksum(struct ewah_bitmap *self);
 
-/**
- * Logical not (bitwise negation) in-place on the bitmap
- *
- * This operation is linear time based on the size of the bitmap.
- */
-void ewah_not(struct ewah_bitmap *self);
-
 /**
  * Call the given callback with the position of every single bit
  * that has been set on the bitmap.
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 6/7] ewah_bitmap: delete unused 'ewah_or()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
                               ` (4 preceding siblings ...)
  2018-06-15 18:27             ` [PATCH v2 5/7] ewah_bitmap: delete unused 'ewah_not()' Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:27             ` [PATCH v2 7/7] ewah_io: delete unused 'ewah_serialize()' Derrick Stolee
  2018-06-15 18:51             ` [PATCH v2 0/7] Delete unused methods in EWAH bitmap Junio C Hamano
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_bitmap.c | 69 ----------------------------------------------
 ewah/ewok.h        |  5 ----
 2 files changed, 74 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 3fa3edf2a3..017c677f98 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -440,75 +440,6 @@ void ewah_xor(
 	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
 }
 
-void ewah_or(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out)
-{
-	struct rlw_iterator rlw_i;
-	struct rlw_iterator rlw_j;
-	size_t literals;
-
-	rlwit_init(&rlw_i, ewah_i);
-	rlwit_init(&rlw_j, ewah_j);
-
-	while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) {
-		while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) {
-			struct rlw_iterator *prey, *predator;
-
-			if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) {
-				prey = &rlw_i;
-				predator = &rlw_j;
-			} else {
-				prey = &rlw_j;
-				predator = &rlw_i;
-			}
-
-			if (predator->rlw.running_bit) {
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(prey,
-					predator->rlw.running_len);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			} else {
-				size_t index = rlwit_discharge(prey, out,
-					predator->rlw.running_len, 0);
-				ewah_add_empty_words(out, 0,
-					predator->rlw.running_len - index);
-				rlwit_discard_first_words(predator,
-					predator->rlw.running_len);
-			}
-		}
-
-		literals = min_size(
-			rlw_i.rlw.literal_words,
-			rlw_j.rlw.literal_words);
-
-		if (literals) {
-			size_t k;
-
-			for (k = 0; k < literals; ++k) {
-				ewah_add(out,
-					rlw_i.buffer[rlw_i.literal_word_start + k] |
-					rlw_j.buffer[rlw_j.literal_word_start + k]
-				);
-			}
-
-			rlwit_discard_first_words(&rlw_i, literals);
-			rlwit_discard_first_words(&rlw_j, literals);
-		}
-	}
-
-	if (rlwit_word_size(&rlw_i) > 0)
-		rlwit_discharge(&rlw_i, out, ~0, 0);
-	else
-		rlwit_discharge(&rlw_j, out, ~0, 0);
-
-	out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size);
-}
-
-
 #define BITMAP_POOL_MAX 16
 static struct ewah_bitmap *bitmap_pool[BITMAP_POOL_MAX];
 static size_t bitmap_pool_size;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index a467645eec..a540df2aa9 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -157,11 +157,6 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
  */
 int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
 
-void ewah_or(
-	struct ewah_bitmap *ewah_i,
-	struct ewah_bitmap *ewah_j,
-	struct ewah_bitmap *out);
-
 void ewah_xor(
 	struct ewah_bitmap *ewah_i,
 	struct ewah_bitmap *ewah_j,
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 7/7] ewah_io: delete unused 'ewah_serialize()'
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
                               ` (5 preceding siblings ...)
  2018-06-15 18:27             ` [PATCH v2 6/7] ewah_bitmap: delete unused 'ewah_or()' Derrick Stolee
@ 2018-06-15 18:27             ` Derrick Stolee
  2018-06-15 18:51             ` [PATCH v2 0/7] Delete unused methods in EWAH bitmap Junio C Hamano
  7 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, gitster@pobox.com, ramsay@ramsayjones.plus.com,
	Derrick Stolee

Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 ewah/ewah_io.c | 10 ----------
 ewah/ewok.h    |  1 -
 2 files changed, 11 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index bed1994551..86d3f1d02e 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -100,16 +100,6 @@ int ewah_serialize_to(struct ewah_bitmap *self,
 	return (3 * 4) + (self->buffer_size * 8);
 }
 
-static int write_helper(void *fd, const void *buf, size_t len)
-{
-	return write((intptr_t)fd, buf, len);
-}
-
-int ewah_serialize(struct ewah_bitmap *self, int fd)
-{
-	return ewah_serialize_to(self, write_helper, (void *)(intptr_t)fd);
-}
-
 static int write_strbuf(void *user_data, const void *data, size_t len)
 {
 	struct strbuf *sb = user_data;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index a540df2aa9..f3ac78ef1a 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -86,7 +86,6 @@ void ewah_free(struct ewah_bitmap *self);
 int ewah_serialize_to(struct ewah_bitmap *self,
 		      int (*write_fun)(void *out, const void *buf, size_t len),
 		      void *out);
-int ewah_serialize(struct ewah_bitmap *self, int fd);
 int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 int ewah_serialize_strbuf(struct ewah_bitmap *self, struct strbuf *);
 
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15 17:51         ` Jeff King
@ 2018-06-15 18:33           ` Junio C Hamano
  2018-06-15 18:46             ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 18:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Ramsay Jones, Luat Nguyen, git

Jeff King <peff@peff.net> writes:

> One thing that gave me pause on ripping out more code is that I have
> some bitmap-related patches on my send-to-upstream list, and I wasn't
> sure if they used any of this code. But I checked against your patches,
> and no, this can all go (which makes sense -- my patches are about using
> .bitmap files in more places, so they build at a higher level).
>
> So your patches look good to me, modulo the declarations that Ramsay
> noticed should be removed, too.

I'll queue your 1/3 and 4/3 (without 2&3/3) for now and let Derrick
and you handle the removal of unused stuff separately on top, so
that the fix-proper can graduate earlier than the rest.

Thanks.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/3] ewah: drop ewah_serialize_native function
  2018-06-15 18:33           ` Junio C Hamano
@ 2018-06-15 18:46             ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15 18:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, Ramsay Jones, Luat Nguyen, git

On Fri, Jun 15, 2018 at 11:33:14AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > One thing that gave me pause on ripping out more code is that I have
> > some bitmap-related patches on my send-to-upstream list, and I wasn't
> > sure if they used any of this code. But I checked against your patches,
> > and no, this can all go (which makes sense -- my patches are about using
> > .bitmap files in more places, so they build at a higher level).
> >
> > So your patches look good to me, modulo the declarations that Ramsay
> > noticed should be removed, too.
> 
> I'll queue your 1/3 and 4/3 (without 2&3/3) for now and let Derrick
> and you handle the removal of unused stuff separately on top, so
> that the fix-proper can graduate earlier than the rest.

Thanks, that sounds good. 2/3 is sort-of-related in that it has an
integer overflow bug like the one in 1/3. But the fact that it is not
used at all makes it very low priority. ;)

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 0/7] Delete unused methods in EWAH bitmap
  2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
                               ` (6 preceding siblings ...)
  2018-06-15 18:27             ` [PATCH v2 7/7] ewah_io: delete unused 'ewah_serialize()' Derrick Stolee
@ 2018-06-15 18:51             ` Junio C Hamano
  2018-06-15 18:56               ` Derrick Stolee
  7 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 18:51 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, ramsay@ramsayjones.plus.com

Derrick Stolee <dstolee@microsoft.com> writes:

> The EWAH bitmap code includes several logical operations that are
> important for a general-purpose bitmap library. However, Git only
> uses the XOR operation for storing deltas between reachability
> bitmaps. This means that we can delete the following unused methods:
>
> * ewah_and()
> * ewah_and_not()
> * ewah_not()
> * ewah_or()
> * ewah_serialize()
>
> We can also delete the unused methods bitmap_clear() and
> bitmap_each_bit().
>
> Derrick Stolee (7):
>   ewah/bitmap.c: delete unused 'bitmap_clear()'
>   ewah/bitmap.c: delete unused 'bitmap_each_bit()'
>   ewah_bitmap: delete unused 'ewah_and()'
>   ewah_bitmap: delete unused 'ewah_and_not()'
>   ewah_bitmap: delete unused 'ewah_not()'
>   ewah_bitmap: delete unused 'ewah_or()'
>   ewah_io: delete unused 'ewah_serialize()'
>
>  ewah/bitmap.c      |  32 -------
>  ewah/ewah_bitmap.c | 229 ---------------------------------------------
>  ewah/ewah_io.c     |  10 --
>  ewah/ewok.h        |  25 -----
>  4 files changed, 296 deletions(-)
>
>
> base-commit: fc54c1af3ec09bab8b8ea09768c2da4069b7f53e

Thanks.  

ewah_clear() can become file-scope static, and
rlwit_discharge_empty() can be eliminated.  I do not know if either
is worth doing, though.

 ewah/ewah_bitmap.c | 20 ++++++++++++--------
 ewah/ewah_rlw.c    |  8 --------
 ewah/ewok.h        |  6 ------
 ewah/ewok_rlw.h    |  1 -
 4 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 017c677f98..d59b1afe3d 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -276,6 +276,18 @@ void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
 	}
 }
 
+/**
+ * Clear all the bits in the bitmap. Does not free or resize
+ * memory.
+ */
+static void ewah_clear(struct ewah_bitmap *self)
+{
+	self->buffer_size = 1;
+	self->buffer[0] = 0;
+	self->bit_size = 0;
+	self->rlw = self->buffer;
+}
+
 struct ewah_bitmap *ewah_new(void)
 {
 	struct ewah_bitmap *self;
@@ -288,14 +300,6 @@ struct ewah_bitmap *ewah_new(void)
 	return self;
 }
 
-void ewah_clear(struct ewah_bitmap *self)
-{
-	self->buffer_size = 1;
-	self->buffer[0] = 0;
-	self->bit_size = 0;
-	self->rlw = self->buffer;
-}
-
 void ewah_free(struct ewah_bitmap *self)
 {
 	if (!self)
diff --git a/ewah/ewah_rlw.c b/ewah/ewah_rlw.c
index b9643b7d0f..5093d43e2f 100644
--- a/ewah/ewah_rlw.c
+++ b/ewah/ewah_rlw.c
@@ -104,11 +104,3 @@ size_t rlwit_discharge(
 
 	return index;
 }
-
-void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out)
-{
-	while (rlwit_word_size(it) > 0) {
-		ewah_add_empty_words(out, 0, rlwit_word_size(it));
-		rlwit_discard_first_words(it, rlwit_word_size(it));
-	}
-}
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 0c504f28e2..84b2a29faa 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -72,12 +72,6 @@ void ewah_pool_free(struct ewah_bitmap *self);
  */
 struct ewah_bitmap *ewah_new(void);
 
-/**
- * Clear all the bits in the bitmap. Does not free or resize
- * memory.
- */
-void ewah_clear(struct ewah_bitmap *self);
-
 /**
  * Free all the memory of the bitmap
  */
diff --git a/ewah/ewok_rlw.h b/ewah/ewok_rlw.h
index bb3c6ff7e0..7cdfdd0c02 100644
--- a/ewah/ewok_rlw.h
+++ b/ewah/ewok_rlw.h
@@ -98,7 +98,6 @@ void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *bitmap);
 void rlwit_discard_first_words(struct rlw_iterator *it, size_t x);
 size_t rlwit_discharge(
 	struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate);
-void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out);
 
 static inline size_t rlwit_word_size(struct rlw_iterator *it)
 {



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 0/7] Delete unused methods in EWAH bitmap
  2018-06-15 18:51             ` [PATCH v2 0/7] Delete unused methods in EWAH bitmap Junio C Hamano
@ 2018-06-15 18:56               ` Derrick Stolee
  2018-06-15 19:48                 ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-06-15 18:56 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, ramsay@ramsayjones.plus.com

On 6/15/2018 2:51 PM, Junio C Hamano wrote:
> Derrick Stolee <dstolee@microsoft.com> writes:
>
>> The EWAH bitmap code includes several logical operations that are
>> important for a general-purpose bitmap library. However, Git only
>> uses the XOR operation for storing deltas between reachability
>> bitmaps. This means that we can delete the following unused methods:
>>
>> * ewah_and()
>> * ewah_and_not()
>> * ewah_not()
>> * ewah_or()
>> * ewah_serialize()
>>
>> We can also delete the unused methods bitmap_clear() and
>> bitmap_each_bit().
>>
>> Derrick Stolee (7):
>>    ewah/bitmap.c: delete unused 'bitmap_clear()'
>>    ewah/bitmap.c: delete unused 'bitmap_each_bit()'
>>    ewah_bitmap: delete unused 'ewah_and()'
>>    ewah_bitmap: delete unused 'ewah_and_not()'
>>    ewah_bitmap: delete unused 'ewah_not()'
>>    ewah_bitmap: delete unused 'ewah_or()'
>>    ewah_io: delete unused 'ewah_serialize()'
>>
>>   ewah/bitmap.c      |  32 -------
>>   ewah/ewah_bitmap.c | 229 ---------------------------------------------
>>   ewah/ewah_io.c     |  10 --
>>   ewah/ewok.h        |  25 -----
>>   4 files changed, 296 deletions(-)
>>
>>
>> base-commit: fc54c1af3ec09bab8b8ea09768c2da4069b7f53e
> Thanks.
>
> ewah_clear() can become file-scope static, and
> rlwit_discharge_empty() can be eliminated.  I do not know if either
> is worth doing, though.

With Peff's patches, this is true. When I applied your diff to my patch 
alone we could not do that.

If you want to create a commit with the below, I'm happy to add my 
"Reviewed-by: Derrick Stolee <dstolee@microsoft.com>"

Good team effort today!

>
>   ewah/ewah_bitmap.c | 20 ++++++++++++--------
>   ewah/ewah_rlw.c    |  8 --------
>   ewah/ewok.h        |  6 ------
>   ewah/ewok_rlw.h    |  1 -
>   4 files changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
> index 017c677f98..d59b1afe3d 100644
> --- a/ewah/ewah_bitmap.c
> +++ b/ewah/ewah_bitmap.c
> @@ -276,6 +276,18 @@ void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
>   	}
>   }
>   
> +/**
> + * Clear all the bits in the bitmap. Does not free or resize
> + * memory.
> + */
> +static void ewah_clear(struct ewah_bitmap *self)
> +{
> +	self->buffer_size = 1;
> +	self->buffer[0] = 0;
> +	self->bit_size = 0;
> +	self->rlw = self->buffer;
> +}
> +
>   struct ewah_bitmap *ewah_new(void)
>   {
>   	struct ewah_bitmap *self;
> @@ -288,14 +300,6 @@ struct ewah_bitmap *ewah_new(void)
>   	return self;
>   }
>   
> -void ewah_clear(struct ewah_bitmap *self)
> -{
> -	self->buffer_size = 1;
> -	self->buffer[0] = 0;
> -	self->bit_size = 0;
> -	self->rlw = self->buffer;
> -}
> -
>   void ewah_free(struct ewah_bitmap *self)
>   {
>   	if (!self)
> diff --git a/ewah/ewah_rlw.c b/ewah/ewah_rlw.c
> index b9643b7d0f..5093d43e2f 100644
> --- a/ewah/ewah_rlw.c
> +++ b/ewah/ewah_rlw.c
> @@ -104,11 +104,3 @@ size_t rlwit_discharge(
>   
>   	return index;
>   }
> -
> -void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out)
> -{
> -	while (rlwit_word_size(it) > 0) {
> -		ewah_add_empty_words(out, 0, rlwit_word_size(it));
> -		rlwit_discard_first_words(it, rlwit_word_size(it));
> -	}
> -}
> diff --git a/ewah/ewok.h b/ewah/ewok.h
> index 0c504f28e2..84b2a29faa 100644
> --- a/ewah/ewok.h
> +++ b/ewah/ewok.h
> @@ -72,12 +72,6 @@ void ewah_pool_free(struct ewah_bitmap *self);
>    */
>   struct ewah_bitmap *ewah_new(void);
>   
> -/**
> - * Clear all the bits in the bitmap. Does not free or resize
> - * memory.
> - */
> -void ewah_clear(struct ewah_bitmap *self);
> -
>   /**
>    * Free all the memory of the bitmap
>    */
> diff --git a/ewah/ewok_rlw.h b/ewah/ewok_rlw.h
> index bb3c6ff7e0..7cdfdd0c02 100644
> --- a/ewah/ewok_rlw.h
> +++ b/ewah/ewok_rlw.h
> @@ -98,7 +98,6 @@ void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *bitmap);
>   void rlwit_discard_first_words(struct rlw_iterator *it, size_t x);
>   size_t rlwit_discharge(
>   	struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate);
> -void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out);
>   
>   static inline size_t rlwit_word_size(struct rlw_iterator *it)
>   {
>
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15 17:21           ` Jeff King
@ 2018-06-15 19:42             ` Junio C Hamano
  0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 19:42 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Luat Nguyen, Git mailing list

Jeff King <peff@peff.net> writes:

> On Fri, Jun 15, 2018 at 07:10:28PM +0200, SZEDER Gábor wrote:
>
>> > I said "yuck" because the original does not work if there happen to
>> > be more than (or for that matter, less than) one '.bitmap' file
>> > there.  But at least as long as there is one, it should work ;-)
>> 
>> Well, the test starts with 'git repack -ad', so there can be only one
>> bitmap file.  (Unless something is broken, of course, but the second
>> test would catch that much earlier.)
>
> Right. I almost put "head -1" in there, but we know that we just created
> a single bitmap. And my thought was that if we don't, the test would
> blow up horribly, which is exactly what you want.

Yes, and "file=$(ls gl*b) &&" breaks if there is no matching file
(iow, I said 'for that matter, less than' which was incorrect), so
the original is OK in practice.

Thanks.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15 17:26       ` Jeff King
@ 2018-06-15 19:44         ` Junio C Hamano
  0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 19:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Luat Nguyen, git

Jeff King <peff@peff.net> writes:

> On Fri, Jun 15, 2018 at 10:05:42AM -0700, Junio C Hamano wrote:
>
>> > -	memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));
>> > -	ptr += self->buffer_size * sizeof(eword_t);
>> 
>> 
>> > +	data_len = st_mult(self->buffer_size, sizeof(eword_t));
>> 
>> This is a faithful conversion from the original, but I somehow would
>> have appreciated if the latter were not sizeof(eword_t) but rather
>> sizeof(self->buffer_size[0]), especially as I wondered ...
>
> I actually thought about going the _other_ way. The sizeof(eword_t) is
> not something we can change, but is actually decided by the on-disk
> format.  So I wondered if this should be much more clearly "hey, this is
> 8 bytes". Possibly with an assert(sizeof(*self->buffer_size) == 8).

Hmph, that's a thought.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 0/7] Delete unused methods in EWAH bitmap
  2018-06-15 18:56               ` Derrick Stolee
@ 2018-06-15 19:48                 ` Junio C Hamano
  2018-06-15 20:35                   ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-06-15 19:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee, git@vger.kernel.org, peff@peff.net,
	ramsay@ramsayjones.plus.com

Derrick Stolee <stolee@gmail.com> writes:

>> ewah_clear() can become file-scope static, and
>> rlwit_discharge_empty() can be eliminated.  I do not know if either
>> is worth doing, though.
>
> With Peff's patches, this is true. When I applied your diff to my
> patch alone we could not do that.

Yeah, as I organized the patches on two topics in this order:

  jk/ewah-bounds-check (build on 'maint')
    Peff's 1/3
    Peff's 4/3

  ds/ewah-cleanup (build on top of the above)
    7 patches in this series
    Peff's 2/3
    Peff's 3/3

at the end, these two changes become possible.  Again, I am not sure
if these are worth doing, so I'll leave it out of these two series,
at least for now.

This is a tangent, but I too felt that roaring paper quite a
pleasant read ;-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 0/7] Delete unused methods in EWAH bitmap
  2018-06-15 19:48                 ` Junio C Hamano
@ 2018-06-15 20:35                   ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15 20:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Derrick Stolee, git@vger.kernel.org,
	ramsay@ramsayjones.plus.com

On Fri, Jun 15, 2018 at 12:48:52PM -0700, Junio C Hamano wrote:

> Derrick Stolee <stolee@gmail.com> writes:
> 
> >> ewah_clear() can become file-scope static, and
> >> rlwit_discharge_empty() can be eliminated.  I do not know if either
> >> is worth doing, though.
> >
> > With Peff's patches, this is true. When I applied your diff to my
> > patch alone we could not do that.
> 
> Yeah, as I organized the patches on two topics in this order:
> 
>   jk/ewah-bounds-check (build on 'maint')
>     Peff's 1/3
>     Peff's 4/3
> 
>   ds/ewah-cleanup (build on top of the above)
>     7 patches in this series
>     Peff's 2/3
>     Peff's 3/3

Thanks, I wasn't sure if I should resend mine on top, but it looks like
you've got it all organized already.

> at the end, these two changes become possible.  Again, I am not sure
> if these are worth doing, so I'll leave it out of these two series,
> at least for now.

IMHO the extra cleanups you showed are worth doing.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/3] ewah: adjust callers of ewah_read_mmap()
  2018-06-15 18:23           ` Derrick Stolee
@ 2018-06-15 20:38             ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-06-15 20:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Luat Nguyen, git

On Fri, Jun 15, 2018 at 02:23:44PM -0400, Derrick Stolee wrote:

> If we are considering changing the reachability bitmap, then I'm very
> intrigued. I think the number one thing to consider is to use the
> multi-pack-index as a reference point (with a stable object order) so the
> objects can be repacked independently from the reachability bitmap
> computation. If we are changing the model at that level, then it is worth
> thinking about other questions, like how we index the file or how we
> compress the bitmaps.

I'm open to a new format if it provides significant improvements over
the existing one. I think the existing bitmaps have served us well for
several years, but they do have a few weaknesses. Some of which I
mentioned before, but the most obvious one is that being very
pack-oriented they require repacking to update (and don't handle
cross-pack reachability at all). I know that doesn't fly for
Windows-sized repos at all, but it would also be nice if we could do
incremental updates more cheaply (e.g., after every push instead of just
once a day).

The Roaring stuff looks really interesting. I'm curious about the stable
object order you guys use. Because EWAH is basically run-length-encoding,
it benefits hugely from having the bitmaps in pack order (where there's
a enormous locality with respect to reachability) as opposed to sha1
order (where it's essentially random).

Is your stable object order based on traversing the commit graph? Or
does Roaring do a sufficiently better job of compressing the jumbled
sha1 order?

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
  2018-06-15  9:14     ` SZEDER Gábor
  2018-06-15 17:05     ` Junio C Hamano
@ 2018-06-16 14:35     ` SZEDER Gábor
  2018-06-16 19:14       ` Jeff King
  2 siblings, 1 reply; 58+ messages in thread
From: SZEDER Gábor @ 2018-06-16 14:35 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Luat Nguyen, git

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 423c0a475f..237ee6e5fc 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -331,4 +331,17 @@ test_expect_success 'pack reuse respects --incremental' '
>  	git show-index <empty.idx >actual &&
>  	test_cmp expect actual
>  '
> +
> +test_expect_success 'truncated bitmap fails gracefully' '
> +	git repack -ad &&
> +	git rev-list --use-bitmap-index --count --all >expect &&
> +	bitmap=$(ls .git/objects/pack/*.bitmap) &&
> +	test_when_finished "rm -f $bitmap" &&
> +	head -c 512 <$bitmap >$bitmap.tmp &&
> +	mv $bitmap.tmp $bitmap &&

This line turns out to be problematic on OSX and ultimately causes the
test to fail.

When OSX's 'mv's destination is read-only, it asks whether to replace
the destination even though in the test its stdin is not a terminal
(and thus doesn't conform to POSIX[1]).  Since the '.bitmap' file is
read-only, and since 'mv' obviously doesn't get an affirmative
response from /dev/null, the original '.bitmap' file is not
overwritten, the subsequent 'git rev-list' doesn't print any error
message, and finally 'test_i18ngrep' causes the test to fail.

The relevant part of the '-x' test output on Travis CI:

  ++mv .git/objects/pack/pack-8886db3fce4f9657c1a43fee7d3ea4f2a4b5be2d.bitmap.tmp .git/objects/pack/pack-8886db3fce4f9657c1a43fee7d3ea4f2a4b5be2d.bitmap
  override r--r--r--  travis/staff for .git/objects/pack/pack-8886db3fce4f9657c1a43fee7d3ea4f2a4b5be2d.bitmap? (y/n [n]) not overwritten
  ++git rev-list --use-bitmap-index --count --all
  ++test_cmp expect actual
  ++diff -u expect actual
  ++test_i18ngrep corrupt stderr
  ++eval 'last_arg=${2}'
  +++last_arg=stderr
  ++test -f stderr
  ++test 2 -lt 2
  ++test 'x!' = xcorrupt
  ++test -n ''
  ++test 'x!' = xcorrupt
  ++grep corrupt stderr
  ++echo 'error: '\''grep corrupt' 'stderr'\'' didn'\''t find a match in:'
  error: 'grep corrupt stderr' didn't find a match in:
  ++test -s stderr
  ++echo '<File '\''stderr'\'' is empty>'
  <File 'stderr' is empty>
  ++return 1
  error: last command exited with $?=1
  not ok 43 - truncated bitmap fails gracefully

As far as I can tell, 'mv -f' appears to make the test work on OSX as
well.

I've run a build job with an additional 'grep ^override
t/test-results/*.out' command following the tests to see whether there
are any other cases where OSX 'mv' doesn't overwrite a read-only file
without causing the tests to fail, but found nothing.  (But note that
the OSX build jobs don't run all tests.)

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/mv.html

> +	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> +	test_cmp expect actual &&
> +	test_i18ngrep corrupt stderr
> +'
> +

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads
  2018-06-16 14:35     ` SZEDER Gábor
@ 2018-06-16 19:14       ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-06-16 19:14 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, Luat Nguyen, git

On Sat, Jun 16, 2018 at 04:35:13PM +0200, SZEDER Gábor wrote:

> > +	head -c 512 <$bitmap >$bitmap.tmp &&
> > +	mv $bitmap.tmp $bitmap &&
> 
> This line turns out to be problematic on OSX and ultimately causes the
> test to fail.
> 
> When OSX's 'mv's destination is read-only, it asks whether to replace
> the destination even though in the test its stdin is not a terminal
> (and thus doesn't conform to POSIX[1]).  Since the '.bitmap' file is
> read-only, and since 'mv' obviously doesn't get an affirmative
> response from /dev/null, the original '.bitmap' file is not
> overwritten, the subsequent 'git rev-list' doesn't print any error
> message, and finally 'test_i18ngrep' causes the test to fail.

Right, sorry, I should have remembered that we've run into this before.
Using "mv -f" is the standard solution. E.g., c20d4d702f (t1450: use "mv
-f" within loose object directory, 2017-01-24).

Junio, can you squash this in to jk/ewah-bounds-check~1?

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index c4ed88030c..b11bc392a8 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -338,7 +338,7 @@ test_expect_success 'truncated bitmap fails gracefully' '
 	bitmap=$(ls .git/objects/pack/*.bitmap) &&
 	test_when_finished "rm -f $bitmap" &&
 	head -c 512 <$bitmap >$bitmap.tmp &&
-	mv $bitmap.tmp $bitmap &&
+	mv -f $bitmap.tmp $bitmap &&
 	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
 	test_cmp expect actual &&
 	test_i18ngrep corrupt stderr

-Peff

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* RE: security: potential out-of-bound read at ewah_io.c |ewah_read_mmap|
  2018-06-14 22:59 security: potential out-of-bound read at ewah_io.c |ewah_read_mmap| Luat Nguyen
  2018-06-15  3:28 ` Jeff King
@ 2018-06-19 19:00 ` Dyer, Edwin
  2018-06-19 19:56   ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Dyer, Edwin @ 2018-06-19 19:00 UTC (permalink / raw)
  To: git@vger.kernel.org

Greetings, all:

Just curious if there was any additional comment on this potential OOB? I may have missed it and if so, apologies for the ask.

Cheers,

Ed


-----Original Message-----
From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On Behalf Of Luat Nguyen
Sent: Thursday, June 14, 2018 7:00 PM
To: git@vger.kernel.org
Subject: security: potential out-of-bound read at ewah_io.c |ewah_read_mmap|

Hi folks,

Recently, I’ve found a security issue related to out-of-bound read at function named `ewah_read_mmap`

Assume that, an attacker can put malicious `./git/index` into a repo by somehow.

Since there is lack of check whether the remaining size of `ptr`is equal to `buffer_size` or not.

So the code reads exceed the buffer of `ptr` and reach to higher page. In this case, it is `/lib/x86_64-linux-gnu/ld-2.23.so`.

Leads to infoleak. You can find more details and asan crash below.



# xxd .git/index
00000000: 4449 5243 0000 0002 0000 0000 4653 4d4e  DIRC........FSMN
00000010: 0000 0024 0000 0001 1538 2489 c8fc 3616  ...$.....8$...6.
00000020: 0000 0014 0000 0000 0000 2000 4141       .......... .AA
                                    ^ evil size here = 0x2000


***** SNIP CODE *****

int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len) { … 
	self->buffer_size = self->alloc_size = get_be32(ptr);
	ptr += sizeof(uint32_t);
… 
	memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));


[memory map]

    0x7f990eca3000     0x7f990eca4000 r--p     1000 0      /media/sf_Fuzz/vuln_repo/.git/index <— where `ptr` is placed
    0x7f990eca4000     0x7f990eca5000 r--p     1000 25000  /lib/x86_64-linux-gnu/ld-2.23.so <— memcpy will reach here
    0x7f990eca5000     0x7f990eca6000 rw-p     1000 26000  /lib/x86_64-linux-gnu/ld-2.23.so <— and here 


[ ASAN log ]

root@guest:/media/sf_SHARE/vuln_repo# /media/sf_SHARE/git-master-asan/git status =================================================================
==4324==ERROR: AddressSanitizer: unknown-crash on address 0x7f6f235b0000 at pc 0x0000004bba79 bp 0x7ffc75e68850 sp 0x7ffc75e68000 READ of size 65536 at 0x7f6f235b0000 thread T0
    #0 0x4bba78 in __asan_memcpy /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23:3
    #1 0x8c910e in ewah_read_mmap /media/sf_SHARE/git-master-asan/ewah/ewah_io.c:144:2
    #2 0x8e2534 in read_fsmonitor_extension /media/sf_SHARE/git-master-asan/fsmonitor.c:46:8
    #3 0xa05862 in read_index_extension /media/sf_SHARE/git-master-asan/read-cache.c:1615:3
    #4 0xa046f3 in do_read_index /media/sf_SHARE/git-master-asan/read-cache.c:1872:7
    #5 0xa03325 in read_index_from /media/sf_SHARE/git-master-asan/read-cache.c:1913:8
    #6 0xa03231 in read_index /media/sf_SHARE/git-master-asan/read-cache.c:1634:9
    #7 0x9de5e8 in read_index_preload /media/sf_SHARE/git-master-asan/preload-index.c:119:15
    #8 0x566cc6 in cmd_status /media/sf_SHARE/git-master-asan/builtin/commit.c:1358:2
    #9 0x4ede8c in run_builtin /media/sf_SHARE/git-master-asan/git.c:417:11
    #10 0x4ea939 in handle_builtin /media/sf_SHARE/git-master-asan/git.c:632:8
    #11 0x4ed655 in run_argv /media/sf_SHARE/git-master-asan/git.c:684:4
    #12 0x4ea037 in cmd_main /media/sf_SHARE/git-master-asan/git.c:761:19
    #13 0x759c8b in main /media/sf_SHARE/git-master-asan/common-main.c:45:9
    #14 0x7f6f2243382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x41c268 in _start (/media/sf_SHARE/git-master-asan/git+0x41c268)

Address 0x7f6f235b0000 is a wild pointer.
SUMMARY: AddressSanitizer: unknown-crash /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23:3 in __asan_memcpy Shadow bytes around the buggy address:
  0x0fee646adfb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adfc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adfd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adfe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee646adff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0fee646ae000:[fe]fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae010: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae020: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae030: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae040: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fee646ae050: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4324==ABORTING
root@guest:/media/sf_SHARE/vuln_repo#


Regards,
Luat Nguyen.

______________________________________________________________________
The information contained in this e-mail message and any attachments may be privileged and confidential. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that any review, dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to this e-mail and delete the message and any attachments from your computer.
______________________________________________________________________

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: security: potential out-of-bound read at ewah_io.c |ewah_read_mmap|
  2018-06-19 19:00 ` Dyer, Edwin
@ 2018-06-19 19:56   ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-06-19 19:56 UTC (permalink / raw)
  To: Dyer, Edwin; +Cc: git@vger.kernel.org

On Tue, Jun 19, 2018 at 07:00:48PM +0000, Dyer, Edwin wrote:

> Just curious if there was any additional comment on this potential
> OOB? I may have missed it and if so, apologies for the ask.

The fix is in master, and should be part of the upcoming v2.18. See
commit 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14).

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2018-06-19 19:56 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-14 22:59 security: potential out-of-bound read at ewah_io.c |ewah_read_mmap| Luat Nguyen
2018-06-15  3:28 ` Jeff King
2018-06-15  3:31   ` [PATCH 1/3] ewah_read_mmap: bounds-check mmap reads Jeff King
2018-06-15  9:14     ` SZEDER Gábor
2018-06-15 16:20       ` Junio C Hamano
2018-06-15 17:10         ` SZEDER Gábor
2018-06-15 17:21           ` Jeff King
2018-06-15 19:42             ` Junio C Hamano
2018-06-15 17:05     ` Junio C Hamano
2018-06-15 17:26       ` Jeff King
2018-06-15 19:44         ` Junio C Hamano
2018-06-16 14:35     ` SZEDER Gábor
2018-06-16 19:14       ` Jeff King
2018-06-15  3:31   ` [PATCH 2/3] ewah: drop ewah_deserialize function Jeff King
2018-06-15  3:32   ` [PATCH 3/3] ewah: drop ewah_serialize_native function Jeff King
2018-06-15 13:56     ` Ramsay Jones
2018-06-15 14:07       ` Ramsay Jones
2018-06-15 14:30         ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
2018-06-15 14:30           ` [PATCH 1/8] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
2018-06-15 14:46             ` Ramsay Jones
2018-06-15 15:11               ` Derrick Stolee
2018-06-15 14:30           ` [PATCH 2/8] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
2018-06-15 15:03             ` Ramsay Jones
2018-06-15 14:30           ` [PATCH 3/8] ewah_bitmap: delete unused 'ewah_and()' Derrick Stolee
2018-06-15 14:30           ` [PATCH 4/8] ewah_bitmap: delete unused 'ewah_and_not()' Derrick Stolee
2018-06-15 14:30           ` [PATCH 5/8] ewah_bitmap: delete unused 'ewah_not()' Derrick Stolee
2018-06-15 14:30           ` [PATCH 6/8] ewah_bitmap: delete unused 'ewah_or()' Derrick Stolee
2018-06-15 14:30           ` [PATCH 7/8] ewah_io: delete unused 'ewah_serialize()' Derrick Stolee
2018-06-15 14:30           ` [PATCH 8/8] ewah_io: delete unused 'ewah_serialize_native()' Derrick Stolee
2018-06-15 15:01             ` Ramsay Jones
2018-06-15 15:10               ` Derrick Stolee
2018-06-15 14:35           ` [PATCH 0/8] Delete unused methods in EWAH bitmap Derrick Stolee
2018-06-15 18:27           ` [PATCH v2 0/7] " Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 1/7] ewah/bitmap.c: delete unused 'bitmap_clear()' Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 2/7] ewah/bitmap.c: delete unused 'bitmap_each_bit()' Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 3/7] ewah_bitmap: delete unused 'ewah_and()' Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 4/7] ewah_bitmap: delete unused 'ewah_and_not()' Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 5/7] ewah_bitmap: delete unused 'ewah_not()' Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 6/7] ewah_bitmap: delete unused 'ewah_or()' Derrick Stolee
2018-06-15 18:27             ` [PATCH v2 7/7] ewah_io: delete unused 'ewah_serialize()' Derrick Stolee
2018-06-15 18:51             ` [PATCH v2 0/7] Delete unused methods in EWAH bitmap Junio C Hamano
2018-06-15 18:56               ` Derrick Stolee
2018-06-15 19:48                 ` Junio C Hamano
2018-06-15 20:35                   ` Jeff King
2018-06-15 14:15       ` [PATCH 3/3] ewah: drop ewah_serialize_native function Derrick Stolee
2018-06-15 17:51         ` Jeff King
2018-06-15 18:33           ` Junio C Hamano
2018-06-15 18:46             ` Jeff King
2018-06-15  3:44   ` [PATCH 4/3] ewah: adjust callers of ewah_read_mmap() Jeff King
2018-06-15 11:23     ` Derrick Stolee
2018-06-15 16:41       ` Junio C Hamano
2018-06-15 17:31         ` Jeff King
2018-06-15 18:23           ` Derrick Stolee
2018-06-15 20:38             ` Jeff King
2018-06-15 17:12     ` Junio C Hamano
2018-06-15 16:11   ` security: potential out-of-bound read at ewah_io.c |ewah_read_mmap| Junio C Hamano
2018-06-19 19:00 ` Dyer, Edwin
2018-06-19 19:56   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).