git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Derrick Stolee <stolee@gmail.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Stefan Beller" <sbeller@google.com>, git <git@vger.kernel.org>,
	"Duy Nguyen" <pclouds@gmail.com>
Subject: [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode
Date: Tue, 9 Oct 2018 19:14:05 -0400
Message-ID: <20181009231405.GB23730@sigill.intra.peff.net>
In-Reply-To: <20181009231250.GA19342@sigill.intra.peff.net>

This teaches "gen" mode (formerly the only mode) to first output
the sorted list of paths, each NUL-terminated and followed by an
empty entry, and to prefix each bitmap with its commit's oid.
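A minimal sketch of the resulting stream layout, in C like the patch
itself. It writes into a hypothetical in-memory `struct buf` instead
of stdout so the bytes can be inspected, and treats the serialized
bitmap as opaque bytes. `RAWSZ` stands in for GIT_SHA1_RAWSZ. None of
these helpers exist in git.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RAWSZ 20 /* raw SHA-1 length, standing in for GIT_SHA1_RAWSZ */

/* Hypothetical growable byte buffer, used here instead of stdout. */
struct buf {
	unsigned char *data;
	size_t len, alloc;
};

static void buf_add(struct buf *b, const void *p, size_t n)
{
	if (b->len + n > b->alloc) {
		size_t alloc = 2 * (b->len + n);
		unsigned char *tmp = realloc(b->data, alloc);

		if (!tmp) {
			perror("realloc");
			exit(1);
		}
		b->data = tmp;
		b->alloc = alloc;
	}
	memcpy(b->data + b->len, p, n);
	b->len += n;
}

/* Path table: each sorted path plus its NUL, then one lone NUL. */
static void write_path_table(struct buf *b, const char **sorted, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* an empty path would be read back as the terminator */
		assert(sorted[i][0] != '\0');
		buf_add(b, sorted[i], strlen(sorted[i]) + 1);
	}
	buf_add(b, "", 1); /* empty entry: end of the path table */
}

/* One record per commit: raw oid, then the serialized bitmap. */
static void write_record(struct buf *b, const unsigned char *oid,
			 const void *bitmap, size_t bitmap_len)
{
	buf_add(b, oid, RAWSZ);
	buf_add(b, bitmap, bitmap_len);
}
```

Because the table ends with an empty entry, the reader needs no
separate count of paths. The cost is that a real path can never be the
empty string.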

The "dump" mode can then read that back in and print, for each
commit, its oid followed by the paths it changed. This should be
almost identical to:

  git rev-list --all |
  git diff-tree --stdin --name-only -t
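A self-contained sketch of the read side. It uses a hypothetical
uncompressed bitmap in place of EWAH: ceil(nr_paths / 8) bytes, least
significant bit first. This lets it run without git's ewah library.
`dump_stream()` and `emit()` are illustrative names, not git functions.
Unlike show_path() in the patch, it knows nr_paths, so every bit index
is bounded by construction.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RAWSZ 20 /* raw SHA-1 length, standing in for GIT_SHA1_RAWSZ */

/* Append s to out at *pos, keeping out NUL-terminated; never overflow. */
static int emit(char *out, size_t outsz, size_t *pos, const char *s)
{
	size_t n = strlen(s);

	if (*pos + n >= outsz)
		return -1;
	memcpy(out + *pos, s, n + 1); /* copy the NUL too */
	*pos += n;
	return 0;
}

/*
 * Write "<hex oid>\n" and then one "<path>\n" per set bit for each
 * record. Returns 0 on success, -1 on truncated input or small out.
 */
static int dump_stream(const unsigned char *in, size_t len,
		       char *out, size_t outsz)
{
	const char *cur = (const char *)in;
	size_t remain = len, nr = 0, alloc = 0, pos = 0, bmlen, i;
	const char **paths = NULL;
	int ret = -1;

	assert(outsz > 0); /* emit() always NUL-terminates out */
	out[0] = '\0';

	/* Path table: NUL-terminated names, ended by an empty entry. */
	while (remain) {
		const char *end = memchr(cur, '\0', remain);

		if (!end)
			goto done; /* path is missing its NUL */
		if (end == cur) {
			cur++;
			remain--;
			break;
		}
		if (nr == alloc) {
			const char **tmp;

			alloc = alloc ? 2 * alloc : 8;
			tmp = realloc(paths, alloc * sizeof(*paths));
			if (!tmp)
				goto done;
			paths = tmp;
		}
		paths[nr++] = cur; /* points into in; nothing is copied */
		remain -= end - cur + 1;
		cur = end + 1;
	}

	/* Records: raw oid, then the fixed-size bitmap. */
	bmlen = (nr + 7) / 8;
	while (remain) {
		char hex[2 * RAWSZ + 2];

		if (remain < RAWSZ + bmlen)
			goto done; /* truncated record */
		for (i = 0; i < RAWSZ; i++)
			sprintf(hex + 2 * i, "%02x", (unsigned char)cur[i]);
		strcpy(hex + 2 * RAWSZ, "\n");
		if (emit(out, outsz, &pos, hex))
			goto done;
		cur += RAWSZ;

		for (i = 0; i < nr; i++) {
			if (!((unsigned char)cur[i / 8] & (1u << (i % 8))))
				continue;
			if (emit(out, outsz, &pos, paths[i]) ||
			    emit(out, outsz, &pos, "\n"))
				goto done;
		}
		cur += bmlen;
		remain -= RAWSZ + bmlen;
	}
	ret = 0;
done:
	free(paths);
	return ret;
}
```

As in the patch, the path array only points into the input buffer.
That is why the patch's comment about mmap applies: the whole file
can be mapped once and indexed in place.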

The one difference is the sort order. git's diff output is
in tree-sort order, where a subtree "foo" sorts as if it were
"foo/", and therefore after "foo.bar". The bitmap path list,
by contrast, uses a plain byte sort, which puts "foo.bar" after "foo".
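To make the ordering difference concrete, here is a sketch of a
comparison between entries of a single tree, in which a directory
compares as though its name ended in '/'. It is modeled on the idea
behind git's base_name_compare() rather than copied from it.
`tree_order_cmp()` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Order two entry names from the same tree: a directory compares as
 * if its name had a trailing '/'. Returns <0, 0 or >0 like strcmp().
 */
static int tree_order_cmp(const char *a, int a_is_dir,
			  const char *b, int b_is_dir)
{
	size_t alen, blen, len;
	unsigned char ca, cb;
	int cmp;

	assert(a && b);
	alen = strlen(a);
	blen = strlen(b);
	len = alen < blen ? alen : blen;
	cmp = memcmp(a, b, len);
	if (cmp)
		return cmp;
	/* first byte past the common prefix; NUL if that name ended */
	ca = (unsigned char)a[len];
	cb = (unsigned char)b[len];
	if (!ca && a_is_dir)
		ca = '/';
	if (!cb && b_is_dir)
		cb = '/';
	return (ca > cb) - (ca < cb);
}
```

With this comparison, tree_order_cmp("foo", 1, "foo.bar", 0) is positive,
so the subtree sorts last, because '/' (0x2f) is greater than '.' (0x2e).
A plain strcmp("foo", "foo.bar") is negative.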

Signed-off-by: Jeff King <peff@peff.net>
---
 t/helper/test-tree-bitmap.c | 104 +++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-tree-bitmap.c b/t/helper/test-tree-bitmap.c
index bc5cf0e514..6f8833344a 100644
--- a/t/helper/test-tree-bitmap.c
+++ b/t/helper/test-tree-bitmap.c
@@ -112,6 +112,14 @@ static void collect_paths(struct hashmap *paths)
 	QSORT(sorted, i, pathmap_entry_strcmp);
 	for (i = 0; i < n; i++)
 		sorted[i]->pos = i;
+
+	/* dump it while we have the sorted order in memory */
+	for (i = 0; i < n; i++) {
+		printf("%s", sorted[i]->path);
+		putchar('\0');
+	}
+	putchar('\0');
+
 	free(sorted);
 }
 
@@ -142,6 +150,8 @@ static void generate_bitmap(struct diff_queue_struct *q,
 
 	ewah = bitmap_to_ewah(bitmap);
 	ewah_serialize_strbuf(ewah, &out);
+
+	fwrite(data->commit->object.oid.hash, 1, GIT_SHA1_RAWSZ, stdout);
 	fwrite(out.buf, 1, out.len, stdout);
 
 	trace_printf("bitmap %s %u %u",
@@ -154,14 +164,104 @@ static void generate_bitmap(struct diff_queue_struct *q,
 	bitmap_free(bitmap);
 }
 
-int cmd_main(int argc, const char **argv)
+static void do_gen(void)
 {
 	struct hashmap paths;
-
 	setup_git_directory();
 	collect_paths(&paths);
 
 	walk_paths(generate_bitmap, &paths);
+}
+
+static void show_path(size_t pos, void *data)
+{
+	const char **paths = data;
+
+	/* assert(pos < nr_paths), but we didn't pass the latter in */
+	printf("%s\n", paths[pos]);
+}
+
+static void do_dump(void)
+{
+	struct strbuf in = STRBUF_INIT;
+	const char *cur;
+	size_t remain;
+
+	const char **paths = NULL;
+	size_t alloc_paths = 0, nr_paths = 0;
+
+	/* slurp stdin; in the real world we'd mmap all this */
+	strbuf_read(&in, 0, 0);
+	cur = in.buf;
+	remain = in.len;
+
+	/* read path for each bit; in the real world this would be separate */
+	while (remain) {
+		const char *end = memchr(cur, '\0', remain);
+		if (!end) {
+			error("truncated input while reading path");
+			goto out;
+		}
+		if (end == cur) {
+			/* empty field signals end of paths */
+			cur++;
+			remain--;
+			break;
+		}
+
+		ALLOC_GROW(paths, nr_paths + 1, alloc_paths);
+		paths[nr_paths++] = cur;
+
+		remain -= end - cur + 1;
+		cur = end + 1;
+	}
+
+	/* read the bitmap for each commit */
+	while (remain) {
+		struct object_id oid;
+		struct ewah_bitmap *ewah;
+		ssize_t len;
+
+		if (remain < GIT_SHA1_RAWSZ) {
+			error("truncated input reading oid");
+			goto out;
+		}
+		hashcpy(oid.hash, (const unsigned char *)cur);
+		cur += GIT_SHA1_RAWSZ;
+		remain -= GIT_SHA1_RAWSZ;
+
+		ewah = ewah_new();
+		len = ewah_read_mmap(ewah, cur, remain);
+		if (len < 0) {
+			ewah_free(ewah);
+			goto out;
+		}
+
+		printf("%s\n", oid_to_hex(&oid));
+		ewah_each_bit(ewah, show_path, paths);
+
+		ewah_free(ewah);
+		cur += len;
+		remain -= len;
+	}
+
+out:
+	free(paths);
+	strbuf_release(&in);
+}
+
+int cmd_main(int argc, const char **argv)
+{
+	const char *usage_msg = "test-tree-bitmap <gen|dump>";
+
+	if (!argv[1])
+		usage(usage_msg);
+	else if (!strcmp(argv[1], "gen"))
+		do_gen();
+	else if (!strcmp(argv[1], "dump"))
+		do_dump();
+	else
+		usage(usage_msg);
 
 	return 0;
 }
-- 
2.19.1.550.g7610f1eecb


Thread overview: 78+ messages
2018-10-03 13:23 We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason
2018-10-03 13:36 ` SZEDER Gábor
2018-10-03 13:42   ` Derrick Stolee
2018-10-03 14:18     ` Ævar Arnfjörð Bjarmason
2018-10-03 14:01   ` Ævar Arnfjörð Bjarmason
2018-10-03 14:17     ` SZEDER Gábor
2018-10-03 14:22       ` Ævar Arnfjörð Bjarmason
2018-10-03 14:53         ` SZEDER Gábor
2018-10-03 15:19           ` Ævar Arnfjörð Bjarmason
2018-10-03 16:59             ` SZEDER Gábor
2018-10-05  6:09               ` Junio C Hamano
2018-10-10 22:07                 ` SZEDER Gábor
2018-10-10 23:01                   ` Ævar Arnfjörð Bjarmason
2018-10-03 19:08           ` Stefan Beller
2018-10-03 19:21             ` Jeff King
2018-10-03 20:35               ` Ævar Arnfjörð Bjarmason
2018-10-03 17:47         ` Stefan Beller
2018-10-03 18:47           ` Ævar Arnfjörð Bjarmason
2018-10-03 18:51             ` Jeff King
2018-10-03 18:59               ` Derrick Stolee
2018-10-03 19:18                 ` Jeff King
2018-10-08 16:41                   ` SZEDER Gábor
2018-10-08 16:57                     ` Derrick Stolee
2018-10-08 18:10                       ` SZEDER Gábor
2018-10-08 18:29                         ` Derrick Stolee
2018-10-09  3:08                           ` Jeff King
2018-10-09 13:48                             ` Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph) Derrick Stolee
2018-10-09 18:45                               ` Ævar Arnfjörð Bjarmason
2018-10-09 18:46                               ` Jeff King
2018-10-09 19:03                                 ` Derrick Stolee
2018-10-09 21:14                                   ` Jeff King
2018-10-09 23:12                                     ` Bloom Filters Jeff King
2018-10-09 23:13                                       ` [PoC -- do not apply 1/3] initial tree-bitmap proof of concept Jeff King
2018-10-09 23:14                                       ` Jeff King [this message]
2018-10-10  0:48                                         ` [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode Junio C Hamano
2018-10-11  3:13                                           ` Jeff King
2018-10-09 23:14                                       ` [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding Jeff King
2018-10-10  0:58                                         ` Junio C Hamano
2018-10-11  3:20                                           ` Jeff King
2018-10-11 12:33                                       ` Bloom Filters Derrick Stolee
2018-10-11 13:43                                         ` Jeff King
2018-10-09 21:30                             ` We should add a "git gc --auto" after "git clone" due to commit graph SZEDER Gábor
2018-10-09 19:34                       ` [PATCH 0/4] Bloom filter experiment SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 1/4] Add a (very) barebones Bloom filter implementation SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 2/4] commit-graph: write a Bloom filter containing changed paths for each commit SZEDER Gábor
2018-10-09 21:06                           ` Jeff King
2018-10-09 21:37                             ` SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 3/4] revision.c: use the Bloom filter to speed up path-limited revision walks SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 4/4] revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics SZEDER Gábor
2018-10-09 19:47                         ` [PATCH 0/4] Bloom filter experiment Derrick Stolee
2018-10-11  1:21                         ` [PATCH 0/2] Per-commit filter proof of concept Jonathan Tan
2018-10-11  1:21                           ` [PATCH 1/2] One filter per commit Jonathan Tan
2018-10-11 12:49                             ` Derrick Stolee
2018-10-11 19:11                               ` [PATCH] Per-commit and per-parent filters for 2 parents Jonathan Tan
2018-10-11  1:21                           ` [PATCH 2/2] Only make bloom filter for first parent Jonathan Tan
2018-10-11  7:37                           ` [PATCH 0/2] Per-commit filter proof of concept Ævar Arnfjörð Bjarmason
2018-10-15 14:39                         ` [PATCH 0/4] Bloom filter experiment Derrick Stolee
2018-10-16  4:45                           ` Junio C Hamano
2018-10-16 11:13                             ` Derrick Stolee
2018-10-16 12:57                               ` Ævar Arnfjörð Bjarmason
2018-10-16 13:03                                 ` Derrick Stolee
2018-10-18  2:00                                 ` Junio C Hamano
2018-10-16 23:41                           ` Jonathan Tan
2018-10-08 23:02                     ` We should add a "git gc --auto" after "git clone" due to commit graph Junio C Hamano
2018-10-03 14:32     ` Duy Nguyen
2018-10-03 16:45 ` Duy Nguyen
2018-10-04 21:42 ` [RFC PATCH] " Ævar Arnfjörð Bjarmason
2018-10-05 12:05   ` Derrick Stolee
2018-10-05 13:05     ` Ævar Arnfjörð Bjarmason
2018-10-05 13:45       ` Derrick Stolee
2018-10-05 14:04         ` Ævar Arnfjörð Bjarmason
2018-10-05 19:21         ` Jeff King
2018-10-05 19:41           ` Derrick Stolee
2018-10-05 19:47             ` Jeff King
2018-10-05 20:00               ` Derrick Stolee
2018-10-05 20:02                 ` Jeff King
2018-10-05 20:01               ` Ævar Arnfjörð Bjarmason
2018-10-05 20:09                 ` Jeff King
