From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 08/12] add oidset API
Date: Mon, 23 Jan 2017 19:46:47 -0500
Message-ID: <20170124004647.3o26ionfq3td2irf@sigill.intra.peff.net>
In-Reply-To: <20170124003729.j4ygjcgypdq7hceg@sigill.intra.peff.net>
This is similar to many of our uses of sha1-array, but it
overcomes one limitation of a sha1-array: when you are
de-duplicating a large input with relatively few unique
entries, sha1-array uses 20 bytes for every entry, duplicates
included. This set instead uses memory linear in the number of
unique entries (albeit a bit more than 20 bytes each, due to
hashmap overhead).
Signed-off-by: Jeff King <peff@peff.net>
---
This may be overkill. You can get roughly the same thing by making
actual object structs via lookup_unknown_object(). But see the next
patch for some comments on that.
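
For readers who want to see the check-and-add semantics in isolation,
here is a minimal stand-alone sketch. It is not the code from this
patch: every toy_* name is hypothetical, the table is a fixed-size
open-addressing array rather than git's hashmap, and it assumes the
caller never stores more than TOY_SLOTS - 1 ids. It only mirrors the
documented contract (a zero-initialized set is empty; insert returns 1
for a duplicate and 0 for a new id).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RAWSZ 20	/* bytes in a SHA-1 object id */
#define TOY_SLOTS 64	/* fixed capacity; a real hashmap would grow */

struct toy_oid {
	unsigned char hash[TOY_RAWSZ];
};

/* Zero-initialize to get an empty set, as with the oidset in the patch. */
struct toy_oidset {
	struct toy_oid slot[TOY_SLOTS];
	unsigned char used[TOY_SLOTS];
	size_t nr;
};

/* An object id is already a hash, so its leading bytes pick the bucket. */
static size_t toy_bucket(const struct toy_oid *oid)
{
	unsigned int h;
	memcpy(&h, oid->hash, sizeof(h));
	return h % TOY_SLOTS;
}

/* Return the slot holding oid, or the empty slot where it would go. */
static size_t toy_find(const struct toy_oidset *set, const struct toy_oid *oid)
{
	size_t i = toy_bucket(oid);
	while (set->used[i] && memcmp(set->slot[i].hash, oid->hash, TOY_RAWSZ))
		i = (i + 1) % TOY_SLOTS;
	return i;
}

static int toy_oidset_contains(const struct toy_oidset *set,
			       const struct toy_oid *oid)
{
	return set->used[toy_find(set, oid)];
}

/* Returns 1 if oid was already in the set, 0 if it was newly added. */
static int toy_oidset_insert(struct toy_oidset *set, const struct toy_oid *oid)
{
	size_t i = toy_find(set, oid);
	if (set->used[i])
		return 1;
	/* Keep one slot free so that probing in toy_find() terminates. */
	assert(set->nr + 1 < TOY_SLOTS);
	set->slot[i] = *oid;
	set->used[i] = 1;
	set->nr++;
	return 0;
}

/* Feed n ids (possibly with duplicates) through the set; count new ones. */
static size_t toy_count_unique(struct toy_oidset *set,
			       const struct toy_oid *in, size_t n)
{
	size_t i, fresh = 0;
	for (i = 0; i < n; i++)
		if (!toy_oidset_insert(set, &in[i]))
			fresh++;
	return fresh;
}
```

Feeding five ids of which three are distinct through toy_count_unique()
leaves three stored ids (60 bytes of id data), whereas an
append-then-sort array would hold all five (100 bytes) until the
de-duplication pass runs.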
Makefile | 1 +
oidset.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
oidset.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)
create mode 100644 oidset.c
create mode 100644 oidset.h
diff --git a/Makefile b/Makefile
index 27afd0f37..e41efc2d8 100644
--- a/Makefile
+++ b/Makefile
@@ -774,6 +774,7 @@ LIB_OBJS += notes-cache.o
LIB_OBJS += notes-merge.o
LIB_OBJS += notes-utils.o
LIB_OBJS += object.o
+LIB_OBJS += oidset.o
LIB_OBJS += pack-bitmap.o
LIB_OBJS += pack-bitmap-write.o
LIB_OBJS += pack-check.o
diff --git a/oidset.c b/oidset.c
new file mode 100644
index 000000000..6094cff8c
--- /dev/null
+++ b/oidset.c
@@ -0,0 +1,49 @@
+#include "cache.h"
+#include "oidset.h"
+
+struct oidset_entry {
+	struct hashmap_entry hash;
+	struct object_id oid;
+};
+
+static int oidset_hashcmp(const void *va, const void *vb,
+			  const void *vkey)
+{
+	const struct oidset_entry *a = va, *b = vb;
+	const struct object_id *key = vkey;
+	return oidcmp(&a->oid, key ? key : &b->oid);
+}
+
+int oidset_contains(const struct oidset *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+
+	if (!set->map.cmpfn)
+		return 0;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	return !!hashmap_get(&set->map, &key, oid);
+}
+
+int oidset_insert(struct oidset *set, const struct object_id *oid)
+{
+	struct oidset_entry *entry;
+
+	if (!set->map.cmpfn)
+		hashmap_init(&set->map, oidset_hashcmp, 0);
+
+	if (oidset_contains(set, oid))
+		return 1;
+
+	entry = xmalloc(sizeof(*entry));
+	hashmap_entry_init(&entry->hash, sha1hash(oid->hash));
+	oidcpy(&entry->oid, oid);
+
+	hashmap_add(&set->map, entry);
+	return 0;
+}
+
+void oidset_clear(struct oidset *set)
+{
+	hashmap_free(&set->map, 1);
+}
diff --git a/oidset.h b/oidset.h
new file mode 100644
index 000000000..b7eaab5b8
--- /dev/null
+++ b/oidset.h
@@ -0,0 +1,45 @@
+#ifndef OIDSET_H
+#define OIDSET_H
+
+/**
+ * This API is similar to sha1-array, in that it maintains a set of object ids
+ * in a memory-efficient way. The major differences are:
+ *
+ * 1. It uses a hash, so we can do online duplicate removal, rather than
+ * sort-and-uniq at the end. This can reduce memory footprint if you have
+ * a large list of oids with many duplicates.
+ *
+ * 2. The per-unique-oid memory footprint is slightly higher due to hash
+ * table overhead.
+ */
+
+/**
+ * A single oidset; should be zero-initialized (or use OIDSET_INIT).
+ */
+struct oidset {
+	struct hashmap map;
+};
+
+#define OIDSET_INIT { { NULL } }
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset_contains(const struct oidset *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can be used
+ * to perform an efficient check-and-add.
+ */
+int oidset_insert(struct oidset *set, const struct object_id *oid);
+
+/**
+ * Remove all entries from the oidset, freeing any resources associated with
+ * it.
+ */
+void oidset_clear(struct oidset *set);
+
+#endif /* OIDSET_H */
--
2.11.0.765.g454d2182f