* [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-09 22:34 ` Junio C Hamano
2018-03-08 11:42 ` [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
` (11 subsequent siblings)
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what. This also documents the holes in this struct
according to pahole.
A couple of notes on shrinking the struct:
1) The reader may notice one thing from this document and the shrinking
business. If "delta" is NULL, all other delta-related fields should be
irrelevant. We could group all these in a separate struct and replace
them all with a pointer to this struct (allocated separately).
This does not help much though since 85% of objects are deltified
(source: linux-2.6.git). The gain is only from non-delta objects, which
is not that significant.
2) The field in_pack_offset and idx.offset could be merged. But we need
to be very careful. Up until the very last phase (object writing),
idx.offset is not used and can hold in_pack_offset. Then idx.offset will
be updated with _destination pack's_ offset, not source's. But since we
always write delta's bases first, and we only use in_pack_offset in
writing phase when we reuse objects, we should be ok?
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
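A rough sketch of the arithmetic behind note 1, not part of the patch. It assumes six pointer-sized delta fields (three links, the cached data pointer and two sizes) and ignores allocator overhead; every name below is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the delta-related fields grouped into one struct. */
struct delta_info_sketch {
    void *delta, *delta_child, *delta_sibling;
    void *delta_data;
    unsigned long delta_size, z_delta_size;
};

/* Bytes of delta bookkeeping for n objects when the fields stay inline. */
static size_t inline_cost(size_t n)
{
    return n * sizeof(struct delta_info_sketch);
}

/*
 * Same, when every object keeps one pointer and only the n_delta
 * deltified objects allocate the separate struct.
 */
static size_t split_cost(size_t n, size_t n_delta)
{
    assert(n_delta <= n);
    return n * sizeof(void *) + n_delta * sizeof(struct delta_info_sketch);
}
```

Under these assumptions, split_cost(100, 85) is no smaller than inline_cost(100): the extra pointer paid by every object cancels what the 15 non-delta objects save, which is consistent with the note that the gain is not significant.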
pack-objects.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/pack-objects.h b/pack-objects.h
index 03f1191659..f834ead541 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,52 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+/*
+ * basic object info
+ * -----------------
+ * idx.oid is filled up before delta searching starts. idx.crc32
+ * is only valid after the object is written down and will be used for
+ * generating the index. idx.offset will be both gradually set and
+ * used in writing phase (base objects get offset first, then deltas
+ * refer to them)
+ *
+ * "size" is the uncompressed object size. The compressed size (i.e.
+ * the raw data in a pack) is not cached but is available via revindex.
+ *
+ * "hash" contains a path name hash which is used for sorting the
+ * delta list and also during delta searching. Once prepare_pack()
+ * returns it's no longer needed.
+ *
+ * source pack info
+ * ----------------
+ * The (in_pack, in_pack_offset, in_pack_header_size) tuple contains
+ * the location of the object in the source pack, with or without
+ * header.
+ *
+ * "type" and "in_pack_type" both describe object type. in_pack_type
+ * may contain a delta type, while type is always the canonical type.
+ *
+ * deltas
+ * ------
+ * Delta links (delta, delta_child and delta_sibling) are created to
+ * reflect the delta graph from the source pack, then updated or added
+ * during the delta searching phase when we find better deltas.
+ *
+ * delta_child and delta_sibling are last needed in
+ * compute_write_order(). "delta" and "delta_size" must remain valid
+ * at object writing phase in case the delta is not cached.
+ *
+ * If a delta is cached in memory and is compressed, "delta" points to
+ * the data and z_delta_size contains the compressed size. If it's
+ * uncompressed [1], z_delta_size must be zero. delta_size is always
+ * the uncompressed size and must be valid even if the delta is not
+ * cached. Delta recreation technically only depends on "delta"
+ * pointer, but delta_size is still used to verify it's the same as
+ * before.
+ *
+ * [1] during try_delta phase we don't bother with compressing because
+ * the delta could be quickly replaced with a better one.
+ */
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
@@ -28,6 +74,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+ /* XXX 28 bits hole, try to pack */
/*
* State flags for depth-first search used for analyzing delta cycles.
*
@@ -40,6 +87,7 @@ struct object_entry {
DFS_DONE
} dfs_state;
int depth;
+ /* size: 136, padding: 4 */
};
struct packing_data {
--
2.16.2.873.g32ff258c87
* Re: [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-09 22:34 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-09 22:34 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> The role of this comment block becomes more important after we shuffle
> fields around to shrink this struct. It will be much harder to see what
> field is related to what. This also documents the holes in this struct
> according to pahole.
>
> A couple of notes on shrinking the struct:
>
> 1) The reader may notice one thing from this document and the shrinking
> business. If "delta" is NULL, all other delta-related fields should be
> irrelevant. We could group all these in a separate struct and replace
> them all with a pointer to this struct (allocated separately).
>
> This does not help much though since 85% of objects are deltified
> (source: linux-2.6.git). The gain is only from non-delta objects, which
> is not that significant.
OK.
> 2) The field in_pack_offset and idx.offset could be merged. But we need
> to be very careful. Up until the very last phase (object writing),
> idx.offset is not used and can hold in_pack_offset. Then idx.offset will
> be updated with _destination pack's_ offset, not source's. But since we
> always write delta's bases first, and we only use in_pack_offset in
> writing phase when we reuse objects, we should be ok?
By separating the processing in strict phases, I do think the result
would be OK, but at the same time, that does smell like an
invitation for future bugs.
> +/*
> + * basic object info
> + * -----------------
> + * idx.oid is filled up before delta searching starts. idx.crc32
> + * is only valid after the object is written down and will be used for
> + * generating the index. idx.offset will be both gradually set and
> + * used in writing phase (base objects get offset first, then deltas
> + * refer to them)
Here, I'd feel that "written out" somehow would sound more natural
than "written down", but that is perhaps because I've seen it used
elsewhere and I am confusing familiarity with naturalness. In any
case, if we mean "written to the resulting packdata stream", saying
that to be more explicit is probably a good idea. We compute crc32
and learn the offset for each object as we write them to the result.
> + * If a delta is cached in memory and is compressed, "delta" points to
> + * the data and z_delta_size contains the compressed size. If it's
Isn't it "delta_data" (as opposed to "delta") that points at the cached delta
data?
> + * uncompressed [1], z_delta_size must be zero. delta_size is always
> + * the uncompressed size and must be valid even if the delta is not
> + * cached. Delta recreation technically only depends on "delta"
> + * pointer, but delta_size is still used to verify it's the same as
> + * before.
> + *
> + * [1] during try_delta phase we don't bother with compressing because
> + * the delta could be quickly replaced with a better one.
> + */
> struct object_entry {
> struct pack_idx_entry idx;
> unsigned long size; /* uncompressed size */
> @@ -28,6 +74,7 @@ struct object_entry {
> unsigned tagged:1; /* near the very tip of refs */
> unsigned filled:1; /* assigned write-order */
>
> + /* XXX 28 bits hole, try to pack */
> /*
> * State flags for depth-first search used for analyzing delta cycles.
> *
> @@ -40,6 +87,7 @@ struct object_entry {
> DFS_DONE
> } dfs_state;
> int depth;
> + /* size: 136, padding: 4 */
> };
>
> struct packing_data {
* [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-09 22:54 ` Junio C Hamano
2018-03-08 11:42 ` [PATCH/RFC v3 03/12] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
` (10 subsequent siblings)
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
This saves 8 bytes in sizeof(struct object_entry). On a large
repository like linux-2.6.git (6.5M objects), this saves us 52MB of
memory.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
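A minimal sketch of where the saving comes from, using hypothetical stand-in structs rather than the real object_entry. Exact sizes are implementation-defined; on common ABIs an enum field occupies a full int while two 3-bit fields share one storage unit with the neighbouring flags.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for enum object_type; only the -1 and 0 values come from the patch. */
enum type_sketch { T_BAD = -1, T_NONE = 0, T_COMMIT = 1, T_BLOB = 3, T_REF_DELTA = 7 };

/* Before: each type is a full enum field. */
struct enum_fields {
    enum type_sketch type;
    enum type_sketch in_pack_type;
    unsigned flag:1;
};

/* After: both types are 3-bit fields packed next to the flags. */
struct bit_fields {
    unsigned type:3;
    unsigned in_pack_type:3;
    unsigned flag:1;
};

/* Total bytes saved across a repository. */
static size_t bytes_saved(size_t nr_objects, size_t per_object)
{
    return nr_objects * per_object;
}
```

With the numbers from the commit message, bytes_saved(6500000, 8) gives 52000000 bytes, i.e. the quoted 52MB.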
builtin/pack-objects.c | 14 ++++++++++++--
cache.h | 2 ++
object.h | 1 -
pack-objects.h | 8 ++++----
4 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5c674b2843..fd217cb51f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry)
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
+ enum object_type type;
buf = use_pack(p, &w_curs, entry->in_pack_offset, &avail);
@@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry)
* since non-delta representations could still be reused.
*/
used = unpack_object_header_buffer(buf, avail,
- &entry->in_pack_type,
+ &type,
&entry->size);
if (used == 0)
goto give_up;
+ if (type < 0)
+ die("BUG: invalid type %d", type);
+ entry->in_pack_type = type;
+
/*
* Determine if this is a delta and if so whether we can
* reuse it or not. Otherwise let's find out as cheaply as
@@ -1559,6 +1564,7 @@ static void drop_reused_delta(struct object_entry *entry)
{
struct object_entry **p = &entry->delta->delta_child;
struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type type;
while (*p) {
if (*p == entry)
@@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry)
entry->depth = 0;
oi.sizep = &entry->size;
- oi.typep = &entry->type;
+ oi.typep = &type;
if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
@@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry *entry)
*/
entry->type = sha1_object_info(entry->idx.oid.hash,
&entry->size);
+ } else {
+ if (type < 0)
+ die("BUG: invalid type %d", type);
+ entry->type = type;
}
}
diff --git a/cache.h b/cache.h
index 21fbcc2414..862bdff83a 100644
--- a/cache.h
+++ b/cache.h
@@ -373,6 +373,8 @@ extern void free_name_hash(struct index_state *istate);
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
#endif
+#define TYPE_BITS 3
+
enum object_type {
OBJ_BAD = -1,
OBJ_NONE = 0,
diff --git a/object.h b/object.h
index 87563d9056..8ce294d6ec 100644
--- a/object.h
+++ b/object.h
@@ -25,7 +25,6 @@ struct object_array {
#define OBJECT_ARRAY_INIT { 0, 0, NULL }
-#define TYPE_BITS 3
/*
* object flag allocation:
* revision.h: 0---------10 26
diff --git a/pack-objects.h b/pack-objects.h
index f834ead541..85b01b66da 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -60,11 +60,11 @@ struct object_entry {
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
- enum object_type type;
- enum object_type in_pack_type; /* could be delta */
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
+ unsigned type:TYPE_BITS;
+ unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
@@ -74,7 +74,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
- /* XXX 28 bits hole, try to pack */
+ /* XXX 22 bits hole, try to pack */
/*
* State flags for depth-first search used for analyzing delta cycles.
*
@@ -87,7 +87,7 @@ struct object_entry {
DFS_DONE
} dfs_state;
int depth;
- /* size: 136, padding: 4 */
+ /* size: 128, padding: 4 */
};
struct packing_data {
--
2.16.2.873.g32ff258c87
* Re: [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields
2018-03-08 11:42 ` [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
@ 2018-03-09 22:54 ` Junio C Hamano
2018-03-12 17:51 ` Duy Nguyen
0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-09 22:54 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry)
> entry->depth = 0;
>
> oi.sizep = &entry->size;
> - oi.typep = &entry->type;
> + oi.typep = &type;
> if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
> /*
> * We failed to get the info from this pack for some reason;
> @@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry *entry)
> */
> entry->type = sha1_object_info(entry->idx.oid.hash,
> &entry->size);
The comment immediately before this pre-context reads as such:
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
* And if that fails, the error will be recorded in entry->type
* and dealt with in prepare_pack().
*/
The rest of the code relies on the ability of entry->type to record
the error by storing an invalid (negative) type; otherwise, it cannot
detect an error where (1) the entry in _this_ pack was corrupt, and
(2) we wished to find another copy of the object elsewhere (which
would overwrite the negative entry->type we assign here), but we
didn't find any.
How should we propagate the error we found here down the control
flow in this new code?
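To make the concern concrete, here is a small sketch with made-up names (it is not code from the patch) showing what happens when a negative type code is stored into a 3-bit unsigned field:

```c
#include <assert.h>

#define TYPE_BITS_SKETCH 3

struct entry_sketch {
    unsigned type:TYPE_BITS_SKETCH;
};

/* Store a possibly negative type code and return what reads back. */
static int store_and_read(int type)
{
    struct entry_sketch e;

    /* The value is reduced modulo 2^3, so the sign is lost. */
    e.type = (unsigned)type;
    return e.type;
}
```

store_and_read(-1) comes back as 7, so a later "type < 0" test on the field can never be true.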
> + } else {
> + if (type < 0)
> + die("BUG: invalid type %d", type);
> + entry->type = type;
The BUG() on this side is sensible, as packed_object_info()
shouldn't report success when it stored negative result in *oi.typep
anyway.
> unsigned char in_pack_header_size;
> + unsigned type:TYPE_BITS;
> + unsigned in_pack_type:TYPE_BITS; /* could be delta */
* Re: [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields
2018-03-09 22:54 ` Junio C Hamano
@ 2018-03-12 17:51 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-12 17:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: avarab, e, git, peff
On Fri, Mar 09, 2018 at 02:54:53PM -0800, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
> > @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry)
> > entry->depth = 0;
> >
> > oi.sizep = &entry->size;
> > - oi.typep = &entry->type;
> > + oi.typep = &type;
> > if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
> > /*
> > * We failed to get the info from this pack for some reason;
> > @@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry *entry)
> > */
> > entry->type = sha1_object_info(entry->idx.oid.hash,
> > &entry->size);
>
> The comment immediately before this pre-context reads as such:
>
> /*
> * We failed to get the info from this pack for some reason;
> * fall back to sha1_object_info, which may find another copy.
> * And if that fails, the error will be recorded in entry->type
> * and dealt with in prepare_pack().
> */
>
> The rest of the code relies on the ability of entry->type to record
> the error by storing an invalid (negative) type; otherwise, it cannot
> detect an error where (1) the entry in _this_ pack was corrupt, and
> (2) we wished to find another copy of the object elsewhere (which
> would overwrite the negative entry->type we assign here), but we
> didn't find any.
>
> How should we propagate the error we found here down the control
> flow in this new code?
Good catch! I don't have any magic trick to do this, so I'm adding an
extra bit to store type validity. Something like this as a fixup patch
(I'll resend the whole series soon).
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fd217cb51f..f164f1797b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -265,7 +265,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
struct git_istream *st = NULL;
if (!usable_delta) {
- if (entry->type == OBJ_BLOB &&
+ if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
- enum object_type type = entry->type;
+ enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0; /* explicit */
else if (!entry->in_pack)
to_reuse = 0; /* can't reuse what we don't have */
- else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+ else if (oe_type(entry) == OBJ_REF_DELTA ||
+ oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
- else if (entry->type != entry->in_pack_type)
+ else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0; /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_COMMIT &&
- objects[i].type != OBJ_TAG)
+ if (oe_type(&objects[i]) != OBJ_COMMIT &&
+ oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_TREE)
+ if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -1067,7 +1068,7 @@ static void create_object_entry(const struct object_id *oid,
entry = packlist_alloc(&to_pack, oid->hash, index_pos);
entry->hash = hash;
if (type)
- entry->type = type;
+ oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1433,9 +1434,9 @@ static void check_object(struct object_entry *entry)
switch (entry->in_pack_type) {
default:
/* Not a delta hence we've already got all we need. */
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->in_pack_header_size = used;
- if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
+ if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > OBJ_BLOB)
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1489,7 +1490,7 @@ static void check_object(struct object_entry *entry)
* deltify other objects against, in order to avoid
* circular deltas.
*/
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->delta = base_entry;
entry->delta_size = entry->size;
entry->delta_sibling = base_entry->delta_child;
@@ -1498,7 +1499,7 @@ static void check_object(struct object_entry *entry)
return;
}
- if (entry->type) {
+ if (oe_type(entry)) {
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
@@ -1521,7 +1522,7 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- entry->type = sha1_object_info(entry->idx.oid.hash, &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
@@ -1584,12 +1585,10 @@ static void drop_reused_delta(struct object_entry *entry)
* And if that fails, the error will be recorded in entry->type
* and dealt with in prepare_pack().
*/
- entry->type = sha1_object_info(entry->idx.oid.hash,
- &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
+ &entry->size));
} else {
- if (type < 0)
- die("BUG: invalid type %d", type);
- entry->type = type;
+ oe_set_type(entry, type);
}
}
@@ -1757,10 +1756,12 @@ static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ enum object_type a_type = oe_type(a);
+ enum object_type b_type = oe_type(b);
- if (a->type > b->type)
+ if (a_type > b_type)
return -1;
- if (a->type < b->type)
+ if (a_type < b_type)
return 1;
if (a->hash > b->hash)
return -1;
@@ -1836,7 +1837,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
void *delta_buf;
/* Don't bother doing diffs between different types */
- if (trg_entry->type != src_entry->type)
+ if (oe_type(trg_entry) != oe_type(src_entry))
return -1;
/*
@@ -2442,11 +2443,11 @@ static void prepare_pack(int window, int depth)
if (!entry->preferred_base) {
nr_deltas++;
- if (entry->type < 0)
+ if (oe_type(entry) < 0)
die("unable to get type of object %s",
oid_to_hex(&entry->idx.oid));
} else {
- if (entry->type < 0) {
+ if (oe_type(entry) < 0) {
/*
* This object is not found, but we
* don't have to include it anyway.
diff --git a/pack-objects.h b/pack-objects.h
index 3e5a89569a..90fbbc9394 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -58,8 +58,9 @@ struct object_entry {
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
- unsigned type:TYPE_BITS;
+ unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
+ unsigned type_valid:1;
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
@@ -122,4 +123,19 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}
+static inline enum object_type oe_type(const struct object_entry *e)
+{
+ return e->type_valid ? e->type_ : OBJ_BAD;
+}
+
+static inline void oe_set_type(struct object_entry *e,
+ enum object_type type)
+{
+ if (type >= OBJ_ANY)
+ die("BUG: OBJ_ANY cannot be set in pack-objects code");
+
+ e->type_valid = type >= 0;
+ e->type_ = (unsigned)type;
+}
+
#endif
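For readers who want to try the validity-bit idea in isolation, a self-contained sketch follows. The struct, enum and function names are stand-ins chosen for this example, and an assert() replaces the die() call used in the fixup.

```c
#include <assert.h>

/* Stand-in for enum object_type; the negative value marks an error. */
enum type_sketch { T_BAD = -1, T_NONE = 0, T_COMMIT = 1, T_BLOB = 3, T_REF_DELTA = 7 };

struct entry_sketch {
    unsigned type_:3;       /* raw type bits, meaningful only if type_valid */
    unsigned type_valid:1;  /* cleared when an error code was stored */
};

/* Read the type back, turning "not valid" into the error value again. */
static enum type_sketch get_type(const struct entry_sketch *e)
{
    return e->type_valid ? (enum type_sketch)e->type_ : T_BAD;
}

/* Record a type; negative inputs only clear the validity bit. */
static void set_type(struct entry_sketch *e, enum type_sketch type)
{
    assert(type <= T_REF_DELTA);
    e->type_valid = type >= 0;
    e->type_ = (unsigned)type;
}
```

Callers that previously tested "entry->type < 0" keep working as long as they go through the accessor, because the error state survives the round trip.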
* [PATCH/RFC v3 03/12] pack-objects: use bitfield for object_entry::dfs_state
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 04/12] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
` (9 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
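A small sketch of the idea, with hypothetical names: a sentinel enumerator counts the states, and the check confirms that the count fits in the chosen number of bits so that every state survives a round trip through the bitfield.

```c
#include <assert.h>

#define STATE_BITS_SKETCH 2

enum state_sketch {
    S_NONE = 0,
    S_ACTIVE,
    S_DONE,
    S_NUM_STATES  /* sentinel: number of real states */
};

struct node_sketch {
    unsigned state:STATE_BITS_SKETCH;
};

/* Returns 1 when all states are representable in the bitfield. */
static int states_fit(void)
{
    return S_NUM_STATES <= (1 << STATE_BITS_SKETCH);
}

/* Store a state in the bitfield and read it back. */
static enum state_sketch round_trip(enum state_sketch s)
{
    struct node_sketch n;

    n.state = s;
    return (enum state_sketch)n.state;
}
```

Three states need two bits, so one value of the field stays unused and a fourth state could be added without widening it.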
builtin/pack-objects.c | 3 +++
pack-objects.h | 33 ++++++++++++++++++++-------------
2 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fd217cb51f..a4dbb40824 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
OPT_END(),
};
+ if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
+ die("BUG: too many dfs states, increase OE_DFS_STATE_BITS");
+
check_replace_refs = 0;
reset_pack_idx_option(&pack_idx_opts);
diff --git a/pack-objects.h b/pack-objects.h
index 85b01b66da..628c45871c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,21 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+#define OE_DFS_STATE_BITS 2
+
+/*
+ * State flags for depth-first search used for analyzing delta cycles.
+ *
+ * The depth is measured in delta-links to the base (so if A is a delta
+ * against B, then A has a depth of 1, and B a depth of 0).
+ */
+enum dfs_state {
+ DFS_NONE = 0,
+ DFS_ACTIVE,
+ DFS_DONE,
+ DFS_NUM_STATES
+};
+
/*
* basic object info
* -----------------
@@ -73,21 +88,13 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+ unsigned dfs_state:OE_DFS_STATE_BITS;
+
+ /* XXX 20 bits hole, try to pack */
- /* XXX 22 bits hole, try to pack */
- /*
- * State flags for depth-first search used for analyzing delta cycles.
- *
- * The depth is measured in delta-links to the base (so if A is a delta
- * against B, then A has a depth of 1, and B a depth of 0).
- */
- enum {
- DFS_NONE = 0,
- DFS_ACTIVE,
- DFS_DONE
- } dfs_state;
int depth;
- /* size: 128, padding: 4 */
+
+ /* size: 120 */
};
struct packing_data {
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 04/12] pack-objects: use bitfield for object_entry::depth
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 03/12] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-09 23:07 ` Junio C Hamano
2018-03-08 11:42 ` [PATCH/RFC v3 05/12] pack-objects: note about in_pack_header_size Nguyễn Thái Ngọc Duy
` (8 subsequent siblings)
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
This does not give us any saving yet because of padding, but it will
once we cut 4 bytes out of this struct in a subsequent patch.
Because of the bitfield width, from now on we can only handle a maximum
depth of 4095 (or even lower if new booleans are added to this struct).
This should be OK since long delta chains cause a significant slowdown
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
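A quick sketch of why 4095 is the ceiling, using made-up names: a 12-bit unsigned field holds 0 through (1 << 12) - 1, and anything larger wraps silently on assignment.

```c
#include <assert.h>

#define DEPTH_BITS_SKETCH 12

struct entry_sketch {
    unsigned depth:DEPTH_BITS_SKETCH;
};

/* Largest depth the bitfield can represent. */
static unsigned max_depth(void)
{
    return (1u << DEPTH_BITS_SKETCH) - 1;
}

/* Store a depth in the bitfield and return what reads back. */
static unsigned stored_depth(unsigned depth)
{
    struct entry_sketch e;

    e.depth = depth;  /* reduced modulo 2^12 */
    return e.depth;
}
```

Here stored_depth(4095) reads back unchanged while stored_depth(4096) reads back as 0, which is why a user-supplied depth has to be rejected before it reaches the struct.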
Documentation/config.txt | 1 +
Documentation/git-pack-objects.txt | 4 +++-
Documentation/git-repack.txt | 4 +++-
builtin/pack-objects.c | 4 ++++
pack-objects.h | 8 +++-----
5 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10c..9bd3f5a789 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2412,6 +2412,7 @@ pack.window::
pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+ Maximum value is 4095.
pack.windowMemory::
The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 81bc490ac5..3503c9e3e6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--window-memory=<n>::
This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--threads=<n>::
This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a4dbb40824..cfd97da7db 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
+ if (depth >= (1 << OE_DEPTH_BITS))
+ die(_("delta chain depth %d is greater than maximum limit %d"),
+ depth, (1 << OE_DEPTH_BITS) - 1);
+
argv_array_push(&rp, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
diff --git a/pack-objects.h b/pack-objects.h
index 628c45871c..4b17402953 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,6 +2,7 @@
#define PACK_OBJECTS_H
#define OE_DFS_STATE_BITS 2
+#define OE_DEPTH_BITS 12
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -89,12 +90,9 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
+ unsigned depth:OE_DEPTH_BITS;
- /* XXX 20 bits hole, try to pack */
-
- int depth;
-
- /* size: 120 */
+ /* size: 120, bit_padding: 8 bits */
};
struct packing_data {
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 05/12] pack-objects: note about in_pack_header_size
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (3 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 04/12] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 06/12] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
` (7 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The object header in a pack is packed really tight (see
pack-format.txt). Even with an 8-byte length, we need 9-10 bytes at
most, plus a hash (20 bytes), which means this field only needs to
store a number below 32 (5 bits).
This is trickier to pack tight though: a new hash algorithm is
coming, and the number of bits needed may quickly increase. So leave
it for now.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
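The byte count above can be checked with a short sketch. It assumes the size encoding described in pack-format.txt (4 size bits in the first byte, 7 in each following byte); the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for the type+size part of an in-pack object header. */
static unsigned size_header_bytes(uint64_t size)
{
    unsigned n = 1;  /* first byte carries the type and the low 4 size bits */

    size >>= 4;
    while (size) {   /* each further byte carries 7 more size bits */
        n++;
        size >>= 7;
    }
    return n;
}

/* Number of bits needed to store values from 0 up to max_value. */
static unsigned bits_needed(unsigned max_value)
{
    unsigned bits = 0;

    while (max_value) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```

For the largest 64-bit size this gives 10 bytes; adding a 20-byte base hash makes 30, and bits_needed(30) is 5. A longer hash would push the total past 31, which is the reason given for leaving the field alone.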
pack-objects.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-objects.h b/pack-objects.h
index 4b17402953..2ccd6359d2 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -78,7 +78,7 @@ struct object_entry {
unsigned long z_delta_size; /* delta data size (compressed) */
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
- unsigned char in_pack_header_size;
+ unsigned char in_pack_header_size; /* note: spare bits available! */
unsigned type:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned preferred_base:1; /*
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 06/12] pack-objects: move in_pack_pos out of struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (4 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 05/12] pack-objects: note about in_pack_header_size Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 07/12] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
` (6 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
This field is only needed for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is never freed). This saves
us 8 bytes in struct object_entry.
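The side-array idea can be sketched in isolation like this. The struct
and function names below are made up for illustration; the real
accessors are oe_in_pack_pos() and oe_set_in_pack_pos() in the diff
that follows. The point is that an entry's slot in the side array is
found from its position in the main array, so the entry itself carries
nothing.

```c
#include <stdint.h>
#include <stdlib.h>

struct entry {
	uint32_t hash;		/* stand-in for the real per-object fields */
};

struct table {
	struct entry *objects;	/* main array, one element per object */
	uint32_t nr_objects;
	unsigned int *pos;	/* optional side array, NULL until needed */
};

/* Store 'pos' for 'e', allocating the side array on first use. */
static int table_set_pos(struct table *t, const struct entry *e,
			 unsigned int pos)
{
	if (!t->pos) {
		t->pos = calloc(t->nr_objects, sizeof(*t->pos));
		if (!t->pos)
			return -1;
	}
	/* e - t->objects is the index of 'e' in the main array */
	t->pos[e - t->objects] = pos;
	return 0;
}

static unsigned int table_get_pos(const struct table *t,
				  const struct entry *e)
{
	return t->pos[e - t->objects];
}
```

This only works while every entry lives inside objects[]; a copy of an
entry stored elsewhere would yield a bogus index.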
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 3 ++-
pack-bitmap-write.c | 8 +++++---
pack-bitmap.c | 2 +-
pack-bitmap.h | 4 +++-
pack-objects.h | 18 ++++++++++++++++--
5 files changed, 27 insertions(+), 8 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cfd97da7db..7bb5544883 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -878,7 +878,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_set_checksum(oid.hash);
- bitmap_writer_build_type_index(written_list, nr_written);
+ bitmap_writer_build_type_index(
+ &to_pack, written_list, nr_written);
}
finish_tmp_packfile(&tmpname, pack_tmp_name,
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..256a63f892 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show)
/**
* Build the initial type index for the packfile
*/
-void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
uint32_t index_nr)
{
uint32_t i;
@@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
+ ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects);
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
- entry->in_pack_pos = i;
+ oe_set_in_pack_pos(to_pack, entry, i);
switch (entry->type) {
case OBJ_COMMIT:
@@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1)
"(object %s is missing)", sha1_to_hex(sha1));
}
- return entry->in_pack_pos;
+ return oe_in_pack_pos(writer.to_pack, entry);
}
static void show_object(struct object *object, const char *name, void *data)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9270983e5f..865d9ecc4e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping,
oe = packlist_find(mapping, sha1, NULL);
if (oe)
- reposition[i] = oe->in_pack_pos + 1;
+ reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
}
rebuild = bitmap_new();
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3742a00e14..5ded2f139a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bi
void bitmap_writer_show_progress(int show);
void bitmap_writer_set_checksum(unsigned char *sha1);
-void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr);
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
+ uint32_t index_nr);
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
diff --git a/pack-objects.h b/pack-objects.h
index 2ccd6359d2..9ab0ce300d 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -77,7 +77,6 @@ struct object_entry {
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
uint32_t hash; /* name hint hash */
- unsigned int in_pack_pos;
unsigned char in_pack_header_size; /* note: spare bits available! */
unsigned type:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
@@ -92,7 +91,7 @@ struct object_entry {
unsigned dfs_state:OE_DFS_STATE_BITS;
unsigned depth:OE_DEPTH_BITS;
- /* size: 120, bit_padding: 8 bits */
+ /* size: 112, bit_padding: 8 bits */
};
struct packing_data {
@@ -101,6 +100,8 @@ struct packing_data {
int32_t *index;
uint32_t index_size;
+
+ unsigned int *in_pack_pos;
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -131,4 +132,17 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}
+static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack_pos[e - pack->objects];
+}
+
+static inline void oe_set_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e,
+ unsigned int pos)
+{
+ pack->in_pack_pos[e - pack->objects] = pos;
+}
+
#endif
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 07/12] pack-objects: move in_pack out of struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (5 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 06/12] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-09 23:21 ` Junio C Hamano
2018-03-08 11:42 ` [PATCH/RFC v3 08/12] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
` (5 subsequent siblings)
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack, use an index, since the number of packs should be
relatively small.
This limits the number of packs we can handle to 16k. For now if you hit
the 16k pack file limit, pack-objects will simply fail [1].
This technically saves 7 bytes. But we don't see any of that in
practice due to padding. The saving becomes real when we pack this
struct tighter later.
[1] The escape hatch is to use .keep files to keep the number of
non-kept pack files below the 16k limit. Then you can go for another
pack-objects run to combine another 16k pack files. Repeat until
you're satisfied.
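A stripped-down sketch of the pointer-to-index scheme, with
hypothetical names (the real helpers are oe_add_pack(), oe_in_pack()
and oe_set_in_pack() in the diff below). Slot 0 is registered as NULL
first, so an entry whose index is 0 maps to "not in any pack".

```c
#include <stddef.h>

#define IDX_BITS 14		/* same width as OE_IN_PACK_BITS */

struct pack {
	int index;		/* slot in the registry, 0 if unregistered */
	const char *name;
};

struct registry {
	int count;
	struct pack *slots[1 << IDX_BITS];
};

struct obj {
	unsigned pack_idx:IDX_BITS;	/* replaces an 8-byte pointer */
};

/* Register 'p' (may be NULL) and return its slot, or -1 when full. */
static int registry_add(struct registry *r, struct pack *p)
{
	if (r->count >= (1 << IDX_BITS))
		return -1;
	if (p)
		p->index = r->count;
	r->slots[r->count] = p;
	return r->count++;
}

/* Map an object's small index back to the pack pointer. */
static struct pack *obj_pack(const struct registry *r, const struct obj *o)
{
	return r->slots[o->pack_idx];
}
```

The sketch returns -1 where the patch calls die(); that is only to
keep the example free of git internals.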
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/git-pack-objects.txt | 9 ++++++
builtin/pack-objects.c | 40 +++++++++++++++---------
cache.h | 1 +
pack-objects.h | 49 ++++++++++++++++++++++++++++--
4 files changed, 83 insertions(+), 16 deletions(-)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 3503c9e3e6..b8d936ccf5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -269,6 +269,15 @@ Unexpected missing object will raise an error.
locally created objects [without .promisor] and objects from the
promisor remote [with .promisor].) This is used with partial clone.
+LIMITATIONS
+-----------
+
+This command could only handle 16384 existing pack files at a time.
+If you have more than this, you need to exclude some pack files with
+".keep" file and --honor-pack-keep option, to combine 16k pack files
+in one, then remove these .keep files and run pack-objects one more
+time.
+
SEE ALSO
--------
linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7bb5544883..7df525e201 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,6 +29,8 @@
#include "list.h"
#include "packfile.h"
+#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
N_("git pack-objects [<options>...] <base-name> [< <ref-list> | < <object-list>]"),
@@ -367,7 +369,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
{
- struct packed_git *p = entry->in_pack;
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
@@ -478,7 +480,7 @@ static off_t write_object(struct hashfile *f,
if (!reuse_object)
to_reuse = 0; /* explicit */
- else if (!entry->in_pack)
+ else if (!IN_PACK(entry))
to_reuse = 0; /* can't reuse what we don't have */
else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
@@ -1024,7 +1026,7 @@ static int want_object_in_pack(const struct object_id *oid,
if (*found_pack) {
want = want_found_object(exclude, *found_pack);
if (want != -1)
- return want;
+ goto done;
}
list_for_each(pos, &packed_git_mru) {
@@ -1047,11 +1049,16 @@ static int want_object_in_pack(const struct object_id *oid,
if (!exclude && want > 0)
list_move(&p->mru, &packed_git_mru);
if (want != -1)
- return want;
+ goto done;
}
}
- return 1;
+ want = 1;
+done:
+ if (want && *found_pack && !(*found_pack)->index)
+ oe_add_pack(&to_pack, *found_pack);
+
+ return want;
}
static void create_object_entry(const struct object_id *oid,
@@ -1074,7 +1081,7 @@ static void create_object_entry(const struct object_id *oid,
else
nr_result++;
if (found_pack) {
- entry->in_pack = found_pack;
+ oe_set_in_pack(entry, found_pack);
entry->in_pack_offset = found_offset;
}
@@ -1399,8 +1406,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
- if (entry->in_pack) {
- struct packed_git *p = entry->in_pack;
+ if (IN_PACK(entry)) {
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
@@ -1535,14 +1542,16 @@ static int pack_offset_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ const struct packed_git *a_in_pack = IN_PACK(a);
+ const struct packed_git *b_in_pack = IN_PACK(b);
/* avoid filesystem trashing with loose objects */
- if (!a->in_pack && !b->in_pack)
+ if (!a_in_pack && !b_in_pack)
return oidcmp(&a->idx.oid, &b->idx.oid);
- if (a->in_pack < b->in_pack)
+ if (a_in_pack < b_in_pack)
return -1;
- if (a->in_pack > b->in_pack)
+ if (a_in_pack > b_in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
@@ -1578,7 +1587,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &entry->size;
oi.typep = &type;
- if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
@@ -1848,8 +1857,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
* it, we will still save the transfer cost, as we already know
* the other side has it and we won't send src_entry at all.
*/
- if (reuse_delta && trg_entry->in_pack &&
- trg_entry->in_pack == src_entry->in_pack &&
+ if (reuse_delta && IN_PACK(trg_entry) &&
+ IN_PACK(trg_entry) == IN_PACK(src_entry) &&
!src_entry->preferred_base &&
trg_entry->in_pack_type != OBJ_REF_DELTA &&
trg_entry->in_pack_type != OBJ_OFS_DELTA)
@@ -3191,6 +3200,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
}
}
+ /* make sure IN_PACK(0) return NULL */
+ oe_add_pack(&to_pack, NULL);
+
if (progress)
progress_state = start_progress(_("Counting objects"), 0);
if (!use_internal_rev_list)
diff --git a/cache.h b/cache.h
index 862bdff83a..b90feb3802 100644
--- a/cache.h
+++ b/cache.h
@@ -1635,6 +1635,7 @@ extern struct packed_git {
int index_version;
time_t mtime;
int pack_fd;
+ int index; /* for builtin/pack-objects.c */
unsigned pack_local:1,
pack_keep:1,
freshened:1,
diff --git a/pack-objects.h b/pack-objects.h
index 9ab0ce300d..59c44b3420 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -3,6 +3,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
+#define OE_IN_PACK_BITS 14
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -18,6 +19,10 @@ enum dfs_state {
};
/*
+ * The size of struct nearly determines pack-objects's memory
+ * consumption. This struct is packed tight for that reason. When you
+ * add or reorder something in this struct, think a bit about this.
+ *
* basic object info
* -----------------
* idx.oid is filled up before delta searching starts. idx.crc32 and
@@ -66,7 +71,6 @@ enum dfs_state {
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
- struct packed_git *in_pack; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
@@ -78,6 +82,7 @@ struct object_entry {
unsigned long z_delta_size; /* delta data size (compressed) */
uint32_t hash; /* name hint hash */
unsigned char in_pack_header_size; /* note: spare bits available! */
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
unsigned type:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned preferred_base:1; /*
@@ -89,9 +94,12 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
+
+ /* XXX 8 bits hole, try to pack */
+
unsigned depth:OE_DEPTH_BITS;
- /* size: 112, bit_padding: 8 bits */
+ /* size: 112, padding: 4, bit_padding: 18 bits */
};
struct packing_data {
@@ -102,6 +110,8 @@ struct packing_data {
uint32_t index_size;
unsigned int *in_pack_pos;
+ int in_pack_count;
+ struct packed_git *in_pack[1 << OE_IN_PACK_BITS];
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -145,4 +155,39 @@ static inline void oe_set_in_pack_pos(const struct packing_data *pack,
pack->in_pack_pos[e - pack->objects] = pos;
}
+static inline unsigned int oe_add_pack(struct packing_data *pack,
+ struct packed_git *p)
+{
+ if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
+ die(_("too many packs to handle in one go. "
+ "Please add .keep files to exclude\n"
+ "some pack files and keep the number "
+ "of non-kept files below %d."),
+ 1 << OE_IN_PACK_BITS);
+ if (p) {
+ if (p->index > 0)
+ die("BUG: this packed is already indexed");
+ p->index = pack->in_pack_count;
+ }
+ pack->in_pack[pack->in_pack_count] = p;
+ return pack->in_pack_count++;
+}
+
+static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack[e->in_pack_idx];
+
+}
+
+static inline void oe_set_in_pack(struct object_entry *e,
+ struct packed_git *p)
+{
+ if (p->index <= 0)
+ die("BUG: found_pack should be NULL "
+ "instead of having non-positive index");
+ e->in_pack_idx = p->index;
+
+}
+
#endif
--
2.16.2.873.g32ff258c87
* Re: [PATCH/RFC v3 07/12] pack-objects: move in_pack out of struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 07/12] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
@ 2018-03-09 23:21 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-09 23:21 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
> pack. Use an index instead since the number of packs should be
> relatively small.
>
> This limits the number of packs we can handle to 16k. For now if you hit
> 16k pack files limit, pack-objects will simply fail [1].
>
> This technically saves 7 bytes. But we don't see any of that in
> practice due to padding. The saving becomes real when we pack this
> struct tighter later.
Somehow 7 and 16k do not add up.
We use 8 bytes in the original code, and a solution that potentially
saves 7 bytes would use only 1 byte instead of the original 8, which
would allow us to index/identify 1<<8 == 256 packs, but for some reason
we can handle up to 16k.
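To spell out the arithmetic, here is a small sketch (the helper names
are made up for illustration and do not appear in the patch). It
relates the width of the index to the number of packs it can address
and to the bits freed relative to a 64-bit pointer.

```c
#include <stdint.h>

/* Number of distinct packs addressable with an index of 'bits' bits. */
static uint32_t addressable_packs(unsigned bits)
{
	return (uint32_t)1 << bits;
}

/* Bits freed when a 64-bit pointer is replaced by a 'bits'-bit index. */
static unsigned bits_freed(unsigned bits)
{
	return 64 - bits;
}
```

An 8-bit index addresses 256 packs and frees 56 bits, which is exactly
7 bytes. The 14-bit index used by the patch addresses 16384 packs and
frees 50 bits, which is 6 bytes and 2 bits.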
> [1] The escape hatch is .keep file to limit the non-kept pack files
> below 16k limit. Then you can go for another pack-objects run to
> combine another 16k pack files. Repeat until you're satisfied.
;-)
> +static inline unsigned int oe_add_pack(struct packing_data *pack,
> + struct packed_git *p)
> +{
> + if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
> + die(_("too many packs to handle in one go. "
> + "Please add .keep files to exclude\n"
> + "some pack files and keep the number "
> + "of non-kept files below %d."),
> + 1 << OE_IN_PACK_BITS);
OK.
* [PATCH/RFC v3 08/12] pack-objects: refer to delta objects by index instead of pointer
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (6 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 07/12] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-14 16:18 ` Junio C Hamano
2018-03-08 11:42 ` [PATCH/RFC v3 09/12] pack-objects: reorder 'hash' to pack struct object_entry Nguyễn Thái Ngọc Duy
` (4 subsequent siblings)
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Notice that packing_data::nr_objects is uint32_t: we can only handle
a maximum of 4G objects and can address all of them with a uint32_t.
If we use a pointer here, we waste 4 bytes on 64 bit architecture.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
There are holes in this struct but this patch is already big. Struct
packing can be done separately. Even with holes, we save 8 bytes per
object_entry.
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
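The shifted-by-one encoding can be sketched on its own as follows. The
names are hypothetical; the patch's real accessors are oe_delta() and
oe_set_delta() and their child/sibling variants. Index 0 stands for
NULL, and index i + 1 stands for element i of the array.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
	uint32_t base_idx;	/* 0 means "no base", else array index + 1 */
};

struct pool {
	struct node *nodes;
	uint32_t nr;
};

/* Decode: turn the stored index back into a pointer (or NULL). */
static struct node *node_base(const struct pool *p, const struct node *n)
{
	if (n->base_idx)
		return &p->nodes[n->base_idx - 1];
	return NULL;
}

/* Encode: store a pointer into the pool as index + 1, NULL as 0. */
static void node_set_base(const struct pool *p, struct node *n,
			  const struct node *base)
{
	if (base)
		n->base_idx = (uint32_t)(base - p->nodes) + 1;
	else
		n->base_idx = 0;
}
```

Because 0 is taken by NULL, the largest array position that can be
stored is 2^32-2, which is the limit footnote [1] refers to.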
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 116 ++++++++++++++++++++++-------------------
pack-objects.h | 71 ++++++++++++++++++++++---
2 files changed, 127 insertions(+), 60 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7df525e201..82a4a95888 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA(obj) oe_delta(&to_pack, obj)
+#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
+#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
+#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
+#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -127,11 +133,11 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, &type, &size);
if (!buf)
die("unable to read %s", oid_to_hex(&entry->idx.oid));
- base_buf = read_sha1_file(entry->delta->idx.oid.hash, &type,
+ base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, &type,
&base_size);
if (!base_buf)
die("unable to read %s",
- oid_to_hex(&entry->delta->idx.oid));
+ oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
if (!delta_buf || delta_size != entry->delta_size)
@@ -288,12 +294,12 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -317,7 +323,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
* encoding of the relative offset for the delta
* base from this object's position in the pack.
*/
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -343,7 +349,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
@@ -379,8 +385,8 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
- if (entry->delta)
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ if (DELTA(entry))
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
type, entry->size);
@@ -408,7 +414,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
}
if (type == OBJ_OFS_DELTA) {
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -427,7 +433,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
reused_delta++;
} else {
@@ -467,13 +473,13 @@ static off_t write_object(struct hashfile *f,
else
limit = pack_size_limit - write_offset;
- if (!entry->delta)
+ if (!DELTA(entry))
usable_delta = 0; /* no delta */
else if (!pack_size_limit)
usable_delta = 1; /* unlimited packfile */
- else if (entry->delta->idx.offset == (off_t)-1)
+ else if (DELTA(entry)->idx.offset == (off_t)-1)
usable_delta = 0; /* base was written to another pack */
- else if (entry->delta->idx.offset)
+ else if (DELTA(entry)->idx.offset)
usable_delta = 1; /* base already exists in this pack */
else
usable_delta = 0; /* base could end up in another pack */
@@ -488,7 +494,7 @@ static off_t write_object(struct hashfile *f,
/* ... but pack split may override that */
else if (entry->type != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
- else if (entry->delta)
+ else if (DELTA(entry))
to_reuse = 0; /* we want to pack afresh */
else
to_reuse = 1; /* we have it in-pack undeltified,
@@ -540,12 +546,12 @@ static enum write_one_status write_one(struct hashfile *f,
}
/* if we are deltified, write out base object first. */
- if (e->delta) {
+ if (DELTA(e)) {
e->idx.offset = 1; /* now recurse */
- switch (write_one(f, e->delta, offset)) {
+ switch (write_one(f, DELTA(e), offset)) {
case WRITE_ONE_RECURSIVE:
/* we cannot depend on this one */
- e->delta = NULL;
+ SET_DELTA(e, NULL);
break;
default:
break;
@@ -607,34 +613,34 @@ static void add_descendants_to_write_order(struct object_entry **wo,
/* add this node... */
add_to_write_order(wo, endp, e);
/* all its siblings... */
- for (s = e->delta_sibling; s; s = s->delta_sibling) {
+ for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) {
add_to_write_order(wo, endp, s);
}
}
/* drop down a level to add left subtree nodes if possible */
- if (e->delta_child) {
+ if (DELTA_CHILD(e)) {
add_to_order = 1;
- e = e->delta_child;
+ e = DELTA_CHILD(e);
} else {
add_to_order = 0;
/* our sibling might have some children, it is next */
- if (e->delta_sibling) {
- e = e->delta_sibling;
+ if (DELTA_SIBLING(e)) {
+ e = DELTA_SIBLING(e);
continue;
}
/* go back to our parent node */
- e = e->delta;
- while (e && !e->delta_sibling) {
+ e = DELTA(e);
+ while (e && !DELTA_SIBLING(e)) {
/* we're on the right side of a subtree, keep
* going up until we can go right again */
- e = e->delta;
+ e = DELTA(e);
}
if (!e) {
/* done- we hit our original root node */
return;
}
/* pass it off to sibling at this level */
- e = e->delta_sibling;
+ e = DELTA_SIBLING(e);
}
};
}
@@ -645,7 +651,7 @@ static void add_family_to_write_order(struct object_entry **wo,
{
struct object_entry *root;
- for (root = e; root->delta; root = root->delta)
+ for (root = e; DELTA(root); root = DELTA(root))
; /* nothing */
add_descendants_to_write_order(wo, endp, root);
}
@@ -660,8 +666,8 @@ static struct object_entry **compute_write_order(void)
for (i = 0; i < to_pack.nr_objects; i++) {
objects[i].tagged = 0;
objects[i].filled = 0;
- objects[i].delta_child = NULL;
- objects[i].delta_sibling = NULL;
+ SET_DELTA_CHILD(&objects[i], NULL);
+ SET_DELTA_SIBLING(&objects[i], NULL);
}
/*
@@ -671,11 +677,11 @@ static struct object_entry **compute_write_order(void)
*/
for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
- if (!e->delta)
+ if (!DELTA(e))
continue;
/* Mark me as the first child */
- e->delta_sibling = e->delta->delta_child;
- e->delta->delta_child = e;
+ e->delta_sibling_idx = DELTA(e)->delta_child_idx;
+ SET_DELTA_CHILD(DELTA(e), e);
}
/*
@@ -1498,10 +1504,10 @@ static void check_object(struct object_entry *entry)
* circular deltas.
*/
entry->type = entry->in_pack_type;
- entry->delta = base_entry;
+ SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
- entry->delta_sibling = base_entry->delta_child;
- base_entry->delta_child = entry;
+ entry->delta_sibling_idx = base_entry->delta_child_idx;
+ SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
return;
}
@@ -1572,17 +1578,19 @@ static int pack_offset_sort(const void *_a, const void *_b)
*/
static void drop_reused_delta(struct object_entry *entry)
{
- struct object_entry **p = &entry->delta->delta_child;
+ unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
- while (*p) {
- if (*p == entry)
- *p = (*p)->delta_sibling;
+ while (*idx) {
+ struct object_entry *oe = &to_pack.objects[*idx - 1];
+
+ if (oe == entry)
+ *idx = oe->delta_sibling_idx;
else
- p = &(*p)->delta_sibling;
+ idx = &oe->delta_sibling_idx;
}
- entry->delta = NULL;
+ SET_DELTA(entry, NULL);
entry->depth = 0;
oi.sizep = &entry->size;
@@ -1624,7 +1632,7 @@ static void break_delta_chains(struct object_entry *entry)
for (cur = entry, total_depth = 0;
cur;
- cur = cur->delta, total_depth++) {
+ cur = DELTA(cur), total_depth++) {
if (cur->dfs_state == DFS_DONE) {
/*
* We've already seen this object and know it isn't
@@ -1649,7 +1657,7 @@ static void break_delta_chains(struct object_entry *entry)
* it's not a delta, we're done traversing, but we'll mark it
* done to save time on future traversals.
*/
- if (!cur->delta) {
+ if (!DELTA(cur)) {
cur->dfs_state = DFS_DONE;
break;
}
@@ -1672,7 +1680,7 @@ static void break_delta_chains(struct object_entry *entry)
* We keep all commits in the chain that we examined.
*/
cur->dfs_state = DFS_ACTIVE;
- if (cur->delta->dfs_state == DFS_ACTIVE) {
+ if (DELTA(cur)->dfs_state == DFS_ACTIVE) {
drop_reused_delta(cur);
cur->dfs_state = DFS_DONE;
break;
@@ -1687,7 +1695,7 @@ static void break_delta_chains(struct object_entry *entry)
* an extra "next" pointer to keep going after we reset cur->delta.
*/
for (cur = entry; cur; cur = next) {
- next = cur->delta;
+ next = DELTA(cur);
/*
* We should have a chain of zero or more ACTIVE states down to
@@ -1870,7 +1878,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
/* Now some size filtering heuristics. */
trg_size = trg_entry->size;
- if (!trg_entry->delta) {
+ if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
@@ -1946,7 +1954,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (!delta_buf)
return 0;
- if (trg_entry->delta) {
+ if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
if (delta_size == trg_entry->delta_size &&
src->depth + 1 >= trg->depth) {
@@ -1975,7 +1983,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(delta_buf);
}
- trg_entry->delta = src_entry;
+ SET_DELTA(trg_entry, src_entry);
trg_entry->delta_size = delta_size;
trg->depth = src->depth + 1;
@@ -1984,13 +1992,13 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
{
- struct object_entry *child = me->delta_child;
+ struct object_entry *child = DELTA_CHILD(me);
unsigned int m = n;
while (child) {
unsigned int c = check_delta_limit(child, n + 1);
if (m < c)
m = c;
- child = child->delta_sibling;
+ child = DELTA_SIBLING(child);
}
return m;
}
@@ -2059,7 +2067,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* otherwise they would become too deep.
*/
max_depth = depth;
- if (entry->delta_child) {
+ if (DELTA_CHILD(entry)) {
max_depth -= check_delta_limit(entry, 0);
if (max_depth <= 0)
goto next;
@@ -2109,7 +2117,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* depth, leaving it in the window is pointless. we
* should evict it first.
*/
- if (entry->delta && max_depth <= n->depth)
+ if (DELTA(entry) && max_depth <= n->depth)
continue;
/*
@@ -2117,7 +2125,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* currently deltified object, to keep it longer. It will
* be the first base object to be attempted next.
*/
- if (entry->delta) {
+ if (DELTA(entry)) {
struct unpacked swap = array[best_base];
int dist = (window + idx - best_base) % window;
int dst = best_base;
@@ -2438,7 +2446,7 @@ static void prepare_pack(int window, int depth)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = to_pack.objects + i;
- if (entry->delta)
+ if (DELTA(entry))
/* This happens if we decided to reuse existing
* delta from a pack. "reuse_delta &&" is implied.
*/
diff --git a/pack-objects.h b/pack-objects.h
index 59c44b3420..1c0ad4c9ef 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -72,11 +72,13 @@ struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
off_t in_pack_offset;
- struct object_entry *delta; /* delta base object */
- struct object_entry *delta_child; /* deltified objects who bases me */
- struct object_entry *delta_sibling; /* other deltified objects who
- * uses the same base as me
- */
+ uint32_t delta_idx; /* delta base object */
+ uint32_t delta_child_idx; /* deltified objects who bases me */
+ uint32_t delta_sibling_idx; /* other deltified objects who
+ * uses the same base as me
+ */
+ /* XXX 4 bytes hole, try to pack */
+
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
@@ -99,7 +101,7 @@ struct object_entry {
unsigned depth:OE_DEPTH_BITS;
- /* size: 112, padding: 4, bit_padding: 18 bits */
+ /* size: 104, padding: 4, bit_padding: 18 bits */
};
struct packing_data {
@@ -190,4 +192,61 @@ static inline void oe_set_in_pack(struct object_entry *e,
}
+static inline struct object_entry *oe_delta(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_idx)
+ return &pack->objects[e->delta_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_child(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_child_idx)
+ return &pack->objects[e->delta_child_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_child(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_child_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_child_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_sibling(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_sibling_idx)
+ return &pack->objects[e->delta_sibling_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_sibling(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_sibling_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_sibling_idx = 0;
+}
+
#endif
--
2.16.2.873.g32ff258c87
* Re: [PATCH/RFC v3 08/12] pack-objects: refer to delta objects by index instead of pointer
2018-03-08 11:42 ` [PATCH/RFC v3 08/12] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
@ 2018-03-14 16:18 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-14 16:18 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Notice that packing_data::nr_objects is uint32_t, we could only handle
> maximum 4G objects and can address all of them with an uint32_t. If we
> use a pointer here, we waste 4 bytes on 64 bit architecture.
Some things are left unsaid or left unclear and make readers stutter
a bit while reading this paragraph. We can address them with
uint32_t only because we happen to have a linear array of all
objects involved already, i.e. the pack->objects[] array. The
readers are forced to rephrase the above in their mind
    ... and each of them can be identified with an uint32_t.
    Because we have all of these objects in pack->objects[], we
    can replace the "delta" field in each object entry that
    points at its delta base object with uint32_t index into
    this array to save memory (on 64-bit arch, 8-byte pointer
    gets shrunk to 4-byte uint).
or something like that before understanding why this is a valid
memory footprint optimization.
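To make the point above concrete, here is a minimal sketch of the index-for-pointer trick, assuming cut-down stand-in types (`struct entry`, `struct pack`, `entry_delta()` and `entry_set_delta()` are illustrative names, not the ones in pack-objects.h). Index 0 is reserved for "no base", so the stored value is the array position plus one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the real structs. */
struct entry {
	uint32_t delta_idx;	/* 0 means "no delta base", else index + 1 */
};

struct pack {
	struct entry *objects;	/* linear array holding every entry */
	uint32_t nr_objects;
};

static struct entry *entry_delta(const struct pack *pack,
				 const struct entry *e)
{
	return e->delta_idx ? &pack->objects[e->delta_idx - 1] : NULL;
}

static void entry_set_delta(struct pack *pack, struct entry *e,
			    struct entry *base)
{
	if (base) {
		/* the trick only works if base lives inside pack->objects[] */
		assert(base >= pack->objects &&
		       base < pack->objects + pack->nr_objects);
		e->delta_idx = (uint32_t)(base - pack->objects) + 1;
	} else {
		e->delta_idx = 0;
	}
}
```

The cost of the scheme is that every lookup needs the owning array as an extra argument, which is presumably why the series hides the accessors behind macros such as DELTA() in builtin/pack-objects.c.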
* [PATCH/RFC v3 09/12] pack-objects: reorder 'hash' to pack struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (7 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 08/12] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 10/12] pack-objects: shrink z_delta_size field in " Nguyễn Thái Ngọc Duy
` (3 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/pack-objects.h b/pack-objects.h
index 1c0ad4c9ef..3c15cf7b23 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -77,12 +77,10 @@ struct object_entry {
uint32_t delta_sibling_idx; /* other deltified objects who
* uses the same base as me
*/
- /* XXX 4 bytes hole, try to pack */
-
+ uint32_t hash; /* name hint hash */
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
- uint32_t hash; /* name hint hash */
unsigned char in_pack_header_size; /* note: spare bits available! */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
unsigned type:TYPE_BITS;
@@ -101,7 +99,7 @@ struct object_entry {
unsigned depth:OE_DEPTH_BITS;
- /* size: 104, padding: 4, bit_padding: 18 bits */
+ /* size: 96, bit_padding: 18 bits */
};
struct packing_data {
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 10/12] pack-objects: shrink z_delta_size field in struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (8 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 09/12] pack-objects: reorder 'hash' to pack struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 11/12] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
` (2 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
We only cache a delta when it is smaller than a certain limit. This limit
defaults to 1000, yet we save the compressed length in a 64-bit field.
Shrink that field down to 16 bits, so we can only cache deltas whose
compressed length is at most 65535 bytes. Larger deltas must be
recomputed when the pack is written out.
This saves us 8 bytes (some from previous bit padding).
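The overflow handling described above can be sketched as follows. This is only an illustration under stated assumptions: `struct entry` and `entry_keep_compressed()` are hypothetical names, and the real code compresses the delta first and also adjusts the cache accounting under a lock. The key step is storing into the bitfield and comparing the stored value with the original:

```c
#include <assert.h>
#include <stdlib.h>

#define Z_DELTA_BITS 16	/* mirrors OE_Z_DELTA_BITS in the patch */

/* Hypothetical, cut-down entry: only the fields the check needs. */
struct entry {
	void *delta_data;			/* cached delta, or NULL */
	unsigned z_delta_size:Z_DELTA_BITS;	/* compressed length */
};

/*
 * Record the compressed length of e->delta_data. If the length does
 * not fit in the bitfield, drop the cached delta so that it gets
 * recomputed later. Returns 1 if the delta stays cached, 0 otherwise.
 */
static int entry_keep_compressed(struct entry *e, unsigned long z_size)
{
	assert(e->delta_data);		/* only called for cached deltas */
	e->z_delta_size = z_size;	/* keeps only the low 16 bits */
	if (e->z_delta_size == z_size)
		return 1;
	free(e->delta_data);
	e->delta_data = NULL;
	e->z_delta_size = 0;
	return 0;
}
```

Comparing after the store avoids hard-coding the limit in a second place: if Z_DELTA_BITS changes, the check follows automatically.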
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 3 ++-
builtin/pack-objects.c | 22 ++++++++++++++++------
pack-objects.h | 11 ++++++++---
3 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9bd3f5a789..00fa824448 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2449,7 +2449,8 @@ pack.deltaCacheLimit::
The maximum size of a delta, that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
- result once the best match for all objects is found. Defaults to 1000.
+ result once the best match for all objects is found.
+ Defaults to 1000. Maximum value is 65535.
pack.threads::
Specifies the number of threads to spawn when searching for best
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 82a4a95888..39920061e9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2105,12 +2105,19 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* between writes at that moment.
*/
if (entry->delta_data && !pack_to_stdout) {
- entry->z_delta_size = do_compress(&entry->delta_data,
- entry->delta_size);
- cache_lock();
- delta_cache_size -= entry->delta_size;
- delta_cache_size += entry->z_delta_size;
- cache_unlock();
+ unsigned long size;
+
+ size = do_compress(&entry->delta_data, entry->delta_size);
+ entry->z_delta_size = size;
+ if (entry->z_delta_size == size) {
+ cache_lock();
+ delta_cache_size -= entry->delta_size;
+ delta_cache_size += entry->z_delta_size;
+ cache_unlock();
+ } else {
+ FREE_AND_NULL(entry->delta_data);
+ entry->z_delta_size = 0;
+ }
}
/* if we made n a delta, and if n is already at max
@@ -3089,6 +3096,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (depth > (1 << OE_DEPTH_BITS))
die(_("delta chain depth %d is greater than maximum limit %d"),
depth, (1 << OE_DEPTH_BITS));
+ if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
+ die(_("pack.deltaCacheLimit is greater than maximum limit %d"),
+ 1 << OE_Z_DELTA_BITS);
argv_array_push(&rp, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index 3c15cf7b23..cbb39ab568 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -4,6 +4,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
+#define OE_Z_DELTA_BITS 16
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -80,7 +81,6 @@ struct object_entry {
uint32_t hash; /* name hint hash */
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
- unsigned long z_delta_size; /* delta data size (compressed) */
unsigned char in_pack_header_size; /* note: spare bits available! */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
unsigned type:TYPE_BITS;
@@ -93,13 +93,18 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
- unsigned dfs_state:OE_DFS_STATE_BITS;
/* XXX 8 bits hole, try to pack */
+ unsigned dfs_state:OE_DFS_STATE_BITS;
unsigned depth:OE_DEPTH_BITS;
+ /*
+ * if delta_data contains a compressed delta, this contains
+ * the compressed length
+ */
+ unsigned z_delta_size:OE_Z_DELTA_BITS;
- /* size: 96, bit_padding: 18 bits */
+ /* size: 88, bit_padding: 2 bits */
};
struct packing_data {
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 11/12] pack-objects: shrink size field in struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (9 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 10/12] pack-objects: shrink z_delta_size field in " Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-08 11:42 ` [PATCH/RFC v3 12/12] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
It's very, very rare that an uncompressed object is larger than
4GB (partly because Git does not handle those large files very well to
begin with). Let's optimize for the common case where the object size is
smaller than this limit.
Shrink the size field down to 32 bits [1] plus one overflow bit. If the
size is too large, we read it back from disk.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
There's no actual saving from this yet because of holes in the struct,
which should be gone in the next patch.
[1] it's actually already 32 bits on 64-bit Windows
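A minimal sketch of the cached-size scheme and the two compare helpers, under stated assumptions: `struct entry`, `real_size` and `slow_size_lookup()` are hypothetical stand-ins (the patch calls sha1_object_info() on the slow path), and sizes are modelled as uint64_t for portability. An entry without a valid cached size is known to exceed UINT32_MAX, so any limit at or below that value can be answered without touching the disk:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down entry; real_size stands in for the on-disk object. */
struct entry {
	uint64_t real_size;	/* pretend this is only reachable via slow I/O */
	uint32_t size_;
	unsigned size_valid:1;
};

static unsigned slow_lookups;	/* counts simulated disk reads */

static uint64_t slow_size_lookup(const struct entry *e)
{
	assert(!e->size_valid);	/* slow path is only for oversized entries */
	slow_lookups++;
	return e->real_size;
}

static void entry_set_size(struct entry *e, uint64_t size)
{
	e->real_size = size;
	e->size_ = (uint32_t)size;
	e->size_valid = e->size_ == size;	/* 0 if the size was truncated */
}

static uint64_t entry_size(const struct entry *e)
{
	return e->size_valid ? e->size_ : slow_size_lookup(e);
}

static int entry_size_greater_than(const struct entry *e, uint64_t limit)
{
	if (e->size_valid)
		return e->size_ > limit;
	if (limit <= UINT32_MAX)	/* size > UINT32_MAX >= limit */
		return 1;
	return entry_size(e) > limit;
}

static int entry_size_less_than(const struct entry *e, uint64_t limit)
{
	if (e->size_valid)
		return e->size_ < limit;
	if (limit <= UINT32_MAX)	/* size > UINT32_MAX >= limit */
		return 0;
	return entry_size(e) < limit;
}
```

With a threshold such as core.bigFileThreshold, which is normally far below 4GB, both helpers resolve oversized entries on the fast path and the slow lookup is only needed when the exact size is requested.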
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 49 ++++++++++++++++++++++++++----------------
pack-objects.h | 48 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 77 insertions(+), 20 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39920061e9..db040e95db 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -274,7 +274,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
if (!usable_delta) {
if (entry->type == OBJ_BLOB &&
- entry->size > big_file_threshold &&
+ oe_size_greater_than(entry, big_file_threshold) &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
else {
@@ -384,12 +384,13 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
+ unsigned long entry_size = oe_size(entry);
if (DELTA(entry))
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
- type, entry->size);
+ type, entry_size);
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -406,7 +407,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
datalen -= entry->in_pack_header_size;
if (!pack_to_stdout && p->index_version == 1 &&
- check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) {
+ check_pack_inflate(p, &w_curs, offset, datalen, entry_size)) {
error("corrupt packed object for %s",
oid_to_hex(&entry->idx.oid));
unuse_pack(&w_curs);
@@ -1412,6 +1413,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
+ unsigned long size;
+
if (IN_PACK(entry)) {
struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
@@ -1431,13 +1434,14 @@ static void check_object(struct object_entry *entry)
*/
used = unpack_object_header_buffer(buf, avail,
&type,
- &entry->size);
+ &size);
if (used == 0)
goto give_up;
if (type < 0)
die("BUG: invalid type %d", type);
entry->in_pack_type = type;
+ oe_set_size(entry, size);
/*
* Determine if this is a delta and if so whether we can
@@ -1505,7 +1509,7 @@ static void check_object(struct object_entry *entry)
*/
entry->type = entry->in_pack_type;
SET_DELTA(entry, base_entry);
- entry->delta_size = entry->size;
+ entry->delta_size = oe_size(entry);
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1513,14 +1517,17 @@ static void check_object(struct object_entry *entry)
}
if (entry->type) {
+ unsigned long size;
+
+ size = get_size_from_delta(p, &w_curs,
+ entry->in_pack_offset + entry->in_pack_header_size);
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
* object size from the delta header.
*/
- entry->size = get_size_from_delta(p, &w_curs,
- entry->in_pack_offset + entry->in_pack_header_size);
- if (entry->size == 0)
+ oe_set_size(entry, size);
+ if (oe_size_less_than(entry, 1))
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1535,13 +1542,14 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- entry->type = sha1_object_info(entry->idx.oid.hash, &entry->size);
+ entry->type = sha1_object_info(entry->idx.oid.hash, &size);
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
* as a preferred base. Doing so can result in a larger
* pack file, but the transfer will still take place.
*/
+ oe_set_size(entry, size);
}
static int pack_offset_sort(const void *_a, const void *_b)
@@ -1581,6 +1589,7 @@ static void drop_reused_delta(struct object_entry *entry)
unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
+ unsigned long size;
while (*idx) {
struct object_entry *oe = &to_pack.objects[*idx - 1];
@@ -1593,7 +1602,7 @@ static void drop_reused_delta(struct object_entry *entry)
SET_DELTA(entry, NULL);
entry->depth = 0;
- oi.sizep = &entry->size;
+ oi.sizep = &size;
oi.typep = &type;
if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
@@ -1603,11 +1612,13 @@ static void drop_reused_delta(struct object_entry *entry)
* and dealt with in prepare_pack().
*/
entry->type = sha1_object_info(entry->idx.oid.hash,
- &entry->size);
+ &size);
+ oe_set_size(entry, size);
} else {
if (type < 0)
die("BUG: invalid type %d", type);
entry->type = type;
+ oe_set_size(entry, size);
}
}
@@ -1748,7 +1759,7 @@ static void get_object_details(void)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
- if (big_file_threshold < entry->size)
+ if (oe_size_greater_than(entry, big_file_threshold))
entry->no_try_delta = 1;
}
@@ -1775,6 +1786,8 @@ static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ unsigned long a_size = oe_size(a);
+ unsigned long b_size = oe_size(b);
if (a->type > b->type)
return -1;
@@ -1788,9 +1801,9 @@ static int type_size_sort(const void *_a, const void *_b)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
- if (a->size > b->size)
+ if (a_size > b_size)
return -1;
- if (a->size < b->size)
+ if (a_size < b_size)
return 1;
return a < b ? -1 : (a > b); /* newest first */
}
@@ -1877,7 +1890,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
return 0;
/* Now some size filtering heuristics. */
- trg_size = trg_entry->size;
+ trg_size = oe_size(trg_entry);
if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
@@ -1889,7 +1902,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
(max_depth - ref_depth + 1);
if (max_size == 0)
return 0;
- src_size = src_entry->size;
+ src_size = oe_size(src_entry);
sizediff = src_size < trg_size ? trg_size - src_size : 0;
if (sizediff >= max_size)
return 0;
@@ -2009,7 +2022,7 @@ static unsigned long free_unpacked(struct unpacked *n)
free_delta_index(n->index);
n->index = NULL;
if (n->data) {
- freed_mem += n->entry->size;
+ freed_mem += oe_size(n->entry);
FREE_AND_NULL(n->data);
}
n->entry = NULL;
@@ -2459,7 +2472,7 @@ static void prepare_pack(int window, int depth)
*/
continue;
- if (entry->size < 50)
+ if (oe_size_less_than(entry, 50))
continue;
if (entry->no_try_delta)
diff --git a/pack-objects.h b/pack-objects.h
index cbb39ab568..0253df6cd4 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -71,7 +71,11 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- unsigned long size; /* uncompressed size */
+ /* object uncompressed size _if_ size_valid is true */
+ uint32_t size_;
+
+ /* XXX 4 bytes hole, try to pack */
+
off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
@@ -93,6 +97,7 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+ unsigned size_valid:1;
/* XXX 8 bits hole, try to pack */
@@ -104,7 +109,7 @@ struct object_entry {
*/
unsigned z_delta_size:OE_Z_DELTA_BITS;
- /* size: 88, bit_padding: 2 bits */
+ /* size: 88, bit_padding: 1 bits */
};
struct packing_data {
@@ -252,4 +257,43 @@ static inline void oe_set_delta_sibling(struct packing_data *pack,
e->delta_sibling_idx = 0;
}
+static inline unsigned long oe_size(const struct object_entry *e)
+{
+ if (e->size_valid) {
+ return e->size_;
+ } else {
+ unsigned long size;
+
+ sha1_object_info(e->idx.oid.hash, &size);
+ return size;
+ }
+}
+
+static inline int oe_size_less_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ < limit;
+ if (limit <= maximum_unsigned_value_of_type(uint32_t))
+ return 0;
+ return oe_size(e) < limit;
+}
+
+static inline int oe_size_greater_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ > limit;
+ if (limit <= maximum_unsigned_value_of_type(uint32_t))
+ return 1;
+ return oe_size(e) > limit;
+}
+
+static inline void oe_set_size(struct object_entry *e,
+ unsigned long size)
+{
+ e->size_ = size;
+ e->size_valid = e->size_ == size;
+}
+
#endif
--
2.16.2.873.g32ff258c87
* [PATCH/RFC v3 12/12] pack-objects: shrink delta_size field in struct object_entry
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (10 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 11/12] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-08 11:42 ` Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-08 11:42 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Allowing a delta size of 64 bits is crazy. Shrink this field down to
31 bits with one overflow bit.
If we encounter an existing delta larger than 2GB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
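The 31-bit field plus validity bit can be sketched like this, assuming a cut-down `struct entry` in which `object_size` stands in for whatever oe_size() would return (all names here are illustrative). The fallback is only sound when an oversized delta size equals the entry size, which the setter checks:

```c
#include <assert.h>
#include <stdint.h>

#define DELTA_SIZE_BITS 31	/* mirrors OE_DELTA_SIZE_BITS */

/* Hypothetical, cut-down entry; object_size stands in for oe_size(). */
struct entry {
	uint64_t object_size;
	uint32_t delta_size_:DELTA_SIZE_BITS;
	uint32_t delta_size_valid:1;
};

static uint64_t entry_delta_size(const struct entry *e)
{
	return e->delta_size_valid ? e->delta_size_ : e->object_size;
}

static void entry_set_delta_size(struct entry *e, uint64_t size)
{
	e->delta_size_ = size;	/* keeps only the low 31 bits */
	e->delta_size_valid = e->delta_size_ == size;
	/* an oversized value is only recoverable if it equals the entry size */
	assert(e->delta_size_valid || size == e->object_size);
}
```

Both fields share one 32-bit storage unit, so the validity flag costs no extra space compared with a plain uint32_t.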
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 24 ++++++++++++++----------
pack-objects.h | 30 +++++++++++++++++++++++++-----
2 files changed, 39 insertions(+), 15 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index db040e95db..0f65e0f243 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,10 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA_SIZE(obj) oe_delta_size(&to_pack, obj)
#define DELTA(obj) oe_delta(&to_pack, obj)
#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(&to_pack, obj, val)
#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
@@ -140,7 +142,7 @@ static void *get_delta(struct object_entry *entry)
oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
- if (!delta_buf || delta_size != entry->delta_size)
+ if (!delta_buf || delta_size != DELTA_SIZE(entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -291,14 +293,14 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -1509,7 +1511,7 @@ static void check_object(struct object_entry *entry)
*/
entry->type = entry->in_pack_type;
SET_DELTA(entry, base_entry);
- entry->delta_size = oe_size(entry);
+ SET_DELTA_SIZE(entry, oe_size(entry));
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1895,7 +1897,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
- max_size = trg_entry->delta_size;
+ max_size = DELTA_SIZE(trg_entry);
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
@@ -1966,10 +1968,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
+ if (delta_size >= maximum_unsigned_value_of_type(uint32_t))
+ return 0;
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
- if (delta_size == trg_entry->delta_size &&
+ if (delta_size == DELTA_SIZE(trg_entry) &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
@@ -1984,7 +1988,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
- delta_cache_size -= trg_entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(trg_entry);
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
@@ -1997,7 +2001,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
}
SET_DELTA(trg_entry, src_entry);
- trg_entry->delta_size = delta_size;
+ SET_DELTA_SIZE(trg_entry, delta_size);
trg->depth = src->depth + 1;
return 1;
@@ -2120,11 +2124,11 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
if (entry->delta_data && !pack_to_stdout) {
unsigned long size;
- size = do_compress(&entry->delta_data, entry->delta_size);
+ size = do_compress(&entry->delta_data, DELTA_SIZE(entry));
entry->z_delta_size = size;
if (entry->z_delta_size == size) {
cache_lock();
- delta_cache_size -= entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(entry);
delta_cache_size += entry->z_delta_size;
cache_unlock();
} else {
diff --git a/pack-objects.h b/pack-objects.h
index 0253df6cd4..f1a82bf9ac 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -5,6 +5,7 @@
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
#define OE_Z_DELTA_BITS 16
+#define OE_DELTA_SIZE_BITS 31
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -73,9 +74,6 @@ struct object_entry {
struct pack_idx_entry idx;
/* object uncompressed size _if_ size_valid is true */
uint32_t size_;
-
- /* XXX 4 bytes hole, try to pack */
-
off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
@@ -84,7 +82,10 @@ struct object_entry {
*/
uint32_t hash; /* name hint hash */
void *delta_data; /* cached delta (uncompressed) */
- unsigned long delta_size; /* delta data size (uncompressed) */
+ /* object uncompressed size _if_ size_valid is true */
+ uint32_t size;
+ uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+ uint32_t delta_size_valid:1;
unsigned char in_pack_header_size; /* note: spare bits available! */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
unsigned type:TYPE_BITS;
@@ -109,7 +110,7 @@ struct object_entry {
*/
unsigned z_delta_size:OE_Z_DELTA_BITS;
- /* size: 88, bit_padding: 1 bits */
+ /* size: 80, bit_padding: 1 bits */
};
struct packing_data {
@@ -296,4 +297,23 @@ static inline void oe_set_size(struct object_entry *e,
e->size_valid = e->size_ == size;
}
+static inline unsigned long oe_delta_size(struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_size_valid)
+ return e->delta_size_;
+ return oe_size(e);
+}
+
+static inline void oe_set_delta_size(struct packing_data *pack,
+ struct object_entry *e,
+ unsigned long size)
+{
+ e->delta_size_ = size;
+ e->delta_size_valid =e->delta_size_ == size;
+ if (!e->delta_size_valid && size != oe_size(e))
+ die("BUG: this can only happen in check_object() "
+ "where delta size is the same as entry size");
+}
+
#endif
--
2.16.2.873.g32ff258c87
* [PATCH v4 00/11] nd/pack-objects-pack-struct updates
2018-03-08 11:42 ` [PATCH/RFC v3 00/12] " Nguyễn Thái Ngọc Duy
` (11 preceding siblings ...)
2018-03-08 11:42 ` [PATCH/RFC v3 12/12] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
` (11 more replies)
12 siblings, 12 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The most important change in v4 is that it fixes a case where I failed
to propagate an error condition to later code in 02/11. This results in
new wrappers, oe_type() and oe_set_type(). This also reveals another
object type, OBJ_NONE, that is used by pack-objects as well.
Other changes are comment fixes, commit message fixes and off-by-one
bug fixes. There are no additional savings compared to v3.
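As an illustration of the kind of off-by-one involved (the depth limit check in the interdiff below changes from `>` to `>=`), here is a small sketch with hypothetical names. An N-bit unsigned bitfield holds 0 through (1 << N) - 1, so the value 1 << N itself must already be rejected:

```c
#include <assert.h>

#define DEPTH_BITS 12	/* mirrors OE_DEPTH_BITS */

/* Hypothetical, cut-down entry with only the depth field. */
struct entry {
	unsigned depth:DEPTH_BITS;	/* holds 0 .. (1 << DEPTH_BITS) - 1 */
};

/* Returns 1 if depth survives a round trip through the bitfield. */
static int depth_fits(int depth)
{
	return depth >= 0 && depth < (1 << DEPTH_BITS);
}

static void entry_set_depth(struct entry *e, int depth)
{
	assert(depth_fits(depth));
	e->depth = depth;
}
```

A check written as `depth > (1 << DEPTH_BITS)` would let 4096 through, and storing 4096 in a 12-bit field silently yields 0.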
I also changed my approach a bit. I stopped trying to make the struct
size reduction visible in every patch. All these patches shrink some
field even if the struct size stays the same. The reordering and packing
happen in the last patch.
I'm not super happy that many corner cases of my changes are not
covered by the test suite. In many cases it's very hard or expensive
to create the right error condition. If only this code were part of
libgit.a and I could write C unit tests for it...
Interdiff
-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0f65e0f243..c388d87c3e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -275,7 +275,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
struct git_istream *st = NULL;
if (!usable_delta) {
- if (entry->type == OBJ_BLOB &&
+ if (oe_type(entry) == OBJ_BLOB &&
oe_size_greater_than(entry, big_file_threshold) &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
@@ -381,7 +381,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
- enum object_type type = entry->type;
+ enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
@@ -491,11 +491,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0; /* explicit */
else if (!IN_PACK(entry))
to_reuse = 0; /* can't reuse what we don't have */
- else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+ else if (oe_type(entry) == OBJ_REF_DELTA ||
+ oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
- else if (entry->type != entry->in_pack_type)
+ else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
else if (DELTA(entry))
to_reuse = 0; /* we want to pack afresh */
@@ -716,8 +717,8 @@ static struct object_entry **compute_write_order(void)
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_COMMIT &&
- objects[i].type != OBJ_TAG)
+ if (oe_type(&objects[i]) != OBJ_COMMIT &&
+ oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -726,7 +727,7 @@ static struct object_entry **compute_write_order(void)
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_TREE)
+ if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -1083,8 +1084,7 @@ static void create_object_entry(const struct object_id *oid,
entry = packlist_alloc(&to_pack, oid->hash, index_pos);
entry->hash = hash;
- if (type)
- entry->type = type;
+ oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1453,9 +1453,9 @@ static void check_object(struct object_entry *entry)
switch (entry->in_pack_type) {
default:
/* Not a delta hence we've already got all we need. */
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->in_pack_header_size = used;
- if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
+ if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > OBJ_BLOB)
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1509,7 +1509,7 @@ static void check_object(struct object_entry *entry)
* deltify other objects against, in order to avoid
* circular deltas.
*/
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
SET_DELTA_SIZE(entry, oe_size(entry));
entry->delta_sibling_idx = base_entry->delta_child_idx;
@@ -1518,7 +1518,7 @@ static void check_object(struct object_entry *entry)
return;
}
- if (entry->type) {
+ if (oe_type(entry)) {
unsigned long size;
size = get_size_from_delta(p, &w_curs,
@@ -1544,14 +1544,15 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- entry->type = sha1_object_info(entry->idx.oid.hash, &size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
* as a preferred base. Doing so can result in a larger
* pack file, but the transfer will still take place.
*/
- oe_set_size(entry, size);
+ if (entry->type_valid)
+ oe_set_size(entry, size);
}
static int pack_offset_sort(const void *_a, const void *_b)
@@ -1613,15 +1614,12 @@ static void drop_reused_delta(struct object_entry *entry)
* And if that fails, the error will be recorded in entry->type
* and dealt with in prepare_pack().
*/
- entry->type = sha1_object_info(entry->idx.oid.hash,
- &size);
- oe_set_size(entry, size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
+ &size));
} else {
- if (type < 0)
- die("BUG: invalid type %d", type);
- entry->type = type;
- oe_set_size(entry, size);
+ oe_set_type(entry, type);
}
+ oe_set_size(entry, size);
}
/*
@@ -1788,12 +1786,14 @@ static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ enum object_type a_type = oe_type(a);
+ enum object_type b_type = oe_type(b);
unsigned long a_size = oe_size(a);
unsigned long b_size = oe_size(b);
- if (a->type > b->type)
+ if (a_type > b_type)
return -1;
- if (a->type < b->type)
+ if (a_type < b_type)
return 1;
if (a->hash > b->hash)
return -1;
@@ -1869,7 +1869,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
void *delta_buf;
/* Don't bother doing diffs between different types */
- if (trg_entry->type != src_entry->type)
+ if (oe_type(trg_entry) != oe_type(src_entry))
return -1;
/*
@@ -2484,11 +2484,11 @@ static void prepare_pack(int window, int depth)
if (!entry->preferred_base) {
nr_deltas++;
- if (entry->type < 0)
+ if (oe_type(entry) < 0)
die("unable to get type of object %s",
oid_to_hex(&entry->idx.oid));
} else {
- if (entry->type < 0) {
+ if (oe_type(entry) < 0) {
/*
* This object is not found, but we
* don't have to include it anyway.
@@ -2597,7 +2597,7 @@ static void read_object_list_from_stdin(void)
die("expected object ID, got garbage:\n %s", line);
add_preferred_base_object(p + 1);
- add_object_entry(&oid, 0, p + 1, 0);
+ add_object_entry(&oid, OBJ_NONE, p + 1, 0);
}
}
@@ -3110,7 +3110,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
- if (depth > (1 << OE_DEPTH_BITS))
+ if (depth >= (1 << OE_DEPTH_BITS))
die(_("delta chain depth %d is greater than maximum limit %d"),
depth, (1 << OE_DEPTH_BITS));
if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 256a63f892..f7c897515b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -66,12 +66,12 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
oe_set_in_pack_pos(to_pack, entry, i);
- switch (entry->type) {
+ switch (oe_type(entry)) {
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
- real_type = entry->type;
+ real_type = oe_type(entry);
break;
default:
@@ -100,7 +100,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
default:
die("Missing type information for %s (%d/%d)",
oid_to_hex(&entry->idx.oid), real_type,
- entry->type);
+ oe_type(entry));
}
}
}
diff --git a/pack-objects.h b/pack-objects.h
index 1a159aba37..0fa0c83294 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -28,7 +28,7 @@ enum dfs_state {
* basic object info
* -----------------
* idx.oid is filled up before delta searching starts. idx.crc32 and
- * is only valid after the object is written down and will be used for
+ * is only valid after the object is written out and will be used for
* generating the index. idx.offset will be both gradually set and
* used in writing phase (base objects get offset first, then deltas
* refer to them)
@@ -59,8 +59,8 @@ enum dfs_state {
* compute_write_order(). "delta" and "delta_size" must remain valid
* at object writing phase in case the delta is not cached.
*
- * If a delta is cached in memory and is compressed delta points to
- * the data and z_delta_size contains the compressed size. If it's
+ * If a delta is cached in memory and is compressed delta_data points
+ * to the data and z_delta_size contains the compressed size. If it's
* uncompressed [1], z_delta_size must be zero. delta_size is always
* the uncompressed size and must be valid even if the delta is not
* cached.
@@ -70,23 +70,22 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- /* object uncompressed size _if_ size_valid is true */
- uint32_t size_;
+ void *delta_data; /* cached delta (uncompressed) */
off_t in_pack_offset;
+ uint32_t hash; /* name hint hash */
+ uint32_t size_; /* object uncompressed size _if_ size_valid is true */
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who
* uses the same base as me
*/
- uint32_t hash; /* name hint hash */
- void *delta_data; /* cached delta (uncompressed) */
- /* object uncompressed size _if_ size_valid is true */
- uint32_t size;
- uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+ uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
uint32_t delta_size_valid:1;
- unsigned char in_pack_header_size; /* note: spare bits available! */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
- unsigned type:TYPE_BITS;
+ unsigned size_valid:1;
+ unsigned z_delta_size:OE_Z_DELTA_BITS;
+ unsigned type_valid:1;
+ unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned preferred_base:1; /*
* we do not pack this, but is available
@@ -94,21 +93,13 @@ struct object_entry {
* objects against.
*/
unsigned no_try_delta:1;
+ unsigned char in_pack_header_size;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
- unsigned size_valid:1;
-
- /* XXX 8 bits hole, try to pack */
-
unsigned dfs_state:OE_DFS_STATE_BITS;
unsigned depth:OE_DEPTH_BITS;
- /*
- * if delta_data contains a compressed delta, this contains
- * the compressed length
- */
- unsigned z_delta_size:OE_Z_DELTA_BITS;
- /* size: 80, bit_padding: 1 bits */
+ /* size: 80, bit_padding: 16 bits */
};
struct packing_data {
@@ -151,6 +142,21 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}
+static inline enum object_type oe_type(const struct object_entry *e)
+{
+ return e->type_valid ? e->type_ : OBJ_BAD;
+}
+
+static inline void oe_set_type(struct object_entry *e,
+ enum object_type type)
+{
+ if (type >= OBJ_ANY)
+ die("BUG: OBJ_ANY cannot be set in pack-objects code");
+
+ e->type_valid = type >= OBJ_NONE;
+ e->type_ = (unsigned)type;
+}
+
static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
const struct object_entry *e)
{
-- 8< --
Nguyễn Thái Ngọc Duy (11):
pack-objects: a bit of document about struct object_entry
pack-objects: turn type and in_pack_type to bitfields
pack-objects: use bitfield for object_entry::dfs_state
pack-objects: use bitfield for object_entry::depth
pack-objects: move in_pack_pos out of struct object_entry
pack-objects: move in_pack out of struct object_entry
pack-objects: refer to delta objects by index instead of pointer
pack-objects: shrink z_delta_size field in struct object_entry
pack-objects: shrink size field in struct object_entry
pack-objects: shrink delta_size field in struct object_entry
pack-objects.h: reorder members to shrink struct object_entry
Documentation/config.txt | 4 +-
Documentation/git-pack-objects.txt | 13 +-
Documentation/git-repack.txt | 4 +-
builtin/pack-objects.c | 309 +++++++++++++++++------------
cache.h | 3 +
object.h | 1 -
pack-bitmap-write.c | 14 +-
pack-bitmap.c | 2 +-
pack-bitmap.h | 4 +-
pack-objects.h | 294 ++++++++++++++++++++++++---
10 files changed, 488 insertions(+), 160 deletions(-)
--
2.16.2.903.gd04caf5039
* [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 20:32 ` Junio C Hamano
2018-03-16 18:31 ` [PATCH v4 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
` (10 subsequent siblings)
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/pack-objects.h b/pack-objects.h
index 03f1191659..85345a4af1 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,50 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+/*
+ * basic object info
+ * -----------------
+ * idx.oid is filled up before delta searching starts. idx.crc32 and
+ * is only valid after the object is written out and will be used for
+ * generating the index. idx.offset will be both gradually set and
+ * used in writing phase (base objects get offset first, then deltas
+ * refer to them)
+ *
+ * "size" is the uncompressed object size. Compressed size is not
+ * cached (ie. raw data in a pack) but available via revindex.
+ *
+ * "hash" contains a path name hash which is used for sorting the
+ * delta list and also during delta searching. Once prepare_pack()
+ * returns it's no longer needed.
+ *
+ * source pack info
+ * ----------------
+ * The (in_pack, in_pack_offset, in_pack_header_size) tuple contains
+ * the location of the object in the source pack, with or without
+ * header.
+ *
+ * "type" and "in_pack_type" both describe object type. in_pack_type
+ * may contain a delta type, while type is always the canonical type.
+ *
+ * deltas
+ * ------
+ * Delta links (delta, delta_child and delta_sibling) are created
+ * reflect that delta graph from the source pack then updated or added
+ * during delta searching phase when we find better deltas.
+ *
+ * delta_child and delta_sibling are last needed in
+ * compute_write_order(). "delta" and "delta_size" must remain valid
+ * at object writing phase in case the delta is not cached.
+ *
+ * If a delta is cached in memory and is compressed delta_data points
+ * to the data and z_delta_size contains the compressed size. If it's
+ * uncompressed [1], z_delta_size must be zero. delta_size is always
+ * the uncompressed size and must be valid even if the delta is not
+ * cached.
+ *
+ * [1] during try_delta phase we don't bother with compressing because
+ * the delta could be quickly replaced with a better one.
+ */
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
--
2.16.2.903.gd04caf5039
* Re: [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry
2018-03-16 18:31 ` [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-16 20:32 ` Junio C Hamano
2018-03-17 11:59 ` Duy Nguyen
0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 20:32 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> The role of this comment block becomes more important after we shuffle
> fields around to shrink this struct. It will be much harder to see what
> field is related to what.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> pack-objects.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/pack-objects.h b/pack-objects.h
> index 03f1191659..85345a4af1 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -1,6 +1,50 @@
> #ifndef PACK_OBJECTS_H
> #define PACK_OBJECTS_H
>
> +/*
> + * basic object info
> + * -----------------
> + * idx.oid is filled up before delta searching starts. idx.crc32 and
> + * is only valid after the object is written out and will be used for
"and is"?
> + * generating the index. idx.offset will be both gradually set and
> + * used in writing phase (base objects get offset first, then deltas
> + * refer to them)
> + *
> + * "size" is the uncompressed object size. Compressed size is not
> + * cached (ie. raw data in a pack) but available via revindex.
I am having a hard time understanding what "ie. raw data in a pack"
is doing in that sentence.
It is correct that compressed size is not cached; it does not even
exist and the only way to know it is to compute it by reversing the
.idx file (or actually uncompressing the compressed stream).
Perhaps:
Compressed size of the raw data for an object in a pack is not
stored anywhere but is computed and made available when reverse
.idx is made.
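As a rough illustration of the suggestion above, the on-disk size of an object can be derived from a list of pack offsets sorted in ascending order: it is the distance to the next object's offset. This is only a sketch under stated assumptions. demo_in_pack_size() and its parameters are hypothetical names, not git's revindex API, and trailer_offset is assumed to be where the pack's trailing checksum begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: "sorted" holds the offsets of all objects in a
 * pack in ascending order (what a reverse index provides), and
 * "trailer_offset" is assumed to be where the pack trailer starts.
 * Returns the number of bytes the object at "offset" occupies in the
 * pack (header included), or -1 if no object starts at that offset.
 */
static int64_t demo_in_pack_size(const int64_t *sorted, size_t nr,
				 int64_t trailer_offset, int64_t offset)
{
	size_t lo = 0, hi = nr;

	/* binary search for the first entry that is >= offset */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (sorted[mid] < offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == nr || sorted[lo] != offset)
		return -1;

	/* the object ends where the next one (or the trailer) begins */
	return (lo + 1 < nr ? sorted[lo + 1] : trailer_offset) - offset;
}
```

The result counts the in-pack header as well as the compressed stream, so a caller interested only in the zlib data would subtract the header size.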
> + * "hash" contains a path name hash which is used for sorting the
> + * delta list and also during delta searching. Once prepare_pack()
> + * returns it's no longer needed.
Hmm, that suggests an interesting optimization opportunity ;-)
> + * source pack info
> + * ----------------
> + * The (in_pack, in_pack_offset, in_pack_header_size) tuple contains
> + * the location of the object in the source pack, with or without
> + * header.
"with or without", meaning...? An object in the source pack may or
may not have any in_pack_header, in which case in_pack_header_size
is zero, or something? Not suggesting to rephrase (at least not
yet), but trying to understand.
> + * "type" and "in_pack_type" both describe object type. in_pack_type
> + * may contain a delta type, while type is always the canonical type.
> + *
> + * deltas
> + * ------
> + * Delta links (delta, delta_child and delta_sibling) are created
> + * reflect that delta graph from the source pack then updated or added
> + * during delta searching phase when we find better deltas.
Isn't anything missing after "are created"? Perhaps "to"?
> + *
> + * delta_child and delta_sibling are last needed in
> + * compute_write_order(). "delta" and "delta_size" must remain valid
> + * at object writing phase in case the delta is not cached.
True. I thought child and sibling are only needed during write
order computing, so there may be an optimization opportunity there.
> + * If a delta is cached in memory and is compressed delta_data points
s/compressed delta_data/compressed, delta_data/;
> + * to the data and z_delta_size contains the compressed size. If it's
> + * uncompressed [1], z_delta_size must be zero. delta_size is always
> + * the uncompressed size and must be valid even if the delta is not
> + * cached.
> + *
> + * [1] during try_delta phase we don't bother with compressing because
> + * the delta could be quickly replaced with a better one.
> + */
> struct object_entry {
> struct pack_idx_entry idx;
> unsigned long size; /* uncompressed size */
* Re: [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry
2018-03-16 20:32 ` Junio C Hamano
@ 2018-03-17 11:59 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-17 11:59 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Fri, Mar 16, 2018 at 9:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +/*
>> + * basic object info
>> + * -----------------
>> + * idx.oid is filled up before delta searching starts. idx.crc32 and
>> + * is only valid after the object is written out and will be used for
>
> "and is"?
There was another field that I thought was only valid after blah blah.
But it was wrong and I forgot to delete this "and" after deleting that
field.
>> + * "hash" contains a path name hash which is used for sorting the
>> + * delta list and also during delta searching. Once prepare_pack()
>> + * returns it's no longer needed.
>
> Hmm, that suggests an interesting optimization opportunity ;-)
Heh.. it does not reduce peak memory consumption though which is why
I'm less interested in freeing it after prepare_pack().
>> + * source pack info
>> + * ----------------
>> + * The (in_pack, in_pack_offset, in_pack_header_size) tuple contains
>> + * the location of the object in the source pack, with or without
>> + * header.
>
> "with or without", meaning...? An object in the source pack may or
> may not have any in_pack_header, in which case in_pack_header_size
> is zero, or something? Not suggesting to rephrase (at least not
> yet), but trying to understand.
The location with the header (i.e. the true beginning of an object in a pack)
or without/after the header so you are at the zlib stream, ready to
inflate or reuse. I'll rephrase this a bit.
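A minimal sketch of the two positions described here, assuming the data position is simply the object's offset plus its header size. struct demo_location and both helpers are hypothetical names used for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the location fields of struct object_entry. */
struct demo_location {
	int64_t in_pack_offset;            /* where the object header starts */
	unsigned char in_pack_header_size; /* length of that header in bytes */
};

/* "With header": the true beginning of the object in the source pack. */
static int64_t demo_object_start(const struct demo_location *l)
{
	return l->in_pack_offset;
}

/* "Without header": the first byte of the zlib stream after the header. */
static int64_t demo_data_start(const struct demo_location *l)
{
	return l->in_pack_offset + l->in_pack_header_size;
}
```

For an object at offset 1000 with a 2-byte header, the first helper yields 1000 and the second 1002.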
>> + *
>> + * delta_child and delta_sibling are last needed in
>> + * compute_write_order(). "delta" and "delta_size" must remain valid
>> + * at object writing phase in case the delta is not cached.
>
> True. I thought child and sibling are only needed during write
> order computing, so there may be an optimization opportunity there.
See. I wrote all this for a reason. Somebody looking for low-hanging
fruit can always find some ;-)
--
Duy
* [PATCH v4 02/11] pack-objects: turn type and in_pack_type to bitfields
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 20:49 ` Junio C Hamano
2018-03-16 18:31 ` [PATCH v4 03/11] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
` (9 subsequent siblings)
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as a "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry(), where the current code skips
setting the "type" field if the incoming type is zero. This does not
have any bad side effects because the "type" field should be
memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it has been
for a very long time and I haven't had time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
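To make the type_valid encoding concrete, here is a minimal standalone sketch. struct demo_entry and the demo_ prefixed helpers are hypothetical stand-ins for struct object_entry, oe_type() and oe_set_type(). The enum values other than OBJ_BAD and OBJ_NONE are assumed to match cache.h, and assert() stands in for die().

```c
#include <assert.h>

#define TYPE_BITS 3

/* Assumed to mirror cache.h; only OBJ_BAD/OBJ_NONE appear in this patch. */
enum object_type {
	OBJ_BAD = -1,
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4,
	OBJ_OFS_DELTA = 6,
	OBJ_REF_DELTA = 7,
	OBJ_ANY,
	OBJ_MAX
};

/* Hypothetical stand-in for the relevant bits of struct object_entry. */
struct demo_entry {
	unsigned type_:TYPE_BITS;
	unsigned in_pack_type:TYPE_BITS;
	unsigned type_valid:1;
};

/* A negative type cannot live in an unsigned bitfield, so the
 * separate type_valid bit decides whether type_ means anything. */
static enum object_type demo_oe_type(const struct demo_entry *e)
{
	return e->type_valid ? (enum object_type)e->type_ : OBJ_BAD;
}

static void demo_oe_set_type(struct demo_entry *e, enum object_type type)
{
	assert(type < OBJ_ANY); /* stands in for die("BUG: ...") */

	/* OBJ_NONE (zero) counts as valid; only negative types do not */
	e->type_valid = type >= OBJ_NONE;
	e->type_ = (unsigned)type;
}
```

A zero-initialized entry reads back as OBJ_BAD until the setter runs, which is why create_object_entry() has to call the setter even when the incoming type is OBJ_NONE.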
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 58 +++++++++++++++++++++++++-----------------
cache.h | 2 ++
object.h | 1 -
pack-bitmap-write.c | 6 ++---
pack-objects.h | 20 +++++++++++++--
5 files changed, 57 insertions(+), 30 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5c674b2843..13f6a44fb2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -265,7 +265,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
struct git_istream *st = NULL;
if (!usable_delta) {
- if (entry->type == OBJ_BLOB &&
+ if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
- enum object_type type = entry->type;
+ enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0; /* explicit */
else if (!entry->in_pack)
to_reuse = 0; /* can't reuse what we don't have */
- else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+ else if (oe_type(entry) == OBJ_REF_DELTA ||
+ oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
- else if (entry->type != entry->in_pack_type)
+ else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0; /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_COMMIT &&
- objects[i].type != OBJ_TAG)
+ if (oe_type(&objects[i]) != OBJ_COMMIT &&
+ oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_TREE)
+ if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -1066,8 +1067,7 @@ static void create_object_entry(const struct object_id *oid,
entry = packlist_alloc(&to_pack, oid->hash, index_pos);
entry->hash = hash;
- if (type)
- entry->type = type;
+ oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry)
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
+ enum object_type type;
buf = use_pack(p, &w_curs, entry->in_pack_offset, &avail);
@@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry)
* since non-delta representations could still be reused.
*/
used = unpack_object_header_buffer(buf, avail,
- &entry->in_pack_type,
+ &type,
&entry->size);
if (used == 0)
goto give_up;
+ if (type < 0)
+ die("BUG: invalid type %d", type);
+ entry->in_pack_type = type;
+
/*
* Determine if this is a delta and if so whether we can
* reuse it or not. Otherwise let's find out as cheaply as
@@ -1428,9 +1433,9 @@ static void check_object(struct object_entry *entry)
switch (entry->in_pack_type) {
default:
/* Not a delta hence we've already got all we need. */
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->in_pack_header_size = used;
- if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
+ if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > OBJ_BLOB)
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1484,7 +1489,7 @@ static void check_object(struct object_entry *entry)
* deltify other objects against, in order to avoid
* circular deltas.
*/
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->delta = base_entry;
entry->delta_size = entry->size;
entry->delta_sibling = base_entry->delta_child;
@@ -1493,7 +1498,7 @@ static void check_object(struct object_entry *entry)
return;
}
- if (entry->type) {
+ if (oe_type(entry)) {
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
@@ -1516,7 +1521,7 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- entry->type = sha1_object_info(entry->idx.oid.hash, &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
@@ -1559,6 +1564,7 @@ static void drop_reused_delta(struct object_entry *entry)
{
struct object_entry **p = &entry->delta->delta_child;
struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type type;
while (*p) {
if (*p == entry)
@@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry)
entry->depth = 0;
oi.sizep = &entry->size;
- oi.typep = &entry->type;
+ oi.typep = &type;
if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
@@ -1578,8 +1584,10 @@ static void drop_reused_delta(struct object_entry *entry)
* And if that fails, the error will be recorded in entry->type
* and dealt with in prepare_pack().
*/
- entry->type = sha1_object_info(entry->idx.oid.hash,
- &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
+ &entry->size));
+ } else {
+ oe_set_type(entry, type);
}
}
@@ -1747,10 +1755,12 @@ static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ enum object_type a_type = oe_type(a);
+ enum object_type b_type = oe_type(b);
- if (a->type > b->type)
+ if (a_type > b_type)
return -1;
- if (a->type < b->type)
+ if (a_type < b_type)
return 1;
if (a->hash > b->hash)
return -1;
@@ -1826,7 +1836,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
void *delta_buf;
/* Don't bother doing diffs between different types */
- if (trg_entry->type != src_entry->type)
+ if (oe_type(trg_entry) != oe_type(src_entry))
return -1;
/*
@@ -2432,11 +2442,11 @@ static void prepare_pack(int window, int depth)
if (!entry->preferred_base) {
nr_deltas++;
- if (entry->type < 0)
+ if (oe_type(entry) < 0)
die("unable to get type of object %s",
oid_to_hex(&entry->idx.oid));
} else {
- if (entry->type < 0) {
+ if (oe_type(entry) < 0) {
/*
* This object is not found, but we
* don't have to include it anyway.
@@ -2545,7 +2555,7 @@ static void read_object_list_from_stdin(void)
die("expected object ID, got garbage:\n %s", line);
add_preferred_base_object(p + 1);
- add_object_entry(&oid, 0, p + 1, 0);
+ add_object_entry(&oid, OBJ_NONE, p + 1, 0);
}
}
diff --git a/cache.h b/cache.h
index 21fbcc2414..862bdff83a 100644
--- a/cache.h
+++ b/cache.h
@@ -373,6 +373,8 @@ extern void free_name_hash(struct index_state *istate);
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
#endif
+#define TYPE_BITS 3
+
enum object_type {
OBJ_BAD = -1,
OBJ_NONE = 0,
diff --git a/object.h b/object.h
index 87563d9056..8ce294d6ec 100644
--- a/object.h
+++ b/object.h
@@ -25,7 +25,6 @@ struct object_array {
#define OBJECT_ARRAY_INIT { 0, 0, NULL }
-#define TYPE_BITS 3
/*
* object flag allocation:
* revision.h: 0---------10 26
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..fd11f08940 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -64,12 +64,12 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
entry->in_pack_pos = i;
- switch (entry->type) {
+ switch (oe_type(entry)) {
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
- real_type = entry->type;
+ real_type = oe_type(entry);
break;
default:
@@ -98,7 +98,7 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
default:
die("Missing type information for %s (%d/%d)",
oid_to_hex(&entry->idx.oid), real_type,
- entry->type);
+ oe_type(entry));
}
}
}
diff --git a/pack-objects.h b/pack-objects.h
index 85345a4af1..38d3ff167f 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -58,8 +58,9 @@ struct object_entry {
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
- enum object_type type;
- enum object_type in_pack_type; /* could be delta */
+ unsigned type_:TYPE_BITS;
+ unsigned in_pack_type:TYPE_BITS; /* could be delta */
+ unsigned type_valid:1;
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
@@ -122,4 +123,19 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}
+static inline enum object_type oe_type(const struct object_entry *e)
+{
+ return e->type_valid ? e->type_ : OBJ_BAD;
+}
+
+static inline void oe_set_type(struct object_entry *e,
+ enum object_type type)
+{
+ if (type >= OBJ_ANY)
+ die("BUG: OBJ_ANY cannot be set in pack-objects code");
+
+ e->type_valid = type >= OBJ_NONE;
+ e->type_ = (unsigned)type;
+}
+
#endif
--
2.16.2.903.gd04caf5039
* Re: [PATCH v4 02/11] pack-objects: turn type and in_pack_type to bitfields
2018-03-16 18:31 ` [PATCH v4 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
@ 2018-03-16 20:49 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 20:49 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> An extra field type_valid is added to carry the equivalent of OBJ_BAD
> in the original "type" field. in_pack_type always contains a valid
> type so we only need 3 bits for it.
> ...
> @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry)
> entry->depth = 0;
>
> oi.sizep = &entry->size;
> - oi.typep = &entry->type;
> + oi.typep = &type;
> if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
> /*
> * We failed to get the info from this pack for some reason;
> @@ -1578,8 +1584,10 @@ static void drop_reused_delta(struct object_entry *entry)
> * And if that fails, the error will be recorded in entry->type
This "entry->type" needs updating.
> * and dealt with in prepare_pack().
> */
> - entry->type = sha1_object_info(entry->idx.oid.hash,
> - &entry->size);
> + oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
> + &entry->size));
> + } else {
> + oe_set_type(entry, type);
> }
> }
* [PATCH v4 03/11] pack-objects: use bitfield for object_entry::dfs_state
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
` (8 subsequent siblings)
11 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 3 +++
pack-objects.h | 28 +++++++++++++++++-----------
2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 13f6a44fb2..09f8b4ef3e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
OPT_END(),
};
+ if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
+ die("BUG: too many dfs states, increase OE_DFS_STATE_BITS");
+
check_replace_refs = 0;
reset_pack_idx_option(&pack_idx_opts);
diff --git a/pack-objects.h b/pack-objects.h
index 38d3ff167f..2bb1732098 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,21 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+#define OE_DFS_STATE_BITS 2
+
+/*
+ * State flags for depth-first search used for analyzing delta cycles.
+ *
+ * The depth is measured in delta-links to the base (so if A is a delta
+ * against B, then A has a depth of 1, and B a depth of 0).
+ */
+enum dfs_state {
+ DFS_NONE = 0,
+ DFS_ACTIVE,
+ DFS_DONE,
+ DFS_NUM_STATES
+};
+
/*
* basic object info
* -----------------
@@ -72,19 +87,10 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+ unsigned dfs_state:OE_DFS_STATE_BITS;
- /*
- * State flags for depth-first search used for analyzing delta cycles.
- *
- * The depth is measured in delta-links to the base (so if A is a delta
- * against B, then A has a depth of 1, and B a depth of 0).
- */
- enum {
- DFS_NONE = 0,
- DFS_ACTIVE,
- DFS_DONE
- } dfs_state;
int depth;
+
};
struct packing_data {
--
2.16.2.903.gd04caf5039
* [PATCH v4 04/11] pack-objects: use bitfield for object_entry::depth
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 03/11] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 05/11] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
` (7 subsequent siblings)
11 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Because of struct packing, from now on we can only handle a max depth
of 4095 (or even lower when new booleans are added to this struct).
This should be ok since long delta chains will cause significant
slowdown anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 1 +
Documentation/git-pack-objects.txt | 4 +++-
Documentation/git-repack.txt | 4 +++-
builtin/pack-objects.c | 4 ++++
pack-objects.h | 5 ++---
5 files changed, 13 insertions(+), 5 deletions(-)
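The 4095 limit follows directly from the 12-bit field: the largest value it can hold is 2^12 - 1. A minimal sketch of that arithmetic, and of what an unchecked store would do, is given here; `OE_DEPTH_MAX`, `struct demo_entry`, `demo_depth_ok()` and `demo_store_unchecked()` are hypothetical names used only for illustration.

```c
#include <assert.h>	/* static_assert (C11) */

#define OE_DEPTH_BITS 12
#define OE_DEPTH_MAX ((1 << OE_DEPTH_BITS) - 1)	/* hypothetical helper */

static_assert(OE_DEPTH_MAX == 4095, "12 bits hold values 0..4095");

/* Hypothetical cut-down entry holding only the depth bitfield. */
struct demo_entry {
	unsigned depth:OE_DEPTH_BITS;
};

/* Returns 1 if the requested depth is representable, 0 otherwise. */
static int demo_depth_ok(int depth)
{
	return depth >= 0 && depth <= OE_DEPTH_MAX;
}

/* Shows what the bitfield would hold if no range check were made. */
static unsigned demo_store_unchecked(unsigned depth)
{
	struct demo_entry e = { 0 };

	e.depth = depth;	/* reduced modulo 2^12 on assignment */
	return e.depth;
}
```

Without a range check, a request for depth 4096 would silently be stored as 0, which is why the value has to be rejected (or clamped) before any object_entry is filled in.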
diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10c..9bd3f5a789 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2412,6 +2412,7 @@ pack.window::
pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+ Maximum value is 4095.
pack.windowMemory::
The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 81bc490ac5..3503c9e3e6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--window-memory=<n>::
This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--threads=<n>::
This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 09f8b4ef3e..668eaf8cd7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
+ if (depth >= (1 << OE_DEPTH_BITS))
+ die(_("delta chain depth %d is greater than maximum limit %d"),
+ depth, (1 << OE_DEPTH_BITS) - 1);
+
argv_array_push(&rp, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
diff --git a/pack-objects.h b/pack-objects.h
index 2bb1732098..50908d1f2d 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,6 +2,7 @@
#define PACK_OBJECTS_H
#define OE_DFS_STATE_BITS 2
+#define OE_DEPTH_BITS 12
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -88,9 +89,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
-
- int depth;
-
+ unsigned depth:OE_DEPTH_BITS;
};
struct packing_data {
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* [PATCH v4 05/11] pack-objects: move in_pack_pos out of struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (3 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 18:31 ` [PATCH v4 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
` (6 subsequent siblings)
11 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
This field is only needed for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is never freed).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 3 ++-
pack-bitmap-write.c | 8 +++++---
pack-bitmap.c | 2 +-
pack-bitmap.h | 4 +++-
pack-objects.h | 16 +++++++++++++++-
5 files changed, 26 insertions(+), 7 deletions(-)
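The technique here is a side array that runs parallel to objects[]: an entry's slot is found from its position in objects[], computed by pointer subtraction. A minimal sketch, assuming cut-down hypothetical types (`struct demo_entry`, `struct demo_pack`) and a hypothetical `demo_alloc_in_pack_pos()` standing in for the ALLOC_ARRAY() call:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-down stand-ins for object_entry and packing_data. */
struct demo_entry {
	unsigned type:3;
};

struct demo_pack {
	struct demo_entry *objects;
	size_t nr_objects;
	unsigned int *in_pack_pos;	/* NULL unless bitmaps are written */
};

/* Allocate the side array; only done when the optional feature is used. */
static void demo_alloc_in_pack_pos(struct demo_pack *pack)
{
	pack->in_pack_pos = calloc(pack->nr_objects,
				   sizeof(*pack->in_pack_pos));
	if (!pack->in_pack_pos)
		abort();
}

/* e - pack->objects is the entry's index in objects[]. */
static unsigned int demo_in_pack_pos(const struct demo_pack *pack,
				     const struct demo_entry *e)
{
	assert(e >= pack->objects &&
	       (size_t)(e - pack->objects) < pack->nr_objects);
	return pack->in_pack_pos[e - pack->objects];
}

static void demo_set_in_pack_pos(struct demo_pack *pack,
				 const struct demo_entry *e,
				 unsigned int pos)
{
	assert(e >= pack->objects &&
	       (size_t)(e - pack->objects) < pack->nr_objects);
	pack->in_pack_pos[e - pack->objects] = pos;
}
```

The trade-off is that the accessors need the packing_data pointer in addition to the entry, and the entry must really live inside objects[]; the bounds assertions in the sketch are an addition for illustration and are not in the patch.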
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 668eaf8cd7..b281487b96 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -879,7 +879,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_set_checksum(oid.hash);
- bitmap_writer_build_type_index(written_list, nr_written);
+ bitmap_writer_build_type_index(
+ &to_pack, written_list, nr_written);
}
finish_tmp_packfile(&tmpname, pack_tmp_name,
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fd11f08940..f7c897515b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show)
/**
* Build the initial type index for the packfile
*/
-void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
uint32_t index_nr)
{
uint32_t i;
@@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
+ ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects);
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
- entry->in_pack_pos = i;
+ oe_set_in_pack_pos(to_pack, entry, i);
switch (oe_type(entry)) {
case OBJ_COMMIT:
@@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1)
"(object %s is missing)", sha1_to_hex(sha1));
}
- return entry->in_pack_pos;
+ return oe_in_pack_pos(writer.to_pack, entry);
}
static void show_object(struct object *object, const char *name, void *data)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9270983e5f..865d9ecc4e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping,
oe = packlist_find(mapping, sha1, NULL);
if (oe)
- reposition[i] = oe->in_pack_pos + 1;
+ reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
}
rebuild = bitmap_new();
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3742a00e14..5ded2f139a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bi
void bitmap_writer_show_progress(int show);
void bitmap_writer_set_checksum(unsigned char *sha1);
-void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr);
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
+ uint32_t index_nr);
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
diff --git a/pack-objects.h b/pack-objects.h
index 50908d1f2d..dae160e7c2 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -78,7 +78,6 @@ struct object_entry {
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
uint32_t hash; /* name hint hash */
- unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
@@ -98,6 +97,8 @@ struct packing_data {
int32_t *index;
uint32_t index_size;
+
+ unsigned int *in_pack_pos;
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -143,4 +144,17 @@ static inline void oe_set_type(struct object_entry *e,
e->type_ = (unsigned)type;
}
+static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack_pos[e - pack->objects];
+}
+
+static inline void oe_set_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e,
+ unsigned int pos)
+{
+ pack->in_pack_pos[e - pack->objects] = pos;
+}
+
#endif
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* [PATCH v4 06/11] pack-objects: move in_pack out of struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (4 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 05/11] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-26 20:39 ` Stefan Beller
2018-03-16 18:31 ` [PATCH v4 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
` (5 subsequent siblings)
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Instead of using 8 bytes (on 64-bit architectures) to store a pointer
to a pack, use an index, since the number of packs should be
relatively small.
This limits the number of packs we can handle to 16k. For now, if you
hit the 16k pack file limit, pack-objects will simply fail [1].
[1] The escape hatch is to use .keep files to keep the number of
non-kept pack files below the 16k limit. Then you can do another
pack-objects run to combine another 16k pack files. Repeat until
you're satisfied.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/git-pack-objects.txt | 9 ++++++
builtin/pack-objects.c | 40 +++++++++++++++++----------
cache.h | 1 +
pack-objects.h | 44 +++++++++++++++++++++++++++++-
4 files changed, 79 insertions(+), 15 deletions(-)
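The scheme replaces a pointer with a small index into a per-run table, with slot 0 reserved so that a zero-initialized entry maps to NULL ("not in any pack"). A minimal sketch under cut-down hypothetical types; all `demo_*` names are illustrative, and unlike the patch the sketch returns -1 instead of calling die():

```c
#include <assert.h>
#include <stddef.h>

#define OE_IN_PACK_BITS 14
#define DEMO_MAX_PACKS (1 << OE_IN_PACK_BITS)	/* hypothetical name */

/* Hypothetical cut-down stand-ins for packed_git and object_entry. */
struct demo_packed_git {
	int index;	/* 0 means "not registered yet" */
};

struct demo_entry {
	unsigned in_pack_idx:OE_IN_PACK_BITS;
};

struct demo_packing_data {
	unsigned in_pack_count;
	struct demo_packed_git *in_pack[DEMO_MAX_PACKS];
};

/* Register a pack (or NULL for slot 0); returns the slot, -1 if full. */
static int demo_add_pack(struct demo_packing_data *pd,
			 struct demo_packed_git *p)
{
	if (pd->in_pack_count >= DEMO_MAX_PACKS)
		return -1;
	if (p)
		p->index = (int)pd->in_pack_count;
	pd->in_pack[pd->in_pack_count] = p;
	return (int)pd->in_pack_count++;
}

/* Translate an entry's small index back into a pack pointer. */
static struct demo_packed_git *demo_in_pack(const struct demo_packing_data *pd,
					    const struct demo_entry *e)
{
	return pd->in_pack[e->in_pack_idx];
}

/* Only registered packs (index > 0) may be stored in an entry. */
static void demo_set_in_pack(struct demo_entry *e,
			     const struct demo_packed_git *p)
{
	assert(p->index > 0 && p->index < DEMO_MAX_PACKS);
	e->in_pack_idx = (unsigned)p->index;
}
```

The table costs 16384 pointers once per run, while each object_entry gives up an 8-byte pointer for a 14-bit field. Registering NULL first is what makes the default index 0 safe to dereference through the table.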
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 3503c9e3e6..b8d936ccf5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -269,6 +269,15 @@ Unexpected missing object will raise an error.
locally created objects [without .promisor] and objects from the
promisor remote [with .promisor].) This is used with partial clone.
+LIMITATIONS
+-----------
+
+This command could only handle 16384 existing pack files at a time.
+If you have more than this, you need to exclude some pack files with
+".keep" file and --honor-pack-keep option, to combine 16k pack files
+in one, then remove these .keep files and run pack-objects one more
+time.
+
SEE ALSO
--------
linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b281487b96..ca993e55dd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,6 +29,8 @@
#include "list.h"
#include "packfile.h"
+#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
N_("git pack-objects [<options>...] <base-name> [< <ref-list> | < <object-list>]"),
@@ -367,7 +369,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
{
- struct packed_git *p = entry->in_pack;
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
@@ -478,7 +480,7 @@ static off_t write_object(struct hashfile *f,
if (!reuse_object)
to_reuse = 0; /* explicit */
- else if (!entry->in_pack)
+ else if (!IN_PACK(entry))
to_reuse = 0; /* can't reuse what we don't have */
else if (oe_type(entry) == OBJ_REF_DELTA ||
oe_type(entry) == OBJ_OFS_DELTA)
@@ -1025,7 +1027,7 @@ static int want_object_in_pack(const struct object_id *oid,
if (*found_pack) {
want = want_found_object(exclude, *found_pack);
if (want != -1)
- return want;
+ goto done;
}
list_for_each(pos, &packed_git_mru) {
@@ -1048,11 +1050,16 @@ static int want_object_in_pack(const struct object_id *oid,
if (!exclude && want > 0)
list_move(&p->mru, &packed_git_mru);
if (want != -1)
- return want;
+ goto done;
}
}
- return 1;
+ want = 1;
+done:
+ if (want && *found_pack && !(*found_pack)->index)
+ oe_add_pack(&to_pack, *found_pack);
+
+ return want;
}
static void create_object_entry(const struct object_id *oid,
@@ -1074,7 +1081,7 @@ static void create_object_entry(const struct object_id *oid,
else
nr_result++;
if (found_pack) {
- entry->in_pack = found_pack;
+ oe_set_in_pack(entry, found_pack);
entry->in_pack_offset = found_offset;
}
@@ -1399,8 +1406,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
- if (entry->in_pack) {
- struct packed_git *p = entry->in_pack;
+ if (IN_PACK(entry)) {
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
@@ -1535,14 +1542,16 @@ static int pack_offset_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ const struct packed_git *a_in_pack = IN_PACK(a);
+ const struct packed_git *b_in_pack = IN_PACK(b);
/* avoid filesystem trashing with loose objects */
- if (!a->in_pack && !b->in_pack)
+ if (!a_in_pack && !b_in_pack)
return oidcmp(&a->idx.oid, &b->idx.oid);
- if (a->in_pack < b->in_pack)
+ if (a_in_pack < b_in_pack)
return -1;
- if (a->in_pack > b->in_pack)
+ if (a_in_pack > b_in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
@@ -1578,7 +1587,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &entry->size;
oi.typep = &type;
- if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
@@ -1848,8 +1857,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
* it, we will still save the transfer cost, as we already know
* the other side has it and we won't send src_entry at all.
*/
- if (reuse_delta && trg_entry->in_pack &&
- trg_entry->in_pack == src_entry->in_pack &&
+ if (reuse_delta && IN_PACK(trg_entry) &&
+ IN_PACK(trg_entry) == IN_PACK(src_entry) &&
!src_entry->preferred_base &&
trg_entry->in_pack_type != OBJ_REF_DELTA &&
trg_entry->in_pack_type != OBJ_OFS_DELTA)
@@ -3191,6 +3200,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
}
}
+ /* make sure IN_PACK(0) return NULL */
+ oe_add_pack(&to_pack, NULL);
+
if (progress)
progress_state = start_progress(_("Counting objects"), 0);
if (!use_internal_rev_list)
diff --git a/cache.h b/cache.h
index 862bdff83a..b90feb3802 100644
--- a/cache.h
+++ b/cache.h
@@ -1635,6 +1635,7 @@ extern struct packed_git {
int index_version;
time_t mtime;
int pack_fd;
+ int index; /* for builtin/pack-objects.c */
unsigned pack_local:1,
pack_keep:1,
freshened:1,
diff --git a/pack-objects.h b/pack-objects.h
index dae160e7c2..9bcb5946e5 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -3,6 +3,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
+#define OE_IN_PACK_BITS 14
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -18,6 +19,10 @@ enum dfs_state {
};
/*
+ * The size of struct nearly determines pack-objects's memory
+ * consumption. This struct is packed tight for that reason. When you
+ * add or reorder something in this struct, think a bit about this.
+ *
* basic object info
* -----------------
* idx.oid is filled up before delta searching starts. idx.crc32 and
@@ -64,7 +69,7 @@ enum dfs_state {
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
- struct packed_git *in_pack; /* already in pack */
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
@@ -99,6 +104,8 @@ struct packing_data {
uint32_t index_size;
unsigned int *in_pack_pos;
+ int in_pack_count;
+ struct packed_git *in_pack[1 << OE_IN_PACK_BITS];
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -157,4 +164,39 @@ static inline void oe_set_in_pack_pos(const struct packing_data *pack,
pack->in_pack_pos[e - pack->objects] = pos;
}
+static inline unsigned int oe_add_pack(struct packing_data *pack,
+ struct packed_git *p)
+{
+ if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
+ die(_("too many packs to handle in one go. "
+ "Please add .keep files to exclude\n"
+ "some pack files and keep the number "
+ "of non-kept files below %d."),
+ 1 << OE_IN_PACK_BITS);
+ if (p) {
+ if (p->index > 0)
+ die("BUG: this packed is already indexed");
+ p->index = pack->in_pack_count;
+ }
+ pack->in_pack[pack->in_pack_count] = p;
+ return pack->in_pack_count++;
+}
+
+static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack[e->in_pack_idx];
+
+}
+
+static inline void oe_set_in_pack(struct object_entry *e,
+ struct packed_git *p)
+{
+ if (p->index <= 0)
+ die("BUG: found_pack should be NULL "
+ "instead of having non-positive index");
+ e->in_pack_idx = p->index;
+
+}
+
#endif
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* Re: [PATCH v4 06/11] pack-objects: move in_pack out of struct object_entry
2018-03-16 18:31 ` [PATCH v4 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
@ 2018-03-26 20:39 ` Stefan Beller
0 siblings, 0 replies; 273+ messages in thread
From: Stefan Beller @ 2018-03-26 20:39 UTC (permalink / raw)
To: Duy Nguyen
Cc: Ævar Arnfjörð Bjarmason, Eric Wong, git,
Junio C Hamano, Jeff King
Hi,
sorry for the late review; I was pointed here indirectly via
https://public-inbox.org/git/xmqqy3iebpsw.fsf@gitster-ct.c.googlers.com/
On Fri, Mar 16, 2018 at 11:33 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
wrote:
> +LIMITATIONS
> +-----------
> +
> +This command could only handle 16384 existing pack files at a time.
s/could/can/ ?
> @@ -3191,6 +3200,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> }
> }
> + /* make sure IN_PACK(0) return NULL */
I was confused for a while staring at this comment; s/0/NULL/
would have helped me.
> +static inline unsigned int oe_add_pack(struct packing_data *pack,
> + struct packed_git *p)
> +{
> + if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
> + die(_("too many packs to handle in one go. "
> + "Please add .keep files to exclude\n"
> + "some pack files and keep the number "
> + "of non-kept files below %d."),
> + 1 << OE_IN_PACK_BITS);
The packs are indexed 0..N-1, so we can actually handle N
packs I presume. But if we actually have N, then we'd run the
/* make sure IN_PACK(0) return NULL */
oe_add_pack(.., NULL);
as N+1, hence the user can only do N-1 ?
Oh wait! the code below makes me think we index from 1..N,
treating index 0 special as uninitialized? So we actually can only
store N-1 ?
> + if (p) {
> + if (p->index > 0)
s/>/!=/ ?
The new index variable is only used in these three
inlined header functions, and in_pack_count is strictly
positive, so index as well as in_pack_count could be made
unsigned?
Given that oe_add_pack returns an unsigned, I would actually
prefer to have in_pack_count an unsigned as well.
> + die("BUG: this packed is already indexed");
> + p->index = pack->in_pack_count;
> + }
> + pack->in_pack[pack->in_pack_count] = p;
> + return pack->in_pack_count++;
> +}
> +
> +static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
> + const struct object_entry *e)
> +{
> + return pack->in_pack[e->in_pack_idx];
> +
> +}
extra new line after return?
> +static inline void oe_set_in_pack(struct object_entry *e,
> + struct packed_git *p)
> +{
> + if (p->index <= 0)
> + die("BUG: found_pack should be NULL "
> + "instead of having non-positive index");
Do we also want to guard against
p->index > (1 << OE_IN_PACK_BITS)
here? Also there is a BUG() macro, which would be better
as it reports the file and line number, but we cannot use it here
as this is an inline function in a header.
^ permalink raw reply [flat|nested] 273+ messages in thread
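On the capacity question raised in this review, the arithmetic can be checked with a small simulation. This is a sketch of my reading of the quoted code, not something stated by either author: slot 0 is taken by the NULL registration, and the die() check fires once in_pack_count reaches 1 << OE_IN_PACK_BITS. `demo_usable_pack_slots()` is a hypothetical helper.

```c
#include <assert.h>	/* static_assert (C11) */

#define OE_IN_PACK_BITS 14

static_assert(((1u << OE_IN_PACK_BITS) - 1) == 16383,
	      "14 bits index slots 0..16383");

/* Counts how many real packs fit when slot 0 is reserved for NULL. */
static unsigned demo_usable_pack_slots(void)
{
	unsigned count = 0;	/* mirrors in_pack_count */
	unsigned real = 0;	/* packs registered with a non-NULL pointer */

	count++;	/* the oe_add_pack(&to_pack, NULL) call takes slot 0 */

	/* Same bound as the check at the top of oe_add_pack(). */
	while (count < (1u << OE_IN_PACK_BITS)) {
		count++;
		real++;
	}
	return real;
}
```

Under these assumptions the function returns 16383, so the effective limit would be one pack fewer than the 16384 quoted in the documentation hunk.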
* [PATCH v4 07/11] pack-objects: refer to delta objects by index instead of pointer
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (5 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 20:59 ` Junio C Hamano
2018-03-16 18:31 ` [PATCH v4 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
` (4 subsequent siblings)
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
These delta pointers always point to elements in the objects[] array
in the packing_data struct. We can hold at most 2^32-1 of those
objects because the array length, nr_objects, is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architectures (8 bytes per pointer) this would save 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 116 ++++++++++++++++++++++-------------------
pack-objects.h | 67 ++++++++++++++++++++++--
2 files changed, 124 insertions(+), 59 deletions(-)
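The "shifted by one" encoding described in the commit message can be sketched as follows: 0 stands for NULL, and any other value is the objects[] index plus one. The types and the `demo_delta()` / `demo_set_delta()` helpers below are hypothetical stand-ins written for illustration; they are an assumption about the shape of the accessors, not a copy of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down stand-ins for object_entry and packing_data. */
struct demo_entry {
	uint32_t delta_idx;	/* 0 = no base, else objects[] index + 1 */
};

struct demo_pack {
	struct demo_entry *objects;
	uint32_t nr_objects;
};

/* Decode: 0 maps to NULL, anything else to objects[idx - 1]. */
static struct demo_entry *demo_delta(const struct demo_pack *pack,
				     const struct demo_entry *e)
{
	return e->delta_idx ? &pack->objects[e->delta_idx - 1] : NULL;
}

/* Encode: store the base's position in objects[], shifted by one. */
static void demo_set_delta(struct demo_pack *pack,
			   struct demo_entry *e,
			   struct demo_entry *base)
{
	if (base) {
		assert(base >= pack->objects &&
		       base < pack->objects + pack->nr_objects);
		e->delta_idx = (uint32_t)(base - pack->objects) + 1;
	} else {
		e->delta_idx = 0;
	}
}
```

Because the index is relative to objects[], it stays valid if the array is reallocated and moved, whereas a raw pointer would not; the cost is that every access needs the packing_data pointer, which is what the DELTA()-style macros supply.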
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ca993e55dd..cdbad57082 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA(obj) oe_delta(&to_pack, obj)
+#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
+#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
+#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
+#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -127,11 +133,11 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, &type, &size);
if (!buf)
die("unable to read %s", oid_to_hex(&entry->idx.oid));
- base_buf = read_sha1_file(entry->delta->idx.oid.hash, &type,
+ base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, &type,
&base_size);
if (!base_buf)
die("unable to read %s",
- oid_to_hex(&entry->delta->idx.oid));
+ oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
if (!delta_buf || delta_size != entry->delta_size)
@@ -288,12 +294,12 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -317,7 +323,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
* encoding of the relative offset for the delta
* base from this object's position in the pack.
*/
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -343,7 +349,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
@@ -379,8 +385,8 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
- if (entry->delta)
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ if (DELTA(entry))
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
type, entry->size);
@@ -408,7 +414,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
}
if (type == OBJ_OFS_DELTA) {
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -427,7 +433,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
reused_delta++;
} else {
@@ -467,13 +473,13 @@ static off_t write_object(struct hashfile *f,
else
limit = pack_size_limit - write_offset;
- if (!entry->delta)
+ if (!DELTA(entry))
usable_delta = 0; /* no delta */
else if (!pack_size_limit)
usable_delta = 1; /* unlimited packfile */
- else if (entry->delta->idx.offset == (off_t)-1)
+ else if (DELTA(entry)->idx.offset == (off_t)-1)
usable_delta = 0; /* base was written to another pack */
- else if (entry->delta->idx.offset)
+ else if (DELTA(entry)->idx.offset)
usable_delta = 1; /* base already exists in this pack */
else
usable_delta = 0; /* base could end up in another pack */
@@ -489,7 +495,7 @@ static off_t write_object(struct hashfile *f,
/* ... but pack split may override that */
else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
- else if (entry->delta)
+ else if (DELTA(entry))
to_reuse = 0; /* we want to pack afresh */
else
to_reuse = 1; /* we have it in-pack undeltified,
@@ -541,12 +547,12 @@ static enum write_one_status write_one(struct hashfile *f,
}
/* if we are deltified, write out base object first. */
- if (e->delta) {
+ if (DELTA(e)) {
e->idx.offset = 1; /* now recurse */
- switch (write_one(f, e->delta, offset)) {
+ switch (write_one(f, DELTA(e), offset)) {
case WRITE_ONE_RECURSIVE:
/* we cannot depend on this one */
- e->delta = NULL;
+ SET_DELTA(e, NULL);
break;
default:
break;
@@ -608,34 +614,34 @@ static void add_descendants_to_write_order(struct object_entry **wo,
/* add this node... */
add_to_write_order(wo, endp, e);
/* all its siblings... */
- for (s = e->delta_sibling; s; s = s->delta_sibling) {
+ for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) {
add_to_write_order(wo, endp, s);
}
}
/* drop down a level to add left subtree nodes if possible */
- if (e->delta_child) {
+ if (DELTA_CHILD(e)) {
add_to_order = 1;
- e = e->delta_child;
+ e = DELTA_CHILD(e);
} else {
add_to_order = 0;
/* our sibling might have some children, it is next */
- if (e->delta_sibling) {
- e = e->delta_sibling;
+ if (DELTA_SIBLING(e)) {
+ e = DELTA_SIBLING(e);
continue;
}
/* go back to our parent node */
- e = e->delta;
- while (e && !e->delta_sibling) {
+ e = DELTA(e);
+ while (e && !DELTA_SIBLING(e)) {
/* we're on the right side of a subtree, keep
* going up until we can go right again */
- e = e->delta;
+ e = DELTA(e);
}
if (!e) {
/* done- we hit our original root node */
return;
}
/* pass it off to sibling at this level */
- e = e->delta_sibling;
+ e = DELTA_SIBLING(e);
}
};
}
@@ -646,7 +652,7 @@ static void add_family_to_write_order(struct object_entry **wo,
{
struct object_entry *root;
- for (root = e; root->delta; root = root->delta)
+ for (root = e; DELTA(root); root = DELTA(root))
; /* nothing */
add_descendants_to_write_order(wo, endp, root);
}
@@ -661,8 +667,8 @@ static struct object_entry **compute_write_order(void)
for (i = 0; i < to_pack.nr_objects; i++) {
objects[i].tagged = 0;
objects[i].filled = 0;
- objects[i].delta_child = NULL;
- objects[i].delta_sibling = NULL;
+ SET_DELTA_CHILD(&objects[i], NULL);
+ SET_DELTA_SIBLING(&objects[i], NULL);
}
/*
@@ -672,11 +678,11 @@ static struct object_entry **compute_write_order(void)
*/
for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
- if (!e->delta)
+ if (!DELTA(e))
continue;
/* Mark me as the first child */
- e->delta_sibling = e->delta->delta_child;
- e->delta->delta_child = e;
+ e->delta_sibling_idx = DELTA(e)->delta_child_idx;
+ SET_DELTA_CHILD(DELTA(e), e);
}
/*
@@ -1498,10 +1504,10 @@ static void check_object(struct object_entry *entry)
* circular deltas.
*/
oe_set_type(entry, entry->in_pack_type);
- entry->delta = base_entry;
+ SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
- entry->delta_sibling = base_entry->delta_child;
- base_entry->delta_child = entry;
+ entry->delta_sibling_idx = base_entry->delta_child_idx;
+ SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
return;
}
@@ -1572,17 +1578,19 @@ static int pack_offset_sort(const void *_a, const void *_b)
*/
static void drop_reused_delta(struct object_entry *entry)
{
- struct object_entry **p = &entry->delta->delta_child;
+ unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
- while (*p) {
- if (*p == entry)
- *p = (*p)->delta_sibling;
+ while (*idx) {
+ struct object_entry *oe = &to_pack.objects[*idx - 1];
+
+ if (oe == entry)
+ *idx = oe->delta_sibling_idx;
else
- p = &(*p)->delta_sibling;
+ idx = &oe->delta_sibling_idx;
}
- entry->delta = NULL;
+ SET_DELTA(entry, NULL);
entry->depth = 0;
oi.sizep = &entry->size;
@@ -1622,7 +1630,7 @@ static void break_delta_chains(struct object_entry *entry)
for (cur = entry, total_depth = 0;
cur;
- cur = cur->delta, total_depth++) {
+ cur = DELTA(cur), total_depth++) {
if (cur->dfs_state == DFS_DONE) {
/*
* We've already seen this object and know it isn't
@@ -1647,7 +1655,7 @@ static void break_delta_chains(struct object_entry *entry)
* it's not a delta, we're done traversing, but we'll mark it
* done to save time on future traversals.
*/
- if (!cur->delta) {
+ if (!DELTA(cur)) {
cur->dfs_state = DFS_DONE;
break;
}
@@ -1670,7 +1678,7 @@ static void break_delta_chains(struct object_entry *entry)
* We keep all commits in the chain that we examined.
*/
cur->dfs_state = DFS_ACTIVE;
- if (cur->delta->dfs_state == DFS_ACTIVE) {
+ if (DELTA(cur)->dfs_state == DFS_ACTIVE) {
drop_reused_delta(cur);
cur->dfs_state = DFS_DONE;
break;
@@ -1685,7 +1693,7 @@ static void break_delta_chains(struct object_entry *entry)
* an extra "next" pointer to keep going after we reset cur->delta.
*/
for (cur = entry; cur; cur = next) {
- next = cur->delta;
+ next = DELTA(cur);
/*
* We should have a chain of zero or more ACTIVE states down to
@@ -1870,7 +1878,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
/* Now some size filtering heuristics. */
trg_size = trg_entry->size;
- if (!trg_entry->delta) {
+ if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
@@ -1946,7 +1954,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (!delta_buf)
return 0;
- if (trg_entry->delta) {
+ if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
if (delta_size == trg_entry->delta_size &&
src->depth + 1 >= trg->depth) {
@@ -1975,7 +1983,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(delta_buf);
}
- trg_entry->delta = src_entry;
+ SET_DELTA(trg_entry, src_entry);
trg_entry->delta_size = delta_size;
trg->depth = src->depth + 1;
@@ -1984,13 +1992,13 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
{
- struct object_entry *child = me->delta_child;
+ struct object_entry *child = DELTA_CHILD(me);
unsigned int m = n;
while (child) {
unsigned int c = check_delta_limit(child, n + 1);
if (m < c)
m = c;
- child = child->delta_sibling;
+ child = DELTA_SIBLING(child);
}
return m;
}
@@ -2059,7 +2067,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* otherwise they would become too deep.
*/
max_depth = depth;
- if (entry->delta_child) {
+ if (DELTA_CHILD(entry)) {
max_depth -= check_delta_limit(entry, 0);
if (max_depth <= 0)
goto next;
@@ -2109,7 +2117,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* depth, leaving it in the window is pointless. we
* should evict it first.
*/
- if (entry->delta && max_depth <= n->depth)
+ if (DELTA(entry) && max_depth <= n->depth)
continue;
/*
@@ -2117,7 +2125,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* currently deltified object, to keep it longer. It will
* be the first base object to be attempted next.
*/
- if (entry->delta) {
+ if (DELTA(entry)) {
struct unpacked swap = array[best_base];
int dist = (window + idx - best_base) % window;
int dst = best_base;
@@ -2438,7 +2446,7 @@ static void prepare_pack(int window, int depth)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = to_pack.objects + i;
- if (entry->delta)
+ if (DELTA(entry))
/* This happens if we decided to reuse existing
* delta from a pack. "reuse_delta &&" is implied.
*/
diff --git a/pack-objects.h b/pack-objects.h
index 9bcb5946e5..7f32de2a35 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -71,11 +71,11 @@ struct object_entry {
unsigned long size; /* uncompressed size */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
- struct object_entry *delta; /* delta base object */
- struct object_entry *delta_child; /* deltified objects who bases me */
- struct object_entry *delta_sibling; /* other deltified objects who
- * uses the same base as me
- */
+ uint32_t delta_idx; /* delta base object */
+ uint32_t delta_child_idx; /* deltified objects who bases me */
+ uint32_t delta_sibling_idx; /* other deltified objects who
+ * uses the same base as me
+ */
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
@@ -199,4 +199,61 @@ static inline void oe_set_in_pack(struct object_entry *e,
}
+static inline struct object_entry *oe_delta(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_idx)
+ return &pack->objects[e->delta_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_child(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_child_idx)
+ return &pack->objects[e->delta_child_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_child(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_child_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_child_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_sibling(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_sibling_idx)
+ return &pack->objects[e->delta_sibling_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_sibling(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_sibling_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_sibling_idx = 0;
+}
+
#endif
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* Re: [PATCH v4 07/11] pack-objects: refer to delta objects by index instead of pointer
2018-03-16 18:31 ` [PATCH v4 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
@ 2018-03-16 20:59 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 20:59 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> These delta pointers always point to elements in the objects[] array
> in packing_data struct. We can only hold maximum 4GB of those objects
4GB, as in "number of bytes"? Or "We can hold 4 billion or so of
those objects"?
> because the array length, nr_objects, is uint32_t. We could use
> uint32_t indexes to address these elements instead of pointers. On
> 64-bit architecture (8 bytes per pointer) this would save 4 bytes per
> pointer.
>
> Convert these delta pointers to indexes. Since we need to handle NULL
> pointers as well, the index is shifted by one [1].
>
> [1] This means we can only index 2^32-2 objects even though nr_objects
> could contain 2^32-1 objects. It should not be a problem in
> practice because when we grow objects[], nr_alloc would probably
> blow up long before nr_objects hits the wall.
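For readers following along, the shifted-index scheme quoted above can be sketched in isolation. This is only an illustration under stated assumptions: `struct entry`, `struct table`, `entry_base()` and `entry_set_base()` are hypothetical stand-ins, not the structs or helpers from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for object_entry / packing_data. */
struct entry {
	uint32_t base_idx;	/* 0 means "no base"; otherwise array index + 1 */
};

struct table {
	struct entry *items;
	uint32_t nr;
};

/* Decode: 0 maps to NULL, k maps to &items[k - 1]. */
static inline struct entry *entry_base(const struct table *t,
				       const struct entry *e)
{
	return e->base_idx ? &t->items[e->base_idx - 1] : NULL;
}

/* Encode: NULL maps to 0, &items[i] maps to i + 1. */
static inline void entry_set_base(struct table *t, struct entry *e,
				  struct entry *base)
{
	/* The scheme only works for pointers into the same array. */
	assert(!base || (base >= t->items && base < t->items + t->nr));
	e->base_idx = base ? (uint32_t)(base - t->items) + 1 : 0;
}
```

Because index 0 is reserved for "no base", one fewer element can be addressed than a plain 32-bit index would allow, which is the limit footnote [1] describes.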
^ permalink raw reply [flat|nested] 273+ messages in thread
* [PATCH v4 08/11] pack-objects: shrink z_delta_size field in struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (6 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 19:40 ` Junio C Hamano
2018-03-16 18:31 ` [PATCH v4 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
` (3 subsequent siblings)
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
We only cache a delta when it is smaller than a certain limit. This limit
defaults to 1000, but we save the compressed length in a 64-bit field.
Shrink that field down to 16 bits, so you can only cache deltas of up to
65535 bytes. Larger deltas must be recomputed when the pack is written out.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 3 ++-
builtin/pack-objects.c | 22 ++++++++++++++++------
pack-objects.h | 3 ++-
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9bd3f5a789..00fa824448 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2449,7 +2449,8 @@ pack.deltaCacheLimit::
The maximum size of a delta, that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
- result once the best match for all objects is found. Defaults to 1000.
+ result once the best match for all objects is found.
+ Defaults to 1000. Maximum value is 65535.
pack.threads::
Specifies the number of threads to spawn when searching for best
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cdbad57082..9a0962cf31 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2105,12 +2105,19 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* between writes at that moment.
*/
if (entry->delta_data && !pack_to_stdout) {
- entry->z_delta_size = do_compress(&entry->delta_data,
- entry->delta_size);
- cache_lock();
- delta_cache_size -= entry->delta_size;
- delta_cache_size += entry->z_delta_size;
- cache_unlock();
+ unsigned long size;
+
+ size = do_compress(&entry->delta_data, entry->delta_size);
+ entry->z_delta_size = size;
+ if (entry->z_delta_size == size) {
+ cache_lock();
+ delta_cache_size -= entry->delta_size;
+ delta_cache_size += entry->z_delta_size;
+ cache_unlock();
+ } else {
+ FREE_AND_NULL(entry->delta_data);
+ entry->z_delta_size = 0;
+ }
}
/* if we made n a delta, and if n is already at max
@@ -3089,6 +3096,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (depth >= (1 << OE_DEPTH_BITS))
die(_("delta chain depth %d is greater than maximum limit %d"),
depth, (1 << OE_DEPTH_BITS));
+ if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
+ die(_("pack.deltaCacheLimit is greater than maximum limit %d"),
+ 1 << OE_Z_DELTA_BITS);
argv_array_push(&rp, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index 7f32de2a35..a66c37e35a 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -4,6 +4,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
+#define OE_Z_DELTA_BITS 16
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -78,7 +79,7 @@ struct object_entry {
*/
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
- unsigned long z_delta_size; /* delta data size (compressed) */
+ unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* Re: [PATCH v4 08/11] pack-objects: shrink z_delta_size field in struct object_entry
2018-03-16 18:31 ` [PATCH v4 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-16 19:40 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 19:40 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> We only cache a delta when it is smaller than a certain limit. This limit
> defaults to 1000, but we save the compressed length in a 64-bit field.
> Shrink that field down to 16 bits, so you can only cache deltas of up to
> 65535 bytes. Larger deltas must be recomputed when the pack is written out.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> if (entry->delta_data && !pack_to_stdout) {
> - entry->z_delta_size = do_compress(&entry->delta_data,
> - entry->delta_size);
> - cache_lock();
> - delta_cache_size -= entry->delta_size;
> - delta_cache_size += entry->z_delta_size;
> - cache_unlock();
> + unsigned long size;
> +
> + size = do_compress(&entry->delta_data, entry->delta_size);
> + entry->z_delta_size = size;
> + if (entry->z_delta_size == size) {
It is confusing to readers to write
A = B;
if (A == B) {
/* OK, A was big enough */
} else {
/* No, B is too big to fit on A */
}
I actually was about to complain that you attempted an unrelated
micro-optimization to skip cache_lock/unlock when delta_size and
z_delta_size are the same, and made a typo. Something like:
size = do_compress(...);
if (size < (1 << OE_Z_DELTA_BITS)) {
entry->z_delta_size = size;
cache_lock();
...
cache_unlock();
} else {
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
}
would have saved me a few dozens of seconds of head-scratching.
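The check-before-store pattern suggested here can be tried on its own. The following is a minimal sketch, assuming a cut-down entry: `struct small_entry` and `store_z_delta_size()` are hypothetical names, and only the 16-bit width is taken from the patch.

```c
#include <assert.h>

#define Z_DELTA_BITS 16	/* mirrors OE_Z_DELTA_BITS in the patch */

/* Hypothetical cut-down entry; only the bitfield matters here. */
struct small_entry {
	unsigned z_delta_size:Z_DELTA_BITS;
};

/*
 * Store size only if it fits in the bitfield. Returns 1 on success,
 * 0 if the value is too large (the field is then reset to 0).
 */
static int store_z_delta_size(struct small_entry *e, unsigned long size)
{
	if (size < (1UL << Z_DELTA_BITS)) {
		e->z_delta_size = (unsigned)size;
		assert(e->z_delta_size == size);	/* nothing was truncated */
		return 1;
	}
	e->z_delta_size = 0;
	return 0;
}
```

Testing the range first keeps the intent visible at the call site; the assign-then-compare form in the patch behaves the same but relies on the reader knowing that the bitfield store truncates.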
> + cache_lock();
> + delta_cache_size -= entry->delta_size;
> + delta_cache_size += entry->z_delta_size;
> + cache_unlock();
> + } else {
> + FREE_AND_NULL(entry->delta_data);
> + entry->z_delta_size = 0;
> + }
> }
^ permalink raw reply [flat|nested] 273+ messages in thread
* [PATCH v4 09/11] pack-objects: shrink size field in struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (7 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 19:49 ` Junio C Hamano
2018-03-16 18:31 ` [PATCH v4 10/11] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
` (2 subsequent siblings)
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
It's very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle such large files very well to
begin with). Let's optimize for the common case where the object size
is smaller than this limit.
Shrink size field down to 32 bits [1] and one overflow bit. If the size
is too large, we read it back from disk.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
A small note about the conditional oe_set_size() in
check_object(). Technically, if we don't get a valid type, it's not
wrong to set the uninitialized value "size" (we don't pre-initialize
it and sha1_object_info will not assign anything when it fails to
get the info).
This, however, changes the writing code path slightly, which then emits
a different error message (either way we die). One of our tests in t5530
depends on this specific error message. Let's just keep the test as-is
and play safe by not assigning a random value. That might trigger
valgrind anyway.
[1] it's actually already 32 bits on Windows
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 49 ++++++++++++++++++++++++++----------------
pack-objects.h | 43 +++++++++++++++++++++++++++++++++++-
2 files changed, 73 insertions(+), 19 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9a0962cf31..14aa4acd50 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -274,7 +274,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
if (!usable_delta) {
if (oe_type(entry) == OBJ_BLOB &&
- entry->size > big_file_threshold &&
+ oe_size_greater_than(entry, big_file_threshold) &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
else {
@@ -384,12 +384,13 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
+ unsigned long entry_size = oe_size(entry);
if (DELTA(entry))
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
- type, entry->size);
+ type, entry_size);
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -406,7 +407,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
datalen -= entry->in_pack_header_size;
if (!pack_to_stdout && p->index_version == 1 &&
- check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) {
+ check_pack_inflate(p, &w_curs, offset, datalen, entry_size)) {
error("corrupt packed object for %s",
oid_to_hex(&entry->idx.oid));
unuse_pack(&w_curs);
@@ -1412,6 +1413,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
+ unsigned long size;
+
if (IN_PACK(entry)) {
struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
@@ -1431,13 +1434,14 @@ static void check_object(struct object_entry *entry)
*/
used = unpack_object_header_buffer(buf, avail,
&type,
- &entry->size);
+ &size);
if (used == 0)
goto give_up;
if (type < 0)
die("BUG: invalid type %d", type);
entry->in_pack_type = type;
+ oe_set_size(entry, size);
/*
* Determine if this is a delta and if so whether we can
@@ -1505,7 +1509,7 @@ static void check_object(struct object_entry *entry)
*/
oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
- entry->delta_size = entry->size;
+ entry->delta_size = oe_size(entry);
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1513,14 +1517,17 @@ static void check_object(struct object_entry *entry)
}
if (oe_type(entry)) {
+ unsigned long size;
+
+ size = get_size_from_delta(p, &w_curs,
+ entry->in_pack_offset + entry->in_pack_header_size);
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
* object size from the delta header.
*/
- entry->size = get_size_from_delta(p, &w_curs,
- entry->in_pack_offset + entry->in_pack_header_size);
- if (entry->size == 0)
+ oe_set_size(entry, size);
+ if (oe_size_less_than(entry, 1))
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1535,13 +1542,15 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
* as a preferred base. Doing so can result in a larger
* pack file, but the transfer will still take place.
*/
+ if (entry->type_valid)
+ oe_set_size(entry, size);
}
static int pack_offset_sort(const void *_a, const void *_b)
@@ -1581,6 +1590,7 @@ static void drop_reused_delta(struct object_entry *entry)
unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
+ unsigned long size;
while (*idx) {
struct object_entry *oe = &to_pack.objects[*idx - 1];
@@ -1593,7 +1603,7 @@ static void drop_reused_delta(struct object_entry *entry)
SET_DELTA(entry, NULL);
entry->depth = 0;
- oi.sizep = &entry->size;
+ oi.sizep = &size;
oi.typep = &type;
if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
@@ -1603,10 +1613,11 @@ static void drop_reused_delta(struct object_entry *entry)
* and dealt with in prepare_pack().
*/
oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
- &entry->size));
+ &size));
} else {
oe_set_type(entry, type);
}
+ oe_set_size(entry, size);
}
/*
@@ -1746,7 +1757,7 @@ static void get_object_details(void)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
- if (big_file_threshold < entry->size)
+ if (oe_size_greater_than(entry, big_file_threshold))
entry->no_try_delta = 1;
}
@@ -1775,6 +1786,8 @@ static int type_size_sort(const void *_a, const void *_b)
const struct object_entry *b = *(struct object_entry **)_b;
enum object_type a_type = oe_type(a);
enum object_type b_type = oe_type(b);
+ unsigned long a_size = oe_size(a);
+ unsigned long b_size = oe_size(b);
if (a_type > b_type)
return -1;
@@ -1788,9 +1801,9 @@ static int type_size_sort(const void *_a, const void *_b)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
- if (a->size > b->size)
+ if (a_size > b_size)
return -1;
- if (a->size < b->size)
+ if (a_size < b_size)
return 1;
return a < b ? -1 : (a > b); /* newest first */
}
@@ -1877,7 +1890,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
return 0;
/* Now some size filtering heuristics. */
- trg_size = trg_entry->size;
+ trg_size = oe_size(trg_entry);
if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
@@ -1889,7 +1902,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
(max_depth - ref_depth + 1);
if (max_size == 0)
return 0;
- src_size = src_entry->size;
+ src_size = oe_size(src_entry);
sizediff = src_size < trg_size ? trg_size - src_size : 0;
if (sizediff >= max_size)
return 0;
@@ -2009,7 +2022,7 @@ static unsigned long free_unpacked(struct unpacked *n)
free_delta_index(n->index);
n->index = NULL;
if (n->data) {
- freed_mem += n->entry->size;
+ freed_mem += oe_size(n->entry);
FREE_AND_NULL(n->data);
}
n->entry = NULL;
@@ -2459,7 +2472,7 @@ static void prepare_pack(int window, int depth)
*/
continue;
- if (entry->size < 50)
+ if (oe_size_less_than(entry, 50))
continue;
if (entry->no_try_delta)
diff --git a/pack-objects.h b/pack-objects.h
index a66c37e35a..5c7e15ca92 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -69,7 +69,9 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- unsigned long size; /* uncompressed size */
+ /* object uncompressed size _if_ size_valid is true */
+ uint32_t size_;
+ unsigned size_valid:1;
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
@@ -257,4 +259,43 @@ static inline void oe_set_delta_sibling(struct packing_data *pack,
e->delta_sibling_idx = 0;
}
+static inline unsigned long oe_size(const struct object_entry *e)
+{
+ if (e->size_valid) {
+ return e->size_;
+ } else {
+ unsigned long size;
+
+ sha1_object_info(e->idx.oid.hash, &size);
+ return size;
+ }
+}
+
+static inline int oe_size_less_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ < limit;
+ if (limit > maximum_unsigned_value_of_type(uint32_t))
+ return 1;
+ return oe_size(e) < limit;
+}
+
+static inline int oe_size_greater_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ > limit;
+ if (limit <= maximum_unsigned_value_of_type(uint32_t))
+ return 1;
+ return oe_size(e) > limit;
+}
+
+static inline void oe_set_size(struct object_entry *e,
+ unsigned long size)
+{
+ e->size_ = size;
+ e->size_valid = e->size_ == size;
+}
+
#endif
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* Re: [PATCH v4 09/11] pack-objects: shrink size field in struct object_entry
2018-03-16 18:31 ` [PATCH v4 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-16 19:49 ` Junio C Hamano
2018-03-16 21:34 ` Junio C Hamano
0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 19:49 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> It's very rare that an uncompressed object is larger than 4GB
> (partly because Git does not handle such large files very well to
> begin with). Let's optimize for the common case where the object size
> is smaller than this limit.
>
> Shrink size field down to 32 bits [1] and one overflow bit. If the size
> is too large, we read it back from disk.
OK.
> Add two compare helpers that can take advantage of the overflow
> bit (e.g. if the file is 4GB+, chances are it's already larger than
> core.bigFileThreshold and there's no point in comparing the actual
> value).
I had trouble reading the callers of these helpers.
> +static inline int oe_size_less_than(const struct object_entry *e,
> + unsigned long limit)
> +{
> + if (e->size_valid)
> + return e->size_ < limit;
> + if (limit > maximum_unsigned_value_of_type(uint32_t))
> + return 1;
When size_valid bit is false, that means that the size is larger
than 4GB. If "limit" is larger than 4GB, then we do not know
anything, no? I'd understand if this "optimization" were
if (limit < 4GB) {
/*
* we know e whose size won't fit in 4GB is larger
* than that!
*/
return 0;
}
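To make the reasoning above concrete, here is a self-contained sketch of both compare helpers with the shortcut applied in the direction described. It is not the code from the series: `struct sized_entry`, `slow_size()` and the `real_size` field (which stands in for reading the size back from disk) are assumptions for illustration, and `uint64_t` is used only so the example behaves the same regardless of the width of `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry: real_size stands in for the on-disk lookup. */
struct sized_entry {
	uint32_t size_;		/* meaningful only if size_valid is set */
	unsigned size_valid:1;
	uint64_t real_size;	/* what a slow lookup would return */
};

static uint64_t slow_size(const struct sized_entry *e)
{
	return e->real_size;
}

static void set_size(struct sized_entry *e, uint64_t size)
{
	e->real_size = size;
	e->size_ = (uint32_t)size;
	e->size_valid = (e->size_ == size);
	/* An invalid cached size implies the value did not fit in 32 bits. */
	assert(e->size_valid || size > UINT32_MAX);
}

static int size_less_than(const struct sized_entry *e, uint64_t limit)
{
	if (e->size_valid)
		return e->size_ < limit;
	/* Overflowed, so size > UINT32_MAX: it cannot be below a small limit. */
	if (limit <= UINT32_MAX)
		return 0;
	return slow_size(e) < limit;
}

static int size_greater_than(const struct sized_entry *e, uint64_t limit)
{
	if (e->size_valid)
		return e->size_ > limit;
	/* Overflowed, so size > UINT32_MAX >= limit. */
	if (limit <= UINT32_MAX)
		return 1;
	return slow_size(e) > limit;
}
```

In both helpers the overflow bit can only answer the question when the limit itself fits in 32 bits; for larger limits the real size has to be fetched.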
> + return oe_size(e) < limit;
> +}
Also, don't we want to use uintmax_t throughout the callchain? How
would the code in this series work when your ulong is 32-bit?
> +
> +static inline int oe_size_greater_than(const struct object_entry *e,
> + unsigned long limit)
> +{
> + if (e->size_valid)
> + return e->size_ > limit;
> + if (limit <= maximum_unsigned_value_of_type(uint32_t))
> + return 1;
> + return oe_size(e) > limit;
> +}
> +
> +static inline void oe_set_size(struct object_entry *e,
> + unsigned long size)
> +{
> + e->size_ = size;
> + e->size_valid = e->size_ == size;
> +}
> +
> #endif
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v4 09/11] pack-objects: shrink size field in struct object_entry
2018-03-16 19:49 ` Junio C Hamano
@ 2018-03-16 21:34 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 21:34 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Junio C Hamano <gitster@pobox.com> writes:
> Also, don't we want to use uintmax_t throughout the callchain? How
> would the code in this series work when your ulong is 32-bit?
My own answer to this question is "no conversion to uintmax_t, at
least not in this series." As long as the original code uses
"unsigned long", this series also should, I think.
^ permalink raw reply [flat|nested] 273+ messages in thread
* [PATCH v4 10/11] pack-objects: shrink delta_size field in struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (8 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:31 ` Nguyễn Thái Ngọc Duy
2018-03-16 18:32 ` [PATCH v4 11/11] pack-objects.h: reorder members to shrink " Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
11 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:31 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Allowing a delta size of 64 bits is crazy. Shrink this field down to
31 bits with one overflow bit.
If we find an existing delta larger than 2GB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
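The fallback described above can be sketched as follows. This is an illustration only: `struct delta_entry`, `get_delta_size()` and `set_delta_size()` are hypothetical names, and the `object_size` field is an assumed stand-in for what oe_size() would return.

```c
#include <assert.h>
#include <stdint.h>

#define DELTA_SIZE_BITS 31	/* mirrors OE_DELTA_SIZE_BITS */

/* Hypothetical entry; object_size stands in for oe_size(). */
struct delta_entry {
	unsigned delta_size_:DELTA_SIZE_BITS;
	unsigned delta_size_valid:1;
	uint64_t object_size;
};

static uint64_t get_delta_size(const struct delta_entry *e)
{
	if (e->delta_size_valid)
		return e->delta_size_;
	/* Too large to cache: fall back to the object size. */
	return e->object_size;
}

static void set_delta_size(struct delta_entry *e, uint64_t size)
{
	e->delta_size_ = (unsigned)(size & ((1u << DELTA_SIZE_BITS) - 1));
	e->delta_size_valid = (e->delta_size_ == size);
	/* If the value did not fit, the fallback must yield the same number. */
	assert(e->delta_size_valid || size == e->object_size);
}
```

The assertion plays the role of the "BUG" check in the patch: an oversized delta size is only acceptable when it equals the entry size, because that is the only value the getter can recover.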
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 24 ++++++++++++++----------
pack-objects.h | 23 ++++++++++++++++++++++-
2 files changed, 36 insertions(+), 11 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 14aa4acd50..c388d87c3e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,10 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA_SIZE(obj) oe_delta_size(&to_pack, obj)
#define DELTA(obj) oe_delta(&to_pack, obj)
#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(&to_pack, obj, val)
#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
@@ -140,7 +142,7 @@ static void *get_delta(struct object_entry *entry)
oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
- if (!delta_buf || delta_size != entry->delta_size)
+ if (!delta_buf || delta_size != DELTA_SIZE(entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -291,14 +293,14 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -1509,7 +1511,7 @@ static void check_object(struct object_entry *entry)
*/
oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
- entry->delta_size = oe_size(entry);
+ SET_DELTA_SIZE(entry, oe_size(entry));
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1895,7 +1897,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
- max_size = trg_entry->delta_size;
+ max_size = DELTA_SIZE(trg_entry);
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
@@ -1966,10 +1968,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
+ if (delta_size >= maximum_unsigned_value_of_type(uint32_t))
+ return 0;
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
- if (delta_size == trg_entry->delta_size &&
+ if (delta_size == DELTA_SIZE(trg_entry) &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
@@ -1984,7 +1988,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
- delta_cache_size -= trg_entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(trg_entry);
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
@@ -1997,7 +2001,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
}
SET_DELTA(trg_entry, src_entry);
- trg_entry->delta_size = delta_size;
+ SET_DELTA_SIZE(trg_entry, delta_size);
trg->depth = src->depth + 1;
return 1;
@@ -2120,11 +2124,11 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
if (entry->delta_data && !pack_to_stdout) {
unsigned long size;
- size = do_compress(&entry->delta_data, entry->delta_size);
+ size = do_compress(&entry->delta_data, DELTA_SIZE(entry));
entry->z_delta_size = size;
if (entry->z_delta_size == size) {
cache_lock();
- delta_cache_size -= entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(entry);
delta_cache_size += entry->z_delta_size;
cache_unlock();
} else {
diff --git a/pack-objects.h b/pack-objects.h
index 5c7e15ca92..f430d938c6 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -5,6 +5,7 @@
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
#define OE_Z_DELTA_BITS 16
+#define OE_DELTA_SIZE_BITS 31
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -80,7 +81,8 @@ struct object_entry {
* uses the same base as me
*/
void *delta_data; /* cached delta (uncompressed) */
- unsigned long delta_size; /* delta data size (uncompressed) */
+ uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+ uint32_t delta_size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
@@ -298,4 +300,23 @@ static inline void oe_set_size(struct object_entry *e,
e->size_valid = e->size_ == size;
}
+static inline unsigned long oe_delta_size(struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_size_valid)
+ return e->delta_size_;
+ return oe_size(e);
+}
+
+static inline void oe_set_delta_size(struct packing_data *pack,
+ struct object_entry *e,
+ unsigned long size)
+{
+ e->delta_size_ = size;
+ e->delta_size_valid = e->delta_size_ == size;
+ if (!e->delta_size_valid && size != oe_size(e))
+ die("BUG: this can only happen in check_object() "
+ "where delta size is the same as entry size");
+}
+
#endif
--
2.16.2.903.gd04caf5039
^ permalink raw reply related [flat|nested] 273+ messages in thread
* [PATCH v4 11/11] pack-objects.h: reorder members to shrink struct object_entry
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (9 preceding siblings ...)
2018-03-16 18:31 ` [PATCH v4 10/11] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
@ 2018-03-16 18:32 ` Nguyễn Thái Ngọc Duy
2018-03-16 21:02 ` Junio C Hamano
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
11 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-16 18:32 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes, before any field shrinking is done) with 16 bits to
spare (and a couple more in in_pack_header_size when we really run out
of bits).
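The effect of member ordering on struct size can be seen with a toy example. This is a sketch with made-up structs (`struct scattered`, `struct reordered`), not the real object_entry layout, and the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with large ones force padding holes. */
struct scattered {
	unsigned char a;
	uint64_t b;
	unsigned char c;
	uint32_t d;
	unsigned char e;
};

/* Same members, largest alignment first, small ones grouped at the end. */
struct reordered {
	uint64_t b;
	uint32_t d;
	unsigned char a;
	unsigned char c;
	unsigned char e;
};

/* Bytes saved per entry by the reordering on this ABI. */
static size_t reorder_savings(void)
{
	assert(sizeof(struct reordered) <= sizeof(struct scattered));
	return sizeof(struct scattered) - sizeof(struct reordered);
}
```

On a typical x86-64 Linux ABI the first layout takes 32 bytes and the second 16; pahole reports the same kind of holes for the real struct.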
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/pack-objects.h b/pack-objects.h
index f430d938c6..0fa0c83294 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -70,35 +70,36 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- /* object uncompressed size _if_ size_valid is true */
- uint32_t size_;
- unsigned size_valid:1;
- unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+ void *delta_data; /* cached delta (uncompressed) */
off_t in_pack_offset;
+ uint32_t hash; /* name hint hash */
+ uint32_t size_; /* object uncompressed size _if_ size_valid is true */
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who
* uses the same base as me
*/
- void *delta_data; /* cached delta (uncompressed) */
uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
uint32_t delta_size_valid:1;
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+ unsigned size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
+ unsigned type_valid:1;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
- unsigned type_valid:1;
- uint32_t hash; /* name hint hash */
- unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
unsigned no_try_delta:1;
+ unsigned char in_pack_header_size;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
unsigned depth:OE_DEPTH_BITS;
+
+ /* size: 80, bit_padding: 16 bits */
};
struct packing_data {
--
2.16.2.903.gd04caf5039
* Re: [PATCH v4 11/11] pack-objects.h: reorder members to shrink struct object_entry
2018-03-16 18:32 ` [PATCH v4 11/11] pack-objects.h: reorder members to shrink " Nguyễn Thái Ngọc Duy
@ 2018-03-16 21:02 ` Junio C Hamano
2018-03-17 12:07 ` Duy Nguyen
0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-16 21:02 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Previous patches leave lots of holes and padding in this struct. This
> patch reorders the members and shrinks the struct down to 80 bytes
> (from 136 bytes, before any field shrinking is done) with 16 bits to
> spare (and a couple more in in_pack_header_size when we really run out
> of bits).
Nice.
I am wondering if we need some conditional code for 32-bit platform.
For example, you have uint32_t field and do things like this:
static inline int oe_size_less_than(const struct object_entry *e,
unsigned long limit)
{
if (e->size_valid)
return e->size_ < limit;
if (limit > maximum_unsigned_value_of_type(uint32_t))
return 1;
return oe_size(e) < limit;
}
Do we and compilers do the right thing when your ulong is uint32_t?
* Re: [PATCH v4 11/11] pack-objects.h: reorder members to shrink struct object_entry
2018-03-16 21:02 ` Junio C Hamano
@ 2018-03-17 12:07 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-17 12:07 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Fri, Mar 16, 2018 at 10:02 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> Previous patches leave lots of holes and padding in this struct. This
>> patch reorders the members and shrinks the struct down to 80 bytes
>> (from 136 bytes, before any field shrinking is done) with 16 bits to
>> spare (and a couple more in in_pack_header_size when we really run out
>> of bits).
>
> Nice.
>
> I am wondering if we need some conditional code for 32-bit platform.
> For example, you have uint32_t field and do things like this:
>
> static inline int oe_size_less_than(const struct object_entry *e,
> unsigned long limit)
> {
> if (e->size_valid)
> return e->size_ < limit;
> if (limit > maximum_unsigned_value_of_type(uint32_t))
> return 1;
> return oe_size(e) < limit;
> }
>
> Do we and compilers do the right thing when your ulong is uint32_t?
Another good point. My 32-bit build does complain:
In file included from builtin/pack-objects.c:20:0:
./pack-objects.h: In function 'oe_size_less_than':
./pack-objects.h:282:12: error: comparison is always false due to
limited range of data type [-Werror=type-limits]
if (limit > maximum_unsigned_value_of_type(uint32_t))
^
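
One way to avoid comparing against a constant that is out of range when unsigned long is 32 bits wide is a truncation round trip. A minimal sketch, assuming a hypothetical helper name fits_in_uint32():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if "value" survives a round trip through uint32_t.
 * When unsigned long is itself 32 bits wide this is always 1, and
 * no out-of-range constant appears in the comparison.
 */
static inline int fits_in_uint32(unsigned long value)
{
	uint32_t truncated = (uint32_t)value;

	return value == truncated;
}
```

The result depends only on the run-time value, so the same source should build on both 32-bit and 64-bit platforms without special casing.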
--
Duy
* [PATCH v5 00/11] nd/pack-objects-pack-struct updates
2018-03-16 18:31 ` [PATCH v4 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (10 preceding siblings ...)
2018-03-16 18:32 ` [PATCH v4 11/11] pack-objects.h: reorder members to shrink " Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
` (12 more replies)
11 siblings, 13 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The v5 changes are small enough that the interdiff is pretty
self-explanatory (there are also a couple of commit message updates).
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c388d87c3e..fb2aba80bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1611,7 +1611,7 @@ static void drop_reused_delta(struct object_entry *entry)
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
- * And if that fails, the error will be recorded in entry->type
+ * And if that fails, the error will be recorded in oe_type(entry)
* and dealt with in prepare_pack().
*/
oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
@@ -1968,7 +1968,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
- if (delta_size >= maximum_unsigned_value_of_type(uint32_t))
+ if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
return 0;
if (DELTA(trg_entry)) {
@@ -2125,8 +2125,8 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
unsigned long size;
size = do_compress(&entry->delta_data, DELTA_SIZE(entry));
- entry->z_delta_size = size;
- if (entry->z_delta_size == size) {
+ if (size < (1 << OE_Z_DELTA_BITS)) {
+ entry->z_delta_size = size;
cache_lock();
delta_cache_size -= DELTA_SIZE(entry);
delta_cache_size += entry->z_delta_size;
diff --git a/pack-objects.h b/pack-objects.h
index 0fa0c83294..8979289f5f 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -27,14 +27,15 @@ enum dfs_state {
*
* basic object info
* -----------------
- * idx.oid is filled up before delta searching starts. idx.crc32 and
- * is only valid after the object is written out and will be used for
+ * idx.oid is filled up before delta searching starts. idx.crc32 is
+ * only valid after the object is written out and will be used for
* generating the index. idx.offset will be both gradually set and
* used in writing phase (base objects get offset first, then deltas
* refer to them)
*
- * "size" is the uncompressed object size. Compressed size is not
- * cached (ie. raw data in a pack) but available via revindex.
+ * "size" is the uncompressed object size. Compressed size of the raw
+ * data for an object in a pack is not stored anywhere but is computed
+ * and made available when reverse .idx is made.
*
* "hash" contains a path name hash which is used for sorting the
* delta list and also during delta searching. Once prepare_pack()
@@ -42,16 +43,16 @@ enum dfs_state {
*
* source pack info
* ----------------
- * The (in_pack, in_pack_offset, in_pack_header_size) tuple contains
- * the location of the object in the source pack, with or without
- * header.
+ * The (in_pack, in_pack_offset) tuple contains the location of the
+ * object in the source pack. in_pack_header_size allows quickly
+ * skipping the header and going straight to the zlib stream.
*
* "type" and "in_pack_type" both describe object type. in_pack_type
* may contain a delta type, while type is always the canonical type.
*
* deltas
* ------
- * Delta links (delta, delta_child and delta_sibling) are created
+ * Delta links (delta, delta_child and delta_sibling) are created to
* reflect that delta graph from the source pack then updated or added
* during delta searching phase when we find better deltas.
*
@@ -59,7 +60,7 @@ enum dfs_state {
* compute_write_order(). "delta" and "delta_size" must remain valid
* at object writing phase in case the delta is not cached.
*
- * If a delta is cached in memory and is compressed delta_data points
+ * If a delta is cached in memory and is compressed, delta_data points
* to the data and z_delta_size contains the compressed size. If it's
* uncompressed [1], z_delta_size must be zero. delta_size is always
* the uncompressed size and must be valid even if the delta is not
@@ -274,12 +275,19 @@ static inline unsigned long oe_size(const struct object_entry *e)
}
}
+static inline int contains_in_32bits(unsigned long limit)
+{
+ uint32_t truncated_limit = (uint32_t)limit;
+
+ return limit == truncated_limit;
+}
+
static inline int oe_size_less_than(const struct object_entry *e,
unsigned long limit)
{
if (e->size_valid)
return e->size_ < limit;
- if (limit > maximum_unsigned_value_of_type(uint32_t))
+ if (contains_in_32bits(limit))
return 1;
return oe_size(e) < limit;
}
@@ -289,8 +297,8 @@ static inline int oe_size_greater_than(const struct object_entry *e,
{
if (e->size_valid)
return e->size_ > limit;
- if (limit <= maximum_unsigned_value_of_type(uint32_t))
- return 1;
+ if (contains_in_32bits(limit))
+ return 0;
return oe_size(e) > limit;
}
@@ -314,7 +322,7 @@ static inline void oe_set_delta_size(struct packing_data *pack,
unsigned long size)
{
e->delta_size_ = size;
- e->delta_size_valid =e->delta_size_ == size;
+ e->delta_size_valid = e->delta_size_ == size;
if (!e->delta_size_valid && size != oe_size(e))
die("BUG: this can only happen in check_object() "
"where delta size is the same as entry size");
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 01/11] pack-objects: a bit of document about struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
` (11 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/pack-objects.h b/pack-objects.h
index 03f1191659..de91edd264 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,51 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+/*
+ * basic object info
+ * -----------------
+ * idx.oid is filled up before delta searching starts. idx.crc32 is
+ * only valid after the object is written out and will be used for
+ * generating the index. idx.offset will be both gradually set and
+ * used in writing phase (base objects get offset first, then deltas
+ * refer to them)
+ *
+ * "size" is the uncompressed object size. Compressed size of the raw
+ * data for an object in a pack is not stored anywhere but is computed
+ * and made available when reverse .idx is made.
+ *
+ * "hash" contains a path name hash which is used for sorting the
+ * delta list and also during delta searching. Once prepare_pack()
+ * returns it's no longer needed.
+ *
+ * source pack info
+ * ----------------
+ * The (in_pack, in_pack_offset) tuple contains the location of the
+ * object in the source pack. in_pack_header_size allows quickly
+ * skipping the header and going straight to the zlib stream.
+ *
+ * "type" and "in_pack_type" both describe object type. in_pack_type
+ * may contain a delta type, while type is always the canonical type.
+ *
+ * deltas
+ * ------
+ * Delta links (delta, delta_child and delta_sibling) are created to
+ * reflect that delta graph from the source pack then updated or added
+ * during delta searching phase when we find better deltas.
+ *
+ * delta_child and delta_sibling are last needed in
+ * compute_write_order(). "delta" and "delta_size" must remain valid
+ * at object writing phase in case the delta is not cached.
+ *
+ * If a delta is cached in memory and is compressed, delta_data points
+ * to the data and z_delta_size contains the compressed size. If it's
+ * uncompressed [1], z_delta_size must be zero. delta_size is always
+ * the uncompressed size and must be valid even if the delta is not
+ * cached.
+ *
+ * [1] during try_delta phase we don't bother with compressing because
+ * the delta could be quickly replaced with a better one.
+ */
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 02/11] pack-objects: turn type and in_pack_type to bitfields
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 03/11] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
` (10 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as a "valid" type: the function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry(), where the current code skips
setting the "type" field if the incoming type is zero. This does not
have any bad side effects because the "type" field should be
memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it has
been for a very long time and I haven't had time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
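
The interaction between the 3-bit type field and the separate valid flag can be sketched in isolation. "struct entry", entry_type() and entry_set_type() below are hypothetical stand-ins for the real struct and accessors, and the enum is a reduced copy kept only for illustration:

```c
#include <assert.h>

#define TYPE_BITS 3

/* Reduced copy of the object types, for illustration only. */
enum object_type {
	OBJ_BAD = -1,
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/* Hypothetical stand-in for the relevant part of struct object_entry. */
struct entry {
	unsigned type_:TYPE_BITS;
	unsigned type_valid:1;
};

/* An unsigned bitfield cannot hold -1, so the flag carries OBJ_BAD. */
static enum object_type entry_type(const struct entry *e)
{
	return e->type_valid ? (enum object_type)e->type_ : OBJ_BAD;
}

static void entry_set_type(struct entry *e, enum object_type type)
{
	e->type_valid = type >= OBJ_NONE;
	e->type_ = (unsigned)type;
}
```

A zero-initialized entry that never goes through the setter reads back as OBJ_BAD rather than OBJ_NONE, which is the behavior change described above.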
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 60 ++++++++++++++++++++++++------------------
cache.h | 2 ++
object.h | 1 -
pack-bitmap-write.c | 6 ++---
pack-objects.h | 20 ++++++++++++--
5 files changed, 58 insertions(+), 31 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5c674b2843..647c01ea34 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -265,7 +265,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
struct git_istream *st = NULL;
if (!usable_delta) {
- if (entry->type == OBJ_BLOB &&
+ if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
- enum object_type type = entry->type;
+ enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0; /* explicit */
else if (!entry->in_pack)
to_reuse = 0; /* can't reuse what we don't have */
- else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+ else if (oe_type(entry) == OBJ_REF_DELTA ||
+ oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
- else if (entry->type != entry->in_pack_type)
+ else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0; /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_COMMIT &&
- objects[i].type != OBJ_TAG)
+ if (oe_type(&objects[i]) != OBJ_COMMIT &&
+ oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_TREE)
+ if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -1066,8 +1067,7 @@ static void create_object_entry(const struct object_id *oid,
entry = packlist_alloc(&to_pack, oid->hash, index_pos);
entry->hash = hash;
- if (type)
- entry->type = type;
+ oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry)
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
+ enum object_type type;
buf = use_pack(p, &w_curs, entry->in_pack_offset, &avail);
@@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry)
* since non-delta representations could still be reused.
*/
used = unpack_object_header_buffer(buf, avail,
- &entry->in_pack_type,
+ &type,
&entry->size);
if (used == 0)
goto give_up;
+ if (type < 0)
+ die("BUG: invalid type %d", type);
+ entry->in_pack_type = type;
+
/*
* Determine if this is a delta and if so whether we can
* reuse it or not. Otherwise let's find out as cheaply as
@@ -1428,9 +1433,9 @@ static void check_object(struct object_entry *entry)
switch (entry->in_pack_type) {
default:
/* Not a delta hence we've already got all we need. */
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->in_pack_header_size = used;
- if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
+ if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > OBJ_BLOB)
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1484,7 +1489,7 @@ static void check_object(struct object_entry *entry)
* deltify other objects against, in order to avoid
* circular deltas.
*/
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->delta = base_entry;
entry->delta_size = entry->size;
entry->delta_sibling = base_entry->delta_child;
@@ -1493,7 +1498,7 @@ static void check_object(struct object_entry *entry)
return;
}
- if (entry->type) {
+ if (oe_type(entry)) {
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
@@ -1516,7 +1521,7 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- entry->type = sha1_object_info(entry->idx.oid.hash, &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
@@ -1559,6 +1564,7 @@ static void drop_reused_delta(struct object_entry *entry)
{
struct object_entry **p = &entry->delta->delta_child;
struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type type;
while (*p) {
if (*p == entry)
@@ -1570,16 +1576,18 @@ static void drop_reused_delta(struct object_entry *entry)
entry->depth = 0;
oi.sizep = &entry->size;
- oi.typep = &entry->type;
+ oi.typep = &type;
if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
- * And if that fails, the error will be recorded in entry->type
+ * And if that fails, the error will be recorded in oe_type(entry)
* and dealt with in prepare_pack().
*/
- entry->type = sha1_object_info(entry->idx.oid.hash,
- &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
+ &entry->size));
+ } else {
+ oe_set_type(entry, type);
}
}
@@ -1747,10 +1755,12 @@ static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ enum object_type a_type = oe_type(a);
+ enum object_type b_type = oe_type(b);
- if (a->type > b->type)
+ if (a_type > b_type)
return -1;
- if (a->type < b->type)
+ if (a_type < b_type)
return 1;
if (a->hash > b->hash)
return -1;
@@ -1826,7 +1836,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
void *delta_buf;
/* Don't bother doing diffs between different types */
- if (trg_entry->type != src_entry->type)
+ if (oe_type(trg_entry) != oe_type(src_entry))
return -1;
/*
@@ -2432,11 +2442,11 @@ static void prepare_pack(int window, int depth)
if (!entry->preferred_base) {
nr_deltas++;
- if (entry->type < 0)
+ if (oe_type(entry) < 0)
die("unable to get type of object %s",
oid_to_hex(&entry->idx.oid));
} else {
- if (entry->type < 0) {
+ if (oe_type(entry) < 0) {
/*
* This object is not found, but we
* don't have to include it anyway.
@@ -2545,7 +2555,7 @@ static void read_object_list_from_stdin(void)
die("expected object ID, got garbage:\n %s", line);
add_preferred_base_object(p + 1);
- add_object_entry(&oid, 0, p + 1, 0);
+ add_object_entry(&oid, OBJ_NONE, p + 1, 0);
}
}
diff --git a/cache.h b/cache.h
index 21fbcc2414..862bdff83a 100644
--- a/cache.h
+++ b/cache.h
@@ -373,6 +373,8 @@ extern void free_name_hash(struct index_state *istate);
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
#endif
+#define TYPE_BITS 3
+
enum object_type {
OBJ_BAD = -1,
OBJ_NONE = 0,
diff --git a/object.h b/object.h
index 87563d9056..8ce294d6ec 100644
--- a/object.h
+++ b/object.h
@@ -25,7 +25,6 @@ struct object_array {
#define OBJECT_ARRAY_INIT { 0, 0, NULL }
-#define TYPE_BITS 3
/*
* object flag allocation:
* revision.h: 0---------10 26
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..fd11f08940 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -64,12 +64,12 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
entry->in_pack_pos = i;
- switch (entry->type) {
+ switch (oe_type(entry)) {
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
- real_type = entry->type;
+ real_type = oe_type(entry);
break;
default:
@@ -98,7 +98,7 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
default:
die("Missing type information for %s (%d/%d)",
oid_to_hex(&entry->idx.oid), real_type,
- entry->type);
+ oe_type(entry));
}
}
}
diff --git a/pack-objects.h b/pack-objects.h
index de91edd264..5f568b609c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -59,8 +59,9 @@ struct object_entry {
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
- enum object_type type;
- enum object_type in_pack_type; /* could be delta */
+ unsigned type_:TYPE_BITS;
+ unsigned in_pack_type:TYPE_BITS; /* could be delta */
+ unsigned type_valid:1;
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
@@ -123,4 +124,19 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}
+static inline enum object_type oe_type(const struct object_entry *e)
+{
+ return e->type_valid ? e->type_ : OBJ_BAD;
+}
+
+static inline void oe_set_type(struct object_entry *e,
+ enum object_type type)
+{
+ if (type >= OBJ_ANY)
+ die("BUG: OBJ_ANY cannot be set in pack-objects code");
+
+ e->type_valid = type >= OBJ_NONE;
+ e->type_ = (unsigned)type;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 03/11] pack-objects: use bitfield for object_entry::dfs_state
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
` (9 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 3 +++
pack-objects.h | 28 +++++++++++++++++-----------
2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 647c01ea34..83f8154865 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
OPT_END(),
};
+ if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
+ die("BUG: too many dfs states, increase OE_DFS_STATE_BITS");
+
check_replace_refs = 0;
reset_pack_idx_option(&pack_idx_opts);
diff --git a/pack-objects.h b/pack-objects.h
index 5f568b609c..4c6b73a4d6 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,21 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+#define OE_DFS_STATE_BITS 2
+
+/*
+ * State flags for depth-first search used for analyzing delta cycles.
+ *
+ * The depth is measured in delta-links to the base (so if A is a delta
+ * against B, then A has a depth of 1, and B a depth of 0).
+ */
+enum dfs_state {
+ DFS_NONE = 0,
+ DFS_ACTIVE,
+ DFS_DONE,
+ DFS_NUM_STATES
+};
+
/*
* basic object info
* -----------------
@@ -73,19 +88,10 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+ unsigned dfs_state:OE_DFS_STATE_BITS;
- /*
- * State flags for depth-first search used for analyzing delta cycles.
- *
- * The depth is measured in delta-links to the base (so if A is a delta
- * against B, then A has a depth of 1, and B a depth of 0).
- */
- enum {
- DFS_NONE = 0,
- DFS_ACTIVE,
- DFS_DONE
- } dfs_state;
int depth;
+
};
struct packing_data {
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 04/11] pack-objects: use bitfield for object_entry::depth
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 03/11] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 21:26 ` Ævar Arnfjörð Bjarmason
2018-03-17 14:10 ` [PATCH v5 05/11] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
` (8 subsequent siblings)
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Because of struct packing, from now on we can only handle a max depth
of 4095 (or even lower when new booleans are added to this struct). This
should be ok since long delta chains will cause significant slowdown
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 1 +
Documentation/git-pack-objects.txt | 4 +++-
Documentation/git-repack.txt | 4 +++-
builtin/pack-objects.c | 4 ++++
pack-objects.h | 5 ++---
5 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10c..9bd3f5a789 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2412,6 +2412,7 @@ pack.window::
pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+ Maximum value is 4095.
pack.windowMemory::
The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 81bc490ac5..3503c9e3e6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--window-memory=<n>::
This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--threads=<n>::
This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 83f8154865..829c80ffcc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
+ if (depth >= (1 << OE_DEPTH_BITS))
+ die(_("delta chain depth %d is greater than maximum limit %d"),
+ depth, (1 << OE_DEPTH_BITS));
+
argv_array_push(&rp, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
diff --git a/pack-objects.h b/pack-objects.h
index 4c6b73a4d6..a4d8d29c04 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,6 +2,7 @@
#define PACK_OBJECTS_H
#define OE_DFS_STATE_BITS 2
+#define OE_DEPTH_BITS 12
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -89,9 +90,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
-
- int depth;
-
+ unsigned depth:OE_DEPTH_BITS;
};
struct packing_data {
--
2.17.0.rc0.347.gf9cf61673a
* Re: [PATCH v5 04/11] pack-objects: use bitfield for object_entry::depth
2018-03-17 14:10 ` [PATCH v5 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
@ 2018-03-17 21:26 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-17 21:26 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: e, git, gitster, peff
On Sat, Mar 17 2018, Nguyễn Thái Ngọc Duy jotted:
> Because of struct packing from now on we can only handle max depth
> 4095
> [...]
> + if (depth >= (1 << OE_DEPTH_BITS))
> + die(_("delta chain depth %d is greater than maximum limit %d"),
> + depth, (1 << OE_DEPTH_BITS));
> +
This has an off-by-one error:
$ git repack --depth=4096
fatal: delta chain depth 4096 is greater than maximum limit 4096
Per the check we should be feeding `(1 << OE_DEPTH_BITS) - 1` to die().
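For illustration only (this is not from the patch), here is a minimal sketch of a check whose reported limit matches its condition. The OE_DEPTH_MAX macro and the depth_limit_exceeded() helper are hypothetical names, and the real code would call die() rather than return a value:

```c
#include <assert.h>

#define OE_DEPTH_BITS 12
/* Largest value an OE_DEPTH_BITS-wide unsigned bitfield can hold. */
#define OE_DEPTH_MAX ((1 << OE_DEPTH_BITS) - 1)

/*
 * Hypothetical helper: returns 0 when depth fits in the bitfield,
 * otherwise the limit that should be reported to the user.
 */
static int depth_limit_exceeded(int depth)
{
	assert(depth >= 0);
	return depth > OE_DEPTH_MAX ? OE_DEPTH_MAX : 0;
}
```

With this shape, --depth=4095 is accepted and --depth=4096 is rejected with a reported limit of 4095.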
* [PATCH v5 05/11] pack-objects: move in_pack_pos out of struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (3 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
` (7 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
This field is only needed for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used. Like objects[], this array is never freed.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
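Not part of the commit: a minimal sketch of the side-array idea described in the commit message, assuming simplified stand-in structs. The names struct entry, struct table, table_enable_pos(), set_pos() and get_pos() are hypothetical. The per-object value lives in a parallel array and is addressed by the entry's offset from the start of objects[]:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for git's structs; only the fields needed here. */
struct entry { int payload; };

struct table {
	struct entry *objects;
	unsigned int nr_objects;
	unsigned int *in_pack_pos;	/* NULL until the optional feature is used */
};

/* Allocate the side array only when the optional feature needs it. */
static void table_enable_pos(struct table *t)
{
	t->in_pack_pos = calloc(t->nr_objects, sizeof(*t->in_pack_pos));
	assert(t->in_pack_pos);
}

static void set_pos(struct table *t, const struct entry *e, unsigned int pos)
{
	assert(e >= t->objects && e < t->objects + t->nr_objects);
	t->in_pack_pos[e - t->objects] = pos;
}

static unsigned int get_pos(const struct table *t, const struct entry *e)
{
	assert(e >= t->objects && e < t->objects + t->nr_objects);
	return t->in_pack_pos[e - t->objects];
}
```

The trade-off is that every accessor needs the owning table in addition to the entry, which is why the patch threads to_pack through the bitmap writer.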
builtin/pack-objects.c | 3 ++-
pack-bitmap-write.c | 8 +++++---
pack-bitmap.c | 2 +-
pack-bitmap.h | 4 +++-
pack-objects.h | 16 +++++++++++++++-
5 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 829c80ffcc..727d200770 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -879,7 +879,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_set_checksum(oid.hash);
- bitmap_writer_build_type_index(written_list, nr_written);
+ bitmap_writer_build_type_index(
+ &to_pack, written_list, nr_written);
}
finish_tmp_packfile(&tmpname, pack_tmp_name,
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fd11f08940..f7c897515b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show)
/**
* Build the initial type index for the packfile
*/
-void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
uint32_t index_nr)
{
uint32_t i;
@@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
+ ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects);
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
- entry->in_pack_pos = i;
+ oe_set_in_pack_pos(to_pack, entry, i);
switch (oe_type(entry)) {
case OBJ_COMMIT:
@@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1)
"(object %s is missing)", sha1_to_hex(sha1));
}
- return entry->in_pack_pos;
+ return oe_in_pack_pos(writer.to_pack, entry);
}
static void show_object(struct object *object, const char *name, void *data)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9270983e5f..865d9ecc4e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping,
oe = packlist_find(mapping, sha1, NULL);
if (oe)
- reposition[i] = oe->in_pack_pos + 1;
+ reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
}
rebuild = bitmap_new();
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3742a00e14..5ded2f139a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bi
void bitmap_writer_show_progress(int show);
void bitmap_writer_set_checksum(unsigned char *sha1);
-void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr);
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
+ uint32_t index_nr);
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
diff --git a/pack-objects.h b/pack-objects.h
index a4d8d29c04..b832ee2b5e 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -79,7 +79,6 @@ struct object_entry {
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
uint32_t hash; /* name hint hash */
- unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
@@ -99,6 +98,8 @@ struct packing_data {
int32_t *index;
uint32_t index_size;
+
+ unsigned int *in_pack_pos;
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -144,4 +145,17 @@ static inline void oe_set_type(struct object_entry *e,
e->type_ = (unsigned)type;
}
+static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack_pos[e - pack->objects];
+}
+
+static inline void oe_set_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e,
+ unsigned int pos)
+{
+ pack->in_pack_pos[e - pack->objects] = pos;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 06/11] pack-objects: move in_pack out of struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (4 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 05/11] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
` (6 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Instead of using 8 bytes (on 64-bit arch) to store a pointer to a
pack, use an index, since the number of packs should be relatively
small.
This limits the number of packs we can handle to 16k. For now, if you
hit the 16k pack file limit, pack-objects will simply fail [1].
[1] The escape hatch is to use .keep files to keep the number of
non-kept pack files below the 16k limit. Then you can go for another
pack-objects run to combine another 16k pack files. Repeat until
you're satisfied.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
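Not part of the commit: a minimal sketch of replacing a per-object pointer with a small index into a table, assuming simplified stand-in structs. The names struct pack, struct entry, struct table, add_pack() and entry_pack() are hypothetical. Slot 0 is registered as NULL first, so a zero-initialized entry reads back as "not in any pack". Unlike the patch, this sketch returns -1 instead of calling die() when the table is full:

```c
#include <assert.h>
#include <stddef.h>

#define IN_PACK_BITS 14

struct pack { int index; };	/* stand-in for struct packed_git */

struct entry { unsigned in_pack_idx:IN_PACK_BITS; };

struct table {
	int count;
	struct pack *packs[1 << IN_PACK_BITS];
};

/* Registers p and returns its slot, or -1 when the table is full. */
static int add_pack(struct table *t, struct pack *p)
{
	if (t->count >= (1 << IN_PACK_BITS))
		return -1;
	if (p)
		p->index = t->count;
	t->packs[t->count] = p;
	return t->count++;
}

/* Maps an entry's small index back to the pack pointer it stands for. */
static struct pack *entry_pack(const struct table *t, const struct entry *e)
{
	assert(e->in_pack_idx < (unsigned)t->count);
	return t->packs[e->in_pack_idx];
}
```

The table itself costs (1 << IN_PACK_BITS) pointers once, while each entry shrinks from a full pointer to 14 bits.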
Documentation/git-pack-objects.txt | 9 ++++++
builtin/pack-objects.c | 40 +++++++++++++++++----------
cache.h | 1 +
pack-objects.h | 44 +++++++++++++++++++++++++++++-
4 files changed, 79 insertions(+), 15 deletions(-)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 3503c9e3e6..b8d936ccf5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -269,6 +269,15 @@ Unexpected missing object will raise an error.
locally created objects [without .promisor] and objects from the
promisor remote [with .promisor].) This is used with partial clone.
+LIMITATIONS
+-----------
+
+This command could only handle 16384 existing pack files at a time.
+If you have more than this, you need to exclude some pack files with
+".keep" file and --honor-pack-keep option, to combine 16k pack files
+in one, then remove these .keep files and run pack-objects one more
+time.
+
SEE ALSO
--------
linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 727d200770..eaf78fa41a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,6 +29,8 @@
#include "list.h"
#include "packfile.h"
+#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
N_("git pack-objects [<options>...] <base-name> [< <ref-list> | < <object-list>]"),
@@ -367,7 +369,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
{
- struct packed_git *p = entry->in_pack;
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
@@ -478,7 +480,7 @@ static off_t write_object(struct hashfile *f,
if (!reuse_object)
to_reuse = 0; /* explicit */
- else if (!entry->in_pack)
+ else if (!IN_PACK(entry))
to_reuse = 0; /* can't reuse what we don't have */
else if (oe_type(entry) == OBJ_REF_DELTA ||
oe_type(entry) == OBJ_OFS_DELTA)
@@ -1025,7 +1027,7 @@ static int want_object_in_pack(const struct object_id *oid,
if (*found_pack) {
want = want_found_object(exclude, *found_pack);
if (want != -1)
- return want;
+ goto done;
}
list_for_each(pos, &packed_git_mru) {
@@ -1048,11 +1050,16 @@ static int want_object_in_pack(const struct object_id *oid,
if (!exclude && want > 0)
list_move(&p->mru, &packed_git_mru);
if (want != -1)
- return want;
+ goto done;
}
}
- return 1;
+ want = 1;
+done:
+ if (want && *found_pack && !(*found_pack)->index)
+ oe_add_pack(&to_pack, *found_pack);
+
+ return want;
}
static void create_object_entry(const struct object_id *oid,
@@ -1074,7 +1081,7 @@ static void create_object_entry(const struct object_id *oid,
else
nr_result++;
if (found_pack) {
- entry->in_pack = found_pack;
+ oe_set_in_pack(entry, found_pack);
entry->in_pack_offset = found_offset;
}
@@ -1399,8 +1406,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
- if (entry->in_pack) {
- struct packed_git *p = entry->in_pack;
+ if (IN_PACK(entry)) {
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
@@ -1535,14 +1542,16 @@ static int pack_offset_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ const struct packed_git *a_in_pack = IN_PACK(a);
+ const struct packed_git *b_in_pack = IN_PACK(b);
/* avoid filesystem trashing with loose objects */
- if (!a->in_pack && !b->in_pack)
+ if (!a_in_pack && !b_in_pack)
return oidcmp(&a->idx.oid, &b->idx.oid);
- if (a->in_pack < b->in_pack)
+ if (a_in_pack < b_in_pack)
return -1;
- if (a->in_pack > b->in_pack)
+ if (a_in_pack > b_in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
@@ -1578,7 +1587,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &entry->size;
oi.typep = &type;
- if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
@@ -1848,8 +1857,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
* it, we will still save the transfer cost, as we already know
* the other side has it and we won't send src_entry at all.
*/
- if (reuse_delta && trg_entry->in_pack &&
- trg_entry->in_pack == src_entry->in_pack &&
+ if (reuse_delta && IN_PACK(trg_entry) &&
+ IN_PACK(trg_entry) == IN_PACK(src_entry) &&
!src_entry->preferred_base &&
trg_entry->in_pack_type != OBJ_REF_DELTA &&
trg_entry->in_pack_type != OBJ_OFS_DELTA)
@@ -3191,6 +3200,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
}
}
+ /* make sure IN_PACK(0) return NULL */
+ oe_add_pack(&to_pack, NULL);
+
if (progress)
progress_state = start_progress(_("Counting objects"), 0);
if (!use_internal_rev_list)
diff --git a/cache.h b/cache.h
index 862bdff83a..b90feb3802 100644
--- a/cache.h
+++ b/cache.h
@@ -1635,6 +1635,7 @@ extern struct packed_git {
int index_version;
time_t mtime;
int pack_fd;
+ int index; /* for builtin/pack-objects.c */
unsigned pack_local:1,
pack_keep:1,
freshened:1,
diff --git a/pack-objects.h b/pack-objects.h
index b832ee2b5e..933f71a86b 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -3,6 +3,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
+#define OE_IN_PACK_BITS 14
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -18,6 +19,10 @@ enum dfs_state {
};
/*
+ * The size of struct nearly determines pack-objects's memory
+ * consumption. This struct is packed tight for that reason. When you
+ * add or reorder something in this struct, think a bit about this.
+ *
* basic object info
* -----------------
* idx.oid is filled up before delta searching starts. idx.crc32 is
@@ -65,7 +70,7 @@ enum dfs_state {
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
- struct packed_git *in_pack; /* already in pack */
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
@@ -100,6 +105,8 @@ struct packing_data {
uint32_t index_size;
unsigned int *in_pack_pos;
+ int in_pack_count;
+ struct packed_git *in_pack[1 << OE_IN_PACK_BITS];
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -158,4 +165,39 @@ static inline void oe_set_in_pack_pos(const struct packing_data *pack,
pack->in_pack_pos[e - pack->objects] = pos;
}
+static inline unsigned int oe_add_pack(struct packing_data *pack,
+ struct packed_git *p)
+{
+ if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
+ die(_("too many packs to handle in one go. "
+ "Please add .keep files to exclude\n"
+ "some pack files and keep the number "
+ "of non-kept files below %d."),
+ 1 << OE_IN_PACK_BITS);
+ if (p) {
+ if (p->index > 0)
+ die("BUG: this packed is already indexed");
+ p->index = pack->in_pack_count;
+ }
+ pack->in_pack[pack->in_pack_count] = p;
+ return pack->in_pack_count++;
+}
+
+static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack[e->in_pack_idx];
+
+}
+
+static inline void oe_set_in_pack(struct object_entry *e,
+ struct packed_git *p)
+{
+ if (p->index <= 0)
+ die("BUG: found_pack should be NULL "
+ "instead of having non-positive index");
+ e->in_pack_idx = p->index;
+
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 07/11] pack-objects: refer to delta objects by index instead of pointer
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (5 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
` (5 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
These delta pointers always point to elements in the objects[] array
in the packing_data struct. We can hold at most 4G of those objects
because nr_objects, the array size, is uint32_t. We can therefore use
uint32_t indexes to address these elements instead of pointers. On
64-bit architectures (8 bytes per pointer) this saves 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
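Not part of the commit: a minimal sketch of the "index shifted by one" encoding, assuming a simplified stand-in struct. The names struct entry, get_delta() and set_delta() are hypothetical. Index 0 is reserved for NULL, so element i of objects[] is stored as i + 1. Without the shift, a base at objects[0] would be indistinguishable from "no base":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct entry { uint32_t delta_idx; };	/* 0 means "no delta base" */

/* Decode: 0 is NULL, any other value k refers to objects[k - 1]. */
static struct entry *get_delta(struct entry *objects, const struct entry *e)
{
	return e->delta_idx ? &objects[e->delta_idx - 1] : NULL;
}

/* Encode: store the array position plus one, or 0 for NULL. */
static void set_delta(struct entry *objects, struct entry *e, struct entry *base)
{
	assert(!base || base >= objects);
	e->delta_idx = base ? (uint32_t)(base - objects) + 1 : 0;
}
```

Because the stored value is an array position rather than an address, it stays valid even if objects[] is moved by realloc(), as long as callers re-derive pointers through the accessor.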
builtin/pack-objects.c | 116 ++++++++++++++++++++++-------------------
pack-objects.h | 67 ++++++++++++++++++++++--
2 files changed, 124 insertions(+), 59 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index eaf78fa41a..379bd1ab92 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA(obj) oe_delta(&to_pack, obj)
+#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
+#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
+#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
+#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -127,11 +133,11 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, &type, &size);
if (!buf)
die("unable to read %s", oid_to_hex(&entry->idx.oid));
- base_buf = read_sha1_file(entry->delta->idx.oid.hash, &type,
+ base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, &type,
&base_size);
if (!base_buf)
die("unable to read %s",
- oid_to_hex(&entry->delta->idx.oid));
+ oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
if (!delta_buf || delta_size != entry->delta_size)
@@ -288,12 +294,12 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -317,7 +323,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
* encoding of the relative offset for the delta
* base from this object's position in the pack.
*/
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -343,7 +349,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
@@ -379,8 +385,8 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
- if (entry->delta)
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ if (DELTA(entry))
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
type, entry->size);
@@ -408,7 +414,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
}
if (type == OBJ_OFS_DELTA) {
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -427,7 +433,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
reused_delta++;
} else {
@@ -467,13 +473,13 @@ static off_t write_object(struct hashfile *f,
else
limit = pack_size_limit - write_offset;
- if (!entry->delta)
+ if (!DELTA(entry))
usable_delta = 0; /* no delta */
else if (!pack_size_limit)
usable_delta = 1; /* unlimited packfile */
- else if (entry->delta->idx.offset == (off_t)-1)
+ else if (DELTA(entry)->idx.offset == (off_t)-1)
usable_delta = 0; /* base was written to another pack */
- else if (entry->delta->idx.offset)
+ else if (DELTA(entry)->idx.offset)
usable_delta = 1; /* base already exists in this pack */
else
usable_delta = 0; /* base could end up in another pack */
@@ -489,7 +495,7 @@ static off_t write_object(struct hashfile *f,
/* ... but pack split may override that */
else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
- else if (entry->delta)
+ else if (DELTA(entry))
to_reuse = 0; /* we want to pack afresh */
else
to_reuse = 1; /* we have it in-pack undeltified,
@@ -541,12 +547,12 @@ static enum write_one_status write_one(struct hashfile *f,
}
/* if we are deltified, write out base object first. */
- if (e->delta) {
+ if (DELTA(e)) {
e->idx.offset = 1; /* now recurse */
- switch (write_one(f, e->delta, offset)) {
+ switch (write_one(f, DELTA(e), offset)) {
case WRITE_ONE_RECURSIVE:
/* we cannot depend on this one */
- e->delta = NULL;
+ SET_DELTA(e, NULL);
break;
default:
break;
@@ -608,34 +614,34 @@ static void add_descendants_to_write_order(struct object_entry **wo,
/* add this node... */
add_to_write_order(wo, endp, e);
/* all its siblings... */
- for (s = e->delta_sibling; s; s = s->delta_sibling) {
+ for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) {
add_to_write_order(wo, endp, s);
}
}
/* drop down a level to add left subtree nodes if possible */
- if (e->delta_child) {
+ if (DELTA_CHILD(e)) {
add_to_order = 1;
- e = e->delta_child;
+ e = DELTA_CHILD(e);
} else {
add_to_order = 0;
/* our sibling might have some children, it is next */
- if (e->delta_sibling) {
- e = e->delta_sibling;
+ if (DELTA_SIBLING(e)) {
+ e = DELTA_SIBLING(e);
continue;
}
/* go back to our parent node */
- e = e->delta;
- while (e && !e->delta_sibling) {
+ e = DELTA(e);
+ while (e && !DELTA_SIBLING(e)) {
/* we're on the right side of a subtree, keep
* going up until we can go right again */
- e = e->delta;
+ e = DELTA(e);
}
if (!e) {
/* done- we hit our original root node */
return;
}
/* pass it off to sibling at this level */
- e = e->delta_sibling;
+ e = DELTA_SIBLING(e);
}
};
}
@@ -646,7 +652,7 @@ static void add_family_to_write_order(struct object_entry **wo,
{
struct object_entry *root;
- for (root = e; root->delta; root = root->delta)
+ for (root = e; DELTA(root); root = DELTA(root))
; /* nothing */
add_descendants_to_write_order(wo, endp, root);
}
@@ -661,8 +667,8 @@ static struct object_entry **compute_write_order(void)
for (i = 0; i < to_pack.nr_objects; i++) {
objects[i].tagged = 0;
objects[i].filled = 0;
- objects[i].delta_child = NULL;
- objects[i].delta_sibling = NULL;
+ SET_DELTA_CHILD(&objects[i], NULL);
+ SET_DELTA_SIBLING(&objects[i], NULL);
}
/*
@@ -672,11 +678,11 @@ static struct object_entry **compute_write_order(void)
*/
for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
- if (!e->delta)
+ if (!DELTA(e))
continue;
/* Mark me as the first child */
- e->delta_sibling = e->delta->delta_child;
- e->delta->delta_child = e;
+ e->delta_sibling_idx = DELTA(e)->delta_child_idx;
+ SET_DELTA_CHILD(DELTA(e), e);
}
/*
@@ -1498,10 +1504,10 @@ static void check_object(struct object_entry *entry)
* circular deltas.
*/
oe_set_type(entry, entry->in_pack_type);
- entry->delta = base_entry;
+ SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
- entry->delta_sibling = base_entry->delta_child;
- base_entry->delta_child = entry;
+ entry->delta_sibling_idx = base_entry->delta_child_idx;
+ SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
return;
}
@@ -1572,17 +1578,19 @@ static int pack_offset_sort(const void *_a, const void *_b)
*/
static void drop_reused_delta(struct object_entry *entry)
{
- struct object_entry **p = &entry->delta->delta_child;
+ unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
- while (*p) {
- if (*p == entry)
- *p = (*p)->delta_sibling;
+ while (*idx) {
+ struct object_entry *oe = &to_pack.objects[*idx - 1];
+
+ if (oe == entry)
+ *idx = oe->delta_sibling_idx;
else
- p = &(*p)->delta_sibling;
+ idx = &oe->delta_sibling_idx;
}
- entry->delta = NULL;
+ SET_DELTA(entry, NULL);
entry->depth = 0;
oi.sizep = &entry->size;
@@ -1622,7 +1630,7 @@ static void break_delta_chains(struct object_entry *entry)
for (cur = entry, total_depth = 0;
cur;
- cur = cur->delta, total_depth++) {
+ cur = DELTA(cur), total_depth++) {
if (cur->dfs_state == DFS_DONE) {
/*
* We've already seen this object and know it isn't
@@ -1647,7 +1655,7 @@ static void break_delta_chains(struct object_entry *entry)
* it's not a delta, we're done traversing, but we'll mark it
* done to save time on future traversals.
*/
- if (!cur->delta) {
+ if (!DELTA(cur)) {
cur->dfs_state = DFS_DONE;
break;
}
@@ -1670,7 +1678,7 @@ static void break_delta_chains(struct object_entry *entry)
* We keep all commits in the chain that we examined.
*/
cur->dfs_state = DFS_ACTIVE;
- if (cur->delta->dfs_state == DFS_ACTIVE) {
+ if (DELTA(cur)->dfs_state == DFS_ACTIVE) {
drop_reused_delta(cur);
cur->dfs_state = DFS_DONE;
break;
@@ -1685,7 +1693,7 @@ static void break_delta_chains(struct object_entry *entry)
* an extra "next" pointer to keep going after we reset cur->delta.
*/
for (cur = entry; cur; cur = next) {
- next = cur->delta;
+ next = DELTA(cur);
/*
* We should have a chain of zero or more ACTIVE states down to
@@ -1870,7 +1878,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
/* Now some size filtering heuristics. */
trg_size = trg_entry->size;
- if (!trg_entry->delta) {
+ if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
@@ -1946,7 +1954,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (!delta_buf)
return 0;
- if (trg_entry->delta) {
+ if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
if (delta_size == trg_entry->delta_size &&
src->depth + 1 >= trg->depth) {
@@ -1975,7 +1983,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(delta_buf);
}
- trg_entry->delta = src_entry;
+ SET_DELTA(trg_entry, src_entry);
trg_entry->delta_size = delta_size;
trg->depth = src->depth + 1;
@@ -1984,13 +1992,13 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
{
- struct object_entry *child = me->delta_child;
+ struct object_entry *child = DELTA_CHILD(me);
unsigned int m = n;
while (child) {
unsigned int c = check_delta_limit(child, n + 1);
if (m < c)
m = c;
- child = child->delta_sibling;
+ child = DELTA_SIBLING(child);
}
return m;
}
@@ -2059,7 +2067,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* otherwise they would become too deep.
*/
max_depth = depth;
- if (entry->delta_child) {
+ if (DELTA_CHILD(entry)) {
max_depth -= check_delta_limit(entry, 0);
if (max_depth <= 0)
goto next;
@@ -2109,7 +2117,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* depth, leaving it in the window is pointless. we
* should evict it first.
*/
- if (entry->delta && max_depth <= n->depth)
+ if (DELTA(entry) && max_depth <= n->depth)
continue;
/*
@@ -2117,7 +2125,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* currently deltified object, to keep it longer. It will
* be the first base object to be attempted next.
*/
- if (entry->delta) {
+ if (DELTA(entry)) {
struct unpacked swap = array[best_base];
int dist = (window + idx - best_base) % window;
int dst = best_base;
@@ -2438,7 +2446,7 @@ static void prepare_pack(int window, int depth)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = to_pack.objects + i;
- if (entry->delta)
+ if (DELTA(entry))
/* This happens if we decided to reuse existing
* delta from a pack. "reuse_delta &&" is implied.
*/
diff --git a/pack-objects.h b/pack-objects.h
index 933f71a86b..0b831c8f12 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -72,11 +72,11 @@ struct object_entry {
unsigned long size; /* uncompressed size */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
- struct object_entry *delta; /* delta base object */
- struct object_entry *delta_child; /* deltified objects who bases me */
- struct object_entry *delta_sibling; /* other deltified objects who
- * uses the same base as me
- */
+ uint32_t delta_idx; /* delta base object */
+ uint32_t delta_child_idx; /* deltified objects who bases me */
+ uint32_t delta_sibling_idx; /* other deltified objects who
+ * uses the same base as me
+ */
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
@@ -200,4 +200,61 @@ static inline void oe_set_in_pack(struct object_entry *e,
}
+static inline struct object_entry *oe_delta(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_idx)
+ return &pack->objects[e->delta_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_child(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_child_idx)
+ return &pack->objects[e->delta_child_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_child(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_child_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_child_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_sibling(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_sibling_idx)
+ return &pack->objects[e->delta_sibling_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_sibling(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_sibling_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_sibling_idx = 0;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v5 08/11] pack-objects: shrink z_delta_size field in struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (6 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
` (4 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
We only cache a delta when it is smaller than a certain limit. This
limit defaults to 1000, yet we save the compressed length in a 64-bit
field. Shrink that field down to 16 bits, so you can only cache deltas
whose compressed size is at most 65535 bytes. Larger deltas must be
recomputed when the pack is written.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
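Not part of the commit: a minimal sketch of storing a size in a narrow bitfield with a fallback when it does not fit, assuming a simplified stand-in struct. The names struct entry and cache_compressed() are hypothetical. When the compressed size would overflow the 16-bit field, the cached data is dropped so that the delta has to be recomputed later:

```c
#include <assert.h>
#include <stdlib.h>

#define Z_DELTA_BITS 16

struct entry {
	void *delta_data;			/* cached delta, or NULL */
	unsigned z_delta_size:Z_DELTA_BITS;	/* compressed size, if cached */
};

/*
 * Keeps the cached delta only if its compressed size fits the
 * bitfield. Returns 1 if the cache entry was kept, 0 if dropped.
 */
static int cache_compressed(struct entry *e, unsigned long z_size)
{
	assert(e);
	if (z_size < (1UL << Z_DELTA_BITS)) {
		e->z_delta_size = z_size;
		return 1;
	}
	free(e->delta_data);
	e->delta_data = NULL;
	e->z_delta_size = 0;
	return 0;
}
```

Checking before the assignment matters: assigning an out-of-range value to an unsigned bitfield silently truncates it.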
Documentation/config.txt | 3 ++-
builtin/pack-objects.c | 22 ++++++++++++++++------
pack-objects.h | 3 ++-
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9bd3f5a789..00fa824448 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2449,7 +2449,8 @@ pack.deltaCacheLimit::
The maximum size of a delta, that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
- result once the best match for all objects is found. Defaults to 1000.
+ result once the best match for all objects is found.
+ Defaults to 1000. Maximum value is 65535.
pack.threads::
Specifies the number of threads to spawn when searching for best
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 379bd1ab92..71ca1ba2ce 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2105,12 +2105,19 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* between writes at that moment.
*/
if (entry->delta_data && !pack_to_stdout) {
- entry->z_delta_size = do_compress(&entry->delta_data,
- entry->delta_size);
- cache_lock();
- delta_cache_size -= entry->delta_size;
- delta_cache_size += entry->z_delta_size;
- cache_unlock();
+ unsigned long size;
+
+ size = do_compress(&entry->delta_data, entry->delta_size);
+ if (size < (1 << OE_Z_DELTA_BITS)) {
+ entry->z_delta_size = size;
+ cache_lock();
+ delta_cache_size -= entry->delta_size;
+ delta_cache_size += entry->z_delta_size;
+ cache_unlock();
+ } else {
+ FREE_AND_NULL(entry->delta_data);
+ entry->z_delta_size = 0;
+ }
}
/* if we made n a delta, and if n is already at max
@@ -3089,6 +3096,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (depth >= (1 << OE_DEPTH_BITS))
die(_("delta chain depth %d is greater than maximum limit %d"),
depth, (1 << OE_DEPTH_BITS));
+ if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
+ die(_("pack.deltaCacheLimit is greater than maximum limit %d"),
+ 1 << OE_Z_DELTA_BITS);
argv_array_push(&rp, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index 0b831c8f12..63222a76b0 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -4,6 +4,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
+#define OE_Z_DELTA_BITS 16
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -79,7 +80,7 @@ struct object_entry {
*/
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
- unsigned long z_delta_size; /* delta data size (compressed) */
+ unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
--
2.17.0.rc0.347.gf9cf61673a
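For readers skimming the diff, the caching rule in this patch can be looked at in isolation. What follows is a minimal sketch, not the code from the patch: `struct entry_sketch`, `SKETCH_Z_DELTA_BITS` and `store_z_delta_size()` are hypothetical names, and the sketch only models the "keep the cached delta if its compressed size fits in the bitfield, otherwise drop it" decision made in find_deltas().

```c
#include <stdlib.h>

#define SKETCH_Z_DELTA_BITS 16	/* mirrors OE_Z_DELTA_BITS in the patch */

/* Hypothetical, cut-down stand-in for struct object_entry. */
struct entry_sketch {
	void *delta_data;				/* cached delta, or NULL */
	unsigned z_delta_size:SKETCH_Z_DELTA_BITS;	/* compressed size */
};

/*
 * Record the compressed size if it fits in the bitfield. Otherwise
 * drop the cached delta so that it is recomputed at write time.
 * Returns 1 if the delta stays cached, 0 if it was dropped.
 */
static int store_z_delta_size(struct entry_sketch *e, unsigned long size)
{
	if (size < (1UL << SKETCH_Z_DELTA_BITS)) {
		e->z_delta_size = size;
		return 1;
	}
	free(e->delta_data);
	e->delta_data = NULL;
	e->z_delta_size = 0;
	return 0;
}
```

With a 16-bit field the largest size that can be stored is 65535, which matches the maximum documented for pack.deltaCacheLimit above.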
* [PATCH v5 09/11] pack-objects: shrink size field in struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (7 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 19:57 ` Ævar Arnfjörð Bjarmason
2018-03-18 5:09 ` Junio C Hamano
2018-03-17 14:10 ` [PATCH v5 10/11] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
` (3 subsequent siblings)
12 siblings, 2 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
It's very very rare that an uncompressd object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.
Shrink size field down to 32 bits [1] and one overflow bit. If the size
is too large, we read it back from disk.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
A small note about the conditional oe_set_size() in
check_object(). Technically if we don't get a valid type, it's not
wrong if we set uninitialized value "size" (we don't pre-initialize
this and sha1_object_info will not assign anything when it fails to
get the info).
This, however, changes the writing code path slightly, which emits
different error messages (either way we die). One of our tests in t5530
depends on this specific error message. Let's just keep the test as-is
and play it safe by not assigning a random value. That might trigger
valgrind anyway.
[1] it's actually already 32 bits on Windows
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 49 ++++++++++++++++++++++++++---------------
pack-objects.h | 50 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 80 insertions(+), 19 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 71ca1ba2ce..887e12c556 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -274,7 +274,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
if (!usable_delta) {
if (oe_type(entry) == OBJ_BLOB &&
- entry->size > big_file_threshold &&
+ oe_size_greater_than(entry, big_file_threshold) &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
else {
@@ -384,12 +384,13 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
+ unsigned long entry_size = oe_size(entry);
if (DELTA(entry))
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
- type, entry->size);
+ type, entry_size);
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -406,7 +407,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
datalen -= entry->in_pack_header_size;
if (!pack_to_stdout && p->index_version == 1 &&
- check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) {
+ check_pack_inflate(p, &w_curs, offset, datalen, entry_size)) {
error("corrupt packed object for %s",
oid_to_hex(&entry->idx.oid));
unuse_pack(&w_curs);
@@ -1412,6 +1413,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
+ unsigned long size;
+
if (IN_PACK(entry)) {
struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
@@ -1431,13 +1434,14 @@ static void check_object(struct object_entry *entry)
*/
used = unpack_object_header_buffer(buf, avail,
&type,
- &entry->size);
+ &size);
if (used == 0)
goto give_up;
if (type < 0)
die("BUG: invalid type %d", type);
entry->in_pack_type = type;
+ oe_set_size(entry, size);
/*
* Determine if this is a delta and if so whether we can
@@ -1505,7 +1509,7 @@ static void check_object(struct object_entry *entry)
*/
oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
- entry->delta_size = entry->size;
+ entry->delta_size = oe_size(entry);
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1513,14 +1517,17 @@ static void check_object(struct object_entry *entry)
}
if (oe_type(entry)) {
+ unsigned long size;
+
+ size = get_size_from_delta(p, &w_curs,
+ entry->in_pack_offset + entry->in_pack_header_size);
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
* object size from the delta header.
*/
- entry->size = get_size_from_delta(p, &w_curs,
- entry->in_pack_offset + entry->in_pack_header_size);
- if (entry->size == 0)
+ oe_set_size(entry, size);
+ if (oe_size_less_than(entry, 1))
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1535,13 +1542,15 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
* as a preferred base. Doing so can result in a larger
* pack file, but the transfer will still take place.
*/
+ if (entry->type_valid)
+ oe_set_size(entry, size);
}
static int pack_offset_sort(const void *_a, const void *_b)
@@ -1581,6 +1590,7 @@ static void drop_reused_delta(struct object_entry *entry)
unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
+ unsigned long size;
while (*idx) {
struct object_entry *oe = &to_pack.objects[*idx - 1];
@@ -1593,7 +1603,7 @@ static void drop_reused_delta(struct object_entry *entry)
SET_DELTA(entry, NULL);
entry->depth = 0;
- oi.sizep = &entry->size;
+ oi.sizep = &size;
oi.typep = &type;
if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
@@ -1603,10 +1613,11 @@ static void drop_reused_delta(struct object_entry *entry)
* and dealt with in prepare_pack().
*/
oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
- &entry->size));
+ &size));
} else {
oe_set_type(entry, type);
}
+ oe_set_size(entry, size);
}
/*
@@ -1746,7 +1757,7 @@ static void get_object_details(void)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
- if (big_file_threshold < entry->size)
+ if (oe_size_greater_than(entry, big_file_threshold))
entry->no_try_delta = 1;
}
@@ -1775,6 +1786,8 @@ static int type_size_sort(const void *_a, const void *_b)
const struct object_entry *b = *(struct object_entry **)_b;
enum object_type a_type = oe_type(a);
enum object_type b_type = oe_type(b);
+ unsigned long a_size = oe_size(a);
+ unsigned long b_size = oe_size(b);
if (a_type > b_type)
return -1;
@@ -1788,9 +1801,9 @@ static int type_size_sort(const void *_a, const void *_b)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
- if (a->size > b->size)
+ if (a_size > b_size)
return -1;
- if (a->size < b->size)
+ if (a_size < b_size)
return 1;
return a < b ? -1 : (a > b); /* newest first */
}
@@ -1877,7 +1890,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
return 0;
/* Now some size filtering heuristics. */
- trg_size = trg_entry->size;
+ trg_size = oe_size(trg_entry);
if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
@@ -1889,7 +1902,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
(max_depth - ref_depth + 1);
if (max_size == 0)
return 0;
- src_size = src_entry->size;
+ src_size = oe_size(src_entry);
sizediff = src_size < trg_size ? trg_size - src_size : 0;
if (sizediff >= max_size)
return 0;
@@ -2009,7 +2022,7 @@ static unsigned long free_unpacked(struct unpacked *n)
free_delta_index(n->index);
n->index = NULL;
if (n->data) {
- freed_mem += n->entry->size;
+ freed_mem += oe_size(n->entry);
FREE_AND_NULL(n->data);
}
n->entry = NULL;
@@ -2459,7 +2472,7 @@ static void prepare_pack(int window, int depth)
*/
continue;
- if (entry->size < 50)
+ if (oe_size_less_than(entry, 50))
continue;
if (entry->no_try_delta)
diff --git a/pack-objects.h b/pack-objects.h
index 63222a76b0..9a4ed7fdbe 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -70,7 +70,9 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- unsigned long size; /* uncompressed size */
+ /* object uncompressed size _if_ size_valid is true */
+ uint32_t size_;
+ unsigned size_valid:1;
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
@@ -258,4 +260,50 @@ static inline void oe_set_delta_sibling(struct packing_data *pack,
e->delta_sibling_idx = 0;
}
+static inline unsigned long oe_size(const struct object_entry *e)
+{
+ if (e->size_valid) {
+ return e->size_;
+ } else {
+ unsigned long size;
+
+ sha1_object_info(e->idx.oid.hash, &size);
+ return size;
+ }
+}
+
+static inline int contains_in_32bits(unsigned long limit)
+{
+ uint32_t truncated_limit = (uint32_t)limit;
+
+ return limit == truncated_limit;
+}
+
+static inline int oe_size_less_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ < limit;
+ if (contains_in_32bits(limit))
+ return 1;
+ return oe_size(e) < limit;
+}
+
+static inline int oe_size_greater_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ > limit;
+ if (contains_in_32bits(limit))
+ return 0;
+ return oe_size(e) > limit;
+}
+
+static inline void oe_set_size(struct object_entry *e,
+ unsigned long size)
+{
+ e->size_ = size;
+ e->size_valid = e->size_ == size;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* Re: [PATCH v5 09/11] pack-objects: shrink size field in struct object_entry
2018-03-17 14:10 ` [PATCH v5 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-17 19:57 ` Ævar Arnfjörð Bjarmason
2018-03-18 5:09 ` Junio C Hamano
1 sibling, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-17 19:57 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: e, git, gitster, peff
On Sat, Mar 17 2018, Nguyễn Thái Ngọc Duy jotted:
> It's very very rare that an uncompressd object is larger than 4GB
s/uncompressd/uncompressed/
* Re: [PATCH v5 09/11] pack-objects: shrink size field in struct object_entry
2018-03-17 14:10 ` [PATCH v5 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
2018-03-17 19:57 ` Ævar Arnfjörð Bjarmason
@ 2018-03-18 5:09 ` Junio C Hamano
2018-03-18 8:23 ` Duy Nguyen
1 sibling, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-18 5:09 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> +static inline int contains_in_32bits(unsigned long limit)
> +{
This name somehow does not sound right.
If the verb "contain" must be used, the way to phrase what this
function does with the verb is to say "limit can be contained in a
32-bit int", so "contains" is probably where the funniness comes
from.
"fits in 32bits" is OK, I think.
> + uint32_t truncated_limit = (uint32_t)limit;
> +
> + return limit == truncated_limit;
> +}
I am guessing that a compiler that is clever enough will make this
function a no-op on a 32-bit arch and that is why it is a static
inline function?
> +static inline int oe_size_less_than(const struct object_entry *e,
> + unsigned long limit)
> +{
> + if (e->size_valid)
> + return e->size_ < limit;
e->size_ is the true size so we can compare it to see if it is smaller
than limit.
> + if (contains_in_32bits(limit))
> + return 1;
If limit is small enough, and because !e->size_valid means the size
does not fit in 32-bit, we know size is larger than limit.
Shouldn't we be returning 0 that means "no, the size is not less
than limit" from here?
> + return oe_size(e) < limit;
> +}
> +
> +static inline int oe_size_greater_than(const struct object_entry *e,
> + unsigned long limit)
> +{
> + if (e->size_valid)
> + return e->size_ > limit;
e->size_ is the true size so we compare and return if it is larger
than limit.
> + if (contains_in_32bits(limit))
> + return 0;
Now e->size_ is larger than what would fit within 32-bit. If limit
fits within 32-bit, then size must be larger than limit. Again,
shouldn't we be returning 1 that means "yes, the size is greater
than limit" from here?
> + return oe_size(e) > limit;
> +}
> +
> +static inline void oe_set_size(struct object_entry *e,
> + unsigned long size)
> +{
> + e->size_ = size;
> + e->size_valid = e->size_ == size;
> +}
> +
> #endif
* Re: [PATCH v5 09/11] pack-objects: shrink size field in struct object_entry
2018-03-18 5:09 ` Junio C Hamano
@ 2018-03-18 8:23 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-18 8:23 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Sun, Mar 18, 2018 at 6:09 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> + uint32_t truncated_limit = (uint32_t)limit;
>> +
>> + return limit == truncated_limit;
>> +}
>
> I am guessing that a compiler that is clever enough will make this
> function a no-op on a 32-bit arch and that is why it is a static
> inline function?
It's a separate function because I don't want to duplicate this ==
logic. Even if the compiler does not optimize this, it's still
much cheaper than oe_size() which involves disk access.
>> +static inline int oe_size_less_than(const struct object_entry *e,
>> + unsigned long limit)
>> +{
>> + if (e->size_valid)
>> + return e->size_ < limit;
>
> e->size_ is the true size so we can compare it to see if it is smaller
> than limit.
>
>> + if (contains_in_32bits(limit))
>> + return 1;
>
> If limit is small enough, and because !e->size_valid means the size
> does not fit in 32-bit, we know size is larger than limit.
> Shouldn't we be returning 0 that means "no, the size is not less
> than limit" from here?
Argh!!! This logic keeps messing with my brain.
--
Duy
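To make the review point concrete, here is a minimal sketch of the two compare helpers with the return values Junio suggests. It is an illustration only, not the code that was eventually committed: `struct size_sketch`, its `full_size` member (which stands in for the lookup that oe_size() does through sha1_object_info()) and all `sketch_*` names are hypothetical, and `fits_in_32bits()` just follows the naming suggestion above.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct object_entry. */
struct size_sketch {
	uint32_t size_;			/* meaningful only if size_valid is set */
	unsigned size_valid:1;
	unsigned long full_size;	/* stands in for the on-disk lookup */
};

static void sketch_set_size(struct size_sketch *e, unsigned long size)
{
	e->full_size = size;
	e->size_ = (uint32_t)size;
	e->size_valid = e->size_ == size;	/* 0 if truncation lost bits */
}

static unsigned long sketch_size(const struct size_sketch *e)
{
	return e->size_valid ? e->size_ : e->full_size;
}

static int fits_in_32bits(unsigned long limit)
{
	return limit == (uint32_t)limit;
}

static int sketch_size_less_than(const struct size_sketch *e,
				 unsigned long limit)
{
	if (e->size_valid)
		return e->size_ < limit;
	if (fits_in_32bits(limit))
		return 0;	/* size needs more than 32 bits, limit does not */
	return sketch_size(e) < limit;
}

static int sketch_size_greater_than(const struct size_sketch *e,
				    unsigned long limit)
{
	if (e->size_valid)
		return e->size_ > limit;
	if (fits_in_32bits(limit))
		return 1;	/* size needs more than 32 bits, limit does not */
	return sketch_size(e) > limit;
}
```

The shortcut branches only matter where unsigned long is wider than 32 bits. Where it is 32 bits wide, size_valid is always set and the first branch answers every query.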
* [PATCH v5 10/11] pack-objects: shrink delta_size field in struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (8 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 14:10 ` [PATCH v5 11/11] pack-objects.h: reorder members to shrink " Nguyễn Thái Ngọc Duy
` (2 subsequent siblings)
12 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Allowing a delta size of 64 bits is crazy. Shrink this field down to
31 bits with one overflow bit.
If we find an existing delta larger than 2GB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 24 ++++++++++++++----------
pack-objects.h | 23 ++++++++++++++++++++++-
2 files changed, 36 insertions(+), 11 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 887e12c556..fb2aba80bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,10 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA_SIZE(obj) oe_delta_size(&to_pack, obj)
#define DELTA(obj) oe_delta(&to_pack, obj)
#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(&to_pack, obj, val)
#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
@@ -140,7 +142,7 @@ static void *get_delta(struct object_entry *entry)
oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
- if (!delta_buf || delta_size != entry->delta_size)
+ if (!delta_buf || delta_size != DELTA_SIZE(entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -291,14 +293,14 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -1509,7 +1511,7 @@ static void check_object(struct object_entry *entry)
*/
oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
- entry->delta_size = oe_size(entry);
+ SET_DELTA_SIZE(entry, oe_size(entry));
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1895,7 +1897,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
- max_size = trg_entry->delta_size;
+ max_size = DELTA_SIZE(trg_entry);
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
@@ -1966,10 +1968,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
+ if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
+ return 0;
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
- if (delta_size == trg_entry->delta_size &&
+ if (delta_size == DELTA_SIZE(trg_entry) &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
@@ -1984,7 +1988,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
- delta_cache_size -= trg_entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(trg_entry);
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
@@ -1997,7 +2001,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
}
SET_DELTA(trg_entry, src_entry);
- trg_entry->delta_size = delta_size;
+ SET_DELTA_SIZE(trg_entry, delta_size);
trg->depth = src->depth + 1;
return 1;
@@ -2120,11 +2124,11 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
if (entry->delta_data && !pack_to_stdout) {
unsigned long size;
- size = do_compress(&entry->delta_data, entry->delta_size);
+ size = do_compress(&entry->delta_data, DELTA_SIZE(entry));
if (size < (1 << OE_Z_DELTA_BITS)) {
entry->z_delta_size = size;
cache_lock();
- delta_cache_size -= entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(entry);
delta_cache_size += entry->z_delta_size;
cache_unlock();
} else {
diff --git a/pack-objects.h b/pack-objects.h
index 9a4ed7fdbe..2507b157d5 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -5,6 +5,7 @@
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
#define OE_Z_DELTA_BITS 16
+#define OE_DELTA_SIZE_BITS 31
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -81,7 +82,8 @@ struct object_entry {
* uses the same base as me
*/
void *delta_data; /* cached delta (uncompressed) */
- unsigned long delta_size; /* delta data size (uncompressed) */
+ uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+ uint32_t delta_size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
@@ -306,4 +308,23 @@ static inline void oe_set_size(struct object_entry *e,
e->size_valid = e->size_ == size;
}
+static inline unsigned long oe_delta_size(struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_size_valid)
+ return e->delta_size_;
+ return oe_size(e);
+}
+
+static inline void oe_set_delta_size(struct packing_data *pack,
+ struct object_entry *e,
+ unsigned long size)
+{
+ e->delta_size_ = size;
+ e->delta_size_valid = e->delta_size_ == size;
+ if (!e->delta_size_valid && size != oe_size(e))
+ die("BUG: this can only happen in check_object() "
+ "where delta size is the same as entry size");
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
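The overflow handling in this patch is a little different from the one used for the size field, so a small model may help. This is a sketch under assumptions, not the patch itself: `struct delta_sketch` and the `sketch_*` functions are hypothetical, the plain `size` member stands in for oe_size(e), and the setter returns -1 where the real code calls die("BUG: ...").

```c
#include <stdint.h>

#define SKETCH_DELTA_SIZE_BITS 31	/* mirrors OE_DELTA_SIZE_BITS */

/* Hypothetical, cut-down stand-in for struct object_entry. */
struct delta_sketch {
	unsigned long size;	/* stands in for oe_size(e) */
	uint32_t delta_size_:SKETCH_DELTA_SIZE_BITS;
	uint32_t delta_size_valid:1;
};

/*
 * Store the delta size if it fits in 31 bits. If it does not fit,
 * the only value that can still be represented is one equal to the
 * entry size, because the getter falls back to it.
 * Returns 0 on success, -1 where the patch would report a BUG.
 */
static int sketch_set_delta_size(struct delta_sketch *e, unsigned long size)
{
	e->delta_size_ = size;
	e->delta_size_valid = e->delta_size_ == size;
	if (!e->delta_size_valid && size != e->size)
		return -1;
	return 0;
}

static unsigned long sketch_delta_size(const struct delta_sketch *e)
{
	if (e->delta_size_valid)
		return e->delta_size_;
	return e->size;	/* overflow case: delta size equals entry size */
}
```

This is why try_delta() in the diff refuses deltas of 2GB or more up front: a freshly computed delta of that size could not be recorded, whereas a reused in-pack delta in check_object() always has a delta size equal to the entry size.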
* [PATCH v5 11/11] pack-objects.h: reorder members to shrink struct object_entry
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (9 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 10/11] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
@ 2018-03-17 14:10 ` Nguyễn Thái Ngọc Duy
2018-03-17 19:53 ` Ævar Arnfjörð Bjarmason
2018-03-17 19:45 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Ævar Arnfjörð Bjarmason
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
12 siblings, 1 reply; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-17 14:10 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes, before any field shrinking is done) with 16 bits to
spare (and a couple more in in_pack_header_size when we really run out
of bits).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/pack-objects.h b/pack-objects.h
index 2507b157d5..8979289f5f 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -71,35 +71,36 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- /* object uncompressed size _if_ size_valid is true */
- uint32_t size_;
- unsigned size_valid:1;
- unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+ void *delta_data; /* cached delta (uncompressed) */
off_t in_pack_offset;
+ uint32_t hash; /* name hint hash */
+ uint32_t size_; /* object uncompressed size _if_ size_valid is true */
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who
* uses the same base as me
*/
- void *delta_data; /* cached delta (uncompressed) */
uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
uint32_t delta_size_valid:1;
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+ unsigned size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
+ unsigned type_valid:1;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
- unsigned type_valid:1;
- uint32_t hash; /* name hint hash */
- unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
unsigned no_try_delta:1;
+ unsigned char in_pack_header_size;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
unsigned depth:OE_DEPTH_BITS;
+
+ /* size: 80, bit_padding: 16 bits */
};
struct packing_data {
--
2.17.0.rc0.347.gf9cf61673a
* Re: [PATCH v5 11/11] pack-objects.h: reorder members to shrink struct object_entry
2018-03-17 14:10 ` [PATCH v5 11/11] pack-objects.h: reorder members to shrink " Nguyễn Thái Ngọc Duy
@ 2018-03-17 19:53 ` Ævar Arnfjörð Bjarmason
2018-03-18 8:49 ` Duy Nguyen
0 siblings, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-17 19:53 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: e, git, gitster, peff
On Sat, Mar 17 2018, Nguyễn Thái Ngọc Duy jotted:
> Previous patches leave lots of holes and padding in this struct. This
> patch reorders the members and shrinks the struct down to 80 bytes
> (from 136 bytes, before any field shrinking is done) with 16 bits to
> spare (and a couple more in in_pack_header_size when we really run out
> of bits).
Given what I mentioned in 87po42cwql.fsf@evledraar.gmail.com just now I
think we should add this to the commit message.
This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).
Overall they've reduced repack memory size on linux.git from 3.747G
to 3.424G, or by around 320M, a decrease of 8.5%. The runtime of
repack has stayed the same throughout this series. Ævar's testing on
a big monorepo he has access to (bigger than linux.git) has shown a
7.9% reduction, so the overall expected improvement should be
somewhere around 8%.
See 87po42cwql.fsf@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/)
for more detailed numbers and a test script used to produce the
numbers cited above.
Thanks again for working on this.
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> pack-objects.h | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/pack-objects.h b/pack-objects.h
> index 2507b157d5..8979289f5f 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -71,35 +71,36 @@ enum dfs_state {
> */
> struct object_entry {
> struct pack_idx_entry idx;
> - /* object uncompressed size _if_ size_valid is true */
> - uint32_t size_;
> - unsigned size_valid:1;
> - unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
> + void *delta_data; /* cached delta (uncompressed) */
> off_t in_pack_offset;
> + uint32_t hash; /* name hint hash */
> + uint32_t size_; /* object uncompressed size _if_ size_valid is true */
> uint32_t delta_idx; /* delta base object */
> uint32_t delta_child_idx; /* deltified objects who bases me */
> uint32_t delta_sibling_idx; /* other deltified objects who
> * uses the same base as me
> */
> - void *delta_data; /* cached delta (uncompressed) */
> uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
> uint32_t delta_size_valid:1;
> + unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
> + unsigned size_valid:1;
> unsigned z_delta_size:OE_Z_DELTA_BITS;
> + unsigned type_valid:1;
> unsigned type_:TYPE_BITS;
> unsigned in_pack_type:TYPE_BITS; /* could be delta */
> - unsigned type_valid:1;
> - uint32_t hash; /* name hint hash */
> - unsigned char in_pack_header_size;
> unsigned preferred_base:1; /*
> * we do not pack this, but is available
> * to be used as the base object to delta
> * objects against.
> */
> unsigned no_try_delta:1;
> + unsigned char in_pack_header_size;
> unsigned tagged:1; /* near the very tip of refs */
> unsigned filled:1; /* assigned write-order */
> unsigned dfs_state:OE_DFS_STATE_BITS;
> unsigned depth:OE_DEPTH_BITS;
> +
> + /* size: 80, bit_padding: 16 bits */
> };
>
> struct packing_data {
* Re: [PATCH v5 11/11] pack-objects.h: reorder members to shrink struct object_entry
2018-03-17 19:53 ` Ævar Arnfjörð Bjarmason
@ 2018-03-18 8:49 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-18 8:49 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Eric Wong, Git Mailing List, Junio C Hamano, Jeff King
On Sat, Mar 17, 2018 at 8:53 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Mar 17 2018, Nguyễn Thái Ngọc Duy jotted:
>
>> Previous patches leave lots of holes and padding in this struct. This
>> patch reorders the members and shrinks the struct down to 80 bytes
>> (from 136 bytes, before any field shrinking is done) with 16 bits to
>> spare (and a couple more in in_pack_header_size when we really run out
>> of bits).
>
> Given what I mentioned in 87po42cwql.fsf@evledraar.gmail.com just now I
> think we should add this to the commit message.
>
> This is the last in a series of memory reduction patches (see
> "pack-objects: a bit of document about struct object_entry" for the
> first one).
>
> Overall they've reduced repack memory size on linux.git from 3.747G
> to 3.424G, or by around 320M, a decrease of 8.5%. The runtime of
> repack has stayed the same throughout this series. Ævar's testing on
> a big monorepo he has access to (bigger than linux.git) has shown a
> 7.9% reduction, so the overall expected improvement should be
> somewhere around 8%.
>
> See 87po42cwql.fsf@evledraar.gmail.com on-list
> (https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/)
> for more detailed numbers and a test script used to produce the
> numbers cited above.
Yeah.
I probably should add something that was on my mind but never written
out. This shrinking and packing definitely slows down access to these
struct members (more instructions to read or write). However,
pack-objects is mostly IO-bound, and when it's CPU-bound I think the
hot path is either inflating objects or generating deltas, so these
slowdowns do not matter (and the smaller cache footprint helps too).
--
Duy
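As a generic illustration of why reordering members shrinks a struct, independent of the real object_entry layout, consider the two hypothetical structs below. They hold exactly the same members; only the order differs. The names are made up for this sketch, and the actual sizes depend on the platform ABI, so no concrete numbers are claimed here.

```c
#include <stdint.h>

/* Small members interleaved with a pointer: the compiler has to
 * insert padding to keep the pointer and the uint32_t aligned. */
struct scattered {
	unsigned char a;
	void *p;
	unsigned char b;
	uint32_t n;
	unsigned flag:1;
};

/* Same members, widest first, small ones and bitfields last. */
struct grouped {
	void *p;
	uint32_t n;
	unsigned char a;
	unsigned char b;
	unsigned flag:1;
};

/* How many bytes per entry the reordering saves on this platform. */
static long bytes_saved(void)
{
	return (long)sizeof(struct scattered) - (long)sizeof(struct grouped);
}
```

Running pahole on a build with debug info, as done for the numbers in the commit message, shows where the holes are for the real struct.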
* Re: [PATCH v5 00/11] nd/pack-objects-pack-struct updates
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (10 preceding siblings ...)
2018-03-17 14:10 ` [PATCH v5 11/11] pack-objects.h: reorder members to shrink " Nguyễn Thái Ngọc Duy
@ 2018-03-17 19:45 ` Ævar Arnfjörð Bjarmason
2018-03-17 19:47 ` Ævar Arnfjörð Bjarmason
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
12 siblings, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-17 19:45 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: e, git, gitster, peff
On Sat, Mar 17 2018, Nguyễn Thái Ngọc Duy jotted:
> v5 changes are small enough that the interdiff is pretty self
> explanatory (there's also a couple commit msg updates).
I've been testing this and it's definitely an improvement. I think it
would be good to mention the incremental improvement in the commit
messages themselves. To that end I wrote this:
$ cat /tmp/howmuch-mem.sh
#!/bin/sh
cd /tmp &&
(
for i in 1 2 3
do
/usr/bin/time -f MaxRSS:%M ~/g/git/git --git-dir=/tmp/linux.git --exec-path=/home/avar/g/git repack -A -d 2>&1
done | grep MaxRSS: | sort -n | head -n 1 | tr '\n' '\t' &&
git git-commit-summary &&
echo
) | tee -a /tmp/git-memory.log
I.e. we repack linux.git (I'd already repacked it once) and do three
runs, and take the lowest RSS size. This yields (I rebased your series
on top of git@github.com:git/git.git master and pushed it to
git@github.com:avar/git.git pack-objects-reduce-memory-footprint), via:
git rebase --exec='make -j8 CFLAGS="-O3" && /tmp/howmuch-mem.sh' -i
That gave me the following (MaxRSS in kB in the first column):
MaxRSS:3746648 f23a196dd9 ("pack-objects: a bit of document about struct object_entry", 2018-03-01)
MaxRSS:3700696 953b6473d7 ("pack-objects: turn type and in_pack_type to bitfields", 2018-03-01)
MaxRSS:3700404 6cbe573539 ("pack-objects: use bitfield for object_entry::dfs_state", 2018-03-01)
MaxRSS:3654044 0b93ebcae9 ("pack-objects: use bitfield for object_entry::depth", 2018-03-01)
MaxRSS:3654040 67a4d48773 ("pack-objects: move in_pack_pos out of struct object_entry", 2018-03-01) [X]
MaxRSS:3654104 e77319c65a ("pack-objects: move in_pack out of struct object_entry", 2018-03-01) [X]
MaxRSS:3608096 a72cfcfea3 ("pack-objects: refer to delta objects by index instead of pointer", 2018-03-01)
MaxRSS:3562212 76eaa779eb ("pack-objects: shrink z_delta_size field in struct object_entry", 2018-03-05)
MaxRSS:3515164 42e28dd4b3 ("pack-objects: shrink size field in struct object_entry", 2018-03-05)
MaxRSS:3469440 26eba3ded4 ("pack-objects: shrink delta_size field in struct object_entry", 2018-03-05)
MaxRSS:3423704 c6493de964 ("pack-objects.h: reorder members to shrink struct object_entry", 2018-03-12)
I.e. on linux.git we end up with just over an 8.5% reduction.
Interestingly, one change shows a slight increase over the previous
commit, and another makes just 4kB of difference (both marked with [X]
above).
Also, your v0 says it overall saves 260MB of memory. According to this
it's 320MB. You did note some reductions in subsequent patches, but it's
worth calling that out explicitly.
I have a bigger in-house repo that looks like this with this change:
MaxRSS:4753120 f23a196dd9 ("pack-objects: a bit of document about struct object_entry", 2018-03-01)
MaxRSS:4699084 953b6473d7 ("pack-objects: turn type and in_pack_type to bitfields", 2018-03-01)
MaxRSS:4699028 6cbe573539 ("pack-objects: use bitfield for object_entry::dfs_state", 2018-03-01)
MaxRSS:4645452 0b93ebcae9 ("pack-objects: use bitfield for object_entry::depth", 2018-03-01)
MaxRSS:4645288 67a4d48773 ("pack-objects: move in_pack_pos out of struct object_entry", 2018-03-01)
MaxRSS:4645548 e77319c65a ("pack-objects: move in_pack out of struct object_entry", 2018-03-01)
MaxRSS:4591484 a72cfcfea3 ("pack-objects: refer to delta objects by index instead of pointer", 2018-03-01)
MaxRSS:4537980 76eaa779eb ("pack-objects: shrink z_delta_size field in struct object_entry", 2018-03-05)
MaxRSS:4484148 42e28dd4b3 ("pack-objects: shrink size field in struct object_entry", 2018-03-05)
MaxRSS:4430404 26eba3ded4 ("pack-objects: shrink delta_size field in struct object_entry", 2018-03-05)
MaxRSS:4376148 c6493de964 ("pack-objects.h: reorder members to shrink struct object_entry", 2018-03-12)
I.e. a tad more than a 7.9% reduction in memory use.
This series also doesn't make a difference to the total runtime (which
is good, just wanted to make sure). On linux.git on my box best out of
three is 1:15.74 before and 1:14.93 after, which is within the margin of
random error.
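For reference, the percentages quoted above follow directly from the first and last MaxRSS values in each table. A minimal sketch of the arithmetic (the helper name is made up for illustration):

```c
#include <assert.h>

/* Percentage by which 'after_kb' is smaller than 'before_kb'. */
static double rss_reduction_percent(unsigned long before_kb,
				    unsigned long after_kb)
{
	assert(before_kb > 0 && after_kb <= before_kb);
	return 100.0 * (double)(before_kb - after_kb) / (double)before_kb;
}
```

With the linux.git numbers, rss_reduction_percent(3746648, 3423704) covers a drop of 322944 kB, or about 8.6%. With the in-house repo, rss_reduction_percent(4753120, 4376148) covers a drop of 376972 kB, or about 7.9%.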
* [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-17 14:10 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Nguyễn Thái Ngọc Duy
` (11 preceding siblings ...)
2018-03-17 19:45 ` [PATCH v5 00/11] nd/pack-objects-pack-struct updates Ævar Arnfjörð Bjarmason
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
` (13 more replies)
12 siblings, 14 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
v6 fixes the one optimization that I just couldn't get right, fixes
two off-by-one error messages, and updates a couple of commit messages
(the biggest change is in 11/11, to record some numbers from Ævar).
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fb2aba80bf..4406af640f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3112,10 +3112,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (depth >= (1 << OE_DEPTH_BITS))
die(_("delta chain depth %d is greater than maximum limit %d"),
- depth, (1 << OE_DEPTH_BITS));
+ depth, (1 << OE_DEPTH_BITS) - 1);
if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
die(_("pack.deltaCacheLimit is greater than maximum limit %d"),
- 1 << OE_Z_DELTA_BITS);
+ (1 << OE_Z_DELTA_BITS) - 1);
argv_array_push(&rp, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index 55358da9f3..af40211105 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -275,7 +275,7 @@ static inline unsigned long oe_size(const struct object_entry *e)
}
}
-static inline int contains_in_32bits(unsigned long limit)
+static inline int oe_fits_in_32bits(unsigned long limit)
{
uint32_t truncated_limit = (uint32_t)limit;
@@ -287,8 +287,8 @@ static inline int oe_size_less_than(const struct object_entry *e,
{
if (e->size_valid)
return e->size_ < limit;
- if (contains_in_32bits(limit))
- return 1;
+ if (oe_fits_in_32bits(limit)) /* limit < 2^32 <= size ? */
+ return 0;
return oe_size(e) < limit;
}
@@ -297,8 +297,8 @@ static inline int oe_size_greater_than(const struct object_entry *e,
{
if (e->size_valid)
return e->size_ > limit;
- if (contains_in_32bits(limit))
- return 0;
+ if (oe_fits_in_32bits(limit)) /* limit < 2^32 <= size ? */
+ return 1;
return oe_size(e) > limit;
}
@@ -307,6 +307,14 @@ static inline void oe_set_size(struct object_entry *e,
{
e->size_ = size;
e->size_valid = e->size_ == size;
+
+ if (!e->size_valid) {
+ unsigned long real_size;
+
+ if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
+ size != real_size)
+ die("BUG: 'size' is supposed to be the object size!");
+ }
}
static inline unsigned long oe_delta_size(struct packing_data *pack,
--
2.17.0.rc0.347.gf9cf61673a
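The reasoning behind the corrected return values in oe_size_less_than() and oe_size_greater_than() can be checked with a small standalone sketch. Everything named sketch_* below is hypothetical; in particular real_size stands in for the slower lookup that oe_size() performs, and uint64_t is used so the example behaves the same where unsigned long is 32 bits.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_size_entry {
	uint32_t size_;        /* size truncated to 32 bits */
	unsigned size_valid:1; /* set when size_ holds the full size */
	uint64_t real_size;    /* stand-in for the slow oe_size() lookup */
};

static int sketch_fits_in_32bits(uint64_t limit)
{
	return limit == (uint32_t)limit;
}

static void sketch_set_size(struct sketch_size_entry *e, uint64_t size)
{
	e->size_ = (uint32_t)size;
	e->size_valid = e->size_ == size;
	e->real_size = size;
}

static int sketch_size_less_than(const struct sketch_size_entry *e,
				 uint64_t limit)
{
	if (e->size_valid)
		return e->size_ < limit;
	/* size did not fit in 32 bits, so limit < 2^32 <= size */
	assert(e->real_size > UINT32_MAX);
	if (sketch_fits_in_32bits(limit))
		return 0;
	return e->real_size < limit;
}

static int sketch_size_greater_than(const struct sketch_size_entry *e,
				    uint64_t limit)
{
	if (e->size_valid)
		return e->size_ > limit;
	/* same argument: size >= 2^32 > limit */
	if (sketch_fits_in_32bits(limit))
		return 1;
	return e->real_size > limit;
}
```

The shortcut only applies when the stored size is known to have overflowed 32 bits; in that case any limit that does fit is necessarily smaller than the size, which is why "less than" answers 0 and "greater than" answers 1.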
* [PATCH v6 01/11] pack-objects: a bit of document about struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
` (12 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/pack-objects.h b/pack-objects.h
index 03f1191659..c0a1f61aac 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,51 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+/*
+ * basic object info
+ * -----------------
+ * idx.oid is filled up before delta searching starts. idx.crc32 is
+ * only valid after the object is written out and will be used for
+ * generating the index. idx.offset will be both gradually set and
+ * used in writing phase (base objects get offset first, then deltas
+ * refer to them)
+ *
+ * "size" is the uncompressed object size. Compressed size of the raw
+ * data for an object in a pack is not stored anywhere but is computed
+ * and made available when reverse .idx is made.
+ *
+ * "hash" contains a path name hash which is used for sorting the
+ * delta list and also during delta searching. Once prepare_pack()
+ * returns it's no longer needed.
+ *
+ * source pack info
+ * ----------------
+ * The (in_pack, in_pack_offset) tuple contains the location of the
+ * object in the source pack. in_pack_header_size allows quickly
+ * skipping the header and going straight to the zlib stream.
+ *
+ * "type" and "in_pack_type" both describe object type. in_pack_type
+ * may contain a delta type, while type is always the canonical type.
+ *
+ * deltas
+ * ------
+ * Delta links (delta, delta_child and delta_sibling) are created to
+ * reflect that delta graph from the source pack then updated or added
+ * during delta searching phase when we find better deltas.
+ *
+ * delta_child and delta_sibling are last needed in
+ * compute_write_order(). "delta" and "delta_size" must remain valid
+ * at object writing phase in case the delta is not cached.
+ *
+ * If a delta is cached in memory and is compressed, delta_data points
+ * to the data and z_delta_size contains the compressed size. If it's
+ * uncompressed [1], z_delta_size must be zero. delta_size is always
+ * the uncompressed size and must be valid even if the delta is not
+ * cached.
+ *
+ * [1] during try_delta phase we don't bother with compressing because
+ * the delta could be quickly replaced with a better one.
+ */
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v6 02/11] pack-objects: turn type and in_pack_type to bitfields
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 03/11] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
` (11 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as a "valid" type: the function
read_object_list_from_stdin() can pass this value [1], and it
eventually calls create_object_entry(), where the current code skips
setting the "type" field if the incoming type is zero. This does not
have any bad side effects because the "type" field should be
memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it has been
for a very long time and I haven't had time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 60 ++++++++++++++++++++++++------------------
cache.h | 2 ++
object.h | 1 -
pack-bitmap-write.c | 6 ++---
pack-objects.h | 20 ++++++++++++--
5 files changed, 58 insertions(+), 31 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5c674b2843..647c01ea34 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -265,7 +265,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
struct git_istream *st = NULL;
if (!usable_delta) {
- if (entry->type == OBJ_BLOB &&
+ if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
- enum object_type type = entry->type;
+ enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0; /* explicit */
else if (!entry->in_pack)
to_reuse = 0; /* can't reuse what we don't have */
- else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+ else if (oe_type(entry) == OBJ_REF_DELTA ||
+ oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
- else if (entry->type != entry->in_pack_type)
+ else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0; /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_COMMIT &&
- objects[i].type != OBJ_TAG)
+ if (oe_type(&objects[i]) != OBJ_COMMIT &&
+ oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (objects[i].type != OBJ_TREE)
+ if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
@@ -1066,8 +1067,7 @@ static void create_object_entry(const struct object_id *oid,
entry = packlist_alloc(&to_pack, oid->hash, index_pos);
entry->hash = hash;
- if (type)
- entry->type = type;
+ oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry)
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
+ enum object_type type;
buf = use_pack(p, &w_curs, entry->in_pack_offset, &avail);
@@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry)
* since non-delta representations could still be reused.
*/
used = unpack_object_header_buffer(buf, avail,
- &entry->in_pack_type,
+ &type,
&entry->size);
if (used == 0)
goto give_up;
+ if (type < 0)
+ die("BUG: invalid type %d", type);
+ entry->in_pack_type = type;
+
/*
* Determine if this is a delta and if so whether we can
* reuse it or not. Otherwise let's find out as cheaply as
@@ -1428,9 +1433,9 @@ static void check_object(struct object_entry *entry)
switch (entry->in_pack_type) {
default:
/* Not a delta hence we've already got all we need. */
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->in_pack_header_size = used;
- if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
+ if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > OBJ_BLOB)
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1484,7 +1489,7 @@ static void check_object(struct object_entry *entry)
* deltify other objects against, in order to avoid
* circular deltas.
*/
- entry->type = entry->in_pack_type;
+ oe_set_type(entry, entry->in_pack_type);
entry->delta = base_entry;
entry->delta_size = entry->size;
entry->delta_sibling = base_entry->delta_child;
@@ -1493,7 +1498,7 @@ static void check_object(struct object_entry *entry)
return;
}
- if (entry->type) {
+ if (oe_type(entry)) {
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
@@ -1516,7 +1521,7 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- entry->type = sha1_object_info(entry->idx.oid.hash, &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
@@ -1559,6 +1564,7 @@ static void drop_reused_delta(struct object_entry *entry)
{
struct object_entry **p = &entry->delta->delta_child;
struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type type;
while (*p) {
if (*p == entry)
@@ -1570,16 +1576,18 @@ static void drop_reused_delta(struct object_entry *entry)
entry->depth = 0;
oi.sizep = &entry->size;
- oi.typep = &entry->type;
+ oi.typep = &type;
if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
- * And if that fails, the error will be recorded in entry->type
+ * And if that fails, the error will be recorded in oe_type(entry)
* and dealt with in prepare_pack().
*/
- entry->type = sha1_object_info(entry->idx.oid.hash,
- &entry->size);
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
+ &entry->size));
+ } else {
+ oe_set_type(entry, type);
}
}
@@ -1747,10 +1755,12 @@ static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ enum object_type a_type = oe_type(a);
+ enum object_type b_type = oe_type(b);
- if (a->type > b->type)
+ if (a_type > b_type)
return -1;
- if (a->type < b->type)
+ if (a_type < b_type)
return 1;
if (a->hash > b->hash)
return -1;
@@ -1826,7 +1836,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
void *delta_buf;
/* Don't bother doing diffs between different types */
- if (trg_entry->type != src_entry->type)
+ if (oe_type(trg_entry) != oe_type(src_entry))
return -1;
/*
@@ -2432,11 +2442,11 @@ static void prepare_pack(int window, int depth)
if (!entry->preferred_base) {
nr_deltas++;
- if (entry->type < 0)
+ if (oe_type(entry) < 0)
die("unable to get type of object %s",
oid_to_hex(&entry->idx.oid));
} else {
- if (entry->type < 0) {
+ if (oe_type(entry) < 0) {
/*
* This object is not found, but we
* don't have to include it anyway.
@@ -2545,7 +2555,7 @@ static void read_object_list_from_stdin(void)
die("expected object ID, got garbage:\n %s", line);
add_preferred_base_object(p + 1);
- add_object_entry(&oid, 0, p + 1, 0);
+ add_object_entry(&oid, OBJ_NONE, p + 1, 0);
}
}
diff --git a/cache.h b/cache.h
index 21fbcc2414..862bdff83a 100644
--- a/cache.h
+++ b/cache.h
@@ -373,6 +373,8 @@ extern void free_name_hash(struct index_state *istate);
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
#endif
+#define TYPE_BITS 3
+
enum object_type {
OBJ_BAD = -1,
OBJ_NONE = 0,
diff --git a/object.h b/object.h
index 87563d9056..8ce294d6ec 100644
--- a/object.h
+++ b/object.h
@@ -25,7 +25,6 @@ struct object_array {
#define OBJECT_ARRAY_INIT { 0, 0, NULL }
-#define TYPE_BITS 3
/*
* object flag allocation:
* revision.h: 0---------10 26
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..fd11f08940 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -64,12 +64,12 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
entry->in_pack_pos = i;
- switch (entry->type) {
+ switch (oe_type(entry)) {
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
- real_type = entry->type;
+ real_type = oe_type(entry);
break;
default:
@@ -98,7 +98,7 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
default:
die("Missing type information for %s (%d/%d)",
oid_to_hex(&entry->idx.oid), real_type,
- entry->type);
+ oe_type(entry));
}
}
}
diff --git a/pack-objects.h b/pack-objects.h
index c0a1f61aac..b883d7aa10 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -59,8 +59,9 @@ struct object_entry {
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
- enum object_type type;
- enum object_type in_pack_type; /* could be delta */
+ unsigned type_:TYPE_BITS;
+ unsigned in_pack_type:TYPE_BITS; /* could be delta */
+ unsigned type_valid:1;
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
@@ -123,4 +124,19 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}
+static inline enum object_type oe_type(const struct object_entry *e)
+{
+ return e->type_valid ? e->type_ : OBJ_BAD;
+}
+
+static inline void oe_set_type(struct object_entry *e,
+ enum object_type type)
+{
+ if (type >= OBJ_ANY)
+ die("BUG: OBJ_ANY cannot be set in pack-objects code");
+
+ e->type_valid = type >= OBJ_NONE;
+ e->type_ = (unsigned)type;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
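The OBJ_NONE versus OBJ_BAD distinction from the commit message above is easy to lose track of, so here is a standalone sketch of the type_/type_valid scheme. The SK_* enum and sketch_* names are stand-ins invented for this example; only the shape of the getter and setter follows the patch.

```c
#include <assert.h>

/* Stand-in for enum object_type; values chosen to mirror BAD = -1, NONE = 0. */
enum sketch_obj_type {
	SK_BAD = -1,
	SK_NONE = 0,
	SK_COMMIT = 1,
	SK_TREE = 2,
	SK_BLOB = 3,
	SK_TAG = 4
};

struct sketch_type_entry {
	unsigned type_:3;      /* cannot represent -1 */
	unsigned type_valid:1; /* carries the "not SK_BAD" information */
};

static enum sketch_obj_type sketch_get_type(const struct sketch_type_entry *e)
{
	return e->type_valid ? (enum sketch_obj_type)e->type_ : SK_BAD;
}

static void sketch_set_type(struct sketch_type_entry *e,
			    enum sketch_obj_type type)
{
	assert(type <= SK_TAG);
	e->type_valid = type >= SK_NONE;
	e->type_ = (unsigned)type; /* truncated to 3 bits when type is SK_BAD */
}
```

A zeroed entry whose setter was never called reads back as SK_BAD, while an entry explicitly set to SK_NONE reads back as SK_NONE. That is why create_object_entry() has to call the setter unconditionally once type_valid exists.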
* [PATCH v6 03/11] pack-objects: use bitfield for object_entry::dfs_state
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 01/11] pack-objects: a bit of document about struct object_entry Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 02/11] pack-objects: turn type and in_pack_type to bitfields Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
` (10 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 3 +++
pack-objects.h | 28 +++++++++++++++++-----------
2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 647c01ea34..83f8154865 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
OPT_END(),
};
+ if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
+ die("BUG: too many dfs states, increase OE_DFS_STATE_BITS");
+
check_replace_refs = 0;
reset_pack_idx_option(&pack_idx_opts);
diff --git a/pack-objects.h b/pack-objects.h
index b883d7aa10..8507e1b869 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,21 @@
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
+#define OE_DFS_STATE_BITS 2
+
+/*
+ * State flags for depth-first search used for analyzing delta cycles.
+ *
+ * The depth is measured in delta-links to the base (so if A is a delta
+ * against B, then A has a depth of 1, and B a depth of 0).
+ */
+enum dfs_state {
+ DFS_NONE = 0,
+ DFS_ACTIVE,
+ DFS_DONE,
+ DFS_NUM_STATES
+};
+
/*
* basic object info
* -----------------
@@ -73,19 +88,10 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+ unsigned dfs_state:OE_DFS_STATE_BITS;
- /*
- * State flags for depth-first search used for analyzing delta cycles.
- *
- * The depth is measured in delta-links to the base (so if A is a delta
- * against B, then A has a depth of 1, and B a depth of 0).
- */
- enum {
- DFS_NONE = 0,
- DFS_ACTIVE,
- DFS_DONE
- } dfs_state;
int depth;
+
};
struct packing_data {
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v6 04/11] pack-objects: use bitfield for object_entry::depth
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 03/11] pack-objects: use bitfield for object_entry::dfs_state Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 05/11] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
` (9 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Because of struct packing, from now on we can only handle a max depth
of 4095 (or even lower when new booleans are added to this struct).
This should be ok since long delta chains cause significant slowdown
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 1 +
Documentation/git-pack-objects.txt | 4 +++-
Documentation/git-repack.txt | 4 +++-
builtin/pack-objects.c | 4 ++++
pack-objects.h | 5 ++---
5 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10c..9bd3f5a789 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2412,6 +2412,7 @@ pack.window::
pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+ Maximum value is 4095.
pack.windowMemory::
The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 81bc490ac5..3503c9e3e6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--window-memory=<n>::
This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
- The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
--threads=<n>::
This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 83f8154865..205e1f646c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
+ if (depth >= (1 << OE_DEPTH_BITS))
+ die(_("delta chain depth %d is greater than maximum limit %d"),
+ depth, (1 << OE_DEPTH_BITS) - 1);
+
argv_array_push(&rp, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
diff --git a/pack-objects.h b/pack-objects.h
index 8507e1b869..59407aae3c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,6 +2,7 @@
#define PACK_OBJECTS_H
#define OE_DFS_STATE_BITS 2
+#define OE_DEPTH_BITS 12
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -89,9 +90,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
-
- int depth;
-
+ unsigned depth:OE_DEPTH_BITS;
};
struct packing_data {
--
2.17.0.rc0.347.gf9cf61673a
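The 4095 figure is simply the largest value a 12-bit unsigned field can hold. A minimal sketch, using made-up SKETCH_*/sketch_* names, of the limit, the range check, and what would happen without the check:

```c
#include <assert.h>

#define SKETCH_OE_DEPTH_BITS 12

struct sketch_depth_entry {
	unsigned depth:SKETCH_OE_DEPTH_BITS;
};

/* Largest depth representable in the bitfield: 2^12 - 1. */
static int sketch_max_depth(void)
{
	return (1 << SKETCH_OE_DEPTH_BITS) - 1;
}

/* Mirrors the shape of the check added to cmd_pack_objects(). */
static int sketch_depth_ok(int depth)
{
	return depth < (1 << SKETCH_OE_DEPTH_BITS);
}

static void sketch_set_depth(struct sketch_depth_entry *e, int depth)
{
	assert(depth >= 0 && sketch_depth_ok(depth));
	e->depth = (unsigned)depth;
}

/* Without the check, out-of-range values silently wrap modulo 2^12. */
static unsigned sketch_store_unchecked(unsigned depth)
{
	struct sketch_depth_entry e;

	e.depth = depth;
	return e.depth;
}
```

Storing 4096 unchecked reads back as 0, which is the kind of silent corruption the up-front die() is there to prevent.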
* [PATCH v6 05/11] pack-objects: move in_pack_pos out of struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (3 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 04/11] pack-objects: use bitfield for object_entry::depth Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
` (8 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
This field is only needed for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (it's not freed, in the same way that objects[] is
not).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 3 ++-
pack-bitmap-write.c | 8 +++++---
pack-bitmap.c | 2 +-
pack-bitmap.h | 4 +++-
pack-objects.h | 16 +++++++++++++++-
5 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 205e1f646c..e1244918a5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -879,7 +879,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_set_checksum(oid.hash);
- bitmap_writer_build_type_index(written_list, nr_written);
+ bitmap_writer_build_type_index(
+ &to_pack, written_list, nr_written);
}
finish_tmp_packfile(&tmpname, pack_tmp_name,
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fd11f08940..f7c897515b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show)
/**
* Build the initial type index for the packfile
*/
-void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
uint32_t index_nr)
{
uint32_t i;
@@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index,
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
+ ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects);
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
- entry->in_pack_pos = i;
+ oe_set_in_pack_pos(to_pack, entry, i);
switch (oe_type(entry)) {
case OBJ_COMMIT:
@@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1)
"(object %s is missing)", sha1_to_hex(sha1));
}
- return entry->in_pack_pos;
+ return oe_in_pack_pos(writer.to_pack, entry);
}
static void show_object(struct object *object, const char *name, void *data)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9270983e5f..865d9ecc4e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping,
oe = packlist_find(mapping, sha1, NULL);
if (oe)
- reposition[i] = oe->in_pack_pos + 1;
+ reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
}
rebuild = bitmap_new();
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3742a00e14..5ded2f139a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bi
void bitmap_writer_show_progress(int show);
void bitmap_writer_set_checksum(unsigned char *sha1);
-void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr);
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+ struct pack_idx_entry **index,
+ uint32_t index_nr);
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
diff --git a/pack-objects.h b/pack-objects.h
index 59407aae3c..4a11653657 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -79,7 +79,6 @@ struct object_entry {
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
uint32_t hash; /* name hint hash */
- unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
@@ -99,6 +98,8 @@ struct packing_data {
int32_t *index;
uint32_t index_size;
+
+ unsigned int *in_pack_pos;
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -144,4 +145,17 @@ static inline void oe_set_type(struct object_entry *e,
e->type_ = (unsigned)type;
}
+static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack_pos[e - pack->objects];
+}
+
+static inline void oe_set_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e,
+ unsigned int pos)
+{
+ pack->in_pack_pos[e - pack->objects] = pos;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v6 06/11] pack-objects: move in_pack out of struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (4 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 05/11] pack-objects: move in_pack_pos out of struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
` (7 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Instead of using 8 bytes (on 64-bit architectures) to store a pointer to a
pack, use an index, since the number of packs should be relatively
small.
This limits the number of packs we can handle to 16k. For now, if you hit
the 16k pack file limit, pack-objects will simply fail [1].
[1] The escape hatch is to use .keep files to keep the number of non-kept
pack files below the 16k limit. Then you can do another pack-objects run
to combine another 16k pack files. Repeat until you're satisfied.
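As an illustration of the idea only (not the code in this patch), here is a minimal, self-contained sketch of replacing a per-entry pointer with a small index into a side table. All names here (pack_table, pack_table_add, entry_pack, PACK_IDX_BITS) are hypothetical stand-ins, and slot 0 is assumed to be reserved for "no pack" so that a zeroed entry maps back to NULL.

```c
#include <stddef.h>

#define PACK_IDX_BITS 14                 /* hypothetical; a 14-bit index field */
#define MAX_PACKS (1u << PACK_IDX_BITS)  /* 16384 slots, slot 0 means "no pack" */

struct pack { const char *name; };       /* stand-in for a real pack descriptor */

struct pack_table {
	unsigned count;
	struct pack *slots[MAX_PACKS];
};

struct entry {
	unsigned pack_idx:PACK_IDX_BITS;     /* 14 bits instead of an 8-byte pointer */
};

/* Reserve slot 0 for NULL so a zeroed entry reads back as "not in a pack". */
static void pack_table_init(struct pack_table *t)
{
	t->count = 1;
	t->slots[0] = NULL;
}

/* Register a pack; returns its index, or 0 when the table is full. */
static unsigned pack_table_add(struct pack_table *t, struct pack *p)
{
	if (t->count >= MAX_PACKS)
		return 0;
	t->slots[t->count] = p;
	return t->count++;
}

/* Translate the small index stored in an entry back into a pointer. */
static struct pack *entry_pack(const struct pack_table *t, const struct entry *e)
{
	return t->slots[e->pack_idx];
}
```

The cost of the lookup is one extra array access per query, in exchange for dropping a full pointer from every entry. In this sketch the table itself is allocated once, so its size does not grow with the number of entries.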
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/git-pack-objects.txt | 9 ++++++
builtin/pack-objects.c | 40 +++++++++++++++++----------
cache.h | 1 +
pack-objects.h | 44 +++++++++++++++++++++++++++++-
4 files changed, 79 insertions(+), 15 deletions(-)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 3503c9e3e6..b8d936ccf5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -269,6 +269,15 @@ Unexpected missing object will raise an error.
locally created objects [without .promisor] and objects from the
promisor remote [with .promisor].) This is used with partial clone.
+LIMITATIONS
+-----------
+
+This command can only handle 16384 existing pack files at a time.
+If you have more than this, you need to exclude some pack files with
+a ".keep" file and the --honor-pack-keep option, combine 16k pack files
+into one, then remove these .keep files and run pack-objects one more
+time.
+
SEE ALSO
--------
linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e1244918a5..9792d31e46 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,6 +29,8 @@
#include "list.h"
#include "packfile.h"
+#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
N_("git pack-objects [<options>...] <base-name> [< <ref-list> | < <object-list>]"),
@@ -367,7 +369,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
{
- struct packed_git *p = entry->in_pack;
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
@@ -478,7 +480,7 @@ static off_t write_object(struct hashfile *f,
if (!reuse_object)
to_reuse = 0; /* explicit */
- else if (!entry->in_pack)
+ else if (!IN_PACK(entry))
to_reuse = 0; /* can't reuse what we don't have */
else if (oe_type(entry) == OBJ_REF_DELTA ||
oe_type(entry) == OBJ_OFS_DELTA)
@@ -1025,7 +1027,7 @@ static int want_object_in_pack(const struct object_id *oid,
if (*found_pack) {
want = want_found_object(exclude, *found_pack);
if (want != -1)
- return want;
+ goto done;
}
list_for_each(pos, &packed_git_mru) {
@@ -1048,11 +1050,16 @@ static int want_object_in_pack(const struct object_id *oid,
if (!exclude && want > 0)
list_move(&p->mru, &packed_git_mru);
if (want != -1)
- return want;
+ goto done;
}
}
- return 1;
+ want = 1;
+done:
+ if (want && *found_pack && !(*found_pack)->index)
+ oe_add_pack(&to_pack, *found_pack);
+
+ return want;
}
static void create_object_entry(const struct object_id *oid,
@@ -1074,7 +1081,7 @@ static void create_object_entry(const struct object_id *oid,
else
nr_result++;
if (found_pack) {
- entry->in_pack = found_pack;
+ oe_set_in_pack(entry, found_pack);
entry->in_pack_offset = found_offset;
}
@@ -1399,8 +1406,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
- if (entry->in_pack) {
- struct packed_git *p = entry->in_pack;
+ if (IN_PACK(entry)) {
+ struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
@@ -1535,14 +1542,16 @@ static int pack_offset_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+ const struct packed_git *a_in_pack = IN_PACK(a);
+ const struct packed_git *b_in_pack = IN_PACK(b);
/* avoid filesystem trashing with loose objects */
- if (!a->in_pack && !b->in_pack)
+ if (!a_in_pack && !b_in_pack)
return oidcmp(&a->idx.oid, &b->idx.oid);
- if (a->in_pack < b->in_pack)
+ if (a_in_pack < b_in_pack)
return -1;
- if (a->in_pack > b->in_pack)
+ if (a_in_pack > b_in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
@@ -1578,7 +1587,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &entry->size;
oi.typep = &type;
- if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
@@ -1848,8 +1857,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
* it, we will still save the transfer cost, as we already know
* the other side has it and we won't send src_entry at all.
*/
- if (reuse_delta && trg_entry->in_pack &&
- trg_entry->in_pack == src_entry->in_pack &&
+ if (reuse_delta && IN_PACK(trg_entry) &&
+ IN_PACK(trg_entry) == IN_PACK(src_entry) &&
!src_entry->preferred_base &&
trg_entry->in_pack_type != OBJ_REF_DELTA &&
trg_entry->in_pack_type != OBJ_OFS_DELTA)
@@ -3191,6 +3200,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
}
}
+ /* make sure IN_PACK(0) return NULL */
+ oe_add_pack(&to_pack, NULL);
+
if (progress)
progress_state = start_progress(_("Counting objects"), 0);
if (!use_internal_rev_list)
diff --git a/cache.h b/cache.h
index 862bdff83a..b90feb3802 100644
--- a/cache.h
+++ b/cache.h
@@ -1635,6 +1635,7 @@ extern struct packed_git {
int index_version;
time_t mtime;
int pack_fd;
+ int index; /* for builtin/pack-objects.c */
unsigned pack_local:1,
pack_keep:1,
freshened:1,
diff --git a/pack-objects.h b/pack-objects.h
index 4a11653657..bf905c3f9b 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -3,6 +3,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
+#define OE_IN_PACK_BITS 14
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -18,6 +19,10 @@ enum dfs_state {
};
/*
+ * The size of struct nearly determines pack-objects's memory
+ * consumption. This struct is packed tight for that reason. When you
+ * add or reorder something in this struct, think a bit about this.
+ *
* basic object info
* -----------------
* idx.oid is filled up before delta searching starts. idx.crc32 is
@@ -65,7 +70,7 @@ enum dfs_state {
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
- struct packed_git *in_pack; /* already in pack */
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
@@ -100,6 +105,8 @@ struct packing_data {
uint32_t index_size;
unsigned int *in_pack_pos;
+ int in_pack_count;
+ struct packed_git *in_pack[1 << OE_IN_PACK_BITS];
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -158,4 +165,39 @@ static inline void oe_set_in_pack_pos(const struct packing_data *pack,
pack->in_pack_pos[e - pack->objects] = pos;
}
+static inline unsigned int oe_add_pack(struct packing_data *pack,
+ struct packed_git *p)
+{
+ if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
+ die(_("too many packs to handle in one go. "
+ "Please add .keep files to exclude\n"
+ "some pack files and keep the number "
+ "of non-kept files below %d."),
+ 1 << OE_IN_PACK_BITS);
+ if (p) {
+ if (p->index > 0)
+ die("BUG: this packed is already indexed");
+ p->index = pack->in_pack_count;
+ }
+ pack->in_pack[pack->in_pack_count] = p;
+ return pack->in_pack_count++;
+}
+
+static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ return pack->in_pack[e->in_pack_idx];
+
+}
+
+static inline void oe_set_in_pack(struct object_entry *e,
+ struct packed_git *p)
+{
+ if (p->index <= 0)
+ die("BUG: found_pack should be NULL "
+ "instead of having non-positive index");
+ e->in_pack_idx = p->index;
+
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v6 07/11] pack-objects: refer to delta objects by index instead of pointer
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (5 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 06/11] pack-objects: move in_pack " Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
` (6 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
These delta pointers always point to elements in the objects[] array
in the packing_data struct. We can hold at most 4G of those objects
because the array size, nr_objects, is a uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architectures (8 bytes per pointer) this would save 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
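The shift-by-one scheme described above can be sketched in isolation as follows. This is a simplified illustration under assumed names (node, node_pool, node_base, node_set_base are hypothetical), not the helpers added by this patch: index 0 stands for NULL, and any other value is the array position plus one.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
	uint32_t base_idx;   /* 0 means "no base"; otherwise array position + 1 */
	int payload;
};

struct node_pool {
	struct node *nodes;  /* all nodes live in this one array */
	uint32_t nr;
};

/* Decode the shifted index back into a pointer, or NULL for index 0. */
static struct node *node_base(const struct node_pool *pool, const struct node *n)
{
	return n->base_idx ? &pool->nodes[n->base_idx - 1] : NULL;
}

/* Encode a pointer into the array as position + 1; NULL becomes 0. */
static void node_set_base(const struct node_pool *pool, struct node *n,
			  struct node *base)
{
	n->base_idx = base ? (uint32_t)(base - pool->nodes) + 1 : 0;
}
```

A side effect of this representation is that the stored references stay meaningful if the array is reallocated and moved, since they are relative to the array start rather than absolute addresses.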
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 116 ++++++++++++++++++++++-------------------
pack-objects.h | 67 ++++++++++++++++++++++--
2 files changed, 124 insertions(+), 59 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9792d31e46..b39234f7fb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA(obj) oe_delta(&to_pack, obj)
+#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
+#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
+#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
+#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -127,11 +133,11 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, &type, &size);
if (!buf)
die("unable to read %s", oid_to_hex(&entry->idx.oid));
- base_buf = read_sha1_file(entry->delta->idx.oid.hash, &type,
+ base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, &type,
&base_size);
if (!base_buf)
die("unable to read %s",
- oid_to_hex(&entry->delta->idx.oid));
+ oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
if (!delta_buf || delta_size != entry->delta_size)
@@ -288,12 +294,12 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -317,7 +323,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
* encoding of the relative offset for the delta
* base from this object's position in the pack.
*/
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -343,7 +349,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
@@ -379,8 +385,8 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
- if (entry->delta)
- type = (allow_ofs_delta && entry->delta->idx.offset) ?
+ if (DELTA(entry))
+ type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
type, entry->size);
@@ -408,7 +414,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
}
if (type == OBJ_OFS_DELTA) {
- off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+ off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -427,7 +433,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
return 0;
}
hashwrite(f, header, hdrlen);
- hashwrite(f, entry->delta->idx.oid.hash, 20);
+ hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
reused_delta++;
} else {
@@ -467,13 +473,13 @@ static off_t write_object(struct hashfile *f,
else
limit = pack_size_limit - write_offset;
- if (!entry->delta)
+ if (!DELTA(entry))
usable_delta = 0; /* no delta */
else if (!pack_size_limit)
usable_delta = 1; /* unlimited packfile */
- else if (entry->delta->idx.offset == (off_t)-1)
+ else if (DELTA(entry)->idx.offset == (off_t)-1)
usable_delta = 0; /* base was written to another pack */
- else if (entry->delta->idx.offset)
+ else if (DELTA(entry)->idx.offset)
usable_delta = 1; /* base already exists in this pack */
else
usable_delta = 0; /* base could end up in another pack */
@@ -489,7 +495,7 @@ static off_t write_object(struct hashfile *f,
/* ... but pack split may override that */
else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0; /* pack has delta which is unusable */
- else if (entry->delta)
+ else if (DELTA(entry))
to_reuse = 0; /* we want to pack afresh */
else
to_reuse = 1; /* we have it in-pack undeltified,
@@ -541,12 +547,12 @@ static enum write_one_status write_one(struct hashfile *f,
}
/* if we are deltified, write out base object first. */
- if (e->delta) {
+ if (DELTA(e)) {
e->idx.offset = 1; /* now recurse */
- switch (write_one(f, e->delta, offset)) {
+ switch (write_one(f, DELTA(e), offset)) {
case WRITE_ONE_RECURSIVE:
/* we cannot depend on this one */
- e->delta = NULL;
+ SET_DELTA(e, NULL);
break;
default:
break;
@@ -608,34 +614,34 @@ static void add_descendants_to_write_order(struct object_entry **wo,
/* add this node... */
add_to_write_order(wo, endp, e);
/* all its siblings... */
- for (s = e->delta_sibling; s; s = s->delta_sibling) {
+ for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) {
add_to_write_order(wo, endp, s);
}
}
/* drop down a level to add left subtree nodes if possible */
- if (e->delta_child) {
+ if (DELTA_CHILD(e)) {
add_to_order = 1;
- e = e->delta_child;
+ e = DELTA_CHILD(e);
} else {
add_to_order = 0;
/* our sibling might have some children, it is next */
- if (e->delta_sibling) {
- e = e->delta_sibling;
+ if (DELTA_SIBLING(e)) {
+ e = DELTA_SIBLING(e);
continue;
}
/* go back to our parent node */
- e = e->delta;
- while (e && !e->delta_sibling) {
+ e = DELTA(e);
+ while (e && !DELTA_SIBLING(e)) {
/* we're on the right side of a subtree, keep
* going up until we can go right again */
- e = e->delta;
+ e = DELTA(e);
}
if (!e) {
/* done- we hit our original root node */
return;
}
/* pass it off to sibling at this level */
- e = e->delta_sibling;
+ e = DELTA_SIBLING(e);
}
};
}
@@ -646,7 +652,7 @@ static void add_family_to_write_order(struct object_entry **wo,
{
struct object_entry *root;
- for (root = e; root->delta; root = root->delta)
+ for (root = e; DELTA(root); root = DELTA(root))
; /* nothing */
add_descendants_to_write_order(wo, endp, root);
}
@@ -661,8 +667,8 @@ static struct object_entry **compute_write_order(void)
for (i = 0; i < to_pack.nr_objects; i++) {
objects[i].tagged = 0;
objects[i].filled = 0;
- objects[i].delta_child = NULL;
- objects[i].delta_sibling = NULL;
+ SET_DELTA_CHILD(&objects[i], NULL);
+ SET_DELTA_SIBLING(&objects[i], NULL);
}
/*
@@ -672,11 +678,11 @@ static struct object_entry **compute_write_order(void)
*/
for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
- if (!e->delta)
+ if (!DELTA(e))
continue;
/* Mark me as the first child */
- e->delta_sibling = e->delta->delta_child;
- e->delta->delta_child = e;
+ e->delta_sibling_idx = DELTA(e)->delta_child_idx;
+ SET_DELTA_CHILD(DELTA(e), e);
}
/*
@@ -1498,10 +1504,10 @@ static void check_object(struct object_entry *entry)
* circular deltas.
*/
oe_set_type(entry, entry->in_pack_type);
- entry->delta = base_entry;
+ SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
- entry->delta_sibling = base_entry->delta_child;
- base_entry->delta_child = entry;
+ entry->delta_sibling_idx = base_entry->delta_child_idx;
+ SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
return;
}
@@ -1572,17 +1578,19 @@ static int pack_offset_sort(const void *_a, const void *_b)
*/
static void drop_reused_delta(struct object_entry *entry)
{
- struct object_entry **p = &entry->delta->delta_child;
+ unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
- while (*p) {
- if (*p == entry)
- *p = (*p)->delta_sibling;
+ while (*idx) {
+ struct object_entry *oe = &to_pack.objects[*idx - 1];
+
+ if (oe == entry)
+ *idx = oe->delta_sibling_idx;
else
- p = &(*p)->delta_sibling;
+ idx = &oe->delta_sibling_idx;
}
- entry->delta = NULL;
+ SET_DELTA(entry, NULL);
entry->depth = 0;
oi.sizep = &entry->size;
@@ -1622,7 +1630,7 @@ static void break_delta_chains(struct object_entry *entry)
for (cur = entry, total_depth = 0;
cur;
- cur = cur->delta, total_depth++) {
+ cur = DELTA(cur), total_depth++) {
if (cur->dfs_state == DFS_DONE) {
/*
* We've already seen this object and know it isn't
@@ -1647,7 +1655,7 @@ static void break_delta_chains(struct object_entry *entry)
* it's not a delta, we're done traversing, but we'll mark it
* done to save time on future traversals.
*/
- if (!cur->delta) {
+ if (!DELTA(cur)) {
cur->dfs_state = DFS_DONE;
break;
}
@@ -1670,7 +1678,7 @@ static void break_delta_chains(struct object_entry *entry)
* We keep all commits in the chain that we examined.
*/
cur->dfs_state = DFS_ACTIVE;
- if (cur->delta->dfs_state == DFS_ACTIVE) {
+ if (DELTA(cur)->dfs_state == DFS_ACTIVE) {
drop_reused_delta(cur);
cur->dfs_state = DFS_DONE;
break;
@@ -1685,7 +1693,7 @@ static void break_delta_chains(struct object_entry *entry)
* an extra "next" pointer to keep going after we reset cur->delta.
*/
for (cur = entry; cur; cur = next) {
- next = cur->delta;
+ next = DELTA(cur);
/*
* We should have a chain of zero or more ACTIVE states down to
@@ -1870,7 +1878,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
/* Now some size filtering heuristics. */
trg_size = trg_entry->size;
- if (!trg_entry->delta) {
+ if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
@@ -1946,7 +1954,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (!delta_buf)
return 0;
- if (trg_entry->delta) {
+ if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
if (delta_size == trg_entry->delta_size &&
src->depth + 1 >= trg->depth) {
@@ -1975,7 +1983,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(delta_buf);
}
- trg_entry->delta = src_entry;
+ SET_DELTA(trg_entry, src_entry);
trg_entry->delta_size = delta_size;
trg->depth = src->depth + 1;
@@ -1984,13 +1992,13 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
{
- struct object_entry *child = me->delta_child;
+ struct object_entry *child = DELTA_CHILD(me);
unsigned int m = n;
while (child) {
unsigned int c = check_delta_limit(child, n + 1);
if (m < c)
m = c;
- child = child->delta_sibling;
+ child = DELTA_SIBLING(child);
}
return m;
}
@@ -2059,7 +2067,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* otherwise they would become too deep.
*/
max_depth = depth;
- if (entry->delta_child) {
+ if (DELTA_CHILD(entry)) {
max_depth -= check_delta_limit(entry, 0);
if (max_depth <= 0)
goto next;
@@ -2109,7 +2117,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* depth, leaving it in the window is pointless. we
* should evict it first.
*/
- if (entry->delta && max_depth <= n->depth)
+ if (DELTA(entry) && max_depth <= n->depth)
continue;
/*
@@ -2117,7 +2125,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* currently deltified object, to keep it longer. It will
* be the first base object to be attempted next.
*/
- if (entry->delta) {
+ if (DELTA(entry)) {
struct unpacked swap = array[best_base];
int dist = (window + idx - best_base) % window;
int dst = best_base;
@@ -2438,7 +2446,7 @@ static void prepare_pack(int window, int depth)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = to_pack.objects + i;
- if (entry->delta)
+ if (DELTA(entry))
/* This happens if we decided to reuse existing
* delta from a pack. "reuse_delta &&" is implied.
*/
diff --git a/pack-objects.h b/pack-objects.h
index bf905c3f9b..594a213554 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -72,11 +72,11 @@ struct object_entry {
unsigned long size; /* uncompressed size */
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
- struct object_entry *delta; /* delta base object */
- struct object_entry *delta_child; /* deltified objects who bases me */
- struct object_entry *delta_sibling; /* other deltified objects who
- * uses the same base as me
- */
+ uint32_t delta_idx; /* delta base object */
+ uint32_t delta_child_idx; /* deltified objects who bases me */
+ uint32_t delta_sibling_idx; /* other deltified objects who
+ * uses the same base as me
+ */
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
@@ -200,4 +200,61 @@ static inline void oe_set_in_pack(struct object_entry *e,
}
+static inline struct object_entry *oe_delta(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_idx)
+ return &pack->objects[e->delta_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_child(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_child_idx)
+ return &pack->objects[e->delta_child_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_child(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_child_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_child_idx = 0;
+}
+
+static inline struct object_entry *oe_delta_sibling(
+ const struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_sibling_idx)
+ return &pack->objects[e->delta_sibling_idx - 1];
+ return NULL;
+}
+
+static inline void oe_set_delta_sibling(struct packing_data *pack,
+ struct object_entry *e,
+ struct object_entry *delta)
+{
+ if (delta)
+ e->delta_sibling_idx = (delta - pack->objects) + 1;
+ else
+ e->delta_sibling_idx = 0;
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v6 08/11] pack-objects: shrink z_delta_size field in struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (6 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 07/11] pack-objects: refer to delta objects by index instead of pointer Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
` (5 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
We only cache a delta when it is smaller than a certain limit. This limit
defaults to 1000, but we save the delta's compressed length in a 64-bit
field. Shrink that field down to 16 bits, so only deltas of up to 65535
bytes can be cached. Larger deltas must be recomputed when the pack is
written.
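The "cache only if the size fits" rule can be sketched on its own like this. It is a minimal illustration with assumed names (cached_delta, cache_delta, Z_SIZE_BITS are hypothetical) and plain malloc/free instead of git's own helpers: a buffer is kept only when its size fits in the 16-bit field, otherwise it is dropped so the caller recomputes it later.

```c
#include <stdlib.h>

#define Z_SIZE_BITS 16   /* hypothetical name; a 16-bit size field */

struct cached_delta {
	void *data;                      /* cached delta bytes, or NULL */
	unsigned z_size:Z_SIZE_BITS;     /* compressed size; 0 when nothing is cached */
};

/*
 * Take ownership of buf. Keep it only when its size fits in the bitfield;
 * otherwise free it so the caller recomputes the delta at write time.
 * Returns 1 if the buffer was cached, 0 if it was dropped.
 */
static int cache_delta(struct cached_delta *c, void *buf, unsigned long size)
{
	if (size < (1ul << Z_SIZE_BITS)) {
		c->data = buf;
		c->z_size = (unsigned)size;
		return 1;
	}
	free(buf);
	c->data = NULL;
	c->z_size = 0;
	return 0;
}
```

Checking the size before the assignment matters because storing a too-large value in a bitfield would silently truncate it rather than fail.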
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 3 ++-
builtin/pack-objects.c | 22 ++++++++++++++++------
pack-objects.h | 3 ++-
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9bd3f5a789..00fa824448 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2449,7 +2449,8 @@ pack.deltaCacheLimit::
The maximum size of a delta, that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
- result once the best match for all objects is found. Defaults to 1000.
+ result once the best match for all objects is found.
+ Defaults to 1000. Maximum value is 65535.
pack.threads::
Specifies the number of threads to spawn when searching for best
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b39234f7fb..372afe48c4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2105,12 +2105,19 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
* between writes at that moment.
*/
if (entry->delta_data && !pack_to_stdout) {
- entry->z_delta_size = do_compress(&entry->delta_data,
- entry->delta_size);
- cache_lock();
- delta_cache_size -= entry->delta_size;
- delta_cache_size += entry->z_delta_size;
- cache_unlock();
+ unsigned long size;
+
+ size = do_compress(&entry->delta_data, entry->delta_size);
+ if (size < (1 << OE_Z_DELTA_BITS)) {
+ entry->z_delta_size = size;
+ cache_lock();
+ delta_cache_size -= entry->delta_size;
+ delta_cache_size += entry->z_delta_size;
+ cache_unlock();
+ } else {
+ FREE_AND_NULL(entry->delta_data);
+ entry->z_delta_size = 0;
+ }
}
/* if we made n a delta, and if n is already at max
@@ -3089,6 +3096,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (depth >= (1 << OE_DEPTH_BITS))
die(_("delta chain depth %d is greater than maximum limit %d"),
depth, (1 << OE_DEPTH_BITS) - 1);
+ if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
+ die(_("pack.deltaCacheLimit is greater than maximum limit %d"),
+ (1 << OE_Z_DELTA_BITS) - 1);
argv_array_push(&rp, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index 594a213554..c12219385a 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -4,6 +4,7 @@
#define OE_DFS_STATE_BITS 2
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
+#define OE_Z_DELTA_BITS 16
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -79,7 +80,7 @@ struct object_entry {
*/
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
- unsigned long z_delta_size; /* delta data size (compressed) */
+ unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
--
2.17.0.rc0.347.gf9cf61673a
* [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (7 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 08/11] pack-objects: shrink z_delta_size field in struct object_entry Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:49 ` Ævar Arnfjörð Bjarmason
` (2 more replies)
2018-03-18 14:25 ` [PATCH v6 10/11] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
` (4 subsequent siblings)
13 siblings, 3 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
It's very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle such large files very well to
begin with). Let's optimize for the common case where the object size
is smaller than this limit.
Shrink size field down to 32 bits [1] and one overflow bit. If the size
is too large, we read it back from disk.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
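To make the overflow-bit idea concrete, here is a rough sketch under stated assumptions; it is not the set of helpers this patch adds to pack-objects.h. The names (sized, sized_set, sized_get, sized_greater_than) are hypothetical, and the "read it back from disk" step is modeled as a caller-supplied callback, with fake_slow_size as a stand-in that returns a fixed 5GB value.

```c
#include <stdint.h>

struct sized {
	uint32_t size_;          /* valid only when size_valid is set */
	unsigned size_valid:1;   /* cleared when the real size does not fit */
};

/* Hypothetical slow path: the caller supplies a way to re-read the size. */
typedef uint64_t (*slow_size_fn)(const struct sized *);

/* Stand-in for an expensive lookup; always reports a 5GB object. */
static uint64_t fake_slow_size(const struct sized *s)
{
	(void)s;
	return 5000000000ull;
}

/* Store the size inline when it fits in 32 bits, else mark it overflowed. */
static void sized_set(struct sized *s, uint64_t size)
{
	if (size <= UINT32_MAX) {
		s->size_ = (uint32_t)size;
		s->size_valid = 1;
	} else {
		s->size_ = 0;
		s->size_valid = 0;
	}
}

/* Fast path for the common case; fall back to the slow lookup otherwise. */
static uint64_t sized_get(const struct sized *s, slow_size_fn slow)
{
	return s->size_valid ? s->size_ : slow(s);
}

/* An overflowed size exceeds any limit that fits in 32 bits, so the slow
 * path is only needed when the limit itself is above that range. */
static int sized_greater_than(const struct sized *s, uint64_t limit,
			      slow_size_fn slow)
{
	if (s->size_valid)
		return s->size_ > limit;
	if (limit <= UINT32_MAX)
		return 1;
	return slow(s) > limit;
}
```

With a threshold such as 512MB, the comparison for an overflowed entry is answered from the flag alone, which is the shortcut the paragraph above describes.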
A small note about the conditional oe_set_size() in
check_object(): technically, if we don't get a valid type, it's not
wrong to set the uninitialized value "size" (we don't pre-initialize
it and sha1_object_info will not assign anything when it fails to
get the info).
This, however, changes the writing code path slightly, which then emits
different error messages (either way we die). One of our tests in t5530
depends on this specific error message. Let's just keep the test as-is and
play it safe by not assigning a random value. That might trigger valgrind
anyway.
[1] it's actually already 32 bits on Windows
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 49 ++++++++++++++++++++++-------------
pack-objects.h | 58 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 88 insertions(+), 19 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 372afe48c4..89ed4b5125 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -274,7 +274,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
if (!usable_delta) {
if (oe_type(entry) == OBJ_BLOB &&
- entry->size > big_file_threshold &&
+ oe_size_greater_than(entry, big_file_threshold) &&
(st = open_istream(entry->idx.oid.hash, &type, &size, NULL)) != NULL)
buf = NULL;
else {
@@ -384,12 +384,13 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned char header[MAX_PACK_OBJECT_HEADER],
dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
+ unsigned long entry_size = oe_size(entry);
if (DELTA(entry))
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
- type, entry->size);
+ type, entry_size);
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -406,7 +407,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
datalen -= entry->in_pack_header_size;
if (!pack_to_stdout && p->index_version == 1 &&
- check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) {
+ check_pack_inflate(p, &w_curs, offset, datalen, entry_size)) {
error("corrupt packed object for %s",
oid_to_hex(&entry->idx.oid));
unuse_pack(&w_curs);
@@ -1412,6 +1413,8 @@ static void cleanup_preferred_base(void)
static void check_object(struct object_entry *entry)
{
+ unsigned long size;
+
if (IN_PACK(entry)) {
struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
@@ -1431,13 +1434,14 @@ static void check_object(struct object_entry *entry)
*/
used = unpack_object_header_buffer(buf, avail,
&type,
- &entry->size);
+ &size);
if (used == 0)
goto give_up;
if (type < 0)
die("BUG: invalid type %d", type);
entry->in_pack_type = type;
+ oe_set_size(entry, size);
/*
* Determine if this is a delta and if so whether we can
@@ -1505,7 +1509,7 @@ static void check_object(struct object_entry *entry)
*/
oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
- entry->delta_size = entry->size;
+ entry->delta_size = oe_size(entry);
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1513,14 +1517,17 @@ static void check_object(struct object_entry *entry)
}
if (oe_type(entry)) {
+ unsigned long size;
+
+ size = get_size_from_delta(p, &w_curs,
+ entry->in_pack_offset + entry->in_pack_header_size);
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
* object size from the delta header.
*/
- entry->size = get_size_from_delta(p, &w_curs,
- entry->in_pack_offset + entry->in_pack_header_size);
- if (entry->size == 0)
+ oe_set_size(entry, size);
+ if (oe_size_less_than(entry, 1))
goto give_up;
unuse_pack(&w_curs);
return;
@@ -1535,13 +1542,15 @@ static void check_object(struct object_entry *entry)
unuse_pack(&w_curs);
}
- oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &entry->size));
+ oe_set_type(entry, sha1_object_info(entry->idx.oid.hash, &size));
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
* as a preferred base. Doing so can result in a larger
* pack file, but the transfer will still take place.
*/
+ if (entry->type_valid)
+ oe_set_size(entry, size);
}
static int pack_offset_sort(const void *_a, const void *_b)
@@ -1581,6 +1590,7 @@ static void drop_reused_delta(struct object_entry *entry)
unsigned *idx = &to_pack.objects[entry->delta_idx - 1].delta_child_idx;
struct object_info oi = OBJECT_INFO_INIT;
enum object_type type;
+ unsigned long size;
while (*idx) {
struct object_entry *oe = &to_pack.objects[*idx - 1];
@@ -1593,7 +1603,7 @@ static void drop_reused_delta(struct object_entry *entry)
SET_DELTA(entry, NULL);
entry->depth = 0;
- oi.sizep = &entry->size;
+ oi.sizep = &size;
oi.typep = &type;
if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
@@ -1603,10 +1613,11 @@ static void drop_reused_delta(struct object_entry *entry)
* and dealt with in prepare_pack().
*/
oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
- &entry->size));
+ &size));
} else {
oe_set_type(entry, type);
}
+ oe_set_size(entry, size);
}
/*
@@ -1746,7 +1757,7 @@ static void get_object_details(void)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
- if (big_file_threshold < entry->size)
+ if (oe_size_greater_than(entry, big_file_threshold))
entry->no_try_delta = 1;
}
@@ -1775,6 +1786,8 @@ static int type_size_sort(const void *_a, const void *_b)
const struct object_entry *b = *(struct object_entry **)_b;
enum object_type a_type = oe_type(a);
enum object_type b_type = oe_type(b);
+ unsigned long a_size = oe_size(a);
+ unsigned long b_size = oe_size(b);
if (a_type > b_type)
return -1;
@@ -1788,9 +1801,9 @@ static int type_size_sort(const void *_a, const void *_b)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
- if (a->size > b->size)
+ if (a_size > b_size)
return -1;
- if (a->size < b->size)
+ if (a_size < b_size)
return 1;
return a < b ? -1 : (a > b); /* newest first */
}
@@ -1877,7 +1890,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
return 0;
/* Now some size filtering heuristics. */
- trg_size = trg_entry->size;
+ trg_size = oe_size(trg_entry);
if (!DELTA(trg_entry)) {
max_size = trg_size/2 - 20;
ref_depth = 1;
@@ -1889,7 +1902,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
(max_depth - ref_depth + 1);
if (max_size == 0)
return 0;
- src_size = src_entry->size;
+ src_size = oe_size(src_entry);
sizediff = src_size < trg_size ? trg_size - src_size : 0;
if (sizediff >= max_size)
return 0;
@@ -2009,7 +2022,7 @@ static unsigned long free_unpacked(struct unpacked *n)
free_delta_index(n->index);
n->index = NULL;
if (n->data) {
- freed_mem += n->entry->size;
+ freed_mem += oe_size(n->entry);
FREE_AND_NULL(n->data);
}
n->entry = NULL;
@@ -2459,7 +2472,7 @@ static void prepare_pack(int window, int depth)
*/
continue;
- if (entry->size < 50)
+ if (oe_size_less_than(entry, 50))
continue;
if (entry->no_try_delta)
diff --git a/pack-objects.h b/pack-objects.h
index c12219385a..0beedbc637 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -70,7 +70,9 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- unsigned long size; /* uncompressed size */
+ /* object uncompressed size _if_ size_valid is true */
+ uint32_t size_;
+ unsigned size_valid:1;
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
@@ -258,4 +260,58 @@ static inline void oe_set_delta_sibling(struct packing_data *pack,
e->delta_sibling_idx = 0;
}
+static inline unsigned long oe_size(const struct object_entry *e)
+{
+ if (e->size_valid) {
+ return e->size_;
+ } else {
+ unsigned long size;
+
+ sha1_object_info(e->idx.oid.hash, &size);
+ return size;
+ }
+}
+
+static inline int oe_fits_in_32bits(unsigned long limit)
+{
+ uint32_t truncated_limit = (uint32_t)limit;
+
+ return limit == truncated_limit;
+}
+
+static inline int oe_size_less_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ < limit;
+ if (oe_fits_in_32bits(limit)) /* limit < 2^32 <= size ? */
+ return 0;
+ return oe_size(e) < limit;
+}
+
+static inline int oe_size_greater_than(const struct object_entry *e,
+ unsigned long limit)
+{
+ if (e->size_valid)
+ return e->size_ > limit;
+ if (oe_fits_in_32bits(limit)) /* limit < 2^32 <= size ? */
+ return 1;
+ return oe_size(e) > limit;
+}
+
+static inline void oe_set_size(struct object_entry *e,
+ unsigned long size)
+{
+ e->size_ = size;
+ e->size_valid = e->size_ == size;
+
+ if (!e->size_valid) {
+ unsigned long real_size;
+
+ if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
+ size != real_size)
+ die("BUG: 'size' is supposed to be the object size!");
+ }
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-18 14:25 ` [PATCH v6 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:49 ` Ævar Arnfjörð Bjarmason
2018-03-19 16:19 ` Junio C Hamano
2018-03-19 16:43 ` Junio C Hamano
2 siblings, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-18 14:49 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: e, git, gitster, peff
On Sun, Mar 18 2018, Nguyễn Thái Ngọc Duy jotted:
> It's very very rare that an uncompressedd object is larger than 4GB
So this went from a typo of "uncompressd" in v5 to "uncompressedd",
needs one less "d": "uncompressed".
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-18 14:25 ` [PATCH v6 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
2018-03-18 14:49 ` Ævar Arnfjörð Bjarmason
@ 2018-03-19 16:19 ` Junio C Hamano
2018-03-19 16:23 ` Duy Nguyen
2018-03-19 16:43 ` Junio C Hamano
2 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-19 16:19 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> +static inline int oe_fits_in_32bits(unsigned long limit)
> +{
> + uint32_t truncated_limit = (uint32_t)limit;
> +
> + return limit == truncated_limit;
> +}
I do not think it is worth a reroll (there only are a few
callsites), but the above has nothing to do with "oe" fitting
anything (it is about "limit"). Do you mind if I did this instead?
static inline int fits_in_32bits(unsigned long size)
... or other suggestions, perhaps?
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 16:19 ` Junio C Hamano
@ 2018-03-19 16:23 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-19 16:23 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Mon, Mar 19, 2018 at 5:19 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> +static inline int oe_fits_in_32bits(unsigned long limit)
>> +{
>> + uint32_t truncated_limit = (uint32_t)limit;
>> +
>> + return limit == truncated_limit;
>> +}
>
> I do not think it is worth a reroll (there only are a few
> callsites), but the above has nothing to do with "oe" fitting
> anything (it is about "limit"). Do you mind if I did this instead?
>
> static inline int fits_in_32bits(unsigned long size)
>
> ... or other suggestions, perhaps?
>
I just tried to not pollute the general namespace too much. That works
for me too.
--
Duy
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-18 14:25 ` [PATCH v6 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
2018-03-18 14:49 ` Ævar Arnfjörð Bjarmason
2018-03-19 16:19 ` Junio C Hamano
@ 2018-03-19 16:43 ` Junio C Hamano
2018-03-19 16:54 ` Duy Nguyen
2018-03-20 18:17 ` Duy Nguyen
2 siblings, 2 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-19 16:43 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, peff
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> +static inline void oe_set_size(struct object_entry *e,
> + unsigned long size)
> +{
> + e->size_ = size;
> + e->size_valid = e->size_ == size;
A quite similar comment as my earlier one applies here. I wonder if
this is easier to read?
e->size_valid = fits_in_32bits(size);
if (e->size_valid)
e->size_ = size;
Stepping back a bit in a different tangent,
- fits_in_32bits() is a good public name if the helper is about
seeing if the given quantity fits in 32bit uint,
- but that carves it in stone that our e->size_ *will* be 32bit
forever, which is not good.
So, it may be a good idea to call it size_cacheable_in_oe(size) or
something to ask "I have this 'size'; is it small enough to fit in
the field in the oe, i.e. allow us to cache it, as opposed to having
to go back to the object every time?" Of course, this would declare
that the helper can only be used for that particular field, but that
is sort of the point of such a change, to allow us to later define
the e->size_ field to different sizes without affecting other stuff.
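A rough sketch of what such a purpose-named helper could look like, assuming a single constant controls the width of the cached field. SKETCH_SIZE_BITS, struct sketch_entry and sketch_set_size() are hypothetical names; only size_cacheable_in_oe() comes from the suggestion above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SIZE_BITS 32     /* hypothetical knob for the cached width */

struct sketch_entry {
	uint32_t size_;         /* cached size, meaningful only if size_valid */
	unsigned size_valid:1;
};

/* "Is this size small enough to be cached in the entry?" */
static int size_cacheable_in_oe(uint64_t size)
{
	return size < ((uint64_t)1 << SKETCH_SIZE_BITS);
}

static void sketch_set_size(struct sketch_entry *e, uint64_t size)
{
	/* The storage is a uint32_t, so the knob may shrink but not grow. */
	assert(SKETCH_SIZE_BITS <= 32);

	e->size_valid = size_cacheable_in_oe(size);
	if (e->size_valid)
		e->size_ = (uint32_t)size;
}
```

With this shape, callers never mention "32 bits"; changing the width touches the constant and the field declaration only.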
> + if (!e->size_valid) {
> + unsigned long real_size;
> +
> + if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
> + size != real_size)
> + die("BUG: 'size' is supposed to be the object size!");
> + }
If an object that is smaller than 4GB is fed to this function with
an incorrect size, we happily record it in e->size_ and declare it
is valid. Wouldn't that be equally grave error as we are catching
in this block?
> +}
> +
> #endif
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 16:43 ` Junio C Hamano
@ 2018-03-19 16:54 ` Duy Nguyen
2018-03-19 18:29 ` Junio C Hamano
2018-03-20 18:17 ` Duy Nguyen
1 sibling, 1 reply; 273+ messages in thread
From: Duy Nguyen @ 2018-03-19 16:54 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Mon, Mar 19, 2018 at 5:43 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> +static inline void oe_set_size(struct object_entry *e,
>> + unsigned long size)
>> +{
>> + e->size_ = size;
>> + e->size_valid = e->size_ == size;
>
> A quite similar comment as my earlier one applies here. I wonder if
> this is easier to read?
>
> e->size_valid = fits_in_32bits(size);
> if (e->size_valid)
> e->size_ = size;
>
> Stepping back a bit in a different tangent,
>
> - fits_in_32bits() is a good public name if the helper is about
> seeing if the given quantity fits in 32bit uint,
>
> - but that carves it in stone that our e->size_ *will* be 32bit
> forever, which is not good.
>
> So, it may be a good idea to call it size_cacheable_in_oe(size) or
> something to ask "I have this 'size'; is it small enough to fit in
> the field in the oe, i.e. allow us to cache it, as opposed to having
> to go back to the object every time?" Of course, this would declare
> that the helper can only be used for that particular field, but that
> is sort of the point of such a change, to allow us to later define
> the e->size_ field to different sizes without affecting other stuff.
This is why I do "size_valid = size_ == size". In my private build, I
reduced size_ to less than 32 bits and change the "fits_in_32bits"
function to do something like
int fits_in_32bits(unsigned long size)
{
struct object_entry e;
e.size_ = size;
return e.size_ == size;
}
which makes sure it always works. This spreads the use of "valid = xx
== yy" in more places though. I think if we just limit the use of
this expression in a couple of access wrappers then it's not so bad.
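A minimal sketch of that round-trip check, assuming the cached size is an unsigned bit-field. SKETCH_BITS, struct narrow_entry and fits_in_size_field() are hypothetical names, and the width of 20 bits is chosen only to make truncation easy to trigger.

```c
#include <assert.h>

#define SKETCH_BITS 20  /* hypothetical narrow width, for illustration only */

struct narrow_entry {
	unsigned size_:SKETCH_BITS;
	unsigned size_valid:1;
};

/*
 * Store the value in the field and read it back. Storing into an
 * unsigned bit-field keeps only the low SKETCH_BITS bits, so the
 * comparison fails exactly when the value does not fit.
 */
static int fits_in_size_field(unsigned long size)
{
	struct narrow_entry e;

	e.size_ = size;
	return e.size_ == size;
}

static void narrow_entry_demo(void)
{
	assert(fits_in_size_field((1UL << SKETCH_BITS) - 1));
	assert(!fits_in_size_field(1UL << SKETCH_BITS));
}
```

Because the check goes through the field itself, it keeps working when SKETCH_BITS changes, at the cost of hiding the actual width from the reader.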
>> + if (!e->size_valid) {
>> + unsigned long real_size;
>> +
>> + if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>> + size != real_size)
>> + die("BUG: 'size' is supposed to be the object size!");
>> + }
>
> If an object that is smaller than 4GB is fed to this function with
> an incorrect size, we happily record it in e->size_ and declare it
> is valid. Wouldn't that be equally grave error as we are catching
> in this block?
That adds an extra sha1_object_info() to all objects and it's
expensive (I think it's one of the reasons we cache values in
object_entry in the first place). I think there are also a few
occasions we reuse even bad in-pack objects (there are even tests for
that) so it's not always safe to die() here.
--
Duy
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 16:54 ` Duy Nguyen
@ 2018-03-19 18:29 ` Junio C Hamano
2018-03-19 18:45 ` Duy Nguyen
0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-19 18:29 UTC (permalink / raw)
To: Duy Nguyen
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
Duy Nguyen <pclouds@gmail.com> writes:
> This is why I do "size_valid = size_ == size". In my private build, I
> reduced size_ to less than 32 bits and change the "fits_in_32bits"
> function to do something like
>
> int fits_in_32bits(unsigned long size)
> {
> struct object_entry e;
> e.size_ = size;
> return e.size_ == size;
> }
>
> which makes sure it always works. This spreads the use of "valid = xx
> == yy" in more places though. I think if we just limit the use of
> this expression in a couple of access wrappers then it's not so bad.
Yes, but then we should name the helper so that it is clear that it
is not about 32-bit but is about the width of e.size_ field.
>
>>> + if (!e->size_valid) {
>>> + unsigned long real_size;
>>> +
>>> + if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>>> + size != real_size)
>>> + die("BUG: 'size' is supposed to be the object size!");
>>> + }
>>
>> If an object that is smaller than 4GB is fed to this function with
>> an incorrect size, we happily record it in e->size_ and declare it
>> is valid. Wouldn't that be equally grave error as we are catching
>> in this block?
>
> That adds an extra sha1_object_info() to all objects and it's
> expensive (I think it's one of the reasons we cache values in
> object_entry in the first place). I think there are also a few
> occasions we reuse even bad in-pack objects (there are even tests for
> that) so it's not always safe to die() here.
So what? My point is that I do not see the point in checking if the
size is correct on only one side (i.e. size is too big to fit in
e->size_) and not the other. If it is worth checking (perhaps under
"#ifndef NDEBUG" or some other debug option?) then I'd think we
should spend cycles for all objects and check.
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 18:29 ` Junio C Hamano
@ 2018-03-19 18:45 ` Duy Nguyen
2018-03-19 20:10 ` Junio C Hamano
0 siblings, 1 reply; 273+ messages in thread
From: Duy Nguyen @ 2018-03-19 18:45 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Mon, Mar 19, 2018 at 7:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>> + if (!e->size_valid) {
>>>> + unsigned long real_size;
>>>> +
>>>> + if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>>>> + size != real_size)
>>>> + die("BUG: 'size' is supposed to be the object size!");
>>>> + }
>>>
>>> If an object that is smaller than 4GB is fed to this function with
>>> an incorrect size, we happily record it in e->size_ and declare it
>>> is valid. Wouldn't that be equally grave error as we are catching
>>> in this block?
>>
>> That adds an extra sha1_object_info() to all objects and it's
>> expensive (I think it's one of the reasons we cache values in
>> object_entry in the first place). I think there are also a few
>> occasions we reuse even bad in-pack objects (there are even tests for
>> that) so it's not always safe to die() here.
>
> So what? My point is that I do not see the point in checking if the
> size is correct on only one side (i.e. size is too big to fit in
> e->size_) and not the other. If it is worth checking (perhaps under
> "#ifndef NDEBUG" or some other debug option?) then I'd think we
> should spend cycles for all objects and check.
There is a difference. For sizes smaller than 2^32, whatever you pass
to oe_set_size() will be returned by oe_size(), consistently. It does
not matter if this size is "good" or not. With sizes > 2^32, we make
the assumption that this size must be the same as one found in the
object database. If it's different, oe_size() will return something
other than what oe_set_size() was given. This check here is to make
sure we do not accidentally let the caller fall into this trap.
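The asymmetry can be sketched as follows, assuming a simulated object database. struct db_entry, db_set_size() and db_size() are hypothetical, and db_set_size() returns -1 where the real code would die().

```c
#include <assert.h>
#include <stdint.h>

struct db_entry {
	uint64_t db_size;       /* what the simulated object database reports */
	uint32_t size_;
	unsigned size_valid:1;
};

/* Returns 0 on success, -1 where the real code would die("BUG: ..."). */
static int db_set_size(struct db_entry *e, uint64_t size)
{
	e->size_ = (uint32_t)size;
	e->size_valid = e->size_ == size;
	if (!e->size_valid && size != e->db_size)
		return -1;      /* the getter could never return this value */
	return 0;
}

static uint64_t db_size(const struct db_entry *e)
{
	return e->size_valid ? e->size_ : e->db_size;
}

static void db_entry_demo(void)
{
	struct db_entry e = { .db_size = 100 };

	/* Small size: returned as given, even though the database says 100. */
	assert(db_set_size(&e, 42) == 0);
	assert(db_size(&e) == 42);

	/* Large size: must agree with the database, or the getter would lie. */
	e.db_size = (uint64_t)5 << 30;
	assert(db_set_size(&e, (uint64_t)6 << 30) == -1);
	assert(db_set_size(&e, (uint64_t)5 << 30) == 0);
	assert(db_size(&e) == ((uint64_t)5 << 30));
}
```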
Yes, it may be a good thing to check anyway even for sizes < 2^32. I'm
a bit uncomfortable doing that though. I was trying to exercise this
code the other day by reducing size_ field down to 4 bits, and a
couple tests broke but I still don't understand how. It's probably
just me pushing the limits too hard, not a bug in these changes. But
it does tell me that I don't understand pack-objects enough to assert
that "all calls to oe_set_size() give good size".
--
Duy
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 18:45 ` Duy Nguyen
@ 2018-03-19 20:10 ` Junio C Hamano
2018-03-20 18:08 ` Duy Nguyen
0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2018-03-19 20:10 UTC (permalink / raw)
To: Duy Nguyen
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
Duy Nguyen <pclouds@gmail.com> writes:
> There is a difference. For sizes smaller than 2^32, whatever you
> pass to oe_set_size() will be returned by oe_size(),
> consistently. It does not matter if this size is "good" .... If
> it's different, oe_size() will return something other than what
> oe_set_size() was given.
OK, fair enough.
> ... I was trying to exercise this
> code the other day by reducing size_ field down to 4 bits, and a
> couple tests broke but I still don't understand how.
Off by one? Two or more copies of the same objects available whose
oe_size() are different?
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 20:10 ` Junio C Hamano
@ 2018-03-20 18:08 ` Duy Nguyen
2018-03-20 18:22 ` Junio C Hamano
2018-03-21 8:03 ` Jeff King
0 siblings, 2 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-20 18:08 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Mon, Mar 19, 2018 at 01:10:49PM -0700, Junio C Hamano wrote:
> > ... I was trying to exercise this
> > code the other day by reducing size_ field down to 4 bits, and a
> > couple tests broke but I still don't understand how.
>
> Off by one? Two or more copies of the same objects available whose
> oe_size() are different?
>
No. I did indeed not understand pack-objects enough :)
This "size" field contains the delta size if the in-pack object is a
delta. So blindly falling back to sha1_object_info() which returns the
canonical object size is definitely wrong. Please eject the series
from 'pu' until I fix this. The bug won't likely affect anyone (since
they must have 4GB+ objects to trigger it) but better safe than sorry.
BTW can you apply this patch? This broken && chain made me think the
problem was in the next test. It would have saved me lots of time if I
saw this "BUG" line coming from the previous test.
-- 8< --
Subject: [PATCH] t9300: fix broken && chain
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
t/t9300-fast-import.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index e4d06accc4..e2a0ae4075 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -348,7 +348,7 @@ test_expect_success 'B: accept branch name "TEMP_TAG"' '
INPUT_END
test_when_finished "rm -f .git/TEMP_TAG
- git gc
+ git gc &&
git prune" &&
git fast-import <input &&
test -f .git/TEMP_TAG &&
@@ -365,7 +365,7 @@ test_expect_success 'B: accept empty committer' '
INPUT_END
test_when_finished "git update-ref -d refs/heads/empty-committer-1
- git gc
+ git gc &&
git prune" &&
git fast-import <input &&
out=$(git fsck) &&
--
2.17.0.rc0.348.gd5a49e0b6f
-- 8< --
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-20 18:08 ` Duy Nguyen
@ 2018-03-20 18:22 ` Junio C Hamano
2018-03-21 8:03 ` Jeff King
1 sibling, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-20 18:22 UTC (permalink / raw)
To: Duy Nguyen
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
Duy Nguyen <pclouds@gmail.com> writes:
> This "size" field contains the delta size if the in-pack object is a
> delta. So blindly falling back to sha1_object_info() which returns the
> canonical object size is definitely wrong.
Yup. Also we need to be careful when going back to the packfile to
read the size in question. A different packfile that has the same
object may have delta that was constructed differently and of wrong
size.
> Please eject the series
> from 'pu' until I fix this. The bug won't likely affect anyone (since
> they must have 4GB+ objects to trigger it) but better safe than sorry.
> BTW can you apply this patch? This broken && chain made me think the
> problem was in the next test. It would have saved me lots of time if I
> saw this "BUG" line coming from the previous test.
Thanks, will do.
>
> -- 8< --
> Subject: [PATCH] t9300: fix broken && chain
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> t/t9300-fast-import.sh | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index e4d06accc4..e2a0ae4075 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -348,7 +348,7 @@ test_expect_success 'B: accept branch name "TEMP_TAG"' '
> INPUT_END
>
> test_when_finished "rm -f .git/TEMP_TAG
> - git gc
> + git gc &&
> git prune" &&
> git fast-import <input &&
> test -f .git/TEMP_TAG &&
> @@ -365,7 +365,7 @@ test_expect_success 'B: accept empty committer' '
> INPUT_END
>
> test_when_finished "git update-ref -d refs/heads/empty-committer-1
> - git gc
> + git gc &&
> git prune" &&
> git fast-import <input &&
> out=$(git fsck) &&
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-20 18:08 ` Duy Nguyen
2018-03-20 18:22 ` Junio C Hamano
@ 2018-03-21 8:03 ` Jeff King
2018-03-21 16:12 ` Duy Nguyen
1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2018-03-21 8:03 UTC (permalink / raw)
To: Duy Nguyen
Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
Eric Wong, Git Mailing List
On Tue, Mar 20, 2018 at 07:08:07PM +0100, Duy Nguyen wrote:
> BTW can you apply this patch? This broken && chain made me think the
> problem was in the next test. It would have saved me lots of time if I
> saw this "BUG" line coming from the previous test.
>
> -- 8< --
> Subject: [PATCH] t9300: fix broken && chain
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> t/t9300-fast-import.sh | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index e4d06accc4..e2a0ae4075 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -348,7 +348,7 @@ test_expect_success 'B: accept branch name "TEMP_TAG"' '
> INPUT_END
>
> test_when_finished "rm -f .git/TEMP_TAG
> - git gc
> + git gc &&
> git prune" &&
The &&-chain is broken from the first command, too. It's "rm -f", which
is not that big a deal, but...
> @@ -365,7 +365,7 @@ test_expect_success 'B: accept empty committer' '
> INPUT_END
>
> test_when_finished "git update-ref -d refs/heads/empty-committer-1
> - git gc
> + git gc &&
> git prune" &&
Same here, but we probably care more about noticing update-ref failure.
-Peff
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-21 8:03 ` Jeff King
@ 2018-03-21 16:12 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-21 16:12 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
Eric Wong, Git Mailing List
On Wed, Mar 21, 2018 at 9:03 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Mar 20, 2018 at 07:08:07PM +0100, Duy Nguyen wrote:
>
>> BTW can you apply this patch? This broken && chain made me think the
>> problem was in the next test. It would have saved me lots of time if I
>> saw this "BUG" line coming from the previous test.
>>
>> -- 8< --
>> Subject: [PATCH] t9300: fix broken && chain
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>> t/t9300-fast-import.sh | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
>> index e4d06accc4..e2a0ae4075 100755
>> --- a/t/t9300-fast-import.sh
>> +++ b/t/t9300-fast-import.sh
>> @@ -348,7 +348,7 @@ test_expect_success 'B: accept branch name "TEMP_TAG"' '
>> INPUT_END
>>
>> test_when_finished "rm -f .git/TEMP_TAG
>> - git gc
>> + git gc &&
>> git prune" &&
>
> The &&-chain is broken from the first command, too. It's "rm -f", which
> is not that big a deal, but...
>
>> @@ -365,7 +365,7 @@ test_expect_success 'B: accept empty committer' '
>> INPUT_END
>>
>> test_when_finished "git update-ref -d refs/heads/empty-committer-1
>> - git gc
>> + git gc &&
>> git prune" &&
>
> Same here, but we probably care more about noticing update-ref failure.
Yes. I wasn't sure if that update-ref could fail but did not check
since this was a side issue for me.
--
Duy
* Re: [PATCH v6 09/11] pack-objects: shrink size field in struct object_entry
2018-03-19 16:43 ` Junio C Hamano
2018-03-19 16:54 ` Duy Nguyen
@ 2018-03-20 18:17 ` Duy Nguyen
1 sibling, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-20 18:17 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Jeff King
On Mon, Mar 19, 2018 at 5:43 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> +static inline void oe_set_size(struct object_entry *e,
>> + unsigned long size)
>> +{
>> + e->size_ = size;
>> + e->size_valid = e->size_ == size;
>
> A quite similar comment as my earlier one applies here. I wonder if
> this is easier to read?
>
> e->size_valid = fits_in_32bits(size);
> if (e->size_valid)
> e->size_ = size;
I wonder if wrapping this "==" with something like this would help readability?
#define truncated(a,b) ((a) != (b))
Then we could write
e->size_valid = !truncated(e->size_, size);
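A small sketch of how this could fit together, assuming a 32-bit cached field. struct trunc_entry and trunc_set_size() are hypothetical. A macro like this needs its outer parentheses: without them, "!truncated(a, b)" would expand to "!(a) != (b)", which negates only the first operand.

```c
#include <assert.h>
#include <stdint.h>

/* Outer parentheses keep "!truncated(a, b)" from parsing as "(!a) != b". */
#define truncated(a, b) ((a) != (b))

struct trunc_entry {
	uint32_t size_;
	unsigned size_valid:1;
};

static void trunc_set_size(struct trunc_entry *e, uint64_t size)
{
	e->size_ = (uint32_t)size;
	e->size_valid = !truncated(e->size_, size);
}

static void truncated_demo(void)
{
	struct trunc_entry e;

	trunc_set_size(&e, 7);
	assert(e.size_valid);

	trunc_set_size(&e, (uint64_t)1 << 32);  /* low 32 bits are all zero */
	assert(!e.size_valid);
}
```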
--
Duy
* [PATCH v6 10/11] pack-objects: shrink delta_size field in struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (8 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 09/11] pack-objects: shrink size " Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:25 ` [PATCH v6 11/11] pack-objects: reorder members to shrink " Nguyễn Thái Ngọc Duy
` (3 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Allowing a delta size of 64 bits is crazy. Shrink this field down to
31 bits with one overflow bit.
If we find an existing delta larger than 2GB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
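A minimal sketch of the scheme described above, not the code in this patch: a 31-bit cached delta size with a validity bit, falling back to the object size when the delta size does not fit. All names here (struct delta_entry, object_size, the delta_* helpers, SKETCH_DELTA_SIZE_BITS) are hypothetical, and object_size stands in for whatever oe_size() would return.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DELTA_SIZE_BITS 31
/* 1U, because 1 << 31 would overflow a 32-bit int. */
#define SKETCH_DELTA_SIZE_LIMIT (1U << SKETCH_DELTA_SIZE_BITS)

struct delta_entry {
	uint64_t object_size;   /* stands in for the oe_size() result */
	unsigned delta_size_:SKETCH_DELTA_SIZE_BITS;
	unsigned delta_size_valid:1;
};

static void delta_set_size(struct delta_entry *e, uint64_t size)
{
	e->delta_size_valid = size < SKETCH_DELTA_SIZE_LIMIT;
	if (e->delta_size_valid)
		e->delta_size_ = (unsigned)size;
	else
		/* The fallback below only works if the two sizes agree. */
		assert(size == e->object_size);
}

static uint64_t delta_get_size(const struct delta_entry *e)
{
	return e->delta_size_valid ? e->delta_size_ : e->object_size;
}
```

The two bit-fields together occupy 32 bits, which is the point of choosing 31 rather than 32 for the cached width.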
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 24 ++++++++++++++----------
pack-objects.h | 23 ++++++++++++++++++++++-
2 files changed, 36 insertions(+), 11 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 89ed4b5125..4406af640f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,10 +30,12 @@
#include "packfile.h"
#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
+#define DELTA_SIZE(obj) oe_delta_size(&to_pack, obj)
#define DELTA(obj) oe_delta(&to_pack, obj)
#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
+#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(&to_pack, obj, val)
#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
@@ -140,7 +142,7 @@ static void *get_delta(struct object_entry *entry)
oid_to_hex(&DELTA(entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
- if (!delta_buf || delta_size != entry->delta_size)
+ if (!delta_buf || delta_size != DELTA_SIZE(entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -291,14 +293,14 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
- size = entry->delta_size;
+ size = DELTA_SIZE(entry);
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -1509,7 +1511,7 @@ static void check_object(struct object_entry *entry)
*/
oe_set_type(entry, entry->in_pack_type);
SET_DELTA(entry, base_entry);
- entry->delta_size = oe_size(entry);
+ SET_DELTA_SIZE(entry, oe_size(entry));
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(&w_curs);
@@ -1895,7 +1897,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
- max_size = trg_entry->delta_size;
+ max_size = DELTA_SIZE(trg_entry);
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
@@ -1966,10 +1968,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
+ if (delta_size >= (1U << OE_DELTA_SIZE_BITS))
+ return 0;
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
- if (delta_size == trg_entry->delta_size &&
+ if (delta_size == DELTA_SIZE(trg_entry) &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
@@ -1984,7 +1988,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
- delta_cache_size -= trg_entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(trg_entry);
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
@@ -1997,7 +2001,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
}
SET_DELTA(trg_entry, src_entry);
- trg_entry->delta_size = delta_size;
+ SET_DELTA_SIZE(trg_entry, delta_size);
trg->depth = src->depth + 1;
return 1;
@@ -2120,11 +2124,11 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
if (entry->delta_data && !pack_to_stdout) {
unsigned long size;
- size = do_compress(&entry->delta_data, entry->delta_size);
+ size = do_compress(&entry->delta_data, DELTA_SIZE(entry));
if (size < (1 << OE_Z_DELTA_BITS)) {
entry->z_delta_size = size;
cache_lock();
- delta_cache_size -= entry->delta_size;
+ delta_cache_size -= DELTA_SIZE(entry);
delta_cache_size += entry->z_delta_size;
cache_unlock();
} else {
diff --git a/pack-objects.h b/pack-objects.h
index 0beedbc637..cbd5cf61ca 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -5,6 +5,7 @@
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
#define OE_Z_DELTA_BITS 16
+#define OE_DELTA_SIZE_BITS 31
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -81,7 +82,8 @@ struct object_entry {
* uses the same base as me
*/
void *delta_data; /* cached delta (uncompressed) */
- unsigned long delta_size; /* delta data size (uncompressed) */
+ uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+ uint32_t delta_size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
@@ -314,4 +316,23 @@ static inline void oe_set_size(struct object_entry *e,
}
}
+static inline unsigned long oe_delta_size(struct packing_data *pack,
+ const struct object_entry *e)
+{
+ if (e->delta_size_valid)
+ return e->delta_size_;
+ return oe_size(e);
+}
+
+static inline void oe_set_delta_size(struct packing_data *pack,
+ struct object_entry *e,
+ unsigned long size)
+{
+ e->delta_size_ = size;
+ e->delta_size_valid = e->delta_size_ == size;
+ if (!e->delta_size_valid && size != oe_size(e))
+ die("BUG: this can only happen in check_object() "
+ "where delta size is the same as entry size");
+}
+
#endif
--
2.17.0.rc0.347.gf9cf61673a
^ permalink raw reply related [flat|nested] 273+ messages in thread
* [PATCH v6 11/11] pack-objects: reorder members to shrink struct object_entry
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (9 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 10/11] pack-objects: shrink delta_size " Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:25 ` Nguyễn Thái Ngọc Duy
2018-03-18 14:51 ` [PATCH v6 00/11] nd/pack-objects-pack-struct updates Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
13 siblings, 0 replies; 273+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-03-18 14:25 UTC (permalink / raw)
To: pclouds; +Cc: avarab, e, git, gitster, peff
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes, before any field shrinking is done) with 16 bits to
spare (and a couple more in in_pack_header_size when we really run out
of bits).
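
Why reordering alone saves space can be seen in a tiny standalone sketch. The two structs below are hypothetical and unrelated to the real object_entry; they hold the same members, and only the order differs. The exact sizes depend on the ABI, so the code compares them rather than hard-coding numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; neither is git's struct. */
struct scattered {
	uint8_t flag_a;   /* typically followed by padding up to pointer alignment */
	void *data;
	uint8_t flag_b;   /* typically followed by padding again */
	uint64_t offset;
	uint8_t flag_c;   /* typically followed by tail padding */
};

struct grouped {
	void *data;       /* widest members first */
	uint64_t offset;
	uint8_t flag_a;   /* small members share one aligned slot */
	uint8_t flag_b;
	uint8_t flag_c;
};

/* Bytes saved per element purely by changing member order. */
static size_t bytes_saved_by_grouping(void)
{
	assert(sizeof(struct grouped) <= sizeof(struct scattered));
	return sizeof(struct scattered) - sizeof(struct grouped);
}
```

A tool such as pahole reports the same holes and padding for a real struct without having to write such a comparison by hand.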
This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).
Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime
of repack has stayed the same throughout this series. Ævar's testing
on a big monorepo he has access to (bigger than linux-2.6.git) has
shown a 7.9% reduction, so the overall expected improvement should be
somewhere around 8%.
See 87po42cwql.fsf@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/) for
more detailed numbers and a test script used to produce the numbers
cited above.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
pack-objects.h | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/pack-objects.h b/pack-objects.h
index cbd5cf61ca..af40211105 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -71,35 +71,36 @@ enum dfs_state {
*/
struct object_entry {
struct pack_idx_entry idx;
- /* object uncompressed size _if_ size_valid is true */
- uint32_t size_;
- unsigned size_valid:1;
- unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+ void *delta_data; /* cached delta (uncompressed) */
off_t in_pack_offset;
+ uint32_t hash; /* name hint hash */
+ uint32_t size_; /* object uncompressed size _if_ size_valid is true */
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who
* uses the same base as me
*/
- void *delta_data; /* cached delta (uncompressed) */
uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
uint32_t delta_size_valid:1;
+ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+ unsigned size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
+ unsigned type_valid:1;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
- unsigned type_valid:1;
- uint32_t hash; /* name hint hash */
- unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
unsigned no_try_delta:1;
+ unsigned char in_pack_header_size;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
unsigned depth:OE_DEPTH_BITS;
+
+ /* size: 80, bit_padding: 16 bits */
};
struct packing_data {
--
2.17.0.rc0.347.gf9cf61673a
^ permalink raw reply related [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (10 preceding siblings ...)
2018-03-18 14:25 ` [PATCH v6 11/11] pack-objects: reorder members to shrink " Nguyễn Thái Ngọc Duy
@ 2018-03-18 14:51 ` Ævar Arnfjörð Bjarmason
2018-03-21 8:24 ` Jeff King
2018-03-24 6:33 ` [PATCH v7 00/13] " Nguyễn Thái Ngọc Duy
13 siblings, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-18 14:51 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: e, git, gitster, peff
On Sun, Mar 18 2018, Nguyễn Thái Ngọc Duy jotted:
> v6 fixes the one optimization that I just couldn't get right, fixes
> two off-by-one error messages and a couple of commit message updates
> (biggest change is in 11/11 to record some numbers from AEvar)
Thanks, aside from the minor typo I just noted in
https://public-inbox.org/git/878tapcucc.fsf@evledraar.gmail.com/ (which
I trust Junio can fix up) this all looks good to me.
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-18 14:25 ` [PATCH v6 " Nguyễn Thái Ngọc Duy
` (11 preceding siblings ...)
2018-03-18 14:51 ` [PATCH v6 00/11] nd/pack-objects-pack-struct updates Ævar Arnfjörð Bjarmason
@ 2018-03-21 8:24 ` Jeff King
2018-03-21 15:59 ` Duy Nguyen
2018-03-21 16:31 ` Ævar Arnfjörð Bjarmason
2018-03-24 6:33 ` [PATCH v7 00/13] " Nguyễn Thái Ngọc Duy
13 siblings, 2 replies; 273+ messages in thread
From: Jeff King @ 2018-03-21 8:24 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: avarab, e, git, gitster
On Sun, Mar 18, 2018 at 03:25:15PM +0100, Nguyễn Thái Ngọc Duy wrote:
> v6 fixes the one optimization that I just couldn't get right, fixes
> two off-by-one error messages and a couple of commit message updates
> (biggest change is in 11/11 to record some numbers from AEvar)
I was traveling during some of the earlier rounds, so I finally got a
chance to take a look at this.
I hate to be a wet blanket, but am I the only one who is wondering
whether the tradeoffs are worth it? 8% memory reduction doesn't seem
mind-bogglingly good, and I'm concerned about two things:
1. The resulting code is harder to read and reason about (things like
the DELTA() macros), and seems a lot more brittle (things like the
new size_valid checks).
2. There are lots of new limits. Some of these are probably fine
(e.g., the cacheable delta size), but things like the
number-of-packs limit don't have very good user-facing behavior.
Yes, having that many packs is insane, but that's going to be small
consolation to somebody whose automated maintenance program now
craps out at 16k packs, when it previously would have just worked
to fix the situation.
Saving 8% is nice, but the number of objects in linux.git grew over 12%
in the last year. So you've bought yourself 8 months before the problem
is back. Is it worth making these changes that we'll have to deal with
for many years to buy 8 months of memory savings?
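
The "8 months" figure is a back-of-the-envelope estimate, and it can be reproduced with a linear approximation. The helper below is a hypothetical sketch: it assumes memory use grows in step with object count and ignores compounding, which is an assumption of the sketch and is not stated in the thread.

```c
#include <assert.h>

/*
 * Months of growth that a one-off saving absorbs, assuming memory grows
 * linearly with object count (a simplifying assumption of this sketch).
 * Both arguments are fractions, e.g. 0.08 for 8%.
 */
static double months_until_saving_is_consumed(double saving, double yearly_growth)
{
	assert(yearly_growth > 0.0);
	return 12.0 * saving / yearly_growth;
}
```

With a saving of 0.08 and a yearly growth of 0.12 this gives roughly 8 months, matching the estimate above.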
I think ultimately to work on low-memory machines we'll need a
fundamentally different approach that scales with the objects since the
last pack, and not with the complete history.
-Peff
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-21 8:24 ` Jeff King
@ 2018-03-21 15:59 ` Duy Nguyen
2018-03-21 16:17 ` Ævar Arnfjörð Bjarmason
` (2 more replies)
2018-03-21 16:31 ` Ævar Arnfjörð Bjarmason
1 sibling, 3 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-21 15:59 UTC (permalink / raw)
To: Jeff King
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Junio C Hamano
On Wed, Mar 21, 2018 at 9:24 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Mar 18, 2018 at 03:25:15PM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> v6 fixes the one optimization that I just couldn't get right, fixes
>> two off-by-one error messages and a couple of commit message updates
>> (biggest change is in 11/11 to record some numbers from AEvar)
>
> I was traveling during some of the earlier rounds, so I finally got a
> chance to take a look at this.
>
> I hate to be a wet blanket, but am I the only one who is wondering
> whether the tradeoffs are worth it? 8% memory reduction doesn't seem
> mind-bogglingly good,
AEvar measured RSS. If we count objects[] array alone, the saving is
40% (136 bytes per entry down to 80). Some is probably eaten up by
mmap in rss.
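
The per-entry figure follows from the two struct sizes alone: (136 - 80) / 136 is about 41%, which is rounded to 40% here. A minimal sketch of that arithmetic, using a hypothetical helper name:

```c
#include <assert.h>

/* Percentage saved when a per-entry cost drops from `before` to `after`. */
static double percent_saved(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

Calling percent_saved(136, 80) covers only the objects[] array; it says nothing about the rest of the process's memory, which is why the RSS figure is so much smaller.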
> and I'm concerned about two things:
>
> 1. The resulting code is harder to read and reason about (things like
> the DELTA() macros), and seems a lot more brittle (things like the
> new size_valid checks).
>
> 2. There are lots of new limits. Some of these are probably fine
> (e.g., the cacheable delta size), but things like the
> number-of-packs limit don't have very good user-facing behavior.
> Yes, having that many packs is insane, but that's going to be small
> consolation to somebody whose automated maintenance program now
> craps out at 16k packs, when it previously would have just worked
> to fix the situation.
>
> Saving 8% is nice, but the number of objects in linux.git grew over 12%
> in the last year. So you've bought yourself 8 months before the problem
> is back. Is it worth making these changes that we'll have to deal with
> for many years to buy 8 months of memory savings?
Well, with 40% it buys us a couple more months. The object growth
affects rev-list --all too so the actual "good months" is probably not
super far from 8 months.
Is it worth saving? I don't know. I raised the readability point from
the very first patch and if people believe it makes it much harder to
read, then no it's not worth it.
While pack-objects is simple from the functionality point of view, it
has received lots of optimizations and to me is quite fragile.
Readability does count in this code. Fortunately it still looks quite
ok to me with this series applied (but then it's subjective)
About the 16k limit (and some other limits as well), I'm making these
patches with the assumption that large scale deployment probably will
go with custom builds anyway. Adjusting the limits back should be
quite easy while we can still provide reasonable defaults for most
people.
> I think ultimately to work on low-memory machines we'll need a
> fundamentally different approach that scales with the objects since the
> last pack, and not with the complete history.
Absolutely. Which is covered in a separate "gc --auto" series. Some
memory reduction here may still be nice to have, though. Even on a beefy
machine, memory can still be reused somewhere else rather than wasted in
unused bits.
--
Duy
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-21 15:59 ` Duy Nguyen
@ 2018-03-21 16:17 ` Ævar Arnfjörð Bjarmason
2018-03-21 16:22 ` Duy Nguyen
2018-03-21 16:46 ` Duy Nguyen
2018-03-22 9:32 ` Jeff King
2 siblings, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-21 16:17 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Jeff King, Eric Wong, Git Mailing List, Junio C Hamano
On Wed, Mar 21 2018, Duy Nguyen wrote:
> On Wed, Mar 21, 2018 at 9:24 AM, Jeff King <peff@peff.net> wrote:
>> On Sun, Mar 18, 2018 at 03:25:15PM +0100, Nguyễn Thái Ngọc Duy wrote:
>>
>>> v6 fixes the one optimization that I just couldn't get right, fixes
>>> two off-by-one error messages and a couple of commit message updates
>>> (biggest change is in 11/11 to record some numbers from AEvar)
>>
>> I was traveling during some of the earlier rounds, so I finally got a
>> chance to take a look at this.
>>
>> I hate to be a wet blanket, but am I the only one who is wondering
>> whether the tradeoffs are worth it? 8% memory reduction doesn't seem
>> mind-bogglingly good,
>
> AEvar measured RSS. If we count objects[] array alone, the saving is
> 40% (136 bytes per entry down to 80). Some is probably eaten up by
> mmap in rss.
Yeah, sorry about spreading that confusion.
>> and I'm concerned about two things:
>>
>> 1. The resulting code is harder to read and reason about (things like
>> the DELTA() macros), and seems a lot more brittle (things like the
>> new size_valid checks).
>>
>> 2. There are lots of new limits. Some of these are probably fine
>> (e.g., the cacheable delta size), but things like the
>> number-of-packs limit don't have very good user-facing behavior.
>> Yes, having that many packs is insane, but that's going to be small
>> consolation to somebody whose automated maintenance program now
>> craps out at 16k packs, when it previously would have just worked
>> to fix the situation.
>>
>> Saving 8% is nice, but the number of objects in linux.git grew over 12%
>> in the last year. So you've bought yourself 8 months before the problem
>> is back. Is it worth making these changes that we'll have to deal with
>> for many years to buy 8 months of memory savings?
>
> Well, with 40% it buys us a couple more months. The object growth
> affects rev-list --all too so the actual "good months" is probably not
> super far from 8 months.
>
> Is it worth saving? I don't know. I raised the readability point from
> the very first patch and if people believe it makes it much harder to
> read, then no it's not worth it.
>
> While pack-objects is simple from the functionality point of view, it
> has received lots of optimizations and to me is quite fragile.
> Readability does count in this code. Fortunately it still looks quite
> ok to me with this series applied (but then it's subjective)
>
> About the 16k limit (and some other limits as well), I'm making these
> patches with the assumption that large scale deployment probably will
> go with custom builds anyway. Adjusting the limits back should be
> quite easy while we can still provide reasonable defaults for most
> people.
>
>> I think ultimately to work on low-memory machines we'll need a
>> fundamentally different approach that scales with the objects since the
>> last pack, and not with the complete history.
>
> Absolutely. Which is covered in a separate "gc --auto" series. Some
> memory reduction here may still be nice to have, though. Even on a beefy
> machine, memory can still be reused somewhere else rather than wasted in
> unused bits.
FWIW I've been running a combination of these two at work (also keeping
the big pack), and they've had a sizable impact on packing our monorepo,
on one of our dev boxes on a real-world checkout with a combo of the
"base" pack and other packs + loose objects, as measured by
/usr/bin/time
* Reduction in user time by 37%
* Reduction in system time by 84%
* Reduction in RSS by 61%
* Reduction in page faults by 58% & 94% (time(1) reports two different numbers)
* Reduction in the I of I/O by 58%
* Reduction in the O of I/O by 94%
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-21 16:17 ` Ævar Arnfjörð Bjarmason
@ 2018-03-21 16:22 ` Duy Nguyen
0 siblings, 0 replies; 273+ messages in thread
From: Duy Nguyen @ 2018-03-21 16:22 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Jeff King, Eric Wong, Git Mailing List, Junio C Hamano
On Wed, Mar 21, 2018 at 5:17 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>>> I think ultimately to work on low-memory machines we'll need a
>>> fundamentally different approach that scales with the objects since the
>>> last pack, and not with the complete history.
>>
>> Absolutely. Which is covered in a separate "gc --auto" series. Some
>> memory reduction here may still be nice to have, though. Even on a beefy
>> machine, memory can still be reused somewhere else rather than wasted in
>> unused bits.
>
> FWIW I've been running a combination of these two at work (also keeping
> the big pack), and they've had a sizable impact on packing our monorepo,
> on one of our dev boxes on a real-world checkout with a combo of the
> "base" pack and other packs + loose objects, as measured by
> /usr/bin/time
>
> * Reduction in user time by 37%
> * Reduction in system time by 84%
> * Reduction in RSS by 61%
> * Reduction in page faults by 58% & 94% (time(1) reports two different numbers)
> * Reduction in the I of I/O by 58%
> * Reduction in the O of I/O by 94%
The "keep the big pack" changes very likely contribute most of this
reduction, so just to be clear, these numbers can't be used as an
argument in favor of this pack-objects series. (But otherwise, wow! I
guess I need to finish up the gc series soon, then start the external
rev-list work to reduce even more ;-)
--
Duy
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-21 15:59 ` Duy Nguyen
2018-03-21 16:17 ` Ævar Arnfjörð Bjarmason
@ 2018-03-21 16:46 ` Duy Nguyen
2018-03-21 19:11 ` Junio C Hamano
2018-03-22 9:32 ` Jeff King
2 siblings, 1 reply; 273+ messages in thread
From: Duy Nguyen @ 2018-03-21 16:46 UTC (permalink / raw)
To: Jeff King
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Junio C Hamano
On Wed, Mar 21, 2018 at 04:59:19PM +0100, Duy Nguyen wrote:
> About the 16k limit (and some other limits as well), I'm making these
> patches with the assumption that large scale deployment probably will
> go with custom builds anyway. Adjusting the limits back should be
> quite easy while we can still provide reasonable defaults for most
> people.
And we could even do something like this to make custom builds
easier. Some more gluing is needed so you can set this from config.mak
but you get the idea. This removes all limits set by this
series. Readability in pack-objects.c and object_entry struct
declaration is still a concern though.
-- 8< --
diff --git a/pack-objects.h b/pack-objects.h
index af40211105..b6e84c9b48 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,10 +2,17 @@
#define PACK_OBJECTS_H
#define OE_DFS_STATE_BITS 2
+#ifdef PACK_OBJECTS_BIG_MEMORY
+#define OE_DEPTH_BITS 31
+/* OE_IN_PACK_BITS is not defined */
+#define OE_Z_DELTA_BITS 32
+#define OE_DELTA_SIZE_BITS 32
+#else
#define OE_DEPTH_BITS 12
#define OE_IN_PACK_BITS 14
#define OE_Z_DELTA_BITS 16
#define OE_DELTA_SIZE_BITS 31
+#endif
/*
* State flags for depth-first search used for analyzing delta cycles.
@@ -82,7 +89,11 @@ struct object_entry {
*/
uint32_t delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
uint32_t delta_size_valid:1;
+#ifdef PACK_OBJECTS_BIG_MEMORY
+ struct packed_git *in_pack; /* already in pack */
+#else
unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */
+#endif
unsigned size_valid:1;
unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_valid:1;
@@ -112,7 +123,9 @@ struct packing_data {
unsigned int *in_pack_pos;
int in_pack_count;
+#ifndef PACK_OBJECTS_BIG_MEMORY
struct packed_git *in_pack[1 << OE_IN_PACK_BITS];
+#endif
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -174,6 +187,9 @@ static inline void oe_set_in_pack_pos(const struct packing_data *pack,
static inline unsigned int oe_add_pack(struct packing_data *pack,
struct packed_git *p)
{
+#ifdef PACK_OBJECTS_BIG_MEMORY
+ return 0;
+#else
if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
die(_("too many packs to handle in one go. "
"Please add .keep files to exclude\n"
@@ -187,22 +203,31 @@ static inline unsigned int oe_add_pack(struct packing_data *pack,
}
pack->in_pack[pack->in_pack_count] = p;
return pack->in_pack_count++;
+#endif
}
static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
const struct object_entry *e)
{
+#ifdef PACK_OBJECTS_BIG_MEMORY
+ return e->in_pack;
+#else
return pack->in_pack[e->in_pack_idx];
+#endif
}
static inline void oe_set_in_pack(struct object_entry *e,
struct packed_git *p)
{
+#ifdef PACK_OBJECTS_BIG_MEMORY
+ e->in_pack = p;
+#else
if (p->index <= 0)
die("BUG: found_pack should be NULL "
"instead of having non-positive index");
e->in_pack_idx = p->index;
+#endif
}
-- 8< --
^ permalink raw reply related [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-21 16:46 ` Duy Nguyen
@ 2018-03-21 19:11 ` Junio C Hamano
0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2018-03-21 19:11 UTC (permalink / raw)
To: Duy Nguyen
Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List
Duy Nguyen <pclouds@gmail.com> writes:
> And we could even do something like this to make custom builds
> easier. Some more gluing is needed so you can set this from config.mak
> but you get the idea. This removes all limits set by this
> series.
Yes, we _could_, but it would mean we would have many variants of
the codepath that is pretty crucial to the integrity of the data we
keep in the repository, all of which must pretty much be bug-free.
> Readability in pack-objects.c and object_entry struct declaration
> is still a concern though.
Yup, a change like this does not change the readability; personally,
I do not think the original is _too_ bad, though.
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-21 15:59 ` Duy Nguyen
2018-03-21 16:17 ` Ævar Arnfjörð Bjarmason
2018-03-21 16:46 ` Duy Nguyen
@ 2018-03-22 9:32 ` Jeff King
2018-03-22 9:46 ` Jeff King
` (2 more replies)
2 siblings, 3 replies; 273+ messages in thread
From: Jeff King @ 2018-03-22 9:32 UTC (permalink / raw)
To: Duy Nguyen
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Junio C Hamano
On Wed, Mar 21, 2018 at 04:59:19PM +0100, Duy Nguyen wrote:
> > I hate to be a wet blanket, but am I the only one who is wondering
> > whether the tradeoffs are worth it? 8% memory reduction doesn't seem
> > mind-bogglingly good,
>
> AEvar measured RSS. If we count objects[] array alone, the saving is
> 40% (136 bytes per entry down to 80). Some is probably eaten up by
> mmap in rss.
Measuring actual heap usage with massif, I get before/after peak heaps
of 1728 and 1346MB respectively when repacking linux.git. So that's ~22%
savings overall.
Of the used heap after your patches:
- ~40% of that is from packlist_alloc()
- ~17% goes to "struct object"
- ~10% for the object.c hash table to store all the "struct object"
- ~7% goes to the delta cache
- ~7% goes to the pack revindex (actually, there's a duplicate 7%
there, too; I think our peak is when we're sorting the revindex
and have to keep two copies in memory at once)
- ~5% goes to the packlist_find() hash table
- ~3.5% for the get_object_details() sorting list (this is only held
for a minute, but again, our peak comes during this sort, which
in turn loads the revindex)
So 27% of the total heap goes away if you switch to a separate rev-list.
Though it's mostly just going to a different process, it does help peak
because that process would have exited by the time we get to the
revindex bits.
I suspect you could get the same effect by just teaching pack-objects to
clear obj_hash and all of the allocated object structs. I think that
should be safe to do as long as we clear _all_ of the objects, so there
are no dangling pointers.
> About the 16k limit (and some other limits as well), I'm making these
> patches with the assumption that large scale deployment probably will
> go with custom builds anyway. Adjusting the limits back should be
> quite easy while we can still provide reasonable defaults for most
> people.
I think this 16k limit is the thing I _most_ dislike about the series.
If we could tweak that case such that we always made forward progress, I
think I'd be a lot less nervous. I responded elsewhere in the thread
(before seeing that both Junio and you seemed aware of the issues ;) ),
but I think it would be acceptable to have git-repack enforce the limit.
That would still mean you could get into a broken state for serving
fetches, but you could at least get out of it by running "git repack".
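
For reference, the 16k figure comes from storing a 14-bit index into a table of packs instead of a pointer per object. The sketch below illustrates that trade-off with hypothetical `sketch_` names. Unlike the series, which calls die() when the table is full, it returns -1 so that a caller could choose how to recover; that behavior is an assumption of the sketch, not something the patches implement.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_IN_PACK_BITS 14
#define SKETCH_MAX_PACKS (1u << SKETCH_IN_PACK_BITS) /* 16384 slots */

/* Hypothetical table mapping small indexes to pack handles. */
struct sketch_pack_table {
	const void *packs[SKETCH_MAX_PACKS];
	unsigned int count;
};

/* Returns the index assigned to `pack`, or -1 when the table is full. */
static int sketch_add_pack(struct sketch_pack_table *t, const void *pack)
{
	assert(t != NULL);
	if (t->count >= SKETCH_MAX_PACKS)
		return -1;
	t->packs[t->count] = pack;
	return (int)t->count++;
}

/* Looks up the pack handle stored under a previously returned index. */
static const void *sketch_in_pack(const struct sketch_pack_table *t,
				  unsigned int idx)
{
	assert(idx < t->count);
	return t->packs[idx];
}
```

Each object then needs only 14 bits to name its pack, at the cost of a hard ceiling on how many packs one run can see.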
-Peff
^ permalink raw reply [flat|nested] 273+ messages in thread
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-22 9:32 ` Jeff King
@ 2018-03-22 9:46 ` Jeff King
2018-03-22 10:57 ` Duy Nguyen
2018-03-23 1:28 ` Ramsay Jones
2 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2018-03-22 9:46 UTC (permalink / raw)
To: Duy Nguyen
Cc: Ævar Arnfjörð Bjarmason, Eric Wong,
Git Mailing List, Junio C Hamano
On Thu, Mar 22, 2018 at 05:32:12AM -0400, Jeff King wrote:
> So 27% of the total heap goes away if you switch to a separate rev-list.
> Though it's mostly just going to a different process, it does help peak
> because that process would have exited by the time we get to the
> revindex bits.
>
> I suspect you could get the same effect by just teaching pack-objects to
> clear obj_hash and all of the allocated object structs. I think that
> should be safe to do as long as we clear _all_ of the objects, so there
> are no dangling pointers.
The patch below tries that. It's kind of hacky, but it drops my peak
heap for packing linux.git from 1336MB to 1129MB.
That's not quite as exciting as 27%, because it just moves our peak
earlier, to when we do have all of the object structs in memory (so the
savings are really just that we're not holding the revindex, etc at the
same time as the object structs).
But we also hold that peak for a lot shorter period, because we drop the
memory before we do any delta compression (which itself can be memory
hungry[1]), and don't hold onto it during the write phase (which can be
network-limited when serving a fetch). So during that write phase we're
holding only ~900MB instead of ~1250MB.
-Peff
[1] All of my timings are on noop repacks of a single pack, so there's
no actual delta compression. On average, it will use something like
"nr_threads * window * avg_blob_size". For a "normal" repo, that's
only a few megabytes. But the peak will depend on the large blobs,
so it could have some outsize cases. I don't know if it's worth
worrying about too much for this analysis.
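
The estimate in the footnote can be written down as a small helper. The function name and the example inputs are hypothetical; only the formula "nr_threads * window * avg_blob_size" comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough average memory held by the delta search windows, in bytes. */
static size_t delta_search_bytes(size_t nr_threads, size_t window,
				 size_t avg_blob_size)
{
	/* Guard against overflow of the product on small size_t targets. */
	assert(avg_blob_size == 0 ||
	       nr_threads * window <= SIZE_MAX / avg_blob_size);
	return nr_threads * window * avg_blob_size;
}
```

For example, 8 threads, a window of 10 and an assumed average blob of 20 KiB give 1638400 bytes, i.e. under 2MB, in line with "only a few megabytes". A single very large blob in the window would dominate the peak, which this average does not capture.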
---
Here's the patch. It's probably asking for trouble to have this kind of
clearing interface, as a surprising number of things may hold onto
pointers to objects (see the comment below about the bitmap code).
So maybe the separate process is less insane.
diff --git a/alloc.c b/alloc.c
index 12afadfacd..50d444a3b0 100644
--- a/alloc.c
+++ b/alloc.c
@@ -30,15 +30,23 @@ struct alloc_state {
int count; /* total number of nodes allocated */
int nr; /* number of nodes left in current allocation */
void *p; /* first free node in current allocation */
+
+ /* book-keeping for clearing */
+ void *start;
+ struct alloc_state *prev;
};
-static inline void *alloc_node(struct alloc_state *s, size_t node_size)
+static inline void *alloc_node(struct alloc_state **sp, size_t node_size)
{
+ struct alloc_state *s = *sp;
void *ret;
- if (!s->nr) {
+ if (!s || !s->nr) {
+ s = xcalloc(1, sizeof(*s));
s->nr = BLOCKING;
- s->p = xmalloc(BLOCKING * node_size);
+ s->start = s->p = xmalloc(BLOCKING * node_size);
+ s->prev = *sp;
+ *sp = s;
}
s->nr--;
s->count++;
@@ -48,7 +56,7 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
return ret;
}
-static struct alloc_state blob_state;
+static struct alloc_state *blob_state;
void *alloc_blob_node(void)
{
struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
@@ -56,7 +64,7 @@ void *alloc_blob_node(void)
return b;
}
-static struct alloc_state tree_state;
+static struct alloc_state *tree_state;
void *alloc_tree_node(void)
{
struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
@@ -64,7 +72,7 @@ void *alloc_tree_node(void)
return t;
}
-static struct alloc_state tag_state;
+static struct alloc_state *tag_state;
void *alloc_tag_node(void)
{
struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
@@ -72,7 +80,7 @@ void *alloc_tag_node(void)
return t;
}
-static struct alloc_state object_state;
+static struct alloc_state *object_state;
void *alloc_object_node(void)
{
struct object *obj = alloc_node(&object_state, sizeof(union any_object));
@@ -80,7 +88,7 @@ void *alloc_object_node(void)
return obj;
}
-static struct alloc_state commit_state;
+static struct alloc_state *commit_state;
unsigned int alloc_commit_index(void)
{
@@ -103,7 +111,7 @@ static void report(const char *name, unsigned int count, size_t size)
}
#define REPORT(name, type) \
- report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+ report(#name, name##_state->count, name##_state->count * sizeof(type) >> 10)
void alloc_report(void)
{
@@ -113,3 +121,22 @@ void alloc_report(void)
REPORT(tag, struct tag);
REPORT(object, union any_object);
}
+
+static void alloc_clear(struct alloc_state **sp)
+{
+ while (*sp) {
+ struct alloc_state *s = *sp;
+ *sp = s->prev;
+ free(s->start);
+ free(s);
+ }
+}
+
+void alloc_clear_all(void)
+{
+ alloc_clear(&blob_state);
+ alloc_clear(&tree_state);
+ alloc_clear(&commit_state);
+ alloc_clear(&tag_state);
+ alloc_clear(&object_state);
+}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4406af640f..7ba8ab07a3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2959,6 +2959,13 @@ static void get_object_list(int ac, const char **av)
record_recent_object, NULL);
}
+ /*
+ * The bitmap code actually stores the commit pointers for future
+ * reference, so we can't use this memory optimization there.
+ */
+ if (!write_bitmap_index)
+ free_all_objects();
+
if (keep_unreachable)
add_objects_in_unpacked_packs(&revs);
if (pack_loose_unreachable)
diff --git a/cache.h b/cache.h
index b90feb3802..605bab31de 100644
--- a/cache.h
+++ b/cache.h
@@ -1872,6 +1872,8 @@ extern void *alloc_object_node(void);
extern void alloc_report(void);
extern unsigned int alloc_commit_index(void);
+void alloc_clear_all(void);
+
/* pkt-line.c */
void packet_trace_identity(const char *prog);
diff --git a/object.c b/object.c
index 9e6f9ff20b..6530d6fbde 100644
--- a/object.c
+++ b/object.c
@@ -445,3 +445,12 @@ void clear_commit_marks_all(unsigned int flags)
obj->flags &= ~flags;
}
}
+
+void free_all_objects(void)
+{
+ alloc_clear_all();
+ free(obj_hash);
+ obj_hash = NULL;
+ obj_hash_size = 0;
+ nr_objs = 0;
+}
diff --git a/object.h b/object.h
index 8ce294d6ec..3eb85215c2 100644
--- a/object.h
+++ b/object.h
@@ -153,4 +153,6 @@ void clear_object_flags(unsigned flags);
*/
extern void clear_commit_marks_all(unsigned int flags);
+void free_all_objects(void);
+
#endif /* OBJECT_H */
* Re: [PATCH v6 00/11] nd/pack-objects-pack-struct updates
2018-03-22 9:32 `