* [PATCH v2 1/7] Documentation: object_id_len goes up to 31
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-17 13:55 ` [PATCH v2 2/7] reftable: reject 0 object_id_len Han-Wen Nienhuys via GitGitGadget
` (7 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The value is stored in a 5-bit field, so we can't support more without
a format version upgrade.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
Documentation/technical/reftable.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index d7c3b645cfb..6a67cc4174f 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -443,7 +443,7 @@ Obj block format
Object blocks are optional. Writers may choose to omit object blocks,
especially if readers will not use the object name to ref mapping.
-Object blocks use unique, abbreviated 2-32 object name keys, mapping to
+Object blocks use unique, abbreviated 2-31 byte object name keys, mapping to
ref blocks containing references pointing to that object directly, or as
the peeled value of an annotated tag. Like ref blocks, object blocks use
the file's standard block size. The abbreviation length is available in
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v2 2/7] reftable: reject 0 object_id_len
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
2022-02-17 13:55 ` [PATCH v2 1/7] Documentation: object_id_len goes up to 31 Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-18 0:32 ` Junio C Hamano
2022-02-17 13:55 ` [PATCH v2 3/7] reftable: add a test that verifies that writing empty keys fails Han-Wen Nienhuys via GitGitGadget
` (6 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The spec says 2 <= object_id_len <= 31. We are lenient and allow 1,
but we forbid 0, so we can we can be sure that we never read a
0-length key.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/reader.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/reftable/reader.c b/reftable/reader.c
index 00906e7a2de..54b4025105c 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -155,6 +155,11 @@ static int parse_footer(struct reftable_reader *r, uint8_t *footer,
r->log_offsets.is_present = (first_block_typ == BLOCK_TYPE_LOG ||
r->log_offsets.offset > 0);
r->obj_offsets.is_present = r->obj_offsets.offset > 0;
+ if (r->obj_offsets.is_present && !r->object_id_len) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
+
err = 0;
done:
return err;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 2/7] reftable: reject 0 object_id_len
2022-02-17 13:55 ` [PATCH v2 2/7] reftable: reject 0 object_id_len Han-Wen Nienhuys via GitGitGadget
@ 2022-02-18 0:32 ` Junio C Hamano
0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2022-02-18 0:32 UTC (permalink / raw)
To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Han-Wen Nienhuys <hanwen@google.com>
>
> The spec says 2 <= object_id_len <= 31. We are lenient and allow 1,
> but we forbid 0, so we can we can be sure that we never read a
s/we can we can/we can/;
> 0-length key.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
> reftable/reader.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/reftable/reader.c b/reftable/reader.c
> index 00906e7a2de..54b4025105c 100644
> --- a/reftable/reader.c
> +++ b/reftable/reader.c
> @@ -155,6 +155,11 @@ static int parse_footer(struct reftable_reader *r, uint8_t *footer,
> r->log_offsets.is_present = (first_block_typ == BLOCK_TYPE_LOG ||
> r->log_offsets.offset > 0);
> r->obj_offsets.is_present = r->obj_offsets.offset > 0;
> + if (r->obj_offsets.is_present && !r->object_id_len) {
> + err = REFTABLE_FORMAT_ERROR;
> + goto done;
> + }
> +
> err = 0;
> done:
> return err;
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 3/7] reftable: add a test that verifies that writing empty keys fails
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
2022-02-17 13:55 ` [PATCH v2 1/7] Documentation: object_id_len goes up to 31 Han-Wen Nienhuys via GitGitGadget
2022-02-17 13:55 ` [PATCH v2 2/7] reftable: reject 0 object_id_len Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-17 13:55 ` [PATCH v2 4/7] reftable: avoid writing empty keys at the block layer Han-Wen Nienhuys via GitGitGadget
` (5 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
Empty keys can only be written as ref records with empty names. The
log record has a logical timestamp in the key, so the key is never
empty.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 605ba0f9fd4..fd5922e55f6 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -667,6 +667,29 @@ static void test_write_empty_table(void)
strbuf_release(&buf);
}
+static void test_write_empty_key(void)
+{
+ struct reftable_write_options opts = { 0 };
+ struct strbuf buf = STRBUF_INIT;
+ struct reftable_writer *w =
+ reftable_new_writer(&strbuf_add_void, &buf, &opts);
+ struct reftable_ref_record ref = {
+ .refname = "",
+ .update_index = 1,
+ .value_type = REFTABLE_REF_DELETION,
+ };
+ int err;
+
+ reftable_writer_set_limits(w, 1, 1);
+ err = reftable_writer_add_ref(w, &ref);
+ EXPECT(err == REFTABLE_API_ERROR);
+
+ err = reftable_writer_close(w);
+ EXPECT(err == REFTABLE_EMPTY_TABLE_ERROR);
+ reftable_writer_free(w);
+ strbuf_release(&buf);
+}
+
static void test_write_key_order(void)
{
struct reftable_write_options opts = { 0 };
@@ -746,6 +769,7 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_table_read_write_seek_index);
RUN_TEST(test_table_refs_for_no_index);
RUN_TEST(test_table_refs_for_obj_index);
+ RUN_TEST(test_write_empty_key);
RUN_TEST(test_write_empty_table);
RUN_TEST(test_log_overflow);
return 0;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v2 4/7] reftable: avoid writing empty keys at the block layer
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
` (2 preceding siblings ...)
2022-02-17 13:55 ` [PATCH v2 3/7] reftable: add a test that verifies that writing empty keys fails Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-17 23:55 ` Junio C Hamano
2022-02-17 13:55 ` [PATCH v2 5/7] reftable: ensure that obj_id_len is >= 2 on writing Han-Wen Nienhuys via GitGitGadget
` (4 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The public interface (reftable_writer) already ensures that keys are
written in strictly increasing order, and an empty key by definition
fails this check.
However, by also enforcing this at the block layer, it is easier to
verify that records (which are written into blocks) never have to
consider the possibility of empty keys.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/block.c | 27 +++++++++++++++++----------
reftable/block_test.c | 5 +++++
reftable/writer.c | 3 +--
3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/reftable/block.c b/reftable/block.c
index 2170748c5e9..4a095afe1e2 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -88,8 +88,9 @@ uint8_t block_writer_type(struct block_writer *bw)
return bw->buf[bw->header_off];
}
-/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
- success */
+/* Adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+ success. Returns REFTABLE_API_ERROR if attempting to write a record with
+ empty key. */
int block_writer_add(struct block_writer *w, struct reftable_record *rec)
{
struct strbuf empty = STRBUF_INIT;
@@ -105,8 +106,14 @@ int block_writer_add(struct block_writer *w, struct reftable_record *rec)
int is_restart = 0;
struct strbuf key = STRBUF_INIT;
int n = 0;
+ int err = -1;
reftable_record_key(rec, &key);
+ if (!key.len) {
+ err = REFTABLE_API_ERROR;
+ goto done;
+ }
+
n = reftable_encode_key(&is_restart, out, last, key,
reftable_record_val_type(rec));
if (n < 0)
@@ -118,16 +125,11 @@ int block_writer_add(struct block_writer *w, struct reftable_record *rec)
goto done;
string_view_consume(&out, n);
- if (block_writer_register_restart(w, start.len - out.len, is_restart,
- &key) < 0)
- goto done;
-
- strbuf_release(&key);
- return 0;
-
+ err = block_writer_register_restart(w, start.len - out.len, is_restart,
+ &key);
done:
strbuf_release(&key);
- return -1;
+ return err;
}
int block_writer_finish(struct block_writer *w)
@@ -332,6 +334,9 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
if (n < 0)
return -1;
+ if (!key.len)
+ return REFTABLE_FORMAT_ERROR;
+
string_view_consume(&in, n);
n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
if (n < 0)
@@ -358,6 +363,8 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
int n = reftable_decode_key(key, &extra, empty, in);
if (n < 0)
return n;
+ if (!key->len)
+ return -1;
return 0;
}
diff --git a/reftable/block_test.c b/reftable/block_test.c
index fa2ee092ec0..cb88af4a563 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -42,6 +42,11 @@ static void test_block_read_write(void)
block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
header_off, hash_size(GIT_SHA1_FORMAT_ID));
+ rec.u.ref.refname = "";
+ rec.u.ref.value_type = REFTABLE_REF_DELETION;
+ n = block_writer_add(&bw, &rec);
+ EXPECT(n == REFTABLE_API_ERROR);
+
for (i = 0; i < N; i++) {
char name[100];
uint8_t hash[GIT_SHA1_RAWSZ];
diff --git a/reftable/writer.c b/reftable/writer.c
index 944c2329ab5..d54215a50dc 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -240,14 +240,13 @@ static int writer_add_record(struct reftable_writer *w,
writer_reinit_block_writer(w, reftable_record_type(rec));
err = block_writer_add(w->block_writer, rec);
- if (err < 0) {
+ if (err == -1) {
/* we are writing into memory, so an error can only mean it
* doesn't fit. */
err = REFTABLE_ENTRY_TOO_BIG_ERROR;
goto done;
}
- err = 0;
done:
strbuf_release(&key);
return err;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 4/7] reftable: avoid writing empty keys at the block layer
2022-02-17 13:55 ` [PATCH v2 4/7] reftable: avoid writing empty keys at the block layer Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 23:55 ` Junio C Hamano
2022-02-21 14:32 ` Han-Wen Nienhuys
0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2022-02-17 23:55 UTC (permalink / raw)
To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -105,8 +106,14 @@ int block_writer_add(struct block_writer *w, struct reftable_record *rec)
> int is_restart = 0;
> struct strbuf key = STRBUF_INIT;
> int n = 0;
> + int err = -1;
>
> reftable_record_key(rec, &key);
> + if (!key.len) {
> + err = REFTABLE_API_ERROR;
> + goto done;
> + }
OK; we get an API_ERROR when trying to write a bad one. And ...
> @@ -332,6 +334,9 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
> if (n < 0)
> return -1;
>
> + if (!key.len)
> + return REFTABLE_FORMAT_ERROR;
... we get a FORMAT_ERROR when the data we try to read is bad
(i.e. not our fault). OK.
> @@ -358,6 +363,8 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
> int n = reftable_decode_key(key, &extra, empty, in);
> if (n < 0)
> return n;
> + if (!key->len)
> + return -1;
It is curious that this gets a different error out of the same
sequence, i.e. decode-key did not return an error but the length of
the key happens to be 0, not FORMAT_ERROR.
> diff --git a/reftable/writer.c b/reftable/writer.c
> index 944c2329ab5..d54215a50dc 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -240,14 +240,13 @@ static int writer_add_record(struct reftable_writer *w,
>
> writer_reinit_block_writer(w, reftable_record_type(rec));
> err = block_writer_add(w->block_writer, rec);
> - if (err < 0) {
> + if (err == -1) {
> /* we are writing into memory, so an error can only mean it
> * doesn't fit. */
> err = REFTABLE_ENTRY_TOO_BIG_ERROR;
> goto done;
> }
>
> - err = 0;
Is this "doesn't fit" related to "we catch 0-length keys", or an
unrelated fix was included in this step by "rebase -i" mistake?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 4/7] reftable: avoid writing empty keys at the block layer
2022-02-17 23:55 ` Junio C Hamano
@ 2022-02-21 14:32 ` Han-Wen Nienhuys
0 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys @ 2022-02-21 14:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys
On Fri, Feb 18, 2022 at 12:55 AM Junio C Hamano <gitster@pobox.com> wrote:
> > @@ -358,6 +363,8 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
> > int n = reftable_decode_key(key, &extra, empty, in);
> > if (n < 0)
> > return n;
> > + if (!key->len)
> > + return -1;
>
> It is curious that this gets a different error out of the same
> sequence, i.e. decode-key did not return an error but the length of
> the key happens to be 0, not FORMAT_ERROR.
fixed.
> > --- a/reftable/writer.c
> > +++ b/reftable/writer.c
> > @@ -240,14 +240,13 @@ static int writer_add_record(struct reftable_writer *w,
> >
> > writer_reinit_block_writer(w, reftable_record_type(rec));
> > err = block_writer_add(w->block_writer, rec);
> > - if (err < 0) {
> > + if (err == -1) {
> > /* we are writing into memory, so an error can only mean it
> > * doesn't fit. */
> > err = REFTABLE_ENTRY_TOO_BIG_ERROR;
> > goto done;
> > }
> >
> > - err = 0;
>
> Is this "doesn't fit" related to "we catch 0-length keys", or an
> unrelated fix was included in this step by "rebase -i" mistake?
We don't want to reinterpret API_ERROR (from block_writer_add) as
ENTRY_TOO_BIG_ERROR, so we have to tweak the condition here.
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Liana Sebastian
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 5/7] reftable: ensure that obj_id_len is >= 2 on writing
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
` (3 preceding siblings ...)
2022-02-17 13:55 ` [PATCH v2 4/7] reftable: avoid writing empty keys at the block layer Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-18 0:01 ` Junio C Hamano
2022-02-17 13:55 ` [PATCH v2 6/7] reftable: add test for length of disambiguating prefix Han-Wen Nienhuys via GitGitGadget
` (3 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
When writing the same hash many times, we might decide to use a
length-1 object ID prefix for the ObjectID => ref table, which is out
of spec.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 37 +++++++++++++++++++++++++++++++++++++
reftable/writer.c | 4 +++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index fd5922e55f6..35142eb070e 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -667,6 +667,42 @@ static void test_write_empty_table(void)
strbuf_release(&buf);
}
+static void test_write_object_id_min_length(void)
+{
+ struct reftable_write_options opts = {
+ .block_size = 75,
+ };
+ struct strbuf buf = STRBUF_INIT;
+ struct reftable_writer *w =
+ reftable_new_writer(&strbuf_add_void, &buf, &opts);
+ uint8_t hash[GIT_SHA1_RAWSZ] = {42};
+ struct reftable_ref_record ref = {
+ .update_index = 1,
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = hash,
+ };
+ int err;
+ int i;
+
+ reftable_writer_set_limits(w, 1, 1);
+
+ /* Write the same hash in many refs. If there is only 1 hash, the
+ * disambiguating prefix is length 0 */
+ for (i = 0; i < 256; i++) {
+ char name[256];
+ snprintf(name, sizeof(name), "ref%05d", i);
+ ref.refname = name;
+ err = reftable_writer_add_ref(w, &ref);
+ EXPECT_ERR(err);
+ }
+
+ err = reftable_writer_close(w);
+ EXPECT_ERR(err);
+ EXPECT(writer_stats(w)->object_id_len == 2);
+ reftable_writer_free(w);
+ strbuf_release(&buf);
+}
+
static void test_write_empty_key(void)
{
struct reftable_write_options opts = { 0 };
@@ -772,5 +808,6 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_write_empty_key);
RUN_TEST(test_write_empty_table);
RUN_TEST(test_log_overflow);
+ RUN_TEST(test_write_object_id_min_length);
return 0;
}
diff --git a/reftable/writer.c b/reftable/writer.c
index d54215a50dc..5e4e6e93416 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -515,7 +515,9 @@ static void object_record_free(void *void_arg, void *key)
static int writer_dump_object_index(struct reftable_writer *w)
{
struct write_record_arg closure = { .w = w };
- struct common_prefix_arg common = { NULL };
+ struct common_prefix_arg common = {
+ .max = 1, /* obj_id_len should be >= 2. */
+ };
if (w->obj_index_tree) {
infix_walk(w->obj_index_tree, &update_common, &common);
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 5/7] reftable: ensure that obj_id_len is >= 2 on writing
2022-02-17 13:55 ` [PATCH v2 5/7] reftable: ensure that obj_id_len is >= 2 on writing Han-Wen Nienhuys via GitGitGadget
@ 2022-02-18 0:01 ` Junio C Hamano
0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2022-02-18 0:01 UTC (permalink / raw)
To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/reftable/writer.c b/reftable/writer.c
> index d54215a50dc..5e4e6e93416 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -515,7 +515,9 @@ static void object_record_free(void *void_arg, void *key)
> static int writer_dump_object_index(struct reftable_writer *w)
> {
> struct write_record_arg closure = { .w = w };
> - struct common_prefix_arg common = { NULL };
> + struct common_prefix_arg common = {
> + .max = 1, /* obj_id_len should be >= 2. */
> + };
It feels somewhat strange that we have to set .max to set the floor
for the minimum length, but given the way update_common() uses and
maintains the member, it is more like "max size we have seen so
far", and by pretending that we have already seen common prefix of
length 1, we'd force ourselves that we need at least 2 to
differentiate.
> if (w->obj_index_tree) {
> infix_walk(w->obj_index_tree, &update_common, &common);
> }
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 6/7] reftable: add test for length of disambiguating prefix
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
` (4 preceding siblings ...)
2022-02-17 13:55 ` [PATCH v2 5/7] reftable: ensure that obj_id_len is >= 2 on writing Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-17 13:55 ` [PATCH v2 7/7] reftable: rename writer_stats to reftable_writer_stats Han-Wen Nienhuys via GitGitGadget
` (2 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The ID => ref map is trimming object IDs to a disambiguating prefix.
Check that we are computing their length correctly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 35142eb070e..a1b835785a3 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -703,6 +703,43 @@ static void test_write_object_id_min_length(void)
strbuf_release(&buf);
}
+static void test_write_object_id_length(void)
+{
+ struct reftable_write_options opts = {
+ .block_size = 75,
+ };
+ struct strbuf buf = STRBUF_INIT;
+ struct reftable_writer *w =
+ reftable_new_writer(&strbuf_add_void, &buf, &opts);
+ uint8_t hash[GIT_SHA1_RAWSZ] = {42};
+ struct reftable_ref_record ref = {
+ .update_index = 1,
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = hash,
+ };
+ int err;
+ int i;
+
+ reftable_writer_set_limits(w, 1, 1);
+
+ /* Write the same hash in many refs. If there is only 1 hash, the
+ * disambiguating prefix is length 0 */
+ for (i = 0; i < 256; i++) {
+ char name[256];
+ snprintf(name, sizeof(name), "ref%05d", i);
+ ref.refname = name;
+ ref.value.val1[15] = i;
+ err = reftable_writer_add_ref(w, &ref);
+ EXPECT_ERR(err);
+ }
+
+ err = reftable_writer_close(w);
+ EXPECT_ERR(err);
+ EXPECT(writer_stats(w)->object_id_len == 16);
+ reftable_writer_free(w);
+ strbuf_release(&buf);
+}
+
static void test_write_empty_key(void)
{
struct reftable_write_options opts = { 0 };
@@ -808,6 +845,7 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_write_empty_key);
RUN_TEST(test_write_empty_table);
RUN_TEST(test_log_overflow);
+ RUN_TEST(test_write_object_id_length);
RUN_TEST(test_write_object_id_min_length);
return 0;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v2 7/7] reftable: rename writer_stats to reftable_writer_stats
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
` (5 preceding siblings ...)
2022-02-17 13:55 ` [PATCH v2 6/7] reftable: add test for length of disambiguating prefix Han-Wen Nienhuys via GitGitGadget
@ 2022-02-17 13:55 ` Han-Wen Nienhuys via GitGitGadget
2022-02-18 0:02 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Junio C Hamano
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
8 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-17 13:55 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
This function is part of the reftable API, so it should use the
reftable_ prefix
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 10 +++++-----
reftable/reftable-writer.h | 2 +-
reftable/writer.c | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index a1b835785a3..469ab79a5ad 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -100,7 +100,7 @@ static void write_table(char ***names, struct strbuf *buf, int N,
n = reftable_writer_close(w);
EXPECT(n == 0);
- stats = writer_stats(w);
+ stats = reftable_writer_stats(w);
for (i = 0; i < stats->ref_stats.blocks; i++) {
int off = i * opts.block_size;
if (off == 0) {
@@ -239,7 +239,7 @@ static void test_log_write_read(void)
n = reftable_writer_close(w);
EXPECT(n == 0);
- stats = writer_stats(w);
+ stats = reftable_writer_stats(w);
EXPECT(stats->log_stats.blocks > 0);
reftable_writer_free(w);
w = NULL;
@@ -330,7 +330,7 @@ static void test_log_zlib_corruption(void)
n = reftable_writer_close(w);
EXPECT(n == 0);
- stats = writer_stats(w);
+ stats = reftable_writer_stats(w);
EXPECT(stats->log_stats.blocks > 0);
reftable_writer_free(w);
w = NULL;
@@ -698,7 +698,7 @@ static void test_write_object_id_min_length(void)
err = reftable_writer_close(w);
EXPECT_ERR(err);
- EXPECT(writer_stats(w)->object_id_len == 2);
+ EXPECT(reftable_writer_stats(w)->object_id_len == 2);
reftable_writer_free(w);
strbuf_release(&buf);
}
@@ -735,7 +735,7 @@ static void test_write_object_id_length(void)
err = reftable_writer_close(w);
EXPECT_ERR(err);
- EXPECT(writer_stats(w)->object_id_len == 16);
+ EXPECT(reftable_writer_stats(w)->object_id_len == 16);
reftable_writer_free(w);
strbuf_release(&buf);
}
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index a560dc17255..db8de197f6c 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -143,7 +143,7 @@ int reftable_writer_close(struct reftable_writer *w);
This struct becomes invalid when the writer is freed.
*/
-const struct reftable_stats *writer_stats(struct reftable_writer *w);
+const struct reftable_stats *reftable_writer_stats(struct reftable_writer *w);
/* reftable_writer_free deallocates memory for the writer */
void reftable_writer_free(struct reftable_writer *w);
diff --git a/reftable/writer.c b/reftable/writer.c
index 5e4e6e93416..6d979e245ff 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -695,7 +695,7 @@ static int writer_flush_block(struct reftable_writer *w)
return writer_flush_nonempty_block(w);
}
-const struct reftable_stats *writer_stats(struct reftable_writer *w)
+const struct reftable_stats *reftable_writer_stats(struct reftable_writer *w)
{
return &w->stats;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 0/7] reftable: avoid reading and writing empty keys
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
` (6 preceding siblings ...)
2022-02-17 13:55 ` [PATCH v2 7/7] reftable: rename writer_stats to reftable_writer_stats Han-Wen Nienhuys via GitGitGadget
@ 2022-02-18 0:02 ` Junio C Hamano
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
8 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2022-02-18 0:02 UTC (permalink / raw)
To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> this series makes sure that the object record does not have to consider
> empty keys (and therefore, a NULL memcpy destination)
>
> while we're at it add some more tests, and fix a naming mistake.
Looking good. Will queue.
Thanks.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v3 0/7] reftable: avoid reading and writing empty keys
2022-02-17 13:55 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Han-Wen Nienhuys via GitGitGadget
` (7 preceding siblings ...)
2022-02-18 0:02 ` [PATCH v2 0/7] reftable: avoid reading and writing empty keys Junio C Hamano
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 1/7] Documentation: object_id_len goes up to 31 Han-Wen Nienhuys via GitGitGadget
` (7 more replies)
8 siblings, 8 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys
this series makes sure that the object record does not have to consider
empty keys (and therefore, a NULL memcpy destination)
while we're at it add some more tests, and fix a naming mistake.
Han-Wen Nienhuys (7):
Documentation: object_id_len goes up to 31
reftable: reject 0 object_id_len
reftable: add a test that verifies that writing empty keys fails
reftable: avoid writing empty keys at the block layer
reftable: ensure that obj_id_len is >= 2 on writing
reftable: add test for length of disambiguating prefix
reftable: rename writer_stats to reftable_writer_stats
Documentation/technical/reftable.txt | 2 +-
reftable/block.c | 27 ++++---
reftable/block_test.c | 5 ++
reftable/reader.c | 5 ++
reftable/readwrite_test.c | 105 ++++++++++++++++++++++++++-
reftable/reftable-writer.h | 2 +-
reftable/writer.c | 9 ++-
7 files changed, 136 insertions(+), 19 deletions(-)
base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1185%2Fhanwen%2Fobj-id-len-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1185/hanwen/obj-id-len-v3
Pull-Request: https://github.com/git/git/pull/1185
Range-diff vs v2:
1: 80d29e8f269 = 1: 80d29e8f269 Documentation: object_id_len goes up to 31
2: 4c1a19fc4ae ! 2: 68e7bc32ff8 reftable: reject 0 object_id_len
@@ Commit message
reftable: reject 0 object_id_len
The spec says 2 <= object_id_len <= 31. We are lenient and allow 1,
- but we forbid 0, so we can we can be sure that we never read a
- 0-length key.
+ but we forbid 0, so we can be sure that we never read a 0-length key.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
3: 600b115f8b1 = 3: 8b5aebdb07a reftable: add a test that verifies that writing empty keys fails
4: ba036ee8543 ! 4: a9372cacd1b reftable: avoid writing empty keys at the block layer
@@ reftable/block.c: int block_reader_first_key(struct block_reader *br, struct str
if (n < 0)
return n;
+ if (!key->len)
-+ return -1;
++ return REFTABLE_FORMAT_ERROR;
return 0;
}
5: 2bd3d44ba57 = 5: 0b8a42399dd reftable: ensure that obj_id_len is >= 2 on writing
6: 82d36ee0e0d = 6: bdccd969475 reftable: add test for length of disambiguating prefix
7: c6ffdb3471c = 7: 72499a14e38 reftable: rename writer_stats to reftable_writer_stats
--
gitgitgadget
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v3 1/7] Documentation: object_id_len goes up to 31
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 2/7] reftable: reject 0 object_id_len Han-Wen Nienhuys via GitGitGadget
` (6 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The value is stored in a 5-bit field, so we can't support more without
a format version upgrade.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
Documentation/technical/reftable.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index d7c3b645cfb..6a67cc4174f 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -443,7 +443,7 @@ Obj block format
Object blocks are optional. Writers may choose to omit object blocks,
especially if readers will not use the object name to ref mapping.
-Object blocks use unique, abbreviated 2-32 object name keys, mapping to
+Object blocks use unique, abbreviated 2-31 byte object name keys, mapping to
ref blocks containing references pointing to that object directly, or as
the peeled value of an annotated tag. Like ref blocks, object blocks use
the file's standard block size. The abbreviation length is available in
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v3 2/7] reftable: reject 0 object_id_len
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 1/7] Documentation: object_id_len goes up to 31 Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 3/7] reftable: add a test that verifies that writing empty keys fails Han-Wen Nienhuys via GitGitGadget
` (5 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The spec says 2 <= object_id_len <= 31. We are lenient and allow 1,
but we forbid 0, so we can be sure that we never read a 0-length key.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/reader.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/reftable/reader.c b/reftable/reader.c
index 00906e7a2de..54b4025105c 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -155,6 +155,11 @@ static int parse_footer(struct reftable_reader *r, uint8_t *footer,
r->log_offsets.is_present = (first_block_typ == BLOCK_TYPE_LOG ||
r->log_offsets.offset > 0);
r->obj_offsets.is_present = r->obj_offsets.offset > 0;
+ if (r->obj_offsets.is_present && !r->object_id_len) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
+
err = 0;
done:
return err;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v3 3/7] reftable: add a test that verifies that writing empty keys fails
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 1/7] Documentation: object_id_len goes up to 31 Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 2/7] reftable: reject 0 object_id_len Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 4/7] reftable: avoid writing empty keys at the block layer Han-Wen Nienhuys via GitGitGadget
` (4 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
Empty keys can only be written as ref records with empty names. The
log record has a logical timestamp in the key, so the key is never
empty.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 605ba0f9fd4..fd5922e55f6 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -667,6 +667,29 @@ static void test_write_empty_table(void)
strbuf_release(&buf);
}
+static void test_write_empty_key(void)
+{
+ struct reftable_write_options opts = { 0 };
+ struct strbuf buf = STRBUF_INIT;
+ struct reftable_writer *w =
+ reftable_new_writer(&strbuf_add_void, &buf, &opts);
+ struct reftable_ref_record ref = {
+ .refname = "",
+ .update_index = 1,
+ .value_type = REFTABLE_REF_DELETION,
+ };
+ int err;
+
+ reftable_writer_set_limits(w, 1, 1);
+ err = reftable_writer_add_ref(w, &ref);
+ EXPECT(err == REFTABLE_API_ERROR);
+
+ err = reftable_writer_close(w);
+ EXPECT(err == REFTABLE_EMPTY_TABLE_ERROR);
+ reftable_writer_free(w);
+ strbuf_release(&buf);
+}
+
static void test_write_key_order(void)
{
struct reftable_write_options opts = { 0 };
@@ -746,6 +769,7 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_table_read_write_seek_index);
RUN_TEST(test_table_refs_for_no_index);
RUN_TEST(test_table_refs_for_obj_index);
+ RUN_TEST(test_write_empty_key);
RUN_TEST(test_write_empty_table);
RUN_TEST(test_log_overflow);
return 0;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v3 4/7] reftable: avoid writing empty keys at the block layer
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
` (2 preceding siblings ...)
2022-02-21 18:46 ` [PATCH v3 3/7] reftable: add a test that verifies that writing empty keys fails Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 5/7] reftable: ensure that obj_id_len is >= 2 on writing Han-Wen Nienhuys via GitGitGadget
` (3 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The public interface (reftable_writer) already ensures that keys are
written in strictly increasing order, and an empty key by definition
fails this check.
However, by also enforcing this at the block layer, it is easier to
verify that records (which are written into blocks) never have to
consider the possibility of empty keys.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/block.c | 27 +++++++++++++++++----------
reftable/block_test.c | 5 +++++
reftable/writer.c | 3 +--
3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/reftable/block.c b/reftable/block.c
index 2170748c5e9..34d4d073692 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -88,8 +88,9 @@ uint8_t block_writer_type(struct block_writer *bw)
return bw->buf[bw->header_off];
}
-/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
- success */
+/* Adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+ success. Returns REFTABLE_API_ERROR if attempting to write a record with
+ empty key. */
int block_writer_add(struct block_writer *w, struct reftable_record *rec)
{
struct strbuf empty = STRBUF_INIT;
@@ -105,8 +106,14 @@ int block_writer_add(struct block_writer *w, struct reftable_record *rec)
int is_restart = 0;
struct strbuf key = STRBUF_INIT;
int n = 0;
+ int err = -1;
reftable_record_key(rec, &key);
+ if (!key.len) {
+ err = REFTABLE_API_ERROR;
+ goto done;
+ }
+
n = reftable_encode_key(&is_restart, out, last, key,
reftable_record_val_type(rec));
if (n < 0)
@@ -118,16 +125,11 @@ int block_writer_add(struct block_writer *w, struct reftable_record *rec)
goto done;
string_view_consume(&out, n);
- if (block_writer_register_restart(w, start.len - out.len, is_restart,
- &key) < 0)
- goto done;
-
- strbuf_release(&key);
- return 0;
-
+ err = block_writer_register_restart(w, start.len - out.len, is_restart,
+ &key);
done:
strbuf_release(&key);
- return -1;
+ return err;
}
int block_writer_finish(struct block_writer *w)
@@ -332,6 +334,9 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
if (n < 0)
return -1;
+ if (!key.len)
+ return REFTABLE_FORMAT_ERROR;
+
string_view_consume(&in, n);
n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
if (n < 0)
@@ -358,6 +363,8 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
int n = reftable_decode_key(key, &extra, empty, in);
if (n < 0)
return n;
+ if (!key->len)
+ return REFTABLE_FORMAT_ERROR;
return 0;
}
diff --git a/reftable/block_test.c b/reftable/block_test.c
index fa2ee092ec0..cb88af4a563 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -42,6 +42,11 @@ static void test_block_read_write(void)
block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
header_off, hash_size(GIT_SHA1_FORMAT_ID));
+ rec.u.ref.refname = "";
+ rec.u.ref.value_type = REFTABLE_REF_DELETION;
+ n = block_writer_add(&bw, &rec);
+ EXPECT(n == REFTABLE_API_ERROR);
+
for (i = 0; i < N; i++) {
char name[100];
uint8_t hash[GIT_SHA1_RAWSZ];
diff --git a/reftable/writer.c b/reftable/writer.c
index 944c2329ab5..d54215a50dc 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -240,14 +240,13 @@ static int writer_add_record(struct reftable_writer *w,
writer_reinit_block_writer(w, reftable_record_type(rec));
err = block_writer_add(w->block_writer, rec);
- if (err < 0) {
+ if (err == -1) {
/* we are writing into memory, so an error can only mean it
* doesn't fit. */
err = REFTABLE_ENTRY_TOO_BIG_ERROR;
goto done;
}
- err = 0;
done:
strbuf_release(&key);
return err;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v3 5/7] reftable: ensure that obj_id_len is >= 2 on writing
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
` (3 preceding siblings ...)
2022-02-21 18:46 ` [PATCH v3 4/7] reftable: avoid writing empty keys at the block layer Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 6/7] reftable: add test for length of disambiguating prefix Han-Wen Nienhuys via GitGitGadget
` (2 subsequent siblings)
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
When writing the same hash many times, we might decide to use a
length-1 object ID prefix for the ObjectID => ref table, which is out
of spec.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 37 +++++++++++++++++++++++++++++++++++++
reftable/writer.c | 4 +++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index fd5922e55f6..35142eb070e 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -667,6 +667,42 @@ static void test_write_empty_table(void)
strbuf_release(&buf);
}
+static void test_write_object_id_min_length(void)
+{
+ struct reftable_write_options opts = {
+ .block_size = 75,
+ };
+ struct strbuf buf = STRBUF_INIT;
+ struct reftable_writer *w =
+ reftable_new_writer(&strbuf_add_void, &buf, &opts);
+ uint8_t hash[GIT_SHA1_RAWSZ] = {42};
+ struct reftable_ref_record ref = {
+ .update_index = 1,
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = hash,
+ };
+ int err;
+ int i;
+
+ reftable_writer_set_limits(w, 1, 1);
+
+ /* Write the same hash in many refs. If there is only 1 hash, the
+ * disambiguating prefix is length 0 */
+ for (i = 0; i < 256; i++) {
+ char name[256];
+ snprintf(name, sizeof(name), "ref%05d", i);
+ ref.refname = name;
+ err = reftable_writer_add_ref(w, &ref);
+ EXPECT_ERR(err);
+ }
+
+ err = reftable_writer_close(w);
+ EXPECT_ERR(err);
+ EXPECT(writer_stats(w)->object_id_len == 2);
+ reftable_writer_free(w);
+ strbuf_release(&buf);
+}
+
static void test_write_empty_key(void)
{
struct reftable_write_options opts = { 0 };
@@ -772,5 +808,6 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_write_empty_key);
RUN_TEST(test_write_empty_table);
RUN_TEST(test_log_overflow);
+ RUN_TEST(test_write_object_id_min_length);
return 0;
}
diff --git a/reftable/writer.c b/reftable/writer.c
index d54215a50dc..5e4e6e93416 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -515,7 +515,9 @@ static void object_record_free(void *void_arg, void *key)
static int writer_dump_object_index(struct reftable_writer *w)
{
struct write_record_arg closure = { .w = w };
- struct common_prefix_arg common = { NULL };
+ struct common_prefix_arg common = {
+ .max = 1, /* obj_id_len should be >= 2. */
+ };
if (w->obj_index_tree) {
infix_walk(w->obj_index_tree, &update_common, &common);
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v3 6/7] reftable: add test for length of disambiguating prefix
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
` (4 preceding siblings ...)
2022-02-21 18:46 ` [PATCH v3 5/7] reftable: ensure that obj_id_len is >= 2 on writing Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-21 18:46 ` [PATCH v3 7/7] reftable: rename writer_stats to reftable_writer_stats Han-Wen Nienhuys via GitGitGadget
2022-02-23 21:37 ` [PATCH v3 0/7] reftable: avoid reading and writing empty keys Junio C Hamano
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
The ID => ref map is trimming object IDs to a disambiguating prefix.
Check that we are computing their length correctly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 35142eb070e..a1b835785a3 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -703,6 +703,43 @@ static void test_write_object_id_min_length(void)
strbuf_release(&buf);
}
+static void test_write_object_id_length(void)
+{
+ struct reftable_write_options opts = {
+ .block_size = 75,
+ };
+ struct strbuf buf = STRBUF_INIT;
+ struct reftable_writer *w =
+ reftable_new_writer(&strbuf_add_void, &buf, &opts);
+ uint8_t hash[GIT_SHA1_RAWSZ] = {42};
+ struct reftable_ref_record ref = {
+ .update_index = 1,
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = hash,
+ };
+ int err;
+ int i;
+
+ reftable_writer_set_limits(w, 1, 1);
+
+ /* Write the same hash in many refs. If there is only 1 hash, the
+ * disambiguating prefix is length 0 */
+ for (i = 0; i < 256; i++) {
+ char name[256];
+ snprintf(name, sizeof(name), "ref%05d", i);
+ ref.refname = name;
+ ref.value.val1[15] = i;
+ err = reftable_writer_add_ref(w, &ref);
+ EXPECT_ERR(err);
+ }
+
+ err = reftable_writer_close(w);
+ EXPECT_ERR(err);
+ EXPECT(writer_stats(w)->object_id_len == 16);
+ reftable_writer_free(w);
+ strbuf_release(&buf);
+}
+
static void test_write_empty_key(void)
{
struct reftable_write_options opts = { 0 };
@@ -808,6 +845,7 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_write_empty_key);
RUN_TEST(test_write_empty_table);
RUN_TEST(test_log_overflow);
+ RUN_TEST(test_write_object_id_length);
RUN_TEST(test_write_object_id_min_length);
return 0;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v3 7/7] reftable: rename writer_stats to reftable_writer_stats
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
` (5 preceding siblings ...)
2022-02-21 18:46 ` [PATCH v3 6/7] reftable: add test for length of disambiguating prefix Han-Wen Nienhuys via GitGitGadget
@ 2022-02-21 18:46 ` Han-Wen Nienhuys via GitGitGadget
2022-02-23 21:37 ` [PATCH v3 0/7] reftable: avoid reading and writing empty keys Junio C Hamano
7 siblings, 0 replies; 33+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2022-02-21 18:46 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys, Han-Wen Nienhuys
From: Han-Wen Nienhuys <hanwen@google.com>
This function is part of the reftable API, so it should use the
reftable_ prefix
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
reftable/readwrite_test.c | 10 +++++-----
reftable/reftable-writer.h | 2 +-
reftable/writer.c | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index a1b835785a3..469ab79a5ad 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -100,7 +100,7 @@ static void write_table(char ***names, struct strbuf *buf, int N,
n = reftable_writer_close(w);
EXPECT(n == 0);
- stats = writer_stats(w);
+ stats = reftable_writer_stats(w);
for (i = 0; i < stats->ref_stats.blocks; i++) {
int off = i * opts.block_size;
if (off == 0) {
@@ -239,7 +239,7 @@ static void test_log_write_read(void)
n = reftable_writer_close(w);
EXPECT(n == 0);
- stats = writer_stats(w);
+ stats = reftable_writer_stats(w);
EXPECT(stats->log_stats.blocks > 0);
reftable_writer_free(w);
w = NULL;
@@ -330,7 +330,7 @@ static void test_log_zlib_corruption(void)
n = reftable_writer_close(w);
EXPECT(n == 0);
- stats = writer_stats(w);
+ stats = reftable_writer_stats(w);
EXPECT(stats->log_stats.blocks > 0);
reftable_writer_free(w);
w = NULL;
@@ -698,7 +698,7 @@ static void test_write_object_id_min_length(void)
err = reftable_writer_close(w);
EXPECT_ERR(err);
- EXPECT(writer_stats(w)->object_id_len == 2);
+ EXPECT(reftable_writer_stats(w)->object_id_len == 2);
reftable_writer_free(w);
strbuf_release(&buf);
}
@@ -735,7 +735,7 @@ static void test_write_object_id_length(void)
err = reftable_writer_close(w);
EXPECT_ERR(err);
- EXPECT(writer_stats(w)->object_id_len == 16);
+ EXPECT(reftable_writer_stats(w)->object_id_len == 16);
reftable_writer_free(w);
strbuf_release(&buf);
}
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index a560dc17255..db8de197f6c 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -143,7 +143,7 @@ int reftable_writer_close(struct reftable_writer *w);
This struct becomes invalid when the writer is freed.
*/
-const struct reftable_stats *writer_stats(struct reftable_writer *w);
+const struct reftable_stats *reftable_writer_stats(struct reftable_writer *w);
/* reftable_writer_free deallocates memory for the writer */
void reftable_writer_free(struct reftable_writer *w);
diff --git a/reftable/writer.c b/reftable/writer.c
index 5e4e6e93416..6d979e245ff 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -695,7 +695,7 @@ static int writer_flush_block(struct reftable_writer *w)
return writer_flush_nonempty_block(w);
}
-const struct reftable_stats *writer_stats(struct reftable_writer *w)
+const struct reftable_stats *reftable_writer_stats(struct reftable_writer *w)
{
return &w->stats;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v3 0/7] reftable: avoid reading and writing empty keys
2022-02-21 18:46 ` [PATCH v3 " Han-Wen Nienhuys via GitGitGadget
` (6 preceding siblings ...)
2022-02-21 18:46 ` [PATCH v3 7/7] reftable: rename writer_stats to reftable_writer_stats Han-Wen Nienhuys via GitGitGadget
@ 2022-02-23 21:37 ` Junio C Hamano
7 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2022-02-23 21:37 UTC (permalink / raw)
To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> this series makes sure that the object record does not have to consider
> empty keys (and therefore, a NULL memcpy destination)
>
> while we're at it add some more tests, and fix a naming mistake.
Looking good. Let's mark it for 'next' and below soonish.
Thanks.
>
> Han-Wen Nienhuys (7):
> Documentation: object_id_len goes up to 31
> reftable: reject 0 object_id_len
> reftable: add a test that verifies that writing empty keys fails
> reftable: avoid writing empty keys at the block layer
> reftable: ensure that obj_id_len is >= 2 on writing
> reftable: add test for length of disambiguating prefix
> reftable: rename writer_stats to reftable_writer_stats
>
> Documentation/technical/reftable.txt | 2 +-
> reftable/block.c | 27 ++++---
> reftable/block_test.c | 5 ++
> reftable/reader.c | 5 ++
> reftable/readwrite_test.c | 105 ++++++++++++++++++++++++++-
> reftable/reftable-writer.h | 2 +-
> reftable/writer.c | 9 ++-
> 7 files changed, 136 insertions(+), 19 deletions(-)
>
>
> base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1185%2Fhanwen%2Fobj-id-len-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1185/hanwen/obj-id-len-v3
> Pull-Request: https://github.com/git/git/pull/1185
>
> Range-diff vs v2:
>
> 1: 80d29e8f269 = 1: 80d29e8f269 Documentation: object_id_len goes up to 31
> 2: 4c1a19fc4ae ! 2: 68e7bc32ff8 reftable: reject 0 object_id_len
> @@ Commit message
> reftable: reject 0 object_id_len
>
> The spec says 2 <= object_id_len <= 31. We are lenient and allow 1,
> - but we forbid 0, so we can we can be sure that we never read a
> - 0-length key.
> + but we forbid 0, so we can be sure that we never read a 0-length key.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
>
> 3: 600b115f8b1 = 3: 8b5aebdb07a reftable: add a test that verifies that writing empty keys fails
> 4: ba036ee8543 ! 4: a9372cacd1b reftable: avoid writing empty keys at the block layer
> @@ reftable/block.c: int block_reader_first_key(struct block_reader *br, struct str
> if (n < 0)
> return n;
> + if (!key->len)
> -+ return -1;
> ++ return REFTABLE_FORMAT_ERROR;
>
> return 0;
> }
> 5: 2bd3d44ba57 = 5: 0b8a42399dd reftable: ensure that obj_id_len is >= 2 on writing
> 6: 82d36ee0e0d = 6: bdccd969475 reftable: add test for length of disambiguating prefix
> 7: c6ffdb3471c = 7: 72499a14e38 reftable: rename writer_stats to reftable_writer_stats
^ permalink raw reply [flat|nested] 33+ messages in thread