* [PATCH 0/3] null sha1 in trees @ 2012-07-28 15:01 Jeff King 2012-07-28 15:03 ` [PATCH 1/3] diff: do not use null sha1 as a sentinel value Jeff King ` (3 more replies) 0 siblings, 4 replies; 22+ messages in thread From: Jeff King @ 2012-07-28 15:01 UTC (permalink / raw) To: git; +Cc: Jens Lehmann, Junio C Hamano I recently came across a tree in the wild that had a submodule entry whose sha1 was the null sha1 (i.e., all-zeros). It triggered an interesting bug in the diff code, which is fixed by patch 1. Unfortunately, I have no clue how this tree came about. I'm assuming it was simply a bug somewhere in git, where the entry should have had another sha1, or possibly been removed from the tree entirely.. Patches 2 and 3 tighten up our checks for null sha1s in a few places, which might help detect such a bug earlier. I assume I have never seen this with a non-submodule entry because such a tree would fail the usual connectivity checks during fsck or during a transfer. However, since we don't enforce connectivity on submodule entries, nothing blocked the creation and propagation of such an entry. I'm not at liberty to share the repository in question, but if anybody has specific things to look for, I'd be happy to investigate further. The patches are: [1/3]: diff: do not use null sha1 as a sentinel value This is the actual bug-fix, and I hope is obviously a good thing to do. [2/3]: do not write null sha1s to on-disk index This one tries to tighten our writing a bit. There are unfortunately a lot of different code paths that create trees in git. I hope by catching the index write as a choke-point, we can prevent bugs from spreading. However, there are a lot of tree-writers that update an index in-core and then write a tree out directly. I would not be surprised if this does not catch the bug by itself, but I think it is a step in the right direction. [3/3]: fsck: detect null sha1 in tree entries And this one will at least let us notice the bug once it has happened. And if transfer.fsckObjects is set, it will prevent bogus trees from passing between repositories, containing any damage. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/3] diff: do not use null sha1 as a sentinel value 2012-07-28 15:01 [PATCH 0/3] null sha1 in trees Jeff King @ 2012-07-28 15:03 ` Jeff King 2012-07-28 15:05 ` [PATCH 2/3] do not write null sha1s to on-disk index Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 22+ messages in thread From: Jeff King @ 2012-07-28 15:03 UTC (permalink / raw) To: git; +Cc: Jens Lehmann, Junio C Hamano The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> --- builtin.h | 2 +- builtin/blame.c | 9 ++--- builtin/cat-file.c | 2 +- builtin/diff.c | 8 +++-- combine-diff.c | 4 +-- diff-lib.c | 20 ++++++----- diff-no-index.c | 2 +- diff.c | 16 +++++---- diff.h | 5 +++ diffcore-rename.c | 2 +- diffcore.h | 2 +- revision.c | 2 ++ t/t4054-diff-bogus-tree.sh | 83 ++++++++++++++++++++++++++++++++++++++++++++++ tree-diff.c | 8 ++--- 14 files changed, 134 insertions(+), 31 deletions(-) create mode 100755 t/t4054-diff-bogus-tree.sh diff --git a/builtin.h b/builtin.h index ba6626b..8e37752 100644 --- a/builtin.h +++ b/builtin.h @@ -43,7 +43,7 @@ extern int check_pager_config(const char *cmd); struct diff_options; extern void setup_diff_pager(struct diff_options *); -extern int textconv_object(const char *path, unsigned mode, const unsigned char *sha1, char **buf, unsigned long *buf_size); +extern int textconv_object(const char *path, unsigned mode, const unsigned char *sha1, int sha1_valid, char **buf, unsigned long *buf_size); extern int cmd_add(int argc, const char **argv, const char *prefix); extern int cmd_annotate(int argc, const char **argv, const char *prefix); diff --git a/builtin/blame.c b/builtin/blame.c index 0d50273..f2c48ef 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -110,6 +110,7 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b, long ctxlen, int textconv_object(const char *path, unsigned mode, const unsigned char *sha1, + int sha1_valid, char **buf, unsigned long *buf_size) { @@ -117,7 +118,7 @@ int textconv_object(const char *path, struct userdiff_driver *textconv; df = alloc_filespec(path); - fill_filespec(df, sha1, mode); + fill_filespec(df, sha1, sha1_valid, mode); textconv = get_textconv(df); if (!textconv) { free_filespec(df); @@ -142,7 +143,7 @@ static void fill_origin_blob(struct diff_options *opt, num_read_blob++; if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && - textconv_object(o->path, o->mode, o->blob_sha1, &file->ptr, &file_size)) + textconv_object(o->path, o->mode, o->blob_sha1, 1, &file->ptr, &file_size)) ; else file->ptr = read_sha1_file(o->blob_sha1, &type, &file_size); @@ -2123,7 +2124,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt, switch (st.st_mode & S_IFMT) { case S_IFREG: if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && - textconv_object(read_from, mode, null_sha1, &buf_ptr, &buf_len)) + textconv_object(read_from, mode, null_sha1, 0, &buf_ptr, &buf_len)) strbuf_attach(&buf, buf_ptr, buf_len, buf_len + 1); else if (strbuf_read_file(&buf, read_from, st.st_size) != st.st_size) die_errno("cannot open or read '%s'", read_from); @@ -2516,7 +2517,7 @@ parse_done: die("no such path %s in %s", path, final_commit_name); if (DIFF_OPT_TST(&sb.revs->diffopt, ALLOW_TEXTCONV) && - textconv_object(path, o->mode, o->blob_sha1, (char **) &sb.final_buf, + textconv_object(path, o->mode, o->blob_sha1, 1, (char **) &sb.final_buf, &sb.final_buf_size)) ; else diff --git a/builtin/cat-file.c b/builtin/cat-file.c index af74e77..0eca2d7 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -146,7 +146,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) die("git cat-file --textconv %s: <object> must be <sha1:path>", obj_name); - if (!textconv_object(obj_context.path, obj_context.mode, sha1, &buf, &size)) + if (!textconv_object(obj_context.path, obj_context.mode, sha1, 1, &buf, &size)) die("git cat-file --textconv: unable to run textconv on %s", obj_name); break; diff --git a/builtin/diff.c b/builtin/diff.c index da8f6aa..bf72298 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -29,6 +29,8 @@ static void stuff_change(struct diff_options *opt, unsigned old_mode, unsigned new_mode, const unsigned char *old_sha1, const unsigned char *new_sha1, + int old_sha1_valid, + int new_sha1_valid, const char *old_name, const char *new_name) { @@ -54,8 +56,8 @@ static void stuff_change(struct diff_options *opt, one = alloc_filespec(old_name); two = alloc_filespec(new_name); - fill_filespec(one, old_sha1, old_mode); - fill_filespec(two, new_sha1, new_mode); + fill_filespec(one, old_sha1, old_sha1_valid, old_mode); + fill_filespec(two, new_sha1, new_sha1_valid, new_mode); diff_queue(&diff_queued_diff, one, two); } @@ -84,6 +86,7 @@ static int builtin_diff_b_f(struct rev_info *revs, stuff_change(&revs->diffopt, blob[0].mode, canon_mode(st.st_mode), blob[0].sha1, null_sha1, + 1, 0, path, path); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); @@ -108,6 +111,7 @@ static int builtin_diff_blobs(struct rev_info *revs, stuff_change(&revs->diffopt, blob[0].mode, blob[1].mode, blob[0].sha1, blob[1].sha1, + 1, 1, blob[0].name, blob[1].name); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); diff --git a/combine-diff.c b/combine-diff.c index 9786680..bb1cc96 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -111,7 +111,7 @@ static char *grab_blob(const unsigned char *sha1, unsigned int mode, return xcalloc(1, 1); } else if (textconv) { struct diff_filespec *df = alloc_filespec(path); - fill_filespec(df, sha1, mode); + fill_filespec(df, sha1, 1, mode); *size = fill_textconv(textconv, df, &blob); free_filespec(df); } else { @@ -823,7 +823,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, &result_size, NULL, NULL); } else if (textconv) { struct diff_filespec *df = alloc_filespec(elem->path); - fill_filespec(df, null_sha1, st.st_mode); + fill_filespec(df, null_sha1, 0, st.st_mode); result_size = fill_textconv(textconv, df, &result); free_filespec(df); } else if (0 <= (fd = open(elem->path, O_RDONLY))) { diff --git a/diff-lib.c b/diff-lib.c index fc0dff3..f35de0f 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -206,7 +206,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) if (silent_on_removed) continue; diff_addremove(&revs->diffopt, '-', ce->ce_mode, - ce->sha1, ce->name, 0); + ce->sha1, !is_null_sha1(ce->sha1), + ce->name, 0); continue; } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, @@ -220,6 +221,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce_mode_from_stat(ce, st.st_mode); diff_change(&revs->diffopt, oldmode, newmode, ce->sha1, (changed ? null_sha1 : ce->sha1), + !is_null_sha1(ce->sha1), (changed ? 0 : !is_null_sha1(ce->sha1)), ce->name, 0, dirty_submodule); } @@ -236,11 +238,12 @@ int run_diff_files(struct rev_info *revs, unsigned int option) static void diff_index_show_file(struct rev_info *revs, const char *prefix, struct cache_entry *ce, - const unsigned char *sha1, unsigned int mode, + const unsigned char *sha1, int sha1_valid, + unsigned int mode, unsigned dirty_submodule) { diff_addremove(&revs->diffopt, prefix[0], mode, - sha1, ce->name, dirty_submodule); + sha1, sha1_valid, ce->name, dirty_submodule); } static int get_stat_data(struct cache_entry *ce, @@ -295,7 +298,7 @@ static void show_new_file(struct rev_info *revs, &dirty_submodule, &revs->diffopt) < 0) return; - diff_index_show_file(revs, "+", new, sha1, mode, dirty_submodule); + diff_index_show_file(revs, "+", new, sha1, !is_null_sha1(sha1), mode, dirty_submodule); } static int show_modified(struct rev_info *revs, @@ -312,7 +315,7 @@ static int show_modified(struct rev_info *revs, &dirty_submodule, &revs->diffopt) < 0) { if (report_missing) diff_index_show_file(revs, "-", old, - old->sha1, old->ce_mode, 0); + old->sha1, 1, old->ce_mode, 0); return -1; } @@ -347,7 +350,8 @@ static int show_modified(struct rev_info *revs, return 0; diff_change(&revs->diffopt, oldmode, mode, - old->sha1, sha1, old->name, 0, dirty_submodule); + old->sha1, sha1, 1, !is_null_sha1(sha1), + old->name, 0, dirty_submodule); return 0; } @@ -380,7 +384,7 @@ static void do_oneway_diff(struct unpack_trees_options *o, struct diff_filepair *pair; pair = diff_unmerge(&revs->diffopt, idx->name); if (tree) - fill_filespec(pair->one, tree->sha1, tree->ce_mode); + fill_filespec(pair->one, tree->sha1, 1, tree->ce_mode); return; } @@ -396,7 +400,7 @@ static void do_oneway_diff(struct unpack_trees_options *o, * Something removed from the tree? */ if (!idx) { - diff_index_show_file(revs, "-", tree, tree->sha1, tree->ce_mode, 0); + diff_index_show_file(revs, "-", tree, tree->sha1, 1, tree->ce_mode, 0); return; } diff --git a/diff-no-index.c b/diff-no-index.c index 7d805a0..0b46a0f 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -82,7 +82,7 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode) if (!name) name = "/dev/null"; s = alloc_filespec(name); - fill_filespec(s, null_sha1, mode); + fill_filespec(s, null_sha1, 0, mode); if (name == file_from_standard_input) populate_from_stdin(s); return s; diff --git a/diff.c b/diff.c index 62cbe14..15125e8 100644 --- a/diff.c +++ b/diff.c @@ -2541,12 +2541,12 @@ void free_filespec(struct diff_filespec *spec) } void fill_filespec(struct diff_filespec *spec, const unsigned char *sha1, - unsigned short mode) + int sha1_valid, unsigned short mode) { if (mode) { spec->mode = canon_mode(mode); hashcpy(spec->sha1, sha1); - spec->sha1_valid = !is_null_sha1(sha1); + spec->sha1_valid = sha1_valid; } } @@ -4693,6 +4693,7 @@ static int is_submodule_ignored(const char *path, struct diff_options *options) void diff_addremove(struct diff_options *options, int addremove, unsigned mode, const unsigned char *sha1, + int sha1_valid, const char *concatpath, unsigned dirty_submodule) { struct diff_filespec *one, *two; @@ -4724,9 +4725,9 @@ void diff_addremove(struct diff_options *options, two = alloc_filespec(concatpath); if (addremove != '+') - fill_filespec(one, sha1, mode); + fill_filespec(one, sha1, sha1_valid, mode); if (addremove != '-') { - fill_filespec(two, sha1, mode); + fill_filespec(two, sha1, sha1_valid, mode); two->dirty_submodule = dirty_submodule; } @@ -4739,6 +4740,7 @@ void diff_change(struct diff_options *options, unsigned old_mode, unsigned new_mode, const unsigned char *old_sha1, const unsigned char *new_sha1, + int old_sha1_valid, int new_sha1_valid, const char *concatpath, unsigned old_dirty_submodule, unsigned new_dirty_submodule) { @@ -4753,6 +4755,8 @@ void diff_change(struct diff_options *options, const unsigned char *tmp_c; tmp = old_mode; old_mode = new_mode; new_mode = tmp; tmp_c = old_sha1; old_sha1 = new_sha1; new_sha1 = tmp_c; + tmp = old_sha1_valid; old_sha1_valid = new_sha1_valid; + new_sha1_valid = tmp; tmp = old_dirty_submodule; old_dirty_submodule = new_dirty_submodule; new_dirty_submodule = tmp; } @@ -4763,8 +4767,8 @@ void diff_change(struct diff_options *options, one = alloc_filespec(concatpath); two = alloc_filespec(concatpath); - fill_filespec(one, old_sha1, old_mode); - fill_filespec(two, new_sha1, new_mode); + fill_filespec(one, old_sha1, old_sha1_valid, old_mode); + fill_filespec(two, new_sha1, new_sha1_valid, new_mode); one->dirty_submodule = old_dirty_submodule; two->dirty_submodule = new_dirty_submodule; diff --git a/diff.h b/diff.h index e027650..815dd7a 100644 --- a/diff.h +++ b/diff.h @@ -19,12 +19,14 @@ typedef void (*change_fn_t)(struct diff_options *options, unsigned old_mode, unsigned new_mode, const unsigned char *old_sha1, const unsigned char *new_sha1, + int old_sha1_valid, int new_sha1_valid, const char *fullpath, unsigned old_dirty_submodule, unsigned new_dirty_submodule); typedef void (*add_remove_fn_t)(struct diff_options *options, int addremove, unsigned mode, const unsigned char *sha1, + int sha1_valid, const char *fullpath, unsigned dirty_submodule); typedef void (*diff_format_fn_t)(struct diff_queue_struct *q, @@ -214,12 +216,15 @@ extern void diff_addremove(struct diff_options *, int addremove, unsigned mode, const unsigned char *sha1, + int sha1_valid, const char *fullpath, unsigned dirty_submodule); extern void diff_change(struct diff_options *, unsigned mode1, unsigned mode2, const unsigned char *sha1, const unsigned char *sha2, + int sha1_valid, + int sha2_valid, const char *fullpath, unsigned dirty_submodule1, unsigned dirty_submodule2); diff --git a/diffcore-rename.c b/diffcore-rename.c index 216a7a4..512d0ac 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -48,7 +48,7 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two, memmove(rename_dst + first + 1, rename_dst + first, (rename_dst_nr - first - 1) * sizeof(*rename_dst)); rename_dst[first].two = alloc_filespec(two->path); - fill_filespec(rename_dst[first].two, two->sha1, two->mode); + fill_filespec(rename_dst[first].two, two->sha1, two->sha1_valid, two->mode); rename_dst[first].pair = NULL; return &(rename_dst[first]); } diff --git a/diffcore.h b/diffcore.h index be0739c..1c16c85 100644 --- a/diffcore.h +++ b/diffcore.h @@ -55,7 +55,7 @@ struct diff_filespec { extern struct diff_filespec *alloc_filespec(const char *); extern void free_filespec(struct diff_filespec *); extern void fill_filespec(struct diff_filespec *, const unsigned char *, - unsigned short); + int, unsigned short); extern int diff_populate_filespec(struct diff_filespec *, int); extern void diff_free_filespec_data(struct diff_filespec *); diff --git a/revision.c b/revision.c index 9e8f47a..04c7de8 100644 --- a/revision.c +++ b/revision.c @@ -345,6 +345,7 @@ static int tree_difference = REV_TREE_SAME; static void file_add_remove(struct diff_options *options, int addremove, unsigned mode, const unsigned char *sha1, + int sha1_valid, const char *fullpath, unsigned dirty_submodule) { int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD; @@ -358,6 +359,7 @@ static void file_change(struct diff_options *options, unsigned old_mode, unsigned new_mode, const unsigned char *old_sha1, const unsigned char *new_sha1, + int old_sha1_valid, int new_sha1_valid, const char *fullpath, unsigned old_dirty_submodule, unsigned new_dirty_submodule) { diff --git a/t/t4054-diff-bogus-tree.sh b/t/t4054-diff-bogus-tree.sh new file mode 100755 index 0000000..0843c87 --- /dev/null +++ b/t/t4054-diff-bogus-tree.sh @@ -0,0 +1,83 @@ +#!/bin/sh + +test_description='test diff with a bogus tree containing the null sha1' +. ./test-lib.sh + +empty_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904 + +test_expect_success 'create bogus tree' ' + bogus_tree=$( + printf "100644 fooQQQQQQQQQQQQQQQQQQQQQ" | + q_to_nul | + git hash-object -w --stdin -t tree + ) +' + +test_expect_success 'create tree with matching file' ' + echo bar >foo && + git add foo && + good_tree=$(git write-tree) + blob=$(git rev-parse :foo) +' + +test_expect_success 'raw diff shows null sha1 (addition)' ' + echo ":000000 100644 $_z40 $_z40 A foo" >expect && + git diff-tree $empty_tree $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (removal)' ' + echo ":100644 000000 $_z40 $_z40 D foo" >expect && + git diff-tree $bogus_tree $empty_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (modification)' ' + echo ":100644 100644 $blob $_z40 M foo" >expect && + git diff-tree $good_tree $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (other direction)' ' + echo ":100644 100644 $_z40 $blob M foo" >expect && + git diff-tree $bogus_tree $good_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (reverse)' ' + echo ":100644 100644 $_z40 $blob M foo" >expect && + git diff-tree -R $good_tree $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (index)' ' + echo ":100644 100644 $_z40 $blob M foo" >expect && + git diff-index $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'patch fails due to bogus sha1 (addition)' ' + test_must_fail git diff-tree -p $empty_tree $bogus_tree +' + +test_expect_success 'patch fails due to bogus sha1 (removal)' ' + test_must_fail git diff-tree -p $bogus_tree $empty_tree +' + +test_expect_success 'patch fails due to bogus sha1 (modification)' ' + test_must_fail git diff-tree -p $good_tree $bogus_tree +' + +test_expect_success 'patch fails due to bogus sha1 (other direction)' ' + test_must_fail git diff-tree -p $bogus_tree $good_tree +' + +test_expect_success 'patch fails due to bogus sha1 (reverse)' ' + test_must_fail git diff-tree -R -p $good_tree $bogus_tree +' + +test_expect_success 'patch fails due to bogus sha1 (index)' ' + test_must_fail git diff-index -p $bogus_tree +' + +test_done diff --git a/tree-diff.c b/tree-diff.c index 28ad6db..7e54833 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -49,12 +49,12 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, if (DIFF_OPT_TST(opt, RECURSIVE) && S_ISDIR(mode1)) { if (DIFF_OPT_TST(opt, TREE_IN_RECURSIVE)) { opt->change(opt, mode1, mode2, - sha1, sha2, base->buf, 0, 0); + sha1, sha2, 1, 1, base->buf, 0, 0); } strbuf_addch(base, '/'); diff_tree_sha1(sha1, sha2, base->buf, opt); } else { - opt->change(opt, mode1, mode2, sha1, sha2, base->buf, 0, 0); + opt->change(opt, mode1, mode2, sha1, sha2, 1, 1, base->buf, 0, 0); } strbuf_setlen(base, old_baselen); return 0; @@ -100,7 +100,7 @@ static void show_entry(struct diff_options *opt, const char *prefix, die("corrupt tree sha %s", sha1_to_hex(sha1)); if (DIFF_OPT_TST(opt, TREE_IN_RECURSIVE)) - opt->add_remove(opt, *prefix, mode, sha1, base->buf, 0); + opt->add_remove(opt, *prefix, mode, sha1, 1, base->buf, 0); strbuf_addch(base, '/'); @@ -108,7 +108,7 @@ static void show_entry(struct diff_options *opt, const char *prefix, show_tree(opt, prefix, &inner, base); free(tree); } else - opt->add_remove(opt, prefix[0], mode, sha1, base->buf, 0); + opt->add_remove(opt, prefix[0], mode, sha1, 1, base->buf, 0); strbuf_setlen(base, old_baselen); } -- 1.7.11.3.42.g90758bf ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/3] do not write null sha1s to on-disk index 2012-07-28 15:01 [PATCH 0/3] null sha1 in trees Jeff King 2012-07-28 15:03 ` [PATCH 1/3] diff: do not use null sha1 as a sentinel value Jeff King @ 2012-07-28 15:05 ` Jeff King 2012-12-29 10:03 ` Jonathan Nieder 2012-07-28 15:06 ` [PATCH 3/3] fsck: detect null sha1 in tree entries Jeff King 2012-07-29 22:15 ` [PATCH 0/3] null sha1 in trees Junio C Hamano 3 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2012-07-28 15:05 UTC (permalink / raw) To: git; +Cc: Jens Lehmann, Junio C Hamano We should never need to write the null sha1 into an index entry (short of the 1 in 2^160 chance that somebody actually has content that hashes to it). If we attempt to do so, it is much more likely that it is a bug, since we use the null sha1 as a sentinel value to mean "not valid". The presence of null sha1s in the index (which can come from, among other things, "update-index --cacheinfo", or by reading a corrupted tree) can cause problems for later readers, because they cannot distinguish the literal null sha1 from its use a sentinel value. For example, "git diff-files" on such an entry would make it appear as if it is stat-dirty, and until recently, the diff code assumed such an entry meant that we should be diffing a working tree file rather than a blob. Ideally, we would stop such entries from entering even our in-core index. However, we do sometimes legitimately add entries with null sha1s in order to represent these sentinel situations; simply forbidding them in add_index_entry breaks a lot of the existing code. However, we can at least make sure that our in-core sentinel representation never makes it to disk. To be thorough, we will test an attempt to add both a blob and a submodule entry. In the former case, we might run into problems anyway because we will be missing the blob object. But in the latter case, we do not enforce connectivity across gitlink entries, making this our only point of enforcement. The current implementation does not care which type of entry we are seeing, but testing both cases helps future-proof the test suite in case that changes. Signed-off-by: Jeff King <peff@peff.net> --- I confess to not being clear on all of the instances in which a null sha1 might enter the in-core index. I did try modifying add_index_entry, but the breakage was pretty severe. I traced through a few instances, and it seemed to be mostly related to unmerged entries. read-cache.c | 2 ++ t/t2107-update-index-basic.sh | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/read-cache.c b/read-cache.c index 2f8159f..d2be78e 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd) continue; if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce)) ce_smudge_racily_clean_entry(ce); + if (is_null_sha1(ce->sha1)) + return error("cache entry has null sha1: %s", ce->name); if (ce_write_entry(&c, newfd, ce, previous_name) < 0) return -1; } diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh index 809fafe..0dbbb00 100755 --- a/t/t2107-update-index-basic.sh +++ b/t/t2107-update-index-basic.sh @@ -29,4 +29,23 @@ test_expect_success 'update-index -h with corrupt index' ' grep "[Uu]sage: git update-index" broken/usage ' +test_expect_success '--cacheinfo does not accept blob null sha1' ' + echo content >file && + git add file && + git rev-parse :file >expect && + test_must_fail git update-index --cacheinfo 100644 $_z40 file && + git rev-parse :file >actual && + test_cmp expect actual +' + +test_expect_success '--cacheinfo does not accept gitlink null sha1' ' + git init submodule && + (cd submodule && test_commit foo) && + git add submodule && + git rev-parse :submodule >expect && + test_must_fail git update-index --cacheinfo 160000 $_z40 submodule && + git rev-parse :submodule >actual && + test_cmp expect actual +' + test_done -- 1.7.11.3.42.g90758bf ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-07-28 15:05 ` [PATCH 2/3] do not write null sha1s to on-disk index Jeff King @ 2012-12-29 10:03 ` Jonathan Nieder 2012-12-29 10:27 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Jonathan Nieder @ 2012-12-29 10:03 UTC (permalink / raw) To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano Hi Peff, Jeff King wrote: > --- a/read-cache.c > +++ b/read-cache.c > @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd) > continue; > if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce)) > ce_smudge_racily_clean_entry(ce); > + if (is_null_sha1(ce->sha1)) > + return error("cache entry has null sha1: %s", ce->name); > if (ce_write_entry(&c, newfd, ce, previous_name) < 0) > return -1; Quick heads up: this is tripping for me in the "git am --abort" codepath: $ cd ~/src/linux $ git checkout v3.2.35 $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch Applying: ALSA: usb-audio: Fix missing autopm for MIDI input Using index info to reconstruct a base tree... Falling back to patching base and 3-way merge... error: Your local changes to the following files would be overwritten by merge: sound/usb/midi.c Please, commit your changes or stash them before you can merge. Aborting Failed to merge in the changes. Patch failed at 0001 ALSA: usb-audio: Fix missing autopm for MIDI input When you have resolved this problem run "git am --resolved". If you would prefer to skip this patch, instead run "git am --skip". To restore the original branch and stop patching run "git am --abort". $ git am --abort error: cache entry has null sha1: sound/usb/midi.c fatal: unable to write new index file Unstaged changes after reset: M sound/usb/midi.c $ Reproducible using v1.8.1-rc3 and master. Bisects to 4337b585 (do not write null sha1s to on-disk index, 2012-07-28). For comparison, older gits produced $ git am --abort Unstaged changes after reset: M sound/usb/midi.c which is also not great but at least didn't involve any obviously invalid behavior. Known problem? Thanks, Jonathan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-12-29 10:03 ` Jonathan Nieder @ 2012-12-29 10:27 ` Jeff King 2012-12-29 10:34 ` Jonathan Nieder 2012-12-29 10:40 ` [PATCH 2/3] do not write null sha1s to on-disk index Jonathan Nieder 0 siblings, 2 replies; 22+ messages in thread From: Jeff King @ 2012-12-29 10:27 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano On Sat, Dec 29, 2012 at 02:03:46AM -0800, Jonathan Nieder wrote: > > --- a/read-cache.c > > +++ b/read-cache.c > > @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd) > > continue; > > if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce)) > > ce_smudge_racily_clean_entry(ce); > > + if (is_null_sha1(ce->sha1)) > > + return error("cache entry has null sha1: %s", ce->name); > > if (ce_write_entry(&c, newfd, ce, previous_name) < 0) > > return -1; > > Quick heads up: this is tripping for me in the "git am --abort" > codepath: Thanks. The intent was that this should never happen, and we are protecting against bogus sha1s slipping into the index. So either we have found a bug in "am --abort", or the assumption that it should never happen was wrong. :) > $ cd ~/src/linux > $ git checkout v3.2.35 > $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch > Applying: ALSA: usb-audio: Fix missing autopm for MIDI input > Using index info to reconstruct a base tree... > Falling back to patching base and 3-way merge... > error: Your local changes to the following files would be overwritten by merge: > sound/usb/midi.c > Please, commit your changes or stash them before you can merge. > Aborting > Failed to merge in the changes. > Patch failed at 0001 ALSA: usb-audio: Fix missing autopm for MIDI input > When you have resolved this problem run "git am --resolved". > If you would prefer to skip this patch, instead run "git am --skip". > To restore the original branch and stop patching run "git am --abort". > $ git am --abort > error: cache entry has null sha1: sound/usb/midi.c > fatal: unable to write new index file > Unstaged changes after reset: > M sound/usb/midi.c > $ I can't reproduce here. I can checkout v3.2.35, and I guess that the patch you are applying comes from f5f1654, but I don't know your local modification to sound/usb/midi.c. If I just make a fake modification, I get this: $ git checkout v3.2.35 $ echo fake >sound/usb/midi.c $ git format-patch --stdout -1 f5f1654 | git am -3 ~/patch [same errors as you] $ git am --abort Unstaged changes after reset: M sound/usb/midi.c So I wonder if there is something in the way your working tree is modified. Can you give more details? > Reproducible using v1.8.1-rc3 and master. Bisects to 4337b585 (do not > write null sha1s to on-disk index, 2012-07-28). For comparison, older > gits produced > > $ git am --abort > Unstaged changes after reset: > M sound/usb/midi.c What does your index look like afterwards? Does it have a null sha1 in it (check "ls-files -s")? -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-12-29 10:27 ` Jeff King @ 2012-12-29 10:34 ` Jonathan Nieder 2012-12-29 10:42 ` Jeff King 2012-12-29 11:05 ` Jeff King 2012-12-29 10:40 ` [PATCH 2/3] do not write null sha1s to on-disk index Jonathan Nieder 1 sibling, 2 replies; 22+ messages in thread From: Jonathan Nieder @ 2012-12-29 10:34 UTC (permalink / raw) To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano Jeff King wrote: > I can't reproduce here. I can checkout v3.2.35, and I guess that the > patch you are applying comes from f5f1654, but I don't know your > local modification to sound/usb/midi.c. No local modification. The unstaged change after "git am --abort" to recover from a conflicted git am is a longstanding bug (at least a couple of years old). The patch creating conflicts is queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git [...] >> $ git am --abort >> Unstaged changes after reset: >> M sound/usb/midi.c > > What does your index look like afterwards? Does it have a null sha1 in > it (check "ls-files -s")? $ git diff-index --abbrev HEAD :100644 100644 eeefbce3873c... 000000000000... M sound/usb/midi.c $ git ls-files --abbrev -s sound/usb/midi.c 100644 eeefbce3873c 0 sound/usb/midi.c ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-12-29 10:34 ` Jonathan Nieder @ 2012-12-29 10:42 ` Jeff King 2012-12-29 10:51 ` Jonathan Nieder 2012-12-29 11:05 ` Jeff King 1 sibling, 1 reply; 22+ messages in thread From: Jeff King @ 2012-12-29 10:42 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano On Sat, Dec 29, 2012 at 02:34:30AM -0800, Jonathan Nieder wrote: > Jeff King wrote: > > > I can't reproduce here. I can checkout v3.2.35, and I guess that the > > patch you are applying comes from f5f1654, but I don't know your > > local modification to sound/usb/midi.c. > > No local modification. The unstaged change after "git am --abort" to > recover from a conflicted git am is a longstanding bug (at least a > couple of years old). > > The patch creating conflicts is > > queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch > > from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git Hrm. But your output does not say there is a conflict. It says you have a local modification and it does not try the merge: > $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch > Applying: ALSA: usb-audio: Fix missing autopm for MIDI input > Using index info to reconstruct a base tree... > Falling back to patching base and 3-way merge... > error: Your local changes to the following files would be overwritten by merge: > sound/usb/midi.c > Please, commit your changes or stash them before you can merge. > Aborting If I try to apply it, I get a real conflict: $ git checkout v3.2.35 $ git am -3sc linux-3.2.y-queue/queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch Applying: ALSA: usb-audio: Fix missing autopm for MIDI input Using index info to reconstruct a base tree... M sound/usb/midi.c Falling back to patching base and 3-way merge... Auto-merging sound/usb/midi.c CONFLICT (content): Merge conflict in sound/usb/midi.c Although running "git am --abort" after that does seem to produce the "cache entry has null sha1" error. So I can start investigating from there. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-12-29 10:42 ` Jeff King @ 2012-12-29 10:51 ` Jonathan Nieder 0 siblings, 0 replies; 22+ messages in thread From: Jonathan Nieder @ 2012-12-29 10:51 UTC (permalink / raw) To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano Jeff King wrote: > Hrm. But your output does not say there is a conflict. It says you have > a local modification and it does not try the merge: That's probably operator error on my part when gathering output to paste into the email. In other words, nothing to see there. :) Sorry for the confusion. [...] > If I try to apply it, I get a real conflict: [...] > Although running "git am --abort" after that does seem to produce the > "cache entry has null sha1" error. Yep, that is what my report should have said. 'night, Jonathan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-12-29 10:34 ` Jonathan Nieder 2012-12-29 10:42 ` Jeff King @ 2012-12-29 11:05 ` Jeff King 2012-12-29 20:51 ` [BUG] two-way read-tree can write null sha1s into index Jeff King 1 sibling, 1 reply; 22+ messages in thread From: Jeff King @ 2012-12-29 11:05 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano On Sat, Dec 29, 2012 at 02:34:30AM -0800, Jonathan Nieder wrote: > >> $ git am --abort > >> Unstaged changes after reset: > >> M sound/usb/midi.c > > > > What does your index look like afterwards? Does it have a null sha1 in > > it (check "ls-files -s")? > > $ git diff-index --abbrev HEAD > :100644 100644 eeefbce3873c... 000000000000... M sound/usb/midi.c > $ git ls-files --abbrev -s sound/usb/midi.c > 100644 eeefbce3873c 0 sound/usb/midi.c Hmm. It looks like "am --abort" overwrites the index again after the read-tree which complains. If I downgrade the error in write_index to a warning, like this: diff --git a/read-cache.c b/read-cache.c index fda78bc..70a6d86 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1797,7 +1797,7 @@ int write_index(struct index_state *istate, int newfd) if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce)) ce_smudge_racily_clean_entry(ce); if (is_null_sha1(ce->sha1)) - return error("cache entry has null sha1: %s", ce->name); + warning("cache entry has null sha1: %s", ce->name); if (ce_write_entry(&c, newfd, ce, previous_name) < 0) return -1; } and then just run this: [clear state from last run] $ rm -rf .git/rebase-apply $ git reset --hard [apply the patch; we get a conflict] $ git am -3sc queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch [now run just the read-tree from "am --abort"] $ git.compile read-tree --reset -u HEAD ORIG_HEAD warning: cache entry has null sha1: sound/usb/midi.c [and now check our index] $ git ls-files -s sound/usb/midi.c 100644 0000000000000000000000000000000000000000 0 sound/usb/midi.c [yes, this index is bogus] $ git write-tree error: invalid object 100644 0000000000000000000000000000000000000000 for 'sound/usb/midi.c' fatal: git-write-tree: error building trees So I think this check may actually be finding a real bug. I have seen these null sha1s in the wild, but I was never able to track down the actual cause. Maybe this will give us a clue. Now we just need to work backwards and figure out who is putting it in the in-memory index and why. I'll try to work on it tomorrow, but please don't let that stop you if you want to keep digging in the meantime. -Peff ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [BUG] two-way read-tree can write null sha1s into index 2012-12-29 11:05 ` Jeff King @ 2012-12-29 20:51 ` Jeff King 2013-01-01 22:24 ` Junio C Hamano 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2012-12-29 20:51 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Junio C Hamano On Sat, Dec 29, 2012 at 06:05:41AM -0500, Jeff King wrote: > [clear state from last run] > $ rm -rf .git/rebase-apply > $ git reset --hard > > [apply the patch; we get a conflict] > $ git am -3sc queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch > > [now run just the read-tree from "am --abort"] > $ git.compile read-tree --reset -u HEAD ORIG_HEAD > warning: cache entry has null sha1: sound/usb/midi.c > > [and now check our index] > $ git ls-files -s sound/usb/midi.c > 100644 0000000000000000000000000000000000000000 0 sound/usb/midi.c > > [yes, this index is bogus] > $ git write-tree > error: invalid object 100644 0000000000000000000000000000000000000000 for 'sound/usb/midi.c' > fatal: git-write-tree: error building trees > > So I think this check may actually be finding a real bug. I have seen > these null sha1s in the wild, but I was never able to track down the > actual cause. Maybe this will give us a clue. Now we just need to work > backwards and figure out who is putting it in the in-memory index and > why. I made some progress on this, but I'd like a sanity check from others (especially Junio). As far as I can tell, this is a bug in read-tree. When we call "read-tree --reset -u HEAD ORIG_HEAD", the first thing we do with the index is call read_cache_unmerged. Originally that would read the index, leaving aside any unmerged entries. However, as of d1a43f2 (reset --hard/read-tree --reset -u: remove unmerged new paths, 2008-10-15), it actually creates a new cache entry. This serves as a placeholder, so that we later know to update the working tree. However, we later noticed that the sha1 of that unmerged entry was just copied from some higher stage, leaving you with random content in the index. That was fixed by e11d7b5 ("reset --merge": fix unmerged case, 2009-12-31), which instead puts the null sha1 into the newly created entry, and sets a CE_CONFLICTED flag. At the same time, it teaches the unpack-trees machinery to pay attention to this flag, so that oneway_merge throws away the current value. However, it did not update the code paths for twoway_merge, which is where we end up in the read-tree above. We notice that the HEAD and ORIG_HEAD versions are the same, and say "oh, we can just reuse the current version". But that's not true. The current version is bogus. So I think we need to update twoway_merge to recognize unmerged entries, which gives us two options: 1. Reject the merge. 2. Throw away the current unmerged entry in favor of the "new" entry (when old and new are the same, of course; otherwise we would reject). I think (2) is the right thing. It fixes the entry of the bogus sha1 into the index, _and_ it solves the problem that "git am --abort" leaves the conflicted entry as a modification. It should just go away. But maybe I am forgetting some other case where read-tree should be more conservative, and (1) is a safer choice. Something like this patch: diff --git a/unpack-trees.c b/unpack-trees.c index 6d96366..e06e01f 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1746,14 +1746,19 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o) newtree = NULL; if (current) { - if ((!oldtree && !newtree) || /* 4 and 5 */ - (!oldtree && newtree && - same(current, newtree)) || /* 6 and 7 */ - (oldtree && newtree && - same(oldtree, newtree)) || /* 14 and 15 */ - (oldtree && newtree && - !same(oldtree, newtree) && /* 18 and 19 */ - same(current, newtree))) { + if (current->ce_flags & CE_CONFLICTED) { + if (same(oldtree, newtree)) + return merged_entry(newtree, current, o); + return o->gently ? -1 : reject_merge(current, o); + } + else if ((!oldtree && !newtree) || /* 4 and 5 */ + (!oldtree && newtree && + same(current, newtree)) || /* 6 and 7 */ + (oldtree && newtree && + same(oldtree, newtree)) || /* 14 and 15 */ + (oldtree && newtree && + !same(oldtree, newtree) && /* 18 and 19 */ + same(current, newtree))) { return keep_entry(current, o); } else if (oldtree && !newtree && same(current, oldtree)) { I suspect threeway_merge may need a similar update, but I haven't looked too carefully yet. -Peff ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2012-12-29 20:51 ` [BUG] two-way read-tree can write null sha1s into index Jeff King @ 2013-01-01 22:24 ` Junio C Hamano 2013-01-03 8:37 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2013-01-01 22:24 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git Jeff King <peff@peff.net> writes: > So I think we need to update twoway_merge to recognize unmerged entries, > which gives us two options: > > 1. Reject the merge. > > 2. Throw away the current unmerged entry in favor of the "new" entry > (when old and new are the same, of course; otherwise we would > reject). > > I think (2) is the right thing. As "--reset" in "read-tree --reset -u A B" is a way to say "we know we are based on A and we want to go to B no matter what", I agree we should not reject the merge. With -m instead of --reset, I am not sure what the right semantics should be, though. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-01 22:24 ` Junio C Hamano @ 2013-01-03 8:37 ` Jeff King 2013-01-03 15:34 ` Junio C Hamano 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2013-01-03 8:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git On Tue, Jan 01, 2013 at 02:24:46PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > So I think we need to update twoway_merge to recognize unmerged entries, > > which gives us two options: > > > > 1. Reject the merge. > > > > 2. Throw away the current unmerged entry in favor of the "new" entry > > (when old and new are the same, of course; otherwise we would > > reject). > > > > I think (2) is the right thing. > > As "--reset" in "read-tree --reset -u A B" is a way to say "we know > we are based on A and we want to go to B no matter what", I agree we > should not reject the merge. > > With -m instead of --reset, I am not sure what the right semantics > should be, though. Good point; I was just thinking about the --reset case. With "-m", though, we could in theory carry over the unmerged entries (again, assuming that "old" and "new" are the same; otherwise it is an obvious reject). But those entries would be confused with any new unmerged entries we create. It seems we already protect against this, though: "read-tree -m" will not run at all if you have unmerged entries. Likewise, "checkout" seems to have similar protections. So I think it may be a non-issue. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-03 8:37 ` Jeff King @ 2013-01-03 15:34 ` Junio C Hamano 2013-01-03 20:23 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2013-01-03 15:34 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git Jeff King <peff@peff.net> writes: > On Tue, Jan 01, 2013 at 02:24:46PM -0800, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> > So I think we need to update twoway_merge to recognize unmerged entries, >> > which gives us two options: >> > >> > 1. Reject the merge. >> > >> > 2. Throw away the current unmerged entry in favor of the "new" entry >> > (when old and new are the same, of course; otherwise we would >> > reject). >> > >> > I think (2) is the right thing. >> >> As "--reset" in "read-tree --reset -u A B" is a way to say "we know >> we are based on A and we want to go to B no matter what", I agree we >> should not reject the merge. >> >> With -m instead of --reset, I am not sure what the right semantics >> should be, though. > > Good point; I was just thinking about the --reset case. > > With "-m", though, we could in theory carry over the unmerged entries > (again, assuming that "old" and "new" are the same; otherwise it is an > obvious reject). But those entries would be confused with any new > unmerged entries we create. It seems we already protect against this, > though: "read-tree -m" will not run at all if you have unmerged entries. > > Likewise, "checkout" seems to have similar protections. > > So I think it may be a non-issue. Yeah. Also earlier in the thread you mentioned three-way case, but I do not think we ever would want --reset with three trees, so I think that too is a non-issue for the same reason. I would still feel safer if we expressed the expectation of the callee in the code, perhaps like this in the two-way case: if (current->ce_flags & CE_CONFLICTED) { if (!o->reset) { ... either die or fail ... } else { ... your fix ... } } ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-03 15:34 ` Junio C Hamano @ 2013-01-03 20:23 ` Jeff King 2013-01-03 20:34 ` Junio C Hamano 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2013-01-03 20:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git On Thu, Jan 03, 2013 at 07:34:53AM -0800, Junio C Hamano wrote: > > Good point; I was just thinking about the --reset case. > > > > With "-m", though, we could in theory carry over the unmerged entries > > (again, assuming that "old" and "new" are the same; otherwise it is an > > obvious reject). But those entries would be confused with any new > > unmerged entries we create. It seems we already protect against this, > > though: "read-tree -m" will not run at all if you have unmerged entries. > > > > Likewise, "checkout" seems to have similar protections. > > > > So I think it may be a non-issue. > > Yeah. Also earlier in the thread you mentioned three-way case, but > I do not think we ever would want --reset with three trees, so I > think that too is a non-issue for the same reason. Yeah, agreed; we should always reject in the three-way case. I would worry more that it has a bug where something is _not_ rejected, and we end up putting a bogus null sha1 entry into the index (which is the actual problem with twoway_merge). IOW, if we have the bogus sha1 in the index (because we marked it with CE_CONFLICTED), and the two sides and the common ancestor are all the same, would we blindly carry through the bogus conflicted entry (which we would prefer, because it has the up-to-date stat information)? Or are you suggesting that the three-way case should always be protected by checking that there are no unmerged entries before we start it? That seems sane to me, but I haven't confirmed that that is the case. > I would still feel safer if we expressed the expectation of > the callee in the code, perhaps like this in the two-way case: > > if (current->ce_flags & CE_CONFLICTED) { > if (!o->reset) { > ... either die or fail ... > } else { > ... your fix ... > } > } Agreed. I looked at that, but it seemed like it was going to involve repeating a lot of the "are the two trees the same" logic. Let me see if I can refactor it to avoid that. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-03 20:23 ` Jeff King @ 2013-01-03 20:34 ` Junio C Hamano 2013-01-03 20:36 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2013-01-03 20:34 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git Jeff King <peff@peff.net> writes: > Or are you suggesting that the three-way case should always be protected > by checking that there are no unmerged entries before we start it? That > seems sane to me, but I haven't confirmed that that is the case. I think the normal (and hopefully only) "-m -u O A B" use case of threeway is the bog-standard "git merge", which requires even more: your index must exactly match HEAD, even though you are allowed to have local changes in the working tree. That requirement is not likely to change, as cleanly merged paths are automatically added to the index, so "diff --cached" should show only the changes from cleanly merged part, while "diff" should show paths that still needs user's help. If we allowed local modification to the index (let alone conflicted entries in it) before the merge begins, the users would not be able to tell which paths are in what state after a half-merge stops and asks for help. Updated paths may not have anything to do with the merge (i.e. earlier "git add" before the merge started), conflicting paths may not have anything to do with the merge (i.e. leftover conflicts before the merge started). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-03 20:34 ` Junio C Hamano @ 2013-01-03 20:36 ` Jeff King 2013-01-03 22:33 ` Junio C Hamano 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2013-01-03 20:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git On Thu, Jan 03, 2013 at 12:34:27PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Or are you suggesting that the three-way case should always be protected > > by checking that there are no unmerged entries before we start it? That > > seems sane to me, but I haven't confirmed that that is the case. > > I think the normal (and hopefully only) "-m -u O A B" use case of > threeway is the bog-standard "git merge", which requires even more: > your index must exactly match HEAD, even though you are allowed to > have local changes in the working tree. > > That requirement is not likely to change, as cleanly merged paths > are automatically added to the index, so "diff --cached" should show > only the changes from cleanly merged part, while "diff" should show > paths that still needs user's help. > > If we allowed local modification to the index (let alone conflicted > entries in it) before the merge begins, the users would not be able > to tell which paths are in what state after a half-merge stops and > asks for help. Updated paths may not have anything to do with the > merge (i.e. earlier "git add" before the merge started), conflicting > paths may not have anything to do with the merge (i.e. leftover > conflicts before the merge started). Oh, I agree it's insane to try to carry through unmerged entries. I'm just concerned that not all code paths are careful enough to check. They probably are, though, as a null sha1 in your index would be the least of your worries, and somebody would have noticed in the last 7 years. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-03 20:36 ` Jeff King @ 2013-01-03 22:33 ` Junio C Hamano 2013-01-07 13:46 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2013-01-03 22:33 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git Jeff King <peff@peff.net> writes: > Oh, I agree it's insane to try to carry through unmerged entries. I'm > just concerned that not all code paths are careful enough to check. I would actually be surprised if some code path do assume somebody might give them an index with conflicting entries in it and guard against it. We have been coding under the "index must exactly match the second tree when three-way unpack_trees() begin" requirement since day one. An conflicted entry will appear as "index and HEAD not matching" and will cause reject_merge() early in threeway_merge() anyway, no? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [BUG] two-way read-tree can write null sha1s into index 2013-01-03 22:33 ` Junio C Hamano @ 2013-01-07 13:46 ` Jeff King 0 siblings, 0 replies; 22+ messages in thread From: Jeff King @ 2013-01-07 13:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git On Thu, Jan 03, 2013 at 02:33:09PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Oh, I agree it's insane to try to carry through unmerged entries. I'm > > just concerned that not all code paths are careful enough to check. > > I would actually be surprised if some code path do assume somebody > might give them an index with conflicting entries in it and guard > against it. We have been coding under the "index must exactly match > the second tree when three-way unpack_trees() begin" requirement > since day one. An conflicted entry will appear as "index and HEAD > not matching" and will cause reject_merge() early in threeway_merge() > anyway, no? Hmm. There is code in threeway_merge to handle a few cases: /* * We start with cases where the index is allowed to match * something other than the head: #14(ALT) and #2ALT, where it * is permitted to match the result instead. */ /* #14, #14ALT, #2ALT */ if (remote && !df_conflict_head && head_match && !remote_match) { if (index && !same(index, remote) && !same(index, head)) return o->gently ? -1 : reject_merge(index, o); return merged_entry(remote, index, o); } but I do not think we have to worry, because we know that the index will never match remote, either (and merged_entry has already been taught to be wary of the conflicted bit, anyway). I'm not entirely clear on how that would get triggered if all of the callers avoid operating on a modified index. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] do not write null sha1s to on-disk index 2012-12-29 10:27 ` Jeff King 2012-12-29 10:34 ` Jonathan Nieder @ 2012-12-29 10:40 ` Jonathan Nieder 1 sibling, 0 replies; 22+ messages in thread From: Jonathan Nieder @ 2012-12-29 10:40 UTC (permalink / raw) To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano Jeff King wrote: > Can you give more details? $ GIT_TRACE=1 git am --abort trace: exec: 'git-am' '--abort' trace: run_command: 'git-am' '--abort' trace: built-in: git 'rev-parse' '--parseopt' '--' '--abort' trace: built-in: git 'rev-parse' '--git-dir' trace: built-in: git 'rev-parse' '--show-prefix' trace: built-in: git 'rev-parse' '--show-toplevel' trace: built-in: git 'var' 'GIT_COMMITTER_IDENT' trace: built-in: git 'rev-parse' '--verify' '-q' 'HEAD' trace: built-in: git 'config' '--bool' '--get' 'am.keepcr' trace: built-in: git 'rerere' 'clear' trace: built-in: git 'rev-parse' '--verify' '-q' 'HEAD' trace: built-in: git 'read-tree' '--reset' '-u' 'HEAD' 'ORIG_HEAD' error: cache entry has null sha1: sound/usb/midi.c fatal: unable to write new index file trace: built-in: git 'reset' 'ORIG_HEAD' Unstaged changes after reset: M sound/usb/midi.c ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 3/3] fsck: detect null sha1 in tree entries 2012-07-28 15:01 [PATCH 0/3] null sha1 in trees Jeff King 2012-07-28 15:03 ` [PATCH 1/3] diff: do not use null sha1 as a sentinel value Jeff King 2012-07-28 15:05 ` [PATCH 2/3] do not write null sha1s to on-disk index Jeff King @ 2012-07-28 15:06 ` Jeff King 2012-07-29 22:15 ` [PATCH 0/3] null sha1 in trees Junio C Hamano 3 siblings, 0 replies; 22+ messages in thread From: Jeff King @ 2012-07-28 15:06 UTC (permalink / raw) To: git; +Cc: Jens Lehmann, Junio C Hamano Short of somebody happening to beat the 1 in 2^160 odds of actually generating content that hashes to the null sha1, we should never see this value in a tree entry. So let's have fsck warn if it it seen. As in the previous commit, we test both blob and submodule entries to future-proof the test suite against the implementation depending on connectivity to notice the error. Signed-off-by: Jeff King <peff@peff.net> --- I left this as a warning, because git can still mostly handle such trees, which may aid in debugging or salvaging data. fsck.c | 8 +++++++- t/t1450-fsck.sh | 26 ++++++++++++++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/fsck.c b/fsck.c index 4c63b2c..7395ef6 100644 --- a/fsck.c +++ b/fsck.c @@ -139,6 +139,7 @@ static int verify_ordered(unsigned mode1, const char *name1, unsigned mode2, con static int fsck_tree(struct tree *item, int strict, fsck_error error_func) { int retval; + int has_null_sha1 = 0; int has_full_path = 0; int has_empty_name = 0; int has_zero_pad = 0; @@ -157,9 +158,12 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func) while (desc.size) { unsigned mode; const char *name; + const unsigned char *sha1; - tree_entry_extract(&desc, &name, &mode); + sha1 = tree_entry_extract(&desc, &name, &mode); + if (is_null_sha1(sha1)) + has_null_sha1 = 1; if (strchr(name, '/')) has_full_path = 1; if (!*name) @@ -207,6 +211,8 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func) } retval = 0; + if (has_null_sha1) + retval += error_func(&item->object, FSCK_WARN, "contains entries pointing to null sha1"); if (has_full_path) retval += error_func(&item->object, FSCK_WARN, "contains full pathnames"); if (has_empty_name) diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 5b79c51..bf7a2cd 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -213,4 +213,30 @@ test_expect_success 'rev-list --verify-objects with bad sha1' ' grep -q "error: sha1 mismatch 63ffffffffffffffffffffffffffffffffffffff" out ' +_bz='\0' +_bz5="$_bz$_bz$_bz$_bz$_bz" +_bz20="$_bz5$_bz5$_bz5$_bz5" + +test_expect_success 'fsck notices blob entry pointing to null sha1' ' + (git init null-blob && + cd null-blob && + sha=$(printf "100644 file$_bz$_bz20" | + git hash-object -w --stdin -t tree) && + git fsck 2>out && + cat out && + grep "warning.*null sha1" out + ) +' + +test_expect_success 'fsck notices submodule entry pointing to null sha1' ' + (git init null-commit && + cd null-commit && + sha=$(printf "160000 submodule$_bz$_bz20" | + git hash-object -w --stdin -t tree) && + git fsck 2>out && + cat out && + grep "warning.*null sha1" out + ) +' + test_done -- 1.7.11.3.42.g90758bf ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/3] null sha1 in trees 2012-07-28 15:01 [PATCH 0/3] null sha1 in trees Jeff King ` (2 preceding siblings ...) 2012-07-28 15:06 ` [PATCH 3/3] fsck: detect null sha1 in tree entries Jeff King @ 2012-07-29 22:15 ` Junio C Hamano 2012-07-30 15:42 ` Jeff King 3 siblings, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2012-07-29 22:15 UTC (permalink / raw) To: Jeff King; +Cc: git, Jens Lehmann All looked reasonable, even though I'd want to read the surrounding codepath over for 2/3 a few more times. Will queue; thanks. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/3] null sha1 in trees 2012-07-29 22:15 ` [PATCH 0/3] null sha1 in trees Junio C Hamano @ 2012-07-30 15:42 ` Jeff King 0 siblings, 0 replies; 22+ messages in thread From: Jeff King @ 2012-07-30 15:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jens Lehmann On Sun, Jul 29, 2012 at 03:15:11PM -0700, Junio C Hamano wrote: > All looked reasonable, even though I'd want to read the > surrounding codepath over for 2/3 a few more times. > > Will queue; thanks. Yeah, 2/3 is the one that gives me the most pause. I originally assumed we would never want to put a null-sha1 entry into the in-core index, but that turned out to be very wrong. So I softened my assumption to "we would never want to put a null-sha1 entry on disk". Which seems like the right thing to me and passes the test suite, but there might be an unexercised code path in there. In an ideal world, I would trace each code path that inserts a cache entry all the way back to its original sha1, but when I tried it turned out too complex. The educated-guess-and-see-if-it-passes-tests approach to coding does not make me happy, but I don't see a good alternative. Patch 1 is a pure bug-fix, and it would be nice for it to make it onto a maint- track or into 1.7.12 (though I can't guarantee that I got it 100% right, I am confident that it is at least trying to do the right thing, and any fixes would be incremental on top). The other two should be fine to cook longer and go in post-1.7.12. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2013-01-07 13:47 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-07-28 15:01 [PATCH 0/3] null sha1 in trees Jeff King 2012-07-28 15:03 ` [PATCH 1/3] diff: do not use null sha1 as a sentinel value Jeff King 2012-07-28 15:05 ` [PATCH 2/3] do not write null sha1s to on-disk index Jeff King 2012-12-29 10:03 ` Jonathan Nieder 2012-12-29 10:27 ` Jeff King 2012-12-29 10:34 ` Jonathan Nieder 2012-12-29 10:42 ` Jeff King 2012-12-29 10:51 ` Jonathan Nieder 2012-12-29 11:05 ` Jeff King 2012-12-29 20:51 ` [BUG] two-way read-tree can write null sha1s into index Jeff King 2013-01-01 22:24 ` Junio C Hamano 2013-01-03 8:37 ` Jeff King 2013-01-03 15:34 ` Junio C Hamano 2013-01-03 20:23 ` Jeff King 2013-01-03 20:34 ` Junio C Hamano 2013-01-03 20:36 ` Jeff King 2013-01-03 22:33 ` Junio C Hamano 2013-01-07 13:46 ` Jeff King 2012-12-29 10:40 ` [PATCH 2/3] do not write null sha1s to on-disk index Jonathan Nieder 2012-07-28 15:06 ` [PATCH 3/3] fsck: detect null sha1 in tree entries Jeff King 2012-07-29 22:15 ` [PATCH 0/3] null sha1 in trees Junio C Hamano 2012-07-30 15:42 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).