* [BUG] segmentation fault in git-diff
From: Érico Rolim @ 2020-04-09 22:22 UTC
To: git

Hey there!

I have found a bug in the git-diff utility, which is reproducible in
the next branch. In any repository, if I run

  git diff :/any/path/

(The important part is the trailing forward slash. No slash will
generate either a valid diff or an error message about the path not
being known. ":/" also works without issue)

it will trigger a SIGSEGV. I have traced that back to the
refs_read_raw_ref() function, where it seems the ref_store parameter
passed to it is 0x0 (according to GDB).

It's always possible to include a null-check in that function to fix
the issue, but I don't think that'd be the best solution. I can attempt
to fix it, but I don't know what (and where) the proper solution would
be, because I don't know what the expected behavior is here, nor where
exactly it should fail.

Do you think this could also warrant the creation of a test?

I don't know what the best debug resources (valgrind output, core file
from gdb) would be, but I can provide them if necessary.

Thanks,
Érico Nogueira
* Re: [BUG] segmentation fault in git-diff
From: Junio C Hamano @ 2020-04-09 22:45 UTC
To: Érico Rolim; +Cc: git

Érico Rolim <erico.erc@gmail.com> writes:

> I have found a bug in the git-diff utility, which is reproducible in
> the next branch. In any repository, if I run
>
>   git diff :/any/path/
>
> (The important part is the trailing forward slash. No slash will
> generate either a valid diff or an error message about the path not
> being known. ":/" also works without issue)

This is 'next' running for me:

    $ git diff :/any/path
    fatal: ambiguous argument ':/any/path': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'

If looking for a commit that has the string "any/path" in its commit
message consumes stack resources that are proportional to the depth
of the history, however, I can imagine that it might run out of
memory and die.

Another thing that may be interesting to try is to use "git
rev-parse :/any/path" in the same repository you are having trouble
with.  If the "string :/any/path to a revision" logic is where it is
dying, it would die the same way.

Thanks.
* Re: [BUG] segmentation fault in git-diff
From: Junio C Hamano @ 2020-04-09 22:47 UTC
To: Érico Rolim; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> This is 'next' running for me:

Scratch all that---sorry, but I did see the note about trailing
slash, but somehow managed to forget adding it when I tried it.
* Re: [BUG] segmentation fault in git-diff
From: Junio C Hamano @ 2020-04-09 22:57 UTC
To: Érico Rolim; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> This is 'next' running for me:
>
> Scratch all that---sorry, but I did see the note about trailing
> slash, but somehow managed to forget adding it when I tried it.

    $ git checkout v2.22.0 && make && ./git-rev-parse :/any/path/

segfaults, while

    $ git checkout v2.21.0 && make && ./git-rev-parse :/any/path/

is OK.  We should be able to bisect this fairly straightforward
between these two.
* Re: [BUG] segmentation fault in git-diff
From: Junio C Hamano @ 2020-04-09 23:37 UTC
To: git; +Cc: Érico Rolim, Nguyễn Thái Ngọc Duy

The problem bisects down to c931ba4e (sha1-name.c: remove the_repo
from handle_one_ref(), 2019-04-16), which did this:

-		for_each_ref(handle_one_ref, &list);
-		head_ref(handle_one_ref, &list);
+		cb.repo = repo;
+		cb.list = &list;
+		refs_for_each_ref(repo->refs, handle_one_ref, &cb);
+		refs_head_ref(repo->refs, handle_one_ref, &cb);

The old code used the helper for_each_ref().  This is a thin wrapper
around refs_for_each_ref() and allows the default repository object
to be used implicitly by the caller.

It is understandable that the code wanted to work on an arbitrary
repository object, and replaced the for_each_ref() helper with the
refs_for_each_ref() helper that takes any ref store object.

But there is a small mistake.  for_each_ref() makes sure that the
ref store is initialized; the new code blindly assumes it has
already been initialized.

    int for_each_ref(each_ref_fn fn, void *cb_data)
    {
	return refs_for_each_ref(get_main_ref_store(the_repository), fn, cb_data);
    }

So, I think the fix is simple.  With the attached one-liner on top
of c931ba4e (sha1-name.c: remove the_repo from handle_one_ref(),
2019-04-16),

    $ git rev-parse :/any/path/

no longer segfaults.  I think it would also work just fine when
merged to a more modern codebase, but I haven't tried it (yet).

 sha1-name.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-name.c b/sha1-name.c
index d9050776dd..3aba62938f 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1771,7 +1771,7 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,

 		cb.repo = repo;
 		cb.list = &list;
-		refs_for_each_ref(repo->refs, handle_one_ref, &cb);
+		refs_for_each_ref(get_main_ref_store(repo), handle_one_ref, &cb);
 		refs_head_ref(repo->refs, handle_one_ref, &cb);
 		commit_list_sort_by_date(&list);
 		return get_oid_oneline(repo, name + 2, oid, list);
* Re: [BUG] segmentation fault in git-diff
From: Jeff King @ 2020-04-09 23:41 UTC
To: Junio C Hamano; +Cc: Érico Rolim, git

On Thu, Apr 09, 2020 at 03:57:00PM -0700, Junio C Hamano wrote:

>     $ git checkout v2.22.0 && make && ./git-rev-parse :/any/path/
>
> segfaults, while
>
>     $ git checkout v2.21.0 && make && ./git-rev-parse :/any/path/
>
> is OK.  We should be able to bisect this fairly straightforward
> between these two.

Indeed. The culprit is c931ba4e78 (sha1-name.c: remove the_repo from
handle_one_ref(), 2019-04-16), which swapped out for_each_ref() for
refs_for_each_ref(repo->refs). But it misses the access method for
repo->refs which lazy-initializes the pointer.

So the immediate fix is:

diff --git a/sha1-name.c b/sha1-name.c
index d9050776dd..c679e246cd 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1771,8 +1771,8 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,

 		cb.repo = repo;
 		cb.list = &list;
-		refs_for_each_ref(repo->refs, handle_one_ref, &cb);
-		refs_head_ref(repo->refs, handle_one_ref, &cb);
+		refs_for_each_ref(get_main_ref_store(repo), handle_one_ref, &cb);
+		refs_head_ref(get_main_ref_store(repo), handle_one_ref, &cb);
 		commit_list_sort_by_date(&list);
 		return get_oid_oneline(repo, name + 2, oid, list);
 	}

But there are a bunch of other commits around the same time replacing
the_repository, and it seems like an easy mistake to make. Perhaps we
should rename the "refs" member of "struct repository" to something more
clearly private, which would force callers to use the access method.

I also wonder if there should be a repo_for_each_ref() which does it for
us, though I guess there are a bazillion variants (like head_ref()) that
would need similar treatment. Asking each caller to use
get_main_ref_store() isn't too bad.

-Peff
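For readers following along, the failure mode described in the two messages above can be reproduced in miniature. This is only a sketch and does not use Git's real structures: `toy_repository`, `toy_ref_store`, `toy_get_main_ref_store()`, `toy_count_refs()` and `toy_demo()` are hypothetical names made up for the example. The only things taken from the thread are that the pointer starts out as NULL and that the accessor initializes it on first use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not Git's definitions. */
struct toy_ref_store {
	int nr_refs;
};

struct toy_repository {
	struct toy_ref_store *refs; /* starts life as NULL */
};

/* Accessor: allocate the store on first use, then keep returning it. */
static struct toy_ref_store *toy_get_main_ref_store(struct toy_repository *r)
{
	if (r->refs)
		return r->refs;
	r->refs = calloc(1, sizeof(*r->refs));
	if (!r->refs)
		abort();
	r->refs->nr_refs = 3; /* pretend three refs were found */
	return r->refs;
}

/* A consumer that trusts the pointer it is given, without a NULL check. */
static int toy_count_refs(struct toy_ref_store *store)
{
	return store->nr_refs; /* undefined behavior if store is NULL */
}

/* Returns the ref count obtained through the accessor. */
int toy_demo(void)
{
	struct toy_repository repo = { NULL };
	int n;

	/* Nobody has touched the store yet, so the raw field is still NULL. */
	assert(repo.refs == NULL);

	/*
	 * toy_count_refs(repo.refs) at this point would dereference NULL.
	 * Going through the accessor initializes the store first.
	 */
	n = toy_count_refs(toy_get_main_ref_store(&repo));

	free(repo.refs);
	return n;
}
```

Calling `toy_demo()` from a `main()` exercises the safe path; swapping the accessor call for the raw `repo.refs` field is the toy equivalent of the regression, and it only crashes when nothing else has initialized the store beforehand.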
* Re* [BUG] segmentation fault in git-diff
From: Junio C Hamano @ 2020-04-10 0:03 UTC
To: Jeff King; +Cc: Érico Rolim, git

Jeff King <peff@peff.net> writes:

> But there are a bunch of other commits around the same time replacing
> the_repository, and it seems like an easy mistake to make. Perhaps we
> should rename the "refs" member of "struct repository" to something more
> clearly private, which would force callers to use the access method.

Here is the final version that I am going to apply and merge to
'jch' branch.  This is an ancient regression in Git timescale, so
its fix is not all that urgent, though.

-- >8 --
Subject: [PATCH] sha1-name: do not assume that the ref store is initialized

c931ba4e (sha1-name.c: remove the_repo from handle_one_ref(),
2019-04-16) replaced the use of for_each_ref() helper, which works
with the main ref store of the default repository instance, with
refs_for_each_ref(), which can work on any ref store instance, by
assuming that the repository instance the function is given has its
ref store already initialized.

But it is possible that nobody has initialized it, in which case,
the code ends up dereferencing a NULL pointer.

Reported-by: Érico Rolim <erico.erc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sha1-name.c                   | 2 +-
 t/t4208-log-magic-pathspec.sh | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/sha1-name.c b/sha1-name.c
index d9050776dd..3aba62938f 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1771,7 +1771,7 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,

 		cb.repo = repo;
 		cb.list = &list;
-		refs_for_each_ref(repo->refs, handle_one_ref, &cb);
+		refs_for_each_ref(get_main_ref_store(repo), handle_one_ref, &cb);
 		refs_head_ref(repo->refs, handle_one_ref, &cb);
 		commit_list_sort_by_date(&list);
 		return get_oid_oneline(repo, name + 2, oid, list);
diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh
index 4c8f3b8e1b..6cdbe4747a 100755
--- a/t/t4208-log-magic-pathspec.sh
+++ b/t/t4208-log-magic-pathspec.sh
@@ -55,6 +55,10 @@ test_expect_success '"git log -- :/a" should not be ambiguous' '
 	git log -- :/a
 '

+test_expect_success '"git log :/any/path/" should not segfault' '
+	test_must_fail git log :/any/path/
+'
+
 # This differs from the ":/a" check above in that :/in looks like a pathspec,
 # but doesn't match an actual file.
 test_expect_success '"git log :/in" should not be ambiguous' '
-- 
2.26.0-106-g9fadedd637
* Re: Re* [BUG] segmentation fault in git-diff
From: Érico Rolim @ 2020-04-10 1:42 UTC
To: Junio C Hamano; +Cc: Jeff King, git

You guys sure are fast! Happy it could be solved somewhat easily.

Thanks!
* Re: Re* [BUG] segmentation fault in git-diff
From: Jeff King @ 2020-04-10 3:04 UTC
To: Junio C Hamano; +Cc: Érico Rolim, git

On Thu, Apr 09, 2020 at 05:03:45PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > But there are a bunch of other commits around the same time replacing
> > the_repository, and it seems like an easy mistake to make. Perhaps we
> > should rename the "refs" member of "struct repository" to something more
> > clearly private, which would force callers to use the access method.
>
> Here is the final version that I am going to apply and merge to
> 'jch' branch.  This is an ancient regression in Git timescale, so
> its fix is not all that urgent, though.

Agreed. The patch looks good to me.

I prepared the patch below on top. I mostly wanted it as an auditing
tool to find similar cases, but there were none. It may still be worth
applying to protect ourselves in the future.

-- >8 --
Subject: [PATCH] repository: mark the "refs" pointer as private

The "refs" pointer in a struct repository starts life as NULL, but then
is lazily initialized when it is accessed via get_main_ref_store().
However, it's easy for calling code to forget this and access it
directly, leading to code which works _some_ of the time, but fails if
it is called before anybody else accesses the refs.

This was the cause of the bug fixed by 5ff4b920eb (sha1-name: do not
assume that the ref store is initialized, 2020-04-09). In order to
prevent similar bugs, let's more clearly mark the "refs" field as
private.

In addition to helping future code, the name change will help us audit
any existing direct uses. Besides get_main_ref_store() itself, it turns
out there is only one. But we know it's OK as it is on the line directly
after the fix from 5ff4b920eb, which will have initialized the pointer.
However it's still a good idea for it to model the proper use of the
accessing function, so we'll convert it.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c       | 8 ++++----
 repository.h | 8 ++++++--
 sha1-name.c  | 2 +-
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index 1ab0bb54d3..b8759116cd 100644
--- a/refs.c
+++ b/refs.c
@@ -1852,14 +1852,14 @@ static struct ref_store *ref_store_init(const char *gitdir,

 struct ref_store *get_main_ref_store(struct repository *r)
 {
-	if (r->refs)
-		return r->refs;
+	if (r->refs_private)
+		return r->refs_private;

 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");

-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
-	return r->refs;
+	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	return r->refs_private;
 }

 /*
diff --git a/repository.h b/repository.h
index 040057dea6..6534fbb7b3 100644
--- a/repository.h
+++ b/repository.h
@@ -67,8 +67,12 @@ struct repository {
 	 */
 	struct parsed_object_pool *parsed_objects;

-	/* The store in which the refs are held. */
-	struct ref_store *refs;
+	/*
+	 * The store in which the refs are held. This should generally only be
+	 * accessed via get_main_ref_store(), as that will lazily initialize
+	 * the ref object.
+	 */
+	struct ref_store *refs_private;

 	/*
 	 * Contains path to often used file names.
diff --git a/sha1-name.c b/sha1-name.c
index 878553b132..fccc97fa7a 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1816,7 +1816,7 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 		cb.repo = repo;
 		cb.list = &list;
 		refs_for_each_ref(get_main_ref_store(repo), handle_one_ref, &cb);
-		refs_head_ref(repo->refs, handle_one_ref, &cb);
+		refs_head_ref(get_main_ref_store(repo), handle_one_ref, &cb);
 		commit_list_sort_by_date(&list);
 		return get_oid_oneline(repo, name + 2, oid, list);
 	}
-- 
2.26.0.414.ge3a6455e3d
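The rename-as-audit idea in the patch above can be tried out on a small scale. The following is a rough sketch under stated assumptions, not Git code: `toy_repo`, `toy_store`, `store_private`, `nr_inits`, `toy_repo_store()` and `toy_private_demo()` are all hypothetical names, and the `nr_inits` counter exists only so the example can show that the store is created exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the pattern; names are invented for this sketch. */
struct toy_store {
	int generation;
};

struct toy_repo {
	/*
	 * Only toy_repo_store() should touch this field. The "_private"
	 * suffix is a naming convention, not something the compiler enforces.
	 */
	struct toy_store *store_private;
	int nr_inits; /* counts how often the store was created */
};

/* The one sanctioned way to reach the store; creates it on first use. */
struct toy_store *toy_repo_store(struct toy_repo *r)
{
	if (r->store_private)
		return r->store_private;
	r->store_private = calloc(1, sizeof(*r->store_private));
	if (!r->store_private)
		abort();
	r->nr_inits++;
	return r->store_private;
}

/*
 * A stale caller written against the old field name, for example
 *
 *	use_store(r->store);
 *
 * no longer compiles after the rename, which is what turns the rename
 * into an audit tool.
 */

/* Returns 1 if repeated accessor calls share one lazily created store. */
int toy_private_demo(void)
{
	struct toy_repo repo = { NULL, 0 };
	struct toy_store *first = toy_repo_store(&repo);
	struct toy_store *second = toy_repo_store(&repo);
	int ok;

	assert(first != NULL);
	ok = (first == second) && (repo.nr_inits == 1);

	free(repo.store_private);
	return ok;
}
```

The trade-off in this approach is that C offers no real access control here: a caller can still write `r->store_private` on purpose, but it can no longer do so by accident or by habit.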
* Re: Re* [BUG] segmentation fault in git-diff
From: Junio C Hamano @ 2020-04-10 5:39 UTC
To: Jeff King; +Cc: Érico Rolim, git

Jeff King <peff@peff.net> writes:

> I prepared the patch below on top. I mostly wanted it as an auditing
> tool to find similar cases, but there were none. It may still be worth
> applying to protect ourselves in the future.

Makes perfect sense.  Will queue.

Thanks.