* Git fetch bug in git 2.21+ "Could not access submodule '%s'" @ 2019-08-14 9:57 Paolo Pettinato (ppettina) 2019-08-14 15:36 ` Jeff King 2019-08-14 15:59 ` [PATCH] get_next_submodule(): format error string as an error Jeff King 0 siblings, 2 replies; 8+ messages in thread From: Paolo Pettinato (ppettina) @ 2019-08-14 9:57 UTC (permalink / raw) To: git@vger.kernel.org Spotted this when our Jenkins executors had git updated, and later managed to reproduce locally. The issue happens when fetching an updated ref from a remote, and that ref updates a submodule which is not checked out but whose folder is dirty. Steps to reproduce (on *nix) with repositories on GitHub: Scenario: git-fetch-bug repository with 2 branches, the HEADs for these branches point respectively to 2 different commits in a submodule. $ mkdir git_repo $ cd git_repo # Manually fetch "master" branch $ git init $ git remote add origin https://github.com/paolope/git-fetch-bug.git $ git fetch origin master # Checkout $ git checkout master # Repo now contains a folder named "sm" which is bound to contain a submodule checkout. But the submodule is not checked out yet. # Dirty that folder: $ touch sm/test # Fetching another branch will fail $ git fetch origin branch_1 Output: remote: Enumerating objects: 7, done. remote: Counting objects: 100% (7/7), done. remote: Compressing objects: 100% (3/3), done. remote: Total 5 (delta 2), reused 5 (delta 2), pack-reused 0 Unpacking objects: 100% (5/5), done. From https://github.com/paolope/git-fetch-bug * branch branch_1 -> FETCH_HEAD * [new branch] branch_1 -> origin/branch_1 Could not access submodule 'sm' # fails, plus no newline here :P! # Re-issuing the command succeeds $ git fetch origin branch_1 Output: From https://github.com/paolope/git-fetch-bug * branch branch_1 -> FETCH_HEAD I'd expect the command not to fail, or to fail consistently. This behaviour didn't happen in git 2.18.1, so the regression was introduced somewhere in between. It is possible that this is related to this issue: https://marc.info/?l=git&m=155246493613592&w=2 Regards, Paolo Pettinato ** ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Git fetch bug in git 2.21+ "Could not access submodule '%s'" 2019-08-14 9:57 Git fetch bug in git 2.21+ "Could not access submodule '%s'" Paolo Pettinato (ppettina) @ 2019-08-14 15:36 ` Jeff King 2019-08-14 16:40 ` Junio C Hamano 2019-08-14 15:59 ` [PATCH] get_next_submodule(): format error string as an error Jeff King 1 sibling, 1 reply; 8+ messages in thread From: Jeff King @ 2019-08-14 15:36 UTC (permalink / raw) To: Paolo Pettinato (ppettina); +Cc: git@vger.kernel.org On Wed, Aug 14, 2019 at 09:57:50AM +0000, Paolo Pettinato (ppettina) wrote: > The issue happens when fetching an updated ref from a remote, and that > ref updates a submodule which is not checked out but whose folder is dirty. > > Steps to reproduce (on *nix) with repositories on GitHub: > [...] > # Repo now contains a folder named "sm" which is bound to contain a > submodule checkout. But the submodule is not checked out yet. > # Dirty that folder: > $ touch sm/test > > # Fetching another branch will fail > $ git fetch origin branch_1 Thanks, I was able to reproduce here. Since this worked in v2.18.1, it was an easy candidate for bisecting. The resulting commit is 26f80ccfc1 (submodule: migrate get_next_submodule to use repository structs, 2018-11-28). It looks like your case falls afoul of this logic added by that commit in get_next_submodule(): repo = get_submodule_repo_for(spf->r, submodule); if (repo) { ... } else { /* * An empty directory is normal, * the submodule is not initialized */ if (S_ISGITLINK(ce->ce_mode) && !is_empty_dir(ce->name)) { spf->result = 1; strbuf_addf(err, _("Could not access submodule '%s'"), ce->name); } } Because you created a file in the uninitialized submodule directory, it fools the is_empty_dir() check. It seems like there should be a more robust way to check whether the submodule is initialized. Maybe: diff --git a/submodule.c b/submodule.c index 77ace5e784..748ebe5909 100644 --- a/submodule.c +++ b/submodule.c @@ -1294,6 +1294,9 @@ static int get_next_submodule(struct child_process *cp, if (!S_ISGITLINK(ce->ce_mode)) continue; + if (!is_submodule_active(spf->r, ce->name)) + continue; + submodule = submodule_from_path(spf->r, &null_oid, ce->name); if (!submodule) { const char *name = default_name_or_path(ce->name); but that seems to fail t5526's "on-demand works without .gitmodules entry" test. I think is_submodule_populated_gently() more exactly matches what the current code is trying to do, like so: diff --git a/submodule.c b/submodule.c index 77ace5e784..4b26faee5d 100644 --- a/submodule.c +++ b/submodule.c @@ -1347,8 +1347,7 @@ static int get_next_submodule(struct child_process *cp, * An empty directory is normal, * the submodule is not initialized */ - if (S_ISGITLINK(ce->ce_mode) && - !is_empty_dir(ce->name)) { + if (is_submodule_populated_gently(ce->name, NULL)) { spf->result = 1; strbuf_addf(err, _("Could not access submodule '%s'"), but it feels odd to me. Even if the submodule is not currently checked out, we'd presumably still want to do the recursive fetch as long as we have a repo under $GIT_DIR/modules? And anyway, it fails another test in t5526 ("fetching submodule into a broken repository"). Maybe somebody with more submodule expertise can jump in. > # Re-issuing the command succeeds > $ git fetch origin branch_1 > [...] > I'd expect the command not to fail, or to fail consistently. I think that part is expected. After receiving new objects, `fetch` tries to see if it found out about any new submodules that might need to be recursively fetched. In your first command, we actually received the new objects (but then erroneously complained because the submodule was not initialized). In the second, we already had those objects and didn't need to do that check. -Peff ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Git fetch bug in git 2.21+ "Could not access submodule '%s'" 2019-08-14 15:36 ` Jeff King @ 2019-08-14 16:40 ` Junio C Hamano 2019-08-14 17:03 ` Paolo Pettinato (ppettina) 2019-08-14 19:31 ` Jeff King 0 siblings, 2 replies; 8+ messages in thread From: Junio C Hamano @ 2019-08-14 16:40 UTC (permalink / raw) To: Jeff King; +Cc: Paolo Pettinato (ppettina), git@vger.kernel.org Jeff King <peff@peff.net> writes: > Because you created a file in the uninitialized submodule directory, it > fools the is_empty_dir() check. It seems like there should be a more > robust way to check whether the submodule is initialized. Maybe: > > diff --git a/submodule.c b/submodule.c > index 77ace5e784..748ebe5909 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1294,6 +1294,9 @@ static int get_next_submodule(struct child_process *cp, > if (!S_ISGITLINK(ce->ce_mode)) > continue; > > + if (!is_submodule_active(spf->r, ce->name)) > + continue; > + > submodule = submodule_from_path(spf->r, &null_oid, ce->name); > if (!submodule) { > const char *name = default_name_or_path(ce->name); > > but that seems to fail t5526's "on-demand works without .gitmodules > entry" test. > > I think is_submodule_populated_gently() more exactly matches what the > current code is trying to do, like so: > > diff --git a/submodule.c b/submodule.c > index 77ace5e784..4b26faee5d 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1347,8 +1347,7 @@ static int get_next_submodule(struct child_process *cp, > * An empty directory is normal, > * the submodule is not initialized > */ > - if (S_ISGITLINK(ce->ce_mode) && > - !is_empty_dir(ce->name)) { > + if (is_submodule_populated_gently(ce->name, NULL)) { > spf->result = 1; > strbuf_addf(err, > _("Could not access submodule '%s'"), > > but it feels odd to me. Even if the submodule is not currently checked > out, we'd presumably still want to do the recursive fetch as long as we > have a repo under $GIT_DIR/modules? ... which means that we are not interested in "is it populated?" but in "have we done 'git submodule init' to show interest in it?". But since we are walking the in-core index and picking only the gitlink entries in it in the early part of this loop, we know ce cannot be anything but a submodule at this point, so we will not be in the "we are interesteed in the submodule, but the current HEAD and index is at a commit that does not have it, hence $GIT_DIR/modules/ is the only place that knows about it" situation. If we are interested in it enough to have a repository stashed under $GIT_DIR/modules/, we should have a submodule there, shouldn't we? What I do not quite get is that repo_submodule_init(), which is called by get_submodule_repo_for(), looks into $GIT_DIR/modules/, according to the in-code comment of that function. So "we cannot get the repo for it, which is an error condition, but we will complain only for non-empty directory" logic feels iffy. Stepping back even a bit more, "an empty directory is normal" makes some sense. If the user or the build system created a non-directory at a path where a populated submodule would sit, that would not be good. If the user or the build system created a random file in the unpopulated empty directory, in which the working tree files of the submodule would be created once the submodule getspopulated, that would be equally bad, wouldn't it? Why is the user mucking with that directory in the first place, and isn't the flagging of the situation as an error, done with 26f80ccf ("submodule: migrate get_next_submodule to use repository structs", 2018-11-28), a bugfix? If not, why not? Puzzled.... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Git fetch bug in git 2.21+ "Could not access submodule '%s'" 2019-08-14 16:40 ` Junio C Hamano @ 2019-08-14 17:03 ` Paolo Pettinato (ppettina) 2019-08-14 17:56 ` Junio C Hamano 2019-08-14 19:31 ` Jeff King 1 sibling, 1 reply; 8+ messages in thread From: Paolo Pettinato (ppettina) @ 2019-08-14 17:03 UTC (permalink / raw) To: Junio C Hamano, Jeff King; +Cc: git@vger.kernel.org Thanks for the reply! On 14/08/2019 17:40, Junio C Hamano wrote: > Why is the user mucking with > that directory in the first place, and isn't the flagging of the > situation as an error, done with 26f80ccf ("submodule: migrate > get_next_submodule to use repository structs", 2018-11-28), a > bugfix? If not, why not? Not sure if you're implying here that this is not a bug; I'd say that: - Mucking about with a folder that's supposed to contain a submodule is not something that a lot of people do (and we worked around the issue), and people shouldn't do that, but... - ... regardless, I believe that "git fetch" shouldn't particularly care about the state of the current working directory. I didn't ask it to do anything with the submodules, nor have I initialised them. In my (limited) knowledge of git, I'd expect git fetch to do its magic entirely between the remote and the .git folder. Our use case was for a submodule containing encrypted secrets; and the mucking about was stubbing out those secrets in a test build without fetching/decrypting them. It's arguable whether this should be fixed or not; less arguable that we could use a better error message and consistency (1st execution fails, 2nd does not). ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Git fetch bug in git 2.21+ "Could not access submodule '%s'" 2019-08-14 17:03 ` Paolo Pettinato (ppettina) @ 2019-08-14 17:56 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2019-08-14 17:56 UTC (permalink / raw) To: Paolo Pettinato (ppettina); +Cc: Jeff King, git@vger.kernel.org "Paolo Pettinato (ppettina)" <ppettina@cisco.com> writes: > Thanks for the reply! > > On 14/08/2019 17:40, Junio C Hamano wrote: >> Why is the user mucking with >> that directory in the first place, and isn't the flagging of the >> situation as an error, done with 26f80ccf ("submodule: migrate >> get_next_submodule to use repository structs", 2018-11-28), a >> bugfix? If not, why not? > > Not sure if you're implying here that this is not a bug; I'd say that: Yeah, sorry for a confused comment. It does feel strange that the error behaviour depends on what is in the working tree for a "fetch", which is between two $GIT_DIR and does not involve the working tree on the receiving side (that brings us back to my earlier comment in the same message). ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Git fetch bug in git 2.21+ "Could not access submodule '%s'" 2019-08-14 16:40 ` Junio C Hamano 2019-08-14 17:03 ` Paolo Pettinato (ppettina) @ 2019-08-14 19:31 ` Jeff King 1 sibling, 0 replies; 8+ messages in thread From: Jeff King @ 2019-08-14 19:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: Paolo Pettinato (ppettina), git@vger.kernel.org On Wed, Aug 14, 2019 at 09:40:40AM -0700, Junio C Hamano wrote: > > but it feels odd to me. Even if the submodule is not currently checked > > out, we'd presumably still want to do the recursive fetch as long as we > > have a repo under $GIT_DIR/modules? > > ... which means that we are not interested in "is it populated?" but > in "have we done 'git submodule init' to show interest in it?". But > since we are walking the in-core index and picking only the gitlink > entries in it in the early part of this loop, we know ce cannot be > anything but a submodule at this point, so we will not be in the "we > are interesteed in the submodule, but the current HEAD and index is > at a commit that does not have it, hence $GIT_DIR/modules/ is the > only place that knows about it" situation. If we are interested in > it enough to have a repository stashed under $GIT_DIR/modules/, we > should have a submodule there, shouldn't we? Yeah, as I was writing the above I too was wondering whether this was a case that could even happen. So it may be that the two ways of asking the question end up the same in practice. > What I do not quite get is that repo_submodule_init(), which is > called by get_submodule_repo_for(), looks into $GIT_DIR/modules/, > according to the in-code comment of that function. So "we cannot > get the repo for it, which is an error condition, but we will > complain only for non-empty directory" logic feels iffy. Right. This whole non-empty directory thing _feels_ like a hack that was added to paper over the fact that get_submodule_repo_for() does not distinguish between "nope, this module is not active" and "an error occurred". Given that any real errors there would come from read_and_verify_repository_format(), which generates its own error messages, I wonder if we should simply quietly ignore any entries for which get_submodule_repo_for() returns NULL. I suppose that would impact our exit code, though (i.e., a real broken submodule would not cause the outer fetch to exit with a non-zero code). Probably that could be dealt with by having get_submodule_repo_for() return a tristate enum: a working struct, an error, or ENOENT. Actually, I guess we could set errno. ;) It's not clear to me, though, that the rest of the functions are distinguishing between "broken repo at submodule path" and "no such submodule". For instance, get_submodule_repo_for() itself will happily hit a fallback path if repo_submodule_init() returns an error. That would really only want to trigger on this ENOENT-equivalent case. > Stepping back even a bit more, "an empty directory is normal" makes > some sense. If the user or the build system created a non-directory > at a path where a populated submodule would sit, that would not be > good. If the user or the build system created a random file in the > unpopulated empty directory, in which the working tree files of the > submodule would be created once the submodule getspopulated, that > would be equally bad, wouldn't it? Why is the user mucking with > that directory in the first place, and isn't the flagging of the > situation as an error, done with 26f80ccf ("submodule: migrate > get_next_submodule to use repository structs", 2018-11-28), a > bugfix? If not, why not? I agree that the user putting things in that directory is kind of weird. It just seems odd that fetch, which doesn't at all care about the working tree, would be the thing to complain about it. -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] get_next_submodule(): format error string as an error 2019-08-14 9:57 Git fetch bug in git 2.21+ "Could not access submodule '%s'" Paolo Pettinato (ppettina) 2019-08-14 15:36 ` Jeff King @ 2019-08-14 15:59 ` Jeff King 2019-08-14 16:15 ` Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Jeff King @ 2019-08-14 15:59 UTC (permalink / raw) To: Paolo Pettinato (ppettina); +Cc: git@vger.kernel.org On Wed, Aug 14, 2019 at 09:57:50AM +0000, Paolo Pettinato (ppettina) wrote: > Could not access submodule 'sm' # fails, plus no newline here :P! This part seems easy enough to fix. -- >8 -- Subject: get_next_submodule(): format error string as an error The run_processes_parallel() interface passes its callback functions an "err" strbuf in which they can accumulate errors. However, this differs from our usual "err" strbufs in that the result is not simply passed to error(), like: if (frob_repo(&err) < 0) error("frobnication failed: %s", err.buf); Instead, we append the error buffer as-is to a buffer collecting the sub-process stderr, adding neither a prefix nor a trailing newline. This gives callbacks more flexibility (e.g., get_next_submodule() adds its own "Fetching submodule foo" informational lines), but it means they're also responsible for formatting any errors themselves. We forgot to do so in the single error message in get_next_submodule(), meaning that it was output without a trailing newline. While we're fixing that, let's also give it the usual "error:" prefix and downcase the start of the message. We can't use error() here, because it always outputs directly to stderr. Looking at other users of run_processes_parallel(), there are a few similar messages in update_clone_task_finished(). But those sites do correctly add a newline (they don't use an "error" prefix, but it doesn't make as much sense there). Signed-off-by: Jeff King <peff@peff.net> --- submodule.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/submodule.c b/submodule.c index 0f199c5137..a5ba57ac36 100644 --- a/submodule.c +++ b/submodule.c @@ -1478,7 +1478,7 @@ static int get_next_submodule(struct child_process *cp, !is_empty_dir(ce->name)) { spf->result = 1; strbuf_addf(err, - _("Could not access submodule '%s'"), + _("error: could not access submodule '%s'\n"), ce->name); } } -- 2.23.0.rc2.479.gbd16c8906f ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] get_next_submodule(): format error string as an error 2019-08-14 15:59 ` [PATCH] get_next_submodule(): format error string as an error Jeff King @ 2019-08-14 16:15 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2019-08-14 16:15 UTC (permalink / raw) To: Jeff King; +Cc: Paolo Pettinato (ppettina), git@vger.kernel.org Jeff King <peff@peff.net> writes: > On Wed, Aug 14, 2019 at 09:57:50AM +0000, Paolo Pettinato (ppettina) wrote: > >> Could not access submodule 'sm' # fails, plus no newline here :P! > > This part seems easy enough to fix. Indeed, it is ;-) > diff --git a/submodule.c b/submodule.c > index 0f199c5137..a5ba57ac36 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1478,7 +1478,7 @@ static int get_next_submodule(struct child_process *cp, > !is_empty_dir(ce->name)) { > spf->result = 1; > strbuf_addf(err, > - _("Could not access submodule '%s'"), > + _("error: could not access submodule '%s'\n"), > ce->name); > } > } ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-08-14 19:31 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-14 9:57 Git fetch bug in git 2.21+ "Could not access submodule '%s'" Paolo Pettinato (ppettina) 2019-08-14 15:36 ` Jeff King 2019-08-14 16:40 ` Junio C Hamano 2019-08-14 17:03 ` Paolo Pettinato (ppettina) 2019-08-14 17:56 ` Junio C Hamano 2019-08-14 19:31 ` Jeff King 2019-08-14 15:59 ` [PATCH] get_next_submodule(): format error string as an error Jeff King 2019-08-14 16:15 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).