git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG?] submodule-config.c:config_from() has left me confused
@ 2021-07-22 10:23 Atharva Raykar
  2021-07-22 10:59 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 2+ messages in thread
From: Atharva Raykar @ 2021-07-22 10:23 UTC (permalink / raw)
  To: Git List; +Cc: Christian Couder, brian m. carlson, Heiko Voigt

Hi Git Developers,

While I was working on converting submodule code from shell to C, I had
a question about the submodule-config API.

The usual way of obtaining the submodule configuration is by using the
'submodule_from_path()' or 'submodule_from_name()' functions. Taking the
example of the former, a call to it would look something like:

const struct submodule *sub = submodule_from_path(repo, oid, path-to-submod);

So my first question is what is the exact purpose of the second
argument?

The API docs say it should be a tree-ish:

   Given a tree-ish in the superproject and a path, return the submodule that
   is bound at the path in the named tree.

My guess is that the tree-ish that we pass lets us obtain the submodule
configuration for that specific commit or tree in the superproject.

Looking at the underlying implementation[1], it seems to me that the 'oid'
argument treats the 'null_oid()' specially to mean "get the configuration from
the latest revision".

While I had these questions, Christian noticed a possible bug in the
implementation[2]:

---8<------8<------8<------8<---

> It seems to me that config_from() in submodule-config.c accepts a path or a
> tree oid and submodule_from_path() just calls config_from().
> 
> By the way it seems to me that config_from() is buggy when it's passed a
> tree oid as it does:
> 
> ```
>         if (!gitmodule_oid_from_commit(treeish_name, &oid, &rev))
>                 goto out;
> ```
> 
> but gitmodule_oid_from_commit() returns 0 on success so instead of doing a
> `goto out` which will error out, it should continue to proceed using the oid
> it got. So I think it should be something like:
> 
> ```
>         if (gitmodule_oid_from_commit(treeish_name, &oid, &rev)) {
>                 switch (lookup_type) {
>                 case lookup_name:
>                         submodule = cache_lookup_name(cache, &oid, key);
>                         break;
>                 case lookup_path:
>                         submodule = cache_lookup_path(cache, &oid, key);
>                         break;
>                 }
>                 if (submodule)
>                         goto out;
>         }
> ```
> 
> Or maybe the gitmodule_oid_from_commit() call should be moved after the switch ().

---8<------8<------8<------8<---

I do agree that it looks wrong, but despite that, it still seems to pass 't7411' [3][4].

This should have failed according to my understanding:

cat >super/expect <<EOF
Submodule name: 'a' for path 'a'
Submodule name: 'a' for path 'b'
Submodule name: 'submodule' for path 'submodule'
Submodule name: 'submodule' for path 'submodule'
EOF

test_expect_success 'test parsing and lookup of submodule config by path' '
	(cd super &&
		test-tool submodule-config \
			HEAD^ a \
			HEAD b \
			HEAD^ submodule \
			HEAD submodule \
				>actual &&
		test_cmp expect actual
	)
'

Since HEAD and HEAD^ are valid tree-ish objects, the gitmodule_oid_from_commit()
line should have returned early with a NULL object, but that does not happen.
The output seems to match what I would expect from the API.

Applying Christian's suggestions pass the same test as well.

I also wonder what is the situation with the case where, the oid is non-null
and invalid?

If we look at the code again:

static int gitmodule_oid_from_commit(const struct object_id *treeish_name,
				     struct object_id *gitmodules_oid,
				     struct strbuf *rev)
{
	int ret = 0;

	if (is_null_oid(treeish_name)) {
		oidclr(gitmodules_oid);
		return 1;
	}

	strbuf_addf(rev, "%s:.gitmodules", oid_to_hex(treeish_name));
	if (get_oid(rev->buf, gitmodules_oid) >= 0)
		ret = 1;

	return ret;
}

...what happens to the value of gitmodules_oid when get_oid() fails?

*If* it is set to NULL, it would probably lead to undefined behaviour when
we try to hash the oid for the submodule cache lookup in [1]. If it is set
to 'null_oid()' or zero'd out (like with oidclr()), then it should return
the submodule from the latest revision, which does not seem desirable, as
the input is invalid.

Is this a bug in my understanding or in the program? Or is it a bit of both?

Footnotes:
---------

[1]: https://github.com/git/git/blob/daab8a564f8bbac55f70f8bf86c070e001a9b006/submodule-config.c#L545-L615
[2]: https://github.com/tfidfwastaken/git/commit/7ad7a9d1d03653aadfdc87b60e3a152b1cb37f22#r53637226
[3]: https://github.com/git/git/blob/master/t/t7411-submodule-config.sh
[4]: https://github.com/git/git/blob/master/t/helper/test-submodule-config.c

---
Atharva Raykar
ಅಥರ್ವ ರಾಯ್ಕರ್
अथर्व रायकर


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [BUG?] submodule-config.c:config_from() has left me confused
  2021-07-22 10:23 [BUG?] submodule-config.c:config_from() has left me confused Atharva Raykar
@ 2021-07-22 10:59 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 2+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-22 10:59 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: Git List, Christian Couder, brian m. carlson, Heiko Voigt


On Thu, Jul 22 2021, Atharva Raykar wrote:

> Hi Git Developers,

Hi, just skimming this, hopefully someone can get it the attention it
deserves...

> Since HEAD and HEAD^ are valid tree-ish objects, the gitmodule_oid_from_commit()
> line should have returned early with a NULL object, but that does not happen.
> The output seems to match what I would expect from the API.
>
> Applying Christian's suggestions pass the same test as well.
>
> I also wonder what is the situation with the case where, the oid is non-null
> and invalid?
>
> If we look at the code again:
>
> static int gitmodule_oid_from_commit(const struct object_id *treeish_name,
> 				     struct object_id *gitmodules_oid,
> 				     struct strbuf *rev)
> {
> 	int ret = 0;
>
> 	if (is_null_oid(treeish_name)) {
> 		oidclr(gitmodules_oid);
> 		return 1;
> 	}
>
> 	strbuf_addf(rev, "%s:.gitmodules", oid_to_hex(treeish_name));
> 	if (get_oid(rev->buf, gitmodules_oid) >= 0)
> 		ret = 1;
>
> 	return ret;
> }
>
> ...what happens to the value of gitmodules_oid when get_oid() fails?
>
> *If* it is set to NULL, it would probably lead to undefined behaviour when
> we try to hash the oid for the submodule cache lookup in [1]. If it is set
> to 'null_oid()' or zero'd out (like with oidclr()), then it should return
> the submodule from the latest revision, which does not seem desirable, as
> the input is invalid.

We won't set it to NULL, we'll att worst oidclr() it, or leave it alone,
no?

In general we should not rely on the values we expect to munge if we
return an error. In practice it's probably OK to rely on an oid
passed-in not being in an inconsistent state in that case.

It could be, e.g. this might eventually go to get_oid_basic() which in
calling get_hash_hex_algop() might partially have copied a hash when it
fails. I have not traced this codepath but it could happen in general.

In practice I think it's OK to rely on passing an OID in, and it being
bad or not, i.e.:

    struct object_id *oid = null_oid();
    if (get_oid(..., oid))
        error(...);
    else
        /* oid is still == null_oid, not some bizarro value, probably... */


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-07-22 11:24 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-22 10:23 [BUG?] submodule-config.c:config_from() has left me confused Atharva Raykar
2021-07-22 10:59 ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).