git@vger.kernel.org mailing list mirror (one of many)
* Finding a tag that introduced a submodule change
@ 2017-03-03 15:40 Robert Dailey
  2017-03-03 16:39 ` Jacob Keller
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Robert Dailey @ 2017-03-03 15:40 UTC (permalink / raw)
  To: Git

I have a repository with a single submodule in it. Since the parent
repository represents the code base for an actual product, I tag
release versions in the parent repository. I do not put tags in the
submodule since multiple other products may be using it there and I
wanted to avoid ambiguous tags.

Sometimes I run into a situation where I need to find out which
release of the product a submodule change was introduced in. This is
nontrivial, since there are no tags in the submodule itself. This is
one thing I tried:

1. Do a `git log` in the submodule to find the SHA1 representing the
change I want to check for
2. In the parent repository, do a git log with pickaxe to determine
when the submodule itself changed to the value of that SHA1.
3. Based on the result of #2, do a `git tag --contains` to see the
lowest-version tag that contains the SHA1, which will identify the
first release that introduced that change
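
Step 3 on its own can be sketched as a small shell helper. This is only an illustration: the function name `first_tag_containing` is made up here, and it assumes the release tags sort sensibly under git's `version:refname` sort key.

```shell
#!/bin/sh
# Hypothetical helper (the name is not from the thread): print the
# lowest-version tag in the current repository that contains commit $1.
first_tag_containing() {
    # version:refname orders v1.2 before v1.10, unlike a plain lexical
    # sort, so the first line of output is the lowest version.
    git tag --sort=version:refname --contains "$1" | head -n 1
}
```

Called as `first_tag_containing <superproject-commit>`, it prints a single tag name, or nothing if no tag contains the commit.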

However, I was not able to get past #2 because apparently there are
cases where when we move the submodule "forward", we skip over
commits, so the value of the submodule itself never was set to that
SHA1.

I'm at a loss here on how to easily do this. Can someone recommend a
way to do this? Obviously the easier the better, as I have to somehow
train my team how to do this on their own.

Thanks in advance.


* Re: Finding a tag that introduced a submodule change
  2017-03-03 15:40 Finding a tag that introduced a submodule change Robert Dailey
@ 2017-03-03 16:39 ` Jacob Keller
  2017-03-03 18:04 ` Junio C Hamano
  2017-03-15 17:10 ` Stefan Beller
  2 siblings, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2017-03-03 16:39 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git

On Fri, Mar 3, 2017 at 7:40 AM, Robert Dailey <rcdailey.lists@gmail.com> wrote:
> I have a repository with a single submodule in it. Since the parent
> repository represents the code base for an actual product, I tag
> release versions in the parent repository. I do not put tags in the
> submodule since multiple other products may be using it there and I
> wanted to avoid ambiguous tags.
>

Hi,

I agree you shouldn't use tags in the submodules.

> Sometimes I run into a situation where I need to find out which
> release of the product a submodule change was introduced in. This is
> nontrivial, since there are no tags in the submodule itself. This is
> one thing I tried:
>

I've run into this exact problem at $DAYJOB.

> 1. Do a `git log` in the submodule to find the SHA1 representing the
> change I want to check for
> 2. In the parent repository, do a git log with pickaxe to determine
> when the submodule itself changed to the value of that SHA1.
> 3. Based on the result of #2, do a `git tag --contains` to see the
> lowest-version tag that contains the SHA1, which will identify the
> first release that introduced that change
>
> However, I was not able to get past #2 because apparently there are
> cases where when we move the submodule "forward", we skip over
> commits, so the value of the submodule itself never was set to that
> SHA1.
>
> I'm at a loss here on how to easily do this. Can someone recommend a
> way to do this? Obviously the easier the better, as I have to somehow
> train my team how to do this on their own.
>
> Thanks in advance.

So there are better ways to do this, but I do think there would be value
in adding some plumbing to make it easier.

Here is how I would do this; it is best written into a shell script or
similar to automate the tricky part of #2:

1. Do a git-log of the *parent* project, filtering out to show only
the path to the submodule

2. For each commit here, you find the new and old values of the
submodule pointer.

3. Use git merge-base --is-ancestor to ensure that "old" is an
ancestor of "submodule sha1id" and then

4. Use git merge-base --is-ancestor to ensure that "submodule sha1id"
is an ancestor of "new".

If both of these are true, then you know that the commit was included
into the parent project at this point.
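
The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested tool: the function name `find_introducing` is invented, the submodule is assumed to be checked out at its path, and the script is run from the top level of the parent project. Note one deliberate deviation: instead of requiring that "old" is an ancestor of the change, it requires that the change is not yet reachable from "old", which also covers the case where the change is exactly the old pointer.

```shell
#!/bin/sh
# Hypothetical helper: find_introducing <submodule-path> <submodule-commit>
# Prints every parent-project commit whose submodule pointer change
# brought <submodule-commit> in (oldest first).
find_introducing() {
    sub=$1 change=$2
    # Step 1: parent commits that touched the submodule path.
    git rev-list --reverse HEAD -- "$sub" |
    while read -r c; do
        # Step 2: new and old values of the submodule pointer.
        new=$(git rev-parse --verify -q "$c:$sub") || continue  # pointer removed
        old=$(git rev-parse --verify -q "$c^:$sub") || old=     # no earlier pointer
        # Step 4: the change must be reachable from the new pointer ...
        git -C "$sub" merge-base --is-ancestor "$change" "$new" || continue
        # Step 3 (adapted): ... and not already reachable from the old one.
        if [ -n "$old" ] &&
           git -C "$sub" merge-base --is-ancestor "$change" "$old"; then
            continue
        fi
        echo "$c"
    done
}
```

If the submodule pointer was rewound at some point, more than one commit can be printed; the first line is the earliest introduction.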

I've had to do this once or twice, but I don't actually remember
exactly how I did 3. One sneaky way would be to add new tags for each
submodule change; something like the following might work and be more
efficient. I'm not really sure, but here's how I would go that route:

1. git log <limiting revision selection if you don't want the entire
history> <path-to-submodule> --pretty=%h | parallel git ls-tree {}
<path-to-submodule>

The above more or less prints every submodule value as it changed over
time in the parent project.

Next, for each submodule change:

2. git -C <submodule> tag parent/<sha1id> <submodule change>

Create a new tag prefixed by "parent" that includes the sha1id of the
parent commit, and create it inside the submodule

3. git -C submodule describe --contains --match="parent/*" <submodule sha1id>

Once you're done you can also delete all the tags that are in the
"parent" prefix if you don't really wanna see them again.

Basically, re-use the machinery to tag and then use describe
--contains to find the commit.
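
A rough sketch of the tagging route, under assumptions: the function names are invented, full commit ids are used in the tag names rather than abbreviated ones, a plain loop stands in for `parallel`, and the script is run from the top level of the parent project with the submodule checked out.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the thread).

# tag_and_describe <submodule-path> <submodule-commit>
# Tags every submodule value the parent ever pointed at as
# parent/<parent-commit>, then describes <submodule-commit> against them.
tag_and_describe() {
    sub=$1 change=$2
    git rev-list HEAD -- "$sub" |
    while read -r c; do
        s=$(git rev-parse --verify -q "$c:$sub") || continue  # pointer removed
        git -C "$sub" tag -f "parent/$c" "$s" >/dev/null
    done
    # Prints something like parent/<parent-commit>~<n>.
    git -C "$sub" describe --contains --match='parent/*' "$change"
}

# drop_parent_tags <submodule-path>
# Deletes the temporary parent/* tags again.
drop_parent_tags() {
    git -C "$1" for-each-ref --format='%(refname)' refs/tags/parent |
    while read -r ref; do
        git -C "$1" update-ref -d "$ref"
    done
}
```

The part of the output before any `~<n>` suffix is the parent commit whose submodule pointer is closest to the change; it is not necessarily the oldest such parent commit if the pointer was ever rewound.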

I *really* think a similar algorithm could be embedded as a plumbing
subcommand, since I think this is tedious to do by hand.

I'm not really sure if this is the "best" algorithm either, but it's
pretty much what I've used in the past. Either the tag way or the log
yourself one at a time way.


* Re: Finding a tag that introduced a submodule change
  2017-03-03 15:40 Finding a tag that introduced a submodule change Robert Dailey
  2017-03-03 16:39 ` Jacob Keller
@ 2017-03-03 18:04 ` Junio C Hamano
  2017-03-15 14:12   ` Robert Dailey
  2017-03-15 17:10 ` Stefan Beller
  2 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2017-03-03 18:04 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git

Robert Dailey <rcdailey.lists@gmail.com> writes:

> Sometimes I run into a situation where I need to find out which
> release of the product a submodule change was introduced in. This is
> nontrivial, since there are no tags in the submodule itself.

Does your superproject rewind the commit in the submodule project as
it goes forward?  That is, is this property guaranteed to hold by
your project's discipline:

	Given any two commits C1 and C2 in the superproject, and the
	commit in the submodule bound to C1's and C2's tree (call
	them S1 and S2, respectively), if C1 is an ancestor of C2,
	then S1 is the same as S2 or an ancestor of S2.

If so, I think you can do a bisection of the history in the
superproject.  Pick an old commit in the superproject that binds an
old commit from the submodule that does not have the change and call
it "good".  Similarly pick a new one in the superproject that binds
a newer commit from the submodule that does have the change, and
call it "bad".  Then do

	$ git bisect start $bad $good -- $path_to_submodule

which would suggest you to test commits that change what commit is
bound at the submodule's path.

When testing each of these commits, you would see if the commit
bound at the submodule's path has the change or not.

	$ current=$(git rev-parse HEAD:$path_to_submodule)

would give you the object name of that commit, and then

	$ git -C $path_to_submodule merge-base --is-ancestor $change $current

would tell you if the $change you are interested in is already
contained in that $current commit.  Then you say "git bisect good"
if $current is too old to contain the $change, and "git bisect bad"
if $current is sufficiently new and contains the $change, to
continue.
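
The manual good/bad loop above could be automated with `git bisect run`. The following is a sketch only: the wrapper name and argument order are assumptions, it must be run from the top level of the superproject with the submodule checked out, and it relies on the no-rewind property stated earlier.

```shell
#!/bin/sh
# Hypothetical wrapper:
# bisect_submodule_change <good> <bad> <submodule-path> <submodule-commit>
# Prints the first superproject commit whose bound submodule commit
# contains <submodule-commit>.
bisect_submodule_change() {
    good=$1 bad=$2 sub=$3 change=$4
    git bisect start "$bad" "$good" -- "$sub" >/dev/null || return 1
    # The test script exits 0 ("good") while the bound submodule commit
    # lacks the change, 1 ("bad") once it has it, and 128 (abort) on error.
    git bisect run sh -c '
        current=$(git rev-parse "HEAD:$1") || exit 128
        git -C "$1" merge-base --is-ancestor "$2" "$current"
        case $? in
            0) exit 1 ;;
            1) exit 0 ;;
            *) exit 128 ;;
        esac
    ' sh "$sub" "$change" >/dev/null ||
        { git bisect reset >/dev/null 2>&1; return 1; }
    # After a successful run, refs/bisect/bad names the first "bad" commit.
    git rev-parse refs/bisect/bad
    git bisect reset >/dev/null 2>&1
}
```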

If your superproject rewinds the commit in the submodule as it goes
forward, e.g. an older commit in the superproject used submodule
commit from day X, but somebody who made yesterday's commit in the
superproject realized that that submodule commit was broken and used
an older commit in the submodule from day (X-1), then you cannot
bisect.  In such a case, I think you would essentially need to check
all superproject commits that changed the commit bound at the
submodule's path.

	$ git rev-list $good..$bad -- $path_to_submodule

would give a list of such commits, and you would do the "merge-base"
check for all of them to see which ones have and do not have the
$change (replace "HEAD" with the commit you are testing in the
computation that gives you $current).
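
As a sketch of that exhaustive check (the function name and the "has"/"lacks" output format are invented for illustration; it assumes it is run from the top level of the superproject with the submodule checked out):

```shell
#!/bin/sh
# Hypothetical helper:
# classify_pointer_changes <good> <bad> <submodule-path> <submodule-commit>
# For every superproject commit after <good>, up to and including <bad>,
# that changed the submodule pointer, report whether the bound submodule
# commit contains <submodule-commit>.
classify_pointer_changes() {
    good=$1 bad=$2 sub=$3 change=$4
    git rev-list --reverse "$good..$bad" -- "$sub" |
    while read -r c; do
        # Same computation as $current above, with HEAD replaced by $c.
        current=$(git rev-parse --verify -q "$c:$sub") ||
            { echo "$c removed"; continue; }
        if git -C "$sub" merge-base --is-ancestor "$change" "$current"; then
            echo "$c has"
        else
            echo "$c lacks"
        fi
    done
}
```

The output is oldest first, so a "has" line after a "lacks" line marks a point where the change was (re)introduced.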




* Re: Finding a tag that introduced a submodule change
  2017-03-03 18:04 ` Junio C Hamano
@ 2017-03-15 14:12   ` Robert Dailey
  0 siblings, 0 replies; 6+ messages in thread
From: Robert Dailey @ 2017-03-15 14:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git

On Fri, Mar 3, 2017 at 12:04 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Robert Dailey <rcdailey.lists@gmail.com> writes:
>
>> Sometimes I run into a situation where I need to find out which
>> release of the product a submodule change was introduced in. This is
>> nontrivial, since there are no tags in the submodule itself.
>
> Does your superproject rewind the commit in the submodule project as
> it goes forward?  That is, is this property guaranteed to hold by
> your project's discipline:
>
>         Given any two commits C1 and C2 in the superproject, and the
>         commit in the submodule bound to C1's and C2's tree (call
>         them S1 and S2, respectively), if C1 is an ancestor of C2,
>         then S1 is the same as S2 or an ancestor of S2.
>
> If so, I think you can do a bisection of the history in the
> superproject.  Pick an old commit in the superproject that binds an
> old commit from the submodule that does not have the change and call
> it "good".  Similarly pick a new one in the superproject that binds
> a newer commit from the submodule that does have the change, and
> call it "bad".  Then do
>
>         $ git bisect start $bad $good -- $path_to_submodule
>
> which would suggest you to test commits that change what commit is
> bound at the submodule's path.
>
> When testing each of these commits, you would see if the commit
> bound at the submodule's path has the change or not.
>
>         $ current=$(git rev-parse HEAD:$path_to_submodule)
>
> would give you the object name of that commit, and then
>
>         $ git -C $path_to_submodule merge-base --is-ancestor $change $current
>
> would tell you if the $change you are interested in is already
> contained in that $current commit.  Then you say "git bisect good"
> if $current is too old to contain the $change, and "git bisect bad"
> if $current is sufficiently new and contains the $change, to
> continue.
>
> If your superproject rewinds the commit in the submodule as it goes
> forward, e.g. an older commit in the superproject used submodule
> commit from day X, but somebody who made yesterday's commit in the
> superproject realized that that submodule commit was broken and used
> an older commit in the submodule from day (X-1), then you cannot
> bisect.  In such a case, I think you would essentially need to check
> all superproject commits that changed the commit bound at the
> submodule's path.
>
>         $ git rev-list $good..$bad -- $path_to_submodule
>
> would give a list of such commits, and you would do the "merge-base"
> check for all of them to see which ones have and do not have the
> $change (replace "HEAD" with the commit you are testing in the
> computation that gives you $current).


Hi Junio, my apologies for the very late response.

I really like your idea; unfortunately, people on my team often
accidentally rewind the submodule. Your latter suggestion of just
doing the merge-base check on each change is well worth trying,
though. Thank you very much, I will certainly give this a try!!


* Re: Finding a tag that introduced a submodule change
  2017-03-03 15:40 Finding a tag that introduced a submodule change Robert Dailey
  2017-03-03 16:39 ` Jacob Keller
  2017-03-03 18:04 ` Junio C Hamano
@ 2017-03-15 17:10 ` Stefan Beller
  2017-03-16 10:17   ` Jacob Keller
  2 siblings, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2017-03-15 17:10 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git

On Fri, Mar 3, 2017 at 7:40 AM, Robert Dailey <rcdailey.lists@gmail.com> wrote:
> I have a repository with a single submodule in it. Since the parent
> repository represents the code base for an actual product, I tag
> release versions in the parent repository. I do not put tags in the
> submodule since multiple other products may be using it there and I
> wanted to avoid ambiguous tags.
>
> Sometimes I run into a situation where I need to find out which
> release of the product a submodule change was introduced in. This is
> nontrivial, since there are no tags in the submodule itself. This is
> one thing I tried:
>
> 1. Do a `git log` in the submodule to find the SHA1 representing the
> change I want to check for
> 2. In the parent repository, do a git log with pickaxe to determine
> when the submodule itself changed to the value of that SHA1.
> 3. Based on the result of #2, do a `git tag --contains` to see the
> lowest-version tag that contains the SHA1, which will identify the
> first release that introduced that change
>
> However, I was not able to get past #2 because apparently there are
> cases where when we move the submodule "forward", we skip over
> commits, so the value of the submodule itself never was set to that
> SHA1.
>
> I'm at a loss here on how to easily do this. Can someone recommend a
> way to do this? Obviously the easier the better, as I have to somehow
> train my team how to do this on their own.
>
> Thanks in advance.

I cannot offer an easy way. However I can come up with a proposal
how to make this easy in the future. ;)

"git-{branch,tag} --contains" currently only takes a commit id as that is
easy to check for. (Just a revwalk from all commits, as we walk over the
commits in the graph)

We should extend the possible arguments to --contains, such that you can
do

    # check that a given path had this exact tree/blob id
    git tag --contains <path>:<tree/blob-id>
    # check if the given tree/blob was at any path
    git tag --contains <tree/blob id>
    # generalizing from above:
    git tag --contains [<pathspec>:]<blob/tree id>

With this designed API you could ask for

    git tag --contains submodule:<sha1 from step 2>

For the implementation of this feature the revwalk would also need
to walk the object graph (as restricted by the pathspec) and
see if there is the given object for each tag.

Thanks,
Stefan


* Re: Finding a tag that introduced a submodule change
  2017-03-15 17:10 ` Stefan Beller
@ 2017-03-16 10:17   ` Jacob Keller
  0 siblings, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2017-03-16 10:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Robert Dailey, Git

On Wed, Mar 15, 2017 at 10:10 AM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Mar 3, 2017 at 7:40 AM, Robert Dailey <rcdailey.lists@gmail.com> wrote:
>> I have a repository with a single submodule in it. Since the parent
>> repository represents the code base for an actual product, I tag
>> release versions in the parent repository. I do not put tags in the
>> submodule since multiple other products may be using it there and I
>> wanted to avoid ambiguous tags.
>>
>> Sometimes I run into a situation where I need to find out which
>> release of the product a submodule change was introduced in. This is
>> nontrivial, since there are no tags in the submodule itself. This is
>> one thing I tried:
>>
>> 1. Do a `git log` in the submodule to find the SHA1 representing the
>> change I want to check for
>> 2. In the parent repository, do a git log with pickaxe to determine
>> when the submodule itself changed to the value of that SHA1.
>> 3. Based on the result of #2, do a `git tag --contains` to see the
>> lowest-version tag that contains the SHA1, which will identify the
>> first release that introduced that change
>>
>> However, I was not able to get past #2 because apparently there are
>> cases where when we move the submodule "forward", we skip over
>> commits, so the value of the submodule itself never was set to that
>> SHA1.
>>
>> I'm at a loss here on how to easily do this. Can someone recommend a
>> way to do this? Obviously the easier the better, as I have to somehow
>> train my team how to do this on their own.
>>
>> Thanks in advance.
>
> I cannot offer an easy way. However I can come up with a proposal
> how to make this easy in the future. ;)
>
> "git-{branch,tag} --contains" currently only takes a commit id as that is
> easy to check for. (Just a revwalk from all commits, as we walk over the
> commits in the graph)
>
> We should extend the possible arguments to --contains, such that you can
> do
>
>     # check that a given path had this exact tree/blob id
>     git tag --contains <path>:<tree/blob-id>
>     # check if the given tree/blob was at any path
>     git tag --contains <tree/blob id>
>     # generalizing from above:
>     git tag --contains [<pathspec>:]<blob/tree id>
>
> With this designed API you could ask for
>
>     git tag --contains submodule:<sha1 from step 2>
>
> For the implementation of this feature the revwalk would also need
> to walk the object graph (as restricted by the pathspec) and
> see if there is the given object for each tag.
>
> Thanks,
> Stefan

This sounds useful, but it has a limitation with regard to submodules.
Let's say that the parent project points to submodule commit 1.

In the submodule, you create commit 2, commit 3, and commit 4.

Then, in the parent project, you now move the submodule forward to commit 4.

I think the general goal for submodules is to say "which parent commit
included this submodule commit" but the parent never ACTUALLY included
commit 3, it only included commit 4 which happens to contain commit 3.

I'm wondering if it might be worth adding an (optional) mode for
submodules which would disallow adding a submodule pointer if the
current submodule pointer is not an ancestor of the new value. This
seems like a valuable protection for many use cases (and preserves
the behavior of a bisect to find which commit added something). It
obviously shouldn't be mandatory since people often rewind the
submodule pointer. If you have this enabled, the only way to rewind the
submodule pointer would be to rewind the parent history itself.
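
No such mode exists in git itself as far as this thread goes, but the idea can be approximated today with a local pre-commit hook. The following is only a sketch: the function name is invented, it assumes the submodule is checked out at its path, and it treats any failure of the ancestry check (including a missing commit) as a reason to refuse.

```shell
#!/bin/sh
# Hypothetical pre-commit check: check_forward_only <submodule-path>
# Returns non-zero if the staged submodule pointer is not a descendant
# of the pointer recorded in HEAD.
check_forward_only() {
    sub=$1
    old=$(git rev-parse --verify -q "HEAD:$sub") || return 0  # no pointer yet
    new=$(git rev-parse --verify -q ":$sub") || return 0      # pointer removed
    [ "$old" = "$new" ] && return 0                           # unchanged
    if ! git -C "$sub" merge-base --is-ancestor "$old" "$new"; then
        echo "refusing: $sub would move from $old to non-descendant $new" >&2
        return 1
    fi
}
```

Saved as `.git/hooks/pre-commit` with a final line such as `check_forward_only path/to/submodule`, it would block rewinds locally; bypassing it with `git commit --no-verify` keeps the check optional.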

You could make the --contains logic above smart enough to try and
detect "ancestor of" like now, but I think that wouldn't necessarily
buy us too much and seems pretty submodule specific.

Thanks,
Jake
