git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git describe and "the smallest number of commits possible"
@ 2017-08-26 14:47 Kévin Le Gouguec
  2017-08-28 18:24 ` Stefan Beller
  0 siblings, 1 reply; 3+ messages in thread
From: Kévin Le Gouguec @ 2017-08-26 14:47 UTC (permalink / raw)
  To: git

Hi,

I've asked this question on the git-users Google Groups list[1], and
while the answers there were interesting, I still cannot figure
whether my problem comes from an actual bug, a misleading manpage, my
inability to understand the code and/or the manpage, or a combination
of the three.

I noticed this problem on 2.1.4 (Debian oldstable); I can reproduce it
on next (7ef03bb281b2220d0f2414365e4949beb2337042). Quoting
git-describe(1)'s SEARCH STRATEGY section:

> If multiple tags were found during the walk then the tag which has
> the fewest commits different from the input commit-ish will be
> selected and output. Here fewest commits different is defined as the
> number of commits which would be shown by `git log tag..input` will
> be the smallest number of commits possible.

To put it shortly, after cloning GNU Emacs's repository[2]:

    $ git describe --tags
    emacs-25.1-129847-gdcc3ef3ee7
    $ git log --oneline emacs-25.1.. | wc -l
    5126
    $ git log --oneline emacs-25.2.. | wc -l
    4793

If I am reading it correctly, the manpage suggests that emacs-25.2
should be picked in this situation ("log emacs-25.2.." shows fewer
commits than "log emacs-25.1..").

Once more with --debug:

    $ git describe --debug --tags
    searching to describe HEAD
     lightweight   129847 emacs-25.1
     lightweight     4086 emacs-25.1-rc2
     lightweight     4126 emacs-25.1-rc1
     annotated       4185 emacs-25.2
     annotated       4220 emacs-25.2-rc2
     lightweight     4226 emacs-25.0.95
     annotated       4236 emacs-25.2-rc1
     annotated       4280 emacs-25.1.91
     lightweight     4305 emacs-25.0.94
     lightweight     4329 emacs-25.1.90
    traversed 130257 commits
    more than 10 tags found; listed 10 most recent
    gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
    emacs-25.1-129847-gdcc3ef3ee7

I tried to get a sense of what builtin/describe.c is doing (see [1]
for some debug printfs); to summarize what I figured:

- When QSORT() is called in describe(), emacs-25.1's depth is smaller
  than emacs-25.2's.

- finish_depth_computation() updates the best candidate's depth; AFAIU
  this update's only purpose is to make the displayed suffix more
  accurate.

That is all I have right now. I apologize for failing to come up with
a simpler test case (I tried to make toy repositories with a similar
topology to reproduce the issue, to no avail). To conclude, as far as
I can tell, one of the following holds:

- something about this repository[3] causes git-describe(1) to not
  work as advertised;

- I fail at reading manuals.



[1]: https://groups.google.com/forum/?fromgroups#!topic/git-users/tSnX-O-3aNI

[2]: https://git.savannah.gnu.org/git/emacs.git

[3]: The project's workflow sounds straightforward:

- development happens mainly on the master branch;
- the emacs-25 branch receives maintenance fixes and release tags; it
  is periodically merged back into master;
- experimental work can happen on scratch branches; these may
  eventually be merged back into master.

There are some complications (e.g. pull-induced merges) but if
I --simplify-by-decoration I find that the repository's topology
matches this description.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git describe and "the smallest number of commits possible"
  2017-08-26 14:47 git describe and "the smallest number of commits possible" Kévin Le Gouguec
@ 2017-08-28 18:24 ` Stefan Beller
  2017-08-29 13:34   ` Michael J Gruber
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Beller @ 2017-08-28 18:24 UTC (permalink / raw)
  To: Kévin Le Gouguec; +Cc: git@vger.kernel.org

On Sat, Aug 26, 2017 at 7:47 AM, Kévin Le Gouguec
<kevin.legouguec@gmail.com> wrote:
> Hi,
>
> I've asked this question on the git-users Google Groups list[1], and
> while the answers there were interesting, I still cannot figure
> whether my problem comes from an actual bug, a misleading manpage, my
> inability to understand the code and/or the manpage, or a combination
> of the three.
>
> I noticed this problem on 2.1.4 (Debian oldstable); I can reproduce it
> on next (7ef03bb281b2220d0f2414365e4949beb2337042). Quoting
> git-describe(1)'s SEARCH STRATEGY section:
>
>> If multiple tags were found during the walk then the tag which has
>> the fewest commits different from the input commit-ish will be
>> selected and output. Here fewest commits different is defined as the
>> number of commits which would be shown by `git log tag..input` will
>> be the smallest number of commits possible.

Maybe relevant
https://public-inbox.org/git/20160421113004.GA3140@aepfle.de/
(specifically the discussion after this patch, it sheds light on the
heuristics used, though your case here doesn't use these heuristics)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git describe and "the smallest number of commits possible"
  2017-08-28 18:24 ` Stefan Beller
@ 2017-08-29 13:34   ` Michael J Gruber
  0 siblings, 0 replies; 3+ messages in thread
From: Michael J Gruber @ 2017-08-29 13:34 UTC (permalink / raw)
  To: Stefan Beller, Kévin Le Gouguec; +Cc: git@vger.kernel.org

Stefan Beller venit, vidit, dixit 28.08.2017 20:24:
> On Sat, Aug 26, 2017 at 7:47 AM, Kévin Le Gouguec
> <kevin.legouguec@gmail.com> wrote:
>> Hi,
>>
>> I've asked this question on the git-users Google Groups list[1], and
>> while the answers there were interesting, I still cannot figure
>> whether my problem comes from an actual bug, a misleading manpage, my
>> inability to understand the code and/or the manpage, or a combination
>> of the three.
>>
>> I noticed this problem on 2.1.4 (Debian oldstable); I can reproduce it
>> on next (7ef03bb281b2220d0f2414365e4949beb2337042). Quoting
>> git-describe(1)'s SEARCH STRATEGY section:
>>
>>> If multiple tags were found during the walk then the tag which has
>>> the fewest commits different from the input commit-ish will be
>>> selected and output. Here fewest commits different is defined as the
>>> number of commits which would be shown by `git log tag..input` will
>>> be the smallest number of commits possible.
> 
> Maybe relevant
> https://public-inbox.org/git/20160421113004.GA3140@aepfle.de/
> (specifically the discussion after this patch, it sheds light on the
> heuristics used, though your case here doesn't use these heuristics)

git describe is driving my crazy sometimes, too - both the results and
the code ;)

[long read with lot's of debug output follows, sorry]

What is going on here is the following:
git describe --tags decides based on "depth".

git tag --merged dcc3ef3ee7 lists both emacs-25.1 and emac-25.2 (among
many others).

git rev-list --count emacs-25.1..dcc3ef3ee7
5126

git rev-list --count emacs-25.2..dcc3ef3ee7

4793

So, you would rightly expect 25.2 to be preferred over 25.1.

Now, "git describe" does not do a "rev-list --count" for each tag but,
instead, walks back from dcc3ef3ee7 and notes when it reaches a tagged
commit, and how many steps it takes to get there. The information
"reachable by the i-th tag that I encountered" is kept by a field of
width 27, which limits the --candidates values. Once the walk passes
more tags it is aborted.

Then the possible matches are sorted.

Then the depth calculation is finished.

Funny order of actions, isn't it? Smells like BUG.
At least it makes the debug output meaningless.

Here's the debug output before and after finish_depth_calculation():

searching to describe HEAD
 lightweight     4069 emacs-25.1
 lightweight     4086 emacs-25.1-rc2
 lightweight     4126 emacs-25.1-rc1
 annotated       4185 emacs-25.2
 annotated       4220 emacs-25.2-rc2
 lightweight     4226 emacs-25.0.95
 annotated       4236 emacs-25.2-rc1
 annotated       4280 emacs-25.1.91
 lightweight     4305 emacs-25.0.94
 lightweight     4329 emacs-25.1.90
traversed 4462 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
[finish_depth_calculation()]
 lightweight   129847 emacs-25.1
 lightweight     4086 emacs-25.1-rc2
 lightweight     4126 emacs-25.1-rc1
 annotated       4185 emacs-25.2
 annotated       4220 emacs-25.2-rc2
 lightweight     4226 emacs-25.0.95
 annotated       4236 emacs-25.2-rc1
 annotated       4280 emacs-25.1.91
 lightweight     4305 emacs-25.0.94
 lightweight     4329 emacs-25.1.90
traversed 130257 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
emacs-25.1-129847-gdcc3ef3ee7

Now, I can't claim that I can wrap my head around
finish_depth_calculation() and the way depth is calculated here, let
alone those huge numbers (probably per not-contained tag or so), but
that order of operations is clearly wrong.

Now, if we fix finish_depth_calculation() to actually do the work for
all possible candidates, and the sort afterwards, the result is:

LANG=C devgit describe --tags --debug --candidates=10
searching to describe HEAD
 lightweight   129847 emacs-25.1
 lightweight   129864 emacs-25.1-rc2
 lightweight   129904 emacs-25.1-rc1
 annotated     129981 emacs-25.2
 lightweight   130004 emacs-25.0.95
 annotated     130016 emacs-25.2-rc2
 annotated     130032 emacs-25.2-rc1
 annotated     130076 emacs-25.1.91
 lightweight   130083 emacs-25.0.94
 lightweight   130125 emacs-25.1.90
traversed 130257 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
emacs-25.1-129847-gdcc3ef3ee7

LANG=C devgit describe --tags --debug --candidates=100

searching to describe HEAD
 annotated      13470 emacs-24.5-rc3-fixed
 lightweight    13471 emacs-24.5-rc3
 lightweight    13476 emacs-24.5-rc2
 lightweight    13482 emacs-24.5-rc1
 lightweight    13511 emacs-24.4.91
 lightweight    13554 emacs-24.4.90
 lightweight    13899 emacs-24.4
 lightweight    13902 emacs-24.4-rc1
 lightweight    13959 emacs-24.3.94
 annotated      13972 mh-e-8.6
 lightweight    14049 emacs-24.3.93
 lightweight    14176 emacs-24.3.92
 lightweight   129847 emacs-25.1
 lightweight   129864 emacs-25.1-rc2
 lightweight   129904 emacs-25.1-rc1
 lightweight   129971 emacs-25.0.92
 annotated     129981 emacs-25.2
 lightweight   130004 emacs-25.0.95
 annotated     130016 emacs-25.2-rc2
 annotated     130032 emacs-25.2-rc1
 annotated     130076 emacs-25.1.91
 lightweight   130083 emacs-25.0.94
 lightweight   130099 emacs-25.0.91
 lightweight   130125 emacs-25.1.90
 lightweight   130239 emacs-25.0.93
 lightweight   130255 emacs-25.0.90
traversed 130257 commits
more than 26 tags found; listed 26 most recent
gave up search at 96b894717caa773aa6d98ff57385f1c7537e8972
emacs-24.5-rc3-fixed-13470-gdcc3ef3ee7

At least the latter kind of makes sense:
git rev-list --count emacs-24.5-rc3-fixed..dcc3ef3ee7
13470

In short, I think that depth calculation is heavily problematic. First,
commit_list_insert_by_date() influences which tags are considered at all.
Second, and worse, the code for "describe cmt" clearly tries to
calculate the number of commits in cmt that are not in t (for each
candidate t), which ought to be "rev-list --count t..cmt", but fails
spectacularly.

I kind of suspect the priming of the tag depth on first encounter
("t->depth = seen_commits - 1;") to be the culprit because that depends
on the way we walk, not the topology, but I wouldn't know how to do better.

Michael

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-08-29 13:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-26 14:47 git describe and "the smallest number of commits possible" Kévin Le Gouguec
2017-08-28 18:24 ` Stefan Beller
2017-08-29 13:34   ` Michael J Gruber

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).