git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Possible bug in git describe, additional commits differs when cloned with --depth
@ 2019-09-27  9:51 Anders Janmyr
  2019-09-27 12:02 ` SZEDER Gábor
  0 siblings, 1 reply; 3+ messages in thread
From: Anders Janmyr @ 2019-09-27  9:51 UTC (permalink / raw)
  To: git

Hi,

I'm not sure if this is a bug or not but `git describe` gives
different results when the repo has been cloned with `--depth` or not.

In the example below from the git repository the number of additional
commits since the
last tag differs 256 vs. 265.

```
$ git clone https://github.com/git/git
$ cd git/
$ git describe
v2.23.0-256-g4c86140027
$ git rev-list -n 1 HEAD
4c86140027f4a0d2caaa3ab4bd8bfc5ce3c11c8a


$ git clone --depth=50 https://github.com/git/git git-depth
$ cd git-depth/
$ git describe
v2.23.0-265-g4c861400
$ git rev-list -n 1 HEAD
4c86140027f4a0d2caaa3ab4bd8bfc5ce3c11c8a
```

In the example above the first version also gives additional digits
for the SHA,
g4c86140027 vs. g4c861400, but that is not always the case.

I came upon this discrepancy when using the value from `git describe`
in a build pipeline
where the system optimized the build by cloning the repo using `--depth`.

Kind regards
--
http://anders.janmyr.com/
@andersjanmyr
+46 733-99 03 60

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Possible bug in git describe, additional commits differs when cloned with --depth
  2019-09-27  9:51 Possible bug in git describe, additional commits differs when cloned with --depth Anders Janmyr
@ 2019-09-27 12:02 ` SZEDER Gábor
  2019-10-08 12:31   ` [BUG] a commit date-related bug in 'git describe' [was: Re: Possible bug in git describe, additional commits differs when cloned with --depth] SZEDER Gábor
  0 siblings, 1 reply; 3+ messages in thread
From: SZEDER Gábor @ 2019-09-27 12:02 UTC (permalink / raw)
  To: Anders Janmyr; +Cc: git

On Fri, Sep 27, 2019 at 11:51:07AM +0200, Anders Janmyr wrote:
> Hi,
> 
> I'm not sure if this is a bug or not but `git describe` gives
> different results when the repo has been cloned with `--depth` or not.
> 
> In the example below from the git repository the number of additional
> commits since the
> last tag differs 256 vs. 265.
> 
> ```
> $ git clone https://github.com/git/git
> $ cd git/
> $ git describe
> v2.23.0-256-g4c86140027
> $ git rev-list -n 1 HEAD
> 4c86140027f4a0d2caaa3ab4bd8bfc5ce3c11c8a
> 
> 
> $ git clone --depth=50 https://github.com/git/git git-depth
> $ cd git-depth/
> $ git describe
> v2.23.0-265-g4c861400
> $ git rev-list -n 1 HEAD
> 4c86140027f4a0d2caaa3ab4bd8bfc5ce3c11c8a
> ```

I don't think this is a bug, but rather an inherent limitation of
shallow histories with lots of merges, and it affects not only 'git
describe', but any limited history traversal.

In the Git project new features are developed on their dedicated
branches, which are then eventually merged to 'master'.  Alas, we make
mistakes, and sometimes we realize that a feature was buggy after it
has already been merged to 'master'.  In such cases the bugfix is
often applied not on top of 'master', but on top of the feature
branch, so it can be merged to maintenance releases as well.

This results in a history like this:

  M2     Merge the bugfix to 'master'
  |  \
  |   \
 v2.0  |
  |    o  Bugfix for new feature
 CO2   |
  |    |
  M1  /  Merge 'new feature' to 'master'
  | \/
  |  o   new feature
  |  |
  |  o
  |  |
  | CO1
  |  |    
  | /
 v1.0

Describing M2 in a full repository results in something like
v2.0-2-gdeadbeef, because M2 contains only two commits that aren't in
v2.0, (M2 and the bugfix).

Now let's suppose that in a shallow repo the given '--depth=<N>'
resulted in a cutoff at commits CO1 and CO2, meaning that the shallow
repo does not include commits M1 and v1.0.  Consequently, Git can't
possibly see that the commits implementing the new feature are already
merged and thus reachable from v2.0, so it will count those commits as
well, resulting in v2.0-5-gabcdef.

There is a lot more going on in the Git repository, so it's not as
simple as above.  Case in point is the merge d1a251a1fa (Merge branch
'en/checkout-mismerge-fix', 2019-09-09), which merges a fix to a bug
that happened before v2.22.0-rc0, but that bug was not introduced in
the feature branch, but while merging that branch to 'master'.  The
result is still the same, though: since there are a lot of commits on
the ancestry path between that buggy merge and v2.23.0, '--depth=50'
doesn't include them all in the shallow clone, so Git can't possibly
know that that merge is reachable from v2.23.0.

  # same in both the full and shallow repos
  $ git log --oneline v2.23.0..d1a251a1fa^ |wc -l
  57

  # in the full repo
  $ git log --oneline v2.23.0..d1a251a1fa |wc -l
  59

  # in the shallow repo
  $ git log --oneline v2.23.0..d1a251a1fa |wc -l
  132

> In the example above the first version also gives additional digits
> for the SHA,
> g4c86140027 vs. g4c861400, but that is not always the case.

Git shows as many hexdigits as needed to form a unique object name
with a few additional digits worth of safety margin.  There are a lot
more objects in the full repository than in the shallow clone, which
means more hexdigits in the abbreviated object name.


Thanks for letting us know.  I think this is worth a warning in
the documentation of 'git clone --depth'.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [BUG] a commit date-related bug in 'git describe' [was: Re: Possible bug in git describe, additional commits differs when cloned with --depth]
  2019-09-27 12:02 ` SZEDER Gábor
@ 2019-10-08 12:31   ` SZEDER Gábor
  0 siblings, 0 replies; 3+ messages in thread
From: SZEDER Gábor @ 2019-10-08 12:31 UTC (permalink / raw)
  To: git; +Cc: Anders Janmyr

On Fri, Sep 27, 2019 at 02:02:01PM +0200, SZEDER Gábor wrote:
> On Fri, Sep 27, 2019 at 11:51:07AM +0200, Anders Janmyr wrote:
> > I'm not sure if this is a bug or not but `git describe` gives
> > different results when the repo has been cloned with `--depth` or not.
> > 
> > In the example below from the git repository the number of additional
> > commits since the
> > last tag differs 256 vs. 265.
> > 
> > ```
> > $ git clone https://github.com/git/git
> > $ cd git/
> > $ git describe
> > v2.23.0-256-g4c86140027
> > $ git rev-list -n 1 HEAD
> > 4c86140027f4a0d2caaa3ab4bd8bfc5ce3c11c8a
> > 
> > 
> > $ git clone --depth=50 https://github.com/git/git git-depth
> > $ cd git-depth/
> > $ git describe
> > v2.23.0-265-g4c861400
> > $ git rev-list -n 1 HEAD
> > 4c86140027f4a0d2caaa3ab4bd8bfc5ce3c11c8a
> > ```
> 
> I don't think this is a bug, but rather an inherent limitation of
> shallow histories with lots of merges, and it affects not only 'git
> describe', but any limited history traversal.
> 
> In the Git project new features are developed on their dedicated
> branches, which are then eventually merged to 'master'.  Alas, we make
> mistakes, and sometimes we realize that a feature was buggy after it
> has already been merged to 'master'.  In such cases the bugfix is
> often applied not on top of 'master', but on top of the feature
> branch, so it can be merged to maintenance releases as well.
> 
> This results in a history like this:
> 
>   M2     Merge the bugfix to 'master'
>   |  \
>   |   \
>  v2.0  |
>   |    o  Bugfix for new feature
>  CO2   |
>   |    |
>   M1  /  Merge 'new feature' to 'master'
>   | \/
>   |  o   new feature
>   |  |
>   |  o
>   |  |
>   | CO1
>   |  |    
>   | /
>  v1.0
> 
> Describing M2 in a full repository results in something like
> v2.0-2-gdeadbeef, because M2 contains only two commits that aren't in
> v2.0, (M2 and the bugfix).
> 
> Now let's suppose that in a shallow repo the given '--depth=<N>'
> resulted in a cutoff at commits CO1 and CO2, meaning that the shallow
> repo does not include commits M1 and v1.0.  Consequently, Git can't
> possibly see that the commits implementing the new feature are already
> merged and thus reachable from v2.0, so it will count those commits as
> well, resulting in v2.0-5-gabcdef.
> 
> There is a lot more going on in the Git repository, so it's not as
> simple as above.  Case in point is the merge d1a251a1fa (Merge branch
> 'en/checkout-mismerge-fix', 2019-09-09), which merges a fix to a bug
> that happened before v2.22.0-rc0, but that bug was not introduced in
> the feature branch, but while merging that branch to 'master'.  The
> result is still the same, though: since there are a lot of commits on
> the ancestry path between that buggy merge and v2.23.0, '--depth=50'
> doesn't include them all in the shallow clone, so Git can't possibly
> know that that merge is reachable from v2.23.0.
> 
>   # same in both the full and shallow repos
>   $ git log --oneline v2.23.0..d1a251a1fa^ |wc -l
>   57
> 
>   # in the full repo
>   $ git log --oneline v2.23.0..d1a251a1fa |wc -l
>   59
> 
>   # in the shallow repo
>   $ git log --oneline v2.23.0..d1a251a1fa |wc -l
>   132

Ok, so above I explained why 'git describe' is not buggy in a shallow
clone, but rather this is the best it can do under the circumstances.

Unfortunately (or but of course?), 'git describe' is in fact buggy,
but in a different way, and curiously the above mentioned commit
d1a251a1fa shows why.  Bear with me.

First, the desired behavior according to the manpage of 'git
describe':

  ... it suffixes the tag name with the number of additional commits
  on top of the tagged object ...

The Examples section in the manpage is even more explicit:

  The number of additional commits is the number of commits which
  would be displayed by "git log <tag>..<described-commit>".

Now let's have a look at that problematic commit:

  $ git describe d1a251a1fa
  v2.23.0-135-gd1a251a1fa
  $ git log --oneline v2.23.0..d1a251a1fa | wc -l
  59

Uh-oh, 59 != 135.

This is happening because:

  - Git is too fast ;) and the committer date has a one second
    granularity, so scripts can easily create subsequent commits with
    the same committer date.  Case in point are the two subsequent
    merge commits f3c19f85c5 and 4a3ed2bec6 at the bottom of this
    simplified history snippet (kind of a hand-edited 'git log --graph
    --format="%h %cd %s"'):

    *   d1a251a1fa 2019-09-09 12:26:36 -0700 Merge branch 'en/checkout-mismerge-fix'
    |\
    * | ... a big chunk of history simplified away ...
    | * acb7da05ac 2019-08-16 09:58:00 -0700 checkout: remove duplicate code
    * | a5e4be2f68 2019-04-25 16:41:15 +0900 Merge branch 'ab/commit-graph-fixes'
    * | f3c19f85c5 2019-04-25 16:41:14 +0900 Merge branch 'ab/gc-reflog'
    |/
    *   4a3ed2bec6 2019-04-25 16:41:14 +0900 Merge branch 'nd/checkout-m'

  - 'git describe' implements its own history traversal: it iterates
    over all parents of a commit, adds any yet unseen parents to a
    commit list ordered by date, and then continues with the first,
    i.e. most recent commit from that list.  While doing so it uses
    several bits in 'commit->object.flags' to track reachability
    information from several candidate tags at once, and copies these
    flags from each commit to its parents; this is important to
    compute the correct number of additional commits.  Another
    important thing is the implementation detail that
    commit_list_insert_by_date() inserts a new commit after all other
    commits with the same date that are already in the list.

So while traversing the above piece of history in date order
acb7da05ac is visited before a5e4be2f68, which is OK in itself as they
are on parallel running branches.  However, it means that 4a3ed2bec6
is added to the commit list before f3c19f85c5, which in turn means
that 4a3ed2bec6 is visited before f3c19f85c5, even though the former
is the parent of the latter.  Consequently, those reachability flags
that are important to compute the number of additional commits were
copied from 4a3ed2bec6 to its parents before those flags could be
updated with the values from f3c19f85c5, which makes many of the
ancestor commits incorrectly appear as unreachable from the candidate
tags.  Ultimately this causes that 135 != 59 difference shown above.

There are several thousands commits in git.git where 'git describe'
reports incorrect number of additional commits.

Now, I suppose that this particular case could be fixed by modifying
commit_list_insert_by_date() (or use a new variant of it) to insert a
new commit before all other commits with the same date that are
already in the list.  However, I also suppose that a clock skew could
lead to similar incorrect additional commit counts, and merely
tweaking the details of commit_list_insert_by_date() could not help
with that.

I think the proper way to fix this issue would be to make 'git
describe' traverse the history in topographical order.  Alas, I'm
afraid this would result in a noticable performance penalty on big
histories without a commit graph.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-10-08 12:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-27  9:51 Possible bug in git describe, additional commits differs when cloned with --depth Anders Janmyr
2019-09-27 12:02 ` SZEDER Gábor
2019-10-08 12:31   ` [BUG] a commit date-related bug in 'git describe' [was: Re: Possible bug in git describe, additional commits differs when cloned with --depth] SZEDER Gábor

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).