git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* branch --contains / tag --merged inconsistency
@ 2018-04-27 14:30 Ferenc Wágner
  2018-04-27 16:03 ` SZEDER Gábor
  0 siblings, 1 reply; 4+ messages in thread
From: Ferenc Wágner @ 2018-04-27 14:30 UTC (permalink / raw)
  To: git; +Cc: wferi

Hi,

I'm moving the IRC discussion here, because this might be a bug report
in the end.  So, kindly try these steps (103 MB free space required):

$ git clone https://github.com/ClusterLabs/pacemaker.git && cd pacemaker
[...]
$ git branch --contains Pacemaker-0.6.1
* master
$ git tag --merged master | fgrep Pacemaker-0.6
Pacemaker-0.6.0
Pacemaker-0.6.2
Pacemaker-0.6.3
Pacemaker-0.6.4
Pacemaker-0.6.5
Pacemaker-0.6.6

Notice that Pacemaker-0.6.1 is missing from the output.  Kind people on
IRC didn't find a quick explanation, and we all had to go eventually.
Is this expected behavior?  Reproduced with git 2.11.0 and 2.17.0.
-- 
Thanks,
Feri

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: branch --contains / tag --merged inconsistency
  2018-04-27 14:30 branch --contains / tag --merged inconsistency Ferenc Wágner
@ 2018-04-27 16:03 ` SZEDER Gábor
  2018-04-28  6:27   ` Ferenc Wágner
  2018-04-30 14:01   ` Derrick Stolee
  0 siblings, 2 replies; 4+ messages in thread
From: SZEDER Gábor @ 2018-04-27 16:03 UTC (permalink / raw)
  To: Ferenc Wágner; +Cc: SZEDER Gábor, git

Szia Feri,

> I'm moving the IRC discussion here, because this might be a bug report
> in the end.  So, kindly try these steps (103 MB free space required):
> 
> $ git clone https://github.com/ClusterLabs/pacemaker.git && cd pacemaker
> [...]
> $ git branch --contains Pacemaker-0.6.1
> * master
> $ git tag --merged master | fgrep Pacemaker-0.6
> Pacemaker-0.6.0
> Pacemaker-0.6.2
> Pacemaker-0.6.3
> Pacemaker-0.6.4
> Pacemaker-0.6.5
> Pacemaker-0.6.6
> 
> Notice that Pacemaker-0.6.1 is missing from the output.  Kind people on
> IRC didn't find a quick explanation, and we all had to go eventually.
> Is this expected behavior?  Reproduced with git 2.11.0 and 2.17.0.

The commit pointed to by the tag Pacemaker-0.6.1 and its parent have a
serious clock skew, i.e. they are a few months older than their parents:

$ git log --format='%h %ad %cd%d%n    %s' --date=short Pacemaker-0.6.1^..47a8ef4c
47a8ef4ce 2008-02-15 2008-02-15
    Low: TE: Logging - display the op's magic field for unexpected and foreign events
b9cfcd6b4 2007-12-10 2007-12-10 (tag: Pacemaker-0.6.2)
    haresources2cib.py: set default-action-timeout to the default (20s)
52e7793e0 2007-12-10 2007-12-10
    haresources2cib.py: update ra parameters lists
dea277271 2008-02-14 2008-02-14
    Medium: Build: Turn on snmp support in rpm packages (patch from MATSUDA, Daiki)
f418742fe 2008-02-14 2008-02-14
    Low: Build: Update the .spec file with the one used by build service
ccfa716a5 2008-02-14 2008-02-14
    Medium: SNMP: Allow the snmp subagent to be built (patch from MATSUDA, Daiki)
50f0ade2d 2008-02-14 2008-02-14
    Low: Build: Update last release number
90f11667f 2008-02-14 2008-02-14
    Medium: Tools: Make sure the autoconf variables in haresources2cib are expanded
9d2383c46 2008-02-11 2008-02-11 (tag: Pacemaker-0.6.1)
    High: cib: Ensure the archived file hits the disk before returning

(branch|tag|describe|...) (--contains|--merged) use the commit timestamp
information as a heuristic to avoid traversing parts of the history,
which makes these operations, especially on big histories, an order of
magnitude or two faster.  Yeah, commit timestamps can't always be
trusted, but skewed commits are rare, and skewed commits with this much
skew are even rarer.

I'm not sure how (or if it's at all possible) to turn off this
timestamp-based optimisation.

FWIW, much work is being done on a cached commit graph including commit
generation numbers, which will solve this issue both correctly and more
efficiently.  Perhaps it will already be included in the next release.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: branch --contains / tag --merged inconsistency
  2018-04-27 16:03 ` SZEDER Gábor
@ 2018-04-28  6:27   ` Ferenc Wágner
  2018-04-30 14:01   ` Derrick Stolee
  1 sibling, 0 replies; 4+ messages in thread
From: Ferenc Wágner @ 2018-04-28  6:27 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

SZEDER Gábor <szeder.dev@gmail.com> writes:

> The commit pointed to by the tag Pacemaker-0.6.1 and its parent have a
> serious clock skew, i.e. they are a few months older than their parents:
> [...]
> (branch|tag|describe|...) (--contains|--merged) use the commit timestamp
> information as a heuristic to avoid traversing parts of the history,
> which makes these operations, especially on big histories, an order of
> magnitude or two faster.  Yeah, commit timestamps can't always be
> trusted, but skewed commits are rare, and skewed commits with this much
> skew are even rarer.

Szia Gábor,

Thank you very much for the explanation!  Good to know there's no
serious problem or misunderstanding here, but a conscious algorithmic
choice being fooled by clock skew.  I wonder if there's a way such clock
skew can appear without actual committer clocks being off by months...

> FWIW, much work is being done on a cached commit graph including commit
> generation numbers, which will solve this issue both correctly and more
> efficiently.  Perhaps it will already be included in the next release.

Wonderful news, thanks for sharing it!
-- 
Regards,
Feri

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: branch --contains / tag --merged inconsistency
  2018-04-27 16:03 ` SZEDER Gábor
  2018-04-28  6:27   ` Ferenc Wágner
@ 2018-04-30 14:01   ` Derrick Stolee
  1 sibling, 0 replies; 4+ messages in thread
From: Derrick Stolee @ 2018-04-30 14:01 UTC (permalink / raw)
  To: SZEDER Gábor, Ferenc Wágner; +Cc: git

On 4/27/2018 12:03 PM, SZEDER Gábor wrote:
> Szia Feri,
>
>> I'm moving the IRC discussion here, because this might be a bug report
>> in the end.  So, kindly try these steps (103 MB free space required):
>>
>> $ git clone https://github.com/ClusterLabs/pacemaker.git && cd pacemaker
>> [...]
>> $ git branch --contains Pacemaker-0.6.1
>> * master
>> $ git tag --merged master | fgrep Pacemaker-0.6
>> Pacemaker-0.6.0
>> Pacemaker-0.6.2
>> Pacemaker-0.6.3
>> Pacemaker-0.6.4
>> Pacemaker-0.6.5
>> Pacemaker-0.6.6
>>
>> Notice that Pacemaker-0.6.1 is missing from the output.  Kind people on
>> IRC didn't find a quick explanation, and we all had to go eventually.
>> Is this expected behavior?  Reproduced with git 2.11.0 and 2.17.0.
> The commit pointed to by the tag Pacemaker-0.6.1 and its parent have a
> serious clock skew, i.e. they are a few months older than their parents:
>
> $ git log --format='%h %ad %cd%d%n    %s' --date=short Pacemaker-0.6.1^..47a8ef4c
> 47a8ef4ce 2008-02-15 2008-02-15
>      Low: TE: Logging - display the op's magic field for unexpected and foreign events
> b9cfcd6b4 2007-12-10 2007-12-10 (tag: Pacemaker-0.6.2)
>      haresources2cib.py: set default-action-timeout to the default (20s)
> 52e7793e0 2007-12-10 2007-12-10
>      haresources2cib.py: update ra parameters lists
> dea277271 2008-02-14 2008-02-14
>      Medium: Build: Turn on snmp support in rpm packages (patch from MATSUDA, Daiki)
> f418742fe 2008-02-14 2008-02-14
>      Low: Build: Update the .spec file with the one used by build service
> ccfa716a5 2008-02-14 2008-02-14
>      Medium: SNMP: Allow the snmp subagent to be built (patch from MATSUDA, Daiki)
> 50f0ade2d 2008-02-14 2008-02-14
>      Low: Build: Update last release number
> 90f11667f 2008-02-14 2008-02-14
>      Medium: Tools: Make sure the autoconf variables in haresources2cib are expanded
> 9d2383c46 2008-02-11 2008-02-11 (tag: Pacemaker-0.6.1)
>      High: cib: Ensure the archived file hits the disk before returning
>
> (branch|tag|describe|...) (--contains|--merged) use the commit timestamp
> information as a heuristic to avoid traversing parts of the history,
> which makes these operations, especially on big histories, an order of
> magnitude or two faster.  Yeah, commit timestamps can't always be
> trusted, but skewed commits are rare, and skewed commits with this much
> skew are even rarer.
>
> I'm not sure how (or if it's at all possible) to turn off this
> timestamp-based optimisation.

This is actually a bit more complicated. The "--merged" check in 'git 
tag' uses a different mechanism to detect which tags are reachable. It 
uses a revision walk starting at the "merge commit" (master in your 
case) and all tags with the "limited" option (to ensure the walk happens 
during prepare_revision_walk()) but marks the merge commit as 
UNINTERESTING. The limit_list() method stops when all commits are marked 
UNINTERESTING - minus some "slop" related to the commits that start the 
walk.

One important note: the set of tags is important here. If you add a new 
tag to the root commit (git tag MyTag a2d71961f) then the walk succeeds 
by ensuring it walks until MyTag. This gets around the clock skew issue. 
There may be other more-recent tags with a clock-skew issue, but since 
Pacemaker-0.6.0 is the oldest tag, that requires the walk to continue 
until at least that date.

The commit-walk machinery in revision.c is rather complicated, and is 
used for a lot of different reasons, such as "git log" and this 
application in "git tag". It is on my list to refactor this code to use 
the commit-graph and generation numbers, but as we can see by this 
example, it is not easy to tease out what is happening in the code.

In a world where generation numbers are expected to be available, we 
could rewrite do_merge_filter() in ref-filter.c to call into 
paint_down_to_common() in commit.c using the new "min_generation" 
marker. By assigning the tags to be in the "twos" list and the merge 
commit in the "one" commit, we can check if the tags have the PARENT1 
flag after the walk in paint_down_to_common(). Due to the static nature 
of paint_down_to_common(), we will likely want to abstract this into an 
external method in commit.c, say can_reach_many(struct commit *from, 
struct commit_list *to).

> FWIW, much work is being done on a cached commit graph including commit
> generation numbers, which will solve this issue both correctly and more
> efficiently.  Perhaps it will already be included in the next release.

The work in ds/generation-numbers is focused on the "git tag --contains" 
method, which does return correctly here (it is the reverse of the 
--merged condition):

Which tags can reach Pacemaker-0.6.1?

$ git tag --contains Pacemaker-0.6.1
(returns a big list)

This is the actual reverse lookup (which branches contain this tag?)

$ git branch --contains Pacemaker-0.6.1 | grep master
* master

These commands work despite clock skew. The commit-graph feature makes 
them faster.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-04-30 14:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-27 14:30 branch --contains / tag --merged inconsistency Ferenc Wágner
2018-04-27 16:03 ` SZEDER Gábor
2018-04-28  6:27   ` Ferenc Wágner
2018-04-30 14:01   ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).