git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>, "Ferenc Wágner" <wferi@niif.hu>
Cc: git@vger.kernel.org
Subject: Re: branch --contains / tag --merged inconsistency
Date: Mon, 30 Apr 2018 10:01:47 -0400	[thread overview]
Message-ID: <f73a956d-2782-11b3-f84b-67ce1e0ae047@gmail.com> (raw)
In-Reply-To: <20180427160344.12724-1-szeder.dev@gmail.com>

On 4/27/2018 12:03 PM, SZEDER Gábor wrote:
> Szia Feri,
>
>> I'm moving the IRC discussion here, because this might be a bug report
>> in the end.  So, kindly try these steps (103 MB free space required):
>>
>> $ git clone https://github.com/ClusterLabs/pacemaker.git && cd pacemaker
>> [...]
>> $ git branch --contains Pacemaker-0.6.1
>> * master
>> $ git tag --merged master | fgrep Pacemaker-0.6
>> Pacemaker-0.6.0
>> Pacemaker-0.6.2
>> Pacemaker-0.6.3
>> Pacemaker-0.6.4
>> Pacemaker-0.6.5
>> Pacemaker-0.6.6
>>
>> Notice that Pacemaker-0.6.1 is missing from the output.  Kind people on
>> IRC didn't find a quick explanation, and we all had to go eventually.
>> Is this expected behavior?  Reproduced with git 2.11.0 and 2.17.0.
> The commit pointed to by the tag Pacemaker-0.6.1 and its parent have a
> serious clock skew, i.e. they are a few months older than their parents:
>
> $ git log --format='%h %ad %cd%d%n    %s' --date=short Pacemaker-0.6.1^..47a8ef4c
> 47a8ef4ce 2008-02-15 2008-02-15
>      Low: TE: Logging - display the op's magic field for unexpected and foreign events
> b9cfcd6b4 2007-12-10 2007-12-10 (tag: Pacemaker-0.6.2)
>      haresources2cib.py: set default-action-timeout to the default (20s)
> 52e7793e0 2007-12-10 2007-12-10
>      haresources2cib.py: update ra parameters lists
> dea277271 2008-02-14 2008-02-14
>      Medium: Build: Turn on snmp support in rpm packages (patch from MATSUDA, Daiki)
> f418742fe 2008-02-14 2008-02-14
>      Low: Build: Update the .spec file with the one used by build service
> ccfa716a5 2008-02-14 2008-02-14
>      Medium: SNMP: Allow the snmp subagent to be built (patch from MATSUDA, Daiki)
> 50f0ade2d 2008-02-14 2008-02-14
>      Low: Build: Update last release number
> 90f11667f 2008-02-14 2008-02-14
>      Medium: Tools: Make sure the autoconf variables in haresources2cib are expanded
> 9d2383c46 2008-02-11 2008-02-11 (tag: Pacemaker-0.6.1)
>      High: cib: Ensure the archived file hits the disk before returning
>
> (branch|tag|describe|...) (--contains|--merged) use the commit timestamp
> information as a heuristic to avoid traversing parts of the history,
> which makes these operations, especially on big histories, an order of
> magnitude or two faster.  Yeah, commit timestamps can't always be
> trusted, but skewed commits are rare, and skewed commits with this much
> skew are even rarer.
>
> I'm not sure how (or if it's at all possible) to turn off this
> timestamp-based optimisation.

This is actually a bit more complicated. The "--merged" check in 'git 
tag' uses a different mechanism to detect which tags are reachable. It 
uses a revision walk starting at the "merge commit" (master in your 
case) and all tags with the "limited" option (to ensure the walk happens 
during prepare_revision_walk()) but marks the merge commit as 
UNINTERESTING. The limit_list() method stops when all commits are marked 
UNINTERESTING - minus some "slop" related to the commits that start the 
walk.

One important note: the set of tags is important here. If you add a new 
tag to the root commit (git tag MyTag a2d71961f) then the walk succeeds 
by ensuring it walks until MyTag. This gets around the clock skew issue. 
There may be other more-recent tags with a clock-skew issue, but since 
Pacemaker-0.6.0 is the oldest tag, that requires the walk to continue 
until at least that date.

The commit-walk machinery in revision.c is rather complicated, and is 
used for a lot of different reasons, such as "git log" and this 
application in "git tag". It is on my list to refactor this code to use 
the commit-graph and generation numbers, but as we can see by this 
example, it is not easy to tease out what is happening in the code.

In a world where generation numbers are expected to be available, we 
could rewrite do_merge_filter() in ref-filter.c to call into 
paint_down_to_common() in commit.c using the new "min_generation" 
marker. By assigning the tags to be in the "twos" list and the merge 
commit in the "one" commit, we can check if the tags have the PARENT1 
flag after the walk in paint_down_to_common(). Due to the static nature 
of paint_down_to_common(), we will likely want to abstract this into an 
external method in commit.c, say can_reach_many(struct commit *from, 
struct commit_list *to).

> FWIW, much work is being done on a cached commit graph including commit
> generation numbers, which will solve this issue both correctly and more
> efficiently.  Perhaps it will already be included in the next release.

The work in ds/generation-numbers is focused on the "git tag --contains" 
method, which does return correctly here (it is the reverse of the 
--merged condition):

Which tags can reach Pacemaker-0.6.1?

$ git tag --contains Pacemaker-0.6.1
(returns a big list)

This is the actual reverse lookup (which branches contain this tag?)

$ git branch --contains Pacemaker-0.6.1 | grep master
* master

These commands work despite clock skew. The commit-graph feature makes 
them faster.

Thanks,
-Stolee


      parent reply	other threads:[~2018-04-30 14:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-27 14:30 branch --contains / tag --merged inconsistency Ferenc Wágner
2018-04-27 16:03 ` SZEDER Gábor
2018-04-28  6:27   ` Ferenc Wágner
2018-04-30 14:01   ` Derrick Stolee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f73a956d-2782-11b3-f84b-67ce1e0ae047@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=szeder.dev@gmail.com \
    --cc=wferi@niif.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).