git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* Revision walking, commit dates, slop
@ 2019-05-18  0:54 Mike Hommey
  2019-05-18  1:50 ` SZEDER Gábor
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Hommey @ 2019-05-18  0:54 UTC (permalink / raw)
  To: git

Hi,

There are established corner cases, where in a repo where commit dates
are not monotonically increasing, revision walking can go horribly
wrong. This was discussed in the past in e.g.
https://public-inbox.org/git/20150521061553.GA29269@glandium.org/

The only (simple) workable way, given the current algorithm, to get an
accurate view off rev-list is to essentially make slop infinite. This
works fine, at the expense of runtime.

Now, ignoring any modification for the above, I'm hitting another corner
case in some other "weird" history, where I have 500k commits all with
the same date. With such a commit dag, something as trivial as
`git rev-list HEAD~..HEAD` goes through all commits from the root commit
to HEAD, which takes multiple seconds, when the (obvious) output is one
commit.

It looks like the only way revision walking stops going through all the
ancestry is through slop, and slop is essentially made infinite by the
fact all commits have the same date (because of the date check in
still_interesting(). By extension, this means the workaound for the
first corner case above, which is to make slop infinite, essentially
makes all rev walking go through the entire ancestry of the commits
given on the command line.

It feels like some cases of everybody_uninteresting should shorcut slop
entirely, but considering the only way for slop to decrease at all is
when everybody_uninteresting returns true, that would seem like a wrong
assumption. But I'm also not sure what slop helps with in the first
place (but I don't have a clear view of the broader picture of how the
entire revision walking works).

Anyways, a rather easy way to witness this happening is to create a
dummy repo like:
  git init foo
  cd foo
  for i in $(seq 1 50); do
    echo $i > a;
    git add a;
    git commit -a -m $i;
  done

The something as simple as `git rev-list HEAD~..HEAD` will go through
all 50 commits (assuming the script above created commits in the same
second, which it did on my machine)

By the time both HEAD~ and HEAD have been processed, the revision
walking should have enough information to determine that it doesn't need
to go further, but still does. Even with something like HEAD~2..HEAD,
after the first round of processing parents it should be able to see
there's not going to be any more interesting commits.

I'm willing to dig into this, but if someone familiar with the
algorithm could give me some hints as to what I might be missing in the
big picture, that would be helpful.

Cheers,

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18  0:54 Revision walking, commit dates, slop Mike Hommey
@ 2019-05-18  1:50 ` SZEDER Gábor
  2019-05-18  3:58   ` Mike Hommey
  2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder
  0 siblings, 2 replies; 23+ messages in thread
From: SZEDER Gábor @ 2019-05-18  1:50 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git

On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> There are established corner cases, where in a repo where commit dates
> are not monotonically increasing, revision walking can go horribly
> wrong. This was discussed in the past in e.g.
> https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
> 
> The only (simple) workable way, given the current algorithm, to get an
> accurate view off rev-list is to essentially make slop infinite. This
> works fine, at the expense of runtime.
> 
> Now, ignoring any modification for the above, I'm hitting another corner
> case in some other "weird" history, where I have 500k commits all with
> the same date. With such a commit dag, something as trivial as
> `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> to HEAD, which takes multiple seconds, when the (obvious) output is one
> commit.
> 
> It looks like the only way revision walking stops going through all the
> ancestry is through slop, and slop is essentially made infinite by the
> fact all commits have the same date (because of the date check in
> still_interesting(). By extension, this means the workaound for the
> first corner case above, which is to make slop infinite, essentially
> makes all rev walking go through the entire ancestry of the commits
> given on the command line.
> 
> It feels like some cases of everybody_uninteresting should shorcut slop
> entirely, but considering the only way for slop to decrease at all is
> when everybody_uninteresting returns true, that would seem like a wrong
> assumption. But I'm also not sure what slop helps with in the first
> place (but I don't have a clear view of the broader picture of how the
> entire revision walking works).
> 
> Anyways, a rather easy way to witness this happening is to create a
> dummy repo like:
>   git init foo
>   cd foo
>   for i in $(seq 1 50); do
>     echo $i > a;
>     git add a;
>     git commit -a -m $i;
>   done
> 
> The something as simple as `git rev-list HEAD~..HEAD` will go through
> all 50 commits (assuming the script above created commits in the same
> second, which it did on my machine)
> 
> By the time both HEAD~ and HEAD have been processed, the revision
> walking should have enough information to determine that it doesn't need
> to go further, but still does. Even with something like HEAD~2..HEAD,
> after the first round of processing parents it should be able to see
> there's not going to be any more interesting commits.
> 
> I'm willing to dig into this, but if someone familiar with the
> algorithm could give me some hints as to what I might be missing in the
> big picture, that would be helpful.

All the above is without commit-graph, I presume?  If so, then you
should give it a try, as it might bring immediate help in your
pathological repo.  With 5k commit in the same second (enforced via
'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:

  $ best-of-five -q git rev-list HEAD~..HEAD
  0.069
  $ git commit-graph write --reachableComputing commit graph generation
  numbers: 100% (5000/5000), done.
  $ best-of-five -q git rev-list HEAD~..HEAD
  0.004



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18  1:50 ` SZEDER Gábor
@ 2019-05-18  3:58   ` Mike Hommey
  2019-05-18  4:17     ` Mike Hommey
  2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder
  1 sibling, 1 reply; 23+ messages in thread
From: Mike Hommey @ 2019-05-18  3:58 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
> On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> > There are established corner cases, where in a repo where commit dates
> > are not monotonically increasing, revision walking can go horribly
> > wrong. This was discussed in the past in e.g.
> > https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
> > 
> > The only (simple) workable way, given the current algorithm, to get an
> > accurate view off rev-list is to essentially make slop infinite. This
> > works fine, at the expense of runtime.
> > 
> > Now, ignoring any modification for the above, I'm hitting another corner
> > case in some other "weird" history, where I have 500k commits all with
> > the same date. With such a commit dag, something as trivial as
> > `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> > to HEAD, which takes multiple seconds, when the (obvious) output is one
> > commit.
> > 
> > It looks like the only way revision walking stops going through all the
> > ancestry is through slop, and slop is essentially made infinite by the
> > fact all commits have the same date (because of the date check in
> > still_interesting(). By extension, this means the workaound for the
> > first corner case above, which is to make slop infinite, essentially
> > makes all rev walking go through the entire ancestry of the commits
> > given on the command line.
> > 
> > It feels like some cases of everybody_uninteresting should shorcut slop
> > entirely, but considering the only way for slop to decrease at all is
> > when everybody_uninteresting returns true, that would seem like a wrong
> > assumption. But I'm also not sure what slop helps with in the first
> > place (but I don't have a clear view of the broader picture of how the
> > entire revision walking works).
> > 
> > Anyways, a rather easy way to witness this happening is to create a
> > dummy repo like:
> >   git init foo
> >   cd foo
> >   for i in $(seq 1 50); do
> >     echo $i > a;
> >     git add a;
> >     git commit -a -m $i;
> >   done
> > 
> > The something as simple as `git rev-list HEAD~..HEAD` will go through
> > all 50 commits (assuming the script above created commits in the same
> > second, which it did on my machine)
> > 
> > By the time both HEAD~ and HEAD have been processed, the revision
> > walking should have enough information to determine that it doesn't need
> > to go further, but still does. Even with something like HEAD~2..HEAD,
> > after the first round of processing parents it should be able to see
> > there's not going to be any more interesting commits.
> > 
> > I'm willing to dig into this, but if someone familiar with the
> > algorithm could give me some hints as to what I might be missing in the
> > big picture, that would be helpful.
> 
> All the above is without commit-graph, I presume?  If so, then you
> should give it a try, as it might bring immediate help in your
> pathological repo.  With 5k commit in the same second (enforced via
> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
> 
>   $ best-of-five -q git rev-list HEAD~..HEAD
>   0.069
>   $ git commit-graph write --reachableComputing commit graph generation
>   numbers: 100% (5000/5000), done.
>   $ best-of-five -q git rev-list HEAD~..HEAD
>   0.004

I'm not observing any difference from using commit-graph, whether in
time or in the number of commits that are looked at in limit_list().

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18  3:58   ` Mike Hommey
@ 2019-05-18  4:17     ` Mike Hommey
  2019-05-18 12:01       ` SZEDER Gábor
  2019-05-20  1:33       ` Derrick Stolee
  0 siblings, 2 replies; 23+ messages in thread
From: Mike Hommey @ 2019-05-18  4:17 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
> > On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> > > There are established corner cases, where in a repo where commit dates
> > > are not monotonically increasing, revision walking can go horribly
> > > wrong. This was discussed in the past in e.g.
> > > https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
> > > 
> > > The only (simple) workable way, given the current algorithm, to get an
> > > accurate view off rev-list is to essentially make slop infinite. This
> > > works fine, at the expense of runtime.
> > > 
> > > Now, ignoring any modification for the above, I'm hitting another corner
> > > case in some other "weird" history, where I have 500k commits all with
> > > the same date. With such a commit dag, something as trivial as
> > > `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> > > to HEAD, which takes multiple seconds, when the (obvious) output is one
> > > commit.
> > > 
> > > It looks like the only way revision walking stops going through all the
> > > ancestry is through slop, and slop is essentially made infinite by the
> > > fact all commits have the same date (because of the date check in
> > > still_interesting(). By extension, this means the workaound for the
> > > first corner case above, which is to make slop infinite, essentially
> > > makes all rev walking go through the entire ancestry of the commits
> > > given on the command line.
> > > 
> > > It feels like some cases of everybody_uninteresting should shorcut slop
> > > entirely, but considering the only way for slop to decrease at all is
> > > when everybody_uninteresting returns true, that would seem like a wrong
> > > assumption. But I'm also not sure what slop helps with in the first
> > > place (but I don't have a clear view of the broader picture of how the
> > > entire revision walking works).
> > > 
> > > Anyways, a rather easy way to witness this happening is to create a
> > > dummy repo like:
> > >   git init foo
> > >   cd foo
> > >   for i in $(seq 1 50); do
> > >     echo $i > a;
> > >     git add a;
> > >     git commit -a -m $i;
> > >   done
> > > 
> > > The something as simple as `git rev-list HEAD~..HEAD` will go through
> > > all 50 commits (assuming the script above created commits in the same
> > > second, which it did on my machine)
> > > 
> > > By the time both HEAD~ and HEAD have been processed, the revision
> > > walking should have enough information to determine that it doesn't need
> > > to go further, but still does. Even with something like HEAD~2..HEAD,
> > > after the first round of processing parents it should be able to see
> > > there's not going to be any more interesting commits.
> > > 
> > > I'm willing to dig into this, but if someone familiar with the
> > > algorithm could give me some hints as to what I might be missing in the
> > > big picture, that would be helpful.
> > 
> > All the above is without commit-graph, I presume?  If so, then you
> > should give it a try, as it might bring immediate help in your
> > pathological repo.  With 5k commit in the same second (enforced via
> > 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
> > 
> >   $ best-of-five -q git rev-list HEAD~..HEAD
> >   0.069
> >   $ git commit-graph write --reachableComputing commit graph generation
> >   numbers: 100% (5000/5000), done.
> >   $ best-of-five -q git rev-list HEAD~..HEAD
> >   0.004
> 
> I'm not observing any difference from using commit-graph, whether in
> time or in the number of commits that are looked at in limit_list().

-c core.commitGraph=true does make a difference in time, but not in the
number of commits looked at in limit_list(). So it's only faster because
each iteration of the loop is faster. It means it's still dependent on
the depth of the dag, and the larger the repo will grow, the slower it
will get.

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18  4:17     ` Mike Hommey
@ 2019-05-18 12:01       ` SZEDER Gábor
  2019-05-19 22:28         ` Jakub Narebski
  2019-05-20  1:33       ` Derrick Stolee
  1 sibling, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2019-05-18 12:01 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git, Kacper Kornet

On Sat, May 18, 2019 at 01:17:06PM +0900, Mike Hommey wrote:
> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
> > On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
> > > On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> > > > There are established corner cases, where in a repo where commit dates
> > > > are not monotonically increasing, revision walking can go horribly
> > > > wrong. This was discussed in the past in e.g.
> > > > https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
> > > > 
> > > > The only (simple) workable way, given the current algorithm, to get an
> > > > accurate view off rev-list is to essentially make slop infinite. This
> > > > works fine, at the expense of runtime.
> > > > 
> > > > Now, ignoring any modification for the above, I'm hitting another corner
> > > > case in some other "weird" history, where I have 500k commits all with
> > > > the same date. With such a commit dag, something as trivial as
> > > > `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> > > > to HEAD, which takes multiple seconds, when the (obvious) output is one
> > > > commit.
> > > > 
> > > > It looks like the only way revision walking stops going through all the
> > > > ancestry is through slop, and slop is essentially made infinite by the
> > > > fact all commits have the same date (because of the date check in
> > > > still_interesting(). By extension, this means the workaound for the
> > > > first corner case above, which is to make slop infinite, essentially
> > > > makes all rev walking go through the entire ancestry of the commits
> > > > given on the command line.
> > > > 
> > > > It feels like some cases of everybody_uninteresting should shorcut slop
> > > > entirely, but considering the only way for slop to decrease at all is
> > > > when everybody_uninteresting returns true, that would seem like a wrong
> > > > assumption. But I'm also not sure what slop helps with in the first
> > > > place (but I don't have a clear view of the broader picture of how the
> > > > entire revision walking works).
> > > > 
> > > > Anyways, a rather easy way to witness this happening is to create a
> > > > dummy repo like:
> > > >   git init foo
> > > >   cd foo
> > > >   for i in $(seq 1 50); do
> > > >     echo $i > a;
> > > >     git add a;
> > > >     git commit -a -m $i;
> > > >   done
> > > > 
> > > > The something as simple as `git rev-list HEAD~..HEAD` will go through
> > > > all 50 commits (assuming the script above created commits in the same
> > > > second, which it did on my machine)
> > > > 
> > > > By the time both HEAD~ and HEAD have been processed, the revision
> > > > walking should have enough information to determine that it doesn't need
> > > > to go further, but still does. Even with something like HEAD~2..HEAD,
> > > > after the first round of processing parents it should be able to see
> > > > there's not going to be any more interesting commits.
> > > > 
> > > > I'm willing to dig into this, but if someone familiar with the
> > > > algorithm could give me some hints as to what I might be missing in the
> > > > big picture, that would be helpful.
> > > 
> > > All the above is without commit-graph, I presume?  If so, then you
> > > should give it a try, as it might bring immediate help in your
> > > pathological repo.  With 5k commit in the same second (enforced via
> > > 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
> > > 
> > >   $ best-of-five -q git rev-list HEAD~..HEAD
> > >   0.069
> > >   $ git commit-graph write --reachableComputing commit graph generation
> > >   numbers: 100% (5000/5000), done.
> > >   $ best-of-five -q git rev-list HEAD~..HEAD
> > >   0.004
> > 
> > I'm not observing any difference from using commit-graph, whether in
> > time or in the number of commits that are looked at in limit_list().
> 
> -c core.commitGraph=true does make a difference in time, but not in the
> number of commits looked at in limit_list(). So it's only faster because
> each iteration of the loop is faster. It means it's still dependent on
> the depth of the dag, and the larger the repo will grow, the slower it
> will get.

Oh, indeed.  Well, at least you'll waste about an order of magnitude
less processor time until you figure out how to fix it :)

Btw, once upon a time this was fast, but it became slow with commit
c19d1b4e84 (Fix revision walk for commits with the same dates,
2013-03-22).



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18 12:01       ` SZEDER Gábor
@ 2019-05-19 22:28         ` Jakub Narebski
  0 siblings, 0 replies; 23+ messages in thread
From: Jakub Narebski @ 2019-05-19 22:28 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Mike Hommey, git, Kacper Kornet, Derrick Stolee

SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Sat, May 18, 2019 at 01:17:06PM +0900, Mike Hommey wrote:
>> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
>>> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
>>>> On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
[...]
>>>>> There are established corner cases, where in a repo where commit dates
>>>>> are not monotonically increasing, revision walking can go horribly
>>>>> wrong. This was discussed in the past in e.g.
>>>>> https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
>>>>> 
>>>>> The only (simple) workable way, given the current algorithm, to get an
>>>>> accurate view off rev-list is to essentially make slop infinite. This
>>>>> works fine, at the expense of runtime.
>>>>> 
>>>>> Now, ignoring any modification for the above, I'm hitting another corner
>>>>> case in some other "weird" history, where I have 500k commits all with
>>>>> the same date. With such a commit dag, something as trivial as
>>>>> `git rev-list HEAD~..HEAD` goes through all commits from the root commit
>>>>> to HEAD, which takes multiple seconds, when the (obvious) output is one
>>>>> commit.
[...]
>>>> All the above is without commit-graph, I presume?  If so, then you
>>>> should give it a try, as it might bring immediate help in your
>>>> pathological repo.  With 5k commit in the same second (enforced via
>>>> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
>>>> 
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.069
>>>>   $ git commit-graph write --reachableComputing commit graph generation
>>>>   numbers: 100% (5000/5000), done.
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.004
>>> 
>>> I'm not observing any difference from using commit-graph, whether in
>>> time or in the number of commits that are looked at in limit_list().
>> 
>> -c core.commitGraph=true does make a difference in time, but not in the
>> number of commits looked at in limit_list(). So it's only faster because
>> each iteration of the loop is faster. It means it's still dependent on
>> the depth of the dag, and the larger the repo will grow, the slower it
>> will get.
>
> Oh, indeed.  Well, at least you'll waste about an order of magnitude
> less processor time until you figure out how to fix it :)
>
> Btw, once upon a time this was fast, but it became slow with commit
> c19d1b4e84 (Fix revision walk for commits with the same dates,
> 2013-03-22).

This might be related to the fact, that with generation numbers v1
(i.e. topological levels) there were cases when using those generation
numbers for ordering caused performance regressions (e.g. for some parts
of Linux kernel history).

There was an idea of replacing them with generation numbers v2 [1],
which if I remember correctly was chosen to be corrected date (that is,
committer date or one more than maximum of dates of parents).  This is
incompatibile change; it turned out however that while commit-graph
format is versioned, unfortunately Git fails hard if commit-graph
version is too new, instead of falling back to not using commit graph.

[1]: https://github.com/derrickstolee/gen-test

Stolee, could you tell us what is the current status on generation
numbers v2?  Thanks in advance.

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18  4:17     ` Mike Hommey
  2019-05-18 12:01       ` SZEDER Gábor
@ 2019-05-20  1:33       ` Derrick Stolee
  2019-05-20 11:02         ` Jakub Narebski
  2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
  1 sibling, 2 replies; 23+ messages in thread
From: Derrick Stolee @ 2019-05-20  1:33 UTC (permalink / raw)
  To: Mike Hommey, SZEDER Gábor; +Cc: git

On 5/18/2019 12:17 AM, Mike Hommey wrote:
> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
>> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
>>>
>>> All the above is without commit-graph, I presume?  If so, then you
>>> should give it a try, as it might bring immediate help in your
>>> pathological repo.  With 5k commit in the same second (enforced via
>>> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
>>>
>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>   0.069
>>>   $ git commit-graph write --reachableComputing commit graph generation
>>>   numbers: 100% (5000/5000), done.
>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>   0.004
>>
>> I'm not observing any difference from using commit-graph, whether in
>> time or in the number of commits that are looked at in limit_list().
> 
> -c core.commitGraph=true does make a difference in time, but not in the
> number of commits looked at in limit_list(). So it's only faster because
> each iteration of the loop is faster. It means it's still dependent on
> the depth of the dag, and the larger the repo will grow, the slower it
> will get.

The plan is to use the commit-graph's generation numbers for these A..B
queries, but due to some cases when commit date is a _better_ heuristic
than generation numbers, we have not enabled them for A..B. You'll see
that 'git rev-list --topo-order -n 1 HEAD` will be much faster with the
commit-graph, but adding '--topo-order' to your 'HEAD~1..HEAD' query
should not change the time at all.

See [1] for the discussion about "generation number v2" which will allow
us to use a better heuristic in these cases.

Thanks,
-Stolee

[1] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-20  1:33       ` Derrick Stolee
@ 2019-05-20 11:02         ` Jakub Narebski
  2019-05-20 11:20           ` Derrick Stolee
  2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
  1 sibling, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2019-05-20 11:02 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Derrick Stolee <stolee@gmail.com> writes:
> On 5/18/2019 12:17 AM, Mike Hommey wrote:
>> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
>>> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
>>>>
>>>> All the above is without commit-graph, I presume?  If so, then you
>>>> should give it a try, as it might bring immediate help in your
>>>> pathological repo.  With 5k commit in the same second (enforced via
>>>> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
>>>>
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.069
>>>>   $ git commit-graph write --reachableComputing commit graph generation
>>>>   numbers: 100% (5000/5000), done.
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.004
>>>
>>> I'm not observing any difference from using commit-graph, whether in
>>> time or in the number of commits that are looked at in limit_list().
>> 
>> -c core.commitGraph=true does make a difference in time, but not in the
>> number of commits looked at in limit_list(). So it's only faster because
>> each iteration of the loop is faster. It means it's still dependent on
>> the depth of the dag, and the larger the repo will grow, the slower it
>> will get.
>
> The plan is to use the commit-graph's generation numbers for these A..B
> queries, but due to some cases when commit date is a _better_ heuristic
> than generation numbers, we have not enabled them for A..B. You'll see
> that 'git rev-list --topo-order -n 1 HEAD` will be much faster with the
> commit-graph, but adding '--topo-order' to your 'HEAD~1..HEAD' query
> should not change the time at all.
>
> See [1] for the discussion about "generation number v2" which will allow
> us to use a better heuristic in these cases.
>
> [1] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/

Are there any blockers that prevent the switch to this
"generation number v2"?

- Is it a problem with insufficient data to choose the correct numbering
  as "generation number v2' (there can be only one)?
- Is it a problem with selected "generation number v2" being
  incompatibile with gen v2, and Git failing when new version of
  commit-graph is used instead of softly just not using commit-graph?
- Or is it something else?

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-20 11:02         ` Jakub Narebski
@ 2019-05-20 11:20           ` Derrick Stolee
  2019-05-20 13:42             ` Jakub Narebski
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-05-20 11:20 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Mike Hommey, SZEDER Gábor, git

On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>
> Are there any blockers that prevent the switch to this
> "generation number v2"?
> 
> - Is it a problem with insufficient data to choose the correct numbering
>   as "generation number v2' (there can be only one)?
> - Is it a problem with selected "generation number v2" being
>   incompatibile with gen v2, and Git failing when new version of
>   commit-graph is used instead of softly just not using commit-graph?
> - Or is it something else?

The switch becomes a bit complicated.

First, the plan was to version the commit-graph file to v2, and that would
include a byte in the header for the "reachability index version" [1]. Since
older clients fail hard on a newer file version, we are switching to instead
including the reachability index version as a value in a more flexible 
"metadata chunk" [2]. Using the generation number column for the corrected
commit-date offsets (assuming we also guarantee the offset is strictly
increasing from parent to child), these new values will be backwards-
compatible _except_ for 'git commit-graph verify'.

Second, we need to pull the reachability index value into a commit slab.
The generation value is currently 32 bits, but we will expand that to
64 as it stores a timestamp. The commit struct is a bit bloated already,
so this will reduce the required memory space even when not using the
commit-graph. But, it requires some refactoring, including every place
where we pass a "min_generation" needs to change type and name.

Third and finally, we need to calculate the new values and change the
read logic to sum the offset and commit-date (when the metadata chunk
says we are using corrected commit date).

While none of this is _incredibly_ hard to do, it does require a bit
of care. It's on my list to get to at some point, but making the file
incremental is higher priority to me.

Thanks,
-Stolee

[1] https://public-inbox.org/git/pull.112.git.gitgitgadget@gmail.com/
[2] https://public-inbox.org/git/87h8acivkh.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-20 11:20           ` Derrick Stolee
@ 2019-05-20 13:42             ` Jakub Narebski
  2019-05-20 23:27               ` Jakub Narebski
  2019-06-25  7:51               ` Jakub Narebski
  0 siblings, 2 replies; 23+ messages in thread
From: Jakub Narebski @ 2019-05-20 13:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Derrick Stolee <stolee@gmail.com> writes:
> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>
>> Are there any blockers that prevent the switch to this
>> "generation number v2"?
>> 
>> - Is it a problem with insufficient data to choose the correct numbering
>>   as "generation number v2' (there can be only one)?
>> - Is it a problem with selected "generation number v2" being
>>   incompatibile with gen v2, and Git failing when new version of
>>   commit-graph is used instead of softly just not using commit-graph?
>> - Or is it something else?

Thanks for the explanation.

> The switch becomes a bit complicated.
>
> First, the plan was to version the commit-graph file to v2, and that would
> include a byte in the header for the "reachability index version" [1]. Since
> older clients fail hard on a newer file version, we are switching to instead
> including the reachability index version as a value in a more flexible 
> "metadata chunk" [2].

Ugh, that is bad.  The version number inn the commit-graph file format
was supposed to make it possible to easily change the format; now it
looks like we are stuck with workarounds until the released version that
dies on never commit-graph file format version innstead of softly not
utilizing the commit-graph file dies off.

How this issue got missed in review...

If we cannot change the format, all that is left is ading new chunks,
and changes that conform to commit-graph file version 1.  

>                      Using the generation number column for the corrected
> commit-date offsets (assuming we also guarantee the offset is strictly
> increasing from parent to child), these new values will be backwards-
> compatible _except_ for 'git commit-graph verify'.

O.K., so the "generation number v2 (legacy)" would be incremental and
backward-compatibile in use (though not in generation and validation).

Do I understand it correctly how it is calculated:

  corrected_date(C) = max(committer_date(C),
                          max_{P ∈ parents(C)}(corrected_date(P)) + 1)
  offset(C) = corrected_date(C) - committer_date(C)
  gen_v2(C) = max(offset(C), max_{P ∈ parents(C)}(gen_v2(P)) + 1) 

Do you have benchmark for this "monotonically offset corrected commit
date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
and https://github.com/derrickstolee/gen-test ?


Also, what would happen if different versions of Git tried to add to
commit-graph in interleaved way, either with rewrite or incremental?

> Second, we need to pull the reachability index value into a commit slab.

Is commit slab documented somewhere in Documentation/technical/, or just
in comments in commit-slab.h?

As I understand it, commit slab is Git-specific implementition of
inside-out object storage for commit data (i.e. struct of arrays instead
of array of structs), isn't it?  I wonder if using commit slab improves
cache utilization...

> The generation value is currently 32 bits, but we will expand that to
> 64 as it stores a timestamp. The commit struct is a bit bloated already,
> so this will reduce the required memory space even when not using the
> commit-graph. But, it requires some refactoring, including every place
> where we pass a "min_generation" needs to change type and name.

Could this be done with Coccinelle's spatch, similar to
e.g. contrib/coccinelle/commit.cocci?

>
> Third and finally, we need to calculate the new values and change the
> read logic to sum the offset and commit-date (when the metadata chunk
> says we are using corrected commit date).

Right.

> While none of this is _incredibly_ hard to do, it does require a bit
> of care. It's on my list to get to at some point, but making the file
> incremental is higher priority to me.

All right, I can understand that.

> [1] https://public-inbox.org/git/pull.112.git.gitgitgadget@gmail.com/
> [2] https://public-inbox.org/git/87h8acivkh.fsf@evledraar.gmail.com/

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-20 13:42             ` Jakub Narebski
@ 2019-05-20 23:27               ` Jakub Narebski
  2019-05-21  1:20                 ` Derrick Stolee
  2019-06-25  7:51               ` Jakub Narebski
  1 sibling, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2019-05-20 23:27 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Jakub Narebski <jnareb@gmail.com> writes:
> Derrick Stolee <stolee@gmail.com> writes:
>> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>>
>>> Are there any blockers that prevent the switch to this
>>> "generation number v2"?
[...]

>>                      Using the generation number column for the corrected
>> commit-date offsets (assuming we also guarantee the offset is strictly
>> increasing from parent to child), these new values will be backwards-
>> compatible _except_ for 'git commit-graph verify'.
>
> O.K., so the "generation number v2 (legacy)" would be incremental and
> backward-compatibile in use (though not in generation and validation).
>
> Do I understand it correctly how it is calculated:
>
>   corrected_date(C) = max(committer_date(C),
>                           max_{P ∈ parents(C)}(corrected_date(P)) + 1)

This should probably read

    offset_date(P) = committer_date(P) + gen_v2(P)
    corrected_date(C) = max(committer_date(C),
                            max_{P ∈ parents(C)}(offset_date(P)) + 1)    

>   offset(C) = corrected_date(C) - committer_date(C)
>   gen_v2(C) = max(offset(C), max_{P ∈ parents(C)}(gen_v2(P)) + 1) 
>
> Do you have benchmark for this "monotonically offset corrected commit
> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
> and https://github.com/derrickstolee/gen-test ?

--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-20 23:27               ` Jakub Narebski
@ 2019-05-21  1:20                 ` Derrick Stolee
  2019-05-22 18:29                   ` Jakub Narebski
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-05-21  1:20 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Mike Hommey, SZEDER Gábor, git

On 5/20/2019 7:27 PM, Jakub Narebski wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>> Derrick Stolee <stolee@gmail.com> writes:
>>> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>>>
>>>> Are there any blockers that prevent the switch to this
>>>> "generation number v2"?
> [...]
> 
>>>                      Using the generation number column for the corrected
>>> commit-date offsets (assuming we also guarantee the offset is strictly
>>> increasing from parent to child), these new values will be backwards-
>>> compatible _except_ for 'git commit-graph verify'.
>>
>> O.K., so the "generation number v2 (legacy)" would be incremental and
>> backward-compatibile in use (though not in generation and validation).
>>
>> Do I understand it correctly how it is calculated:
>>
>>   corrected_date(C) = max(committer_date(C),
>>                           max_{P ∈ parents(C)}(corrected_date(P)) + 1)
> 
> This should probably read
> 
>     offset_date(P) = committer_date(P) + gen_v2(P)
>     corrected_date(C) = max(committer_date(C),
>                             max_{P ∈ parents(C)}(offset_date(P)) + 1)  

The final definition needs two conditions on the offset of a commit C for
every parent P:

 1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
 2. offset(C) > offset(P)

Condition (1) will give us the performance benefits related to the
committer-date heuristic. Condition (2) will give us backwards-compatibility
with generation numbers.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-18  1:50 ` SZEDER Gábor
  2019-05-18  3:58   ` Mike Hommey
@ 2019-05-21  2:00   ` Jonathan Nieder
  1 sibling, 0 replies; 23+ messages in thread
From: Jonathan Nieder @ 2019-05-21  2:00 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Mike Hommey, git

SZEDER Gábor wrote:
> On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:

>> There are established corner cases, where in a repo where commit dates
>> are not monotonically increasing,
[...]
> All the above is without commit-graph, I presume?  If so, then you
> should give it a try, as it might bring immediate help in your
> pathological repo.  With 5k commit in the same second (enforced via
> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:

Just to emphasize this point: one field in the commit-graph file is a
generation number (or more generally, a "reachability index" in the
standard jargon).  The reason I'm excited about having this field is
that it will allow us to stop playing games with slop.

So please join forces with Stolee and help us get to that future
sooner. :)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] revision: use generation for A..B --topo-order queries
  2019-05-20  1:33       ` Derrick Stolee
  2019-05-20 11:02         ` Jakub Narebski
@ 2019-05-21 13:14         ` Derrick Stolee
  2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
  1 sibling, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-05-21 13:14 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, jnareb, jrnieder, mh, Derrick Stolee

If a commit-graph exists with computed generation numbers, then a
'git rev-list --topo-order -n <N> <rev>' query will use those generation
numbers to reduce the number of commits walked before writing N commits.

One caveat put in b454241 (revision.c: generation-based topo-order
algorithm, 2018-11-01) was to not enable the new algorithm for queries
with a revision range "A..B". The logic was placed to walk from "A" and
mark those commits as uninteresting, but the performance was actually
worse than the existing logic in some cases.

The root cause of this performance degradation is that generation
numbers _increase_ the number of commits we walk relative to the
existing heuristic of walking by commit date. While generation numbers
actually guarantee that the algorithm is correct, the existing logic
is very rarely wrong and that added requirement is not worth the cost.

This motivates the planned "corrected commit date" to replace
generation numbers in a future version of Git.

The current change enables the logic to use whatever reachability
index is currently in the commit-graph (generation numbers or
corrected commit date).

The limited flag in struct rev_info forces a full walk of the
commit history (after discovering the A..B range). Previosuly, it
is enabled whenever we see an uninteresting commit. We prevent
enabling the parameter when we are planning to use the reachability
index for a topo-order.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---

Mike,

If you have the chance, then please apply this patch (on v2.22.0-rc1)
and re-run your test. This will confirm if my thoughts on this matter
are correct.

Thanks,
-Stolee

 revision.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index d4aaf0ef25..be6ccf5786 100644
--- a/revision.c
+++ b/revision.c
@@ -436,7 +436,9 @@ static struct commit *handle_commit(struct rev_info *revs,
 			die("unable to parse commit %s", name);
 		if (flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			revs->limited = 1;
+
+			if (!revs->topo_order || !generation_numbers_enabled(the_repository))
+				revs->limited = 1;
 		}
 		if (revs->sources) {
 			char **slot = revision_sources_at(revs->sources, commit);
-- 
2.22.0.rc1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/2] revision: keep topo-walk free of unintersting commits
  2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
@ 2019-05-21 13:59           ` Derrick Stolee
  2019-05-22  2:19             ` Mike Hommey
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-05-21 13:59 UTC (permalink / raw)
  To: git; +Cc: jnareb, jrnieder, szeder.dev, mh, Derrick Stolee

When updating the topo-order walk in b454241 (revision.c: generation-based
topo-order algorithm, 2018-11-01), the logic was a huge rewrite of the
walk logic. In that massive change, we accidentally included the
UNINTERESTING commits in expand_topo_walk(). This means that a simple
query like

    git rev-list --topo-order HEAD~1..HEAD

will expand the topo walk for all commits reachable from HEAD, and not
just one commit.

This change should speed up these cases, but there is still a need
for corrected commit-date for some A..B queries.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---

Sorry for the patch-spam, but I took a moment to check this command
on the Git repo, and was able to reproduce the slowness. That didn't
make sense to me, so I added some log messages to expand_topo_walk()
and notices we were walking the UNINITERESTING commits. This is part
of the reason the new logic is slower for A..B commands, but not the
whole reason.

You'll want this patch as well for a test.

Thanks,
-Stolee

 revision.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/revision.c b/revision.c
index be6ccf5786..621feb9df7 100644
--- a/revision.c
+++ b/revision.c
@@ -3265,6 +3265,9 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		struct commit *parent = p->item;
 		int *pi;
 
+		if (parent->object.flags & UNINTERESTING)
+			continue;
+
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-- 
2.22.0.rc1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] revision: keep topo-walk free of unintersting commits
  2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
@ 2019-05-22  2:19             ` Mike Hommey
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Hommey @ 2019-05-22  2:19 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, jnareb, jrnieder, szeder.dev, Derrick Stolee

On Tue, May 21, 2019 at 09:59:53AM -0400, Derrick Stolee wrote:
> When updating the topo-order walk in b454241 (revision.c: generation-based
> topo-order algorithm, 2018-11-01), the logic was a huge rewrite of the
> walk logic. In that massive change, we accidentally included the
> UNINTERESTING commits in expand_topo_walk(). This means that a simple
> query like
> 
>     git rev-list --topo-order HEAD~1..HEAD
> 
> will expand the topo walk for all commits reachable from HEAD, and not
> just one commit.
> 
> This change should speed up these cases, but there is still a need
> for corrected commit-date for some A..B queries.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> 
> Sorry for the patch-spam, but I took a moment to check this command
> on the Git repo, and was able to reproduce the slowness. That didn't
> make sense to me, so I added some log messages to expand_topo_walk()
> and notices we were walking the UNINITERESTING commits. This is part
> of the reason the new logic is slower for A..B commands, but not the
> whole reason.
> 
> You'll want this patch as well for a test.

Both patches help, thanks.

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-21  1:20                 ` Derrick Stolee
@ 2019-05-22 18:29                   ` Jakub Narebski
  2019-05-22 19:06                     ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2019-05-22 18:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Derrick Stolee <stolee@gmail.com> writes:
> On 5/20/2019 7:27 PM, Jakub Narebski wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>>> Derrick Stolee <stolee@gmail.com> writes:
>>>> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>>>>
>>>>> Are there any blockers that prevent the switch to this
>>>>> "generation number v2"?
>> [...]
>> 
>>>>                      Using the generation number column for the corrected
>>>> commit-date offsets (assuming we also guarantee the offset is strictly
>>>> increasing from parent to child), these new values will be backwards-
>>>> compatible _except_ for 'git commit-graph verify'.
>>>
>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>> backward-compatibile in use (though not in generation and validation).
>>>
>>> Do I understand it correctly how it is calculated:
>>>
>>>   corrected_date(C) = max(committer_date(C),
>>>                           max_{P ∈ parents(C)}(corrected_date(P)) + 1)
>> 
>> This should probably read
>> 
>>     offset_date(P) = committer_date(P) + gen_v2(P)
>>     corrected_date(C) = max(committer_date(C),
>>                             max_{P ∈ parents(C)}(offset_date(P)) + 1)

Restating it yet again:

   A.  corrected_date(C) = max(committer_date(C),
                               max_P(committer_date(P) + offset(P)) + 1)

   B.  offset(C) = max(corrected_date(C) - committer_date(C),
                       max_P(offset(P)) + 1)

> The final definition needs two conditions on the offset of a commit C for
> every parent P:
>
>  1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
>  2. offset(C) > offset(P)

The equation (B) ensures the (2) condition, i.e offset(C) > offset(P).
The equation (A) ensures that condition (1) is fulfulled, because from
(B) we have

   corrected_date(C) <= committer_date(C) + offset(C)

This from (B) and (A( we get:

   committer_date(C) + offset(C) >= corrected_date(C) >
                                 >  committer_date(P) + offset(P)

> Condition (1) will give us the performance benefits related to the
> committer-date heuristic. Condition (2) will give us backwards-compatibility
> with generation numbers.

Well, we should check/test if performance benefits of "offset date"
("corrected date with rising offset") truly holds.

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-22 18:29                   ` Jakub Narebski
@ 2019-05-22 19:06                     ` Derrick Stolee
  2019-05-23 21:04                       ` Jakub Narebski
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-05-22 19:06 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Mike Hommey, SZEDER Gábor, git

On 5/22/2019 2:29 PM, Jakub Narebski wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>> On 5/20/2019 7:27 PM, Jakub Narebski wrote:
> Restating it yet again:
> 
>    A.  corrected_date(C) = max(committer_date(C),
>                                max_P(committer_date(P) + offset(P)) + 1)
> 
>    B.  offset(C) = max(corrected_date(C) - committer_date(C),
>                        max_P(offset(P)) + 1)

The problem with this definition is that it "defines" the corrected date, and
then _adjusts_ it by updating the offset. I consider

	corrected_date(C) = committer_date(C) + offset(C)

to be part of the definition. You could restate the definition as follows:

	corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
        	             max_P(corrected_date(P)))

or, equivalently

	corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
        	             max_P(committer_date(P) + offset(P)))

This definition, in a single step, satisfies the conditions below:

> 
>> The final definition needs two conditions on the offset of a commit C for
>> every parent P:
>>
>>  1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
>>  2. offset(C) > offset(P)

Plus, the "+ 1" in the first step takes into account that "0" is a special offset
value in the commit-graph file format meaning "not computed".

> Well, we should check/test if performance benefits of "offset date"
> ("corrected date with rising offset") truly holds.

Yes, a full performance test will be required. I have full confidence that the
monotonic offset requirement will have only positive effect. That is, it will
not affect the case where committer-date was better than generation number, but will
help the cases where all the committer-dates are equal.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-22 19:06                     ` Derrick Stolee
@ 2019-05-23 21:04                       ` Jakub Narebski
  0 siblings, 0 replies; 23+ messages in thread
From: Jakub Narebski @ 2019-05-23 21:04 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Derrick Stolee <stolee@gmail.com> writes:
> On 5/22/2019 2:29 PM, Jakub Narebski wrote:
>> Derrick Stolee <stolee@gmail.com> writes:
>>> On 5/20/2019 7:27 PM, Jakub Narebski wrote:
>>
>> Restating it yet again:
>> 
>>    A.  corrected_date(C) = max(committer_date(C),
>>                                max_P(committer_date(P) + offset(P)) + 1)
>> 
>>    B.  offset(C) = max(corrected_date(C) - committer_date(C),
>>                        max_P(offset(P)) + 1)
>
> The problem with this definition is that it "defines" the corrected date, and
> then _adjusts_ it by updating the offset.

For me equation (A) was just an intermediate step; we might call it
adjusted_date(C) or temp_date(C) instead.

Note that with the following adjustment

          corrected_date(C) = max(committer_date(C),
                                  max_P(corrected_date(P)) + 1)

it is +1 corrected "V3: Corrected Commit Date" from gen-test.

> I consider
>
> 	corrected_date(C) = committer_date(C) + offset(C)
>
> to be part of the definition. You could restate the definition as follows:
>
> 	corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
>         	             max_P(corrected_date(P)))
>
> or, equivalently
>
> 	corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
>         	             max_P(committer_date(P) + offset(P)))
>
> This definition, in a single step, satisfies the conditions below:
>
>> 
>>> The final definition needs two conditions on the offset of a commit C for
>>> every parent P:
>>>
>>>  1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
>>>  2. offset(C) > offset(P)

I think it is easier to prove the conditions (1) and (2) using two-step
definition (A) + (B), as I have shown in previous email.

Also, what we need to calculate and store is offset(C), not
offset_date(C) (i.e. corrected_date(C), if you prefer).

> Plus, the "+ 1" in the first step takes into account that "0" is a special offset
> value in the commit-graph file format meaning "not computed".

That assumes that max_P(function(P)) over empty set of P is taken to be
zero.  Also, I think two step definition has the same property.

>> Well, we should check/test if performance benefits of "offset date"
>> ("corrected date with rising offset") truly holds.
>
> Yes, a full performance test will be required. I have full confidence that the
> monotonic offset requirement will have only positive effect. That is, it will
> not affect the case where committer-date was better than generation number, but will
> help the cases where all the committer-dates are equal.

I worry that monotonic offset corrected date would behave like
topological levels (i.e. like current generation number v1) in more
cases rather than like commit date heuristics.

P.S. there is _theoretical_ problem with all date-offset based
generation numbers (slightly more likely to occur for monotonical
offset), namely that in the commit-graph format v1 we have 34 bits for
commit date timestamp, but only 30 bits for offset; and there might be
the case where offset do not fit in 30 bits.  It would be however very
unlikely, e.g. commit date of 2028, then commit date of 1970 (timestamp
equal to zero).

Regards,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-05-20 13:42             ` Jakub Narebski
  2019-05-20 23:27               ` Jakub Narebski
@ 2019-06-25  7:51               ` Jakub Narebski
  2019-06-25 10:54                 ` Derrick Stolee
  1 sibling, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2019-06-25  7:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Jakub Narebski <jnareb@gmail.com> writes:
> Derrick Stolee <stolee@gmail.com> writes:
>> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>>
>>> Are there any blockers that prevent the switch to this
>>> "generation number v2"?
>>> 
>>> - Is it a problem with insufficient data to choose the correct numbering
>>>   as "generation number v2' (there can be only one)?
>>> - Is it a problem with selected "generation number v2" being
>>>   incompatibile with gen v2, and Git failing when new version of
>>>   commit-graph is used instead of softly just not using commit-graph?
>>> - Or is it something else?
[...]

>>                      Using the generation number column for the corrected
>> commit-date offsets (assuming we also guarantee the offset is strictly
>> increasing from parent to child), these new values will be backwards-
>> compatible _except_ for 'git commit-graph verify'.
>
> O.K., so the "generation number v2 (legacy)" would be incremental and
> backward-compatibile in use (though not in generation and validation).
>
> Do I understand it correctly how it is calculated:
>
>   corrected_date(C) = max(committer_date(C),
>                           max_{P ∈ parents(C)}(corrected_date(P)) + 1)
>   offset(C) = corrected_date(C) - committer_date(C)
>   gen_v2(C) = max(offset(C), max_{P ∈ parents(C)}(gen_v2(P)) + 1) 

Do you remember who first came up with this idea for backward
compatibile corrected commit date offsets (monotonically offset
corrected date)?

> Do you have benchmark for this "monotonically offset corrected commit
> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
> and https://github.com/derrickstolee/gen-test ?

I guess this will have to wait...

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Revision walking, commit dates, slop
  2019-06-25  7:51               ` Jakub Narebski
@ 2019-06-25 10:54                 ` Derrick Stolee
  2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-06-25 10:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Mike Hommey, SZEDER Gábor, git

On 6/25/2019 3:51 AM, Jakub Narebski wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>> Derrick Stolee <stolee@gmail.com> writes:
>>> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>>>
>>>> Are there any blockers that prevent the switch to this
>>>> "generation number v2"?
>>>>
>>>> - Is it a problem with insufficient data to choose the correct numbering
>>>>   as "generation number v2' (there can be only one)?
>>>> - Is it a problem with selected "generation number v2" being
>>>>   incompatibile with gen v2, and Git failing when new version of
>>>>   commit-graph is used instead of softly just not using commit-graph?
>>>> - Or is it something else?
> [...]
> 
>>>                      Using the generation number column for the corrected
>>> commit-date offsets (assuming we also guarantee the offset is strictly
>>> increasing from parent to child), these new values will be backwards-
>>> compatible _except_ for 'git commit-graph verify'.
>>
>> O.K., so the "generation number v2 (legacy)" would be incremental and
>> backward-compatibile in use (though not in generation and validation).
>>
>> Do I understand it correctly how it is calculated:
>>
>>   corrected_date(C) = max(committer_date(C),
>>                           max_{P ∈ parents(C)}(corrected_date(P)) + 1)
>>   offset(C) = corrected_date(C) - committer_date(C)
>>   gen_v2(C) = max(offset(C), max_{P ∈ parents(C)}(gen_v2(P)) + 1) 
> 
> Do you remember who first came up with this idea for backward
> compatibile corrected commit date offsets (monotonically offset
> corrected date)?

I remember saying that the "corrected commit date" that I had suggested
was weak because it was not backwards-compatible with generation numbers
if you are only looking at the offsets. I don't remember who suggested
simply increasing the offset so they do become backwards-compatible.
 
>> Do you have benchmark for this "monotonically offset corrected commit
>> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
>> and https://github.com/derrickstolee/gen-test ?
> 
> I guess this will have to wait...

I have not had time to revisit this topic and re-run performance
numbers, sorry.

-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling)
  2019-06-25 10:54                 ` Derrick Stolee
@ 2019-09-18  8:43                   ` Jakub Narebski
  2019-09-18 12:26                     ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2019-09-18  8:43 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Mike Hommey, SZEDER Gábor, git

Derrick Stolee <stolee@gmail.com> writes:
> On 6/25/2019 3:51 AM, Jakub Narebski wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>>> Derrick Stolee <stolee@gmail.com> writes:
[...]
>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>> backward-compatibile in use (though not in generation and validation).
[...]
>>> Do you have benchmark for this "monotonically offset corrected commit
>>> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
>>> and https://github.com/derrickstolee/gen-test ?
>> 
>> I guess this will have to wait...
>
> I have not had time to revisit this topic and re-run performance
> numbers, sorry.

I have created pull requests against `reach-perf` branch of
derrickstolee/git repository[1], and companion pull request against
gen-test repository[2] with proposed prototype of backward-compatible
corrected commit date (with monotonic offsets).

Could you please run the tests for this generation number v5?  I was not
able to do so.  It was my first time trying to compile Git on MS
Windows, and while there were no problems compiling `master` (well,
except for compilation taking a long time), I was unable to do it for
`reach-perf` branch because of independent of change compilation errors.

It would be good to test also the legacy mode, i.e. old Git (using
generation number v0, i.e. topological levels (shortest path from node
to sink, plus one) with all backward-compatible generation numbers, or
at least generation number v5.


Do I understand it correctly that for the time being because of
backward-compatibility concerns instead of incrementing the version
number there would be used some kind of "metadata" chunk (at least until
version of Git that fails hard, instead of turning off commit-graph
feature softly, on unknown version numbers dies out).  Is there some
proposal how such chunk should look like?

[1]: https://github.com/derrickstolee/git/pull/10
[2]: https://github.com/derrickstolee/gen-test/pull/1

--- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ----
From: Jakub Narębski <jnareb@gmail.com>
Subject: [PATCH] commit-graph: generation v5 (backward compatible date ceiling)

This generation number option is backward-compatible (with
reachability index v0) version of the generation number v3, that is
the corrected commit date.  It takes the simple approach of faking
"commit dates" being in the right order by adding an offset to the
commit date (which is stored in the generation number column), so the
modified commit date is always at least one more than the maximum
modified commit date of all ancestors.  Additionally the offset itself
is adjusted in such way that the offset of a commit is always at least
one more than the offsets of all ancestors.

Two following conditions are satisfied on the offset of a commit C for
every parent P:

 1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
 2. offset(C) > offset(P)

This reachability index is locally computable, which means it is
compatible with incremental commit graph feature.

It has the same problem as v3, namely that we have a "maximum clock
skew" based on the maximum generation number we can store, only more
so.

Note: it includes incidental whitespace cleanups for v3 code.

Signed-off-by: Jakub Narębski <jnareb@gmail.com>
---
Note that the diff looks a bit strange in the part adding the code for
the new compute_generation_numbers_5() subroutine, because it got
entangled in whitespace cleanup (which I shouldn't do, I'm sorry).

Relevant part of README.md from `gen-test` repository:

--------
**V5: Corrected Commit Date with Strictly Monotonic Offset.**
For a commit C, let its _offset commit date_ (denoted by odate(C))
be a commit date plus some offset, i.e. odate(C) = date(C) + offset(C),
such that:

1. Offset commit date is greater than the maximum of the commit date
   of C and the offset commit dates of its parents.

     If odate(A) < odate(B), then A cannot reach B.

2. Offset of a commit is one more than the maximum offset of a parent,
   or more

     If offset(A) < offset(B), then A cannot reach B.

This is backward-compatible version of V3: Corrected Commit Date.

### Comparing Reachability Index Versions Viability
[...]
| Index                     | Compatible? | Immutable? | Local? |
|---------------------------|-------------|------------|--------|
| Minimum Generation Number | Yes         | Yes        | Yes    |
| (Epoch, Date) pairs       | Yes         | Yes        | Yes    |
| Maximum Generation Number | Yes         | No         | No     |
| Corrected Commit Date     | No          | Yes        | Yes    |
| FELINE index              | Yes         | No         | No     |
| Offset Commit Date *NEW*  | Yes         | Yes        | Yes    |

_Note:_ The corrected commit date uses the generation number column
to store an offset of "how much do I need to add to my commit date
to get my corrected commit date?" The values stored in that column
are then not backwards-compatible.

_Note:_ The corrected commit date with strictly monotonic offset also
uses the generation number column to store the date offset, but the
offset alone can be used as generation number (as reachability index)
itself.


 commit-graph.c | 126 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 95 insertions(+), 31 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 91863d4895..633b4b24f8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -99,10 +99,9 @@ int compare_generations(struct generation *a, struct generation *b)
 			return 1;
 		return 0;
 
-	case 3:
-		ta = a->date + a->value1;
-		tb = b->date + b->value1;
-
+	case 3: /* V3: Corrected Commit Date */
+	case 5: /* V5: Strictly Monotonic Corrected Commit Date */
+		/* handle special cases, i.e. commits outside commit graph */
 		if (a->value1 == GENERATION_NUMBER_INFINITY) {
 			if (b->value1 == GENERATION_NUMBER_INFINITY)
 				return 0;
@@ -112,6 +111,10 @@ int compare_generations(struct generation *a, struct generation *b)
 			return -1;
 		}
 
+		/* corrected commit date = date + offset (correction) */
+		ta = a->date + a->value1;
+		tb = b->date + b->value1;
+
 		if (ta < tb)
 			return -1;
 		if (ta > tb)
@@ -162,6 +165,7 @@ void get_generation_version_from_commit(const struct commit *c,
 
 		case 1:
 		case 3:
+		case 5:
 			gen->value1 = c->generation;
 			gen->date = c->date;
 			break;
@@ -212,9 +216,10 @@ void set_generation_below_commit(const struct commit *c, struct generation *g)
 			break;
 
 		case 3:
+		case 5: /* ??? */
 			if (g->value1 + g->date >= gc.value1 + gc.date) {
 				g->value1 = 0;
-				g->date = gc.value1 + gc.date;			
+				g->date = gc.value1 + gc.date;
 			}
 			break;
 
@@ -363,7 +368,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 			else
 				graph->chunk_large_edges = data + chunk_offset;
 			break;
-		
+
 		case GRAPH_CHUNKID_FELINE:
 			if (graph->chunk_feline_gen)
 				chunk_repeated = 1;
@@ -971,27 +976,27 @@ static void compute_generation_numbers_3(struct packed_commit_list *commits)
 	int i;
 	struct commit_list *list = NULL;
 
-        for (i = 0; i < commits->nr; i++) {
-                if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
-                    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
-                        continue;
+	for (i = 0; i < commits->nr; i++) {
+		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
+			commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
 
-                commit_list_insert(commits->list[i], &list);
+		commit_list_insert(commits->list[i], &list);
 
-                while (list) {
-                        struct commit *current = list->item;
-                        struct commit_list *parent;
-                        int all_parents_computed = 1;
+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
 
-                        timestamp_t max_timestamp = current->date;
+			timestamp_t max_timestamp = current->date;
 
-                        for (parent = current->parents; parent; parent = parent->next) {
-                                if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-                                    parent->item->generation == GENERATION_NUMBER_ZERO) {
-                                        all_parents_computed = 0;
-                                        commit_list_insert(parent->item, &list);
-                                        break;
-                                } else {
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+					parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
+				} else {
 					struct generation pg;
 					timestamp_t pt;
 					get_generation_version_from_commit(parent->item, 3, &pg);
@@ -1001,19 +1006,75 @@ static void compute_generation_numbers_3(struct packed_commit_list *commits)
 					if (pt > max_timestamp)
 						max_timestamp = pt + 1;
 				}
-                        }
+			}
+
+			if (all_parents_computed) {
+				current->generation = (uint32_t)(max_timestamp - current->date) + 1;
+				pop_commit(&list);
+
+				if (current->generation > GENERATION_NUMBER_MAX)
+					die(_("generation number gap is too high!"));
+			}
+		}
+	}
+}
+
+static void compute_generation_numbers_5(struct packed_commit_list *commits)
+{
+	int i;
+	struct commit_list *list = NULL;
+
+	for (i = 0; i < commits->nr; i++) {
+		/* skip already computed generation numbers */
+		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
+			commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
+
+		commit_list_insert(commits->list[i], &list);
+
+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
+
+			timestamp_t max_timestamp = current->date;
+			uint32_t max_generation = 0;
+
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
+
+				} else {
+					struct generation pg;
+					timestamp_t pt;
+					get_generation_version_from_commit(parent->item, 5, &pg);
+
+					pt = pg.value1 + pg.date;
+
+					if (pt > max_timestamp)
+						max_timestamp = pt + 1;
+					if (pg.value1 > max_generation)
+						max_generation = pg.value1;
+				}
+			}
 
-                        if (all_parents_computed) {
+			if (all_parents_computed) {
 				current->generation = (uint32_t)(max_timestamp - current->date) + 1;
-                                pop_commit(&list);
+				if (current->generation < max_generation + 1)
+					current->generation = max_generation + 1;
+				pop_commit(&list);
 
-                                if (current->generation > GENERATION_NUMBER_MAX)
-                                        die(_("generation number gap is too high!"));
-                        }
-                }
-        }
+				if (current->generation > GENERATION_NUMBER_MAX)
+					die(_("generation number gap is too high!"));
+			}
+		}
+	}
 }
 
+
 static void compute_generation_numbers(struct packed_commit_list *commits,
 				       int generation_version)
 {
@@ -1038,6 +1099,9 @@ static void compute_generation_numbers(struct packed_commit_list *commits,
 	case 4:
 		/* compute at write time */
 		return;
+	case 5:
+		compute_generation_numbers_5(commits);
+		return;
 	}
 }
 
-- 
2.23.0.windows.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling)
  2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
@ 2019-09-18 12:26                     ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2019-09-18 12:26 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Mike Hommey, SZEDER Gábor, git

On 9/18/2019 4:43 AM, Jakub Narebski wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>> On 6/25/2019 3:51 AM, Jakub Narebski wrote:
>>> Jakub Narebski <jnareb@gmail.com> writes:
>>>> Derrick Stolee <stolee@gmail.com> writes:
> [...]
>>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>>> backward-compatibile in use (though not in generation and validation).
> [...]
>>>> Do you have benchmark for this "monotonically offset corrected commit
>>>> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
>>>> and https://github.com/derrickstolee/gen-test ?
>>>
>>> I guess this will have to wait...
>>
>> I have not had time to revisit this topic and re-run performance
>> numbers, sorry.
> 
> I have created pull requests against `reach-perf` branch of
> derrickstolee/git repository[1], and companion pull request against
> gen-test repository[2] with proposed prototype of backward-compatible
> corrected commit date (with monotonic offsets).
> 
> Could you please run the tests for this generation number v5?

I'll try to get to this, but it may take a few days.

> I was not
> able to do so.  It was my first time trying to compile Git on MS
> Windows, and while there were no problems compiling `master` (well,
> except for compilation taking a long time), I was unable to do it for
> `reach-perf` branch because of independent of change compilation errors.

I don't think I tested this on Windows. I do most of my performance tests
in Linux, especially when presenting numbers to the mailing list. The
gen-test repo has a bunch of shell scripts that I used for testing in
that environment.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-18  0:54 Revision walking, commit dates, slop Mike Hommey
2019-05-18  1:50 ` SZEDER Gábor
2019-05-18  3:58   ` Mike Hommey
2019-05-18  4:17     ` Mike Hommey
2019-05-18 12:01       ` SZEDER Gábor
2019-05-19 22:28         ` Jakub Narebski
2019-05-20  1:33       ` Derrick Stolee
2019-05-20 11:02         ` Jakub Narebski
2019-05-20 11:20           ` Derrick Stolee
2019-05-20 13:42             ` Jakub Narebski
2019-05-20 23:27               ` Jakub Narebski
2019-05-21  1:20                 ` Derrick Stolee
2019-05-22 18:29                   ` Jakub Narebski
2019-05-22 19:06                     ` Derrick Stolee
2019-05-23 21:04                       ` Jakub Narebski
2019-06-25  7:51               ` Jakub Narebski
2019-06-25 10:54                 ` Derrick Stolee
2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
2019-09-18 12:26                     ` Derrick Stolee
2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22  2:19             ` Mike Hommey
2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox